Study Tracks Geographical Gene Flow and Ancestry in the US

The analysis adds new details to the picture of migration and mixing in a diverse country.

Shawna Williams
Sep 1, 2020


When Chengzhen Dai set out to investigate the influence of US geography on human genetics a few years ago, the study made a somewhat unusual addition to the work of MIT’s SENSEable City Lab, whose projects typically focus on solar power, climate, waste streams, and other urban questions. But Dai saw it as fitting. Researchers at the lab, where he was doing his master’s, are interested in how humans move around and interact with one another, he explains. As he and his colleagues planned the study, “we had the hypothesis that cities, and in a broader sense, geography has played a major role in how ancestry and admixture occurs.”

Dai, now a software engineer at the Institute for Systems Biology in Seattle, and his advisor, designer and engineer Carlo Ratti, teamed up with population geneticist Alicia Martin of the Broad Institute and other colleagues to test their hypothesis using data from National Geographic’s Genographic Project, a now-discontinued effort to sequence genomes from around the world to track migration patterns.

Focusing on the genomes of 32,589 participants who’d provided a postal code, the team compared single-nucleotide polymorphisms (SNPs) in those genomes with the SNPs present in reference genomes curated by the 1000 Genomes Project, a UK-based initiative that compiled genomic data on 2,504 people representing 26 populations worldwide. Those SNP data revealed substantial diversity of ancestral origins for each demographic group. For example, Hispanics and Latinos tended to have a mixture of African, European, and Native American ancestry, but the makeup and proportions of these ancestries varied widely among individuals.
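
The idea of comparing one person's SNPs against reference panels can be made concrete with a small sketch. This is an illustration under stated assumptions, not the study's actual pipeline: it treats each reference population as a list of allele frequencies (hypothetical toy values) and runs a simple expectation-maximization loop, similar in spirit to model-based ancestry tools such as ADMIXTURE, to estimate what fraction of one genome traces to each population. The function name `estimate_ancestry` and all numbers are hypothetical, and the frequencies are assumed to lie strictly between 0 and 1.

```python
from typing import List, Sequence


def estimate_ancestry(
    genotypes: Sequence[int],
    ref_freqs: Sequence[Sequence[float]],
    n_iter: int = 200,
) -> List[float]:
    """Hypothetical helper: estimate one person's ancestry proportions by EM.

    genotypes[j]    -- copies (0, 1 or 2) of the counted allele at SNP j
    ref_freqs[k][j] -- frequency of that allele at SNP j in reference
                       population k (assumed strictly between 0 and 1)
    Returns proportions q[k] that sum to 1.
    """
    K = len(ref_freqs)
    J = len(genotypes)
    q = [1.0 / K] * K  # start from equal proportions
    for _ in range(n_iter):
        totals = [0.0] * K
        for j, g in enumerate(genotypes):
            # Each SNP carries two allele copies: g counted, 2 - g not counted.
            for n_copies, carries_allele in ((g, True), (2 - g, False)):
                if n_copies == 0:
                    continue
                # E-step: weight each population by how well it explains this copy.
                w = [
                    q[k] * (ref_freqs[k][j] if carries_allele else 1.0 - ref_freqs[k][j])
                    for k in range(K)
                ]
                s = sum(w)
                for k in range(K):
                    totals[k] += n_copies * w[k] / s
        # M-step: new proportions are the average weight over all 2 * J copies.
        q = [t / (2 * J) for t in totals]
    return q


# Hypothetical toy data: two reference populations, three SNPs.
freqs = [[0.9, 0.8, 0.1],   # reference population A
         [0.2, 0.1, 0.7]]   # reference population B
print(estimate_ancestry([2, 1, 0], freqs))
```

Each pass reassigns every allele copy to the populations in proportion to how well each explains it, then averages; with thousands of SNPs rather than three, the estimates become far more stable. Real analyses must also deal with linkage between nearby SNPs, which this sketch ignores.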

The researchers also looked for unbroken chunks of the genome, known as haplotypes, that were shared among two or more members of the Genographic cohort. This approach can reveal when two people share a common ancestor within the past 10 to 15 generations, says Gillian Belbin, a population geneticist at Mount Sinai Hospital’s Institute for Genomic Health who uses approaches that explore relatedness based on these segments but was not involved in the new study. It’s similar to what commercial testing companies use to point their customers toward distant relatives in their databases. At a population level, discerning relatedness in this way can reveal the effects of factors such as migrations and admixture that have shaped a group over recent history, she adds.
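
Why shared segments point to relatively recent ancestors can be shown with a back-of-the-envelope model. This is a textbook approximation, not the study's method: if two people descend from a common ancestor g generations back on each side, about 2g meioses separate them, and if crossovers fall at random at roughly one per 100 centimorgans (cM) per meiosis, the segment they still share is approximately exponentially distributed with mean 100/(2g) cM. The function names and the 3 cM detection threshold below are illustrative assumptions; real segment-detection tools set their own cutoffs.

```python
import math


def expected_segment_cm(generations: int) -> float:
    """Mean length (cM) of a shared segment from a common ancestor
    `generations` generations back on each side.

    Assumes 2 * generations meioses separate the two people and that
    crossovers occur at random at 1 per 100 cM per meiosis, so segment
    length is roughly exponential with mean 100 / (2 * generations) cM.
    """
    return 100.0 / (2 * generations)


def prob_longer_than(threshold_cm: float, generations: int) -> float:
    """Probability that such a segment exceeds `threshold_cm`,
    under the same exponential approximation."""
    return math.exp(-2 * generations * threshold_cm / 100.0)


# Assumed 3 cM detection threshold, for illustration only.
for g in (5, 10, 15, 20):
    print(g, round(expected_segment_cm(g), 2), round(prob_longer_than(3.0, g), 3))
```

Under these assumptions, a segment inherited from an ancestor 10 generations back averages about 5 cM, and one from 20 generations back about 2.5 cM, short enough that many such segments would fall below common detection cutoffs. That is consistent with haplotype-sharing methods being most informative about roughly the past 10 to 15 generations.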

STOP AND GO: Regions of higher- (blue) and lower-frequency (brown) migration, as inferred from the genomes of African-Americans (A), Hispanics and Latinos (B), and European-Americans (C).
AM J HUM GENET, 106:371–88, 2020

As Dai and his colleagues had predicted, genetic relatedness turned out to correlate with geography, although the drivers of that relationship seemed to vary among demographic groups. Among African-Americans, for example, there was broad genetic relatedness among people living along the East Coast, from Florida to Maine, indicating frequent migration within that area. But the researchers also found evidence of reduced migration through certain regions. Overall, the authors write, the patterns they found are consistent with movement trends during the Great Migration of the early- to mid-20th century, in which millions of African-Americans left the South for cities in other areas of the country. Two other demographic groups the researchers analyzed—European-Americans, and Hispanics and Latinos—had their own distinctive patterns of migration.

Another geographic finding was the identification of five distinct clusters of related people within the Hispanics/Latinos category, each of which tended to live in particular areas of the lower 48 states, based on the postal codes they’d reported. Only one of these showed strong ancestral links to a place outside the continental US: A cluster made up mostly of people living in central Florida and the New York City area reported that most of their grandparents had been born in Puerto Rico. People in another of the clusters, who reported having grandparents born in the US, Mexico, and Cuba, among other countries, tended to live in southern Florida and parts of Texas and California. Members of a third cluster, whose grandparents were predominantly born in the United States, live mainly in New Mexico and Colorado, as well as parts of California.

The finding of multiple clusters within the category of Hispanics/Latinos, Belbin says, “indicates that there’s some interesting gene flow and population structure perhaps . . . that is not being effectively captured by the self-reporting labels [for race or other demographic categories] that are available.” In her studies on Mount Sinai’s patient biobank, Belbin says she’s similarly found clusters of genetically related people that aren’t captured by existing demographic labels.

While this is far from the first study of genetic diversity in the US, it’s exceptionally comprehensive, says Scott Williams, a population geneticist at Case Western Reserve University in Cleveland who was not involved in the work. “It’s quite amazing how much information was in there, and the analyses were really complete,” he says. For example, he notes, the researchers included people of East and South Asian descent living in the US, who have been left out of most studies of genetic variation and distribution.

The team found that people of East and South Asian descent in the US were on the whole less admixed than other groups, likely because they or their ancestors tend to be relatively recent arrivals in the country, says Dai. “In the United States, immigration wasn’t really amenable to [East] Asians and South Asians” until the mid-1960s, he explains. As a result of that lack of mixing, the study found, people of Asian descent formed well-defined genetic clusters that roughly correlated with their ancestral countries of origin. “These are very . . . distinct populations, and they’re very diverse,” Dai says. “But a lot of times, genetic studies . . . group them as just like one continental ancestry,” mainly because of small sample sizes. This continent-level grouping is “not a very accurate way of reflecting and capturing the genetic diversity of these individuals,” he adds.

Better knowledge of human genetic diversity and patterns of admixture is important not only for understanding individuals’ origins, but also for harnessing genetic findings in clinical practice, Williams notes. “Until we know about these patterns of diversity, it becomes very difficult to translate genetic findings for risk of disease easily across populations.”