Lack of Diversity in Genetic Datasets Is Risky for Treating Disease

Certain populations have been historically underrepresented in genome sequencing studies, but the NIH, private clinics, and 23andMe and other companies are trying to fix that.

Mar 21, 2019
Ashley Yeager
Alessia Ranciaro, a senior research scientist in the Tishkoff lab at the University of Pennsylvania, collects a skin reflectance reading from a genetics study participant from a Nilo-Saharan population. 
TISHKOFF LAB

Not too long ago, a couple came to see Neil Risch, a human geneticist at the University of California, San Francisco, in the hope that he could identify rare gene variants in their child, who had an undiagnosed disease. Not a problem, Risch thought. He’d sequence the exomes of the parents and child and run the data through a rare-variants database to identify the faulty genes underlying the child’s illness. The case wasn’t so straightforward, though. The parents didn’t have European ancestry but were descendants of a population not well represented in the database. 

“For individuals with genetic backgrounds not represented in [the database], there can be additional challenges in properly identifying genetic variants that cause the patient’s symptoms,” Risch says. In such cases, it can be hard to find any genetic link to the symptoms, perhaps because the disease is caused by novel variants not yet identified as pathogenic.

The case, Risch says, is not uncommon in the clinical setting, where most of the data represent individuals with European ancestry. The same is true for the data that have, so far, been collected by the National Institutes of Health (NIH) and by companies such as 23andMe. In fact, a Nature comment published in 2016 estimated that about 80 percent of people in genetic studies were of European descent. And as of 2018, individuals included in genome-wide association studies (GWAS) were 78 percent European, 10 percent Asian, 2 percent African, and 1 percent Hispanic, with all other ethnicities representing less than 1 percent, University of Pennsylvania human geneticist Sarah Tishkoff and her colleagues write today (March 21) in a Cell commentary. That’s problematic, they write, because “the lack of ethnic diversity in human genomic studies means that our ability to translate genetic research into clinical practice or public health policy may be dangerously incomplete, or worse, mistaken.”

At a very basic level, “if you really want to understand human biology, it makes sense to study the full breadth of human cultural and biological diversity,” Joanna Mountain, an anthropologist and senior director of research at 23andMe, tells The Scientist. “If we can understand how [people have evolved culturally and biologically in their environment] around the world, we may be able to understand health and disease at a broader level. That’s one very, very core reason to diversify our genetic database.”

Others are doing the same, including scientists such as Tishkoff, clinical researchers such as Risch, and the NIH.

Hidden variants

This figure shows the distribution of ancestry of individuals in genome-wide association studies as of January 2019.
Sirugo et al. 2019

When it comes to Mendelian diseases, in which a single gene variant typically causes the disease, one might expect the variant that causes the disease in one population to cause it in other populations, too. But that’s not always the case. In cystic fibrosis (CF), for example, the variant that most often causes the disease in Europeans is ΔF508 in the CFTR gene. This mutation accounts for more than 70 percent of CF cases in Europeans, but only 29 percent of cases in people of the African diaspora. Another mutation, 3120+1G>A, accounts for between 15 percent and 65 percent of CF in South African patients with African ancestry, Tishkoff and her colleagues write. Each of these variants leads to a somewhat different form of the disease, and because of these population differences, treatments may need to differ between groups, a fact that becomes apparent only when genetic diversity is included in research datasets.

Another reason for prioritizing diversity in genetic studies, Tishkoff tells The Scientist, is that it can lead to a new understanding of genes that underlie disease or new therapeutics that may not be discoverable in populations that are already well studied. In 2017, for example, Tishkoff and her colleagues published a study in Science revealing novel gene variants and one novel gene associated with skin pigmentation in a sample of 1,600 individuals from diverse African populations. The novel gene MFSD12 “plays a critical role in melanocyte development and production of pigments, and a recent study indicates it plays a role in skin cancer,” Tishkoff says. But before her team’s study, “nothing was known about this gene.”

Another example is the development of PCSK9 inhibitors, new drugs that lower cholesterol. A study of genes related to cholesterol revealed that certain mutations in the PCSK9 gene gave some individuals of African descent low LDL cholesterol. These mutations, however, are extremely rare in Americans of European ancestry, so without studying individuals of African descent, the new cholesterol-controlling drugs might not have been developed.

This is not to suggest that any given ethnicity is homogeneous; diversity within groups is a critical consideration. Take, for instance, G6PD deficiency, which can lead to the destruction of red blood cells in response to antimalarial drugs. In sub-Saharan Africa, G6PD deficiency can reach a frequency of 25 percent, which makes it a particular concern when malaria treatments are given. Because an effective antimalarial drug combination, chlorproguanil-dapsone, exacerbated the condition, the drug was pulled from use, even though it could have been used safely in malaria-infected patients who are not G6PD deficient, Tishkoff and her colleagues note.

“Lots of populations have their own genetic profile,” Risch says, and there are “some we are clearly missing.”

Steps forward

23andMe recognizes that the genetic diversity of its database is limited. To broaden its reach, the company has been recruiting participants who live in the US and who have four grandparents born in any of 61 countries, including Angola, Uzbekistan, Sierra Leone, and Thailand. The company is also partnering, through company-awarded grants, with researchers who work in those countries to collect data from individuals who live there.

People in these different countries are willing to contribute their data, but they want different benefits in return for their participation, notes Anjali Shastri, a research project manager at the company. In Rwanda, improving health might drive participation, while in Angola, individuals are especially curious about their origins. Scientists work closely with each population to develop incentives for participation and to ensure that everyone gains from the research, she says.

In 2018, the NIH launched a program called All of Us, with the goal of creating a database of health information from 1 million diverse participants, everything from their genetics to their electronic health records. So far, says Katie Baca-Motes, director of The Participant Center for the All of Us Research Program, nearly 100,000 people have completed the informed consent process, provided blood and urine samples, and, where possible, linked their electronic health records. Of those, half belong to populations that have traditionally been underrepresented.

“The program is really meant for everyone,” she says, noting that before enrollment even began, program heads were working with community leaders and other influential individuals to dispel any mistrust about what would be done with participants’ data. “In the past, there’s been distrust of the medical establishment and government,” she says, explaining that it’s a challenge the program continues to work to overcome.

“Certainly for the NIH, I think that’s a step in the right direction,” Tishkoff says. “One of the things that really impressed me about the All of Us study is that they’re moving away from race, which is really good. It’s not like, ‘Are you black, white, Hispanic?’ . . . They get into detail about ancestry, ‘Where did your parents come from? Where did your grandparents come from? Where were you born? Where do you live now?’ They ask questions about socioeconomic factors, ‘What ZIP code do you live in?’ Because we know that’s correlated with health.”

Beyond the genome

In her own research, Tishkoff has been working to expand genetic datasets, specifically in eastern, southern, and western Africa. She works first with government and institutional ethical review boards in African countries, then with local geneticists, who interface with the populations and explain what scientists want to learn about human health using data from tribes such as the San and Hadza.

Most recently, she and her colleagues analyzed the gut microbiomes of individuals in the San and Hadza tribes, along with several other groups from Botswana and Tanzania, and compared them with similar data from a group of individuals in Philadelphia. The Botswanans’ microbiomes were more similar to the Philadelphians’ than the Tanzanians’ were, but there were big differences, especially in genetic pathways associated with degrading industrial compounds. In the US individuals’ gut microbiomes, certain pathways for breaking down industrial compounds, such as bisphenol, were more enriched than in the microbiomes of Botswanans and Tanzanians, suggesting selection in the US cohort for bacteria that can degrade these compounds. Botswanans’ microbiomes were more geared toward breaking down DDT, especially compared with the Tanzanians’, Tishkoff and her colleagues reported January 22 in Genome Biology.

“We know that microbiome composition can influence both health and disease,” Tishkoff says, and the results, she points out, bring up another issue about diversity and understanding human health. “It’s not just genomics. . . . We also look at variation in gene expression and epigenetics,” she explains. “Environment is important, and we can’t ignore that either, so I think getting as much information both about genetics and environment in ethnically diverse populations in the US and around the globe is important.”