In just 15 years, we have gone from the first rough draft of the human genome to early successes in using genomic information to improve patient care. The rate of progress has been astonishing. But we still have a long way to go before we realize the full potential of routine genome sequencing in the clinic. Perhaps the single most important element we need right now is better context for genome interpretation: access to the relevant data that help medical geneticists and other healthcare professionals pinpoint disease-causing DNA variants, and translate DNA variants into actionable insights that help ensure each patient receives the best course of treatment.
That’s why a collection of academic and commercial organizations, my company among them, banded together to form the Allele Frequency Community, a group we believe will help deliver critical context needed for sequence interpretation to the biomedical community—all built upon effective, responsible data sharing. This community-driven effort is designed to make clinically useful genomic information broadly available to researchers and clinicians alike.
Current genomic databases are extremely valuable for research, but often lack a diverse representation of ethnic groups from around the globe. Clinical geneticists trying to determine whether a particular variant is relevant for a patient from one of these underrepresented ethnic groups face a serious challenge, as simple approaches can yield a number of “red herring” variants thought to be disease-causing that are actually benign.
Let’s say a clinical geneticist is interpreting variants from a patient’s genome to try to find the cause of a rare disease. The geneticist finds several variants that appear to be good candidates because they are not listed in existing genomic variant databases, and are thus presumed to be rare in the patient population at large. However, this patient is of an ancestry that is underrepresented in these public sequence databases. If the geneticist had access to a database of variants common among the patient’s specific ethnic group, she would see that these variants exist at high frequencies in that specific population. If these data are not available to her, she must invest a significant amount of time looking into these false positives, diverting resources from the search for the real causative variant.
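To make the scenario concrete, here is a minimal sketch of the filtering step, written in Python. It is an illustration only, not the community's actual software: the function name, the variant labels, the frequencies, and the 1 percent cutoff are all hypothetical assumptions.

```python
from typing import Dict, List


def filter_rare_candidates(
    candidates: List[str],
    population_freqs: Dict[str, float],
    max_freq: float = 0.01,  # assumed cutoff for "rare"; not taken from the article
) -> List[str]:
    """Keep only candidates that are rare in the patient's own population.

    A variant missing from the reference is treated as frequency 0.0,
    so it stays on the candidate list.
    """
    return [v for v in candidates if population_freqs.get(v, 0.0) < max_freq]


# Hypothetical data: three candidates that are absent from a general database.
candidates = ["var_A", "var_B", "var_C"]

# Hypothetical frequencies from a reference matched to the patient's ancestry.
matched_reference = {"var_A": 0.12, "var_B": 0.08}

remaining = filter_rare_candidates(candidates, matched_reference)
print(remaining)  # ['var_C']
```

In this toy case, two of the three candidates turn out to be common in the matched population and drop off the list, leaving one variant for follow-up.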
With so many groups around the world generating genomic data at a breakneck pace, it’s tempting to assume that this limitation will quickly fall away as the data volume grows. But we have seen time and again that unless key information from these data sets is pooled, there is little community benefit. Academic institutions have strict privacy protocols governing genomic data, so even when scientists want to share data across organizations, they are often prevented by logistics.
The founding members of the Allele Frequency Community have proposed a way to share clinically useful reference data while protecting patient anonymity. Allele frequencies, or the rate at which certain DNA variants are seen in certain populations, are a great target for data sharing for the common good. Scientists and clinicians who have generated genomic data sets can safely share only these pooled, anonymized frequencies from across many patients, which would be aggregated with similar statistics from many other sites around the world to be truly anonymous—safeguarding patient privacy as well as research interests and findings of individual groups. This makes it easier to share, delivers valuable insights, and fully protects patient privacy.
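The pooling idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of how the Allele Frequency Community computes its statistics: the function name, the data layout, and the numbers are hypothetical, and each site is assumed to report only summary counts (copies of the variant seen, and total chromosomes examined), never individual genotypes.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout: variant label -> (variant allele count, total alleles examined)
SiteCounts = Dict[str, Tuple[int, int]]


def pool_allele_frequencies(sites: List[SiteCounts]) -> Dict[str, float]:
    """Combine per-site summary counts into one pooled frequency per variant."""
    alt_total: Dict[str, int] = defaultdict(int)
    allele_total: Dict[str, int] = defaultdict(int)
    for counts in sites:
        for variant, (alt_count, allele_number) in counts.items():
            alt_total[variant] += alt_count
            allele_total[variant] += allele_number
    # Simplification: a site that never lists a variant contributes nothing
    # to that variant's denominator.
    return {
        variant: alt_total[variant] / allele_total[variant]
        for variant in alt_total
        if allele_total[variant] > 0
    }


# Two hypothetical contributing sites.
site_1 = {"var_A": (3, 200)}
site_2 = {"var_A": (9, 400), "var_B": (1, 400)}

pooled = pool_allele_frequencies([site_1, site_2])
# var_A: 12 copies among 600 alleles; var_B: 1 copy among 400 alleles
```

Only the summed counts leave each site in this sketch, which is the property that makes frequencies a lower-risk thing to share than patient-level records.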
The Allele Frequency Community has built in an incentive mechanism to help the resource to grow in value over time as it is used. Any scientist may join and access the Allele Frequency Community. In doing so, she receives the benefit of all the pooled allele frequencies from the rest of the community. In return, she agrees that anonymous, pooled frequency statistics can be computed from across her own samples to improve this shared resource. This virtuous cycle has enabled the Allele Frequency Community to grow rapidly. Since it launched earlier this year, hundreds of community members around the world have joined, growing the resource to more than 104,000 samples—including more than 13,000 whole genomes, and representing more than 100 countries of origin. Just a few months old, this community has already become the world’s largest source of ethnically diverse genomic data.
In the months since the community launched, participating scientists have reported an average false-positive reduction of 43 percent, showing how effective this approach can be in avoiding the “red herring” variants during genome interpretation.
The impact is growing as the Allele Frequency Community continues to expand worldwide. This is one of those precious cases in life where the cost of sharing is low, and the societal benefits are significant. Together, by sharing pooled, anonymized information about human genetic variation, we are making genome sequencing more useful and actionable for healthcare professionals—for the benefit of all patients.
Ramon Felciano is the chief technology officer and vice president, Global Strategy, at QIAGEN Bioinformatics.