Genetic Variant Classification: Challenges and Advancements

Yuya Kobayashi from Invitae explains the difficulties scientists face when classifying sequence variants and discusses how innovative approaches help overcome them.

An illustration of several aligned DNA sequences.

Accurately classifying genetic variants is vital for understanding disease pathogenicity and ensuring patients receive appropriate treatments.

©iStock, zmeel


Genetic variation is the foundation of human diversity, enabling differences in traits such as height, eye color, or blood type. Some sequence variants also cause inherited diseases, including sickle cell anemia, cystic fibrosis, and mucopolysaccharidosis type III. However, it is often difficult for scientists to identify which variants are responsible for a pathological condition.

In this Innovation Spotlight, Yuya Kobayashi, a clinical genomic scientist at Invitae, discusses how clinical geneticists classify and reclassify variants and how artificial intelligence (AI) helps improve genetic testing.

A headshot of Yuya Kobayashi. Image Credit: Curtis Kautzer on behalf of Invitae
Yuya Kobayashi, PhD
Senior Program Manager
Variant Classification Systems
Invitae

What is the general framework used for classifying genetic variants?

In 2015, the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) published joint consensus guidelines for classifying germline genetic variants.1 These recommendations, known as the ACMG guidelines, provide a standardized approach for clinical geneticists to determine whether there is sufficient evidence to classify a variant as pathogenic or benign.

The ACMG guidelines established three parameters. First, they defined the types of evidence to be considered. Second, they established the value or weight of each piece of evidence and how to combine them to reach one of five classification tiers: pathogenic, likely pathogenic, variant of uncertain significance (VUS), likely benign, or benign. Finally, the guidelines designated target confidence thresholds for each of those tiers, with 90 percent confidence as the threshold for classifying variants as likely pathogenic or likely benign.
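As a rough illustration of how a confidence threshold separates the tiers, the sketch below maps a probability of pathogenicity to one of the five labels. Only the 90 percent threshold for the "likely" tiers comes from the text above. The 99 percent cutoff for the pathogenic and benign tiers and the function name `tier_for` are assumptions made for this example, not values stated in the interview.

```python
# Hypothetical sketch: map a probability of pathogenicity to a five-tier label.
LIKELY = 0.90    # from the text: 90 percent confidence for the "likely" tiers
DEFINITE = 0.99  # assumption for illustration only, not stated in the article


def tier_for(p_pathogenic: float) -> str:
    """Return the classification tier for a given probability of pathogenicity."""
    if not 0.0 <= p_pathogenic <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    p_benign = 1.0 - p_pathogenic
    if p_pathogenic >= DEFINITE:
        return "pathogenic"
    if p_pathogenic >= LIKELY:
        return "likely pathogenic"
    if p_benign >= DEFINITE:
        return "benign"
    if p_benign >= LIKELY:
        return "likely benign"
    # Anything that fails to reach a threshold on either side stays uncertain.
    return "variant of uncertain significance"


if __name__ == "__main__":
    for p in (0.995, 0.95, 0.5, 0.05, 0.001):
        print(p, "->", tier_for(p))
```

Under these assumed cutoffs, every variant whose probability falls between the two "likely" thresholds is reported as a VUS, which is why the placement of the thresholds directly affects how many VUS results are issued.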

In our recent JAMA Network Open study, we used historical classification data for more than two million genetic variants, collected over an eight-year period, to determine how well the current variant classification system aligned with these confidence threshold targets.2 By looking at how the classification of a variant evolved over time, we could estimate the accuracy of the original classifications.

What is Sherloc and how accurate are its variant classifications?

Sherloc (semiquantitative, hierarchical evidence-based rules for locus interpretation) is an ACMG guidelines-compliant, peer-reviewed, and clinically validated variant classification system that defines how to apply the ACMG guidelines in a more concrete and granular way.3 For example, the ACMG guidelines state that a variant that is more frequent in the general population than expected for a disease should be classified as benign, but they do not define what frequency should be expected. A system like Sherloc fills in such gaps with analytical tools and geneticist-defined rules. Importantly, Sherloc is a system that can evolve over time as our knowledge of genetics and the available technology improve.
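To make the idea of an "expected" frequency concrete, the sketch below shows one simple way such a ceiling could be computed for a dominant condition. It is not Sherloc's rule, which the interview does not describe. The function names and all input numbers are hypothetical, and the calculation assumes heterozygous carriers of a single causal allele.

```python
# Hypothetical sketch: a frequency ceiling for one causal allele in a
# dominant condition. Not Sherloc's actual rule; inputs are illustrative.

def max_expected_allele_frequency(prevalence: float,
                                  max_allelic_contribution: float,
                                  penetrance: float) -> float:
    """Highest population allele frequency compatible with causing the disease.

    prevalence: fraction of the population affected by the condition.
    max_allelic_contribution: largest share of cases one variant could explain.
    penetrance: fraction of carriers who develop the condition.
    """
    for name, value in (("prevalence", prevalence),
                        ("max_allelic_contribution", max_allelic_contribution),
                        ("penetrance", penetrance)):
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{name} must be in (0, 1]")
    # Affected carriers are at most prevalence * contribution of the population.
    # Dividing by penetrance gives all carriers; each carrier holds the allele
    # on one of two chromosomes, hence the factor of 0.5.
    return prevalence * max_allelic_contribution * 0.5 / penetrance


def is_too_common(observed_frequency: float, ceiling: float) -> bool:
    """True when a variant is more frequent than the disease model allows."""
    return observed_frequency > ceiling


if __name__ == "__main__":
    # Illustrative inputs only, not figures for any real condition.
    ceiling = max_expected_allele_frequency(1 / 500, 0.02, 0.5)
    print(ceiling, is_too_common(1e-3, ceiling))
```

The point of the example is that the ceiling depends on choices a classification system has to make explicitly, such as the assumed penetrance, which is the kind of gap the guidelines leave open.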

All two million variants in our study had been classified using Sherloc, so examining how those classifications changed over time gave us a way to estimate the accuracy of its initial classifications.2 Our findings show that when Sherloc classifies a variant as likely pathogenic or likely benign, new data confirms it 99.9 percent of the time. This suggests that the accuracy achieved by an ACMG guidelines-compliant system, such as Sherloc, far exceeds the 90 percent confidence target set by ACMG/AMP.
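The general idea of estimating accuracy from reclassification history can be sketched as follows. This is a simplified illustration, not the study's actual method. The record format, the function names, and the toy data are hypothetical, and a "likely" call is counted as confirmed only when its most recent classification stays on the same side.

```python
# Hypothetical sketch: estimate how often initial "likely" calls hold up.
PATHOGENIC_SIDE = {"pathogenic", "likely pathogenic"}
BENIGN_SIDE = {"benign", "likely benign"}
LIKELY_CALLS = {"likely pathogenic", "likely benign"}


def side(classification: str):
    """Collapse a tier to 'pathogenic', 'benign', or None for a VUS."""
    if classification in PATHOGENIC_SIDE:
        return "pathogenic"
    if classification in BENIGN_SIDE:
        return "benign"
    return None


def confirmation_rate(history) -> float:
    """Fraction of initial 'likely' calls whose latest call is on the same side.

    history: iterable of (initial_classification, latest_classification) pairs.
    """
    total = 0
    confirmed = 0
    for initial, latest in history:
        if initial not in LIKELY_CALLS:
            continue  # only initial "likely" calls are being evaluated
        total += 1
        if side(latest) == side(initial):
            confirmed += 1
    if total == 0:
        raise ValueError("no initial 'likely' classifications in history")
    return confirmed / total


if __name__ == "__main__":
    # Toy records, invented for the example.
    toy_history = [
        ("likely pathogenic", "pathogenic"),
        ("likely benign", "benign"),
        ("likely benign", "benign"),
        ("likely pathogenic", "benign"),
        ("variant of uncertain significance", "likely benign"),
    ]
    print(confirmation_rate(toy_history))
```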

Why is the reclassification of genetic variants necessary in clinical genomics?

The human genome is approximately three billion base pairs in length, which means there are many possible genetic variants, and any given variant has a low probability of having been well-studied or widely observed. We often have limited data about a genetic variant, and as a result, approximately half of the genetic variants encountered are initially classified as VUS. However, as more patients undergo testing and experimental methods improve, new data allow us to re-evaluate previously classified variants.

Our study found that nearly all the reclassifications either confirmed the likely pathogenic and likely benign variants as pathogenic and benign, respectively, or converted a VUS to a more definitive classification.2 Consistent with other studies, about 80 percent of those reclassified VUS ended up as likely benign or benign. Only in very rare instances, about 0.06 percent of reclassifications, did we see situations where new evidence reversed the original classification (e.g., from benign to pathogenic, or vice versa).

A VUS result can be frustrating because it does not offer the patient or clinician an actionable answer. A reclassification can give patients the opportunity to receive appropriate surveillance regimens or treatments. In some cases, it can offer peace of mind by confirming a benign result and reducing unnecessary medical interventions. Ultimately, the ability to provide a more definitive result paves the way for precision medicine, leading to more appropriate targeted care.

What approaches helped reclassify VUS into definitive categories?

An illustration of a DNA molecule emerging from a circuit board.
In their new study, Kobayashi and his colleagues determined that most VUS reclassifications resulted from scientists leveraging machine learning tools to reanalyze existing datasets.
©iStock, BlackJack3D

Our study identified three primary strategies that contributed to the reclassification of VUS.2 The first strategy was to rely on new data collected from additional patient tests or publicly available datasets, which contributed to 30 percent of VUS reclassifications. The second strategy involved generating data with the purpose of resolving VUS, such as testing additional family members to conduct segregation analysis or testing a patient’s RNA to better understand the molecular impact of variants. This strategy accounted for 10 percent of reclassifications.

Surprisingly, the biggest cause of VUS reclassification was not the result of new data but the application of machine learning (ML) to reanalyze existing data. These ML tools allowed us to more accurately measure the importance of each piece of evidence, which in turn helped us reach a more definitive conclusion. Importantly, the ML approaches that made a significant impact on VUS reclassifications were those co-developed by clinical geneticists, who have a deep understanding of the data complexities, and AI scientists.

What implications do your findings have for advancing genetic testing practices?

Our study’s key finding is that the accuracy of current variant classifications is generally extremely high and exceeds the target definitions set by the ACMG guidelines.2 However, this implies that a significant number of variants are being classified as VUS, despite exceeding the 90 percent confidence target for likely benign and likely pathogenic. This gap highlights the need for improved communication about the degree of confidence in genetic test results and a better understanding of how they should be handled in clinical care.

The other notable finding is that even with these strict standards for a non-VUS classification, we have made substantive progress in reducing VUS, particularly among historically underrepresented race, ethnicity, and ancestry groups, with ML tools as the key driver. This finding suggests that ML tools could provide a path forward toward improving equity in genetic testing. However, despite all the progress we have made, nine in ten variants classified as VUS remain unchanged today. Continued innovation in data analysis, including the use of ML and other AI approaches, will be essential to accelerate progress and improve equity in genetic testing.

What are the next steps for improving the processes and guidelines for variant classification in germline genetic testing?

The aspirational goal of our community has been to eventually transition to a quantitative classification framework that can output a variant’s probability of pathogenicity, rather than relying on the qualitative five-tier classifications we use today. Such a shift could sidestep the challenge of harmonizing the observed classification accuracy with the targeted accuracy.
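One generic way to picture a quantitative framework is to combine independent pieces of evidence with Bayes' rule, as in the sketch below. The interview does not specify any particular method, so this is only an illustration. The prior, the likelihood ratios, and the function name are hypothetical, and the calculation assumes the pieces of evidence are independent.

```python
# Hypothetical sketch: combine evidence into a probability of pathogenicity.

def probability_of_pathogenicity(prior: float, likelihood_ratios) -> float:
    """Update a prior probability with a sequence of likelihood ratios.

    A likelihood ratio above 1 favors pathogenicity; below 1 favors benignity.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)  # convert probability to odds
    for ratio in likelihood_ratios:
        if ratio <= 0.0:
            raise ValueError("likelihood ratios must be positive")
        odds *= ratio  # independent evidence multiplies the odds
    return odds / (1.0 + odds)  # convert odds back to probability


if __name__ == "__main__":
    # Invented values: two pieces of evidence for, one against.
    print(probability_of_pathogenicity(0.10, [20.0, 4.0, 0.5]))
```

A framework of this kind reports the probability itself, so no tier cutoff has to be chosen before the result is communicated.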

AI and ML technologies are poised to play a significant role in this transition, as evidenced by their positive impact observed in our study. However, it is crucial that clinical geneticists guide the development and implementation of AI-driven systems to ensure they are used thoughtfully and appropriately. Establishing guidelines for how AI tools should be validated and incorporated into clinical settings will be a critical next step in advancing genetic testing practices, making them more accurate and accessible for all patients and clinicians.
