ROBINSON ET AL., GENOME RES, 24:340–48, 2014.
Last fall, the conclusion of the 1000 Genomes Project revealed 88 million variants in the human genome. What most of them mean for human health is unclear. Of the known associations between a genetic variant and disease, many are still tenuous at best. How can scientists determine which genes or genetic variants are truly detrimental?
Patients with rare diseases are often caught in the crosshairs of this uncertainty. By the time they have their genome, or portions of it, sequenced, they’ve endured countless physician visits...
Exome sequencing, which covers the 1 percent to 2 percent of the genome that codes for protein, typically turns up some 30,000 genetic variants, which need to be carefully assessed. Advances in bioinformatics tools have allowed researchers to rapidly whittle numerous variants or genes down to a manageable list. From there, other web-based platforms are helping investigators build a case for causation. These steps are important, says Adrian Liston of the University of Leuven, because testing a gene candidate in animal models or cell lines consumes a vast amount of resources.
The Scientist spoke with developers and users of these tools—all of them freely available and each of which takes a slightly different tack to build a case for causation.
EXOMISER
www.sanger.ac.uk/science/tools/exomiser
Launched in 2014, Exomiser is an open-source Java software package that filters and prioritizes candidate genes and variants from whole-exome or whole-genome sequencing data, with a special focus on phenotypic data. The program is a suite of various algorithms developed by the Monarch Initiative, a cross-institutional collaboration that builds bioinformatics tools to help scientists more easily navigate phenotypes, diseases, model systems, and genes in translational research.
How it works: Users enter a patient’s clinical features and exome into the program, and Exomiser generates a scored list of candidate variants based on how frequently a variant occurs in the broader population; the type of mutation and how disruptive it may be; and potentially related genes, which may be implicated in a particular disease or clinical feature. What sets Exomiser and other Monarch Initiative tools apart from others is that they also employ data from model organisms to predict whether a mutation is involved in the person’s disease, says Orion Buske, a computational biology graduate student in Michael Brudno’s lab at the University of Toronto.
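The three kinds of evidence described above (population frequency, predicted disruptiveness, and how well a gene fits the patient's features) can be pictured with a toy ranking script. This is a minimal Python sketch, not Exomiser's actual algorithm: the `Variant` fields, the gene names, the 1 percent frequency cutoff, and the simple averaging rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str                    # placeholder gene name, not a real symbol
    population_frequency: float  # fraction of reference genomes carrying the variant
    disruption: float            # hypothetical 0-1 score; 1 = most damaging
    phenotype_match: float       # hypothetical 0-1 score; 1 = gene fits the patient's features


MAX_FREQUENCY = 0.01  # assumed cutoff: drop variants seen in more than 1% of people


def rank_candidates(variants):
    """Drop common variants, then sort the rest by a combined score (best first)."""
    rare = [v for v in variants if v.population_frequency <= MAX_FREQUENCY]
    # Assumed scoring rule: plain average of the two evidence scores.
    return sorted(
        rare,
        key=lambda v: (v.disruption + v.phenotype_match) / 2,
        reverse=True,
    )


variants = [
    Variant("GENE_A", 0.20, 0.9, 0.9),    # damaging but common: filtered out
    Variant("GENE_B", 0.0001, 0.8, 0.9),  # rare, damaging, good phenotype fit
    Variant("GENE_C", 0.0005, 0.9, 0.1),  # rare and damaging, poor phenotype fit
]

ranked = rank_candidates(variants)
for v in ranked:
    print(v.gene)
```

In this made-up example the common variant drops out entirely, and the phenotype score is what separates the two rare candidates.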
At the heart of this capability is the Human Phenotype Ontology, a standardized vocabulary of more than 11,000 clinical signs and symptoms that has seen broad adoption in the genetics research community. Similar annotations for zebrafish, mice, Drosophila, and other model organisms enable Exomiser to draw connections between humans and other species. Such annotations also cast a wider net for functionality, Buske adds; while only about 35 percent of human genes have known associations with disease phenotypes, you can bump that up to 80 percent if you look across other species.
“We’re all conserved enough that that tells you something about humans in general,” Buske says. “It’s not perfect, but it’s way better than knowing nothing.”
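To see why model organism annotations widen the net, consider a toy comparison of a patient's features against the phenotypes recorded for a gene in several species. The sketch below uses plain set overlap (Jaccard similarity) as a stand-in; real tools rely on ontology-aware similarity measures and standardized term identifiers. The gene names and phenotype labels are invented, and the example assumes that phenotypes from other species have already been translated into a shared vocabulary.

```python
def jaccard(a, b):
    """Overlap between two sets of phenotype labels, from 0 (none) to 1 (identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(patient_terms, phenotypes_by_species):
    """Return (score, species) for the species whose annotations best fit the patient."""
    return max(
        (jaccard(patient_terms, terms), species)
        for species, terms in phenotypes_by_species.items()
    )


# Hypothetical patient features, written as plain labels instead of ontology IDs.
patient = {"seizures", "microcephaly", "developmental delay"}

# Hypothetical annotations: GENE_A has no human disease link, only mouse data.
gene_phenotypes = {
    "GENE_A": {"human": set(), "mouse": {"seizures", "microcephaly"}},
    "GENE_B": {"human": {"hearing loss"}, "zebrafish": set()},
}

for gene, by_species in gene_phenotypes.items():
    score, species = best_match(patient, by_species)
    print(gene, species, round(score, 2))
```

Here the gene with no human annotations still surfaces as a candidate, because its mouse phenotypes overlap with the patient's features.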
Evaluating exome sequence data using human and model organism phenotypes improves diagnostic efficiency, according to a study published by Exomiser’s developers last year (Genet Med, doi:10.1038/gim.2015.137, 2015).
Getting started: Check out the Nature Protocols piece on how to install and use this tool (10:2004-15, 2015). Exomiser is a stand-alone application that can be downloaded and run on a single desktop computer, and is incorporated into the analysis pipeline at the National Institutes of Health’s Undiagnosed Diseases Network.
Considerations: Exomiser incorporates data from the 1000 Genomes Project and the Exome Variant Server. A new beta version of Exomiser also includes data from the Exome Aggregation Consortium (ExAC), a reference collection of exome data from 60,000 people.
CLINVAR
www.ncbi.nlm.nih.gov/clinvar
ClinVar is a publicly available database that curates genetic variants linked to diseases. Launched in 2013 and developed by the NIH’s National Center for Biotechnology Information, ClinVar has collected clinical interpretations of more than 125,000 unique variants from researchers and databases to date, says clinical geneticist Heidi Rehm, director of the Laboratory for Molecular Medicine at Partners Healthcare Personalized Medicine in Cambridge, Massachusetts.
ClinVar accounts for the inexact nature of determining a gene variant’s effect on health: one research group may call a variant benign, while another considers it more serious. In addition, the categorizations themselves (for example, “likely pathogenic”) are more clearly defined and standardized in the tool.
How it works: ClinVar uses a star-based system to rate the review level of a given variant’s supposed (or interpreted) role in disease. A four-star rating is the highest, meaning the variant has been through a review process with multiple experts in the community weighing in on its interpretation and the supporting evidence. The upside of this detailed review process is that users can trust three- and four-star variants, Rehm says. However, only a small subset of the variants in the ClinVar database (3,800) fit into these categories.
More often, variants receive one star—usually based on a single submission providing an interpretation and rules for interpretation—or no stars, indicating that the submitter did not provide their interpretation criteria and attest to a comprehensive review of supporting evidence. One challenge facing the field is that most of the clinically significant variants (83 percent) in the ClinVar repository are unique to a particular family or are very rare, according to an analysis published last year by Rehm (N Engl J Med, 372:2235-42, 2015).
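The star system lends itself to a simple filter. The Python sketch below keeps only records at or above a chosen star level. The article itself describes only the zero-, one-, and four-star ends of the scale; the full mapping of review-status phrases to stars reflects ClinVar's documented wording as best recalled here and should be checked against the current ClinVar documentation. The variant names and records are invented.

```python
# Assumed mapping from review-status text to stars; verify against ClinVar's docs.
REVIEW_STATUS_STARS = {
    "practice guideline": 4,
    "reviewed by expert panel": 3,
    "criteria provided, multiple submitters, no conflicts": 2,
    "criteria provided, conflicting interpretations": 1,
    "criteria provided, single submitter": 1,
    "no assertion criteria provided": 0,
}


def stars(review_status):
    """Return the star count for a review status; unknown statuses count as zero."""
    return REVIEW_STATUS_STARS.get(review_status.strip().lower(), 0)


def well_reviewed(records, minimum_stars=3):
    """Keep only records whose review status earns at least `minimum_stars`."""
    return [r for r in records if stars(r["review_status"]) >= minimum_stars]


# Hypothetical records for illustration only.
records = [
    {"variant": "variant_1", "interpretation": "Pathogenic",
     "review_status": "reviewed by expert panel"},
    {"variant": "variant_2", "interpretation": "Likely pathogenic",
     "review_status": "criteria provided, single submitter"},
    {"variant": "variant_3", "interpretation": "Benign",
     "review_status": "no assertion criteria provided"},
]

trusted = [r["variant"] for r in well_reviewed(records)]
print(trusted)
```

Lowering `minimum_stars` to 1 would admit single-submitter interpretations, which is where most of the database's content sits.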
Getting started: To learn how to take full advantage of ClinVar, check out a recent detailed primer geared for users (Curr Protoc Hum Genet, doi:10.1002/0471142905.hg0816s89, 2016). A YouTube video explains the different search options. Because the usefulness of the tool relies on submissions, Rehm encourages labs to share data. A submission wizard tool can be found on ClinVar’s site.
Considerations: Although ClinVar aims to be everything you need, it isn’t just yet. That’s mainly because the database relies on voluntary submissions. “We’re still trying to convince all of the journals to require [sharing on] ClinVar as part of the publication,” Rehm says. She and her colleagues are working to mandate submission from clinical laboratories as well.
In the meantime, Rehm consults a variety of sources to research the potential clinical significance of candidate variants identified in patients’ genetic data. These include the Human Gene Mutation Database (HGMD), which collects all of the variants that have been published in the literature. Although HGMD is poorly curated, “it’s at least helpful in trying to find the publications where your variant might be reported,” she says. She also still digs into disease-specific databases, some of them archival, for particular variants.
MATCHMAKER EXCHANGE
www.matchmakerexchange.org
Matchmaker Exchange is a network that connects the stand-alone databases focused on linking human genes and clinical features. Today the platform draws on three existing databases and will incorporate more in the future. In addition to collating information on gene–disease links, Matchmaker Exchange aims to join together researchers who are working on rare disease cases to share information and potentially collaborate. The ultimate goal is to help researchers build a more solid case for a causal gene and publish on it, because many rare disease gene candidates languish in the lab unpublished, notes Rehm, who led the development of the platform. The October 2015 issue of Human Mutation details the platform’s capabilities and utility (36:915-1019).
How it works: Choose among Matchmaker Exchange’s founding members:
GeneMatcher. Create an entry for the gene you’re interested in. If two people create an entry with the same gene name, the database (which is otherwise not searchable) will send them both an email. As of May, 4,459 genes had been submitted by 1,675 users from 55 countries. More than 5,200 matches have been made on some 1,200 genes. “This brings together people, knowing they can talk more in detail about the features of the patient [and] about variants of the gene they are studying,” says GeneMatcher codeveloper Nara Sobreira, an assistant professor of pediatrics at Johns Hopkins Medicine in Baltimore, Maryland. Patients can also use the tool, as can researchers using animal models, she adds.
PhenomeCentral. This tool for clinicians and scientists is phenotype-focused, allowing users investigating rare and unnamed disorders to be paired based on Human Phenotype Ontology vocabulary. It also incorporates the Exomiser software package to filter and prioritize genes for matching with other cases.
DECIPHER. This web-based database pulls together a variety of bioinformatics tools to help clinicians interpret variants and to pair up cases on the basis of shared variant and clinical data. There are several ways to make matches in the database. For example, nonregistered users can search DECIPHER’s open-access patient records covering 56,000 phenotypes, 1,200 sequence variants, and 28,000 copy number variants and contact the data submitter. Or you can set up your own project to share data from consenting patients and match with other researchers.
More databases are coming soon, including patient portals such as PEER, PatientKind, and ClinGen’s GenomeConnect, as well as Monarch Initiative’s model organism–focused database.
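The gene-based matching described for GeneMatcher above boils down to a registry that stays silent until two different people name the same gene. The Python sketch below illustrates that idea only; it is not GeneMatcher's code. The class name, the email addresses, the gene symbols, and the choice to compare symbols case-insensitively are all assumptions.

```python
from collections import defaultdict


class GeneMatchRegistry:
    """Toy registry: entries are private, and a match is the only thing revealed."""

    def __init__(self):
        self._submitters = defaultdict(list)  # gene symbol -> submitter emails

    def submit(self, email, gene):
        """Record an entry and return the emails to notify (empty if no match yet)."""
        key = gene.strip().upper()  # assumed normalization of the gene symbol
        earlier = [e for e in self._submitters[key] if e != email]
        if email not in self._submitters[key]:
            self._submitters[key].append(email)
        # Notify everyone sharing the gene, including the new submitter.
        return earlier + [email] if earlier else []


registry = GeneMatchRegistry()
print(registry.submit("lab_a@example.org", "GENE1"))  # first entry: no match
print(registry.submit("lab_b@example.org", "gene1"))  # same gene: both notified
print(registry.submit("lab_c@example.org", "GENE2"))  # different gene: no match
```

Because nothing can be searched, a submitter learns about another case only when both parties have independently flagged the same gene.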
Getting started: You need to create a single log-in to any one of the three databases that are currently part of the network. Pick the one that best fits the data you have and the questions you want to ask. Creating an entry takes roughly 10 minutes, and you can elect which other databases you want to run your query against, Buske says.
Considerations: An ideal time to use Matchmaker Exchange is fairly late in the process, when you have a candidate gene or two for an individual or family that you are fairly certain is causal, but you want to find that second family to strengthen your case.
“That’s important because if everyone put seven or eight variants in Matchmaker Exchange, the chance that you hit some other person that has a case is substantial,” says Christopher Cassa, a geneticist at Brigham and Women’s Hospital in Boston. More hits bring more false positives, though. Cassa and his group have developed a tool called Rare Disease Match, which uses data from the Exome Aggregation Consortium to help predict the probability of observing a gene based on chance (Hum Mutat, 36:998-1003, 2015). They hope to integrate this tool with Matchmaker Exchange.
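A back-of-the-envelope calculation shows why longer candidate lists invite coincidental matches. The Python sketch below is not the Rare Disease Match method; it simply assumes that each unrelated case has the same small, independent probability of carrying a qualifying rare variant in any given gene, and asks how likely at least one chance match becomes as cases and candidate genes pile up. The function name and the numbers are hypothetical.

```python
def chance_match_probability(p_gene, n_cases, n_candidate_genes=1):
    """Probability of at least one coincidental match, assuming independence.

    p_gene: assumed chance that an unrelated case has a qualifying variant in a gene
    n_cases: number of other cases in the database
    n_candidate_genes: number of candidate genes submitted for the patient
    """
    if not 0.0 <= p_gene <= 1.0:
        raise ValueError("p_gene must be between 0 and 1")
    return 1.0 - (1.0 - p_gene) ** (n_cases * n_candidate_genes)


# Hypothetical inputs: a 1-in-10,000 chance per gene per case, 1,000 other cases.
one_gene = chance_match_probability(0.0001, 1000, n_candidate_genes=1)
eight_genes = chance_match_probability(0.0001, 1000, n_candidate_genes=8)
print(f"1 candidate gene:  {one_gene:.3f}")
print(f"8 candidate genes: {eight_genes:.3f}")
```

Under these assumptions, submitting eight candidates instead of one raises the odds of a spurious hit several-fold, which is the concern Cassa describes.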
Despite the challenge of false positives, Matchmaker Exchange stakeholders hope the tool can eventually be useful earlier in the workflow, Buske says, where, for example, matches can be made with less-complete data. The platform also aims to support whole-exome searching rather than single variants or genes of interest, though there are more privacy issues to iron out.
GENERAL TIPS
Whole-genome vs. whole-exome? Whole-exome sequencing is almost always the better choice, because it costs much less. Annotation of variants in noncoding regions of the genome is still not up to scratch, so most of those data are still unusable, says the University of Leuven’s Adrian Liston. Still, a head-to-head comparison of the approaches published last year found that whole-genome sequencing is more powerful for detecting candidates for rare diseases (PNAS, 112:5473-78, 2015).
Use reference populations as similar as possible to your patient. Large reference sets like the 1000 Genomes Project may not serve as adequate filters for your case or cohort if your population is not represented in these databases (e.g., if you are studying a relatively isolated village that is not sampled in one of these big projects), so you’ll have to start building your own reference database or find someone else who has one.
Analyze families whenever possible. Sequence the exomes of parents and, ideally, a healthy sibling. “This is essential if you’re going to look for de novo mutations,” Liston says.
Read and read more. Bioinformatics tools should be used to help you narrow down the candidates to a list that you can manage by reading papers on those genes.
Prioritization is a team effort. Taking a short list of candidates down from 10 to just 1 or 2 is a subjective decision-making process. Assemble a team of experts, including those well versed in a particular disease process, to help you decide which candidates to take further. A team like this should meet multiple times throughout the process, making decisions about, for example, when to sequence a particular variant (Sanger validation) in additional family members, says Christopher Cassa of Brigham and Women’s Hospital in Boston.
Build a case for publication. When he’s reviewing papers, Liston looks for a second, independent family that is affected by the same variant and gene as the original patient. If you don’t have one, you’ll need to pursue other supporting evidence, such as from cell lines or animal models.
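The advice to sequence parents alongside the patient can be made concrete with a small trio comparison: a de novo candidate is a variant the child carries that neither parent does. The Python sketch below is illustrative only. The variant identifiers are placeholders, genotypes are given as counts of the variant allele (0, 1, or 2), and a variant missing from a parent's data is assumed to be absent, which in practice holds only if that position was adequately covered by sequencing.

```python
def de_novo_candidates(child, mother, father):
    """Return variants present in the child but absent from both parents.

    Each argument maps a variant identifier to the number of copies of the
    variant allele (0, 1, or 2). Missing entries are treated as 0 copies.
    """
    return sorted(
        variant
        for variant, alt_count in child.items()
        if alt_count > 0
        and mother.get(variant, 0) == 0
        and father.get(variant, 0) == 0
    )


# Hypothetical trio data with placeholder variant identifiers.
child = {"var_1": 1, "var_2": 1, "var_3": 2}
mother = {"var_1": 1, "var_3": 1}
father = {"var_3": 1}

candidates = de_novo_candidates(child, mother, father)
print(candidates)
```

Without the parents' data, all three of the child's variants would remain on the list; with it, only one survives as a de novo candidate.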