One of the lesser-known differences between people is that some individuals have bigger or smaller proteins than others. That’s because the genes that code for these proteins have repeating regions of DNA that can occur different numbers of times. And each repeat adds an extra string of amino acids to the protein.
The most comprehensive analysis to date of these genetic stutters, called variable number tandem repeats (VNTRs), now shows they exert a strong influence on traits such as height and baldness. The findings, published today (September 23) in Science, could help to explain some of what is known as missing heritability: the observation that known genetic variations in humans cannot account for much of the heritability of diseases, behaviors, and other phenotypes.
“If you have a short version of the gene, you may have 1,000 amino acids in your protein, and if you have a long version of this gene, you may have 2,000,” says Ronen Mukamel, a geneticist at the Broad Institute in Cambridge, Massachusetts, and coauthor on the study. “Literally the protein is bigger or smaller depending on whether that gene is bigger or smaller, if its repetitive element is longer or shorter.”
Geneticists have known about the possible impact of VNTRs for years, but these regions are harder to analyze directly than simpler sources of genetic variation such as single nucleotide polymorphisms (SNPs). One reason is that high-throughput genome sequencing techniques typically break DNA strands down into small fragments and then piece the resulting sequences back together. Repetitive elements like VNTRs muddle this assembly process.
To get around this problem, Mukamel and his colleagues used a statistical technique to indirectly estimate the size of VNTRs from existing DNA sequencing and SNP data.
That’s a good way forward, says Keeley Brookes, a geneticist at Nottingham Trent University in the UK who was not involved in the project. A lack of high-throughput techniques that can detect VNTRs has held back their study until now, she says, adding, “I’ve moved away from VNTR research and into SNPs, just because the technology is there to look for them on a genome-wide scale.”
Mukamel and his colleagues used their technique to analyze the effect of VNTRs in 118 protein-coding genes in more than 400,000 participants in the UK Biobank project, a database containing detailed genetic and health information. They then checked for associations between the length of these VNTRs and 786 different phenotypes.
“We found that at a handful of genes, the size of the protein controlled by one of these repetitive regions of the genome is strongly linked to a human phenotype,” Mukamel says. In several instances, he says, the influence of the VNTR was larger than that of any other known genetic variant.
Altogether, the researchers found strong links between 19 phenotypes and five distinct VNTRs. Potentially health-relevant characteristics affected by VNTRs included elevated lipoprotein(a) levels, a major risk factor for coronary artery disease, and several traits associated with kidney function, including gout and increased levels of serum urea.
Height was one of the clearest signals. Variation in the length of a VNTR in the gene ACAN, which codes for the protein aggrecan, was associated with an average height difference of 3.2 cm between people, the study showed. Longer VNTRs discovered in the study more than double the size of an aggrecan domain, which is known from previous studies to enable chemical modifications that help the extracellular matrix hold water. But the researchers say it’s not clear how this variation in water-holding capacity might affect height.
Bjarni Halldorsson, a geneticist at deCODE in Iceland, worked on a study published earlier this year that used a different technique, called long-read sequencing, to identify a large effect of VNTRs in genes such as ACAN on phenotypes including height. Such discoveries could add to the range of genetic tests available, he says, which seek to assess the likelihood that someone could develop clinical and nonclinical traits, including height. “Already, there are these companies that have these genetic tests. They’re probably not testing for this variant, but in theory it’s not difficult.”
The ability to detect VNTR length is already useful in the clinic, Mukamel says, because it can be used to stratify patients recruited for clinical trials by their genetic risk of high lipoprotein(a).