About a month before a New York Academy of Sciences (NYAS) meeting last February, six of the scheduled speakers received an unusual homework assignment. Each was asked to apply the sequences of two animal microRNAs to an algorithm that his team had developed for predicting messenger RNA binding targets.
The assignment was aimed at making the NYAS meeting, which focused on such algorithms, "a bit more interesting," says organizer Thomas Tuschl, who heads Rockefeller University's laboratory of RNA molecular biology. But a side-by-side comparison of the bioinformatic strategies was also timely. Since 2001, researchers have combed through animal, plant, and viral genomes to ferret out hundreds of genes that encode microRNAs.1 Approximately 22 nucleotides long, these RNA snippets cause translational repression in animals after they bind to target sites in mRNAs' 3' untranslated regions (UTRs). During the past year, about half a dozen groups have reported on algorithms that generate candidate binding sites in vertebrates.2–6
The impetus behind these bioinformatic studies is obvious. In humans, microRNAs are now thought to influence as many as 30% of all genes.4 "You need to know the targets to understand what the microRNAs are doing," says Nikolaus Rajewsky, an assistant professor of biology and mathematics at New York University who coorganized the NYAS meeting.
At the end of the meeting, Rajewsky announced the homework assignment's results to an overflowing audience. For each microRNA, the speakers' algorithms had generated datasets that ranged from one target to hundreds. And between most of the datasets, "there was virtually no overlap, which was really amazing," recalls Frank J. Slack, a Yale University associate professor of molecular, cellular, and developmental biology who spoke at the meeting but did not participate in the target-generating exercise.
Rajewsky stresses that the event was not a competition. Because the targets were not validated, one cannot conclude that "any algorithm is better than any other one," he says. But validation – or experimental support, as some researchers prefer to call it – is clearly needed to make the bioinformatic analyses more than exercises in academic ingenuity. Some investigators have accordingly begun to examine sizeable numbers of microRNA–mRNA interactions in mammalian cell cultures, and less wide-ranging animal studies are also under way.
"It's going to take a couple of years for things to shake down on what the real biological targets [of microRNAs] are," predicts Debora S. Marks, a systems biology researcher at Harvard Medical School who has collaborated with Chris Sander, of Memorial Sloan-Kettering Cancer Center in New York, to compute more than 4,000 mammalian targets. Meanwhile, microRNAs' significance is increasingly apparent.1 Slack recently showed, for example, that the let-7 microRNA targets the
Animal microRNAs bind to 3' UTRs partly through Watson-Crick base pairing. Target identification is particularly amenable to bioinformatics, says Rajewsky, because "a lot of the target recognition is based on primary sequence, and therefore it is, at least to some extent, computable." He contrasts the task with the more empirical challenge of locating the DNA sites to which transcription factors bind. David P. Bartel, a biology professor at the Whitehead Institute for Biomedical Research and the Massachusetts Institute of Technology, even concludes that, for microRNAs, the informatic approach has been "much more convincing than the experimental. And that sounds funny for an experimentalist to say."
The basic computational problem is that microRNAs usually don't bind to long complementary stretches in animals, as they tend to do in plants; loop-outs and non-Watson-Crick base-pairing occur. Each bioinformatics team grapples with this difficulty in a different way, which helps explain the lack of overlap between the homework answers presented at the NYAS meeting. Algorithms diverge in their assumptions about issues such as which nucleotides are critical to binding, and in the weight that they accord to factors such as binding energy.
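To make the pairing problem concrete, here is a minimal Python sketch that lines a microRNA up against a candidate site and labels each position as a Watson-Crick pair, a G-U wobble pair, or a mismatch. It is an illustration only, not any group's algorithm: the function name and both sequences are hypothetical, and the sketch assumes an ungapped duplex, so it cannot represent the loop-outs that real interactions contain.

```python
# Illustrative only: ungapped antiparallel pairing, no loop-outs modeled.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_pairs(mirna, site):
    """Label each microRNA position against an equally long target site.

    Both sequences are written 5'->3'. Because the strands pair
    antiparallel, the site is read in reverse. The returned string uses
    '|' for Watson-Crick, ':' for G-U wobble, and ' ' for a mismatch.
    """
    if len(mirna) != len(site):
        raise ValueError("this sketch only handles equal-length, ungapped duplexes")
    labels = []
    for m, s in zip(mirna.upper(), reversed(site.upper())):
        if (m, s) in WATSON_CRICK:
            labels.append("|")
        elif (m, s) in WOBBLE:
            labels.append(":")
        else:
            labels.append(" ")
    return "".join(labels)


# Hypothetical sequences chosen to show all three outcomes.
pattern = classify_pairs("GAGGUAGU", "AAUAUCUC")
print(repr(pattern))
```

An algorithm that insists on perfect pairing would reject this toy site at the fourth position, while one that tolerates G-U pairs would reject it only at the seventh; differences of this kind are one reason prediction lists can diverge.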
The latest version of the TargetScan algorithm, devised by Bartel, MIT associate professor of biology Christopher B. Burge, and graduate student Benjamin P. Lewis, relies heavily on perfect Watson-Crick pairing between the UTR and the microRNA's 5' "seed" region containing the six nucleotides at positions 2–7. Burge acknowledges that the 5' emphasis, though supported by some experimental evidence, is controversial. He asserts that in mammals, a microRNA's 3' end is essential to target-binding in relatively few cases. Slack is skeptical, noting a well-established class of microRNAs in lower animals that requires strong 3' pairing but can tolerate imperfect seed matches. He also contends, "There's no experimental evidence really in mammals yet" that the seed-only class of interactions is "biologically relevant."
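The seed rule described above can be sketched in a few lines of Python. This is not the TargetScan code, which also weighs conservation and other factors; it only shows what "perfect Watson-Crick pairing to positions 2–7" means as a string search. The function names and the toy UTR are hypothetical; the microRNA is the published let-7a sequence.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna, utr, start=2, end=7):
    """Find perfect seed matches in a UTR.

    start and end are 1-based microRNA positions, counted from the 5' end.
    Returns the seed, the UTR motif that pairs with it, and the 0-based
    UTR offsets where that motif occurs.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    seed = mirna[start - 1:end]
    motif = reverse_complement(seed)
    hits = []
    pos = utr.find(motif)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(motif, pos + 1)
    return seed, motif, hits


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"       # human let-7a
TOY_UTR = "AAGCUACCUCAGGAUUUACCUCAA"   # made-up UTR fragment for illustration

print(seed_sites(LET7A, TOY_UTR))
```

A six-nucleotide motif turns up often by chance in long UTRs, which is why a seed match alone is treated as a candidate rather than a validated target.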
Using their seed-weighted algorithm and monitoring conservation among five genomes, Bartel, Burge, and Lewis have designated thousands of human genes as candidate microRNA targets.4 "Our statement isn't that gene X is regulated by microRNA Y 100%," notes Burge. "We predict a set of microRNA-target interactions, and that 80% or so of those are correct."
Researchers estimate such probabilities by calculating an algorithm's false-positive rate. In TargetScan's case, for each mRNA sequence matching a microRNA seed, the MIT team found at least five hexamers that were similarly abundant in the human UTR dataset but that lacked a known or suspected function; hexamers that matched other microRNA seeds were thus excluded from the controls. These control sequences were then fed into the algorithm to generate target matches. This procedure, Burge explains, "gives us an estimate of the amount of conservation that you would see in the absence of any connection to microRNAs." In other words, the false-positive rate indicates a background level of hexamer conservation. The MIT team estimated that 5,300 microRNA targets exceeded that level.4
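The arithmetic behind such an estimate can be sketched as follows. This is a simplified illustration, not the MIT team's procedure: the function name is hypothetical, the counts are invented, and the sketch assumes the background is simply the mean number of conserved matches among the control hexamers.

```python
from statistics import mean


def background_estimate(real_hits, control_hits):
    """Compare conserved matches for a real seed with its control hexamers.

    real_hits: number of conserved matches found for the microRNA seed.
    control_hits: conserved-match counts for the matched control hexamers.
    """
    if real_hits <= 0:
        raise ValueError("real_hits must be positive")
    background = mean(control_hits)
    # Share of the real seed's matches that chance conservation could explain.
    false_positive_fraction = min(background / real_hits, 1.0)
    # Matches in excess of the background level.
    excess = max(real_hits - background, 0)
    return {
        "background": background,
        "false_positive_fraction": false_positive_fraction,
        "excess": excess,
    }


# Invented numbers for illustration only.
result = background_estimate(real_hits=100, control_hits=[18, 22, 20, 19, 21])
print(result)
```

With these made-up counts, the controls average 20 conserved matches against 100 for the real seed, so roughly one prediction in five would be attributed to background, the same kind of reasoning behind Burge's "80% or so" figure.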
Figure (courtesy of Nikolaus Rajewsky): When vertebrate microRNAs bind to their mRNA targets, inexact complementarity results in loop-outs and G-U base pairing. Such features characterize the hypothesized coupling between the microRNA miR-375 and the 3' untranslated region (UTR) of the myotrophin gene, an interaction suspected of regulating insulin secretion (M.N. Poy et al.,
Rajewsky considers his PicTar algorithm similar to the latest version of TargetScan. He says that if he feeds a particular microRNA into the two programs, the output overlaps about 90%. But he also cites distinctive qualities of his algorithm: It places some emphasis on the entire duplex, not just the seed region; allows for very limited non-base-pairing in the seed region; and uses a hidden Markov model, a statistical method, to model how coexpressed microRNAs can compete for binding within the same UTR. This past month, Rajewsky, graduate students Azra Krek and Dominic Grün, and colleagues reported that each vertebrate microRNA targets, on average, roughly 200 transcripts.6
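One simple way to put a number on how much two prediction lists agree is the Jaccard index, the size of the intersection divided by the size of the union. The article does not say which measure Rajewsky or the meeting's organizers used, so the Python sketch below is only one plausible choice; the algorithm labels and gene names are placeholders.

```python
def jaccard(a, b):
    """Jaccard index of two collections of target identifiers."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def pairwise_overlap(predictions):
    """Jaccard index for every pair of algorithms in a name -> targets mapping."""
    names = sorted(predictions)
    return {
        (x, y): jaccard(predictions[x], predictions[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Placeholder predictions for one microRNA from three hypothetical algorithms.
predictions = {
    "algo_A": {"GENE1", "GENE2", "GENE3", "GENE4"},
    "algo_B": {"GENE2", "GENE3", "GENE4", "GENE5"},
    "algo_C": {"GENE9"},
}

overlaps = pairwise_overlap(predictions)
for pair, score in overlaps.items():
    print(pair, round(score, 2))
```

Because the index penalizes lists of very different sizes, a one-target list and a hundreds-of-targets list will score low even if the single target is shared.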
Like PicTar, the Marks/Sander model, dubbed miRanda, can detect what Marks refers to as combinatorics (many microRNAs per UTR) and multiplicity (many targets per microRNA).3 Besides sequence complementarity, input into miRanda includes the binding energy of the putative microRNA–target complex. Marks explains that the binding-energy component involves more than a Gibbs free-energy (ΔG) calculation because the microRNA isn't really free; the protein argonaute grips it in a structure clarified by recent crystallization experiments. "Now that we understand a little bit more how the microRNA is held, we can incorporate that into energy calculations," she says. Other information that she plans to incorporate into miRanda includes UTRs' three-dimensional structures, and the tissues and cell types that express specific microRNAs.
Like miRanda, all of the algorithms are continually evolving as microRNA knowledge advances, more genomes are sequenced, and genome annotation improves. As a result, the targets listed in a group's latest publication or on its Web site can differ substantially from those presented earlier. "We're certainly not embarrassed about that; we're proud of it," states Marks, declaring that the changes reflect improvements in her algorithm.
Some target-seeking approaches, however, haven't had time to undergo this public metamorphosis. One such method just appeared in
A pattern-discovery algorithm from IBM's Thomas J. Watson Research Center in Yorktown Heights, NY, also finds candidate UTR binding sites without requiring prior input on which particular microRNAs might target those sites. Isidore Rigoutsos, manager of the center's bioinformatics group, developed the strategy, named "rna22," with associate Kevin Miranda and described it at the NYAS meeting. A paper on rna22 is under review, Rigoutsos says.
The algorithm detects patterns shared by already-known animal and plant microRNAs and their precursors. After using these patterns to discover new microRNAs, it seeks their reverse complementary sequences to predict a whopping 100,000-plus target sites in humans. Then it relies on binding energies to guess which microRNAs and target sites might form complexes.
Figure: TINY TUMOR SUPPRESSOR? (© 2005 Elsevier) In cultured human liver cells, the let-7 microRNA down-regulates translation of the
Rigoutsos recounts that he recently supplied rna22 with a comprehensive set of the microRNAs known as of January 2004, and the algorithm was able to identify 625 of the 701 new microRNAs that scientists reported over the next 11 months. But its success at target prediction – and indeed the success of all the other algorithms – is impossible to gauge in the absence of large-scale target validation.
Observing that higher animals are generally not amenable to high-throughput approaches, microRNA researchers say that few targets have been tested in vertebrates. In an attempt to verify hundreds of suspected microRNA-target interactions, Tuschl and postdoc Markus Landthaler have adopted a laborious strategy used by several other labs. Landthaler joins a candidate UTR to a luciferase reporter gene and transfects the construct into human cells whose microRNA populations are already characterized.
Transfection, of course, comes with several caveats: Effects might occur indirectly, not through hypothesized microRNA-target interactions. Alternatively, putative targets might accumulate in transfected cells at nonphysiological concentrations. Bartel notes that a positive finding – microRNA repression of luciferase activity – is vulnerable to the criticism, "How do you know that, in the body, the microRNA and the mRNA are ever in the same cell?" And Tuschl acknowledges that a negative finding would not prove that binding is impossible but might merely show that "you need additional factors that are not present in the cell where you're testing the interaction."
Consequently, Tuschl says he expects to find that "something that's a microRNA target in one cell type will not be a microRNA target in another cell type" that lacks the requisite factors. Another complication is the possibility that several different microRNAs, whose presence varies among different cell lines, might need to bind simultaneously to a particular UTR to inhibit luciferase activity.
A new paper suggests another conceivable way of validating microRNA targets on a large scale. After transfecting either of two microRNAs into human HeLa cells, investigators detected lower levels of about 135 mRNAs on microarrays.9 These mRNAs were presumably the microRNAs' targets. But before microarray analysis can become a touchstone of microRNA-target interactions, two issues must be addressed: How widespread is microRNA-mediated downregulation of mRNAs? And how robust is the effect?
Other validating efforts focus on one microRNA or target at a time. Bartel says his lab is applying a technique to animals that he and others have used to create dramatic developmental phenotypes in plants. The idea is to generate transgenic organisms in which a suspected microRNA-binding site bears subtle mutations that disrupt seed pairing.
Meanwhile, Slack's work demonstrates how a particular microRNA-target interaction can be validated on several levels.7 After a computational scan of the
Even as the validity of most putative targets remains unproven, their numbers should continue to expand. One reason, according to researchers, is that some human microRNAs have not been discovered yet, so their targets are similarly unknown. Investigators also note that they have yet to look carefully for potential targets that lie in open reading frames or are species-specific. Says Bartel, "It's soon going to be the case where it's going to be more unusual not to be a microRNA target than to be a microRNA target."