The Human Genome: RNA Machine


Oct 1, 2007
John S. Mattick


Contrary to current dogma, most of the genome may be functional.


The idea of "junk DNA" is also based on the assumption that most genetic information is transacted via proteins, an assumption that dates back half a century to a time when the pioneers of molecular biology were studying bacteria, wherein most genes do indeed encode proteins. By contrast, protein coding sequences occupy only ~1.2% of the human genome.

John S Mattick is at the Institute for Molecular Bioscience, University of Queensland, Australia.

When introns were discovered 30 years ago, it was immediately and universally assumed that these vast tracts of nonprotein-coding sequences within genes are nonfunctional, despite the fact that they are transcribed. Their presence was rationalized as the leftovers of the early evolution of genes.2 At the same time, the finding that much of the mammalian genome (45% in humans) is derived from transposons, which are thought to be mainly parasitic hitchhikers, led to the related concept of "selfish DNA."3,4 This reinforced the increasingly conventional view that the genomes of complex eukaryotes largely comprise accumulated evolutionary debris.

The conclusion that most of the human genome is nonfunctional was further reinforced by estimations of the fraction of the genome that is conserved over evolutionary time. Human-mouse and later human-dog sequence comparisons indicated that only 5% (3%-8% depending on the parameters used) shows evidence of purifying (negative) selection.5,6 The remainder appears to have evolved "neutrally," that is, without obvious constraint, implying that it lacks significant function.

However, some big elephants have entered the room. First, it is clear that the number of protein-coding genes does not correlate with relative complexity. For example, simple nematode worms with only 1,000 cells have almost as many protein-coding genes (~19,300) as humans and other vertebrates (~20,000).1 Therefore there must be a great deal of important additional information, presumably mainly regulatory information, that lies outside of the exonic boundaries of conventional protein-coding genes.

Second, recent studies have shown that most of the genome of mammals and other complex organisms is transcribed in very complex patterns, mainly into nonprotein-coding RNAs (ncRNAs).7-13 The expression of these ncRNAs appears to be developmentally regulated, indicating that it serves a purpose.14 However, how can this vast amount of nonprotein-coding transcription be meaningful if only 5% is under evolutionary selection?

This contradiction was highlighted by the recent ENCODE project, whose objective was to identify all the functional elements in 1% of the human genome. The project's findings, released last spring, concluded that only around 5% of the sequences are evolutionarily constrained,15 suggesting that the remainder has evolved randomly over time and is unimportant, in keeping with orthodox thinking of genes and genetic function. That same project also confirmed that the majority of the genome is transcribed - as much as 93% in different cells - and that "surprisingly, many functional elements are seemingly unconstrained across mammalian evolution."16

Decades-old expectations that the human genome would encode hundreds of thousands if not millions of genes may turn out to be true after all.

In an attempt to reconcile these competing findings, the ENCODE investigators, a consortium of 313 scientists that included me, came to the consensus conclusion,16 although I amicably disagreed,17 that there must be "a large pool of neutral elements that are biochemically active but provide no specific benefit to the organism."16 This apparent contradiction can be easily resolved, however, if the neutral rate of primary sequence evolution, and therefore the fraction of the genome as a whole that is under constraint, has been wrongly estimated.18

Most estimates of the neutral rate of evolution are based on the rate of divergence of ancient transposon-derived sequences,5 often pejoratively referred to as repeats,18 and therefore may not provide a representative index of the rate of neutral evolution, as is implicitly assumed.

Second, increasing evidence suggests that transposon-derived sequences have acquired function and are therefore subject to some degree of evolutionary selection.18,20 Indeed, the more ancient a sequence the more likely it is to have either become functional or been deleted, and therefore the less likely that the extant population is evolving neutrally.18 Both problems will lead to an underestimate (of unknown magnitude) of the true rate of neutral evolution and therefore of the extent of the genome under selection. Thus, it is likely that the 5% figure is based on incorrect assumptions and represents only the most highly conserved sequences, rather than a reliable estimate of how much of the genome is actually under functional constraint.18
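The direction of this bias can be illustrated with a toy calculation. The sketch below is not the method used in the cited genome comparisons; it assumes a hypothetical two-class genome in which a fraction f of sites is constrained and the remainder evolves at the neutral rate, so that the genome-wide mean divergence is D = f * r_c + (1 - f) * r_n. The function name `constrained_fraction` and every number are illustrative assumptions, chosen only so that the baseline case returns 5%; none of them is a measured value.

```python
def constrained_fraction(mean_divergence, neutral_rate, constrained_rate=0.0):
    """Solve D = f * r_c + (1 - f) * r_n for f, the constrained fraction.

    Toy two-class mixture model (an assumption made for illustration only).
    Rates are in substitutions per site.
    """
    if neutral_rate <= constrained_rate:
        raise ValueError("neutral_rate must exceed constrained_rate")
    f = (neutral_rate - mean_divergence) / (neutral_rate - constrained_rate)
    # Clamp to [0, 1] so that out-of-range inputs do not yield impossible fractions.
    return min(1.0, max(0.0, f))


if __name__ == "__main__":
    observed_divergence = 0.38  # hypothetical genome-wide mean divergence
    # Hypothetical neutral rates: the first plays the role of an
    # underestimate, the others of progressively higher "true" rates.
    for assumed_neutral_rate in (0.40, 0.45, 0.50):
        f = constrained_fraction(observed_divergence, assumed_neutral_rate)
        print(f"assumed neutral rate {assumed_neutral_rate:.2f} -> "
              f"constrained fraction {f:.1%}")
```

With these made-up inputs, raising the assumed neutral rate from 0.40 to 0.50 moves the inferred constrained fraction from about 5% to about 24%. The point is only qualitative: for the same observed divergence, a neutral rate that is set too low makes the constrained fraction look smaller than it is.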

These measures of conservation are also limited to functions common to all mammals and do not include adaptive changes in different lineages or sequences that are very plastic.5,21,22 It is clear that many functional sequences are evolving at different rates under quite different structure-function constraints than those of protein-coding sequences. For example, many promoter sequences and other gene-regulatory elements are known to be evolving quickly and to show a high rate of sequence turnover, even when their functions are conserved.14,34-36 including chromatin remodeling and epigenetic memory, transcription factor nuclear trafficking, and transcriptional activation or repression.14 If all these ncRNAs are functional, as the evidence increasingly suggests they may be, then much and perhaps most of the human genome is functional. If so, the genetic programming of the higher organisms has been fundamentally misunderstood for the past 50 years, because of the presumption - largely true in prokaryotes, but not in complex eukaryotes - that most genetic information is expressed as, and transacted by, proteins.

It now seems likely that the majority of the human genome and those of other complex organisms is devoted to a hidden RNA regulatory system.

References

1. R.J. Taft et al., "The relationship between non-protein-coding DNA and eukaryotic complexity," Bioessays, 29:288-99, 2007.
2. W. Gilbert et al., "On the antiquity of introns," Cell, 46:151-4, 1986.
3. W.F. Doolittle, C. Sapienza, "Selfish genes, the phenotype paradigm and genome evolution," Nature, 284:601-3, 1980.
4. L.E. Orgel, F.H. Crick, "Selfish DNA: the ultimate parasite," Nature, 284:604-7, 1980.
5. R.H. Waterston et al., "Initial sequencing and comparative analysis of the mouse genome," Nature, 420:520-62, 2002.
6. K. Lindblad-Toh et al., "Genome sequence, comparative analysis and haplotype structure of the domestic dog," Nature, 438:803-19, 2005.
7. P. Bertone et al., "Global identification of human transcribed sequences with genome tiling arrays," Science, 306:2242-6, 2004.
8. J. Cheng et al., "Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution," Science, 308:1149-54, 2005.
9. P. Kapranov et al., "Examples of the complex architecture of the human transcriptome revealed by RACE and high-density tiling arrays," Genome Res, 15:987-97, 2005.
10. P. Carninci et al., "The transcriptional landscape of the mammalian genome," Science, 309:1559-63, 2005.
11. T. Ravasi et al., "Experimental validation of the regulated expression of large numbers of non-coding RNAs from the mouse genome," Genome Res, 16:11-9, 2006.
12. J.R. Manak et al., "Biological function of unannotated transcription during the early development of Drosophila melanogaster," Nat Genet, 38:1151-8, 2006.
13. P. Kapranov et al., "RNA maps reveal new RNA classes and a possible function for pervasive transcription," Science, 316:1484-8, 2007.
14. J.S. Mattick, I.V. Makunin, "Non-coding RNA," Hum Mol Genet, 15:R17-29, 2006.
15. E.H. Margulies et al., "Analyses of deep mammalian sequence alignments and constraint predictions for 1% of the human genome," Genome Res, 17:760-74, 2007.
16. The ENCODE Project Consortium, "Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project," Nature, 447:799-816, 2007.
17. E. Check, "Genome project turns up evolutionary surprises," Nature, 447:760-1, 2007.
18. M. Pheasant, J.S. Mattick, "Raising the estimate of functional human sequences," Genome Res, 17:1245-53, 2007.
19. G.M. Cooper et al., "Distribution and intensity of constraint in mammalian genomic sequence," Genome Res, 15:901-13, 2005.
20. C.B. Lowe et al., "Thousands of human mobile element fragments undergo strong purifying selection near developmental genes," Proc Natl Acad Sci, 104:8005-10, 2007.
21. N.G. Smith et al., "Evidence for turnover of functional noncoding DNA in mammalian genome evolution," Genomics, 84:806-13, 2004.
22. M.S. Taylor et al., "Heterotachy in mammalian promoter evolution," PLoS Genet, 2:e30, 2006.
23. S. Fisher et al., "Conservation of RET regulatory function from human to zebrafish without sequence similarity," Science, 312:276-9, 2006.
24. M.C. Frith et al., "Evolutionary turnover of mammalian transcription start sites," Genome Res, 16:713-22, 2006.
25. K. Tsuritani et al., "Distinct class of putative 'non-conserved' promoters in humans: comparative studies of alternative promoters of human and mouse genes," Genome Res, 17:1005-14, 2007.
26. K.C. Pang et al., "Rapid evolution of noncoding RNAs: lack of conservation does not mean lack of function," Trends Genet, 22:1-5, 2006.
27. S. Asthana et al., "Widely distributed noncoding purifying selection in the human genome," Proc Natl Acad Sci, 104:12410-5, 2007.
28. J. Ponjavic et al., "Functionality or transcriptional noise? Evidence for selection within long noncoding RNAs," Genome Res, 17:556-65, 2007.
29. J.S. Mattick, I.V. Makunin, "Small regulatory RNAs in mammals," Hum Mol Genet, 14:R121-32, 2005.
30. I. Bentwich et al., "Identification of hundreds of conserved and nonconserved human microRNAs," Nat Genet, 37:766-70, 2005.
31. E. Berezikov et al., "Diversity of microRNAs in human and chimpanzee brain," Nat Genet, 38:1375-7, 2006.
32. E. Berezikov et al., "Many novel mammalian microRNA candidates identified by extensive cloning and RAKE analysis," Genome Res, 16:1289-98, 2006.
33. J.S. Mattick, "A new paradigm for developmental biology," J Exp Biol, 210:1526-47, 2007.
34. S. Kishore, S. Stamm, "The snoRNA HBII-52 regulates alternative splicing of the serotonin receptor 2C," Science, 311:230-2, 2006.
35. R. Louro et al., "Androgen responsive intronic non-coding RNAs," BMC Biol, 5:4, 2007.
36. H.I. Nakaya et al., "Genome mapping and expression analyses of human intronic noncoding RNAs reveal tissue-specific patterns and enrichment in genes related to regulation of transcription," Genome Biol, 8:R43, 2007.
37. E. Bernstein, C.D. Allis, "RNA meets chromatin," Genes Dev, 19:1635-55, 2005.
38. J.A. Goodrich, J.F. Kugel, "Non-coding-RNA regulators of RNA polymerase II transcription," Nat Rev Mol Cell Biol, 7:612-6, 2006.
39. J.L. Rinn et al., "Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs," Cell, 129:1311-23, 2007.
40. I. Martianov et al., "Repression of the human dihydrofolate reductase gene by a non-coding interfering transcript," Nature, 445:666-70, 2007.
41. D.P. Bartel, "MicroRNAs: genomics, biogenesis, mechanism, and function," Cell, 116:281-97, 2004.
42. N. Ishii et al., "Identification of a novel non-coding RNA, MIAT, that confers risk of myocardial infarction," J Hum Genet, 51:1087-99, 2006.
43. E.M. Reis et al., "Antisense intronic non-coding RNA levels correlate to the degree of tumor differentiation in prostate cancer," Oncogene, 23:6684-92, 2004.
44. M.F. Mehler, J.S. Mattick, "Noncoding RNAs and RNA editing in brain development, functional diversification, and neurological disease," Physiol Rev, 87:799-823, 2007.
45. L. Lewejohann et al., "Role of a neuronal small non-messenger RNA: behavioural alterations in BC1 RNA-deleted mice," Behav Brain Res, 154:273-89, 2004.
46. J.S. Mattick, M.J. Gagen, "The evolution of controlled multitasked gene networks: the role of introns and other noncoding RNAs in the development of complex organisms," Mol Biol Evol, 18:1611-30, 2001.
47. M. Kimura, "Evolutionary rate at the molecular level," Nature, 217:624-6, 1968.
48. A.P. Bird, "Gene number, noise reduction and biological complexity," Trends Genet, 11:94-100, 1995.