Soon after the surprising announcement that the human genome had far fewer genes than most had expected, researchers began to realize that a great many transcriptional start sites remained unexplored. Tom Gingeras' team at Affymetrix, along with Kevin Struhl of Harvard University and colleagues, confirmed some of these findings in a 2004 Hot Paper that mapped the binding sites of three transcription factors: Sp1, c-Myc, and p53. They combined chromatin immunoprecipitation with high-density oligonucleotide tiling arrays, which probe the genome uniformly rather than only at known promoters and so avoid bias toward promoter regions. Roughly 80% of the binding regions they found were not located at the 5' ends of protein-coding genes, and many of these binding sites were associated with the noncoding RNAs identified in earlier research.
Struhl says the results were initially controversial; Science originally rejected the paper. "I think everyone now agrees that the transcriptome is much more complex than people ever thought before," he says.
Several groups have built on this work, and Struhl and Gingeras have a follow-up paper (currently under review) examining the biological role of the noncoding transcripts. Gingeras says some of these regions are actually found within coding transcripts that are expressed only in certain cell types or at specific times during development. "That opens up a second question," he adds. "What is the regulatory mechanism which the genome uses to identify these sites in a very specific time or a very specific cell type?"