Gene sequencing technology has advanced in leaps and bounds in the past couple of decades. But as genomicists and others involved in research projects that generate reams of DNA, RNA, or proteomic data know well, storing and analyzing all that information is rapidly becoming an intractable problem.
A recent article in The New York Times highlights the difficulty, quoting leading researchers frustrated that data-handling technology has not kept pace with innovation in sequencing. "Data handling is now the bottleneck," David Haussler, director of the Center for Biomolecular Science and Engineering at the University of California, Santa Cruz, told the Times. "It costs more to analyze a genome than to sequence a genome."
Indeed, though the price of sequencing an entire human genome is expected to decrease to the long-anticipated $1,000 mark in the next couple of years, that cost is dwarfed by the mounting expenses of storing and analyzing genomic data.
And the data deluge (which The Scientist covered in its October issue) may also cause the shuttering of federal repositories designed to store the information. The amount of data stored in one such database has more than tripled in the past year alone, according to the Times article. The database is now bulging at the seams with 300 trillion nitrogenous bases occupying almost 700 trillion bytes of computer memory.
"We have these giant piles of data and no way to connect them," Steven Wiley, a biologist at the Pacific Northwest National Laboratory, told the Times. "I'm sitting in front of a pile of data that we’ve been trying to analyze for the last year and a half."