When it comes to big data, genomics may soon take the lead in requiring the most storage space, according to a report published in PLOS Biology this week (July 7). Researchers at the University of Illinois at Urbana-Champaign and Cold Spring Harbor Laboratory (CSHL) in New York charted the number of genomes sequenced over the last 15 years and estimated that, based on the current doubling time of seven months, 2–40 × 10^18 bytes (2 to 40 exabytes) of genomic data will be generated by the year 2025.
“Big data scientists in astronomy and particle physics thought genomics had a trivial amount of data. But we’re catching up and probably going to surpass them,” study coauthor Michael Schatz, a computational biologist at CSHL, told The Washington Post.
The researchers estimated that up to 2 billion human genomes may be sequenced by the year 2025, and that the resulting data could easily exceed the annual storage needs of Twitter or YouTube.
“The world has a limited capacity for data collection and analysis, and it should be used well,” Narayan Desai, a computer scientist at Ericsson who was not involved in the study, told Nature. “Because of the accessibility of sequencing, the explosive growth of the community has occurred in a largely decentralized fashion, which can’t easily address questions like this.”