As next-generation sequencing gets ever cheaper and higher-throughput, data file sizes continue to surge, creating pressing new needs for scientists. It's not enough to acquire big data on their own machines; researchers must also be able to store it, move it, and analyze it, and they often want to share it. Large collaborations complicate every one of these steps. As a result, many researchers have resorted to planning their workflows around a single site for analyses: it's that, or physically shipping hard drives.
Not only are data files growing in size and number, especially in fields amassing sequence data, but data handling in genomics, epidemiology, and other disciplines has become unwieldy in other ways. Copying thousands of files, or sharing them with collaborators, has become laborious, and as analysis options proliferate, choosing the right tools for the job can take some guesswork. Figuring out how to make data easy to handle and process is a major challenge for life scientists, according to Stan Ahalt, director of the Renaissance Computing Institute and a professor of computer science at the University of North Carolina at Chapel Hill. "The other challenge is learning how to utilize other people's data to accelerate their own lab's science," he says.
Data demands in ...