Raw Data’s Vanishing Act

As scientific publications age, the data that undergird them are disappearing at an alarming rate.

December 23, 2013

Raw data sets may be unavailable for nearly 80 percent of studies published 20 years ago, according to a new report that surveyed the authors of more than 500 ecology papers published between 1991 and 2011. The findings, which appeared last week (December 19) in Current Biology, also indicated that authors of older studies are not necessarily easy to track down. Lead author Timothy Vines, from the University of British Columbia in Vancouver, and colleagues were able to contact study authors in only 37 percent of cases, and found that the likelihood of being able to track down working e-mail addresses for authors declines by 7 percent each year post publication.

“Most of the time, researchers said ‘it’s probably in this or that location,’ such as their parents’ attic, or on a zip drive for which they haven’t seen the hardware in 15 years,” Vines told Nature. “In theory, the data still exist, but the time and effort required by the researcher to get them to you is prohibitive.”

Though many journals now require that published authors submit their raw data to public archives, the trend for older studies may extend across the disciplines, and a large portion of scientific data could be in danger of vanishing.

Paul Stein

Posts: 237

December 23, 2013

The former institutions or departments where scientists did their published work need to maintain a repository of current addresses and e-mails. The impossibility of contacting authors is a problem everywhere in science, not just in ecology. Google and LinkedIn often don't help in locating people.

It would also help if those institutions had data archives similar to GLP industrial practices, so that not only the raw data would remain available over time, but also precise protocols, not those abbreviated, impossible-to-precisely-replicate outlines.

FJScientist

Posts: 29

December 23, 2013

Interesting, but I'm not surprised. Over twenty years, people retire, and I am aware of no standard mechanism by which a typical academic institution takes responsibility for archiving their data. And I have stacks of data on zip drives that I and most others no longer have the means of accessing. My 3.5" and 5.25" floppy disks were discarded long ago.

I do have all my old notebooks predating the digital conversion, which suggests that the written word may be less ephemeral than the digital word, at least for me. I did have a huge box with all my old films, stored in a room in the department. But that was deemed discardable by some anonymous person a few years back. So, boxes of hard, raw data are subject to loss as well. It is worrisome that over the last twenty years, pretty much all of my raw data has been stored digitally and likely will just disappear.

Moreover, much of the raw data, collected over many years by different individuals, would be hard for others to decipher. We tend to accrue data with specific projects in mind, after which multiple independent studies are immediately collated into graphs and tables, and we then use the publication as our primary method for archiving substantiated results. Raw data archives are missing the 'why were experimental directions pursued and others abandoned', which only those of us who were there can state.

Published data represent only a fraction of our output over the years. I have large volumes of equally substantiated but unpublished data collecting dust. That data typically consists of piecemeal investigations into areas never followed up in the lab, often because someone left the lab before 'completing' the study. I know of many intriguing findings within that data waiting for someone else to run with them. My biggest lament is that no one will ever do it because current publication practices depend on us reporting the 'complete' study rather than reporting 'Hey, what an interesting finding. Who wants to finish that?'

In short, I worry more about the continuity of substantiated work that we found interesting but no longer pursue. I wish there were some way, at the end of my career (which sadly appears to be now), of presenting some outstanding findings for others to pursue. Some of it is decades old and subject to the criticism (beyond being incomplete) that the methods are outdated and must be followed up with today's capabilities in order to be published. That is exactly why I wish there were some way for these short reports to see the light of day. So yes, raw data archives are important for preserving the historical record. But archives in boxes for historians to look at a century from now are only part of the issue. There are publication roadblocks that impede us from letting others know of the nuggets of ground-breaking science buried within those archives.

Barry@DataSense

Posts: 24

December 23, 2013

This loss of data and of contact with early researchers is regrettable, but it highlights the hidden costs of our rapid expansion of science and headlong uptake of new technology. It worried me 50 years ago and it worries me now.

What institutions are prepared to pay the price of storing and archiving data, to the degree of maintaining hardware and technology capable of interrogating 'ancient' digital data? Let alone the physical needs for archiving collections of material?

What this does is highlight even more the value of, and need for, proper and thorough peer review of papers so that we can move forward with confidence that a scientific result is indeed valid.
