Preserving Research

The top online archives for storing your unpublished findings

By Amy Maxmen | August 1, 2013

As a graduate student in Harvard’s organismic and evolutionary biology department in the early 2000s, I wanted to publicly share all of the research that went into my doctoral thesis in order to contribute to the small body of scientific literature on the little-known group of marine arthropods I studied, sea spiders. However, after I published a few reports and successfully defended my PhD, my drive to submit the final chapter of my thesis to a journal dissolved because of the expense and time involved. Yet, on the rare occasions when researchers have asked to see it, I regretted that it languished on my bookshelf. Although the chapter is far from earth-shattering, it might provide a stepping stone for another biologist.

“There is a need for science to be communicated faster to other researchers and the public, so by putting manuscripts online in places like the [preprint server] arXiv, biologists can quickly disseminate their results and get feedback,” says Dmitri Petrov, an evolutionary biologist at Stanford University. “Sometimes researchers make careful observations that might not tell a thrilling narrative,” but could still save another researcher valuable time re-creating the same experiment, he adds. “It’s wonderful to put those observations online and to provide them for free.”

Luckily, sharing is cheaper and faster now that online, open-access collections for biology are flourishing as researchers realize the benefits of uploading unpublished reports of negative results, observations, grant applications, protocol notes, and yes, their unpublished theses onto the Web for others to peruse. In January, I finally uploaded my thesis chapter on sea spider metamorphosis onto several sites. Within 3 weeks, a zoologist from Germany e-mailed me to ask how to cite it and whether I was still following that line of study.

In addition, unpublished uploads may directly contribute to one’s career. This year, the National Science Foundation announced that grant reviewers would take note of citable and accessible “products” in addition to publications. Because online repositories grant unpublished reports a digital object identifier, or DOI, that can be referenced in a citation, these uploads may now improve a scientist’s reputation.

Submission is typically free and relatively simple. However, how readable, usable, and findable the report is to others remains up to you. In order to explore how several online repositories function, I uploaded my thesis chapter as a test, and spoke with experts who have turned to the Web for similar reasons.


Researchers list various reasons for uploading unpublished material: to get feedback on a paper before submission; to help others learn why a grant was accepted or rejected so that they need not repeat the same mistakes; to place a time stamp on their data or ideas; to share observations and protocols that could be useful to other scientists; and to post movies and other data in formats that most journals cannot handle. Here are a few tips to get the most out of your post.

Choose your words wisely
Search engines pick over the title and abstract of uploaded reports. Therefore, it’s important to think about your wording. “It’s cute to have a title like ‘To Be or Not to Be,’” says physicist Paul Ginsparg, founder of the first major preprint server, arXiv. “But since that does not convey the essential content, it will be missed by your target audience.” Ginsparg complimented me on the title I had chosen for the thesis chapter I uploaded to the arXiv, “Sea Spider Development: How the encysting Anoplodactylus eroticus matures from a buoyant nymph to a grounded adult.” He says that it includes words that a nonspecialist may Google in addition to technical terms like “nymph” and “encysting” that researchers in the field might use to search for the paper. In addition, Ginsparg advises researchers to attach plenty of metadata, such as keywords ranging from the general to the specific, to every upload.

Check the license
Before hitting the submit button on a particular repository, read its licensing information carefully. Many repositories now offer Creative Commons (CC) licenses. The most common type, “CC BY,” allows anyone to read and distribute a paper as long as they give proper acknowledgment to the author. This way, anyone who wishes to post the content on Wikipedia or another website need not worry about infringement, as long as they reference the author. A subcategory of the Creative Commons license, “CC BY-NC,” adds the clause that others cannot distribute the report for commercial purposes. If an author intends to also submit the report to a peer-reviewed journal, this option is better, as journals tend to want the exclusive right to distribute the article for commercial purposes.

Compress huge files and append raw data
Some repositories boast that they offer unlimited upload size, but that might not be a blessing. If you manage to upload a huge file before the server times out, the report may cause the browser to perform poorly and readers may not be able to download the file without a high-speed connection. For this reason, Ginsparg recommends that researchers compress figures into a single PDF, but also upload a separate file in a format that preserves the raw data.


arXiv (LAUNCHED 1991)
PAUL GINSPARG, ARXIV.ORG

Theoretical physicists have posted unpublished reports on arXiv for more than a decade, and recently, a growing number of biologists are doing so, too. (See graph on this page.)

The subheading for biology, “Quantitative Biology,” is a loose one, with subject matter ranging from cancer to epigenetics.

Number of uploaded reports: About 860,000 reports from a variety of scientific disciplines

Number of biology-related reports: 7,200 registered under the quantitative biology category

Cost: Uploads are free. As of 2001, the website is hosted and handled by Cornell University Library in Ithaca, New York.

Submitting: Anyone with an organizational or institutional affiliation can upload a report.

Searchability: The local arXiv search engine indexes the author’s name, keywords, and words in the title and abstract. It also combs through the text of a PDF (a suggested and common format for uploads), but slightly less thoroughly.

Pro: Reputation. arXiv draws 2 million downloads weekly, so Google and other search engines discover its papers quickly, and most researchers immediately recognize the website as a mainstay in online publishing.

Con: Usability. There is no comment feature, so if another researcher wants to critique the work, she must send an e-mail. Also, most quantitative biology uploads are in PDF format, as arXiv suggests. As such, researchers cannot update data within a report that has been compressed.

figshare

Use of figshare boomed after Nature recommended the site as an alternative when they stopped accepting submissions to Nature Precedings, an online preprint journal (figshare is a sister company of Nature Publishing Group). Figshare’s content includes supplemental data associated with published papers, as well as unpublished data sets and reports, conference presentations, and more.

Number of uploads: Hundreds of thousands, but many are supplemental data associated with peer-reviewed manuscripts

Number of registered users: Thousands of active users, primarily in the life sciences

Cost: Generally free. The site plans to sustain itself by working with publishers, such as F1000Research and PLOS, who pay for figshare services to help with visual content that those journals cannot easily handle.

Submitting: Each upload is free and limited to 250 MB, and users can upload as many projects as they like, as long as the uploads are public. Privacy, or partial privacy with a handful of selected collaborators, is also an option; however, it limits researchers to 1 GB total. If there is a demand for unlimited space, founder Mark Hahnel says he can set up premium accounts for a small fee.

Pro: Usability. Figshare features an intuitive user interface. In addition, Hahnel put special effort into how video data and other nontraditional formats are displayed because of his frustration that he could not easily share his own videos of cell dynamics. Finally, figshare encourages feedback by making it as simple to leave comments below the manuscript as it is on YouTube or a discussion board.

Con: Youth. As a relatively recent site for scientific data, preprints, and published papers, figshare has yet to prove its staying power.

ResearchGate (LAUNCHED 2008)
COLLECTING AT ALL LEVELS: My graduate work focused on the evolution of arthropods, using sea spiders as a model. Some of the sea spiders were collected from rocks along the Pacific coast of Japan. The confocal microscope image (insert) shows a juvenile sea spider’s nervous system tagged with a fluorescent marker and color-coded to indicate depth. My goal in uploading the last chapter of my doctoral thesis was to share more of my data with other scientists. COURTESY OF AMY MAXMEN; KATSUMI MIYAZAKI

ResearchGate focuses on a researcher’s academic network more than the other sites. It initially creates this network by asking a user to invite coauthors, and it automatically locates them by scanning the user’s published research. When people in your network upload unpublished reports, a notification appears on your home page (unless the authors have requested privacy). Most of the content currently on ResearchGate consists of published peer-reviewed material and science-related forum posts; however, cofounder Ijad Madisch expanded the database in December 2012 to include non-peer-reviewed posts. In part, Madisch made the change because “80 percent of the experiments I tried did not work, and I never shared those negative results,” he says. “I was sure someone else had made the same mistakes, and I wanted to be able to find them.”

Number of biology-related posts: More than 100,000 non-peer-reviewed uploads, including many data sets

Number of registered users: As of mid-July, almost 630,000 biologists have signed up for ResearchGate.

Submitting: Users sign in with an e-mail attached to an academic institution.

Searchability: Because ResearchGate smoothly accrues a large collection of published research, a search for the topic “sea spider,” for example, returns a library of information, published and unpublished alike.

Cost: Uploads are free. Companies and institutions can post job ads on the site for a fee.

Pro: Usability. Users receive a score based on the number of publications in peer-reviewed journals and the impact factor of the journals, as well as an “RG” score based on their participation with the site. This score could be submitted as part of a grant application, although the value of its impact remains to be seen. Also, feedback is social. Readers can post questions about a report to a forum that all users see.

Con: Networking. Some researchers may dislike publicly sharing their query about a report with a forum, and may be turned off by requests from ResearchGate to invite colleagues, or by the Facebook-like home page with a running stream of updates from other scientists.


Institutional repositories
SAVING DATA: During the course of my research, I gathered a vast number of microscope images, DNA sequences, and other data. COURTESY OF AMY MAXMEN

Most universities encourage their researchers to submit dissertations and published manuscripts to their repositories. The digital repository called DASH (Digital Access to Scholarship at Harvard) at my alma mater, Harvard University, also permits the submission of unpublished reports, but Stuart Shieber, the founder and former director of Harvard’s Office for Scholarly Communications, says that researchers rarely use it for this function. My review of these repositories is based on DASH, but the capabilities of different institutions vary.

Number of reports on DASH: 12,309. Most are published reports from a wide variety of fields. An additional 625 dissertations are uploaded from the College of Arts and Sciences.

Searchability: People who wish to find reports on digital institutional repositories around the world can search for them at

Pro: Reputation. Because membership requires a university affiliation, readers may feel assured that the research derives from a qualified source. Whereas newer platforms may lose ground over time, those hosted by a university will likely stand the test of time, even if they remain underutilized.

Con: Usability. Because submissions are manually vetted, my chapter did not appear online until 5 weeks after I uploaded it in mid-January. Also, readers cannot leave comments or click a button to send a message to the author. Finally, the system felt less flexible and less intuitive than other online repositories mentioned here.


Additional sites for non-peer-reviewed uploads: This is not an exhaustive list. Alternatives include PeerJ Preprints (specifically for unpublished reports in the biological and medical sciences); (mainly for published reports, so people may not visit the site to find unpublished information); MyOpenArchive (covers all disciplines and requires no institutional affiliation, but has no search function embedded in the site); and F1000Posters (for posters from conferences).

