Q&A: 1 Million Preprints and Counting

A conversation with arXiv founder Paul Ginsparg

By | December 29, 2014

Left: number of new submissions per year; right: the same data, with submission rates divided by the total for each year. ARXIV

Since 1991, scientists from a variety of fields have posted their research to the preprint server arXiv, to quickly share data and to stake intellectual claim on new discoveries.

Today (December 29), the preprint server clocked its one-millionth upload. In anticipation of this milestone, The Scientist spoke with arXiv founder Paul Ginsparg of Cornell University about sharing data, peer review, and what’s next for the resource.

The Scientist: You started arXiv to serve the physics community, but later expanded into other fields, including quantitative biology. What was the impetus to expand into the life sciences?

Paul Ginsparg: Expanding into new fields has always relied on somebody from the target community to make contact. Unless the community is ready for it, and there are some members who are actively willing to organize, we don’t. In the case of biology we were contacted by a few people who happened to have been physicists turned biophysicists/biologists. They were accustomed to posting preprints and said, ‘Why don’t we have this in this area of biology that we’ve moved to?’ So we started getting that set up [in 2003].

TS: How, if at all, have you seen life scientists’ attitudes toward preprints change in the last 23 years?

PG: Twenty-three years ago was before any journals were online, basically before the World Wide Web. Everybody was in this purely print-centric mode. In fact, in the surveys of librarians I remember from that time, when they were asked when research journals would be fully electronic and online, the estimates ranged from 50 to 100 years. From our point of view in the physics community, since we were already exchanging things electronically, we thought it was much more likely to be a five- to 10-year timeframe, which of course is what it turned out to be.

Twenty years ago there were people who said, ‘Look, this [arXiv] is only going to be of interest to people in these narrow subfields . . . it will never be adopted by serious field X.’

Today everybody regards it as much more natural because now everybody is exchanging things electronically—everybody’s life is online. Having things disseminated as soon as you’ve finished writing them is a much more natural notion now.

TS: In your experience, have life scientists fully warmed to the idea of exchanging pre-refereed information?

PG: I’m hoping this is going away, but it has been a canard for the last 20 years. You sometimes hear from biologists, ‘If I present materials before they’re published and someone else quickly reproduces the results and publishes first, they get full credit and I get no credit at all.’ To physicists, this attitude has always been a non sequitur. In famous instances, physicists have presented results publicly without ever publishing an article and gotten full credit for them, including Nobel Prize recognition, because your idea is yours and correct, independent of publication in a journal. That’s one reason why physicists were so eager to adopt arXiv: it permitted staking intellectual property rights for an idea, and it was time-stamped so nobody could dispute it. That should be obvious to people in the life sciences (and maybe it is, increasingly so), but I still hear mention of this concern that results are not properly credited until formal publication.

TS: How about publishers?

PG: If anything, the attitudes have been looser. Twenty-three years ago it wasn’t an issue because we had the full involvement of the physics community. Faced with journal resistance, people would have said, ‘OK. We’re not going to use your journal.’ In physics, the journals just had to adapt to it.

Some [non-physics-specific] journals have what we used to call a ‘don’t ask, don’t tell’ policy: even if their official policy was that they wouldn’t accept it [work posted online in advance], editors weren’t actively checking, so unless you went out of your way to say you had deposited it, it wouldn’t be an issue. In general, I’d say journals have been erring on the lenient side. If high-quality authors are intent on using preprint servers, journals risk alienating them by disallowing that. ArXiv is not going to jeopardize their financial model.

I’m sensitive to the financial concerns and fully willing to support journal publishers’ financial model. I recognize they provide a service that is expensive to maintain. We’re in this strange transitional period where, for some things, people want rapid dissemination, but the same people also want the quality control from high-quality journals.

TS: Cold Spring Harbor Laboratory last year launched bioRxiv, while some journals have launched their own preprint servers. Would you consider these competition, or are these services complementary to your efforts?

PG: There’s no notion of competitiveness. When this started in the early 1990s, I was the only one doing it so there was no competition. But if somebody else can do it better, or better adapted to a given community, they should.

One of the great strengths, one of the things that keeps arXiv propped up, is the advantage of being established. You’ve got all these people who go to the site every day; they’re less likely to switch, especially because it’s comprehensive. The important thing is to be able to attract the overwhelming majority of the research community so the resource becomes all that much more valuable.

TS: Last September, a slew of international institutions pledged to help support arXiv through 2017. What’s next for your team?

PG: The infrastructure is well maintained and stable. In my mind, the challenge over the next five to 10 years is to remain agile. There have been many changes over the last decade, especially with the advent of social networks and blogging. I track the way people find [arXiv] articles, [including] through Twitter and Facebook and Reddit. Some of the challenges are to make sure we interoperate well with all of those services and other new services that might emerge. The biggest challenge is not to shrink-wrap it and have some frozen technology—you want this always evolving and adapting . . . in order to keep it a useful resource for researchers.

