Open Access 2.0
The nautilus: where - and how - OA will actually work
The debate over open access to the scientific literature appears to be moving into a new phase. Many continue to argue one side or the other of a binary choice: Either all research publishing should be open access, or only traditional publishing can maintain peer review and editorial integrity. Others, however, have moved beyond that false dichotomy and increasingly see hybrid models and new, often complex, business arrangements emerging.
Partly this is a product of the apparent inability of open-access ventures to produce economically sustainable models. It is unclear whether BioMed Central, a privately held sister company of The Scientist, has broken even, and the Public Library of Science's tax return for the fiscal year ending September 30, 2006, the most recent publicly available, indicates that the organization lost $1.4 million on $5 million in revenue. Even if sustainability, rather than profit, is the goal, this is not success.
The new hybrid models are also the product of shrewd thinking on the part of traditional publishers, whether in the for-profit or not-for-profit spheres, which are identifying new ways to hold onto revenues and, in some instances, even to augment them. We are entering a pluralistic phase, where open access and traditional publishing coexist, though they increasingly are finding their own distinctive places in the research universe and are less likely to compete head-on. To respond to the binary argument with which I began this essay, open access is a good thing, but it is also a small and inevitable thing.
A better way to understand what is going on in scholarly communications today is to start with Walt Disney. After completing the hugely successful Disneyland, Disney famously resented all the clever businesspeople who profited from his venture by opening hotels, gift shops, and restaurants around the perimeter of the theme park. In response, Disney went into real estate, buying up vast tracts of land near Orlando, Florida, now home to Walt Disney World. There, the hotels and restaurants sit on the property of the Disney company, paying a toll for the privilege.
Traditional publishers take the view that they invested in the creation of scientific literature and thus should, like Disney, be able to extract a toll for each instance of monetization derived from their original investment. This now takes many forms. Increasingly common are so-called "author's choice" programs (for example, Springer's Open Choice and Oxford University Press' Oxford Open) in which an author is permitted to pay a fee to make his or her articles completely open access, the economic equivalent of a hotel on Disney property.
More intriguing is Nature Publishing Group's Nature Precedings, which brands open-access life sciences preprints with the Nature name. While working on a project this autumn for a client, a new search-engine company, I was surprised to learn that at least one publisher expected to be paid for the right to index articles - not the right to display articles, but simply the right to index them. I imagine Walt Disney saying, "Damn! I wish I had thought of that!"
Perhaps the most intriguing recent development is Reed Elsevier's announcement that it has developed OncologySTAT, an advertising-supported online portal for oncologists. Reed Elsevier will make its articles on oncology available on this portal at the time of first publication. Subscriptions to the underlying journals, hardcopy and electronic, will continue to be sold to academic libraries, but for users of the portal, the content will be free of charge. It is not yet clear how "open" this open-access initiative will be, as users must register and be qualified before gaining access (a version of the controlled-circulation model common to trade magazine publishing), but at a minimum Reed Elsevier is opening the doors a little, even if just to build a small guest house on the property. Thus we now have the same content being monetized through subscriptions to libraries and through the packaging of audiences to advertisers in an open-access or almost-open-access form.
Each market segment thus attracts its own business model. But will any of these models accomplish the first aim of many open-access advocates, namely increasing the dissemination of research?
Unfortunately for such advocates, open access does not appear to increase dissemination significantly. Here's one simple reason for that: Most researchers are affiliated with institutions, whether academic, governmental, or corporate, that have access to most of the distinguished literature in the field. "Most," of course, is not the same thing as "all." Some researchers are independent or employed by impecunious institutions or reside in developing nations, and some articles appear in publications whose circulation is far from robust.
Thus, though there may be some exceptional situations, especially in the short term, the increased dissemination brought about by open access takes place largely at the margins of the research community. From the point of view of traditional publishers, to paraphrase Voltaire, open-access advocates make the perfect enemy of the good.
Another important reason open access does not significantly increase dissemination is that attention, not scholarly content, is the scarce commodity. You can build it, but they may not come. It is one thing to write an article and upload it to a Web server somewhere, where it will be indexed by Google and its ilk. It is quite another thing for someone to find that article among the growing millions on the Internet by happening upon just the right combination of keywords to type into a search bar. A researcher who advocates open access might consider this question: Would you rather double the amount of published information available to you, or gain one more hour a day to review information you can already access? We are awash in information, but short on time to evaluate it. Open access only worsens this by opening the floodgates to more and more unfiltered information.
Open-access advocates would do well to consider what put those keywords into a researcher's mind in the first place. Very often the answer is the sum of all the marketing efforts of a traditional publisher, including the association with a journal's highly regarded brand. Certainly, awareness begins with researcher interest, but it does not translate into popularity without marketing.
There are exceptions to this, however. An entirely new journal, for example, is likely to get a larger readership in an open-access format unless an established publisher gets behind it in the traditional, proprietary way and makes a major marketing push. Even without that push, the one-click pass-along capability of the Internet, building on the growing social networking sites, can be highly effective in special circumstances. What is important to bear in mind is that exceptional situations are by definition exceptional.
This does not mean that open access is useless or adds no value when it comes to dissemination; what it does mean is that open access is most meaningful within a small community whose members know each other and formally and informally exchange the terms of discourse.
Who needs peer review, copy editing, or sales and marketing?
What the authors of research material seek from their publishers is the audience - and, transitively, the prestige and certification that derive from that audience - that publishing companies, for-profit and not-for-profit alike, are set up to deliver. But what of the work for which there is little or no audience? What if there is simply no market? This is the ideal province of open-access publishing: providing services to authors whose work is so highly specialized as to make it impossible to command the attention of a wide readership. Work can be specialized for any number of reasons. The author may be working in a tiny field; the work in question may effectively be addenda to previously and formally published work; or the author may be probing a new area, where a community of fellow researchers has not yet emerged. Terming some information "highly specialized" doesn't mean that it is unimportant or of poor quality; it simply means that the material is of interest to a very small number of readers.
It is useful to think of a primary domain of open-access publishing as existing at the tiny center of scholarly communications, the innermost spiral of the shell of a nautilus, where a particular researcher wishes to communicate with a handful of intimates and researchers working in precisely the same area. Many of the trappings of formal publishing are of little interest to this group. Peer review? But these are the peers; they can make their own judgments. Copy editing? I doubt it, as this inner group already knows one another and can fill in the blanks and make mental corrections for errors in a hastily drafted document. Nor does this group require the sales and marketing of a large publisher, as the group is in regular communication anyway, without the mediation of a sales force or acquisitions librarian.
As one moves beyond the inner group of researchers, however, other readers may be interested in the work, but they may need some guidance in evaluating the material. For these readers, formal publication validates a work and asserts that it is worth giving attention. So we can imagine all of scholarly communications as the spiral of a nautilus shell: The inner spiral is the researcher's (and author's) intimate colleagues; the next spiral is for people in the field but not working exactly on the topic of interest to the author; one more spiral and we have the broader discipline (e.g., biochemistry); beyond that are adjacent disciplines (e.g., organic chemistry); and so on outward to scientists in general, other highly educated individuals, university administrators, government policy-makers, investors, and ultimately to the outer spirals, where we have consumer media, whose task is to inform the general public.
Something may be lost in the translation as research data moves outward from the core research colleagues to the disciplines beyond them. Without the "translators," however, namely the editorial review systems of traditional publishing, the loss would be greater, as many readers would not be able to determine the relative value of different publications. At each step away from the center, the role of the publisher grows and the merits of open access diminish. Researchers not familiar with the author will seek a way to evaluate his or her work, and a reputable publisher's brand is a form of insurance. Formal publishing, in other words, helps an author speak not to a tiny group of peers but to a broader audience beyond them. Still, it's important to note that not all brands are created equal. The New York Times is a stronger brand for news than the Huffington Post, and the Huffington Post is a stronger brand than my friend's blog. The same can be said of various scientific publishers.
Whatever the virtues of traditional publishing, authors may choose to work in an open-access environment for any number of reasons. For one, they simply may want to share information with fellow researchers, and posting an article on the Internet is a relatively easy way to do that, especially when supplemented by personal e-mails or other communications to inform people that the article exists and comments are welcome. Some authors may "choose" open access because it is a condition of a funding grant. (I think some of the funding agencies have been misinformed about the benefits of open access, and they certainly have been misinformed about the costs, especially over the long term, but it is within the prerogatives of a funding agency to stipulate open-access publishing.)
An open-access copy may also serve as a backup to the "original" on the author's personal workstation. Further, open access can be useful to an author whose article has been rejected by one or more publishers but who still wants to get the material "out there"; to an author frustrated by the process and scheduling of traditional publishers; and to an author with philosophical reservations about working with large organizations, especially those in the for-profit sector, not to mention deep and growing suspicions about the whole concept of intellectual property.
A reason to publish in an open-access format need not be very strong, as the barriers to such publication are indeed low. It takes little: an Internet connection, a Web server somewhere, and an address for others to find the material. Thus one way to think of open-access publishing is simply as an emergent property of the current state of Web infrastructure.
Despite this, many open-access ventures seem to have struggled financially because they have been built on the mistaken assumption that they are replacing traditional publishing and thus have to recreate all the services that traditional publishers now provide. Thus, BioMed Central has set up a series of open-access journals, replete with editors, review boards, and a peer-review system. This is also true of many of the publications of the nonprofit Public Library of Science, whose spending, even by the standards of large commercial publishers, is profligate.
It is unclear whether open-access publishers are less efficient than traditional publishers, even as studies about per-article charges claim one thing or another. Among traditional publishers, commercial publishers are probably more efficient than not-for-profit publishers, but these efficiencies are rarely passed on to subscribers. Regardless of the underlying reasons, the revenue derived from a fully developed service must be very high in order to offset the cost structure. Where does that money come from? It comes from funding agencies, sponsoring institutions, and the authors themselves - it has to come from somewhere.
If, on the other hand, one views the province of open-access publishing as the small, specialized communities of researchers on the inner rings of the nautilus, whose aim is simply to share information with people working in the same narrowly focused area, all this overhead can be tossed out. What's needed is good software, not a stable of editors.
At its most basic level, an open-access service need not be much more than "a hard drive in the cloud," a place where content can be stored and others can access it. But even the most highly automated service can include far more features, and many existing services already do. We can imagine a generic service where an author uploads a document; the document is stored, but only the author can access it for changes or removal. The author has an account with the service; logging onto it, the author may choose to e-mail selected individuals about the document, granting access to that list.
Over time the list of invited readers may grow, and some names may be dropped from the list. The author, in other words, controls access to the document. This access can be extended to an academic department or to the members of a professional society; access can be granted to any authenticated directory of users. At some point the author may remove all access restrictions, making the document fully open access. It is a matter of debate whether any of these steps, including the final one, constitutes "publication," but it is indisputable that access can be augmented and that the marginal cost of doing so approaches zero.
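The author-controlled access just described can be modeled as a small permission check. What follows is a minimal sketch under stated assumptions: the `Document` class, its method names, and the representation of an authenticated directory (a department, a society's membership) as a named group are all hypothetical and not drawn from any existing repository software.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A deposited paper whose visibility the author controls (hypothetical model)."""
    doc_id: str
    author: str
    readers: set[str] = field(default_factory=set)  # individually invited readers
    groups: set[str] = field(default_factory=set)   # authorized directories, e.g. a department
    public: bool = False                            # True once all restrictions are removed

    def invite(self, user: str) -> None:
        self.readers.add(user)

    def revoke(self, user: str) -> None:
        # Dropping a name that was never invited is a harmless no-op.
        self.readers.discard(user)

    def grant_group(self, group: str) -> None:
        self.groups.add(group)

    def make_open_access(self) -> None:
        self.public = True

    def can_read(self, user: str, user_groups: set[str]) -> bool:
        # Readable if fully open, by the author, by an invited reader,
        # or by anyone who belongs to at least one authorized group.
        return (
            self.public
            or user == self.author
            or user in self.readers
            or bool(self.groups & user_groups)
        )

    def can_edit(self, user: str) -> bool:
        # Only the author may change or remove the document.
        return user == self.author
```

In this model, moving from a private draft to full open access is a single flag change, which mirrors the point that widening access costs almost nothing at the margin.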
Such a service, seeing itself in competition with other services, will likely add to its offerings. The service begins as simple storage, evolves into an access system with the author in control of authentication, and then becomes what we mostly see in the institutional repository arena now, a means to display open-access content without end-user restrictions. The next step is to set up alert services ("Send me a link for all new papers posted by Mary Jones"); these alerts already exist for Medline and Google Scholar and some university repositories. Search capability for entire collections is an obvious follow-on. Then comes the capability to place comments on the papers, and the opportunity for the author to respond. (At this stage we see the social networking capabilities of Web 2.0 technology begin to tiptoe into the realm of peer review, though it is "post-publication" or "post-posting" peer review.)
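An alert such as "Send me a link for all new papers posted by Mary Jones" amounts to a mapping from authors to their followers. Here is a minimal sketch, assuming an in-memory store; the `AlertService` class and its methods are hypothetical, and a real service would deliver notifications by e-mail or feed rather than returning them as a list.

```python
from collections import defaultdict


class AlertService:
    """Hypothetical alert service: notify followers when an author posts a paper."""

    def __init__(self) -> None:
        # author name -> set of subscribers following that author
        self._followers: defaultdict[str, set[str]] = defaultdict(set)

    def subscribe(self, subscriber: str, author: str) -> None:
        self._followers[author].add(subscriber)

    def unsubscribe(self, subscriber: str, author: str) -> None:
        self._followers[author].discard(subscriber)

    def post(self, author: str, link: str) -> list[tuple[str, str]]:
        # One (subscriber, link) notification per follower, sorted for a stable order.
        # .get avoids creating an empty entry for authors nobody follows.
        followers = self._followers.get(author, set())
        return [(subscriber, link) for subscriber in sorted(followers)]
```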
Perhaps one copy of the document is preserved as an uneditable original, which is displayed side by side with another copy on a wiki platform, providing the means to update or correct the paper. Documents may also be rendered into special file types to facilitate new machine processes (e.g., text mining). They will likely trigger automated searches ("Find other documents like this one"), and they will automatically be linked into other information services such as a library's long-term preservation program and be assigned specialized metadata, such as an algorithmically generated Library of Congress classification. Whatever computers can do will be done; this is inevitable. The only question is the order in which features appear and the timeline for implementation.
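A "Find other documents like this one" feature could be built in many ways, and no particular technique is implied here. As one simple illustration, assume plain word-count vectors compared by cosine similarity, a deliberately basic stand-in for the weighted or semantic methods a production service would more likely use; the function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    # Lowercased word counts; a real service would also drop stopwords and weight terms.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Missing terms in a Counter read as 0, so only shared terms add to the dot product.
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def more_like_this(query_id: str, corpus: dict[str, str], k: int = 3) -> list[str]:
    """Rank the other documents in `corpus` by similarity to `query_id`; return the top k ids."""
    vectors = {doc_id: term_vector(text) for doc_id, text in corpus.items()}
    query = vectors[query_id]
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in vectors.items() if doc_id != query_id]
    # Highest similarity first; ties broken alphabetically by id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]
```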
Open-access organizations thus are best suited to serve a new market or a small one - or, more likely, a large collection of very small ones - not the established markets of traditional publishers, and they can do this at modest expense.
How modest? Like many of the staff-heavy open-access services that are currently vying for researchers' attention, a highly automated, robust software platform requires a large initial investment. A startup engineering and product-development team of around 12 members is a common formulation. This team would take 6 to 12 months to build a service and data center, with the aims of the research community in mind. To make a product out of this, the team would have to include many of the things we now associate with Web 2.0 businesses: highly interactive sites, with the capability of allowing postings, comments, and alerts - a Facebook for the research community. A back-of-the-envelope estimate suggests that an average salary of $100,000 yields an annual payroll of $1.2 million. Double that figure to include the cost of rent, hardware, and bandwidth and you could bring the service to market in about one year for perhaps $2.5 million.
While the service would be as highly automated as possible, it would still be necessary to attract users to it, which requires marketing. Most Web companies of this kind experience a doubling of staff soon after launch. That brings staffing to 24, payroll to $2.4 million, and a fully loaded second-year expense structure of around $5 million. The first $2.4 million can be regarded as one-time or "sunk" costs; the $5 million a year constitutes ongoing overhead (fixed costs). There are no appreciable variable costs in this kind of Web-based business.
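The back-of-the-envelope arithmetic in the two preceding paragraphs can be made explicit. This sketch assumes, as the estimate does, that non-payroll costs (rent, hardware, bandwidth) roughly equal payroll, so the fully loaded cost is payroll times two; `cost_model` is a hypothetical helper.

```python
def cost_model(staff: int, avg_salary: float, overhead_multiplier: float = 2.0) -> tuple[float, float]:
    """Return (annual payroll, fully loaded annual cost); non-payroll costs are folded into the multiplier."""
    payroll = staff * avg_salary
    return payroll, payroll * overhead_multiplier


# Year 1: a 12-person engineering and product team at an average salary of $100,000.
year1_payroll, year1_total = cost_model(staff=12, avg_salary=100_000)
# Year 2: staff doubles after launch to support marketing.
year2_payroll, year2_total = cost_model(staff=24, avg_salary=100_000)

print(f"Year 1: payroll ${year1_payroll:,.0f}, total ${year1_total:,.0f}")
print(f"Year 2: payroll ${year2_payroll:,.0f}, total ${year2_total:,.0f}")
```

This yields $2.4 million for the first year and $4.8 million for the second, which the estimate rounds to about $2.5 million and $5 million.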
Once an effective open-access platform is in place on the innermost rings of the nautilus, the incremental cost of operating the service can be quite small. Traditional publishing is set up to deliver a different kind of service, one that refines editorial material and creates a market. An open-access service can keep its ongoing costs low because the community members resident at the innermost spiral of the nautilus shell share many assumptions, which diminishes the need for authoritative editorial supervision and marketing communications.
If authors were to pay $50 to deposit articles, 100,000 articles in a year would bring the service to cash-flow breakeven. To put this into perspective, the arXiv service, funded by Cornell University, receives around 50,000 articles a year (though at no cost to the authors). Of course, the service would have to compete with other services, including the many that have been set up in university libraries. To compete means offering better services. And here the new organization has an advantage over the current open-access repositories in that it is market-based and is thus set up to serve its customers. For the new service the customers are authors, whose every whim will be satisfied with new features, until the cost of depositing articles appears to be negligible. Yes, there is a paradox here: Although open access is free to readers, its real beneficiaries are the authors, who use the service to communicate with peers. BioMed Central and Public Library of Science get this right, but their high-cost editorial model would be difficult to replicate across the entire range of research publications.
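The breakeven figure follows from dividing annual fixed costs by the deposit fee, since the estimate assumes no appreciable variable costs. Here is a minimal sketch with hypothetical helper names, using the $5 million and $50 figures above; the second function shows the fee that would be needed at a lower volume, such as the roughly 50,000 articles a year arXiv receives.

```python
import math


def breakeven_deposits(annual_fixed_costs: float, fee_per_deposit: float) -> int:
    """Deposits per year needed to cover fixed costs, assuming no per-deposit variable cost."""
    return math.ceil(annual_fixed_costs / fee_per_deposit)


def required_fee(annual_fixed_costs: float, deposits_per_year: int) -> float:
    """Fee per deposit needed to break even at a given annual volume."""
    return annual_fixed_costs / deposits_per_year


print(breakeven_deposits(5_000_000, 50))   # 100000
print(required_fee(5_000_000, 50_000))     # 100.0
```

At arXiv's volume, in other words, the same cost structure would need a $100 deposit fee to break even.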
The revenue streams for the new organization go beyond posting fees, however. For example, professional societies may wish to have their own brand on an open-access repository. Perhaps AAAS, fearing that Nature's new Precedings product will undermine its flagship Science publication, will license the software service; this is known in the industry as a "white-label" deal. Other organizations (e.g., the research units of corporations) may want to use the software but balk at making their proprietary research public and thus may opt for a license for a gated community. Over time, premium services will evolve as well, in which other computer processes (e.g., data mining) are made available for an additional fee.
The fundamental tension in scholarly communications today is between the innermost spiral of the nautilus, where peers, narrowly defined, communicate directly with peers, and the outer spirals, which have been historically well served by traditional means. Open-access advocates sit at the center and attempt to take their model beyond the peers. As I hope I've made clear, I suspect that this will be difficult to do, but a highly automated service funded by authors' posting fees would indeed put pressure on some outer spirals. At the outer spirals sit the traditional publishers, who are attempting, with increasing success, to extend their reach into the inner spirals, preempting and co-opting open-access initiatives wherever they can. What remains unknown is just where in the middle the two models will meet.