Open Biology's Quest to Explode Data

John Wilbanks
Jun 1, 2010

A “science commons” at the data-intensive layer will encourage scholarly collaboration and communication—and spur drug discovery.

“Network of Networkers” by Alex Pico using the Cytoscape gene network tool.

Robert Metcalfe, co-inventor of Ethernet and the founder of 3Com, observed that the value of a telecommunications network is proportional to the square of the number of connected users of the system. This is known as Metcalfe’s Law, and it goes a long way toward explaining why we can create and realize so much value from the Web. As more users get online, the network gets more valuable, spurring more users to get online, and so on.

Getting Metcalfe’s Law to operate for data is a long-held goal of science. Indeed, the Web was created to share data—physics data—by making it easier to link, find, download, and browse information on disparate computers. But we don’t have...

There are a lot of reasons for this. Studying human disease is complex and almost incomprehensibly expensive. And recent studies from inside the pharmaceutical industry itself draw on 60 years of data to show us that drug discovery is essentially a random process. It’s hard to force network effects onto this world.

That’s because it’s difficult to start an “open” biology process from scratch. The cost of entry is still in the tens of millions of dollars to develop a meaningful corpus of data sets one can legally share and analytic tools one can legally place under open source licenses. Even then you’d have to find incentives to get scientists to share their new data, their models of disease, their software tools—when they’re not rewarded for doing so. It is a tall hill to climb.

Recently, I participated in the Sage Commons Congress, a remarkable event bringing together scientists, university leaders, government officials, patients, advocates, publishers, and more. The Congress was led by the nonprofits Sage Bionetworks and Creative Commons, bringing together hard science and the open Web. The Sage Commons draws inspiration and education from the success of Wikipedia and other Web-enabled contributor networks. We want to begin to design systems that increase the number of seekers in the fields of disease biology and human health.

The Sage Commons is anchored by massive data sets and powerful analytic software, rendered legally open via Creative Commons legal tools and scientifically useful via global volunteers. It will bring pharmaceutical-grade disease biology and training tools to scientists across the world. Further, it will recommend standards and guidelines for amassing, curating, annotating, and citing data to encourage biomedical researchers to submit and to be credited when their data yield new findings. At least 10,000 hours went into the first Congress, and that’s just the beginning if we are to achieve our goal of a Metcalfe’s Law explosion in the value and amount of data for life sciences.

As a nonprofit, non-university contributor network, Sage Commons represents a novel entrant into the world of large-scale open biology, but is far from the first. The National Center for Biotechnology Information has spent decades building and giving away vital information resources about genomics and biology. The National Cancer Institute is working to build a vast network of digital resources about cancer. Sage’s work integrates with and leverages this vital infrastructure, making it easier to begin the transformation of disease biology into a precompetitive space akin to the human genome.

The goals are ambitious but the reward is great, and there may not be any other way. The only element statistically linked to an increase in drug discovery rates is an increase in the number of those who are looking. By building a public commons of disease biology, and by increasing the number of seekers, we have a non-miraculous methodology to increase the rates and successes of those engaged in drug discovery.

John Wilbanks is Vice President of the Science Commons project at Creative Commons. Videos and more information about the Sage Commons Congress can be found at http://sagecongress.org/WP/presentations.
