Give P2P a Chance

Why you should be using peer-to-peer networks to share your data.

Jul 1, 2006
Jeffrey M. Perkel

Remember peer-to-peer (P2P) networking? It's the software technology that incurred the wrath of the entertainment industry for its use in pirating copyrighted material. But P2P isn't bad; it exists simply to share data, a mission that dovetails nicely with science's collaborative ethos.

With P2P you can create pooled image libraries, disseminate genomic-scale datasets, and share publications, patient data, and poster presentations. You could do that from your lab's home page, too, of course, but P2P has an edge over traditional Web pages. Should your host computer go down, lab priorities change, or funding falter, the site could be lost. You also might lack the bandwidth to accommodate a popular site, or the hardware to host a large site.

That's why P2P is so useful: All you need to deliver files via P2P is a desktop computer. Simply download the application, select the files you want to share, and let the software do the rest.

In traditional client-server transactions, you (the client) request data from a database (the server), which then delivers that content back to you. With only two parties to the conversation, you have no recourse should the server go down. And if the requested file is large enough, your one request could grind the entire system to a halt.

In P2P networks, every machine is both client and server. There is no central repository; each machine can request files from other machines and deliver them to its peers in turn. This arrangement eases bandwidth bottlenecks, because downloads are spread across many machines rather than funneled through one, while also improving stability: By duplicating key files among peers, you ensure their availability, even if their primary host goes offline.
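To make the replication argument concrete, here is a minimal Python sketch. It is a toy in-memory model, not a real P2P protocol: the Peer class, the fetch function, the lab names, and the file name are all hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """A toy peer: it holds files (server role) and can fetch them (client role)."""
    name: str
    online: bool = True
    files: dict = field(default_factory=dict)  # filename -> bytes


def fetch(filename, peers):
    """Return (peer name, data) from the first online peer holding the file."""
    for peer in peers:
        if peer.online and filename in peer.files:
            return peer.name, peer.files[filename]
    raise LookupError(f"no online peer has {filename!r}")


# Hypothetical three-lab network; lab_a is the file's primary host.
lab_a = Peer("lab_a", files={"genome.fa": b"ACGT"})
lab_b = Peer("lab_b")
lab_c = Peer("lab_c")
network = [lab_a, lab_b, lab_c]

# lab_b downloads the file and, being a peer, now serves it too.
_, data = fetch("genome.fa", network)
lab_b.files["genome.fa"] = data

# The primary host goes offline; the file is still available from lab_b.
lab_a.online = False
source, data = fetch("genome.fa", network)
print(source)  # lab_b
```

Had lab_b never downloaded the file, the second fetch would raise LookupError, which is the client-server failure mode in miniature.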

One P2P system designed expressly for academia is Pennsylvania State University's LionShare (http://lionshare.its.psu.edu), whose first public release was expected in June. Employing a mixed P2P/client-server architecture, LionShare blends traditional P2P functionality with traditional academic data repositories. "So the real value added in LionShare," says Mike Halm, senior strategist for e-learning technologies at Penn State, "is that you're doing a federated search over all resources being shared, plus a variety of learning resource repositories around the world, so you get rich results coming back to you when you do a query."

Unlike traditional P2P networks, LionShare places a heavy emphasis on security and user identification, says Halm. Users choose which files on their hard drives will be shared, describe them using metadata tags, and then broadcast those descriptions across the network. Private data remains private, but even public files can be restricted to a specific audience, such as class participants.
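The tag-and-restrict idea can be sketched in a few lines of Python. This is not LionShare's actual data model or API, which the article does not describe; SharedFile, visible_matches, the tags, and the user names are hypothetical. The sketch assumes each shared file carries a set of metadata tags and an optional audience list, and that private files simply never enter the catalog.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class SharedFile:
    """Hypothetical description of a shared file, as broadcast to the network."""
    name: str
    tags: FrozenSet[str]
    audience: Optional[FrozenSet[str]] = None  # None means anyone may see it


def visible_matches(catalog, user, keyword):
    """Names of files tagged with `keyword` that `user` is allowed to see."""
    hits = []
    for f in catalog:
        allowed = f.audience is None or user in f.audience
        if allowed and keyword.lower() in f.tags:
            hits.append(f.name)
    return hits


catalog = [
    # A public poster, visible to everyone.
    SharedFile("poster.pdf", frozenset({"proteomics", "poster"})),
    # Lecture slides restricted to two class participants.
    SharedFile("lecture3.ppt", frozenset({"proteomics", "lecture"}),
               audience=frozenset({"alice", "bob"})),
]

print(visible_matches(catalog, "alice", "proteomics"))  # ['poster.pdf', 'lecture3.ppt']
print(visible_matches(catalog, "carol", "proteomics"))  # ['poster.pdf']
```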

LionShare offers a relatively sophisticated and secure form of P2P, but that won't necessarily ensure wide adoption in the scientific community. To do that, P2P software must:

1. Enable scheduled searching: Searching each day to see if anything new and interesting has been added to the network gets old fast. Use RSS feeds to advise users of new content, and allow filtering to show only files matching certain keywords.

2. Grant write permission: Give trusted colleagues the ability not only to access (read) files, but also to edit them remotely. That would simplify the maintenance of consortium-wide spreadsheets. Version control would allow owners to roll back unwanted changes.

3. Develop OS-specific versions: As a Java application, LionShare can run on any OS, but it's also slow. OS-specific versions would be faster, and it would then be possible to browse the P2P network as just another folder in your file system.

4. Address copyright issues: Give trusted individuals authorization to oversee the network and enforce copyright laws. Also, give each file an associated license, so it's clear who owns it and how it may be used.

5. Integrate with other P2P networks: Enable users to search across different P2P networks. Otherwise, they'll have to run different clients and de-duplicate the resulting hit lists in order to perform a comprehensive search.
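The de-duplication chore in point 5 is easy to illustrate. The Python sketch below assumes each search hit arrives as a (file name, content) pair and treats two hits as the same file when their SHA-256 digests match, whatever they are named. The merge_hits function and the sample hit lists are hypothetical; a real client would more likely compare digests published by each network than download content in order to hash it.

```python
import hashlib


def merge_hits(*hit_lists):
    """Merge hit lists from several networks, keeping one name per distinct content."""
    seen = set()
    merged = []
    for hits in hit_lists:
        for name, content in hits:
            digest = hashlib.sha256(content).hexdigest()
            if digest not in seen:  # first time this content has appeared
                seen.add(digest)
                merged.append(name)
    return merged


# Hypothetical results for the same query on two different P2P networks.
network_a = [("genome_v2.fa", b"ACGTACGT"), ("notes.txt", b"draft 1")]
network_b = [("genome-final.fa", b"ACGTACGT"), ("slides.ppt", b"slide deck")]

print(merge_hits(network_a, network_b))
# ['genome_v2.fa', 'notes.txt', 'slides.ppt']
```

Matching on content rather than file name is a design choice: the same dataset often circulates under several names, and two different files can share one.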

The final key: Evangelize. P2P is useless without a large pool of users to sustain it. So get out there and remind your peers: Share and share alike.

jperkel@the-scientist.com