Building High-speed Lanes on the Information Highway

Bennett Daviss (bdaviss@the-scientist.com)
Jan 16, 2005
The information highway is adding lanes. These new thoroughfares aren't designed for trucks carrying bulk E-mail or busloads of sightseers cruising along the Web. The new infobahns are strictly for science, and they're changing the way researchers work.

New computing, networking, and data-storage technologies are melding into meganetworks aimed at researchers' growing need to put their heads together, even if only virtually, with diverse specialists around the world. These collaborations aim to analyze complex images and large amounts of data, sharing and manipulating pictures and numbers in real time.

"We have a torrent of data that this new generation of instruments is producing," says Dan Reed, director of the Renaissance Computing Institute spanning the campuses of Duke University, the University of North Carolina, and North Carolina State University. "The problems that we want to solve increasingly require the assembly of teams that are not only geographically distributed, but also multidisciplinary....

RIDING THE RAIL

Among these emerging networks, the National Lambda Rail (NLR) is literally at the bottom of the heap; it's the carrier on which many distributed research projects will, and already do, ride. The "rail" comprises more than 11,500 miles of coast-to-coast optical fiber that had been lying idle after the telecommunications industry contracted in the 1990s. A consortium of regional nonprofit groups, each in turn made up of universities and research organizations, purchased the dark cables, making the NLR the only US research infrastructure owned by the research community itself.

Dubbed "lambda" after the Greek letter scientists use to designate wavelength, the rail runs along both US coasts and the central and southern parts of the country. The $125-million project is also planning a northern spur linking Chicago and Seattle through Minneapolis, as well as some additional north-south routes, such as from Atlanta to Chicago. Scientists board the NLR via fiber-optic links from their institutions to one of 27 hubs around the country.

"Because we own the infrastructure, we can be more responsive to researchers' needs and more flexible in deploying new networking technologies," says NLR CEO Tom West. Indeed, NLR has reserved as much as half its capacity for engineers developing those new technologies. "We'll be the site for new technology such as optical switches to evolve," he says.

When an innovation that boosts network speed or capacity is hatched, it might easily appear first as part of the NLR. Last November, a group of physicists showcased a new transmission protocol developed at California Institute of Technology. In so doing they won the Supercomputing Bandwidth Challenge competition, setting a world record by sending 101 gigabits per second (gbps) over the rail – fast enough to transmit the entire contents of the Library of Congress in 15 minutes. (The Internet speed record, in contrast, is barely 4 gbps.)
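As a rough check on the scale of that record, the short Python sketch below converts a sustained line rate into the volume of data moved in a given time. The function name and the decimal (SI) unit convention are assumptions made for illustration; only the 101 gbps, 4 gbps, and 15-minute figures come from the text, and the result is not an official estimate of the Library of Congress's size.

```python
def data_moved_terabytes(rate_gbps: float, minutes: float) -> float:
    """Return terabytes (10**12 bytes) moved at a sustained line rate."""
    bits = rate_gbps * 1e9 * minutes * 60   # total bits sent in the interval
    return bits / 8 / 1e12                   # bits -> bytes -> terabytes


record = data_moved_terabytes(101, 15)    # the record rate quoted above
internet = data_moved_terabytes(4, 15)    # the Internet record quoted above
print(f"{record:.1f} TB at 101 gbps, {internet:.2f} TB at 4 gbps")
# prints "11.4 TB at 101 gbps, 0.45 TB at 4 gbps"
```

Under these assumptions, 15 minutes at the record rate moves roughly 25 times as much data as the same interval at the quoted Internet record.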

The other half of the system's capacity will be reserved for research projects vetted by the NLR's scientific advisory panel. The approved projects will be granted a dedicated wavelength over which to create, find, store, share, and analyze data and to communicate with others. "In essence, we're trying to create a nationwide LAN [local area network] using Ethernet connections in 21 locations along our infrastructure," West explains. The NLR expects to have its internal Ethernet running in February. "To our knowledge, no one has tried anything like this before."

GOING GLOBAL

<p>BRINGING BRAINS TOGETHER:</p>

© Drs. Gary Glover and Lara Foland, Stanford University, Function BIRN

Function BIRN is working to understand the underlying causes of schizophrenia and to develop new treatments for the disease. The effort brings together researchers from different segments of functional neuroimaging to determine the role of frontal and temporal lobe dysfunction in schizophrenia, and to assess the impact of treatments on functional brain abnormalities.

Though located in the US, the NLR has more global ambitions: the project is also part of the Global Lambda Integrated Facility (GLIF). Embracing 60 researchers from the six inhabited continents, GLIF is smoothing out kinks so that the NLR and its equivalents in other countries can eventually join seamlessly to circle the globe.

That integration is already underway through DANTE, the Delivery of Advanced Network Technology to Europe. Based in the UK, the project is owned and operated by a consortium of European networks. DANTE, in turn, is the parent of GÉANT, a collaboration of 26 national research and education networks spread across 30 European countries. The partnership coordinates such projects as BioCASE, a network-wide integration of biomedical databases from Reykjavik to Budapest.

Wiring Europe both east and west, GÉANT is Europe's representative in the TEIN2 Project, connecting Western researchers to colleagues in Asia and the Pacific Rim, and in ALICE, a Latin American project. It also forms part of EUMEDConnect, which joins scientists and clinicians on the continent to medical study and treatment centers along the Mediterranean from Turkey to Morocco.

ALICE provides teleconsultation services in Latin America, allowing medical patients to be scanned for a variety of conditions, from pregnancy to cancer. When necessary, scans can be referred to specialists in Europe through GÉANT, which links to the Latin network via a trans-Atlantic cable between Madrid and Sao Paulo.

Along the northern US border, those projects may eventually flow seamlessly into the CANARIE network, spanning Canada from Vancouver Island on the Pacific to the windswept shores of Newfoundland. CANARIE's optical network unites more than 150 colleges, universities, hospitals, and research centers with colleagues in more than 40 other countries. The network is being used to store images as part of the medical records of cancer patients, and to enable physicians to monitor the health of patients requiring long-term care in Canada's hinterlands.

In the southern hemisphere, CANARIE's initial success inspired the Australian Academic and Research Network (AARNet), a 10-gbps Internet-based Web that shuttles data among 37 universities "down under" as well as to colleagues abroad. Among other ventures, the network is part of a nascent project to help far-flung infectious disease experts swap complex images and data instantly.

In a test run, Pittsburgh scientists were able to zoom in and read the finest print on a business card in Canberra, Australia, which means the link "could realistically be used for remote diagnosis and other telehealth imaging requirements where detail is critical," says George McLaughlin, AARNet's CEO. The young network already has been credited with shortening China's 2003 SARS epidemic.

Though the NLR cannot predict a date for the wider integration of these regional nets, West expects the Rail will have all its elements in place within a year. "We're hoping for June," he confides.

POWERED BY TERAGRID

One of the NLR's chief clients is the TeraGrid, which describes itself as "a multiyear effort to build and deploy the world's largest, most comprehensive, distributed infrastructure for open scientific research." Spread across nine research universities and supercomputing centers, the grid offers not only state-of-the-art imaging and tools for seamless grid computing, but also 20 teraflops of computing power and nearly a petabyte of online storage space. (A teraflop is a trillion floating-point operations per second. A petabyte is 2^50 bytes, or roughly 1,000 terabytes.)

"The TeraGrid is a natural follow-on to the NSF's national supercomputing centers," says Charlie Catlett, director of the TeraGrid's grid infrastructure group. He also managed the TeraGrid's assembly during its three-year construction phase, which ended last September. "The technology for taking a supercomputer here and a database there and putting them together in a distributed system only matured within the last few years, to the point where we could think about building a production grid system."

TeraGrid formally came online in 2001. It was originally conceived as a way to unite and synergize the data processing and storage capacities at Argonne National Laboratory, Caltech, the National Center for Supercomputing Applications at the University of Illinois, and the San Diego Supercomputer Center. Since then, the Pittsburgh Supercomputing Center, Oak Ridge National Laboratory, the Texas Advanced Computing Center, and Purdue and Indiana Universities have joined, forming a nine-member network linked by a dedicated, 40-gbps fiber-optic line. It is, says Catlett, "the world's fastest network for this purpose, as far as we know."

Steering the vehicles on this particular byway is a suite of open-source grid software developed by the Argonne-based Globus Alliance, which is installed at participating TeraGrid sites. "That gives scientists the ability to run applications on multiple remote resources with a single login across all sites and to do high-performance data transfer," Catlett explains. "What we've developed isn't so much a piece of software as a specification for what software, tools, compilers, and the general environment will look like for all machines at all locations for all users."

<p>THE NATIONAL LAMBDA RAIL</p>

© 2004 National Lambda Rail, Inc.

Connecting cables was the easy part. To give researchers access to the wealth that lay along the fibers, Catlett's group had to develop an intricate accounting system. In the past, a researcher granted 1,000 hours of CPU time at, say, the San Diego center and 100 units of time at Pittsburgh couldn't transfer San Diego time to Pittsburgh if he used up his allotment; he had to stand idle until California time became available, even if Pittsburgh could accommodate him. "There was no way to change francs into Deutschmarks," Catlett explains. "The TeraGrid needed to create the equivalent of the Euro that researchers could spend at any site without difficulty, which, in turn, required us to develop the software equivalent of a currency exchange."
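The "currency exchange" Catlett describes can be pictured with a small Python sketch: a researcher's grant is held in one site-neutral unit, and each site converts local CPU-hours into that unit when a job is charged. Everything here is hypothetical and for illustration only, including the site keys, the exchange rates, and the `Allocation` class; the article does not describe how TeraGrid's accounting software is actually built.

```python
from dataclasses import dataclass

# Hypothetical exchange rates: common units charged per local CPU-hour.
# These numbers are invented for illustration, not TeraGrid's real rates.
RATES = {"san_diego": 1.0, "pittsburgh": 1.5}


@dataclass
class Allocation:
    """A researcher's balance, held in a site-neutral common unit."""
    balance: float

    def charge(self, site: str, cpu_hours: float) -> float:
        """Debit a job run at `site`; return the common units spent."""
        cost = cpu_hours * RATES[site]
        if cost > self.balance:
            raise ValueError("insufficient allocation")
        self.balance -= cost
        return cost


grant = Allocation(balance=1000.0)
grant.charge("san_diego", 400)     # spends 400 common units
grant.charge("pittsburgh", 200)    # spends 300 common units at the 1.5 rate
print(grant.balance)               # prints 300.0
```

Because the balance is not tied to any one center, time left unused at one site can be spent at another, which is the flexibility the older per-site allotments lacked.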

About 100 research projects and 700 individual scientists already travel the TeraGrid, with another 100 waiting to be granted access. "The TeraGrid is not an exclusive club," Catlett says. An institution "should be able to join pretty painlessly," as long as it has something to offer that justifies the cost of incorporating it into the network, notes Dan Reed, one of TeraGrid's originators. "There's no cash fee to be paid to join," he says. "The litmus test is whether you bring some critically important resource or intellectual capability from which the community can benefit. If you join, are we all better off?"

Researchers have no doubt as to the benefits. A group that includes biophysicist Klaus Schulten of the University of Illinois at Urbana-Champaign uses supercomputers in Pittsburgh and Illinois via TeraGrid to explore the mechanical properties of proteins. "Through the TeraGrid, we're now able to bring more of the power of the NSF's supercomputing centers to the researcher's bench top," Schulten says. "You're sitting at your own terminal, but you have one of the fastest computers in the world at your fingertips. We can be with colleagues in our own labs and interact with this simulated 'live' protein – pulling it, straining it, testing hypotheses, all in real time."

To bring the same power to others, TeraGrid is fashioning topic-specific "science gateways." Each offers a Web-portal opening onto databases, software, and an array of specialized tools to help a research community carry out its work. Nine have been created so far.

The National Institutes of Health recently pledged $18 million to create one such gateway at the University of Chicago. Named the National Microbial Pathogen Data Resource Center, it will stockpile pathogen genomes as well as pool knowledge and data about their evolution and biology, and link the NIH's eight new Regional Centers of Excellence for Biodefense and Emerging Infectious Diseases. Catlett's vision is that, like computers sporting an "Intel Inside" sticker, more and more US research will figuratively carry the legend, "Powered by TeraGrid."

'BIRNING' UP THE HIGHWAY

The NLR and the TeraGrid are only two of the information expressway's new lanes. Another, and one that sometimes hitches rides on the other two, is the Biomedical Informatics Research Network (BIRN). The $20 million-per-year NIH initiative is developing software to link and smooth connections among supercomputing centers, databases, and an integrated suite of imaging tools. Through desktop Web portals, researchers in 22 brain research groups at BIRN's 15 participating universities can use the digital infrastructure to tap an array of federated databases and the data-crunching power of the nation's supercomputing centers.

"BIRN has a fairly specific task," explains Eric Jakobsson, director of the Center for Bioinformatics and Computational Biology at the NIH's National Institute of General Medical Sciences. "It has an array of imaging tools, a particular network of researchers, and is part of the job of integrating basic knowledge about neuroscience with high-quality imaging." The network is also testing a new style of collaborative research and resource organization that the NIH hopes will spread.

To carry out both missions, BIRN has launched three initiatives. Mouse BIRN is studying animal models of illnesses such as Parkinson disease and brain cancer; Function BIRN is looking into disorders in regions of the brain as schizophrenia progresses; and Morphometry BIRN is investigating unipolar depression, Alzheimer disease, and cognitive impairment.

<p>TOWARD A GLOBAL VILLAGE:</p>

Supporting both real-time collaboration and remote "telehealth" services, ultra-high-speed information networks like National Lambda Rail (left) and AARNet (right) are serving to build a virtual community of researchers and clinicians around the world.

Neuroscientist Greg McCarthy, director of the Brain Imaging and Analysis Center at Duke University and the University of North Carolina, is part of the Function BIRN team. His nationally scattered research group conducts videoconferences regularly to design projects, analyze results, and brainstorm.

"The kinds of responses we get from these tasks can be used in a variety of ways that we haven't been able to before," McCarthy says. "For example, they can be related to a specific genetic polymorphism, which means you need a large number of subjects. The idea is not to look at 10 or 12 local subjects but eventually thousands" drawn from a variety of populations. "You could use this kind of network to assess a potential drug's effects before you spend huge amounts of time and money" bringing it to animal or human trials, he adds.

That milestone won't be passed soon. "We're building the infrastructure as we're learning to use it," McCarthy says. BIRN already has yielded new insights into the mechanisms of depression and brain morphology, and in 2003, helped Japanese and US researchers collaborate on climate research.

The new meganetworks might even allow McCarthy to realize a long-held dream. "I've fantasized about building a new center," he says, "colocating people to interact on some important aspect of science. In the old days, my fantasy was set in a building full of psychologists. But now I need to talk to engineers, physicians, statisticians, physicists – a different collection every day. So my fantasy became, how do you create these virtual environments where you can interact with these people but preserve that same kind of water-cooler quality? The use of these integrated cyber-infrastructures is going to allow us to create that kind of virtual presence."

It's at that point that the hardest work of creating a meganetwork might really begin. Issues of data security, intellectual property, and privacy must be negotiated, which means changing scientists' attitudes at least as much as their technology. "More important than having a network to move images around is having people who see the value of moving images around," McCarthy says. "We make our bones professionally by differentiating ourselves – 'my task is better than your task' – and that drives a lot of science. But it's also true that if we want to get someplace quickly, we have to work together."

McCarthy admits an initial reluctance to join the network. "I thought, 'These data are my data.' But then I realized that they're not my data. In many cases, these data were bought by the US government. Getting every last ounce of worth from them by allowing other people to work on them makes a lot of sense to me now."

"There's a cultural change underway," says Reed. "When the larger community sees the early adopters of these network opportunities gaining an intellectual advantage and doing things that traditional methods won't allow, then people jump on the bandwagon."

For Schulten, it was an easy hop. "I'm not flying around to meetings any more," he says. "I'm back in my lab. I almost don't notice the network now because it connects so quickly, and that lets me concentrate on my research."