Structure Made Simple

Mar 1, 2010
Jeffrey M. Perkel

A step-by-step guide to reaching into structural biology databases and extracting the most for your research.

There’s certainly no shortage of structural biology data today, but that doesn’t make it easy to use.

Between technical advances and high-throughput structural genomics, structure databases are stuffed to overflowing with multicolored renderings of proteins, nucleic acids, and macromolecular complexes. As of January 2010, more than 62,000 structures had been deposited in the RCSB Protein Data Bank, more than 50,000 of them since 2000.

Such data have tremendous value, of course. They can guide rational drug design efforts, put genetic mutations into physical context, and identify structural features of interest. But it isn’t easy to extrapolate from published figures alone. “The pictures [in journals] are hard to make sense of,” says Eric Martz, emeritus professor of microbiology at the University of Massachusetts, Amherst, who designs software tools for structure visualization. “They are very complex three-dimensional pictures squashed flat on a page.” Plus, journal figures only highlight features important to the authors; other researchers may be interested in different regions or residues.

Biologists therefore must be prepared to dig into the structural data themselves. Yet if the average researcher lacks the tools and training to visualize and manipulate these data, call out key elements, and—most important—develop hypotheses for future experiments, then the very researchers who might benefit the most from structural biology efforts ultimately will fail to do so.

So how can you leverage structural data to answer questions in your own research? The Scientist asked structural biologists and software designers to identify and answer some key questions structure newbies might ask. Here’s what they said.

Published structures are deposited in the Protein Data Bank (PDB), where each receives an accession number, or PDB ID; simply enter that number (often listed in the structure’s publication) and go. (You may also view structures associated with published articles directly from their PubMed records.) Each PDB page includes a static 3D thumbnail of the structure, as well as links to other viewers that let you rotate the structure freely, zoom in and out, identify residues, and so on.
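
For readers who would rather script this step, the sketch below downloads a coordinate file by its PDB ID and pulls out the atom records. It assumes the RCSB's current download address pattern (https://files.rcsb.org/download/<ID>.pdb), which is not from this article and may change, and it reads only the fixed columns of the legacy PDB text format. The helper names are hypothetical.

```python
import urllib.request

# Assumption: the RCSB serves legacy-format files at this URL pattern today.
# The address is not from the article and may change.
RCSB_URL = "https://files.rcsb.org/download/{pdb_id}.pdb"


def pdb_url(pdb_id: str) -> str:
    """Build the download URL for a four-character PDB ID such as 4HHB."""
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return RCSB_URL.format(pdb_id=pdb_id.upper())


def fetch_pdb(pdb_id: str) -> str:
    """Download a coordinate file as text (needs network access)."""
    with urllib.request.urlopen(pdb_url(pdb_id)) as response:
        return response.read().decode("ascii", errors="replace")


def parse_atoms(pdb_text: str) -> list[dict]:
    """Read ATOM and HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        atoms.append({
            "atom": line[12:16].strip(),      # columns 13-16
            "res_name": line[17:20].strip(),  # columns 18-20
            "chain": line[21],                # column 22
            "res_seq": int(line[22:26]),      # columns 23-26
            "xyz": (float(line[30:38]),       # columns 31-38
                    float(line[38:46]),       # columns 39-46
                    float(line[46:54])),      # columns 47-54
        })
    return atoms
```

With network access, `parse_atoms(fetch_pdb("4HHB"))` would return the atoms of a human deoxyhemoglobin entry. Very large structures may be distributed only in mmCIF format, which this sketch does not read.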

One of the viewers offered by the PDB, Jmol, is a Java-based viewer that requires no browser plug-in beyond Java itself and is controlled via a scripting language and a right-click menu. Alternatively, try FirstGlance in Jmol, which simplifies the Jmol interface; written by Martz, it is now used by Nature and Nature Structural & Molecular Biology (via the 3D View link in the table of contents or in the full-text article). Another option: Proteopedia.org, a structure wiki developed by Joel Sussman, director of the Israel Structural Proteomics Center at the Weizmann Institute of Science, and Weizmann colleagues Jaime Prilusky and Eran Hodis. (Martz also contributes heavily to this resource.)

FirstGlance, Martz says, provides select “canned views” of a structure via “one-click operations,” for instance, to visualize a protein’s charge distribution or hydrophobicity. But it doesn’t provide any information beyond what’s found in the PDB record itself. In contrast, Proteopedia pages can be extensively annotated with user-designed 3D “scenes” that highlight specific features. The hemoglobin entry, for instance, includes Jmol animations identifying the glutamic acid mutated in sickle cell anemia and the contacts holding the heme groups in place, as well as a tool to visualize structural differences between the oxy and deoxy forms of the protein. “The picture is linked to the scientific text in a way that makes it meaningful,” says Sussman, who adds that anyone can create such pages, whether for educational purposes (e.g., at Madison West High School in Madison, Wis.) or simply to share knowledge.

Explains Sussman, “You could download the coordinates from PDB and make the pictures yourself, but here [on Proteopedia], somebody has made the picture for you.” The result, he says, is a ready-made tutorial, where “each image is always oriented in the same way, and you don’t get lost. In a paper, every image is oriented in a different way, and it’s hard to follow what’s happening.”
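
A “canned view” such as FirstGlance’s charge display essentially sorts residues into classes and colors each class. The sketch below illustrates that idea under simple textbook assumptions: Lys and Arg count as positive, Asp and Glu as negative, His as neutral, and chain termini are ignored. The class sets, colors, and function names are hypothetical and do not reproduce FirstGlance’s actual scheme.

```python
# Hypothetical residue classes at roughly neutral pH; His is treated as
# neutral and chain termini are ignored, both simplifications.
POSITIVE = {"LYS", "ARG"}
NEGATIVE = {"ASP", "GLU"}
HYDROPHOBIC = {"ALA", "VAL", "LEU", "ILE", "MET", "PHE", "TRP", "PRO"}

# Hypothetical display colors, one per class.
COLORS = {"positive": "blue", "negative": "red", "hydrophobic": "gray", "other": "white"}


def classify(res_name: str) -> str:
    """Assign a three-letter residue name to one display class."""
    name = res_name.upper()
    if name in POSITIVE:
        return "positive"
    if name in NEGATIVE:
        return "negative"
    if name in HYDROPHOBIC:
        return "hydrophobic"
    return "other"


def color_map(residues: list[tuple[str, int]]) -> dict[int, str]:
    """Map residue number to display color for (name, number) pairs."""
    return {number: COLORS[classify(name)] for name, number in residues}


def approximate_net_charge(residues: list[tuple[str, int]]) -> int:
    """Count positive minus negative side chains (a rough estimate only)."""
    classes = [classify(name) for name, _ in residues]
    return classes.count("positive") - classes.count("negative")
```

For the five residues Lys1, Glu2, Arg3, Leu4, and Ser5, the sketch colors residues 1 and 3 blue, 2 red, 4 gray, and 5 white, and estimates a net side-chain charge of +1.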

No structure? No problem; try a close relative. Often, related structures have been solved from which you can extrapolate information about your particular protein, assuming you can find them.

Recently, Martz helped UMass colleague Steven Sandler, a professor of microbiology, obtain a so-called "homology model" for E. coli DnaC, a DNA-repair enzyme Sandler studies, using SWISS-MODEL (see Nature Protocols, 2008, doi:10.1038/nprot.2008.197 for detailed information on using SWISS-MODEL).

Based on a submitted protein sequence, SWISS-MODEL "automatically determines whether there is a crystallographic structure in the Protein Data Bank that can be used as a template for homology modeling—based largely on the level of identity between the sequence of the unknown structure and the crystal structure," Martz says. "If a suitable template exists, it builds the model for you and you can download it."
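
The quantity Martz describes, sequence identity between your protein and a candidate template, is simple to compute once the two sequences are aligned. This minimal sketch assumes an existing pairwise alignment with "-" marking gaps and counts identity only over columns where both sequences have a residue, which is one of several conventions in use. The 30 percent cutoff is a hypothetical rule of thumb; SWISS-MODEL's own template selection is not reproduced here.

```python
def percent_identity(aligned_query: str, aligned_template: str) -> float:
    """Percent of gap-free alignment columns in which the residues match."""
    if len(aligned_query) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap.
    pairs = [(a, b) for a, b in zip(aligned_query, aligned_template)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a.upper() == b.upper())
    return 100.0 * identical / len(pairs)


def looks_like_usable_template(identity: float, cutoff: float = 30.0) -> bool:
    """Hypothetical threshold check; real template selection is more involved."""
    return identity >= cutoff
```

For the toy alignment "ACDE-G" versus "ACDFHG", five columns are gap-free and four of them match, giving 80 percent identity.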

In the event a suitable homolog cannot be found, Proteopedia recommends querying TargetDB, the structural genomics target registration database. "In some cases, a sequence-similar protein has already been crystallized and diffracted, but the model may not have been completed, or the completed model may not yet have been deposited in the PDB," the site advises. "In such cases, it may be worthwhile to contact the team that has made the most progress on a closely related sequence."

For Sandler, the goal of finding a homology model for DnaC was not mere curiosity. Given 12 DnaC "suppressor" mutants, he wanted to relate phenotype to genotype by mapping each mutation onto the structure. "The question was, how could we understand the biology of these mutations in the context of a known structure?" Sandler says.

If that's your aim, too, open your model in FirstGlance in Jmol, select "Find…", and enter a comma-delimited list of residues (e.g., "Ser110, Thr125, Gln137"). The selected residues will light up with small yellow "halos," one for each atom in the amino acid. (By default, Jmol displays a ribbon drawing of the protein without side chains; if you wish to see more detail, click on "Vines," which gives a ball-and-stick representation.)
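
The same "Find" step can also be scripted: parse the comma-delimited list and pick out every atom belonging to those residues. The sketch assumes atoms have already been read into (residue name, residue number, atom name) tuples from a single chain, and it also reports requested residues that are absent, since a homology model may not cover the whole protein. Function names are hypothetical.

```python
import re

# A token such as "Ser110": three-letter code followed by a residue number.
RESIDUE_TOKEN = re.compile(r"^([A-Za-z]{3})\s*(-?\d+)$")


def parse_residue_list(text: str) -> list[tuple[str, int]]:
    """Turn "Ser110, Thr125, Gln137" into [("SER", 110), ...]."""
    residues = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue
        match = RESIDUE_TOKEN.match(token)
        if match is None:
            raise ValueError(f"cannot parse residue {token!r}")
        residues.append((match.group(1).upper(), int(match.group(2))))
    return residues


def find_residues(atoms: list[tuple[str, int, str]],
                  wanted: list[tuple[str, int]]):
    """Return (atoms in the wanted residues, wanted residues not in the model)."""
    wanted_set = set(wanted)
    selected = [atom for atom in atoms if (atom[0], atom[1]) in wanted_set]
    present = {(atom[0], atom[1]) for atom in selected}
    missing = [res for res in wanted if res not in present]
    return selected, missing
```

A residue reported as missing may simply lie in a region the model does not include, which is worth knowing before interpreting a mutation's position.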

For Sandler, the result was new insight into the biology of DnaC. "A lot of these mutations reside in one helix," he says. "So obviously this helix, we think, is involved in specificity." However, he adds, "the answer is not completely clear." Though most mutations mapped to a single helix, others did not, suggesting either that the intracellular structure deviates from the published one, or that there are multiple interaction points.
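
Sandler's observation that most mutations cluster in one helix can be checked by tallying mutated positions against helix boundaries. Those boundaries could come from the HELIX records of a coordinate file or from a secondary-structure assignment program; here they are simply typed in. The helix names, boundaries, and positions below are hypothetical placeholders, not DnaC data, and a single chain is assumed.

```python
OUTSIDE = "outside any helix"


def mutations_per_helix(helices: dict[str, tuple[int, int]],
                        mutated: list[int]) -> dict[str, list[int]]:
    """Group mutated residue numbers by the helix (inclusive range) containing them."""
    hits: dict[str, list[int]] = {name: [] for name in helices}
    hits[OUTSIDE] = []
    for position in mutated:
        for name, (start, end) in helices.items():
            if start <= position <= end:
                hits[name].append(position)
                break
        else:
            # No helix contained this position.
            hits[OUTSIDE].append(position)
    return hits
```

With hypothetical helices spanning residues 10 to 20 and 30 to 45, mutations at 12, 15, 33, and 50 group as two in the first helix, one in the second, and one outside any helix; the leftovers are exactly the cases that, in Sandler's data, kept the answer from being clear-cut.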

Fundamentally, mutagenesis is a two-part problem. First, you generally don't want random changes; you want to target residues that are close to a particular region of interest, such as an enzyme's catalytic site or a signaling molecule's protein-protein interaction domain. Second, you'll want to know which (if any) of those candidate sites have been conserved over evolutionary time, as that would suggest they serve important architectural or functional roles.

"If you mutate something, you're more likely to get a dramatic result if you mutate something that is evolutionarily conserved," Martz says.

To assess evolutionary conservation, try ConSurf-DB. ConSurf-DB ("conserved surfaces") uses sequence alignments to calculate how conserved each residue is over evolutionary time, then renders the results in Jmol with a tropical color palette running from aqua to pink; the more highly conserved the residue, the pinker it appears. (ConSurf-DB data are also accessible through Proteopedia.)
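
ConSurf computes its scores with a phylogeny-aware method, so the sketch below is only a simpler stand-in that conveys the idea: it scores each column of a multiple sequence alignment by Shannon entropy (low entropy means conserved) and bins the result into nine grades, with 9 the most conserved. The nine-grade scale echoes ConSurf's, but the entropy measure, the binning rule, and the function names are assumptions.

```python
import math
from collections import Counter
from typing import Optional

MAX_ENTROPY = math.log2(20)  # entropy when all 20 amino acids are equally frequent


def column_entropy(column: str) -> Optional[float]:
    """Shannon entropy (bits) of one alignment column, ignoring gaps."""
    residues = [c.upper() for c in column if c != "-"]
    if not residues:
        return None
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in Counter(residues).values())


def conservation_grades(aligned: list[str]) -> list[Optional[int]]:
    """Grade each alignment column from 1 (variable) to 9 (conserved)."""
    if len({len(seq) for seq in aligned}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    grades: list[Optional[int]] = []
    for column in zip(*aligned):
        entropy = column_entropy("".join(column))
        if entropy is None:
            grades.append(None)  # all-gap column
            continue
        # Map entropy 0..MAX_ENTROPY onto bins 0..8, then flip so 9 = conserved.
        bin_index = min(8, int(entropy / MAX_ENTROPY * 9))
        grades.append(9 - bin_index)
    return grades
```

For the toy alignment ACD, ACE, AGF, the invariant first column gets grade 9, the mostly conserved second column grade 8, and the fully variable third column grade 6. Real alignments need many more sequences before such grades mean much.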

To address spatial proximity, use Jmol and/or Proteopedia to locate the region of interest (by selecting, for instance, the domain known to contain the catalytic active site); if you're lucky, you can even find the structure of your enzyme bound to its substrate in PDB. You can then identify nearby residues in the structure visually, or if you prefer, measure interatomic distances by double-clicking individual atoms (the Jmol cursor changes from a pointer to a cross).
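
The proximity check that Jmol supports one click at a time can also be done in bulk: list every residue with at least one atom within some cutoff of the site of interest, such as the atoms of a bound substrate. The sketch assumes atoms as (residue name, residue number, xyz) tuples in ångströms; the 5 Å default cutoff is a hypothetical choice, not a recommendation from the article.

```python
import math

Point = tuple[float, float, float]


def residues_near_site(atoms: list[tuple[str, int, Point]],
                       site: list[Point],
                       cutoff: float = 5.0) -> list[tuple[str, int]]:
    """Residues with any atom within `cutoff` angstroms of any site atom, sorted by number."""
    near = set()
    for res_name, res_seq, xyz in atoms:
        # math.dist gives the Euclidean distance between two points.
        if any(math.dist(xyz, s) <= cutoff for s in site):
            near.add((res_name, res_seq))
    return sorted(near, key=lambda residue: residue[1])
```

Crossing this list with conservation grades is one way to shortlist residues that are both close to the site and evolutionarily conserved, the two criteria described above.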

When proteins bind ligands, inhibitors, or other macromolecules, they invariably change shape. Often, structures are available showing both forms of the protein (with and without ligand). The question is, how can you visualize what's happening at the structural level?

Sussman recommends using the Yale Morph server, which takes two (or more) structure files and calculates "a plausible or semi-plausible pathway" between them, according to the site's documentation.

The resulting files may be viewed directly on the Morph site using Jmol (myosin, for instance) or uploaded to Proteopedia for further annotation. One example illustrates the predicted conformational change that occurs as the influenza virus M2 proton channel switches from an "open" state to a "closed" one; another shows the conformational change in HIV-1 protease that occurs upon binding to the antiviral drug saquinavir.
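
The Morph server's pathway calculation is more sophisticated than straight-line motion, but the basic idea of generating intermediate frames between two end states can be sketched briefly. This naive version assumes both structures list the same atoms in the same order and have already been superimposed; it linearly interpolates each atom and reports the root-mean-square deviation (RMSD) between the end states. It illustrates the concept only, is not the Yale server's algorithm, and uses hypothetical names.

```python
import math

Coord = tuple[float, float, float]


def interpolate(start: list[Coord], end: list[Coord], n_frames: int) -> list[list[Coord]]:
    """Linearly interpolate matched atoms from start to end, including both ends."""
    if len(start) != len(end):
        raise ValueError("structures must contain the same atoms in the same order")
    if n_frames < 2:
        raise ValueError("need at least two frames")
    frames = []
    for k in range(n_frames):
        t = k / (n_frames - 1)  # 0.0 at the first frame, 1.0 at the last
        frames.append([
            tuple(a + t * (b - a) for a, b in zip(p, q))
            for p, q in zip(start, end)
        ])
    return frames


def rmsd(first: list[Coord], second: list[Coord]) -> float:
    """Root-mean-square deviation of matched atoms, with no superposition step."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(first, second))
    return math.sqrt(squared / len(first))
```

Written out as one model per frame in a multi-model coordinate file, such frames could be played back in a viewer as a crude animation. Straight-line paths can pass atoms through one another, which is one reason dedicated servers do more work.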

"I think this is one of the most important features of looking at structures," Sussman says. "Your eye can see things much more easily if you see the change than by looking at two different static pictures."

Sometimes, when groups solve the same structure using different methods, they arrive at slightly different answers. Which is correct? The answer, says Dorothee Kern, professor of biochemistry and an HHMI Investigator at Brandeis University, is: both.

Proteins, Kern notes, don't adopt a single conformation. Instead, they interconvert between many structures, spending different lengths of time at each one and sampling what NMR spectroscopists call the "energy landscape." X-ray crystallography will capture whichever segment of that landscape crystallizes, whereas traditional NMR will tend to see the structures the protein adopts most frequently. Rarely, however, will either of these methods capture the entirety of the landscape.

Calmodulin, for example, contains two domains separated by a linker. In the crystal structure, Kern says, that linker appears as an alpha helix. But NMR work from Ad Bax's lab at the National Institute of Diabetes and Digestive and Kidney Diseases suggests that, in fact, that helix is only present about 10 percent of the time.
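
To connect the 10 percent figure to the energy-landscape picture, one can treat the linker as a simple two-state equilibrium (helix versus non-helix) and apply the Boltzmann relation ΔG = −RT ln(p/(1 − p)), where p is the helix population. The two-state model and the 25 °C temperature are assumptions made for illustration; real landscapes contain many states. Function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def free_energy_gap(minor_fraction: float, temperature_k: float = 298.15) -> float:
    """Free energy (kcal/mol) of the minor state relative to the major one, two-state model."""
    if not 0.0 < minor_fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return -R_KCAL * temperature_k * math.log(minor_fraction / (1.0 - minor_fraction))


def minor_fraction(delta_g: float, temperature_k: float = 298.15) -> float:
    """Inverse of free_energy_gap: population of the higher-energy state."""
    return 1.0 / (1.0 + math.exp(delta_g / (R_KCAL * temperature_k)))
```

Under these assumptions, a helix present about 10 percent of the time sits roughly 1.3 kcal/mol above the dominant non-helical state: a small gap, which is why both states are sampled.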

Ultimately, there's nothing you can do in this case but urge your friendly neighborhood structural biologist to capture additional conformations. That's not to say that the existing information is useless. Studying a single structure is a bit like judging a Ferrari's performance by looking at one parked in a garage. "A structure is a starting point," Kern explains. "This is just one little piece of information, but it is not the ultimate answer, it is just one state out of many. And it doesn't stay in that one configuration; if it only stayed in one configuration, it would not perform function."