A guide to free software for constructing and assessing species relationships
August 1, 2011
CORRIE SAUX MOREAU, FIELD MUSEUM OF NATURAL HISTORY
Constructing an evolutionary tree can seem as unappetizing as filing taxes to those not fluent in computer-speak. But, alas, learning how one organism relates to another is often a necessary first step in approaching biological questions, be they about the evolution of drug-resistant strains or the origin of body parts. Advanced software for aligning genetic or protein sequences and constructing phylogenies exists, but most programs require entering lines of computer script. Richard Ree, an evolutionary biologist at the Field Museum of Natural History in Chicago, explains that the scant commercial interest in developing phylogenetics software has forced biologists to largely write programs on their own. “As a result, the user interface tends to suffer because we don’t have the time to go back and make it user-friendly.”
But fear not: point-and-click tree-building and tree-visualization programs do exist, and they might be all you need to get where you’re going if phylogenetics isn’t your long-term calling. As a service to biologists with deep ideas but a phobia of JavaScript and “R,” The Scientist presents a tour of free software for aligning sequences, building phylogenies, learning about evolution, and showing off a clear, visually pleasing final tree in presentations and publications.
The first step of any DNA or protein sequence comparison is to align sequences so that homologous nucleotide or amino acid positions line up across taxa. Once you have reliable DNA or protein sequences, you’ll need to convert each one into a text-based format called FASTA, if it’s not already in that style. To do so, copy and paste your sequence into any text editor or word-processing document, then give the sequence an identifying label on its own line that begins with “>”. Insert the sequence on the line or lines that follow the label. If it’s a protein, it should look something like this:
>gi|5524211|gb
LCLYTHIGRNIYYGSLP
LYSETWNTGIMLLLITMATAFMGY
If you’re adding sequences from GenBank, just download them in FASTA format and copy and paste them into the same file. Save the file of all your FASTA sequences as a .txt file.
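To check that a combined file is well formed before handing it to an alignment program, a few lines of Python can read it back. This is only a minimal sketch: the function name parse_fasta and the labels taxon_A and taxon_B are made up for illustration, and it assumes the plain layout described above (a label line starting with “>”, followed by one or more sequence lines).

```python
def parse_fasta(text):
    """Return a dict mapping each label (without the '>') to its full sequence."""
    records = {}
    label = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            label = line[1:]
            records[label] = []
        elif label is not None:
            # A sequence may be wrapped over several lines; collect them all.
            records[label].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical two-record file; the labels are placeholders.
example = """>taxon_A
LCLYTHIGRNIYYGSLP
LYSETWNTGIMLLLITMATAFMGY
>taxon_B
LCLYTHIGRNIYYGSLPLYSETW
"""

records = parse_fasta(example)
for name, sequence in records.items():
    print(name, len(sequence))
```

If a label is missing its “>” or a sequence ends up under the wrong label, the printed names and lengths will make that obvious.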
A popular workhorse for alignment is Clustal, but there are many others. Platforms such as SeaView drive various alignment and phylogeny programs, including Clustal, and make them easier by simplifying them to their most basic features. “These online resources take some of the difficulty out of running particular programs, which is half the battle,” says Corrie Moreau, a Field Museum biologist who specializes in ant evolution.
To use Clustal via SeaView, open your .txt file in SeaView. Your sequence names will appear in the left pane and the corresponding sequences in the right pane. Click Align → Alignment options and select Clustal (SeaView drives the version ClustalW2). Next click Align → Align all. A window showing the progression of the alignment procedure will appear. Save the completed alignment as a NEXUS file. You’re now ready to make a tree.
Before jumping into one of the many phylogeny programs available, think about what you ultimately want to know. If you simply need a tree of relationships, then maximum-likelihood programs like RAxML, parsimony programs like TNT, or Bayesian probability programs like MrBayes will do the job. Although these three types of programs use different mathematical methods to analyze evolutionary relationships, the resulting trees should be quite similar. While some phylogeneticists adhere to a single method, many biologists prefer to confirm their work by using two or three. Platforms like SeaView make some of these programs and others simpler to use, but be prepared to consult the program manual.
If you want to assess when organisms evolved, you’re in luck, because the phylogeny program BEAST makes that task less daunting. Moreau’s lab uses BEAST because it can incorporate fossil evidence, geologic data, and known mutation rates to estimate species relationships and divergence times simultaneously.
With the BEAST folder open, double-click on BEAUti, BEAST’s graphical user interface. In BEAUti, select File → Import Alignment and select your NEXUS-formatted alignment. What you do next depends on how you want to measure time: via fossils, geology, and/or mutation rate. Moreau uses fossils and geology to set age limits. “If I have a fossil and I know it belongs in the same group as some of my ants, I tell BEAST that group of ants must be at least as old as the fossil,” she explains. “Or if a group of ants is endemic to an island, I know that group can’t be older than the island.” Alternatively, if a gene you’ve sequenced to create your phylogeny has a known rate of mutation, BEAST can use it to estimate when each taxon originated.
To enter fossil or geological information, click on the Priors tab and highlight the group of taxa related to the fossil, as well as the organism most closely related to this group. Enter the age of the fossil or geological cue (e.g., the age of the island) into the section labeled “TMRCA” (time to the most recent common ancestor). To enter a known or estimated mutation rate, click the Clock Model tab, select Strict Clock, and insert the rate. For help or to explore other functions, check out online tutorials or the BEAST user group, which is monitored by the developers who wrote the program.
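To get a feel for what a strict clock implies, here is a back-of-the-envelope calculation in Python. It is not what BEAST does internally (BEAST estimates dates with a Bayesian model and reports uncertainty); it only illustrates the underlying arithmetic, assuming two lineages that each accumulate substitutions at the same constant rate, so that their pairwise distance is twice the rate multiplied by the time since they split. The function name and the numbers are hypothetical.

```python
def divergence_time(pairwise_distance, rate_per_site_per_year):
    """Estimate years since two lineages split under a strict molecular clock.

    pairwise_distance: substitutions per site separating the two sequences.
    rate_per_site_per_year: substitutions per site per year along one lineage.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("the rate must be positive")
    # Both lineages have been changing since the split, hence the factor of 2.
    return pairwise_distance / (2.0 * rate_per_site_per_year)


# Hypothetical values: 4 percent sequence divergence and a rate of
# 1e-8 substitutions per site per year give roughly 2 million years.
years = divergence_time(0.04, 1e-8)
print(f"{years:,.0f} years")
```

A fossil or island age entered as a prior plays the complementary role: it pins down a date so the program can work out the rate instead.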
After you save your settings as an XML file, go back to the BEAST folder, open BEAST, and select Run. When the program has finished running, import the resulting tree file into TreeAnnotator (also in the BEAST folder). Because it’s impossible to determine the tree with 100 percent certainty, BEAST generates many plausible trees, each with an associated probability. As a result, the data file generated directly from BEAST is too large to read as a single tree. TreeAnnotator singles out one representative tree and annotates it with information summarized from the other probable trees. For example, if a large proportion of the plausible trees agree on a relationship between A and B, it will indicate that the relationship between A and B is well supported. Save this tree as a .tree file. Next, open your .tree file in FigTree. Here you can arrange other outputs of the program, such as divergence dates (with their corresponding error bars). Save this tree as a NEXUS file. Among other information within that file, a line full of parentheses (such as “(orangutan,(chimp,human));”) will encode your tree in a format known as Newick, which most phylogeny-related programs understand.
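The parenthetical notation is easier to read once you see how a program unpacks it. In standard Newick, each pair of parentheses encloses a group, sister lineages are separated by commas, and the line ends with a semicolon, as in (orangutan,(chimp,human));. The Python sketch below, with the made-up function names parse_newick and leaves, turns such a line into nested groups. It is a simplified reader that assumes unquoted names and simply discards branch lengths and internal labels; real files can carry more annotation than this handles.

```python
def parse_newick(newick):
    """Parse a simple Newick string into nested tuples of taxon names."""
    s = newick.strip().rstrip(";")
    pos = 0

    def peek():
        # Return the current character, or "" at the end of the string.
        return s[pos] if pos < len(s) else ""

    def read_label():
        # Read up to the next structural character; drop any ":length" suffix.
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in "(),":
            pos += 1
        return s[start:pos].split(":")[0].strip()

    def parse_node():
        nonlocal pos
        if peek() != "(":
            return read_label()  # a tip: just a taxon name
        pos += 1  # consume "("
        children = [parse_node()]
        while peek() == ",":
            pos += 1  # consume ","
            children.append(parse_node())
        if peek() != ")":
            raise ValueError("unbalanced parentheses in Newick string")
        pos += 1  # consume ")"
        read_label()  # skip any internal label or branch length
        return tuple(children)

    tree = parse_node()
    if pos != len(s):
        raise ValueError("unexpected text after the end of the tree")
    return tree


def leaves(tree):
    """List the taxon names at the tips, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


tree = parse_newick("(orangutan,(chimp,human));")
print(tree)
print(leaves(tree))
```

Here chimp and human come back grouped together, with orangutan as their sister lineage, which is exactly the relationship the parentheses encode.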
Now that you have a tree, you’re ready to test ideas about how or why those organisms diversified. Did a horned beetle give way to many horned species, or did these horned species arise independently from a beetle with a smooth noggin? This might sound like a simple question, but when you have 100 taxa and 8 character states (e.g. tall horn, jagged horn), you’ll need to infer the state of the ancestor between each pair of organisms, down to the root of the tree. For this problem, Ree recommends Mesquite, a graphics-oriented program that handles questions of character evolution, patterns of species diversification, inquiries about population genetics, and more.
Open Mesquite and click on File → New. Indicate how many taxa you have in your tree, and at the prompt, create a character matrix. If the features you’d like to enter are discrete, click Categorical Matrix. If they are continuous, like height, click Continuous Matrix. Next enter your taxa and character states in the matrix provided. If it’s a measurement, enter in the numbers without units. Finally, upload the NEXUS file containing your tree.
As with building trees, you can estimate ancestral character states with parsimony or maximum likelihood. Parsimony will find the solution with the fewest changes. (This is your only option with continuous characters.) Do a parsimony analysis by clicking on Analysis → Trace Character History → Parsimony Ancestral States. The inferred ancestral states will then appear at the nodes.
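For a discrete character such as horned versus smooth, the logic of parsimony can be sketched with the classic Fitch procedure: work from the tips toward the root, keep the states two sister groups share, and count one change whenever they share none. The Python below is an illustration of that idea only, not Mesquite’s own code. It assumes a fully bifurcating tree written as nested pairs, and the beetle names and states are invented.

```python
def fitch(tree, tip_states):
    """Return (possible states at this node, minimum number of changes below it).

    tree: a taxon name (str) or a pair (left_subtree, right_subtree).
    tip_states: dict mapping each taxon name to its observed state.
    """
    if isinstance(tree, str):
        return {tip_states[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch(left, tip_states)
    right_states, right_changes = fitch(right, tip_states)
    changes = left_changes + right_changes
    shared = left_states & right_states
    if shared:
        # The sister groups agree on at least one state: no extra change needed.
        return shared, changes
    # No state in common: keep both possibilities and count one change.
    return left_states | right_states, changes + 1


# Hypothetical four-beetle tree and character states.
beetles = ((("beetle_A", "beetle_B"), "beetle_C"), "beetle_D")
horns = {
    "beetle_A": "horned",
    "beetle_B": "horned",
    "beetle_C": "smooth",
    "beetle_D": "smooth",
}

root_states, n_changes = fitch(beetles, horns)
print(root_states, n_changes)
```

In this toy case the root comes out smooth with a single change, meaning horns are inferred to have arisen once, in the ancestor of beetle_A and beetle_B.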
Maximum likelihood, on the other hand, takes into account branch lengths when determining an ancestral state. The program will be less certain about the state of an ancestor connecting two species that split millions of years ago. A small pie chart at each node indicates this probability, and lower probabilities will reverberate at later nodes. To run a maximum likelihood analysis, go to Trace → Reconstruction Method → Likelihood Ancestral States.
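Why longer branches mean less certainty can be seen with a toy model. The Python sketch below assumes a character with two states that flips in either direction at the same constant rate, which is a simplifying assumption and not necessarily the model Mesquite applies to your data. Under that assumption, the chance that a descendant differs from its ancestor rises with branch length and levels off at one half, at which point the descendant tells you nothing about the ancestor. The function name, rate, and branch lengths are hypothetical.

```python
import math


def prob_state_changed(rate, branch_length):
    """Probability that the state at the end of a branch differs from the start.

    Assumes a symmetric two-state model: the character flips in either
    direction at the same constant rate per unit of branch length.
    """
    return 0.5 * (1.0 - math.exp(-2.0 * rate * branch_length))


# Hypothetical rate of 0.1 changes per unit time, over ever longer branches.
for branch_length in (0.1, 1.0, 10.0, 100.0):
    print(branch_length, round(prob_state_changed(0.1, branch_length), 3))
```

On a short branch the descendant almost certainly keeps the ancestral state, so the tip is strong evidence about the node; on a very long branch the probability approaches a coin flip, which is what a nearly evenly split pie chart reflects.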
Anyone who’s looked at trees with more than 30 taxa knows they aren’t simple to read. Dozens of parallel and perpendicular lines blend, and it’s hard to see the story they tell. University of Arizona phylogeneticist Michael Sanderson recommends Dendroscope to make sense out of what you see.
Begin by uploading the NEXUS file containing your tree into Dendroscope. On the tool bar you’ll notice icons for different sorts of trees: ones with diagonal connections, with branches radiating out from the center, with the main groups separated by long branches, and others. Click on each of these to see what your tree will look like in each format—the relationships stay the same.
If you’d like to highlight one group of taxa, press the shift key and click on a branch within that group. This will change the color of these branches. Open the Format window, and under Edit, change the font, color, and width of lines. Once you like what you see, export the file as a JPEG, PDF, GIF, or another format.
For a killer 3-D presentation, upload your NEXUS file into a visualization program called Paloverde, and click on the icon illustrating the form of 3-D tree you prefer. Paloverde works well for visualizing moderately large trees, between 100 and 2,500 taxa.
Alternatively, if you have reliable information about where each organism was collected, you can spread your phylogeny over the surface of the globe with GeoPhylo, a program that projects phylogenies over Google Earth or NASA World Wind (you’ll have to download these programs first). Copy and paste the parenthetical line from the NEXUS file generated by your tree-building program into the Rooted Tree Box in GeoPhylo. Under the Coordinates and Data tab, enter the longitude and latitude where each taxon was found. Click Run, and your tree will be displayed over the Earth.
Andrew Hill, a graduate student at the University of Colorado, Boulder, who developed GeoPhylo with his advisor, Robert Guralnick, used it to explore the spread of avian influenza. First, they constructed a phylogeny of influenza viruses, particularly those with drug resistance-conferring mutations. They then projected the tree over the globe to see how those lineages arose and spread around the world.