Ancient Life in the Information Age

What can bioinformatics and systems biology tell us about the ancestor of all living things?

By Aaron David Goldman | March 1, 2014

GETTING TO THE ROOT OF THINGS: Charles Darwin drew this famous tree on page 36 in his Notebook B (1837-38) to illustrate his ideas about the relationship between living (branches with perpendicular tips) and extinct (branches with no tip) organisms that descended from a common ancestor (circled 1 at base of tree). © CAMBRIDGE UNIVERSITY LIBRARY

All known organisms share a number of fundamental features that, taken together, point to a common evolutionary history: DNA as the chief molecule of genetic inheritance, proteins as the primary functional molecules, and RNA as an informational intermediate between the two. The simplest explanation for why organisms share these common features is that they are inherited from a last universal common ancestor (LUCA), which sits at the root of the tree of life. Most studies of gene duplications that occurred prior to the first branch on the tree place LUCA in between the Bacteria and the common ancestor of the Archaea and Eukarya, the three taxonomic domains of cellular life.

The availability of the genome sequences from so many species across the tree of life has made it possible to look for common genomic traits that were most likely inherited from LUCA. The methods employed to identify these common genomic traits can vary greatly, however, and as a result lead to very different predictions. Some studies have estimated there to be fewer than 100 LUCA-derived gene families, while others count more than 1,000, depending on how conservatively the methods rule out genes on suspicion of horizontal gene transfer or how liberally they include genes that appear to have been present in LUCA, but subsequently lost. Despite the conflicting results, the new data are yielding insight into ancient life on Earth.  
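
How strongly the count depends on the filtering criteria can be illustrated with a toy calculation. The Python sketch below uses an invented presence/absence table; the family names, the fractions, the thresholds, and the `luca_candidates` function are all assumptions made for illustration, not data or methods from any of the studies mentioned.

```python
# Toy data: for each gene family, the fraction of sampled genomes in each
# domain that carry it. Every name and number here is invented.
FAMILIES = {
    "ribosomal_protein_A": {"Bacteria": 1.00, "Archaea": 1.00, "Eukarya": 1.00},
    "tRNA_synthetase_B":   {"Bacteria": 0.95, "Archaea": 0.92, "Eukarya": 0.98},
    "metabolic_enzyme_C":  {"Bacteria": 0.60, "Archaea": 0.40, "Eukarya": 0.70},
    "transporter_D":       {"Bacteria": 0.80, "Archaea": 0.05, "Eukarya": 0.00},
    "repair_enzyme_E":     {"Bacteria": 0.35, "Archaea": 0.40, "Eukarya": 0.00},
}


def luca_candidates(families, min_fraction, min_domains):
    """Return the families carried by at least `min_fraction` of genomes
    in at least `min_domains` of the three domains, sorted by name."""
    selected = []
    for name, by_domain in families.items():
        domains_passing = sum(1 for f in by_domain.values() if f >= min_fraction)
        if domains_passing >= min_domains:
            selected.append(name)
    return sorted(selected)


# Strict: near-universal in all three domains. A patchy distribution is
# treated as a possible sign of horizontal transfer and excluded.
strict = luca_candidates(FAMILIES, min_fraction=0.9, min_domains=3)

# Lenient: present in a fair share of genomes in any two domains. Gaps are
# attributed to gene loss after LUCA.
lenient = luca_candidates(FAMILIES, min_fraction=0.3, min_domains=2)

print(len(strict), strict)
print(len(lenient), lenient)
```

With this table the strict rule keeps two families and the lenient rule keeps four. Neither rule actually tests for horizontal transfer, which would require comparing gene trees with the species tree; the thresholds only stand in for how cautious a given study chooses to be.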

The majority of ancient gene families identified in almost all of these studies are involved in the translation of genetic information into proteins. These ancient gene families represent a range of translation functions, from regulation to ribosomal components. The genetic code at the core of translation is also highly conserved across life. In all likelihood, the enzymes responsible for establishing the genetic code by attaching amino acids to particular tRNAs evolved prior to the time of LUCA, although their evolutionary histories are obscured by subsequent horizontal gene transfers between bacteria and archaea. These results depict a translation system in LUCA that was probably similar to and as sophisticated as those of organisms alive today.

In contrast, few genes involved in the synthesis of DNA are conserved across the tree of life. The enzymes responsible for making deoxyribonucleotides from ribonucleotides exist in three distinct families that only show a weak signature of common descent in their active sites. The only DNA polymerase enzymes that are common across the evolutionary tree are those involved in repair, not the polymerases presently responsible for copying complete chromosomes. RNA polymerases from bacteria, archaea, and eukaryotes, on the other hand, do appear to have been inherited from LUCA, and may have previously functioned as DNA polymerases as well. Taken together, these observations suggest that DNA genomes replaced a genome composed of RNA just prior to or perhaps just after the time of LUCA.

The variety of metabolic strategies observed in modern organisms demonstrates that metabolism is generally less highly conserved, which makes it harder to identify those metabolic pathways that were present in LUCA. Still, various databases organize enzymatic data into metabolic maps, which can be used to uncover highly conserved components of modern metabolic pathways. For example, a recent study combined these data with evolutionary trees of carbon-fixation genes and found that the ancestral carbon-fixation pathway was most likely an amalgam of components currently found in two separate pathways in extant archaea and bacteria: the reductive acetyl-CoA pathway and the reductive citric acid cycle (PLOS Comput Biol, 8:e1002455, 2012). Another taxonomically broad comparison study, focused on amino acid metabolism, uncovered conserved biosynthetic pathways for 8 of the 20 canonical amino acids, and conserved enzymes from pathways for another eight (Genome Biology, 9:R95, 2008).
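
The distinction drawn here between fully conserved pathways and pathways in which only some enzymes are conserved can be made concrete with a small sketch. The Python below assumes a hypothetical mapping from pathways to enzyme steps and a hypothetical set of enzymes judged to be universal; the labels, the data, and the `classify` function are placeholders and do not come from the cited studies or from any real database.

```python
# Hypothetical pathways, each listed as an ordered series of enzyme steps.
PATHWAYS = {
    "amino_acid_1": ["E1", "E2", "E3"],
    "amino_acid_2": ["E4", "E5"],
    "amino_acid_3": ["E6", "E7", "E8", "E9"],
    "amino_acid_4": ["E10"],
}

# Hypothetical set of enzymes found across all three domains.
UNIVERSAL = {"E1", "E2", "E3", "E4", "E7"}


def classify(pathways, universal):
    """Label each pathway by how many of its steps are universal.

    Returns {pathway: (status, conserved_steps, total_steps)}.
    """
    result = {}
    for name, steps in pathways.items():
        conserved = [s for s in steps if s in universal]
        if len(conserved) == len(steps):
            status = "fully conserved"
        elif conserved:
            status = "partially conserved"
        else:
            status = "not conserved"
        result[name] = (status, len(conserved), len(steps))
    return result


for pathway, (status, n_conserved, n_total) in classify(PATHWAYS, UNIVERSAL).items():
    print(f"{pathway}: {status} ({n_conserved}/{n_total} steps)")
```

A real analysis would also have to decide when two enzymes in different organisms count as the same step, which this sketch sidesteps by using fixed labels.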

Finally, LUCA most likely had a phospholipid membrane that set the boundaries between organisms and offered protection from the external environment. The universal presence of genes responsible for targeting proteins to membranes suggests that LUCA’s membrane was replete with proteins. Furthermore, the ubiquity of both catalytic subunits of the membrane-bound ATPase motor also implies that this membrane was impermeable enough to ions that it could be used to generate the proton gradients used by the motor to convert ADP to ATP.

While this detailed understanding of LUCA is relatively recent, Darwin proposed the idea of an early common ancestor to all life in the first edition of Origin of Species, where he wrote, “Therefore I should infer from analogy that probably all the organic beings which have ever lived on this earth have descended from some one primordial form, into which life was first breathed.” Although Darwin’s insight is brilliant for its time, the modern view shows that LUCA is not this “primordial form,” but rather a sophisticated cellular organism that, if alive today, would probably be difficult to distinguish from other extant bacteria or archaea. This means that a great deal of evolution must have taken place between the time of the origin of life and the appearance of LUCA. Continuing advances in evolutionary biology, bioinformatics, and computational biology will give us the tools to describe LUCA and the evolutionary transitions preceding it with unprecedented accuracy and detail.

Aaron David Goldman is an assistant professor of biology at Oberlin College. His research employs bioinformatics and systems biology tools to study the genome and metabolism of LUCA and their connections to evolutionary predecessors.

Suggested Reading

A. Becerra et al., "The very early stages of biological evolution and the nature of the last common ancestor of the three major cell domains," Annu Rev Ecol Evol Syst, 38:361-79, 2007.

R. Braakman, E. Smith, "The emergence and early evolution of biological carbon-fixation," PLoS Comput Biol, 8:e1002455, 2012.

P. Forterre, "The origin of DNA genomes and DNA replication proteins," Curr Opin Microbiol, 5:525-32, 2002.

S.J. Freeland et al., "Do proteins predate DNA?" Science, 286:690-92, 1999.

A.D. Goldman et al., "LUCApedia: a database for the study of ancient life," Nucleic Acids Res, 41:D1079-82, 2013.

A.D. Goldman, L.F. Landweber, "Oxytricha as a modern analog of ancient genome evolution," Trends Genet, 28:382-88, 2012.

A.D. Goldman et al., "The evolution and functional repertoire of translation proteins following the origin of life," Biol Direct, 5:15, 2010.

R.D. Knight et al., "Rewiring the keyboard: evolvability of the genetic code," Nat Rev Genet, 2:49-58, 2001.

J.M. Kollman, R.F. Doolittle, "Determining the relative rates of change for prokaryotic and eukaryotic proteins with anciently duplicated paralogs," J Mol Evol, 51:173-81, 2000.

D. Theobald, "A formal test of the theory of universal common ancestry," Nature, 465:219-22, 2010.

C. Woese, "The universal ancestor," PNAS, 95:6854-59, 1998.





jeenious


Posts: 45

March 4, 2014



I have never believed that if two species have particular characteristics in common that "necessarily" is a consequence of (reliable evidence on which to base) their having shared a common ancestor.




If certain characteristics are attained evolutionarily and subsequently lost, there are multiple ways in which a species might turn out to have come to express them BEYOND mere in-line mutations.  Or, if not, then ruling out other possibilities would leave some puzzling anomalies to the general rule.




When changes occur for one member of a species, and increase in offspring (for any reason) there tend to remain members of that species that retain the previous coding.  And, where many veritable "packages" of characteristics are conserved at some statistical level, this would account for how each and both of natural selective factors or artificial selective factors would be, and do, avail of scattered remnants of characteristics, some dominant and some recessive -- some occurring almost insignificantly beginning at one delta moment, or some appearing in a small, but fitness-significant portion of a population, or a portion cut off (by any means) from back-breeding with the main population.




That, however, need not be the whole story.  Why might it not be that in some instances one or more individuals that have inherited a particular mutation from both parents might not back-mutate -- that is, might by the same error sequence in reverse, might mutate back by the same route to a characteristic thereby regained.




Another possibility is that a gamete-infecting virus or bacterium or prion might induce a sequence into RNA that would transfer from, or emulate, a sequence found in an unrelated species.  The statistical likelihood that such a sequencing anomaly would be functional and non-deleterious and, also, offer some fitness upgrading value which would tend to conserve it in progeny so as to spread it among a species population may be enormously slim; but so too are the chances that any mutation would go uncorrected, get passed on via fertilization, result in a viable offspring, occur in an offspring that survives to sexual maturity, provide some fitness advantage, and get spread throughout a species population... slim.




It is conceivable that an insect vector might convey some infector, also, after conception, early enough but not too early, to be conserved in the gametic cells of an offspring.




If any of these things could happen, no matter how infrequently, then -- given enough generations -- they will.  And given enough more generations than that... even more alterations of genome surely can occur over time, such that a species population (or sub-population) could develop and pass along one or more characteristics that may be found in at least one other plant or animal species without being put there "NECESSARILY" by a common ancestor.




After all, when trees are drawn, some things appear to "jump" from one species to another after where particular descendants don't neatly fit.




Problems in drawing a tree that accounts for cross-species marsupialism would certainly suggest that there are other things going on besides just mutations passed along in-line, and in sequence.




Charts of speciation trees once had only taxonomic comparisons to inform them.




After discovering genes, we learned that some sub-species are substantially different in taxonomic appearance from which species they were previously deemed to be of.  




It is only within recent years that we have improved our ways of obtaining, preserving and preparing soft tissue samples, without destroying or drastically corrupting the integrity of RNA/DNA in them.  Therefore, there is much to be studied as to ways of determining how any fitness-enhancing characteristic may have been acquired via one or MANY alterations of sequences within genotypes.




I predict that results of RNA/DNA sequences derived from preserved soft tissue samples will reveal more and more significant multi-generational DNA changes that cannot be accounted for solely in terms of a progress characterizable as:




mutation occurs in one member of a species;




that mutation spreads and turns out to be fitness-improvement-yielding;




that mutation is passed on to many successive generations;








... with no characteristic ever having popped up that parallels something in another species, or that emulates something found in another species, by virtue ALONE of the two homeogenetic programs coming from a common ancestor.




I raise this not as an argument, but as a QUESTION.




Sadly, ancient soft tissues from which sufficient RNA/DNA sequences can be forensically established are rare and DIMINISHING and grossly expensive and difficult to access.




But even one single exception from a general rule can be enormously informative.






Doug Easton

Posts: 15

March 11, 2014

Convergence and horizontal transfer between major lines are quite confounding issues but not impossible to surmount with enough data in hand. Of course the question is how much data.

Usually convergence can be ruled out by analysis of DNA sequence much more easily than at the phenotypic level because of the complexity of gene sequences and the low likelihood that two very similar sequences arise by chance rather than by shared ancestry.

Horizontal transfer is much more difficult to deal with.

EvMedDr


Posts: 14

March 11, 2014

Looking back to LUCA will ultimately provide insight to vertebrate evolution inaccessible with DNA analysis. This is particularly true for the origin of vertebrate traits that evolved from the plasmalemmae of protists.
