For nearly 300 years, cell biology has been largely an observational science. Robert Hooke in 1665 saw structures under the microscope that he called cells. Anthony van Leeuwenhoek discovered cellular substructures in 1700, which Robert Brown dubbed 'nuclei' in 1833. Cell biologists have described many other substructures since then, the most prominent among them being the mitochondria, Golgi apparatus, endoplasmic reticulum, and nucleolus.

With the advent of molecular biology, cell biologists were no longer content to observe these structures' shapes; they wanted to identify their molecular components and learn how those components govern organellar function. In so doing, they could fulfill two interrelated goals. The first, identifying a protein's location or cellular home, helps us to understand how particular organelles work. At the same time, because each organelle has unique functions, assigning novel proteins to a specific cellular address offers vital clues to determining those molecules' duties.

Many of the methods researchers have devised to study protein localization, both computational and experimental, are limited by either the paucity of reagents or the unreliability of gene predictions. Mass spectrometry-based proteomics may offer a better solution. By identifying proteins based solely on the mass and sequence of their peptides, this technique offers perhaps the most direct method to detect organellar protein localization.1


Courtesy of Matthias Mann

A page from Ramon y Cajal's notebook in 1910, showing various nuclear substructures, including nucleoli and the organelle now called Cajal's body

The process begins by purifying organelles, often by centrifugation techniques, and harvesting their proteins. The protein mixtures are digested into small fragments using a specific enzyme, and the resulting peptide mix is fractionated, by size or charge, for instance, on ultrasmall liquid chromatography columns. The peptides are ionized as they emerge from the column and then mass analyzed. The resulting spectrum reveals the mass of the peptides in the mixture; the amino acid sequence can also be obtained if a second round of fragmentation and mass analysis is added to the experiment. In either case, the last step is the application of computer algorithms, which match the spectra against protein sequence databases to identify the proteins from which the peptides originated.
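The digestion and database-matching steps can be illustrated with a minimal Python sketch. It assumes the enzyme is trypsin (cleaving after lysine or arginine unless proline follows), which the text above does not specify, and it matches on peptide mass alone with no missed cleavages. The function names, the toy sequences, and the 10-ppm tolerance are hypothetical choices for illustration, not the search algorithm used in our lab.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def digest_trypsin(protein):
    """Cleave after K or R unless the next residue is P (assumed enzyme)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def identify(observed_masses, database, tol_ppm=10.0):
    """Count, per protein, the theoretical peptides matching an observed mass."""
    hits = {}
    for name, sequence in database.items():
        for peptide in digest_trypsin(sequence):
            theoretical = peptide_mass(peptide)
            for observed in observed_masses:
                if abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm:
                    hits[name] = hits.get(name, 0) + 1
                    break  # count each theoretical peptide once
    return hits


# Toy database with made-up sequences, for illustration only.
toy_database = {"protA": "AGKRPLMRC", "protB": "GGGK"}
toy_hits = identify([274.1641], toy_database)
```

Real search engines score fragment spectra as well as intact masses, which is why the second round of fragmentation adds so much specificity; mass matching alone would be ambiguous against a full proteome.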

This liquid chromatography-tandem mass spectrometry (LC-MS/MS) approach can identify hundreds of proteins in a single run. But because of the technique's exquisite sensitivity, it can be difficult to determine whether these proteins are actually members of a particular complex or are just copurifying contaminants. To address this question, we developed a simple quantitative proteomics method called protein correlation profiling.2 Usually organelles are purified by centrifugation. But in this case, instead of just sequencing the proteins in the peak fraction, that is, the fraction where the organelle is most abundant, we sequence the proteins in all of the other centrifugal fractions as well. We then quantify the thousands of peptides identified in each fraction using the extracted ion current of each peptide, producing an abundance profile for each protein. Finally, we mathematically determine which of the peptides group together in common profiles and are therefore likely to be found in the same complex.
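The profile-grouping idea can be sketched in a few lines of Python. This is a simplified illustration, not the published method: it assumes each protein already has one abundance value per centrifugal fraction, scales every profile to its peak, and uses Pearson correlation against a known marker protein as the similarity measure. The correlation measure, the 0.9 threshold, the function names, and the numbers are all assumptions made for the example.

```python
from math import sqrt


def normalize(profile):
    """Scale a profile so that its peak fraction equals 1."""
    peak = max(profile)
    return [value / peak for value in profile]


def pearson(a, b):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def copurifying(profiles, marker, threshold=0.9):
    """Names of proteins whose profile tracks the marker's profile."""
    reference = normalize(profiles[marker])
    return sorted(
        name
        for name, profile in profiles.items()
        if pearson(normalize(profile), reference) >= threshold
    )


# Invented abundances across five gradient fractions, for illustration only.
toy_profiles = {
    "marker": [1, 4, 10, 4, 1],        # peaks in the middle fraction
    "candidate": [2, 9, 20, 7, 2],     # same shape, different scale
    "contaminant": [10, 8, 6, 4, 2],   # declines steadily across the gradient
}
members = copurifying(toy_profiles, "marker")
```

In this toy data the candidate peaks with the marker and is retained, while the contaminant, although present in the peak fraction, follows a different profile and is excluded.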


Mass spectrometry already has been applied to many organelles.3,4 Together with Angus Lamond in Dundee, Scotland, we recently characterized the protein components of the nucleolus, a sub-body of the cell nucleus and the site of ribosomal RNA synthesis and ribosome maturation.5 Contrary to previous expectation, the nucleolus appears to be an extremely complex cellular machine requiring more than 700 different proteins to fulfill its various functions. Similarly, we have found hundreds of proteins in the human spliceosome, a cellular machine that edits premessenger RNA. Using protein correlation profiling, we identified most of the components of the human centrosome, an organelle responsible for anchoring the microtubule network. We expect that most large cellular complexes can be mapped in this way.

Knowing the cellular home of most human proteins will be tremendously useful in associating proteins with a broad cellular role and a place of activity. That knowledge can then be overlaid with other large-scale data to solve a particular biological or medical question. This is illustrated by the recent identification of the gene responsible for Leigh Syndrome French Canadian variant (LSFC), a fatal autosomal recessive disease characterized by mitochondrial dysfunction. In collaboration with Eric Lander's group at the Whitehead Institute for Biomedical Research in Cambridge, Mass., and with MDS Proteomics in Toronto, we used a combination of genetics, mRNA expression data, and organellar proteomics to identify the culprit gene.6

We began with the results of a genome-wide association study that implicated a 2-Mb interval on chromosome 2p16-21. A survey of the mitochondrial proteome suggested that one of the genes in this interval, LRPPRC, encoded a protein targeted to this organelle, while microarray analyses demonstrated that the transcript encoding this protein is coregulated with other mitochondrial genes. Together these data made LRPPRC our top candidate for the gene underlying LSFC. Direct mapping of tandem mass spectra to the genome allowed us to properly annotate LRPPRC's gene structure, where we discovered three exons not annotated in the human genome. Systematic resequencing of the gene in patients, parents, and unrelated controls identified segregating mutations, providing genetic proof that LRPPRC mutations underlie this disorder.



Courtesy of Matthias Mann

Output from mass spectrometry-based proteomics of an organelle. (A) Summed mass spectrometric signal of all peptides eluting from a chromatography column. (B) Mass spectrum at the time indicated by the arrow in panel A. (C) A peptide from panel (B) has been selected in the mass spectrometer and fragmented to determine the amino acid sequence. (D) The signal of the peptide in (B) and (C) plotted as a function of elution time. The area under the curve is a measure of the peptide's abundance.
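The abundance measure in panel D, the area under a peptide's elution curve, can be approximated numerically. The sketch below uses the trapezoidal rule on sampled time and intensity values; the function name and the sample numbers are hypothetical, and real software would also subtract background and delimit the peak boundaries.

```python
def elution_peak_area(times, intensities):
    """Trapezoidal-rule area under a peptide's signal versus elution time."""
    area = 0.0
    for i in range(1, len(times)):
        mean_height = 0.5 * (intensities[i] + intensities[i - 1])
        area += mean_height * (times[i] - times[i - 1])
    return area


# Invented samples: elution time in minutes, signal in arbitrary units.
toy_area = elution_peak_area([0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 2.0, 0.0])
```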

Separately, the mitochondrial proteomics project established that this important and ubiquitous organelle has a surprisingly different protein composition from tissue to tissue.7 Characterization of the mitochondrial proteome in brain, heart, kidney, and liver showed that there was an overlap of only 85% between proteins from any two tissues. Coexpression studies of the mitochondrial mRNAs across a much wider range of tissues, using microarray data sets that are readily available through the Internet, resulted in essentially the same estimate for tissue diversity of mitochondrial proteins. A core proteome concerned with the energy-generating activities of mitochondria was relatively unchanging.

While it had been clear from electron microscopy that mitochondria look quite different in different tissues, this hadn't been appreciated at the proteomic level. Tissue diversity of organelles clearly needs to be taken into account to understand their function and will add a further layer of complexity to the study of organellar proteomes.

We also used the microarray data to ask how the mitochondrial cast of characters is coregulated. This allowed us to identify nuclear genes that tend to be coexpressed with mitochondrial proteins and are therefore possibly linked to organellar function. Thus we identified mitochondrial proteins that mass spec analysis passed over because of their low abundance, including several excellent candidates for mitochondrial-DNA repair enzymes. These genes had been known in yeast yet elusive in mammals. Genes coregulated with mitochondria also included proteins that were not actually mitochondrial proteins at all, including several known mitochondrial biogenesis regulators from the nucleus.

Going beyond mere protein catalogs, proteomics methods can track protein flux into and out of an organelle directly. For this purpose, we have metabolically encoded the cellular proteome by growing cells in media where an essential amino acid is replaced by its stable 13C-labeled analog (such as 13C-arginine). This method, stable isotope labeling by amino acids in cell culture (SILAC), makes it possible to determine whether a peptide came from one set of cells or another, for example from cells exposed to different stimuli. In one application, we have used SILAC to identify proteins transported into or out of the human nucleolus in response to a panel of small-molecule drugs.
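The arithmetic behind a SILAC comparison can be shown with a short Python sketch. It assumes that all six carbons of arginine are replaced by 13C, so that each arginine in a peptide adds about 6.02 Da, and that the heavy cells received the stimulus. The twofold cutoff, the function names, and the labels "recruited" and "released" are hypothetical choices for illustration only.

```python
C13_MINUS_C12 = 1.003355  # mass difference between 13C and 12C, in daltons


def heavy_mz(light_mz, charge, n_arginines, labeled_carbons=6):
    """Expected m/z of the heavy partner of a light peptide ion."""
    mass_shift = n_arginines * labeled_carbons * C13_MINUS_C12
    return light_mz + mass_shift / charge


def silac_ratio(light_intensity, heavy_intensity):
    """Heavy-to-light abundance ratio for one peptide pair."""
    return heavy_intensity / light_intensity


def classify(ratio, fold=2.0):
    """Label a protein in the purified organelle by its heavy/light ratio."""
    if ratio >= fold:
        return "recruited"   # more abundant in the stimulated (heavy) cells
    if ratio <= 1.0 / fold:
        return "released"    # less abundant in the stimulated (heavy) cells
    return "unchanged"


# A doubly charged peptide with one arginine, observed at m/z 500.0 (invented).
partner_mz = heavy_mz(500.0, charge=2, n_arginines=1)
label = classify(silac_ratio(1000.0, 3000.0))
```

Because the light and heavy forms of a peptide are chemically identical, they elute together and appear as a doublet in the same spectrum, so their intensity ratio can be read from a single measurement.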

Clearly, organellar biology has come a long way from the days of passive observation. Modern molecular biology and the proteomics technologies we use in my lab are beginning to reveal just how dynamic these structures can be. So in a sense, we, like van Leeuwenhoek, Brown, Cajal, and Golgi, are really only seeing these structures for the first time.

Matthias Mann directs the Center for Experimental BioInformatics (CEBI; http://www.cebi.sdu.dk), a leading proteomics group, at the University of Southern Denmark. He is interested in a wide range of biological questions that can be solved with input from proteomics methods, as well as in the further development of mass spectrometric technologies and bioinformatic algorithms in proteomics.

He can be contacted at mann@bmb.sdu.dk.
