Pity the poor protein biologist. DNA sequence gurus have GenBank; structural biologists, PDB. But those looking to data-mine the spectral peaks and valleys of today’s burgeoning proteomics literature are out of luck.
Or are they?
Several freely available databases are dedicated to the storage, annotation, and analysis of mass spectrometric proteomics data. Yet because they are both poorly advertised and sparsely populated, mining them to feed the sorts of meta-analyses that have become staples of gene sequence and gene expression studies has largely not been possible.
“We’re at the stage now in proteomics where genome sequencing was maybe 10 years ago or so,” says Conrad Bessant, a bioinformatics group leader at Cranfield University, United Kingdom, who coauthored a 2009 paper on proteomics databases.1 “We’re starting to get good-quality data into the databases, with data standards to share data. But I don’t see many pieces of work where people are using ...