New Molecular Solutions in Research and Development for Innovative Drugs

Principles of “shotgun” proteomics and proteogenomics

Boris Maček, Proteome Center Tuebingen

With the development of new methodologies for protein extraction, separation and, especially, detection by high-precision mass spectrometry, proteomics is becoming increasingly capable of comprehensively and reliably identifying gene products.

Smaller proteomes of Protozoa can be almost completely detected and quantified by mass spectrometry, albeit still with considerable effort. Detection of the complete yeast proteome was recently reported, and other smaller proteomes, especially those of bacteria, are within reach. These improvements in proteome coverage have led to increased application of MS-based proteomics to genome annotation and refinement.

In a typical “shotgun” proteomics experiment, the complete protein extract of an organism is digested into peptides, which are then mass-measured and fragmented in a mass spectrometer. In proteogenomic applications, peptide mass spectra are typically searched against a database containing a six-frame translation of the raw genome assembly. Such searches can therefore identify new, unpredicted open reading frames and refine existing gene models in terms of protein start and stop positions, exon-intron structure and exact exon boundaries.
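As an illustration of what a six-frame translation involves, the following minimal Python sketch translates a nucleotide sequence in three reading frames on each strand and splits each frame at stop codons. It assumes the standard genetic code and an unambiguous A/C/G/T sequence; the function names (`six_frame_translation`, `stop_to_stop_orfs`) and the `min_length` parameter are hypothetical, chosen for this example, and do not describe the database construction used in any particular study.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the organism uses the standard codon assignments).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(genome):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    genome = genome.upper()
    frames = {}
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def stop_to_stop_orfs(protein, min_length=2):
    """Split a translated frame at stop codons ('*') and keep longer segments."""
    return [segment for segment in protein.split("*") if len(segment) >= min_length]


if __name__ == "__main__":
    for label, protein in six_frame_translation("ATGGCCAAATAG").items():
        print(label, protein, stop_to_stop_orfs(protein))
```

Each stop-to-stop segment is a candidate database entry. Because all six frames are kept regardless of existing gene predictions, peptides matching unannotated regions can still be identified.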

Although conceptually relatively simple, application of mass spectrometry to genome reannotation still presents a range of challenges, especially in terms of data analysis; six-frame translation databases significantly increase the search space, often requiring application of special strategies in database search and data processing [1, 2].

Here we will give a brief historical overview of the field of shotgun proteomics and proteogenomics, and outline current workflows in sample preparation, MS measurement and data processing used in MS-based genome annotation.

References

  1. Krug, K., Nahnsen, S., Macek, B. 2011. Mass spectrometry at the interface of genomics and proteomics. Mol Biosystems 7(2), 284-291.
  2. Krug, K., Carpy, A., Behrends, G., Matic, K., Soares, N.C., Macek, B. 2013. Deep coverage of the Escherichia coli proteome enables the assessment of database search strategies in bacterial proteogenomics experiments. Mol Cell Proteomics 12(11), 3420-3430.