Statistics and Its Interface

Volume 5 (2012)

Number 1

Protein identification problem from a Bayesian point of view

Pages: 21 – 37

DOI: https://dx.doi.org/10.4310/SII.2012.v5.n1.a3

Authors

Randy J. Arnold (Department of Chemistry, Indiana University, Bloomington, In., U.S.A.)

Yong Fuga Li (School of Informatics and Computing, Indiana University, Bloomington, In., U.S.A.)

Predrag Radivojac (School of Informatics and Computing, Indiana University, Bloomington, In., U.S.A.)

Haixu Tang (School of Informatics and Computing, Indiana University, Bloomington, In., U.S.A.)

Abstract

We present a generic Bayesian framework for peptide and protein identification in proteomics, and provide a unified interpretation of the two approaches used in peptide identification: database searching and de novo peptide sequencing. We describe several probabilistic graphical models and a variety of prior distributions that can be incorporated into the Bayesian framework to model different types of prior information, such as known protein sequences, known protein abundances, peptide precursor masses, estimated peptide retention times, and peptide detectabilities. Various applications of the Bayesian framework are discussed theoretically, including its application to the identification of peptides containing mutations and post-translational modifications.

Keywords

shotgun proteomics, protein identification, mass spectrometry, Bayesian methods

2010 Mathematics Subject Classification

60K35

Published 17 February 2012