JA Koziol, NM Griffin, F Long, Y Li, M Latterich, JE Schnitzer.
Abstract
Mass spectrometry, an analytical technique that measures the mass-to-charge ratio of ionized atoms or molecules, dates back more than 100 years, and has both qualitative and quantitative uses for determining chemical and structural information. Quantitative proteomic mass spectrometry on biological samples focuses on identifying the proteins present in the samples, and establishing the relative abundances of those proteins. Such protein inventories create the opportunity to discover novel biomarkers and disease targets. We have previously introduced a normalized, label-free method for quantification of protein abundances under a shotgun proteomics platform (Griffin et al., 2010). The introduction of this method for quantifying and comparing protein levels leads naturally to the issue of modeling protein abundances in individual samples. We here report that protein abundance levels from two recent proteomics experiments conducted by the authors can be adequately represented by Sichel distributions. Mathematically, Sichel distributions are mixtures of Poisson distributions with a rather complex mixing distribution, and have been previously and successfully applied to linguistics and species abundance data. The Sichel model can provide a direct measure of the heterogeneity of protein abundances, and can reveal protein abundance differences that simpler models fail to show.
Year: 2013 PMID: 23360617 PMCID: PMC3599228 DOI: 10.1186/1477-5956-11-5
Source DB: PubMed Journal: Proteome Sci ISSN: 1477-5956 Impact factor: 2.480
Summary statistics for peptide counts
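| Experiment | Minimum | Maximum | Median | Mean | SD | Skewness | Kurtosis | Variance/mean |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 525 | 7 | 13.13 | 26.36 | 10.56 | 164.7 | 52.9 |
| 2 | 1 | 302 | 6 | 12.37 | 20.43 | 6.06 | 60.8 | 33.7 |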
Caption. Experiments 1 and 2 refer to the membrane replicates and caveolae replicates respectively, as described in the methods. 2075 unique proteins were identified in experiment 1 and 1069 in experiment 2.
Figure 1. Rank-frequency plots of protein abundances from the first experiment, together with fitted distributions. A. Negative binomial. B. Discrete Weibull. C. Zipf. D. Zipf-Mandelbrot. E. Poisson inverse Gaussian. F. Sichel. Observed data are depicted in blue, and the fitted distributions are depicted in red. As described in the Methods, we start with a listing of all the proteins, along with their frequency of occurrence (abundance). The complementary cumulative distribution P(x) of the abundance x is defined as the fraction of proteins with abundance greater than or equal to x. Our plots depict both the observed and the fitted complementary cumulative distributions (ordinates) vs protein abundances (abscissas).
Figure 2. Rank-frequency plots of protein abundances from the second experiment, together with fitted distributions. A. Negative binomial. B. Discrete Weibull. C. Zipf. D. Zipf-Mandelbrot. E. Poisson inverse Gaussian. F. Sichel. Observed data are depicted in blue, and the fitted distributions are depicted in red. As described in the Methods, we start with a listing of all the proteins, along with their frequency of occurrence (abundance). The complementary cumulative distribution P(x) of the abundance x is defined as the fraction of proteins with frequency greater than or equal to x. Our plots depict both the observed and the fitted complementary cumulative distributions (ordinates) vs protein abundances (abscissas).
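The complementary cumulative distribution defined in the figure captions can be computed directly from a list of per-protein abundances. The following is a minimal sketch of that definition only; the function name `empirical_ccdf` and the toy counts are hypothetical and are not taken from the study's data or code.

```python
from collections import Counter


def empirical_ccdf(abundances):
    """Return sorted (x, P(x)) pairs, where P(x) is the fraction of
    proteins whose abundance is greater than or equal to x."""
    n = len(abundances)
    counts = Counter(abundances)
    remaining = n  # number of proteins with abundance >= current x
    ccdf = []
    for x in sorted(counts):
        ccdf.append((x, remaining / n))
        remaining -= counts[x]
    return ccdf


# Hypothetical peptide counts for eight proteins (illustrative only).
toy_counts = [1, 1, 2, 2, 2, 5, 9, 30]
for x, p in empirical_ccdf(toy_counts):
    print(x, p)
```

Plotting these pairs on log-log axes, with abundance on the abscissa and P(x) on the ordinate, would give the kind of rank-frequency display described in the captions; a fitted model's complementary cumulative distribution could be overlaid in the same way.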
Comparative statistics for six models
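| Model | AIC, experiment 1 | AIC, experiment 2 |
|---|---|---|
| Negative binomial | 14533.9 | 7326.9 |
| Discrete Weibull | 14413.6 | 7280.8 |
| Zipf | 16146.9 | 8067.0 |
| Zipf-Mandelbrot | 14703.5 | 7482.7 |
| Poisson inverse Gaussian | 14238.4 | 7203.0 |
| Sichel | 14167.3 | 7189.8 |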
Caption. Experiments 1 and 2 refer to the membrane replicates and caveolae replicates respectively, as described in the methods. AIC denotes Akaike’s information criterion; smaller values connote better model fits.
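As a reminder of how an AIC comparison works, the criterion is AIC = 2k − 2 ln L, where k is the number of fitted parameters and L is the maximized likelihood. The sketch below illustrates the calculation on hypothetical toy counts using two simple one-parameter stand-in models (Poisson and geometric); these are assumptions chosen for brevity and are not among the six models fitted in the study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike's information criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


def poisson_loglik(counts, lam):
    """Log-likelihood of counts under a Poisson(lam) model."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in counts)


def geometric_loglik(counts, p):
    """Log-likelihood under a geometric model on {0, 1, 2, ...},
    with P(x) = p * (1 - p) ** x."""
    return sum(math.log(p) + x * math.log(1 - p) for x in counts)


# Hypothetical, overdispersed toy counts (illustrative only).
toy_counts = [1, 1, 2, 2, 2, 5, 9, 30]
mean = sum(toy_counts) / len(toy_counts)

# Maximum-likelihood estimates for the two stand-in models.
aic_poisson = aic(poisson_loglik(toy_counts, mean), n_params=1)
aic_geometric = aic(geometric_loglik(toy_counts, 1 / (1 + mean)), n_params=1)

print("Poisson AIC:", round(aic_poisson, 1))
print("Geometric AIC:", round(aic_geometric, 1))
```

On these toy counts the heavier-tailed geometric model attains the smaller AIC, which mirrors the logic of the table: the model with the lowest AIC in each experiment is the preferred fit.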