Melissa M Matzke, Katrina M Waters, Thomas O Metz, Jon M Jacobs, Amy C Sims, Ralph S Baric, Joel G Pounds, Bobbie-Jo M Webb-Robertson.
Abstract
MOTIVATION: In the analysis of differential peptide peak intensities (i.e. abundance measures), LC-MS analyses with poor-quality peptide abundance data can bias downstream statistical analyses, and hence the biological interpretation, of an otherwise high-quality dataset. Although considerable effort has been devoted to assuring the quality of peptide identification with respect to spectral processing, quality assessment of the resulting peptide abundance data matrix has so far been limited to subjective visual inspection of run-by-run correlations or of individual peptide components. Identifying statistical outliers is a critical step in processing proteomics data, because many downstream statistical analyses [e.g. analysis of variance (ANOVA)] rely on accurate estimates of sample variance, and their results are influenced by extreme values.
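To make the variance argument concrete, here is a minimal sketch using made-up log2 abundances for a single peptide across six runs (the values are hypothetical, not from the study). Replacing one run's value with an extreme one inflates the sample variance that tests such as ANOVA depend on.

```python
import statistics

# Hypothetical log2 abundances of one peptide across six runs (illustrative only).
clean = [20.1, 20.3, 19.9, 20.2, 20.0, 20.1]
# Same peptide, but the last run is replaced by an extreme value,
# as might come from a poor-quality LC-MS run.
with_outlier = clean[:-1] + [25.0]

# Sample variance with the n - 1 denominator.
var_clean = statistics.variance(clean)
var_outlier = statistics.variance(with_outlier)

print(f"variance without outlier: {var_clean:.3f}")
print(f"variance with outlier:    {var_outlier:.3f}")
print(f"inflation factor:         {var_outlier / var_clean:.0f}x")
```

A single aberrant run inflates the variance estimate by two orders of magnitude here, which would sharply reduce the power of any variance-based test on this peptide.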
Year: 2011 PMID: 21852304 PMCID: PMC3187650 DOI: 10.1093/bioinformatics/btr479
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Calu-3 cell-line experiment. (a) The rMd-PAV plot of the LC-MS runs. Runs identified as outliers (blue downward triangles) sit above the red horizontal line, which represents the log2(χ²_{0.9999,5}) critical value (i.e. P=0.0001). Empty upward triangles below the red line represent runs that the MS expert identified as suspect but that were not statistically extreme. (b) The correlation plot of the LC-MS runs.
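The rMd-PAV threshold in Fig. 1(a) can be illustrated with a hedged sketch. Several choices here are assumptions rather than the authors' published procedure. Each run is summarized by five metrics: mean correlation with the other runs, fraction of missing values, median absolute deviation (MAD), skewness and excess kurtosis. The figure captions mention correlation, missingness, skewness and kurtosis; MAD is assumed. The robust covariance uses scikit-learn's `MinCovDet` (a swapped-in estimator). Runs are flagged when the squared robust Mahalanobis distance exceeds the χ² quantile with degrees of freedom equal to the number of metrics. The data are simulated, and the helper names `run_metrics` and `rmd_pav` are hypothetical.

```python
import numpy as np
from scipy import stats
from sklearn.covariance import MinCovDet


def run_metrics(X):
    """Per-run metrics for a peptides x runs matrix X of log2 abundances
    (NaN = missing). Columns: mean correlation with the other runs,
    fraction missing, MAD, skewness, excess kurtosis (assumed metric set)."""
    n_runs = X.shape[1]
    missing = np.isnan(X)
    M = np.empty((n_runs, 5))
    for j in range(n_runs):
        obs = X[~missing[:, j], j]
        corrs = []
        for k in range(n_runs):
            if k != j:
                # Pairwise-complete peptides observed in both runs.
                both = ~missing[:, j] & ~missing[:, k]
                corrs.append(np.corrcoef(X[both, j], X[both, k])[0, 1])
        M[j] = [np.mean(corrs), missing[:, j].mean(),
                stats.median_abs_deviation(obs), stats.skew(obs), stats.kurtosis(obs)]
    return M


def rmd_pav(X, p_value=1e-4, random_state=0):
    """Return log2 squared robust distances, the log2 chi-square critical
    value (df = number of metrics), and a boolean outlier flag per run."""
    M = run_metrics(X)
    # MinCovDet.mahalanobis returns *squared* distances.
    d2 = MinCovDet(random_state=random_state).fit(M).mahalanobis(M)
    crit = stats.chi2.ppf(1.0 - p_value, df=M.shape[1])
    return np.log2(d2), np.log2(crit), d2 > crit


# Simulated data: 40 runs sharing peptide-level means, ~10% missing at random.
rng = np.random.default_rng(1)
n_pep, n_runs = 500, 40
base = rng.normal(20.0, 2.0, size=(n_pep, 1))
X = base + rng.normal(0.0, 0.3, size=(n_pep, n_runs))
X[rng.random((n_pep, n_runs)) < 0.10] = np.nan
# Make run 0 a poor-quality run: much noisier and about half missing.
X[:, 0] = base[:, 0] + rng.normal(0.0, 3.0, size=n_pep)
X[rng.random(n_pep) < 0.5, 0] = np.nan

log2_d2, log2_crit, flagged = rmd_pav(X)
print("critical value (log2):", round(float(log2_crit), 2))
print("flagged runs:", np.flatnonzero(flagged).tolist())
```

Plotting `log2_d2` against run index with a horizontal line at `log2_crit` gives a display in the spirit of Fig. 1(a). The log2 scale keeps extreme distances from compressing the rest of the plot.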
Fig. 2. ROC curves from the rMd-PAV and correlation-only outlier analyses of the Calu-3 cell-line experiment.
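Fig. 2 compares the two approaches by ROC curve. The sketch below assumes that the MS expert's suspect/not-suspect calls act as reference labels. It also assumes that each method yields a per-run score where higher means more suspicious, such as the log2 rMd-PAV distance, or 1 minus mean correlation for a correlation-only analysis. Given those, the ROC points and the trapezoidal AUC can be computed as follows. The scores and labels are made up, and the function names are hypothetical.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points from flagging runs with score >= threshold, sweeping
    the threshold down through every distinct observed score.
    labels: 1 = run judged suspect by the reference (e.g. an MS expert)."""
    pos = sum(labels)
    neg = len(labels) - pos
    pts = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        pts.append((fp / neg, tp / pos))
    return pts


def auc(pts):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Hypothetical per-run outlier scores and expert labels.
scores = [5.1, 4.8, 2.0, 1.5, 1.2, 0.9]
labels = [1, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
print(pts)
print(round(auc(pts), 3))
```

Running the same function on each method's scores against the same expert labels gives directly comparable curves, one per method.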
Fig. 3. Cigarette smoke exposure experiment. (a) Box plots of the peptide abundance values observed in each LC-MS run (n=98) for the mouse plasma dataset; color indicates experimental group membership. (b) The rMd-PAV plot of the LC-MS runs. Runs identified as outliers sit above the red horizontal line, which represents the log2(χ²_{0.9999,5}) critical value (i.e. P≤0.0001). Downward triangles mark outlier runs: red indicates that all technical replicates of a biological sample are outliers, and blue indicates individual outlying technical replicates within a sample. (c) The run-by-run (r) correlation plot of the LC-MS runs.
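The red/blue distinction in Fig. 3(b) separates flagged runs into two cases: every technical replicate of a biological sample is an outlier, or only some replicates are. A minimal sketch of that grouping follows, assuming a hypothetical mapping from run IDs to biological sample IDs and a set of already-flagged runs.

```python
from collections import defaultdict


def classify_outlier_runs(run_to_sample, flagged):
    """Label each flagged run 'sample' if every technical replicate of its
    biological sample is flagged, else 'replicate'."""
    runs_by_sample = defaultdict(list)
    for run, sample in run_to_sample.items():
        runs_by_sample[sample].append(run)
    labels = {}
    for runs in runs_by_sample.values():
        hits = [r for r in runs if r in flagged]
        kind = "sample" if len(hits) == len(runs) else "replicate"
        for r in hits:
            labels[r] = kind
    return labels


# Hypothetical design: three biological samples, two technical replicates each.
run_to_sample = {"r1": "S1", "r2": "S1", "r3": "S2", "r4": "S2", "r5": "S3", "r6": "S3"}
flagged = {"r1", "r2", "r4"}
print(classify_outlier_runs(run_to_sample, flagged))
```

The distinction matters downstream. A whole-sample outlier may reflect the biological sample itself, while a lone replicate outlier points more toward a problem with that particular LC-MS run.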
Fig. 4. Cigarette smoke exposure experiment. Score plot of the first two latent variables from the robust PCA (rPCA) of the data. The plot suggests that the labeled runs are outliers because of their fraction of missing peptide abundance values and the skewness and kurtosis of their within-run peptide abundance distributions.
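A score plot like Fig. 4 projects each run's metric vector onto two components. The sketch below is a deliberate simplification and not the rPCA used in the paper. It median-centers each metric, scales it by its normal-consistent MAD, and then applies an ordinary SVD, so the decomposition step itself is not robust. The three metric columns follow the caption (fraction missing, skewness, kurtosis). The values and the function name `robust_scaled_pca_scores` are hypothetical.

```python
import numpy as np


def robust_scaled_pca_scores(M, n_components=2):
    """Scores on the leading components of a runs x metrics matrix M after
    median-centering and MAD-scaling each metric. Simplification: the SVD
    step is classical PCA, not a robust PCA algorithm.
    Assumes no metric has zero MAD."""
    med = np.median(M, axis=0)
    mad = 1.4826 * np.median(np.abs(M - med), axis=0)  # consistent for normal data
    Z = (M - med) / mad
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[:n_components].T


# Hypothetical metrics for 20 runs: fraction missing, skewness, excess kurtosis.
rng = np.random.default_rng(0)
M = np.column_stack([
    rng.normal(0.10, 0.01, 20),
    rng.normal(0.00, 0.10, 20),
    rng.normal(0.00, 0.20, 20),
])
M[5] = [0.45, 1.5, 3.0]  # a hypothetical problem run

scores = robust_scaled_pca_scores(M)
print("run farthest from the bulk:", int(np.argmax(np.linalg.norm(scores, axis=1))))
```

Because the center and scale come from medians, the problem run cannot mask itself by shifting them. A genuinely robust PCA would also keep it from tilting the component directions.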