Smriti R Ramakrishnan, Christine Vogel, Taejoon Kwon, Luiz O Penalva, Edward M Marcotte, Daniel P Miranker.
Abstract
MOTIVATION: High-throughput protein identification experiments based on tandem mass spectrometry (MS/MS) often suffer from low sensitivity and low-confidence protein identifications. In a typical shotgun proteomics experiment, it is assumed that all proteins are equally likely to be present. However, there is often other evidence to suggest that a protein is present, and confidence in individual protein identifications can be updated accordingly.
Year: 2009 PMID: 19633097 PMCID: PMC2773251 DOI: 10.1093/bioinformatics/btp461
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Integrative analysis of MS-based shotgun proteomics and gene functional networks. A complex protein sample, e.g. cellular extract, is enzymatically digested into peptides and subjected to tandem mass spectrometry. Experimental spectra are searched against a database of theoretical spectra generated from protein sequences, or identified via de novo sequencing, using a peptide and protein identification software pipeline that produces a confidence score per protein [e.g. PeptideProphet (Keller et al., 2002) and ProteinProphet (Nesvizhskii et al., 2003)] and a list of high-confidence proteins with scores that satisfy an error threshold (e.g. 5% FDR). We introduce a next stage of computational analysis which places proteins in a broader systems biological framework. MSNet uses protein-protein links from a functional network to identify proteins that may not be identified with high confidence by MS evidence alone, but are nevertheless highly likely to be present as demonstrated by the combination of MS evidence with functional links to other MS-identified proteins. We find that the integrated analysis of mass spectrometry experiments and gene functional networks can improve the precision and sensitivity of protein identification at acceptable error rates.
Datasets and experimental setup
| Dataset | MS/MS experiment | Protein reference set | Number of proteins |
|---|---|---|---|
| YPD-ORBI | Cell lysate from yeast BY4742 wild-type grown in rich medium (YPD) analyzed on LTQ-ORBItrap (8inj) | YPD*: Proteins identified in ≥ 1 of three non-mass spectrometry experiments (Futcher | 3816 |
| YPD-LCQ | Cell lysate from yeast BY4742 wild-type grown in rich medium (YPD) analyzed on LCQ (5inj) | YPD* defined above | 4385 |
| YPD-LCQ-Fraction | Cell lysate, fractionated in polysomal gradient from yeast grown in rich medium (YPD) analyzed on LCQ (3inj) | Known ribosomal, translation and ribosome biogenesis proteins (Nash | 1393 |
| YMD-LCQ | Cell lysate from yeast BY4742 wild-type grown in minimal medium (YMD) analyzed on LCQ (6inj) | YMD*: Proteins identified in at least one of three experiments (de Godoy | 4651 |
| Human-293T, ORBI | HEK293T kidney embryonic cells transfected with GFP lenti-virus vector | No comprehensive reference set available | 1860 |
The protein sample undergoes MS/MS analysis to generate a list of proteins identified by MS/MS identification software. We generate MSNet protein identification scores, on a genome-wide scale, for each protein that has at least one peptide identified in the MS experiment (Number of proteins). When available, we use a protein reference set as ground truth to determine true and false identifications for evaluation. Inj: injection, i.e. technical replicate during the MS/MS experiment; LCQ: LCQ DecaXP+ MS/MS instrument; ORBI: LTQ-OrbiTrap MS/MS instrument.
MSNet performance evaluated with and without a protein reference set
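| Experiment | AUC (using reference set): MS | AUC (using reference set): MSN | AUC: % increase | Number of proteins at 5% FDR (using network shuffling): MS | Number of proteins at 5% FDR (using network shuffling): MSN | Number of proteins at 5% FDR: % increase |
|---|---|---|---|---|---|---|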
| YPD-ORBI | 0.69 | 0.76 | 10 | 1420 | 1835 | 29 |
| YPD-LCQ | 0.55 | 0.68 | 24 | 548 | 591 | 8 |
| YPD-LCQ-Fraction | 0.78 | 0.91 | 17 | 246 | 285 | 16 |
| YMD-LCQ | 0.59 | 0.69 | 17 | 644 | 699 | 9 |
| Human-293T | – | – | – | 877 | [870–1233] | [0–40] |
First, we evaluated the performance of MSNet and the MS experiment using protein reference sets (Table 1), marking an identified protein as a true instance if it was present in the reference set and false otherwise. MSNet increased the AUC by 10–24% across datasets. Next, we evaluated MSNet independent of protein reference sets using a network-shuffling procedure (Section 2.2.2). We computed FDRshuff as the ratio between the cumulative null and true score densities at each score x. MSNet reported 8–29% more protein identifications at 5% FDRshuff in yeast and up to 40% more in human than ProteinProphet (Nesvizhskii et al., 2003) at its 5% FDR. MSN—MSNet, MS—ProteinProphet.
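The FDRshuff computation described above can be illustrated with a minimal sketch. It assumes that "cumulative" means the upper tail (the fraction of scores at or above x), that the null scores come from runs on a shuffled network, and that the ratio is capped at 1; the function names and the toy scores are hypothetical and are not taken from the MSNet software.

```python
from bisect import bisect_left


def upper_tail_fraction(sorted_scores, x):
    """Fraction of values >= x in an ascending-sorted, non-empty list."""
    n = len(sorted_scores)
    return (n - bisect_left(sorted_scores, x)) / n


def fdr_shuff(true_scores, null_scores, x):
    """Ratio of the null to the true upper-tail fraction at score x.

    Assumption: true_scores come from the real network and null_scores
    from a shuffled network; the ratio is capped at 1.
    """
    true_tail = upper_tail_fraction(sorted(true_scores), x)
    null_tail = upper_tail_fraction(sorted(null_scores), x)
    if true_tail == 0.0:
        return 0.0  # no protein is reported at this threshold
    return min(1.0, null_tail / true_tail)


def proteins_at_fdr(true_scores, null_scores, max_fdr=0.05):
    """Largest number of proteins reportable with fdr_shuff <= max_fdr."""
    best = 0
    for x in set(true_scores):
        if fdr_shuff(true_scores, null_scores, x) <= max_fdr:
            best = max(best, sum(s >= x for s in true_scores))
    return best


# Toy scores, invented for illustration only.
true_scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1]
null_scores = [0.5, 0.3, 0.2, 0.1, 0.1, 0.05]
n_reported = proteins_at_fdr(true_scores, null_scores)
```

With these toy values no null score reaches 0.6, so the four proteins scoring at least 0.6 pass the 5% threshold, whereas lowering the cutoff to 0.2 admits half of the null scores and the estimate rises well above 5%.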
Fig. 2. Performance of MSNet on yeast grown in rich medium analyzed on a high-resolution mass spectrometer. (A) At least 94% of proteins identified by MSNet at 5% FDR can be validated either by presence in the protein reference set or by identification in the MS analysis; (B) ROC curves using a protein reference set to determine true and false identifications: MSNet identifies more true instances over a range of FPRs than the original MS experiment and results in a 10% higher AUC; (C) precision–recall curves: MSNet identifies more proteins at high precision (i.e. low FDR) than the MS analysis.
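As an illustration of the reference-set evaluation behind the ROC curves, the sketch below labels each scored protein by membership in a reference set and computes the AUC as the probability that a reference protein outscores a non-reference protein (the rank-based definition of ROC AUC, with ties counted as one half). The protein identifiers, scores and function name are hypothetical, and the pairwise loop is quadratic, so it is meant for small examples only.

```python
def roc_auc(scores, in_reference):
    """ROC AUC as the fraction of (reference, non-reference) pairs
    in which the reference protein has the higher score; ties count 0.5."""
    pos = [s for s, r in zip(scores, in_reference) if r]
    neg = [s for s, r in zip(scores, in_reference) if not r]
    if not pos or not neg:
        raise ValueError("need at least one protein in each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical identifiers and scores, for illustration only.
protein_scores = {"P1": 0.9, "P2": 0.8, "P3": 0.4, "P4": 0.3}
reference_set = {"P1", "P3"}

proteins = list(protein_scores)
labels = [p in reference_set for p in proteins]  # true instance if in reference set
auc = roc_auc([protein_scores[p] for p in proteins], labels)
```

Here three of the four (reference, non-reference) pairs are ordered correctly, so the AUC is 0.75. Because a reference set is rarely complete, proteins marked false in this scheme may still be genuinely present.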
Fig. 3. Protein YBR234C (ARC40) and its immediate neighbors from the yeast gene functional network (Lee et al., 2007). The protein was identified with high confidence by MSNet, but not by the original MS analysis. YBR234C is an essential subunit of the ARP2/3 complex required for the motility and integrity of cortical actin patches, and involved in cell growth and polarity. Deletion of the gene causes notable growth defects (Giaever et al., 2002), a fact that strongly supports its expression. It is also present in the yeast reference set (Table 1, YPD*). MSNet gave YBR234C a high score because it had multiple neighbors that were either confidently identified in the MS experiment (circle) or had some MS evidence (hexagon, ≥1 peptide identified). The other neighbors (square) had no peptides identified. Figures were created using Cytoscape (Shannon et al., 2003).
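The three neighbor categories in the legend can be written as a small rule, sketched below. The 0.9 probability cutoff for a "confident" MS identification is an assumption made only for this example (the text defines confidence through an FDR threshold, not a fixed probability), and the function name and inputs are hypothetical.

```python
def neighbor_shape(ms_probability, n_peptides, confident_cutoff=0.9):
    """Map a neighbor's MS evidence to the node shape used in the legend.

    confident_cutoff is an illustrative assumption, not a value from the paper.
    """
    if ms_probability >= confident_cutoff:
        return "circle"   # confidently identified in the MS experiment
    if n_peptides >= 1:
        return "hexagon"  # some MS evidence: at least one peptide identified
    return "square"       # no peptides identified


# Hypothetical neighbors: (MS probability, number of peptides identified).
shapes = [neighbor_shape(p, k) for p, k in [(0.95, 4), (0.30, 1), (0.0, 0)]]
```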