Gloria M Sheynkman, James E Johnson, Pratik D Jagtap, Michael R Shortreed, Getiria Onsongo, Brian L Frey, Timothy J Griffin, Lloyd M Smith.
Abstract
BACKGROUND: Current practice in mass spectrometry (MS)-based proteomics is to identify peptides by comparing experimental mass spectra with theoretical mass spectra derived from a reference protein database; however, this strategy necessarily fails to detect peptide and protein sequences that are absent from the database. We and others have recently shown that customized proteomic databases derived from RNA-Seq data can be used for MS searching both to improve MS analysis and to identify novel peptides. Although this general strategy is a significant advance for the discovery of novel protein variations, it has not been readily transferable to other laboratories because it requires many specialized software tools. To address this problem, we have implemented readily accessible, modifiable, and extensible workflows within Galaxy-P (Galaxy for Proteomics), a web-based bioinformatic extension of the Galaxy framework for the analysis of multi-omics (e.g. genomics, transcriptomics, proteomics) data.
Year: 2014 PMID: 25149441 PMCID: PMC4158061 DOI: 10.1186/1471-2164-15-703
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Experimental overview. The Galaxy-P workflows take as input sample-specific RNA-Seq data and create sample-specific protein databases. These protein databases are then employed for MS-based proteomics database searching. The workflows were developed on datasets generated from human (Jurkat cells) and mouse (B6 and CAST islets) samples.
Figure 2. Overview of workflows implemented in Galaxy-P that utilize RNA-Seq data for improved proteomics. The single amino acid polymorphism (SAP) database workflow detects non-synonymous SNPs that yield SAPs. The splice database workflow detects alternatively spliced transcripts and the corresponding novel splice junction polypeptide sequences. The reduced database workflow quantifies the sample’s transcriptome, optionally removes likely unexpressed protein sequences, and allows determination of RNA-protein correlations. Post-search tools filter and annotate novel peptides.
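The SAP workflow turns non-synonymous SNPs into protein-level substitutions that an MS search can match. The Python sketch below illustrates the core idea under simplifying assumptions: it applies one SAP to a reference protein, digests the variant in silico with a basic trypsin rule (cleave after K or R, not before P, no missed cleavages), and returns the peptide carrying the substitution. The function names and the FASTA header scheme (`P1_L12P`) are hypothetical and do not reproduce the Galaxy-P tools; storing only the variant peptide rather than the full variant protein is also a simplification.

```python
import re


def tryptic_digest(protein: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    # Zero-width split: the lookbehind/lookahead pattern matches *between* residues.
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", protein) if pep]


def apply_sap(protein: str, position: int, ref_aa: str, alt_aa: str) -> str:
    """Substitute a single residue; `position` is 1-based."""
    idx = position - 1
    if protein[idx] != ref_aa:
        raise ValueError(f"expected {ref_aa} at {position}, found {protein[idx]}")
    return protein[:idx] + alt_aa + protein[idx + 1:]


def sap_peptide(protein: str, position: int, ref_aa: str, alt_aa: str) -> str:
    """Return the tryptic peptide of the variant protein that carries the SAP."""
    variant = apply_sap(protein, position, ref_aa, alt_aa)
    start = 0  # 0-based offset of the current peptide within `variant`
    for pep in tryptic_digest(variant):
        end = start + len(pep)
        if start < position <= end:  # the 1-based SAP position lies in this peptide
            return pep
        start = end
    raise ValueError("position outside protein")


def sap_fasta_entry(protein_id: str, protein: str, position: int,
                    ref_aa: str, alt_aa: str) -> str:
    """Format a FASTA record for the variant peptide (hypothetical header scheme)."""
    pep = sap_peptide(protein, position, ref_aa, alt_aa)
    return f">{protein_id}_{ref_aa}{position}{alt_aa}\n{pep}"


ref = "MPEPTIDEKAGLSNVRSTQK"  # toy reference protein
print(sap_fasta_entry("P1", ref, 12, "L", "P"))  # prints '>P1_L12P' then 'AGPSNVR'
print(sap_peptide(ref, 12, "L", "K"))            # AGK (the new K adds a cleavage site)
```

Because the digest runs on the variant sequence, a SAP that creates or removes a K or R also shifts the peptide boundaries, as the `L12K` call shows.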
Figure 3. Comparison of score distributions for all peptides identified in the search versus peptides containing SAPs. For Jurkat cells, the distributions of SEQUEST XCorr scores for peptides passing a 1% false discovery rate (FDR) were compared between 1) peptides mapping to the Ensembl reference proteome and 2) peptides containing single amino acid polymorphisms (SAPs) derived from the sample-matched RNA-Seq data. SAP-containing peptides had, on average, higher peptide-spectrum match (PSM) quality scores than reference peptides, attesting to the high quality of the sample-specific SAP database used for MS searching.
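Figure 3 and the tables report peptides passing a 1% FDR, but the estimation procedure is not described in this summary. A widely used approach is target-decoy searching, sketched below in Python as an illustration rather than the authors' exact procedure: PSMs are ranked by score, the FDR at each cutoff is estimated as decoys divided by targets, and target PSMs whose q-value is at most 1% are kept. The `PSM` class and `accept_at_fdr` function are hypothetical; score ties and the local FDR used for the splice table are not handled.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    peptide: str
    score: float    # e.g. a SEQUEST XCorr; higher means a better match
    is_decoy: bool  # True if the match came from a decoy (e.g. reversed) sequence


def accept_at_fdr(psms: list[PSM], max_fdr: float = 0.01) -> list[PSM]:
    """Return target PSMs whose q-value is <= max_fdr (target-decoy estimate)."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    # Estimated FDR at each score cutoff: decoys seen so far / targets seen so far.
    fdrs = []
    targets = decoys = 0
    for psm in ranked:
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / targets if targets else float("inf"))
    # q-value: the smallest FDR at this cutoff or at any lower-scoring cutoff.
    qvals = [0.0] * len(fdrs)
    best = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        best = min(best, fdrs[i])
        qvals[i] = best
    return [p for p, q in zip(ranked, qvals) if not p.is_decoy and q <= max_fdr]


psms = [PSM("AGPSNVR", 5.1, False), PSM("MPEPTIDEK", 4.2, False),
        PSM("KVRQEDLK", 3.0, True), PSM("STQKLLR", 2.4, False)]
print([p.peptide for p in accept_at_fdr(psms)])  # ['AGPSNVR', 'MPEPTIDEK']
```

In this toy input, the decoy at rank three pushes the q-value of every lower-scoring target above 1%, so only the two top-ranked targets survive.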
Results from creating SAP databases and using them for searching proteomic datasets
| Sample | SAP database: SAPs | SAP database: SNP sites | Identified SAP peptides* | Identified SNPs |
|---|---|---|---|---|
| Jurkat human cells | 9,168 | 6,924 | 522 | 491 |
| B6 mouse islets | 1 | 1 | N/A | N/A |
| CAST mouse islets | 476 | 249 | 22 | 19 |
*Peptides passing a 1% FDR.
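In the SAP table, fewer SNPs than SAP peptides are identified (for example, 491 versus 522 in Jurkat cells). One way this can happen is that several identified peptides, such as a peptide and its missed-cleavage form, cover the same SNP site. The Python sketch below groups identified peptides by the site or sites they cover; the function name, peptide sequences, and genomic coordinates are all hypothetical.

```python
from collections import defaultdict

Site = tuple[str, int]  # (chromosome, position); a hypothetical site key


def peptides_by_snp_site(peptide_sites: dict[str, set[Site]]) -> dict[Site, set[str]]:
    """Invert peptide -> covered SNP sites into SNP site -> supporting peptides."""
    by_site: dict[Site, set[str]] = defaultdict(set)
    for peptide, covered in peptide_sites.items():
        for site in covered:
            by_site[site].add(peptide)
    return dict(by_site)


# Hypothetical identifications: two peptides (one with a missed cleavage)
# cover the same SNP site, so three peptides map to only two sites.
identified = {
    "AGPSNVR": {("chr1", 1_000_123)},
    "KAGPSNVR": {("chr1", 1_000_123)},
    "LLDQVFK": {("chr7", 55_200_010)},
}
sites = peptides_by_snp_site(identified)
print(len(identified), "SAP peptides ->", len(sites), "SNP sites")  # 3 SAP peptides -> 2 SNP sites
```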
Results from creating splice junction databases and using them for searching proteomic datasets
| Sample | Splice database: size | Splice database: min. depth | Peptide IDs* |
|---|---|---|---|
| Jurkat human cells | 33,372 | 6 | 67 |
| B6 mouse islets | 57,587 | 4 | 64 |
| CAST mouse islets | 43,244 | 4 | 66 |
*Peptides passing a 1% local FDR.
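The splice workflow translates sequences that span novel splice junctions. As a rough sketch, assuming that "Min. depth" in the table above is the minimum number of RNA-Seq reads required to support a junction (an interpretation that this summary does not state), the Python code below keeps sufficiently supported junctions and translates the exonic sequence on either side of each one. The function names, junction IDs, flank sequences, and read counts are invented; a real workflow would also need to resolve the reading frame, which here is passed in as a fixed `frame` argument.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(nt: str, frame: int = 0) -> str:
    """Translate from `frame`, stopping at the first stop codon; unknown codons become X."""
    protein = []
    for i in range(frame, len(nt) - 2, 3):  # only complete codons
        aa = CODON_TABLE.get(nt[i:i + 3].upper(), "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def junction_peptides(junctions: dict[str, tuple[str, str, int]],
                      min_depth: int, frame: int = 0) -> dict[str, str]:
    """Keep junctions with >= min_depth supporting reads and translate across them.

    Each value is (donor_flank, acceptor_flank, supporting_reads), where the flanks
    are the exonic nucleotides just upstream and downstream of the junction.
    """
    result = {}
    for jid, (donor_flank, acceptor_flank, reads) in junctions.items():
        if reads >= min_depth:
            result[jid] = translate(donor_flank + acceptor_flank, frame)
    return result


junctions = {
    "j1": ("ATGGCC", "AAAGGG", 7),  # passes a depth cutoff of 4
    "j2": ("ATGGAA", "TGGTAA", 3),  # too few reads, dropped
}
print(junction_peptides(junctions, min_depth=4))  # {'j1': 'MAKG'}
```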
Results from MS searching with the original Ensembl protein database and the reduced database
| Sample | RNA-Seq reads | Mass spectra | Original database: # entries | Original database: peptide IDs* | Reduced database: # entries | Reduced database: peptide IDs* | % increase |
|---|---|---|---|---|---|---|---|
| Jurkat human cells | 80 M | 500 K | 104,310 | 73,123 | 82,101 | 73,436 | 0.4 |
| B6 mouse islets | 94 M | 250 K | 52,165 | 30,212 | 18,052 | 30,220 | 0.3 |
| CAST mouse islets | 126 M | 250 K | 52,165 | 28,902 | 16,940 | 28,823 | 0.2 |
*Peptides passing a 1% FDR.
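The reduced-database workflow quantifies the transcriptome and can remove protein sequences that appear unexpressed. A minimal Python sketch follows, assuming an FPKM-style expression value per protein and a hypothetical cutoff of 1.0 (the threshold behind the table is not given here). It also computes the share of the original database that each sample's reduced database kept, using the entry counts in the table. The function names are illustrative only.

```python
def reduce_database(proteins: dict[str, str], expression: dict[str, float],
                    min_fpkm: float = 1.0) -> dict[str, str]:
    """Drop protein entries whose transcript expression is below `min_fpkm`.

    Proteins with no expression estimate are treated as unexpressed (0.0).
    """
    return {pid: seq for pid, seq in proteins.items()
            if expression.get(pid, 0.0) >= min_fpkm}


def percent_retained(original_entries: int, reduced_entries: int) -> float:
    """Share of the original database kept after reduction, in percent."""
    return 100.0 * reduced_entries / original_entries


proteins = {"P1": "MPEPTIDEK", "P2": "MAGLSNVR", "P3": "MSTQK"}
expression = {"P1": 12.5, "P2": 0.3}  # hypothetical FPKM values; P3 has none
print(sorted(reduce_database(proteins, expression)))  # ['P1']

# Entry counts from the table above (original -> reduced database).
for sample, orig, red in [("Jurkat", 104_310, 82_101),
                          ("B6", 52_165, 18_052),
                          ("CAST", 52_165, 16_940)]:
    print(sample, f"{percent_retained(orig, red):.1f}%")
```

By this arithmetic the reduced databases kept about 78.7% (Jurkat), 34.6% (B6), and 32.5% (CAST) of the original entries.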