Bo Wen¹, Shaohang Xu¹, Ruo Zhou¹, Bing Zhang², Xiaojing Wang², Xin Liu¹, Xun Xu¹, Siqi Liu³,⁴.
Abstract
BACKGROUND: Peptide identification by mass spectrometry (MS) is generally achieved by comparing experimental mass spectra with the peptides produced by theoretical (in silico) digestion of a reference protein database. This strategy cannot identify peptide or protein sequences that are absent from the reference database. A customized protein database built from RNA-Seq data has therefore been proposed to assist with and improve the identification of novel peptides. Correspondingly, a comprehensive pipeline that provides an end-to-end solution for novel peptide detection with such a customized protein database is needed.
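The database-search principle described above, and why a sequence missing from the reference database can never be reported, can be sketched as follows. This is a simplified illustration, not the implementation used by PGA or MASCOT: it assumes fully tryptic cleavage with no missed cleavages or modifications and matches on neutral precursor mass only (real search engines also score fragment ions). The function names and protein sequences are hypothetical.

```python
# Simplified sketch of database search by precursor-mass matching.
# Assumptions (hypothetical, for illustration only): fully tryptic peptides,
# no missed cleavages, no modifications, no fragment-ion scoring.

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the termini


def tryptic_digest(protein, min_len=6):
    """Cleave after K or R unless the next residue is P; keep peptides >= min_len."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return [p for p in peptides if len(p) >= min_len]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_precursor(observed_mass, database, tol_ppm=10.0):
    """Return database peptides whose theoretical mass lies within tol_ppm of the observed mass."""
    hits = []
    for protein in database.values():
        for pep in tryptic_digest(protein):
            theo = peptide_mass(pep)
            if abs(observed_mass - theo) / theo * 1e6 <= tol_ppm:
                hits.append(pep)
    return hits


reference = {"ref1": "MAGICPEPTIDEKSAMPLER"}         # hypothetical reference entry
customized = {**reference, "novel1": "NEWPEPTIDEK"}  # plus a hypothetical RNA-Seq-derived sequence
observed = peptide_mass("NEWPEPTIDEK")               # spectrum from a peptide absent from the reference
print(match_precursor(observed, reference))   # []
print(match_precursor(observed, customized))  # ['NEWPEPTIDEK']
```

The only difference between the two searches is the database: the same observed mass finds no candidate in the reference entries but is matched once the RNA-Seq-derived sequence is added.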
Keywords: MS/MS; Peptide identification; Proteogenomics; Proteomics; RNA-Seq
Year: 2016 PMID: 27316337 PMCID: PMC4912784 DOI: 10.1186/s12859-016-1133-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Schematic overview of the PGA package
Fig. 2 A pie chart illustrating the results of novel peptide identification
Fig. 3 Search score distribution of novel and canonical peptides. The E-value is an expectation value used to evaluate PSM confidence; in this study it was obtained directly from the MASCOT search results. The larger the value of -log2(Evalue), the higher the confidence in the identification.
Fig. 4 The overlap of peptides identified by searching the customized protein database and the reference database
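The score plotted in Fig. 3 is a simple transform of the MASCOT E-value. A minimal sketch of that transform, and of splitting scores into canonical and novel groups for plotting, is shown below; the function names and the (E-value, is_novel) input layout are hypothetical, and the E-values are assumed to come from already parsed search results.

```python
import math


def evalue_to_score(evalue: float) -> float:
    """Convert a PSM E-value to -log2(E-value); smaller E-values give larger scores."""
    if evalue <= 0:
        raise ValueError("E-value must be positive")
    return -math.log2(evalue)


def split_scores(psms):
    """Split hypothetical (evalue, is_novel) pairs into canonical and novel score lists."""
    canonical, novel = [], []
    for evalue, is_novel in psms:
        (novel if is_novel else canonical).append(evalue_to_score(evalue))
    return canonical, novel
```

For example, an E-value of 0.125 maps to a score of 3.0, and every tenfold decrease in E-value raises the score by about 3.32 (log2 10).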
Numbers of assembled transcripts and identified peptides at different numbers of input reads for Trinity
| No. of input reads | No. of transcripts (>200 bp) | No. of identified peptides (FDR ≤ 1 %) |
|---|---|---|
| 5,602,829 | 56,809 | 65,841 |
| 12,568,426 | 99,797 | 68,473 |
| 28,886,097 | 174,587 | 69,038 |
| 47,136,664 | 233,256 | 68,900 |
| 81,871,805 | 305,653 | 68,236 |
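The table counts peptides accepted at FDR ≤ 1 %, but this record does not state how FDR was estimated. A common approach in database searching is the target-decoy strategy, which the sketch below assumes: FDR at each score cutoff is estimated as (decoy hits) / (target hits), converted to q-values, and targets with q ≤ 0.01 are kept. The function names, the (score, is_decoy) input, and the decoy/target ratio are assumptions for illustration, not necessarily what PGA implements; tied scores are not treated specially.

```python
def target_decoy_qvalues(psms):
    """Rank hypothetical (score, is_decoy) PSMs by descending score and return (ranked, q-values).

    FDR at each cutoff is estimated as decoys / targets at or above that score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / targets if targets else 1.0)
    # q-value: the lowest FDR among this cutoff and every less stringent one.
    qvalues, running_min = [0.0] * len(fdrs), float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min
    return ranked, qvalues


def accept_at_fdr(psms, threshold=0.01):
    """Return the scores of target PSMs whose q-value is at most threshold."""
    ranked, qvalues = target_decoy_qvalues(psms)
    return [score for (score, is_decoy), q in zip(ranked, qvalues)
            if not is_decoy and q <= threshold]
```

Using q-values rather than the raw FDR at each position keeps the accepted set monotone: a PSM is accepted whenever some cutoff that includes it meets the threshold.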
Fig. 5 The overlap of peptides identified by the two workflows