Ziwei Li, Bo He, Qiang Kou, Zhe Wang, Si Wu, Yunlong Liu, Weixing Feng, Xiaowen Liu.
Abstract
BACKGROUND: Top-down mass spectrometry has unique advantages in identifying proteoforms with multiple post-translational modifications and/or unknown alterations. Most software tools in this area search top-down mass spectra against a protein sequence database for proteoform identification. When no proteome sequence database is available for the species studied in a mass spectrometry experiment, a homologous protein sequence database can be used for proteoform identification. The accuracy of the homologous protein sequences affects both the sensitivity of proteoform identification and the accuracy of mass shift localization.
Keywords: Homologous protein database; Mass spectrometry; Top-down
Year: 2018 PMID: 30591035 PMCID: PMC6309053 DOI: 10.1186/s12859-018-2462-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Comparison of the theoretical spectrum (top) of an unmodified protein sequence EPPLSQETFS and the theoretical spectrum (bottom) of a modified proteoform EPPLS[phosphorylation]QETFS, in which the first serine residue is phosphorylated. Only N-terminal fragment peaks are included in the theoretical spectra to simplify the comparison. The fragment peaks in the box are shifted to the right by 79.97 Da in the bottom spectrum because of the phosphorylation
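The shift described in the Fig. 1 caption can be reproduced with a minimal sketch. It assumes standard monoisotopic residue masses and represents each N-terminal fragment simply as the sum of its residue masses, with no ion-type or charge offset. The function name `prefix_masses` and the `shifts` argument are hypothetical and are not part of any tool named in this record.

```python
# Monoisotopic residue masses (Da) for the residues in EPPLSQETFS.
RESIDUE_MASS = {
    "E": 129.04259, "P": 97.05276, "L": 113.08406, "S": 87.03203,
    "Q": 128.05858, "T": 101.04768, "F": 147.06841,
}
PHOSPHO_SHIFT = 79.97  # mass shift quoted in the Fig. 1 caption

def prefix_masses(sequence, shifts=None):
    """Return residue-mass sums of all proper N-terminal prefixes.

    shifts maps a 0-based residue index to a mass shift in Da
    (hypothetical helper argument, used here for illustration only).
    """
    shifts = shifts or {}
    masses, total = [], 0.0
    for i, aa in enumerate(sequence[:-1]):
        total += RESIDUE_MASS[aa] + shifts.get(i, 0.0)
        masses.append(total)
    return masses

unmodified = prefix_masses("EPPLSQETFS")
modified = prefix_masses("EPPLSQETFS", {4: PHOSPHO_SHIFT})  # first serine

# Difference between corresponding fragment masses in the two spectra.
deltas = [round(m - u, 2) for u, m in zip(unmodified, modified)]
print(deltas)
```

Under these assumptions, the four prefixes that end before the modified serine are unchanged, and every prefix that contains it is shifted by 79.97 Da, which matches the boxed peaks in the figure.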
Fig. 2 Outline of the experimental design. Raw MS data are converted to deconvoluted mass spectra, which are further searched against the K12 and ISC11 proteome databases separately. A K12 protein segment is obtained from each K12 proteoform identified from the K12 proteome database and searched against the ISC11 proteome database to find the best homologous ISC11 protein by BLAST-P. Then a global-local alignment between the homologous ISC11 protein sequence and the K12 protein segment is used to find the best-scoring homologous protein segment. Finally, homologous protein segments and ISC11 protein segments identified from the ISC11 proteome database are compared to evaluate the accuracy of the ISC11 protein segments, and ISC11 proteoforms are compared with K12 proteoforms to evaluate the accuracy of mass shift localization
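The global-local alignment step in the Fig. 2 caption can be illustrated with a small dynamic-programming sketch: the whole segment must be aligned (global), while the protein may contribute any substring (local). The scoring values (match +1, mismatch -1, linear gap -1) and the function name `fit_align` are assumptions made for illustration; the record does not state which scoring scheme or implementation the authors used.

```python
def fit_align(segment, protein, match=1, mismatch=-1, gap=-1):
    """Align all of `segment` to the best-scoring substring of `protein`.

    Returns (score, start, end) so that protein[start:end] is the
    homologous segment. The scoring parameters are illustrative assumptions.
    """
    n, m = len(segment), len(protein)
    # score[i][j]: best score of segment[:i] against a substring ending at j.
    # start[i][j]: index in protein where that substring begins.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    start = [list(range(m + 1)) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # segment residues aligned to gaps
        start[i][0] = 0
        for j in range(1, m + 1):
            same = segment[i - 1] == protein[j - 1]
            diag = score[i - 1][j - 1] + (match if same else mismatch)
            up = score[i - 1][j] + gap    # segment residue against a gap
            left = score[i][j - 1] + gap  # protein residue against a gap
            best = max(diag, up, left)
            score[i][j] = best
            if best == diag:
                start[i][j] = start[i - 1][j - 1]
            elif best == up:
                start[i][j] = start[i - 1][j]
            else:
                start[i][j] = start[i][j - 1]
    # The alignment may end anywhere in the protein without penalty.
    end = max(range(m + 1), key=lambda j: score[n][j])
    return score[n][end], start[n][end], end

score, begin, end = fit_align("PLSQ", "AAEPPLSQETFS")
print(score, "AAEPPLSQETFS"[begin:end])
```

Leaving the first row at zero and taking the maximum over the last row is what makes the protein side local: residues before and after the matched region carry no penalty, so the returned substring is the candidate homologous protein segment.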
Fig. 3 The histogram of the sequence identities between the 3769 protein sequences in the K12 proteome database and their corresponding homologous sequences reported by BLAST-P with a 0.01 E-value cutoff from the ISC11 proteome database
Fig. 4 Comparison of the numbers of spectra identified by TopPIC with a 1% spectrum-level FDR using a proteome database of the target species and a homologous proteome database. (a) The EC data set; (b) the MCF-7 data set
Fig. 5 The CP and CS rates for the spectra in the 5 perfect subgroups G0, G1, G2, G3, G4 and in the 5 mass shift subgroups H0, H1, H2, H3, H4. (a) The perfect group; (b) the mass shift group