Construction and assessment of individualized proteogenomic databases for large-scale analysis of nonsynonymous single nucleotide variants.

Karsten Krug, Sasa Popic, Alejandro Carpy, Christoph Taumer, Boris Macek.

Abstract

Next-generation sequencing projects focusing on genomes and transcriptomes identify millions of single nucleotide variants (SNVs), many of which result in single amino acid substitutions. These nonsynonymous (ns) SNVs are typically not incorporated into the protein sequence databases used to interpret MS/MS data. Here, we perform a comparative analysis of the assembly of nsSNV-containing proteogenomic databases. We use a comprehensive transcriptome and proteome dataset of HeLa cells from the literature to derive SNVs, to incorporate them into databases applicable to proteomics search engines, and to assess the performance of these databases in the identification of nsSNVs. We assemble the databases by (1) translation of SNV-containing transcripts into all possible reading frames, (2) translation of the predicted reading frame, and (3) prediction of nsSNVs and their subsequent incorporation into canonical protein sequences. We show substantial differences between the generated databases in terms of represented nsSNVs and theoretical search space, which affect the sensitivity and specificity of the database search. We query the databases with >2.2M high-resolution MS/MS spectra using MaxQuant software and identify 451 variant peptides, containing 401 nsSNVs. We conclude that prediction of the reading frame and, if applicable, of the SNV effect results in comprehensive yet compact databases, which are necessary to retain sensitivity in large-scale analysis of nsSNVs called from transcriptomics data.
© 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
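
The three assembly strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy transcript, and the choice of frame 0 as the "predicted" reading frame are all assumptions made for illustration only.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(nucleotides):
    """Translate from the first base; '*' marks a stop codon."""
    usable = len(nucleotides) - len(nucleotides) % 3
    return "".join(CODON_TABLE[nucleotides[i:i + 3]] for i in range(0, usable, 3))


def apply_snv(transcript, index, alt_base):
    """Return the transcript with the base at a 0-based index replaced."""
    return transcript[:index] + alt_base + transcript[index + 1:]


def six_frame_translations(transcript):
    """Translate three offsets on the forward and reverse-complement strands."""
    reverse = transcript.translate(COMPLEMENT)[::-1]
    return [
        translate(strand[offset:])
        for strand in (transcript, reverse)
        for offset in range(3)
    ]


def substitute_residue(protein, position, ref_aa, alt_aa):
    """Apply a predicted amino acid change at a 1-based protein position."""
    if protein[position - 1] != ref_aa:
        raise ValueError("reference residue does not match the protein sequence")
    return protein[:position - 1] + alt_aa + protein[position:]


# Invented toy transcript (hypothetical data): ATG GCC AAG TAA.
reference = "ATGGCCAAGTAA"
# Hypothetical SNV C>T at index 4 changes codon GCC to GTC.
variant = apply_snv(reference, 4, "T")

# Strategy 1: all possible reading frames of the SNV-containing transcript.
db_all_frames = six_frame_translations(variant)
# Strategy 2: only the predicted frame (assumed here to be frame 0).
db_predicted_frame = [translate(variant)]
# Strategy 3: predicted nsSNV effect applied to the canonical protein sequence.
db_canonical = [substitute_residue("MAK", 2, "A", "V")]
```

In this toy case the all-frames strategy produces six entries per transcript where the other two produce one, which is the kind of difference in theoretical search space the abstract refers to.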

Keywords:  Bioinformatics; Database; Individualized proteomics; Proteogenomics; RNA-Seq; Single nucleotide variant

Year:  2014       PMID: 25251379       DOI: 10.1002/pmic.201400219

Source DB:  PubMed       Journal:  Proteomics       ISSN: 1615-9853       Impact factor: 3.984


  6 in total

Review 1.  Methods, Tools and Current Perspectives in Proteogenomics.

Authors:  Kelly V Ruggles; Karsten Krug; Xiaojing Wang; Karl R Clauser; Jing Wang; Samuel H Payne; David Fenyö; Bing Zhang; D R Mani
Journal:  Mol Cell Proteomics       Date:  2017-04-29       Impact factor: 5.911

2.  Single Amino Acid Variant Profiles of Subpopulations in the MCF-7 Breast Cancer Cell Line.

Authors:  Zhijing Tan; Song Nie; Sean P McDermott; Max S Wicha; David M Lubman
Journal:  J Proteome Res       Date:  2017-01-20       Impact factor: 4.466

3.  Integrating Next-Generation Genomic Sequencing and Mass Spectrometry To Estimate Allele-Specific Protein Abundance in Human Brain.

Authors:  Thomas S Wingo; Duc M Duong; Maotian Zhou; Eric B Dammer; Hao Wu; David J Cutler; James J Lah; Allan I Levey; Nicholas T Seyfried
Journal:  J Proteome Res       Date:  2017-08-09       Impact factor: 4.466

Review 4.  Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation.

Authors:  Gloria M Sheynkman; Michael R Shortreed; Anthony J Cesnik; Lloyd M Smith
Journal:  Annu Rev Anal Chem (Palo Alto Calif)       Date:  2016-03-30       Impact factor: 10.745

5.  MSProGene: integrative proteogenomics beyond six-frames and single nucleotide polymorphisms.

Authors:  Franziska Zickmann; Bernhard Y Renard
Journal:  Bioinformatics       Date:  2015-06-15       Impact factor: 6.937

6.  ProteomeGenerator: A Framework for Comprehensive Proteomics Based on de Novo Transcriptome Assembly and High-Accuracy Peptide Mass Spectral Matching.

Authors:  Paolo Cifani; Avantika Dhabaria; Zining Chen; Akihide Yoshimi; Emily Kawaler; Omar Abdel-Wahab; John T Poirier; Alex Kentsis
Journal:  J Proteome Res       Date:  2018-10-19       Impact factor: 4.466

