
A workflow to identify novel proteins based on the direct mapping of peptide-spectrum-matches to genomic locations.

John Anders, Hannes Petruschke, Nico Jehmlich, Sven-Bastiaan Haange, Martin von Bergen, Peter F Stadler.

Abstract

BACKGROUND: Small proteins have received increasing attention in recent years. In particular, they have been implicated as signals contributing to the coordination of bacterial communities. In genome annotations they are often missing or hidden among large numbers of hypothetical proteins, because annotation pipelines often exclude short open reading frames or over-predict hypothetical proteins based on simple models. The validation of novel proteins, and in particular of small proteins (sProteins), therefore requires additional evidence. Proteogenomics is considered the gold standard for this purpose. It extends beyond established annotations and includes all possible open reading frames (ORFs) as potential sources of peptides, thus allowing the discovery of novel, unannotated proteins. Typically, this yields large numbers of putative novel small proteins, a large fraction of which are false-positive predictions.
RESULTS: We observe that the number and quality of the peptide-spectrum matches (PSMs) that map to a candidate ORF can be highly informative for distinguishing proteins from spurious ORF annotations. We report here on a workflow that aggregates PSM quality information and local context into simple descriptors and reliably separates likely proteins from the large pool of false-positive, i.e., most likely untranslated, ORFs. We investigated the artificial gut microbiome model SIHUMIx, comprising eight different species, for which we validate 5114 proteins that have previously been annotated only as hypothetical ORFs. In addition, we identified 37 non-annotated protein candidates for which we found evidence at the proteomic and transcriptomic level. About half (19) of these candidates have close functional homologs in other species. Another 12 candidates have homologs designated as hypothetical proteins in other species. The remaining six candidates are short (< 100 AA) and are most likely bona fide novel proteins.
CONCLUSIONS: The aggregation of PSM quality information for predicted ORFs provides a robust and efficient method to identify novel proteins in proteomics data. The workflow is in particular capable of identifying small proteins and frameshift variants. Since PSMs are explicitly mapped to genomic locations, it furthermore facilitates the integration of transcriptomics data and other sources of genome-level information.
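
ILLUSTRATION: The aggregation idea described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the `PSM` and `ORF` records, the function `aggregate_psms`, the half-open genomic coordinates, the "higher score is better" convention, and the choice of descriptors (PSM count, unique peptide count, mean and best score) are all hypothetical and chosen only to show how PSMs mapped to genomic locations might be summarized per candidate ORF.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class PSM:
    """Hypothetical record: a peptide-spectrum match mapped to the genome."""
    contig: str
    start: int      # 0-based, inclusive genomic start
    end: int        # exclusive genomic end
    strand: str     # "+" or "-"
    peptide: str
    score: float    # assumed convention: higher means a better match


@dataclass(frozen=True)
class ORF:
    """Hypothetical record: a candidate open reading frame."""
    orf_id: str
    contig: str
    start: int
    end: int
    strand: str


def in_frame(orf: ORF, psm: PSM) -> bool:
    """True if the PSM lies inside the ORF and shares its reading frame."""
    if not (orf.start <= psm.start and psm.end <= orf.end):
        return False
    if orf.strand == "+":
        return (psm.start - orf.start) % 3 == 0
    # On the minus strand, translation starts at the ORF's genomic end.
    return (orf.end - psm.end) % 3 == 0


def aggregate_psms(orfs, psms):
    """Summarize the PSM evidence for each ORF as simple descriptors."""
    by_location = defaultdict(list)
    for psm in psms:
        by_location[(psm.contig, psm.strand)].append(psm)

    descriptors = {}
    for orf in orfs:
        candidates = by_location.get((orf.contig, orf.strand), [])
        hits = [p for p in candidates if in_frame(orf, p)]
        scores = [p.score for p in hits]
        descriptors[orf.orf_id] = {
            "n_psms": len(hits),
            "n_unique_peptides": len({p.peptide for p in hits}),
            "mean_score": mean(scores) if scores else None,
            "best_score": max(scores) if scores else None,
        }
    return descriptors
```

In such a sketch, an ORF supported by several high-scoring PSMs from distinct peptides would be ranked above one supported by a single low-scoring match; any threshold used to separate the two groups would be a further assumption not given in the abstract.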

Keywords:  Metaproteogenomics; Microbial communities; Peptide-spectrum matches; Small proteins

Year:  2021        PMID: 34039272     DOI: 10.1186/s12859-021-04159-8

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


