Enhancing Top-Down Proteomics Data Analysis by Combining Deconvolution Results through a Machine Learning Strategy.

Sean J McIlwain, Zhijie Wu, Molly Wetzel, Daniel Belongia, Yutong Jin, Kent Wenger, Irene M Ong, Ying Ge.   

Abstract

Top-down mass spectrometry (MS) is a powerful tool for the identification and comprehensive characterization of proteoforms arising from alternative splicing, sequence variation, and post-translational modifications. However, the complex data sets generated by top-down MS experiments require multiple sequential data processing steps before proteoforms can be identified and characterized. One critical step is the deconvolution of the complex isotopic distributions that arise from naturally occurring isotopes. Multiple algorithms are currently available to deconvolute top-down mass spectra, and they produce different deconvoluted peak lists whose accuracy varies when compared to true positive annotations. In this study, we designed a machine learning strategy that can process and combine the peak lists from different deconvolution results. By optimizing clustering results, deconvolution results from the THRASH, TopFD, MS-Deconv, and SNAP algorithms were combined into consensus peak lists at various thresholds using either a simple voting ensemble method or a random forest machine learning algorithm. The random forest algorithm had the better predictive performance: on average, its consensus peak lists achieved a recall (true positive rate) of 0.60 and a precision (positive predictive value) of 0.78. This outperformed the single best algorithm, which achieved a recall of only 0.47 and a precision of 0.58. This machine learning strategy enhanced the accuracy and confidence in protein identification during database searches by accelerating the detection of true positive peaks while filtering out false positive peaks. Thus, this method shows promise in enhancing proteoform identification and characterization for high-throughput data analysis in top-down proteomics.
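
The simple voting ensemble mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not describe the clustering procedure, the thresholds, or the random forest features, so the greedy ppm-based clustering, the `ppm_tol` and `min_votes` parameters, the `Peak` type, and the toy masses below are all assumptions made for illustration. The random forest variant is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mass: float      # monoisotopic mass in Da (hypothetical toy values below)
    algorithm: str   # deconvolution tool that reported the peak


def cluster_peaks(peaks, ppm_tol=10.0):
    """Greedy 1-D clustering (an assumed scheme): sorted peaks join the
    current cluster while within ppm_tol of that cluster's lowest mass."""
    clusters = []
    for peak in sorted(peaks, key=lambda p: p.mass):
        if clusters:
            anchor = clusters[-1][0].mass
            if (peak.mass - anchor) / anchor * 1e6 <= ppm_tol:
                clusters[-1].append(peak)
                continue
        clusters.append([peak])
    return clusters


def vote_consensus(peaks, min_votes=2, ppm_tol=10.0):
    """Mean mass of every cluster reported by at least min_votes distinct tools."""
    consensus = []
    for cluster in cluster_peaks(peaks, ppm_tol):
        votes = len({p.algorithm for p in cluster})
        if votes >= min_votes:
            consensus.append(sum(p.mass for p in cluster) / len(cluster))
    return consensus


def precision_recall(predicted, truth, ppm_tol=10.0):
    """Precision and recall of predicted masses against annotated true masses."""
    def matches(a, b):
        return abs(a - b) / b * 1e6 <= ppm_tol

    hit_pred = sum(any(matches(m, t) for t in truth) for m in predicted)
    hit_truth = sum(any(matches(m, t) for m in predicted) for t in truth)
    precision = hit_pred / len(predicted) if predicted else 0.0
    recall = hit_truth / len(truth) if truth else 0.0
    return precision, recall


# Toy input: peak lists from four tools, pooled into one list.
peaks = [
    Peak(1000.000, "THRASH"), Peak(1000.002, "TopFD"), Peak(1000.004, "MS-Deconv"),
    Peak(2000.000, "THRASH"), Peak(2000.010, "SNAP"),
    Peak(3000.000, "TopFD"),
    Peak(4000.000, "SNAP"), Peak(4000.001, "MS-Deconv"),
]
truth = [1000.001, 2000.005, 5000.0]  # hypothetical annotated true peaks

for threshold in (1, 2, 3):
    consensus = vote_consensus(peaks, min_votes=threshold)
    print(threshold, len(consensus), precision_recall(consensus, truth))
```

Raising `min_votes` in this toy example trades recall for precision, which is the kind of threshold sweep the abstract refers to when it mentions consensus peak lists "at various thresholds".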

Keywords:  machine learning ensemble; top-down mass spectrometry

Year:  2020        PMID: 32223200      PMCID: PMC7909725          DOI: 10.1021/jasms.0c00035

Source DB:  PubMed          Journal:  J Am Soc Mass Spectrom        ISSN: 1044-0305            Impact factor:   3.262


  4 in total

1.  MASH Explorer: A Universal Software Environment for Top-Down Proteomics.

Authors:  Zhijie Wu; David S Roberts; Jake A Melby; Kent Wenger; Molly Wetzel; Yiwen Gu; Sudharshanan Govindaraj Ramanathan; Elizabeth F Bayne; Xiaowen Liu; Ruixiang Sun; Irene M Ong; Sean J McIlwain; Ying Ge
Journal:  J Proteome Res       Date:  2020-08-24       Impact factor: 4.466

2.  Top-down proteomics: challenges, innovations, and applications in basic and clinical research.

Authors:  Kyle A Brown; Jake A Melby; David S Roberts; Ying Ge
Journal:  Expert Rev Proteomics       Date:  2020-12-17       Impact factor: 3.940

Review 3.  Novel Strategies to Address the Challenges in Top-Down Proteomics.

Authors:  Jake A Melby; David S Roberts; Eli J Larson; Kyle A Brown; Elizabeth F Bayne; Song Jin; Ying Ge
Journal:  J Am Soc Mass Spectrom       Date:  2021-05-13       Impact factor: 3.109

4.  Mass Spectrometry-Based Structural Analysis of Cysteine-Rich Metal-Binding Sites in Proteins with MetaOdysseus R Software.

Authors:  Manuel David Peris-Díaz; Roman Guran; Ondrej Zitka; Vojtech Adam; Artur Krężel
Journal:  J Proteome Res       Date:  2020-09-28       Impact factor: 4.466
