Qiang Kou (1), Si Wu (2), Nikola Tolic (3), Ljiljana Paša-Tolic (3), Yunlong Liu (4,5), Xiaowen Liu (1,5)

1. Department of BioHealth Informatics, Indiana University-Purdue University Indianapolis, Indianapolis, IN 46202, USA.
2. Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK 73019, USA.
3. Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA 99354, USA.
4. Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
5. Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA.
Abstract
Motivation: Although proteomics has rapidly developed in the past decade, researchers are still in the early stage of exploring the world of complex proteoforms, which are protein products with various primary structure alterations resulting from gene mutations, alternative splicing, post-translational modifications, and other biological processes. Proteoform identification is essential to mapping proteoforms to their biological functions as well as discovering novel proteoforms and new protein functions. Top-down mass spectrometry is the method of choice for identifying complex proteoforms because it provides a 'bird's eye view' of intact proteoforms. The combinatorial explosion of various alterations on a protein may result in billions of possible proteoforms, making proteoform identification a challenging computational problem.

Results: We propose a new data structure, called the mass graph, for efficient representation of proteoforms and design mass graph alignment algorithms. We developed TopMG, a mass graph-based software tool for proteoform identification by top-down mass spectrometry. Experiments on top-down mass spectrometry datasets showed that TopMG outperformed existing methods in identifying complex proteoforms.

Availability and implementation: http://proteomics.informatics.iupui.edu/software/topmg/.

Contact: xwliu@iupui.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.
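To make the combinatorial explosion in the abstract concrete, the following is a minimal Python sketch of why a graph can represent many proteoforms compactly. It is an illustration under stated assumptions, not TopMG's implementation: it assumes a graph with one node per position between residues and one edge per allowed residue mass (unmodified or carrying one variable modification). The function names and the modification table `VARIABLE_MODS` are hypothetical; the masses are standard monoisotopic values.

```python
from math import prod

# Standard monoisotopic residue masses (Da) for a few amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203,
    "T": 101.04768, "K": 128.09496,
}

# Hypothetical variable modifications for illustration only:
# residue -> list of mass shifts (Da).
VARIABLE_MODS = {
    "S": [79.96633],            # phosphorylation
    "T": [79.96633],            # phosphorylation
    "K": [42.01057, 14.01565],  # acetylation, methylation
}


def build_mass_graph(sequence):
    """Return a list whose entry i holds the masses of the parallel
    edges from node i to node i + 1 (one edge per allowed residue form)."""
    edges = []
    for aa in sequence:
        base = RESIDUE_MASS[aa]
        shifts = VARIABLE_MODS.get(aa, [])
        edges.append([base] + [base + shift for shift in shifts])
    return edges


def count_proteoforms(edges):
    """Each path from the first node to the last picks one edge per
    residue, so the number of paths is the product of the edge counts."""
    return prod(len(parallel) for parallel in edges)


def count_edges(edges):
    """Total graph size grows as a sum, not a product."""
    return sum(len(parallel) for parallel in edges)


if __name__ == "__main__":
    small = build_mass_graph("GASKT")
    print(count_proteoforms(small), count_edges(small))  # 12 9

    large = build_mass_graph("SKT" * 10)
    print(count_proteoforms(large), count_edges(large))  # 61917364224 70
```

Under these assumptions, a 30-residue sequence with only three modification types already yields about 62 billion modified forms, while the graph that encodes all of them has just 70 edges.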