Deep learning embedder method and tool for mass spectra similarity search.

Chunyuan Qin, Xiyang Luo, Chuan Deng, Kunxian Shu, Weimin Zhu, Johannes Griss, Henning Hermjakob, Mingze Bai, Yasset Perez-Riverol.

Abstract

Spectral similarity calculation is widely used in protein identification tools and mass spectra clustering algorithms when comparing theoretical or experimental spectra. The performance of the spectral similarity calculation plays an important role in these tools and algorithms, especially in the analysis of large-scale datasets. Recently, deep learning methods have been proposed to improve the performance of clustering algorithms and protein identification by training on existing data and by using multiple spectra and identified peptide features. While the efficiency of these algorithms in comparison with traditional approaches is still under study, their application in proteomics data analysis is becoming more common. Here, we propose the use of deep learning to improve spectral similarity comparison. We assessed the performance of deep learning for spectral similarity with GLEAMS and a newly trained embedder model (DLEAMSE), which uses high-quality spectra from PRIDE Cluster. We also developed a new bioinformatics tool (mslookup - https://github.com/bigbio/DLEAMSE/) that allows users to quickly search for spectra among previously identified mass spectra published in public repositories and spectral libraries. Finally, we released a human database that enables bioinformaticians and biologists to search for identified spectra on their own machines.

SIGNIFICANCE STATEMENT: Spectral similarity calculation plays an important role in proteomics data analysis. Because deep learning can learn implicit and effective features from large-scale training datasets, deep learning-based MS/MS spectra embedding models have emerged as a way to improve the similarity calculations used in mass spectral clustering algorithms. We compare multiple similarity scoring and deep learning methods in terms of accuracy (computing the similarity for a pair of mass spectra) and computing-time performance. The benchmark results showed no major differences in accuracy between DLEAMSE and the normalized dot product (NDP) for spectrum similarity calculations. The DLEAMSE GPU implementation is faster than NDP in preprocessing on the GPU server, and the DLEAMSE similarity calculation (Euclidean distance on 32-D vectors) takes about 1/3 of the time of the dot product calculation. The DLEAMSE encoding and embedding steps need to run only once for each spectrum, and the embedded 32-D points can be persisted in the repository, which speeds up later comparisons on large-scale data. Based on these results, we propose a new tool, mslookup, that enables researchers to find spectra previously identified in public data. The tool can also be used to generate in-house databases of previously identified spectra to share with other laboratories and consortia.
Copyright © 2020. Published by Elsevier B.V.
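
The two similarity measures compared in the abstract can be illustrated with a minimal sketch. It is not the DLEAMSE or mslookup code: the function names, the binning parameters, and the randomly generated 32-D vectors standing in for model embeddings are all assumptions made for illustration. It shows a normalized dot product on binned spectra, a Euclidean distance on 32-D vectors, and a nearest-neighbour lookup over embeddings that were computed once and stored.

```python
import numpy as np


def bin_spectrum(mz, intensity, mz_min=100.0, mz_max=2000.0, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins.

    The range and bin width are illustrative defaults, not values from the paper.
    """
    n_bins = int(np.ceil((mz_max - mz_min) / bin_width))
    vec = np.zeros(n_bins)
    mz = np.asarray(mz, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    keep = (mz >= mz_min) & (mz < mz_max)
    idx = ((mz[keep] - mz_min) / bin_width).astype(int)
    idx = np.minimum(idx, n_bins - 1)  # guard against floating-point edge cases
    np.add.at(vec, idx, intensity[keep])  # accumulate peaks that share a bin
    return vec


def normalized_dot_product(a, b):
    """Cosine-style similarity between two binned spectra (0.0 if either is empty)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def euclidean_distance(u, v):
    """Distance between two embedding vectors; smaller means more similar."""
    return float(np.linalg.norm(np.asarray(u) - np.asarray(v)))


def nearest_spectra(query, stored, k=3):
    """Return indices and distances of the k stored embeddings closest to query."""
    distances = np.linalg.norm(stored - query, axis=1)
    order = np.argsort(distances)[:k]
    return order, distances[order]


if __name__ == "__main__":
    # Two toy spectra that share two of their three peaks.
    s1 = bin_spectrum([175.1, 432.3, 889.5], [30.0, 100.0, 55.0])
    s2 = bin_spectrum([175.1, 432.3, 1020.6], [25.0, 90.0, 40.0])
    print("NDP:", normalized_dot_product(s1, s2))

    # Placeholder 32-D vectors; a real workflow would load persisted model output.
    rng = np.random.default_rng(0)
    stored = rng.normal(size=(1000, 32))
    query = stored[42] + 0.01 * rng.normal(size=32)
    idx, dist = nearest_spectra(query, stored, k=3)
    print("closest stored embeddings:", idx, dist)
```

The binned vectors here have 1,900 dimensions while the embeddings have 32, which gives an intuition for why the per-pair distance on embeddings is cheaper once the one-time embedding cost has been paid.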

Keywords:  Deep learning; Mass spectra embedder; Scoring function; Spectral similarity

Year:  2020        PMID: 33307250      PMCID: PMC7613299          DOI: 10.1016/j.jprot.2020.104070

Source DB:  PubMed          Journal:  J Proteomics        ISSN: 1874-3919            Impact factor:   3.855


