Large-scale tandem mass spectrum clustering using fast nearest neighbor searching.

Wout Bittremieux, Kris Laukens, William Stafford Noble, Pieter C Dorrestein.

Abstract

RATIONALE: Advanced algorithmic solutions are necessary to process the ever-increasing amounts of mass spectrometry data that are being generated. In this study, we describe the falcon spectrum clustering tool for efficient clustering of millions of MS/MS spectra.
METHODS: falcon succeeds in efficiently clustering large amounts of mass spectral data using advanced techniques for fast spectrum similarity searching. First, high-resolution spectra are binned and converted to low-dimensional vectors using feature hashing. Next, the spectrum vectors are used to construct nearest neighbor indexes for fast similarity searching. The nearest neighbor indexes are used to efficiently compute a sparse pairwise distance matrix without having to exhaustively perform all pairwise spectrum comparisons within the relevant precursor mass tolerance. Finally, density-based clustering is performed to group similar spectra into clusters.
RESULTS: Several state-of-the-art spectrum clustering tools were evaluated using a large draft human proteome data set consisting of 25 million spectra, indicating that alternative tools produce clustering results with different characteristics. Notably, falcon generates larger highly pure clusters than alternative tools, leading to a larger reduction in data volume without the loss of relevant information for more efficient downstream processing.
CONCLUSIONS: falcon is a highly efficient spectrum clustering tool, which is publicly available as open source software under the permissive BSD license at https://github.com/bittremieux/falcon.
© 2021 John Wiley & Sons Ltd.
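
The steps described under METHODS can be illustrated with a minimal, standard-library-only sketch. It is not falcon's implementation. The function names, the bin width, the vector dimension, the tolerances, and the use of CRC32 as the hash function are all assumptions made for illustration. In particular, the nearest neighbor index is replaced here by a brute-force scan over spectra sorted by precursor m/z, and DBSCAN is assumed as the density-based clustering method, since the abstract does not name one.

```python
import math
import zlib


def hash_spectrum(mzs, intensities, bin_width=0.05, dim=64):
    """Bin peaks by m/z, then fold the bins into `dim` slots (feature hashing)."""
    vec = [0.0] * dim
    for mz, intensity in zip(mzs, intensities):
        bin_idx = int(mz / bin_width)
        # CRC32 is a deterministic stand-in hash (assumption, not falcon's choice).
        slot = zlib.crc32(str(bin_idx).encode()) % dim
        vec[slot] += intensity
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def cosine_distance(u, v):
    """Cosine distance between two unit-length vectors."""
    return 1.0 - sum(a * b for a, b in zip(u, v))


def sparse_neighbors(precursor_mzs, vectors, mz_tol=0.05, eps=0.1):
    """Keep only pairs within the precursor tolerance and within distance eps.

    Brute force over a precursor-sorted list; a real tool would query a
    nearest neighbor index here instead.
    """
    order = sorted(range(len(vectors)), key=lambda i: precursor_mzs[i])
    neighbors = {i: set() for i in range(len(vectors))}
    for pos, i in enumerate(order):
        for j in order[pos + 1:]:
            if precursor_mzs[j] - precursor_mzs[i] > mz_tol:
                break  # sorted, so all later spectra are out of tolerance too
            if cosine_distance(vectors[i], vectors[j]) <= eps:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors


def dbscan(neighbors, min_samples=2):
    """Density-based clustering on the sparse neighbor sets; -1 marks noise."""
    labels = {i: None for i in neighbors}
    cluster = -1
    for i in sorted(neighbors):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) + 1 < min_samples:  # the point counts itself
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) + 1 >= min_samples:
                queue.extend(neighbors[j])
    return [labels[i] for i in range(len(neighbors))]


# Toy data (made up): (precursor m/z, fragment m/z values, intensities).
spectra = [
    (500.25, [100.02, 200.02, 300.02], [1.0, 2.0, 3.0]),
    (500.26, [100.03, 200.03, 300.03], [1.0, 2.0, 3.0]),  # near-duplicate of 0
    (500.25, [150.02, 250.02, 350.02], [3.0, 2.0, 1.0]),  # different peaks
    (700.40, [100.02, 200.02, 300.02], [1.0, 2.0, 3.0]),  # different precursor
]
vectors = [hash_spectrum(mzs, ints) for _, mzs, ints in spectra]
precursors = [precursor for precursor, _, _ in spectra]
labels = dbscan(sparse_neighbors(precursors, vectors))
print(labels)
```

In this toy input, the first two spectra fall into the same m/z bins and share a precursor window, so they end up in one cluster, while the last spectrum is never compared with them because its precursor m/z lies outside the tolerance. Restricting comparisons to the precursor window is what keeps the distance matrix sparse.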

Year: 2021; PMID: 34169593; PMCID: PMC8709870; DOI: 10.1002/rcm.9153

Source DB: PubMed; Journal: Rapid Commun Mass Spectrom; ISSN: 0951-4198; Impact factor: 2.419

