
CHICKN: extraction of peptide chromatographic elution profiles from large scale mass spectrometry data by means of Wasserstein compressive hierarchical cluster analysis.

Olga Permiakova, Romain Guibert, Alexandra Kraut, Thomas Fortin, Anne-Marie Hesse, Thomas Burger.

Abstract

BACKGROUND: The clustering of data produced by liquid chromatography coupled to mass spectrometry (LC-MS data) has recently gained interest as a way to extract meaningful chemical or biological patterns. However, recent instrumental pipelines deliver data whose size, dimensionality and expected number of clusters are too large to be processed by classical machine learning algorithms, so that most state-of-the-art methods rely on single-pass linkage-based algorithms.
RESULTS: We propose a clustering algorithm that optimizes the powerful but computationally demanding kernel k-means objective function in a scalable way. As a result, it can process LC-MS data in an acceptable time on a multicore machine. To do so, we combine three essential features: a compressive data representation, the Nyström approximation and a hierarchical strategy. In addition, we propose new kernels based on optimal transport, which can be interpreted as intuitive similarity measures between chromatographic elution profiles.
CONCLUSIONS: Our method, referred to as CHICKN, is evaluated on proteomics data produced in our lab, as well as on benchmark data from the literature. From a computational viewpoint, it is particularly efficient on raw LC-MS data. From a data analysis viewpoint, it provides clusters which differ from those resulting from state-of-the-art methods, while achieving similar performance. This highlights the complementarity of algorithms built on different principles for extracting the most from complex LC-MS data.
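
As an illustration of two of the ingredients named in the abstract (an optimal-transport kernel and the Nyström approximation), the following is a minimal sketch and not the CHICKN implementation. It assumes that elution profiles are non-negative intensity vectors sampled on a shared retention-time grid, and it uses a Laplacian-type kernel exp(-gamma * W1) built on the 1-D Wasserstein-1 distance; the function names, the `gamma` parameter and the choice of landmarks are hypothetical. The compressive representation and the hierarchical strategy are not shown.

```python
import numpy as np

def wasserstein1(p, q, dt=1.0):
    """W1 distance between two 1-D profiles on the same time grid.

    In one dimension, W1 equals the integrated absolute difference
    between the two cumulative distribution functions.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # normalise intensities to unit mass
    q = q / q.sum()
    return dt * np.abs(np.cumsum(p) - np.cumsum(q)).sum()

def wasserstein_kernel(A, B, gamma=1.0):
    """Kernel matrix exp(-gamma * W1) between rows of A and rows of B (dt = 1)."""
    cdf_a = np.cumsum(A / A.sum(axis=1, keepdims=True), axis=1)
    cdf_b = np.cumsum(B / B.sum(axis=1, keepdims=True), axis=1)
    dist = np.abs(cdf_a[:, None, :] - cdf_b[None, :, :]).sum(axis=2)
    return np.exp(-gamma * dist)

def nystrom_features(X, landmarks, gamma=1.0):
    """Explicit features Z such that Z @ Z.T approximates the full kernel matrix."""
    C = wasserstein_kernel(X, landmarks, gamma)          # n x m
    W = wasserstein_kernel(landmarks, landmarks, gamma)  # m x m
    vals, vecs = np.linalg.eigh(W)
    keep = vals > 1e-10                                  # drop near-null directions
    return C @ (vecs[:, keep] / np.sqrt(vals[keep]))     # C @ W^(-1/2)

# Toy usage: 6 random profiles of length 20, 3 of them used as landmarks.
rng = np.random.default_rng(0)
profiles = rng.random((6, 20))
Z = nystrom_features(profiles, profiles[:3], gamma=0.5)
```

Running ordinary k-means on the rows of `Z` then approximates kernel k-means with the chosen kernel, at a cost that grows with the number of landmarks rather than with the square of the number of profiles.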

Keywords:  Large-scale cluster analysis; Liquid chromatography; Mass spectrometry; Optimal transport; Proteomics; Wasserstein kernel

Year:  2021        PMID: 33579189      PMCID: PMC7881590          DOI: 10.1186/s12859-021-03969-0

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


