
MaRaCluster: A Fragment Rarity Metric for Clustering Fragment Spectra in Shotgun Proteomics.

Matthew The, Lukas Käll.

Abstract

Shotgun proteomics experiments generate large amounts of fragment spectra as primary data, normally with high redundancy between and within experiments. Here, we have devised a clustering technique to identify fragment spectra stemming from the same species of peptide. This is a powerful alternative method to traditional search engines for analyzing spectra, specifically useful for larger scale mass spectrometry studies. As an aid in this process, we propose a distance calculation relying on the rarity of experimental fragment peaks, following the intuition that peaks shared by only a few spectra offer more evidence than peaks shared by a large number of spectra. We used this distance calculation and a complete-linkage scheme to cluster data from a recent large-scale mass spectrometry-based study. The clusterings produced by our method have up to 40% more identified peptides for their consensus spectra compared to those produced by the previous state-of-the-art method. We see that our method would advance the construction of spectral libraries as well as serve as a tool for mining large sets of fragment spectra. The source code and Ubuntu binary packages are available at https://github.com/statisticalbiotechnology/maracluster (under an Apache 2.0 license).
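The idea in the abstract, weighting shared peaks by how rare they are and then grouping spectra with complete linkage, can be illustrated with a small sketch. This is not the MaRaCluster implementation: the binned-peak representation, the add-one smoothed `-log` frequency weight, the weighted-Jaccard distance, and the names `rarity_weights`, `rarity_distance`, and `complete_linkage` are all assumptions chosen for illustration only.

```python
import math
from itertools import combinations

def rarity_weights(spectra):
    """Weight each peak bin by -log of the (smoothed) fraction of spectra containing it.
    Assumption: add-one smoothing keeps every weight strictly positive."""
    n = len(spectra)
    counts = {}
    for peaks in spectra:
        for p in peaks:
            counts[p] = counts.get(p, 0) + 1
    return {p: -math.log(c / (n + 1)) for p, c in counts.items()}

def rarity_distance(a, b, weights):
    """Weighted Jaccard distance: shared rare peaks pull two spectra closer
    than shared common peaks do (illustrative stand-in, not the paper's metric)."""
    shared = sum(weights[p] for p in a & b)
    total = sum(weights[p] for p in a | b)
    return 1.0 - shared / total if total > 0 else 1.0

def complete_linkage(spectra, threshold):
    """Merge clusters greedily while the largest pairwise distance
    between their members stays at or below the threshold."""
    weights = rarity_weights(spectra)
    dist = {
        (i, j): rarity_distance(spectra[i], spectra[j], weights)
        for i, j in combinations(range(len(spectra)), 2)
    }

    def linkage(c1, c2):
        return max(dist[min(i, j), max(i, j)] for i in c1 for j in c2)

    clusters = [[i] for i in range(len(spectra))]
    while len(clusters) > 1:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]),
        )
        if linkage(clusters[a], clusters[b]) > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: a < b, so index a is unaffected
    return sorted(sorted(c) for c in clusters)

# Toy spectra as sets of integer m/z bins (hypothetical data).
# Bins 100 and 200 occur in every spectrum, so they carry little weight.
spectra = [
    frozenset({100, 200, 310, 455}),
    frozenset({100, 200, 310, 460}),
    frozenset({100, 200, 520, 633}),
    frozenset({100, 200, 520, 640}),
]
clusters = complete_linkage(spectra, threshold=0.8)
print(clusters)
```

In this toy data every spectrum shares the ubiquitous bins 100 and 200, but only the rarer bins 310 and 520 separate the two groups, which is the intuition the abstract describes. The threshold of 0.8 is arbitrary and would need tuning on real data.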


Keywords:  Mass spectrometry; bioinformatics; database search; hierarchical clustering; proteomics; spectral archives; spectral libraries


Year: 2016; PMID: 26653874; DOI: 10.1021/acs.jproteome.5b00749

Source DB: PubMed; Journal: J Proteome Res; ISSN: 1535-3893; Impact factor: 4.466


  12 in total

1.  DeMix-Q: Quantification-Centered Data Processing Workflow.

Authors:  Bo Zhang; Lukas Käll; Roman A Zubarev
Journal:  Mol Cell Proteomics       Date:  2016-01-04       Impact factor: 5.911

2.  2018 YPIC Challenge: A Case Study in Characterizing an Unknown Protein Sample.

Authors:  Lindsay Pino; Andy Lin; Wout Bittremieux
Journal:  J Proteome Res       Date:  2019-10-07       Impact factor: 4.466

3.  CHICKN: extraction of peptide chromatographic elution profiles from large scale mass spectrometry data by means of Wasserstein compressive hierarchical cluster analysis.

Authors:  Olga Permiakova; Romain Guibert; Alexandra Kraut; Thomas Fortin; Anne-Marie Hesse; Thomas Burger
Journal:  BMC Bioinformatics       Date:  2021-02-12       Impact factor: 3.169

4.  A learned embedding for efficient joint analysis of millions of mass spectra.

Authors:  Wout Bittremieux; Damon H May; Jeffrey Bilmes; William Stafford Noble
Journal:  Nat Methods       Date:  2022-05-30       Impact factor: 47.990

5.  A Comprehensive Evaluation of Consensus Spectrum Generation Methods in Proteomics.

Authors:  Xiyang Luo; Wout Bittremieux; Johannes Griss; Eric W Deutsch; Timo Sachsenberg; Lev I Levitsky; Mark V Ivanov; Julia A Bubis; Ralf Gabriels; Henry Webel; Aniel Sanchez; Mingze Bai; Lukas Käll; Yasset Perez-Riverol
Journal:  J Proteome Res       Date:  2022-05-13       Impact factor: 5.370

6.  Fast Open Modification Spectral Library Searching through Approximate Nearest Neighbor Indexing.

Authors:  Wout Bittremieux; Pieter Meysman; William Stafford Noble; Kris Laukens
Journal:  J Proteome Res       Date:  2018-09-13       Impact factor: 4.466

7.  Large-scale tandem mass spectrum clustering using fast nearest neighbor searching.

Authors:  Wout Bittremieux; Kris Laukens; William Stafford Noble; Pieter C Dorrestein
Journal:  Rapid Commun Mass Spectrom       Date:  2021-06-25       Impact factor: 2.419

8.  Deep learning embedder method and tool for mass spectra similarity search.

Authors:  Chunyuan Qin; Xiyang Luo; Chuan Deng; Kunxian Shu; Weimin Zhu; Johannes Griss; Henning Hermjakob; Mingze Bai; Yasset Perez-Riverol
Journal:  J Proteomics       Date:  2020-12-08       Impact factor: 3.855

9.  Focus on the spectra that matter by clustering of quantification data in shotgun proteomics.

Authors:  Matthew The; Lukas Käll
Journal:  Nat Commun       Date:  2020-06-26       Impact factor: 14.919

10.  Recognizing millions of consistently unidentified spectra across hundreds of shotgun proteomics datasets.

Authors:  Johannes Griss; Yasset Perez-Riverol; Steve Lewis; David L Tabb; José A Dianes; Noemi Del-Toro; Marc Rurik; Mathias W Walzer; Oliver Kohlbacher; Henning Hermjakob; Rui Wang; Juan Antonio Vizcaíno
Journal:  Nat Methods       Date:  2016-06-27       Impact factor: 28.547
