Sean McIlwain, Kaipo Tamura, Attila Kertesz-Farkas, Charles E Grant, Benjamin Diament, Barbara Frewen, J Jeffry Howbert, Michael R Hoopmann, Lukas Käll, Jimmy K Eng, Michael J MacCoss, William Stafford Noble.
Abstract
Efficiently and accurately analyzing big protein tandem mass spectrometry data sets requires robust software that incorporates state-of-the-art computational, machine learning, and statistical methods. The Crux mass spectrometry analysis software toolkit (http://cruxtoolkit.sourceforge.net) is an open source project that aims to provide users with a cross-platform suite of analysis tools for interpreting protein mass spectrometry data.
Year: 2014 PMID: 25182276 PMCID: PMC4184452 DOI: 10.1021/pr500741y
Source DB: PubMed Journal: J Proteome Res ISSN: 1535-3893 Impact factor: 4.466
Figure 1. Crux analysis workflow and sample results. Crux provides tools for identifying spectra derived from single peptides or from cross-linked peptides as well as tools for postprocessing the resulting identifications to yield peptide- and protein-level identifications.
File Formats in Crux
| command | MS1 | MS2 | various | FASTA | Tide index | TSV | pepXML | PIN | mzIdentML | SQT | Barista XML |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Bullseye | in | in/out | in | ||||||||
| Tide index | in | out | |||||||||
| Tide search | in | in | in | out | out | out | out | out | |||
| Comet | in | in | in | out | out | out | out | out | |||
| Percolator | in/out | in/out | in | out | in | in/out | |||||
| Barista | in | in | in/out | out | in | out | |||||
| spectral counts | in | in | in/out | in | in | in |
Additional vendor proprietary formats for MS1 and MS2 data are supported on Windows: Agilent MassHunter .d, Waters RAW, Thermo RAW, Applied Biosystems WIFF, and Bruker Compass .d/YEP/BAF/FID.
Supported open MS2 file formats include BMS2, CMS2, MGF, mzML, and mzXML.
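The way the commands chain through these file formats can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it only builds the command lines for a Tide+Percolator run (`crux tide-index`, `crux tide-search`, `crux percolator`) without executing them. The helper name `tide_percolator_commands` and all file names are hypothetical, and the location of the search output depends on the Crux version and options, so it is passed in explicitly rather than assumed.

```python
def tide_percolator_commands(fasta, spectra, index_name, psm_file):
    """Return the command lines of a Tide+Percolator run, in execution order.

    Nothing is executed here; each entry is an argv list that could be
    handed to subprocess.run() on a machine where Crux is installed.
    """
    return [
        # FASTA in, Tide index out.
        ["crux", "tide-index", str(fasta), str(index_name)],
        # Spectra and Tide index in, peptide-spectrum matches (PSMs) out.
        ["crux", "tide-search", str(spectra), str(index_name)],
        # PSMs in, rescored identifications with confidence estimates out.
        ["crux", "percolator", str(psm_file)],
    ]


if __name__ == "__main__":
    # All file names below are hypothetical placeholders.
    commands = tide_percolator_commands(
        "proteins.fasta", "run1.ms2", "proteins-index", "search-results.txt"
    )
    for argv in commands:
        print(" ".join(argv))
```

Keeping the commands as data makes the dependency between steps explicit: the index name produced by the first step is the database argument of the second.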
Comparison of Mass Spectrometry Analysis Toolkits
| feature | TPP | MaxQuant | OpenMS | GPM | CPFP | Scaffold | LabKey Server | pFind Studio | Bumbershoot | Mascot tools | Crux |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Tools | |||||||||||
| high-res mass assignment | × | × | × | × | × | × | × | × | |||
| peptide database search | × | × | × | × | × | × | × | × | × | × | × |
| machine learning postprocessor | × | × | × | × | × | × | × | × | |||
| protein cross-link searching | × | × | |||||||||
| RNA cross-link searching | × | ||||||||||
| spectral counting | × | × | × | × | × | × | |||||
| isobaric tag quantification | × | × | × | × | × | × | × | ||||
| peak area quantification | × | × | × | × | × | × | |||||
| Statistical Confidence Estimates | |||||||||||
| decoy-based estimates | × | × | × | × | × | × | × | × | × | × | |
| parametric PSM | × | × | × | × | |||||||
| exact PSM | × | ||||||||||
| PSM | × | × | × | × | × | × | × | × | × | ||
| PSM PEPs | × | × | × | × | × | × | |||||
| peptide | × | × | × | × | × | × | × | ||||
| peptide PEPs | × | × | × | × | |||||||
| protein | × | × | × | × | × | × | × | ||||
| protein PEPs | × | × | × | × | |||||||
| Input Spectrum File Formats | |||||||||||
| Thermo .RAW | × | × | × | × | × | × | × | × | |||
| Waters .RAW | × | × | × | × | × | × | |||||
| MDS/Sciex .wiff | × | × | × | × | × | × | × | ||||
| Agilent .d | × | × | × | × | × | × | |||||
| Bruker .d | × | × | × | × | × | × | |||||
| MS1 | × | × | |||||||||
| MS2 | × | × | × | × | |||||||
| mzML | × | × | × | × | × | × | × | × | |||
| mzXML | × | × | × | × | × | × | × | × | |||
| MGF | × | × | × | × | × | × | × | ||||
| Input PSM File Formats | |||||||||||
| PepXML | × | × | × | × | |||||||
| mzIdentML | × | × | × | × | |||||||
| mzQuantML | × | ||||||||||
| .dat (Mascot) | × | × | |||||||||
| .out (SEQUEST) | × | × | |||||||||
| .sqt (SEQUEST) | × | × | × | ||||||||
| .srf (SEQUEST) | × | ||||||||||
| other tool-specific formats | × | ||||||||||
| Output File Formats | |||||||||||
| tab-delimited | × | × | × | × | × | × | × | × | × | × | |
| mzTab | × | × | × | ||||||||
| PepXML | × | × | × | × | × | ||||||
| ProtXML | × | × | |||||||||
| mzIdentML | × | × | × | × | × | × | |||||
| mzQuantML | × | ||||||||||
| Implementation | |||||||||||
| free | × | × | × | × | × | × | × | × | × | ||
| source code available | × | × | × | × | × | × | × | ||||
| open source license | × | × | × | × | × | × | × | ||||
| Linux binaries | × | × | × | × | × | × | × | ||||
| MacOS binaries | × | × | × | × | |||||||
| native Windows binaries | × | × | × | × | × | × | × | × | × | ||
| command line interface | × | × | × | × | × | × | × | × | |||
| graphical user interface | × | × | × | × | × | × | × | × | × | × | |
| application programming interface | × | × | × | ||||||||
“Mascot tools” refers to Mascot Server and Mascot Distiller, which are licensed separately. GPM is Perl-based, so no binaries are needed. Scaffold parses tool-specific PSM formats produced by Proteome Discoverer, MS Amanda, Byonic, OMSSA, MaxQuant, SpectrumMill, X!Tandem, Waters Identity E, and Phenyx. Note that as of August 2014 CPFP is no longer actively maintained.
Figure 2. (a–c) We used Tide+Percolator to analyze 9 092 380 fragmentation spectra from 95 different human samples. The panels plot the number of spectra, peptides, and proteins, respectively, identified as a function of false discovery rate (FDR) threshold. (d–f) The panels plot the same three quantities for a Comet+Percolator analysis of 348 157 Plasmodium falciparum fragmentation spectra. Total analysis time was 61.2 min (34.4 min for Comet and 26.8 min for Percolator). The number of proteins identified at 1% FDR (2618) by Comet+Percolator compares favorably with the published analysis (2767 proteins).
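Curves of this kind can be read as a count of identifications accepted at each FDR threshold. The sketch below assumes each identification carries a q-value (the minimum FDR at which it is accepted), as Percolator reports. The function name `accepted_counts` and the toy q-values are hypothetical and are not data from the paper.

```python
from bisect import bisect_right


def accepted_counts(q_values, thresholds):
    """For each FDR threshold, count identifications with q-value <= threshold."""
    ordered = sorted(q_values)
    # bisect_right returns how many sorted values are <= the threshold.
    return [bisect_right(ordered, t) for t in thresholds]


# Toy q-values, invented purely for illustration.
toy_q = [0.001, 0.004, 0.009, 0.02, 0.03, 0.08]
print(accepted_counts(toy_q, [0.01, 0.05, 0.10]))  # prints [3, 5, 6]
```

Sorting once and using binary search keeps the cost low when many thresholds are evaluated against millions of spectra.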