Juan Antonio Vizcaíno, Gerhard Mayer, Simon Perkins, Harald Barsnes, Marc Vaudel, Yasset Perez-Riverol, Tobias Ternent, Julian Uszkoreit, Martin Eisenacher, Lutz Fischer, Juri Rappsilber, Eugen Netz, Mathias Walzer, Oliver Kohlbacher, Alexander Leitner, Robert J Chalkley, Fawaz Ghali, Salvador Martínez-Bartolomé, Eric W Deutsch, Andrew R Jones.
Abstract
The first stable version of the Proteomics Standards Initiative mzIdentML open data standard (version 1.1) was published in 2012, capturing the outputs of peptide and protein identification software. In the intervening years, the standard has become well-supported in both commercial and open software, as well as a submission and download format for public repositories. Here we report a new release of mzIdentML (version 1.2) that is required to keep pace with emerging practice in proteome informatics. New features have been added to support: (1) scores associated with localization of modifications on peptides; (2) statistics performed at the level of peptides; (3) identification of cross-linked peptides; and (4) support for proteogenomics approaches. In addition, there is now improved support for the encoding of de novo sequencing of peptides, spectral library searches, and protein inference. As a key point, the underlying XML schema has undergone only very minor modifications, to simplify the transition from version 1.1 to version 1.2 for implementers as much as possible, but there have been several notable updates to the format specification, implementation guidelines, controlled vocabularies and validation software. mzIdentML 1.2 can be described as backwards compatible, in that reading software designed for mzIdentML 1.1 should function in most cases without adaptation. We anticipate that these developments will provide a continued stable base for software teams working to implement the standard. All the related documentation is accessible at http://www.psidev.info/mzidentml.
Year: 2017 PMID: 28515314 PMCID: PMC5500760 DOI: 10.1074/mcp.M117.068429
Source DB: PubMed Journal: Mol Cell Proteomics ISSN: 1535-9476 Impact factor: 5.911
A summary of current software available for processing mzIdentML 1.1+ files by May 2017. Tool = Tool name, followed by (Vendor) [Converter if non-native support]. Type = “Search” (Search engine), “Quant” (Quantification software), “IO” (file input/output), “Pipeline” (processing pipeline), “Grouping” (protein grouping), “Post-Processing” (postprocessing routines), “Proteogenomics” (proteogenomics software), “Variant ID” (variant identification software), “Visualization” (visualization tool)—in all cases referring to the named tool. URL = The web address of the tool itself or the conversion utility, if mzIdentML is not natively supported. I/E = IMPORT/EXPORT functionality. F/C = Free/Commercial, F* = the converter is free but the software is not. Additional abbreviations not indicated in the main text: DDA (Data Dependent Acquisition), DIA (Data Independent Acquisition), MRM (Multiple Reaction Monitoring), PRM (Parallel Reaction Monitoring)
| Tool | Type | Status/Description | URL | I/E | F/C |
|---|---|---|---|---|---|
| Byonic (Protein Metrics Inc.) | Search | Byonic search engine supports mzIdentML 1.1 as an output format | | E | C |
| Crux | Search | Supports mzIdentML 1.1 as an output format and reads mzIdentML 1.1 to generate spectral count data | | I & E | F |
| IDPicker | Grouping | Version 3.x implements mzIdentML 1.1 import | | I | F |
| IP2 | Search & Quant | | | E | C |
| Iquant | Quant | Automated pipeline for quantification by using isobaric tags; identification results are imported | | I | F |
| jmzIdentML | IO | Java API for reading and writing mzIdentML 1.1 | | I & E | F |
| jPOST | Database | Identification result files can be uploaded in mzIdentML 1.1 | | I | F |
| Mascot (Matrix Science) | Search & Quant | mzIdentML version 1.1 available in Mascot version 2.4+ | | E | C |
| MassIVE | Database | Identification files can be uploaded in mzIdentML 1.1 | | I | F |
| ms-data-core-api | IO | Java API that supports reading of PSI standard and open formats, e.g. mzML, mzIdentML, mzTab, mgf and others | | I | F |
| MS-GF+ | Search | Full support for exporting identification results into mzIdentML 1.1 | | E | F |
| MyriMatch | Search | Identifications exported in mzIdentML 1.1 | | E | F |
| mzID package | IO | R package available through Bioconductor supporting v1.1 | | I | F |
| mzidLibrary | Post-processing | Routines and viewer (stats, protein inference, CSV import/export, proteogenomics) supporting v1.1 and 1.2 | | I & E | F |
| OMSSA [mzidLib] | Search | Converter from OMSSA .omx files to v1.1 or 1.2 in mzidLibrary | | E | F |
| OpenMS | Pipeline | mzIdentML 1.1 fully supported in release 1.9+ | | I & E | F |
| PAnalyzer | Grouping | Used for protein grouping; it imports and exports mzIdentML (v1.1 and 1.2) | | I & E | F |
| PEAKS (Bioinformatics Solutions Inc.) | Search & Quant | Native export of mzIdentML version 1.1 | | E | C |
| PeptideShaker | Post-processing | Java stand-alone tool for the analysis and post-processing of proteomics experiments; it supports mzIdentML 1.1 and 1.2 | | I & E | F |
| PGA | Proteogenomics | Software for creating RNA-Seq based databases; it supports v1.1 as an input format for post-processing | | I | F |
| PIA | Grouping | Toolbox for protein inference and identification analysis; it supports mzIdentML 1.1 | | I & E | F |
| ProteinLynx Global Server | Search & Quant | Peptide/protein identification and quantification software; it supports export to mzIdentML in version 3.0.3+ | | E | C |
| PRIDE | Database | mzIdentML 1.1 fully supported as an import format as part of the “complete” dataset submission pipeline | | I | F |
| PRIDE Inspector | Visualization | Java stand-alone tool that can be used to visualize mzIdentML 1.1 files, independently or together with the corresponding mass spectra files (available in any open format, e.g. mzML, mzXML, mgf, dta, pkl, and apl) | | I | F |
| Progenesis QI for proteomics (Waters Corp.) | Quant | Label-free quantification software; it can read identifications from Byonic in mzIdentML 1.1 | | I | C |
| ProteinPilot | Search & Quant | ProteinPilot 5.0+ exports search results in mzIdentML version 1.2 | | E | C |
| ProteinScape (Bruker) | Search & Quant | It imports search engine results other than Mascot in mzIdentML 1.1 | | I | C |
| SEQUEST / Proteome Discoverer (Thermo) [m2Lite / ProCon] | Search & Quant | Conversion of msf files from Proteome Discoverer to mzIdentML 1.1 | | E | F* |
| ProteoAnnotator | Proteogenomics | Proteogenomics software that uses mzIdentML 1.1 as its internal file format | | E | F |
| ProteoWizard | IO | pepXML converter available and support for reading/writing mzIdentML 1.1 | | I & E | F |
| Scaffold | Search & Quant | Scaffold 4.0+ supports reading and writing of mzIdentML 1.1 | | I & E | C |
| Skyline | Quant | SRM/MRM/PRM, DIA and targeted DDA software; it can import mzIdentML 1.1 for spectral library construction | | I | F |
| TagRecon | Variant ID | Identifications exported in mzIdentML 1.1 | | E | F |
| Trans Proteomic Pipeline [ProteoWizard] | Pipeline | pepXML to mzIdentML 1.1 converter available from ProteoWizard | | I & E | F |
| X!Tandem [mzidLib] | Search | Converter from X!Tandem XML files to mzIdentML 1.1 or 1.2 as part of the mzidLibrary | | E | F |
Fig. 1. Unique identifiers and references to other objects in the file are underlined. B, The peptide identified is stored elsewhere in the file, within the
New CV terms in the PSI-MS CV that are now mandatory within the element
| CV term name | Accession number | Comments/Purpose |
|---|---|---|
| Peptide-level scoring | MS:1002490 | Statistics have been performed on non-redundant peptide identifications. |
| Modification localization scoring | MS:1002491 | Scoring has been performed on the sites of peptide modification. |
| Consensus scoring | MS:1002492 | Multiple search engines have been used for peptide identification. |
| Sample prefractionation | MS:1002493 | The file contains the results of merged pre-fractionation analyses. |
| Cross-linking search | MS:1002494 | The search engine has analysed cross-linked (and regular) peptides, using the new encoding described here. |
| De novo search | MS:1001010 | |
| Proteogenomics search | MS:1002635 | Peptides have been mapped back to genome level coordinates, stored in the file. |
| Spectral library search | MS:1001031 | The identifications have been made by searching against a spectral library. |
| No special processing | MS:1002495 | Used to indicate that none of the above features have been included in the file. |
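As a rough illustration of how a reader might use the terms in the table above, the following Python sketch scans an mzIdentML document for `cvParam` elements carrying one of the listed accessions. It is a minimal sketch under stated assumptions, not part of the standard or its validation software: the function names (`local_name`, `find_special_processing_terms`) are hypothetical, the XML fragment is a toy example that is not schema-valid, and the sketch searches the whole document by local tag name rather than assuming which parent element holds the terms.

```python
import xml.etree.ElementTree as ET

# Accessions and term names copied from the table above.
SPECIAL_PROCESSING_TERMS = {
    "MS:1002490": "Peptide-level scoring",
    "MS:1002491": "Modification localization scoring",
    "MS:1002492": "Consensus scoring",
    "MS:1002493": "Sample prefractionation",
    "MS:1002494": "Cross-linking search",
    "MS:1001010": "De novo search",
    "MS:1002635": "Proteogenomics search",
    "MS:1001031": "Spectral library search",
    "MS:1002495": "No special processing",
}


def local_name(tag):
    """Strip an XML namespace prefix of the form '{uri}' from a tag name."""
    return tag.rsplit("}", 1)[-1]


def find_special_processing_terms(xml_text):
    """Return {accession: name} for every listed term found on a cvParam.

    Hypothetical helper: it matches on the local tag name so that it does
    not depend on the namespace URI or on where in the file the terms sit.
    """
    root = ET.fromstring(xml_text)
    found = {}
    for elem in root.iter():
        if local_name(elem.tag) != "cvParam":
            continue
        accession = elem.get("accession")
        if accession in SPECIAL_PROCESSING_TERMS:
            found[accession] = SPECIAL_PROCESSING_TERMS[accession]
    return found


# Toy fragment for illustration only; the element nesting, the namespace
# URI and the placeholder second term are assumptions, and a real
# mzIdentML file contains far more structure.
EXAMPLE = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.2">
  <AdditionalSearchParams>
    <cvParam cvRef="PSI-MS" accession="MS:1002494" name="cross-linking search"/>
    <cvParam cvRef="XX" accession="XX:0000000" name="placeholder unrelated term"/>
  </AdditionalSearchParams>
</MzIdentML>"""

if __name__ == "__main__":
    for accession, name in find_special_processing_terms(EXAMPLE).items():
        print(accession, name)
```

A check like this could help a consumer decide up front whether a file needs, for example, cross-linking-aware handling; a file that declares none of these terms would need to be judged against the specification itself.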
Fig. 2.
B, If modification re-scoring has been performed, the protocol must be flagged with the specific CV term and a threshold can be specified as to whether a given modification position has been confidently identified. C, The peptide and modification are represented in the re-usable
Fig. 3.
B, A specific CV term is added to the header of the file to indicate that this is a cross-linking search result set. C, The two peptide chains identified from a given spectrum are presented in a pair of