Andrew R Jones, Martin Eisenacher, Gerhard Mayer, Oliver Kohlbacher, Jennifer Siepen, Simon J Hubbard, Julian N Selley, Brian C Searle, James Shofstahl, Sean L Seymour, Randall Julian, Pierre-Alain Binz, Eric W Deutsch, Henning Hermjakob, Florian Reisinger, Johannes Griss, Juan Antonio Vizcaíno, Matthew Chambers, Angel Pizarro, David Creasy.
Abstract
We report the release of mzIdentML, an exchange standard for peptide and protein identification data designed by the Proteomics Standards Initiative. The format was developed in collaboration with instrument and software vendors and with the developers of the major open-source projects in proteomics. Software implementations have been developed to enable conversion from most popular proprietary and open-source formats, and mzIdentML will soon be supported by the major public repositories. These developments enable proteomics scientists to start using the standard to exchange and publish data sets in support of publications, and they give bioinformatics groups and commercial software vendors a stable platform built on a single file format for identification data.
Year: 2012 PMID: 22375074 PMCID: PMC3394945 DOI: 10.1074/mcp.M111.014381
Source DB: PubMed Journal: Mol Cell Proteomics ISSN: 1535-9476 Impact factor: 5.911
Fig. 1. The overall structure of a typical mzIdentML file. Each file must contain one or more instances of SpectrumIdentificationList (the set of peptide identifications made by a search) and may contain at most one ProteinDetectionList (the set of protein identities inferred from the peptide identifications).
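The cardinality rule in the Fig. 1 caption can be checked mechanically. The following is a minimal sketch, not a validator for the real schema: `TOY_DOC` is a hypothetical, namespace-free, heavily trimmed document invented for illustration, and `check_cardinality` and `local_name` are illustrative helper names. The check matches elements by local name so that it would also tolerate a namespaced document.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy document: namespace-free and heavily trimmed.
# A real mzIdentML file carries many more required elements.
TOY_DOC = """<MzIdentML>
  <DataCollection>
    <AnalysisData>
      <SpectrumIdentificationList id="SIL_1"/>
      <SpectrumIdentificationList id="SIL_2"/>
      <ProteinDetectionList id="PDL_1"/>
    </AnalysisData>
  </DataCollection>
</MzIdentML>"""


def local_name(tag):
    """Reduce a namespaced tag such as '{uri}Name' to 'Name'."""
    return tag.rsplit("}", 1)[-1]


def check_cardinality(xml_text):
    """Return a list of violations of the rule stated in the Fig. 1 caption."""
    root = ET.fromstring(xml_text)
    counts = {"SpectrumIdentificationList": 0, "ProteinDetectionList": 0}
    for element in root.iter():
        name = local_name(element.tag)
        if name in counts:
            counts[name] += 1

    problems = []
    if counts["SpectrumIdentificationList"] < 1:
        problems.append("no SpectrumIdentificationList found")
    if counts["ProteinDetectionList"] > 1:
        problems.append("more than one ProteinDetectionList found")
    return problems


print(check_cardinality(TOY_DOC))
```

An empty list means the toy document satisfies both constraints. For real files, validating against the official XML schema would be the more complete approach.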
Fig. 2. Peptide identification from MS/MS represented in mzIdentML: (i) DBSequence stores database entries, such as complete protein sequences and accessions for their retrieval from external databases; (ii) Peptide holds individual peptide sequences and modifications that have been identified; (iii) PeptideEvidence instances provide the mappings between a peptide sequence and all the protein sequences from which it could have arisen; (iv) the association between SpectrumIdentificationItem and PeptideEvidence is the core result of a single PSM; and (v) SpectrumIdentificationResult captures all ranked identifications (SpectrumIdentificationItem) made from one spectrum and is mapped back to the source spectrum in an external format, such as mzML. Note that the representation of some attributes and elements has been shortened to simplify the figure; for example, scores and metrics are represented in mzIdentML using CV terms to incorporate flexibility and extensibility into the schema.
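The reference chain in the Fig. 2 caption can be sketched as a small in-memory model. This is an illustration under stated assumptions rather than the schema itself: the class names follow the caption, but the field names (`peptide_ref`, `db_sequence_ref`, `peptide_evidence_refs`, `spectrum_id`), the helpers `top_ranked` and `resolve_item`, and all identifiers and accessions in the sample data are hypothetical. Scores, which the format stores as CV terms, are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class DBSequence:
    id: str
    accession: str  # accession used to retrieve the entry externally


@dataclass
class Peptide:
    id: str
    sequence: str
    modifications: list = field(default_factory=list)


@dataclass
class PeptideEvidence:
    id: str
    peptide_ref: str      # id of a Peptide
    db_sequence_ref: str  # id of a DBSequence


@dataclass
class SpectrumIdentificationItem:
    id: str
    rank: int
    peptide_evidence_refs: list  # ids of PeptideEvidence instances


@dataclass
class SpectrumIdentificationResult:
    spectrum_id: str  # points back to the source spectrum (e.g. in mzML)
    items: list       # all ranked SpectrumIdentificationItem objects


def top_ranked(result):
    """Return the identification with the lowest rank number."""
    return min(result.items, key=lambda item: item.rank)


def resolve_item(item, evidence_by_id, peptides_by_id, db_by_id):
    """Follow references from one PSM to (peptide sequence, accession) pairs."""
    pairs = []
    for ref in item.peptide_evidence_refs:
        evidence = evidence_by_id[ref]
        peptide = peptides_by_id[evidence.peptide_ref]
        protein = db_by_id[evidence.db_sequence_ref]
        pairs.append((peptide.sequence, protein.accession))
    return pairs


# Hypothetical sample data: one peptide that maps to two proteins.
db_by_id = {
    "DBSeq_1": DBSequence("DBSeq_1", "PROT_A"),
    "DBSeq_2": DBSequence("DBSeq_2", "PROT_B"),
}
peptides_by_id = {
    "Pep_1": Peptide("Pep_1", "PEPTIDEK"),
    "Pep_2": Peptide("Pep_2", "SAMPLER"),
}
evidence_by_id = {
    "PE_1": PeptideEvidence("PE_1", "Pep_1", "DBSeq_1"),
    "PE_2": PeptideEvidence("PE_2", "Pep_1", "DBSeq_2"),
    "PE_3": PeptideEvidence("PE_3", "Pep_2", "DBSeq_2"),
}
result = SpectrumIdentificationResult(
    spectrum_id="scan=1",
    items=[
        SpectrumIdentificationItem("SII_1_2", rank=2, peptide_evidence_refs=["PE_3"]),
        SpectrumIdentificationItem("SII_1_1", rank=1, peptide_evidence_refs=["PE_1", "PE_2"]),
    ],
)

best = top_ranked(result)
print(resolve_item(best, evidence_by_id, peptides_by_id, db_by_id))
```

Keeping PeptideEvidence as a separate object, rather than embedding protein accessions in each identification, is what lets one peptide point to every protein it could have arisen from without duplicating the peptide itself.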
Fig. 3. Protein identifications represented in mzIdentML. If the same set of peptide sequences provides supporting evidence for more than one protein, the proteins appear within a ProteinAmbiguityGroup. (i) Each ProteinDetectionHypothesis contains references back to the instances of PeptideEvidence on which it is based (onward references to Peptide are not shown). (ii) The ProteinDetectionHypothesis element has associations to all SpectrumIdentificationItem elements that have been used for protein inference. (iii) Each ProteinDetectionHypothesis references the protein sequence (DBSequence) that has been identified.
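The grouping condition in the Fig. 3 caption, where proteins supported by the same set of peptide sequences fall into one ProteinAmbiguityGroup, can be illustrated with a short sketch. It covers only the identical-peptide-set case stated in the caption; protein inference tools may apply other grouping criteria. The function name `group_by_peptide_set` and the accessions and sequences in the sample data are hypothetical.

```python
from collections import defaultdict


def group_by_peptide_set(protein_to_peptides):
    """Group protein accessions whose supporting peptide sets are identical.

    protein_to_peptides maps an accession to an iterable of peptide sequences.
    Returns a sorted list of groups, each a sorted list of accessions.
    """
    groups = defaultdict(list)
    for accession, peptides in protein_to_peptides.items():
        # frozenset ignores peptide order and duplicates, so only the
        # set of distinct sequences decides group membership.
        groups[frozenset(peptides)].append(accession)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical evidence: PROT_A and PROT_B share exactly the same peptides.
evidence = {
    "PROT_A": ["PEPTIDEK", "SAMPLER"],
    "PROT_B": ["SAMPLER", "PEPTIDEK"],
    "PROT_C": ["PEPTIDEK"],
}
print(group_by_peptide_set(evidence))
```

In this sample, PROT_A and PROT_B end up in one group, while PROT_C, which is supported by only a subset of those peptides, forms its own group.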