
The mzIdentML data standard for mass spectrometry-based proteomics results.

Andrew R Jones, Martin Eisenacher, Gerhard Mayer, Oliver Kohlbacher, Jennifer Siepen, Simon J Hubbard, Julian N Selley, Brian C Searle, James Shofstahl, Sean L Seymour, Randall Julian, Pierre-Alain Binz, Eric W Deutsch, Henning Hermjakob, Florian Reisinger, Johannes Griss, Juan Antonio Vizcaíno, Matthew Chambers, Angel Pizarro, David Creasy.

Abstract

We report the release of mzIdentML, an exchange standard for peptide and protein identification data, designed by the Proteomics Standards Initiative in collaboration with instrument and software vendors and the developers of the major open-source projects in proteomics. Software implementations have been developed to enable conversion from most popular proprietary and open-source formats, and mzIdentML will soon be supported by the major public repositories. These developments enable proteomics scientists to start using the standard to exchange and publish data sets in support of publications, and they provide a stable platform for bioinformatics groups and commercial software vendors to work with a single file format for identification data.

Year: 2012 PMID: 22375074 PMCID: PMC3394945 DOI: 10.1074/mcp.M111.014381

Source DB: PubMed Journal: Mol Cell Proteomics ISSN: 1535-9476 Impact factor: 5.911

Protein identification in proteomics is usually performed either by single-stage MS, as in peptide mass fingerprinting (PMF), or by two-stage MS (tandem MS, MS/MS or MS2). Either approach is followed by computational analysis, for which a variety of software packages are available. For PMF, the data (an MS peak list) consist of mass/charge versus intensity values for peptide ions. Several software packages identify proteins from PMF data by searching the peak list against a theoretical digest of a protein sequence database, for example MS-Fit (part of ProteinProspector, http://prospector.ucsf.edu/), ProFound (1) and Mascot (2). Tandem MS data typically comprise mass/charge versus intensity values for the fragmentation products of an individual peptide, and there are broadly four types of computational pipeline for interpreting them: (1) sequence database searching, in which mass/charge values for peptide fragments are queried against an in silico digest of a protein sequence database: Mascot (2), Sequest (3), OMSSA (4), X!Tandem (5), Phenyx (6); (2) de novo sequencing, in which the software attempts to determine the complete or partial peptide sequence directly from the spectrum: PEAKS (7), Lutefisk (8), PepNovo (9), Mascot Distiller; (3) tag searching, in which software identifies short amino acid sequences de novo (for example, three residues long) and uses them to pre-filter a protein sequence database, reducing the search space: PeptideSearch (10), InsPecT (11), MS-SEQ in ProteinProspector, Mascot (2), Paragon (12); and (4) searching against libraries of experimental spectra that have previously been assigned to peptide sequences: SpectraST (13), X!Hunter (14), Bibliospec (15). Because genome sequences, and hence well-curated protein sequence databases, have been released for most species studied, most proteomic pipelines now use method 1, although other methods still have considerable utility in many applications.
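To make the PMF search idea concrete, here is a minimal Python sketch of an in silico tryptic digest matched against an observed peak list. It assumes fully tryptic cleavage (after K or R, but not before P) with no missed cleavages, unmodified peptides, and singly protonated ions, and it uses standard monoisotopic residue masses. The function names are hypothetical, and real tools such as MS-Fit, ProFound or Mascot score and rank matches far more carefully than this simple tolerance test.

```python
from __future__ import annotations

# Standard monoisotopic residue masses (Da); leucine and isoleucine are isobaric.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # charge carrier of a singly protonated ion


def trypsin_digest(protein: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides: list[str] = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):  # C-terminal peptide that does not end in K/R
        peptides.append(protein[start:])
    return peptides


def mh_plus(peptide: str) -> float:
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER + PROTON


def pmf_matches(peak_list: list[float], protein: str,
                tol: float = 0.2) -> list[tuple[float, str]]:
    """Pair each observed m/z with every theoretical peptide within tol Da."""
    theoretical = [(mh_plus(p), p) for p in trypsin_digest(protein)]
    return [(mz, pep) for mz in peak_list
            for mass, pep in theoretical if abs(mz - mass) <= tol]
```

For example, `trypsin_digest("AKPRK")` yields `["AKPR", "K"]`, because the K followed by P is not cleaved, and `pmf_matches([147.11, 500.0], "AKPRK")` matches only the lone K peptide. A protein would then be ranked by how many peaks it explains, which is the step real PMF scoring schemes refine.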
There have been developments in statistical techniques for determining whether an individual peptide-spectrum match (PSM), or a protein inferred from a set of PSMs, has been correctly identified. There have also been techniques for assigning significance values across a global set of identifications in shotgun experiments, such as decoy database searches (16). However, the metrics experimentalists use to decide which peptide/protein identifications are likely to be correct still vary considerably between proteomic laboratories, and there is little consensus on the best statistical approach. As such, different groups apply different algorithms to determine whether or not a peptide/protein is present. Differences in any part of the analysis workflow may produce different identification lists, so substantial metadata must be reported to allow critical analysis of the results. Attempts have been made to improve the consistency and quality of reported proteomics data through minimum reporting guidelines (17, 18). Several journals recommend that authors comply with reporting guidelines and deposit their data in a public repository. A number of public proteomics databases exist, of which PeptideAtlas (19), PRIDE (20) and GPMDB (21) are the most prominent. However, search engines produce different file formats, each of which represents data and metadata using different terminology and levels of detail. The bioinformatics expertise required to deal with these issues may not be available to all laboratories, making it difficult for researchers to adhere to minimum reporting guidelines. Consequently, in contrast to other high-throughput omics technologies, comparatively few MS proteomics data sets are currently available in the public domain (22).
Additionally, bioinformatics groups and commercial software vendors continue to support only a subset of the proprietary and open-source formats for identification data. This results in considerable wasted effort writing bespoke file format converters and keeping existing converters compatible with rapidly changing proprietary formats. The Proteomics Standards Initiative (PSI) of the Human Proteome Organization (HUPO) was created to facilitate community-driven standardization in proteomics data reporting. It has created several reporting requirements documents under the Minimum Information About a Proteomics Experiment (MIAPE) umbrella (18), as well as data format standards, including mzML for capturing mass spectra (23) and PSI-MI for molecular interactions (24). The PSI, in collaboration with instrument and software vendors and the developers of the major open-source projects in proteomics, recognized a growing need for a standard format for MS-based proteomics results, which led to the development of mzIdentML. A recent set of recommendations for mass spectrometry data quality metrics emphasized the need to associate appropriate metadata with the data themselves so that the quality of a published data set can be estimated (25–27). Like mzML, mzIdentML addresses this requirement by supporting the metadata associated with the identification of peptides and proteins.

EXPERIMENTAL PROCEDURES

Early model drafts of mzIdentML were developed by examining existing formats produced by different software packages and other open formats, such as pepXML/protXML from the Institute for Systems Biology (28) and PEDRo, developed at the University of Manchester (29). The model was developed over several years in a process open to all interested parties and transparent at each stage, consisting of mailing list discussions, a code repository (http://code.google.com/p/psi-pi/), regular conference calls, and development workshops at each PSI meeting (30–33). The mzIdentML specifications were first submitted to the PSI document process in late 2008, and the process was completed in August 2009 with the release of version 1.0. The PSI document process subjects specifications to a formal review, consisting of a public comment phase and anonymous review, similar to that of a journal article (34). The first software implementations identified some minor issues, particularly redundancy that inflated file sizes for large shotgun experiments. Here we report on version 1.1 of the standard, which was created to reduce this redundancy and was released in August 2011 after a second round of the PSI document process. We expect version 1.1 to be the stable release, similar to the PSI's mzML format (23). Because of this open process, the format has had input from a wide range of stakeholders and represents the consensus view of the academic and industrial research community and software vendors. The schema was tested during the process by creating example files converted from the main search engine formats and by ensuring that the MIAPE specifications could be fulfilled. mzIdentML uses several components derived from the FuGE schema (35), adapted in this context to facilitate integration with other PSI standards. The controlled vocabulary (CV) was first developed by collecting terms from vendors of different software packages.
Terms were added to a hierarchy according to logical groupings, which also facilitate the development of mappings between the schema and the CV. In common with other PSI CVs, the CV is in OBO format (http://www.geneontology.org/GO.format.shtml). New terms can be added to the CV by raising a request on the PSI website or the PSI mailing list.
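Because the CV is distributed as an OBO flat file, a consumer can index terms by accession with very little code. The following Python sketch is a deliberately minimal reader. It assumes that only `[Term]` stanzas and their `id:` and `name:` tags matter, and it ignores other stanza types, escaping and the many other OBO tags, so a full OBO library would be preferable in practice. The sample term MS:1001171 (Mascot:score) is taken from the PSI-MS CV as we understand it, and the function name is hypothetical.

```python
from __future__ import annotations


def parse_obo_terms(text: str) -> dict[str, str]:
    """Map each [Term] id to its name, skipping header lines and other stanzas."""
    terms: dict[str, str] = {}
    current: dict[str, str] = {}
    in_term = False

    def flush() -> None:
        # Store the stanza just finished, if it was a [Term] with an id.
        if in_term and "id" in current:
            terms[current["id"]] = current.get("name", "")

    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):          # a new stanza starts here
            flush()
            in_term = line == "[Term]"
            current = {}
        elif in_term and ":" in line:
            tag, _, value = line.partition(":")   # split at the first colon only
            current.setdefault(tag.strip(), value.strip())
    flush()                                # the last stanza has no successor
    return terms


SAMPLE_OBO = """format-version: 1.2

[Term]
id: MS:1001171
name: Mascot:score

[Typedef]
id: part_of
name: part_of
"""

print(parse_obo_terms(SAMPLE_OBO))
```

Splitting only at the first colon matters because both accessions and names (for example `Mascot:score`) contain colons themselves.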

RESULTS

The mzIdentML format stores peptide and protein identifications based on mass spectrometry (Fig. 1) and captures metadata about methods, parameters, and quality metrics. Data are represented through a collection of protein sequences, peptide sequences (with modifications), and structures for capturing the scores associated with ranked peptide matches for each spectrum searched.

Fig. 1.

The overall structure of a typical mzIdentML file. Each file must contain one or more instances of SpectrumIdentificationList (the set of peptide identifications made by a search) and contains zero or one ProteinDetectionList (the set of protein identities inferred from peptide identifications).
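The cardinalities in Fig. 1 can be checked mechanically. The following Python sketch counts the two list elements, assuming the mzIdentML 1.1 namespace `http://psidev.info/psi/pi/mzIdentML/1.1`. The function name is hypothetical, and the skeleton document omits many attributes and elements that the real XML Schema requires, so this check complements schema validation rather than replacing it.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Namespace used by mzIdentML 1.1 instance documents.
MZID_NS = "http://psidev.info/psi/pi/mzIdentML/1.1"


def cardinality_problems(xml_text: str) -> list[str]:
    """Report violations of the list cardinalities shown in Fig. 1."""
    root = ET.fromstring(xml_text)
    n_sil = len(root.findall(f".//{{{MZID_NS}}}SpectrumIdentificationList"))
    n_pdl = len(root.findall(f".//{{{MZID_NS}}}ProteinDetectionList"))
    problems = []
    if n_sil < 1:
        problems.append("no SpectrumIdentificationList found")
    if n_pdl > 1:
        problems.append(f"{n_pdl} ProteinDetectionList elements (at most one allowed)")
    return problems


# Skeleton only: a schema-valid file needs many more elements and attributes.
MINIMAL = f"""<MzIdentML xmlns="{MZID_NS}">
  <DataCollection>
    <AnalysisData>
      <SpectrumIdentificationList id="SIL_1"/>
    </AnalysisData>
  </DataCollection>
</MzIdentML>"""

print(cardinality_problems(MINIMAL))
```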

Peptide Identifications

A typical peptide-spectrum match (PSM) is recorded in mzIdentML as shown in Fig. 2. A ranked set of peptides matched to the same spectrum is collected under <SpectrumIdentificationResult>, with each single PSM recorded as an instance of <SpectrumIdentificationItem>. <SpectrumIdentificationItem> references the <Peptide> element, which captures a single unique representation of the peptide sequence and any modifications (see below) that have been found. This reduces file size when the same peptide has been identified multiple times. Attributes are provided on <SpectrumIdentificationItem> for the rank, peptide charge state, and experimental and calculated mass/charge values. The peptide sequence could have arisen from several different protein sequences (<DBSequence>), so a many-to-many mapping (<PeptideEvidence>) is provided, representing all the protein sequences in which the peptide sequence can be found. <PeptideEvidence> has attributes for the start/end positions of the peptide within the protein sequence and for the flanking residues. mzIdentML does not import the spectra that were searched, because several file formats, such as the PSI's mzML format (36), already exist for this purpose. Instead, each <SpectrumIdentificationResult> references, in an external format, the spectrum from which identifications have been made. The documentation provides guidelines for unambiguously referencing a single spectrum within an mzML file or within other data formats that may be inputs to a search engine (mgf, dta, mzXML, mzData, pkl, etc.). For many use cases, mzIdentML files are expected to be transferred together with the peak list file that was searched.
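The reference chain described above (SpectrumIdentificationResult → SpectrumIdentificationItem → Peptide, and PeptideEvidence → DBSequence) can be followed with Python's standard `xml.etree.ElementTree`. The sketch below assumes the mzIdentML 1.1 namespace and attribute names such as `peptide_ref`, `peptideEvidence_ref` and `dBSequence_ref`. The accessions PROT_A and PROT_B, the m/z values and the helper names are hypothetical, and the sample omits elements that a schema-valid file would need. A production reader such as the jmzidentml Java API would also handle large files incrementally.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

MZID_NS = "http://psidev.info/psi/pi/mzIdentML/1.1"


def q(tag: str) -> str:
    """Qualify a local element name with the mzIdentML namespace."""
    return f"{{{MZID_NS}}}{tag}"


def psm_rows(xml_text: str) -> list[tuple[str, int, str, list[str], bool]]:
    """One row per SpectrumIdentificationItem: spectrumID, rank, peptide sequence,
    protein accessions (via PeptideEvidence -> DBSequence) and passThreshold."""
    root = ET.fromstring(xml_text)
    sequences = {p.get("id"): p.findtext(q("PeptideSequence"))
                 for p in root.iter(q("Peptide"))}
    accessions = {d.get("id"): d.get("accession") for d in root.iter(q("DBSequence"))}
    evidence = {e.get("id"): e for e in root.iter(q("PeptideEvidence"))}

    rows = []
    for sir in root.iter(q("SpectrumIdentificationResult")):
        for sii in sir.iter(q("SpectrumIdentificationItem")):
            # Many-to-many mapping: one PSM may point at several protein entries.
            proteins = [accessions[evidence[ref.get("peptideEvidence_ref")].get("dBSequence_ref")]
                        for ref in sii.iter(q("PeptideEvidenceRef"))]
            rows.append((sir.get("spectrumID"), int(sii.get("rank")),
                         sequences[sii.get("peptide_ref")], proteins,
                         sii.get("passThreshold") in ("true", "1")))
    return rows


# Hypothetical, abbreviated document: one spectrum, one PSM, two candidate proteins.
EXAMPLE = f"""<MzIdentML xmlns="{MZID_NS}">
  <SequenceCollection>
    <DBSequence id="DBSeq_1" accession="PROT_A" searchDatabase_ref="SDB_1"/>
    <DBSequence id="DBSeq_2" accession="PROT_B" searchDatabase_ref="SDB_1"/>
    <Peptide id="PEP_1"><PeptideSequence>LVNELTEFAK</PeptideSequence></Peptide>
    <PeptideEvidence id="PE_1" peptide_ref="PEP_1" dBSequence_ref="DBSeq_1"
                     start="66" end="75" pre="K" post="T" isDecoy="false"/>
    <PeptideEvidence id="PE_2" peptide_ref="PEP_1" dBSequence_ref="DBSeq_2"
                     start="12" end="21" pre="R" post="A" isDecoy="false"/>
  </SequenceCollection>
  <DataCollection>
    <AnalysisData>
      <SpectrumIdentificationList id="SIL_1">
        <SpectrumIdentificationResult id="SIR_1" spectrumID="index=0" spectraData_ref="SD_1">
          <SpectrumIdentificationItem id="SII_1" rank="1" chargeState="2"
              experimentalMassToCharge="582.32" calculatedMassToCharge="582.32"
              peptide_ref="PEP_1" passThreshold="true">
            <PeptideEvidenceRef peptideEvidence_ref="PE_1"/>
            <PeptideEvidenceRef peptideEvidence_ref="PE_2"/>
          </SpectrumIdentificationItem>
        </SpectrumIdentificationResult>
      </SpectrumIdentificationList>
    </AnalysisData>
  </DataCollection>
</MzIdentML>"""

for row in psm_rows(EXAMPLE):
    print(row)
```

Indexing peptides, proteins and evidence by `id` first keeps each lookup constant-time, which mirrors how the format itself avoids repeating peptide and protein data for every PSM.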

Fig. 2.

Peptide identification from MS/MS represented in mzIdentML: (i) DBSequence stores database entries, such as complete protein sequences and accessions for their retrieval from external databases; (ii) Peptide holds individual peptide sequences and modifications that have been identified; (iii) PeptideEvidence instances provide the mappings between a peptide sequence and all the protein sequences from which it could have arisen; (iv) the association between SpectrumIdentificationItem and PeptideEvidence is the core result of a single PSM; and (v) SpectrumIdentificationResult captures all ranked identifications (SpectrumIdentificationItem) made from one spectrum and is mapped back to the source spectrum in an external format, such as mzML. Note that the representation of some attributes and elements has been shortened to simplify the figure. For example, scores and metrics are represented in mzIdentML using CV terms, which makes the schema flexible and extensible.

Peptide and protein identifications are generally associated with some measure related to the probability of a correct identification, and it is common to apply a threshold to these metrics. Where the threshold is set can dramatically alter the conclusions, so it is important to record it. The threshold used is specified by controlled vocabulary terms within the <SpectrumIdentificationProtocol>. <SpectrumIdentificationItem> has a Boolean attribute, passThreshold, which allows the reporting of identifications that fall below the significance threshold and are therefore often not considered part of the result set. Including identifications below the threshold used by the original authors allows others to re-analyze the data with the benefit of the full initial results. This opens up a broader range of alternative analyses, including those that make different assumptions. To assess the quality of a peptide identification made from tandem MS, it can be important to know which products of peptide fragmentation have been identified. mzIdentML uses controlled vocabulary terms to specify the types of ions that have been found (e.g. a-, b-, c-, x-, y-, z-ions and neutral losses of these) and captures data about the ions (such as mass/charge and intensity values) in a compressed array structure within <Fragmentation>. Because the input spectrum is referenced in an external format, it is straightforward to write a spectrum viewer showing which product ions have been identified by the search engine, or an application that performs further statistical processing of individual PSMs.
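As a starting point for such a viewer, the Python sketch below flattens the ion annotations of one SpectrumIdentificationItem. It assumes the mzIdentML 1.1 layout in which each `<IonType>` carries a `charge`, an `index` list of ion series numbers (for example `"2 3"` for b2 and b3) and one or more `<FragmentArray>` elements whose `values` line up with those indices. The measure id `Measure_MZ`, the m/z values and the function names are illustrative. The CV accession `MS:1001224` ("frag: b ion") is given as we recall it and should be checked against the current PSI-MS CV.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

MZID_NS = "http://psidev.info/psi/pi/mzIdentML/1.1"


def q(tag: str) -> str:
    """Qualify a local element name with the mzIdentML namespace."""
    return f"{{{MZID_NS}}}{tag}"


def fragment_annotations(sii_xml: str) -> list[tuple[str, int, int, str, float]]:
    """Return (ion type name, series index, charge, measure_ref, value) tuples."""
    sii = ET.fromstring(sii_xml)
    annotations = []
    for ion_type in sii.iter(q("IonType")):
        cv = ion_type.find(q("cvParam"))
        ion_name = cv.get("name") if cv is not None else "unspecified"
        charge = int(ion_type.get("charge"))
        indices = [int(i) for i in ion_type.get("index", "").split()]
        # Each FragmentArray holds one measure (e.g. m/z) for every listed index.
        for array in ion_type.findall(q("FragmentArray")):
            values = [float(v) for v in array.get("values", "").split()]
            for index, value in zip(indices, values):
                annotations.append((ion_name, index, charge, array.get("measure_ref"), value))
    return annotations


# Illustrative item with two singly charged b ions.
SII_XML = f"""<SpectrumIdentificationItem xmlns="{MZID_NS}" id="SII_1" rank="1"
    chargeState="2" experimentalMassToCharge="582.32" calculatedMassToCharge="582.32"
    peptide_ref="PEP_1" passThreshold="true">
  <Fragmentation>
    <IonType charge="1" index="2 3">
      <FragmentArray measure_ref="Measure_MZ" values="213.16 327.20"/>
      <cvParam cvRef="PSI-MS" accession="MS:1001224" name="frag: b ion"/>
    </IonType>
  </Fragmentation>
</SpectrumIdentificationItem>"""

for annotation in fragment_annotations(SII_XML):
    print(annotation)
```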

Peptide Modifications

Modifications that have been identified on peptides are encoded in the <Modification> element (child of <Peptide>). Each modification is described by a combination of a controlled vocabulary term sourced from Unimod (37) or PSI-MOD (38) (for the name/molecular structure of the modification), the mass delta searched, and the location of the modification within the peptide sequence. This representation should ensure that databases or tools importing files can provide consistent analysis, comparison and querying capabilities. If the modification is unknown, the export software can encode this explicitly, using an “unknown modification” term and the mass delta. If multiple CV terms are provided within a single <Modification> element, the modification is understood to be ambiguous but identified as one of those listed. Additional scores associated with modification sites should be encoded within the <SpectrumIdentificationItem> that references the <Peptide>, because such information is specific to a given PSM.
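The following Python sketch reads the modifications of a single `<Peptide>`, assuming mzIdentML 1.1 attribute names (`location`, `residues`, `monoisotopicMassDelta`) and the convention that location 0 denotes the N-terminus. It flags a modification as ambiguous when its element carries more than one CV term. The peptide and its phospho/sulfo ambiguity are hypothetical, and the Unimod accessions are given as we understand them and should be checked before reuse.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

MZID_NS = "http://psidev.info/psi/pi/mzIdentML/1.1"


def q(tag: str) -> str:
    """Qualify a local element name with the mzIdentML namespace."""
    return f"{{{MZID_NS}}}{tag}"


def read_modifications(peptide_xml: str) -> list[dict]:
    """Summarize each <Modification> of one <Peptide> element."""
    peptide = ET.fromstring(peptide_xml)
    mods = []
    for mod in peptide.iter(q("Modification")):
        names = [cv.get("name") for cv in mod.findall(q("cvParam"))]
        mods.append({
            "location": int(mod.get("location")),  # 0 = N-terminus
            "mass_delta": float(mod.get("monoisotopicMassDelta")),
            "names": names,
            "ambiguous": len(names) > 1,  # several terms: one of those listed
        })
    return mods


# Hypothetical peptide: an oxidized Met and a near-isobaric phospho/sulfo site.
PEPTIDE_XML = f"""<Peptide xmlns="{MZID_NS}" id="PEP_2">
  <PeptideSequence>MPEPTIDEK</PeptideSequence>
  <Modification location="1" residues="M" monoisotopicMassDelta="15.994915">
    <cvParam cvRef="UNIMOD" accession="UNIMOD:35" name="Oxidation"/>
  </Modification>
  <Modification location="5" residues="T" monoisotopicMassDelta="79.966331">
    <cvParam cvRef="UNIMOD" accession="UNIMOD:21" name="Phospho"/>
    <cvParam cvRef="UNIMOD" accession="UNIMOD:40" name="Sulfo"/>
  </Modification>
</Peptide>"""

for mod in read_modifications(PEPTIDE_XML):
    print(mod)
```

Keeping the list of names rather than picking one preserves the producer's stated ambiguity, so a consumer can decide later how to resolve it.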

Protein Identifications and Protein Ambiguity Groups

In “shotgun” approaches, where proteins are digested into peptides prior to separation, the link between each identified peptide and the protein it came from is lost. Because a peptide sequence is commonly present in more than one protein, software applications must infer the most likely protein identities from a set of peptides. mzIdentML has been designed to accommodate the ambiguity of protein inference (Fig. 3). <ProteinDetectionHypothesis> represents one possible protein identification, corresponding to a <DBSequence> with one accession (with associated scores or probability values). It is based on a set of peptide identifications, reported as references to the <PeptideEvidence> elements that support it. <ProteinAmbiguityGroup> sits above <ProteinDetectionHypothesis> in the hierarchy, acting as a logical grouping of related hypotheses, for example where the same set of peptide sequences provides supporting evidence for more than one protein identification. This structure allows ambiguity to be communicated, so that the data producer does not have to make a final decision on which proteins are present or absent in the sample. Including p values for protein identifications, such as those output by ProteinProphet (39), allows data consumers to process the results in different ways depending on the context.
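To illustrate the grouping idea only, the Python sketch below puts proteins into the same group when their supporting peptide sets are identical. This is a deliberately simplified rule chosen for the example. mzIdentML does not prescribe an inference algorithm, and tools such as ProteinProphet apply more elaborate parsimony and probability models. The accessions and peptides are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def ambiguity_groups(protein_peptides: dict[str, set[str]]) -> list[list[str]]:
    """Group protein accessions whose supporting peptide sets are identical."""
    by_peptide_set: dict[frozenset[str], list[str]] = defaultdict(list)
    for accession, peptides in protein_peptides.items():
        by_peptide_set[frozenset(peptides)].append(accession)
    # Sort for a deterministic order of groups and of members within a group.
    return sorted(sorted(group) for group in by_peptide_set.values())


EVIDENCE = {
    "PROT_A": {"LVNELTEFAK", "AEFVEVTK"},
    "PROT_B": {"AEFVEVTK", "LVNELTEFAK"},  # same peptides as PROT_A
    "PROT_C": {"YLYEIAR"},
}

print(ambiguity_groups(EVIDENCE))
```

Each inner list would map onto one ProteinAmbiguityGroup, and each accession in it onto one ProteinDetectionHypothesis. The decision about which members, if any, are reported as present remains with the analysis.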

Fig. 3.

Protein identifications represented in mzIdentML. If the same set of peptide sequences provides supporting evidence for more than one protein, the proteins appear within a ProteinAmbiguityGroup. (i) Each ProteinDetectionHypothesis contains references back to the instances of PeptideEvidence on which it is based (onward references to Peptide are not shown). (ii) The ProteinDetectionHypothesis element has associations to all SpectrumIdentificationItem elements that have been used for protein inference. (iii) Each ProteinDetectionHypothesis references the protein sequence (DBSequence) that has been identified.

An mzIdentML file contains at most one <ProteinDetectionList>, determined as the final result of an analysis procedure; intermediate results are not reported. In some workflows, a set of protein identifications undergoes secondary statistical processing or manual validation after the initial search engine output. Such workflows are encoded in mzIdentML as one overall process that produces the final set of proteins. This design decision was taken to reduce the chance of different implementers expressing the same data set in different ways and to make it simpler for data consumers to process the results. As with peptide identifications, it is possible to report protein identifications that fall below a given threshold or that have been determined by manual inspection to be incorrect.

Representing Specific Use Cases

Example mzIdentML documents have been made available to illustrate the wide range of proteomic analyses that are supported (http://code.google.com/p/psi-pi/source/browse/trunk/examples/1_1examples): PMF (supplemental file: mascot_pmf_example.mzid); “standard” tandem MS analysis from different search engines (supplemental files: 55merge_tandem.mzid, 55merge_omssa.mzid, 55merge_mascot_full.mzid, Sequest_example_ver1.1.mzid, Phenyx-example.mzid, Mascot_MSMS_example.mzid); and a spectral library search from SpectraST (supplemental file: spectraST.mzid). No attempt has been made to standardize the score parameters output by different search engines. Instead, differences between the scores and other reported parameters are documented through controlled vocabulary terms. A common analysis approach is to employ multiple search engines (40–43). This can be accommodated in mzIdentML by encoding a process that takes several instances of <SpectrumIdentificationList> (one per search engine) as input and produces a single <ProteinDetectionList> as output (supplemental file: MPC_example_Multiple_search_engines.mzid). Searching nucleic acid sequences requires their translation into the corresponding amino acid sequences. In mzIdentML, the rules governing the translation are documented using CV terms. Example encodings of NCBI translation tables (http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi) within <DatabaseTranslation> are provided (supplemental file: Mascot_NA_example.mzid), in which every instance of <PeptideEvidence> contains a reference to the translation table used and the reading frame. A common experimental approach in quantitative proteomics is stable isotope labeling, which typically results in heavy and light versions of amino acids. The mass of each amino acid can be reported within the <MassTable> element (supplemental file: Mascot_N15_example.mzid). In an experiment using stable isotope labeling, two tables are reported for the amino acid masses, one with the light and one with the heavy isotope incorporated.
Every <SpectrumIdentificationItem> element provides a reference to the appropriate mass table, showing how the molecular weight has been calculated for the PSM. Finally, decoy database searching is a popular method by which the false discovery rate may be estimated (16, 44). The <PeptideEvidence> element has a Boolean attribute, isDecoy, which allows consumers of the file to calculate the false discovery rate for different score thresholds (supplemental files: 55merge_omssa.mzid and MPC_example_Multiple_search_engines.mzid). For de novo peptide sequencing, it is possible to enumerate and record all possible matches found by a de novo technique. However, this can produce very large files, and we invite proposals for a suitable, more compact encoding of alternative results. For sequence-tag searches, the final results from a run can be stored in mzIdentML, but the details of tag generation and filtering cannot, except through additional annotation of elements with new CV terms. For spectral library searches, the recommended encoding is similar to that for sequence database search results (spectraST.mzid). The main difference is that the <DBSequence> element stores the peptide sequence for each library entry rather than a protein sequence. Additional information about the peptide-spectrum match, such as observed modifications and consensus scores, can be stored as CV terms within each <DBSequence> entry.
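To show how isDecoy supports threshold-dependent error estimates, here is a minimal Python sketch. It assumes each PSM has already been reduced to a (score, is_decoy) pair, with higher scores better. A PSM whose peptide maps to both target and decoy evidence would need a labeling rule of its own, which is not handled here. The estimator shown, decoys divided by targets above the threshold, is one common choice among several published variants, and the function names are hypothetical.

```python
from __future__ import annotations


def parse_xsd_boolean(value: str) -> bool:
    """xs:boolean attributes such as isDecoy may be 'true', 'false', '1' or '0'."""
    return value.strip() in ("true", "1")


def decoy_fdr(psms: list[tuple[float, bool]], threshold: float) -> float:
    """Estimate FDR at a score threshold as decoys / targets among PSMs
    scoring at or above it (higher score = better)."""
    passing = [is_decoy for score, is_decoy in psms if score >= threshold]
    decoys = sum(passing)              # True counts as 1
    targets = len(passing) - decoys
    return decoys / targets if targets else 0.0


def fdr_curve(psms: list[tuple[float, bool]], thresholds: list[float]) -> dict[float, float]:
    """FDR estimate for each candidate threshold."""
    return {t: decoy_fdr(psms, t) for t in thresholds}


PSMS = [(60.0, False), (55.0, False), (50.0, True), (45.0, False), (40.0, True)]
print(fdr_curve(PSMS, [60.0, 45.0, 40.0]))
```

Scanning thresholds in this way lets a consumer pick the most permissive score cut-off that stays under a target FDR, independently of the threshold the original authors applied.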

Controlled Vocabulary

The group has developed controlled vocabulary terms as part of the wider PSI-Mass Spectrometry CV, which is used by mzML and other PSI formats, to ensure unambiguous reporting of search engine methods, parameters and scores. The CV is available from the project homepage (http://www.psidev.info/controlled-vocabularies). Each entry contains a definition and specifies whether the term should be paired with a value and, if so, what the data type and unit should be. The CV therefore provides a flexible mechanism for constraining the values allowed in instance documents without hard-coding enumerations within the schema. As an example, a <SpectrumIdentificationItem> with a Mascot ion score of 62.7 is encoded with a single CV term carrying the value 62.7. Its accession references a PSI-MS CV term that provides a formal definition of the Mascot score and specifies that a value must be given (as a double-precision floating-point data type) and that no units should be provided. An example term that requires a unit is the fragment ion search tolerance (e.g. ± 0.5 Da) within the <SpectrumIdentificationProtocol>, which is encoded as two CV terms, one for the plus and one for the minus tolerance value. In this instance, the MS:1001412 CV entry specifies that a mass unit must be provided from the Unit Ontology (available from the OBO Foundry, http://www.obofoundry.org/). In addition to the CV, the PSI has developed a mapping format that provides formal rules for associating XML Schema elements with particular CV terms (45). Validation software uses the mapping file to check that the correct elements are provided within mzIdentML (i.e. the file is valid XML). It also checks that valid and sensible values have been provided at the correct positions within the element hierarchy. Associating an XML Schema with a CV is a general problem that complicates the interpretation of exchange formats that use CV terms, because terms are frequently used inconsistently or incorrectly.
The solution developed by PSI should ensure that consistent, machine-comprehensible files are produced and provides a re-usable solution for other format developers with similar challenges, for example the PSI Molecular Interactions work group (46).
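The two CV-term examples can be sketched in Python with `xml.etree.ElementTree`. In this sketch, `TERM_RULES` is a hypothetical, hand-written excerpt standing in for what the CV says about each term (value required, unit required); a real tool would read these constraints from the OBO file and the PSI mapping file. The accessions MS:1001171 (Mascot:score), MS:1001412/MS:1001413 (search tolerance plus/minus value) and UO:0000221 (dalton) are given as we understand them, and the mzIdentML namespace is omitted for brevity.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical excerpt of CV constraints: is a value required? is a unit required?
TERM_RULES = {
    "MS:1001171": {"name": "Mascot:score", "value": True, "unit": False},
    "MS:1001412": {"name": "search tolerance plus value", "value": True, "unit": True},
    "MS:1001413": {"name": "search tolerance minus value", "value": True, "unit": True},
}
DALTON = ("UO", "UO:0000221", "dalton")  # (unitCvRef, unitAccession, unitName)


def cv_param(accession: str, value=None, unit=None, cv_ref: str = "PSI-MS") -> ET.Element:
    """Build a cvParam element, rejecting combinations the rules above forbid."""
    rule = TERM_RULES[accession]
    if rule["value"] and value is None:
        raise ValueError(f"{accession} requires a value")
    if rule["unit"] != (unit is not None):
        raise ValueError(f"{accession}: unit {'required' if rule['unit'] else 'not allowed'}")
    attrs = {"cvRef": cv_ref, "accession": accession, "name": rule["name"]}
    if value is not None:
        attrs["value"] = str(value)
    if unit is not None:
        attrs["unitCvRef"], attrs["unitAccession"], attrs["unitName"] = unit
    return ET.Element("cvParam", attrs)


# A Mascot score of 62.7 on a SpectrumIdentificationItem.
score = cv_param("MS:1001171", 62.7)

# A fragment tolerance of +/- 0.5 Da, expressed as two CV terms.
fragment_tolerance = ET.Element("FragmentTolerance")
fragment_tolerance.append(cv_param("MS:1001412", 0.5, DALTON))
fragment_tolerance.append(cv_param("MS:1001413", 0.5, DALTON))

print(ET.tostring(score, encoding="unicode"))
print(ET.tostring(fragment_tolerance, encoding="unicode"))
```

Failing early when a required unit is missing mirrors, in miniature, what semantic validation does. The XML would still be well-formed without the unit, but it would not be meaningful.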

Relationship to MIAPE and MCP Guidelines

The PSI has created the MIAPE guidelines (18), which comprise a parent document and a series of technology-specific modules (47–50). Each module is a minimal checklist of information that should be reported about an experiment when it is published in a journal or a data set is submitted to a repository. MIAPE is intended to ensure that the quality-checking goals of journals, funding agencies and repository operators can be met. The module corresponding to mzIdentML is MIAPE-Mass Spectrometry Informatics (MSI) (47). An mzIdentML instance document can be technically valid without being MIAPE-compliant. Supplemental Table S1 maps the items required in MIAPE-MSI (version 1.1) to what is captured by mzIdentML, including examples drawn from referenced instance documents. Full MIAPE-MSI compliance can be achieved using mzIdentML, except for the quantification aspects. The semantic validation software (http://psidev.info/validator) will be adapted to check whether particular files are MIAPE-compliant. Molecular and Cellular Proteomics (MCP), the Journal of Proteome Research, Proteomics, and other journals require or encourage authors to adhere to specific guidelines detailing the information that should be submitted with manuscripts. Much of this information is currently submitted as tables of protein identifications and annotated spectra. An mzIdentML document can encapsulate all of the data required by these journals apart from the quantification requirements. Supplemental Table S2 describes conformance to these so-called “Paris guidelines” (April 2007 release (17, 51)).

Quantification Data in mzQuantML

Numerous experimental methods have been developed for quantitative proteomics by incorporating stable isotopic labels or isobaric tags, or by label-free methods (52), and as such, the PSI Proteomics Informatics workgroup is also developing a complementary format for quantification data, called mzQuantML. The purpose of mzQuantML is to communicate data about peptide and protein abundance, such as ratios of quantitative differences across different samples or absolute measures of protein abundance. The format also contains structures for describing how data has been combined from the peptide level up to the protein level and across replicates. The development process of mzQuantML is ongoing, and we encourage further input (please see the group webpage for details http://www.psidev.info/mzQuantML).

Implementations

There is a growing list of implementations available for mzIdentML (http://www.psidev.info/tools-implementing-mzidentml). Results in mzIdentML format can be exported directly from Mascot (2) (export of version 1.0 is available in Mascot 2.3; a version 1.1 exporter is under development). Converters are currently available for Sequest (3) and Proteome Discoverer output (.msf and .protXML) (e.g. within ProCon: http://www.medizinisches-proteom-center.de/ProCon), for OMSSA (4) and X!Tandem (5) (http://code.google.com/p/mzidentml-parsers/), and in the pipeline applications Scaffold (42) (import into Scaffold PTM and export of mzIdentML are available in Scaffold version 3) and TPP (28) (results can be exported to mzIdentML via the ProteoWizard (53) converter). A beta exporter is also available for Phenyx (6). OpenMS (54) implements C++ code for reading and (as of release 1.9) writing mzIdentML. The OpenMS pipeline tools, TOPP (55), will fully support mzIdentML as of release 1.9 and can convert mzIdentML to and from various other identification formats. PeptideAtlas accepts mass spectrometer output files in a variety of formats, processes them with standard parameters through the TPP, and provides results for download in pepXML and protXML. The ProteoWizard converter can now be used to convert pepXML into mzIdentML, and full integration of direct mzIdentML export using this mechanism is expected in PeptideAtlas in 2012. An open-source Java API for reading and writing mzIdentML has also been developed and is available from http://code.google.com/p/jmzidentml/. PRIDE currently uses its own internal format, PRIDE XML, for representing mass spectra and peptide and protein identifications. It is moving its internal pipeline and database schema over to support complete import/export of mzIdentML (and of the PSI standard for mass spectra, mzML). PRIDE can already accept data submissions in mzIdentML version 1.1 by converting the files to PRIDE XML.
As mentioned above, full import/export support for mzIdentML in PRIDE is under development and is expected to be completed during 2012. In addition, work is ongoing to fully support the format in the PRIDE Inspector tool. Once mzIdentML becomes well established as a community format, tools are expected to use it routinely for internal data representation and processing. The combination of the mzIdentML XML Schema, the associated mapping file and the semantic validation software defines the minimum information required to create a “valid” file when converting from other identification formats used in proteomics. In addition, software is under development to link the mzIdentML specifications formally to the corresponding MIAPE module, enabling an automatic test for compliance. An mzIdentML file can therefore be in one of several states, depending on the user's requirements: (1) valid against the XML Schema but not semantically valid; (2) XML Schema valid and semantically valid; and (3) XML Schema valid, semantically valid and MIAPE compliant. For import into a public database or tool, level (2) or (3) should be reached, depending on the context. Level (1) should only be used internally within tools and would not be considered suitable for transfer between tools or for making data sets publicly available.
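The three levels form an ordered scale, which the Python sketch below encodes. The three boolean inputs are assumed to come from whichever checks are actually run (an XML Schema validator, the PSI semantic validator, and a MIAPE compliance test). The extra NOT_SCHEMA_VALID state and all names are hypothetical additions for completeness, and the function only combines the outcomes; it does not perform any validation itself.

```python
from __future__ import annotations

from enum import IntEnum


class MzidState(IntEnum):
    """Validation states of an mzIdentML file, ordered from weakest to strongest."""
    NOT_SCHEMA_VALID = 0
    SCHEMA_VALID = 1          # level (1): valid against the XML Schema only
    SEMANTICALLY_VALID = 2    # level (2): also passes semantic (CV mapping) validation
    MIAPE_COMPLIANT = 3       # level (3): also MIAPE compliant


def classify(schema_valid: bool, semantically_valid: bool, miape_compliant: bool) -> MzidState:
    """Combine the outcomes of three separate checks into one state."""
    if not schema_valid:
        return MzidState.NOT_SCHEMA_VALID
    if not semantically_valid:
        return MzidState.SCHEMA_VALID
    return MzidState.MIAPE_COMPLIANT if miape_compliant else MzidState.SEMANTICALLY_VALID


def suitable_for_exchange(state: MzidState) -> bool:
    """Levels (2) and (3) are suitable for public databases or tool import."""
    return state >= MzidState.SEMANTICALLY_VALID


print(classify(True, False, False), suitable_for_exchange(classify(True, False, False)))
```

Using an ordered enum makes the "at least level (2)" rule a single comparison.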

EXAMPLE FILES

All example files described in the text can be downloaded from: http://code.google.com/p/psi-pi/source/browse/trunk/examples/1_1examples/
• 55merge_mascot_full.mzid - example MS/MS search results, including decoy matches, from Mascot.
• 55merge_omssa.mzid - example MS/MS search results, including decoy matches, from OMSSA.
• 55merge_tandem.mzid - example MS/MS search results, including decoy matches, from X!Tandem.
• MPC_example_Multiple_search_engines.mzid - an example of PSMs from different search engines, assembled into proteins using a third-party algorithm, with false discovery rate estimation using a decoy database.
• Mascot_NA_example.mzid - an example of a search against an EST database with Mascot.
• Mascot_top_down_example.mzid - a single MS/MS spectrum from an intact protein, searched with Mascot.
• Sequest_example_ver1.1.mzid - a simple example derived from an “.out” file produced by SEQUEST.
• mascot_pmf_example.mzid - an example peptide mass fingerprint search with Mascot.
• spectraST.mzid - an example search against a spectral library using SpectraST.
• Mascot_N15_example.mzid - an example of a search using two sets of residue masses, 14N and 15N, with Mascot.
• phenyx-example.mzid - a tandem MS example exported from the Phenyx software.
• Mascot_MSMS_example.mzid - a further example of a tandem MS data file exported from Mascot.

DISCUSSION

The mzIdentML standard (and accompanying controlled vocabulary) has been developed over several years within the PSI's standardization process, which is open to all interested parties and transparent at each stage. As such, the format has had input from a wide range of stakeholders and represents the consensus view of academic research groups, industrial representatives and software vendors working in this area. The standard was fixed at version 1.1 in August 2011. Alterations to the schema that could affect software implementations cannot be made without re-entering the standardization process and no major changes, beyond minor bug fixes, are currently planned by the PSI. The PSI proteome informatics workgroup has a stable core of developers working on mzIdentML implementations and we are committed to providing documentation, help guides and support (via the mailing list) for external implementers in the coming years. We anticipate that the release of mzIdentML will greatly facilitate data sharing for proteomics, and its release will serve as the basis for informatics developments in quantitative proteomics. We encourage further input on the standard by joining the mailing list or attending a PSI meeting (see http://www.psidev.info/ for details).


1. Unimod: Protein modifications for mass spectrometry.

Authors: David M Creasy; John S Cottrell
Journal: Proteomics Date: 2004-06 Impact factor: 3.984

2. Open source system for analyzing, validating, and storing protein identification data.

Authors: Robertson Craig; John P Cortens; Ronald C Beavis
Journal: J Proteome Res Date: 2004 Nov-Dec Impact factor: 4.466

3. PepNovo: de novo peptide sequencing via probabilistic network modeling.

Authors: Ari Frank; Pavel Pevzner
Journal: Anal Chem Date: 2005-02-15 Impact factor: 6.986

4. Using annotated peptide mass spectrum libraries for protein identification.

Authors: R Craig; J C Cortens; D Fenyo; R C Beavis
Journal: J Proteome Res Date: 2006-08 Impact factor: 4.466

5. The Paragon Algorithm, a next generation search engine that uses sequence temperature values and feature probabilities to identify peptides from tandem mass spectra.

Authors: Ignat V Shilov; Sean L Seymour; Alpesh A Patel; Alex Loboda; Wilfred H Tang; Sean P Keating; Christie L Hunter; Lydia M Nuwaysir; Daniel A Schaeffer
Journal: Mol Cell Proteomics Date: 2007-05-27 Impact factor: 5.911

Review 6. The minimum information about a proteomics experiment (MIAPE).

Authors: Chris F Taylor; Norman W Paton; Kathryn S Lilley; Pierre-Alain Binz; Randall K Julian; Andrew R Jones; Weimin Zhu; Rolf Apweiler; Ruedi Aebersold; Eric W Deutsch; Michael J Dunn; Albert J R Heck; Alexander Leitner; Marcus Macht; Matthias Mann; Lennart Martens; Thomas A Neubert; Scott D Patterson; Peipei Ping; Sean L Seymour; Puneet Souda; Akira Tsugita; Joel Vandekerckhove; Thomas M Vondriska; Julian P Whitelegge; Marc R Wilkins; Ioannnis Xenarios; John R Yates; Henning Hermjakob
Journal: Nat Biotechnol Date: 2007-08 Impact factor: 54.908

7. The PSI-MOD community standard for representation of protein modification data.

Authors: Luisa Montecchi-Palazzi; Ron Beavis; Pierre-Alain Binz; Robert J Chalkley; John Cottrell; David Creasy; Jim Shofstahl; Sean L Seymour; John S Garavelli
Journal: Nat Biotechnol Date: 2008-08 Impact factor: 54.908

8. Guidelines for reporting the use of mass spectrometry informatics in proteomics.

Authors: Pierre-Alain Binz; Robert Barkovich; Ronald C Beavis; David Creasy; David M Horn; Randall K Julian; Sean L Seymour; Chris F Taylor; Yves Vandenbrouck
Journal: Nat Biotechnol Date: 2008-08 Impact factor: 54.908

9. The PSI semantic validator: a framework to check MIAPE compliance of proteomics data.

Authors: Luisa Montecchi-Palazzi; Samuel Kerrien; Florian Reisinger; Bruno Aranda; Andrew R Jones; Lennart Martens; Henning Hermjakob
Journal: Proteomics Date: 2009-11 Impact factor: 3.984

10. OpenMS - an open-source software framework for mass spectrometry.

Authors: Marc Sturm; Andreas Bertsch; Clemens Gröpl; Andreas Hildebrandt; Rene Hussong; Eva Lange; Nico Pfeifer; Ole Schulz-Trieglaff; Alexandra Zerck; Knut Reinert; Oliver Kohlbacher
Journal: BMC Bioinformatics Date: 2008-03-26 Impact factor: 3.169
