Richard G Côté, Johannes Griss, José A Dianes, Rui Wang, James C Wright, Henk W P van den Toorn, Bas van Breukelen, Albert J R Heck, Niels Hulstaert, Lennart Martens, Florian Reisinger, Attila Csordas, David Ovelleiro, Yasset Perez-Riverol, Harald Barsnes, Henning Hermjakob, Juan Antonio Vizcaíno.
Abstract
The original PRIDE Converter tool greatly simplified the process of submitting mass spectrometry (MS)-based proteomics data to the PRIDE database. However, after much user feedback, it was noted that the tool had some limitations and could not handle several user requirements that were now becoming commonplace. This prompted us to design and implement a whole new suite of tools that would build on the successes of the original PRIDE Converter and allow users to generate submission-ready, well-annotated PRIDE XML files. The PRIDE Converter 2 tool suite allows users to convert search result files into PRIDE XML (the format needed for performing submissions to the PRIDE database), generate mzTab skeleton files that can be used as a basis to submit quantitative and gel-based MS data, and post-process PRIDE XML files by filtering out contaminants and empty spectra, or by merging several PRIDE XML files together. All the tools have both a graphical user interface that provides a dialog-based, user-friendly way to convert and prepare files for submission, as well as a command-line interface that can be used to integrate the tools into existing or novel pipelines, for batch processing and power users. The PRIDE Converter 2 tool suite will thus become a cornerstone in the submission process to PRIDE and, by extension, to the ProteomeXchange consortium of MS-proteomics data repositories.
Year: 2012 PMID: 22949509 PMCID: PMC3518121 DOI: 10.1074/mcp.O112.021543
Source DB: PubMed Journal: Mol Cell Proteomics ISSN: 1535-9476 Impact factor: 5.911
Fig. 1. Schematic overview of the workflow and interactions between the tools in the PRIDE Converter 2 tool suite. The PRIDE mzTab Generator parses the search result files and generates skeleton mzTab files that can be used as input files to PRIDE Converter 2. The PRIDE XML files generated by PRIDE Converter 2 can be filtered using the PRIDE XML Filter tool and/or merged into a single PRIDE XML file using the PRIDE XML Merger tool.
Tools in the PRIDE Converter 2 tool suite
| Tool name | Function |
|---|---|
| PRIDE Converter 2 | Converts search engine output files into valid, well-annotated PRIDE XML files ready for submission. |
| PRIDE mzTab Generator | Generates skeleton mzTab files where the user can add quantitative and/or gel data. |
| PRIDE XML Merger | Merges several PRIDE XML files together, while maintaining internal consistency in spectra and peptide links. |
| PRIDE XML Filter | Post-processes PRIDE XML files according to filter rules to remove contaminants and empty spectra, and/or to update the protein inference assignments. |
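The filter and merge functions described in the table can be illustrated with a minimal sketch. This is not the tools' actual implementation or API: the `Spectrum`, `Peptide`, and `Experiment` classes and both functions are hypothetical stand-ins for the contents of a PRIDE XML file, and dropping peptides that reference a removed spectrum is an assumption made here to keep the example consistent.

```python
from dataclasses import dataclass, field


@dataclass
class Spectrum:
    spectrum_id: int
    peaks: list  # (m/z, intensity) pairs; an empty list means an empty spectrum


@dataclass
class Peptide:
    sequence: str
    spectrum_ref: int  # id of the spectrum this identification points to


@dataclass
class Experiment:
    # Hypothetical in-memory stand-in for one PRIDE XML file.
    spectra: list = field(default_factory=list)
    peptides: list = field(default_factory=list)


def drop_empty_spectra(exp):
    """Return a copy without empty spectra (and without peptides that referenced them)."""
    kept = [s for s in exp.spectra if s.peaks]
    kept_ids = {s.spectrum_id for s in kept}
    peptides = [p for p in exp.peptides if p.spectrum_ref in kept_ids]
    return Experiment(kept, peptides)


def merge(experiments):
    """Concatenate experiments, renumbering spectra so that ids stay unique
    and each peptide still points at its own spectrum."""
    merged = Experiment()
    next_id = 1
    for exp in experiments:
        id_map = {}  # old spectrum id -> new spectrum id, per input file
        for s in exp.spectra:
            id_map[s.spectrum_id] = next_id
            merged.spectra.append(Spectrum(next_id, list(s.peaks)))
            next_id += 1
        for p in exp.peptides:
            merged.peptides.append(Peptide(p.sequence, id_map[p.spectrum_ref]))
    return merged
```

The per-file `id_map` is the point of the sketch: two input files may both contain a spectrum with id 1, so a merge has to remap the peptide references along with the spectra to keep the links intact.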
Supported formats in PRIDE Converter 2
| Format name | File type | Data content | New in PRIDE Converter 2 | Used APIs |
|---|---|---|---|---|
| Mascot | .dat | Spectra and Identifications | No | Mascot API |
| mzIdentML | .xml | Spectra and Identifications | Yes | jmzIdentML |
| X!Tandem | .xml | Spectra and Identifications | No | xtandem-parser |
| OMSSA | .csv | Spectra and Identifications | No | New |
| SpectraST | .txt | Spectra and Identifications | Yes | New |
| CRUX | .txt | Spectra and Identifications | Yes | New |
| MSGF | .txt | Spectra and Identifications | Yes | New |
| Proteome Discoverer | .msf | Spectra and Identifications | Yes | Thermo MSF Parser |
| mzML | .xml | Spectra Only | Yes | jmzML |
| DTA | .dta | Spectra Only | No | jmzReader |
| MGF | .mgf | Spectra Only | No | jmzReader |
| mzData | .xml | Spectra Only | No | jmzReader |
| mzXML | .xml | Spectra Only | No | jmzReader |
| PKL | .pkl | Spectra Only | No | jmzReader |
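As a rough illustration of how the table above could be used in a pipeline script, the sketch below transcribes the format names, file types, and data content into a lookup table. The names `FormatInfo`, `SUPPORTED_FORMATS`, and `candidates_for` are hypothetical and are not part of the PRIDE Converter 2 code base.

```python
import os
from typing import NamedTuple


class FormatInfo(NamedTuple):
    extension: str
    has_identifications: bool  # False means "Spectra Only" in the table


# Transcribed from the supported-formats table; illustrative only.
SUPPORTED_FORMATS = {
    "Mascot": FormatInfo(".dat", True),
    "mzIdentML": FormatInfo(".xml", True),
    "X!Tandem": FormatInfo(".xml", True),
    "OMSSA": FormatInfo(".csv", True),
    "SpectraST": FormatInfo(".txt", True),
    "CRUX": FormatInfo(".txt", True),
    "MSGF": FormatInfo(".txt", True),
    "Proteome Discoverer": FormatInfo(".msf", True),
    "mzML": FormatInfo(".xml", False),
    "DTA": FormatInfo(".dta", False),
    "MGF": FormatInfo(".mgf", False),
    "mzData": FormatInfo(".xml", False),
    "mzXML": FormatInfo(".xml", False),
    "PKL": FormatInfo(".pkl", False),
}


def candidates_for(filename):
    """Return the format names whose file type matches the file's extension."""
    ext = os.path.splitext(filename)[1].lower()
    return [name for name, info in SUPPORTED_FORMATS.items() if info.extension == ext]


print(candidates_for("search.dat"))  # ['Mascot']
print(candidates_for("run1.xml"))    # ['mzIdentML', 'X!Tandem', 'mzML', 'mzData', 'mzXML']
```

The second call shows that the extension alone cannot identify the format: five of the listed formats use `.xml` and three use `.txt`, so the format has to be chosen explicitly or detected from the file contents.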
Fig. 2. The GUI for all tools shares common features, wherever possible, to improve usability and provide a consistent user experience. Each tool is composed of a series of forms that are presented in a wizard-like manner. Users can navigate through the forms using a series of buttons located in the lower right corner. A context-sensitive help button is always available, as is a short informative message on the role of the form and what information is expected. User input is always validated to ensure that all required fields are correctly filled in, and a graphical validation status is updated each time a navigation button is pressed.
Fig. 3. The 12-step process of converting search engine result files into well-annotated PRIDE XML files. The approximate duration of each step in the conversion process is indicated to give an idea of the time required for a submission. Unfilled boxes indicate steps whose duration depends on the size of the input files. The other steps relate to file selection and/or metadata annotation and are independent of file size. A MacBook Pro laptop with 8 GB of RAM running Mac OS X 10.6.8 was used to estimate the timings. To summarize, users select the appropriate format of their search engine files, then select the file(s) to convert and set any DAO-specific options, if applicable. The annotation process starts with contact, reference, and general project descriptions, then moves on to sample annotations, protocols, instrumentation details, and software processing details. The users are asked to review or complete the automatic PTM annotations and add any additional relevant experiment-level details. The report files are then finalized, and the users can either stop the GUI process here or proceed to PRIDE XML file generation, where filtering options can also be set. Once the conversion process has completed, the users are invited to review their PRIDE XML files with the PRIDE Inspector tool and submit them to PRIDE and to the ProteomeXchange consortium.