Da Qi1, Huaizhong Zhang2, Jun Fan3, Simon Perkins1, Addolorata Pisconti1, Deborah M Simpson1, Conrad Bessant3, Simon Hubbard2, Andrew R Jones1.
Abstract
The mzQuantML standard has been developed by the Proteomics Standards Initiative for capturing, archiving and exchanging quantitative proteomic data derived from mass spectrometry. It is a rich XML-based format, capable of representing data about two-dimensional features from LC-MS data, and peptides, proteins or groups of proteins that have been quantified from multiple samples. In this article we report the development of an open source Java-based library of routines for mzQuantML, called the mzqLibrary, and associated software for visualising data, called the mzqViewer. The mzqLibrary contains routines for mapping (peptide) identifications onto quantified features, inference of protein (group)-level quantification values from peptide-level values, normalisation, and basic statistics for differential expression. These routines can be accessed via the command line, via a Java programming interface or via a basic graphical user interface. The mzqLibrary also contains several file format converters, including import converters (to mzQuantML) from OpenMS, Progenesis LC-MS and MaxQuant, and exporters (from mzQuantML) to other standards or useful formats (mzTab, HTML, csv). The mzqViewer contains in-built routines for viewing the tables of data (about features, peptides or proteins), and connects to the R statistical library for more advanced plotting options. The mzqLibrary and mzqViewer packages are available from https://code.google.com/p/mzq-lib/.
Keywords: Bioinformatics; Data standard; MzQuantML; Proteomics standards initiative (PSI); Software; XML
Year: 2015 PMID: 26037908 PMCID: PMC4973685 DOI: 10.1002/pmic.201400535
Source DB: PubMed Journal: Proteomics ISSN: 1615-9853 Impact factor: 3.984
The routines presented in the current release of the mzqLibrary and mzqViewer
| Type | Routine | Status | Techniques covered | Inputs (I:) and outputs (O:) | Parameters |
|---|---|---|---|---|---|
| IMPORTER | Progenesis LC‐MS converter | PPP Release 1.0.4 | Label‐free | I: Peptide.csv or Feature.csv and Protein.csv (Progenesis) O: 1 mzq file | |
| IMPORTER | MaxQuant converter | 1.0‐beta | Label‐free and SILAC | I: peptides.txt, proteinGroups.txt, summary.txt, experimentalDesignTemplate.txt O: 1 mzq file | None |
| IMPORTER | OpenMS consensusXML converter | Alpha | Label‐free | I: 1 consensusXML file O: 1 mzq file containing quantified peptide list | None |
| EXPORTER | mzTab converter | 1.0‐beta | All | I: 1 mzq O: 1 mzTab | None |
| EXPORTER | HTML converter | 1.0‐beta | All | I: 1 mzq O: 1 HTML file | None |
| EXPORTER | CSV converter | 1.0‐beta | All | I: 1 mzq O: 1 CSV (non‐standard format) | None |
| EXPORTER | XLS converter | 1.0‐beta | All | I: 1 mzq O: 1 XLS (Excel) file | None |
| PROCESSING | IDMapper | 1.0‐beta | Label‐free | I: 1 mzq file with feature lists and list of aligned but unidentified peptides and n mzIdentML files – one per FeatureList, O: 1 mzq file | The pairings of raw file name and mzIdentML file |
| PROCESSING | Normalisation | 1.0‐beta | Label‐free | I: 1 mzq file containing “raw” peptide | PSI‐MS CV accession identifying the data type of the input (raw peptide) |
| PROCESSING | Protein quant inference | 1.0‐beta | Any that uses | I: 1 mzq file containing quantified peptides O: 1 mzq file containing quantified peptides and protein groups | PSI‐MS CV accession identifying the data type of the input (peptide) |
| PROCESSING | ANOVA | 1.0‐beta | Any that uses | I: 1 mzq file containing a protein or protein‐group level | N arrays of Assays; PSI‐MS CV accession identifying the data type of the input (protein or protein group) |
| VISUALISATION | Heat map | 1.0‐beta | Any | I: 1 mzq file via mzqViewer; O: on‐screen or PDF | User selects the |
| VISUALISATION | Line plots | 1.0‐beta | Any | I: 1 mzq file via mzqViewer; O: on‐screen | User selects peptides or proteins to plot |
| VISUALISATION | Principal Component Analysis | 1.0‐beta | Any | I: 1 mzq file via mzqViewer; O: on‐screen or PDF | User selects the |
Figure 1. Screenshots of the mzqViewer. (A) The main window for viewing quantitative data within the file and calling other plotting features. (B) A heat map exported from the viewer in PDF format. (C) Line plots of a single protein and its constituent peptides. (D) The basic GUI for calling individual routines within the mzqLibrary.
Figure 2. MDX processed data shown as (a) a scatter plot of log2 ratios (mean of mdx/mean of wt for normalised protein abundance) as produced natively by Progenesis and by mzqLibrary processing, and (b) a scatter plot of log10(p‐values) derived from an ANOVA test for differential expression on normalised protein abundance values, natively exported from Progenesis and from mzqLibrary processing.
Figure 3. CPTAC processed data shown as a scatter plot of all protein log2 ratios for (a) E/B, (b) E/C and (c) E/D, showing a strong correlation between values derived from Progenesis and from the mzqLibrary. (d) Log10 p‐values for proteins from the ANOVA test for differential expression, again indicating a good degree of agreement between values from Progenesis and the mzqLibrary. (e) Boxplots of the ratios of UPS proteins for conditions B, C and D against E, from both Progenesis processed data and mzqLibrary processed data.
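The log2 ratios plotted in Figures 2 and 3 are described as the mean abundance in one condition divided by the mean abundance in another. A minimal Python sketch of that arithmetic follows. The function name and the abundance values are hypothetical and chosen only for illustration; this is not the mzqLibrary implementation.

```python
import math
from statistics import mean


def log2_ratio(numerator_values, denominator_values):
    """Return log2(mean(numerator) / mean(denominator)) for one protein."""
    return math.log2(mean(numerator_values) / mean(denominator_values))


# Hypothetical normalised abundances for a single protein (not from the paper).
mdx = [400.0, 420.0, 380.0]  # replicate values, mean 400
wt = [100.0, 90.0, 110.0]    # replicate values, mean 100

ratio = log2_ratio(mdx, wt)
print(f"log2(mdx/wt) = {ratio:.2f}")
```

A ratio of 4 between the group means corresponds to a log2 ratio of 2, so equal up- and down-regulation appear symmetric around zero on the scatter plots.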
Counts of proteins passing an ANOVA p‐value cut‐off of 0.05, with Bonferroni correction (dividing threshold by protein count of 1263) for various categories
| Category | Protein count |
|---|---|
| Total proteins passing (ANOVA p‐value) threshold from mzqLibrary | 109 |
| Total proteins passing (ANOVA p‐value) threshold from Progenesis QI | 81 |
| Total proteins passing (ANOVA p‐value) threshold from both mzqLibrary & Progenesis QI | 76 |
| UPS proteins identified in both pipelines | 47 |
| UPS proteins passing (ANOVA p‐value) threshold from mzqLibrary | 44 |
| UPS proteins passing (ANOVA p‐value) threshold from Progenesis QI | 44 |
| UPS proteins passing (ANOVA p‐value) threshold from both mzqLibrary & Progenesis QI | 44 |
The same 44 UPS proteins (out of a potential 47 identified) are identified as differentially expressed in both the mzqLibrary and Progenesis data sets. The stability of the yeast lysate background is unknown, so the identification of 76 yeast proteins as differentially expressed by both packages (out of 81 classified by Progenesis QI, for example) is not necessarily an indication of the quality of the analysis.
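The Bonferroni correction used for these counts divides the 0.05 cut-off by the protein count of 1263. The sketch below works through that arithmetic in Python, along with the fraction of identified UPS proteins that pass. The helper names are hypothetical, and the use of a strict "less than" comparison is an assumption, since the source does not say whether the threshold itself is included.

```python
ALPHA = 0.05
N_PROTEINS = 1263  # protein count stated in the table caption


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def passes(p_value, alpha=ALPHA, n_tests=N_PROTEINS):
    """True if p_value is below the corrected threshold (strict '<' is an assumption)."""
    return p_value < bonferroni_threshold(alpha, n_tests)


threshold = bonferroni_threshold(ALPHA, N_PROTEINS)
print(f"corrected threshold: {threshold:.3e}")

# Counts taken from the table: 44 of the 47 identified UPS proteins pass.
ups_fraction = 44 / 47
print(f"UPS proteins passing: {ups_fraction:.1%}")
```

The corrected threshold is just under 4e-5, so a protein with an uncorrected p-value of 0.001 would not be counted as differentially expressed here.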
The performance metrics of the mzqLibrary in terms of speed and memory usage of each use case for three mzqLibrary routines (i.e. Normalisation, ProteinInference and ANOVAPValue)
| Use case | File size (MB) | Normalisation running time (s) | Normalisation memory (MB) | ProteinInference running time (s) | ProteinInference memory (MB) | ANOVAPValue running time (s) | ANOVAPValue memory (MB) |
|---|---|---|---|---|---|---|---|
| Use case 1 | 51.2 | 90 | 754 | 104 | 557 | 57 | 300 |
| Use case 2 | 16.9 | 39 | 406 | 44 | 240 | 21 | 85 |
The “File size” is measured from the input mzq file before the normalisation routine in each use case. The “Running time” and “Memory” values were measured using JProfiler (version 8, https://www.ej-technologies.com/products/jprofiler/overview.html) on a PC with a Xeon® E5‐2630 v2 2.6 GHz CPU and 32 GB RAM.
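To compare the two use cases on a common scale, the running times can be divided by the input file size. The Python sketch below does this with the values from the performance table. The dictionary layout and function name are illustrative assumptions. Because the file size is measured only for the input to the normalisation routine, the per-MB figures for the later routines are a rough guide only.

```python
# Values copied from the performance table; the layout is illustrative.
runs = {
    "Use case 1": {"size_mb": 51.2, "Normalisation": 90,
                   "ProteinInference": 104, "ANOVAPValue": 57},
    "Use case 2": {"size_mb": 16.9, "Normalisation": 39,
                   "ProteinInference": 44, "ANOVAPValue": 21},
}

ROUTINES = ("Normalisation", "ProteinInference", "ANOVAPValue")


def seconds_per_mb(run, routine):
    """Running time of one routine divided by the input file size."""
    return run[routine] / run["size_mb"]


for name, run in runs.items():
    rates = ", ".join(f"{r}: {seconds_per_mb(run, r):.2f} s/MB" for r in ROUTINES)
    print(f"{name} -> {rates}")
```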