Chengxin Dai, Anja Füllgrabe, Julianus Pfeuffer, Elizaveta M Solovyeva, Jingwen Deng, Pablo Moreno, Selvakumar Kamatchinathan, Deepti Jaiswal Kundu, Nancy George, Silvie Fexova, Björn Grüning, Melanie Christine Föll, Johannes Griss, Marc Vaudel, Enrique Audain, Marie Locard-Paulet, Michael Turewicz, Martin Eisenacher, Julian Uszkoreit, Tim Van Den Bossche, Veit Schwämmle, Henry Webel, Stefan Schulze, David Bouyssié, Savita Jayaram, Vinay Kumar Duggineni, Patroklos Samaras, Mathias Wilhelm, Meena Choi, Mingxun Wang, Oliver Kohlbacher, Alvis Brazma, Irene Papatheodorou, Nuno Bandeira, Eric W Deutsch, Juan Antonio Vizcaíno, Mingze Bai, Timo Sachsenberg, Lev I Levitsky, Yasset Perez-Riverol.
Abstract
The amount of public proteomics data is rapidly increasing, but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve the reproducibility and facilitate the reanalysis and integration of public proteomics datasets.
Year: 2021 PMID: 34615866 PMCID: PMC8494749 DOI: 10.1038/s41467-021-26111-3
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 SDRF-Proteomics representation for a label-free-based experiment without fractionation.
a Experimental design, including two biological replicates and two technical replicates per biological replicate. The biological and technical replicates are defined by the variable under study (e.g., phenotype). b The SDRF tab-delimited file, including the three main sections highlighted: sample metadata, data file properties, and the variables under study (factor values).
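The three-section layout described for Fig. 1b can be illustrated with a minimal sketch. It assumes the usual SDRF column prefixes (`characteristics[...]`, `comment[...]`, `factor value[...]`); the specific columns, the sample values, and the choice to count `assay name` among the data file properties are illustrative assumptions, not the content of the figure.

```python
import csv
import io

# Column names follow the SDRF-Proteomics prefixes; the sample values are invented.
HEADER = [
    "source name",
    "characteristics[organism]",
    "characteristics[biological replicate]",
    "assay name",
    "comment[label]",
    "comment[technical replicate]",
    "comment[data file]",
    "factor value[phenotype]",
]


def build_sdrf_text():
    """Write one row per run: 2 biological x 2 technical replicates."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    run = 0
    for bio in (1, 2):
        for tech in (1, 2):
            run += 1
            writer.writerow([
                f"sample {bio}", "Homo sapiens", str(bio),
                f"run {run}", "label free sample", str(tech),
                f"run_{run}.raw", "normal",
            ])
    return buffer.getvalue()


def section_of(column):
    """Assign a column to one of the three sections named in Fig. 1b."""
    if column == "source name" or column.startswith("characteristics["):
        return "sample metadata"
    if column.startswith("factor value["):
        return "factor values"
    # Assumption: everything else (assay name, comment[...]) describes the data file.
    return "data file properties"


def read_sections(sdrf_text):
    """Parse the tab-delimited text and group its columns by section."""
    reader = csv.DictReader(io.StringIO(sdrf_text), delimiter="\t")
    rows = list(reader)
    sections = {}
    for column in reader.fieldnames:
        sections.setdefault(section_of(column), []).append(column)
    return rows, sections
```

Calling `read_sections(build_sdrf_text())` returns the four run rows together with a dictionary that maps each section name to its columns.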
Fig. 2 SDRF-Proteomics file for an experiment combining TMT labeling and sample fractionation.
a TMT experimental design with three samples and three fractions. b SDRF representation of this design. Three samples and three fractions yield nine rows, because each sample is repeated for every fraction and the data-file information is repeated for every labeling channel. The channel is encoded using the property comment[label].
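The row count described for Fig. 2b can be reproduced with a short sketch. The sample names, channel labels, file names, and the `comment[fraction identifier]` column used here are assumptions for illustration; only `comment[label]` is named in the caption.

```python
from itertools import product

# Hypothetical design: each sample is tagged with its own TMT channel.
SAMPLES = {"sample 1": "TMT126", "sample 2": "TMT127", "sample 3": "TMT128"}
FRACTIONS = (1, 2, 3)


def tmt_rows(samples, fractions):
    """Return one SDRF row per (fraction, sample) pair."""
    rows = []
    for fraction, (sample, label) in product(fractions, samples.items()):
        rows.append({
            "source name": sample,
            "comment[label]": label,
            "comment[fraction identifier]": str(fraction),
            # All channels of one fraction are measured in the same file,
            # so the file name repeats once per labeling channel.
            "comment[data file]": f"fraction_{fraction}.raw",
        })
    return rows
```

With three samples and three fractions, `tmt_rows(SAMPLES, FRACTIONS)` yields nine rows: each sample appears once per fraction, and each data file appears once per channel.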
Fig. 3 Using MAGE-TAB-Proteomics for dataset annotations.
a Quantification and data-acquisition methods used in the public datasets that have been annotated with MAGE-TAB-Proteomics by July 2021. b Tools and libraries for the validation and conversion of MAGE-TAB-Proteomics files.
Examples of MAGE-TAB-Proteomics files.
| Dataset type | Accession code/hyperlink | MAGE-TAB |
|---|---|---|
| Label-free | ||
| TMT, CPTAC dataset not in PX | ||
| SILAC | ||
| Phospho-proteomics | ||
| Label-free, multiple fragmentation modes and various enzymes | ||
| AP-MS interactomics | ||
| TMT | ||
| Label-free | ||
| DIA | ||
| Metabolomics | MSV000086206 | |
Fig. 4 PRIDE database-submission workflow supporting IDF and SDRF files.
The IDF file is automatically generated during submission; the SDRF file can be provided by the user in the PRIDE Submission Tool and is automatically validated in the submission pipeline. The sample information is shown on the web page of each PRIDE dataset and submitted to the EBI BioSample database, which assigns a unique accession to each sample. Shown are representative PRIDE and BioSamples outputs for the dataset PXD000561.