Julian Uszkoreit, Katalin Barkovits, Sandra Pacharra, Kathy Pfeiffer, Simone Steinbach, Katrin Marcus, Martin Eisenacher.
Abstract
In this article, we present a data-dependent acquisition (DDA) dataset that was generated as a reference and ground-truth quantitative dataset. It was initially used to compare samples measured with DDA and data-independent acquisition (DIA) (Barkovits et al., 2020), but it is also potentially valuable as a benchmark reference for any workflow that processes DDA data. The entire dataset consists of 15 LC-MS/MS measurements covering five distinct spike-in states, each measured in three replicates. To generate the dataset, a C2C12 (immortalized mouse myoblast) cell lysate was used as a complex background, and five different states were simulated by spiking in 13 defined proteins in different amounts. For this purpose, a constant amount of 20 µg cell lysate was used for all samples. Different amounts of the 13 selected proteins, ranging from 0.1 to 10 pmol, were added to reflect physiological protein amounts. Afterwards, all samples were tryptically digested using the same method. From each sample, 200 ng of tryptic peptides were measured in triplicate on a Q Exactive HF (Thermo Fisher Scientific). The MS1 mass range was set to 350-1400 m/z with a resolution of 60,000 at 200 m/z. HCD fragmentation of the ten most abundant precursor ions (Top10) was performed at 27% NCE, and fragment spectra (MS2) were acquired with a resolution of 30,000 at 200 m/z. In addition to the raw files, the dataset contains centroided mzML files and, for each separate MS analysis, the peptide identification results obtained with Mascot (Perkins et al., 1999), MS-GF+ (Kim et al., 2010) and X!Tandem (Craig and Beavis, 2004). The dataset also includes the corresponding FASTA file with the protein sequences. It further includes a combination of all identification runs performed with PIA (Uszkoreit et al., 2019, 2015) and a peptide and protein quantification performed with OpenMS (Pfeuffer et al., 2017).
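The abstract gives the lysate mass per sample (20 µg) and the injected peptide amount per run (200 ng). Together these allow a rough estimate of how much of each spiked protein reaches the column. The Python sketch below is only a back-of-the-envelope calculation under assumptions that are not stated in the text. It ignores the mass contributed by the spike-in proteins themselves and any digestion or handling losses, so it tends to overestimate. `on_column_fmol` is a hypothetical helper name.

```python
# Back-of-the-envelope sketch. Assumptions (not from the dataset description):
# the spike-in proteins' own mass is ignored and digestion/recovery is lossless.
LYSATE_UG = 20.0     # µg C2C12 lysate per sample (from the text)
INJECTED_NG = 200.0  # ng tryptic peptides injected per LC-MS/MS run (from the text)


def on_column_fmol(spiked_pmol: float,
                   total_protein_ug: float = LYSATE_UG,
                   injected_ng: float = INJECTED_NG) -> float:
    """Estimate fmol of a spiked protein loaded per run (hypothetical helper)."""
    injected_fraction = injected_ng / (total_protein_ug * 1000.0)  # ng / ng
    return spiked_pmol * injected_fraction * 1000.0  # pmol -> fmol


for pmol in (0.1, 0.5, 1, 5, 10):
    print(f"{pmol:>4} pmol spiked -> ~{on_column_fmol(pmol):.0f} fmol on column")
```

Under these assumptions, about 1% of each sample is injected. The spiked proteins therefore span roughly 1 to 100 fmol on column. Passing a larger `total_protein_ug` that includes the spike-in mass lowers the estimates accordingly.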
All data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository (Perez-Riverol et al., 2018) with the dataset identifier PXD012986.
Keywords: C2C12 cell line; Complex proteomics standard; Mass spectrometry; Protein spike-in dataset; Proteomics; Quantitative ground truth dataset
Year: 2022 PMID: 35845101 PMCID: PMC9283871 DOI: 10.1016/j.dib.2022.108435
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Amounts of the 13 spiked-in proteins per sample. Each protein (group) was spiked in at 0.1, 0.5, 1, 5 and 10 pmol, with each amount used in exactly one sample, while the overall amount of spike-in protein was kept as constant as possible.

| UniProt accession | Sample 1 (pmol) | Sample 2 (pmol) | Sample 3 (pmol) | Sample 4 (pmol) | Sample 5 (pmol) |
|---|---|---|---|---|---|
| P37840 | 1 | 10 | 0.5 | 0.1 | 5 |
| P02754 | 0.5 | 0.1 | 5 | 10 | 1 |
| P02671, P02675, P02679 | 10 | 5 | 1 | 0.5 | 0.1 |
| P13006 | 0.1 | 1 | 10 | 5 | 0.5 |
| P69905, P68871 | 0.5 | 5 | 10 | 1 | 0.1 |
| P20261, P32946, P32947 | 0.1 | 0.5 | 1 | 5 | 10 |
| P00698 | 5 | 10 | 0.1 | 0.5 | 1 |
| P68082 | 1 | 0.1 | 5 | 10 | 0.5 |

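
Using the dataset as a quantitative ground truth usually means comparing measured fold changes with the ratios implied by the spike-in design. The Python sketch below transcribes the table and derives the expected log2 ratio for every protein group and sample pair. `DESIGN`, `check_design` and `expected_log2_ratios` are hypothetical names, and joining group members with semicolons is an arbitrary convention. The sketch assumes that all members of a protein group (for example the three fibrinogen chains) share the group's spiked amount, as the table lists them.

```python
import math
from itertools import combinations

# The five spike-in levels (pmol) used in the design table.
LEVELS = (0.1, 0.5, 1, 5, 10)

# Spike-in amounts (pmol) per protein group for samples 1-5, transcribed from the table.
DESIGN = {
    "P37840": (1, 10, 0.5, 0.1, 5),
    "P02754": (0.5, 0.1, 5, 10, 1),
    "P02671;P02675;P02679": (10, 5, 1, 0.5, 0.1),
    "P13006": (0.1, 1, 10, 5, 0.5),
    "P69905;P68871": (0.5, 5, 10, 1, 0.1),
    "P20261;P32946;P32947": (0.1, 0.5, 1, 5, 10),
    "P00698": (5, 10, 0.1, 0.5, 1),
    "P68082": (1, 0.1, 5, 10, 0.5),
}


def check_design(design):
    """Raise if a group does not receive every spike-in level exactly once."""
    for group, amounts in design.items():
        if sorted(amounts) != sorted(LEVELS):
            raise ValueError(f"{group}: unexpected amounts {amounts}")


def expected_log2_ratios(design):
    """Ground-truth log2(sample j / sample i) for each group and sample pair i < j."""
    ratios = {}
    for group, amounts in design.items():
        for i, j in combinations(range(len(amounts)), 2):
            ratios[(group, i + 1, j + 1)] = math.log2(amounts[j] / amounts[i])
    return ratios


check_design(DESIGN)
RATIOS = expected_log2_ratios(DESIGN)
```

For instance, the lipase group (P20261, P32946, P32947) rises from 0.1 pmol in sample 1 to 10 pmol in sample 5. Its expected log2 ratio for that pair is therefore log2(100), about 6.64. Measured values from a quantification workflow can be compared against `RATIOS` key by key.
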

Mapping of raw file names to spike-in state and replicate.

| | State 1 | State 2 | State 3 | State 4 | State 5 |
|---|---|---|---|---|---|
| Replicate 1 | QExHF04026 | QExHF04028 | QExHF04030 | QExHF04032 | QExHF04034 |
| Replicate 2 | QExHF04036 | QExHF04038 | QExHF04040 | QExHF04042 | QExHF04044 |
| Replicate 3 | QExHF04046 | QExHF04048 | QExHF04050 | QExHF04052 | QExHF04054 |

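
Analysis scripts often need the reverse lookup, from a file name to its state and replicate. The Python sketch below assumes, without this being stated in the text, that the 15 runs are numbered in steps of two from QExHF04026 to QExHF04054. It further assumes that states 1 to 5 were acquired within each replicate block, as the regular spacing of the file names suggests. `raw_file_name` and `state_and_replicate` are hypothetical helpers, and their output should be checked against the table before it is relied on.

```python
import re
from typing import Tuple

FIRST_RUN = 4026   # QExHF04026 = state 1, replicate 1 (from the table)
N_STATES = 5
N_REPLICATES = 3


def raw_file_name(state: int, replicate: int) -> str:
    """Hypothetical helper: file name under the assumed step-of-two numbering."""
    if not (1 <= state <= N_STATES and 1 <= replicate <= N_REPLICATES):
        raise ValueError("state must be 1-5 and replicate 1-3")
    run = FIRST_RUN + 2 * ((replicate - 1) * N_STATES + (state - 1))
    return f"QExHF{run:05d}"


def state_and_replicate(file_name: str) -> Tuple[int, int]:
    """Inverse of raw_file_name(); accepts names with or without an extension."""
    match = re.match(r"QExHF(\d{5})", file_name)
    if match is None:
        raise ValueError(f"not a QExHF run name: {file_name}")
    offset = int(match.group(1)) - FIRST_RUN
    # Reject odd offsets and runs outside the 15-run design.
    if offset < 0 or offset % 2 or offset // 2 >= N_STATES * N_REPLICATES:
        raise ValueError(f"run outside the 15-run design: {file_name}")
    index = offset // 2
    return index % N_STATES + 1, index // N_STATES + 1
```

Encoding the numbering pattern makes it easy to reject names outside the 15-run design. If the numbering turns out not to be regular, a plain dictionary built directly from the table is the safer choice.
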

Specifications table

| Subject | Omics: Proteomics |
|---|---|
| Specific subject area | Proteomics LC-MS/MS ground-truth dataset of quantitative spike-in proteins in a complex matrix |
| Type of data | Raw proteomics data |
| How the data were acquired | Liquid chromatography coupled to tandem mass spectrometry on a Q Exactive HF mass spectrometer (Thermo Fisher Scientific) operated in data-dependent acquisition (DDA) mode with HCD fragmentation |
| Data format | Raw |
| Description of data collection | C2C12 cells were grown in cell culture, harvested and lysed. The lysate was split into five aliquots. Each aliquot was spiked with 13 non-mouse proteins in varying amounts, keeping the overall sample amounts comparable and the protein concentrations physiologically plausible. The samples were measured in triplicate by LC-MS/MS in DDA mode and analyzed by peptide identification and quantification. |
| Data source location | Institution: Ruhr University Bochum, Medical Proteome Center (MPC) |
| Data accessibility | Repository name: PRIDE; Data identification number: PXD012986 |
| Related research article | Barkovits K, Pacharra S, Pfeiffer K, Steinbach S, Eisenacher M, Marcus K, Uszkoreit J, Reproducibility, specificity and accuracy of relative quantification using spectral library-based data-independent acquisition. Mol Cell Proteomics. 2020 Jan;19(1):181–197. |