Bart Van Puyvelde, Simon Daled, Sander Willems, Ralf Gabriels, Anne Gonzalez de Peredo, Karima Chaoui, Emmanuelle Mouton-Barbosa, David Bouyssié, Kurt Boonen, Christopher J Hughes, Lee A Gethings, Yasset Perez-Riverol, Nic Bloomfield, Stephen Tate, Odile Schiltz, Lennart Martens, Dieter Deforce, Maarten Dhaenens.
Abstract
In the last decade, liquid chromatography-mass spectrometry (LC-MS) based proteomics has undergone a revolution with the introduction of dozens of novel instruments that incorporate additional data dimensions through innovative acquisition methodologies, in turn inspiring specialized data analysis pipelines. Simultaneously, a growing number of proteomics datasets have been made publicly available through data repositories such as ProteomeXchange, Zenodo and Skyline Panorama. However, developing algorithms to mine these data and assessing their performance on different platforms is currently hampered by the lack of a single benchmark experimental design. Therefore, we acquired a hybrid proteome mixture on different instrument platforms and in all currently available families of data acquisition. Here, we present a comprehensive Data-Dependent and Data-Independent Acquisition (DDA/DIA) dataset acquired using several of the most commonly used current-day instrument platforms. The dataset consists of over 700 LC-MS runs, including adequate replicates for robust statistics, and covers nearly 10 different data formats, including scanning quadrupole and ion mobility-enabled acquisitions. Datasets are available via ProteomeXchange (PXD028735).
Year: 2022 PMID: 35354825 PMCID: PMC8967878 DOI: 10.1038/s41597-022-01216-6
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1. Schematic overview of the different acquisition strategies and instruments applied in the study. A comprehensive LC-MS/MS dataset was generated using samples composed of commercial Human K562, Yeast and Escherichia coli (E.coli) full proteome digests. Two hybrid proteome samples, A and B, containing known quantities of Human, Yeast and E.coli tryptic peptides as described by Navarro et al., were each prepared three times consecutively to capture handling variability. Additionally, a QC sample was created by mixing one sixth of each of the six resulting master batches (65% w/w Human, 22.5% w/w Yeast and 12.5% w/w E.coli). The commercial lysates were measured individually and as triple hybrid proteome mixtures, each in triplicate, using the DDA and DIA methodologies available on six LC-MS/MS platforms, i.e. SCIEX TripleTOF 5600 and TripleTOF 6600+, Thermo Orbitrap QE HF-X, Waters Synapt G2-Si and Synapt XS, and Bruker timsTOF Pro. The complete dataset was made publicly available to the proteomics community through ProteomeXchange with dataset identifier PXD028735. In addition, a system suitability workflow (AutoQC) was incorporated on each instrument using a commercial E.coli lysate digest, which was acquired at multiple time points throughout each sample batch. The AutoQC data were automatically imported into Skyline and uploaded to the Panorama AutoQC server using AutoQC Loader, enabling system suitability assessment of each LC-MS/MS system used in the dataset.
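The QC percentages in the Fig. 1 legend follow from simple averaging, which the Python sketch below checks. It assumes the Navarro et al. compositions of 65/30/5 % w/w (Human/Yeast/E.coli) for sample A and 65/15/20 % w/w for sample B, and equal total peptide load in A and B; none of these values is stated in this record, so treat them as assumptions. Because the QC pool takes an equal share of three A batches and three B batches, each species' QC fraction is simply the mean of its A and B fractions. The same assumption yields the expected A/B log2 ratios per species. The names `SAMPLE_A`, `SAMPLE_B` and `pooled_composition` are hypothetical.

```python
from math import log2

# ASSUMPTION: sample compositions (% w/w) from the Navarro et al. design;
# they are not given in this record.
SAMPLE_A = {"Human": 65.0, "Yeast": 30.0, "E.coli": 5.0}
SAMPLE_B = {"Human": 65.0, "Yeast": 15.0, "E.coli": 20.0}


def pooled_composition(batches):
    """Mean % w/w per species when equal-mass aliquots of each batch are pooled."""
    species = batches[0].keys()
    return {s: sum(b[s] for b in batches) / len(batches) for s in species}


# Six master batches: A and B, each prepared three times; one sixth of each is pooled.
master_batches = [SAMPLE_A] * 3 + [SAMPLE_B] * 3
qc = pooled_composition(master_batches)
print(qc)  # {'Human': 65.0, 'Yeast': 22.5, 'E.coli': 12.5}

# Expected A/B log2 ratio per species under the same assumption.
expected_log2_ratio = {s: log2(SAMPLE_A[s] / SAMPLE_B[s]) for s in SAMPLE_A}
print(expected_log2_ratio)  # {'Human': 0.0, 'Yeast': 1.0, 'E.coli': -2.0}
```

Under these assumed inputs the pooled values reproduce the 65/22.5/12.5 split quoted in the legend, which is a useful consistency check rather than proof of the original design.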
Fig. 2. Levey-Jennings plot of the standard deviation in peak area for 50 selected precursors acquired in DDA on the TripleTOF 6600+. The upper chart shows two distinct outliers, acquired on the 2nd and 12th of December, respectively (red boxes). Manual inspection of the data showed that these were caused by (a) a wrong vial in the sample tray and (b) an empty vial. When these two samples are excluded from the Levey-Jennings plot (lower chart), the standard deviation over the data acquisition period drops markedly.
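A Levey-Jennings chart plots a QC metric for each run over time, with control limits at the mean ± k standard deviations. The Python sketch below flags runs outside mean ± 2 SD and shows how excluding them shrinks the SD, mirroring the effect described for Fig. 2. The per-run values, the choice k = 2 and the function name `levey_jennings` are hypothetical illustrations; they are not data from this dataset and not the Panorama AutoQC implementation.

```python
import statistics


def levey_jennings(values, k=2.0):
    """Return (mean, sd, indices of runs outside mean ± k*sd)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    flagged = [i for i, v in enumerate(values) if abs(v - mean) > k * sd]
    return mean, sd, flagged


# Hypothetical per-run QC metric (arbitrary units), with two runs far off target.
runs = [10, 11, 9, 10, 30, 10, 11, 9, 10, 10, 30, 10]

mean, sd, flagged = levey_jennings(runs)
print(flagged)  # [4, 10]

# Recompute the statistics without the flagged runs, as in the lower chart of Fig. 2.
kept = [v for i, v in enumerate(runs) if i not in flagged]
_, sd_clean, _ = levey_jennings(kept)
print(f"SD with outliers: {sd:.2f}, without: {sd_clean:.2f}")
```

Here the limits are computed from all runs, outliers included, which inflates the SD and can hide a borderline outlier. In routine QC, limits are typically fixed from a reference period of known-good runs so that a single extreme run cannot widen its own limits.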
Fig. 3. Comparison of the DDA data from six different instruments. (a) Experimental design: triplicate measurements of three individual proteomes. (b) Overlap in uniquely identified peptide sequences and (c) proteins between the six instruments. (d) Number of PSMs per peptide identification across nine DDA runs on three different proteomes for three instruments. (e) Pearson correlation coefficients (PCC) of the fragment intensities, calculated between the shared identified peptides from the DDA replicates of each pair of instruments. The number in each box is the median spectrum PCC between the instrument on the x-axis and the instrument on the y-axis. Darker blue indicates a higher degree of overlap or a higher median PCC. (f) Boxplots of the PCC between MS²PIP-predicted (HCD and TTOF5600 models) and experimental fragment ion intensities across the six LC-MS instruments. (g) Benchmark design of mixed proteomes for three instruments, as annotated and quantified using AlphaPept. Triplicate runs of Condition A and Condition B were used, resulting in the six bars depicted in the middle, which represent, respectively, the number of MS1 features, the number of identified peptides and the number of identified proteins for each instrument. The log-fold-change plots on the left show the distribution of the peptide ratios on the x-axis as a function of their intensity on the y-axis; protein log fold changes are shown on the right.
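Panel (e) summarizes spectrum similarity as the median PCC over peptides identified on both instruments. A minimal Python sketch of that calculation follows. The input layout (a mapping from peptide to fragment-ion intensities), matching ions by label, skipping peptides with fewer than three shared ions, and using untransformed intensities are all assumptions made for illustration; the study's actual pipeline may differ. The function names `pearson` and `median_spectrum_pcc` are hypothetical.

```python
import statistics
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length intensity vectors (None if undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # constant vector: correlation undefined
    return sxy / sqrt(sxx * syy)


def median_spectrum_pcc(spectra_a, spectra_b, min_ions=3):
    """Median PCC over peptides identified on both instruments.

    spectra_*: hypothetical layout {peptide: {fragment_ion_label: intensity}}.
    """
    pccs = []
    for peptide in spectra_a.keys() & spectra_b.keys():
        ions = sorted(spectra_a[peptide].keys() & spectra_b[peptide].keys())
        if len(ions) < min_ions:
            continue  # too few shared fragments for a meaningful correlation
        r = pearson([spectra_a[peptide][i] for i in ions],
                    [spectra_b[peptide][i] for i in ions])
        if r is not None:
            pccs.append(r)
    return statistics.median(pccs) if pccs else None


# Toy example with made-up intensities; SAMPLER is seen on one instrument only.
inst_1 = {"PEPTIDEK": {"b2": 1.0, "y3": 2.0, "y4": 3.0},
          "ELVISK": {"b2": 5.0, "y2": 1.0, "y3": 3.0}}
inst_2 = {"PEPTIDEK": {"b2": 2.0, "y3": 4.0, "y4": 6.0},
          "ELVISK": {"b2": 1.0, "y2": 5.0, "y3": 3.0},
          "SAMPLER": {"b2": 1.0, "y2": 2.0, "y3": 3.0}}
print(median_spectrum_pcc(inst_1, inst_2))  # 0.0 (median of PCCs 1.0 and -1.0)
```

Taking the median rather than the mean keeps a handful of poorly matched spectra from dominating the per-instrument-pair summary.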
| Field | Value |
|---|---|
| Measurement(s) | Digital Data Repository |
| Technology Type(s) | Digital Data Repository |