Vadim Demichev, Lukasz Szyrwiel, Fengchao Yu, Guo Ci Teo, George Rosenberger, Agathe Niewienda, Daniela Ludwig, Jens Decker, Stephanie Kaspar-Schoenefeld, Kathryn S Lilley, Michael Mülleder, Alexey I Nesvizhskii, Markus Ralser.
Abstract
The dia-PASEF technology uses ion mobility separation to reduce signal interferences and increase sensitivity in proteomic experiments. Here we present a two-dimensional peak-picking algorithm and the generation of optimized spectral libraries, and take advantage of neural network-based processing of dia-PASEF data. Our computational platform boosts proteomic depth by up to 83% compared to previous work, and is specifically beneficial for fast proteomic experiments and those with low sample amounts. It quantifies over 5300 proteins in single injections recorded at a throughput of 200 samples per day using the Evosep One chromatography system on a timsTOF Pro mass spectrometer, and almost 9000 proteins in single injections recorded with a 93-min nanoflow gradient on a timsTOF Pro 2, from 200 ng of HeLa peptides. A user-friendly implementation is provided through the incorporation of the algorithms in the DIA-NN software and by the FragPipe workflow for spectral library generation.
Year: 2022 PMID: 35803928 PMCID: PMC9270362 DOI: 10.1038/s41467-022-31492-0
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 17.694
Fig. 1 A concept for processing of proteomic trapped ion mobility data.
a Our dia-PASEF data processing workflow starts with 2D-peak-picking using a narrow scanning window. Chromatogram extraction is then performed, wherein for each precursor or fragment ion, only peaks within certain m/z and ion mobility thresholds from the expected values are used. Expected values are indicated here with dotted lines, peaks discarded due to m/z thresholding are indicated in gray, and a peak discarded solely due to ion mobility thresholding is shown in red. Observed inverse ion mobility values (1/K0) are compared between different fragment ions (whose extracted chromatographic elution profiles and apex 1/K0 values are indicated with different colors) as well as to the reference library 1/K0 value (here: 1.13), to score putative peptide-spectrum matches. Fragments with outlier ion mobility values (here: black, signal from another peptide; green, signal mildly affected by interference) are assigned lower scores. The resulting data are analyzed by an ensemble of deep neural networks, used to distinguish true and false signals. Signals with deviating ion mobility values are also filtered out to increase quantification accuracy. b In contrast to the 2D-peak-picking introduced herein, direct extraction of chromatograms from the profile data could potentially be used. In this case, if extracting profile data with narrow windows (here: in blue), for example of the same size as used by the 2D-peak-picking algorithm, a significant proportion of ion signal can be lost (example highlighted in red) due to an imperfect match between theoretical and empirical m/z or 1/K0 values. If extracting with wide windows, more interfering signals would be integrated (example highlighted in red), increasing the complexity of the data and hampering correct identification and accurate quantification of peptides.
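The thresholding step described in panel a can be illustrated with a minimal Python sketch. This is not the DIA-NN implementation: the `Peak` type, the function names, and the tolerance values (15 ppm for m/z, 0.05 for 1/K0) are hypothetical and chosen only for illustration; the reference 1/K0 of 1.13 is the example value from the caption.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    mz: float         # observed m/z of the picked peak
    inv_k0: float     # observed inverse ion mobility (1/K0)
    intensity: float  # peak intensity


def within_thresholds(peak, expected_mz, expected_inv_k0,
                      mz_tol_ppm=15.0, im_tol=0.05):
    """Return True if the peak lies within both tolerances.

    The tolerance defaults are illustrative assumptions, not values
    taken from the paper or from DIA-NN.
    """
    mz_error_ppm = abs(peak.mz - expected_mz) / expected_mz * 1e6
    im_error = abs(peak.inv_k0 - expected_inv_k0)
    return mz_error_ppm <= mz_tol_ppm and im_error <= im_tol


def extract_peaks(peaks, expected_mz, expected_inv_k0, **tolerances):
    """Keep only peaks close to the expected m/z and 1/K0 values."""
    return [p for p in peaks
            if within_thresholds(p, expected_mz, expected_inv_k0, **tolerances)]


# Hypothetical peaks for one fragment ion with expected m/z 500.0
# and library 1/K0 of 1.13.
peaks = [
    Peak(mz=500.002, inv_k0=1.12, intensity=1000.0),  # within both thresholds
    Peak(mz=500.050, inv_k0=1.13, intensity=800.0),   # fails m/z threshold
    Peak(mz=500.001, inv_k0=1.30, intensity=500.0),   # fails 1/K0 threshold only
]
kept = extract_peaks(peaks, expected_mz=500.0, expected_inv_k0=1.13)
```

In this toy input, the second peak corresponds to the gray case in the figure (discarded by m/z) and the third to the red case (discarded by ion mobility alone).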
Fig. 2 Protein detection and quantification performance.
a Number of quantified proteins for different injection amounts and instrument settings. Numbers of proteins detected in 1, 2, or all 3 injection replicates for each dataset (nanoflow 25% duty cycle scheme and standard scheme; Evosep 200, 100, and 60 samples per day (SPD) methods) are shown with different color shades; average numbers are indicated. Numbers reported by the original dia-PASEF workflow are shown in gray[14]. The numbers of proteins detected by both workflows are indicated with dashed horizontal lines. b Coefficient of variation (CV) distributions for the same datasets. The boxes correspond to the interquartile range, with the median indicated, and the whiskers extend from the 5th to the 95th percentile. c Quantification accuracy of dia-PASEF data analyzed with the new software workflow. We reanalyzed previously recorded data[14], generated by spiking a yeast digest into a HeLa digest (200 ng) in different proportions (A, 45 ng, and B, 15 ng) and analyzed in triplicate using a 90-min nanoLC gradient. The runs were processed using a spectral library created with FragPipe. Gray horizontal lines indicate the expected ratios. On the boxplot, the boxes correspond to the interquartile range, with the median indicated, and the whiskers extend by 1.5× the interquartile range. d Analysis of a dilution series acquired on timsTOF Pro 2, a second-generation dia-PASEF-capable mass spectrometer, using a 93-min 300 nL/min gradient and a pre-column (Methods). Average protein numbers for triplicate injections after filtering at 1% run-specific protein q-value are shown. e Comparison of the performance of DIA-NN (gray) and Spectronaut (orange) on the leukemia dataset[21]. Total numbers of precursors and proteins (top), distributions of protein identification numbers, and consistency of protein detection (bottom) are compared. The y-axis on the histograms represents the counts.
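The quantities plotted in panels b and c can be made concrete with a short Python sketch. The function names are hypothetical, and the use of the sample standard deviation for the CV is an assumption, since the caption does not state which estimator was used. The expected ratios follow directly from the amounts given in the caption: yeast at 45 ng (A) versus 15 ng (B), and HeLa at 200 ng in both samples.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV of replicate quantities: standard deviation divided by mean.

    Uses the sample standard deviation (an assumption; the source does
    not specify the estimator).
    """
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined for a zero mean")
    return stdev(values) / m


def expected_ratio(amount_a_ng, amount_b_ng):
    """Expected A/B ratio for a proteome loaded at the given amounts."""
    return amount_a_ng / amount_b_ng


# Expected A/B ratios for the two-organism benchmark in panel c.
yeast_ratio = expected_ratio(45, 15)   # yeast: 45 ng in A, 15 ng in B
hela_ratio = expected_ratio(200, 200)  # HeLa: 200 ng in both samples

# Hypothetical triplicate quantities for one protein.
cv = coefficient_of_variation([100.0, 110.0, 90.0])
```

Under these amounts, yeast proteins are expected at a 3:1 ratio between A and B, and HeLa proteins at 1:1.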
Spectral libraries.
| Library | Ions | Peptides | Proteins | Genes |
|---|---|---|---|---|
| HeLa, nanoflow | 260,785 | 161,325 | 9991 | 9973 |
| HeLa, Evosep One | 145,875 | 98,485 | 8201 | 8187 |
| HeLa, two-organism | 361,555 | 224,597 | 10,353 | 10,332 |
| HeLa, two-organism, filtered | 360,458 | 223,907 | 10,350 | 10,331 |
| Yeast, two-organism | 134,148 | 79,860 | 5134 | 5132 |
| Yeast, two-organism, filtered | 133,351 | 79,337 | 5113 | 5113 |