Johan Teleman, Aakash Chawade, Marianne Sandin, Fredrik Levander, Johan Malmström
Abstract
In bottom-up mass spectrometry (MS)-based proteomics, peptide isotopic and chromatographic traces (features) are frequently used for label-free quantification in data-dependent acquisition MS. They can also be used to improve the identification of chimeric spectra or to characterize sample complexity. Feature detection is difficult because of the high complexity of MS proteomics data from biological samples, which frequently causes features to intermingle. In addition, existing feature detection algorithms commonly suffer from compatibility issues, long computation times, or poor performance on high-resolution data. Because of these limitations, we developed a new tool, Dinosaur, with increased speed and versatility. Dinosaur can sample its intermediate computations as quality-control plots, which we call a plot trail. Based on an evaluation of this plot trail, we introduce several algorithmic improvements that further increase the robustness and performance of Dinosaur. As a result, Dinosaur detects features for 98% of MS/MS identifications in a benchmark data set, whereas no other algorithm tested in this study exceeded 96%. Finally, we used Dinosaur to reimplement a published workflow for peptide identification in chimeric spectra, increasing chimeric identifications from 26% to 32% compared with the standard workflow. Dinosaur is operating-system-independent and is freely available as open source at https://github.com/fickludd/dinosaur.
Keywords: algorithm; chimeric spectra; electrospray ionization; feature detection; mass spectrometry; proteomics; software
Year: 2016 PMID: 27224449 PMCID: PMC4933939 DOI: 10.1021/acs.jproteome.6b00016
Source DB: PubMed Journal: J Proteome Res ISSN: 1535-3893 Impact factor: 4.466
Figure 1. Overview of the Dinosaur feature-finding algorithm. Features are detected by (a) centroiding MS1 spectra; (b) assembling centroid peaks into single isotope traces, also referred to as hills; (c) clustering hills by theoretically possible m/z differences; and (d) deconvolving clusters into charge-state-consistent features. The plot trail plots randomly selected parts of the intermediate data to support parameter tuning and to make the computational steps more transparent. The plot types created are (e) a line graph of a peak centroid, (f) a heatmap and histogram of hill construction, (g) an isotopic profile compared with an averagine, and (h) a complete heatmap of a data section with the detected features annotated.
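Steps (b) and (c) of Figure 1 can be illustrated with a minimal Python sketch. It is not Dinosaur's implementation. The function names `build_hills` and `isotope_pairs`, the 10 ppm tolerances, and the rule that a hill ends as soon as a scan has no matching centroid are all hypothetical simplifications. Hills are extended scan by scan with the closest centroid within a ppm tolerance. Two hills are paired when their retention-time ranges overlap and their m/z gap matches the 13C–12C mass difference (1.003355 Da) divided by a candidate charge z.

```python
from dataclasses import dataclass, field

# Hypothetical parameters for illustration; Dinosaur's own defaults may differ.
HILL_PPM = 10.0           # max m/z deviation (ppm) when extending a hill
CLUSTER_PPM = 10.0        # max deviation (ppm) from the expected isotope m/z
ISOTOPE_DELTA = 1.003355  # 13C - 12C mass difference (Da)


@dataclass
class Hill:
    """A single isotope trace: one centroid per consecutive MS1 scan."""
    scans: list = field(default_factory=list)
    mzs: list = field(default_factory=list)
    intensities: list = field(default_factory=list)

    @property
    def mz(self) -> float:
        # Intensity-weighted mean m/z of the trace.
        total = sum(self.intensities)
        return sum(m * i for m, i in zip(self.mzs, self.intensities)) / total


def ppm_diff(a: float, b: float) -> float:
    return abs(a - b) / b * 1e6


def build_hills(spectra):
    """spectra: one list of (mz, intensity) centroids per MS1 scan, in RT order."""
    finished, active = [], []
    for scan, peaks in enumerate(spectra):
        still_active = []
        for mz, inten in sorted(peaks):
            # Extend the closest open hill within tolerance, else start a new one.
            candidates = [h for h in active if ppm_diff(mz, h.mzs[-1]) <= HILL_PPM]
            hill = min(candidates, key=lambda h: abs(h.mzs[-1] - mz), default=None)
            if hill is None:
                hill = Hill()
            else:
                active.remove(hill)
            hill.scans.append(scan)
            hill.mzs.append(mz)
            hill.intensities.append(inten)
            still_active.append(hill)
        finished.extend(active)  # hills without a centroid in this scan are closed
        active = still_active
    return finished + active


def isotope_pairs(hills, max_charge=4):
    """Return (i, j, z) for co-eluting hill pairs spaced by ISOTOPE_DELTA / z."""
    pairs = []
    for i, a in enumerate(hills):
        for j, b in enumerate(hills):
            if b.mz <= a.mz:
                continue
            if b.scans[0] > a.scans[-1] or a.scans[0] > b.scans[-1]:
                continue  # retention-time ranges do not overlap
            for z in range(1, max_charge + 1):
                if ppm_diff(b.mz, a.mz + ISOTOPE_DELTA / z) <= CLUSTER_PPM:
                    pairs.append((i, j, z))
    return pairs


spectra = [
    [(500.0, 100.0), (500.50168, 60.0)],
    [(500.001, 200.0), (500.50268, 120.0), (800.0, 50.0)],
    [(500.0, 90.0), (500.50168, 50.0)],
]
hills = build_hills(spectra)
print(isotope_pairs(hills))  # [(1, 2, 2)]
```

In this toy input, the co-eluting traces near m/z 500.0 and 500.5017 are paired with z = 2, and the lone peak at m/z 800 forms a separate one-scan hill. A realistic implementation would also tolerate short gaps within a hill and would resolve overlapping clusters in step (d). This sketch omits both.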
Compatibility and Metadata of Dinosaur and Common Feature-Detection Tools
Figure 2. Feature-detection performance of Dinosaur, msInspect, MaxQuant, MaxQuant-ref, and OpenMS, based on 57 LC–MS injections of a dilution series of synthetic peptides in a bacterial background. (a) Proportion of MS/MS identifications at 1% FDR that were matched to a feature. (b) Log–log correlation coefficients between summed feature intensities and theoretical synthetic peptide concentrations.
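The two metrics in Figure 2 can be computed roughly as shown below. The caption does not state how an MS/MS identification is matched to a feature. The criteria used here are therefore assumptions: the same charge, an MS/MS retention time within the feature's elution range, and a precursor m/z within 10 ppm of the feature m/z. All class and function names are likewise hypothetical. Panel (b) is read as a Pearson correlation on log10-transformed concentrations and intensities.

```python
import math
from dataclasses import dataclass


@dataclass
class Feature:
    mz: float
    charge: int
    rt_start: float  # retention-time bounds of the feature (min)
    rt_end: float
    intensity: float


@dataclass
class Identification:
    mz: float        # precursor m/z of the MS/MS spectrum
    charge: int
    rt: float        # retention time of the MS/MS scan (min)


def matches(ident: Identification, feat: Feature, ppm: float = 10.0) -> bool:
    # Assumed criteria: same charge, MS/MS RT inside the feature, m/z within ppm.
    return (ident.charge == feat.charge
            and feat.rt_start <= ident.rt <= feat.rt_end
            and abs(ident.mz - feat.mz) / feat.mz * 1e6 <= ppm)


def matched_fraction(ids, features, ppm: float = 10.0) -> float:
    """Share of identifications matched by at least one feature (as in panel a)."""
    hits = sum(any(matches(i, f, ppm) for f in features) for i in ids)
    return hits / len(ids)


def loglog_pearson(concentrations, intensities) -> float:
    """Pearson r between log10 concentration and log10 intensity (as in panel b)."""
    xs = [math.log10(c) for c in concentrations]
    ys = [math.log10(v) for v in intensities]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


features = [Feature(500.0, 2, 10.0, 12.0, 1e6), Feature(650.3, 3, 20.0, 21.0, 5e5)]
ids = [Identification(500.002, 2, 11.0),   # matched by the first feature
       Identification(650.3, 2, 20.5),     # charge differs: not matched
       Identification(700.0, 2, 30.0)]     # no feature nearby
print(matched_fraction(ids, features))     # one of three IDs matched
print(loglog_pearson([1, 10, 100], [2e3, 2e4, 2e5]))
```

The brute-force matching is quadratic in the number of IDs and features. For real data sets, sorting features by m/z and searching a tolerance window would be the natural refinement.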
Figure 3. Intensity distribution of ID-matched features for the compared feature-detection tools. Feature intensities from each tool were normalized by dividing them by the median intensity of the features linked to IDs that all tools matched. (a) Absolute distribution of features over the normalized intensity range. (b) Relative number of features in each intensity bin. The MaxQuant tools are relatively strong at low-intensity features, and the OpenMS tool is relatively strong at high-intensity features; Dinosaur shares both strengths.
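The normalization and binning behind Figure 3 can be sketched as follows. The caption does not specify the bin width, the intensity scale, or how the relative numbers in panel (b) are computed. This sketch assumes log2 bins of width 1. As one plausible reading of panel (b), it reports each tool's share of all features that fall in a bin. The names `normalize`, `log2_bin`, and `relative_counts` are hypothetical.

```python
import math
from collections import Counter
from statistics import median


def normalize(intensities, common_intensities):
    """Divide a tool's feature intensities by the median intensity of its
    features linked to IDs that every tool matched."""
    m = median(common_intensities)
    return [v / m for v in intensities]


def log2_bin(value: float, bin_width: float = 1.0) -> float:
    # Lower edge of the log2 bin that contains the normalized intensity.
    return math.floor(math.log2(value) / bin_width) * bin_width


def relative_counts(per_tool, bin_width: float = 1.0):
    """per_tool: {tool: [normalized intensities]}.
    Returns {bin: {tool: share of that bin's features}}."""
    counts = {tool: Counter(log2_bin(v, bin_width) for v in vals)
              for tool, vals in per_tool.items()}
    bins = sorted(set().union(*counts.values()))
    result = {}
    for b in bins:
        total = sum(c[b] for c in counts.values())
        result[b] = {tool: c[b] / total for tool, c in counts.items()}
    return result


a = normalize([50, 100, 200], [100, 200, 300])  # median 200 -> [0.25, 0.5, 1.0]
b = normalize([10, 20, 20], [10, 10, 10])       # median 10  -> [1.0, 2.0, 2.0]
for bin_start, shares in relative_counts({"A": a, "B": b}).items():
    print(bin_start, shares)
```

Normalizing against features that every tool matched puts the tools on a shared intensity scale. Without this step, differences in how each tool sums or reports intensity would shift the histograms relative to one another.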
Figure 4. Usability of Dinosaur. (a–d) Typical proteomics samples are represented by eight samples of differing complexity. In total, 583 793 features were detected, of which the 412 331 with charge two or higher are included here. (a) Distributions of detected feature intensities for the eight samples. (b) Distributions of feature retention times for the eight samples. (c) Computation times for the eight samples for Dinosaur compared with MaxQuant and MaxQuant-ref. Because the feature-detection part of MaxQuant is difficult to time separately, two alternative measures are reported. (d) Computation times as in (c), but comparing Dinosaur with an OpenMS feature finder. The missing OpenMS measurement for the synthetic peptide sample is likely due to a corner-case implementation issue. (e) Number of unique peptides identified in three HeLa cell replicates using a new Dinosaur-based implementation of the DeMix workflow, compared with the original workflow and with analysis without DeMixing.