Mukul K Midha, Ulrike Kusebauch, David Shteynberg, Charu Kapil, Samuel L Bader, Panga Jaipal Reddy, David S Campbell, Nitin S Baliga, Robert L Moritz.
Abstract
Data-independent acquisition (DIA) is a mass spectrometry (MS) method for improving the consistent identification and precise quantitation of peptides and proteins. The targeted data analysis strategy in DIA relies on spectral assay libraries, which are generally derived from prior measurements of peptides for each species. Although Escherichia coli (E. coli) is among the best-studied model organisms, no spectral assay library for the bacterium has so far been publicly available. Here, we generated a spectral assay library for 4,014 of the 4,389 annotated E. coli proteins using one- and two-dimensional fractionated samples and ion mobility separation, enabling deep proteome coverage. We demonstrate the utility of this high-quality library through robust quantitation of the E. coli proteome and through rapid chromatography that increases the throughput of targeted DIA-MS. The spectral assay library supports the high-confidence detection and quantification of 91.5% of all E. coli proteins with 56,182 proteotypic peptides, making it a valuable resource for the scientific community. Data and spectral libraries are available via ProteomeXchange (PXD020761, PXD020785) and SWATHAtlas (SAL00222-28).
Year: 2020 PMID: 33184295 PMCID: PMC7665006 DOI: 10.1038/s41597-020-00724-7
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1 Data acquisition workflow to generate a comprehensive E. coli assay library, quality evaluation with DIALib-QC, and DIA/SWATH-MS quantification by Spectronaut. A comprehensive DIA/SWATH assay library for E. coli was generated from whole cell lysate, fractionated samples, and overexpressed proteins, and was supplemented with synthetic peptides. Samples were analyzed by data-dependent acquisition (DDA) mass spectrometry on TripleTOF 5600+ and TripleTOF 6600 instruments, resulting in 209 data files. To generate a DIA/SWATH library, the raw data files were converted to mzML format using the ABSCIEX converter with the profile mode extraction parameter. The mzML files were searched against the reference proteome using both the Comet and X!Tandem search engines. The identified sequences were then statistically validated using the Trans-Proteomic Pipeline (TPP), including PeptideProphet and iProphet. MAYU was applied to control the false discovery rate (FDR) at the protein level. Using SpectraST, confidently assigned spectra were converted into a redundant spectral library, retention times were normalized to iRT space using RTCatalog, and a consensus spectral library was then generated. The assay library was extracted from the consensus library using the spectrast2tsv.py script. Libraries were evaluated with the DIA Library Quality Control tool (DIALib-QC, www.swathatlas.org), and assessment reports were generated. The performance of the TripleTOF E. coli spectral library was evaluated on the identification and quantitation of peptides and proteins in data-independent acquisition (DIA) runs with different gradient lengths, using the Spectronaut analysis software.
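The retention-time step maps each run's observed retention times onto a shared iRT scale. The workflow names RTCatalog for this step but does not describe its algorithm, so the sketch below shows only a generic linear least-squares calibration, assuming a handful of anchor peptides whose reference iRT values are known. The function names `fit_linear` and `to_irt` and all numeric values are hypothetical and are not taken from RTCatalog or from this dataset.

```python
"""Hypothetical sketch: linear calibration of observed RT onto an iRT scale."""


def fit_linear(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("observed retention times must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def to_irt(rt, slope, intercept):
    """Convert an observed retention time (min) to the iRT scale."""
    return slope * rt + intercept


# Hypothetical anchor peptides: observed RT in one run vs. reference iRT values.
observed_rt = [10.0, 20.0, 30.0, 40.0]
reference_irt = [-20.0, 10.0, 40.0, 70.0]

slope, intercept = fit_linear(observed_rt, reference_irt)
print(slope, intercept)                 # 3.0 -50.0
print(to_irt(25.0, slope, intercept))   # 25.0
```

Once every run is expressed on the same iRT scale, spectra of the same peptide from different runs can be merged into one consensus entry with a single normalized retention time.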
Sample overview.
| Sample Type | Peptide fractionation | Medium | Instrument | MS injections |
|---|---|---|---|---|
|  | None | LB | TT5600+ | 58 |
|  | None | LB/M9 | TT5600+, TT6600 | 33 |
|  | OGE | LB | TT5600+ | 24 |
|  | OGE | M9 | TT5600+ | 24 |
|  | DMS | M9 | TT5600+ | 47 |
|  | None | None | TT5600+ | 23 |
| Total |  |  |  | 209 |
Sample types, including the peptide fractionation method, MS instruments, and number of injections used to generate the E. coli spectral library. ASKA (-) refers to overexpressed strains with histidine-tagged proteins; OGE: off-gel electrophoresis; DMS: differential ion mobility; LB: Luria-Bertani broth medium; M9: minimal medium.
Library statistics.
|  | Proteotypic | Proteotypic and shared |
|---|---|---|
| Proteins | 4,014 | 4,086 |
| Stripped peptides | 48,188 | 48,771 |
| Modified peptides | 56,182 | 56,872 |
| Precursor ions | 68,121 | 68,948 |
| Transitions | 802,083 | 811,406 |
Overview of the number of proteins, stripped peptides, modified peptides, precursor ions and transitions at 1% protein FDR for proteotypic peptides and all peptides (proteotypic and shared) in the assay library.
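The headline coverage figure and a few per-entry ratios follow directly from these counts. The short calculation below uses the proteotypic column together with the 4,389 annotated proteins quoted in the abstract; the derived ratios (peptides per protein, transitions per precursor) are simple averages computed here for illustration and are not reported as such in the source.

```python
# Counts copied from the library statistics table (proteotypic column)
# and the number of annotated E. coli proteins quoted in the abstract.
ANNOTATED_PROTEINS = 4389
library = {
    "proteins": 4014,
    "stripped_peptides": 48188,
    "modified_peptides": 56182,
    "precursors": 68121,
    "transitions": 802083,
}

# Fraction of the annotated proteome covered by the library, in percent.
coverage_pct = 100 * library["proteins"] / ANNOTATED_PROTEINS
# Illustrative averages, not values stated in the source.
peptides_per_protein = library["stripped_peptides"] / library["proteins"]
transitions_per_precursor = library["transitions"] / library["precursors"]

print(f"{coverage_pct:.1f}")               # 91.5
print(f"{peptides_per_protein:.1f}")       # 12.0
print(f"{transitions_per_precursor:.1f}")  # 11.8
```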
Fig. 2 Coverage and characteristics of the E. coli spectral assay library. (a) Proteome coverage of the E. coli spectral assay library (complete library, and library with the top 6 ions and 100 variable windows applied) and of proteins identified by SWATH-MS with the developed library, compared with the annotated reference proteome. (b) Number of E. coli peptides per protein in the SWATH assay library. (c) Retention time (RT) fit of the +2 and +3 charge states of the same peptide in the assay library, computed by DIALib-QC to assess library quality. (d) Distribution of precursor m/z values across the acquired mass range in the assay library. (e) Frequency of precursor charge states observed in the assay library. (f) Frequency and type of peptide modifications observed in the assay library. CAM: carbamidomethylation, Oxi: oxidation, PCm: S-carbamoylmethylcysteine, PGQ: pyroglutamate, and PGE: pyroglutamic acid. (g) Distribution of peptide length in the assay library. (h) Distribution of the number of fragment ions per precursor. (i) Frequency of observed b- and y-ion fragments with CID fragmentation in the assay library.
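Several Fig. 2 panels (charge states, peptide length, fragments per precursor, fragment ion types) are simple tallies over the assay library table. The sketch below shows one way to compute them, assuming a tab-separated library with one transition per row and OpenSWATH-style column names (`PeptideSequence`, `FullUniModPeptideName`, `PrecursorCharge`, `FragmentType`); the column names and the sample rows are assumptions for illustration, not an excerpt of the published library, and this is not the DIALib-QC implementation.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt; column names assume an OpenSWATH-style assay library TSV.
SAMPLE_TSV = (
    "PeptideSequence\tFullUniModPeptideName\tPrecursorCharge\tFragmentType\n"
    "PEPTIDEK\tPEPTIDEK\t2\ty\n"
    "PEPTIDEK\tPEPTIDEK\t2\ty\n"
    "PEPTIDEK\tPEPTIDEK\t2\tb\n"
    "PEPTIDEK\tPEPTIDEK\t3\ty\n"
    "AGCDEFR\tAGC(UniMod:4)DEFR\t2\ty\n"
)


def summarize_library(handle):
    """Tally Fig. 2-style statistics from a one-transition-per-row library."""
    rows = list(csv.DictReader(handle, delimiter="\t"))
    # A precursor is a (modified sequence, charge) pair; each row is one fragment ion.
    fragments_per_precursor = Counter(
        (row["FullUniModPeptideName"], int(row["PrecursorCharge"])) for row in rows
    )
    return {
        # Charge state counted once per precursor, not once per transition.
        "charge_states": Counter(charge for _, charge in fragments_per_precursor),
        # Length counted once per unique stripped peptide sequence.
        "peptide_lengths": Counter(
            len(seq) for seq in {row["PeptideSequence"] for row in rows}
        ),
        # Histogram: number of precursors having k fragment ions.
        "fragments_per_precursor": Counter(fragments_per_precursor.values()),
        "fragment_types": Counter(row["FragmentType"] for row in rows),
    }


stats = summarize_library(io.StringIO(SAMPLE_TSV))
print(stats["charge_states"])  # Counter({2: 2, 3: 1})
```

Counting charge states per precursor rather than per row avoids weighting precursors by how many transitions they carry.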
Fig. 3 Performance of the spectral assay library with different liquid chromatography gradients by DIA/SWATH-MS. (a) Number of unique peptides and (b) protein groups identified with chromatography gradients of different lengths. An approximately 20% increase in peptide and protein group identifications was observed with the 90-minute gradient compared with the 15-minute gradient. The error bars indicate the variability within five replicates as the standard error of the mean, calculated as the standard deviation of the number of quantified peptides or proteins across the replicates of each gradient divided by the square root of the sample size (n = 5). The small yellow dots denote the number of identifications in each replicate. (c) All identified protein groups ranked by abundance, highlighting the dynamic range of proteins that can be quantified with liquid chromatography gradients of different lengths. All gradients resulted in protein quantification across five orders of magnitude, with the exception of the 15-minute gradient, which covered four orders of magnitude. (d) Pearson correlation of protein intensity values obtained from 1,483 proteotypic proteins that were quantified in all five technical replicates by both the 15-minute and 90-minute gradients. The high positive correlation indicates quantitative robustness between the gradient methods. (e) Distribution of the coefficient of variation (CV) of proteins identified in all five replicates at 1% protein FDR, estimated by Spectronaut. The median CV of 10% (90-minute gradient) to 11% (15-minute gradient) is consistent with the expected technical variation. Boxes mark the first and third quartiles, the median is depicted as a solid line, and whiskers mark minimum/maximum values ranging to 12 interquartile ranges. (f) Distribution of data points per elution peak for the different gradient methods, estimated by Spectronaut. Boxes mark the first and third quartiles, the median is depicted as a solid line, and whiskers mark minimum/maximum values ranging to 3 interquartile ranges.
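The Fig. 3 caption relies on three standard statistics: the standard error of the mean for the error bars, the per-protein coefficient of variation, and the Pearson correlation between gradients. The sketch below implements each one, assuming the sample standard deviation (n − 1 denominator), which the caption does not specify. The replicate counts are hypothetical, and Spectronaut's own CV estimation may differ in detail.

```python
import math
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))


def cv_percent(values):
    """Coefficient of variation in percent: sample SD / mean * 100."""
    return 100 * stdev(values) / mean(values)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with n >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical identification counts from five replicate injections (n = 5).
counts = [100, 102, 98, 101, 99]
print(round(sem(counts), 3))         # 0.707
print(round(cv_percent(counts), 2))  # 1.58
```

For panel (e), `cv_percent` would be applied to each protein's intensities across the five replicates, and the median of those per-protein values summarized per gradient.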
| Measurement(s) | Proteome • database type spectral library |
|---|---|
| Technology Type(s) | SWATH MS protein profiling assay • mass spectrometry • Data-Independent Acquisition |
| Sample Characteristic - Organism | Escherichia coli |