George Rosenberger, Ching Chiek Koh, Tiannan Guo, Hannes L Röst, Petri Kouvonen, Ben C Collins, Moritz Heusel, Yansheng Liu, Etienne Caron, Anton Vichalkovski, Marco Faini, Olga T Schubert, Pouya Faridi, H Alexander Ebhardt, Mariette Matondo, Henry Lam, Samuel L Bader, David S Campbell, Eric W Deutsch, Robert L Moritz, Stephen Tate, Ruedi Aebersold.
Abstract
Mass spectrometry is the method of choice for deep and reliable exploration of the (human) proteome. Targeted mass spectrometry reliably detects and quantifies pre-determined sets of proteins in a complex biological matrix and is used in studies that rely on the quantitatively accurate and reproducible measurement of proteins across multiple samples. It requires the one-time, a priori generation of a specific measurement assay for each targeted protein. SWATH-MS is a mass spectrometric method that combines data-independent acquisition (DIA) and targeted data analysis and vastly extends the throughput of proteins that can be targeted in a sample compared to selected reaction monitoring (SRM). Here we present a compendium of highly specific assays covering more than 10,000 human proteins and enabling their targeted analysis in SWATH-MS datasets acquired from research or clinical specimens. This resource supports the confident detection and quantification of 50.9% of all human proteins annotated by UniProtKB/Swiss-Prot and is therefore expected to find wide application in basic and clinical research. Data are available via ProteomeXchange (PXD000953-954) and SWATHAtlas (SAL00016-35).
Year: 2014 PMID: 25977788 PMCID: PMC4322573 DOI: 10.1038/sdata.2014.31
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Overview of the contents of the combined assay library.
CL refers to cell line and T refers to tissue, indicating the source of the specimen. The full sample annotation is provided in

| Sample | Protein fractionation | Proteolytic digestion | Peptide fractionation | |
|---|---|---|---|---|
| HEK293 (CL) | AP (Kinases) | Trypsin | None | 12 |
| HEK293 (CL) | AP (14-3-3) | Trypsin | None | 29 |
| HEK293 (CL) | SEC | Trypsin | None | 81 |
| HEK293 (CL) | None | Trypsin | OGE | 11 |
| HEK293 (CL) | None | Trypsin | None | 1 |
| U2OS (CL) | None | PCT | None | 13 |
| HeLa (CL) | None | PCT | None | 9 |
| U2OS and HeLa (CL) | None | Trypsin | OGE | 24 |
| NCI60 (CL) | None | PCT | None | 13 |
| NCI60 (CL) | None | Trypsin | OGE | 24 |
| CAL51 (CL) | None | Trypsin | None | 5 |
| CAL51 (CL) | None | Trypsin | 1D GE | 2 |
| THP1 (CL) | None | Trypsin | OGE | 27 |
| LNCaP (CL) | None | Trypsin | SAX | 6 |
| LNCaP (CL) | None | Trypsin | None | 1 |
| Kidney (T) | None | Trypsin | 1D GE | 15 |
| Kidney (T) | None | PCT | None | 16 |
| Large intestine (T) | None | Trypsin | OGE | 24 |
| Muscle (T) | None | PCT | None | 3 |
| Lung (T) | None | PCT | None | 2 |
| Blood plasma (T) | None | Trypsin | SAX | 8 |
| Monocytes (T) | None | Trypsin | None | 1 |
| Neutrophils (T) | None | Trypsin | None | 1 |
| Purified platelets (T) | None | Trypsin | None | 3 |
| Total | | | | |

Figure 1. Data acquisition and data analysis workflows employed for the generation of assay libraries. (a) Data acquisition: Sampling of different cell lines and tissue types was followed by (optional) protein fractionation, proteolytic digestion (trypsin, or Lys-C/trypsin with PCT), (optional) peptide fractionation and LC-MS/MS analysis in discovery proteomics mode. (b) Data analysis: Sequence database searches were conducted using four different search engines, and the results were statistically evaluated and combined using the Trans-Proteomic Pipeline. False discovery rate (FDR) control was conducted using MAYU. The identified peptides were used to generate a consensus, RT-normalized spectral library using SpectraST. Assays were selected using spectrast2tsv.py and the OpenSWATH tool ConvertTSVToTraML.
Figure 2. Statistics of the combined assay library and comparison to other human proteome mapping efforts. (a) True positive (red) and all (blue) protein identifications as a function of protein FDR. The graph indicates that the number of true positive protein identifications saturates at a protein FDR cutoff of 0.05. Additional identifications at less strict FDR cutoffs are mainly false positives. (b) True positive (red) and all (blue) peptide identifications as a function of protein FDR. The graph indicates that the number of true positive peptide identifications correlates strongly with the total number of peptide identifications and does not reach saturation within typical levels of protein FDR cutoffs. (c) The number of PSMs per sample type contributing to the assay library. Multiple PSMs can constitute a consensus spectrum and are counted individually per MS injection. The NCI60 cell line panel contributed most, and HEK293 cells, gut tissue and THP1 cells each contributed more than 10% of all spectra. (d) Overlap of human proteins curated by UniProtKB/Swiss-Prot, the subset annotated with protein-level evidence, and the presented combined assay library (CAL). On the protein level, the assay library covers 68.2% of the proteins with protein-level evidence and provides assays for an additional 802 proteins. Overall, the assay library contains 50.9% of all 20,264 human proteins in UniProtKB/Swiss-Prot.
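
The coverage figures quoted in panel (d) can be cross-checked with simple arithmetic. The sketch below assumes that the 10,316 proteins listed in the assay statistics table are the count behind panel (d), that every protein in the library is a UniProtKB/Swiss-Prot entry, and that the 802 additional proteins are exactly those lacking protein-level evidence. The size of the protein-level evidence subset is not stated in this section, so the last value is only a rough back-calculation from the rounded 68.2%.

```python
# Numbers quoted in the Figure 2d caption and the assay statistics table.
SWISSPROT_HUMAN_PROTEINS = 20_264  # all human UniProtKB/Swiss-Prot proteins
CAL_PROTEINS = 10_316              # assumption: the library count behind panel (d)
CAL_ONLY_PROTEINS = 802            # library proteins without prior protein-level evidence
EVIDENCE_COVERAGE = 0.682          # share of evidence-annotated proteins covered

# Share of all Swiss-Prot human proteins that have an assay in the library.
swissprot_coverage = CAL_PROTEINS / SWISSPROT_HUMAN_PROTEINS

# Library proteins that also carry protein-level evidence (assumption, see lead-in).
overlap_with_evidence = CAL_PROTEINS - CAL_ONLY_PROTEINS

# Rough, back-calculated size of the protein-level evidence subset.
implied_evidence_subset = overlap_with_evidence / EVIDENCE_COVERAGE

print(f"Swiss-Prot coverage: {swissprot_coverage:.1%}")
print(f"Overlap with protein-level evidence: {overlap_with_evidence}")
print(f"Implied evidence subset size (approximate): {implied_evidence_subset:.0f}")
```

Under these assumptions the first value reproduces the 50.9% stated in the abstract and in the caption.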
Assay statistics of the combined assay library.
The number of proteins, peptides, precursors and transitions, filtered at protein FDR 1% is depicted. The combined assay library is provided with all target and decoy assays, but only proteotypic assays were considered for all downstream analysis.

| | | |
|---|---|---|
| Proteins | 10,316 | 11,588 |
| Peptides | 139,449 | 146,576 |
| Precursors | 194,052 | 204,545 |
| Transitions | 1,164,312 | 1,227,270 |
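
As a quick consistency check on the table above, the ratios between the levels can be computed directly. This is a minimal sketch; the column labels were lost in the source, so the two numeric columns are simply called "first" and "second" here, and any reading of the ratios (for example, a fixed number of transitions selected per precursor) is an inference from the numbers rather than something stated in this section.

```python
# Counts copied from the assay statistics table; column names are placeholders.
counts = {
    "first": {"proteins": 10_316, "peptides": 139_449,
              "precursors": 194_052, "transitions": 1_164_312},
    "second": {"proteins": 11_588, "peptides": 146_576,
               "precursors": 204_545, "transitions": 1_227_270},
}

ratios = {}
for column, c in counts.items():
    ratios[column] = {
        "peptides_per_protein": c["peptides"] / c["proteins"],
        "precursors_per_peptide": c["precursors"] / c["peptides"],
        "transitions_per_precursor": c["transitions"] / c["precursors"],
    }

for column, r in ratios.items():
    print(column, {name: round(value, 2) for name, value in r.items()})
```

In both columns the transition count is exactly six times the precursor count.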
Figure 3. Number of peptide and protein identifications by SWATH-MS using different proteotypic assay libraries. (a) The proteotypic peptides contained in the combined assay library (CAL) and the sample-specific (ss) assay libraries, and their overlap, are depicted. The overlap between the sample-specific libraries is more than 70% on the peptide level and around 80% on the protein level. 239 peptides contained in the sample-specific libraries were not included in the CAL, since they did not meet the stricter quality cutoff of the CAL. (b) The number of true positive peptide identifications as a function of the peptide FDR. Using the combined library, the number of true positive peptide identifications matches that of the sample-specific libraries at peptide FDR below 1% (dashed grey line). (c,d) The number of true positive protein identifications from a HeLa (c) or U2OS (d) whole cell lysate in a single, unfractionated injection as a function of the protein FDR. Protein FDR cutoffs are reported either for all identifications or for non-single hits (NS). The CAL provides sensitivity similar to that of the sample-specific libraries for HeLa and U2OS at typical levels of error-rate control. The non-single hit identifications of the CAL generally provide a higher sensitivity at lower protein FDR cutoffs. The dashed grey line indicates the protein FDR cutoff at 1%. (e) Reproducibility of the peptide identifications as a function of the peptide FDR. The colors indicate reproducibility in 1 (green), 2 (blue) or 3 (red) of 3 technical replicates. Both ss HeLa (top) and CAL (bottom) enable detection of a similar number of assays among all replicates at the same peptide FDR. The CAL enables detection of more low-intensity peptides in only one or two replicates. (f) Distribution of the coefficient of variation (CV) of summed transition intensities of precursors identified in all three replicates at 1% peptide FDR. The median CV of 5% (U2OS) to 10% (HeLa) corresponds well with the expected technical variation and is very similar between the sample-specific and the combined assay libraries.
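
The CV described in panel (f) can be illustrated with a minimal sketch. The precursor names and intensities below are hypothetical, and the use of the sample standard deviation is an assumption, since the caption does not say which estimator was used. For each precursor, the transition intensities are summed per replicate, and the CV is the standard deviation of those three sums divided by their mean.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed estimator)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined for a mean of zero")
    return stdev(values) / m


# Hypothetical data: precursor -> three replicates -> transition intensities.
replicate_transitions = {
    "precursor_A": [[1000.0, 500.0, 250.0], [1100.0, 520.0, 260.0], [950.0, 480.0, 240.0]],
    "precursor_B": [[300.0, 200.0, 100.0], [310.0, 190.0, 105.0], [290.0, 210.0, 95.0]],
    "precursor_C": [[80.0, 60.0, 40.0], [95.0, 70.0, 45.0], [70.0, 55.0, 35.0]],
}

# Sum the transition intensities within each replicate, then compute one CV per precursor.
summed = {p: [sum(t) for t in reps] for p, reps in replicate_transitions.items()}
cvs = {p: coefficient_of_variation(s) for p, s in summed.items()}

median_cv = median(list(cvs.values()))
print(f"Median CV across precursors: {median_cv:.1%}")
```

Only precursors identified in all three replicates enter this calculation in the caption, which is why every hypothetical precursor here has exactly three replicate entries.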
Identification statistics of the combined and sample-specific assay libraries.
The number of identified proteotypic peptides and proteins in SWATH-MS datasets of whole cell lysates of HeLa and U2OS cell lines at commonly used protein FDR cutoffs using combined (CAL) and sample-specific (ss) assay libraries is reported. Protein FDR cutoffs are either reported for all identifications or non-single hits (NS). The true positive protein (prot) and peptide (pep) identifications for the combined assay library and sample-specific assay libraries are reported as estimated by MAYU.

| Protein FDR cutoff | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| 1% | 2,417 | 14,930 | 2,353 | 14,635 | 2,617 | 15,608 | 2,452 | 14,360 |
| 2% | 2,730 | 17,294 | 2,467 | 15,416 | 2,989 | 18,321 | 2,541 | 14,982 |
| 5% | 3,246 | 21,128 | 2,514 | 15,672 | 3,486 | 21,893 | 2,552 | 15,003 |
| NS 1% | 2,608 | 23,075 | 1,750 | 14,999 | 2,803 | 24,009 | 1,763 | 14,599 |
| NS 2% | 2,804 | 25,005 | 1,798 | 15,537 | 2,965 | 25,497 | 1,815 | 15,002 |
| NS 5% | 3,111 | 28,002 | 1,820 | 15,668 | 3,241 | 28,442 | 1,819 | 14,999 |

Figure 4. Application of the combined assay library (CAL) to an independently acquired dataset (CDK4 AP-SWATH, Lambert et al.[28]) and comparison to the sample-specific assay library (ss). The fold changes between wild type (WT) and mutants (R24C or R24H) are shown, with whiskers indicating the standard deviation. The assays contained in the combined library for CD2A1 and CDN2C covered fewer and different peptides than those in the sample-specific assay library, and thus the fold change is smaller. The results indicate that comparable qualitative and quantitative results can be retrieved with the combined assay library from SWATH-MS experiments conducted using different experimental setups, data acquisition and data analysis strategies.