Joris Cadow, Matteo Manica, Roland Mathis, Tiannan Guo, Ruedi Aebersold, María Rodríguez Martínez.
Abstract
SUMMARY: In recent years, SWATH-MS has become the proteomic method of choice for data-independent acquisition, as it enables high proteome coverage, accuracy and reproducibility. However, data analysis is convoluted and requires prior information and expert curation. Furthermore, as quantification is limited to a small set of peptides, potentially important biological information may be discarded. Here we demonstrate that deep learning can be used to learn discriminative features directly from raw MS data, hence eliminating the need for elaborate data processing pipelines. Using transfer learning to overcome sample sparsity, we exploit a collection of publicly available deep learning models already trained for the task of natural image classification. These models are used to produce feature vectors from each mass spectrometry (MS) raw image, which are later used as input for a classifier trained to distinguish tumor from normal prostate biopsies. Although the deep learning models were originally trained for a completely different classification task and no additional fine-tuning is performed on them, we achieve a remarkable classification performance of 0.876 AUC. We investigate different types of image preprocessing and encoding, and whether the inclusion of the secondary MS2 spectra improves the classification performance. For all tested models, we use standard protein expression vectors as gold standards. Even with our naïve implementation, our results suggest that the application of deep learning and transfer learning techniques may pave the way to the broader usage of raw mass spectrometry data in real-time diagnosis.
Year: 2021 PMID: 34252933 PMCID: PMC8275322 DOI: 10.1093/bioinformatics/btab311
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Summary of peptide vector post-processing
| Name | Processing steps |
|---|---|
| peptides1 | Log2 transformation and quantile normalization of samples. |
| peptides2 | Imputation of missing values on peptides1. |
| peptides3 | Batch normalization over different machine runs, performed on peptides2. |
| peptides4 | Selection of only the top 3 peptides per protein (over all samples); imputation of missing values with a linear regression. |
| proteins | The strongest intensity among the proteotypic peptides is adopted as the protein intensity. |
Note: During peptide processing, four different intermediate datasets are generated. We test the accuracy of our model on the peptides3, peptides4 and proteins datasets.
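For concreteness, a minimal pandas sketch of the first two rows of the table above, assuming a hypothetical peptide-by-sample intensity matrix `raw` (rows = peptides, columns = samples). The quantile-normalization helper is a standard textbook implementation, and the per-feature minimum imputation is only a placeholder, since the table does not specify the imputation method; batch normalization and the top-3 peptide selection are not shown.

```python
import numpy as np
import pandas as pd

def quantile_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Quantile-normalize the samples (columns) of an intensity matrix.
    Missing values stay missing; ties are broken by order (sketch only)."""
    # Reference distribution: mean of the per-sample sorted intensities.
    reference = pd.DataFrame(np.sort(df.values, axis=0)).mean(axis=1).values
    out = df.copy()
    for col in df.columns:
        ranks = df[col].rank(method="first") - 1        # 0-based ranks, NaN stays NaN
        mask = ranks.notna()
        out.loc[mask, col] = reference[ranks[mask].astype(int).values]
    return out

peptides1 = quantile_normalize(np.log2(raw))            # Table 1, first row
# The imputation scheme is not specified above; per-feature minimum
# imputation is used here purely as a stand-in, not the authors' method.
peptides2 = peptides1.apply(lambda r: r.fillna(r.min()), axis=1)
```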
Peptide and proteomic feature vectors
| Dataset | Number of features | Number of retained features |
|---|---|---|
| peptides2 | 16 644 | 0 |
| peptides3 | 16 644 | 1207 |
| peptides4 | 16 104 | 1207 |
| proteins | 2103 | 265 |
Note: Summary of the number of initial and retained features after the pre-processing described in Table 1. As no feature is free of missing values in every sample before imputation, all features of the original dataset (peptides2) are eliminated, resulting in zero retained features. We investigate the influence of the different post-processing pipelines on the model’s classification accuracy in Section 3.2.
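The retention rule described in this note reduces to a one-liner. A minimal sketch, assuming a samples × features DataFrame; the constant-feature filter mirrors the one described for the encoder features in the next table, and all names are illustrative.

```python
import pandas as pd

def retain_features(df: pd.DataFrame) -> pd.DataFrame:
    """Keep features (columns) that are complete in every sample
    and not constant across samples."""
    complete = df.dropna(axis=1)                  # any missing value -> dropped
    return complete.loc[:, complete.nunique() > 1]

# In peptides2 every feature has at least one missing value,
# so retain_features(peptides2).shape[1] == 0, as reported above.
```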
Vector encodings overview
| Encoder name | Input | Output | Retained features (ms1_only) | Retained features (ms1_and_ms2) |
|---|---|---|---|---|
| resnet_v2_101 | 224×224×3 | 2048 | 942 (45%) | 120 577 (58%) |
| resnet_v2_50 | 224×224×3 | 2048 | 1145 (55%) | 132 459 (64%) |
| resnet_v2_152 | 224×224×3 | 2048 | 1570 (76%) | 164 260 (79%) |
| nasnet_large | 331×331×3 | 4032 | 3671 (91%) | 365 296 (89%) |
| inception_resnet_v2 | 299×299×3 | 1536 | 1536 (100%) | 155 107 (99%) |
| inception_v3_imagenet | 299×299×3 | 2048 | 2045 (99%) | 206 835 (99%) |
| inception_v2 | 224×224×3 | 1024 | 1018 (99%) | 103 418 (99%) |
| inception_v3_inaturalist | 299×299×3 | 2048 | 2044 (99%) | 206 725 (99%) |
| amoebanet_a_n18_f448 | 331×331×3 | 7168 | 5114 (71%) | 594 543 (82%) |
| nasnet_mobile | 224×224×3 | 1056 | 618 (58%) | 88 492 (82%) |
| inception_v1 | 224×224×3 | 1024 | 922 (90%) | 102 641 (99%) |
| pnasnet_large | 331×331×3 | 4320 | 4050 (93%) | 427 948 (98%) |
| mobilenet_v2_050_224 | 224×224×3 | 1280 | 1178 (92%) | 118 715 (91%) |
| mobilenet_v2_075_224 | 224×224×3 | 1280 | 1181 (92%) | 116 858 (90%) |
| mobilenet_v1_050_224 | 224×224×3 | 512 | 491 (95%) | 50 680 (98%) |
| mobilenet_v2_100_128 | 128×128×3 | 1280 | 988 (77%) | 102 634 (79%) |
| mobilenet_v2_075_96 | 96×96×3 | 1280 | 787 (61%) | 92 169 (71%) |
| mobilenet_v1_025_224 | 224×224×3 | 256 | 249 (97%) | 25 026 (96%) |
| mobilenet_v1_050_128 | 128×128×3 | 512 | 446 (87%) | 46 829 (90%) |
Note: Characteristics of the image-to-feature-vector encoders available from https://tfhub.dev/, i.e. image input resolution and output vector size. The last two columns report how many encoded features were retained (for ms1_only and ms1_and_ms2 inputs, respectively) after removing features that are constant over all samples. Feature retention is reported for grid size 512×512 and is virtually the same for grid size 2048×2048.
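A minimal encoding sketch for one module from the table using the public TensorFlow Hub API. The exact module handle and version, and the scaling of intensities to [0, 1], are assumptions for illustration; as in the paper, the module is used off the shelf without fine-tuning.

```python
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

# resnet_v2_101 feature-vector module (handle/version assumed for illustration).
encoder = hub.KerasLayer(
    "https://tfhub.dev/google/imagenet/resnet_v2_101/feature_vector/4",
    trainable=False)                                # off the shelf, no fine-tuning

def encode(ms_image: np.ndarray, size: int = 224) -> np.ndarray:
    """Resize a rasterized gray-scale MS image (e.g. 512x512, values in [0, 1]),
    triplicate the channel to size x size x 3 and embed it into a feature vector."""
    img = tf.convert_to_tensor(ms_image[..., np.newaxis], dtype=tf.float32)
    img = tf.image.resize(img, (size, size))        # downscale to the module input
    img = tf.image.grayscale_to_rgb(img)            # 1 channel -> 3 channels
    return encoder(img[tf.newaxis, ...]).numpy().squeeze()

# ms1_and_ms2: concatenate the MS1 encoding with all MS2 window encodings.
# ms2_windows is a hypothetical list of rasterized MS2 images.
# features = np.concatenate([encode(ms1_image)] + [encode(w) for w in ms2_windows])
```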
Classification algorithms and hyperparameter values tested during optimization
| Classifier | Parameter | Values |
|---|---|---|
| Logistic regression (LG) | C | 0.1, 1, 10, 100 |
| Support vector machine (SVC) | C | 0.1, 1, 10, 100 |
| | kernel | ’linear’, ’poly’, ’rbf’ |
| Random forest (RF) | n_estimators | 100, 500 |
| Gradient boosted trees (XGBoost) | n_estimators | 100, 500 |
Note: ‘C’ is the inverse of the regularization strength. ‘linear’, ‘poly’ and ‘rbf’ refer to the linear, polynomial and radial basis function kernels, respectively. ‘n_estimators’ is the number of trees in the ensemble. Classifiers are implemented using scikit-learn (Pedregosa et al., 2011), with the exception of XGBoost (Chen and Guestrin, 2016).
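A sketch of the hyperparameter search for one classifier from the table, using scikit-learn's GridSearchCV. The 5-fold cross-validation and AUC scoring shown here are assumptions for illustration, not a statement of the authors' exact protocol; `X_train` and `y_train` are hypothetical names for the encoded feature vectors and phenotype labels.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Grid taken from the SVC rows of the table above.
search = GridSearchCV(
    estimator=SVC(),
    param_grid={"C": [0.1, 1, 10, 100],
                "kernel": ["linear", "poly", "rbf"]},
    scoring="roc_auc",            # AUC is the evaluation metric throughout
    cv=5)                         # fold count assumed for illustration
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```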
Fig. 1. Sample representations in the workflow. Mass spectra of MS1 and MS2 scans, illustrated in the top row as 3D views adapted from Ludwig et al. (2018, CC BY 4.0), constitute the raw data for a single sample. Each individual scan is processed into an image representation by rasterizing along the retention time (rt) and mass-to-charge ratio (m/z) axes. The MS images are resized and the gray-scale channel triplicated to match the input dimensions of the chosen image-to-vector encoder, and are then given as input to different publicly available models pretrained for natural image classification. Each model transforms the raw MS images into feature vectors, i.e. numerical vectors that encode the information contained in the image. The resulting vectors can consist of encodings from MS1 only (ms1_only) or, optionally, the concatenation of both the MS1 and all MS2 encodings (ms1_and_ms2). Using the feature vectors, a classifier is trained on the training set to predict the phenotype, i.e. cancer or normal sample, and evaluated on the test set. We compare performance between multiple combinations of generated image resolution, encoder and classification algorithm
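A minimal sketch of the rasterization step described in the caption: binning one scan's peaks along retention time and m/z into a square gray-scale image. Summing co-binned intensities and max-scaling to [0, 1] are assumptions; the function and variable names are illustrative.

```python
import numpy as np

def rasterize(rt, mz, intensity, grid=512):
    """Bin (rt, m/z, intensity) peaks of one scan type onto a grid x grid
    image, summing intensities that fall into the same pixel."""
    image, _, _ = np.histogram2d(rt, mz, bins=grid, weights=intensity)
    return image / image.max()        # scale to [0, 1] for the image encoders

# The paper compares two grid resolutions:
# ms1_image = rasterize(rt, mz, intensity, grid=512)   # or grid=2048
```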
Summary of classification performances for encoders and gold standards
| Encoder | median AUC (MS1/2) | median AUC (MS1) | mean AUC (MS1/2) | mean AUC (MS1) | σ (MS1/2) | σ (MS1) | Architecture |
|---|---|---|---|---|---|---|---|
| proteins | 0.951 | NaN | 0.947 | NaN | 0.010 | NaN | Proteomics |
| peptides3 | 0.951 | NaN | 0.948 | NaN | 0.012 | NaN | Proteomics |
| peptides4 | 0.947 | NaN | 0.945 | NaN | 0.012 | NaN | Proteomics |
| resnet_v2_101 | 0.849 | 0.759 | 0.837 | 0.764 | 0.036 | 0.029 | ResNet |
| resnet_v2_50 | 0.834 | 0.784 | 0.826 | 0.783 | 0.029 | 0.013 | ResNet |
| resnet_v2_152 | 0.832 | 0.746 | 0.824 | 0.747 | 0.045 | 0.025 | ResNet |
| nasnet_large | 0.827 | 0.749 | 0.816 | 0.757 | 0.029 | 0.025 | NASNet |
| inception_resnet_v2 | 0.820 | 0.737 | 0.817 | 0.740 | 0.043 | 0.025 | Inception, ResNet |
| inception_v3_imagenet | 0.811 | 0.770 | 0.814 | 0.766 | 0.029 | 0.018 | Inception |
| inception_v2 | 0.806 | 0.745 | 0.795 | 0.735 | 0.030 | 0.022 | Inception |
| inception_v3_inaturalist | 0.795 | 0.732 | 0.800 | 0.727 | 0.023 | 0.022 | Inception |
| amoebanet_a_n18_f448 | 0.793 | 0.733 | 0.788 | 0.730 | 0.034 | 0.023 | NASNet |
| nasnet_mobile | 0.792 | 0.714 | 0.792 | 0.710 | 0.028 | 0.015 | NASNet |
| inception_v1 | 0.789 | 0.717 | 0.777 | 0.719 | 0.035 | 0.021 | Inception |
| pnasnet_large | 0.777 | 0.748 | 0.773 | 0.746 | 0.023 | 0.017 | NASNet |
| mobilenet_v2_050_224 | 0.765 | 0.622 | 0.758 | 0.609 | 0.039 | 0.031 | MobileNet |
| mobilenet_v2_075_224 | 0.737 | 0.524 | 0.717 | 0.520 | 0.055 | 0.032 | MobileNet |
| mobilenet_v1_050_224 | 0.704 | 0.582 | 0.686 | 0.583 | 0.084 | 0.044 | MobileNet |
| mobilenet_v2_100_128 | 0.687 | 0.485 | 0.674 | 0.486 | 0.050 | 0.018 | MobileNet |
| mobilenet_v2_075_96 | 0.666 | 0.530 | 0.670 | 0.534 | 0.057 | 0.027 | MobileNet |
| mobilenet_v1_025_224 | 0.656 | 0.636 | 0.643 | 0.637 | 0.043 | 0.040 | MobileNet |
| mobilenet_v1_050_128 | 0.623 | 0.512 | 0.616 | 0.512 | 0.036 | 0.030 | MobileNet |
Note: For each feature encoding module, the median, mean and standard deviation (σ) of the classification AUC over the different classifiers are reported. For each statistic, the column for concatenated MS1 and MS2 image features (ms1_and_ms2, shown as MS1/2) is placed next to the column for MS1 image features only (ms1_only, shown as MS1) for comparison.
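The per-encoder statistics above can be reproduced from the individual run results with a pandas groupby; the DataFrame and column names below are illustrative, not from the paper.

```python
import pandas as pd

# results: one row per (encoder, classifier, input, resolution) run with its AUC.
summary = (results
           .groupby(["encoder", "input"])["auc"]
           .agg(["median", "mean", "std"])     # std corresponds to sigma above
           .unstack("input"))                  # MS1/2 next to MS1 per statistic
print(summary.sort_values(("median", "ms1_and_ms2"), ascending=False))
```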
Fig. 2. Encoding module and choice of classifier drive classification performance. Publicly available modules, trained to classify natural images, were used to encode off-the-shelf feature vectors. Exceptions to this are the gold standard datasets proteins, peptides3 and peptides4, which were obtained using a curated proteomics analysis pipeline. Classification performance, measured by AUC, is reported in order of descending median AUC for different classifiers and two resolutions of MS images (rasterized spectra). Here, we only report results obtained using concatenated feature vectors encoded from MS1 and all MS2 images (ms1_and_ms2). As observed in the figure, the main driver of performance is the encoding of features. Different off-the-shelf features achieve results ranging from 0.623 up to 0.849 median AUC, while gold standard features reach 0.951 median AUC. The variance over results from different classifiers is much larger for off-the-shelf features than for the gold standard features
Fig. 3. Classification performance of ms1_only off-the-shelf features. Depicted is the same plot as in Figure 2, but with ms1_only encodings instead of ms1_and_ms2. The order of encoders is identical; the peptide and protein datasets are omitted, as they cannot be compared to a setting that excludes MS2 information. While the classification performance of ms1_only encodings is generally lower than that of ms1_and_ms2, there is a pronounced drop in performance for mobilenets, with some models performing even worse than random (AUC below 0.5). Different off-the-shelf features achieve results ranging from 0.485 up to 0.784 median AUC
Fig. 4. Correlation of ms1_only and ms1_and_ms2 AUC. The figure shows the Spearman’s rank correlation of the AUC obtained using ms1_only versus ms1_and_ms2. Although the absolute values differ, the ordering of points is highly similar in both dimensions. The overall Spearman’s rank correlation is 0.81. The median values per encoder exhibit a Spearman’s rank correlation of 0.88
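The rank correlation in Figure 4 corresponds to a straightforward scipy computation on paired AUC values; the variable names are illustrative.

```python
from scipy.stats import spearmanr

# Paired AUCs for identical (encoder, classifier, grid) settings.
rho, p = spearmanr(auc_ms1_only, auc_ms1_and_ms2)
print(f"Spearman rho = {rho:.2f}")    # ~0.81 over all runs per Figure 4
```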
Fig. 5. Effects of input features on classification performance. Each encoding module was applied to images resulting from two different grid resolutions (512×512 and 2048×2048) on the spectra, with some initial resizing to fit the module’s specific input dimensions. Also, features from either MS1 only or MS1 and all MS2 (concatenated) were used as input to train four different classifiers. (A) We observe a clear gain in classification performance when including MS2 features and a slight difference for the change in resolution. (B) The distribution of differences in AUC between ms1_only and ms1_and_ms2 shows a significant benefit (mean of 0.085 AUC) for including MS2 features. The distribution shows a smaller peak with larger deltas (slightly violating the normality assumption) that stems solely from results of mobilenets. (C) The distribution of differences in AUC for the two resolutions shows a significant advantage (mean of 0.016 AUC) for the smaller grid size, which is closer to the modules’ typical input size and requires less downsizing in the encoding step
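Since the caption flags a mild violation of the normality assumption for the AUC deltas, the paired comparison could be run both parametrically and rank-based. The captions do not name the exact test used, so both calls below are illustrative, with the same hypothetical paired arrays as above.

```python
from scipy.stats import ttest_rel, wilcoxon

# Paired AUCs over identical (encoder, classifier, grid) settings.
t_stat, p_t = ttest_rel(auc_ms1_and_ms2, auc_ms1_only)   # assumes normal deltas
w_stat, p_w = wilcoxon(auc_ms1_and_ms2, auc_ms1_only)    # rank-based alternative
```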