
Deep multiple instance learning classifies subtissue locations in mass spectrometry images from tissue-level annotations.

Dan Guo¹, Melanie Christine Föll^2,3, Veronika Volkmann^2,3, Kathrin Enderle-Ammour^2,3, Peter Bronsert^2,3,4,5, Oliver Schilling^2,3, Olga Vitek¹.

Abstract

MOTIVATION: Mass spectrometry imaging (MSI) characterizes the molecular composition of tissues at spatial resolution, and has a strong potential for distinguishing tissue types, or disease states. This can be achieved by supervised classification, which takes as input MSI spectra, and assigns class labels to subtissue locations. Unfortunately, developing such classifiers is hindered by the limited availability of training sets with subtissue labels as the ground truth. Subtissue labeling is prohibitively expensive, and only rough annotations of the entire tissues are typically available. Classifiers trained on data with approximate labels have sub-optimal performance.
RESULTS: To alleviate this challenge, we contribute a semi-supervised approach mi-CNN. mi-CNN implements multiple instance learning with a convolutional neural network (CNN). The multiple instance aspect enables weak supervision from tissue-level annotations when classifying subtissue locations. The convolutional architecture of the CNN captures contextual dependencies between the spectral features. Evaluations on simulated and experimental datasets demonstrated that mi-CNN improved the subtissue classification as compared to traditional classifiers. We propose mi-CNN as an important step toward accurate subtissue classification in MSI, enabling rapid distinction between tissue types and disease states.
AVAILABILITY AND IMPLEMENTATION: The data and code are available at https://github.com/Vitek-Lab/mi-CNN_MSI.


Year: 2020 PMID: 32657378 PMCID: PMC7355295 DOI: 10.1093/bioinformatics/btaa436

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Biochemical constitution of tissues varies with tissue types (such as epithelial and connective tissues), or disease states (such as tumor and healthy tissues). Mass spectrometry imaging (MSI) provides an untargeted characterization of the molecular composition of such tissues at spatial resolution, simultaneously quantifying hundreds of analytes without the need for chemical labels or antibodies (Spengler, 2015; Jones ). Therefore, MSI has a strong potential to become a rapid diagnostic technology in the clinic (Kriegsmann ; Vaysse ). Although the name of the technology contains the word ‘image’, the structure of MSI data is very different from other bioimaging technologies (Fig. 1). In MSI, mass spectra are acquired at thousands of different spatial locations in a raster pattern throughout the tissue. MSI techniques fall into two major categories: matrix-assisted laser desorption/ionization (MALDI) MSI (Aichler and Walch, 2015) and desorption electrospray ionization (DESI) MSI (Wu ). With each technique, the mass spectrum obtained at each location is a collection of features, corresponding to the ions of biochemical analytes such as metabolites, lipids, peptides and proteins. The features do not contain direct information regarding the identity of the underlying analyte, except for their ratios of mass over charge m/z. For one tissue location, a typical MSI experiment reports hundreds to thousands of m/z in ascending order. The intensities of the m/z correlate with the abundance of the analyte. A plot of the abundance of one m/z across all locations is referred to as an ion image.

Fig. 1.

MSI data. (a) H&E-stained optical images of a pair of tumor and healthy tissues from the human renal cell carcinoma (RCC) experiment. (b) Mass spectrum from one location in the tumor tissue. The inset zooms into . Two features with m/z shift can correspond to molecular ions and sodium adducts. Two features with m/z shift can correspond to molecular ions and potassium adducts. (c) Ion images of m/z 215 of the tissues in (a)

Reliable diagnostics can be achieved by supervised classification models that take as input the observed mass spectra, and predict labels such as tumor, healthy or tumor subtypes. Beyond tissue-level classification (classifying the entire tissues), subtissue-level classification (classifying the disease status of individual locations within the tissues) is of most interest. Ranking m/z features by their predictive ability is also important. Currently, training subtissue-level classifiers that provide this information requires training sets of tissues with reliable subtissue labels.

Unfortunately, accessing a training set with reliable subtissue labels is challenging in practice. In a typical workflow, pathologists examine the hematoxylin and eosin (H&E)-stained optical images such as in Figure 1a. To obtain subtissue labels, the pathologist must manually examine and annotate the distinct regions of each tissue (Lou ). The cost of this manual work is one of the reasons for the relatively small number of biological replicates in MSI. The procedure is particularly costly for heterogeneous tissues that require labeling of multiple small sub-regions, or for tissues with challenging histology. To be transferable to MSI, the subtissue labeling must use specialized software that takes time to learn. As a result, pathologists often avoid labeling the individual locations, and only roughly annotate the entire tissues.
Figure 1c shows that, although the tissue on the left is annotated as tumor, the ion image indicates tissue heterogeneity, and the tissue likely contains both cancerous tissue and healthy kidney parenchyma. Such imprecise labeling of tissue locations compromises the accuracy of the resulting classifiers.

In addition to the labeling, high correlations between many m/z limit our ability to train accurate classifiers on MSI data. For example, in peptide MSI, proteins are digested into multiple peptides, so several peptide ions originate from the same protein and have similar spatial distributions of abundance. An analyte can also produce multiple m/z ions for other reasons, which include sodium adducts, neutral loss ions, fragment ions or multiply charged ions. For example, Figure 1b illustrates the potential sodium and potassium adducts that give rise to correlated features. The high correlation in the high-dimensional vector of m/z features undermines the stability of the classifiers, and leads to overfitting (Kriegsmann ; Vaysse ).

To improve our ability to accurately classify subtissue locations in MSI from approximate tissue-level annotations, we propose a semi-supervised approach, mi-CNN. mi-CNN implements multiple instance learning (MIL) with a convolutional neural network (CNN). The multiple instance aspect of the approach enables weak supervision from tissue-level annotations when classifying subtissue locations. The convolutional architecture of the CNN captures potential contextual dependencies between m/z, such as sodium adducts and dehydrated ions. Evaluations on simulated and experimental datasets demonstrate that mi-CNN improved the subtissue classification as compared with traditional classifiers such as support vector machine (SVM) and CNN, and successfully reflected the truly predictive spectral features. We propose mi-CNN as an important step toward accurate subtissue classification in basic biology and clinical applications of MSI.

2 Background

2.1 Subtissue-level classification in MSI

Classifying tissue locations using MSI spectra has already received a lot of attention (Kriegsmann ; Vaysse ). Various classifiers have been proposed for this task, including linear discriminant analysis (Dill , 2011), regularized logistic regression (Eberlin ; Sans ), SVM (Calligaris ), and many others. Variations of these approaches such as nearest shrunken centroids (Bemis ) incorporate spatial smoothing to enhance the spatial stability of the results. The classifiers take as input m/z features at each location, classify the label of each location, and classify each tissue according to the majority of its location labels.

Recently, neural networks have attracted great interest for MSI. Rauser used fully connected neural networks for tumor classification, and Inglese used unsupervised neural networks to cluster tumor tissues. CNNs, a class of deep neural networks originally designed for image classification, were also introduced. A CNN convolves the image with a small kernel to capture the local connectivity within an image (Rawat and Wang, 2017). A novel application of CNNs to MSI proposed to view mass spectra as 1D images. Behrmann used a modified Residual Net with 13 935 parameters and a kernel size of 3 to capture isotopic patterns in mass spectra. van Kersbergen replaced the convolutional layers in Behrmann’s network with dilated convolutional layers to increase the receptive field, and capture globally distributed patterns in the spectra.

Although the approaches above are quite diverse, they all rely on quality subtissue labels for training. As a result, they are undermined by training sets with approximate annotations, such as in Figure 1.

2.2 Multiple instance learning (MIL)

Multiple instance learning is a semi-supervised framework commonly used in a variety of applications such as image and video analysis (Cheplygina ) and computer-aided diagnosis (Fu ; Kandemir and Hamprecht, 2015), but so far not utilized for MSI. In contrast to the classifiers above, MIL allows weak supervision of the training data. The approach considers groups of observations, called bags, where ground-truth labels are only available at the bag level. The labels of the observations in a bag, called instances, are unknown. In a binary classification problem, MIL assumes that a positive bag contains at least one positive instance, whereas a negative bag contains only negative instances. The homogeneity of the data in the negative bags is the key feature of the approach that enables efficient learning.

Existing MIL algorithms can be classified into two groups: bag space algorithms and instance space algorithms. Bag space algorithms, such as mi-Graph (Zhou ) and MIL with instance (Fu ), do not predict labels of individual instances. They classify the bags directly by considering similarities of input features between the bags. Instance space algorithms, such as mi-SVM (Andrews ) and MILboost (Zhang ), take features of the instances as input and predict labels of both instances and bags. For instance-level prediction, mi-SVM is one of the most accurate methods (Kandemir and Hamprecht, 2015). The method treats labels of instances in positive bags as latent variables, and estimates them from the data. The parameters of the SVM are optimized by iteratively training the SVM on the current instance-level labels, and updating the instance-level labels from the predictions of the current SVM.
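The standard MIL assumption described above can be written in one line. The following is a minimal illustrative sketch, not code from the article; the function name `bag_label` and the toy arrays are hypothetical.

```python
import numpy as np

def bag_label(instance_labels):
    """Standard MIL assumption: a bag is positive if and only if
    at least one of its instances is positive."""
    return int(np.max(instance_labels))

# Toy bags (hypothetical): one positive instance makes the whole bag positive.
tumor_tissue = np.array([0, 0, 1, 0])
healthy_tissue = np.array([0, 0, 0, 0])
```

During training only the bag labels are observed, so an instance space algorithm has to work backwards from them to the hidden instance labels.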

2.3 Interpretation of black-box machine learning models

Many of the classification approaches above function like a ‘black box’ and lack interpretability. Post-processing of these models (Molnar, 2019) helps characterize the relative importance of each predictive feature after the model is fit. One such approach is Local Interpretable Model-agnostic Explanation (LIME; Ribeiro ), which ranks features by their importance in predicting the label of a particular observation of interest. LIME generates new observations by permuting the values of the predictive features in the dataset, and obtains the black box predictions for these new observations. Next, LIME weights the new observations by their proximity to the observation of interest, and trains a weighted interpretable model (such as linear regression with subset selection or regularization) on the new observations and their predictions. Finally, LIME repeats this procedure multiple times, and ranks the features by their frequency of being selected as predictive.
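The idea of a locally weighted interpretable surrogate can be sketched with NumPy alone. This is a deliberately simplified illustration and not the `lime` package: it perturbs with Gaussian noise instead of permuting binned feature values, performs a single weighted least-squares fit instead of repeated feature selection, and ranks by absolute coefficient. The function name, the noise scale and the kernel width are all assumptions.

```python
import numpy as np

def local_surrogate_ranking(predict, x, n_samples=500, scale=0.5,
                            kernel_width=1.0, seed=0):
    """Rank features by the magnitude of their coefficient in a
    proximity-weighted linear fit around the observation x."""
    rng = np.random.default_rng(seed)
    # New observations: Gaussian perturbations of x (assumption).
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    y = predict(Z)                                # black-box predictions
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)  # proximity weights
    # Weighted least squares with an intercept column.
    A = np.hstack([np.ones((n_samples, 1)), Z])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    importance = np.abs(coef[1:])
    return np.argsort(importance)[::-1]           # most important first

# Hypothetical black box that depends only on feature 2.
black_box = lambda Z: 1.0 / (1.0 + np.exp(-3.0 * Z[:, 2]))
ranking = local_surrogate_ranking(black_box, np.zeros(5))
```

In this toy setting the surrogate should place feature 2 first, because it is the only feature the black box uses.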

3 Multiple instance learning with convolutional neural network (mi-CNN)

3.1 Overview

For the purposes of subtissue classification in MSI, we propose to view a tissue as a bag, and a tissue location as an instance. We assume that tissues annotated as non-tumor do not have tumor locations, but tissues annotated as tumor can have both tumor and non-tumor locations. MIL allows us to train classifiers of subtissue locations on training sets with such rough tissue-level annotations. Instance space algorithms are of particular interest for this task. Our proposed approach takes as the baseline mi-SVM, which reported high classification accuracy on similar tasks in the past, but substitutes the SVM classifier with a CNN (Fig. 2).

Although CNNs are frequently used for image analysis in computer vision, the proposed approach uses the CNN in a different way. We do not apply spatial convolution on a tissue, as we expect high heterogeneity of the microenvironment within a tumor, and an insufficient spatial smoothness of the location labels. Instead, the CNN applies convolutional filters to the m/z features of individual locations to capture potential correlations between m/z of the same location. The CNN has a lightweight structure to avoid overfitting. Finally, post-processing with LIME identifies highly predictive m/z for downstream biological and clinical interpretation.

Fig. 2.

Architecture of mi-CNN. $\pi_j$ is the probability that tissue $j$ belongs to Class 1, and $\pi_{ij}$ is the corresponding probability for location $i$ in that tissue. (a) Training set and (b) validation set

3.2 Notation

Consider tissue $j$ and its locations $i = 1, \ldots, I_j$. The tissue is characterized by a collection of mass spectra $X_j = \{x_{ij}\}_{i=1}^{I_j}$, and each mass spectrum is a vector of $M$ intensities of m/z features, $x_{ij} = (x_{ij1}, \ldots, x_{ijM})$. Let $Y_j \in \{0, 1\}$ denote the annotation of tissue $j$, and $y_{ij} \in \{0, 1\}$ the subtissue label at the $i$th location. Note that $Y_j$ is known, and $y_{ij}$ is unknown. Denote by $\pi_j$ the probability that tissue $j$ belongs to Class 1, and by $\pi_{ij}$ the corresponding probability for location $i$ in that tissue. Given a mass spectrum $x_{ij}$, our goal is to predict the label $y_{ij}$ of this location, and the label $Y_j$ of the entire tissue.

3.3 Subtissue-level classification

Using cross-entropy as the loss function, the objective of MIL is defined as

$$\min_{\{y_{ij}\}} \min_{\Theta} \; -\sum_{j}\sum_{i=1}^{I_j}\left[y_{ij}\log \pi_{ij} + (1-y_{ij})\log(1-\pi_{ij})\right] \quad \text{s.t.} \quad \sum_{i=1}^{I_j} y_{ij} \ge 1 \ \text{if } Y_j = 1, \quad y_{ij} = 0 \ \text{if } Y_j = 0, \qquad (1)$$

where $\pi_{ij} = f(x_{ij}; \Theta)$ is the prediction of a classifier (a CNN) with parameters Θ for the mass spectrum $x_{ij}$ at location $i$ of tissue $j$. Since the subtissue labels $y_{ij}$ are not observed, they are estimated by an expectation–maximization-like algorithm (Algorithm 1, similar to mi-SVM in Andrews et al., 2003) minimizing the cross-entropy loss [Equation (1)]. First, the labels of all subtissue locations are initialized with the annotations of the corresponding tissues. Next, the algorithm iterates between training the CNN on the current location labels, estimating the probability $\pi_{ij}$ that location $i$ in tissue $j$ belongs to Class 1, and imputing the location labels $y_{ij}$ from these probabilities until convergence. The constraint in Equation (1) ensures that locations in non-tumor tissues are always labeled as non-tumor. On the other hand, if no locations on a tumor tissue are classified as tumor, the location with the highest $\pi_{ij}$ in this tissue will be labeled as tumor (Lines 7–10, Algorithm 1). The algorithm stops when the number of updated labels is below a threshold, or when the maximum number of iterations is reached.

The architecture of the CNN must be adapted to the specifics of MSI. In these experiments the number of m/z features can be very large (up to one hundred thousand), while the number of biological replicates is relatively small (typically < 50). Therefore, the CNN should be relatively lightweight, and minimize the number of parameters to avoid overfitting. The convolution filter should be large enough to incorporate neighboring m/z, but small enough to benefit from weight sharing and computation reduction.
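The label-imputation step of the iteration described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `update_labels`, the array layout, and the 0.5 probability cutoff are hypothetical choices, and the CNN training step that would produce the probabilities is omitted.

```python
import numpy as np

def update_labels(pi, tissue_of, tissue_label, cutoff=0.5):
    """Impute location labels from predicted probabilities.

    pi           : (n,) predicted probability of Class 1 per location
    tissue_of    : (n,) index of the tissue each location belongs to
    tissue_label : (J,) tissue-level annotation (1 = tumor, 0 = non-tumor)
    cutoff       : probability cutoff (assumption: 0.5)
    """
    tissue_label = np.asarray(tissue_label)
    y = (pi > cutoff).astype(int)
    # Constraint: locations in non-tumor tissues are always non-tumor.
    y[tissue_label[tissue_of] == 0] = 0
    # Constraint: every tumor tissue keeps at least one tumor location.
    for j in np.flatnonzero(tissue_label == 1):
        idx = np.flatnonzero(tissue_of == j)
        if idx.size and y[idx].sum() == 0:
            y[idx[np.argmax(pi[idx])]] = 1
    return y

# Toy example: tissues 0 and 2 are annotated tumor, tissue 1 non-tumor.
pi = np.array([0.9, 0.2, 0.7, 0.6, 0.1, 0.3])
tissue_of = np.array([0, 0, 1, 1, 2, 2])
labels = update_labels(pi, tissue_of, [1, 0, 1])
```

In a full loop this function would alternate with retraining the classifier on `labels`, stopping once the number of changed labels falls below the threshold.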

Algorithm 1 mi-CNN

1: procedure mi-CNN($\{X_j, Y_j\}$, threshold)
2:   Initialize: $y_{ij} = Y_j$ for all $i, j$
3:   while the number of updated labels ≥ threshold do
4:     Compute CNN parameters Θ for current labels $y_{ij}$
5:     Compute $\pi_{ij} = f(x_{ij}; \Theta)$
6:     For each $j$ where $Y_j = 1$, set $y_{ij} = 1$ if $\pi_{ij} > 0.5$ and $y_{ij} = 0$ otherwise
7:     for each $j$ where $Y_j = 1$ do
8:       if $\sum_i y_{ij} = 0$ then
9:         Compute $i^* = \arg\max_i \pi_{ij}$
10:        Set $y_{i^*j} = 1$
11:      end if
12:    end for
13:  end while
14:  OUTPUT (Θ, $\{y_{ij}\}$)
15: end procedure

We propose a 1D CNN, consisting of three basic components, namely convolutional layers, pooling layers and fully connected layers. Three convolutional layers hierarchically learn the potential patterns in a mass spectrum. For each layer, the filter size is set according to the contextual dependencies between m/z of interest, such as mass shifts corresponding to sodium adducts and molecular ions. After each convolutional layer, max pooling reduces the resolution of the previous layer by focusing on large intensities of m/z features and reducing the impact of spectral noise. The CNN includes only one fully connected layer that captures globally distributed patterns (Fig. 2). A softmax activation function is used in the output layer to generate the probability of each class. The CNN is trained using stochastic gradient descent. It calculates the partial derivative of the loss function in Equation (1) with respect to the learnable parameters in Θ by backpropagation, and iteratively updates Θ and the values in each layer until convergence.

Tissue-level classification. The proposed tissue-level classification does not count the proportion of predicted location labels in a tissue. Instead, it treats each tissue as one observation, and uses the collection of mass spectra from all the locations in the tissue as its predictive features. The CNN architecture for this task is the same as the architecture for subtissue-level classification, with the exception of combining the probabilities of the individual locations into a pooling layer that estimates the probability of a tissue-level label.
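The pooling can be a simple max or mean pooling, or a generalized mean pooling

$$\pi_j = \left(\frac{1}{I_j}\sum_{i=1}^{I_j}\pi_{ij}^{\,r}\right)^{1/r}, \qquad (2)$$

where $I_j$ is the number of locations on tissue $j$, and $r$ is an integer tuning parameter. The loss function is the cross-entropy of tissue-level predictions and tissue-level labels

$$L(\Theta) = -\sum_{j}\left[Y_j \log \pi_j + (1 - Y_j)\log(1 - \pi_j)\right], \qquad (3)$$

where $\pi_j$ is the pooled probability of the $\pi_{ij}$, and the $\pi_{ij}$ are the probabilities predicted by the CNN.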
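A short sketch of generalized mean pooling and the tissue-level cross-entropy may help make the role of the tuning parameter r concrete. It assumes the standard definition of the generalized (power) mean; the function names and the clipping constant are hypothetical.

```python
import numpy as np

def generalized_mean_pool(pi_locations, r=1):
    """Pool location-level probabilities into one tissue-level probability.
    r = 1 gives mean pooling; large r approaches max pooling."""
    pi = np.asarray(pi_locations, dtype=float)
    return float(np.mean(pi ** r) ** (1.0 / r))

def tissue_cross_entropy(Y, pi_tissue, eps=1e-12):
    """Cross-entropy between tissue annotations Y and pooled probabilities."""
    Y = np.asarray(Y, dtype=float)
    pi = np.clip(np.asarray(pi_tissue, dtype=float), eps, 1.0 - eps)
    return float(-np.sum(Y * np.log(pi) + (1.0 - Y) * np.log(1.0 - pi)))

pi_locations = [0.1, 0.2, 0.9]   # toy probabilities for one tissue
mean_like = generalized_mean_pool(pi_locations, r=1)
max_like = generalized_mean_pool(pi_locations, r=50)
```

A larger r lets a few confident tumor locations dominate the tissue-level probability, which matches the assumption that a tumor tissue may contain only a small tumor region.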

3.4 Evaluation and interpretation

We evaluate the accuracy of subtissue classification by calculating the accuracy and the balanced accuracy of label predictions at individual locations. We evaluate the accuracy of tissue-level classification by calculating the accuracy and the balanced accuracy of label predictions for the entire tissues. The metrics are defined as

$$\text{Accuracy} = \frac{TP + TN}{P + N}, \qquad (4)$$

$$\text{Balanced accuracy} = \frac{1}{2}\left(\frac{TP}{P} + \frac{TN}{N}\right), \qquad (5)$$

where for subtissue-level classification, TP is the number of correctly classified positive (i.e. tumor) locations across all the tissues, and TN is the number of correctly classified negative (i.e. non-tumor) locations across all the tissues. P and N are the total numbers of locations across the tissues whose true label is tumor or non-tumor, respectively. For tissue-level classification, TP, TN, P and N have the same interpretation, but for the entire tissues.

Accuracy quantifies the overall proportion of correct predictions by the model. When the number of observations in each class is not balanced, and the minority class is under-represented, overall accuracy may inaccurately characterize the performance. In this case, balanced accuracy, which averages the proportions of correct predictions within each class, may provide more insight.

Even when we can report the accuracy of classification, the classifier remains a black box. Therefore, we use LIME to assist with the interpretation, and identify m/z features that play a particularly important role in classifying the labels of individual locations. We randomly select a subset of locations in the validation sets in our experiments, use LIME to select the top five influential features for each location, and rank the selected features by their frequency of being selected across locations.
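The difference between the two metrics is easiest to see on an imbalanced toy example. The sketch below is illustrative only; the function name and the toy label vectors are hypothetical.

```python
import numpy as np

def accuracy_metrics(y_true, y_pred):
    """Return (accuracy, balanced accuracy) for binary labels in {0, 1}."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    p = np.sum(y_true == 1)                      # true positives available
    n = np.sum(y_true == 0)                      # true negatives available
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    return (tp + tn) / (p + n), 0.5 * (tp / p + tn / n)

# 10 tumor and 90 non-tumor locations; only 2 tumor locations are found.
y_true = np.array([1] * 10 + [0] * 90)
y_pred = np.array([1] * 2 + [0] * 8 + [0] * 88 + [1] * 2)
acc, bal_acc = accuracy_metrics(y_true, y_pred)
```

Here the overall accuracy is 0.9 although only 20% of the tumor locations are detected, while the balanced accuracy of about 0.59 exposes the poor performance on the minority class.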

3.5 Implementation

We implemented mi-CNN using TensorFlow (Allaire and Tang, 2019) in the RStudio environment. We constructed a CNN architecture of three convolutional layers with Rectified Linear Unit (ReLU) activation, and a fully connected layer. The filter sizes of the three convolutional layers were set to 38, 18 and 16, respectively. The network had 1774 trainable parameters in total for an input length of 850. The CNNs were trained using batch stochastic gradient descent optimization. Training one epoch on the renal cell carcinoma (RCC) dataset with 5350 spectra took ∼10 s, and training the entire model took ∼1.5 h on a computer with 64 GB RAM and a 3.6 GHz CPU.

The baseline model mi-SVM was implemented in R following Andrews et al. (2003). The maximum number of iterations of mi-SVM was set to 200. The kernel function was a radial basis function with gamma of 0.0012 for the simulated datasets and the human RCC data, and a sigmoid function with gamma of 0.00125 for the human bladder cancer data. LIME was implemented using the R package lime (Pedersen and Benesty, 2019). The number of bins for continuous variables was set to 4 and the kernel width was set to 0.1 in lime.
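To illustrate how convolution along the m/z axis with the reported filter sizes shrinks an 850-feature spectrum, here is a NumPy-only forward pass. It is a sketch under several assumptions that the text does not specify: a single filter per layer, no padding ("valid" convolution), stride 1, and max pooling of size 2. The weights are random and nothing is trained.

```python
import numpy as np

def conv1d_relu(x, kernel, bias=0.0):
    """Valid 1D cross-correlation along the m/z axis followed by ReLU."""
    return np.maximum(np.correlate(x, kernel, mode="valid") + bias, 0.0)

def max_pool1d(x, size=2):
    """Non-overlapping max pooling; a trailing remainder is dropped."""
    n = (x.size // size) * size
    return x[:n].reshape(-1, size).max(axis=1)

rng = np.random.default_rng(0)
spectrum = rng.random(850)            # stand-in for one TIC-normalized spectrum

h = spectrum
lengths = []
for width in (38, 18, 16):            # filter sizes reported in the text
    h = max_pool1d(conv1d_relu(h, rng.normal(size=width)))
    lengths.append(h.size)
```

Under these assumptions the feature map shrinks from 850 to 406, 194 and finally 89 values before the fully connected layer. The actual number of filters and the pooling size determine the reported count of 1774 parameters and are not reproduced here.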

4 Data

We evaluated the performance of mi-CNN on five datasets. Two experimental datasets represent two human cancers, and two different MSI acquisition strategies (DESI, characterizing metabolites and lipids, and MALDI, characterizing peptides). We further simulated three datasets with known ground truth, inspired by one of the experimental datasets.

4.1 Human RCC experiment

The experiment aimed to classify locations in human renal tissues as tumor versus healthy. Pairs of tumor and healthy tissue sections were collected from eight human donors with RCC. The tissues were subjected to serial H&E staining. Pathology examination of the H&E-stained tissues was unable to classify the tissues at the sub-tissue resolution, and only annotated each entire tissue section as tumor or healthy (Fig. 3).

Fig. 3.

Human RCC experiment. Pairs of tumor and healthy tissues from eight donors were H&E stained, and examined by a pathologist. For each pair, the tissue on the left has the pathology annotation of tumor, and the tissue on the right has the pathology annotation of healthy. The subtissue-level annotations were not available for this experiment. (a) Training set and (b) Validation set

Data from the tissues were acquired using a DESI ionization source on a Thermo Finnigan LTQ ion trap mass spectrometer in negative mode. The mass range covered 150–1000 Da. In total, 7567 mass spectra were collected, with an average of 472 locations per tissue. Prior to classification, the spectra were normalized by total ion current (TIC) and resampled to unit mass resolution, which produced 850 m/z features per mass spectrum. The data are available in the R package CardinalWorkflow (Bemis, 2019). The pairs of tissues were randomly split into a training set (six pairs) and a validation set (two pairs).
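The two pre-processing steps mentioned above can be illustrated with a short sketch. It is not the Cardinal implementation: summing intensities within unit-width bins is only one possible way of resampling to unit mass resolution, and the function names and the toy peaks are hypothetical.

```python
import numpy as np

def bin_to_unit_mass(mz, intensity, lo=150, hi=1000):
    """Sum intensities within unit-width m/z bins [lo, lo+1), ..., [hi-1, hi]."""
    edges = np.arange(lo, hi + 1)                # 851 edges give 850 bins
    binned, _ = np.histogram(mz, bins=edges, weights=intensity)
    return binned

def tic_normalize(spectra):
    """Divide each spectrum (row) by its total ion current."""
    spectra = np.asarray(spectra, dtype=float)
    tic = spectra.sum(axis=1, keepdims=True)
    return spectra / np.where(tic > 0, tic, 1.0)  # leave empty spectra as zeros

# Toy spectrum with three peaks (hypothetical values).
mz = np.array([150.2, 150.7, 999.9])
intensity = np.array([1.0, 2.0, 3.0])
binned = bin_to_unit_mass(mz, intensity)
normalized = tic_normalize(binned[None, :])
```

With a 150–1000 Da range, unit-width bins give 850 features per spectrum, which agrees with the feature count reported for the RCC dataset.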

4.2 Human bladder cancer experiment

The experiment aimed to classify human bladder cancer tissues as tumor versus stroma. Two tissue microarrays (TMAs) containing core needle biopsies from resected formalin-fixed and paraffin-embedded bladder tissues of 49 patients were built, and each TMA was mounted onto a separate glass slide (Fig. 4). A pathologist annotated 42 tissue cores by carefully examining sub-areas of each tissue and color-coded subregions presenting tumor and subregions presenting stroma (Fig. 4). The annotations are viewed as ground truth in this article. The label tumor was assigned to tissue cores containing tumor subregions, and the label stroma to cores containing only stroma.

Fig. 4.

Human bladder cancer experiment. H&E-stained optical images of human bladder cancer tissues after data acquisition. Letters above each tissue are tissue-level annotations (T, tumor; S, stroma). The colors inside each core indicate subtissue-level pathology (red, tumor; blue, stroma), viewed as the ground truth. (a) Training set: 3 purely stroma tissues and 18 tissues with both tumor and stroma locations. (b) Validation set: 7 purely stroma tissues and 14 tissues with both tumor and stroma

The proteins in the tissues were digested with trypsin, and the peptides were covered with alpha-cyano-4-hydroxycinnamic acid matrix and analyzed with an AB SCIEX 4800 MALDI Time-of-Flight (TOF)/TOF mass spectrometer in positive mode. The mass range was 800–2300 Da. Subregion annotations containing 3152 mass spectra in total and 77 spectra per tissue were extracted via an affine transformation strategy (Föll ). The two datasets were resampled, combined and pre-processed using Cardinal and MALDIquant algorithms on https://usegalaxy.eu (Bemis ; Föll ; Gibb and Strimmer, 2012). The major pre-processing steps comprised peak picking, re-calibration, removal of contaminants and TIC normalization. The pre-processed file contained 593 m/z features. Annotated tissues from one slide were used as the training set (21 tissues), and those from the second slide as the validation set (21 tissues). The split aims to test the robustness of the classifier to experimental batch effects.

Simulated Dataset 1: one differentially abundant analyte with four features, and a complex background. The simulation is based on the mass spectra from eight healthy tissues in the RCC dataset. It mimicked real-life variation in feature intensities, while providing the ground truth regarding both the labels of the tissue locations and the predictive features. First, the eight healthy tissues in the RCC dataset were split into two halves, as shown in Figure 5.
Since the mass spectra from these tissues have real-life biological and technological variation, but no systematic variation between the tissue types, they are viewed as a complex background.

Fig. 5.

Simulated Dataset 1. Healthy tissues from the RCC experiment were split into halves. Locations on the left half of the upper newly created tissues were labeled as tumor, and the remaining locations as healthy. The labels were viewed as the ground truth. To mimic pathology annotations, the entire upper tissues were annotated as tumor, and the lower tissues as healthy. A synthetic analyte with four features, differentially abundant between tumor and healthy, was added to the experimental spectra. Its intensity was confounded by a morphology structure spanning both tissue types

Second, the newly created tissues were assigned tissue- and subtissue-level labels. The left half of the upper newly created tissues was labeled as tumor, and the remaining locations as healthy. These labels were viewed as the ground truth. To mimic pathology annotations at the tissue level, the entire upper tissues were annotated as tumor, and the lower tissues as healthy.

Next, one synthetic analyte, differentially abundant between the tumor and the healthy locations, was added to the experimental spectra. The simulation incorporated a morphology (grey area in Fig. 5) that confounded the intensity of the differentially abundant analyte and spanned both tumor and healthy tissue locations. The intensity of this analyte at location i in tissue j was simulated as follows where μ is the mean intensity of the analyte for tumor or healthy, S is the biological between-tissue variation, δ is the variation between the morphological region and background, and is the biological and technological variation between locations of the same tissue. All the random variables are independent. Iin and Iout are indicators of whether a tissue location is inside or outside a morphology region, and is the mean intensity shift of locations inside or outside the morphological region. Here μ = 50 for tumor and for healthy, and . Finally, we simulated four individual m/z features generated by this analyte.
The features correspond to dehydrated ions (m/z 407), molecular ions (m/z 425), sodium adducts (m/z 447) and potassium adducts (m/z 463). Each feature was simulated as . Similarly to the RCC dataset, the tissues were split into a training set of six tissue pairs, and a validation set of two tissue pairs.

Simulated Dataset 2: one analyte with differential relative intensity of two of the four features, and a complex background. We mimicked a situation where tumor locations affect the relative intensities of features of the same analyte. We assumed that the synthetic analyte produced more potassium adducts in tumor locations, but more sodium adducts in healthy tissues. The simulation repeated the procedure above, while setting the mean intensity of the analyte to 50 for both tumor and healthy locations, and setting the total intensity of molecular ions and dehydrated ions to . The total intensity of adducts was simulated from . Next, in the tumor locations we set the intensity of sodium adducts , the intensity of potassium adducts . In the healthy locations we set the intensity of sodium adducts , and the intensity of potassium adducts in healthy.

Simulated Dataset 3: impact of biological variation, technological variation and sample size. The simulation evaluated the effect of biological and technological variation, and of the number of tissues in the training set, on the performance of mi-CNN. We simulated training sets with between 13 and 130 tissues, half of which were annotated at the tissue level as tumor, and the other half as healthy. Each simulated tissue was characterized by 25 locations, with spectra randomly selected from the healthy tissues in the RCC experiment to represent the complex background. As in Datasets 1 and 2, only half of the locations in the tumor-annotated tissues had tumor as the ground truth. The synthetic analyte was simulated as in Equation (6), with μ = 50 for the tumor locations and μ = 150 for the healthy locations. σ varied from to varied between and .

5 Results

5.1 Results for the simulated datasets

Taking as input tissue-level annotations, mi-CNN accurately classified subtissue labels. We compared the ability of mi-CNN and mi-SVM, and that of the classical CNN and SVM, to classify subtissue labels on Simulated Dataset 1. Table 1 shows that SVM and CNN had high accuracy when comparing the classified locations to tissue-level annotations in the training set. This is expected, as the methods were trained to minimize the classification loss with respect to tissue-level annotations. However, these predictive patterns were undermined by the mislabeled healthy locations in the tumor-annotated tissues of the training set. When comparing the classifications to the ground truth at the location level, the methods had worse accuracy (and worse balanced accuracy, that accounts for differences in the number of tumor and healthy locations) in both the training and the validation dataset. Figure 6 details the predictions on the validation set. It illustrates that SVM had poor predictions for both tumor and healthy locations, while CNN had poor predictions for healthy locations.

Table 1.

Simulated Dataset 1: classification accuracy

	Compare with	SVM	CNN	mi-SVM	mi-CNN
Training	Tissue annotations	0.895 (0.895)	0.948 (0.948)	0.885 (0.882)	0.751 (0.742)
	Subtissue labels	0.747 (0.833)	0.747 (0.833)	0.809 (0.862)	0.979 (0.981)
Validation	Subtissue labels	0.778 (0.671)	0.752 (0.831)	0.833 (0.693)	0.975 (0.964)

Note: Accuracy [Equation (4)] and balanced accuracy [in parentheses, Equation (5)]. The first two rows evaluate the accuracy with respect to tissue-level annotations. The last four rows evaluate the accuracy with respect to labels of within-tissue locations.

Fig. 6.

Simulation Dataset 1: subtissue-level classification on the validation set

Although the accuracy of mi-SVM and mi-CNN classification compared with tissue-level annotations was lower than that of SVM and CNN (Table 1, Rows 1 and 2), their results were closer to the ground truth location labels, both on the training (Table 1, Rows 3 and 4) and the validation sets (Table 1, Rows 5 and 6). Figure 6 illustrates that mi-SVM, and in particular mi-CNN, classified the labels of the individual locations more accurately. Table 2 shows that the results are not limited to situations when the predictive analyte is differentially abundant. Qualitatively similar results are obtained with the predictive pattern in Simulated Dataset 2.

Table 2.

Simulated Dataset 2: classification accuracy

	Compare with	SVM	CNN	mi-SVM	mi-CNN
Training	Tissue annotations	0.532 (0.530)	0.778 (0.777)	0.860 (0.856)	0.700 (0.690)
	Subtissue labels	0.565 (0.538)	0.734 (0.810)	0.758 (0.776)	0.877 (0.800)
Validation	Subtissue labels	0.530 (0.449)	0.869 (0.896)	0.771 (0.500)	0.912 (0.701)

Note: As Table 1, for Simulated Dataset 2.

mi-CNN improved subtissue classification by leveraging changes in relative abundances of features from the same analyte. Tables 1 and 2 show that mi-CNN and CNN had higher classification accuracy with respect to the location labels as compared to mi-SVM and SVM. To evaluate whether the improved accuracy was due to the CNN's ability to capture the contextual relationships between related m/z features, we ranked the predictive features by their importance in these methods using LIME. Figure 7 compares the relative importance of the top five features when classifying a tumor location in one tissue with mi-SVM and mi-CNN. Both methods classified this location correctly. However, while in mi-SVM the most predictive feature is part of the background, mi-CNN ranked the m/z features (407, 425, 447 and 463) of the synthetic differentially abundant analyte among the top five most predictive.
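The comparison of top-ranked features against the known analyte features can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that a per-location importance matrix has already been obtained from an explainer such as LIME, and the function name `top_k_overlap`, the m/z grid, and the importance values are hypothetical.

```python
import numpy as np

def top_k_overlap(importance, mz, target_mz, k=5):
    """Return (fraction of locations with ALL target m/z in the top k,
               fraction of locations with AT LEAST ONE target m/z in the top k).

    importance: (n_locations, n_features) explainer weights (assumed given)
    mz:         (n_features,) m/z value of each feature
    target_mz:  m/z values of the analyte of interest
    """
    importance = np.abs(np.asarray(importance, dtype=float))
    mz = np.asarray(mz)
    # Column indices of the k largest absolute importances for each location
    top_idx = np.argsort(-importance, axis=1)[:, :k]
    # Boolean matrix: is each top-k feature one of the target m/z values?
    hits = np.isin(mz[top_idx], target_mz)
    n_hits = hits.sum(axis=1)
    n_target = len(set(target_mz))
    return float(np.mean(n_hits == n_target)), float(np.mean(n_hits >= 1))

# Toy m/z grid containing the four analyte features named in the text
mz = np.array([300, 350, 400, 407, 425, 447, 463, 500, 510])
target = [407, 425, 447, 463]

# Hypothetical importances for three locations
importance = np.array([
    [0.01, 0.02, 0.03, 0.9, 0.8, 0.7, 0.6, 0.5, 0.04],    # all four targets in top 5
    [0.9, 0.8, 0.7, 0.6, 0.01, 0.02, 0.03, 0.5, 0.04],    # only m/z 407 in top 5
    [0.9, 0.8, 0.7, 0.01, 0.02, 0.03, 0.04, 0.6, 0.5],    # no target in top 5
])

frac_all, frac_any = top_k_overlap(importance, mz, target, k=5)
print(frac_all, frac_any)
```

Summaries of this kind, computed over many locations, give one number for "all analyte features recovered" and one for "at least one recovered", which is how the two methods can be compared beyond a single example location.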

Fig. 7.

Simulated Dataset 1: LIME-based importance of m/z features when classifying a tumor location in the validation set. A location in the simulated tissue UH9912_01 was classified correctly by both mi-SVM and mi-CNN. However, only mi-CNN captured the four m/z features (407, 425, 447 and 463) from the synthetic differentially abundant analyte. (a) mi-SVM and (b) mi-CNN

Out of 200 randomly selected locations, mi-CNN ranked all four of these m/z features among the top five most predictive in 32.3% of the locations, and at least one of them among the top five most predictive in 99.3% of the locations. The respective numbers for mi-SVM were very low, 0% and 6%. This illustrates the utility of incorporating the domain-specific information in the size of the convolution filter in the neural network.

In the presence of larger variation, accurate subtissue-level classification with mi-CNN required a larger sample size. We evaluated the accuracy of mi-CNN with respect to subtissue labels on Simulated Dataset 3. Figure 8a shows that, in situations where both between-tissue and within-tissue variation are relatively small, mi-CNN can have a high classification accuracy on the validation set, even when trained on a relatively small number of 12 biological replicates. Figure 8c and d illustrate that the between-tissue variation dominates the classification accuracy, and the within-tissue variation has a relatively small impact. Including more biological replicates is beneficial when variation is large.

Fig. 8.

Simulated Dataset 3: impact of biological (a,b) and technological variation (c,d) of the synthetic analyte, and of the number of training set tissues, on the accuracy of mi-CNN with respect to subtissue labels. When biological variation is relatively small, mi-CNN correctly classified subtissue locations, even with a small number of biological replicates in the training set. Including more biological replicates is beneficial when variation is large

5.2 Results for the experimental datasets

RCC experiment. Although subtissue-level ground truth was not available for the RCC experiment, we used the fact that the tissue sections annotated as healthy were expected to be free from tumor. Therefore, we evaluated the classifications with respect to the homogeneity of subtissue classification of the healthy sections. Figure 9 illustrates that, on the training set, mi-SVM and mi-CNN both had homogeneous predictions of healthy on healthy tissues. On the validation set, mi-CNN had slightly more homogeneous predictions of healthy on healthy tissues than mi-SVM. The predictions of SVM and mi-SVM did not differ substantially in this dataset. CNN had less homogeneous predictions of healthy on healthy tissues than mi-CNN in both the training and validation sets. This indicates that mi-CNN can improve prediction on healthy locations by considering healthy locations in the tumor tissues.
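One simple way to quantify the homogeneity described above is the fraction of locations predicted as healthy within each healthy-annotated tissue. The sketch below is an assumption about how such a summary could be computed; the text evaluates homogeneity visually from Figure 9, and the function name, tissue identifiers, and predictions here are hypothetical.

```python
import numpy as np

def healthy_homogeneity(tissue_ids, pred_labels, healthy_tissues, healthy_label=0):
    """Per-tissue fraction of locations predicted as healthy.

    A value of 1.0 means the tissue was classified as entirely healthy.
    """
    tissue_ids = np.asarray(tissue_ids)
    pred_labels = np.asarray(pred_labels)
    result = {}
    for tissue in healthy_tissues:
        mask = tissue_ids == tissue
        if not mask.any():
            raise ValueError(f"no locations found for tissue {tissue!r}")
        result[tissue] = float(np.mean(pred_labels[mask] == healthy_label))
    return result

# Toy example: two healthy-annotated tissues (H1, H2) and one tumor tissue (T1)
tissue_ids = ["H1"] * 4 + ["H2"] * 4 + ["T1"] * 2
pred_labels = [0, 0, 0, 0,   0, 1, 0, 0,   1, 1]

scores = healthy_homogeneity(tissue_ids, pred_labels, healthy_tissues=["H1", "H2"])
print(scores)
```

Because no subtissue ground truth is needed, a summary of this kind can be computed for any classifier on any tissue that is trusted to be tumor-free.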

Fig. 9.

Classification accuracy: the RCC experiment. (a) Tissue-level pathology annotations. (b) Optical images of H&E stained tissues. (c-f) Subtissue-level classifications

LIME-based interpretation of mi-SVM and mi-CNN highlighted different features as highly predictive. For mi-SVM, m/z 181, 215, 760, 865 and 898 were ranked as the top five most important. For mi-CNN, these were m/z 217, 751, 773, 885 and 886. These results indicate that the choice of the classifier plays an important role in both predictive accuracy and the choice of predictive features in this dataset.

Human bladder cancer experiment. Figure 10 compares the classification of SVM, CNN, mi-SVM and mi-CNN with the ground truth subtissue-level labels on selected heterogeneous tumor tissue and pure stroma tissue. Similar to the results on Simulated Datasets 1 and 2, SVM and CNN classified many stroma locations in the tumor tissue as tumor in the training dataset (see Fig. 10). Not surprisingly, both SVM and CNN had poor predictions in the validation set, producing a mixture of tumor and stroma predictions in the stroma tissue.

Fig. 10.

Classification accuracy: the human bladder cancer experiment. (a) Tissue-level pathology annotations. (b) Subtissue-level pathology labels on optical images. (c) Subtissue-level labels on MSI (viewed as ground truth). (d–g) Subtissue-level classifications

mi-SVM and mi-CNN improved on the classification of SVM and CNN in terms of both accuracy and balanced accuracy (Table 3). Figure 10 shows that mi-CNN correctly classified more stroma locations in the tumor tissues than mi-SVM for both training and validation tissues. In addition, mi-CNN had the smallest number of false positives on the stroma tissues, giving the cleanest classifications of stroma tissues in Figure 10.

Table 3.

Classification accuracy: the human bladder cancer experiment

	Compare with	SVM	CNN	mi-SVM	mi-CNN
Training	Tissue annotations	0.959 (0.946)	0.827 (0.946)	0.939 (0.946)	0.800 (0.855)
Training	Subtissue labels	0.801 (0.793)	0.767 (0.759)	0.847 (0.842)	0.941 (0.941)
Validation	Subtissue labels	0.755 (0.750)	0.779 (0.774)	0.827 (0.823)	0.928 (0.928)

Note: Values without parentheses are accuracy calculated by Equation (4). Values in parentheses are balanced accuracy calculated by Equation (5).

LIME analysis of mi-CNN classifications of a subset of 200 locations in the validation set selected m/z 925.44, 944.44, 946.44, 1105.54 and 1198.69 as most predictive. Among those, m/z 944.44 is likely to be Histone 2 A, which is known to be upregulated in tumors, and m/z 1105.54 is likely to correspond to Collagen I, which is known to be upregulated in stroma. LIME analysis of mi-SVM selected five different predictive features, m/z 1669.73, 1475.72, 1529.7, 963.44 and 1054.49.

6 Discussion

We introduced mi-CNN, a deep MIL approach for classifying subtissue locations in MSI experiments. The multiple instance aspect of the approach enabled training the classifier with weak supervision, using rough tissue-level annotations in the training set. The convolutional architecture of the CNN captured contextual dependencies between the spectral features. Evaluations on simulated and experimental datasets demonstrated that mi-CNN improved the subtissue classification as compared with traditional SVM and CNN.

The approach assumed that, in a binary classification problem, a tissue labeled as tumor had at least one tumor location, but the tissues labeled as non-tumor were tumor-free. This assumption is reasonable for MSI, as homogeneous healthy tissue biopsies are relatively easy to obtain, whereas tumor biopsies are more likely to contain a mix of tumor and non-tumor regions. When both non-tumor and tumor tissues are heterogeneous, the proposed approach is no longer suitable, since a reliable non-tumor label is crucial to the method. Although we only discussed binary classification, mi-CNN can also be adapted to multi-class classification, such as different grades of tumor tissues or multiple tissue types.

In contrast to the typical applications of CNN in computer vision, the CNN architecture in this work did not include spatial convolution of tissues. This is a consequence of the typically high heterogeneity of the microenvironment within a tumor, and of the lack of spatial smoothness of location labels. At the same time, the CNN architecture took advantage of the mass spectral patterns to alleviate the high dimensionality and the high correlations in the predictive feature space. In this work, the size of convolutional filters captured one of the most common sources of correlations between m/z, i.e. the presence of molecular ions and their adducts. The m/z dependencies can become more complicated and ambiguous in other cases, e.g. with larger mass ranges. The convolutional aspects can be easily adapted to such situations by changing the size of the filter and the network depth.

Although neural networks have a large parameter space and need large training datasets, we found that mi-CNN worked well on relatively small numbers of biological tissues. This may be due to a combination of the CNN architecture, which uses locally connected neurons and weight-sharing filters to reduce the parameter space and the computational cost, and a relatively large number of heterogeneous subtissue locations available for training.

Overall, we found that mi-CNN is well-suited for training subtissue-level classifiers on datasets with tissue-level annotations. This is particularly important in situations where tumor and non-tumor tissues are tightly connected, making manual labeling of the training sets difficult or even impossible. The approach is an important step toward taking full advantage of MSI's capability of providing molecular information, and minimizing manual labor for tissue imaging and classification.
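The multiple instance assumption stated in this Discussion (every location in a non-tumor tissue is non-tumor, and every tumor tissue contains at least one tumor location) can be expressed as a label-update step. The sketch below follows the heuristic commonly used in mi-SVM-style training and is only an assumption about how such a constraint might be enforced; this section does not specify the exact update used by mi-CNN, and the function name, threshold, and toy scores are hypothetical.

```python
import numpy as np

def update_instance_labels(scores, bag_ids, bag_labels, threshold=0.5):
    """Impute location (instance) labels from classifier scores under MIL constraints.

    scores:     (n_locations,) predicted tumor probability per location
    bag_ids:    (n_locations,) tissue identifier of each location
    bag_labels: dict mapping tissue identifier -> 0 (non-tumor) or 1 (tumor)
    """
    scores = np.asarray(scores, dtype=float)
    bag_ids = np.asarray(bag_ids)
    labels = np.zeros(scores.shape[0], dtype=int)
    for bag, bag_label in bag_labels.items():
        idx = np.flatnonzero(bag_ids == bag)
        if idx.size == 0 or bag_label == 0:
            # Non-tumor tissues are assumed tumor-free: all locations stay 0
            continue
        # Tumor tissue: accept the classifier's current call for each location
        labels[idx] = (scores[idx] >= threshold).astype(int)
        if labels[idx].sum() == 0:
            # Enforce "at least one tumor location" by promoting the top-scoring one
            labels[idx[np.argmax(scores[idx])]] = 1
    return labels

# Toy example: two tumor-annotated tissues (T1, T2) and one healthy tissue (H1)
scores = [0.9, 0.2, 0.7, 0.6, 0.1, 0.3, 0.4]
bag_ids = ["T1", "T1", "H1", "H1", "T2", "T2", "T2"]
bag_labels = {"T1": 1, "H1": 0, "T2": 1}

new_labels = update_instance_labels(scores, bag_ids, bag_labels)
print(new_labels.tolist())
```

In an iterative scheme, a step like this would alternate with retraining the classifier on the imputed labels until the labels stop changing. The example also shows why a reliable non-tumor label matters: locations in H1 are forced to non-tumor regardless of their scores.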
