Mustafa Umit Oner1,2, Jianbin Chen3, Egor Revkov3,2, Anne James4, Seow Ye Heng4, Arife Neslihan Kaya3, Jacob Josiah Santiago Alvarez3,2, Angela Takano4, Xin Min Cheng4, Tony Kiat Hon Lim4, Daniel Shao Weng Tan5,6,3, Weiwei Zhai3,7,8, Anders Jacobsen Skanderup3,2,5, Wing-Kin Sung2,3, Hwee Kuan Lee1,2,9,10,11,12. 1. Bioinformatics Institute, Agency for Science, Technology and Research (A∗STAR), Singapore 138671, Singapore. 2. School of Computing, National University of Singapore, Singapore 117417, Singapore. 3. Genome Institute of Singapore, Agency for Science, Technology and Research (A∗STAR), Singapore 138672, Singapore. 4. Department of Anatomical Pathology, Singapore General Hospital, Singapore 169608, Singapore. 5. Division of Medical Oncology, National Cancer Centre Singapore, Singapore 169610, Singapore. 6. Oncology Academic Clinical Programme, Duke-NUS Medical School, Singapore 169857, Singapore. 7. Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China. 8. Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China. 9. Singapore Eye Research Institute (SERI), Singapore 169856, Singapore. 10. Image and Pervasive Access Lab (IPAL), Singapore 138632, Singapore. 11. Rehabilitation Research Institute of Singapore, Singapore 308232, Singapore. 12. Singapore Institute for Clinical Sciences, Singapore 117609, Singapore.
Abstract
Tumor purity is the percentage of cancer cells within a tissue section. Pathologists estimate tumor purity to select samples for genomic analysis by manually reading hematoxylin-eosin (H&E)-stained slides, which is tedious, time consuming, and prone to inter-observer variability. Besides, pathologists' estimates do not correlate well with genomic tumor purity values, which are inferred from genomic data and accepted as accurate for downstream analysis. We developed a deep multiple instance learning model predicting tumor purity from H&E-stained digital histopathology slides. Our model successfully predicted tumor purity in eight The Cancer Genome Atlas (TCGA) cohorts and a local Singapore cohort. The predictions were highly consistent with genomic tumor purity values. Thus, our model can be utilized to select samples for genomic analysis, which will help reduce pathologists' workload and decrease inter-observer variability. Furthermore, our model provided tumor purity maps showing the spatial variation within sections. They can help better understand the tumor microenvironment.
High-throughput genomic analysis has become an indispensable tool for cancer research and has enabled precision oncology. One of the crucial factors affecting the quality of genomic analysis is the proportion of cancer cells in the samples. Tumors consist of a complex mixture of cells, such as cancer cells, normal epithelial cells, stromal cells, and infiltrating immune cells. The proportion of cancer cells in a section can significantly influence the accuracy of not only sequencing experiments but also precision oncology. Pathologists routinely provide subjective estimates of the percentage of cancer cells within a tissue section, known as tumor purity.

Tumor purity affects both high-throughput data acquisition and analysis. To detect genetic variations in a tumor sample by next-generation sequencing, the sample must contain sufficient cancer cells.6, 7, 8 Therefore, accurate tumor purity estimation is of great clinical importance. A sample with low tumor purity, for example, may lead to a false-negative test result, potentially resulting in missed therapeutic opportunities. Moreover, genomic analyses should incorporate tumor purity to account for normal cell contamination, which can confound analysis results.9, 10, 11, 12, 13, 14 A novel immunotherapy gene signature missed by traditional methods, for example, was discovered using a differential expression analysis incorporating tumor purity. Tumor purity is also associated with clinical variables.15, 16, 17 Low tumor purity, for instance, was associated with poor prognosis in glioma, colon cancer, and gastric cancer. Moreover, tumor purity was a promising predictor of therapeutic response in colon cancer and gastric cancer.

A pathologist estimates tumor purity by reading hematoxylin and eosin (H&E)-stained histopathology slides. Essentially, the pathologist counts the percentage of tumor nuclei over a region of interest (ROI) in the slide.
The tumor purity estimated in this way is referred to as percentage tumor nuclei in this study. Percentage tumor nuclei estimates are usually used for sample selection and interpretation of results in molecular analysis. The pathologist can read any H&E-stained slide and estimate percentage tumor nuclei based on a cellular-level analysis. Thus, this approach is widely applicable and has cellular-level resolution. However, counting tumor nuclei is tedious and time consuming. More importantly, there is inter-observer variability between pathologists' estimates.

Tumor purity can also be inferred from different types of genomic data, such as somatic copy number19, 20, 21, 22, 23, 24, 25 and mutations,26, 27, 28, 29, 30, 31 gene expression data,32, 33, 34, 35 and DNA methylation data.36, 37, 38, 39 The tumor purity obtained from these methods will be referred to as genomic tumor purity in this study. Genomic tumor purity values are usually used in genomic analyses to mitigate the confounding effects of normal cell contamination40, 41, 42 and in correlational studies to investigate the associations between tumor purity and clinical variables. Nowadays, genomic tumor purity is accepted as "accurate" for downstream analysis. Genomic methods generally produce consistent values across different cancer datasets in The Cancer Genome Atlas (TCGA). However, they do not work well for low-tumor-content samples. Furthermore, genomic methods cannot provide information on the spatial organization of the tumor microenvironment. Hence, genomic methods and pathologists' slide-reading approach have different strengths and limitations.

Pathologists routinely estimate percentage tumor nuclei in tissue sections.
However, besides the previously stated challenges, pathologists' estimates do not correlate well with genomic tumor purity values. To assist pathologists, this study develops a machine learning model that predicts tumor purity from H&E-stained histopathology slides such that the predictions are consistent with genomic tumor purity values. In addition to giving accurate tumor purity measurements, our model is cost-effective compared with genomic methods. It also provides information about the spatial organization of the tumor microenvironment.

Two types of machine learning models can be utilized to predict tumor purity from digital histopathology slides: patch-based models and multiple instance learning (MIL) models. Patch-based models require pathologists' pixel-level annotations showing whether each pixel is cancerous or normal. Although different studies employed this approach for tumor purity prediction,44, 45, 46, 47, 48, 49 they had limited coverage since pixel-level annotations are rarely available, expensive, and tedious. On the other hand, MIL models do not require pixel-level annotations. Instead, they use sample-level labels, which are weak labels providing only aggregate information rather than pixel-level information. However, such labels can easily be collected from pathology reports, electronic health records, or different data modalities. MIL models have been used successfully in various digital pathology tasks,50, 51, 52 whereas this is the first study using the MIL approach to predict tumor purity. This study uses sample-level genomic tumor purity values as labels during training and does not require tedious pixel-level annotations by pathologists.

We formulate predicting the tumor purity of a sample from its H&E-stained histopathology slides as an MIL task (Figure 1A). The sample's top and bottom slides are cropped into many patches, and these patches are collected to form a bag. Then, the task is to predict the bag-level label of tumor purity.
To achieve this task, we developed a novel MIL model with a “distribution” pooling filter (see experimental procedures for details).
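The idea of a "distribution" pooling filter can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the bin count, the Gaussian kernel width, and the assumption that extracted features are normalized to [0, 1] are illustrative choices.

```python
import numpy as np

def distribution_pooling(features, num_bins=21, sigma=0.05):
    """Summarize a bag of instance feature vectors into a bag-level
    representation by estimating each feature's marginal distribution
    with a kernel density estimate over fixed bins.

    features: (num_instances, num_features) array, values assumed in [0, 1].
    Returns: (num_features, num_bins) array; each row sums to 1.
    """
    bin_centers = np.linspace(0.0, 1.0, num_bins)             # (B,)
    # Gaussian kernel response of every instance feature at every bin
    diff = features[:, :, None] - bin_centers[None, None, :]  # (N, F, B)
    kernel = np.exp(-0.5 * (diff / sigma) ** 2)
    # Average over instances, then normalize each feature's histogram
    hist = kernel.mean(axis=0)                                # (F, B)
    return hist / hist.sum(axis=1, keepdims=True)

# Toy bag: 100 patches, each represented by a 4-dimensional feature vector
rng = np.random.default_rng(0)
bag = rng.uniform(0.0, 1.0, size=(100, 4))
rep = distribution_pooling(bag)
```

Unlike mean or max pooling, which keep only one statistic per feature, this bag-level representation retains the shape of each feature's distribution across patches, which is what makes a percentage-type label such as tumor purity learnable from weak supervision.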
Figure 1
A novel MIL model predicts sample-level tumor purity from H&E-stained digital histopathology slides
(A) Our model accepts a bag of patches randomly cropped from the top and bottom slides of a sample as input and predicts the sample's tumor purity at its output. The feature extractor module extracts a feature vector for each patch inside the bag. The MIL pooling filter, namely distribution pooling, summarizes extracted features into a bag-level representation by estimating marginal feature distributions. Finally, the bag-level representation transformation module predicts the sample-level tumor purity. We use tumor purity values inferred from genomic sequencing data by ABSOLUTE as ground-truth labels during training.
(B) We obtain a spatial tumor purity map for a slide by inferring tumor purity over each 1-mm2 ROI within the slide in a sliding window fashion. The map shows the variation of tumor purity over the slide.
(C) Our MIL model learned discriminant features for cancerous versus normal histology from sample-level genomic tumor purity labels without requiring exhaustive annotations from pathologists. We used discriminant features to obtain cancerous versus normal segmentation maps for tumor slides. Trained feature extractor module extracts features of patches from tumor and normal slides of a patient. Then, segmentation maps are obtained by hierarchical clustering over the extracted feature vectors.
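The clustering step described above can be sketched as below. The choice of Ward linkage and two clusters is an illustrative assumption (the paper states only that hierarchical clustering is applied to the extracted feature vectors), and the toy features stand in for real patch embeddings.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def segment_patches(features, num_clusters=2):
    """Cluster patch feature vectors hierarchically and return one cluster
    label per patch; labels can then be painted back onto the patches' grid
    coordinates to form a cancerous-versus-normal segmentation map."""
    Z = linkage(features, method="ward")
    return fcluster(Z, t=num_clusters, criterion="maxclust")

# Toy features: two well-separated groups standing in for
# cancerous-like and normal-like patch embeddings (8-dimensional)
rng = np.random.default_rng(1)
feats = np.vstack([rng.normal(0.0, 0.1, (50, 8)),
                   rng.normal(1.0, 0.1, (50, 8))])
labels = segment_patches(feats)
```

Which cluster corresponds to "cancerous" is not known from the clustering itself; in practice it can be resolved by comparing each cluster against the matched normal slide's patches, as the caption describes.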
(D) Our MIL model successfully classifies samples into tumor versus normal.
Our MIL models successfully predicted sample-level tumor purity in different TCGA cohorts and a local Singapore cohort. The predictions were consistent with genomic tumor purity values (Figure 2). Besides, we obtained spatially resolved tumor purity maps showing the variation of tumor purity over the slides (Figures 1B and 4).
We also showed that our MIL models learned discriminant features for cancerous versus normal histology (Figures 1C and 5) and classified samples into tumor versus normal almost perfectly in all cohorts (Figures 1D and 3B).
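The tumor-versus-normal classification can be evaluated with a standard ROC analysis over the models' purity predictions. A minimal sketch follows; the purity scores and labels below are made-up placeholders, not the study's data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Illustrative predicted purities for tumor (label 1) and
# matched-normal (label 0) samples
y_true = np.array([1, 1, 1, 1, 0, 0, 0])
y_score = np.array([0.82, 0.64, 0.71, 0.55, 0.10, 0.22, 0.05])

auc = roc_auc_score(y_true, y_score)           # area under the ROC curve
fpr, tpr, thresholds = roc_curve(y_true, y_score)
```

Because every tumor score here exceeds every normal score, the toy example yields perfect separation; on real cohorts the AUC quantifies how cleanly the purity predictions split the two classes.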
Figure 2
The MIL model's tumor purity predictions correlate significantly with genomic tumor purity values
(A–I) A scatterplot of genomic tumor purity versus the MIL model's prediction is given for only tumor samples in the test set of each cohort: (A) BRCA, (B) GBM, (C) LGG, (D) LUAD, (E) LUSC, (F) OV, (G) PRAD, (H) UCEC, and (I) LUAD_SG. Correlation coefficients with 95% CIs are given at the top of each plot. Note that the red dotted line in each plot shows the diagonal (i.e., y = x line). All data points would align on the diagonal line in case of zero prediction error.
(J) Violin plots summarize genomic tumor purity values and pathologists' percentage tumor nuclei estimates in the test set of each cohort. Correlation coefficients are given at the top. Red lines show median values. ns, not significant; n, the number of tumor samples.
See also Tables S3 and S5.
Figure 4
Incorrect size and selection of ROI might cause overestimation in percentage tumor nuclei estimates
(A–H) For a slide of a fresh-frozen section in the TCGA LUAD cohort, (A) shows the ROI centered on a patch and consisting of 16 closest patches to that particular patch (≈1 mm2 at the specimen level). Tumor purity corresponding to the patch is predicted over the ROI. (B) The tumor purity map for all patches within the slide. Similarly, (D) shows a slide of a FFPE section in the LUAD_SG cohort, and (E) shows its corresponding tumor purity map. (C) and (F) show example patches cropped from cancerous regions in the slides shown in (A) and (D), respectively. (G and H) To investigate the effect of the size and selection of ROI on pathologists' percentage tumor nuclei estimates, we conducted error analyses over the slides' tumor purity values by gradually extending the ROI. We calculated the slide's tumor purity as the average of top-k% of the patches with the highest scores (k = 0, ···, 100) in the tumor purity map (k = 0: the patch with the highest tumor purity). In different cohorts, we plotted mean absolute error versus top-k% of the patches for error analyses between slides' predictions and pathologists' percentage tumor nuclei estimates in (G) and slides' predictions and genomic tumor purity values in (H). See also Figures S4 and S6–S10.
Figure 5
Cancerous versus normal segmentation maps obtained by performing a clustering over the features extracted by the trained MIL model's feature extractor module are consistent with LUAD histopathology
(A and B) We show H&E-stained slides, color-coded segmentation maps, and example zoom-in areas for two slides in the test set of LUAD cohort: (A) TCGA-73-4675-01A-01-TS1, and (B) TCGA-50-6590-01A-01-BS1. See also supplemental experimental procedures for details.
Figure 3
Performance analysis of MIL models
(A and B) MIL models perform better than percentage tumor nuclei estimates and successfully classify samples into tumor versus normal. (A) Spearman's correlation coefficient versus mean-absolute-error plot is given for MIL models' tumor purity predictions (represented by triangles) and pathologists' percentage tumor nuclei estimates (represented by circles) in the test sets of different cohorts (shown in different colors). MIL models' predictions achieve lower mean absolute error and higher Spearman's correlation coefficient than percentage tumor nuclei estimates. See also Tables S3 and S5. (B) ROC curve analysis over MIL models' predictions for tumor versus normal sample classification. The area under curve values with 95% CIs are given in the legend. MIL models successfully classified samples into tumor versus normal in all cohorts.
(C and D) The top and bottom slides of a tumor sample are different in tumor purity. In the test set of each cohort, for a tumor sample having top and bottom slides, we conducted two experiments. (C) The trained MIL model's predictions from the top and bottom slides of a sample are statistically compared using Wilcoxon signed-rank test. Each box plot summarizes the p values obtained in a cohort. For at least 95% of the samples in each cohort, the top and bottom slides are significantly different (p < 0.05) in tumor purity. The dashed line shows p = 0.05. See also Table S6. (D) For each sample, the absolute error between genomic tumor purity value and the MIL model's prediction using both slides and the expected value of absolute errors between genomic tumor purity value and the MIL model's predictions over individual slides are calculated. Box plots summarize the absolute errors in two approaches. They are statistically compared using Wilcoxon signed-rank test, and the results are presented on top of the plots such that p > 0.05 (ns, not significant), ∗p ≤ 0.05, ∗∗p ≤ 0.01, and ∗∗∗p ≤ 0.001. See also Table S7. Whiskers show 5th and 95th percentiles, and red lines show median values. n, number of tumor samples with two slides.
Results
In this study, there were 10 different TCGA cohorts and a local Singapore cohort. Each TCGA cohort had more than 400 patients, and the Singapore cohort had 179 lung adenocarcinoma patients, such that each patient had both histopathology slides and corresponding genomic sequencing data (Table 1, see also Tables S1 and S2). The histopathology slides in each cohort were randomly segregated at the patient level into training, validation, and test sets (Figures S1 and S2). We trained our MIL model on the training set and chose the best set of model weights based on validation set performance. Finally, we evaluated the performance of our trained MIL model on the data of completely unseen patients in the hold-out test set. Each patient in the test set was like a new patient walking into the clinic.
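A patient-level split of this kind can be sketched as follows; the split fractions and patient identifiers are illustrative assumptions chosen to roughly mirror the proportions in Table 1, not the study's actual segregation procedure.

```python
import random

def patient_level_split(patient_ids, train_frac=0.6, val_frac=0.2, seed=42):
    """Randomly segregate patients (not slides) into train/validation/test
    sets, so that all slides of one patient land in exactly one split and
    the test set contains only completely unseen patients."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    n = len(ids)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

# Hypothetical patient IDs for a 450-patient cohort
train, val, test = patient_level_split([f"PATIENT-{i:04d}" for i in range(450)])
```

Splitting by patient rather than by slide is the detail that matters here: slides of one patient are correlated, so a slide-level split would leak information from training into the test set.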
Table 1
The TCGA and Singapore cohorts
                 Tumor samples                Normal samples
Cohort           Train  Validation  Test      Train  Validation  Test
BRCA             559    185         185       76     27          30
GBM              285    95          94        0      0           0
KIRC             261    85          89        220    71          73
LGG              273    91          90        0      0           0
LUAD             266    90          90        101    37          33
LUSC             273    90          90        132    41          47
OV               310    103         103       53     13          18
PRAD             258    85          85        72     15          24
THCA             258    85          85        48     18          17
UCEC             270    90          89        18     4           10
LUAD_SG          107    36          36        0      0           0
In each cohort, a patient has only one tumor sample and one matching normal sample, if available. The numbers of tumor and matching normal samples in training, validation, and test sets are presented for each cohort. The data are segregated at the patient level. See also Tables S1 and S2, Figures S1–S3, and Note S1.
MIL models' tumor purity predictions correlate significantly with genomic tumor purity values
Our models' performance in 10 different TCGA cohorts was evaluated by correlation analyses between genomic tumor purity values obtained from ABSOLUTE and our MIL models' predictions. The performance metric was Spearman's rank correlation coefficient.

We obtained significant correlations (p < 0.05) in eight cohorts, namely breast invasive carcinoma (BRCA), glioblastoma multiforme (GBM), brain lower grade glioma (LGG), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), ovarian serous cystadenocarcinoma (OV), prostate adenocarcinoma (PRAD), and uterine corpus endometrial carcinoma (UCEC) (Figures 2A–2H and Table S3). While the minimum Spearman's ρ = 0.418 (p = 4.1 × 10−5; 95% confidence interval [CI], 0.226–0.574) was in the LGG cohort, the maximum Spearman's ρ = 0.655 (p = 4.6 × 10−24; 95% CI, 0.547–0.743) was in the BRCA cohort. We also compared our MIL models' predictions with tumor purity values obtained from ESTIMATE and observed similar performance (Table S4).

We repeated the correlation analyses between genomic tumor purity values and pathologists' percentage tumor nuclei estimates (Figure 2J and Table S3). While the minimum Spearman's ρ = 0.240 (p = 2.7 × 10−2; 95% CI, 0.009–0.446) was in the thyroid carcinoma (THCA) cohort, the maximum Spearman's ρ = 0.344 (p = 9.8 × 10−4; 95% CI, 0.139–0.531) was in the UCEC cohort. There was no significant correlation in the GBM and LGG cohorts. Hence, the minimum correlation with MIL predictions (ρ = 0.418 in the LGG cohort) was higher than the maximum correlation with pathologists' percentage tumor nuclei estimates (ρ = 0.344 in the UCEC cohort). This implies that MIL predictions are more consistent with genomic tumor purity values than pathologists' percentage tumor nuclei estimates.

Moreover, we conducted statistical tests on correlation coefficients to compare our MIL models' predictions and pathologists' percentage tumor nuclei estimates. We used the Fisher's z transformation-based method of Meng et al.
The two methods were compared only when both correlated significantly with genomic tumor purity in a cohort (Table S3). MIL predictions were significantly better than pathologists' estimates in all cohorts except LUSC and PRAD, where the two methods performed on par (pcomp = 1.7 × 10−1 > 0.05 for LUSC and pcomp = 2.0 × 10−1 > 0.05 for PRAD) in the test sets.
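Meng et al.'s Fisher z-based test for comparing two dependent correlations (two predictors correlated with the same outcome on the same samples) can be sketched as below. The example call is illustrative: r1 = 0.655 and n = 185 echo the BRCA figures in the text, but r2 and the inter-method correlation r12 are made-up placeholders.

```python
import math
from scipy.stats import norm

def fisher_z(r):
    """Fisher z transformation of a correlation coefficient."""
    return 0.5 * math.log((1 + r) / (1 - r))

def compare_dependent_correlations(r1, r2, r12, n):
    """Meng-Rosenthal-Rubin test comparing two correlations that share a
    common variable (here: each method's estimates correlated with the
    same genomic purity values on the same n samples).

    r1, r2: correlations of each method with the shared variable
    r12:    correlation between the two methods' estimates
    n:      number of samples
    Returns (z statistic, two-sided p value)."""
    r_sq = (r1 ** 2 + r2 ** 2) / 2.0
    f = min((1.0 - r12) / (2.0 * (1.0 - r_sq)), 1.0)
    h = (1.0 - f * r_sq) / (1.0 - r_sq)
    z = (fisher_z(r1) - fisher_z(r2)) * math.sqrt(
        (n - 3) / (2.0 * (1.0 - r12) * h))
    p = 2.0 * (1.0 - norm.cdf(abs(z)))
    return z, p

z, p = compare_dependent_correlations(r1=0.655, r2=0.31, r12=0.40, n=185)
```

The test is defined for Pearson correlations; applying it to Spearman coefficients, as done here, treats the rank correlations as approximately Pearson-like, which is a common practical simplification.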
MIL models' predictions have lower mean absolute error than percentage tumor nuclei estimates
Apart from Spearman's correlation coefficients, we also checked the mean absolute errors between genomic tumor purity values and MIL models' predictions, and between genomic tumor purity values and pathologists' percentage tumor nuclei estimates (Table S5).

In the analyses of MIL predictions, the minimum and maximum mean-absolute-error values of μabs = 0.105 (standard deviation σ = 0.091) and μabs = 0.173 (σ = 0.154) were obtained in the OV cohort and the PRAD cohort, respectively. On the other hand, in the analyses of pathologists' percentage tumor nuclei estimates, the minimum and maximum mean-absolute-error values of μabs = 0.132 (σ = 0.124) and μabs = 0.280 (σ = 0.151) were obtained in the UCEC cohort and the LUAD cohort, respectively. In all cohorts, pathologists' estimates were generally higher than genomic tumor purity values (Figure 2J).

Similar to our comparison in the correlation analyses, we compared the two methods based on absolute errors in the test sets of different cohorts. We used the Wilcoxon signed-rank test on absolute error values for tumor samples in the test sets (Table S5). Absolute error values in MIL predictions were significantly lower than those in pathologists' percentage tumor nuclei estimates in all cohorts except the LGG cohort. The two methods performed similarly (pcomp = 5.4 × 10−2 > 0.05) in the test set of the LGG cohort.

Figure 3A summarizes the correlation and absolute error analyses. We observed that MIL predictions had lower mean absolute error and higher Spearman's correlation coefficient than pathologists' percentage tumor nuclei estimates.
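A paired comparison of absolute errors like the one above can be sketched as follows; the data are synthetic stand-ins for one cohort's test set (a small, unbiased MIL error versus a larger, upward-biased pathologist error), not the study's measurements.

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(7)
genomic = rng.uniform(0.3, 0.9, size=90)           # ground-truth purity
mil_pred = genomic + rng.normal(0.00, 0.10, 90)    # small, unbiased error
path_est = genomic + rng.normal(0.12, 0.12, 90)    # larger, biased-high error

err_mil = np.abs(mil_pred - genomic)
err_path = np.abs(path_est - genomic)

# One-sided paired test: are MIL absolute errors systematically smaller?
stat, p = wilcoxon(err_mil, err_path, alternative="less")
```

The Wilcoxon signed-rank test is the natural choice here because the two error values for each sample are paired (same sample, same genomic ground truth) and the error distributions need not be normal.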
MIL model predicts tumor purity from H&E-stained slides of FFPE sections in the Singapore cohort
Our MIL models successfully predicted tumor purity from H&E-stained digital histopathology slides of fresh-frozen sections in different TCGA cohorts. In addition, we evaluated their performance on slides of formalin-fixed paraffin-embedded (FFPE) sections in a local Singapore cohort consisting of 179 lung adenocarcinoma patients (see Note S1 for details). Similar to the TCGA cohorts, we segregated the data at the patient level (Table 1 and Figure S3).

We used transfer learning and initialized the model with the weights of the MIL model trained on the TCGA LUAD cohort. Then, we froze the weights of all layers in the network except the first convolutional layer in the feature extractor module (Figure 1A). This helped the network adapt the first-layer weights to learn the tissue morphology in FFPE sections, which is different from that of fresh-frozen sections (Figure S4). Note that, while the FFPE method preserves morphology better and is routine in histopathology, the fresh-frozen method preserves nucleic acids better and is preferred for molecular analysis.

Similar to the performance in the TCGA LUAD cohort, we obtained a Spearman's ρ = 0.554 (p = 4.6 × 10−4; 95% CI, 0.283–0.745) and a mean absolute error of μabs = 0.120 (σ = 0.091) in the test set of the Singapore LUAD (LUAD_SG) cohort (Figures 2I and 3A). There were substantial differences between the TCGA and LUAD_SG cohorts, such as tissue preservation method (fresh-frozen versus FFPE) and ancestry of patients (European versus East Asian). However, our MIL model successfully predicted tumor purity from slides of FFPE sections using transfer learning with minimal training only in the first convolutional layer of the feature extractor module. The results suggest that our MIL models learned robust features for the tumor purity prediction task at the higher levels of the network. We also checked the performance of the TCGA LUAD model directly on the LUAD_SG cohort used as an external validation set (Figure S5).
Nevertheless, we did not obtain a significant correlation (ρ = 0.141; p = 0.06 > 0.05), which highlighted the necessity of adapting the weights of the first layer in the feature extractor to FFPE slides using transfer learning.

For pathologists' estimates, we obtained a Spearman's ρ = 0.361 (p = 3.0 × 10−2; 95% CI, 0.029–0.644) and a mean absolute error of μabs = 0.202 (σ = 0.105) in the test set of the LUAD_SG cohort (Figure 2J). We statistically compared the MIL model's predictions and percentage tumor nuclei estimates. While the difference was not significant (pcomp,ρ = 2.3 × 10−1 > 0.05) in terms of correlation coefficient, it was significant (pcomp,abs = 7.3 × 10−4 < 0.05) in terms of absolute error.
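The freeze-all-but-the-first-convolution step described above can be sketched in PyTorch as below. The network here is a tiny stand-in, not the paper's architecture; in the actual procedure the model would first be loaded with the TCGA-LUAD-trained weights before freezing.

```python
import torch
import torch.nn as nn

# Minimal stand-in for the feature extractor module (illustrative layers)
feature_extractor = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # first conv: stays trainable
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(32, 8),
)

# Freeze every parameter, then unfreeze only the first convolutional layer
for param in feature_extractor.parameters():
    param.requires_grad = False
for param in feature_extractor[0].parameters():
    param.requires_grad = True

trainable = [name for name, param in feature_extractor.named_parameters()
             if param.requires_grad]
```

With only the first layer trainable, the optimizer adapts the model to FFPE stain and texture statistics while the higher-level, presumably domain-robust features are left untouched.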
Tumor purity varies spatially within a sample: Top and bottom slides of a sample are different in tumor purity
Intra-tumor heterogeneity is a well-known phenomenon in solid cancers.57, 58, 59, 60, 61 It results in therapeutic failure and drug resistance. We checked whether it is observable from tumor purity predictions of the trained MIL model on the top and bottom slides of a sample. For each slide of a tumor sample with both top and bottom slides in a cohort, 100 bags were created from the slide's patches, and predictions were obtained from the trained MIL model. Then, the predictions of the two slides were statistically compared using the Wilcoxon signed-rank test.

Figure 3C shows the box plot of p values obtained from the statistical tests in each cohort's test set. There is a significant difference between the MIL predictions on the top and bottom slides of the same tumor sample. In all cohorts, at least 75% of samples have p < 1.0 × 10−8, and at least 95% of samples have p < 0.05. Hence, we conclude that there is a variation in tumor purity between the top and bottom sections of a tumor sample; i.e., tumor purity varies spatially within the sample.

The degree of spatial variation in tumor purity differs across cancer types (Table S6). The UCEC, LGG, and GBM cohorts had the lowest mean absolute differences (μdabs) between the top and bottom slides' predictions (μdabs ≤ 0.090); i.e., they were the most spatially homogeneous cancers among all cohorts. On the other hand, the PRAD cohort had the highest mean absolute difference (μdabs = 0.144); i.e., it was the most spatially heterogeneous cancer in tumor purity.
Predicting a sample's tumor purity using both top and bottom slides is better than using only one slide
We checked if there is a significant difference between predicting a sample's tumor purity by using both slides (top and bottom) and by using only one slide. For a tumor sample with two slides in a cohort, let psmpl be the genomic tumor purity value of the sample; p̂smpl be the tumor purity prediction obtained from the trained MIL model using both slides together; and p̂top and p̂bottom be the tumor purity predictions obtained from the trained MIL model for the individual slides. We compared the absolute error of the sample-level prediction esmpl = |p̂smpl − psmpl| with the expected value of the absolute errors of the slide-level predictions esld = 0.5 × (|p̂top − psmpl| + |p̂bottom − psmpl|). We used the Wilcoxon signed-rank test on the difference esmpl − esld (Table S7). Note that the PRAD (n = 21) and UCEC (n = 23) cohorts were excluded from this analysis due to the low number of samples with two slides.

In the test sets of the BRCA, LUAD, LUSC, and OV cohorts, using both slides for tumor purity prediction gave better results in terms of absolute error (Figure 3D). However, in the test sets of the GBM and LGG cohorts, there was no significant difference between using both slides and using one slide alone. This is not surprising since these cohorts had the lowest mean absolute differences between the slides' predictions (Table S6); i.e., they were the most spatially homogeneous tumors. Indeed, when the two slides give the same prediction, the sample-level prediction and the slide-level predictions coincide.

We conclude that predicting a sample's tumor purity using both the top and bottom slides together is better than using only one of them whenever possible.
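The error comparison above can be written as a minimal sketch with hypothetical purity values (the label and predictions below are illustrative, not from our data):

```python
# Hypothetical purities for one sample (all values in [0, 1]).
p_smpl = 0.70                        # genomic tumor purity label
pred_both = 0.66                     # MIL prediction using both slides
pred_top, pred_bottom = 0.62, 0.74   # per-slide MIL predictions

# Sample-level absolute error vs. expected slide-level absolute error.
e_smpl = abs(pred_both - p_smpl)
e_sld = 0.5 * (abs(pred_top - p_smpl) + abs(pred_bottom - p_smpl))
```

Across a cohort, the paired differences esmpl − esld are then tested with the Wilcoxon signed-rank test.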
Spatial tumor purity map analysis reveals the probable cause of pathologists' high percentage tumor nuclei estimates
Pathologists' percentage tumor nuclei estimates were generally higher than genomic tumor purity values for all TCGA cohorts in our analysis (Figure 2J; see also Figures S1 and S2). Previous studies reported the same observation, but the reasons remain unclear. We hypothesized that incorrect size and selection of the ROI might be the cause. To test this hypothesis, we obtained tumor purity maps from our trained MIL models in different TCGA cohorts and conducted error analyses over them.

We followed the same procedure as in Smits et al. to simulate pathologists' percentage tumor nuclei estimation. Tumor purity is predicted over an ROI of 1 mm × 1 mm around each patch in a slide, which corresponds to 16 patches at the 20× zoom level (each patch is around 256 μm × 256 μm at the specimen level) (Figure 4A). Then, the predicted value is assigned to the patch in the tumor purity map (Figure 4B). We also obtained tumor purity maps for slides in the Singapore cohort (Figures 4D, 4E, and S6–S10).

We observed that a tumor purity map shows variation within the slide, which implies that ROI selection is crucial in pathologists' percentage tumor nuclei estimation. Since tumor purity was higher in pathologists' percentage tumor nuclei estimates, we investigated whether pathologists might have selected high-tumor-content regions over the slides for percentage tumor nuclei estimation. The highest prediction in a slide's tumor purity map was used as the slide's tumor purity value. Then, error analyses were conducted over the slides' tumor purity values compared with pathologists' percentage tumor nuclei estimates and genomic tumor purity values.
The error analyses were repeated by gradually extending the ROI such that a slide's tumor purity was calculated as the average of the top-k% of patches with the highest scores (k = 0, ···, 100) in the slide's tumor purity map.

We discovered that the mean absolute error between the slides' predictions and pathologists' percentage tumor nuclei estimates increases as we extend the ROI to cover lower-tumor-purity regions (Figure 4G). This observation suggests that pathologists may tend to select high-tumor-content regions when estimating percentage tumor nuclei. The LGG and UCEC cohorts may look exceptional, with almost constant mean-absolute-error plots. However, this is expected since these two cohorts' samples have high genomic tumor purity values (Figure S1), so the variation within the slides is very low. The PRAD cohort's plot also has a different pattern than the others: an initial decrease followed by an increase in the later stages, emphasizing the importance of ROI size. Pathologists may need to analyze a bigger ROI, depending on the morphology of the tissue of origin, to reach a certain nuclei count while estimating percentage tumor nuclei; PRAD may be one such case due to the glandular structure of the prostate.

Furthermore, as the ROI grows, the mean absolute error between the slides' predictions and genomic tumor purity values decreases (Figure 4H). This is expected since our MIL models converge to their original performance of predicting over the whole slide (Figure 2). It is even more evident in the LUSC and OV cohorts: the error decreases initially but increases later, since our MIL models underestimated tumor purity compared with genomic tumor purity values in these cohorts.
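The top-k% averaging described above can be sketched as follows, assuming a slide's tumor purity map is flattened into a 1D array of patch-level predictions:

```python
import numpy as np

def slide_purity_topk(purity_map, k):
    """Average of the top-k% highest patch-level purity predictions.

    purity_map : 1D array-like of per-patch tumor purity predictions.
    k          : percentage in [0, 100]; k = 100 averages the whole map,
                 and k = 0 falls back to the single highest prediction.
    """
    scores = np.sort(np.asarray(purity_map, dtype=float))[::-1]
    n = max(1, int(round(len(scores) * k / 100.0)))
    return float(scores[:n].mean())
```

Sweeping k from 0 to 100 then traces out the mean-absolute-error curves of Figures 4G and 4H.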
MIL model learns discriminant features for cancerous versus normal tissue histology
We explored the capability of our MIL model's feature extractor to learn discriminant features for cancerous versus normal tissue histology while being trained only on sample-level genomic tumor purity labels. For each patient having both a tumor and a matching normal sample, features of patches cropped over the slides of the tumor and normal samples were extracted using the trained feature extractor module of the MIL model. Then, slide-level cancerous versus normal segmentation maps were obtained by performing hierarchical clustering over the extracted feature vectors (Figure 1C; see supplemental experimental procedures for details). The resolution of segmentation was at the patch level, and each patch was around 256 μm × 256 μm at the specimen level.

In the test set of the LUAD cohort, there were 33 patients with both tumor and matching normal samples. We constructed slide-level segmentation maps for these patients (Figure 5). During qualitative assessment, we observed that the segmentation maps were consistent with LUAD histopathology: while healthy tissue components, like blood vessels, stroma regions, and normal tissue structures, were labeled normal, regions invaded by neoplastic cells were labeled cancerous. Hence, we qualitatively validated that our MIL model learned discriminant features for cancerous versus normal tissue histology in LUAD from sample-level genomic tumor purity labels, without requiring pixel-level annotations from pathologists.
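The clustering step can be illustrated with a minimal sketch using SciPy's hierarchical clustering on simulated feature vectors (the two well-separated groups stand in for cancerous and normal patches; the actual procedure is detailed in the supplemental experimental procedures):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(1)

# Simulated 32-dimensional patch features: two well-separated groups
# standing in for cancerous and normal regions of a slide.
features = np.vstack([
    rng.normal(0.0, 0.1, size=(50, 32)),
    rng.normal(1.0, 0.1, size=(50, 32)),
])

# Ward-linkage hierarchical clustering, cut into two clusters; the
# cluster labels form the patch-level segmentation map.
Z = linkage(features, method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")
```

Mapping each patch's cluster label back to its slide coordinates yields the segmentation map.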
MIL model successfully classifies samples into tumor versus normal
A good tumor purity predictor should be able to discriminate between tumor and normal samples. We checked our MIL model's performance in the tumor versus normal sample classification task. Tumor purity predictions for all samples in the test set of each cohort were obtained, and a receiver operating characteristic (ROC) curve analysis was conducted. Then, the area under the ROC curve (AUC) was calculated, and a 95% CI was constructed using the percentile bootstrap method. Note that the GBM and LGG cohorts were excluded from this analysis since there were no normal slides in these cohorts.

Our MIL models successfully discriminated tumor samples from normal samples in all cohorts, with AUC values greater than or equal to 0.927 (Figure 3B). We obtained the minimum and maximum AUC values of 0.927 (95% CI, 0.826–0.993) and 1.000 (95% CI, 1.000–1.000) on the test sets of the THCA and BRCA cohorts, respectively. Note that, although we did not get a strong correlation between genomic tumor purity values and MIL predictions in the test sets of the kidney renal clear cell carcinoma (KIRC) and THCA cohorts, our models successfully classified samples into tumor versus normal in these cohorts.

Furthermore, we obtained an AUC value of 0.991 (95% CI, 0.975–1.000) on the test set of the LUAD cohort. Our model outperformed the classical image processing and machine-learning-based method of Yu et al. (AUC, 0.85) and the plasma DNA-based method of Sozzi et al. (AUC, 0.94). Besides, our model performed on par with the deep learning model of Coudray et al. (AUC, 0.993), which was trained on tumor versus normal classification, and the deep learning model of Fu et al. (AUC, 0.977; 95% CI, 0.976–0.978), which was fine-tuned on pathologists' percentage tumor nuclei estimates in a transfer learning setup. However, there is one concern about the dataset preparation methods of Coudray et al. and Fu et al.: they obtained their datasets by segregating data either at the slide level or at the patch level.
These data segregation methods might lead to a severe data leakage problem, and the models' performance might be illusory.
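The ROC analysis with a percentile bootstrap CI can be sketched as follows, using the Mann-Whitney rank-sum identity for the AUC and simulated labels and scores (not our cohort data):

```python
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(2)

def auc_score(y_true, y_score):
    """AUC via the Mann-Whitney rank-sum identity."""
    y_true = np.asarray(y_true)
    ranks = rankdata(np.asarray(y_score, dtype=float))
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Simulated test set: 1 = tumor, 0 = normal, with purity-like scores.
y_true = np.array([0] * 20 + [1] * 40)
y_score = np.concatenate([rng.uniform(0.0, 0.3, 20), rng.uniform(0.2, 0.9, 40)])
auc = auc_score(y_true, y_score)

# Percentile bootstrap 95% CI: resample (label, score) pairs with replacement.
boot = []
while len(boot) < 1000:
    idx = rng.integers(0, len(y_true), len(y_true))
    if len(np.unique(y_true[idx])) == 2:       # need both classes for the AUC
        boot.append(auc_score(y_true[idx], y_score[idx]))
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
```

Resamples containing only one class are discarded, since the AUC is undefined for them.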
Discussion
Accurate tumor purity estimation is crucial for high-throughput genomic analysis. It is routinely estimated by pathologists; however, pathologists' estimates suffer from inter-observer variability and do not correlate well with genomic tumor purity values. Besides, percentage tumor nuclei estimation by pathologists is tedious and time consuming. To overcome these challenges, we developed a novel MIL model with a distribution pooling filter. It predicted tumor purity from H&E-stained histopathology slides of fresh-frozen and FFPE sections in different TCGA cohorts and a Singapore cohort, respectively. The predictions were consistent with genomic tumor purity values, and they outperformed pathologists' percentage tumor nuclei estimates in the TCGA cohorts.

Hence, our MIL models can be utilized for sample selection for high-throughput genomic analysis, which will help reduce pathologists' workload and decrease inter-observer variability. Moreover, spatially resolved tumor purity maps obtained using our MIL models can substantially contribute to a better understanding of the tumor microenvironment. Lastly, our models' predictions can be used as prognostic biomarkers to stratify patients.
MIL model can pre-screen slides for genomic analysis
The current workflow for sample selection for genomic analysis includes screening slides of 8–12 sections, choosing the most appropriate slide, and, possibly, marking out a high-tumor-content region on the slide for macrodissection before extraction. This adds a heavy burden to pathologists' workload. To help pathologists, our MIL model can pre-screen the slides and suggest the best slide (with the highest predicted tumor purity) for high-throughput sequencing. Moreover, it can propose high-tumor-content regions over the slide for macrodissection via the spatial tumor purity map, which is remarkably important for low-purity samples (for example, in lung cancers).

Furthermore, our MIL model's predictions can be used as a quality control metric to decide if a section has enough tumor content for sequencing or if a section requires deeper sequencing. This can avoid wasting the limited amount of tissue (especially in biopsy samples) on failed sequencing attempts.
Genomic tumor purity values and pathologists' percentage tumor nuclei estimates are complementary
While genomic tumor purity values have recently been recognized as accurate for downstream genomic analysis after sequencing, sequencing is still subject to sample selection based on pathologists' percentage tumor nuclei estimates. On the other hand, pathologists' slide reading method is inherently limited since it requires cellular-level analysis. It can give reliable results over a selected ROI, but it may not be applicable for sample-level tumor purity prediction, ideally requiring the analysis of gigapixel digital histopathology slides. Therefore, we used sample-level genomic tumor purity values as labels for training our MIL models. Now, we can use our MIL models to support pathologists for sample selection for molecular analysis by pre-screening slides and proposing ROIs for further assessment.
Spatially resolved tumor purity maps can enhance spatial omics
We obtained tumor purity maps showing the variation of tumor purity within slides using our trained MIL models (Figures 4B and 4E). They can potentially help us understand the interaction of cancer cells with other tissue components (like normal epithelial, stromal, and immune cells) in the tumor microenvironment, which is a key player in tumor formation and a primary determinant of therapeutic response. Furthermore, they can enhance spatial-omics technologies.70, 71, 72, 73
Weak tumor purity labels innately necessitated an MIL approach
Previous studies based on patch-based models worked on few cancer types with relatively few patients (e.g., 10 patients or 64 patients) since they required pixel-level annotations, which are rarely available. However, using genomic tumor purity values as sample-level weak labels enabled us to conduct a pan-cancer study on 10 different TCGA cohorts, where each cohort had more than 400 patients. On the other hand, unlike pixel-level annotations providing whether each cell is cancerous or normal, the genomic tumor purity of a sample tells us only the proportion of cancer cells within the sample. Therefore, training a machine learning model using weak tumor purity labels innately necessitated an MIL approach, in which a sample was represented as a bag of patches from the sample's slides, and the sample's genomic tumor purity value was used as the bag's label.
The sources of error in MIL predictions
Our MIL models successfully predicted tumor purity (Figure 2). However, their predictions slightly deviated from the genomic tumor purity values. There may be different sources of prediction error; while some can be eliminated, others are inevitable.

First, we have fewer patients in our datasets (300 patients per training set) than traditional deep learning datasets containing millions of independent samples. Considering the complexity of cancer, our MIL models effectively captured features that distinguish cancerous from normal tissue. We also expect that performance will improve with an increasing number of patients. Indeed, we obtained the best performance in our largest cohort, BRCA (559 patients in the training set).

Second, our MIL model uses histopathology slides from the top and bottom sections of the tumor portion. We have already shown the variation in tumor purity between the top and bottom sections of tumor samples. Thus, for samples with only one slide, the prediction error is expected to be higher.

Third, we checked if necrotic regions inside the slides affect our MIL models' performance. For each cohort (except the LGG cohort, in which all samples have a percentage necrosis of 0), Spearman's correlation coefficients between the absolute errors in MIL predictions and the percentage necrosis values were calculated in the test set (Table S8). There is no significant correlation (p > 0.05) in any cohort except the LUSC cohort, in which we observe a low correlation of 0.253 (p = 1.6 × 10−2; 95% CI, 0.062–0.432). Overall, it seems that our models can handle necrotic regions well.

Last, our model's predictions are based on morphology in H&E-stained histopathology slides, whereas genomic tumor purity values were based on DNA data. Not all effects of genetic changes (and hence genomic tumor purity changes) may be observable from the slides due to the selective dyeing characteristics of H&E staining, and some genetic changes may not manifest in morphology at all.
Why do MIL predictions perform better than percentage tumor nuclei estimates?
Compared with the percentage tumor nuclei estimates by pathologists, our MIL models' predictions gave a higher correlation and a lower mean absolute error with respect to genomic tumor purity values (Figure 3A). One of the primary reasons for this superiority is that the MIL models were trained directly on genomic tumor purity values, which enabled them to learn associated features.

Another reason might be that pathologists concentrate more on tumor cells than on infiltrating normal cells within the tumor, which may result in missed normal tissue components. Moreover, cancer cells are usually enlarged: they occupy more space than normal tissue components, stromal cells, and infiltrating lymphocytes, which may give an impression of high tumor content. Pathologists may fail to incorporate this effect correctly in their estimates and may overestimate percentage tumor nuclei. Indeed, this was the case in the cohorts we analyzed (Figures 2J, S1, and S2).

Finally, while our MIL models predict tumor purity over the whole slide, pathologists estimate percentage tumor nuclei by analyzing a selected ROI over the slide. Therefore, the size and selection of the ROI might cause the overestimation in pathologists' percentage tumor nuclei estimates (Figure 4G).
Limitations and future work
Our MIL models, by design, apply to any tumor sample with H&E-stained histopathology slides. We tested them on tumor samples with a broad range of tumor purity values. However, testing them on samples with percentage tumor nuclei lower than the TCGA threshold would strengthen the applicability of our MIL models; this is reserved for future work.

We evaluated our MIL models on hold-out test sets to simulate the real-world clinical workflow and obtained successful results. Besides, our analysis of the LUAD_SG cohort using transfer learning with minimal training for domain adaptation showed that our MIL models learned robust features for tumor purity prediction. However, due to differences between the fresh-frozen and FFPE tissue preservation methods, we could not validate our models on external cohorts, which might have further consolidated their robustness.

We qualitatively validated our spatially resolved tumor purity maps. Their quantitative validation using spatial-omics technologies is reserved for future work, which requires recruiting a prospective cohort, conducting spatial-omics and image analyses, and evaluating the purity maps obtained from the MIL model against the spatial-omics data.

Lastly, our MIL models are deep learning based, and deep learning algorithms perform better with more data. Training the models with larger cohorts would help improve model performance by better capturing patient-to-patient variation.
Experimental procedures
Resource availability
Lead contact
Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Wing-Kin Sung (ksung@comp.nus.edu.sg).
Materials availability
This study did not generate new unique reagents.
Datasets
We downloaded H&E-stained histopathology slides of fresh-frozen sections in 10 different cohorts in TCGA (Table 1). We selected these cohorts since they have more than 400 patients with both histopathology slides and corresponding genomic sequencing data in TCGA. Each patient had a tumor sample, and some patients also had matching normal samples.

In TCGA, primary tumor samples and matching normal samples (adjacent non-neoplastic solid tissue or blood) were collected at the Tissue Source Sites (TSSs) from patients who had received no prior treatment (chemotherapy or radiotherapy) for their disease. Collected samples were frozen and shipped overnight to the Biospecimen Core Resource (BCR) for TCGA while maintaining a temperature less than −180°C.

At the BCR, each frozen sample was cut into portions. Then, two glass slides (sometimes only one) were prepared by cutting sections 4–6 μm thick from the top and bottom of a portion and staining them with H&E. Based on the information from the BCR (via personal communication), these slides were scanned at 40× magnification using an Aperio XT slide scanner. A board-certified pathologist reviewed the slides. Upon passing pathology review, the remaining portion, without any tumor enrichment, was sent for genomic analysis. In other words, the (top and bottom) slides and the portion sent for genomic analysis were immediate neighbors.

During review (personal communication), a pathologist estimated (1) percentage tumor nuclei and percentage of all other nuclei, which add up to 100%; and (2) percentage cellular tumor, percentage normal, percentage stroma, and percentage necrosis, which add up to 100%. Percentage tumor nuclei in each slide of a frozen tumor section was estimated by evaluating at least 10 specimen fields (excluding necrotic regions) via the digital slide viewer. Tumor portions with percentage tumor nuclei of ≥60% and percentage necrosis of ≤20% were accepted into the study and sent for genomic analysis.
Besides, pathologists confirmed from slides of frozen normal sections that the adjacent normal tissues (if available) were free of tumor cells.

We also collected H&E-stained histopathology slides of FFPE sections in an East Asian cohort consisting of 179 lung adenocarcinoma patients in Singapore. In the Singapore cohort, only one slide was prepared for each tumor sample, from the top section of the tissue used for sequencing, and there were no normal samples. All the slides of FFPE sections were prepared, stained, and scanned at 40× magnification using the Philips IntelliSite Pathology Solution (Koninklijke Philips, The Netherlands) in the same laboratory in Singapore.

In each cohort, we randomly segregated the data at the patient level (i.e., all slides from the same patient were assigned to the same set) into training (60%), validation (20%), and test (20%) sets (Table 1), which had similar tumor purity distributions (Figures S1 and S2). Note that segregating data at the patient level is crucial to prevent data leakage while training machine learning models. The training set was used to train the machine learning model, the validation set was used to choose the best model, and the test set was held out as unseen data for evaluation of the best model. The lists of patients and slides in each set are given in Document S2.
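The patient-level segregation can be sketched as follows (the split function mirrors the 60/20/20 scheme; the implementation details are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def split_by_patient(patient_ids, fracs=(0.6, 0.2, 0.2)):
    """Patient-level train/validation/test split: every slide of a patient
    ends up in the same set, preventing data leakage across sets."""
    patients = np.array(sorted(set(patient_ids)))
    rng.shuffle(patients)
    n_train = int(round(fracs[0] * len(patients)))
    n_val = int(round(fracs[1] * len(patients)))
    return (set(patients[:n_train]),
            set(patients[n_train:n_train + n_val]),
            set(patients[n_train + n_val:]))
```

Splitting at the patient level (rather than the slide or patch level) is what prevents the data leakage problem discussed above.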
MIL model
Our novel MIL model consists of three modules: a feature extractor module, an MIL pooling filter, and a bag-level representation transformation module (Figure 1A). We use neural networks to implement the feature extractor module and the bag-level representation transformation module so that the learning process is fully parameterized (see supplemental experimental procedures for details). We use our novel distribution pooling filter as the MIL pooling filter. It is more expressive than standard pooling filters (like mean and maximum pooling) regarding the amount of information captured while obtaining bag-level representations. Given a bag of patches, the feature extractor module extracts a feature vector for each patch inside the bag. Then, owing to this expressiveness, the distribution pooling filter obtains a strong bag-level representation by estimating the marginal distributions of the extracted features. Finally, the bag-level representation transformation module predicts tumor purity. This system of neural network modules is trained end-to-end using samples' genomic tumor purity values as labels.
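As a rough illustration of the idea behind the distribution pooling filter, the following NumPy sketch estimates each feature's marginal distribution over a bag with a Gaussian kernel on a fixed grid of bins; the bin count and kernel width are illustrative choices, and the actual filter is implemented as a differentiable neural network module:

```python
import numpy as np

def distribution_pooling(features, num_bins=11, sigma=0.05):
    """NumPy sketch of a distribution pooling filter.

    features : (num_instances, num_features) array with values in [0, 1].
    Returns a (num_features, num_bins) bag-level representation that
    estimates each feature's marginal distribution with a Gaussian
    kernel evaluated on a fixed grid of bin centers.
    """
    features = np.asarray(features, dtype=float)
    grid = np.linspace(0.0, 1.0, num_bins)             # (num_bins,)
    diff = features[:, :, None] - grid[None, None, :]  # (n, f, bins)
    kernel = np.exp(-0.5 * (diff / sigma) ** 2)
    hist = kernel.sum(axis=0)                          # pool over instances
    return hist / hist.sum(axis=1, keepdims=True)      # normalize per feature
```

Unlike mean or maximum pooling, which collapse each feature to a single number, this representation retains the shape of each feature's distribution over the bag.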
Training of MIL models
To prepare the machine learning datasets, tissue regions inside the histopathology slides were detected by applying Otsu thresholding. Over the tissue regions, non-overlapping 512 × 512 patches at the 20× zoom level (specimen-level pixel size, 0.5 μm × 0.5 μm) were cropped.

During training, we used a bag of patches cropped from a sample's top and bottom slides as the input and the sample's tumor purity value, obtained from genomic sequencing data by ABSOLUTE, as the ground-truth label (Figure S1). At each epoch, one bag per sample was created on the fly by randomly selecting 200 patches from the sample's patches. We also used the matching normal samples whenever available to enable our model to capture the information related to normal tissue histology; we assigned a tumor purity value of 0.0 to a matching normal sample as the ground-truth label. Note that there were no normal samples in the Singapore cohort.

We initialized the models' weights randomly and trained them end-to-end using the ADAM optimizer with a learning rate of 0.0001 and L2 regularization on the weights with a weight decay of 0.0005. The batch size was 1. We used absolute error as the loss function and employed early stopping based on the loss in the validation set to avoid overfitting.
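The on-the-fly bag creation can be sketched as follows (the with-replacement fallback for samples with fewer than 200 patches is our assumption, not a detail stated above):

```python
import numpy as np

rng = np.random.default_rng(4)

def make_bag(patch_ids, bag_size=200):
    """Form one training bag by randomly selecting `bag_size` patches.

    Sampling is without replacement; if a sample has fewer patches than
    the bag size, we fall back to sampling with replacement (an
    assumption for this edge case)."""
    patch_ids = np.asarray(patch_ids)
    replace = len(patch_ids) < bag_size
    return rng.choice(patch_ids, size=bag_size, replace=replace)

bag = make_bag(np.arange(1000))  # one bag for a sample with 1,000 patches
```

A fresh bag per sample per epoch acts as a form of data augmentation, since the model rarely sees the same bag twice.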
Predicting tumor purity of a sample
We created 100 bags for each sample in the test set and obtained tumor purity predictions from the trained model. Each bag was created by randomly selecting 200 patches from the available patches cropped from the sample's (top and bottom) slides. We used the average of 100 predictions as the sample's tumor purity prediction during performance evaluation.
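The inference procedure can be sketched as follows, with a stand-in scoring function in place of the trained neural network:

```python
import numpy as np

rng = np.random.default_rng(5)

# Stand-in for the trained MIL model: here simply the mean patch score
# of a bag; the real model is a neural network.
def mil_predict(bag_scores):
    return float(np.mean(bag_scores))

# Simulated per-patch scores for one sample's (top and bottom) slides.
patch_scores = rng.uniform(0.3, 0.9, size=5000)

# Inference: average the predictions of 100 random 200-patch bags.
preds = [mil_predict(rng.choice(patch_scores, size=200, replace=False))
         for _ in range(100)]
sample_purity = float(np.mean(preds))
```

Averaging over 100 random bags reduces the variance introduced by any single bag's patch selection.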
Statistical analysis
We obtained 95% CIs for Spearman's rank correlation coefficients and areas under the ROC curves using the percentile bootstrap method. To compare the performance of two methods (our MIL models' predictions and pathologists' percentage tumor nuclei estimates), we used Fisher's z transformation-based method of Meng et al. on Spearman's rank correlation coefficients and the Wilcoxon signed-rank test on absolute error values. All statistical tests were two-sided, and statistical significance was considered at p < 0.05. We used the scipy.stats (v1.4.1) Python library for statistical tests.
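The percentile bootstrap CI for Spearman's ρ can be sketched with scipy.stats as follows (simulated data; 1,000 resamples are an illustrative choice):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(6)

# Simulated paired values: genomic purities and MIL predictions.
purity = rng.uniform(0.2, 0.9, size=80)
pred = np.clip(purity + rng.normal(0.0, 0.08, size=80), 0.0, 1.0)

rho, p_value = spearmanr(purity, pred)

# Percentile bootstrap 95% CI for Spearman's rho: resample pairs.
boot = []
for _ in range(1000):
    idx = rng.integers(0, len(purity), len(purity))
    r, _ = spearmanr(purity[idx], pred[idx])
    boot.append(r)
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
```

Resampling (purity, prediction) pairs jointly preserves the pairing, which is what the percentile bootstrap requires.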