Xiangxue Wang1, Cristian Barrera1, Kaustav Bera1, Vidya Sankar Viswanathan1, Sepideh Azarianpour-Esfahani1, Can Koyuncu1, Priya Velu2, Michael D Feldman3, Michael Yang4, Pingfu Fu5, Kurt A Schalper6, Haider Mahdi7, Cheng Lu1, Vamsidhar Velcheti8, Anant Madabhushi1,9. 1. Center for Computational Imaging and Personalized Diagnostics, Case Western Reserve University, Cleveland, OH, USA. 2. Department of Pathology and Laboratory Medicine, Weill Cornell Medical College, New York, NY, USA. 3. Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA. 4. Department of Pathology, University of Colorado School of Medicine, Aurora, CO, USA. 5. Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA. 6. Department of Pathology, Yale School of Medicine, New Haven, CT, USA. 7. Magee-Womens Hospital and Magee-Womens Research Institute, University of Pittsburgh Medical Center, Pittsburgh, PA, USA. 8. Department of Hematology and Oncology, NYU Langone Health, New York, NY, USA. 9. Louis Stokes Cleveland Veterans Affairs Medical Center, Cleveland, OH, USA.
Abstract
Immune checkpoint inhibitors (ICIs) show prominent clinical activity across multiple advanced tumors. However, fewer than half of patients respond even after molecule-based selection. Thus, improved biomarkers are required. In this study, we use image analysis to capture morphologic attributes relating to the spatial interaction and architecture of tumor cells and tumor-infiltrating lymphocytes (TILs) from digitized H&E images. We evaluate the association of image features with progression-free survival (PFS) and overall survival in non-small cell lung cancer (NSCLC) (N = 187) and gynecological cancer (N = 39) patients treated with ICIs. We demonstrated that the classifier trained with NSCLC alone was associated with PFS in independent NSCLC cohorts and also in gynecological cancer. The classifier was also associated with clinical outcome independent of clinical factors. Moreover, the classifier was associated with PFS even with low PD-L1 expression. These findings suggest that image analysis can be used to predict clinical end points in patients receiving ICIs.
Programmed cell death protein 1 (PD-1)/programmed death-ligand 1 (PD-L1) axis inhibitors have revolutionized the treatment paradigm across multiple cancers, showing remarkable improvement in mortality in melanoma, lung, head and neck, and gynecologic cancers, among others. In metastatic non–small cell lung cancer (NSCLC), immune checkpoint inhibitors (ICIs) with or without chemotherapy are now a first-line treatment regimen. ICIs offer a drastic improvement in overall and progression-free survival (OS and PFS) with minimal treatment-related toxicities. In metastatic NSCLC patients with high PD-L1 expression (>50%), where pembrolizumab (a PD-1 inhibitor) is first-line therapy, significant improvements in OS as compared to chemotherapy alone [hazard ratio (HR), 0.63; 95% confidence interval (CI), 0.47 to 0.86; P = 0.002] have been reported. Despite robust clinical benefit, one caveat is that even when patient selection is driven by PD-L1 expression, treatment response rates ranged from 27 to 45% in the first-line setting (27% in the PD-L1 > 1% and 45% in the PD-L1 > 50% subgroup of KEYNOTE-024) and 19% in the second-line refractory setting.
In gynecological cancers (including ovarian, cervical, and uterine cancers), preliminary studies with PD-L1 expression–based patient selection have also shown that ICI improved survival and time to progression in patients with metastatic disease, but with similarly limited response rates as in NSCLC. These findings suggest that tissue-derived PD-L1 expression is not an optimal biomarker to identify patients likely to respond to ICI therapy in any of these cancers. This could be due to the lack of an accepted standard for determining PD-L1 expression levels. Other molecular biomarkers, like tumor mutational burden (TMB) or microsatellite instability, have been explored in recent clinical trials to determine the association with clinical outcomes in different tumors. However, recent studies suggested that high TMB does not always predict response, and some studies have emphasized the importance of additional biomarkers that capture the complexity of the tumor immune microenvironment. Thus, there is an urgent need to identify novel biomarkers associated with likelihood of response to ICI, helping to identify those patients who are less likely to respond to ICI and hence might benefit from alternative therapeutic strategies.
ICI agents rely on augmenting the body’s immune response by one or a combination of the following proposed mechanisms: (i) inhibiting co-repressive signals on T lymphocytes during the early immune response (anti–CTLA-4) in lymph nodes, (ii) antagonizing PD-1 receptor signaling in T lymphocytes during the effector phase (anti–PD-1) in peripheral tissues, or (iii) neutralizing PD-L1 (anti–PD-L1) expressed on the surface of tumor cells in peripheral tissues. One of the hallmarks of cellular immune response for solid tumors is the presence of tumor-infiltrating lymphocytes (TILs) in the tumor microenvironment (TME). Studies have shown that TIL density is strongly prognostic of response in a variety of solid tumors, irrespective of the treatment type. TIL density has been shown to be a predictor of ICI response as well, with a recent study showing that the density, as measured by immunohistochemistry (IHC), of lymphocytes present on the invasive margin of melanomas, rather than more central lymphocytic infiltration, can predict anti–PD-1 response. To date, TIL analysis has focused largely on pathologist-defined visual TIL density estimation based on a few high-power fields viewed under the microscope. This manual TIL estimate suffers from subjectivity and low consistency between evaluators. Computer-assisted TIL estimation has been explored based on hematoxylin and eosin (H&E) to decrease the subjectivity. Moreover, recent studies have shown that density and arrangement patterns of TILs on H&E images or using quantitative multiplex immunofluorescence are associated with risk of recurrence and outcomes in NSCLC and breast cancer. While recent studies have reported that neural network–based approaches could predict the treatment response in cancer patients treated with immunotherapy, the end-to-end deep learning models tend to lack interpretability, which, in turn, could make it harder to achieve generalizable results on new, unseen, external datasets. A number of findings from our group as well as from others have emphasized the importance of biologically inspired features related to spatial heterogeneity and complex interactions between TME and immune microenvironment for predicting clinically relevant outcomes in ICI-treated cancer patients.
In this work, we present an automated image classifier that not only computationally enumerates TIL density but also characterizes the spatial architecture and arrangement of TILs to predict clinical outcomes in ICI-treated cancer patients (Fig. 1). Specifically, we aimed to capture quantitative image biomarkers relating to the spatial interaction between immune and tumor cells in the TME (i.e., epithelium and stroma) and subsequently show the association of these image biomarkers with clinical outcomes for ICI (monotherapy and combination of ICI agents)–treated patients with different cancer types [NSCLC and gynecological cancers (Fig. 2)]. The association of these image biomarkers with clinical outcome in ICI-treated patients was also compared against manual TIL estimation and evaluated in the context of cancer patients with low PD-L1 expression.
Fig. 1.
Consort flow diagram of patient selection.
Patient selection from four different NSCLC cohorts (D1–4) and a gynecological cancer cohort (D5).
Fig. 2.
Flowchart of high-level workflow.
Entire workflow comprises (A) tissue preparation and digitization, (B) tile processing from annotation, (C) region and cell segmentation, (D) image analysis and feature extraction, (E) predictive model construction and validation, and (F) cohort overview.
RESULTS
HistoTIL predicts response to ICI
Feature discovery on the D1 NSCLC cohort revealed a total of seven HistoTIL features (intersected area between non-TIL nuclei and TIL graphs, nuclei shape descriptors in epithelium area, and distance statistics between non-TIL nuclei and TIL graph) that were significantly associated with OS (P < 0.001) and PFS (P = 0.003). These included four features from the non-TIL nuclei and three from the interplay between non-TIL nuclei and TILs (Fig. 3; detailed feature description and distribution in table S1 and fig. S1). In predicting Response Evaluation Criteria in Solid Tumours (RECIST) response as a binary clinical outcome, the linear model achieved an area under the curve (AUC) of 0.71 on the NSCLC training set D1; 0.73, 0.62, and 0.73 on the independent NSCLC validation cohorts D2, D3, and D4, respectively; and 0.68 on the gynecological cohort D5.
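To make the AUC values concrete, the following is a minimal sketch of how an AUC can be computed for a binary RECIST-style outcome from a continuous model score. It uses the rank-based (Mann-Whitney) definition rather than any particular library, and it is not the authors' implementation. The function name `roc_auc`, the toy labels, and the toy scores are all hypothetical, and the sketch assumes that a higher score indicates response.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    This equals the Mann-Whitney U statistic divided by n_pos * n_neg;
    tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 1 = responder, 0 = non-responder.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

In this toy example, 11 of the 12 responder/non-responder pairs are ranked correctly, so the AUC is 11/12. An AUC of 0.5 would correspond to chance-level ranking.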
Fig. 3.
Visualization of features relating to TIL–non-TIL interactions.
A high-risk example is found on the left panel. On the right panel is a low-risk sample. (A and B) The whole-slide image (WSI) is depicted with a green-marked line surrounding the tissue that was indicated as sufficient for processing. (C and D) Two small panels indicate the grouping behavior of the lymphocyte and nonlymphocyte cells by means of a two-dimensional plot with both cell groups intertwined, where the numbers in red indicate the main panel it is coming from. (E and F) A zoomed-in region of interest. The groups of cells form distinctive clusters. The clusters are connected through colored lines, which indicate the distance between lymphocytes (green line), nonlymphocytes (orange line), cluster of lymphocytes (cyan line), and cluster of nonlymphocytes (yellow line). Descriptors are calculated that capture general architectural structure and immune-cell interplay. (G and H) Texture-based feature related to the heatmap of the nuclei and surrounding tissue for the low- and high-risk cases. Zernike polynomials and Haralick descriptors are calculated from the shape and texture of the cell.
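As a rough illustration of what a distance-based TIL–non-TIL descriptor can look like, the sketch below summarizes, for each non-TIL nucleus, the Euclidean distance to its nearest TIL. This is a simplified stand-in chosen for clarity and is not the feature definition used for HistoTIL, which operates on cell-cluster graphs. The function name `nearest_til_distance_stats` and the centroid coordinates are hypothetical.

```python
import math
from statistics import mean, pstdev


def nearest_til_distance_stats(non_til_xy, til_xy):
    """Summarize each non-TIL nucleus' distance to its closest TIL centroid."""
    if not non_til_xy or not til_xy:
        raise ValueError("both point sets must be non-empty")
    # Brute-force nearest neighbour; fine for a sketch, slow for whole slides.
    nearest = [min(math.dist(p, q) for q in til_xy) for p in non_til_xy]
    return {
        "mean": mean(nearest),
        "std": pstdev(nearest),
        "min": min(nearest),
        "max": max(nearest),
    }


# Hypothetical nuclear centroids in pixel coordinates (not from the study).
toy_non_til = [(0, 0), (10, 0), (20, 0)]
toy_til = [(3, 4), (10, 5)]
toy_stats = nearest_til_distance_stats(toy_non_til, toy_til)
```

Small values of such a statistic would indicate TILs lying close to non-TIL nuclei, and large values would indicate spatial separation. On whole-slide data a spatial index (for example, a k-d tree) would replace the brute-force search.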
Independent prediction of survival
On D1, HistoTIL yielded an HR = 5.47 (95% CI, 1.77 to 16.9; P < 0.001) and HR = 4.63 (95% CI, 1.38 to 11.8; P = 0.003) (Fig. 4) for OS and PFS, respectively. On the NSCLC validation sets (D2–4), the performance of HistoTIL in predicting OS and PFS was HR = 3.29 (95% CI, 1.32 to 8.16; P = 0.042) and HR = 2.45 (95% CI, 1.00 to 6.00; P = 0.031) on D2, HR = 2.03 (95% CI, 1.08 to 3.82; P = 0.027) and HR = 2.43 (95% CI, 1.04 to 5.65; P = 0.018) on D3, and HR = 2.58 (95% CI, 1.02 to 6.49; P = 0.009) and HR = 3.05 (95% CI, 1.07 to 8.69; P = 0.001) on D4. HistoTIL was also associated with OS and PFS in gynecological cancers (D5), with HR = 2.29 (95% CI, 1.15 to 4.58; P = 0.0037) and HR = 1.93 (95% CI, 1.10 to 3.38; P < 0.043), respectively. The median OS in the HistoTIL-defined high-risk and low-risk groups was 14.2 and 34.6 months for D2, 15.7 and 23.3 months for D3, and 12.7 and 22.6 months for D4 (all in lung cancer) and 6.5 and 14.2 months for D5 (gynecological cancers), respectively. Similarly, the median PFS in the two HistoTIL risk groups was 3.1 and 10.6 months for D2, 2.7 and 8.5 months for D3, and 3.4 and 11.2 months for D4 in NSCLC and 2.5 and 9.1 months for D5 in gynecological cancers.
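Median survival times of this kind are conventionally read off a Kaplan-Meier curve. The sketch below is a minimal product-limit estimator, assuming right-censored follow-up data. The function names and the toy follow-up times are hypothetical and are not study data. An event flag of 1 marks an observed event and 0 marks censoring.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimator)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        # Both events and censored subjects leave the risk set after time t.
        at_risk -= removed
    return curve


def median_survival(times, events):
    """First time at which S(t) drops to 0.5 or below; None if never reached."""
    for t, s in kaplan_meier(times, events):
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up in months (1 = event observed, 0 = censored).
high_risk_median = median_survival([2, 3, 4, 6, 8], [1, 1, 1, 0, 1])
low_risk_median = median_survival([5, 9, 12, 15, 20], [1, 0, 1, 1, 0])
```

Censored patients reduce the number at risk without lowering the curve, which is why a median can be undefined (returned as `None` here) when fewer than half of a group experience the event during follow-up.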
Fig. 4.
Forest plot for individual cohort.
Survival analysis with OS (A) and PFS (B) between HistoTIL-defined low- and high-risk groups.
Moreover, on multivariable analysis and after adjusting for the effects of clinicopathological variables, HistoTIL was found to be independently prognostic on the NSCLC datasets (D1: HR = 1.38; 95% CI, 1.04 to 1.68; P = 0.012; D2: HR = 1.69; 95% CI, 1.01 to 3.03; P = 0.049; D3: HR = 1.21; 95% CI, 1.01 to 1.44; P = 0.038; D4: HR = 5.25; 95% CI, 1.65 to 16.78; P = 0.005) and for gynecological cancers (D5: cervical, n = 10; ovarian, n = 14; and endometrial, n = 25; HR = 5.22; 95% CI, 1.24 to 21.94; P = 0.024; Table 1). Smoking status was also found to be significant in multivariable analysis (never smoker versus previous/current smoker: HR = 0.271; 95% CI, 0.10 to 0.78; P = 0.015) but only in D3. None of the other clinicopathological covariates were found to be significantly associated with OS.
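A reported hazard ratio, its 95% CI, and its P value are linked on the log scale, which allows a quick consistency check. The sketch below assumes a Wald test on the log hazard ratio; the source does not state which test produced each P value, so this is an assumption for illustration, and the function name `wald_from_ci` is hypothetical. Under that assumption, the standard error is the width of the log-scale CI divided by 2 × 1.96, and the two-sided P value follows from the normal distribution.

```python
import math


def wald_from_ci(hr, ci_low, ci_high, z_crit=1.959964):
    """Recover SE(log HR), the Wald z statistic, and a two-sided P value
    from a reported hazard ratio and its 95% confidence interval."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = math.log(hr) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# Risk-score rows for D5 and D4 as reported in Table 1.
se_d5, z_d5, p_d5 = wald_from_ci(5.217, 1.240, 21.943)
se_d4, z_d4, p_d4 = wald_from_ci(5.253, 1.645, 16.775)
```

For the D5 and D4 risk-score rows this reproduces P values close to the reported 0.024 and 0.005. P values obtained from a different test (for example, a log-rank test) need not match this calculation.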
Table 1.
Multivariable analysis of OS with HistoTIL and clinical variables.
ADC, adenocarcinoma; SCC, squamous cell carcinoma; values in bold are statistically significant by two-tailed test (P < 0.05).
| Characteristics | Hazard ratio | 95% CI | P |
| --- | --- | --- | --- |
| D1 (lung) | | | |
| Gender (Male / Female) | 0.528 | 0.096–2.894 | 0.462 |
| Smoking status (Former/current / Never) | 0.463 | 0.049–4.396 | 0.503 |
| Histology subtypes (Adenocarcinoma / SCC) | 0.347 | 0.023–5.187 | 0.443 |
| Clinical stage (III / IV) | 3.646 | 0.292–45.484 | 0.315 |
| Treatment (Mono / Combo) | 0.441 | 0.020–9.566 | 0.602 |
| TIL counting (Low / High) | 1.385 | 0.231–8.307 | 0.721 |
| Risk scores | 1.397 | 1.071–1.822 | 0.014 |
| D2 (lung) | | | |
| Gender (Male / Female) | 0.342 | 0.056–2.083 | 0.245 |
| Smoking status (Former/current / Never) | 0.126 | 0.010–1.531 | 0.104 |
| Histology subtypes (Adenocarcinoma / SCC) | 0.203 | 0.017–2.488 | 0.212 |
| Clinical stage (III / IV) | 3.437 | 0.179–66.055 | 0.413 |
| Risk scores | 1.685 | 1.011–3.029 | 0.049 |
| D3 (lung) | | | |
| Gender (Male / Female) | 2.334 | 0.746–7.306 | 0.145 |
| Smoking status (Former/current / Never) | 0.287 | 0.098–0.840 | 0.023 |
| Histology subtypes (Adenocarcinoma / SCC) | 0.849 | 0.19–3.64 | 0.826 |
| Clinical stage (III / IV) | 13.023 | 0–Inf. | 0.994 |
| Treatment (Mono / Combo) | 0.802 | 0.246–2.614 | 0.715 |
| PD-L1 (Low / High) | 0.593 | 0.200–1.755 | 0.345 |
| Risk scores | 1.211 | 1.013–1.447 | 0.035 |
| D4 (lung) | | | |
| Gender (Male / Female) | 1.558 | 0.744–3.264 | 0.240 |
| Smoking status (Former/current / Never) | 1.025 | 0.324–3.241 | 0.967 |
| Histology subtypes (Adenocarcinoma / SCC) | 0.985 | 0.409–2.373 | 0.972 |
| Clinical stage (III / IV) | 2.129 | 0.553–8.194 | 0.272 |
| Treatment (Mono / Combo) | 1.252 | 0.311–5.046 | 0.752 |
| Risk scores | 5.253 | 1.645–16.775 | 0.005 |
| D5 (gynecology) | | | |
| Clinical stage (III / IV) | 1.377 | 0.312–6.078 | 0.673 |
| Histology subtypes | | | |
| Endometrioid | Reference | Reference | Reference |
| Serous | 0.382 | 0.063–2.315 | 0.295 |
| SCC | 1.980 | 0–Inf. | 1 |
| Clear cell | 3.544 | 0.817–15.369 | 0.091 |
| Primary site | | | |
| Ovarian | Reference | Reference | Reference |
| Endometrial | 1.031 | 0.238–4.472 | 0.967 |
| Cervix | 1.741 | 0–Inf. | 1 |
| IO type (Mono / Combo) | 0.081 | 0–Inf. | 0.983 |
| Risk score | 5.217 | 1.240–21.943 | 0.024 |
HistoTIL associated with OS and PFS independent of type of ICI agent
For NSCLC, HistoTIL predicted that low-risk groups had significantly longer survival time for patients who received nivolumab (OS: HR = 2.08; 95% CI, 1.27 to 3.41; P < 0.001; PFS: HR = 2.04; 95% CI, 1.34 to 3.10; P < 0.001) and pembrolizumab (OS: HR = 3.19; 95% CI, 1.06 to 9.61; P = 0.041; PFS: HR = 5.34; 95% CI, 1.80 to 15.90; P < 0.001) compared to high-risk groups. In the gynecological cohort, patients who received pembrolizumab had significant survival differences (OS: HR = 4.65; 95% CI, 1.08 to 19.9; P = 0.023; PFS: HR = 3.55; 95% CI, 1.00 to 12.60; P = 0.008) in HistoTIL-predicted risk groups. While patients treated with other ICI agents, which included fewer patients, did not show significant survival difference in low- and high-risk groups (nivolumab, OS: HR = 2.12; 95% CI, 0.67 to 6.75; P = 0.180; PFS: HR = 2.55; 95% CI, 0.65 to 10; P = 0.088), the effect size (HR) shows a longer survival trend for low-risk groups (fig. S2).
Comparison with PD-L1 expression and TIL estimation
While there was no significant survival difference between low and high HistoTIL-defined risk groups in PD-L1–high group (OS: HR = 2.03; 95% CI, 0.59 to 6.90; P = 0.166; PFS: HR = 0.91; 95% CI, 0.25 to 3.40; P = 0.896; Fig. 5, A and C), there was a significant survival advantage in HistoTIL-defined low-risk group for PD-L1 low expressed patients for both OS and PFS on D3 (HR = 2.51; 95% CI, 1.15 to 5.47; P = 0.017; HR = 2.59; 95% CI, 1.35 to 4.99; P = 0.016; Fig. 5, B and D). However, pathologists’ TIL grading failed to stratify patients based on survival in D1; there was also no correlation between manual TIL grading with HistoTIL-predicted risk groups (Fig. 6).
Fig. 5.
Survival analysis of PD-L1 expression.
The Kaplan-Meier survival analysis of OS in high–PD-L1 (A) and low–PD-L1 (B) expression group and PFS in high–PD-L1 (C) and low–PD-L1 (D) expression group on D3.
Fig. 6.
Survival analysis of manual TIL grading and comparing with HistoTIL.
Kaplan-Meier survival analysis of (A) manual TIL grading with OS, (B) manual TIL grading with PFS, and (C) manual TIL grading with OS, three stratified groups. (D) Correlation between manual grading and HistoTIL prediction.
DISCUSSION
Recent advances in computational power and the advent of whole-slide digitization have led to several studies investigating quantitative approaches to interrogate the TME on high-resolution histology images. Research has focused on leveraging artificial intelligence and digital pathology approaches toward answering clinically relevant questions related to detection, diagnosis, prognosis, and treatment response. Broadly speaking, one category of computational pathology approaches has considered handcrafted features based on predefined image patterns such as collagen orientation or spatial architecture of cancer cells. Another category of computational pathology approaches involves deep learning. These are neural network–based unsupervised feature generation strategies that aim to directly capture patterns from digitized histopathology images without typically requiring specification of predefined image motifs. Deep learning approaches, like handcrafted approaches, have been shown to be able to predict underlying molecular signatures of the tumor and have also been shown to be associated with disease outcome and treatment response. While deep learning has recently achieved strong performance in many medical imaging tasks, its low interpretability could prevent its wide application in the clinical setting. A recent example of the failure of deep learning approaches in real-world clinical settings can be seen in applications for coronavirus disease 2019 (COVID-19). Domain-inspired handcrafted approaches that are human explainable may be an important prerequisite for oncologists in guiding management of ICI-treated patients. AbdulJabbar et al. combined deep learning with RNA sequencing data to reveal the spatial immune landscape of NSCLC, identifying an image signature that was prognostic of recurrence. Beck et al. previously reported on stromal morphologic features and identified their prognostic association in breast cancer. These previous studies have suggested the importance of capturing quantitative and interpretable domain-inspired attributes of the disease; however, these studies were primarily in the context of disease prognosis and outcome prediction. It seems logical, therefore, that these interpretable approaches will play an even more significant role in helping clinicians make treatment decisions and in assessing therapeutic benefit.
In this work, we presented HistoTIL, a computational assay that spatially characterizes TILs in tumor specimens to predict response to ICI in patients with NSCLC and gynecological cancer. HistoTIL is composed of a set of quantitative image descriptors that characterize local TIL–non-TIL interactions within the separate stromal and epithelial compartments of the TME. This is different from measures of TIL density or manual TIL grading, which result in a single number that expresses the average abundance of lymphocytes over an entire slide. In high-risk patients who showed poor response to ICI agents (as determined by HistoTIL), the cancer nuclei clusters dominated the TME and appeared more chaotic, with only limited immune cell infiltration. However, in the ICI responders identified by HistoTIL, there was a preponderance of TIL clusters that appeared to be encasing the cancer nuclei (Fig. 3), possibly hinting at how the ICI agents require the recruitment of the body’s immune defenses to fight against cancer. This is further strengthened by having four completely blinded validation sets from two different cancer types, as HistoTIL was only exposed to a single institution dataset for training. This is also advantageous because while TILs have recently been shown to be prognostic in various cancer types, their prognostic significance is plagued by wide variability in manual TIL estimation between pathologists. Moreover, in comparison with current RECIST-based response evaluation, HistoTIL could also predict short-term clinical outcomes across different datasets (Fig. 7). HistoTIL was shown to be associated with survival for NSCLC patients treated with nivolumab and pembrolizumab and gynecological cancer patients treated with pembrolizumab (Fig. 8 and fig. S2). HistoTIL was also associated with OS in the individual cancer types under gynecological cancer (cervical, n = 10; ovarian, n = 14; and endometrial, n = 25). HistoTIL was associated with clinical outcome independent of specific ICI agent, likely because features relating to immune cell and cancer cell interplay on H&E images are a reflection of tumor biology and disease aggressiveness. Independent of the type of ICI agent used or disease indication, HistoTIL-predicted low-risk patients had a relatively lower hazard of disease progression or death compared to high-risk patients.
Fig. 7.
Prediction of clinical response by HistoTIL.
Receiver operating characteristic (ROC) analysis for HistoTIL in predicting clinical response by RECIST among five independent cohorts.
Fig. 8.
Survival analysis of NSCLC among ICI agents.
Kaplan-Meier OS assessment among ICI agents (nivolumab, pembrolizumab, and combination therapy) for patients with NSCLC with OS (A to C) and PFS (D to F).
HistoTIL could also significantly stratify two risk groups within the PD-L1 low-expression cohort for both OS and PFS (HR = 2.2; 95% CI, 0.97 to 4.95; P = 0.01; HR = 2.59; 95% CI, 1.35 to 4.99; P = 0.01; Fig. 5, B and D). This finding could potentially suggest that HistoTIL-identified low-risk patients with low PD-L1 expression, currently typically recommended for ICI-chemo combination therapy, might be eligible for ICI monotherapy directly. Manual TIL grading of the slides by pathologists (N = 26) did not result in significant stratification of patients into different survival groups in D1; the high-TIL groups even presented with a trend toward longer survival after ICI (Fig. 6). This might be due to the lack of standard agreement of manual grading and intra-observer variation, which has been previously reported.
Recently, Hu et al. showed a deep learning model to predict anti–PD-1 response in melanoma (N = 54) and lung cancer (N = 55) (49). While Hu et al. also found that the model trained on one tumor type could potentially apply to another tumor type for predicting RECIST-defined response, our results were validated with a larger set (N = 226) with more consistent performance (i.e., prognostic of survival and predictive of RECIST) among different cancers treated with different ICI agents. Moreover, post hoc analysis of deep learning approaches usually requires dividing whole-slide images (WSIs) into smaller image tiles (256 × 256), which can dilute and cause loss of information related to spatial heterogeneity within the TME. In addition, the deep learning model was trained on ImageNet, and hidden features were extracted with further principal components analysis (PCA) projection for response prediction, which added another layer of opacity for explaining the result. In contrast, HistoTIL achieved higher and more robust accuracy among different tumor types using human explainable features. The recent surge of deep learning applications on histopathology images has ranged from object (i.e., tumor and cell) detection and classification to prognosticating disease-related survival and response to different treatment options. However, the nature of deep learning–based classifiers makes it difficult to fully interpret the basis of the predictions. It has also been suggested that the opacity associated with deep learning–based predictions could be an impediment to deploying them in a clinical setting for treatment management–based decision-making. Moreover, studies have indicated that biologically inspired image features could provide more transparent and interpretable options in the clinical setting. Hence, our HistoTIL approach leverages deep learning for object detection and segmentation tasks but subsequently uses interpretable and engineered features to capture and describe the tumor-immune interaction.
However, we do acknowledge some limitations of our work. This work was limited by the retrospective nature of data collection. Future work will involve validating HistoTIL on completed clinical trial datasets of cancer patients treated with ICI and also evaluating HistoTIL in a prospective setting. Such a prospective clinical trial validation would also allow for head-to-head comparisons with currently used patient selection biomarkers including PD-L1 and TMB. We also acknowledge that the approach invoking five large image patches for analysis could bias the approach when only smaller samples are available for interrogation.
In conclusion, we presented a computational method to prognosticate survival in advanced NSCLC and gynecological cancer patients treated with multiple ICIs. Our approach, HistoTIL, was trained on a dataset from a single institution and independently validated on datasets from two different institutions for predicting both OS and PFS just based on H&E WSIs. In addition to being robust to external multisite validation, HistoTIL also uncovered the prognostic significance of the spatial relationships and structural interplay between TILs and tumor cell nuclei in determining response to ICI agents that modulate the body’s immune system. Last, HistoTIL is tissue nondestructive, based on H&E alone, and thus less time consuming and less expensive, as it eliminates the need for tissue to be processed and/or analyzed with expensive molecular methods.
MATERIALS AND METHODS
Dataset and image acquisition
H&E-stained pretreatment tumor specimens of patients with advanced NSCLC and gynecological cancers who received ICIs were collected from four independent institutions for this study. The ICI treatment response was evaluated independently by response criteria used for solid tumors (RECIST 1.1). All retrospective analyses were conducted under a protocol approved by the institutional review board (IRB), and the requirement of patient informed consent was waived by the IRB. From the University of Pennsylvania Hospital, 42 patients diagnosed with NSCLC between 2008 and 2016 with follow-up information were collected. Tissue quality evaluation was performed at the pathology laboratory to exclude tissue with major artifacts (i.e., tissue folding, blur, staining issues, and anthracosis), and only samples with sufficient tissue (accommodating at least five image patches of size 2000 × 2000 pixels) and complete treatment and follow-up information were included in this study. This process yielded 26 patients to construct the learning set (D1). In addition, we collected 47 (chart review between 2005 and 2016), 81 (chart review between 2009 and 2017), and 85 (chart review between 2011 and 2017) NSCLC patients from the Cleveland Clinic, Yale University, and University Hospital Cleveland Medical Center, respectively. Forty-nine patients with gynecological cancer (cervical, n = 10; ovarian, n = 14; and endometrial, n = 25) were collected from the Cleveland Clinic. After applying the same inclusion and exclusion criteria, we obtained independent validation datasets D2 (N = 30), D3 (N = 63), D4 (N = 68), and D5 (N = 39), respectively (Fig. 1). Detailed demographics including the histology subtypes and ICI agents are available in Table 2.
Table 2.
Demographics and characteristics of the patients.
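Lung (D1 to D4)

| Characteristic | D1 | D2 | D3 | D4 | P |
|---|---|---|---|---|---|
| Gender | | | | | |
| Male | 9 | 19 | 36 | 32 | 0.128 |
| Female | 17 | 11 | 27 | 36 | |
| Histology | | | | | |
| ADC | 19 | 20 | 47 | 52 | <0.001 |
| SCC | 7 | 7 | 12 | 13 | |
| Others | 0 | 3 | 4 | 3 | |
| Smoking status | | | | | |
| Former/current | 21 | 25 | 51 | 61 | 0.552 |
| Never smoker | 5 | 5 | 12 | 7 | |
| Clinical stage | | | | | |
| III | 7 | 5 | 2 | 11 | 0.062 |
| IV | 19 | 25 | 61 | 57 | |
| Treatment | | | | | |
| Nivo | 13 | 30 | 43 | 39 | <0.001 |
| Pembro | 7 | 0 | 4 | 22 | |
| Atezo | 4 | 0 | 6 | 0 | |
| Combo | 2 | 0 | 10 | 7 | |
| Age (years) | | | | | |
| Mean ± SD | 65.2 ± 10.1 | N.A. | 68.8 ± 10.5 | 63.3 ± 12.2 | 0.015 |
| Systematic treatment before ICI | | | | | |
| Yes | 22 | N.A. | 53 | 58 | 0.99 |
| No | 4 | | 10 | 10 | |

Gynecology (D5)

| Characteristic | D5 |
|---|---|
| Primary site | |
| Ovarian | 12 |
| Endometrial | 20 |
| Cervix | 7 |
| Histology subtypes | |
| Endometrioid | 12 |
| Serous | 14 |
| SCC | 7 |
| Clear cell | 4 |
| Unknown | 2 |
| Clinical stage | |
| III | 27 |
| IV | 12 |
| Treatment | |
| Nivo | 16 |
| Pembro | 20 |
| Atezo | 2 |
| Combo | 1 |
| Age (years) | |
| Mean ± SD | 67.6 ± 12.1 |
| Systematic treatment before ICI | |
| Yes | 35 |
| No | 4 |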
H&E slides from D1–2 were scanned with Philips IntelliSite Pathology Solution (software version: Philips DP v1.0) at ×40 magnification. An Aperio CS2 tissue scanner (software version: Aperio Image Library v10.2.41) was used to scan slides from D3–5 at ×20 magnification. To compensate for the variations introduced during tissue imaging by different scanners and institutions, an open-source tool, HistoQC (), was applied to identify digitized tissue slide images eligible for downstream analysis. Expert pathologists manually annotated the tumor region on digitized H&E slides. All the subsequent image analyses were conducted at ×20 magnification (images were down-sampled from ×40 to ×20) from the annotated regions (see Fig. 2 for flowchart illustrating the methodology).
Image analysis: Nuclei detection and region separation
For segmentation of the nuclei, each WSI was divided into a set of adjacent 2000 × 2000 pixel tiles. A U-Net style deep convolutional neural network (CNN) at ×20 magnification was used for automated nuclei segmentation (). Following nuclear segmentation, a previously reported lymphocyte detection approach () was used to identify TILs and cancer cell nuclei. Another U-Net style CNN model was used to specifically differentiate and identify epithelial and stromal regions at ×5 (see the Supplementary Materials for more details).
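The tiling step described above can be illustrated with a minimal sketch. It only enumerates tile coordinates; the function name `tile_origins` and the clipping of border tiles are assumptions for illustration, since the text does not state how image borders were handled.

```python
from typing import Iterator, Tuple

TILE_SIZE = 2000  # tile edge in pixels, as stated in the text


def tile_origins(width: int, height: int,
                 tile: int = TILE_SIZE) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (x, y, w, h) for adjacent, non-overlapping tiles covering an image.

    Border tiles are clipped to the image bounds (an assumption, not
    something the source specifies).
    """
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield x, y, min(tile, width - x), min(tile, height - y)


# Hypothetical 5000 x 4000 pixel region, used only to show the output shape.
tiles = list(tile_origins(5000, 4000))
print(len(tiles), tiles[0], tiles[-1])
```

Each tuple can then be passed to whatever slide-reading routine is in use to extract the pixel data for segmentation.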
Quantitative histomorphometric features
A total of 753 quantitative histomorphometric (QH) features of two main categories were extracted: nuclei-based features and TIL-related features, from regions of stroma, epithelium, and the entire annotated tumor (table S2).
Nuclear features were extracted from within the pixels of nuclei regions. The four types of features relating to nuclei were as follows: shape descriptors, orientation entropy, image texture, and local contextual. Shape features ranged from statistical measurements of the nuclei, including nuclear area, perimeter, and eccentricity, to mathematical operators such as Zernike shape descriptors (). Orientation entropy measures the randomness of cell directionality. Nuclear texture quantitatively characterizes the local pixel distribution pattern in the nuclei and is prognostic among different cancer types. Last, local contextual features capture the local neighboring and topological information of individual nuclei.
TIL-related features were extracted from the different tissue regions (i.e., epithelium and stroma) and included enumeration of TILs and determination of the architecture and spatial interaction of TILs and cancer cells. Previous work () has shown that a densely packed cluster of these two cell types has a poorer prognosis than a sparse arrangement in early-stage NSCLC. The extent of TIL infiltration was determined through the counts and density of TILs by automated detection (). Aspects of spatial arrangement were measured by first assigning TILs and non-TIL cells into neighboring clusters (see the “Graph construction” section in the Supplementary Materials) and then characterizing relative abundance and spatial closeness of TILs and non-TIL cells within the cluster as quantitative image features. The variations in cell composition (i.e., ratio of number of TILs to non-TIL nuclei) among each neighboring cluster were also captured to reflect the heterogeneity within the TME.
These patterns quantitatively describe the histomorphometry (i.e., shape, staining, and local density) among neighboring cells and, in turn, reflect the complex interaction within the tumor immune microenvironment. Details of each feature description are included in table S2.
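To make the idea of cluster-level cell composition concrete, the following is a minimal sketch and not the graph construction used in the study (which is described in the Supplementary Materials). It assumes a simple distance-threshold rule for grouping nuclei, and it reports the fraction of TILs among all nuclei in a cluster rather than the TIL to non-TIL ratio, to avoid division by zero. The function names, the threshold, and the toy coordinates are all hypothetical.

```python
from math import hypot
from statistics import pstdev


def proximity_clusters(points, max_dist):
    """Group nuclei centroids that are linked, directly or transitively,
    by a pairwise distance <= max_dist (union-find over all pairs)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            (x1, y1), (x2, y2) = points[i], points[j]
            if hypot(x1 - x2, y1 - y2) <= max_dist:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(points)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def til_fraction_per_cluster(points, is_til, max_dist):
    """Fraction of TILs among all nuclei in each proximity cluster."""
    return [sum(is_til[i] for i in cluster) / len(cluster)
            for cluster in proximity_clusters(points, max_dist)]


# Toy centroids (pixels) and TIL labels, purely illustrative.
points = [(0, 0), (10, 0), (20, 0), (200, 200), (210, 200)]
is_til = [True, False, False, True, True]

fractions = til_fraction_per_cluster(points, is_til, max_dist=15)
heterogeneity = pstdev(fractions)  # spread of composition across clusters
print(fractions, heterogeneity)
```

The all-pairs loop is quadratic in the number of nuclei; a spatial index would be needed for whole-slide scale, but the sketch keeps the logic visible.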
HistoTIL: An image-based digital risk score
The top-performing QH features were selected from a total of 753 features by applying an elastic net regularized Cox proportional hazard model on D1. Tenfold cross-validation was used to select the regularization parameter (lambda) by grid search, and the best penalty hyperparameter (alpha), which controls the ratio between the L1 and L2 penalties, was found by varying its value from 0 to 1 with a step size of 0.1. The selected features were used to build another Cox proportional hazard model to learn the coefficient for each of the selected features on D1. Survival indices (risk scores) were then calculated as a linear combination of the selected features and the learned coefficients. The median index from the training set D1 was chosen as the threshold to divide the patients into low- and high-risk groups based on OS and PFS. D1 to D4 were entirely composed of metastatic NSCLC patients. To evaluate the robustness of HistoTIL, another independently collected set (D5) of 39 gynecological cancer patients was used for validation using the same cutoff value derived from D1. We evaluated the association of HistoTIL with PFS and OS on independent sets D2, D3, D4, and D5.
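The scoring and thresholding steps can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the coefficients and feature values are made-up numbers, the fitting of the Cox model is omitted, and treating scores strictly above the training median as high risk is an assumption, since the text does not say how ties at the threshold are handled.

```python
from statistics import median


def risk_scores(features, coefficients):
    """Risk score per patient: linear combination of features and coefficients."""
    return [sum(x * b for x, b in zip(row, coefficients)) for row in features]


def assign_risk_group(scores, threshold):
    """Label each score 'high' if above the threshold, otherwise 'low' (assumed rule)."""
    return ["high" if s > threshold else "low" for s in scores]


# Grid of candidate alpha values from 0 to 1 in steps of 0.1, as in the text.
alpha_grid = [round(0.1 * k, 1) for k in range(11)]

# Hypothetical coefficients for two selected features (not values from the study).
coefficients = [0.8, -0.5]

# Hypothetical training (D1-like) and validation feature rows.
train_features = [[1.0, 2.0], [2.0, 1.0], [0.5, 0.5], [3.0, 0.0]]
valid_features = [[2.5, 0.5], [0.2, 1.0]]

# The threshold is learned once on the training set and reused unchanged.
threshold = median(risk_scores(train_features, coefficients))
groups = assign_risk_group(risk_scores(valid_features, coefficients), threshold)
print(threshold, groups)
```

Fixing the cutoff on the training set and reusing it on every validation set, as the text describes, avoids tuning the split on the data being evaluated.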
Statistical analysis
OS was measured from the date of initiation of ICI to the date of death and censored at the date of last follow-up for survivors. PFS was calculated from the date of initiation of ICI to the date of recurrence or death, whichever occurred earlier, and censored at the date of last follow-up for those still alive without recurrence. The HistoTIL classifier assigned every patient to either a low- or high-risk survival group based on the median risk score learned from training set D1. Kaplan-Meier survival curves were obtained to visualize the differences in sets D2, D3, D4, and D5. The log-rank test was then used to evaluate the prognostic performance of HistoTIL in predicting PFS and OS on validation sets. Hazard ratios (HRs) between survival groups with 95% CIs and P values were calculated. Multivariable Cox analysis was conducted to assess the relationships between the various covariates and survival while adjusting for baseline factors (i.e., tumor subtypes and pathological stage). The Fisher exact test was used to test for differences in demographics among NSCLC patients across the training and validation datasets. All statistical tests were two sided with the significance level set at 0.05. In addition, we built a logistic regression model using the same HistoTIL features identified from D1 to predict patients’ response versus nonresponse to ICI. The response category included both complete response (CR) and partial response (PR), whereas the nonresponse category included progressive disease (PD) and stable disease (SD), as defined by RECIST (). The response prediction was evaluated by the area under the receiver operating characteristic (ROC) curve (AUC).
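For readers unfamiliar with the Kaplan-Meier estimator used for the survival curves, the product-limit calculation can be sketched in a few lines. This is a generic textbook version with made-up follow-up data, not the software used in the study; subjects censored at an event time are counted as at risk at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times:  follow-up durations (e.g., months from ICI initiation)
    events: 1 if the event (death or progression) was observed, 0 if censored
    """
    order = sorted(zip(times, events))
    n_at_risk = len(order)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        n_events = 0
        n_removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(order) and order[i][0] == t:
            n_events += order[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_removed
    return curve


# Hypothetical follow-up data for five patients.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
print(curve)
```

Computing one such curve per risk group gives the pair of curves that the log-rank test then compares.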
ICI agent separated analysis
To evaluate whether HistoTIL could stratify ICI-treated patients into low- and high-risk categories independent of specific agents (i.e., nivolumab, N = 141; pembrolizumab, N = 53; atezolizumab, N = 12; and combination of agents, N = 20), we further conducted the Kaplan-Meier survival analysis for each subset of patients treated with different ICI agents. The same cutoff value of HistoTIL learned from D1 was used to assign patients to low- and high-risk survival groups.
Comparison with PD-L1 expression and manual TIL estimation
HistoTIL was also compared with the current PD-L1 expression–based patient selection strategy, especially to evaluate the performance of HistoTIL in the low–PD-L1 expression group. D3 with available IHC PD-L1 expression data was used to validate the HistoTIL prediction of clinical outcome after ICI in PD-L1 low and high groups separately (PD-L1 cutoff was empirically determined at 1% expression). Expert thoracic pathologists with experience in TIL grading were involved in deciding the level of lymphocyte infiltration for each WSI in D1. Four categories were provided to the pathologists for TIL grading: no infiltration (visual absence of TILs from the entire slide), low (TILs found in 1 to 33% of the tumor area), moderate (TILs found in 34 to 66% of the tumor region), and high (more than 66% of the tumor bed infiltrated with lymphocytes). The same Kaplan-Meier survival analysis was conducted to compare the survival difference between manually defined low- and high-TIL groups.
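The four grading categories can be written down as a simple lookup, shown here only to make the cutoffs explicit. The grading in the study was done visually by pathologists, not by code; the function name is hypothetical, and the handling of values that fall between the stated integer bounds (for example, 33.5%) is an assumption.

```python
def til_grade(til_percent: float) -> str:
    """Map the percentage of tumor area containing TILs to a grading category.

    Cutoffs follow the text: none (no TILs), low (1 to 33%),
    moderate (34 to 66%), high (more than 66%). Values below 1% are treated
    as 'none' and values between 33 and 34% as 'moderate' (assumptions).
    """
    if not 0 <= til_percent <= 100:
        raise ValueError("til_percent must be between 0 and 100")
    if til_percent < 1:
        return "none"
    if til_percent <= 33:
        return "low"
    if til_percent <= 66:
        return "moderate"
    return "high"


for pct in (0, 20, 50, 80):
    print(pct, til_grade(pct))
```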