Differentiation between immune checkpoint inhibitor-related and radiation pneumonitis in lung cancer by CT radiomics and machine learning.

Jun Cheng, Yi Pan, Wei Huang, Kun Huang, Yanhai Cui, Wenhui Hong, Lingling Wang, Dong Ni, Peixin Tan.

Abstract

PURPOSE: Consolidation immunotherapy after completion of chemoradiotherapy has become the standard of care for unresectable locally advanced non-small cell lung cancer and can induce potentially severe and life-threatening adverse events, including both immune checkpoint inhibitor-related pneumonitis (CIP) and radiation pneumonitis (RP), which are very challenging for radiologists to diagnose. Differentiating between CIP and RP has significant implications for clinical management such as the treatments for pneumonitis and the decision to continue or restart immunotherapy. The purpose of this study is to differentiate between CIP and RP by a CT radiomics approach.
METHODS: We retrospectively collected the CT images and clinical information of patients with pneumonitis who received immune checkpoint inhibitor (ICI) only (n = 28), radiotherapy (RT) only (n = 31), and ICI+RT (n = 14). Three kinds of radiomic features (intensity histogram, gray-level co-occurrence matrix [GLCM] based, and bag-of-words [BoW] features) were extracted from CT images, which characterize tissue texture at different scales. Classification models, including logistic regression, random forest, and linear SVM, were first developed and tested in patients who received ICI or RT only with 10-fold cross-validation and further tested in patients who received ICI+RT using clinicians' diagnosis as a reference.
RESULTS: Using 10-fold cross-validation, the classification models built on the intensity histogram features, GLCM-based features, and BoW features achieved an area under curve (AUC) of 0.765, 0.848, and 0.937, respectively. The best model was then applied to the patients receiving combination treatment, achieving an AUC of 0.896.
CONCLUSIONS: This study demonstrates the promising potential of radiomic analysis of CT images for differentiating between CIP and RP in lung cancer, which could be a useful tool to attribute the cause of pneumonitis in patients who receive both ICI and RT.
© 2022 The Authors. Medical Physics published by Wiley Periodicals LLC on behalf of American Association of Physicists in Medicine.

Keywords:  CT radiomics; immune checkpoint inhibitor-related pneumonitis; lung cancer; machine learning; radiation pneumonitis

Year:  2022        PMID: 35026041      PMCID: PMC9306809          DOI: 10.1002/mp.15451

Source DB:  PubMed          Journal:  Med Phys        ISSN: 0094-2405            Impact factor:   4.506


INTRODUCTION

Consolidation immunotherapy with the immune checkpoint inhibitor (ICI) durvalumab following concurrent chemoradiotherapy (CCRT) is the current standard of care for patients with stage III unresectable non‐small cell lung cancer (NSCLC), as the phase 3 PACIFIC trial showed that administering ICI after CCRT significantly improved progression‐free survival and overall survival compared with placebo. A number of clinical trials are currently evaluating the role of concurrent/induction ICI with chemoradiotherapy in locally advanced NSCLC. While ICI in conjunction with radiotherapy (RT) has shown promising prospects, treatment‐related pneumonitis, including radiation pneumonitis (RP) and checkpoint inhibitor‐related pneumonitis (CIP), is one of the most frequent and clinically challenging adverse events in the combination setting and should raise concerns. CIP is a rare but potentially fatal side effect with an incidence ranging from 1% to 6% for any grade and <1% to ∼3% for grade 3 or higher, as reported in clinical trials for advanced NSCLC patients treated with PD‐1/PD‐L1 inhibitors. In addition to ICI, radiation also leads to lung damage and induces pneumonitis. The incidence of RP is 14% to 49% for grade 2 or higher and 4% to 9% for grade 3 or higher in NSCLC patients within 6 months after radical RT. Recent studies reported that the incidence of CIP may be higher because of the potential synergy with RT and lung injury caused by RT. A secondary analysis of the KEYNOTE‐001 study demonstrated that the incidence of pneumonitis of any grade was 63% in patients who had received prior RT versus 40% in those who had not (p = 0.052). In the PACIFIC study, the incidence of pneumonitis of any grade was higher with consolidation ICI (33.9% vs. 24.8%). A recent multicenter retrospective study reported a much higher incidence (81.8%) of any grade pneumonitis in a real‐world cohort of patients treated with durvalumab after CCRT.
It is challenging for clinicians to differentiate between CIP and RP because the clinical and radiologic features of CIP are very similar to those of RP, with nonproductive cough, unresolved dyspnea, and nonspecific interstitial pneumonia in the periphery or anywhere in the lungs. Differentiating between CIP and RP can have significant implications for clinical management such as the treatments for pneumonitis and the decision to continue or restart immunotherapy. Although a few studies discussed the typical radiologic appearance of CIP and RP, these radiologic findings are only suggestive because pneumonitis has a wide spectrum of radiologic appearance. For example, RP is usually, but not always, limited to the radiation field of the lung. Figure 1 shows some typical CT images of CIP and RP and two images of RP resembling CIP. In lung cancer, CT is routinely used for clinical management, including diagnosis, radiation treatment planning, and surveillance of treatment response. CT‐based radiomics approaches have been successfully applied to various tasks such as differentiation between benign and malignant lesions; prediction of prognosis, treatment response, and distant metastasis; and associations between genotype and imaging phenotype. There are very few studies focusing on the differentiation between CIP and RP using radiomic features, as ICI therapy has been used in lung cancer for only a few years and the incidence of CIP is relatively low.
FIGURE 1

Examples of CT images in patients with pneumonitis who received ICI only or RT only. (a) CT images in two patients with CIP demonstrated ground‐glass and reticular opacities involving both lungs with a diffuse distribution, representing a cryptogenic organizing pneumonia pattern. Left patient also presented a nonspecific interstitial pneumonia pattern. (b) CT images in two patients with RP demonstrated reticular opacities, consolidations, and bronchiectasis. The inflammatory lesions were within radiation field and had clear boundaries. (c) CT images in two patients with RP demonstrated radiologic features resembling those of CIP. CIP, immune checkpoint inhibitor‐related pneumonitis; ICI, immune checkpoint inhibitor; RP, radiation pneumonitis; RT, radiotherapy

In this study, we present a CT radiomics approach to differentiate between CIP and RP in lung cancer patients. We collected three cohorts of patients with pneumonitis who received ICI only, RT only, and ICI+RT, respectively. Three different kinds of radiomic features were extracted from CT images. The utility of these radiomic features for classifying CIP and RP was first evaluated using the ICI and RT cohorts and further validated using the ICI+RT cohort.

MATERIALS AND METHODS

Patients and CT image acquisition

This retrospective study was approved by the Ethics Committee of Guangdong Provincial People's Hospital. We collected three datasets (ICI, RT, and ICI+RT datasets) which contained the CT images and clinical information of patients who developed pneumonitis. The ICI dataset consisted of 28 lung cancer patients who developed CIP after ICI therapy. Patients were excluded from the analysis if they received thoracic RT before the occurrence of CIP. The RT dataset consisted of 31 patients randomly selected from locally advanced NSCLC patients who were treated with radical thoracic RT in a total dose of 60 to 66 Gy. These patients developed RP within 6 months after RT. Patients were excluded if they received ICI therapy before or after RT. The ICI+RT dataset consisted of 14 patients who developed treatment‐related pneumonitis after induction ICI therapy followed by thoracic RT or consolidation ICI therapy following thoracic RT. Note that we excluded patients with clear alternative etiologies, such as proven active pulmonary infection, tuberculosis, pulmonary embolism, or tumor progression. A flowchart for preparing the patient cohorts is shown in Figure 2, and a summary of patient characteristics is provided in Table 1.
FIGURE 2

Flowchart for preparing the ICI dataset (a), RT dataset (b), and ICI+RT dataset (c). ICI, immune checkpoint inhibitor; RT, radiotherapy

TABLE 1

Clinical characteristics of the patients in the immune checkpoint inhibitor (ICI), radiation therapy (RT), and ICI+RT datasets

Characteristic           ICI      RT       ICI+RT
Patient No.              28       31       14
Sex
  Female                 0        5        2
  Male                   28       26       12
Age (year)
  Median                 62       62       62
  Range                  39–75    44–70    41–78
Smoking (pack‐year)
  Median                 40       40       26
  Range                  0–120    0–150    0–60
Pneumonitis grade
  Grade 1                4        19       5
  Grade 2                12       3        8
  Grade 3                11       9        1
  Grade 4                1        0        0
We defined CIP by (1) a treatment history of ICI therapy; (2) symptoms of nonproductive cough, unresolving dyspnea, fever, and chest pain; and (3) varied radiographic findings on chest CT imaging, such as cryptogenic organizing pneumonia, with ground‐glass or consolidative opacities in a peripheral or peribronchial distribution, or nonspecific interstitial pneumonia, with ground‐glass opacities and reticular opacities primarily in the peripheral and lower lungs, or pneumonitis presenting as acute interstitial pneumonia and acute respiratory distress syndrome. We defined RP by (1) a treatment history of RT; (2) symptoms of shortness of breath, low‐grade fever, and nonproductive cough; and (3) radiographic findings on chest CT imaging of patchy consolidation that lies roughly within the area of the high‐dose radiation field and does not conform to normal lobar anatomy. The grade of CIP and RP was scored by treating physicians according to the Common Terminology Criteria for Adverse Events v5.0. The CT examinations were performed using CT scanners from different manufacturers, including Siemens (Somatom Definition Flash; Erlangen, Germany), General Electric (Lightspeed VCT 99; Waukesha, WI, USA), and Philips (iCT 256 and Ingenuity; Cleveland, Ohio, USA). Thoracic CT scans containing the entire lung were analyzed utilizing a multi‐slice helical technique at 120 kVp, mean exposure of 158 mA, mean pixel spacing of 0.78 mm, and slice thickness of 5 mm.

Analysis workflow

The analysis workflow of our study, shown in Figure 3, consists of three steps. First, we collected CT images and manually segmented the regions of interest (ROIs), that is, the inflammatory lesions. Next, three kinds of radiomic features that characterize lung tissue texture at different scales were extracted from the ROIs. Finally, we built classification models on the ICI and RT datasets. The models were first validated on the ICI and RT datasets with 10‐fold cross‐validation and were then tested on the ICI+RT dataset.
FIGURE 3

Workflow scheme. Three kinds of radiomic features (intensity histogram, GLCM‐based features, and bag‐of‐words features) were extracted from the CT images of patients who received ICI only, RT only, and ICI+RT. After feature selection, classification models were built on the selected features to classify patients into CIP or RP. CIP, checkpoint inhibitor‐related pneumonitis; GLCM, gray‐level co‐occurrence matrix; ICI, immune checkpoint inhibitor; RP, radiation pneumonitis; RT, radiotherapy


CT image feature extraction

We extracted radiomic features from the ROIs (inflammatory lesions), which were annotated by an experienced radiation oncologist (PT) and further reviewed by a senior radiation oncologist (YP). Specifically, three feature extraction methods were employed to quantify the texture of ROIs: intensity histogram features, gray‐level co‐occurrence matrix (GLCM)‐based features, and bag‐of‐words (BoW) features. The three kinds of features describe tissue texture at increasing scales. Intensity histogram is based on individual pixels, GLCM is based on the co‐occurrence of two pixels, and BoW is based on small patches (e.g., 5 × 5 image patch) (see the illustration in Figure 3). Essentially, all three kinds of features are based on the counts of different‐scale patterns, so we can simply calculate these features slice‐by‐slice and aggregate them across the whole CT volume.

Intensity histogram features

To extract intensity histogram features, we first partitioned the pixel values into a specific number of equally spaced bins (i.e., pixel values were quantized to a specific number of gray levels) and then calculated the bin counts using the pixels within the ROI. The bin counts were L1‐normalized (i.e., the sum is equal to 1) to remove the effect of ROIs having different sizes. The normalized bin counts were used as the final intensity histogram features.
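The binning and L1 normalization described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `intensity_histogram`, the intensity range passed as `lo` and `hi`, and the toy pixel values are all assumptions made for the example.

```python
def intensity_histogram(pixels, n_bins, lo, hi):
    """L1-normalized histogram of ROI pixel values over n_bins equal bins on [lo, hi]."""
    if not pixels:
        raise ValueError("ROI contains no pixels")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for value in pixels:
        # Values outside [lo, hi] are clipped into the first or last bin.
        idx = int((value - lo) / width)
        idx = min(max(idx, 0), n_bins - 1)
        counts[idx] += 1
    total = sum(counts)
    # Dividing by the pixel count removes the dependence on ROI size.
    return [c / total for c in counts]


# Hypothetical ROI pixel values pooled over all slices (the range is assumed).
roi_pixels = [-950, -900, -700, -650, -300, -250, -100, 50]
features = intensity_histogram(roi_pixels, n_bins=4, lo=-1000, hi=200)
print(features)
```

Because the counts are normalized, pooling the pixels of all slices before binning gives the same result as summing per-slice counts and normalizing once at the end.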

GLCM‐based features

GLCM is commonly used to characterize the texture in images. A GLCM is a 2D histogram of co‐occurring intensities (gray levels) at a given offset. There are two parameters involved in the construction of GLCM. One is the number of gray levels, and the other is the offset between the pixel of interest and its neighbor. For a given number of gray levels and the distance between two pixels, four GLCMs in four directions (0, 45, 90, and 135 degrees) were constructed. Based on each GLCM, four second‐order statistical features (contrast, correlation, energy, and homogeneity) were calculated, resulting in 16 texture features per image.
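A small sketch of how four directional GLCMs and their four statistics yield 16 features is given below. Several points are assumptions rather than details from the text: the matrix is non-symmetric and normalized to sum to 1, the four statistics follow their commonly used definitions, the input image is already quantized to integer gray levels, and ROI masking is omitted for brevity.

```python
import math


def glcm(img, levels, dr, dc):
    """Normalized co-occurrence matrix of gray-level pairs at offset (dr, dc)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[img[r][c]][img[r2][c2]] += 1
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("offset is larger than the image")
    return [[v / total for v in row] for row in counts]


def glcm_features(p):
    """Contrast, correlation, energy, and homogeneity of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, j, v in cells)
    mu_j = sum(j * v for i, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, j, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for i, j, v in cells))
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    # Correlation is undefined when one marginal has zero variance.
    correlation = cov / (sd_i * sd_j) if sd_i > 0 and sd_j > 0 else float("nan")
    energy = sum(v * v for i, j, v in cells)
    homogeneity = sum(v / (1 + abs(i - j)) for i, j, v in cells)
    return [contrast, correlation, energy, homogeneity]


def glcm_feature_vector(img, levels, distance):
    """Concatenate the four statistics over the 0, 45, 90, and 135 degree offsets."""
    d = distance
    offsets = [(0, d), (-d, d), (-d, 0), (-d, -d)]
    feats = []
    for dr, dc in offsets:
        feats.extend(glcm_features(glcm(img, levels, dr, dc)))
    return feats


# Toy 4 x 4 image already quantized to 4 gray levels (values are illustrative).
toy = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 2, 2, 2],
       [2, 2, 3, 3]]
vec = glcm_feature_vector(toy, levels=4, distance=1)
print(len(vec))
```

In practice only pixel pairs lying inside the lesion mask would be counted; that restriction is left out here to keep the sketch short.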

BoW features

The BoW model is a feature representation method originally used in natural language processing and information retrieval. As its name implies, this model can represent a text or document by converting it into a bag of words, which is the occurrence counts of the most frequently used words. The BoW model has also been used in computer vision. In computer vision, the BoW model, sometimes called the bag‐of‐visual‐words model, represents an image as a vector of occurrence counts of a vocabulary of local image features. The vocabulary of local image features, equivalent to frequently used words in document classification, is usually generated by clustering local image features. The BoW representation can be obtained in three steps: extraction of local image features, construction of the visual vocabulary, and representation of images as the occurrence counts of visual words. In our work, we first used raw image patches as the local features. 2D image patches were densely sampled from the ROI and vectorized. Next, to create the visual vocabulary, we ran the k‐means algorithm on the extracted local features. The words in the visual vocabulary were then defined as the learned cluster centers. Finally, for each patient, each local feature was assigned to one of the visual words via vector quantization based on Euclidean distance. The BoW feature representation of a patient is the L1‐normalized counts of words.
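The three BoW steps (dense patch extraction, k-means vocabulary, L1-normalized word counts) can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions: the k-means routine is a basic Lloyd iteration with random initialization from the data, the whole toy image is treated as the ROI, and all function names and toy images are hypothetical.

```python
import random


def extract_patches(img, size):
    """Densely sample size x size patches and flatten each into a vector."""
    rows, cols = len(img), len(img[0])
    return [[img[r + i][c + j] for i in range(size) for j in range(size)]
            for r in range(rows - size + 1)
            for c in range(cols - size + 1)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, centers):
    """Index of the closest center in squared Euclidean distance."""
    return min(range(len(centers)), key=lambda k: sq_dist(vec, centers[k]))


def kmeans(vectors, k, n_iter=20, seed=0):
    """Basic Lloyd iterations; the returned centers are the visual words."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centers)].append(v)
        for idx, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous center
                centers[idx] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bow_features(patches, centers):
    """L1-normalized counts of the visual word assigned to each patch."""
    counts = [0] * len(centers)
    for p in patches:
        counts[nearest(p, centers)] += 1
    total = sum(counts)
    return [c / total for c in counts]


# Two hypothetical 6 x 6 "ROIs": one flat, one with a vertical edge.
flat = [[0] * 6 for _ in range(6)]
edge = [[0, 0, 0, 10, 10, 10] for _ in range(6)]

size = 3
train_patches = extract_patches(flat, size) + extract_patches(edge, size)
vocabulary = kmeans(train_patches, k=4)
bow_flat = bow_features(extract_patches(flat, size), vocabulary)
bow_edge = bow_features(extract_patches(edge, size), vocabulary)
print(bow_flat, bow_edge)
```

The vocabulary is learned once from the pooled training patches and then reused to encode every patient, so all patients share the same feature dimensions.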

Machine‐learning methods for classification

We first trained and tested different classifiers, including logistic regression, random forest, and linear SVM, on the ICI and RT datasets with 10‐fold cross‐validation. In each of the 10 rounds, we first performed feature selection and then trained the classification model based on the selected features using the training set. The learned classification model was then applied to the held‐out test set to make predictions. After 10 rounds were completed, each sample was predicted with a label and a probability. We then applied the model trained on the ICI and RT datasets to the patients in the ICI+RT dataset. For feature selection, we performed a two‐sided Mann–Whitney U‐test on each feature and selected those with a p‐value less than 0.05. We adopted the R package glmnet for logistic regression. Note that in all experiments, feature selection and model training were performed only using the training set, with the test set untouched.
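The key point of this procedure, that feature selection is repeated inside every fold using only the training samples, can be sketched as follows. The Mann–Whitney p-value here uses the normal approximation with tie and continuity corrections, which is an assumption (an exact test may give different p-values for small samples). The contiguous fold split, the toy data, and the omission of the classifier step are simplifications made for the example.

```python
import math


def rank_with_ties(values):
    """1-based ranks where tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_p(x, y):
    """Two-sided p-value from the normal approximation of the U statistic."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = rank_with_ties(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    ties = {}
    for v in pooled:
        ties[v] = ties.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in ties.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0
    z = max((abs(u1 - n1 * n2 / 2) - 0.5) / math.sqrt(var_u), 0.0)
    return math.erfc(z / math.sqrt(2))


def select_features(X, y, alpha=0.05):
    """Indices of features with p < alpha on the data passed in."""
    keep = []
    for f in range(len(X[0])):
        pos = [row[f] for row, label in zip(X, y) if label == 1]
        neg = [row[f] for row, label in zip(X, y) if label == 0]
        if mann_whitney_p(pos, neg) < alpha:
            keep.append(f)
    return keep


def kfold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
y = [i % 2 for i in range(20)]
X = [[i // 2 + (10 if y[i] else 0), i // 2] for i in range(20)]

selected_per_fold = []
for test_idx in kfold_indices(len(X), k=5):
    held_out = set(test_idx)
    X_train = [X[i] for i in range(len(X)) if i not in held_out]
    y_train = [y[i] for i in range(len(X)) if i not in held_out]
    keep = select_features(X_train, y_train)  # the test fold is never used here
    selected_per_fold.append(keep)
    # A classifier would now be fit on X_train restricted to `keep` and
    # applied to the held-out fold; that step is omitted in this sketch.
print(selected_per_fold)
```

Selecting features on the full dataset before splitting would leak information from the test folds into the model, which is why the selection sits inside the loop.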

Evaluation metrics

The receiver operating characteristic (ROC) analysis was performed. The area under the ROC curve (AUC) and its 95% confidence interval were calculated using the R package pROC. We computed the Youden's index (defined as sensitivity + specificity ‐ 1) for each of the points on the ROC curve and used the maximum value of this index as a criterion for selecting the optimal cut‐off point. Then the accuracy, sensitivity, and specificity at the optimal cut‐off point were reported. The accuracy is the proportion of samples being correctly classified. In our classification model, we considered RP as the positive class. Therefore, the sensitivity measures the proportion of RP cases that are correctly classified, and the specificity is the proportion of CIP cases that are correctly classified.
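The cut-off selection by Youden's index can be illustrated with a short sketch. It assumes that candidate cut-offs are the observed scores themselves and that a sample is called positive when its score is at or above the cut-off; a dedicated ROC package may place thresholds differently. The scores and labels are made-up toy values.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximizing sens + spec - 1.

    labels use 1 for the positive class (RP in this study) and 0 for the
    negative class (CIP); a sample is positive when score >= cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= cut)
        tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < cut)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cut, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical predicted probabilities of RP and true labels.
scores = [0.10, 0.40, 0.35, 0.80, 0.70, 0.90]
labels = [0, 0, 1, 1, 0, 1]
cutoff, sensitivity, specificity = youden_cutoff(scores, labels)
predicted = [1 if s >= cutoff else 0 for s in scores]
accuracy = sum(p == l for p, l in zip(predicted, labels)) / len(labels)
print(cutoff, sensitivity, specificity, accuracy)
```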

RESULTS

Experimental settings

Intensity histogram, GLCM‐based, and BoW features were extracted with different parameter settings. For the intensity histogram features, we tested the number of gray levels from {20, 40, 60, 80, 100}. For the GLCM‐based features, we tested the number of gray levels from {20, 40, 60, 80, 100} and the distance from {1, 2, 3, 4}. For the BoW features, we tested the patch size from {3, 5, 7, 9, 11} and the vocabulary size from {16, 32, 64, 128, 256}. For each kind of feature, we reported the results with the highest AUC. In addition, we investigated whether including more boundary area would affect classification performance. To this end, we dilated the annotated ROI mask using a disk‐shaped structure element with a radius of 5 and 10 pixels, respectively.
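The ROI enlargement by dilation with a disk-shaped structuring element can be sketched as below. The disk here is an exact Euclidean disk on the pixel grid, which is an assumption; image-processing toolboxes sometimes approximate the disk differently. The toy mask is illustrative only.

```python
def disk_offsets(radius):
    """Pixel offsets (dr, dc) lying within a Euclidean disk of the given radius."""
    return [(dr, dc)
            for dr in range(-radius, radius + 1)
            for dc in range(-radius, radius + 1)
            if dr * dr + dc * dc <= radius * radius]


def dilate(mask, radius):
    """Binary dilation: every foreground pixel stamps a disk onto the output."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    offsets = disk_offsets(radius)
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                for dr, dc in offsets:
                    r2, c2 = r + dr, c + dc
                    if 0 <= r2 < rows and 0 <= c2 < cols:
                        out[r2][c2] = 1
    return out


# Toy 7 x 7 mask with a single annotated pixel in the center.
mask = [[0] * 7 for _ in range(7)]
mask[3][3] = 1
dilated = dilate(mask, radius=2)
print(sum(map(sum, dilated)))
```

With a radius of 0 the mask is returned unchanged, which corresponds to the R = 0 setting in which no dilation is performed.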

Classification performance on the ICI and RT datasets

Based on each of the three kinds of radiomic features, classification models (logistic regression, random forest, and linear SVM) were trained and evaluated with 10‐fold cross‐validation on the ICI and RT datasets. Table 2 shows the classification performance of different features and ROI sizes. Using the originally annotated ROI, GLCM‐based features outperformed intensity histogram and BoW features (AUC: 0.848 vs. 0.677 and 0.834). As the size of ROI increased, the performance of BoW features generally improved and then declined. The best performance (AUC = 0.937) was achieved when we used BoW features, logistic regression, and ROI dilation with five pixels.
TABLE 2

Classification performance of different kinds of features on the immune checkpoint inhibitor and radiation therapy datasets

              Intensity histogram       GLCM                      BoW
Index         LR      RF      SVM       LR      RF      SVM       LR      RF      SVM
R = 0, Acc    0.644   0.729   0.661     0.797   0.746   0.797     0.746   0.797   0.797
R = 0, Sen    0.645   0.774   0.839     0.968   0.710   0.871     0.548   0.871   0.742
R = 0, Spe    0.643   0.679   0.464     0.607   0.786   0.714     0.964   0.714   0.857
R = 0, AUC    0.608   0.677   0.634     0.848   0.758   0.817     0.815   0.813   0.834
R = 5, Acc    0.644   0.780   0.593     0.797   0.729   0.814     0.915   0.780   0.881
R = 5, Sen    0.968   0.839   0.968     0.806   0.581   0.871     0.903   0.710   0.839
R = 5, Spe    0.286   0.714   0.179     0.786   0.893   0.750     0.929   0.857   0.929
R = 5, AUC    0.560   0.763   0.517     0.821   0.764   0.829     0.937   0.865   0.926
R = 10, Acc   0.610   0.780   0.661     0.797   0.763   0.831     0.814   0.814   0.831
R = 10, Sen   0.871   0.774   0.839     0.774   0.710   0.903     0.742   0.710   0.871
R = 10, Spe   0.321   0.786   0.464     0.821   0.821   0.750     0.893   0.929   0.786
R = 10, AUC   0.505   0.765   0.563     0.789   0.784   0.825     0.894   0.866   0.884

Note: We enlarged the ROI mask by image dilation using a disk‐shaped structure element with a radius (R) of 0, 5, and 10 pixels. R = 0 means no image dilation was performed. We tested different classifiers including logistic regression (LR), random forest (RF), and linear SVM, and reported different metrics including accuracy (Acc), sensitivity (Sen), specificity (Spe), and area under ROC curve (AUC).

Abbreviations: ROC, receiver operating characteristic; ROI, region of interest.

Figure 4a shows the ROC curves which correspond to the best performance achieved by the three kinds of features. Using the cut‐off of the classifier's output that maximized the Youden's index (sensitivity + specificity − 1), the corresponding accuracy, sensitivity, and specificity of the classifier built on BoW features were 0.915, 0.903, and 0.929, respectively. This means that 2 out of 28 patients with CIP (negative class) were misclassified into RP (positive class) while 3 out of 31 patients with RP were misclassified into CIP.
FIGURE 4

Performance of differentiating CIP and RP. (a) ROC curves for the models with the best performance on the ICI and RT datasets, using intensity histogram features, GLCM based features, and BoW features, respectively. The 95% confidence intervals are 0.638–0.892 for intensity, 0.750–0.946 for GLCM, and 0.873–1 for BoW. (b) ROC curve for classifying CIP and RP in the ICI+RT dataset. The model with the highest AUC in (a) was used. The 95% confidence interval for AUC is 0.714–1. AUC, area under curve; BoW, bag‐of‐words; CIP, checkpoint inhibitor‐related pneumonitis; GLCM, gray‐level co‐occurrence matrix; ICI, immune checkpoint inhibitor; ROC, receiver operating characteristic; RP, radiation pneumonitis; RT, radiotherapy

We further investigated the impact of the parameters of feature extraction methods on classification performance. The parameters of the three feature extraction methods are described in the previous section. Tables S1–S3 show the impact of different parameters for intensity histogram, GLCM based, and BoW features, respectively, when logistic regression was used. The intensity histogram features achieved the highest AUC when the number of gray levels was set to 60. The GLCM‐based features achieved the best performance when the number of gray levels and the distance between two pixels were set to 60 and 3. The BoW features achieved the best performance when the patch size and vocabulary size were set to 9 and 128. A general observation from those tables was that performance got better when relatively larger values of the parameters were used and that, however, the performance began to decline if the parameters were too large.

Assessing importance of features

The best performance was achieved using the BoW features, logistic regression, and ROI dilation with five pixels. The patch size and vocabulary size for the BoW features were set to 9 and 128. This means the BoW features are 128‐dimensional (see Section 2.3 for the details of this method). To identify the features that robustly and significantly contributed to the model, we recorded the selected features and their coefficients in each round of the 10‐fold cross‐validation and computed their counts of selection and mean coefficients. We selected the top nine features with the largest mean coefficients (regardless of the sign) from the features that were selected at least eight times. The visualization of the image patches (9 × 9 pixels) that belong to each of the nine visual words is shown in Figure 5a. As we can see, the 400 (20 × 20) image patches in each of the nine panels present a very similar pattern. Along with each panel, the index and mean coefficient of each feature are also provided. A positive coefficient means that the corresponding image patch pattern is more likely to appear in the RP class (RP was regarded as the positive class when training classifiers), whereas a negative coefficient means that the corresponding image patch pattern tends to appear more frequently in the CIP class. Figure 5b shows the average occurrence frequency of the nine visual words of three CIP patients and three RP patients that were most confidently predicted by our model. We can see clearly that for the top three most significant visual words, the 19th and 113th visual words have a much higher frequency in RP than in CIP, while the 123rd visual word is the opposite.
FIGURE 5

Visualization of the nine visual words that most robustly and significantly contributed to our classification model. (a) Example image patches showing different patterns for each visual word. The index and mean coefficient of each visual word are shown in the white box. (b) Bar graph of the average occurrence frequency of the nine visual words of three CIP patients and three RP patients that were most confidently predicted by our model. The visual words with positive coefficients are more likely to appear in the positive class (i.e., RP) such as visual words 19, 113, and 120. CIP, checkpoint inhibitor‐related pneumonitis; RP, radiation pneumonitis
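The rule used in this section to pick robust features (selected in at least eight of the ten folds, then ranked by the magnitude of the mean coefficient) can be sketched as follows. One detail is an assumption: the mean coefficient is taken over the folds in which the feature was selected. The per-fold coefficients below are invented purely for illustration and are not values from the study.

```python
def stable_top_features(fold_coefs, min_count=8, top_k=9):
    """Rank features that were selected in at least min_count folds.

    fold_coefs holds one dict per fold, mapping a feature index to the
    coefficient it received in that fold (only selected features appear).
    Returns (feature index, mean coefficient) pairs sorted by |mean|.
    """
    counts, sums = {}, {}
    for coefs in fold_coefs:
        for feature, weight in coefs.items():
            counts[feature] = counts.get(feature, 0) + 1
            sums[feature] = sums.get(feature, 0.0) + weight
    means = {f: sums[f] / counts[f] for f in counts if counts[f] >= min_count}
    ranked = sorted(means, key=lambda f: abs(means[f]), reverse=True)
    return [(f, means[f]) for f in ranked[:top_k]]


# Hypothetical coefficients from 10 folds for three features.
folds = []
for fold in range(10):
    coefs = {0: 2.0}            # selected in every fold, positive sign
    if fold < 9:
        coefs[1] = -1.5         # selected in 9 folds, negative sign
    if fold < 3:
        coefs[2] = 5.0          # large but selected in only 3 folds
    folds.append(coefs)

top = stable_top_features(folds, min_count=8, top_k=2)
print(top)
```

Requiring a minimum selection count before ranking keeps features with large but unstable coefficients, such as feature 2 in the toy data, out of the final list.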


Evaluation in patients receiving both ICI and RT

To further validate our method, we applied the classification model with the highest AUC on the ICI and RT datasets to the patients in the ICI+RT dataset, in which patients received both treatments. Performance on the ICI+RT dataset was evaluated using clinicians’ diagnosis as a reference, and the cause of pneumonitis was diagnosed on the basis of radiologic features, clinical symptoms, and onset time of pneumonitis. Three radiation oncologists participated in the diagnosis independently, and the final class label of each patient was determined by majority voting. Table 3 provides a summary of each oncologist's diagnosis, final voting result, and our model's prediction. The three oncologists made the exact same diagnosis for 8 out of 14 patients (Fleiss's kappa = 0.417). Our model generalized well on the ICI+RT dataset, achieving an accuracy of 0.857 and AUC of 0.896. The corresponding ROC curve is provided in Figure 4b.
TABLE 3

Summary of radiation oncologists’ diagnosis, majority voting result, and model's prediction for each patient in the ICI+RT dataset

Patient index   Oncologist 1   Oncologist 2   Oncologist 3   Majority voting   Model‐predicted probability of being RP
1               CIP            CIP            RP             CIP               0.255
3               CIP            CIP            CIP            CIP               0.954
7               CIP            CIP            CIP            CIP               0.393
9               RP             CIP            CIP            CIP               0.761
13              RP             CIP            CIP            CIP               0.504
14              CIP            CIP            CIP            CIP               0.144
2               CIP            RP             RP             RP                0.892
4               RP             RP             RP             RP                0.972
5               RP             RP             RP             RP                1.000
6               RP             RP             CIP            RP                0.708
8               RP             RP             RP             RP                0.995
10              RP             RP             RP             RP                0.789
11              CIP            RP             RP             RP                0.913
12              RP             RP             RP             RP                1.000

Abbreviations: CIP, checkpoint inhibitor‐related pneumonitis; ICI, immune checkpoint inhibitor; RP, radiation pneumonitis; RT, radiotherapy.
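As a worked check, the agreement and AUC figures reported for the ICI+RT dataset can be recomputed from the entries of Table 3. The sketch below uses the standard Fleiss' kappa formula and a pairwise (rank-based) AUC. The data are transcribed from Table 3; the function names are only illustrative.

```python
table3 = [
    # (patient, oncologist 1, oncologist 2, oncologist 3, model P(RP))
    (1, "CIP", "CIP", "RP", 0.255), (3, "CIP", "CIP", "CIP", 0.954),
    (7, "CIP", "CIP", "CIP", 0.393), (9, "RP", "CIP", "CIP", 0.761),
    (13, "RP", "CIP", "CIP", 0.504), (14, "CIP", "CIP", "CIP", 0.144),
    (2, "CIP", "RP", "RP", 0.892), (4, "RP", "RP", "RP", 0.972),
    (5, "RP", "RP", "RP", 1.000), (6, "RP", "RP", "CIP", 0.708),
    (8, "RP", "RP", "RP", 0.995), (10, "RP", "RP", "RP", 0.789),
    (11, "CIP", "RP", "RP", 0.913), (12, "RP", "RP", "RP", 1.000),
]
CATEGORIES = ("CIP", "RP")


def fleiss_kappa(rows, categories):
    """Fleiss' kappa for a fixed number of raters per subject."""
    n_subjects, n_raters = len(rows), len(rows[0])
    totals = {c: 0 for c in categories}
    agreement = 0.0
    for votes in rows:
        counts = {c: votes.count(c) for c in categories}
        for c in categories:
            totals[c] += counts[c]
        squares = sum(v * v for v in counts.values())
        agreement += (squares - n_raters) / (n_raters * (n_raters - 1))
    p_bar = agreement / n_subjects
    p_e = sum((t / (n_subjects * n_raters)) ** 2 for t in totals.values())
    return (p_bar - p_e) / (1 - p_e)


def pairwise_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))


votes = [row[1:4] for row in table3]
unanimous = sum(1 for v in votes if len(set(v)) == 1)
kappa = fleiss_kappa(votes, CATEGORIES)

# Reference label per patient: majority vote of the three oncologists.
majority = [max(CATEGORIES, key=v.count) for v in votes]
rp_scores = [row[4] for row, m in zip(table3, majority) if m == "RP"]
cip_scores = [row[4] for row, m in zip(table3, majority) if m == "CIP"]
auc = pairwise_auc(rp_scores, cip_scores)

print(unanimous, round(kappa, 3), round(auc, 3))  # 8 0.417 0.896
```

These values match the 8 of 14 unanimous diagnoses, the Fleiss's kappa of 0.417, and the AUC of 0.896 reported in the text.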


DISCUSSION

In the setting of ICI combined with RT, the distinction between CIP and RP is crucial to subsequent treatment decisions because it is much safer for a patient to restart ICI therapy after experiencing RP. Prior work has documented the radiologic patterns and clinical symptoms of CIP and RP, but the characteristics of these two kinds of pneumonitis can mimic each other. Currently, distinguishing CIP from RP poses a great challenge for clinicians. In this study, we investigated whether CT radiomic features can help differentiate between these two kinds of pneumonitis. We developed a workflow for the generation of a rich set of quantitative features to characterize the texture of inflammatory lesions in CT images. Based on these features, we trained a classification model using the ICI and RT datasets and applied this model to the patients in the ICI+RT dataset. Ten‐fold cross‐validation on the training set and evaluation on the independent test set demonstrated the efficacy of our method with AUCs of 0.937 and 0.896, respectively. Three kinds of features were tested: intensity histogram features, GLCM‐based features, and BoW features. We found that the BoW features yielded the best cross‐validation performance with an AUC of 0.937, followed by the GLCM‐based features (AUC = 0.848) and the intensity histogram features (AUC = 0.765). The difference in performance is expected, as the three kinds of features characterize the texture of image content at different scales. Intensity histogram features are based on individual pixels and completely ignore the information of surrounding pixels, thereby leading to the worst results. GLCM‐based features describe the pairwise relationships between two pixels and thus provide better results. BoW features deal with a group of adjacent pixels and are therefore more informative and discriminative.
To further show the effectiveness of the radiomic features, we compared the diagnostic performance of our model and a radiation oncologist on the ICI and RT datasets. The ICI and RT datasets were used because there is no ambiguity about the cause of pneumonitis, since the patients received either ICI or RT. For a fair comparison, the oncologist made a diagnosis based only on the CT images without referring to other clinical information, which is the same as our method. The classification by the oncologist achieved an AUC of 0.777, which is inferior to our BoW feature‐based model (AUC = 0.937). These results provide compelling evidence that our radiomics approach can discover quantitative and discriminative features to effectively distinguish CIP from RP, which are difficult for humans to notice. Attributing the cause of the pneumonitis in patients receiving both ICI and RT can be a very difficult task, which can be seen from the diagnoses by the three radiation oncologists. As shown in Table 3, the oncologists made the same diagnosis for only 8 out of 14 patients (57.14%, Fleiss's kappa = 0.417). The patients with consensus among oncologists can be easily diagnosed from clear evidence. For example, clear evidence suggesting RP includes that pneumonitis is only seen in the high‐dose area and that the onset time of pneumonitis is close to the completion of RT and is far from the administration of ICI, and vice versa for the evidence suggesting CIP. However, not all patients exhibit clear evidence; findings of RP and CIP are varied, overlapping, and sometimes non‐specific. This means that the clinicians' diagnosis for some patients in the ICI+RT dataset may not be accurate. For this reason, we trained our classification model using the ICI and RT datasets, in which each patient has a definite diagnosis, and solicited the diagnosis from multiple clinicians to reduce potential diagnostic bias. To the best of our knowledge, there are very few studies focusing on this topic.
We found only one relevant abstract, published in 2020, that used CT radiomics and machine learning to distinguish between CIP and RP. Chen et al. used a general-purpose package (PyRadiomics) to extract radiomic features, trained a random forest classifier in patients who received ICI only (n = 23) or RT only (n = 29), and tested the classifier in patients who received ICI+RT (n = 30). The random forest classifier achieved an AUC of 0.79 on the training set and an AUC of 0.84 on the test set. Our method achieved better performance, with AUCs of 0.937 and 0.896 on our training and test sets, respectively, which may be attributable to the more powerful radiomic features used in our method.

A limitation of the present study is that, although our method was rigorously validated using 10-fold cross-validation on the training set and further tested using an independent dataset, all data came from a single institution. Future work will focus on collecting more in-house samples and samples from different institutions as an external validation set. A prospective study is being designed to rechallenge ICI in the patients who are classified as RP cases by our model.
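For readers less familiar with the validation scheme, the following is a minimal sketch of 10-fold cross-validated AUC under stated assumptions. It is not the authors' code: the data are synthetic, the label coding (1 for one pneumonitis class, 0 for the other) is arbitrary, the nearest-centroid scorer is a simple stand-in for the logistic regression, random forest, and linear SVM models named in the paper, and pooling the held-out scores before computing one AUC is only one common aggregation choice (the text does not state which was used). All function names are hypothetical.

```python
import random

def auc(scores, labels):
    """Probability that a random positive outscores a random negative.
    Ties between a positive and a negative count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(labels, k, seed=0):
    """Shuffle the indices of each class and deal them round-robin into k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    slot = 0
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for i in idx:
            folds[slot % k].append(i)
            slot += 1
    return folds

def centroid(rows):
    """Feature-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit_score(train_X, train_y, test_X):
    """Stand-in classifier: higher score means closer to the class-1 centroid."""
    c0 = centroid([x for x, y in zip(train_X, train_y) if y == 0])
    c1 = centroid([x for x, y in zip(train_X, train_y) if y == 1])

    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [sqdist(x, c0) - sqdist(x, c1) for x in test_X]

def cross_val_auc(X, y, k=10, seed=0):
    """Score every sample with a model that never saw it, then pool the scores."""
    held_out_scores = [0.0] * len(y)
    for fold in stratified_folds(y, k, seed):
        held = set(fold)
        train = [i for i in range(len(y)) if i not in held]
        scores = fit_score([X[i] for i in train], [y[i] for i in train],
                           [X[i] for i in fold])
        for i, s in zip(fold, scores):
            held_out_scores[i] = s
    return auc(held_out_scores, y)

# Synthetic stand-in for a 28 + 31 patient training set with two features.
rng = random.Random(42)
y = [1] * 28 + [0] * 31
X = [[rng.gauss(2.0 * label, 1.0), rng.gauss(2.0 * label, 1.0)] for label in y]

cv_auc = cross_val_auc(X, y, k=10)
print(f"pooled 10-fold AUC on synthetic data: {cv_auc:.3f}")
```

The key property is that each sample's score comes from a model fitted without that sample, so the resulting AUC estimates performance on unseen patients from the same population; it does not replace testing on data from other institutions.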

CONCLUSIONS

In summary, the wide spectrum of radiologic manifestations of CIP and RP poses great diagnostic and management challenges in clinical practice. Our results demonstrate that CT radiomics combined with machine learning can distinguish CIP from RP with high discriminative performance (AUC of 0.896 on an independent test set). This indicates that our method has the potential to be a useful tool for identifying RP among patients with pneumonitis who received both ICI therapy and RT, which has significant implications for patient management.

CONFLICT OF INTEREST

The authors declare no conflict of interest.

Tables S1–S3 are provided as supplementary material.