Lan Song1, Zhenchen Zhu1,2, Li Mao3, Xiuli Li3, Wei Han4, Huayang Du1, Huanwen Wu5, Wei Song1, Zhengyu Jin1.
Abstract
Objectives: To predict anaplastic lymphoma kinase (ALK) mutations in lung adenocarcinoma patients non-invasively using machine learning models that combine clinical, conventional CT, and radiomic features.
Keywords: anaplastic lymphoma kinase; gene mutation; lung neoplasms; radiomics; tomography, X-ray computed
Year: 2020 PMID: 32266148 PMCID: PMC7099003 DOI: 10.3389/fonc.2020.00369
Source DB: PubMed Journal: Front Oncol ISSN: 2234-943X Impact factor: 6.244
Figure 1. Eligibility and exclusion criteria of the study. The flowchart depicts patient enrolment, including the eligibility and exclusion criteria. The numbers in parentheses are the numbers of patients. ALK, anaplastic lymphoma kinase; DICOM, Digital Imaging and Communications in Medicine.
Figure 2. Workflow of data analysis. The figure illustrates the radiomic, radiological, and integrated modeling and analysis workflow, with one example of a CT image and tumor segmentation. (a) A male lung adenocarcinoma patient, 44 years old. (b) Auto-detection, segmentation, and manual confirmation of the targeted lesion. The red square in the first image mimics the detection process. The initial regions of interest (ROIs) are generated in this step. (c–e) Collection of radiomic, conventional CT, and clinical features. (f–i) Dataset building, feature selection, model training and validation, and model evaluation, respectively.
Figure 3. Illustration of the pre-processing methods. The figure displays the VOIs of selected ALK+ and ALK– invasive adenocarcinoma cases after each step of image pre-processing. The ALK-positive case was a 44-year-old male patient, and the ALK-negative case was a 60-year-old female patient. Both lesions were solid and slightly lobulated.
Clinical characteristics of ALK– and ALK+ lung adenocarcinoma patients in the primary and test cohorts.
| Characteristic | Primary: all | Primary: ALK– | Primary: ALK+ | Primary: P | Test: all | Test: ALK– | Test: ALK+ | Test: P |
|---|---|---|---|---|---|---|---|---|
| Age (years) | 57 ± 10 (26–83) | 59 ± 10 (28–83) | 54 ± 10 (26–73) | <0.001 | 57 ± 11 (34–78) | 59 ± 10 (40–78) | 54 ± 10 (34–76) | 0.116 |
| Sex | | | | | | | | |
| Male | 113 | 76 | 37 | 0.804 | 26 | 19 | 7 | 0.412 |
| Female | 155 | 102 | 53 | | 41 | 26 | 15 | |
| Smoking history | | | | | | | | |
| Never | 182 | 111 | 71 | <0.001 | 50 | 33 | 17 | 0.578 |
| Current | 74 | 65 | 9 | | 10 | 8 | 2 | |
| Former | 12 | 2 | 10 | | 7 | 4 | 3 | |
| SI (pack-years) | | | | | | | | |
| SI ≤ 10 | 208 | 127 | 81 | 0.002 | 55 | 36 | 19 | 0.581 |
| 10 < SI < 20 | 9 | 8 | 1 | | 2 | 2 | 0 | |
| SI ≥ 20 | 51 | 43 | 8 | | 10 | 7 | 3 | |
| Pathology | | | | | | | | |
| AIS | 12 | 10 | 2 | 0.109 | 1/1.5 | 1/2.2 | 0 | 0.410 |
| MIA | 22 | 18 | 4 | | 7/10.4 | 6/13.3 | 1 | |
| IAC | 234 | 150 | 84 | | 59/88.1 | 38/84.4 | 21 | |
| DM (−) | 244 | 174 | 70 | <0.001 | 58/86.6 | 43/95.6 | 15 | 0.004 |
| DM (+) | 24 | 4 | 20 | (Fisher) | 9/13.4 | 2/4.4 | 7 | (Fisher) |
| Clinical stage | | | | | | | | |
| I | 176 | 141 | 35 | <0.001 | 46 | 36 | 10 | 0.002 |
| II | 32 | 16 | 16 | | 4 | 3 | 1 | |
| III | 15 | 7 | 8 | | 6 | 4 | 2 | |
| IV | 45 | 14 | 31 | | 11 | 2 | 9 | |
The data are displayed as n/%, except where otherwise noted. No significant difference was found between the primary and test cohorts in any demographic characteristic (P > 0.05) except smoking history (P = 0.028).
Age is reported as mean ± standard deviation (range).
P values compare the ALK– group with the ALK+ group; P < 0.05 indicates statistical significance.
ALK, anaplastic lymphoma kinase; AIS, adenocarcinoma in situ; MIA, minimally invasive adenocarcinoma; IAC, invasive adenocarcinoma; SI, smoking index; DM, distant metastasis; Fisher, Fisher's exact test.
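The P values marked "(Fisher)" in the table can be checked directly from the printed counts. The following is a minimal sketch of a two-sided Fisher's exact test in plain Python, applied to the distant metastasis rows of the test cohort (43 and 15 patients without DM, 2 and 7 with DM, for ALK– and ALK+ respectively). The function name and the "sum all tables no more probable than the observed one" definition of the two-sided P value are assumptions of this sketch; the source does not state which software or convention the authors used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumption of this sketch: the two-sided P value is the total probability
    of all tables (with the same margins) that are no more likely than the
    observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Test cohort, distant metastasis (counts copied from the table):
# rows = DM(-), DM(+); columns = ALK-, ALK+
p_dm_test = fisher_exact_two_sided(43, 15, 2, 7)

# Primary cohort, distant metastasis
p_dm_primary = fisher_exact_two_sided(174, 70, 4, 20)

print(f"test cohort P = {p_dm_test:.4f}")
print(f"primary cohort P = {p_dm_primary:.2e}")
```

Under these assumptions the test-cohort value should land close to the reported 0.004, and the primary-cohort value well below 0.001.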
Conventional CT features of ALK– and ALK+ lung adenocarcinoma patients in the primary and test cohorts.
| Feature | Primary: all | Primary: ALK– | Primary: ALK+ | Primary: P | Test: all | Test: ALK– | Test: ALK+ | Test: P |
|---|---|---|---|---|---|---|---|---|
| mDia. (mm) | 19 ± 16 | 18 ± 15 | 21 ± 23 | 0.007 | 22 ± 19 | 18 ± 15 | 26 ± 22 | 0.089 |
| CT attenuation (HU) | −214 ± 476 | −397 ± 455 | −7 ± 197 | <0.001 | 5 ± 289 | −35 ± 409 | 26 ± 38 | 0.001 |
| Location | | | | | | | | |
| Central | 43 | 21 | 22 | 0.008 | 12 | 4/8.9 | 8/36.4 | 0.014 |
| Peripheral | 225 | 157 | 68 | | 55 | 41/91.1 | 14 | (Fisher) |
| Lobe | | | | | | | | |
| RUL | 78 | 57 | 21 | 0.280 | 16 | 12/26.7 | 4 | 0.274 |
| RML | 14 | 10 | 4 | | 0 | 0 | 0 | |
| RLL | 58 | 40 | 18 | | 17 | 11 | 6 | |
| LUL | 65 | 43 | 22 | | 19 | 14 | 5 | |
| LLL | 51 | 27 | 24 | | 13 | 8 | 5 | |
| Mixed | 2 | 1 | 1 | | 2 | 0 | 2 | |
| Density | | | | | | | | |
| pGGO | 83 | 74 | 9 | <0.001 | 10 | 10 | 0 | <0.001 |
| pSolid | 69 | 51 | 18 | | 25 | 22 | 3 | |
| Solid | 116 | 53 | 63 | | 32 | 13 | 19 | |
| Margin | | | | | | | | |
| Spiculated | 115 | 86 | 29 | 0.004 | 35 | 29 | 6 | 0.009 |
| Lobulated | 120 | 67/37.6 | 53 | | 28 | 13 | 15 | |
| Smooth | 33 | 25 | 8 | | 4 | 3 | 1 | |
| Cavity (–) | 244 | 167 | 77 | 0.039 | 63 | 42 | 21 | 1.000 |
| Cavity (+) | 24 | 11 | 13 | | 4 | 3 | 1 | (Fisher) |
| Calcification (–) | 256 | 170 | 86 | 1.000 | 60 | 41/91.1 | 19 | 0.675 |
| Calcification (+) | 12 | 8 | 4 | (Fisher) | 7 | 4/8.9 | 3 | (Fisher) |
| Plu. retraction (–) | 133 | 85 | 48 | 0.388 | 26 | 18 | 8 | 0.774 |
| Plu. retraction (+) | 135 | 93 | 42 | | 41 | 27 | 14 | |
| Plu. effusion (–) | 237 | 168 | 69 | <0.001 | 57 | 44/97.8 | 13 | <0.001 |
| Plu. effusion (+) | 31 | 10 | 21 | | 10 | 1/2.2 | 9 | (Fisher) |
| Per. effusion (–) | 258 | 178 | 80 | <0.001 | 58 | 43 | 15 | 0.004 |
| Per. effusion (+) | 10 | 0 | 10 | (Fisher) | 9 | 2 | 7 | (Fisher) |
| Lymph. (–) | 205 | 158 | 47 | <0.001 | 48 | 38 | 10 | 0.001 |
| Lymph. (+) | 63 | 20 | 43 | | 19 | 7 | 12 | |
The data are displayed as n/%, except where otherwise noted.
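Maximum diameter and CT attenuation are reported as median ± interquartile range.
P values compare the ALK– group with the ALK+ group; P < 0.05 indicates statistical significance.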
ALK, anaplastic lymphoma kinase; mDia., maximum diameter; RUL, right upper lobe; RML, right middle lobe; RLL, right lower lobe; LUL, left upper lobe; LLL, left lower lobe; pGGO, pure ground-glass opacity; pSolid, partial solid; Plu., pleural; Per., pericardial; Lymph., lymphadenopathy; Fisher, Fisher's exact test.
Figure 4. Illustration of the feature selection procedure in the three models. Each vertical panel shows the selection process for one of the three predictive models. Each symbol indicates a different type of feature. The number of selected features and the optimal AUC obtained at each selection step are shown at the top of each sub-panel. In the radiomic model, selection began with the 1,218 extracted radiomic features. In the radiological model, the initial features included 12 conventional CT features and the 1,218 radiomic features. In the integrated model, seven clinical characteristics were added to the 12 conventional CT features and 1,218 radiomic features. The features were selected to maximize the AUC of the predictive model at the final step.
Figure 5. Selected features and their coefficients in the integrated model. The blue dots indicate the coefficients in the decision tree (DT) model. They denote the decrease in the Gini index when a given feature is used in the DT model. A higher DT value suggests a greater influence. The red dots represent the beta coefficients in the logistic regression (LR) model. Since all features were rescaled before the selection procedure, these coefficients are equivalent to normalized LR coefficients. A larger positive LR coefficient (right side of the figure) suggests a stronger association between the feature and ALK mutation, and a larger negative LR coefficient (left side of the figure) suggests a stronger association between the feature and ALK-negative status.
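To make the "decrease of the Gini index" concrete, here is a small sketch that computes the Gini impurity decrease for a single binary split. For illustration only, it splits the primary cohort on distant metastasis using the counts from the clinical characteristics table (178 ALK– and 90 ALK+ patients overall; 174/70 without DM and 4/20 with DM). This variable and the function names are assumptions chosen for the example; they are not the features or the implementation behind Figure 5.

```python
def gini(counts):
    """Gini impurity of a node, given the number of samples in each class."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gini_decrease(parent, children):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = sum(parent)
    weighted = sum(sum(child) / n * gini(child) for child in children)
    return gini(parent) - weighted


# (ALK-, ALK+) counts in the primary cohort, copied from the clinical table
parent = (178, 90)
children = [(174, 70), (4, 20)]  # DM(-) node, DM(+) node

decrease = gini_decrease(parent, children)
print(f"parent Gini    = {gini(parent):.4f}")
print(f"Gini decrease  = {decrease:.4f}")
```

A tree-based importance score of this kind accumulates such decreases over every node where the feature is used, which is why a larger value indicates a more influential feature.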
Diagnostic performance of each model in the primary cohort and test cohort.
| Model | Set | Primary: AUC | Primary: ACC | Primary: SEN | Primary: SPE | Test: AUC | Test: ACC | Test: SEN | Test: SPE |
|---|---|---|---|---|---|---|---|---|---|
| Radiomic | Train | 1.00 (0.99–1.00) | 1.00 | 1.00 | 0.99 | 0.80 (0.69–0.89) | 0.73 | 0.73 | 0.73 |
| | Validation | 0.83 (0.79–0.88) | 0.76 | 0.70 | 0.80 | | | | |
| Radiological | Train | 1.00 (0.99–1.00) | 1.00 | 1.00 | 1.00 | 0.86 (0.75–0.93) | 0.75 | 0.68 | 0.78 |
| | Validation | 0.85 (0.80–0.89) | 0.78 | 0.78 | 0.78 | | | | |
| Integrated | Train | 1.00 (0.99–1.00) | 1.00 | 1.00 | 0.99 | 0.88 (0.77–0.94) | 0.79 | 0.82 | 0.78 |
| | Validation | 0.88 (0.83–0.91) | 0.79 | 0.78 | 0.80 | | | | |
In the primary cohort, the performance indices of each model in the training and validation sets are displayed separately. The radiomic model contained the selected radiomic features only. The radiological model contained the selected conventional CT features in addition to the radiomic features. The integrated model contained the selected radiomic features, conventional CT features, and clinical characteristics. AUC, area under the receiver operating characteristic curve; ACC, accuracy; SEN, sensitivity; SPE, specificity.
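Accuracy, sensitivity, and specificity are linked through the class sizes, so the test-cohort figures can be cross-checked. The sketch below assumes 22 ALK+ and 45 ALK– patients in the test cohort (the sums of the male and female rows in the clinical characteristics table) and recomputes accuracy from the reported sensitivity and specificity. The function name is hypothetical, and because the published values are rounded to two decimals, agreement is only expected to within rounding.

```python
def accuracy_from_sen_spe(sen, spe, n_pos, n_neg):
    """Accuracy implied by sensitivity, specificity, and the class sizes."""
    return (sen * n_pos + spe * n_neg) / (n_pos + n_neg)


# Test cohort class sizes, derived from the clinical table (7 + 15, 19 + 26)
N_POS, N_NEG = 22, 45

# Reported test-cohort values per model: (ACC, SEN, SPE)
reported = {
    "Radiomic": (0.73, 0.73, 0.73),
    "Radiological": (0.75, 0.68, 0.78),
    "Integrated": (0.79, 0.82, 0.78),
}

implied = {
    name: accuracy_from_sen_spe(sen, spe, N_POS, N_NEG)
    for name, (_, sen, spe) in reported.items()
}

for name, (acc, _, _) in reported.items():
    print(f"{name}: reported ACC = {acc:.2f}, implied ACC = {implied[name]:.3f}")
```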
Figure 6. The receiver operating characteristic (ROC) curves of the three models for predicting ALK mutation status. (A) The validation set in the primary cohort; (B) the test cohort.
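The area under an ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes the AUC that way on made-up labels and scores. The data and the function name are purely illustrative and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = ALK+, 0 = ALK-
example_labels = [0, 0, 1, 1]
example_scores = [0.10, 0.40, 0.35, 0.80]

example_auc = roc_auc(example_labels, example_scores)
print(example_auc)
```

The pairwise form is quadratic in the number of cases, which is acceptable for cohorts of a few hundred patients; rank-based formulas give the same value more efficiently.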
Figure 7. The discriminative scores of the three predictive models in the primary (A) and test (B) cohorts. The discriminative score for each patient is the average of the final predicted probabilities from the LR and DT classifiers. The columns above the horizontal axis represent tumors that were predicted to be ALK+, while the columns below the horizontal axis represent tumors predicted to be ALK–. The color indicates the ground truth for each tumor.
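The averaging described in the caption can be sketched as follows. The 0.5 cutoff, the choice to plot the score relative to that cutoff, and the example probabilities are all assumptions made for illustration; the source does not state the threshold or how the bars were offset.

```python
def discriminative_score(p_lr, p_dt):
    """Average of the LR and DT predicted probabilities of ALK+ status."""
    return (p_lr + p_dt) / 2


def bar_height(score, cutoff=0.5):
    """Signed height relative to the cutoff (assumed here to be 0.5).

    Positive values would plot above the axis (predicted ALK+),
    negative values below it (predicted ALK-).
    """
    return score - cutoff


# Hypothetical (LR probability, DT probability) pairs for three tumors
example_patients = [(0.8, 0.6), (0.3, 0.1), (0.55, 0.35)]

example_results = []
for p_lr, p_dt in example_patients:
    score = discriminative_score(p_lr, p_dt)
    height = bar_height(score)
    label = "ALK+" if height > 0 else "ALK-"
    example_results.append((score, height, label))
    print(f"score = {score:.2f}, bar height = {height:+.2f}, predicted {label}")
```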