Constance de Margerie-Mellon, Ritu R Gill, Pascal Salazar, Anastasia Oikonomou, Elsie T Nguyen, Benedikt H Heidinger, Mayra A Medina, Paul A VanderLaan, Alexander A Bankier.
Abstract
The aim of this study was to develop and test multiclass predictive models for assessing the invasiveness of individual lung adenocarcinomas presenting as subsolid nodules on computed tomography (CT). A total of 227 lung adenocarcinomas were included: 31 atypical adenomatous hyperplasias and adenocarcinomas in situ (class H1), 64 minimally invasive adenocarcinomas (class H2) and 132 invasive adenocarcinomas (class H3). Nodules were segmented, and geometric and CT attenuation features, including functional principal component analysis features (FPC1 and FPC2), were extracted. After a feature selection step, two predictive models were built with ordinal regression: Model 1, based on volume (log) (the logarithm of the nodule volume) and FPC1, and Model 2, based on volume (log) and Q.875 (the CT attenuation value at the 87.5th percentile). Under Monte-Carlo cross-validation with 200 repeats, these models provided a multiclass classification of invasiveness with AUCs of 0.83 to 0.87 and predicted the class probabilities with an average error of less than 10%. The predictive modelling approach adopted in this paper provides detailed insight into how the values of the main predictors contribute to the probability of nodule invasiveness and underlines the role of nodule CT attenuation features in classifying nodule invasiveness.
Year: 2020 PMID: 32883973 PMCID: PMC7471897 DOI: 10.1038/s41598-020-70316-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Patient and CT protocol characteristics, and geometric and attenuation features for the 3-class nodule classification.
| | All (n = 227) | H1 (n = 31) | H2 (n = 64) | H3 (n = 132) |
|---|---|---|---|---|
| | 67 ± 10 | 66 ± 10 | 66 ± 10 | 67 ± 9 |
| Male | 60 (26%) | 5 (16%) | 19 (30%) | 36 (27%) |
| Female | 167 (74%) | 26 (84%) | 45 (70%) | 96 (73%) |
| No | 63 (28%) | 12 (39%) | 15 (23%) | 36 (27%) |
| Yes | 164 (72%) | 19 (61%) | 49 (77%) | 96 (73%) |
| 1.0–1.5 mm | 122 (54%) | 13 (42%) | 40 (62%) | 69 (52%) |
| 2.0–2.5 mm | 16 (7%) | 1 (3%) | 2 (3%) | 13 (10%) |
| 3.0 mm | 89 (39%) | 17 (55%) | 22 (34%) | 50 (38%) |
| Right upper lobe | 77 (34%) | 15 (48%) | 19 (30%) | 43 (33%) |
| Right middle lobe | 12 (5%) | 2 (6%) | 2 (3%) | 8 (6%) |
| Right lower lobe | 37 (16%) | 1 (3%) | 15 (23%) | 21 (16%) |
| Left upper lobe | 69 (30%) | 10 (32%) | 22 (34%) | 37 (28%) |
| Left lower lobe | 32 (14%) | 3 (10%) | 6 (9%) | 23 (17%) |
| Average diam. (mm) | 17.9 (12.9, 23.0) | 13.4 (10.9, 16.5) | 14.8 (11.9, 20.4) | 20.1 (16.0, 25.1) |
| Max. diam. (mm) | 21.0 (15.0, 28.4) | 16.0 (12.1, 20.0) | 17.8 (14.0, 23.8) | 24.3 (18.5, 31.7) |
| Min. diam. (mm) | 13.1 (10.5, 18.7) | 11.2 (8.7, 13.0) | 12.0 (9.2, 17.4) | 15.2 (11.9, 19.7) |
| Max.min.diam. ratio | 1.5 (1.3, 1.8) | 1.5 (1.2, 1.7) | 1.4 (1.3, 1.7) | 1.6 (1.3, 1.9) |
| Consolidation ratio | 0.43 (0.00, 0.60) | 0.13 ± 0.18 | 0.31 ± 0.27 | 0.52 ± 0.31 |
| Volume (mm³) | 2,242 (1,099, 5,154) | 1,362 (458, 2,178) | 1,297 (830, 4,088) | 3,436 (1,710, 6,349) |
| Volume (log) | 3.4 ± 0.49 | 3.1 ± 0.49 | 3.2 ± 0.44 | 3.5 ± 0.46 |
| Mean (HU) | −500 ± 155 | −649 ± 106 | −569 ± 113 | −432 ± 143 |
| Standard deviation (HU) | 200 ± 55 | 146 ± 40 | 173 ± 47 | 225 ± 45 |
| Skewness | 0.48 (0.15, 0.78) | 0.61 (0.40, 0.92) | 0.68 (0.38, 0.91) | 0.32 (−0.04, 0.55) |
| Kurtosis | 2.7 (2.1, 3.5) | 3.5 (2.8, 4.4) | 3.2 (2.6, 4.2) | 2.4 (2.0, 2.9) |
| IQR (HU) | 282 (200, 381) | 173 (142, 210) | 228 (172, 297) | 354 (267, 437) |
| Q.50 (HU) | −552 (−655, −419) | −702 (−787, −612) | −610 (−682, −534) | −486 (−582, −356) |
| Q.75 (HU) | −374 (−515, −205) | −593 (−678, −438) | −464 (−561, −367) | −272 (−400, −108) |
| Q.875 (HU) | −261 ± 220 | −542 (−619, −380) | −368 (−496, −257) | −136 (−282, −16) |
| FPC1 | 0.00 ± 0.37 | −0.37 (−0.48, −0.17) | −0.19 (−0.40, 0.03) | 0.16 (−0.02, 0.42) |
| FPC2 | −0.00 ± 0.21 | −0.07 ± 0.16 | 0.01 ± 0.17 | 0.01 ± 0.23 |
Normally distributed continuous variables are shown as mean ± standard deviation, non-normally distributed features as median (interquartile range). Categorical variables are shown as number (%). Results of the comparison of patient and CT protocol characteristics, and geometric and attenuation features for the 3 classes of nodules are presented in Supplementary Table S1.
H1: atypical adenomatous hyperplasia and adenocarcinoma in situ.
H2: minimally invasive adenocarcinoma.
H3: invasive adenocarcinoma.
Max.: maximum, Min.: minimum, Diam.: diameter.
Max.min.diam. ratio: maximum diameter/minimum diameter.
Consolidation ratio: solid component maximum diameter/nodule maximum diameter.
HU: Hounsfield unit.
Q.50: CT attenuation value at the 50th percentile, Q.75: CT attenuation value at the 75th percentile, Q.875: CT attenuation value at the 87.5th percentile.
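As a minimal sketch of how the percentile-based attenuation features listed in this table could be derived from the voxel values of one segmented nodule, the Python below uses a linear-interpolation percentile. The voxel list, the function names and the interpolation rule are illustrative assumptions, not the authors' implementation. The base-10 logarithm for Volume (log) is inferred from the thresholds reported in this record, where a volume of 1,495 mm³ corresponds to a Volume (log) of 3.175.

```python
import math


def percentile(values, q):
    """Return the q-th percentile (0 to 100) using linear interpolation."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    pos = (len(ordered) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


def attenuation_features(hu_values, volume_mm3):
    """Compute a few of the tabulated features for one nodule (hypothetical helper)."""
    q25 = percentile(hu_values, 25)
    q75 = percentile(hu_values, 75)
    return {
        "Q.50": percentile(hu_values, 50),
        "Q.75": q75,
        "Q.875": percentile(hu_values, 87.5),
        "IQR": q75 - q25,
        # Assumption: base-10 logarithm, consistent with the reported thresholds.
        "volume_log": math.log10(volume_mm3),
    }


# Hypothetical voxel attenuation values (HU) for a single nodule.
voxels = [-800, -750, -700, -650, -600, -500, -400, -200, -50]
features = attenuation_features(voxels, volume_mm3=1495)
```

Other percentile conventions exist and give slightly different values on small samples, so the exact figures depend on the rule chosen.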
Figure 1. CT attenuation curves of five nodules. The figure shows the CT attenuation curves for a selection of five subsolid nodules with FPC1 values increasing from −0.906 to 0.619. As FPC1 increases, the CT attenuation curves become more heterogeneous, corresponding to an increasing number of voxels with higher CT attenuation within the nodule.
Main features and ROC-AUC performances for H1 or H2 vs H3 nodule classification (full dataset).
| Parameter | AUC (95% CI) | Sensitivity | Specificity | Best threshold |
|---|---|---|---|---|
| Q.875 (HU) | 0.84 (0.78; 0.88) | 71 | 84 | > −253 |
| FPC1 | 0.83 (0.78; 0.88) | 67 | 90 | > 0.073 |
| IQR (HU) | 0.83 (0.76; 0.87) | 76 | 74 | > 265 |
| Q.75 (HU) | 0.82 (0.77; 0.87) | 74 | 79 | > −393 |
| SD (HU) | 0.82 (0.76; 0.87) | 73 | 78 | > 198 |
| Mean (HU) | 0.81 (0.75; 0.86) | 62 | 86 | > −483 |
| Q.50 (HU) | 0.79 (0.73; 0.84) | 64 | 81 | > −531 |
| Kurtosis | 0.78 (0.72; 0.83) | 73 | 69 | ≤ 2.84 |
| Skewness | 0.77 (0.68; 0.80) | 76 | 63 | ≤ 0.55 |
| Consolidation ratio | 0.75 (0.69; 0.80) | 78 | 64 | > 0.4 |
| Max. diameter (mm) | 0.72 (0.65; 0.78) | 67 | 71 | > 20.04 |
| Volume (mm³) | 0.71 (0.65; 0.77) | 78 | 58 | > 1,495 |
| Volume (log) | 0.71 (0.65; 0.77) | 78 | 58 | > 3.175 |
| Average diameter (mm) | 0.71 (0.64; 0.77) | 76 | 61 | > 15.9 |
| Min. diameter (mm) | 0.67 (0.60; 0.73) | 61 | 68 | > 13.3 |
| Max./min. diam ratio | 0.61 (0.54; 0.67) | 37 | 83 | > 1.74 |
The table lists the features in descending order of AUC. AUCs are presented with their 95% CIs.
H1 or H2, N = 95 (42%).
H3, N = 132 (58%).
Q.50: CT attenuation value at the 50th percentile, Q.75: CT attenuation value at the 75th percentile, Q.875: CT attenuation value at the 87.5th percentile.
HU: Hounsfield unit.
Consolidation ratio: solid component maximum diameter/nodule maximum diameter.
Max.: maximum, Min.: minimum, Diam.: diameter.
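The sensitivity and specificity columns follow from applying each best threshold as a decision rule, for example "Q.875 > −253 HU predicts H3". A minimal sketch of that calculation is given below; the nodule values and the helper name are hypothetical and are not taken from the study data.

```python
def sensitivity_specificity(values, is_h3, threshold):
    """Score the rule 'value > threshold predicts H3' against the true classes."""
    tp = sum(1 for v, y in zip(values, is_h3) if y and v > threshold)
    fn = sum(1 for v, y in zip(values, is_h3) if y and v <= threshold)
    tn = sum(1 for v, y in zip(values, is_h3) if not y and v <= threshold)
    fp = sum(1 for v, y in zip(values, is_h3) if not y and v > threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical Q.875 values (HU) for eight nodules and their true class.
q875 = [-600, -450, -300, -200, -350, -150, -100, 0]
is_h3 = [False, False, False, False, True, True, True, True]

sens, spec = sensitivity_specificity(q875, is_h3, threshold=-253)
```

For features whose threshold is given with "≤" (kurtosis and skewness), the comparison in the rule would be reversed.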
Main features and ROC-AUC performances for H1 vs H2 or H3 nodule classification (full dataset).
| Parameter | AUC (95% CI) | Sensitivity | Specificity | Best Threshold |
|---|---|---|---|---|
| Q.875 (HU) | 0.86 (0.81; 0.91) | 61 | 100 | > −277 |
| IQR (HU) | 0.84 (0.79; 0.89) | 76 | 81 | > 225 |
| FPC1 | 0.82 (0.77; 0.87) | 63 | 90 | > −0.019 |
| Q.75 (HU) | 0.82 (0.77; 0.87) | 59 | 94 | > −393 |
| SD (HU) | 0.82 (0.77; 0.87) | 76 | 77 | > 172 |
| Mean (HU) | 0.82 (0.77; 0.87) | 67 | 84 | > −546 |
| Q.50 (HU) | 0.82 (0.77; 0.87) | 83 | 71 | > −656 |
| Consolidation ratio | 0.79 (0.73; 0.84) | 70 | 84 | > 0.31 |
| Kurtosis | 0.73 (0.67; 0.79) | 38 | 100 | ≤ 2.35 |
| Max. diameter (mm) | 0.72 (0.65; 0.78) | 57 | 81 | > 20.3 |
| Volume (mm³) | 0.70 (0.63; 0.75) | 51 | 84 | > 2,601 |
| Volume (log) | 0.70 (0.63; 0.75) | 51 | 84 | > 3.41 |
| Average diameter (mm) | 0.71 (0.64; 0.77) | 76 | 61 | > 15.9 |
| Min. diameter (mm) | 0.70 (0.64; 0.76) | 56 | 84 | > 13 |
| Skewness | 0.65 (0.59; 0.71) | 34 | 97 | ≤ 0.23 |
The table lists the features in descending order of AUC. AUCs are presented with their 95% CIs.
H1, N = 31 (14%).
H2 or H3, N = 196 (86%).
Q.50: CT attenuation value at the 50th percentile, Q.75: CT attenuation value at the 75th percentile, Q.875: CT attenuation value at the 87.5th percentile.
HU: Hounsfield unit.
Consolidation ratio: solid component maximum diameter/nodule maximum diameter.
Max.: maximum, Min.: minimum.
Figure 2. Confounder plot for feature selection. Candidate features are plotted with their similarity to the response (nodule class H1, H2 or H3) on the y-axis and their similarity to the reference main predictor (nodule volume (log)) on the x-axis. The best features are in the top-left of the plot. Predictors further to the right are more correlated with volume (log) and may be excluded to avoid collinearity among the predictive features.
Model 1 (volume (log) + FPC1) and Model 2 (volume (log) + Q.875) estimates for the 3-class nodule classification using the ordinal regression model (full dataset).
| Model | Parameter | Coefficients, odds ratios and intercepts | P value |
|---|---|---|---|
| Model 1 | Volume (log) | C: 1.362 (0.690; 2.035) OR: 3.91 (2.02; 7.78) | < 0.001 |
| Model 1 | FPC1 | C: 3.644 (2.675; 4.619) OR: 38.23 (15.05; 104.98) | < 0.001 |
| Model 1 | H1 vs H2 or H3 (Y ≤ H2) | I: −2.702 | < 0.001 |
| Model 1 | H1 or H2 vs H3 (Y ≤ H3) | I: −0.5208 | 0.003 |
| Model 2 | Volume (log) | C: 1.251 (0.558; 1.943) OR: 3.493 (1.770; 7.098) | < 0.001 |
| Model 2 | Q.875 | C: 0.00695 (0.005; 0.009) OR: 1.0070 (1.005; 1.009) | < 0.001 |
| Model 2 | H1 vs H2 or H3 (Y ≤ H2) | I: −2.8179 | < 0.001 |
| Model 2 | H1 or H2 vs H3 (Y ≤ H3) | I: −0.5197 | 0.003 |
The table presents the coefficients (C) and odds ratios (OR) with their 95% CIs, and the intercepts (I), for Model 1 and Model 2.
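In a logistic-type model such as ordinal regression, the odds ratio for a one-unit increase in a predictor is the exponential of its coefficient. The short check below reproduces the reported odds ratios from the reported coefficients to within rounding. It also rescales the Q.875 effect to a 100 HU step, a unit chosen here only for illustration. The dictionary and variable names are assumptions made for this sketch.

```python
import math

# Coefficients (C) as reported in the table above.
coefficients = {
    "model1_volume_log": 1.362,
    "model1_fpc1": 3.644,
    "model2_volume_log": 1.251,
    "model2_q875": 0.00695,
}

# Odds ratio per one-unit increase in each predictor: OR = exp(C).
odds_ratios = {name: math.exp(c) for name, c in coefficients.items()}

# Odds ratio for a 100 HU increase in Q.875 under Model 2 (illustrative rescaling).
or_per_100_hu = math.exp(100 * coefficients["model2_q875"])
```

The per-HU odds ratio of about 1.007 for Q.875 looks small only because one Hounsfield unit is a very small step. Over 100 HU, the same coefficient corresponds to roughly a doubling of the odds of a higher invasiveness class.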
Figure 3. Class probability change with the model predictors. The figure shows the estimated probability of each nodule class (H1, H2, H3) as a function of the nodule volume (log) (a), the FPC1 value (b) or Q.875 (c). Left: change in class probabilities with the model predictors: (a) volume (log) (Models 1 and 2), (b) FPC1 (Model 1) and (c) Q.875 (Model 2). Right: change in class probabilities between the 1st (25%) and 3rd (75%) quartiles of each predictor (nodule volume, FPC1 and Q.875). Non-significant P values are omitted.
Predictive (cross-validated) performances for the 3-class nodule classification with Model 1 (volume (log) + FPC1) and Model 2 (volume (log) + Q.875) using the ordinal regression model.
| Ordinal regression models | Statistic | AUC1 | AUC2 | Normalized Brier’s score 1 (%) | Normalized Brier’s score 2 (%) | EAVG 1 | EAVG 2 |
|---|---|---|---|---|---|---|---|
| Model 1 | Mean | 0.83 | 0.85 | 18 | 36 | 0.060 | 0.075 |
| Model 1 | 2.5% | 0.69 | 0.74 | 0 | 15 | 0.016 | 0.022 |
| Model 1 | 97.5% | 0.97 | 0.96 | 45 | 57 | 0.091 | 0.125 |
| Model 2 | Mean | 0.87 | 0.86 | 28 | 37 | 0.052 | 0.072 |
| Model 2 | 2.5% | 0.75 | 0.76 | 2.5 | 15 | 0.017 | 0.016 |
| Model 2 | 97.5% | 0.99 | 0.96 | 53 | 60 | 0.083 | 0.123 |
AUC1 and AUC2: AUC value for the first cutoff point (H1 vs H2 or H3) or the second cutoff point (H1 or H2 vs H3).
Brier’s score 1 and Brier’s score 2: Brier’s score value for the first cutoff point (H1 vs H2 or H3) or the second cutoff point (H1 or H2 vs H3). Brier’s scores were normalized to range between 0 and 100%.
EAVG 1 and EAVG 2: average probability calibration error value for the first cutoff point (H1 vs H2 or H3) or the second cutoff point (H1 or H2 vs H3).
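To make the metrics in this table concrete, the sketch below computes an AUC and a plain (unnormalized) Brier score for a single cutoff point from predicted probabilities. The probabilities and labels are hypothetical. The normalization applied to the Brier's scores and the exact definition of the calibration error are not given in this record, so neither is reproduced here.

```python
def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def brier_score(probs, labels):
    """Mean squared difference between predicted probability and observed outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


# Hypothetical predicted probabilities of H3 and true labels (1 = H3, 0 = H1 or H2).
probs = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
labels = [0, 0, 1, 1, 0, 1]

auc_value = auc(probs, labels)
brier_value = brier_score(probs, labels)
```

In a Monte-Carlo cross-validation, these quantities would be computed on the held-out part of each random split and then summarized over the repeats, which is consistent with the mean and the 2.5% and 97.5% rows shown in the table.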