Sumedha Singla1, Mingming Gong2, Craig Riley3, Frank Sciurba4, Kayhan Batmanghelich5.
Abstract
PURPOSE: To develop and evaluate a deep learning (DL) approach to extract rich information from high-resolution computed tomography (HRCT) of patients with chronic obstructive pulmonary disease (COPD).
Year: 2021 PMID: 33340116 PMCID: PMC7965349 DOI: 10.1002/mp.14673
Source DB: PubMed Journal: Med Phys ISSN: 0094-2405 Impact factor: 4.071
Fig. 1. The schematic of our model. (A) The input to our model is a three‐dimensional (3D) computed tomography (CT) scan of the lung. The lung is divided into a set of equally sized, overlapping 3D image patches. (a) The generative network is a convolutional auto‐encoder (CAE). The encoder function projects the raw image patch to a latent space, and the decoder function reconstructs the image patch from the extracted latent features. (b) The attention network provides interpretability by weighting the patches based on their importance in predicting the disease severity. (c) The discriminative network (c.1) aggregates the local patch‐level information, based on the attention weights, to create a patient‐level representation, and (c.2) uses it to predict disease severity. (B) An example of the weights learned by the adaptive weighting scheme, overlaid on the input CT scan. Red indicates higher relevance to the disease severity. In severe COPD cases, the red regions mostly, although not always, focus on the bullae. The model also picks up normal regions, because the absence of normal tissue suggests more destruction by the disease and hence more severe emphysema. Figure is best viewed in color.
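The aggregation step described in (c.1) can be illustrated with a small sketch. It assumes the attention scores are normalized with a softmax and that the patient‐level representation is a weighted sum of patch features; the caption does not state either detail, so both are assumptions, and the names `softmax`, `aggregate_patches`, and the toy feature values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_patches(patch_features, attention_scores):
    """Combine patch-level feature vectors into one patient-level vector.

    Assumption: weights come from a softmax over per-patch scores and the
    patient vector is the weighted sum of the patch vectors.
    """
    if len(patch_features) != len(attention_scores):
        raise ValueError("need exactly one attention score per patch")
    weights = softmax(attention_scores)
    dim = len(patch_features[0])
    patient = [0.0] * dim
    for weight, feature in zip(weights, patch_features):
        for i in range(dim):
            patient[i] += weight * feature[i]
    return weights, patient


# Hypothetical latent features for three patches (2-D for readability).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 0.0]  # equal scores give uniform weights
weights, patient_vec = aggregate_patches(features, scores)
```

The returned weights are what an overlay such as panel (B) would visualize: one non‐negative value per patch, summing to one.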
Summary of the clinical outcomes considered in the experiments, with their numerical types and values.
| Clinical outcomes | Type | Values | Description |
|---|---|---|---|
| FEV1 | Continuous | | Percentage predicted forced expiratory volume in 1 s |
| FEV1/FVC | Continuous | | FEV1 ratio with forced vital capacity (FVC) |
| COPD | Binary | 0 or 1 | True if FEV1/FVC < 0.7 |
| GOLD stages | Categorical | 0–4 | The GOLD stages of 0 (non‐obstructed) through 4 (severely obstructed). |
| Centrilobular emphysema (CLE) | Categorical | 0–5 | CLE parenchymal emphysema severity score, from none (0) to advanced destructive emphysema (5). |
| Paraseptal emphysema | Categorical | 0–2 | Specified using three labels: none, mild, and substantial. |
| Historic exacerbation | Binary | 0 or 1 | True if the patient has experienced an exacerbation in the last year. |
| Future exacerbation | Binary | 0 or 1 | True if the patient reported experiencing an exacerbation by the 5‐yr follow‐up. |
| mMRC dyspnea scale | Categorical | 0–4 | The modified Medical Research Council (mMRC) dyspnea scale |
| Mortality | Binary | 0 or 1 | Vital status |
Results for predicting spirometry measurements and using them to diagnose and stage COPD.
| Method | FEV1 (R‐square) | FEV1/FVC (R‐square) | COPD diagnosis (AUC ROC) | COPD diagnosis (AUC PR) | COPD diagnosis (Recall) | GOLD (% accuracy) | GOLD (% accuracy) |
|---|---|---|---|---|---|---|---|
| Ours (direct) | | | 0.82 | | | | |
| CNN | 0.53 | – | | – | – | 51.10 | 74.90 |
| Non‐parametric | 0.58 ± 0.03 | 0.70 ± 0.02 | 0.79 | 0.70 | | 58.85 | 84.15 |
| K‐Means | 0.56 ± 0.01 | 0.68 ± 0.02 | 0.77 | 0.68 | | 57.27 | 82.28 |
| LAA‐950 | 0.45 ± 0.02 | 0.60 ± 0.01 | 0.75 | 0.64 | 0.70 | 55.75 | 75.69 |
CNN = convolutional neural network; COPD = chronic obstructive pulmonary disease; ROC = receiver operating characteristic; AUC = area under curve; PR = precision‐recall curve; GOLD = the Global Initiative for Chronic Obstructive Lung Disease; LAA = low attenuation area; FEV1 = forced expiratory volume in 1 s; FVC = forced vital capacity.
The bold font is used to highlight the highest value for each column among different methods. Each row is a different method.
We repeated the experiments on these methods and the results are reported on fivefold cross‐validation over a dataset of 10 300 subjects.
We reuse the results reported by Gonzalez et al. The results are reported on a held‐out set of 1000 subjects.
COPD is diagnosed using model‐predicted FEV1/FVC < 0.7 and not as a binary classification.
The GOLD stage is computed using a decision tree classifier trained on predicted spirometry measurements.
The ROC curve shows how the true positive (TP) vs false positive (FP) relationship changes as we vary the threshold of the positive class in our model. Higher AUC‐ROC suggests better classification.
Precision (TP/(TP + FP)) and recall (TP/(TP + FN)) quantify the model’s ability to identify instances from the positive class. High AUC‐PR and recall indicate better identification of subjects with COPD.
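The metrics defined in the footnotes above can be computed directly from labels and scores. The following is a minimal sketch, not the evaluation code used in the study: `precision_recall` counts TP, FP, and FN at a fixed threshold, and `roc_auc` uses the equivalent pairwise interpretation of the area under the ROC curve (the probability that a random positive is scored above a random negative). All names and the toy data are hypothetical.

```python
def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = positive class, scores are hypothetical model outputs.
labels = [1, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.5, 0.1, 0.8]
preds = [1 if s >= 0.5 else 0 for s in scores]  # threshold of the positive class

precision, recall = precision_recall(labels, preds)
auc = roc_auc(labels, scores)
```

Changing the 0.5 threshold moves precision and recall, whereas the AUC summarizes ranking quality across all thresholds.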
Fig. 2. Comparing different methods in predicting spirometry measurements, and in COPD diagnosis and staging. (a) Bar graph comparing the R‐square (coefficient of determination) for regression analysis of the forced expiratory volume in 1 s (FEV1) and FEV1/FVC, where FVC is the forced vital capacity. (b) Receiver operating characteristic (ROC) curve for prediction of COPD. The ROC curve shows how the true positive vs false positive relationship changes as we vary the threshold of the positive class. Higher AUC‐ROC suggests better classification. (c) Confusion matrix plot for staging subjects using the GOLD stage. Following the GOLD guidelines, we used the model‐predicted FEV1 and FEV1/FVC ratio to diagnose and stage COPD. (d) Visualizing the population by projecting the patient‐level representations to 2D space using a dimensionality reduction method called UMAP. Each dot represents one subject, colored by percentage predicted FEV1. The relative position of a subject can be used to monitor disease progression. We use two dimensions for the sake of visualization; it is straightforward to use a higher dimension and improve patient characterization. Figure is best viewed in color.
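As a point of reference for panel (c), diagnosis and staging from spirometry can be written as a simple rule. This sketch uses the standard GOLD spirometric cut‐offs (FEV1/FVC below 0.70 for airflow obstruction; FEV1 at 80%, 50%, and 30% of predicted for the stage boundaries). It is an illustration only: the table footnotes state that the GOLD stage in this work was computed with a decision tree classifier trained on predicted spirometry, so this is not the authors' pipeline. Treating every non‐obstructed subject as stage 0 is also an assumption, and the function names are hypothetical.

```python
def has_copd(fev1_fvc_ratio):
    """Airflow obstruction by the fixed-ratio criterion (FEV1/FVC < 0.70)."""
    return fev1_fvc_ratio < 0.70


def gold_stage(fev1_pct_predicted, fev1_fvc_ratio):
    """Rule-based GOLD stage from percent-predicted FEV1 and FEV1/FVC.

    Returns 0 for non-obstructed subjects (assumption), otherwise 1 to 4.
    """
    if not has_copd(fev1_fvc_ratio):
        return 0
    if fev1_pct_predicted >= 80.0:
        return 1  # mild
    if fev1_pct_predicted >= 50.0:
        return 2  # moderate
    if fev1_pct_predicted >= 30.0:
        return 3  # severe
    return 4  # very severe


# Hypothetical model-predicted spirometry for three subjects.
predictions = [(85.0, 0.75), (60.0, 0.60), (25.0, 0.40)]
stages = [gold_stage(fev1, ratio) for fev1, ratio in predictions]
```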
Fig. 3. Comparing our method against the traditionally used computed tomography (CT) quantification measure (LAA‐950) in stratifying the population based on centrilobular and paraseptal emphysema severity scores. The Ours (direct) model is trained to predict spirometry measures and the emphysema visual score together in a single loss function. The emphysema visual score is predicted as an ordinal multi‐class classification. (a) Confusion matrix plot for grouping the COPDGene population based on centrilobular emphysema and (b) paraseptal emphysema. Our proposed method performed better than LAA features and created a clearer separation between little and substantial emphysema.
Results classifying subjects based on their emphysema visual score.
| Method | CLE (% accuracy) | CLE (% accuracy) | Para‐septal (% accuracy) | Para‐septal (% accuracy) |
|---|---|---|---|---|
| Ours (direct) | | | | 82.99 |
| Ours (in‐direct) | 36.30 | 61.33 | 46.87 | 75.97 |
| Spirometry (FEV1) | 33.52 | 63.96 | 44.64 | 72.77 |
| LAA‐950 | 31.89 | 77.74 | 33.32 | |
LAA = low attenuation area; FEV1 = forced expiratory volume in 1 s; CLE = centrilobular emphysema.
The bold font is used to highlight the highest value for each column among different methods. Each row is a different method.
The results are reported on fivefold cross‐validation over a dataset of 10 300 subjects.
Ours (direct) model predicted spirometry measures and emphysema visual score together in a single loss function.
Ours (in‐direct) model predicted only spirometry measures as disease severity. The patient representations from this model are used in a separate multi‐class classification analysis to predict the emphysema visual score.
Fig. 4. Receiver operating characteristic (ROC) curve and precision‐recall (PR) curve for identifying subjects with (A) exacerbation history and (B) future exacerbation as given in longitudinal follow‐up. The ROC curve shows how the true positive vs false positive relationship changes as we vary the threshold of the positive class. In the top row, the positive class represents those subjects in the COPD cohort who reported experiencing at least one exacerbation before enrolling in the study. In the bottom row, the positive class represents those subjects who reported experiencing at least one exacerbation at the 5‐yr longitudinal follow‐up. A higher AUC‐ROC indicates better classification performance. A higher average precision (AP) in the PR curve indicates a better ability of the model to identify subjects in the positive class. The plot shows that combining the history of past exacerbation with deep learning features from our model improves the prediction of future exacerbation. Figure is best viewed in color.
Results for identifying subjects with exacerbation risk.
| Method | Exacerbation history (ROC‐AUC) | Exacerbation history (PR‐AUC) | Exacerbation history (Recall) | Exacerbation history (% accuracy) |
|---|---|---|---|---|
| Ours (direct) | 0.68 ± 0.02 | | 0.27 ± 0.14 | |
| Ours (in‐direct) | | | | 74.75 |
| CNN | 0.643 | – | 0.18 | 60.40 |
| LAA‐950 | 0.65 ± 0.01 | 0.35 ± 0.02 | 0.43 ± 0.02 | 73.78 |
CNN = convolutional neural network; ROC = receiver operating characteristic; AUC = area under curve; PR = precision‐recall curve; LAA = low attenuation area.
The bold font is used to highlight the highest value for each column among different methods. Each row is a different method.
Ours (direct) model predicted spirometry measures and clinical outcomes of interest together in a single loss function.
Ours (in‐direct) model predicted only spirometry measures as disease severity. The generalized patient representations from this model are then used in a separate classification or regression analysis to predict other clinical outcomes.
Results are reported on fivefold cross‐validation over a dataset of 10 300 subjects.
We reuse the results reported by Gonzalez et al. The results are reported on a held‐out set of 1000 subjects.
The ROC curve shows how the true positive (TP) vs false positive (FP) relationship changes as we vary the threshold of the positive class in our model. Higher AUC‐ROC suggests better classification.
Precision (TP/(TP + FP)) and recall (TP/(TP + FN)) quantify the model’s ability to identify instances from the positive class. High AUC‐PR and recall indicate better identification of subjects with COPD.
Solved as ordinal multiclass classification. mMRC is a 5‐category variable.
Results of the Cox proportional‐hazards (PH) model for survival analysis. The probability of death, learned from binary classification of mortality, is used as a covariate in Cox regression.
| Method | Hazard ratio | Quantile | Concordance | Global statistical significance | PH‐Assumption (Global |
|---|---|---|---|---|---|
| Ours (direct) | 1.04 [CI: 0.09, 1.87] | <2e‐16 | 0.590 | | 0.514 |
| Ours (in‐direct) | 1.54 [CI: 1.09, 2.17] | <2e‐16 | 0.615 | | 0.598 |
| CNN | 2.69 [CI: 1.19, 6.05] | 0.017 | 0.72 | – | – |
| Spirometry (FEV1) | 1.20 [CI: 0.94, 1.54] | 6.91e‐07 | 0.525 | | – |
| BODE index | 1.68 [CI: 1.21, 2.31] | <2e‐16 | 0.568 | | 0.462 |
| LAA‐950 | 1.13 [CI: 0.93, 1.37] | 6.35e‐07 | 0.537 | | 0.391 |
PH = proportional hazards; CNN = convolutional neural network; FEV1 = forced expiratory volume in 1 second; BODE = body mass index, airflow obstruction, dyspnea and exercise index; LAA = low attenuation area; CI = confidence interval.
All the models have age, gender, smoking pack‐years, and center of enrollment as covariates.
The bold font is used to highlight the highest value for each column among different methods. Each row is a different method.
Ours (direct) model predicted spirometry measures and mortality together in a single loss function. Results are reported on fivefold cross‐validation over a dataset of 10 300 subjects.
Ours (in‐direct) model predicted only spirometry measures as disease severity. The generalized patient representations from this model are then used in a separate binary classification analysis to predict mortality.
We reuse the results reported by Gonzalez et al. The results are reported on a held‐out set of 1000 subjects.
BODE index is the clinical index used to predict the mortality rate from COPD.
The Hazard ratio is the exponential coefficient (exp(β)) of the covariate. A covariate is positively associated with the event probability when the hazard ratio is above one and thus is negatively associated with the length of survival. We also report 95% confidence intervals for the hazard ratio.
A significant P‐value with a hazard ratio > 1 indicates a strong relationship between the covariate and increased risk of death.
The concordance is the fraction of pairs in which the observation with the longer survival time also has the higher model‐predicted probability of survival. It is analogous to the area under the ROC curve in classification analysis.
The global statistical significance of the model is tested using three alternative tests, namely the likelihood‐ratio (LR) test, the Wald test, and the score (log‐rank) statistic. P < 0.001 indicates that the model fits significantly better than the null hypothesis. The null hypothesis states that all the betas (β) are 0.
We used scaled Schoenfeld residuals to check the proportional hazards assumption. A non‐significant P‐value shows no evidence of violation of PH assumption by survival model.
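Two of the quantities defined in these footnotes are easy to compute by hand. The sketch below is illustrative only: `hazard_ratio` exponentiates a Cox coefficient β and builds a Wald‐type 95% interval as exp(β ± 1.96·SE), and `concordance_index` counts correctly ordered pairs while skipping pairs that censoring makes incomparable (Harrell's convention, an assumption, since the footnote does not say how censored subjects were handled). The standard error of 0.1756 is a hypothetical value, chosen only because it gives an interval close to the one reported for the Ours (in‐direct) row; the survival data are made up.

```python
import math


def hazard_ratio(beta, std_err, z=1.96):
    """Hazard ratio exp(beta) with a Wald-type confidence interval."""
    lower = math.exp(beta - z * std_err)
    upper = math.exp(beta + z * std_err)
    return math.exp(beta), (lower, upper)


def concordance_index(times, events, risk):
    """Fraction of comparable pairs ordered correctly by the risk score.

    A pair (i, j) is comparable when subject i had the event (events[i] == 1)
    and a strictly shorter time than subject j. It is concordant when the
    model assigned subject i the higher risk; ties in risk count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# beta is back-computed from a reported hazard ratio; std_err is hypothetical.
hr, (ci_low, ci_high) = hazard_ratio(math.log(1.54), 0.1756)

# Made-up follow-up times (years), event flags (1 = death), predicted risks.
times = [2, 5, 7, 9]
events = [1, 1, 0, 1]
risk = [0.9, 0.6, 0.7, 0.2]
c_index = concordance_index(times, events, risk)
```

A concordance of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why it is read like an AUC.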
Fig. 5. Kaplan–Meier plot for visualizing the results of survival analysis. The plot is obtained by performing Cox regression analysis stratified on the quantile of predicted probability of mortality in binary classification. A good Kaplan–Meier plot has large separations between the groups. The BODE index is the body mass index, airflow obstruction, dyspnea, and exercise index, which is highly correlated with mortality. Our model performed better than the conventional emphysema quantification, the BODE index, and spirometry measures for mortality assessment. Figure is best viewed in color.
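For readers unfamiliar with how each curve in such a plot is built, here is a minimal sketch of the Kaplan–Meier product‐limit estimator for a single group. It is a generic textbook estimator, not the authors' plotting code; in a stratified plot it would be applied once per quantile group of predicted mortality probability. The function name and the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : follow-up time per subject
    events : 1 if the subject died at that time, 0 if censored
    S(t) is multiplied by (1 - deaths / at_risk) at every event time.
    """
    order = sorted(zip(times, events))
    at_risk = len(order)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        deaths = 0
        leaving = 0
        # Gather every subject whose follow-up ends at time t.
        while i < len(order) and order[i][0] == t:
            deaths += order[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # deaths and censored subjects both leave the risk set
    return curve


# Made-up group of five subjects.
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

Plotting one such step curve per group and looking at the vertical gap between them gives the "separation" the caption refers to.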