R K Gebre1, J Hirvasniemi2, R A van der Heijden2, I Lantto3,4, S Saarakkala5,4,6, J Leppilahti3,4, T Jämsä5,4,6.
Abstract
We developed and compared deep learning models to detect hip osteoarthritis on clinical CT. CT-based summation images (CT-AP), which resemble X-ray radiographs, can be used to detect radiographic hip osteoarthritis. In the absence of large training data, a reliable deep learning model can be optimized by combining CT-AP and X-ray images.
Keywords: Classification; Computed tomography; Deep learning; Hip osteoarthritis; Radiology
Year: 2021 PMID: 34476540 PMCID: PMC8813821 DOI: 10.1007/s00198-021-06130-y
Source DB: PubMed Journal: Osteoporos Int ISSN: 0937-941X Impact factor: 5.071
Fig. 1 Examples of the images in the two classes used to train the deep learning models: class 1 = no-rHOA and class 2 = rHOA. (A) X-ray images from the CHECK dataset, which were Kellgren–Lawrence (KL) graded as part of the cohort study. (B) CT-AP images, which were manually graded for OA as a binary classification in this study.
Fig. 2 Summation of CT slices to form the 2-D image referred to as CT-AP. The figure also shows how the hip joints are cropped by localizing the region of interest with a Faster R-CNN object detector.
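The summation step in Fig. 2 can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' pipeline. The sketch assumes the volume is indexed (slice, row, column) with rows running anterior to posterior, so summing over axis 1 yields a frontal, radiograph-like view. The function names `ct_ap_projection` and `crop_roi` are hypothetical. The bounding box is passed in directly; in the study it came from the Faster R-CNN detector.

```python
import numpy as np


def ct_ap_projection(volume: np.ndarray, ap_axis: int = 1) -> np.ndarray:
    """Sum a CT volume along its (assumed) anteroposterior axis into a 2-D image.

    Assumption: `volume` is indexed (slice, row, column) with rows running
    anterior to posterior; the real axis depends on how the scans are stored.
    """
    projection = volume.astype(np.float64).sum(axis=ap_axis)
    # Min-max rescale to [0, 1] so projections from different scans are comparable.
    lo, hi = projection.min(), projection.max()
    if hi > lo:
        return (projection - lo) / (hi - lo)
    return np.zeros_like(projection)


def crop_roi(image: np.ndarray, box: tuple[int, int, int, int]) -> np.ndarray:
    """Crop a region of interest given as (x_min, y_min, x_max, y_max).

    Hypothetical helper: the box would come from an object detector
    (Faster R-CNN in the study); here it is simply supplied by the caller.
    """
    x_min, y_min, x_max, y_max = box
    return image[y_min:y_max, x_min:x_max]
```

For example, a volume of shape (40, 64, 80) projects to a (40, 80) image, and `crop_roi(image, (10, 5, 50, 35))` returns a (30, 40) patch around the assumed hip location.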
Data partitions used to train, validate, and test ResNet18 to predict radiographic hip osteoarthritis (rHOA). Model-1 and Model-2 were trained with unprocessed and downsampled Cohort Hip and Cohort Knee (CHECK) X-ray images, respectively. Model-3 and Model-4 were trained on a combination of the CT-AP and X-ray images, where the X-ray images were similar to those used for Model-1 and Model-2, respectively. Model-5 was trained solely on the CT-AP images. Overall is the total number of CHECK X-ray and CT-AP images.
| Data partition | CHECK, rHOA (Model-1, Model-2) | CHECK, no-rHOA (Model-1, Model-2) | CT-AP, rHOA (Model-5) | CT-AP, no-rHOA (Model-5) | Combined, rHOA (Model-3, Model-4) | Combined, no-rHOA (Model-3, Model-4) | Overall (%) |
|---|---|---|---|---|---|---|---|
| Training | 1093 | 2019 | 29 | 17 | 1122 | 2036 | 3158 (55%) |
| Validation | 597 | 1101 | 15 | 9 | 612 | 1110 | 1722 (30%) |
| Test | 298 | 551 | 15 | 9 | 313 | 560 | 873 (15%) |
| Total | 1988 | 3671 | 59 | 35 | 2047 | 3706 | 5753 (100%) |
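Model-2 and Model-4 use X-ray images that were downsampled to resemble CT-AP images. The source does not state the downsampling method or factor. The sketch below assumes simple block averaging by a hypothetical integer `factor`, which is one plausible way to reduce resolution. The helper name `block_downsample` is illustrative.

```python
import numpy as np


def block_downsample(image: np.ndarray, factor: int) -> np.ndarray:
    """Downsample a 2-D image by averaging non-overlapping factor x factor blocks.

    Assumption: block averaging stands in for whatever resampling the study used.
    Trailing rows/columns that do not fill a whole block are dropped.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    h, w = image.shape
    h_crop, w_crop = h - h % factor, w - w % factor
    trimmed = image[:h_crop, :w_crop].astype(np.float64)
    # Give each block its own pair of axes, then average over those axes.
    blocks = trimmed.reshape(h_crop // factor, factor, w_crop // factor, factor)
    return blocks.mean(axis=(1, 3))
```

Averaging, rather than keeping every n-th pixel, also blurs fine detail. This is closer in spirit to the loss of sharpness in a summed CT projection, although the true degradation in CT-AP images is more complex.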
Performance of the five models in detecting radiographic hip osteoarthritis on the X-ray images of the validation and test datasets. The CT-AP images were created by sequentially summing the CT slices. Model-1 was trained with unprocessed X-ray images. Model-2 was trained in the same way but with X-ray images downsampled to resemble CT-AP images. Model-3 and Model-4 were trained on a combination of the CT-AP and X-ray images, where the X-ray images were similar to those used for Model-1 and Model-2, respectively. Model-5 was trained solely on the CT-AP images.
| Trained models | Accuracy (%) | Balanced accuracy (%) | Precision | Recall | F1-score | PR AUC [95% CI] | ROC AUC [95% CI] |
|---|---|---|---|---|---|---|---|
| X-ray images of the validation dataset | |||||||
| Model-1 | 93.3 | 92.1 | 0.92 | 0.93 | 0.93 | 0.96 [0.95–0.97] | 0.98 [0.98–0.99] |
| Model-2 | 90.6 | 89.2 | 0.89 | 0.89 | 0.89 | 0.94 [0.92–0.95] | 0.97 [0.96–0.97] |
| Model-3 | 92.7 | 92.3 | 0.92 | 0.92 | 0.92 | 0.95 [0.93–0.95] | 0.98 [0.97–0.98] |
| Model-4 | 90.4 | 89.2 | 0.89 | 0.89 | 0.89 | 0.94 [0.93–0.95] | 0.94 [0.97–0.98] |
| X-ray images of the test dataset | |||||||
| Model-1 | 92.3 | 88.5 | 0.91 | 0.92 | 0.92 | 0.96 [0.95–0.98] | 0.98 [0.97–0.98] |
| Model-2 | 90.2 | 88.1 | 0.88 | 0.90 | 0.89 | 0.95 [0.93–0.96] | 0.97 [0.96–0.98] |
| Model-3 | 91.9 | 91.1 | 0.91 | 0.91 | 0.91 | 0.95 [0.93–0.96] | 0.98 [0.97–0.98] |
| Model-4 | 91.3 | 90.8 | 0.91 | 0.90 | 0.91 | 0.93 [0.91–0.95] | 0.97 [0.96–0.98] |
| Model-5† | 49.1 | 58.9 | 0.59 | 0.64 | 0.61 | 0.53 [0.50–0.56] | 0.69 [0.66–0.72] |
| Model-5‡ | 60.5 | 55.2 | 0.55 | 0.56 | 0.55 | 0.57 [0.52–0.62] | 0.58 [0.54–0.61] |
†Performance of Model-5 on the X-ray images used in Model-1
‡Performance of Model-5 on the downsampled X-ray images used in Model-2
PR AUC, area under the precision–recall curve; ROC AUC, area under the receiver operating characteristic curve; CI, confidence interval
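The threshold-based columns in these tables can be computed from a confusion matrix. The tables do not state whether precision, recall, and F1 refer to the rHOA class alone or are averaged over both classes. The sketch below assumes rHOA (label 1) is the positive class. It returns fractions, whereas the tables report accuracy and balanced accuracy as percentages. `binary_metrics` is a hypothetical helper name.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    balanced_accuracy: float
    precision: float
    recall: float
    f1: float


def _safe_div(num: float, den: float) -> float:
    return num / den if den else 0.0


def binary_metrics(y_true: list[int], y_pred: list[int]) -> BinaryMetrics:
    """Metrics with label 1 = rHOA (assumed positive class) and 0 = no-rHOA."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)

    sensitivity = _safe_div(tp, tp + fn)  # recall of the rHOA class
    specificity = _safe_div(tn, tn + fp)  # recall of the no-rHOA class
    precision = _safe_div(tp, tp + fp)
    return BinaryMetrics(
        accuracy=_safe_div(tp + tn, len(pairs)),
        # Averaging the per-class recalls keeps the larger no-rHOA class
        # from dominating the score.
        balanced_accuracy=(sensitivity + specificity) / 2,
        precision=precision,
        recall=sensitivity,
        f1=_safe_div(2 * precision * sensitivity, precision + sensitivity),
    )
```

The following example uses made-up predictions on a set the size of the CT-AP test split (15 rHOA, 9 no-rHOA). With 10 true positives, 5 false negatives, 2 false positives, and 7 true negatives, accuracy is 17/24 (about 70.8%). Balanced accuracy is about 72.2%, which shows how the two measures diverge when classes are imbalanced.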
Performance of the five models in detecting radiographic hip osteoarthritis on the CT-AP images of the validation and test datasets. The CT-AP images were created by sequentially summing the CT slices. Model-1 was trained with unprocessed X-ray images. Model-2 was trained in the same way but with X-ray images downsampled to resemble CT-AP images. Model-3 and Model-4 were trained on a combination of the CT-AP and X-ray images, where the X-ray images were similar to those used for Model-1 and Model-2, respectively. Model-5 was trained solely on the CT-AP images.
| Trained models | Accuracy (%) | Balanced accuracy (%) | Precision | Recall | F1-score | PR AUC [95% CI] | ROC AUC [95% CI] |
|---|---|---|---|---|---|---|---|
| CT-AP images of the validation dataset | |||||||
| Model-3 | 79.2 | 79.0 | 0.79 | 0.78 | 0.78 | 0.90 [0.79–0.94] | 0.93 [0.73–0.98] |
| Model-4 | 87.5 | 83.3 | 0.83 | 0.92 | 0.87 | 0.88 [0.76–0.94] | 0.91 [0.69–0.99] |
| Model-5 | 83.3 | 83.3 | 0.83 | 0.82 | 0.83 | 0.76 [0.52–0.91] | 0.93 [0.71–0.99] |
| CT-AP images of the test dataset | |||||||
| Model-1 | 50.0 | 60.0 | 0.60 | 0.71 | 0.65 | 0.79 [0.63–0.93] | 0.73 [0.49–0.89] |
| Model-2 | 50.0 | 60.0 | 0.60 | 0.71 | 0.65 | 0.72 [0.55–0.89] | 0.63 [0.37–0.84] |
| Model-3 | 83.3 | 82.2 | 0.82 | 0.82 | 0.82 | 0.89 [0.79–0.94] | 0.93 [0.72–0.99] |
| Model-4 | 75.0 | 80.0 | 0.80 | 0.80 | 0.80 | 0.87 [0.77–0.94] | 0.87 [0.64–0.97] |
| Model-5 | 83.3 | 82.2 | 0.82 | 0.82 | 0.82 | 0.80 [0.57–0.92] | 0.89 [0.67–0.97] |
PR AUC, area under the precision–recall curve; ROC AUC, area under the receiver operating characteristic curve; CI, confidence interval
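The source does not describe how the 95% confidence intervals were obtained. A percentile bootstrap over test cases is one common approach, and the sketch below assumes it. It computes ROC AUC with the rank (Mann–Whitney) formulation. With only 24 CT-AP images per evaluation split, any resampling-based interval will be wide, which is consistent with the ranges in the CT-AP table. The function names and the defaults `n_boot=2000` and `seed=0` are illustrative choices.

```python
import numpy as np


def roc_auc(y_true, scores) -> float:
    """ROC AUC as the probability that a random rHOA case (label 1) scores
    higher than a random no-rHOA case (label 0); ties count as one half."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("both classes are needed to compute ROC AUC")
    diff = pos[:, None] - neg[None, :]  # every positive/negative pair
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def bootstrap_auc_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate plus a percentile bootstrap (1 - alpha) CI for ROC AUC."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    n = y_true.size
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample cases with replacement
        if np.unique(y_true[idx]).size < 2:
            continue  # AUC is undefined when a resample has only one class
        aucs.append(roc_auc(y_true[idx], scores[idx]))
    if not aucs:
        raise ValueError("no bootstrap resample contained both classes")
    lower, upper = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return roc_auc(y_true, scores), (float(lower), float(upper))
```

The pairwise formulation is O(n_pos × n_neg) in memory, which is fine for a 24-image set but would call for a rank-based implementation on the thousands of CHECK X-rays.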
Fig. 3 Classification performance of the models on the CT-AP images, shown as the area under the receiver operating characteristic (ROC) curve (AUC). (A) ROC AUCs of Model-3, Model-4, and Model-5 for the validation dataset. (B) ROC AUCs of all five models for the test dataset. Model-1 was trained with unprocessed X-ray images and Model-2 with only the downsampled X-ray images. Model-3 and Model-4 were trained on a combination of the CT-AP and X-ray images, where the X-ray images were similar to those used for Model-1 and Model-2, respectively. Model-5 was trained solely on the CT-AP images.
Fig. 4 Examples of radiographic hip osteoarthritis (rHOA) and no-rHOA features learned by a ResNet18 trained on a combination of CT-AP and X-ray images. The images shown are CT-AP images. The prediction probability (PP) is shown at the top right of each image, and the ground truth (GT) from manual grading is shown at the top left. Bright red areas indicate learned features that contribute most to the class with the highest predicted probability. (A) Misclassification of no-rHOA as rHOA. (B) Misclassification of rHOA as no-rHOA. (C) Different learned features in images correctly classified as rHOA.
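The caption does not name the method used to produce the heat maps. Grad-CAM is a widely used way to obtain class-specific maps like these from a CNN such as ResNet18, and the sketch below assumes it. The function takes feature maps from one convolutional layer and the gradients of the chosen class score with respect to them, which would have to be extracted from the trained network. `grad_cam` is a hypothetical helper, not code from the study.

```python
import numpy as np


def grad_cam(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heat map for one class from one convolutional layer.

    activations: feature maps A with shape (K, H, W), e.g. from the last ResNet18 block.
    gradients:   d(class score)/dA for the class of interest, same shape as A.
    Returns an (H, W) map scaled to [0, 1]; high values mark regions that
    increase the score of that class.
    """
    if activations.shape != gradients.shape:
        raise ValueError("activations and gradients must have the same shape")
    weights = gradients.mean(axis=(1, 2))  # one importance weight per channel
    cam = np.tensordot(weights, activations, axes=(0, 0))  # weighted sum -> (H, W)
    cam = np.maximum(cam, 0.0)  # keep only evidence for the class
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

The map has the coarse spatial resolution of the chosen layer. To produce overlays like those in Fig. 4, it would be upsampled to the CT-AP image size and rendered with a red color map.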