Claudio E von Schacky, Nikolas J Wilhelm, Valerie S Schäfer, Yannik Leonhardt, Matthias Jung, Pia M Jungmann, Maximilian F Russe, Sarah C Foreman, Felix G Gassert, Florian T Gassert, Benedikt J Schwaiger, Carolin Mogler, Carolin Knebel, Ruediger von Eisenhart-Rothe, Marcus R Makowski, Klaus Woertler, Rainer Burgkart, Alexandra S Gersing.
Abstract
OBJECTIVES: To develop and validate machine learning models that distinguish between benign and malignant bone lesions, and to compare their performance with that of radiologists.
Keywords: Bone neoplasms; Diagnostic imaging; Machine learning; Musculoskeletal system; Radiography
Year: 2022 PMID: 35396665 PMCID: PMC9381439 DOI: 10.1007/s00330-022-08764-w
Source DB: PubMed Journal: Eur Radiol ISSN: 0938-7994 Impact factor: 7.034
Subject characteristics*
| Subject characteristics | Overall | Training set | Validation set | Test set | External test set |
|---|---|---|---|---|---|
| Age (years) | 33.1 ± 19.4 | 34.0 ± 19.9 | 31.8 ± 17.3 | 30.3 ± 18.5 | 31.7 ± 22.1 |
| Women | 395 (44.9%) | 275 (44.8%) | 60 (45.1%) | 60 (45.1%) | 40 (41.7%) |
| Malignant tumors | 213 (24.2%) | 149 (24.3%) | 32 (24.1%) | 32 (24.1%) | 31 (32.3%) |
| Chondrosarcoma | 87 (9.8%) | 61 (9.9%) | 13 (9.8%) | 13 (9.8%) | 11 (11.5%) |
| Osteosarcoma | 34 (3.8%) | 24 (3.9%) | 5 (3.8%) | 5 (3.8%) | 7 (7.3%) |
| Ewing’s sarcoma | 32 (3.6%) | 22 (3.6%) | 5 (3.8%) | 5 (3.8%) | 5 (5.2%) |
| Plasma cell myeloma | 28 (3.2%) | 20 (3.3%) | 4 (3.0%) | 4 (3.0%) | 4 (4.2%) |
| NHL B cell | 26 (2.9%) | 18 (2.9%) | 4 (3.0%) | 4 (3.0%) | 4 (4.2%) |
| Chordoma | 6 (0.6%) | 4 (0.6%) | 1 (0.7%) | 1 (0.7%) | 0 (0%) |
| Benign tumors | 667 (75.8%) | 465 (75.7%) | 101 (75.9%) | 101 (75.9%) | 65 (67.7%) |
| Osteochondroma | 228 (25.9%) | 160 (26.1%) | 34 (25.6%) | 34 (25.6%) | 16 (16.7%) |
| Enchondroma | 153 (17.4%) | 107 (17.4%) | 23 (17.3%) | 23 (17.3%) | 12 (12.5%) |
| Chondroblastoma | 19 (2.2%) | 13 (2.1%) | 3 (2.3%) | 3 (2.3%) | 2 (2.1%) |
| Osteoid osteoma | 19 (2.2%) | 13 (2.1%) | 3 (2.3%) | 3 (2.3%) | 1 (1.0%) |
| Giant cell tumor of bone | 44 (4.7%) | 30 (4.6%) | 7 (5.0%) | 7 (5.0%) | 6 (6.2%) |
| Non-ossifying fibroma | 34 (3.9%) | 24 (3.9%) | 5 (3.8%) | 5 (3.8%) | 7 (7.3%) |
| Haemangioma | 12 (1.4%) | 8 (1.3%) | 2 (1.5%) | 2 (1.5%) | 3 (3.1%) |
| Aneurysmal bone cyst | 82 (9.3%) | 58 (9.4%) | 12 (9.0%) | 12 (9.0%) | 8 (8.3%) |
| Simple bone cyst | 24 (2.7%) | 16 (2.6%) | 4 (3.0%) | 4 (3.0%) | 5 (5.2%) |
| Fibrous dysplasia | 52 (5.9%) | 36 (5.9%) | 8 (6.0%) | 8 (6.0%) | 5 (5.2%) |
| Torso/head | 118 (13.4%) | 79 (12.9%) | 16 (12.0%) | 23 (17.3%) | 16 (16.7%) |
| Upper extremity | 234 (26.6%) | 166 (27.0%) | 28 (21.1%) | 40 (30.0%) | 29 (30.2%) |
| Lower extremity | 528 (60.0%) | 369 (60.1%) | 89 (66.9%) | 70 (52.6%) | 51 (53.1%) |
*Data are given as mean ± standard deviation; data in parentheses are percentages. The internal data set was split into training, validation, and test sets in a 70%/15%/15% ratio. An external test set obtained from a different institution was included for further independent testing. Malignant tumors included chondrosarcoma, osteosarcoma, Ewing’s sarcoma, chordoma, plasma cell myeloma, and B-cell non-Hodgkin’s lymphoma (NHL). Benign tumors included osteochondroma, enchondroma, chondroblastoma, osteoid osteoma, non-ossifying fibroma (NOF), giant cell tumor of bone, haemangioma, simple and aneurysmal bone cyst, and fibrous dysplasia
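The 70%/15%/15% split described in the footnote can be illustrated with a short sketch. The source does not say whether the split was stratified by class; the near-identical malignant/benign proportions across the internal sets suggest it, so stratification is an assumption here. The function name `stratified_split`, the seed, and the rounding rule are hypothetical, and the resulting set sizes may differ from the table by a few cases.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Return (train, val, test) index lists with class proportions preserved.

    Assumption: each class is shuffled and split separately; the remainder
    after rounding the train and validation shares goes to the test set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return train, val, test


# Class counts taken from the table: 213 malignant and 667 benign lesions.
labels = ["malignant"] * 213 + ["benign"] * 667
train_idx, val_idx, test_idx = stratified_split(labels)
```

Splitting each class separately keeps the malignant share close to 24% in every subset, which matters when one class is roughly three times larger than the other.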
Fig. 1 Overview of the utilized pipeline. The image and binary mask are fed to the PyRadiomics model to extract all relevant radiomic features. The extracted features and clinical information are then passed to an ANN to distinguish between benign and malignant tumors
Fig. 2 Visualization of the 10 most important features of the random forest classifier, with their relative importance
Performance on the 10 most significant radiomic and demographic features alone
| Feature | AUC | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|
| age | 0.49 ± 0.01 | 0.58 ± 0.01 | 0.33 ± 0.09 | 0.67 ± 0.05 |
| wavelet-LLH_firstorder_TotalEnergy | 0.64 ± 0.01 | 0.6 ± 0.05 | 0.73 ± 0.03 | 0.55 ± 0.08 |
| wavelet-HHH_firstorder_TotalEnergy | 0.65 ± 0.02 | 0.61 ± 0.03 | 0.71 ± 0.04 | 0.57 ± 0.05 |
| wavelet-LHH_firstorder_TotalEnergy | 0.63 ± 0.01 | 0.62 ± 0.01 | 0.71 ± 0.04 | 0.59 ± 0.03 |
| wavelet-LLH_firstorder_Energy | 0.61 ± 0.01 | 0.58 ± 0.02 | 0.56 ± 0.05 | 0.59 ± 0.05 |
| wavelet-HLH_firstorder_TotalEnergy | 0.65 ± 0.01 | 0.55 ± 0.03 | 0.76 ± 0.03 | 0.48 ± 0.05 |
| wavelet-HHH_firstorder_Energy | 0.6 ± 0.01 | 0.59 ± 0.03 | 0.56 ± 0.1 | 0.59 ± 0.07 |
| original_firstorder_TotalEnergy | 0.59 ± 0.02 | 0.52 ± 0.02 | 0.78 ± 0.03 | 0.42 ± 0.03 |
| wavelet-LHL_firstorder_TotalEnergy | 0.6 ± 0.01 | 0.55 ± 0.01 | 0.67 ± 0.03 | 0.51 ± 0.03 |
| wavelet-HLL_firstorder_TotalEnergy | 0.57 ± 0.02 | 0.52 ± 0.02 | 0.72 ± 0.03 | 0.45 ± 0.04 |
*Data are given as mean ± standard deviation
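As a reminder of what the AUC column measures, the following minimal sketch computes a rank-based AUC for a single feature. The source does not describe how the single-feature scores were obtained, so this is only one plausible way to evaluate a feature on its own; the function name `roc_auc` and the toy values are hypothetical and not taken from the study.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy feature values (hypothetical); 1 = malignant, 0 = benign.
feature_values = [0.9, 0.8, 0.4, 0.35, 0.3, 0.1]
is_malignant = [1, 1, 0, 1, 0, 0]
auc = roc_auc(feature_values, is_malignant)
```

An AUC near 0.5, such as the value reported for age alone, means the feature ranks malignant and benign cases no better than chance.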
Classification performance of the models on the internal test set using demographic information alone, radiomic features alone, or both combined. Three model architectures were used: a random forest classifier (RFC), a Gaussian naïve Bayes classifier (GNB), and an artificial neural network (ANN)*
| Model architecture | Score | Demographic features | Radiomic features | Combined: radiomic + demographic features |
|---|---|---|---|---|
| RFC | AUC | 0.75 | 0.73 | 0.76 |
| RFC | Accuracy | 0.76 (101/133; 95% CI: 0.68, 0.83) | 0.59 (78/133; 95% CI: 0.50, 0.67) | 0.60 (80/133; 95% CI: 0.51, 0.69) |
| RFC | Sensitivity | 0.41 (13/32; 95% CI: 0.24, 0.59) | 0.84 (27/32; 95% CI: 0.67, 0.95) | 0.81 (26/32; 95% CI: 0.64, 0.93) |
| RFC | Specificity | 0.87 (88/101; 95% CI: 0.79, 0.93) | 0.50 (51/101; 95% CI: 0.40, 0.61) | 0.53 (54/101; 95% CI: 0.43, 0.63) |
| GNB | AUC | 0.72 | 0.68 | 0.68 |
| GNB | Accuracy | 0.44 (59/133; 95% CI: 0.36, 0.53) | 0.76 (101/133; 95% CI: 0.68, 0.83) | 0.76 (101/133; 95% CI: 0.68, 0.83) |
| GNB | Sensitivity | 0.92 (29/32; 95% CI: 0.75, 0.98) | 0.44 (14/32; 95% CI: 0.26, 0.62) | 0.44 (14/32; 95% CI: 0.26, 0.62) |
| GNB | Specificity | 0.29 (29/101; 95% CI: 0.20, 0.39) | 0.86 (87/101; 95% CI: 0.78, 0.92) | 0.86 (87/101; 95% CI: 0.78, 0.92) |
| ANN | AUC | 0.59 | 0.71 | 0.79 |
| ANN | Accuracy | 0.67 (89/133; 95% CI: 0.58, 0.75) | 0.75 (100/133; 95% CI: 0.67, 0.82) | 0.80 (107/133; 95% CI: 0.73, 0.87) |
| ANN | Sensitivity | 0.38 (12/32; 95% CI: 0.21, 0.56) | 0.66 (21/32; 95% CI: 0.47, 0.81) | 0.75 (24/32; 95% CI: 0.57, 0.89) |
| ANN | Specificity | 0.76 (77/101; 95% CI: 0.67, 0.84) | 0.78 (79/101; 95% CI: 0.69, 0.86) | 0.82 (83/101; 95% CI: 0.73, 0.89) |
*Values in parentheses are the underlying proportions (number of cases) and 95% confidence intervals (CI). Areas under the curve (AUC) were obtained from receiver operating characteristic (ROC) curves
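The confidence intervals in the table are attached to simple proportions such as 24/32. The source does not state which interval method was used; the sketch below assumes the exact binomial (Clopper-Pearson) interval, which is a common choice for sensitivity and specificity. The function names and the bisection tolerance are hypothetical choices made for this illustration.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root(f, tol=1e-12):
    """Bisection root of a function that increases from negative to positive on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for k successes in n trials."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _root(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _root(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


# Sensitivity of the combined ANN on the internal test set: 24 of 32 malignant cases.
sens_lower, sens_upper = clopper_pearson(24, 32)
```

With only 32 malignant cases in the internal test set, any interval for sensitivity is wide, which is worth keeping in mind when comparing models in the table.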
Fig. 3 A shows the receiver operating characteristic (ROC) curves on the internal test set for three artificial neural networks (ANN). One ANN was based on demographic information alone (red). Another ANN was based on radiomic features alone (yellow). A third ANN was based on the combination of demographic information and radiomic features (blue). The ANN based on both demographic information and radiomic features displayed the highest discriminatory power. B shows the ROC curve on the external test set for the ANN combining demographic and radiomic features
Fig. 4 A shows the confusion matrix of the overall best-performing model, an artificial neural network (ANN) combining both radiomic and demographic information, on the internal test set. B shows the confusion matrix of the same model on the external test set, obtained from another institution for further independent testing
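The relationship between a confusion matrix and the reported scores can be sketched as follows. The four counts are derived from the proportions given for the combined ANN on the internal test set (24/32 malignant and 83/101 benign cases classified correctly), not read from the figure itself; the class name `ConfusionMatrix` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # malignant, predicted malignant
    fn: int  # malignant, predicted benign
    tn: int  # benign, predicted benign
    fp: int  # benign, predicted malignant

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        total = self.tp + self.fn + self.tn + self.fp
        return (self.tp + self.tn) / total


# Counts derived from the table: 24 of 32 malignant and 83 of 101 benign correct.
ann_internal = ConfusionMatrix(tp=24, fn=8, tn=83, fp=18)
```

Because benign lesions outnumber malignant ones about three to one, accuracy is dominated by specificity, so sensitivity should be read separately when judging how many malignant tumors are missed.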
Fig. 5 A and B: Example of a malignant tumor in the tibia of a 33-year-old male with a chondrosarcoma. A shows the radiograph and B shows the segmentation used for the radiomics extraction. The artificial neural network model combining both demographic and radiomic information correctly predicted a malignant tumor with a certainty of 86%. C and D: Example of a benign tumor in the proximal tibia of a 15-year-old male with a non-ossifying fibroma. C shows the radiograph and D shows the segmentation used for the radiomics extraction. The artificial neural network model combining both demographic and radiomic information correctly predicted a benign tumor with a certainty of 93%
Fig. 6 A and B: Example of a misclassified tumor from a 41-year-old female with an enchondroma and a pathological fracture through the tumor. A shows the radiograph and B shows the segmentation used for the radiomics extraction. The artificial neural network model combining both demographic and radiomic information incorrectly classified this tumor as malignant with a certainty of 54%. C and D: Example of a malignant tumor from a 45-year-old diagnosed with a chondrosarcoma. C shows the radiograph and D shows the segmentation used for the radiomics extraction. The artificial neural network model combining both demographic and radiomic information correctly predicted a malignant tumor with a certainty of 67%