| Literature DB >> 35821056 |
Jarno T Huhtanen1,2, Mikko Nyman3, Dorin Doncenco4, Maral Hamedian4, Davis Kawalya4, Leena Salminen5, Roberto Blanco Sequeiros3, Seppo K Koskinen6, Tomi K Pudas6, Sami Kajander7, Pekka Niemi7, Jussi Hirvonen3, Hannu J Aronen3, Mojtaba Jafaritadi4.
Abstract
Joint effusion due to elbow fractures are common among adults and children. Radiography is the most commonly used imaging procedure to diagnose elbow injuries. The purpose of the study was to investigate the diagnostic accuracy of deep convolutional neural network algorithms in joint effusion classification in pediatric and adult elbow radiographs. This retrospective study consisted of a total of 4423 radiographs in a 3-year period from 2017 to 2020. Data was randomly separated into training (n = 2672), validation (n = 892) and test set (n = 859). Two models using VGG16 as the base architecture were trained with either only lateral projection or with four projections (AP, LAT and Obliques). Three radiologists evaluated joint effusion separately on the test set. Accuracy, precision, recall, specificity, F1 measure, Cohen's kappa, and two-sided 95% confidence intervals were calculated. Mean patient age was 34.4 years (1-98) and 47% were male patients. Trained deep learning framework showed an AUC of 0.951 (95% CI 0.946-0.955) and 0.906 (95% CI 0.89-0.91) for the lateral and four projection elbow joint images in the test set, respectively. Adult and pediatric patient groups separately showed an AUC of 0.966 and 0.924, respectively. Radiologists showed an average accuracy, sensitivity, specificity, precision, F1 score, and AUC of 92.8%, 91.7%, 93.6%, 91.07%, 91.4%, and 92.6%. There were no statistically significant differences between AUC's of the deep learning model and the radiologists (p value > 0.05). The model on the lateral dataset resulted in higher AUC compared to the model with four projection datasets. Using deep learning it is possible to achieve expert level diagnostic accuracy in elbow joint effusion classification in pediatric and adult radiographs. Deep learning used in this study can classify joint effusion in radiographs and can be used in image interpretation as an aid for radiologists.Entities:
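The abstract's headline metrics (accuracy, precision, recall/sensitivity, specificity, F1) all derive from the test-set confusion matrix. A minimal sketch of those definitions in Python; the counts below are illustrative only, not taken from the study:

```python
def classification_scores(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # recall = sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "specificity": specificity, "f1": f1}

# Illustrative counts only (not the study's data)
scores = classification_scores(tp=80, fp=10, fn=12, tn=113)
```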
Mesh:
Year: 2022 PMID: 35821056 PMCID: PMC9276721 DOI: 10.1038/s41598-022-16154-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Patient demographics in the pediatric and adult groups (from the main data registry).
| Patients’ demographics | Positive fat pad/effusion | Negative fat pad/effusion | p value* |
|---|---|---|---|
| Pediatrics (n = 490), mean age (range) | 8.6 (1–18) | 10.95 (0–18) | < 0.005 |
| Adults (n = 819), mean age (range) | 49.56 (19–97) | 49.85 (19–98) | 0.83 |
| Pediatrics, male (female) | 134 (121) | 109 (126) | 0.17 |
| Adults, male (female) | 146 (233) | 223 (217) | < 0.005 |
*P values from a two-sample t-test.
Patient demographics in different subsets.
| Patients’ demographics | Training | Validation | Test | Total |
|---|---|---|---|---|
| Mean age (whole population) | 34.6 | 35.5 | 33 | |
| Pediatric (1–18) | 248 | 80 | 79 | 407 |
| Adults (19–98) | 418 | 142 | 134 | 694 |
| Male | 324 | 101 | 91 | 516 |
| Female | 342 | 121 | 122 | 585 |
Figure 1Example of normal adult radiographs including AP, external oblique, internal oblique and lateral projections used in this study.
Figure 2AI study method diagram. NFP = Negative Fat Pads/joint effusion; PFP = Positive Fat Pads/joint effusion; DCNN = Deep Convolutional Neural Network; ROC = Receiver Operating Characteristic; AUC = Area Under the Curve.
The number of images in the training, validation, and test sets for the four-projection and lateral datasets.
| Data category | Train | Validation | Test |
|---|---|---|---|
| Four-projections | 2672 | 892 | 859 |
| Single lateral projection | 944 | 229 | 215 |
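The data were separated into the sets above at random. One way such a split can be implemented is sketched below; the seed and the roughly 60/20/20 fractions are illustrative assumptions chosen to approximate the reported 2672/892/859 counts, not details taken from the paper:

```python
import random

def split_dataset(items, train_frac=0.60, val_frac=0.20, seed=42):
    """Shuffle and split items into train/validation/test partitions.
    Fractions and seed are illustrative, not taken from the study."""
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    n = len(items)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(range(4423))
```

Splitting should be done at the patient level in practice, so that projections of the same elbow never appear in both training and test sets.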
The different DCNN models trained in this study and their descriptions.
| Model | Description | Number of layers/trainable parameters |
|---|---|---|
| VGG16 | A 16-layer architecture consisting of convolutional layers, max-pooling layers, and 3 fully connected layers at the end. It is a deep network but uses small 3 × 3 convolutional filters throughout | 16 layers 138.4M parameters |
| DenseNet201 | A CNN architecture consisting of densely connected blocks, where each layer receives the output feature maps of the preceding layers as input. It has two block types: dense blocks, containing batch normalization, ReLU activation, and 3 × 3 convolutional layers; and transition layers, consisting of batch normalization, 1 × 1 convolution, and average-pooling layers. A transition block is placed after each dense block | 402 layers 20.2M parameters |
| MobileNet | An architecture that uses depth-wise separable convolutions, thereby reducing the number of parameters. These convolutions consist of two operations: a depth-wise convolution for filtering, and a point-wise (1 × 1) convolution for combining the outputs of the depth-wise convolution | 55 layers 4.3M parameters |
| ResNet152 | The main feature of the ResNet architecture is its residual blocks, which use shortcut connections to skip layers. Each residual block consists of two convolutional layers with batch normalization and ReLU activation, using 3 × 3 filters with stride 1. ResNet is known for mitigating the vanishing gradient problem | 307 layers 60.4M parameters |
| InceptionV3 | A CNN model made of symmetric and asymmetric building blocks consisting of convolutions, average pooling, max pooling, dropout, and fully connected layers. The convolutions are factorized, which reduces the number of learnable parameters | 189 layers 23.9M parameters |
| NASNetLarge | Stands for Neural Architecture Search Network; the architecture search is typically run on a small dataset and the resulting design transferred to larger ones. It automates network architecture engineering, identifying and evaluating the performance of candidate architecture designs without training them. It also uses a regularization technique called ScheduledDropPath | 533 layers 88.9M parameters |
| CheXNet | A 121-layer convolutional neural network that takes a chest X-ray image as input and outputs the probability of a pathology | 121 layers 6.9M parameters |
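The table notes that VGG16 relies on small 3 × 3 filters throughout. The usual motivation is that two stacked 3 × 3 convolutions cover the same 5 × 5 receptive field as a single 5 × 5 convolution while using fewer parameters and adding an extra non-linearity. A quick arithmetic check (the channel width is illustrative, not tied to the study):

```python
def conv_params(k, c_in, c_out, bias=True):
    """Parameter count of a single k x k convolution layer."""
    return k * k * c_in * c_out + (c_out if bias else 0)

c = 256  # illustrative channel width
two_3x3 = conv_params(3, c, c) + conv_params(3, c, c)  # receptive field 5 x 5
one_5x5 = conv_params(5, c, c)                         # receptive field 5 x 5
```

With 256 channels, the stacked 3 × 3 pair needs about 1.18M parameters versus about 1.64M for the single 5 × 5 layer.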
Figure 3Architecture of the modified VGG16 model trained for this paper.
Figure 4Comparison of the different deep learning models by true positive rate (A) and precision (B).
Figure 5ROC AUC (A) and AUPRC (B) obtained from best training iterations for Models A (Lateral Projection data) and B (4-Projections data). AUC = Area Under the Curve and AUPRC = Area Under the Precision Recall Curve.
Figure 6AUC curves obtained from best training iterations for Model A with lateral projection data (A) and Model B with 4-projections data (B) in pediatric and adult patients. AUC = Area Under the Curve.
Figure 7Averaged confusion matrices for Models A and B over the 10 iterations. The confusion matrices show the performance of a deep learning model trained with only the lateral projection (A) and a model trained with four projections (B).
Classification scores of Model A and Model B. Values (except F1-score and AUC) are shown as percentages. Confidence intervals are two-sided at the 95% level. (P) = pediatric patients only, (A) = adult patients only.
| Model | Precision | Sensitivity | Specificity | Accuracy | F1-score | AUC |
|---|---|---|---|---|---|---|
| Model A | 86.8% (83.3–90.3) | 88.5% (87.0–90.1) | 90.2% (87.1–93.3) | 89.5% (88.0–91.0) | 0.876 (0.86–0.89) | 0.951 (0.94–0.96) |
| Model B | 77.9% (75.1–80.8) | 82.2% (76.8–87.5) | 83.1% (79.4–86.7) | 82.7% (81.8–83.6) | 0.797 (0.78–0.81) | 0.906 (0.89–0.91) |
| Model A, (P) | 86.4% (83.7–89.1) | 84.9% (82.4–87.3) | 85.3% (81.7–88.9) | 85.1% (83.6–86.5) | 0.856 (0.84–0.87) | 0.924 (0.91–0.93) |
| Model A, (A) | 87.3% (82.7–91.8) | 91.7% (89.9–93.4) | 92.4% (89.2–95.6) | 92.1% (90.2–94.0) | 0.893 (0.87–0.92) | 0.966 (0.96–0.97) |
| Model B, (P) | 83.4% (81.0–85.7) | 75.7% (73.0–78.3) | 83.3% (80.1–86.4) | 79.3% (77.7–81.0) | 0.793 (0.78–0.81) | 0.866 (0.85–0.88) |
| Model B, (A) | 83.3% (80.4–86.2) | 78.0% (74.5–81.5) | 91.3% (89.1–93.6) | 86.7% (85.8–87.5) | 0.804 (0.79–0.82) | 0.924 (0.92–0.93) |
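The table reports two-sided 95% confidence intervals for the percentage metrics. A common way to obtain such an interval is the normal approximation for a proportion; the sketch below uses Model A's accuracy and the 859-image test set as an example, but the paper does not state which interval method was actually used:

```python
import math

def prop_ci_95(p, n):
    """Two-sided 95% normal-approximation (Wald) CI for a proportion p
    observed over n cases. The method choice is an assumption."""
    z = 1.959963984540054  # 97.5th percentile of the standard normal
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

lo, hi = prop_ci_95(0.895, 859)  # e.g. accuracy of 89.5% over 859 test images
```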
Figure 8Pediatric (male, 10y) and adult (female, 19y) patients with joint effusion (A,C) seen anteriorly (orange arrow) and posteriorly (blue arrow) but without visible fracture. The heat map highlights the joint effusion (B,D). Pediatric (female, 8y) and adult (female, 36y) patients with no joint effusion (E,G) anteriorly (orange arrow) or posteriorly (blue arrow) and without visible fracture. The heat map shows no highlighting in the normal joints (F,H).
AI model and Radiologist performance comparisons on the lateral elbow test set.
| Metric | Model A | Radiologist 1 | Radiologist 2 | Radiologist 3 |
|---|---|---|---|---|
| AUC | 0.951 | 0.927 | 0.923 | 0.928 |
| Accuracy | 89.8% | 93% | 92.5% | 93% |
| Sensitivity | 88.8% | 92.1% | 91.01% | 92.1% |
| Specificity | 90.5% | 93.6% | 93.6% | 93.6% |
| Precision | 86.8% | 91.1% | 91.01% | 91.1% |
| F1 score | 87.8% | 91.6% | 91.01% | 91.6% |
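The abstract also reports Cohen's kappa, which measures agreement between two raters (e.g. model vs. radiologist) beyond what chance alone would produce. A minimal sketch for binary labels; the label pairs below are made up for illustration:

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters' binary (0/1) labels."""
    n = len(labels_a)
    agree = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    p_a1 = sum(labels_a) / n          # rater A's positive rate
    p_b1 = sum(labels_b) / n          # rater B's positive rate
    chance = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    return (agree - chance) / (1 - chance)

# Illustrative labels only
kappa = cohens_kappa([1, 1, 0, 0, 1, 0, 1, 0], [1, 1, 0, 0, 0, 0, 1, 1])
```

Kappa is 1 for perfect agreement, 0 for chance-level agreement, and negative when raters agree less often than chance would predict.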