| Literature DB >> 35162030 |
Fırat Hardalaç1, Fatih Uysal1,2, Ozan Peker1, Murat Çiçeklidağ3, Tolga Tolunay3, Nil Tokgöz4, Uğurhan Kutbay1, Boran Demirciler5, Fatih Mert5.
Abstract
Hospitals, especially their emergency services, receive a high number of wrist fracture cases. For correct diagnosis and proper treatment of these, images obtained from various medical equipment must be viewed by physicians, along with the patient's medical records and physical examination. The aim of this study is to perform fracture detection by use of deep-learning on wrist X-ray images to support physicians in the diagnosis of these fractures, particularly in the emergency services. Using SABL, RegNet, RetinaNet, PAA, Libra R-CNN, FSAF, Faster R-CNN, Dynamic R-CNN and DCN deep-learning-based object detection models with various backbones, 20 different fracture detection procedures were performed on Gazi University Hospital's dataset of wrist X-ray images. To further improve these procedures, five different ensemble models were developed and then used to reform an ensemble model to develop a unique detection model, 'wrist fracture detection-combo (WFD-C)'. From 26 different models for fracture detection, the highest detection result obtained was 0.8639 average precision (AP50) in the WFD-C model. Huawei Turkey R&D Center supports this study within the scope of the ongoing cooperation project coded 071813 between Gazi University, Huawei and Medskor.Entities:
Keywords: X-ray; artificial intelligence; biomedical image processing; bone fractures; deep learning; fracture detection; object detection; transfer learning; wrist
Mesh:
Year: 2022 PMID: 35162030 PMCID: PMC8838335 DOI: 10.3390/s22031285
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1The anatomy of the wrist.
Figure 2Deep-learning based detection models using transfer learning.
Figure 3Single-stage object detectors (RetinaNet, FSAF, etc.).
Figure 4Two-stage object detectors (Dynamic R-CNN, Faster R-CNN, etc.).
PAA and Dynamic R-CNN bbox numbers according to different threshold values.
| Threshold Value | PAA Bbox | Dynamic R-CNN Bbox |
|---|---|---|
| 0.1 | 4317 | 291 |
| 0.2 | 1098 | 176 |
| 0.3 | 415 | 124 |
| 0.4 | 159 | 97 |
| 0.5 | 75 | 79 |
| 0.6 | 36 | 71 |
| 0.7 | 4 | 65 |
| 0.8 | 0 | 57 |
| 0.9 | 0 | 48 |
Figure 5PAA and Dynamic R-CNN bbox numbers according to different threshold values.
Figure 6Weight boxes fusion ensemble.
Figure 7WFD ensemble models based on weight boxes fusion.
Figure 8Proposed WFD-C ensemble models based on weight boxes fusion.
Submodels and weight coefficients of WFD ensemble models.
| Ensemble Models | Model-1 | Model-2 | Model-3 | Model-4 | Model-5 | Weight Coefficients |
|---|---|---|---|---|---|---|
| WFD-1 (single stage) | RegNet | FSAF | RetinaNet | SABL Rt.Net | PAA | (1, 5, 5, 5, 5) |
| WFD-2 (two stage) | DCN | SABL Fs. R-CNN | - | - | - | (2, 2) |
| WFD-3 (AP50) | RegNet | PAA | FSAF | Libra Rt.Net | - | (3, 3, 3, 4) |
| WFD-4 (AR) | RegNet | PAA | FSAF | SABL Rt.Net | - | (1, 3, 3, 1) |
| WFD-5 (LRP-opt.) | FSAF | Fs. R-CNN | DCN | Libra Rt.Net | RetinaNet | (2, 1, 4, 2, 3) |
| WFD-C (combo) | WFD-1 | WFD-3 | WFD-4 | WFD-5 | WFD-2 | (4, 4, 3, 5, 5) |
Figure 9The proposed models for fracture detection of wrist X-ray images.
Figure 10The wrist X-ray dataset.
Details of wrist fracture dataset.
| X-ray Device | Samsung GC70 | Time Period of Collection | 2010–2020 |
|---|---|---|---|
| X-ray images and total fractures | 542, 569 | Specialist physicians | 1 radiologist, 2 orthopedists |
| Dataset (train, validation, test) | %80, %10, %10 | Fracture types | radius and ulna |
| Training data and fractures | 434, 459 | Patients (female, male, total) | 134, 141, 275 |
| Validation data and fractures | 54, 55 | Average age, pediatrics, adult | 45, 21, 254 |
| Test data and fractures | 54, 56 | Patients with fractures in both wrists (train, validation, test, total) | 10, 0, 1, 11 |
Figure 11The wrist X-ray images after data preprocessing.
Figure 12The wrist X-ray images after data augmentation.
Evaluation metrics definition and calculations.
| Evaluation Metrics | Definition and Calculations |
|---|---|
| Intersection over Union (IOU) | area (BBoxp ∩ Bboxg)/area (BBoxp U Bboxg) |
| True Positive (TP) | IUO ≥ 0.5 |
| False Positive (FP) | IUO < 0.5 |
| False Negative (FN) | failing to detect Bboxg |
| True Negative (TN) | can’t be used (infinite) |
| Precision (P) | TP/(TP + FP) = TP/all detections |
| Recall (R) | TP/(TP + FN) = TP/all ground truths |
| Average Precision (AP) | under area of P-R curve |
| Average Recall (AR) | twice the under area of R-IOU curve |
| Optimal Localization Recall Precision (oLRP) | minimum achievable average matching error over the confidence scores |
Figure 13Validation AP50 and AR results of detection models.
Figure 14Validation LRP-optimal threshold (LRPt), oLRPLoc (oLRPL), oLRPFP and oLRPFN results of detection models. (* with Aug.).
Training loss and epoch results for the highest validation AP50 in detection models.
| Models | Without Augmentation | With Augmentation | ||||||
|---|---|---|---|---|---|---|---|---|
| TB_Loss | T_Loss | TT | Best Epoch | TB_Loss | T_Loss | TT | Best Epoch | |
| DCN Faster R-CNN | 0.0938 | 0.1821 | 174 | 6 | 0.0969 | 0.1839 | 174 | 5 |
| Dynamic R-CNN | 0.2978 | 0.5033 | 139 | 8 | 0.2664 | 0.4605 | 156 | 10 |
| Faster R-CNN | 0.0873 | 0.1573 | 126 | 10 | 0.0864 | 0.1634 | 137 | 12 |
| FSAF | 0.3142 | 0.5419 | 130 | 7 | 0.2605 | 0.4718 | 128 | 8 |
| RetinaNet | 0.3103 | 0.4987 | 120 | 16 | 0.348 | 0.576 | 120 | 8 |
| Libra RetinaNet | 0.5033 | 0.7278 | 131 | 8 | 0.5374 | 0.7813 | 130 | 6 |
| PAA | 0.3256 | 0.8387 | 127 | 6 | 0.33 | 0.834 | 104 | 7 |
| RegNet RetinaNet | 0.315 | 0.543 | 256 | 8 | 0.313 | 0.539 | 225 | 7 |
| SABL Faster R-CNN | 0.0536 | 0.2244 | 243 | 8 | 0.0501 | 0.2140 | 212 | 12 |
| SABL RetinaNet | 0.1034 | 0.3817 | 132 | 21 | 0.1857 | 0.6559 | 123 | 6 |
Validation AP50 and AR results of detection models.
| Models | Without Augmentation | With Augmentation | ||
|---|---|---|---|---|
| AP50 | AR | AP50 | AR | |
| DCN Faster R-CNN | 0.599 | 0.365 | 0.619 | 0.393 |
| Dynamic R-CNN | 0.77 | 0.416 | 0.777 | 0.375 |
| Faster R-CNN | 0.61 | 0.362 | 0.598 | 0.404 |
| FSAF | 0.579 | 0.415 | 0.684 | 0.46 |
| RetinaNet | 0.634 | 0.436 | 0.668 | 0.44 |
| Libra RetinaNet | 0.691 | 0.456 | 0.679 | 0.496 |
| PAA | 0.617 | 0.498 | 0.629 | 0.547 |
| RegNet RetinaNet | 0.609 | 0.458 | 0.685 | 0.471 |
| SABL Faster R-CNN | 0.632 | 0.429 | 0.658 | 0.386 |
| SABL RetinaNet | 0.67 | 0.436 | 0.67 | 0.456 |
Validation LRP-optimal threshold (LRPt), oLRPLoc (oLRPL), oLRPFP and oLRPFN results of detection models.
| Models | Without Augmentation | With Augmentation | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| LRPt | oLRPL | oLRPFP | oLRPFN | oLRP | LRPt | oLRPL | oLRPFP | oLRPFN | oLRP | |
| DCN Faster R-CNN | 0.681 | 0.309 | 0.216 | 0.473 | 0.824 | 0.506 | 0.318 | 0.261 | 0.382 | 0.815 |
| Dynamic R-CNN | 0.88 | 0.321 | 0.2 | 0.345 | 0.798 | 0.775 | 0.318 | 0.208 | 0.236 | 0.769 |
| Faster R-CNN | 0.553 | 0.329 | 0.404 | 0.382 | 0.851 | 0.641 | 0.311 | 0.319 | 0.418 | 0.827 |
| FSAF | 0.322 | 0.304 | 0.397 | 0.364 | 0.824 | 0.404 | 0.3 | 0.271 | 0.364 | 0.794 |
| RetinaNet | 0.441 | 0.317 | 0.2 | 0.418 | 0.814 | 0.439 | 0.304 | 0.25 | 0.4 | 0.804 |
| Libra RetinaNet | 0.465 | 0.295 | 0.159 | 0.327 | 0.756 | 0.437 | 0.273 | 0.234 | 0.345 | 0.752 |
| PAA | 0.503 | 0.31 | 0.443 | 0.382 | 0.842 | 0.552 | 0.28 | 0.314 | 0.364 | 0.783 |
| RegNet RetinaNet | 0.421 | 0.294 | 0.292 | 0.382 | 0.797 | 0.499 | 0.298 | 0.205 | 0.364 | 0.779 |
| SABL Faster R-CNN | 0.46 | 0.331 | 0.4 | 0.345 | 0.846 | 0.436 | 0.345 | 0.403 | 0.339 | 0.859 |
| SABL RetinaNet | 0.591 | 0.19 | 0.382 | 0.382 | 0.763 | 0.534 | 0.28 | 0.238 | 0.418 | 0.783 |
Figure 15Test AP50 and AR results of detection models.
Figure 16Test LRP-optimal threshold (LRPt), oLRPLoc (oLRPL), oLRPFP and oLRPFN results of detection models. (* with Aug.).
Test AP50 and AR results of detection models.
| Models | Without Augmentation | With Augmentation | ||
|---|---|---|---|---|
| AP50 | AR | AP50 | AR | |
| DCN Faster R-CNN | 0.547 | 0.391 | 0.577 | 0.323 |
| Dynamic R-CNN | 0.63 | 0.341 | 0.654 | 0.323 |
| Faster R-CNN | 0.617 | 0.343 | 0.624 | 0.395 |
| FSAF | 0.739 | 0.398 | 0.746 | 0.412 |
| RetinaNet | 0.621 | 0.377 | 0.652 | 0.338 |
| Libra RetinaNet | 0.695 | 0.432 | 0.715 | 0.445 |
| PAA | 0.666 | 0.491 | 0.754 | 0.496 |
| RegNet RetinaNet | 0.713 | 0.452 | 0.74 | 0.468 |
| SABL Faster R-CNN | 0.622 | 0.427 | 0.595 | 0.402 |
| SABL RetinaNet | 0.613 | 0.386 | 0.65 | 0.404 |
Test LRP-optimal threshold (LRPt), oLRPLoc (oLRPL), oLRPFP and oLRPFN results of detection models.
| Models | Without Augmentation | With Augmentation | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| LRPt | oLRPL | oLRPFP | oLRPFN | oLRP | LRPt | oLRPL | oLRPFP | oLRPFN | oLRP | |
| DCN Faster R-CNN | 0.69 | 0.343 | 0.306 | 0.554 | 0.883 | 0.47 | 0.335 | 0.364 | 0.375 | 0.848 |
| Dynamic R-CNN | 0.825 | 0.349 | 0.302 | 0.339 | 0.845 | 0.784 | 0.352 | 0.362 | 0.339 | 0.858 |
| Faster R-CNN | 0.644 | 0.339 | 0.308 | 0.357 | 0.839 | 0.54 | 0.35 | 0.419 | 0.357 | 0.869 |
| FSAF | 0.354 | 0.338 | 0.236 | 0.25 | 0.803 | 0.371 | 0.322 | 0.241 | 0.214 | 0.777 |
| RetinaNet | 0.343 | 0.357 | 0.359 | 0.268 | 0.851 | 0.376 | 0.376 | 0.286 | 0.286 | 0.862 |
| Libra RetinaNet | 0.6 | 0.321 | 0.094 | 0.482 | 0.824 | 0.443 | 0.333 | 0.196 | 0.339 | 0.81 |
| PAA | 0.516 | 0.309 | 0.296 | 0.321 | 0.798 | 0.56 | 0.310 | 0.184 | 0.286 | 0.766 |
| RegNet RetinaNet | 0.466 | 0.311 | 0.235 | 0.304 | 0.783 | 0.552 | 0.317 | 0.163 | 0.357 | 0.791 |
| SABL Faster R-CNN | 0.378 | 0.303 | 0.494 | 0.286 | 0.834 | 0.592 | 0.292 | 0.27 | 0.509 | 0.827 |
| SABL RetinaNet | 0.561 | 0.309 | 0.32 | 0.393 | 0.819 | 0.514 | 0.31 | 0.233 | 0.411 | 0.81 |
Figure 17Sample of left/right wrist fracture results [ground-truth bounding box (green), predicted bounding box (red)] for PAA (Best score of 20 models).
Figure 18Precision-recall curve of PAA (best score of 20 models).
Figure 19Results (AP50, AR, LRP-optimal threshold (LRPt), oLRPLoc (oLRPL), oLRPFP and oLRPFN) of Ensemble Model (WFD-1, WFD-2, WFD-3, WFD-4, WFD-5, WFD-C).
Results (AP50, AR, LRP-optimal threshold (LRPt), oLRPLoc (oLRPL), oLRPFP and oLRPFN) of Ensemble Model (WFD-1, WFD-2, WFD-3, WFD-4, WFD-5, WFD-C).
| Ensemble Models | AP50 | AR | LRPt | oLRPL | oLRPFP | oLRPFN | oLRP |
|---|---|---|---|---|---|---|---|
| WFD-1 | 0.8379 | 0.361 | 0.467 | 0.353 | 0.125 | 0.25 | 0.8 |
| WFD-2 | 0.776 | 0.346 | 0.47 | 0.341 | 0.174 | 0.296 | 0.806 |
| WFD-3 | 0.8327 | 0.398 | 0.452 | 0.322 | 0.13 | 0.286 | 0.77 |
| WFD-4 | 0.8338 | 0.382 | 0.511 | 0.3343 | 0.12 | 0.214 | 0.778 |
| WFD-5 | 0.8413 | 0.329 | 0.375 | 0.349 | 0.148 | 0.164 | 0.78 |
| WFD-C |
| 0.33 | 0.357 | 0.349 | 0.158 | 0.143 | 0.77 |
Figure 20Sample of right wrist fracture results [ground-truth bounding box (green), predicted bounding box (red)].
Figure 21Sample of left wrist fracture results [ground-truth bounding box (green), predicted bounding box (red)].
Figure 22Count of predicted bounding box.
Figure 23Precision-recall curve of WFD-C (best score of ensemble models).
Comparison with various amounts of wrist test datasets.
| Model | Input | Dataset | Amount | AP50 | AR | LRPt | oLRPL |
|---|---|---|---|---|---|---|---|
| PAA | 800 × 800 × 3 | Gazi | 54 test | 0.754 | 0.496 | 0.56 | 0.310 |
| YOLOv3 | 750 × 750 × 3 | Gazi | 54 test | 0.531 | 0.298 | 0.164 | 0.378 |
| Proposed WFD-C | 800 × 800 × 3 | Gazi | 54 test | 0.8639 | 0.33 | 0.357 | 0.349 |
| PAA | 800 × 800 × 3 | Gazi | 54 test + 54 valid | 0.629 | 0.499 | 0.552 | 0.304 |
| YOLOv3 | 750 × 750 × 3 | Gazi | 54 test + 54 valid | 0.516 | 0.286 | 0.164 | 0.364 |
| Proposed WFD-C | 800 × 800 × 3 | Gazi | 54 test + 54 valid | 0.709 | 0.344 | 0.454 | 0.315 |
Inclusion/exclusion criteria for dataset.
| Inclusion Criteria | Exclusion Criteria |
|---|---|
| Fracture labeling: Only fractures in the radius and ulna bones are labeled in Wrist. | Fracture labeling: other small bone (trapezoid, trapezium, scaphoid, capitate, hamate, triquetrum, pisiform, lunate) fractures in Wrist were not studied and ignored. |
| Image size: Images were rescaled to 800 × 800 × 3, and deep learning models supporting this size were used. | Image size: YOLO and other deep learning models that do not support 800 × 800 × 3 size were not used. |
| Data collection process: X-ray images of patients from the last 10 years (between 2010 and 2020) were used in 2020. | Data collection process: X-ray images of patients after 2020 were not used. |
| Number of fractures: there are 570 fractures in 542 images. Thus, there are multiple fracture ones in an image. | Number of fractures: images with not just one fracture, but one and/or more than one fracture were used. |
| The number of patients with fractures in both hands is 11. The distribution of these patients is 10 in Train and 1 in Test. | The patient with a fracture in both hands was not included in the validation. |
| The number of patients under the age of 12 and adults are 21 and 254, respectively. There is heterogeneity in these patient numbers. | There is no homogeneity, that is, an equal distribution, in the number of patients under the age of 12 and adults. |
| One radiologist and 2 orthopedists were jointly involved in the labeling of the fractures. | No more than 3 physicians were used in the labeling of fractures. There is no difference of opinion in the labeling of physicians. |
| In the Dataset distribution, the number of fractures per image is highest in the train dataset. | In the distribution of the dataset, the number of fractures per image was not considered to be equal in the train, validation and test dataset. |
| In the dataset, the number of females is 134 and the number of males is 141. | Equality in the number of males and females was not considered in the dataset. |
| The images in the dataset are 7% pair (right, left), 43% right hand, 50% left hand. | Equality was not observed in the number of pairs, right-handed and left-handed images in the dataset. |
| Since the graphics card in the local PC hardware used in the study is 4 GB, object detection models that support this are used in MMDetection. | In the study, models that support a graphics card higher than 4 GB from the object detection model-hands in MMDetection could not be used. |