Susan C Shelmerdine, Richard D White, Hantao Liu, Owen J Arthurs, Neil J Sebire.
Abstract
BACKGROUND: The majority of research and commercial efforts have focussed on the use of artificial intelligence (AI) for fracture detection in adults, despite the greater long-term clinical and medicolegal implications of missed fractures in children. The objective of this study was to assess the available literature regarding the diagnostic performance of AI tools for paediatric fracture assessment on imaging and, where available, how this compares with the performance of human readers.
Keywords: Artificial intelligence; Diagnostic accuracy; Fracture; Machine learning; Trauma
Year: 2022 PMID: 35657439 PMCID: PMC9166920 DOI: 10.1186/s13244-022-01234-3
Source DB: PubMed Journal: Insights Imaging ISSN: 1869-4101
Fig. 1 PRISMA flow chart for the study search and selection
Fig. 2 Methodological quality assessment of the included studies using the QUADAS-2 tool. Risk of bias and applicability concerns for each domain are summarised for each included study
Study aims, injury to be detected and patient inclusion/exclusion criteria, organised by publication date
| Author, year | Country | Body part | Type of injury | Patient inclusion criteria | Patient exclusion criteria | Study aim |
|---|---|---|---|---|---|---|
| Zhou [ | USA | Forearm | Plastic bowing deformities | Forearm radiographs of children aged 1–18 years with history of trauma | None stated | Development of a computer-aided detection application for plastic bowing deformity fractures in paediatric forearms |
| Malek [ | Malaysia | Lower limb (femur, tibia, fibula) | Any fracture | Radiographs of fractured femur, tibia or fibula in children < 12 years of age | None stated | Development of an artificial neural network to analyse normal (< 12 weeks) versus delayed healing time for paediatric lower limb fractures |
| England [ | USA | Elbow | Traumatic elbow joint effusions | Elbow radiographs of children aged 1–19 years attending the emergency department with history of blunt trauma. Lateral view of radiograph technically adequate | Images with cast applied, elbow dislocation/displacement, comminuted fracture, metallic surgical hardware | Detection of traumatic paediatric elbow joint effusions using a deep convolutional neural network |
| Rayan [ | USA | Elbow | Any elbow fracture | Elbow radiographs in children | None stated | Binomial classification of elbow fractures using a deep learning approach |
| Choi [ | South Korea | Elbow | Supracondylar fractures | Elbow radiographs (two views) in children with suspected supracondylar fracture | Follow-up imaging (only initial radiographs included); non-supracondylar fractures; elbow dislocation; underlying bone dysplasia | Development of a dual input convolutional neural network for detection of supracondylar fractures |
| Starosolski [ | USA | Distal tibia | Most fracture types | Radiographs of the foot, ankle, tibia or fibula in children | Plastic bowing fractures or any fracture without a discrete fracture line. Images with surgical fixation, cast or pathology other than fracture | Development of a convolutional neural network for detection of tibial fractures |
| Dupuis [ | France | Appendicular skeleton | Any appendicular fracture type | Radiographs of any body part from consecutive patients < 18 years old with suspected trauma attending emergency department | Radiographs of the axial skeleton (skull, spine, chest) | External validation of a commercially available deep learning algorithm for appendicular fracture detection in children |
| Zhang [ | Canada | Distal radius | Any fracture type | Children aged < 17 years with unilateral distal radial tenderness following trauma with asymptomatic contralateral wrist as normal comparator | Existing cast over forearm, laceration of the forearm, open fractures, inability to tolerate ultrasound study, lack of time for scanning | Diagnostic accuracy of 3D ultrasound and use of artificial intelligence for detection of paediatric wrist injuries |
| Tsai [ | USA | Distal tibia | Corner metaphyseal fractures | Children aged < 1 year referred for suspected abuse | None stated; only AP projections of normal and abnormal distal tibial radiographs were included | Development and evaluation of a machine learning based binary classification algorithm to detect distal tibial corner metaphyseal fractures on radiographic skeletal surveys performed for suspected infant abuse |
Study characteristics for articles included in systematic review, organised by publication date
| Author, year | Dataset study period | Patient ages (years, unless otherwise stated) | % Male | No. centres | Type of centre(s) | Index test | Ground truth / reference | Ground truth blinded to clinical detail? |
|---|---|---|---|---|---|---|---|---|
| Zhou [ | Not stated | Range: 1–18 | Not stated | Single | Tertiary Paediatric | Plain radiography | Two radiologists, each with over 10 years' experience | Yes |
| Malek [ | 4 years (2009–11, 2014) | Median: 8.5; SD: 3.9; Range: 0–12 | Not stated | Single | Tertiary Paediatric | Plain radiography | Time to fracture healing where no fracture line can be identified on radiography, as determined by single orthopaedic surgeon | No, but all cases were fractured |
| England [ | 3.6 years (Jan 2014–Sept 2017) | Mean: 11.4; SD: 5.1; Range: 1–19. Percentages of children in age groups (1–5, 6–10, 11–15, 16–19) per dataset are also provided in manuscript. | 64.6% | Single | Tertiary Paediatric | Plain radiography | Radiology reports by consultant radiologist. A subselection of 262 images re-reviewed by three musculoskeletal radiologists | Musculoskeletal radiologists assessing a subselection of the radiographs were blinded. Original radiologist report unblinded |
| Rayan [ | 4 years (Jan 2014–Dec 2017) | Mean: 7.2 Range: 0–18 | 57% | Single | Tertiary Paediatric | Plain radiography | Radiological reports by a single radiologist (experience unspecified) | No |
| Choi [ | 6 years (Jan 2013 to Dec 2018) | Percentage of children in age groups (0–4, 5–9, 10–14, 15–19) per dataset are provided in manuscript. No mention of mean, median ages overall. Range: 0–19 | Not stated | Two centres, same city | Tertiary Paediatric | Plain radiography | All radiographs re-reviewed by two paediatric radiologists | Yes |
| Starosolski [ | 8 years (2009–2017) | Mean: 6.4; SD: 4.4 | 33% | Single | Tertiary Paediatric | Plain radiography | Radiology reports by a single radiologist | Unclear |
| Dupuis [ | 1 year (March 2019–2020) | Median: 9.2; Mean: 8.5; Range: 0–17; SD: 4.5 | 57.3% | Single | Tertiary Paediatric | Plain radiography | Radiology report by one of eleven possible radiologists with 2.5–35 years’ experience | No, but this reference was not used for training |
| Zhang [ | Not stated | Mean: 9.9; Range: 3.8–14.8 | 70% | Single | Tertiary Paediatric | 3D ultrasound | Plain radiography of the affected wrist acquired within 30 days of ultrasound, reported by a consultant radiologist. The contralateral limb was also imaged with ultrasound but without radiographic confirmation; normality was presumed where asymptomatic | Not for the 3D ultrasound; unclear regarding radiography reporting |
| Tsai [ | 13.4 years (1 Jan 2009 to 31 May 2021) | ‘Normal’ cohort: Mean: 5 months; Range: 0.2–11.6 months; SD: 3.3 months. ‘Abnormal’ cohort: Mean: 3.3 months; Range: 0.4–12 months; SD: 2.9 months | ‘Normal’ cohort = 68.5%; ‘Abnormal’ cohort = 73% | Single | Tertiary Paediatric | Plain radiography | Radiology report issued by consultant radiologist with subsequent confirmation by primary study author (experienced paediatric radiologist) | Unclear, likely not blinded |
Input data demographics and study dataset sizes, organised by publication date
| Author, year | Body part | Total dataset (patients) | Total dataset (exams and images) | Training set | Validation set | Test set |
|---|---|---|---|---|---|---|
| Zhou [ | Forearm | 226 | 226 radiographs (59 bowing fractures) | 226 radiographs (59 bowing fractures) | N/A | N/A |
| Malek [ | Lower limb (femur, tibia, fibula) | 57 | Unclear, presumed 57 exams. No mention of projections or total images. (25, 50% normal healing time; 25, 50% delayed healing time) | 39 exams (18, 50% normal; 18, 50% abnormal) | 9 exams (4, 44.4% normal; 5, 55.6% abnormal) | 17 exams (11, 64.7% normal; 6, 35.3% abnormal) |
| England [ | Elbow | 882 | 901 lateral radiographs (images) | 657 images (500, 76.2% normal; 157, 23.8% abnormal) | 115 images (82, 71.3% normal; 33, 28.7% abnormal) | 129 images (96, 74.4% normal; 33, 25.6% abnormal) |
| Rayan [ | Elbow | Not stated | 21,456 exams; 58,817 images | 20,350 exams; 55,721 images (4966, 24% normal, 15,384, 76% abnormal) | 1106 exams; 3096 images (516, 47% normal, 590, 53% abnormal) | N/A |
| Choi [ | Elbow | 810 | 1619 elbow exams; 3238 images | 1012 exams (780, 77.1% normal; 232, 22.9% abnormal) | 254 exams (196, 77.2% normal; 58, 22.8% abnormal) | Temporal set: 258 exams (192, 74.4% normal; 66, 25.6% abnormal); Geographic set: 96 exams (72, 75.8% normal; 23, 24.2% abnormal) |
| Starosolski [ | Distal tibia | 490 | 490 exams (245, 50% abnormal; 245, 50% normal) | Not stated | Not stated | 98 images (49, 50% normal; 49, 50% abnormal) |
| Dupuis [ | Appendicular skeleton | 2549 | 2634 exams; 5865 images | N/A | N/A | 1825, 69.2% normal; 809, 30.8% abnormal exams |
| Zhang [ | Distal radius | 30 | 55 × 3D ultrasound ‘sweeps’ of both wrists (injured and contralateral), each ‘sweep’ having ~ 382 image slices. Overall 19 cases of distal wrist fracture | 21 sweeps (~ 6000 images); abnormal:normal split not stated | 1640 image slices selected from 72 sweeps of 36 patients. Cases: 23, 64% normal; 13, 36% abnormal. Images: 990, 60% normal; 650, 40% abnormal. Unclear how this validation dataset was acquired | N/A |
| Tsai [ | Distal tibia | 124 patients (35 abnormal, 89 normal) | 250 radiographs (177 normal, 73 abnormal) | 187 radiographs | 13 radiographs | 50 radiographs |
Diagnostic accuracy of artificial intelligence algorithms for fracture detection, organised by body parts
| Author, year | Dataset | Body part | AUC (95% CI) | Accuracy (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) | TP | FP | FN | TN |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| England [ | Validation | Elbow effusions | 0.985 (0.966–1.00) | NS | NS | NS | NS | NS | NS | NS | NS | NS |
| | Test | Elbow effusions | 0.943 (0.884–1.00) | 0.907 (0.843–0.951) | 0.909 (0.788–1.00) | 0.906 (0.844–0.958) | NS | NS | 87 | 9 | 3 | 30 |
| Rayan [ | Validation | Elbow fractures | 0.947 (0.930–0.960) | 0.877 (0.856–0.895) | 0.908 (0.882–0.929) | 0.841 (0.807–0.870) | 0.867 (0.838–0.892) | 0.889 (0.858–0.914) | 536 | 82 | 54 | 434 |
| Choi [ | Validation | Supracondylar fractures | 0.976 (0.949–0.991) | 0.945 (0.910–0.967) | 0.948 (0.859–0.982) | 0.944 (0.902–0.968) | 0.833 (0.726–0.904) | 0.984 (0.954–0.995) | 55 | 11 | 3 | 185 |
| | Temporal test set | Supracondylar fractures | 0.985 (0.962–0.996) | 0.904 (0.855–0.938) | 0.939 (0.852–0.983) | 0.922 (0.874–0.956) | 0.805 (0.717–0.871) | 0.978 (0.945–0.991) | 62 | 15 | 4 | 117 |
| | Geographical test set | Supracondylar fractures | 0.992 (0.947–1.000) | 0.895 (0.817–0.942) | 1.000 (0.852–1.000) | 0.861 (0.759–0.931) | 0.697 (0.564–0.803) | 1.000 | 23 | 10 | 0 | 62 |
| Dupuis [ | Test | Elbow fractures (subgroup) | NS | 0.888 (0.847–0.919) | 0.918 (0.846–0.958) | 0.873 (0.819–0.913) | 0.781 (0.696–0.847) | 0.956 (0.915–0.977) | 89 | 25 | 8 | 172 |
| Zhou [ | Test set (best performing for AP ulnar view, using optimal central angle measurement of bone) | Forearm (Bowing fracture) | 0.992 (NS) | NS | 1.000 (NS) | 0.940 (NS) | NS | NS | NS | NS | NS | NS |
| Zhang [ | Test set—analysed per patient | Distal radius (ultrasound) | NS | 0.92 | 1.0 | 0.87 | NS | NS | NS | NS | NS | NS |
| Malek [ | Training | Lower limb fracture healing | 0.8 (NS) | 0.821 (0.673–0.910) | 0.792 (0.595–0.908) | 0.867 (0.621–0.963) | 0.905 (0.711–0.973) | 0.722 (0.491–0.875) | 19 | 2 | 5 | 13 |
| | Validation | Lower limb fracture healing | NS | 0.556 (0.267–0.811) | 0.600 (0.231–0.882) | 0.500 (0.150–0.850) | 0.600 (0.231–0.882) | 0.500 (0.150–0.850) | 3 | 2 | 2 | 2 |
| | Test | Lower limb fracture healing | NS | 0.889 (0.565–0.980) | 1.000 (0.566–1.000) | 0.750 (0.301–0.954) | 0.833 (0.436–0.970) | 1.000 (0.439–1.000) | 5 | 1 | 0 | 3 |
| Starosolski [ | Test | Distal tibia | 0.995 (NS) | 0.979 (0.929–0.994) | 0.959 (0.863–0.989) | 1.000 (0.927–1.000) | 1.000 (0.924–1.000) | 0.961 (0.868–0.989) | 47 | 0 | 2 | 49 |
| Tsai [ | Test (mean and SD for accuracy across models in fivefold cross-validation) | Distal tibia (corner metaphyseal fracture) | NS | 0.93 ± 0.018 | 0.88 ± 0.05 | 0.96 ± 0.015 | 0.89 ± 0.036 | 0.95 ± 0.023 | 13 | 2 | 2 | 33 |
| | Test (best performing model) | Distal tibia (corner metaphyseal fracture) | NS | 0.960 (0.865–0.989) | 0.929 (0.685–0.987) | 0.972 (0.858–0.995) | 0.929 (0.685–0.987) | 0.972 (0.858–0.995) | 13 | 1 | 1 | 35 |
| Dupuis [ | Test | Appendicular skeleton | NS | 0.926 (0.915–0.936) | 0.957 (0.940–0.969) | 0.912 (0.898–0.925) | 0.829 (0.803–0.852) | 0.979 (0.971–0.985) | NS | NS | NS | NS |
95% confidence intervals are omitted where they are neither provided in the publication nor calculable from the raw values in the confusion matrix
AP anterior–posterior, NS not stated, CI confidence interval, AUC area under the curve, PPV positive predictive value, NPV negative predictive value, TP true positive, FP false positive, FN false negative, TN true negative, SD standard deviation
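The point estimates and 95% confidence intervals in the table above can be recomputed from the confusion-matrix counts. The sketch below is illustrative only: it assumes the Wilson score interval, which the text does not name but which agrees with the published sensitivity interval for the Rayan validation row, and the function names `wilson_interval` and `diagnostic_metrics` are hypothetical helpers, not part of the original study.

```python
from math import sqrt

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wilson_interval(successes, total, z=Z_95):
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


def diagnostic_metrics(tp, fp, fn, tn):
    """Return {metric: (estimate, ci_low, ci_high)} from confusion-matrix counts."""
    counts = {
        "accuracy": (tp + tn, tp + fp + fn + tn),
        "sensitivity": (tp, tp + fn),   # fractures correctly flagged
        "specificity": (tn, tn + fp),   # normals correctly cleared
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    results = {}
    for name, (k, n) in counts.items():
        low, high = wilson_interval(k, n)
        results[name] = (k / n, low, high)
    return results


# Counts taken from the Rayan validation row of the table above.
rayan = diagnostic_metrics(tp=536, fp=82, fn=54, tn=434)
for name, (estimate, low, high) in rayan.items():
    print(f"{name}: {estimate:.3f} ({low:.3f}-{high:.3f})")
```

Running the same function on other rows is a quick way to check whether the reported counts and proportions agree with each other.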
Studies comparing artificial intelligence algorithms versus (or combined with) human reader, organised by publication date
| Author, year | Human/AI | Accuracy (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | TP | FP | FN | TN |
|---|---|---|---|---|---|---|---|---|
| England [ | AI | 0.907 (0.843–0.951) | 0.909 (0.788–1.000) | 0.906 (0.844–0.958) | 87 | 9 | 3 | 30 |
| | PGY5 emergency medicine trainee (non-radiologist) | 0.915 (0.852–0.957) | 0.848 (0.681–0.949) | 0.938 (0.869–0.977) | 90 | 6 | 5 | 28 |
| Choi [ | AI (Geographical test set) | 0.895 (0.817–0.942) | 1.000 (0.852–1.000) | 0.861 (0.759–0.931) | 23 | 10 | 0 | 62 |
| | Summated score of three radiologists (2–7 years' experience) from a different institution to the test dataset | 0.975 (0.950–0.988) | 0.957 (0.880–0.985) | 0.981 (0.953–0.993) | 66 | 4 | 3 | 212 |
| | Lowest performing radiologist alone | NS (AUC 0.977 (0.924–0.997)) | 0.957 (0.781–0.999) | 0.972 (0.903–0.997) | NS | NS | NS | NS |
| | Lowest performing radiologist with AI assistance | NS (AUC 0.993 (0.949–1.000)) | 1.000 (0.852–1.000) | 0.972 (0.903–0.997) | NS | NS | NS | NS |
| Zhang [ | AI (Test set—data undefined) | 0.920 | 1.000 | 0.870 | NS | NS | NS | NS |
| | Human: paediatric musculoskeletal radiologist | 0.89 (0.782–0.949) | 1.000 (0.833–1.000) | 0.833 (0.681–0.921) | 19 | 6 | 0 | 30 |
95% confidence intervals are omitted where these are not provided in the publication
NS not stated, CI confidence interval, AUC area under the curve, PPV positive predictive value, NPV negative predictive value, TP true positive, FP false positive, FN false negative, TN true negative, PGY postgraduate year