Qing Qing Zhou, Jiashuo Wang, Wen Tang, Zhang Chun Hu, Zi Yi Xia, Xue Song Li, Rongguo Zhang, Xindao Yin, Bing Zhang, Hong Zhang.
Abstract
OBJECTIVE: To evaluate the performance of a convolutional neural network (CNN) model that can automatically detect and classify rib fractures, and output structured reports from computed tomography (CT) images.
Keywords: Artificial intelligence; Convolutional neural networks; Deep learning; Multidetector computed tomography; Rib fractures; Structured report
Year: 2020 PMID: 32524787 PMCID: PMC7289688 DOI: 10.3348/kjr.2019.0651
Source DB: PubMed Journal: Korean J Radiol ISSN: 1229-6929 Impact factor: 3.500
Clinical and Radiologic Information of Rib Fracture Patients from Monocentric Data
| Variables | Total | Training Set | Validation Set | P |
|---|---|---|---|---|
| No. of patients | 974 | 876 (90) | 98 (10) | - |
| No. of thick slices (5 mm) | 679 | 614 (70.1) | 65 (66.3) | 0.442 |
| No. of thin slices (1 mm) | 295 | 262 (29.9) | 33 (33.7) | 0.442 |
| Median age (range) | 55 (20–97) | 55 (20–97) | 58 (22–89) | 0.190 |
| Sex (male:female) | 643:331 | 582:294 | 61:37 | 0.472 |
| No. of annotations | 20064 | 18584 | 1480 | - |
| Fresh fracture | 8179 | 7699 (41.4) | 480 (32.5) | - |
| Healing fracture | 8723 | 8112 (43.7) | 611 (40.7) | - |
| Old fracture | 3162 | 2773 (14.9) | 389 (26.8) | - |
| Pixels | 1024 × 1024 | 1024 × 1024 | 1024 × 1024 | - |
Numerical data were reported as median (range). Percentages were shown inside parentheses.
Clinical and Radiologic Information of Five Multicenter/Multiparameter Validation Sets
| Variables | Validation Set 1 | Validation Set 2 | Validation Set 3 | Validation Set 4 | Validation Set 5 | Control Set | P |
|---|---|---|---|---|---|---|---|
| No. of patients | Hospital A (n = 33) | Hospital A (n = 65) | Hospital B (n = 25) | Hospital C (n = 25) | Hospital B/C (n = 25) | Hospital A (n = 30) | - |
| Median age (range) | 62 (26–80) | 59 (24–89) | 59 (24–87) | 53 (28–73) | 61 (29–73) | 53 (32–71) | 0.625 |
| Sex (male:female) | 19:14 | 42:23 | 19:6 | 18:7 | 20:5 | 18:12 | 0.384 |
| Slice thickness (mm) | 1 | 5 | 1 | 2 | 1/2 | 1 | - |
| Pixels | 1024 × 1024 | 1024 × 1024 | 1024 × 1024 | 1024 × 1024 | 512 × 512 | 1024 × 1024 | - |
| No. of CT images* | 809 | 491 | 1468 | 1006 | 1667 | 9917 | - |
| Annotations | 881 | 599 | 1708 | 1150 | 2132 | - | - |
| Fresh fracture | 214 (24.3) | 266 (44.4) | 567 (33.2) | 270 (23.5) | 1073 (50.3) | - | - |
| Healing fracture | 418 (47.4) | 193 (32.2) | 1001 (58.6) | 388 (33.7) | 810 (38.0) | - | - |
| Old fracture | 249 (28.3) | 140 (23.4) | 140 (8.2) | 492 (42.8) | 249 (11.7) | - | - |
Numerical data were reported as median (range). Percentages were shown inside parentheses. *Validation sets 1–5 contained only images with annotations, and the control set contained all CT images of each patient.
Fig. 1. Flow chart showing overall study process.
CNN = convolutional neural network, Faster R-CNN = faster region-based convolutional neural network, FP = false positive, YOLOv3 = you only look once v3
Fig. 2. Schematic illustration of Faster R-CNN architecture.
A. ResNet-101. B. RPN network was mainly used to generate regional proposals. C. ROI pooling. D. Classifier. ROI = region of interest, RPN = region proposal network
Fig. 3. CT image with rectangular boxes and corresponding CT image reports.
All detected fractures were listed in sequence with numbers of corresponding CT layers (green numbers) (left). Preceding small white numbers correspond to fractures labelled in CT image (right).
Performance Metrics for Multicenter and Multiparameter Validation
| Indicators | Validation Set 1 (n = 33) | Validation Set 2 (n = 65) | Validation Set 3 (n = 25) | Validation Set 4 (n = 25) | Validation Set 5 (n = 25) | Mean |
|---|---|---|---|---|---|---|
| Precision | ||||||
| Fresh fractures | 203/259 = 0.784 (0.722–0.832) | 231/250 = 0.924 (0.895–0.955) | 487/600 = 0.812 (0.746–0.843) | 225/282 = 0.798 (0.726–0.852) | 853/961 = 0.888 (0.857–0.912) | 0.841 |
| Healing fractures | 365/388 = 0.941 (0.908–0.961) | 171/206 = 0.830 (0.803–0.868) | 840/913 = 0.920 (0.896–0.943) | 343/461 = 0.744 (0.696–0.787) | 660/778 = 0.848 (0.812–0.876) | 0.857 |
| Old fractures | 200/237 = 0.844 (0.791–0.901) | 116/144 = 0.806 (0.773–0.859) | 129/182 = 0.709 (0.620–0.782) | 367/415 = 0.884 (0.832–0.922) | 144/230 = 0.626 (0.537–0.696) | 0.774 |
| Mean | 0.856 | 0.853 | 0.814 | 0.809 | 0.787 | 0.824 |
| Recall | ||||||
| Fresh fractures | 203/214 = 0.949 (0.897–0.978) | 231/266 = 0.868 (0.841–0.898) | 487/567 = 0.859 (0.846–0.871) | 225/270 = 0.833 (0.811–0.866) | 853/1073 = 0.795 (0.780–0.808) | 0.861 |
| Healing fractures | 365/418 = 0.873 (0.860–0.884) | 171/193 = 0.886 (0.857–0.926) | 840/1001 = 0.839 (0.830–0.852) | 343/388 = 0.884 (0.869–0.899) | 660/810 = 0.815 (0.804–0.831) | 0.859 |
| Old fractures | 200/249 = 0.803 (0.774–0.830) | 116/140 = 0.829 (0.795–0.884) | 129/140 = 0.921 (0.901–0.939) | 367/492 = 0.746 (0.727–0.774) | 144/249 = 0.578 (0.521–0.631) | 0.775 |
| Mean | 0.875 | 0.861 | 0.873 | 0.821 | 0.729 | 0.832 |
| F1-score | ||||||
| Fresh fractures | 1.488/1.733 = 0.859 (0.824–0.886) | 1.604/1.792 = 0.895 (0.868–0.914) | 1.395/1.671 = 0.835 (0.798–0.863) | 1.329/1.631 = 0.815 (0.785–0.844) | 1.412/1.683 = 0.839 (0.823–0.847) | 0.849 |
| Healing fractures | 1.643/1.814 = 0.906 (0.890–0.917) | 1.471/1.716 = 0.857 (0.827–0.885) | 1.544/1.759 = 0.878 (0.869–0.888) | 1.315/1.628 = 0.808 (0.783–0.834) | 1.382/1.663 = 0.831 (0.817–0.848) | 0.856 |
| Old fractures | 1.355/1.647 = 0.823 (0.796–0.852) | 1.336/1.635 = 0.817 (0.792–0.848) | 1.306/1.630 = 0.801 (0.757–0.845) | 1.319/1.630 = 0.809 (0.792–0.830) | 0.724/1.204 = 0.601 (0.552–0.643) | 0.770 |
| Mean | 0.863 | 0.856 | 0.840 | 0.811 | 0.757 | 0.825 |
| Total FPs* | 116 | 82 | 239 | 223 | 312 | 194 |
| Fresh fractures | 56 (41–78) | 19 (11–27) | 113 (91–166) | 57 (39–85) | 108 (82–142) | 71 |
| Healing fractures | 23 (15–37) | 35 (26–49) | 73 (51–97) | 118 (93–150) | 118 (93–153) | 73 |
| Old fractures | 37 (22–53) | 28 (19–34) | 53 (36–79) | 48 (31–74) | 86 (63–124) | 50 |
| Mean | 39 | 27 | 80 | 74 | 104 | 65 |
The corresponding 95% confidence intervals, shown inside parentheses, were estimated by bootstrapping with 1000 resamples drawn at random at the annotation level. *Number of FPs is the total number of FP annotations. Validation sets 1 and 2 were from the monocentric validation set, and validation sets 3–5 were from multicenter data. FPs = false positives
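The ratios in the table can be reproduced from three counts per cell: correctly detected fractures, total detections, and total annotations. The sketch below does this for fresh fractures in validation set 1 and adds a percentile bootstrap for recall. The function names, the percentile method, and the fixed seed are assumptions made for illustration; the source states only that 1000 bootstraps were sampled at the annotation level, so the interval produced here need not match the published one.

```python
import random


def detection_metrics(tp, n_detections, n_annotations):
    """Precision, recall, and F1 from raw counts (hypothetical helper)."""
    precision = tp / n_detections      # correct detections / all detections
    recall = tp / n_annotations        # correct detections / all annotated fractures
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def bootstrap_recall_ci(tp, n_annotations, n_boot=1000, seed=0):
    """Percentile 95% CI for recall, resampling annotations with replacement.

    Assumption: each annotation is coded 1 if detected and 0 if missed.
    """
    rng = random.Random(seed)
    outcomes = [1] * tp + [0] * (n_annotations - tp)
    stats = sorted(
        sum(rng.choices(outcomes, k=n_annotations)) / n_annotations
        for _ in range(n_boot)
    )
    lower = stats[round(0.025 * n_boot)]
    upper = stats[round(0.975 * n_boot) - 1]
    return lower, upper


# Fresh fractures, validation set 1: 203 correct, 259 detections, 214 annotations.
precision, recall, f1 = detection_metrics(203, 259, 214)
ci_low, ci_high = bootstrap_recall_ci(203, 214)
```

Precision and recall round to the tabulated 0.784 and 0.949. The F1 value computed from unrounded inputs can differ from the tabulated one in the third decimal, because the table forms F1 from the already rounded precision and recall.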
Fig. 4. Comparison of diagnostic efficiency for different fractures in different situations on fROC curves.
True positive rate and average number of FPs per scan for fresh fractures (A), healing fractures (B), and old fractures (C) on whole CT images from 33 patients without merging results are shown by fROC curves. In the enlarged inset, the 11 points representing the structured report (yellow star) and radiologists with and without AI assistance (red and orange circles, respectively) were all above the curve; among them, the five points representing AI-assisted diagnosis (red circles) were highest (all located in the upper left corner). AI = artificial intelligence, fROC = free-response receiver operating characteristic
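An fROC curve plots the true positive rate against the average number of FPs per scan as the detection confidence threshold is lowered. The following is a minimal sketch of that construction, not the authors' code: `froc_points` and the toy detections are hypothetical, and it assumes each true-positive detection matches a distinct annotated fracture and that no confidence scores are tied.

```python
def froc_points(detections, n_lesions, n_scans):
    """Return (avg FPs per scan, true positive rate) for each threshold.

    detections: iterable of (confidence, is_true_positive) pairs.
    """
    points = []
    tp = fp = 0
    # Lowering the threshold admits detections in order of decreasing confidence.
    for _, is_tp in sorted(detections, key=lambda d: d[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_scans, tp / n_lesions))
    return points


# Toy data (hypothetical): 5 detections over 2 scans containing 4 fractures.
toy = [(0.95, True), (0.90, True), (0.80, False), (0.70, True), (0.40, False)]
curve = froc_points(toy, n_lesions=4, n_scans=2)

# One operating point from reported counts, assuming FPs = detections - correct
# detections: fresh fractures on full images of 33 patients
# (165 correct, 283 detections, 214 annotations).
fresh_point = ((283 - 165) / 33, 165 / 214)
```

Under that assumption, `fresh_point` corresponds to about 3.58 FPs per scan at a true positive rate of 0.771.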
Comparison of Precision and Sensitivity of Different Fractures in Different Situations
| Indicators | Fresh Fractures | Healing Fractures | Old Fractures | Mean |
|---|---|---|---|---|
| Mean precision | ||||
| Five validation sets with fracture images | 4.206/5 = 0.841 | 4.283/5 = 0.857 | 3.869/5 = 0.774 | 0.824 |
| Full images without results merged | 165/283 = 0.583 | 345/483 = 0.714 | 168/296 = 0.568 | 0.622 |
| Structured report | 43/67 = 0.642 | 49/61 = 0.803 | 38/46 = 0.826 | 0.757 |
| Radiologists without AI assistance* | 4.351/5 = 0.870 | 4.243/5 = 0.848 | 3.459/5 = 0.692 | 0.803 |
| Radiologists with AI assistance* | 4.457/5 = 0.891 | 4.577/5 = 0.915 | 4.642/5 = 0.928 | 0.911 |
| Mean sensitivity | ||||
| Five validation sets with fracture images | 4.304/5 = 0.861 | 4.297/5 = 0.859 | 3.877/5 = 0.775 | 0.832 |
| Full images without results merged | 165/214 = 0.771 | 345/418 = 0.825 | 168/249 = 0.675 | 0.757 |
| Structured report | 43/45 = 0.956 | 49/56 = 0.875 | 38/54 = 0.704 | 0.845 |
| Radiologists without AI assistance† | 3.623/5 = 0.725 | 3.071/5 = 0.614 | 2.667/5 = 0.533 | 0.624 |
| Radiologists with AI assistance† | 4.621/5 = 0.924 | 4.570/5 = 0.914 | 3.758/5 = 0.752 | 0.863 |
*Precision of radiologists' diagnoses increased by 10.8 percentage points with AI assistance (0.803 to 0.911; p = 0.008). †Sensitivity of diagnosis increased by 23.9 percentage points with AI assistance (0.624 to 0.863; p = 0.008). AI = artificial intelligence
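The reported gains are absolute differences between the row means, not relative changes. A short check using the per-category values from the table is sketched below; the dictionary layout and the `gain` helper are illustrative assumptions.

```python
# Radiologists' mean values for fresh, healing, and old fractures.
without_ai = {
    "precision": (0.870, 0.848, 0.692),
    "sensitivity": (0.725, 0.614, 0.533),
}
with_ai = {
    "precision": (0.891, 0.915, 0.928),
    "sensitivity": (0.924, 0.914, 0.752),
}


def mean(values):
    return sum(values) / len(values)


def gain(metric):
    """Absolute and relative change in a metric after AI assistance."""
    before = mean(without_ai[metric])
    after = mean(with_ai[metric])
    return after - before, (after - before) / before


precision_abs, precision_rel = gain("precision")
sensitivity_abs, sensitivity_rel = gain("sensitivity")
```

The absolute gains round to 0.108 and 0.239, matching the footnote. The relative gains are larger, roughly 13% for precision and 38% for sensitivity.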
Fig. 5. Bar graph of time to diagnosis.
Time to diagnosis decreased for all five radiologists when AI assistance was used (all p < 0.01), and the average decrease was 73.9 seconds.
FPs and Frequency of Rib Fracture of Structured Report in Control Set (n = 30)
| Category | Per-Patient Level: FPs | Per-Patient Level: Frequency (FPs/Patient) | Per-Lesion Level: FPs | Per-Lesion Level: Frequency (FPs/Patient) |
|---|---|---|---|---|
| Fresh fractures | 6 (2–9) | 6/30 = 0.200 (0.067–0.300) | 10 (3–16) | 10/30 = 0.333 (0.100–0.533) |
| Healing fractures | 5 (1–8) | 5/30 = 0.167 (0.033–0.267) | 6 (1–10) | 6/30 = 0.200 (0.033–0.333) |
| Old fractures | 1 (0–2) | 1/30 = 0.033 (0.000–0.067) | 2 (0–4) | 2/30 = 0.067 (0.000–0.133) |
| Total | 10 (3–19) | 10/30 = 0.333 (0.100–0.633) | 18 (4–30) | 18/30 = 0.600 (0.133–1.000) |
Corresponding 95% confidence intervals are shown inside parentheses. At the patient level, 2 patients had FPs of both fresh and healing fractures.
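The per-patient total is smaller than the sum of its category rows because the two patients with both fresh and healing FPs are counted once. The arithmetic can be sketched as follows, assuming no other overlap between categories (which is consistent with the reported total of 10); the variable names are illustrative.

```python
N_PATIENTS = 30  # size of the control set

# Patients with at least one FP, by fracture category.
patients_with_fp = {"fresh": 6, "healing": 5, "old": 1}
patients_in_both_fresh_and_healing = 2

# Inclusion-exclusion: remove the double-counted patients.
total_patients_with_fp = (
    sum(patients_with_fp.values()) - patients_in_both_fresh_and_healing
)

# FP lesions are distinct findings, so they simply add up.
lesion_fps = {"fresh": 10, "healing": 6, "old": 2}
total_lesion_fps = sum(lesion_fps.values())

per_patient_frequency = total_patients_with_fp / N_PATIENTS
per_lesion_frequency = total_lesion_fps / N_PATIENTS
```

This gives 10 patients (0.333 per patient) and 18 FP lesions (0.600 per patient), in agreement with the Total row.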
Fig. 6. Detection/diagnosis results of different fractures shown on CT images.
A–C. Rib fractures were detected/diagnosed by CNN model and radiologists correctly. D. Two fresh fractures were diagnosed by CNN model, while subtle fresh fracture in posterior rib was missed by some radiologists (arrow). E. These two healing fractures were misdiagnosed as old fractures by radiologists, and rear one was detected correctly by CNN model (arrow). F–I. FPs were detected on healthy ribs by CNN model. CNN = convolutional neural network