Hai Ye1, Feng Gao2, Youbing Yin2, Danfeng Guo2, Pengfei Zhao2, Yi Lu2, Xin Wang2, Junjie Bai2, Kunlin Cao2, Qi Song2, Heye Zhang3, Wei Chen4,5, Xuejun Guo6, Jun Xia7.
Abstract
OBJECTIVES: To evaluate the performance of a novel three-dimensional (3D) joint convolutional and recurrent neural network (CNN-RNN) for the detection of intracranial hemorrhage (ICH) and its five subtypes (cerebral parenchymal, intraventricular, subdural, epidural, and subarachnoid) in non-contrast head CT.
Keywords: 3D imaging; Algorithms; Brain; Intracranial hemorrhage (ICH); Multislice computed tomography
Year: 2019 PMID: 31041565 PMCID: PMC6795911 DOI: 10.1007/s00330-019-06163-2
Source DB: PubMed Journal: Eur Radiol ISSN: 0938-7994 Impact factor: 5.315
Demographic information of subjects used in this study
| | Non-ICH | ICH | p value |
|---|---|---|---|
| Number of subjects | 1000 | 1836 | – |
| Age (years)* | 41.58 ± 15.26 (2–82) | 53.91 ± 16.51 (1–98) | < 0.001 |
| Sex (male:female) | 448:552 | 1195:641 | < 0.001 |
*Age reported as mean ± standard deviation (minimum–maximum)
Subject-level and slice-level scoring variability assessment of three radiologists on the diagnosis of ICH and five subtypes
| Type | Level | R1 and R2: p (%) | R1 and R2: κ | R2 and R3: p (%) | R2 and R3: κ | R1 and R3: p (%) | R1 and R3: κ | Κ |
|---|---|---|---|---|---|---|---|---|
| ICH | Subject | 100 | 1.00 | 99 | 0.99 | 99 | 0.99 | 0.99 |
| ICH | Slice | 93 | 0.83 | 96 | 0.91 | 92 | 0.80 | 0.85 |
| CPH | Subject | 91 | 0.77 | 95 | 0.87 | 91 | 0.77 | 0.80 |
| CPH | Slice | 95 | 0.85 | 97 | 0.92 | 95 | 0.84 | 0.87 |
| SAH | Subject | 86 | 0.70 | 87 | 0.73 | 85 | 0.68 | 0.71 |
| SAH | Slice | 89 | 0.65 | 91 | 0.74 | 89 | 0.62 | 0.67 |
| EDH | Subject | 98 | 0.85 | 98 | 0.83 | 97 | 0.80 | 0.82 |
| EDH | Slice | 99 | 0.79 | 99 | 0.82 | 99 | 0.73 | 0.78 |
| SDH | Subject | 94 | 0.78 | 94 | 0.78 | 93 | 0.72 | 0.76 |
| SDH | Slice | 97 | 0.74 | 97 | 0.78 | 95 | 0.64 | 0.72 |
| IVH | Subject | 87 | 0.72 | 94 | 0.87 | 88 | 0.74 | 0.78 |
| IVH | Slice | 93 | 0.71 | 97 | 0.88 | 94 | 0.73 | 0.78 |
R, radiologist; p, percentage agreement rate
κ, Cohen’s kappa coefficient, a statistic that measures inter-rater agreement and is more robust than percent agreement rate. A number greater than 0.6 indicates substantial agreement, while greater than 0.8 indicates almost perfect agreement
Κ, Fleiss’ kappa coefficient, a statistic that measures the reliability of agreement between multiple raters. A number greater than 0.6 indicates substantial agreement, while greater than 0.8 indicates almost perfect agreement
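For reference, the pairwise agreement statistics in the table above follow the standard definition of Cohen's kappa. A minimal pure-Python sketch (illustrative only, not the authors' code) of how such a value is computed from two raters' labels:

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

Fleiss' kappa generalizes the same observed-versus-chance comparison to more than two raters, which is why it is used here for the three-radiologist consensus column.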
Fig. 1 Demonstration of the ICH and subtype prediction workflow. Given processed CT images, two-type classification was first applied to predict whether a subject showed ICH. If a subject was predicted to be ICH-positive by our algorithm, we further applied five-type classification to determine which (one or more) of the five subtypes of ICH this subject had
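The two-stage decision logic described in the Fig. 1 caption can be sketched as below; `binary_model` and `subtype_model` are hypothetical stand-ins for the trained CNN-RNN classifiers, and the threshold of 0.5 is an assumed operating point, not one stated in the paper:

```python
SUBTYPES = ["CPH", "IVH", "SDH", "EDH", "SAH"]

def classify_subject(volume, binary_model, subtype_model, threshold=0.5):
    """Two-stage prediction: first ICH vs. non-ICH, then (only for
    ICH-positive subjects) a multi-label pass over the five subtypes."""
    # Stage 1: two-type classification on the whole CT volume.
    if binary_model(volume) < threshold:
        return {"ICH": False, "subtypes": []}
    # Stage 2: five-type, multi-label classification (one or more subtypes
    # may be positive for the same subject).
    scores = subtype_model(volume)  # dict: subtype name -> probability
    positive = [s for s in SUBTYPES if scores[s] >= threshold]
    return {"ICH": True, "subtypes": positive}
```

The key design point is that the five-type classifier is only invoked for subjects flagged by the binary stage, so subtype errors cannot occur on subjects already ruled non-ICH.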
Fig. 2 Illustration of the training and testing schema of the two-type and five-type classification tasks. Collected data were first pre-processed and then used as the training, validation, and testing sets for the two-type and five-type classification tasks
Subject-level performance of the automated algorithm, three junior radiology trainees, and a senior radiologist on two-type and five-type classification tasks
| Type | Rater | Accuracy | Sensitivity | Specificity | F1 score | AUC |
|---|---|---|---|---|---|---|
| ICH | Model (Sub-Lab) | 0.99 | 0.98 | 0.99 | 0.99 | 1.00 |
| ICH | Model (Sli-Lab) | 0.99 | 0.99 | 0.99 | 0.99 | 1.00 |
| ICH | JRT 1 | 0.94 | 0.91 | 1.00 | 0.95 | 0.96 |
| ICH | JRT 2 | 0.97 | 0.97 | 0.97 | 0.98 | 0.97 |
| ICH | JRT 3 | 0.97 | 0.95 | 1.00 | 0.97 | 0.97 |
| ICH | JRT (x̅ ± s) | 0.96 ± 0.02 | 0.94 ± 0.03 | 0.99 ± 0.02 | 0.96 ± 0.02 | 0.97 ± 0.01 |
| ICH | SR | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| CPH | Model (Sub-Lab) | 0.88 | 0.90 | 0.82 | 0.92 | 0.94 |
| CPH | Model (Sli-Lab) | 0.90 | 0.92 | 0.83 | 0.93 | 0.94 |
| CPH | JRT 1 | 0.84 | 0.79 | 1.00 | 0.88 | 0.89 |
| CPH | JRT 2 | 0.92 | 0.92 | 0.90 | 0.94 | 0.91 |
| CPH | JRT 3 | 0.87 | 0.86 | 0.90 | 0.91 | 0.88 |
| CPH | JRT (x̅ ± s) | 0.88 ± 0.04 | 0.86 ± 0.07 | 0.93 ± 0.06 | 0.91 ± 0.03 | 0.89 ± 0.02 |
| CPH | SR | 0.95 | 0.98 | 0.86 | 0.97 | 0.92 |
| SAH | Model (Sub-Lab) | 0.75 | 0.65 | 0.82 | 0.70 | 0.82 |
| SAH | Model (Sli-Lab) | 0.83 | 0.69 | 0.94 | 0.78 | 0.89 |
| SAH | JRT 1 | 0.62 | 0.19 | 0.96 | 0.30 | 0.57 |
| SAH | JRT 2 | 0.81 | 0.58 | 1.00 | 0.74 | 0.79 |
| SAH | JRT 3 | 0.65 | 0.27 | 0.95 | 0.40 | 0.61 |
| SAH | JRT (x̅ ± s) | 0.69 ± 0.10 | 0.35 ± 0.21 | 0.97 ± 0.03 | 0.48 ± 0.23 | 0.66 ± 0.12 |
| SAH | SR | 0.96 | 0.95 | 0.96 | 0.95 | 0.96 |
| EDH | Model (Sub-Lab) | 0.92 | 0.69 | 0.94 | 0.55 | 0.90 |
| EDH | Model (Sli-Lab) | 0.96 | 0.69 | 0.98 | 0.72 | 0.94 |
| EDH | JRT 1 | 0.97 | 0.54 | 1.00 | 0.73 | 0.77 |
| EDH | JRT 2 | 0.98 | 0.77 | 1.00 | 0.87 | 0.88 |
| EDH | JRT 3 | 0.96 | 0.85 | 0.97 | 0.73 | 0.91 |
| EDH | JRT (x̅ ± s) | 0.97 ± 0.01 | 0.72 ± 0.16 | 0.99 ± 0.02 | 0.78 ± 0.08 | 0.85 ± 0.07 |
| EDH | SR | 0.99 | 0.92 | 1.00 | 0.96 | 0.96 |
| SDH | Model (Sub-Lab) | 0.87 | 0.61 | 0.93 | 0.64 | 0.91 |
| SDH | Model (Sli-Lab) | 0.94 | 0.86 | 0.96 | 0.84 | 0.96 |
| SDH | JRT 1 | 0.88 | 0.53 | 0.96 | 0.62 | 0.75 |
| SDH | JRT 2 | 0.94 | 0.75 | 0.99 | 0.83 | 0.87 |
| SDH | JRT 3 | 0.91 | 0.50 | 1.00 | 0.67 | 0.75 |
| SDH | JRT (x̅ ± s) | 0.91 ± 0.03 | 0.59 ± 0.14 | 0.98 ± 0.02 | 0.71 ± 0.11 | 0.79 ± 0.07 |
| SDH | SR | 0.98 | 0.94 | 0.99 | 0.96 | 0.97 |
| IVH | Model (Sub-Lab) | 0.84 | 0.66 | 0.94 | 0.74 | 0.84 |
| IVH | Model (Sli-Lab) | 0.91 | 0.84 | 0.95 | 0.87 | 0.93 |
| IVH | JRT 1 | 0.83 | 0.57 | 0.97 | 0.70 | 0.77 |
| IVH | JRT 2 | 0.92 | 0.82 | 0.98 | 0.88 | 0.90 |
| IVH | JRT 3 | 0.88 | 0.72 | 0.97 | 0.81 | 0.84 |
| IVH | JRT (x̅ ± s) | 0.88 ± 0.05 | 0.70 ± 0.13 | 0.97 ± 0.01 | 0.80 ± 0.09 | 0.84 ± 0.07 |
| IVH | SR | 0.96 | 1.00 | 0.94 | 0.94 | 0.97 |
Sub-Lab, only subject-level labels were available and used in the training process. Sli-Lab, slice-level labels were available; thus, both slice-level and subject-level labels were used in the training process
JRT, junior radiology trainee; SR, senior radiologist
x̅ ± s, mean ± standard deviation
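The per-class metrics reported above follow their standard definitions. A minimal sketch (illustrative, not the study's evaluation code) of how accuracy, sensitivity, specificity, and F1 derive from the four cells of a binary confusion matrix:

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard classification metrics from binary confusion-matrix counts:
    tp/fp/fn/tn = true/false positives and negatives."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    sensitivity = tp / (tp + fn)        # true-positive rate (recall)
    specificity = tn / (tn + fp)        # true-negative rate
    f1 = 2 * tp / (2 * tp + fp + fn)    # harmonic mean of precision and recall
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}
```

AUC, by contrast, is computed from the continuous model scores (the area under the ROC curve of sensitivity versus 1 − specificity over all thresholds), not from a single confusion matrix.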
Fig. 3 Subject-level ROC curves and AUC results for two-type and five-type classification tasks. a, b two-type and five-type results for the algorithm trained with only subject-level labels. c, d two-type and five-type results for the algorithm trained with both subject-level and slice-level labels. The dashed black line shows the diagonal between coordinates (0, 0) and (1, 1). AUC is shown in the legend of each plot
Fig. 4 Examples of regions that our algorithm paid most attention to when making decisions, using the Grad-CAM approach. a–f Results for slices with different bleeding locations and different sizes of bleeding areas. Red means high importance while gray means low importance
Fig. 5 Representative examples of SAH-positive cases that were misdiagnosed by all three junior radiology trainees but correctly predicted by our algorithm. a–c Three consecutive slices around the SAH loci for each example. The white arrows point to the SAH loci confirmed by the senior radiologist