| Literature DB >> 30375411 |
Luca Giancardo1, Octavio Arevalo2,3, Andrea Tenreiro2,3, Roy Riascos2,3, Eliana Bonfante2,3.
Abstract
The aim of this study was to evaluate whether a machine learning method could distinguish models of cerebrospinal fluid shunt valves (CSF-SVs) from their appearance in clinical X-rays, an essential component of an automatic MRI-safety system based on X-ray imaging. To this end, we performed a retrospective observational study using 416 skull X-rays from unique subjects retrieved from a clinical PACS system. Each image included a CSF-SV representing one of the most common brands of programmable shunt valves currently used in the US, split into five classes. We compared four machine learning pipelines: two based on engineered image features (Local Binary Patterns and Histogram of Oriented Gradients) and two based on features learned by a deep convolutional neural network architecture. Performance was evaluated using accuracy, precision, recall and F1-score, with confidence intervals computed by a non-parametric bootstrap procedure. Our best-performing method identified the valve type correctly 96% [CI: 94–98%] of the time (CI: confidence interval; precision 0.96, recall 0.96, F1-score 0.96), tested with a stratified cross-validation approach to reduce the risk of overfitting. The machine learning pipelines based on deep convolutional neural networks performed significantly better than those based on engineered image features (mean accuracy 95–96% vs. 56–64%). This study shows the feasibility of automatically distinguishing CSF-SVs in clinical X-rays using deep convolutional neural networks. This finding is a first step towards an automatic MRI-safety system for implantable devices, which could decrease the number of patients who experience denials or delays of their MRI examinations.
Year: 2018 PMID: 30375411 PMCID: PMC6207736 DOI: 10.1038/s41598-018-34164-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
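The abstract reports confidence intervals computed with a non-parametric bootstrap procedure. A minimal sketch of how such percentile intervals can be obtained for accuracy (the function name and the toy data are illustrative, not taken from the paper):

```python
import random

def bootstrap_accuracy_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy: resample (true, predicted)
    pairs with replacement and take percentiles of the resampled accuracies."""
    rng = random.Random(seed)
    pairs = list(zip(y_true, y_pred))
    accs = sorted(
        sum(t == p for t, p in (rng.choice(pairs) for _ in pairs)) / len(pairs)
        for _ in range(n_boot)
    )
    return accs[int(alpha / 2 * n_boot)], accs[int((1 - alpha / 2) * n_boot) - 1]

# Toy example: 96 of 100 predictions correct, mirroring the headline accuracy.
y_true = [0] * 100
y_pred = [0] * 96 + [1] * 4
lo, hi = bootstrap_accuracy_ci(y_true, y_pred)
```

Because the interval is taken directly from the percentiles of the resampled accuracies, no distributional assumption is needed, which is why this kind of procedure is called non-parametric.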
Figure 1 Pipeline of the envisioned system.
Figure 2 Examples of CSF-SV X-ray valve images in our dataset. Each row shows 15 random samples for each of the 5 classes used. From the top, row 1: Medtronic Strata II - NSC (n = 164); row 2: Codman-Hakim (n = 106); row 3: Sophysa Polaris SPV (n = 82); row 4: Codman-Certas (n = 42); row 5: Miethke proGAV (n = 22). In parentheses, the number of samples contained in the dataset. The dataset contains all of the shunt valve brands currently used in the US.
Classification performance of the 4 machine learning pipelines. Confidence intervals are shown in brackets. The accuracy column shows the absolute percentage of images correctly classified.

| Method | Accuracy | Avg. Precision | Avg. Recall | Avg. F1-score |
|---|---|---|---|---|
| Local Binary Patterns (LBP) | 64% [60–69] | 0.64 [0.60–0.69] | 0.64 [0.60–0.69] | 0.64 [0.60–0.68] |
| Histogram of Oriented Gradients (HOG) | 56% [53–63] | 0.57 [0.53–0.63] | 0.56 [0.52–0.61] | 0.51 [0.47–0.56] |
| Xception Network | 95% [94–97] | 0.95 [0.94–0.97] | 0.95 [0.94–0.97] | 0.95 [0.93–0.97] |
| Enhanced-Xception Network | 96% [94–98] | 0.96 | 0.96 | 0.96 |
The precision, recall and F1-score columns report the performance metrics averaged over the 5 classes; the per-class metrics are available in Table 2. Deep convolutional networks trained with a transfer learning strategy clearly outperform the two feature engineering methods tested; specifically, the Enhanced-Xception Network shows the best overall performance. (Precision is also known as positive predictive value; recall is also known as sensitivity; the F1-score is the harmonic mean of precision and recall.)
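The macro-averaging used here (an unweighted mean of the per-class, one-vs-rest metrics) can be sketched as follows; the function name and the toy labels are illustrative, not from the paper:

```python
def per_class_prf(y_true, y_pred, classes):
    """Per-class precision, recall and F1 in a one-vs-rest fashion,
    plus their unweighted (macro) averages across classes."""
    stats = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        stats[c] = (prec, rec, f1)
    macro = tuple(sum(s[i] for s in stats.values()) / len(classes) for i in range(3))
    return stats, macro

# Toy example with two classes.
stats, macro = per_class_prf(['a', 'a', 'b', 'b'], ['a', 'b', 'b', 'b'], ['a', 'b'])
```

Note that the macro average weights each class equally regardless of its sample count, so rare classes such as the Miethke proGAV (n = 22) contribute as much as the largest class.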
Class-level performance metrics using the Enhanced-Xception Network.
| Valve model | N | Precision | Recall | F1-score | p-value |
|---|---|---|---|---|---|
| Strata II - NSC | 164 | 0.98 | 0.96 | 0.97 | ***p < 0.001 |
| Codman-Hakim | 106 | 0.92 | 0.98 | 0.95 | ***p < 0.001 |
| Sophysa Polaris SPV | 82 | 0.98 | 1.00 | 0.99 | ***p < 0.001 |
| Codman-Certas | 42 | 0.95 | 0.86 | 0.90 | ***p < 0.001 |
| Miethke proGAV | 22 | 1.00 | 0.95 | 0.98 | ***p < 0.001 |
N = number of samples per class. (Precision is also known as positive predictive value; recall is also known as sensitivity; the F1-score is the harmonic mean of precision and recall.) The p-value is computed with the two-tailed Mann–Whitney U test to reject the null hypothesis that each valve class is indistinguishable from the others in a one-vs-all framework.
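A minimal sketch of the two-tailed Mann–Whitney U test in its normal-approximation form (mid-ranks for ties, no tie or continuity correction); the function name and data are illustrative, and the paper does not specify its exact implementation:

```python
import math

def mann_whitney_u(x, y):
    """Two-tailed Mann-Whitney U p-value via the normal approximation:
    rank the pooled samples (mid-ranks for ties), form U from the rank
    sum of x, then compare the standardized U to a standard normal."""
    combined = sorted(x + y)
    ranks = {}
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j] == combined[i]:
            j += 1
        ranks[combined[i]] = (i + j + 1) / 2  # mid-rank of the tie group
        i = j
    n1, n2 = len(x), len(y)
    u = sum(ranks[v] for v in x) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))  # two-tailed p-value

# Fully separated groups give a small p; identical groups give p = 1.
p_sep = mann_whitney_u(list(range(10, 20)), list(range(10)))
p_same = mann_whitney_u([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
```

In the one-vs-all setting of Table 2, x would hold the classifier scores for one valve class and y the scores for all remaining classes.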
Figure 3 Confusion matrix for the Enhanced-Xception Network. Each cell shows the ratio of images of a given class (true label) classified into the class indicated by the column (predicted label). A perfect classifier would have 1.00 only on the matrix diagonal.
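A row-normalized confusion matrix of this kind can be computed as follows (a minimal sketch; the function name and toy labels are illustrative):

```python
def confusion_matrix(y_true, y_pred, classes):
    """Row-normalized confusion matrix: entry [i][j] is the fraction of
    samples with true label classes[i] predicted as classes[j]."""
    idx = {c: k for k, c in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        counts[idx[t]][idx[p]] += 1
    norm = []
    for row in counts:
        total = sum(row)
        norm.append([c / total if total else 0.0 for c in row])
    return norm

# Toy example: one of the two 'a' samples is misclassified as 'b'.
cm = confusion_matrix(['a', 'a', 'b'], ['a', 'b', 'b'], ['a', 'b'])
```

Normalizing each row by its class size makes the diagonal directly comparable to the per-class recall values in Table 2.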
Figure 4 All 16 samples misclassified by the Enhanced-Xception Network. Large foreign objects, low contrast, or acquisition angles not well represented in the dataset are visible.