Gene Kitamura, Chul Y Chung, Barry E Moore.
Abstract
To determine whether convolutional neural network (CNN) models could be trained de novo with a small dataset, a total of 596 normal and abnormal ankle cases were collected and processed. Single- and multi-view models were created to determine the effect of multiple views. Data augmentation was performed during training. The Inception V3, Resnet, and Xception convolutional neural networks were constructed in the Python programming language with Tensorflow as the framework. Training was performed using single radiographic views. Measured output metrics were accuracy, positive predictive value (PPV), negative predictive value (NPV), sensitivity, and specificity. Model outputs were evaluated using both one and three radiographic views. Ensembles were created from combinations of the trained CNNs, and a voting method was implemented to consolidate the outputs from the three views and the model ensembles. For single radiographic views, the ensemble of all five models produced the best accuracy, at 76%. When all three views for a case were utilized, the ensemble of all models again produced the best output metrics, with an accuracy of 81%. Despite our small dataset, by utilizing an ensemble of models and three views per case, we achieved an accuracy of 81%, in line with the accuracy of models trained on far larger numbers of cases with pre-trained weights or manual feature extraction.
Keywords: Ankle; Convolutional neural network; Deep learning; Fractures; Machine learning; Neural network
Year: 2019 PMID: 31001713 PMCID: PMC6646476 DOI: 10.1007/s10278-018-0167-7
Source DB: PubMed Journal: J Digit Imaging ISSN: 0897-1889 Impact factor: 4.056
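The abstract describes a voting method that consolidates binary predictions across the three radiographic views and the models in an ensemble. The paper does not publish its voting code, so the following is only a minimal majority-vote sketch; the label names and the tie-breaking rule (break ties toward "abnormal" as a cautious default) are assumptions, not details from the source.

```python
from collections import Counter


def majority_vote(predictions):
    """Consolidate binary labels from several (model, view) outputs.

    `predictions` is a flat list of "normal"/"abnormal" labels, one per
    model/view combination. Ties are broken toward "abnormal" -- an
    assumption, since the paper does not state its tie-breaking rule.
    """
    counts = Counter(predictions)
    if counts["abnormal"] >= counts["normal"]:
        return "abnormal"
    return "normal"


# e.g. three views of one case, each classified by the same model
case_votes = ["abnormal", "normal", "abnormal"]
consensus = majority_vote(case_votes)
```

With an odd number of voters (three views, or five models), a strict majority always exists, which may be why those counts were chosen.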
Fig. 1Example cases of ankle fractures. Each row represents a different patient, and the images are ordered as frontal, oblique, and lateral views from the left to the right. The first row demonstrates a minimally displaced lateral malleolar fracture with an incidental non-ossifying fibroma. The middle and last rows demonstrate trimalleolar fractures with a widened medial tibiotalar joint space
The architecture of the Inception V3 model. Each row represents a layer of the network, and the input of a particular layer is the output of the previous layer. For the inception layers, each row represents parallel sub-layers that were concatenated prior to being passed to the subsequent layer. The list of parameters in each row represents serial processes within the sub-layer. With the Inception 6 layer, the parameters within the parentheses represent additional parallel sub-layers within the serial processes. The number of times each Inception layer is repeated is prepended to each layer name
| Type | Patch size/strides | Input size |
|---|---|---|
| Conv 1 | 3 × 3/2 | 300 × 300 × 1 |
| Conv 2 | 3 × 3/1 | 149 × 149 × 32 |
| Conv 3 | 3 × 3/1 | 147 × 147 × 32 |
| Max pool 1 | 3 × 3/2 | 147 × 147 × 64 |
| Conv 4 | 1 × 1/1 | 73 × 73 × 64 |
| Conv 5 | 3 × 3/1 | 73 × 73 × 80 |
| Max pool 2 | 3 × 3/2 | 71 × 71 × 192 |
| 3× Inception 1 | 1 × 1/1 | 35 × 35 × 192 |
| | 1 × 1/1, 3 × 3/1 | |
| | 1 × 1/1, 3 × 3/1, 3 × 3/1 | |
| | Avg pool 3 × 3/1, 1 × 1/1 | |
| Inception 2 | 3 × 3/2 | 35 × 35 × 288 |
| | Max pool 3 × 3/2 | |
| | 1 × 1/1, 3 × 3/1, 3 × 3/2 | |
| 4× Inception 3 | 1 × 1/1 | 17 × 17 × 768 |
| | 1 × 1/1, 1 × 7/1, 7 × 1/1 | |
| | 1 × 1/1, 1 × 7/1, 7 × 1/1, 1 × 7/1, 7 × 1/1 | |
| | Avg pool 3 × 3/1, 1 × 1/1 | |
| Auxiliary | Avg pool 5 × 5/3, 1 × 1/1, linear, softmax | 17 × 17 × 768 |
| Inception 5 | 1 × 1/1, 3 × 3/2 | 17 × 17 × 768 |
| | 1 × 1/1, 1 × 7/1, 7 × 1/1, 3 × 3/2 | |
| | Max pool 3 × 3/2 | |
| 2× Inception 6 | 1 × 1/1 | 8 × 8 × 1280 |
| | 1 × 1/1, (1 × 3/1, 3 × 1/1) | |
| | 1 × 1/1, 3 × 3/1, (1 × 3/1, 3 × 1/1) | |
| | Avg pool 3 × 3/1, 1 × 1/1 | |
| Avg pool | 8 × 8/1 | 8 × 8 × 2048 |
| Output | Dropout, linear, softmax | 1 × 1 × 2048 |
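The input-size column of the table can be sanity-checked with the standard convolution output-size formula. Assuming unpadded ("valid") convolutions, a 3 × 3 kernel with stride 2 applied to the 300 × 300 input yields the 149 × 149 map listed as the input to Conv 2, and so on down the stem. The helper below is an illustration of that arithmetic, not code from the paper.

```python
import math


def out_size(in_size, patch, stride, padding="valid"):
    """Spatial output size of a square conv/pool layer.

    "valid" = no padding; "same" = zero-padded so only the stride
    shrinks the map. These are the two conventional padding modes.
    """
    if padding == "valid":
        return math.floor((in_size - patch) / stride) + 1
    return math.ceil(in_size / stride)


# Conv 1 (3 x 3 / 2) on a 300 x 300 input -> 149 x 149, the Conv 2 input.
# Conv 2 (3 x 3 / 1) on 149 x 149      -> 147 x 147, the Conv 3 input.
stem = out_size(out_size(300, 3, 2), 3, 1)
```

Where a table row keeps the same spatial size after a 3 × 3/1 layer, "same" padding is presumably in use.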
The architecture of the Resnet model. Each row represents a layer of the network, and the input of a particular layer is the output of the previous layer. Serial processes are represented as comma-separated parameters in each row. The number of times each Conv layer is repeated is prepended to each layer name. The feature maps are downsampled by a factor of 2 during the first iteration of Conv 3 to Conv 5. Resnet models with and without the auxiliary classifier and dropout were constructed
| Type | Patch size/strides | Input size |
|---|---|---|
| Conv 1 | 7 × 7/2 | 300 × 300 × 1 |
| Max pool 1 | 3 × 3/2 | 147 × 147 × 64 |
| 3× Conv 2 | 1 × 1/1, 3 × 3/1, 1 × 1/1 | 74 × 74 × 64 |
| 4× Conv 3 | 1 × 1/1, 3 × 3/1, 1 × 1/1 | 74 × 74 × 256 |
| 23× Conv 4 | 1 × 1/1, 3 × 3/1, 1 × 1/1 | 37 × 37 × 512 |
| Auxiliary | Avg pool 5 × 5/3, 1 × 1/1, linear, softmax | 19 × 19 × 1024 |
| 3× Conv 5 | 1 × 1/1, 3 × 3/1, 1 × 1/1 | 19 × 19 × 1024 |
| Avg pool | 10 × 10/1 | 10 × 10 × 2048 |
| Output | Dropout, linear, softmax | 1 × 1 × 2048 |
The architecture of the Xception network. The Conv 6 layer is repeated eight times. Each row represents a layer of the network. Serial processes are designated by comma-separated parameters. Xception models with and without the auxiliary classifier and dropout were constructed
| Type | Patch size/strides | Input size |
|---|---|---|
| Conv 1 | 3 × 3/2 | 300 × 300 × 1 |
| Conv 2 | 3 × 3/1 | 150 × 150 × 32 |
| Conv 3 | 3 × 3/1, 3 × 3/1 | 150 × 150 × 64 |
| Max pool 1 | 3 × 3/2 | 150 × 150 × 128 |
| Conv 4 | 3 × 3/1, 3 × 3/1 | 75 × 75 × 128 |
| Max pool 2 | 3 × 3/2 | 75 × 75 × 256 |
| Conv 5 | 3 × 3/1, 3 × 3/1 | 38 × 38 × 256 |
| Max pool 3 | 3 × 3/2 | 38 × 38 × 728 |
| 8× Conv 6 | 3 × 3/1, 3 × 3/1, 3 × 3/1 | 19 × 19 × 728 |
| Auxiliary | Avg pool 5 × 5/3, 1 × 1/1, linear, softmax | 19 × 19 × 728 |
| Conv 7 | 3 × 3/1, 3 × 3/1 | 19 × 19 × 728 |
| Max pool 4 | 3 × 3/2 | 19 × 19 × 1024 |
| Conv 8 | 3 × 3/1 | 10 × 10 × 1024 |
| Conv 9 | 3 × 3/1 | 10 × 10 × 1536 |
| Avg pool | 10 × 10/1 | 10 × 10 × 2048 |
| Output | Dropout, linear, softmax | 1 × 1 × 2048 |
The output metrics for the convolutional neural networks. Ensemble_A comprises all five of the convolutional neural networks. Ensemble_B consists of the three models that together produced the best outputs, in this case Inception V3, Resnet, and Xception with drop/aux. Output metrics were obtained using the validation-test data set
| Model | Views | Accuracy | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|---|
| Inception V3 | One | 0.70 | 0.68 | 0.73 | 0.71 | 0.70 |
| | Three | 0.74 | 0.73 | 0.75 | 0.74 | 0.73 |
| Resnet | One | 0.73 | 0.68 | 0.77 | 0.75 | 0.71 |
| | Three | 0.75 | 0.70 | 0.80 | 0.78 | 0.73 |
| Resnet with drop/aux | One | 0.72 | 0.74 | 0.70 | 0.71 | 0.73 |
| | Three | 0.73 | 0.73 | 0.73 | 0.73 | 0.73 |
| Xception | One | 0.75 | 0.73 | 0.76 | 0.75 | 0.74 |
| | Three | 0.78 | 0.75 | 0.80 | 0.79 | 0.76 |
| Xception with drop/aux | One | 0.75 | 0.71 | 0.80 | 0.78 | 0.73 |
| | Three | 0.78 | 0.73 | 0.73 | 0.81 | 0.75 |
| Ensemble_A | One | 0.76 | 0.77 | 0.76 | 0.76 | 0.77 |
| | Three | 0.81 | 0.80 | 0.83 | 0.82 | 0.81 |
| Ensemble_B | One | 0.75 | 0.68 | 0.82 | 0.79 | 0.72 |
| | Three | 0.80 | 0.73 | 0.88 | 0.85 | 0.76 |
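The five metrics reported in the table all derive from the binary confusion matrix in the standard way. As a reference for readers, the helper below computes them from raw counts; the example counts are hypothetical and are not the study's actual confusion matrix.

```python
def metrics(tp, fp, tn, fn):
    """The five reported metrics from binary confusion-matrix counts.

    tp/fp/tn/fn = true/false positives and true/false negatives,
    with "abnormal" treated as the positive class.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # recall on abnormal cases
        "specificity": tn / (tn + fp),   # recall on normal cases
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical counts for illustration only (not from the paper):
example = metrics(tp=8, fp=1, tn=9, fn=2)
```

Note that with balanced classes, accuracy is simply the mean of sensitivity and specificity, which is consistent with several rows of the table.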