Shui-Hua Wang1,2, Chaosheng Tang1, Junding Sun1, Jingyuan Yang3, Chenxi Huang4, Preetha Phillips5, Yu-Dong Zhang1,6.
Abstract
Aim: Multiple sclerosis is a severe disease of the brain and/or spinal cord that can lead to a wide range of symptoms; early diagnosis and treatment are therefore quite important. Method: This study proposed a 14-layer convolutional neural network combined with three advanced techniques: batch normalization, dropout, and stochastic pooling. The output of the stochastic pooling was obtained by sampling from a multinomial distribution formed from the activations of each pooling region. In addition, a data augmentation method was used to enlarge the training set. In total, 10 runs were implemented, with the hold-out split randomly set for each run.
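The stochastic pooling step described in the abstract can be sketched in a few lines of NumPy; this is a minimal illustration (the function name, the non-negativity assumption, and the toy region are ours, not the paper's), sampling a pooled value from the multinomial distribution formed by a region's normalized activations:

```python
import numpy as np

def stochastic_pool(region, rng=np.random.default_rng(0)):
    """Stochastic pooling over one pooling region: normalize the
    (non-negative) activations into a multinomial distribution and
    sample one activation as the pooled output."""
    acts = np.maximum(region, 0.0).ravel()   # treat activations as non-negative (e.g. post-ReLU)
    total = acts.sum()
    if total == 0.0:
        return 0.0                           # an all-zero region pools to zero
    idx = rng.choice(acts.size, p=acts / total)
    return acts[idx]

region = np.array([[1.0, 3.0],
                   [0.0, 4.0]])
pooled = stochastic_pool(region)  # 1.0 with p=1/8, 3.0 with p=3/8, 4.0 with p=4/8
```

Unlike max pooling, large activations are only *likely* (not certain) to be selected, which acts as a regularizer during training.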
Keywords: batch normalization; convolutional neural network; deep learning; dropout; multiple sclerosis; stochastic pooling
Year: 2018 PMID: 30467462 PMCID: PMC6236001 DOI: 10.3389/fnins.2018.00818
Source DB: PubMed Journal: Front Neurosci ISSN: 1662-453X Impact factor: 4.677
Demographic characteristics of the two datasets.

| Group | Source | No. of subjects | No. of images | Age (years) | Gender (M/F) |
| Multiple sclerosis | eHealth | 38 | 676 | 34.1 ± 10.5 | 17/21 |
| Healthy control (Zhang et al.) | private | 26 | 681 | 33.5 ± 8.3 | 12/14 |
Figure 1. Samples of our dataset. (A) Original MS image. (B) MS image with plaque delineated. (C) Healthy control image I. (D) Healthy control image II.
Figure 2. Pipeline of the convolutional neural network.
Figure 3. Pipeline of the conv layer.
Figure 4. A toy example of max pooling and average pooling.
Figure 5. Structure of the FC layer.
Figure 6. An example of a dropout neural network.
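The dropout technique illustrated in Figure 6 can be sketched as inverted dropout (the function name is ours; the rate p = 0.5 matches the DO layers in the fully-connected hyperparameter table below):

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=np.random.default_rng(0)):
    """Inverted dropout: drop each unit with probability p and scale
    survivors by 1/(1-p) so the expected activation is unchanged."""
    if not training:
        return x
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)

x = np.ones(1000)
y = dropout(x, p=0.5)  # survivors become 2.0, dropped units become 0.0
```

At inference time (`training=False`) the layer is an identity, so no rescaling is needed at test time.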
Variables used in batch normalization.

| Symbol | Meaning |
| li | The output of a layer |
| l̂i | The normalization of li |
| ~li | Input of the non-linearity layer |
| α | Mean value of the minibatch |
| δ² | Variance of the minibatch |
| i | Layer index |
| ε | A small constant |
| | The number of samples of the minibatch |
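Using the symbols above (α for the minibatch mean, δ² for its variance, ε a small constant), batch normalization can be sketched as follows; the learnable scale/shift names `gamma` and `beta` are our assumption, as the table does not list them:

```python
import numpy as np

def batch_norm(l, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a minibatch l (first axis = samples) with the minibatch
    mean (alpha) and variance (delta^2), then scale and shift."""
    alpha = l.mean(axis=0)                        # mean value of the minibatch
    delta2 = l.var(axis=0)                        # variance of the minibatch
    l_hat = (l - alpha) / np.sqrt(delta2 + eps)   # the normalized output
    return gamma * l_hat + beta                   # input of the non-linearity layer

batch = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
normed = batch_norm(batch)
```

After normalization each feature has approximately zero mean and unit variance, which stabilizes training across minibatches.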
Figure 7. A toy example of stochastic pooling.
Hold-out validation setting.

| Class | Training | Test |
| MS | 350 | 326 |
| HC | 350 | 331 |
| Total | 700 | 657 |
Figure 8. Pipeline of data preprocessing.
Statistical analysis of 10 runs.

| Run | Sensitivity (%) | Specificity (%) | Precision (%) | Accuracy (%) |
| 1 | 98.77 | 98.19 | 98.17 | 98.48 |
| 2 | 98.47 | 97.58 | 97.57 | 98.02 |
| 3 | 98.47 | 98.79 | 98.77 | 98.63 |
| 4 | 98.16 | 98.79 | 98.77 | 98.48 |
| 5 | 99.08 | 98.79 | 98.78 | 98.93 |
| 6 | 98.77 | 98.79 | 98.77 | 98.78 |
| 7 | 99.39 | 99.40 | 99.39 | 99.39 |
| 8 | 99.08 | 98.49 | 98.48 | 98.78 |
| 9 | 98.77 | 99.40 | 99.38 | 99.09 |
| 10 | 98.77 | 99.40 | 99.38 | 99.09 |
| Average | 98.77 ± 0.35 | 98.76 ± 0.58 | 98.75 ± 0.58 | 98.77 ± 0.39 |
Figure 9. Results of data augmentation. (A) Rotation. (B) Scaling. (C) Noise injection. (D) Random translation. (E) Gamma correction.
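Three of the five augmentations in Figure 9 can be sketched directly in NumPy (noise injection, random translation, and gamma correction; the parameter values are illustrative, and rotation/scaling are omitted because they require an interpolating image library):

```python
import numpy as np

rng = np.random.default_rng(0)

def noise_injection(img, sigma=0.01):
    """Add zero-mean Gaussian noise and clip back to [0, 1]."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def random_translation(img, max_shift=3):
    """Shift by a random offset; wrap-around keeps this dependency-free."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(img, (int(dy), int(dx)), axis=(0, 1))

def gamma_correction(img, gamma=0.8):
    """Power-law intensity transform on a [0, 1] image."""
    return np.clip(img, 0.0, 1.0) ** gamma

img = rng.random((8, 8))
augmented = [noise_injection(img), random_translation(img), gamma_correction(img)]
```

Each transform preserves the image shape, so augmented samples can be mixed freely into the training set.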
Hyperparameters of Conv layers.

| Layer | Kernel size | Input channels | Filters | Stride |
| Conv_1 | 3 × 3 | 1 | 8 | 2 |
| Pool_1 | 3 × 3 | | | 2 |
| Conv_2 | 3 × 3 | 8 | 8 | 2 |
| Pool_2 | 3 × 3 | | | 2 |
| Conv_3 | 3 × 3 | 8 | 16 | 1 |
| Conv_4 | 3 × 3 | 16 | 16 | 1 |
| Conv_5 | 3 × 3 | 16 | 16 | 1 |
| Pool_3 | 3 × 3 | | | 2 |
| Conv_6 | 3 × 3 | 16 | 32 | 1 |
| Conv_7 | 3 × 3 | 32 | 32 | 1 |
| Conv_8 | 3 × 3 | 32 | 32 | 1 |
| Conv_9 | 3 × 3 | 32 | 64 | 1 |
| Conv_10 | 3 × 3 | 64 | 64 | 1 |
| Conv_11 | 3 × 3 | 64 | 64 | 1 |
| Pool_4 | 3 × 3 | | | 2 |
Hyperparameters of fully-connected layers.

| Layer | Weight size | Bias size | Dropout rate |
| FCL_1 | 20 × 1024 | 20 × 1 | |
| DO_1 | | | 0.5 |
| FCL_2 | 10 × 20 | 10 × 1 | |
| DO_2 | | | 0.5 |
| FCL_3 | 2 × 10 | 2 × 1 | |
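Transcribing the two hyperparameter tables as plain data makes the "14-layer" count checkable: pooling and dropout layers carry no weights, so the count covers the 11 conv layers plus the 3 fully-connected layers. This sketch uses our own tuple layout, not any code from the paper:

```python
# (name, kernel, in_channels, out_channels, stride); pooling rows keep
# only kernel and stride, since pooling has no learnable weights.
conv_part = [
    ("Conv_1", 3, 1, 8, 2),    ("Pool_1", 3, None, None, 2),
    ("Conv_2", 3, 8, 8, 2),    ("Pool_2", 3, None, None, 2),
    ("Conv_3", 3, 8, 16, 1),   ("Conv_4", 3, 16, 16, 1),
    ("Conv_5", 3, 16, 16, 1),  ("Pool_3", 3, None, None, 2),
    ("Conv_6", 3, 16, 32, 1),  ("Conv_7", 3, 32, 32, 1),
    ("Conv_8", 3, 32, 32, 1),  ("Conv_9", 3, 32, 64, 1),
    ("Conv_10", 3, 64, 64, 1), ("Conv_11", 3, 64, 64, 1),
    ("Pool_4", 3, None, None, 2),
]
# (name, in_features, out_features); DO rows are dropout with rate 0.5.
fc_part = [("FCL_1", 1024, 20), ("DO_1", None, None),
           ("FCL_2", 20, 10),   ("DO_2", None, None),
           ("FCL_3", 10, 2)]

weighted = [n for n, *_ in conv_part if n.startswith("Conv")] + \
           [n for n, *_ in fc_part if n.startswith("FCL")]
```

The final FCL_3 output of size 2 corresponds to the two classes (MS vs. healthy control).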
Figure 10. Activation map of the proposed CNN model.
Figure 11. Confusion matrices of each run.
Ten random runs of the MP and AP methods.

MP:
| Run | Sensitivity (%) | Specificity (%) | Precision (%) | Accuracy (%) |
| R1 | 97.87 | 97.87 | 97.89 | 97.87 |
| R2 | 98.63 | 98.63 | 98.66 | 98.63 |
| R3 | 98.18 | 98.18 | 98.20 | 98.17 |
| R4 | 96.04 | 96.04 | 96.11 | 96.04 |
| R5 | 96.80 | 96.80 | 96.86 | 96.80 |
| R6 | 98.78 | 98.78 | 98.81 | 98.78 |
| R7 | 98.63 | 98.63 | 98.65 | 98.63 |
| R8 | 97.86 | 97.86 | 97.88 | 97.87 |
| R9 | 99.24 | 99.24 | 99.25 | 99.24 |
| R10 | 98.63 | 98.63 | 98.64 | 98.63 |
| Average | 98.07 ± 0.93 | 98.07 ± 0.98 | 98.10 ± 0.96 | 98.07 ± 0.98 |
AP:
| Run | Sensitivity (%) | Specificity (%) | Precision (%) | Accuracy (%) |
| 1 | 97.41 | 97.41 | 97.55 | 97.41 |
| 2 | 96.65 | 96.66 | 96.67 | 96.65 |
| 3 | 98.33 | 98.32 | 98.37 | 98.33 |
| 4 | 97.41 | 97.41 | 97.42 | 97.41 |
| 5 | 96.65 | 96.65 | 96.65 | 96.65 |
| 6 | 97.87 | 97.87 | 97.88 | 97.87 |
| 7 | 97.56 | 97.57 | 97.58 | 97.56 |
| 8 | 97.87 | 97.87 | 97.92 | 97.87 |
| 9 | 98.48 | 98.48 | 98.52 | 98.48 |
| 10 | 98.48 | 98.47 | 98.51 | 98.48 |
| Average | 97.67 ± 0.64 | 97.67 ± 0.67 | 97.71 ± 0.68 | 97.67 ± 0.67 |
Pooling method comparison and p-values of the signed-rank test.

| Method | Sensitivity (%) | Specificity (%) | Precision (%) | Accuracy (%) |
| MP | 98.33 ± 0.75 | 98.33 ± 0.79 | 98.34 ± 0.79 | 98.33 ± 0.80 |
| p-value | 0.0645 | 0.0605 | | |
| AP | 97.67 ± 0.64 | 97.67 ± 0.67 | 97.71 ± 0.68 | 97.67 ± 0.67 |
| SP (Ours) | 98.77 ± 0.35 | 98.76 ± 0.58 | 98.75 ± 0.58 | 98.77 ± 0.39 |
Bold means the p-values are less than 0.05.
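The signed-rank comparison can be reproduced in outline with SciPy's Wilcoxon test. Here the per-run accuracy columns from the tables above (SP from "Statistical analysis of 10 runs", AP from the AP block) are paired by run index, which is our assumption about how the paper paired runs:

```python
from scipy.stats import wilcoxon

# Per-run accuracies (%) copied from the tables above.
sp_acc = [98.48, 98.02, 98.63, 98.48, 98.93, 98.78, 99.39, 98.78, 99.09, 99.09]
ap_acc = [97.41, 96.65, 98.33, 97.41, 96.65, 97.87, 97.56, 97.87, 98.48, 98.48]

stat, p_value = wilcoxon(sp_acc, ap_acc)  # paired, non-parametric signed-rank test
```

Since SP beats AP in every one of the ten paired runs, the test yields a small p-value, consistent with the significant AP-vs-SP entries in the comparison table.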
Comparison of the approach with and without data augmentation.

| Setting | Sensitivity (%) | Specificity (%) | Precision (%) | Accuracy (%) |
| No augmentation | 98.22 ± 0.71 | 98.19 ± 1.03 | 98.18 ± 1.01 | 98.20 ± 0.77 |
| Data augmentation | 98.77 ± 0.35 | 98.76 ± 0.58 | 98.75 ± 0.58 | 98.77 ± 0.39 |
Comparison to traditional AI approaches.

| Approach | Sensitivity (%) | Specificity (%) | Precision (%) | Accuracy (%) |
| Multiscale AM-FM (Murray et al.) | 94.08 | 93.64 | 91.91 | 93.83 |
| ARF (Nayak et al.) | 96.23 ± 1.18 | 96.32 ± 1.48 | N/A | 96.28 ± 1.25 |
| BWT-LR (Wang et al.) | 97.12 ± 0.14 | 98.25 ± 0.16 | N/A | 97.76 ± 0.10 |
| 4-level HWT (Wu and Lopez) | N/A | N/A | N/A | 87.65 ± 1.79 |
| MBD (Zhang et al.) | 97.78 ± 1.29 | 97.82 ± 1.60 | N/A | 97.80 ± 1.40 |
| CNN-DO-BN-SP (Ours) | 98.77 ± 0.35 | 98.76 ± 0.58 | 98.75 ± 0.58 | 98.77 ± 0.39 |
Comparison to deep learning approaches.

| Approach | Sensitivity (%) | Specificity (%) | Precision (%) | Accuracy (%) |
| CNN-PReLU-DO (Zhang et al.) | 98.22 | 98.24 | N/A | 98.23 |
| CNN-DO-BN-SP (Ours) | 98.77 ± 0.35 | 98.76 ± 0.58 | 98.75 ± 0.58 | 98.77 ± 0.39 |
Figure 12. Comparison plot.