Narayani Wagle1,2, John Morkos3, Jingyan Liu1, Henry Reith1, Joseph Greenstein4, Kirby Gong1, Indranuj Gangan1, Daniil Pakhomov2, Sanchit Hira1, Oleg V Komogortsev5, David E Newman-Toker4,6,7, Raimond Winslow1,2,8, David S Zee4,8,9, Jorge Otero-Millan9,10, Kemar E Green1,4.
Abstract
Background: Nystagmus identification and interpretation are challenging for non-experts who lack specific training in neuro-ophthalmology or neuro-otology. This challenge is magnified when the task is performed via telemedicine. Deep learning models have not been heavily studied in video-based eye movement detection.
Keywords: artificial intelligence; deep learning; dizziness; eye movements; machine learning; nystagmus; telemedicine; vertigo
Year: 2022 PMID: 36034311 PMCID: PMC9403604 DOI: 10.3389/fneur.2022.963968
Source DB: PubMed Journal: Front Neurol ISSN: 1664-2295 Impact factor: 4.086
Figure 1: Pendular and jerk nystagmus waveform morphologies.
Figure 2: Variations in video quality of the dataset.
Figure 3: Filtered image appearance across different β values.
Figure 4: Filtered image construction for (A) the non-sliding window and (B) the sliding window variations.
Figure 5: Examples of the different video frame rate and resolution variations for (A) raw frames and (B) the corresponding filtered images.
Figure 6: Model architecture and framework of the ensemble model. P, probability; n, number of filtered images; ∑, sum over the n filtered images.
Figure 7: The k-fold cross-validation architecture, with data partitioned into training and test sets for k = 3 folds (A), AUROC curves for each fold (B), and training/validation loss (C) and training/validation accuracy (D) for the model.
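The k = 3 cross-validation scheme described for Figure 7 can be sketched as follows. This is a minimal illustration only, not the authors' pipeline; the video IDs and the round-robin fold assignment are hypothetical.

```python
# Minimal sketch of a k = 3 cross-validation split, as in Figure 7.
# Video IDs and the round-robin fold assignment are illustrative only.

def k_fold_splits(items, k=3):
    """Yield (train, test) lists; each fold serves once as the test set."""
    folds = [items[i::k] for i in range(k)]  # round-robin assignment
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

videos = [f"video_{n}" for n in range(9)]  # hypothetical dataset
for fold_num, (train, test) in enumerate(k_fold_splits(videos), start=1):
    print(f"Fold {fold_num}: {len(train)} train / {len(test)} test")
```

Each of the three folds is held out exactly once, so every video contributes to both training and testing across the run, matching the partitioning shown in panel (A).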
Table 1. Performance metrics for model experiments.

| Experiment | AUROC | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|
| **Filtered image optimization (β)** | | | | |
| β = 0.001 | 0.75 | 68.1% | 77.4% | 72.5% |
| β = 0.005 | 0.79 | 81.1% | 62.9% | 72.5% |
| β = 0.01 | 0.83 | 84.0% | 69.3% | 77.1% |
| β = 0.05 | 0.78 | 60.8% | 80.6% | 70.2% |
| β = 0.1 | 0.81 | 71.0% | 80.6% | 75.5% |
| β = 0.25 | 0.85 | 88.4% | 69.3% | 79.3% |
| β = 0.5 | 0.79 | 75.3% | 74.1% | 74.8% |
| **Voting criteria** | | | | |
| 150 msec | 0.80 | 78.3% | 75.8% | 77.1% |
| 333 msec | 0.80 | 75.4% | 77.4% | 76.3% |
| 500 msec | 0.82 | 71.0% | 83.9% | 77.1% |
| 1,000 msec | 0.81 | 72.5% | 82.3% | 77.1% |
| 2,000 msec | 0.75 | 61.0% | 75.8% | 67.9% |
| Hard voting | 0.84 | 72.4% | 83.8% | 77.8% |
| Soft voting | 0.85 | 88.4% | 69.3% | 79.3% |
| **Sliding window-temporal voting criteria** | | | | |
| 1,000 msec, 50 frames | 0.77 | 60.9% | 88.7% | 74.1% |
| 500 msec, 50 frames | 0.78 | 75.4% | 74.2% | 74.8% |
| 333 msec, 50 frames | 0.74 | 71.0% | 71.0% | 71.0% |
| 150 msec, 50 frames | 0.74 | 71.0% | 71.0% | 71.0% |
| 1,000 msec, 100 frames | 0.70 | 47.8% | 90.3% | 67.9% |
| 500 msec, 100 frames | 0.71 | 52.2% | 88.7% | 69.5% |
| 333 msec, 100 frames | 0.69 | 49.3% | 87.1% | 67.2% |
| 150 msec, 100 frames | 0.71 | 59.4% | 82.3% | 70.2% |
| 1,000 msec, 150 frames | 0.67 | 36.2% | 96.8% | 64.9% |
| 500 msec, 150 frames | 0.68 | 46.4% | 88.7% | 66.4% |
| 333 msec, 150 frames | 0.67 | 44.9% | 87.1% | 64.9% |
| 150 msec, 150 frames | 0.69 | 49.3% | 87.1% | 67.2% |
| 1,000 msec, 350 frames | 0.59 | 20.3% | 96.8% | 56.5% |
| 500 msec, 350 frames | 0.62 | 27.5% | 95.2% | 59.5% |
| 333 msec, 350 frames | 0.58 | 17.4% | 98.4% | 55.7% |
| 150 msec, 350 frames | 0.61 | 27.5% | 93.5% | 58.8% |
| **Data split modification** | | | | |
| Balanced eccentric gaze videos | 0.83 | 83.1% | 72.1% | 77.8% |
| **ImageNet classifier comparison** | | | | |
| ResNet | 0.84 | 72.4% | 83.8% | 77.8% |
| DenseNet | 0.81 | 75.3% | 79.0% | 77.1% |
| VGG | 0.85 | 59.4% | 96.7% | 77.1% |
| Inception | 0.82 | 84.0% | 72.5% | 78.6% |
| **Ensemble model and cross-validation** | | | | |
| ResNet-soft vote | 0.85 | 88.4% | 69.3% | 79.3% |
| VGG-hard vote | 0.85 | 59.4% | 96.7% | 77.1% |
| ResNet-soft vote + VGG-hard vote ensemble* | 0.86 | 88.4% | 74.2% | 81.7% |
| Reverse model | 0.50 | 0.00% | 100% | 47.7% |
| Fold 1 | 0.91 | 72.1% | 97.0% | 84.7% |
| Fold 2 | 0.82 | 83.6% | 73.0% | 78.3% |
| Fold 3 | 0.85 | 85.3% | 72.9% | 78.4% |
| Average | 0.86 | 80.3% | 80.9% | 80.4% |
| **Comparison with existing video classification method** | | | | |
| LSTM + CNN | 0.46 | 100% | 2.00% | 48.4% |
| **Frame rate/resolution combinations** | | | | |
| 60 Hz (240 × 320) | 0.86 | 88.4% | 74.2% | 81.7% |
| 60 Hz (60 × 80) | 0.84 | 78.3% | 83.9% | 80.9% |
| 60 Hz (15 × 20) | 0.83 | 68.1% | 85.5% | 76.3% |
| 30 Hz (240 × 320) | 0.83 | 76.8% | 77.4% | 77.1% |
| 30 Hz (60 × 80) | 0.85 | 71.0% | 87.1% | 78.6% |
| 30 Hz (15 × 20) | 0.83 | 89.9% | 66.1% | 78.6% |
| 15 Hz (240 × 320) | 0.81 | 65.2% | 83.9% | 74.1% |
| 15 Hz (60 × 80) | 0.82 | 78.3% | 74.2% | 76.3% |
| 15 Hz (15 × 20) | 0.72 | 55.1% | 82.3% | 67.9% |
AUROC, area under the receiver operating characteristic curve; CNN, convolutional neural network; LSTM, long short-term memory. *Indicates the best-performing model.
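The hard- and soft-voting rows above aggregate the per-filtered-image CNN outputs in two different ways. A minimal sketch of the distinction (the probabilities and the 0.5 threshold below are made up for illustration, not taken from the study):

```python
# Hard vs. soft voting over per-filtered-image nystagmus probabilities.
# The example probabilities and the 0.5 threshold are hypothetical.

def soft_vote(probs, threshold=0.5):
    """Average the probabilities first, then threshold the mean."""
    return sum(probs) / len(probs) >= threshold

def hard_vote(probs, threshold=0.5):
    """Threshold each image first, then take the majority vote."""
    votes = [p >= threshold for p in probs]
    return sum(votes) > len(votes) / 2

probs = [0.9, 0.9, 0.4, 0.4, 0.4]  # two confident positives, three weak negatives
print(soft_vote(probs))  # mean 0.6 >= 0.5 -> True
print(hard_vote(probs))  # only 2 of 5 per-image votes positive -> False
```

The example shows why the two schemes can disagree: soft voting lets a few confident positives outweigh several borderline negatives, whereas hard voting discards each image's confidence before counting.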
Table 2. Comparing the best overall AUROC (ResNet-soft vote + VGG-hard vote ensemble) with the best AUROC from each model experiment in Table 1, using a two-sample (unpaired) t-test.

| Model experiment | AUROC | MP mean | MP SD | P-value |
|---|---|---|---|---|
| Filtered image optimization | 0.85 | 0.529 | 0.369 | <0.001 |
| Sliding window comparison | 0.82 | 0.518 | 0.379 | <0.001 |
| Voting (β = 0.25) | 0.85 | 0.599 | 0.340 | <0.001 |
| Sliding window-temporal voting criteria | 0.78 | 0.244 | 0.356 | 0.837 |
| Data split modification | 0.83 | 0.558 | 0.369 | <0.001 |
| ImageNet classifier comparison | 0.85 | 0.439 | 0.328 | <0.001 |
| ResNet-soft vote + VGG-hard vote ensemble in reverse | 0.50 | 0.000 | 0.000 | <0.001 |
| Comparison with existing video classification method | 0.46 | 0.492 | 0.008 | <0.001 |
| Frame rate/resolution combinations (15 Hz) | 0.82 | 0.604 | 0.292 | <0.001 |
| Frame rate/resolution combinations (30 Hz) | 0.85 | 0.517 | 0.331 | <0.001 |
AUROC, area under the receiver operating characteristic curve. MP, Model Predictions; SD, Standard Deviation.
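The two-sample (unpaired) comparison above can be sketched with a Welch t statistic computed directly from two score samples. The prediction scores below are synthetic placeholders, not the study's model predictions.

```python
import math

def welch_t(a, b):
    """Two-sample (unpaired) Welch t statistic."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)  # sample variance
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

# Synthetic prediction scores for two hypothetical models:
ensemble = [0.60, 0.62, 0.58, 0.61, 0.59]
baseline = [0.52, 0.50, 0.55, 0.51, 0.53]
print(welch_t(ensemble, baseline))  # large positive t: ensemble scores higher
```

A large |t| relative to the degrees of freedom yields a small p-value, which is how rows such as "ImageNet classifier comparison" reach p < 0.001 against the ensemble.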
Figure 8: Comparing the filtered-image probability distributions of the same false-negative video, which shows nystagmus in eccentric eye position toward the end of the video.