Benjamin H Brinkmann, Joost Wagenaar, Drew Abbot, Phillip Adkins, Simone C Bosshard, Min Chen, Quang M Tieng, Jialune He, F J Muñoz-Almaraz, Paloma Botella-Rocamora, Juan Pardo, Francisco Zamora-Martinez, Michael Hills, Wei Wu, Iryna Korshunova, Will Cukierski, Charles Vite, Edward E Patterson, Brian Litt, Gregory A Worrell.
Abstract
See Mormann and Andrzejak (doi:10.1093/brain/aww091) for a scientific commentary on this article. Accurate forecasting of epileptic seizures has the potential to transform clinical epilepsy care. However, progress toward reliable seizure forecasting has been hampered by a lack of open access to long-duration recordings with an adequate number of seizures for investigators to rigorously compare algorithms and results. A seizure forecasting competition was conducted on kaggle.com using open-access chronic ambulatory intracranial electroencephalography from five canines with naturally occurring epilepsy and two humans undergoing prolonged wide-bandwidth intracranial electroencephalographic monitoring. Data were provided to participants as 10-min interictal and preictal clips, with approximately half of the 60 GB data bundle labelled (interictal/preictal) for algorithm training and half unlabelled for evaluation. The contestants developed custom algorithms and uploaded their classifications (interictal/preictal) for the unknown testing data; a randomly selected 40% of data segments were scored and the results broadcast on a public leader board. The contest ran from August to November 2014, and 654 participants submitted 17 856 classifications of the unlabelled test data. The top-performing entry scored 0.84 area under the classification curve. Following the contest, additional held-out unlabelled data clips were provided to the top 10 participants, who submitted classifications for the new unseen data. The resulting areas under the classification curves were well above chance forecasting, but did show a mean 6.54 ± 2.45% (min, max: 0.30, 20.2) decline in performance. The kaggle.com model using open-access data and algorithms generated reproducible research that advanced seizure forecasting.
The overall performance from multiple contestants on unseen data was better than a random predictor, demonstrating the feasibility of seizure forecasting in canine and human epilepsy.
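Entries were ranked by area under the classification (ROC) curve on the unlabelled clips. As an illustrative sketch only (not the kaggle.com scoring code), the AUC can be computed directly from its rank interpretation: the probability that a randomly chosen preictal clip receives a higher score than a randomly chosen interictal clip, with ties counting half.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: fraction of (preictal, interictal) pairs in which
    the preictal clip is scored higher; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]   # preictal clips
    neg = [s for y, s in zip(labels, scores) if y == 0]   # interictal clips
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy example: 2 preictal (label 1) and 3 interictal (label 0) clips
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.7, 0.3, 0.2]
print(roc_auc(labels, scores))  # 5 of 6 pairs ranked correctly -> 0.8333...
```

A chance-level forecaster scores 0.5 by this measure, which is the baseline against which the contest and held-out results are compared.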
Keywords: epilepsy; experimental models; intracranial EEG; refractory epilepsy
Year: 2016 PMID: 27034258 PMCID: PMC5022671 DOI: 10.1093/brain/aww045
Source DB: PubMed Journal: Brain ISSN: 0006-8950 Impact factor: 15.255
Figure 1Canine electrode locations and data segments. (A) For the canine subjects, bilateral pairs of 4-contact strips were implanted oriented along the anterior-posterior direction. Electrode wires were tunnelled through the neck and connected to an implanted telemetry device secured beneath the latissimus dorsi muscle. (B) An hour of data with a 5-min offset before each lead seizure was extracted and split into 10-min segments for analysis. (C) The expanded view illustrates a ∼35-s long seizure.
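The preictal extraction described in (B) — one hour of data ending 5 min before each lead seizure, split into 10-min clips — reduces to simple index arithmetic. A minimal sketch, assuming the 400 Hz canine sampling rate; the function and variable names are hypothetical:

```python
FS = 400                    # canine sampling rate (Hz)
CLIP = 10 * 60 * FS         # samples per 10-min clip
OFFSET = 5 * 60 * FS        # 5-min offset before seizure onset

def preictal_clips(onset_sample):
    """Return (start, end) sample indices of the six 10-min preictal
    clips covering the hour that ends 5 min before seizure onset."""
    end = onset_sample - OFFSET
    start = end - 6 * CLIP  # one hour of data
    return [(s, s + CLIP) for s in range(start, end, CLIP)]

clips = preictal_clips(onset_sample=3_000_000)  # six non-overlapping clips
```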
Figure 2Human implanted electrode locations. Implanted electrodes are visible in X-ray CT images coregistered to the space of the patient’s MRI for the two epilepsy patients whose data was used in this competition. (A) Patient 1 had bitemporal 8-contact penetrating depth electrodes implanted along the axes of the left and right hippocampus. (B) Patient 2 had a 3 × 8 subdural electrode grid placed along the axis of the left temporal lobe and frontal lobe strip electrodes. Spheres represent approximate electrode positions due to post-craniotomy brain surface shift in the CT. Electrodes not used in these experiments have been omitted from this illustration.
Data characteristics for the Kaggle.com seizure forecasting contest and held-out data experiment
| Subject | Sampling rate (Hz) | Recorded data (h) | Seizures | Lead seizures | Training clips (% interictal) | Testing clips (% interictal) | Held-out clips (% interictal) |
|---|---|---|---|---|---|---|---|
| Dog 1 | 400 | 1920 | 22 | 8 | 504 (95.2) | 502 (95.2) | 2000 (99.7) |
| Dog 2 | 400 | 8208 | 47 | 40 | 542 (92.3) | 1000 (91.0) | 1000 (100) |
| Dog 3 | 400 | 5112 | 104 | 18 | 1512 (95.2) | 907 (95.4) | 1000 (100) |
| Dog 4 | 400 | 7152 | 29 | 27 | 901 (89.2) | 990 (94.2) | 1000 (95.8) |
| Dog 5 | 400 | 5616 | 19 | 8 | 480 (93.8) | 191 (93.7) | 0 |
| Patient 1 | 5000 | 71.3 | 5 | 4 | 68 (73.5) | 195 (93.9) | 0 |
| Patient 2 | 5000 | 158.5 | 41 | 6 | 60 (70.0) | 150 (90.7) | 0 |
Figure 3Leading scores during the competition. Plots of the leading score on the kaggle.com public (black line) and private (red line) leader boards for the duration of the competition. The top score from the held-out data experiment is represented by the horizontal blue line.
AUC scores for top ten Kaggle.com finalists in the public and private leaderboards
| Place | Team name | Public leader board | Private leader board | Entries |
|---|---|---|---|---|
| 1 | QMSDP | 0.86 | 0.82 | 501 |
| 2 | Birchwood | 0.84 | 0.80 | 160 |
| 3 | ESAI CEU-UCH | 0.82 | 0.79 | 182 |
| 4 | Michael Hills | 0.86 | 0.79 | 427 |
| 5 | KPZZ | 0.82 | 0.79 | 196 |
| 6 | Carlos Fernandez | 0.84 | 0.79 | 299 |
| 7 | Isaac | 0.84 | 0.79 | 253 |
| 8 | Wei Wu | 0.82 | 0.79 | 140 |
| 9 | Golondrina | 0.82 | 0.78 | 171 |
| 10 | Sky Kerzner | 0.84 | 0.78 | 97 |
The public leader board score was computed on a randomly chosen 40% subset of the data, while the private leader board score was computed on the remaining 60%.
AUC scores for the held-out data experiment compared to scores on the public and private leader boards
| Team name | Window (overlap) | Features | Machine learning algorithm | Ensemble method | Public leader board | Private leader board | Held-out data | Per cent change | Sensitivity at 75% specificity |
|---|---|---|---|---|---|---|---|---|---|
| QMSDP | 60 s (0%), 8 s (97%) | Spectral power, spectral entropy, correlation, fractal dimensions, Hjorth parameters, distribution statistics, signal variance | LassoGLM, Bagged SVM, Random Forest | Weighted average | 0.86 | 0.82 | 0.75 | −7.97 | 0.71 |
| Birchwood | 50 s (0%) | Log spectral power, covariance | SVM | Platt scaling | 0.84 | 0.80 | 0.74 | −8.01 | 0.60 |
| ESAI CEU-UCH | 60 s (50%) | Spectral power, correlation, signal derivative; PCA and ICA preprocessing | Neural network and k-nearest-neighbour clustering | Bayesian combination | 0.82 | 0.79 | 0.72 | −9.77 | 0.54 |
| Michael Hills | | Spectral power, correlation, spectral entropy, fractal dimensions, Hurst exponent; genetic algorithm feature selection | SVM | | 0.86 | 0.79 | 0.79 | −0.29 | 0.73 |
| Wei Wu | 60 s (0%) | Spectral power, statistical measures, covariance matrices | SVM and GLMNet | Weighted average of rank scores | 0.82 | 0.79 | 0.77 | −1.86 | 0.69 |
| Golondrina | 60 s (0%) | Spectral power, signal standard deviation | Convolutional neural networks | Test data calibration | 0.82 | 0.78 | 0.76 | −2.77 | 0.73 |
Rows in bold represent algorithm variations submitted after the competition as part of the held-out data experiment. Additional information on algorithms is available in the Supplementary material.
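Nearly every finalist in the table above used spectral power per frequency band as a core feature. As a flavour of that shared step only — not any team's actual code — the following sketch computes log band power per channel from one 60 s window at the 400 Hz canine sampling rate; the band edges are illustrative assumptions, and the full pipelines added covariance, entropy, and ensemble calibration on top.

```python
import numpy as np

FS = 400  # Hz (canine recordings)
# Assumed, conventional EEG band edges; the teams' exact bands varied
BANDS = [(0.5, 4), (4, 8), (8, 12), (12, 30), (30, 70), (70, 180)]

def band_powers(window):
    """Log spectral power per (channel, band) for one 60 s window.

    window: array of shape (n_channels, n_samples) sampled at FS Hz.
    Returns a flat feature vector of length n_channels * len(BANDS).
    """
    freqs = np.fft.rfftfreq(window.shape[1], d=1 / FS)
    psd = np.abs(np.fft.rfft(window, axis=1)) ** 2  # per-channel power spectrum
    feats = []
    for lo, hi in BANDS:
        mask = (freqs >= lo) & (freqs < hi)
        feats.append(np.log(psd[:, mask].sum(axis=1) + 1e-12))
    return np.concatenate(feats)

# 16-channel, 60 s synthetic window of white noise
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 60 * FS))
print(band_powers(x).shape)  # (96,) = 16 channels x 6 bands
```

Feature vectors like this, one per analysis window, are what the SVM, GLMNet, random forest, and neural-network classifiers in the table consume.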