Zhongxing Zhang, Geert Mayer, Yves Dauvilliers, Giuseppe Plazzi, Fabio Pizza, Rolf Fronczek, Joan Santamaria, Markku Partinen, Sebastiaan Overeem, Rosa Peraita-Adrados, Antonio Martins da Silva, Karel Sonka, Rafael Del Rio-Villegas, Raphael Heinzer, Aleksandra Wierzbicka, Peter Young, Birgit Högl, Claudio L Bassetti, Mauro Manconi, Eva Feketeova, Johannes Mathis, Teresa Paiva, Francesca Canellas, Michel Lecendreux, Christian R Baumann, Lucie Barateau, Carole Pesenti, Elena Antelmi, Carles Gaig, Alex Iranzo, Laura Lillo-Triguero, Pablo Medrano-Martínez, José Haba-Rubio, Corina Gorban, Gianina Luca, Gert Jan Lammers, Ramin Khatami.
Abstract
Narcolepsy is a rare life-long disease that exists in two forms, narcolepsy type-1 (NT1) and type-2 (NT2), but only NT1 is accepted as a clearly defined entity. Both types of narcolepsy belong to the group of central hypersomnias (CH), a spectrum of poorly defined diseases with excessive daytime sleepiness as a core feature. Due to the considerable overlap of symptoms and the rarity of the diseases, it is difficult to identify distinct phenotypes of CH. Machine learning (ML) can help to identify phenotypes, as it learns to recognize clinical features invisible to humans. Here we apply ML to data from the large European Narcolepsy Network (EU-NN) database, which contains hundreds of mixed features of narcolepsy, making it difficult to analyze with classical statistics. Stochastic gradient boosting, a supervised learning model with built-in feature selection, achieves high performance on the testing set. While cataplexy features are recognized as the most influential predictors, the machine finds additional features; for example, the mean rapid-eye-movement sleep latency of the multiple sleep latency test contributes to classifying NT1 and NT2, as confirmed by classical statistical analysis. Our results suggest that ML can identify features of CH on a machine scale from complex databases, thus providing 'ideas' and promising candidates for future diagnostic classifications.
Year: 2018 PMID: 30006563 PMCID: PMC6045630 DOI: 10.1038/s41598-018-28840-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Cross-validation performance. The colored lines indicate different interaction depths (i.e., maximal tree depths). Each data point represents one classifier. For example, the magenta data point at (1500, 0.985) indicates a model built with 1500 trees and a tree depth of 3, which gives an AUC of 0.985 in 10-fold cross-validation repeated 10 times.
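The evaluation scheme described in the Figure 1 legend (10-fold cross-validation repeated 10 times, scored by AUC) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `auc` and `repeated_kfold` are hypothetical, the folds are not stratified by class, and the AUC is computed by direct pairwise comparison, which is only practical for small samples.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve: the fraction of (positive, negative) pairs
    in which the positive case gets the higher score (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` reshuffled k-fold splits.
    Assumption: plain (unstratified) folds, for illustration only."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        for fold in range(k):
            test = idx[fold::k]  # every k-th shuffled index forms one fold
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            yield train, test


# Toy check with made-up labels and scores (not data from the study).
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In a full grid search, each combination of tree count and tree depth would be fitted on every training split and scored on the matching test split, and the mean AUC over all 100 splits would give one data point in the figure.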
Example predicted probabilities from the NT1/NT2 classifier on the testing set.
| Patient number | Probability of NT1 | Probability of NT2 |
|---|---|---|
| 1 | 20.2% | 79.8% |
| 2 | 17.5% | 82.5% |
| 3 | 15.2% | 84.8% |
Figure 2. Relative influence of the predictors in the NT1/NT2 classifier. The variable names on the vertical axis are those with a relative influence larger than 0.1. “Pat.” is short for pattern and “Cat.” for cataplexy. “Sleep.latency.REM.mean.sum” is the mean REM sleep latency of the multiple sleep latency test. “HH.certainty” is the clinical certainty of hypnagogic hallucinations.
Figure 3. Comparison of mean REM sleep latency between NT1 and NT2 patients with the same number of SOREMP. The number of patients in each subgroup is given on the x-axis. P-values are from Wilcoxon rank sum tests. Error bars show the standard error of the mean. Given the small sample size of NT2 patients, the mean REM sleep latency in NT2 patients with 4 and 5 SOREMP may not follow a normal distribution. We therefore report the results of the Wilcoxon rank sum test, which compares the medians rather than the means of the groups.
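For readers unfamiliar with the rank-based test named in the Figure 3 legend, the following is a minimal sketch of a two-sided Wilcoxon rank sum test. It rests on assumptions that the legend does not specify: it uses the normal approximation with a tie correction and no continuity correction, and the function name `rank_sum_test` is hypothetical. For subgroups as small as those in the figure, an exact p-value from a statistics package would normally be preferable.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, tie-corrected).
    Returns (U statistic for x, z score, p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = w - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value identical: no evidence of a difference
        return u, 0.0, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u, z, p


# Made-up latencies in minutes, for illustration only (not study data).
group_a = [1.5, 2.0, 2.5, 3.0, 3.5]
group_b = [4.0, 5.5, 6.0, 7.5]
print(rank_sum_test(group_a, group_b))
```

Because only the ranks of the pooled values enter the statistic, the result does not depend on the latencies being normally distributed, which is the motivation given in the legend.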
Figure 4. Relative influence of the features contributing to the classification of NT1/NT2, as selected by the SGB model built without cataplexy features. “Sleep.latency.NREM1.mean.sum” is the mean non-REM stage 1 sleep latency of the MSLT. ESS is the Epworth sleepiness scale. “Waking.up.upon.daytime.sleep” means waking up from daytime sleep. PLMI is the periodic limb movement index. “EDS.refreshing.sleep.episodes” means that excessive daytime sleepiness (EDS) is relieved after sleep episodes.
Figure 5. Relative influence of the features contributing to the classification of NT1/NT2, as selected by the SGB model built without cataplexy features and hypocretin level. Please refer to the legends of Figs. 3 and 4 for the names of the predictors. “age.SD” here means the age of sleep diagnosis.
Figure 6. A simple example of visualizing gradient boosting.
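The idea behind gradient boosting can also be shown as a toy sketch in code. The example below is an illustration under loud assumptions, not the classifier used in the study: it fits a regression with squared-error loss on one-dimensional made-up data, uses single-split trees (stumps), and all names (`fit_stump`, `boost`, `predict`) and default values are hypothetical. Each round fits a stump to the current residuals on a random subsample (the "stochastic" part) and adds a shrunken copy of it to the ensemble.

```python
import random


def fit_stump(xs, residuals):
    """Least-squares regression stump on 1-D inputs.
    Returns (threshold, left_value, right_value) for the rule x <= threshold."""
    best = None
    for t in sorted(set(xs))[:-1]:  # the largest x cannot be a split point
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all inputs identical: fall back to a constant
        m = sum(residuals) / len(residuals)
        return xs[0], m, m
    return best[1:]


def boost(xs, ys, n_trees=50, shrinkage=0.1, subsample=0.5, seed=0):
    """Stochastic gradient boosting for squared-error loss (toy version)."""
    rng = random.Random(seed)
    base = sum(ys) / len(ys)  # start from the mean response
    preds = [base] * len(xs)
    m = min(len(xs), max(2, int(subsample * len(xs))))
    stumps = []
    for _ in range(n_trees):
        idx = rng.sample(range(len(xs)), m)  # random subsample for this round
        stump = fit_stump([xs[i] for i in idx], [ys[i] - preds[i] for i in idx])
        stumps.append(stump)
        t, lm, rm = stump
        preds = [p + shrinkage * (lm if x <= t else rm) for x, p in zip(xs, preds)]
    return base, shrinkage, stumps


def predict(model, x):
    """Sum the base value and the shrunken contribution of every stump."""
    base, shrinkage, stumps = model
    return base + shrinkage * sum(lm if x <= t else rm for t, lm, rm in stumps)


# Made-up step-shaped data, for illustration only.
toy_x = [i / 10 for i in range(20)]
toy_y = [0.0 if x < 1.0 else 1.0 for x in toy_x]
toy_model = boost(toy_x, toy_y, n_trees=100)
print(predict(toy_model, 0.2), predict(toy_model, 1.7))
```

A classifier such as the one for NT1/NT2 would replace the squared-error residuals with the gradient of a classification loss and use deeper trees, but the additive, stage-wise structure is the same.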