Kevin Bretonnel Cohen1, Benjamin Glass2, Hansel M Greiner3, Katherine Holland-Bouley3, Shannon Standridge3, Ravindra Arya3, Robert Faist2, Diego Morita3, Francesco Mangano4, Brian Connolly2, Tracy Glauser3, John Pestian2.
Abstract
OBJECTIVE: We describe the development and evaluation of a system that uses machine learning and natural language processing techniques to identify potential candidates for surgical intervention for drug-resistant pediatric epilepsy. The data consist of free-text clinical notes extracted from the electronic health record (EHR). Both known clinical outcomes from the EHR and manual chart annotations provide gold standards for the patient's status. The following hypotheses are then tested: 1) machine learning methods can identify epilepsy surgery candidates as well as physicians do and 2) machine learning methods can identify candidates earlier than physicians do. These hypotheses are tested by systematically evaluating the effects of the data source, amount of training data, class balance, classification algorithm, and feature set on classifier performance. The results support both hypotheses, with F-measures ranging from 0.71 to 0.82. The feature set, classification algorithm, amount of training data, class balance, and gold standard all significantly affected classification performance. It was further observed that classification performance was better than the highest agreement between two annotators, even at one year before documented surgery referral. The results demonstrate that such machine learning methods can contribute to predicting pediatric epilepsy surgery candidates and reducing lag time to surgery referral.
Keywords: epilepsy; epilepsy surgery; machine learning; natural language processing; neurosurgery
Year: 2016 PMID: 27257386 PMCID: PMC4876984 DOI: 10.4137/BII.S38308
Source DB: PubMed Journal: Biomed Inform Insights ISSN: 1178-2226
Patient-level classification of surgery candidacy by the baseline system. ± values are standard deviations. The baseline system performs comparably to the aggregate of neurologists.
| GOLD STANDARD | NLP: SURGERY CANDIDATE | NLP: NON-CANDIDATE | TOTAL |
|---|---|---|---|
| Surgery candidate | 71 | 29 | 100 |
| Non-candidate | 21 | 79 | 100 |
| Total | 92 | 108 | 200 |
Notes: Precision: 0.77 ± 0.03. Recall: 0.71 ± 0.03. F-measure: 0.74 ± 0.04.
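The point estimates in the note above can be recomputed from the confusion matrix. The following is a minimal sketch, assuming that "surgery candidate" is the positive class and that the pooled counts in the table are used directly; the function and variable names are illustrative, and the ± standard deviations are not derived here.

```python
# Counts from the confusion matrix (rows: gold standard, columns: NLP prediction).
true_pos = 71   # gold candidate, predicted candidate
false_neg = 29  # gold candidate, predicted non-candidate
false_pos = 21  # gold non-candidate, predicted candidate
true_neg = 79   # gold non-candidate, predicted non-candidate (unused by these metrics)


def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) for the positive class."""
    precision = tp / (tp + fp)   # share of predicted candidates that are correct
    recall = tp / (tp + fn)      # share of gold candidates that were found
    f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f_measure


precision, recall, f_measure = precision_recall_f(true_pos, false_pos, false_neg)
print(f"precision={precision:.2f} recall={recall:.2f} F={f_measure:.2f}")
# prints: precision=0.77 recall=0.71 F=0.74
```

Precision divides by the NLP column total (92), while recall divides by the gold-standard row total (100), which is why the two values differ even though both share the numerator 71.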
Figure 1. Classification of surgery candidacy at 11 clinic visits by the baseline system. Error bars are 95% confidence intervals. The mean performance of the baseline system is comparable to the aggregate of neurologists at the first visit and improves over time.
Figure 2. Classification of surgery candidacy at nine time periods by the baseline system. Error bars are 95% confidence intervals. The mean performance of the baseline system is comparable to the aggregate of neurologists at the first time period and improves over time.
F-measure with various feature sets.
| FEATURE SET | F-MEASURE |
|---|---|
| Unigrams | 0.77 ± 0.03 |
| Unigrams + drugs | 0.81 ± 0.03 |
| Bigrams | 0.80 ± 0.03 |
| Bigrams + drugs | 0.80 ± 0.03 |
| Unigrams + bigrams | 0.80 ± 0.03 |
| Unigrams + bigrams + drugs | 0.82 ± 0.03 |
Note: All other factors such as balanced data, data set size, classification by SVM, and distant supervision for the data source are held constant.
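To make the feature sets in the table concrete, here is a minimal sketch of how unigram, bigram, and drug features might be extracted from a clinical note. This record does not describe the tokenizer or how the drug features were defined, so the regular-expression tokenizer, the `DRUG_LEXICON` set, and the feature-key layout are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical drug lexicon; the source does not list the drug features it used.
DRUG_LEXICON = {"levetiracetam", "lamotrigine", "valproate"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (an assumed scheme)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def extract_features(text, use_unigrams=True, use_bigrams=True, use_drugs=True):
    """Count the selected feature types in one note."""
    tokens = tokenize(text)
    features = Counter()
    if use_unigrams:
        for tok in tokens:
            features[("uni", tok)] += 1
    if use_bigrams:
        # Pair each token with its successor to form adjacent-word bigrams.
        for first, second in zip(tokens, tokens[1:]):
            features[("bi", first, second)] += 1
    if use_drugs:
        for tok in tokens:
            if tok in DRUG_LEXICON:
                features[("drug", tok)] += 1
    return features


note = "Seizures persist despite levetiracetam and lamotrigine."  # invented example
feats = extract_features(note)
print(len(feats))  # 6 unigrams + 5 bigrams + 2 drug features = 13
```

Each row of the table corresponds to one combination of the three flags, for example `extract_features(note, use_bigrams=False)` for "Unigrams + drugs".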
Effect of classifier, holding all other factors constant.
| CLASSIFIER | F-MEASURE |
|---|---|
| Naive Bayes | 0.77 ± 0.03 |
| Support vector machine | 0.82 ± 0.03 |
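For readers unfamiliar with the weaker of the two classifiers, the sketch below shows a multinomial Naive Bayes text classifier with add-one smoothing. The record does not say which Naive Bayes variant or implementation the authors used, so this is an assumed variant, and the four training notes are invented toy data, not study data.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(documents, labels, alpha=1.0):
    """Collect per-class document and token counts; alpha is the smoothing constant."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in zip(documents, labels):
        token_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, token_counts, vocabulary, alpha


def predict(model, tokens):
    """Return the class with the highest log posterior for a tokenized note."""
    class_counts, token_counts, vocabulary, alpha = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(token_counts[label].values()) + alpha * len(vocabulary)
        for tok in tokens:
            if tok in vocabulary:  # tokens never seen in training are skipped
                score += math.log((token_counts[label][tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy notes, already tokenized.
docs = [
    ["seizures", "persist", "despite", "medication"],
    ["seizure", "free", "on", "medication"],
    ["failed", "two", "medications", "seizures", "persist"],
    ["well", "controlled", "seizure", "free"],
]
labels = ["candidate", "non-candidate", "candidate", "non-candidate"]
model = train_naive_bayes(docs, labels)
print(predict(model, ["seizures", "persist"]))  # candidate
print(predict(model, ["seizure", "free"]))      # non-candidate
```

Naive Bayes treats every token as independent given the class, which is one commonly cited reason a margin-based classifier such as a support vector machine can outperform it on text; the table reports the outcome but this record does not state the cause.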
Figure 3. Effect of size of training data. All other factors such as balance, feature set, classifier type, and data source are held constant. Error bars are 95% confidence intervals.
Effect of data balance, holding all other factors constant.
| POSITIVE INSTANCES | NEGATIVE INSTANCES | F-MEASURE |
|---|---|---|
| 100 | 100 | 0.82 ± 0.03 |
| 100 | 200 | 0.80 ± 0.03 |
| 100 | 300 | 0.74 ± 0.04 |
| 100 | 400 | 0.70 ± 0.04 |
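The four class-balance conditions in the table keep 100 positive instances and vary the number of negatives. A minimal sketch of how such training sets could be assembled is shown below; the sampling procedure is an assumption (the record does not say how negatives were selected), and `subsample_by_ratio` and the placeholder instance names are hypothetical.

```python
import random


def subsample_by_ratio(positives, negatives, negatives_per_positive, seed=0):
    """Keep every positive and draw a fixed number of negatives per positive."""
    wanted = len(positives) * negatives_per_positive
    if wanted > len(negatives):
        raise ValueError("not enough negative instances for the requested ratio")
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return list(positives), rng.sample(list(negatives), wanted)


# Placeholder instance identifiers standing in for patient records.
positives = [f"pos_{i}" for i in range(100)]
negatives = [f"neg_{i}" for i in range(400)]

sizes = []
for ratio in (1, 2, 3, 4):
    pos, neg = subsample_by_ratio(positives, negatives, ratio)
    sizes.append((len(pos), len(neg)))
print(sizes)  # [(100, 100), (100, 200), (100, 300), (100, 400)]
```

Sampling without replacement keeps each negative instance at most once per condition, so the only thing that changes between rows is the class ratio.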
Effect of data source, holding all other factors constant.
| DATA SOURCE | F-MEASURE |
|---|---|
| Distant supervision | 0.74 ± 0.08 |
| Manually annotated | 0.70 ± 0.08 |