Liqin Wang, John Laurentiev, Jie Yang, Ying-Chih Lo, Rebecca E Amariglio, Deborah Blacker, Reisa A Sperling, Gad A Marshall, Li Zhou.
Abstract
Importance: Detecting cognitive decline earlier among older adults can facilitate enrollment in clinical trials and early interventions. Clinical notes in longitudinal electronic health records (EHRs) provide opportunities to detect cognitive decline earlier than it is recorded in structured EHR fields as a formal diagnosis.

Objective: To develop and validate a deep learning model to detect evidence of cognitive decline from clinical notes in the EHR.

Design, Setting, and Participants: Notes documented during the 4 years preceding the initial mild cognitive impairment (MCI) diagnosis were extracted from Mass General Brigham's Enterprise Data Warehouse for patients aged 50 years or older with an initial MCI diagnosis during 2019. The study was conducted from March 1, 2020, to June 30, 2021. Note sections were manually labeled for cognitive decline, and 2 reference data sets were created. Data set I contained a random sample of 4950 note sections filtered by a list of keywords related to cognitive functions and was used for model training and testing. Data set II contained 2000 randomly selected sections without keyword filtering, used to assess whether model performance depended on the specific keywords.

Main Outcomes and Measures: A deep learning model and 4 baseline models were developed, and their performance was compared using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC).
Year: 2021 PMID: 34792589 PMCID: PMC8603078 DOI: 10.1001/jamanetworkopen.2021.35174
Source DB: PubMed Journal: JAMA Netw Open ISSN: 2574-3805
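The study compares a deep learning model against 4 baseline classifiers that score note sections for evidence of cognitive decline. As a hedged sketch of what one such baseline might look like, the snippet below trains a TF-IDF plus logistic regression pipeline on toy note sections; the texts, labels, features, and hyperparameters are illustrative assumptions, not the study's actual data or implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy note sections; the study's labeled reference data set I is not reproduced here.
texts = [
    "patient reports memory loss and word-finding difficulty",
    "no cognitive complaints; memory intact",
    "spouse notes patient is increasingly forgetful",
    "blood pressure well controlled, no new complaints",
]
labels = [1, 0, 1, 0]  # 1 = cognitive decline present in the section

# TF-IDF unigrams/bigrams feeding a logistic regression classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

# Score an unseen section; predict_proba yields P(cognitive decline).
prob = clf.predict_proba(["family reports progressive forgetfulness"])[0, 1]
```

The probability output is what the AUROC and AUPRC analyses below operate on: sections are ranked by score rather than hard-classified.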
Figure 1. Flowchart of the Machine Learning Modeling for Cognitive Decline Detection
Creation of the 2 reference data sets for model development and evaluation, and of the word embedding used by the deep learning model. The study cohort comprised patients with an initial mild cognitive impairment (MCI) diagnosis in 2019; notes written during the 4 years preceding the MCI diagnosis were extracted. Two different randomization approaches were used to create data sets I and II. The first randomization method, used for obtaining 4950 sections, accounted for the uneven distribution of sections among patients and section types. The second randomization method, used for obtaining 2000 sections, sampled sections from the entire corpus, excluding those already included in data set I. In addition, a larger corpus was used to train a word embedding model for the deep learning algorithm. NLP indicates natural language processing.
aDue to the uneven distribution of sections (for example, some patients had more notes, and some section types were more prevalent in the corpus), this randomization aimed to maximize coverage of different patients and section types. It was performed among sections mentioning keywords potentially related to cognitive decline.
bRandomization over sections of clinical notes from the entire corpus, excluding the sections already included in data set I.
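The two randomization schemes described above can be sketched as follows. The keyword list, the section fields (`id`, `patient_id`, `section_type`, `text`), and the round-robin balancing are illustrative assumptions, not the study's actual implementation.

```python
import random
from collections import defaultdict

random.seed(0)

# Illustrative keyword list; the study's cognition-related list is not given here.
KEYWORDS = ["memory loss", "forgetful", "cognitive decline", "word-finding"]

def contains_keyword(text):
    t = text.lower()
    return any(k in t for k in KEYWORDS)

def sample_dataset_i(sections, n):
    """First scheme: keyword-filtered sampling, round-robin over
    (patient, section type) groups so that no single patient or
    prevalent section type dominates the sample."""
    groups = defaultdict(list)
    for s in sections:
        if contains_keyword(s["text"]):
            groups[(s["patient_id"], s["section_type"])].append(s)
    for g in groups.values():
        random.shuffle(g)
    sampled = []
    while len(sampled) < n and any(groups.values()):
        for key in list(groups):  # take at most one section per group per pass
            if groups[key]:
                sampled.append(groups[key].pop())
                if len(sampled) == n:
                    break
    return sampled

def sample_dataset_ii(sections, dataset_i, n):
    """Second scheme: uniform sample from the corpus, excluding data set I."""
    used = {s["id"] for s in dataset_i}
    pool = [s for s in sections if s["id"] not in used]
    return random.sample(pool, min(n, len(pool)))

# Toy corpus standing in for the extracted note sections.
corpus = [
    {"id": i, "patient_id": i % 3,
     "section_type": "hpi" if i % 2 else "assessment",
     "text": "memory loss noted" if i < 6 else "routine follow-up"}
    for i in range(10)
]
dataset_i = sample_dataset_i(corpus, n=4)
dataset_ii = sample_dataset_ii(corpus, dataset_i, n=4)
```

By construction, every data set I section contains a keyword, and the two samples are disjoint, mirroring the filtering and exclusion rules in the flowchart.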
Characteristics for Data Sets I and II for the Development and Validation of Models for Identifying Cognitive Decline
| Characteristic | Data set I | Data set II |
|---|---|---|
| Unique patients, No. | 1969 | 1161 |
| Sex, No. (%) | | |
| Female | 1046 (53.1) | 619 (53.3) |
| Male | 923 (46.9) | 542 (46.7) |
| Age, mean (SD), y | 76.0 (13.3) | 76.5 (10.2) |
| Sections, No. | 4950 | 2000 |
| Notes, No. | 4745 | 1996 |
| Character-level length, mean (SD) | 849 (936) | 463 (669) |
| Keywords present, No. (%) | 4950 (100) | 740 (37.0) |
| Cognitive decline present in sections overall, No. (%) | 1453 (29.4) | 69 (3.5) |
| Cognitive decline present in sections with keywords, No. (%) | 1453 (29.4) | 69 (9.3) |
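The percentages in the table follow directly from the reported counts and can be recomputed as a sanity check (tolerance 0.051 because the reported values are rounded to one decimal place):

```python
# Recompute the percentages in the characteristics table from the raw counts.
def pct(numer, denom):
    return 100 * numer / denom

assert abs(pct(1453, 4950) - 29.4) <= 0.051  # data set I: prevalence of cognitive decline
assert abs(pct(740, 2000) - 37.0) <= 0.051   # data set II: sections containing keywords
assert abs(pct(69, 2000) - 3.5) <= 0.051     # data set II: prevalence overall
assert abs(pct(69, 740) - 9.3) <= 0.051      # data set II: prevalence among keyword sections
```

The last two rows show why data set II matters: prevalence is 3.5% overall but 9.3% when restricted to keyword-bearing sections, quantifying how much the keyword filter enriches for positive examples.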
Performance of 5 Machine Learning Models for Detecting Cognitive Decline From Clinical Notes
| Model | AUROC (95% CI) | AUPRC (95% CI) |
|---|---|---|
| Data set I | | |
| Logistic regression | 0.936 (0.929-0.943) | 0.880 (0.867-0.893) |
| Random forest | 0.950 (0.944-0.956) | 0.889 (0.875-0.902) |
| SVM | 0.939 (0.933-0.946) | 0.883 (0.869-0.897) |
| XGBoost | 0.953 (0.946-0.960) | 0.882 (0.864-0.900) |
| Deep learning | 0.971 (0.967-0.976) | 0.933 (0.921-0.944) |
| Data set II | | |
| Logistic regression | 0.969 (0.947-0.987) | 0.762 (0.656-0.849) |
| Random forest | 0.985 (0.972-0.994) | 0.830 (0.746-0.898) |
| SVM | 0.954 (0.924-0.979) | 0.723 (0.618-0.822) |
| XGBoost | 0.988 (0.969-0.998) | 0.898 (0.830-0.957) |
| Deep learning | 0.997 (0.994-0.999) | 0.929 (0.870-0.969) |
Abbreviations: AUPRC, area under the precision-recall curve; AUROC, area under the receiver operating characteristic curve; SVM, support vector machine.
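The table reports AUROC and AUPRC point estimates with 95% CIs. A percentile bootstrap over the evaluation sections is one common way to obtain such intervals; the paper's exact CI method is an assumption here. A self-contained sketch using only NumPy:

```python
import numpy as np

def auroc(y_true, y_score):
    """AUROC via the rank-sum (Mann-Whitney U) identity. Assumes no score
    ties across classes; within-class ties (e.g. bootstrap duplicates)
    leave the positive rank sum, and hence the AUROC, unchanged."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    ranks = np.argsort(np.argsort(y_score)) + 1  # 1-based ranks
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def auprc(y_true, y_score):
    """Average precision: mean precision at each positive, scores descending."""
    y = np.asarray(y_true)[np.argsort(-np.asarray(y_score))]
    hits = np.cumsum(y)
    return float((hits[y == 1] / (np.flatnonzero(y == 1) + 1)).mean())

def bootstrap_ci(y_true, y_score, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling sections with replacement."""
    rng = np.random.default_rng(seed)
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, len(y_true), len(y_true))
        if y_true[idx].min() == y_true[idx].max():
            continue  # a resample must contain both classes
        stats.append(metric(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return metric(y_true, y_score), float(lo), float(hi)

# Synthetic, perfectly separated scores: AUROC and its CI all collapse to 1.0.
y = np.array([0] * 20 + [1] * 20)
scores = np.concatenate([np.linspace(0.0, 0.4, 20), np.linspace(0.6, 1.0, 20)])
point, lo, hi = bootstrap_ci(y, scores, auroc, n_boot=200)
```

AUPRC is the more informative metric for data set II, where positives are rare (3.5%): a random classifier scores ~0.5 AUROC regardless of prevalence, but only ~0.035 AUPRC there.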
Figure 2. Machine Learning Model Performance
Receiver operating characteristic (ROC) curves and precision-recall curves for the deep learning and baseline models (logistic regression, support vector machine [SVM], random forest, and XGBoost) in data sets I and II. A, ROC curve (left panel) and precision-recall curve (right panel) for data set I (4950 sections). B, ROC curve (left panel) and precision-recall curve (right panel) for data set II (2000 sections). In the right panels, gray dashed curves connect all points in precision-recall space that share the same F1 score; in the left panels, the gray dotted diagonal line indicates the performance expected from random guessing.
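The iso-F1 curves drawn in the precision-recall panels follow directly from F1 = 2PR/(P + R): fixing F1 and solving for precision P gives the curve as a function of recall R. A minimal sketch (the specific F1 level and recall values are illustrative):

```python
def iso_f1_precision(f1, recall):
    """Precision on the iso-F1 curve: solve F1 = 2PR / (P + R) for P.
    Undefined (the curve leaves the unit square) when 2*recall <= f1."""
    denom = 2 * recall - f1
    return f1 * recall / denom if denom > 0 else float("nan")

# Points along the F1 = 0.8 iso-curve; recall must exceed f1/2 = 0.4,
# and precision stays <= 1 only once recall >= f1 / (2 - f1) ~ 0.667.
curve = [(r, iso_f1_precision(0.8, r)) for r in (0.7, 0.8, 1.0)]
```

On the diagonal P = R the formula reduces to F1 = P, which is why each iso-F1 curve crosses the diagonal exactly at its F1 value.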