Robert B Penfold1, David S Carrell2, David J Cronkite2, Chester Pabiniak2, Tammy Dodd2, Ashley Mh Glass2, Eric Johnson2, Ella Thompson2, H Michael Arrighi3, Paul E Stang3.
Abstract
BACKGROUND: Patients and their loved ones often report symptoms or complaints of cognitive decline that clinicians note in free clinical text, but no structured screening or diagnostic data are recorded. These symptoms/complaints may be signals that predict who will go on to be diagnosed with mild cognitive impairment (MCI) and ultimately develop Alzheimer's Disease or related dementias. Our objective was to develop a natural language processing system and prediction model for identification of MCI from clinical text in the absence of screening or other structured diagnostic information.
Keywords: Dementia; Early identification; MCI; NLP
Year: 2022 PMID: 35549702 PMCID: PMC9097352 DOI: 10.1186/s12911-022-01864-z
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 3.298
Concepts associated with mild cognitive impairment
| Variable | Description | ACT count | Gen. Pop. count |
|---|---|---|---|
| S_EXCL | References to stroke (used to exclude patients from analysis) | 351,995 | 21,423 |
| WITHX | Patient accompanied by family member | 212,669 | 14,603 |
| RESPONS | Responsibility being assumed by family member | 123,834 | 19,895 |
| NEGATE | Atenolol, hypercalcemia, statins, and “remember to take” boilerplate language | 122,559 | 8703 |
| HALLUC | Hallucination issues | 75,042 | 6515 |
| HEADACHE | Headache/concern for stroke or brain injury | 67,955 | 5550 |
| W_EXCL | Traumatic brain injury, dehydration, etc. (used to exclude patients from analysis) | 49,853 | 3091 |
| DECLINE | Declining memory/cognitive abilities | 49,522 | 13,559 |
| WANDER | Wandering, getting lost, or unable to recognize | 42,677 | 1821 |
| CALLED | Reference to communication going through family member | 36,073 | 3140 |
| FORGET | Forget/can't remember | 28,412 | 3310 |
| DONEPEZIL | Donepezil, Aricept discussed (e.g., regarding what the medications can do) | 18,910 | 821 |
| CONCERN | Family showing concern for patient | 15,576 | 3568 |
| FORGETFL | Forgetful | 11,146 | 1372 |
| EXAM | Cognitive evaluation | 10,828 | 6027 |
| OTHER_SA | Communication goes through family members | 5474 | 562 |
| S_HALLUC | Strong hallucination concern | 4791 | 294 |
| ICD_EXCL | Dementia ICD diagnosis code appearing in text | 3622 | 60 |
| DEMENTIA | Severe dementia noted | 2426 | 85 |
| REFERAL | Referral for cognitive assessment | 1653 | 659 |
| COMPREHE | Poor understanding/comprehension | 1606 | 184 |
| W_DECLIN | Decline in word finding, vocabulary, explaining, etc | 1548 | 148 |
| CONCENTR | Difficulty concentrating | 1363 | 545 |
| EARLY | Early dementia | 1204 | 344 |
| DECLINE_ | Communication/call concerning memory decline | 1127 | 87 |
| FORGETX | Forget [something] e.g., keys | 978 | 115 |
| S_CONCER | Worsening or strong concern for dementia | 861 | 237 |
| PLAN | Related care plan to family member | 829 | 47 |
| HAL_EXCL | Hallucination issues resolved | 742 | 79 |
| BOI_INCL | Boilerplate text describing memory problems not necessarily specific to the patient | 559 | 141 |
| OTH_EXCL | Headache/memory complaint relating to non-patient | 409 | 22 |
| RISK | Risk of dementia | 379 | 139 |
| W_CONCER | Concern for word finding, vocabulary, explaining, etc | 377 | 26 |
| ICD_INCL | MCI ICD diagnosis code appearing in text but not in structured data | 255 | 82 |
| DENIAL | Patient denies problem with memory or functioning | 204 | 6 |
| EXM_EXCL | Normal cognitive exam | 137 | 203 |
| STIMULANT | Stimulant medications (modafinil, Provigil, etc.) | 113 | 37 |
| SENILE | Not thinking well/not lucid | 73 | 0 |
| BURDEN | Burden on family member | 58 | 15 |
| BOOK | Names of relevant books, including 36-h Day, Dignified Life, and Ageless Outings | 41 | 1 |
| EXCLUDE | Words referencing forgetfulness excluded because of ambiguity concerns | 3 | 7 |
| WELLNESS | Wellness check | 0 | 0 |
Corpora descriptive statistics for characters, words, and tokens
| Corpus | Chars (mean) | Chars (max) | Chars (min) | Words (mean) | Words (max) | Words (min) | Tokens (mean) | Tokens (max) | Tokens (min) |
|---|---|---|---|---|---|---|---|---|---|
| ACT (training) | 1229.6 | 52,491.0 | 0 | 216.9 | 9350.0 | 0 | 260.8 | 10,946.0 | 0 |
| Gen. Pop. (training) | 1324.9 | 76,831.0 | 0 | 233.0 | 15,029.0 | 0 | 276.9 | 17,422.0 | 0 |
| Gen. Pop. (validation) | 1118.7 | 58,080.0 | 0 | 196.9 | 9588.0 | 0 | 234.9 | 11,251.0 | 0 |
Cohort demographics
| Characteristic | ACT cohort, N | ACT cohort, % | General population, N | General population, % |
|---|---|---|---|---|
| Total people | 1473 | 100 | 2391 | 100 |
| Age at index | ||||
| 65–69 | 14 | 0.95 | 456 | 19.07 |
| 70–74 | 107 | 7.26 | 515 | 21.54 |
| 75–79 | 260 | 17.65 | 461 | 19.28 |
| 80–84 | 395 | 26.82 | 450 | 18.82 |
| 85+ | 697 | 47.32 | 509 | 21.29 |
| Sex | ||||
| Female | 954 | 64.77 | 1419 | 59.35 |
| Male | 519 | 35.23 | 972 | 40.65 |
| Race | ||||
| American Indian/Alaska native | 8 | 0.54 | 26 | 1.09 |
| Asian | 40 | 2.72 | 104 | 4.35 |
| Black or African American | 61 | 4.14 | 51 | 2.13 |
| Native Hawaiian or Other Pacific Islander | 3 | 0.2 | 1 | 0.04 |
| Other | 10 | 0.68 | 15 | 0.63 |
| Unknown or not reported | 27 | 1.83 | 46 | 1.92 |
| White | 1324 | 89.88 | 2148 | 89.84 |
| Ethnicity | ||||
| Hispanic or Latino | 37 | 2.51 | 94 | 3.93 |
| Not Hispanic or Latino | 1417 | 96.2 | 2249 | 94.06 |
| Unknown/not reported ethnicity | 19 | 1.29 | 48 | 2.01 |
| Neighborhood income | ||||
| < $25,000 | 6 | 0.41 | 21 | 0.88 |
| ≥ $25,000 | 1409 | 95.66 | 2351 | 98.33 |
| Missing | 58 | 3.94 | 19 | 0.79 |
| Neighborhood education | ||||
| < 25% college | 209 | 14.19 | 795 | 33.25 |
| ≥ 25% college | 1206 | 81.87 | 1577 | 65.96 |
| Missing | 58 | 3.94 | 19 | 0.79 |
MCI prevalence
| Group | Statistic | ACT cohort, MCI (−) | ACT cohort, MCI (+) | General population, MCI (−) | General population, MCI (+) |
|---|---|---|---|---|---|
| Age | | | | | |
| 65–69 | n | 9 | 5 | 371 | 85 |
| | % | 64.3 | 35.7 | 81.4 | 18.6 |
| 70–74 | n | 57 | 51 | 407 | 108 |
| | % | 52.8 | 47.2 | 79.0 | 21.0 |
| 75–79 | n | 136 | 123 | 318 | 143 |
| | % | 52.5 | 47.5 | 69.0 | 31.0 |
| 80–84 | n | 194 | 200 | 270 | 180 |
| | % | 49.2 | 50.8 | 60.0 | 40.0 |
| 85+ | n | 341 | 357 | 283 | 226 |
| | % | 48.9 | 51.1 | 55.6 | 44.4 |
| Sex | | | | | |
| Male | n | 260 | 259 | 693 | 279 |
| | % | 50.1 | 49.9 | 71.3 | 28.7 |
| Female | n | 477 | 477 | 956 | 463 |
| | % | 50.0 | 50.0 | 67.4 | 32.6 |
| Total | n | 737 | 736 | 1649 | 742 |
| | % | 50.0 | 50.0 | 69.0 | 31.0 |
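The percentage rows in the prevalence table follow directly from the counts in each group. As a minimal sketch (the function name `prevalence_pct` is illustrative, not from the paper), the general population figures can be recomputed like this:

```python
def prevalence_pct(mci_neg: int, mci_pos: int) -> float:
    """Percent of a group that is MCI (+), rounded to one decimal place."""
    total = mci_neg + mci_pos
    if total == 0:
        raise ValueError("group has no members")
    return round(100 * mci_pos / total, 1)


# (MCI (-), MCI (+)) counts copied from the general population columns.
gen_pop_counts = {
    "65–69": (371, 85),
    "70–74": (407, 108),
    "Total": (1649, 742),
}

for group, (neg, pos) in gen_pop_counts.items():
    print(group, prevalence_pct(neg, pos))
```

The same function applies to the ACT cohort columns, where the sample is split almost evenly between MCI (−) and MCI (+).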
Variables retained in the prediction model
| Variable (intercept) | Description | Coefficient |
|---|---|---|
| ICD_EXCL | Dementia ICD9 codes in text but not structured data | 0.634 |
| DEMENTIA | Severe dementia | 0.596 |
| DONEPEZIL | Discussion of Aricept, donepezil (but not prescribed or used) | 0.568 |
| OTHER_SA | Communication goes through family members | 0.17 |
| RACE | Black race | 0.134 |
| DECLINE | Declining memory/cognitive abilities | 0.082 |
| BEHAVIOR SUM | Sum of presence of behavioral concepts | 0.076 |
| AGE | Age at index (per year); e.g., at age 70 the age term contributes 70 × 0.023 = 1.61 | 0.023 |
| CALLED | Reference to family member calling about patient’s memory | 0.012 |
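To illustrate how coefficients like these combine into a predicted probability, here is a minimal sketch that assumes a logistic regression form. The table does not state the intercept or how each NLP variable is coded, so the `intercept` argument and the 0/1 coding of the concept variables below are assumptions, and the example patient is hypothetical.

```python
import math

# Coefficients copied from the table above. The intercept is NOT given there.
COEFFICIENTS = {
    "ICD_EXCL": 0.634,
    "DEMENTIA": 0.596,
    "DONEPEZIL": 0.568,
    "OTHER_SA": 0.17,
    "RACE": 0.134,          # 1 if Black race, else 0
    "DECLINE": 0.082,
    "BEHAVIOR SUM": 0.076,  # count of behavioral concepts present
    "AGE": 0.023,           # per year of age at index
    "CALLED": 0.012,
}


def linear_predictor(features: dict, intercept: float) -> float:
    """Intercept plus the coefficient-weighted sum of the supplied features."""
    return intercept + sum(COEFFICIENTS[name] * value for name, value in features.items())


def predicted_probability(features: dict, intercept: float) -> float:
    """Logistic transform of the linear predictor (assumed model form)."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(features, intercept)))


# Hypothetical patient; the intercept is a placeholder, not a published value.
patient = {"AGE": 70, "DECLINE": 1, "CALLED": 1, "BEHAVIOR SUM": 2}
print(predicted_probability(patient, intercept=-2.0))
```

With an intercept of zero and only `AGE` set to 70, `linear_predictor` returns the 1.61 age contribution quoted in the table.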
Fig. 1 ROC curve for training and validation cohorts. Green dotted line: ACT + general population training. Light green dotted line: ACT training. Orange dotted line: general population 60% training sample. Blue dotted line: general population 40% validation sample. Gray dotted line: demographic variables only. ACT + general population 60% training: AUC = 0.716 (0.695, 0.736). ACT alone: AUC = 0.700 (0.673, 0.726). General population, 60% training: AUC = 0.698 (0.663, 0.731). General population, 40% validation: AUC = 0.670 (0.638, 0.702). Demographics only (no NLP variables): AUC = 0.598 (0.576, 0.621)
Prediction model performance characteristics in each population at various cutoffs for probability of correct classification
| Cohort | Cutoff (a) | Sensitivity | Specificity | PPV | NPV | F1 Score |
|---|---|---|---|---|---|---|
| ACT + Gen. Pop. training | 0.3 | 0.95 | 0.17 | 0.46 | 0.83 | 0.62 |
| | 0.4 | 0.63 | 0.69 | 0.60 | 0.71 | 0.61 |
| | 0.5 | 0.37 | 0.90 | 0.73 | 0.66 | 0.49 |
| | 0.6 | 0.24 | 0.96 | 0.81 | 0.63 | 0.37 |
| ACT training | 0.3 | 0.99 | 0.04 | 0.51 | 0.76 | 0.67 |
| | 0.4 | 0.75 | 0.52 | 0.61 | 0.68 | 0.67 |
| | 0.5 | 0.49 | 0.82 | 0.73 | 0.62 | 0.59 |
| | 0.6 | 0.34 | 0.92 | 0.80 | 0.58 | 0.48 |
| Gen. Pop. training | 0.3 | 0.88 | 0.31 | 0.38 | 0.84 | 0.53 |
| | 0.4 | 0.37 | 0.86 | 0.57 | 0.74 | 0.45 |
| | 0.5 | 0.11 | 0.98 | 0.75 | 0.70 | 0.19 |
| | 0.6 | 0.02 | 1.00 | 1.00 | 0.68 | 0.04 |
| Gen. Pop. validation | 0.3 | 0.87 | 0.32 | 0.35 | 0.85 | 0.50 |
| | 0.4 | 0.32 | 0.88 | 0.53 | 0.75 | 0.40 |
| | 0.5 | 0.09 | 0.97 | 0.56 | 0.71 | 0.16 |
| | 0.6 | 0.02 | 1.00 | 0.70 | 0.70 | 0.04 |

(a) Cutoffs are the various probabilities that the researcher or health system would choose as a threshold to classify someone as “positive” for MCI
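The columns of the performance table are standard confusion-matrix summaries at a given cutoff. The sketch below uses the usual textbook definitions. The function names and the example counts are illustrative assumptions, since the paper's underlying confusion matrices are not reproduced here.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity, PPV, NPV, and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


def f1_from(ppv: float, sensitivity: float) -> float:
    """F1 is the harmonic mean of PPV (precision) and sensitivity (recall)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Hypothetical counts, for illustration only.
print(classification_metrics(tp=8, fp=2, tn=6, fn=4))

# Consistency check against a table row: ACT + Gen. Pop. training, cutoff 0.3.
print(round(f1_from(ppv=0.46, sensitivity=0.95), 2))
```

Raising the cutoff trades sensitivity for specificity, which is why the F1 score in the table falls as the cutoff moves from 0.3 to 0.6 in every cohort.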