
Early temporal characteristics of elderly patient cognitive impairment in electronic health records.

Somaieh Goudarzvand¹, Jennifer St Sauver², Michelle M Mielke², Paul Y Takahashi³, Yugyung Lee¹, Sunghwan Sohn⁴.

Abstract

BACKGROUND: The aging population has led to an increase in cognitive impairment (CI), resulting in significant costs to patients, their families, and society. Research on large cohorts is urgently needed to better understand the frequency and severity of CI and to respond to the health needs of this population. However, little is known about temporal trends in patient health functions (i.e., activities of daily living [ADL]) and how these trends are associated with the onset of CI in elderly patients. In addition, clinical free text in electronic health records (EHRs), a rich data source, has not been well explored for CI research. The aim of this study is to characterize and better understand early signals of CI in elderly patients by examining temporal trends in patient ADL and by analyzing topics of patient medical conditions in clinical free text using topic models.
METHODS: The study cohort consists of physician-diagnosed CI patients (n = 1,435) and cognitively unimpaired (CU) patients (n = 1,435) matched by age and sex, selected from patients aged 65 years or older at the time of enrollment in the Mayo Clinic Biobank. A corpus analysis was performed to examine basic statistics of the event types and practice settings in which physicians first diagnosed CI. We analyzed the distribution of ADL in three age groups over time before the development of CI. Furthermore, we applied three topic modeling approaches to clinical free text to examine how patients' medical conditions changed over time as they approached CI diagnosis.
RESULTS: The trajectories of ADL deterioration became steeper in CI patients than in CU patients approximately 1 to 1.5 year(s) before the actual physician diagnosis of CI. Topic modeling showed that the topic terms were mostly correlated and captured underlying semantics relevant to CI as patients approached CI diagnosis.
CONCLUSIONS: There are notable differences in temporal trends of basic and instrumental ADL between CI and CU patients. The trajectories of certain individual ADL, such as bathing and responsibility for own medications, were closely associated with CI development. Topic terms obtained by topic modeling of clinical free text have the potential to show how CI patients' conditions evolve and to reveal overlooked conditions as patients approach CI diagnosis.


Keywords:  Activity of daily living; Cognitive impairment; Deep learning; Early diagnosis; Topic modeling


Year: 2019 | PMID: 31391041 | PMCID: PMC6686236 | DOI: 10.1186/s12911-019-0858-0

Source DB: PubMed | Journal: BMC Med Inform Decis Mak | ISSN: 1472-6947 | Impact factor: 2.796


Background

Medical advances have produced a population whose lifespan has increased by 30 years since the beginning of the twentieth century [1]. In 2012, there were 40.7 million people aged 65 and over in the United States (13.2% of the total population), 38.7% of whom reported one or more disabilities [2]. The aging population has also led to an increase in persons living with CI, now more than 17 million people in the United States [3], at an estimated annual cost of $18 billion to patients, their families, and society in lost income and direct costs of care [4]. Herein, we define CI as either mild cognitive impairment (MCI) or dementia. In 2015, Alzheimer’s Disease International estimated that dementias affected 46.8 million individuals worldwide and projected that this number would nearly triple by 2050, reaching 131.5 million [5].

MCI is therefore of particular importance, as it is a transitional stage between normal aging and dementia. One study indicated that clinicians were not aware of CI in more than 40% of their patients [6]. Failure to diagnose cognitive complaints delays appropriate care plans for underlying diseases and comorbid conditions and may cause safety issues for patients and others [7, 8]. In many cases, CI worsens over time [8-10]. Early diagnosis of CI is therefore critical and may reduce the later burden on medical and social care.

The impact of CI on ADL has been used as a criterion to differentiate MCI from dementia [11]. ADL are often divided into basic ADL (b-ADL), which include activities such as personal hygiene, dressing, feeding, and toileting [12], and instrumental ADL (i-ADL), commonly referred to as independent living abilities, such as household activities, handling money, shopping, and transportation [13-15]. i-ADL place a higher demand on cognitive function than b-ADL and are important for living independently in society [16].
ADL are highly dependent on cognitive function and behavior [17]. Therefore, assessments are needed that can detect changes in ADL as soon as changes in cognition and behavior occur [17]. In this study, we first examined basic statistics of the EHR corpus relevant to CI diagnosis. We then compared temporal trends of ADL mined from EHRs between CI and cognitively unimpaired (CU) elderly patients (aged 65 or older) in the period before CI developed. We used both structured data (current visit information provided by patients) and unstructured data (clinical notes). Furthermore, we applied machine learning techniques (i.e., three topic modeling methods) to clinical notes to extract meaningful semantics (i.e., topics and terms) residing in clinical free text and to examine their potential association with future CI development.

Various studies have used machine learning algorithms to differentiate between cognitively normal and MCI individuals [18, 19], to predict conversion from MCI to Alzheimer’s disease (AD) [20], and to predict the time to this conversion [19]. Researchers in [21] developed a two-layer model in which the first layer is a screening test that categorizes individuals as normal or abnormal and the second layer is a closer examination that classifies MCI or dementia. They compared the results of various machine learning approaches, among which support vector machines, multi-layer perceptrons, and logistic regression performed well. Conversion from MCI to AD has also been studied using a deep learning model with MRI, neuropsychological, and demographic data [22]. Another study [23] tried to predict MCI from spontaneous spoken utterances. Classifying cognitive profiles using machine learning with fMRI data in addition to cognitive data has also been explored [24]; in that work, fMRI data were used only to train the classifier, and new data were classified solely on the basis of cognitive data.
Another study [25] focused on the early diagnosis of AD with deep learning, using a sparse autoencoder. The authors used neuroimages from a neuroimaging initiative database to identify brain regions that are sensitive to AD progression. These previous studies sought to improve their results by incorporating fMRI data into their models. Although this may improve performance, not all patients have fMRI data, so such approaches may not be as broadly applicable as approaches that use routine EHR data from the health care population. Moreover, these studies did not try to identify new risk factors associated with CI in EHRs but relied only on known medical conditions and fMRI data to predict CI.

Other studies have focused on predicting progression from MCI to dementia using neuropsychological data. Researchers in [26] examined whether neuropsychological test results could predict dementia using a machine learning algorithm; they used a feature selection ensemble to choose which neuropsychological test features served as predictors of developing AD dementia. Neuropsychological tests for predicting the time to conversion from MCI were also investigated in [27]. In that study, MCI patients were grouped according to whether they progressed to dementia (converter MCI) or remained MCI (stable MCI) during a specified time window, and a prognostic model was developed to predict conversion as early as 5 years before progression to dementia. Unlike these studies, we applied a machine learning approach (i.e., topic modeling) to examine topics and terms in EHR free text that could potentially be used for early detection of CI. A few studies have focused on the early diagnosis of CI [28]; however, they followed the conventional approach of assessing patients by i-ADL and b-ADL rather than using machine learning algorithms and EHR free text.

Methods

We examined basic EHR corpus statistics (i.e., distributions of event types and practice settings of the first CI diagnosis, and numbers of clinical notes for CI and CU patients). We then compared temporal trends of patient ADL between the physician-diagnosed CI and CU patient groups and analyzed topics in their clinical free text over time using three machine learning models.

Data

The study cohort was selected from patients 65 years of age or older at the time of enrollment in the Mayo Clinic Biobank (n = 22,772). From these, we identified physician-diagnosed CI patients (n = 1,435; 55% male) and CU patients (n = 1,435) matched by age (±1 year) and sex. Physician-diagnosed CI patients were identified by CI-related diagnoses (i.e., dementia, cognitive impairment, cognitive deficit, cognitive decline, mild cognitive impairment) in the diagnosis section of clinical notes [29].
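The text states only that controls were matched by age (±1 year) and sex. As a minimal sketch, assuming greedy 1:1 matching without replacement (the `Patient` record and the `match_controls` helper are hypothetical, not the authors' code), the matching could look like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    pid: str
    age: int  # age at Biobank enrollment (assumed field)
    sex: str  # "M" or "F"


def match_controls(cases, pool, max_age_diff=1):
    """Greedy 1:1 matching without replacement (hypothetical helper).

    Each CI case is paired with the unused CU candidate of the same sex whose
    age is closest, provided the difference is at most max_age_diff years.
    Returns {case_pid: control_pid}; cases with no eligible control are omitted.
    """
    used = set()
    matches = {}
    for case in cases:
        eligible = [
            c for c in pool
            if c.pid not in used
            and c.sex == case.sex
            and abs(c.age - case.age) <= max_age_diff
        ]
        if eligible:
            best = min(eligible, key=lambda c: abs(c.age - case.age))
            used.add(best.pid)
            matches[case.pid] = best.pid
    return matches


cases = [Patient("c1", 70, "M"), Patient("c2", 81, "F")]
pool = [Patient("u1", 82, "F"), Patient("u2", 71, "M"), Patient("u3", 70, "F")]
print(match_controls(cases, pool))  # {'c1': 'u2', 'c2': 'u1'}
```

Greedy matching is order-dependent; an optimal assignment method would be an alternative if many cases compete for the same controls.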

Corpus analysis

We examined basic EHR corpus statistics relevant to CI diagnosis (i.e., the distributions of event types and practice settings of the first CI diagnosis) and compared the number of clinical notes over time between CI and CU patients.
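The distribution of first diagnoses reduces to finding each patient's earliest note carrying a CI diagnosis and counting its event type and practice setting. A minimal sketch, assuming a hypothetical flat record layout `(patient_id, note_date, event_type, practice_setting, has_ci_diagnosis)` with ISO-formatted dates; the toy records and helper names are illustrative only:

```python
from collections import Counter

# Hypothetical note records:
# (patient_id, note_date, event_type, practice_setting, has_ci_diagnosis)
notes = [
    ("p1", "2012-03-01", "CON", "Neurology", True),
    ("p1", "2013-05-10", "SV", "Neurology", True),
    ("p2", "2011-07-22", "LE", "Primary care", False),
    ("p2", "2012-01-15", "SV", "Primary care", True),
    ("p3", "2014-09-30", "CON", "GIM", True),
]


def first_ci_diagnosis(notes):
    """Map each patient to (date, event_type, setting) of the earliest CI-diagnosis note."""
    first = {}
    for pid, date, event, setting, has_ci in notes:
        # ISO dates compare correctly as strings
        if has_ci and (pid not in first or date < first[pid][0]):
            first[pid] = (date, event, setting)
    return first


def percent_distribution(values):
    """Percentage of each category, most common first, rounded to 0.1."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: round(100 * v / total, 1) for k, v in counts.most_common()}


first = first_ci_diagnosis(notes)
print(percent_distribution(event for _, event, _ in first.values()))  # {'CON': 66.7, 'SV': 33.3}
print(percent_distribution(setting for _, _, setting in first.values()))
```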

Analysis of activity of daily living

ADL data were collected from two sources: 1) current visit information, which patients provide and update every 6 months when they visit Mayo Clinic, and 2) certain sections of clinical notes (i.e., instructions for continuing care, ongoing care orders, system review). The current visit information includes a structured questionnaire assessing whether patients have difficulty accomplishing each ADL (yes or no). ADL-related concepts were extracted from clinical notes with the MedTaggerIE module of MedTagger [30, 31], an open-source pipeline developed by Mayo Clinic for pattern-based information extraction with assertion detection (i.e., negated, possible, hypothetical, associated with the patient) and normalization. The extracted concepts were automatically mapped to predefined ADL categories by MedTaggerIE's rule-based normalization, and only non-negated ADL concepts were included. The ADL concepts were then mapped to items in Katz’s index (b-ADL) [12] and the Lawton scale (i-ADL) [13-15], the most commonly used tools for assessing ADL. The ADL items used in this study are: 1) b-ADL: bathing, dressing, transferring, toileting, and feeding; and 2) i-ADL: using transportation, shopping, preparing food, housekeeping, responsibility for own medications, and handling finances. These items can be mapped to the International Classification of Functioning, Disability and Health (ICF) [32], allowing broad information exchange. The temporal trends of b-ADL and i-ADL were compared between CI and CU patients at 6-month intervals over the 5 years before the first physician-diagnosed CI (CI patients) or the latest visit (CU patients).
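MedTaggerIE is a Java pipeline with its own rule resources, which are not reproduced here. To illustrate the general idea of pattern-based extraction with negation filtering and mapping to Katz/Lawton items, here is a minimal Python sketch; the patterns, cue lists, and the `extract_adl` function are hypothetical and much cruder than MedTagger's assertion detection:

```python
import re

# Hypothetical surface patterns for a few Katz (b-ADL) and Lawton (i-ADL)
# items; the actual MedTaggerIE rules are not reproduced here.
ADL_PATTERNS = {
    ("b-ADL", "bathing"): re.compile(r"\b(bath(e|ing)?|shower(ing)?)\b", re.I),
    ("b-ADL", "transferring"): re.compile(r"\b(transfer(s|ring)?|getting out of (bed|a chair))\b", re.I),
    ("i-ADL", "housekeeping"): re.compile(r"\b(housekeeping|housework)\b", re.I),
    ("i-ADL", "responsibility for own medications"): re.compile(
        r"\b(manag(e|ing) (his|her|their) medications?|medication management)\b", re.I
    ),
}
# A difficulty cue marks the ADL as deteriorated (assumed cue list).
DIFFICULTY = re.compile(r"\b(difficulty|trouble|assistance|help|dependent)\b", re.I)
# Crude NegEx-style check: a negation word within three tokens before the cue.
NEGATION = re.compile(r"\b(no|not|denies|without)\b(\W+\w+){0,3}\W*$", re.I)


def extract_adl(text):
    """Return the set of non-negated (category, item) ADL difficulties in text."""
    found = set()
    for sentence in re.split(r"(?<=[.;])\s+", text):
        cue = DIFFICULTY.search(sentence)
        if cue is None or NEGATION.search(sentence[: cue.start()]):
            continue  # no difficulty expressed, or the difficulty is negated
        for key, pattern in ADL_PATTERNS.items():
            if pattern.search(sentence):
                found.add(key)
    return found


print(extract_adl("No trouble managing her medications; she needs help with housekeeping."))
```

Splitting on sentence and clause boundaries keeps a negation in one clause from suppressing a difficulty reported in the next.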

Analysis of topics in clinical notes

We investigated topics in clinical notes in two ways: 1) how topic terms evolved in CI patients each year over the past 5 years (experiment 1), and 2) how topic terms differed between CI and CU patients over the 5-year period before the development of CI (experiment 2). This step-wise time frame allows us to observe how topics change over time, motivated by the expert recommendation that people older than 65 years should visit doctors every 6 months to determine whether symptoms are staying the same, improving, or growing worse [17]. We examined topics in 1) entire clinical notes, 2) individual sections (i.e., history of present illness, diagnosis, current medication) independently, and 3) the set of sections most likely to include medical concepts of interest (i.e., chief complaint, history of present illness, system review, past medical history, physical examination, impression/report/plan, and diagnosis). For preprocessing, we removed stop words, applied stemming, and kept the 2,000 most frequent words as the vocabulary. We applied three machine learning models: two conventional topic modeling methods (LDA and TKM) and one deep learning approach (KATE), described below. The number of topics was determined based on the self-regulatory capability embedded in the TKM model.
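The preprocessing step can be sketched with the standard library alone. This assumes simple alphabetic tokenization, a small illustrative stop list, and a crude suffix-stripping function standing in for a real stemmer (the stemmed terms in the tables suggest a Porter-style stemmer, but the text does not name one); all names here are hypothetical:

```python
import re
from collections import Counter

# Small illustrative stop list (assumption; a full list would be used in practice).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "with", "for", "is", "was", "by"}


def crude_stem(word):
    """Rough suffix stripping, a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ations", "ation", "ing", "ies", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(note):
    """Lower-case, keep alphabetic tokens, drop stop words, then stem."""
    return [crude_stem(t) for t in re.findall(r"[a-z]+", note.lower()) if t not in STOP_WORDS]


def build_vocabulary(notes, size=2000):
    """Keep the `size` most frequent stemmed tokens across all notes."""
    counts = Counter(tok for note in notes for tok in tokenize(note))
    return [word for word, _ in counts.most_common(size)]


notes = ["Loud snoring and daytime sleepiness.", "Snoring and sleep apnea; CPAP started."]
print(build_vocabulary(notes, size=3))
```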

Latent Dirichlet allocation (LDA)

LDA is a generative probabilistic model in which each document is viewed as a mixture of topics and each topic as a distribution over words [33]. We set the number of topics to 20 and used the 10 highest-probability words to represent each topic. Other hyperparameters were set as in the implementation in [34].
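For readers unfamiliar with how LDA is fitted, below is a compact collapsed Gibbs sampler. It is a generic textbook version written for illustration, not the implementation from [34]; the priors `alpha` and `beta`, the iteration count, and the toy corpus are assumptions:

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns topic-word counts (K x V)."""
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]               # document-topic counts
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # tokens per topic
    z = [[rng.randrange(n_topics) for _ in doc] for doc in docs]  # random init
    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]  # remove the current token from the counts
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # p(z=k | rest) is proportional to (n_dk + alpha)(n_kw + beta) / (n_k + V*beta)
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + vocab_size * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return nkw


def top_words(nkw, id2word, n=10):
    """The n most frequent words of each topic."""
    return [[id2word[w] for w in sorted(range(len(row)), key=lambda w: -row[w])[:n]] for row in nkw]


vocab = ["sleep", "apnea", "cpap", "snore", "glucos", "insulin", "metformin", "sugar"]
docs = [[0, 1, 2, 3, 0, 1] for _ in range(5)] + [[4, 5, 6, 7, 4, 5] for _ in range(5)]
nkw = lda_gibbs(docs, n_topics=2, vocab_size=len(vocab))
print(top_words(nkw, vocab, n=4))
```

With the settings in the text, `n_topics` would be 20 and `top_words(..., n=10)` would give the 10 words per topic; real corpora would normally be handled by an optimized library rather than this pure-Python loop.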

Topic keyword model (TKM)

This method addresses a shortcoming of LDA (i.e., ignoring word order). In TKM, the weight of each word in a topic reflects both how common the word is within that topic and how common it is in other topics [35]. Another advantage of this method is that redundant topics are removed automatically. We used the hyperparameters described in [35].
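The idea of weighting a word by how common it is within a topic and how rare it is in other topics can be illustrated with a simple distinctiveness score. This is explicitly not TKM's scoring rule or inference procedure from [35]; the formula below is an assumed stand-in chosen only to show the effect, and `distinctive_keywords` is a hypothetical helper:

```python
import math


def distinctive_keywords(topic_word_counts, id2word, n=3, eps=1e-9):
    """Rank each topic's words by within-topic frequency times distinctiveness.

    score(w, t) = p(w|t) * log(p(w|t) / mean of p(w|t') over the other topics t')
    Illustrative only: this is not the TKM scoring rule of [35].
    """
    probs = []
    for row in topic_word_counts:
        total = sum(row)
        probs.append([c / total for c in row])
    ranked_topics = []
    for t, p_t in enumerate(probs):
        others = [p for o, p in enumerate(probs) if o != t]
        scores = []
        for w, p in enumerate(p_t):
            mean_other = sum(row[w] for row in others) / len(others)
            scores.append(p * math.log((p + eps) / (mean_other + eps)))
        order = sorted(range(len(p_t)), key=lambda w: scores[w], reverse=True)
        ranked_topics.append([id2word[w] for w in order[:n]])
    return ranked_topics


vocab = ["pain", "sleep", "apnea", "glucose"]
counts = [[10, 8, 6, 0], [10, 0, 1, 9]]
print(distinctive_keywords(counts, vocab, n=2))
```

In the toy counts, "pain" is the most frequent word in the first topic but is ranked last there, because it is equally common in the second topic.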

K competitive autoencoder (KATE)

An autoencoder is a neural network that learns data representations automatically by reconstructing its input at the output layer. Many autoencoder variants have been proposed, mainly for image data. KATE, however, was designed to overcome the weakness of traditional autoencoders, which are not well suited to textual data [34]. The number of topics was set to 20, with 10 words per topic. Other deep learning parameters were set as described in the original paper [34].
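KATE's distinguishing component is a k-competitive hidden layer. The sketch below follows our reading of the description in [34]: positive and negative neurons compete separately, and the losers' activation energy, scaled by an amplification factor `alpha`, is reallocated to the winners. It shows only this single step on one activation vector, not the full autoencoder or its training; `k_competitive` is a hypothetical name:

```python
import math


def k_competitive(z, k, alpha):
    """Sketch of KATE-style k-competition on one vector of tanh activations.

    The ceil(k/2) largest positive and floor(k/2) most negative activations win.
    The summed activation ("energy") of the losers in each group, scaled by
    alpha, is added to every winner of that group; all losers are set to zero.
    """
    out = [0.0] * len(z)
    positive = sorted((i for i, v in enumerate(z) if v > 0), key=lambda i: z[i], reverse=True)
    negative = sorted((i for i, v in enumerate(z) if v < 0), key=lambda i: z[i])
    for group, n_winners in ((positive, math.ceil(k / 2)), (negative, k // 2)):
        winners, losers = group[:n_winners], group[n_winners:]
        energy = sum(z[i] for i in losers)
        for i in winners:
            out[i] = z[i] + alpha * energy
    return out


print(k_competitive([0.9, 0.2, -0.5, 0.1, -0.05], k=2, alpha=1.0))
```

Reallocating rather than discarding the losers' energy keeps the winning neurons' activations informative while forcing different hidden units to specialize, which is the motivation given for applying competition to sparse text input.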

Results

We first examined basic EHR corpus statistics of the cohort. We then analyzed temporal patterns of 1) b-ADL and i-ADL and 2) individual ADL in CI and CU patients before CI developed. Finally, we compared the outputs of the three topic modeling methods (i.e., terms and topics mined from clinical notes) between the two patient groups over time, both qualitatively and quantitatively, to better understand medical conditions that may contribute to CI development.

Corpus statistics

Figure 1 shows the major event types (i.e., note types) and practice settings in which a physician first diagnosed CI, along with their frequencies. Consultation was the most common event type for CI diagnosis (28%), followed by subsequent visit (19%), limited exam (18%), multi-system evaluation (11%), and supervisory (6%); together these cover more than 80% of all CI diagnosis events. Among practice settings, neurology (31%) was the most common, followed by primary care (26%), general internal medicine (12%), family medicine (6%), and brain (3%).
Fig. 1

Distribution of the first CI diagnosis (CON: consult, SV: subsequent visit, LE: limited exam, ME: multi-system evaluation, SUP: supervisory, SE: specialty evaluation, ADM: admission; GIM: general internal medicine)

Table 1 shows the numbers of clinical notes in each of the 5 years before CI diagnosis for CI patients and before the latest visit date for CU patients. CI patients consistently had more clinical notes than CU patients, and the difference was largest in the year before CI diagnosis (30.7 vs. 21.8 notes on average).
Table 1

Average number of clinical notes for CI and CU patients (SD in parentheses)

Year (a) | CI patients | CU patients
1 | 30.7 (41.5) | 21.8 (34.0)
2 | 22.4 (26.4) | 17.2 (24.0)
3 | 21.4 (25.7) | 17.0 (23.5)
4 | 21.1 (25.7) | 16.1 (20.9)
5 | 18.7 (21.3) | 15.4 (18.7)

(a) Year before CI diagnosis (CI patients) or before the latest note date (CU patients)


ADL distribution

Figure 2 shows temporal distributions of deteriorated b-ADL and i-ADL for CI and CU patients in three age groups (65–74, 75–84, and 85 and older). Overall, CI patients had worse b-ADL and i-ADL (i.e., a higher ratio of patients with deteriorated ADL) than CU patients in all age groups, and this difference became more pronounced as CI patients approached physician-diagnosed CI. Deterioration of b-ADL and i-ADL did not differ much between the 65–74 and 75–84 age groups for either CI or CU patients. Interestingly, CU patients' b-ADL were overall worse than their i-ADL, whereas the opposite held for CI patients: their i-ADL became worse than their b-ADL over time, mainly from about 1.5 to 1 year(s) before physician-diagnosed CI.
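The ratios in Fig. 2 can be computed by binning each ADL assessment by its distance from the index date. A minimal sketch, assuming 6-month bins of 182.5 days and that the denominator is the number of patients with at least one ADL assessment in the bin (the text does not state the denominator); the record layout and function names are hypothetical:

```python
from collections import defaultdict


def age_group(age):
    """Age bands used in the text: 65-74, 75-84, and 85 and older."""
    if age < 75:
        return "65-74"
    if age < 85:
        return "75-84"
    return "85+"


def deterioration_ratio(records, ages, horizon_days=5 * 365, bin_days=182.5):
    """Share of assessed patients with a deteriorated ADL, per age group and 6-month bin.

    records: (patient_id, days_before_index, deteriorated) tuples, where the index
    date is the first CI diagnosis (CI patients) or the latest visit (CU patients).
    Bin 0 covers the 6 months immediately before the index date.
    """
    assessed = defaultdict(set)
    deteriorated = defaultdict(set)
    for pid, days_before, is_deteriorated in records:
        if not 0 <= days_before < horizon_days:
            continue  # outside the 5-year look-back window
        key = (age_group(ages[pid]), int(days_before // bin_days))
        assessed[key].add(pid)
        if is_deteriorated:
            deteriorated[key].add(pid)
    return {key: len(deteriorated[key]) / len(pids) for key, pids in sorted(assessed.items())}


records = [("a", 30, True), ("b", 100, False), ("c", 400, True), ("a", 200, False)]
ages = {"a": 70, "b": 72, "c": 80}
print(deterioration_ratio(records, ages))
```

Counting patients in sets rather than counting records keeps a patient with several assessments in one bin from being counted more than once.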
Fig. 2

Distribution of b-ADL and i-ADL for CI and CU patient groups (x-axis is year(s) before the 1st physician-diagnosed CI for CI patients and the latest visit for CU patients; y-axis is a ratio of patients who have a deteriorated ADL)

We also examined individual ADL trajectories between CI and CU patients across the entire cohort. Overall, CI patients had more deteriorated ADL than CU patients over time in all ADL categories. In the 6 months before CI diagnosis (or the latest visit for CU patients), the most frequently deteriorated ADL were transferring among b-ADL (17% of CI and 14% of CU patients) and housekeeping among i-ADL (14% of CI and 10% of CU patients). The difference between the two groups was relatively small for housekeeping and transferring but large for bathing and responsibility for own medications (Fig. 3).
Fig. 3

ADL distributions for CU and CI patient groups (x-axis is year(s) before the 1st physician-diagnosed CI for CI patients and the latest clinical visit for CU patients; y-axis is a ratio of patients who have a deteriorated ADL)


Topic modeling

Qualitative analysis

We examined the topic terms extracted by the three models (i.e., LDA, TKM, and KATE) from clinical notes to compare hidden topics in CI patients before they develop CI. This approach may reveal medical conditions that precede CI. Tables 2, 3 and 4 list topic terms generated by the different topic models from different portions of clinical notes in the 6 months before physician-diagnosed CI. Bold font in a topic denotes words correlated with CI.
Table 2

Topic words by TKM (6 months before CI diagnosis)

Section | Word distribution for topics
All | pain symptom scalepati feel hpi fatigu numer loss vomit rate worst appetit statusth climb headach
Set of sections | sleep apnea cpap obstruct sleepi oximetri interfac daytime polysomnographi snore
History of present illness | glucos sugar metformin pressur blood insulin vitamin diabet cholesterol losartan interact hydrochlorothiazide
Medication | sugar glucos metformin decitabin blood pseudogout copeman diabet losartan fast insulin lantu read station glipizide
Diagnosis | lesion carcinoma cell dermatolog melanoma ulcer cancer squamou surgeri concern examin nonmelanoma

The bold font denotes the correlated words for a given topic relevant to CI

Table 3

Topic words by KATE (6 months before CI diagnosis)

Section | Word distribution for topics
All | mouth hpi releas capsul sleep apnea gi nasal prescript obstruct
Set of sections | medic prescript sleep hypertens obstruct concern diabet apnea hyperlipidemia acut
History of present illness | chronic diseas atrial hypertens failur fibril acut heart coronari back
Medication | drop zocor aspirin ophthalm day tablet low atenolol apr
Diagnosis | histori sleep apnea obstruct hypertens disord hyperlipidemia neuropathi bilater depress

The bold font denotes the correlated words for a given topic relevant to CI

Table 4

Topic words by LDA (6 months before CI diagnosis)

Section | Word distribution for topics
All | normal distress clear alert bilater soft edema sound orient tender
Set of sections | diseas histori hypertens chronic statu arteri atrial hyperlipidemia coronari diabet
History of present illness | urinari urin incontin bladder infect tract symptom deni histori urgenc
Medication | carbidopa levodopa hs benjamin start vitamin garlic bid knutson hydrochlorothiazide
Diagnosis | memori hypertens concern hyperlipidemia health mainten chronic hypothyroid complaint elev

The bold font denotes the correlated words for a given topic relevant to CI

The words in the tables are stemmed, and one representative topic cluster is shown for each section. As the tables show, the topics are distinguishable from one another and capture a meaningful representation of the text data. For example, in Table 2, the topic from all sections shows symptoms related to "fatigue," a potential risk factor for cognitive dysfunction [36], and the topic from the set of sections relates to sleep problems, which can be observed in individuals with cognitive disorders [36, 37]. The topic words from the history of present illness section include glucose, sugar, metformin, insulin, and diabetes, which relate to diabetes, a potential risk factor for cognitive decline [38]. The topic from the medication section includes medications used to control high blood sugar (e.g., metformin, insulin, glipizide) [38]. The topic from the diagnosis section includes cancer-related terms (e.g., carcinoma, melanoma, cancer, squamous) [39-42]. In Table 3, the topic from the set of sections includes hyperlipidemia, which can be considered a risk factor for CI [43], and the topic from the history of present illness includes terms related to coronary disease and hypertension, which are relevant to cognitive decline [44, 45]. In Table 4, LDA produced outcomes similar to those of TKM and KATE: words such as edema, distress, memory, hypertension, coronary, urinary, and hyperlipidemia have been discussed as potential risk factors for cognitive dysfunction [44-47].

Quantitative analysis

We quantified 1) how the topic terms learned by the topic models changed in CI patients as they approached physician-diagnosed CI, comparing year by year over the past 5 years (experiment 1), and 2) how the topic terms differed between CI and CU patients over the entire 5-year period (experiment 2). Both analyses used aggregated term frequencies of the topic terms. In experiment 1, we computed the differences in topic term frequencies between each pair of consecutive years prior to CI diagnosis, starting from the year before diagnosis and repeating for each year of the 5-year period, using 400 topic terms per year. Because topic terms relevant to CI may become more frequent as the diagnosis date approaches, this may help identify topic terms associated with CI development. In experiment 2, we applied the same aggregated term-frequency differences to the entire 5-year period, so that topic terms common to CI and CU patients cancel out and the remaining terms are more likely to be associated with CI. We used the entire 5 years in experiment 2 because year-by-year comparisons showed no notable differences. Figure 4 illustrates the high-level concept of this approach, and the results are visualized in Figs. 5, 6, 7, 8, 9 and 10. Larger words appear more frequently in the topic modeling results compared with the previous year (experiment 1) or in CI relative to CU patients over the whole 5 years (experiment 2); the corresponding raw data for Figs. 5, 6, 7, 8, 9 and 10 are provided in the Appendix tables. The results were compared with recent publications to verify whether this approach produces meaningful outcomes relevant to CI.
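The aggregated term-frequency difference can be sketched with `collections.Counter`, whose subtraction keeps only positive counts. This assumes the "rate of increase" is a raw difference in counts of each term across the topic-term lists of two periods, which matches the small integers in Table 5 but is not spelled out in the text; the function names and toy topics are hypothetical:

```python
from collections import Counter


def term_frequencies(topics):
    """Aggregate how often each term occurs across a list of topic-term lists."""
    return Counter(term for topic in topics for term in topic)


def frequency_increase(target_topics, reference_topics):
    """Terms more frequent in target_topics than in reference_topics.

    Counter subtraction keeps only positive differences; the result is sorted
    by decreasing increase, then alphabetically.
    """
    diff = term_frequencies(target_topics) - term_frequencies(reference_topics)
    return dict(sorted(diff.items(), key=lambda item: (-item[1], item[0])))


year1 = [["sleep", "apnea", "cpap"], ["apnea", "memori", "sleep"], ["apnea", "snore", "pain"]]
year2 = [["pain", "sleep", "diabet"], ["pain", "glucos", "hypertens"]]
print(frequency_increase(year1, year2))  # {'apnea': 3, 'cpap': 1, 'memori': 1, 'sleep': 1, 'snore': 1}
```

For experiment 1, `target_topics` would be the topic terms from the year closer to diagnosis and `reference_topics` those from the year before it; for experiment 2, the pooled 5-year topic terms of CI patients would be compared against those of CU patients.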
Fig. 4

Aggregated term frequencies. The first table shows term frequencies one year before CI development, the middle table shows term frequencies two years before CI development, and the last table shows the terms whose frequency increased most.

Fig. 5

Topic terms for CI patients - TKM (Experiment 1)

Fig. 6

Topic terms for CI patients - KATE (Experiment 1)

Fig. 7

Topic terms for CI patients - LDA (Experiment 1)

Fig. 8

Topic terms in the TKM model (Experiment 2)

Fig. 9

Topic terms in the KATE model (Experiment 2)

Fig. 10

Topic terms in the LDA model (Experiment 2)

The term "lymphoma" appeared in multiple results (Figs. 5a, 6b, c, 7a, b, c, 8b, 9b, c and 10a, c); patients with Hodgkin lymphoma have been reported to complain of cognitive deterioration and fatigue [48], and one study found that cognitive decline was more severe and frequent in Hodgkin lymphoma patients than in the healthy population [48]. According to a recent study, patients with "nocturnal hypoxia" had poorer memory retention than healthy individuals [49]; relatedly, "oximetry" (Fig. 5a) measures blood oxygen saturation and is used to detect hypoxemia. Another study [50] demonstrated that "global cerebral edema" is an important risk factor for cognitive dysfunction, and "edema" appears more frequently in Figs. 5a, 6a, and 10b, c. Researchers have studied the association between cancer and cognitive decline in older age [39-42] and concluded that cancer therapy can negatively affect cognition in some patients. Consistent with this, the words "metastasi," "squamous," "chemotherapy," "oxaliplatin," and "carcinoma" appear in Figs. 5c, 7b, c, 8a, b, 9b, and 10a, c. Patients with tinnitus have been reported to be at greater risk of cognitive deficits [51], and "tinnitus" appears in Fig. 5b, c. The word "bevacizumab" in Fig. 8a is a cancer medicine that interferes with the growth of cancer cells; it is used to treat certain types of cancer, including brain and kidney cancers. The relation between urinary disease and CI (Figs. 6c and 8b) has been investigated in several studies [46, 47].

The words "depression," "confusion," "memory," and "pressure," which are already known to be associated with CI, appear in Figs. 6b, 7a, b, c, 9b, c, and 10b, c. Several studies have explored the relationship between late-life CI and hyperlipidemia, hypertension, and coronary disease (Figs. 6a, 8c, 9a, and 10c). Heavy snoring and sleep apnea (Figs. 6a, b, c and 8a) have been widely investigated and show a strong link to earlier cognitive decline [37]. The apnea/hypopnea index, commonly used to indicate the severity of sleep apnea, is another extracted topic term; it appeared 8 times more often in the CI population than in the CU population. CPAP, which is used to treat sleep-related breathing disorders including sleep apnea, also appears (Fig. 8c). Diabetes has been identified as a potential risk factor for cognitive dysfunction [38], and the related terms diabetes, glucose, and sugar [44, 45, 52] appear in Figs. 6c, 9c, and 10a. Researchers in [53] showed that memory impairment is particularly associated with left ventricular hypertrophy (Figs. 9b and 10b). Atrial fibrillation has been studied as a risk factor for cognitive decline [54] (Figs. 6c, 8c, and 9c). A relationship between osteomyelitis ("osteomyel," Fig. 8b) and CI has been reported [55]. Researchers in [56] showed that cognitive function is disrupted after ischemia (Fig. 8c). The word "lung" appears in Figs. 8b and 10b, c, and several studies, including [57], have discussed lung diseases as determinants of cognitive decline.

Beyond the topics and words discussed here, some words had high frequency in the years close to CI diagnosis and therefore appear bold and large in the figures. Some of them, such as caregiver, care, exercise, and neuropathy, may be indirectly relevant to CI. However, generic boilerplate-like words such as problem, pain, sudden, disease, and status, which can appear in notes about any disease, were also captured and need to be filtered out.
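One possible post-processing step for such generic words is to drop terms that show up in the topics of most unrelated groups (e.g., different diagnoses or note sections). A minimal sketch under that assumption; the grouping, the `min_share` threshold, and the helper names are hypothetical:

```python
from collections import Counter


def generic_terms(topics_by_group, min_share=0.8):
    """Terms present in the topic terms of at least min_share of the groups.

    topics_by_group maps a group label (e.g., a diagnosis or note section)
    to that group's list of topic-term lists.
    """
    presence = Counter()
    for topics in topics_by_group.values():
        # count each term at most once per group
        presence.update({term for topic in topics for term in topic})
    n_groups = len(topics_by_group)
    return {term for term, n in presence.items() if n / n_groups >= min_share}


def drop_terms(term_scores, excluded):
    """Remove excluded terms from a {term: score} mapping."""
    return {term: score for term, score in term_scores.items() if term not in excluded}


groups = {
    "cancer": [["pain", "carcinoma", "statu"]],
    "cardiac": [["pain", "atrial", "statu"]],
    "sleep": [["pain", "apnea", "cpap"]],
}
generic = generic_terms(groups, min_share=0.6)
print(sorted(generic))
print(drop_terms({"apnea": 3, "pain": 2, "statu": 1}, generic))
```

A data-driven list like this could be combined with a small hand-curated list of known boilerplate terms.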

Discussion

It is important to identify early signs of CI so that clinicians can plan accordingly and take appropriate action, reducing potential costs and burden. In this study, we examined basic EHR corpus statistics relevant to CI patients and analyzed temporal trends of patient ADL and topics in clinical notes between CI and CU patient groups in order to characterize and better understand elderly patients’ medical conditions before they develop CI. Consultation was the most common event type, and neurology the most common practice setting, in which physicians first diagnosed CI. The consistently higher number of clinical notes for CI patients than for CU patients suggests that CI patients visit hospitals or clinics more often. Temporal trends of individual ADL and of the ADL groups (i.e., b-ADL and i-ADL) were examined over the 5 years before the first physician-diagnosed CI for CI patients and before the latest visit for CU patients. The trajectories of ADL deterioration became steeper in CI patients than in CU patients approximately 1 to 1.5 year(s) before the actual physician diagnosis of CI. More notably, during this period i-ADL deteriorated more than b-ADL in CI patients, which was not the case in CU patients. Given the significant delay in CI diagnosis and the missed opportunities for appropriate care planning in current practice [4, 5], this observation may help promote early detection of CI. The trajectories of bathing (b-ADL) and responsibility for own medications (i-ADL) deteriorated much more rapidly in CI patients than in CU patients over time, so these measures might also serve as surrogate markers to facilitate early CI diagnosis. The results of this study suggest that topic modeling can help discover meaningful hidden topics and terms in clinical notes.
The results were promising, as discussed in the qualitative and quantitative analyses. The words within each topic were mostly correlated and captured the underlying semantics. The models extracted words relevant to CI, such as hypertension, depress, and memory, which are potential indicators associated with CI. We were also able to identify other potential factors that may be relevant to CI according to recent publications. Overall, the more recent models, TKM and KATE, captured semantically meaningful representations of the data better than LDA. Further, the KATE model generated more CI-related words, in the memory, depression, hypertension, dizziness, and confusion categories, than the TKM model. We assessed the topic modeling results using aggregated term frequencies and visualized them to show hidden topics that may contribute to developing CI; these results were consistent with recent publications and showed promising outcomes. However, some common topic words that are not specific to CI and may appear in any disease were also captured, and further post-processing would be required to filter them out. Generally, CI is diagnosed by health professionals who ask patients questions to assess memory, concentration, and understanding. However, such assessment is not routinely performed in many healthcare institutions, which delays timely CI diagnosis. Given this, our use of EHR free text to analyze early signals of CI could help automate or support CI assessment and thus facilitate routine early detection of CI. One limitation of this study is the use of physician-diagnosed CI, which does not differentiate CI severity, in place of full cognitive assessment or testing, which was unavailable.
However, our study is still useful because its focus is to explore the use of EHR documentation to promote early detection of CI, given the significant delay in CI diagnosis by clinicians in current health care practice. Another limitation is a potentially imbalanced distribution of clinical notes across illnesses (e.g., cancer patients are seen more often than others and therefore have more clinical notes). This may affect the results of topic modeling; however, we examined a broad range of topics and demonstrated good potential applicability.
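The ADL comparison discussed above (a steeper deterioration in CI than CU patients within roughly 1 to 1.5 years of the index date, and worse i-ADL than b-ADL) can be made concrete with a small slope comparison. This is a minimal sketch, not the authors' analysis: the `AdlObservation` record schema, the 6-month bins, the 18-month split point, and the ordinary least-squares slope on binned impairment proportions are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class AdlObservation:
    """One ADL assessment (hypothetical schema, not from the study)."""
    group: str            # "CI" or "CU"
    domain: str           # "b-ADL" or "i-ADL"
    months_before: float  # months before first CI diagnosis (CI) or latest visit (CU)
    impaired: bool        # True if the patient needed assistance with the activity


def impairment_by_bin(obs, group, domain=None, bin_months=6, horizon_months=60):
    """Proportion of impaired assessments per time bin, oldest bin first.

    Returns (bin_midpoint_months_before, proportion) pairs; empty bins are skipped.
    """
    rates = []
    for start in range(horizon_months - bin_months, -1, -bin_months):
        flags = [o.impaired for o in obs
                 if o.group == group
                 and (domain is None or o.domain == domain)
                 and start <= o.months_before < start + bin_months]
        if flags:
            rates.append((start + bin_months / 2, sum(flags) / len(flags)))
    return rates


def ols_slope(points) -> Optional[float]:
    """Least-squares slope of y on x, or None with fewer than two distinct x values."""
    if len({x for x, _ in points}) < 2:
        return None
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def early_late_slopes(obs, group, domain=None, split_months=18):
    """Slopes before vs. within the last `split_months` months before the index date.

    x is months relative to the index date (negative = before it), so a positive
    slope means the impairment proportion rises as the index date approaches.
    """
    rates = impairment_by_bin(obs, group, domain)
    early = [(-m, p) for m, p in rates if m >= split_months]
    late = [(-m, p) for m, p in rates if m < split_months]
    return ols_slope(early), ols_slope(late)
```

Calling `early_late_slopes` separately for `group="CI"` and `group="CU"`, and for `domain="b-ADL"` versus `domain="i-ADL"`, mirrors the comparisons described above. A real analysis would also need to account for repeated assessments of the same patient, which this binned sketch ignores.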

Conclusion

There are notable differences in the temporal trends of b-ADL and i-ADL between CI and CU patients approximately 1 to 1.5 year(s) before the actual physician diagnosis of CI: during this period, CI patients showed a steeper slope of overall ADL deterioration and worse i-ADL than b-ADL. The trajectories of certain individual ADL (bathing and responsibility for own medication) were closely associated with CI development. The topics and terms obtained over time by topic modeling of clinical free text have the potential to show how CI patients' conditions evolve and to reveal overlooked conditions as patients approach CI diagnosis. These observations may promote early detection of CI and thus expedite appropriate care of underlying diseases and comorbid conditions. In the future, we plan to use neuroimaging and assessment data to obtain a more granular classification of cognitive function, and to develop a prediction model that leverages our observations to detect patients at high risk of different stages of CI and to identify associated longitudinal risk factors.
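Table 5. Raw data used in Fig. 5 (TKM, Experiment 1)

| Set of sections: term | Rate of increase | Diagnosis: term | Rate of increase | History of present illness: term | Rate of increase |
|---|---|---|---|---|---|
| provid | 4 | limb | 4 | emerg | 4 |
| prescript | 4 | chest | 3 | present | 4 |
| statu | 4 | chemotherapi | 3 | walk | 3 |
| kidnei | 3 | cgy | 3 | renal | 3 |
| oximetri | 3 | dyspnea | 3 | bmd | 3 |
| pap | 3 | myeloma | 3 | chemotherapi | 2 |
| ecg | 3 | carcinoma | 3 | cpap | 2 |
| edema | 3 | cefazolin | 2 | snore | 2 |
| lymphoma | 3 | pressure | 2 | pulmonari | 2 |
| asleep | 2 | bevacizumab | 2 | diabet | 2 |
| urinary | 2 | hyperlipidemia | 2 | epworth | 2 |
| hypertrophy | 2 | neuropathi | 2 | bladder | 2 |
| hypersensitiviti | 2 | metastas | 2 | thyroid | 2 |
| Insulin | 2 | melanoma | 2 | ischemia | 2 |
| lung | 2 | Sugar | 2 | thyroid | 2 |
| memory | 2 | Warfarin | 2 | insulin | 2 |
| numb | 2 | lung | 2 | potassium | 2 |
| symptom | 1.5 | ldl | 2 | fibril | 1.5 |
| cough | 1.5 | Sleep | 1.5 | assist | 1.5 |
| atrial | 1.5 | Medic | 1.5 | symptom | 1.5 |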
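Table 6. Raw data used in Fig. 6 (KATE, Experiment 1)

| Set of sections: term | Rate of increase | Diagnosis: term | Rate of increase | History of present illness: term | Rate of increase |
|---|---|---|---|---|---|
| disease | 7 | lymphoma | 6 | problem | 7 |
| Status | 6 | memori | 5 | care | 6 |
| hypertens | 5 | acut | 4 | lower | 4 |
| edema | 5 | heart | 4 | extrem | 4 |
| hyperlipidemia | 5 | pulmonari | 3 | wife | 4 |
| respons | 4 | care | 3 | depress | 4 |
| loss | 4 | anxiety | 2 | hip | 3 |
| sleep | 3 | adenocarcinoma | 2 | lymphoma | 3 |
| bilater | 3 | cpap | 2 | assist | 3 |
| cancer | 3 | hypothyroid | 2 | loss | 3 |
| arteri | 3 | gleason | 2 | fibril | 3 |
| apnea | 3 | psa | 2 | urinary | 3 |
| fibril | 3 | melanoma | 2 | cough | 2 |
| note | 2.5 | Fatigue | 2 | memori | 2 |
| care | 2.33 | depress | 1.8 | hear | 2 |
| evaluation | 2.33 | sleep | 1.8 | headach | 2 |
| Coronary | 2 | anticoagul | 1.67 | memori | 2 |
| nurs | 1.67 | obstruct | 1.6 | back | 1.67 |
| symptom | 1.62 | major | 1.5 | ct | 1.5 |
| breath | 1.5 | syndrom | 1.5 | ear | 1.5 |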
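Table 7. Raw data used in Fig. 7 (LDA, Experiment 1)

| Set of sections: term | Rate of increase | Diagnosis: term | Rate of increase | History of present illness: term | Rate of increase |
|---|---|---|---|---|---|
| lymphoma | 3 | pressur | 3 | note | 5 |
| walk | 3 | impair | 3 | emerg | 4 |
| cycl | 3 | subdur | 3 | limb | 3 |
| fall | 3 | liver | 3 | lymphoma | 3 |
| head | 3 | chest | 2 | asleep | 2 |
| surgeri | 3 | surgeri | 2 | neuropathi | 2 |
| prophylaxi | 3 | bowel | 2 | memori | 2 |
| atrial | 2 | traumat | 2 | dizzi | 2 |
| systol | 2 | thyroid | 2 | numb | 2 |
| depress | 2 | anxieti | 2 | issue | 2 |
| weak | 2 | mood | 2 | joint | 2 |
| dialysi | 2 | spinal | 2 | toe | 2 |
| dizzi | 2 | glucose | 2 | blood | 2 |
| dress | 2 | hypothyroid | 2 | hemorrhag | 2 |
| pap | 2 | cpap | 2 | examin | 2 |
| express | 2 | squamou | 2 | headach | 2 |
| memori | 2 | 1.5 | 1.5 | pressur | 1.5 |
| dose | 1.67 | atrial | 1.5 | lower | 1.5 |
| skin | 1.5 | memori | 1.5 | symptom | 1.5 |
| prescript | 1.5 | fibril | 1.5 | cancer | 1.5 |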
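Table 8. Raw data used in Fig. 8 (TKM, Experiment 2)

| Set of sections: term | Rate of increase | Diagnosis: term | Rate of increase | History of present illness: term | Rate of increase |
|---|---|---|---|---|---|
| bowel | 12 | alert | 7 | suspect | 6 |
| present | 11 | mssa | 6 | cholecyst | 6 |
| autonom | 10 | epidur | 6 | obstruct | 5 |
| igg | 10 | ech | 6 | emerg | 5 |
| chain | 10 | bladder | 6 | ischemia | 4 |
| bevacizumab | 10 | osteomyel | 6 | depress | 4 |
| folfox | 10 | lung | 5 | diet | 4 |
| ahi | 9 | nausea | 5 | swollen | 3.5 |
| epworth | 9 | tumor | 5 | cpap | 3 |
| snore | 9 | confus | 4 | potassium | 3 |
| abdomin | 8 | difficulti | 4 | lung | 3 |
| myeloma | 7 | plasma | 3.5 | bladder | 2.5 |
| tinnitu | 6 | dysgeusia | 3.5 | confus | 2.33 |
| housekeep | 6 | anxieti | 3 | plasma | 2.33 |
| Sleep | 4.5 | cefazolin | 3 | concern | 2.25 |
| Cough | 4.5 | canal | 3 | immune | 2 |
| dyspnea | 4 | bursa | 2.5 | metastas | 2 |
| urinari | 3.75 | abdomin | 2.33 | surgery | 2 |
| hypersensitivity | 3 | descript | 2.25 | hyperlipidemia | 2 |
| melanoma | 2.67 | lymphoma | 2 | thyroid | 2 |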
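Table 9. Raw data used in Fig. 9 (KATE, Experiment 2)

| Set of sections: term | Rate of increase | Diagnosis: term | Rate of increase | History of present illness: term | Rate of increase |
|---|---|---|---|---|---|
| prevent | 12 | depress | 12 | caregiv | 14 |
| listen | 6 | hypertroph | 10 | inr | 11 |
| risk | 6 | care | 8 | fall | 9 |
| correct | 5 | memori | 5 | anticoagul | 8 |
| specif | 5 | lymphoma | 5 | depress | 5 |
| examin | 4.5 | infect | 4.5 | toenail | 5 |
| coumadin | 4 | gait | 4 | sugar | 5 |
| coordin | 3.5 | stent | 4 | psa | 4 |
| dizzi | 3 | cough | 3 | mellitu | 4 |
| dress | 3 | cad | 3 | issu | 4 |
| mellitu | 3 | toe | 3 | warfarin | 4 |
| prostat | 3 | anemia | 2.5 | lymphoma | 3 |
| chest | 3 | sensorineur | 2.43 | glucos | 3 |
| diabet | 3 | chest | 2.2 | insulin | 3 |
| servic | 3 | squamou | 1.75 | coronari | 2.33 |
| insulin | 3 | fibril | 1.74 | abdomin | 2.25 |
| prostat | 3 | urinari | 1.72 | diabet | 2.22 |
| coronari | 2.67 | syndrom | 1.62 | fibril | 2.2 |
| anticoagul | 1.83 | anticoagul | 1.55 | atrial | 2 |
| lesion | 1.5 | atrial | 1.52 | urinari | 1.67 |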
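Table 10. Raw data used in Fig. 10 (LDA, Experiment 2)

| Set of sections: term | Rate of increase | Diagnosis: term | Rate of increase | History of present illness: term | Rate of increase |
|---|---|---|---|---|---|
| cpap | 7 | hypertroph | 9 | reveal | 9 |
| toenail | 7 | memori | 7 | coumadin | 7 |
| infect | 6 | concern | 7 | dizzi | 6 |
| glucos | 5 | hemorrhag | 6 | bilater | 5 |
| lipid | 5 | parkinson | 6 | depress | 4.5 |
| diabet | 4 | dizzi | 4 | bleed | 4 |
| lung | 4 | neck | 4 | neurolog | 4 |
| concern | 3.33 | depress | 3.2 | platelet | 3 |
| allergi | 3 | headach | 3 | sleepi | 3 |
| lymphoma | 3 | fatigu | 3 | warfarin | 2.67 |
| depress | 3 | weight | 3 | inr | 2.5 |
| adenocarcinoma | 3 | lung | 3 | head | 2.33 |
| insulin | 3 | dystroph | 3 | hyperlipidemia | 2 |
| toe | 2.33 | stroke | 3 | numb | 2 |
| psa | 2.33 | aneurysm | 2.5 | edema | 2 |
| hemoglobin | 2 | polyp | 2.33 | sugar | 2 |
| bladder | 2 | anxieti | 2.33 | anticoagul | 2 |
| prophylaxi | 2 | edema | 2.25 | dl | 1.75 |
| coronari | 1.83 | bladder | 1.83 | ldl | 1.67 |
| apnea | 1.67 | mellitu | 1.5 | lymphoma | 1.5 |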
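Tables 5 to 10 report a "rate of increase" per term, and the Discussion mentions validating topics with aggregated term frequencies and the need to post-process away generic terms. The exact definition of the rate is not given in this section, so the sketch below assumes it is the ratio of a term's aggregated frequency in a later time window (closer to the index date) to its frequency in an earlier window. The function names, the window split, and the CU-based filter are hypothetical illustrations, not the study's method.

```python
from collections import Counter


def term_counts(notes):
    """Aggregate term frequencies over tokenized (e.g., stemmed) notes."""
    counts = Counter()
    for tokens in notes:
        counts.update(tokens)
    return counts


def rate_of_increase(earlier_notes, later_notes, min_count=1):
    """Assumed definition: later-window frequency / earlier-window frequency.

    Terms seen fewer than `min_count` times in the earlier window are skipped,
    which also avoids division by zero.
    """
    before, after = term_counts(earlier_notes), term_counts(later_notes)
    return {t: after[t] / before[t] for t in after if before[t] >= min_count}


def ci_specific_terms(ci_rates, cu_rates, margin=1.0):
    """Hypothetical post-processing filter for generic terms.

    Keeps terms whose CI rate exceeds `margin` times their CU rate (a term absent
    from the CU rates is treated as unchanged, rate 1.0), sorted by rate descending.
    """
    kept = [(t, r) for t, r in ci_rates.items() if r > margin * cu_rates.get(t, 1.0)]
    return sorted(kept, key=lambda tr: (-tr[1], tr[0]))
```

In this sketch, a term that rises just as quickly in CU patients (such as a generic symptom word) is dropped, while a term that rises only in CI patients is kept; the margin controls how strict that filter is.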
References

1. Petersen RC, Smith GE, Waring SC, Ivnik RJ, Tangalos EG, Kokmen E. Mild cognitive impairment: clinical characterization and outcome. Arch Neurol. 1999 Mar.

2. Ten great public health achievements--United States, 1900-1999. MMWR Morb Mortal Wkly Rep. 1999 Apr 2.

3. Hartigan I. A comparative review of the Katz ADL and the Barthel Index in assessing the activities of daily living of older people. Int J Older People Nurs. 2007 Sep.

4. Katz S, Ford AB, Moskowitz RW, Jackson BA, Jaffe MW. Studies of illness in the aged. The index of ADL: a standardized measure of biological and psychosocial function. JAMA. 1963 Sep 21.

5. Fenske M. Urinary cortisol excretion: is it really a predictor of incident cognitive impairment? Neurobiol Aging. 2006 Oct 31.

6. Bradford A, Kunik ME, Schulz P, Williams SP, Singh H. Missed and delayed diagnosis of dementia in primary care: prevalence and contributing factors. Alzheimer Dis Assoc Disord. 2009 Oct-Dec.

7. Langa KM, Chernew ME, Kabeto MU, Herzog AR, Ofstedal MB, Willis RJ, Wallace RB, Mucha LM, Straus WL, Fendrick AM. National estimates of the quantity and cost of informal caregiving for the elderly with dementia. J Gen Intern Med. 2001 Nov.

8. Kreiter KT, Copeland D, Bernardini GL, Bates JE, Peery S, Claassen J, Du YE, Stern Y, Connolly ES, Mayer SA. Predictors of cognitive dysfunction after subarachnoid hemorrhage. Stroke. 2002 Jan.

9. Boustani M, Peterson B, Hanson L, Harris R, Lohr KN. Screening for dementia in primary care: a summary of the evidence for the U.S. Preventive Services Task Force. Ann Intern Med. 2003 Jun 3.

10. Chodosh J, Petitti DB, Elliott M, Hays RD, Crooks VC, Reuben DB, Buckwalter JG, Wenger N. Physician recognition of cognitive impairment: evaluating the need for improvement. J Am Geriatr Soc. 2004 Jul.