Zitao Shen, Dalton Schutte, Yoonkwon Yi, Anusha Bompelli, Fang Yu, Yanshan Wang, Rui Zhang.
Abstract
BACKGROUND: Since no effective therapies exist for Alzheimer's disease (AD), prevention through lifestyle changes and interventions has become increasingly important. Analyzing the electronic health records (EHRs) of patients with AD can help us better understand the effect of lifestyle on AD. However, lifestyle information is typically stored in clinical narratives rather than in structured fields. The objective of the study was therefore to compare different natural language processing (NLP) models for classifying lifestyle statuses (e.g., physical activity and excessive diet) from clinical texts in English.
Keywords: Alzheimer’s disease; Clinical text classification; Deep learning; Electronic health records; Machine learning; Natural language processing
Year: 2022 PMID: 35799294 PMCID: PMC9261217 DOI: 10.1186/s12911-022-01819-4
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 3.298
Fig. 1 Overview of the study design
Example sentences with weak labels for excessive diet and physical activity
| Category | Class | Sentence example |
|---|---|---|
| Excessive diet | High fat diet | Pt is having fatty food |
| Excessive diet | High calorie diet | He had token high calorie diet for 2 weeks |
| Excessive diet | High salt diet | His current diet contains too much food with high salt |
| Excessive diet | Normal diet | She backs to normal diet |
| Excessive diet | Non-specific abnormal | She has no knowledge of salt restrictions |
| Physical activity | Physical activity | Pt has increase regular physical activity |
| Physical activity | Physical inactivity | He didn’t maintain daily exercise |
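Weak labels like those in the example table could be assigned by simple keyword rules. The following is only a minimal sketch under stated assumptions: the patterns in `RULES` and the function name `weak_label` are hypothetical and written for illustration, not taken from the study, whose actual labeling rules are not given here.

```python
import re

# Hypothetical keyword patterns, for illustration only (not the study's rules).
# Order matters: the negated-exercise rule is checked before the plain one.
RULES = [
    ("Physical inactivity",
     re.compile(r"\b(didn'?t|did not|no|not|never)\b.*\b(exercise|physical activity)\b", re.I)),
    ("Physical activity", re.compile(r"\b(exercise|physical activity)\b", re.I)),
    ("High fat diet", re.compile(r"\b(fatty food|high fat)\b", re.I)),
    ("High calorie diet", re.compile(r"\bhigh calorie\b", re.I)),
    ("High salt diet", re.compile(r"\bhigh salt\b", re.I)),
    ("Normal diet", re.compile(r"\bnormal diet\b", re.I)),
]


def weak_label(sentence):
    """Return the class of the first matching rule, or None if no rule matches."""
    # Normalize curly apostrophes so "didn’t" and "didn't" are treated alike.
    text = sentence.replace("\u2019", "'")
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return None
```

Rules of this kind are cheap to write but brittle: a sentence that matches no pattern gets no label, and a negation anywhere before the keyword flips the class, which is one reason to compare them against learned classifiers.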
Comparison of results for models for the physical activity case study
| Model | Weighted avg precision | Weighted avg recall | Weighted avg F1 | Physical activity precision | Physical activity recall | Physical activity F1 | Physical inactivity precision | Physical inactivity recall | Physical inactivity F1 |
|---|---|---|---|---|---|---|---|---|---|
| Rule-based | 0.88 | 0.53 | 0.62 | 0.87 | 0.32 | 0.47 | 0.89 | 0.85 | 0.87 |
| Logistic regression | 0.91 | 0.89 | 0.89 | 0.81 | 0.90 | 0.77 | 0.87 | ||
| Random forest | 0.89 | 0.88 | 0.88 | 0.95 | 0.85 | 0.89 | 0.80 | 0.93 | 0.86 |
| SVM | 0.91 | 0.89 | 0.89 | 0.82 | 0.90 | 0.79 | 0.88 | ||
| BERT base | 0.90 | 0.90 | 0.89 | 0.89 | 0.94 | 0.92 | 0.91 | 0.82 | 0.86 |
| Bio BERT | 0.89 | 0.89 | 0.89 | 0.91 | 0.92 | 0.91 | 0.88 | 0.85 | 0.86 |
| PubMed BERT (Abs) | 0.90 | 0.89 | 0.89 | 0.89 | 0.95 | 0.92 | 0.92 | 0.80 | 0.85 |
| PubMed BERT (Abs + Ft) | 0.88 | 0.88 | 0.87 | 0.86 | 0.95 | 0.90 | 0.91 | 0.76 | 0.82 |
| Bio-clinical BERT | 0.91 | 0.91 | 0.91 | 0.91 | 0.95 | 0.93 | 0.91 | 0.86 | |
| UMLS BERT | 0.91 | 0.84 | |||||||
*Bold numbers indicate best performance in each column
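The per-class and weighted-average columns can be computed from true and predicted labels as sketched below. This is a minimal illustration, not the authors' evaluation code: the function names are hypothetical, and the "weighted avg" is assumed to be the support-weighted mean of the per-class scores (the usual convention), which the source does not state explicitly.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Compute precision, recall, F1 and support for every class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    metrics = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall,
                          "f1": f1, "support": actual}
    return metrics


def weighted_average(metrics):
    """Average each score over classes, weighted by class support (assumed convention)."""
    total = sum(m["support"] for m in metrics.values())
    return {key: sum(m[key] * m["support"] for m in metrics.values()) / total
            for key in ("precision", "recall", "f1")}
```

Under this convention the weighted-average recall equals overall accuracy, and each per-class F1 is the harmonic mean of that class's precision and recall, which gives a quick consistency check on any row of the table.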
Comparison of results for models for the excessive diet case study
| Model | Weighted avg precision | Weighted avg recall | Weighted avg F1 | Normal diet precision | Normal diet recall | Normal diet F1 | High calorie diet precision | High calorie diet recall | High calorie diet F1 |
|---|---|---|---|---|---|---|---|---|---|
| Rule-based | 0.91 | 0.86 | 0.87 | 0.52 | 0.67 | 0.94 | 0.94 | 0.94 | |
| Logistic regression | 0.88 | 0.85 | 0.85 | 0.50 | 0.83 | 0.63 | 0.94 | 0.97 | |
| Random forest | 0.89 | 0.86 | 0.86 | 0.52 | 0.89 | 0.66 | 0.94 | 0.97 | |
| SVM | 0.88 | 0.85 | 0.85 | 0.50 | 0.83 | 0.63 | 0.94 | 0.97 | |
| BERT base | 0.91 | 0.91 | 0.91 | 0.66 | 0.75 | 0.70 | 0.98 | 0.99 | |
| Bio BERT | 0.92 | 0.92 | 0.92 | 0.71 | 0.78 | 0.98 | 0.99 | ||
| PubMed BERT (Abs) | 0.91 | 0.90 | 0.90 | 0.59 | 0.81 | 0.68 | |||
| PubMed BERT (Abs + Ft) | 0.90 | 0.90 | 0.90 | 0.63 | 0.64 | 0.62 | |||
| Bio-clinical BERT | 0.75 | 0.73 | |||||||
| UMLS BERT | 0.92 | 0.92 | 0.92 | 0.72 | 0.69 | 0.70 | |||
*Bold numbers indicate best performance in each column
Fig. 2 Results for training models on portions of physical activity data