| Literature DB >> 35258457 |
Authors: Andrea Ferrario, Minxia Luo, Angelina J Polsinelli, Suzanne A Moseley, Matthias R Mehl, Kristina Yordanova, Mike Martin, Burcu Demiray.
Abstract
BACKGROUND: Language use and social interactions are closely related to cognitive measures. A better understanding of language use and of behavioral indicators of social context is needed to study the early prediction of cognitive decline in healthy older adult populations.
Keywords: Electronically Activated Recorder (EAR); behavioral indicators; cognitive aging; language complexity; machine learning; natural language processing; social context
Year: 2022 PMID: 35258457 PMCID: PMC8941438 DOI: 10.2196/28333
Source DB: PubMed Journal: JMIR Aging ISSN: 2561-7605
Table 1. All runs considered in this study.
| Run | Feature combination | Features, n |
| R0 | Sociodemographic | 4 |
| R1 | Sociodemographic + linguistic measures | 18 |
| R2 | Sociodemographic + social context | 23 |
| R3 | Sociodemographic + POSa tags | 19 |
| R4 | Sociodemographic + linguistic measures + social context | 37 |
| R5 | Sociodemographic + social context + POS tags | 38 |
| R6 | Sociodemographic + linguistic measures + POS tags | 33 |
| R7 | Sociodemographic + linguistic measures + social context + POS tags | 52 |
aPOS: part of speech.
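The sociodemographic block is included in every run; R1 through R7 each add one or more of the three behavioral feature groups. A minimal Python sketch of how the eight runs can be assembled; the feature names below are placeholders (the actual names are listed in Multimedia Appendix 1), with group sizes derived from the counts in the table above.

```python
# Placeholder feature names; real names are in Multimedia Appendix 1.
# Group sizes follow from the run table: 4 sociodemographic, 14 linguistic,
# 19 social-context, and 15 POS-tag features.
SOCIODEMOGRAPHIC = [f"sdem_{i}" for i in range(4)]
LINGUISTIC = [f"ling_{i}" for i in range(14)]
SOCIAL_CONTEXT = [f"ctx_{i}" for i in range(19)]
POS_TAGS = [f"pos_{i}" for i in range(15)]

runs = {
    "R0": SOCIODEMOGRAPHIC,
    "R1": SOCIODEMOGRAPHIC + LINGUISTIC,
    "R2": SOCIODEMOGRAPHIC + SOCIAL_CONTEXT,
    "R3": SOCIODEMOGRAPHIC + POS_TAGS,
    "R4": SOCIODEMOGRAPHIC + LINGUISTIC + SOCIAL_CONTEXT,
    "R5": SOCIODEMOGRAPHIC + SOCIAL_CONTEXT + POS_TAGS,
    "R6": SOCIODEMOGRAPHIC + LINGUISTIC + POS_TAGS,
    "R7": SOCIODEMOGRAPHIC + LINGUISTIC + SOCIAL_CONTEXT + POS_TAGS,
}

# Feature counts reproduce the table: 4, 18, 23, 19, 37, 38, 33, 52.
assert [len(v) for v in runs.values()] == [4, 18, 23, 19, 37, 38, 33, 52]
```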
Figure 1. Repeated cross-validation with the recursive feature elimination (RFE) algorithm.
Table 2. Summary of all hyperparameters tuned in the repeated cross-validation with the RFE algorithm.
| Algorithm or model | Hyperparameters |
| RFEa algorithm | Number of features to select; number of features to reduce at each step |
| Machine learning model (RFb) | Number of trees; maximum tree depth |
| Machine learning model (XGBoostc and LightGBMd) | Number of trees; maximum tree depth; learning rate |
aRFE: recursive feature elimination.
bRF: random forest.
cXGBoost: extreme gradient boosting.
dLightGBM: light gradient boosting machine.
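A minimal scikit-learn sketch of the scheme in Figure 1: RFE wrapped around a LightGBM regressor inside repeated cross-validation, with the hyperparameters from Table 2 in the search grid. The toy data, grid values, and pipeline layout are illustrative assumptions (the grid is trimmed for brevity), not the authors' exact setup.

```python
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV, RepeatedKFold
from sklearn.pipeline import Pipeline

# Toy data standing in for one run's feature matrix and cognitive score.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 37))
y = X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=200)

# RFE ranks features with a LightGBM regressor; a separately tuned
# LightGBM model is then fit on the selected subset.
pipe = Pipeline([
    ("rfe", RFE(LGBMRegressor(n_estimators=50))),
    ("model", LGBMRegressor()),
])

# Illustrative grid covering the hyperparameters listed in Table 2.
param_grid = {
    "rfe__n_features_to_select": [5, 10],  # number of features to select
    "rfe__step": [5],                      # features removed at each step
    "model__n_estimators": [50, 100],      # number of trees
    "model__max_depth": [3, -1],           # maximum tree depth (-1: no limit)
    "model__learning_rate": [0.1],         # learning rate
}

cv = RepeatedKFold(n_splits=10, n_repeats=10, random_state=42)
search = GridSearchCV(pipe, param_grid, cv=cv,
                      scoring="neg_mean_squared_error")
search.fit(X, y)

# Report MSE as mean (SD) over the 100 folds, as in the tables below.
best = search.best_index_
print(f"MSE: {-search.cv_results_['mean_test_score'][best]:.2f} "
      f"({search.cv_results_['std_test_score'][best]:.2f})")
```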
Table 3. Performance of the best models for the prediction of the Keep Track target variable. All results were obtained for 10 folds and 10 repeats.
| Run | Model | MSEa, mean (SD) | Features, n |
| R0 | LightGBMb | 13.26 (5.33) | 4 |
| R1 | LightGBM | 12.80 (5.43) | 10 |
| R2 | LightGBM | 12.46 (4.85) | 5 |
| R3 | LightGBM | 12.95 (4.98) | 10 |
| R4c | LightGBM | 11.81 (4.92) | 10 |
| R5 | LightGBM | 12.12 (4.43) | 20 |
| R6 | LightGBM | 12.65 (4.92) | 15 |
| R7 | LightGBM | 12.02 (4.66) | 25 |
aMSE: mean squared error.
bLightGBM: light gradient boosting machine.
cThe best run was R4.
Table 4. Features, their importance, and type for the best light gradient boosting machine model of R4 for the prediction of Keep Track scores.
| Rank | Featurea | Importance of feature | Type of feature |
| 1 | alone_prc | 0.34 | Social context |
| 2 | age at EARb testing | 0.16 | Sociodemographic |
| 3 | mean_Density | 0.13 | Linguistic measure |
| 4 | std_ChaoShen | 0.13 | Linguistic measure |
| 5 | TV_prc | 0.10 | Social context |
| 6 | in_transit_prc | 0.07 | Social context |
| 7 | partner_sign_other_prc | 0.04 | Social context |
| 8 | small_talk_prc | 0.03 | Social context |
aDescriptions of features are listed in Multimedia Appendix 1.
bEAR: Electronically Activated Recorder.
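The per-feature weights in Table 4 sum to 1, which suggests normalized importances. A sketch, assuming LightGBM's default split-count importances normalized to unit sum; the data and target here are placeholders, and the paper may have used a different importance type (e.g., gain).

```python
import numpy as np
from lightgbm import LGBMRegressor

# Placeholder data with the eight selected features of the best R4 model.
feature_names = ["alone_prc", "age_at_EAR_testing", "mean_Density",
                 "std_ChaoShen", "TV_prc", "in_transit_prc",
                 "partner_sign_other_prc", "small_talk_prc"]
rng = np.random.default_rng(1)
X = rng.normal(size=(200, len(feature_names)))
y = 2 * X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200)

model = LGBMRegressor(n_estimators=100).fit(X, y)

# Normalize the split-count importances to sum to 1, matching the
# 0-1 weights reported in Table 4.
importance = model.feature_importances_ / model.feature_importances_.sum()
for name, w in sorted(zip(feature_names, importance), key=lambda t: -t[1]):
    print(f"{name:24s} {w:.2f}")
```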
Table 5. Performance of the best models for the prediction of the Consonant Updating target variable (k: number of cross-validation folds).
| Run | k | Model | MSEa, mean (SD) | Features, n |
| R0 | 10 | LightGBMb | 113.50 (45.55) | 4 |
| R1 | 5 | LightGBM | 114.85 (25.64) | 18 |
| R2 | 5 | LightGBM | 114.00 (26.04) | 5 |
| R3c | 5 | LightGBM | 97.26 (21.38) | 5 |
| R4 | 10 | LightGBM | 114.30 (45.50) | 10 |
| R5 | 5 | LightGBM | 100.73 (22.93) | 5 |
| R6 | 5 | LightGBM | 100.07 (22.74) | 5 |
| R7 | 10 | XGBoostd | 101.38 (41.32) | 5 |
aMSE: mean squared error.
bLightGBM: light gradient boosting machine.
cThe best run was R3.
dXGBoost: extreme gradient boosting.
Table 6. Features, their importance, and type for the best light gradient boosting machine model of R3 for the prediction of Consonant Updating scores.
| Rank | Feature | Importance of feature | Type of feature |
| 1 | NUM | 0.37 | Part of speech |
| 2 | INTJ | 0.23 | Part of speech |
| 3 | NOUN | 0.23 | Part of speech |
| 4 | ADP | 0.17 | Part of speech |
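The POS features in Tables 6 and 8 are universal part-of-speech tags (NUM: numeral; INTJ: interjection; NOUN: noun; ADP: adposition; PRON: pronoun; PROPN: proper noun). A sketch of how such tag proportions can be computed from a transcript, using spaCy as an assumed tagger; the paper's actual NLP pipeline may differ.

```python
from collections import Counter

import spacy

# Requires the small English model: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def pos_proportions(transcript: str) -> dict[str, float]:
    """Relative frequency of each universal POS tag in a transcript."""
    doc = nlp(transcript)
    counts = Counter(tok.pos_ for tok in doc if not tok.is_space)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}

props = pos_proportions("Well, I counted three birds on the roof.")
print(props.get("NUM", 0.0), props.get("INTJ", 0.0), props.get("NOUN", 0.0))
```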
Table 7. Performance of the best models for the prediction of the Working Memory target variable (k: number of cross-validation folds).
| Run | k | Model | MSEa, mean (SD) | Features, n |
| R0 | 5 | LightGBMb | 37.75 (7.94) | 4 |
| R1 | 10 | LightGBM | 37.70 (14.07) | 10 |
| R2 | 5 | LightGBM | 37.75 (7.93) | 5 |
| R3c | 5 | XGBoostd | 30.23 (6.63) | 10 |
| R4 | 5 | LightGBM | 37.75 (7.93) | 5 |
| R5 | 10 | XGBoost | 31.49 (13.03) | 5 |
| R6 | 10 | LightGBM | 31.25 (12.24) | 5 |
| R7 | 5 | XGBoost | 32.22 (6.77) | 5 |
aMSE: mean squared error.
bLightGBM: light gradient boosting machine.
cThe best run was R3.
dXGBoost: extreme gradient boosting.
Table 8. Features, their importance, and type for the best extreme gradient boosting model of R3 for the prediction of Working Memory scores.
| Rank | Feature | Importance of feature | Type of feature |
| 1 | NUM | 0.30 | Part of speech |
| 2 | INTJ | 0.20 | Part of speech |
| 3 | NOUN | 0.20 | Part of speech |
| 4 | PRON | 0.13 | Part of speech |
| 5 | ADP | 0.10 | Part of speech |
| 6 | PROPN | 0.07 | Part of speech |
Table 9. Benchmarking the best models from Tables 3, 5, and 7 against a constant model that predicts the mean value of the target variable, for all three prediction tasks.
| Prediction | MSEa of constant model, mean (SD) | MSE of best model, mean (SD) |
| Keep Track | 13.57 (5.37) | 11.81 (4.92) |
| Consonant Updating | 114.77 (45.71) | 97.26 (21.38) |
| Working Memory | 37.81 (14.05) | 30.23 (6.63) |
aMSE: mean squared error.
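The constant baseline in Table 9 corresponds to what scikit-learn calls a DummyRegressor with the "mean" strategy: on each training fold it learns only the mean of the target and predicts that value for every test case. A sketch of the comparison under repeated cross-validation, on toy data and assuming the 10-fold, 10-repeat scheme.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.model_selection import RepeatedKFold, cross_val_score

# Toy stand-in for one cognitive score (e.g., Keep Track).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.normal(loc=20, scale=4, size=200)

# The constant model always predicts the training-fold mean of y.
cv = RepeatedKFold(n_splits=10, n_repeats=10, random_state=42)
scores = cross_val_score(DummyRegressor(strategy="mean"), X, y,
                         cv=cv, scoring="neg_mean_squared_error")
mse = -scores
print(f"Constant-model MSE: {mse.mean():.2f} ({mse.std():.2f})")
```

A learned model is only useful on these tasks to the extent that its mean MSE falls below this baseline, which is what Table 9 shows for all three target variables.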