Adane Tarekegn1, Fulvio Ricceri2,3, Giuseppe Costa2,3, Elisa Ferracin3, Mario Giacobini4.
Abstract
BACKGROUND: Frailty is one of the most critical age-related conditions in older adults. It is often recognized as a syndrome of physiological decline in late life, characterized by a marked vulnerability to adverse health outcomes. A clear operational definition of frailty, however, has not yet been agreed upon. There is a wide range of studies on the detection of frailty and its association with mortality. Several of these studies have focused on the possible risk factors associated with frailty in the elderly population, while predicting who will be at increased risk of frailty is still overlooked in clinical settings.
Keywords: classification; elderly people; frailty; genetic programming; imbalanced dataset; machine learning; predictive modeling
Year: 2020 PMID: 32442149 PMCID: PMC7303829 DOI: 10.2196/16678
Source DB: PubMed Journal: JMIR Med Inform
Description of output variables in the dataset.
| Variable | Code | Value, n (%) |
| | | |
| No | 0 | 1,053,790 (96.18) |
| Yes | 1 | 41,823 (3.82) |
| | | |
| No | 0 | 1,088,124 (99.32) |
| Yes | 1 | 7489 (0.68) |
| | | |
| No | 0 | 1,064,186 (97.13) |
| Yes | 1 | 31,427 (2.87) |
| | | |
| No | 0 | 1,088,530 (99.35) |
| Yes | 1 | 7083 (0.65) |
| | | |
| No | 0 | 1,056,695 (96.45) |
| Yes | 1 | 38,918 (3.55) |
| | | |
| No | 0 | 1,076,541 (98.26) |
| Yes | 1 | 19,072 (1.74) |
ED: emergency department.
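The counts in the table show how strongly each outcome is skewed toward the "No" class. As a small worked example, the sketch below takes the first pair of counts from the table and derives the positive-class share, the imbalance ratio, and the accuracy of a trivial classifier that always predicts "No". The variable names are illustrative and not taken from the paper.

```python
# Counts copied from the first outcome in the table (No = 0, Yes = 1).
negatives = 1_053_790
positives = 41_823

total = negatives + positives
prevalence = 100 * positives / total        # share of the positive class, in percent
imbalance_ratio = negatives / positives     # negative cases per positive case

# A model that always answers "No" is right for every negative case
# and wrong for every positive one, so its true positive rate is 0.
baseline_accuracy = 100 * negatives / total

print(f"total records      = {total:,}")
print(f"positive class     = {prevalence:.2f}%")
print(f"imbalance ratio    = {imbalance_ratio:.1f} : 1")
print(f"always-No accuracy = {baseline_accuracy:.2f}%")
```

The always-"No" baseline reaches the same percentage as the "No" row of the table, which is why accuracy alone is a weak guide on data like these and why the result tables also report true positive and true negative rates.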
Figure 1. Evaluation metrics.
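The figure itself is not reproduced on this page. The result tables report accuracy, true positive rate (TPR), true negative rate (TNR), and F1-score, so the sketch below computes those four quantities from confusion-matrix counts using their standard definitions. The function name and the example counts are hypothetical and serve only as an illustration.

```python
def evaluation_metrics(tp, fp, tn, fn):
    """Return accuracy, TPR, TNR, and F1-score from confusion-matrix counts."""
    def ratio(numerator, denominator):
        # Guard against empty classes instead of raising ZeroDivisionError.
        return numerator / denominator if denominator else 0.0

    tpr = ratio(tp, tp + fn)          # sensitivity / recall
    tnr = ratio(tn, tn + fp)          # specificity
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "tpr": tpr,
        "tnr": tnr,
        "f1": ratio(2 * precision * tpr, precision + tpr),
    }


# Hypothetical counts, not taken from the paper.
scores = evaluation_metrics(tp=80, fp=25, tn=75, fn=20)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Reporting TPR and TNR side by side makes it visible when a model buys a high score on one class at the expense of the other.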
Figure 2. Experimental workflow of the predictive machine learning model.
The most important variables in the mortality and fracture problems.
| Rank | Mortality problem: variable | P value | Fracture problem: variable | P value |
| 1 | Age | <.001 | Age | <.001 |
| 2 | Charlson index | <.001 | Femur fracture | <.001 |
| 3 | # urgent hospitalization | <.001 | # urgent hospitalization | <.001 |
| 4 | # total hospitalization | <.001 | Neck fracture | <.001 |
| 5 | Invalidity | <.001 | Green code | <.001 |
| 6 | # nontraumatic | <.001 | # total hospitalization | <.001 |
| 7 | Disability | <.001 | Charlson index | <.001 |
| 8 | Poly prescriptions | <.001 | Poly prescriptions | <.001 |
| 9 | Green code | <.001 | Invalidity | <.001 |
| 10 | Yellow code | <.001 | Disability | <.001 |
| 11 | Blood | <.001 | Nerve disease | <.001 |
| 12 | Anemia | <.001 | Depression | <.001 |
| 13 | Circulatory disease | <.001 | Blood | <.001 |
| 14 | Respiratory disease | <.001 | Anemia | <.001 |
| 15 | Urinary tract disease | <.001 | Yellow code | <.001 |
Figure 3. Train accuracy (left) and test accuracy (right) for mortality data without any parameter tuning, using all feature subsets (from the top 3 to the top 58 features). The left plot shows that random forest and decision tree overfit the training data and generalize poorly to the test data as the number of features increases.
Figure 4. Train accuracy (left) and test accuracy (right) for fracture data without any parameter tuning, using all feature subsets (from the top 3 to the top 58 features). The left plot shows that random forest and decision tree overfit the training data and generalize poorly to the test data as the number of features increases.
Prediction performance using true positive rate and true negative rate for the six problems.
| Problem | SVMa TPRf | SVM TNRg | RFb TPR | RF TNR | ANNc TPR | ANN TNR | DTd TPR | DT TNR | GPe TPR | GP TNR |
| Mortality | 0.78 | 0.78 | 0.79 | 0.77 | 0.79 | 0.78 | 0.60 | 0.79 | 0.75 | 0.76 |
| Disability | 0.78 | 0.72 | 0.78 | 0.71 | 0.75 | 0.75 | 0.78 | 0.69 | 0.71 | 0.67 |
| Fracture | 0.75 | 0.74 | 0.77 | 0.72 | 0.77 | 0.72 | 0.79 | 0.66 | 0.70 | 0.73 |
| Urgent hospitalization | 0.61 | 0.73 | 0.65 | 0.68 | 0.66 | 0.68 | 0.64 | 0.68 | 0.66 | 0.62 |
| Preventable hospitalization | 0.74 | 0.73 | 0.73 | 0.72 | 0.73 | 0.73 | 0.76 | 0.66 | 0.73 | 0.64 |
| ED admissionh,i | 0.63 | 0.73 | 0.63 | 0.72 | 0.63 | 0.74 | 0.62 | 0.73 | 0.73 | 0.63 |
aSVM: support vector machine.
bRF: random forest.
cANN: artificial neural network.
dDT: decision tree.
eGP: genetic programming.
fTPR: true positive rate.
gTNR: true negative rate.
hED: emergency department.
iwith a red code.
Results of Wilcoxon signed-rank test in terms of P values.
| Problem/dataset | SVMa vs GPb | RFc vs GP | NNd vs GP | DTe vs GP |
| Mortality | <.001 | .003 | .001 | <.001 |
| Fracture | <.001 | .02 | <.001 | .002 |
| Disability | .06 | .004 | .01 | .003 |
| Urgent hospitalization | .71 | .01 | .37 | .01 |
| Preventable hospitalization | .68 | .03 | .87 | .005 |
| Accessing the EDf with a red code | .006 | <.001 | .01 | <.001 |
aSVM: support vector machine.
bGP: genetic programming.
cRF: random forest.
dNN: neural network.
eDT: decision tree.
fED: emergency department.
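The table reports P values from pairwise Wilcoxon signed-rank tests but this page does not say how they were computed. The sketch below is one way to run such a test on paired per-fold scores, assuming 10 pairs per comparison and an exact two-sided P value obtained by enumerating all sign patterns. The function name and the scores are hypothetical; in practice a library routine such as `scipy.stats.wilcoxon` would normally be used.

```python
from itertools import product


def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for small paired samples."""
    # Paired differences, rounded to absorb float noise; zero differences are dropped.
    diffs = [round(a - b, 6) for a, b in zip(x, y)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    statistic = min(w_plus, w_minus)

    # Under the null hypothesis every sign pattern is equally likely,
    # so count the patterns that are at least as extreme as the observed one.
    total = sum(ranks)
    extreme = 0
    for signs in product((0, 1), repeat=n):
        plus = sum(r for r, s in zip(ranks, signs) if s)
        if min(plus, total - plus) <= statistic:
            extreme += 1
    return statistic, extreme / 2 ** n


# Hypothetical per-fold scores for two models (not taken from the paper).
model_a = [0.78, 0.79, 0.77, 0.80, 0.78, 0.79, 0.76, 0.78, 0.79, 0.77]
model_b = [0.75, 0.76, 0.75, 0.76, 0.74, 0.77, 0.75, 0.74, 0.76, 0.76]
statistic, p_value = wilcoxon_signed_rank(model_a, model_b)
print(f"W = {statistic}, P = {p_value:.4f}")
```

Full enumeration is only practical for small samples, since the number of sign patterns doubles with every additional pair.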
Figure 5. The score of five models across 10 validation samples on the mortality problem.
Figure 6. The score of five models across 10 validation samples on the fracture problem.
Prediction results of models using a 10-fold cross-validation.
| Models | Accuracy | TPRa | TNRb | F1-score |
| | | | | |
| ANNc | 0.78 | 0.81 | 0.76 | 0.79 |
| SVMd | 0.79 | 0.77 | 0.80 | 0.78 |
| RFe | 0.78 | 0.79 | 0.76 | 0.76 |
| LRf | 0.78 | 0.78 | 0.79 | 0.78 |
| DTg | 0.75 | 0.80 | 0.70 | 0.76 |
| | | | | |
| ANN | 0.75 | 0.77 | 0.73 | 0.75 |
| SVM | 0.75 | 0.77 | 0.74 | 0.75 |
| RF | 0.75 | 0.78 | 0.72 | 0.76 |
| LR | 0.75 | 0.75 | 0.75 | 0.75 |
| DT | 0.74 | 0.76 | 0.72 | 0.74 |
| | | | | |
| ANN | 0.74 | 0.76 | 0.71 | 0.75 |
| SVM | 0.75 | 0.78 | 0.73 | 0.76 |
| RF | 0.75 | 0.77 | 0.72 | 0.75 |
| LR | 0.75 | 0.76 | 0.73 | 0.74 |
| DT | 0.73 | 0.78 | 0.70 | 0.75 |
aTPR: true positive rate.
bTNR: true negative rate.
cANN: artificial neural network.
dSVM: support vector machine.
eRF: random forest.
fLR: logistic regression.
gDT: decision tree.
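The table above summarizes a 10-fold cross-validation, but this page does not describe how the folds were built. The sketch below shows one common way to do it on imbalanced labels: a stratified split that spreads each class evenly over the folds. The stratification, the function name, and the toy labels are all assumptions made for illustration.

```python
import random


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs with class shares kept per fold."""
    rng = random.Random(seed)

    # Group record indices by class label.
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    # Shuffle within each class, then deal the indices out to the folds in turn.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    # Each fold serves once as the held-out test set.
    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(
            index
            for fold_number, fold in enumerate(folds)
            if fold_number != held_out
            for index in fold
        )
        yield train, test


# Toy labels with a 4% positive class (hypothetical, not the study data).
labels = [1] * 40 + [0] * 960
splits = list(stratified_k_fold(labels, k=10))
for train, test in splits[:2]:
    positives = sum(labels[i] for i in test)
    print(f"train = {len(train)}, test = {len(test)}, positives in test = {positives}")
```

Without stratification, a fold drawn from a rare outcome can end up with very few positive cases, which makes per-fold TPR estimates unstable.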
Prediction results of models using a 10-fold cross-validation procedure.
| Models | Accuracy | TPRa | TNRb | F1-score |
| | | | | |
| ANNc | 0.67 | 0.64 | 0.71 | 0.66 |
| SVMd | 0.75 | 0.77 | 0.73 | 0.76 |
| RFe | 0.66 | 0.65 | 0.67 | 0.66 |
| LRf | 0.67 | 0.72 | 0.62 | 0.65 |
| DTg | 0.66 | 0.65 | 0.67 | 0.65 |
| | | | | |
| ANN | 0.74 | 0.73 | 0.74 | 0.73 |
| SVM | 0.74 | 0.71 | 0.76 | 0.73 |
| RF | 0.73 | 0.73 | 0.74 | 0.73 |
| LR | 0.74 | 0.71 | 0.76 | 0.73 |
| DT | 0.72 | 0.73 | 0.71 | 0.72 |
| | | | | |
| ANN | 0.70 | 0.65 | 0.74 | 0.67 |
| SVM | 0.68 | 0.64 | 0.72 | 0.66 |
| RF | 0.68 | 0.66 | 0.70 | 0.67 |
| LR | 0.69 | 0.64 | 0.74 | 0.67 |
| DT | 0.67 | 0.70 | 0.65 | 0.68 |
aTPR: true positive rate.
bTNR: true negative rate.
cANN: artificial neural network.
dSVM: support vector machine.
eRF: random forest.
fLR: logistic regression.
gDT: decision tree.
hED: emergency department.