Jessica Pinaire, Etienne Chabert, Jérôme Azé, Sandra Bringay, Paul Landais.
Abstract
Predicting a medical outcome from a trajectory of care has attracted considerable interest in medical research. In sequence prediction, models based on machine learning (ML) techniques have proven more efficient than other models. Reducing model complexity remains a challenge, and pattern mining techniques have been proposed as a solution. Building on these results, we developed a new method to extract sets of relevant event sequences for predicting medical events, and applied it to predicting the risk of in-hospital mortality in acute coronary syndrome (ACS). We mined sequential patterns from the French Hospital Discharge Database and integrated them into several predictive models, using a text string distance to measure the similarity between patients' patterns of care. We evaluated combinations of similarity measures and commonly used ML models. A Support Vector Machine model coupled with an edit-based distance was the most effective. Discrimination was good, with areas under the receiver operating characteristic curve ranging from 0.71 to 0.99, along with good overall accuracy. These results demonstrate the value of sequential patterns for event prediction and could be a first step toward a decision-support tool for preventing in-hospital death from ACS.
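The abstract describes measuring similarity between patients' patterns of care with a text string distance, with the edit-based variant performing best. The sketch below illustrates one way such a measure could look: a standard Levenshtein distance over sequences of event codes. Encoding a trajectory as a list of ICD-10 codes and normalizing by the longer length are assumptions made for illustration, not details taken from the paper.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two sequences of event codes."""
    prev = list(range(len(b) + 1))  # distance from empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Similarity in [0, 1]; the normalization is an illustrative assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


# Hypothetical trajectories (not taken from the study data).
patient_a = ["R07", "I20", "I21"]
patient_b = ["R07", "I21"]
print(edit_distance(patient_a, patient_b))    # 1 (one deletion)
print(round(edit_similarity(patient_a, patient_b), 3))
```

A matrix of such pairwise similarities could then feed a kernel-based or distance-based classifier, which is one plausible reading of how a similarity measure is "coupled" with a model.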
Year: 2021 PMID: 34122784 PMCID: PMC8172301 DOI: 10.1155/2021/5531807
Source DB: PubMed Journal: J Healthc Eng ISSN: 2040-2295 Impact factor: 2.682
Figure 1. The contextual hierarchy. The number (n) of patients is displayed with the number of deaths in parentheses.
Figure 2. Data flow chart.
Example of patients' sequential database.
| Patient | Age (years) | Sex | January | February | March | April |
|---|---|---|---|---|---|---|
| | >65 | Man | **R07** | **I20** | | |
| | >65 | Man | **R07** | **I20** | | |
| | >65 | Man | **R07** | R07 | **I20** | |
| | >65 | Man | I25 | **R07** | **I20** | |
| | >65 | Man | I21 | **R07** | **I20** | |
| | >65 | Woman | I20 | R07 | | |
| | >65 | Woman | **R07** | **I20** | R07 | |
| | >65 | Woman | I21 | **R07** | **I20** | |
| | 45–65 | Man | **R07** | R07 | **I20** | |
| | 45–65 | Man | I20, I25, I21 | | | |
| | 45–65 | Man | I20, I21 | R07 | | |
| | 45–65 | Woman | I50 | I20, I25, I21 | R07 | |
| | 45–65 | Woman | I20, I21, I50 | | | |
| | 45–65 | Woman | I20 | R07 | I50 | |
The <(R07) (I20)> pattern appears in bold, with contextual information on sex and age. This pattern appeared mainly in individuals aged >65 years; only one individual aged 45–65 years was included.
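To make the notion of a sequential pattern and its support concrete, the sketch below checks whether a pattern such as <(R07) (I20)> occurs in a patient's monthly sequence and computes the share of patients containing it. The toy database, the function names, and the representation (one set of ICD-10 codes per month) are hypothetical; this is a generic containment check, not the mining algorithm used in the study.

```python
from typing import List, Set

Trajectory = List[Set[str]]  # one set of ICD-10 codes per month


def contains(sequence: Trajectory, pattern: Trajectory) -> bool:
    """True if each itemset of `pattern` is a subset of a distinct itemset
    of `sequence`, in the same order (greedy left-to-right matching)."""
    pos = 0
    for itemset in pattern:
        # Advance to the next month that includes the whole itemset.
        while pos < len(sequence) and not itemset <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next itemset must match a strictly later month
    return True


def support(database: List[Trajectory], pattern: Trajectory) -> float:
    """Percentage of trajectories in `database` that contain `pattern`."""
    hits = sum(contains(seq, pattern) for seq in database)
    return 100.0 * hits / len(database)


# Toy database for illustration only.
toy_db: List[Trajectory] = [
    [{"R07"}, {"I20"}],
    [{"I20"}, {"R07"}],
    [{"I20", "I25", "I21"}],
    [{"I50"}, {"I20", "I25", "I21"}, {"R07"}],
    [{"R07"}, {"R07"}, {"I20"}],
]
pattern = [{"R07"}, {"I20"}]
print(support(toy_db, pattern))  # 40.0 (2 of 5 trajectories)
```

Restricting the database to one context (for example, men aged >65 years) before calling `support` would give a context-specific support value.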
Two examples of the most frequently mined contextual sequential patterns in ACS trajectories together with their corresponding support.
| Sequential pattern | Support |
|---|---|
| | |
| <(Chronic ischemic heart disease)> | 42.4 |
| <(Angina pectoris)> | 32.6 |
| <(AMI)> | 29.5 |
| <(Angina pectoris) (angina pectoris)> | 6.1 |
| <(Angina pectoris) (chronic ischemic heart disease)> | 4.4 |
| <(Chronic ischemic heart disease) (chronic ischemic heart disease) (chronic ischemic heart disease)> | 1.8 |
| | |
| <(AMI)> | 45.4 |
| <(Angina pectoris)> | 25.2 |
| <(Chronic ischemic heart disease)> | 24.5 |
| <(Chronic ischemic heart disease) (chronic ischemic heart disease)> | 2.7 |
| <(AMI) (chronic ischemic heart disease)> | 1.9 |
| <(AMI) (AMI)> | 1.9 |
Means of area under the ROC curve (AURC), F-measure, and error rate for the different types of models and similarities in the modeling of ICD-10 code trajectories.
| Model | Similarity | AURC mean | AURC 95% CI | F-measure mean | F-measure 95% CI | Error rate mean | Error rate 95% CI |
|---|---|---|---|---|---|---|---|
| NB | Edition | 0.77 | 0.68–0.86 | 0.70 | 0.62–0.82 | 0.26 | 0.16–0.34 |
| NB | q-gram | 0.72 | 0.64–0.77 | 0.64 | 0.58–0.70 | 0.33 | 0.28–0.38 |
| NB | Heuristic | 0.73 | 0.64–0.82 | 0.66 | 0.60–0.77 | 0.32 | 0.24–0.39 |
| KNN | Edition | 0.44 | 0.38–0.53 | 0.58 | 0.53–0.63 | 0.38 | 0.35–0.43 |
| KNN | q-gram | 0.50 | 0.45–0.55 | 0.57 | 0.52–0.61 | 0.40 | 0.37–0.44 |
| KNN | Heuristic | 0.54 | 0.46–0.59 | 0.55 | 0.52–0.65 | 0.41 | 0.38–0.46 |
| Tree | Edition | 0.74 | 0.66–0.83 | 0.66 | 0.56–0.79 | 0.28 | 0.19–0.35 |
| Tree | q-gram | 0.67 | 0.62–0.71 | 0.63 | 0.57–0.70 | 0.34 | 0.30–0.39 |
| Tree | Heuristic | 0.70 | 0.64–0.80 | 0.65 | 0.57–0.77 | 0.31 | 0.22–0.38 |
| LR | Edition | 0.77 | 0.68–0.88 | 0.70 | 0.62–0.83 | 0.27 | 0.16–0.35 |
| LR | q-gram | 0.75 | 0.65–0.82 | 0.69 | 0.62–0.77 | 0.29 | 0.23–0.38 |
| LR | Heuristic | 0.74 | 0.64–0.82 | 0.69 | 0.62–0.80 | 0.30 | 0.21–0.39 |
| SVM | Edition | 0.83 | 0.76–0.92 | 0.70 | 0.61–0.82 | | 0.16–0.33 |
| SVM | q-gram | 0.80 | 0.72–0.89 | 0.66 | 0.60–0.73 | 0.31 | 0.26–0.37 |
| SVM | Heuristic | | 0.77–0.92 | 0.70 | 0.64–0.81 | 0.27 | 0.20–0.36 |
| ANN | Edition | 0.82 | 0.72–0.94 | 0.70 | 0.59–0.85 | | 0.14–0.33 |
| ANN | q-gram | 0.81 | 0.72–0.90 | 0.70 | 0.62–0.79 | 0.28 | 0.21–0.37 |
| ANN | Heuristic | 0.83 | 0.71–0.96 | | 0.63–0.86 | 0.26 | 0.14–0.35 |
CI = confidence interval. Best results are in bold.
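For readers less familiar with the three metrics reported in this table, the following sketch computes them on a small made-up example. It uses the standard definitions: the AURC as the proportion of (death, survivor) pairs ranked correctly, the F-measure as the harmonic mean of precision and recall on the positive class, and the error rate as the share of misclassified patients. The 0.5 decision threshold, the choice of death as the positive class, and the data are assumptions for illustration; the source does not state how these were set.

```python
from typing import Sequence, Tuple


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison; ties count one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f_measure_and_error_rate(y_true: Sequence[int],
                             y_pred: Sequence[int]) -> Tuple[float, float]:
    """F-measure on the positive class and overall error rate."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0
    error_rate = (fp + fn) / len(y_true)
    return f1, error_rate


# Made-up outcomes (1 = in-hospital death) and model scores.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

print(auc(y_true, scores))  # 7 of 9 pairs are ranked correctly
print(f_measure_and_error_rate(y_true, y_pred))
```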
Distribution (%) of the best combinations (model, similarity) according to the type of trajectories.
| Similarity | Tree (ICD-10) | LR (ICD-10) | SVM (ICD-10) | ANN (ICD-10) | Tree (DRG) | LR (DRG) | SVM (DRG) | ANN (DRG) |
|---|---|---|---|---|---|---|---|---|
| Edition | — | 2.86 | | 17.14 | 5.56 | — | | 20.37 |
| q-gram | — | 5.71 | 5.71 | 2.86 | — | — | — | 1.85 |
| Heuristic | 5.71 | — | 11.43 | 5.71 | — | 3.70 | 11.11 | 3.70 |
Average ranking (%) of the best models across all contexts and similarities.
| Rank | NB (ICD-10) | KNN (ICD-10) | Tree (ICD-10) | LR (ICD-10) | SVM (ICD-10) | ANN (ICD-10) | NB (DRG) | KNN (DRG) | Tree (DRG) | LR (DRG) | SVM (DRG) | ANN (DRG) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1st | — | — | — | 4.35 | 73.91 | 21.74 | — | — | — | 17.39 | 34.78 | 56.54 |
| 2nd | 4.35 | — | 4.35 | 4.35 | 17.39 | 69.57 | — | — | 8.70 | 21.74 | 34.78 | 34.78 |
| 3rd | 30.43 | — | 13.04 | 43.48 | 8.70 | 4.35 | 30.43 | — | 13.04 | 26.09 | 21.74 | 4.35 |
Internal validation: AURC, error rate, numbers of predicted, and observed deaths by context according to the type of trajectory.
| Context | AURC (DRG) | Error rate (DRG) | Observed (DRG) | Predicted (DRG) | AURC (ICD-10) | Error rate (ICD-10) | Observed (ICD-10) | Predicted (ICD-10) |
|---|---|---|---|---|---|---|---|---|
| Man and >65 years and ≤5 stays | 0.93 | 0.11 | 16 | 12.6 | 0.98 | 0.08 | 16 | 14 |
| Woman and 45–65 years | 0.96 | 0.05 | 15 | 14.8 | | 0.07 | 15 | 13.4 |
| >65 years and ≤5 stays | 0.91 | 0.09 | 16 | 14.6 | 0.85 | 0.14 | 16 | 13.4 |
| Man and 45–65 years and ≤5 stays | | 0.04 | 16 | 15.4 | | 0.05 | 16 | 15.6 |
| 45–65 years and ≤5 stays | | | 16 | 15.2 | 0.98 | 0.05 | 16 | 15.6 |
| Woman and >65 years and >5 stays | 0.87 | 0.22 | 20 | 16.8 | 0.93 | 0.2 | 20 | 18.4 |
| Woman and >65 years | 0.92 | 0.16 | 26 | 22.6 | 0.87 | 0.18 | 26 | 24.6 |
| Man and ≤5 stays | 0.97 | 0.07 | 14 | 14 | 0.97 | | 14 | 13.8 |
| ≤5 stays | 0.97 | | 20 | 19 | | 0.04 | 20 | 19 |
| Woman and >5 stays | 0.93 | 0.19 | 24 | 16.8 | 0.96 | 0.20 | 24 | 20 |
| Woman | 0.93 | 0.13 | 30 | 24.2 | 0.94 | 0.14 | 30 | 26.8 |
| Man and 45–65 years and >5 stays | 0.89 | 0.27 | 22 | 20.4 | 0.86 | 0.27 | 22 | 17.2 |
| Man and >65 years and >5 stays | 0.82 | 0.29 | 46 | 32 | 0.84 | 0.28 | 46 | 34.2 |
| 45–65 years and >5 stays | 0.86 | 0.27 | 24 | 19.8 | 0.78 | 0.39 | 24 | 16.2 |
| Man and 45–65 years | 0.93 | 0.07 | 15 | 14.2 | 0.78 | 0.35 | 26 | 20.4 |
| Man and >65 years | 0.84 | 0.25 | 56 | 42.4 | 0.80 | 0.28 | 56 | 44.2 |
| 45–65 years | 0.82 | 0.26 | 30 | 22 | 0.82 | 0.29 | 30 | 18.6 |
| >65 years and >5 stays | 0.71 | 0.32 | 66 | 31.6 | 0.71 | 0.33 | 66 | 63.6 |
| >65 years | 0.76 | 0.26 | 82 | 56.2 | 0.74 | 0.27 | 82 | 48.4 |
| Man and >5 stays | 0.79 | 0.31 | 70 | 56.4 | 0.74 | 0.33 | 70 | 59.2 |
| Man | 0.78 | 0.30 | 84 | 94.6 | 0.92 | 0.23 | 84 | 79.4 |
| >5 stays | 0.82 | 0.28 | 92 | 88 | 0.82 | 0.26 | 92 | 75.2 |
| General | 0.81 | 0.25 | 114 | 69.4 | 0.81 | 0.25 | 114 | 88.6 |
Best results are in bold.
External validation: AURC and Brier Score by context according to the type of trajectory.
| Context | AURC (DRG) | Brier Score (DRG) | AURC (ICD-10) | Brier Score (ICD-10) |
|---|---|---|---|---|
| Man and >65 years and ≤5 stays | 0.87 | 0.11 | 0.82 | 0.15 |
| Woman and 45–65 years | 0.65 | 0.12 | 0.65 | 0.11 |
| >65 years and ≤5 stays | 0.90 | 0.09 | 0.88 | |
| Man and 45–65 years and ≤5 stays | | | 0.81 | 0.11 |
| 45–65 years and ≤5 stays | 0.96 | 0.05 | | 0.14 |
| Woman and >65 years and >5 stays | 0.77 | 0.19 | 0.74 | 0.19 |
| Woman and >65 years | 0.75 | 0.14 | 0.81 | 0.16 |
| Man and ≤5 stays | 0.95 | 0.08 | 0.89 | 0.14 |
| ≤5 stays | 0.94 | 0.05 | 0.87 | 0.16 |
| Woman and >5 stays | 0.67 | 0.21 | 0.79 | 0.20 |
| Woman | 0.83 | 0.13 | 0.82 | 0.16 |
| Man and 45–65 years and >5 stays | 0.80 | 0.18 | 0.65 | 0.24 |
| 45–65 years and >5 stays | 0.78 | 0.18 | 0.57 | 0.26 |
| Man and >65 years and >5 stays | 0.68 | 0.23 | 0.75 | 0.19 |
| Man and 45–65 years | 0.80 | 0.14 | 0.76 | 0.24 |
| Man and >65 years | 0.76 | 0.14 | 0.80 | 0.18 |
| 45–65 years | 0.74 | 0.22 | 0.81 | 0.17 |
| >65 years and >5 stays | 0.70 | 0.17 | 0.70 | 0.24 |
| >65 years | 0.79 | 0.15 | 0.72 | 0.17 |
| Man and >5 stays | 0.77 | 0.20 | 0.74 | 0.21 |
| Man | 0.76 | 0.24 | 0.85 | 0.17 |
| >5 stays | 0.73 | 0.21 | 0.80 | 0.19 |
| General | 0.82 | 0.16 | 0.79 | 0.17 |
Best results are in bold.
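The Brier Score reported in the external validation is, in its standard form, the mean squared difference between the predicted probability of the event and the observed outcome (0 or 1), so lower values indicate better calibrated and more accurate predictions. A minimal sketch of that definition follows; the probabilities and outcomes are invented for illustration and are not drawn from the study.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(probs) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


# Invented example: 1 = in-hospital death, 0 = survival.
outcomes = [1, 0, 0, 1]
predicted = [0.8, 0.1, 0.3, 0.6]
print(round(brier_score(outcomes, predicted), 3))  # 0.075
```

A model that always predicts 0.5 scores 0.25 under this definition, which gives a rough reference point when reading the table.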
Figure 3. External validation: area under ROC curve and Brier Score according to the type of trajectory and context size. (a) Area under ROC curve. (b) Brier Score.