Merijn Beeksma, Suzan Verberne, Antal van den Bosch, Enny Das, Iris Hendrickx, Stef Groenewoud.
Abstract
BACKGROUND: Life expectancy is one of the most important factors in end-of-life decision making. Good prognostication helps, for example, to determine the course of treatment and to anticipate the procurement of health care services and facilities; more broadly, it facilitates Advance Care Planning. Advance Care Planning improves the quality of the final phase of life by stimulating doctors to explore preferences for end-of-life care with their patients and the people close to them. Physicians, however, tend to overestimate life expectancy and thereby miss the window of opportunity to initiate Advance Care Planning. This research tests the potential of machine learning and natural language processing techniques for predicting life expectancy from electronic medical records.
Keywords: Advance care planning; Clinical free-text; Life expectancy prediction; Long short-term memory
Year: 2019 PMID: 30819172 PMCID: PMC6394008 DOI: 10.1186/s12911-019-0775-2
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Simplified LSTM architecture. At final time step t, x_t represents the feature vector used as input to the hidden LSTM units, which activate output h_t. In each preceding time step, output h functions as an intermediate prediction of life expectancy. We are interested in the final prediction h_t: a probability distribution for the next 50 months
Fig. 2 Probability distributions produced by the baseline model for one patient at different moments in time. From top to bottom, the corresponding actual numbers of months until death are 33, 11, and 3, respectively
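The captions of Figs. 1 and 2 describe the model output as a probability distribution over the next 50 months. The record does not say how raw network outputs are normalized or how a single predicted month is read off that distribution, so the following is only a minimal sketch under two assumptions: that a softmax is used for normalization, and that either the most probable month or the expected month serves as the point estimate. The function names and the example scores are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution (assumed normalization)."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def summarize(distribution):
    """Return (most probable month, expected month) for months numbered 1..N."""
    months = range(1, len(distribution) + 1)
    most_likely = max(months, key=lambda m: distribution[m - 1])
    expected = sum(m * p for m, p in zip(months, distribution))
    return most_likely, expected

# Hypothetical raw scores for 50 months, peaked at month 11.
scores = [-abs(m - 11) / 4 for m in range(1, 51)]
dist = softmax(scores)
most_likely, expected = summarize(dist)
print(most_likely, round(expected, 1))
```

The two summaries can differ noticeably when the distribution is skewed, which is one reason the choice of point estimate matters when comparing predictions against actual life expectancy.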
Deviation in months between actual life expectancy and the predictions of the baseline model
| Root mean square deviation | Mean deviation |
|---|---|
| 17.6 | 6.4 |
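The two measures in the table above can be computed from paired actual and predicted life expectancies. This is a minimal sketch: the sign convention for the mean deviation (predicted minus actual) is an assumption, since the record does not state it, and the predicted values below are hypothetical. Only the actual values 33, 11, and 3 are taken from the caption of Fig. 2.

```python
import math

def deviation_metrics(actual, predicted):
    """Return (root mean square deviation, mean signed deviation) in months."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Assumed sign convention: positive means the prediction is too long.
    diffs = [p - a for a, p in zip(actual, predicted)]
    rmsd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    mean_dev = sum(diffs) / len(diffs)
    return rmsd, mean_dev

actual = [33, 11, 3]      # months until death (from the caption of Fig. 2)
predicted = [30, 15, 8]   # hypothetical model predictions
rmsd, mean_dev = deviation_metrics(actual, predicted)
print(f"RMSD = {rmsd:.2f} months, mean deviation = {mean_dev:.2f} months")
```

Because positive and negative errors cancel in the mean deviation but not in the root mean square deviation, a model can show a small mean deviation while its individual predictions are still far off, as the large gap between the two columns suggests.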
Deviation in months between actual life expectancy and predicted life expectancy for different keyword models
| Selection method | Hidden units | RMS deviation, 100 features | RMS deviation, 200 features | RMS deviation, 300 features | Mean deviation, 100 features | Mean deviation, 200 features | Mean deviation, 300 features |
|---|---|---|---|---|---|---|---|
| Frequency | 50 | 17.6 | 17.2 | 17.0 | 4.5 | 5.0 | 5.8 |
| Frequency | 100 | 17.5 | 17.4 | | 2.1 | 1.2 | |
| Frequency | 200 | 17.7 | 17.8 | 17.8 | 1.6 | 1.3 | 1.0 |
| Entropy | 50 | 17.4 | 17.8 | 17.8 | 5.1 | 5.6 | 5.4 |
| Entropy | 100 | 17.2 | | 17.8 | 2.5 | | 1.6 |
| Entropy | 200 | 17.7 | 17.5 | 17.7 | 2.3 | 2.0 | 1.3 |
| Word2vec | 50 | | 18.2 | 18.2 | | −4.3 | −3.7 |
| Word2vec | 100 | 18.1 | 17.8 | 17.8 | −4.2 | −4.1 | −4.8 |
| Word2vec | 200 | 18.3 | 18.3 | 18.4 | −3.75 | −4.4 | −4.4 |
The models differ from each other in selection method and number of included keywords. The best models are defined by two criteria: 1) a relatively low root mean square deviation, followed by 2) a low mean deviation. The first criterion is leading; the second is used only as a tie-breaker. For each selection method, the results of the best-performing model are marked in boldface based on these criteria
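The two-step selection rule described in this note can be expressed as a lexicographic sort key. The sketch below uses the three entropy models with 200 hidden units from the table as example data. Treating "low mean deviation" as smallest absolute value is an assumption, and the function name is hypothetical.

```python
# Entropy models with 200 hidden units, values taken from the table.
candidates = [
    {"features": 100, "rmsd": 17.7, "mean_dev": 2.3},
    {"features": 200, "rmsd": 17.5, "mean_dev": 2.0},
    {"features": 300, "rmsd": 17.7, "mean_dev": 1.3},
]

def pick_best(models):
    """Lowest root mean square deviation first; ties broken by mean deviation.

    Comparing the absolute mean deviation is an assumption about what
    "low" means for a signed quantity.
    """
    return min(models, key=lambda m: (m["rmsd"], abs(m["mean_dev"])))

best = pick_best(candidates)
print(best["features"])

# The tie-breaker only matters when the leading criterion is equal:
tied = [candidates[0], candidates[2]]
print(pick_best(tied)["features"])
```

A strict minimum is used here for simplicity; the note's wording ("relatively low") leaves room for a looser comparison of the first criterion.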
Evaluation of the quality of the predictions
| Assessor | Accuracy | Overly pessimistic | Overly optimistic |
|---|---|---|---|
| Human | 20% | 17% | 63% |
| Baseline model | 23% | 58% | 20% |
| Frequency model | 29% | 27% | 44% |
| Entropy model | 28% | 46% | 27% |
| Word2vec model | 38% | 32% | 31% |
Predictions were considered accurate if they deviate less than 33% from the actual life expectancy. Results were adopted from [15]. Note: the doctors in [15] estimated life expectancy for a different group of patients than our models do in the current research
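The 33% criterion stated in this note can be sketched as a small classifier. It assumes that "overly optimistic" means the predicted life expectancy is too long and "overly pessimistic" means it is too short, which matches the abstract's remark that physicians tend to overestimate. The function name and the example pairs are hypothetical.

```python
from collections import Counter

def classify(actual, predicted, tolerance=0.33):
    """Label one prediction relative to the actual life expectancy in months."""
    if actual <= 0:
        raise ValueError("actual life expectancy must be positive")
    relative = (predicted - actual) / actual
    if abs(relative) < tolerance:
        return "accurate"
    # Assumed meaning: too long is optimistic, too short is pessimistic.
    return "overly optimistic" if relative > 0 else "overly pessimistic"

# Hypothetical (actual, predicted) pairs in months.
pairs = [(12, 10), (12, 20), (12, 6), (30, 25)]
labels = [classify(a, p) for a, p in pairs]
counts = Counter(labels)
shares = {label: n / len(pairs) for label, n in counts.items()}
print(shares)
```

Because the tolerance is relative, the same absolute error counts as accurate for a long life expectancy and as inaccurate for a short one.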
Evaluation of the quality of the predictions
| Assessor | Accuracy | Overly pessimistic | Overly optimistic |
|---|---|---|---|
| Human | 20% | 17% | 63% |
| Baseline model | 20% | 68% | 12% |
| Keyword model | 29% | 52% | 19% |
Predictions were considered accurate if they deviate less than 33% from the actual life expectancy. The human results were adopted from [15]. Note: the doctors in [15] estimated life expectancy for a different group of patients than our models do in the current research
Results for correlation calculations between several outcome measures
| Tested relations | Hypotheses | Pearson’s r | Significance (p) |
|---|---|---|---|
| Actual vs. predicted life exp. | positive relation | .36 | <.001 |
| Certainty vs. actual life exp. | negative relation | −.35 | <.001 |
| Certainty vs. predicted life exp. | negative relation | −.61 | <.001 |
| Certainty vs. absolute difference between actual and predicted life exp. | negative relation | −.02 | .12 |
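The correlation coefficients in the table above are Pearson correlations between pairs of outcome measures. As a minimal sketch of how such a coefficient is computed, the following uses only the standard definition; the function name and the example values are hypothetical and are not the study's data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical actual and predicted life expectancies in months.
actual = [33, 11, 3, 20, 41]
predicted = [25, 14, 9, 22, 30]
print(round(pearson_r(actual, predicted), 2))
```

The significance column would additionally require a test against the null hypothesis of zero correlation, which this sketch does not cover.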
Fig. 3 Absolute frequency counts for actual and predicted life expectancies, for each month in range 1–50
Fig. 4 Relative certainty as a function of predicted life expectancy