Jenna M Reps, Peter R Rijnbeek, Patrick B Ryan.
Abstract
INTRODUCTION: US claims data contain medical data on large, heterogeneous populations and are excellent sources for medical research. However, some claims data do not contain complete death records, which limits their use for mortality or mortality-related studies. A model that predicts whether a patient died at the end of the follow-up time (referred to as the end of observation) is needed to enable mortality-related studies.
Year: 2019 PMID: 31054141 PMCID: PMC6834730 DOI: 10.1007/s40264-019-00827-0
Source DB: PubMed Journal: Drug Saf ISSN: 0114-5916 Impact factor: 5.606
Characteristics of the development target population and the validation datasets
| Characteristic, % (unless stated) | OPTUM DOD test/train (development), non-dead | OPTUM DOD test/train (development), dead (all deaths) | OPTUM validation, non-dead | OPTUM validation, dead (death at discharge) | CCAE (validation), non-dead | CCAE (validation), dead (death at discharge) | MDCD (validation), non-dead | MDCD (validation), dead (death at discharge) | MDCR (validation), non-dead | MDCR (validation), dead (death at discharge) |
|---|---|---|---|---|---|---|---|---|---|---|
| Sex: female | 50.6 | 49.8 | 50.1 | 42.3 | 51.7 | 45 | 60.5 | 63.8 | 51.9 | 47.2 |
| Acute respiratory disease | 26.2 | 48.0 | 19.6 | 40.8 | 21.6 | 58.3 | 25.3 | 50.2 | 15.2 | 57 |
| Chronic liver disease | 1 | 4.4 | 0.5 | 3 | 0.5 | 9.2 | 0.4 | 6.1 | 0.7 | 1.9 |
| Chronic obstructive lung disease | 2.1 | 36.5 | 1.1 | 29.8 | 0.6 | 22.8 | 1.6 | 38.4 | 7.2 | 38.2 |
| Dementia | 0.8 | 23.7 | 0.3 | 14.8 | 0.1 | 1.8 | 0.5 | 31.2 | 2.2 | 16.7 |
| Hypertensive disorder | 18.3 | 76.0 | 10 | 73.1 | 10 | 50.4 | 6.9 | 70.8 | 44.9 | 64.5 |
| Obesity | 4.1 | 7.6 | 2.2 | 11.4 | 2.2 | 9.9 | 2.9 | 10.9 | 2.6 | 4 |
| Osteoarthritis | 8.0 | 34.9 | 4.3 | 35.3 | 4.2 | 18 | 2.7 | 30.1 | 19.3 | 27.4 |
| Pneumonia | 2.0 | 39.2 | 1.2 | 25.6 | 1.1 | 36 | 2 | 38.7 | 3.3 | 44.7 |
| Renal impairment | 2.0 | 46.6 | 0.8 | 34.2 | 0.4 | 35.9 | 1 | 43.5 | 5.3 | 44.7 |
| Heart disease | 8.3 | 75.9 | 4.6 | 79.3 | 4 | 71.1 | 3.9 | 76.1 | 29.4 | 87.4 |
| Heart failure | 1.4 | 42.0 | 0.7 | 34.4 | 0.3 | 23.9 | 1 | 41.4 | 5.2 | 52 |
| Malignant neoplastic disease | 3.7 | 37.5 | 1.8 | 20.3 | 1.7 | 53.5 | 0.8 | 18.9 | 13.9 | 31.3 |
| Charlson Comorbidity Index, mean | 0.9 | 7.5 | 0 | 6 | 0 | 7 | 0 | 6 | 2 | 7 |
| Age (years), mean | 37.4 | 75.3 | 33 | 70 | 32 | 54 | 20 | 70 | 73 | 82 |
CCAE IBM MarketScan® Commercial Database, DOD date of death, MDCD IBM MarketScan® Multi-State Medicaid Database, MDCR IBM MarketScan® Medicare Supplemental Database, OPTUM Optum© De-Identified Clinformatics® Data Mart Database
Fig. 1 Scatter plot of variable means for people with a death recorded within 61 days of the end of observation (y axis) vs. people without a death recorded within 61 days of the end of observation (x axis). Green points are variables included in the trained model; blue points are variables not included in the trained model.
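The comparison in Fig. 1 can be outlined by labelling each person and then averaging each binary covariate within the two groups, giving one (x, y) point per variable. The sketch below is a minimal illustration, not the authors' code: the record layout (`end`, `death`, `covariates`), the helper names, and the reading of "within 61 days" as an absolute difference between the death date and the end of observation are all assumptions.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# Assumption: "within 61 days" is read as an absolute date difference.
DEATH_WINDOW_DAYS = 61


def died_at_end(end_of_observation: date, death_date: Optional[date]) -> bool:
    """Label: True if a death is recorded within DEATH_WINDOW_DAYS of the end of observation."""
    if death_date is None:
        return False
    return abs((death_date - end_of_observation).days) <= DEATH_WINDOW_DAYS


def variable_means(patients: List[dict], variables: List[str]) -> Dict[str, Tuple[float, float]]:
    """Map each binary covariate to (mean without death, mean with death), i.e. an (x, y) point."""
    dead = [p for p in patients if died_at_end(p["end"], p["death"])]
    alive = [p for p in patients if not died_at_end(p["end"], p["death"])]

    def mean(group: List[dict], var: str) -> float:
        if not group:
            return float("nan")
        # A covariate missing from a record is treated as absent (0).
        return sum(p["covariates"].get(var, 0) for p in group) / len(group)

    return {v: (mean(alive, v), mean(dead, v)) for v in variables}


# Hypothetical toy records, for illustration only.
patients = [
    {"end": date(2015, 3, 1), "death": date(2015, 3, 20), "covariates": {"heart_failure": 1}},
    {"end": date(2015, 6, 1), "death": None, "covariates": {"heart_failure": 0}},
    {"end": date(2016, 1, 1), "death": None, "covariates": {"heart_failure": 1}},
    # Death recorded two years after the end of observation, so it does not count.
    {"end": date(2014, 1, 1), "death": date(2016, 1, 1), "covariates": {}},
]
print(variable_means(patients, ["heart_failure"]))
```

In this toy example heart failure lands at x = 1/3, y = 1.0, well above the diagonal, which is the kind of pattern a variable strongly associated with death would show in such a plot.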
Fig. 2 Receiver operating characteristic curve and calibration plots for the internal validation
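The two quantities in Fig. 2 can be illustrated in plain Python: the area under the ROC curve (AUC) as the probability that a randomly chosen person who died receives a higher predicted risk than a randomly chosen person who did not (ties count one half), and a calibration plot as mean predicted risk against observed death rate within risk-ordered groups. This is a generic sketch with hypothetical function names and toy data, not the study's evaluation code; the pairwise loop is quadratic and is meant for clarity only, whereas large datasets would use a rank-based formula or a library such as scikit-learn.

```python
from typing import List, Tuple


def auc(risks: List[float], labels: List[int]) -> float:
    """AUC as P(risk of a random positive > risk of a random negative), ties counted as 0.5."""
    pos = [r for r, y in zip(risks, labels) if y == 1]
    neg = [r for r, y in zip(risks, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_bins(
    risks: List[float], labels: List[int], n_bins: int = 10
) -> List[Tuple[float, float, int]]:
    """Sort by predicted risk, split into n_bins groups of nearly equal size, and
    return (mean predicted risk, observed event rate, group size) for each group."""
    pairs = sorted(zip(risks, labels))
    size = len(pairs)
    bins = []
    for b in range(n_bins):
        chunk = pairs[b * size // n_bins:(b + 1) * size // n_bins]
        if not chunk:
            continue
        mean_pred = sum(r for r, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        bins.append((mean_pred, observed, len(chunk)))
    return bins


# Hypothetical toy predictions (risk) and outcomes (1 = died).
risks = [0.02, 0.05, 0.10, 0.40, 0.70, 0.95]
labels = [0, 0, 1, 0, 1, 1]
print(auc(risks, labels))
print(calibration_bins(risks, labels, n_bins=2))
```

A well-calibrated model gives groups whose mean predicted risk is close to the observed rate, so the points of a calibration plot lie near the diagonal.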
Performance at various prediction thresholds, which future epidemiology studies can use to choose the threshold at which an end of observation is classed as due to death
| Finding people who are dead: prediction threshold | Sensitivity of death (%) | Specificity of death (%) | Positive predictive value of death (%) | Proportion of target population |
|---|---|---|---|---|
| If you choose people with a predicted risk greater than … | |||||
| 0.9 | 26.170 | 99.901 | 86.898 | 0.007 | |
| 0.5 | 61.895 | 99.474 | 74.754 | 0.020 | |
| 0.1 | 90.282 | 98.149 | 55.095 | 0.040 | |
| If you choose by sensitivity … | |||||
| 0.666 | 50 | 99.676 | 79.549 | 0.015 | |
| 0.1046 | 90 | 98.184 | 55.489 | 0.040 | |
| 0.0019 | 99 | 69.321 | 7.510 | 0.324 | |
| If you choose by specificity… | |||||
| 0.253 | 78.608 | 99 | 66.515 | 0.029 | |
| 0.905 | 25.599 | 99.9 | 87.174 | 0.007 | |
| 0.990 | 7.207 | 99.99 | 95.054 | 0.002 | |
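Each row of the table above summarises the same two-by-two classification at one cut-off. The sketch below shows how the columns relate, assuming that a person is flagged as dead when the predicted risk is strictly greater than the threshold, that sensitivity, specificity, and PPV are reported as percentages, and that the last column is the flagged fraction of the target population. The function names and toy data are hypothetical. `threshold_for_sensitivity` mimics the "If you choose by sensitivity" rows by scanning candidate cut-offs from high to low and returning the first one that reaches the target.

```python
from typing import Dict, List, Optional


def metrics_at_threshold(risks: List[float], died: List[int], threshold: float) -> Dict[str, float]:
    """Flag predicted risk > threshold as dead; return metrics in the table's units."""
    tp = fp = tn = fn = 0
    for r, y in zip(risks, died):
        flagged = r > threshold
        if flagged and y:
            tp += 1
        elif flagged:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    n = tp + fp + tn + fn
    nan = float("nan")
    return {
        "sensitivity_pct": 100 * tp / (tp + fn) if tp + fn else nan,
        "specificity_pct": 100 * tn / (tn + fp) if tn + fp else nan,
        "ppv_pct": 100 * tp / (tp + fp) if tp + fp else nan,
        "proportion_flagged": (tp + fp) / n,
    }


def threshold_for_sensitivity(
    risks: List[float], died: List[int], target_pct: float
) -> Optional[float]:
    """Largest candidate threshold whose 'risk > threshold' rule reaches the target sensitivity."""
    candidates = sorted(set(risks), reverse=True)
    candidates.append(float("-inf"))  # flags everyone, so sensitivity is 100%
    for t in candidates:
        # Sensitivity only grows as t decreases, so the first hit is the largest threshold.
        if metrics_at_threshold(risks, died, t)["sensitivity_pct"] >= target_pct:
            return t
    return None  # no deaths in the data, so sensitivity is undefined


# Hypothetical toy data.
risks = [0.01, 0.02, 0.03, 0.2, 0.6, 0.9, 0.95, 0.05]
died = [0, 0, 0, 1, 0, 1, 1, 0]
print(metrics_at_threshold(risks, died, 0.5))
print(threshold_for_sensitivity(risks, died, 90))
```

Lowering the threshold trades specificity and PPV for sensitivity and flags a larger share of the population, which is the trade-off the rows of the table lay out.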
Performance at various prediction thresholds, which future epidemiology studies can use to choose the threshold at which an end of observation is classed as not due to death
| Finding people who are still alive: prediction threshold | Sensitivity of alive (%) | Specificity of alive (%) | Positive predictive value of alive (%) | Proportion of target population |
|---|---|---|---|---|
| If you choose people with a predicted risk less than … | |||||
| 0.5 | 99.474 | 61.895 | 99.046 | 0.980 | |
| 0.1 | 98.149 | 90.282 | 99.752 | 0.960 | |
| 0.01 | 91.351 | 97.196 | 99.923 | 0.892 | |
| If you choose by sensitivity … | |||||
| 0.00103 | 50 | 99.658 | 99.983 | 0.488 | |
| 0.0085 | 90 | 97.375 | 99.927 | 0.879 | |
| 0.252 | 99 | 78.624 | 99.460 | 0.971 | |
| If you choose by specificity … | |||||
| 0.104 | 98.177 | 90 | 99.747 | 0.960 | |
| 0.00196 | 70.264 | 99 | 99.964 | 0.686 | |
| 0.00055 | 34.636 | 99.9 | 99.993 | 0.338 | |
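The two tables are mirror images at a shared cut-off $t$: classing risk $< t$ as alive is the complement of classing risk $> t$ as dead (ignoring risks exactly equal to $t$). Writing $P(t)$ for the proportion of the target population flagged and $\pi$ for the proportion who died, the identities below follow from the definitions of the metrics, with $P_{\text{death}}(t) = (TP + FP)/N$ and $TP = \mathrm{Sens}_{\text{death}}(t)\,\pi N$. They can be checked against the rows, for example at $t = 0.5$ the sensitivity of alive (99.474%) equals the specificity of death (99.474%) and $0.980 = 1 - 0.020$.

$$
\begin{aligned}
\mathrm{Sens}_{\text{alive}}(t) &= \mathrm{Spec}_{\text{death}}(t), \qquad
\mathrm{Spec}_{\text{alive}}(t) = \mathrm{Sens}_{\text{death}}(t), \\
\mathrm{PPV}_{\text{alive}}(t) &= \mathrm{NPV}_{\text{death}}(t), \qquad
P_{\text{alive}}(t) = 1 - P_{\text{death}}(t), \\
P_{\text{death}}(t) &= \frac{\mathrm{Sens}_{\text{death}}(t)\,\pi}{\mathrm{PPV}_{\text{death}}(t)}
\;\Rightarrow\;
\pi = \frac{P_{\text{death}}(t)\,\mathrm{PPV}_{\text{death}}(t)}{\mathrm{Sens}_{\text{death}}(t)}
\approx \frac{0.020 \times 0.74754}{0.61895} \approx 0.024 .
\end{aligned}
$$

The last line suggests that roughly 2.4% of the target population died (the $t = 0.1$ row gives about the same value); because the proportions are rounded to three decimals, this is only approximate.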
Fig. 3 Percentage of final ends of observation per year that are due to a recorded death or imputed as due to death by the DEAD model, for each database, 2006–16. CCAE IBM MarketScan® Commercial Database, MDCD IBM MarketScan® Multi-State Medicaid Database, MDCR IBM MarketScan® Medicare Supplemental Database, OPTUM Optum© De-Identified Clinformatics® Data Mart Database
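A per-year summary like Fig. 3 can be computed by counting, for each calendar year in which observation ends, the ends that are either recorded deaths or imputed as deaths by the model. In this sketch the record fields (`end_year`, `recorded_death`, `risk`), the function name, and the rule "predicted risk strictly greater than a chosen threshold" (for instance one of the cut-offs in the tables above) are assumptions rather than the paper's exact procedure.

```python
from collections import defaultdict
from typing import Dict, List


def percent_death_per_year(records: List[dict], threshold: float) -> Dict[int, float]:
    """Percent of ends of observation per year that are recorded deaths or
    imputed deaths (predicted risk > threshold), keyed by year in ascending order."""
    totals: Dict[int, int] = defaultdict(int)
    deaths: Dict[int, int] = defaultdict(int)
    for rec in records:
        year = rec["end_year"]
        totals[year] += 1
        if rec["recorded_death"] or rec["risk"] > threshold:
            deaths[year] += 1
    return {y: 100 * deaths[y] / totals[y] for y in sorted(totals)}


# Hypothetical toy records.
records = [
    {"end_year": 2006, "recorded_death": True, "risk": 0.3},    # recorded death
    {"end_year": 2006, "recorded_death": False, "risk": 0.95},  # imputed death
    {"end_year": 2006, "recorded_death": False, "risk": 0.01},
    {"end_year": 2007, "recorded_death": False, "risk": 0.2},
]
print(percent_death_per_year(records, threshold=0.5))
```

Plotting these percentages per database over time would show whether the share of ends attributed to death stays stable once imputed deaths are added to recorded ones.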
Death can be incompletely recorded in US claims data, which can limit drug safety studies that use these datasets.

We present a model that predicts whether the end of observation was due to death in US claims data, with an area under the receiver operating characteristic curve of 0.986.

The model is available online and can be readily applied to any dataset in the Observational Medical Outcomes Partnership (OMOP) common data model.