Rita Hamad1, Sepideh Modrek1, Jessica Kubo2, Benjamin A Goldstein2, Mark R Cullen1.
Abstract
BACKGROUND: Investigators across many fields often struggle with how best to capture an individual's overall health status, with options including both subjective and objective measures. With the increasing availability of "big data," researchers can now take advantage of novel metrics of health status. These predictive algorithms were initially developed to forecast and manage expenditures, yet they represent an underutilized tool that could contribute significantly to health research. In this paper, we describe the properties and possible applications of one such "health risk score," the DxCG Intelligence tool.
Year: 2015 PMID: 25951622 PMCID: PMC4423900 DOI: 10.1371/journal.pone.0126054
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Linked datasets employed in this study.
| Dataset | Contents |
|---|---|
| Personnel | Age |
| | Race |
| | Gender |
| | Employment status (e.g., active, on leave)* |
| Claims | International Classification of Diseases codes |
| | Current Procedural Terminology codes |
| | Dates of healthcare encounters |
| National Death Index | Date of death |
| Eligibility | Insurance status |
* This variable was used to determine which employees to include in our sample, i.e., those who were actively employed on January 1, 1996.
Sample characteristics.
| Characteristic | Value |
|---|---|
| Female (%) | 10.1 |
| Age in 1996 (mean ± SD) | 46.7 ± 8.7 |
| Race (%) | |
| White | 87.1 |
| Black | 8.5 |
| Hispanic | 3.7 |
| Other | 0.7 |
| Deaths during 1996–2011 (%) | 8.2 |
| New disease diagnoses during 1996–2011 (%) | 15.9 |
| Diabetes | 39.6 |
| Hypertension | 6.6 |
| Asthma/COPD | 4.7 |
| Depression | 14.6 |
| Risk score | |
| Mean ± SD | 1.12 ± 1.36 |
| Median | 0.79 |
| Min, Max | 0.23, 33.19 |
| Quintiles (Min, Max) | |
| Q1 | 0.23, 0.53 |
| Q2 | 0.53, 0.69 |
| Q3 | 0.69, 0.93 |
| Q4 | 0.93, 1.38 |
| Q5 | 1.38, 33.19 |
Inclusion criteria: Employed on January 1, 1996 with at least one risk score in the period 1996–2011 (N = 14,161). COPD = chronic obstructive pulmonary disease.
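As a rough illustration of how the quintile bounds in the table above could be used to bin a risk score, the sketch below applies the four interior cut points. The function name `risk_quintile` is hypothetical, and the treatment of scores that fall exactly on a boundary (assigned here to the upper quintile) is an assumption: the table lists each boundary value in both adjacent quintiles and does not say which side it belongs to.

```python
from bisect import bisect_right

# Interior cut points taken from the quintile rows of the table above.
CUT_POINTS = [0.53, 0.69, 0.93, 1.38]


def risk_quintile(score: float) -> int:
    """Return the quintile (1-5) for a risk score.

    Assumption: a score exactly equal to a cut point is placed in the
    upper of the two adjacent quintiles.
    """
    return bisect_right(CUT_POINTS, score) + 1
```

For example, the reported median of 0.79 falls between 0.69 and 0.93 and is therefore binned into Q3.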
Fig 1. Risk score distribution in 1996.
Note: For clarity of presentation, we omit observations with a risk score of greater than four (2.1%). Sample includes individuals employed at the firm on January 1, 1996 with at least one risk score during 1996–2011 (N = 14,161).
Risk score correlates.
| Variable | Beta | [95% CI] |
|---|---|---|
| Age (per 10-year increment) | 0.51 | [0.48, 0.54] |
| Female | 0.12 | [0.03, 0.20] |
| Race (Ref = White) | | |
| Black | 0.45 | [0.31, 0.59] |
| Hispanic | 0.078 | [-0.085, 0.24] |
| Other | -0.14 | [-0.51, 0.22] |
| Year | 0.025 | [0.021, 0.030] |
| Constant | -51.74 | [-61.06, -42.41] |
| Observations | 151,931 | |
| Individuals | 13,880 | |
* p < 0.05,
** p < 0.01.
Note: Sample includes individuals employed at the firm on January 1, 1996. Analysis conducted using multivariable linear regression with individual-level random effects. Robust standard errors are clustered at the individual level.
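To make the coefficients in the table above more concrete, the following sketch evaluates the fixed-effects part of the regression for a hypothetical individual. It is an illustration only, resting on several assumptions not stated in the note: that age enters in 10-year units, that Year is the calendar year, and that the individual-level random effect is set to zero. The function name `predicted_risk_score` is made up for this example. Because the Year coefficient is printed to only two significant figures and is multiplied by a value near 2000, the result is very sensitive to rounding and should be read as indicative at best.

```python
# Coefficients as printed in the table above (rounded as published).
COEF = {
    "age_per_10y": 0.51,
    "female": 0.12,
    "black": 0.45,
    "hispanic": 0.078,
    "other": -0.14,
    "year": 0.025,
    "constant": -51.74,
}


def predicted_risk_score(age, year, female=False, race="white"):
    """Fixed-effects linear predictor; the random effect is assumed to be zero."""
    value = COEF["constant"] + COEF["age_per_10y"] * age / 10 + COEF["year"] * year
    if female:
        value += COEF["female"]
    if race != "white":
        # race must be one of "black", "hispanic", "other"
        value += COEF[race]
    return value
```

Under these assumptions, a 50-year-old white man in 1996 has a linear predictor of about -51.74 + 2.55 + 49.9 = 0.71.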
Associations between risk score and new disease diagnosis in subsequent year.
| Coefficient [95% CI] | Asthma | Depression | Diabetes | Hypertension | Ischemic heart disease |
|---|---|---|---|---|---|
| Previous year risk score | 0.00041 | 0.00021 | 0.00047 | -0.000077 | 0.00041 |
| | [0.00026, 0.00057] | [0.000097, 0.00033] | [0.00026, 0.00068] | [-0.00037, 0.00022] | [0.00019, 0.00063] |
| Age | 0.000024 | -0.00017 | 0.00018 | 0.00027 | 0.00050 |
| | [-0.000015, 0.000063] | [-0.00020, -0.00013] | [0.00013, 0.00023] | [0.00018, 0.00035] | [0.00045, 0.00055] |
| Female | 0.0023 | 0.0025 | -0.0030 | -0.0030 | -0.0047 |
| | [0.00098, 0.0036] | [0.0013, 0.0037] | [-0.0044, -0.0017] | [-0.0052, -0.00076] | [-0.0058, -0.0037] |
| Race (ref white) | | | | | |
| Black | -0.00072 | -0.00093 | 0.0067 | 0.010 | -0.0010 |
| | [-0.0019, 0.00048] | [-0.0018, -0.000030] | [0.0045, 0.0089] | [0.0073, 0.013] | [-0.0027, 0.00064] |
| Hispanic | -0.0026 | 0.000030 | 0.0052 | 0.0023 | -0.00071 |
| | [-0.0038, -0.0014] | [-0.0015, 0.0015] | [0.0022, 0.0082] | [-0.0014, 0.0060] | [-0.0030, 0.0016] |
| Other | -0.0021 | -0.0010 | 0.00090 | -0.0062 | -0.0017 |
| | [-0.0055, 0.0012] | [-0.0043, 0.0023] | [-0.0056, 0.0074] | [-0.014, 0.0020] | [-0.0069, 0.0034] |
| Observations | 143,822 | 144,392 | 139,633 | 127,321 | 141,706 |
| Individuals | 13,681 | 13,736 | 13,293 | 12,191 | 13,441 |
* p < 0.05,
** p < 0.01.
Note: Sample includes individuals employed at the firm on January 1, 1996. Analyses are conducted using linear probability models with individual-level random effects, in which an individual’s risk score in one year predicts their likelihood of a new diagnosis of disease in the following year. Standard errors are clustered at the individual level. To be considered a new diagnosis, the individual must have been free of the disease for the first two years of the study. For each of these conditions, individuals with one or more inpatient claims or two or more outpatient claims with a relevant ICD diagnosis code in a 365-day period are considered to have the disease in question. Each model includes dummy variables for year to control for secular trends.
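The lag structure described in this note, in which one year's risk score is paired with a new-diagnosis indicator for the following year, can be sketched as below. This is a minimal illustration of how such person-year rows might be assembled, not the authors' code. The function name `lagged_pairs` and the input layout are hypothetical, and dropping person-years after the diagnosis year is an assumption made here because the individual is then no longer at risk of a new diagnosis.

```python
def lagged_pairs(scores, diagnosis_year):
    """Pair each year's risk score with a new-diagnosis flag for the next year.

    scores: dict mapping calendar year -> risk score for one individual.
    diagnosis_year: year of first diagnosis, or None if never diagnosed.
    Returns a list of (outcome_year, previous_year_score, new_diagnosis).
    Assumption: person-years after the diagnosis year are dropped.
    """
    rows = []
    for year in sorted(scores):
        outcome_year = year + 1
        if diagnosis_year is not None and outcome_year > diagnosis_year:
            break
        new_diagnosis = int(outcome_year == diagnosis_year)
        rows.append((outcome_year, scores[year], new_diagnosis))
    return rows
```

Rows of this form would then be the input to a linear probability model, with the binary flag as the outcome and the previous-year score as the main predictor.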
Fig 2. Kaplan-Meier survival curves for chronic disease diagnoses, by 1996 risk score quintile.
Note: Sample includes individuals employed at the firm on January 1, 1996. For each of these conditions, individuals with one or more inpatient claims or two or more outpatient claims with a relevant ICD diagnosis code in a 365-day period are considered to have a new diagnosis of the disease in question. To rule out prevalent (i.e., existing) cases, we require the individual to have no claims related to the diagnosis for the first two years of the study period. As our dataset includes claims data beginning on January 1, 1996, for each disease we exclude individuals with diagnoses in 1996–1997, such that the earliest possible date of diagnosis for a given disease is January 1, 1998. N = (a) 8,522; (b) 7,641; (c) 8,841; (d) 8,886; (e) 8,665.
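The claims-based case definition and the two-year washout described in this note can be expressed as a short routine. The sketch below is one possible reading, not the authors' implementation. The function names, the `(service_date, setting)` input layout, and the status labels are hypothetical; claims are assumed to be already filtered to the relevant ICD codes; and "in a 365-day period" is interpreted here as two outpatient claims fewer than 365 days apart.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=365)
WASHOUT_END = date(1998, 1, 1)  # earliest possible diagnosis date per the note


def diagnosis_date(claims):
    """Return the date on which the case definition is first met, or None.

    claims: iterable of (service_date, setting), where setting is
    "inpatient" or "outpatient".
    """
    outpatient_dates = []
    for service_date, setting in sorted(claims):
        if setting == "inpatient":
            return service_date  # a single inpatient claim qualifies
        # An outpatient claim qualifies if an earlier one is close enough.
        if any(service_date - earlier < WINDOW for earlier in outpatient_dates):
            return service_date
        outpatient_dates.append(service_date)
    return None


def classify(claims):
    """Return (status, date): "excluded", "incident", or "no_diagnosis"."""
    claims = list(claims)
    # Any relevant claim during 1996-1997 marks a possible prevalent case.
    if any(service_date < WASHOUT_END for service_date, _ in claims):
        return ("excluded", None)
    found = diagnosis_date(claims)
    if found is None:
        return ("no_diagnosis", None)
    return ("incident", found)
```

Returning a status alongside the date keeps excluded (possibly prevalent) individuals distinct from those who simply never met the definition.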
Cox proportional hazards models for incident disease diagnosis and mortality.
| Hazard Ratio [95% CI] | Asthma | Depression | Diabetes | Hypertension | Ischemic heart disease | Mortality |
|---|---|---|---|---|---|---|
| 1996 risk score | 1.09 | 1.07 | 1.09 | 1.05 | 1.10 | 1.21 |
| | [1.03, 1.14] | [0.99, 1.15] | [1.06, 1.13] | [1.01, 1.08] | [1.06, 1.14] | [1.19, 1.24] |
| Female | 1.51 | 1.63 | 0.62 | 0.83 | 0.36 | 0.67 |
| | [1.20, 1.92] | [1.26, 2.11] | [0.50, 0.77] | [0.73, 0.94] | [0.27, 0.50] | [0.50, 0.91] |
| Race (ref white) | | | | | | |
| Black | 0.76 | 0.85 | 1.81 | 1.87 | 0.94 | 1.29 |
| | [0.54, 1.07] | [0.56, 1.27] | [1.51, 2.18] | [1.64, 2.13] | [0.75, 1.17] | [1.02, 1.63] |
| Hispanic | 0.40 | 1.09 | 1.91 | 1.00 | 0.90 | 1.10 |
| | [0.20, 0.78] | [0.65, 1.82] | [1.47, 2.49] | [0.81, 1.23] | [0.65, 1.25] | [0.74, 1.64] |
| Other | 0.63 | 0.84 | 1.05 | 0.86 | 1.22 | 0.72 |
| | [0.16, 2.52] | [0.21, 3.38] | [0.50, 2.23] | [0.47, 1.55] | [0.55, 2.72] | [0.27, 1.93] |
| Observations | 8,841 | 8,886 | 8,522 | 7,641 | 8,665 | 9,012 |
* p < 0.05,
** p < 0.01.
Note: Sample includes individuals employed at the firm on January 1, 1996. To be considered a new diagnosis, the individual must have been free of the disease for the first two years of the study. For each of these conditions, individuals with one or more inpatient claims or two or more outpatient claims with a relevant ICD diagnosis code in a 365-day period are considered to have the disease in question. Each model includes dummy variables to control for age group at baseline (20–30 years old, 30–40 years old, etc.). Individuals were censored at the last date that they were active at the firm based on the personnel dataset. For mortality, individuals were censored at September 1, 2011, after which we do not have data on mortality.
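The censoring rules in this note determine each individual's follow-up time and event indicator. A minimal sketch of that bookkeeping follows. The function name `follow_up` is hypothetical, and the time origin is left as a parameter because the note does not state it.

```python
from datetime import date


def follow_up(start, event_date, censor_date):
    """Return (days_at_risk, event_observed) for one individual.

    start: time origin (an assumption; not specified in the note).
    event_date: date of diagnosis or death, or None if none was recorded.
    censor_date: last date active at the firm for diagnoses, or
        September 1, 2011 for mortality.
    """
    if event_date is not None and event_date <= censor_date:
        return (event_date - start).days, True
    # No event, or an event after censoring: treated as censored.
    return (censor_date - start).days, False
```

For mortality, `censor_date` would be `date(2011, 9, 1)` for everyone, so a death recorded after that date would count as a censored observation rather than an event.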
Cox proportional hazards models for incident disease diagnosis and mortality, by 1996 risk score quintiles.
| Hazard Ratio [95% CI] | Asthma | Depression | Diabetes | Hypertension | Ischemic heart disease | Mortality |
|---|---|---|---|---|---|---|
| Risk score quintile (ref = Q1) | | | | | | |
| Q2 | 1.50 | 1.39 | 1.18 | 1.31 | 1.43 | 1.32 |
| | [1.09, 2.07] | [0.98, 1.95] | [0.97, 1.45] | [1.15, 1.48] | [1.12, 1.84] | [0.92, 1.91] |
| Q3 | 1.91 | 1.96 | 1.37 | 1.20 | 1.67 | 1.08 |
| | [1.39, 2.62] | [1.40, 2.74] | [1.12, 1.68] | [1.05, 1.36] | [1.31, 2.14] | [0.74, 1.56] |
| Q4 | 2.32 | 2.28 | 1.70 | 1.39 | 2.35 | 1.35 |
| | [1.69, 3.20] | [1.61, 3.22] | [1.38, 2.08] | [1.22, 1.59] | [1.85, 2.99] | [0.94, 1.94] |
| Q5 | 2.75 | 2.18 | 1.91 | 1.43 | 2.73 | 2.24 |
| | [1.98, 3.81] | [1.50, 3.17] | [1.54, 2.37] | [1.24, 1.65] | [2.13, 3.49] | [1.57, 3.19] |
| Female | 1.27 | 1.33 | 0.56 | 0.78 | 0.31 | 0.63 |
| | [0.99, 1.62] | [1.02, 1.74] | [0.45, 0.69] | [0.68, 0.88] | [0.23, 0.43] | [0.46, 0.85] |
| Race (ref white) | | | | | | |
| Black | 0.75 | 0.84 | 1.81 | 1.86 | 0.95 | 1.29 |
| | [0.54, 1.06] | [0.56, 1.26] | [1.51, 2.18] | [1.63, 2.12] | [0.76, 1.19] | [1.02, 1.63] |
| Hispanic | 0.38 | 1.07 | 1.92 | 1.00 | 0.87 | 1.05 |
| | [0.20, 0.76] | [0.64, 1.80] | [1.48, 2.49] | [0.81, 1.23] | [0.63, 1.21] | [0.70, 1.57] |
| Other | 0.67 | 0.85 | 1.06 | 0.88 | 1.44 | 1.03 |
| | [0.17, 2.70] | [0.21, 3.43] | [0.50, 2.25] | [0.48, 1.58] | [0.64, 3.21] | [0.39, 2.77] |
| Observations | 8,841 | 8,886 | 8,522 | 7,641 | 8,665 | 9,012 |
* p < 0.05,
** p < 0.01.
Note: Sample includes individuals employed at the firm on January 1, 1996. To be considered a new diagnosis, the individual must have been free of the disease for the first two years of the study. For each of these conditions, individuals with one or more inpatient claims or two or more outpatient claims with a relevant ICD diagnosis code in a 365-day period are considered to have the disease in question. Models include dummy variables to control for age group at baseline (20–30 years old, 30–40 years old, etc.). Individuals were censored at the last date that they were active at the firm based on the personnel dataset. For mortality, individuals were censored at September 1, 2011, after which we do not have data on mortality.
Fig 3. Kaplan-Meier survival curve for mortality, by 1996 risk score quintile.
Note: Sample includes individuals employed at the firm on January 1, 1996. Individuals were censored at September 1, 2011, after which we do not have data on mortality. N = 9,012.
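For readers unfamiliar with how curves of this kind are computed, the following is a minimal sketch of the standard Kaplan-Meier (product-limit) estimator. It is a generic textbook version, not the authors' code, and the function name `kaplan_meier` is hypothetical. To produce one curve per quintile, it would be applied separately to the observations in each risk score quintile.

```python
def kaplan_meier(observations):
    """Product-limit estimate of the survival function.

    observations: iterable of (time, event_observed) pairs, where
    event_observed is False for censored individuals.
    Returns a list of (time, survival_probability), one entry per
    distinct time at which at least one event occurred.
    """
    observations = sorted(observations)
    at_risk = len(observations)
    survival = 1.0
    curve = []
    i = 0
    while i < len(observations):
        t = observations[i][0]
        events = 0
        leaving = 0
        # Group all individuals whose follow-up ends at time t.
        while i < len(observations) and observations[i][0] == t:
            events += int(observations[i][1])
            leaving += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Events and censored individuals both leave the risk set.
        at_risk -= leaving
    return curve
```

With four individuals followed for 1, 2, 3, and 4 time units, of whom the first and third have an event, the estimate drops to 0.75 at time 1 and to 0.375 at time 3, since only two individuals remain at risk by then.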