Jackie Szymonifka, Sarah Conderino, Christine Cigolle, Jinkyung Ha, Mohammed Kabeto, Jaehong Yu, John A Dodson, Lorna Thorpe, Caroline Blaum, Judy Zhong.
Abstract
OBJECTIVE: Electronic health records (EHRs) have become a common data source for clinical risk prediction, offering large sample sizes and frequently sampled metrics. There may be notable differences between hospital-based EHR samples and traditional cohort samples. EHR data are often not population-representative random samples, even for particular diseases, because EHR patients tend to be sicker and to have higher healthcare utilization, whereas cohort studies often sample healthier subjects, who are typically more likely to participate. We investigate heterogeneities between EHR- and cohort-based inferences, including incidence rates, identification and quantification of risk factors, and absolute risks.
Keywords: cardiovascular disease; cohort analysis; electronic health records; risk factors; type 2 diabetes mellitus
Year: 2020 PMID: 33623893 PMCID: PMC7886535 DOI: 10.1093/jamiaopen/ooaa059
Source DB: PubMed Journal: JAMIA Open ISSN: 2574-2531
Figure 1. Inclusion criteria and study flow chart for (A) NYULH-EHR T2DM patients and (B) HRS T2DM respondents. (A) Inclusion criteria for the NYULH-EHR cohort. Among patients seen in the New York University Langone Health (NYULH) ambulatory care clinic between 1995 and 2014 who met the eligibility criteria outlined in Supplementary Figure S1, we first limited the analysis cohort to encounters with patients 50 years of age or older. We then restricted the cohort to patients ≥ 50 years of age who had T2DM, as defined in Supplementary Figure S1. We removed patients who already met the criteria for T2DM at their initial encounter, because their date of initial diagnosis could not be reliably estimated. Finally, we removed patients who already met the criteria for a CVD diagnosis, so that subsequent incident CVD cases could be identified. (B) Inclusion criteria for the HRS cohort. Among respondents to the Health and Retirement Study (HRS) between 1992 and 2014, we first limited the analysis cohort to respondents who were 50 years of age or older at one or more interviews. We then restricted the cohort to respondents aged ≥ 50 years with self-reported and subsequently adjudicated T2DM. We also excluded T2DM cases that were self-reported at the first interview, because the date of initial diagnosis could not be reliably estimated. Finally, we removed respondents with self-reported, and subsequently adjudicated, CVD or stroke at or before the first interview at which T2DM was reported, so that subsequent incident CVD cases could be identified.
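The selection steps in panel (A) amount to a sequence of filters over an encounter-level table. The pandas sketch below is only an illustration, not the study's code: the encounter table, its column names (`patient_id`, `date`, `age`, `t2dm`, `cvd`), and the per-encounter boolean flags standing in for the Supplementary Figure S1 definitions are all hypothetical. It also assumes that "already met the criteria for CVD" means CVD at or before T2DM onset, mirroring the HRS rule in panel (B).

```python
import pandas as pd


def select_t2dm_cohort(enc: pd.DataFrame) -> pd.DataFrame:
    """Return one row per eligible patient with a T2DM onset date.

    Hypothetical input: one row per encounter with columns
    patient_id, date (any sortable type), age, t2dm (bool), cvd (bool).
    """
    # Step 1: keep only encounters at age >= 50
    enc = enc[enc["age"] >= 50].sort_values(["patient_id", "date"])

    # Step 2: keep patients who meet the (hypothetical) T2DM flag at any encounter
    has_t2dm = enc.groupby("patient_id")["t2dm"].transform("any")
    enc = enc[has_t2dm]

    # Step 3: drop prevalent T2DM, i.e. T2DM already flagged at the first encounter
    first = enc.groupby("patient_id").head(1)
    prevalent = first.loc[first["t2dm"], "patient_id"]
    enc = enc[~enc["patient_id"].isin(prevalent)]

    # T2DM onset = first encounter carrying the T2DM flag
    onset = enc[enc["t2dm"]].groupby("patient_id")["date"].min().rename("t2dm_onset")
    enc = enc.join(onset, on="patient_id")

    # Step 4 (assumption): drop patients with CVD at or before T2DM onset
    prior_cvd = enc.loc[enc["cvd"] & (enc["date"] <= enc["t2dm_onset"]), "patient_id"]
    enc = enc[~enc["patient_id"].isin(prior_cvd)]

    return onset.loc[onset.index.isin(enc["patient_id"])].reset_index()
```

Each step is kept as a separate filter so that the number of patients remaining after each one can be reported, as in a study flow chart.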
Demographic characteristics of the analyzed NYULH-EHR T2DM patients and the HRS T2DM participants
| Characteristic | NYULH-EHR: Overall (n = 7432) | NYULH-EHR: No CVD (n = 5659) | NYULH-EHR: CVD (n = 1773) | P value | HRS: Overall (n = 3032) | HRS: No CVD (n = 2355) | HRS: CVD (n = 677) | P value |
|---|---|---|---|---|---|---|---|---|
| Age at T2DM onset (years) | 63.54 (10.25) | 62.52 (9.93) | 66.79 (10.58) | <0.001 | 65.61 (8.69) | 65.61 (8.77) | 65.89 (8.38) | 0.99 |
| Sex: female, n (%) | 4495 (60.5) | 3443 (60.8) | 1052 (59.3) | 0.269 | 1684 (55.5) | 1326 (56.3) | 358 (52.9) | 0.12 |
| Race, n (%) | | | | <0.001 | | | | <0.001 |
| Caucasian (White) | 3598 (48.4) | 2489 (44.0) | 1109 (62.5) | | 2142 (70.6) | 1606 (68.2) | 536 (79.2) | |
| African American (Black) | 1427 (19.2) | 1208 (21.3) | 219 (12.4) | | 646 (21.3) | 529 (22.5) | 117 (17.3) | |
| Other | 1903 (25.6) | 1526 (27.0) | 377 (21.3) | | 241 (7.9) | 217 (9.2) | 24 (3.5) | |
| Unknown | 504 (6.8) | 436 (7.7) | 68 (3.8) | | 3 (0.1) | 3 (0.1) | 0 (0.0) | |
| Ethnicity, n (%) | | | | <0.001 | | | | <0.001 |
| Hispanic or Latino | 1498 (20.2) | 1217 (21.5) | 281 (15.8) | | 495 (16.3) | 415 (17.6) | 80 (11.8) | |
| Not Hispanic or Latino | 5049 (67.9) | 3699 (65.4) | 1350 (76.1) | | 2537 (83.7) | 1940 (82.4) | 597 (88.2) | |
| Unknown | 885 (11.9) | 743 (13.1) | 142 (8.0) | | – | – | – | |
| Smoking, n (%) | | | | 0.024 | | | | <0.001 |
| No | 6965 (93.7) | 5324 (94.1) | 1641 (92.6) | | 1734 (57.2) | 1265 (53.7) | 469 (69.3) | |
| Yes | 467 (6.3) | 335 (5.9) | 132 (7.4) | | 456 (15.0) | 336 (14.3) | 120 (17.7) | |
| Unknown | – | – | – | | 842 (27.8) | 754 (32.0) | 88 (13.0) | |
| Body mass index (BMI) (kg/m²) | 31.36 (6.31) | 31.29 (6.27) | 31.60 (6.44) | 0.07 | 30.05 (6.06) | 29.92 (6.00) | 30.47 (6.25) | 0.04 |
| Systolic blood pressure (SBP) (mmHg) | 130.47 (11.67) | 130.57 (11.38) | 130.15 (12.58) | 0.19 | 132.46 (18.60) | 131.74 (18.33) | 134.67 (19.25) | 0.002 |
| High-density lipoprotein (HDL) (mg/dL) | 51.99 (14.79) | 52.43 (14.81) | 50.36 (14.61) | <0.001 | 51.00 (13.84) | 51.49 (13.93) | 49.53 (13.50) | 0.008 |
| Hemoglobin A1c (HbA1c) (%) | 7.18 (1.39) | 7.16 (1.37) | 7.25 (1.48) | 0.018 | 6.71 (1.13) | 6.68 (1.14) | 6.81 (1.11) | 0.03 |
| Albumin (g/dL) | 4.23 (0.33) | 4.25 (0.33) | 4.15 (0.35) | <0.001 | – | – | – | |
| Creatinine (mg/dL) | 0.94 (0.37) | 0.92 (0.34) | 1.02 (0.46) | <0.001 | – | – | – | |
| eGFR (mL/min/1.73 m²) | 74.00 (21.33) | 76.28 (20.58) | 64.77 (21.86) | <0.001 | – | – | – | |
| Cystatin C (mg/L) | – | – | – | | 1.21 (0.54) | 1.16 (0.49) | 1.37 (0.64) | <0.001 |
| C-reactive protein (CRP) (mg/L) | – | – | – | | 4.52 (6.68) | 4.15 (6.34) | 5.62 (7.54) | <0.001 |
| Anti-hypertensive medication use, n (%) | | | | 0.539 | | | | <0.001 |
| No | 1951 (26.3) | 1496 (26.4) | 455 (25.7) | | 76 (2.5) | 75 (3.2) | 1 (0.1) | |
| Yes | 5481 (73.7) | 4163 (73.6) | 1318 (74.3) | | 1938 (63.9) | 1488 (63.2) | 450 (66.5) | |
| Unknown | – | – | – | | 1018 (33.6) | 792 (33.6) | 226 (33.4) | |
| Diabetes medication use, n (%) | | | | <0.001 | | | | <0.001 |
| No | 1957 (26.3) | 1391 (24.6) | 566 (31.9) | | 403 (13.3) | 379 (16.1) | 24 (3.5) | |
| Yes | 5475 (73.7) | 4268 (75.4) | 1207 (68.1) | | 2381 (78.5) | 1802 (76.5) | 579 (85.5) | |
| Unknown | – | – | – | | 248 (8.2) | 174 (7.4) | 74 (10.9) | |
| Atherosclerotic CVD medication use, n (%) | | | | 0.312 | | | | – |
| No | 2748 (37.0) | 2074 (36.6) | 674 (38.0) | | – | – | – | |
| Yes | 4684 (63.0) | 3585 (63.4) | 1099 (62.0) | | – | – | – | |
| Encounters/year, median [IQR] | 6.3 [3.9, 10.4] | 5.6 [3.6, 9.0] | 9.6 [5.9, 18.1] | <0.001 | – | – | – | |
Demographics and risk factors are summarized as means (standard deviations) for continuous measures and as frequencies (percentages) for categorical measures. Group comparisons used the two-sample t-test for continuous measures and the chi-squared test for categorical measures. Each biomarker is summarized as the mean of all available measurements following T2DM onset. Medication use is summarized as a prescription (NYULH-EHR) or self-reported use (HRS) at any encounter or visit following T2DM onset.
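As an illustration of these comparisons, the tests can be approximately reproduced from the published summary statistics with SciPy. This sketch assumes Student's pooled-variance t-test (the note does not say which variant was used) and uses NYULH-EHR group sizes of 5659 (no CVD) and 1773 (CVD), which are derived from the table's counts rather than stated in it. Because the inputs are rounded, the resulting p-values should only approximate the reported 0.07 (BMI) and 0.269 (sex).

```python
import numpy as np
from scipy import stats

# NYULH-EHR group sizes, derived from the counts in the table (assumption)
N_NO_CVD, N_CVD = 5659, 1773

# Two-sample t-test from summary statistics: BMI mean (SD) by CVD status
t_bmi, p_bmi = stats.ttest_ind_from_stats(
    mean1=31.29, std1=6.27, nobs1=N_NO_CVD,
    mean2=31.60, std2=6.44, nobs2=N_CVD,
    equal_var=True,  # Student's pooled-variance test; the variant is an assumption
)

# Chi-squared test of independence: sex by CVD status as a 2x2 table of counts
sex_counts = np.array([
    [3443, N_NO_CVD - 3443],  # No CVD: female, male
    [1052, N_CVD - 1052],     # CVD: female, male
])
# SciPy applies the Yates continuity correction to 2x2 tables by default
chi2, p_sex, dof, expected = stats.chi2_contingency(sex_counts)

print(f"BMI: t = {t_bmi:.2f}, p = {p_bmi:.3f}")
print(f"Sex: chi2 = {chi2:.2f} (df = {dof}), p = {p_sex:.3f}")
```

The published comparisons were computed on patient-level data; working from rounded means, SDs, and counts is only a sanity check.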
Demographic covariate-adjusted Cox models of time to CVD
| Variables | NYULH-EHR: Hazard ratio [95% CI] | NYULH-EHR: P value | HRS: Hazard ratio [95% CI] | HRS: P value | Test for heterogeneity (P value) |
|---|---|---|---|---|---|
| Model 1 | | | | | |
| Age at type 2 diabetes mellitus (T2DM) onset | 1.03 [1.03, 1.04] | <0.001 | 1.03 [1.02, 1.04] | <0.001 | 0.34 |
| Sex (reference = Male) | | | | | |
| Female | 0.79 [0.71, 0.87] | <0.001 | 0.90 [0.77, 1.04] | 0.15 | 0.18 |
| Race (reference = Caucasian) | | | | | |
| African American (Black) | 0.55 [0.47, 0.64] | <0.001 | 0.73 [0.60, 0.90] | 0.003 | 0.03 |
| Other | 0.77 [0.67, 0.89] | <0.001 | 0.55 [0.36, 0.83] | 0.005 | 0.12 |
| Ethnicity (reference = Not Hispanic or Latino) | | | | | |
| Hispanic or Latino | 0.60 [0.52, 0.70] | <0.001 | 0.72 [0.57, 0.92] | 0.009 | 0.21 |
| Smoking (reference = Non-smokers) | | | | | |
| Smokers | 1.49 [1.23, 1.79] | <0.001 | 1.29 [1.05, 1.58] | 0.015 | 0.32 |
| Models 2–11 | | | | | |
| Body mass index (BMI) | 1.19 [1.13, 1.25] | <0.001 | 1.15 [1.06, 1.24] | 0.001 | 0.46 |
| Systolic blood pressure (SBP) | 1.03 [0.97, 1.08] | 0.359 | 1.06 [0.97, 1.16] | 0.19 | 0.52 |
| Total cholesterol (TC) | 1.09 [1.03, 1.16] | 0.003 | 0.95 [0.86, 1.05] | 0.301 | 0.02 |
| High-density lipoprotein (HDL) | 0.84 [0.79, 0.90] | <0.001 | 0.89 [0.80, 0.99] | 0.04 | 0.33 |
| Hemoglobin A1c (HbA1c) | 1.26 [1.19, 1.33] | <0.001 | 1.08 [0.98, 1.18] | 0.11 | 0.01 |
| Albumin | 0.89 [0.84, 0.94] | <0.001 | – | – | – |
| Creatinine | 1.09 [1.05, 1.14] | <0.001 | – | – | – |
| eGFR | 0.77 [0.71, 0.84] | <0.001 | – | – | – |
| Cystatin C | – | – | 1.16 [1.09, 1.23] | <0.001 | – |
| C-reactive protein | – | – | 1.15 [1.09, 1.22] | <0.001 | – |
Model 1 is a Cox model with demographic covariates: age at T2DM diagnosis, sex, race, ethnicity, smoking status, and, for NYULH-EHR, number of encounters. Models 2 to 11 are Cox models that add one biomarker per model to the Model 1 covariates. Each biomarker was modeled separately, adjusting for the Model 1 demographic covariates, because not all biomarkers were available in both datasets. HRs are per 1 standard deviation increase in the continuous covariates. The number of encounters was significant in more models. When multiple measurements of a biomarker were available for a patient over time, the average of all available measurements was used. HRs were estimated treating death as a competing risk. Cochran's Q-test was used to determine whether the adjusted HR estimates from the two datasets differed significantly for each covariate.
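Cochran's Q for two estimates can be computed from the published hazard ratios alone by recovering each log-HR standard error from its 95% CI. The sketch below assumes a normal approximation on the log scale, SE = (ln U - ln L) / (2 × 1.96), and an inverse-variance weighted Q compared with a chi-squared distribution on k - 1 degrees of freedom; it treats the two estimates as independent and is not the authors' code. For the African American row it gives p ≈ 0.03, consistent with the table, but because the published intervals are rounded, other rows (for example, HbA1c) may not match the reported values exactly.

```python
import math

from scipy import stats


def log_hr_and_se(hr, lower, upper, z=1.959964):
    """Recover log(HR) and its standard error from a reported 95% CI."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * z)


def cochran_q(estimates):
    """Cochran's Q test of heterogeneity for a list of (log_hr, se) pairs.

    Q = sum_i w_i * (b_i - b_bar)^2 with inverse-variance weights w_i = 1 / se_i^2,
    referred to a chi-squared distribution with k - 1 degrees of freedom.
    """
    weights = [1 / se ** 2 for _, se in estimates]
    b_bar = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    q = sum(w * (b - b_bar) ** 2 for w, (b, _) in zip(weights, estimates))
    return q, stats.chi2.sf(q, df=len(estimates) - 1)


# African American vs. Caucasian (Model 1): NYULH-EHR vs. HRS
nyulh = log_hr_and_se(0.55, 0.47, 0.64)
hrs = log_hr_and_se(0.73, 0.60, 0.90)
q_race, p_race = cochran_q([nyulh, hrs])

# HbA1c per 1-SD increase (Models 2-11)
q_a1c, p_a1c = cochran_q([log_hr_and_se(1.26, 1.19, 1.33),
                          log_hr_and_se(1.08, 0.98, 1.18)])

print(f"Race (AA): Q = {q_race:.2f}, p = {p_race:.3f}")
print(f"HbA1c:     Q = {q_a1c:.2f}, p = {p_a1c:.3f}")
```

With only two datasets, Q reduces to (b1 - b2)² / (se1² + se2²), the square of a simple z-test for the difference in log-HRs.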
Figure 2. Calibration plots comparing predicted and observed 5-year risks. (A) HRS cohort, Framingham risk score (FRS) predicting total CVD; (B) NYULH-EHR cohort, FRS predicting total CVD; (C) NYULH-EHR cohort, ACC/AHA pooled cohort equations predicting hard CVD; (D) NYULH-EHR cohort, Swedish National Diabetes Register (NDR) equations predicting hard CVD. Risk factors included in the FRS global CVD function and the ACC/AHA function are age, TC, HDL, SBP, treatment for hypertension, smoking, and T2DM status (set to yes for all patients). Sex-specific risk equations were applied to males and females separately. For the ACC/AHA risk score, African American (AA) coefficients were used for AA patients and white coefficients were used for all other patients. To match the available follow-up of the present cohorts, we replaced the 10-year baseline survival estimates with 5-year estimates by assuming exponential survival distributions. The Swedish NDR risk prediction functions include age at T2DM onset, T2DM duration, sex, BMI, smoking, HbA1c, SBP, and use of antihypertensive and lipid-lowering drugs. We divided each cohort into deciles of predicted risk and calculated the mean predicted risk within each decile. We calculated the observed risk within each decile from the Kaplan–Meier estimate and plotted observed against predicted risk.
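The 5-year recalibration and decile-based calibration described in this caption can be sketched with NumPy. Under an exponential survival model S(t) = exp(-λt), the 5-year baseline survival is the square root of the 10-year value. Absolute risk is written in the generic Cox-type form 1 - S0(t)^exp(Σβx - Σβx̄); the published FRS, ACC/AHA, and NDR coefficients and baseline survival values are not reproduced here, so `lin_pred` and `mean_lin_pred` are placeholders. All function names are hypothetical, and the Kaplan–Meier step ignores competing risks.

```python
import numpy as np


def five_year_baseline(s0_10yr):
    """Convert 10-year to 5-year baseline survival under an exponential model.

    S(t) = exp(-lambda * t)  =>  S(5) = S(10) ** (5 / 10).
    """
    return s0_10yr ** 0.5


def predicted_risk(lin_pred, mean_lin_pred, s0):
    """Cox-type absolute risk: 1 - S0(t) ** exp(sum(beta*x) - sum(beta*x_bar))."""
    return 1 - s0 ** np.exp(lin_pred - mean_lin_pred)


def km_risk(time, event, horizon):
    """Observed risk at `horizon` as 1 - Kaplan-Meier survival.

    `time` holds follow-up times; `event` is a boolean array (True = CVD event,
    False = censored).
    """
    surv = 1.0
    for t in np.unique(time[event & (time <= horizon)]):
        at_risk = np.sum(time >= t)
        n_events = np.sum((time == t) & event)
        surv *= 1 - n_events / at_risk
    return 1 - surv


def calibration_by_decile(pred, time, event, horizon=5.0):
    """Rows of (mean predicted risk, KM-observed risk) per decile of predicted risk."""
    edges = np.quantile(pred, np.linspace(0, 1, 11))
    groups = np.clip(np.searchsorted(edges, pred, side="right") - 1, 0, 9)
    rows = []
    for g in range(10):
        in_group = groups == g
        if in_group.any():  # heavy ties in `pred` can leave a decile empty
            rows.append((pred[in_group].mean(),
                         km_risk(time[in_group], event[in_group], horizon)))
    return np.array(rows)
```

In use, `pred` would come from `predicted_risk(...)` with `five_year_baseline(...)` supplying `s0`, and each row of the returned array is one point on a calibration plot (predicted on the x-axis, observed on the y-axis).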