Saeed Al-Azazi, Alexander Singer, Rasheda Rabbani, Lisa M Lix.
Abstract
BACKGROUND: Administrative health records (AHRs) and electronic medical records (EMRs) are two key sources of population-based data for disease surveillance, but misclassification errors in the data can bias disease estimates. Methods that combine information from error-prone data sources can build on the strengths of AHRs and EMRs. We compared bias and error for four data-combining methods and applied them to estimate hypertension prevalence.
Keywords: Administrative data; Electronic medical records; Misclassification bias; Prevalence; Statistical model
Year: 2019 PMID: 31266516 PMCID: PMC6604278 DOI: 10.1186/s12911-019-0845-5
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Hypertension case ascertainment algorithms from administrative health records (AHRs) and electronic medical records (EMRs)
| Data source | Contact frequency, source and duration | ICD 9-CM/10-CA diagnosis codes | ATC medication codes |
|---|---|---|---|
| AHR | 1 + H or 2 + P in 2 years | ICD-9-CM: 401–405; ICD-10-CA: I10–I13, I15 | |
| EMR | (2 + P in 2 years) or 1 + PL or 1 + Rx ever | ICD-9-CM: 401–405 | C07AB04, C09XA02, C03DB01, C08CA01, C07AB03, C07CB03, C09AA07, C09AA01, C07AG02, C03BA04, C09AA08, C09AA02, C09BA02, C09CA02, C09DA02, C08CA02, C09AA09, C03AA03, C03EA01, C03BA11, C09CA04, C09DA04, C09AA03, C09BA03, C09DA01, C02LB01, C03BA08, C09CA07, C07AA06, C09AA10, C03DB02, C09CA03, C08DA01 |
H Hospital discharge abstract; P Physician billing claim; PL Problem list; Rx Drug codes; ICD-9-CM International Classification of Diseases, 9th Revision, Clinical Modification; ICD-10-CA International Classification of Diseases, 10th Revision, Canadian version; ATC Anatomical Therapeutic Chemical classification system
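The two case definitions in the table can be read as simple predicates over one person's records. The sketch below is one possible reading, not the authors' implementation. It assumes that "n +" means "n or more", that "in 2 years" means two qualifying physician claims no more than 730 days apart, and that every input has already been filtered to the listed ICD and ATC codes. All function and parameter names are hypothetical.

```python
from datetime import date

def two_claims_within_window(claim_dates, window_days=730):
    """True if any two claims fall within window_days of each other."""
    ordered = sorted(claim_dates)
    # Checking adjacent pairs is enough: the closest pair is always adjacent.
    return any((b - a).days <= window_days for a, b in zip(ordered, ordered[1:]))

def meets_ahr_definition(hospital_dates, physician_dates, window_days=730):
    """AHR rule: 1+ hospital abstract OR 2+ physician claims in the window."""
    if len(hospital_dates) >= 1:
        return True
    return two_claims_within_window(physician_dates, window_days)

def meets_emr_definition(billing_dates, n_problem_list, n_prescriptions,
                         window_days=730):
    """EMR rule: 2+ billing claims in the window OR 1+ problem-list entry
    OR 1+ qualifying prescription at any time."""
    if n_problem_list >= 1 or n_prescriptions >= 1:
        return True
    return two_claims_within_window(billing_dates, window_days)

# Illustrative records (made up for this sketch).
print(meets_ahr_definition([], [date(2015, 1, 1), date(2016, 6, 1)]))
print(meets_emr_definition([date(2015, 1, 1)], 0, 0))
```

Under this reading, a single physician claim never qualifies on its own, while a single hospital abstract, problem-list entry, or prescription does.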
Percent absolute relative bias (RB) and mean squared error (MSE) for computer simulation study
| | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 0.00 | 9.5 | 47.5 | 7.5 | 9.5 | 11.3 | 23.1 | 59.4 | 1.3 | 48.3 | 49.5 |
| | | | | | | | 22.9 | 59.4 | 1.5 | 41.7 | 54.3 |
| | 0.50 | 10.1 | 47.2 | 7.0 | 24.3 | 21.1 | 23.7 | 59.1 | 0.9 | 99.0 | 78.6 |
| 18, 10% | 0.00 | 0.3 | 58.6 | 18.2 | 1.1 | 3.0 | 10.8 | 67.1 | 12.8 | 28.9 | 37.5 |
| | 0.20 | 0.9 | 58.9 | 18.7 | 5.9 | 5.9 | 10.5 | 67.2 | 13.0 | 31.1 | 54.3 |
| | 0.50 | 0.2 | 58.3 | 17.7 | 48.8 | 26.1 | 11.2 | 66.8 | 12.4 | 108.8 | 90.1 |
| 15, 15% | 0.00 | 4.1 | 49.2 | 11.5 | 3.7 | 4.3 | 17.6 | 61.7 | 5.8 | 41.3 | 42.5 |
| | 0.20 | 3.6 | 49.4 | 12.0 | 3.1 | 3.6 | 17.2 | 61.9 | 6.1 | 37.6 | 50.1 |
| | 0.50 | 4.8 | 48.7 | 10.9 | 20.5 | 12.5 | 18.1 | 61.5 | 5.3 | 102.0 | 70.7 |

| | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 0.00 | 0.04 | 0.90 | 0.02 | 0.06 | 0.47 | 0.22 | 1.41 | < 0.01 | 0.99 | 1.76 |
| | | | | | | | 0.21 | 1.41 | < 0.01 | 0.82 | 2.25 |
| | 0.50 | 0.04 | 0.89 | 0.02 | 1.06 | 1.47 | 0.23 | 1.40 | 0.00 | 4.68 | 4.46 |
| 18, 10% | 0.00 | < 0.01 | 1.37 | 0.13 | 0.02 | 0.69 | 0.05 | 1.80 | 0.07 | 0.40 | 1.70 |
| | 0.20 | < 0.01 | 1.39 | 0.14 | 0.06 | 1.16 | 0.05 | 1.80 | 0.07 | 0.70 | 3.48 |
| | 0.50 | < 0.01 | 1.36 | 0.13 | 2.28 | 2.32 | 0.05 | 1.79 | 0.06 | 5.36 | 6.16 |
| 15, 15% | 0.00 | 0.01 | 0.97 | 0.05 | 0.03 | 0.46 | 0.13 | 1.53 | 0.01 | 0.74 | 1.62 |
| | 0.20 | 0.01 | 0.98 | 0.06 | 0.02 | 0.72 | 0.12 | 1.53 | 0.02 | 0.74 | 2.20 |
| | 0.50 | 0.01 | 0.95 | 0.05 | 1.03 | 1.29 | 0.13 | 1.51 | 0.01 | 4.84 | 3.96 |

| | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 8, 7% | 0.00 | 11.5 | 55.8 | 8.4 | 9.7 | 35.0 | 29.6 | 67.1 | 1.1 | 76.2 | 154.9 |
| | 0.20 | 10.5 | 56.1 | 9.2 | 42.6 | 37.8 | 28.7 | 67.4 | 0.3 | 196.7 | 217.1 |
| | 0.50 | 13.1 | 54.9 | 7.1 | 216.1 | 43.3 | 30.7 | 66.6 | 2.0 | 307.6 | 286.5 |
| 8, 5% | 0.00 | 2.9 | 59.7 | 16.0 | 1.1 | 50.5 | 12.2 | 73.4 | 13.5 | 45.3 | 114.9 |
| | 0.20 | 3.2 | 59.7 | 15.8 | 10.7 | 85.1 | 12.0 | 73.9 | 13.8 | 235.2 | 273.4 |
| | 0.50 | 3.8 | 59.2 | 15.3 | 230.5 | 198.2 | 14.2 | 73.3 | 12.1 | 322.2 | 334.8 |
| 5, 5% | 0.00 | 14.7 | 70.0 | 30.9 | 6.3 | 92.1 | 7.4 | 78.7 | 28.4 | 61.0 | 193.0 |
| | 0.20 | 15.4 | 70.2 | 31.5 | 134.4 | 149.1 | 8.0 | 78.7 | 28.8 | 271.0 | 217.9 |
| | 0.50 | 13.6 | 69.5 | 30.0 | 275.7 | 222.1 | 6.3 | 78.2 | 27.4 | 333.8 | 375.0 |

| | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 8, 7% | 0.00 | 0.01 | 0.31 | 0.01 | 0.28 | 2.01 | 0.09 | 0.45 | < 0.01 | 0.87 | 6.33 |
| | 0.20 | 0.01 | 0.31 | 0.01 | 1.29 | 1.31 | 0.08 | 0.45 | < 0.01 | 5.19 | 8.53 |
| | 0.50 | 0.02 | 0.30 | 0.01 | 5.60 | 6.72 | 0.10 | 0.44 | < 0.01 | 9.97 | 12.77 |
| 8, 5% | 0.00 | < 0.01 | 0.36 | 0.03 | 0.12 | 2.78 | 0.02 | 0.54 | 0.02 | 0.28 | 4.31 |
| | 0.20 | < 0.01 | 0.36 | 0.03 | 0.59 | 3.59 | 0.02 | 0.55 | 0.02 | 7.28 | 12.63 |
| | 0.50 | < 0.01 | 0.35 | 0.02 | 6.39 | 8.45 | 0.02 | 0.54 | 0.02 | 10.91 | 16.73 |
| 5, 5% | 0.00 | 0.02 | 0.49 | 0.10 | 0.57 | 6.96 | 0.01 | 0.62 | 0.08 | 1.92 | 9.37 |
| | 0.20 | 0.02 | 0.49 | 0.10 | 4.82 | 8.11 | 0.01 | 0.62 | 0.08 | 9.11 | 9.70 |
| | 0.50 | 0.02 | 0.48 | 0.09 | 8.71 | 10.08 | < 0.01 | 0.61 | 0.08 | 11.79 | 18.41 |
OR Rule-based OR method, AND Rule-based AND method, RSSA Rule-based sensitivity-specificity adjusted method, PSSA Probabilistic-based sensitivity-specificity adjusted method; prevT denotes true population prevalence; denotes outcome prevalence; denotes correlation between data sources; denotes average correlation amongst disease markers using the exchangeable correlation pattern; * in PSSA(*) denotes the number of model markers (i.e., covariates) for the PSSA method; each MSE value was multiplied by 100; the bolded simulation conditions are consistent with the conditions observed for our numeric example of hypertension
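The two performance measures reported in the simulation tables can be computed from replicate prevalence estimates as in the following sketch. The exact formulas used in the study are not given in this record, so the definitions here are assumptions: relative bias is taken as the absolute difference between the mean estimate and the true prevalence, as a percentage of the true prevalence, and MSE is the mean squared deviation from the true prevalence, scaled by 100 as the footnote states. The function names are hypothetical.

```python
from statistics import fmean

def percent_abs_relative_bias(estimates, true_prev):
    """100 * |mean(estimates) - true_prev| / true_prev (assumed definition)."""
    return 100.0 * abs(fmean(estimates) - true_prev) / true_prev

def mse_times_100(estimates, true_prev):
    """Mean squared error of the estimates, multiplied by 100."""
    return 100.0 * fmean((e - true_prev) ** 2 for e in estimates)

# Made-up replicate estimates around a true prevalence of 0.30.
replicates = [0.28, 0.30, 0.32]
print(percent_abs_relative_bias(replicates, 0.30))
print(mse_times_100(replicates, 0.30))
```

The toy replicates are centred on the true value, so their relative bias is essentially zero while their MSE is not. This is why the tables report both measures.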
Fig. 1 Study flowchart
Socio-demographic characteristics and case ascertainment markers for the study cohort
| Characteristics | Frequency | % |
|---|---|---|
| Sex | ||
| Male | 29,802 | 43.3 |
| Female | 39,075 | 56.7 |
| Age group | ||
| 18–44 years | 33,007 | 47.9 |
| 45–64 years | 26,243 | 38.1 |
| 65+ years | 9627 | 14.0 |
| Region | ||
| Non-Winnipeg | 30,871 | 44.8 |
| Winnipeg | 38,006 | 55.2 |
| Income quintile | ||
| Not found | 8888 | 12.9 |
| Q1 (lowest) | 8858 | 12.9 |
| Q2 | 10,278 | 14.9 |
| Q3 | 12,154 | 17.6 |
| Q4 | 14,106 | 20.5 |
| Q5 (highest) | 14,593 | 21.2 |
| Charlson Comorbidity Score | ||
| 0 | 57,649 | 83.7 |
| 1 to 2 | 10,348 | 15.0 |
| 3+ | 880 | 1.3 |
| AHR-defined diseases | ||
| Cerebrovascular disease | 916 | 1.3 |
| Congestive heart failure | 558 | 0.8 |
| COPD | 1287 | 1.9 |
| Coronary heart disease | 2623 | 3.8 |
| Dementia | 625 | 0.9 |
| Depression | 7098 | 10.3 |
| Diabetes | 4176 | 6.1 |
| Obesity | 1623 | 2.4 |
| Renal disease | 916 | 1.3 |
| Substance abuse | 1387 | 2.0 |
| EMR-defined diseases | ||
| COPD | 181 | 0.3 |
| Dementia | 1130 | 1.6 |
| Depression | 11,005 | 16.0 |
| Diabetes | 6435 | 9.3 |
| Obesity | 15,191 | 22.1 |
Q Income quintile, COPD Chronic obstructive pulmonary disease
Fig. 2 Hypertension prevalence estimates (%) for data-combining methods in the numeric example. Note: Error bars represent 95% confidence intervals; OR = rule-based OR method; AND = rule-based AND method; RSSA = rule-based sensitivity-specificity adjusted method; PSSA = probabilistic-based sensitivity-specificity adjusted method
Hypertension prevalence estimates (%) from administrative health records (AHRs) and electronic medical records (EMRs) in the numeric example
| Data Source/Method | Males (95% CI) | Females (95% CI) | 18–44 years (95% CI) | 45–64 years (95% CI) | 65+ years (95% CI) |
|---|---|---|---|---|---|
| AHR only | 31.7 (31.2–32.2) | 30.3 (29.8–30.8) | 10.3 (10.0–10.6) | 40.5 (39.9–41.1) | 75.3 (74.4–76.2) |
| EMR only | 26.0 (25.5–26.5) | 24.1 (23.7–24.5) | 9.0 (8.7–9.3) | 33.5 (32.9–34.1) | 56.4 (55.4–57.4) |
| OR | 35.7 (35.2–36.2) | 34.0 (33.5–34.5) | 12.8 (12.4–13.2) | 45.3 (44.7–45.9) | 78.8 (78.0–79.6) |
| AND | 22.1 (21.6–22.6) | 20.9 (20.5–21.3) | 6.4 (6.1–6.7) | 28.7 (28.1–29.3) | 53.0 (52.0–54.0) |
| RSSA | 33.4 (32.8–33.9) | 31.3 (30.6–31.8) | 11.9 (11.6–12.3) | 42.2 (41.6–42.8) | 73.8 (72.9–74.7) |
| PSSA, model 1 | 37.1 (36.8–37.3) | 34.9 (34.7–35.1) | 13.9 (13.7–14.2) | 46.9 (46.7–47.3) | 79.7 (79.4–80.0) |
| PSSA, model 2 | 37.0 (36.8–37.2) | 34.7 (34.5–35.0) | 13.6 (13.4–13.9) | 46.1 (45.9–46.4) | 79.4 (79.1–79.7) |
| PSSA, model 3 | 36.5 (36.2–36.7) | 34.5 (34.3–34.7) | 12.8 (12.6–13.0) | 46.3 (46.0–46.6) | 79.4 (79.1–79.8) |
| PSSA, model 4 | 35.1 (34.9–35.4) | 33.2 (32.9–33.5) | 12.2 (11.9–12.4) | 44.8 (44.5–45.1) | 79.1 (78.8–79.5) |
CI Confidence interval, OR Rule-based OR method, AND Rule-based AND method, RSSA Rule-based sensitivity-specificity adjusted method, PSSA Probabilistic-based sensitivity-specificity adjusted method, PSSA, model 1 covariates are sex, age group, region, income quintile, Charlson comorbidity score, chronic obstructive pulmonary disease (A, E), diabetes (A, E), depression (A, E), dementia (A, E), obesity (A, E), cerebrovascular disease (A), congestive heart failure (A), coronary heart disease (A), renal disease (A), substance abuse (A); PSSA, model 2 covariates are sex, age group, region, income quintile, chronic obstructive pulmonary disease (E), diabetes (E), depression (E), dementia (E), obesity (E), cerebrovascular disease (A), congestive heart failure (A), coronary heart disease (A), renal disease (A), substance abuse (A); PSSA, model 3 covariates are sex, age group, region, income quintile, chronic obstructive pulmonary disease (E), diabetes (E), depression (E), dementia (E), obesity (E), coronary heart disease (A), renal disease (A), substance abuse (A); PSSA, model 4 covariates are sex, age group, chronic obstructive pulmonary disease (E), diabetes (E), obesity (E), coronary heart disease (A), congestive heart failure (A), substance abuse (A); A and E denote disease-specific covariates that were identified from AHRs and EMRs, respectively
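The rule-based OR and AND methods in the table can be sketched as element-wise combinations of the two per-person case flags: OR counts a person as a case if either source flags them, AND only if both do. The sketch below also computes a Wald 95% confidence interval for a proportion. Whether the study used a Wald interval is an assumption, although applying it to the AHR-only estimate for males (31.7%, with n = 29,802 taken from the cohort table) gives approximately 31.2 to 32.2, in line with the reported interval. The function names are hypothetical.

```python
import math

def combine_or(ahr_flags, emr_flags):
    """Case if flagged by either source."""
    return [a or e for a, e in zip(ahr_flags, emr_flags)]

def combine_and(ahr_flags, emr_flags):
    """Case only if flagged by both sources."""
    return [a and e for a, e in zip(ahr_flags, emr_flags)]

def prevalence(flags):
    """Proportion of people flagged as cases."""
    return sum(flags) / len(flags)

def wald_ci(p, n, z=1.959964):
    """Wald interval for a proportion p estimated from n people."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width

# Made-up flags for four people.
ahr = [True, True, False, False]
emr = [True, False, True, False]
print(prevalence(combine_or(ahr, emr)), prevalence(combine_and(ahr, emr)))

# Interval for the AHR-only male estimate reported in the table.
low, high = wald_ci(0.317, 29802)
print(round(100 * low, 1), round(100 * high, 1))
```

By construction the OR estimate can never fall below either single-source estimate and the AND estimate can never exceed either one, which matches the ordering of the OR, AND, AHR-only, and EMR-only rows in the table.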
Model fit statistics for the PSSA method in the numeric example
| Model | Overall | Males | Females | 18–44 years | 45–64 years | 65+ years |
|---|---|---|---|---|---|---|
| 1 | 167,249 | 73,418 | 93,565 | 42,925 | 71,311 | 26,719 |
| 2 | 166,994 | 73,405 | 93,493 | 42,921 | 70,983 | 26,554 |
| 3 | 166,506 | 73,181 | 93,351 | 42,421 | 71,033 | 26,622 |
| 4 | | | | | | |
PSSA Probabilistic-based sensitivity-specificity adjusted method, PSSA, model 1 covariates are sex, age group, region, income quintile, Charlson comorbidity score, chronic obstructive pulmonary disease (A, E), diabetes (A, E), depression (A, E), dementia (A, E), obesity (A, E), cerebrovascular disease (A), congestive heart failure (A), coronary heart disease (A), renal disease (A), substance abuse (A); PSSA, model 2 covariates are sex, age group, region, income quintile, chronic obstructive pulmonary disease (E), diabetes (E), depression (E), dementia (E), obesity (E), cerebrovascular disease (A), congestive heart failure (A), coronary heart disease (A), renal disease (A), substance abuse (A); PSSA, model 3 covariates are sex, age group, region, income quintile, chronic obstructive pulmonary disease (E), diabetes (E), depression (E), dementia (E), obesity (E), coronary heart disease (A), renal disease (A), substance abuse (A); PSSA, model 4 covariates are sex, age group, chronic obstructive pulmonary disease (E), diabetes (E), obesity (E), coronary heart disease (A), congestive heart failure (A), substance abuse (A); A and E denote disease-specific markers that were identified from AHRs and EMRs, respectively; values in bold-face font represent the best-fitting model