Lily A Cook, Jonathan Sachs, Nicole G Weiskopf.
Abstract
OBJECTIVE: The aim of this study was to collect and synthesize evidence regarding data quality problems encountered when working with variables related to social determinants of health (SDoH).
Keywords: Hispanic Americans; bias; data quality; health equity; social determinants of health
Year: 2021 PMID: 34664641 PMCID: PMC8714289 DOI: 10.1093/jamia/ocab199
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Figure 1. PRISMA flow diagram.
Eligibility criteria for articles
| | Included | Excluded |
|---|---|---|
| Topic/Focus | Original, peer-reviewed research focused on the quality of social determinants of health data. | Reviews; opinion pieces; research that has not been peer-reviewed. |
| Social Determinants of Health Factors | Race/ethnicity, language preference, health insurance status, country of origin, occupation, socioeconomic status, education level, environmental health (proximity to healthy food, walkability, exposure to environmental toxins, etc.), geocoded patient address data (only included if the article primarily focused on linking clinical data to external datasets for research on social determinants) | Behaviors (eg, smoking and exercise) |
| Sources of Health Data | Clinical sources within the United States and Canada: EHR, medical registries, administrative databases compiled from EHR data, observational studies using clinical data pulled directly from the medical records of participants | Nonclinical sources: population-level data, mHealth sources, genomic datasets, vital records (ie, birth or death certificate data); Clinical sources outside the United States or Canada |
| Language | Articles written in English. | Articles in languages other than English. |
Characteristics of studies included in this review
| Primary social determinant of health, n (%) | | | | | | |
|---|---|---|---|---|---|---|
| Primary data quality issue | Race, ethnicity, country of origin | Insurance status | Occupation | General community-level | Environmental | Nonspecific |
| | 15 (37.5%) | 1 (100%) | 6 (86%) | 2 (12.5%) | 1 (20%) | 6 (86%) |
| | 0 | 0 | 1 (14%) | 10 (62.5%) | 3 (60%) | 0 |
| | 25 (62.5%) | 0 | 0 | 4 (25%) | 1 (20%) | 1 (14%) |
| administrative or demographic sources | patient address is geocoded to link community-level data | diagnosis codes | |||
Findings about bias and differential data quality
| Bias Finding? | Social determinant | Bias type | Articles reporting that finding, n (%) |
|---|---|---|---|
| Yes | Race/ethnicity | Misclassification | 19 (25.0) |
| | | Missing Not at Random (MNAR) | 9 (11.8) |
| | | Differentially implausible | 2 (2.6) |
| | | Other | 1 (1.3) |
| | Insurance | Missing Not at Random (MNAR) | 1 (1.3) |
| | Occupation | Missing Not at Random (MNAR) | 4 (5.3) |
| | General Community Level | Rural data are problematic | 3 (3.9) |
| | | Other | 3 (3.9) |
| | Environmental | Misclassification | 3 (3.9) |
| | Nonspecific | Missing Not at Random (MNAR) | 2 (2.6) |
| Did not evaluate for bias | | | 24 (31.6) |
| Evaluated for bias and found none | | | 5 (6.6) |
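
The percentages in this table can be checked against the counts. The short sketch below assumes the denominator is the sum of all rows (76 articles); that total is inferred from the table rather than stated in this record, and the variable names are illustrative only.

```python
# Counts copied from the bias-findings table, in row order.
counts = [19, 9, 2, 1, 1, 4, 3, 3, 3, 2, 24, 5]

# Assumption: every article appears in exactly one row,
# so the denominator is the sum of the counts.
total = sum(counts)

# Percentage of all articles per row, rounded to one decimal place.
percentages = [round(100 * c / total, 1) for c in counts]

for count, pct in zip(counts, percentages):
    print(f"{count} ({pct})")
```

Under that assumption each computed value matches the percentage printed in the corresponding row.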
Summary of recommendations found in the articles
| Five ways to increase data quality | |
|---|---|
| Recommendation | References supporting this recommendation |
| 1. Avoid complete case analysis | |
| 2. Impute data | |
| 3. Rely on multiple sources | |
| 4. Use validated software tools | |
| 5. Select addresses thoughtfully | |
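
To illustrate why the first recommendation matters, here is a minimal sketch of checking whether a field is missing at different rates across groups before dropping incomplete records. The records, the field names (`language`, `race_ethnicity`), and the helper `missing_rate_by_group` are hypothetical and are not taken from the reviewed articles.

```python
from collections import defaultdict


def missing_rate_by_group(records, group_key, field):
    """Return {group: fraction of records in that group where `field` is None}."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for rec in records:
        group = rec[group_key]
        totals[group] += 1
        if rec.get(field) is None:
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}


# Hypothetical toy records; None marks a missing value.
records = [
    {"language": "English", "race_ethnicity": "White"},
    {"language": "English", "race_ethnicity": "Black"},
    {"language": "English", "race_ethnicity": None},
    {"language": "English", "race_ethnicity": "Asian"},
    {"language": "Spanish", "race_ethnicity": None},
    {"language": "Spanish", "race_ethnicity": None},
    {"language": "Spanish", "race_ethnicity": "Hispanic"},
    {"language": "Spanish", "race_ethnicity": None},
]

rates = missing_rate_by_group(records, "language", "race_ethnicity")

# Complete case analysis keeps only records with the field present.
complete_cases = [r for r in records if r["race_ethnicity"] is not None]
spanish_share_all = sum(r["language"] == "Spanish" for r in records) / len(records)
spanish_share_complete = (
    sum(r["language"] == "Spanish" for r in complete_cases) / len(complete_cases)
)

print(rates)
print(spanish_share_all, spanish_share_complete)
```

In this toy data the field is missing far more often for one group, so dropping incomplete records shrinks that group's share of the sample. A gap like this shows the data are not missing completely at random; it does not by itself establish MNAR, which depends on the unobserved values.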