Jennifer Holcomb, Luis C Oliveira, Linda Highfield, Kevin O Hwang, Luca Giancardo, Elmer Victor Bernstam.
Abstract
Providers currently rely on universal screening to identify health-related social needs (HRSNs). Predicting HRSNs from electronic health record (EHR) and community-level data could be more efficient and less resource intensive. Using machine learning models, we evaluated how well HRSN status could be predicted from EHR and community-level social determinants of health (SDOH) data for Medicare and Medicaid beneficiaries participating in the Accountable Health Communities Model. As a baseline, we hypothesized that Medicaid insurance coverage would predict HRSN status. All models significantly outperformed this Medicaid baseline. AUCs ranged from 0.59 to 0.68. The top performance (AUC = 0.68, CI 0.66-0.70) was achieved for the "any HRSNs" outcome, which is the most useful for screening prioritization. Community-level SDOH features had lower predictive performance than EHR features. Machine learning models can be used to prioritize patients for screening. However, screening only the patients identified by our current models would miss many patients. Future studies are warranted to optimize prediction of HRSNs.
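To make the reported comparison concrete, the following is a minimal sketch of how an AUC and a confidence interval could be computed for a model score and for a single binary indicator such as Medicaid coverage. The data are synthetic, the percentile bootstrap is an assumption (the abstract does not say how the interval was obtained), and the function names `auc` and `bootstrap_ci` are hypothetical.

```python
import random


def auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative.

    Ties count as half a win, so a binary indicator can be scored as well.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacked one class; AUC is undefined, so draw again
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic stand-in data (NOT study data): 1 = patient reports any HRSN.
rng = random.Random(42)
labels = [1 if rng.random() < 0.5 else 0 for _ in range(150)]
model_scores = [0.3 * y + rng.random() for y in labels]  # hypothetical model output
medicaid_flag = [1 if rng.random() < (0.9 if y else 0.8) else 0 for y in labels]

for name, scores in [("model", model_scores), ("Medicaid baseline", medicaid_flag)]:
    low, high = bootstrap_ci(labels, scores)
    print(f"{name}: AUC = {auc(labels, scores):.2f}, CI {low:.2f}-{high:.2f}")
```

Scoring the binary indicator with the same function puts the baseline and the models on one scale, which is one plausible way to read "outperformed the baseline"; the study may have used a different test.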
Year: 2022 PMID: 35296719 PMCID: PMC8927567 DOI: 10.1038/s41598-022-08344-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Data sources and linkage for modeling. Flow chart showing the data sources combined to create the dataset. The three data sources are drawn as cylinders, with links showing how they were joined; the measures from each source are drawn as rectangles. Patient-level HRSNs were collected in the AHC screening survey. Using survey demographic data, patients were matched through a Master Patient Index (MPI) database to their patient ID, demographics, diagnoses, and procedures in the EHR. Patients' addresses from the survey and the EHR were geocoded and assigned to the corresponding Census tract. The geocoding step is drawn as a diamond connected to the HRSN survey measures.
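The linkage in Figure 1 can be sketched as a chain of lookups: survey demographics to a patient ID through the MPI, patient ID to EHR features, and address to Census tract to community-level features. Every record, field name, and identifier below is a hypothetical placeholder, and the dictionary `geocoder` merely stands in for a real geocoding service; this is an illustration of the idea, not the study's pipeline.

```python
# All names and values are hypothetical placeholders, not study data.
survey = [
    {"name": "Pat A", "dob": "1980-01-01", "address": "1 Main St", "food_insecurity": True},
    {"name": "Pat B", "dob": "1975-06-30", "address": "9 Oak Ave", "food_insecurity": False},
    {"name": "Pat C", "dob": "2001-03-15", "address": "5 Elm Rd", "food_insecurity": True},
]

# Master Patient Index: survey demographics -> EHR patient ID.
mpi = {("Pat A", "1980-01-01"): "p100", ("Pat B", "1975-06-30"): "p200"}

# EHR features keyed by patient ID.
ehr = {
    "p100": {"diagnoses": ["dx_1"], "procedures": ["proc_1"]},
    "p200": {"diagnoses": ["dx_2"], "procedures": []},
}

# Stand-in for a geocoding service: address -> Census tract ID.
geocoder = {"1 Main St": "tract_X", "9 Oak Ave": "tract_Y", "5 Elm Rd": "tract_X"}

# Community-level SDOH features by tract; tract_Y deliberately has none.
tract_sdoh = {"tract_X": {"sdoh_index": 0.7}}


def link(row):
    """Return one merged record, or None if any linkage step fails."""
    patient_id = mpi.get((row["name"], row["dob"]))
    if patient_id is None or patient_id not in ehr:
        return None  # no MPI match or no EHR data
    tract = geocoder.get(row["address"])
    if tract is None or tract not in tract_sdoh:
        return None  # address not geocoded or tract lacks Census data
    return {
        "patient_id": patient_id,
        "food_insecurity": row["food_insecurity"],
        **ehr[patient_id],
        "tract": tract,
        **tract_sdoh[tract],
    }


linked = [record for record in map(link, survey) if record is not None]
print(len(linked), "of", len(survey), "survey records linked")
```

For brevity the sketch geocodes only the survey address; the caption indicates that both survey and EHR addresses were geocoded.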
Figure 2. CONSORT flow diagram. CONSORT flow diagram showing the sample size at each step of the data linkage. The diagram moves downward, with each step drawn as a rectangle. Patients were excluded from the final datasets if they had no EHR data, if their EHR and survey addresses did not align, or if their geocoded location had no corresponding Census data. These exclusions are drawn as rectangles with arrows indicating where along the diagram patients were excluded. The final three rectangles at the bottom show the training, validation, and test datasets used in the analysis.
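The exclusion sequence in Figure 2 amounts to applying filters in order while recording how many patients each step removes, then partitioning the survivors. The sketch below uses made-up records; the 60/20/20 split is an assumption chosen for illustration, since the caption does not state the proportions, and the helper names are hypothetical.

```python
import random


def apply_exclusions(records, criteria):
    """Apply (label, keep_fn) pairs in order and record the count lost at each step."""
    flow = [("screened", len(records))]
    remaining = list(records)
    for label, keep in criteria:
        kept = [r for r in remaining if keep(r)]
        flow.append((f"excluded: {label}", len(remaining) - len(kept)))
        remaining = kept
    flow.append(("analyzed", len(remaining)))
    return remaining, flow


def split(records, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle and cut into train/validation/test (fractions are an assumption)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(fractions[0] * len(shuffled))
    n_val = int(fractions[1] * len(shuffled))
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


# Made-up records (NOT study data) with one flag per exclusion criterion.
records = [
    {"id": i, "has_ehr": i % 5 != 0, "address_match": i % 7 != 0, "has_census": i % 11 != 0}
    for i in range(1, 101)
]

criteria = [
    ("no EHR data", lambda r: r["has_ehr"]),
    ("address mismatch", lambda r: r["address_match"]),
    ("no Census data", lambda r: r["has_census"]),
]

final, flow = apply_exclusions(records, criteria)
train, val, test = split(final)

for label, count in flow:
    print(f"{label}: {count}")
print(f"train/validation/test: {len(train)}/{len(val)}/{len(test)}")
```

Applying the criteria sequentially means each patient is counted under the first criterion they fail, which matches how a flow diagram reports exclusions.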
Demographic Characteristics and Health-Related Social Needs (HRSNs) of CMS Beneficiaries in the Accountable Health Communities (AHC) Model in the Greater Houston Area, September 2018 to December 2020.
| Characteristics | Sample (n = 9800) |
|---|---|
| Patients, No. (%) | |
| Age, mean (SD), years | 35.5 (26.3) |
| Race/ethnicity | |
| Black or African American | 3978 (40.6) |
| Other | 2857 (29.2) |
| White | 1763 (18.0) |
| Latin American | 612 (6.2) |
| Hispanic or Latino | 337 (3.4) |
| American Indian or Alaska Native | 23 (0.2) |
| Asian or Pacific Islander | 15 (0.2) |
| Unknown^a | 215 (2.2) |
| Marital status | |
| Single | 5809 (59.3) |
| Married | 1411 (14.4) |
| Widowed | 368 (3.8) |
| Divorced | 365 (3.7) |
| Separated | 67 (0.7) |
| Life Partner | 6 (0.1) |
| Legally Separated | 6 (0.1) |
| Unknown | 1768 (18.0) |
| Sex | |
| Female | 5162 (52.7) |
| Male | 3049 (31.1) |
| Unknown | 1589 (16.2) |
| Insurance^b | |
| Medicaid | 8370 (85.4) |
| Medicare | 2231 (22.8) |
| Health-related social needs^b | |
| Housing instability and/or quality | 2876 (29.3) |
| Food insecurity | 3780 (38.6) |
| Transportation | 2722 (27.8) |
| Difficulty paying utility bills | 2582 (26.3) |
| Any core need | 5588 (57.0) |
| All core needs | 813 (8.3) |
| Health system | |
| Health System A | 8211 (83.8) |
| Health System B | 1108 (11.3) |
| Health System C | 481 (4.9) |
^a Includes "Unknown", "Declined", and "Not Answered" responses and records that had no response.
^b Patients could be in multiple categories, so numbers do not sum to the total.
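As a quick arithmetic check on the table, the percentages can be recomputed from the counts and the sample size, and the overlap noted in the footnote can be bounded. The sketch below copies a few counts from the table; the helper `pct` is a hypothetical name, and the overlap figure is only the lower bound implied by inclusion-exclusion, not a reported result.

```python
N = 9800  # sample size from the table header

# Counts copied from the table.
counts = {
    "Medicaid": 8370,
    "Medicare": 2231,
    "Food insecurity": 3780,
    "Any core need": 5588,
    "All core needs": 813,
}


def pct(count, total=N):
    """Percentage of the sample, rounded to one decimal as in the table."""
    return round(100 * count / total, 1)


for label, count in counts.items():
    print(f"{label}: {count} ({pct(count)})")

# Medicaid + Medicare exceeds N, so some patients hold both coverages.
# Inclusion-exclusion gives a lower bound on that overlap.
min_dual_coverage = counts["Medicaid"] + counts["Medicare"] - N
print("at least", min_dual_coverage, "patients have both Medicaid and Medicare")
```

The bound becomes an exact count only if every patient has at least one of the two coverages, which the table does not state.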
Figure 3. Machine learning model predictive value by HRSN status.