Elham Hatef1, Masoud Rouhizadeh2, Claudia Nau3, Fagen Xie3, Christopher Rouillard4, Mahmoud Abu-Nasser4, Ariadna Padilla3, Lindsay Joe Lyons3, Hadi Kharrazi1,5, Jonathan P Weiner1, Douglas Roblin4.
Abstract
OBJECTIVE: To evaluate whether a natural language processing (NLP) algorithm could be adapted to extract, with acceptable validity, markers of residential instability (ie, homelessness and housing insecurity) from electronic health records (EHRs) of 3 healthcare systems.
Keywords: electronic health record; homelessness; housing insecurity; natural language processing; social determinants of health
Year: 2022 PMID: 35224458 PMCID: PMC8867582 DOI: 10.1093/jamiaopen/ooac006
Source DB: PubMed Journal: JAMIA Open ISSN: 2574-2531
Approach for gold standard and feature development across study sites
| | JHHS | KPMAS | KPSC |
|---|---|---|---|
| NLP validation | | | |
| Gold standard method | Social needs questionnaire | Social needs questionnaire | Social needs ICD codes & manual annotation |
| Training and validation data sets | | | |
| Patients/responses (count) | 3572 (1786+/1786−) | 8197 (833+/7364−) | 300 (150+/150−) |
| Clinical notes (count) | 299 307 | 78 825 | 9575 |
| Note type | All note types; lab results, radiology, and pathology reports excluded | Case management, complex care program, family practice, internal medicine, psychotherapy, and utilization management departments | All note types; discharge instructional/administrative notes excluded |
| Breakdown of training and validation data sets | Randomly selected 50% of the sample for the training data set and reserved the remaining subset for a hold-out validation data set | Randomly selected 80% of the sample for the training data set and reserved the remaining subset for a hold-out validation data set | Randomly split the sample into 5 subsets (each with 30+/30− patients). Used 4 of the subsets for the training data set and the fifth set as a hold-out validation data set |
| Feature development | Hand-crafted linguistic patterns | TF-IDF Vectorizer feature extraction | Hand-crafted linguistic patterns |
Total number of patients, and those with positive and negative responses to the residential instability questions in the social needs questionnaire, included in the training and validation data sets. The JHHS site randomly selected 50% of the total sample to develop the training data set and reserved the remaining subset as a hold-out validation data set for evaluating model over-fitting. KPMAS randomly selected 80% of the total survey data set to develop the training data set and reserved 20% of the total survey data set, of which 833 were positive and 7364 were negative for residential instability response, as a hold-out validation data set for evaluating model over-fitting.
KPSC randomly split the total sample into 5 subsets, each containing 30 patients from the positive and 30 patients from the negative residential instability groups. We used 4 of the subsets for the iterative adaptation of the NLP algorithm and the fifth set of 30 positive and 30 negative patients as a hold-out validation data set for the final evaluation of model over-fitting.
JHHS: Johns Hopkins Health System, KPMAS: Kaiser Permanente Mid-Atlantic States, KPSC: KP Southern California, NLP: Natural Language Processing, TF-IDF: Term Frequency-Inverse Document Frequency.
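The splitting schemes described in the table footnotes can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the function names `random_split` and `stratified_subsets`, the fixed seed, and the patient identifiers are all hypothetical, and only the Python standard library is used.

```python
import random


def random_split(ids, train_fraction, seed=0):
    """Shuffle ids and cut them into (training, hold-out) lists.

    Mirrors the 50% (JHHS) and 80% (KPMAS) random splits; the seed is
    an assumption made only so the example is reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def stratified_subsets(positive_ids, negative_ids, n_subsets=5, seed=0):
    """Split patients into n_subsets groups with equal class counts.

    Mirrors the KPSC scheme: 150+/150- patients dealt into 5 subsets
    of 30+/30- each.
    """
    rng = random.Random(seed)
    pos, neg = list(positive_ids), list(negative_ids)
    rng.shuffle(pos)
    rng.shuffle(neg)
    # Deal the shuffled IDs round-robin so each subset gets the same
    # number of positive and negative patients.
    return [
        {"pos": pos[i::n_subsets], "neg": neg[i::n_subsets]}
        for i in range(n_subsets)
    ]


# Hypothetical identifiers standing in for real patient records.
positives = [f"P{i}" for i in range(150)]
negatives = [f"N{i}" for i in range(150)]

subsets = stratified_subsets(positives, negatives)
training_subsets, holdout_subset = subsets[:4], subsets[4]

# JHHS-style 50/50 split of 3572 patients.
jhhs_train, jhhs_holdout = random_split(range(3572), 0.5)
```

Dealing the shuffled identifiers round-robin keeps the class balance identical in every subset, which is what lets the fifth subset serve as a balanced hold-out set.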
Characteristics of the study population at each study site
| | JHHS, total | JHHS, residential instability | KPMAS, total | KPMAS, residential instability | KPSC, total | KPSC, residential instability |
|---|---|---|---|---|---|---|
| Study population, n (%) | 47 440 (100) | 1786 (3.8) | 25 727 (100) | 2905 (11.3) | 300 | 138 |
| Age, n (%) | | | | | | |
| 18–34 | 4592 (9.7) | 283 (15.8) | 3612 (14.0) | 505 (17.4) | 97 (32.3) | 39 (28.3) |
| 35–44 | 3303 (7.0) | 290 (16.2) | 1858 (7.2) | 342 (11.8) | 57 (19.0) | 34 (24.6) |
| 45–64 | 12 526 (26.4) | 747 (41.8) | 8404 (32.7) | 1217 (41.9) | 95 (31.7) | 52 (37.8) |
| 65–74 | 8732 (18.4) | 186 (10.4) | 5634 (21.9) | 467 (16.1) | 27 (9.0) | 11 (8.0) |
| ≥75 | 12 996 (27.4) | 109 (6.1) | 6219 (24.2) | 374 (12.8) | 24 (8.0) | 2 (1.5) |
| Gender, n (%) | | | | | | |
| Male | 25 425 (53.6) | 667 (37.3) | 10 204 (39.7) | 1053 (36.3) | 156 (52.0) | 88 (63.8) |
| Female | 22 013 (46.4) | 1119 (62.6) | 15 523 (60.3) | 1852 (63.7) | 144 (48.0) | 50 (36.2) |
| Race/Ethnicity, n (%) | | | | | | |
| Hispanic | 6 (0.01) | 1 (0.05) | 2010 (7.8) | 222 (7.7) | 112 (37.3) | 45 (32.6) |
| Non-Hispanic Black | 15 102 (31.8) | 844 (49.5) | 12 345 (48.0) | 1641 (56.5) | 42 (14.0) | 24 (17.4) |
| Non-Hispanic White | 26 899 (56.7) | 801 (44.8) | 8547 (33.2) | 820 (28.2) | 96 (32.0) | 47 (34.1) |
| Asian/Pacific-Islander | 1802 (3.8) | 12 (0.7) | 2244 (8.7) | 193 (6.6) | 21 (7.0) | 2 (1.5) |
| Other/unknown | 3483 (7.3) | 115 (6.4) | 581 (2.3) | 29 (1.0) | 29 (9.7) | 20 (14.5) |
| Insurance, n (%) | | | | | | |
| Medicaid | 31 (0.1) | 3 (0.2) | 4391 (17.1) | 877 (30.2) | 21 (7.0) | 8 (5.8) |
| Medicare | 0 (0) | 0 (0) | 13 154 (51.1) | 1095 (37.7) | 45 (15.0) | 8 (5.8) |
| Deductible | 702 (1.5) | 12 (0.7) | 2114 (8.2) | 236 (8.1) | 19 (6.3) | 3 (2.2) |
| Standard HMO | 236 (0.5) | 11 (0.6) | 5756 (22.4) | 640 (22.0) | 89 (29.7) | 16 (11.6) |
| Other | 41 180 (86.8) | 1589 (88.9) | 312 (1.2) | 57 (2.0) | 126 (42.0) | 103 (74.5) |
The JHHS site assessed over 30 EHR questionnaires and flowsheets addressing residential instability and identified 5 relevant ones. Between July 2016 and June 2020, we identified 1786 patients with a positive response and 45 654 patients with a negative response to residential instability questions.
The KPMAS site extracted the YCLS survey data from the EHR. Between March 2017 and June 2020, we identified a total of 40 372 YCLS survey responses completed by 25 727 KPMAS adult members, 2905 of whom indicated residential instability. We used YCLS survey responses to generate a binary label of a patient’s needs related to residential instability. We assigned a positive label (1) to patients with a survey response indicating an unmet housing need and a negative label (0) to patients with a survey response indicating no current housing need.
The KPSC site randomly selected 150 hospitalized or emergency department (ED) patients between January 2016 and December 2019 with a residential instability diagnosis or homeless checklist and 150 without a residential instability diagnosis or homeless checklist. Positive cases (138 patients) were those identified as residentially unstable during the chart review process.
Including self-pay.
124 of 126 are non-KP members.
102 of 103 are non-KP members.
EHR: Electronic Health Record, HMO: Health Maintenance Organization, JHHS: Johns Hopkins Health System, KPMAS: Kaiser Permanente Mid-Atlantic States, KPSC: KP Southern California, YCLS: Your Current Life Situation.
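The binary labeling of survey responses and the prevalence percentages reported in the table can be illustrated with a short sketch. The names `label_patients` and `prevalence_pct` are hypothetical. The rule used here for patients with several responses (any response indicating an unmet housing need yields a positive label) is an assumption for illustration, since the text does not say how multiple responses per patient were reconciled.

```python
def label_patients(responses):
    """Map (patient_id, has_unmet_housing_need) pairs to 0/1 labels.

    Assumption: a patient with at least one response indicating an
    unmet housing need is labeled 1; otherwise the label is 0.
    """
    labels = {}
    for patient_id, has_unmet_need in responses:
        labels[patient_id] = max(labels.get(patient_id, 0), int(has_unmet_need))
    return labels


def prevalence_pct(n_positive, n_total):
    """Share of positive patients as a percentage, to one decimal place."""
    return round(100 * n_positive / n_total, 1)


# Hypothetical survey responses: patient "a" answered twice.
example_labels = label_patients([("a", False), ("a", True), ("b", False)])

# Counts taken from the table above.
jhhs_prevalence = prevalence_pct(1786, 47440)
kpmas_prevalence = prevalence_pct(2905, 25727)
```

With the counts from the table, the two prevalence values reproduce the reported 3.8% (JHHS) and 11.3% (KPMAS).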