Stephanie N Howson1, Michael J McShea1, Raghav Ramachandran1, Howard S Burkom1, Hsien-Yen Chang2, Jonathan P Weiner2, Hadi Kharrazi2.
Abstract
BACKGROUND: A small proportion of high-need patients persistently use the bulk of health care services and incur disproportionate costs. Population health management (PHM) programs often refer to these patients as persistent high utilizers (PHUs). Accurate PHU prediction enables PHM programs to better align scarce health care resources with high-need PHUs while generally improving outcomes. While prior research in PHU prediction has shown promise, traditional regression methods used in these studies have yielded limited accuracy.
Keywords: ensemble methodology; machine learning; observational; persistent high utilizers; population health analytics; prediction; retrospective; utilization
Year: 2022 PMID: 35275063 PMCID: PMC8990371 DOI: 10.2196/33212
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. Selection process of the study population. JHHC: Johns Hopkins Health Care; EDC: expanded diagnostic cluster.
Figure 2. Stacking ensemble architecture. F&P: feature selection and predictions; PHU: persistent high utilizer; non-PHU: nonpersistent high utilizer.
Specification of the study populations (n=165,595).
| Characteristic | Overall study population (n=165,595) | Non-PHUa population (n=155,862) | PHU population (n=8359) |
| Age (years), mean (SD) | 19.85 (17.45) | 18.79 (16.82) | 38.51 (18.01) |
| Age 0-17 years, n (%) | 101,264 (61.2) | 99,352 (63.7) | 1459 (17.5) |
| Age 18-64 years, n (%) | 63,260 (38.2) | 55,666 (35.7) | 6730 (80.5) |
| Age 65+ years, n (%) | 1037 (0.6) | 844 (0.5) | 170 (2.0) |
| Sex (male), n (%) | 72,974 (44.1) | 69,683 (44.7) | 2735 (32.7) |
| Race, n (%) | | | |
| White | 41,492 (25.1) | 38,762 (24.9) | 2457 (29.4) |
| Black | 54,207 (32.7) | 50,993 (32.7) | 2879 (34.4) |
| Otherb | 149 (0.1) | 143 (0.1) | 6 (<0.1) |
| Missingc | 69,747 (42.1) | 65,964 (42.3) | 3017 (36.1) |
| | | | |
| 0 | 160,035 (96.6) | 151,971 (97.5) | 6792 (81.3) |
| 1-5 | 5430 (3.3) | 3866 (2.5) | 1500 (17.9) |
| 6-10 | 77 (<0.1) | 20 (<0.1) | 54 (0.6) |
| 11+ | 19 (<0.1) | 5 (<0.1) | 13 (0.2) |
| | | | |
| 0 | 3720 (2.2) | 3663 (2.4) | 27 (0.3) |
| 1-5 | 96,122 (58.0) | 94,138 (60.4) | 1234 (14.8) |
| 6-10 | 33,996 (20.5) | 32,317 (20.7) | 1428 (17.1) |
| 11+ | 31,723 (19.2) | 25,744 (16.5) | 5670 (67.8) |
aPHU: persistent high utilizer.
bMembers of known race/ethnicity not equal to Asian, Hispanic, White, or Black.
cMembers with empty values for race.
Figure 3. Classification threshold of sensitivity versus positive predictive value (PPV). Patient A was incorrectly classified as normal (risk score=82%), and patient B was correctly classified as a persistent high utilizer (risk score=97%).
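The threshold trade-off described in the Figure 3 caption can be illustrated with a minimal Python sketch. Only the two risk scores for patients A (0.82) and B (0.97) come from the caption; the cutoffs of 0.90 and 0.80, the remaining cohort members, and the function name `sensitivity_and_ppv` are hypothetical and are not taken from the study.

```python
def sensitivity_and_ppv(scores, labels, threshold):
    """Return (sensitivity, PPV) when scores >= threshold are flagged as PHU."""
    tp = fp = fn = 0
    for score, is_phu in zip(scores, labels):
        flagged = score >= threshold
        if flagged and is_phu:
            tp += 1          # true PHU correctly flagged
        elif flagged and not is_phu:
            fp += 1          # non-PHU incorrectly flagged
        elif not flagged and is_phu:
            fn += 1          # true PHU missed (like patient A at a 0.90 cutoff)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, ppv


# Hypothetical cohort: the first two entries are patients A and B (both true PHUs,
# scores from the caption); every other score and label is invented for illustration.
scores = [0.82, 0.97, 0.95, 0.85, 0.40, 0.10, 0.93]
labels = [True, True, False, False, False, False, True]

for threshold in (0.90, 0.80):  # assumed cutoffs, not the study's operating point
    sens, ppv = sensitivity_and_ppv(scores, labels, threshold)
    print(f"threshold={threshold:.2f} sensitivity={sens:.2f} PPV={ppv:.2f}")
```

In this toy cohort, lowering the cutoff from 0.90 to 0.80 captures patient A and raises sensitivity, but it also flags an additional non-PHU and lowers PPV, which is the tension the figure depicts.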
Model fit statistics for predicting persistent high utilizer status.
| Model | Parameter tuning | Sensitivity, % | PPVa, % |
| Stacking ensemble | CNB1 α=.70, fit prior, norm; 200 estimators, 400 maxd depth, 5 mine samples split, 0.01% min samples, auto max features, class weight=0.842; 100 estimators, 350 max depth, 2 min samples split, 0.01% min samples, class weight=1.0 | 49.0 | 50.3 |
| RF | 300 estimators, 500 max depth, 20 min samples split, 0.01% min samples leaf | 48.4 | 47.2 |
| JHU-ACGf | ACGg system probability of PHUh | 44.7 | 44.1 |
| Logistic regression | Based on 241 parameters (ie, diagnoses and medications) | 46.8 | 46.1 |
aPPV: positive predictive value.
bCNB: complement naïve Bayes.
cRF: random forest.
dmax: maximum.
emin: minimum.
fJHU-ACG: ACG predictive model with no local tuning.
gACG: adjusted clinical group.
hPHU: persistent high utilizer.