Lorinda Coombs, Abigail Orlando, Xiaoliang Wang, Pooja Shaw, Alexander S Rich, Shreyas Lakhtakia, Karen Titchener, Blythe Adamson, Rebecca A Miksad, Kathi Mooney.
Abstract
We present a general framework for developing a machine learning (ML) tool that supports clinician assessment of patient risk using electronic health record-derived real-world data, and we apply the framework to a quality improvement use case in an oncology setting: identifying patients at risk for a near-term (60-day) emergency department (ED) visit who could potentially be eligible for a home-based acute care program. Framework steps include defining clinical quality improvement goals, model development and validation, bias assessment, retrospective and prospective validation, and deployment in the clinical workflow. In the retrospective analysis for the use case, 8% of patient encounters were associated with a high risk (pre-defined as predicted probability ≥20%) of a near-term ED visit by the patient. Positive predictive value (PPV) and negative predictive value (NPV) for future ED events were 26% and 91%, respectively. The odds ratio (OR) of an ED visit (high- vs. low-risk) was 3.5 (95% CI: 3.4–3.5). The model appeared to be calibrated across racial, gender, and ethnic groups. In the prospective analysis, 10% of patients were classified as high risk, 76% of whom were confirmed by clinicians as eligible for home-based acute care. PPV and NPV for future ED events were 22% and 95%, respectively. The OR of an ED visit (high- vs. low-risk) was 5.4 (95% CI: 2.6–11.0). The proposed framework for an ML-based tool that supports clinician assessment of patient risk is a stepwise development approach; we successfully applied the framework to an ED visit risk prediction use case.
Year: 2022 PMID: 35974092 PMCID: PMC9380664 DOI: 10.1038/s41746-022-00660-3
Source DB: PubMed Journal: NPJ Digit Med ISSN: 2398-6352
Baseline patient characteristics for model training and retrospective evaluation.
| Characteristics | Categories | Model training cohort | Retrospective cohort |
|---|---|---|---|
| Age, years | Median (range) | 64 (18–100) | 65 (18–101) |
| Gender | Male | 47% | 47% |
| | Female | 53% | 53% |
| Ethnicity | Hispanic | 7% | 7% |
| | Non-Hispanic | 93% | 93% |
| Race | White | 88% | 87% |
| | Black | 1% | 1% |
| | Asian | 2% | 2% |
| | Other | 8% | 8% |
| | Unknown | 1% | 2% |
| Cancer sitesᵃ | Breast | 24% | 23% |
| | Unspecified primary malignant neoplasms | 22% | 21% |
| | Non-melanoma skin neoplasms | 20% | 22% |
| | Prostate | 15% | 16% |
| | Lung | 8% | 10% |
| | Multiple myeloma | 7% | 8% |
| | Non-Hodgkin’s lymphoma | 11% | 11% |
| | Leukemia | 8% | 9% |
| | Colon | 6% | 6% |
| | Melanoma of skin | 9% | 8% |
| | Bone/connective tissue | 8% | 9% |
| | Brain | <5% | 5% |
| | Head and neck | 6% | 6% |
ᵃOnly cancer sites with ≥5% prevalence at baseline are listed in the table.
Retrospective and prospective evaluation results.
| Model performance metric | Retrospective result | Prospective result |
|---|---|---|
| ED prevalence | 10% | 7% |
| Predicted risk level, proportion of patients classified as “high risk” | 8% | 10% |
| Sensitivity (sens) [aka: recall] | 19% (95% CI: 19–20) | 32% (95% CI: 18–48) |
| Specificity (spec) | 93% (95% CI: 93–93) | 92% (95% CI: 90–94) |
| PPV | 26% (95% CI: 26–26) | 22% (95% CI: 12–34) |
| NPV | 91% (95% CI: 91–91) | 95% (95% CI: 93–97) |
| OR of ED visit (high-risk vs. low-risk patients) | 3.5 (95% CI: 3.4–3.5) | 5.4 (95% CI: 2.6–11.0) |
Prospective evaluation metrics are at the patient level and retrospective evaluation metrics are calculated at the encounter level.
ED emergency department, NPV negative predictive value, PPV positive predictive value.
Calibration factor results.
| Group type | Group | Estimated calibration factor [95% confidence interval] |
|---|---|---|
| Ethnicity | Hispanic | 0.024 [−0.023, 0.064] |
| | Not Hispanic | 0.004 [−0.006, 0.014] |
| Gender | Female | −0.001 [−0.013, 0.011] |
| | Male | 0.014 [−0.002, 0.029] |
| Race | Asian | 0.003 [−0.048, 0.046] |
| | Black | −0.064 [−0.134, 0.011] |
| | Other | 0.033 [−0.009, 0.072] |
| | Unknown | −0.023 [−0.068, 0.015] |
| | White | 0.005 [−0.006, 0.015] |
Fig. 1. Calibration factor, by stratification.
Calibration factor with 95% confidence intervals, stratified on different race, gender, and ethnicity values.
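The paper does not spell out its calibration-factor estimator in this extract; one common formulation, used as an assumption in the sketch below, is the mean difference between observed outcomes and predicted probabilities within each subgroup, where a value near zero (as in the table above) indicates good calibration for that group. All data in the example are hypothetical.

```python
from collections import defaultdict

def calibration_factor(records):
    """Per-group calibration factor, defined here (an assumption, not
    necessarily the paper's exact estimator) as the mean of
    (observed outcome - predicted probability) within each group.

    records: iterable of (group, predicted_prob, observed) tuples,
    with observed coded 0/1 (1 = ED visit occurred).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for group, pred, obs in records:
        sums[group] += obs - pred
        counts[group] += 1
    return {g: sums[g] / counts[g] for g in sums}

# Hypothetical records for two subgroups, for illustration only
data = [("Female", 0.10, 0), ("Female", 0.30, 1),
        ("Male", 0.20, 0), ("Male", 0.20, 1)]
cal = calibration_factor(data)
```

A confidence interval around each group's factor (as shown in Fig. 1) could then be obtained by bootstrapping the records within each group.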
Baseline characteristics among Huntsman patients with cancer in the prospective validation study.
| Characteristics | Categories | Hold out cohort: Overall | Hold out cohort: High risk | Hold out cohort: Low risk | Deliverable cohort: Overall | Deliverable cohort: High risk | Deliverable cohort: Low risk |
|---|---|---|---|---|---|---|---|
| Age (years) | Median | 65 | 63 | 65 | 65 | 64 | 65 |
| | Mean (range) | 63 (20–95) | 62 (29–95) | 63 (20–95) | 63 (20–93) | 62 (28–88) | 63 (20–93) |
| Gender, n (%) | Male | 300 (47) | 27 (44) | 273 (48) | 278 (46) | 24 (55) | 254 (45) |
| | Female | 333 (53) | 34 (56) | 299 (52) | 325 (54) | 20 (45) | 305 (55) |
| Ethnicity, n (%) | Hispanic | 51 (8) | 8 (13) | 43 (8) | 41 (7) | 5 (11) | 36 (6) |
| | Non-Hispanic | 582 (92) | 53 (87) | 529 (92) | 562 (93) | 39 (89) | 523 (94) |
| Race, n (%) | White | 529 (84) | 49 (80) | 480 (84) | 512 (85) | 33 (75) | 479 (86) |
| | Black | 10 (2) | 1 (2) | 9 (2) | 10 (2) | 3 (7) | 7 (1) |
| | Asian | 15 (2) | 3 (5) | 12 (2) | 16 (3) | 2 (5) | 14 (3) |
| | Other | 66 (10) | 8 (13) | 58 (10) | 52 (9) | 6 (14) | 46 (8) |
| | Unknown | 13 (2) | 0 (0) | 13 (2) | 13 (2) | 0 (0) | 13 (2) |
| Medicaid, n (%) | Yes | 76 (12) | 28 (46) | 48 (8) | 63 (10) | 16 (36) | 47 (8) |
| | No | 557 (88) | 33 (54) | 524 (92) | 540 (90) | 28 (64) | 512 (92) |
| H@H enrollment at index encounter, n (%) | Yes | 20 (3) | 9 (15) | 11 (2) | 26 (4) | 9 (20) | 17 (3) |
| | No | 613 (97) | 52 (85) | 561 (98) | 577 (96) | 35 (80) | 542 (97) |
| Number of cancer diagnoses, n (%) | 1 | 309 (49) | 28 (46) | 281 (49) | 295 (49) | 17 (39) | 278 (50) |
| | 2 | 175 (28) | 13 (21) | 162 (28) | 150 (25) | 14 (32) | 136 (24) |
| | 3 | 80 (13) | 7 (11) | 73 (13) | 75 (12) | 6 (14) | 69 (12) |
| | 4 | 31 (5) | 6 (10) | 25 (4) | 40 (7) | 4 (9) | 36 (6) |
| | ≥5 | 38 (6) | 7 (11) | 31 (5) | 43 (7) | 3 (7) | 40 (7) |
| Cancer sitesᵃ, n (%) | Breast | 129 (20) | 3 (5) | 126 (22) | 129 (21) | 2 (5) | 127 (23) |
| | Unspecified primary malignant neoplasms | 123 (19) | 18 (30) | 105 (18) | 152 (25) | 14 (32) | 138 (25) |
| | Non-melanoma skin neoplasms | 111 (18) | 9 (15) | 102 (18) | 117 (19) | 6 (14) | 111 (20) |
| | Prostate | 95 (15) | 8 (13) | 87 (15) | 83 (14) | 6 (14) | 77 (14) |
| | Lung | 63 (10) | 11 (18) | 52 (9) | 66 (11) | 5 (11) | 61 (11) |
| | Multiple myeloma | 56 (9) | 2 (3) | 54 (9) | 70 (12) | 8 (18) | 62 (11) |
| | Non-Hodgkin’s lymphoma | 54 (9) | 6 (10) | 48 (8) | 65 (11) | 5 (11) | 60 (11) |
| | Leukemia | 53 (9) | 7 (11) | 46 (8) | 57 (9) | 3 (7) | 54 (10) |
| | Colon | 46 (7) | 11 (18) | 35 (6) | 31 (5) | 5 (11) | 26 (5) |
| | Melanoma of skin | 42 (7) | 5 (8) | 37 (6) | 45 (7) | 3 (7) | 42 (8) |
| | Bone/connective tissue | 40 (6) | 5 (8) | 35 (6) | 43 (7) | 4 (9) | 39 (7) |
| | Kidney/renal pelvis | 35 (6) | 6 (10) | 29 (5) | 26 (4) | 1 (2) | 25 (4) |
| | Rectum | 33 (5) | 9 (15) | 24 (4) | 21 (3) | 2 (5) | 19 (3) |
| | Ovary | 31 (5) | 3 (5) | 28 (5) | 28 (5) | 4 (9) | 24 (4) |
| | Uterus | 30 (5) | 2 (3) | 28 (5) | 22 (4) | 3 (7) | 19 (3) |
| | Brain | 29 (5) | 4 (7) | 25 (4) | 26 (4) | 3 (7) | 23 (4) |
| | Head and neck | 27 (4) | 4 (7) | 23 (4) | 34 (6) | 4 (9) | 30 (5) |
ᵃOnly cancer sites with >5% prevalence at baseline are listed in the table.
H@H Huntsman at Home Program.
ML-based clinical tool evaluation framework steps.
| Framework step | Expected outcome |
|---|---|
| • Define patient care/quality goals • Identify actionable clinical events that, if predicted, help achieve goals • Establish metrics and results required to identify “at risk” patients • Evaluate if this type of tool is useful for furthering goals • Determine how the tool will embed into clinical workflow, and what actions need to be taken based on the predicted clinical event • Define key metrics for evaluating clinical impact of risk predictions | All stakeholders (clinicians, business leaders, data scientists, etc.) will have a clear understanding of how deployment of an ML-based clinical tool will help to achieve quality improvement goals. Teams should be able to fill in this statement: “If the care team knows that X event will happen, they will take Y action, to increase Z value.” |
| Decide whether to build a custom ML-based tool or acquire an existing ML-based tool that is practical, customizable, and suited for the practice’s local data patterns | Organization will be equipped with the right ML-based clinical tool for their intended goals |
| Retrospectively apply model to a representative historical patient population from the institution and then compare predictions with known past observed events to confirm if the tool meets desired metrics | Allows the organization to expediently assess the suitability of the ML-based clinical tool for the prediction task at hand |
• Proactively evaluate for bias, including treatment pattern disparities or lack of representation, choice of modeling approach, or choice of predicted clinical event • Make necessary adjustments to the tool before there is any impact on patients | Ensures that the ML-based clinical tool algorithms do not reproduce real-world inequalities that can occur as a result of treatment pattern disparities or a lack of representation encoded in datasets, the choice of modeling approach, or the choice of predicted clinical event |
• Conduct a prospective evaluation on a present-day, real-world patient population in a randomized setting to understand how well the model is likely to perform in real time • Note: this step may not be necessary in every case if the ML-based tool has already been prospectively evaluated and its performance in a real-world setting monitored | Prospective validation is considered the “gold standard” of ML model validation when applied to the point-of-care setting because it shows the clinical team how well the model is likely to perform in real time, where several factors can affect model performance, such as recent pattern changes in the real world (e.g., occurrence of a pandemic), changes in care delivery (e.g., updates to clinical standards), or technical or operational issues (e.g., data entry delays that can make a system unusable in practice) |
• Adopt tool into standard clinical workflow • Conduct data quality monitoring, performance monitoring, and bias monitoring • The ML-based tool should not replace traditional patient identification processes, but support them with a data-driven approach that also enhances their efficiency | The ML-based tool can now be used to achieve the quality improvement goal defined in Step 1. Ongoing monitoring ensures the model’s suitability in the dynamic clinical environment of the real world where patterns of care seeking and care delivery evolve, and that model predictions are not impacted by manual or technical errors that could inadvertently affect a patient’s predicted risk and/or access to supplemental care. |
ML machine learning.
Definitions of metrics to assess model accuracy.
| Metrics | Definition |
|---|---|
| ED prevalence (%) | Prevalence of observed 60-day ED visit: proportion (0–100%) |
| Predicted risk level | Binary, high/low; proportion (high risk %) |
| Sensitivity (sens) [aka: recall] | Proportion of encounters classified as high risk among those with ED visit |
| Specificity (spec) | Proportion of encounters classified as low risk among those without ED visit |
| Positive predictive value | Proportion of encounters followed by an ED visit among those classified as high risk |
| Negative predictive value | Proportion of encounters without a subsequent ED visit among those classified as low risk |
| Odds ratio | Odds ratio of ED visit among high-risk encounters vs. low-risk encounters |
ED emergency department.
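All of the metrics defined above follow from a single 2×2 confusion table of risk classification versus observed ED visit. A minimal Python sketch, using purely hypothetical counts (not the study's data), with the odds-ratio confidence interval computed via the standard Wald approximation on the log scale:

```python
import math

def classification_metrics(tp, fp, fn, tn):
    """Compute the table's metrics from 2x2 confusion-table counts.

    tp: high-risk encounters followed by an ED visit
    fp: high-risk encounters with no ED visit
    fn: low-risk encounters followed by an ED visit
    tn: low-risk encounters with no ED visit
    """
    total = tp + fp + fn + tn
    prevalence = (tp + fn) / total        # observed 60-day ED visit rate
    high_risk_rate = (tp + fp) / total    # proportion classified "high risk"
    sensitivity = tp / (tp + fn)          # aka recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)                  # positive predictive value
    npv = tn / (tn + fn)                  # negative predictive value
    odds_ratio = (tp * tn) / (fp * fn)
    # Wald 95% CI for the odds ratio, computed on the log scale
    se_log_or = math.sqrt(1/tp + 1/fp + 1/fn + 1/tn)
    or_ci = (math.exp(math.log(odds_ratio) - 1.96 * se_log_or),
             math.exp(math.log(odds_ratio) + 1.96 * se_log_or))
    return {"prevalence": prevalence, "high_risk": high_risk_rate,
            "sens": sensitivity, "spec": specificity,
            "ppv": ppv, "npv": npv, "or": odds_ratio, "or_95ci": or_ci}

# Hypothetical counts for illustration only
m = classification_metrics(tp=50, fp=150, fn=200, tn=1600)
```

Note that PPV and NPV depend on the outcome prevalence, which is why they can differ between the retrospective (encounter-level) and prospective (patient-level) evaluations even when sensitivity and specificity are similar.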
Fig. 2. Prospective evaluation randomization approach.
ED emergency department. ᵃIf the patient’s predicted probability of risk is greater than the classification threshold, the patient will be classified as “high risk”; otherwise, the patient will be classified as “low risk”. The classification threshold is selected using retrospective data prior to the start of the prospective evaluation.
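The classification step described in the figure footnote reduces to thresholding a predicted probability; the abstract pre-defines high risk as a predicted probability ≥20%. A minimal sketch (the "meets or exceeds" boundary convention follows the abstract's "≥20%"; the footnote's "greater than" wording differs only at the exact boundary):

```python
THRESHOLD = 0.20  # pre-defined classification threshold from the abstract

def classify_risk(predicted_prob, threshold=THRESHOLD):
    """Label a patient 'high risk' when the model's predicted probability
    of a near-term (60-day) ED visit meets or exceeds the threshold."""
    return "high risk" if predicted_prob >= threshold else "low risk"
```

In a deployment like the one described, the threshold would be fixed from retrospective data before the prospective evaluation begins, so that the high-risk rate and PPV/NPV observed prospectively are an honest test of the frozen model plus threshold.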