Gang Luo
Abstract
Using machine learning predictive models for clinical decision support has great potential to improve patient outcomes and reduce health care costs. However, most machine learning models are black boxes that do not explain their predictions, which forms a barrier to clinical adoption. To overcome this barrier, an automated method was recently developed to provide rule-style explanations of any machine learning model's predictions on tabular data and to suggest customized interventions. Each explanation delineates the association between a feature value pattern and an outcome value. Although the association and intervention information is useful, the user of the automated explaining function often needs more detailed information to better understand the patient's situation and to aid in decision making. More specifically, consider a feature value in the explanation that is computed by an aggregation function on the raw data, such as the number of emergency department visits related to asthma that the patient had in the prior 12 months. The user often wants to rapidly drill through to see certain parts of the related raw data that produce the feature value. This task is frequently difficult and time-consuming because the few pieces of related raw data are buried among many pieces of the patient's raw data that are unrelated to the feature value. To address this issue, this paper outlines an automated lineage tracing approach, which adds automated drill-through capability to the automated explaining function, and provides a roadmap for future research. ©Gang Luo. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 27.05.2021.
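The example feature is a count over filtered encounter rows, so its lineage (the raw data a drill-through would show) is simply the set of rows that pass the filter. The Python sketch below illustrates this under stated assumptions: the `Encounter` record, the `"Emergency"` visit-type value, the rule that an encounter is asthma-related if any of its ICD-10 codes falls in category J45, and the 365-day approximation of "prior 12 months" are all hypothetical choices for illustration, not definitions taken from the paper.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class Encounter:
    visit_date: date
    visit_type: str                   # hypothetical values, e.g. "Emergency", "Outpatient"
    diagnosis_codes: Tuple[str, ...]  # all ICD-10 codes on the encounter, primary first


def is_asthma_related(enc: Encounter) -> bool:
    # Hypothetical rule: any diagnosis code in ICD-10 category J45 (asthma).
    return any(code.startswith("J45") for code in enc.diagnosis_codes)


def asthma_ed_visits_with_lineage(
    encounters: List[Encounter], index_date: date, window_days: int = 365
) -> Tuple[int, List[Encounter]]:
    """Return the feature value and its lineage (the encounters it counts).

    "Prior 12 months" is approximated as the window_days days before index_date.
    """
    start = index_date - timedelta(days=window_days)
    lineage = [
        enc
        for enc in encounters
        if enc.visit_type == "Emergency"
        and start <= enc.visit_date < index_date
        and is_asthma_related(enc)
    ]
    # The aggregate is a plain count, so the value is the size of the lineage.
    return len(lineage), lineage


# Usage with hypothetical encounters (not real patient data).
encounters = [
    Encounter(date(2020, 12, 20), "Outpatient", ("R05", "J45.909")),  # not an ED visit
    Encounter(date(2020, 9, 14), "Emergency", ("B34.9", "J45.901")),
    Encounter(date(2020, 5, 5), "Emergency", ("S93.401A",)),          # not asthma-related
    Encounter(date(2020, 3, 2), "Emergency", ("R55", "J45.909")),
    Encounter(date(2019, 11, 1), "Emergency", ("J45.901",)),          # outside the window
]
count, rows = asthma_ed_visits_with_lineage(encounters, index_date=date(2021, 1, 1))
```

Returning the contributing rows together with the count is one simple way to support drill-through for a select-then-count feature; it does not cover the attribute selection, ordering, or semantic filtering that the paper lists as additional requirements.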
Keywords: clinical decision support; database management systems; electronic medical records; forecasting; machine learning
Year: 2021 PMID: 34042600 PMCID: PMC8193496 DOI: 10.2196/27778
Source DB: PubMed Journal: JMIR Med Inform
Table 1. An example list of encounters of a patient with asthma displayed on the standard interface of an electronic medical record system.[a]

| Visit date | Primary diagnosis[b] | Visit type | Department | Provider | Facility |
| --- | --- | --- | --- | --- | --- |
| Dec 20, 2020 | Cough (R05) | Outpatient | HMC[c] family medicine clinic | John Smith | HMC |
| Dec 18, 2020 | Dysphagia, unspecified (R13.10) | Outpatient | HMC family medicine clinic | David Wong | HMC |
| … | … | … | … | … | … |
| Oct 15, 2020 | Cystitis, unspecified without hematuria (N30.90) | Inpatient | UWMC[d] 8SE | Leslie Hurdle | UWMC |
| *Oct 12, 2020*[e] | *Viral infection, unspecified (B34.9)* | | *HMC HEDUCC*[f] | *Patricia Sward* | *HMC* |
| Oct 09, 2020 | Dizziness and giddiness (R42) | Outpatient | HMC family medicine clinic | Eve Johnson | HMC |
| … | … | … | … | … | … |
| Feb 11, 2020 | Posttraumatic stress disorder, unspecified (F43.10) | Outpatient | HMC psychotherapy clinic | Amy Jiang | HMC |
| *Feb 08, 2020* | *Syncope and collapse (R55)* | | *HMC HEDUCC* | *Peter Shavlik* | *HMC* |
| Feb 03, 2020 | Headache, unspecified (R51.9) | Outpatient | HMC family medicine clinic | Jude Lake | HMC |
| … | … | … | … | … | … |

[a] This example list is modeled on a similar list seen in real electronic medical record data at University of Washington Medicine.
[b] This column does not appear on the standard interface; it is included here because it is discussed in this paper.
[c] HMC: Harborview Medical Center.
[d] UWMC: University of Washington Medical Center.
[e] For the feature “number of emergency department visits related to asthma that the patient had in the prior 12 months,” whose value here is 2, the rows of the list that produce this value are shown in italics.
[f] HEDUCC: Harborview Emergency Department Urgent Care Center.
An example of the parts of the related raw data that should be displayed for a feature value.[a]

| Visit date | Primary diagnosis | Department | Provider | Facility |
| --- | --- | --- | --- | --- |
| Oct 12, 2020 | Viral infection, unspecified (B34.9) | HMC[b] HEDUCC[c] | Patricia Sward | HMC |
| Feb 08, 2020 | Syncope and collapse (R55) | HMC HEDUCC | Peter Shavlik | HMC |

[a] For the example list shown in Table 1 and the feature “number of emergency department visits related to asthma that the patient had in the prior 12 months,” whose value is 2, these are the parts of the related raw data producing the feature value that the user of the automated explaining function wants to see.
[b] HMC: Harborview Medical Center.
[c] HEDUCC: Harborview Emergency Department Urgent Care Center.
Figure 1. The flow chart for building a clinical machine learning predictive model on the training data, making predictions on the new data, and using our automated method to explain the model’s predictions.
Figure 2. A logical query plan for the select-project-join-aggregate query Q given in the “Intermediate result tables” section.
Figure 3. The high-level logical query plan for computing the unified data frame that contains all the features of the new data. SQL: structured query language.
Figure 4. The hierarchy of intermediate materialized views matching the canonical form of the logical query plan for the definition query of the materialized view enc_features_3_view.
The 5 unique requirements of automated lineage tracing for automatically explaining machine learning predictions for clinical decision support.

| Requirement | Reason for posing the requirement |
| --- | --- |
| Retrieving only a small set of attributes | To prevent the user from being overwhelmed by many nonessential or irrelevant attributes |
| Adding some essential attributes that do not directly produce the feature value | To make the retrieved lineage information include the most essential content |
| Sorting the retrieved lineage information in an appropriate order | To make the retrieved lineage information easy to scan |
| Computing the lineage information based on the semantic meaning of the feature | To avoid including irrelevant or nonessential source tuples in the retrieved lineage information |
| Performing no lineage tracing for any health care system feature value computed by an aggregation function | To avoid including irrelevant data in the retrieved lineage information |
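These requirements can be read as a per-feature presentation policy applied to lineage rows. The sketch below is a hypothetical illustration: `LineageSpec`, `present_lineage`, and the attribute names are invented here, and the rows passed in are assumed to have already been selected according to the feature's semantic meaning (requirement 4). The column set in the usage lines mirrors the example display of related raw data above, which omits the visit type column.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]


@dataclass(frozen=True)
class LineageSpec:
    # Requirement 1: the small set of attributes shown to the user.
    display_attributes: Tuple[str, ...]
    # Requirement 2: essential attributes that do not directly produce the feature value.
    essential_attributes: Tuple[str, ...] = ()
    # Requirement 3: the order in which lineage rows are listed.
    sort_key: str = "visit_date"
    descending: bool = True
    # Requirement 5: the feature is aggregated over the health care system, not one patient.
    system_level: bool = False


def present_lineage(source_rows: List[Row], spec: LineageSpec) -> Optional[List[Row]]:
    """Project and sort lineage rows; return None when no lineage tracing applies.

    source_rows are assumed to be pre-selected by the feature's semantics (requirement 4).
    """
    if spec.system_level:
        return None  # Requirement 5: skip lineage tracing entirely.
    columns = list(spec.display_attributes) + [
        a for a in spec.essential_attributes if a not in spec.display_attributes
    ]
    ordered = sorted(source_rows, key=lambda r: r[spec.sort_key], reverse=spec.descending)
    # Keep only the chosen columns, in the chosen order.
    return [{c: row[c] for c in columns} for row in ordered]


# Usage with hypothetical lineage rows (not real patient data).
rows = [
    {"visit_date": date(2020, 3, 2), "visit_type": "Emergency", "primary_diagnosis": "R55",
     "department": "ED", "provider": "Provider B", "facility": "Facility X"},
    {"visit_date": date(2020, 9, 14), "visit_type": "Emergency", "primary_diagnosis": "B34.9",
     "department": "ED", "provider": "Provider A", "facility": "Facility X"},
]
spec = LineageSpec(
    display_attributes=("visit_date",),
    essential_attributes=("primary_diagnosis", "department", "provider", "facility"),
)
shown = present_lineage(rows, spec)
```

Returning `None` rather than an empty list keeps "no tracing performed" distinguishable from "tracing found no rows."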