Xinpeng Shen, Sisi Ma, Prashanthi Vemuri, M Regina Castro, Pedro J Caraballo, Gyorgy J Simon.
Abstract
Modern AI-based clinical decision support models owe their success in part to the very large number of predictors they use. Safe and robust decision support, especially for intervention planning, requires causal, not associative, relationships. Traditional methods of causal discovery, such as clinical trials and the extraction of biochemical pathways, are resource intensive and may not scale to the number and complexity of relationships needed for precision treatment planning. Computational causal structure discovery (CSD) from electronic health record (EHR) data could offer a solution; however, current CSD methods fall short on EHR data. This paper presents a CSD method tailored to EHR data. The methodology was demonstrated on type 2 diabetes mellitus. A large EHR dataset from Mayo Clinic served as the development cohort, and another large dataset from an independent health system, M Health Fairview, served as the external validation cohort. The proposed method achieved very high recall (.95) and substantially higher precision than the general-purpose methods (.84 versus .29 and .55). The causal relationships extracted from the development and external validation cohorts overlapped substantially (81%). Because of its adaptations to EHR data, the proposed method is more suitable for use in clinical decision support than the general-purpose methods.
Year: 2021 PMID: 34697394 PMCID: PMC8546093 DOI: 10.1038/s41598-021-99990-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Study design and evaluations. (A) Overview of the study design for the Mayo Clinic (MC) EHR. (B) Workflow of the internal evaluation. Three methods (FGES + raw, FGES + transf, and the proposed algorithm) were compared in terms of stability, precision, and recall. The proposed method (Method 3) is highlighted in orange. (C) Workflow of the external comparison. The proposed method was applied to two datasets, MC and M Health Fairview (FV), and the resulting graphs were compared.
Table 1. Characteristics of the MC and FV data sets.
| Characteristic | MC (N = 57,332): events in window 1 | MC: new events in window 2 | FV (N = 79,486): events in window 1 | FV: new events in window 2 | P-value |
|---|---|---|---|---|---|
| Age | 48.1 (18.2) | | 50.4 (14.6) | | 0.000 |
| Male | 0.43 | | 0.34 | | 0.000 |
| Ethnicity white | 0.92 | | 0.93 | | 0.000 |
| BMI ≥ 25 and < 30 | 27.1 | 2.9 | 27.5 | 3.4 | 0.097 |
| BMI ≥ 30 | 32.6 | 3.6 | 43.1 | 4.9 | 0.000 |
| SBP ≥ 140 | 10.3 | 3.4 | 4.5 | 2.9 | 0.000 |
| DBP ≥ 90 | 2.3 | 1.0 | 1.6 | 1.2 | 0.000 |
| LDL ≥ 130 | 18.4 | 3.6 | 15.4 | 4.3 | 0.000 |
| HDL abnormal | 20.2 | 1.7 | 24.6 | 3.0 | 0.000 |
| Trigl ≥ 150 | 22.6 | 3.7 | 17.6 | 4.3 | 0.000 |
| FPG ≥ 100 and < 125 | 24.4 | 7.2 | |||
| FPG ≥ 125 | 11.9 | 3.7 | |||
| A1c ≥ 5.7 and A1c < 6.5 | 6.8 | 0.6 | |||
| A1c ≥ 6.5 | 7.0 | 0.9 | |||
| Hypertension (HTN) | 28.4 | 5.6 | 30.6 | 8.4 | 0.000 |
| Obesity (Ob) | 11.5 | 1.2 | 11.3 | 1.3 | 0.320 |
| Hyperlipidemia (HL) | 31.9 | 8.3 | 36.4 | 9.4 | 0.000 |
| Pre-diabetes mellitus (predm) | 0.9 | 3.5 | 2.4 | 2.4 | 0.000 |
| Diabetes mellitus (DM) | 7.9 | 5.1 | 9.5 | 4.3 | 0.000 |
| Chronic renal failure (CRF) | 1.2 | 0.2 | 0.2 | 0.3 | 0.000 |
| Congestive heart failure (CHF) | 2.4 | 1.7 | 1.2 | 1.4 | 0.000 |
| Coronary artery disease (CAD) | 9.4 | 3.5 | 5.6 | 3.4 | 0.000 |
| Myocardial infarction (MI) | 2.4 | 1.2 | 0.9 | 1.6 | 0.000 |
| Cerebrovascular disease (CeVD) | 3.6 | 2.3 | 1.8 | 1.4 | 0.000 |
| Stroke | 1.2 | 1.1 | 0.6 | 1.0 | 0.000 |
| Hypertension | 20.6 | 8.3 | 31.5 | 13.9 | 0.000 |
| Hyperlipidemia | 15.7 | 8.0 | 24.6 | 9.1 | 0.000 |
| Diabetes mellitus | 4.4 | 2.4 | 7.2 | 4.3 | 0.000 |
For age, the mean (SD) is shown; for sex and ethnicity, the proportion of patients is shown; for disease-related events, the percentage (%) of patients with a positive event is shown. For the second time window, the rate of new events is reported.
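The table note distinguishes events present in window 1 from new events in window 2. The sketch below illustrates that bookkeeping under stated assumptions: each patient is represented as a pair of event-code sets (one per window), an event is "new" if it is recorded in window 2 but not in window 1, and the whole cohort is the denominator for both percentages. The data layout and the `window_rates` helper are hypothetical, not the authors' code.

```python
from typing import List, Set, Tuple

# Hypothetical layout: one (window-1 events, window-2 events) pair per patient.
Patient = Tuple[Set[str], Set[str]]


def window_rates(patients: List[Patient], event: str) -> Tuple[float, float]:
    """Return (% of patients with `event` in window 1, % with a new `event` in window 2).

    Assumptions: an event is "new" if it is recorded in window 2 but not in
    window 1, and both percentages use the whole cohort as the denominator.
    """
    n = len(patients)
    in_window1 = sum(1 for w1, _ in patients if event in w1)
    new_in_window2 = sum(1 for w1, w2 in patients if event in w2 and event not in w1)
    return 100.0 * in_window1 / n, 100.0 * new_in_window2 / n


# Toy cohort of four patients (not study data).
cohort = [
    ({"HTN.dx"}, {"HTN.dx"}),  # present in both windows: not new
    (set(), {"HTN.dx"}),       # first recorded in window 2: new
    (set(), set()),
    ({"DM.dx"}, set()),
]
print(window_rates(cohort, "HTN.dx"))  # (25.0, 25.0)
```

If the study instead used only patients free of the event in window 1 as the denominator for new events, the second division would change accordingly.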
BMI: body mass index; SBP: systolic blood pressure; DBP: diastolic blood pressure; LDL: low-density lipoprotein cholesterol; HDL: high-density lipoprotein cholesterol; Trigl: triglycerides; FPG: fasting plasma glucose; A1c: hemoglobin A1c.
Directional stability.
| Method | Number of distinct edges | Ambiguously oriented (%) |
|---|---|---|
| FGES + raw | 125 | 45 |
| FGES + transf | 75 | 24 |
| Proposed | 64 | 0 |
The table shows the number of distinct edges that appeared in at least half of the 1,000 bootstrap replications, and the percentage of those edges that were ambiguously oriented.
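Directional stability is obtained by counting how often each edge recurs across bootstrap replications. A minimal sketch, assuming each replication yields a set of directed edges `(cause, effect)`, that an edge is retained when it appears (in either direction) in at least half of the replications, and that a retained edge is ambiguously oriented when neither direction alone reaches that threshold. The helper `edge_stability` and this ambiguity rule are hypothetical; the paper's exact definition may differ.

```python
from collections import Counter
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # directed edge (cause, effect)


def edge_stability(replications: List[Set[Edge]], threshold: float = 0.5) -> Tuple[int, float]:
    """Return (number of retained edges, % of retained edges ambiguously oriented).

    Hypothetical definitions (not taken from the paper):
    - an edge is retained if it appears, in either direction, in at least
      `threshold` of the replications;
    - a retained edge is ambiguously oriented if neither direction alone
      reaches `threshold`.
    """
    b = len(replications)
    directed: Counter = Counter()
    undirected: Counter = Counter()
    for edges in replications:
        directed.update(edges)
        # Count each unordered pair at most once per replication.
        undirected.update({frozenset(e) for e in edges})
    retained = [pair for pair, count in undirected.items() if count / b >= threshold]
    ambiguous = 0
    for pair in retained:
        u, v = sorted(pair)
        if directed[(u, v)] / b < threshold and directed[(v, u)] / b < threshold:
            ambiguous += 1
    pct = 100.0 * ambiguous / len(retained) if retained else 0.0
    return len(retained), pct


# Toy bootstrap output over four replications (not study data).
reps = [
    {("A", "B"), ("C", "D")},
    {("A", "B"), ("D", "C")},
    {("A", "B")},
    {("B", "A")},
]
print(edge_stability(reps))  # (2, 50.0)
```

Counting unordered pairs separately from directed edges is what lets the C–D adjacency be retained while its orientation is flagged as unstable.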
Precision and recall evaluated against clinical evidence.
| Method | Precision | Associative recall | Causal recall |
|---|---|---|---|
| 1. FGES + raw | 0.294 | 1.000 | 1.000 |
| 2. FGES + transf | 0.549 | 0.985 | 1.000 |
| 3. Proposed | 0.838 | 1.000 | 0.947 |
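Precision and the two recall variants compare the discovered graph with reference relationships drawn from clinical evidence. The following is a hedged illustration, not the authors' evaluation code: it assumes a set `supported` of directed edges judged consistent with clinical evidence (for precision), a set `known_assoc` of known associated pairs (for associative recall), and a set `known_causal` of known directed causal relationships (for causal recall). Using separate reference sets is an assumption, but it is consistent with the table, where causal recall (1.000) can exceed associative recall (0.985).

```python
from typing import FrozenSet, Set, Tuple

Edge = Tuple[str, str]  # directed edge (cause, effect)


def evidence_metrics(
    discovered: Set[Edge],
    supported: Set[Edge],
    known_assoc: Set[FrozenSet[str]],
    known_causal: Set[Edge],
) -> Tuple[float, float, float]:
    """Return (precision, associative recall, causal recall).

    Hypothetical definitions:
    - precision: share of discovered edges that clinical evidence supports;
    - associative recall: share of known associated pairs that are adjacent
      in the discovered graph, in either direction;
    - causal recall: share of known causal edges discovered with the
      correct direction.
    """
    adjacent = {frozenset(e) for e in discovered}
    precision = len(discovered & supported) / len(discovered)
    assoc_recall = len(known_assoc & adjacent) / len(known_assoc)
    causal_recall = len(known_causal & discovered) / len(known_causal)
    return precision, assoc_recall, causal_recall


# Toy inputs (not study data).
discovered = {("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")}
supported = {("A", "B"), ("B", "C"), ("E", "F")}
known_assoc = {frozenset({"A", "B"}), frozenset({"D", "C"})}
known_causal = {("A", "B"), ("D", "C")}
print(evidence_metrics(discovered, supported, known_assoc, known_causal))
# (0.75, 1.0, 0.5)
```

In the toy example, C → D earns associative credit but no causal credit, because the reference relationship runs D → C.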
Figure 2. Causal graph discovered by the proposed method. The ‘.dx’ suffix denotes a diagnosis of the disease. Abbreviations of the diseases and lab results are listed in Table 1.
External validation.
| Edge | Discovery at MC (%) | Discovery at FV (%) | Reason for discrepancy |
|---|---|---|---|
| HDL → Trigl | 0 | 91.7 | There is no clear precedence relationship; the two events often coincide |
| HTN.dx → CRF | 88.5 | 0.1 | |
| Trigl → DM.dx | 100 | 0 | |
| Trigl → HL.tx | 100 | 0 | |
| LDL → HL.dx | 72.1 | 0 | |
| FPG.125 → DM.dx | 100 | 0 | FV uses A1c, not FPG |
| Trigl → FPG.125 | 99.5 | 0.2 | |
| DBP → HTN.tx | 91.5 | 0 | The criteria for diagnosis and treatment are institution specific |
| SBP → HL.tx | 99.3 | 1.7 | |
| SBP → HTN.tx | 100 | 29.1 | |
| Trigl → HTN.tx | 83.7 | 0 | |
| CHF → MI | 0 | 67.6 | SBP is a common cause of CHF and MI, but at FV this effect was too weak in 68% of the bootstrap iterations |
| HL.dx → Trigl | 0 | 87.6 | While the main driver of Trigl is BMI, at FV, the diagnosis of HL helps explain the variation in Trigl |
| HL.tx → CAD | 0 | 74.3 | LDL drives both HL treatment and CAD |
The ‘.tx’ suffix denotes treatment, and the ‘.dx’ suffix denotes a diagnosis of the disease. Abbreviations of the diseases and lab tests are listed in Table 1. The table lists the edges that were discordant between the Mayo Clinic (MC) and M Health Fairview (FV) data sets, showing for each edge the percentage of bootstrap iterations in which it was discovered at MC and at FV, and a brief reason for the discrepancy.
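The discordance table and the 81% overlap reported in the abstract both compare per-site bootstrap discovery frequencies. A sketch under explicit assumptions: each site's result is a mapping from directed edge to the percentage of bootstrap iterations that discovered it, an edge belongs to a site's graph when that percentage is at least 50, and overlap is a Jaccard index over the two edge sets. The threshold, the overlap formula, and `compare_sites` are hypothetical; the paper may define overlap differently (for example, relative to the development cohort's graph only).

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # directed edge (cause, effect)


def compare_sites(
    freq_a: Dict[Edge, float], freq_b: Dict[Edge, float], threshold: float = 50.0
) -> Tuple[List[Tuple[Edge, float, float]], float]:
    """Return (discordant edges with both discovery %, overlap %).

    Assumptions: an edge belongs to a site's graph if it was discovered in at
    least `threshold` % of that site's bootstrap iterations, and overlap is the
    Jaccard index |A & B| / |A | B|, expressed as a percentage.
    """
    graph_a = {e for e, f in freq_a.items() if f >= threshold}
    graph_b = {e for e, f in freq_b.items() if f >= threshold}
    # Edges in exactly one site's graph are discordant.
    discordant = [
        (e, freq_a.get(e, 0.0), freq_b.get(e, 0.0)) for e in sorted(graph_a ^ graph_b)
    ]
    union = graph_a | graph_b
    overlap = 100.0 * len(graph_a & graph_b) / len(union) if union else 100.0
    return discordant, overlap


# Two rows from the table above plus one hypothetical concordant edge X -> Y.
mc = {("HDL", "Trigl"): 0.0, ("SBP", "HTN.tx"): 100.0, ("X", "Y"): 95.0}
fv = {("HDL", "Trigl"): 91.7, ("SBP", "HTN.tx"): 29.1, ("X", "Y"): 98.0}
discordant, overlap = compare_sites(mc, fv)
for edge, pct_mc, pct_fv in discordant:
    print(f"{edge[0]} -> {edge[1]}: MC {pct_mc}%, FV {pct_fv}%")
print(f"overlap: {overlap:.1f}%")
```

A Jaccard index is symmetric between the two sites; a development-cohort-anchored definition would instead divide by the size of the MC graph alone.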