Jenna Wong, Daniel Prieto-Alhambra, Peter R Rijnbeek, Rishi J Desai, Jenna M Reps, Sengwee Toh.
Abstract
Increasing availability of electronic health databases capturing real-world experiences with medical products has garnered much interest in their use for pharmacoepidemiologic and pharmacovigilance studies. The traditional practice of having numerous groups use single databases to accomplish similar tasks and address common questions about medical products can be made more efficient through well-coordinated multi-database studies, greatly facilitated through distributed data network (DDN) architectures. Access to larger amounts of electronic health data within DDNs has created a growing interest in using data-adaptive machine learning (ML) techniques that can automatically model complex associations in high-dimensional data with minimal human guidance. However, the siloed storage and diverse nature of the databases in DDNs create unique challenges for using ML. In this paper, we discuss opportunities, challenges, and considerations for applying ML in DDNs for pharmacoepidemiologic and pharmacovigilance studies. We first discuss major types of activities performed by DDNs and how ML may be used. Next, we discuss practical data-related factors influencing how DDNs work in practice. We then combine these discussions and jointly consider how opportunities for ML are affected by practical data-related factors for DDNs, leading to several challenges. We present different approaches for addressing these challenges and highlight efforts that real-world DDNs have taken or are currently taking to help mitigate them. Despite these challenges, the time is ripe for the emerging interest to use ML in DDNs, and the utility of these data-adaptive modeling techniques in pharmacoepidemiologic and pharmacovigilance studies will likely continue to increase in the coming years.
Year: 2022 PMID: 35579813 PMCID: PMC9112258 DOI: 10.1007/s40264-022-01158-3
Source DB: PubMed Journal: Drug Saf ISSN: 0114-5916 Impact factor: 5.228
Fig. 1 Key domains of activities performed by distributed data networks conducting studies in pharmacoepidemiology and pharmacovigilance. On the left, a schematic of a generic distributed data network is shown, where data partners do not pool their databases and instead maintain full control over the use and sharing of their data with the analysis center. The gray rectangles represent key domains of activities performed by distributed data networks that conduct studies in pharmacoepidemiology and pharmacovigilance
Fig. 2 Practical data-related factors of distributed data networks. These three data-related factors influence how distributed data networks operate in practice. For each factor, distributed data networks may theoretically fall anywhere along the spectrum between the two extremes
Four select scenarios of distributed data networks
| Scenario | Modality of source data | Degree of data standardization | Granularity of shared data |
|---|---|---|---|
| 1 (base case) | Structured data only | Common data model for all inputs | Individual-level data for all sites |
| 2 (less standardized data available) | Structured data only | | Individual-level data for all sites |
| 3 (more complex data modalities used) | | Common data model for all inputs | Individual-level data for all sites |
| 4 (less granular data shared) | Structured data only | Common data model for all inputs | |
Scenario 1 represents the simplest and most straightforward case for applying machine learning in distributed data networks. Scenarios 2–4 each deviate from the base case with respect to characteristics under one of the three practical data-related factors, as indicated in bold
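To make the "granularity of shared data" factor concrete, the following is a minimal sketch of one way a network could fit a model when sites share only aggregate statistics rather than individual-level rows. It assumes a one-covariate linear regression, for which five site-level totals are sufficient to reproduce the pooled least-squares fit exactly. The names `SiteSummary`, `summarize_site`, and `fit_from_summaries` and the toy data are hypothetical illustrations, not methods or data taken from the paper or from any particular network.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class SiteSummary:
    """Aggregate statistics a site shares instead of patient-level rows."""
    n: int
    sum_x: float
    sum_y: float
    sum_xx: float
    sum_xy: float


def summarize_site(x: Sequence[float], y: Sequence[float]) -> SiteSummary:
    """Run locally at a data partner; only the five totals leave the site."""
    return SiteSummary(
        n=len(x),
        sum_x=sum(x),
        sum_y=sum(y),
        sum_xx=sum(xi * xi for xi in x),
        sum_xy=sum(xi * yi for xi, yi in zip(x, y)),
    )


def fit_from_summaries(summaries: List[SiteSummary]) -> Tuple[float, float]:
    """Run at the analysis center: least-squares (intercept, slope)."""
    n = sum(s.n for s in summaries)
    sx = sum(s.sum_x for s in summaries)
    sy = sum(s.sum_y for s in summaries)
    sxx = sum(s.sum_xx for s in summaries)
    sxy = sum(s.sum_xy for s in summaries)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


# Hypothetical toy data for two sites (not from any real network).
site_a = ([1.0, 2.0, 3.0], [2.1, 3.9, 6.2])
site_b = ([4.0, 5.0], [8.1, 9.8])

summaries = [summarize_site(*site_a), summarize_site(*site_b)]
intercept, slope = fit_from_summaries(summaries)
```

Because the totals add across sites, the result matches a fit on the pooled rows. Most data-adaptive ML methods have no such compact set of sufficient statistics, which is one reason reduced data granularity is harder to accommodate for ML than for simple regression.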
Select studies involving the use of machine learning in distributed data networks
| Objective | Study setting | Use of machine learning | Main findings |
|---|---|---|---|
| To compare the effectiveness and safety of three bariatric procedures: RYGB, SG, and AGB [ | 41 US health systems in the National Patient-Centered Clinical Research Network ( | • LASSO used to simultaneously select features and estimate parameters for propensity score models • As de-identified individual-level datasets were shared with the analysis center, site-specific effects of covariates on propensity scores were allowed by including interactions between site and all covariates in the feature selection process for propensity score models • Propensity score deciles used for confounding adjustment in the association of bariatric procedure type with the study outcomes | • RYGB associated with greater weight loss than SG or AGB at 1, 3, and 5 years post-procedure, but RYGB had the highest 30-day rate of major adverse events • At 5 years, estimated percent total weight loss for RYGB patients was 6.7 (95% CI 5.8–7.7) percentage points greater than SG patients and 13.9 (95% CI 12.4–15.4) percentage points greater than AGB patients • 30-day rate of major adverse events was 5.0% for RYGB vs 2.6% for SG patients (OR 1.57, 95% CI 1.40–1.77) and 2.9% for AGB patients (OR 1.66, 95% CI 1.28–2.16) |
| To develop and validate a phenotyping algorithm for anaphylaxis [ | KPWA ( | • 5 machine learning algorithms (logistic regression, elastic net, BART, feed-forward neural network and boosted trees) used to predict the probability of being an anaphylaxis case • Ensemble learner containing a weighted combination of the machine learning algorithms also considered • Candidate features manually curated from structured EHR data and unstructured clinical text • 3 feature selection approaches explored • 3 feature sets explored to determine the added value of including features derived from unstructured text • All phenotyping algorithms developed and internally validated at KPWA; transported and externally validated at KPNW | • Adding features derived from unstructured clinical text improved the performance of phenotyping algorithms compared with using features from structured data alone • At KPWA, BART based on features from structured data and clinical text, selected using LASSO, achieved the best performance (cross-validated AUC: 0.71) • BART based on features from structured data and clinical text with no feature selection generalized best to KPNW (cross-validated AUC: 0.70 at KPWA, 0.67 at KPNW) • Manual curation of NLP-derived features was extremely labor intensive; future work will explore semi-automated approaches for curating features from clinical text |
| To develop and validate a prognostic model predicting risk of hemorrhagic transformation within 30 days of an initial acute ischemic stroke [ | 11 databases from three countries (USA, Germany, and Japan) within the OHDSI network ( | • LASSO used to simultaneously select features and estimate parameters to predict risk of hemorrhagic transformation within 30 days of initial acute ischemic stroke • Candidate features created from structured EHR data within 3 lookback periods prior to the index date • Prognostic model developed and internally validated in 1 database; externally validated in each of the remaining 10 databases | • Of 169,967 candidate predictors considered, 612 selected for the final model • In the development database ( • Across the remaining 10 databases ( |
| To develop and validate a machine learning model to predict cause of death (within 60 days) from a patient’s last medical check-up [ | 2 databases from South Korea with cause-of-death data ( | • Two-level stacking ensemble used to predict cause of death within 60 days, which consisted of a meta-learner that used the outputs from a collection of base learners as inputs to make the final prediction • The base learners consisted of 2 machine learning algorithms (LASSO and gradient boosting machine) that predicted each of 9 outcomes (mortality status and 8 causes of death), for a total of 18 base learners • Candidate features created from claims data within 3 lookback periods prior to the index date • Stacking ensemble developed and internally validated in 1 South Korean database (claims); externally validated in the other South Korean database (EHR) • Stacking ensemble also used to impute cause of death in the 3 US databases | • In the development database ( • In 1 US database with mortality status, the AUCs of the 2 base learners predicting mortality status were both 0.98, but the top 3 causes of death imputed by the stacking ensemble differed from the known top-ranked causes of mortality in the USA; these discrepancies suspected to be at least partly attributable to differences between the countries in the definition of heart disease death |
| To develop and validate a prognostic model predicting 1-year risk of incident heart failure in patients with type 2 diabetes mellitus initiating a second pharmacotherapy for type 2 diabetes [ | 5 US databases ( | • LASSO used to simultaneously select features and estimate parameters to predict 1-year risk of incident heart failure • 2 feature sets evaluated (age and sex, all features) • Each database developed and internally validated 2 prognostic models (1 per feature set); remaining 4 databases externally validated the site-specific models | • Internal validation of site-specific models had an AUC range from 0.64 to 0.71 for baseline (age and sex) models and an AUC range from 0.73 to 0.81 for full models • Among full models, external validation of 3 site-specific models consistently achieved comparable performance across all other databases • Using a heatmap to visualize the internal and external performance of the site-specific models across all databases offers valuable insights |
| To predict clinical outcomes in patients with COVID-19 who present to the emergency department [ | 20 clinical institutes from various regions around the world ( | • Federated learning used to train a deep learning model to predict the EXAM (electronic medical record chest X-ray AI model) risk score, a continuous value from 0 to 1, with higher values denoting greater oxygen requirements • Model inputs included 20 features, 19 derived from the electronic medical record data and 1 chest X-ray image, where electronic medical record and image data were concatenated into a single high-dimensional feature vector • Minimal efforts directed at harmonizing data across sites • Global and local models trained; evaluated on held-out test data at each site | • For predicting 24-hour oxygen treatment, global model outperformed all local models: average AUC for the global vs locally trained models was 0.92 vs 0.80 (16% improvement), and AUC for global model also provided an average increase in generalizability of 38% compared with AUC for locally trained models • At the largest site, global model achieved sensitivity of 0.95 and specificity of 0.88 for predicting mechanical ventilation treatment or death at 24 hours |
AGB: adjustable gastric banding; AUC: area under the receiver operating characteristic curve (possible values from 0 to 1, values closer to 1 indicate better performance); BART: Bayesian additive regression trees; CI: confidence interval; EHR: electronic health record; KPNW: Kaiser Permanente Northwest; KPWA: Kaiser Permanente Washington; LASSO: Least Absolute Shrinkage and Selection Operator; NLP: natural language processing; OHDSI: Observational Health Data Sciences and Informatics; OR: odds ratio; RYGB: Roux-en-Y gastric bypass; SG: sleeve gastrectomy
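The federated learning approach in the last row of the table can be sketched in miniature. The example below is a simplified illustration of federated averaging, assuming a plain logistic regression trained by gradient descent rather than the deep learning model used in the COVID-19 study. Each site updates the current global weights on its own data, and the analysis center combines the returned weights in proportion to site sample size, so no individual-level rows leave a site. All function names, hyperparameters, and data here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Dataset = Tuple[List[List[float]], List[int]]  # (feature rows, binary labels)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def local_update(weights: Sequence[float], data: Dataset,
                 lr: float = 0.1, epochs: int = 5) -> List[float]:
    """Run at a site: a few epochs of gradient descent on local data only."""
    w = list(weights)
    rows, labels = data
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w


def federated_average(site_weights: List[List[float]],
                      site_sizes: List[int]) -> List[float]:
    """Run at the analysis center: sample-size-weighted mean of site weights."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(dim)
    ]


def train_global(sites: List[Dataset], dim: int, rounds: int = 50) -> List[float]:
    """Alternate local training and central averaging for a fixed number of rounds."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in sites]
        global_w = federated_average(updates, [len(d[1]) for d in sites])
    return global_w


# Hypothetical toy data: each row is [1.0 (intercept term), feature value].
site_1: Dataset = ([[1.0, -2.0], [1.0, -1.0], [1.0, 1.5], [1.0, 2.0]], [0, 0, 1, 1])
site_2: Dataset = ([[1.0, -1.5], [1.0, 0.5], [1.0, 2.5]], [0, 1, 1])

global_weights = train_global([site_1, site_2], dim=2)
```

In this sketch only weight vectors and sample sizes cross site boundaries. A real deployment would also need secure communication, a convergence criterion, and held-out evaluation at each site, none of which are shown here.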
| Many opportunities exist for distributed data networks (DDNs) to use machine learning in pharmacoepidemiologic and pharmacovigilance studies; however, the practical data-related characteristics of DDNs also create unique challenges for applying machine learning. |
| In this review, we discuss various challenges that DDNs face when applying machine learning and present different approaches for addressing these challenges, including issues for consideration and examples of how real-world DDNs have addressed or are working to help mitigate these challenges. |
| The use of machine learning in DDNs is an emerging area of interest that holds much promise, and the utility of these data-adaptive modeling methods for enhancing pharmacoepidemiologic and pharmacovigilance studies will likely continue to increase in the coming years. |