Rui Duan, Mary Regina Boland, Jason H Moore, Yong Chen.
Abstract
Electronic Health Records (EHR) contain extensive information on various health outcomes and risk factors, and have therefore been broadly used in healthcare research. Integrating EHR data from multiple clinical sites can accelerate knowledge discovery and risk prediction by providing a larger sample from a more general population, which potentially reduces clinical bias and improves estimation and prediction accuracy. To overcome the barrier of patient-level data sharing, distributed algorithms have been developed to conduct statistical analyses across multiple sites by sharing only aggregated information. Current distributed algorithms often require iterative evaluation and transfer of information across sites, which can lead to high communication costs in practical settings. In this study, we propose a privacy-preserving and communication-efficient distributed algorithm for logistic regression that does not require iterative communication across sites. Our simulation study showed that the algorithm reached accuracy comparable to that of the oracle estimator, for which data are pooled together. We applied our algorithm to EHR data from the University of Pennsylvania health system to evaluate the risks of fetal loss due to various medication exposures.
Year: 2019 PMID: 30864308 PMCID: PMC6417819
Source DB: PubMed Journal: Pac Symp Biocomput ISSN: 2335-6928
Demographics of Pregnancies Treated at UPenn Health System
| Demographics | Normal Pregnancy (N=30,810) | Fetal Loss (M=4,763) | P-value |
|---|---|---|---|
| Race | | | |
| White | 13911 (45.2%) | 2291 (48.1%) | |
| African American | 12918 (41.9%) | 1871 (39.3%) | |
| Other | 1916 (6.2%) | 274 (5.8%) | |
| Asian | 2065 (6.7%) | 327 (6.9%) | |
| Age | 29.40 | 32.15 | <0.001 |
| Weight (pounds) | 126.26 | 115.43 | <0.001 |
| Body Mass Index | 19.06 | 16.61 | <0.001 |
For race, we used only a binary variable (white versus non-white).
Figure 2: Mean square errors (MSE) of ODAL, the pooled estimator and the local estimator under settings A, B, C and D. In setting A (upper left panel), we evenly divide N subjects into 10 sub-datasets and increase N from 1000 to 10000. In setting B (upper right panel), each site contains 1000 subjects and the number of sites K is increased from 2 to 100. In setting C (lower left panel), we generate 10000 subjects and evenly divide them into K sub-datasets, where K increases from 2 to 100. In setting D (lower right panel), we generate 10000 subjects and divide them into 10 sub-datasets, where the local dataset has n subjects and the other 9 sub-datasets have equal numbers of subjects. We increase n from 100 to 9900.
Figure 3: Odds ratio estimates from the ODAL method (red triangles) and the pooled data (blue circles) for 100 medications and their associations with fetal loss. The 100 medications from left to right are sorted by their prevalence in the population.
Figure 4: Odds ratio estimates from ODAL and the pooled estimator for the top 10 medications positively associated (left panel) and negatively associated (right panel) with fetal loss. In the left panel, the ten medications are misoprostol, acetaminophen codeine, doxycycline hyclate, oxycodone acetaminophen, ibuprofen, levonorgestrel, medroxyprogesterone acetate, etonogestrel ethinyl estradiol, hydrochlorothiazide and norelgestromin eth estradiol. In the right panel, the ten acronyms refer to: prenatal vitamins (without vit. A) with DHA, iron, folic acid and docusate sodium; prenatal vitamins with iron fumarate and folic acid; prenatal vitamin with folic acid and DHA; DHA; prenatal vitamins with iron, sulfate, and folic acid; prenatal vitamin (without vit. A) with DHA, folic acid, extra iron and docusate sodium; prenatal vitamins; prenatal multi-vitamin with folic acid and minimum iron; prenatal vitamins with iron, docusate sodium, and folic acid; metoclopramide hcl. The letter on each medication shows the FDA-assigned pregnancy category, where A, B and C indicate no or unknown risk, and D and X indicate risk. N means the medication has not been assigned a category. Detailed interpretations can be found at https://chemm.nlm.nih.gov/pregnancycategories.htm.
| Algorithm: ODAL |
|---|
| 1. Initial value: obtain |
| 2. Initial communication: transfer |
| 3. For j = 2 to |
| 4. |
| 5. transfer |
| 6. |
| 7. Compute the surrogate likelihood |
| 8. obtain |
| 9. |
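
The formulas in the algorithm steps above did not survive extraction, so the following is only a minimal sketch of a one-shot distributed logistic regression in the spirit of those steps, not the authors' implementation. It assumes a first-order surrogate likelihood: the local log-likelihood plus a linear term built from the gradients that the other sites evaluate at the local initial estimate. All function names, the simulated data, and the plain Newton solver are hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_loglik(beta, X, y):
    """Gradient of the average logistic log-likelihood at beta."""
    return X.T @ (y - sigmoid(X @ beta)) / len(y)

def hess_loglik(beta, X):
    """Hessian of the average logistic log-likelihood at beta."""
    p = sigmoid(X @ beta)
    return -(X.T * (p * (1.0 - p))) @ X / len(p)

def newton_maximize(X, y, shift=None, n_iter=25):
    """Maximize (average log-likelihood + shift^T beta) by Newton's method."""
    beta = np.zeros(X.shape[1])
    shift = np.zeros_like(beta) if shift is None else shift
    for _ in range(n_iter):
        g = grad_loglik(beta, X, y) + shift
        H = hess_loglik(beta, X)  # the linear shift does not change the Hessian
        beta = beta - np.linalg.solve(H, g)
    return beta

def one_shot_estimate(sites):
    """sites[0] is the local site; the others share only a gradient and a sample size."""
    X1, y1 = sites[0]
    # Initial value: local maximum likelihood estimate.
    beta_bar = newton_maximize(X1, y1)
    # One round of communication: each site evaluates its gradient at beta_bar.
    grads = [(len(y), grad_loglik(beta_bar, X, y)) for X, y in sites]
    n_total = sum(n for n, _ in grads)
    global_grad = sum(n * g for n, g in grads) / n_total
    # Assumed surrogate: L1(beta) + (global_grad - grad L1(beta_bar))^T beta.
    shift = global_grad - grad_loglik(beta_bar, X1, y1)
    beta_tilde = newton_maximize(X1, y1, shift=shift)
    return beta_bar, beta_tilde

def simulate_sites(n_sites, n_per_site, beta_true, rng):
    """Hypothetical simulated data: intercept plus standard normal covariates."""
    sites = []
    for _ in range(n_sites):
        covariates = rng.normal(size=(n_per_site, len(beta_true) - 1))
        X = np.column_stack([np.ones(n_per_site), covariates])
        y = rng.binomial(1, sigmoid(X @ beta_true)).astype(float)
        sites.append((X, y))
    return sites

rng = np.random.default_rng(0)
beta_true = np.array([-1.0, 0.5, -0.5])
sites = simulate_sites(10, 1000, beta_true, rng)
beta_bar, beta_tilde = one_shot_estimate(sites)

# Pooled ("oracle") estimator, available only when patient-level data can be combined.
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
beta_pooled = newton_maximize(X_all, y_all)

print("local   :", beta_bar)
print("one-shot:", beta_tilde)
print("pooled  :", beta_pooled)
```

In this sketch each non-local site sends a single gradient vector and its sample size, so no patient-level records leave a site and no iteration across sites is needed. Comparing the printed one-shot and pooled estimates gives a rough, single-replicate analogue of the MSE comparison described for Figure 2.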