Scott D Constable, Yuzhe Tang, Shuang Wang, Xiaoqian Jiang, Steve Chapin.
Abstract
BACKGROUND: The biomedical community benefits from the increasing availability of genomic data to support meaningful scientific research, e.g., Genome-Wide Association Studies (GWAS). However, a high-quality GWAS usually requires a large number of samples, which can exceed the capacity of a single institution. Federated genomic data analysis holds the promise of enabling cross-institution collaboration for effective GWAS, but because data are exchanged across institutional boundaries, it raises concerns about patient privacy and the confidentiality of medical information, and these concerns inhibit its practical use.
Year: 2015 PMID: 26733045 PMCID: PMC4699163 DOI: 10.1186/1472-6947-15-S5-S2
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Figure 1. The system workflow.
Figure 2. PCF function to compute minor allele frequency (MAF). Here x and y are the inputs for one genotype, read in from Alice's and Bob's data, respectively. The low-order 16 bits of each input correspond to a control group, whereas the high-order 16 bits correspond to a case group. We begin by computing the aggregated count of the lexicographically lower (e.g. 'A' < 'G') nucleotide for each group across both parties. lowCt denotes this count for the aggregated control group, and likewise highCt for the aggregated case group. The counts of the lexicographically higher nucleotide are simply the total counts minus the lower ones. We then decide which count is smaller, and thus must represent the minor allele. Finally we perform a floating point adjustment, divide to obtain the frequency, and output the case and control MAFs to the terminal.
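The arithmetic described in the Figure 2 caption can be sketched in plain Python. This is only a plaintext illustration of the counting logic, not the PCF function itself and not a secure computation. The function names, the `pack_counts` helper, and the `case_total` and `control_total` parameters are assumptions introduced here; the caption does not say how the total counts are supplied.

```python
def unpack_counts(word):
    """Split a 32-bit input word into (control_count, case_count)."""
    control = word & 0xFFFF          # low-order 16 bits: control group
    case = (word >> 16) & 0xFFFF     # high-order 16 bits: case group
    return control, case


def pack_counts(control, case):
    """Hypothetical helper: pack two 16-bit counts into one 32-bit word."""
    return ((case & 0xFFFF) << 16) | (control & 0xFFFF)


def minor_allele_frequencies(x, y, case_total, control_total):
    """Return (case_maf, control_maf) for one genotype.

    x, y: packed counts of the lexicographically lower allele from each party.
    case_total, control_total: total allele counts per group across both
    parties (assumed inputs, not named in the caption).
    """
    x_control, x_case = unpack_counts(x)
    y_control, y_case = unpack_counts(y)

    # Aggregate the lower allele's count per group across both parties.
    control_low = x_control + y_control
    case_low = x_case + y_case

    # The higher allele's count is the group total minus the lower one.
    control_high = control_total - control_low
    case_high = case_total - case_low

    # Whichever count is smaller belongs to the minor allele.
    control_minor = min(control_low, control_high)
    case_minor = min(case_low, case_high)

    # Convert to floating point and divide to obtain the frequencies.
    return case_minor / float(case_total), control_minor / float(control_total)
```

For example, if each group holds 200 individuals across both parties and each individual contributes two alleles (an assumption), both totals would be 400, and the function could be called as `minor_allele_frequencies(pack_counts(30, 120), pack_counts(50, 100), 400, 400)`.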
Figure 3. Computing χ². obs[0] and obs[1] are the case group's low and high allele counts, respectively, with obs[2] and obs[3] as the respective counts for the control group. exp[0 ... 3] are the respective expected counts for each allele in each group. The "for" loop computes the χ² statistic according to Equation 1.
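The loop described in the Figure 3 caption can be sketched as follows. Equation 1 is not reproduced in this record, so the sketch assumes it is the standard Pearson statistic, the sum over the four cells of (obs[i] - exp[i])² / exp[i]. The `expected_counts` helper is likewise an assumption: it derives the expected counts from the row and column totals of the 2×2 table, which is the usual convention but is not stated in the caption.

```python
def expected_counts(obs):
    """Expected counts for a 2x2 table laid out as
    [case_low, case_high, control_low, control_high] (assumed convention)."""
    case_total = obs[0] + obs[1]
    control_total = obs[2] + obs[3]
    low_total = obs[0] + obs[2]
    high_total = obs[1] + obs[3]
    grand_total = case_total + control_total
    return [
        case_total * low_total / grand_total,
        case_total * high_total / grand_total,
        control_total * low_total / grand_total,
        control_total * high_total / grand_total,
    ]


def chi_squared(obs, exp):
    """Pearson chi-squared statistic over the four cells.

    Every expected count must be nonzero, otherwise the division fails.
    """
    chi2 = 0.0
    for i in range(4):
        diff = obs[i] - exp[i]
        chi2 += diff * diff / exp[i]
    return chi2
```

A call such as `chi_squared(obs, expected_counts(obs))` with `obs = [220, 180, 80, 320]` exercises both functions on one genotype.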
Distribution across institutions for the sample data.
| | Institution 1 | Institution 2 |
|---|---|---|
| Case | 100 | 100 |
| Control | 100 | 100 |
Figure 4. Execution time for evaluating MAF and χ².
Figure 5. Total bytes transferred by each party, with respect to the number of individuals in the merged datasets.