| Literature DB >> 30064383 |
Victor A O Carmelo1,2, Lisette J A Kogelman2,3, Majbritt Busk Madsen4, Haja N Kadarmideen5,6.
Abstract
BACKGROUND: Genetic epistasis is an often-overlooked area in the study of the genomics of complex traits. Genome-wide association studies are a useful tool for revealing potential causal genetic variants, but in this context, epistasis is generally ignored. Data complexity and interpretation issues make it difficult to process and interpret epistasis. As the number of interaction grows exponentially with the number of variants, computational limitation is a bottleneck. Gene Network based strategies have been successful in integrating biological data and identifying relevant hub genes and pathways related to complex traits. In this study, epistatic interactions and network-based analysis are combined in the Weighted Interaction SNP hub (WISH) method and implemented in an efficient and easy to use R package.Entities:
Keywords: Complex traits; Epistasis; GWAS; Networks; WGCNA
Mesh:
Year: 2018 PMID: 30064383 PMCID: PMC6069724 DOI: 10.1186/s12859-018-2291-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1Overview of the package pipeline and workflow. The method only requires phenotypes and genotype data to run. The boxes in red are optional but recommended. The genotype data should be input using the PLINK ped and tped format [14]. Phenotypes can be either continuous or binary. The WISH method can be separated into three overall parts: QC and data filtering, calculation of epistasis and network and module generation. The QC should be similar to a standard GWAS based on call rates and minor allele frequency. An additional step can be done to filter based on LD, which is built into the package. The calculation of epistasis is the most computationally heavy part and is fully parallelized. The network and module construction part is based on converting the epistatic coefficients into correlations and running the WGCNA pipeline, which is integrated into the WISH package
Fig. 2Example of a Pseudo Manhattan plot. Visualizing interactions in a meaningful way is difficult due to the high data dimensionality. One way to solve this is to use summary statistics for each locus instead. Here we sum over the -log likelihoods of all interactions for each variant to give an idea of which variants are most strongly interacting across the genome and color by chromosome
Fig. 3Visualization of pairwise chromosomal interaction strength. Chromosomal interactions are found by calculating the 90th percentile of the –log likelihood of all epistatic interactions between each chromosome pair and then normalizing them to from − 1 (weakest) to 1 (strongest) interactions
Fig. 4Scaling of runtime using multithreading based on 1000, 2000 or 3000 variants and 500 samples using simulated genotypes and phenotypes. We see that the improvement in run time with increased number of threads is not linear, due to increased overhead. In all the different runs the improvement in runtime from 5 to 40 threads is ab out 5-fold. On the other hand, the number of variants has no effect on the speed with about 9000 models per second being calculated using 40 threads across all runs. This is because as with larger data sets the individual threads handle larger data chunks at a time, leading to less overhead