Jason H Moore, Peter C Andrews, Randal S Olson, Sarah E Carlson, Curt R Larock, Mario J Bulhoes, James P O'Connor, Ellen M Greytak, Steven L Armentrout.
Abstract
BACKGROUND: Large-scale genetic studies of common human diseases have focused almost exclusively on the independent main effects of single-nucleotide polymorphisms (SNPs) on disease susceptibility. These studies have had some success, but much of the genetic architecture of common disease remains unexplained. Attention is now turning to detecting SNPs that affect disease susceptibility in the context of other genetic factors and environmental exposures. These context-dependent genetic effects can manifest as non-additive interactions, which are more challenging to model using parametric statistical approaches. Considering many SNPs simultaneously produces a multitude of genotype combinations, and the resulting high dimensionality renders these approaches underpowered. We previously developed the multifactor dimensionality reduction (MDR) approach as a nonparametric and genetic model-free machine learning alternative. Approaches such as MDR can improve the power to detect gene-gene interactions, but the combinatorial explosion of the search space limits their ability to exhaustively consider SNP combinations in genome-wide association studies (GWAS). Here we introduce a stochastic search algorithm called Crush for applying MDR to the modeling of high-order gene-gene interactions in genome-wide data. The Crush-MDR approach uses expert knowledge to guide probabilistic searches within a framework that uses biological knowledge to filter gene sets prior to analysis. We evaluated the ability of Crush-MDR to detect hierarchical sets of interacting SNPs using a biology-based simulation strategy that assumes non-additive interactions within genes and additive genetic effects between sets of genes within a biochemical pathway.
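The core MDR idea mentioned above can be illustrated with a minimal sketch. It assumes the classic case-control form of MDR (each multi-locus genotype cell is pooled into a high-risk or low-risk group by comparing its case:control ratio with the overall ratio) and an exhaustive search over SNP combinations, which is exactly the step that becomes infeasible at genome-wide scale. It is not the Crush search, it omits cross-validation, and all function names and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def high_risk_cells(genotypes, labels, snps):
    """Map each observed genotype combination to True (high risk) or False."""
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    counts = defaultdict(lambda: [0, 0])  # cell -> [controls, cases]
    for row, y in zip(genotypes, labels):
        counts[tuple(row[i] for i in snps)][y] += 1
    # High risk when cases/controls in the cell exceeds the overall ratio;
    # cross-multiplied to avoid dividing by an empty control count.
    return {cell: n1 * n_controls > n0 * n_cases
            for cell, (n0, n1) in counts.items()}


def balanced_accuracy(genotypes, labels, snps, cells):
    """Mean of sensitivity and specificity; assumes both classes are present."""
    tp = tn = fp = fn = 0
    for row, y in zip(genotypes, labels):
        predicted = cells.get(tuple(row[i] for i in snps), False)
        if y == 1:
            tp += predicted
            fn += not predicted
        else:
            fp += predicted
            tn += not predicted
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def exhaustive_mdr(genotypes, labels, order):
    """Score every SNP combination of the given order (training data only)."""
    best = None
    for snps in combinations(range(len(genotypes[0])), order):
        cells = high_risk_cells(genotypes, labels, snps)
        score = balanced_accuracy(genotypes, labels, snps, cells)
        if best is None or score > best[0]:
            best = (score, snps)
    return best


# Hypothetical toy data: disease status is the XOR of SNPs 0 and 1,
# SNP 2 is noise, so neither causal SNP has a main effect on its own.
toy_genotypes = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)]
toy_labels = [a ^ b for a, b, _ in toy_genotypes]
print(exhaustive_mdr(toy_genotypes, toy_labels, order=2))
```

On this toy data only the pair (0, 1) separates cases from controls, which shows why a purely non-additive interaction is invisible to single-SNP tests. The number of combinations grows as "n choose k", which is the motivation for a guided stochastic search.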
Keywords: Bioinformatics; Common diseases; Epistasis; Genome-wide Association study; Machine learning
Year: 2017 PMID: 28572842 PMCID: PMC5450417 DOI: 10.1186/s13040-017-0139-3
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
Fig. 1 Heatmaps summarizing the results of the simulation study. The dark blue shading indicates a detection success rate of 80% or greater for each combination of heritability, minor allele frequency (MAF; 0.2 and 0.4), total number of SNPs (100 and 1000), size of the embedded ‘target’ models (i.e. the number of SNPs they contain; 2 and 4), sample size (2000 and 8000), and trait standard deviation (0.05, 0.1, 0.2, and 0.3). Panel a represents the described Crush-MDR search. Panel b represents a random search with the same number of evaluations.
Fig. 2 MDR t-statistic (x-axis) vs. Cartesian entropy (y-axis) results for the Crush-MDR multiobjective optimization analysis of normalized hippocampal volume in the ADNI dataset, as visualized in the Crush-MDR visualization module. Models on the Pareto front are shown as pink points, and all other models explored by Crush-MDR in the run are shown as gray points.
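For readers unfamiliar with the term, a Pareto front is the set of models that no other model beats on every objective at once. The sketch below is a generic illustration, not the Crush-MDR implementation: it assumes both objectives are to be maximized (the caption does not state the direction of either axis), and the function name and sample points are hypothetical.

```python
def pareto_front(points):
    """Return the points not dominated by any other point.

    A point q dominates p if q is at least as good on both objectives
    and strictly better on at least one (both objectives maximized here,
    which is an assumption).
    """
    front = []
    for i, p in enumerate(points):
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical (objective 1, objective 2) scores for six models.
models = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1), (3, 1)]
print(pareto_front(models))
```

This pairwise comparison is quadratic in the number of models, which is adequate for a few thousand points such as those in a scatter plot.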
Fig. 3 Associations between genotype and phenotype for the top-scoring (a) three-factor and (b) two-factor models on the Crush-MDR Pareto front. Each cell shows one genotype combination and the average phenotypic difference from the mean for subjects with those genotypes. Wider bars indicate a larger number of subjects with those genotypes. Dark gray cells indicate genotypes associated with higher-than-average hippocampal volume (HV). Light gray cells indicate lower-than-average HV, and white cells represent genotypes with no subjects. Genotypes are coded as 0/1/2 according to the number of copies of the major allele.
Fig. 4 (a) IMP and (b) GIANT neuron analysis of the 19 genes found on the Pareto front of the Crush-MDR analysis of normalized hippocampal volume in the ADNI dataset. The minimum relationship confidence in each figure is 0.2.