Maura John, Markus J Ankenbrand, Carolin Artmann, Jan A Freudenthal, Arthur Korte, Dominik G Grimm.
Abstract
MOTIVATION: Genome-wide association studies (GWAS) are an integral tool for studying the architecture of complex genotype and phenotype relationships. Linear mixed models (LMMs) are commonly used to detect associations between genetic markers and a trait of interest while accounting for population structure and cryptic relatedness. LMMs assume that the residuals are normally distributed and that the genetic markers are independent and identically distributed; both assumptions are often violated in real data. Permutation-based methods can help to overcome some of these limitations and provide more realistic thresholds for the discovery of true associations. Still, in practice, they are rarely implemented due to their high computational complexity.
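The idea of a permutation-based threshold can be illustrated with a minimal sketch. It is not the permGWAS implementation: as a stand-in for the LMM test it uses the absolute Pearson correlation between each marker and the phenotype, and it derives the threshold from the per-permutation maximum statistic. The function names, the toy data and all parameter values are assumptions made for illustration.

```python
import math
import random

def abs_correlation(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # a constant marker or phenotype carries no signal
    return abs(sxy) / math.sqrt(sxx * syy)

def permutation_threshold(markers, phenotype, n_perm=100, alpha=0.05, seed=0):
    """(1 - alpha) quantile of the maximum statistic over permuted phenotypes.

    Only the phenotype is shuffled, so the correlation structure between
    markers is preserved in every permutation.
    """
    rng = random.Random(seed)
    max_stats = []
    for _ in range(n_perm):
        permuted = list(phenotype)
        rng.shuffle(permuted)
        max_stats.append(max(abs_correlation(m, permuted) for m in markers))
    max_stats.sort()
    index = math.ceil((1 - alpha) * n_perm) - 1
    index = max(0, min(n_perm - 1, index))
    return max_stats[index]

# Toy data (hypothetical): 40 markers coded 0/1/2 for 60 samples, null phenotype.
rng = random.Random(1)
n_samples, n_markers = 60, 40
markers = [[rng.choice((0, 1, 2)) for _ in range(n_samples)]
           for _ in range(n_markers)]
phenotype = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

threshold = permutation_threshold(markers, phenotype)
print(f"permutation-based threshold on |r|: {threshold:.3f}")
```

A marker would be called significant only if its observed statistic exceeds `threshold`. Because every permutation requires a full scan over all markers, the cost grows linearly with the number of permutations, which is the computational burden the abstract refers to.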
Year: 2022 PMID: 36124808 PMCID: PMC9486594 DOI: 10.1093/bioinformatics/btac455
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.931
Fig. 1. Schematic illustration of the matrices and tensors of the permGWAS architecture. (A) Commonly used matrix representation for computing sequential univariate tests: a phenotype vector for the n samples and, for the jth SNP, a matrix of fixed effects comprising a column of ones for the intercept, the covariates and the jth SNP. (B) 3D-tensor representation of an LMM for computing univariate tests batch-wise. The phenotype is represented as a 3D tensor containing b copies of the phenotype vector, and the fixed effects as a 3D tensor containing the fixed-effects matrices of the SNPs in the batch. (C) 4D-tensor representation of a permutation-based batch-wise LMM. The phenotype is represented as a 4D tensor containing the 3D phenotype tensor of each of the q permutations, and the fixed effects as a 4D tensor containing q copies of the 3D fixed-effects tensor.
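The batch-wise tensor layout described in the Fig. 1 caption can be sketched with NumPy. This is an illustration under stated assumptions, not the permGWAS code: it fits ordinary least squares instead of an LMM (no kinship or variance-component step), and the variable names, dimensions and random toy data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, c, b, q = 50, 2, 8, 5  # samples, covariates incl. intercept, batch size, permutations

y = rng.normal(size=n)                                          # phenotype, (n,)
covariates = np.column_stack([np.ones(n), rng.normal(size=n)])  # (n, c)
snps = rng.integers(0, 3, size=(n, b)).astype(float)            # (n, b)

# 3D tensor: one fixed-effects matrix [covariates | SNP j] per SNP in the batch.
X3 = np.concatenate(
    [np.broadcast_to(covariates, (b, n, c)), snps.T[:, :, None]], axis=2
)  # (b, n, c + 1)

# 4D tensors: q copies of X3, and one permuted phenotype per permutation,
# repeated across the b SNPs of the batch.
X4 = np.broadcast_to(X3, (q, b, n, c + 1))
y_perm = np.stack([rng.permutation(y) for _ in range(q)])       # (q, n)
Y4 = np.broadcast_to(y_perm[:, None, :, None], (q, b, n, 1))

def batched_rss(X, Y):
    """Residual sum of squares of least-squares fits over all leading axes."""
    Xt = np.swapaxes(X, -1, -2)
    beta = np.linalg.solve(Xt @ X, Xt @ Y)  # normal equations, solved per slice
    resid = Y - X @ beta
    return np.sum(resid ** 2, axis=(-2, -1))

rss = batched_rss(X4, Y4)  # one value per (permutation, SNP), shape (q, b)
print(rss.shape)
```

The point of the layout is that a single call replaces q × b separate model fits, which is what makes the computation suitable for vectorized or GPU execution.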
Fig. 2. Runtime comparison of permGWAS versus EMMAX and FaST-LMM. Note that all axes are log-scaled. (A) Computational time as a function of the number of SNPs with a fixed number of 1000 samples. (B) Computational time as a function of the number of samples with 10^6 markers each. (C) Computational time as a function of the number of permutations with 1000 samples and 10^6 markers each. Dashed lines for EMMAX and FaST-LMM are estimates obtained by multiplying the computational time for 1000 samples and 10^6 markers by the number of permutations.
Fig. 3. Simulated phenotypes with gamma-distributed noise. Shape parameters of the gamma distributions were set to 4, 3, 2, 1 and 0.1. (A) Shape of the gamma distribution. (B) Exemplary phenotypic value distribution for each shape parameter. (C) Permutation-based thresholds over 50 simulated phenotypes as box plots for each gamma shape parameter. The red dashed line indicates the fixed Bonferroni significance threshold. (D) Phenotype-wise FDR for both the fixed Bonferroni significance threshold and the permutation-based significance threshold.
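As a rough illustration of why the shape parameter matters, the sketch below draws gamma-distributed noise for the five shape values named in the caption and reports its skewness, alongside a fixed Bonferroni threshold. The scale parameter of 1.0, the 5000 draws per shape, the significance level of 0.05 and the marker count of 10^6 are assumptions for this example and are not taken from the caption.

```python
import math
import random

def bonferroni_threshold(alpha, n_markers):
    """Fixed significance threshold: alpha divided by the number of tests."""
    return alpha / n_markers

def gamma_skewness(shape):
    """Theoretical skewness of a gamma distribution, 2 / sqrt(shape)."""
    return 2.0 / math.sqrt(shape)

def sample_skewness(values):
    """Moment-based sample skewness, m3 / m2^(3/2)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(0)
shapes = [4, 3, 2, 1, 0.1]
# Assumed scale of 1.0 and 5000 draws per shape parameter.
noise = {k: [rng.gammavariate(k, 1.0) for _ in range(5000)] for k in shapes}
skew = {k: sample_skewness(v) for k, v in noise.items()}

for k in shapes:
    print(f"shape={k}: theoretical skewness={gamma_skewness(k):.2f}, "
          f"sample skewness={skew[k]:.2f}")
print("Bonferroni threshold:", bonferroni_threshold(0.05, 10**6))
```

Smaller shape parameters give more strongly skewed noise, that is, a larger departure from the normality assumed by the LMM, while the Bonferroni threshold stays the same regardless of how the phenotype is distributed.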
Fig. 4. GWAS of three different A. thaliana phenotypes. Manhattan plots display the associations of all markers for the three phenotypes (A) #744 (https://arapheno.1001genomes.org/phenotype/744/), which is nearly normally distributed, (B) #118 (https://arapheno.1001genomes.org/phenotype/118/) and (C) #325 (https://arapheno.1001genomes.org/phenotype/325/); the latter two phenotypes are non-normally distributed. The Bonferroni threshold is denoted by a red horizontal dashed line and the respective permutation-based threshold by a horizontal blue line.