Nikolay V Kondratyev, Margarita V Alfimova, Arkadiy K Golov, Vera E Golimbet.
Abstract
Scientifically interesting as well as practically important phenotypes often belong to the realm of complex traits. To the extent that these traits are hereditary, they are usually 'highly polygenic'. Studying them is a challenge for researchers, because their complex genetic architecture makes it nearly impossible to use many of the usual methods of reverse genetics, which often focus on specific genes. In recent years, thousands of genome-wide association studies (GWAS) have been undertaken to explore the relationships between complex traits and a large number of genetic factors, most of which are characterised by tiny effects. In this review, we aim to familiarise 'wet biologists' with approaches for the interpretation of GWAS results, to clarify some issues that may seem counterintuitive and to assess the possibility of using GWAS results in experiments on various complex traits.
Keywords: GWAS; complex traits; polygenic scores
Year: 2021 PMID: 34831407 PMCID: PMC8623533 DOI: 10.3390/cells10113184
Source DB: PubMed Journal: Cells ISSN: 2073-4409 Impact factor: 6.600
Figure 1. Main concepts discussed in the text. (a) GWAS. The scheme depicts the Manhattan plot, the main visualisation of GWAS results. The Manhattan plot shows the distribution of observed p-values of individual association tests (represented as dots) across genomic positions, which makes it easy to see how many associations pass the genome-wide significance threshold (dashed line). The inset depicts the corresponding quantile-quantile (Q-Q) plot, which shows the distribution of observed versus expected p-values. (b) LDSC. The scheme contrasts two GWAS experiments, both affected by population bias, of which only one (turquoise) has true genetic effects. The Q-Q plots for both experiments are shown as insets to illustrate that they look the same. The LD score of a given SNP is the sum of its squared correlations (r²) with the other tested SNPs. The chi-squared statistic measures the strength of association for a given SNP and is modelled as a random variable. The slope of the regression is proportional to heritability (h) and the intercept (a) is proportional to bias. (c) TWAS. The scheme depicts a Manhattan plot of gene-based associations. The inset shows how an individual association is produced: external eQTL data are used to predict gene expression from the genotypes in the GWAS data. (d) PRS. The scheme demonstrates a typical situation in GWAS, in which polygenic scores calculated with relaxed significance thresholds perform better than those calculated with a strict threshold. Three ways of selecting associations from the same GWAS for PRS calculation are presented (left). ROC curves (right) represent predictive models based on PRS calculated using significant (index) associations only (salmon), a relaxed threshold (olive) and all SNPs ('omnigenic', lilac). (e) PheWAS. Association tests of a specific genotype against a broad range of phenotypes, grouped by similarity, are depicted. The scheme shows the typical situation in which similar phenotypes have a similar degree of association. (f) Mendelian randomisation. The scheme shows a study of the possible causal relationship (arrow marked with a question mark) between exposure (E) and outcome (O) using an instrumental variable (IV) in the presence of possible unknown confounders (U). The conditions for the IV to be valid are numbered; stop signs symbolise forbidden relations.
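The relationship described for panel (b) of Figure 1 can be illustrated with a small numerical sketch. It assumes the commonly used LD score regression model, E[chi2_j] = (N * h2 / M) * l_j + N * a + 1, where N is the sample size, M the number of SNPs, l_j the LD score of SNP j, h2 the heritability and a the bias term. The function name, the toy values and the use of plain unweighted least squares are assumptions made for illustration only; published LDSC implementations use a weighted regression with resampling-based standard errors.

```python
from statistics import fmean


def fit_ld_score_regression(ld_scores, chi2, n_samples, n_snps):
    """Unweighted least-squares fit of chi-squared statistics on LD scores.

    Assumed model: E[chi2_j] = (N * h2 / M) * l_j + N * a + 1.
    Returns (h2, a): heritability recovered from the slope and
    the bias term recovered from the intercept.
    """
    mean_l = fmean(ld_scores)
    mean_c = fmean(chi2)
    sxx = sum((l - mean_l) ** 2 for l in ld_scores)
    sxy = sum((l - mean_l) * (c - mean_c) for l, c in zip(ld_scores, chi2))
    slope = sxy / sxx
    intercept = mean_c - slope * mean_l
    h2 = slope * n_snps / n_samples      # slope = N * h2 / M
    a = (intercept - 1.0) / n_samples    # intercept = N * a + 1
    return h2, a


# Hypothetical toy inputs (not taken from any real study).
N, M = 50_000, 1_000_000
ld = [10.0, 50.0, 100.0, 200.0, 400.0]


def expected_chi2(l, h2, a):
    """Noise-free chi-squared value under the assumed model."""
    return N * h2 * l / M + N * a + 1.0


# Two experiments with the same bias; only the first has genetic effects.
polygenic = [expected_chi2(l, 0.4, 1e-6) for l in ld]
biased_only = [expected_chi2(l, 0.0, 1e-6) for l in ld]

print(fit_ld_score_regression(ld, polygenic, N, M))
print(fit_ld_score_regression(ld, biased_only, N, M))
```

In this toy setting both experiments share the same intercept, so bias inflates the test statistics equally, but only the experiment with true genetic effects shows a slope, which is the distinction the Q-Q plots alone cannot make.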
Figure 2. Published studies with available summary statistics that are included in the GWAS Catalog. (a) Number of studies added to the GWAS Catalog by year. (b) Sample sizes in the GWAS Catalog over time. Some studies mentioned in the text are highlighted. The data were accessed through the GWAS Catalog site on 21 September 2021.