Nadav Brandes, Omer Weissbrod, Michal Linial.
Abstract
Genetic studies of human traits have revolutionized our understanding of the variation between individuals, and yet, the genetics of most traits is still poorly understood. In this review, we highlight the major open problems that need to be solved, and by discussing these challenges provide a primer to the field. We cover general issues such as population structure, epistasis and gene-environment interactions, data-related issues such as ancestry diversity and rare genetic variants, and specific challenges related to heritability estimates, genetic association studies, and polygenic risk scores. We emphasize the interconnectedness of these problems and suggest promising avenues to address them.
Keywords: Causal variants; Complex human traits; Diversity; Epistasis; GWAS; Gene-environment interactions; Genome-wide association studies; GxE; GxG; Heritability; Human phenotypes; Linkage disequilibrium; Missing heritability; Non-additive genetic effects; PRS; Polygenic risk scores; Population structure; Rare variants; Recessive effects; Statistical genetics
Year: 2022 PMID: 35725481 PMCID: PMC9208223 DOI: 10.1186/s13059-022-02697-9
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 17.906
Open problems
| # | Brief explanation | Why it is important | Related open problems |
|---|---|---|---|
| 1 | Genetic studies are confounded by the ancestries of participants. Mounting evidence points towards residual population structure that is not accounted for, while overcorrection can obscure genuine genetic signal. | Without resolving this, it will be difficult to trust the results of genetic studies. | 4, 6, 7, 12, 16 |
| 2 | The assumption that phenotypes can be approximated by summing separate genetic effects is ubiquitous in genetic studies. If incorrect, this could undermine many results. Also, how do we identify and quantify epistatic effects? | Our ultimate goal is an accurate genetic model of human traits, linear or not. | 11, 14 |
| 3 | Genetic effects may be contingent on environmental conditions. Such interactions are difficult to discover, and their overall contribution to phenotypic variance is not clear. Substantial GxE interactions would also undermine many methods. | GxE interactions are potentially an important piece in the genetic puzzle, which can highlight the mechanism of genetic associations and inform interventions. | 11, 13 |
| 4 | Most genetic studies of complex traits deal only with common variants, even though the strongest effects are expected in rare variants. In aggregate, rare variants may contribute substantially to heritability. Key challenges are lack of statistical power and genotyping. | Rare variants may be important to many complex traits. Neglecting them would leave us with an incomplete understanding of the genetic variation underlying these traits. | 1, 5, 11 |
| 5 | Routine pipelines are optimized for simple variants (i.e., single-nucleotide variants and small indels), while commonly overlooking more complex genetic variation, including structural variants, copy number variation, repetitive regions and variants on the X, Y or MT chromosomes. | These types of variants contribute substantially to many traits. | 4, 11 |
| 6 | Family-based study designs naturally overcome many challenges of cohort studies, specifically with respect to population structure, environmental biases, and direct vs. indirect genetic effects. However, family-based genetic resources are scarce, and there are not enough methods to analyze them. | Family-based genetic data could play an important role in studying genetic effects, especially when causality is sought. | 1, 12 |
| 7 | Individuals of non-European ancestry are heavily underrepresented in genetic datasets, leading to inequality in access to medical knowledge. More diversity would also help deal with population structure and establish the causality of genetic associations. | We are interested in understanding the genetics of all of humanity, and we cannot afford to discard such a powerful tool. | 1, 12, 16 |
| 8 | Studied traits are often not entirely well defined, and there is often a lot of noise in the phenotyping process (mostly with respect to binary phenotypes). | Noisy and biased data hinder our progress. | |
| 9 | Genetic associations may reflect people’s decision to participate rather than the studied phenotype. | This is potentially a major source of bias. | |
| 10 | It is not entirely clear what the “correct” way to define and measure heritability is and how heritability estimates should be interpreted. For example, do they provide an upper bound on the predictive power of polygenic risk scores? | Heritability estimates provide a lot of insight and guide our progress, and they could be even more useful if we reached a consensus on what they mean exactly. | 11, 14 |
| 11 | This is a classic problem, asking why detected associations explain only a small part of the heritability in most complex traits, and why there is a large gap between heritability estimates obtained from SNP-based and twin-based methods. Despite a lot of progress in suggesting solutions and collecting evidence, the problem is still not fully resolved. | As long as this is not fully resolved, there are lingering doubts that our understanding of genetic effects is flawed in some fundamental way. | 2, 3, 4, 5, 10, 14 |
| 12 | Most genetic associations implicate entire genomic regions, and it is considered a hard problem to pinpoint the exact causal variants. It is also important to rule out confounding and other statistical biases. | If we want to learn from genetic associations, we need to be able to detect causal variants and genes. | 1, 6, 7 |
| 13 | Even after the causality of genetic elements is established, understanding the molecular mechanisms behind them is a grand challenge. To date, only a very small fraction of genetic discoveries are understood at that level. | Without understanding the mechanism of genetic associations, they provide only limited biological and medical insight. | 3 |
| 14 | Our ability to make accurate phenotypic predictions from genetic data is still very limited, even in highly heritable traits. Other than increasing sample sizes, we do not have very effective strategies to improve predictions. | Accurate genotype-to-phenotype predictions have an enormous clinical potential. | 2, 10, 11, 15 |
| 15 | The use of polygenic risk scores in the clinic remains quite limited. To be clinically useful, predictive models need to be proven robust and reliable. | If successfully implemented in the clinic, these models have the potential to revolutionize healthcare and usher in the era of personalized medicine. | 14, 16 |
| 16 | Polygenic risk scores trained in one setting generally do not generalize well to other settings, including different ancestries or genotyping technologies. | This is critical for ensuring the robustness of these models, for allowing them to be used in the clinic, and for their benefits to reach all groups. | 1, 7, 15 |
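
The additive assumption questioned in the table (phenotypes approximated by summing separate genetic effects) can be made concrete with a small sketch. The function names, effect sizes, and genotypes below are hypothetical and chosen only for illustration; the second function shows one simple way a pairwise interaction term could depart from the additive model.

```python
def additive_score(genotypes, effects):
    """Additive model: sum of per-variant effect sizes times allele counts (0, 1 or 2)."""
    if len(genotypes) != len(effects):
        raise ValueError("genotypes and effects must have the same length")
    return sum(g * b for g, b in zip(genotypes, effects))


def score_with_pairwise_interaction(genotypes, effects, i, j, interaction):
    """Additive score plus a single hypothetical interaction term between variants i and j."""
    return additive_score(genotypes, effects) + interaction * genotypes[i] * genotypes[j]


# Hypothetical per-variant effect sizes and one individual's allele counts.
effects = [0.5, -0.25, 1.0]
person = [2, 1, 0]

additive = additive_score(person, effects)
epistatic = score_with_pairwise_interaction(person, effects, 0, 1, interaction=0.5)
print(additive, epistatic)
```

If interaction terms like this one were substantial, a purely additive score would be systematically off for individuals carrying both variants.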
Fig. 1 Population structure confounds human genetic studies. (A) The population that an individual is born into influences their genetics and their environment, which are the two components affecting traits. As a result, genetic associations with human traits are confounded by population structure. (B) Even when considering a specific human group and controlling for the major axes of genetic variation in a cohort, the allele frequency of some variants can still vary across populations and exhibit clear geographic patterns, a problem known as “residual population structure”.
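
The confounding described in Fig. 1 can be illustrated with a toy simulation. This is a minimal sketch under assumed values: two populations with different allele frequencies and different environmental means, and a trait that does not depend on genotype at all. All names and parameters are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def simulate_population(rng, n, allele_freq, env_mean):
    """Return genotypes (0, 1 or 2 alleles) and traits driven only by environment."""
    genotypes = [
        int(rng.random() < allele_freq) + int(rng.random() < allele_freq)
        for _ in range(n)
    ]
    # The trait ignores the genotype entirely: there is no causal genetic effect.
    traits = [rng.gauss(env_mean, 1.0) for _ in range(n)]
    return genotypes, traits


rng = random.Random(0)
g_a, t_a = simulate_population(rng, 5000, allele_freq=0.2, env_mean=0.0)
g_b, t_b = simulate_population(rng, 5000, allele_freq=0.8, env_mean=1.0)

r_pooled = pearson(g_a + g_b, t_a + t_b)  # populations mixed together
r_within_a = pearson(g_a, t_a)            # population A only
r_within_b = pearson(g_b, t_b)            # population B only
print(r_pooled, r_within_a, r_within_b)
```

In the pooled sample the variant appears associated with the trait, while within each population the correlation is close to zero, because population membership drives both the allele frequency and the environment.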
Fig. 2 Identifying causal variants in the presence of linkage disequilibrium. (A) A single causal variant is in linkage disequilibrium with other nearby variants. As a result, variants that are correlated with the causal variant also obtain significant p-values even though they are not causal. (B) Combining GWAS summary statistics from three different ancestry groups, each exhibiting a different linkage disequilibrium pattern, to fine-map the results. By assuming that only one of the variants is causal, it can be recovered with high confidence.
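
Panel A of Fig. 2 can be sketched with a toy simulation in which a non-causal "tag" variant usually sits on the same haplotype as the causal allele. The haplotype model, allele frequency, copy probability, and effect size are all assumptions made for illustration, not values from the review.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def simulate_haplotype(rng, causal_freq, copy_prob):
    """Draw a causal allele and a tag allele that usually matches it (linkage disequilibrium)."""
    causal = int(rng.random() < causal_freq)
    if rng.random() < copy_prob:
        tag = causal  # tag allele inherited together with the causal allele
    else:
        tag = int(rng.random() < causal_freq)  # drawn independently
    return causal, tag


rng = random.Random(1)
causal_genotypes, tag_genotypes, traits = [], [], []
for _ in range(10000):
    c1, t1 = simulate_haplotype(rng, causal_freq=0.3, copy_prob=0.9)
    c2, t2 = simulate_haplotype(rng, causal_freq=0.3, copy_prob=0.9)
    g_causal = c1 + c2
    causal_genotypes.append(g_causal)
    tag_genotypes.append(t1 + t2)
    # Only the causal variant affects the trait.
    traits.append(0.5 * g_causal + rng.gauss(0.0, 1.0))

r_ld = pearson(causal_genotypes, tag_genotypes)  # strength of LD between the variants
r_causal = pearson(causal_genotypes, traits)     # association of the causal variant
r_tag = pearson(tag_genotypes, traits)           # association of the non-causal tag
print(r_ld, r_causal, r_tag)
```

The tag variant shows a clear association with the trait despite having no effect of its own, which is why association signals alone implicate a region rather than a specific variant.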
Fig. 3 Estimating heritability. Common methods for estimating the heritability of human traits. (A) In twin studies, heritability is estimated by the degree to which monozygotic (identical) twins are more phenotypically similar to each other than dizygotic (non-identical) twins. (B) In GREML, heritability is estimated by comparing genetic and phenotypic similarities across pairs of unrelated individuals. (C) In family-based methods, given a pair of individuals and their parents, the degree to which they are more genetically similar than would be expected from their parents can be compared to their phenotypic similarity to estimate the heritability of the trait.
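
One classical estimator consistent with the twin-study description in panel A is Falconer's formula, h² = 2(r_MZ − r_DZ), where r_MZ and r_DZ are the phenotypic correlations within monozygotic and dizygotic pairs. The caption does not name a specific estimator, so this is an assumption. The sketch below simulates twin pairs under a simplified model (additive genetics plus individual-specific environment, no shared environment) and checks that the formula roughly recovers the simulated heritability.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def falconer_h2(r_mz, r_dz):
    """Falconer's formula: twice the difference between MZ and DZ twin correlations."""
    return 2.0 * (r_mz - r_dz)


def simulate_twin_pairs(rng, n_pairs, h2, genetic_sharing):
    """Simulate unit-variance traits for twin pairs.

    genetic_sharing is the assumed genetic correlation between twins:
    1.0 for monozygotic pairs, 0.5 for dizygotic pairs.
    """
    sd_shared = (h2 * genetic_sharing) ** 0.5
    sd_unshared = (h2 * (1.0 - genetic_sharing)) ** 0.5
    sd_env = (1.0 - h2) ** 0.5
    first, second = [], []
    for _ in range(n_pairs):
        shared = rng.gauss(0.0, sd_shared)  # genetic component common to both twins
        first.append(shared + rng.gauss(0.0, sd_unshared) + rng.gauss(0.0, sd_env))
        second.append(shared + rng.gauss(0.0, sd_unshared) + rng.gauss(0.0, sd_env))
    return first, second


rng = random.Random(2)
true_h2 = 0.6  # hypothetical heritability used to generate the data
mz_first, mz_second = simulate_twin_pairs(rng, 10000, true_h2, genetic_sharing=1.0)
dz_first, dz_second = simulate_twin_pairs(rng, 10000, true_h2, genetic_sharing=0.5)

estimated_h2 = falconer_h2(pearson(mz_first, mz_second), pearson(dz_first, dz_second))
print(estimated_h2)
```

The estimate relies on the model's assumptions, in particular that monozygotic and dizygotic twins share their environments to the same degree; violations of such assumptions are one reason twin-based and SNP-based estimates can disagree.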