Exploring the unknown: assumptions about allelic architecture and strategies for susceptibility variant discovery.

Abstract

Identification of common-variant associations for many common disorders has been highly effective, but the loci detected so far typically explain only a small proportion of the genetic predisposition to disease. Extending explained genetic variance is one of the major near-term goals of human genetic research. Next-generation sequencing technologies offer great promise, but optimal strategies for their deployment remain uncertain, not least because we lack a clear view of the characteristics of the variants being sought. Here, I discuss what can and cannot be inferred about complex trait disease architecture from the information currently available and review the implications for future research strategies.

Year: 2009 PMID: 19591663 PMCID: PMC2717392 DOI: 10.1186/gm66

Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117

Genome-wide association (GWA) analysis has provided the first effective strategy to allow a systematic dissection of the genetic basis of common, complex, multifactorial traits [1,2]. Several hundred loci have been identified to stringent levels of significance [3]. Although for many of these we remain some distance from a complete enumeration of causal mechanisms, there have already been substantial advances in understanding of disease, for instance the role of autophagy in inflammatory bowel disease [4] and of cell adhesion in autism [5,6]. However, for most common traits the proportion of the overall phenotypic variance explained remains small, limiting the extent to which prediction of individual disease risk is possible. There is growing speculation about the mechanisms that might account for the substantial proportion of trait heritability that remains to be characterized [7]. This speculation has repercussions well beyond recondite theoretical discussion about the genetic architecture of complex traits. With advances in technology (particularly next-generation sequencing) and growing enthusiasm for funding large-scale gene discovery efforts, hypotheses about the nature of this so-called 'genetic dark matter' [7] have a direct bearing on research strategies. Recently, this debate has seemed increasingly polarized between those who feel a continued search for common susceptibility variants is of limited value, because all that remains to be found are variants of vanishingly small effect [8], and those who feel that, pending reductions in costs that will allow high-quality, whole-genome sequence data to be generated in adequately powered sample sizes, there is virtue in persisting with an approach of proven worth [9].

There is good reason to assume that this 'dark matter' is neither an illusion created by inflated estimates of heritability nor the consequence of marked non-additivity of effects [10,11]. If so, then the sum total of genetic variance should largely be explicable in terms of the main effects of all the risk alleles of various types (single nucleotide polymorphisms, indels, copy number variants (CNVs) and inversions), allele frequencies (rare, low-frequency and common) and effect sizes. So far, the only parts of this 'space' explored systematically are those occupied by rare, penetrant alleles (principally through linkage analysis of monogenic phenotypes) and common, mostly low-effect alleles (accessible through GWA analysis). As we seek to make sensible decisions about the direction of future discovery efforts (in terms of the characteristics of the variants we are seeking and the technologies we should use to find them), we need to understand what the exploration of the 'known' genetic landscape can tell us about the parts that remain largely uncharted.

Contrasting views of the genetic landscape

One long-standing view is that complex trait susceptibility is predominantly a matter of common variants [12]. Common variants collectively account for most individual variation in DNA sequence, and the same might be expected to be the case for phenotypic variation. If true, the results of GWA studies so far indicate that most of the as-yet-undiscovered variants must (in Europeans at least) have very small effects, because the high coverage and large sample sizes used will have left few, if any, large common-variant effects undiscovered. Evidence (for example, from large-scale meta-analyses [13]) is, for many traits, consistent with the notion of a long 'polygenic tail' of small effects, but it remains unclear how much of overall heritability can be explained under this model. The idea that complex-trait susceptibility involves a very large number of variants of modest effect has led some to suggest that the value of all such discoveries is diminished, on the basis that one learns little about the biology of disease if too many genes are implicated [8]. However, for many phenotypes, the overall salience of the loci of greatest effect emerging from GWA studies (the pathways implicated and the relationships to monogenic forms of the same traits) argues forcefully against such a nihilistic interpretation [9,13,14].

The contrasting viewpoint holds that common-trait susceptibility derives mostly from the action of rare or low-frequency variants [15,16]. Although such variants account for less individual sequence variation than common variants, they may have a disproportionate effect on disease susceptibility. The more recent origin of low-frequency variants may allow alleles with more dramatic phenotypic effects to be represented in the population. Also, large-effect alleles may cause phenotypic disturbances that are not as easily buffered by compensatory changes during development as are well-tolerated, small-effect, common-variant alleles. Recent evidence that large, rare CNVs are associated with behavioral and psychiatric disease phenotypes [5,17,18] supports this view. Some argue that such a rare variant architecture is precisely what one would expect for diseases causing low reproductive fitness, though this rationalization fails to explain the high yield of common-variant signals reported for other diseases, such as type 1 diabetes, that were, until recently, fatal during early life [19].

It has even been suggested that many of the common-variant associations discovered by recent GWA studies may turn out to be due to the concerted action of multiple low-frequency and rare causal variants. The NOD2 (CARD15) signal for Crohn's disease indicates that this is certainly possible [20]. For many diseases, however, evidence that common-variant associations are consistent across multiple ethnic groups [21] represents a strong counter to such a model: one would expect the linkage disequilibrium patterns around recent rare and low-frequency causal variants to result in far more inter-ethnic heterogeneity than is actually observed.
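The argument that existing GWA studies have left few large common-variant effects undiscovered can be illustrated with a rough power calculation. The sketch below is not taken from the article: it uses a standard normal approximation for a two-sided allelic case-control test, and the function names, the sample size of 5,000 cases and 5,000 controls, the control allele frequency of 0.3 and the genome-wide threshold of 5e-8 are all assumptions chosen for illustration.

```python
from statistics import NormalDist


def case_allele_freq(control_freq: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by a per-allele odds ratio."""
    return odds_ratio * control_freq / (1.0 + control_freq * (odds_ratio - 1.0))


def gwa_power(control_freq: float, odds_ratio: float,
              n_cases: int, n_controls: int, alpha: float = 5e-8) -> float:
    """Approximate power of a two-sided allelic test (normal approximation)."""
    p0 = control_freq
    p1 = case_allele_freq(control_freq, odds_ratio)
    # Each individual contributes two alleles to the comparison.
    se = (p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls)) ** 0.5
    z_effect = abs(p1 - p0) / se
    std = NormalDist()
    z_crit = -std.inv_cdf(alpha / 2)  # critical value for the chosen threshold
    return std.cdf(z_effect - z_crit) + std.cdf(-z_effect - z_crit)


# Hypothetical scenario: common variant, 5,000 cases and 5,000 controls.
for odds_ratio in (1.05, 1.1, 1.3):
    power = gwa_power(0.3, odds_ratio, 5000, 5000)
    print(f"OR {odds_ratio}: power = {power:.4f}")
```

Under these assumed inputs, a common variant with an odds ratio of 1.3 is detected almost without fail, whereas one with an odds ratio of 1.05 is almost never detected, which is the pattern the 'polygenic tail' interpretation relies on.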

The best of both worlds

Although both extreme positions have merit, the likelihood is that, for most diseases, the architecture of predisposition features causal variants that have a wide range of allele frequencies and effect sizes. For most complex traits, the absence of compelling signals from linkage studies conducted in families segregating multifactorial diseases imposes an upper bound to feasible effect sizes; even so, it is easy to show that a limited number of low-frequency susceptibility alleles of medium effect could go a long way to explaining missing heritability. For example, the effect of a low-frequency variant with a population minor allele frequency of 1% and a per-allele odds ratio of 3, when measured in terms of sibling relative risk (a commonly used measure of familial aggregation), exceeds that of the largest common-variant effect known for type 2 diabetes (around TCF7L2). Twenty such variants across the genome would account for most of the unexplained heritability for this condition. Such a constellation of variants could provide a respectable tool for individual disease prediction, and the variants discovered would (because of their relatively large effect size) be valuable resources for detailed molecular and physiological study. The extent to which variants with these characteristics are segregating in the population remains unknown, but this is an area in which the combination of next-generation sequencing technologies and large-scale association analysis provides a powerful stimulus to discovery. Early results of this approach (such as the identification of low-frequency variants within the IFIH1 gene that have a marked effect on type 1 diabetes susceptibility) are encouraging [22].
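The comparison of sibling relative risks made above can be reproduced with a short calculation. The sketch below assumes a single biallelic locus in Hardy-Weinberg equilibrium with a multiplicative (per-allele) risk model, and treats the odds ratio as a relative risk, which is only an approximation for a disease as common as type 2 diabetes. The allele frequency and odds ratio used for the TCF7L2-like variant (0.30 and 1.4) are rough illustrative values that do not come from the text, and combining twenty loci by simple multiplication assumes they act independently.

```python
def sibling_relative_risk(risk_allele_freq: float, per_allele_rr: float) -> float:
    """Locus-specific sibling relative risk under a multiplicative model.

    Uses lambda_s = (1 + w/2)^2, where w = p*q*(g - 1)^2 / (p*g + q)^2,
    p is the risk-allele frequency, q = 1 - p and g is the per-allele
    relative risk.
    """
    p = risk_allele_freq
    q = 1.0 - p
    w = p * q * (per_allele_rr - 1.0) ** 2 / (p * per_allele_rr + q) ** 2
    return (1.0 + w / 2.0) ** 2


# Low-frequency variant described in the text: frequency 1%, odds ratio 3.
low_freq = sibling_relative_risk(0.01, 3.0)

# Assumed, approximate values for a TCF7L2-like common variant.
common = sibling_relative_risk(0.30, 1.4)

# Twenty independent low-frequency variants, combined multiplicatively.
twenty = low_freq ** 20

print(f"low-frequency variant: {low_freq:.4f}")
print(f"common variant:        {common:.4f}")
print(f"twenty such variants:  {twenty:.2f}")
```

With these inputs the low-frequency variant gives a sibling relative risk near 1.04, against roughly 1.03 for the common variant, and twenty such variants together give a value a little above 2.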

Strategy and the 'lumpiness' of the genome

Ultimately, we can expect large-scale, high-depth, genome-wide sequencing to enable the systematic exploration of the entire allele-frequency, effect-size space and provide empirical resolution of many of these issues. However, there remain serious financial, logistical and analytical barriers to the implementation of this technology, and the number of such experiments that could be supported by the major funders is, for the time being, limited. All this means that, for the next few years, the power of next-generation sequencing will need to be used carefully if a profusion of underpowered discovery efforts is to be avoided. Efforts targeted to specific genomic regions (around particular candidate genes or pathways or exons across the genome, for example) are attractive because high coverage of the selected areas in large sample sizes can be generated at reasonable cost. Whole-genome sequencing will, for now, be restricted to low-pass coverage across respectable sample sizes, or high-depth coverage in smaller, highly selected, phenotypically extreme sample sets.

The genomic distribution of disease-effect loci will have a major impact on the success of these alternative approaches (Figure 1). If the low-frequency and rare variants influencing a given trait are disproportionately located in the same loci as the common variants that have been found to date, then targeted follow-up of regions revealed by GWA studies will be a powerful approach, and extending the range and scope of GWA analysis (to other ethnic groups, for example) should be a particularly efficient strategy. If, on the other hand, the 'dark matter' variants have little positional (or biological) overlap with those already known, then genome-wide resequencing is likely to be the only practical way to find them. The evidence so far (overlap between monogenic and multifactorial loci; growing numbers of loci with multiple independent association signals; extensive pleiotropy, and so on [23,24]) provides some support for the former view. Effort in tracking down common susceptibility variants, as well as being valuable in its own right, should therefore guide researchers towards other types of causal variants.
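The trade-off between low-pass sequencing of many samples and high-depth sequencing of few can be made concrete with a simple calculation of how likely a variant is to be seen at all. The sketch below is an illustration under stated assumptions and is not a method described in the article: it treats chromosomes as independent draws from the population, and the `per_copy_detection` parameter (the chance that a copy present in a sample is actually called) and the scenario values are hypothetical.

```python
def prob_variant_observed(allele_freq: float, n_individuals: int,
                          per_copy_detection: float = 1.0) -> float:
    """Probability that a variant is called at least once in a sequenced sample.

    Each of the 2 * n_individuals chromosomes independently carries the
    allele with probability allele_freq, and each carried copy is called
    with probability per_copy_detection (an assumed stand-in for depth).
    """
    per_chromosome = allele_freq * per_copy_detection
    return 1.0 - (1.0 - per_chromosome) ** (2 * n_individuals)


# Hypothetical scenarios for a variant with a 0.5% allele frequency.
high_depth_small = prob_variant_observed(0.005, 100)         # every copy called
low_pass_large = prob_variant_observed(0.005, 1000, 0.5)     # half of copies called

print(f"high depth, 100 individuals: {high_depth_small:.3f}")
print(f"low pass, 1000 individuals:  {low_pass_large:.3f}")
```

Observing a variant is necessary but not sufficient for detecting its association with disease, so this calculation bounds what a design can discover rather than estimating its power.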

Figure 1

Causal variant signals and their genomic distribution. Two possible versions of the state of nature are presented (see text). In one ('even'), causal variants differing in terms of allele frequency (color scale) and effect size (height of bar) are distributed randomly across the genome: the location of common-variant (red/orange) associations of modest effect provides no guide to the location of lower-frequency variants (yellow/green), some of which have quite large effects. In the other ('lumpy'), causal variants congregate around certain genomic positions ('genes'): GWA studies that reveal the location of the common-variant associations will also reveal the positions of lower-frequency variants, and the proportion of disease biology explained by the loci discovered through GWA studies will be far greater than the proportion of variance explained would suggest.

Letting several well-designed flowers bloom

With only limited empirical data to guide future locus-discovery efforts, extrapolation from the modest proportion of genetic variance so far explained is fraught with danger. The menu of possible research strategies is large, but each choice makes some implicit assumption about the characteristics of the variants being sought and the genomic architecture of the disease under consideration. Given uncertainties over the true state of nature, it is difficult to say which approaches will be most productive. This argues for open minds, a healthy disdain for orthodoxy, and careful exploration of the technological and methodological options. At the same time, it is important that the next wave of large-scale discovery efforts is designed so as to test assumptions about trait architecture and technological performance so that lessons of generic value to the field can be learned.

Abbreviations

CNV: copy number variant; GWA: genome-wide association.

Competing interests

The author declares that he has no competing interests.

References (23 in total)

Review 1. Heritability in the genomics era--concepts and misconceptions.

Authors: Peter M Visscher; William G Hill; Naomi R Wray
Journal: Nat Rev Genet Date: 2008-03-04 Impact factor: 53.242

2. Genomewide association studies--illuminating biologic pathways.

Authors: Joel N Hirschhorn
Journal: N Engl J Med Date: 2009-04-15 Impact factor: 91.245

Review 3. Genome-wide association studies for complex traits: consensus, uncertainty and challenges.

Authors: Mark I McCarthy; Gonçalo R Abecasis; Lon R Cardon; David B Goldstein; Julian Little; John P A Ioannidis; Joel N Hirschhorn
Journal: Nat Rev Genet Date: 2008-05 Impact factor: 53.242

4. Rare chromosomal deletions and duplications increase risk of schizophrenia.

Authors:
Journal: Nature Date: 2008-07-30 Impact factor: 49.962

5. Autism genome-wide copy number variation reveals ubiquitin and neuronal genes.

Authors: Joseph T Glessner; Kai Wang; Guiqing Cai; Olena Korvatska; Cecilia E Kim; Shawn Wood; Haitao Zhang; Annette Estes; Camille W Brune; Jonathan P Bradfield; Marcin Imielinski; Edward C Frackelton; Jennifer Reichert; Emily L Crawford; Jeffrey Munson; Patrick M A Sleiman; Rosetta Chiavacci; Kiran Annaiah; Kelly Thomas; Cuiping Hou; Wendy Glaberson; James Flory; Frederick Otieno; Maria Garris; Latha Soorya; Lambertus Klei; Joseph Piven; Kacie J Meyer; Evdokia Anagnostou; Takeshi Sakurai; Rachel M Game; Danielle S Rudd; Danielle Zurawiecki; Christopher J McDougle; Lea K Davis; Judith Miller; David J Posey; Shana Michaels; Alexander Kolevzon; Jeremy M Silverman; Raphael Bernier; Susan E Levy; Robert T Schultz; Geraldine Dawson; Thomas Owley; William M McMahon; Thomas H Wassink; John A Sweeney; John I Nurnberger; Hilary Coon; James S Sutcliffe; Nancy J Minshew; Struan F A Grant; Maja Bucan; Edwin H Cook; Joseph D Buxbaum; Bernie Devlin; Gerard D Schellenberg; Hakon Hakonarson
Journal: Nature Date: 2009-04-28 Impact factor: 49.962

Review 6. Molecular genetics of Crohn's disease.

Authors: Richard K Russell; Elaine R Nimmo; Jack Satsangi
Journal: Curr Opin Genet Dev Date: 2004-06 Impact factor: 5.578

7. A key role for autophagy and the autophagy gene Atg16l1 in mouse and human intestinal Paneth cells.

Authors: Ken Cadwell; John Y Liu; Sarah L Brown; Hiroyuki Miyoshi; Joy Loh; Jochen K Lennerz; Chieko Kishi; Wumesh Kc; Javier A Carrero; Steven Hunt; Christian D Stone; Elizabeth M Brunt; Ramnik J Xavier; Barry P Sleckman; Ellen Li; Noboru Mizushima; Thaddeus S Stappenbeck; Herbert W Virgin
Journal: Nature Date: 2008-10-05 Impact factor: 49.962

Review 8. Common and rare variants in multifactorial susceptibility to common diseases.

Authors: Walter Bodmer; Carolina Bonilla
Journal: Nat Genet Date: 2008-06 Impact factor: 38.330

Review 9. Common vs. rare allele hypotheses for complex diseases.

Authors: Nicholas J Schork; Sarah S Murray; Kelly A Frazer; Eric J Topol
Journal: Curr Opin Genet Dev Date: 2009-05-28 Impact factor: 5.578

Review 10. Data and theory point to mainly additive genetic variance for complex traits.

Authors: William G Hill; Michael E Goddard; Peter M Visscher
Journal: PLoS Genet Date: 2008-02-29 Impact factor: 5.917
