Paul L Auer, Guillaume Lettre.
Abstract
Genome-wide association studies (GWASs) have successfully uncovered thousands of robust associations between common variants and complex traits and diseases. Despite these successes, much of the heritability of these traits remains unexplained. Because low-frequency and rare variants are not tagged by conventional genome-wide genotyping arrays, they may represent an important and understudied component of complex trait genetics. In contrast to common variant GWASs, there are many different types of study designs, assays and analytic techniques that can be utilized for rare variant association studies (RVASs). In this review, we briefly present the different technologies available to identify rare genetic variants, including novel exome arrays. We also compare the different study designs for RVASs and argue that the best design will likely be phenotype-dependent. We discuss the main analytical issues relevant to RVASs, including the different statistical methods that can be used to test genetic associations with rare variants and the various bioinformatic approaches to predicting in silico biological functions for variants. Finally, we describe recent rare variant association findings, highlighting the unexpected conclusion that most rare variants have modest-to-small effect sizes on phenotypic variation. This observation has major implications for our understanding of the genetic architecture of complex traits in the context of the unexplained heritability challenge.
Year: 2015 PMID: 25709717 PMCID: PMC4337325 DOI: 10.1186/s13073-015-0138-2
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Comparison of strategies for rare variant association studies
| Strategy | Sample size (depth) | Target size | Example technologies | Approximate cost per sample (US$) |
|---|---|---|---|---|
| Whole-genome sequencing | 2,000 individuals at 30× (high depth) | 3.3 gigabases | Illumina (DNA library and sequencing) | ~4,000ᵃ |
| Whole-genome sequencing | 2,000 individuals at 5× (low depth) | 3.3 gigabases | Illumina (DNA library and sequencing) | ~800 |
| Whole-exome sequencing | 2,000 individuals at 80× | 50 to 70 megabases | Agilent SureSelect (capture); Illumina (DNA library and sequencing) | ~750 |
| Targeted sequencing of candidate genes | 2,000 individuals at 100× | 500 kilobases (exons from ~100 genes) | Illumina TruSeq Custom Amplicon (capture); Illumina (DNA library and sequencing) | ~325 |
| Targeted sequencing of candidate genes | 2,000 individuals at 100× | 100 kilobases (exons from ~20 genes) | Illumina TruSeq Custom Amplicon (capture); Illumina (DNA library and sequencing) | ~250 |
| Targeted sequencing of candidate genes | 5,000 individuals at 100× | 100 kilobases (exons from ~20 genes) | Illumina TruSeq Custom Amplicon (capture); Illumina (DNA library and sequencing) | ~125 |
| Exome array | 10,000 individuals | ~250,000 coding variants | Illumina ExomeChip array | ~70 |
We provide cost estimates for next-generation DNA sequencing or genotyping experiments using different study designs.
ᵃWith the recently developed Illumina HiSeq X Ten platform, high-coverage whole-genome sequencing is 60 to 70% cheaper. We do not recommend or endorse any specific companies or products. Cost estimates do not include bioinformatics processing.
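The cost column reads most naturally as an approximate price per sample: the targeted-sequencing estimate falls from ~250 to ~125 when the sample size rises from 2,000 to 5,000, which only makes sense per sample. That reading is an interpretation, not something the table states outright. Under it, total data-generation budgets follow by simple multiplication, and the footnote's 60 to 70% reduction gives a per-genome range for high-depth sequencing. A minimal sketch; the helper names are hypothetical.

```python
# Assumption: the table's cost column is an approximate per-sample price in US$.
DESIGNS = [
    # (design, number of individuals, approximate cost per sample in US$)
    ("Whole-genome sequencing, 30x", 2_000, 4_000),
    ("Whole-genome sequencing, 5x", 2_000, 800),
    ("Whole-exome sequencing, 80x", 2_000, 750),
    ("Targeted sequencing, 500 kb, 100x", 2_000, 325),
    ("Targeted sequencing, 100 kb, 100x", 2_000, 250),
    ("Targeted sequencing, 100 kb, 100x", 5_000, 125),
    ("Exome array", 10_000, 70),
]


def total_cost(n_individuals: int, cost_per_sample: int) -> int:
    """Total data-generation cost in US$ (bioinformatics excluded, as in the table)."""
    return n_individuals * cost_per_sample


def discounted_range(cost_per_sample: int, low_pct: int = 60, high_pct: int = 70) -> tuple[int, int]:
    """Per-sample cost after a low_pct..high_pct reduction.

    Integer percent arithmetic avoids floating-point rounding noise.
    Returns (cheapest, most expensive).
    """
    return (cost_per_sample * (100 - high_pct) // 100,
            cost_per_sample * (100 - low_pct) // 100)


for design, n, per_sample in DESIGNS:
    print(f"{design}: {n:,} individuals -> ~US${total_cost(n, per_sample):,}")

print("30x genome after HiSeq X Ten reduction:", discounted_range(4_000))
```

Under this reading, 2,000 high-depth genomes come to roughly US$8 million and 2,000 exomes to about US$1.5 million, while the stated reduction brings a 30× genome to roughly US$1,200 to 1,600.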
Figure 1. Comparison of power for trio and case–control designs. Power to detect associations for 10,000 cases and 10,000 controls (blue) and 10,000 trios (red) across a range of minor allele frequencies (MAFs). Power was calculated with a significance threshold of P < 0.05, a disease prevalence of 0.1 and a relative risk of 1.1, using the Genetic Power Calculator tool [112].
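The trade-off in Figure 1 can be explored with a small calculator. The sketch below is not the Genetic Power Calculator; it swaps in textbook normal approximations: a 1-df allelic test for the case–control design and the transmission disequilibrium test (TDT) for trios with one affected child. Both assume a multiplicative risk model, Hardy–Weinberg equilibrium, and that the minor allele is the risk allele, so its numbers need not match the figure exactly. Function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def _two_sided_power(effect: float, se_null: float, se_alt: float, alpha: float) -> float:
    """P(reject) for a two-sided z test: the critical value uses the null SE,
    the statistic's spread under the alternative uses se_alt."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return (STD_NORMAL.cdf((abs(effect) - z_crit * se_null) / se_alt)
            + STD_NORMAL.cdf((-abs(effect) - z_crit * se_null) / se_alt))


def case_control_power(maf, n_cases, n_controls, prevalence, rr, alpha=0.05):
    """Allelic test: risk-allele frequency in cases vs unaffected controls."""
    p = maf
    q0, q1, q2 = (1 - p) ** 2, 2 * p * (1 - p), p ** 2   # HWE genotype frequencies
    f0 = prevalence / (q0 + rr * q1 + rr ** 2 * q2)        # baseline penetrance matching prevalence
    f1, f2 = f0 * rr, f0 * rr ** 2                          # multiplicative model
    p_case = (0.5 * q1 * f1 + q2 * f2) / prevalence
    p_ctrl = (0.5 * q1 * (1 - f1) + q2 * (1 - f2)) / (1 - prevalence)
    n1, n2 = 2 * n_cases, 2 * n_controls                    # diploid allele counts
    p_bar = (n1 * p_case + n2 * p_ctrl) / (n1 + n2)
    se_null = (p_bar * (1 - p_bar) * (1 / n1 + 1 / n2)) ** 0.5
    se_alt = (p_case * (1 - p_case) / n1 + p_ctrl * (1 - p_ctrl) / n2) ** 0.5
    return _two_sided_power(p_case - p_ctrl, se_null, se_alt, alpha)


def tdt_power(maf, n_trios, rr, alpha=0.05):
    """TDT on trios with one affected child (prevalence cancels out under
    the multiplicative model)."""
    p = maf
    # P(parent heterozygous | affected child) under HWE and the multiplicative model
    h = p * (1 - p) * (1 + rr) / (1 - p + p * rr)
    n_informative = 2 * n_trios * h   # expected number of heterozygous parents
    pi = rr / (1 + rr)                # P(het parent transmits the risk allele)
    se_null = (0.25 / n_informative) ** 0.5
    se_alt = (pi * (1 - pi) / n_informative) ** 0.5
    return _two_sided_power(pi - 0.5, se_null, se_alt, alpha)


# Parameters from the Figure 1 caption.
for maf in (0.001, 0.005, 0.01, 0.05, 0.1):
    cc = case_control_power(maf, 10_000, 10_000, prevalence=0.1, rr=1.1)
    tr = tdt_power(maf, 10_000, rr=1.1)
    print(f"MAF={maf}: case-control {cc:.2f}, trios {tr:.2f}")
```

Both tests lose power steeply as the MAF drops, because fewer informative alleles (or heterozygous parents) carry the signal for the same relative risk.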
Partial list of tools and resources to annotate DNA sequence variants
| Tool or resource | Description |
|---|---|
| CADD | A framework that integrates multiple annotations into one metric by contrasting variants that survived natural selection with simulated mutations |
| ENCODE | Annotation of potential functional elements (for example, histone tail modifications) in several cell lines |
| Epigenomics Roadmap | Annotation of potential functional elements (for example, DNase I hypersensitive sites) in many human tissues and primary cells |
| FANTOM5 | Annotation of transcriptional enhancers in many human tissues and primary cells through detection of bidirectional capped transcription |
| GERP | Identifies constrained elements in multiple alignments by quantifying substitution deficits |
| HaploReg | Visualization of DNA polymorphisms along with their predicted chromatin state, their sequence conservation across mammals, and their effect on regulatory motifs |
| Phen-Gen | A method that combines patients' disease symptoms and sequencing data with prior domain knowledge to identify the causative genes for rare disorders |
| PolyPhen-2 | A tool that predicts the possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations |
| RegulomeDB | A database that annotates SNPs with known and predicted regulatory elements in the intergenic regions of the human genome using gene expression, ENCODE and literature-mining data |
| RVIS | A score that ranks genes by whether they carry more or less common functional genetic variation than expected genome-wide, given the amount of apparently neutral variation in the gene |
| SIFT | Predicts whether an amino acid substitution affects protein function based on the degree of conservation of amino acid residues in sequence alignments derived from closely related sequences |
| VEP | Determines the effect of variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts and protein sequence, as well as on regulatory regions |
CNV, copy number variant; SNP, single nucleotide polymorphism.
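Annotations from several of these tools are often combined into a single rule that decides which rare variants qualify for a gene-based association test. The sketch below shows one such rule; the record type, the rule and its thresholds are illustrative assumptions, not a recommendation from this review. It uses Sequence Ontology consequence terms and SIFT/PolyPhen-2 labels of the kind VEP reports, plus a CADD PHRED-scaled cutoff of 20, which marks the top 1% of CADD scores across all possible single-nucleotide variants in the reference genome.

```python
from dataclasses import dataclass
from typing import Optional

# Consequence terms treated as loss of function (an illustrative choice).
LOSS_OF_FUNCTION = {
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
}
DAMAGING_POLYPHEN = {"probably_damaging", "possibly_damaging"}


@dataclass
class AnnotatedVariant:
    """Hypothetical record holding the annotations for one variant."""
    variant_id: str
    consequence: str                    # most severe consequence, e.g. "missense_variant"
    sift: Optional[str] = None          # e.g. "deleterious" or "tolerated"
    polyphen: Optional[str] = None      # e.g. "probably_damaging" or "benign"
    cadd_phred: Optional[float] = None  # CADD PHRED-scaled score


def is_qualifying(v: AnnotatedVariant, cadd_min: float = 20.0) -> bool:
    """Keep loss-of-function variants, plus missense variants that SIFT and
    PolyPhen-2 both call damaging or that reach the CADD cutoff."""
    if v.consequence in LOSS_OF_FUNCTION:
        return True
    if v.consequence != "missense_variant":
        return False
    both_damaging = v.sift == "deleterious" and v.polyphen in DAMAGING_POLYPHEN
    high_cadd = v.cadd_phred is not None and v.cadd_phred >= cadd_min
    return both_damaging or high_cadd


# Hypothetical variants, for illustration only.
variants = [
    AnnotatedVariant("var1", "stop_gained"),
    AnnotatedVariant("var2", "missense_variant", "deleterious", "probably_damaging", 15.2),
    AnnotatedVariant("var3", "missense_variant", "tolerated", "benign", 23.0),
    AnnotatedVariant("var4", "missense_variant", "tolerated", "benign", 8.1),
    AnnotatedVariant("var5", "synonymous_variant", cadd_phred=21.0),
]
print([v.variant_id for v in variants if is_qualifying(v)])  # ['var1', 'var2', 'var3']
```

Changing the consequence set or the CADD cutoff changes which variants enter the test, so such masks are usually reported explicitly alongside results.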
Figure 2. Functional annotation of regulatory sequences in the human genome. Genome tracks are from the UCSC Genome Browser. CXCL2 (blue) encodes a chemokine produced by activated monocytes and neutrophils at sites of inflammation. Single nucleotide polymorphisms (SNPs; rs546829 and rs1371799, green) were associated with monocyte count in a genome-wide association study. The red box upstream of CXCL2 includes a predicted enhancer identified in monocytes by FANTOM5 (black rectangles). FANTOM5 did not annotate an enhancer at this position in hepatocytes, a cell type less relevant to CXCL2. Using histone tail modification information, ENCODE predicted strong enhancers (orange) at the same position in erythroleukemic (K562) and endothelial (HUVEC) cells. Chr, chromosome; hESC, human embryonic stem cell; HMM, hidden Markov model; kb, kilobases.
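The reasoning in Figure 2, asking whether a GWAS SNP falls inside an enhancer annotated in a relevant cell type, reduces to an interval-overlap test repeated per cell type. A minimal sketch follows; the coordinates, SNP names and cell-type labels are hypothetical placeholders, not the real positions of rs546829, rs1371799 or the FANTOM5 enhancer. It assumes enhancers are given as BED-style intervals (0-based start, exclusive end) and SNP positions are 1-based, as in VCF files.

```python
# Hypothetical enhancer calls per cell type, as BED-style (start, end) pairs:
# 0-based start, exclusive end. Coordinates are placeholders, not real data.
ENHANCERS = {
    "monocyte": [(1_000, 1_400), (8_200, 8_600)],
    "hepatocyte": [],
}

# Hypothetical SNPs with 1-based positions (VCF convention) on the same chromosome.
SNPS = {"snp_A": 1_250, "snp_B": 5_000}


def overlaps(pos_1based: int, intervals: list[tuple[int, int]]) -> bool:
    """True if a 1-based position falls inside any BED interval.

    A BED interval (start, end) covers 1-based positions start+1 .. end.
    """
    return any(start < pos_1based <= end for start, end in intervals)


def snps_in_enhancers(snps: dict[str, int],
                      enhancers: dict[str, list[tuple[int, int]]]) -> dict[str, list[str]]:
    """For each cell type, list the SNPs that sit inside an annotated enhancer."""
    return {
        cell: [name for name, pos in snps.items() if overlaps(pos, intervals)]
        for cell, intervals in enhancers.items()
    }


print(snps_in_enhancers(SNPS, ENHANCERS))  # {'monocyte': ['snp_A'], 'hepatocyte': []}
```

As in the figure, the same SNP is flagged only in the cell type where an enhancer was annotated. For genome-scale data, sorted intervals with binary search, or a dedicated tool such as BEDTools `intersect`, scale better than this linear scan.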