Colby Chiang, Alexandra J Scott, Joe R Davis, Emily K Tsang, Xin Li, Yungil Kim, Tarik Hadzic, Farhan N Damani, Liron Ganel, Stephen B Montgomery, Alexis Battle, Donald F Conrad, Ira M Hall.
Abstract
Structural variants (SVs) are an important source of human genetic diversity, but their contribution to traits, disease and gene regulation remains unclear. We mapped cis expression quantitative trait loci (eQTLs) in 13 tissues via joint analysis of SVs, single-nucleotide variants (SNVs) and short insertion/deletion (indel) variants from deep whole-genome sequencing (WGS). We estimated that SVs are causal at 3.5–6.8% of eQTLs, a substantially higher fraction than prior estimates, and that expression-altering SVs have larger effect sizes than do SNVs and indels. We identified 789 putative causal SVs predicted to directly alter gene expression: most (88.3%) were noncoding variants enriched at enhancers and other regulatory elements, and 52 were linked to genome-wide association study (GWAS) loci. We observed a notable abundance of rare high-impact SVs associated with aberrant expression of nearby genes. These results suggest that comprehensive WGS-based SV analyses will increase the power of common- and rare-variant association studies.
Year: 2017 PMID: 28369037 PMCID: PMC5406250 DOI: 10.1038/ng.3834
Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330
Summary of variant types and discovery methods. SNVs and indels were detected using the Genome Analysis Toolkit (GATK). SVs were detected by breakpoint evidence alone (BP), by breakpoint evidence supported by read-depth evidence (BP, RD), or by read-depth evidence alone (RD). Common variants (MAF ≥ 0.05) were tested for cis eQTLs. The SV-only eQTL mapping excluded SNVs and indels for greater sensitivity, while the joint eQTL mapping included all variant types.
| Variant type | Detection method | # of variants | Median resolution (bp) | Median size (bp) | # of common variants | eVariants (SV-only) | eVariants (joint) |
|---|---|---|---|---|---|---|---|
| SNV | GATK | 21,764,904 | – | 1 | 6,394,161 | – | 16,959 |
| Indel | GATK | 3,030,964 | – | 3 | 801,431 | – | 2,130 |
| Deletion (DEL) | BP, RD | 11,492 | 35 | 993 | 2,939 | 510 | 25 |
| | RD | 473 | kilobase | 3,819 | 284 | 68 | 17 |
| Duplication (DUP) | BP, RD | 2,506 | 96 | 576 | 676 | 97 | 3 |
| | RD | 1,035 | kilobase | 4,999 | 684 | 148 | 76 |
| Multi-allelic CNV (mCNV) | RD | 1,534 | kilobase | 3,847 | 1,060 | 264 | 118 |
| Inversion (INV) | BP | 51 | 15 | 1,045 | 14 | 0 | 0 |
| Reference mobile element insertion (rMEI) | BP | 2,051 | 1 | 307 | 1,535 | 265 | 10 |
| Other SV (BND) | BP | 4,460 | 34 | – | 1,788 | 281 | 4 |
| All SVs | – | 23,602 | 39 | – | 8,980 | 1,634 | 253 |
| All variant types | – | 24,819,470 | – | – | 7,204,572 | – | 19,342 |
Resolution refers to the positional certainty at each breakpoint, with read-depth variants having approximate breakpoint precision on the kilobase scale.
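The MAF ≥ 0.05 threshold used to select common variants for eQTL testing can be illustrated with a minimal Python sketch. It assumes biallelic sites encoded as diploid alternate-allele dosages (0, 1 or 2), with `None` for missing calls; the function names are hypothetical, and multi-allelic CNVs, which carry copy numbers rather than two alleles, would need a different frequency definition.

```python
from typing import Dict, List, Optional, Sequence

def minor_allele_frequency(dosages: Sequence[Optional[int]]) -> float:
    """MAF at a biallelic site from diploid dosages (0, 1 or 2 alternate alleles).

    Missing genotypes (None) are ignored; a site with no calls gets MAF 0.
    """
    called = [d for d in dosages if d is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))  # two alleles per diploid sample
    return min(alt_freq, 1.0 - alt_freq)        # the rarer allele defines the MAF

def common_variant_ids(genotypes: Dict[str, Sequence[Optional[int]]],
                       min_maf: float = 0.05) -> List[str]:
    """IDs of variants passing a MAF >= min_maf filter (hypothetical helper)."""
    return [vid for vid, dosages in genotypes.items()
            if minor_allele_frequency(dosages) >= min_maf]
```

Because the minor allele is whichever allele is rarer, a site where nearly every sample is homozygous for the alternate allele is treated as rare, just like one where nearly every sample is homozygous reference.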
Figure 1. Structural variation call set. (a) Size distribution of ascertained SVs by variant type and (b) number of SVs detected in each sample. Starred (*) samples exhibited abnormal read-depth profiles and were excluded from rare-variant analyses. (c) Site frequency spectrum of SVs compared with that of SNVs and indels detected by GATK.
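A site frequency spectrum like the one in panel (c) can be tabulated by counting variants according to how many alternate alleles they carry across samples. The sketch below is a hypothetical illustration, not the authors' code: it assumes diploid dosages per sample, counts the alternate allele (an unfolded spectrum), and normalizes each spectrum to proportions, since the SV and SNV call sets differ in size by roughly three orders of magnitude.

```python
from collections import Counter
from typing import Iterable, List, Sequence

def site_frequency_spectrum(variants: Iterable[Sequence[int]], n_samples: int) -> List[int]:
    """Unfolded SFS: entry k counts variants carrying k alternate alleles.

    Each variant is a sequence of diploid dosages (0, 1 or 2), one per sample,
    so k ranges from 0 to 2 * n_samples.
    """
    counts = Counter(sum(dosages) for dosages in variants)
    return [counts[k] for k in range(2 * n_samples + 1)]  # Counter returns 0 for absent k

def normalized_sfs(sfs: Sequence[int]) -> List[float]:
    """Convert counts to proportions so variant classes of different sizes are comparable."""
    total = sum(sfs)
    return [c / total for c in sfs] if total else [0.0] * len(sfs)
```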
Figure 2. eQTL effect size distributions and heritability partitioning with linear mixed models. (a) Effect size distributions for coding and noncoding variants of each type, with the number of eQTLs of each type shown above each distribution. The top panels (SV-only eQTLs) show the 5,128 eQTLs discovered by the SV-only analysis, while the bottom two panels show the 23,554 eQTLs discovered by the joint analysis. The “DUP” category includes duplications and mCNVs, and the alternate allele for rMEIs is the insertion. (b,c) Heat scatter plots showing, for (b) SV-only and (c) joint eQTL mapping analyses, the heritability of each eQTL apportioned to the most significant SV in the cis window (x-axis) and to the additive effect of the 1,000 most significant SNVs and indels in the cis window (y-axis). Gray lines denote the median value on each axis.
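The paper apportions heritability with linear mixed models, which is beyond a short example. As a simpler stand-in for the quantity on the x-axis, the sketch below computes the fraction of expression variance explained by a single additive variant from an ordinary least-squares fit. This is not the authors' LMM, and the function name is hypothetical. Under Hardy–Weinberg equilibrium, Var(g) ≈ 2p(1 − p) for alternate-allele frequency p, which gives the familiar 2p(1 − p)β²/Var(y).

```python
from statistics import fmean, pvariance
from typing import Sequence

def variance_explained(dosages: Sequence[float], expression: Sequence[float]) -> float:
    """Share of expression variance explained by one additive variant (OLS fit).

    beta = Cov(g, y) / Var(g); explained share = beta**2 * Var(g) / Var(y).
    Population variances are used throughout, so the result equals the squared
    Pearson correlation between dosage and expression.
    """
    mean_g, mean_y = fmean(dosages), fmean(expression)
    var_g = pvariance(dosages, mean_g)
    var_y = pvariance(expression, mean_y)
    if var_g == 0 or var_y == 0:
        return 0.0  # monomorphic variant or constant expression: nothing to explain
    cov = fmean((g - mean_g) * (y - mean_y) for g, y in zip(dosages, expression))
    beta = cov / var_g  # additive effect per alternate allele
    return beta ** 2 * var_g / var_y
```

A single-variant estimate like this ignores the linked SNVs and indels in the cis window, which is exactly what the mixed-model partitioning in panels (b,c) is designed to account for.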
Figure 3. Feature enrichment of SV-eQTLs. Fold enrichment and 95% confidence intervals (based on 100 randomly shuffled sets of SV positions in each bin) for the overlap between the most significant SV and various annotated genomic features at the union of eQTLs discovered by SV-only or joint eQTL mapping. (a) Composition of each causality score bin by SV type. (b) Enrichment of SVs in each causality bin for overlap with exons of the affected eGene. For the remaining plots in blue (c–f), SVs that overlapped an exon of their affected eGene were excluded, yet the remaining SVs still showed significant enrichment in (c) enhancers from the Dragon Enhancers Database (DENdb), (d) the 10 kb regions upstream and (e) downstream of transcription start sites (TSS), and (f) regions predicted to be highly occupied by transcription factors (FunSeq HOT regions).
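The shuffle-based fold enrichment described above can be sketched as follows. This is a simplified, hypothetical version: it works on one chromosome with half-open intervals, places each SV uniformly at random while keeping its length, and reports an empirical one-sided p-value instead of the paper's 95% confidence intervals, whose exact construction is not described here. A genome-wide analysis would also need to exclude assembly gaps, which tools such as `bedtools shuffle` can handle.

```python
import random
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome

def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]

def count_overlapping(svs: List[Interval], features: List[Interval]) -> int:
    """Number of SVs touching at least one feature (quadratic; fine for a sketch)."""
    return sum(any(overlaps(sv, f) for f in features) for sv in svs)

def shuffle_positions(svs: List[Interval], chrom_length: int,
                      rng: random.Random) -> List[Interval]:
    """Place each SV at a uniformly random start on the chromosome, keeping its length."""
    shuffled = []
    for start, end in svs:
        length = end - start
        new_start = rng.randrange(0, chrom_length - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled

def fold_enrichment(svs: List[Interval], features: List[Interval], chrom_length: int,
                    n_shuffles: int = 100, seed: int = 0) -> Tuple[float, float]:
    """Observed overlaps / mean overlaps over shuffles, plus an empirical p-value."""
    rng = random.Random(seed)
    observed = count_overlapping(svs, features)
    null = [count_overlapping(shuffle_positions(svs, chrom_length, rng), features)
            for _ in range(n_shuffles)]
    null_mean = sum(null) / n_shuffles
    fold = observed / null_mean if null_mean > 0 else float("inf")
    # One-sided permutation p-value with the usual +1 correction.
    p_value = (1 + sum(c >= observed for c in null)) / (n_shuffles + 1)
    return fold, p_value
```

Shuffling positions while preserving SV lengths keeps the null comparable, because longer SVs are more likely to touch a feature by chance.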
Figure 4. Candidate SV-eQTLs at GWAS loci. Genomic position and haplotype blocks are shown on the x-axis, and each variant's association with the indicated eGene is shown on the y-axis. Rectangular points represent the predicted causal SV, with colors representing the linkage (r²) between the SV and each marker in the window. Labeled diamonds show the reported risk allele for the specified GWAS phenotype. (a) A 294 bp deletion that intersects an enhancer in intron 1 of DAB2IP is linked to a risk allele for abdominal aortic aneurysm (rs7025486) and is also predicted to be a causal eQTL for DAB2IP. (b) A 1,468 bp deletion associated with increased expression of PADI4 is linked to a known risk allele for rheumatoid arthritis (rs2301888).
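The r² coloring relates each marker to the predicted causal SV. A common proxy when phase is unavailable is the squared Pearson correlation of genotype dosages across samples, sketched below. Whether the paper computed r² from phased haplotypes or from dosages is not stated here, so treat this as an assumption; the function name is hypothetical.

```python
from statistics import fmean, pvariance
from typing import Sequence

def genotype_r2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Pearson correlation of dosages at two variants (unphased LD proxy)."""
    mean_a, mean_b = fmean(a), fmean(b)
    var_a, var_b = pvariance(a, mean_a), pvariance(b, mean_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # monomorphic site: r^2 is undefined, reported as 0 here
    cov = fmean((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    return cov * cov / (var_a * var_b)
```

Squaring the correlation discards its sign, so a marker whose alternate allele tags the reference allele of the SV still shows high linkage.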
Figure 5. Gene expression outliers are associated with rare SVs. (a) Fold enrichment of rare variants within 5 kb of expression outliers (red) and of outliers within 5 kb of rare variants (blue), comparing the observed set of 5,047 outliers with 1,000 random permutations of their sample names (y-axis is log-scaled). (b) Effect size distributions for each SV type within 5 kb of an outlier in the same individual, with “coding” SVs defined as those that overlap exons of the outlier gene and “noncoding” SVs as the remainder. (c) Size distribution histograms for SVs in each minor allele frequency (MAF) class and for rare SVs within 5 kb of an expression outlier in the same individual, excluding balanced rearrangements. The peak at ~300 bp in the top two plots results from Alu SINE insertions in the reference genome.
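The permutation test in panel (a) can be sketched by counting outliers that have a rare variant within 5 kb in the same individual, then repeating the count after shuffling sample names among the outliers. In this hypothetical sketch each outlier gene is reduced to a single coordinate (for instance its TSS) and each site is a `(sample, chromosome, position)` tuple; the paper's exact distance definition and data layout may differ. Only the "rare variants near outliers" direction is shown; the reverse direction would count rare variants near outliers instead of outliers near rare variants.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Site = Tuple[str, str, int]  # (sample, chromosome, position); hypothetical layout
WINDOW_BP = 5_000  # 5 kb window, as in the figure caption

def outliers_with_nearby_variant(outliers: List[Site], rare_variants: List[Site],
                                 window: int = WINDOW_BP) -> int:
    """Count outliers with a rare variant in the same sample within `window` bp."""
    positions: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for sample, chrom, pos in rare_variants:
        positions[(sample, chrom)].append(pos)
    return sum(
        any(abs(pos - v) <= window for v in positions.get((sample, chrom), []))
        for sample, chrom, pos in outliers
    )

def permutation_fold_enrichment(outliers: List[Site], rare_variants: List[Site],
                                n_perm: int = 1000, seed: int = 0) -> float:
    """Observed count divided by the mean count over sample-name permutations."""
    rng = random.Random(seed)
    observed = outliers_with_nearby_variant(outliers, rare_variants)
    samples = [sample for sample, _, _ in outliers]
    null_counts = []
    for _ in range(n_perm):
        rng.shuffle(samples)  # reassign sample names; gene positions stay fixed
        permuted = [(s, chrom, pos) for s, (_, chrom, pos) in zip(samples, outliers)]
        null_counts.append(outliers_with_nearby_variant(permuted, rare_variants))
    null_mean = sum(null_counts) / n_perm
    return observed / null_mean if null_mean > 0 else float("inf")
```

Permuting sample names rather than positions keeps each gene's genomic context fixed, so the null tests only whether the rare variant and the outlier occur in the same individual.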