Andréanne Morin, Tony Kwan, Bing Ge, Louis Letourneau, Maria Ban, Karolina Tandre, Maxime Caron, Johanna K Sandling, Jonas Carlsson, Guillaume Bourque, Catherine Laprise, Alexandre Montpetit, Ann-Christine Syvanen, Lars Ronnblom, Stephen J Sawcer, Mark G Lathrop, Tomi Pastinen.
Abstract
BACKGROUND: Genetic variants identified in genome-wide association studies (GWAS) frequently lie in non-coding regions of the genome that contain cis-regulatory elements, which suggests that altered gene expression underlies the development of many complex traits. To assess the impact of non-coding genetic variation in immune-related diseases efficiently and comprehensively, we emulated the whole-exome sequencing paradigm and developed a custom capture panel, "Immunoseq", for the known DNase I hypersensitive sites (DHSs) in immune cells.
Keywords: Capture; Gene expression; Immune disease; Next-generation sequencing; Rare variants
Year: 2016 PMID: 27624058 PMCID: PMC5022205 DOI: 10.1186/s12920-016-0220-7
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Fig. 1 Benchmarking the Immunoseq capture panel against known disease-associated sites and regulatory variants. a Autosomal GWAS hits associated with more than one autoimmune or chronic inflammatory disease, with neuropsychiatric diseases and with cancer that are included in the Immunoseq custom capture panel. (A cut-off of 1 × 10⁻⁸ was used to select GWAS hits for analysis; SNPs in LD were selected based on r² > 0.9; HLA (human leucocyte antigen) hits and region as well as chromosome X SNPs were excluded from the analyses.) SNP in LD = GWAS hits that have a SNP in LD in the Immunoseq custom capture panel. b cis-eQTLs from monocytes (CD14+) and B cells (CD19+) (considered as haplotype blocks, r² > 0.9) included in the Immunoseq panel. Cut-offs of p < 1e-3 or p < 1e-5, and p < 1e-12 after 1000 permutations (1000 = number of SNPs tested per probe), were applied, and the top eQTL per transcript was kept for analysis (HLA hits and region as well as chromosome X hits were excluded from the analyses). c Enrichment of GWAS hits (same as in a) and proximal SNPs (LD r² > 0.9) that fall in DHSs selected for immune cell types compared to DHSs selected from other tissues (either all or non-overlapping ones) and to regions randomly selected (1000 times) from the whole genome (either the full genome or only non-coding regions excluding HLA). Significance was calculated using Fisher’s exact test. Enrichment is significant (p < 0.001) for all GWAS hits except for neuropsychiatric hits. d Enrichment of eQTLs (same as in b) and proximal SNPs (LD r² > 0.9) positioned at DHSs selected for immune cell types compared to DHSs selected from other tissues (either all or non-overlapping ones) and to regions randomly selected (1000 times) from the whole genome (either the entire genome or only the non-coding part excluding the HLA region). All enrichments shown are significant (p < 0.001). All p-values were calculated using Fisher’s exact test.
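The Fig. 1 caption reports enrichment p-values from Fisher’s exact test. As a minimal sketch of how such a p-value can be computed from a 2×2 table, the function below sums hypergeometric probabilities using only the Python standard library. The one-sided ("greater") alternative, the function name and the example counts are assumptions for illustration; the caption does not state which alternative was used or give the underlying counts.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, i.e. the chance of
    observing at least `a` hits in the first row when hits are distributed
    at random. The one-sided alternative is an assumption of this sketch.
    """
    row1 = a + b        # e.g. SNPs falling in immune-cell DHSs
    col1 = a + c        # e.g. SNPs that are GWAS hits
    n = a + b + c + d   # all SNPs considered
    upper = min(row1, col1)
    # Sum the hypergeometric probabilities of tables at least as extreme.
    numerator = sum(
        comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1)
    )
    return numerator / comb(n, row1)


# Hypothetical counts, not taken from the paper:
# 3 hits inside the regions, 1 non-hit inside, 1 hit outside, 3 non-hits outside.
p_value = fisher_exact_greater(3, 1, 1, 3)
```

With real data, `a` would count GWAS hits (or eQTLs) inside immune-cell DHSs and the other cells would hold the complementary counts for the comparison regions.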
Sequencing statistics of the samples sequenced with Immunoseq
| | Mean target coverage | Bases on target (%)ᵃ | Target region without coverage (%)ᵇ | Target bases with ≥10x coverage (%)ᶜ | Level of multiplexing | Sequencing platform |
|---|---|---|---|---|---|---|
| Sweden Uppsala Bioresource samples ( | 52X | 88 | 1.9 | 83 | 2X (3 samples); 5X (27 samples) | HiSeq2500 (2X samples); HiSeq2000 (5X samples) |
Reads were aligned to the human hg19 reference genome, and variants were called with HaplotypeCaller to identify all SNPs. Values shown are averages across samples.
ᵃ On- and near-bait bases / good-quality bases aligned (according to Picard metrics). ᵇ The percentage of target regions that did not reach 2x coverage over any base. ᶜ The percentage of all target bases achieving 10X or higher coverage. We considered a variant to be true at a depth ≥ 10.
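The coverage columns defined in these footnotes can be illustrated with a small sketch that derives them from per-base depths. The input layout (one list of depths per target region) and the function name are hypothetical; the reported values come from Picard metrics, which this sketch does not reproduce.

```python
from typing import Dict, List


def coverage_summary(region_depths: List[List[int]]) -> Dict[str, float]:
    """Summarise target coverage from per-base depths.

    `region_depths` holds one non-empty list of per-base read depths for
    each target region (a hypothetical input layout for this sketch).
    """
    all_depths = [depth for region in region_depths for depth in region]
    total_bases = len(all_depths)
    # A region counts as "without coverage" if no base in it reached 2x.
    uncovered_regions = sum(1 for region in region_depths if max(region) < 2)
    bases_ge_10x = sum(1 for depth in all_depths if depth >= 10)
    return {
        "mean_target_coverage": sum(all_depths) / total_bases,
        "pct_regions_without_coverage": 100.0 * uncovered_regions / len(region_depths),
        "pct_bases_ge_10x": 100.0 * bases_ge_10x / total_bases,
    }


# Three toy target regions of three bases each (made-up depths).
summary = coverage_summary([[12, 15, 9], [0, 1, 1], [30, 40, 10]])
```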
General characteristics of the common, rare and novel single nucleotide variations (SNVs)
| Total number (average per sample) | | All | Common | Rare | Novelᵃ |
|---|---|---|---|---|---|
| All (Immunoseq) | | 351,088 (90,594) | 275,042 (83,839) | 50,004 (5318) | 26,042 (1437) |
| Codingᵇ | All | 60,946 (15,169) | 45,545 (1818) | 12,452 (1166) | 2949 (185) |
| | Non-synonymousᶜ | 30,967 (7174) | 21,807 (6403) | 7405 (669) | 1755 (102) |
| | Synonymousᶜ | 29,214 (7770) | 23,434 (7305) | 4785 (395) | 995 (71) |
| | Stop-gainedᶜ | 395 (71) | 202 (56) | 135 (13) | 58 (2) |
| Exomeᵈ | | 120,245 (30,682) | 91,818 (27,916) | 21,497 (2280) | 6930 (486) |
| Non-codingᵉ | | 290,142 (75,424) | 229,497 (70,020) | 37,552 (4152) | 23,093 (1251) |
| All DHSᶠ | | 195,182 (51,559) | 154,154 (48,056) | 24,571 (2677) | 16,457 (826) |
Total number of variants and the average number of variants per sample that were included in the Immunoseq design
ᵃ Novel variants are defined as those neither identified in the 1000 Genomes Project nor included in dbSNP141. ᵇ Coding variants are those located in the exons of the RefSeq coding sequence. ᶜ Synonymous, non-synonymous and stop-gained variants were annotated using SnpEff and the hg19 version of the genome. ᵈ The Exome is based on the Roche SeqCap EZ exome v3.0. ᵉ Non-coding variants are those not in the RefSeq coding sequence. ᶠ The All DHSs category combines all DHSs from the selected 12 cell types and could partly overlap with the Exome. Cut-offs used for the quality control of the variants are read depth ≥ 10, genotyping quality (GQ) ≥ 70, mapping quality (MQ) ≥ 50, and proportion of the reference allele between 10 and 90 %.
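The quality-control cut-offs listed above can be expressed as a simple filter. The sketch below is illustrative only: the `VariantCall` record, its field names and the inclusive treatment of the 10 to 90 % bounds are assumptions, and the check is applied to every call because the footnote does not say which genotypes it was restricted to.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Hypothetical per-sample variant record used only for this sketch."""
    depth: int                # total reads covering the site
    genotype_quality: float   # GQ
    mapping_quality: float    # MQ
    ref_reads: int            # reads supporting the reference allele


def passes_qc(
    call: VariantCall,
    min_depth: int = 10,
    min_gq: float = 70.0,
    min_mq: float = 50.0,
    min_ref_fraction: float = 0.10,
    max_ref_fraction: float = 0.90,
) -> bool:
    """Apply the depth, GQ, MQ and reference-allele-proportion cut-offs."""
    if call.depth < min_depth:
        return False
    if call.genotype_quality < min_gq or call.mapping_quality < min_mq:
        return False
    ref_fraction = call.ref_reads / call.depth
    # Bounds are treated as inclusive here; the source does not specify.
    return min_ref_fraction <= ref_fraction <= max_ref_fraction


# Made-up example calls.
good_call = VariantCall(depth=40, genotype_quality=99, mapping_quality=60, ref_reads=20)
shallow_call = VariantCall(depth=8, genotype_quality=99, mapping_quality=60, ref_reads=4)
```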
Fig. 2 Discovery and functional potential of rare and novel variants using Immunoseq. a Proportion of novel variants (all, Genomic Evolutionary Rate Profiling (GERP++) ≥ 1 and GERP++ ≥ 2) identified in DHSs (red) compared to the exome (blue). b Distribution of the proportions of common (red), rare (blue) and novel (green) variants according to GERP++ score and Combined Annotation Dependent Depletion (CADD) score. c Fold enrichment of rare (blue), novel (green) or rare and novel combined (red) variants compared to common variants found at shared or cell-type-specific DHSs. Linear regression slopes: rare = 0.119 (p-value = 1.35e-05), novel = 0.093 (p-value = 5.81e-05), rare and novel = 0.113 (p-value = 2.41e-06). d Proportion of common (red), rare (blue) and novel (green) variants localized at a DHS that either disrupt or create a transcription-factor binding motif. P-values were calculated using Fisher’s exact test (***p < 0.001).
Fig. 3 The impact of rare and novel noncoding variants on gene expression. a Using the replication set, we looked at the adjusted proportion of transcripts with common (red), rare (blue) or novel (green) noncoding variants in the vicinity (+/−20 kb) of a gene at different levels of allelic imbalance: 1.5 to 9, 2 to 9, 2.5 to 9, 3 to 9 and 3.5 to 9 fold difference. Adjustment was based on the average number of SNPs used to calculate ASE at each ASE level. b Enrichment of the proportion of transcripts showing allelic imbalance (AI) with rare or novel variants in the vicinity of the gene compared to AI transcripts with common variants in the vicinity of a gene. We looked at coding (histogram) vs noncoding variants as well as noncoding variants in DHS regions correlated with the promoters (Pearson correlation r > 0.5 to 0.9). In red are all transcripts where allelic imbalance was measured (allAI) and in blue are the transcripts for which the top associated SNP is homozygous in the sample (homAI). Linear regression slopes: allAI = 0.015 (p-value = 0.0196) and homAI = 0.063 (p-value = 0.0024). Genes are considered allelically imbalanced at ≥ 2 fold difference between the alleles and equally expressed at ≤ 1.5 fold. c Fold difference between the proportions of AI transcripts with rare or novel variants in the vicinity and AI transcripts with common variants in the vicinity, including only transcripts for which the top associated SNP is homozygous (homAI). We looked at coding (histogram) vs noncoding variants around the genes (+/−20 kb from gene) and in DHS regions correlated with the promoters (Pearson correlation r > 0.5 to 0.9). We compared different levels of allelically imbalanced transcripts, from 1.5 fold to 3.5 fold. allAI: AI transcripts among all transcripts for which ASE was measured; homAI: transcripts for which the top associated SNP that drives the association across samples is homozygous.
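The thresholds stated in the Fig. 3 caption (allelic imbalance at 2 fold or more, equal expression at 1.5 fold or less) can be sketched as a small classifier. Computing the fold difference as a ratio of per-allele read counts, the function names and the "intermediate" label are assumptions for illustration; the caption does not describe how ASE was computed from the underlying SNPs.

```python
def fold_difference(allele_a_reads: float, allele_b_reads: float) -> float:
    """Fold difference between two alleles as higher count / lower count.

    Using raw per-allele read counts is an assumption of this sketch.
    """
    high = max(allele_a_reads, allele_b_reads)
    low = min(allele_a_reads, allele_b_reads)
    if high <= 0:
        raise ValueError("at least one allele needs a positive read count")
    if low <= 0:
        return float("inf")  # monoallelic expression
    return high / low


def classify_transcript(
    allele_a_reads: float,
    allele_b_reads: float,
    ai_min_fold: float = 2.0,
    equal_max_fold: float = 1.5,
) -> str:
    """Label a transcript using the >= 2 fold and <= 1.5 fold thresholds."""
    fold = fold_difference(allele_a_reads, allele_b_reads)
    if fold >= ai_min_fold:
        return "allelic_imbalance"
    if fold <= equal_max_fold:
        return "equally_expressed"
    return "intermediate"  # between the two thresholds; label is hypothetical


# Made-up read counts for three transcripts.
labels = [classify_transcript(a, b) for a, b in [(200, 100), (120, 100), (170, 100)]]
```

Raising `ai_min_fold` to 2.5, 3 or 3.5 would mimic the stricter imbalance levels compared across the panels.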
Fig. 4 The number and location of rare and novel noncoding variants have an impact on gene expression. a Adjusted proportion of AI transcripts that contain one or more noncoding common (red) or rare and novel (blue) variants in the transcript's vicinity (+/−20 kb from gene). Adjustment was based on the average number of SNPs used to calculate ASE at each ASE level. b Fold enrichment of common (red) or rare and novel (blue) variants in AI vs all transcripts as a function of their distance from transcription start sites (TSS). Transcripts with p < 0.05 were used. A sliding window of 80 kb, advanced every 10 kb, was used.
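The sliding-window scheme in panel b (80 kb windows advanced every 10 kb) can be sketched as follows. The ±100 kb range around the TSS, the half-open window convention and the function names are assumptions chosen for illustration; the caption gives only the window width and step.

```python
from typing import List, Tuple


def sliding_windows(
    start: int, end: int, width: int = 80_000, step: int = 10_000
) -> List[Tuple[int, int]]:
    """Return half-open [lo, hi) windows of `width` bp, advanced by `step` bp."""
    windows = []
    position = start
    while position + width <= end:
        windows.append((position, position + width))
        position += step
    return windows


def count_per_window(
    distances_to_tss: List[int], windows: List[Tuple[int, int]]
) -> List[int]:
    """Count variants whose signed distance to the TSS falls in each window."""
    return [
        sum(1 for distance in distances_to_tss if lo <= distance < hi)
        for lo, hi in windows
    ]


# Hypothetical range of +/-100 kb around the TSS and made-up variant distances.
windows = sliding_windows(-100_000, 100_000)
counts = count_per_window([-95_000, 0, 5_000, 99_999], windows)
```

A fold enrichment per window would then be the ratio of such counts (as proportions) between AI transcripts and all transcripts.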