Martin L Buchkovich, Karl Eklund, Qing Duan, Yun Li, Karen L Mohlke, Terrence S Furey.
Abstract
BACKGROUND: Genetic variation can alter transcriptional regulatory activity, contributing to variation in complex traits and risk of disease, but identifying individual variants that affect regulatory activity has been challenging. Quantitative sequence-based experiments such as ChIP-seq and DNase-seq can detect sites of allelic imbalance, where alleles contribute disproportionately to the overall signal, suggesting allelic differences in regulatory activity.
Year: 2015 PMID: 26210163 PMCID: PMC4515314 DOI: 10.1186/s12920-015-0117-x
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Fig. 1. Overview of AA-ALIGNER. Sample genotypes or common variants are used to create a custom reference genome (1). Sequence reads are filtered to remove low quality reads (2) and aligned to the custom reference using GSNAP including alternate alleles (3). Alignments are filtered further to increase alignment quality (4) and used to detect sites of allelic imbalance (5, binomial test) and identify peaks (6). Allelic imbalance is tested at heterozygous sites included in the customized reference genome and at predicted heterozygous sites, identified based on a minimum number of mapped reads containing each of two alleles. If desired, predicted heterozygous sites can be used to update the custom reference and be included in a second alignment repeating steps 3–6.
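Step 5 of the overview can be illustrated with a minimal sketch. The caption only states that a binomial test is used and that predicted heterozygous sites need a minimum number of reads for each allele. The two-sided form of the test, the null proportion of 0.5, and the function names below are assumptions made for illustration, not the AA-ALIGNER implementation. The thresholds (5 reads per allele, P < 0.01) are taken from the table footnotes and Fig. 2.

```python
from math import comb


def binomial_two_sided_p(ref_reads: int, alt_reads: int, p_null: float = 0.5) -> float:
    """Exact two-sided binomial p-value (assumed form of the test).

    Sums the probability of every outcome that is no more likely
    than the observed reference-allele count under the null.
    """
    n = ref_reads + alt_reads
    pmf = [comb(n, k) * p_null**k * (1 - p_null) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Small relative tolerance so floating-point ties are counted.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))


def call_site(ref_reads: int, alt_reads: int,
              min_reads_per_allele: int = 5, alpha: float = 0.01):
    """Hypothetical helper: return None if the site does not look heterozygous,
    otherwise (p_value, is_imbalanced)."""
    if min(ref_reads, alt_reads) < min_reads_per_allele:
        return None  # too few reads for one allele to predict heterozygosity
    p_value = binomial_two_sided_p(ref_reads, alt_reads)
    return p_value, p_value < alpha


print(call_site(20, 3))   # None: only 3 reads carry the second allele
print(call_site(30, 5))   # small p-value, flagged as imbalanced
print(call_site(12, 10))  # large p-value, not flagged
```

Raising `min_reads_per_allele` trades fewer predicted sites for more confidence that each one is truly heterozygous, which is the trade-off explored in the "Minimum reads/allele" rows of the second table.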
Allele-aware alignments with complete genotypes (GSNAP) vs no genotype information (BWA)
| | GSNAP: Standard | GSNAP: Complement^a | GSNAP: Difference^b | BWA: Standard | BWA: Complement^a | BWA: Difference^b |
|---|---|---|---|---|---|---|
| | 33,599,679 | 33,599,721 | 120 | 33,543,808 | 33,547,947 | 344,942 |
| | 1,295,901 | 1,295,914 | 120 | 1,197,696 | 1,186,891 | 344,942 |
| Reference allele | 675,394 | 620,517 | - | 677,697 | 640,978 | - |
| Non-reference allele | 620,507 | 675,397 | - | 519,999 | 545,913 | - |
| | 1618 | 1618 | 0 | 1593 | 1614 | 87 |
| | 200 | 200 | 0 | 151 | 147 | 56 |
| Reference allele | 108 | 92 | - | 91 | 82 | - |
| Non-reference allele | 92 | 108 | - | 60 | 65 | - |
^a Alignment reference contained the non-reference allele of heterozygous sites used to create the standard reference.
^b Differs in mapping or detection between alignments to standard and complement references.
^c Out of 10,000 peaks with strongest signal.
^d Binomial p-value < .01.
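The allele rows of the table can be checked with simple arithmetic: each pair of "Reference allele" and "Non-reference allele" counts sums to the count in the row directly above it, and their ratio gives the fraction of reads carrying the reference allele. The sketch below does this for the standard-reference read counts. The function name is hypothetical, and reading a fraction above 0.5 as a preference for the reference allele is an interpretation offered for illustration; the table itself reports only the counts.

```python
# Read counts copied from the allele rows of the table (standard reference).
counts = {
    "GSNAP": {"reference": 675_394, "non_reference": 620_507},
    "BWA": {"reference": 677_697, "non_reference": 519_999},
}


def reference_fraction(reference: int, non_reference: int) -> float:
    """Fraction of allele-informative reads that carry the reference allele."""
    return reference / (reference + non_reference)


for aligner, c in counts.items():
    total = c["reference"] + c["non_reference"]
    print(f"{aligner}: {total:,} reads, reference fraction {reference_fraction(**c):.3f}")
# GSNAP: 1,295,901 reads, reference fraction 0.521
# BWA: 1,197,696 reads, reference fraction 0.566
```

The same calculation on the complement-reference columns shows the GSNAP fractions nearly mirroring each other (620,517 of 1,295,914 reference-allele reads), while the BWA counts do not mirror.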
Allelic imbalance detection accuracy in alignments using partial or no genotypes compared to complete genotypes
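| Factor/Assay (Condition) | Complete^a, total: Nt | Complete^a, partial: Nimp | Complete^a, none: Ncom | Partial genotype^b, total: Nt | Partial genotype^b, total: Sens | Partial genotype^b, total: Prec | Partial genotype^b, known variants: Nimp | Partial genotype^b, known variants: Sens | Partial genotype^b, known variants: Prec | Partial genotype^b, predicted variants: Nt-Nimp | Partial genotype^b, predicted variants: Sens | Partial genotype^b, predicted variants: Prec | No genotype^c, total: Nt | No genotype^c, total: Sens | No genotype^c, total: Prec | No genotype^c, known variants: Ncom | No genotype^c, known variants: Sens | No genotype^c, known variants: Prec | No genotype^c, predicted variants: Nt-Ncom | No genotype^c, predicted variants: Sens | No genotype^c, predicted variants: Prec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|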
| CREB1 (50 bp) | 200 | 125 | 141 | 190 | 73.0 | 76.8 | 134 | 96.8 | 90.3 | 56 | 33.3 | 44.6 | 203 | 76.0 | 74.9 | 160 | 93.6 | 82.5 | 43 | 33.9 | 46.5 |
| CREB1 (35 bp) | 106 | 70 | 81 | 104 | 73.6 | 75.0 | 74 | 97.1 | 91.9 | 30 | 27.8 | 33.3 | 107 | 77.4 | 76.6 | 87 | 92.6 | 86.2 | 20 | 28.0 | 35.0 |
| CREB1 (20 bp) | 26 | 16 | 16 | 24 | 69.2 | 75.0 | 17 | 100.0 | 94.1 | 7 | 20.0 | 28.6 | 22 | 69.2 | 81.8 | 17 | 100.0 | 94.1 | 5 | 20.0 | 40.0 |
| CTCF (35 bp) | 267 | 187 | 192 | 300 | 83.1 | 74.0 | 198 | 98.4 | 92.9 | 102 | 47.5 | 37.3 | 298 | 85.0 | 76.2 | 210 | 97.9 | 89.5 | 88 | 52.0 | 44.3 |
| DNase (20 bp) | 104 | 43 | 47 | 138 | 51.0 | 38.4 | 42 | 97.7 | 100.0 | 96 | 18.0 | 11.5 | 144 | 51.9 | 37.5 | 55 | 97.9 | 83.6 | 89 | 14.0 | 9.0 |
| CREB1 (2 alns)^d | 200 | 125 | 141 | 195 | 78.5 | 80.5 | 135 | 97.6 | 90.4 | 60 | 46.7 | 58.3 | 204 | 77.0 | 75.5 | 156 | 92.2 | 83.3 | 48 | 40.7 | 50.0 |
| Mismatches allowed | |||||||||||||||||||||
| CREB1 (0 mm) | 199 | 122 | 138 | 137 | 58.8 | 85.4 | 137 | 95.9 | 85.4 | 0 | - | - | 160 | 63.3 | 78.8 | 160 | 91.3 | 78.8 | 0 | - | - |
| CREB1 (1 mm)^e | 200 | 125 | 141 | 190 | 73.0 | 76.8 | 134 | 96.8 | 90.3 | 56 | 33.3 | 44.6 | 203 | 76.0 | 74.9 | 160 | 93.6 | 82.5 | 43 | 33.9 | 46.5 |
| CREB1 (2 mm) | 199 | 124 | 137 | 245 | 80.4 | 65.3 | 133 | 97.6 | 91.0 | 112 | 52.0 | 34.8 | 251 | 81.4 | 64.5 | 159 | 96.4 | 83.0 | 92 | 48.4 | 32.6 |
| CREB1 (3 mm) | 213 | 123 | 143 | 301 | 79.2 | 53.2 | 132 | 98.4 | 90.9 | 169 | 50.0 | 23.7 | 313 | 81.7 | 52.7 | 161 | 96.4 | 83.9 | 152 | 47.6 | 19.7 |
| Minimum reads/allele | |||||||||||||||||||||
| CREB1 (2 reads) | 301 | 178 | 199 | 486 | 73.4 | 45.5 | 187 | 97.2 | 92.5 | 299 | 39.0 | 16.1 | 515 | 75.4 | 44.1 | 228 | 95.0 | 82.9 | 287 | 37.3 | 13.2 |
| CREB1 (3 reads) | 261 | 156 | 173 | 267 | 70.1 | 68.8 | 162 | 94.9 | 91.4 | 105 | 33.3 | 33.3 | 289 | 72.8 | 65.7 | 191 | 92.5 | 83.8 | 98 | 34.1 | 30.6 |
| CREB1 (4 reads) | 230 | 142 | 159 | 218 | 71.4 | 76.0 | 148 | 95.1 | 91.2 | 70 | 33.7 | 42.6 | 235 | 74.8 | 73.2 | 175 | 92.5 | 84.0 | 60 | 35.2 | 41.7 |
| CREB1 (5 reads)^e | 200 | 125 | 141 | 190 | 73.0 | 76.8 | 134 | 96.8 | 90.3 | 56 | 33.3 | 44.6 | 203 | 76.0 | 74.9 | 160 | 93.6 | 82.5 | 43 | 33.9 | 46.5 |
| CREB1 (6 reads) | 198 | 122 | 136 | 174 | 70.7 | 80.5 | 130 | 96.7 | 90.8 | 44 | 28.9 | 50.0 | 188 | 73.7 | 77.7 | 153 | 93.4 | 83.0 | 35 | 30.6 | 54.3 |
| CREB1 (7 reads) | 173 | 109 | 123 | 154 | 72.8 | 81.8 | 116 | 97.2 | 91.4 | 38 | 31.2 | 52.6 | 167 | 75.7 | 78.4 | 138 | 92.7 | 82.6 | 29 | 34.0 | 58.6 |
| CREB1 (8 reads) | 157 | 100 | 111 | 141 | 72.0 | 80.1 | 107 | 97.0 | 90.7 | 34 | 28.1 | 47.1 | 148 | 75.2 | 79.7 | 124 | 92.8 | 83.1 | 24 | 32.6 | 62.5 |
| CREB1 (9 reads) | 144 | 91 | 101 | 130 | 72.2 | 80.0 | 98 | 96.7 | 89.8 | 32 | 30.2 | 50.0 | 140 | 75.7 | 77.9 | 115 | 93.1 | 81.7 | 25 | 34.9 | 60.0 |
| CREB1 (10 reads) | 124 | 80 | 88 | 117 | 74.2 | 78.6 | 88 | 96.2 | 87.5 | 29 | 34.1 | 51.7 | 125 | 76.6 | 76.0 | 102 | 92.0 | 79.4 | 23 | 38.9 | 60.9 |
| CREB1 (15 reads) | 88 | 60 | 66 | 82 | 77.3 | 82.9 | 66 | 96.7 | 87.9 | 16 | 35.7 | 62.5 | 88 | 80.7 | 80.7 | 76 | 92.4 | 80.3 | 12 | 45.5 | 83.3 |
| CREB1 (20 reads) | 63 | 47 | 52 | 64 | 84.1 | 82.8 | 53 | 97.9 | 86.8 | 11 | 43.8 | 63.6 | 67 | 88.9 | 83.6 | 60 | 96.2 | 83.3 | 7 | 54.5 | 85.7 |
| Imputation Rsq threshold | |||||||||||||||||||||
| CREB1 (Rsq > .3)^e | 200 | 125 | - | 190 | 73.0 | 76.8 | 134 | 96.8 | 90.3 | 56 | 33.3 | 44.6 | - | - | - | - | - | - | - | - | - |
| CREB1 (Rsq > .4) | 200 | 122 | - | 190 | 72.5 | 76.3 | 133 | 97.5 | 89.5 | 57 | 33.3 | 45.6 | - | - | - | - | - | - | - | - | - |
| CREB1 (Rsq > .5) | 200 | 121 | - | 187 | 72.5 | 77.5 | 129 | 98.3 | 92.2 | 58 | 32.9 | 44.8 | - | - | - | - | - | - | - | - | - |
| CREB1 (Rsq > .6) | 200 | 118 | - | 186 | 72.5 | 78.0 | 124 | 98.3 | 93.5 | 62 | 35.4 | 46.8 | - | - | - | - | - | - | - | - | - |
| CREB1 (Rsq > .7) | 200 | 117 | - | 185 | 72.0 | 77.8 | 123 | 98.3 | 93.5 | 62 | 34.9 | 46.8 | - | - | - | - | - | - | - | - | - |
| CREB1 (Rsq > .8) | 200 | 104 | - | 182 | 70.5 | 77.5 | 111 | 99.0 | 92.8 | 71 | 39.6 | 53.5 | - | - | - | - | - | - | - | - | - |
| CREB1 (Rsq > .9) | 200 | 96 | - | 176 | 69.5 | 79.0 | 99 | 99.0 | 96.0 | 77 | 42.3 | 57.1 | - | - | - | - | - | - | - | - | - |
^a Complete genotype alignments use sequencing-based genotypes.
^b Partial genotype alignments use array-based genotypes and imputation.
^c No genotype alignments use common variants (MAF > .05) from 1000 Genomes EUR.
^d Imbalances called after a second alignment using refined genotypes; known variants are variants included in the first alignment.
^e Condition used by default by AA-ALIGNER.
Nt, total imbalance count; Nimp, imbalances at heterozygous sites identified by imputation; Ncom, imbalances at common variants; Sens, percent sensitivity; Prec, percent precision.
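The Sens and Prec columns can be reproduced from the counts in the table under one reading: sensitivity is the share of complete-genotype imbalances that are recovered, and precision is the share of called imbalances that also appear in the complete-genotype set. The sketch below applies that reading to the default CREB1 (50 bp) partial-genotype row. The shared-site counts (146, 121, 25) do not appear in the table; they are back-calculated here from the reported percentages and should be treated as assumptions, as should the function name.

```python
def sensitivity_precision(shared: int, n_complete: int, n_called: int):
    """Return (sensitivity %, precision %), each rounded to one decimal.

    shared     -- imbalances found in both call sets (back-calculated, assumed)
    n_complete -- imbalances called with complete genotypes (reference set)
    n_called   -- imbalances called with partial or no genotypes
    """
    sensitivity = 100 * shared / n_complete
    precision = 100 * shared / n_called
    return round(sensitivity, 1), round(precision, 1)


# CREB1 (50 bp), partial genotypes. Denominators are taken from the table.
print(sensitivity_precision(146, 200, 190))           # total:     (73.0, 76.8)
print(sensitivity_precision(121, 125, 134))           # known:     (96.8, 90.3)
print(sensitivity_precision(25, 200 - 125, 56))       # predicted: (33.3, 44.6)
```

Under this reading, the reference set for predicted variants is the complete-genotype total minus the imputed-site count (200 - 125 = 75), which matches the reported 33.3% sensitivity.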
Fig. 2. Validation of allelic imbalance detected at GWAS loci and other predicted sites. a We detected significant allelic imbalance (binomial P < 0.01) in CREB1 ChIP-seq sequence reads at variants at five disease- and trait-associated loci. b At rs2382818, sequence reads that failed to align when only single alleles were considered (top) were correctly aligned in an allele-aware alignment (bottom). The increase in aligned reads allowed for the detection of a CREB1 peak (black box) and of allelic imbalance at the variant, where more aligned reads contained the T allele than the A allele. Total sequence signal is displayed and reads are shaded based on which allele they contain. c We detected a significantly greater proportion of reads containing the C allele of rs713875 than the G allele. d EMSA using purified CREB1 and labeled probes containing each allele at nine sites of allelic imbalance to test for allelic differences in binding. Alleles colored blue are predicted to bind CREB1 more strongly than alleles colored red. Allelic differences in protein binding consistent with these predictions were observed for starred (*) variants. Only CREB1-bound probe is shown. Similar results were observed in a replicate experiment.