Min Wang, Timothy P Hancock, Amanda J Chamberlain, Christy J Vander Jagt, Jennie E Pryce, Benjamin G Cocks, Mike E Goddard, Benjamin J Hayes.
Abstract
BACKGROUND: Topological association domains (TADs) are chromosomal domains characterised by frequent internal DNA-DNA interactions. The transcription factor CTCF binds to conserved DNA sequence patterns called CTCF binding motifs to either prohibit or facilitate chromosomal interactions. TADs and CTCF binding motifs control gene expression, but they are not yet well defined in the bovine genome. In this paper, we sought to improve the annotation of bovine TADs and CTCF binding motifs, and assess whether the new annotation can reduce the search space for cis-regulatory variants.
Keywords: Allele-specific expression quantitative trait loci; Allelic-specific expression; CTCF binding motifs; Cattle; Expression quantitative trait loci; Functional annotation; Topological association domains
Year: 2018 PMID: 29793448 PMCID: PMC5968476 DOI: 10.1186/s12864-018-4800-0
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Summary statistics of TAD mapping
| Study | Cell or tissue | Reference assembly | Number of TADs | Mean TAD width (kb) | Input TADs mapped to 1 location in bovine genome (ratio) | Input TADs mapped to same bovine chromosome (ratio) | Input TADs mapped to bovine genome (ratio) |
|---|---|---|---|---|---|---|---|
| Dixon 2012 | hESC | hg18 | 3127 | 852.2 | – | – | – |
| Dixon 2012 | hESC | bostau6 | 2885 | 830.5 | 2784 (89.03%) | 2930 (93.70%) | 2956 (94.53%) |
| Dixon 2012 | IMR90 | hg18 | 2349 | 1122 | – | – | – |
| Dixon 2012 | IMR90 | bostau6 | 2236 | 1071 | 2050 (87.27%) | 2198 (93.57%) | 2261 (96.25%) |
| Dixon 2012 | mESC | mm9 | 2200 | 1093 | – | – | – |
| Dixon 2012 | mESC | bostau6 | 2173 | 1104 | 1912 (86.91%) | 2041 (92.77%) | 2127 (96.68%) |
| Dixon 2012 | cortex | mm9 | 1519 | 1542 | – | – | – |
| Dixon 2012 | cortex | bostau6 | 1597 | 1489 | 1283 (84.46%) | 1401 (92.23%) | 1502 (98.88%) |
| Rudan 2015 | liver | mm10 | 3643 | 695 | – | – | – |
| Rudan 2015 | liver | bostau6 | 3507 | 602.5 | 2388 (65.55%) | 2873 (78.86%) | 2901 (79.63%) |
| Rudan 2015 | liver | canfam3 | 3315 | 686.5 | – | – | – |
| Rudan 2015 | liver | bostau6 | 2979 | 810 | 2731 (82.38%) | 2887 (87.09%) | 2916 (87.96%) |
Shown are the source study and cell or tissue of each input TAD dataset, the input and output reference assemblies, and the number of TADs with their mean width (in kilobases) on each assembly. Also presented are the number of input TADs that did not split during mapping (1 location in the bovine genome), the number that either did not split or split only intra-chromosomally (same bovine chromosome), and the number that did not split or split intra- or inter-chromosomally (bovine genome)
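The ratios in the table are consistent with dividing each mapped count by the number of TADs in the input assembly (not by the number of TADs on bostau6). A minimal sketch of that arithmetic, using the Dixon 2012 hESC row; the function name `mapped_ratio` is illustrative and not taken from the paper:

```python
def mapped_ratio(n_mapped: int, n_input: int) -> float:
    """Percentage of input TADs that were mapped, rounded to 2 decimals."""
    return round(100 * n_mapped / n_input, 2)


# Dixon 2012, hESC: 3127 input TADs on hg18 (values from the table above).
n_input_hesc = 3127
one_location = mapped_ratio(2784, n_input_hesc)     # did not split
same_chromosome = mapped_ratio(2930, n_input_hesc)  # plus intra-chromosomal splits
bovine_genome = mapped_ratio(2956, n_input_hesc)    # plus inter-chromosomal splits
print(one_location, same_chromosome, bovine_genome)
```

The three categories are nested, so the ratios can only grow from left to right within a row.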
Fig. 1 Distribution of TAD widths. For each input dataset, the TAD widths (millions of base pairs) on the input reference genome, the intermediate reference genome (if one was used), and the bovine reference genome (putative bovine TADs at stage 1, stage 2 and final) are shown as boxplots
Summary statistics of identifying CTCF binding motifs in the bovine genome
| Mammalian CTCF ChIP-Seq sequences: quantity | Mammalian CTCF ChIP-Seq sequences: width (nt) | Number of motif profiles identified (E ≤ 10⁻⁵) | Putative bovine CTCF binding motifs (strand-specific): set | Putative bovine motifs: quantity | Putative bovine motifs: width (nt) | Proportion of bovine genome |
|---|---|---|---|---|---|---|
| 184,492 | 42–1716 | 81 | Less stringent | 3,770,311 | 7–29 | 0.61% |
| 184,492 | 42–1716 | 81 | More stringent | 78,524 | 15–97 | 0.02% |
The number of input mammalian CTCF ChIP-Seq sequences and their range of lengths are shown. Also presented is the number of CTCF binding motif clusters identified from MEME-ChIP. Last presented are FIMO results, categorised into two sets by filtering stringency. The less stringent set of putative bovine CTCF binding motifs has motif P-values no larger than 10⁻⁵. The more stringent set has motif P-values no larger than 10⁻⁸ and motif scores no smaller than 80, and contains only non-overlapping motifs. For each set, the number of CTCF binding motifs discovered in the bovine genome, their range of lengths, and the proportion of the bovine genome they make up are shown
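The two filtering rules can be sketched as follows. This is an illustration only: the `MotifHit` record and its field names are hypothetical and do not reproduce FIMO's output format, and the source does not say how overlaps were resolved. Here overlapping hits on the same chromosome and strand are assumed to be merged into one interval, which would be consistent with the wider width range reported for the more stringent set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotifHit:
    # Hypothetical record for one motif match; not FIMO's actual schema.
    chrom: str
    start: int   # inclusive
    end: int     # inclusive
    strand: str
    score: float
    p_value: float


def less_stringent(hits):
    """Keep hits with motif P-value <= 1e-5."""
    return [h for h in hits if h.p_value <= 1e-5]


def more_stringent(hits):
    """Keep hits with P-value <= 1e-8 and score >= 80, then merge overlaps.

    Merging is an assumption; the source only says the set is non-overlapping.
    Returns (chrom, strand, start, end) tuples.
    """
    kept = sorted(
        (h for h in hits if h.p_value <= 1e-8 and h.score >= 80),
        key=lambda h: (h.chrom, h.strand, h.start),
    )
    merged = []
    for h in kept:
        last = merged[-1] if merged else None
        if last and (last[0], last[1]) == (h.chrom, h.strand) and h.start <= last[3]:
            last[3] = max(last[3], h.end)  # extend the current interval
        else:
            merged.append([h.chrom, h.strand, h.start, h.end])
    return [tuple(m) for m in merged]


example_hits = [
    MotifHit("chr1", 100, 118, "+", 85.0, 1e-9),
    MotifHit("chr1", 110, 128, "+", 90.0, 5e-9),  # overlaps the hit above
    MotifHit("chr1", 500, 518, "+", 60.0, 1e-9),  # fails the score filter
    MotifHit("chr1", 900, 918, "-", 30.0, 1e-6),  # less stringent set only
    MotifHit("chr1", 950, 968, "-", 20.0, 1e-3),  # fails both filters
]
print(len(less_stringent(example_hits)), more_stringent(example_hits))
```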
Fig. 2 Enrichment of biological hallmarks in putative bovine TAD boundaries. For each hallmark biological signal (rows) and each putative bovine TAD set (columns), these frequency histograms show the number of overlapping base pairs between the signal and the TAD boundaries. The frequencies were rounded to the nearest 10 thousand, and the bin width was 1. The 10,000 random permutations are in colour and the actual number is the black vertical line. If a biological signal is enriched at TAD boundaries, the vertical line will be on the right and clearly separated from the histogram
Fig. 3 R-squared values from significant ANOVA models (P ≤ 10⁻⁸). For each tissue (X-axis) and each regulatory unit defined by TAD, CTCF gaps and TAD+CTCF gaps, these bar plots show the R-squared values of significant (P ≤ 10⁻⁸) ANOVA models (Y-axis). Note that the CTCF model involves no input TAD set. For convenient comparison with the other models, the same R-squared values from the CTCF model are plotted repeatedly within each tissue
Rank and fold change of enrichment of significant aseQTLs and eQTLs at putative bovine CTCF binding motifs
| QTL type | Cell type | P-value significance threshold | Less stringent: rank | Less stringent: fold change | More stringent: rank | More stringent: fold change |
|---|---|---|---|---|---|---|
| aseQTL | white blood | 10⁻⁵ | <0.0001 | 1.54 | <0.0001 | 4.19 |
| aseQTL | white blood | 10⁻⁶ | <0.0001 | 1.55 | <0.0001 | 4.07 |
| aseQTL | white blood | 10⁻⁷ | <0.0001 | 1.55 | <0.0001 | 4.02 |
| aseQTL | white blood | 10⁻⁸ | <0.0001 | 1.56 | <0.0001 | 3.96 |
| aseQTL | milk | 10⁻⁵ | <0.0001 | 1.49 | <0.0001 | 3.97 |
| aseQTL | milk | 10⁻⁶ | <0.0001 | 1.49 | <0.0001 | 3.71 |
| aseQTL | milk | 10⁻⁷ | <0.0001 | 1.49 | <0.0001 | 3.67 |
| aseQTL | milk | 10⁻⁸ | <0.0001 | 1.48 | <0.0001 | 3.61 |
| eQTL | white blood | 10⁻⁵ | <0.0001 | 1.38 | <0.0001 | 3.97 |
| eQTL | white blood | 10⁻⁶ | <0.0001 | 1.43 | <0.0001 | 3.56 |
| eQTL | white blood | 10⁻⁷ | <0.0001 | 1.45 | <0.0001 | 2.75 |
| eQTL | white blood | 10⁻⁸ | <0.0001 | 1.47 | <0.0001 | 2.66 |
| eQTL | milk | 10⁻⁵ | <0.0001 | 1.34 | <0.0001 | 4.48 |
| eQTL | milk | 10⁻⁶ | <0.0001 | 1.40 | <0.0001 | 4.75 |
| eQTL | milk | 10⁻⁷ | <0.0001 | 1.38 | <0.0001 | 4.05 |
| eQTL | milk | 10⁻⁸ | <0.0001 | 1.53 | 0.9978 | 3.44 |
The levels of enrichment of significant allele-specific expression quantitative trait loci (aseQTL) and expression quantitative trait loci (eQTL) from bovine white blood cells and milk cells at two sets of putative bovine CTCF binding motifs are presented. Significance is defined by a P-value less than 10⁻⁵, 10⁻⁶, 10⁻⁷ or 10⁻⁸ in aseQTL mapping and eQTL mapping, respectively. The less stringent set of putative bovine CTCF binding motifs has motif P-values no larger than 10⁻⁵. The more stringent set has motif P-values no larger than 10⁻⁸ and motif scores no smaller than 80
Fig. 4 Enrichment of significant aseQTLs and eQTLs in putative bovine CTCF binding motifs. For each significance threshold (columns) and each aseQTL/eQTL in white blood cells or milk cells (rows), these frequency histograms show the number of significant aseQTLs/eQTLs in putative bovine CTCF binding motifs (motif score ≥ 80 and motif P-values ≤ 10⁻⁸). The 10,000 random permutations are in colour and the actual number is the black vertical line. If the actual significant aseQTLs/eQTLs are enriched at putative bovine CTCF binding motifs, the vertical line will be on the right and clearly separated from the histogram
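The record does not give formulas for the rank and fold change reported for these permutation tests. One common reading, assumed here, is that the rank is the proportion of permuted counts at least as large as the observed count, and the fold change is the observed count divided by the mean permuted count. A minimal sketch under those assumptions, with made-up counts:

```python
def permutation_enrichment(observed, permuted_counts):
    """Summarise an observed overlap count against permuted counts.

    Assumed definitions (not confirmed by the source):
      rank        = fraction of permutations with count >= observed
      fold change = observed / mean permuted count
    """
    n = len(permuted_counts)
    rank = sum(1 for c in permuted_counts if c >= observed) / n
    fold_change = observed / (sum(permuted_counts) / n)
    return rank, fold_change


def format_rank(rank, n_permutations):
    """Report a rank of zero as below the resolution of the test."""
    if rank == 0:
        return f"<{1 / n_permutations:.4f}"
    return f"{rank:.4f}"


# Illustrative numbers only: 10,000 permuted counts averaging 10.5.
permuted = [10] * 5000 + [11] * 5000
rank, fold = permutation_enrichment(42, permuted)
print(format_rank(rank, len(permuted)), fold)
```

With 10,000 permutations the smallest non-zero rank is 0.0001, which is why an observed count that exceeds every permutation is reported as "<0.0001".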
eGenes are often located within the same TAD as their linearly-furthermost, and most-significant, aseQTL/eQTL
| QTL type | Cell type | Maximum number of eSNPs tested in a chromosome | Sourced TAD set | Number of significant eGenes in TAD | Furthermost and most-significant: expected | Furthermost and most-significant: observed | Chi-square test: value | Chi-square test: P-value |
|---|---|---|---|---|---|---|---|---|
| aseQTL | white blood | 1,057,269 | hg18:hESC | 13,302 | 1406.379224 | 6375 | 17,553.7238 | 0 |
| aseQTL | white blood | 1,057,269 | hg18:IMR90 | 13,790 | 1457.973951 | 7273 | 23,192.8204 | 0 |
| aseQTL | white blood | 1,057,269 | mm9:mESC | 13,641 | 1442.220643 | 7682 | 26,996.4562 | 0 |
| aseQTL | white blood | 1,057,269 | mm9:cortex | 13,150 | 1390.308735 | 8670 | 38,116.6453 | 0 |
| aseQTL | white blood | 1,057,269 | mm10:liver | 12,013 | 1270.09725 | 5101 | 11,554.8757 | 0 |
| aseQTL | white blood | 1,057,269 | canfam3:liver | 13,046 | 1379.313137 | 6086 | 16,060.8209 | 0 |
| aseQTL | milk | 1,153,815 | hg18:hESC | 9553 | 1102.23947 | 3862 | 6909.82168 | 0 |
| aseQTL | milk | 1,153,815 | hg18:IMR90 | 10,019 | 1156.007249 | 4550 | 9964.63198 | 0 |
| aseQTL | milk | 1,153,815 | mm9:mESC | 9706 | 1119.892839 | 4764 | 11,857.8462 | 0 |
| aseQTL | milk | 1,153,815 | mm9:cortex | 9317 | 1075.009436 | 5608 | 19,114.2541 | 0 |
| aseQTL | milk | 1,153,815 | mm10:liver | 8692 | 1002.895998 | 2931 | 3706.85001 | 0 |
| aseQTL | milk | 1,153,815 | canfam3:liver | 9638 | 1112.046897 | 3863 | 6805.23726 | 0 |
| eQTL | white blood | 8964 | hg18:hESC | 327 | 0.2931228 | 114 | 44,108.66 | 0 |
| eQTL | white blood | 8964 | hg18:IMR90 | 340 | 0.304776 | 125 | 51,017.4649 | 0 |
| eQTL | white blood | 8964 | mm9:mESC | 322 | 0.2886408 | 125 | 53,883.3149 | 0 |
| eQTL | white blood | 8964 | mm9:cortex | 305 | 0.273402 | 123 | 55,090.3719 | 0 |
| eQTL | white blood | 8964 | mm10:liver | 284 | 0.2545776 | 91 | 32,346.6467 | 0 |
| eQTL | white blood | 8964 | canfam3:liver | 330 | 0.295812 | 119 | 47,633.917 | 0 |
| eQTL | milk | 2463 | hg18:hESC | 343 | 0.0844809 | 24 | 6770.19366 | 0 |
| eQTL | milk | 2463 | hg18:IMR90 | 348 | 0.0857124 | 25 | 7241.9128 | 0 |
| eQTL | milk | 2463 | mm9:mESC | 356 | 0.0876828 | 34 | 13,115.9732 | 0 |
| eQTL | milk | 2463 | mm9:cortex | 323 | 0.0795549 | 36 | 16,218.7166 | 0 |
| eQTL | milk | 2463 | mm10:liver | 340 | 0.083742 | 23 | 6271.10507 | 0 |
| eQTL | milk | 2463 | canfam3:liver | 362 | 0.0891606 | 22 | 5384.49588 | 0 |
The linearly-furthermost, and most-significant, aseQTL and eQTL for each eGene all have a P-value less than 10⁻⁸ in aseQTL/eQTL mapping
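The tabulated chi-square values are numerically consistent with the single-cell statistic (observed − expected)² / expected. The sketch below reproduces two rows from the expected and observed counts. The P-value helper assumes one degree of freedom, which the source does not state; under that assumption the upper tail underflows to zero in double precision for statistics this large, which would match the reported P-values of 0.

```python
import math


def chi_square_value(expected: float, observed: float) -> float:
    """Single-cell chi-square statistic: (O - E)^2 / E."""
    return (observed - expected) ** 2 / expected


def chi_square_p_value(value: float) -> float:
    """Upper-tail probability for a chi-square statistic.

    Assumes 1 degree of freedom (an assumption, not stated in the source),
    for which P(X > value) = erfc(sqrt(value / 2)).
    """
    return math.erfc(math.sqrt(value / 2))


# aseQTL, white blood, hg18:hESC (expected and observed from the table).
ase_value = chi_square_value(1406.379224, 6375)
# eQTL, white blood, hg18:hESC.
e_value = chi_square_value(0.2931228, 114)
print(round(ase_value, 4), round(e_value, 2), chi_square_p_value(ase_value))
```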