Michelle M Halstead, Colin Kern, Perot Saelao, Ying Wang, Ganrea Chanthavixay, Juan F Medrano, Alison L Van Eenennaam, Ian Korf, Christopher K Tuggle, Catherine W Ernst, Huaijun Zhou, Pablo J Ross.
Abstract
BACKGROUND: Although considerable progress has been made towards annotating the noncoding portion of the human and mouse genomes, regulatory elements in other species, such as livestock, remain poorly characterized. This lack of functional annotation poses a substantial roadblock to agricultural research and diminishes the value of these species as model organisms. As active regulatory elements are typically characterized by chromatin accessibility, we implemented the Assay for Transposase Accessible Chromatin (ATAC-seq) to annotate and characterize regulatory elements in pigs and cattle across a set of eight adult tissues.
Keywords: Cattle; Chromatin accessibility; Comparative epigenomics; Functional annotation; Pig
Year: 2020 PMID: 33028202 PMCID: PMC7541309 DOI: 10.1186/s12864-020-07078-9
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Experimental design and ATAC-seq quality metrics. a Overview of tissues collected from adult male cattle and pigs for ATAC-seq, wherein the Tn5 transposase preferentially cuts DNA at accessible sites and simultaneously inserts sequencing adapters. b Scatterplots showing Pearson correlation of normalized genome-wide ATAC-seq signal (reads per million; RPM) between biological replicates for brain cortex and lung. c Heatmaps depicting normalized ATAC-seq signal at all TSS, sorted by signal intensity. Signal shown for brain cortex and lung. d Normalized ATAC-seq signal in cattle tissues at the GAPDH locus (a housekeeping gene), as well as several genes with tissue-specific activity. Pink bars indicate that signal exceeded the viewing range (up to 20 RPM). e Normalized ATAC-seq signal in pig tissues at the same genes. f PCA of normalized ATAC-seq signal in consensus open chromatin identified in cattle. g PCA of normalized ATAC-seq signal in consensus open chromatin identified in pig
ATAC-seq data preprocessing. Per library, total raw reads, mapped reads (excluding mitochondrial DNA), duplicate reads, informative reads (monoclonal and uniquely mapping), and percent of raw reads that were informative
| Species | Tissue | Replicate | Raw reads | Mapped reads | Duplicate reads | Informative reads | Informative (% of raw) |
|---|---|---|---|---|---|---|---|
| Cattle | Adipose | A | 152,184,070 | 145,648,161 | 82,972,324 | 37,815,392 | 24.85 |
| | Adipose | B | 126,920,776 | 123,879,749 | 25,466,938 | 72,623,985 | 57.22 |
| | Cerebellum | B | 158,964,554 | 134,391,900 | 53,303,886 | 49,305,250 | 31.02 |
| | Brain Cortex | A | 216,202,716 | 189,763,599 | 122,567,574 | 30,997,045 | 14.34 |
| | Brain Cortex | B | 205,363,760 | 174,961,995 | 96,411,128 | 41,021,487 | 19.98 |
| | Hypothalamus | A | 146,439,048 | 102,771,471 | 50,044,176 | 35,090,140 | 23.96 |
| | Hypothalamus | B | 68,153,356 | 63,771,723 | 31,495,325 | 15,328,535 | 22.49 |
| | Liver | A | 175,124,072 | 165,112,503 | 95,001,248 | 40,031,560 | 22.86 |
| | Liver | B | 194,367,992 | 179,265,341 | 99,942,567 | 36,472,289 | 18.76 |
| | Lung | A | 163,469,102 | 159,710,295 | 27,682,063 | 99,691,291 | 60.98 |
| | Lung | B | 190,030,170 | 184,571,092 | 36,657,121 | 110,584,502 | 58.19 |
| | Muscle | A | 89,174,618 | 86,045,857 | 15,262,767 | 55,794,693 | 62.57 |
| | Muscle | B | 97,247,280 | 69,592,640 | 10,539,048 | 45,909,945 | 47.21 |
| | Spleen | A | 261,785,316 | 250,911,567 | 163,860,741 | 37,521,241 | 14.33 |
| | Spleen | B | 179,854,284 | 162,449,025 | 51,533,284 | 69,626,100 | 38.71 |
| Pig | Adipose | A | 93,118,998 | 87,025,520 | 15,814,107 | 57,651,677 | 61.91 |
| | Adipose | B | 71,639,956 | 66,597,412 | 12,368,404 | 42,960,583 | 59.97 |
| | Cerebellum | A | 186,388,542 | 175,280,070 | 90,013,906 | 63,538,783 | 34.09 |
| | Cerebellum | B | 125,817,350 | 116,536,743 | 41,557,703 | 57,997,395 | 46.10 |
| | Brain Cortex | A | 101,924,240 | 98,384,411 | 36,110,627 | 53,096,952 | 52.09 |
| | Brain Cortex | B | 160,871,726 | 155,783,257 | 77,377,043 | 66,403,810 | 41.28 |
| | Hypothalamus | A | 112,463,726 | 106,966,835 | 52,803,730 | 39,919,895 | 35.50 |
| | Hypothalamus | B | 170,163,006 | 162,907,052 | 92,303,710 | 56,125,160 | 32.98 |
| | Liver | A | 171,864,386 | 167,321,556 | 113,588,699 | 42,376,220 | 24.66 |
| | Liver | B | 169,952,062 | 164,205,920 | 85,279,200 | 64,117,458 | 37.73 |
| | Lung | A | 108,086,464 | 104,424,556 | 19,658,156 | 73,018,749 | 67.56 |
| | Lung | B | 110,690,180 | 106,275,981 | 17,791,030 | 74,869,852 | 67.64 |
| | Muscle | A | 168,211,474 | 165,225,305 | 41,747,860 | 113,058,649 | 67.21 |
| | Muscle | B | 141,069,686 | 138,785,466 | 39,577,315 | 91,780,486 | 65.06 |
| | Spleen | A | 91,549,652 | 88,698,199 | 13,254,769 | 64,352,893 | 70.29 |
| | Spleen | B | 93,747,292 | 90,558,895 | 12,817,045 | 67,241,934 | 71.73 |
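
The last column of the preprocessing table can be recomputed from the counts, since the caption defines it as the percent of raw reads that were informative. The following is a minimal Python sketch of that arithmetic using the cattle adipose replicate A row; the function name `percent_informative` is illustrative and does not come from the original study.

```python
def percent_informative(raw_reads: int, informative_reads: int) -> float:
    """Return informative reads as a percentage of total raw reads."""
    if raw_reads <= 0:
        raise ValueError("raw_reads must be positive")
    return 100.0 * informative_reads / raw_reads


# Cattle adipose, replicate A (counts taken from the table above).
pct = percent_informative(raw_reads=152_184_070, informative_reads=37_815_392)
print(f"{pct:.2f}")  # prints 24.85
```

Libraries with a low percentage in this column, such as cattle spleen replicate A, lose most of their raw reads to duplication or filtering before peak calling.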
Quality metrics of ATAC-seq libraries. Non-redundant read fraction (NRF) measures library complexity, Fraction of reads in peaks (FRiP) measures signal-to-noise ratio, and synthetic Jensen-Shannon distance (sJSD) measures divergence between ATAC-seq signal and a uniform distribution
| Species | Tissue | Replicate | NRF | FRiP (%) | sJSD |
|---|---|---|---|---|---|
| Cattle | Adipose | A | 0.43 | 9.82 | 0.34 |
| | Adipose | B | 0.79 | 15.04 | 0.34 |
| | Cerebellum | B | 0.60 | 42.53 | 0.48 |
| | Brain Cortex | A | 0.35 | 30.16 | 0.48 |
| | Brain Cortex | B | 0.45 | 40.80 | 0.49 |
| | Hypothalamus | A | 0.51 | 43.67 | 0.50 |
| | Hypothalamus | B | 0.51 | 11.97 | 0.37 |
| | Liver | A | 0.42 | 37.99 | 0.50 |
| | Liver | B | 0.44 | 28.17 | 0.44 |
| | Lung | A | 0.83 | 37.63 | 0.45 |
| | Lung | B | 0.80 | 44.26 | 0.48 |
| | Muscle | A | 0.82 | 40.77 | 0.50 |
| | Muscle | B | 0.85 | 41.77 | 0.51 |
| | Spleen | A | 0.35 | 21.29 | 0.42 |
| | Spleen | B | 0.68 | 35.24 | 0.45 |
| Pig | Adipose | A | 0.82 | 5.21 | 0.40 |
| | Adipose | B | 0.81 | 4.12 | 0.40 |
| | Cerebellum | A | 0.49 | 43.05 | 0.46 |
| | Cerebellum | B | 0.64 | 41.49 | 0.47 |
| | Brain Cortex | A | 0.63 | 40.65 | 0.48 |
| | Brain Cortex | B | 0.50 | 44.53 | 0.46 |
| | Hypothalamus | A | 0.51 | 36.85 | 0.49 |
| | Hypothalamus | B | 0.43 | 37.76 | 0.44 |
| | Liver | A | 0.32 | 55.38 | 0.52 |
| | Liver | B | 0.48 | 49.25 | 0.49 |
| | Lung | A | 0.81 | 31.21 | 0.43 |
| | Lung | B | 0.83 | 36.87 | 0.46 |
| | Muscle | A | 0.75 | 58.16 | 0.62 |
| | Muscle | B | 0.71 | 61.94 | 0.65 |
| | Spleen | A | 0.85 | 29.41 | 0.43 |
| | Spleen | B | 0.86 | 22.70 | 0.38 |
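
To make the three quality metrics concrete, the sketch below computes simplified versions of each on toy data. It is an illustration under stated assumptions, not the pipeline used in the study: reads and peaks are represented as hypothetical `(chromosome, start, end)` tuples with half-open coordinates, NRF is taken as distinct read positions over total reads, FRiP as the fraction of reads overlapping any peak, and the divergence is a plain Jensen-Shannon distance between binned signal and a uniform distribution. The published sJSD values were presumably produced by a dedicated tool whose exact procedure may differ.

```python
import math
from typing import Sequence, Tuple

Interval = Tuple[str, int, int]  # hypothetical (chromosome, start, end), half-open


def nrf(reads: Sequence[Interval]) -> float:
    """Non-redundant fraction: distinct read positions / total reads."""
    return len(set(reads)) / len(reads)


def frip(reads: Sequence[Interval], peaks: Sequence[Interval]) -> float:
    """Fraction of reads that overlap at least one peak by >= 1 bp."""
    def in_peak(read: Interval) -> bool:
        chrom, start, end = read
        return any(c == chrom and start < pe and ps < end for c, ps, pe in peaks)
    return sum(in_peak(r) for r in reads) / len(reads)


def js_distance_to_uniform(signal: Sequence[float]) -> float:
    """Jensen-Shannon distance (log base 2) between binned signal and uniform."""
    total = sum(signal)
    p = [s / total for s in signal]
    q = [1.0 / len(signal)] * len(signal)
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return math.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))


# Toy data: four reads (one exact duplicate) and a single peak.
reads = [("chr1", 100, 150), ("chr1", 100, 150), ("chr1", 300, 350), ("chr2", 10, 60)]
peaks = [("chr1", 90, 200)]
print(nrf(reads), frip(reads, peaks), js_distance_to_uniform([1, 1, 1, 1]))
```

In this simplified form, a perfectly flat signal gives a distance of 0, and the value grows as signal concentrates in fewer bins, which is why higher values in the table suggest stronger enrichment over background.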
Replicability of ATAC-seq peaks. Overlap of peak sets derived from biological replicates
| Species | Tissue | Peaks in replicate A | Peaks in replicate B | Replicate A peaks overlapping B | Replicate A overlap (%) | Replicate B peaks overlapping A | Replicate B overlap (%) |
|---|---|---|---|---|---|---|---|
| Cattle | Adipose | 59,612 | 133,768 | 38,647 | 64.8 | 38,471 | 28.8 |
| | Brain Cortex | 109,395 | 160,546 | 75,462 | 69.0 | 75,206 | 46.8 |
| | Hypothalamus | 59,966 | 37,036 | 19,970 | 33.3 | 19,999 | 54.0 |
| | Liver | 102,704 | 114,583 | 58,444 | 56.9 | 58,563 | 51.1 |
| | Lung | 221,576 | 248,844 | 167,311 | 75.5 | 166,777 | 67.0 |
| | Muscle | 107,208 | 113,502 | 76,780 | 71.6 | 76,801 | 67.7 |
| | Spleen | 110,852 | 200,323 | 79,355 | 71.6 | 79,007 | 39.4 |
| Pig | Adipose | 9,192 | 7,645 | 4,778 | 52.0 | 4,778 | 62.5 |
| | Cerebellum | 220,327 | 214,455 | 132,123 | 60.0 | 132,292 | 61.7 |
| | Brain Cortex | 142,658 | 156,069 | 103,581 | 72.6 | 103,382 | 66.2 |
| | Hypothalamus | 103,402 | 144,418 | 63,382 | 61.3 | 63,368 | 43.9 |
| | Liver | 112,202 | 136,085 | 78,395 | 69.9 | 78,274 | 57.5 |
| | Lung | 167,298 | 191,926 | 114,470 | 68.4 | 113,864 | 59.3 |
| | Muscle | 137,000 | 105,823 | 92,039 | 67.2 | 92,705 | 87.6 |
| | Spleen | 100,982 | 100,867 | 64,192 | 63.6 | 64,208 | 63.7 |
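
The overlap counts in the replicability table are not symmetric (for cattle adipose, 38,647 peaks from one replicate overlap the other, versus 38,471 in the reverse direction) because a single peak in one replicate can overlap several peaks in the other. A minimal Python sketch of this kind of comparison follows; the `(chromosome, start, end)` peak representation, the 1 bp overlap criterion, and the function name `count_overlapping` are assumptions for illustration, not the method documented in the source.

```python
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]  # hypothetical (chromosome, start, end), half-open


def count_overlapping(query: List[Peak], target: List[Peak]) -> int:
    """Count query peaks that share at least 1 bp with any target peak."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in target:
        by_chrom.setdefault(chrom, []).append((start, end))
    return sum(
        any(start < t_end and t_start < end for t_start, t_end in by_chrom.get(chrom, []))
        for chrom, start, end in query
    )


# Toy replicates: two narrow peaks in A fall under one broad peak in B.
rep_a = [("chr1", 100, 200), ("chr1", 220, 300), ("chr1", 500, 600)]
rep_b = [("chr1", 150, 250), ("chr2", 100, 200)]

a_in_b = count_overlapping(rep_a, rep_b)  # 2 of 3 replicate A peaks
b_in_a = count_overlapping(rep_b, rep_a)  # 1 of 2 replicate B peaks
print(f"{100 * a_in_b / len(rep_a):.1f}% of A, {100 * b_in_a / len(rep_b):.1f}% of B")
```

The linear scan per query peak is fine for a toy example; real peak sets of this size would normally be compared with a sorted sweep or an interval-tree based tool.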
Regions identified as ATAC-seq peaks in both biological replicates of pig, cattle, and mouse tissues. To obtain a single comprehensive set of “consensus” ATAC-seq peaks for each species, regions that were identified as peaks in both biological replicates for each tissue were collapsed, such that any overlapping peaks were merged into a single unique interval
| Tissue | Pig peaks | Pig mean peak width (bp) | Cattle peaks | Cattle mean peak width (bp) | Mouse peaks | Mouse mean peak width (bp) |
|---|---|---|---|---|---|---|
| Adipose | 4,785 | 373.583 | 38,745 | 501.575 | 64,008 | 633.770 |
| Cerebellum | 134,086 | 555.921 | 93,927 | 535.284 | 57,771 | 618.660 |
| Brain Cortex | 104,272 | 515.209 | 75,762 | 465.855 | 149,223 | 604.979 |
| Hypothalamus | 63,736 | 514.036 | 20,045 | 415.089 | – | – |
| Liver | 78,841 | 498.528 | 58,853 | 482.561 | 90,228 | 679.330 |
| Lung | 115,491 | 569.537 | 169,734 | 619.450 | 76,168 | 652.975 |
| Muscle | 93,550 | 698.708 | 77,378 | 566.262 | 22,003 | 514.762 |
| Spleen | 64,667 | 560.972 | 79,960 | 542.832 | 35,278 | 601.650 |
| Consensus (collapsed set) | 306,304 | 623.608 | 273,594 | 616.064 | 254,076 | 668.062 |
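
The collapsing step described in the caption, where overlapping peaks from different tissues are merged into single unique intervals, is a standard interval-merge operation. The sketch below shows one way it could be done in Python; it assumes half-open `(chromosome, start, end)` peaks and treats only peaks sharing at least 1 bp as overlapping. The function name `merge_peaks` and the toy peak sets are hypothetical, and the tool actually used by the authors is not stated in this section.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Peak = Tuple[str, int, int]  # hypothetical (chromosome, start, end), half-open


def merge_peaks(peak_sets: Iterable[List[Peak]]) -> List[Peak]:
    """Merge overlapping peaks from several tissues into unique intervals."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for peaks in peak_sets:
        for chrom, start, end in peaks:
            by_chrom[chrom].append((start, end))

    merged: List[Peak] = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start < cur_end:  # shares at least 1 bp with the current interval
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


# Toy per-tissue replicated peak sets.
liver = [("chr1", 100, 200), ("chr1", 400, 500)]
lung = [("chr1", 150, 250), ("chr2", 10, 20)]
consensus = merge_peaks([liver, lung])
print(consensus)
```

Because merged intervals span the union of their members, a collapsed set tends to have wider peaks on average than the individual tissue sets, which is consistent with the larger widths in the consensus row.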
Fig. 2 Open chromatin localization and differential accessibility. a Distribution of cattle and pig consensus open chromatin relative to genomic features. Because peaks often span multiple features, peaks were categorized based on 1 bp overlap with features in the following order: first as TSS (±50 bp), then as promoter (2 kb upstream of TSS), transcription termination site (TTS) (±50 bp), 5′ untranslated region (UTR), 3′ UTR, coding sequence (CDS), intronic, and if no features were overlapped, peaks were considered intergenic. b Distribution of consensus peak activity, ranging from tissue-specific (accessible in only one tissue) to ubiquitous (accessible in all sampled tissues). Consensus peaks that were accessible in a single tissue were further broken down by tissue. c Distribution of consensus peak activity for regions containing CTCF motifs
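
The priority order described for panel a amounts to a first-match rule: each peak is assigned the highest-ranked feature it overlaps, and falls back to intergenic when it overlaps none. A minimal Python sketch of that rule is shown here. It assumes the set of features overlapped by a peak (by at least 1 bp) has already been determined elsewhere; the names `FEATURE_PRIORITY` and `categorize_peak` are illustrative only.

```python
from typing import List, Set

# Priority order as given in the figure caption, highest first.
FEATURE_PRIORITY: List[str] = [
    "TSS",        # within 50 bp of a transcription start site
    "promoter",   # 2 kb upstream of the TSS
    "TTS",        # within 50 bp of a transcription termination site
    "5' UTR",
    "3' UTR",
    "CDS",
    "intronic",
]


def categorize_peak(overlapped_features: Set[str]) -> str:
    """Return the highest-priority feature a peak overlaps, else 'intergenic'."""
    for feature in FEATURE_PRIORITY:
        if feature in overlapped_features:
            return feature
    return "intergenic"


# A peak spanning an intron and a promoter is counted once, as a promoter peak.
print(categorize_peak({"intronic", "promoter"}))  # prints promoter
print(categorize_peak(set()))                     # prints intergenic
```

Assigning each peak to exactly one category keeps the genomic distribution in panel a summing to the total number of consensus peaks.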
Motif enrichment in consensus open chromatin. Top ten enriched known binding motifs identified from the merged set of consensus peaks in each species
| Cattle motif | Cattle P-value | Cattle peaks with motif (%) | Pig motif | Pig P-value | Pig peaks with motif (%) |
|---|---|---|---|---|---|
| CTCF (Zf) | 1e-2891 | 8.66 | CTCF (Zf) | 1e-2587 | 8.16 |
| BORIS (Zf) | 1e-1695 | 11.56 | BORIS (Zf) | 1e-1580 | 10.90 |
| NF1 (CTF) | 1e-505 | 16.51 | Jun-AP1 (bZIP) | 1e-784 | 8.25 |
| Sp1 (Zf) | 1e-389 | 8.60 | Fosl2 (bZIP) | 1e-769 | 11.25 |
| CEBP (bZIP) | 1e-334 | 14.00 | Fra1 (bZIP) | 1e-718 | 16.30 |
| ETS (ETS) | 1e-278 | 9.70 | Sp1 (Zf) | 1e-639 | 8.16 |
| NRF1 (NRF) | 1e-276 | 3.09 | BATF (bZIP) | 1e-624 | 18.32 |
| RFX (HTH) | 1e-272 | 2.75 | Mef2d (MADS) | 1e-583 | 6.00 |
| Rfx2 (HTH) | 1e-264 | 3.07 | Atf3 (bZIP) | 1e-576 | 19.02 |
| Mef2d (MADS) | 1e-257 | 4.82 | Mef2c (MADS) | 1e-502 | 11.63 |
Fig. 3 Characterization of tissue-specific consensus open chromatin. a Number of consensus peaks demonstrating tissue-specific accessibility in each tissue and species. b Normalized ATAC-seq signal (RPM) at regions demonstrating tissue-specific accessibility in cattle and pig tissues. Tissue-specific peaks were first grouped by corresponding tissue, then ordered by signal intensity. Peaks were scaled to 500 bp, and signal is shown 500 bp upstream and downstream. c Distribution of tissue-specific open chromatin relative to gene annotations. d Enrichment of known TF binding motifs in tissue-specific open chromatin. Motifs sorted by TF family. Sets of tissue-specific open chromatin for each species are grouped by tissue. Increasing size and color intensity indicate increasing enrichment for a given motif
Fig. 4 Conservation of regulatory element accessibility in cattle, pig, and mouse. a Summary of pairwise open chromatin conservation. Circles reflect the number of consensus ATAC-seq peaks in each species. Arrow width reflects the proportion of consensus peaks that could be projected to the other species. Lighter sections of arrows reflect the proportion of mappable regions that demonstrated conserved accessibility in at least one tissue in both species. b Of regions that could be projected to all three species, the number of regions with conserved accessibility in all three species and in ungulates, laid over a phylogenetic tree reflecting evolutionary distance in millions of years (MYA). c Genomic distribution of regions with conserved accessibility in all three species, relative to the mouse gene annotation. Brief summary of enriched gene ontology (GO) terms for genes marked by conserved open chromatin at their TSS in every tissue. d Consensus peaks, consensus peaks with conserved sequence (that could be mapped to all three species), and consensus peaks with conserved sequence and conserved accessibility in all three species at the MEF2A locus. Tracks show normalized ATAC-seq signal. Conserved promoter open chromatin highlighted in green. Conserved distal (putative enhancer) open chromatin highlighted in yellow