Mazdak Salavati, Alex Caulton, Richard Clark, Iveta Gazova, Timothy P L Smith, Kim C Worley, Noelle E Cockett, Alan L Archibald, Shannon M Clarke, Brenda M Murdoch, Emily L Clark.
Abstract
The overall aim of the Ovine FAANG project is to provide a comprehensive annotation of the new highly contiguous sheep reference genome sequence (Oar rambouillet v1.0). Mapping of transcription start sites (TSS) is a key first step in understanding transcript regulation and diversity. Using 56 tissue samples collected from the reference ewe Benz2616, we have performed a global analysis of TSS and TSS-Enhancer clusters using Cap Analysis Gene Expression (CAGE) sequencing. CAGE measures RNA expression by 5' cap-trapping and has been specifically designed to allow the characterization of TSS within promoters to single-nucleotide resolution. We have adapted an analysis pipeline that uses TagDust2 for clean-up and trimming, Bowtie2 for mapping, CAGEfightR for clustering, and the Integrative Genomics Viewer (IGV) for visualization. Mapping of CAGE tags indicated that the expression levels of CAGE tag clusters varied across tissues. Expression profiles across tissues were validated using corresponding polyA+ mRNA-Seq data from the same samples. After removal of CAGE tags with <10 read counts, 39.3% of TSS overlapped with 5' ends of 31,113 transcripts that had been previously annotated by NCBI (out of a total of 56,308 from the NCBI annotation). For 25,195 of the transcripts, previously annotated by NCBI, no TSS meeting stringent criteria were identified. A further 14.7% of TSS mapped to within 50 bp of annotated promoter regions. Intersecting these predicted TSS regions with annotated promoter regions (±50 bp) revealed 46% of the predicted TSS were "novel" and previously un-annotated. Using whole-genome bisulfite sequencing data from the same tissues, we were able to determine that a proportion of these "novel" TSS were hypo-methylated (32.2%) indicating that they are likely to be reproducible rather than "noise". This global analysis of TSS in sheep will significantly enhance the annotation of gene models in the new ovine reference assembly. 
Our analyses provide one of the highest resolution annotations of transcript regulation and diversity in a livestock species to date.
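The counts and percentages quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check: the variable names are illustrative and not from the paper, and it assumes the three TSS categories (overlapping an annotated 5' end, within 50 bp of an annotated promoter, and "novel") are mutually exclusive and together account for all TSS.

```python
# Figures quoted in the abstract (variable names are illustrative).
total_ncbi_transcripts = 56_308
transcripts_with_tss = 31_113
pct_tss_at_5prime_end = 39.3   # % of TSS overlapping annotated 5' ends
pct_tss_within_50bp = 14.7     # % of TSS within 50 bp of annotated promoters

# Transcripts for which no TSS met the stringent criteria.
transcripts_without_tss = total_ncbi_transcripts - transcripts_with_tss

# Assumption: the remaining TSS are the "novel" ones.
pct_annotated = round(pct_tss_at_5prime_end + pct_tss_within_50bp, 1)
pct_novel = round(100.0 - pct_annotated, 1)

print(transcripts_without_tss)  # 25195
print(pct_novel)                # 46.0
```

Both results agree with the values stated in the abstract (25,195 transcripts without a supporting TSS and 46% "novel" TSS).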
Keywords: CAGE; FAANG; TSS; WGBS; enhancer; ovine; promoter; transcriptome
Year: 2020 PMID: 33193703 PMCID: PMC7645153 DOI: 10.3389/fgene.2020.580580
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
FIGURE 1. FAANG assays (CAGE, WGBS, and mRNA-Seq) performed on each tissue from Benz2616.
FIGURE 2. Workflow of the analysis pipeline and respective tools used for CAGE sequence data analysis.
FIGURE 3. Schematic representation of the two clustering algorithms used in the CAGEfightR package for TSS (uni-directional) and TSS-Enhancer (bi-directional) clustering.
FIGURE 4. The genomic region distribution of CAGE tag clusters mapped against the Oar rambouillet v1.0 assembly and gene annotation. The counts were averaged across tissues. (A) Uni-directional TSS clusters, with the highest proportion in the promoter region (±100 bp of the 5′UTR beginning at the [TSS]). (B) Bi-directional TSS-Enhancer clusters, with the highest proportion in the proximal region (1,000 bp upstream of the 5′UTR beginning at the [TSS]).
FIGURE 5. Chord diagram of expression level (TPM) of CAGE tag clusters (uni-directional TSS) across all the tissues collected from Benz2616. Shared CAGE tag clusters are common to at least two-thirds of the tissues (37/56).
FIGURE 6. Chord diagram of expression level (TPM) of CAGE tag clusters (bi-directional TSS-Enhancer) across all the tissues collected from Benz2616. Shared CAGE tag clusters are those expressed (>10 CTPM) in at least two-thirds of the tissues (37/56).
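The "shared cluster" criterion in the two chord-diagram captions (expressed above a threshold in at least 37 of 56 tissues) can be illustrated with a small filter. This is a minimal sketch, not the authors' implementation: the function name, the dictionary layout, and the toy expression values are hypothetical, and it assumes the >10 threshold is applied per tissue.

```python
def shared_clusters(expression, min_tpm=10.0, min_tissues=37):
    """Return IDs of clusters expressed above min_tpm in at least min_tissues tissues.

    expression maps cluster ID -> {tissue name: expression value}.
    """
    shared = []
    for cluster_id, tpm_by_tissue in expression.items():
        # Count the tissues in which this cluster passes the threshold.
        n_expressed = sum(1 for tpm in tpm_by_tissue.values() if tpm > min_tpm)
        if n_expressed >= min_tissues:
            shared.append(cluster_id)
    return shared


# Toy data: 56 hypothetical tissues, two hypothetical clusters.
tissues = [f"tissue_{i:02d}" for i in range(56)]
expression = {
    # Expressed in 40 tissues: passes the 37/56 cut-off.
    "cluster_A": {t: (12.0 if i < 40 else 0.0) for i, t in enumerate(tissues)},
    # Expressed in 20 tissues: does not pass.
    "cluster_B": {t: (12.0 if i < 20 else 0.0) for i, t in enumerate(tissues)},
}

print(shared_clusters(expression))  # ['cluster_A']
```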
The total number and percentage of “novel” CAGE tag clusters for each tissue within 50 bp (short range) and 400 bp (long range) from the promoter.
| Tissue | % Novel | Clusters within 50 bp | Clusters within 400 bp | Total |
| --- | --- | --- | --- | --- |
| Abomasum | 49.38 | 8,161 | 8,688 | 16,584 |
| Abomasum pylorus | 49.89 | 12,339 | 13,074 | 25,963 |
| Adipose subcutaneous | 51.74 | 12,336 | 13,074 | 26,970 |
| Adrenal cortex | 51.79 | 12,285 | 13,019 | 26,859 |
| Adrenal medulla | 52.86 | 11,520 | 12,210 | 25,604 |
| Alveolar macrophages | 51.19 | 12,008 | 12,731 | 25,845 |
| Caudal vena cava | 37.75 | 10,937 | 11,578 | 18,179 |
| Cecum | 51.68 | 12,070 | 12,801 | 26,261 |
| Cerebellum | 48.59 | 8,393 | 8,936 | 16,796 |
| Cerebral cortex | 51.19 | 12,199 | 12,917 | 26,327 |
| Descending colon | 51.80 | 11,830 | 12,539 | 25,810 |
| Diaphragm | 52.53 | 10,367 | 11,016 | 22,733 |
| Duodenum | 52.34 | 11,243 | 11,932 | 24,620 |
| Esophagus | 49.90 | 10,016 | 10,625 | 20,741 |
| Gall bladder | 47.78 | 11,870 | 12,578 | 23,852 |
| Heart atrioventricular valve left | 50.90 | 12,268 | 13,000 | 26,330 |
| Heart right atrium | 52.96 | 10,996 | 11,666 | 24,444 |
| Heart right ventricle | 50.47 | 12,260 | 12,987 | 26,082 |
| Hippocampus | 53.40 | 12,142 | 12,878 | 27,451 |
| Hypothalamus | 52.70 | 11,105 | 11,786 | 24,527 |
| Ileum | 52.45 | 12,352 | 13,094 | 27,411 |
| Jejunum | 31.67 | 10,810 | 11,418 | 16,361 |
| Kidney cortex | 52.04 | 12,317 | 13,057 | 27,076 |
| Kidney medulla | 51.07 | 10,946 | 11,618 | 23,365 |
| Liver | 49.35 | 12,255 | 12,981 | 25,459 |
| Lung | 52.91 | 11,644 | 12,339 | 25,995 |
| Lymph node mesenteric | 46.34 | 12,132 | 12,838 | 23,742 |
| Lymph node prescapular | 53.56 | 11,533 | 12,228 | 26,096 |
| Mammary gland | 49.75 | 10,048 | 10,688 | 20,774 |
| Omasum | 39.89 | 9,167 | 9,708 | 15,692 |
| Ovary | 50.79 | 12,334 | 13,073 | 26,434 |
| Oviduct | 53.29 | 11,563 | 12,260 | 25,957 |
| Parathyroid gland | 53.50 | 11,577 | 12,272 | 26,231 |
| Peyer’s patch | 52.41 | 11,881 | 12,578 | 26,240 |
| Pituitary gland | 47.25 | 6,918 | 7,362 | 13,400 |
| Pons | 40.69 | 11,622 | 12,296 | 20,506 |
| Rectum | 53.55 | 12,002 | 12,723 | 27,192 |
| Reticulum | 53.39 | 12,185 | 12,911 | 27,589 |
| Retina | 53.54 | 11,805 | 12,537 | 26,691 |
| Rumen atrium | 50.69 | 12,335 | 13,077 | 26,363 |
| Rumen ventral | 40.20 | 7,109 | 7,567 | 12,165 |
| Skeletal muscle biceps femoris | 50.23 | 12,151 | 12,872 | 25,715 |
| Skeletal muscle longissimus dorsi | 53.67 | 11,356 | 12,060 | 25,748 |
| Skeletal muscle semimembranosus | 51.15 | 12,262 | 12,993 | 26,471 |
| Skin non-haired | 52.40 | 11,907 | 12,629 | 26,337 |
| Spinal cord cervical | 51.47 | 11,376 | 12,050 | 24,508 |
| Spiral colon | 53.25 | 11,937 | 12,662 | 26,813 |
| Spleen | 53.46 | 12,161 | 12,892 | 27,568 |
| Thalamus | 41.61 | 11,426 | 12,079 | 20,404 |
| Thyroid gland | 53.60 | 11,894 | 12,615 | 27,012 |
| Tongue | 39.57 | 9,639 | 10,244 | 16,512 |
| Tonsil palatine | 46.57 | 12,178 | 12,875 | 23,978 |
| Urethra | 52.76 | 11,387 | 12,087 | 25,292 |
| Urinary bladder | 51.68 | 11,163 | 11,840 | 24,174 |
| Uterus caruncle | 48.33 | 12,199 | 12,917 | 24,857 |
| Vagina | 52.30 | 11,600 | 12,300 | 25,543 |
| Average | 49.80 | 11,349 | 12,032 | 23,994 |
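The short-range (50 bp) and long-range (400 bp) columns above count clusters by their distance from an annotated promoter. A minimal sketch of that kind of distance-based counting is shown below. It is an illustration only, not the authors' pipeline: the function names and coordinates are hypothetical, it treats each promoter and cluster as a single position on one chromosome and ignores strand, and it is not expected to reproduce the percentages in the table.

```python
from bisect import bisect_left


def nearest_distance(position, sorted_promoters):
    """Distance from position to the closest promoter, or None if there are none."""
    i = bisect_left(sorted_promoters, position)
    candidates = []
    if i < len(sorted_promoters):
        candidates.append(sorted_promoters[i] - position)   # nearest at or to the right
    if i > 0:
        candidates.append(position - sorted_promoters[i - 1])  # nearest to the left
    return min(candidates) if candidates else None


def count_by_range(cluster_positions, promoter_positions, short_range=50, long_range=400):
    """Count clusters within short_range and long_range of any promoter."""
    promoters = sorted(promoter_positions)
    within_short = within_long = 0
    for pos in cluster_positions:
        d = nearest_distance(pos, promoters)
        if d is None:
            continue
        if d <= short_range:
            within_short += 1
        if d <= long_range:
            within_long += 1
    return {
        "within_short": within_short,
        "within_long": within_long,
        "total": len(cluster_positions),
    }


# Hypothetical coordinates on a single chromosome.
promoters = [1000, 5000]
clusters = [1010, 1300, 4990, 9000]  # distances: 10, 300, 10, 4000
print(count_by_range(clusters, promoters))
# {'within_short': 2, 'within_long': 3, 'total': 4}
```

A real analysis would work on genomic intervals per chromosome and strand, typically with an interval-overlap tool rather than point distances.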
FIGURE 7. Overlay of CAGE, mRNA-Seq, and WGBS data tracks centered using the genomic coordinates of the genes IRF2BP2 and ARID4B. (A) A hypomethylated area overlapping multiple uni- and bi-directional CAGE tag clusters at the 5′UTR of IRF2BP2. (B) Predicted CAGE tag clusters in the middle of the ARID4B gene with no supporting hypomethylation island, which are likely to be “noise”.
FIGURE 8. Numbers of CAGE TSS that were hypomethylated according to the WGBS data, used to distinguish between “novel” reproducible TSS (+HypoCpG) and “noise” (w/o). (A) Distribution of CAGE clusters as novel or annotated, with or without HypoCpG. (B) Percentage of CAGE clusters in each category for each of the eight tissues.
FIGURE 9. Network analysis of tissue TSS and gene expression profiles in 52 matched samples from Benz2616. The clustering algorithm was based on MI distance of each tissue given the expressed (A) mRNA-Seq transcript level TPM and (B) CAGE tag clusters (TSSs).
FIGURE 10. Long-range correlation of a single enhancer site with multiple promoters of several genes. The track shows the significant correlation of a leading/primary enhancer site highly co-expressed with several TSS sites of different genes in a relatively long coding frame (±10,000 bp). The third track from the top also shows the level of methylation at CpG sites at the DNA level in Benz2616, overlaying the same coordinates of the IK gene and ±10 Kbp.
Metrics comparison of CAGE atlases from 7 species.
| Species | Genome | TSS | Genes |
| --- | --- | --- | --- |
| Human | hg38 | 209,911 | 31,184 |
| Mouse | mm10 | 164,672 | 30,501 |
| Chicken | galGal5 | 32,015 | 7,759 |
| Sheep | Oar rambouillet v1.0 | 28,148 | 13,912 |
| Rhesus monkey | rheMac8 | 25,869 | 8,047 |
| Dog | canFam3 | 23,147 | 5,288 |