Marina Lizio1,2, Abdul Kadir Mukarram3, Mizuho Ohno1,2, Shoko Watanabe1,2, Masayoshi Itoh1,2,4, Akira Hasegawa1,2, Timo Lassmann2,5, Jessica Severin1,2, Jayson Harshbarger1,2, Imad Abugessaisa1, Takeya Kasukawa1, Chung Chau Hon1, Piero Carninci1,2, Yoshihide Hayashizaki2,4, Alistair R R Forrest2,6, Hideya Kawaji1,2,4,7.
Abstract
The promoter landscape of several non-human model organisms is far from complete. As part of the FANTOM5 data collection, we generated 13 profiles of transcription initiation activities in dog and rat aortic smooth muscle cells, mesenchymal stem cells and hepatocytes by employing CAGE (Cap Analysis of Gene Expression) technology combined with single molecule sequencing. Our analyses show that the CAGE profiles recapitulate known transcription start sites (TSSs) consistently, in addition to uncovering novel TSSs. Our dataset can thus be used with high confidence to support gene annotation in dog and rat. We identified 28,497 and 23,147 CAGE peaks, or promoter regions, for rat and dog respectively, and associated them with known genes. This approach could be seen as a standard method for improving existing gene models, as well as for discovering novel genes. Given that the FANTOM5 data collection also includes the matched cell types in human and mouse, these data would also be useful for cross-species studies.
Year: 2017 PMID: 29182598 PMCID: PMC5704677 DOI: 10.1038/sdata.2017.173
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Figure 1. Study overview.
The steps of sample collection, CAGE data production, post-processing and further analyses are shown as arrows with their results indicated by squares.
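Human (hg19) CAGE peaks lifted over to dog and rat.
Reported are the numbers of expressed lifted-over human peaks, peaks within 50 bp of a dog/rat peak, and peaks expressed in matching dog/rat samples. The total number of human peaks is also reported for reference.

| Liftover | Expressed lifted-over human peaks | Peaks within 50 bp of a dog/rat peak | Peaks expressed in matching dog/rat samples | Total human peaks |
|---|---|---|---|---|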
| hg19 -> canFam3 | 129,287 | 19,302 | 18,374 | 201,802 |
| hg19 -> rn6 | 111,218 | 20,742 | 19,096 | 201,802 |
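
The "within 50 bp" column can be illustrated with a minimal sketch. It assumes that distance is measured between interval edges on the same chromosome and strand; the source does not state the exact definition, and the function names and coordinates below are hypothetical.

```python
def gap_bp(a_start, a_end, b_start, b_end):
    """Distance in bp between two half-open intervals; 0 if they overlap or touch."""
    return max(0, b_start - a_end, a_start - b_end)


def count_near(lifted, native, max_gap=50):
    """Count lifted-over peaks within max_gap bp of any native peak on the
    same chromosome and strand. Peaks are (chrom, start, end, strand) tuples."""
    hits = 0
    for chrom, start, end, strand in lifted:
        if any(
            c == chrom and s == strand and gap_bp(start, end, ns, ne) <= max_gap
            for c, ns, ne, s in native
        ):
            hits += 1
    return hits


# Toy coordinates, invented for illustration only (not from the dataset).
lifted_human = [("chr1", 100, 120, "+"), ("chr1", 500, 520, "+"), ("chr2", 100, 120, "-")]
native_dog = [("chr1", 160, 180, "+"), ("chr1", 600, 650, "+"), ("chr2", 100, 110, "+")]
print(count_near(lifted_human, native_dog))
```

The all-pairs scan keeps the sketch short; for genome-scale peak sets a sorted or indexed interval search would be the more practical choice.
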
Matrix of correlation values for all pairwise comparisons of dog and rat samples.
The top right side of the table shows pairwise Spearman correlation values for the dog samples, while the bottom left side shows the corresponding values for the rat samples. Column and row label abbreviations: AoSMCdiff=aortic smooth muscle cells, differentiated; AoSMC=aortic smooth muscle cells; MSCbm=mesenchymal stem cells, bone marrow derived; Hep=hepatocytes; UniTis=universal RNA tissue. Numbers 1, 2, 3 indicate biological replicates. Diagonal cells are empty.

| Sample | AoSMCdiff1 | AoSMCdiff2 | AoSMCdiff3 | AoSMC1 | AoSMC2 | AoSMC3 | MSCbm1 | MSCbm2 | MSCbm3 | Hep1 | Hep2 | Hep3 | UniTis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AoSMCdiff1 | | 0.89 | 0.91 | 0.78 | 0.87 | 0.84 | 0.58 | 0.58 | 0.65 | 0.06 | 0.06 | 0.08 | 0.2 |
| AoSMCdiff2 | 0.91 | | 0.98 | 0.68 | 0.9 | 0.88 | 0.47 | 0.45 | 0.53 | 0.05 | 0.04 | 0.06 | 0.16 |
| AoSMCdiff3 | 0.74 | 0.9 | | 0.72 | 0.9 | 0.87 | 0.53 | 0.52 | 0.59 | 0.05 | 0.04 | 0.06 | 0.16 |
| AoSMC1 | 0.78 | 0.73 | 0.76 | | 0.83 | 0.72 | 0.83 | 0.85 | 0.84 | 0.1 | 0.08 | 0.11 | 0.24 |
| AoSMC2 | 0.78 | 0.71 | 0.73 | 0.97 | | 0.94 | 0.64 | 0.64 | 0.72 | 0.07 | 0.06 | 0.08 | 0.21 |
| AoSMC3 | 0.75 | 0.72 | 0.8 | 0.92 | 0.94 | | 0.48 | 0.48 | 0.59 | 0.04 | 0.04 | 0.05 | 0.17 |
| MSCbm1 | 0.6 | 0.49 | 0.49 | 0.7 | 0.77 | 0.75 | | 0.98 | 0.95 | 0.1 | 0.08 | 0.11 | 0.2 |
| MSCbm2 | 0.6 | 0.49 | 0.52 | 0.74 | 0.81 | 0.81 | 0.95 | | 0.97 | 0.1 | 0.08 | 0.11 | 0.2 |
| MSCbm3 | 0.48 | 0.4 | 0.42 | 0.66 | 0.72 | 0.65 | 0.77 | 0.81 | | 0.09 | 0.07 | 0.1 | 0.2 |
| Hep1 | 0.03 | 0.06 | 0.07 | 0.04 | 0.03 | 0.04 | 0.01 | 0.01 | 0 | | 0.98 | 0.96 | 0.72 |
| Hep2 | 0.04 | 0.07 | 0.08 | 0.05 | 0.03 | 0.05 | 0.01 | 0.01 | 0.01 | 0.99 | | 0.99 | 0.76 |
| Hep3 | 0.03 | 0.04 | 0.05 | 0.03 | 0.02 | 0.03 | 0.01 | 0.01 | 0 | 1 | 0.99 | | 0.77 |
| UniTis | 0.1 | 0.14 | 0.16 | 0.13 | 0.11 | 0.13 | 0.06 | 0.06 | 0.05 | 0.92 | 0.92 | 0.92 | |
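
For readers unfamiliar with the statistic, Spearman correlation is the Pearson correlation of the ranks of two expression vectors. The following is a minimal standard-library sketch, not the authors' pipeline; the function names and the toy expression values are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(vx * vy)


# Toy per-peak expression values for two samples (invented for illustration).
sample_a = [0.0, 1.5, 3.2, 10.0, 7.7]
sample_b = [0.1, 1.0, 4.0, 8.0, 9.5]
print(round(spearman(sample_a, sample_b), 2))
```

Because only ranks enter the calculation, the value is insensitive to the normalization scale of the expression values, which makes it a common choice for comparing replicates.
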
Figure 2. Reproducibility of replicates.
Scatter plots and correlation values of normalized expression values between AoSMC samples (replicate 1 versus replicate 2 on the left and replicate 1 differentiated versus non differentiated in the center), and MDS plots highlighting the separation across cell types are shown for rat (a) and dog (b).
Figure 3. Characterization of CAGE peaks in dog and rat.
(a) Percentage of mapped reads at promoters identified by DPI for each sample. Label descriptions: AoSMC=aortic smooth muscle cell; AoSMCdiff=differentiated aortic smooth muscle cell; MESbm=mesenchymal stem cell from bone marrow; Hep=hepatocyte; UniTis=universal tissue; (b) histograms of CAGE peak lengths; (c) enrichment of TATA motifs near CAGE peaks; (d) graphs showing TATA-rich versus CpG-rich peaks. TATA-only CAGE peaks tend to be sharp, whereas CpG-only peaks are generally broader; (e) percentage of genes that can be associated with a CAGE peak for each of the inspected known gene models; (f) distribution of the distances of CAGE peaks from their closest gene TSS. Colours: orange denotes rat and blue denotes dog, except for (d), where the colour code is specified in the legend.
Number of dog and rat CAGE peaks overlapping known genomic features, as annotated by the HOMER tool.
RefSeq gene is the reference set used by the tool. 3UTR=3-prime untranslated region; ncRNA=non-coding RNA; TTS=transcription termination site; LINE=long interspersed nuclear element; SINE=short interspersed nuclear element; tRNA=transfer RNA; 5UTR=5-prime untranslated region; scRNA=small cytoplasmic RNA; LTR=long terminal repeats; snRNA=small nuclear RNA; rRNA=ribosomal RNA; srpRNA=signal recognition particle RNA.

| Feature | Rat | Dog |
|---|---|---|
| 3UTR | 474 | 111 |
| ncRNA | 18 | 0 |
| TTS | 403 | 209 |
| LINE | 82 | 286 |
| SINE | 114 | 122 |
| tRNA | 1 | 0 |
| DNA | 17 | 65 |
| Exon | 1,937 | 513 |
| Intron | 1,561 | 163 |
| Intergenic | 3,013 | 8,001 |
| Promoter | 16,625 | 1,394 |
| 5UTR | 1,054 | 57 |
| scRNA | 4 | 0 |
| CpG-Island | 2,770 | 11,321 |
| Low_complexity | 28 | 427 |
| LTR | 169 | 114 |
| Simple_repeat | 122 | 213 |
| snRNA | 34 | 3 |
| Unknown | 2 | 0 |
| Satellite | 2 | 1 |
| rRNA | 63 | 146 |
| srpRNA | 0 | 1 |
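Totals and ratios of peak-gene associations.
Listing of the totals and percentages for the gene models used in this study together with all robust peaks identified in this study in dog and rat.

| Gene model set | Total models (rat) | Total models (dog) | Associated peaks (rat) | Associated peaks (dog) | % of peaks (rat) | % of peaks (dog) | Models with a peak (rat) | Models with a peak (dog) | % of models (rat) | % of models (dog) |
|---|---|---|---|---|---|---|---|---|---|---|
| Ensembl_transcript | 39,595 | 29,881 | 19,254 | 10,542 | 68% | 46% | 10,688 | 6,194 | 27% | 21% |
| Augustus_gene | 29,380 | 29,165 | 13,513 | 9,829 | 47% | 42% | 7,328 | 5,755 | 25% | 20% |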
| RefSeq_transcript | 18,978 | 2,274 | 17,188 | 1,350 | 60% | 6% | 9,246 | 779 | 49% | 34% |
| Geneid_gene | 41,652 | 32,342 | 9,399 | 6,997 | 33% | 30% | 5,339 | 4,155 | 13% | 13% |
| Genscan_gene | 49,319 | 42,671 | 8,115 | 5,779 | 28% | 25% | 4,618 | 3,525 | 9% | 8% |
| EST_gene | 1,270,134 | 401,654 | 23,621 | 13,549 | 83% | 59% | 13,718 | 7,932 | 1% | 2% |
| All robust CAGE peaks | 28,497 | 23,147 | 26,208 | 18,597 | 92% | 80% | 28,497 | 23,147 | 100% | 100% |
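
The percentage columns appear to follow from two ratios: peaks associated with a model set divided by all robust peaks, and models carrying a peak divided by the total number of models in that set. This reading is inferred from the numbers rather than stated in the source, so the sketch below is only a consistency check under that assumption, with a hypothetical helper name and values copied from the table.

```python
def pct(part, total):
    """Percentage of total represented by part, rounded to a whole number."""
    return round(100 * part / total)


# Totals of robust CAGE peaks, taken from the last table row.
RAT_PEAKS, DOG_PEAKS = 28497, 23147

# Ensembl_transcript row: share of peaks associated with a model.
print(pct(19254, RAT_PEAKS), pct(10542, DOG_PEAKS))

# Ensembl_transcript row: share of models that carry a peak.
print(pct(10688, 39595), pct(6194, 29881))

# RefSeq_transcript row, same two ratios for rat.
print(pct(17188, RAT_PEAKS), pct(9246, 18978))
```

Under this reading the recomputed values agree with the percentages printed in the table for the rows checked here.
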
Breakdown of numbers of CAGE peaks overlapping zero or more gene models.
The gene model sets refer to those listed in

| Species | 0 sets | 1 set | 2 sets | 3 sets | 4 sets | 5 sets | 6 sets |
|---|---|---|---|---|---|---|---|
| Rat | 2,289 | 5,045 | 2,646 | 5,000 | 5,617 | 4,115 | 3,785 |
| Dog | 4,550 | 5,984 | 3,943 | 3,202 | 3,004 | 2,230 | 234 |
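
A quick arithmetic check ties this breakdown to the peak totals reported in the abstract. It assumes the seven columns count peaks overlapping 0, 1, ..., 6 gene model sets, in that order; this ordering is an inference from the sums, not something the source states, and the names below are hypothetical.

```python
# Counts per species, copied from the table; index = number of overlapping sets
# (assumed ordering, see the note above).
overlap_counts = {
    "rat": [2289, 5045, 2646, 5000, 5617, 4115, 3785],
    "dog": [4550, 5984, 3943, 3202, 3004, 2230, 234],
}


def summarize(counts):
    """Return (total peaks, peaks overlapping at least one set, rounded percentage)."""
    total = sum(counts)
    supported = total - counts[0]
    return total, supported, round(100 * supported / total)


for species, counts in overlap_counts.items():
    print(species, summarize(counts))
```

The row sums reproduce the 28,497 rat and 23,147 dog peaks, and the share of peaks overlapping at least one set matches the 92% and 80% given for all robust CAGE peaks.
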
Figure 4. Zenbu examples of Rescue CAGE Peaks.
Screenshots of (a) the LOXL3 gene in dog with Rescue CAGE Peaks (RCPs) supported by RNA-seq and human lift-over promoters, and (b) the Loxl3 gene in rat annotated with CAGE peaks, also supported by RNA-seq and human promoters.