Malte Thodberg1,2, Albin Sandelin1,2.
Abstract
Cap Analysis of Gene Expression (CAGE) is one of the most popular 5'-end sequencing methods. In a single experiment, CAGE can be used to locate and quantify the expression of both Transcription Start Sites (TSSs) and enhancers. This workflow is a case study on how to use the CAGEfightR package to orchestrate analysis of CAGE data within the Bioconductor project. It starts from BigWig files and covers basic CAGE analyses (identifying, quantifying and annotating TSSs and enhancers), advanced analyses (finding interacting TSS-enhancer pairs and enhancer clusters), as well as differential expression analysis and alternative TSS usage. R code, discussion and references are intertwined to help provide guidelines for future CAGE studies of the same kind.
Keywords: CAGE; DE; Enhancer; Motifs; Promoter; TSS
Year: 2019 PMID: 31327999 PMCID: PMC6613478 DOI: 10.12688/f1000research.18456.1
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Comparison of Bioconductor packages for CAGE data analysis.
| Analysis | icetea | TSRchitect | CAGEr | CAGEfightR |
|---|---|---|---|---|
| Simplest input | FASTQ | BAM | BAM | BigWig |
| TSS calling | sliding window | X-means | distance or paraclu | slice-reduce |
| TSS shapes | - | + | + | + |
| Differential Expression | + | + | + | - |
| Enhancer calling | - | - | - | + |
| TSS-enhancer correlation | - | - | - | + |
| Super enhancers | - | - | - | + |
Overview of samples in the nanotube exposure experiment.
| Group | Biological Replicates |
|---|---|
| Ctrl | 5 mice |
| Nano | 6 mice |
The initial design matrix for the nanotubes experiment.
| | Class | Name | BigWigPlus | BigWigMinus |
|---|---|---|---|---|
| C547 | Ctrl | C547 | mm9.CAGE_7J7P_NANO_KON_547.plus. | mm9.CAGE_7J7P_NANO_KON_547.minus. |
| C548 | Ctrl | C548 | mm9.CAGE_ULAC_NANO_KON_548. | mm9.CAGE_ULAC_NANO_KON_548.minus. |
| C549 | Ctrl | C549 | mm9.CAGE_YM4F_Nano_KON_549.plus. | mm9.CAGE_YM4F_Nano_KON_549.minus. |
| C559 | Ctrl | C559 | mm9.CAGE_RSAM_NANO_559.plus.bw | mm9.CAGE_RSAM_NANO_559.minus.bw |
| C560 | Ctrl | C560 | mm9.CAGE_CCLF_NANO_560.plus.bw | mm9.CAGE_CCLF_NANO_560.minus.bw |
| N13 | Nano | N13 | mm9.CAGE_KTRA_Nano_13.plus.bw | mm9.CAGE_KTRA_Nano_13.minus.bw |
| N14 | Nano | N14 | mm9.CAGE_RSAM_NANO_14.plus.bw | mm9.CAGE_RSAM_NANO_14.minus.bw |
| N15 | Nano | N15 | mm9.CAGE_RFQS_Nano_15.plus.bw | mm9.CAGE_RFQS_Nano_15.minus.bw |
| N16 | Nano | N16 | mm9.CAGE_CCLF_NANO_16.plus.bw | mm9.CAGE_CCLF_NANO_16.minus.bw |
| N17 | Nano | N17 | mm9.CAGE_RSAM_NANO_17.plus.bw | mm9.CAGE_RSAM_NANO_17.minus.bw |
| N18 | Nano | N18 | mm9.CAGE_CCLF_NANO_18.plus.bw | mm9.CAGE_CCLF_NANO_18.minus.bw |
Design matrix after adding new batch covariate.
| | Class | Batch |
|---|---|---|
| C547 | Ctrl | B |
| C548 | Ctrl | B |
| C549 | Ctrl | B |
| C559 | Ctrl | A |
| C560 | Ctrl | A |
| N13 | Nano | B |
| N14 | Nano | A |
| N15 | Nano | B |
| N16 | Nano | A |
| N17 | Nano | A |
| N18 | Nano | A |
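The effect of adding the batch covariate can be illustrated with a small sketch. It builds a treatment-coded design matrix from the sample table, using the coefficient names `(Intercept)`, `BatchB` and `ClassNano` that appear in the summary of differentially expressed genes. The choice of `A` and `Ctrl` as reference levels and the helper name `design_matrix` are assumptions made for illustration; the workflow itself is written in R, and this Python version only mirrors the idea.

```python
# Sample annotation copied from the table: (sample, Class, Batch).
samples = [
    ("C547", "Ctrl", "B"), ("C548", "Ctrl", "B"), ("C549", "Ctrl", "B"),
    ("C559", "Ctrl", "A"), ("C560", "Ctrl", "A"),
    ("N13", "Nano", "B"), ("N14", "Nano", "A"), ("N15", "Nano", "B"),
    ("N16", "Nano", "A"), ("N17", "Nano", "A"), ("N18", "Nano", "A"),
]


def design_matrix(rows, batch_ref="A", class_ref="Ctrl"):
    """Hypothetical helper: intercept plus one 0/1 indicator per non-reference level.

    Assumes each factor has exactly two levels, as in the table.
    """
    columns = ["(Intercept)", "BatchB", "ClassNano"]
    matrix = {}
    for name, cls, batch in rows:
        matrix[name] = [1, int(batch != batch_ref), int(cls != class_ref)]
    return columns, matrix


columns, design = design_matrix(samples)
for name, row in design.items():
    print(name, dict(zip(columns, row)))

# Cross-tabulate Class against Batch: both batches contain both classes,
# so the two effects can be estimated separately.
crosstab = {}
for _, cls, batch in samples:
    crosstab[(cls, batch)] = crosstab.get((cls, batch), 0) + 1
print(crosstab)
```

Because every combination of class and batch is represented, the treatment effect is not confounded with the batch effect in this layout.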
Top differentially expressed TSS and enhancer candidates.
| cluster | clusterType | txType | baseMean | log2FoldChange | padj |
|---|---|---|---|---|---|
| chr1:73977049-73977548;- | TSS | intron | 1183.3740 | 2.838367 | 0 |
| chr2:32243097-32243468;- | TSS | promoter | 30799.5953 | 3.741789 | 0 |
| chr3:144423689-144423778;- | TSS | promoter | 191.0431 | 3.709530 | 0 |
| chr4:125840648-125840820;- | TSS | proximal | 1063.4328 | 3.867574 | 0 |
| chr4:137325466-137325712;- | TSS | intron | 176.7636 | 3.912592 | 0 |
| chr7:53971039-53971170;- | TSS | promoter | 8720.5204 | 6.696838 | 0 |
| chr9:120212846-120213294;+ | TSS | promoter | 316.0582 | 2.404706 | 0 |
| chr11:83222553-83222887;+ | TSS | proximal | 228.5560 | 6.098838 | 0 |
| chr12:105649334-105649472;+ | TSS | CDS | 175.1364 | 3.345412 | 0 |
| chr19:56668148-56668332;+ | TSS | CDS | 103.8795 | -2.254371 | 0 |
Global summary of differentially expressed genes.
| | (Intercept) | BatchB | ClassNano |
|---|---|---|---|
| Down | 51 | 2572 | 1505 |
| NotSig | 463 | 8278 | 10373 |
| Up | 13053 | 2717 | 1689 |
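As a quick consistency check on this summary, the counts in each column should add up to the same number of tested genes. The sketch below uses only the numbers from the table; the variable names are illustrative.

```python
# Counts copied from the summary table, keyed by model coefficient.
summary = {
    "(Intercept)": {"Down": 51, "NotSig": 463, "Up": 13053},
    "BatchB": {"Down": 2572, "NotSig": 8278, "Up": 2717},
    "ClassNano": {"Down": 1505, "NotSig": 10373, "Up": 1689},
}

# Every coefficient is tested on the same set of genes, so the column totals must agree.
totals = {coef: sum(counts.values()) for coef, counts in summary.items()}
n_genes = totals["ClassNano"]
print(totals)

# Fraction of genes called differentially expressed (up or down) per coefficient.
fraction_de = {
    coef: (counts["Down"] + counts["Up"]) / n_genes
    for coef, counts in summary.items()
}
for coef, frac in fraction_de.items():
    print(f"{coef}: {frac:.1%} of genes significant")
```

Each column sums to 13567 genes. Roughly a quarter of them respond to the treatment (`ClassNano`), and the batch coefficient affects an even larger share.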
Top differentially expressed genes.
| | symbol | nClusters | AveExpr | logFC | adj.P.Val |
|---|---|---|---|---|---|
| 66938 | Sh3d21 | 3 | 5.871004 | 3.075745 | 0.0e+00 |
| 245049 | Myrip | 2 | 4.371325 | 2.414055 | 7.0e-07 |
| 12722 | Clca3a1 | 1 | 3.020528 | 3.692198 | 7.0e-07 |
| 382864 | Colq | 3 | 2.770158 | -3.426911 | 1.1e-06 |
| 20716 | Serpina3n | 5 | 6.384175 | 1.872782 | 3.0e-06 |
| 72275 | 2200002D01Rik | 2 | 7.208031 | 1.693257 | 5.5e-06 |
| 381813 | Prmt8 | 4 | 4.553612 | 1.409006 | 5.8e-06 |
| 170706 | Tmem37 | 2 | 5.503908 | 1.679690 | 5.8e-06 |
| 18654 | Pgf | 1 | 4.862055 | 2.337045 | 5.8e-06 |
| 20361 | Sema7a | 1 | 7.612236 | 1.473680 | 5.9e-06 |
Top enriched or depleted GO-terms.
| | Term | Ont | N | Up | Down | P.Up | P.Down |
|---|---|---|---|---|---|---|---|
| GO:0006954 | inflammatory response | BP | 556 | 142 | 51 | 0 | 0.9562685 |
| GO:0006952 | defense response | BP | 1072 | 224 | 99 | 0 | 0.9878373 |
| GO:0097529 | myeloid leukocyte migration | BP | 170 | 61 | 14 | 0 | 0.9359984 |
| GO:0010033 | response to organic substance | BP | 2074 | 370 | 196 | 0 | 0.9987104 |
| GO:0006950 | response to stress | BP | 2755 | 464 | 246 | 0 | 0.9999946 |
| GO:0006955 | immune response | BP | 1034 | 210 | 96 | 0 | 0.9833226 |
| GO:0042221 | response to chemical | BP | 2762 | 467 | 292 | 0 | 0.9178712 |
| GO:0050900 | leukocyte migration | BP | 288 | 83 | 23 | 0 | 0.9792828 |
| GO:0001816 | cytokine production | BP | 634 | 143 | 45 | 0 | 0.9998658 |
| GO:0001817 | regulation of cytokine production | BP | 570 | 132 | 39 | 0 | 0.9998856 |
Top enriched or depleted KEGG-terms.
| | Pathway | N | Up | Down | P.Up | P.Down |
|---|---|---|---|---|---|---|
| path:mmu04060 | Cytokine-cytokine receptor interaction | 173 | 56 | 13 | 0.0000000 | 0.9579351 |
| path:mmu04668 | TNF signaling pathway | 105 | 31 | 8 | 0.0000037 | 0.9186628 |
| path:mmu00600 | Sphingolipid metabolism | 41 | 17 | 2 | 0.0000051 | 0.9583011 |
| path:mmu00980 | Metabolism of xenobiotics by cytochrome P450 | 48 | 4 | 17 | 0.8857194 | 0.0000137 |
| path:mmu03010 | Ribosome | 122 | 32 | 2 | 0.0000226 | 0.9999900 |
| path:mmu04064 | NF-kappa B signaling pathway | 85 | 24 | 5 | 0.0000704 | 0.9655534 |
| path:mmu04657 | IL-17 signaling pathway | 74 | 22 | 2 | 0.0000806 | 0.9985563 |
| path:mmu00982 | Drug metabolism - cytochrome P450 | 46 | 5 | 15 | 0.7266916 | 0.0001238 |
| path:mmu04630 | JAK-STAT signaling pathway | 112 | 29 | 7 | 0.0001453 | 0.9785951 |
| path:mmu04512 | ECM-receptor interaction | 69 | 21 | 13 | 0.0001488 | 0.0577601 |
Top differentially used TSSs.
| | txType | geneID | symbol | logFC | FDR |
|---|---|---|---|---|---|
| chr17:13840650-13840851;- | intron | 21646 | Tcte2 | 1.7889344 | 0e+00 |
| chr10:57857044-57857314;+ | promoter | 110829 | Lims1 | -1.0651946 | 0e+00 |
| chr14:70215678-70215876;- | intron | 246710 | Rhobtb2 | 2.4933979 | 0e+00 |
| chr4:141154044-141154185;- | intron | 74202 | Fblim1 | 1.7018062 | 0e+00 |
| chr17:33966135-33966308;+ | intron | 66416 | Ndufa7 | 2.1612127 | 0e+00 |
| chr15:76428030-76428201;- | intron | 94230 | Cpsf1 | 1.4598815 | 0e+00 |
| chr19:57271818-57272125;- | promoter | 226251 | Ablim1 | 1.1456163 | 0e+00 |
| chr9:77788968-77789200;+ | intron | 68801 | Elovl5 | 0.9810692 | 1e-07 |
| chr11:116395161-116395462;+ | proximal | 20698 | Sphk1 | 1.7471930 | 1e-07 |
| chr2:91496305-91496449;+ | intron | 228359 | Arhgap1 | 0.9809491 | 3e-07 |
Top genes showing any differential TSS usage.
| geneID | symbol | NExons | FDR |
|---|---|---|---|
| 21646 | Tcte2 | 4 | 0e+00 |
| 110829 | Lims1 | 3 | 0e+00 |
| 246710 | Rhobtb2 | 3 | 0e+00 |
| 74202 | Fblim1 | 3 | 0e+00 |
| 66416 | Ndufa7 | 3 | 0e+00 |
| 94230 | Cpsf1 | 2 | 0e+00 |
| 226251 | Ablim1 | 3 | 0e+00 |
| 68801 | Elovl5 | 2 | 1e-07 |
| 20698 | Sphk1 | 3 | 1e-07 |
| 228359 | Arhgap1 | 2 | 2e-07 |