Viktoriya V Lavrekha1,2, Victor G Levitsky1,2, Anton V Tsukanov1, Anton G Bogomolov3, Dmitry A Grigorovich4, Nadya Omelyanchuk1, Elena V Ubogoeva1, Elena V Zemlyanskaya1,2, Victoria Mironova1,5.
Abstract
DNA-binding profiles for a sufficient number of genome-encoded transcription factors (TFs) make it possible to systematically evaluate potential upstream regulators of gene lists. For Arabidopsis, this became possible with the Plant Cistrome database, a large collection of TF binding profiles detected using the DAP-seq method. Here we re-processed the raw DAP-seq data with MACS2, the most popular peak caller, which leads among other peak callers according to quality metrics. In a benchmarking study, we confirmed that the improved collection of TF binding profiles supported a more precise gene list enrichment procedure and resulted in a more relevant ranking of potential upstream regulators. Moreover, we consistently recovered TF binding profiles that were missing from the previous collection of DAP-seq peak sets. We developed the CisCross web service (https://plamorph.sysbio.ru/ciscross/), which gives more flexibility in the analysis of potential upstream TF regulators for Arabidopsis thaliana genes.
Keywords: DAP-seq; RNA-seq; multi-omics data integration; proximal promoters; transcription factor binding profiles
Year: 2022 PMID: 36061801 PMCID: PMC9434332 DOI: 10.3389/fpls.2022.942710
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 6.627
FIGURE 1. CisCross algorithm scheme (see section “Materials and methods”). Green and pink colors mark the foreground and background data and the respective parallel processes of their analysis. Foreground and background data comprise the annotations of promoter regions of the input genes and of the remaining genes, respectively. For one DAP-seq peak set, the first step of the analysis maps the peaks to promoters of the input genes and of the remaining genes. The second step uses this mapping to compile a 2 × 2 contingency table for the input genes and the remaining genes, with the counts of genes whose promoters overlap or do not overlap the peaks. Finally, Fisher’s exact test is applied to estimate the enrichment of the peaks in promoters (p-value). Output data comprise the list of enriched TF binding profiles in ascending order of FDR (the p-value adjusted for multiple testing).
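The steps in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CisCross implementation: the function names, the interval representation `(chrom, start, end)`, the one-sided (over-representation) form of Fisher’s exact test, and the toy promoters and peaks are all hypothetical. The FDR is computed with the Benjamini-Hochberg adjustment.

```python
from math import comb


def overlaps(a, b):
    """True if two half-open intervals (chrom, start, end) overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher's exact test (assumed sidedness) for the table
    [[a, b], [c, d]]: a/b = input genes with/without a peak,
    c/d = remaining genes with/without a peak."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, row1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rank_peak_sets(peak_sets, promoters, input_genes):
    """Return (peak set, p-value, FDR) tuples in ascending order of FDR."""
    input_genes = set(input_genes).intersection(promoters)
    names, pvals = [], []
    for name, peaks in peak_sets.items():
        # Step 1: map peaks to promoters.
        hit = {g for g, prom in promoters.items()
               if any(overlaps(prom, p) for p in peaks)}
        # Step 2: 2 x 2 contingency table (input vs. remaining genes).
        a = len(hit & input_genes)
        b = len(input_genes) - a
        c = len(hit - input_genes)
        d = len(promoters) - a - b - c
        # Step 3: enrichment p-value.
        names.append(name)
        pvals.append(fisher_right_tail(a, b, c, d))
    fdr = benjamini_hochberg(pvals)
    return sorted(zip(names, pvals, fdr), key=lambda r: r[2])


# Hypothetical toy data: four promoters and two peak sets.
promoters = {
    "G1": ("chr1", 100, 200), "G2": ("chr1", 300, 400),
    "G3": ("chr1", 500, 600), "G4": ("chr2", 100, 200),
}
peak_sets = {
    "TF_A": [("chr1", 150, 250), ("chr1", 350, 360)],
    "TF_B": [("chr2", 150, 160)],
}
results = rank_peak_sets(peak_sets, promoters, ["G1", "G2"])
```

In a real analysis the background would be all annotated genes outside the input list, and an indexed interval lookup would replace the nested loop used here for brevity.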
FIGURE 2. Summary statistics on the Plant Cistrome, CisCross-GEM, and CisCross-MACS2 versions of the DAP-seq peak set collection. (A) The total number of peak sets. “New” denotes sets missing in the Plant Cistrome version. “Increase”/“Decrease” means that the number of peaks in a set increased/decreased at least twofold; “Small changes” indicates any smaller change. X-axes in panels (A,B) denote the version of the DAP-seq collection. (B) The number of col and colamp peak sets in the three versions of the DAP-seq collection. (C) Distribution of mean peak length (Y-axis) in individual peak sets (X-axis) for the CisCross-MACS2 version of the DAP-seq collection. The red line denotes the fixed peak length in the Plant Cistrome version (200 bp). Blue/orange colors mark the peak sets with shorter/longer mean peak length. A few example peak sets are named.
Changes in the number of peaks between selected TF binding profiles from the CisCross-MACS2 and Plant Cistrome versions of the DAP-seq peak set collection.
| TF name | TAIR ID | TF family | Number of peaks (Plant Cistrome) | Number of peaks (MACS2) |
|---|---|---|---|---|
| DEL1 | AT3G48160 | E2F-DP | absent | 382 |
| EIN3_colamp | AT3G20770 | EIL | absent | 774 |
| GATA1_colamp | AT3G24050 | C2C2 (Zn) | absent | 1926 |
| NAC16_colamp | AT1G34180 | NAC | absent | 1318 |
| WUS_colamp | AT2G17950 | WOX | absent | 2443 |
| | | | | |
| C3H67_colamp | AT5G63260 | C3H (Zn) | 4364 | 7551 |
| DAG2_colamp | AT2G46590 | C2C2 (Zn) | 9982 | 17182 |
| HB33_colamp | AT1G75240 | ZF-HD | 14246 | 25968 |
| OBP3_colamp | AT3G55370 | C2C2 (Zn) | 5038 | 20300 |
| SND2 | AT4G28500 | NAC | 636 | 3695 |
| | | | | |
| FUS3 | AT3G26790 | B3-domain | 3266 | 2055 |
| ERF19 | AT1G22810 | AP2/ERF | 3765 | 4386 |
| LBD13_colamp | AT2G30340 | LBD | 3715 | 4437 |
| NAC62_colamp | AT3G49530 | NAC | 5883 | 5990 |
| RVE5 | AT4G01280 | MYB | 9550 | 8793 |
| | | | | |
| BBX31 | AT3G21890 | C2C2 (Zn) | 16775 | 1654 |
| ATHB13 | AT1G69780 | HD-ZIP | 23232 | 2173 |
| LBD23 | AT3G26620 | LBD | 1451 | 383 |
| VND4 | AT1G12260 | NAC | 10458 | 5058 |
| WRKY22 | AT4G01250 | WRKY | 22769 | 7544 |
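The change categories defined in the Figure 2 caption (“New”, “Increase”, “Decrease”, “Small changes”) can be expressed as a simple rule. The sketch below is an illustration only: the function name `classify_change` and the use of `None` for an absent peak set are assumptions, and the rule is applied to a few rows of the table above rather than to the full collection.

```python
def classify_change(old_count, new_count, fold=2.0):
    """Classify the change in peak number between two collection versions.

    old_count is None when the peak set is absent from the older version.
    The twofold threshold follows the Figure 2 caption.
    """
    if old_count is None:
        return "New"
    if new_count >= fold * old_count:
        return "Increase"
    if old_count >= fold * new_count:
        return "Decrease"
    return "Small changes"


# A few rows from the table: (TF name, Plant Cistrome peaks, MACS2 peaks).
rows = [
    ("DEL1", None, 382),
    ("OBP3_colamp", 5038, 20300),
    ("NAC62_colamp", 5883, 5990),
    ("BBX31", 16775, 1654),
]
labels = {name: classify_change(old, new) for name, old, new in rows}
```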
FIGURE 3. Comparison of the results of the gene list enrichment analysis in pairwise combinations of different versions of the DAP-seq collection for the benchmark compilation of RNA-seq data from the EBI Expression Atlas (see section “Materials and methods”). Panels (A–C) show the percentage of overlap of the output lists of potential TF regulators (FDR < 0.05). Panels (D–F) show the total number of potential TF regulators (FDR < 0.05).
Gene list enrichment analysis of the series of transcriptomic data for various treatment times by auxin hormone (see section “Materials and methods”).
| | 1 h | 2 h | 4 h | 6 h | 55 h |
|
| |||||
| Plant Cistrome |
| 78 | 23 | 51 | 124 |
| CisCross-GEM |
|
| 6 | 74 | 114 |
| CisCross-MACS2 |
| 81 |
| 54 |
|
|
| |||||
| Plant Cistrome | 68 | 145 | 81 | 81 | 2 |
| CisCross-GEM | 340 | 168 | 179 | 7 | 21 |
| CisCross-MACS2 | 37 |
|
|
|
|
The ranks for two TF regulators, ARF5 and EIN3, are shown. Asterisks mark the significance of enrichment after Benjamini-Hochberg correction (FDR): ***FDR < 0.001, **FDR < 0.01, *FDR < 0.1. The hits with the lowest rank for the TF across the three versions of the DAP-seq collection are marked in bold.
FIGURE 4. Examples of the output data for the CisCross web service. (A) CisCross-Main mode for gene list enrichment analysis (for the list of auxin-upregulated genes from GSE149410). (B) CisCross-Light mode for the upstream region of the PIN7 (AT1G23080) gene.