Yuyun Zhang1,2, Zijuan Li1,2, Yu'e Zhang2,3, Kande Lin4, Yuan Peng1,2,5, Luhuan Ye1,2, Yili Zhuang1,2, Meiyue Wang1,2, Yilin Xie1,2, Jingyu Guo1,6, Wan Teng2,3, Yiping Tong2,3, Wenli Zhang4, Yongbiao Xue2,3,7,8, Zhaobo Lang1,2,5, Yijing Zhang1,2,9.
Abstract
More than 80% of the wheat genome consists of transposable elements (TEs), which act as major drivers of wheat genome evolution. However, their contributions to the regulatory evolution of wheat adaptations remain largely unclear. Here, we created genome-binding maps for 53 transcription factors (TFs) underlying environmental responses by leveraging DAP-seq in Triticum urartu, together with epigenomic profiles. Most TF binding sites (TFBSs) located distally from genes are embedded in TEs, whose functional relevance is supported by purifying selection and active epigenomic features. About 24% of the non-TE TFBSs share significantly high sequence similarity with TE-embedded TFBSs. These non-TE TFBSs have almost no homologous sequences in non-Triticeae species and are potentially derived from Triticeae-specific TEs. The expansion of TE-derived TFBSs is linked to wheat-specific gene responses, suggesting TEs are an important driving force for regulatory innovations. Altogether, TEs have been significantly and continuously shaping regulatory networks related to wheat genome evolution and adaptation.
Year: 2021 PMID: 34503979 PMCID: PMC8647832 DOI: 10.1101/gr.275658.121
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.438
Figure 1. Genome-wide binding of wheat transcription factors underlying responses to environmental stimuli. (A) Schematic of the experimental design and filtering steps. The detailed filtering steps are listed in Supplemental Table S2. (B) Genomic tracks illustrating the targeting of RD29 by a subset of these TFs as well as locations of representative motifs. (C) DHS read density of TFBSs. TFBSs were grouped according to the number of binding TFs. DHS signal densities (bin size 50 bp) are shown within a 4-kb window centered on merged TFBS centers. (D) Clustering of the top motif identified for each TF. The dendrogram is based on motif similarity.
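The binning used in panel C (50-bp bins across a 4-kb window, giving 80 bins) can be illustrated with a minimal sketch. The function name `bin_counts` and the input format (signed offsets in bp relative to the window center) are assumptions made for illustration; the source does not describe the authors' actual implementation.

```python
def bin_counts(offsets, window=4000, bin_size=50):
    """Count events per bin in a window centered on position 0.

    offsets: signed distances (bp) from the window center (hypothetical input).
    Offsets outside [-window/2, window/2) are ignored.
    """
    half = window // 2
    n_bins = window // bin_size          # 4000 / 50 = 80 bins
    counts = [0] * n_bins
    for off in offsets:
        if -half <= off < half:
            counts[(off + half) // bin_size] += 1
    return counts


# Example: five read positions relative to a TFBS center.
profile = bin_counts([-2000, -1951, 0, 1999, 2000])
```

Averaging such per-site profiles over all TFBSs in a group would give a density curve of the kind the caption describes.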
Figure 2. Concordance of DAP-seq and ChIP-seq peaks. (A) Venn diagrams showing the overlap between ChIP-seq peaks, DAP-seq peaks of AP2-DREB-7, and DHS. (B) Average number of AP2 motifs (bin size = 50 bp) within a 4-kb window centered on common and unique peak summits. The CBF binding motif in the JASPAR plant database is used because CBF is the Arabidopsis ortholog of Tu AP2-DREB-7. Four CBF binding motifs in the JASPAR database were merged into a consensus motif. (C) Motifs identified de novo from common and unique peaks. (D) Enrichment of histone marks in common and unique peaks. H3K4me3 and H3K27me3 overlapping (bivalent) peaks and unique peaks (K4-only and K27-only) were used for the analysis. (E) Enrichment of DAP-seq unique peaks in regions where ABA down-regulates H3K27me3 and up-regulates DNase I hypersensitivity (DH). The MAnorm package (Shao et al. 2012) was used for the quantitative comparison of H3K27me3 ChIP-seq and DHS signals between samples. (F) Genomic tracks illustrating the coincidence between DAP-seq unique binding and ABA-induced chromatin accessibility and reduced H3K27me3. TuG1812G0500004885 is a gene with LRR domains, which may relate to biotic or abiotic stress responses.
Figure 3. Distribution of TFBSs and stress responsiveness of TF targets. (A) Clustering of TF binding correlations based on occurrence of DAP-seq peaks shows that TFs from the same family generally have similar binding profiles. (B) Circos plot showing the genomic distribution of the four largest TFBS clusters shown in A. The genomic distribution was visualized with Circos (Krzywinski et al. 2009). (C) Fraction of peaks for each TF localized to the distal regions (>10 kb from the nearest gene). For each TF peak set, the distance to the TSS of the nearest gene was compared with randomly selected regions using an unpaired Student's t-test. The binding sites of almost all TFs are closer to genes than the randomly selected regions. (**) P < 0.01, (***) P < 0.001 (H1: expected distance > observed distance). (D) Enrichment of TF proximal targets in stress-responsive and non-stress-responsive genes. The top panel (red), middle panel (blue), and bottom panel (green) show the enrichment of TF targets in up-regulated genes, down-regulated genes, and genes with no significant expression change in response to stresses, respectively. The color range represents the enrichment P value and the circle size represents the odds ratio. Genes with FPKM > 1 were used for the analysis.
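The odds ratio shown as circle size in panel D can be illustrated with a minimal sketch based on a 2×2 contingency table. The function name, the table layout, and the example counts are all hypothetical; the caption does not state which statistical test produced the enrichment P value, so only the odds ratio itself is computed here.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table (hypothetical layout):

                        responsive   not responsive
        TF target           a              b
        not a target        c              d

    Returns infinity when b * c is zero (a simplifying assumption).
    """
    if b * c == 0:
        return float("inf")
    return (a * d) / (b * c)


# Made-up counts: 30 of 100 targets respond, versus 100 of 900 non-targets.
example = odds_ratio(30, 70, 100, 800)   # (30 * 800) / (70 * 100)
```

An odds ratio above 1 indicates that targets of the TF are over-represented in the gene set being tested.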
Figure 4. Pervasive association of TFBSs and TEs in Tu. (A) Proportion of the TFBSs that occurred in TEs with (dark blue) or without (light blue) DHSs and non-TE regions with (dark red) or without (light red) DHSs. The numbers of all TFBSs are shown on the right. (B) DAP-seq density and AP2-DREB-7 ChIP-seq density distribution of non-TE TFBSs and TE-embedded TFBSs with or without DHSs. (C) Epigenetic profiles of TE-embedded and non-TE TFBSs. All figures represent the average signal density at 50-bp resolution within a 4-kb window centered on peak summits. Top panel: Regulatory histone marks, including H3K4me3, H3K27me3, and H3K9ac. Bottom panel: DNA methylation levels in three contexts. (D,E) Distribution of motifs (D) and conservation levels (E) in non-TE TFBSs and TE-embedded TFBSs with or without DHSs. (D) The number of motif occurrences (bin size 50 bp) within a 4-kb window centered on the merged TFBS centers. The unions of the primary motifs of these TFs were used. (E) Conservation score is a measure of sequence conservation across wheat species. The 0.33 quantile (0.16) and 0.66 quantile (0.25) of the conservation score of all peaks were used to define the degree of conservation: 0 < score < 0.16 for low conservation, 0.16 ≤ score < 0.25 for median conservation, and score ≥ 0.25 for high conservation. For each TFBS set, the number of the TFBSs in each conservation category was compared with a randomly selected set using a χ2 test. (***) P < 0.001. (F) Specific TE families enriched among TFBSs. Blue dots and brown triangles represent families contributing to TE-embedded TFBSs overlapping and not overlapping with DHSs, respectively. Highly enriched TE families (enrichment score > 9) are labeled with family names. (G) Percentage of TFBSs in TE-embedded regions with or without DHSs. The color range and circle size represent the percentage of TFBSs overlapping with TEs. (H) TE copy number (line plot) of each family (represented by different colors) during evolution. The genome sizes are shown as a bar plot, light blue representing TEs and light gray representing non-TEs. (I) Dendrogram showing the sequence similarity between RLG family 13 members. (J) Age of different groups of RLG family 13 measured by sequence similarity of LTRs from both ends. A Wilcoxon signed-rank test was used to compare the LTR distance of different groups. (**) P < 0.01 (H1: the LTR sequences of TEs with DHSs and TFBSs were more divergent than those of other TEs).
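The conservation categories defined in panel E of Figure 4 can be written as a small classification rule. This is a minimal sketch: the function and constant names are hypothetical, and returning `None` for scores at or below zero is an assumption, since the caption only defines categories for scores greater than 0.

```python
LOW_CUTOFF = 0.16    # 0.33 quantile of conservation scores, from the caption
HIGH_CUTOFF = 0.25   # 0.66 quantile of conservation scores, from the caption


def conservation_category(score):
    """Assign a TFBS to a conservation category by its score.

    Scores <= 0 fall outside the ranges given in the caption, so this
    sketch returns None for them (an assumption, not the authors' rule).
    """
    if score <= 0:
        return None
    if score < LOW_CUTOFF:
        return "low"
    if score < HIGH_CUTOFF:
        return "median"
    return "high"


# Example: classify a few made-up scores.
labels = [conservation_category(s) for s in (0.05, 0.16, 0.2, 0.25)]
```

Counting TFBSs per category for each set, and for a matched random set, would give the contingency tables that the caption compares using the χ2 test.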
Figure 5. Ongoing degeneration of remnant TEs to TFBSs in non-TE regions. (A) Left: Fraction of TE-embedded TFBSs showing high sequence similarity to non-TE TFBSs. Right: Fraction of non-TE TFBSs with high sequence similarity to TE-embedded TFBSs. (B) Multiple sequence alignment of one cluster of TE-embedded and TE-derived TFBSs based on sequence similarity (n = 2034). The alignment in the red circle is enlarged on the right. The alignment also shows a particularly high degree of sequence identity for the WRKY binding motif. (C) Fractions of homologous sequences in other species for TE-embedded TFBSs, TE-derived TFBSs in non-TE regions, and other non-TE TFBSs. (D) Enriched TE subfamilies with TFBSs showing sequence similarity to non-TE TFBSs. (E) TE-derived TFBSs were grouped according to the sequence divergence with TEs. Level 1 represents low divergence and level 4 represents high divergence. (F) Distribution of the distance between TFBSs and the proximal genes. TFBSs were classified as TE-embedded, TE-derived, and TE-free; 30,000 non-TFBS regions were randomly sampled from genomic loci without TFBSs. A Wilcoxon signed-rank test was used to compare the TE-derived TFBSs and TE-embedded TFBSs. (***) P < 0.001 (H1: TE-derived TFBSs were closer to genes than TE-embedded TFBSs). (G) Distribution of sequence conservation for different groups of TFBSs and non-TFBSs in TEs. A Wilcoxon signed-rank test was used to compare the TE-derived TFBSs and TE-embedded TFBSs. (***) P < 0.001 (H1: TE-derived TFBSs were more conserved than TE-embedded TFBSs). (H–K) Epigenetic feature distribution of TFBSs embedded in TEs or localized to non-TE regions and non-TFBSs. (H) DNA methylation. (I) DHS density. (J,K) Regulatory histone mark distribution. A Wilcoxon signed-rank test was used to compare the TE-derived TFBSs and TE-embedded TFBSs. (***) P < 0.001. (For H, H1: TE-derived TFBSs had lower methylation levels than TE-embedded TFBSs. For I–K, H1: TE-derived TFBSs had more active epigenetic signatures than TE-embedded TFBSs.)
Figure 6. TE-derived TFBSs have wired new genes into the regulatory network of wheat environmental responses. (A) Fraction of TE-derived TFBSs and TE-free TF targets induced by abiotic stresses. (B) Ratio of unique response genes in wheat and commonly induced genes in Tu and Os. Orange spots represent TE-derived TFBSs. Black spots represent TE-free TFBSs. Only TFs with more than 20 targets induced by abiotic stresses were kept. (C) Ka/Ks ratio of TFs between Os and Tu. The values for 1:1 orthologous TFs are shown on top, and the line plot represents the background of Ka/Ks distribution for all 1:1 orthologous genes between Os and Tu. TFs with Ka/Ks ratios greater than the median of all 1:1 orthologous genes are in dark orange; other TFs are displayed in light orange. (D) Network showing incorporation of new stress-responsive genes by TE-derived TFBSs. TEs in A are shown. (E) Model illustrating the rewiring of the gene regulatory network by TE-derived TFBSs. Left: Some TFBSs or TFBS precursors exist within specific TEs, transposition of which leads to expansion of corresponding TFBSs or precursors. Right: Transposed TEs degenerated and lost typical TE structures, but some TFBSs present in TEs were evolutionarily selected for regulating nearby gene activity. The closer the TE-derived TFBS is to genes, the stronger the regulatory activity. The reverse arrow at the bottom illustrates that some TE-embedded TFBSs may be hijacked from the non-TE TFBSs.