Jan Deneweth, Yves Van de Peer, Vanessa Vermeirssen.
Abstract
BACKGROUND: Transposable elements (TEs) make up a large portion of many plant genomes and play innovative roles in genome evolution. Several TEs can contribute to gene regulation by influencing expression of nearby genes as stress-responsive regulatory motifs. To delineate TE-mediated plant stress regulatory networks, we took a 2-step computational approach consisting of identifying TEs in the proximity of stress-responsive genes, followed by searching for cis-regulatory motifs in these TE sequences and linking them to known regulatory factors. Through a systematic meta-analysis of RNA-seq expression profiles and genome annotations, we investigated the relation between the presence of TE superfamilies upstream, downstream or within introns of nearby genes and the differential expression of these genes in various stress conditions in the TE-poor Arabidopsis thaliana and the TE-rich Solanum lycopersicum.
Keywords: Gene regulation; Plant genomes; Regulatory networks; Stress; Transposable elements
Year: 2022 PMID: 34983397 PMCID: PMC8725346 DOI: 10.1186/s12864-021-08215-8
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Positioning and abundance of TEs nearby protein-coding genes. A) Three genomic positionings of TEs relative to genes were considered: “upstream” contains TEs within 1 kb upstream of the gene, “downstream” contains those within 1 kb downstream of the gene, and “intron” contains TEs within introns. Only the transposon boundary closest to the gene was considered for this classification. B) Well-defined TE superfamilies (TEFs) adjacent to protein-coding genes in A. thaliana and the proportion of gene-proximal TEs they hold. ‘Rest’ indicates the sum of Mariner, ATDNA12T3_2, Tc1 and ATREP18 TEFs. C) Well-defined TE superfamilies (TEFs) adjacent to protein-coding genes in S. lycopersicum and the proportion of gene-proximal TEs they hold. ‘Rest’ indicates the sum of Mariner, Retrotransposon, Helitron and TRIM_LARD TEFs
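The classification in panel A can be illustrated with a minimal Python sketch. The `Gene` class and the `classify_te` function are hypothetical names introduced here. Two further assumptions go beyond the caption: upstream and downstream are interpreted relative to the gene's strand, and a TE counts as intronic only when it lies entirely within one intron. This is not the study's own code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

WINDOW = 1000  # 1 kb maximum distance, as stated in the caption


@dataclass
class Gene:
    start: int                      # leftmost genomic coordinate (inclusive)
    end: int                        # rightmost genomic coordinate (inclusive)
    strand: str                     # "+" or "-"
    introns: List[Tuple[int, int]]  # (start, end) of each intron


def classify_te(te_start: int, te_end: int, gene: Gene,
                window: int = WINDOW) -> Optional[str]:
    """Return 'upstream', 'downstream', 'intron', or None if not gene-proximal."""
    # Assumption: an intronic TE lies entirely inside a single intron.
    for intron_start, intron_end in gene.introns:
        if intron_start <= te_start and te_end <= intron_end:
            return "intron"

    # TE lies to the left of the gene: its closest boundary is te_end.
    if te_end < gene.start:
        if gene.start - te_end <= window:
            return "upstream" if gene.strand == "+" else "downstream"
        return None

    # TE lies to the right of the gene: its closest boundary is te_start.
    if te_start > gene.end:
        if te_start - gene.end <= window:
            return "downstream" if gene.strand == "+" else "upstream"
        return None

    # TE overlaps an exon or a gene boundary: not handled in this sketch.
    return None
```

Because only the closest boundary is measured, a long TE whose far end lies several kb away still counts as gene-proximal when its near end falls within the 1 kb window.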
Number of genes and TEs in the genomes of A. thaliana and S. lycopersicum. For well-defined TE superfamilies (TEFs) the number of TE-proximal genes and the number of TEs in the different genomic positionings with respect to their nearby protein-coding genes are given
| Species | Genes | TE-proximal genes | UP | IN | DOWN | Gene-proximal TEs | Total TEs |
|---|---|---|---|---|---|---|---|
| A. thaliana | 27,420 | 9540 | 10,099 | 2163 | 6997 | 14,420 | 31,189 |
| S. lycopersicum | 33,697 | 22,949 | 17,175 | 39,507 | 15,668 | 59,236 | 531,409 |
‘Genes’ is the total number of protein-coding genes annotated in the genome of the species. ‘TE-proximal genes’ is the total number of protein-coding genes adjacent to well-defined TEFs. Upstream (UP), intragenic (IN) and downstream (DOWN) refer to the positioning of TEs relative to adjacent genes, up to a maximum distance of 1 kb. ‘Gene-proximal TEs’ is the unique total number of TEs in well-defined TEFs adjacent to protein-coding genes, hence a combination of the upstream, intragenic and downstream positioning, where the same TE can be in different genomic locations for different genes. ‘Total TEs’ gives the total number of TEs annotated in the genome of the species
Fig. 2 Fold enrichment of differentially expressed genes near specific TE superfamilies (TEFs) upon stress for A) A. thaliana and B) S. lycopersicum. The Chi-squared test was conducted separately for up- and downregulated genes with TEFs upstream, downstream or within introns. The intensity from yellow to red reflects the enrichment score with values between 1 and 4.5, as compared to all differentially expressed genes near all TEFs in that specific genomic positioning and stress condition. The significance of enrichment is indicated within the tiles: * = FDR adjusted p-value < 0.05, ** = FDR adjusted p-value < 0.01, *** = FDR adjusted p-value < 0.001. We additionally filtered out significant results for which the observed number of differentially expressed genes near a TEF was less than 5 and the expected number was less than 2. Only TEFs, stress conditions and genomic positionings for which a valid enrichment was found are shown. A. thaliana: heat_B = 1 h incubation at 44 °C - leaves, paraquat_A = spray with 25 μM paraquat, photorespiratory_mutant_B = SHORT_ROOT (shr) mutant – 24 h photorespiratory stress, proteasome_inh_A = 100 μM proteasome inhibitor MG132, proteasome_mutant_B = rpn-10 mutant – RPN10 is a subunit of the 26S proteasome, salt_heat_A = 150 mM NaCl for 15 days + 1 h incubation at 44 °C – leaves. S. lycopersicum: hormone_B = 48 h after treatment with ACC (ethylene precursor), infection_necrotrophic_A = infection by Colletotrichum gloeosporioides - leaves, infection_necrotrophic_C = infection by Pseudomonas syringae pv. tomato DC3000 - leaves, infection_viral_A = infection by Tomato yellow leaf curl virus - leaves, light_A = constant shade – shoot apical meristem / leaf primordia, light_B = constant sun – shoot apical meristem / leaf primordia, light_C = sun to shade – shoot apical meristem / leaf primordia, light_D = constant sun - shoot apical meristem / leaf primordia, stress_tolerance_A = male-sterile, stress tolerant mutant
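The statistics named in this caption (fold enrichment, Chi-squared test, FDR adjustment and the observed/expected filter) can be sketched in plain Python as follows. The function names are hypothetical. The 2×2 layout (one TEF versus all other TEFs in the same positioning, differentially expressed versus not), the absence of a continuity correction and the use of the Benjamini-Hochberg procedure for FDR adjustment are assumptions, since the caption does not specify them.

```python
import math
from typing import List, Tuple


def chi2_pvalue_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared p-value (1 df, no continuity correction).

    Assumed layout: a = DE genes near the TEF, b = non-DE genes near the TEF,
    c = DE genes near other TEFs, d = non-DE genes near other TEFs.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom: P(X > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2))


def fold_enrichment(de_tef: int, n_tef: int,
                    de_all: int, n_all: int) -> Tuple[float, float]:
    """Return (fold enrichment, expected DE count) for one TEF."""
    expected = n_tef * de_all / n_all
    return de_tef / expected, expected


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """FDR-adjusted p-values (Benjamini-Hochberg), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def keep_result(observed: int, expected: float) -> bool:
    """Filter as worded in the caption: drop if observed < 5 and expected < 2."""
    return not (observed < 5 and expected < 2)
```

For example, with 20 of 100 genes near a TEF differentially expressed, against 200 of 4000 genes near any TEF, `fold_enrichment(20, 100, 200, 4000)` gives an expected count of 5 and a fold enrichment of 4.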
Most significant cis-regulatory motifs detected de novo by RSAT peak-motifs in TE sequences adjacent to stress-responsive genes in A. thaliana
We only analyzed sequences of enriched TEF members near stress-responsive genes in a specific genomic positioning (upstream, intron, downstream). All other TE sequences in the same genomic positioning were taken as background. Detected motifs were compared to the motif databases Cistrome, footprintDB-plants, JASPAR core non-redundant plants and cisBP. Only the most significant sequence logo(s) are displayed. N.S. = non-significant
Most significant cis-regulatory motifs detected de novo by RSAT peak-motifs in TE sequences adjacent to stress-responsive genes in S. lycopersicum
We only analyzed sequences of enriched TEF members near stress-responsive genes in a specific genomic positioning (upstream, intron, downstream). All other TE sequences in the same genomic positioning were taken as background. Detected motifs were compared to the motif databases Cistrome, footprintDB-plants, JASPAR core non-redundant plants and cisBP. Only the most significant sequence logo(s) are displayed. N.S. = non-significant
Most significant plant cis-regulatory motifs detected by RSAT dna-pattern in TE sequences adjacent to stress-responsive genes for A. thaliana (Atha) and S. lycopersicum (Slyc)
| Sample | Motif pattern | Name | Percentage in gene-proximal TEs (%) | Enrichment | Adjusted p-value |
|---|---|---|---|---|---|
| Atha_proteasome_mutant_B_ intron_down_ | GTTAGGTTC | ACIII element (MYB) | 17 | 49.4 | 0.0174 |
| Atha_proteasome_mutant_B_ intron_up_ | tACACGbmACyk | NAC019 | 20 | 177.8 | 0 |
| | vyaCACGgmAcyr | NAC055 | 20 | 177.8 | 0 |
| | aYACGCAA | NAC080 | 50 | 22.2 | 5.83E-06 |
| | mrCACGTGyk | MYC4 (BHLH) | 20 | 118.5 | 3.45E-05 |
| | rrCACGTGyy | ILR3 (BHLH) | 20 | 59.3 | 0.00049 |
| Atha_proteasome_mutant_B_ downstream_up_ | GTGGaCCCrs | TCP16 | 10 | 889 | 0 |
| | TACCGACGA | DRE-like | 10 | 889 | 0 |
| | GGCCGACGT | DRE-like | 10 | 592.7 | 0 |
| | mrCACGTGyk | MYC4 (BHLH) | 10 | 592.7 | 0 |
| | dwwkvhsACGTGKCa | GBF3 (bZIP) | 10 | 444.5 | 0 |
| | vGAAssTTCy | | 10 | 63.5 | 0 |
| Atha_photorespiratory_mutant_B_upstream_down_ | CAATGATTG | AtHB5 | 29 | 36.8 | 0.0005 |
| | CAATSATTG | AtHB2 | 29 | 36.8 | 0.0005 |
| | yCAATCAWtg | AtHB7 | 29 | 29.6 | 0.0009 |
| | wAATATATTw | AHL20 (AT-hook) | 57 | 4.5 | 0.0136 |
| Atha_paraquat_A_ intron_down_ | wawawAAATATCtwa | AT3G10113 ( | 14 | 84.7 | 0.0265 |
| | aAAATATCTt | CCA1 ( | 29 | 16.9 | 0.0322 |
| | awycTTATCtthwy | AT3G11280 ( | 14 | 50.8 | 0.0322 |
| | AGAAATTTCT | HSEs binding site motif | 14 | 28.2 | 0.0449 |
| | TACGTACAA | SBP-box (zinc finger) | 14 | 31.8 | 0.0449 |
| Atha_heat_B_ upstream_up_ | ACAGAG | REF6 | 32 | 2.3 | 0.0088 |
| | TGGGCY | SITEIIATCYTC (TCP) | 25 | 2.1 | 0.0326 |
| | ayACGywAy | AtNAC6 | 13 | 2.8 | 0.0394 |
| Atha_heat_B_ intron_up_ | AGCCGACGA | DRE-like | 11 | 65.9 | 0.0180 |
| Atha_salt_heat_A_ upstream_up_ | ACAGAG | REF6 | 28 | 2.1 | 0.0212 |
| | ayACGywAy | AtNAC6 | 14 | 2.9 | 0.0224 |
| | GGGCC | SORLIP2 | 22 | 2.3 | 0.0228 |
| | TGGGCY | SITEIIATCYTC (TCP) | 25 | 2.1 | 0.0237 |
| Atha_salt_heat_A_ intron_up_ | AGCCGACGA | DRE-like | 9 | 53.9 | 0.0295 |
| Slyc_infection_necrotrophic_C_ upstream_down_ | GATAAGR | I-box core | 71 | 3.7 | 0.0664 |
| Slyc_infection_necrotrophic_C_ intron_down_ | wAAwwwwTTw | AHL12 (AT-hook) | 94 | 3.9 | 1.03E-09 |
| | rTTTAAAh | TCX6 (CXC) | 72 | 3.6 | 1.60E-05 |
| | rTTTrAAw | SOL1 (CXC) | 83 | 2.7 | 2.98E-05 |
| | dAwTTAAwTw | AGF1 (AT-hook) | 56 | 5.0 | 3.38E-05 |
| | rwWAAmGT | COG1 (DOF) | 78 | 2.7 | 0.0001 |
| Slyc_infection_necrotrophic_A_ downstream_up_ | CCAATAAAGG | CArG-box (MADS) | 13 | 69.7 | 0.0343 |
| | CCTTTATTGG | CArG-box (MADS) | 13 | 69.7 | 0.0343 |
| Slyc_infection_viral_A_ intron_up_ | yaahawhwwCAmCAACawyahh | AT1G18960 ( | 10 | 135.5 | 0.0037 |
| | wwwwwTdACCGTTrr | MYB3R1 ( | 10 | 125.4 | 0.0037 |
| | wthwwwACCGTTA | LOF2 ( | 10 | 80.6 | 0.0068 |
| | GGCCGACAA | DRE-like | 10 | 69.1 | 0.0068 |
| | tmayTAATyAhgwww | ZFHD2 | 10 | 51.3 | 0.0101 |
| Slyc_infection_viral_A_ intron_up_ | ATATTTAWW | SEF1MOTIF | 67 | 5.4 | 4.57E-06 |
| | wAAwwwwTTw | AHL12 (AT-hook) | 83 | 3.4 | 5.42E-06 |
| | tAWWTAWWta | AHL13 (AT-hook) | 56 | 4.6 | 0.0001 |
| | AAATTAAA | Bellringer/replumless/pennywise (AG/HD) | 56 | 4.5 | 0.0001 |
| | ATtwawaATTwAATt | AT1G76110 (HMG/ARID) | 11 | 78.4 | 0.0002 |
| | dACCGGTw | | 11 | 7.1 | 0.0379 |
| Slyc_stress_tolerance_A_ intron_down_ | wwwCGhATwWT | AtHB32 ( | 12 | 3.6 | 0.0338 |
| | kATGTTGC | TEM2 ( | 17 | 2.7 | 0.0414 |
| | AAATTAAA | Bellringer/replumless/pennywise (AG/ | 29 | 2.3 | 0.0220 |
| | TTWTWTTWTT | MARTBOX | 39 | 2.3 | 0.0042 |
| | TTNCGTA | NAC binding site | 24 | 2.2 | 0.0402 |
| Slyc_light_A_ downstream_up_ | CATTAATTAG | Soybean homeodomain leucine zippers (GmHdl56, GmHdl57) | 18 | 56.1 | 8.62E-06 |
| | TTTTACTAGT | SORLREP1 | 14 | 29.9 | 0.0008 |
| | yGCCGCC | ERF2 (tobacco) | 23 | 8.7 | 0.0033 |
| | rCACGTGy | BHLH3 | 18 | 10.8 | 0.0039 |
| | ywTTTACyGc | BRADI1G77610 (MYB) | 14 | 13.3 | 0.0066 |
| Slyc_light_A_ intron_up_ | CATTAATTAG | Soybean homeodomain leucine zippers (GmHdl56, GmHdl57) | 13 | 50 | 7.02E-15 |
| | dwwGAAATGAwr | AT2G31460 (auxin response factor 70) | 16 | 5.6 | 1.68E-05 |
| | KWGTGRWAAWRW | GT-1 motif rbcS (pea) | 11 | 2.7 | 0.0440 |
| | wgawAAmGt | DOF4.7 | 17 | 2.1 | 0.0491 |
| | wtcaGTTr | AtMYB87 | 21 | 2.0 | 0.0225 |
| Slyc_light_B_ intron_up_ | CATTAATTAG | Soybean homeodomain leucine zippers (GmHdl56, GmHdl57) | 12 | 44.7 | 1.23E-11 |
| | dwwGAAATGAwr | AT2G31460 (auxin response factor 70) | 12 | 4.3 | 0.0048 |
| Slyc_light_C_ intron_up_ | CATTAATTAG | Soybean homeodomain leucine zippers (GmHdl56, GmHdl57) | 15 | 55.3 | 1.84E-15 |
| | dwwGAAATGAwr | AT2G31460 (auxin response factor 70) | 15 | 5.3 | 0.0002 |
| | waATgAtTAh | YAB5 (YABBY) | 11 | 4.0 | 0.0087 |
| Slyc_light_D_ intron_up_ | CATTAATTAG | Soybean homeodomain leucine zippers (GmHdl56, GmHdl57) | 15 | 54.6 | 2.18E-15 |
| | dwwGAAATGAwr | AT2G31460 (auxin response factor 70) | 15 | 5.2 | 0.0002 |
| | waATgAtTAh | YAB5 (YABBY) | 11 | 4.0 | 0.0101 |
| Slyc_light_C_ intron_up_ | AACCAAAC | | 15 | 3.2 | 0.0002 |
| | ACCAAAC | | 27 | 2.6 | 7.97E-06 |
| | rymAGTTA | | 32 | 2.1 | 4.21E-05 |
| | TATTAG | CPBCSPOR | 51 | 2.0 | 4.27E-08 |
| Slyc_light_D_ intron_up_ | TATTAG | CPBCSPOR | 53 | 2.1 | 2.02E-10 |
| | rymAGTTA | | 31 | 2.1 | 2.85E-05 |
| | ACCAAAC | | 27 | 2.6 | 3.29E-06 |
| | CATGCAT | RY repeat motif (soybean) | 18 | 2.0 | 0.0134 |
| | AACCAAAC | | 15 | 3.3 | 4.67E-05 |
| | CTAACCA | | 14 | 2.1 | 0.0283 |
| | rwakATtCyc | | 11 | 3.4 | 0.0007 |
We analyzed the overrepresentation of 2735 known plant TFBS collected from footprintDB, AGRIS, PLACE and the literature (Methods) in sequences of enriched TEF members near stress-responsive genes in a specific genomic positioning (upstream, intron, downstream) as compared to all TE sequences near genes in that genomic positioning. To limit false positives, we only considered motifs that were present in at least 10% of the TEs and that were at least two times overrepresented. We here display only the 5 most significant known motifs, in addition to any de novo detected motifs or relevant stress-related motifs. Matching TF families between the tools peak-motifs and dna-pattern are highlighted in bold
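The two thresholds described above (a motif present in at least 10% of the TEs and at least two-fold overrepresented) can be illustrated with a minimal Python sketch that scans sequences for an IUPAC motif pattern. This is not RSAT dna-pattern itself. The function names are hypothetical, and three choices are assumptions: both strands are searched, lowercase letters in a pattern are treated like uppercase, and a motif absent from the background passes the fold threshold.

```python
import re
from typing import List

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif: str) -> str:
    """Reverse complement of an IUPAC motif, returned in uppercase."""
    return motif.upper().translate(COMPLEMENT)[::-1]


def motif_regex(motif: str) -> "re.Pattern[str]":
    """Compile an IUPAC motif into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


def fraction_with_motif(motif: str, sequences: List[str]) -> float:
    """Fraction of sequences with at least one match on either strand."""
    forward = motif_regex(motif)
    reverse = motif_regex(reverse_complement(motif))
    hits = 0
    for seq in sequences:
        seq = seq.upper()
        if forward.search(seq) or reverse.search(seq):
            hits += 1
    return hits / len(sequences)


def passes_filter(motif: str, foreground: List[str], background: List[str],
                  min_fraction: float = 0.10, min_fold: float = 2.0) -> bool:
    """Apply the >= 10% presence and >= 2-fold overrepresentation thresholds."""
    fg = fraction_with_motif(motif, foreground)
    bg = fraction_with_motif(motif, background)
    if fg < min_fraction:
        return False
    if bg == 0:
        return True  # assumption: a motif found only in the foreground passes
    return fg / bg >= min_fold
```

Here `foreground` stands for the TE sequences of an enriched TEF near stress-responsive genes, and `background` for all TE sequences near genes in the same genomic positioning.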
Fig. 3 TE-mediated heat and proteotoxic stress gene regulatory network for A. thaliana. Copia elements in upstream regions and Gypsy elements in introns of heat-responsive genes recruited specific regulatory factors. Also, Gypsy elements within introns and downstream regions and SINE within introns of proteotoxic stress-responsive genes hosted cis-regulatory motifs targeted by specific TFs. We can distinguish several regulons, related to the different TEF-differentially expressed genes associations from left to right: SINE/proteasome mutant targeted by ARR18 (grey), SINE/proteasome inhibitor targeted by REF6 (green), Copia/salt-heat targeted by BPC1, ZML2, REF6, NAC6, TCP and RAMOSA1 (orange), Copia/heat targeted by BPC1, ZML2, REF6, NAC6, TCP and RAMOSA1, in addition to HSFB2A and S1FA3 (red), Gypsy downstream/proteasome mutant targeted by HSFB2A, S1FA3, TCP16, AT2G01818, DREB/CBF, GBF3 and MYC4 (darkblue), Gypsy intron/proteasome mutant targeted by HSFB2A, S1FA3, NAC, MYC4 and ILR3 (lightblue), Gypsy/heat targeted by DREB/CBF (purple) and Gypsy/salt-heat targeted by DREB/CBF (pink)