Juzuo Li, Ning Li, Ling Zhu, Zhibin Zhang, Xiaochong Li, Jinbin Wang, Hongwei Xun, Jing Zhao, Xiaofei Wang, Tianya Wang, Hongyan Wang, Bao Liu, Yu Li, Lei Gong.
Abstract
Plant long non-coding RNAs (lncRNAs) function in diverse biological processes, and lncRNA expression is under epigenetic regulation, including by cytosine DNA methylation. However, it remains unclear whether 5-methylcytosine (5mC) plays a similar role in different sequence contexts (CG, CHG, and CHH). In this study, we characterized and compared the genome-wide lncRNA profiles (including long intergenic non-coding RNAs [lincRNAs] and long noncoding natural antisense transcripts [lncNATs]) of a null mutant of the rice DNA methyltransferase 1, OsMET1-2 (designated OsMET1-2-/-), and its isogenic wild type (OsMET1-2+/+). The En/Spm transposable element (TE) family, which was heavily methylated in OsMET1-2+/+, was transcriptionally de-repressed in OsMET1-2-/- due to genome-wide erasure of CG methylation, and this led to abundant production of specific lncRNAs. In addition, RdDM-mediated CHH methylation was increased in the 5'-upstream genomic regions of lncRNAs in OsMET1-2-/-. The positive correlation between the expression of lincRNAs and that of their proximal protein-coding genes was also analyzed. Our study shows that CG methylation negatively regulates the TE-related expression of lncRNAs and demonstrates that CHH methylation is also involved in the regulation of lncRNA expression.
Keywords: DNA methylation; OsMET1-2; RNA-directed DNA methylation (RdDM); long non-coding RNAs (lncRNAs); small interference RNA (siRNA); transposable element
Year: 2021 PMID: 33617633 PMCID: PMC8049413 DOI: 10.1093/g3journal/jkab049
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Identification and characterization of long non-coding RNAs (lncRNAs) in OsMET1-2+/+ and OsMET1-2−/−. (A) Workflow of the lncRNA identification pipeline developed in this study. The parenthesized numbers in blue and red denote the number of reads or transcripts passed to the following step. The frames in gradient colors specify the database(s) and/or tools used in each step. (B) Venn diagrams showing the numbers of lincRNAs and lncNATs shared (common) between OsMET1-2+/+ (blue) and OsMET1-2−/− (red) and those identified in only one sample (wild-type specific and mutant specific). The exact number of lncRNAs in each category is listed beneath the category name. (C) Proportions of lncRNA transcripts (lincRNAs and lncNATs) and the adjacent PCgenes in OsMET1-2+/+ and OsMET1-2−/−, categorized by exon number. (D) Proportions of lncRNA transcripts (lincRNAs and lncNATs) and the adjacent PCgenes in OsMET1-2+/+ and OsMET1-2−/−, categorized by transcript length. (E) Cumulative frequency curves of the transcript abundances of lincRNAs, lncNATs, and PCgenes. The x-axis shows, for each transcript category, the log2FC (fold change) of Reads Per Kilobase per Million mapped reads (RPKM); the y-axis shows the cumulative frequency within each transcript category.
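Panel E reports abundance as RPKM and its log2 fold change. As a minimal sketch of those two quantities, the snippet below uses the standard RPKM definition; the read counts, the transcript length, and the pseudocount of 1 are hypothetical values chosen for illustration and are not taken from the study.

```python
import math


def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of transcript per Million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("transcript length and library size must be positive")
    # reads / (length in kb) / (mapped reads in millions)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def log2_fold_change(mutant: float, wild_type: float, pseudocount: float = 1.0) -> float:
    """log2(mutant / wild type); the pseudocount (an assumption) avoids division by zero."""
    return math.log2((mutant + pseudocount) / (wild_type + pseudocount))


# Hypothetical counts for one 2 kb transcript in two libraries of 10 million mapped reads.
wt_rpkm = rpkm(500, 2000, 10_000_000)
mut_rpkm = rpkm(2100, 2000, 10_000_000)
lfc = log2_fold_change(mut_rpkm, wt_rpkm)
print(wt_rpkm, mut_rpkm, round(lfc, 2))
```

A positive log2FC here means the transcript is more abundant in the mutant library than in the wild-type library.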
Figure 2. Genomic regions of CG hypomethylation in OsMET1-2+/+ expressing mutant-specific lncRNAs after null mutation of the OsMET1-2 gene. (A) Boxplots of the CG methylation levels of genomic regions (core body and 2 kb up-/downstream flanking regions) expressing common and mutant-specific lncRNAs (lincRNAs and lncNATs) in OsMET1-2+/+ and OsMET1-2−/−. The Wilcoxon test is used to assess statistical significance between the paired sample sets. One asterisk (*), two asterisks (**), and three asterisks (***) denote P-values significant at the 0.05, 0.01, and 0.001 levels, respectively. (B) Boxplots of the weighted mean CG methylation levels of randomly bootstrap-sampled genomic regions and of genomic regions expressing common and mutant-specific lncRNAs (lincRNAs and lncNATs) in OsMET1-2+/+. An independent two-sample t-test is used, with significance denoted by the same P-value cutoffs as above. (C) Density curves of the percentages of randomly bootstrap-sampled intergenic (left) and antisense genic (right) regions overlapping with differentially methylated regions (DMRs), with arrows marking the observed percentages of common and mutant-specific lncRNAs (lincRNAs and lncNATs) derived from the DMRs. In each bootstrapping test, 1000 sets of genomic regions are randomly re-sampled, each matching the corresponding lncRNAs (lincRNAs and lncNATs) in number and length. Within each re-sampled set, the proportion of regions overlapping with DMRs is calculated, and the resulting 1000 proportions are summarized in each density curve. The observed proportion of lncRNAs occurring in DMRs is denoted by the arrow, and the P-value of each bootstrapping test is given next to the arrow.
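The re-sampling test in panel C can be sketched roughly as follows. This is an illustrative simplification, not the authors' code: it places regions uniformly on a single toy sequence instead of restricting them to intergenic or antisense genic space, and the empirical P-value formula, the function names, and all coordinates are assumptions.

```python
import random
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one toy sequence


def overlaps_any(region: Interval, dmrs: List[Interval]) -> bool:
    """True if the region shares at least one base with any DMR."""
    start, end = region
    return any(start < d_end and d_start < end for d_start, d_end in dmrs)


def overlap_proportion(regions: List[Interval], dmrs: List[Interval]) -> float:
    """Fraction of regions that overlap a DMR."""
    return sum(overlaps_any(r, dmrs) for r in regions) / len(regions)


def resampling_test(observed: List[Interval], dmrs: List[Interval],
                    sequence_length: int, n_sets: int = 1000, seed: int = 0):
    """Compare the observed overlap proportion with length-matched random sets."""
    rng = random.Random(seed)
    lengths = [end - start for start, end in observed]
    observed_prop = overlap_proportion(observed, dmrs)
    null_props = []
    for _ in range(n_sets):
        # Same number and lengths as the observed regions, random positions.
        sampled = []
        for length in lengths:
            start = rng.randrange(0, sequence_length - length + 1)
            sampled.append((start, start + length))
        null_props.append(overlap_proportion(sampled, dmrs))
    # One-sided empirical P-value with a +1 correction (an assumed convention).
    exceed = sum(p >= observed_prop for p in null_props)
    p_value = (exceed + 1) / (n_sets + 1)
    return observed_prop, null_props, p_value


# Hypothetical coordinates: two DMRs and three lncRNA loci on a 100 kb sequence.
toy_dmrs = [(1_000, 2_000), (50_000, 51_000)]
toy_lncrnas = [(1_500, 1_800), (50_500, 50_900), (70_000, 70_400)]
observed_prop, null_props, p_value = resampling_test(toy_lncrnas, toy_dmrs, 100_000)
print(observed_prop, p_value)
```

The list `null_props` plays the role of the 1000 proportions summarized by each density curve, and `observed_prop` corresponds to the position of the arrow.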
Figure 3. LncRNA and mRNA transcripts generated by TEs in OsMET1-2+/+ and OsMET1-2−/−. (A) Proportions of lncRNA transcripts (lincRNAs and lncNATs) and genomic mRNAs with at least one exon overlapping with TEs (by at least 10 bp). (B) Proportions of common and mutant-specific lncRNAs (lincRNAs and lncNATs) overlapping with each type of TE (by at least 10 bp). The parenthesized number denotes the total number of each TE type in the genome.
The weighted mean cytosine DNA methylation levels of protein coding genes, TE-related genes, all TE types, and each specific type of TEs in OsMET1-2+/+ and OsMET1-2−/−
| Category | CG, OsMET1-2+/+ | CG, OsMET1-2−/− | CG, Decreased (%) | CHG, OsMET1-2+/+ | CHG, OsMET1-2−/− | CHG, Decreased (%) | CHH, OsMET1-2+/+ | CHH, OsMET1-2−/− | CHH, Decreased (%) |
|---|---|---|---|---|---|---|---|---|---|
| Protein coding genes | 26.40 | 3.30 | −87.60 | 9.20 | 6.70 | −26.50 | 2.10 | 0.70 | −65.50 |
| TE-related genes | 85.40 | 18.40 | −78.40 | 64.70 | 51.20 | −20.90 | 4.90 | 2.20 | −53.90 |
| Total repeats | 83.60 | 18.30 | −78.10 | 54.10 | 43.80 | −19.10 | 26.30 | 10.70 | −59.50 |
| Retrotransposons (Class I/retro TE) | 89.60 | 21.10 | −76.50 | 65.00 | 51.00 | −21.50 | 10.20 | 6.10 | −40.60 |
| Copia | 87.50 | 19.30 | −78.00 | 61.00 | 47.00 | −23.00 | 7.60 | 4.90 | −36.10 |
| Gypsy | 90.80 | 21.70 | −76.10 | 68.70 | 53.70 | −21.80 | 7.00 | 5.50 | −20.60 |
| LTR-other | 84.90 | 23.30 | −72.50 | 59.30 | 48.80 | −17.70 | 10.90 | 9.50 | −12.90 |
| Cassandra | 94.40 | 28.90 | −69.40 | 74.20 | 60.10 | −19.00 | 20.50 | 13.10 | −36.20 |
| Caulimovirus | 94.60 | 24.90 | −73.70 | 81.70 | 73.90 | −9.60 | 3.30 | 4.30 | 29.10 |
| LINE | 82.70 | 17.20 | −79.20 | 61.20 | 55.30 | −9.70 | 4.60 | 2.50 | −45.60 |
| SINE | 87.30 | 18.90 | −78.40 | 54.60 | 42.90 | −21.40 | 23.90 | 7.90 | −66.90 |
| Transposons (Class II/DNA TE) | 78.20 | 16.60 | −78.80 | 47.20 | 38.40 | −18.50 | 22.00 | 9.20 | −58.30 |
| En/Spm | 90.50 | 19.00 | −79.00 | 54.10 | 37.00 | −31.50 | 10.00 | 10.20 | 2.70 |
| MITEs | 83.40 | 18.00 | −78.50 | 52.90 | 43.20 | −18.30 | 33.80 | 12.90 | −61.80 |
| hAT | 79.60 | 14.50 | −81.80 | 37.50 | 23.10 | −38.30 | 12.30 | 5.20 | −57.80 |
| Harbinger | 80.90 | 17.50 | −78.40 | 53.10 | 46.30 | −12.90 | 30.20 | 12.80 | −57.70 |
| Stowaway | 77.20 | 17.20 | −77.70 | 45.70 | 37.40 | −18.10 | 25.40 | 9.20 | −63.50 |
| Tourist | 79.40 | 18.40 | −76.80 | 50.30 | 44.60 | −11.40 | 24.50 | 9.50 | −61.10 |
| MuDR | 87.50 | 21.10 | −75.90 | 53.50 | 47.10 | −12.00 | 16.30 | 6.10 | −62.80 |
| DNA-other | 59.40 | 10.10 | −83.00 | 34.60 | 29.40 | −14.90 | 14.40 | 5.90 | −59.30 |
Within each category, the change in DNA methylation level (in the CG, CHG, and CHH contexts) in OsMET1-2−/− relative to the level in OsMET1-2+/+ is recorded under "Decreased (%)" (negative values denote a decrease, positive values an increase). It is calculated as (OsMET1-2−/− − OsMET1-2+/+)/OsMET1-2+/+ and expressed as a percentage.
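As a quick worked check of the footnote formula, the sketch below recomputes the CG entry of the LINE row from the two rounded levels shown in the table (82.70 and 17.20). The function name is hypothetical. Other rows may differ from a recomputation in the last digit, presumably because the table lists rounded levels.

```python
def relative_change_percent(wild_type_level: float, mutant_level: float) -> float:
    """Percent change of the mutant level relative to the wild-type level."""
    if wild_type_level == 0:
        raise ValueError("wild-type level must be non-zero")
    return (mutant_level - wild_type_level) / wild_type_level * 100.0


# CG methylation of LINE elements, values copied from the table above.
line_cg_change = relative_change_percent(82.70, 17.20)
print(round(line_cg_change, 1))  # -79.2, matching the tabulated -79.20
```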
Figure 4. Weighted mean CHH DNA methylation and siRNA abundance (log2 transformed) of genomic regions (lincRNA bodies and their up-/downstream [±2 kb] regulatory regions) expressing common, mutant-specific, and differentially up- and downregulated lincRNAs in OsMET1-2+/+ and OsMET1-2−/−. (A) Weighted mean CHH DNA methylation and siRNA abundance of genomic regions expressing each class of lincRNAs. (B) Weighted mean CHH DNA methylation and siRNA abundance of genomic regions expressing En/Spm-derived lincRNAs of each class. The gray blocks denote the 5′-upstream regulatory regions (∼250 bp upstream of the transcription start site) where hypermethylated CHH and abundant siRNAs co-localize.
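The caption does not define "weighted mean" methylation. One common convention, assumed here and not confirmed by this record, weights each cytosine by its read coverage: methylated reads summed over all cytosines in a region, divided by total reads summed over the same cytosines. A minimal sketch with hypothetical read counts:

```python
from typing import Sequence, Tuple


def weighted_methylation_level(sites: Sequence[Tuple[int, int]]) -> float:
    """Coverage-weighted methylation level of a region.

    Each site is (methylated_reads, total_reads) for one cytosine.
    """
    methylated = sum(m for m, _ in sites)
    total = sum(t for _, t in sites)
    if total == 0:
        raise ValueError("region has no read coverage")
    return methylated / total


# Three hypothetical CHH cytosines in one region.
toy_sites = [(8, 10), (1, 10), (0, 20)]
weighted = weighted_methylation_level(toy_sites)      # 9 / 40
unweighted = sum(m / t for m, t in toy_sites) / len(toy_sites)
print(weighted, unweighted)
```

Under this convention, deeply covered cytosines influence the regional level more than sparsely covered ones, which is why the weighted value differs from the plain per-site average.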
Figure 5. Expression of cis-acting lncRNAs is positively correlated with expression of their neighboring PCgenes. (A) Scatter plot illustrating the positive correlation between the fold changes of DElincRNAs (differential expression of lincRNAs in OsMET1-2−/− vs. in OsMET1-2+/+; log2 transformed on the x-axis) and those of the corresponding DEPCgenes (differential expression of lincRNA-related PCgenes in OsMET1-2−/− vs. in OsMET1-2+/+; log2 transformed on the y-axis). The Pearson's correlation coefficients and their statistical significance are tabulated in panel C of this figure. (B) Scatter plot illustrating no correlation between the fold changes of DElncNATs (differential expression of lncNATs in OsMET1-2−/− vs. in OsMET1-2+/+; log2 transformed on the x-axis) and those of the corresponding DEPCgenes (differential expression of lncNAT-related PCgenes in OsMET1-2−/− vs. in OsMET1-2+/+; log2 transformed on the y-axis). The Pearson's correlation coefficients and their statistical significance are tabulated in panel C of this figure. (C) LincRNA and lncNAT subgroups categorized by their positions relative to TEs and CG DMRs: circles denote lncRNAs co-localizing with both TEs and CG DMRs; squares denote lncRNAs co-localizing only with TEs; diamonds denote lncRNAs co-localizing only with CG DMRs; and triangles denote lncRNAs co-localizing with neither TEs nor CG DMRs. Pearson's correlation is calculated for paired lncRNAs and PCgenes in each subgroup. Three asterisks (***) denote P-values significant at the 0.001 level; raw non-significant P-values (>0.05) are specified.
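The per-subgroup statistic described in panel C is a Pearson correlation between paired log2 fold changes. A minimal, dependency-free sketch is shown below; the fold-change values are invented for illustration, and the significance test reported in the figure is not reproduced here.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical log2 fold changes (mutant vs. wild type) for five lincRNA/PCgene pairs.
lincrna_log2fc = [2.1, 1.4, -0.8, 3.0, -1.5]
pcgene_log2fc = [1.7, 0.9, -0.3, 2.2, -1.1]
r = pearson_r(lincrna_log2fc, pcgene_log2fc)
print(round(r, 3))
```

In practice one such coefficient would be computed separately for each subgroup of pairs (for example, lncRNAs co-localizing with TEs only) before comparing subgroups.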
Figure 6. Scatter plots illustrating that cis-acting lincRNAs are positively correlated in expression specifically with their neighboring PCgenes, rather than with paralogs of those PCgenes or with randomly selected PCgenes. (A) Positive correlation between the fold changes of DElincRNAs (differential expression of lincRNAs in OsMET1-2−/− vs. in OsMET1-2+/+; log2 transformed on the x-axis) and those of DEPCgenes and paralogs of DEPCgenes (differential expression of lincRNA-related PCgenes and their paralogs in OsMET1-2−/− vs. in OsMET1-2+/+; log2 transformed on the y-axis). No corresponding correlation is detected between DElncNATs and their DEPCgenes or the paralogs of those DEPCgenes. The Pearson's correlation coefficients and their statistical significance are tabulated in panel C of this figure. (B) No significant correlation is detected between the lncRNAs and randomly selected PCgenes. The Pearson's correlation coefficients and categories are tabulated in panel C of this figure. (C) Pearson's correlation coefficients between the fold changes of DElncRNAs (differential expression of lincRNAs and lncNATs in OsMET1-2−/− vs. in OsMET1-2+/+) and those of DEPCgenes, paralogs of DEPCgenes (differential expression of lincRNA- and lncNAT-related PCgenes and their paralogs in OsMET1-2−/− vs. in OsMET1-2+/+), and randomly selected PCgenes, tabulated with the corresponding P-values. LincRNA and lncNAT subgroups are categorized by their pairing with DEPCgenes, paralogs of DEPCgenes, or randomly selected PCgenes: circles denote lincRNAs paired with their DEPCgenes; squares denote lincRNAs paired with paralogs of their DEPCgenes; diamonds denote lncNATs paired with their DEPCgenes; triangles denote lncNATs paired with paralogs of their DEPCgenes; crosses denote lincRNAs paired with randomly selected PCgenes; and pentagons denote lncNATs paired with randomly selected PCgenes. Pearson's correlation is calculated for each subgroup. Two asterisks (**) denote P-values significant at the 0.01 level; raw non-significant P-values (>0.05) are specified.