Chisato Okudaira, Matomo Sakari, Toshifumi Tsukahara.
Abstract
Cytidine-to-uridine (C-to-U) RNA editing, which proceeds by deamination, is observed in animal nuclei and plant organelles; in contrast, uridine-to-cytidine (U-to-C) editing has been considered to be confined to the organelles of a limited number of non-angiosperm plant species. Although a previous RNA-seq-based analysis implied U-to-C RNA editing events in plant nuclear genes, this has not been broadly accepted because the confirmatory analyses were inadequate. Here we examined U-to-C RNA editing in Arabidopsis tissues at different developmental stages. High-throughput RNA sequencing (RNA-seq) of 12-day-old and 20-day-old Arabidopsis seedlings was performed, which enabled transcriptome-wide identification of RNA editing sites and analysis of differentially expressed genes (DEGs) and nucleotide base conversions. The results showed that DEGs were expressed at higher levels in 12-day-old seedlings than in 20-day-old seedlings. Pentatricopeptide repeat (PPR) genes were also expressed at higher levels, as indicated by the log2FC values. The RNA-seq analysis revealed candidate U-to-C RNA editing events. Sanger sequencing of both DNA and cDNA for all candidate nucleotide conversions confirmed seven U-to-C RNA editing sites. This work clearly demonstrates the presence of U-to-C RNA editing of nuclear genes in Arabidopsis, which provides a basis for studying the mechanism and functions of this unique post-transcriptional modification.
Keywords: Arabidopsis thaliana; RNA-seq; differentially expressed genes (DEGs); uridine-to-cytidine RNA editing
Year: 2021 PMID: 33809209 PMCID: PMC8001311 DOI: 10.3390/cells10030635
Source DB: PubMed Journal: Cells ISSN: 2073-4409 Impact factor: 6.600
Figure 1. Analysis of genes differentially expressed between 12- and 20-d-old Arabidopsis seedlings. (A) Venn diagram of differentially expressed genes (DEGs). The sum of the numbers in each circle represents the total number of genes expressed within a sample, and the overlap represents genes expressed in both samples. (B) Correlation analysis of gene expression between samples. R² indicates the square of the Pearson's correlation coefficient. (C) Volcano plot of DEGs. The x-axis shows the fold change in gene expression between samples and the y-axis shows the statistical significance of the differences in gene expression. Significantly up- and downregulated genes are highlighted in red and green, respectively. Genes showing no differential expression between 12- and 20-d-old seedlings are shown in blue. (D–F) Comparison of the expression levels of DEGs. (D) Comparison of read count and FPKM values of DEGs between 12- and 20-d-old seedlings. (E) Summary of DEGs. (F) FPKM statistics.
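The quantities named in this caption (FPKM, fold change, and R²) can be illustrated with a minimal Python sketch. It uses the standard FPKM definition and a plain Pearson correlation. The function names, the pseudocount of 1, and the direction of the fold change (12-d over 20-d) are assumptions made for illustration and are not taken from the study.

```python
import math

def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)

def log2_fold_change(expr_a, expr_b, pseudocount=1.0):
    """log2(expr_a / expr_b); the pseudocount (an assumption) avoids division by zero."""
    return math.log2((expr_a + pseudocount) / (expr_b + pseudocount))

def pearson_r_squared(xs, ys):
    """Square of Pearson's correlation coefficient between two expression vectors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)
```

With this sign convention, a positive log2FC means higher expression in the first sample passed to `log2_fold_change`. Which sample the study used as the numerator is not stated in the caption.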
Figure 2. Analysis of single-nucleotide base conversions identified in 12-d-old Arabidopsis seedlings by RNA-seq. (A) Pie chart showing the percentage of genes identified with single-nucleotide base conversions. (B) Number of total edited sites and edited genes (blue), and number of sites and genes with U-to-C mutations (orange). (C) Log2FC values for the genes identified with U-to-C nucleotide conversion. Genes were expressed at higher levels in 12-d-old seedlings than in 20-d-old seedlings.
List of candidate U-to-C RNA editing sites detected in Arabidopsis seedlings at different developmental stages.
| S.No. | Position | Reads (12 Days) | Reads (20 Days) | Gene ID | Description |
|---|---|---|---|---|---|
| 1 | 3412532 | 56 | 0 | AT2G07715 | Ribosomal Proteins L2, RNA binding domain |
| 2 | 8544440 | 34 | 0 | AT4G14940 | Amine oxidase |
| 3 | 26898977 | 2 | 0 | AT5G67411 | GRAS family transcription factor |
| 4 | 8297931 | 4 | 0 | AT1G23380 | KNOTTED1-like homeobox gene 6 |
| 5 | 14657330 | 14 | 0 | AT4G29950 | Ypt/Rab-GAP domain of gyp1p superfamily protein |
| 6 | 3392826 | 107 | 16 | AT2G07709 | - |
| 7 | 362386 | 175 | 44 | ATMG01390 | - |
| 8 | 7191444 | 105 | 197 | AT2G16586 | Unknown |
| 9 | 3061212 | 498 | 2 | AT4G06477 | - |
| 10 | 9226791 | 28 | 69 | AT4G16330 | 2-oxoglutarate (2OG) and Fe(II)-dependent oxygenase superfamily protein |
| 11 | 5816271 | 12 | 6 | AT3G17050 | - |
| 12 | 9255546 | 268 | 99 | AT4G16380 | Heavy metal transport/detoxification superfamily protein |
| 13 | 14198871 | 647 | 240 | AT3G41768 | - |
| 14 | 16918673 | 55 | 46 | AT5G42320 | Zn-dependent exopeptidases superfamily protein |
| 15 | 17708862 | 0 | 21 | AT3G47965 | Unknown |
| 16 | 24989428 | 27 | 0 | AT5G62220 | glycosyltransferase 18 |
| 17 | 21320395 | 0 | 12 | AT5G52530 | dentin sialophosphoprotein-related |
| 18 | 2848835 | 146 | 86 | AT5G08740 | NAD(P)H dehydrogenase C1 |
| 19 | 15546833 | 13 | 0 | AT4G32190 | Myosin heavy chain-related protein |
| 20 | 3392918 | 144 | 14 | AT2G07709 | - |
| 21 | 21319578 | 0 | 5 | AT5G52530 | dentin sialophosphoprotein-related |
| 22 | 21077241 | 0 | 2 | AT1G56290 | CwfJ-like family protein |
| 23 | 7622202 | 0 | 2 | AT4G13070 | RNA-binding CRS1/YhbY (CRM) domain protein |
| 24 | 17692876 | 29 | 0 | AT5G43970 | translocase of outer membrane 22-V |
| 25 | 10266697 | 46 | 6 | AT1G29340 | plant U-box 17 |
| 26 | 7869982 | 17 | 0 | AT5G23380 | Protein of unknown function (DUF789) |
| 27 | 7836325 | 19 | 0 | AT1G22190 | Integrase-type DNA-binding superfamily protein |
| 28 | 19998466 | 36 | 0 | AT3G54000 | Unknown |
| 29 | 603074 | 13 | 5 | AT5G02670 | Unknown |
| 30 | 22561577 | 0 | 2 | AT3G60970 | multidrug resistance-associated protein 15 |
| 31 | 6025041 | 0 | 27 | AT4G09520 | Cofactor-independent phosphoglycerate mutase |
| 32 | 909133 | 0 | 2 | AT4G02070 | MUTS homolog 6 |
| 33 | 7797368 | 0 | 4 | AT4G13420 | high affinity K+ transporter 5 |
| 34 | 8662474 | 0 | 3 | AT4G15180 | SET domain protein 2 |
| 35 | 12669828 | 0 | 2 | AT4G24530 | O-fucosyltransferase family protein |
| 36 | 15653919 | 0 | 2 | AT4G32430 | Pentatricopeptide repeat (PPR) superfamily protein |
| 37 | 5075516 | 0 | 2 | AT2G12490 | - |
| 38 | 17587422 | 0 | 2 | AT2G42200 | squamosa promoter binding protein-like 9 |
| 39 | 17958701 | 0 | 2 | AT2G43200 | S-adenosyl-L-methionine-dependent methyltransferases superfamily protein |
| 40 | 526197 | 0 | 5 | AT3G02515 | - |
| 41 | 20795012 | 69 | 64 | AT3G56040 | UDP-glucose pyrophosphorylase 3 |
| 42 | 3264804 | 0 | 2 | AT5G10370 | helicase domain-containing protein/IBR domain-containing protein/zinc finger protein-related |
| 43 | 9633752 | 0 | 2 | AT5G27330 | Prefoldin chaperone subunit family protein |
| 44 | 12108844 | 0 | 9 | AT5G32481 | - |
| 45 | 15644809 | 0 | 4 | AT5G39090 | HXXXD-type acyl-transferase family protein |
| 46 | 3332097 | 0 | 2 | AT1G10160 | - |
| 47 | 3564739 | 0 | 2 | AT1G10720 | BSD domain-containing protein |
| 48 | 9825469 | 0 | 6 | AT1G28130 | Auxin-responsive GH3 family protein |
| 49 | 9997031 | 0 | 2 | AT1G28440 | HAESA-like 1 |
| 50 | 4006628 | 0 | 13 | AT5G12370 | exocyst complex component sec10 |
| 51 | 5097198 | 0 | 5 | AT2G12505 | - |
| 52 | 11465954 | 0 | 11 | AT1G31930 | extra-large GTP-binding protein 3 |
| 53 | 7014676 | 0 | 2 | AT3G20087 | N/A |
| 54 | 15766171 | 0 | 2 | AT2G37585 | Core-2/I-branching beta-1,6-N-acetylglucosaminyltransferase family protein |
| 55 | 7191297 | 249 | 171 | AT2G16586 | Unknown |
| 56 | 17908527 | 0 | 2 | AT1G48450 | Protein of unknown function (DUF760) |
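The read counts in the table above can be used to group candidate sites by the stage at which they were detected. The following is a minimal sketch using a handful of rows copied from the table. The category labels and helper names are hypothetical, and treating any non-zero read count as "detected" is an assumption, since the table does not state a threshold.

```python
from collections import Counter

# A few rows copied from the table above: (gene_id, reads_12_days, reads_20_days).
ROWS = [
    ("AT2G07715", 56, 0),
    ("AT2G16586", 105, 197),
    ("AT3G47965", 0, 21),
    ("AT5G52530", 0, 12),
    ("AT3G56040", 69, 64),
]

def stage_category(reads_12d, reads_20d):
    """Label a candidate site by the stage(s) with at least one supporting read."""
    if reads_12d > 0 and reads_20d > 0:
        return "both"
    if reads_12d > 0:
        return "12 Days only"
    if reads_20d > 0:
        return "20 Days only"
    return "none"

def summarize(rows):
    """Count candidate sites per stage category."""
    return Counter(stage_category(r12, r20) for _, r12, r20 in rows)
```

Calling `summarize(ROWS)` tallies the five sample rows; passing all 56 rows would give the stage distribution for the full candidate list.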
Figure 3. Next-generation sequencing (NGS) data for expressed PPR genes in Arabidopsis. (A) Out of 465 expressed PPR genes, 10 genes (AT3G62470, AT1G50270, AT1G16830, AT1G63080, AT1G06580, AT3G56550, AT1G09820, AT3G53360, AT2G22410, and AT4G32430) showed nucleotide conversion. (B) Out of 54 U-to-C variant genes, one gene, AT4G32430, was found to be a PPR gene. (C) The list of expressed genes, PPR genes with nucleotide base conversions, genes with U-to-C base conversion, and the PPR gene with U-to-C base conversion.
Figure 4. Flowchart of the methodology for identification of U-to-C RNA editing sites. (A) Raw reads are filtered to remove reads containing adapters or reads of low quality, so that downstream analyses are based on clean reads. The filtering process is as follows. (1) Discard reads with adapter contamination. (2) Discard reads when uncertain nucleotides constitute more than 10% of either read (N > 10%). (3) Discard reads when low-quality nucleotides (base quality less than 20) constitute more than 50% of the read. For mapping sequences, TopHat2 was chosen for plant genomes. The mismatch parameter was set to 2 and other parameters were left at their defaults, except for parameters that required adjustment, such as the longest intron length. Only filtered reads were used to analyze the mapping status of RNA-seq data to the reference genome. Edited sites were further validated and confirmed by RT-PCR. (B) Clean reads for day 12 and day 20. (C) Percentage of reads mapped to genome regions for day 12 and day 20.
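The three filtering rules in panel (A) can be sketched in Python as follows. This is an illustrative sketch, not the pipeline used in the study: the read representation, the function names, the exact-substring adapter check, and the placeholder adapter sequence are all assumptions.

```python
# Placeholder adapter sequence (an assumption; the study does not state which adapter was used).
ADAPTER = "AGATCGGAAGAGC"

def passes_filters(seq, quals, adapter=ADAPTER,
                   max_n_frac=0.10, min_qual=20, max_low_frac=0.50):
    """Apply the three rules to one read given its bases and per-base Phred scores."""
    if not seq or len(seq) != len(quals):
        return False
    # (1) Discard reads with adapter contamination (simplified to an exact substring match).
    if adapter in seq:
        return False
    # (2) Discard reads in which uncertain bases (N) exceed 10% of the read.
    if seq.upper().count("N") / len(seq) > max_n_frac:
        return False
    # (3) Discard reads in which bases with quality below 20 exceed 50% of the read.
    low_quality = sum(1 for q in quals if q < min_qual)
    if low_quality / len(quals) > max_low_frac:
        return False
    return True

def keep_pair(read1, read2, adapter=ADAPTER):
    """Keep a read pair only if both mates pass; each read is a (seq, quals) tuple."""
    return passes_filters(*read1, adapter=adapter) and passes_filters(*read2, adapter=adapter)
```

Rule (2) refers to "either read", so the sketch treats the data as paired-end and drops the whole pair when one mate fails. Applying the same pair-level logic to rules (1) and (3) is an assumption.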
Figure 5. Sanger sequencing chromatograms depicting U-to-C RNA editing events in 12-d- and 20-d-old Arabidopsis seedlings, obtained from cDNA and genomic DNA (gDNA) of the same tissues using forward primers. Arrows indicate the positions of RNA editing.
List of genes identified with U-to-C RNA editing sites in 12-day- and 20-day-old Arabidopsis seedlings.
| S.No. | Position | Edited Site | Gene ID | RNA Editing Efficiency, 12 Days (%) | RNA Editing Efficiency, 20 Days (%) | Encoded Protein |
|---|---|---|---|---|---|---|
| 1. | 14198871 | 5′ UTR | AT2G16586 | 77.30 | 65.74 | Transmembrane protein |
| 2. | 16918673 | CDS | AT5G42320 | 24.20 | 0 | Zn-dependent exopeptidase superfamily protein |
| 3. | 603074 | 5′ UTR | AT5G02670 | 0 | 22.80 | Hypothetical protein |
| 4. | 7191297 | 3′ UTR | AT3G41768 | 45.54 | 49.65 | Ribosomal RNA |
| 5. | 15653919 | 3′ UTR | AT4G32430 | 0 | 20.43 | PPR-like superfamily protein |
| 6. | 17708862 | 3′ UTR | AT3G47965 | 24.54 | 22.48 | Hypothetical protein |
| 7. | 21320395 | CDS | AT5G52530 | 20.65 | 0 | Dentin sialophosphoprotein-like protein |