Literature DB >> 31040371

The implication of plastid transcriptome analysis in petaloid monocotyledons: A case study of Lilium lancifolium (Liliaceae, Liliales).

Hoang Dang Khoa Do1, Joo-Hwan Kim2.   

Abstract

Transcriptome data provide useful information for studying the evolutionary history of angiosperms. Previously, different genomic events (i.e., duplication, deletion, and pseudogenization) were discovered in the plastid genome of Liliales; however, the effects of these events have not addressed because of the lack of transcriptome data. In this study, we completed the plastid genome (cpDNA) and generated transcriptome data of Lilium lancifolium. Consequently, the cpDNA of L. lancifolium is 152,479 bp in length, which consists of one large single copy (81,888 bp), one small single copy (17,607 bp), and two inverted repeat regions (26,544 bp). The comparative genomic analysis of newly sequenced cpDNA and transcriptome data revealed 90 RNA editing sites of which two positions are located in the rRNA coding region of L. lancifolium. A further check on the secondary structure of rRNA showed that RNA editing causes notable structural changes. Most of the RNA editing contents are C-to-U conversions, which result in nonsynonymous substitutions. Among coding regions, ndh genes have the highest number of RNA editing sites. Our study provided the first profiling of plastid transcriptome analyses in Liliales and fundamental information for further studies on post-transcription in this order as well as other petaloid monocotyledonous species.

Entities:  

Mesh:

Substances:

Year:  2019        PMID: 31040371      PMCID: PMC6491592          DOI: 10.1038/s41598-019-43259-7

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Introduction

In the genomic era, besides the nuclear, mitochondrial, and plastid genomes, transcriptome data also provide useful information for exploring the evolutionary history of angiosperms. Previously, different transcriptome studies of model plants by applying next-generation sequencing method (NGS) were reported, including Arabidopsis thaliana, rice, sugarcane, and so on[1-5]. These data added a deeper understanding of the gene expression and biological mechanisms that allow plants to survive and adapt to the environment. For example, transcriptome data revealed the mechanism that showed how sugarcane responded after being infected by bacteria or a virus[3,5]. Also, transcriptome data suggested genes that are responsible for drought and salinity tolerance in rice[4]. The transcriptome analysis was not only reported for model plants but also wild species. For instance, RNA editing sites were observed in Spirodela polyrhiza and Phalaenopsis aphrodite subsp formosana[6,7]. Also, the mechanism of cold responses in Lilium lancifolium was profiled from transcriptome data[8]. In plants, RNA editing resulted from the modification of nucleotide sequences. This process has a significant effect on gene expression because it can cause the presence of an internal stop codon[9-11]. In fact, the effect of RNA editing on the metabolism of plants was reported[12]. These studies suggested the necessity of transcriptome data in studying the evolution in angiosperms. Liliales is a member of petaloid monocotyledons and consists of 10 families of 1558 taxa[13]. The plastid genomes of this order have been intensively studied, and different genomic events (i.e., inversion, deletion, duplication, and pseudogenization) were discovered[14-19]. For example, various stages of rps16 deletion were recorded in the tribe Melanthieae (Melanthiaceae, Liliales)[19]. Also, duplication events were found in the Paris  species[16,18]. Although genomic events were reported, their effects on the post-transcriptional process are unclear because of the lack of plastid transcriptome data in Liliales. Therefore, in this study, we conducted the first plastid transcriptome analysis in Liliales. First of all, Lilium lancifolium was selected as the target species. Then, complete plastid genome (cpDNA) of L. lancifolium was sequenced using Next-generation sequencing method. Based on the new RNA editing data, we test the hypothesis of whether the pseudogenization can be reversed in L. lancifolium. We also check the effect of RNA editing site on the secondary structure of rRNA.

Results

Plastid genome of Lilium lancifolium

The complete plastid genome sequence of L. lancifolium (accession number MH177880; Fig. 1A) in this study is 152,479 bp in length and composed of a large single copy (LSC; 81,888 bp), a small single copy (SSC; 17,607 bp), and two inverted repeat regions (IR; 26,492 bp). In comparison with the previously completed cpDNA of L. lancifolium from China (accession number KY748297) and Korea (accession number KY940844), of which the length of cpDNA is identical (152,574 bp), the gene contents and orders are similar among the three individuals. However, the percentage of the identity of new cpDNA in L. lancifolium in this study is 99.8% and 99.9% compared to counterparts from Korea and China, respectively. Also, the translation initiation factor IF-1 (infA) gene was not annotated in previous data, but it was predicted as a pseudogene in this study because of the presence of internal stop codons within the coding region. Additionally, the different length of poly A sequence after start codon caused two types of cemA gene. The first type is functional cemA in L. lancifolium from China, of which 9-bp-poly A sequence was found. In contrast, the malfunctioning cemA was annotated in cpDNA from Korea counterparts, of which 10-bp and 11-bp-poly A sequences were found and caused internal stop codons in the coding region. The IGS regions among three cpDNA sequences of L. lancifolium showed a high similarity (over 95%), except the ISG regions of petA-psbJ, ndhF-rpl32, and ccsA-ndhD with similarity of 84.2%, 93.95%, and 94.83%, respectively (Table 1).
Figure 1

The map of plastid genome and number of RNA editing sites in different gene groups. (A) The map of plastid genome of Lilium lancifolium. Genes shown outside and inside of the outer circle are transcribed counter clockwise and clockwise, respectively. The dark gray area in the inner circle indicates the CG content of the chloroplast genome. The colors represent different groups of genes in cpDNA. LSC: Large single copy; SSC: small single copy; IRA: inverted repeat region A; IRB: inverted repeat region B. (B) The number of RNA editing sites in different gene groups. A: Rubisco; B: ATP dehydrogenase subunit P; C: Ribosomal RNAs; D: Cytochrome b6/f; E: Hypothetical proteins; F: ATP synthase; G: Miscellaneous proteins; H: Large and small subunit ribosomal proteins; I: Photosystem I and II; J: RNA polymerase; K: NADH oxidoreductase.

Table 1

Pairwise identity of IGS region among three complete cpDNAs of Lilium lancifolium.

RegionsIdentity (%)RegionsIdentity (%)
KY748297-ChinaKY940844-KoreaKY748297-ChinaKY940844-Korea
trnK_UUU-matK 99.6799.67 psaI-ycf4 10099.72
matK-trnK_UUU 99.8799.87 ycf4-cemA 10099.86
trnK_UUU-rps16 99.7399.46 petA-psbJ 84.284.2
rps16-trnQ_UUG 99.6899.68 psbE-petL 10099.92
psbK-psbI 99.299.2 psaJ-rpl33 99.6100
trnS_GCU-trnG_UCC 99.7199.71 rpl33-rps18 99.4399.43
trnR_UCU-atpA 99.1298.23 rps18-rpl20 10099.63
atpH-atpI 99.8999.89clpP intron 199.8799.75
rps2-rpoC2 10098.28petB intron10099.88
rpoC1 intron99.8799.74 rps11-rpl36 10099.24
psbM-trnD_GUC 99.92100 rps8-rpl14 96.9596.95
trnE_UUC-trnT_GGU 10099.85 rpl16 intron 10099.9
trnT_GGU-trnE_UUC 99.8999.89 ycf15-trnL_CAA 10099.86
psaA-ycf3 99.7599.75 rps7-rps12 98.2898.28
ycf3 intron 210099.02 rps12-trnV_GAC 99.95100
trnS_GGA-rps4 99.6599.65trnI_GAU intron10099.89
trnT_UGU-trnL_UAA 99.87100 ndhF-rpl32 93.9593.95
trnL_UAA intron10099.26 ccsA-ndhD 94.8394.83
ndhC-trnV_UAC 99.8699.86 psaC-ndhE 99.7599.75
accD-psaI 10099.88 rps15-ycf1 99.5199.51

The bold letters indicate regions which have low similarity (<95%).

The map of plastid genome and number of RNA editing sites in different gene groups. (A) The map of plastid genome of Lilium lancifolium. Genes shown outside and inside of the outer circle are transcribed counter clockwise and clockwise, respectively. The dark gray area in the inner circle indicates the CG content of the chloroplast genome. The colors represent different groups of genes in cpDNA. LSC: Large single copy; SSC: small single copy; IRA: inverted repeat region A; IRB: inverted repeat region B. (B) The number of RNA editing sites in different gene groups. A: Rubisco; B: ATP dehydrogenase subunit P; C: Ribosomal RNAs; D: Cytochrome b6/f; E: Hypothetical proteins; F: ATP synthase; G: Miscellaneous proteins; H: Large and small subunit ribosomal proteins; I: Photosystem I and II; J: RNA polymerase; K: NADH oxidoreductase. Pairwise identity of IGS region among three complete cpDNAs of Lilium lancifolium. The bold letters indicate regions which have low similarity (<95%).

RNA editing sites and their potential effects

The mapping results of transcription data to complete cpDNA of L. lancifolium revealed 90 editing sites, which located unequally among genes groups (Fig. 1B). The ndh genes possess the highest number of editing sites (29 sites) followed by the RNA polymerase genes (16 sites). The Rubisco gene (rbcL) has only one RNA editing site within its coding region. In contrast, ndhB gene has the highest number of editing sites (10 sites). The most abundant content of RNA editing in L. lancifolium is C-to-U conversion (Table 2). However, the U-to-C conversion was also found in rpl36 and rrn23. Most RNA editing resulted in nonsynonymous substitution, of which the changes from S (Serine) to L (Leucine) is the most frequent (25 sites), followed by S (serine) to F (Phenylalanyl) with 17 sites. Nevertheless, 12 out of 90 editing sites resulted in synonymous substitution (Table 2). Additionally, a total of 32 editing sites were also found in IGS regions of L. lancifolium cpDNA (Supplementary Table 1). In comparison to the previous complete cpDNA of L. lancifolium, the infA and cemA were annotated as pseudogenes because of the presence of internal stop codons within the coding regions. However, the transcriptome data revealed that there are no significant RNA editing sites within the coding region of these two genes. The RNA editing occurred not only in protein-coding genes but also in rRNA (Table 2). Specifically, the U-to-C conversion was found in rrn23S, whereas the C-to-U conversion was recorded in rrn5S. A further check on the predicted structure of rrn5S showed that the editing event affected the structure of rrn5S (Fig. 2). The RNA expression level was also compared among protein-coding genes of L. lancifolium cpDNA (Table 3). The results showed that the psbA is the most expressed gene followed by rbcL and petB. Although the ndh genes have the highest number of RNA editing sites, their expression level is lower than other genes (Table 3). The petL has the lowest expression level.
Table 2

The number of RNA editing sites among coding regions of plastid genome of L. lancifolium.

GeneSitePosition (aa/nucleotide)Editing contentCoverageNumber of reads (percentage)
rbcL 150/150P(ccC) → P(ccU)*188389C: 39 (0.02%); U: 187680 (99.62%)
matK 1160/478H(Cau) → Y(Uau)751C: 195 (26.1%); U: 556 (73.9%)
2245/734S(uCu) → F(uUu)3612C: 1888 (52.3%); U: 1718 (47.6%)
psbA 1232/696S(ucC) → S(ucU)*2011054C: 387 (0.02%); U: 2002141 (99.56%)
atpA 1258/774S(uCa) → L(uUa)2684C: 37 (1.4%); U: 2646 (98.5%)
2383/1148S(uCa) → L(uUa)3878C: 51 (1.3%); U: 3821 (98.5%)
atpF 131/92P(cCa) → L(cUa)3913C: 331 (8.5%); U: 3579 (91.4%)
atpI 115/45Y(uaC) → Y(uaU)3682C: 2736 (74.3%); U: 940 (25.5%)
2210/629S(uCa) → L(uUa)3580C: 47 (1.3%); U: 3528 (98.5%)
rpoC2 11235/3704S(uCa) → L(uUa)493C: 43 (8.7%); U: 451 (91.3%)
rpoC1 114/41P(cCa) → L(cUa)764C: 203 (26.5%); U: 558 (72.9%)
261/182S(uCc) → F(uUc)1004C: 543 (54%); U: 462 (46%)
3107/321I(auC) → I(auU)*818C: 487 (59.5%); U: 331 (40.4%)
4178/500S(uCa) → L(uUa)848C: 98 (11.5%); U: 750 (88.3%)
5210/629S(uCa) → L(uUa)857C: 149 (17.4%); U: 709 (82.6%)
6267/799R(Cgg) → W(Ugg)906C: 43 (4.7%); U: 859 (94.7%)
rpoB 129-10S(uCc) → F(uUc)223C: 81 (36.2%); U: 142 (63.4%)
2113/338S(uCu) → F(uUu)205C: 82 (39.8%); U: 123 (59.7%)
3184/551S(uCa) → F(uUa)78C: 66 (83.5%); U: 13 (16.5%)
4189/566S(uCa) → F(uUa)112C: 54 (47.8%); U: 59 (52.2%)
5665/1994S(uCa) → F(uUu)233C: 11 (4.7%); U: 223 (95.3%)
6807/2420S(uCa) → F(uUa)230C: 31 (13.4%); U: 200 (86.6%)
7900/2698P(Ccu) → S(Ucu)286C: 75(26.1%); U: 212 (73.9%)
psbZ 117/50S(uCa) → L(uUa)23558C: 869 (3.7%); U: 22664 (96.2%)
260/180L(cuC) → L(cuU)*15389C: 14560 (94.6%); U: 810 (5.3%)
rps14 127/80S(uCa) → L(uUa)14988C: 470 (3.1%); U: 14472 (96.6%)
psaA 151/153A(gcC) → A(gcU)*10434C: 6 (0.1%); U: 10407 (99.7%)
ycf3 115/44S(uCu) → F(uUu)2070C: 694 (33.5%); U: 1372 (66.2%)
221/63I(auC) → I(auU)*1477C: 640 (43.3%); U: 834 (56.4%)
362/185T(aCg) → M(aUg)648C: 502 (77.3%); U: 146 (22.5%)
464/191P(cCa) → L(cUa)903C: 423 (46.8%); U: 469 (51.9%)
ndhJ 143/128S(uCa) → L(uUa)968C: 245 (25.3%); U: 724 (74.7%)
ndhK 123/69P(ccC) → P(ccU)*264C: 155 (58.5%); U: 110 (41.5%)
227/81F(uuC) → F(uuU)*343C: 191 (55.5%); U: 153 (44.5%)
ndhC 113-5H(Cac) → Y(Uac)335C: 69 (20.5%); U: 267 (79.5%)
2104/311P(cCa) → L(cUa)264C: 155 (58.5%); U: 110 (41.5%)
3108/323S(uCa) → L(uUa)343C: 191 (55.5%); U: 153 (44.5%)
atpB 1395/1184S(uCa) → L(uUa)8704C: 140 (1.6%); U: 8556 (98.3%)
accD 1452/1355S(uCa) → L(uUa)625C: 224 (35.8%); U: 393 (62.8%)
2466/1397P(uCc) → L(uUc)574C: 245 (42.6%); U: 329 (57.2%)
psaI 125/74S(uCu) → F(uUu)1721C: 579 (33.6%); U: 1140 (66.2%)
227/80H(Cau) → Y(Uau)1152C: 1078 (93.5%); U: 72 (6.2%)
334/102V(guC) → V(guU)3243C: 2758 (85%); U: 482 (14.9%)
ycf4 1176/528F(uuC) → F(uuU)*2311C: 1774 (76.7%); U: 538 (23.3%)
psbJ 120/59P(cCu) → L(cUu)21133C: 392 (1.9%); U: 20673 (97.8%)
psbF 126/77S(uCu) → F(uUu)8436C: 252 (3%); U: 8177 (96.9%)
psbE 172/214P(Ccu) → S(Ucu)13182C: 145 (1.1%); U: 13023 (98.8%)
petL 12/5S(uCu) → F(uUu)2646C: 559 (21.1%); U: 2084 (78.7%)
219/56P(cCa) → L(cUa)1591C: 48 (3%); U: 1540 (96.7%)
rps18 174/221S(uCg) → L(uUg)5264C: 283 (5.4%); U: 4975 (94.5%)
clpP 126/82H(Cau) → Y(Uau)865C: 110 (12.7%); U: 756 (87.3%)
2187/559H(Cau) → Y(Uau)1533C: 107 (5.4%); U: 1402 (91.4%)
psbN 110/30F(uuC) → F(uuU)*57622C: 12298 (21.3%); U: 45238 (78.5%)
petB 14/11N(aAu) → S(aGu)10646A: 8356 (78.5%); G: 2268 (21.3%)
2142/424R(Cgg) → W(Ugg)27257C: 290 (1.1%); U: 26917 (98.7%)
3206/617P(cCa) → L(cUa)9585C: 220 (2.3%); U: 9351 (97.5%)
petD 1162/484Q(Caa) → stop(Uaa)13552C: 248 (1.8%); U: 13286 (98%)
rpoA 167/200S(uCu) → F(uUu)947C: 323 (34.1%); U: 622 (65.6%)
2123/368S(uCa) → L(uUa)1193C: 254 (21.3%); U: 938 (78.6%)
rpl36 114-5V(gUu) → A(gCu)3479U: 2 (0.1%); C: 3475 (99.8%)
rps3 1157/470T(aCa) → I(aUa)1513C: 74 (4.9%); U: 1440 (95.1%)
2195/583H(Cau) → Y(Uau)1951C: 309 (15.8%); U: 1638 (83.9%)
rpl2 11/2T(aCg) → M(aUg)708C: 340 (48%); U: 369 (52%)
rpl23 124/71S(uCu) → F(uUu)562C: 84 (48%); U: 479 (85.1%)
ndhB 150/149S(uCa) → L(uUa)803C: 88 (10.9%); U: 716 (89.1%)
2156/467P(cCa) → L(cUa)929C: 41 (4.4%); U: 887 (95.4%)
3181/542T(aCg) → M(aUg)536C: 55 (10.2%); U: 482 (89.8%)
4204/611S(uCa) → L(uUa)343C: 58 (16.9%); U: 286 (83.1%)
5205/704S(uCc) → F(uUc)347C: 53 (15.2%); U: 295 (84.8%)
6246/737P(cCa) → L(cUa)167C: 65 (38.7%); U: 102 (60.7%)
7277/830S(uCa) → L(uUa)193C: 123 (61.3%); U: 71 (36.4%)
8279/836S(uCa) → L(uUa)162C: 80 (49.1%); U: 83 (50.9%)
9371/112S(uCa) → L(uUa)1134C: 108 (9.5%); U: 1025 (90.3%)
10494/1481P(cCa) → L(cUa)1271C: 188 (14.8%); U: 1082 (85.1%)
ndhF 121/62S(uCa) → L(uUa)674C: 46 (6.8%); U: 628 (93%)
287/259H(Cac) → Y(Uac)208C: 51 (24.4%); U: 158 (75.6%)
3131/392S(uCu) → F(uUu)1205C: 13 (11.1%); U: 1191 (98.8%)
ccsA 1118/353S(uCa) → L(uUa)750C: 70 (9.3%); U: 680 (90.5%)
2272/815S(uCa) → L(uUa)731C: 68 (9.3%); U: 662 (90.4%)
ndhD 11/2T(aCg) → M(aUg)791C: 285 (36%); U: 505 (63.8%)
222/65S(uCc) → F(uUc)628C: 82 (13%); U: 545 (86.6%)
3130/389S(uCa) → L(uUa)722C: 129 (17.8%); U: 593 (82%)
4227/680S(uCg) → L(uUg)840C: 190 (22.6%); U: 648 (77.1%)
5318/953T(aCa) → I(aUa)930C: 124 (13.3%); U: 803 (86.3%)
ndhG 117/50S(uCa) → L(uUa)531C: 115 (21.6%); U: 417 (78.4%)
2116/347P(cCg) → L(cUg)1198C: 107 (8.9%); U: 1088 (90.7%)
ndhA 1358/1073S(uCc) → F(uUc)1467C: 198 (13.5%); U: 1226 (86.2%)
ndhH 130-10L(cuC) → L(cuU)*1107C: 1020 (92.1%); U: 85 (7.7%)
2169/505H(Cau) → Y(Uau)740C: 85 (11.5%); U: 651 (87.9%)
rrn5S 1−/72C → U9350C: 24 (0.3%); U: 9258 (99%)
rrn23S 1−/1327U → C2532C: 2148 (84.8%); U: 381 (15%)

The asterisk indicates synonymous substitution. Bold letters represent changes of nucleotides and their positions in the codons.

Figure 2

The predicted secondary structure of rrn5S with (lower) and without RNA editing site (upper). The color radian (from purple to red) means the probability of connection among nucleotides (from 0 to 1). The black arrows indicate the position of RNA editing.

Table 3

RNA expression of protein-coding genes in the L. lancifolium chloroplast genome.

GeneLength (bp)RPKMGeneLength (bp)RPKMGeneLength (bp)RPKM
psbA 1032557101 atpI 7441798 ndhK 872459
rbcL 144372754 psbF 1201780 ndhG 534454
petB 146711128 atpF 13381766 rpoC1 2837451
psbC 14169863 ycf4 5551700 ndhB 2215425
petD 12339292 cemA 7091460 ndhD 1506411
psbH 2228993 rps11 4171391 accD 1470400
psbD 10628199 petA 9631363 rps4 606382
psbZ 1896822 ndhE 3061341 rps15 273369
psaC 2466521 ycf3 19491259 ndhH 1182365
psbB 15275815 rpl14 3691235 ccsA 966362
psbN 1325486 rpl16 14201158 rps12 914360
psbE 2525439 rps8 3991052 rpl36 114338
rps14 3035273 infA 2281042 rpl2 1497310
psaA 22534791 ndhA 20801004 rps2 711303
psaB 22054244 psbT 102932 ndhC 363292
psbJ 1233913 psbI 111837 rpl23 282220
atpB 14973584 ndhJ 477827 ycf1 5577207
rpl33 2043280 rps3 657788 psaI 105190
atpE 4083191 rpoA 1008705 rpl20 354177
atpH 2462600 ndhI 540680 rpoC2 4125171
psbL 1172587 ndhF 2229641 rpoB 3207133
psaJ 1292369 rpl32 174626 psbM 105112
psbK 1922181 rps7 468622 petN 9089
rps18 3062058 petG 114555 ycf2 662170
atpA 15241966 rps19 279523 ycf15 23156
rps16 11421964 rpl22 393496 petL 96013
matK 15391865 clpP 1991488
The number of RNA editing sites among coding regions of plastid genome of L. lancifolium. The asterisk indicates synonymous substitution. Bold letters represent changes of nucleotides and their positions in the codons. The predicted secondary structure of rrn5S with (lower) and without RNA editing site (upper). The color radian (from purple to red) means the probability of connection among nucleotides (from 0 to 1). The black arrows indicate the position of RNA editing. RNA expression of protein-coding genes in the L. lancifolium chloroplast genome.

Discussion

Previously, most of the plastid genome studies used the data of one individual as the representative of that species and focused on a comparative genomic analysis with closely related taxa[15,16,18]. This approach has not been fully providing detailed information on the diversification of cpDNAs within a species. Recently, Shi et al.[20] reported 11 complete cpDNAs of both cultivated and wild watermelon. The cpDNAs of three individuals of each species were completed and compared to others. They showed that although the gene number and order are identical among examined species, the wild watermelon exhibited a significant change in the plastid genome size. In this study, the newly sequenced cpDNA revealed both conserved and diverse trends in comparison with previously published cpDNA of L. lancifolium. In fact, the cpDNA of L. lancifolium in this study is longer (105 bp) than those in previous studies which have an identical length (152,574 bp). Additionally, the Korean L. lancifolium has nonfunctional cemA which was found as a functional gene in Chinese counterpart. These findings suggested the interspecific diversification of plastid genome among wild plants and the need for further studies on this issue. Additionally, low similarities in IGS of petA-psbJ, ndhF-rpl32, and ccsA-ndhD suggest these regions as hot-spot sites for further studies on evolution of L. lancifolium and related species (Table 1). RNA editing plays an important role during the post-transcriptional process because it alters the coding content of the genes by two pathways of insertion/deletion and conversion/substitution. Specifically, the C-to-U conversion altered a serin to phenylalanine codon in psbF mRNA of Spirodela polyrhiza[6]. Also, the formation of translation initiation, or internal stop codon, caused by RNA editing, was reported[9-11]. A similar trend was found in the transcriptome data of L. lancifolium (Table 2). The changes of amino acid composition among genes were mainly caused by C-to-U conversion. Also, the formation of the start codon in rpl36 and ndhD genes resulted from C-to-U conversion at the second position in the start codon. In L. lancifolium cpDNA, infA and cemA genes were annotated as pseudogenes and expected to be corrected by RNA editing process. However, there are no RNA editing sites in mRNA of these two genes. Previously, the loss of infA in cpDNA was recorded in many plants[21] and compensated by the nuclear infA. In fact, mRNA of nuclear infA was found by assembling transcriptome data to infA gene of Pheonix dactylifera (GenBank Accession XM_008784933). In contrast, the case of cemA needs further investigations. Among the four examined monocots, there are different numbers of RNA editing sites (Supplementary Table 2). Most of RNA editing sites resulted in nonsynonymous substitutions, except for Phalaenopsis aphrodite subp formosana of which more than half substitution is synonymous (Supplementary Table 2). However, most of the editing content is C to U in all examined taxa. Notably, the RNA editing in Deschampsia antarctica revealed the changes from G to A, G to C, A to G, A to C, A to U, and U to G[22]. The transcriptome data revealed different expression level of genes in L. lancifolium cpDNA (Table 3). In comparison with RNA expression in D. antarctica, a member of the grass family, there are differences among expression level between L. lancifolium and D. antarctica[22]. One possible explanation could be the different habitat environment. In fact, D. antarctica adapted to the harsh environment of Antarctica whereas L. lancifolium distributes in temperate regions. Additionally, the gene expression is different during development stages of plants. In this study, we used only the leaf tissue at the growth stage of L. lancifolium. Therefore, further studies of transcriptome of various tissues at different development stages should be conducted to explore the overall trend of expression in L. lancifolium. Previously, Chen et al.[7] reported the effect of RNA editing on stabilizing the secondary structure of trnM in Phalaenopsis aphrodite subsp formosana. In this study, RNA editing site resulted in changes in the secondary structure of rrn5S. These data revealed the effect of RNA editing process on the stability of rRNA and tRNA. Although RNA editing sites have been recorded in protein coding regions of plastid genomes, studies on the secondary structure of these changes have not been fully conducted. Therefore, further investigations should be done to give a deeper understanding of this issue. To sum up, the first plastid genome analysis of L. lancifoliun provided fundamental information for further studies on post-transcription events in Liliales. In fact, the RNA editing process is not able to reverse the pseudogenization caused by genomic events in plastid genomes of L. lancifolium. In this study, only leaf tissue was used for plastid transcriptome study, which does not fully reflect the evolutionary information in L. lancifolium. Therefore, transcriptome data of other tissue should be generated to trace the evolutionary history in L. lancifolium and other species of Liliales.

Materials and Methods

Plant materials, total DNA extraction, and RNA isolation

The mature fresh leaves of Lilium lancifolium during the growth stage were collected in its wild habitats. The specimen of L. lancifolium was made and deposited to Gachon University Herbarium. For DNA extraction, the fresh leaves were dried in silica gel before extraction steps with Plant DNeasy Mini Kit (Qiagen, Korea). To isolate total RNA, the fresh leaves were immediately put in liquid nitrogen after collected. Then, they were stored in the cold condition until being used for RNA isolation, which was conducted using Plant RNeasy mini Kit (Qiagen, Korea). Both DNA extraction and RNA isolation were conducted based on manufacturer’s instruction. The quality of DNA and RNA were tested using gel electrophoresis and one spectrophotometer.

NGS generation, genome assembly, RNA editing determination and prediction of rRNA structure

To generate NGS data, the total DNA and RNA from leaves of L. lancifolium were applied to Illumina Hiseq 200 and Nextseq 500, respectively. First of all, the DNA and RNA were fragmented. Then, newly fragmented DNA and RNA were hybridized and ligated with adapters. In the next step, PCR amplification was employed to create the sequence library. Finally, the library was sequenced and resulted in the DNA-NGS data of 301 bp in length and transcriptome data of 76 bp in length. The DNA-NGS data were imported to Geneious program for further analysis[23]. The reads were trimmed with more than 5% chance of an error per base before being assembled to reference cpDNA of Lilium lancifolium (Accession number KY940844) with similarity over 95% between reads and reference sequence. Consequently, there are 3,852,736 reads of which 17,473 reads (0.45%) were assembled to reference with coverage of 34.5x. The newly completed cpDNA of L. lancifolium was annotated and manual adjusted in Geneious program. The map of cpDNA was illustrated by OGDraw[24] with manual modification. The new cpDNA in this study was aligned with previously reported cpDNA of L. lancifolium (Accession number KY748297 and KY940844) using MAUVE alignment embedded in Geneious to identify hot-spot regions[25]. Also, the newly assembled cpDNA (GenBank Accession number MH177880) was used for identifying RNA editing sites. The RNA sequence data were imported to Geneious and aligned to cpDNA of L. lancifolium using Bowtie 2.0 with mismatch ≤2[26]. The filtered reads (26,824,116 out of 53,643,506 reads) were then analyzed using Cufflinks to calculate the read per kilobase million (RFKM) and TopHat for variants calling[27]. The determination of RNA editing sites was based on the division of reads with editing based on the total reads of that position. If the frequency of C-to-U or U-to-C conversion was over 5%, that position was recognized as an RNA editing sites as described in previous study[28]. An online tool (http://rna.tbi.univie.ac.at/) was used to predict the second structure of rRNA[29]. Transcriptome data of L. lancifolium was deposited to NCBI (SRA accession SAMN08940087). Supplementary Tables
  24 in total

1.  Many parallel losses of infA from chloroplast DNA during angiosperm evolution with multiple independent transfers to the nucleus.

Authors:  R S Millen; R G Olmstead; K L Adams; J D Palmer; N T Lao; L Heggie; T A Kavanagh; J M Hibberd; J C Gray; C W Morden; P J Calie; L S Jermiin; K H Wolfe
Journal:  Plant Cell       Date:  2001-03       Impact factor: 11.277

2.  Mauve: multiple alignment of conserved genomic sequence with rearrangements.

Authors:  Aaron C E Darling; Bob Mau; Frederick R Blattner; Nicole T Perna
Journal:  Genome Res       Date:  2004-07       Impact factor: 9.043

3.  Editing of Mitochondrial Transcripts nad3 and cox2 by Dek10 Is Essential for Mitochondrial Function and Maize Plant Development.

Authors:  Weiwei Qi; Zhongrui Tian; Lei Lu; Xiuzu Chen; Xinze Chen; Wei Zhang; Rentao Song
Journal:  Genetics       Date:  2017-02-17       Impact factor: 4.562

4.  High levels of RNA editing in a vascular plant chloroplast genome: analysis of transcripts from the fern Adiantum capillus-veneris.

Authors:  Paul G Wolf; Carol A Rowe; Mitsuyasu Hasebe
Journal:  Gene       Date:  2004-09-15       Impact factor: 3.688

5.  Comparative genomics of four Liliales families inferred from the complete chloroplast genome sequence of Veratrum patulum O. Loes. (Melanthiaceae).

Authors:  Hoang Dang Khoa Do; Jung Sung Kim; Joo-Hwan Kim
Journal:  Gene       Date:  2013-08-23       Impact factor: 3.688

6.  A trnI_CAU triplication event in the complete chloroplast genome of Paris verticillata M.Bieb. (Melanthiaceae, Liliales).

Authors:  Hoang Dang Khoa Do; Jung Sung Kim; Joo-Hwan Kim
Journal:  Genome Biol Evol       Date:  2014-06-19       Impact factor: 3.416

7.  Whole plastid transcriptomes reveal abundant RNA editing sites and differential editing status in Phalaenopsis aphrodite subsp. formosana.

Authors:  Ting-Chieh Chen; Yu-Chang Liu; Xuewen Wang; Chi-Hsuan Wu; Chih-Hao Huang; Ching-Chun Chang
Journal:  Bot Stud       Date:  2017-09-16       Impact factor: 2.787

8.  Full Chloroplast Genome Assembly of 11 Diverse Watermelon Accessions.

Authors:  Chao Shi; Shuo Wang; Fei Zhao; Hua Peng; Chun-Lei Xiang
Journal:  Front Genet       Date:  2017-04-18       Impact factor: 4.599

9.  Insight into infrageneric circumscription through complete chloroplast genome sequences of two Trillium species.

Authors:  Sang-Chul Kim; Jung Sung Kim; Joo-Hwan Kim
Journal:  AoB Plants       Date:  2016-04-06       Impact factor: 3.276

10.  Transcriptome analysis in different rice cultivars provides novel insights into desiccation and salinity stress responses.

Authors:  Rama Shankar; Annapurna Bhattacharjee; Mukesh Jain
Journal:  Sci Rep       Date:  2016-03-31       Impact factor: 4.379

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.