
The Complete Chloroplast Genome Sequences of the Medicinal Plant Forsythia suspensa (Oleaceae).

Wenbin Wang1, Huan Yu2, Jiahui Wang3, Wanjun Lei4, Jianhua Gao5, Xiangpo Qiu6, Jinsheng Wang7.   

Abstract

Forsythia suspensa is an important medicinal plant traditionally used to treat inflammation, pyrexia, gonorrhea, diabetes, and other conditions. However, limited sequence and genomic information is available for F. suspensa. Here, we produced the complete chloroplast genome of F. suspensa using Illumina sequencing technology. F. suspensa is the first sequenced member of the genus Forsythia (Oleaceae). The gene order and organization of the chloroplast genome of F. suspensa are similar to those of other Oleaceae chloroplast genomes. The F. suspensa chloroplast genome is 156,404 bp in length and exhibits a conserved quadripartite structure, with a large single-copy (LSC; 87,159 bp) region and a small single-copy (SSC; 17,811 bp) region interspersed between inverted repeat (IRa/b; 25,717 bp) regions. A total of 114 unique genes were annotated, including 80 protein-coding genes, 30 tRNA genes, and four rRNA genes. The low GC content (37.8%) and codon usage bias for A- or T-ending codons may largely affect gene codon usage. Sequence analysis identified a total of 26 forward repeats and 23 palindrome repeats with lengths ≥30 bp (identity > 90%), and 54 simple sequence repeats (SSRs) with an average rate of 0.35 SSRs/kb. We predicted 52 RNA editing sites in the chloroplast of F. suspensa, all of which were C-to-U transitions. IR expansion or contraction and the divergent regions were analyzed among several species, including F. suspensa as reported in this study. Phylogenetic analysis based on whole-plastome sequences revealed that F. suspensa, as a member of the Oleaceae family, diverged relatively early from Lamiales. This study will contribute to medicinal resource conservation and to molecular phylogenetic and genetic engineering research on this species.

Keywords:  Forsythia suspensa; chloroplast genome; comparative genomics; phylogenetic analysis; sequencing

Year:  2017        PMID: 29088105      PMCID: PMC5713258          DOI: 10.3390/ijms18112288

Source DB:  PubMed          Journal:  Int J Mol Sci        ISSN: 1422-0067            Impact factor:   5.923


1. Introduction

Forsythia suspensa (Thunb.) Vahl, known as “Lianqiao” in Chinese, is a well-known traditional Asian medicine that is widely distributed in many Asian and European countries [1]. In folk medicine, the extract of the dried fruit has long been used to treat a variety of diseases, such as inflammation, pyrexia, gonorrhea, tonsillitis, and ulcers [2]. In recent years, the dried ripe fruit of F. suspensa has often been prescribed for the treatment of diabetes in China [3,4]. Chloroplast (cp) genomes are mostly circular DNA molecules, which have a typical quadripartite structure composed of a large single copy (LSC) region and a small single copy (SSC) region interspersed between two copies of inverted repeats (IRa/b) [5]. The cp genome sequences can provide vast information not only about genes and their encoded proteins, but also on functional implications and evolutionary relationships [6]. Due to high-throughput capabilities and relatively low costs, next-generation sequencing techniques have made it more convenient to obtain a large number of cp genome sequences [7]. After the first complete cp DNA sequences were reported in Nicotiana tabacum [8] and Marchantia polymorpha [9], complete cp DNA sequences of numerous plant species were determined [6,10,11,12]. To date, approximately 1300 plant cp genomes are publicly available as part of the National Center for Biotechnology Information (NCBI) database. Within the Oleaceae family, the complete cp genomes of several plant species have been published [12,13,14,15], thereby providing additional evidence for the evolution and conservation of cp genomes. Nevertheless, no cp genome belonging to genus Forsythia has been reported. Few data are available with respect to the F. suspensa cp genome. In order to characterize the complete cp genome sequence of the F. suspensa and expand our understanding of the diversity of the genus Forsythia, details of the cp genome structure and organization are reported in this paper. 
This is also the first sequenced member of the genus Forsythia (Oleaceae). We compare the F. suspensa cp genome with previously annotated cp genomes of other Lamiales species. Our study could provide basic data for medicinal species conservation and molecular phylogenetic research of the genus Forsythia and Lamiales.

2. Results and Discussions

2.1. Genome Features

Whole genome sequencing using an Illumina Hiseq 4000 PE150 platform generated 19,241,634 raw reads. Clean reads were obtained by removing adaptors and low-quality read pairs. We then collected 662,793 cp-genome-related reads (3.44% of total reads), reaching an average of 636× coverage over the cp genome. With PCR-based experiments, we closed the gaps and validated the sequence assembly, and ultimately obtained a complete F. suspensa cp genome sequence, which was then submitted to GenBank (accession number: MF579702). Most cp genomes of higher plants have a typical quadripartite structure composed of an LSC region and an SSC region interspersed between the IRa/b regions [5]. The complete cp genome of F. suspensa has a total length of 156,404 bp, with a pair of IRs of 25,717 bp each that separate an LSC region of 87,159 bp and an SSC region of 17,811 bp (Figure 1). The total GC content was 37.8%, which was similar to that of the published Oleaceae cp genomes [12,13,14,15]. The GC content of the IR regions was 43.2%, which was higher than the GC content of the LSC and SSC regions (35.8% and 31.8%, respectively).
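The figures in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check, not part of the original analysis; it assumes that every cp-related read contributes the full 150 bp of a PE150 read, which is an approximation because trimming would shorten some reads.

```python
# Figures reported in Section 2.1
LSC, SSC, IR = 87_159, 17_811, 25_717   # region lengths in bp
TOTAL_RAW_READS = 19_241_634
CP_READS = 662_793
READ_LENGTH = 150                        # assumption: untrimmed PE150 reads

# Quadripartite structure: LSC + SSC + two copies of the IR
genome_length = LSC + SSC + 2 * IR
cp_read_fraction = CP_READS / TOTAL_RAW_READS
mean_coverage = CP_READS * READ_LENGTH / genome_length

print(f"genome length: {genome_length} bp")
print(f"cp-related reads: {cp_read_fraction:.2%}")
print(f"mean coverage: {mean_coverage:.0f}x")
```

Under these assumptions the region lengths sum to the reported genome size, and the read fraction and coverage agree with the values given in the text.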
Figure 1

Chloroplast genome map of Forsythia suspensa. Genes drawn inside the circle are transcribed clockwise, and those outside are transcribed counterclockwise. Genes are color-coded by function, as shown at the bottom left. The inner circle indicates the inverted repeat boundaries and GC content.

The gene content and sequence of the F. suspensa cp genome are relatively conserved, with the basic characteristics of land plant cp genomes [16]. It encodes a total of 114 unique genes, of which 19 are duplicated in the IR regions. Of the 114 genes, there are 80 protein-coding genes (70.2%), 30 tRNA genes (26.3%), and four rRNA genes (rrn5, rrn4.5, rrn16, rrn23) (3.5%) (Table 1). Eighteen genes contained introns: fifteen (nine protein-coding and six tRNA genes) contained one intron, and three (rps12, ycf3, and clpP) contained two introns (Table 2). The rps12 gene is trans-spliced, with its 5′ exon located in the LSC region and its duplicated 3′ exons located in the IR regions. The complete matK gene was located within the intron of trnK-UUU. One pseudogene (a nonfunctional duplicate of a functional gene), ycf1, was identified at the IRb/SSC boundary. The partial gene duplication might have caused the loss of protein-coding ability. In general, the junctions between the IR and LSC/SSC regions vary among higher plant cp genomes [17,18,19]. In the F. suspensa cp genome, the ycf1 gene extended into the IR region at the IR/SSC junctions, while rpl2 was 51 bp away from the LSC/IR junction.
Table 1

A list of genes found in the plastid genome of Forsythia suspensa.

Category of Genes | Group of Genes | Name of Genes
Photosynthesis related genes | Rubisco | rbcL
Photosynthesis related genes | Photosystem I | psaA, psaB, psaC, psaI, psaJ
Photosynthesis related genes | Assembly/stability of photosystem I | ycf3 *, ycf4
Photosynthesis related genes | Photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ
Photosynthesis related genes | ATP synthase | atpA, atpB, atpE, atpF *, atpH, atpI
Photosynthesis related genes | cytochrome b/f complex | petA, petB *, petD *, petG, petL, petN
Photosynthesis related genes | cytochrome c synthesis | ccsA
Photosynthesis related genes | NADPH dehydrogenase | ndhA *, ndhB *, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ
Transcription and translation related genes | transcription | rpoA, rpoB, rpoC1 *, rpoC2
Transcription and translation related genes | ribosomal proteins | rps2, rps3, rps4, rps7, rps8, rps11, rps12 *, rps14, rps15, rps16 *, rps18, rps19, rpl2 *, rpl14, rpl16 *, rpl20, rpl22, rpl23, rpl32, rpl33, rpl36
Transcription and translation related genes | translation initiation factor | infA
RNA genes | ribosomal RNA | rrn5, rrn4.5, rrn16, rrn23
RNA genes | transfer RNA | trnA-UGC *, trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnG-UCC *, trnG-GCC *, trnH-GUG, trnI-CAU, trnI-GAU *, trnK-UUU *, trnL-CAA, trnL-UAA *, trnL-UAG, trnfM-CAU, trnM-CAU, trnN-GUU, trnP-UGG, trnQ-UUG, trnR-ACG, trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnT-UGU, trnV-GAC, trnV-UAC *, trnW-CCA, trnY-GUA
Other genes | RNA processing | matK
Other genes | carbon metabolism | cemA
Other genes | fatty acid synthesis | accD
Other genes | proteolysis | clpP *
Genes of unknown function | conserved reading frames | ycf1, ycf2, ycf15, ndhK

* indicates intron-containing genes.

Table 2

Genes with introns within the F. suspensa chloroplast genome and the length of exons and introns.

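Gene | Location | Exon I (bp) | Intron I (bp) | Exon II (bp) | Intron II (bp) | Exon III (bp)
trnA-UGC | IR | 38 | 814 | 35
trnG-GCC | LSC | 24 | 676 | 48
trnI-GAU | IR | 42 | 942 | 35
trnK-UUU | LSC | 38 | 2494 | 37
trnL-UAA | LSC | 37 | 473 | 50
trnV-UAC | LSC | 38 | 572 | 37
rps12 * | LSC | 114 | - | 231 | 536 | 27
rps16 | LSC | 40 | 864 | 227
atpF | LSC | 144 | 705 | 411
rpoC1 | LSC | 445 | 758 | 1619
ycf3 | LSC | 129 | 714 | 228 | 737 | 153
clpP | LSC | 69 | 815 | 291 | 642 | 228
petB | LSC | 6 | 707 | 642
petD | LSC | 8 | 713 | 475
rpl16 | LSC | 9 | 865 | 399
rpl2 | IR | 393 | 664 | 435
ndhB | IR | 777 | 679 | 756
ndhA | SSC | 555 | 1106 | 531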

* The rps12 is a trans-spliced gene with the 5′ end located in the LSC region and the duplicated 3′ end in the IR regions.

2.2. Comparison to Other Lamiales Species

The IR regions are highly conserved and play an important role in stabilizing the cp genome structure [20,21]. For IR and SC boundary regions, their expansion and contraction are commonly considered as the main mechanism behind the length variation of angiosperm cp genomes [22,23]. In this study, we compared the junctions of LSC/IRb/SSC/IRa of the seven Lamiales cp genomes (Figure 2), and also observed the expansions and contractions in IR boundary regions.
Figure 2

Comparisons of LSC, SSC, and IR region borders among seven Lamiales chloroplast genomes. Ψ indicates a pseudogene. Color coding indicates the different genes on either side of the junctions. Numbers above the gene features indicate the distance between the ends of genes and the junction sites. The arrows indicate the location of the distance. This figure is not to scale.

The rps19 genes of four Oleaceae species were all completely located in the LSC region, and the IR region expanded to the rps19 gene in the other three genomes, with a short rps19 pseudogene of 43 bp, 30 bp, and 40 bp created at the IRa/LSC border in S. miltiorrhiza, S. indicum, and S. takesimensis, respectively. The border between the IRb and SSC extended into the ycf1 genes, with ycf1 pseudogenes created in all of the seven species. The length of the ycf1 pseudogene was very similar in four of the Oleaceae species (1091 or 1092 bp), and was longer than that in S. miltiorrhiza (1056 bp), S. indicum (1012 bp), and S. takesimensis (886 bp). Overlaps were detected between the ycf1 pseudogene and the ndhF gene in five cp genomes (except for S. indicum and S. takesimensis), which also had similar lengths (25 or 26 bp) in four Oleaceae species. The trnH-GUG genes were all located in the LSC region, the distance of which from the LSC/IRa boundary was 3–22 bp. Overall, the IR/SC junctions of the Oleaceae species were similar and showed some difference compared to those of Lamiaceae (S. miltiorrhiza), Pedaliaceae (S. indicum), and Scrophulariaceae (S. takesimensis). Our results suggested that the cp genomes of closely related species might be conserved, whereas greater diversity might occur among species belonging to different families, such as one inverted repeat loss in the cp genome of Astragalus membranaceus [24] and the large inversions in Eucommia ulmoides [25].

2.3. Codon Usage Analysis

Synonymous codons often have different usage frequencies in plant genomes, a phenomenon termed codon usage bias. A variety of evolutionary factors that affect gene mutation and selection may lead to codon bias [26,27]. To examine codon usage, the effective number of codons (Nc) of 52 protein-coding genes (PCGs) was calculated. The Nc values for each PCG in F. suspensa are shown in Table S2. The Nc values ranged from 37.83 (rps14) to 54.75 (ycf3) across the selected PCGs. Most Nc values were greater than 44, which suggested a weak codon bias in the F. suspensa cp genome. The rps14 gene showed the most biased codon usage, with the lowest Nc value of 37.83. Table 3 shows the codon usage and relative synonymous codon usage (RSCU). Thirty codons had RSCU values > 1, indicating preferential usage in the F. suspensa cp genes. Interestingly, 29 of these 30 codons were A- or T-ending. Conversely, the G- or C-ending codons generally exhibited the opposite pattern (RSCU values < 1), indicating that they are less common in F. suspensa cp genes. Stop codon usage was biased toward TAA. Similar codon usage bias toward A- or T-ending codons has also been found in poplar, rice, and other plants [28,29,30].
Table 3

The relative synonymous codon usage of the Forsythia suspensa chloroplast genome.

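Amino Acid | Codon | Number | RSCU | AA Frequency
Phe | UUU | 779 | 1.32 | 5.59%
Phe | UUC | 405 | 0.68
Leu | UUA | 720 | 1.93 | 10.56%
Leu | UUG | 451 | 1.21
Leu | CUU | 486 | 1.30
Leu | CUC | 129 | 0.35
Leu | CUA | 301 | 0.81
Leu | CUG | 150 | 0.40
Ile | AUU | 890 | 1.47 | 8.57%
Ile | AUC | 377 | 0.62
Ile | AUA | 548 | 0.91
Met | AUG | 495 | 1.00 | 2.34%
Val | GUU | 423 | 1.48 | 5.41%
Val | GUC | 126 | 0.44
Val | GUA | 447 | 1.56
Val | GUG | 151 | 0.53
Tyr | UAU | 631 | 1.61 | 3.70%
Tyr | UAC | 152 | 0.39
TER | UAA | 28 | 1.62 | 0.25%
TER | UAG | 10 | 0.58
TER | UGA | 14 | 0.81
His | CAU | 404 | 1.58 | 2.42%
His | CAC | 108 | 0.42
Gln | CAA | 595 | 1.52 | 3.69%
Gln | CAG | 186 | 0.48
Asn | AAU | 796 | 1.56 | 4.81%
Asn | AAC | 224 | 0.44
Lys | AAA | 837 | 1.54 | 5.15%
Lys | AAG | 253 | 0.46
Asp | GAU | 690 | 1.59 | 4.09%
Asp | GAC | 176 | 0.41
Trp | UGG | 386 | 1.00 | 1.82%
Ser | UCU | 472 | 1.76 | 7.59%
Ser | UCC | 247 | 0.92
Ser | UCA | 307 | 1.15
Ser | UCG | 152 | 0.57
Ser | AGU | 339 | 1.26
Ser | AGC | 91 | 0.34
Pro | CCU | 351 | 1.55 | 4.26%
Pro | CCC | 170 | 0.75
Pro | CCA | 269 | 1.19
Pro | CCG | 113 | 0.50
Thr | ACU | 430 | 1.63 | 4.98%
Thr | ACC | 201 | 0.76
Thr | ACA | 324 | 1.23
Thr | ACG | 100 | 0.38
Ala | GCU | 526 | 1.84 | 5.41%
Ala | GCC | 177 | 0.62
Ala | GCA | 328 | 1.14
Ala | GCG | 115 | 0.40
Cys | UGU | 171 | 1.53 | 1.05%
Cys | UGC | 52 | 0.47
Arg | CGU | 275 | 1.30 | 6.00%
Arg | CGC | 90 | 0.42
Arg | CGA | 284 | 1.34
Arg | CGG | 97 | 0.46
Arg | AGA | 392 | 1.85
Arg | AGG | 133 | 0.63
Gly | GGU | 493 | 1.33 | 7.00%
Gly | GGC | 145 | 0.39
Gly | GGA | 594 | 1.60
Gly | GGG | 251 | 0.68
Glu | GAA | 866 | 1.54 | 5.32%
Glu | GAG | 262 | 0.46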

The value of relative synonymous codon usage (RSCU) > 1 are highlighted in bold.

The factors affecting codon usage may vary among genes or species. In a related study, Zhou et al. [30] considered genomic nucleotide mutation bias to be a main cause of codon bias in seed plants such as Arabidopsis and poplar. Morton [31] reported that cp gene codon usage was largely affected by the asymmetric mutation of cp DNA in Euglena gracilis. Our results suggest that the low GC content and the codon usage bias toward A- or T-ending codons may be major factors in the cp gene codon usage of F. suspensa. The 52 unique PCGs comprised 63,555 bp encoding 21,185 codons. The amino acid (AA) frequencies of the F. suspensa cp genome were further computed. Of these codons, 2237 (10.56%) encode leucine, the most frequently used AA in the F. suspensa cp genome (Table 3). Cysteine was the least common, encoded by only 223 (1.05%) codons.

2.4. Repeats and Simple Sequence Repeats Analysis

Repeat sequences in the F. suspensa cp genome were analyzed by REPuter and the results showed that there were no complement repeats and reverse repeats. Twenty-six forward repeats and 23 palindrome repeats were detected with lengths ≥ 30 bp (identity > 90%) (Table 4). Out of the 49 repeats, 34 repeats (69.4%) were 30–39 bp long, 11 repeats (22.4%) were 40–49 bp long, four repeats (8.2%) were 50–59 bp long, and the longest repeat was 58 bp. Generally, repeats were mostly distributed in noncoding regions [32,33]; however, 53.1% of the repeats in the F. suspensa cp genome were located in coding regions (CDS) (Figure 3A), mainly in ycf2; similar to that of S. dentata and S. takesimensis [34]. Meanwhile, 40.8% of repeats were located in intergenic spacers (IGS) and introns, and 6.1% of repeats were in parts of the IGS and CDS.
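The length distribution quoted above can be reproduced from the repeat sizes listed in Table 4. The following is a small illustrative tally, not the REPuter analysis itself; the two size lists were transcribed by hand from the table, and the helper name `bin_label` is introduced here only for illustration.

```python
from collections import Counter

# Repeat sizes (bp) transcribed from Table 4: rows 1-26 (forward), rows 27-49 (palindrome)
forward = [30] * 10 + [32] * 4 + [34] * 2 + [35] + [39] * 2 + [41] * 2 + [42] * 2 + [44] + [58] * 2
palindrome = [30] * 10 + [34] * 4 + [39] + [41] + [42] * 2 + [44] * 3 + [58] * 2

def bin_label(size):
    """Return the 10-bp length class of a repeat, e.g. 34 -> '30-39'."""
    low = size // 10 * 10
    return f"{low}-{low + 9}"

sizes = forward + palindrome
bins = Counter(bin_label(s) for s in sizes)
for label in sorted(bins):
    print(label, bins[label], f"{bins[label] / len(sizes):.1%}")
```

The tally gives 34, 11, and 4 repeats in the three length classes, which matches the percentages reported in the paragraph.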
Table 4

Repetitive sequences of Forsythia suspensa calculated using REPuter.

No. | Size/bp | Type # | Repeat 1 Start (Location) | Repeat 2 Start (Location) | Region
1 | 30 | F | 10,814 (trnG-GCC *) | 38,746 (trnG-UCC) | LSC
2 | 30 | F | 17,447 (rps2-rpoC2) | 17,448 (rps2-rpoC) | LSC
3 | 30 | F | 44,547 (psaA-ycf3) | 44,550 (psaA-ycf3) | LSC
4 | 30 | F | 45,978 (ycf3 intron2) | 101,338 (rps12_3end-trnV-GAC) | LSC, IRa
5 | 30 | F | 91,923 (ycf2) | 91,965 (ycf2) | IRa
6 | 30 | F | 110,167 (rrn4.5-rrn5) | 110,198 (rrn4.5-rrn5) | IRa
7 | 30 | F | 133,335 (rrn5-rrn4.5) | 133,366 (rrn5-rrn4.5) | IRb
8 | 30 | F | 149,178 (ycf2) | 149,214 (ycf2) | IRb
9 | 30 | F | 149,196 (ycf2) | 149,214 (ycf2) | IRb
10 | 30 | F | 151,568 (ycf2) | 151,610 (ycf2) | IRb
11 | 32 | F | 9313 (trnS-GCU *) | 37,781 (psbC-trnS-UGA *) | LSC
12 | 32 | F | 40,965 (psaB) | 43,189 (psaA) | LSC
13 | 32 | F | 53,338 (ndhC-trnV-UAC) | 53,358 (ndhC-trnV-UAC) | LSC
14 | 32 | F | 115,350 (ndhF-rpl32) | 115,378 (ndhF-rpl32) | SSC
15 | 34 | F | 94,332 (ycf2) | 94,368 (ycf2) | IRa
16 | 34 | F | 94,350 (ycf2) | 94,368 (ycf2) | IRa
17 | 35 | F | 149,188 (ycf2) | 149,206 (ycf2) | IRb
18 | 39 | F | 45,966 (ycf3 intron2) | 101,326 (rps12_3end-trnV-GAC) | LSC, IRa
19 | 39 | F | 45,966 (ycf3 intron2) | 122,604 (ndhA intron1) | LSC, SSC
20 | 41 | F | 40,953 (psaB) | 43,177 (psaA) | LSC
21 | 41 | F | 101,324 (rps12_3end-trnV-GAC) | 122,602 (ndhA intron) | IRa, SSC
22 | 42 | F | 94,320 (ycf2) | 94,356 (ycf2) | IRa
23 | 42 | F | 149,165 (ycf2) | 149,201 (ycf2) | IRb
24 | 44 | F | 94,340 (ycf2) | 94,358 (ycf2) | IRa
25 | 58 | F | 94,332 (ycf2) | 94,340 (ycf2) | IRa
26 | 58 | F | 149,165 (ycf2) | 149,183 (ycf2) | IRb
27 | 30 | P | 9315 (trnS-GCU *) | 47,653 (trnS-GGA) | LSC
28 | 30 | P | 14,359 (atpF-atpH) | 14,359 (atpF-atpH) | LSC
29 | 30 | P | 34,338 (trnT-GGU-psbD) | 34,338 (trnT-GGU-psbD) | LSC
30 | 30 | P | 37,783 (psbC-trnS-UGA *) | 47,653 (trnS-GGA) | LSC
31 | 30 | P | 45,978 (ycf3 intron2) | 142,195 (trnV-GAC-rps12_3end) | LSC, IRb
32 | 30 | P | 91,923 (ycf2) | 151,568 (ycf2) | IRa, IRb
33 | 30 | P | 91,965 (ycf2) | 151,610 (ycf2) | IRa, IRb
34 | 30 | P | 110,167 (rrn4.5-rrn5) | 133,335 (rrn5-rrn4.5) | IRa, IRb
35 | 30 | P | 110,198 (rrn4.5-rrn5) | 133,366 (rrn5-rrn4.5) | IRa, IRb
36 | 30 | P | 122,764 (ndhA intron1) | 122,766 (ndhA intron1) | SSC
37 | 34 | P | 94,332 (ycf2) | 149,161 (ycf2) | IRa, IRb
38 | 34 | P | 94,350 (ycf2) | 149,161 (ycf2) | IRa, IRb
39 | 34 | P | 94,368 (ycf2) | 149,179 (ycf2) | IRa, IRb
40 | 34 | P | 94,368 (ycf2) | 149,179 (ycf2) | IRa, IRb
41 | 39 | P | 45,966 (ycf3 intron2) | 45,966 (ycf3 intron2) | LSC, IRb
42 | 41 | P | 122,602 (ndhA intron1) | 142,198 (trnV-GAC-rps12_3end) | SSC, IRb
43 | 42 | P | 94,320 (ycf2) | 149,165 (ycf2) | IRa, IRb
44 | 42 | P | 94,356 (ycf2) | 149,201 (ycf2) | IRa, IRb
45 | 44 | P | 77,475 (psbT-psbN) | 77,475 (psbT-psbN) | LSC
46 | 44 | P | 94,340 (ycf2) | 149,161 (ycf2) | IRa, IRb
47 | 44 | P | 94,358 (ycf2) | 149,179 (ycf2) | IRa, IRb
48 | 58 | P | 94,332 (ycf2) | 149,165 (ycf2) | IRa, IRb
49 | 58 | P | 94,340 (ycf2) | 149,183 (ycf2) | IRa, IRb

# F: forward; P: palindrome; * part in the gene.

Figure 3

Distribution of repeat sequence and simple sequence repeats (SSRs) within F. suspensa chloroplast genomes. (A) Distribution of repeats; and (B) distribution of SSRs. IGS: intergenic spacer.

Simple sequence repeats (SSRs) are widely distributed across the entire genome and exert significant influence on genome recombination and rearrangement [35]. As valuable molecular markers, SSRs have been used in polymorphism investigations and population genetics [36,37]. The occurrence, type, and distribution of SSRs were analyzed in the F. suspensa cp genome. In total, we detected 54 SSRs in the F. suspensa cp genome (Table 5), accounting for 700 bp of the total sequence (0.45%). The majority of these SSRs were mono- and di-nucleotide repeats, which were found 35 and seven times, respectively. Tri- (1), tetra- (4), and penta-nucleotide (1) repeats were detected at much lower frequencies. Six compound SSRs were also found. Fifty SSRs (92.6%) were composed of A and T nucleotides, while tandem G or C repeats were quite rare, in agreement with other studies [38,39]. Of these SSRs, 42 (77.8%) and six (11.1%) were located in IGS and introns, respectively, together accounting for 88.9% of all SSRs (Figure 3B). Only five SSRs were found in coding genes, including rpoC2, rpoA, and ndhD, and one was located partly in the IGS and partly in the CDS. In addition, almost all SSRs were located in the LSC region, except for (T)19, and no SSRs were detected in the IR regions. These SSRs may be developed into lineage-specific markers, which might be useful in evolutionary and genetic diversity studies.
Table 5

Distribution of SSR loci in the chloroplast genome of Forsythia suspensa.

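SSR Type # | SSR Sequence | Size | Start | SSR Location | Region
p1 | (A)10 | 10 | 31,855 | psbM-trnD-GUC | LSC
p1 | (A)10 | 10 | 31,992 | psbM-trnD-GUC | LSC
p1 | (A)10 | 10 | 38,025 | trnS-UGA-psbZ | LSC
p1 | (A)10 | 10 | 73,886 | clpP intron1 | LSC
p1 | (A)10 | 10 | 85,390 | rpl16 intron | LSC
p1 | (T)10 | 10 | 507 | trnH-GUG-psbA | LSC
p1 | (T)10 | 10 | 9056 | psbK-psbI | LSC
p1 | (T)10 | 10 | 11,162 | trnR-UCU-atpA | LSC
p1 | (T)10 | 10 | 59,781 | rbcL-accD | LSC
p1 | (T)10 | 10 | 66,291 | petA-psbJ | LSC
p1 | (T)10 | 10 | 69,202 | petL-petG | LSC
p1 | (C)10 | 10 | 5236 | trnK-UUU-rps16 | LSC
p1 | (T)11 | 11 | 19,678 | rpoC2 | LSC
p1 | (T)11 | 11 | 50,871 | trnF-GAA-ndhJ | LSC
p1 | (T)11 | 11 | 61,662 | accD-psaI | LSC
p1 | (T)11 | 11 | 72,263 | rpl20-clpP | LSC
p1 | (T)11 | 11 | 74,741 | clpP intron2 | LSC
p1 | (T)12 | 12 | 20,216 | rpoC2 | LSC
p1 | (T)12 | 12 | 81,254 | rpoA | LSC
p1 | (T)12 | 12 | 83,666 | rps8-rpl14 | LSC
p1 | (A)13 | 13 | 12,741 | atpA-atpF | LSC
p1 | (A)13 | 13 | 46,877 | ycf3-trnS-GGA | LSC
p1 | (T)13 | 13 | 14,109 | atpF-atpH | LSC
p1 | (T)13 | 13 | 34,486 | trnT-GGU-psbD | LSC
p1 | (T)13 | 13 | 37,645 | psbC-trnS-UGA | LSC
p1 | (T)13 | 13 | 86,860 | rpl22-rps19 | LSC
p1 | (T)14 | 14 | 48,630 | rps4-trnT-UGU | LSC
p1 | (A)15 | 15 | 33,163 | trnE-UUC-trnT-GGU | LSC
p1 | (A)16 | 16 | 46,618 | ycf3 intron2 | LSC
p1 | (A)19 | 19 | 44,559 | psaA-ycf3 | LSC
p1 | (T)19 | 19 | 117,928 | ndhD | SSC
p1 | (A)20 | 20 | 29,957 | trnC-GCA-petN | LSC
p2 | (AT)5 | 10 | 4646 | trnK-UUU-rps16 | LSC
p2 | (AT)5 | 10 | 6558 | rps16-trnQ-UUG | LSC
p2 | (AT)5 | 10 | 21,057 | rpoC2 | LSC
p2 | (TA)5 | 10 | 69,619 | trnW-CCA-trnP-UGG | LSC
p2 | (TA)6 | 12 | 48,772 | rps4-trnT-UGU | LSC
p2 | (TA)6 | 12 | 49,291 | trnT-UGU-trnL-UAA | LSC
p2 | (TA)6 | 12 | 69,931 | trnP-UGG-psaJ | LSC
p3 | (CCT)4 | 12 | 69,371 | petG-trnW-CCA | LSC
p4 | (AAAG)3 | 12 | 73,413 | clpP intron1 | LSC
p4 | (TCTT)3 | 12 | 31,191 | petN-psbM | LSC
p4 | (TTTA)3 | 12 | 55,102 | trnM-CAU-atpE | LSC
p4 | (AAAT)4 | 16 | 9284 | psbI-trnS-GCU | LSC
p5 | (TCTAT)3 | 15 | 9458 | trnS-GCU-trnG-GCC | LSC
c | - | 23 | 17,456 | rps2-rpoC2 | LSC
c | - | 27 | 63,589 | ycf4-cemA | LSC
c | - | 33 | 78,324 | petB intron | LSC
c | - | 45 | 71,570 | rps18-rpl20 | LSC
c | - | 59 | 38,501 | psbZ-trnG-UCC | LSC
c | - | 90 | 57,078 | atpB * | LSC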

# p1: mono-nucleotide; p2: di-nucleotide; p3: tri-nucleotide; p4: tetra-nucleotide; p5: penta-nucleotide; c: compound; * part in the gene.

2.5. Predicted RNA Editing Sites in the F. suspensa Chloroplast Genes

In the F. suspensa cp genome, we predicted 52 RNA editing sites, which occurred in 21 genes (Table 6). The ndhB gene contained the most editing sites (10), and this finding was consistent with other plants such as rice, maize, and tomato [40,41,42]. Meanwhile, the genes ndhD and rpoB were each predicted to have six editing sites; matK, five; rpoC2, three; accD, ndhA, ndhF, ndhG, petB, and rps14, two each; and atpA, atpF, atpI, ccsA, petG, psbE, rpl2, rpl20, rpoA, and rps2, one each. All of these editing sites were C-to-U transitions. RNA editing is also commonly found in the chloroplasts and mitochondria of seed plants [43]. The numbers of editing sites at the first, second, and third codon positions were 14, 38, and 0, respectively. Of the 52 sites, twenty were of the U_A type, showing a codon bias similar to that reported in previous studies of RNA editing sites [10,44]. In addition, forty-eight RNA editing events in the F. suspensa cp genome led to amino acid changes to highly hydrophobic residues, such as leucine, isoleucine, valine, tryptophan, and tyrosine. Conversions from serine to leucine were the most frequent. This feature of RNA editing, a form of post-transcriptional regulation of gene expression, has already been reported in most RNA editing studies [44]. Notably, our results provide additional evidence to support the above conclusion.
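The relationship between an editing site's position in the coding sequence, its amino acid position, and its position within the codon is simple integer arithmetic. The sketch below assumes that the "Codon Position" column of Table 6 holds the 1-based nucleotide position within the coding sequence, which is consistent with the amino acid positions listed there; the function name `locate_edit` is hypothetical.

```python
def locate_edit(nt_pos):
    """Map a 1-based nucleotide position in a CDS to (codon number, position within codon)."""
    codon_number = (nt_pos - 1) // 3 + 1
    position_in_codon = (nt_pos - 1) % 3 + 1
    return codon_number, position_in_codon

# Three sites taken from Table 6
for gene, nt_pos in [("accD", 794), ("matK", 271), ("ndhD", 2)]:
    print(gene, nt_pos, locate_edit(nt_pos))
```

For example, accD position 794 maps to codon 265, second position, in line with the uCg => uUg conversion listed for that site, while matK position 271 maps to codon 91, first position (Ccu => Ucu).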
Table 6

The predicted RNA editing site in the Forsythia suspensa chloroplast genes.

Gene | Codon Position | Amino Acid Position | Codon (Amino Acid) Conversion | Score
accD | 794 | 265 | uCg (S) => uUg (L) | 0.8
accD | 1403 | 468 | cCu (P) => cUu (L) | 1
atpA | 914 | 305 | uCa (S) => uUa (L) | 1
atpF | 92 | 31 | cCa (P) => cUa (L) | 0.86
atpI | 629 | 210 | uCa (S) => uUa (L) | 1
ccsA | 71 | 24 | aCu (T) => aUu (I) | 1
matK | 271 | 91 | Ccu (P) => Ucu (S) | 0.86
matK | 460 | 154 | Cac (H) => Uac (Y) | 1
matK | 646 | 216 | Cau (H) => Uau (Y) | 1
matK | 1180 | 394 | Cgg (R) => Ugg (W) | 1
matK | 1249 | 417 | Cau (H) => Uau (Y) | 1
ndhA | 344 | 115 | uCa (S) => uUa (L) | 1
ndhA | 569 | 190 | uCa (S) => uUa (L) | 1
ndhB | 149 | 50 | uCa (S) => uUa (L) | 1
ndhB | 467 | 156 | cCa (P) => cUa (L) | 1
ndhB | 586 | 196 | Cau (H) => Uau (Y) | 1
ndhB | 611 | 204 | uCa (S) => uUa (L) | 0.8
ndhB | 737 | 246 | cCa (P) => cUa (L) | 1
ndhB | 746 | 249 | uCu (S) => uUu (F) | 1
ndhB | 830 | 277 | uCa (S) => uUa (L) | 1
ndhB | 836 | 279 | uCa (S) => uUa (L) | 1
ndhB | 1292 | 431 | uCc (S) => uUc (F) | 1
ndhB | 1481 | 494 | cCa (P) => cUa (L) | 1
ndhD | 2 | 1 | aCg (T) => aUg (M) | 1
ndhD | 47 | 16 | uCu (S) => uUu (F) | 0.8
ndhD | 313 | 105 | Cgg (R) => Ugg (W) | 0.8
ndhD | 878 | 293 | uCa (S) => uUa (L) | 1
ndhD | 1298 | 433 | uCa (S) => uUa (L) | 0.8
ndhD | 1310 | 437 | uCa (S) => uUa (L) | 0.8
ndhF | 290 | 97 | uCa (S) => uUa (L) | 1
ndhF | 671 | 224 | uCa (S) => uUa (L) | 1
ndhG | 314 | 105 | aCa (T) => aUa (I) | 0.8
ndhG | 385 | 129 | Cca (P) => Uca (S) | 0.8
petB | 418 | 140 | Cgg (R) => Ugg (W) | 1
petB | 611 | 204 | cCa (P) => cUa (L) | 1
petG | 94 | 32 | Cuu (L) => Uuu (F) | 0.86
psbE | 214 | 72 | Ccu (P) => Ucu (S) | 1
rpl2 | 596 | 199 | gCg (A) => gUg (V) | 0.86
rpl20 | 308 | 103 | uCa (S) => uUa (L) | 0.86
rpoA | 830 | 277 | uCa (S) => uUa (L) | 1
rpoB | 338 | 113 | uCu (S) => uUu (F) | 1
rpoB | 551 | 184 | uCa (S) => uUa (L) | 1
rpoB | 566 | 189 | uCg (S) => uUg (L) | 1
rpoB | 1672 | 558 | Ccc (P) => Ucc (S) | 0.86
rpoB | 2000 | 667 | uCu (S) => uUu (F) | 1
rpoB | 2426 | 809 | uCa (S) => uUa (L) | 0.86
rpoC2 | 1792 | 598 | Cgu (R) => Ugu (C) | 0.86
rpoC2 | 2305 | 769 | Cgg (R) => Ugg (W) | 1
rpoC2 | 3746 | 1249 | uCa (S) => uUa (L) | 0.86
rps2 | 248 | 83 | uCa (S) => uUa (L) | 1
rps14 | 80 | 27 | uCa (S) => uUa (L) | 1
rps14 | 149 | 50 | cCa (P) => cUa (L) | 1

2.6. Phylogeny Reconstruction of Lamiales Based on Complete Chloroplast Genome Sequences

Complete cp genomes comprise abundant phylogenetic information, which can be applied to phylogenetic studies of angiosperms [11,45,46]. To identify the evolutionary position of F. suspensa within Lamiales, an improved resolution of phylogenetic relationships was achieved by using the whole cp genome sequences of 36 Lamiales species. Three species, C. arabica, I. purpurea, and O. nivara, were also chosen as outgroups. The maximum likelihood (ML) bootstrap values were fairly high, with values ≥ 98% for 32 of the 36 nodes, and 30 nodes had 100% bootstrap support (Figure 4). F. suspensa, whose cp genome was reported in this study, was closely related to A. distichum, and together they formed a cluster with H. palmeri, J. nudiflorum, and the Olea species from Oleaceae with 100% bootstrap support. Notably, Oleaceae diverged relatively early from the Lamiales lineage. In addition, four phylogenetic relationships were supported only by lower ML bootstrap values. This was possibly a result of fewer samples in these families. The cp genome is also expected to be useful in resolving the deeper branches of the phylogeny as more whole genome sequences become available.
Figure 4

Maximum likelihood phylogeny of the Lamiales species inferred from complete chloroplast genome sequences. Numbers near branches are bootstrap values from 100 pseudo-replicates. The tree in the right panel was constructed manually by reference to the left one, and its branch lengths are not meaningful. Branches without numbers have 100% bootstrap support.

3. Materials and Methods

3.1. Plant Materials

Samples of F. suspensa were collected in Zezhou County, Shanxi Province, China. The voucher specimens were deposited in the Herbarium of Shanxi Agricultural University, Taigu, China. Additionally, the location of the specimens was not within any protected area.

3.2. DNA Library Preparation, Sequencing, and Genome Assembly

Genomic DNA was extracted from fresh young leaves of the F. suspensa plant using the mCTAB method [47]. Genomic DNA was fragmented into 400–600 bp using a Covaris M220 Focused-ultrasonicator (Covaris, Woburn, MA, USA). Library preparation was conducted using the NEBNext® Ultra™ DNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, MA, USA). Sample sequencing was carried out on an Illumina Hiseq 4000 PE150 platform. Next, raw sequence reads were assembled into contigs using SPAdes [48], CLC Genomics Workbench 8 (Available online: http://www.clcbio.com), and SOAPdenovo2 [49]. Chloroplast genome contigs were selected by BLAST (Available online: http://blast.ncbi.nlm.nih.gov/) [50] and were assembled by Sequencher 4.10 (Available online: http://genecodes.com/). All reads were mapped to the cp genome using Geneious 8.1 [51], which verified the selected contigs. Gaps were closed by designing specific primers, followed by PCR amplification and Sanger sequencing. Finally, we obtained a high-quality complete F. suspensa cp genome, and the result was submitted to NCBI (Accession Number: MF579702).

3.3. Genome Annotation and Comparative Genomics

Chloroplast genome annotation was performed using DOGMA (Dual Organellar GenoMe Annotator) [52] (Available online: http://dogma.ccbb.utexas.edu). Putative protein-coding genes, tRNAs, and rRNAs were identified by BLASTX and BLASTN searches (Available online: http://blast.ncbi.nlm.nih.gov/). The cp genome map was drawn using OrganellarGenomeDRAW [53] (Available online: http://ogdraw.mpimp-golm.mpg.de/index.shtml), with subsequent manual editing. The boundaries between the IR and SC regions of F. suspensa and six other Lamiales species were compared and analyzed.

3.4. Repeat Sequence Analyses

The REPuter program [54] (Available online: https://bibiserv.cebitec.uni-bielefeld.de/reputer) was used to identify repeats including forward, reverse, palindrome, and complement sequences. The length and identity of the repeats were limited to ≥30 bp and >90%, respectively, with the Hamming distance equal to 3 [55,56]. The cp SSRs were detected using MISA [57] with the minimum repeats of mono-, di-, tri-, tetra-, penta-, and hexanucleotides set to 10, 5, 4, 3, 3, and 3, respectively.
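To illustrate what the SSR thresholds above mean in practice, the following is a minimal regex-based sketch of perfect-SSR detection. It is not MISA and does not reproduce its output; in particular it does not merge neighbouring repeats into compound SSRs. The function name `find_ssrs` and the test sequence are hypothetical, and only the minimum repeat counts are taken from the text.

```python
import re

# Minimum repeat counts per motif length, as stated for MISA in this section
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}

def find_ssrs(seq):
    """Return (motif, repeat_count, 1-based start) for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least (min_rep - 1) more copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves repeats of a shorter unit, e.g. "AA" or "TATA"
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((motif, len(m.group(0)) // unit_len, m.start() + 1))
    return sorted(hits, key=lambda h: h[2])

# Hypothetical test sequence containing (A)12, (TA)6, and (AAAG)3
example = "GGC" + "A" * 12 + "CGTC" + "TA" * 6 + "GCC" + "AAAG" * 3 + "TT"
for motif, count, start in find_ssrs(example):
    print(f"({motif}){count} at position {start}")
```

The shorter-unit filter keeps a run such as (A)12 from also being reported as (AA)6 or (AAAA)3, so each perfect repeat is listed once under its smallest motif.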

3.5. Codon Usage

To ensure sampling accuracy, only 52 PCGs with a length >300 bp were selected for synonymous codon usage analysis. Two relevant parameters, Nc and RSCU, were calculated using the program CodonW1.4.2 (Available online: http://downloads.fyxm.net/CodonW-76666.html). Nc is often utilized to evaluate the codon bias at the individual gene level, in a range from 20 (extremely biased) to 61 (totally unbiased) [58]. RSCU is the observed frequency of a codon divided by the expected frequency. The values close to 1.0 indicate a lack of bias [59]. AA frequency was also calculated and expressed by the percentage of the codons encoding the same amino acid divided by the total codons.
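The RSCU and AA frequency definitions above can be made concrete with a short calculation. This sketch is not CodonW; it simply applies the definitions to the codon counts of two amino acids taken from Table 3, taking the expected count of a codon to be the family total divided by the number of synonymous codons. The function name `rscu` is introduced here for illustration.

```python
# Codon counts for two amino acids, taken from Table 3
counts = {
    "Phe": {"UUU": 779, "UUC": 405},
    "Leu": {"UUA": 720, "UUG": 451, "CUU": 486, "CUC": 129, "CUA": 301, "CUG": 150},
}
TOTAL_CODONS = 21_185  # codons in the 52 analysed PCGs (Section 2.3)

def rscu(codon_counts):
    """RSCU = observed count / (family total / number of synonymous codons)."""
    expected = sum(codon_counts.values()) / len(codon_counts)
    return {codon: n / expected for codon, n in codon_counts.items()}

for aa, codon_counts in counts.items():
    freq = sum(codon_counts.values()) / TOTAL_CODONS
    values = ", ".join(f"{c}={v:.2f}" for c, v in rscu(codon_counts).items())
    print(f"{aa}: {freq:.2%} of codons; RSCU {values}")
```

With these inputs, the RSCU values and AA frequencies for Phe and Leu agree with the corresponding entries in Table 3, and the RSCU values within each family sum to the number of synonymous codons.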

3.6. Prediction of RNA Editing Sites

Prep-Cp [60] (Available online: http://prep.unl.edu/) and CURE software [61] (Available online: http://bioinfo.au.tsinghua.edu.cn/pure/) were applied to the prediction of RNA editing sites, and the parameter threshold (cutoff value) was set to 0.8 to ensure prediction accuracy.

3.7. Phylogenomic Analyses

ML phylogenetic analyses were performed using the F. suspensa complete cp genome and 32 Lamiales plastomes with three species, Coffea arabica, Ipomoea purpurea, and Oryza nivara, as outgroups (Table S1). All of the plastome sequences were aligned using MAFFT program version 7.0 [62] (Available online: http://mafft.cbrc.jp/alignment/server/index.html) and adjusted manually where necessary. These plastome nucleotide alignments were subjected to ML phylogenetic analyses with MEGA7.0 [63] based on the General Time Reversible model. A discrete Gamma distribution was used to model evolutionary rate differences among sites. The branch support was estimated by rapid bootstrap analyses using 100 pseudo-replicates.

4. Conclusions

The cp genome of the medicinal plant F. suspensa was reported for the first time in this study, and its organization is described and compared with that of other Lamiales species. This genome is 156,404 bp in length, with a quadripartite structure and genomic content common to most land plant cp genomes. The low GC content of the cp genome might have caused the codon usage bias toward A- or T-ending codons. All of the predicted RNA editing sites in the genome were C-to-U transitions. Among several related species, the genome size and IR expansion or contraction exhibited some differences, and the divergent regions were also analyzed. Repeat sequences and SSRs within F. suspensa were analyzed, which may be useful in developing molecular markers for the analysis of infraspecific genetic differentiation within the genus Forsythia (Oleaceae). Phylogenetic analysis based on the entire cp genome revealed that F. suspensa, as a member of the Oleaceae family, diverged relatively early from Lamiales. Overall, the sequences and annotation of the F. suspensa cp genome will facilitate medicinal resource conservation, as well as molecular phylogenetic and genetic engineering research of this species.
  54 in total

1.  Automatic annotation of organellar genomes with DOGMA.

Authors:  Stacia K Wyman; Robert K Jansen; Jeffrey L Boore
Journal:  Bioinformatics       Date:  2004-06-04       Impact factor: 6.937

Review 2.  The chloroplast genome.

Authors:  M Sugiura
Journal:  Plant Mol Biol       Date:  1992-05       Impact factor: 4.076

3.  Editing of the chloroplast ndhB encoded transcript shows divergence between closely related members of the grass family (Poaceae).

Authors:  R Freyer; C López; R M Maier; M Martín; B Sabater; H Kössel
Journal:  Plant Mol Biol       Date:  1995-11       Impact factor: 4.076

4.  MAFFT multiple sequence alignment software version 7: improvements in performance and usability.

Authors:  Kazutaka Katoh; Daron M Standley
Journal:  Mol Biol Evol       Date:  2013-01-16       Impact factor: 16.240

5.  The complete chloroplast genome sequence of date palm (Phoenix dactylifera L.).

Authors:  Meng Yang; Xiaowei Zhang; Guiming Liu; Yuxin Yin; Kaifu Chen; Quanzheng Yun; Duojun Zhao; Ibrahim S Al-Mssallem; Jun Yu
Journal:  PLoS One       Date:  2010-09-15       Impact factor: 3.240

6.  Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns.

Authors:  Robert K Jansen; Zhengqiu Cai; Linda A Raubeson; Henry Daniell; Claude W Depamphilis; James Leebens-Mack; Kai F Müller; Mary Guisinger-Bellian; Rosemarie C Haberle; Anne K Hansen; Timothy W Chumley; Seung-Bum Lee; Rhiannon Peery; Joel R McNeal; Jennifer V Kuehl; Jeffrey L Boore
Journal:  Proc Natl Acad Sci U S A       Date:  2007-11-28       Impact factor: 11.205

7.  Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data.

Authors:  Matthew Kearse; Richard Moir; Amy Wilson; Steven Stones-Havas; Matthew Cheung; Shane Sturrock; Simon Buxton; Alex Cooper; Sidney Markowitz; Chris Duran; Tobias Thierer; Bruce Ashton; Peter Meintjes; Alexei Drummond
Journal:  Bioinformatics       Date:  2012-04-27       Impact factor: 6.937

8.  The Complete Chloroplast Genome of Ye-Xing-Ba (Scrophularia dentata; Scrophulariaceae), an Alpine Tibetan Herb.

Authors:  Lianghong Ni; Zhili Zhao; Gaawe Dorje; Mi Ma
Journal:  PLoS One       Date:  2016-07-08       Impact factor: 3.240

9.  SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler.

Authors:  Ruibang Luo; Binghang Liu; Yinlong Xie; Zhenyu Li; Weihua Huang; Jianying Yuan; Guangzhu He; Yanxiang Chen; Qi Pan; Yunjie Liu; Jingbo Tang; Gengxiong Wu; Hao Zhang; Yujian Shi; Yong Liu; Chang Yu; Bo Wang; Yao Lu; Changlei Han; David W Cheung; Siu-Ming Yiu; Shaoliang Peng; Zhu Xiaoqian; Guangming Liu; Xiangke Liao; Yingrui Li; Huanming Yang; Jian Wang; Tak-Wah Lam; Jun Wang
Journal:  Gigascience       Date:  2012-12-27       Impact factor: 6.524

10.  Genome-Wide Identification of SSR and SNP Markers Based on Whole-Genome Re-Sequencing of a Thailand Wild Sacred Lotus (Nelumbo nucifera).

Authors:  Jihong Hu; Songtao Gui; Zhixuan Zhu; Xiaolei Wang; Weidong Ke; Yi Ding
Journal:  PLoS One       Date:  2015-11-25       Impact factor: 3.240
