
The Chloroplast Genome of Wild Saposhnikovia divaricata: Genomic Features, Comparative Analysis, and Phylogenetic Relationships.

Shanyong Yi, Haibo Lu, Wei Wang, Guanglin Wang, Tao Xu, Mingzhi Li, Fangli Gu, Cunwu Chen, Bangxing Han, Dong Liu

Abstract

Saposhnikovia divaricata, a well-known Chinese medicinal herb, is the sole species in the genus Saposhnikovia of the Apiaceae subfamily Apioideae Drude. However, information regarding its genetic diversity and evolution is still limited. In this study, the first complete chloroplast genome (cpDNA) of wild S. divaricata was assembled de novo from high-throughput sequencing data. Like that of Ledebouriella seseloides, the 147,834 bp-long S. divaricata cpDNA contained a large single-copy region, a small single-copy region, and two inverted repeat regions. A total of 85 protein-coding, 8 ribosomal RNA, and 36 transfer RNA genes were identified. Compared with five other species, the non-coding regions in the S. divaricata cpDNA exhibited greater variation than the coding regions. Several repeat sequences were also discovered, namely, 33 forward, 14 reverse, and 3 complement repeats, as well as 49 microsatellites. Furthermore, phylogenetic analysis using 47 cpDNA sequences of Apioideae members revealed that L. seseloides and S. divaricata clustered together with a 100% bootstrap value, thereby supporting, at the genomic level, the validity of renaming L. seseloides as S. divaricata. Notably, S. divaricata was most closely related to Libanotis buchtormensis, which contradicts previous reports. Overall, these findings provide a valuable foundation for future studies on the genetic diversity and evolution of S. divaricata.


Keywords:  complete cpDNA sequence; phylogeny; traditional Chinese medicine


Year:  2022        PMID: 35627316      PMCID: PMC9141249          DOI: 10.3390/genes13050931

Source DB:  PubMed          Journal:  Genes (Basel)        ISSN: 2073-4425            Impact factor:   4.141


1. Introduction

Saposhnikovia divaricata (Turcz.) Schischk., the sole species of the genus Saposhnikovia Schischk. in the Apiaceae subfamily Apioideae Drude, is widely distributed in northern China. It is one of the most important and well-known traditional Chinese medicinal plants listed in the Chinese Pharmacopoeia, as well as in several classical pharmaceutical records, such as the Thousand Golden Prescriptions (Qian Jin Fang) and Shen Nong’s Materia Medica (Shen Nong Ben Cao Jing). The dried roots are called Fang-Feng in China, Bang-Poong in Korea, and Bofu in Japan, and have been extensively used to treat arthralgia, headaches, rheumatism, stroke, fever, and allergic rhinitis [1]. Recent studies of the chemical constituents of S. divaricata revealed that the main active components are chromones, coumarins, and volatile oils [2,3,4], which exhibit anti-proliferative, anti-oxidant, anti-bacterial, anti-tumor, anti-convulsant, anti-coagulant, anti-inflammatory, and anti-pyretic properties [4,5,6,7,8]. However, little is known about the genetic diversity and evolution of this species.
The chloroplast is a photosynthetic organelle of algae and plants that provides the energy essential for growth and reproduction and supports the biosynthesis and metabolism of starch and fatty acids [9]. This double-membrane organelle is thought to have originated from the endosymbiosis of a cyanobacterium [10]. Chloroplast genomes (cpDNAs) are maternally inherited in most plants, and the majority of angiosperm cpDNAs are characterized by a small genome size, high copy number, and highly conserved sequences [11,12]. Typically, cpDNAs are closed circular double-stranded DNA molecules with a classic quadripartite structure composed of two inverted repeat regions (IRa and IRb), a small single-copy (SSC) region, and a large single-copy (LSC) region [13].
Angiosperm cpDNAs mainly contain highly conserved protein-coding genes (PCGs), transfer RNA (tRNA) genes, and ribosomal RNA (rRNA) genes [14]. However, the size, structure, and IR boundaries (through IR contraction and expansion) of angiosperm cpDNAs have undergone several alterations during adaptation to changing environments, and pseudogenization has even occurred in some genera [15]. Therefore, cpDNAs can be used to analyze the genetic structure and molecular characteristics of closely related plant species. Sequenced cpDNAs are generally available for download from public databases, such as the National Center for Biotechnology Information (NCBI; https://www.ncbi.nlm.nih.gov/; accessed on 14 September 2020). With the rapid development of next-generation sequencing technology, the cpDNAs of numerous species have been fully sequenced and functionally characterized. However, although the cpDNA sequences of cultivated S. divaricata from China [16,17] and of its synonym Ledebouriella seseloides from South Korea [18] have been reported, little is known about the cpDNA of wild S. divaricata, especially its genetic diversity and its evolutionary relationship with L. seseloides and other related species. Although L. seseloides has already been renamed S. divaricata [19], the former name is still used by some researchers [18]. Furthermore, the cpDNA sequences of L. seseloides and S. divaricata were published separately and have not yet been analyzed together in one study. A genomic analysis of the available data for these species would further test the validity of this renaming at the molecular level. To investigate the genetic characteristics and phylogeny of wild S. divaricata and to provide a molecular basis for the renaming of L. seseloides, we collected wild S. divaricata samples (33°72′ N, 112°02′ E) for high-throughput cpDNA sequencing and conducted an in-depth comparison with L. seseloides and other related species.
Specifically, we aimed to determine the genomic features of the wild S. divaricata cpDNA, to compare it extensively with the cpDNAs of L. seseloides and other Apioideae members, and to identify repeats and simple sequence repeats (SSRs), thereby revealing the unique characteristics of the wild S. divaricata cpDNA. A comparison of the cpDNA sequences of wild S. divaricata and 46 other taxa in the subfamily Apioideae revealed their phylogenetic relationships. The repeats identified in this study may be useful for developing SSR markers to analyze the genetic diversity of Apioideae members and are candidates for DNA barcoding studies. Our findings may provide a foundation for future genomic research on the genetic diversity and evolution of S. divaricata and other related species.

2. Results and Discussion

2.1. Genomic Features of the Wild S. divaricata CpDNA

The 147,834 bp-long wild S. divaricata cpDNA consisted of a 93,202 bp-long LSC region, a 17,324 bp-long SSC region, and a pair of 18,654 bp-long IR regions (Figure 1). The IRa and IRb regions contained the same genes in reverse orientation. The length of the SSC region in the wild S. divaricata cpDNA was similar to those of other herbs (17,000–19,000 bp), but its IR regions were considerably shorter than those of other herbs [20,21]. The GC content of the wild S. divaricata cpDNA was 37.5%, which is consistent with previous reports [16,17]. The GC content of the four regions is useful for exploring species evolution and genetic relationships and is considered an important parameter for evaluating codon preference and evolutionary trends in plants. As in other closely related species, the GC content of the IR regions of the wild S. divaricata cpDNA (44.6%) was higher than those of the LSC (35.9%) and SSC (36.0%) regions (Table S1, Figure 2). In addition, we re-annotated, analyzed, and compared all the reported cpDNA sequences of S. divaricata and its synonym, L. seseloides. The main difference between the two was cpDNA size, whereas the total number of genes and the number of unique genes were the same (Table S2).
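As a quick consistency check, the four region sizes should add up to the genome length (93,202 + 17,324 + 2 × 18,654 = 147,834 bp). The minimal Python sketch below performs that check. It also shows how per-region GC percentages like those in Table S1 could be computed. It assumes the genome is held as a plain string ordered LSC, IRb, SSC, IRa, which is a common convention but an assumption here, and it is not the authors' pipeline.

```python
# Reported region lengths (bp) of the wild S. divaricata cpDNA,
# listed in the assumed LSC -> IRb -> SSC -> IRa order.
REGION_LENGTHS = {"LSC": 93_202, "IRb": 18_654, "SSC": 17_324, "IRa": 18_654}


def total_length(lengths: dict) -> int:
    """Sum of region lengths; for a quadripartite plastome this equals genome size."""
    return sum(lengths.values())


def gc_content(seq: str) -> float:
    """Percentage of G + C among unambiguous A/C/G/T bases (case-insensitive)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def region_gc(genome: str, lengths: dict) -> dict:
    """Slice the genome into consecutive regions and return GC% for each."""
    if len(genome) != total_length(lengths):
        raise ValueError("genome length does not match the region lengths")
    result, start = {}, 0
    for name, length in lengths.items():
        result[name] = gc_content(genome[start:start + length])
        start += length
    return result


if __name__ == "__main__":
    print(total_length(REGION_LENGTHS))    # 147834
    print(round(gc_content("ATGCGC"), 1))  # 66.7
```

With a real genome string, `region_gc(genome, REGION_LENGTHS)` returns one GC percentage per region.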
Figure 1

Chloroplast genome map showing all reported genes of Saposhnikovia divaricata. Genes drawn inside the circle are transcribed clockwise, and those outside are counterclockwise. Genes belonging to different functional groups are color-coded. The darker gray in the inner circle corresponds to GC content. Small single-copy (SSC) region, large single-copy (LSC) region, and inverted repeats (IRa and IRb) are displayed.

Figure 2

Comparison of the GC content of the wild S. divaricata cpDNA using the GView program.

The comprehensive analysis of the wild S. divaricata cpDNA revealed 129 functional genes, including 8 rRNA genes, 36 tRNA genes, and 85 PCGs (Table 1). Seventeen genes were duplicated in the IR regions. These comprise eight tRNA genes (trnA-UGC, trnG-UCC, trnI-GAU, trnL-CAA, trnL-UAA, trnN-GUU, trnR-ACG, and trnV-GAC), five PCGs (rps7, rps12, ndhB, ycf1, and ycf15), and four rRNA genes (rrn4.5, rrn5, rrn16, and rrn23), although rps12 lies only partly within the IRs. The LSC region contained 24 tRNA genes and 66 PCGs, whereas the SSC region contained only one tRNA gene and twelve PCGs. Notably, five ycf genes (ycf1, ycf2, ycf3, ycf4, and ycf15) were detected in this genome. Two genes (clpP and ycf3) contained two introns. rps12 is a special case: its two copies share a first exon located in the LSC region, whereas exons 2 and 3 are in the IR regions. Six tRNA genes (trnK-UUU, trnI-GAU, trnA-UGC, trnG-UCC, trnV-UAC, and trnL-UAA) and nine PCGs (rps16, rpoC1, rpl2, rpl16, ndhA, ndhB, petB, petD, and atpF) contained a single intron (Table 2). The trnK-UUU gene had the largest intron (2532 bp), which contained the matK gene. Introns can regulate the rate of gene transcription and play vital roles in gene structure and function [22]. In the S. divaricata cpDNA, rpl2 is one of the two intron-containing ribosomal large-subunit genes (the other being rpl16). It is commonly used as a phylogenetic marker in particular groups, such as tribe Desmodieae [23]. Hybridization screening has shown that the rpl2 intron was lost in at least five dicotyledon lineages [24]. In higher plants, infA encodes translation initiation factor IF1, a protein of approximately 70 amino acids that is an important component of translation initiation in the organelle [25].
infA is a highly labile gene in cpDNA evolution. It has been repeatedly lost or rendered nonfunctional in 24 angiosperm lineages, although most angiosperm species appear to contain an intact copy [26]. Furthermore, infA is considered the most mobile cpDNA gene in plants, having been transferred to the nucleus many times during evolution [27].
Table 1

List of genes found in the chloroplast genome of the wild S. divaricata.

| Category | Group of genes | Gene names | Number |
|---|---|---|---|
| Self-replication | rRNA genes | rrn4.5 (×2), rrn5 (×2), rrn16 (×2), rrn23 (×2) | 8 |
| | tRNA genes | * trnA-UGC (×2), trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnfM-CAU, * trnG-UCC (×2), trnH-GUG, trnM-CAU, * trnI-GAU (×2), * trnK-UUU, trnL-CAA (×2), * trnL-UAA (×2), trnI-CAU, trnN-GUU (×2), trnP-UGG, trnQ-UUG, trnR-ACG (×2), trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnT-UGU, trnV-GAC (×2), * trnV-UAC, trnW-CCA, trnY-GUA | 36 |
| | Ribosomal small subunit | rps2, rps3, rps4, rps7 (×2), rps8, rps11, rps12 (×2), rps14, rps15, * rps16, rps18, rps19 | 14 |
| | Ribosomal large subunit | * rpl2, rpl14, * rpl16, rpl20, rpl22, rpl23, rpl32, rpl33, rpl36 | 9 |
| | DNA-dependent RNA polymerase | rpoA, rpoB, * rpoC1, rpoC2 | 4 |
| Photosynthesis | Large subunit of RuBisCO | rbcL | 1 |
| | Photosystem I | psaA, psaB, psaC, psaI, psaJ | 5 |
| | Photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ | 15 |
| | NADH dehydrogenase | * ndhA, * ndhB (×2), ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK | 12 |
| | Cytochrome b/f complex | petA, * petB, * petD, petG, petL, petN | 6 |
| | ATP synthase | atpA, atpB, atpE, * atpF, atpH, atpI | 6 |
| Other genes | Maturase | matK | 1 |
| | Subunit of acetyl-CoA carboxylase | accD | 1 |
| | Envelope membrane protein | cemA | 1 |
| | Protease | ** clpP | 1 |
| | C-type cytochrome synthesis | ccsA | 1 |
| | Translation initiation factor | infA | 1 |
| Functions unknown | Conserved open reading frames | ycf1 (×2), ycf2, ** ycf3, ycf4, ycf15 (×2) | 7 |
| Total | | | 129 |

A single asterisk (*) indicates one intron; a double asterisk (**) indicates two introns; (×2) indicates genes with two copies.
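The totals in Table 1 can be re-derived mechanically from its gene lists. The sketch below uses a hypothetical helper (`tally` is our own name). It treats each comma-separated entry as one unique gene, counts entries marked "(×2)" twice, and counts entries prefixed with asterisks as intron-containing. It assumes the list formatting used in Table 1.

```python
def tally(gene_list: str) -> tuple:
    """Return (gene copies, unique genes, intron-containing genes) for a
    Table 1-style list such as '* trnA-UGC (×2), trnC-GCA, ** clpP'."""
    copies = unique = with_intron = 0
    for entry in gene_list.split(","):
        entry = entry.strip()
        if not entry:
            continue
        unique += 1
        copies += 2 if "(×2)" in entry else 1  # '(×2)' marks a duplicated gene
        if entry.startswith("*"):              # '*' = one intron, '**' = two
            with_intron += 1
    return copies, unique, with_intron


# Protein-coding totals per functional group, as listed in Table 1
# ("other" = matK, accD, cemA, clpP, ccsA, infA).
PCG_GROUP_TOTALS = {
    "rps": 14, "rpl": 9, "rpo": 4, "rbcL": 1, "psa": 5, "psb": 15,
    "ndh": 12, "pet": 6, "atp": 6, "other": 6, "ycf": 7,
}

if __name__ == "__main__":
    print(tally("ycf1 (×2), ycf2, ** ycf3, ycf4, ycf15 (×2)"))  # (7, 5, 1)
    pcgs = sum(PCG_GROUP_TOTALS.values())
    print(pcgs, pcgs + 36 + 8)  # 85 129
```

Applied to the tRNA row, the same helper gives 36 copies of 28 unique tRNA genes, six of which carry an intron. This agrees with the text.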

Table 2

The genes with introns in the wild S. divaricata chloroplast genome and the length of the exons and introns.

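| Gene | Location | Exon 1 (bp) | Intron 1 (bp) | Exon 2 (bp) | Intron 2 (bp) | Exon 3 (bp) |
|---|---|---|---|---|---|---|
| trnK-UUU | LSC | 37 | 2532 | 35 | | |
| trnI-GAU | IRb | 37 | 968 | 35 | | |
| trnI-GAU | IRa | 37 | 968 | 35 | | |
| trnA-UGC | IRb | 38 | 818 | 35 | | |
| trnA-UGC | IRa | 38 | 818 | 35 | | |
| trnG-UCC | LSC | 23 | 703 | 48 | | |
| trnV-UAC | LSC | 39 | 569 | 35 | | |
| trnL-UAA | LSC | 35 | 502 | 50 | | |
| rps12¹ | LSC + IRa | 114 | – | 232 | 538 | 26 |
| rps12¹ | LSC + IRb | 114 | – | 232 | 538 | 26 |
| rps16 | LSC | 40 | 859 | 197 | | |
| rpoC1 | LSC | 432 | 748 | 1605 | | |
| rpl2 | LSC | 394 | 651 | 434 | | |
| rpl16 | LSC | 9 | 950 | 399 | | |
| ndhA | SSC | 553 | 1099 | 539 | | |
| ndhB | LSC | 777 | 682 | 756 | | |
| ndhB | IRa | 777 | 682 | 756 | | |
| petB | LSC | 6 | 758 | 642 | | |
| petD | LSC | 8 | 750 | 475 | | |
| atpF | LSC | 145 | 711 | 401 | | |
| clpP | LSC | 231 | 635 | 292 | 848 | 71 |
| ycf3 | LSC | 153 | 776 | 228 | 717 | 126 |

¹ Because rps12 is trans-spliced in the wild S. divaricata cpDNA, the length of intron 1 is not counted.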
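Table 2 was flattened in extraction, so a simple sanity check is useful. In each protein-coding gene the exons should sum to a multiple of three, and the largest intron can be read off directly. The sketch below encodes the table as lists of alternating exon and intron lengths, a layout of our own choosing, with `None` standing for the uncounted trans-spliced rps12 intron.

```python
# Exon/intron lengths (bp) from Table 2, alternating exon, intron, exon, ...
# None marks rps12 intron 1, which is trans-spliced and not counted.
TABLE2 = [
    ("trnK-UUU", "LSC", [37, 2532, 35]),
    ("trnI-GAU", "IRb", [37, 968, 35]),
    ("trnI-GAU", "IRa", [37, 968, 35]),
    ("trnA-UGC", "IRb", [38, 818, 35]),
    ("trnA-UGC", "IRa", [38, 818, 35]),
    ("trnG-UCC", "LSC", [23, 703, 48]),
    ("trnV-UAC", "LSC", [39, 569, 35]),
    ("trnL-UAA", "LSC", [35, 502, 50]),
    ("rps12", "LSC + IRa", [114, None, 232, 538, 26]),
    ("rps12", "LSC + IRb", [114, None, 232, 538, 26]),
    ("rps16", "LSC", [40, 859, 197]),
    ("rpoC1", "LSC", [432, 748, 1605]),
    ("rpl2", "LSC", [394, 651, 434]),
    ("rpl16", "LSC", [9, 950, 399]),
    ("ndhA", "SSC", [553, 1099, 539]),
    ("ndhB", "LSC", [777, 682, 756]),
    ("ndhB", "IRa", [777, 682, 756]),
    ("petB", "LSC", [6, 758, 642]),
    ("petD", "LSC", [8, 750, 475]),
    ("atpF", "LSC", [145, 711, 401]),
    ("clpP", "LSC", [231, 635, 292, 848, 71]),
    ("ycf3", "LSC", [153, 776, 228, 717, 126]),
]


def introns(lengths: list) -> list:
    """Intron lengths sit at odd indices; uncounted (None) entries are skipped."""
    return [x for x in lengths[1::2] if x is not None]


def coding_length(lengths: list) -> int:
    """Sum of exon lengths (even indices)."""
    return sum(lengths[0::2])


def largest_intron(rows) -> tuple:
    """(gene, length) of the longest counted intron in the table."""
    return max(((gene, size) for gene, _, ls in rows for size in introns(ls)),
               key=lambda item: item[1])


def non_codon_multiples(rows) -> list:
    """Protein-coding (non-trn) genes whose exon total is not a multiple of 3."""
    return [g for g, _, ls in rows if not g.startswith("trn") and coding_length(ls) % 3]


if __name__ == "__main__":
    print(largest_intron(TABLE2))       # ('trnK-UUU', 2532)
    print(non_codon_multiples(TABLE2))  # []
```

An empty result from `non_codon_multiples` means every protein-coding row has a codon-compatible exon total. For example, rps12 gives 114 + 232 + 26 = 372 bp.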

We also analyzed codon usage preference and relative synonymous codon usage (RSCU) in the cpDNAs of S. divaricata and its related species. Based on the tRNA and PCG sequences, the codon usage frequency in the wild S. divaricata cpDNA was determined (Table S2). It was compared with that of six closely related species, namely, L. seseloides, Libanotis buchtormensis, Seseli montanum, Peucedanum praeruptorum, Angelica paeoniifolia, and Arracacia xanthorrhiza (Figure 3). In total, 24,347 codons were detected in all the coding sequences of S. divaricata. Leucine (Leu) was the most common amino acid, accounting for 10.6% (2573) of the total codons, whereas cysteine (Cys) was the least common (1.0%, 255). We compared the GC content at the first, second, and third codon positions (GC1–GC3) and the total GC content (GCs) among the seven cpDNAs. The codon GC composition of S. divaricata and L. seseloides was the most similar (Figure 3A–D). Furthermore, most synonymous codons with RSCU values >1 ended in adenine (A) or thymine (T) (except TTG and ATG), indicating a preference for A- or T-ending codons (Table S2, Figure 3E). Notably, arginine (Arg), Leu, and serine (Ser) showed a high degree of codon bias across the seven species. Tryptophan (Trp), which is encoded by a single codon (TGG), showed no codon bias. In addition, the cpDNAs of wild S. divaricata and the other species preferred TAA as the stop codon. In principle, an optimal combination of codons can promote faster and more accurate translation of the required proteins. Synonymous codon usage is also influenced by multiple factors, such as genome size, gene length, gene expression level, protein secondary structure, and gene density [28,29]. Therefore, codon preference analysis may be used to examine the balance between mutation bias and natural selection in translation optimization [30].
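RSCU is commonly defined as the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. For codon j of amino acid i with n_i synonymous codons, RSCU_ij = X_ij / ((1/n_i) Σ_j X_ij), so values above 1 indicate preferred codons. The Python sketch below follows that definition under two assumptions. First, it uses the standard codon assignments, which plant plastids share. Second, it groups the three stop codons into one "Ter" class as in Figure 3. It illustrates the definition and is not necessarily the exact implementation of equation [49].

```python
from collections import Counter
from itertools import product

# Standard codon assignments in TCAG order; '*' marks stop codons (Ter).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_counts(cds_list):
    """Count in-frame codons over all CDSs, skipping codons with non-ACGT characters."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    return counts


def rscu(counts):
    """RSCU = observed count / mean count of the synonymous codons for that amino acid."""
    synonyms = {}
    for codon, aa in CODON_TABLE.items():
        synonyms.setdefault(aa, []).append(codon)
    values = {}
    for aa, codons in synonyms.items():
        mean = sum(counts[c] for c in codons) / len(codons)
        for c in codons:
            values[c] = counts[c] / mean if mean else 0.0  # unused amino acid -> 0
    return values


if __name__ == "__main__":
    toy_cds = ["ATGTTATTATTGCTTTAA"]  # hypothetical CDS: Met-Leu-Leu-Leu-Leu-Ter
    r = rscu(codon_counts(toy_cds))
    print(round(r["TTA"], 2), round(r["TTG"], 2), round(r["ATG"], 2))  # 3.0 1.5 1.0
    # Share of Leu codons reported in the text: 2573 of 24,347 codons.
    print(round(100 * 2573 / 24_347, 1))  # 10.6
```

Single-codon amino acids such as Met (ATG) and Trp (TGG) always receive an RSCU of 1 when present. This is why Trp cannot show codon bias.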
Figure 3

Comparison of the GC content, codon usage preference, and amino acid proportion in the protein-coding genes of seven chloroplast genomes. (A–D) GC content in the synonymous codons at the first (GC1), second (GC2), and third (GC3) positions and total GC content (GCs). (E) Codon preference and proportion of amino acids based on relative synonymous codon usage (RSCU) values. Ter represents the stop codon. Legend: A, A. xanthorrhiza; B, P. praeruptorum; C, L. buchtormensis; D, S. divaricata; E, L. seseloides; F, S. montanum; G, A. paeoniifolia.

2.2. Comparative CpDNA Analysis of Seven Species under Subfamily Apioideae

Sequence divergence was examined among selected cpDNAs of subfamily Apioideae Drude, using the S. divaricata (tribe Laserpiteae) cpDNA as the reference (Figure 4). The species were L. buchtormensis and S. montanum (tribe Ammineae), P. praeruptorum and A. paeoniifolia (tribe Peucedaneae), A. xanthorrhiza (tribe Selineae), and L. seseloides (tribe Laserpiteae). As expected, all cpDNAs shared the general structure and gene order, and the non-coding regions were more variable than the coding regions. Notably, ycf1 (spanning the IR and SSC regions) and ycf2 (spanning the IR and LSC regions) were highly variable. Because ycf1 and ycf2 are long and located at the IR boundaries, they are prone to insertions and deletions (InDels). These InDels account for considerable differences between the cpDNAs of S. divaricata and the other species. These results indicate rapid evolution at the IR/single-copy boundaries in Apioideae Drude species. The rRNA sequences were the most conserved among the seven cpDNAs, as in most angiosperms, such as Salvia miltiorrhiza [31] and Phyllostachys sulphurea [32]. We also found that variation among the IR regions of the seven cpDNAs was low. Most of the variation occurred in the SSC regions and at the junctions between the IR and LSC regions (Figure 2). In addition, all coding regions of the seven cpDNAs were extracted and evaluated for nucleotide variability. Eight genes had the highest nucleotide diversity (Pi) values, namely, rpl32, trnH-GUG, ycf2, ndhI, trnP-UGG, psaJ, psbA, and psaC. rpl32 was the most variable (Figure 5).
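Nucleotide diversity (Pi) is the average number of nucleotide differences per site between two sequences drawn from the sample. The sketch below computes Pi for an alignment and across sliding windows, using a 200-column window and a 15-column step (the settings given in the Methods). It assumes complete deletion of columns containing gaps or ambiguous bases. That is our simplification and may differ from DnaSP's settings.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site (pi) over columns where every
    sequence has A, C, G, or T (complete deletion of gaps/ambiguities)."""
    if len(seqs) < 2:
        raise ValueError("need at least two aligned sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned (equal length)")
    valid = set("ACGT")
    columns = [col for col in zip(*(s.upper() for s in seqs)) if set(col) <= valid]
    if not columns:
        return 0.0
    pairs = list(combinations(range(len(seqs)), 2))
    diffs = sum(col[i] != col[j] for col in columns for i, j in pairs)
    return diffs / (len(pairs) * len(columns))


def sliding_pi(seqs, window=200, step=15):
    """(approximate 0-based window midpoint, pi) for windows of `window`
    alignment columns advanced by `step` columns."""
    length = len(seqs[0])
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results


if __name__ == "__main__":
    toy = ["ACGT", "ACGA", "ACTA"]  # hypothetical three-sequence alignment
    print(round(nucleotide_diversity(toy), 4))  # 0.3333
```

In the toy alignment, two of the four columns each differ in two of the three pairs, giving 4 / (3 × 4) ≈ 0.333.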
Figure 4

Comparison of the seven chloroplast genomes belonging to subfamily Apioideae Drude using mVISTA program. Grey arrows and thick black lines above the alignments indicate gene orientations and IR positions, respectively. A cut-off of 70% identity was used for the plots, with the Y-scale representing the percent identity (50–100%). Genome regions are color-coded as protein-coding (exon; blue), ribosomal RNA (rRNA; cyan), and conserved non-coding sequences (CNS; pink).

Figure 5

Comparison of the nucleotide variability (Pi) values among the chloroplast genomes of the seven species. The Y-axis shows the Pi values; the X-axis shows the genes with high Pi values.

The expansion and contraction of IR borders are prevalent in many species and are considered the primary reason for size differences among plant cpDNAs during evolution [33]. The IR/LSC and IR/SSC boundaries of A. xanthorrhiza, P. praeruptorum, S. divaricata, L. seseloides, L. buchtormensis, S. montanum, and A. paeoniifolia were compared to assess the degree of IR expansion or contraction among these species. As expected, S. divaricata and L. seseloides had similar LSC, SSC, and IR boundaries, with only a small difference in the size of ycf2. This result supports the hypothesis that S. divaricata and L. seseloides are the same species. By contrast, ycf2 extends less far across the LSC/IRb junction in S. divaricata, so its IR regions were much smaller than those of L. buchtormensis and S. montanum. In particular, ycf2 in the LSC region of S. divaricata extended 80 bp into the IRb region. In L. buchtormensis and S. montanum, ycf2 extended 1293 and 1302 bp into their IRb regions, respectively. The ndhB/ycf2, ycf1, ndhF, and trnH genes were located at the LSC/IRb, SSC/IRb, IRa/SSC, and LSC/IRa junctions, respectively (Figure 6). Among these, the ycf1 fragment at the IR/SSC boundary is a possible pseudogene generated by IR expansion. It resembles the corresponding coding gene but can be regarded as a non-functional genomic copy that is not transcribed and has no specific physiological function. The ycf1 sequence extended 1, 22, 11, 11, 38, 8, and 16 bp from the IRb into the SSC region in the cpDNAs of A. xanthorrhiza, P. praeruptorum, S. divaricata, L. seseloides, L. buchtormensis, S. montanum, and A. paeoniifolia, respectively. In the LSC region, trnH was located 47, 57, 57, 321, and 663 bp from the IRa region in P. praeruptorum, S. divaricata, L. seseloides, L. buchtormensis, and S. montanum, respectively. A. xanthorrhiza and A. paeoniifolia had no trnH in the LSC region. Notably, most of the cpDNAs contained trnN in the IRa region, except those of S. divaricata, L. seseloides, and A. xanthorrhiza. Only S. divaricata, L. seseloides, and A. paeoniifolia had trnL in the IR regions. Moreover, P. praeruptorum, L. buchtormensis, S. divaricata, and L. seseloides had psbA in the LSC region. The psbA-trnH intergenic spacer (IGS) has recently been used as a candidate DNA barcode to identify similar species in the genus Dendrobium [34] and the family Umbelliferae [35]. The psbA-trnH IGS can also be used as a barcode to determine whether two species belong to the same family [36]. In addition, trnN may have been lost from the S. divaricata and L. seseloides cpDNAs during recombination. Therefore, we hypothesize that the psbA-trnH IGS could be combined with trnN to develop a DNA barcode for the molecular identification of S. divaricata.
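Boundary plots of the kind shown in Figure 6 report how far a gene crosses a junction or how far it lies from one. The sketch below shows that arithmetic with 1-based inclusive coordinates. The junction positions follow from the reported region lengths. The ycf2 and trnH coordinates in the example are hypothetical values, chosen only to reproduce an 80 bp overhang and a 57 bp gap, and the function names are our own.

```python
def junctions(lsc: int, ir: int, ssc: int) -> dict:
    """Last 1-based position before each junction, assuming the order
    LSC (starting at 1) -> IRb -> SSC -> IRa on a circular genome."""
    return {
        "JLB": lsc,                 # LSC/IRb
        "JSB": lsc + ir,            # IRb/SSC
        "JSA": lsc + ir + ssc,      # SSC/IRa
        "JLA": lsc + 2 * ir + ssc,  # IRa/LSC (genome end; wraps to position 1)
    }


def split_at(start: int, end: int, junction: int) -> tuple:
    """Bases of a feature [start, end] lying on each side of a junction."""
    if not start <= junction < end:
        raise ValueError("feature does not cross this junction")
    return junction - start + 1, end - junction


def gap_after(junction: int, start: int) -> int:
    """Bases separating a junction from a downstream feature starting at `start`."""
    if start <= junction:
        raise ValueError("feature does not start after the junction")
    return start - junction - 1


if __name__ == "__main__":
    j = junctions(93_202, 18_654, 17_324)
    print(j)  # {'JLB': 93202, 'JSB': 111856, 'JSA': 129180, 'JLA': 147834}
    # Hypothetical ycf2 coordinates giving an 80 bp overhang into IRb.
    print(split_at(86_000, 93_282, j["JLB"]))  # (7203, 80)
    # Hypothetical trnH start; junction 0 stands for the IRa/LSC wrap point.
    print(gap_after(0, 58))  # 57
```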
Figure 6

Comparison of the borders of LSC, SSC, and IR regions among seven cp genomes.

2.3. Identification of Repeat Sequences and SSRs in Wild S. divaricata CpDNA

A total of 33 forward, 14 reverse, and 3 complement repeats were discovered in the wild S. divaricata cpDNA (Table 3). Most of these repeats were 20–50 bp long. The largest was an 84 bp-long forward repeat in ycf2 in the LSC region. Notably, the LSC region contained the most repeats. Nos. 28–35 were associated with ycf2. No. 45, located in the SSC region, was associated with the ndhA intron. Ten forward repeats were located in the IR regions, including two (Nos. 40 and 49) associated with ycf15. Moreover, two repeat pairs spanned two different regions. No. 9 linked the ycf3 intron (LSC) and the ndhA intron (SSC), and No. 10 linked the ycf3 intron (LSC) and ndhB (IRb).
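The repeat types in Table 3 follow a widely used convention. A forward (F) copy is a direct repeat. A reverse (R) copy is the same sequence read backwards. A complement (C) copy is base-paired but not reversed. Palindromic (P) repeats, the reverse complement, are also commonly reported. The Methods do not name the repeat-finding program, so the sketch below only illustrates the classification. It compares two equal-length copies with a mismatch allowance, using a hypothetical default of 3 that matches the largest mismatch listed in Table 3.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def repeat_mismatches(a: str, b: str) -> dict:
    """Mismatches between copy `a` and each transformed form of copy `b`."""
    if len(a) != len(b):
        raise ValueError("this sketch compares equal-length copies only")
    a, b = a.upper(), b.upper()
    comp = b.translate(COMPLEMENT)
    return {
        "F": hamming(a, b),           # forward (direct) repeat
        "R": hamming(a, b[::-1]),     # reverse: same bases, read backwards
        "C": hamming(a, comp),        # complement: paired bases, not reversed
        "P": hamming(a, comp[::-1]),  # palindromic: reverse complement
    }


def classify(a: str, b: str, max_mismatch: int = 3):
    """Type with the fewest mismatches, or None if none is within max_mismatch."""
    mism = repeat_mismatches(a, b)
    kind = min(mism, key=mism.get)
    return kind if mism[kind] <= max_mismatch else None


if __name__ == "__main__":
    for other in ["ATGCCA", "ACCGTA", "TACGGT", "TGGCAT"]:
        print(other, classify("ATGCCA", other))  # F, R, C, P in turn
```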
Table 3

Repeat sequences in the chloroplast genome of the wild S. divaricata.

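| ID | Size (bp) | Repeat 1 position | Type¹ | Size (bp) | Repeat 2 position | Mismatch (bp) | E-value | Gene | Region |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 34 | 7110 | F | 32 | 7126 | 3 | 0.00011 | IGS | LSC |
| 2 | 32 | 8400 | F | 31 | 36,451 | 2 | 1.90 × 10⁻⁵ | IGS | LSC |
| 3 | 34 | 9846 | R | 32 | 115,685 | 3 | 0.00011 | IGS | LSC;SSC |
| 4 | 35 | 9851 | C | 34 | 115,668 | 3 | 3.00 × 10⁻⁵ | IGS | LSC;SSC |
| 5 | 35 | 9851 | C | 32 | 115,670 | 3 | 3.00 × 10⁻⁵ | IGS | LSC;SSC |
| 6 | 35 | 20,788 | F | 35 | 20,837 | 3 | 3.00 × 10⁻⁵ | IGS | LSC |
| 7 | 36 | 32,147 | R | 37 | 32,160 | 3 | 2.23 × 10⁻⁶ | IGS | LSC |
| 8 | 39 | 44,679 | F | 39 | 98,962 | 2 | 1.74 × 10⁻⁹ | ycf3 (intron); IGS | LSC;IRb |
| 9 | 39 | 44,679 | F | 39 | 122,483 | 3 | 1.64 × 10⁻⁷ | ycf3 (intron); ndhA (intron) | LSC;SSC |
| 10 | 35 | 44,682 | F | 35 | 95,893 | 3 | 3.00 × 10⁻⁵ | ycf3 (intron); ndhB | LSC;IRb |
| 11 | 33 | 44,685 | F | 33 | 98,968 | 1 | 3.27 × 10⁻⁸ | ycf3 (intron); IGS | LSC;IRb |
| 12 | 31 | 51,905 | R | 31 | 64,144 | 1 | 4.92 × 10⁻⁷ | IGS | LSC |
| 13 | 35 | 51,907 | R | 35 | 51,907 | 2 | 3.57 × 10⁻⁷ | IGS | LSC |
| 14 | 32 | 51,907 | F | 32 | 51,923 | 2 | 1.90 × 10⁻⁵ | IGS | LSC |
| 15 | 37 | 51,911 | F | 36 | 64,143 | 3 | 2.23 × 10⁻⁶ | IGS | LSC |
| 16 | 42 | 51,912 | R | 42 | 51,912 | 2 | 3.15 × 10⁻¹¹ | IGS | LSC |
| 17 | 42 | 51,912 | R | 40 | 51,912 | 2 | 3.15 × 10⁻¹¹ | IGS | LSC |
| 18 | 28 | 51,912 | F | 28 | 115,670 | 0 | 8.53 × 10⁻⁸ | IGS | LSC;SSC |
| 19 | 31 | 51,912 | R | 31 | 115,663 | 1 | 4.92 × 10⁻⁷ | IGS | LSC;SSC |
| 20 | 36 | 51,913 | R | 39 | 51,916 | 3 | 1.64 × 10⁻⁷ | IGS | LSC |
| 21 | 35 | 51,914 | R | 37 | 115,674 | 3 | 2.23 × 10⁻⁶ | IGS | LSC;SSC |
| 22 | 33 | 51,922 | F | 32 | 115,663 | 2 | 5.07 × 10⁻⁶ | IGS | LSC;SSC |
| 23 | 30 | 51,925 | R | 29 | 115,669 | 1 | 1.90 × 10⁻⁶ | IGS | LSC;SSC |
| 24 | 32 | 52,673 | R | 32 | 52,673 | 2 | 1.90 × 10⁻⁵ | IGS | LSC |
| 25 | 31 | 64,142 | R | 31 | 64,142 | 2 | 7.14 × 10⁻⁵ | IGS | LSC |
| 26 | 35 | 64,144 | C | 36 | 115,671 | 3 | 8.18 × 10⁻⁶ | IGS | LSC;SSC |
| 27 | 25 | 67,922 | F | 25 | 67,946 | 0 | 5.46 × 10⁻⁶ | IGS | LSC |
| 28 | 84 | 91,433 | F | 84 | 91,451 | 1 | 1.64 × 10⁻³⁸ | ycf2 | LSC |
| 29 | 70 | 91,433 | F | 70 | 91,469 | 3 | 2.12 × 10⁻²⁵ | ycf2 | LSC |
| 30 | 52 | 91,433 | F | 52 | 91,487 | 3 | 5.90 × 10⁻¹⁵ | ycf2 | LSC |
| 31 | 59 | 91,440 | F | 59 | 91,476 | 1 | 1.30 × 10⁻²³ | ycf2 | LSC |
| 32 | 45 | 91,440 | F | 45 | 91,494 | 2 | 5.66 × 10⁻¹³ | ycf2 | LSC |
| 33 | 59 | 91,458 | F | 59 | 91,476 | 0 | 1.85 × 10⁻²⁶ | ycf2 | LSC |
| 34 | 41 | 91,458 | F | 41 | 91,494 | 0 | 1.27 × 10⁻¹⁵ | ycf2 | LSC |
| 35 | 23 | 91,458 | F | 23 | 91,512 | 0 | 8.73 × 10⁻⁵ | ycf2 | LSC |
| 36 | 44 | 94,003 | F | 44 | 94,024 | 1 | 1.04 × 10⁻¹⁴ | IGS | IRb |
| 37 | 36 | 94,011 | F | 36 | 94,032 | 0 | 1.30 × 10⁻¹² | IGS | IRb |
| 38 | 41 | 98,960 | F | 41 | 122,481 | 3 | 1.19 × 10⁻⁸ | IGS; ndhA (intron) | IRb;SSC |
| 39 | 33 | 98,968 | F | 33 | 122,489 | 2 | 5.07 × 10⁻⁶ | IGS; ndhA (intron) | IRb;SSC |
| 40 | 42 | 99,905 | F | 42 | 99,926 | 0 | 3.18 × 10⁻¹⁶ | ycf15 | IRb |
| 41 | 34 | 107,943 | F | 34 | 107,975 | 1 | 8.43 × 10⁻⁹ | IGS | IRb |
| 42 | 31 | 108,296 | F | 31 | 132,709 | 2 | 7.14 × 10⁻⁵ | IGS | IRb;IRa |
| 43 | 23 | 114,349 | F | 23 | 114,381 | 0 | 0.0000873 | IGS | SSC |
| 44 | 28 | 115,668 | R | 28 | 115,668 | 0 | 8.53 × 10⁻⁸ | IGS | SSC |
| 45 | 31 | 122,640 | R | 31 | 122,640 | 0 | 1.33 × 10⁻⁹ | ndhA (intron) | SSC |
| 46 | 34 | 133,027 | F | 34 | 133,059 | 1 | 8.43 × 10⁻⁹ | IGS | IRa |
| 47 | 31 | 133,030 | F | 30 | 133,063 | 2 | 7.14 × 10⁻⁵ | IGS | IRa |
| 48 | 23 | 133,038 | F | 23 | 133,070 | 0 | 0.0000873 | IGS | IRa |
| 49 | 42 | 141,068 | F | 42 | 141,089 | 0 | 3.18 × 10⁻¹⁶ | ycf15 | IRa |
| 50 | 44 | 146,968 | F | 44 | 146,989 | 1 | 1.04 × 10⁻¹⁴ | IGS | IRa |

¹ F, forward; R, reverse; C, complement; IGS, intergenic spacer.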

SSRs, or microsatellites, are tandem repeats of 1–6 bp motifs that are commonly distributed throughout the genome. Chloroplast SSRs have been widely employed in species identification, population genetics, and evolutionary studies because of their high intraspecific polymorphism and uniparental inheritance [37,38]. In total, 49 SSRs were discovered in the wild S. divaricata cpDNA (Table 4). These comprised 40 mononucleotide (81.6%), 4 dinucleotide (8.2%), 2 trinucleotide (4.1%), and 3 compound (6.1%) SSRs, most of which were located in the LSC region. The three compound SSRs together comprised three mononucleotide and four dinucleotide repeat units. A total of 21 SSRs were located in genes, and the rest were in intergenic spacers. Thirty-three (67.3%) of the SSRs were short poly(A) or poly(T) mononucleotide repeats, and tandem guanine (G) or cytosine (C) repeats were rare. This corroborates previous reports on other herbs [39]. These SSRs are candidate markers for conservation studies, linkage map construction, and marker-assisted selection in wild S. divaricata and other closely related species.
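Because the region lengths are known, any coordinate in Table 3 or Table 4 can be assigned to its region. Positions in one IR can also be mapped to the equivalent position in the other copy. The sketch below derives the boundaries from the reported lengths, assuming the LSC starts at position 1 and the regions follow the order LSC, IRb, SSC, IRa. With these values, the mirror relation reproduces SSR pairs in Table 4. For example, the (G)13 repeat ending at 94,299 in IRb mirrors the (C)13 repeat starting at 146,738 in IRa. The sketch also recomputes the motif-class percentages from the counts.

```python
import bisect

# Last position of each region, from the reported lengths
# (LSC 93,202; IRb 18,654; SSC 17,324; IRa 18,654 bp), LSC starting at 1.
REGION_ENDS = [93_202, 111_856, 129_180, 147_834]
REGION_NAMES = ["LSC", "IRb", "SSC", "IRa"]


def region_of(pos: int) -> str:
    """Region containing a 1-based genome position."""
    if not 1 <= pos <= REGION_ENDS[-1]:
        raise ValueError("position outside the genome")
    return REGION_NAMES[bisect.bisect_left(REGION_ENDS, pos)]


def ir_mirror(pos: int) -> int:
    """Equivalent position in the other inverted-repeat copy."""
    if region_of(pos) not in ("IRb", "IRa"):
        raise ValueError("position is not inside an inverted repeat")
    irb_start, ira_end = REGION_ENDS[0] + 1, REGION_ENDS[3]
    return irb_start + ira_end - pos


def class_percentages(counts: dict) -> dict:
    """Percentage of each SSR class, rounded to one decimal place."""
    total = sum(counts.values())
    return {k: round(100 * v / total, 1) for k, v in counts.items()}


if __name__ == "__main__":
    print(region_of(94_287), ir_mirror(94_299))  # IRb 146738
    print(class_percentages({"mono": 40, "di": 4, "tri": 2, "compound": 3}))
    # {'mono': 81.6, 'di': 8.2, 'tri': 4.1, 'compound': 6.1}
```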
Table 4

Simple sequence repeats (SSRs) in the wild S. divaricata chloroplast genome.

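| ID | Type | Repeat motif | Length (bp) | Start | End | Region | Gene |
|---|---|---|---|---|---|---|---|
| 1 | p1 | (A)10 | 10 | 1539 | 1548 | LSC | |
| 2 | p1 | (A)10 | 10 | 1794 | 1803 | LSC | trnK-UUU |
| 3 | p3 | (TTA)5 | 15 | 5419 | 5433 | LSC | rps16 |
| 4 | p1 | (A)10 | 10 | 9393 | 9402 | LSC | trnR-UCU |
| 5 | p2 | (AT)7 | 14 | 9867 | 9880 | LSC | |
| 6 | p2 | (AT)9 | 18 | 13,059 | 13,076 | LSC | |
| 7 | p1 | (A)14 | 14 | 16,406 | 16,419 | LSC | |
| 8 | p1 | (T)11 | 11 | 18,651 | 18,661 | LSC | rpoC2 |
| 9 | p1 | (T)12 | 12 | 26,380 | 26,391 | LSC | |
| 10 | p1 | (A)10 | 10 | 27,392 | 27,401 | LSC | |
| 11 | p3 | (AAT)6 | 18 | 28,642 | 28,659 | LSC | |
| 12 | p1 | (T)12 | 12 | 29,602 | 29,613 | LSC | |
| 13 | p1 | (T)12 | 12 | 32,753 | 32,764 | LSC | |
| 14 | p1 | (A)12 | 12 | 33,320 | 33,331 | LSC | |
| 15 | p1 | (C)10 | 10 | 37,141 | 37,150 | LSC | |
| 16 | p1 | (A)13 | 13 | 43,455 | 43,467 | LSC | |
| 17 | p1 | (T)10 | 10 | 45,269 | 45,278 | LSC | ycf3 |
| 18 | p2 | (TA)7 | 14 | 47,465 | 47,478 | LSC | |
| 19 | p2 | (TA)7 | 14 | 51,926 | 51,939 | LSC | |
| 20 | p1 | (T)10 | 10 | 52,688 | 52,697 | LSC | |
| 21 | p1 | (T)10 | 10 | 55,648 | 55,657 | LSC | atpB |
| 22 | p1 | (A)18 | 18 | 56,234 | 56,251 | LSC | |
| 23 | p1 | (T)10 | 10 | 58,021 | 58,030 | LSC | |
| 24 | p1 | (T)10 | 10 | 60,531 | 60,540 | LSC | |
| 25 | c | (A)10tatcagaacttt(TA)6 | 34 | 64,123 | 64,156 | LSC | |
| 26 | c | (A)11gacaggtttttgctccttttcgtataatattcttgtattcttgtaatagaaaaataatagaaaag(A)10 | 86 | 71,813 | 71,898 | LSC | clpP |
| 27 | p1 | (T)10 | 10 | 72,637 | 72,646 | LSC | |
| 28 | p1 | (T)12 | 12 | 83,124 | 83,135 | LSC | rpl16 |
| 29 | p1 | (T)16 | 16 | 84,843 | 84,858 | LSC | |
| 30 | p1 | (G)13 | 13 | 94,287 | 94,299 | IRb | |
| 31 | p1 | (T)13 | 13 | 99,335 | 99,347 | IRb | |
| 32 | p1 | (T)10 | 10 | 103,203 | 103,212 | IRb | trnI-GAU |
| 33 | p1 | (G)14 | 14 | 104,444 | 104,457 | IRb | trnA-UGC |
| 34 | p1 | (A)10 | 10 | 111,049 | 111,058 | IRb | ycf1 |
| 35 | p1 | (A)11 | 11 | 111,833 | 111,843 | IRb | ycf1 |
| 36 | p1 | (A)12 | 12 | 115,539 | 115,550 | SSC | |
| 37 | c | (TA)6ttt(TA)8aattatatatatga(AT)6 | 57 | 115,669 | 115,725 | SSC | |
| 38 | p1 | (A)13 | 13 | 116,776 | 116,788 | SSC | ccsA |
| 39 | p1 | (A)11 | 11 | 120,333 | 120,343 | SSC | |
| 40 | p1 | (T)10 | 10 | 121,130 | 121,139 | SSC | |
| 41 | p1 | (T)15 | 15 | 128,046 | 128,060 | SSC | |
| 42 | p1 | (T)11 | 11 | 128,410 | 128,420 | SSC | ycf1 |
| 43 | p1 | (T)10 | 10 | 128,671 | 128,680 | SSC | ycf1 |
| 44 | p1 | (T)11 | 11 | 129,194 | 129,204 | IRa | |
| 45 | p1 | (T)10 | 10 | 129,979 | 129,988 | IRa | |
| 46 | p1 | (C)14 | 14 | 136,580 | 136,593 | IRa | trnA-UGC |
| 47 | p1 | (A)10 | 10 | 137,825 | 137,834 | IRa | trnI-GAU |
| 48 | p1 | (A)13 | 13 | 141,690 | 141,702 | IRa | |
| 49 | p1 | (C)13 | 13 | 146,738 | 146,750 | IRa | |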
25c(A)10tatcagaacttt (TA)63464,12364,156LSC

2.4. Phylogenetic Analysis of 47 Taxa under Subfamily Apioideae Based on CpDNA Sequences

Because cpDNAs have been successfully applied to angiosperm phylogeny, complete cpDNA sequences are widely used as informative data for phylogenetic reconstruction [14]. To determine the phylogenetic position of wild S. divaricata within the Apiaceae subfamily Apioideae Drude, we constructed phylogenetic trees from the complete cpDNAs of 47 taxa (Table S3). These taxa belong to ten genera in tribes Peucedaneae Drude, Smyrnieae Koch, Ammineae Koch, Laserpiteae Drude, and Selineae Spreng. One species each from subfamilies Saniculoideae Drude (Sanicula chinensis) and Mackinlayoideae Plunkett and Lowry (Centella asiatica) was selected as an outgroup (Figure 7). The maximum likelihood (ML) trees generated with FastTree and IQ-TREE were congruent, supporting the reliability of the phylogenetic analysis, but differed in some respects from previous reports [16,17]. Notably, L. seseloides and S. divaricata clustered together with 100% bootstrap support, further supporting the hypothesis that the two are the same species. In addition, S. divaricata (Laserpiteae Drude) was found to be most closely related to L. buchtormensis from Ammineae Koch, P. japonicum and P. praeruptorum from Peucedaneae Drude, and S. montanum from Ammineae Koch. These results, backed by high bootstrap support values, suggest that Saposhnikovia is more closely related to Libanotis than to Peucedanum or Seseli. Furthermore, Laserpiteae Drude and Ammineae Koch species may be more closely related to each other than to Peucedaneae Drude species, which contradicts previous reports on cultivated S. divaricata [16,17].
Figure 7

Phylogeny of 47 taxa within Apioideae Drude based on ML analysis of the IR, LSC, and SSC regions of the cp genomes, with Sanicula chinensis and Centella asiatica as outgroups, inferred with FastTree (left) and IQ-TREE (right). Details of all chloroplast genomes used in the phylogenetic analysis are listed in Table S3.

3. Materials and Methods

3.1. Sampling, CpDNA Extraction, and Sequencing

Fresh mature leaves were collected from wild S. divaricata. Total genomic DNA was extracted from young leaves using a Trelief™ Plant Genomic DNA Kit (TsingKe Biotechnology Co., Ltd., Beijing, China). After quality testing, the DNA was fragmented and used to construct 350 bp short-insert libraries. The qualified libraries were sequenced on the BGISEQ-500 platform (paired-end, 150 bp) according to the manufacturer’s instructions. In total, 6.0 Gb of 150 bp paired-end reads were generated.

3.2. CpDNA Assembly and Annotation

First, all raw reads were trimmed using Fastp [40]. Subsequently, high-quality reads were mapped to reference Apioideae chloroplast genomes obtained from GenBank using Bowtie2 v.2.3.4.3 (Langmead B, et al. https://github.com/BenLangmead/bowtie2, accessed on 14 September 2020) [41]. The coding-gene sequence with the highest coverage was used as the seed sequence for de novo assembly with NOVOPlasty v4.2.1 [42]. The assembled cp genome was annotated with DOGMA [43], GeSeq [44], tRNAscan [45], and ARAGORN [46]. The annotations were then manually adjusted and confirmed in Geneious 9.1.8 (M Kearse, et al. San Diego, CA, https://www.geneious.com/, accessed on 14 September 2020) [47]. The circular chloroplast genome map was drawn with OrganellarGenomeDRAW (OGDRAW) v.1.3.1 (Greiner S, et al. https://chlorobox.mpimp-golm.mpg.de/OGDraw.html, accessed on 14 September 2020) [48] for further comparison of gene order and content. The other genomes downloaded from GenBank for comparative analysis were re-annotated using the same method. The assembled cp genome has been deposited in GenBank under accession number MZ708833.

3.3. CpDNA Comparison and Sequence Divergence Analysis

Relative synonymous codon usage (RSCU) values were determined to quantify the extent of codon usage bias. RSCU was calculated for every codon in each genome according to the published equation [49]. The overall GC content and the GC content at the first, second, and third codon positions (GC1, GC2, and GC3, respectively) were calculated using the EMBOSS software suite [50]. Simple sequence repeats (SSRs) were identified with MISA v1.01 [51]. The minimum thresholds were 10, 6, 5, 5, 5, and 5 repeat units for mono-, di-, tri-, tetra-, penta-, and hexanucleotide motifs, respectively. Chloroplast genome similarity was assessed using BLAST Atlas on the GView server (Franklin B., et al. https://server.gview.ca/, accessed on 14 September 2020) [52] with the S. divaricata genome as the reference. The junction regions between the IR, SSC, and LSC of these plastomes were compared using the IRscope+ online program [53]. Divergent regions were visualized using the Shuffle-LAGAN mode [54] of mVISTA v.2.0 (Frazer K.A., et al., https://genome.lbl.gov/vista/mvista, accessed on 14 September 2020) [55] with the S. divaricata genome as the reference. To identify highly variable polymorphic regions, the aligned sequences were imported into DnaSP v6.12.03 (DNA Sequence Polymorphism) (Rozas J., et al. http://www.ub.edu/dnasp/, accessed on 14 September 2020). They were analyzed with a sliding window of 200 bp and a step size of 15 bp [56].
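The MISA thresholds above can be illustrated with a simplified perfect-SSR scanner. These are 10, 6, 5, 5, 5, and 5 repeat units for motifs of 1 to 6 bp. This is a hypothetical sketch, not MISA itself. It ignores MISA's merging of nearby SSRs into compound SSRs, and when hits overlap it keeps the one with the shorter motif. The function names are our own.

```python
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}  # thresholds from the Methods


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq: str, min_repeats: dict = MIN_REPEATS) -> list:
    """Perfect SSRs as (start, end, motif, repeats), 1-based inclusive coordinates."""
    seq = seq.upper()
    hits = []
    for k in sorted(min_repeats):  # shortest motifs first
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            reps = 1
            while seq[i + reps * k:i + (reps + 1) * k] == motif:
                reps += 1
            if reps >= min_repeats[k] and set(motif) <= set("ACGT") and is_primitive(motif):
                hits.append((i + 1, i + reps * k, motif, reps))
                i += reps * k  # resume scanning after this SSR
            else:
                i += 1
    kept = []  # on overlap, keep the earlier-found (shorter-motif) hit
    for hit in hits:
        if all(hit[1] < s or hit[0] > e for s, e, *_ in kept):
            kept.append(hit)
    return sorted(kept)


if __name__ == "__main__":
    toy = "GC" + "A" * 12 + "GC" + "AT" * 7 + "GGC" + "TTA" * 5 + "C"  # hypothetical
    print(find_ssrs(toy))
    # [(3, 14, 'A', 12), (17, 30, 'AT', 7), (34, 48, 'TTA', 5)]
```

In MISA-style notation these hits would be written (A)12, (AT)7, and (TTA)5, like the entries in Table 4.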

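The SSR search and the sliding-window divergence scan described above can both be sketched in Python. `find_ssrs` applies the listed minimum repeat numbers (10, 6, 5, 5, 5, and 5 units for mono- to hexanucleotide motifs) to perfect repeats only; unlike MISA, it does not merge compound SSRs or tolerate interruptions, it skips motifs that are themselves repeats of a shorter motif (such as AA or ATAT), and a repeat readable in more than one phase is reported in the first phase encountered. `sliding_pi` computes nucleotide diversity (π, the mean proportion of pairwise differences per site) in 200 bp windows moved in 15 bp steps across an alignment of equal-length sequences. Excluding columns that contain gaps or ambiguous bases is our assumption and may not match every DnaSP setting; both function names are hypothetical.

```python
import re
from collections import Counter

# Minimum number of motif copies per motif length, as listed for MISA above.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq: str, min_repeats: dict[int, int] = MIN_REPEATS) -> list[tuple[int, str, int]]:
    """Return (0-based start, motif, copy number) for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in min_repeats.items():
        # A k-base unit followed by at least (min_rep - 1) identical copies.
        pattern = re.compile(rf"([ACGT]{{{k}}})\1{{{min_rep - 1},}}")
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if unit in (unit + unit)[1:-1]:
                continue  # unit is itself periodic (e.g. "AA", "ATAT")
            hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


def site_diversity(column: tuple[str, ...]) -> float:
    """Proportion of sequence pairs that differ at one alignment column."""
    n = len(column)
    counts = Counter(column)
    # differing pairs = (n^2 - sum of squared base counts) / 2; all pairs = n(n-1)/2
    return (n * n - sum(c * c for c in counts.values())) / (n * (n - 1))


def sliding_pi(alignment: list[str], window: int = 200, step: int = 15) -> list[tuple[int, float]]:
    """Nucleotide diversity (pi) per window as (0-based window start, pi)."""
    seqs = [s.upper() for s in alignment]
    if len(seqs) < 2 or any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("need at least two aligned sequences of equal length")
    columns = list(zip(*seqs))
    results = []
    for start in range(0, len(columns) - window + 1, step):
        # Assumption: columns with gaps or ambiguity codes are excluded.
        usable = [col for col in columns[start:start + window]
                  if all(b in "ACGT" for b in col)]
        pi = sum(site_diversity(col) for col in usable) / len(usable) if usable else 0.0
        results.append((start, pi))
    return results


print(find_ssrs("G" + "A" * 12 + "G" + "AT" * 7 + "T" + "GGC" * 5 + "T"))
# [(1, 'A', 12), (14, 'AT', 7), (29, 'GGC', 5)]
for start, pi in sliding_pi(["A" * 230, "A" * 228 + "CC"]):
    print(start, round(pi, 4))  # 0 0.0, then 15 0.0, then 30 0.01
```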
3.4. Phylogenetic Analysis

The complete cp genomes of forty-seven taxa from the Apiaceae subfamily Apioideae Drude were used for phylogenetic reconstruction, with two species from Saniculoideae Drude (S. chinensis) and Mackinlayoideae Plunkett and Lowry (C. asiatica) as outgroups. All cpDNAs were downloaded from NCBI GenBank (Table S3). Whole cpDNA sequences were aligned with MAFFT v7.450 (Katoh K., et al. https://mafft.cbrc.jp/alignment/software/, accessed on 14 September 2020) [57], and alignment regions with less than 95% consistent site coverage were then removed. Maximum likelihood (ML) trees were inferred with FastTree 2.1.11 (Price M.N., et al. http://www.microbesonline.org/fasttree/, accessed on 14 September 2020) [58] and IQ-TREE version 2.1.4 (Minh B.Q., et al. https://github.com/iqtree/iqtree2, accessed on 14 September 2020) [59]. The FastTree analysis used the best-fit nucleotide substitution model, General Time Reversible + γ (GTR + γ), with Shimodaira–Hasegawa (SH) test support values, whereas for the IQ-TREE analysis the best-fit model was selected under the Akaike Information Criterion (AIC) by ModelFinder in the IQ-TREE package, and branch support was assessed with 1000 bootstrap replicates [60].
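The column-filtering step after MAFFT alignment can be sketched as follows. We read "consistent site coverage" as the fraction of sequences carrying a nucleotide, rather than a gap or missing-data symbol, at an alignment column, and drop columns below 95%. This interpretation, the treatment of `N` and `?` as missing data, and the function name `filter_columns` are assumptions, not details given by the authors.

```python
def filter_columns(alignment: list[str], min_coverage: float = 0.95,
                   missing: str = "-?N") -> list[str]:
    """Drop alignment columns where fewer than min_coverage of sequences have a base."""
    seqs = [s.upper() for s in alignment]
    if not seqs or any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("need aligned sequences of equal length")
    n = len(seqs)
    keep = [
        i for i, column in enumerate(zip(*seqs))
        # coverage = fraction of sequences with a non-missing character here
        if sum(base not in missing for base in column) / n >= min_coverage
    ]
    return ["".join(s[i] for i in keep) for s in seqs]


aln = ["ACG"] * 18 + ["A--", "AC-"]  # 20 toy sequences
trimmed = filter_columns(aln)
print(trimmed[0], trimmed[18], trimmed[19])  # AC A- AC
```

In this toy alignment the second column is kept because exactly 95% of the 20 sequences carry a base there, while the third column (90% coverage) is removed.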

4. Conclusions

In this study, we analyzed the cpDNA of wild S. divaricata for the first time and compared it with those of its close relatives. The wild S. divaricata cpDNA contained 8 rRNA genes, 36 tRNA genes, and 85 PCGs, with a total GC content of 37.5%. These results are consistent with all previously reported cpDNA sequences of S. divaricata and its synonym, L. seseloides. Compared with other related species, the non-coding regions exhibited greater variation than the coding regions. Comparison of the IR/LSC and IR/SSC boundaries among seven cpDNAs suggested that trnN may have been lost from the wild S. divaricata cpDNA during the reorganization of these boundaries. Hence, trnN could be combined with the psbA-trnH intergenic spacer (IGS) region as a DNA barcode for Apioideae Drude species. We also found that repeat sequences were concentrated in the LSC region, and 49 potentially informative SSRs were identified. Furthermore, the genetic relationship between L. seseloides and S. divaricata was confirmed at the genomic level for the first time. Notably, these two were most closely related to L. buchtormensis, which contradicts previous reports. In addition, the phylogenetic tree showed a close relationship between the Laserpiteae Drude and Ammineae Koch species. Overall, our findings provide genetic information that may be useful for future studies on the genetic diversity and phylogenetic relationships of Apioideae species.
References (55 in total; first 10 shown)

1.  On the origin of chloroplasts, import mechanisms of chloroplast-targeted proteins, and loss of photosynthetic ability - review.

Authors:  M Vesteg; R Vacula; J Krajcovic
Journal:  Folia Microbiol (Praha)       Date:  2009-10-14       Impact factor: 2.099

2.  FastTree 2--approximately maximum-likelihood trees for large alignments.

Authors:  Morgan N Price; Paramvir S Dehal; Adam P Arkin
Journal:  PLoS One       Date:  2010-03-10       Impact factor: 3.240

3.  Codon optimality is a major determinant of mRNA stability.

Authors:  Vladimir Presnyak; Najwa Alhusaini; Ying-Hsin Chen; Sophie Martin; Nathan Morris; Nicholas Kline; Sara Olson; David Weinberg; Kristian E Baker; Brenton R Graveley; Jeff Coller
Journal:  Cell       Date:  2015-03-12       Impact factor: 41.582

4.  The role of insertions/deletions in the evolution of the intergenic region between psbA and trnH in the chloroplast genome.

Authors:  J Aldrich; B W Cherney; E Merlin; L Christopherson
Journal:  Curr Genet       Date:  1988-08       Impact factor: 3.886

5.  The complete chloroplast genome sequence of Ledebouriella seseloides (Hoffm.) H. Wolff.

Authors:  Hyun Oh Lee; Kyunghee Kim; Sang-Choon Lee; Junki Lee; Jonghoon Lee; Soonok Kim; Tae-Jin Yang
Journal:  Mitochondrial DNA A DNA Mapp Seq Anal       Date:  2015-07-28       Impact factor: 1.514

6.  Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data.

Authors:  Matthew Kearse; Richard Moir; Amy Wilson; Steven Stones-Havas; Matthew Cheung; Shane Sturrock; Simon Buxton; Alex Cooper; Sidney Markowitz; Chris Duran; Tobias Thierer; Bruce Ashton; Peter Meintjes; Alexei Drummond
Journal:  Bioinformatics       Date:  2012-04-27       Impact factor: 6.937

7.  GeSeq - versatile and accurate annotation of organelle genomes.

Authors:  Michael Tillich; Pascal Lehwark; Tommaso Pellizzer; Elena S Ulbricht-Jones; Axel Fischer; Ralph Bock; Stephan Greiner
Journal:  Nucleic Acids Res       Date:  2017-07-03       Impact factor: 16.971

8.  Saposhnikovia divaricata-An Ethnopharmacological, Phytochemical and Pharmacological Review.

Authors:  Min Yang; Cong-Cong Wang; Wen-le Wang; Jian-Ping Xu; Jie Wang; Chun-Hong Zhang; Min-Hui Li
Journal:  Chin J Integr Med       Date:  2020-04-21       Impact factor: 1.978

9.  Saposhnikoviae divaricata: a phytochemical, pharmacological, and pharmacokinetic review.

Authors:  Jenny Kreiner; Edwin Pang; George Binh Lenon; Angela Wei Hong Yang
Journal:  Chin J Nat Med       Date:  2017-04

10.  Complete chloroplast genome sequences of Mongolia medicine Artemisia frigida and phylogenetic relationships with other plants.

Authors:  Yue Liu; Naxin Huo; Lingli Dong; Yi Wang; Shuixian Zhang; Hugh A Young; Xiaoxiao Feng; Yong Qiang Gu
Journal:  PLoS One       Date:  2013-02-27       Impact factor: 3.240
