
Comparing chloroplast genomes of traditional Chinese herbs Schisandra sphenanthera and S. chinensis.

Xue-Ping Wei1,2, Hui-Juan Li1, Peng Che1, Hao-Jie Guo1, Ben-Gang Zhang1,2, Hai-Tao Liu1,2, Yao-Dong Qi1,2.   

Abstract

Objective: Schisandra sphenanthera and S. chinensis are two important medicinal plants that have long been used under the names "Nan-Wuweizi" and "Wuweizi", respectively. The misuse of "Nan-Wuweizi" and "Wuweizi" in herbal medicinal products calls for an accurate method to distinguish these herbs. Chloroplast (cp) genomes have been widely used in species delimitation and phylogeny because of their uniparental inheritance and substitution rates lower than those of nuclear genomes. To develop more efficient DNA markers for distinguishing S. sphenanthera, S. chinensis, and related species, we sequenced the cp genome of S. sphenanthera and compared it with that of S. chinensis.
Methods: The cp genome of S. sphenanthera was sequenced on the Illumina HiSeq platform. Chloroplast reads were selected by reference-guided mapping and then assembled with a de novo assembly procedure. Comparative analyses of the cp genomes of S. sphenanthera and S. chinensis were then carried out.
Results: The cp genome of S. sphenanthera was 146 853 bp in length and consisted of a large single copy (LSC) region of 95 627 bp, a small single copy (SSC) region of 18 292 bp, and a pair of inverted repeats (IR) of 16 467 bp. GC content was 39.6%. A total of 126 functional genes were predicted, of which 113 genes were unique, including 79 protein-coding genes, 30 transfer RNA (tRNA) genes, and four ribosomal RNA (rRNA) genes. Five tRNA, four protein-coding genes, and all rRNA were duplicated in the IR regions. There were 18 intron-containing genes, including six tRNA genes and 12 protein-coding genes. In addition, 45 SSRs were detected. The whole cp genome of S. sphenanthera was 123 bp longer than that of S. chinensis. A total of 474 SNPs and 97 InDels were identified. Five genetic regions with high levels of variation (Pi > 0.015), trnS-trnG, ccsA-ndhD, psbI-trnS, trnT-psbD and ndhF-rpl32 were revealed.
Conclusion: We report the cp genome of S. sphenanthera and the SNPs and InDels that distinguish it from the cp genome of S. chinensis. This study sheds light on species identification and further phylogenetic study within the genus Schisandra.
© 2020 Tianjin Press of Chinese Herbal Medicines. Published by ELSEVIER B.V.


Keywords:  Nan-Wuweizi; Schisandra chinensis (Turcz.) Baill; Schisandra sphenanthera Rehd. et Wils.; Wuweizi; chloroplast genome

Year:  2020        PMID: 36119003      PMCID: PMC9476811          DOI: 10.1016/j.chmed.2019.09.009

Source DB:  PubMed          Journal:  Chin Herb Med        ISSN: 1674-6384


Introduction

Schisandra Mich. (Schisandraceae) is a genus with a disjunct distribution between East Asia and North America. It comprises 23 species according to Saunders’ treatment (Saunders, 2000). Only one species, S. glabra, is indigenous to North America, whereas all other species are distributed in East and Southeast Asia and Far Eastern Siberia (Saunders, 2000). The fruits of the species from China are typically used as folk herbs, especially those of Schisandra sphenanthera Rehd. et Wils. and S. chinensis (Turcz.) Baill, which are the source plants of “Nan-Wuweizi” (Schisandrae Sphenantherae Fructus) and “Wuweizi” (Schisandrae Chinensis Fructus) recorded in the Chinese Pharmacopoeia (National Pharmacopoeia Committee, 2015). Despite their similar uses in traditional Chinese medicine, including arresting discharge, replenishing qi, promoting fluid secretion, tonifying the kidney, and inducing sedation, their chemical compositions differ (Lu & Chen, 2009). The medical efficacy of the two herbs has been considered different since the Ming Dynasty (1368–1644 CE). For example, Enlightening Primer of Materia Medica (Ben Cao Meng Quan in Chinese), a book of herbal medicine written by Jia-mo Chen in the Ming Dynasty, pointed out that “Nan-Wuweizi” was used to treat wind-cold cough while “Wuweizi” was better for the treatment of consumptive damage. The different pharmacodynamics imply that they should not be treated as the same drug. However, the fruits of Schisandra are all red berries, and nearly all of them, including those of S. sphenanthera and S. chinensis, have been used as crude drugs. The diagnostic morphological characters, such as tepal color, the number and shape of stamens, and the gynoecium, can typically be identified only in living plants, so it is difficult to differentiate the source plants of crude drugs and traditional Chinese medicine products. Some previous studies attempted to resolve this problem using gene fragments.
For example, Zhang et al. (2015) proposed the combination ITS + trnH-psbA + matK + rbcL as the most suitable DNA barcode for identifying the medicinal plants of Schisandraceae. However, the results revealed that this combination of DNA barcodes was not suitable for most species of Schisandra because of its low variation. Chloroplasts (cp) are the key plastid organelles in nearly all terrestrial plants and algae (Neuhaus & Emes, 2000). The cp genomes of sequenced species usually have a quadripartite structure, which includes two identical copies of a large inverted repeat (IR) sequence separated by a small single-copy (SSC) region and a large single-copy (LSC) region. Variation among cp genomes involves plastome size and structure (Howe, 2016). Cp genomes, with their uniparental inheritance and substitution rates lower than those of nuclear genomes, have been widely used in species identification, phylogenetics, and genetic engineering (e.g., Ravi et al., 2008, Parks et al., 2009, Daniell et al., 2016, Sabater, 2018). Here we sequenced the cp genome of S. sphenanthera and compared it with the cp genome of S. chinensis, which was reported in our previous study (Guo et al., 2017). The aim was to find more effective molecular markers for resolving interspecific relationships more accurately and for identifying medicinal material within Schisandra.

Materials and methods

Plant materials

Fresh leaves of Schisandra sphenanthera were collected from Ankang, Shaanxi Province, China (Collection No.: SX2016101403; 109°0′932″E, 32°53′03″N). The voucher specimen was deposited in the herbarium of the Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences, Peking Union Medical College (IMD), China.

DNA extraction and sequencing

Total genomic DNA was extracted from silica gel-dried leaves using the Plant Genomic DNA Kit (Tiangen Biotech, Beijing, China) following the manufacturer’s instructions. The Illumina HiSeq platform was used to sequence the S. sphenanthera genome with a paired-end (PE) 150 genomic library by Beijing Novogene Bioinformatics Technology Co., Ltd. (Beijing, China).

Genome assembly and annotation

Clean reads were obtained by filtering out low-quality reads, namely reads in which more than 50% of the bases had a quality value of less than 5 and reads containing more than 10% “N”. The BLAST+ program (Camacho et al., 2009) was used to capture S. sphenanthera cp reads by comparing them with the cp sequence of S. chinensis (Accession No. KY111264) as a reference. The cp reads were assembled with SOAPdenovo2 (Luo et al., 2012). SSPACE-STANDARD-3.0 was used to extend cp contigs and build scaffolds (Boetzer et al., 2011). To obtain a complete cp genome sequence, more cp DNA reads were extracted from the total DNA reads with the scaffolds as a reference. The cp genome sequence was confirmed by mapping the total DNA reads in Geneious 10.0.2 (https://www.geneious.com/). Chloroplast genome annotation was performed with CPGAVAS (Liu et al., 2012), and the annotation result was manually inspected with Apollo (Lee et al., 2009). The tRNA gene boundaries were corroborated using tRNAscan-SE 1.21 (Schattner et al., 2005). The cp genome map was drawn using the OGDRAW program (Lohse et al., 2013). The complete chloroplast genome sequence was deposited in GenBank under the accession number MK193856.
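The two read-filtering criteria can be illustrated with a minimal Python sketch. It is not the pipeline actually used in the study: the function name `keep_read`, its parameters, and the assumption that quality values arrive as a list of integer scores are all hypothetical.

```python
def keep_read(seq, quals, min_q=5, max_low_frac=0.5, max_n_frac=0.1):
    """Return True if a read passes both filters described in the text.

    seq   -- read sequence as a string
    quals -- per-base quality scores (assumed to be integers, one per base)
    A read is discarded if more than 50% of its bases have a quality
    below min_q, or if more than 10% of its bases are "N".
    """
    if not seq or len(seq) != len(quals):
        return False
    low_quality = sum(1 for q in quals if q < min_q)
    n_count = seq.upper().count("N")
    return (low_quality / len(seq) <= max_low_frac
            and n_count / len(seq) <= max_n_frac)


# Hypothetical toy reads: only the first one survives the filter.
reads = [
    ("ACGTACGTAC", [30] * 10),           # clean read
    ("ACGTACGTAC", [2] * 6 + [30] * 4),  # 60% low-quality bases
    ("NNGTACGTAC", [30] * 10),           # 20% N
]
clean = [seq for seq, quals in reads if keep_read(seq, quals)]
```

The thresholds are exposed as keyword arguments so that the cut-offs stated in the text can be varied without changing the logic.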

Repeat sequence and SSRs analyses

We investigated the distribution of SSRs in the cp genome of S. sphenanthera and verified all of the repeats manually. Tandem repeats were detected using Tandem Repeats Finder (TRF) v4.04 (http://tandem.bu.edu/trf/trf.html) with the default parameters (Benson, 1999). Short dispersed repeats (SDRs), including forward and palindromic repeats, were identified with REPuter (http://tandem.bu.edu/trf/trf.html) using a minimal repeat size of 30 bp and a Hamming distance of 3. Low-complexity and nested repeats were removed manually. The MISA program (http://tandem.bu.edu/trf/trf.html) was used to identify potential SSRs. The minimum numbers of repeats were set to 10, 5, 5, 3, 3, and 3 for mono-, di-, tri-, tetra-, penta-, and hexanucleotide motifs, respectively.
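To make the SSR thresholds concrete, the following is a simplified regex-based sketch in Python. It is not MISA itself and does not reproduce MISA's handling of compound repeats; the names `MIN_REPEATS`, `is_primitive`, and `find_ssrs`, and the toy sequence, are hypothetical.

```python
import re

# Minimum number of repeats per motif length, as stated in the text.
MIN_REPEATS = {1: 10, 2: 5, 3: 5, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit
    (e.g. "AA" or "ATAT" are rejected, "AT" and "AAAT" are accepted)."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k)
                   for k in range(1, n))


def find_ssrs(seq):
    """Return a sorted list of (1-based start, motif, repeat count)."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in MIN_REPEATS.items():
        # One motif of length `unit`, then at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        pos = 0
        while pos < len(seq):
            m = pattern.match(seq, pos)
            if m and is_primitive(m.group(1)):
                hits.append((pos + 1, m.group(1), (m.end() - pos) // unit))
                pos = m.end()
            else:
                pos += 1
    return sorted(hits)


# Hypothetical toy sequence containing A x 11, AT x 5, and AAAT x 3.
toy = "GCC" + "A" * 11 + "CGC" + "AT" * 5 + "GGC" + "AAAT" * 3 + "CC"
ssrs = find_ssrs(toy)
```

Rejecting non-primitive motifs prevents a mononucleotide run from being reported again as a dinucleotide or tetranucleotide repeat.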

Comparative genome analysis

The mVISTA program was used in Shuffle-LAGAN mode (Frazer et al., 2004) to compare the cp genomes of S. sphenanthera, S. chinensis, Illicium oligandrum Merr. et Chun, and three other basal angiosperm species. The cp genomes of S. chinensis (KY111264), I. oligandrum (EF380354), Nymphaea alba L. (AJ627251), Amborella trichopoda Baill. (AJ506156), and Trithuria inconspicua Cheeseman (HE963749) were downloaded from GenBank. S. sphenanthera was set as the reference. IR expansion and contraction were also analyzed. In addition, DnaSP v.5 (Librado & Rozas, 2009) was used to identify the SNPs and InDels between S. sphenanthera and S. chinensis.

Sliding window analysis of cp genomes

The sequences of S. sphenanthera and S. chinensis were aligned and adjusted manually using BioEdit v.7.1.11 (Hall et al., 2011). DnaSP v.5 (Librado & Rozas, 2009) was then used to conduct a sliding window analysis of the variability between these two cp genomes, with a window length of 600 bp and a step size of 200 bp.
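The sliding window calculation can be sketched in a few lines of Python. For two aligned sequences, nucleotide diversity (Pi) within a window reduces to the proportion of compared sites that differ. The sketch assumes that columns containing a gap are excluded and that the window midpoint is reported; DnaSP's exact treatment of gaps and window coordinates may differ, and the function name `sliding_pi` is hypothetical.

```python
def sliding_pi(seq_a, seq_b, window=600, step=200):
    """Pairwise nucleotide diversity per window for two aligned sequences.

    Returns a list of (0-based window midpoint, Pi) tuples. Columns with
    a gap ("-") in either sequence are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(seq_a) - window + 1, step):
        pairs = [(a, b)
                 for a, b in zip(seq_a[start:start + window],
                                 seq_b[start:start + window])
                 if a != "-" and b != "-"]
        diffs = sum(1 for a, b in pairs if a != b)
        pi = diffs / len(pairs) if pairs else 0.0
        results.append((start + window // 2, pi))
    return results


# Toy example with a 10-bp window and 5-bp step: one substitution at index 12.
toy_a = "A" * 20
toy_b = "A" * 12 + "C" + "A" * 7
toy_windows = sliding_pi(toy_a, toy_b, window=10, step=5)
```

With a 600-bp window and no gapped sites, a Pi value above 0.015 corresponds to more than nine differing sites in that window.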

Results

Chloroplast genome assembly, organization, and gene content of S. sphenanthera

The complete cp genome of S. sphenanthera was 146 853 bp in length and had the typical circular quadripartite structure of angiosperm cp genomes (Fig. 1). The LSC (95 627 bp) and SSC (18 292 bp) regions were separated by a pair of inverted repeat regions (IRA and IRB, 16 467 bp each) (Fig. 1, Table 1). The GC contents of the complete cp genome and of the LSC, SSC, and IR regions of S. sphenanthera were 39.6%, 38.4%, 35.0%, and 45.6%, respectively. The relatively high GC content of the IR regions, which is also seen in most other plants, is attributed to the high GC content of the four ribosomal RNA (rRNA) genes (Cheng et al., 2017). GC content was unevenly distributed across the cp genome, which could be related to the difference in sequence conservation between the IR and SC regions (Yang et al., 2014).
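As a quick consistency check on these figures, the reported region lengths should sum to the genome length, and the length-weighted mean of the regional GC contents should reproduce the overall GC content. The Python sketch below uses only the values quoted in the text and Table 1; the helper names `gc_percent` and `weighted_gc` are hypothetical.

```python
# Region lengths (bp) and GC contents (%) as reported for S. sphenanthera.
REGIONS = {
    "LSC": (95627, 38.4),
    "SSC": (18292, 35.0),
    "IRA": (16467, 45.6),
    "IRB": (16467, 45.6),
}


def gc_percent(seq):
    """GC content of a raw sequence string, in percent."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def weighted_gc(regions):
    """Length-weighted mean GC content over all regions, in percent."""
    total = sum(length for length, _ in regions.values())
    return sum(length * gc for length, gc in regions.values()) / total


total_length = sum(length for length, _ in REGIONS.values())
overall_gc = weighted_gc(REGIONS)
print(total_length, round(overall_gc, 1))  # prints: 146853 39.6
```

Both values agree with the reported genome length and overall GC content.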
Fig. 1

Chloroplast genome map of S. sphenanthera. The gray arrow represents the direction in which the genes are translated. Genes shown outside of the circle are transcribed clockwise, while those inside are counterclockwise. Large single copy (LSC), small single copy (SSC), and inverted repeats (IRA, IRB) are indicated. The darker gray represents GC content in the inner circle, conversely the lighter one represents AT content.

Table 1

Characteristics of cp genomes of Schisandra sphenanthera and S. chinensis.

| Names | S. sphenanthera: length (bp)/percent (%) | S. sphenanthera: GC content (%) | S. chinensis: length (bp)/percent (%) | S. chinensis: GC content (%) |
|---|---|---|---|---|
| Total | 146 853 | 39.6 | 146 730 | 39.7 |
| LSC | 95 627/65.1 | 38.4 | 95 538/65.1 | 38.6 |
| SSC | 18 292/12.5 | 35.0 | 18 270/12.5 | 35.0 |
| IR | 16 467/11.2 | 45.6 | 16 461/11.2 | 45.7 |
| CDS | 72 917/49.7 | 39.4 | 72 837/49.6 | 39.4 |
| 1st position | 24 304/16.5 | 46.8 | 24 279/16.5 | 46.8 |
| 2nd position | 24 305/16.5 | 39.3 | 24 279/16.5 | 49.3 |
| 3rd position | 24 306/16.5 | 32.1 | 24 279/16.5 | 32.1 |

A total of 126 functional genes were predicted in the cp genome of S. sphenanthera, of which 113 were unique, including 79 protein-coding genes, 30 transfer RNA (tRNA) genes, and four rRNA genes (Fig. 1, Table 2). Five tRNA genes, four protein-coding genes, and all four rRNA genes were duplicated in the IR regions. The ycf15 gene is present in many angiosperm species but absent in S. sphenanthera (Goremykin et al., 2003, Li et al., 2014).
Table 2

Gene contents in cp genome of S. sphenanthera.

| Functions | Family names | Codes | List of genes |
|---|---|---|---|
| Genes for photosynthesis | Subunits of ATP synthase | atp | atpA, atpB, atpE, atpF+, atpH, atpI |
| | Subunits of NADH-dehydrogenase | ndh | ndhA+, ndhB*+, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK |
| | Subunits of cytochrome b/f complex | pet | petA, petB+, petD+, petG, petL, petN |
| | Subunits of photosystem I | psa | psaA, psaB, psaC, psaI, psaJ |
| | Subunits of photosystem II | psb | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ |
| | Subunit of rubisco | rbc | rbcL |
| Other genes | Subunit of acetyl-CoA-carboxylase | acc | accD |
| | c-type cytochrome synthesis gene | ccs | ccsA |
| | Envelope membrane protein | cem | cemA |
| | Protease | clp | clpP+ |
| | Translational initiation factor | inf | infA |
| | Maturase | mat | matK |
| Self replication | Large subunit of ribosome | rpl | rpl16+, rpl2+, rpl14, rpl20, rpl22, rpl23, rpl32, rpl33, rpl36 |
| | DNA dependent RNA polymerase | rpo | rpoA, rpoB, rpoC1+, rpoC2 |
| | Small subunit of ribosome | rps | rps18, rps15, rps16+, rps7*, rps12*+, rps3, rps2, rps11, rps4, rps19, rps8, rps14 |
| | rRNA genes | rrn | rrn16S*, rrn23S*, rrn4.5S*, rrn5S* |
| | tRNA genes | trn | trnA-UGC*+, trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnG-GCC, trnG-UCC+, trnH-GUG, trnI-CAU, trnI-GAU*+, trnK-UUU+, trnL-CAA, trnL-UAA+, trnL-UAG, trnfM-CAU, trnM-CAU, trnN-GUU*, trnP-UGG, trnQ-UUG, trnR-ACG*, trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnT-UGU, trnV-GAC*, trnV-UAC+, trnW-CCA, trnY-GUA |
| Genes of unknown function | Conserved open reading frames | ycf | ycf1*, ycf2, ycf3+, ycf4 |

Notes: The rps12 gene was divided: 5′-rps12 was located in the LSC region and 3′-rps12 was located in the IR region. * Gene located in the IR regions; + Intron-containing gene.

Introns play an important role in the regulation of alternative splicing but do not contribute to the structure of the translation products. Because they are under weaker selective pressure, introns tend to accumulate more mutations (Graveley, 2001). Altogether 18 intron-containing genes were found, including six tRNA genes (trnK-UUU, trnG-UCC, trnL-UAA, trnV-UAC, trnI-GAU, trnA-UGC) and 12 protein-coding genes (rps16, atpF, rpoC1, ycf3, clpP, petB, petD, rpl16, rpl2, ndhB, ndhA, rps12) (Table 3). Both ycf3 and clpP contained two introns. In contrast, rps12 had three exons but no intron, which indicated that rps12 may be trans-spliced. This gene spanned the LSC and IR regions, with its 5′ exon located in the LSC region and its two 3′ exons located in the IR regions. Each of the remaining 15 genes contained a single intron; among these, trnK-UUU had the longest intron (2486 bp) and trnL-UAA the shortest (482 bp).
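Table 3

Genes with introns and length of exons and introns in cp genome of S. sphenanthera.

| No. | Genes | Location | Exon I/bp | Intron I/bp | Exon II/bp | Intron II/bp | Exon III/bp |
|---|---|---|---|---|---|---|---|
| 1 | trnK-UUU | LSC | 39 | 2486 | 34 | - | - |
| 2 | rps16 | LSC | 41 | 819 | 220 | - | - |
| 3 | trnG-UCC | LSC | 33 | 760 | 61 | - | - |
| 4 | atpF | LSC | 148 | 757 | 416 | - | - |
| 5 | rpoC1 | LSC | 456 | 711 | 1608 | - | - |
| 6 | ycf3 | LSC | 127 | 727 | 230 | 756 | 153 |
| 7 | trnL-UAA | LSC | 37 | 482 | 50 | - | - |
| 8 | trnV-UAC | LSC | 41 | 571 | 55 | - | - |
| 9 | clpP | LSC | 71 | 804 | 294 | 556 | 247 |
| 10 | petB | LSC | 6 | 761 | 642 | - | - |
| 11 | petD | LSC | 8 | 690 | 526 | - | - |
| 12 | rpl16 | LSC | 9 | 945 | 402 | - | - |
| 13 | rpl2 | LSC | 394 | 659 | 431 | - | - |
| 14 | ndhB | IR | 869 | 609 | 757 | - | - |
| 15 | trnI-GAU | IR | 34 | 939 | 39 | - | - |
| 16 | trnA-UGC | IR | 39 | 790 | 35 | - | - |
| 17 | ndhA | SSC | 556 | 1061 | 542 | - | - |
| 18 | rps12 | LSC-IR | 114 | - | 232 | - | 26 |
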
The total length of the protein-coding sequences (CDS) was 72 917 bp, accounting for 49.7% of the whole cp genome of S. sphenanthera. Codon usage frequencies were calculated and are summarized in Table 1 and Table 4. A total of 10.1% of all codons (2463) encoded leucine and 1.2% (290) encoded cysteine, making these the most and least prevalent amino acids, respectively. The high AT content at the third codon position reflected a codon usage bias towards A or T. With the exception of UUG (read by trnL-CAA), all preferred synonymous codons (RSCU > 1) ended with A or U. Within the CDS, GC content was lowest at the third codon position (32.1%), which suggests that the codon usage bias may have developed over the course of evolution. Such codon usage bias is common in plant cp genomes and reflects a genomic bias towards higher A + T content (Clegg et al., 1994).
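RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. A minimal Python sketch, using the leucine and cysteine codon counts reported in Table 4, shows how the tabulated values arise; the function name `rscu` is hypothetical.

```python
def rscu(counts):
    """Relative synonymous codon usage for ONE amino acid.

    counts -- dict mapping each synonymous codon to its observed count.
    Each value is divided by the mean count across the synonymous codons.
    """
    mean = sum(counts.values()) / len(counts)
    return {codon: n / mean for codon, n in counts.items()}


# Codon counts for leucine and cysteine, taken from Table 4.
leucine = {"UUA": 701, "UUG": 527, "CUU": 492,
           "CUC": 197, "CUA": 345, "CUG": 201}
cysteine = {"UGU": 218, "UGC": 72}

leu_rscu = {codon: round(v, 2) for codon, v in rscu(leucine).items()}
cys_rscu = {codon: round(v, 2) for codon, v in rscu(cysteine).items()}
preferred_leu = sorted(c for c, v in leu_rscu.items() if v > 1)
```

The leucine counts sum to 2463 and the cysteine counts to 290, matching the totals given in the text, and the preferred leucine codons (RSCU > 1) are CUU, UUA, and UUG.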
Table 4

Codon-anticodon recognition pattern and relative synonymous codon usage (RSCU) for cp genome of S. sphenanthera.

| Codon | Count | RSCU | tRNA | Codon | Count | RSCU | tRNA |
|---|---|---|---|---|---|---|---|
| UAU(Y) | 697 | 1.57 | | UUA(L) | 701 | 1.71 | trnL-UAA |
| UAC(Y) | 191 | 0.43 | trnY-GUA | UUG(L) | 527 | 1.28 | trnL-CAA |
| UGG(W) | 431 | 1 | trnW-CCA | CUU(L) | 492 | 1.2 | |
| GUU(V) | 515 | 1.47 | | CUC(L) | 197 | 0.48 | |
| GUC(V) | 172 | 0.49 | trnV-GAC | CUA(L) | 345 | 0.84 | trnL-UAG |
| GUA(V) | 486 | 1.38 | trnV-UAC | CUG(L) | 201 | 0.49 | |
| GUG(V) | 233 | 0.66 | | AAA(K) | 830 | 1.41 | trnK-UUU |
| ACU(T) | 490 | 1.57 | | AAG(K) | 350 | 0.59 | |
| ACC(T) | 243 | 0.78 | trnT-GGU | AUU(I) | 954 | 1.39 | |
| ACA(T) | 363 | 1.16 | trnT-UGU | AUC(I) | 460 | 0.67 | trnI-GAU |
| ACG(T) | 151 | 0.48 | | AUA(I) | 649 | 0.94 | trnI-CAU |
| UCU(S) | 517 | 1.66 | | CAU(H) | 477 | 1.52 | |
| UCC(S) | 305 | 0.98 | trnS-GGA | CAC(H) | 151 | 0.48 | trnH-GUG |
| UCA(S) | 397 | 1.27 | trnS-UGA | GGU(G) | 572 | 1.33 | |
| UCG(S) | 173 | 0.55 | trnS-GCU | GGC(G) | 178 | 0.41 | trnG-GCC |
| AGU(S) | 360 | 1.15 | | GGA(G) | 674 | 1.56 | trnG-UCC |
| AGC(S) | 121 | 0.39 | | GGG(G) | 300 | 0.7 | |
| CGU(R) | 323 | 1.25 | trnR-ACG | UUU(F) | 761 | 1.17 | |
| CGC(R) | 112 | 0.43 | | UUC(F) | 538 | 0.83 | trnF-GAA |
| CGA(R) | 326 | 1.26 | | GAA(E) | 913 | 1.44 | trnE-UUC |
| CGG(R) | 120 | 0.47 | | GAG(E) | 355 | 0.56 | |
| AGA(R) | 482 | 1.87 | trnR-UCU | GAU(D) | 760 | 1.56 | |
| AGG(R) | 184 | 0.71 | | GAC(D) | 213 | 0.44 | trnD-GUC |
| CAA(Q) | 607 | 1.46 | trnQ-UUG | UGU(C) | 218 | 1.5 | |
| CAG(Q) | 224 | 0.54 | | UGC(C) | 72 | 0.5 | trnC-GCA |
| CCU(P) | 397 | 1.53 | | GCU(A) | 607 | 1.76 | |
| CCC(P) | 226 | 0.87 | | GCC(A) | 235 | 0.68 | |
| CCA(P) | 301 | 1.16 | trnP-UGG | GCA(A) | 400 | 1.16 | trnA-UGC |
| CCG(P) | 114 | 0.44 | | GCG(A) | 139 | 0.4 | |
| AAU(N) | 828 | 1.51 | | UAA(*) | 33 | 1.19 | |
| AAC(N) | 272 | 0.49 | trnN-GUU | UAG(*) | 23 | 0.83 | |
| AUG(M) | 591 | 1 | trnfM-CAU, trnM-CAU | UGA(*) | 27 | 0.98 | |

Notes: RSCU: Relative Synonymous Codon Usage.

Repeat structure and SSRs analyses

Simple sequence repeats (SSRs), also called microsatellites, are highly variable nucleotide arrays composed of tandemly repeated units of 1–6 nucleotides (Chen et al., 2006). We identified a total of 45 SSRs in the cp genome of S. sphenanthera: 30 mononucleotide repeats, six dinucleotide repeats, one trinucleotide repeat, five tetranucleotide repeats, and three pentanucleotide repeats. No hexanucleotide repeat was found (Fig. 2). The mononucleotide SSRs were composed of A (12) or T (17), except for one composed of G. Five dinucleotide repeats were composed of A/T motifs and one of a T/C motif. The repeat number of the mononucleotide motifs ranged from 10 to 13. The longest SSR was 23 bp in length and consisted of the tetranucleotide AAAT repeated three times and the mononucleotide A repeated 11 times. SSRs composed of G or C were less frequent than those composed of A or T, which might be related to the greater stability of G-C pairs, making these sites less prone to change. Thirty-seven SSRs (82.22%) were located in the LSC region, six (13.33%) in the SSC region, and two (4.44%) in the IR regions. Furthermore, 38 SSRs (84.44%) were located in noncoding regions, of which 31 were in intergenic regions and seven in introns. Only seven SSRs (15.56%) were found in gene coding regions (psbI, rpoC2, rpoB, psbC, cemA, ycf2). On the basis of these statistics, SSRs were more abundant in noncoding regions than in the CDS.
Fig. 2

Analysis of simple sequence repeats (SSRs) in Schisandra sphenanthera and S. chinensis cp genomes. Mono: Mono-nucleotide; Di: Di-nucleotide; Tri: Tri-nucleotide; Tetra: Tetra-nucleotide; Penta: Penta-nucleotide; Hexa: Hexa-nucleotide.

Short dispersed repeats (SDRs), with lengths of at least 30 bp, are considered to be one of the major factors promoting rearrangements of the cp genome (Qian et al., 2013). Thirty-three pairs of repeats ranging from 30 to 149 bp in length were found in the cp genome of S. sphenanthera (Fig. 3): four forward, 10 palindromic, and 19 tandem repeats. The forward repeats were 30–60 bp in length and the palindromic repeats were 30–41 bp. Two tandem repeat motifs were located in ycf2 in the LSC region, one of which was the longest repeat (149 bp), while the other 18 tandem repeats were only 30 to 70 bp in length. Most of the repeats, 22 of the 33 pairs (two forward, eight palindromic, and 12 tandem repeats), were located entirely in the LSC region.
Fig. 3

Analysis of short dispersed repeats (SDRs) in cp genomes of Schisandra sphenanthera and S. chinensis. T: tandem repeats; F: forward repeats; P: palindromic repeats.


Comparative analysis of basal angiosperms

The cp genomes of six basal angiosperms were compared, with the annotation of S. sphenanthera as the reference (Fig. 4; Table 5). The cp genome of S. sphenanthera was most similar to that of S. chinensis and most different from that of Trithuria inconspicua. The average size of the analyzed genomes was 155 024 bp. Divergence was higher in the noncoding regions than in the coding regions. The pattern of variation in the length of the SC regions was consistent with that of the IR regions in all six species analyzed except Illicium oligandrum. The cp genomes of Trithuria inconspicua (165 389 bp) and S. chinensis (146 730 bp) were the longest and the shortest, respectively. Variation in the length of the IR regions contributed most to the differences in genome size among species (Table 5).
Fig. 4

Sequence identity plots between six sequenced chloroplast genomes, with Schisandra sphenanthera as a reference. The vertical scale indicates the identity percentage (50–100%). The horizontal axis corresponds to the coordinates within the chloroplast genome. Annotated genes are displayed along the top.

Table 5

Cp genomes size comparison of six basal angiosperms.

| Species | Total length/bp | LSC/bp | SSC/bp | IR/bp |
|---|---|---|---|---|
| Schisandra sphenanthera | 146 853 | 95 627 | 18 292 | 16 467 |
| Schisandra chinensis | 146 730 | 97 351 | 20 305 | 15 058 |
| Illicium oligandrum | 148 553 | 97 144 | 20 267 | 15 571 |
| Amborella trichopoda | 162 686 | 90 970 | 18 414 | 26 651 |
| Nymphaea alba | 159 930 | 90 014 | 19 562 | 25 177 |
| Trithuria inconspicua | 165 389 | 84 468 | 6354 | 37 284 |
| Average | 155 024 | 92 596 | 17 199 | 22 702 |

IR contraction and expansion

The expansion and contraction of the boundaries between the IR and SC regions were primarily responsible for the size variations in genomes among the angiosperm lineages. We compared the junction of the IR/SC boundaries and their adjacent genes among S. sphenanthera, S. chinensis, Illicium oligandrum, Nymphaea alba, Trithuria inconspicua, and Amborella trichopoda (Fig. 5).
Fig. 5

Comparison at junction of IR/SC boundaries.

The genes adjacent to the IR/SC boundaries in the cp genome of S. sphenanthera were the same as those in S. chinensis and Illicium oligandrum. However, the distances from the genes to the boundaries differed. The genes adjacent to the LSC-IRA (rps19, rpl2) and IRB-LSC (rpl2, trnH) borders in Nymphaea alba, Trithuria inconspicua, and Amborella trichopoda were identical to each other, but different from those in S. sphenanthera (trnL, ndhB at LSC-IRA; ndhB, trnH at IRB-LSC). The gene located at the IRA-SSC and SSC-IRB borders in Trithuria inconspicua was ndhD, whereas it was ycf1 in the other species. Obvious contraction occurred in the IRA region, which placed the IRA-LSC border between trnL and ndhB in S. sphenanthera. At the IRA-SSC border, ndhF shared some nucleotides with ycf1 in three species (33 bp in S. sphenanthera, 112 bp in S. chinensis, 11 bp in Illicium oligandrum). The ycf1 copy in the IRA region appeared to be a pseudogene, because it was an incomplete duplicate of the normal copy of ycf1 that spanned the IRB-SSC border. At the IRB-SSC border, the IRB region extended into ycf1 by 1283 bp in S. sphenanthera, 1281 bp in S. chinensis, 413 bp in Illicium oligandrum, 155 bp in Nymphaea alba, and 1582 bp in Amborella trichopoda. The IRB-LSC border lay within trnH in the cp genomes of S. sphenanthera and S. chinensis, whereas trnH was located entirely in the LSC region in the other species. This result showed clear contraction of the IRB region in S. sphenanthera and S. chinensis (Fig. 5).

Divergence hotspots of S. sphenanthera and S. chinensis

The basic features indicated that the cp genomes of S. sphenanthera and S. chinensis have been well conserved during their long evolution. The two cp genomes shared the same circular quadripartite structure, and that of S. sphenanthera was only 123 bp longer than that of S. chinensis. The GC contents of the two cp genomes were similar. The numbers of unique genes, protein-coding genes, tRNA genes, and rRNA genes were identical, and the loss of ycf15 observed in S. sphenanthera also occurred in S. chinensis. SNPs and InDels between S. sphenanthera and S. chinensis were then analyzed. In total, 474 SNPs and 97 InDels were identified (Table S1): 363, 12, and 99 SNPs were located in the LSC, IRA/B, and SSC regions, respectively, and 80, five, and 12 InDels were located in the LSC, IRA/B, and SSC regions, respectively. The two longest InDels were both located in the LSC region. The longest InDel (474 bp) was present in S. sphenanthera but absent in S. chinensis. In contrast, the second longest InDel (435 bp) was present in S. chinensis but absent in S. sphenanthera. These SNPs and InDels are all potential markers for distinguishing the two species. The sliding window analysis showed that nucleotide variability (Pi) between S. sphenanthera and S. chinensis ranged from 0 to 0.03000 with a mean of 0.00364, which indicated high sequence similarity. Five regions with much higher variation (Pi > 0.015), trnS-trnG, ccsA-ndhD, psbI-trnS, trnT-psbD, and ndhF-rpl32, were recognized (Fig. 6). Three of these loci were in the LSC region and two were in the SSC region, but none was located in the IR regions. The SC regions therefore showed markedly higher divergence than the IR regions.
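The counting of SNPs and InDels from a pairwise alignment can be illustrated with a short Python sketch. It treats each mismatched column as one SNP and each uninterrupted run of gaps in one sequence as one InDel event; DnaSP's own definitions of InDel events may differ, and the function name `count_variants` and the toy alignment are hypothetical.

```python
def count_variants(aln_a, aln_b):
    """Count SNPs and InDel events between two aligned sequences.

    Returns (snp_count, indels), where indels is a list of
    (sequence_carrying_the_gap, length) tuples in alignment order.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("sequences must be aligned to the same length")
    snps = 0
    indels = []
    current, run = None, 0  # which sequence is gapped, and for how long
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" and b == "-":
            continue  # uninformative column in a pairwise comparison
        side = "a" if a == "-" else "b" if b == "-" else None
        if side != current:
            if current is not None:
                indels.append((current, run))  # close the finished gap run
            current, run = side, 0
        if side is None:
            if a != b:
                snps += 1
        else:
            run += 1
    if current is not None:
        indels.append((current, run))
    return snps, indels


# Toy alignment: one substitution, a 2-bp gap in A, and a 2-bp gap in B.
snp_count, indel_events = count_variants("ACGT--ACGTAC", "ACCTGGAC--AC")
```

Recording which sequence carries each gap keeps the direction of an InDel explicit, as in the text, where the 474-bp InDel is present only in S. sphenanthera and the 435-bp InDel only in S. chinensis.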
Fig. 6

Sliding window analysis of complete cp genomes of S. sphenanthera and S. chinensis. Window length: 600 bp, step size: 200 bp.


Discussion

Cp genomes of S. sphenanthera and S. chinensis are relatively conservative compared with other species of basal angiosperms

Analysis of the cp genomes of S. sphenanthera, S. chinensis, and four other species from basal angiosperm lineages showed that the cp genome of S. sphenanthera was most similar to that of S. chinensis, and that the cp genomes of S. sphenanthera (146 853 bp) and S. chinensis (146 730 bp) were relatively small compared with those of the other four species. The largest and smallest cp genomes analyzed in this study were those of Trithuria inconspicua and S. chinensis, respectively, with the former approximately 18.7 kb larger than the latter. The genes adjacent to the IR/SC borders and their distances from the borders clearly showed that IR contraction had occurred in S. sphenanthera and S. chinensis. The expansion and contraction of IR boundaries can cause some genes to move into the IR regions or remain in the SC regions. The cp genomes of the two species had similar GC contents and the same numbers of intron-containing genes, functional genes, and rRNA genes, and both lacked the ycf15 gene. The ycf15 gene was first identified in the cp genome of Nicotiana (Shi et al., 2013). It has been reported that ycf15 may be used as a potential marker to distinguish Colchicum from Gloriosa, since the deletion of ycf15 was thought to occur only in Colchicum (Nguyen et al., 2015). Previous studies have reported that such loss events also occurred in some pteridophytes and gymnosperms (Kim et al., 2014, Li et al., 2016, Wakasugi et al., 1994). The repeat regions of cp genomes play an important role in gene recombination and rearrangement (Smith, 2002). Chloroplast SSR markers are effective in population structure analyses because these short repeats are haploid and uniparentally inherited (Echt et al., 1998). We obtained 45 SSRs in S. sphenanthera and 47 SSRs in S. chinensis. The number of SSRs in the two species was similar, but the repeat motifs differed slightly.
The abundance of SDRs is related to the extent of gene rearrangement, given that most repeats occur near rearrangement hotspots and might mediate these rearrangement events (Chumley et al., 2006, Haberle et al., 2008, Pombert et al., 2005). Short repeat motifs might influence intermolecular recombination in plastid DNA and create diversity within the population (Kawata et al., 1997). These repeat motifs provide a source of information for additional population genetic studies of S. sphenanthera.

Genomic divergence and hotspot regions are the potential molecular markers

Comparative analyses of the basal angiosperms and the hotspot analysis showed that sequence divergence was lower in the IR regions than in the SC regions. The low divergence in the IR regions may be related to copy correction between the two IR copies by gene conversion (Khakhlova & Bock, 2006). The cp genomes of S. sphenanthera and S. chinensis were highly conserved; nevertheless, sequence comparison identified 474 SNPs and 97 InDels between them. Empirical phylogenetic studies using SNPs have become commonplace across diverse taxa (Leaché & Oaks, 2017). In addition, sliding window analysis recognized five regions with greater variability (Pi > 0.015): trnS-trnG, ccsA-ndhD, psbI-trnS, trnT-psbD, and ndhF-rpl32.

To ensure the safety and accuracy of clinical medication with “Nan-Wuweizi” and “Wuweizi”, rapid and efficient DNA barcodes are needed to distinguish their source plants. “Nan-Wuweizi” and “Wuweizi” are used in more than 10 and more than 90 different traditional herbal medical products, respectively (Committee, 2015). The chemical indicator of “Nan-Wuweizi” is schisantherin A (C30H32O9), while that of “Wuweizi” is schisandrin (C24H32O7). The price of “Wuweizi” is nearly six to nine times that of “Nan-Wuweizi”, and at present some “Nan-Wuweizi” is sold as a counterfeit of “Wuweizi”. In addition, fruits of other Schisandra species are misused as “Nan-Wuweizi” or “Wuweizi” because of their similar morphology. DNA degrades heavily in highly processed materials, which makes accurate molecular identification with long marker regions difficult. This study provides multiple SNPs that can be used to design short nucleotide signatures for distinguishing the TCM “Nan-Wuweizi” and “Wuweizi”. Furthermore, these SNPs and InDels, especially those from the five highly variable regions, have considerable potential for clarifying interspecific relationships in phylogenetic studies and for identifying medicinal materials within Schisandra.
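
The sliding window analysis mentioned above can be illustrated with a minimal Python sketch. It assumes the cp genomes have already been aligned to equal length, and it computes nucleotide diversity (Pi) per window as the average proportion of differing sites over all sequence pairs. The function names, the window length of 600, and the step size of 200 are illustrative assumptions, not the settings or software used in this study; only the 0.015 threshold comes from the text.

```python
from itertools import combinations


def pairwise_diff(a, b):
    """Proportion of differing sites between two aligned segments.

    Columns with a gap ('-') or an ambiguous base ('N') in either
    sequence are skipped, so they do not count as compared sites.
    """
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else 0.0


def sliding_pi(seqs, window=600, step=200):
    """Return (start, end, Pi) for each window; window/step are assumed values."""
    if len(seqs) < 2:
        raise ValueError("need at least two aligned sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        end = start + window
        # Pi = mean pairwise difference per site within this window
        pi = sum(pairwise_diff(a[start:end], b[start:end]) for a, b in pairs) / len(pairs)
        results.append((start, end, pi))
    return results


def hotspots(seqs, threshold=0.015, window=600, step=200):
    """Windows whose Pi exceeds the threshold (0.015 in the text)."""
    return [w for w in sliding_pi(seqs, window, step) if w[2] > threshold]
```

With only two genomes, Pi in a window reduces to the proportion of differing sites between them, so a window of 600 bp exceeds the 0.015 threshold once it contains ten or more SNPs.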

Conclusion

This study reported the cp genome of S. sphenanthera and compared it with that of S. chinensis. The cp genomes of the two species were highly similar in sequence and structure. The cp genome of S. sphenanthera was 123 bp longer than that of S. chinensis, while the IR regions, SSRs, and region borders of the two genomes were similar. Ninety-seven InDels and 474 SNPs were identified between S. chinensis and S. sphenanthera; these could serve as short nucleotide signatures for distinguishing processed crude drugs or traditional Chinese medicine products. Five highly variable regions, trnS-trnG, ccsA-ndhD, psbI-trnS, trnT-psbD, and ndhF-rpl32, could be developed as DNA barcodes for accurate species identification. This study not only facilitates the biological identification of these two important traditional medicinal plants, but also provides ample information for further study of Schisandra.
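
As a rough illustration of how SNPs and InDels from a pairwise alignment might be screened for short signature candidates, the sketch below lists SNP columns, counts InDel events, and reports gap-free windows that contain a minimum number of SNPs. All function names, the 40-column window length, and the two-SNP minimum are hypothetical choices for illustration; the study does not specify a signature design procedure in this section.

```python
from bisect import bisect_left


def snp_positions(a, b):
    """0-based alignment columns where both sequences have a base and differ."""
    return [i for i, (x, y) in enumerate(zip(a.upper(), b.upper()))
            if x != "-" and y != "-" and x != y]


def indel_events(a, b):
    """Count InDel events: each run of gap columns in one sequence counts once."""
    events, prev = 0, None
    for x, y in zip(a, b):
        if x == "-" and y != "-":
            state = "a"
        elif y == "-" and x != "-":
            state = "b"
        else:
            state = None
        if state is not None and state != prev:
            events += 1
        prev = state
    return events


def signature_windows(a, b, length=40, min_snps=2):
    """Gap-free windows of `length` columns holding at least `min_snps` SNPs.

    Returns (start, snp_count) tuples; length and min_snps are assumed values.
    """
    snps = snp_positions(a, b)  # already sorted by position
    hits = []
    for start in range(0, len(a) - length + 1):
        end = start + length
        if "-" in a[start:end] or "-" in b[start:end]:
            continue  # skip windows that overlap an InDel
        n = bisect_left(snps, end) - bisect_left(snps, start)
        if n >= min_snps:
            hits.append((start, n))
    return hits
```

Restricting candidates to short, gap-free windows reflects the point made in the Discussion that DNA in highly processed material is heavily degraded, so only short fragments are likely to amplify.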

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (10 of 36 shown)

1.  Tandem repeats finder: a program to analyze DNA sequences.

Authors:  G Benson
Journal:  Nucleic Acids Res       Date:  1999-01-15       Impact factor: 16.971

2.  Short inverted repeats function as hotspots of intermolecular recombination giving rise to oligomers of deleted plastid DNAs (ptDNAs).

Authors:  M Kawata; T Harada; Y Shimamoto; K Oono; F Takaiwa
Journal:  Curr Genet       Date:  1997-02       Impact factor: 3.886

3.  Loss of all ndh genes as determined by sequencing the entire chloroplast genome of the black pine Pinus thunbergii.

Authors:  T Wakasugi; J Tsudzuki; S Ito; K Nakashima; T Tsudzuki; M Sugiura
Journal:  Proc Natl Acad Sci U S A       Date:  1994-10-11       Impact factor: 11.205

4.  Mining and characterizing microsatellites from citrus ESTs.

Authors:  Chunxian Chen; Ping Zhou; Young A Choi; Shu Huang; Fred G Gmitter
Journal:  Theor Appl Genet       Date:  2006-02-11       Impact factor: 5.699

5.  VISTA: computational tools for comparative genomics.

Authors:  Kelly A Frazer; Lior Pachter; Alexander Poliakov; Edward M Rubin; Inna Dubchak
Journal:  Nucleic Acids Res       Date:  2004-07-01       Impact factor: 16.971

6.  The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs.

Authors:  Peter Schattner; Angela N Brooks; Todd M Lowe
Journal:  Nucleic Acids Res       Date:  2005-07-01       Impact factor: 16.971

7.  Complete chloroplast genome sequence of poisonous and medicinal plant Datura stramonium: organizations and implications for genetic engineering.

Authors:  Yang Yang; Yuanye Dang; Qing Li; Jinjian Lu; Xiwen Li; Yitao Wang
Journal:  PLoS One       Date:  2014-11-03       Impact factor: 3.240

8.  CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences.

Authors:  Chang Liu; Linchun Shi; Yingjie Zhu; Haimei Chen; Jianhui Zhang; Xiaohan Lin; Xiaojun Guan
Journal:  BMC Genomics       Date:  2012-12-20       Impact factor: 3.969

9.  OrganellarGenomeDRAW--a suite of tools for generating physical maps of plastid and mitochondrial genomes and visualizing expression data sets.

Authors:  Marc Lohse; Oliver Drechsel; Sabine Kahlau; Ralph Bock
Journal:  Nucleic Acids Res       Date:  2013-04-22       Impact factor: 16.971

10.  Contradiction between plastid gene transcription and function due to complex posttranscriptional splicing: an exemplary study of ycf15 function and evolution in angiosperms.

Authors:  Chao Shi; Yuan Liu; Hui Huang; En-Hua Xia; Hai-Bin Zhang; Li-Zhi Gao
Journal:  PLoS One       Date:  2013-03-18       Impact factor: 3.240
