
The complete plastome of Panax stipuleanatus: Comparative and phylogenetic analyses of the genus Panax (Araliaceae).

Changkun Liu¹, Zhenyan Yang¹, Lifang Yang^1,2, Junbo Yang³, Yunheng Ji¹.

Abstract

Panax stipuleanatus (Araliaceae) is an endangered and medicinally important plant endemic to China. However, phylogenetic relationships within the genus Panax have remained unclear. In this study, we sequenced the complete plastome of P. stipuleanatus and included previously reported Panax plastomes to better understand the relationships between species and plastome evolution within the genus Panax. The plastome of P. stipuleanatus is 156,069 base pairs (bp) in length, consisting of a pair of inverted repeats (IRs, each 25,887 bp) that divide the plastome into a large single-copy region (LSC, 86,126 bp) and a small single-copy region (SSC, 18,169 bp). The plastome contains 114 unique genes (80 protein-coding genes, 30 tRNA genes, and 4 rRNA genes). Comparative analyses indicated that the plastome gene content and order, as well as the expansion/contraction of the IR regions, are all highly conserved within Panax. No significant positive selection in the plastid protein-coding genes was observed across the eight Panax species, suggesting that the Panax plastomes may have undergone strong purifying selection. Our phylogenomic analyses yielded a well-resolved and strongly supported phylogeny of Panax. Nine protein-coding genes and 10 non-coding regions presented high sequence divergence, which could be useful for identifying different Panax species.


Keywords: Araliaceae; Comparative genomics; Panax stipuleanatus; Phylogenomics; Plastome

Year: 2018 PMID: 30740573 PMCID: PMC6317490 DOI: 10.1016/j.pld.2018.11.001

Source DB: PubMed Journal: Plant Divers ISSN: 2468-2659

Introduction

The genus Panax L. (Araliaceae) is one of the most medicinally important plant genera in East Asia. It includes seven well-recognized species and one species complex, and is disjunctly distributed in East Asia and eastern North America (Wen and Zimmer, 1996, Wen, 1999, Lee and Wen, 2004). Almost every species within this genus has been used as a medicinal herb in East Asia, especially in China (Yang et al., 1988). Because of its considerable medicinal benefits, Panax has been involved in many molecular-based phylogenetic analyses in the past few decades. However, a high-resolution and well-supported phylogeny of this genus remains elusive. Phylogenetic analyses relying on ITS, 18S rRNA, and plastid fragments have shown that the genus Panax is a monophyletic group; however, the same approaches have not resolved infra-generic relationships within the clade (Wen and Zimmer, 1996, Wen, 1999, Zhu et al., 2003, Lee and Wen, 2004). Although recent studies have clarified the phylogenetic relationships between Panax trifolius, Panax stipuleanatus, Panax pseudoginseng, Panax notoginseng, and the Panax bipinnatifidus species complex, the phylogenetic relationships between Panax ginseng, Panax japonicus, and Panax quinquefolius remain unresolved (Shi et al., 2015, Zuo et al., 2017).

The plastomes of angiosperms are typically circular DNA molecules, consisting of two copies of a large inverted repeat (IR) region separated by a large single-copy (LSC) region and a small single-copy (SSC) region (Raubeson and Jansen, 2005, Wicke et al., 2011). Because plastome sequences show a high level of divergence between species and even populations, these sequences provide valuable information for resolving complex relationships in plants (Moore et al., 2007, Moore et al., 2010, Jansen et al., 2007, Parks et al., 2009). With the advent of second-generation DNA sequencing technologies, plastomes have been widely used in recent years to reconstruct robust phylogenies and to identify species (Nock et al., 2011, Yang et al., 2013, Ruhsam et al., 2015, Huang et al., 2016). To date, a total of 27 complete Panax plastomes (Supplementary Table S1) have been sequenced (e.g., Kim and Lee, 2004, Choi et al., 2014, Zhao et al., 2015, Han et al., 2016, Zhang et al., 2016, Nguyen et al., 2017). These plastid genomic resources provide an opportunity to reconstruct a well-supported phylogeny of this medicinally important genus.

The species P. stipuleanatus (Fig. 1) is restricted to the montane evergreen broad-leaved forests along the border between Vietnam and China in southeast Yunnan province (Xiang and Lowry, 2007). In this region, P. stipuleanatus has been traditionally used as a substitute for P. notoginseng, and its roots are extensively collected and sold in the local markets (Yang et al., 1988). Because of overharvesting, natural populations of P. stipuleanatus have been markedly reduced. As a result, the species was listed as an endangered species by the Ministry of Environmental Protection of P. R. China in 2013 (http://www.zhb.gov.cn/gkml/hbb/bgg/201309/t20130912_260061). Available genomic resources for P. stipuleanatus remain limited, and little is known about the plastome features of the species.

Fig. 1

The geographic distribution and morphological features of Panax stipuleanatus H. T. Tsai & K. M. Feng. A, geographic distribution; B, aerial part; C, inflorescence; D, fruit; E, rootstock.

Here, we present the complete plastome of P. stipuleanatus obtained through Illumina sequencing and a reference-guided assembly of de novo contigs. In conjunction with the previously published plastomes, the features of the plastomes among Panax species were compared, including gene content and order. Highly divergent DNA regions offering potential use in further species identification and phylogenetic analysis were also identified. Phylogenetic relationships among Panax species were analyzed with the plastome-based dataset. These results may well broaden our understanding of the evolutionary history of Panax.

Materials and methods

Genome sequencing and assembly

Total genomic DNA was isolated from the silica-gel-dried leaves of P. stipuleanatus, collected in Maguan County, Yunnan Province, China, using a modified CTAB method (Doyle and Doyle, 1987, Yang et al., 2014). Voucher specimens (JYH-2016,466) were deposited in the Herbarium of the Kunming Institute of Botany, CAS (KUN). Genomic DNA was randomly fragmented into 400–600 bp fragments with an ultrasonicator. Short-insert (500 bp) paired-end libraries were constructed using the Genomic DNA Sample Prep Kit (Illumina), according to the manufacturer's protocol, and then sequenced on the Illumina HiSeq 2500 system at BGI (Shenzhen, Guangdong, China). The Illumina raw data were filtered with the NGS QC Toolkit (Patel and Jain, 2012). High-quality reads were assembled into contigs using SPAdes v3.10.1 (Nurk et al., 2013) with its default parameters. The representative plastome sequence contigs were then mapped onto the reference plastome sequence of P. japonicus (GenBank accession number: KX247146) in Bowtie v2.2.6 (Langmead and Salzberg, 2012) with its default preset options. Assemblies were then assessed and connected using Bandage (Wick et al., 2015). The validated complete plastome sequence was deposited in GenBank (Table S1).

Genomic annotation and comparison

The plastome of P. stipuleanatus was annotated with the online tool DOGMA (Wyman et al., 2004), coupled with manual corrections for start and stop codons. All tRNAs were further confirmed with tRNAscan-SE v1.21 (Schattner et al., 2005). Functional classification of the plastid genes was determined by referring to the online database CpBase (http://chloroplast.ocean.washington.edu/). A circular map of the plastome was drawn in OrganellarGenomeDRAW (Lohse et al., 2007). Complete plastomes of the P. bipinnatifidus species complex, P. ginseng, P. japonicus, P. notoginseng, Panax quinquefolius, P. trifolius, and Panax vietnamensis were downloaded from NCBI. Multiple-sequence alignments were performed with MAFFT (Katoh et al., 2002) and manually edited where necessary. Geneious v7.0 (Kearse et al., 2012) was used to compare the boundaries of the LSC, IR, and SSC regions among the Panax plastomes. To compare sequence divergence among different Panax plastomes, the mVISTA tool was used (Frazer et al., 2004), with P. ginseng set as the reference. Single nucleotide polymorphisms (SNPs) occurring across Panax plastomes were identified with the Shuffle-LAGAN model in Geneious v7.0 (Kearse et al., 2012). Divergent percentages of SNPs among the homologous regions across these species were also calculated.

Synonymous (Ks) and non-synonymous (Ka) substitution rate analysis

Non-synonymous (Ka) and synonymous (Ks) substitution rates, and their ratio (Ka/Ks), are important indicators of plastome evolution and natural selection (Yang and Nielsen, 2000). Protein-coding exons were extracted from the plastomes of the eight Panax taxa and Aralia undulata Handel-Mazzetti. These genes were translated into protein sequences and aligned separately using Geneious v7.0 (Kearse et al., 2012). The Ks and Ka substitution rates for each protein-coding exon were estimated in DnaSP v5.0 (Librado and Rozas, 2009).
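To illustrate the distinction between synonymous and non-synonymous changes, the following Python sketch classifies codon differences between two aligned, in-frame coding sequences. It is not the DnaSP procedure used in this study: it only counts raw differences at codons that differ at a single site and does not estimate Ka or Ks rates. The function name and the toy sequences in the usage note are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1); the plastid code (table 11)
# assigns the same amino acids to all 64 codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_differences(seq_a, seq_b):
    """Return (synonymous, non_synonymous) counts for codons differing at one site."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, equal in length, and in frame")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        # Skip identical codons and codons containing gaps or ambiguity codes.
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue
        # Skip codons that differ at more than one site (pathway is ambiguous).
        if sum(x != y for x, y in zip(codon_a, codon_b)) != 1:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn
```

For example, `count_codon_differences("ATGGCTTTA", "ATGGCCTTC")` reports one synonymous change (GCT to GCC, both alanine) and one non-synonymous change (TTA, leucine, to TTC, phenylalanine).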

Phylogenetic analysis

All 28 available plastome sequences of Panax were included in the analysis. A. undulata was used as the outgroup to root the phylogenetic tree. Maximum-likelihood (ML) analyses were conducted based on the following data partitions: (1) the whole plastome; (2) the protein-coding exons (Table S2); (3) the LSC regions; (4) the SSC regions; (5) the IR regions; and (6) the introns and intergenic spacers. All sequences were aligned with MAFFT (Katoh et al., 2002). All gaps in the sequence alignment were excluded. For each dataset, the best-fitting partition scheme and nucleotide substitution models were screened in PartitionFinder v2.1.1 (Lanfear et al., 2012). For each analysis, the branch lengths were linked, and the models of nucleotide substitution were restricted to those available in RAxML (Stamatakis, 2006); the “greedy” search algorithm was selected. All ML analyses were done in RAxML-HPC BlackBox v8.1.24 (Stamatakis, 2006). The bootstrap (BS) value for each branch was computed with 1000 bootstrap replicates.
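The gap-exclusion step can be sketched as follows. This is a minimal Python illustration, assuming that "excluded" means dropping every alignment column in which at least one sequence carries a gap; the function name, the dictionary input format, and the taxon labels in the usage note are hypothetical, and the tools actually used for this step are not specified here.

```python
def remove_gap_columns(alignment, gap_chars="-"):
    """Drop every alignment column in which at least one sequence has a gap.

    `alignment` maps sequence names to aligned sequences of equal length.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if len({len(seq) for seq in seqs}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    # zip(*seqs) iterates over columns; keep only gap-free columns.
    kept = [col for col in zip(*seqs) if not any(ch in gap_chars for ch in col)]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}
```

With the toy input `{"taxon_a": "ATG-CA", "taxon_b": "ATGTCA", "taxon_c": "A-GTCA"}`, columns 2 and 4 are removed and every sequence becomes `AGCA`.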

Results

Plastome features

In total, we obtained 4,948,544 paired-end clean reads for P. stipuleanatus. These reads were used to assemble the P. stipuleanatus plastome. The overall size of the P. stipuleanatus plastome is 156,069 base pairs (bp), and it shows a typical quadripartite structure, including a pair of IRs (each 25,887 bp) that divide the genome into LSC (86,126 bp) and SSC (18,169 bp) regions (Fig. 2, Table 1). The total lengths of the coding regions (i.e., protein-coding genes, tRNA genes, and rRNA genes) and non-coding regions (i.e., introns and intergenic spacers) are 91,632 bp and 64,437 bp, respectively (Table 1). The plastome of P. stipuleanatus possesses 114 unique genes, including 80 protein-coding genes, 30 tRNA genes, and four rRNA genes. Of these, 12 protein-coding genes (atpF, ndhA, ndhB, petB, petD, rpl16, rpl2, rpoC1, rps12, rps16, clpP, and ycf3) and six tRNA genes (trnA-UGC, trnG-GCC, trnI-GAU, trnK-UUU, trnL-UAA, and trnV-UAC) contain at least one intron (Table 2).
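As a quick consistency check on the reported region lengths, the short Python sketch below recomputes the total plastome size from the quadripartite structure (LSC + SSC + two IR copies) and from the coding and non-coding totals. The variable and function names are illustrative only.

```python
# Region lengths (bp) reported for the P. stipuleanatus plastome.
LSC, SSC, IR = 86_126, 18_169, 25_887
CODING, NON_CODING = 91_632, 64_437


def plastome_length(lsc, ssc, ir):
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


total = plastome_length(LSC, SSC, IR)
print(total)                # 156069
print(CODING + NON_CODING)  # 156069
```

Both sums agree with the reported plastome size of 156,069 bp.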

Fig. 2

Plastome map of Panax stipuleanatus. Genes shown outside of the outer layer circle are transcribed counterclockwise, whereas those genes inside of this circle are transcribed clockwise. The colored bars indicate the known protein-coding genes, tRNA, and rRNA. The dashed, darker gray area of the inner circle denotes the GC content, while the lighter gray area indicates the AT content of the genome. LSC, large single-copy; SSC, small single-copy; IR, inverted repeat.

Table 1

Comparison of plastome features among Panax species.

Species	Total length (bp)	LSC length (bp)	SSC length (bp)	IR length (bp)	Coding sequence length (bp)	Non-coding sequence length (bp)
Aralia undulata Handel-Mazzetti	156,333	86,028	18,089	26,108	92,417	63,916
Panax bipinnatifidus Seemann species complex	156,063	86,111	18,174	25,889	91,632	64,431
Panax ginseng C. A. Meyer	156,241–156,425	86,106–86,200	16,077–18,084	26,018–28,018	92,060–92,238	64,116–64,259
Panax japonicus (T. Nees) C. A. Meyer	156,188	86,199	18,013	25,988	91,822	64,365
Panax notoginseng (Burkill) F. H. Chen	156,324–156,466	86,082–86,190	18,004–18,554	25,861–26,136	91,866–92,134	64,202–64,521
Panax quinquefolius L.	156,088–156,364	86,095–86,124	17,993–18,080	26,000–26,080	88,009–92,012	64,076–68,355
Panax stipuleanatus H. T. Tsai & K. M. Feng	156,069	86,126	18,169	25,887	91,632	64,437
Panax trifolius L.	156,157	86,322	18,047	25,894	91,441	64,716
Panax vietnamensis Ha & Grushv	155,992–155,993	86,177–86,178	17,935	25,940	91,634–91,643	64,350–64,358

Table 2

List of genes identified in the plastome of Panax stipuleanatus.

Category of Genes	Group of gene	Name of gene
Self-replication	Ribosomal RNA genes	rrn4.5×2, rrn5×2, rrn16×2, rrn23×2
	Transfer RNA genes	trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnG-UCC, trnG-GCC, trnH-GUG, trnK-UUU, trnL-UAA, trnL-UAG, trnM-CAU, trnP-UGG, trnQ-UUG, trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-UGU, trnT-GGU, trnV-UAC, trnY-GUA, trnW-CCA, trnfM-CAU, trnA-UGC×2, trnI-CAU×2, trnI-GAU×2, trnL-CAA×2, trnN-GUU×2, trnR-ACG×2, trnV-GAC×2
	Ribosomal protein (small subunit)	rps2, rps3, rps4, rps7×2, rps8, rps11, rps12*×2, rps14, rps15, rps16, rps18, rps19
	Ribosomal protein (large subunit)	rpl2×2, rpl14, rpl16, rpl20, rpl22, rpl23×2, rpl32, rpl33, rpl36
	RNA polymerase	rpoA, rpoB, rpoC1*, rpoC2
	Translational initiation factor	infA
Genes for photosynthesis	Subunits of photosystem I	psaA, psaB, psaC, psaI, psaJ, ycf3**, ycf4
	Subunits of photosystem II	psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ
	Subunits of cytochrome	petA, petB, petD, petG, petL, petN
	Subunits of ATP synthase	atpA, atpB, atpE, atpF*, atpH, atpI
	Large subunit of Rubisco	rbcL
	Subunits of NADH dehydrogenase	ndhA, ndhB×2, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
Other genes	Maturase	matK
	Envelope membrane protein	cemA
	Subunit of acetyl-CoA carboxylase	accD
	c-type cytochrome synthesis gene	ccsA
	ATP-dependent protease	clpP**
	Component of TIC complex	ycf1×2
Genes of unknown function	Conserved open reading frames	ycf2×2, ycf15×2

×2: Two gene copies in IR regions; *: With one intron; **: With two introns.

IRs expansion and contraction

The IRs/LSC and IRs/SSC boundaries among Panax plastomes were compared (Fig. 3). The extensions of the IRs into rps19 and into the intergenic spacer between rpl2 and trnH-GUG occur at the IRa/LSC and IRb/LSC boundaries, respectively. Although the expansion of the IRs into the ycf1 pseudogene at the IRs/SSC junctions occurred in all species, the overlap between the ycf1 pseudogene and ndhF was only detected in P. quinquefolius and P. vietnamensis.

Fig. 3

Comparison of the borders of the LSC, SSC, and IR regions among the eight Panax plastomes. LSC, large single-copy; SSC, small single-copy; IR, inverted repeat.

Sequence divergence in the Panax plastomes

Plastome sequence divergences were identified among the homologous regions across the eight Panax taxa (Fig. 4). We identified 1130 SNPs in the matrix of these plastomes, with an average variant frequency of 0.72%. These comprise 743 SNPs in the LSC region, 291 SNPs in the SSC region, and 48 SNPs in the IR region, with corresponding average variant frequencies of 0.86%, 1.60%, and 0.19%. In addition, 498 SNPs (average variant frequency = 0.62%) were detected in the protein-coding exons, while 620 SNPs (average variant frequency = 0.96%) were detected in the non-coding regions (Table 3). The divergent frequencies of the coding regions ranged from 0.07% to 2.41% (Supplementary Table S3), and those of the non-coding regions ranged from 0.11% to 5.66% (Supplementary Table S4). Based on these sequence divergences, we identified nine protein-coding regions (ccsA, matK, ndhF, petL, psaI, rpl22, rpoA, rps3, and ycf1) in which the percentage of SNPs exceeded 1%. We also identified 10 non-coding regions (atpA-atpF, ccsA-ndhD, infA-rps8, ndhI-ndhA, psbK-psbI, rpl14-rpl16, rpl2-trnH-GUG, rpl22-rps19, rps19-rpl2, and trnY-GUA-trnE-UUC) with a divergence proportion above 3%. These plastid DNA regions (19 in all) may be useful as molecular markers to reconstruct the phylogeny of Panax and to identify different species of this genus.

Fig. 4

Visualized alignment of the eight Panax plastomes. The mVISTA-based identity plots show the sequence identity among the eight Panax plastomes, for which P. ginseng serves as the reference. Gray arrows indicate the position and direction of each gene. Genome regions are color-coded as protein coding, rRNA, tRNA, or conserved non-coding regions. Black lines define the regions of sequence identity shared with P. ginseng (using a 50%-identity cutoff criterion).

Table 3

Summary of SNPs found in the eight representative Panax plastomes.

Data type	Number of SNPs	Characters (bp)	Divergence proportion (%)
Complete plastome	1130	156,069	0.7240
Protein-coding genes	498	79,774	0.6243
Non-coding regions	620	64,437	0.9622
LSC region	743	86,126	0.8627
SSC region	291	18,169	1.6016
IR regions	48	25,887	0.1854
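The divergence proportions in Table 3 follow directly from the SNP counts and region lengths (number of SNPs divided by the number of characters, times 100). The Python sketch below recomputes them; the dictionary and function names are illustrative only.

```python
# (number of SNPs, characters in bp) taken from Table 3.
TABLE_3 = {
    "Complete plastome": (1130, 156_069),
    "Protein-coding genes": (498, 79_774),
    "Non-coding regions": (620, 64_437),
    "LSC region": (743, 86_126),
    "SSC region": (291, 18_169),
    "IR regions": (48, 25_887),
}


def divergence_percent(snps, length_bp):
    """Divergence proportion (%) = SNPs / characters * 100, to four decimals."""
    return round(100 * snps / length_bp, 4)


for region, (snps, length_bp) in TABLE_3.items():
    print(f"{region}: {divergence_percent(snps, length_bp):.4f}%")
```

The recomputed values match the last column of Table 3, with the SSC region showing the highest proportion and the IR regions the lowest.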

Synonymous (Ks) and non-synonymous (Ka) substitution rate

The Ks values of the eight Panax plastomes ranged from 0.0166 to 0.0218, while the Ka values ranged from 0.0032 to 0.0034; the Ka/Ks ratio ranged from 0.1391 to 0.1729 (Table 4, Fig. 5). We identified five genes (i.e., cemA, matK, ndhA, ndhG, and ycf2) with Ka/Ks values greater than one. Three of them (ndhA, ndhG, and ycf2) were identified in P. quinquefolius; one gene (ycf2) was identified in P. japonicus, P. vietnamensis, and P. notoginseng; two genes were identified in P. ginseng (ndhG and ycf2), P. stipuleanatus (ndhA and matK), the P. bipinnatifidus species complex (ndhA and matK), and P. trifolius (cemA and ycf2). However, their P-values were all greater than 0.05 (Supplementary Table S5), indicating that none of the protein-coding genes in the eight Panax plastomes shows statistically significant evidence of positive selection (Yang and Nielsen, 2000).
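The decision rule applied above (a Ka/Ks ratio greater than one counts as positive selection only when it is statistically significant) can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis pipeline: the function name, the 0.05 threshold as a parameter, the returned labels, and the example values in the usage note are all hypothetical, and reading a ratio below one as purifying selection is the conventional interpretation rather than a formal test.

```python
def classify_selection(ka, ks, p_value, alpha=0.05):
    """Label a gene by its Ka/Ks ratio and the significance of the test."""
    if ks == 0:
        return "undefined (Ks = 0)"  # the ratio cannot be computed
    ratio = ka / ks
    if ratio > 1 and p_value < alpha:
        return "significant positive selection"
    if ratio > 1:
        return "Ka/Ks > 1, not significant"
    if ratio < 1:
        return "consistent with purifying selection"
    return "consistent with neutral evolution"
```

For instance, a hypothetical gene with Ka = 0.02, Ks = 0.01, and P = 0.30 would be labeled "Ka/Ks > 1, not significant", which is the situation described for the five genes above.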

Table 4

Substitution rates of 80 protein-coding genes in the eight Panax plastomes.

Taxa	Non-synonymous (Ka)	Synonymous (Ks)	Ka/Ks
Panax bipinnatifidus species complex	0.0032 ± 0.0006	0.0166 ± 0.0018	0.1391
Panax ginseng	0.0032 ± 0.0005	0.0185 ± 0.0020	0.1575
Panax japonicus	0.0034 ± 0.0005	0.0186 ± 0.0020	0.1683
Panax notoginseng	0.0032 ± 0.0006	0.0187 ± 0.0022	0.1521
Panax quinquefolius	0.0032 ± 0.0005	0.0177 ± 0.0020	0.1729
Panax stipuleanatus	0.0034 ± 0.0006	0.0169 ± 0.0018	0.1393
Panax trifolius	0.0034 ± 0.0005	0.0218 ± 0.0022	0.1523
Panax vietnamensis	0.0034 ± 0.0006	0.0188 ± 0.0021	0.1559

Aralia undulata was used as an outgroup. Data are presented as the means ± standard error.

Fig. 5

Non-synonymous substitution (Ka), synonymous substitution (Ks), and the Ka/Ks values for the Panax plastid protein-coding genes.


Phylogenomic analysis

Six data partitions from the 29 plastomes were used to perform phylogenetic reconstruction. The tree topologies for the whole plastome (Fig. 6A), protein-coding exons (Fig. 6B), and SSC (Fig. 6D) were congruent with each other, differing only in support values at the interior nodes. By comparison, the phylogenetic relationships based on the LSC regions (Fig. 6C), IR (Fig. 6E), and introns and intergenic spacers (Fig. 6F) showed many similarities with the results from the other three datasets, except for the position of P. notoginseng, which clustered with P. ginseng and P. quinquefolius. Among the six data partitions, the phylogenetic tree for the protein-coding regions received the highest support for the major branches. As the tree shows (Fig. 6B), the eight Panax species were fully resolved as four well-supported monophyletic clades. The clade represented only by P. trifolius was placed at the basally branching position. The second clade (BS = 100%) includes P. stipuleanatus and the P. bipinnatifidus species complex. The third clade (BS = 100%) contains P. ginseng and P. quinquefolius. The fourth clade (BS = 100%) consists of P. japonicus, P. vietnamensis, and P. notoginseng, in which the subclade of P. japonicus + P. vietnamensis (BS = 100%) diverged from P. notoginseng. In addition, those taxa with more than one plastome currently available (P. ginseng, P. notoginseng, P. quinquefolius, and P. vietnamensis) were recovered as well-supported monophyletic lineages in all the datasets.

Fig. 6

Phylogenetic tree reconstruction of the genus Panax via maximum likelihood (ML), based on (A) the whole plastome; (B) the protein-coding exons; (C) the large single-copy (LSC) regions; (D) the small single-copy (SSC) regions; (E) the inverted repeat (IR) regions; and (F) the introns and intergenic spacers. The numbers above the lines represent the ML bootstrap values (1000 replicates).

Discussion

Comparison of the plastomes in Panax

We compared the plastome of P. stipuleanatus with those of seven published Panax species. All Panax plastomes shared 114 unique genes (80 protein-coding genes, 30 tRNAs, and four rRNAs) in the same order. Although several previous studies (e.g., Millen et al., 2001, Jansen et al., 2007) have revealed that several protein-coding genes (i.e., accD, ycf1, ycf2, rpl22, rps16, rpl23, infA, and ndhF) have been independently lost over the course of angiosperm evolution, these genes were all identified in the eight Panax plastomes. Additionally, the sequence lengths of the whole plastome, the LSC, SSC, IRs, coding sequences, and non-coding regions of the eight Panax species were quite similar (Table 1). These results suggest that plastome structure and gene content in the genus Panax are highly conserved.

IR expansions often lead to size variations in the plastomes of angiosperms (e.g., Cosner et al., 1997, Plunkett and Downie, 2000, Chumley et al., 2006, Liu et al., 2017, Zhou et al., 2018). However, the IRs/LSC junctions of the eight Panax plastomes were highly conserved: the IRb/LSC boundaries were located between the rpl2 and trnH-GUG genes and the IR regions expanded into rps19 at the IRa/LSC junction, which is similar to other genera within the Araliaceae family (Li et al., 2013). This type of IRs/LSC boundary has been detected in Cornales (Yang and Ji, 2017), but it has not yet been observed in any other orders within the asterids clade (Kim and Lee, 2004, Huang et al., 2014, Downie and Jansen, 2015, Stull et al., 2015, Yao et al., 2016). In contrast to the IRs/LSC junctions, the IRs/SSC boundaries among the eight Panax plastomes were slightly variable, which may have contributed to the overall size variations among Panax plastomes.

During the evolutionary history of a lineage, environmental change may impose selective pressures that result in adaptive evolution (Yang and Nielsen, 2000). However, when we examined the eight Panax plastomes, we did not observe any signs of positive selection in the protein-coding genes at the generic level (Fig. 5, Supplementary Table S5). In addition, the average Ka values for these genomes were relatively low, and their Ka/Ks values were all less than one (Table 4). These results imply that the plastid protein-coding genes of Panax species may have undergone strong purifying selection during their evolution (Yang and Nielsen, 2000).

Phylogenetic inferences

We used six datasets (complete plastome, protein-coding exons, LSC, SSC, IRs, and non-coding regions) to reconstruct the phylogeny of Panax. Although earlier studies revealed that both the phylogenetic resolution and the support values of nodes may be considerably improved by more and longer DNA sequences (Rokas and Carroll, 2005, Philippe et al., 2011), our results indicate that the phylogeny based on protein-coding exons generated the highest branch support (BS = 100%, Fig. 6B), even though its sequence length is shorter than that of whole plastomes and LSC regions. Furthermore, our analysis found that non-coding regions across Panax plastomes possessed the highest sequence divergence among the six datasets tested (Table 3), yet the phylogeny based on non-coding regions did not have the highest branch support (Fig. 6F). The relatively lower node supports observed in the trees using whole plastomes and LSC regions may be attributed to the faster mutation rates of non-coding regions in the plastome, which, as Wiens (2003) has suggested, produce more evolutionary homoplasy. Accordingly, the protein-coding genes of plastomes can provide accurate information with which to trace the relationships within Panax. Compared to previous single- or multi-locus DNA sequence analyses, our plastid phylogenomic analysis produced higher-resolution nodes, with much higher support values within Panax. Similar findings have been reported in analyses of the plastome-wide protein-coding genes of major flowering plant lineages (Jansen et al., 2007), basal angiosperms (Moore et al., 2007), early diverging eudicots (Moore et al., 2010), commelinid monocots (Barrett et al., 2013), and basal Lamiid orders (Stull et al., 2015).

Phylogenetic analysis using protein-coding exons across the plastome recovered four well-supported monophyletic clades within Panax, and the relationships within each clade were also robustly supported (BS = 100%, Fig. 6B), which enabled the phylogenetic backbone of this genus to be recovered. The tree shows that P. ginseng is sister to P. quinquefolius with a high support value (BS = 100%), which is consistent with analysis based on expressed sequence tags (Choi et al., 2013), but differs from analyses of nuclear ITS sequences (Wen and Zimmer, 1996, Choi and Wen, 2000) and plastid intergenic spacers (Shi et al., 2015, Zuo et al., 2017). The sister relationship between P. ginseng and P. quinquefolius is also supported by similar morphological traits (carrot-like main roots), habitats (temperate forests at high latitudes in East Asia and North America), and chromosome number (tetraploid, 2n = 48) (Yi et al., 2004, Choi et al., 2014, Shi et al., 2015).

P. notoginseng is a commonly cultivated medicinal herb in southwest China (Yang et al., 1988). However, its relationship to other Panax species has been disputed (e.g., Wen and Zimmer, 1996, Choi and Wen, 2000, Shi et al., 2015, Zuo et al., 2017). Our results indicate that P. notoginseng, P. japonicus, and P. vietnamensis form a highly supported clade (BS = 100%), which is sister to the clade comprising P. ginseng and P. quinquefolius; P. trifolius was found to be the earliest-diverging clade of Panax, which is consistent with previous analyses (Wen and Zimmer, 1996, Lee and Wen, 2004, Shi et al., 2015, Zuo et al., 2017). In previous studies, P. stipuleanatus has repeatedly shown a sister relationship with P. pseudoginseng (see Wen and Zimmer, 1996, Wen, 1999, Lee and Wen, 2004, Zuo et al., 2011, Zuo et al., 2015, Zuo et al., 2017); however, our phylogenomic analysis resolved P. stipuleanatus and the P. bipinnatifidus species complex as a clade (BS = 100%). Because we did not obtain a sample of P. pseudoginseng in the present study, future studies are required to resolve the position of this species.

The impact of polyploidization on plant speciation during the evolution of Panax is a particularly interesting issue. This genus contains three tetraploid species (2n = 48), P. ginseng, P. japonicus, and P. quinquefolius; furthermore, the P. bipinnatifidus species complex possesses both tetraploid and diploid species/populations (Yi et al., 2004). Our tree topologies clearly indicate that these tetraploid species were scattered in three well-supported clades, suggesting that whole genome duplication events may have occurred independently during the evolutionary history of Panax. This interpretation is supported by the study of Shi et al. (2015).

Utility of the Panax plastomes

Two plastid protein-coding genes, rbcL and matK, and the psbA-trnH intergenic spacer have been recommended as universal plastid DNA barcodes for land plants (Kress et al., 2005, Hollingsworth et al., 2011). Zuo et al. (2011) then proposed that psbA-trnH and ITS were sufficient for identifying species of Panax. However, we found that variation in rbcL and psbA-trnH was relatively low among the eight Panax species (less than 1%) (Supplementary Tables S3 and S4). Hence, the universal plastid DNA barcodes rbcL and psbA-trnH may have limited power to identify Panax species, and novel DNA barcodes for this genus are urgently needed. Based on the sequence variations, we found nine protein-coding regions (ccsA, matK, ndhF, petL, psaI, rpl22, rpoA, rps3, and ycf1) and 10 non-coding regions (atpA-atpF, ccsA-ndhD, infA-rps8, ndhI-ndhA, psbK-psbI, rpl14-rpl16, rpl2-trnH-GUG, rpl22-rps19, rps19-rpl2, and trnY-GUA-trnE-UUC) harboring a high proportion of SNPs. We propose that these plastid DNA regions are potentially useful for identifying Panax species. Among them, matK and ycf1 have been proposed elsewhere as promising DNA barcodes (Hollingsworth et al., 2011, Dong et al., 2015), and rpl14-rpl16 has been widely used for phylogenetic studies (Shaw et al., 2014). In future studies, we will investigate whether these plastid DNA sequences can serve as reliable and effective DNA barcodes for rapid species identification of plants in the genus Panax. Notably, in this study species with more than one plastome available were recovered as well-supported monophyletic groups, and different individuals of the same species showed notable genetic differentiation (Fig. 6). These results suggest that the plastome would be a reliable and accurate barcode for improving the resolution of species identification in the genus Panax. Further studies based on sampling at the population scale are needed to evaluate the efficiency of the plastome as an organelle-scale barcode.

Conclusion

The plastome of P. stipuleanatus, an endangered and medicinally important plant, was sequenced and assembled. The genome is 156,069 bp in length and has a typical quadripartite structure. Comparative analysis showed that the plastomes of Panax are relatively conserved. We investigated the substitution rates of protein-coding regions, which suggest that the plastomes of Panax may have undergone strong purifying selection. We generated a well-supported phylogeny for Panax, in which P. stipuleanatus is sister to the P. bipinnatifidus species complex. Moreover, molecular markers with high sequence divergence were identified, which may be useful for phylogenetic analysis and species identification. Overall, the plastome of P. stipuleanatus will provide valuable genetic information for species identification, phylogenetic research, and resource conservation.

References (10 of 52 listed)

1. Many parallel losses of infA from chloroplast DNA during angiosperm evolution with multiple independent transfers to the nucleus.

Authors: R S Millen; R G Olmstead; K L Adams; J D Palmer; N T Lao; L Heggie; T A Kavanagh; J M Hibberd; J C Gray; C W Morden; P J Calie; L S Jermiin; K H Wolfe
Journal: Plant Cell Date: 2001-03 Impact factor: 11.277

2. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models.

Authors: Z Yang; R Nielsen
Journal: Mol Biol Evol Date: 2000-01 Impact factor: 16.240

3. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform.

Authors: Kazutaka Katoh; Kazuharu Misawa; Kei-ichi Kuma; Takashi Miyata
Journal: Nucleic Acids Res Date: 2002-07-15 Impact factor: 16.971

4. Missing data, incomplete taxa, and phylogenetic accuracy.

Authors: John J Wiens
Journal: Syst Biol Date: 2003-08 Impact factor: 15.683

5. Automatic annotation of organellar genomes with DOGMA.

Authors: Stacia K Wyman; Robert K Jansen; Jeffrey L Boore
Journal: Bioinformatics Date: 2004-06-04 Impact factor: 6.937

6. More genes or more taxa? The relative contribution of gene number and taxon number to phylogenetic accuracy.

Authors: Antonis Rokas; Sean B Carroll
Journal: Mol Biol Evol Date: 2005-03-02 Impact factor: 16.240

7. Complete chloroplast genome sequences from Korean ginseng (Panax schinseng Nees) and comparative analysis of sequence evolution among 17 vascular plants.

Authors: Ki-Joong Kim; Hae-Lim Lee
Journal: DNA Res Date: 2004-08-31 Impact factor: 4.458

8. Phylogenetic relationship in the genus Panax: inferred from chloroplast trnK gene and nuclear 18S rRNA gene sequences.

Authors: Shu Zhu; Hirotoshi Fushimi; Shaoqing Cai; Katsuko Komatsu
Journal: Planta Med Date: 2003-07 Impact factor: 3.352

9. Phylogeny of Panax using chloroplast trnC-trnD intergenic region and the utility of trnC-trnD in interspecific studies of plants.

Authors: Chunghee Lee; Jun Wen
Journal: Mol Phylogenet Evol Date: 2004-06 Impact factor: 4.286

10. VISTA: computational tools for comparative genomics.

Authors: Kelly A Frazer; Lior Pachter; Alexander Poliakov; Edward M Rubin; Inna Dubchak
Journal: Nucleic Acids Res Date: 2004-07-01 Impact factor: 16.971
