
The complete chloroplast genome sequence of the medicinal plant Sophora tonkinensis.

Fan Wei1, Danfeng Tang1, Kunhua Wei1, Fang Qin1, Linxuan Li1, Yang Lin1, Yanxia Zhu1, Aziz Khan2, Muhammad Haneef Kashif2, Jianhua Miao3.   

Abstract

Sophora tonkinensis belongs to the genus Sophora of the family Fabaceae. It is mainly distributed in the ridge and peak regions of limestone areas in western China and has high medicinal value and important ecological functions. Wild populations of S. tonkinensis are in danger and need urgent conservation. Furthermore, wild S. tonkinensis resources are very limited relative to market demand, and many adulterants are present on the market. Therefore, a method for authenticating S. tonkinensis and distinguishing it from its adulterants at the molecular level is needed. Chloroplast genomes are valuable sources of genetic markers for phylogenetic analyses, genetic diversity evaluation, and plant molecular identification. In this study, we report the complete chloroplast genome of S. tonkinensis. The circular chloroplast genome was 154,644 bp in length, containing an 85,810 bp long single-copy (LSC) region, an 18,321 bp short single-copy (SSC) region and a pair of inverted repeat (IR) regions totalling 50,513 bp. The S. tonkinensis chloroplast genome comprised 129 genes, including 83 protein-coding genes, 38 transfer RNA (tRNA) genes, and 8 ribosomal RNA (rRNA) genes. The structure, gene order and guanine and cytosine (GC) content of the S. tonkinensis chloroplast genome were similar to those of the Sophora alopecuroides and Sophora flavescens chloroplast genomes. A total of 1,760 simple sequence repeats (SSRs) were identified in the chloroplast genome of S. tonkinensis, and most of them (93.1%) were mononucleotide repeats. Moreover, the identified SSRs were mainly distributed in the LSC region, which accounted for 60% of the total number of SSRs, while 316 (18%) and 383 (22%) were located in the SSC and IR regions, respectively. Only one complete copy of the rpl2 gene was present, at the LSC/IRB boundary; the second copy was absent from the IRA region because of the incomplete gene structure caused by IR region expansion and contraction. The phylogenetic analysis placed S. tonkinensis in Papilionoideae, sister to S. flavescens, and the genera Sophora and Ammopiptanthus were closely related. The complete genome sequencing and chloroplast genome comparative analysis of S. tonkinensis and its closely related species presented in this paper will help formulate effective conservation and management strategies as well as molecular identification approaches for this important medicinal plant.

Year:  2020        PMID: 32719421      PMCID: PMC7385175          DOI: 10.1038/s41598-020-69549-z

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Introduction

Sophora tonkinensis is an important medicinal plant species in the genus Sophora, which belongs to Papilionoideae, a subfamily of the Fabaceae. This species is distributed mainly in Baise city, Jinchengjiang city, and Donglan County of Guangxi province as well as Guizhou and Yunnan Provinces of China[1]. S. tonkinensis grows mostly in ridge and peak regions of limestone areas in the wild and is used as an efficient species for the ecological restoration of karst rocky desertification areas (Fig. 1A,B)[2]. S. tonkinensis is effective in curing acute pharyngolaryngeal infection, eczema, colpitis, sore throat, gastrointestinal haemorrhage, and acute dysentery diseases[3]. The active ingredients in S. tonkinensis Gagnep mainly consist of alkaloids, saponins, flavonoids, and polysaccharides. Studies have shown that S. tonkinensis has various pharmacological effects, such as antitumour, anti-inflammatory, anti-arrhythmic, anti-diarrhoea, analgesic, and immune regulatory effects, as well as anti-hepatic fibrosis and liver-protective activities[4].
Figure 1

(A) The wild habitat of S. tonkinensis. (B) Habitat destruction of S. tonkinensis. (C) S. tonkinensis is used as an efficient species for the ecological restoration of karst rocky desertification areas.

Due to the unique habitat and extremely low reproductive capacity of S. tonkinensis, its wild resources are very limited relative to the needs of the market[5]. The species is under threat due to commercial overexploitation and serious habitat destruction (Fig. 1C), and its wild populations have been shrinking severely. However, little is known regarding its genetic background. The plant chloroplast genome, with a length of 110–160 kb, is a valuable source of genetic markers for phylogenetic analyses, genetic diversity evaluation, and plant molecular identification due to its conserved structure and comparatively high substitution rate[6,7]. Therefore, a good understanding of chloroplast genomic information will facilitate the study of genetic variation in wild populations of S. tonkinensis and the design of sound conservation strategies for them.

Furthermore, there are many adulterants of S. tonkinensis on the market, and it is difficult to distinguish them by outward appearance[8], indicating an urgent need for a molecular approach with which to differentiate S. tonkinensis from adulterating species. DNA barcode sequence analysis, a molecular identification technology, can provide a rapid, accurate, and automatable method of species identification using a standardized piece of DNA sequence[9-11]. Chloroplast non-coding regions have been successfully applied in DNA barcoding research. Yao et al. found that the psbA-trnH intergenic spacer region could be used as a barcode to distinguish various Dendrobium species and to differentiate them from adulterating species[12]. Chen et al. tested the discrimination ability of ITS2 in more than 6,600 plant samples belonging to 4,800 species from 753 distinct genera and found that the rate of successful identification with ITS2 was 92.7% at the species level[13]. Chloroplast genomic information for S. tonkinensis will provide candidate DNA barcodes for the authentication of S. tonkinensis and the identification of its adulterants.

In the present study, we assembled and analysed the chloroplast genome sequence of S. tonkinensis based on Illumina paired-end (PE) sequencing data. The sequence was also compared with other known chloroplast genome sequences using bioinformatics analysis, and the evolutionary position of S. tonkinensis among the Papilionoideae was confirmed.

Results

Genome sequencing and assembly

In this study, PE DNA sequencing was carried out using the Illumina MiSeq sequencing platform. In total, 17,594,210 × 2 PE reads and 5,313,451,420 bases were obtained, and 96.92% of the bases had a quality score greater than 20 (Q20). After quality filtering, 16,892,769 × 2 PE reads, 663,584 single reads, and 5,058,544,355 bases remained. The assemblies produced with multiple k-mer sizes were evaluated comprehensively according to the total length of the assembled sequence, the number of scaffolds and the scaffold N50, and the assembly obtained with the optimal k-mer size was selected as the final result. We obtained 1 scaffold with a length of 154,644 bp. These data demonstrated a high-quality assembly. The complete chloroplast genome sequence of S. tonkinensis was deposited in the Sequence Read Archive (SRA) (accession number: SRR8434290).
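The scaffold N50 used above to compare assemblies can be illustrated with a minimal Python sketch. The function name `scaffold_n50` and the toy scaffold lengths are hypothetical and are not taken from the authors' pipeline; only the single 154,644 bp scaffold comes from the text.

```python
def scaffold_n50(lengths):
    """Return the N50 of a list of scaffold lengths (bp).

    N50 is the length of the scaffold at which the cumulative length,
    counted from the longest scaffold downwards, first reaches half of
    the total assembly length.
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# The final assembly reported in the text is a single scaffold,
# so its N50 equals the genome length.
final_assembly = [154_644]
n50_final = scaffold_n50(final_assembly)

# A hypothetical fragmented assembly, for contrast (illustrative values).
n50_fragmented = scaffold_n50([10, 8, 6, 4, 2])
```

For an assembly consisting of one scaffold, the N50 is simply the scaffold length, which is why a single 154,644 bp scaffold is the best possible outcome under this criterion.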

General features of the S. tonkinensis chloroplast genome

Overall, the S. tonkinensis chloroplast genome was 154,644 bp in length and presented a complete circular structure, including a pair of inverted repeats (IRs) (50,513 bp in total) that divided the genome into two single-copy regions (long single-copy (LSC) region of 85,810 bp; short single-copy (SSC) region of 18,321 bp) (Fig. 2). Coding regions (60,756 bp) accounted for 39.3% of the genome, and intergenic regions (93,888 bp) comprised the remaining 60.7%. The percentage of guanine and cytosine bases (GC %) in the gene regions (37.8%) was higher than that in the intergenic regions (35.4%). The average gene length and gene density were 732 bp and 0.484 genes per kb, respectively. The counts of the four bases adenine (A), thymine (T), cytosine (C), and guanine (G) in the S. tonkinensis chloroplast genome were 49,139, 49,198, 27,915, and 28,392, accounting for 31.7%, 31.8%, 18.1%, and 18.4% of the genome, respectively (Table 1). The chloroplast genome of S. tonkinensis contained a total of 129 genes, including 83 protein-coding genes, 38 transfer RNA (tRNA) genes, and 8 ribosomal RNA (rRNA) genes (Table 2). Of the 83 protein-coding genes, nine (rps16, rpoC1, atpF, petB, petD, rpl16, rpl2, ndhA and ndhB) contained one intron, while clpP and ycf3 possessed two introns (Table S1). The S. tonkinensis chloroplast genome used all 64 codons, encoding the 20 amino acids and the stop signal (Fig. 3). The number of occurrences per codon ranged from 247 to 2,320, with a fraction ranging from 0.08 to 1. The amino acids Met and Trp had only one codon, while the remaining amino acids possessed 2–6 codons.
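The composition figures in this paragraph follow directly from the base counts and region lengths. The short Python sketch below recomputes them from the reported numbers; the function names are illustrative assumptions, and only the counts themselves come from the text.

```python
def gc_content(counts):
    """GC content (%) from a dict of base counts keyed by 'A', 'C', 'G', 'T'."""
    total = sum(counts.values())
    return 100 * (counts["G"] + counts["C"]) / total


def fraction_percent(part, whole):
    """Share of `part` in `whole`, expressed as a percentage."""
    return 100 * part / whole


# Base counts reported for the S. tonkinensis chloroplast genome.
base_counts = {"A": 49_139, "T": 49_198, "C": 27_915, "G": 28_392}

genome_length = sum(base_counts.values())          # total genome length in bp
overall_gc = round(gc_content(base_counts), 2)     # overall GC content (%)
coding_share = round(fraction_percent(60_756, genome_length), 1)
intergenic_share = round(fraction_percent(93_888, genome_length), 1)
mean_gene_length = 60_756 // 83                    # coding bp per protein-coding gene
```

The four base counts sum to the stated genome length, and the coding and intergenic lengths sum to the same total, which is a quick consistency check on the figures in Table 1.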
Figure 2

Chloroplast genome map of S. tonkinensis. Genes shown in the inner side of the circle are transcribed clockwise and those located on the outside of the circle are transcribed counter-clockwise. Genes belonging to different functional groups are colour-coded. Dashed area in the inner circle indicates the GC content of the chloroplast genome.

Table 1

Summary of chloroplast genome characteristics of S. tonkinensis.

Characteristics | Number
Total length (bp) | 154,644
LSC length (bp) | 85,810
SSC length (bp) | 18,321
IR length (bp) | 50,513
GC content (%) | 36.41
Gene total length (bp) | 60,756
Gene number | 83
Genes duplicated in IR | 6
Gene average length (bp) | 732
Gene density (number/kb) | 0.484
Gene/genome (%) | 39.3
GC content in gene region (%) | 37.8
Intergenic region length (bp) | 93,888
Intergenic length/genome (%) | 60.7
GC content in intergenic region (%) | 35.4
tRNA gene | 38
rRNA gene | 8
rRNA gene duplicated in IR | 4
A (bp) | 49,139
T (bp) | 49,198
G (bp) | 28,392
C (bp) | 27,915

Table 2

List of protein-coding genes present in the S. tonkinensis chloroplast genome.

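Category | Gene group | Gene name
Genes for photosynthesis | Subunits of photosystem I | psaA, psaB, psaC, psaI, psaJ
Genes for photosynthesis | Subunits of photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ
Genes for photosynthesis | Subunits of ATP synthase | atpA, atpB, atpE, atpF(b), atpH, atpI
Genes for photosynthesis | Subunits of cytochrome | petA, petB(b), petD(b), petG, petL, petN
Genes for photosynthesis | Large subunit of Rubisco | rbcL
Genes for photosynthesis | Subunits of NADH dehydrogenase | ndhA(b), ndhB(a,b,c), ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhG
Self-replication | Small subunit of ribosome | rps2, rps3, rps4, rps7(a,c), rps8, rps11, rps12(a,c), rps14, rps15, rps16(b), rps18, rps19
Self-replication | Large subunit of ribosome | rpl2(a,b), rpl14, rpl16(b), rpl20, rpl23(a,c), rpl32, rpl33, rpl36
Self-replication | DNA-dependent RNA polymerase | rpoA, rpoB, rpoC1(b), rpoC2
Other genes | Maturase | matK
Other genes | Envelope membrane protein | cemA
Other genes | Subunit of acetyl-CoA | accD
Other genes | C-type cytochrome synthesis gene | ccsA
Other genes | Protease | clpP(b)
Unknown | Conserved hypothetical chloroplast reading frames | ycf1(c), ycf2(a,c), ycf3(b), ycf4

(a) Genes located in the IR regions.

(b) Genes having introns.

(c) Two gene copies in IRs.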

Figure 3

Relative synonymous codon usage (RSCU) of the 20 amino acids and the stop codons in all protein-coding genes of the chloroplast genome of S. tonkinensis.


Comparison to the chloroplast genomes of other Sophora species

The size of the S. tonkinensis chloroplast genome was found to be similar to those of the Sophora alopecuroides and Sophora flavescens chloroplast genomes[14,15] (Table 3). However, the S. tonkinensis chloroplast genome had the longest LSC region (85,809 bp), whereas the S. alopecuroides chloroplast genome had the shortest LSC region (84,221 bp). As shown in Table 3, S. tonkinensis had the lowest GC content (36.4%), while S. alopecuroides and S. flavescens had the same higher GC content (36.6%). A total of 129 genes and 8 rRNA genes (four rRNA species) were identified in every species. S. tonkinensis and S. alopecuroides contained 83 protein-coding genes, whereas S. flavescens contained 84. The S. tonkinensis and S. alopecuroides chloroplast genomes possessed 38 tRNA genes, whereas the S. flavescens chloroplast genome possessed 37.
Table 3

Comparison of general features of the genus Sophora chloroplast genomes.

Genome feature | S. tonkinensis | S. alopecuroides | S. flavescens
Total length (bp) | 154,644 | 154,108 | 154,378
LSC length (bp) | 85,809 | 84,221 | 84,516
SSC length (bp) | 18,320 | 18,139 | 18,110
IR length (bp) | 50,515 | 51,748 | 51,752
Total genes | 129 | 129 | 130
Protein gene | 83 | 83 | 84
tRNA gene | 38 | 38 | 37
rRNA gene | 8 | 8 | 8
GC content (%) | 36.4 | 36.6 | 36.6

Simple sequence repeat (SSR) and tandem repeat analyses

SSRs are molecular markers that vary highly within a species and are used in population genetic and polymorphism studies. The types, presence, and distribution of SSRs in the chloroplast genome of S. tonkinensis were studied. A total of 1,760 SSRs were identified, including 1,639 (93.1%) mononucleotide, 30 (1.7%) dinucleotide, 79 (4.5%) trinucleotide, 6 (0.3%) tetranucleotide, and 6 (0.3%) pentanucleotide repeats, and the mononucleotide A and T repeat units accounted for the largest portion, with a percentage of 81.9%. Moreover, the SSRs were mainly distributed in the LSC region, which accounted for 60% of the total SSRs, while 316 (18%) and 383 (22%) were located in the SSC and IR regions, respectively (Fig. 4A). Of these, 21 dinucleotide, 55 trinucleotide, 6 tetranucleotide, and 4 pentanucleotide repeats were identified in the LSC region; 3 dinucleotide repeats, 5 trinucleotide repeats, and 1 pentanucleotide repeat were found in the SSC region; and 6 dinucleotide repeats, 19 trinucleotide repeats, and 1 pentanucleotide repeat were observed in the IR regions (Fig. 4B–D). The size and location of the tetra- and pentanucleotide repeats are shown in Table S2. Of these 12 repeats, 10 and 2 were located in intergenic spacers and coding regions, respectively, and none were found in introns.
Figure 4

The distribution, type and presence of simple sequence repeats (SSRs) in the chloroplast genome of S. tonkinensis. (A) Presence of SSRs in the LSC, SSC, and IR regions. (B) Presence of polymers in the LSC regions. (C) Presence of polymers in the SSC regions. (D) Presence of polymers in the IR regions.

Tandem repeat sequences play a crucial role in genome rearrangement and phylogenetic analysis[16]. In the current study, a total of 23 tandem repeats were identified in the S. tonkinensis chloroplast genome (Table S3), fewer than the numbers observed in S. alopecuroides (49), Ammopiptanthus mongolicus (39), and Maackia floribunda (64) of Papilionoideae. Most of the tandem repeats were distributed in intergenic spacers and introns (19 (82.6%) in the intergenic spacers and 1 in the intron of clpP), and just 2 and 1 were located in the protein-coding regions of ycf2 and ndhF, respectively.

Comparative analysis of the S. tonkinensis chloroplast genome

Three published sequences representing Sophora (S. alopecuroides), Ammopiptanthus (A. mongolicus), and Maackia (M. floribunda) of Papilionoideae were selected for comparison with the sequence of S. tonkinensis to estimate the sequence divergence of different regions of these plastomes. The overall sequence identities of the four Papilionoideae chloroplast genomes were plotted using mVISTA with the annotation of S. tonkinensis as the reference, and we observed approximately identical gene orders and organizations among them (Fig. 5). The coding regions were found to be more highly conserved than the non-coding regions, and the two IR regions were less divergent than the LSC and SSC regions. The most divergent coding regions of the four chloroplast genomes were ycf1, ndhF, accD, rpoC2, and rpoB, and the four rRNA genes (rrn4.5, rrn5, rrn16, and rrn23) were the most conserved.
Figure 5

Comparison of the chloroplast genome sequences of S. tonkinensis, S. alopecuroides, A. mongolicus, and M. floribunda generated with mVISTA. Grey arrows indicate the position and direction of each gene. Red and blue areas indicate the intergenic and genic regions, respectively. The vertical scale indicates the percentage of identity, ranging from 50 to 100%.

IRs are the most conserved regions in the chloroplast genome, and contraction and expansion at their boundaries are common evolutionary events, representing one of the main factors affecting chloroplast genome size. Using Nicotiana tabacum as the reference species, we compared the IR/LSC and IR/SSC borders of the chloroplast genomes of S. tonkinensis, S. alopecuroides, A. mongolicus, and M. floribunda of Papilionoideae (Fig. 6). The results showed that the LSC, SSC and IR regions of S. tonkinensis differed in size from those of the other closely related Papilionoideae chloroplast genomes. In all of these species, the rps19 gene was located in the LSC region. The rpl2 gene of S. tonkinensis spanned the LSC and IRB regions, while the rpl2 genes of the other species were all located in the IRB region, 4–5 bp from the LSC/IRB border. The ycf1 pseudogene spanned the IRB/SSC boundary in all chloroplast genomes, while the ycf1 pseudogene and nadH gene overlapped in A. mongolicus. The nadH gene was present in the SSC region of all genomes, 7–74 bp from the IRB/SSC junction. Expansion and contraction of the ycf1 gene were observed in the boundary regions of the SSC/IRA. The size of ycf1 varied from 5,318 to 5,708 bp among the chloroplast genomes. The trnH gene was found in the LSC region of all genomes, 2 to 138 bp from the IRA/LSC boundary. In S. tonkinensis, the rpl2 gene was absent from the IRA region because of the incomplete gene structure caused by the expansion and contraction of IR regions.
Figure 6

Comparison of the border regions among LSC, IR and SSC in the chloroplast genomes of S. tonkinensis, S. alopecuroides, A. mongolicus, and M. floribunda. JLB: junction line between LSC and IRb; JSB: junction line between IRb and SSC; JSA: junction line between SSC and IRa; JLA: junction line between IRa and LSC.


Synonymous (KS) and non-synonymous (KA) substitution rate analysis

A total of 70 genes in the chloroplast genome of S. tonkinensis were used to calculate the KA/KS ratio relative to the chloroplast genomes of S. alopecuroides and S. flavescens (Fig. 7). The KA/KS ratios of most of the genes in S. tonkinensis vs. those in S. flavescens and S. alopecuroides were consistent with negative (or purifying) selection (KA/KS < 1), while six genes (matK, psbE, psbF, psbM, psaI, and rpl36) displayed positive selection (KA/KS > 1). Notably, the KA/KS ratios of psbE, psbF, psbM, psaI, and rpl36 in the S. tonkinensis vs. S. flavescens and S. alopecuroides comparisons were as high as 50, which indicated great evolutionary divergence in these genes. The rps2 and rpl32 genes were differentially selected: rps12 did not differ in the S. tonkinensis vs. S. flavescens comparison, but it was positively selected in the S. tonkinensis vs. S. alopecuroides comparison (KA/KS = 9.25). rpl32 exhibited no difference in the S. tonkinensis vs. S. alopecuroides comparison but was negatively selected in the S. tonkinensis vs. S. flavescens comparison (KA/KS = 0.32).
Figure 7

KA/KS values of 70 protein-coding genes in the S. tonkinensis vs. S. alopecuroides and S. tonkinensis vs. S. flavescens comparisons. Orange bars indicate S. tonkinensis vs. S. flavescens, and blue bars indicate S. tonkinensis vs. S. alopecuroides.
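The interpretation of the KA/KS ratio used in this section (above 1 positive, equal to 1 neutral, below 1 negative selection) can be written as a small classifier. This is only a sketch of the decision rule: the function name, the tolerance and the handling of KS = 0 are assumptions made for illustration, and the KA and KS values themselves were computed by the authors in DnaSP.

```python
def classify_selection(ka, ks, tol=1e-9):
    """Classify a gene by its KA/KS ratio.

    Returns 'positive' (KA/KS > 1), 'neutral' (KA/KS == 1 within `tol`),
    'negative' (KA/KS < 1), or 'undefined' when KS is zero and the ratio
    cannot be computed.
    """
    if ks <= 0:
        return "undefined"
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "positive" if ratio > 1.0 else "negative"


# Ratios quoted in the text, expressed with KS fixed to 1 for illustration.
high_ratio_call = classify_selection(9.25, 1.0)   # KA/KS = 9.25
low_ratio_call = classify_selection(0.32, 1.0)    # KA/KS = 0.32
```

A ratio is only meaningful when synonymous substitutions are present, so genes with KS = 0 are reported separately here rather than being forced into one of the three classes.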


Single nucleotide polymorphism (SNP) analysis

SNP loci are very useful resources for phylogenetic analysis and species identification[17]. To determine the differences between S. tonkinensis and the two other Sophora species, S. alopecuroides and S. flavescens, at the chloroplast genome level, SNP analysis was carried out with the chloroplast genome of S. tonkinensis as the reference sequence. The results revealed 805 SNPs in the intergenic regions and 485 SNPs in 64 protein-coding genes, the latter comprising 236 non-synonymous and 249 synonymous SNPs. Of these genes, ycf1 contained the most SNP sites (Fig. 8).
Figure 8

Variation analysis between S. tonkinensis and S. alopecuroides or S. flavescens. Green coloured bars indicate non-synonymous SNPs and purple coloured bars indicate synonymous SNPs.


Phylogenetic analysis

In the present study, we aligned 20 complete chloroplast genomes of Papilionoideae to reveal the phylogenetic position of S. tonkinensis (Fig. 9). The phylogenetic positions of these 20 chloroplast genomes were successfully resolved with full bootstrap support across almost all nodes. We found that S. tonkinensis was grouped into Sophora with S. flavescens and S. alopecuroides and S. tonkinensis exhibited the closest relationship with S. flavescens. A close relationship among the genera Sophora, Salweenia and Ammopiptanthus was also uncovered.
Figure 9

Phylogenetic tree of S. tonkinensis and 19 species of subfamily Papilionoideae using Maximum likelihood based on the complete chloroplast genomes. The bootstrap value based on 1,000 replicates is shown on each node.


Discussion

Since the first plant chloroplast genome was sequenced from tobacco[18], thousands of chloroplast genomes from various species have been sequenced. As of 2019, more than 3,300 chloroplast genome sequences had been recorded in the National Center for Biotechnology Information (NCBI) database. In recent years, DNA barcoding has become a powerful tool for species identification. In plants, commonly used DNA barcodes include the chloroplast regions rbcL, matK and psbA-trnH and the nuclear regions ITS and ITS2[19]. Of these, ITS2 has been suggested as a universal DNA barcode for medicinal plants due to its strong identification ability[12]. However, these DNA barcodes do not contain enough variation for species identification in some medicinal plants, such as Epimedium and Fritillariae[20,21]. Compared with a standardized piece of DNA sequence, the whole chloroplast genome contains more mutation sites and is more efficient for identification. The whole chloroplast genome is used as a superbarcode and has been successfully applied in species identification of some medicinal plants and their closely related species. Cui et al. found that three medicinal and edible Amomum taxa (A. villosum, A. villosum var. xanthioides and A. longiligulare) could be accurately identified using their whole chloroplast genomes[22]. Chen et al. discovered that the complete chloroplast genome can be used as a superbarcode to identify six Ligularia species[23]. The chloroplast genome could distinguish C. indicum from its closely related species and might become a potential superbarcode for the identification of these species[24]. Zhu et al. found that the complete plastome sequence dataset had the highest discriminatory power for D. officinale and its closely related species, indicating that complete plastome sequences can be used to accurately authenticate Dendrobium species[25]. The whole chloroplast genome of S. tonkinensis and its hypervariable regions, including the most divergent regions (ycf1, ndhF, accD, and rpoC2), which are also the genes containing the most SNP sites, and the six positively selected genes (matK, psbE, psbF, psbM, psaI, and rpl36), could be selected as potential DNA barcodes for species identification in future studies.

Genetic variation plays an important role in the ability of plants to maintain their evolutionary potential to adapt to an ever-changing environment; therefore, the maintenance of genetic variation is the main goal of conservation strategies for most endangered species[26]. SSRs, also known as microsatellites, have a high polymorphism rate at the species level[27-30]. Therefore, they have been widely used as effective molecular markers in population genetic and evolutionary studies[31,32]. Yang et al. used eight SSR primers to assess the genetic diversity and structure of 22 natural populations of the endangered medicinal plant Phellodendron amurense in China and proposed appropriate conservation measures for this species[33]. An ex situ conservation measure that conserves genetically distant populations to maximize the genetic diversity of Eucommia ulmoides has been recommended based on an analysis of genetic diversity within and among the semi-wild and cultivated populations of E. ulmoides using two cpSSR loci[34]. In the S. tonkinensis chloroplast genome, five types of SSRs (mono-, di-, tri-, tetra-, and pentanucleotide repeats) and a total of 150 SSR loci with a length of at least 10 bp were identified (Table S4). The mononucleotide repeats were the most abundant SSRs. Most of the mononucleotide and dinucleotide repeats were composed of multiple copies of A/T and AT/TA, respectively; this result is similar to that of a previous study on S. alopecuroides[35]. These SSRs of the S. tonkinensis chloroplast genome could be useful markers for genetic diversity studies of wild populations of S. tonkinensis, which will help to formulate effective conservation and management strategies for this important medicinal plant.

Conclusions

In conclusion, the chloroplast genome of S. tonkinensis was sequenced on the Illumina MiSeq platform in this study. SSRs and tandem repeats were identified in the chloroplast genome of S. tonkinensis; 1,760 SSRs were found, most of which were mononucleotide repeats. SSR analysis can provide valuable information for developing highly variable DNA markers for population genetic surveys and other ecological and evolutionary studies of S. tonkinensis. Further, we performed phylogenetic analysis of 20 chloroplast genomes and collinearity analysis of three closely related species of S. tonkinensis. The contraction and expansion of the IR regions of the three closely related species were also compared. The results of the above analyses provide valuable reference information that will help formulate effective conservation and management strategies as well as molecular identification approaches for this important medicinal plant.

Materials and methods

Sample preparation and DNA extraction

Sophora tonkinensis leaves were obtained from 2-year-old plants in the Guangxi Botanical Garden of Medicinal Plants (Nanning, China). Genomic DNA was extracted using the DNeasy Plant Mini Kit (Qiagen, Valencia, CA, USA) according to the manufacturer’s instructions.

Genome assembly and annotation

DNA was randomly fragmented with a Covaris M220 apparatus. After the addition of an “A” tail, the DNA fragments of the desired length (400–500 bp) were ligated to adapters and purified using the TruSeq™ DNA Sample Prep Kit for Illumina MiSeq sequencing. Before assembly, the raw reads were filtered: reads with adapters, reads containing too many uncalled bases (“N” characters, ≥ 10%), reads with a quality score below 20 (Q < 20), and duplicated sequences were removed. The filtered reads were first assembled using SOAPdenovo v2.04 software (https://soap.genomics.org.cn/)[36]. Second, GapCloser v1.12 software was used to fill the gaps in the assembly results and for base correction. Annotation of the chloroplast genome was conducted using Dual Organellar GenoMe Annotator (DOGMA) software (https://dogma.ccbb.utexas.edu/)[37], and manual correction was carried out to predict the genes, rRNAs, and tRNAs in the genome. A circular chloroplast genome map was drawn using the OGDRAW program (https://chlorobox.mpimp-golm.mpg.de/OGDraw.html)[38].
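The read-filtering criteria listed above can be sketched in Python as follows. This is not the authors' filtering tool, which the text does not name. The function names are hypothetical, adapter removal is omitted, and the "quality score below 20" criterion is interpreted here as the mean Phred score of the read, which is an assumption.

```python
def passes_filters(seq, quals, max_n_fraction=0.10, min_mean_quality=20):
    """Return True if a read passes the N-content and quality criteria.

    seq   : read sequence as a string
    quals : list of per-base Phred quality scores (same length as seq)
    A read is rejected when its fraction of 'N' bases is >= max_n_fraction
    or when its mean quality is below min_mean_quality (assumed reading of
    the Q < 20 criterion).
    """
    if not seq or len(seq) != len(quals):
        return False
    n_fraction = seq.upper().count("N") / len(seq)
    if n_fraction >= max_n_fraction:
        return False
    return sum(quals) / len(quals) >= min_mean_quality


def filter_reads(reads):
    """Keep reads that pass the filters, dropping exact duplicate sequences."""
    seen = set()
    kept = []
    for seq, quals in reads:
        if seq in seen:
            continue  # duplicated sequence already retained
        if passes_filters(seq, quals):
            seen.add(seq)
            kept.append((seq, quals))
    return kept


# Toy reads (illustrative only): one good read, its duplicate,
# one read with 10% N bases, and one low-quality read.
toy_reads = [
    ("ACGTACGTAC", [30] * 10),
    ("ACGTACGTAC", [30] * 10),
    ("ANGTACGTAC", [30] * 10),
    ("ACGTACGTAA", [10] * 10),
]
kept_reads = filter_reads(toy_reads)
```

Real pipelines usually parse FASTQ records and treat the two mates of a pair together; the sketch leaves that out to keep the filtering logic visible.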

Codon usage analysis

RSCU (Relative Synonymous Codon Usage) was computed from the protein-coding gene sequences of the S. tonkinensis cp genome. The online program CodonW 1.4.2 (https://codonw.sourceforge.net/) was employed for RSCU and codon frequency analysis[39].
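As a rough illustration of the quantity computed here, RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The Python sketch below applies this definition to toy counts; it is not CodonW, and the function name, the codon families passed in and the counts are hypothetical.

```python
def rscu(codon_counts, synonymous_families):
    """Compute relative synonymous codon usage.

    codon_counts        : dict mapping codon -> observed count
    synonymous_families : iterable of tuples, each listing the codons that
                          encode the same amino acid
    Returns a dict mapping codon -> RSCU. Families with no observed codons
    are skipped.
    """
    values = {}
    for family in synonymous_families:
        total = sum(codon_counts.get(codon, 0) for codon in family)
        if total == 0:
            continue
        expected = total / len(family)  # count under equal usage
        for codon in family:
            values[codon] = codon_counts.get(codon, 0) / expected
    return values


# Toy example: two Phe codons and the single Met codon (counts are made up).
toy_counts = {"TTT": 30, "TTC": 10, "ATG": 25}
toy_families = [("TTT", "TTC"), ("ATG",)]
toy_rscu = rscu(toy_counts, toy_families)
```

An RSCU above 1 marks a codon used more often than expected under equal usage, and amino acids with a single codon, such as Met and Trp, always have an RSCU of 1.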

Analysis of simple sequence repeats (SSRs) and tandem repeats

The SSRs in the complete chloroplast genome of S. tonkinensis were detected using MIcroSAtellite identification tool (MISA) software (https://pgrc.ipk-gatersleben.de/misa/)[40]. Sequences with at least 5, 5, 3, 3, and 3 repeat units were considered SSRs for mono-, di-, tri-, tetra-, and pentanucleotide motifs, respectively. Tandem Repeats Finder (TRF) v4.04 was used to identify tandem repeats[41].
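A regular-expression sketch of this kind of SSR search is given below. It is not MISA and may differ from MISA's output in edge cases. It assumes that the thresholds 5, 5, 3, 3, and 3 are minimum numbers of repeat units for motifs of 1 to 5 bp; the function names and the toy sequence are hypothetical.

```python
import re

# Assumed minimum repeat counts per motif length (1-5 bp).
MIN_REPEATS = {1: 5, 2: 5, 3: 3, 4: 3, 5: 3}


def is_primitive(unit):
    """True if the motif is not itself a repetition of a shorter motif."""
    return not any(
        len(unit) % k == 0 and unit == unit[:k] * (len(unit) // k)
        for k in range(1, len(unit))
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return a sorted list of (start, motif, repeat_count) tuples."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in min_repeats.items():
        # Lookahead so that every start position is examined.
        pattern = re.compile(
            r"(?=(([ACGT]{%d})\2{%d,}))" % (unit_len, min_rep - 1)
        )
        last_end = 0
        for match in pattern.finditer(seq):
            start, run, unit = match.start(), match.group(1), match.group(2)
            # Skip positions inside an SSR already reported for this motif
            # length, and motifs such as 'AA' that reduce to a shorter unit.
            if start < last_end or not is_primitive(unit):
                continue
            hits.append((start, unit, len(run) // unit_len))
            last_end = start + len(run)
    return sorted(hits)


# Toy sequence containing an A(6) and an AT(5) repeat.
toy_hits = find_ssrs("CCAAAAAAGGATATATATATCC")
```

Counting the hits per motif length and per genome region (LSC, SSC, IR) would give a summary of the kind reported in the Results.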

Comparative genomic analysis of the S. tonkinensis chloroplast genome

The complete chloroplast genome of S. tonkinensis was compared with those of three other Papilionoideae species, namely, S. alopecuroides [MH_748034], A. mongolicus [NC_034742], and M. floribunda [KX_388160], from the NCBI plastid database using the mVISTA program in Shuffle-LAGAN mode[42]. S. tonkinensis was set as the reference.

For the substitution rate analysis, the chloroplast genome sequence of S. tonkinensis was compared with those of S. alopecuroides [MH_748034] and S. flavescens [MK_114100] from the NCBI plastid database. The exons of protein-coding genes with the same function were extracted and aligned separately using Geneious v7.1.9 software. The aligned sequences were translated into protein sequences and then analysed. The KA and KS substitution rates for each protein-coding exon were calculated in DnaSP[43]. The ratios KA/KS > 1, KA/KS = 1 and KA/KS < 1 indicate positive selection, neutral evolution and negative selection, respectively[44].

The phylogenetic analysis was based on the complete chloroplast genome sequences of S. tonkinensis and 19 species of Papilionoideae downloaded from the NCBI database (https://www.ncbi.nlm.nih.gov/), including two Sophora species, S. alopecuroides and S. flavescens, and 17 chloroplast genomes from the other basic lineages of Papilionoideae. The maximum likelihood method was used to infer the phylogenetic relationships with 1,000 bootstrap replicates in MEGA 5.0[45].