Shanyong Yi, Haibo Lu, Wei Wang, Guanglin Wang, Tao Xu, Mingzhi Li, Fangli Gu, Cunwu Chen, Bangxing Han, Dong Liu.
Abstract
Saposhnikovia divaricata, a well-known Chinese medicinal herb, is the sole species of the genus Saposhnikovia in the Apiaceae subfamily Apioideae Drude. However, information on its genetic diversity and evolution is still limited. In this study, the first complete chloroplast genome (cpDNA) of wild S. divaricata was generated by de novo sequencing. Like that of Ledebouriella seseloides, the 147,834-bp S. divaricata cpDNA comprises a large single-copy region, a small single-copy region, and two inverted repeat regions. A total of 85 protein-coding, 8 ribosomal RNA, and 36 transfer RNA genes were identified. In a comparison with five other species, the non-coding regions of the S. divaricata cpDNA showed greater variation than the coding regions. Repeat analysis identified 33 forward, 14 reverse, and 3 complement repeats, as well as 49 microsatellite (simple sequence) repeats. Furthermore, phylogenetic analysis of 47 cpDNA sequences from Apioideae members showed that L. seseloides and S. divaricata cluster together with 100% bootstrap support, supporting at the genomic level the renaming of L. seseloides as S. divaricata. Notably, S. divaricata was most closely related to Libanotis buchtormensis, which contradicts previous reports. These findings provide a valuable foundation for future studies on the genetic diversity and evolution of S. divaricata.
Keywords: complete cpDNA sequence; phylogeny; traditional Chinese medicine
Year: 2022 PMID: 35627316 PMCID: PMC9141249 DOI: 10.3390/genes13050931
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.141
Figure 1. Chloroplast genome map showing all reported genes of Saposhnikovia divaricata. Genes drawn inside the circle are transcribed clockwise, and those outside are transcribed counterclockwise. Genes belonging to different functional groups are color-coded. The darker gray in the inner circle represents GC content. The small single-copy (SSC) region, large single-copy (LSC) region, and inverted repeats (IRa and IRb) are indicated.
Figure 2. Comparison of GC content of the wild S. divaricata cpDNA using the GView program.
List of genes found in the chloroplast genome of the wild S. divaricata.
| Group of Genes | Gene Name | Number |
|---|---|---|
| Ribosomal RNA | | 8 |
| Transfer RNA | | 36 |
| Ribosomal small subunit | | 14 |
| Ribosomal large subunit | | 9 |
| DNA-dependent RNA polymerase | | 4 |
| Large subunit of rubisco | | 1 |
| Photosystem I | | 5 |
| Photosystem II | | 15 |
| NADH dehydrogenase | | 12 |
| Cytochrome b/f complex | | 6 |
| ATP synthase | | 6 |
| Maturase | | 1 |
| Subunit of acetyl-CoA carboxylase | | 1 |
| Envelope membrane protein | | 1 |
| Protease | | 1 |
| C-type cytochrome synthesis | | 1 |
| Translation initiation factor | | 1 |
| Conserved open reading frames | | 7 |
| Total | | 129 |
A single asterisk (*) marks a gene with one intron, a double asterisk (**) a gene with two introns, and (×2) a gene with two copies.
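As an arithmetic check on the table, the per-group counts can be summed to recover the totals given in the abstract (85 protein-coding, 8 rRNA, and 36 tRNA genes, 129 in all). The sketch below hard-codes the counts by hand; it assumes, consistent with the abstract, that the 8- and 36-gene rows are the rRNA and tRNA groups and that every other group encodes proteins.

```python
# Gene counts per functional group, transcribed by hand from the table.
# Assumption: the 8- and 36-gene rows are the rRNA and tRNA groups.
GENE_COUNTS = {
    "Ribosomal RNA": 8,
    "Transfer RNA": 36,
    "Ribosomal small subunit": 14,
    "Ribosomal large subunit": 9,
    "DNA-dependent RNA polymerase": 4,
    "Large subunit of rubisco": 1,
    "Photosystem I": 5,
    "Photosystem II": 15,
    "NADH dehydrogenase": 12,
    "Cytochrome b/f complex": 6,
    "ATP synthase": 6,
    "Maturase": 1,
    "Subunit of acetyl-CoA carboxylase": 1,
    "Envelope membrane protein": 1,
    "Protease": 1,
    "C-type cytochrome synthesis": 1,
    "Translation initiation factor": 1,
    "Conserved open reading frames": 7,
}
RNA_GROUPS = {"Ribosomal RNA", "Transfer RNA"}


def summarize(counts):
    """Return (total genes, protein-coding genes, RNA genes)."""
    total = sum(counts.values())
    rna = sum(n for group, n in counts.items() if group in RNA_GROUPS)
    return total, total - rna, rna


if __name__ == "__main__":
    total, protein_coding, rna = summarize(GENE_COUNTS)
    print(total, protein_coding, rna)  # 129 85 44
```

The protein-coding total is derived by subtraction, so any transcription error in a single row shows up as a mismatch with the abstract's 85.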
Intron-containing genes in the wild S. divaricata chloroplast genome, with the lengths of their exons and introns.
| Gene | Location | Exon 1 (bp) | Intron 1 (bp) | Exon 2 (bp) | Intron 2 (bp) | Exon 3 (bp) |
|---|---|---|---|---|---|---|
| | LSC | 37 | 2532 | 35 | | |
| | IRb | 37 | 968 | 35 | | |
| | IRa | 37 | 968 | 35 | | |
| | IRb | 38 | 818 | 35 | | |
| | IRa | 38 | 818 | 35 | | |
| | LSC | 23 | 703 | 48 | | |
| | LSC | 39 | 569 | 35 | | |
| | LSC | 35 | 502 | 50 | | |
| rps12¹ | LSC + IRa | 114 | – | 232 | 538 | 26 |
| rps12¹ | LSC + IRb | 114 | – | 232 | 538 | 26 |
| | LSC | 40 | 859 | 197 | | |
| | LSC | 432 | 748 | 1605 | | |
| | LSC | 394 | 651 | 434 | | |
| | LSC | 9 | 950 | 399 | | |
| | SSC | 553 | 1099 | 539 | | |
| | LSC | 777 | 682 | 756 | | |
| | IRa | 777 | 682 | 756 | | |
| | LSC | 6 | 758 | 642 | | |
| | LSC | 8 | 750 | 475 | | |
| | LSC | 145 | 711 | 401 | | |
| | LSC | 231 | 635 | 292 | 848 | 71 |
| | LSC | 153 | 776 | 228 | 717 | 126 |
¹ Because the rps12 gene is trans-spliced in the wild S. divaricata cpDNA, the length of intron 1 is not reported.
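The exon and intron lengths combine in two simple ways: the spliced (mature) length is the sum of the exons, and for a cis-spliced gene the genomic span adds the introns back. For a protein-coding gene the spliced length, stop codon included, should be a whole number of codons, which makes this a quick consistency check on the table. The sketch below applies it to three rows; apart from rps12, the row labels are hypothetical placeholders that identify rows only by region and exon count.

```python
def spliced_length(exons):
    """Length of the mature transcript: exons only."""
    return sum(exons)


def genomic_span(exons, introns):
    """Length on the genome for a cis-spliced gene (exons plus introns)."""
    return sum(exons) + sum(introns)


# Rows transcribed from the table (lengths in bp).
# Labels other than rps12 are hypothetical placeholders.
ROWS = {
    "rps12 (trans-spliced)": ([114, 232, 26], None),  # intron 1 not reported
    "SSC two-exon row": ([553, 539], [1099]),
    "LSC three-exon row": ([231, 292, 71], [635, 848]),
}

for name, (exons, introns) in ROWS.items():
    coding = spliced_length(exons)
    n_codons, leftover = divmod(coding, 3)
    span = genomic_span(exons, introns) if introns is not None else "n/a"
    print(f"{name}: spliced {coding} bp = {n_codons} codons (+{leftover}), span {span}")
```

The trans-spliced rps12 has no single contiguous genomic span, since its first exon lies in the LSC and the other two in an IR, so only its spliced length (372 bp, i.e. 124 codons) is computed.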
Figure 3. Comparison of GC content, codon usage preference, and amino acid proportions in the protein-coding genes of seven chloroplast genomes. (A–D) GC content of synonymous codons at the first (GC1), second (GC2), and third (GC3) positions, and total GC content (GCs). (E) Codon preference and amino acid proportions based on relative synonymous codon usage (RSCU) values; Ter denotes the stop codon. Legend: A, A. xanthorrhiza; B, P. praeruptorum; C, L. buchtormensis; D, S. divaricata; E, L. seseloides; F, S. montanum; G, A. paeoniifolia.
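The quantities in Figure 3 follow standard definitions: GC1, GC2, and GC3 are the G+C percentages at each codon position, and the RSCU of codon j for amino acid i is RSCU_ij = X_ij / ((1/n_i) Σ_j X_ij), its observed count divided by the count expected if all n_i synonymous codons were used equally (1 means no bias, above 1 means preferred). The sketch below computes both from in-frame coding sequences under the standard genetic code. It is an illustration of the definitions, not the authors' pipeline; in particular, it does not restrict GC values to synonymous sites the way some codon-usage tools do.

```python
from collections import Counter, defaultdict

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codons(cds):
    """Yield complete, unambiguous codons of an in-frame coding sequence."""
    cds = cds.upper()
    for p in range(0, len(cds) - 2, 3):
        codon = cds[p:p + 3]
        if codon in CODON_TABLE:
            yield codon


def gc_by_position(cds_list):
    """Percent G+C at codon positions 1, 2, 3, and over all three positions."""
    gc = [0, 0, 0]
    total = 0
    for cds in cds_list:
        for codon in codons(cds):
            for pos, base in enumerate(codon):
                gc[pos] += base in "GC"
            total += 1
    if total == 0:
        raise ValueError("no complete codons")
    gc1, gc2, gc3 = (100.0 * g / total for g in gc)
    return gc1, gc2, gc3, (gc1 + gc2 + gc3) / 3


def rscu(cds_list):
    """Observed codon count divided by the mean count of its synonymous family."""
    counts = Counter(c for cds in cds_list for c in codons(cds))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)  # "*" collects the three stop codons (Ter)
    values = {}
    for aa, family in families.items():
        family_total = sum(counts[c] for c in family)
        if family_total == 0:
            continue  # amino acid absent: RSCU undefined
        expected = family_total / len(family)
        for c in family:
            values[c] = counts[c] / expected
    return values


if __name__ == "__main__":
    r = rscu(["ATGGCCTAA"])
    print(r["ATG"], r["GCC"], r["TAA"])
```

For the toy sequence ATG GCC TAA, Met has a single codon (RSCU 1.0), GCC is the only one of four alanine codons used (RSCU 4.0), and TAA is one of three stop codons (RSCU 3.0).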
Figure 4. Comparison of the seven chloroplast genomes of subfamily Apioideae Drude using the mVISTA program. Grey arrows and thick black lines above the alignments indicate gene orientations and IR positions, respectively. A 70% identity cut-off was used for the plots, and the y-axis shows percent identity (50–100%). Genome regions are color-coded: protein-coding exons (blue), ribosomal RNA (rRNA; cyan), and conserved non-coding sequences (CNS; pink).
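The mVISTA plots show percent identity along a pairwise alignment and shade windows that pass a conservation cut-off (70% here). The sketch below reproduces that idea on an already computed, gap-padded alignment of two sequences. The window length, the step, and the rule that a gap never counts as a match are assumptions chosen for illustration; this is not mVISTA's exact scoring.

```python
def windowed_identity(ref, other, window=100, step=25):
    """Percent identity in sliding windows over two aligned, equal-length sequences.

    A column counts as a match only if both characters are identical and not a gap.
    Returns a list of (1-based window start, percent identity).
    """
    if len(ref) != len(other):
        raise ValueError("sequences must come from the same alignment")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    profile = []
    for start in range(0, len(ref) - window + 1, step):
        pairs = zip(ref[start:start + window].upper(), other[start:start + window].upper())
        matches = sum(a == b and a != "-" for a, b in pairs)
        profile.append((start + 1, 100.0 * matches / window))
    return profile


def conserved_windows(profile, cutoff=70.0):
    """Start positions of windows whose identity meets the cut-off."""
    return [start for start, identity in profile if identity >= cutoff]


if __name__ == "__main__":
    profile = windowed_identity("AAAAAAAAAA", "AAAAAAATTT", window=5, step=5)
    print(profile, conserved_windows(profile))
```

Dividing by the full window length, rather than by the ungapped columns, means indel-rich windows score low, which matches the intuition that such regions are less conserved.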
Figure 5. Comparison of nucleotide variability (Pi) values among the chloroplast genomes of the seven species. The y-axis shows Pi values; the x-axis shows the genes with high Pi values.
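Nucleotide diversity (Pi) is the average number of pairwise differences per site among aligned sequences, π = Σ_{i<j} d_ij / (C(n, 2) · L), where d_ij counts the sites at which sequences i and j differ and L is the number of sites compared. A sliding-window scan of this quantity is what singles out highly variable regions such as those in Figure 5. The minimal sketch below assumes a gap-padded multiple alignment and, as a further assumption, skips any column containing a gap or ambiguous base. The default window length and step are placeholders, not the values used for the figure.

```python
from itertools import combinations


def nucleotide_diversity(alignment):
    """Average pairwise differences per site (Nei's pi) for aligned sequences.

    Columns with a gap or non-ACGT character in any sequence are skipped.
    """
    seqs = [s.upper() for s in alignment]
    if len(seqs) < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two sequences of equal aligned length")
    columns = [c for c in range(len(seqs[0])) if all(s[c] in "ACGT" for s in seqs)]
    if not columns:
        return 0.0
    pairs = list(combinations(seqs, 2))
    differences = sum(a[c] != b[c] for a, b in pairs for c in columns)
    return differences / (len(pairs) * len(columns))


def sliding_pi(alignment, window=600, step=200):
    """Pi in sliding windows; returns (1-based window start, pi) tuples."""
    length = len(alignment[0])
    return [
        (start + 1, nucleotide_diversity([s[start:start + window] for s in alignment]))
        for start in range(0, length - window + 1, step)
    ]


if __name__ == "__main__":
    print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))  # 4 differences / (3 pairs * 4 sites)
```

In the toy alignment the three pairs differ at 1, 2, and 1 sites, giving π = 4 / (3 × 4) ≈ 0.333.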
Figure 6. Comparison of the LSC, SSC, and IR region borders among the seven cp genomes.
Repeat sequences in the chloroplast genome of the wild S. divaricata.
| ID | Size (bp) | Repeat 1 Start | Type¹ | Size (bp) | Repeat 2 Start | Mismatch (bp) | E-value | Gene | Region |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 34 | 7110 | F | 32 | 7126 | 3 | 1.1 × 10⁻⁴ | IGS | LSC |
| 2 | 32 | 8400 | F | 31 | 36,451 | 2 | 1.90 × 10⁻⁵ | IGS | LSC |
| 3 | 34 | 9846 | R | 32 | 115,685 | 3 | 1.1 × 10⁻⁴ | IGS | LSC;SSC |
| 4 | 35 | 9851 | C | 34 | 115,668 | 3 | 3.00 × 10⁻⁵ | IGS | LSC;SSC |
| 5 | 35 | 9851 | C | 32 | 115,670 | 3 | 3.00 × 10⁻⁵ | IGS | LSC;SSC |
| 6 | 35 | 20,788 | F | 35 | 20,837 | 3 | 3.00 × 10⁻⁵ | IGS | LSC |
| 7 | 36 | 32,147 | R | 37 | 32,160 | 3 | 2.23 × 10⁻⁶ | IGS | LSC |
| 8 | 39 | 44,679 | F | 39 | 98,962 | 2 | 1.74 × 10⁻⁹ | | LSC;IRb |
| 9 | 39 | 44,679 | F | 39 | 122,483 | 3 | 1.64 × 10⁻⁷ | | LSC;SSC |
| 10 | 35 | 44,682 | F | 35 | 95,893 | 3 | 3.00 × 10⁻⁵ | | LSC;IRb |
| 11 | 33 | 44,685 | F | 33 | 98,968 | 1 | 3.27 × 10⁻⁸ | | LSC;IRb |
| 12 | 31 | 51,905 | R | 31 | 64,144 | 1 | 4.92 × 10⁻⁷ | IGS | LSC |
| 13 | 35 | 51,907 | R | 35 | 51,907 | 2 | 3.57 × 10⁻⁷ | IGS | LSC |
| 14 | 32 | 51,907 | F | 32 | 51,923 | 2 | 1.90 × 10⁻⁵ | IGS | LSC |
| 15 | 37 | 51,911 | F | 36 | 64,143 | 3 | 2.23 × 10⁻⁶ | IGS | LSC |
| 16 | 42 | 51,912 | R | 42 | 51,912 | 2 | 3.15 × 10⁻¹¹ | IGS | LSC |
| 17 | 42 | 51,912 | R | 40 | 51,912 | 2 | 3.15 × 10⁻¹¹ | IGS | LSC |
| 18 | 28 | 51,912 | F | 28 | 115,670 | 0 | 8.53 × 10⁻⁸ | IGS | LSC;SSC |
| 19 | 31 | 51,912 | R | 31 | 115,663 | 1 | 4.92 × 10⁻⁷ | IGS | LSC;SSC |
| 20 | 36 | 51,913 | R | 39 | 51,916 | 3 | 1.64 × 10⁻⁷ | IGS | LSC |
| 21 | 35 | 51,914 | R | 37 | 115,674 | 3 | 2.23 × 10⁻⁶ | IGS | LSC;SSC |
| 22 | 33 | 51,922 | F | 32 | 115,663 | 2 | 5.07 × 10⁻⁶ | IGS | LSC;SSC |
| 23 | 30 | 51,925 | R | 29 | 115,669 | 1 | 1.90 × 10⁻⁶ | IGS | LSC;SSC |
| 24 | 32 | 52,673 | R | 32 | 52,673 | 2 | 1.90 × 10⁻⁵ | IGS | LSC |
| 25 | 31 | 64,142 | R | 31 | 64,142 | 2 | 7.14 × 10⁻⁵ | IGS | LSC |
| 26 | 35 | 64,144 | C | 36 | 115,671 | 3 | 8.18 × 10⁻⁶ | IGS | LSC;SSC |
| 27 | 25 | 67,922 | F | 25 | 67,946 | 0 | 5.46 × 10⁻⁶ | IGS | LSC |
| 28 | 84 | 91,433 | F | 84 | 91,451 | 1 | 1.64 × 10⁻³⁸ | | LSC |
| 29 | 70 | 91,433 | F | 70 | 91,469 | 3 | 2.12 × 10⁻²⁵ | | LSC |
| 30 | 52 | 91,433 | F | 52 | 91,487 | 3 | 5.90 × 10⁻¹⁵ | | LSC |
| 31 | 59 | 91,440 | F | 59 | 91,476 | 1 | 1.30 × 10⁻²³ | | LSC |
| 32 | 45 | 91,440 | F | 45 | 91,494 | 2 | 5.66 × 10⁻¹³ | | LSC |
| 33 | 59 | 91,458 | F | 59 | 91,476 | 0 | 1.85 × 10⁻²⁶ | | LSC |
| 34 | 41 | 91,458 | F | 41 | 91,494 | 0 | 1.27 × 10⁻¹⁵ | | LSC |
| 35 | 23 | 91,458 | F | 23 | 91,512 | 0 | 8.73 × 10⁻⁵ | | LSC |
| 36 | 44 | 94,003 | F | 44 | 94,024 | 1 | 1.04 × 10⁻¹⁴ | IGS | IRb |
| 37 | 36 | 94,011 | F | 36 | 94,032 | 0 | 1.30 × 10⁻¹² | IGS | IRb |
| 38 | 41 | 98,960 | F | 41 | 122,481 | 3 | 1.19 × 10⁻⁸ | IGS;ndhA (intron) | IRb;SSC |
| 39 | 33 | 98,968 | F | 33 | 122,489 | 2 | 5.07 × 10⁻⁶ | IGS;ndhA (intron) | IRb;SSC |
| 40 | 42 | 99,905 | F | 42 | 99,926 | 0 | 3.18 × 10⁻¹⁶ | | IRb |
| 41 | 34 | 107,943 | F | 34 | 107,975 | 1 | 8.43 × 10⁻⁹ | IGS | IRb |
| 42 | 31 | 108,296 | F | 31 | 132,709 | 2 | 7.14 × 10⁻⁵ | IGS | IRb;IRa |
| 43 | 23 | 114,349 | F | 23 | 114,381 | 0 | 8.73 × 10⁻⁵ | IGS | SSC |
| 44 | 28 | 115,668 | R | 28 | 115,668 | 0 | 8.53 × 10⁻⁸ | IGS | SSC |
| 45 | 31 | 122,640 | R | 31 | 122,640 | 0 | 1.33 × 10⁻⁹ | | SSC |
| 46 | 34 | 133,027 | F | 34 | 133,059 | 1 | 8.43 × 10⁻⁹ | IGS | IRa |
| 47 | 31 | 133,030 | F | 30 | 133,063 | 2 | 7.14 × 10⁻⁵ | IGS | IRa |
| 48 | 23 | 133,038 | F | 23 | 133,070 | 0 | 8.73 × 10⁻⁵ | IGS | IRa |
| 49 | 42 | 141,068 | F | 42 | 141,089 | 0 | 3.18 × 10⁻¹⁶ | | IRa |
| 50 | 44 | 146,968 | F | 44 | 146,989 | 1 | 1.04 × 10⁻¹⁴ | IGS | IRa |
¹ F, forward; R, reverse; C, complement; IGS, intergenic spacer.
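The repeat classes in the table differ only in how the second copy relates to the first: a forward repeat is a direct copy, a reverse repeat reads the same bases backwards, and a complement repeat pairs base for base (A with T, C with G) in the same direction. REPuter also defines a fourth, palindromic class (the reverse complement). REPuter itself extends seeds and tolerates mismatches, hence the Mismatch and E-value columns. The sketch below is a simplified, exact-match seed finder over a fixed word length k that only illustrates the four definitions. It is quadratic in the worst case and not meant for whole-genome use.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_repeats(seq, k):
    """Exact k-mer seeds for the four REPuter-style repeat classes.

    Returns sorted (type, 1-based start of copy 1, 1-based start of copy 2) tuples:
    F forward, R reverse, C complement, P palindromic (reverse complement).
    """
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    hits = set()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        complement = word.translate(COMPLEMENT)
        queries = {
            "F": word,               # direct copy
            "R": word[::-1],         # same bases read backwards
            "C": complement,         # base-paired, same direction
            "P": complement[::-1],   # reverse complement
        }
        for kind, query in queries.items():
            for j in positions.get(query, []):
                # Report each pair once; a word may be its own R or P partner.
                if j > i or (j == i and kind != "F"):
                    hits.add((kind, i + 1, j + 1))
    return sorted(hits, key=lambda hit: (hit[1], hit[2], hit[0]))


if __name__ == "__main__":
    print(find_repeats("ACCTGAAAACCTG", 5))
```

Self-pairs such as rows 13, 16, and 44 in the table (identical start positions for both copies) correspond to the `j == i` case, where a word is its own reverse.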
Simple sequence repeats (SSRs) in the wild S. divaricata chloroplast genome.
| ID | Type | Repeat Motif | Length (bp) | Start | End | Region | Gene |
|---|---|---|---|---|---|---|---|
| 1 | p1 | (A)10 | 10 | 1539 | 1548 | LSC | |
| 2 | p1 | (A)10 | 10 | 1794 | 1803 | LSC | |
| 3 | p3 | (TTA)5 | 15 | 5419 | 5433 | LSC | |
| 4 | p1 | (A)10 | 10 | 9393 | 9402 | LSC | |
| 5 | p2 | (AT)7 | 14 | 9867 | 9880 | LSC | |
| 6 | p2 | (AT)9 | 18 | 13,059 | 13,076 | LSC | |
| 7 | p1 | (A)14 | 14 | 16,406 | 16,419 | LSC | |
| 8 | p1 | (T)11 | 11 | 18,651 | 18,661 | LSC | |
| 9 | p1 | (T)12 | 12 | 26,380 | 26,391 | LSC | |
| 10 | p1 | (A)10 | 10 | 27,392 | 27,401 | LSC | |
| 11 | p3 | (AAT)6 | 18 | 28,642 | 28,659 | LSC | |
| 12 | p1 | (T)12 | 12 | 29,602 | 29,613 | LSC | |
| 13 | p1 | (T)12 | 12 | 32,753 | 32,764 | LSC | |
| 14 | p1 | (A)12 | 12 | 33,320 | 33,331 | LSC | |
| 15 | p1 | (C)10 | 10 | 37,141 | 37,150 | LSC | |
| 16 | p1 | (A)13 | 13 | 43,455 | 43,467 | LSC | |
| 17 | p1 | (T)10 | 10 | 45,269 | 45,278 | LSC | |
| 18 | p2 | (TA)7 | 14 | 47,465 | 47,478 | LSC | |
| 19 | p2 | (TA)7 | 14 | 51,926 | 51,939 | LSC | |
| 20 | p1 | (T)10 | 10 | 52,688 | 52,697 | LSC | |
| 21 | p1 | (T)10 | 10 | 55,648 | 55,657 | LSC | |
| 22 | p1 | (A)18 | 18 | 56,234 | 56,251 | LSC | |
| 23 | p1 | (T)10 | 10 | 58,021 | 58,030 | LSC | |
| 24 | p1 | (T)10 | 10 | 60,531 | 60,540 | LSC | |
| 25 | c | (A)10tatcagaacttt | 34 | 64,123 | 64,156 | LSC | |
| 26 | c | (A)11gacaggtttttgctccttttcgtataatattcttgtattcttgtaa | 86 | 71,813 | 71,898 | LSC | |
| 27 | p1 | (T)10 | 10 | 72,637 | 72,646 | LSC | |
| 28 | p1 | (T)12 | 12 | 83,124 | 83,135 | LSC | |
| 29 | p1 | (T)16 | 16 | 84,843 | 84,858 | LSC | |
| 30 | p1 | (G)13 | 13 | 94,287 | 94,299 | IRb | |
| 31 | p1 | (T)13 | 13 | 99,335 | 99,347 | IRb | |
| 32 | p1 | (T)10 | 10 | 103,203 | 103,212 | IRb | |
| 33 | p1 | (G)14 | 14 | 104,444 | 104,457 | IRb | |
| 34 | p1 | (A)10 | 10 | 111,049 | 111,058 | IRb | |
| 35 | p1 | (A)11 | 11 | 111,833 | 111,843 | IRb | |
| 36 | p1 | (A)12 | 12 | 115,539 | 115,550 | SSC | |
| 37 | c | (TA)6ttt(TA) | 57 | 115,669 | 115,725 | SSC | |
| 38 | p1 | (A)13 | 13 | 116,776 | 116,788 | SSC | |
| 39 | p1 | (A)11 | 11 | 120,333 | 120,343 | SSC | |
| 40 | p1 | (T)10 | 10 | 121,130 | 121,139 | SSC | |
| 41 | p1 | (T)15 | 15 | 128,046 | 128,060 | SSC | |
| 42 | p1 | (T)11 | 11 | 128,410 | 128,420 | SSC | |
| 43 | p1 | (T)10 | 10 | 128,671 | 128,680 | SSC | |
| 44 | p1 | (T)11 | 11 | 129,194 | 129,204 | IRa | |
| 45 | p1 | (T)10 | 10 | 129,979 | 129,988 | IRa | |
| 46 | p1 | (C)14 | 14 | 136,580 | 136,593 | IRa | |
| 47 | p1 | (A)10 | 10 | 137,825 | 137,834 | IRa | |
| 48 | p1 | (A)13 | 13 | 141,690 | 141,702 | IRa | |
| 49 | p1 | (C)13 | 13 | 146,738 | 146,750 | IRa | |
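The SSR types in the table use MISA-style notation, in which p1, p2, and p3 denote perfect mono-, di-, and trinucleotide repeats and c a compound SSR. The sketch below finds perfect SSRs by scanning for back-to-back copies of each motif. The minimum repeat numbers (10 for mononucleotides, 6 for dinucleotides, 5 for longer motifs) are assumed MISA-style thresholds, consistent with the smallest entries in the table but not confirmed as the authors' settings; compound SSRs are not merged.

```python
# Assumed MISA-style thresholds: motif length (bp) -> minimum number of repeats.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(unit):
    """False for units that are themselves repeats of a shorter motif (e.g. "AA", "ATAT")."""
    n = len(unit)
    return all(unit != unit[:d] * (n // d) for d in range(1, n) if n % d == 0)


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Perfect SSRs as (motif, repeats, 1-based start, 1-based inclusive end).

    Compound SSRs (type c) are not merged; each perfect run is reported separately.
    """
    seq = seq.upper()
    hits = []
    for k, min_rep in sorted(min_repeats.items()):
        i = 0
        while i + k <= len(seq):
            unit = seq[i:i + k]
            n = 1
            # Count how many times the unit repeats back-to-back from position i.
            while seq[i + n * k:i + (n + 1) * k] == unit:
                n += 1
            if n >= min_rep and set(unit) <= set("ACGT") and is_primitive(unit):
                hits.append((unit, n, i + 1, i + n * k))
                i += n * k  # skip past this run so it is not reported again
            else:
                i += 1
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    print(find_ssrs("G" + "A" * 10 + "C" + "AT" * 7 + "G"))
```

The reported motif phase depends on where the run starts, which is why the table lists both (AT)7 and (TA)7; the 1-based inclusive coordinates match the Start/End convention above, where a 10-bp run spans, for example, 1539 to 1548.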
Figure 7. Phylogeny of 47 taxa of Apioideae Drude based on maximum-likelihood (ML) analysis of the cp genome IR, LSC, and SSC regions, with Sanicula chinensis and Centella asiatica as outgroups, inferred with FastTree (left) and IQ-TREE (right). Information on all chloroplast genomes used in the phylogenetic analysis is given in Table S3.