Jun Qian, Jingyuan Song, Huanhuan Gao, Yingjie Zhu, Jiang Xu, Xiaohui Pang, Hui Yao, Chao Sun, Xian'en Li, Chuyuan Li, Juyan Liu, Haibin Xu, Shilin Chen.
Abstract
Salvia miltiorrhiza is an important medicinal plant with great economic and medicinal value. The complete chloroplast (cp) genome sequence of Salvia miltiorrhiza, the first sequenced member of the Lamiaceae family, is reported here. The genome is 151,328 bp in length and exhibits a typical quadripartite structure: a large (LSC, 82,695 bp) and a small (SSC, 17,555 bp) single-copy region, separated by a pair of inverted repeats (IRs, 25,539 bp each). It contains 114 unique genes, including 80 protein-coding genes, 30 tRNAs and four rRNAs. The genome structure, gene order, GC content and codon usage are similar to those of typical angiosperm cp genomes. Four forward, three inverted and seven tandem repeats were detected in the Salvia miltiorrhiza cp genome. Simple sequence repeat (SSR) analysis among the 30 asterid cp genomes revealed that most SSRs are AT-rich, which contributes to the overall AT richness of these cp genomes. Additionally, fewer SSRs are distributed in the protein-coding sequences than in the non-coding regions, indicating an uneven distribution of SSRs within the cp genomes. Whole cp genome comparison of Salvia miltiorrhiza and three other Lamiales cp genomes showed a high degree of sequence similarity and a relatively high divergence of intergenic spacers. Sequence divergence analysis identified the ten most divergent and ten most conserved genes as well as their length variation, which will be helpful for phylogenetic studies in asterids. Our analysis also supports the view that both regional and functional constraints affect gene sequence evolution. Further, phylogenetic analysis demonstrated a sister relationship between Salvia miltiorrhiza and Sesamum indicum. The complete cp genome sequence of Salvia miltiorrhiza reported in this paper will facilitate population, phylogenetic and cp genetic engineering studies of this medicinal plant.
Year: 2013 PMID: 23460883 PMCID: PMC3584094 DOI: 10.1371/journal.pone.0057607
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Base composition in the Salvia miltiorrhiza chloroplast genome.
| Region | T(U) (%) | C (%) | A (%) | G (%) | Length (bp) |
| LSC | 32.6 | 18.5 | 31.2 | 17.7 | 82,695 |
| SSC | 33.8 | 16.7 | 34.2 | 15.3 | 17,555 |
| IRa | 28.5 | 22.4 | 28.4 | 20.7 | 25,539 |
| IRb | 28.4 | 20.7 | 28.5 | 22.4 | 25,539 |
| Total | 31.3 | 19.3 | 30.6 | 18.7 | 151,328 |
| CDS | 31.4 | 17.8 | 30.5 | 20.3 | 79,080 |
| 1st position | 23.7 | 19.0 | 30.5 | 26.8 | 26,360 |
| 2nd position | 32.6 | 20.4 | 29.2 | 17.8 | 26,360 |
| 3rd position | 37.8 | 14.0 | 31.8 | 16.3 | 26,360 |
CDS: protein-coding regions.
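The base-composition figures can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the original analysis: it copies the percentages and lengths from the table, derives GC content as C % + G %, and checks that the four regions add up to the reported genome length. The names `COMPOSITION`, `gc_percent`, `weighted_gc` and `mirrored` are illustrative only.

```python
# Values copied from the base-composition table:
# (T(U) %, C %, A %, G %, length in bp).
COMPOSITION = {
    "LSC": (32.6, 18.5, 31.2, 17.7, 82_695),
    "SSC": (33.8, 16.7, 34.2, 15.3, 17_555),
    "IRa": (28.5, 22.4, 28.4, 20.7, 25_539),
    "IRb": (28.4, 20.7, 28.5, 22.4, 25_539),
    "Total": (31.3, 19.3, 30.6, 18.7, 151_328),
}


def gc_percent(region):
    """GC content of a region, computed as C % + G %."""
    _t, c, _a, g, _length = COMPOSITION[region]
    return round(c + g, 1)


def weighted_gc(regions):
    """Length-weighted mean GC content over several regions."""
    total_len = sum(COMPOSITION[r][4] for r in regions)
    return sum(gc_percent(r) * COMPOSITION[r][4] for r in regions) / total_len


def mirrored(region_a, region_b):
    """True if T/A and C/G are swapped between the two regions,
    as expected for reverse-complementary sequences."""
    ta, ca, aa, ga, _ = COMPOSITION[region_a]
    tb, cb, ab, gb, _ = COMPOSITION[region_b]
    return (ta, ca, aa, ga) == (ab, gb, tb, cb)


parts = ["LSC", "SSC", "IRa", "IRb"]
print("sum of region lengths:", sum(COMPOSITION[r][4] for r in parts))
for name in COMPOSITION:
    print(name, "GC %:", gc_percent(name))
print("length-weighted GC %:", round(weighted_gc(parts), 2))
print("IRa/IRb mirrored:", mirrored("IRa", "IRb"))
```

The region lengths sum to the 151,328 bp reported for the whole genome, and the swapped T/A and C/G values in the IRa and IRb rows are what one would expect when both copies of an inverted repeat are read on the same strand.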
Figure 1. Gene map of the Salvia miltiorrhiza chloroplast genome.
Genes drawn inside the circle are transcribed clockwise, and those outside are counterclockwise. Genes belonging to different functional groups are color-coded. The darker gray in the inner circle corresponds to GC content, while the lighter gray corresponds to AT content.
The genes with introns in the Salvia miltiorrhiza chloroplast genome and the length of the exons and introns.
| Gene | Location | Exon I (bp) | Intron I (bp) | Exon II (bp) | Intron II (bp) | Exon III (bp) |
|  | LSC | 144 | 699 | 411 |  |  |
|  | LSC | 71 | 692 | 292 | 628 | 228 |
|  | SSC | 553 | 985 | 539 |  |  |
|  | IR | 777 | 675 | 756 |  |  |
|  | LSC | 6 | 702 | 642 |  |  |
|  | LSC | 8 | 720 | 475 |  |  |
|  | LSC | 9 | 873 | 399 |  |  |
|  | IR | 391 | 658 | 434 |  |  |
|  | LSC | 456 | 759 | 1620 |  |  |
|  | LSC | 114 | - | 232 | 526 | 26 |
|  | LSC | 42 | 874 | 195 |  |  |
|  | IR | 38 | 795 | 35 |  |  |
|  | LSC | 23 | 682 | 48 |  |  |
|  | IR | 37 | 940 | 35 |  |  |
|  | LSC | 37 | 2522 | 35 |  |  |
|  | LSC | 37 | 453 | 50 |  |  |
|  | LSC | 36 | 576 | 37 |  |  |
|  | LSC | 129 | 696 | 228 | 726 | 153 |
rps12 is a trans-spliced gene, with its 5′ end located in the LSC region and its duplicated 3′ end in the IR regions.
The codon–anticodon recognition pattern and codon usage for the Salvia miltiorrhiza chloroplast genome.
| Amino acid | Codon | No. | RSCU | tRNA | Amino acid | Codon | No. | RSCU | tRNA |
| Phe | UUU | 999 | 1.33 |  | Tyr | UAU | 771 | 1.63 |  |
| Phe | UUC | 499 | 0.67 |  | Tyr | UAC | 175 | 0.37 |  |
| Leu | UUA | 860 | 1.84 |  | Stop | UAA | 46 | 1.6 |  |
| Leu | UUG | 569 | 1.22 |  | Stop | UAG | 22 | 0.77 |  |
| Leu | CUU | 609 | 1.3 |  | His | CAU | 479 | 1.53 |  |
| Leu | CUC | 180 | 0.38 |  | His | CAC | 146 | 0.47 |  |
| Leu | CUA | 394 | 0.84 |  | Gln | CAA | 723 | 1.54 |  |
| Leu | CUG | 194 | 0.41 |  | Gln | CAG | 217 | 0.46 |  |
| Ile | AUU | 1096 | 1.48 |  | Asn | AAU | 967 | 1.54 |  |
| Ile | AUC | 461 | 0.62 |  | Asn | AAC | 289 | 0.46 |  |
| Ile | AUA | 667 | 0.9 |  | Lys | AAA | 1062 | 1.48 |  |
| Met | AUG | 629 | 1 |  | Lys | AAG | 370 | 0.52 |  |
| Val | GUU | 526 | 1.46 |  | Asp | GAU | 867 | 1.6 |  |
| Val | GUC | 178 | 0.5 |  | Asp | GAC | 216 | 0.4 |  |
| Val | GUA | 542 | 1.51 |  | Glu | GAA | 1016 | 1.5 |  |
| Val | GUG | 191 | 0.53 |  | Glu | GAG | 343 | 0.5 |  |
| Ser | UCU | 584 | 1.7 |  | Cys | UGU | 222 | 1.52 |  |
| Ser | UCC | 344 | 1 |  | Cys | UGC | 70 | 0.48 |  |
| Ser | UCA | 398 | 1.16 |  | Stop | UGA | 18 | 0.63 |  |
| Ser | UCG | 200 | 0.58 |  | Trp | UGG | 470 | 1 |  |
| Pro | CCU | 405 | 1.44 |  | Arg | CGU | 338 | 1.28 |  |
| Pro | CCC | 234 | 0.83 |  | Arg | CGC | 121 | 0.46 |  |
| Pro | CCA | 324 | 1.15 |  | Arg | CGA | 353 | 1.33 |  |
| Pro | CCG | 161 | 0.57 |  | Arg | CGG | 131 | 0.49 |  |
| Thr | ACU | 539 | 1.63 |  | Arg | AGA | 488 | 1.84 |  |
| Thr | ACC | 247 | 0.75 |  | Arg | AGG | 159 | 0.6 |  |
| Thr | ACA | 388 | 1.17 |  | Ser | AGU | 420 | 1.22 |  |
| Thr | ACG | 150 | 0.45 |  | Ser | AGC | 115 | 0.33 |  |
| Ala | GCU | 603 | 1.73 |  | Gly | GGU | 539 | 1.21 |  |
| Ala | GCC | 234 | 0.67 |  | Gly | GGC | 191 | 0.43 |  |
| Ala | GCA | 393 | 1.13 |  | Gly | GGA | 720 | 1.62 |  |
| Ala | GCG | 167 | 0.48 |  | Gly | GGG | 331 | 0.74 |  |
RSCU: Relative Synonymous Codon Usage.
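For readers unfamiliar with the measure, RSCU is conventionally defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below applies that standard definition to a few codon families from the table. It is an illustration, not the authors' pipeline, and the function name `rscu` is hypothetical.

```python
def rscu(counts):
    """Relative synonymous codon usage for ONE synonymous codon family.

    counts maps each codon of the family to its observed count.
    RSCU = observed / (family total / number of codons in the family).
    """
    n_codons = len(counts)
    total = sum(counts.values())
    return {codon: n_codons * count / total for codon, count in counts.items()}


# Counts copied from the codon usage table.
phe = {"UUU": 999, "UUC": 499}
stop = {"UAA": 46, "UAG": 22, "UGA": 18}
leu = {"UUA": 860, "UUG": 569, "CUU": 609, "CUC": 180, "CUA": 394, "CUG": 194}

for family in (phe, stop, leu):
    for codon, value in rscu(family).items():
        print(codon, round(value, 2))
```

A value above 1 means the codon is used more often than expected under equal usage. In the table, the codons with the highest RSCU in each family mostly end in A or U, consistent with the AT richness noted in the abstract.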
Repeated sequences in the Salvia miltiorrhiza chloroplast genome.
| Repeat number | Size (bp) | Type | Location | Repeat Unit | Region |
| 1 | 30 | F |  |  | LSC |
| 2 | 32 | F |  |  | LSC |
| 3 | 39 | F |  |  | LSC, SSC |
| 4 | 41 | F | IGS ( |  | IRb, SSC |
| 5 | 30 | I |  |  | LSC |
| 6 | 30 | I |  |  | LSC |
| 7 | 41 | I |  |  | SSC, IRa |
| 8 | 40 | T | IGS ( |  | LSC |
| 9 | 32 | T | IGS ( |  | LSC |
| 10 | 33 | T | IGS ( |  | LSC |
| 11 | 34 | T | IGS ( |  | LSC |
| 12 | 63 | T |  |  | IRb,a |
| 13 | 108 | T |  |  | IRb,a |
| 14 | 39 | T |  |  | SSC |
F: Forward; I: Inverted; T: Tandem; IGS: Intergenic spacer; CDS: protein-coding regions. The underline represents the shared repeats with Sesamum indicum.
Distribution of SSRs present in the 30 asterid chloroplast genomes.
| Taxon | Genome size (bp) | AT (%) | Mono | Di | Tri | Tetra | Penta | Hexa | Total SSRs | CDS (% of genome) | SSRs in CDS (No.) | SSRs in CDS (%) |
|  | 150,698 | 63 | 115 | 35 | 4 | 7 | 0 | 1 | 162 | 49 | 50 | 31 |
|  | 154,719 | 63 | 141 | 62 | 3 | 8 | 2 | 1 | 217 | 50 | 68 | 31 |
|  | 156,687 | 62 | 117 | 46 | 2 | 9 | 0 | 0 | 174 | 51 | 50 | 29 |
|  | 153,493 | 62 | 98 | 40 | 3 | 8 | 0 | 0 | 149 | 52 | 46 | 31 |
|  | 155,189 | 63 | 115 | 46 | 3 | 4 | 0 | 0 | 168 | 51 | 57 | 34 |
|  | 155,871 | 62 | 109 | 40 | 3 | 8 | 0 | 0 | 160 | 52 | 53 | 33 |
|  | 155,911 | 62 | 133 | 56 | 6 | 8 | 2 | 0 | 205 | 50 | 69 | 34 |
|  | 156,768 | 62 | 109 | 47 | 2 | 7 | 0 | 0 | 165 | 50 | 56 | 34 |
|  | 151,762 | 62 | 119 | 42 | 2 | 8 | 0 | 1 | 172 | 52 | 71 | 41 |
|  | 151,104 | 62 | 119 | 33 | 4 | 4 | 0 | 0 | 160 | 51 | 63 | 39 |
|  | 162,046 | 63 | 146 | 38 | 5 | 13 | 1 | 1 | 204 | 53 | 67 | 33 |
|  | 150,689 | 63 | 124 | 51 | 8 | 8 | 0 | 0 | 191 | 51 | 55 | 29 |
|  | 165,121 | 62 | 149 | 42 | 8 | 9 | 3 | 3 | 214 | 50 | 81 | 38 |
|  | 152,765 | 62 | 118 | 49 | 3 | 2 | 0 | 0 | 172 | 48 | 43 | 25 |
|  | 155,941 | 62 | 118 | 41 | 5 | 9 | 1 | 0 | 174 | 54 | 62 | 36 |
|  | 155,943 | 62 | 118 | 41 | 5 | 9 | 1 | 0 | 174 | 54 | 62 | 36 |
|  | 155,745 | 62 | 122 | 43 | 4 | 8 | 1 | 0 | 178 | 54 | 58 | 33 |
|  | 155,863 | 62 | 119 | 41 | 3 | 10 | 1 | 0 | 174 | 56 | 62 | 36 |
|  | 155,888 | 62 | 155 | 35 | 0 | 4 | 2 | 0 | 196 | 51 | 50 | 26 |
|  | 155,862 | 62 | 152 | 36 | 0 | 3 | 2 | 0 | 193 | 51 | 46 | 24 |
|  | 155,875 | 62 | 153 | 35 | 0 | 3 | 2 | 0 | 193 | 51 | 46 | 24 |
|  | 155,896 | 62 | 153 | 36 | 0 | 3 | 2 | 0 | 194 | 51 | 46 | 24 |
|  | 155,942 | 62 | 153 | 36 | 0 | 4 | 2 | 0 | 195 | 51 | 45 | 23 |
|  | 156,318 | 62 | 92 | 39 | 3 | 8 | 2 | 1 | 145 | 50 | 54 | 37 |
|  | 151,328 | 62 | 122 | 35 | 0 | 8 | 0 | 1 | 166 | 52 | 53 | 32 |
|  | 153,324 | 62 | 137 | 38 | 3 | 7 | 0 | 1 | 186 | 51 | 54 | 29 |
|  | 155,371 | 62 | 106 | 37 | 2 | 8 | 1 | 1 | 155 | 51 | 48 | 31 |
|  | 155,461 | 62 | 114 | 33 | 1 | 7 | 1 | 0 | 156 | 51 | 46 | 29 |
|  | 155,296 | 62 | 103 | 36 | 2 | 8 | 1 | 0 | 150 | 51 | 46 | 31 |
|  | 162,321 | 62 | 96 | 47 | 5 | 17 | 0 | 1 | 166 | 43 | 44 | 27 |
CDS: protein-coding regions.
CDS length percentage: the total length of the CDS divided by the genome size.
Number of SSRs in the CDS (No.): the total number of SSRs identified in the CDS.
Percentage of SSRs in the CDS: the total number of SSRs in the CDS divided by the total number of SSRs in the genome.
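The two percentage definitions in the footnotes can be reproduced for the table row whose genome size (151,328 bp) matches the Salvia miltiorrhiza genome. This is a minimal sketch under two assumptions: that this row is indeed the Salvia miltiorrhiza entry, and that the CDS length is the 79,080 bp given in the base-composition table. The helper name `percent` is hypothetical.

```python
def percent(part, whole):
    """Percentage of part in whole, rounded to the nearest integer."""
    return round(100 * part / whole)


# Values copied from the tables for the 151,328 bp genome.
genome_size = 151_328
cds_length = 79_080  # from the base-composition table (assumed to apply here)
ssr_by_type = {"mono": 122, "di": 35, "tri": 0, "tetra": 8, "penta": 0, "hexa": 1}
ssr_in_cds = 53

total_ssr = sum(ssr_by_type.values())
cds_length_pct = percent(cds_length, genome_size)  # CDS length / genome size
cds_ssr_pct = percent(ssr_in_cds, total_ssr)       # SSRs in CDS / all SSRs

print("total SSRs:", total_ssr)
print("CDS as % of genome:", cds_length_pct)
print("SSRs in CDS as % of all SSRs:", cds_ssr_pct)
```

With these inputs the CDS makes up about half of the genome but holds only about a third of the SSRs, which is the uneven distribution the abstract describes.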
Figure 2. Comparison of four chloroplast genomes using the mVISTA program.
Grey arrows and thick black lines above the alignment indicate genes with their orientation and the position of the IRs, respectively. A cut-off of 70% identity was used for the plots, and the Y-scale represents the percent identity between 50–100%. Genome regions are color-coded as protein-coding (exon), rRNA, tRNA and conserved noncoding sequences (CNS).
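To make the idea of an identity plot with a cut-off concrete, the sketch below computes percent identity in fixed windows along a pairwise alignment and reports which windows reach a 70% threshold. It is a simplified illustration and not mVISTA's actual algorithm: the function name `windowed_identity`, the window and step sizes, and the toy sequences are all assumptions.

```python
def windowed_identity(aln_a, aln_b, window=100, step=100):
    """Percent identity per window over two aligned sequences of equal length.

    Gap characters ('-') never count as matches. Returns a list of
    (window_start, percent_identity) tuples.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    results = []
    for start in range(0, len(aln_a) - window + 1, step):
        pairs = zip(aln_a[start:start + window], aln_b[start:start + window])
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        results.append((start, 100.0 * matches / window))
    return results


# Toy alignment (hypothetical data): identical first half, divergent second half.
seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGTACTTTTTTTTTT"

profile = windowed_identity(seq_a, seq_b, window=10, step=10)
conserved = [start for start, identity in profile if identity >= 70.0]
print(profile)
print("windows at or above the 70% cut-off:", conserved)
```

In a real comparison the inputs would come from a whole-genome alignment, and the window length would be chosen to suit the scale of the features being compared.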
Figure 3. Comparison of the borders of LSC, SSC and IR regions among four chloroplast genomes.
The IRb/SSC border extended into the ycf1 genes to create various lengths of ycf1 pseudogenes among four chloroplast genomes. The ycf1 pseudogene and the ndhF gene overlapped in both the Salvia miltiorrhiza and Arabidopsis thaliana cp genomes by 32 bp and 37 bp, respectively. Various lengths of rps19 pseudogenes were created at the IRa/LSC borders of Salvia miltiorrhiza, Sesamum indicum and Arabidopsis thaliana. This figure is not to scale.
Figure 4. The maximum parsimony (MP) phylogenetic tree of the asterid clade based on 71 protein-coding genes.
The MP tree has a length of 36,088, with a consistency index of 0.6628 and a retention index of 0.7561. Numbers above each node are bootstrap support values. Spinacia oleracea and Arabidopsis thaliana were set as outgroups.