Xiang-Xiao Meng, Yan-Fang Xian, Li Xiang, Dong Zhang, Yu-Hua Shi, Ming-Li Wu, Gang-Qiang Dong, Siu-Po Ip, Zhi-Xiu Lin, Lan Wu, Wei Sun.
Abstract
The genus Sanguisorba, which contains about 30 species around the world and seven species in China, is the source of the medicinal plant Sanguisorba officinalis, which is commonly used as a hemostatic agent as well as to treat burns and scalds. Here we report the complete chloroplast (cp) genome sequences of four Sanguisorba species (S. officinalis, S. filiformis, S. stipulata, and S. tenuifolia var. alba). These four Sanguisorba cp genomes exhibit typical quadripartite and circular structures, and are 154,282 to 155,479 bp in length, consisting of large single-copy regions (LSC; 84,405–85,557 bp), small single-copy regions (SSC; 18,550–18,768 bp), and a pair of inverted repeats (IRs; 25,576–25,615 bp). The average GC content was ~37.24%. The four Sanguisorba cp genomes harbored 112 different genes arranged in the same order; these include 78 protein-coding genes, 30 tRNA genes, and four rRNA genes, if duplicated genes in IR regions are counted only once. A total of 39–53 long repeats and 79–91 simple sequence repeats (SSRs) were identified in the four Sanguisorba cp genomes, which provides opportunities for future studies of the population genetics of Sanguisorba medicinal plants. A phylogenetic analysis using the maximum parsimony (MP) method strongly supports a close relationship between S. officinalis and S. tenuifolia var. alba, followed by S. stipulata, and finally S. filiformis. The availability of these cp genomes provides valuable genetic information for future studies of Sanguisorba identification and provides insights into the evolution of the genus Sanguisorba.
Keywords: Sanguisorba; chloroplast genome; molecular structure; phylogenetic analysis
Year: 2018 PMID: 30149578 PMCID: PMC6225366 DOI: 10.3390/molecules23092137
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Sequence information and Illumina next-generation sequencing (NGS) data of the four Sanguisorba chloroplast genomes.
| Species | Raw Reads No. | Mapped Reads No. | Sequencing Depth | Cp Genome Length (bp) | GC Content (%) | LSC a (bp) | SSC a (bp) | IRs a (bp) |
|---|---|---|---|---|---|---|---|---|
| | 11,554,422 | 609,666 | 581X | 155,479 | 37.19 | 85,547 | 18,768 | 25,582 |
| | 16,876,554 | 656,271 | 628X | 154,282 | 37.33 | 84,405 | 18,659 | 25,609 |
| | 18,828,898 | 1,080,144 | 1032X | 155,127 | 37.23 | 85,347 | 18,550 | 25,615 |
| | 18,366,336 | 598,166 | 569X | 155,457 | 37.20 | 85,557 | 18,748 | 25,576 |
a LSC: large single-copy region; SSC: small single-copy region; IRs: inverted repeat regions.
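The quadripartite structure implies that each genome length should equal LSC + SSC + 2 × IR. The following minimal Python sketch checks that identity against the values in the table above. The names `ROWS`, `quadripartite_length`, and `check_rows` are illustrative and not part of the original study, and rows are identified by position because the species labels are not recoverable from this record.

```python
# Region lengths (bp) copied from the table above, one tuple per table row:
# (reported total, LSC, SSC, one IR copy). Row order follows the table;
# species labels are missing from this record, so none are assumed here.
ROWS = [
    (155_479, 85_547, 18_768, 25_582),
    (154_282, 84_405, 18_659, 25_609),
    (155_127, 85_347, 18_550, 25_615),
    (155_457, 85_557, 18_748, 25_576),
]


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Length of a circular cp genome made of LSC, SSC and two IR copies."""
    return lsc + ssc + 2 * ir


def check_rows(rows=ROWS) -> list:
    """Return one boolean per row: does LSC + SSC + 2*IR match the total?"""
    return [
        quadripartite_length(lsc, ssc, ir) == total
        for total, lsc, ssc, ir in rows
    ]


for number, ok in enumerate(check_rows(), start=1):
    print(f"row {number}: {'consistent' if ok else 'mismatch'}")
```

With the tabulated values, every row satisfies the identity, which is a quick sanity check on the transcription of the table.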
Figure 1. Gene map of Sanguisorba officinalis chloroplast genome. Genes shown inside the circle are transcribed clockwise, and those outside are counterclockwise. Genes in different functional groups are color-coded.
List of genes encoded by the four Sanguisorba chloroplast genomes.
| Category | Group | Name |
|---|---|---|
| Self-replication | rRNA genes | |
| | tRNA genes | |
| | Small subunit of ribosome | |
| | Large subunit of ribosome | |
| | DNA-dependent RNA polymerase | |
| Genes for photosynthesis | Subunits of NADH-dehydrogenase | |
| | Subunits of photosystem I | |
| | Subunits of photosystem II | |
| | Subunits of cytochrome b/f complex | |
| | Subunits of ATP synthase | |
| | Large subunit of RuBisCO | |
| Other genes | Maturase | |
| | Envelope membrane protein | |
| | Subunit of Acetyl-CoA-carboxylase | |
| | C-type cytochrome synthesis gene | |
| | Protease | |
| Genes of unknown function | Open Reading Frames (ORF, ycf) | |
| | Pseudo genes | |
* Gene with one intron, ** Gene with two introns, a Gene with two copies.
The length of exons and introns in genes with introns in the Sanguisorba officinalis chloroplast genome.
| No. | Gene | Location | Exon I (bp) | Intron I (bp) | Exon II (bp) | Intron II (bp) | Exon III (bp) |
|---|---|---|---|---|---|---|---|
| 1 | | LSC | 69 | 938 | 291 | 658 | 228 |
| 2 | | SSC | 563 | 1185 | 541 | | |
| 3 | | IR | 777 | 682 | 756 | | |
| 4 | | LSC | 6 | 761 | 657 | | |
| 5 | | LSC | 9 | 750 | 474 | | |
| 6 | | LSC | 8 | 1011 | 403 | | |
| 7 | | IR | 391 | 673 | 434 | | |
| 8 | | LSC | 435 | 749 | 1620 | | |
| 9 | | LSC | 114 | - | 232 | 543 | 26 |
| 10 | | LSC | 39 | 899 | 228 | | |
| 11 | | IR | 38 | 814 | 35 | | |
| 12 | | LSC | 23 | 698 | 48 | | |
| 13 | | IR | 42 | 949 | 35 | | |
| 14 | | LSC | 37 | 2516 | 35 | | |
| 15 | | LSC | 37 | 554 | 50 | | |
| 16 | | LSC | 39 | 601 | 37 | | |
| 17 | | LSC | 126 | 723 | 228 | 766 | 153 |
* rps12 is a trans-spliced gene, of which two 3′ end residues are located in the IR region and the 5′ end in the LSC region.
Codon usage in the Sanguisorba officinalis chloroplast genome. RSCU: Relative Synonymous Codon Usage.
| Amino Acid | Codon | Count | RSCU | tRNA | Amino Acid | Codon | Count | RSCU | tRNA |
|---|---|---|---|---|---|---|---|---|---|
| Phe | UUU | 899 | 1.38 | | Tyr | UAU | 682 | 1.61 | |
| Phe | UUC | 401 | 0.62 | | Tyr | UAC | 165 | 0.39 | |
| Leu | UUA | 810 | 2.03 | | Stop | UAA | 43 | 1.65 | |
| Leu | UUG | 466 | 1.17 | | Stop | UAG | 20 | 0.77 | |
| Leu | CUU | 503 | 1.26 | | His | CAU | 403 | 1.51 | |
| Leu | CUC | 148 | 0.37 | | His | CAC | 132 | 0.49 | |
| Leu | CUA | 306 | 0.77 | | Gln | CAA | 616 | 1.53 | |
| Leu | CUG | 156 | 0.39 | | Gln | CAG | 191 | 0.47 | |
| Ile | AUU | 983 | 1.5 | | Asn | AAU | 825 | 1.53 | |
| Ile | AUC | 369 | 0.56 | | Asn | AAC | 256 | 0.47 | |
| Ile | AUA | 614 | 0.94 | | Lys | AAA | 925 | 1.54 | |
| Met | AUG | 531 | 1 | | Lys | AAG | 280 | 0.46 | |
| Val | GUU | 471 | 1.48 | | Asp | GAU | 712 | 1.62 | |
| Val | GUC | 152 | 0.48 | | Asp | GAC | 168 | 0.38 | |
| Val | GUA | 474 | 1.48 | | Glu | GAA | 904 | 1.52 | |
| Val | GUG | 180 | 0.56 | | Glu | GAG | 287 | 0.48 | |
| Ser | UCU | 469 | 1.69 | | Cys | UGU | 204 | 1.56 | |
| Ser | UCC | 263 | 0.95 | | Cys | UGC | 57 | 0.44 | |
| Ser | UCA | 306 | 1.1 | | Stop | UGA | 15 | 0.58 | |
| Ser | UCG | 171 | 0.61 | | Trp | UGG | 396 | 1 | |
| Pro | CCU | 352 | 1.47 | | Arg | CGU | 307 | 1.36 | |
| Pro | CCC | 198 | 0.83 | | Arg | CGC | 95 | 0.42 | |
| Pro | CCA | 257 | 1.08 | | Arg | CGA | 312 | 1.38 | |
| Pro | CCG | 149 | 0.62 | | Arg | CGG | 103 | 0.46 | |
| Thr | ACU | 465 | 1.59 | | Ser | AGU | 349 | 1.25 | |
| Thr | ACC | 224 | 0.76 | | Ser | AGC | 111 | 0.4 | |
| Thr | ACA | 348 | 1.19 | | Arg | AGA | 391 | 1.74 | |
| Thr | ACG | 135 | 0.46 | | Arg | AGG | 144 | 0.64 | |
| Ala | GCU | 576 | 1.79 | | Gly | GGU | 524 | 1.32 | |
| Ala | GCC | 201 | 0.63 | | Gly | GGC | 192 | 0.48 | |
| Ala | GCA | 348 | 1.08 | | Gly | GGA | 568 | 1.43 | |
| Ala | GCG | 161 | 0.5 | | Gly | GGG | 305 | 0.77 | |
| Average # codons = 22,768 | | | | | | | | | |
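The RSCU values in the table above follow the usual definition: a codon's observed count divided by the count expected if all synonymous codons for that amino acid were used equally. A minimal Python sketch, assuming that standard definition and using three families copied from the table (the names `COUNTS` and `rscu` are illustrative, not from the original study):

```python
# Codon counts copied from the table above for three synonymous families.
COUNTS = {
    "Phe": {"UUU": 899, "UUC": 401},
    "Leu": {"UUA": 810, "UUG": 466, "CUU": 503,
            "CUC": 148, "CUA": 306, "CUG": 156},
    "Stop": {"UAA": 43, "UAG": 20, "UGA": 15},
}


def rscu(family: dict) -> dict:
    """RSCU = count * (number of synonymous codons) / (total family count)."""
    total = sum(family.values())
    n_synonymous = len(family)
    return {
        codon: round(count * n_synonymous / total, 2)
        for codon, count in family.items()
    }


for amino_acid, family in COUNTS.items():
    print(amino_acid, rscu(family))
```

Values above 1 mark codons used more often than expected under equal usage. For these three families the sketch reproduces the tabulated values, for example 2.03 for UUA.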
Types and numbers of SSRs found in the four Sanguisorba chloroplast genomes.
| SSR Type | Repeat Unit | Number | | | |
|---|---|---|---|---|---|
| Mono | A/T | 55 | 56 | 68 | 53 |
| | C/G | 4 | 3 | 1 | 2 |
| Di | AT/AT | 11 | 9 | 8 | 11 |
| | AG/CT | 1 | 1 | 1 | 1 |
| Tri | AAT/ATT | 3 | 4 | 3 | 4 |
| Tetra | AAAT/ATTT | 4 | 3 | 5 | 4 |
| | AAAG/CTTT | 1 | 1 | 1 | 1 |
| | ACAT/ATGT | 1 | 1 | 1 | 1 |
| | AGAT/ATCT | 1 | 1 | 1 | 1 |
| | AATT/AATT | 0 | 1 | 1 | 0 |
| Penta | AAATT/AATTT | 1 | 0 | 0 | 1 |
| Hexa | AAAGGG/CCCTTT | 0 | 2 | 0 | 0 |
| | AAAATC/ATTTTG | 0 | 0 | 1 | 0 |
| Total | | 82 | 82 | 91 | 79 |
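As a quick consistency check on the SSR table above, the per-genome totals can be recomputed from the individual repeat-unit counts, along with the share of A/T mononucleotide repeats. This is a minimal Python sketch. The names `SSR_COUNTS`, `column_totals`, and `at_mono_share` are illustrative, and the four columns are kept in table order because the species labels are missing from this record.

```python
# SSR counts copied from the table above; each list holds the four genome
# columns in table order (species labels are not recoverable here).
SSR_COUNTS = {
    "A/T": [55, 56, 68, 53],
    "C/G": [4, 3, 1, 2],
    "AT/AT": [11, 9, 8, 11],
    "AG/CT": [1, 1, 1, 1],
    "AAT/ATT": [3, 4, 3, 4],
    "AAAT/ATTT": [4, 3, 5, 4],
    "AAAG/CTTT": [1, 1, 1, 1],
    "ACAT/ATGT": [1, 1, 1, 1],
    "AGAT/ATCT": [1, 1, 1, 1],
    "AATT/AATT": [0, 1, 1, 0],
    "AAATT/AATTT": [1, 0, 0, 1],
    "AAAGGG/CCCTTT": [0, 2, 0, 0],
    "AAAATC/ATTTTG": [0, 0, 1, 0],
}
REPORTED_TOTALS = [82, 82, 91, 79]


def column_totals(counts: dict) -> list:
    """Sum each genome column across all repeat units."""
    return [sum(column) for column in zip(*counts.values())]


def at_mono_share(counts: dict) -> list:
    """Fraction of each genome's SSRs that are A/T mononucleotide repeats."""
    totals = column_totals(counts)
    return [round(at / total, 2) for at, total in zip(counts["A/T"], totals)]


print(column_totals(SSR_COUNTS) == REPORTED_TOTALS)
print(at_mono_share(SSR_COUNTS))
```

The recomputed totals match the reported row, and A/T mononucleotide repeats make up roughly two thirds to three quarters of the SSRs in each genome.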
Figure 2. Comparison of the border regions of the LSC, SSC, and IR among four chloroplast genomes. Ψ: pseudogenes.
Figure 3. Comparison of the four Sanguisorba chloroplast genomes using mVISTA. CNS indicates conserved noncoding sequences. The Y-scale represents the percent identity between 50% and 100%.
Figure 4. Phylogenetic relationships between the four Sanguisorba species determined by whole cp genome sequences using the maximum parsimony (MP) method. Fragaria chiloensis was set as the outgroup.