Beibei Xiang, Xiaoxue Li, Jun Qian, Lizhi Wang, Lin Ma, Xiaoxuan Tian, Yong Wang.
Abstract
Swertia mussotii is a medicinal plant of considerable economic and medicinal value that is found on the Qinghai-Tibetan Plateau. The complete chloroplast (cp) genome of S. mussotii is 153,431 bp in size, with a pair of inverted repeat (IR) regions of 25,761 bp each that separate a large single-copy (LSC) region of 83,567 bp from a small single-copy (SSC) region of 18,342 bp. The S. mussotii cp genome encodes 84 protein-coding genes, 37 transfer RNA (tRNA) genes, and eight ribosomal RNA (rRNA) genes. The identity, number, and GC content of S. mussotii cp genes were similar to those in the genomes of other Gentianales species. Analysis of the repeat structure detected 11 forward repeats, eight palindromic repeats, and one reverse repeat in the S. mussotii cp genome. There are 45 SSRs in the S. mussotii cp genome, the majority of which are mononucleotide repeats, as in the other Gentianales species compared. A whole-cp-genome comparison of S. mussotii with two other species in Gentianaceae was also conducted. The complete cp genome sequence provides intragenic information for the cp genetic engineering of this medicinal plant.
Keywords: Gentianaceae; PacBio RS; Swertia mussotii; chloroplast genome; medicinal plant
Year: 2016 PMID: 27517885 PMCID: PMC6274542 DOI: 10.3390/molecules21081029
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Base composition in the S. mussotii chloroplast genome.
| Region | T (U) (%) | C (%) | A (%) | G (%) | Length (bp) |
|---|---|---|---|---|---|
| LSC | 32.6 | 18.5 | 31.2 | 17.7 | 83,567 |
| SSC | 34.1 | 16.3 | 34.0 | 15.6 | 18,342 |
| IRa | 28.3 | 22.5 | 28.2 | 21.0 | 25,761 |
| IRb | 28.2 | 21.0 | 28.3 | 22.5 | 25,761 |
| Total | 31.3 | 19.3 | 30.5 | 18.8 | 153,431 |
| CDS | 31.3 | 18.1 | 30.2 | 20.4 | 77,193 |
| 1st position | 23.9 | 19.2 | 30.4 | 26.5 | 25,731 |
| 2nd position | 32.6 | 20.6 | 28.7 | 18.1 | 25,731 |
| 3rd position | 37.2 | 14.6 | 31.6 | 16.6 | 25,731 |
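As a quick consistency check on the base composition table, the four region lengths should sum to the genome size, and the length-weighted GC content of the regions should reproduce the Total row (19.3% C + 18.8% G = 38.1%). The following is a minimal Python sketch; the dictionary layout and function names are illustrative and are not taken from the paper.

```python
# Region lengths (bp) and C/G percentages copied from the base composition table.
REGIONS = {
    "LSC": {"length": 83567, "C": 18.5, "G": 17.7},
    "SSC": {"length": 18342, "C": 16.3, "G": 15.6},
    "IRa": {"length": 25761, "C": 22.5, "G": 21.0},
    "IRb": {"length": 25761, "C": 21.0, "G": 22.5},
}


def genome_length(regions):
    """Sum the lengths of all regions (bp)."""
    return sum(r["length"] for r in regions.values())


def weighted_gc(regions):
    """Length-weighted GC percentage across all regions."""
    total = genome_length(regions)
    return sum(r["length"] * (r["C"] + r["G"]) for r in regions.values()) / total


print(genome_length(REGIONS))          # 153431
print(round(weighted_gc(REGIONS), 1))  # 38.1
```

Both values agree with the Total row, which suggests the per-region figures are internally consistent.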
Figure 1. Gene map of the S. mussotii chloroplast genome. Genes drawn inside the circle are transcribed clockwise, and those outside are transcribed counterclockwise. Genes are colour-coded based on the functional groups to which they belong. CDS: protein-coding regions.
Genes present in the S. mussotii chloroplast genome.
| No. | Group of Genes | Gene Names |
|---|---|---|
| 1 | Photosystem I | |
| 2 | Photosystem II | |
| 3 | Cytochrome b/f complex | |
| 4 | ATP synthase | |
| 5 | NADH dehydrogenase | |
| 6 | RuBisCO large subunit | |
| 7 | RNA polymerase | |
| 8 | Ribosomal proteins (SSU) | |
| 9 | Ribosomal proteins (LSU) | |
| 10 | Other genes | |
| 11 | Proteins of unknown function | |
| 12 | Transfer RNAs | 37 tRNAs (6 contain one intron each, 7 in the IRs) |
| 13 | Ribosomal RNAs | |
One or two asterisks after a gene name indicate that the gene contains one or two introns, respectively.
The genes with introns in the S. mussotii chloroplast genome and the lengths of the exons and introns.
| Gene | Location | Exon I (bp) | Intron I (bp) | Exon II (bp) | Intron II (bp) | Exon III (bp) |
|---|---|---|---|---|---|---|
| | LSC | 161 | 700 | 403 | | |
| | LSC | 71 | 784 | 292 | 680 | 228 |
| | SSC | 561 | 1117 | 540 | | |
| | IR | 777 | 683 | 756 | | |
| | LSC | 6 | 727 | 642 | | |
| | LSC | 8 | 678 | 475 | | |
| | LSC | 9 | 764 | 399 | | |
| | IR | 393 | 657 | 435 | | |
| | LSC | 435 | 734 | 1623 | | |
| | LSC | 114 | - | 232 | 535 | 26 |
| | IR | 38 | 824 | 35 | | |
| | LSC | 23 | 689 | 48 | | |
| | IR | 37 | 950 | 35 | | |
| | LSC | 37 | 2496 | 35 | | |
| | LSC | 37 | 374 | 50 | | |
| | LSC | 38 | 601 | 37 | | |
| | LSC | 126 | 745 | 228 | 770 | 153 |
* The rps12 gene is a trans-spliced gene with the 5′ end located in the LSC region and the duplicated 3′ end in the IR region.
The codon-anticodon recognition pattern and codon usage for the S. mussotii chloroplast genome.
| Amino Acid | Codon | No. | RSCU | tRNA | Amino Acid | Codon | No. | RSCU | tRNA |
|---|---|---|---|---|---|---|---|---|---|
| Phe | UUU | 981 | 1.32 | | Tyr | UAU | 738 | 1.59 | |
| Phe | UUC | 507 | 0.68 | | Tyr | UAC | 189 | 0.41 | |
| Leu | UUA | 847 | 1.84 | | Stop | UAA | 49 | 1.75 | |
| Leu | UUG | 551 | 1.19 | | Stop | UAG | 21 | 0.75 | |
| Leu | CUU | 610 | 1.32 | | His | CAU | 467 | 1.5 | |
| Leu | CUC | 187 | 0.41 | | His | CAC | 157 | 0.5 | |
| Leu | CUA | 392 | 0.85 | | Gln | CAA | 698 | 1.54 | |
| Leu | CUG | 182 | 0.39 | | Gln | CAG | 207 | 0.46 | |
| Ile | AUU | 1047 | 1.47 | | Asn | AAU | 920 | 1.5 | |
| Ile | AUC | 435 | 0.61 | | Asn | AAC | 303 | 0.5 | |
| Ile | AUA | 660 | 0.92 | | Lys | AAA | 988 | 1.45 | |
| Met | AUG | 582 | 1 | | Lys | AAG | 377 | 0.55 | |
| Val | GUU | 510 | 1.45 | | Asp | GAU | 802 | 1.61 | |
| Val | GUC | 187 | 0.53 | | Asp | GAC | 194 | 0.39 | |
| Val | GUA | 528 | 1.5 | | Glu | GAA | 923 | 1.45 | |
| Val | GUG | 185 | 0.52 | | Glu | GAG | 350 | 0.55 | |
| Ser | UCU | 540 | 1.6 | | Cys | UGU | 221 | 1.52 | |
| Ser | UCC | 352 | 1.04 | | Cys | UGC | 70 | 0.48 | |
| Ser | UCA | 382 | 1.13 | | Stop | UGA | 14 | 0.5 | |
| Ser | UCG | 221 | 0.66 | | Trp | UGG | 461 | 1 | |
| Pro | CCU | 395 | 1.42 | | Arg | CGU | 339 | 1.28 | |
| Pro | CCC | 234 | 0.84 | | Arg | CGC | 102 | 0.39 | |
| Pro | CCA | 318 | 1.14 | | Arg | CGA | 356 | 1.35 | |
| Pro | CCG | 166 | 0.6 | | Arg | CGG | 139 | 0.53 | |
| Thr | ACU | 485 | 1.46 | | Arg | AGA | 385 | 1.14 | |
| Thr | ACC | 272 | 0.82 | | Arg | AGG | 143 | 0.42 | |
| Thr | ACA | 413 | 1.24 | | Ser | AGU | 477 | 1.81 | |
| Thr | ACG | 157 | 0.47 | | Ser | AGC | 171 | 0.65 | |
| Ala | GCU | 614 | 1.8 | | Gly | GGU | 534 | 1.2 | |
| Ala | GCC | 225 | 0.66 | | Gly | GGC | 198 | 0.45 | |
| Ala | GCA | 378 | 1.11 | | Gly | GGA | 705 | 1.59 | |
| Ala | GCG | 148 | 0.43 | | Gly | GGG | 342 | 0.77 | |
RSCU: Relative Synonymous Codon Usage.
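RSCU is commonly defined as a codon's observed count divided by the mean count of all synonymous codons for the same amino acid, so a value above 1 marks a codon used more often than uniform usage would predict. The sketch below applies that standard definition to two groups from the codon usage table; the function name is illustrative, and this excerpt does not state which software the authors used.

```python
def rscu(codon_counts):
    """Relative synonymous codon usage for one amino acid.

    codon_counts maps each synonymous codon to its observed count.
    RSCU = count / (mean count over the synonymous codons).
    """
    mean = sum(codon_counts.values()) / len(codon_counts)
    return {codon: n / mean for codon, n in codon_counts.items()}


# Counts copied from the codon usage table.
phe = rscu({"UUU": 981, "UUC": 507})
stop = rscu({"UAA": 49, "UAG": 21, "UGA": 14})

print({c: round(v, 2) for c, v in phe.items()})   # {'UUU': 1.32, 'UUC': 0.68}
print({c: round(v, 2) for c, v in stop.items()})  # {'UAA': 1.75, 'UAG': 0.75, 'UGA': 0.5}
```

The three stop codon counts sum to 84, matching the 84 protein-coding genes reported in the abstract.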
Repeat sequences and their distribution in the S. mussotii chloroplast genome.
| No. | Size (bp) | Type | Repeat 1 Start | Repeat 1 Location | Repeat 2 Start | Repeat 2 Location | Region |
|---|---|---|---|---|---|---|---|
| 1 | 39 | F | 97971 | IGS ( | 119586 | | IRb, SSC |
| 2 | 38 | F | 44377 | | 97971 | IGS ( | LSC, IRb |
| 3 | 38 | F | 44377 | | 119586 | | LSC, SSC |
| 4 | 37 | F | 216 | IGS ( | 244 | IGS ( | LSC |
| 5 | 38 | F | 39302 | | 41526 | | LSC |
| 6 | 32 | F | 8154 | | 36099 | | LSC |
| 7 | 30 | F | 7704 | IGS ( | 28958 | IGS ( | LSC |
| 8 | 30 | F | 9536 | | 37013 | | LSC |
| 9 | 30 | F | 38751 | | 40966 | | LSC |
| 10 | 30 | F | 58479 | | 58512 | | LSC |
| 11 | 30 | F | 75545 | | 138996 | IGS ( | LSC, IRa |
| 12 | 51 | P | 114672 | IGS ( | 114675 | IGS ( | SSC |
| 13 | 39 | P | 119586 | | 138988 | IGS ( | SSC, IRa |
| 14 | 38 | P | 44377 | | 138989 | IGS ( | LSC, IRa |
| 15 | 32 | P | 8154 | | 45722 | | LSC |
| 16 | 32 | P | 36096 | | 45725 | | LSC |
| 17 | 30 | P | 44378 | | 75545 | | LSC |
| 18 | 30 | P | 75545 | | 119587 | | LSC, SSC |
| 19 | 30 | P | 75545 | | 97972 | IGS ( | LSC, IRb |
| 20 | 31 | R | 42871 | IGS ( | 42875 | IGS ( | LSC |
F = forward, P = palindrome, R = reverse, IGS = intergenic spacer.
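The Region column can be cross-checked against the start coordinates using the region lengths from the abstract. The sketch below assumes that coordinates are 1-based and that the genome is laid out in the order LSC, IRb, SSC, IRa starting at position 1; this layout is an assumption that is consistent with the repeat table, not something stated in this excerpt. The names `REGION_ORDER` and `region_of` are illustrative.

```python
# Region lengths (bp) from the abstract. The LSC-IRb-SSC-IRa order and the
# 1-based start at the beginning of the LSC are ASSUMPTIONS for illustration.
REGION_ORDER = [("LSC", 83567), ("IRb", 25761), ("SSC", 18342), ("IRa", 25761)]


def region_of(position):
    """Return the name of the region containing a 1-based genome coordinate."""
    if position < 1:
        raise ValueError(f"position {position} is outside the genome")
    end = 0
    for name, length in REGION_ORDER:
        end += length  # last coordinate belonging to this region
        if position <= end:
            return name
    raise ValueError(f"position {position} is outside the genome")


# Start coordinates taken from repeats 1, 2, and 11 in the table.
for start in (44377, 97971, 119586, 138996):
    print(start, region_of(start))
```

Under these assumptions the boundaries fall at 83,567 (end of LSC), 109,328 (end of IRb), 127,670 (end of SSC), and 153,431 (end of IRa).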
Figure 2. Repeat sequences in six Gentianales chloroplast genomes. REPuter was used to identify repeat sequences with length ≥ 30 bp and sequence identity ≥ 90% in the chloroplast genomes. F, P, R, and C indicate the repeat types F (forward), P (palindrome), R (reverse), and C (complement), respectively. Repeats with different lengths are indicated in different colours.
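The four repeat types differ only in how the second copy relates to the first: identical (forward), reversed (reverse), complemented (complement), or reverse-complemented (palindrome). The following sketch illustrates those definitions for exact copies only; it is not REPuter's algorithm, which also tolerates mismatches (the caption's ≥ 90% identity), and `classify_repeat` is a hypothetical helper name.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def classify_repeat(first, second):
    """Classify an exact repeat pair as 'F', 'R', 'C', or 'P'.

    Returns None if the second copy matches none of the four relations.
    If several relations hold at once, the first one tested wins.
    """
    first, second = first.upper(), second.upper()
    complement = first.translate(COMPLEMENT)
    if second == first:
        return "F"  # forward: same sequence, same orientation
    if second == first[::-1]:
        return "R"  # reverse: same strand, read backwards
    if second == complement:
        return "C"  # complement: base-paired partner, same direction
    if second == complement[::-1]:
        return "P"  # palindrome: reverse complement
    return None


print(classify_repeat("AACGT", "AACGT"))  # F
print(classify_repeat("AACGT", "TGCAA"))  # R
print(classify_repeat("AACGT", "TTGCA"))  # C
print(classify_repeat("AACGT", "ACGTT"))  # P
```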
Simple sequence repeats in the S. mussotii chloroplast genome.
| Unit | Length | No. | SSR Start | Region |
|---|---|---|---|---|
| A | 16 | 1 | 68265 | LSC |
| | 13 | 3 | 45315 | LSC |
| | | | 80949 | LSC |
| | | | 114240 | SSC |
| | 11 | 1 | 22183 | LSC |
| | 10 | 7 | 8410 | LSC |
| | | | 12227 | LSC |
| | | | 57572 | LSC |
| | | | 63341 | LSC |
| | | | 71135 | LSC |
| | | | 77632 | LSC |
| | | | 122496 | SSC |
| C | 11 | 1 | 60812 | LSC |
| T | 14 | 1 | 60823 | LSC |
| | 13 | 2 | 118296 | SSC |
| | | | 118428 | SSC |
| | 12 | 4 | 5757 | LSC |
| | | | 32886 | LSC |
| | | | 35984 | LSC |
| | | | 112141 | SSC |
| | 11 | 3 | 1828 | LSC |
| | | | 124064 | SSC |
| | | | 125507 | SSC |
| | 10 | 7 | 92 | LSC |
| | | | 7909 | LSC |
| | | | 54930 | LSC |
| | | | 66001 | LSC |
| | | | 120007 | LSC |
| | | | 125752 | LSC |
| | | | 127189 | SSC |
| AT | 10 | 1 | 47791 | LSC |
| TA | 10 | 1 | 47617 | LSC |
| ATT | 15 | 1 | 119656 | LSC |
| TTA | 12 | 1 | 127046 | LSC |
| TTC | 12 | 1 | 35761 | LSC |
| TTG | 12 | 1 | 111418 | SSC |
| AATT | 16 | 1 | 29843 | LSC |
| ATTT | 12 | 1 | 116917 | SSC |
| CATA | 12 | 1 | 151279 | IRa |
| TATG | 12 | 1 | 85709 | IRb |
| TATT | 12 | 1 | 116932 | SSC |
| TGTC | 12 | 1 | 30554 | LSC |
| TAATA | 15 | 1 | 116944 | SSC |
| TATTG | 15 | 1 | 62151 | LSC |
| CCTTTA | 18 | 1 | 37196 | LSC |
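A table of this kind can be produced by scanning the sequence for tandem repeats of short motifs. The sketch below is one simple way to do that with regular expressions; it is not the authors' pipeline, which this excerpt does not name. The minimum repeat counts are assumptions chosen to match the shortest tracts in the table (10 bp for mono- and dinucleotides, 12 bp for tri- and tetranucleotides, 15 bp for pentanucleotides, 18 bp for hexanucleotides), and `find_ssrs` is a hypothetical helper name.

```python
import re

# ASSUMED minimum number of motif copies per motif length (not from the paper).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq):
    """Return (motif, tract length in bp, 1-based start) tuples sorted by start."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # The lookahead lets a match begin at every position, so a skipped
        # non-primitive match (e.g. 'AA' copies) cannot hide a real SSR.
        pattern = re.compile(r"(?=(([ACGT]{%d})\2{%d,}))" % (k, min_rep - 1))
        covered_until = 0  # end of the last tract reported for this motif length
        for m in pattern.finditer(seq):
            tract, motif = m.group(1), m.group(2)
            if m.start() < covered_until or not is_primitive(motif):
                continue
            hits.append((motif, len(tract), m.start() + 1))
            covered_until = m.start() + len(tract)
    return sorted(hits, key=lambda hit: hit[2])


demo = "GC" + "A" * 12 + "GCGC" + "AT" * 5 + "CG" + "TTC" * 4 + "G"
print(find_ssrs(demo))  # [('A', 12, 3), ('AT', 10, 19), ('TTC', 12, 31)]
```

Tract lengths are reported in whole motif copies, so a trailing partial copy is not counted. Dedicated SSR tools may apply different thresholds or merge adjacent tracts into compound repeats.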
Distribution of SSRs present in the Gentianales chloroplast genomes.
| Taxon | Genome Size (bp) | AT (%) | Mono | Di | Tri | Tetra | Penta | Hexa | Total SSRs | CDS % a | CDS No. b | CDS % c |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| S. mussotii | 153,431 | 62 | 30 | 2 | 4 | 6 | 2 | 1 | 45 | 58 | 10 | 22 |
| | 148,991 | 62 | 27 | 3 | 2 | 7 | 0 | 0 | 39 | 61 | 10 | 26 |
| | 148,776 | 62 | 27 | 4 | 2 | 7 | 0 | 1 | 41 | 61 | 10 | 24 |
| | 155,189 | 63 | 31 | 5 | 3 | 4 | 0 | 0 | 43 | 59 | 8 | 19 |
| | 154,950 | 62 | 33 | 6 | 7 | 9 | 1 | 0 | 56 | 59 | 5 | 9 |
| | 161,592 | 62 | 47 | 15 | 6 | 23 | 3 | 4 | 98 | 56 | 17 | 17 |
| | 158,719 | 62 | 56 | 13 | 7 | 16 | 2 | 7 | 101 | 55 | 17 | 17 |
| | 154,841 | 62 | 33 | 5 | 9 | 12 | 3 | 0 | 62 | 58 | 6 | 10 |
| | 153,970 | 62 | 47 | 9 | 7 | 7 | 1 | 1 | 72 | 59 | 7 | 10 |
| | 154,903 | 62 | 42 | 6 | 3 | 8 | 2 | 0 | 61 | 59 | 10 | 16 |
| | 155,011 | 62 | 41 | 7 | 4 | 9 | 2 | 0 | 63 | 58 | 5 | 8 |
| | 154,053 | 62 | 34 | 5 | 2 | 5 | 3 | 0 | 49 | 57 | 6 | 12 |
| | 153,398 | 62 | 26 | 4 | 7 | 3 | 4 | 1 | 45 | 60 | 7 | 16 |
CDS: coding regions. a Percentages were calculated using the total length of the CDS divided by the genome size. b Total number of SSRs identified in the CDS. c Percentages were calculated using the total number of SSRs in the CDS divided by the total number of SSRs in the genome.
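Footnote c and the Total column can be reproduced directly from the counts. The sketch below does so for the first three rows of the table; the tuple layout and the helper name `cds_ssr_percent` are illustrative, and rounding to the nearest integer is an assumption about how the reported percentages were obtained.

```python
# (mono, di, tri, tetra, penta, hexa, total, SSRs in CDS, reported % c)
# copied from the first three rows of the SSR distribution table.
ROWS = [
    (30, 2, 4, 6, 2, 1, 45, 10, 22),
    (27, 3, 2, 7, 0, 0, 39, 10, 26),
    (27, 4, 2, 7, 0, 1, 41, 10, 24),
]


def cds_ssr_percent(ssrs_in_cds, total_ssrs):
    """Footnote c: share of all SSRs that lie in the CDS, as a whole percentage."""
    return round(100 * ssrs_in_cds / total_ssrs)


for *by_type, total, in_cds, reported in ROWS:
    # The per-type counts should add up to the Total column.
    assert sum(by_type) == total
    print(total, cds_ssr_percent(in_cds, total), reported)
```

For these rows the recomputed percentages match the reported values.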
Figure 3. Comparison of three chloroplast genomes using mVISTA. Grey arrows and thick lines above the alignment indicate genes with their orientation and the position of the IRs, respectively. A cut-off of 70% identity was used for the plots, and the y-axis represents percent identity from 50% to 100%. Genome regions are colour-coded as protein-coding (exon), rRNA, tRNA, and conserved noncoding sequences (CNS).
Figure 4. Comparison of the borders of the LSC, SSC, and IR regions among thirteen chloroplast genomes. Ψ indicates a pseudogene. This figure is not to scale.