Xia Liu, Yuan Li, Hongyuan Yang, Boyang Zhou.
Abstract
The complete chloroplast (cp) genome of Talinum paniculatum (Caryophyllales), a widely distributed and cultivated edible vegetable with pharmaceutical efficacy similar to that of ginseng, was sequenced and analyzed. The cp genome of T. paniculatum is 156,929 bp in size, with a pair of inverted repeats (IRs) of 25,751 bp separated by a large single-copy (LSC) region of 86,898 bp and a small single-copy (SSC) region of 18,529 bp. The genome contains 83 protein-coding genes, 37 transfer RNA (tRNA) genes, eight ribosomal RNA (rRNA) genes and four pseudogenes. Fifty-one repeat units and 92 simple sequence repeats (SSRs) were found in the genome. Sequence alignment showed that the pseudogene rpl23 (ribosomal protein L23), located in the IR regions, carries an AATT insertion relative to other Caryophyllales species. The trnK-UUU (tRNA-Lys) and rpl16 (ribosomal protein L16) genes have larger introns in T. paniculatum, and matK (maturase K), which is usually located within the trnK-UUU intron, shows rich sequence divergence within Caryophyllales. Comparison of the complete cp genome with those of eight other Caryophyllales species indicated that the differences between T. paniculatum and P. oleracea were very slight and that the most highly divergent regions occurred in intergenic spacers. Comparison of IR boundaries among nine Caryophyllales species showed that T. paniculatum has larger IR regions and that the contraction is relatively slight. Phylogenetic analysis of 35 Caryophyllales species and two outgroup species revealed that T. paniculatum and P. oleracea do not belong to the same family. These results provide opportunities for future identification and barcoding of Talinum species, for understanding the evolutionary mode of Caryophyllales cp genomes, and for molecular breeding of T. paniculatum with high pharmaceutical efficacy.
Keywords: Talinum paniculatum; chloroplast genome; medicinal plant; phylogeny
Year: 2018 PMID: 29642545 PMCID: PMC6017404 DOI: 10.3390/molecules23040857
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Figure 1. The complete chloroplast genome map of Talinum paniculatum (Jacq.) Gaertn. Genes are color-coded by functional group. Genes drawn inside the circle are transcribed clockwise, and those outside are transcribed counterclockwise. The orange arrow indicates the genome orientation.
Chloroplast genome composition of Talinum paniculatum (Jacq.) Gaertn.
| Region | Size (bp) | T(U) (%) | C (%) | A (%) | G (%) | Genes | Protein-Coding Genes | tRNA Genes | rRNA Genes |
|---|---|---|---|---|---|---|---|---|---|
| LSC | 86,898 | 33.2 | 17.8 | 32.2 | 16.8 | 83 | 56 | 22 | 0 |
| SSC | 18,529 | 34.6 | 15.9 | 34.9 | 14.6 | 12 | 12 | 1 | 0 |
| IRA | 25,751 | 29.0 | 22.3 | 28.2 | 20.4 | 19 | 7 | 7 | 4 |
| IRB | 25,751 | 28.2 | 20.4 | 29.0 | 22.3 | 19 | 8 | 7 | 4 |
| Total | 156,929 | 31.8 | 19.0 | 30.7 | 18.5 | 128 | 83 (4) | 37 (7) | 8 (4) |
| CDS | 78,438 | 31.5 | 17.6 | 30.8 | 20.1 | | | | |
| 1st position | 26,146 | 23.7 | 18.8 | 30.7 | 26.7 | | | | |
| 2nd position | 26,146 | 32.7 | 20.2 | 29.4 | 17.8 | | | | |
| 3rd position | 26,146 | 38.1 | 13.7 | 32.4 | 15.7 | | | | |
CDS: protein-coding regions. The numbers in brackets represent the number of repeated genes.
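The sizes and base compositions in the table can be cross-checked with a little arithmetic. The sketch below is only an illustration, not part of the original analysis. It confirms that LSC + SSC + 2 × IR adds up to the reported genome size and that the CDS length is three times the per-position length, and it derives GC content as C% + G%. The variable and function names are hypothetical, and the numbers are copied from the table.

```python
# Values copied from the composition table above (sizes in bp, base shares in %).
regions = {
    "LSC": {"size": 86_898, "C": 17.8, "G": 16.8},
    "SSC": {"size": 18_529, "C": 15.9, "G": 14.6},
    "IRA": {"size": 25_751, "C": 22.3, "G": 20.4},
    "IRB": {"size": 25_751, "C": 20.4, "G": 22.3},
}
TOTAL_SIZE = 156_929          # reported genome size
CDS_SIZE = 78_438             # reported total length of protein-coding regions
CODON_POSITION_SIZE = 26_146  # reported length of each codon position


def gc_percent(row):
    """GC content of a region as the sum of its C and G percentages."""
    return round(row["C"] + row["G"], 1)


# The quadripartite structure should account for every base of the genome.
region_sum = sum(r["size"] for r in regions.values())
sizes_consistent = region_sum == TOTAL_SIZE

# Each codon contributes one base to each of the three positions.
cds_consistent = 3 * CODON_POSITION_SIZE == CDS_SIZE

gc_by_region = {name: gc_percent(r) for name, r in regions.items()}

print(sizes_consistent, cds_consistent)
print(gc_by_region)
```

By this calculation the IR regions have a higher GC content than the single-copy regions, which matches the presence of all eight rRNA gene copies in the IRs in the table.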
Genes in the chloroplast genome of Talinum paniculatum (Jacq.) Gaertn.
| Group of Genes | Name of Gene | No. |
|---|---|---|
| Photosystem I | | 5 |
| Photosystem II | | 15 |
| Cytochrome b/f complex | | 6 |
| ATP system | | 6 |
| NADH dehydrogenase | | 12 |
| RuBisCO large subunit | | 1 |
| RNA polymerase | | 4 |
| Ribosomal proteins (SSU) | | 14 |
| Ribosomal proteins (LSU) | | 9 |
| Miscellaneous proteins | | 6 |
| Hypothetical chloroplast reading frames (ycf) | | 5 |
| Transfer RNAs | | 37 |
| Ribosomal RNAs | | 8 |
| Pseudogene | | 4 |
| Total | | 132 |
* indicates a duplicated gene.
The intron-containing genes in the T. paniculatum cp genome and the lengths of the exons and introns.
| No. | Gene | Location | ExonI (bp) | IntronI (bp) | ExonII (bp) | IntronII (bp) | ExonIII (bp) |
|---|---|---|---|---|---|---|---|
| 1 | | LSC | 35 | 2502 | 37 | | |
| 2 | | LSC | 202 | 867 | 41 | | |
| 3 | | LSC | 23 | 707 | 48 | | |
| 4 | | LSC | 410 | 745 | 145 | | |
| 5 | | LSC | 1611 | 794 | 432 | | |
| 6 | | LSC | 153 | 773 | 229 | 769 | 125 |
| 7 | | LSC | 37 | 599 | 50 | | |
| 8 | | LSC | 35 | 586 | 38 | | |
| 9 | | LSC | 229 | 590 | 291 | 894 | 71 |
| 10 | | LSC | 6 | 768 | 642 | | |
| 11 | | LSC | 8 | 792 | 475 | | |
| 12 | | LSC | 399 | 1102 | 9 | | |
| 13 | | IR | 756 | 668 | 777 | | |
| 14 | | IR | 114 | - | 232 | 533 | 26 |
| 15 | | IR | 37 | 947 | 35 | | |
| 16 | | IR | 38 | 818 | 35 | | |
| 17 | | SSC | 539 | 1087 | 553 | | |
* indicates a duplicated gene.
Codon usage in the T. paniculatum cp genome.
| Amino Acid | Codon | Count | RSCU | Amino Acid | Codon | Count | RSCU | Amino Acid | Codon | Count | RSCU | Amino Acid | Codon | Count | RSCU |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Phe | UUU(F) | 975 | 1.3 | Ser | UCU(S) | 556 | 1.67 | Tyr | UAU(Y) | 779 | 1.61 | Stop | UGA(*) | 20 | 0.72 |
| Phe | UUC(F) | 527 | 0.7 | Ser | UCC(S) | 331 | 0.99 | Tyr | UAC(Y) | 190 | 0.39 | Trp | UGG(W) | 460 | 1 |
| Leu | UUA(L) | 854 | 1.84 | Ser | UCA(S) | 408 | 1.22 | Stop | UAA(*) | 46 | 1.66 | Ala | GCU(A) | 615 | 1.75 |
| Leu | UUG(L) | 554 | 1.2 | Ser | UCG(S) | 187 | 0.56 | Stop | UAG(*) | 17 | 0.61 | Ala | GCC(A) | 234 | 0.66 |
| Leu | CUU(L) | 601 | 1.3 | Ser | AGU(S) | 393 | 1.18 | His | CAU(H) | 448 | 1.49 | Ala | GCA(A) | 416 | 1.18 |
| Leu | CUC(L) | 179 | 0.39 | Ser | AGC(S) | 127 | 0.38 | His | CAC(H) | 155 | 0.51 | Ala | GCG(A) | 144 | 0.41 |
| Leu | CUA(L) | 408 | 0.88 | Pro | CCU(P) | 426 | 1.6 | Gln | CAA(Q) | 729 | 1.54 | Arg | CGU(R) | 360 | 1.36 |
| Leu | CUG(L) | 184 | 0.4 | Pro | CCC(P) | 200 | 0.75 | Gln | CAG(Q) | 216 | 0.46 | Arg | CGC(R) | 92 | 0.35 |
| Ile | AUU(I) | 1130 | 1.5 | Pro | CCA(P) | 299 | 1.12 | Asn | AAU(N) | 960 | 1.53 | Arg | CGA(R) | 366 | 1.38 |
| Ile | AUC(I) | 410 | 0.54 | Pro | CCG(P) | 143 | 0.54 | Asn | AAC(N) | 297 | 0.47 | Arg | CGG(R) | 120 | 0.45 |
| Ile | AUA(I) | 718 | 0.95 | Thr | ACU(T) | 531 | 1.61 | Lys | AAA(K) | 1061 | 1.5 | Arg | AGA(R) | 475 | 1.79 |
| Met | AUG(M) | 607 | 1 | Thr | ACC(T) | 250 | 0.76 | Lys | AAG(K) | 357 | 0.5 | Arg | AGG(R) | 175 | 0.66 |
| Val | GUU(V) | 514 | 1.48 | Thr | ACA(T) | 402 | 1.22 | Asp | GAU(D) | 883 | 1.66 | Gly | GGU(G) | 552 | 1.26 |
| Val | GUC(V) | 160 | 0.46 | Thr | ACG(T) | 134 | 0.41 | Asp | GAC(D) | 184 | 0.34 | Gly | GGC(G) | 190 | 0.43 |
| Val | GUA(V) | 522 | 1.5 | Cys | UGU(C) | 235 | 1.55 | Glu | GAA(E) | 1041 | 1.53 | Gly | GGA(G) | 712 | 1.62 |
| Val | GUG(V) | 194 | 0.56 | Cys | UGC(C) | 68 | 0.45 | Glu | GAG(E) | 323 | 0.47 | Gly | GGG(G) | 302 | 0.69 |
RSCU: Relative synonymous codon usage. RSCU > 1 are highlighted in bold. * indicates stop codon.
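RSCU follows the standard definition: a codon's observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below reproduces a few values from the table under that definition. The function name and the small codon subset are illustrative assumptions, and the record does not state which tool the authors used.

```python
from collections import defaultdict


def rscu(counts, codon_to_aa):
    """Relative synonymous codon usage for each codon.

    RSCU = count * (number of synonymous codons) / (total count for the amino acid).
    Every synonymous codon must be present in `counts`, even with a zero count,
    because the family size is taken from the codons supplied.
    """
    totals = defaultdict(int)
    family_size = defaultdict(int)
    for codon, n in counts.items():
        aa = codon_to_aa[codon]
        totals[aa] += n
        family_size[aa] += 1
    return {
        codon: n * family_size[codon_to_aa[codon]] / totals[codon_to_aa[codon]]
        for codon, n in counts.items()
    }


# Counts copied from the table for two complete codon families.
counts = {"UUU": 975, "UUC": 527, "UAA": 46, "UAG": 17, "UGA": 20}
codon_to_aa = {
    "UUU": "Phe", "UUC": "Phe",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}

values = rscu(counts, codon_to_aa)
for codon, value in values.items():
    print(codon, round(value, 2))
```

Rounded to two decimals, these come out as 1.3 and 0.7 for Phe, and 1.66, 0.61 and 0.72 for the stop codons, matching the table. The three stop-codon counts also sum to 83, the number of protein-coding genes.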
Distribution of repeat sequences in the T. paniculatum (Jacq.) Gaertn. chloroplast genome.
| No. | Size (bp) | Type | Repeat 1 Start | Repeat 1 Location | Repeat 2 Start | Repeat 2 Location | Location |
|---|---|---|---|---|---|---|---|
| 1 | 35 | F | 3144 | matK | 6441 | IGS ( | LSC |
| 2 | 30 | P | 4153 | IGS ( | 4211 | IGS ( | LSC |
| 3 | 30 | R | 4578 | IGS ( | 4581 | IGS ( | LSC |
| 4 | 30 | C | 4581 | IGS ( | 4582 | IGS ( | LSC |
| 5 | 30 | R | 6862 | IGS ( | 8268 | IGS ( | LSC |
| 6 | 30 | F | 7743 | IGS ( | 7770 | IGS ( | LSC |
| 7 | 32 | F | 7896 | IGS ( | 36,017 | IGS ( | LSC |
| 8 | 30 | P | 7898 | IGS ( | 46,268 | | LSC |
| 9 | 37 | R | 8258 | IGS ( | 8261 | IGS ( | LSC |
| 10 | 37 | R | 8258 | IGS ( | 8264 | IGS ( | LSC |
| 11 | 35 | F | 8258 | IGS ( | 8277 | IGS ( | LSC |
| 12 | 35 | R | 8266 | IGS ( | 8277 | IGS (trnS-GCU, | LSC |
| 13 | 34 | R | 8258 | IGS ( | 8261 | IGS (trnS-GCU, | LSC |
| 14 | 33 | F | 8261 | IGS ( | 8283 | IGS ( | LSC |
| 15 | 32 | R | 8263 | IGS ( | 8280 | IGS ( | LSC |
| 16 | 31 | F | 8261 | IGS ( | 8264 | IGS ( | LSC |
| 17 | 31 | R | 8261 | IGS ( | 8280 | IGS ( | LSC |
| 18 | 31 | R | 8267 | IGS ( | 29,873 | IGS ( | LSC |
| 19 | 30 | P | 8267 | IGS ( | 62,668 | IGS ( | LSC |
| 20 | 30 | P | 8280 | IGS ( | 31,428 | IGS ( | LSC |
| 21 | 31 | F | 9566 | | 37,057 | | LSC |
| 22 | 30 | P | 36,019 | IGS ( | 46,268 | | LSC |
| 23 | 30 | F | 39,314 | | 41,538 | | LSC |
| 24 | 42 | F | 44,540 | | 123,558 | | LSC, SSC |
| 25 | 39 | F | 44,543 | | 100,738 | IGS ( | LSC, IRb |
| 26 | 39 | P | 44,543 | | 143,050 | IGS ( | LSC, IRa |
| 27 | 30 | F | 44,555 | | 100,750 | IGS ( | LSC, IRb |
| 28 | 30 | P | 44,555 | | 143,047 | IGS ( | LSC, IRa |
| 29 | 40 | P | 76,849 | IGS ( | 76,849 | IGS ( | LSC |
| 30 | 30 | P | 84,344 | IGS ( | 84,346 | IGS ( | LSC |
| 31 | 61 | F | 93,517 | | 93,535 | | IRb |
| 32 | 61 | P | 93,517 | | 150,231 | | IRb, IRa |
| 33 | 61 | P | 93,535 | | 150,249 | | IRb, IRa |
| 34 | 61 | F | 150,231 | | 150,249 | | IRa |
| 35 | 52 | F | 93,526 | | 93,544 | | IRb |
| 36 | 52 | P | 93,526 | | 150,231 | | IRb, IRa |
| 37 | 52 | P | 93,544 | | 150,249 | | IRb, IRa |
| 38 | 34 | F | 93,526 | | 93,562 | | IRb |
| 39 | 34 | P | 93,526 | | 150,231 | | IRb, IRa |
| 40 | 34 | P | 93,562 | | 150,267 | | IRb, IRa |
| 41 | 43 | F | 93,517 | | 93,533 | | IRb |
| 42 | 43 | P | 93,517 | | 150,231 | | IRb, IRa |
| 43 | 43 | P | 93,553 | | 150,267 | | IRb, IRa |
| 44 | 43 | F | 150,231 | | 150,267 | | IRa |
| 45 | 40 | F | 100,738 | IGS ( | 123,561 | | IRb, SSC |
| 46 | 34 | F | 109,506 | IGS ( | 109,538 | IGS ( | IRb |
| 47 | 34 | F | 109,506 | IGS ( | 134,255 | IGS ( | IRb, IRa |
| 48 | 34 | P | 109,538 | IGS ( | 134,287 | IGS ( | IRb, IRa |
| 49 | 34 | P | 134,255 | IGS ( | 134,287 | IGS ( | IRa |
| 50 | 38 | P | 118,646 | IGS ( | 118,646 | IGS ( | SSC |
| 51 | 40 | P | 123,561 | | 143,049 | IGS ( | SSC, IRa |
F: forward repeat; P: palindrome (inverted) repeat; R: reverse repeat; C: complement repeat. IGS: intergenic spacer.
Figure 2. Repeat sequences in seven chloroplast genomes of Caryophyllales. REPuter was used to identify repeat sequences with length ≥30 bp and sequence identity ≥90% in the chloroplast genomes. F, P, R and C indicate the repeat types forward, palindrome, reverse and complement, respectively.
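The four repeat types differ only in how the second copy relates to the first: identical (F), reversed (R), complemented (C), or reverse-complemented (P). The sketch below classifies a pair of equal-length sequences under these definitions. It is a simplified illustration that requires exact matches, whereas the REPuter search described above tolerates mismatches down to 90% identity. The function name `classify_repeat` is hypothetical.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def classify_repeat(first, second):
    """Return the repeat-type labels that relate `second` to `first`.

    F: forward (identical), R: reverse, C: complement,
    P: palindrome (reverse complement). Exact matching only.
    A pair can satisfy more than one relation, so a list is returned.
    """
    first, second = first.upper(), second.upper()
    if len(first) != len(second):
        raise ValueError("repeat copies must have the same length")
    labels = []
    if second == first:
        labels.append("F")
    if second == first[::-1]:
        labels.append("R")
    if second == first.translate(COMPLEMENT):
        labels.append("C")
    if second == first[::-1].translate(COMPLEMENT):
        labels.append("P")
    return labels


print(classify_repeat("AACG", "AACG"))
print(classify_repeat("AACG", "GCAA"))
print(classify_repeat("AACG", "TTGC"))
print(classify_repeat("AACG", "CGTT"))
```

Returning a list rather than a single label is a deliberate choice: short or low-complexity sequences can satisfy several relations at once (for example, `AT` is its own reverse complement).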
Frequency of simple sequence repeats in the T. paniculatum chloroplast genome.
| Length Unit | 10 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 15 | 15 | 3 | 8 | 1 | 1 | 3 | 1 | 1 | 33 | |||
| T | 14 | 14 | 4 | 7 | 3 | 2 | 1 | 1 | 1 | 33 | |||
| C | 1 | 1 | 1 | ||||||||||
| G | 1 | 1 | 1 | ||||||||||
| AG | 1 | 1 | 1 | ||||||||||
| AT | 4 | 4 | 3 | 7 | |||||||||
| TA | 2 | 2 | 1 | 1 | 4 | ||||||||
| AAT | 1 | 1 | 1 | ||||||||||
| ATA | 1 | 1 | |||||||||||
| TTA | 2 | 2 | |||||||||||
| TAT | 1 | 1 | |||||||||||
| AGGT | 1 | 1 | |||||||||||
| ATGG | 1 | 1 | |||||||||||
| AATT | 1 | 1 | |||||||||||
| CTAC | 1 | 1 | |||||||||||
| TTTC | 1 | 1 | |||||||||||
| TAAT | 1 | 1 | |||||||||||
| GGAA | 1 | 1 |
Figure 3. Distribution of SSRs present in seven chloroplast genomes of Caryophyllales.
Simple sequence repeats in the CDSs of the T. paniculatum chloroplast genome.
| No. | Type | Motif | Size | Start | End | Location | Region |
|---|---|---|---|---|---|---|---|
| 1 | P1 | (A)10 | 10 | 47 | 56 | | LSC |
| 2 | P1 | (A)10 | 10 | 637 | 646 | | LSC |
| 3 | P1 | (A)11 | 11 | 2104 | 2114 | | LSC |
| 4 | P1 | (A)12 | 12 | 3942 | 3953 | | LSC |
| 5 | P2 | (AT)5 | 10 | 755 | 764 | | LSC |
| 6 | P4 | (AATT)3 | 12 | 3974 | 3985 | | LSC |
| 7 | P4 | (CCAT)3 | 12 | 54 | 65 | | LSC |
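SSRs such as those listed above can be located by scanning a sequence for tandem runs of a short motif. The sketch below is a minimal illustration, not the authors' pipeline. The minimum repeat counts are assumptions, chosen to be consistent with the shortest motifs in the table ((A)10, (AT)5, (AATT)3); the trinucleotide threshold of 4 is a guess, and the names `MIN_REPEATS`, `is_primitive` and `find_ssrs` are hypothetical.

```python
# Assumed minimum number of tandem copies per motif length (hypothetical thresholds).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. not 'AA' or 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq):
    """Return (motif, copies, start, end) tuples with 1-based inclusive coordinates."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            # Skip ambiguous bases and motifs that belong to a shorter repeat class.
            if set(motif) - set("ACGT") or not is_primitive(motif):
                i += 1
                continue
            # Extend the run while the next k bases repeat the motif.
            j = i + k
            while seq[j:j + k] == motif:
                j += k
            copies = (j - i) // k
            if copies >= min_rep:
                hits.append((motif, copies, i + 1, j))
                i = j  # continue after the reported repeat
            else:
                i += 1
    return sorted(hits, key=lambda h: h[2])


example = "GC" + "A" * 10 + "GC" + "AT" * 5 + "GG" + "AATT" * 3 + "C"
for hit in find_ssrs(example):
    print(hit)
```

The primitive-motif check keeps a run of ten A's from also being reported as an (AA)5 dinucleotide repeat, so each SSR is counted once under its shortest unit.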
Figure 4. Sequence alignment of matK among the cp genomes of nine Caryophyllales species. Because the matK gene is too long to show in full, only the sequences with greater variation are shown here.
Figure 5. Sequence alignment of the rpl23 gene among six Caryophyllales species.
Figure 6. Complete chloroplast genome sequence comparison of eight species using mVISTA, with T. paniculatum as a reference. The horizontal axis corresponds to the coordinates within the chloroplast genome. The vertical scale represents the identity percentage. The grey lines and the arrows show the genes with their orientation and position. CNS: conserved noncoding sequences.
Figure 7. Comparison of the borders of the LSC, SSC and IR regions in nine Caryophyllales species. # indicates that the gene is a pseudogene.
Figure 8. Phylogenetic tree of the 35 species in Caryophyllales using maximum parsimony (MP) and tree bisection-reconnection (TBR) analysis based on 48 protein-coding genes using a non-partitioning scheme. The phylogenetic tree was drawn using Cistanche deserticola and Rehmannia chingii as outgroups.