Xiaofeng Shen, Mingli Wu, Baosheng Liao, Zhixiang Liu, Rui Bai, Shuiming Xiao, Xiwen Li, Boli Zhang, Jiang Xu, Shilin Chen.
Abstract
The complete chloroplast (cp) genome of Artemisia annua (Asteraceae), the primary source of artemisinin, was sequenced and analyzed. The A. annua cp genome is 150,955 bp and harbors a pair of inverted repeat regions (IRa and IRb) of 24,850 bp each, which separate the large (LSC, 82,988 bp) and small (SSC, 18,267 bp) single-copy regions. Our annotation revealed that the A. annua cp genome contains 113 genes and 18 duplicated genes. The gene order in the SSC region of A. annua is inverted, consistent with the chloroplast genome sequences of three other Artemisia species. Fifteen forward and seventeen inverted repeats were detected in the genome. The abundance of SSR loci in the genome suggests opportunities for future population genetics work on this anti-malarial medicinal plant. In A. annua cpDNA, the rps19 gene was found in the LSC region rather than the IR region, and the rps19 pseudogene was absent from the IR region. Sequence divergence analysis of five Asteraceae species indicated that the most divergent regions were found in the intergenic spacers, and that the differences between A. annua and A. fukudo were very slight. A phylogenetic analysis revealed a sister relationship between A. annua and A. fukudo. This study identified the unique characteristics of the A. annua cp genome. These results offer valuable information for future research on Artemisia species identification and for the selective breeding of A. annua with high pharmaceutical efficacy.
Keywords: Artemisia annua; chloroplast genome; phylogeny
Year: 2017 PMID: 28800082 PMCID: PMC6152406 DOI: 10.3390/molecules22081330
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Base composition in the A. annua chloroplast genome.
| Region | T (U) (%) | C (%) | A (%) | G (%) | Length (bp) |
|---|---|---|---|---|---|
| LSC | 32.4 | 17.5 | 32.1 | 18.0 | 82,988 |
| SSC | 34.2 | 16.1 | 35.0 | 14.7 | 18,267 |
| IRA | 28.5 | 20.8 | 28.3 | 22.3 | 24,850 |
| IRB | 28.3 | 22.3 | 28.5 | 20.8 | 24,850 |
| Total | 31.3 | 18.7 | 31.2 | 18.8 | 150,955 |
| CDS | 31.6 | 17.6 | 30.7 | 20.1 | 79,335 |
| 1st position | 24.0 | 18.9 | 30.6 | 26.7 | 26,445 |
| 2nd position | 33.0 | 20.2 | 29.4 | 17.7 | 26,445 |
| 3rd position | 38.0 | 13.8 | 32.0 | 16.0 | 26,445 |
CDS: protein-coding regions.
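The rows of the base-composition table can be cross-checked with a little arithmetic. The sketch below is illustrative only: it copies the region lengths and the C and G percentages from the table, sums the four regions, and derives a length-weighted GC content. The dictionary layout and function names are assumptions introduced here, not part of the original analysis.

```python
# Region lengths (bp) and C/G percentages copied from the base-composition table.
regions = {
    "LSC": {"length": 82_988, "C": 17.5, "G": 18.0},
    "SSC": {"length": 18_267, "C": 16.1, "G": 14.7},
    "IRA": {"length": 24_850, "C": 20.8, "G": 22.3},
    "IRB": {"length": 24_850, "C": 22.3, "G": 20.8},
}


def total_length(table):
    """Sum the lengths of all regions (LSC + SSC + IRA + IRB)."""
    return sum(row["length"] for row in table.values())


def gc_percent(row):
    """GC content of one region as the sum of its C and G percentages."""
    return row["C"] + row["G"]


def weighted_gc(table):
    """Length-weighted GC content (%) across all regions."""
    total = total_length(table)
    return sum(gc_percent(row) * row["length"] for row in table.values()) / total


print(total_length(regions))           # genome length implied by the four regions
print(round(weighted_gc(regions), 1))  # compare with C + G in the Total row
```

The summed length matches the Total row (150,955 bp), and the weighted GC content agrees with the Total row's C + G (18.7 + 18.8 = 37.5%) to within the rounding of the tabulated percentages.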
Figure 1. Gene map of the A. annua chloroplast genome. Genes drawn inside the circle are transcribed clockwise, and those outside are transcribed counterclockwise. Genes belonging to different functional groups are color-coded. The darker gray in the inner circle corresponds to GC content, while the lighter gray corresponds to AT content.
Codon-anticodon recognition patterns and codon usage of the A. annua chloroplast genome.
| Amino Acid | Codon | No. | RSCU | tRNA | Amino Acid | Codon | No. | RSCU | tRNA |
|---|---|---|---|---|---|---|---|---|---|
| Phe | UUU | 993 | 1.32 | | Tyr | UAU | 811 | 1.64 | |
| Phe | UUC | 510 | 0.68 | | Tyr | UAC | 178 | 0.36 | |
| Leu | UUA | 890 | 1.87 | | Stop | UAA | 52 | 1.77 | |
| Leu | UUG | 579 | 1.22 | | Stop | UAG | 21 | 0.72 | |
| Leu | CUU | 622 | 1.31 | | His | CAU | 471 | 1.51 | |
| Leu | CUC | 198 | 0.42 | | His | CAC | 151 | 0.49 | |
| Leu | CUA | 368 | 0.77 | | Gln | CAA | 732 | 1.52 | |
| Leu | CUG | 196 | 0.41 | | Gln | CAG | 230 | 0.48 | |
| Ile | AUU | 1092 | 1.47 | | Asn | AAU | 1017 | 1.56 | |
| Ile | AUC | 433 | 0.58 | | Asn | AAC | 287 | 0.44 | |
| Ile | AUA | 706 | 0.95 | | Lys | AAA | 1042 | 1.47 | |
| Met | AUG | 633 | 1.00 | | Lys | AAG | 371 | 0.53 | |
| Val | GUU | 512 | 1.44 | | Asp | GAU | 868 | 1.61 | |
| Val | GUC | 174 | 0.49 | | Asp | GAC | 213 | 0.39 | |
| Val | GUA | 546 | 1.54 | | Glu | GAA | 1001 | 1.50 | |
| Val | GUG | 188 | 0.53 | | Glu | GAG | 337 | 0.50 | |
| Ser | UCU | 588 | 1.74 | | Cys | UGU | 202 | 1.38 | |
| Ser | UCC | 324 | 0.96 | | Cys | UGC | 91 | 0.62 | |
| Ser | UCA | 417 | 1.23 | | Stop | UGA | 15 | 0.51 | |
| Ser | UCG | 167 | 0.49 | | Trp | UGG | 462 | 1.00 | |
| Pro | CCU | 441 | 1.58 | | Arg | CGU | 350 | 1.33 | |
| Pro | CCC | 188 | 0.67 | | Arg | CGC | 107 | 0.41 | |
| Pro | CCA | 329 | 1.18 | | Arg | CGA | 343 | 1.30 | |
| Pro | CCG | 159 | 0.57 | | Arg | CGG | 124 | 0.47 | |
| Thr | ACU | 535 | 1.63 | | Arg | AGA | 485 | 1.84 | |
| Thr | ACC | 246 | 0.75 | | Arg | AGG | 174 | 0.66 | |
| Thr | ACA | 411 | 1.25 | | Ser | AGU | 410 | 1.21 | |
| Thr | ACG | 124 | 0.38 | | Ser | AGC | 122 | 0.36 | |
| Ala | GCU | 617 | 1.74 | | Gly | GGU | 589 | 1.32 | |
| Ala | GCC | 228 | 0.64 | | Gly | GGC | 189 | 0.42 | |
| Ala | GCA | 415 | 1.17 | | Gly | GGA | 707 | 1.58 | |
| Ala | GCG | 158 | 0.45 | | Gly | GGG | 306 | 0.68 | |
RSCU: Relative Synonymous Codon Usage.
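RSCU is conventionally computed as a codon's observed count divided by the mean count of all synonymous codons for the same amino acid, so a value above 1 indicates a codon used more often than expected under uniform usage. The following is a minimal sketch of that calculation using a few counts taken from the codon usage table. The function name and the two small dictionaries are assumptions made for illustration, and every synonymous codon of a family must be supplied for the result to be meaningful.

```python
from collections import defaultdict


def rscu(counts, codon_to_aa):
    """Return {codon: RSCU}, where RSCU is the observed count divided by
    the mean count of all synonymous codons for the same amino acid."""
    families = defaultdict(list)
    for codon in counts:
        families[codon_to_aa[codon]].append(codon)

    values = {}
    for codons in families.values():
        mean = sum(counts[c] for c in codons) / len(codons)
        for c in codons:
            values[c] = counts[c] / mean
    return values


# Counts for the Phe and Stop codon families, copied from the table.
counts = {"UUU": 993, "UUC": 510, "UAA": 52, "UAG": 21, "UGA": 15}
codon_to_aa = {
    "UUU": "Phe", "UUC": "Phe",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}

values = rscu(counts, codon_to_aa)
for codon in counts:
    print(codon, round(values[codon], 2))
```

Rounded to two decimals, these values reproduce the tabulated RSCU entries for the Phe and Stop codons (1.32, 0.68, 1.77, 0.72, 0.51).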
The length of exons and introns in genes with introns in the A. annua chloroplast genome.
| Gene | Location | Exon I (bp) | Intron I (bp) | Exon II (bp) | Intron II (bp) | Exon III (bp) |
|---|---|---|---|---|---|---|
| | LSC | 37 | 1860 | 35 | | |
| | LSC | 23 | 729 | 47 | | |
| | LSC | 37 | 424 | 50 | | |
| | LSC | 38 | 572 | 37 | | |
| | IR | 42 | 777 | 35 | | |
| | IR | 38 | 812 | 35 | | |
| | LSC | 232 | 535 | 26 | 114 | |
| | LSC | 40 | 876 | 185 | | |
| | LSC | 9 | 1015 | 399 | | |
| | IR | 394 | 626 | 470 | | |
| | LSC | 430 | 734 | 1640 | | |
| | SSC | 556 | 1064 | 539 | | |
| | IR | 777 | 670 | 756 | | |
| | SSC | 127 | 700 | 230 | 735 | 153 |
| | LSC | 6 | 747 | 642 | | |
| | LSC | 145 | 699 | 410 | | |
| | LSC | 71 | 796 | 292 | 606 | 228 |
* The rps12 gene is a trans-spliced gene with the 5′ end located in the LSC region and the duplicated 3′ ends in the IR regions.
Long repeat sequences in the A. annua chloroplast genome.
| ID | Repeat Start 1 | Type | Size (bp) | Repeat Start 2 | Mismatch (bp) | E-Value | Gene | Region |
|---|---|---|---|---|---|---|---|---|
| 1 | 8544 | F | 32 | 34,909 | −3 | 4.65E-05 | IGS | LSC |
| 2 | 28,063 | F | 31 | 29,661 | −3 | 1.69E-04 | IGS | LSC |
| 3 | 28,070 | F | 30 | 29,666 | −2 | 2.18E-05 | IGS | LSC |
| 4 | 38,054 | F | 32 | 40,278 | −2 | 1.55E-06 | | LSC |
| 5 | 38,065 | F | 30 | 40,289 | −3 | 6.09E-04 | | LSC |
| 6 | 43,070 | F | 41 | 96,883 | −1 | 1.63E-13 | | LSC; IRA |
| 7 | 43,072 | F | 39 | 118,107 | −1 | 2.48E-12 | | LSC; SSC |
| 8 | 43,075 | F | 35 | 93,834 | −3 | 9.59E-07 | | LSC; IRA |
| 9 | 66,346 | F | 30 | 98,046 | −2 | 2.18E-05 | IGS | LSC; IRA |
| 11 | 86,539 | F | 30 | 147,378 | −3 | 6.09E-04 | | IRA; IRB |
| 12 | 90,121 | F | 30 | 90,157 | −1 | 5.00E-07 | | IRA |
| 13 | 96,885 | F | 39 | 118,107 | 0 | 2.12E-14 | IGS; | IRA; SSC |
| 14 | 105,777 | F | 30 | 105,809 | −2 | 2.18E-05 | IGS | IRA |
| 15 | 128,104 | F | 30 | 128,136 | −2 | 2.18E-05 | IGS | IRB |
| 16 | 8548 | I | 30 | 44,753 | −2 | 2.18E-05 | IGS | LSC |
| 17 | 29,662 | I | 30 | 29,881 | −2 | 2.18E-05 | IGS | LSC |
| 18 | 34,911 | I | 30 | 44,755 | −1 | 5.00E-07 | IGS | LSC |
| 19 | 43,070 | I | 41 | 137,019 | −1 | 1.63E-13 | | LSC; IRB |
| 20 | 43,075 | I | 35 | 140,074 | −3 | 9.59E-07 | | LSC; IRB |
| 21 | 66,346 | I | 30 | 135,867 | −2 | 2.18E-05 | IGS | LSC; IRB |
| 22 | 90,109 | I | 60 | 143,756 | −2 | 7.68E-23 | | IRA; IRB |
| 23 | 90,109 | I | 42 | 143,756 | −2 | 2.57E-12 | | IRA; IRB |
| 24 | 90,121 | I | 30 | 143,756 | −1 | 5.00E-07 | | IRA; IRB |
| 25 | 90,124 | I | 45 | 143,756 | 0 | 5.18E-18 | | IRA; IRB |
| 26 | 90,127 | I | 60 | 143,774 | −2 | 7.68E-23 | | IRA; IRB |
| 27 | 90,142 | I | 45 | 143,774 | 0 | 5.18E-18 | | IRA; IRB |
| 28 | 90,145 | I | 42 | 143,792 | −2 | 2.57E-12 | | IRA; IRB |
| 29 | 90,157 | I | 30 | 143,792 | −1 | 5.00E-07 | | IRA; IRB |
| 30 | 105,777 | I | 30 | 128,104 | −2 | 2.18E-05 | IGS | IRA; IRB |
| 31 | 105,809 | I | 30 | 128,136 | −2 | 2.18E-05 | IGS | IRA; IRB |
| 32 | 118,107 | I | 39 | 137,019 | 0 | 2.12E-14 | | SSC; IRB |
F: Forward; I: Inverted; IGS: intergenic spacer; CDS: protein-coding regions.
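To make the F and I labels concrete: a forward repeat is a second copy in the same orientation, whereas an inverted repeat matches the reverse complement of the first copy. The sketch below classifies a pair of equal-length segments on that basis. It is a simplified illustration, not the tool used to build the table: it assumes mismatches are counted as Hamming distance, and the default tolerance of 3 is taken from the largest mismatch value listed in the table. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("segments must have equal length")
    return sum(x != y for x, y in zip(a, b))


def classify_repeat(first, second, max_mismatch=3):
    """Return ("F", d) for a forward repeat, ("I", d) for an inverted repeat,
    or None if neither orientation is within max_mismatch (assumed tolerance)."""
    forward = hamming(first, second)
    inverted = hamming(first, reverse_complement(second))
    kind, distance = ("F", forward) if forward <= inverted else ("I", inverted)
    return (kind, distance) if distance <= max_mismatch else None


segment = "AAGGCTTACG"  # toy sequence, not taken from the genome
print(classify_repeat(segment, "AAGGCTTACC"))                 # same orientation, one mismatch
print(classify_repeat(segment, reverse_complement(segment)))  # reverse-complement copy
```

A real repeat finder also has to locate candidate pairs across the whole genome and assess their significance (the E-value column), which this sketch does not attempt.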
Simple sequence repeats in the A. annua chloroplast genome.
| cpSSR ID | Repeat Motif | Length (bp) | Start | End | Region | Annotation |
|---|---|---|---|---|---|---|
| 1 | (A)15 | 15 | 3204 | 3218 | LSC | |
| 2 | (A)14 | 14 | 3708 | 3721 | LSC | |
| 3 | (A)10 | 10 | 6121 | 6130 | LSC | |
| 4 | (T)10 | 10 | 9944 | 9953 | LSC | |
| 5 | (A)10 | 10 | 13,630 | 13,639 | LSC | |
| 6 | (A)12 | 12 | 20,826 | 20,837 | LSC | |
| 7 | (T)10 | 10 | 23,027 | 23,036 | LSC | |
| 8 | (A)11 | 11 | 26,289 | 26,299 | LSC | |
| 9 | (A)14 | 14 | 28,513 | 28,526 | LSC | |
| 10 | (A)11 | 11 | 39,312 | 39,322 | LSC | |
| 11 | (A)10 | 10 | 48,206 | 48,215 | LSC | |
| 12 | (AT)6 | 12 | 52,028 | 52,039 | LSC | |
| 13 | (T)14 | 14 | 53,085 | 53,098 | LSC | |
| 14 | (A)17 | 17 | 53,306 | 53,322 | LSC | |
| 15 | (A)19 | 19 | 54,902 | 54,920 | LSC | |
| 16 | (A)10 | 10 | 56,832 | 56,841 | LSC | |
| 17 | (A)14 | 14 | 57,920 | 57,933 | LSC | |
| 18 | (A)11 | 11 | 59,654 | 59,664 | LSC | |
| 19 | (T)10 | 10 | 59,775 | 59,784 | LSC | |
| 20 | (T)10 | 10 | 64,476 | 64,485 | LSC | |
| 21 | (T)10 | 10 | 64,902 | 64,911 | LSC | |
| 22 | (A)11 | 11 | 66,255 | 66,265 | LSC | |
| 23 | (T)10 | 10 | 69,525 | 69,534 | LSC | |
| 24 | (A)14 | 14 | 70,210 | 70,223 | LSC | |
| 25 | (T)10 | 10 | 71,655 | 71,664 | LSC | |
| 26 | (TA)6 | 12 | 72,640 | 72,651 | LSC | |
| 27 | (T)14 | 14 | 73,210 | 73,223 | LSC | |
| 28 | (A)15 | 15 | 80,929 | 80,943 | LSC | |
| 29 | (T)10 | 10 | 81,209 | 81,218 | LSC | |
| 30 | (T)11 | 11 | 101,234 | 101,244 | IRA | |
| 31 | (GAA)5 | 15 | 108,039 | 108,053 | SSC | |
| 32 | (TAA)5 | 15 | 117,240 | 117,254 | SSC | |
| 33 | (T)10 | 10 | 118,903 | 118,912 | SSC | |
| 34 | (A)14 | 14 | 121,936 | 121,949 | SSC | |
| 35 | (A)11 | 11 | 132,700 | 132,710 | IRB | |
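A table of this kind can be produced by scanning the sequence for tandemly repeated motifs. The sketch below uses regular-expression backreferences to find mono-, di-, and trinucleotide repeats and reports 1-based inclusive coordinates, so that length = end − start + 1 as in the table. The minimum repeat counts (10, 6, and 5) are an assumption inferred from the shortest entries listed above, not a documented setting of the original analysis, and the function name is hypothetical.

```python
import re

# Minimum number of repeat units per motif length (assumed from the table:
# the shortest listed entries are (A)10, (AT)6 and (GAA)5).
MIN_REPEATS = {1: 10, 2: 6, 3: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return a list of (motif, repeat_count, start, end) tuples,
    with 1-based inclusive coordinates, sorted by start position."""
    hits = []
    for unit, minimum in sorted(min_repeats.items()):
        # One motif of `unit` bases followed by at least (minimum - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, minimum - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if unit > 1 and len(set(motif)) == 1:
                continue  # "AA" or "AAA" repeats are mononucleotide runs
            count = len(match.group(0)) // unit
            hits.append((motif, count, match.start() + 1, match.end()))
    return sorted(hits, key=lambda hit: hit[2])


# Toy sequence containing one repeat of each class.
toy = "GC" + "A" * 12 + "CG" + "AT" * 6 + "GG" + "GAA" * 5 + "C"
for motif, count, start, end in find_ssrs(toy):
    print(f"({motif}){count}", end - start + 1, start, end)
```

On the toy sequence this reports (A)12 at 3 to 14, (AT)6 at 17 to 28, and (GAA)5 at 31 to 45.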
Figure 2. Comparison of five chloroplast genomes using mVISTA. Grey arrows and thick black lines above the alignment indicate gene orientation. Purple bars represent exons, blue bars represent UTRs, and pink bars represent conserved non-coding sequences (CNS). The Y-axis represents the percent identity (shown: 50–100%). Genome regions are color-coded as protein-coding exons, rRNAs, tRNAs, or CNS.
Figure 3. Comparison of the borders of the LSC, SSC, and IR regions among five chloroplast genomes. Ψ: pseudogenes, /: distance from the edge.
Figure 4. Maximum likelihood (ML) phylogenetic tree of 20 taxa of the Asteraceae clade, based on concatenated sequences of 50 chloroplast protein-coding genes. The position of Artemisia annua is indicated in block letters. Berberis bealei was set as the outgroup.