Complete Chloroplast Genome Sequence and Phylogenetic Analysis of the Medicinal Plant Artemisia annua.

Xiaofeng Shen, Mingli Wu, Baosheng Liao, Zhixiang Liu, Rui Bai, Shuiming Xiao, Xiwen Li, Boli Zhang, Jiang Xu, Shilin Chen.

Abstract

The complete chloroplast genome of Artemisia annua (Asteraceae), the primary source of artemisinin, was sequenced and analyzed. The A. annua cp genome is 150,955 bp, and harbors a pair of inverted repeat regions (IRa and IRb) of 24,850 bp each that separate large (LSC, 82,988 bp) and small (SSC, 18,267 bp) single-copy regions. Our annotation revealed that the A. annua cp genome contains 113 genes and 18 duplicated genes. The gene order in the SSC region of A. annua is inverted; this fact is consistent with the sequences of chloroplast genomes from three other Artemisia species. Fifteen (15) forward and seventeen (17) inverted repeats were detected in the genome. The existence of rich SSR loci in the genome suggests opportunities for future population genetics work on this anti-malarial medicinal plant. In A. annua cpDNA, the rps19 gene was found in the LSC region rather than the IR region, and the rps19 pseudogene was absent in the IR region. Sequence divergence analysis of five Asteraceae species indicated that the most highly divergent regions were found in the intergenic spacers, and that the differences between A. annua and A. fukudo were very slight. A phylogenetic analysis revealed a sister relationship between A. annua and A. fukudo. This study identified the unique characteristics of the A. annua cp genome. These results offer valuable information for future research on Artemisia species identification and for the selective breeding of A. annua with high pharmaceutical efficacy.

Keywords:  Artemisia annua; chloroplast genome; phylogeny

Year:  2017        PMID: 28800082      PMCID: PMC6152406          DOI: 10.3390/molecules22081330

Source DB:  PubMed          Journal:  Molecules        ISSN: 1420-3049            Impact factor:   4.411


1. Introduction

Artemisia annua, an herbaceous annual with a strong volatile aroma, belongs to the genus Artemisia (Asteraceae). It is the sole natural source of the antimalarial drug artemisinin [1], and is cultivated as a high-value medicinal plant (Qing hao). Anti-malarial artemisinin combination therapy (ACT) has received strong interest from the global health community because of the efficacy of artemisinin and its derivatives [2]. Furthermore, the 2015 Nobel Prize for Physiology or Medicine was awarded to Professor Youyou Tu for the discovery of artemisinin [3]. However, there are concerns that the production of high-quality artemisinin may not be sufficient to meet future demand [2]. A. annua has a broad, global distribution and has many distinct locally-adapted ecotypes [4]. Beyond China, A. annua is also present in Eastern Europe, North America, and elsewhere in Asia [5]. However, the artemisinin content of A. annua ecotypes varies widely from region to region [5]. With the exception of a few rare high-artemisinin ecotypes found in China, the artemisinin content of A. annua ecotypes is generally insufficient (i.e., <1%) for commercialized extraction [6], and no other species has been found to be suitable for mass production of artemisinin [1,7]. Oxygen released from chloroplasts in A. annua can upregulate the expression of genes involved in artemisinin biosynthesis, and can also catalyze artemisinin synthesis from dihydroartemisinin [8,9]. In addition to their role in photosynthesis, chloroplasts are also involved in cytoplasmic male sterility (CMS) [10] and secondary metabolic activities [11]. The chloroplast (cp) genome has a conserved quadripartite structure: a large single-copy (LSC) region, a small single-copy (SSC) region, and two inverted repeat (IR) regions. The majority of angiosperm cp genomes exhibit significant conservation of gene order and contents [12].
However, large-scale genome rearrangements and intron gains and losses have been identified in several angiosperm lineages [13,14,15]. A draft cp genome assembly for A. annua is of great importance for exploring putative links between A. annua’s chloroplast function and its adaptability and phytochemical characteristics. The transcriptome sequences and genetic map of A. annua have been previously reported [16,17,18], but little is known about its cp genomic structure. Here we report the complete chloroplast genome sequence of A. annua, along with a characterization of long repeats and SSRs, and comparative analyses of the cp genome as a whole. Comparative analyses among cp genomes of other Asteraceae species revealed significant variation in genome size, highly divergent regions in intergenic spacers, as well as gene loss. Comprehensive cp genomic analyses will help to identify Artemisia species, provide insight into its evolutionary history, and improve the development of A. annua as a pharmacological resource [19,20].

2. Results and Discussion

2.1. Characteristics of A. annua cpDNA

The complete cp genome of A. annua is 150,955 bp in size, with a pair of IR regions of 24,850 bp that separate an LSC region of 82,988 bp from an SSC region of 18,267 bp (Table 1 and Figure 1). The overall GC and AT content of the A. annua cp genome is 37.5% and 62.5%, respectively, which is similar to the cp genomes of other Asteraceae spp. [21,22,23]. The IR regions possess higher GC content (43%) than do the LSC (35.5%) or SSC regions (30.8%) (Table 1). Within the protein-coding regions (CDS), the AT content of the first, second, and third codon positions is 54.6%, 62.4%, and 70.0%, respectively (Table 1). The bias toward a higher AT representation at the third codon position has been found to be common in other plant cp genomes [15,24], and this bias is used to discriminate cpDNA from nuclear and mitochondrial DNA [25]. The coding regions constitute 52.6% of the genome, and therefore the non-coding regions (including introns, pseudogenes, and intergenic spacers) account for 47.4%.
Table 1

Base composition in the A. annua chloroplast genome.

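| Region | T (U) (%) | C (%) | A (%) | G (%) | Length (bp) |
|---|---|---|---|---|---|
| LSC | 32.4 | 17.5 | 32.1 | 18.0 | 82,988 |
| SSC | 34.2 | 16.1 | 35.0 | 14.7 | 18,267 |
| IRA | 28.5 | 20.8 | 28.3 | 22.3 | 24,850 |
| IRB | 28.3 | 22.3 | 28.5 | 20.8 | 24,850 |
| Total | 31.3 | 18.7 | 31.2 | 18.8 | 150,955 |
| CDS | 31.6 | 17.6 | 30.7 | 20.1 | 79,335 |
| 1st position | 24.0 | 18.9 | 30.6 | 26.7 | 26,445 |
| 2nd position | 33.0 | 20.2 | 29.4 | 17.7 | 26,445 |
| 3rd position | 38.0 | 13.8 | 32.0 | 16.0 | 26,445 |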

CDS: protein-coding regions.
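The figures in Table 1 can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the original analysis: the dictionary layout and the function names (`gc_percent`, `at_percent`, `summed_region_length`) are illustrative assumptions, and the numbers are transcribed from Table 1.

```python
# Rows transcribed from Table 1: (T %, C %, A %, G %, length in bp).
TABLE1 = {
    "LSC": (32.4, 17.5, 32.1, 18.0, 82_988),
    "SSC": (34.2, 16.1, 35.0, 14.7, 18_267),
    "IRA": (28.5, 20.8, 28.3, 22.3, 24_850),
    "IRB": (28.3, 22.3, 28.5, 20.8, 24_850),
    "Total": (31.3, 18.7, 31.2, 18.8, 150_955),
    "1st position": (24.0, 18.9, 30.6, 26.7, 26_445),
    "2nd position": (33.0, 20.2, 29.4, 17.7, 26_445),
    "3rd position": (38.0, 13.8, 32.0, 16.0, 26_445),
}


def gc_percent(row):
    """GC content as the sum of the C and G percentages."""
    _t, c, _a, g, _length = row
    return round(c + g, 1)


def at_percent(row):
    """AT content as the sum of the T and A percentages."""
    t, _c, a, _g, _length = row
    return round(t + a, 1)


def summed_region_length(table):
    """Total genome length implied by the four structural regions."""
    return sum(table[name][4] for name in ("LSC", "SSC", "IRA", "IRB"))


if __name__ == "__main__":
    print("LSC + SSC + IRA + IRB =", summed_region_length(TABLE1), "bp")
    print("Genome GC % =", gc_percent(TABLE1["Total"]))
    for position in ("1st position", "2nd position", "3rd position"):
        print(position, "AT % =", at_percent(TABLE1[position]))
```

The four region lengths sum to the total given in Table 1, and the per-position AT values reproduce the 54.6%, 62.4%, and 70.0% quoted in the text.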

Figure 1

Gene map of the A. annua chloroplast genome. Genes drawn inside the circle are transcribed clockwise, and those outside are counterclockwise. Genes belonging to different functional groups are color-coded. The darker gray in the inner circle corresponds to GC content, while the lighter gray corresponds to AT content.

The A. annua cp genome encodes 113 predicted functional genes, including 80 protein-coding genes, 29 tRNA genes, and four rRNA genes (Table S1). In addition, there are 18 genes duplicated in the IR, making a total of 131 genes present in the A. annua cp genome (Figure 1). These genes have also been observed in Artemisia frigida [26]. Among these genes, seven protein-coding, seven tRNA, and all four rRNA genes are duplicated in the IR regions. The LSC region contains 62 protein-coding and 22 tRNA genes, whereas the SSC region contains one tRNA gene and 12 protein-coding genes. Based on the sequences of protein-coding and tRNA genes, the frequency of codon usage was estimated for the A. annua cp genome and is summarized in Table 2. Together, all protein-coding genes in the A. annua cp genome are encoded by 26,445 codons. Among these, leucine, with 2853 (10.8%) of the codons, is the most frequent amino acid in the cp genome, and cysteine, with 293 (1.1%), is the least frequent (Table 2). A- and U-ending codons were common. Except for UUG (Leu, decoded by trnL-CAA), all preferred synonymous codons (RSCU > 1) ended with A or U.
Table 2

Codon-anticodon recognition patterns and codon usage of the A. annua chloroplast genome.

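| Amino Acid | Codon | No. | RSCU | tRNA | Amino Acid | Codon | No. | RSCU | tRNA |
|---|---|---|---|---|---|---|---|---|---|
| Phe | UUU | 993 | 1.32 | | Tyr | UAU | 811 | 1.64 | |
| Phe | UUC | 510 | 0.68 | trnF-GAA | Tyr | UAC | 178 | 0.36 | trnY-GUA |
| Leu | UUA | 890 | 1.87 | | Stop | UAA | 52 | 1.77 | |
| Leu | UUG | 579 | 1.22 | trnL-CAA | Stop | UAG | 21 | 0.72 | |
| Leu | CUU | 622 | 1.31 | | His | CAU | 471 | 1.51 | |
| Leu | CUC | 198 | 0.42 | | His | CAC | 151 | 0.49 | trnH-GUG |
| Leu | CUA | 368 | 0.77 | | Gln | CAA | 732 | 1.52 | trnQ-UUG |
| Leu | CUG | 196 | 0.41 | | Gln | CAG | 230 | 0.48 | |
| Ile | AUU | 1092 | 1.47 | | Asn | AAU | 1017 | 1.56 | |
| Ile | AUC | 433 | 0.58 | trnI-CAU | Asn | AAC | 287 | 0.44 | |
| Ile | AUA | 706 | 0.95 | | Lys | AAA | 1042 | 1.47 | |
| Met | AUG | 633 | 1.00 | trnM-CAU | Lys | AAG | 371 | 0.53 | |
| Val | GUU | 512 | 1.44 | | Asp | GAU | 868 | 1.61 | |
| Val | GUC | 174 | 0.49 | trnV-GAC | Asp | GAC | 213 | 0.39 | trnD-GUC |
| Val | GUA | 546 | 1.54 | | Glu | GAA | 1001 | 1.50 | trnE-UUC |
| Val | GUG | 188 | 0.53 | | Glu | GAG | 337 | 0.50 | |
| Ser | UCU | 588 | 1.74 | | Cys | UGU | 202 | 1.38 | |
| Ser | UCC | 324 | 0.96 | trnS-GGA | Cys | UGC | 91 | 0.62 | trnC-GCA |
| Ser | UCA | 417 | 1.23 | trnS-UGA | Stop | UGA | 15 | 0.51 | |
| Ser | UCG | 167 | 0.49 | | Trp | UGG | 462 | 1.00 | trnW-CCA |
| Pro | CCU | 441 | 1.58 | | Arg | CGU | 350 | 1.33 | trnR-ACG |
| Pro | CCC | 188 | 0.67 | | Arg | CGC | 107 | 0.41 | |
| Pro | CCA | 329 | 1.18 | trnP-UGG | Arg | CGA | 343 | 1.30 | |
| Pro | CCG | 159 | 0.57 | | Arg | CGG | 124 | 0.47 | |
| Thr | ACU | 535 | 1.63 | | Arg | AGA | 485 | 1.84 | trnR-UCU |
| Thr | ACC | 246 | 0.75 | trnT-GGU | Arg | AGG | 174 | 0.66 | |
| Thr | ACA | 411 | 1.25 | trnT-UGU | Ser | AGU | 410 | 1.21 | |
| Thr | ACG | 124 | 0.38 | | Ser | AGC | 122 | 0.36 | trnS-GCU |
| Ala | GCU | 617 | 1.74 | | Gly | GGU | 589 | 1.32 | |
| Ala | GCC | 228 | 0.64 | | Gly | GGC | 189 | 0.42 | trnG-GCC |
| Ala | GCA | 415 | 1.17 | | Gly | GGA | 707 | 1.58 | |
| Ala | GCG | 158 | 0.45 | | Gly | GGG | 306 | 0.68 | |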

RSCU: Relative Synonymous Codon Usage.
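The RSCU values in Table 2 follow from the codon counts: a codon's RSCU is its observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below is illustrative only (the paper computed these values with MEGA5.2). The function name `rscu` and the input layout are assumptions, and the counts are transcribed from Table 2.

```python
from collections import defaultdict


def rscu(counts, codon_to_aa):
    """Relative synonymous codon usage.

    counts: codon -> observed count. Every synonymous codon of an amino
            acid must be present (use 0 for unused codons), because the
            number of synonyms is taken from the keys supplied.
    codon_to_aa: codon -> amino acid label.
    """
    totals = defaultdict(int)  # total codons observed per amino acid
    n_syn = defaultdict(int)   # number of synonymous codons per amino acid
    for codon, n in counts.items():
        aa = codon_to_aa[codon]
        totals[aa] += n
        n_syn[aa] += 1
    return {
        codon: n * n_syn[codon_to_aa[codon]] / totals[codon_to_aa[codon]]
        for codon, n in counts.items()
    }


# Leucine and cysteine counts from Table 2.
COUNTS = {
    "UUA": 890, "UUG": 579, "CUU": 622, "CUC": 198, "CUA": 368, "CUG": 196,
    "UGU": 202, "UGC": 91,
}
CODON_TO_AA = {codon: "Leu" for codon in ("UUA", "UUG", "CUU", "CUC", "CUA", "CUG")}
CODON_TO_AA.update({"UGU": "Cys", "UGC": "Cys"})

if __name__ == "__main__":
    for codon, value in rscu(COUNTS, CODON_TO_AA).items():
        print(codon, CODON_TO_AA[codon], round(value, 2))
```

For example, UUA gives 890 × 6 / 2853 ≈ 1.87, matching the tabulated value, and UUG (≈ 1.22) is the one preferred codon that ends in G.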

In total, there are 17 intron-containing genes, 15 of which (nine protein-coding and six tRNA genes) contain one intron, and two of which (ycf3 and clpP) contain two introns (Table 3). The trnK-UUU gene has the largest intron (1860 bp), which itself contains the matK gene. The rps12 gene is a trans-spliced gene with the 5′ end located in the LSC region and the duplicated 3′ ends in the IR regions. Ycf3 is required for the stable accumulation of the photosystem I complex [27,28]. The intron gain in ycf3 of A. annua may be useful for further studies of the mechanism of photosynthesis evolution, and of variation in singlet oxygen released by chloroplasts in Artemisia.
Table 3

The length of exons and introns in genes with introns in the A. annua chloroplast genome.

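| Gene | Location | Exon I (bp) | Intron I (bp) | Exon II (bp) | Intron II (bp) | Exon III (bp) |
|---|---|---|---|---|---|---|
| trnK-UUU | LSC | 37 | 1860 | 35 | | |
| trnG-UCC | LSC | 23 | 729 | 47 | | |
| trnL-UAA | LSC | 37 | 424 | 50 | | |
| trnV-UAC | LSC | 38 | 572 | 37 | | |
| trnI-GAU | IR | 42 | 777 | 35 | | |
| trnA-UGC | IR | 38 | 812 | 35 | | |
| rps12 * | LSC | 232 | 535 | 26 | | 114 |
| rps16 | LSC | 40 | 876 | 185 | | |
| rpl16 | LSC | 9 | 1015 | 399 | | |
| rpl2 | IR | 394 | 626 | 470 | | |
| rpoC1 | LSC | 430 | 734 | 1640 | | |
| ndhA | SSC | 556 | 1064 | 539 | | |
| ndhB | IR | 777 | 670 | 756 | | |
| ycf3 | LSC | 127 | 700 | 230 | 735 | 153 |
| petB | LSC | 6 | 747 | 642 | | |
| atpF | LSC | 145 | 699 | 410 | | |
| clpP | LSC | 71 | 796 | 292 | 606 | 228 |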

* The rps12 gene is a trans-spliced gene with the 5′ end located in the LSC region and the duplicated 3′ ends in the IR regions.

Introns may contain “old code”, i.e., the part of a gene that loses its function during evolution. Several unicellular eukaryotes seem to experience selective pressures to lose introns. Therefore, intron gain and intron loss both require an evolutionary explanation. A common partial explanation for the range of intron densities is the random accumulation of introns in nuclear genomes over time after inheritance from an intron-poor ancestor. More experimental evidence is required to reveal whether the variation of the introns in the A. annua cp genome is related to adaptation to environmental stresses, or whether it facilitates artemisinin biosynthesis.

2.2. Long Repeat and SSR Analysis

For repeat structure analysis, 15 forward and 17 inverted repeats were detected in the A. annua cp genome (Table 4). Most of these repeats show lengths between 30 and 39 bp, while the ycf2 gene possesses the two longest inverted repeats at 60 bp. Two repeats relevant to psa genes (No. 4 and 5) and three forward and three inverted repeats (No. 1–3, No. 16–18) in the intergenic spacers are distributed in the LSC region. Moreover, two forward and eight inverted repeats (No. 11 and 12, No. 22–29) associated with ycf2, two forward and two inverted repeats (No. 14 and 15, No. 31 and 32) in the intergenic spacers, are distributed in the IR region.
Table 4

Long repeat sequences in the A. annua chloroplast genome.

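| ID | Repeat Start 1 | Type | Size (bp) | Repeat Start 2 | Mismatch (bp) | E-Value | Gene | Region |
|---|---|---|---|---|---|---|---|---|
| 1 | 8544 | F | 32 | 34,909 | −3 | 4.65E-05 | IGS | LSC |
| 2 | 28,063 | F | 31 | 29,661 | −3 | 1.69E-04 | IGS | LSC |
| 3 | 28,070 | F | 30 | 29,666 | −2 | 2.18E-05 | IGS | LSC |
| 4 | 38,054 | F | 32 | 40,278 | −2 | 1.55E-06 | psaB; psaA | LSC |
| 5 | 38,065 | F | 30 | 40,289 | −3 | 6.09E-04 | psaB; psaA | LSC |
| 6 | 43,070 | F | 41 | 96,883 | −1 | 1.63E-13 | ycf3 (intron); IGS | LSC; IRA |
| 7 | 43,072 | F | 39 | 118,107 | −1 | 2.48E-12 | ycf3 (intron); ndhA (intron) | LSC; SSC |
| 8 | 43,075 | F | 35 | 93,834 | −3 | 9.59E-07 | ycf3 (intron); ndhB (intron) | LSC; IRA |
| 9 | 66,346 | F | 30 | 98,046 | −2 | 2.18E-05 | IGS | LSC; IRA |
| 11 | 86,539 | F | 30 | 147,378 | −3 | 6.09E-04 | ycf2 | IRA; IRB |
| 12 | 90,121 | F | 30 | 90,157 | −1 | 5.00E-07 | ycf2 | IRA |
| 13 | 96,885 | F | 39 | 118,107 | 0 | 2.12E-14 | IGS; ndhA (intron) | IRA; SSC |
| 14 | 105,777 | F | 30 | 105,809 | −2 | 2.18E-05 | IGS | IRA |
| 15 | 128,104 | F | 30 | 128,136 | −2 | 2.18E-05 | IGS | IRB |
| 16 | 8548 | I | 30 | 44,753 | −2 | 2.18E-05 | IGS | LSC |
| 17 | 29,662 | I | 30 | 29,881 | −2 | 2.18E-05 | IGS | LSC |
| 18 | 34,911 | I | 30 | 44,755 | −1 | 5.00E-07 | IGS | LSC |
| 19 | 43,070 | I | 41 | 137,019 | −1 | 1.63E-13 | ycf3 (intron); IGS | LSC; IRB |
| 20 | 43,075 | I | 35 | 140,074 | −3 | 9.59E-07 | ycf3 (intron); ndhB (intron) | LSC; IRB |
| 21 | 66,346 | I | 30 | 135,867 | −2 | 2.18E-05 | IGS | LSC; IRB |
| 22 | 90,109 | I | 60 | 143,756 | −2 | 7.68E-23 | ycf2 | IRA; IRB |
| 23 | 90,109 | I | 42 | 143,756 | −2 | 2.57E-12 | ycf2 | IRA; IRB |
| 24 | 90,121 | I | 30 | 143,756 | −1 | 5.00E-07 | ycf2 | IRA; IRB |
| 25 | 90,124 | I | 45 | 143,756 | 0 | 5.18E-18 | ycf2 | IRA; IRB |
| 26 | 90,127 | I | 60 | 143,774 | −2 | 7.68E-23 | ycf2 | IRA; IRB |
| 27 | 90,142 | I | 45 | 143,774 | 0 | 5.18E-18 | ycf2 | IRA; IRB |
| 28 | 90,145 | I | 42 | 143,792 | −2 | 2.57E-12 | ycf2 | IRA; IRB |
| 29 | 90,157 | I | 30 | 143,792 | −1 | 5.00E-07 | ycf2 | IRA; IRB |
| 30 | 105,777 | I | 30 | 128,104 | −2 | 2.18E-05 | IGS | IRA; IRB |
| 31 | 105,809 | I | 30 | 128,136 | −2 | 2.18E-05 | IGS | IRA; IRB |
| 32 | 118,107 | I | 39 | 137,019 | 0 | 2.12E-14 | ndhA (intron); rps12 (CDS) | SSC; IRB |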

F: Forward; I: Inverted; IGS: intergenic spacer; CDS: protein-coding regions.
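The Region column of Table 4 can be reproduced from the repeat start coordinates and the region lengths in Table 1. The sketch below rests on two assumptions that the text does not state explicitly but that are consistent with the tabulated coordinates: positions are 1-based starting at the first base of the LSC, and the regions occur in the order LSC, IRA, SSC, IRB. The names `REGIONS` and `region_of` are illustrative.

```python
# Region order and 1-based coordinate origin are assumptions inferred from
# the coordinates in the tables; the lengths come from Table 1.
REGIONS = [("LSC", 82_988), ("IRA", 24_850), ("SSC", 18_267), ("IRB", 24_850)]


def region_of(position):
    """Return the name of the region containing a 1-based genome position."""
    start = 1
    for name, length in REGIONS:
        end = start + length - 1
        if start <= position <= end:
            return name
        start = end + 1
    raise ValueError(f"position {position} lies outside the genome")


if __name__ == "__main__":
    # Repeat No. 6 in Table 4: starts at 43,070 and 96,883.
    print(region_of(43_070), region_of(96_883))
    # Repeat No. 32 in Table 4: starts at 118,107 and 137,019.
    print(region_of(118_107), region_of(137_019))
```

Under these assumptions the IRA spans positions 82,989 to 107,838, the SSC spans 107,839 to 126,105, and the IRB spans 126,106 to 150,955.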

SSRs, also known as microsatellites, are short (1–6 bp), tandemly repeated DNA sequences that are widely distributed throughout the genome. cpSSRs, which are uniparentally inherited, have been widely employed in analyses of plant population structure, diversity, differentiation, and maternity [29,30,31]. Here, the distribution of SSRs was analyzed for the A. annua cp genome, and 35 SSRs were identified, most of them in the LSC region. These included 31 mononucleotide SSRs (88.57%), two dinucleotide SSRs (5.71%), and two trinucleotide SSRs (5.71%) (Table 5). Sixteen of the 35 SSR loci were found in intergenic regions, while the other 19 were located in genes. All 31 mononucleotide SSRs belonged to the A/T type. Our results are consistent with the hypothesis that cpSSRs are generally composed of short polyadenine (polyA) or polythymine (polyT) repeats and rarely contain tandem guanine (G) or cytosine (C) repeats. Thus, these SSRs contribute to the AT richness of cp genomes. cpSSRs have been important resources for the study of economically important plants and their relatives. Furthermore, cpSSRs have considerable potential to offer unique insights into species identification, genetic diversity, and evolutionary processes in wild plant species [32]. Our results provide cpSSR markers that can be used to examine genetic diversity in A. annua and related species, and may offer an efficient means of selecting germplasm with high anti-malarial pharmaceutical efficacy.
Table 5

Simple sequence repeats in the A. annua chloroplast genome.

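| cpSSR ID | Repeat Motif | Length (bp) | Start | End | Region | Annotation |
|---|---|---|---|---|---|---|
| 1 | (A)15 | 15 | 3204 | 3218 | LSC | matK |
| 2 | (A)14 | 14 | 3708 | 3721 | LSC | |
| 3 | (A)10 | 10 | 6121 | 6130 | LSC | |
| 4 | (T)10 | 10 | 9944 | 9953 | LSC | |
| 5 | (A)10 | 10 | 13,630 | 13,639 | LSC | rpoB |
| 6 | (A)12 | 12 | 20,826 | 20,837 | LSC | rpoC2 |
| 7 | (T)10 | 10 | 23,027 | 23,036 | LSC | rpoC2 |
| 8 | (A)11 | 11 | 26,289 | 26,299 | LSC | atpH |
| 9 | (A)14 | 14 | 28,513 | 28,526 | LSC | atpA |
| 10 | (A)11 | 11 | 39,312 | 39,322 | LSC | psaA |
| 11 | (A)10 | 10 | 48,206 | 48,215 | LSC | |
| 12 | (AT)6 | 12 | 52,028 | 52,039 | LSC | |
| 13 | (T)14 | 14 | 53,085 | 53,098 | LSC | atpB |
| 14 | (A)17 | 17 | 53,306 | 53,322 | LSC | atpB |
| 15 | (A)19 | 19 | 54,902 | 54,920 | LSC | rbcL |
| 16 | (A)10 | 10 | 56,832 | 56,841 | LSC | |
| 17 | (A)14 | 14 | 57,920 | 57,933 | LSC | accD |
| 18 | (A)11 | 11 | 59,654 | 59,664 | LSC | ycf4 |
| 19 | (T)10 | 10 | 59,775 | 59,784 | LSC | ycf4 |
| 20 | (T)10 | 10 | 64,476 | 64,485 | LSC | |
| 21 | (T)10 | 10 | 64,902 | 64,911 | LSC | |
| 22 | (A)11 | 11 | 66,255 | 66,265 | LSC | |
| 23 | (T)10 | 10 | 69,525 | 69,534 | LSC | |
| 24 | (A)14 | 14 | 70,210 | 70,223 | LSC | |
| 25 | (T)10 | 10 | 71,655 | 71,664 | LSC | psbB |
| 26 | (TA)6 | 12 | 72,640 | 72,651 | LSC | psbB |
| 27 | (T)14 | 14 | 73,210 | 73,223 | LSC | psbN |
| 28 | (A)15 | 15 | 80,929 | 80,943 | LSC | |
| 29 | (T)10 | 10 | 81,209 | 81,218 | LSC | |
| 30 | (T)11 | 11 | 101,234 | 101,244 | IRA | |
| 31 | (GAA)5 | 15 | 108,039 | 108,053 | SSC | ndhF |
| 32 | (TAA)5 | 15 | 117,240 | 117,254 | SSC | ndhI |
| 33 | (T)10 | 10 | 118,903 | 118,912 | SSC | |
| 34 | (A)14 | 14 | 121,936 | 121,949 | SSC | ycf1 |
| 35 | (A)11 | 11 | 132,700 | 132,710 | IRB | |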
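To illustrate how perfect SSRs of the kind listed in Table 5 can be located, the sketch below scans a sequence with regular expressions. It is not the MISA implementation used in the paper. The minimum repeat counts (10 for mono-, 6 for di-, and 5 for trinucleotide motifs) are assumptions inferred from the shortest entries in Table 5, since the paper does not state its thresholds, and `find_ssrs` is a hypothetical helper name.

```python
import re

# Assumed thresholds (motif length -> minimum number of repeats), inferred
# from the shortest SSRs in Table 5; the paper does not state them.
MIN_REPEATS = {1: 10, 2: 6, 3: 5}


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Find perfect tandem repeats.

    Returns a list of (motif, repeat_count, start, end) tuples with
    1-based, inclusive coordinates, sorted by start position.
    """
    sequence = sequence.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA"),
            # which are already reported under the shorter motif length.
            if motif in (motif + motif)[1:-1]:
                continue
            repeats = len(match.group(0)) // unit_len
            hits.append((motif, repeats, match.start() + 1, match.end()))
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    # Toy sequence for demonstration only, not A. annua data.
    toy = "CCGT" + "A" * 15 + "CGC" + "AT" * 6 + "GGC" + "GAA" * 5 + "TC"
    for motif, repeats, start, end in find_ssrs(toy):
        print(f"({motif}){repeats}\t{start}\t{end}")
```

Reporting coordinates as 1-based and inclusive mirrors the Start and End columns of Table 5, where End − Start + 1 equals the SSR length.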

2.3. Comparative Chloroplast Genomic Analysis

The whole cp genome sequence of A. annua was compared to those of Artemisia fukudo, Lactuca sativa, Jacobaea vulgaris, and Cynara cornigera. The cp genome of A. annua is the second smallest among the five completed Asteraceae cp genomes. It is larger than that of J. vulgaris (150,689 bp) (Table S2), but smaller than the cp genomes of A. fukudo, C. cornigera, and L. sativa by 56 bp, 1595 bp, and 1817 bp, respectively. A. annua has the smallest SSC region (18,267 bp) among these sequenced Asteraceae cp genomes. The next smallest SSC region is from J. vulgaris, with a size of 18,276 bp. There are no significant differences in the lengths of the SSC or IR regions among these genomes, so variation in the length of the LSC region is the main source of the differences in total genome size. Comparative genome analysis [33] permits the examination of how DNA sequences diverge among related species. The whole sequence identity of the five Asteraceae cp genomes was plotted using mVISTA, with the annotated A. annua cp genome as a reference (Figure 2). The comparison shows that the two IR regions are less divergent than the LSC and SSC regions. In addition, the coding regions are more conserved than the non-coding regions, and the highly divergent regions among the five cp genomes occur in the intergenic spacers, including trnH-psbA, psbM-petN, trnC-GCA-petN, trnE-UUC-rpoB, trnY-GUA-trnE-UUC, trnV-UAC-ndhC, rbcL-accD, accD-psaI, and rpl32-trnL-UAG in LSC, as well as ndhI-ndhG and ycf1-rps15 in SSC. Similar results have been observed in other plant cp genomes [21,34]. Moreover, the most divergent coding regions among the five Asteraceae cp genomes are the ndhF, ycf1, and ycf2 genes. However, there is only a very slight difference between A. annua and A. fukudo. In our study, we observed that all eight rRNA genes are highly conserved.
Figure 2

Comparison of five chloroplast genomes using mVISTA. Grey arrows and thick black lines above the alignment indicate gene orientation. Purple bars represent exons, blue bars represent UTRs, and pink bars represent conserved non-coding sequences (CNS). The y-axis represents percent identity (shown: 50–100%). Genome regions are color-coded as protein-coding exons, rRNAs, tRNAs, or conserved non-coding sequences (CNS).
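The size ranking given in Section 2.3 can be reconstructed from the stated offsets. In the sketch below, only the A. annua total (Table 1), the J. vulgaris length, and the three differences come from the text; the other genome lengths are derived by addition rather than quoted, and the variable names are illustrative.

```python
A_ANNUA_BP = 150_955  # total length from Table 1

# Differences stated in Section 2.3: how much larger each genome is than A. annua.
LARGER_THAN_A_ANNUA_BY = {"A. fukudo": 56, "C. cornigera": 1595, "L. sativa": 1817}

sizes = {species: A_ANNUA_BP + diff for species, diff in LARGER_THAN_A_ANNUA_BY.items()}
sizes["A. annua"] = A_ANNUA_BP
sizes["J. vulgaris"] = 150_689  # quoted directly in the text

# Species ordered from smallest to largest cp genome.
ranked = sorted(sizes, key=sizes.get)

if __name__ == "__main__":
    for species in ranked:
        print(f"{species}\t{sizes[species]} bp")
```

With these inputs A. annua falls second in the ranking, after J. vulgaris, in agreement with the statement that it is the second smallest of the five genomes.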

2.4. IR Contraction and Expansion in the A. annua cp Genome

Although IRs are the most conserved regions of the cp genomes, contraction and expansion at the borders of IR regions are common evolutionary events, and are hypothesized to explain size differences between cp genomes [35,36]. Detailed comparisons of the IR-SSC and IR-LSC boundaries among four Asteraceae cp genomes (Artemisia annua, Artemisia fukudo, Artemisia frigida, and Artemisia montana) are presented in Figure 3. The IRb/SSC border is generally positioned between the ycf1 pseudogene and the ndhF gene. The ycf1 pseudogene has proven to be useful for analyzing cp genome variation in higher plants and algae [37]. The ndhF gene, related to photosynthesis, was found to be 56 bp, 58 bp, 60 bp, and 75 bp away from the IRb/SSC border in A. montana, A. annua, A. fukudo, and A. frigida, respectively. However, some unique structural differences exist in the A. annua cp genome: the trnH gene is present at the longest distance (114 bp) from the LSC edge; the rps19 pseudogene is absent in A. annua due to the contraction of the borders of the IR regions; and the rps19 gene is located in the LSC region due to the expansion of the LSC. It has been reported that the rps19 gene is one of the most abundant transcripts in the chloroplast genome [38]. The IR/LSC boundaries are not static among the cp genomes of Artemisia species, but shift through modest expansions and contractions, which is similar to what has been found in other plants [39].
Figure 3

Comparison of the borders of the LSC, SSC, and IR regions among five chloroplast genomes. Ψ: pseudogenes, /: distance from the edge.

The comparison of cp genome size among examined Asteraceae species is displayed in Table S3. The length of the IR (24,850 bp) in A. annua is 106 bp smaller than that of A. fukudo, 122 bp smaller than that of A. frigida, and 109 bp smaller than that of A. montana. These differences may be related to the loss of rps19 and rps19 pseudogenes in A. annua IR regions. However, there are no significant differences in the length of the whole cp genome among the four Asteraceae cp genomes. The cp genome of A. annua (150,955 bp) is 56 bp smaller than that of A. fukudo, 121 bp smaller than that of A. frigida, and 175 bp smaller than that of A. montana. Non-functional DNA is rapidly deleted, resulting in the failure of pseudogenes to accumulate, which is the likely cause of this variation. Pairwise cp genomic alignment between A. annua and the three Artemisia cp genomes (A. frigida, A. fukudo, and A. montana) revealed a high degree of synteny (Figures S1–S3). Previous work had reported that the cp genome of A. frigida had two inversion events in the LSC region, and at least one re-inversion event in the SSC [26]. Our results suggest that A. annua has similar sequence rearrangements. To further confirm the accuracy of the assembly and the gene order of the SSC in A. annua, four primers were designed to amplify the junctions of IRs and the LSC/SSC. These primers would create an amplicon by PCR amplification, which could then be analyzed via Sanger sequencing using the primers listed in Table S4. The inversion and re-inversion events in A. annua suggest that the SSC may be an active region for sequence rearrangements in plant cp genomes. Outside the Asteraceae [40,41], other angiosperms have been found to have an inverted SSC region, including Piper cenocladum [42], Dioscorea elephantipes, and Chloranthus spicatus [43]. 
Although chloroplast gene order is generally conserved in land plant genomes [44], many sequence rearrangements have been reported in cp genomes from a wide variety of different plant species, including inversions in the LSC region [45,46,47], IR contraction or expansions with inversions [48], and re-inversion in the SSC region. It has been proposed that sequence rearrangements in cp genomes are caused by intramolecular recombination events [49]. Sequence rearrangements that alter cp genome structure in related species may also provide genetic diversity information that can be used for molecular classification and evolution studies.

2.5. Phylogenetic Analysis

A. annua belongs to the tribe Anthemideae in the Asteraceae. Several studies have reported analyses of the phylogenetic relationships within the Asteraceae based on chloroplast coding or non-coding sequences [50,51]. The availability of a completed A. annua cp genome provides us with sequence information that can be used to study the molecular evolution and phylogeny of A. annua. We performed multiple sequence alignments using 50 protein-coding genes commonly present in cp genome sequences in 20 Asteraceae species. One additional cp genome, Berberis bealei (Berberidaceae), was included as an outgroup (Figure 4). Under the GTR + G + I nucleotide substitution model recommended by Jmodeltest, the ML phylogenetic results strongly supported (100% bootstrap value) the hypothesis that A. annua is the sister of the closely related species Artemisia fukudo. Furthermore, we hypothesize that Artemisia fukudo may have similar phytochemical properties [52].
Figure 4

ML phylogenetic tree reconstruction of 20 taxa of the Asteraceae clade based on concatenated sequences from 50 chloroplast protein-coding genes. The position of Artemisia annua is indicated in block letters. Berberis bealei was set as the outgroup.

3. Materials and Methods

3.1. DNA Sequencing, cp Genome Assembly, and Validation

Fresh A. annua leaves were collected from tissue-cultured seedlings. Total DNA was extracted from approximately 10 g of fresh leaf tissue using the modified CTAB method [53]. The DNA concentration for each sample was estimated by measuring A260 using an ND-2000 spectrometer [54] (Nanodrop Technologies, Wilmington, DE, USA), and visual quality was assessed using agarose gel electrophoresis. Pure DNA was used to construct shotgun libraries (250 bp) according to the manufacturer’s instructions. Sequencing was performed on an Illumina Hiseq 1500 platform (San Diego, CA, USA). This resulted in approximately 100 Gb of data. First, raw reads were trimmed by Fastqc. Next, we performed BLAST searches between trimmed reads and reference sequences (Artemisia frigida) to extract cp-like reads [55]. Finally, the cp-like reads were used for sequence assembly with SOAPdenovo [56]. Sequence extension was executed using SSPACE [57], and gaps were filled using GapCloser [58]. To verify the assembly, the four junction regions between the IR regions and LSC/SSC were confirmed by PCR amplification and Sanger sequencing, using the primers listed in Table S4. The final cp genome of A. annua was submitted to GenBank (Accession Number: MF623173).

3.2. Gene Annotation and Sequence Analyses

The initial gene annotation was performed with CPGAVAS [59] (http://www.herbalgenomics.org/cpgavas) and further confirmation was performed using BLAST and DOGMA [60]. tRNA genes were identified by tRNAscanSE [61]. The circular cp genome map was drawn using the OGDRAWv1.2 [62] program (http://ogdraw.mpimp-golm.mpg.de/). To analyze the characteristics of variations in synonymous codon usage, relative synonymous codon usage values (RSCU), codon usage, and AT content were determined using MEGA5.2 [63].

3.3. Genome Comparison

MUMmer [64] was used to perform pairwise cp genomic alignment. The mVISTA [65] program in the Shuffle-LAGAN mode [66] was employed to compare the cp genome of A. annua with the cp genomes of Artemisia fukudo, Lactuca sativa, Jacobaea vulgaris, and Cynara cornigera (KU360270, AP007232, HQ234669 and KP842707), using the annotation of A. annua as the reference. MISA [67] was used to identify the SSRs and REPuter [68] was used to identify forward and inverted repeats.

3.4. Phylogenetic Analysis

A total of 19 complete cp genome sequences were downloaded from the NCBI Organelle Genome and Nucleotide Resources database. For the phylogenetic analysis, a set of 50 protein-coding genes shared in all 20 analyzed genomes was used. Genes were aligned by clustalw2 [69]. Jmodeltest 3.7 [70] was used to select the best model for ML (Maximum likelihood) analysis, and the phylogenetic tree was plotted using RAxML-HPC 2.7.6.3 on XSEDE at the CIPRES Science Gateway (http://www.phylo.org/). Bootstrap analysis was executed with 1000 replicates and TBR branch swapping. In addition, Berberis bealei was set as the outgroup.
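The concatenation step implied above, in which per-gene alignments are joined into one sequence per taxon before ML analysis, can be sketched as follows. This is an illustrative outline rather than the authors' pipeline: the function name `concatenate_alignments` and the input format (one dict per gene mapping taxon name to aligned sequence) are assumptions. Padding absent taxa with gaps is a design choice of this sketch; the paper restricted the analysis to genes shared by all genomes, so no padding would be needed there.

```python
def concatenate_alignments(gene_alignments, missing="-"):
    """Join per-gene alignments into one concatenated sequence per taxon.

    gene_alignments: list of dicts, one per gene, mapping taxon name to its
                     aligned sequence (all sequences within a gene must have
                     the same length).
    missing: character used to pad a taxon that lacks a gene.
    """
    taxa = sorted(set().union(*gene_alignments))
    parts = {taxon: [] for taxon in taxa}
    for index, alignment in enumerate(gene_alignments):
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError(f"gene {index}: sequences are not aligned to equal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(alignment.get(taxon, missing * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


if __name__ == "__main__":
    # Toy alignments for demonstration only, not real gene data.
    genes = [
        {"A_annua": "ATG-", "A_fukudo": "ATGC"},
        {"A_annua": "TTA", "A_fukudo": "TTG", "B_bealei": "CTA"},
    ]
    for taxon, seq in concatenate_alignments(genes).items():
        print(taxon, seq)
```

The resulting dictionary can be written out in whatever alignment format the downstream ML program expects.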

4. Conclusions

Here we report the first complete cpDNA sequence of A. annua, an important medicinal plant. Compared to the cp genomes of three related Artemisia species, the cp genome of A. annua has the smallest size, while the genome structure and composition are similar. In addition, the cp genome of A. annua has an inverted SSC region, and is similar in that respect to most Asteraceae. However, a re-inversion event in the SSC region of the A. annua lineage suggests that the SSC might be an active region for inversion events in Asteraceae species. Repeated sequences, together with the aforementioned SSRs, are informative sources for the development of new molecular markers. Phylogenetic relationships among 20 Asteraceae species strongly supported the known taxonomic status of A. annua in Asteraceae and the sisterhood of the closely related species A. fukudo. The comprehensive data presented in this study provide insight into the evolutionary relationships between species of the genus Artemisia, and provide an assembly of a whole cp genome of A. annua, which may be useful for future breeding and further biological discoveries.
  56 in total

1.  Functional studies of Ycf3: its role in assembly of photosystem I and interactions with some of its subunits.

Authors:  H Naver; E Boudreau; J D Rochaix
Journal:  Plant Cell       Date:  2001-12       Impact factor: 11.277

2.  Automatic annotation of organellar genomes with DOGMA.

Authors:  Stacia K Wyman; Robert K Jansen; Jeffrey L Boore
Journal:  Bioinformatics       Date:  2004-06-04       Impact factor: 6.937

3.  Chloroplast simple sequence repeats (cpSSRs): technical resources and recommendations for expanding cpSSR discovery and applications to a wide array of plant species.

Authors:  Daniel Ebert; Rod Peakall
Journal:  Mol Ecol Resour       Date:  2009-01-28       Impact factor: 7.090

4.  Chloroplast DNA evidence on the ancient evolutionary split in vascular land plants.

Authors:  L A Raubeson; R K Jansen
Journal:  Science       Date:  1992-03-27       Impact factor: 47.728

5.  Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns.

Authors:  Robert K Jansen; Zhengqiu Cai; Linda A Raubeson; Henry Daniell; Claude W Depamphilis; James Leebens-Mack; Kai F Müller; Mary Guisinger-Bellian; Rosemarie C Haberle; Anne K Hansen; Timothy W Chumley; Seung-Bum Lee; Rhiannon Peery; Joel R McNeal; Jennifer V Kuehl; Jeffrey L Boore
Journal:  Proc Natl Acad Sci U S A       Date:  2007-11-28       Impact factor: 11.205

Review 6.  Qinghaosu (artemisinin): an antimalarial drug from China.

Authors:  D L Klayman
Journal:  Science       Date:  1985-05-31       Impact factor: 47.728

7.  Development and characterization of 18 novel EST-SSRs from the western flower Thrips, Frankliniella occidentalis (Pergande).

Authors:  Xian-Ming Yang; Jing-Tao Sun; Xiao-Feng Xue; Wen-Chao Zhu; Xiao-Yue Hong
Journal:  Int J Mol Sci       Date:  2012-03-05       Impact factor: 6.208

8.  Differential transcriptome analysis of glandular and filamentous trichomes in Artemisia annua.

Authors:  Sandra S A Soetaert; Christophe M F Van Neste; Mado L Vandewoestyne; Steven R Head; Alain Goossens; Filip C W Van Nieuwerburgh; Dieter L D Deforce
Journal:  BMC Plant Biol       Date:  2013-12-20       Impact factor: 4.215

9.  CpGAVAS, an integrated web server for the annotation, visualization, analysis, and GenBank submission of completely sequenced chloroplast genome sequences.

Authors:  Chang Liu; Linchun Shi; Yingjie Zhu; Haimei Chen; Jianhui Zhang; Xiaohan Lin; Xiaojun Guan
Journal:  BMC Genomics       Date:  2012-12-20       Impact factor: 3.969

10.  Comparative chloroplast genomics: analyses including new sequences from the angiosperms Nuphar advena and Ranunculus macranthus.

Authors:  Linda A Raubeson; Rhiannon Peery; Timothy W Chumley; Chris Dziubek; H Matthew Fourcade; Jeffrey L Boore; Robert K Jansen
Journal:  BMC Genomics       Date:  2007-06-15       Impact factor: 3.969
