Sequencing and Analysis of Chrysanthemum carinatum Schousb and Kalimeris indica. The Complete Chloroplast Genomes Reveal Two Inversions and rbcL as Barcoding of the Vegetable.

Xia Liu, Boyang Zhou, Hongyuan Yang, Yuan Li, Qian Yang, Yuzhuo Lu, Yu Gao.

Abstract

Chrysanthemum carinatum Schousb and Kalimeris indica are widely distributed edible vegetables of the family Asteraceae and are also sources of Chinese medicine. Complete chloroplast (cp) genomes of Asteraceae commonly carry inversions in two regions, so the cp genome sequences and structures of Asteraceae species are important for studies of cp genome diversity and evolution. In this paper, we sequenced and analyzed for the first time the cp genomes of C. carinatum Schousb and K. indica, which are 149,752 bp and 152,885 bp in size, with a pair of inverted repeats (IRs) (24,523 bp and 25,003 bp) separated by a large single copy (LSC) region (82,290 bp and 84,610 bp) and a small single copy (SSC) region (18,416 bp and 18,269 bp), respectively. In total, 79 protein-coding genes, 30 distinct transfer RNA (tRNA) genes, four distinct rRNA genes and two pseudogenes were found in each of the two cp genomes. Forty-two (42) and fifty-nine (59) repeats, and seventy (70) and ninety (90) simple sequence repeats (SSRs) were found in the C. carinatum Schousb and K. indica cp genomes, respectively. Codon usage analysis showed that leucine, isoleucine, and serine are the most frequent amino acids and that UAA is the strongly favored stop codon in both cp genomes. Both genomes carry the two inversions: one in the LSC region, spanning trnC-GCA to trnG-UCC, and one covering the whole SSC region. Comparison of the complete cp genomes with those of other Asteraceae species showed that the coding regions are more conserved than the non-coding regions. The phylogenetic analysis revealed that the rbcL gene is a good barcoding marker for identifying different vegetables. These results give insight into the identification, the barcoding, and the evolutionary model of the Asteraceae cp genome.

Keywords:  Asteraceae; C. carinatum Schousb; K. indica; barcoding; chloroplast genome; inversion

Year:  2018        PMID: 29874832      PMCID: PMC6099409          DOI: 10.3390/molecules23061358

Source DB:  PubMed          Journal:  Molecules        ISSN: 1420-3049            Impact factor:   4.411


1. Introduction

Chloroplasts are crucial for sustaining life on Earth. The chloroplast (cp) genome encodes many key proteins for photosynthesis and for other metabolic processes involved in plants' responses to environmental factors such as drought, salt, and light. It therefore offers insight into plant biology, diversity, evolution and climatic adaptation, DNA barcoding, and genetic engineering [1,2,3,4,5,6]. Although cp genomes of higher plants share a relatively conserved architecture and core gene set, owing to the absence of sexual recombination or the occurrence of cp capture, considerable variation has arisen during evolution [7,8,9,10,11]. Much research on the evolution and barcoding of the cp genome in Asteraceae has been reported [12]. Of particular interest, two cp genome regions are inverted more frequently in the sunflower family (Asteraceae) than in most eudicots. One is the SSC (small single copy) region and the other lies between trnC-GCA and trnG-UCC in the LSC (large single copy) region. These inversions may help shape our understanding of Compositae evolution and of the adaptive versus non-adaptive processes underlying cellular and genomic complexity [13,14,15,16,17,18,19,20,21,22,23,24,25]. To date, over 2400 sequenced cp genomes are available (http://www.ncbi.nlm.nih.gov/genomes/). For the Asteraceae family, 129 whole cp genomes have been sequenced, of which 16 have no inversion in the SSC compared with most other land plants [24,25,26]. However, all 16 species retain the inverted region in the LSC. Establishing whether inversions are related to particular evolutionary factors in particular systems will require more genome sequence information and a better understanding of the relationship between cp genome structure/content and those factors [27].

The Asteraceae (Compositae), or sunflower family, is the largest clade in the Asterales and the largest family of flowering plants on Earth. Its members are angiosperms, which appeared about 140 million years ago, and the family comprises 1911 genera and 32,913 species. Nearly one-fourth of all the species of flowering plants belong to the Asteraceae, Fabaceae, and Orchidaceae families [28,29,30]. The Asteraceae family includes a great diversity of species, including annuals, perennials, stem succulents, vines, shrubs, and trees. Commercially important plants in Asteraceae include food crops, ornamental plants grown for their flowers, and species with medicinal properties. Kalimeris indica (L.) and Chrysanthemum carinatum Schousb both belong to the order Asterales. Kalimeris indica (L.) is a member of Kalimeris in the sunflower family Asteraceae and is variously named Ma Lan, Ji Er Chang, and Tian Bian Ju in China. It is not only a traditional Chinese medicine and a medicinal plant of the Miao people, employed in China to treat colds, diarrhea, gastric ulcers, acute gastric abscess, conjunctivitis, acute orchitis, blood vomiting, and injuries, but also a popular vegetable [31,32,33,34,35]. Like Kalimeris indica, Chrysanthemum carinatum Schousb, an important species of Chrysanthemum L. in the Asteraceae family, is a popular annual herb eaten in China because of its fragrance, flavor, and abundant nutritional value. Moreover, the aerial parts of Chrysanthemum L. have been used for the prevention or treatment of several diseases in oriental medicinal systems [36,37,38,39,40].

Here, we report the complete cp genome sequences of Kalimeris indica (L.) and Chrysanthemum carinatum Schousb for the first time and analyze their gene structure and organization. Furthermore, the whole cp genome sequences were compared with those of other genera of the Asteraceae family, with particular attention to the inversions. Phylogenetic analyses of rbcL DNA sequences from 24 plant species indicated that rbcL would be a good molecular barcoding target.

2. Results

2.1. Features of the C. carinatum Schousb and K. indica cp Genome

The complete cp genomes of C. carinatum Schousb and K. indica have a typical quadripartite structure and are 149,752 bp and 152,885 bp in size, respectively (Figure 1). C. carinatum Schousb has an LSC region of 82,290 bp and an SSC region of 18,416 bp, which are separated by a pair of IR regions of 24,523 bp each (Table 1 and Figure 1A). K. indica has an LSC region of 84,610 bp ranging from trnH-GUG to rps19 (ribosomal protein S19), an SSC region of 18,269 bp from ycf1 (hypothetical protein 1 gene) to ndhF (NAD(P)H dehydrogenase), and a pair of IR regions of 25,003 bp each, one ranging from rps19 to ycf1 and the other from pseudogene ycf1 to pseudogene rps19 (Table 1 and Figure 1B). Like most species of the sunflower family, both genomes have two inversions: one covers the whole SSC region and the other is located in the LSC from trnC-GCA to trnG-UCC [13,15,41].
Figure 1

The complete cp genome map of C. carinatum Schousb and K. indica. (A) C. carinatum Schousb; (B) K. indica. The genes marked inside the circle are transcribed clockwise, and those outside are counterclockwise. Genes are color-coded according to their function.
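
The quadripartite structure can be checked arithmetically: the LSC, the SSC, and the two IR copies should add up to the reported genome size. The following is a minimal sketch using the region lengths reported above. The function name and the coordinate convention (1-based, starting at the first LSC base, in the order LSC, IRb, SSC, IRa) are assumptions made for illustration and are not taken from the deposited annotations.

```python
def quadripartite_layout(lsc, ir, ssc):
    """Return (genome_size, {region: (start, end)}) with 1-based inclusive
    coordinates, assuming the order LSC, IRb, SSC, IRa (an assumption)."""
    regions = {}
    start = 1
    for name, length in (("LSC", lsc), ("IRb", ir), ("SSC", ssc), ("IRa", ir)):
        regions[name] = (start, start + length - 1)
        start += length
    return start - 1, regions


# Region lengths (bp) as reported in the text and Table 1
size_cc, layout_cc = quadripartite_layout(lsc=82290, ir=24523, ssc=18416)
size_ki, layout_ki = quadripartite_layout(lsc=84610, ir=25003, ssc=18269)

print(size_cc, layout_cc)
print(size_ki, layout_ki)
```

Both totals reproduce the reported genome sizes (149,752 bp and 152,885 bp), which confirms that the region lengths are internally consistent.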

Table 1

The base composition in the C. carinatum Schousb and K. indica cp genomes.

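Species | Region | T(U) (%) | C (%) | A (%) | G (%) | Length (bp)
C. carinatum Schousb | LSC | 32.4 | 17.6 | 32.0 | 18.1 | 82,290
C. carinatum Schousb | SSC | 35.1 | 14.7 | 34.1 | 16.1 | 18,416
C. carinatum Schousb | IRa | 28.3 | 22.3 | 28.6 | 20.8 | 24,523
C. carinatum Schousb | IRb | 28.6 | 20.8 | 28.3 | 22.3 | 24,523
C. carinatum Schousb | Total | 31.8 | 19.0 | 30.7 | 18.5 | 149,752
C. carinatum Schousb | CDS | 31.5 | 17.7 | 30.7 | 20.1 | 77,289
C. carinatum Schousb | 1st position | 23.7 | 18.9 | 30.6 | 26.7 | 25,763
C. carinatum Schousb | 2nd position | 32.6 | 20.4 | 29.3 | 17.7 | 25,763
C. carinatum Schousb | 3rd position | 38.3 | 13.7 | 32.1 | 15.9 | 25,763
K. indica | LSC | 32.6 | 17.3 | 32.2 | 17.9 | 84,610
K. indica | SSC | 34.9 | 14.8 | 33.9 | 16.4 | 18,269
K. indica | IRa | 28.3 | 22.2 | 28.7 | 20.8 | 25,003
K. indica | IRb | 28.7 | 20.8 | 28.3 | 22.2 | 25,003
K. indica | Total | 31.8 | 19.0 | 30.7 | 18.5 | 152,885
K. indica | CDS | 31.5 | 17.7 | 30.6 | 20.3 | 78,372
K. indica | 1st position | 23.9 | 18.8 | 30.6 | 26.8 | 26,124
K. indica | 2nd position | 32.6 | 20.3 | 29.3 | 17.8 | 26,124
K. indica | 3rd position | 38.0 | 13.9 | 31.8 | 16.3 | 26,124
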
As Table 1 shows, the overall GC content of the C. carinatum Schousb and K. indica cp genomes is the same (37.5%), with a higher GC content in the IR regions (43.1% and 43.0%) than in the LSC (35.7% and 35.2%) and SSC regions (30.8% and 31.2%), which is similar to other Asteraceae species [11,12,13,14,15,17,18,25,26,42]. The AT contents at the third, second and first codon positions of the C. carinatum Schousb cp genome are 70.4%, 61.9%, and 54.3%, and those of K. indica are 69.8%, 61.9%, and 54.5% (Table 1). The third and second codon positions thus have markedly higher AT representation than the first, which is a common feature of cp genomes [43,44,45,46].
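
The GC and AT percentages quoted above follow directly from the base composition in Table 1 (GC = C + G, AT = T + A). The short sketch below recomputes a few of them. The dictionary layout and function names are illustrative assumptions, and only a subset of the Table 1 rows is included.

```python
# Base composition (%) from Table 1, stored as (T, C, A, G)
TABLE1 = {
    "C. carinatum Schousb": {
        "LSC": (32.4, 17.6, 32.0, 18.1),
        "SSC": (35.1, 14.7, 34.1, 16.1),
        "IRa": (28.3, 22.3, 28.6, 20.8),
        "Total": (31.8, 19.0, 30.7, 18.5),
        "3rd position": (38.3, 13.7, 32.1, 15.9),
    },
    "K. indica": {
        "LSC": (32.6, 17.3, 32.2, 17.9),
        "SSC": (34.9, 14.8, 33.9, 16.4),
        "IRa": (28.3, 22.2, 28.7, 20.8),
        "Total": (31.8, 19.0, 30.7, 18.5),
        "3rd position": (38.0, 13.9, 31.8, 16.3),
    },
}


def gc_percent(t, c, a, g):
    """GC content as the sum of the C and G percentages."""
    return round(c + g, 1)


def at_percent(t, c, a, g):
    """AT content as the sum of the T(U) and A percentages."""
    return round(t + a, 1)


for species, regions in TABLE1.items():
    for region, composition in regions.items():
        print(species, region,
              "GC =", gc_percent(*composition),
              "AT =", at_percent(*composition))
```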

2.2. Functional Genes of the C. carinatum Schousb and K. indica cp Genome

There are 113 unique functional genes and two pseudogenes in each of the C. carinatum Schousb and K. indica cp genomes (Table 2). The 113 functional genes comprise 79 protein-coding genes, 30 distinct transfer RNA (tRNA) genes, and four distinct rRNA genes in both genomes (Table 2). Notably, seven protein-coding, seven tRNA, and all the rRNA genes are duplicated in the IR regions, which is common in most cp genomes [47,48]. The coding regions constitute 51.6% and 51.3% of the C. carinatum Schousb and K. indica cp genomes, respectively. The non-coding regions, which include the introns, pseudogenes, and intergenic spacers, constitute 48.4% and 48.7%, respectively. Two pseudogenes (ycf1 and rps19) are found in both the C. carinatum Schousb and K. indica cp genomes, a feature shared with cp genomes from other plants [25,47,48].
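
The coding and non-coding fractions given above can be reproduced from the CDS lengths and genome sizes in Table 1. The sketch below is a simple check under that reading of the table. The function names are assumptions for illustration.

```python
def coding_percent(cds_bp, genome_bp):
    """Share of the genome occupied by CDS, in percent."""
    return round(100 * cds_bp / genome_bp, 1)


def noncoding_percent(cds_bp, genome_bp):
    """Share of the genome outside CDS, in percent."""
    return round(100 * (genome_bp - cds_bp) / genome_bp, 1)


# CDS length and genome size (bp) from Table 1
genomes = {
    "C. carinatum Schousb": (77289, 149752),
    "K. indica": (78372, 152885),
}

for species, (cds_bp, genome_bp) in genomes.items():
    print(species,
          coding_percent(cds_bp, genome_bp),
          noncoding_percent(cds_bp, genome_bp))

# The unique functional genes are the sum of the three gene classes.
unique_genes = 79 + 30 + 4
print(unique_genes)
```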
Table 2

The genes present in the C. carinatum Schousb and K. indica cp genomes.

Group of Genes | C. carinatum Schousb Gene Names | K. indica Gene Names
Photosystem I | psaA, psaB, psaC, psaI, psaJ | psaA, psaB, psaC, psaI, psaJ
Photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, psbN, psbT, psbZ
Cytochrome b/f complex | petA, petB *, petD *, petG, petL, petN | petA, petB *, petD *, petG, petL, petN
ATP synthase | atpA, atpB, atpE, atpF *, atpH, atpI | atpA, atpB, atpE, atpF *, atpH, atpI
NADH dehydrogenase | ndhA *, ndhB * (×2), ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK | ndhA *, ndhB * (×2), ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK
RuBisCO large subunit | rbcL | rbcL
RNA polymerase | rpoA, rpoB, rpoC1 *, rpoC2 | rpoA, rpoB, rpoC1 *, rpoC2
Ribosomal proteins (SSU) | rps2, rps3, rps4, rps7 (×2), rps8, rps11, rps12 ** (×2), rps14, rps15, rps16 *, rps18, rps19 | rps2, rps3, rps4, rps7 (×2), rps8, rps11, rps12 ** (×2), rps14, rps15, rps16 *, rps18, rps19
Ribosomal proteins (LSU) | rpl2 * (×2), rpl14, rpl16 *, rpl20, rpl22, rpl23 (×2), rpl32, rpl33, rpl36 | rpl2 * (×2), rpl14, rpl16 *, rpl20, rpl22, rpl23 (×2), rpl32, rpl33, rpl36
Miscellaneous proteins | accD, clpP **, matK, ccsA, cemA, infA | accD, clpP **, matK, ccsA, cemA, infA
Hypothetical chloroplast reading frames (ycf) | ycf1, ycf2 (×2), ycf3 **, ycf4 | ycf1, ycf2 (×2), ycf3 **, ycf4
Transfer RNAs | trnA-UGC (×2), trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnG-GCC, trnG-UCC, trnH-GUG, trnI-CAU (×2), trnI-GAU (×2), trnK-UUU, trnL-CAA (×2), trnL-UAA, trnL-UAG, trnfM-CAU, trnM-CAU, trnN-GUU (×2), trnP-UGG, trnQ-UUG, trnR-ACG (×2), trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnT-UGU, trnV-GAC (×2), trnV-UAC, trnW-CCA, trnY-GUA | trnA-UGC (×2), trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnG-GCC, trnG-UCC, trnH-GUG, trnI-CAU (×2), trnI-GAU (×2), trnK-UUU, trnL-CAA (×2), trnL-UAA, trnL-UAG, trnfM-CAU, trnM-CAU, trnN-GUU (×2), trnP-UGG, trnQ-UUG, trnR-ACG (×2), trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnT-UGU, trnV-GAC (×2), trnV-UAC, trnW-CCA, trnY-GUA
Ribosomal RNAs | rrn4.5 (×2), rrn5 (×2), rrn16 (×2), rrn23 (×2) | rrn4.5 (×2), rrn5 (×2), rrn16 (×2), rrn23 (×2)
Pseudogenes | ycf1, rps19 | ycf1, rps19
Total | 132 | 132

(×2) indicates a duplicated gene; * indicates a gene that contains one intron; ** indicates a gene that contains two introns.

In each of the C. carinatum Schousb and K. indica cp genomes, a total of 18 genes (six tRNA genes and 12 protein-coding genes) contain introns (Table 3). Most of them are located in the LSC region (13 genes), with only one in the SSC region and four in the IR regions. As in other angiosperms, clpP, rps12, and ycf3 contain two introns [48]. Interestingly, rps12 was found to be a trans-spliced gene, whose 5′ end is located in the LSC region and whose duplicated 3′ end is located in the IR region. Consistent with many previous results, the largest intron is in the trnK-UUU gene and contains the matK gene. Moreover, trnL-UAA has the smallest intron in both genomes. Comparing these introns between C. carinatum Schousb and K. indica, most are longer in K. indica than in C. carinatum Schousb, whereas the introns of rpl16 and trnK-UUU and intron II of rps12 are slightly shorter, and the trnV-UAC intron is the same size. This may be one reason why the whole cp genome of K. indica is larger than that of C. carinatum Schousb.
Table 3

The intron-containing genes of the C. carinatum Schousb and K. indica cp genomes and the lengths of the introns and exons.

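Species | Gene | Location | Exon I (bp) | Intron I (bp) | Exon II (bp) | Intron II (bp) | Exon III (bp)
C. carinatum Schousb | atpF | LSC | 145 | 699 | 410 |  | 
C. carinatum Schousb | clpP | LSC | 71 | 796 | 291 | 611 | 229
C. carinatum Schousb | ndhA | SSC | 539 | 1045 | 553 |  | 
C. carinatum Schousb | ndhB | IR | 777 | 670 | 756 |  | 
C. carinatum Schousb | petB | LSC | 6 | 751 | 642 |  | 
C. carinatum Schousb | petD | LSC | 8 | 675 | 475 |  | 
C. carinatum Schousb | rpl16 | LSC | 9 | 1029 | 399 |  | 
C. carinatum Schousb | rpl2 | IR | 391 | 664 | 434 |  | 
C. carinatum Schousb | rpoC1 | LSC | 432 | 733 | 1641 |  | 
C. carinatum Schousb | rps12 * | LSC | 114 | - | 232 | 536 | 26
C. carinatum Schousb | rps16 | LSC | 41 | 891 | 184 |  | 
C. carinatum Schousb | trnA-UGC | IR | 38 | 812 | 35 |  | 
C. carinatum Schousb | trnG-UCC | LSC | 23 | 722 | 47 |  | 
C. carinatum Schousb | trnI-GAU | IR | 43 | 775 | 35 |  | 
C. carinatum Schousb | trnK-UUU | LSC | 37 | 2562 | 30 |  | 
C. carinatum Schousb | trnL-UAA | LSC | 37 | 422 | 50 |  | 
C. carinatum Schousb | trnV-UAC | LSC | 38 | 573 | 37 |  | 
C. carinatum Schousb | ycf3 | LSC | 125 | 698 | 229 | 735 | 153
K. indica | atpF | LSC | 145 | 709 | 410 |  | 
K. indica | clpP | LSC | 71 | 814 | 291 | 615 | 229
K. indica | ndhA | SSC | 553 | 1064 | 539 |  | 
K. indica | ndhB | IR | 777 | 674 | 756 |  | 
K. indica | petB | LSC | 6 | 823 | 642 |  | 
K. indica | petD | LSC | 8 | 809 | 475 |  | 
K. indica | rpl16 | LSC | 9 | 1098 | 399 |  | 
K. indica | rpl2 | IR | 391 | 671 | 434 |  | 
K. indica | rpoC1 | LSC | 432 | 742 | 1638 |  | 
K. indica | rps12 * | LSC | 114 | - | 232 | 535 | 26
K. indica | rps16 | LSC | 41 | 876 | 226 |  | 
K. indica | trnA-UGC | IR | 35 | 820 | 38 |  | 
K. indica | trnG-UCC | LSC | 23 | 732 | 48 |  | 
K. indica | trnI-GAU | IR | 38 | 780 | 35 |  | 
K. indica | trnK-UUU | LSC | 37 | 2539 | 35 |  | 
K. indica | trnL-UAA | LSC | 37 | 438 | 50 |  | 
K. indica | trnV-UAC | LSC | 38 | 573 | 37 |  | 
K. indica | ycf3 | LSC | 125 | 702 | 229 | 739 | 153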

* The rps12 gene is a trans-spliced gene with the 5′ end located in the LSC region and a duplicated 3′ end in the IR region.

2.3. Codon Usage of the K. indica and C. carinatum Schousb cp Genome

As shown in Table 4 and Table 5, a total of 25,415 and 26,124 codons are involved in protein coding in C. carinatum Schousb and K. indica, respectively. Leucine, isoleucine, and serine are the most frequent amino acids in C. carinatum Schousb, encoded by 2778 (10.93%), 2195 (8.64%), and 2080 (8.18%) codons, respectively. The same three amino acids are the most frequent in K. indica, encoded by 2795 (10.70%), 2190 (8.38%) and 2131 (8.16%) codons, respectively. Both genomes contain 85 stop codons, which are the least frequent codons. Cysteine is the least frequent amino acid, with 286 (1.13%) and 294 (1.13%) codons in C. carinatum Schousb and K. indica, respectively.
Table 4

The codon-anticodon recognition pattern and codon usage for the C. carinatum Schousb cp genome.

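Amino Acid | Codon | No. | RSCU | tRNA | Amino Acid | Codon | No. | RSCU | tRNA
Phe | UUU | 956 | 1.32 |  | Stop | UAA | 49 | 1.73 | 
Phe | UUC | 496 | 0.68 | trnF-GAA | Stop | UAG | 21 | 0.74 | 
Leu | UUA | 874 | 1.89 | trnL-UAA | Stop | UGA | 15 | 0.53 | 
Leu | UUG | 553 | 1.19 | trnL-CAA | His | CAU | 452 | 1.52 | 
Leu | CUU | 620 | 1.34 |  | His | CAC | 143 | 0.48 | trnH-GUG
Leu | CUC | 183 | 0.40 |  | Gln | CAA | 703 | 1.51 | trnQ-UUG
Leu | CUA | 358 | 0.77 | trnL-UAG | Gln | CAG | 227 | 0.49 | 
Leu | CUG | 190 | 0.41 |  | Asn | AAU | 978 | 1.56 | 
Ile | AUU | 1075 | 1.47 |  | Asn | AAC | 277 | 0.44 | trnN-GUU
Ile | AUC | 428 | 0.58 | trnI-GAU | Lys | AAA | 1026 | 1.50 | trnK-UUU
Ile | AUA | 692 | 0.95 | trnI-CAU | Lys | AAG | 345 | 0.50 | 
Met | AUG | 613 | 1.00 | trn(f)M-CAU | Asp | GAU | 831 | 1.59 | 
Val | GUU | 490 | 1.44 |  | Asp | GAC | 214 | 0.41 | trnD-GUC
Val | GUC | 166 | 0.49 | trnV-GAC | Glu | GAA | 986 | 1.50 | trnE-UUC
Val | GUA | 523 | 1.54 | trnV-UAC | Glu | GAG | 331 | 0.50 | 
Val | GUG | 178 | 0.52 |  | Cys | UGU | 198 | 1.38 | 
Ser | UCU | 580 | 1.77 |  | Cys | UGC | 88 | 0.62 | trnC-GCA
Ser | UCC | 313 | 0.96 | trnS-GGA | Trp | UGG | 448 | 1.00 | trnW-CCA
Ser | UCA | 394 | 1.20 | trnS-UGA | Arg | CGU | 336 | 1.32 | trnR-ACG
Ser | UCG | 154 | 0.47 |  | Arg | CGC | 100 | 0.39 | 
Ser | AGA | 474 | 1.86 |  | Arg | CGA | 339 | 1.33 | 
Ser | AGG | 165 | 0.65 | trnS-GCU | Arg | CGG | 118 | 0.46 | 
Tyr | UAU | 799 | 1.64 |  | Arg | AGU | 406 | 1.24 | trnR-UCU
Tyr | UAC | 175 | 0.36 | trnY-GUA | Arg | AGC | 118 | 0.36 | 
Pro | CCA | 322 | 1.17 | trnP-UGG | Gly | GGU | 581 | 1.32 | 
Pro | CCG | 159 | 0.58 |  | Gly | GGC | 188 | 0.43 | trnG-GCC
Pro | CCU | 434 | 1.58 |  | Gly | GGA | 686 | 1.56 | trnG-UCC
Pro | CCC | 184 | 0.67 |  | Gly | GGG | 305 | 0.69 | 
Thr | ACU | 529 | 1.64 |  | Ala | GCA | 416 | 1.18 | trnA-UGC
Thr | ACC | 236 | 0.73 | trnT-GGU | Ala | GCG | 162 | 0.46 | 
Thr | ACA | 404 | 1.25 | trnT-UGU | Ala | GCU | 611 | 1.73 | 
Thr | ACG | 125 | 0.39 |  | Ala | GCC | 223 | 0.63 | 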

RSCU: Relative synonymous codon usage. RSCU > 1 are highlighted in bold.

Table 5

The codon-anticodon recognition pattern and codon usage for the K. indica cp genome.

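Amino Acid | Codon | No. | RSCU | tRNA | Amino Acid | Codon | No. | RSCU | tRNA
Phe | UUU | 982 | 1.31 |  | Stop | UAA | 49 | 1.73 | 
Phe | UUC | 515 | 0.69 | trnF-GAA | Stop | UAG | 21 | 0.74 | 
Leu | UUA | 870 | 1.87 | trnL-UAA | Stop | UGA | 15 | 0.53 | 
Leu | UUG | 578 | 1.24 | trnL-CAA | His | CAU | 452 | 1.51 | 
Leu | CUU | 607 | 1.30 |  | His | CAC | 148 | 0.49 | trnH-GUG
Leu | CUC | 186 | 0.40 |  | Gln | CAA | 723 | 1.53 | trnQ-UUG
Leu | CUA | 375 | 0.81 | trnL-UAG | Gln | CAG | 220 | 0.47 | 
Leu | CUG | 179 | 0.38 |  | Asn | AAU | 969 | 1.53 | 
Ile | AUU | 1072 | 1.47 |  | Asn | AAC | 295 | 0.47 | trnN-GUU
Ile | AUC | 427 | 0.58 | trnI-GAU | Lys | AAA | 1036 | 1.48 | trnK-UUU
Ile | AUA | 691 | 0.95 | trnI-CAU | Lys | AAG | 360 | 0.52 | 
Met | AUG | 633 | 1.00 | trn(f)M-CAU | Asp | GAU | 847 | 1.61 | 
Val | GUU | 503 | 1.45 |  | Asp | GAC | 205 | 0.39 | trnD-GUC
Val | GUC | 180 | 0.52 | trnV-GAC | Glu | GAA | 987 | 1.47 | trnE-UUC
Val | GUA | 517 | 1.49 | trnV-UAC | Glu | GAG | 359 | 0.53 | 
Val | GUG | 190 | 0.55 |  | Cys | UGU | 209 | 1.42 | 
Ser | UCU | 585 | 1.76 |  | Cys | UGC | 85 | 0.58 | trnC-GCA
Ser | UCC | 308 | 0.93 | trnS-GGA | Trp | UGG | 463 | 1.00 | trnW-CCA
Ser | UCA | 401 | 1.21 | trnS-UGA | Arg | CGU | 351 | 1.33 | trnR-ACG
Ser | AGA | 488 | 1.85 |  | Arg | CGC | 108 | 0.41 | 
Ser | AGG | 177 | 0.67 | trnS-GCU | Arg | CGA | 346 | 1.31 | 
Ser | UCG | 172 | 0.52 |  | Arg | CGG | 114 | 0.43 | 
Tyr | UAU | 804 | 1.64 |  | Arg | AGU | 405 | 1.22 | trnR-UCU
Tyr | UAC | 175 | 0.36 | trnY-GUA | Arg | AGC | 121 | 0.36 | 
Pro | CCU | 414 | 1.50 |  | Gly | GGU | 571 | 1.28 | 
Pro | CCC | 209 | 0.76 |  | Gly | GGC | 199 | 0.45 | trnG-GCC
Pro | CCA | 314 | 1.14 | trnP-UGG | Gly | GGA | 681 | 1.53 | trnG-UCC
Pro | CCG | 167 | 0.61 |  | Gly | GGG | 328 | 0.74 | 
Thr | ACU | 531 | 1.62 |  | Ala | GCU | 622 | 1.74 | 
Thr | ACC | 239 | 0.73 | trnT-GGU | Ala | GCC | 233 | 0.65 | 
Thr | ACA | 405 | 1.23 | trnT-UGU | Ala | GCA | 410 | 1.15 | trnA-UGC
Thr | ACG | 137 | 0.42 |  | Ala | GCG | 161 | 0.45 | 
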
Relative synonymous codon usage (RSCU) is usually divided into four types: lack of bias (RSCU < 1.0), low bias (1.0 < RSCU < 1.2), moderately biased (1.2 < RSCU < 1.3) and highly biased (RSCU > 1.3) [49,50]. As shown in Table 4 and Table 5, the two genomes are strikingly similar: each has 32 codons lacking bias, two no-bias codons with RSCU = 1 (tryptophan and methionine), and 21 highly biased codons. They differ in that C. carinatum Schousb has five low-bias and three moderately biased codons, whereas K. indica has two low-bias and six moderately biased codons. UAA is the strongly favored stop codon in both cp genomes. The results show that codon usage is biased for all amino acids except tryptophan and methionine in C. carinatum Schousb and K. indica, and that codons ending in A/T are strongly favored, which is common in the cp genomes of higher plants [51,52].
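
RSCU is conventionally computed as the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. The sketch below applies that definition to two codon families from Table 4 and reproduces the tabulated values. The function names are illustrative, and the handling of values that fall exactly on a category boundary (1.2 or 1.3) is an assumption, because the thresholds quoted above leave those cases open.

```python
def rscu(counts):
    """RSCU for one synonymous codon family.

    counts maps each codon of a single amino acid (or the stop codons)
    to its observed count.
    """
    mean = sum(counts.values()) / len(counts)
    return {codon: n / mean for codon, n in counts.items()}


def classify(value):
    """Bias category for an RSCU value; boundary handling is an assumption."""
    if value < 1.0:
        return "lack of bias"
    if value == 1.0:
        return "no bias"
    if value <= 1.2:
        return "low bias"
    if value <= 1.3:
        return "moderately biased"
    return "highly biased"


# Codon counts for C. carinatum Schousb taken from Table 4
phe = rscu({"UUU": 956, "UUC": 496})
stop = rscu({"UAA": 49, "UAG": 21, "UGA": 15})

for family in (phe, stop):
    for codon, value in family.items():
        print(codon, round(value, 2), classify(value))
```

For the stop codons, the mean count is 85 / 3, so UAA scores 49 / (85 / 3), or about 1.73, which is why it stands out as the favored stop codon.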

2.4. Repeats Structure and SSRs

Analysis of the repeat structure showed that C. carinatum Schousb has fewer repeats in total (42) than K. indica (59), which may be one reason for the larger cp genome of K. indica. There are 18 forward repeats, 20 palindromic repeats, three reverse repeats and one complement repeat in the cp genome of C. carinatum Schousb (Table 6), and 17 forward repeats, 23 palindromic repeats, 11 reverse repeats, and eight complement repeats in the K. indica cp genome (Table 7). All of them range from 30 to 60 bp in length and are mostly located in the intergenic spacer (IGS) and intron sequences. Comparison with the repeats of six other Asteraceae species showed that H. annuus contained the most repeats (572) and T. mongolicum the fewest (28) (Figure 2). Among the eight Asteraceae species, repeats of 30–39 bp in length were the most numerous. Complement repeats are relatively more abundant than in other families, although A. annua, E. paradoxa, T. mongolicum and C. indicum do not contain complement repeats [48].
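
The four repeat categories differ only in how the second copy relates to the first: identical (forward), reversed (reverse), complemented (complement), or reverse-complemented (palindromic). The sketch below classifies a pair of sequences under that usual convention. It handles exact matches only and is not the repeat-finding procedure used in this study. The function name and example sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def repeat_type(first, second):
    """Return 'F', 'R', 'C' or 'P' for an exact repeat pair, else None.

    A sequence that is symmetric under one of these operations can satisfy
    more than one relation; the first match in the order F, R, C, P wins.
    """
    if second == first:
        return "F"  # forward: same sequence, same strand
    if second == first[::-1]:
        return "R"  # reverse: same strand, read backwards
    complemented = first.translate(COMPLEMENT)
    if second == complemented:
        return "C"  # complement: bases complemented, order kept
    if second == complemented[::-1]:
        return "P"  # palindromic: reverse complement
    return None


for partner in ("AACG", "GCAA", "TTGC", "CGTT", "AAAA"):
    print(partner, repeat_type("AACG", partner))
```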
Table 6

The repeats of the C. carinatum Schousb cp genome and their distribution.

No. | Size (bp) | Type | Repeat 1 Location | Repeat 2 Location | Region
1 | 60 | F | ycf2 (CDS) | ycf2 (CDS) | IRb
2 | 60 | P | ycf2 (CDS) | ycf2 (CDS) | IRb, IRa
3 | 60 | P | ycf2 (CDS) | ycf2 (CDS) | IRb, IRa
4 | 60 | F | ycf2 (CDS) | ycf2 (CDS) | IRa, IRa
5 | 51 | F | IGS (rps11, rpl36) | IGS (rps11, rpl36) | LSC
6 | 48 | P | IGS (psbT, psbN) | IGS (psbT, psbN) | LSC
7 | 46 | F | IGS (accD, psaI) | IGS (accD, psaI) | LSC
8 | 45 | F | ycf2 (CDS) | ycf2 (CDS) | IRb
9 | 45 | P | ycf2 (CDS) | ycf2 (CDS) | IRb, IRa
10 | 45 | P | ycf2 (CDS) | ycf2 (CDS) | IRb, IRa
11 | 46 | P | IGS (petN, psbM) | IGS (petN, psbM) | LSC
12 | 39 | P | IGS (rps12, trnV-GAC) | ndhA (intron) | IRb, SSC
13 | 39 | F | ndhA (intron) | IGS (trnV-GAC, rps12) | SSC, IRa
14 | 41 | F | ycf3 (intron2) | IGS (rps12, trnV-GAC) | LSC, IRb
15 | 41 | P | ycf3 (intron2) | IGS (trnV-GAC, rps12) | LSC, IRa
16 | 39 | P | ycf3 (intron2) | ndhA (intron) | LSC, SSC
17 | 42 | F | ycf2 (CDS) | ycf2 (CDS) | IRb
18 | 42 | P | ycf2 (CDS) | ycf2 (CDS) | IRb, IRa
19 | 42 | P | ycf2 (CDS) | ycf2 (CDS) | IRb, IRa
20 | 42 | F | ycf2 (CDS) | ycf2 (CDS) | IRa
21 | 41 | P | IGS (ndhE, psaC) | IGS (psaC, ndhD) | SSC
22 | 31 | R | IGS (trnT-GGU, psbD) | IGS (trnT-GGU, psbD) | LSC
23 | 30 | P | IGS (psbI, trnS-GUC) | trnS-GGA (CDS) | LSC
24 | 30 | F | ycf2 (CDS) | ycf2 (CDS) | IRb
25 | 30 | P | ycf2 (CDS) | ycf2 (CDS) | IRb, IRa
26 | 30 | P | ycf2 (CDS) | ycf2 (CDS) | IRb, IRa
27 | 35 | F | psaB (CDS) | psaA (CDS) | LSC
28 | 35 | F | ycf3 (intron2) | ndhB (intron) | LSC, IRb
29 | 35 | P | ycf3 (intron2) | ndhB (intron) | LSC, IRa
30 | 32 | P | IGS (trnT-GGU, psbD) | IGS (trnT-GGU, psbD) | LSC
31 | 31 | F | IGS (psbI, trnS-GUC) | IGS (psbI, trnS-GUC) | LSC
32 | 30 | P | IGS (psbC, trnS-UGA) | trnS-GGA (CDS) | LSC
33 | 30 | F | rbcL (CDS) | IGS (rbcL, accD) | LSC
34 | 32 | F | IGS (psbI, trnS-GUC) | IGS (psbC, trnS-UGA) | LSC
35 | 31 | R | IGS (trnL-UAG, rpl32) | IGS (trnL-UAG, rpl32) | SSC
36 | 30 | C | IGS (trnT-GGU, psbD) | IGS (trnT-GGU, psbD) | LSC
37 | 30 | F | psaB (CDS) | psaA (CDS) | LSC
38 | 30 | F | ycf2 (CDS) | ycf2 (CDS) | IRb, IRa
39 | 30 | P | ycf2 (CDS) | ycf2 (CDS) | IRb
40 | 30 | F | IGS (trnL-UAG, rpl32) | IGS (trnL-UAG, rpl32) | SSC
41 | 30 | R | IGS (trnL-UAG, rpl32) | IGS (trnL-UAG, rpl32) | SSC
42 | 30 | P | ycf2 (CDS) | ycf2 (CDS) | IRa

F = forward, P = palindrome, R = reverse, C = complement, IGS = intergenic spacer.

Table 7

The repeats of the K. indica cp genome and their distribution.

No. | Size (bp) | Type | Repeat 1 Location | Repeat 2 Location | Region
1 | 48 | P | IGS (psbT, psbN) | IGS (psbT, psbN) | LSC
2 | 39 | P | ycf3 (intron2) | ndhA (intron) | LSC, SSC
3 | 41 | F | ycf3 (intron2) | IGS (rps12, trnV-GAC) | LSC, IRb
4 | 41 | P | ycf3 (intron2) | IGS (trnV-GAC, rps12) | LSC, IRa
5 | 39 | P | IGS (rps12, trnV-GAC) | ndhA (intron) | IRb, SSC
6 | 39 | F | ndhA (intron) | IGS (trnV-GAC, rps12) | SSC, IRa
7 | 42 | F | ycf2 (CDS) | ycf2 (CDS) | IRb
8 | 42 | P | ycf2 (CDS) | ycf2 (CDS) | IRb, IRa
9 | 42 | P | ycf2 (CDS) | ycf2 (CDS) | IRb, IRa
10 | 42 | F | ycf2 (CDS) | ycf2 (CDS) | IRa
11 | 41 | R | petD (intron) | petD (intron) | LSC
12 | 43 | P | IGS (psaC, ndhD) | IGS (psaC, ndhD) | SSC
13 | 39 | R | IGS (accD, psaI) | IGS (accD, psaI) | LSC
14 | 31 | F | IGS (rps12, trnV-GAC) | IGS (rps12, trnV-GAC) | IRb
15 | 31 | P | IGS (rps12, trnV-GAC) | IGS (trnV-GAC, rps12) | IRb, IRa
16 | 31 | P | IGS (rps12, trnV-GAC) | IGS (trnV-GAC, rps12) | IRb, IRa
17 | 31 | F | IGS (trnV-GAC, rps12) | IGS (trnV-GAC, rps12) | IRa
18 | 37 | P | IGS (rpl22, rps19) | IGS (rpl22, rps19) | LSC
19 | 31 | R | petD (intron) | petD (intron) | LSC
20 | 30 | P | IGS (psbI, trnS-GCU) | trnS-GGA (CDS) | LSC
21 | 30 | F | ycf2 (CDS) | ycf2 (CDS) | IRb
22 | 30 | P | ycf2 (CDS) | ycf2 (CDS) | IRb, IRa
23 | 30 | P | ycf2 (CDS) | ycf2 (CDS) | IRb, IRa
24 | 32 | P | IGS (trnG-UCC, trnT-GGU) | IGS (trnG-UCC, trnT-GGU) | LSC
25 | 32 | P | IGS (psbZ, trnG-GCC) | IGS (psbZ, trnG-GCC) | LSC
26 | 32 | P | IGS (accD, psaI) | petD (CDS) | LSC
27 | 32 | R | petD (CDS) | petD (CDS) | LSC
28 | 32 | F | IGS (rrn4.5, rrn5) | IGS (rrn4.5, rrn5) | IRb
29 | 32 | P | IGS (rrn4.5, rrn5) | IGS (rrn5, rrn4.5) | IRb, IRa
30 | 32 | P | IGS (rrn4.5, rrn5) | IGS (rrn5, rrn4.5) | IRb, IRa
31 | 32 | F | IGS (rrn5, rrn4.5) | IGS (rrn5, rrn4.5) | IRa
32 | 34 | R | IGS (trnT-GGU, psbD) | IGS (trnT-GGU, psbD) | LSC
33 | 31 | R | IGS (ndhC, trnV-UAC) | petB (intron) | LSC
34 | 33 | C | IGS (accD, psaI) | petD (CDS) | LSC
35 | 33 | C | IGS (accD, psaI) | IGS (accD, psaI) | LSC
36 | 30 | P | IGS (psbC, trnS-UGA) | trnS-GGA (CDS) | LSC
37 | 32 | C | IGS (trnH-GUG, psbA) | IGS (accD, psaI) | LSC
38 | 32 | F | IGS (psbI, trnS-GCU) | IGS (psbC, trnS-UGA) | LSC
39 | 32 | P | IGS (petN, psbM) | IGS (petN, psbM) | LSC
40 | 32 | C | IGS (trnG-UCC, trnT-GGU) | petD (intron) | LSC
41 | 32 | F | psaB (CDS) | psaA (CDS) | LSC
42 | 32 | R | IGS (accD, psaI) | IGS (accD, psaI) | LSC
43 | 31 | P | IGS (trnG-UCC, trnT-GGU) | IGS (ndhC, trnV-UAC) | LSC
44 | 31 | P | IGS (trnG-UCC, trnT-GGU) | IGS (trnG-UCC, trnT-GGU) | LSC
45 | 31 | P | IGS (trnT-GGU, psbD) | IGS (trnT-UGU, trnL-UAA) | LSC
46 | 31 | F | IGS (psaA, ycf3) | IGS (psaA, ycf3) | LSC
47 | 31 | R | IGS (accD, psaI) | IGS (accD, psaI) | LSC
48 | 31 | F | IGS (accD, psaI) | petD (intron) | LSC
49 | 31 | C | petD (intron) | petD (intron) | LSC
50 | 31 | F | ndhF (CDS) | IGS (ndhF, ycf1) | SSC
51 | 30 | C | IGS (trnG-UCC, trnT-GGU) | IGS (trnT-GGU, psbD) | LSC
52 | 30 | R | IGS (trnT-GGU, psbD) | IGS (trnT-GGU, psbD) | LSC
53 | 30 | C | ycf3 (intron2) | IGS (ndhC, trnV-UAC) | LSC
54 | 30 | C | IGS (ndhC, trnV-UAC) | IGS (accD, psaI) | LSC
55 | 30 | F | IGS (ndhC, trnV-UAC) | IGS (ndhC, trnV-UAC) | LSC
56 | 30 | F | IGS (accD, psaI) | IGS (accD, psaI) | LSC
57 | 30 | R | IGS (accD, psaI) | IGS (accD, psaI) | LSC
58 | 30 | R | IGS (accD, psaI) | petD (intron) | LSC
59 | 30 | F | petD (intron) | petD (intron) | LSC

F = forward, P = palindrome, R = reverse, C = complement, IGS = intergenic spacer.

Figure 2

The repeat sequences of eight Asteraceae cp genomes. F (forward), P (palindrome), R (reverse), and C (complement) represent the repeat types. Different colours represent the repeats in different lengths.

There are 70 and 90 simple sequence repeats (SSRs) in the cp genomes of C. carinatum Schousb and K. indica, respectively (Table 8 and Table 9). Mononucleotide repeats are the most common type, with 43 in C. carinatum Schousb and 30 in K. indica. Eight dinucleotide repeats, five trinucleotide repeats, thirteen tetranucleotide repeats, and one pentanucleotide repeat were also found in the C. carinatum Schousb cp genome. Eighteen dinucleotide repeats, twenty-eight trinucleotide repeats, seven tetranucleotide repeats, and seven pentanucleotide repeats were also found in K. indica. Most SSRs are located in the LSC region, 81.43% in C. carinatum Schousb and 78.89% in K. indica (Table 8 and Table 9), whereas only 12 (C. carinatum Schousb) and nine (K. indica) SSRs are located in the CDSs (Table 10). Compared with the six other Asteraceae species, K. indica has the most SSRs and the lowest proportion of SSRs located in CDSs. Furthermore, most of the SSRs consist of A/T bases, which is consistent with the AT richness of the cp genome [48,51].
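
SSRs of the kind counted above are tandem repetitions of a 1 to 6 bp motif that reach a minimum number of copies. The sketch below finds perfect SSRs with regular expressions. The copy-number thresholds in `MIN_COPIES` are assumptions chosen for illustration and are not the settings used in this study, and the demo sequence is invented. Imperfect and compound SSRs are not handled.

```python
import re

# Minimum number of motif copies per motif length (assumed thresholds)
MIN_COPIES = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_compound(unit):
    """True if the motif is itself a repetition of a shorter motif,
    for example 'AA' or 'ATAT'."""
    n = len(unit)
    return any(n % k == 0 and unit == unit[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq):
    """Return (motif, 0-based start, total length) for each perfect SSR."""
    seq = seq.upper()
    hits = []
    for unit_len, min_copies in MIN_COPIES.items():
        # One captured motif followed by at least (min_copies - 1) more copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if is_compound(unit):
                continue  # already reported under the shorter motif
            hits.append((unit, match.start(), match.end() - match.start()))
    return sorted(hits, key=lambda hit: hit[1])


demo = "GG" + "A" * 12 + "CC" + "AT" * 6 + "GCGT"
print(find_ssrs(demo))
```

Running the same function over each annotated region (LSC, SSC, IRs, CDSs) and counting the hits would give per-region tallies of the kind summarized in Table 10.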
Table 8

The simple sequence repeats in the C. carinatum Schousb cp genome.

UnitLengthNo.LocationRegionUnitLengthNo.LocationRegionUnitLengthNo.LocationRegion
A191IGS (rpl32, ndhF)SSCT221IGS (atpF, atpA)LSCTA141IGS (trnH-GUG, psbA)LSC
131IGS (psbE, petL)LSC 161rps16 (intron)LSC 122IGS (trnT-GGU, psbD)LSC
121IGS (ycf1, rps15)SSC 143IGS (trnE-UUC, rpoB)LSC IGS (rpl33, rps18)LSC
116IGS (trnK-UUU, rps16)LSC IGS (atpB, rbcL)LSC 101rpoC1 (exonII)LSC
IGS (trnR-UCU, trnG-UCC)LSC IGS (rps8, rpl14)LSCTG101IGS (rps16, trnQ-UUG)LSC
IGS (psbZ, trnG-GCC)LSC 133IGS (ndhC, trnV-UAC)LSCATT122IGS (cemA, petA)LSC
IGS (psbE, petL)LSC IGS (rbcL, accD)LSC IGS (trnL-UAG, rpl32)SSC
IGS (rps18, rpl20)LSC IGS (psbB, psbT)LSCGAA151 ycf1 SSC
IGS (petD, rpoA)LSC 125IGS (trnH-GUG, psbA)LSCTTA121ndhA (intron)SSC
1010trnk-UUU (intron)LSC IGS (trnD-GUC, trnY-GUA)LSCTTC121 psbC LSC
rpoB LSC ycf3 (intronII)LSCAATA122IGS (trnK-UUU, rps16)LSC
rpoC1 (intron)LSC clpP (intronI)LSC ndhA (intron)SSC
rpoC1 (exonII)LSC ycf1 LSCATAC121IGS (trnF-GAA, ndhJ)LSC
rpoC1 (exonII)LSC 111IGS (petA, psbJ)LSCATTG121 ycf2 IRa
IGS (psaA, ycf3)LSC 109trnk-UUU (intron)LSCATTT221IGS (atpF, atpA)LSC
IGS (ycf3, trnS-GGA)LSC IGS (psbM, trnD-GUC)LSCCAAT121IGS (ycf2, trnL-CAA)IRb
IGS (trnT-UGU, rnL-UAA)LSC rpoC1 (intron)LSCGATT121ndhA (intron)SSC
psbT LSC IGS (atpI, atpH)LSCTAGA121IGS (rbcL, accD)LSC
IGS (rrn5, trnR-ACG)IRb IGS (atpI, atpH)LSCTAAA121IGS (atpI, atpH)LSC
C121rps16 (intron)LSC IGS (atpA, trnR-UCU)LSCTAAT121IGS (petN, psbM)LSC
AT141rps16 (intron)LSC rpoA LSCTATT122IGS (rpl33, rps18)LSC
121IGS (psbZ, trnG-GCC)LSC IGS (rpl14, rpl16)LSC ndhD SSC
101 rpoC2 LSC IGS (trnR-ACG, rrn5)IRaTTTC161rpl16 (intron)LSC
AATTT151IGS (ccsA, trnL-UAG)SSC
Table 9

The simple sequence repeats in the K. indica cp genome.

UnitLengthNo.LocationRegionUnitLengthNo.LocationRegionUnitLengthNo.LocationRegion
A181IGS (trnS-UGA, psbZ)LSCAT161IGS (psbZ, trnG-GCC)LSCGAA151 ycf1 SSC
121IGS (psbZ, trnG-GCC)LSC 107rps16 (intron)LSC 121 ycf1 SSC
113IGS (trnQ-UUG, psbK)LSC 10 rpoC2 LSCTAT211IGS (rps12, trnV-GAC)IRb
11 IGS (psbM, trnD-GUC)LSC 10 IGS (psbZ, trnG-GCC)LSC 122IGS (trnT-UGU, trnL-UAA)LSC
11 IGS (ycf4, cemA)LSC 10 IGS (psbZ, trnG-GCC)LSC 12 IGS (trnF-GAA, ndhJ)LSC
107 rpoB LSC 10 IGS (psbE, petL)LSCTTA152IGS (ndhC, trnV-UAC)LSC
10 rpoC1 (exonII)LSC 10 IGS (petD, rpoA)LSC 15 IGS (clpP, psbB)LSC
10 IGS (petB, petD)LSC 10 rpl16 (intron)LSC 122petB (intron)LSC
10 ndhB (intron)IRbTA231petD (intron)LSC 12 petD (intron)LSC
10 ndhA (intron)SSC 201IGS (accD, psaI)LSC 121 psbC LSC
10 IGS (trnL-UAG, rpl32)SSC 181petD (intron)LSC 121IGS (trnK-UUU, rps16)LSC
10 IGS (rpl2, rps19)IRa 141IGS (accD, psaI)LSC 121IGS (trnG-UCC, trnT-GGU)LSC
T181IGS (trnE-UUC, rpoB)LSC 123IGS (trnT-GGU, psbD)LSC 121IGS (trnE-UUC, rpoB)LSC
171 rpoA LSC 12 petD (intron)LSC 121IGS (clpP, psbB)LSC
142IGS (rpl20, rps12)LSC 12 IGS (rpl22, rps19)LSC 121IGS (trnS-GCU, trnC-GCA)LSC
14 ndhA (intron)SSC 103rpoC1 (exonII)LSC 121IGS (ndhI, ndhG)SSC
131IGS (psbE, petL)LSC 10 IGS (psaJ, rpl33)LSCTATC121IGS (psbA, trnK-UUU)LSC
122IGS (atpF, atpA)LSC 10 ndhA (intron)SSCTATT122IGS (rpl33, rps18)LSC
12 clpP (intronII)LSCAAT122IGS (trnR-UCU, trnG-UCC)LSC 12 IGS (psaC, ndhD)SSC
112IGS (atpB, rbcL)LSC 12 IGS (trnT-GGU, psbD)LSCTCTA121ndhA (intron)SSC
11 IGS (psaI, ycf4)LSCATA211IGS (trnV-GAC, rps12)IRaTTCT121IGS (trnG-UCC, trnT-GGU)LSC
109IGS (trnC-GCA, petN)LSC 153ycf3 (intron)LSCTTTA121petD (intron)LSC
10 IGS (rpoC2, rps2)LSC 15 IGS (trnP-UGG, psaJ)LSCTTTC121trnK-UUU (intron)LSC
10 IGS (atpI, atpH)LSC 15 IGS (trnL-UAG, rpl32)SSCATTAG151IGS (rbcL, accD)LSC
10 IGS (ndhC, trnV-UAC)LSC 122IGS (trnG-UCC, trnT-GGU)LSCTATAT231petD (intron)LSC
10 clpP (intronI)LSC 12 rpl16 (intron)LSC 201IGS (accD, psaI)LSC
10 IGS (rps8, rpl14)LSCATT151IGS (trnG-UCC, trnT-GGU)LSC 151IGS (accD, psaI)LSC
10 IGS (rps19, rpl2)IRbTAA151IGS (accD, psaI)LSCTATTA212IGS (rps12, trnV-GAC)IRb
10 IGS (psaC, ndhD)SSC 122IGS (trnE-UUC, rpoB)LSC 21 IGS (trnV-GAC, rps12)IRa
10 ndhB (intron)IRa 12 IGS (trnT-GGU, psbD)LSCTCCTA151IGS (rps4, trnT-UGU)LSC
Table 10

The distribution of SSRs present in the Asteraceae cp genomes.

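Taxon | Genome Size (bp) | GC (%) | Mono | Di | Tri | Tetra | Penta | Hexa | Total SSRs | CDS % a | CDS No. b | CDS % c
C. carinatum Schousb | 149,752 | 37.47 | 43 | 8 | 5 | 13 | 1 | 0 | 70 | 51.6 | 12 | 17.1
K. indica | 152,885 | 37.25 | 30 | 18 | 22 | 13 | 7 | 0 | 90 | 51.3 | 9 | 10
D. alveolatum | 152,265 | 37.37 | 34 | 14 | 11 | 14 | 2 | 0 | 75 | 51.4 | 8 | 10.7
A. annua | 150,952 | 37.48 | 39 | 10 | 4 | 15 | 3 | 0 | 71 | 52.1 | 10 | 14.1
E. paradoxa | 151,837 | 37.59 | 37 | 4 | 4 | 5 | 0 | 1 | 51 | 51.5 | 14 | 27.5
H. annuus | 151,104 | 37.62 | 40 | 4 | 4 | 4 | 0 | 0 | 52 | 51.2 | 14 | 26.9
T. mongolicum | 151,451 | 37.67 | 14 | 5 | 3 | 6 | 1 | 1 | 30 | 51.3 | 11 | 36.7
C. indicum | 150,972 | 37.48 | 38 | 10 | 4 | 14 | 1 | 0 | 67 | 48.8 | 7 | 10.4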

CDS: protein-coding regions. a the percentage ratio of the total length of the CDS to the genome size. b the total number of SSRs in CDS. c the percentage ratio of the total number of SSRs in CDS to the total number of SSRs in the whole genome.

2.5. Comparison of the Whole cp Genome of Asteraceae

Using C. carinatum Schousb as a reference, mVISTA was used to compare the overall cp genome sequences of the eight Asteraceae species (Figure 3). The cp genome of C. carinatum Schousb was the shortest, whereas that of K. indica was the longest among the eight Asteraceae species. The difference in sequence length was mainly due to differences in the length of the LSC region, and there were no significant differences in the lengths of the SSC and IR regions. It is important to note that the SSC regions of A. annua and T. mongolicum appear completely different from that of C. carinatum Schousb because of the SSC inversion present in most Asteraceae species, whereas the inverted region in the LSC, relative to other eudicots, is relatively constant [13,14,15,16,17,18,19,20,21,22,23,24,25]. To illustrate this, we compared the cp genomes of 129 Asteraceae species with the cp genomes of some non-Asteraceae species using mVISTA to confirm the structural changes (data not shown). Among them, only 16 Asteraceae species (Artemisia argyi, Artemisia capillaris, Artemisia frigida, Artemisia gmelinii, Artemisia montana, Aster altaicus, Carthamus tinctorius, Centaurea diffusa, Eclipta prostrata, Lactuca sativa, Saussurea chabyoungsanica, Saussurea involucrata, Saussurea polylepis, Taraxacum mongolicum, Taraxacum officinale, Taraxacum platycarpum) had no inversion in the SSC region (Supplemental Table S1). However, the LSC region was generally inverted. In addition, the coding regions are more conserved than the non-coding regions [53,54].
Figure 3

The comparison of eight Asteraceae cp genomes using mVISTA. The grey arrows above the alignment indicate the direction of gene translation. The y-axis represents the percent identity between 50% and 100%. Protein-coding regions (exons), rRNA, tRNA and conserved non-coding sequences (CNS) are shown in different colors.
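
As a rough illustration of the kind of identity profile plotted in Figure 3, the following Python sketch computes percent identity in fixed windows over two sequences that have already been aligned. It is not the mVISTA algorithm; the function name, the window and step sizes, and the toy sequences are assumptions made only for this example.

```python
def window_identity(aligned_ref, aligned_query, window=100, step=100):
    """Return a list of (start, percent_identity) for each window.

    Both inputs must be the same length, with gaps written as '-'.
    A column counts as a match only if both bases are equal and not a gap.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for start in range(0, len(aligned_ref) - window + 1, step):
        ref = aligned_ref[start:start + window]
        qry = aligned_query[start:start + window]
        matches = sum(1 for a, b in zip(ref, qry) if a == b and a != "-")
        profile.append((start, 100.0 * matches / window))
    return profile


# Toy alignment (hypothetical data, not taken from the paper)
ref = "ATGCATGCAT" + "ATGCATGCAT"
qry = "ATGCATGCAT" + "ATGAAT-CAT"
print(window_identity(ref, qry, window=10, step=10))
```

Windows whose identity falls below a chosen cutoff (the figure uses a 50% to 100% axis) would correspond to the more divergent, typically non-coding, regions.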

2.6. IR Contraction and Expansion

The IR contraction and expansion of C. carinatum Schousb and K. indica were analyzed by comparing the LSC/IRB/SSC/IRA boundaries among the eight species of Asteraceae (Figure 4). In all of them, the rps19 gene spanned the junction of the IRB and LSC regions, so that an rps19 pseudogene, a duplicate of the 3′ end of rps19, was usually found at the IRA/LSC boundary; these rps19 pseudogenes ranged from 60 to 99 bp in length. The ycf1 gene was located at both the IRB/SSC and IRA/LSC boundaries. Additionally, ycf1 pseudogenes of 477 to 581 bp were produced at the IRA/SSC border. Among these eight cp genomes, C. carinatum Schousb had the largest ycf1 pseudogene (581 bp in length), whereas the inversion of the SSC region in A. annua and T. mongolicum caused the ycf1 pseudogene to appear at the IRB/SSC border. T. mongolicum and C. indicum lacked both the ycf1 pseudogene and the rps19 pseudogene. With the exception of T. mongolicum, all ndhF genes were located in the SSC region, 35–77 bp from the IRA border, and the ndhF gene of C. carinatum Schousb was the farthest from the IRA border. Furthermore, the ndhF gene of T. mongolicum extends 10 bp into the IRB region and overlaps the ycf1 pseudogene. The trnH-GUG genes are located in the LSC region, 0–13 bp from the IRA border, and C. carinatum Schousb has the largest distance. Oddly, C. carinatum Schousb had a relatively small IR but a longer ycf1 pseudogene, which may be due to contraction of its intergenic regions [48].
Figure 4

The comparison of the borders of the LSC, SSC, and IR regions among the eight Asteraceae cp genomes.
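
To make the boundary positions concrete, the sketch below derives junction coordinates from the region lengths reported in the abstract. It assumes a linearized genome that starts at the first base of the LSC and follows the LSC/IRB/SSC/IRA order used above; the coordinates in the deposited GenBank records may be laid out differently, and the function name is hypothetical.

```python
def junctions(lsc, ir, ssc):
    """Return 1-based inclusive (start, end) coordinates for each region,
    assuming the linear order LSC / IRB / SSC / IRA."""
    regions = {}
    pos = 1
    for name, length in (("LSC", lsc), ("IRB", ir), ("SSC", ssc), ("IRA", ir)):
        regions[name] = (pos, pos + length - 1)
        pos += length
    return regions


# Region lengths (bp) as reported in the abstract
c_carinatum = junctions(lsc=82_290, ir=24_523, ssc=18_416)
k_indica = junctions(lsc=84_610, ir=25_003, ssc=18_269)

for species, regions in (("C. carinatum", c_carinatum), ("K. indica", k_indica)):
    print(species, regions)
```

The end of the IRA in each case equals the reported genome size (149,752 bp and 152,885 bp), which is a quick consistency check on the region lengths.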

2.7. Phylogenetic Analysis and Barcoding

The cp genome is a useful resource and tool for taxonomy, for determining evolutionary relationships within families, and for DNA barcoding [42,55,56,57,58,59]. To obtain a useful barcoding marker and a reasonable phylogenetic placement of C. carinatum Schousb and K. indica, we constructed phylogenetic trees of 24 sample species using the Maximum Likelihood (ML) method, based on separate alignments of each of 18 genes (atpF, clpP, matK, ndhA, ndhB, ndhF, petB, petD, psaB, psbA, rbcL, rpl2, rpl16, rpoB, rpoC1, rps12, rps16, rps19) (Supplemental Figure S1). These 24 species include 18 Asteraceae species, four Orchidaceae species, one Chenopodiaceae species and one Cruciferae species. We compared six non-Asteraceae species with 18 Asteraceae species because the former are morphologically similar to C. carinatum Schousb and K. indica. Based on the alignment results, the matK, ndhF and rbcL genes performed better than the others. Eight nodes had bootstrap values >90% when matK or ndhF was used for the alignment. However, both had three nodes with bootstrap values <40%, and one node even had a bootstrap value <30%. For the rbcL gene, 10 of the 19 nodes had bootstrap values >90%. The minimum bootstrap value was 46%, and the rest of the nodes had bootstrap values >50%. The phylogenetic tree based on the rbcL alignment is shown in Figure 5; it shows that C. carinatum Schousb is closely related to Chrysanthemum indicum and that K. indica is closely related to Conyza bonariensis, which is consistent with their classification and morphology. Hence, the rbcL gene could be a good candidate barcoding marker [51,52,54].
Figure 5

The molecular phylogenetic analysis of the cp protein-coding gene rbcL for 24 samples using the Maximum Likelihood method. The tree was constructed using MEGA7. The stability of each tree node was tested by bootstrap analysis with 1000 replicates.

3. Discussion

We reported the cp genome sequences of C. carinatum Schousb and K. indica, which provide an important resource for studying the evolution and inversion mechanisms of the cp genome as well as the molecular barcoding of vegetables in the Asteraceae family. Although the cp genomes of angiosperms are well conserved in genomic structure, inversions and IR expansion and contraction occur frequently [7,8,9,10,11]. Our results showed that the inversion of trnC-GCA to trnG-UCC in the LSC and the inversion of the whole SSC occurred in these two species, which is congruent with most cp genomes of the Asteraceae family [41,60,61,62]. Notably, although the SSC region is not inverted in some species, the trnC-GCA to trnG-UCC inversion in the LSC was never absent, which may make amplification of the trnC-GCA or trnG-UCC boundary regions a good resource for the molecular taxonomy of Asteraceae. Meanwhile, the SSC/IRA and LSC/IRB borders extend into ycf1 and rps19, with the subsequent formation of the ycf1 and rps19 pseudogenes, respectively [63,64]. The ycf1 and rps19 pseudogenes occur in most cp genomes, with deletion regions of similar size in species of the same family but of different sizes between families (Figure 4) [48]. The analysis of codon usage frequency and RSCU showed that leucine and isoleucine are the most frequent amino acids in the cp genomes of C. carinatum Schousb and K. indica, as in other angiosperms [23,52,65,66,67]. Meanwhile, T was significantly more frequent at the 3rd codon position than at the 1st or 2nd position. T appears at the end of most of the preferred synonymous codons (RSCU > 1.0), whereas the reverse holds for G (Table 1, Table 4 and Table 5). Consistent with most earlier reports about repeats and SSRs, mononucleotide A/T repeats are the most abundant, which may contribute to the AT richness of angiosperm cp genomes [68,69,70].
Here, we compared the whole cp genomes of six species of Asteraceae, which revealed that inversion is usual in most Asteraceae species, with exceptions occurring only for the SSC inversion and not for the trnC-GCA to trnG-UCC inversion in the LSC. Among the 129 Asteraceae species that have been sequenced, the inversion occurs in the LSC region. Whether or not the SSC region is inverted is closely related to the genus. For example, Diplostephium has the most sequenced species, and all of them have inverted SSC regions. Inconsistent inversions within a genus occurred only in Aster and Taraxacum (Supplemental Table S1). This phenomenon may be a key feature of the cp genome of Asteraceae and provides insight for future evolutionary research. The rbcL gene shows good diversity among some vegetables. The closer the relationship between species, the higher the sequence similarity of the genes. The phylogenetic tree based on the rbcL gene discriminates well among the Asteraceae species. More importantly, it also differentiates species from different families, which illustrates that the rbcL gene is highly discriminating not only within the Asteraceae family but also among most vegetables. Therefore, rbcL would be one of the best choices for DNA barcoding to distinguish between vegetables [71,72,73].

4. Materials and Methods

4.1. DNA Extraction and Sequencing

The total DNA was extracted from about 100 g of fresh leaves of C. carinatum Schousb and K. indica using the CTAB method [74]. The total DNA quality was evaluated by the ratio of absorbance at 260 nm to that at 280 nm (A260/A280) using a Nanodrop2000 (Thermo Fisher Scientific, Waltham, MA, USA), whereas the DNA size and integrity were assessed visually using gel electrophoresis. The DNA was sheared into fragments of 300–500 bp. Paired-end libraries were prepared with the TruSeq™ DNA Sample Prep Kit and the TruSeq PE Cluster Kit. The genome was then sequenced using the HiSeq4000 platform (Illumina, San Diego, CA, USA).

4.2. Genome Assembly

The assembly of the cp genomes of C. carinatum Schousb and K. indica was first carried out through error correction and the production of initial contigs by the GS FLX De Novo Assembler Software (Newbler V2.6). PCR amplification and Sanger sequencing were performed to verify the four junction regions between the IRs and the LSC/SSC. The final cp genomes of C. carinatum Schousb and K. indica were submitted to GenBank with the accession numbers MG710386 and MG710387, respectively.

4.3. Gene Annotation and Codon Usage Analysis

The cp genomes were annotated using BLAST [75] and DOGMA with manual corrections [76]. tRNAscan-SE was used to identify the tRNA genes [77]. OGDRAW was used to draw the circular genome map [78]. MEGA5 was used to analyze the characteristics of the variations in synonymous codon usage [79]. The relative synonymous codon usage (RSCU) values, codon usage, and GC content were also determined by MEGA7 [80].
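
For readers unfamiliar with RSCU, the following Python sketch shows one common way to compute it for a single in-frame coding sequence: the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. This is not MEGA's implementation; the function name and the toy sequence are assumptions, and the standard codon-to-amino-acid assignments are assumed to apply.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Relative synonymous codon usage for one in-frame coding sequence.

    RSCU(codon) = count(codon) * (number of synonymous codons) / (family total).
    Amino acids (or stops) that never occur are left out of the result.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values


# Toy sequence (hypothetical): Met, Leu x4, stop
example = rscu("ATGCTTCTTCTATTATAA")
print({codon: value for codon, value in example.items() if value > 1.0})
```

Codons with RSCU > 1.0 are used more often than expected under equal synonymous usage, which is the threshold referred to in the Discussion.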

4.4. Repeat Structure and Simple Sequence Repeats (SSRs) Analysis

Analysis of repeats longer than 30 bp with a minimum of 90% sequence identity (forward, palindromic, reverse, and complement) and of simple sequence repeats (SSRs) was performed by REPuter [81] and MISA [82], respectively, with the same parameters as described in Ni et al. [53].
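
The idea behind SSR detection can be sketched with regular expressions, as below. This is a simplified illustration and not MISA itself; the minimum repeat counts are hypothetical placeholders, since the actual thresholds (those of Ni et al. [53]) are not restated here, and the toy sequence is invented for the example.

```python
import re

# Hypothetical minimum repeat counts per motif length (NOT the paper's settings)
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (1-based start, motif, repeat count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for size, min_n in sorted(min_repeats.items()):
        # A motif of `size` bases followed by at least (min_n - 1) more copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT");
            # those runs are reported under the shorter motif instead.
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            hits.append((m.start() + 1, motif, len(m.group(0)) // size))
    return sorted(hits)


# Toy sequence (hypothetical): an A12 run and an (AT)6 run
toy = "GC" + "A" * 12 + "GCGC" + "AT" * 6 + "CCG"
print(find_ssrs(toy))
```

Because the scan is non-overlapping and only handles perfect repeats, it may miss compound or interrupted SSRs that a dedicated tool would report.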

4.5. Comparative Genome Analysis of the C. carinatum Schousb and K. indica Genomes

Comparison of the overall cp genome of C. carinatum Schousb and K. indica with six cp genomes of Asteraceae was performed by mVISTA [83,84] using the annotation of C. carinatum Schousb as a reference.

4.6. Phylogenetic Analysis

A total of 24 complete cp genome sequences were downloaded from the NCBI Organelle Genome Resources database. MEGA7 [80] was used to construct the evolutionary tree of the cp protein-coding gene rbcL of the 24 samples by the Maximum Likelihood method.

5. Conclusions

In this study, the complete cp genomes of C. carinatum Schousb and K. indica were reported and analyzed for the first time. Both species are key traditional Chinese medicines and edible vegetables. Compared with other Asteraceae species, the cp genomes of these two species display two inversions: one from trnC-GCA to trnG-UCC in the LSC and the other covering the whole SSC region. We found that the trnC-GCA to trnG-UCC inversion occurred in almost every cp genome of the Asteraceae species. Seventy and ninety simple sequence repeats (SSRs) were present in the cp genomes of C. carinatum Schousb and K. indica, respectively. These results provide good opportunities for developing molecular barcoding markers to distinguish different families, either by combining different barcoding markers or by using boundary-region markers. The rbcL gene is a good barcoding marker. Meanwhile, these results would be useful for the evolutionary study of C. carinatum Schousb and K. indica, and might also contribute to the genetics and barcoding of easily confused leafy vegetables.