Literature DB >> 28303008

An analysis of Echinacea chloroplast genomes: Implications for future botanical identification.

Ning Zhang1, David L Erickson2, Padmini Ramachandran2, Andrea R Ottesen2, Ruth E Timme2, Vicki A Funk3, Yan Luo2, Sara M Handy2.   

Abstract

Echinacea is a common botanical used in dietary supplements, primarily to treat upper respiratory tract infections and to support immune function. There are currently thought to be nine species in the genus Echinacea. Due to very low molecular divergence among sister species, traditional DNA barcoding has not been successful for differentiation of Echinacea species. Here, we present the use of full chloroplast genomes to distinguish between all 9 reported species. Total DNA was extracted from specimens stored at the National Museum of Natural History, Smithsonian Institution, which had been collected from the wild with species identification documented by experts in the field. We used Next Generation Sequencing (NGS) and CLC Genomics Workbench to assemble complete chloroplast genomes for all nine species. Full chloroplasts unambiguously differentiated all nine species, compared with the very few single nucleotide polymorphisms (SNPs) available with core DNA barcoding markers. SNPs for any two Echinacea chloroplast genomes ranged from 181 to 910, and provided robust data for unambiguous species delimitation. Implications for DNA-based species identification assays derived from chloroplast genome sequences are discussed in light of product safety, adulteration and quality issues.

Entities:  

Mesh:

Substances:

Year:  2017        PMID: 28303008      PMCID: PMC5428300          DOI: 10.1038/s41598-017-00321-6

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Introduction

Echinacea, i.e., purple coneflower, is one of the most popular botanicals used in dietary supplements. The range of Echinacea spans the Atlantic drainage region of the United States and extends into south central Canada[1]. For this genus, the Southern United Stated is an important native area with two species, i.e. E. tennesseensis and E. laevigata endemic to the southeast United States. Use of Echinacea products has dramatically increased: sales in 2013 increased by 94.7% over those in 2012, making it the 8th most commonly sold herb in the United States[2]. By 2014, sales of Echinacea had increased by 79% from 2013 and it was the 3rd most commonly sold herb in the United States with the sales surpassing $50 million[3]. Although not approved as a drug by the Food and Drug Administration, Echinacea products are often marketed for treatment of upper respiratory infections[4, 5]; other marketed uses include immune system stimulant[6, 7], adjunct therapy for chronic candidiasis in women, and external wound healing[8]. Native Americans have been using Echinacea extensively to treat stomach cramps, rabies, toothaches, soremouth, throat, dyspepsia, colds, headache and snake bites[9]. The three species used most commonly in dietary supplements are E. purpurea, E. angustifolia and E. pallida, available as teas, capsules and tablets. Importantly, each species appears to have different pharmacological activities, depending on the particular method of preparation and on which part of a given plant is used[8]. In addition to the three species, there are six other closely-related species in the same genus, i.e., E. sanguinea, E. tennessensis, E. paradoxa, E. atrorubens, E. laevigata, and E. speciosa [10]. Ardjommand-woelkart and Bauer (2016), among others, have noted that both E. angustifolia (whole plant) and E. purpurea (dry root) have been associated with allergic reactions[11-13]. However, aside from these few instances, there are no known drug interactions or side effects[8] associated with the 9 species. The increased use of Echinacea species has led to concerns about adulterated products[14]. One of the reasons is that a few Echinacea species are phenotypically similar so it is easy to misidentify them if not familiar with the morphological variations among them[10]. The most common adulteration of Echinacea is the substitution of the root of Parthenium integrifolium for E. purpurea [15]. The American Herbal Pharmacopoeia Standard of Identity includes additional adulterants for E. purpurea: Helianthus spp., Lespedeza capita, Eryngium aquaticum, and Rudbekia nitida (http://www.herbal-ahp.org/documents/macroscopy/Ech_purpurea_macro.pdf, accessed 09/13/16). Even when Echinacea species are being used in products, it is not easy to differentiate among the three most appropriate Echinacea species, i.e., E. purpurea, E. angustifolia, and E. pallida; as a result, mislabeling occurs frequently[15, 16]. Given that different species may enact different effects, such adulteration could decrease the safety, efficacy and reliability of commercial Echinacea products. Distinguishing among Echinacea species using molecular methods is challenging due to extremely low levels of molecular divergence. This reflects a pattern seen among other members of Asteraceae, which demonstrate substantial morphological variation, but very little molecular differentiation, due to recent and rapid species radiations[17, 18]. Flagel et al.[19] used three nuclear markers (Adh, CesA, and GPAT) and two plastid loci (trnS and trnG) to examine the phylogeny of Echinacea; however, no resolved topologies were obtained, suggesting incomplete lineage sorting, as well as the potential for widespread hybridization within the genus[19]. DNA barcoding has been an effective tool for rapidly and accurately identifying many plant species[20-22]. Mitochondrial cytochrome c oxidase (CO1) has been successfully used as a barcode for animal species[23]; however, no single universal barcode has been entirely successful for distinguishing all plants to the species level[24]. In 2009, the Plant Working Group of the Consortium for the Barcode of Life (CBOL) proposed a 2-locus combination of matK + rbcL as a universal plant barcode; however, this approach only provides a discriminatory efficiency of 72%[20]. Many studies have shown that core DNA markers, i.e., matK and rbcL, cannot resolve closely-related species. For example, the commercially and medicinally important species of turmeric (Curcuma longa, Zingiberaceae) cannot be separated from almost a hundred other Curcuma species using matK and rbcL [25]. A similar phenomenon was recently described for Venus slippers (Paphiopedilum spp.), where DNA barcodes were only successful 18.86% of the time for this popular family of orchids[26]. A study on DNA differentiation of pine nut samples conducted in our lab also indicated that the core barcoding markers were not effective for this group, so ycf1 was developed for species level identification[27]. Subsequently, two alternative strategies were proposed to discriminate among plant species: the first was the use of multiple loci[28-30], and the second was the use of whole-chloroplast genomes, termed ‘super-barcoding’[31-34]. CBOL demonstrated that the use of seven plastid DNA barcoding markers only improved species discrimination from 72% to 73% when compared with the use of two core markers[20]. The idea of using whole chloroplast genomes to identify plant species was first proposed by Kane and Cronk (2008) and has been highlighted by a few recent review articles[22]. Using complete chloroplast genomes holds promise for efficient differentiation of species compared to a multi-locus approach, especially for closely related species such as Echinacea. Advances in next-generation sequencing platforms have reduced the obstacles of time, effort, and cost, necessary to acquire whole chloroplast genomes. With earlier methods, chloroplast DNA had to be enriched, a time-consuming task requiring substantial fresh leaf tissue[35]. Approaches using polymerase chain reaction (PCR) enrichment, such as long PCR[36] (using 27 primers) or multiple overlapping short-range PCR[37] (using 138 primers), have been used, but these procedures are time-consuming and labor-intensive, and the primers used in such assays do not work equally well across different taxonomic groups. Nonetheless, complete chloroplast genomes have been shown to be highly effective for resolving relationships among species with low molecular divergence[32, 33, 38, 39], and have been successfully employed for species identification[34]. Use of comparative chloroplast genomics has also been useful to identify divergent regions that can be employed for species-specific PCR-based diagnostics. For example, in 2013 Handy et al. used a large chloroplast dataset to design a species-specific assay to differentiate Pinus armandii, which causes a taste disturbance known as dysgeusia[40], from other species that do not. Although direct sequencing of genomic DNA is still costly, quickly advancing Next Generation Sequencing (NGS) technologies may ultimately prove to be more cost effective and technically efficient than other (often more time consuming) approaches to full chloroplast sequencing. For example, using the Illumina Miseq and Hiseq (Illumina, San Diego), 2 × 300 and 2 × 250 bp reads (respectively) can be obtained with rapid throughput kits (~27 hours) yielding as much as 12 to15 Gb from a MiSeq and as much as 60 to 120 Gb from a Hiseq. It was estimated that less than 1 GB of whole-DNA short reads can be effectively assembled into a full chloroplast genome with 51x coverage[41]. Therefore, this approach alleviates the need for expensive enrichment methods and fully leverages advances in DNA sequencing and bioinformatics. In this study we extracted DNA from dried herbarium tissue samples for all 9 Echinacea species, sequenced each using the Illumina MiSeq platform, and here present complete chloroplast genomes for each species. Additionally, we highlight how variation within chloroplast regions can be utilized to develop rapid species-specific assays.

Results

The data gathered for each species ranged from 434 MB for E. tennessensis to 2,531 MB for E. purpurea, with coverage of chloroplast genomes ranging between 20x for E. tennessensis and 65x for E. angustifolia. Additional information, including GenBank accession numbers, is available in Table 1.
Table 1

The nine species sampled in this study and information on the chloroplast genome assembly.

SpeciesRaw data size (MB)Number of readsSize of reads (bp)Coverage of chloroplaste genomeSize of chloroplast genome (bp)Accession number
E. purpurea 2,53110,394,8282 × 30040151,913KX548224
E. sanguinea 2,43710,966,2082 × 25051151,926KX548225
E. tennessensis 4341,814,3562 × 25020151,877KX548223
E. pallida 8324,078,6142 × 25033151,883KX548218
E. paradoxa 1,6926,202,4802 × 30051151,837KX548217
E. atrorubens 4721,923,8462 × 25031151,912KX548220
E. laevigata 5452,198,6222 × 25028151,886KX548219
E. angustifolia 8783,338,7422 × 30065151,935KX548221
E. speciosa 4831,941,4302 × 25022151,860KX548222
The nine species sampled in this study and information on the chloroplast genome assembly. The chloroplast genome of each Echinacea species appears to be collinear with the one of Parthenium argentatum, the most closely related public cpDNA genome, except for two inversions. These two inversions are specific to P. argentatum when compared with the other three Asteraceae species, i.e., E. purpurea, Helianthus annuus, and Chrysanthemum indicum (Figure S1). The first inversion is 891 bp long, located between trnS and psbM, and the second is 886 bp long, located between psbM and rpoB, these regions can be used for differentiating P. argentatum using PCR. In addition, positions of these two inversions in Echinacea species exchange with each other (Figure S1). Based on our alignments, no structural variations were detected among the nine Echinacea chloroplast genomes, so E. purpurea was used as an example to demonstrate the structure of Echinacea spp chloroplasts (Fig. 1).
Figure 1

Gene map of the Echinacea purpurea chloroplast genome. Genes shown outside the circle are transcribed clockwise and those inside are transcribed counterclockwise. Gene belonging to different functional groups are color-coded as indicated by icons on the lower left corner. Dashed area in the inner circle indicates the GC content of the chloroplast genome. LSC, SSC and IR means large single copy, small single copy and inverted repeat, respectively.

Gene map of the Echinacea purpurea chloroplast genome. Genes shown outside the circle are transcribed clockwise and those inside are transcribed counterclockwise. Gene belonging to different functional groups are color-coded as indicated by icons on the lower left corner. Dashed area in the inner circle indicates the GC content of the chloroplast genome. LSC, SSC and IR means large single copy, small single copy and inverted repeat, respectively. The length of the chloroplast genome of E. purpurea is 151,913 bp. There are two inverted repeats (IRs) of 25,070 bp each, separated by a large single-copy and small single-copy (LSC and SSC) region of 83,602 bp and 18,171 bp, respectively. The G + C content of E. purpurea is 37.6% across the whole chloroplast genome. In total, there are 131 genes with 81 unique protein-coding genes, six of which are duplicated in the IR (Fig. 1). There are 18 unique genes with introns, five of which are duplicated in the IR; two genes have two introns and 16 genes have only one intron. There are 36 tRNA genes, 29 of which are unique and seven of which are duplicated in the IR. There are four unique ribosomal DNA and all of them are duplicated in the IR so there are eight ribosomal DNA in total. As shown in Table 2, the number of base differences among these nine Echinacea species ranges from 181 (0.12%, E. paradox vs. E. atrorubens) to 910 (0.60%, E. atrorubens vs. E. purpurea). The number of differences between protein-coding genes is very low: 42 of 81 gene alignments are identical and the most divergent gene is ycf1, which has 31 variable sites and 4 indels within the 5059-bp alignment (Table 3). Table 4 lists the twenty-five most variable non-coding regions based on percentage of sequence identities. Eleven of these twenty-five overlap with those identified by Timme et al.[42] and three overlap with the ten plastid markers proposed by Shaw et al.[43] for low-level phylogenetic inferences[43] (Table 4).
Table 2

Number and percentage of differences among nine Echinacea chloroplast genomes.

paradox atrorubens sanguinea pallida angustifolia tennesseensis laevigata speciosa purpurea
paradox 0.12%0.23%0.18%0.44%0.52%0.51%0.50%0.56%
atrorubens 1810.20%0.18%0.48%0.55%0.55%0.55%0.60%
sanguinea 3453080.16%0.45%0.54%0.53%0.54%0.60%
pallida 2732762470.41%0.50%0.50%0.50%0.55%
angustifolia 6727276856290.47%0.45%0.45%0.53%
tennesseensis 7878378277657110.29%0.20%0.31%
laevigata 7728358137646774450.24%0.31%
speciosa 7688308277676893093650.23%
purpurea 849910908842811469478350
Table 3

The 10 most-divergent coding regions among nine Echinacea species.

GenesLengthVariable sitesIndelsPercentage of identical sites (%)Timme et al.[42]
ycf1 5,04931499.0
rps8 4053099.3
rpoA 1,0094199.3
rpoB 3,1987199.3
petD 4833099.4
matK 1,2826099.4
rbcL 14587099.5
ndhF 2,23211099.5
ndhI 5013099.6
psbE 2521099.6
Table 4

The 25 most-divergent non-coding regions among nine Echinacea species.

GenesLength (bp)Variable sitesIndelsPercentage of identical sites (%)Timme et al.[42] Shaw et al.[43]
ccsA → trnL-UAG 1382381.9
psbI → trnS-GCU 1444586.8
5 S rRNA → trnR-ACG 3120286.9
atpF → atpA 720288.9
rpl32 → ndhF 9044789.9
trnT-UGU → trnL-UAA 6035890.9
petN → psbM 5393490.9
rps4 → trnT-UGU 3923391.6
petD → rpoA 2053391.7
ndhI → ndhG 3883192.5
trnT-GGU → psbD 127011892.9
ndhD → ccsA 2342493.2
trnH-GUG → psbA 3858493.2
trnK-UUU → matK 3041393.4
psbC → trnS-UGA 2461393.6
ndhC → trnV-UAC 9989793.9
ycf3 → trnS-GCU 9108494.0
trnK-UUU → rps16 7832594.1
trnR-UCU → trnG-UCC 2215294.6
rps8 → rpl14 2031394.6
psaA → ycf3 7476594.9
psaI → ycf4 3960294.9
rpoC2 → rps2 2590295.0
rbcL → accD 5803295.0
rps2 → atpI 2331195.3
Number and percentage of differences among nine Echinacea chloroplast genomes. The 10 most-divergent coding regions among nine Echinacea species. The 25 most-divergent non-coding regions among nine Echinacea species. We used both coding and non-coding regions of the chloroplast genomes to effectively separate all Echinacea species and infer a phylogeny (Fig. 2). The nine Echinacea species separated into two clades with strong support. One clade is comprised of E. tennesseensis, E. speciosa, E. purpurea and E. laevigata. E. tennesseensis appears to be closely related to E. speciosa with a bootstrap value of 63%; and together they are both sister to E. purpurea with a bootstrap value of 100%. While E. laevigata is closely related to the other three species, i.e., E. tennesseensis, E. speciosa, and E. purpurea. The second clade is comprised of five species and is well-supported with a bootstrap value of 100%. E. angustifolia is closely related to the other four species, forming a clade with a bootstrap value of 100%. E. atrorubens is sister to E. paradox with a bootstrap value of 100%, and E. pallida is sister to E. sanguinea with a bootstrap value of 57%.
Figure 2

The ML tree of Echinacea reconstructed using chloroplast genomes. Numbers on branch nodes are bootstrap values. The branch connecting the outgroup Parthenium argentatum and nine Echinacea species was collapsed.

The ML tree of Echinacea reconstructed using chloroplast genomes. Numbers on branch nodes are bootstrap values. The branch connecting the outgroup Parthenium argentatum and nine Echinacea species was collapsed. In contrast, using the core barcoding region matK, we only identified 5 variable sites and 0 variable sites for rbcL within the 943-bp and 599-bp alignments, respectively. Even using both markers, no variations between E. purpurea and E. tennesseensis or between E. paradox and E. atrorubens could be identified. As a result, the tree constructed using the two core DNA barcoding markers (matK and rbcL) provided no resolution at most nodes (Fig. 3). E. pallida, E. sanguinea, E. paradox, and E. atrorubens formed a clade with a bootstrap value of 100%, which is congruent with the one reconstructed using chloroplast genomes. Echinacea paradox is sister to E. atrorubens with a 100% bootstrap value. However, the positions of E. pallida and E. sanguinea were unresolved and the positions of the other five species could not be resolved using matK and rbcL. Therefore, these two core DNA markers are too conserved to use in diagnostic identification questions.
Figure 3

ML trees reconstructed using matK + rbcL (left) and using chloroplast genomes (right) Numbers are bootstrap values, branches with bootstrap values <50% are collapsed. These two phylogenies show the power of chloroplast genomes for delimitation of Echinacea species when compared with core DNA barcodes.

ML trees reconstructed using matK + rbcL (left) and using chloroplast genomes (right) Numbers are bootstrap values, branches with bootstrap values <50% are collapsed. These two phylogenies show the power of chloroplast genomes for delimitation of Echinacea species when compared with core DNA barcodes. Examination of the 727-bp alignment of ITS regions yielded only 7 variable sites. Additionally, no variation was observed among the three species: E. atrorubens, E. purpurea, and E. angustifolia. Thus, differentiation of Echinacea species using the ITS region was not robust. In the tree reconstructed using ITS, only 2 bootstrap values of 8 nodes were higher than 50% (Fig. 4a). E. paradox, E. sanguinea, and E. speciosa are highly supported as one clade with a 81% bootstrap value; E. angustifolia, E. purpurea, E. atrorubens, E. laevigata, and E. pallida group into one clade with a bootstrap value of 58%. Interestingly, the topology reconstructed using ITS is substantially different from the one obtained using chloroplast genomes (Fig. 3).
Figure 4

ML trees reconstructed using ITS (a) and ITS + trnH-psbA (b). Numbers are bootstrap values, branches with the bootstrap value <50% are collapsed. Both phylogenies show the lack of resolution among Echinacea species using either combination of genes.

ML trees reconstructed using ITS (a) and ITS + trnH-psbA (b). Numbers are bootstrap values, branches with the bootstrap value <50% are collapsed. Both phylogenies show the lack of resolution among Echinacea species using either combination of genes. The alignment of the nine Echinacea chloroplast genomes suggests that the intergenic region between trnH and psbA may be an appropriate gene for DNA barcoding for the majority of Echinacea species - especially if used in combination with ITS. However, differentiation relies upon very few SNPs so validation using a greater number of authenticated individuals would be needed. The size of the trnH-psbA PCR product ranges from 499 (E. purpurea) to 511 bp (E. laevigata) and the number of SNPs between any two species ranges from 0 (E. atrorubens vs E. paradox and E. speciosa vs E. tennesseensis) to 16 (E. laevigata vs E. purpurea) (Table S1). According to the chloroplast alignment, universal primers for trnH-psbA (trnHf_05[44]/psbA3_f[45]) should successfully amplify all 9 Echinacea species. In addition, the alignment indicates that pairs of species that cannot be differentiated using trnH-psbA alone, such as (E. atrorubens and E. paradoxa) and (E. speciosa vs E. tennesseensis) could in theory be differentiated with the addition of the ITS marker. However, even with both markers, the number of diagnostic SNPs ranges from only 1 (E. speciosa vs E. tennesseensis) to 18 (E. purpurea vs E. laevigata) (Table S2) and bootstrap values for the tree constructed with trnH and psbA and ITS are extremely low (Fig. 4b).

Discussion

We successfully used direct sequencing of genomic DNA to recover complete chloroplast genomes from all nine reported Echinacea species and demonstrated that full chloroplast genomes can effectively differentiate all nine species. In addition to clarifying relationships among species, chloroplast genomes provide valuable data for improved DNA-based identification assays. This is especially true for closely related species, such as Echinacea that cannot be currently identified using most core DNA barcoding markers. Conclusive documentation of indels could identify regions for use with PCR based screening diagnostics[46]. For example if a region that distinguishes important species based on the size of DNA fragments can be identified and validated, this method could be used without sequencing, thus creating a rapid low cost approach to species identification. In the absence of suitable indels, other variable regions in closely related species can be targeted for either PCR, real-time PCR or other sequence based identification methods[40]. There are currently 916 chloroplast genomes of land plants available in GenBank, among them, 456 (49.8%) were sequenced since 2015. With the advancement of NGS technologies and bioinformatics tools, obtaining chloroplast genomes has become quick and relatively inexpensive. Some methods developed for metagenomics, like kSNP[47], Kraken[48] and Pathoscope[49], can be used to identify species using whole-genome sequencing data in conjunction with genome scale references. We are currently investigating these options, and they will be the focus of a future manuscript. The data generated for this Echinacea inquiry will become part of the U.S. Food and Drug Administration’s library of chloroplast genomes, the details of which will be discussed in a future publication. Future studies will explore the most useful and efficient way to identify Echinacea species using either whole chloroplast genomes or targeted assays developed from the full chloroplast genomes.

Methods

Sampling

We sampled all nine Echinacea species available from the U.S. National Herbarium. Voucher information can be found in Table 1 and Table 5.
Table 5

Sampling in this Echinacea study.

SpeciesVoucherYear collected
E. purpurea US 23490971958
E. sanguinea US 14680351930
E. tennessensis US 9804161916
E. pallida US 22330631948
E. paradoxa US 16530131935
E. atrorubens US 22351641955
E. laevigata US 33608601998
E. angustifolia US 28024331974
E. speciosa US 23490801960
Sampling in this Echinacea study.

DNA isolation, and sequencing

Total DNA was extracted from the dry leaves of specimens using the DNeasy Plant Mini Kit (part #69106, Qiagen, Valencia, CA,). For the library construction, 200 ng DNA was taken and sheared into ~550 bp contigs with the Covaris M220 Focused-ultrasonicator. The library was constructed using either the TruSeq DNA HT Sample Prep Kit (Illumina, FC-121-3003) or the TruSeq Nano DNA NeoPrep Kit (Illumina, NP-101-1001). Sequencing was run on the Illumina MiSeq Sequencer with MiSeq Reagent Kit v2 (MS-102-2001) or MiSeq Reagent Kit v3 (MS-102-3001) to obtain 2 × 250 or 2 × 300 reads, respectively.

Genome assembly and annotation

Before assembly, the reads were trimmed using the Qiagen CLC Genomics Workbench v.8.5.1 (hereafter called CLC) with default settings. Then the trimmed sequences were assembled into contigs using de novo assembly, implemented in CLC. In addition, a reference-guided assembly was performed using CLC with the published chloroplast genome of the closest available relative, Parthenium argentatum (NC_013553), as the reference genome. After finishing reference-guided assembly, a consensus sequence of Echinacea was obtained. Both the consensus sequence from the reference-guided assembly and the contigs from the de novo assembly were imported into Geneious Pro 9.0.4, and then those contigs of chloroplast were mapped onto the consensus sequence. The mapped contigs were checked and adjusted manually to align with the consensus sequence obtained using referenced-guided assembly[39]. The final sequence of Echinacea chloroplast genome is the ordered sequence of those mapped contigs. We annotated the chloroplast genome using Geneious with the chloroplast genome of Helianthus annuus (NC_007977) as the reference since the annotation of H. annuus is known to be accurate[42, 50]. All sequence data has been deposited in Genbank (Accession numbers KX548217- KX548225, Table 1).

Retrieving gene sequences of widely-used DNA barcoding markers

In order to test if core DNA barcode markers can be used for identification here, we obtained gene sequences of matK, rbcL, and ITS (internal transcribed spacer) for Echinacea species and for their closely-related species. In order to be effective, these needed to have variable bases in each of the nine species being investigated. Based on the alignment of P. argentatum with nine Echinacea chloroplast genomes, we extracted two core plastid DNA barcoding markers matK and rbcL. These markers used for DNA barcoding were delimitated by corresponding primers, rbcLa-F (ATGTCACCACAAACAGAGACTAAAGC)[51]/rbcLa-R (GTAAAATCAAGTCCACCRCG)[28] for rbcL, matK-xf (TAATTTACGATCAATTCATTC)[52]/matK-MALP (ACAAGAAAGTCGAAGTAT)[53] for matK. We also obtained the gene sequences of ITS, another commonly used marker, from each Echinacea species. To obtain the ITS sequence for each species, the contig containing the ITS was obtained. The contigs of each species obtained using de novo assembly mentioned above were built into a BLAST database on the local server, then the ITS sequence of Echinacea pallida (EU785938) was used as the seed to search against the database. Usually, the best-hit contig contains the sequence of ETS, 18S, ITS1, 5.8S, ITS2, and 26S. Then we delimitated the region of ITS using the corresponding primers, i.e., ITS1 (TCCGTAGGTGAACCTGCGG)[54]/ITS4 (TCCTCCGCTTATTGATATGC)[54]. Since the ITS sequence of P. argentatum is not available, H. annuus (JX867644) was used as the outgroup.

Phylogenetic analysis

Whole chloroplast genomes of nine Echinacea species and the one of Parthenium were aligned using MAFFT v7[55]. As the sequences of IRa and IRb are almost identical, only one of them was included in the phylogenetic analyses. In addition, the sequences of tRNAs and rDNAs of nine Echinacea species are almost identical, so those genes were removed for all samples from the alignment. In order to reduce phylogenetic noise, three inverted intergenic regions of Parthenium were deleted from the alignment. The program PartitionFinder[56] was used for identifying partitions used in developing model parameters for phylogeny estimation. A maximum likelihood (ML) tree was inferred with RAxML v8.1[57] using the model of GTRGAMMAI, and 1,000 rapid bootstrap replications were performed. The sequences of matK + rbcL and ITS were aligned with MAFFT v7, then the ML trees were reconstructed using RAxML with the GTRGAMMAI model, and 1,000 rapid bootstrap replications were performed. Since this study mainly focuses on species delimitation rather than phylogeny, these genes were not concatenated for further phylogenetic analyses. These alignments were deposited into the DRYAD with the accession number of XXXX.
  44 in total

1.  Biological identifications through DNA barcodes.

Authors:  Paul D N Hebert; Alina Cywinska; Shelley L Ball; Jeremy R deWaard
Journal:  Proc Biol Sci       Date:  2003-02-07       Impact factor: 5.349

2.  High-resolution phylogeny for Helianthus (Asteraceae) using the 18S-26S ribosomal DNA external transcribed spacer.

Authors:  Ruth E Timme; Beryl B Simpson; C Randal Linder
Journal:  Am J Bot       Date:  2007-11       Impact factor: 3.844

3.  Chloroplast DNA phylogeny, reticulate evolution, and biogeography of Paeonia (Paeoniaceae).

Authors:  T Sang; D Crawford; T Stuessy
Journal:  Am J Bot       Date:  1997-08       Impact factor: 3.844

4.  Family-level relationships of Onagraceae based on chloroplast rbcL and ndhF data.

Authors:  Rachel A Levin; Warren L Wagner; Peter C Hoch; Molly Nepokroeff; J Chris Pires; Elizabeth A Zimmer; Kenneth J Sytsma
Journal:  Am J Bot       Date:  2003-01       Impact factor: 3.844

Review 5.  Medicinal properties of Echinacea: a critical review.

Authors:  B Barrett
Journal:  Phytomedicine       Date:  2003-01       Impact factor: 5.340

6.  MAFFT multiple sequence alignment software version 7: improvements in performance and usability.

Authors:  Kazutaka Katoh; Daron M Standley
Journal:  Mol Biol Evol       Date:  2013-01-16       Impact factor: 16.240

7.  Ten species in one: DNA barcoding reveals cryptic species in the neotropical skipper butterfly Astraptes fulgerator.

Authors:  Paul D N Hebert; Erin H Penton; John M Burns; Daniel H Janzen; Winnie Hallwachs
Journal:  Proc Natl Acad Sci U S A       Date:  2004-10-01       Impact factor: 11.205

8.  Analysis of 81 genes from 64 plastid genomes resolves relationships in angiosperms and identifies genome-scale evolutionary patterns.

Authors:  Robert K Jansen; Zhengqiu Cai; Linda A Raubeson; Henry Daniell; Claude W Depamphilis; James Leebens-Mack; Kai F Müller; Mary Guisinger-Bellian; Rosemarie C Haberle; Anne K Hansen; Timothy W Chumley; Seung-Bum Lee; Rhiannon Peery; Joel R McNeal; Jennifer V Kuehl; Jeffrey L Boore
Journal:  Proc Natl Acad Sci U S A       Date:  2007-11-28       Impact factor: 11.205

9.  Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data.

Authors:  Matthew Kearse; Richard Moir; Amy Wilson; Steven Stones-Havas; Matthew Cheung; Shane Sturrock; Simon Buxton; Alex Cooper; Sidney Markowitz; Chris Duran; Tobias Thierer; Bruce Ashton; Peter Meintjes; Alexei Drummond
Journal:  Bioinformatics       Date:  2012-04-27       Impact factor: 6.937

10.  A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region.

Authors:  W John Kress; David L Erickson
Journal:  PLoS One       Date:  2007-06-06       Impact factor: 3.240

View more
  18 in total

1.  Using herbarium-derived DNAs to assemble a large-scale DNA barcode library for the vascular plants of Canada.

Authors:  Maria L Kuzmina; Thomas W A Braukmann; Aron J Fazekas; Sean W Graham; Stephanie L Dewaard; Anuar Rodrigues; Bruce A Bennett; Timothy A Dickinson; Jeffery M Saarela; Paul M Catling; Steven G Newmaster; Diana M Percy; Erin Fenneman; Aurélien Lauron-Moreau; Bruce Ford; Lynn Gillespie; Ragupathy Subramanyam; Jeannette Whitton; Linda Jennings; Deborah Metsger; Connor P Warne; Allison Brown; Elizabeth Sears; Jeremy R Dewaard; Evgeny V Zakharov; Paul D N Hebert
Journal:  Appl Plant Sci       Date:  2017-12-22       Impact factor: 1.936

2.  Sequencing and Analysis of Chrysanthemum carinatum Schousb and Kalimeris indica. The Complete Chloroplast Genomes Reveal Two Inversions and rbcL as Barcoding of the Vegetable.

Authors:  Xia Liu; Boyang Zhou; Hongyuan Yang; Yuan Li; Qian Yang; Yuzhuo Lu; Yu Gao
Journal:  Molecules       Date:  2018-06-05       Impact factor: 4.411

3.  Accurate authentication of Dendrobium officinale and its closely related species by comparative analysis of complete plastomes.

Authors:  Shuying Zhu; Zhitao Niu; Qingyun Xue; Hui Wang; Xuezhu Xie; Xiaoyu Ding
Journal:  Acta Pharm Sin B       Date:  2018-06-01       Impact factor: 11.413

4.  Prevalence of isomeric plastomes and effectiveness of plastome super-barcodes in yews (Taxus) worldwide.

Authors:  Chao-Nan Fu; Chung-Shien Wu; Lin-Jiang Ye; Zhi-Qiong Mo; Jie Liu; Yu-Wen Chang; De-Zhu Li; Shu-Miaw Chaw; Lian-Ming Gao
Journal:  Sci Rep       Date:  2019-02-26       Impact factor: 4.379

5.  Plastome Evolution in Dolomiaea (Asteraceae, Cardueae) Using Phylogenomic and Comparative Analyses.

Authors:  Jun Shen; Xu Zhang; Jacob B Landis; Huajie Zhang; Tao Deng; Hang Sun; Hengchang Wang
Journal:  Front Plant Sci       Date:  2020-04-15       Impact factor: 5.753

6.  Microfluidic Enrichment Barcoding (MEBarcoding): a new method for high throughput plant DNA barcoding.

Authors:  Morgan R Gostel; Jose D Zúñiga; W John Kress; Vicki A Funk; Caroline Puente-Lelievre
Journal:  Sci Rep       Date:  2020-05-26       Impact factor: 4.379

7.  Sequencing the Vine of the Soul: Full Chloroplast Genome Sequence of Banisteriopsis caapi.

Authors:  Padmini Ramachandran; Ning Zhang; William B McLaughlin; Yan Luo; Sara Handy; James Alan Duke; Rodolfo Vasquez; Andrea Ottesen
Journal:  Genome Announc       Date:  2018-06-21

Review 8.  Echinacea biotechnology: advances, commercialization and future considerations.

Authors:  Jessica L Parsons; Stewart I Cameron; Cory S Harris; Myron L Smith
Journal:  Pharm Biol       Date:  2018-12       Impact factor: 3.503

9.  Complete Chloroplast Genomes from Sanguisorba: Identity and Variation Among Four Species.

Authors:  Xiang-Xiao Meng; Yan-Fang Xian; Li Xiang; Dong Zhang; Yu-Hua Shi; Ming-Li Wu; Gang-Qiang Dong; Siu-Po Ip; Zhi-Xiu Lin; Lan Wu; Wei Sun
Journal:  Molecules       Date:  2018-08-24       Impact factor: 4.411

10.  Comparison of the complete plastomes and the phylogenetic analysis of Paulownia species.

Authors:  Pingping Li; Gongli Lou; Xiaoran Cai; Bin Zhang; Yueqin Cheng; Hongwei Wang
Journal:  Sci Rep       Date:  2020-02-10       Impact factor: 4.379

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.