
Comparative chloroplast genomes and phylogenetic analysis of Aquilegia.

Wei Zhang1, Huaying Wang1, Jianhua Dong1, Tengjiao Zhang1, Hongxing Xiao1.   

Abstract

PREMISE: Aquilegia is an ideal taxon for studying the evolution of adaptive radiation. Current phylogenies of Aquilegia based on different molecular markers are inconsistent, and therefore a clear and accurate phylogeny remains uncertain. Analyzing the chloroplast genome, with its simple structure and low recombination rate, may help solve this problem.
METHODS: Next-generation sequencing data were generated or downloaded for Aquilegia species, enabling their chloroplast genomes to be assembled. The assemblies were used to estimate the genome characteristics and infer the phylogeny of Aquilegia.
RESULTS: In this study, chloroplast genome sequences were assembled for Aquilegia species distributed across Asia, North America, and Europe. Three of the genes analyzed (petG, rpl36, and atpB) were shown to be under positive selection and may be related to adaptation. The phylogenetic tree of Aquilegia showed that its member species formed two clades with high support, North American and European species, with the Asian species being paraphyletic; A. parviflora and A. amurensis clustered with the North American species, while the remaining Asian species were found in the European clade. In addition, A. oxysepala var. kansuensis should be considered as a separate species rather than a variety.
DISCUSSION: The complete chloroplast genomes of these Aquilegia species provide new insights into the reconstruction of the phylogeny of related species and contribute to the further study of this genus.
© 2021 Zhang et al. Applications in Plant Sciences is published by Wiley Periodicals, Inc. on behalf of the Botanical Society of America.

Keywords:  adaptive evolution; chloroplast genomes; columbine; phylogeny

Year:  2021        PMID: 33854846      PMCID: PMC8027367          DOI: 10.1002/aps3.11412

Source DB:  PubMed          Journal:  Appl Plant Sci        ISSN: 2168-0450            Impact factor:   1.936


The genus Aquilegia L. (columbine), comprising approximately 70 perennial herb species, belongs to the family Ranunculaceae and is widely distributed in North America and Eurasia (Munz, 1946). Recently, several new species were reported, bringing the number of columbine taxa to about 110 species (Erst et al., 2017, 2020; Luo et al., 2018). Although the morphologies and habitats of columbine species differ, the phylogenetic resolution of this genus at the molecular level is very low, and therefore the genus is considered to be a widespread population complex. The morphological differences of the floral spurs between species of Aquilegia are easily observed and attract different pollinators, which has led to the rapid divergence of the columbines to form a large number of species (Hodges and Derieg, 2009). Moreover, natural hybrids among columbine species have also been frequently reported (Taylor, 1967). As a result, Aquilegia species have become a model for evolution studies; however, the phylogenetic trees presented in previous studies contain multifurcations, which may be caused by a lack of informative sites (Hodges and Arnold, 1994; Bastida et al., 2010; Fior et al., 2013), complicating subsequent research on the speciation of this genus. It is therefore very important to construct a relatively clear phylogenetic relationship of these species for future evolutionary studies. Genomic sequencing could compensate for the lack of informative sites in shorter sequences. Notably, the decline in sequencing costs in recent years has made this approach possible for all parts of the plant genome (nuclear, mitochondrial, and chloroplast). Because of the easy interspecific hybridization among Aquilegia species, the nuclear genome structure is complex, with a high recombination rate (Filiault et al., 2018). 
The mitochondrial genomes of the angiosperms are relatively complex; the order of genes differs among species, and only some regions of the genome are conserved (Kubo et al., 2000). In contrast, the monophyletic inheritance of the chloroplast genome sequence is more suitable for the phylogenetic analysis of Aquilegia due to its low recombination rate and high level of conservation (Dong et al., 2012; Curci et al., 2015; Downie and Jansen, 2015; Nadachowska‐Brzyska et al., 2015). Fior et al. (2013) selected 21 chloroplast genes with rapid evolutionary rates to establish the phylogenetic relationships among Aquilegia species. Although this phylogeny still had low resolution and support for some branches (Fior et al., 2013), its resolution and support were improved relative to previously constructed trees based on fewer chloroplast sequences (Hodges and Arnold, 1994; Bastida et al., 2010). Hence, the complete chloroplast genome sequence is an ideal molecular marker for inferring the phylogenetic relationships of the Aquilegia genus. The chloroplast genome is a closed‐loop structure approximately 115–210 kbp in size, and generally consists of four parts: two inverted repeat regions (IRA and IRB), a large single‐copy region (LSC), and a small single‐copy region (SSC) (Yurina and Odintsova, 1998; Park et al., 2018). Some plant groups have special chloroplast genome structures, such as species of the genus Erodium L’Hér., which lack the IR regions (Guisinger et al., 2010). Because of its stable genomic structure, identical gene content, and conserved sequence (Dong et al., 2012), the chloroplast genome is used as a molecular marker for the inference of phylogenetic relationships (Li et al., 2018; Liu et al., 2018; Lu et al., 2018; Mader et al., 2018; Xie et al., 2018) and adaptive evolution (Dong et al., 2018; Fan et al., 2018). 
In this study, we assembled and analyzed the chloroplast genomes of 14 columbine species from Asia, Europe, and North America, and constructed a phylogenetic tree of the genus to shed light on radiative speciation in Aquilegia and lay a foundation for inferring the evolutionary history of the columbines.

METHODS

Plant materials

Seeds of A. amurensis Kom., A. ecalcarata Maxim., A. oxysepala Trautv. & C. A. Mey. var. kansuensis Brühl, A. parviflora Ledeb., A. rockii Munz, A. viridiflora Pall., and A. yabeana Kitag. were collected from China (Appendix 1), and all voucher specimens were deposited in the Northeast Normal University Herbarium in Changchun, China (accession numbers NENU_Aq1001–NENU_Aq1007). Seeds were grown in the greenhouse of Northeast Normal University with 12 h of light at 25°C and 12 h of dark at 20°C.

DNA extraction and sequencing

Total genomic DNA was extracted from fresh leaves using a modified cetyltrimethylammonium bromide (CTAB) method (Doyle and Doyle, 1987). Genomic libraries were constructed and sequenced by Biomarker Technologies (Beijing, China) on the Illumina Xten platform, yielding 2 × 150‐bp paired‐end reads. In addition, raw reads of A. aurea Janka, A. chrysantha A. Gray, A. formosa Fisch. ex DC., A. japonica Nakai & Hara, A. oxysepala var. oxysepala, A. sibirica Schur ex Nyman, and A. vulgaris L. previously published by Filiault et al. (2018) were downloaded from the National Center for Biotechnology Information (NCBI) Sequence Read Archive database (http://www.ncbi.nlm.nih.gov/sra [accessed December 2018]) to assemble their chloroplast genomes (Appendix 2).

Chloroplast genome assembly and annotation

To obtain high‐quality genome sequences, all reads were filtered by removing those that contained adapters, more than 10% N bases, or more than 50% low‐quality bases (quality value <10). We then used the chloroplast_assembly_protocol pipeline to assemble the chloroplast genome (Sancho et al., 2018). Briefly, DUK (http://duk.sourceforge.net) was used to extract the chloroplast reads, which were filtered using FASTQC version 0.10.1 (Andrew, 2010) and Trimmomatic version 0.32 (Bolger et al., 2014). Next, the pass‐filtered reads were de novo assembled using Velvet version 1.2.07 (Zerbino, 2010), SSPACE Basic version 2.0 (Boetzer et al., 2011), and GapFiller version 1.11 (Boetzer and Pirovano, 2012; Nadalin et al., 2012), with annotation performed using the online program DOGMA (Wyman et al., 2004). Finally, the circular genome map of Aquilegia was illustrated using the Organellar Genome DRAW tool (Lohse et al., 2013) after manually checking the annotation results.
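The three read-level filters described above can be sketched in Python. This is only an illustration of the stated thresholds, not the pipeline used in the study: the function name keep_read, its parameters, and the has_adapter flag (assumed to be set by an upstream adapter-detection step) are hypothetical.

```python
def keep_read(seq, quals, has_adapter=False,
              max_n_frac=0.10, max_lowq_frac=0.50, min_qual=10):
    """Return True if a read passes the three filters described in the text.

    seq         -- read sequence as a string
    quals       -- per-base Phred quality scores (integers), same length as seq
    has_adapter -- hypothetical flag supplied by an upstream adapter search
    """
    if has_adapter or len(seq) == 0:
        return False
    n = len(seq)
    # Fraction of uncalled bases (N) in the read.
    n_frac = sum(base in "Nn" for base in seq) / n
    # Fraction of bases whose quality value is below the threshold (<10).
    lowq_frac = sum(q < min_qual for q in quals) / n
    # A read is removed only if it has MORE than 10% N or MORE than 50% low-quality bases.
    return n_frac <= max_n_frac and lowq_frac <= max_lowq_frac


if __name__ == "__main__":
    print(keep_read("ACGT" * 25, [30] * 100))             # clean read
    print(keep_read("N" * 11 + "A" * 89, [30] * 100))     # 11% N
    print(keep_read("A" * 100, [5] * 51 + [30] * 49))     # 51% low quality
```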

Repeat sequence characterization

The Perl script MISA (Thiel et al., 2003) was employed to identify the location of simple sequence repeat (SSR) loci in the complete chloroplast genome sequences. The thresholds used to detect the SSRs were 10, 5, 4, 3, 3, and 3 for mono‐, di‐, tri‐, tetra‐, penta‐, and hexanucleotides, respectively. The recognition results were checked manually, and the redundant results were removed. REPuter (Kurtz et al., 2001) was then used to identify repeat sequences in the chloroplast, including palindromic, forward, reverse, and complementary sequences. The parameters were set as follows: (1) Hamming distance of 3, (2) 90% or greater sequence identity, and (3) a minimum repeat size of 30 bp. The default settings were used for all other parameters.
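The SSR thresholds listed above (10, 5, 4, 3, 3, and 3 repeats for mono‐ to hexanucleotide motifs) can be illustrated with a small regular-expression scan. This is a simplified sketch and not a reimplementation of MISA: the names MIN_REPEATS, is_primitive, and find_ssrs are hypothetical, and edge cases such as adjacent or overlapping repeats are not handled the way MISA handles them.

```python
import re

# Minimum number of repeat units per motif length, as given in the text.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit (e.g. 'AA', 'ATAT')."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_ssrs(seq):
    """Return sorted (1-based start, motif, repeat count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(2)
            if is_primitive(motif):  # skip e.g. 'AA' counted as a dinucleotide
                hits.append((m.start() + 1, motif, len(m.group(1)) // unit_len))
    return sorted(hits)


if __name__ == "__main__":
    demo = "GGC" + "A" * 10 + "CCG" + "AT" * 5 + "GGC"
    print(find_ssrs(demo))
```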

Genetic divergence and phylogenetic analysis of Aquilegia

The homologous genes were extracted from 14 Aquilegia species using a Python script (available on GitHub, see Data Availability Statement), after which these homologous genes were aligned using MAFFT version 7.407 (Katoh and Standley, 2013) with the default settings. Furthermore, the nucleotide diversity (π) of these homologous genes was analyzed using DnaSP version 6.0 (Rozas et al., 2017). To avoid the effect of sequence redundancy when building the phylogenetic trees, we selected the LSC regions, IRB regions, and SSC regions as arrays. In addition, the published chloroplast genome sequences of A. rockii (MK573514.1, NC_033341.1), A. ecalcarata (NC_041528.1, MK569474.1), and A. coerulea (NC_041527.1, MK569492.1) in GenBank were used. Semiaquilegia adoxoides Makino (MH142265.2) was considered as the outgroup (Fior et al., 2013; Zhai et al., 2019). The array was aligned using MAFFT version 7.407 and was adjusted manually in CLC Sequence Viewer 8.0 (QIAGEN Digital Insights, Redwood City, California, USA). The maximum likelihood tree was generated using IQ‐TREE version 1.6.12 using 1000 bootstrap replicates (Nguyen et al., 2015). Meanwhile, the Bayesian inference trees were produced using MrBayes version 3.2 (Ronquist et al., 2012), based on Markov chain Monte Carlo analyses run for 1,000,000 generations. These trees were sampled every 1000 generations with the first 250 trees discarded in the burn‐in period. The program was stopped when the standard deviation was less than 0.01. The final tree was visualized in iTOL (https://itol.embl.de/itol.cgi) (Letunic and Bork, 2006).
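Nucleotide diversity (π) is the average number of pairwise differences per site among aligned sequences. The sketch below shows that calculation on a toy alignment; it is not DnaSP, and the choice to skip every column containing a gap or ambiguous base is an assumption made here for simplicity. The function name nucleotide_diversity is hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(alignment):
    """Average pairwise differences per site over columns with only A/C/G/T."""
    seqs = [s.upper() for s in alignment]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    # Keep only columns where every sequence has an unambiguous base.
    cols = [i for i in range(length) if all(s[i] in "ACGT" for s in seqs)]
    if len(seqs) < 2 or not cols:
        return 0.0
    pairs = list(combinations(seqs, 2))
    # Total number of mismatches summed over all sequence pairs.
    diffs = sum(sum(a[i] != b[i] for i in cols) for a, b in pairs)
    return diffs / (len(pairs) * len(cols))


if __name__ == "__main__":
    toy = ["ACGT", "ACGA", "ACGT"]  # toy data, not from the study
    print(nucleotide_diversity(toy))
```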

Natural selection analysis

To identify genes under selection in Aquilegia, the genes of the chloroplast genomes were analyzed with the PAML package (Yang, 2007). First, all coding sequences (CDS) of the Aquilegia species and other Ranunculaceae species were extracted from the genome sequences using a Python script (Appendix 3). Each single‐copy sequence was aligned according to its codons using MEGA X (Kumar et al., 2018) and checked manually, and then used as input for CodeML in the PAML package. Moreover, the concatenated alignment was also used to construct phylogenetic relationships among species using IQ‐TREE version 1.6.12 (Nguyen et al., 2015). Finally, each CDS alignment was used to calculate the nonsynonymous (dN) and synonymous (dS) substitution rates, along with their ratio (ω = dN/dS). ω > 1 indicates positive selection, ω = 1 indicates neutral evolution, and ω < 1 indicates negative selection (Yang and Nielsen, 2002). The branch‐site model (X. Yang et al., 1998; Z. Yang et al., 1998) was run using CodeML in the PAML package, and both the naive empirical Bayes (NEB) and Bayesian empirical Bayes (BEB) methods were used to identify potential positively selected genes. The null hypothesis allows a ω for each clade (model = 2, NSsites = 2, fix_omega = 1, and omega = 1), while the alternative hypothesis allows a ω for Aquilegia and another ω for other clades (model = 2, NSsites = 2, fix_omega = 0, and omega = 2). A chi‐square test was completed with chi2 in the PAML package. A P value > 0.05 indicates that the null hypothesis cannot be rejected; otherwise, the alternative hypothesis is favored and the gene is considered to be positively selected.
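The likelihood ratio test behind this comparison can be sketched as follows. The study used the chi2 program in PAML; the closed-form tail probability below is a stand-in that is valid only for 1 degree of freedom, and the function names are hypothetical. As an observation rather than a documented fact, the P value listed for psbM in Table 2 appears consistent with halving the chi-square tail probability (a 50:50 mixture of a point mass at zero and a chi-square with 1 df).

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Twice the log-likelihood difference, floored at zero."""
    return max(0.0, 2.0 * (lnl_alt - lnl_null))


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


if __name__ == "__main__":
    # lnL0 and lnL1 for psbM as listed in Table 2.
    stat = lrt_statistic(-293.77830, -292.80279)
    p_full = chi2_sf_df1(stat)
    p_half = 0.5 * p_full  # assumption: mixture-distribution convention
    print(round(stat, 5), round(p_full, 5), round(p_half, 5))
```

With df = 1, a statistic of about 3.84 corresponds to P = 0.05 under the unhalved test, so a gene would need lnL1 to exceed lnL0 by roughly 1.92 log-likelihood units to reach that threshold.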

RESULTS

Features of Aquilegia chloroplast genomes

The complete chloroplast genomes of the Aquilegia species from Asia, North America, and Europe displayed a typical quadripartite structure similar to the majority of land plant chloroplast genomes (Fig. 1). The sizes of the complete chloroplast genomes ranged from 157,689 to 161,387 bp. All complete chloroplast genomes were composed of four sections, including an LSC region (86,761–88,076 bp), an SSC region (17,466–18,879 bp), and two IR regions (25,612–28,015 bp). The GC content of the 14 species was very similar in both the whole chloroplast genome (38.94%–39.08%) and the corresponding regions (LSC [37.43%–37.71%], SSC [33.30%–33.91%], and IR [43.04%–43.41%]), with the IR regions having the highest GC contents (Table 1). These sequence data are available in GenBank (accession numbers MT919110–MT919116 and MN809218–MN809224).
FIGURE 1

Gene maps of the Aquilegia viridiflora chloroplast genome. Genes inside the circle are transcribed clockwise, while genes outside are transcribed counterclockwise (as indicated by arrows). Different colors indicate different functional groups. The dark gray shading within the inner circle corresponds to the GC content and the light gray shading corresponds to the AT content. IRA and IRB, inverted repeat regions; LSC, large single‐copy region; ORF, open reading frame; SSC, small single‐copy region.

TABLE 1

Summary of the complete Aquilegia chloroplast genomes sequenced in this study.

| Species | LSC length (bp) | LSC GC (%) | LSC length (% of genome) | SSC length (bp) | SSC GC (%) | SSC length (% of genome) | IR length (bp) | IR GC (%) | IR length (% of genome) | Total length (bp) | Total GC (%) | NCBI no. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A. aurea^a | 87,724 | 37.54 | 54.80 | 18,879 | 33.30 | 11.79 | 26,735 | 43.37 | 16.70 | 160,073 | 39.00 | MT919114 |
| A. vulgaris^a | 88,137 | 37.43 | 54.98 | 18,761 | 33.64 | 11.70 | 26,711 | 43.33 | 16.66 | 160,320 | 38.96 | MT919112 |
| A. japonica^a | 87,986 | 37.52 | 55.13 | 18,169 | 33.47 | 11.38 | 26,723 | 43.37 | 16.74 | 159,601 | 39.00 | MT919110 |
| A. oxysepala var. oxysepala^a | 87,651 | 37.43 | 55.08 | 18,474 | 33.91 | 11.61 | 26,503 | 43.31 | 16.65 | 159,131 | 38.96 | MT919111 |
| A. sibirica^a | 88,053 | 37.44 | 54.56 | 17,466 | 33.36 | 10.82 | 27,934 | 43.04 | 17.31 | 161,387 | 38.94 | MT919115 |
| A. oxysepala var. kansuensis | 87,655 | 37.65 | 55.03 | 18,638 | 33.64 | 11.70 | 26,498 | 43.25 | 16.64 | 159,289 | 39.05 | MN809219 |
| A. yabeana | 88,030 | 37.60 | 54.86 | 18,744 | 33.59 | 11.68 | 26,845 | 43.29 | 16.73 | 160,464 | 39.04 | MN809218 |
| A. ecalcarata | 87,662 | 37.63 | 54.77 | 18,747 | 33.50 | 11.71 | 26,824 | 43.36 | 16.76 | 160,057 | 39.07 | MN809221 |
| A. rockii | 87,375 | 37.64 | 55.09 | 18,339 | 33.51 | 11.56 | 26,445 | 43.23 | 16.67 | 158,604 | 39.04 | MN809222 |
| A. viridiflora | 88,076 | 37.61 | 54.97 | 18,662 | 33.75 | 11.65 | 26,744 | 43.20 | 16.69 | 160,226 | 39.01 | MN809220 |
| A. amurensis | 87,865 | 37.71 | 55.72 | 18,600 | 33.62 | 11.80 | 25,612 | 43.41 | 16.24 | 157,689 | 39.08 | MN809224 |
| A. parviflora | 87,969 | 37.70 | 55.61 | 18,612 | 33.59 | 11.77 | 25,799 | 43.39 | 16.31 | 158,179 | 39.08 | MN809223 |
| A. chrysantha^a | 87,371 | 37.52 | 54.72 | 18,724 | 33.50 | 11.73 | 26,786 | 43.34 | 16.78 | 159,667 | 38.96 | MT919113 |
| A. formosa^a | 87,588 | 37.60 | 54.37 | 17,482 | 33.38 | 10.85 | 28,015 | 43.16 | 17.39 | 161,100 | 39.04 | MT919116 |

IRs = inverted repeat regions; LSC = large single‐copy region; NCBI = National Center for Biotechnology Information; SSC = small single‐copy region.

^a Raw data were downloaded from NCBI.
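Because the genome is quadripartite, the total length in Table 1 should equal LSC + SSC + 2 × IR, and each percentage column should follow from the corresponding length. The short sketch below checks this arithmetic for two rows of Table 1; the function names are hypothetical, and only the two rows shown were checked here.

```python
def total_length(lsc, ssc, ir):
    """Total genome length for a quadripartite genome (the IR is present twice)."""
    return lsc + ssc + 2 * ir


def percent_of_genome(part, total):
    """Share of the genome occupied by one region, rounded to two decimals."""
    return round(100.0 * part / total, 2)


if __name__ == "__main__":
    # Lengths (bp) taken from Table 1: (LSC, SSC, one IR copy, reported total).
    rows = {
        "A. aurea": (87724, 18879, 26735, 160073),
        "A. amurensis": (87865, 18600, 25612, 157689),
    }
    for species, (lsc, ssc, ir, reported) in rows.items():
        computed = total_length(lsc, ssc, ir)
        print(species, computed, computed == reported,
              percent_of_genome(lsc, reported), percent_of_genome(ir, reported))
```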

The chloroplast genomes of the Aquilegia species contained 154 genes (98 protein‐coding genes, 48 transfer RNA [tRNA] genes, and eight ribosomal RNA genes). Most of the genes located in the LSC and SSC regions were single copy, while 26 of the genes located in the IR regions were duplicated, including 11 protein‐coding genes (rps7, rps12, rps19, rpl2, rpl23, orf42, orf56, ycf2, ycf15, ycf68, and ndhB), 11 tRNA genes (trnI‐CAU [×3], trnL‐CAA, trnG‐UCC, trnV‐GAC, trnI‐GAU, trnA‐UGC [×2], trnR‐ACG, and trnN‐GUU), and four rRNA genes (rrn4.5, rrn5, rrn16, and rrn23). The LSC region comprises 63 protein‐coding genes and 25 tRNA genes, and the SSC region comprises 13 protein‐coding genes and a single tRNA gene. Among all the genes, seven protein‐coding genes (rpoC1, atpF, rpl2, ycf68, ndhB, ndhF, and ndhA) contained only one intron, while one protein‐coding gene (ycf3) contained two introns (Appendix S1).

Repeat analysis

We identified a range of 84–89 repeat sequences in the 14 Aquilegia chloroplast genomes, including 45–51 palindromic repeats and 33–44 forward repeats; reverse and complement repeats were not identified (Fig. 2A). In all species, the palindromic repeats were 56–398 bp in length and the forward repeats were 56–357 bp in length (Fig. 2B, C). The SSR analysis of the Aquilegia chloroplast genome identified a range of 69–84 microsatellites of six types; A. chrysantha and A. viridiflora had the lowest and highest numbers of microsatellites, respectively (Fig. 3A). Among all SSRs, the most abundant type was mononucleotide repeats, which accounted for 66.51% of the total SSRs, followed by dinucleotide (13.32%), tetranucleotide (7.22%), trinucleotide (5.91%), pentanucleotide (4.32%), and hexanucleotide (2.72%) repeats. AT repeats accounted for a larger proportion of mononucleotide repeats (92.95%) than GC repeats (7.05%). Similarly, the AT content (90.15%) accounted for a larger proportion than the GC content (9.85%) in dinucleotides (Fig. 3B, Appendix S2). Not surprisingly, all SSRs were detected in noncoding regions of the Aquilegia chloroplast genome.
FIGURE 2

Analysis of repeat sequences in the Aquilegia chloroplast genomes, performed using REPuter. (A) Number of different repeat sequences detected in Aquilegia species. Blue and green represent palindrome repeat sequences and forward repeat sequences, respectively. (B) Length of the palindrome repeat sequences in Aquilegia species. (C) Length of the forward repeat sequence in Aquilegia species. In (B) and (C), green, orange, and purple represent European species, Asian species, and North American species, respectively.

FIGURE 3

Analysis of simple sequence repeats (SSRs) in Aquilegia chloroplast genomes, performed using MISA (Thiel et al., 2003). (A) Number of various SSR types (mono‐, di‐, tri‐, tetra‐, penta‐, and hexanucleotides) detected in Aquilegia species. (B) Type and frequency of each SSR detected in the Aquilegia species analyzed.


Sequence divergence and phylogeny of Aquilegia

The π value was used to evaluate sequence divergence in Aquilegia chloroplast genomes. In genic regions, the range of variation in π was 0–0.00511, with a mean of 0.00061; π of the LSC region (0–0.00511, with a mean of 0.00055) was higher than in other regions (0–0.00453 in the IR regions, with a mean of 0.00041; 0–0.00252 in the SSC region, with a mean of 0.0013). Overall, these results demonstrated that the sequence divergence in Aquilegia chloroplast genomes was small, but some regions showed high genetic diversity, such as rpoC2, trnS‐GGA, and trnL‐CAA (π > 0.004) (Fig. 4, Appendix S3).
FIGURE 4

The nucleotide diversity of all chloroplast genes in Aquilegia. Red circles represent highly polymorphic genes.

To reveal the phylogeny of Aquilegia, aligned chloroplast genome sequences were used to construct phylogenetic trees using both maximum likelihood and Bayesian analyses. The two resulting trees showed identical topologies, and the bootstrap values and posterior probabilities were very high for each lineage. The Aquilegia species were divided into two clades: one clade contained A. aurea and A. vulgaris from Europe and A. sibirica, A. oxysepala var. oxysepala, A. japonica, A. ecalcarata, A. rockii, A. viridiflora, A. yabeana, and A. oxysepala var. kansuensis from Asia; the other clade contained A. formosa, A. chrysantha, and A. coerulea from North America and A. amurensis and A. parviflora from Asia. All the topologies supported A. japonica and A. oxysepala var. oxysepala as sister clades, and A. sibirica shared a common ancestor with them. Interestingly, the A. ecalcarata sequence assembled by us clustered with A. rockii, while the A. ecalcarata sequence downloaded from GenBank was grouped with A. yabeana and A. oxysepala var. kansuensis. In addition, A. viridiflora formed a single clade with A. ecalcarata and A. rockii. Although A. oxysepala var. oxysepala and A. oxysepala var. kansuensis are considered varieties of the same species, they were found in two different clades. Similarly, A. japonica and A. amurensis, which are treated as a single species by the Flora of China (Li, 2007), were also found in two different clades (Fig. 5).
FIGURE 5

Phylogenetic relationships of Aquilegia. (A) Phylogeny of all chloroplast genome sequences built using Bayesian inference, with posterior probabilities (%) indicated above the branches. (B) Phylogeny of all chloroplast genome sequences using maximum likelihood, with bootstrap values indicated above the branches. Green, orange, and purple represent European species, Asian species, and North American species, respectively. Semiaquilegia adoxoides is included as the outgroup.


Positive selection analysis

Positive selection tests were performed on 54 CDS from Aquilegia and their related species using the PAML package. No significant selection was found to act on the chloroplast genes of Aquilegia (P > 0.05), but three genes with a higher posterior probability were detected using the BEB and NEB methods (atpB, petG, and rpl36). Therefore, atpB, petG, and rpl36 were considered to be genes potentially under positive selection (Table 2).
TABLE 2

Analysis of the positive selection of all genes in the Aquilegia chloroplast genome based on the branch‐site model.

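| Gene name | lnL0 | lnL1 | df | P | BEB | NEB |
|---|---|---|---|---|---|---|
| psbM | −293.77830 | −292.80279 | 1 | 0.08124 | NA | NA |
| psbL | −281.27366 | −281.27366 | 1 | 0.5 | NA | NA |
| ccsA | −6608.03557 | −6608.03529 | 1 | 0.5 | NA | NA |
| psaC | −927.93380 | −927.93380 | 1 | 0.14717 | NA | NA |
| psaB | −7228.74873 | −7228.74875 | 1 | 0.49748 | NA | NA |
| rpl33 | −957.68785 | −957.68785 | 1 | 0.5 | NA | NA |
| psbF | −281.15093 | −281.15093 | 1 | 0.5 | NA | NA |
| psaI | −404.70806 | −404.70806 | 1 | 0.5 | NA | NA |
| atpI | −2841.63957 | −2841.63959 | 1 | 0.49944 | NA | NA |
| atpH | −756.68830 | −756.68830 | 1 | 0.5 | NA | NA |
| rps19 | −1282.74781 | −1282.74782 | 1 | 0.49862 | NA | NA |
| rps18 | −331.72349 | −331.72349 | 1 | 0.5 | NA | NA |
| ndhK | −2320.17644 | −2320.17644 | 1 | 0.49831 | NA | NA |
| ndhJ | −1920.71103 | −1920.71103 | 1 | 0.5 | NA | NA |
| ndhA | −5351.81643 | −5351.81643 | 1 | 0.49944 | NA | NA |
| atpB^a | −6085.06153 | −6085.06153 | 1 | 0.5 | 24 A 0.830 | NA |
| ycf4 | −2596.58913 | −2596.58915 | 1 | 0.49411 | NA | NA |
| rpoA | −5881.89224 | −5881.89227 | 1 | 0.49691 | NA | NA |
| rps14 | −1426.69915 | −1426.72524 | 1 | 0.49495 | NA | NA |
| ndhG | −2793.36061 | −2793.36063 | 1 | 0.49813 | NA | NA |
| atpE | −1748.33822 | −1748.33822 | 1 | 0.09889 | NA | NA |
| psbT | −407.74460 | −407.74460 | 1 | 0.5 | NA | NA |
| petN | −172.09765 | −172.09765 | 1 | 0.5 | NA | NA |
| ycf3 | −1481.19480 | −1481.19480 | 1 | 0.5 | NA | NA |
| psbJ | −349.00121 | −349.00121 | 1 | 0.5 | NA | NA |
| psbK | −764.88861 | −764.88862 | 1 | 0.4992 | NA | NA |
| ndhB | −3100.09672 | −3100.09649 | 1 | 0.49741 | NA | NA |
| ndhC | −1479.48370 | −1479.48370 | 1 | 0.49729 | NA | NA |
| atpA | −6298.17867 | −6298.17869 | 1 | 0.49767 | NA | NA |
| ndhH | −5676.37359 | −5676.37359 | 1 | 0.49171 | NA | NA |
| ndhI | −2058.27789 | −2058.27791 | 1 | 0.49831 | NA | NA |
| psbZ | −572.70301 | −572.70301 | 1 | 0.49887 | NA | NA |
| rps2 | −2861.59762 | −2861.59763 | 1 | 0.49822 | NA | NA |
| petA | −4229.69609 | −4229.69598 | 1 | 0.5 | NA | NA |
| psbD | −3394.40957 | −3394.40956 | 1 | 0.49831 | NA | NA |
| psbE | −724.90892 | −724.90892 | 1 | 0.49531 | NA | NA |
| rpoC2 | −21681.02490 | −21681.02489 | 1 | 0.49874 | NA | NA |
| psaJ | −520.11363 | −520.11362 | 1 | 0.5 | NA | NA |
| psbN | −365.09625 | −365.09625 | 1 | 0.5 | NA | NA |
| psaA | −6058.99244 | −6058.99242 | 1 | 0.49686 | NA | NA |
| rpl36^a | −480.99354 | −480.99353 | 1 | 0.17307 | NA | 0.996^b |
| psbC | −4629.80228 | −4629.80228 | 1 | 0.5 | NA | NA |
| psbB | −5837.08819 | −5837.08820 | 1 | 0.49652 | NA | NA |
| psbI | −326.55011 | −326.55011 | 1 | 0.5 | NA | NA |
| psbH | −1141.77124 | −1142.23505 | 1 | 0.49944 | NA | NA |
| rbcL | −5381.09986 | −5381.09986 | 1 | 0.5 | NA | NA |
| matK | −8587.46216 | −8587.46219 | 1 | 0.5 | NA | NA |
| ndhE | −1430.75023 | −1430.75023 | 1 | 0.5 | NA | NA |
| rpl20 | −2045.17646 | −2044.24507 | 1 | 0.49874 | NA | NA |
| atpF | −2329.18412 | −2329.18414 | 1 | 0.5 | NA | NA |
| petL | −364.79445 | −364.79445 | 1 | 0.5 | NA | NA |
| cemA | −3613.71602 | −3613.71602 | 1 | 0.5 | NA | NA |
| petG^a | −345.98700 | −345.98700 | 1 | 0.49851 | NA | 0.997^b |
| rpoB | −13852.11743 | −13852.11743 | 1 | 0.49831 | NA | NA |

A = alanine (amino acid); BEB = Bayesian empirical Bayes; NA = not available; NEB = naive empirical Bayes.

^a Genes under positive selection.

^b P > 99%.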


DISCUSSION

The structure of Aquilegia chloroplast genomes

In this study, we assembled and annotated the complete chloroplast genomes of 14 Aquilegia species, including 10 species from Asia, two from Europe, and two from North America. Based on these chloroplast genome sequences, we calculated polymorphism and inferred the phylogenetic relationships within Aquilegia. The structure and gene order of chloroplast genomes are highly conserved in the angiosperms (Choi et al., 2016). In our study, the chloroplast genomes of 14 Aquilegia species showed a typical quadripartite structure (Fig. 1), and the gene composition and gene order were similar in each species. The expansion or contraction of IR regions plays an important role in the length of the chloroplast genome (Raubeson et al., 2007; Wang et al., 2008; Yang et al., 2010). In the Aquilegia chloroplast genomes, the total length of the complete sequence was directly proportional to the length of the IR region (Table 1). Insertion/deletion polymorphisms (indels) in these sequences resulted in variations in the length of the Aquilegia chloroplast genome, which is a common phenomenon found in Camellia L. (Huang et al., 2014), Quercus L. (Yin et al., 2018), Amaranthus L. (Chaney et al., 2016), and the other angiosperms (Jiang et al., 2017). Compared with the other two regions, the GC content was the highest in the IR regions in Aquilegia. This effect may be caused by the presence of more rDNA in the IR regions, which has a higher GC content (approximately 50%) (Xie et al., 2018). Both long repetitive sequences and SSRs with high copy‐number diversity are valuable and useful molecular markers in studies of plant population genetics, phylogenetic reconstruction, and plant evolution at the intraspecific level (Wu et al., 2015; Ivanova et al., 2017). Here, long repeat sequences and SSRs of different lengths were found in each species (Figs. 2, 3), indicating that they can both be used as molecular markers for research on Aquilegia. 
Among these regions, the SSC region had the highest nucleotide polymorphism level, followed by the LSC region; the IR regions had the lowest nucleotide polymorphism level, indicating that the IR regions were most conserved. This result is likely due to the high conservation of the rDNA in the IR regions (Hershkovitz and Zimmer, 1996). The nucleotide polymorphisms of chloroplast genes in Aquilegia were smaller than those of other genera, such as Populus L. (Gao et al., 2019), Camellia (Li et al., 2019a), and Anguinum Fourr. (Jin et al., 2019); however, some variable genes were identified, including rpoC2, trnS‐GGA, and trnL‐CAA (Fig. 4). These regions with high levels of polymorphism are also a good resource for studying the phylogeny and population genetics of Aquilegia, especially rpoC2, which has the highest levels of polymorphism (Walker et al., 2019).
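The relationship between IR length and total genome length noted in this section can be examined directly from the values in Table 1. The sketch below computes a Pearson correlation coefficient for the 14 genomes; the function name pearson_r is hypothetical, and the calculation is an illustration rather than an analysis reported by the authors.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# IR and total lengths (bp) in the row order of Table 1.
IR_LENGTHS = [26735, 26711, 26723, 26503, 27934, 26498, 26845,
              26824, 26445, 26744, 25612, 25799, 26786, 28015]
TOTAL_LENGTHS = [160073, 160320, 159601, 159131, 161387, 159289, 160464,
                 160057, 158604, 160226, 157689, 158179, 159667, 161100]

if __name__ == "__main__":
    print(round(pearson_r(IR_LENGTHS, TOTAL_LENGTHS), 3))
```

A coefficient close to 1 would be in line with the statement that genome size tracks IR expansion and contraction, although a correlation alone does not establish strict proportionality.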

The phylogeny of Aquilegia based on chloroplast genomes

Biogeographic and phylogenetic analyses have indicated that Aquilegia had a common ancestor from eastern Asia, and later adaptive radiations took place independently in North America and Western Europe (Bastida et al., 2010; Fior et al., 2013). Aquilegia amurensis is restricted to the northern Greater Khingan Mountains, while A. parviflora is distributed in the northern Greater Khingan Mountains and Siberia. Despite this, we found these species were phylogenetically close to Aquilegia species from North America, whereas the remaining Asian species were phylogenetically close to Aquilegia species from Europe. The phylogeny based on the chloroplast genome was not completely consistent with that of the study by Fior et al. (2013). In our study, A. oxysepala var. oxysepala, A. japonica, and A. sibirica fell within a single clade; however, Filiault et al. (2018) had concluded that A. oxysepala var. oxysepala was located at the base of the phylogenetic tree, and A. japonica and A. sibirica shared a most recent common ancestor (MRCA). Li et al. (2014) used a combination of morphological characteristics, habitat type, and nuclear and chloroplast phylogenies (Bastida et al., 2010; Fior et al., 2013; Li et al., 2014) of these three species to propose that A. sibirica diverged first from the MRCA, and A. oxysepala var. oxysepala and A. japonica then differentiated into new species (Li et al., 2019b) containing more individuals. Our results also support the research of Li et al. (2019b). In addition, the position of A. viridiflora in this study was inconsistent with the phylogeny based on chloroplast genes by Fior et al. (2013) and the phylogeny by Lu et al. (2019). The inconsistency may be caused by incomplete lineage sorting and introgression in species undergoing rapid adaptive radiation (Meyer et al., 2017; Cai et al., 2020); therefore, the taxonomic status of A. viridiflora is worthy of further study.
Furthermore, according to the Flora of China (Li, 2007), A. oxysepala var. kansuensis is considered a variety of A. oxysepala var. oxysepala, although their morphological characteristics, distribution ranges, and habitats all differ from each other. In both the present and previous studies (Fior et al., 2013), A. oxysepala var. oxysepala and A. oxysepala var. kansuensis showed distant genetic relationships; therefore, we suggest that A. oxysepala var. kansuensis should be considered as a separate species rather than a variety. The phylogenetic tree shows that A. ecalcarata sequences were present on two different branches, providing further evidence for the previous report that A. ecalcarata is not monophyletic with a single origin and may have a complicated evolutionary history (Huang et al., 2018). In the future, to infer the phylogenetic relationships of rapidly evolving species within Aquilegia, we should collect more varieties and a greater number of species to construct the phylogeny.

Adaptive evolution of Aquilegia

Synonymous and nonsynonymous nucleotide substitution patterns play an important role in adaptive evolution. In Aquilegia, no significant positive selection was detected for the majority of genes; only three genes (petG, rpl36, and atpB) showed possible positive selection, and these may have played an important role in adaptive evolution in Aquilegia. Based on annotation information from the UniProtKB database (https://www.uniprot.org), in Arabidopsis thaliana (L.) Heynh., petG encodes subunit 5 of the cytochrome b6-f complex, which mediates electron transfer between photosystem II (PSII) and PSI, cyclic electron flow around PSI, and state transitions (Sato et al., 1999; Kandlbinder et al., 2004); the rpl36 gene encodes the 50S ribosomal protein L36, which serves as a structural component of the ribosome (Sato et al., 1999; Koia et al., 2013); and the atpB gene encodes the ATP synthase subunit beta, which produces ATP from ADP in the presence of a proton gradient across the membrane (Sato et al., 1999; Friso et al., 2004). Previous studies showed that rpl36 was under positive selection in the Araceae and Sophora tonkinensis Gagnep. (Fan et al., 2020; Henriquez et al., 2020), while atpB was under positive selection in Urophysa Ulbr. and the Liliaceae (sensu lato) (Xie et al., 2018; She et al., 2020). These genes are highly correlated with physiological processes such as photosynthesis and disease resistance; thus, their positive selection may assist Aquilegia species in rapid adaptation to various environments and enable their wide global distribution.
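Positive selection is conventionally inferred when nonsynonymous changes accumulate faster per site than synonymous ones, that is, when the ratio ω = dN/dS exceeds 1. The sketch below shows the counting idea on two aligned coding sequences. It is a deliberately simplified illustration and not the analysis used in this study: it assumes the standard genetic code, skips codon pairs that differ at more than one position, and reports uncorrected proportions (pN, pS) with no multiple-hit correction. The function names and the toy sequences are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (third codon position varies fastest).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites in a codon (between 0 and 3)."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == CODE[codon]:
                s += 1 / 3  # one of the three possible changes at this position
    return s


def pn_ps(seq1, seq2):
    """Return (pN, pS) for two aligned, in-frame coding sequences."""
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    S = N = syn_diff = nonsyn_diff = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if c1 not in CODE or c2 not in CODE:
            continue  # gaps or ambiguous bases
        if "*" in (CODE[c1], CODE[c2]):
            continue  # stop codons
        n_diff = sum(a != b for a, b in zip(c1, c2))
        if n_diff > 1:
            continue  # simplification: multi-step pathways are not resolved
        s = (syn_sites(c1) + syn_sites(c2)) / 2
        S += s
        N += 3 - s
        if n_diff == 1:
            if CODE[c1] == CODE[c2]:
                syn_diff += 1
            else:
                nonsyn_diff += 1
    p_n = nonsyn_diff / N if N else float("nan")
    p_s = syn_diff / S if S else float("nan")
    return p_n, p_s


# Toy pair: one synonymous (GCT/GCC) and one nonsynonymous (AAA/AGA) difference.
p_n, p_s = pn_ps("ATGGCTAAA", "ATGGCCAGA")
print(p_n, p_s, p_n / p_s)
```

For this toy pair the ratio is about 0.2, the pattern expected under purifying selection; a gene such as those highlighted above would instead show a ratio above 1. Real analyses typically use codon-substitution models and likelihood ratio tests rather than raw pairwise counts.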

AUTHOR CONTRIBUTIONS

X.H. and W.H. designed the study and evaluated the results; W.H. and Z.W. collected the materials; Z.W., D.J., and Z.T. participated in the data analysis; Z.W. and W.H. prepared the manuscript; and all authors read and approved the final manuscript.

APPENDIX S1. List of genes encoded by the Aquilegia chloroplast genomes.
APPENDIX S2. Number of each type of simple sequence repeat in Aquilegia species.
APPENDIX S3. The nucleotide diversity of all genes of Aquilegia.

Species | Latitude (°N) | Longitude (°E) | Distribution region | Size (Gbp) | Raw reads | Chloroplast reads | Depth | Voucher specimen
A. viridiflora | 40.954 | 111.672 | Asia | 13 | 18,729,599 | 9,936,532 | 4774× | NENU_Aq1001
A. oxysepala var. kansuensis | 31.815 | 109.009 | Asia | 11 | 16,161,175 | 3,273,451 | 1519× | NENU_Aq1002
A. ecalcarata | 37.160 | 102.223 | Asia | 11 | 16,159,854 | 3,721,439 | 1875× | NENU_Aq1003
A. parviflora | 50.422 | 121.476 | Asia | 9.6 | 14,222,775 | 3,179,153 | 1517× | NENU_Aq1004
A. amurensis | 52.672 | 123.870 | Asia | 9.9 | 14,758,620 | 6,110,285 | 2874× | NENU_Aq1005
A. rockii | 29.951 | 101.964 | Asia | 11 | 15,337,263 | 3,696,958 | 1664× | NENU_Aq1006
A. yabeana | 33.9125 | 112.041 | Asia | 13 | 18,296,460 | 3,276,927 | 1523× | NENU_Aq1007

Species | SRA no. | Size (Gbp) | Chloroplast reads | Depth | Distribution region
A. aurea | SRR405095 | 25.9 | 15,526,578 | 8520× | Europe
A. vulgaris | SRR404349 | 27.5 | 48,865,464 | 26,870× | Europe
A. sibirica | SRR405090 | 25.2 | 28,912,821 | 16,384× | Asia
A. formosa | SRR408554 | 28.4 | 11,593,572 | 7209× | North America
A. chrysantha | SRR408559 | 26.8 | 11,964,708 | 7209× | North America
A. japonica | SRR413499 | 26.6 | 28,881,079 | 16,384× | Asia
A. oxysepala var. oxysepala | SRR413921 | 28.0 | 41,390,034 | 24,248× | Asia

Sequencing was performed on the Illumina platform.

Species | GenBank accession no.
Aconitum brachypodum | NC_041579.1
Actaea vaginata | MK253451.1
Adonis coerulea | MK253469.1
Anemoclema glaucifolium | MH205609.1
Anemone raddeana | NC_041526.1
Anemonopsis macrophylla | NC_041527.1
Aquilegia coerulea | NC_041528.1
Aquilegia coerulea | MK569474.1
Aquilegia ecalcarata | NC_041529.1
Aquilegia ecalcarata | MK569475.1
Aquilegia rockii | NC_046738.1
Aquilegia rockii | MK573514.1
Asteropyrum cavaleriei | NC_041530.1
Beesia calthifolia | NC_041531.1
Calathodes oxycarpa | NC_041475.1
Callianthemum taipaicum | NC_041476.1
Caltha palustris | MK253465.1
Ceratocephala falcata | MK253464.1
Clematis terniflora | KJ956785.1
Consolida ajacis | NC_041534.1
Coptis chinensis | MK569483.1
Delphinium anthriscifolium | MK253461.1
Dichocarpum dalzielii | MK253459.1
Enemion raddeanum | NC_041535.1
Eranthis stellata | NC_041536.1
Glaucidium palmatum | MK569492.1
Gymnaconitum gymnandrum | NC_033341.1
Halerpestes sarmentosa | MK253457.1
Helleborus thibetanus | NC_041540.1
Hydrastis canadensis | MK569495.1
Isopyrum manshuricum | NC_041541.1
Leptopyrum fumarioides | NC_041542.1
Megaleranthis saniculifolia | FJ597983.1
Naravelia pilulifera | NC_039542.1
Nigella damascena | NC_041537.1
Oxygraphis glacialis | NC_041538.1
Paraquilegia anemonoides | NC_041479.1
Pulsatilla chinensis | MK569491.1
Ranunculus macranthus | DQ359689.1
Semiaquilegia adoxoides | MK569498.1
Staphisagria macrosperma | MN648404.1
Thalictrum thalictroides | NC_039433.1
Trollius chinensis | NC_031849.1
Trollius ranunculoides | MK253447.1
Urophysa rockii | MK569502.1
  60 in total

1.  REPuter: the manifold applications of repeat analysis on a genomic scale.

Authors:  S Kurtz; J V Choudhuri; E Ohlebusch; C Schleiermacher; J Stoye; R Giegerich
Journal:  Nucleic Acids Res       Date:  2001-11-15       Impact factor: 16.971

2.  Scaffolding pre-assembled contigs using SSPACE.

Authors:  Marten Boetzer; Christiaan V Henkel; Hans J Jansen; Derek Butler; Walter Pirovano
Journal:  Bioinformatics       Date:  2010-12-12       Impact factor: 6.937

3.  MEGA X: Molecular Evolutionary Genetics Analysis across Computing Platforms.

Authors:  Sudhir Kumar; Glen Stecher; Michael Li; Christina Knyaz; Koichiro Tamura
Journal:  Mol Biol Evol       Date:  2018-06-01       Impact factor: 16.240

4.  MAFFT multiple sequence alignment software version 7: improvements in performance and usability.

Authors:  Kazutaka Katoh; Daron M Standley
Journal:  Mol Biol Evol       Date:  2013-01-16       Impact factor: 16.240

5.  Pineapple translation factor SUI1 and ribosomal protein L36 promoters drive constitutive transgene expression patterns in Arabidopsis thaliana.

Authors:  Jonni Koia; Richard Moyle; Caroline Hendry; Lionel Lim; José Ramón Botella
Journal:  Plant Mol Biol       Date:  2012-12-22       Impact factor: 4.076

6.  Complete chloroplast genome of the multifunctional crop globe artichoke and comparison with other Asteraceae.

Authors:  Pasquale L Curci; Domenico De Paola; Donatella Danzi; Giovanni G Vendramin; Gabriella Sonnante
Journal:  PLoS One       Date:  2015-03-16       Impact factor: 3.240

7.  Chloroplast Genome Analysis of Resurrection Tertiary Relict Haberlea rhodopensis Highlights Genes Important for Desiccation Stress Response.

Authors:  Zdravka Ivanova; Gaurav Sablok; Evelina Daskalova; Gergana Zahmanova; Elena Apostolova; Galina Yahubyan; Vesselin Baev
Journal:  Front Plant Sci       Date:  2017-02-20       Impact factor: 5.753

8.  Comparative chloroplast genomics: analyses including new sequences from the angiosperms Nuphar advena and Ranunculus macranthus.

Authors:  Linda A Raubeson; Rhiannon Peery; Timothy W Chumley; Chris Dziubek; H Matthew Fourcade; Jeffrey L Boore; Robert K Jansen
Journal:  BMC Genomics       Date:  2007-06-15       Impact factor: 3.969

9.  Phylogenomic and Comparative Analyses of Complete Plastomes of Croomia and Stemona (Stemonaceae).

Authors:  Qixiang Lu; Wenqing Ye; Ruisen Lu; Wuqin Xu; Yingxiong Qiu
Journal:  Int J Mol Sci       Date:  2018-08-13       Impact factor: 5.923

10.  Comparative Analysis of the Chloroplast Genomes of the Chinese Endemic Genus Urophysa and Their Contribution to Chloroplast Phylogeny and Adaptive Evolution.

Authors:  Deng-Feng Xie; Yan Yu; Yi-Qi Deng; Juan Li; Hai-Ying Liu; Song-Dong Zhou; Xing-Jin He
Journal:  Int J Mol Sci       Date:  2018-06-22       Impact factor: 5.923
