Literature DB >> 34804117

Comparative Analysis of Complete Chloroplast Genomes of 13 Species in Epilobium, Circaea, and Chamaenerion and Insights Into Phylogenetic Relationships of Onagraceae.

Yike Luo1, Jian He1, Rudan Lyu1, Jiamin Xiao1, Wenhe Li1, Min Yao1, Linying Pei2, Jin Cheng3, Jinyu Li4, Lei Xie1.   

Abstract

The evening primrose family, Onagraceae, is a well defined family of the order Myrtales, comprising 22 genera widely distributed from boreal to tropical areas. In this study, we report and characterize the complete chloroplast genome sequences of 13 species in Circaea, Chamaenerion, and Epilobium using a next-generation sequencing method. We also retrieved chloroplast sequences from two other Onagraceae genera to characterize the chloroplast genome of the family. The complete chloroplast genomes of Onagraceae encoded an identical set of 112 genes (with exclusion of duplication), including 78 protein-coding genes, 30 transfer RNAs, and four ribosomal RNAs. The chloroplast genomes are basically conserved in gene arrangement across the family. However, a large segment of inversion was detected in the large single copy region of all the samples of Oenothera subsect. Oenothera. Two kinds of inverted repeat (IR) region expansion were found in Oenothera, Chamaenerion, and Epilobium samples. We also compared chloroplast genomes across the Onagraceae samples in some features, including nucleotide content, codon usage, RNA editing sites, and simple sequence repeats (SSRs). Phylogeny was inferred by the chloroplast genome data using maximum-likelihood (ML) and Bayesian inference methods. The generic relationship of Onagraceae was well resolved by the complete chloroplast genome sequences, showing potential value in inferring phylogeny within the family. Phylogenetic relationship in Oenothera was better resolved than other densely sampled genera, such as Circaea and Epilobium. Chloroplast genomes of Oenothera subsect. Oenothera, which are biparental inheritated, share a syndrome of characteristics that deviate from primitive pattern of the family, including slightly expanded inverted repeat region, intron loss in clpP, and presence of the inversion.
Copyright © 2021 Luo, He, Lyu, Xiao, Li, Yao, Pei, Cheng, Li and Xie.

Entities:  

Keywords:  IR expansion; Onagraceae; RNA editing; biparental inheritance; chloroplast genome; inversion; phylogeny

Year:  2021        PMID: 34804117      PMCID: PMC8600051          DOI: 10.3389/fgene.2021.730495

Source DB:  PubMed          Journal:  Front Genet        ISSN: 1664-8021            Impact factor:   4.599


Introduction

Chloroplast is one of the most important organelles in plant cells and play vital metabolic roles in photosynthesis as well as amino acid and lipid synthesis (Daniell et al., 2016). It has its own genetic material that does not obey the Mendelian laws of heredity. The chloroplast genome of angiosperms often shows a stable quadripartite ring structure containing one large single copy (LSC) region and one small single copy (SSC) region separated by two copies of an inverted repeat (IR) region. It usually shows uniparental inheritance (Ravi et al., 2008), and its sequence, gene number, and gene order have been considered to be very conserved (Wolfe et al., 1987). However, many types of mutation occur in the chloroplast genome, including single nucleotide polymorphisms (SNPs), indels, IR contraction and expansion, inversion, and translocation (Ahmed et al., 2012; Daniell et al., 2016; Liu et al., 2018; He et al., 2019; Mehmood et al., 2020), which provide potential molecular markers for phylogenetic inference, DNA barcoding, and population genetics. Studies have shown that environmental factors, such as hot, desiccation, and metal ion stress, may have an important influence on molecular evolution (such as change GC content, promote nucleotide substitution, and decrease the abundance of small RNAs) and diversification of the plant chloroplast genomes (Fitzgerald et al., 2011; Wang et al., 2011; Ivanova et al., 2017; Gao et al., 2018; Li et al., 2020). In recent years, the use of complete chloroplast genome data for phylogenetic inference has greatly deepened our insight into the evolution of plants at a wide range of taxonomic levels (Park et al., 2017; Wen et al., 2018; Li et al., 2019; Valcárcel and Wen, 2019; Wang L. et al., 2020; Brandrud et al., 2020). The inheritance of chloroplast genomes is predominantly maternal in angiosperms. However, biparental transmission of chloroplast genome has arisen in multiple lineages of angiosperms (Hu et al., 2008). It has been estimated that approximately 20% of angiosperm species potentially have biparentally inherited chloroplast genomes (Corriveau and Coleman, 1988; Zhang et al., 2003; Zhang and Sodmergen, 2010). Biparental inheritance of chloroplast may have important impact on evolution, such as producing genetic incompatibility to arise in speciation (Greiner et al., 2011). It has also been hypothesized that the nature of chloroplast inheritance may affect its genome stability (Wicke et al., 2011). Although the underlying mechanisms are unknown, structural rearrangements in chloroplast genome in correlation with biparental inheritance had been recognized in various kinds of plant taxa (Jansen and Ruhlman, 2012; Choi et al., 2020). The evening primrose family, Onagraceae, is composed of about 650 species of herbs, shrubs, and rarely trees distributed worldwide and species-rich in the New World (Raven, 1988). Onagraceae is characterized by flowers with four (or rarely two or five) petals, an inferior ovary, an often dehiscent capsule, and pollen grains held together by viscin threads. The family was sharply defined (Raven, 1964), but with disputed interpretation of subfamily, tribal, and some generic delimitation in its long taxonomic history (Kurabayashi et al., 1962; Raven, 1964; Munz, 1965; Wagner et al., 2007). Several molecular phylogenetic analyses using Sanger’s sequencing method have been conducted to resolve the phylogenetic relationships within Onagraceae (Martin and Dowd, 1986; Crisci et al., 1990; Bult and Zimmer, 1993; Conti et al., 1993; Levin et al., 2003, 2004; Berry et al., 2004; Hoggard et al., 2004; Evans et al., 2005; Ford and Gottlieb, 2007; Xie et al., 2009; Liu et al., 2017). Based on molecular and morphological data, a recent taxonomic monograph by Wagner et al. (2007) included 22 genera in Onagraceae. These genera were further grouped into two subfamilies: subfam. Ludwigioideae W. L. Wagner and Hoch (with only one genus, Ludwigia L.) and subfam. Onagroideae W. L. Wagner and Hoch (with six tribes and 21 genera). Onagraceae contains many popular garden plants including evening primrose (Oenothera L.) and fuchsia (Fuchsia L.). Some species of the family also have medicinal value and are widely used to make oil, spices, and nectar (Chen et al., 2007). Inheritance of the chloroplast genome in Onagraceae has attracted great attention of botanists (Cleland, 1972; Chiu et al., 1988; Chiu and Sears, 1992; Chiu and Sears, 1993; Sears et al., 1996; Massouh et al., 2016; Sobanski et al., 2019). Both maternal and biparental inheritance of chloroplast genomes has been reported in the family (Wagner et al., 2007). Oenothera subsect. Oenothera are known to have biparentally transmitted chloroplast (Corriveau and Coleman, 1988; Wagner et al., 2007). Whereas, chloroplast genomes from Circaea L. and Fuchsia have been shown to be maternally transmitted (Corriveau and Coleman, 1988; Zhang et al., 2003). Chloroplasts of Epilobium L. were also reported to be mainly maternally transmitted, but very low proportions of paternally transmitted chloroplast were also found (Schmitz and Kowallik, 1986). As mentioned above, biparentally inherited chloroplast genomes of many plant taxa have shown extensive rearrangement of genome structure. Thus, Onagraceae provides an opportunity to better understand differences in the chloroplast genome structure and sequence diversification between the two inheritance types. However, there are still no comparative studies concerning this issue and only a limited number of complete chloroplast genomes have been published to date. In the present study, we report the complete chloroplast genomes from three genera (Circaea, Chamaenerion Ség. , and Epilobium) of Onagraceae, among which those of the Circaea are reported for the first time. We hypothesized that the structure and sequence variation of chloroplast genomes in Onagraceae show different structures between biparentally and maternally inherited chloroplast genomes. Thus, we compared the synteny and chloroplast genome structure across the family and investigated their chloroplast genome structure and sequence variation. We also conducted a phylogenetic study to explore the evolutionary trends of chloroplast genome variation and the potential application value of the chloroplast markers across Onagraceae.

Materials and Methods

Taxon Sampling and Next-Generation Sequencing

We sampled 16 accessions representing three genera (Circaea, Chamaenerion, and Epilobium) and 13 species of Onagraceae (Supplementary Table S1). We also retrieved all the (15 samples representing 14 species) published complete chloroplast genome sequences of Onagraceae to date, as well as two samples from Lythraceae (sister family of Onagraceae) from GenBank for phylogenetic analysis. In total, five genera (Circaea, Chamaenerion, Epilobium, Ludwigia, and Oenothera) and 27 species (31 samples) of Onagraceae were included in this study. The taxonomy of Onagraceae at generic and infrageneric level followed Wagner et al. (2007). Our sampling covered both subfamilies (subfam. Ludwigioideae and subfam. Onagroideae) and three of the total six tribes in subfam. Onagroideae. Biparentally inherited chloroplast genomes were known to have occurred in species of Oenothera subsect. Oenothera (Wagner et al., 2007). So, we used chloroplast genome of O. biennis L. as a representative of biparentally inherited chloroplast genome (also reported by Corriveau and Coleman, 1988) to compare with the maternally inherited one from Circaea and Epilobium. Approximately 50 mg dried leaf tissue was ground for each sample. Total genomic DNA was extracted using the cetyl-trimethylammonium bromide (CTAB; Doyle and Doyle, 1987) method. The quality of DNA was assessed by 0.8% agarose gel electrophoresis, and extracted DNA was sent to Novogene (http://www.novogene.com, China) for short-insert (350 bp) library construction and next-generation sequencing. Paired-end reads of 2 × 150 bp were generated on the Illumina Hiseq 4,000 Genome Analyzer platform. We used the FASTX Toolkit (http://hannonlab.cshl.edu/fastx_toolkit) to filter the raw reads and remove the adaptors and low-quality reads to obtain high-quality data. The BLAT analysis, as implemented in a Python script (Weitemier et al., 2014), was applied to exclude nuclear and mitochondrial reads using a published complete chloroplast genome sequence of Epilobium ulleungensis as the reference (GenBank accession no. MH198310). Subsequently, the putative chloroplast reads were de novo assembled using Geneious v. Prime (Kearse et al., 2012) with a low sensitivity setting. Gaps between contigs were filled by re-mapping the entire reads to both contigs using the FineTuning program in Geneious v. Prime (iterating up to 100 times), as described by He et al. (2019). Contigs were connected into larger contigs by overlapping their terminal sequences using the RepeatFinder option in Geneious v. Prime (Kearse et al., 2012). After building an approximate 130-kb contig (including a complete SSC, a complete IRa, a complete LSC, and a partial IRb region) for each sample, the boundaries of the IR region were determined using the RepeatFinder. The IR region was manually inverted and duplicated to construct the complete chloroplast genome sequence using Geneious v. Prime (Kearse et al., 2012). The correction of the gaps and junctions between IRs and LSC/SSC regions were confirmed by PCR amplifications. The complete chloroplast genome sequences were annotated using the Plastid Genome Annotator (Qu et al., 2019) and checked manually in Geneious v. Prime (Kearse et al., 2012). Illustrations of the newly sequenced chloroplast genome sequences were drawn using the Organellar Genome DRAW tool v. 1.3.1 (Lohse et al., 2013).

Comparative Evaluation of the Chloroplast Genome

The newly sequenced chloroplast genomes were compared with those of the other published Onagraceae species. Amino acid frequency and codon usage were calculated using the Geneious v. Prime (Kearse et al., 2012) and CodonW v. 1.4 (Peden, 1999) software, and the putative RNA editing sites in protein-coding genes were determined by the predictive RNA editor for plant chloroplasts (PREP-cp) suite (Mower, 2009). For the synteny analysis of the Onagraceae chloroplast genome, mVISTA (Frazer et al., 2004) was used in LAGAN and Shuffle-LAGAN mode, with default parameters using Epilobium sikkimense Hausskn. as reference. The contraction and expansion of the IR boundaries between the four main parts of the genome (LSC/IRb/SSC/IRa) were visualized using IRscope (Amiryousefi et al., 2018). We also conducted a sliding window analysis to identify the nucleotide variability (Pi) of the complete chloroplast genomes of the three newly sequenced genera and Oenothera using DnaSP v. 5 (Librado and Rozas, 2009). The microsatellites were determined by MIcroSAtellite (MISA) (Varshney et al., 2005), with a minimum threshold of seven nucleotides for mononucleotide repeats, four for di-, and 3 each for tri-, tetra-, penta-, and hexanucleotide repeats. The REPuter program (Kurtz et al., 2001) was used to analyze forward (F), reverse (R), complement (C), and palindromic (P) oligonucleotide repeats with a minimum repeat size of 30 bp and similarities of 90%. Furthermore, tandem repeats were evaluated by the Tandem Repeats Finder (Benson, 1999) using default parameters.

Phylogenetic Analysis

The phylogenetic analysis was performed among 31 species of Onagraceae using two Lythraceae samples as outgroups. For phylogenetic tree reconstruction, we removed IRa from the analysis and manually reverted the inverted regions in samples of Oenothera subsect. Oenothera. We also divided the complete chloroplast genome sequences into coding regions (CDs, including protein-coding genes, tRNA genes, and rRNA genes), intergenic spacer regions (IGS), and introns. Each dataset was further divided into LSC, SSC, and IR regions. All the 13 separated and combined datasets (the complete CDs sequence, the complete IGS, the complete intron, the LSC-CDs, the LSC-IGS, the LSC intron, the SSC-CDs, the SSC-IGS, the SSC-intron, the IR-CDs, the IR-IGS, the IR-intron, and the complete chloroplast genome datasets) were then aligned using MAFFT v. 6.833 (Katoh et al., 2005) and manually adjusted by Geneious v. Prime (Kearse et al., 2012). The ambiguous alignments were removed from the datasets using a Python script (He et al., 2019). We used both the maximum likelihood (ML) and Bayesian inference (BI) methods for phylogenetic reconstruction for each dataset. The ML tree for each dataset was generated by RAxML v.8.1.17 (Stamatakis, 2014) using the GTR + G model as suggested in the user manual. The bootstrap percentages were calculated after 500 replicates. Bayesian inference for each dataset was conducted using MrBayes v3.2.3 (Ronquist and Huelsenbeck, 2003). Substitution models and data partitions of the complete chloroplast genome dataset for the Bayesian analysis were determined by PartitionFinder v2.1.1 (Lanfear et al., 2017). Six partitioning schemes were used for the complete chloroplast genome dataset: 1) no partitions, 2) partitioned by coding and non-coding regions (with the four rRNA genes as the third partition), 3) by LSC, SSC, and IRs, 4) coding region by genes (non-coding region as one partition), 5) coding region by genes and codon positions (non-coding region as one partition), 6) coding region by the third codon position (the first and second codon positions as on partition and the third position as the other partition, non-coding region as another one partition). The best scheme was selected according to the Bayesian information criterion (BIC). Partitioning of other datasets was on the basis of the result of the complete chloroplast genome dataset. For the Bayesian inference, the default priors in MrBayes were applied for tree search. Two independent Markov chain Monte Carlo (MCMC) chains were created, each with three heated and one cold chain for 2,000,000 generations and sampling trees every 100 generations. The first 25% of the trees were discarded as burn-in, and the remaining trees were used to generate the consensus tree. All the alignments used in this study are available on Zenodo, with the identifier https://doi.org/10.5281/zenodo.5545914.

Results

Chloroplast Genome Assembly, Organization, and Nucleotide Composition Features

For each newly sampled Onagraceae species, approximately 6 Gb clean NGS data were obtained, which means that the whole genomic coverage of our NGS data ranged from ca. 6–30 × (https://cvalues.science.kew.org/). We filtered out 130,748–410,678 chloroplast reads from the samples for de novo assembly. The coverage of the chloroplast genome was from 79 to 271 ×. One to seven large contigs were retained. All the gaps between the de novo contigs were successfully bridged by re-mapping the cleaned reads to both contigs using the FineTuning program in Geneious v. Prime (Kearse et al., 2012) with 100 iterations. The correction of the gaps and junctions between IRs and LSC/SSC regions were confirmed by PCR amplifications. All the newly assembled sequences were deposited in the GenBank under accession numbers of MZ326160 and from MZ353628 to MZ353642 (Supplementary Table S1). Chloroplast genome sequences of Circaea ranged from 155,817 bp (C. alpina subsp. micrantha (A. K. Skvortsov) Boufford) to 156,024 bp (Circaea alpina subsp. caulescens (Kom.) Tatew.) in size, and the overall GC content varied from 37.7 to 37.8%. For Chamaenerion samples, the complete chloroplast genome sequences ranged from 159,496 bp (C. conspersum (Hausskn.) Kitam.) to 160,416 bp (C. angustifolium subsp. circumvagum (Mosquin) Moldenke), and the overall GC content varied from 38.1 to 38.2%. For Epilobium chloroplast genome, the sizes ranged from 160,748 bp (Epilobium amurense subsp. amurense Hausskn.) to 161,144 bp (E. sikkimense Hausskn.), and the overall GC content varied from 38.1 to 38.2% (Supplementary Table S2). All the newly assembled chloroplast genome sequences contained a pair of IRs (24,996–27,519 bp) separated by a LSC region (87,569–89,163 bp) and a SSC region (17,157–18,283 bp). The complete chloroplast genomes encoded an identical set of 112 genes, including 78 protein-coding genes, 30 transfer RNAs, and four ribosomal RNAs. Among these, 17 (in Circaea samples) and 18 (in Chamaenerion and Epilobium samples) genes were duplicated in IR, and 18 genes had introns (Figure 1; Table 1, and Supplementary Table S2). Among the 18 intron-containing genes, 16 (10 protein-coding genes and 6 tRNA genes) had one intron and two (ycf3 and clpP) had two introns. However, the two introns in clpP gene are absent in Oenothera sect. Oenothera samples. The longest intron (2,487 bp) was in the trnK gene of Epilobium williamsii P. H. Raven.
FIGURE 1

Chloroplast genome maps of Chamaenerion, Circaea, and Epilobium sampled in the present study. Thick lines on the outer circle identify inverted repeat regions (IRa and IRb). The innermost track indicates the G + C content. Genes on the outside of the map are transcribed in a clockwise direction, and genes on the inside of the map are transcribed in a counterclockwise direction. IR, inverted repeat; LSC, large single copy; SSC, small single copy. Red arrows showed the different IR-SC boundaries between the two chloroplast genome structures.

TABLE 1

Genes present in the chloroplast genome of the 16 newly sequenced Epilobium, Circaea, and Chamaenerion samples.

Gene typeGene name
Ribosomal RNA genes16S rRNA23S rRNA4.5S rRNA5S rRNA
Transfer RNA genes trnA-UAC gene trnA-UGC gene trnC gene trnD gene trnE gene
trnF gene trnfM gene trnG-UCC gene trnG-GCC gene trnH gene
trnI gene trnK gene trnL-UAA gene trnL-CAA gene trnL-GAU gene
trnL-UAG gene trnM gene trnN gene trnP-UGG gene trnQ gene
trnR gene trnR-UCU gene trnS gene trnS-GCU gene trnS-GGA gene
trnT-UGU gene trnT-GGU gene trnV gene trnW-CCA gene trnY gene
Small subunit of the ribosome rps2 gene rps3 gene rps4 gene rps7 gene rps8 gene
rps11 gene rps12 gene rps14 gene rps15 gene rps16 gene
rps18 gene rps19 gene
The large subunit of the ribosome rpl2 gene rpl14 gene rpl16 gene rpl20 gene rpl22 gene
rpl23 gene rpl32 gene rpl33 gene rpl36 gene
RNA polymerase subunits rpoA gene rpoB gene rpoC1 gene rpoC2 gene
NADH dehydrogenase ndhA gene ndhB gene ndhC gene ndhD gene ndhE gene
ndhF gene ndhG gene ndhH gene ndhI gene ndhJ gene
ndhK gene
Photosystem Ⅰ psaA gene psaB gene psaC gene psaI gene psaJ gene
Cytochrome b/f complex petA gene petB gene petD gene petG gene petL gene
petN gene
ATP synthase atpA gene atpB gene atpE gene atpF gene atpH gene
atpI gene
Large subunit of rubisco rbcL gene
Maturase matK gene
Protease clpP gene
Envelope membrane protein cemA gene
Subunit of acetyl-CoA-carboxylase accD gene
Photosystem Ⅱ psbA gene psbB gene psbC gene psbD gene psbE gene
psbF gene psbH gene psbI gene psbJ gene psbK gene
psbL gene psbM gene psbN gene psbT gene psbZ gene
Copper chaperone for superoxide dismutase ccsA gene
Conserved open reading frames ycf 1,2,3,4
Chloroplast genome maps of Chamaenerion, Circaea, and Epilobium sampled in the present study. Thick lines on the outer circle identify inverted repeat regions (IRa and IRb). The innermost track indicates the G + C content. Genes on the outside of the map are transcribed in a clockwise direction, and genes on the inside of the map are transcribed in a counterclockwise direction. IR, inverted repeat; LSC, large single copy; SSC, small single copy. Red arrows showed the different IR-SC boundaries between the two chloroplast genome structures. Genes present in the chloroplast genome of the 16 newly sequenced Epilobium, Circaea, and Chamaenerion samples.

Codon Usage and Amino Acid Frequencies

Relative synonymous codon usage (RSCU) of the chloroplast genome sequences of the newly assembled samples was calculated using all protein-coding genes. Results of amino acid frequency, RSCU, and putative RNA editing sites are shown in Supplementary Figure S1 and Supplementary Tables S3, S4. There were 50 putative RNA editing sites detected in the 18 protein-coding genes of Epilobium, 43 sites detected in 16 protein-coding genes of Circaea, and 48 sites detected in 17 protein-coding genes of Chamaenerion. Among the three genera, the gene with the most RNA editing sites was ndhB (12 sites), and the second was ndhD (5 sites). The most common type of substitution in Epilobium was serine to leucine (26%), followed by proline to leucine (18%). This phenomenon also existed in the other two genera: the Chamaenerion chloroplast genome displayed 31.3% of editing sites substituted from serine to leucine, and 14.6% from proline to leucine; and the Circaea chloroplast genome showed 32.6% of editing sites substituted from serine to leucine, and 18.6% from proline to leucine. Among the 50 recognized RNA editing sites in Epilobium, 35 substitutions occurred at the second nucleotide position and 15 substitutions occurred at the first nucleotide position. Similar results were also detected in the other two genera.

Chloroplast Genome Comparison

To investigate the synteny and structural variation of the chloroplast genomes of Onagraceae, we performed multiple alignments of all the tested samples using mVISTA (Supplementary Figure S2). LAGAN and Shuffle-LAGAN programs were applied for this analysis. Results of generic representatives are shown in Figure 2. When using the LAGAN method, Oenothera subsect. Oenothera samples showed a large area of mismatch in their LSC region due to gene inversion. This inversion occurred between rbcL and trnQ-UUG and was approximately 56 kb in length. Typically, clpP gene has two introns in many angiosperm species. However, these two introns are absent in Oenothera sect. Oenothera samples, but still present in O. curtiflora W. L. Wagner and Hoch (sect. Gaura (L.) W. L. Wagner and Hoch). In addition, compared with other genera, some mismatch regions were found in the IR region of Oenothera, Epilobium, and Chamaenerion, which was caused by expansion of their IR zones by inclusion of the ndhF gene, and rarely other genes (described below).
FIGURE 2

Sequence alignment of representative samples from five genera of Onagraceae and two outgroups using the mVISTA program (alignment of all 33 samples are shown in Supplementary Figure S2). A cut-off of 70% similarity was used for the plot, and the Y-scale represents the percentage similarity ranging from 50 to 100%. Blue represents coding regions, and pink represents non-coding regions. (A): LAGAN method, the large empty part of the Oenothera biennis graph is the inverted region; (B): Shuffle LAGAN method.

Sequence alignment of representative samples from five genera of Onagraceae and two outgroups using the mVISTA program (alignment of all 33 samples are shown in Supplementary Figure S2). A cut-off of 70% similarity was used for the plot, and the Y-scale represents the percentage similarity ranging from 50 to 100%. Blue represents coding regions, and pink represents non-coding regions. (A): LAGAN method, the large empty part of the Oenothera biennis graph is the inverted region; (B): Shuffle LAGAN method. Subsequently, we compared the IR/SC boundary regions of 31 species of Onagraceae and two species of Lythraceae (Supplementary Figure S3). The early diverged genera of Onagraceae, Ludwigia and Circaea, have 17 genes in the IR region, which is the same with most other angiosperm genera such as Amborella Baill., Caltha L., and Arabidopsis Heynh. (Sato et al., 1999; He et al., 2019). So, the IR region of Ludwigia and Circaea can be considered as the primitive type of the family. Other tested Onagraceae genera showed more or less IR expansion. Almost all the tested samples from Chamaenerion, Epilobium, and Oenothera have 18-gene IR region (with inclusion of ndhF) (Figure 3). Two samples from Oenothera subsect. Munzia (W. Dietr.), O. picensis Phil. and O. villaricae W. Dietr., have 21-gene IR, with additional ccsA, trnL-UAG, rpl32 and ndhF genes.
FIGURE 3

Comparison of the LSC, IR, and SSC boundary regions of representative samples the three newly sequenced genera of Onagraceae and Oenothera samples. IR: inverted repeats; LSC: large single copy; SSC: small single copy.

Comparison of the LSC, IR, and SSC boundary regions of representative samples the three newly sequenced genera of Onagraceae and Oenothera samples. IR: inverted repeats; LSC: large single copy; SSC: small single copy. Sliding window analysis (Figure 4) showed that the nuclear variability of the IR region was relatively low in the three newly sequenced Onagraceae genera as well as in published Oenothera samples. Among the tested genera, Oenothera had the highest nucleotide variation. In addition, extremely high variations were discovered at both ends of the inversion in the LSC of the Oenothera chloroplast genome, which may be the main cause of the structural rearrangement of chloroplast genomes in Oenothera subsect. Oenothera.
FIGURE 4

A sliding window analysis of the complete chloroplast genomes of Epilobium, Chamaenerion, Circaea, and Oenothera samples showing nulceotide variability (Pi) within each genus. Circles identify the regions bordering inversion sites of Oenothera, and lines parallel to the X-axis identify the positions of LSC, SSC, and IR regions.

A sliding window analysis of the complete chloroplast genomes of Epilobium, Chamaenerion, Circaea, and Oenothera samples showing nulceotide variability (Pi) within each genus. Circles identify the regions bordering inversion sites of Oenothera, and lines parallel to the X-axis identify the positions of LSC, SSC, and IR regions.

SSR and Repeats Analyses

The chloroplast genome sequences are known to be highly conserved. However, chloroplast SSRs have been applied as important phylogenetic markers for unraveling polymorphisms across species and populations in plant molecular studies (Cato and Richardson, 1996; Xu et al., 2002; Wills et al., 2005; Bi et al., 2018). Furthermore, the primers for the chloroplast SSRs are conserved, which may facilitate primer design across species and genera. In this study, we detected 47–90 SSRs from chloroplast genome sequences of the three newly sequenced genera. Those in Circaea had 47–55 SSRs, which was the lowest among the three genera (Table 2). Epilobium and Chamaenerion had a higher number of SSRs, ranging from 76 to 90. The mononucleotide repeat unit (A/T) was the most common type, accounting for 85.6–97.9% of all 16 samples. The dinucleotide repeat unit (AT/TA) was the second most abundant, accounting for 1.7–9.2%. The mononucleotide repeat unit C/G existed in Chamaenerion and Circaea samples and in the chloroplast genome sequence of Epilobium sikkimense. Chamaenerion and Epilobium contained only two types of repeats: mononucleotide and dinucleotide. A trinucleotide repeat unit (AAT/ATT) was found in all samples of Circaea. Tetranucleotide repeats were found in C. alpina subsp. caulescens (AATAT/ATATA) and C. glabrescens (Pamp.) Hand.-Mazz. (AAAGG/AAGGA). Only one hexanucleotide repeat (AAATAT/ATAAAT) was present in C. alpina subsp. caulescens. These SSRs were mainly located in the IGS region and sometimes also occurred in introns and CDs. As expected, most SSRs were detected in the LSC region, followed by the SSC and IR regions.
TABLE 2

Simple sequence repeats (SSRs) for the 16 newly sequenced Epilobium, Circaea, and Chamaenerion samples.

GenomesRepeat unitsNumberPercentage (%)LocationRegion
IntronIGSCDSLSCSSCIR
Chamaenerion angustifolium subsp. circumvagum (Mosquin) MoldenkeA/T7785.613531159126
C/G66.72442
AT/AT77.8257
C. angustifolium subsp. angustifolium (L.) Scop.A/T8090.91457960146
C/G33.41212
AT/AT55.7235
C. conspersum (Hausskn.) Kitam.A/T6787.07481249108
C/G45.21322
AT/AT67.8246
Circaea alpina subsp. caulescens (Kom.) Tatew.A/T4989.183564252
C/G11.8111
AT/TA23.622
AAT/ATA11.811
AATAT/ATATA11.811
AAATAT/ATAAAT11.81
C. alpina subsp. micrantha (A. K. Skvortsov) BouffordA/T4187.262963362
C/G24.31221
AT/TA36.422
AAT/ATA12.111
C. cordata RoyleA/T4898.083464152
AAT/ATA12.011
C. glabrescens (Pamp.) Hand.-Mazz.A/T5393.093864652
C/G11.811
AT/TA11.811
AAT/ATA11.811
AAAGG/AAGGA11.811
C. repens Wall. ex Asch. & MagnusA/T4892.353764152
C/G23.822
AT/AT11.911
AAT/ATA11.911
Epilobium amurense subsp. amurense Hausskn.A/T7592.676265898
AT/TA67.42451
E. amurense subsp. cephalostigma (Hausskn.) C.J. Chen, Hoch & P.H. RavenA/T7790.686366188
AT/TA89.42662
E. cylindricum D. DonA/T7392.4859653128
AT/TA67.62451
E. minutiflorum Hausskn.A/T8092.076766578
AT/TA78.02561
E. royleanum Hausskn.A/T6990.875666078
AT/TA79.22561
E. sikkimense Hausskn.A/T8296.576966598
C/G11.211
AT/TA22.422
E. tibetanum Hausskn.A/T7592.686165898
AT/TA67.42451
E. williamsii P. H. RavenA/T7692.776365998
AT/TA67.31551
Simple sequence repeats (SSRs) for the 16 newly sequenced Epilobium, Circaea, and Chamaenerion samples. In addition to the SSRs, we also explored the role of repeats identified by REPuter (Kurtz et al., 2001). We found a total of 640 repeats in the 16 samples (Figure 5). Only palindromic and forward repeats were detected in Epilobium and Chamaenerion. In addition to these two types of repeat, a complement repeat was detected in Circaea cordata Royle, and a reverse repeat was found in C. alpina subsp. caulescens. The Circaea chloroplast genome sequence had 10–19 forward repeats, whereas Chamaenerion and Epilobium had 30–48 forward repeats. Among all the detected repeats, palindromic repeats accounted for 12.18% and forward repeats accounted for 87.5% of total repeats, whereas complement and reverse repeats only accounted for 0.032%. The repeat length of Circaea and Chamaenerion was shorter, and most repeats were between 30 and 44 bp. The repeat length of Epilobium was longer, at 30–59 bp in most samples. Much longer repeats (over 100 bp) were found in Epilobium and Chamaenerion. The longest repeat was 203 bp in length and was located in the ycf2 gene. The region containing the majority of the repeats was CDs (70%), followed by the IGS (25%), and the intron region (5%). A large number of repeats were found in the ycf2 gene (in CDs), especially in Epilobium and Chamaenerion samples. The presence of those SSRs and repeats demonstrated that the loci were potentially mutation hotspots in the chloroplast genome, and they may play an important role in developing genetic markers for future phylogenetic or population genetic studies.
FIGURE 5

Analyses of repeated sequences in 16 newly sequenced chloroplast genomes of Onagraceae. (A): number of four repeat types; (B): frequency of direct repeats by length; (C): location of repeats; (D): frequency of palindromic repeats by length.

Analyses of repeated sequences in 16 newly sequenced chloroplast genomes of Onagraceae. (A): number of four repeat types; (B): frequency of direct repeats by length; (C): location of repeats; (D): frequency of palindromic repeats by length.

Biparentally vs. Maternally Inheritated Chloroplast Genome

Comparing chloroplast genome sequences of Oenothera biennis (biparental transmitted) and Circaea (maternally transmitted), several structural differences are depicted. A large inversion occurs in the LSC regions of O. biennis, whereas chloroplast genome of Circaea lacks it. Two intons in clpP genes (present in most genera of Onagraceae) are absent in the O. biennis. The IR region of the O. biennis chloroplast genome was slightly expanded with the inclusion of ndhF gene (Figures 2, 3, 6). Chloroplasts of Epilobium were also reported to be mainly (but not entirely) maternally inherited (Schmitz and Kowallik, 1986). Chloroplast genomes of Epilobium samples also lack the inversion, and their clpP have both introns, although their IR regions were also slightly expanded.
FIGURE 6

Bayesian consensus tree of Onagraceae species inferred from complete chloroplast genome sequences. Maximum likelihood (ML) bootstrap values/Posterior Probability (PP) values are shown at each node. Internal branches that are fully supported by both analyses (with 100 ML bootstrap values and 1 PP values) were thickened. ML bootstrap values < 50 and PP values < 0.95 are shown as --.

Bayesian consensus tree of Onagraceae species inferred from complete chloroplast genome sequences. Maximum likelihood (ML) bootstrap values/Posterior Probability (PP) values are shown at each node. Internal branches that are fully supported by both analyses (with 100 ML bootstrap values and 1 PP values) were thickened. ML bootstrap values < 50 and PP values < 0.95 are shown as --. Although we cannot concluded that all the members of Oenothera subsect. Oenothera have the biparentally transmitted chloroplast, the chloroplast genome structure, gene content, and the gene arrangement of all the samples from this subsection are quite stable (Figure 6). In contrast, the chloroplast genome of Oenothera sect. Gaura are the similar to those in Chamaenerion and Epilobium rather than to other Oenothera samples. Oenothera subsect. Munzia have clpP without introns, which is similar to subsect. Oenothera, but have much more expanded IR regions (with 21 genes) and no inversion in their chloroplast genomes. To better accommodate the heterogeneity of the data in the processes of Bayesian analysis, the complete chloroplast dataset was tested by six partitioning treatments. All the partitioning strategies showed similar results and no obvious improvement was observed among them. It seemed that partitioning the coding region by the third codon position obtained a little better result (Table 3). For this reason, we used this partition strategy for the datasets which have coding regions (the complete CDs sequence, the LSC-CDs, the SSC-CDs, the IR-CDs, and the complete chloroplast genome datasets), and GTR + I + G for each partition tested by PartitionFinder were applied for Bayesian analysis. We used GTR + G model tested by PartitionFinder (and no partitioning strategy) for the non-coding datasets (the complete IGS, the complete intron, the LSC-IGS, the LSC intron, the SSC-IGS, the SSC-intron, the IR-IGS, and the IR-intron) for both ML and Bayesian analyses. The complete chloroplast genome dataset (including LSC, SSC, and IR with an aligned length of 126,290 bp) generated a phylogeny (Figure 6), which is consistent with all the 12 separated phylogenies (Supplementary Figure S4).
TABLE 3

Comparison of partitioning strategies used for the complete chloroplast genome dataset.

DatasetPartitioning strategyParametersSubsetsln L BIC
CompleteNo partition721−549020.001098905.52
ChloroplastCoding and non coding953−554228.871109618.89
GenomeLSC, SSC, IRs943−541571.771084273.53
DatasetBy gene18612−548690.441099654.32
By gene and codon position25519−545647.251094411.31
By the third codon position842−528124.331057271.73
Comparison of partitioning strategies used for the complete chloroplast genome dataset. Phylogenies created from all the datasets using both methods were basically the same, especially for strongly supported clades. Thus in this study, our discussion was on the basis of the phylogeny inferred by the complete chloroplast genome dataset. All the Onagraceae tribes and genera were strongly supported. The genus Ludwigia was shown to be the first diverged genus in the Onagraceae. The phylogenetic relationship within the genus Circaea was not well resolved, whereas the genus Oenothera showed a clear phylogenetic structure. Oenothera curtiflora (sect. Gaura) was revealed to be the first diverged species in the genus. Other Oenothera species formed a strongly supported clade with a long branch. Within this clade, two subsections (subsect. Munzia and subsect. Oenothera) were clearly resolved with high supporting values. The genus Epilobium was also relatively densely sampled; however, this genus was not well resolved in the basal part of the phylogeny.

Discussion

The present study reports the first chloroplast genomes of Circaea. Chloroplast genomes of some species in Chanaenerion and Epilobium were also reported for the first time. We compared the genetic diversity within each genus to obtain insight into the molecular evolution of chloroplast genomes in Onagraceae. Gene content and organization of the Onagraceae chloroplast genome were analyzed to reveal phylogenetic information pertaining to gene rearrangement. Analysis of codon usage by the chloroplast genome can aid to understand the selection pressure on genes and genome structure (Yang et al., 2014). In the present study, the preference of codons ending with A/T in Onagraceae chloroplast genomes was confirmed (Supplementary Table S4). The same results were also observed in other angiosperm species, such as in Fabaceae, Solanaceae, Asteraceae, and many others (Nie et al., 2014; Mehmood et al., 2020; Somaratne et al., 2020). Our results also show the highest similarities of codon usage among the three newly sequenced genera (Supplementary Figure S1), indicating that these genera may have experienced similar environmental stresses in their evolutionary history. Most SSRs in the newly sequenced Onagraceae chloroplast genomes were found to be mononucleotides (A/T) (Table 2), which is similar to reports in other families of angiosperms. The genus Circaea contained more types of SSRs and repeats than the other two genera (Figure 5; Table 2). This SSR and repeat information may be helpful for the development of molecular markers for population genetics analysis and developing DNA barcodes. The angiosperm chloroplast genomes are conserved in gene content and organization among different lineages (Palmer, 1985). However, structural variation and gene rearrangements in chloroplast genomes have been discovered in many angiosperm families, such as Anacardiaceae (Wang Y. B. et al., 2020), Apiaceae (Lee et al., 2019), Asteraceae (Walker et al., 2014), Campanulaceae (Haberle et al., 2008), Euporbiaceae (Tangphatsornruang et al., 2011), Geraniaceae (Weng et al., 2014), Fabaceae (Cai et al., 2008), Lentibulariaceae (Silva et al., 2019), Podostemaceae (Bedoya et al., 2019), and Ranunculaceae (Liu et al., 2018; He et al., 2019; Zhai et al., 2019). In Onagraceae, our results show the similarities in gene content and organization in chloroplast genome among almost all the sampled genera and species, with the exception of Oenothera subsect. Oenothera species that contain a large inversion (ca. 56 kb) in the LSC region (Supplementary Figure S2). This large gene inversion had been reported previously by Greiner et al. (2008) and is clearly a derived character (synapomorphy) for subsect. Oenothera (Figure 6). Extremely high nucleotide variations occur at both ends of this inversion, which may be the direct cause of this inversion. The phylogenetic analysis in this study also clearly demonstrated the evolutionary trends of the other two structural variations (IR expansion and intron loss in clpP) of the chloroplast genome in Onagraceae. Previous studies have shown that expansion/contraction of the IR region is common in angiosperm chloroplast genomes and is the major cause of length variation in chloroplast genomes (Goulding et al., 1996; Kim and Lee, 2004). IR expansion that results in the duplication of genes has been reported in various plant taxa (Chumley et al., 2006; Lee et al., 2007; Park et al., 2017; Liu et al., 2018). Typically, there are 17 genes in the IR region in a wide range of angiosperm taxa (He et al., 2019). In Onagraceae, the early diverged genera, Ludwigia and Circaea, have 17 genes in their IR region, which may represent primitive state of this character in the family. For the other samples, two kinds of IR expansion were discovered. Chloroplast genomes of Chamaenerion, Epilobium, and Oenothera sect. Gaura have 18-gene IR regions, whereas, Oenothera subsect. Munzia have 21-gene IR regions. From our phylogenetic analysis, the 18-gene IR region can be seen as a derived state from the 17-gene IR regions, and the 21-gene IR region maybe further derived from the 18-gene IR region (Figure 6). The IR regions in Onagraceae seem to evolve toward gradual expansion, and no IR contraction was detected in the family by our analysis. In addition to inversion and IR expansion, another derived character, i.e., introns loss in clpP (Figure 6), is present in Oenothera sect. Oenothera, but not in sect. Gaura and other genera. In addition, the occurrence order of the three structural variations can also be inferred by our phylogenetic analysis. The IR expansion (from 17 genes to 18 genes) in Oenothera, Chamaenerion, and Epilobium, happened before the loss of clpP introns in Oenothera sect. Oenothera, and then followed by the acquisition of the large inversion in subsect. Oenothera. Oenothera subsect. Oenothera seemed to be a very distinctive group carrying almost all specialized chloroplast genome variation in Onagraceae. Species of this subsection are not only known to have biparentally inherited chloroplast genomes but also known to have permanent translocation heterozygosity (PTH), a specialized system in which all seven pairs of chromosomes exchange their arms during meiosis (Cleland, 1972; Raven, 1979; Harte, 1994; Dietrich, 1997; Wagner et al., 2007). In our chloroplast genome analysis, three derived characters, presence of a large inversion, intron loss in clpP, and 18-gene IR, are concentrated in Oenothera subsect. Oenothera. Among them, presence of inversion is only found in this subsection. Although the presence of inversion and biparental transmission of the chloroplast genome are only possessed by Oenothera subsect. Oenothera, we still cannot tell whether biparental transmission has triggered the large inversion or vice versa, because there are many chloroplast genomes with inversions in other plant taxa (such as in Ranunculaceae, He et al., 2019) that do not have biparental plastid transmission. The phylogenetic relationship resolved in this study is basically consistent with that reported in previous studies (Levin et al., 2003; Levin et al., 2004). However, phylogeny inferred from the complete chloroplast genome sequences was better resolved and better supported statistically than from previous studies using Sanger’s sequencing method (Bult and Zimmer, 1993; Conti et al., 1993; Levin et al., 2003; Levin et al., 2004; Sytsma et al., 2004), demonstrating that chloroplast genome sequences may be a good molecular marker for resolving phylogeny of Onagraceae at generic level. Within each genus, species phylogeny was better resolved in Oenothera than in Circaea and Epilobium, which is due to the higher level of variation in Oenothera chloroplast genome (Figure 6). This result indicates that the chloroplast genome sequences can be applied for inferring phylogenetic relationship of Oenothera at sectional or even species level. However, Oenothera has 18 sections (Wagner et al., 2007) and the chloroplast genome from only two sections have been reported. Further studies are needed to be done in the future because it is possible that the other unsampled sections might have their own distinguishing characteristics in the chloroplast genome.

Conclusion

The complete chloroplast genome sequences of 16 samples representing 13 species in Circaea, Chamaenerion, and Epilobium (Onagraceae) were assembled in this study. We compared chloroplast genomes across the Onagraceae samples and obtained comprehensive molecular information including nucleotide content, codon usage, RNA editing sites, structural variation, and simple sequence repeats (SSRs) through bioinformatic analyses. Phylogeny of Onagraceae was inferred using maximum-likelihood (ML) and Bayesian inference (BI) methods to understand generic and specific relationships. The results of the present study showed potential values of the complete chloroplast genome sequences in inferring phylogeny of the family and may provide powerful genetic resources for future studies.
  67 in total

1.  Chloroplast genomic data provide new and robust insights into the phylogeny and evolution of the Ranunculaceae.

Authors:  Wei Zhai; Xiaoshan Duan; Rui Zhang; Chunce Guo; Lin Li; Guixia Xu; Hongyan Shan; Hongzhi Kong; Yi Ren
Journal:  Mol Phylogenet Evol       Date:  2019-02-28       Impact factor: 4.286

2.  Origin of angiosperms and the puzzle of the Jurassic gap.

Authors:  Hong-Tao Li; Ting-Shuang Yi; Lian-Ming Gao; Peng-Fei Ma; Ting Zhang; Jun-Bo Yang; Matthew A Gitzendanner; Peter W Fritsch; Jie Cai; Yang Luo; Hong Wang; Michelle van der Bank; Shu-Dong Zhang; Qing-Feng Wang; Jian Wang; Zhi-Rong Zhang; Chao-Nan Fu; Jing Yang; Peter M Hollingsworth; Mark W Chase; Douglas E Soltis; Pamela S Soltis; De-Zhu Li
Journal:  Nat Plants       Date:  2019-05-06       Impact factor: 15.793

3.  Comparative analysis of complete chloroplast genome sequence and inversion variation in Lasthenia burkei (Madieae, Asteraceae).

Authors:  Joseph F Walker; Michael J Zanis; Nancy C Emery
Journal:  Am J Bot       Date:  2014-04-03       Impact factor: 3.844

4.  The evolution of the plastid chromosome in land plants: gene content, gene order, gene function.

Authors:  Susann Wicke; Gerald M Schneeweiss; Claude W dePamphilis; Kai F Müller; Dietmar Quandt
Journal:  Plant Mol Biol       Date:  2011-03-22       Impact factor: 4.076

5.  Dynamic Chloroplast Genome Rearrangement and DNA Barcoding for Three Apiaceae Species Known as the Medicinal Herb "Bang-Poong".

Authors:  Hyun Oh Lee; Ho Jun Joh; Kyunghee Kim; Sang-Choon Lee; Nam-Hoon Kim; Jee Young Park; Hyun-Seung Park; Mi-So Park; Soonok Kim; Myounghai Kwak; Kyu-Yeob Kim; Woo Kyu Lee; Tae-Jin Yang
Journal:  Int J Mol Sci       Date:  2019-05-04       Impact factor: 5.923

6.  PGA: a software package for rapid, accurate, and flexible batch annotation of plastomes.

Authors:  Xiao-Jian Qu; Michael J Moore; De-Zhu Li; Ting-Shuang Yi
Journal:  Plant Methods       Date:  2019-05-21       Impact factor: 4.993

7.  Plastid Genomes of Five Species of Riverweeds (Podostemaceae): Structural Organization and Comparative Analysis in Malpighiales.

Authors:  Ana M Bedoya; Bradley R Ruhfel; C Thomas Philbrick; Santiago Madriñán; Claudia P Bove; Attila Mesterházy; Richard G Olmstead
Journal:  Front Plant Sci       Date:  2019-08-20       Impact factor: 5.753

8.  Organelle inheritance and genome architecture variation in isogamous brown algae.

Authors:  Ji Won Choi; Louis Graf; Akira F Peters; J Mark Cock; Koki Nishitsuji; Asuka Arimoto; Eiichi Shoguchi; Chikako Nagasato; Chang Geun Choi; Hwan Su Yoon
Journal:  Sci Rep       Date:  2020-02-06       Impact factor: 4.379

9.  The PREP suite: predictive RNA editors for plant mitochondrial genes, chloroplast genes and user-defined alignments.

Authors:  Jeffrey P Mower
Journal:  Nucleic Acids Res       Date:  2009-05-11       Impact factor: 16.971

10.  Hyb-Seq: Combining target enrichment and genome skimming for plant phylogenomics.

Authors:  Kevin Weitemier; Shannon C K Straub; Richard C Cronn; Mark Fishbein; Roswitha Schmickl; Angela McDonnell; Aaron Liston
Journal:  Appl Plant Sci       Date:  2014-08-29       Impact factor: 1.936

View more
  1 in total

1.  Complete chloroplast genome sequence of Lens ervoides and comparison to Lens culinaris.

Authors:  Nurbanu Tayşi; Yasin Kaymaz; Duygu Ateş; Hatice Sari; Cengiz Toker; M Bahattin Tanyolaç
Journal:  Sci Rep       Date:  2022-09-05       Impact factor: 4.996

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.