
The Conservation of Chloroplast Genome Structure and Improved Resolution of Infrafamilial Relationships of Crassulaceae.

Hong Chang1, Lei Zhang1, Huanhuan Xie1, Jianquan Liu1,2, Zhenxiang Xi1, Xiaoting Xu1.   

Abstract

Crassulaceae are the largest family in the angiosperm order Saxifragales. Species of this family are characterized by succulent leaves and a unique photosynthetic pathway known as Crassulacean acid metabolism (CAM). Although the inter- and intrageneric relationships have been extensively studied over the last few decades, the infrafamilial relationships of Crassulaceae remain partially obscured. Here, we report nine newly sequenced chloroplast genomes, which comprise several key lineages of Crassulaceae. Our comparative analyses and positive selection analyses of Crassulaceae species indicate that the overall gene organization and function of the chloroplast genome are highly conserved across the family. No positively selected gene was statistically supported in Crassulaceae lineage using likelihood ratio test (LRT) based on branch-site models. Among the three subfamilies of Crassulaceae, our phylogenetic analyses of chloroplast protein-coding genes support Crassuloideae as sister to Kalanchoideae plus Sempervivoideae. Furthermore, within Sempervivoideae, our analyses unambiguously resolved five clades that are successively sister lineages, i.e., Telephium clade, Sempervivum clade, Aeonium clade, Leucosedum clade, and Acre clade. Overall, this study enhances our understanding of the infrafamilial relationships and the conservation of chloroplast genomes within Crassulaceae.
Copyright © 2021 Chang, Zhang, Xie, Liu, Xi and Xu.


Keywords:  Crassulaceae; adaptive evolution; chloroplast genome; comparative genomics; infrafamilial relationships; phylogenomics

Year:  2021        PMID: 34276716      PMCID: PMC8281817          DOI: 10.3389/fpls.2021.631884

Source DB:  PubMed          Journal:  Front Plant Sci        ISSN: 1664-462X            Impact factor:   5.753


Introduction

The angiosperm family Crassulaceae, also known as the stonecrop family, belongs to the order Saxifragales and includes 34 genera and approximately 1,400 species (APG IV, 2016; Messerschmid et al., 2020), which are predominantly perennial herbs, subshrubs, or shrubs (Thiede and Eggli, 2007). The species of Crassulaceae primarily occur in (semi-)arid and mountainous habitats of the temperate and subtropical areas (van Ham and Hart, 1998), and are distributed worldwide with centers of diversity in Mexico, southern Africa, Macaronesia, and the Himalayas (Mort et al., 2001). Physiologically, Crassulaceae are characterized by the Crassulacean acid metabolism (CAM) photosynthetic pathway (Gontcharova and Gontcharov, 2009), which achieves a higher level of water-use efficiency than either the C3 or C4 pathway in water-limited environments (Nobel, 1991). A recent study has identified the crown group of Crassulaceae as one of the 30 core diversification shifts across the angiosperm phylogeny (Magallón et al., 2019). This increased diversification rate may well be associated with the CAM pathway, which has been recognized as an evolutionary key innovation (Pilon-Smits et al., 1996; Quezada and Gianoli, 2011; Silvestro et al., 2014). To date, numerous phylogenetic studies have evaluated the infrafamilial relationships of Crassulaceae with a broad taxon sampling (e.g., Berger, 1930; van Ham and Hart, 1998; Mort et al., 2001; Mayuzumi and Ohba, 2004; Gontcharova et al., 2006, 2008; Folk et al., 2019; Messerschmid et al., 2020), which have led to the establishment of three subfamilies by Thiede and Eggli (2007), i.e., Crassuloideae, Kalanchoideae, and Sempervivoideae. Although most of these studies are based on one or a few genetic loci, the monophyly of each of the three subfamilies sensu Thiede and Eggli (2007) is well supported.
The inter- and intrageneric relationships within each of the three subfamilies have also been extensively studied using a variety of chloroplast and nuclear loci. The smallest subfamily Crassuloideae comprise approximately 200 species in a single genus, Crassula L. (Messerschmid et al., 2020), whose intrageneric relationships have been recently addressed by sampling 103 species (Bruyns et al., 2019). The subfamily Kalanchoideae include approximately 240 species in four genera (Smith et al., 2019), i.e., Adromischus Lem., Cotyledon L., Kalanchoe, and Tylecodon Toelken, and the inter- and intrageneric relationships have been exclusively assessed in a few separate studies (Gehrig et al., 2001; Mort et al., 2005; Nowell, 2008). The largest and most complex subfamily Sempervivoideae contain 28 genera and over 1,000 species (Thiede and Eggli, 2007), and the inter- and intrageneric relationships have been investigated by a considerable number of studies (e.g., Mes et al., 1997; Jorgensen and Frydenberg, 1999; Mort et al., 2002; Acevedo-Rosas et al., 2004; Fairfield et al., 2004; Mayuzumi and Ohba, 2004; Gontcharova et al., 2006; Carrillo-Reyes et al., 2008, 2009; Yost et al., 2013; Zhang et al., 2014; Klein and Kadereit, 2015; Nikulin et al., 2016; Vázquez-Cotero et al., 2017; de la Cruz-López et al., 2019). These phylogenetic studies have led to the recognition of five tribes within Sempervivoideae by Thiede and Eggli (2007), i.e., Aeonieae (Aeonium Webb and Berth., Aichryson Webb and Berth., and Monanthes Haw.), Sedeae (Afrovivella A. Berger, Dudleya Britton and Rose, Echeveria DC., Graptopetalum Rose, Lenophyllum Rose, Pachyphytum Link et al., Pistorinia DC., Prometheum H. Ohba, Rosularia Stapf, Sedella Britton and Rose, Sedum L., Thompsonella Britton and Rose, and Villadia Rose), Semperviveae (Petrosedum Grulich and Sempervivum L.), Telephieae (Hylotelephium H. Ohba, Kungia K.T. Fu, Meterostachys Nakai, Orostachys Fisch., and Sinocrassula A. Berger), and Umbiliceae (Phedimus Raf., Pseudosedum A. Berger, Rhodiola, and Umbilicus DC.). Furthermore, based on one nuclear (ITS) and three chloroplast markers (matK, rps16, and trnL-trnF), a recent phylogenetic study of 298 Crassulaceae species has recovered six major clades of Sempervivoideae, i.e., Telephium clade, Petrosedum clade, Sempervivum clade, Aeonium clade, Leucosedum clade, and Acre clade (Messerschmid et al., 2020). Despite the significant progress made in the tribal and generic circumscription of Crassulaceae, phylogenetic relationships among tribes and major clades remain poorly to moderately supported or sometimes contradicted, especially within Sempervivoideae (Supplementary Figure 1).
Chloroplasts are semi-autonomously replicating organelles that originated from endosymbiosis between cyanobacteria and a non-photosynthetic host, and they play crucial roles in the photosynthesis and physiology of plants (Gao et al., 2019a; Huo et al., 2019; Li et al., 2021). The chloroplast genome is typically a quadripartite circular DNA molecule comprising a small single-copy (SSC) region, a large single-copy (LSC) region, and two inverted repeats (IRs) (Hu et al., 2016; Chang et al., 2020). With the development of next-generation sequencing technology, chloroplast genomes have proven to be powerful tools for resolving unclear relationships in many complicated lineages, owing to advantages such as uniparental inheritance, a relatively conserved structure, and low recombination and substitution rates (Goremykin et al., 2003, 2004; Twyford and Ness, 2017; Li et al., 2020a). Although the structure and sequence of plastomes are relatively conserved, variations in structure, size, and evolutionary rates of genes have been found in many studies, which in some cases carry phylogenetic information and reflect adaptation to the environment (Chumley et al., 2006; Hu et al., 2015; Piot et al., 2018; Gao et al., 2019a; Gruzdev et al., 2019; Shrestha et al., 2019; Zang et al., 2019).
Here, we sequenced and fully annotated fourteen chloroplast genomes, comprising nine species of Crassulaceae and five species of Haloragaceae. By taking advantage of the data already available, we analyzed a total of 33 chloroplast genomes and aimed to (i) assess the structural characteristics of the chloroplast genome in a comparative framework, (ii) improve the resolution of the infrafamilial relationships within Crassulaceae, and (iii) investigate adaptive evolution through selective-pressure analysis of protein-coding genes in Crassulaceae.

Materials and Methods

Taxon Sampling, DNA Extraction, and Sequencing

Our taxon sampling aimed to obtain chloroplast genome sequences for at least one representative of each well-supported major clade. The final taxon sampling contained a total of 26 Crassulaceae species (Table 1), representing all three subfamilies sensu Thiede and Eggli (2007), and all five tribes sensu Thiede and Eggli (2007) or five out of the six major clades sensu Messerschmid et al. (2020) within Sempervivoideae. Of these complete chloroplast genomes, nine were newly generated in this study, i.e., Aeonium arboreum (L.) Webb and Berthel., Cotyledon tomentosa Harv., Crassula perforata Thunb., Graptopetalum amethystinum (Rose) E. Walther, Kalanchoe fedtschenkoi Raym.-Hamet and H. Perrier, Orostachys fimbriata (Turcz.) A. Berger, Pachyphytum compactum Rose, Sempervivum tectorum L., and Sinocrassula densirosulata (Praeger) A. Berger. In addition, seven complete chloroplast genomes were included as outgroups, i.e., Myriophyllum spicatum L., Penthorum chinense L., and five newly sequenced chloroplast genomes of Haloragis aspera Lindl., Haloragis erecta (Murray) Oken, Glischrocaryon aureum (Lindl.) Orchard, Glischrocaryon glandulosum (Orchard) Christenh. and Byng, and Gonocarpus micranthus Thunb. These seven species belong to the family Haloragaceae sensu lato, which are the closest living relatives of Crassulaceae (Jian et al., 2008).
TABLE 1

Structural information of the chloroplast genomes of Crassulaceae and outgroups.

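Family | Subfamily | Clade | Species | Genome size (bp) | LSC length (bp) | SSC length (bp) | IR length (bp) | GC-content | No. of PCGs | GenBank accession no.
Crassulaceae | Crassuloideae | | Crassula perforata* | 145,737 | 79,465 | 16,652 | 24,810 | 37.8% | 85 | MW206794
Crassulaceae | Kalanchoideae | | Cotyledon tomentosa* | 150,049 | 82,250 | 16,995 | 25,402 | 38.2% | 85 | MW206793
Crassulaceae | Kalanchoideae | | Kalanchoe fedtschenkoi* | 150,001 | 82,015 | 17,012 | 25,487 | 37.7% | 85 | MW206796
Crassulaceae | Kalanchoideae | | Kalanchoe tomentosa | 150,757 | 82,846 | 17,051 | 25,430 | 37.6% | 85 | MN794319
Crassulaceae | Sempervivoideae | Acre | Graptopetalum amethystinum* | 150,365 | 82,009 | 16,764 | 25,796 | 37.9% | 84 | MW206795
Crassulaceae | Sempervivoideae | Acre | Pachyphytum compactum* | 149,339 | 81,041 | 16,750 | 25,774 | 37.9% | 84 | MW206798
Crassulaceae | Sempervivoideae | Acre | Sedum emarginatum | 149,188 | 81,399 | 16,721 | 25,534 | 37.8% | 82 | MT680404
Crassulaceae | Sempervivoideae | Acre | Sedum japonicum | 149,609 | 81,429 | 16,636 | 25,772 | 37.7% | 85 | KM281675
Crassulaceae | Sempervivoideae | Acre | Sedum lineare | 149,257 | 80,963 | 16,648 | 25,823 | 37.9% | 85 | MT755626
Crassulaceae | Sempervivoideae | Acre | Sedum plumbizincicola | 149,397 | 81,598 | 16,669 | 25,565 | 37.7% | 85 | MN185459
Crassulaceae | Sempervivoideae | Acre | Sedum sarmentosum | 150,448 | 82,212 | 16,670 | 25,783 | 37.7% | 85 | JX427551
Crassulaceae | Sempervivoideae | Aeonium | Aeonium arboreum* | 150,986 | 82,596 | 16,706 | 25,842 | 37.8% | 84 | MW206792
Crassulaceae | Sempervivoideae | Leucosedum | Rosularia alpestris | 151,288 | 82,931 | 16,785 | 25,786 | 37.8% | 85 | MN794333
Crassulaceae | Sempervivoideae | Sempervivum | Sempervivum tectorum* | 151,182 | 82,865 | 16,709 | 25,804 | 37.6% | 85 | MW206799
Crassulaceae | Sempervivoideae | Telephium | Hylotelephium ewersii | 151,699 | 83,253 | 16,838 | 25,804 | 37.7% | 85 | MN794014
Crassulaceae | Sempervivoideae | Telephium | Orostachys fimbriata* | 151,195 | 82,792 | 16,833 | 25,785 | 37.8% | 84 | MW206797
Crassulaceae | Sempervivoideae | Telephium | Orostachys japonica | 151,419 | 83,016 | 16,849 | 25,777 | 37.8% | 85 | MN794320
Crassulaceae | Sempervivoideae | Telephium | Phedimus aizoon | 151,393 | 82,868 | 17,043 | 25,741 | 37.7% | 85 | MN794321
Crassulaceae | Sempervivoideae | Telephium | Phedimus kamtschaticus | 151,652 | 83,010 | 16,688 | 25,977 | 37.8% | 85 | MG680403
Crassulaceae | Sempervivoideae | Telephium | Rhodiola integrifolia | 151,452 | 82,915 | 17,055 | 25,741 | 37.8% | 85 | MN794327
Crassulaceae | Sempervivoideae | Telephium | Rhodiola ovatisepala | 151,073 | 82,348 | 17,093 | 25,816 | 37.7% | 85 | MN794328
Crassulaceae | Sempervivoideae | Telephium | Rhodiola rosea | 151,348 | 82,716 | 17,052 | 25,790 | 37.7% | 85 | MH410216
Crassulaceae | Sempervivoideae | Telephium | Rhodiola yunnanensis | 151,257 | 82,561 | 17,008 | 25,844 | 37.8% | 85 | MN794332
Crassulaceae | Sempervivoideae | Telephium | Sinocrassula densirosulata* | 151,773 | 83,123 | 16,904 | 25,873 | 37.7% | 85 | MW206800
Crassulaceae | Sempervivoideae | Telephium | Sinocrassula indica | 151,755 | 83,159 | 16,888 | 25,854 | 37.7% | 85 | MN794334
Crassulaceae | Sempervivoideae | Telephium | Umbilicus rupestris | 150,995 | 82,681 | 16,926 | 25,694 | 37.6% | 85 | MN794335
Haloragaceae s.l. | | | Penthorum chinense | 156,686 | 86,735 | 18,399 | 25,776 | 37.3% | 84 | JX436155
Haloragaceae s.l. | | | Glischrocaryon aureum* | 158,417 | 87,743 | 18,718 | 25,978 | 37.1% | 83 | MW971555
Haloragaceae s.l. | | | Glischrocaryon glandulosum* | 158,146 | 88,123 | 18,743 | 25,640 | 36.8% | 84 | MW971556
Haloragaceae s.l. | | | Gonocarpus micranthus* | 158,655 | 88,165 | 19,000 | 25,745 | 42.8% | 83 | MW971559
Haloragaceae s.l. | | | Haloragis aspera* | 159,395 | 89,207 | 18,482 | 25,853 | 36.7% | 81 | MW971557
Haloragaceae s.l. | | | Haloragis erecta* | 159,414 | 89,043 | 18,555 | 25,908 | 36.7% | 84 | MW971558
Haloragaceae s.l. | | | Myriophyllum spicatum | 158,860 | 88,420 | 18,814 | 25,813 | 36.5% | 84 | MH191392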
The DNA materials of four species (Haloragis aspera, Haloragis erecta, Glischrocaryon aureum and Glischrocaryon glandulosum) were provided by DNA Bank of Royal Botanic Gardens, Kew[1]. Fresh leaves of the other ten species were collected from the field and preserved with silica gel. The total genomic DNA was extracted using a modified CTAB method (Allen et al., 2006). For each species, one paired-end library with an insert size of ∼350 base pairs (bp) was prepared from the total genomic DNA using the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs, MA, United States), which was then sequenced using the HiSeq 2500 System (Illumina, Inc., CA, United States) to obtain paired 150-bp reads. Briefly, (i) the genomic DNA was sonicated using the S220 Focused-ultrasonicator (Covaris, MA, United States), (ii) the fragmented DNA was end-repaired, dA-tailed, adapter ligated, and subjected to 10–12 cycles of PCR amplification, and (iii) the quality of each library was assessed using the 2100 Bioanalyzer system (Agilent, CA, United States).

Chloroplast Genome Assembly and Annotation

The raw Illumina reads were first filtered to remove paired-end reads if either of the reads contained (i) adapter sequences, (ii) more than 10% of N bases, or (iii) more than 50% of bases with a Phred quality score below 5. The filtered reads were then assembled using NOVOPlasty version 2.7.2 (Dierckxsens et al., 2017), and the complete chloroplast genome of Sedum sarmentosum Bunge (Dong et al., 2013) was used as the reference genome. These assemblies were manually inspected using Geneious version 11.0.3 (Kearse et al., 2012). The assembled chloroplast genomes were annotated using Plann version 1.1 (Huang and Cronk, 2015), and the positions of exons and introns were inspected and adjusted using Sequin version 15.50. In addition, the circular maps of the chloroplast genomes were drawn using OGDRAW version 1.2 (Lohse et al., 2013), and all annotated chloroplast genomes were deposited in GenBank (Sayers et al., 2020).
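The three read-filtering criteria can be illustrated with a minimal Python sketch. It is not the filtering tool used in the study (which is not named); the function names, the in-memory read representation (sequence string plus a list of Phred scores), and the adapter string are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

Read = Tuple[str, Sequence[int]]  # (bases, per-base Phred scores); hypothetical layout

# Placeholder adapter list for illustration only (assumption, not from the study).
ADAPTERS = ["AGATCGGAAGAGC"]


def read_fails(seq: str, quals: Sequence[int], adapters: Sequence[str],
               max_n_frac: float = 0.10, low_q: int = 5,
               max_low_q_frac: float = 0.50) -> bool:
    """Return True if one read meets any of the three rejection criteria."""
    if not seq:
        return True  # assumption: empty reads are rejected
    seq = seq.upper()
    if any(adapter in seq for adapter in adapters):      # (i) adapter present
        return True
    if seq.count("N") / len(seq) > max_n_frac:           # (ii) more than 10% N
        return True
    n_low = sum(1 for q in quals if q < low_q)           # (iii) more than 50% with Q < 5
    return n_low / len(seq) > max_low_q_frac


def keep_pair(r1: Read, r2: Read, adapters: Sequence[str] = ADAPTERS) -> bool:
    """A read pair is kept only if neither mate fails."""
    return not (read_fails(*r1, adapters) or read_fails(*r2, adapters))
```

Because the rule is applied per pair, a single low-quality mate discards both reads, which keeps the two FASTQ files synchronized for the assembler.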

Comparative Analysis of Chloroplast Genomes

For the 26 chloroplast genomes of Crassulaceae, the complete nucleotide sequences were compared using the glocal alignment algorithm Shuffle-LAGAN (Brudno et al., 2003) as implemented in the program mVISTA[2] (Frazer et al., 2004). Here, Rhodiola rosea L. was chosen as the reference to evaluate gene content variation following Zhao et al. (2020). To better determine whether any specific pattern of structural variation exists at the family level, the chloroplast genome of Rhodiola rosea was used as the representative owing to the highly conserved gene content and gene order within Crassulaceae (see section “Results”), and compared with that of Penthorum chinense using progressiveMauve (Darling et al., 2010) as implemented in the software package Mauve version 2.3.1 (Darling et al., 2004). Following Firetti et al. (2017), one of the inverted repeat (IR) regions was manually removed prior to the alignment. To further identify the hypervariable regions, coding and non-coding regions of the 26 chloroplast genomes were first extracted using PhyloSuite version 1.2.1 (Zhang et al., 2020) and aligned separately using MAFFT version 7.427 (Katoh and Standley, 2013) with default parameters. The nucleotide diversity (π) was then estimated separately for coding and non-coding regions using DnaSP version 6.0 (Rozas et al., 2017). Moreover, since the size variation of the chloroplast genome may be attributed to the expansion or contraction of the IR region, the boundaries between the IR and single-copy regions were identified using IRscope[3] (Amiryousefi et al., 2018) and manually inspected using Geneious.
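For readers unfamiliar with π, the following Python sketch computes nucleotide diversity as the mean number of pairwise differences per site for one aligned region. It is an illustration only, not DnaSP's implementation: the function name is hypothetical, and the choice to skip every alignment column containing a gap or ambiguous base (complete deletion) is an assumption.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(alignment: Sequence[str]) -> float:
    """Mean pairwise differences per site over columns with only A/C/G/T."""
    seqs = [s.upper() for s in alignment]
    if len(seqs) < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two sequences of equal aligned length")

    valid = set("ACGT")
    # Complete deletion: keep a column only if every sequence has an unambiguous base.
    columns = [col for col in zip(*seqs) if all(base in valid for base in col)]
    if not columns:
        return 0.0

    n_pairs = len(seqs) * (len(seqs) - 1) // 2
    differences = sum(
        sum(a != b for a, b in combinations(col, 2)) for col in columns
    )
    return differences / (n_pairs * len(columns))
```

Applying such a function to each extracted coding or non-coding alignment in turn would give one π value per region, which is the kind of per-region summary plotted in a diversity profile.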

Phylogenetic Analysis

The nucleotide sequences of the chloroplast protein-coding genes were aligned separately using MAFFT and then concatenated into a supermatrix using PhyloSuite. The optimal partitioning scheme and models of DNA sequence evolution were determined using the relaxed hierarchical clustering algorithm (Lanfear et al., 2014) as implemented in PartitionFinder v2.1.1 (Lanfear et al., 2017). Phylogenetic relationships were inferred for the 26 species of Crassulaceae together with the seven outgroup species (i.e., the 33-taxon supermatrix) using both Bayesian inference (BI) and maximum likelihood (ML) methods. For the BI method, four parallel Markov chain Monte Carlo (MCMC) runs were performed using MrBayes version 3.2.7 (Ronquist et al., 2012). The supermatrix was partitioned based on the optimal scheme determined by PartitionFinder, and the best-fitting substitution model was specified for each partition with model parameters unlinked across partitions. A total of 1,000,000 generations were run with sampling every 500 generations, and the first 25% of samples were discarded as burn-in. Convergence of runs was assumed when the average standard deviation of split frequencies dropped below 0.01. The best-scoring ML tree was inferred using RAxML version 8.2.11 (Stamatakis, 2014) with the GTRGAMMAX model for each partition, and branch support was assessed using the rapid bootstrap algorithm (Stamatakis et al., 2008) with 1,000 replicates. To test the potential effect of uneven taxon sampling across genera, we further subsampled the 26 species of Crassulaceae down to a single representative species per genus. Phylogenetic relationships were then estimated from the resulting 20-taxon supermatrix as described above.
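The concatenation step can be sketched in a few lines of Python. This is not PhyloSuite's code; the function names, the dictionary-based input layout, and the use of "?" to pad taxa lacking a gene are assumptions chosen for illustration. The sketch also records partition coordinates (1-based, inclusive), which is the information a partitioning tool needs as its starting scheme.

```python
from typing import Dict, List, Sequence, Tuple


def concatenate(gene_alignments: Dict[str, Dict[str, str]],
                taxa: Sequence[str],
                missing: str = "?") -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Join per-gene alignments into one supermatrix, padding absent taxa."""
    pieces: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[str, int, int]] = []
    start = 1
    for gene, alignment in gene_alignments.items():
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are empty or not aligned to one length")
        length = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(alignment.get(taxon, missing * length))
        partitions.append((gene, start, start + length - 1))
        start += length
    return {taxon: "".join(parts) for taxon, parts in pieces.items()}, partitions


def missing_fraction(supermatrix: Dict[str, str], missing_chars: str = "-?N") -> float:
    """Share of cells that are gaps or undetermined characters."""
    total = sum(len(seq) for seq in supermatrix.values())
    absent = sum(seq.upper().count(ch) for seq in supermatrix.values() for ch in missing_chars)
    return absent / total
```

A helper like `missing_fraction` corresponds to the kind of summary statistic usually reported alongside a supermatrix (gaps plus undetermined characters as a share of all cells).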

Positive Selection Analysis

The likelihood ratio test (LRT) and Bayes empirical Bayes (BEB) analysis based on the modified branch-site model (Yang et al., 2005; Zhang et al., 2005; Yang and Dos, 2011) were used to identify positively selected genes. Since the topologies of the phylogenetic trees constructed by the ML and BI methods were congruent, the ML tree was used for the positive selection analysis. The amino acid sequences of the sixty-seven common protein-coding genes were aligned using MAFFT and converted into nucleotide alignments using PAL2NAL version 14 (Suyama et al., 2006). The nucleotide alignments were trimmed using trimAl version 1.4 (Capella-Gutiérrez et al., 2009) to obtain the final alignments for positive selection analysis. The branch-site model was run with the codeml program in PAML version 4.9 (Yang, 2007). The branch-site test of positive selection was run with the ω of the foreground lineage fixed to 1 (fix_omega = 1) for the null hypothesis and estimated (fix_omega = 0) for the alternative hypothesis. The LRT values at df = 1 were evaluated with the chi-square test in PAML, and genes with p < 0.05 were treated as candidates for positive selection. Finally, BEB was used to identify positively selected codon sites.
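As a worked illustration of the LRT at df = 1, the sketch below converts a pair of log-likelihoods into a p-value using only the Python standard library. The function name is hypothetical, and clamping a negative statistic to zero is an assumption of this sketch rather than a documented step of the study. For one degree of freedom the chi-square upper tail reduces to erfc(sqrt(x / 2)).

```python
import math


def lrt_pvalue(lnl_null: float, lnl_alt: float) -> float:
    """P-value of the likelihood ratio test with one degree of freedom."""
    # Test statistic 2 * (lnL_alt - lnL_null); clamped at zero (assumption) because
    # a nested alternative should not fit worse than the null.
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Upper tail of chi-square with df = 1: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


# Rounded lnL values reported for ndhJ in Table 2.
p_ndhj = lrt_pvalue(-1546.588, -1544.996)
print(f"ndhJ: p = {p_ndhj:.3f}")
```

With the rounded ndhJ values the statistic is 3.184 and the p-value comes out near 0.074, in line with the tabulated 0.075 once rounding of the reported log-likelihoods is allowed for, and above the 0.05 threshold.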

Results

Characteristics of Chloroplast Genomes in Crassulaceae

After quality control and pre-processing, at least four gigabases (Gb) of whole-genome sequencing data were obtained for each of the nine species (Table 1). These clean reads were assembled into high-quality chloroplast genomes using a reference-guided approach, and the resulting coverage ranged from 1,115 × (i.e., Cotyledon tomentosa) to 12,687 × (i.e., Kalanchoe fedtschenkoi). All these newly assembled chloroplast genomes exhibited a typical quadripartite structure, with two IR regions (i.e., IRa and IRb) separating the LSC and SSC regions (Figure 1).
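The quadripartite layout implies a simple arithmetic check: the genome size should equal LSC + SSC + 2 × IR. The Python sketch below applies that check to three rows copied from Table 1; the function and variable names are hypothetical and serve only as an illustration.

```python
from typing import Dict, List, Tuple

# (genome size, LSC, SSC, IR) in bp, copied from Table 1.
TABLE1_SUBSET: Dict[str, Tuple[int, int, int, int]] = {
    "Crassula perforata": (145_737, 79_465, 16_652, 24_810),
    "Cotyledon tomentosa": (150_049, 82_250, 16_995, 25_402),
    "Sinocrassula densirosulata": (151_773, 83_123, 16_904, 25_873),
}


def find_inconsistent(records: Dict[str, Tuple[int, int, int, int]]) -> List[str]:
    """Return species whose total length differs from LSC + SSC + 2 * IR."""
    return [
        species
        for species, (total, lsc, ssc, ir) in records.items()
        if total != lsc + ssc + 2 * ir
    ]


print(find_inconsistent(TABLE1_SUBSET))
```

For the three rows shown, the check returns an empty list, i.e., the reported region lengths add up to the reported genome sizes.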
FIGURE 1

Circular gene map of the Crassulaceae chloroplast genome. The genes labeled inside and outside of the circle are transcribed in clockwise and counterclockwise directions, respectively. The inner circle shows the quadripartite structure, with two IR regions (IRa and IRb) separating the large single-copy (LSC) and small single-copy (SSC) regions. The gray ring marks the GC-content with the inner circle indicating a 50% threshold.

The structure of the chloroplast genome appeared to be largely conserved across the family (Table 1). Among the 26 Crassulaceae species, the size of the chloroplast genome varied from 145,737 bp (i.e., Crassula perforata) to 151,773 bp (i.e., Sinocrassula densirosulata), and the overall GC-content ranged from 37.6% (i.e., Sempervivum tectorum) to 38.2% (i.e., Cotyledon tomentosa). In addition, the total number of annotated genes in each of these chloroplast genomes ranged from 131 (i.e., Sedum emarginatum Migo) to 134 (i.e., Sedum lineare Thunb.), and all these chloroplast genomes possessed 37 tRNA and four rRNA genes. Using Rhodiola rosea as the reference, the mVISTA analysis showed high similarity in gene content and gene order among the 26 chloroplast genomes, and further indicated a fairly high sequence similarity, especially in the coding regions (Supplementary Figure 2). This observation was corroborated by our analysis of nucleotide diversity (Figure 2). The nucleotide diversity in the coding regions ranged from 0 to 0.0794, with an average of 0.0230, which was significantly lower than that in the non-coding regions (range 0–0.1614, average 0.0647; p-value < 2.2 × 10^-16, Welch’s t-test). Here, the five coding regions with the highest nucleotide diversity were matK, ycf1, ndhF, rpl22, and rpl32, and the five non-coding regions with the highest nucleotide diversity were trnH-psbA, trnG-trnR, rpl32-trnL, rps16-trnQ, and ccsA-ndhD. Furthermore, no evidence of genomic rearrangement was found in the chloroplast genome of Crassulaceae when compared with that of Penthorum chinense using progressiveMauve (Supplementary Figure 3).
FIGURE 2

Distribution of nucleotide diversity (π) in coding (A) and non-coding (B) regions of 26 Crassulaceae chloroplast genomes.

The boundaries of the LSC, SSC, and IR regions were highly consistent within the family, and no obvious expansion or contraction of the IR region was detected in the 26 chloroplast genomes (Supplementary Figure 4). Here, trnH was shown to be the first gene in the LSC region at the junction between IRa and LSC (i.e., IRa/LSC). At the other end of the LSC region, the junction LSC/IRb was identified as located within the rps19 gene, which gave rise to a truncated copy of the rps19 gene in the IRa region. For both ends of the SSC region, the junctions IRb/SSC and SSC/IRa were found to be located within ndhF and ycf1 gene, respectively. As a consequence, a truncated copy of the ycf1 gene was retained in the IRb region.

Phylogenetic Relationships of Crassulaceae

The 33-taxon supermatrix contained a total of 79 genes and 70,905 sites, and the amount of missing data (including gaps and undetermined characters) was 4.5%. Phylogenetic analyses of the 33-taxon supermatrix using ML and BI methods yielded an identical topology (Figure 3), and all relationships were strongly supported by both methods, i.e., ≥85 ML bootstrap percentage (BP) and ≥0.99 Bayesian posterior probability (PP).
FIGURE 3

Phylogenetic inference of 33-taxon supermatrix using maximum likelihood (ML) and Bayesian inference (BI) methods. Branch support was assessed using ML bootstrap percentage (BP) and Bayesian posterior probability (PP), and internal branches with less than 100 BP/1.0 PP are indicated with corresponding values. Species with newly sequenced chloroplast genomes are marked with the asterisks, and clade designations are labeled accordingly.

All 26 Crassulaceae species formed a monophyletic group, and were divided into three subclades corresponding to the three subfamilies sensu Thiede and Eggli (2007), i.e., Crassuloideae, Kalanchoideae, and Sempervivoideae (Figure 3). When rooted with Haloragaceae s.l., Crassuloideae (represented by Crassula perforata) were resolved as sister to Kalanchoideae plus Sempervivoideae. Within Kalanchoideae, the two sampled species of Kalanchoe formed a monophyletic group that was sister to Cotyledon tomentosa. Within Sempervivoideae, our sampled species fell into five clades, i.e., Acre clade, Aeonium clade, Leucosedum clade, Sempervivum clade, and Telephium clade (Figure 3). Here, the Telephium clade was established as sister to the rest of Sempervivoideae, and further split into two lineages. One lineage comprised two sampled species of Phedimus and four sampled species of Rhodiola, which formed two reciprocally monophyletic groups, and the other contained the sampled species of Hylotelephium, Orostachys, Sinocrassula, and Umbilicus. Importantly, Umbilicus rupestris (Salisb.) Dandy was recovered as sister to a clade consisting of Hylotelephium, Orostachys, and Sinocrassula. The Sempervivum clade (represented by Sempervivum tectorum) and the Aeonium clade (represented by Aeonium arboreum) were placed as successive sister lineages to the Leucosedum clade [represented by Rosularia alpestris (Kar. and Kir.) Boriss.] plus the Acre clade. Furthermore, the five sampled species of Sedum in the Acre clade were recovered as a paraphyletic group, with Graptopetalum amethystinum and Pachyphytum compactum nested within them. Importantly, except for the non-monophyly of Sedum, the same set of intergeneric relationships was recovered from the 20-taxon supermatrix (Supplementary Figure 5), suggesting that our results are robust to taxon sampling.
A total of sixty-seven common genes were subjected to positive selection analyses (Table 2). The LRTs, all with p-value > 0.05, provided no statistical support for positive selection in any gene (Table 2), although the BEB approach identified fourteen genes (atpB, ndhE, ndhJ, petA, psaC, psaJ, psbB, psbD, psbN, rpoC2, rps15, rps3, rps7, and ycf4) containing codon sites with relatively high posterior probabilities.
TABLE 2

The positive selection test based on the branch-site model.

Gene name | lnL (null hypothesis) | ω = 1 (null hypothesis) | lnL (alternative hypothesis) | ω > 1 (alternative hypothesis) | BEB (site, amino acid, posterior probability) | P-value
accD | −5,981.560 | 1 | −5,981.396 | 999.000 | | 0.566
atpA | −5,342.633 | 1 | −5,342.633 | 3.873 | | 1.000
atpB | −4,756.516 | 1 | −4,756.516 | 1.000 | 19 Q 0.555; 33 F 0.596; 76 F 0.596; 85 I 0.570; 253 Q 0.585; 269 R 0.585; 307 N 0.592; 392 S 0.600; 424 M 0.593; 450 R 0.580; 458 K 0.599 | 1.000
atpE | −1,313.375 | 1 | −1,313.375 | 1.000 | | 1.000
atpF | −2,322.492 | 1 | −2,322.470 | 436.698 | | 0.841
atpH | −669.519 | 1 | −670.763 | 1.000 | | 0.115
atpI | −2,397.194 | 1 | −2,397.194 | 1.000 | | 1.000
ccsA | −4,278.565 | 1 | −4,278.565 | 3.710 | | 1.000
cemA | −2,983.494 | 1 | −2,982.704 | 999.000 | | 0.209
clpP | −2,044.145 | 1 | −2,043.967 | 71.380 | | 0.549
infA | −799.954 | 1 | −799.954 | 3.322 | | 1.000
ndhA | −4,381.033 | 1 | −4,381.036 | 3.345 | | 0.920
ndhB | −2,843.138 | 1 | −2,843.138 | 3.042 | | 1.000
ndhC | −1,206.968 | 1 | −1,206.968 | 3.034 | | 0.997
ndhE | −1,161.482 | 1 | −1,160.256 | 999.000 | 4 D 0.558; 5 F 0.566; 24 H 0.540; 57 F 0.566; 92 L 0.551 | 0.117
ndhG | −2,177.605 | 1 | −2,177.606 | 3.061 | | 1.000
ndhH | −4,525.830 | 1 | −4,525.830 | 3.179 | | 1.000
ndhI | −1,904.698 | 1 | −1,904.698 | 3.538 | | 1.000
ndhJ | −1,546.588 | 1 | −1,544.996 | 999.000 | 1 I 0.566; 14 R 0.576; 33 N 0.560; 51 Q 0.574; 78 F 0.578; 90 F 0.533; 107 N 0.583; 128 R 0.566 | 0.075
petA | −3,388.067 | 1 | −3,388.067 | 1.000 | 5 L 0.511; 7 Q 0.504; 32 I 0.511 | 1.000
petD | −1,543.021 | 1 | −1,543.021 | 1.000 | | 1.000
petL | −304.819 | 1 | −304.819 | 3.396 | | 1.000
petN | −195.153 | 1 | −195.153 | 3.790 | | 1.000
psaA | −6,285.558 | 1 | −6,285.558 | 1.000 | | 1.000
psaB | −6,370.189 | 1 | −6,370.272 | 1.000 | | 0.689
psaC | −650.938 | 1 | −650.673 | 26.521 | 53 H 0.553 | 0.467
psaJ | −381.370 | 1 | −381.370 | 1.000 | 8 R 0.766; 37 Q 0.759 | 1.000
psbA | −2,934.741 | 1 | −2,934.728 | 8.420 | | 0.888
psbB | −5,011.033 | 1 | −5,011.033 | 1.000 | 39 R 0.503; 73 K 0.508; 124 N 0.509; 155 F 0.509; 287 M 0.507; 289 M 0.521; 332 M 0.509; 505 K 0.502 | 1.000
psbC | −4,124.914 | 1 | −4,124.914 | 1.000 | | 1.000
psbD | −2,854.422 | 1 | −2,853.539 | 999.000 | 13 E 0.571; 159 R 0.518 | 0.183
psbE | −630.873 | 1 | −630.873 | 1.000 | | 1.000
psbF | −234.997 | 1 | −234.997 | 1.000 | | 1.000
psbH | −766.157 | 1 | −766.157 | 1.000 | | 1.000
psbI | −281.572 | 1 | −281.572 | 3.794 | | 1.000
psbJ | −283.178 | 1 | −283.178 | 1.000 | | 1.000
psbK | −700.844 | 1 | −700.824 | 609.836 | | 0.841
psbL | −264.296 | 1 | −264.296 | 1.000 | | 1.000
psbM | −329.704 | 1 | −329.704 | 1.000 | | 1.000
psbN | −266.750 | 1 | −268.002 | 1.291 | 11 F 0.822; 31 P 0.840; 34 Q 0.679 | 0.114
psbT | −332.758 | 1 | −332.758 | 1.000 | | 1.000
psbZ | −540.521 | 1 | −540.521 | 1.000 | | 1.000
rbcL | −4,364.557 | 1 | −4,364.557 | 1.379 | | 1.000
rpl14 | −1,092.232 | 1 | −1,092.205 | 999.000 | | 0.823
rpl16 | −1,532.108 | 1 | −1,532.008 | 33.646 | | 0.655
rpl2 | −1,729.233 | 1 | −1,729.233 | 1.000 | | 1.000
rpl20 | −1,410.503 | 1 | −1,410.496 | 234.313 | | 0.903
rpl23 | −514.597 | 1 | −514.594 | 5.914 | | 0.943
rpl32 | −768.421 | 1 | −768.421 | 3.262 | | 1.000
rpl33 | −770.969 | 1 | −770.721 | 999.000 | | 0.480
rpl36 | −358.642 | 1 | −358.642 | 6.466 | | 1.000
rpoA | −4,186.019 | 1 | −4,185.993 | 999.000 | | 0.823
rpoB | −12,323.016 | 1 | −12,323.016 | 2.583 | | 1.000
rpoC2 | −18,495.025 | 1 | −18,494.194 | 4.364 | 280 I 0.684; 794 R 0.683; 1,085 L 0.623 | 0.198
rps11 | −1,415.271 | 1 | −1,415.250 | 45.778 | | 0.841
rps12 | −765.506 | 1 | −765.3597 | 113.044 | | 0.590
rps14 | −989.959 | 1 | −989.959 | 1.000 | | 1.000
rps15 | −1,369.457 | 1 | −1,369.055 | 8.032 | 54 K 0.796 | 0.370
rps18 | −849.939 | 1 | −849.939 | 1.000 | | 1.000
rps19 | −1,186.279 | 1 | −1,186.279 | 1.979 | | 1.000
rps2 | −2,405.932 | 1 | −2,405.934 | 1.000 | | 1.000
rps3 | −2,700.278 | 1 | −2,700.278 | 1.000 | 74 I 0.505; 88 K 0.620; 89 N 0.504; 105 C 0.550; 180 H 0.590; 199 E 0.584 | 1.000
rps4 | −1,960.498 | 1 | −1,960.497 | 1.000 | | 1.000
rps7 | −786.610 | 1 | −786.345 | 49.908 | 12 F 0.914 | 0.467
rps8 | −1,693.306 | 1 | −1,693.306 | 1.000 | | 1.000
ycf3 | −1,302.337 | 1 | −1,302.337 | 1.000 | | 1.000
ycf4 | −2,054.432 | 1 | −2,054.432 | 1.000 | 2 M 0.600; 28 Q 0.646; 33 F 0.573; 54 D 0.595; 151 S 0.520 | 1.000

Discussion

In this study, we report nine newly sequenced complete chloroplast genomes of Crassulaceae. Our comparative analyses indicate that the overall gene organization of the chloroplast genome is highly conserved across all 26 Crassulaceae species investigated here. In addition, the results of IRscope analysis reveal no obvious expansion or contraction of the chloroplast IR region. Furthermore, although a recent study has identified a unique 4-kb inversion in the chloroplast genome of the outgroup species Myriophyllum spicatum (Liao et al., 2020), our analyses show a high degree of similarity in chloroplast gene order between Crassulaceae and the outgroup species Penthorum chinense. Previous studies have demonstrated that the size variation of the angiosperm chloroplast genome is primarily due to variation in the IR region, intergenic region, and gene copy number (Zheng et al., 2017; Amiryousefi et al., 2018; Bedoya et al., 2019). The results of mVISTA analysis suggest that hypervariable intergenic regions in the LSC region, such as petA-psbJ, psbM-trnD, psbZ-trnG, rps16-trnQ, and trnE-trnT (Supplementary Figure 2), contribute most to the chloroplast genome size variation within Crassulaceae. Moreover, phylogenetic studies of Crassulaceae have relied primarily on a limited set of chloroplast markers (e.g., matK, rps16, and trnL-trnF). Our comparative analyses, however, have shown that these chloroplast markers appear to be relatively low in nucleotide diversity (Figure 2), which may be partially responsible for the lack of phylogenetic resolution within Sempervivoideae (Mort et al., 2001; Messerschmid et al., 2020). Thus, to achieve better phylogenetic resolution, future studies of Crassulaceae should focus on molecular markers from more variable regions of the chloroplast genome, such as ccsA-ndhD, rps16-trnQ, rpl32-trnL, and trnH-psbA (Prince, 2015). 
With increased taxon sampling of key lineages of Crassulaceae, our phylogenetic analyses of chloroplast genome sequences have substantially improved the phylogenetic resolution, and provided robust inference of the infrafamilial relationships. Among the three subfamilies sensu Thiede and Eggli (2007), our phylogenetic analyses have confirmed the monophyly of Kalanchoideae and of Sempervivoideae, and suggested that Crassuloideae are sister to Kalanchoideae plus Sempervivoideae, corroborating previous studies based on DNA sequences and restriction site variation (e.g., van Ham and Hart, 1998; Mort et al., 2001, 2009; Bruyns et al., 2019; Folk et al., 2019). In addition, the sister relationship between Kalanchoideae and Sempervivoideae is supported by two putative morphological synapomorphies, i.e., leaves with a single apical or subapical hydathode and seeds with costate testa (Thiede and Eggli, 2007). Numerous studies have attempted to identify major lineages within Sempervivoideae (e.g., van Ham and Hart, 1998; Mort et al., 2001; Mayuzumi and Ohba, 2004; Nikulin et al., 2016; Folk et al., 2019; Messerschmid et al., 2020), but relationships among these major lineages remain uncertain (Supplementary Figure 1). Our analyses have revealed five clades that are successively sister lineages, i.e., Telephium clade, Sempervivum clade, Aeonium clade, Leucosedum clade, and Acre clade. These clades correspond to five out of the six major clades sensu Messerschmid et al. (2020). Of these six major clades, the lone exception here is the Petrosedum clade sensu Messerschmid et al. (2020), for which no chloroplast genome sequence is currently available. The Telephium clade identified here is equivalent to tribes Telephieae plus Umbiliceae sensu Thiede and Eggli (2007). 
Although it was first proposed by van Ham and Hart (1998), the monophyly of the Telephium clade has only recently been confirmed (i.e., ≥85 BP) by phylogenetic analyses of chloroplast genome sequences (Kim and Kim, 2020) and 301 low-copy nuclear genes (Folk et al., 2019). Our results add further evidence to support the recognition of the Telephium clade. In addition, Umbilicus rupestris is strongly supported by our phylogenetic analyses as sister to the tribe Telephieae (i.e., our sampled species of Hylotelephium, Orostachys, and Sinocrassula), thus corroborating recent findings (Folk et al., 2019; Kim and Kim, 2020; Zhao et al., 2020) and highlighting the paraphyly of the tribe Umbiliceae (i.e., our sampled species of Phedimus, Rhodiola, and Umbilicus). Furthermore, the Sempervivum and Aeonium clades identified here correspond to tribes Semperviveae and Aeonieae sensu Thiede and Eggli (2007), respectively, and the Leucosedum and Acre clades identified here together constitute tribe Sedeae sensu Thiede and Eggli (2007).
Genes under positive selection have been reported to play key roles in adaptation to diverse environments (Moseley et al., 2018; Gao et al., 2019b; Li et al., 2020b). However, positive selection was not statistically supported for any of the sixty-seven chloroplast protein-coding genes in the sampled Crassulaceae. This result indicates that these protein-coding genes are under strong structural and functional constraints. A similar absence of positive selection in most chloroplast protein-coding genes has also been reported in Euonymus (Li et al., 2021) and Quercus (Yin et al., 2018).
In conclusion, by expanding the number of informative molecular characters, we have further improved the resolution of the phylogenetic relationships among major lineages within Crassulaceae, which will facilitate the identification of non-molecular synapomorphies. However, additional sampling of key lineages (e.g., the Petrosedum clade) is required to fully resolve the infrafamilial relationships. Furthermore, the boundaries of some of the traditional genera in Crassulaceae remain poorly defined. For example, Sedum, the largest genus with approximately 470 species, has been shown to be highly polyphyletic, and its intrageneric relationships remain largely unresolved (Messerschmid et al., 2020). Thus, chloroplast phylogenomics will continue to enhance our understanding of the evolutionary history of Crassulaceae.
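The positive-selection result discussed above rests on a likelihood ratio test (LRT) between a null and an alternative branch-site model. A minimal sketch of that comparison follows, assuming the common convention of comparing the statistic with a chi-square distribution with one degree of freedom. The function name and the log-likelihood values are hypothetical and are not taken from this study.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt):
    """Likelihood ratio test for two nested models.

    Returns (statistic, p_value), where statistic = 2 * (lnL_alt - lnL_null)
    and the p-value comes from a chi-square distribution with 1 degree of
    freedom (an assumption of this sketch).
    """
    # The statistic cannot be negative for properly nested models;
    # clamp small negative values caused by optimisation noise.
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # Survival function of chi-square with df = 1: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods for one gene (illustrative values only).
stat, p_value = lrt_pvalue(lnl_null=-1000.0, lnl_alt=-999.2)
supported = p_value < 0.05  # the gene would count as supported only if True
```

When many genes are tested, the p-values would normally also be corrected for multiple testing before any gene is reported as positively selected.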

Data Availability Statement

All annotated chloroplast genomes have been deposited in GenBank (https://www.ncbi.nlm.nih.gov/genbank/), and accession numbers are provided in Table 1.

Author Contributions

JL, ZX, and XX designed research. HC, LZ, and HX performed research and analyzed data. HC, JL, ZX, and XX wrote the manuscript. All authors reviewed and revised the manuscript.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
References

1.  Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level.

Authors:  Jianzhi Zhang; Rasmus Nielsen; Ziheng Yang
Journal:  Mol Biol Evol       Date:  2005-08-17       Impact factor: 16.240

2.  [Molecular phylogeny and systematics of flowering plants of the family Crassulaceae DC].

Authors:  S B Goncharova; A A Goncharov
Journal:  Mol Biol (Mosk)       Date:  2009 Sep-Oct

3.  PartitionFinder 2: New Methods for Selecting Partitioned Models of Evolution for Molecular and Morphological Phylogenetic Analyses.

Authors:  Robert Lanfear; Paul B Frandsen; April M Wright; Tereza Senfeld; Brett Calcott
Journal:  Mol Biol Evol       Date:  2017-03-01       Impact factor: 16.240

4.  VISTA: computational tools for comparative genomics.

Authors:  Kelly A Frazer; Lior Pachter; Alexander Poliakov; Edward M Rubin; Inna Dubchak
Journal:  Nucleic Acids Res       Date:  2004-07-01       Impact factor: 16.971

5.  Disentangling the effects of key innovations on the diversification of Bromelioideae (bromeliaceae).

Authors:  Daniele Silvestro; Georg Zizka; Katharina Schulte
Journal:  Evolution       Date:  2013-09-11       Impact factor: 3.694

6.  PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments.

Authors:  Mikita Suyama; David Torrents; Peer Bork
Journal:  Nucleic Acids Res       Date:  2006-07-01       Impact factor: 16.971

7.  Plastome organization and evolution of chloroplast genes in Cardamine species adapted to contrasting habitats.

Authors:  Shiliang Hu; Gaurav Sablok; Bo Wang; Dong Qu; Enrico Barbaro; Roberto Viola; Mingai Li; Claudio Varotto
Journal:  BMC Genomics       Date:  2015-04-17       Impact factor: 3.969

8.  Plastid Genomes of Five Species of Riverweeds (Podostemaceae): Structural Organization and Comparative Analysis in Malpighiales.

Authors:  Ana M Bedoya; Bradley R Ruhfel; C Thomas Philbrick; Santiago Madriñán; Claudia P Bove; Attila Mesterházy; Richard G Olmstead
Journal:  Front Plant Sci       Date:  2019-08-20       Impact factor: 5.753

9.  OrganellarGenomeDRAW--a suite of tools for generating physical maps of plastid and mitochondrial genomes and visualizing expression data sets.

Authors:  Marc Lohse; Oliver Drechsel; Sabine Kahlau; Ralph Bock
Journal:  Nucleic Acids Res       Date:  2013-04-22       Impact factor: 16.971

10.  The complete chloroplast genome of Myriophyllum spicatum reveals a 4-kb inversion and new insights regarding plastome evolution in Haloragaceae.

Authors:  Yi-Ying Liao; Yu Liu; Xing Liu; Tian-Feng Lü; Ruth Wambui Mbichi; Tao Wan; Fan Liu
Journal:  Ecol Evol       Date:  2020-03-04       Impact factor: 2.912
