Literature DB >> 32736519

Evolutionary directions of single nucleotide substitutions and structural mutations in the chloroplast genomes of the family Calycanthaceae.

Wenpan Dong1,2, Chao Xu1, Jun Wen1,3, Shiliang Zhou4,5.   

Abstract

BACKGROUND: Chloroplast genome sequence data is very useful in studying/addressing the phylogeny of plants at various taxonomic ranks. However, there are no empirical observations on the patterns, directions, and mutation rates, which are the key topics in chloroplast genome evolution. In this study, we used Calycanthaceae as a model to investigate the evolutionary patterns, directions and rates of both nucleotide substitutions and structural mutations at different taxonomic ranks.
RESULTS: There were 2861 polymorphic nucleotide sites on the five chloroplast genomes, and 98% of polymorphic sites were biallelic. There was a single-nucleotide substitution bias in chloroplast genomes. A → T or T → A (2.84%) and G → C or C → G (3.65%) were found to occur significantly less frequently than the other four transversion mutation types. Synonymous mutations kept balanced pace with nonsynonymous mutations, whereas biased directions appeared between transition and transversion mutations and among transversion mutations. Of the structural mutations, indels and repeats had obvious directions, but microsatellites and inversions were non-directional. Structural mutations increased the single nucleotide mutations rates. The mutation rates per site per year were estimated to be 0.14-0.34 × 10- 9 for nucleotide substitution at different taxonomic ranks, 0.64 × 10- 11 for indels and 1.0 × 10- 11 for repeats.
CONCLUSIONS: Our direct counts of chloroplast genome evolution events provide raw data for correctly modeling the evolution of sequence data for phylogenetic inferences.

Entities:  

Keywords:  Calycanthaceae; Chloroplast genome; Indels; Structural mutations; Substitution rate

Mesh:

Substances:

Year:  2020        PMID: 32736519      PMCID: PMC7393888          DOI: 10.1186/s12862-020-01661-0

Source DB:  PubMed          Journal:  BMC Evol Biol        ISSN: 1471-2148            Impact factor:   3.260


Background

Genome evolution is a major theme of biology in the genomics era. The topics cover patterns, directions and rates of substitutions, repeats, rearrangements and recombinations, hybridization and polyploidy, lateral gene transfer, gene families, etc., with varying depth and scope [1, 2]. Genome evolution can be easily demonstrated by genome structure mutations and nucleotide substitutions. The mutations of genome structure include insertions/deletions (indels) and inversions. The nucleotide substitutions are classified into transition (Ts) and transversion (Tv). A genome consists of coding regions and noncoding regions (including introns and intergenic spacers). The nuclear genome is usually very large and complicate. Moreover, plant nuclear genome sequencing is still a bottle-neck due to high costs, bacteria or fungi contaminations, high heterozygosity, etc. [3, 4]. Only economically important or model plants have their complete nuclear genomes sequenced. Instead, the chloroplast genomes unique to plants are very much smaller and easier to manipulate, and it is more likely to give a complete picture of plant genome evolution. Therefore, chloroplast genomes are currently a right choice for genome evolution studies. Considerable attention has been paid to the evolution rate variations among genes or lineages [5, 6]. The chloroplast genes such as ndhF, matK, rbcL and trnL-F have been well studied [7, 8]. Chloroplast DNA shows a biased transition (Ts) mutation toward A and T. For example, the frequency of A and T at the 3rd codon position is fourfold of other nucleotides in the rbcL gene of angiosperms [8]. This explains why the chloroplast genomes are usually A/T rich. The rates of the single nucleotide mutations are not uniform among different genes. Transition/transversion ratios (Ts/Tv) are 0.9 for rbcL and 1.4 for matK [9, 10]. Among the eight transversion mutation possibilities, from A to T and C to G are significantly less frequent than the other four possibilities [11-13]. Whether this observation is a general pattern or only a special case remains to be tested at genome level. Structural mutations of genomes convey important evolution information of organisms. The indels and inversions are rich in chloroplast genomes and can be reliably identified and used to reveal the evolution of organisms [14, 15]. Structural mutations are not randomly distributed throughout the chloroplast genome [14]. Tandem repeat-induced indels showed a statistically significant bias towards A/T-rich and the indel mutation rate was estimated to be approximately 0.8 ± 0.04 × 10− 9 per site per year in Poaceae [16]. Short inversions also have a widespread occurrence in chloroplast genomes and often form stem-loop structures [17-19]. Understanding the evolution of such structural mutations is crucial for making full and correct use of the genome information [20]. The rates of DNA mutation is one of the core questions in molecular evolution [5]. The mutation rates can be estimated from either mutation accumulation (MA) lines [21, 22] or phylogenetic inference [23]. For the latter method, if the branch age is known, the absolute substitution rate can be calculated. The branch age is usually dated using calibrated molecular clocks. It is a common practice to infer the directions of mutations according to a phylogeny which, unfortunately, is usually based on the same dataset. It would be better if the phylogeny is independent to dataset [24]. The family Calycanthaceae serves as an ideal reference because the phylogenetic relationships within the family are self-evident. Calycanthaceae is a small family holding a position at the base of Laurales [25]. There are in total about 10 species belonging to three genera of two subfamilies (see below for more details). The phylogenetic topology of the family has only one possibility at subfamily, genus and species ranks (Fig. 1), which enables us to infer the chloroplast genome evolution in this family at different taxonomic ranks. In this study, we use Calycanthaceae as a model to empirically observe the directions and to estimate the rates of different mutations imprinted in the chloroplast genomes at different taxonomic ranks. More specifically, we are going to answer (1) if there are significant mutation rate differences between the two subfamilies, between the two genera within subfamily Calycanthoideae, and among species within Calycanthus and within Chimonanthus; and (2) what the evolutionary patterns, directions of nucleotide substitutions and structural mutations in the Calycanthaceae chloroplast genomes are like. Such kind of empirical data is of values for precisely modeling the chloroplast genome evolution.
Fig. 1

Phylogenetic relationships five species belonging to two subfamilies, three genera of the Calycanthaceae using two species of Magnoliaceae as outgroups

Phylogenetic relationships five species belonging to two subfamilies, three genera of the Calycanthaceae using two species of Magnoliaceae as outgroups

Methods

The family Calycanthaceae and sampling strategies

Calycanthaceae holds a basal position in Laurales [26]. The family is subdivided into two subfamilies: subfamily Idiospermoideae [one genus, one species, Idiospermum australiense (Diels) S. T. Blake] and subfamily Calycanthoideae (two genera, ca. nine species). Subfamily Idiospermoideae was sometimes considered a distinct family [27] but it is more natural in Calycanthaceae. There are three species in Calycanthus, one in China (C. chinensis Cheng & S. Y. Chang) and two in USA (C. floridus L. and C. occidentalis Hook. & Arn.). Calycanthus chinensis was once separated and put in the monotypic genus Sinocalycanthus. There are about six species in Chimonanthus. There are deciduous species like Ch. praecox L., and evergreen species like Ch. nitens Oliv. (see Zhou et al. 2006 for details). We selected five species, C. chinensis, C. floridus, Ch. nitens, Ch. praecox and Idiospermum australiense, to represent two subfamilies, three genera and deciduous and evergreen species within Chimonanthus. Thus their phylogenetic relationships are intuitively quite clear (Fig. 1). The chloroplast genome of C. floridus has been determined [28] and the genomes of other four species need to be determined. Young and healthy leaves of these species were collected. The voucher details of the samples are given in Supplementary Table S1. According to APG IV [29], more basal species Liriodendron tulipifera L. and Magnolia kwangsiensis Figlar & Noot. are used as outgroups.

Chloroplast genome sequencing and annotation

Genomic DNA was extracted from the silica gel-dried leaves of four species using mCTAB method [30] and purified using the Wizard DNA Clean-Up System (A7280, Promega). The fragments covering the whole chloroplast genomes were amplified using the universal primers provided by Dong et al. [31]. Specific primers were designed based on the chloroplast genome of C. floridus using Primer Premier v. 5.0 (Premier Biosoft International, CA, USA) and Oligo v. 6.71 (Molecular Biology Insights, CO, USA) to bridge the gaps (Supplementary Table S2). The genomes were assembled using Sequencher ver. 4.7 (Gene Code) with the genome of C. floridus as a reference. The resulting genomes were annotated using Dual Organellar Genome Annotator (DOGMA) [32]. All tRNA genes were further verified using the corresponding structures predicted by tRNAscan-SE 1.21 [33].

Codon usage analysis

Codon usage was determined for all protein-coding genes. The frequency of codon usage and the amino acid composition was determined using MEGA version 7 [34].

Chloroplast genome alignments and molecular dating

All five Calycanthaceae chloroplast genomes were aligned together with the outgroup L. tulipifera and M. kwangsiensis using MUSCLE v3.7 [35] and adjusted manually using Se-Al 2.0 [36]. To estimate the divergence times of subfamilies, genera and species within Calycanthaceae, we conducted molecular dating analyses by adding 22 more genome sequences from GenBank (Supplementary Table S3) representing the major monophyletic branches of basal angiosperms, monocots, rosids, Saxifragales, and asterids. Sequences of the eighty-three coding genes were extracted from the genomes, aligned and concatenated into a super matrix. Rare insertions and unreliably aligned regions were excluded from analyses. The dating methods were the same as Xue et.al [37]. For calibration, three constraints were used: (i) the angiosperm crown group was set to a minimum age of 131.8 Mya [38]; (ii) the eudicots crown was set to a minimum age of 125 Mya; (iii) the crown group of Calycanthaceae and Magnoliaceae was set to a maximum age of 140 Mya based on the onset of angiosperm radiation, and the minimum age of Calycanthaceae was set to a minimum age of 90 Mya according to the fossil record of Jerseyanthus calycanthoides [39].

Genome partition and mutation identification

The genome sequence data were subject to a series of grouping to test the variations of mutation rates and directionality: (1) coding regions, introns and intergenic spacers; (2) transition and transversion of coding genes; (3) synonymous (dS) and nonsynonymous (dN) substitution of coding genes; and (4) gene clusters of the same functions. Hierarchical outgroup usage was adopted to infer the directions of changes. At subfamily rank, L. tulipifera and M. kwangsiensis was used as outgroups; at genus rank within subfamily Calycanthoideae, I. australiense was used as an outgroup; and at species rank within Calycanthus and Chimonanthus, they were used as outgroups reciprocally. The directions of mutations were identified according to the outgroup(s). If a state is the same as the outgroup(s), the state is considered plesiomorphic, and accordingly, the state different from that of outgroup(s) is considered apomorphic. We classified structural mutations into four categories: insertions or deletions (indels, Fig. 2a), repeats (Fig. 2b), microsatellites (Fig. 2c & d), and inversions (Fig. 2e). Although microsatellites are also indels or repeats, we classified them out as a distinct category for their very high variabilities. Inversions were sought out using the REPtuter program [40] first and then confirmed by reexamining the alignments. Each inversion is always accompanied by an inverted repeat at the opposite franking end, which forms a stem-loop structure. Conventional statistics were used to give estimates and to test the significance of the numbers and ratios of the mutations. The Chi-square test, Fisher’s exact test, T-test, significance test of correlation coefficient, and Wilcoxon signed rank test were done with R packages. Genetic distances between any taxa were calculated using MEGA 5 [41].
Fig. 2

Structural mutation types in chloroplast genomes. a indel; b repeat; (c) and (d) microsatellite; e inversion

Structural mutation types in chloroplast genomes. a indel; b repeat; (c) and (d) microsatellite; e inversion

Estimation of mutation rate

The rate of mutation per site per year (μ) and its variance (ν) were estimated using the following formula [42]. Where m is the number of observed mutation, n is the number of total sites, and T is the divergence time of a node. The μ and ν values of structural mutations were calculated using the method of Saitou and Ueda [43]. In their method, the total number of structural mutations was divided by the additive time based on branch lengths and by the length of the nucleotide sequences.

Availability of supporting data

The data sets supporting the results of this article are given as supporting files. All nucleotide sequences were deposited in the NCBI GenBank repository (GenBank accession numbers: MH377056- MH377059).

Results

General features of the Calycanthaceae chloroplast genomes

The structures of the four new chloroplast genomes were very similar to that of C. floridus (Supplementary Figure S1, Table 1). The genome sizes varied from 153,250 bp (Ch. praecox) to 154,746 bp (I. australiense). The overall G + C content was 39.23–39.30%, and the coding regions accounted for 58.99 to 59.24%. All five genomes encoded 114 unique genes with identical gene order and gene clusters, 17 or 18 of them were duplicated in the inverted repeat (IR) region (Supplementary Table S4 and Figure S2). Each chloroplast genome comprised four rRNA genes (6S, 23S, 5S, 4.5S), 30 tRNA genes and 80 protein-coding genes including ycf1, ycf2, ycf3 and ycf4. Each of 16 genes contained a single intron, and each of two genes (ycf3 and clpP) contained two introns. The mRNA gene of rps12 was generated by transsplicing.
Table 1

Major features of the chloroplast genomes of five species in Calycanthus, Chimonanthus and Idiospermum

Genome featureC. floridusC. chinensisCh. nitensCh. praecoxI. australiense
GenBank accession numbersAJ428413MH377059MH377058MH377057MH377056
Size (bp)153,337153,346153,250153,252154,767
LSC length (bp)86,94886,98386,88286,91285,482
IR length (bp)23,29523,28423,33023,28724,860
SSC length (bp)19,79919,79519,70819,76619,565
Total number of genes, including ycfs131131131131132
Number of genes in IR1717171718
Number of genes with introns1818181818
GC content of the genome (%)39.339.2739.2739.2539.23
GC content of protein-coding genes, tRNAs and rRNAs (%)59.0459.2459.0759.0758.99
Major features of the chloroplast genomes of five species in Calycanthus, Chimonanthus and Idiospermum The transition areas between the large single copy region (LSC) and the IR (LSC/IRb and IRa/LSC) were identical in Calycanthoideae but different in Idiospermoideae. In Calycanthoideae, the rpl2 gene was in the IR region and the rps19 gene was in LSC region (Supplementary Figure S2). However, in Idiospermoideae, both the rpl2 and rps19 genes were in IR region. The codon usage and amino acid composition frequencies were not significantly different among the five Calycanthaceae chloroplast genomes (Supplementary Figure S3). The total protein coding genes comprised 68,394–68,463 bp that encoded 22,798–22,821 codons. Of these codons, the ATT (3.47%) was the most frequent codon and the TGC (0.30%) was the least. About 10.24% codons encoded leucine, whereas only 1.14% codon encoded cysteine, which were the most and the least frequently used amino acids in the chloroplast genome, respectively.

Evolution of the single nucleotide polymorphic loci

There were 2861 single nucleotide polymorphism (SNP) in the five Calycanthaceae chloroplast genomes. Among them, 2781 were biallelic, 22 (0.79%) SNPs were triallelic, and 58 (2.03%) were parallel mutation SNPs at different taxonomic ranks. The trialletic and parallel mutation SNPs were excluded from subsequent analyses. Of the 2781 sites, the mutation directions of 2573 sites were inferable according to the outgroups. The ratios of Ts to Tv were high at subfamily and species ranks (1.53 and 1.53 or 1.91, respectively) but low at genus rank (1.09), with an overall value of 1.61. The proportions of two transitional mutations and four transversional mutations varied significantly among one another at family, genus and species ranks (G test, P < 0.01). Among the four transversion mutations, A → C + T → G mutations were much more common than other three possibilities (Fig. 3). A → T + T → A and G → C + C → G were much fewer. However, the overall ratio of nonsynonymous mutations to synonymous mutations (dN/dS) in the coding regions nearly equaled 1 (Table 2). When the SNPs of the eight functional gene clusters (atp, ndh, pet, psa, psb, rpl, rpo, and rps) were sorted into photosynthetic metabolism- (atp and ndh), apparatus- (pet, psa and psb), and ribosomal protein-related (rpl, rpo and rps) groups, there was a tendency that the photosynthetic apparatus-related genes (pet, psa and psb) had the highest Ts/Tv (= 2.86 ~ 4.83) but the lowest dN/dS (= 0.17 ~ 0.38) values (Fig. 4), indicating these genes were under high selection pressure.
Fig. 3

Percentage of each substitution type in the chloroplast genomes of five species belonging to different subfamilies and genera in Calycanthaceae

Table 2

Taxonomic and genomic distribution of the biallelic single nucleotide polymorphic loci in the five chloroplast genomes of Calycanthaceae

SubfamilyGenusSpecies
Genome regionLength (bp)Value%Value%Calycanthus%Chimonanthus%Total%
Total substitutions130,34519961.532690.212830.222330.1827812.13
Coding region68,7639801.431180.171160.171180.1813321.94
 Nonsynonymous/4890.71530.08610.09640.096670.97
 Synonymous/4910.71650.09550.08540.086650.97
 dN/dS/1/0.82/1.11/1.19/1/
Intron14,1041851.31300.21360.26240.172751.95
Intergenic spacer43,1398311.931210.281310.3910.2111742.72
Fig. 4

The ratios of Ts/Tv and dN/dS in different functional gene groups

Percentage of each substitution type in the chloroplast genomes of five species belonging to different subfamilies and genera in Calycanthaceae Taxonomic and genomic distribution of the biallelic single nucleotide polymorphic loci in the five chloroplast genomes of Calycanthaceae The ratios of Ts/Tv and dN/dS in different functional gene groups The 2781 biallelic SNPs were subdivided into coding, intron and intergenic spacer regions and sorted into subfamily, genus and species ranks (Table 2). At subfamily rank, there were 1996 SNPs in total: 980 in coding regions, 185 in intron regions, and 831 in intergenic spacer regions. The percentages of SNP to the total lengths were 1.43, 1.31 and 1.93%, respectively, indicating sequences of intergenic spacers were relatively more variable, but not the intron regions. In the coding regions, the ratio of 489 nonsynonymous mutations to 491 synonymous mutations (dN/dS) was very close to 1.0. The SNPs at genus rank within Calycanthoideae or at species rank within Calycanthus and within Chimonanthus were much fewer than those at subfamily rank (Table 2). At genus rank, there were 269 mutations, 118 (43.9%) in coding regions, 30 (11.1%) in intron regions, and 121 (45.0%) in intergenic spacer regions. Deviation of dN/dS (=0.82) to synonymous mutations was obvious at genus rank. At species rank, sequence variability was similar to genus rank, with 283 mutations within Calycanthus and 233 within Chimonanthus (Table 2). Again the intergenic spacers were more variable than introns and coding regions. The ratios of dN/dS were larger than 1, indicating nonsyonymous mutations had been fixed in the genomes.

Evolution of structural mutations

Indel

There were 178 indels in total (Table 3). Among them 143 (80.3%) were observed at the subfamily rank, 11 (6.2%) at the genus rank, 11 (6.2%) within Calycanthus, and 13 (7.3%) within Chimonanthus. Most indels occurred in intergenic spacers. Only 10 indels were observed in coding regions (seven in ycf1, one in rpl33, rpoC1 and accD, respectively). The length of the indels ranged from one to 364 base pairs, but 75.8% of them were not longer than 10 bp. The direction of indel mutations was deletion-biased (Fisher’s exact test, P < 0.001). The deletions were about 8.5 times as much as the insertions (Table 3).
Table 3

Numbers, directions and locations of indels and repeats in the five chloroplast genomes of Calycanthaceae

SubfamilyGenusSpeciesTotal
Mutation typeCalycanthusChimonanthus
IndelTotal143111113178
 Size1 ~ 10 bp1126512135
11 ~ 50 bp1923125
50 ~ 100 bp532010
>100 bp70108
 Locationexon811010
intron2312127
space1129812141
 Directioninsertion1411117
deletion113101012145
uncertain1600016
RepeatTotal891773116
 Size3 bp40004
4 bp2530028
5 bp4092152
6 bp1522221
7 bp10102
9 bp30003
10 bp11003
>10 bp02204
 Locationexon42006
intron1821021
space67136389
 Directioninsertion75156399
deletion1121014
uncertain30003
Numbers, directions and locations of indels and repeats in the five chloroplast genomes of Calycanthaceae

Repeat

There were 116 repeating events (Table 3). Among them 89 (76.7%) were observed at the subfamily rank, 17 (14.7%) at the genus rank, 7 (6.0%) within Calycanthus, and 3 (2.6%) within Chimonanthus. Most repeats occurred in intergenic spacers. Only 6 repeats were observed in coding regions (three in ycf1, one in ndhF, rpoC1, and rpoC2, respectively). The length of the repeats ranged from three to 22 base pairs, and four to six base pairs were the most common (84.48%, Fig. 3). Unlike indels, the direction of repeat mutations was insertion-biased (Fisher’s exact test, P < 0.001). Approximately 85.3% of the total repeats were insertions.

Microsatellite

There were 129 microsatellites, 96 were poly A/T, 13 were poly C/G, eight were 2-bp loci and two were 3-bp loci. All microsatellite loci located in noncoding regions and were A/T-rich indels (Fisher’s exact test, P < 0.001). Their mutation directions were not inferable.

Inversion

There were 13 inversions of 2-bp to 92-bp in length in the five Calycanthaceae chloroplast genomes (Table 4). All inversions were accompanied by a pair of inverted repeats immediately flanking the inversion (Fig. 2e). Seven inversions occurred at the subfamily rank, two at the genus rank, three at the species rank, and one at both the subfamily and the species ranks. Eight inversions occurred in the LSC region, three in the SSC region, and two in the IR region. A 4-bp inversion occurred within the coding region of ndhG, an 8-bp inversion happened in the ndhA intron, and 11 inversions were found in intergenic spacers. The franking inverted repeats were from 7 bp to 36 bp in length. No correlation was inferred between the length of the inversions and the inverted repeats (p = 0.31). Convergence happened on inversions. For example, both C. chinensis and Ch. nitens shared an inversion at trnN-trnR, and C. floridus and Ch. praecox shared the inversion at psbC-trnS (Table 4).
Table 4

The genomic location, length and taxonomic distributions of 13 inversions. uc: direction uncertain; Cc: C. chinensis; Cf: C. floridus; Chn: Ch.nitens; Chp: Ch. praecox; Ia: I. australiense; Lt: Liriodendron tulipifera; Mk: Magnolia kwangsiensis

LocationLengthInversionsTaxonomic rank
loopstemMkLtIaCfCcChnChp
IRtrnNGUU-trnRACG1214nononoyesyesnonoGenus
IRtrnNGUU-trnRACG29217nonononoyesyesnoSpecies
LSCatpF-atpH56ucucnoyesyesyesyesSubfamily
LSCpetA-psbJ1636nonoyesnonononoSubfamily
LSCpsbC-trnSUGA711nonoyesyesnonoyesSpecies
LSCrps18-rpl201617nononoyesyesyesyesSubfamily
LSCrps2-rpoC2109nonoyesnonononoSubfamily
LSCrps4-trnT27nononoyesyesyesyesSubfamily
LSCtrnS-trnG312nononoyesyesyesyesSubfamily
LSCtrnTGGU-psbD47nonoyesyesyesnonoGenus
SSCndhA intron811noyesyesnonononoSubfamily
SSCndhG49ucucnoyesnononoSpecies
SSCrpl32-trnL28noucnoyesyesnoyesSubfamily, Species
The genomic location, length and taxonomic distributions of 13 inversions. uc: direction uncertain; Cc: C. chinensis; Cf: C. floridus; Chn: Ch.nitens; Chp: Ch. praecox; Ia: I. australiense; Lt: Liriodendron tulipifera; Mk: Magnolia kwangsiensis

Association of structural mutations to nucleotide substitutions

There would exist certain relationships between structural mutation (S) and nucleotide mutations (N). We calculated genetic distances (gd) between regions with and without structural mutations using SNP at three taxonomic ranks (Table 5). It was obvious that the ratio of N to S (N/S) was the smallest at subfamily rank and the largest at species rank. On the contrary, the genetic distances between regions with S (gdS) and without S (gdN) were the largest at subfamily rank and smallest at species rank. Similar patterns existed when only coding or noncoding regions were considered. Noticeably, the gdS was always larger than gdN, suggesting that the nucleotides in the regions with structural mutations were more variable than those in the regions without structural mutations.
Table 5

Associations of nucleotide mutations (N) to structural mutations (S). gdN: the genetic distance of the regions without structural mutations; gdS: the genetic distance of the regions with structural mutations. The distances are based on SNPs. Ia: Idiospermum australiense; Cc: Calycanthus chinensis; Cf: Calycanthus floridus; Chn: Chimonanthus nitens; Chp: Chimonanthus praecox

LevelComparisonWhole genomeNoncoding regionCoding region
NSN/SgdSgdNgdS/gdNgdSgdNgdS/gdNgdSgdNgdS/gdN
SubfamilyIa/Cc266425010.660.03170.01302.440.03150.01452.170.03230.01262.56
SubfamilyIa/Cf261424910.500.03120.01292.420.03070.01372.240.03250.01272.56
SubfamilyIa/Chn264726410.030.03130.01292.430.03070.01452.120.03290.01262.61
SubfamilyIa/Chp25872609.950.03070.01272.420.03010.01372.200.03250.01252.60
 GenusCc/Chn7154615.540.00760.00372.050.00810.00591.370.00630.00321.97
 GenusCc/Chp6674116.270.00720.00332.180.00770.00481.600.00590.00301.97
 GenusChn/Cf6874914.020.00750.00352.140.00770.00491.570.00700.00322.19
 GenusChp/Cf6174414.020.00690.00322.160.00700.00401.750.00650.00302.17
SpeciesCc/Cf3831722.530.00430.00182.390.00440.00261.690.00400.00162.50
SpeciesChp/Chn3341620.880.00330.00191.740.00350.00281.250.00300.00171.76
Associations of nucleotide mutations (N) to structural mutations (S). gdN: the genetic distance of the regions without structural mutations; gdS: the genetic distance of the regions with structural mutations. The distances are based on SNPs. Ia: Idiospermum australiense; Cc: Calycanthus chinensis; Cf: Calycanthus floridus; Chn: Chimonanthus nitens; Chp: Chimonanthus praecox

Evolutionary rates of nucleotide and structural mutations

The divergence times of the Calycanthaceae were estimated using a matrix of 83 genes from 27 species of angiosperms (Supplementary Table S3). The total length of the matrix was 63,525 bp. The divergence time of the family was estimated to be 110 Mya from other members in Laurales, 108.8 Mya between the Calycanthoideae and the Idiospermoideae, 18.7 Mya between Calycanthus and Chimonanthus, 7.6 Mya between the two species of Calycanthus, and 6.2 Mya between the two species of Chimonanthus (Supplementary Figure S4). The evolutionary rates of the genomes were calculated using the lengths of the genomes, the number of substitutions and the times since divergence. The rates of nucleotide substitution were 0.14 × 10− 9 per site per year at the subfamily rank, 0.19 × 10− 9 per site per year at the genus rank, 0.34 × 10− 9 per site per year in Calycanthus, 0.32 × 10− 9 per site per year in Chimonanthus, and 0.25 × 10− 9 per site per year in the Calycanthaceae on average. The mutation rates of indels and repeats were estimated to be 1.73 × 10− 11 per site per year. Indels had a rate of 0.64 × 10− 11 per site per year and repeats had a rate of 1.0 × 10− 11 per site per year. The rate of microsatellites and the inversions were not considered because these mutations lacked phylogenetic information and the directions were not inferable.

Discussion

Mutations are at least the raw materials if not a drive in evolution [5]. It is tedious to count the molecular mutations even in the small genomes such as chloroplast genomes of plants and most of our understanding of evolution is based on small regions instead of the whole genome. What seems more difficult is to determine the directions of mutations, a critical concept in evolution. The inherently obvious phylogenetic relationships within Calycanthaceae serve as an ideal reference for studying the chloroplast genome evolution at subfamily, genus and species ranks.

General patterns of chloroplast genome evolution

Mutations in Calycanthaceae had experienced a long history of natural selection and exhibited some unique patterns that changed our general notion on genome evolution. It is believed that there are sites which mutate more frequently than other sites. Such kind of sites were very rare in Calycanthaceae and perhaps in other species as well. About 98% of single nucleotide mutant sites had two alleles and only 22 (0.79%) sites had more than two alleles. Reverse mutations should be even rarer (2.03%). This implies that modeling the evolution of such uncertain sites does not add much knowledge to the study. Nucleotide mutations are of unequal possibilities. Changes from C or G to A or T were more common in the chloroplast genomes of Calycanthaceae (and other seed plants) [12, 13]. This explains why A or T is about 10% more than G or C. However, such kind of biased mutations happened only in the noncoding regions [12, 44]. The mutations biased to the opposite side in the coding regions. The G and C content of the coding genes was approximately 60%. Of course there are a few exceptions, esp. non-seed plants, for example, Selaginella [45]. It is postulated that the mutation rate of noncoding regions is higher than that of the coding regions. In our case, it is true for intergenic spacer regions. The nucleotide mutation rate in introns was almost the same as that of coding regions. Synonymous substitutions which do not change the function of a gene, are believed to be more frequently present than nonsynonymous substitutions. However, in the coding genes of the chloroplast genomes of Calycanthaceae, dN/dS equaled one, indicating a balanced synonymous and nonsynonymous mutations. There are mutation peaks and valleys in the plastid genomes owing to functional constrains [46], and nonsynonymous mutations are believed to be a genetic force that generates heterogeneity [47]. The dN/dS, the ratio indicating natural selection pressure, varies across the plastid genomes of the Calycanthaceae. Photosynthetic apparatus genes have the lowest ratio of dN/dS, but the rpo-genes show relatively high dN/dS (Fig. 4). Therefore, “lucky genes” might sometimes be found even from coding genes for phylogeny [48]. Structural mutations are not as common as nucleotide substitutions. Structural mutations occur mostly in noncoding regions, especially for microsatellite mutations (Table 2 and Table 3). Motifs of a single nucleotide repeats are much more common than the typical motifs of two to six nucleotides [14, 49, 50]. Structural mutations are characterized by insertions, deletions, repeats, inversions and microsatellites of small length, usually shorter than 10 bases or base pairs [14].

Direction of mutations

Although there are eight possible transversional changes and only four transitional changes, transitions are much more common in the chloroplast genomes. Among the eight transversions, their frequencies are different. Mutations of A → C + T → G and C → A + G → T were much more common than other mutations in the chloroplast genomes of Calycanthaceae. There is a nucleotide T bias in chloroplast genomes. Methylation-deamination and uncorrected post-replicative transition mutation lead to the replacement of C with T. The same phenomenon was observed on nuclear genome data [21, 51] and animal mitochondrial genome data [52]. Structural mutations are potential informative phylogenetic characters of low homoplasy if handled properly. For example, Arachis [53], yam [54], and Cynara [55] have 85, 43, 34 indels in their whole chloroplast genomes, respectively. However, the direction of structural mutations are of less focus [56]. Same indels and repeats do not occur by chance in different taxa and data of correctly coded indels and repeats are of very low homoplasy in phylogeny (Table 3). In Calycanthacee the direction of indels and repeats were easily inferable. For indels, deletions were much more common than insertions; and for repeats, repeat gains were much more than repeat losses. Microsatellites are sometimes used in phylogenetic analyses and a repeat number increase was observed in maize short term evolution [52]. However, in this study the directions of microsatellite mutations in the chloroplast genomes of Calycanthaceae did not have a general tendency either to increase or to decrease at family, genus or species ranks. Therefore, inclusion of the gaps induced by microsatellites above species rank is likely to introduce homoplasy into datasets. Parallel mutations of inversions among individuals were observed in Calycanthaceae. Inversions often happen due to repetitive structures by parallel or back mutations during chloroplast genome evolution [17, 19]. Inversions must be identified when doing alignment but the gaps induced by inversions should be used with caution above species rank. In short, the structural mutations are often a mixture of phylogenetic signals and noises and only phylogenetic informative structural mutation information is considered when nucleotide data are insufficient to resolve a phylogeny [57, 58].

Taxonomic rank- and genome position-related mutations

It is now a consensus that mutation rate varies among lineages [59]. Many studies showed that annual plants evolve faster than perennials and trees [5, 60]. Our study demonstrated an accelerated evolution from subfamily to species in Calycanthaceae. The mutation rate is the lowest at subfamily rank and highest at species rank. Such a heterogeneity of mutation rates among taxonomic ranks is probably due to life history, time of selection and random mutations [5, 61]. In Calycanthaceae, the ancestors were likely to be large trees of very long life history such as I. australiense. Shrubs of relatively short life history are a derived character state [60]. From the branch lengths it is easy to understand that ancestors of higher taxonomic ranks experienced longer natural selection that might have eliminated quite many variations. Evolution might accelerate in one historical period and decelerate in the other. Although it is well known that mutation rates vary significantly among genes of genomes [62]. Some regions have been proven more variable than others [63-65], for example, ycf1, trnH-psbA, rpl32-trnL. Generally ribosomal protein-related and photosynthetic metabolic genes had higher mutation rates than photosynthetic apparatus genes in Calycanthaceae chloroplast genome (Fig. 4). There are two explanations. The first one is that the ribosomal protein-related and photosynthetic metabolic genes have relaxed evolutionary constraint. For example, the rpl32 have been completely lost from the chloroplast genome in multiple lineages within the land plants [66, 67]. The second one is the functions of some of these genes have been taken over by some genes in the nuclear genome and strong coevolution between plastid- and nuclear-encoded subunits that have accelerated mutation rates [67-70]. The regions of rich structural mutations also had higher nucleotide diversity in the chloroplast genomes [44, 71, 72].

Conclusions

A detailed understanding of the characteristics of the chloroplast genome evolution is helpful for correct use of chloroplast genome data in evolutionary biology. We observed the taxonomic and genomic distributions of mutations existed in the five chloroplast genomes of Calycanthaceae, inferred their directions, and estimated the rate of evolution. These direct observations provide raw data for computer simulation and modeling true evolution of sequence data. Additional file 1: Table S1. Sample accession numbers of the four species in Calycanthaceae analyzed in this study and their localities with voucher information. Additional file 2: Table S2. List of specific primers used to amplify and sequence the chloroplast genomes of Calycanthaceae. Additional file 3: Table S3. List of taxa used for estimating the divergence time of the Calycanthaceae. Additional file 4: Table S4. List of genes found in the chloroplast genomes of Calycanthaceae. Additional file 5: Figure S1. Chloroplast genome map of Chimonanthus praecox, a representative of all four newly determined chloroplast genomes. The genes inside and outside the circle are transcribed clockwise and counterclockwise, respectively. Genes that belong to different functional groups are represented by different colors. The thick lines indicate the extent of the inverted repeats (IRa and IRb) that separate the genomes into small (SSC) and large (LSC) single-copy regions. The flower in the center is of Chimonanthus praecox. Additional file 6: Figure S2. Detailed view of the border regions between the inverted repeats and the single-copy regions of the Calycanthaceae chloroplast genomes. The figure is not to scale. Additional file 7: Figure S3. The codon usage in the Calycanthaceae chloroplast genomes. Additional file 8: Figure S4. Divergence times of the crown groups estimated using BEAST version 1.6.1 under the uncorrelated lognormal (UCLN) model based on 83 chloroplast genes of 26 taxa. Numbers on the nodes are the estimated medium ages (Mya).
  57 in total

1.  Single-nucleotide mutation rate increases close to insertions/deletions in eukaryotes.

Authors:  Dacheng Tian; Qiang Wang; Pengfei Zhang; Hitoshi Araki; Sihai Yang; Martin Kreitman; Thomas Nagylaki; Richard Hudson; Joy Bergelson; Jian-Qun Chen
Journal:  Nature       Date:  2008-07-20       Impact factor: 49.962

Review 2.  Mutational dynamics of microsatellites.

Authors:  Atul Bhargava; F F Fuentes
Journal:  Mol Biotechnol       Date:  2010-03       Impact factor: 2.695

3.  Unparalleled GC content in the plastid DNA of Selaginella.

Authors:  David Roy Smith
Journal:  Plant Mol Biol       Date:  2009-09-23       Impact factor: 4.076

4.  A genome-wide view of Caenorhabditis elegans base-substitution mutation processes.

Authors:  Dee R Denver; Peter C Dolan; Larry J Wilhelm; Way Sung; J Ignacio Lucas-Lledó; Dana K Howe; Samantha C Lewis; Kazu Okamoto; W Kelley Thomas; Michael Lynch; Charles F Baer
Journal:  Proc Natl Acad Sci U S A       Date:  2009-09-10       Impact factor: 11.205

5.  Neighboring base composition and transversion/transition bias in a comparison of rice and maize chloroplast noncoding regions.

Authors:  B R Morton
Journal:  Proc Natl Acad Sci U S A       Date:  1995-10-10       Impact factor: 11.205

6.  The tRNAscan-SE, snoscan and snoGPS web servers for the detection of tRNAs and snoRNAs.

Authors:  Peter Schattner; Angela N Brooks; Todd M Lowe
Journal:  Nucleic Acids Res       Date:  2005-07-01       Impact factor: 16.971

7.  A chloroplast genomic strategy for designing taxon specific DNA mini-barcodes: a case study on ginsengs.

Authors:  Wenpan Dong; Han Liu; Chao Xu; Yunjuan Zuo; Zhongjian Chen; Shiliang Zhou
Journal:  BMC Genet       Date:  2014-12-20       Impact factor: 2.797

8.  ycf1, the most promising plastid DNA barcode of land plants.

Authors:  Wenpan Dong; Chao Xu; Changhao Li; Jiahui Sun; Yunjuan Zuo; Shuo Shi; Tao Cheng; Junjie Guo; Shiliang Zhou
Journal:  Sci Rep       Date:  2015-02-12       Impact factor: 4.379

9.  Comparative analysis of the complete chloroplast genome sequences in psammophytic Haloxylon species (Amaranthaceae).

Authors:  Wenpan Dong; Chao Xu; Delu Li; Xiaobai Jin; Ruili Li; Qi Lu; Zhili Suo
Journal:  PeerJ       Date:  2016-11-10       Impact factor: 2.984

10.  Sequencing angiosperm plastid genomes made easy: a complete set of universal primers and a case study on the phylogeny of saxifragales.

Authors:  Wenpan Dong; Chao Xu; Tao Cheng; Kui Lin; Shiliang Zhou
Journal:  Genome Biol Evol       Date:  2013       Impact factor: 3.416

View more
  10 in total

1.  Genome diploidization associates with cladogenesis, trait disparity, and plastid gene evolution.

Authors:  Sheng Zuo 左胜; Xinyi Guo 郭新异; Terezie Mandáková; Mark Edginton; Ihsan A Al-Shehbaz; Martin A Lysak
Journal:  Plant Physiol       Date:  2022-08-29       Impact factor: 8.005

2.  Phylogenomics and Genetic Diversity of Arnebiae Radix and Its Allies (Arnebia, Boraginaceae) in China.

Authors:  Jiahui Sun; Sheng Wang; Yiheng Wang; Ruishan Wang; Kangjia Liu; Enze Li; Ping Qiao; Linyuan Shi; Wenpan Dong; Luqi Huang; Lanping Guo
Journal:  Front Plant Sci       Date:  2022-06-09       Impact factor: 6.627

3.  Chloroplast phylogenomics and divergence times of Lagerstroemia (Lythraceae).

Authors:  Wenpan Dong; Chao Xu; Yanlei Liu; Jipu Shi; Wenying Li; Zhili Suo
Journal:  BMC Genomics       Date:  2021-06-09       Impact factor: 3.969

4.  The complete chloroplast genome of the Euphorbia maculata L. (Euphorbiaceae): characterization and phylogeny.

Authors:  Yiheng Wang; Jun Luo; Mengli Wang; Yang Ge; Lanping Guo
Journal:  Mitochondrial DNA B Resour       Date:  2020-11-06       Impact factor: 0.658

5.  Complete chloroplast genome of Engelhardtia fenzlii (Juglandaceae).

Authors:  Mu Liu; Jinsen Lu; Yuan Li; Lvshui Zhang
Journal:  Mitochondrial DNA B Resour       Date:  2021-01-27       Impact factor: 0.658

6.  Chloroplast phylogenomic insights into the evolution of Distylium (Hamamelidaceae).

Authors:  Wenpan Dong; Yanlei Liu; Chao Xu; Yongwei Gao; Qingjun Yuan; Zhili Suo; Zhixiang Zhang; Jiahui Sun
Journal:  BMC Genomics       Date:  2021-04-22       Impact factor: 3.969

7.  Comparative Chloroplast Genome Analyses of the Winter-Blooming Eastern Asian Endemic Genus Chimonanthus (Calycanthaceae) With Implications For Its Phylogeny and Diversification.

Authors:  Abbas Jamal; Jun Wen; Zhi-Yao Ma; Ibrar Ahmed; Long-Qing Chen; Ze-Long Nie; Xiu-Qun Liu
Journal:  Front Genet       Date:  2021-11-30       Impact factor: 4.599

8.  Chloroplast Genome Evolution and Species Identification of Styrax (Styracaceae).

Authors:  Yun Song; Wenjun Zhao; Jin Xu; MingFu Li; Yongjiang Zhang
Journal:  Biomed Res Int       Date:  2022-02-24       Impact factor: 3.411

9.  Haplotype Analysis of Chloroplast Genomes for Jujube Breeding.

Authors:  Guanglong Hu; Yang Wu; Chaojun Guo; Dongye Lu; Ningguang Dong; Bo Chen; Yanjie Qiao; Yuping Zhang; Qinghua Pan
Journal:  Front Plant Sci       Date:  2022-03-10       Impact factor: 5.753

10.  Comparative Analysis of Bacillariophyceae Chloroplast Genomes Uncovers Extensive Genome Rearrangements Associated with Speciation.

Authors:  Yichao Wang; Jing Wang; Yang Chen; Shuya Liu; Yongfang Zhao; Nansheng Chen
Journal:  Int J Environ Res Public Health       Date:  2022-08-14       Impact factor: 4.614

  10 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.