Literature DB >> 28333233

Contrasting Rates of Molecular Evolution and Patterns of Selection among Gymnosperms and Flowering Plants.

Amanda R De La Torre^1,2, Zhen Li^3,4, Yves Van de Peer^3,4,5, Pär K Ingvarsson^2,6.

Abstract

The majority of variation in rates of molecular evolution among seed plants remains both unexplored and unexplained. Although some attention has been given to flowering plants, reports of molecular evolutionary rates for their sister plant clade (gymnosperms) are scarce, and to our knowledge differences in molecular evolution among seed plant clades have never been tested in a phylogenetic framework. Angiosperms and gymnosperms differ in a number of features, of which contrasting reproductive biology, life spans, and population sizes are the most prominent. The highly conserved morphology of gymnosperms evidenced by similarity of extant species to fossil records and the high levels of macrosynteny at the genomic level have led scientists to believe that gymnosperms are slow-evolving plants, although some studies have offered contradictory results. Here, we used 31,968 nucleotide sites obtained from orthologous genes across a wide taxonomic sampling that includes representatives of most conifers, cycads, ginkgo, and many angiosperms with a sequenced genome. Our results suggest that angiosperms and gymnosperms differ considerably in their rates of molecular evolution per unit time, with gymnosperm rates being, on average, seven times lower than angiosperm species. Longer generation times and larger genome sizes are some of the factors explaining the slow rates of molecular evolution found in gymnosperms. In contrast to their slow rates of molecular evolution, gymnosperms possess higher substitution rate ratios than angiosperm taxa. Finally, our study suggests stronger and more efficient purifying and diversifying selection in gymnosperm than in angiosperm species, probably in relation to larger effective population sizes.

Entities: Chemical Disease Species

Keywords: angiosperms; gymnosperms; life-history traits; mutation; selection; substitution rates

Mesh：

Year: 2017 PMID： 28333233 PMCID： PMC5435085 DOI： 10.1093/molbev/msx069

Source DB: PubMed Journal: Mol Biol Evol ISSN： 0737-4038 Impact factor: 16.240

Introduction

The study of the forces of mutation and selection and their effects at the molecular and phenotypic levels is crucial to understand how species have evolved over time (Lynch 2010). Under a strictly neutral model, the fate of substitutions at the molecular level is mainly determined by mutation and random genetic drift instead of by natural selection (Kimura 1968). Under the neutral theory, mutations are assumed to be selectively neutral, nearly neutral (S = 0) or strongly deleterious (S = –∞) while advantageous mutations are assumed to be too rare to have a significant effect on sequence evolution. Later, Ohta (1992) proposed a modified version of the neutral theory in which a substantial fraction of substitutions are caused by the random fixation of slightly deleterious mutations, and that a small fraction of all new mutations may have positive selection coefficients (the so called “nearly neutral model”). Although the strictly neutral model remains the most commonly used null model in population genetics, the controversy regarding the different models of evolution often comes down to a discussion about the importance of positive selection (Nielsen and Yang 2003). Selective pressures on amino acid mutations are often measured in comparative studies using the ratio of non-synonymous to synonymous substitution rates (also called “omega” and denoted by ω or dN/dS). When an amino acid change is neutral, the rate of fixation will be the same as that of a synonymous mutation, and ω = 1. Amino acid altering substitution rate ratios are denoted by ω < 1 or ω > 1, indicating negative selection, and positive selection, respectively (Yang and Nielsen 2002). Omega thus also gives an estimate of the rate of substitution at selected sites in comparison to the neutral substitution rate. Several codon-based likelihood models have been developed to study the distribution of ω among sites (site models) or among branches (branch models) using a phylogenetic framework (Yang et al. 1997; Yang and Nielsen 1998; Yang et al. 2000; Nielsen and Yang 2002). Estimating ω along particular lineages of a phylogeny allows the testing of hypotheses regarding the relative effects of selection in certain lineages of interest (Nielsen 2005); and also for the estimation of the distribution of selection coefficients among taxa. Because positive selection most likely affects only a few sites at a few time points, the evaluation of individual sites (site models) and branches (branch models) has more power to detect adaptive evolution than the pairwise sequence comparison, in which substitution rates are averaged over all amino acid sites (Yang 2002). Substitution rates have been reported to be variable across the tree of life, however the causes underlying this variation remain uncertain. In flowering plants, this variation has been explained by differences in life forms, height, generation times, genome size, environmental variables, and species richness (Gaut et al. 1992; Smith and Donoghue 2008; Lanfear et al. 2013; Bromham et al. 2015). Variable substitution rates have also been reported across nuclear and organelle genomes, and among genes with different functional categories (Bromham et al. 2015). Although considerable attention has been given to flowering plants (angiosperms), very few reports on the rate of molecular evolution exist for their sister seed plant clade, the gymnosperms, and to our knowledge differences in rates of molecular evolution among angiosperms and gymnosperms have never been tested within a phylogenetic framework. Gymnosperms are an ancient and widespread plant clade that represent four of the five main lineages of seed plants, and includes cycads, ginkgos, gnetophytes, and conifers (Wang and Ran 2014). Gymnosperm lineages separated from each other during the Late Carboniferous to the Late Triassic (311–212 mya), earlier than the occurrence of the earliest extant angiosperms around 300 mya (Magallón et al. 2013). Despite being dominant through most of the Mesozoic, gymnosperms were severely affected by extreme climatic shifts especially during the late Neogene, which may have favored the disproportionate loss of ancient lineages and their replacement by younger lineages in the Northern Hemisphere (Won and Renner 2006; Crisp and Cook 2011; Leslie et al. 2012). Cenozoic extinctions may have contributed to the low diversity of extant gymnosperms (Wang and Ran 2014), however they are unlikely to explain the 30-fold difference in species diversity between gymnosperm and angiosperm species. Besides differences in species diversity, gymnosperms and angiosperms differ in a number of features, of which contrasting reproductive biology (mating system, pollination type, and seed morphology), physiology (water-conducting systems), and life spans are the most prominent (Leitch and Leitch 2012; De La Torre et al. 2014a). Gymnosperms are typically outcrossing species, in which wind plays the main role in the pollination and dispersal of uncovered seeds. In addition, widespread gymnosperms are thought to have large effective population sizes and weak population structure (Neale and Kremer 2011). The recent genome sequencing of gymnosperm species has revealed that the enormous genomes of gymnosperms (20–40 Gb) are mainly composed of large and variable sets of transposable elements, and that they have similar numbers of protein-coding genes compared to other plant species (De La Torre et al. 2014a). The highly conserved morphology of gymnosperms, evidenced by the similarity of extant species to the earliest fossil records, and the high levels of macrosynteny among conifers (Pavy et al. 2012), have led scientists to believe that gymnosperms are slow-evolving plants (Won and Renner 2006). However, this remains a controversial issue, with studies showing opposing results (Willyard et al. 2007; Palme et al. 2009; Buschiazzo et al. 2012; Chen et al. 2012). Limitations of previous studies include the use of a small number of genes and/or species, the use of highly diverged species compared using different gene sets, and the lack of a phylogenetic framework that includes species from different taxonomic families. Considering that the rate of molecular evolution strongly depends on the selective constraints of proteins or amino acids, and these constraints are variable among genes, it seems risky to conclude significant differences in rates of molecular evolution from the comparison among different gene sets. Moreover, a recent whole-genome study in Picea species has found contrasting rates of sequence divergence among genes in relation to their functional category, duplication status and gene family size (De La Torre et al. 2015a). Evolutionary and phylogenetic analyses among gymnosperms and angiosperms have also been limited by the absence of orthologous genes that allow for such comparisons. In this study, we used a newly identified set of 42 single-copy genes obtained from whole genomic and/or transcriptomic data from a broad taxonomic sampling that includes all conifers (with the exception of Araucariaceae), cycads, ginkgo, and many angiosperms with sequenced genomes (Li Z, De La Torre AR, Sterck L, Canovas FM, Avila C, Von Arnold S, Ingvarsson PK, Van de Peer Y, in review). We aimed to test for differences in the rate of molecular evolution among gymnosperm and angiosperm species, and to understand the possible causes driving any such variation. With this, we hope we can contribute to a better understanding of the complex evolutionary relationships among major plant clades and to elucidate the main evolutionary processes that have shaped the seed plants we see today.

Results

Rates of Sequence Divergence

When evaluating pairwise estimates between species for each taxonomic family in the angiosperm and gymnosperm phylogeny (fig. 1), we found significant differences in the rates of sequence divergence between the two major plant clades. The results of the sign test indicated that dN, dS, and ω were significantly different between angiosperms and gymnosperms across the 42 genes evaluated. The number of synonymous substitutions per site (dS) was lower in gymnosperms than in angiosperms in all of the 42 genes studied (P < 0.001). Similarly, the number of non-synonymous substitutions per site (dN) was lower in gymnosperms for 86% of the studied genes (P < 0.001), whereas the ratio of non-synonymous/synonymous substitutions (ω) was higher in gymnosperms for 76.2% of the genes (P < 0.001) (see supplementary table S3, Supplementary Material Online). Consistent results were found when comparing branch estimations of dS (t = 3, df = 57.87, P < 0.01) and ω (t = –4.0379, df = 37.8, P < 0.001); however dN was not significantly different between angiosperms and gymnosperms when comparing branch estimations in all terminal branches (t = –0.319, df = 52.34, P = 0.751) (see supplementary table S4, Supplementary Material Online). The most contrasting differences were found in the absolute rates of silent-site divergence (μ), with angiosperms rates being, on average, seven times higher than that observed in gymnosperm species (5.35 × 10 − 9 vs. 7.71 × 10 − 10 synonymous substitutions/site/year, respectively). Families showing the highest rates were Brassicaceae and Poaceae, and the ones with the lowest rates were Cuppresaceae and Podocarpaceae. Within gymnosperms, Gnetophytes had the highest rates of sequence divergence (table 1).

Table 1

Absolute Rates of Silent-Site Divergence (μ) for Each Taxonomic Family or Subfamily Based on 42 Single-Copy Nuclear Genes in 31 Gymnosperms and 34 Angiosperms Species.

Plant Clade	Taxa	Subtaxa	dS	T (years)	μ (site/year)
Gymnosperms	Pinaceae		0.102		7.28067E–10
		Pinus	0.070	84500000	4.16479E–10
		Cedrus_Abies	0.128	129700000	4.9387E–10
		Larix_Pseudotsuga	0.148	61600000	1.19939E–09
		Picea	0.061	38300000	8.02526E–10
	Cupressaceae		0.189	159200000	5.92216E–10
	Taxaceae		0.225	153000000	7.34573E–10
	Podocarpaceae		0.170	146100000	5.81248E–10
	Cycadales		0.333	248100000	6.70325E–10
	Gnetophytes		0.073	25000000	1.45547E–09
Angiosperms	Brassicaceae		0.352	27000000	6.51648E–09
	Malvaceae		0.373	40900000	4.56E–09
	Euphorbiaceae		0.463	51000000	4.54382E–09
	Fabaceae		0.502	53100000	4.72809E–09
	Cucurbitaceae		0.239	20000000	5.9795E–09
	Rosaceae		0.421	52300000	4.02113E–09
	Solanaceae		0.084	7300000	5.75068E–09
	Poaceae		0.372		6.0277E–09
		Poaceae I	0.457	39700000	5.76174E–09
		Poaceae II	0.287	22800000	6.29366E–09

Note.—Divergence times (T) are based on fossil calibration data and published studies (see supplementary table S2, Supplementary Material Online).

Phylogenetic tree obtained from a concatenated alignment of 42 single-copy genes and 66 species distributed between two main seed plant lineages (angiosperms in green, gymnosperms in blue, and outgroup in black) inferred by RAxML. Stem branches for angiosperms and gymnosperms were used for model testing using the branch models implemented in codeml. Results of the model testing can be found in table 2. Absolute Rates of Silent-Site Divergence (μ) for Each Taxonomic Family or Subfamily Based on 42 Single-Copy Nuclear Genes in 31 Gymnosperms and 34 Angiosperms Species. Note.—Divergence times (T) are based on fossil calibration data and published studies (see supplementary table S2, Supplementary Material Online).

Estimation of the Variation in Selective Pressures (ω) among Branches

We estimated the variation in selective pressures (ω) among stem branches in the gymnosperm and angiosperm phylogenies, using the branch models in PAML (fig. 1). The results of the model testing indicated that the most parameter-rich hypothesis (H4) fits the data best. This hypothesis assumes that all ancestral and terminal branches have different omegas. The second best hypothesis was H3, which suggests a long-term shift in selective pressure resulting in both lineages having different omegas (ω0≠ωA ≠ ωG). Likelihood ratio tests were significant for H1–H3 (2Δl = 3562, P < 0.001) and H2–H3 (2Δl = 129.31, P < 0.001). Hypothesis H2 ranked third followed by H1 and, hypothesis H0, where there were no differences in selective pressures between lineages (ω0= ωG = ωA) had the worst fit to the data (table 2). Omega ratios for gymnosperms were significantly higher than for angiosperms in all hypotheses tested with the exception of H0 (one-ratio model).

Table 2

Parameter Estimates under Models of Variable Omega (ω) among Branches from the Gymnosperm and Angiosperm Lineages Based on a Concatenated Alignment of 31,737 Nucleotide Sites in 61 Species.

Hypothesis	Model	No. of Parameters	Background	Foreground	Parameter Estimates	ln L	AIC
H0	M0	1	ω0 = ωG = ωA		ω = 0.11356	–621777.74	1243557.48
H1	M2	2	ω0 = ωG	ωA	ω0 = ωG = 0.1238; ωA = 0.0989	–621644.6499	1243293.3
H2	M2	2	ω0 = ωA	ωG	ω0 = ωA = 0.0935; ωG = 0.2603	–619928.0209	1239860.042
H3	M2	3		ω0, ωA, ωG	ω0 = 0.0857; ωA = 0.1024; ωG = 0.2631	–619863.362	1239732.724
H4	M1	120			ω ranges between 0.05 to 0.76	–617598.8386	1235437.677

Note.—We tested the following hypotheses: H0: Homogeneous selective pressure in both clades (ω0 = ωG = ωA); H1: Selective Pressure in the Angiosperm clade (ω0 = ωG, ωA); H2: selective pressure in the gymnosperm clade (ω0 = ωA, ωG); H3: long-term shift in selective pressure resulting in both clades having different omegas (ω0, ωG, ωA); and H4: all terminal branches in both clades have different omegas. For hypotheses H1 to H3, ωG and ωA identify estimated omegas on stem branches of the gymnosperm and angiosperm lineages, according to phylogenetic tree in figure 1; ω0 represents the estimated omega in unselected branches (background). Parameter estimates of model M1 (hypothesis H4) are detailed in supplementary table S7, Supplementary Material Online.

Parameter Estimates under Models of Variable Omega (ω) among Branches from the Gymnosperm and Angiosperm Lineages Based on a Concatenated Alignment of 31,737 Nucleotide Sites in 61 Species. Note.—We tested the following hypotheses: H0: Homogeneous selective pressure in both clades (ω0 = ωG = ωA); H1: Selective Pressure in the Angiosperm clade (ω0 = ωG, ωA); H2: selective pressure in the gymnosperm clade (ω0 = ωA, ωG); H3: long-term shift in selective pressure resulting in both clades having different omegas (ω0, ωG, ωA); and H4: all terminal branches in both clades have different omegas. For hypotheses H1 to H3, ωG and ωA identify estimated omegas on stem branches of the gymnosperm and angiosperm lineages, according to phylogenetic tree in figure 1; ω0 represents the estimated omega in unselected branches (background). Parameter estimates of model M1 (hypothesis H4) are detailed in supplementary table S7, Supplementary Material Online.

Estimation of Substitution Parameters Using Site Models

A discrete model (M3) that uses an unconstrained discrete distribution to model heterogeneous omega ratios among sites (Yang et al. 2000), showed the best fit to the data in all angiosperm taxonomic families with the exception of Brassicaceae (model 7: beta). The nearly neutral model (model 1a) that assumes a proportion of conserved sites with ω0 < 1 and a proportion of neutral sites with ω1 =1, and the selection model (M2a) that allows an additional class of sites with ω2 > 1, had the worst fit in all angiosperm taxa. Excluding poorly fitted M1a and M2a, the average ω among all models ranged from 0.092 to 0.093 (Brassicaceae), 0.106–0.108 (Malvaceae), 0.117–0.125 (Fabaceae), 0.118–0.156 (Rosaceae), and 0.131–0.133 (Poaceae). Average ω ratios suggest a nonsynonymous mutation has only 9–13% as much chance as a synonymous mutation of being fixed, suggesting most sites are highly conserved in angiosperm taxa. In gymnosperms, M8 (beta & ω) showed the best fit in Cupressaceae and Cycads, M10 (beta & gamma +1) in Pinus, and M2a (selection) in Picea. The beta model (M7) had the worst fit to the data in Picea and Pinus; and the gamma model (M5) in Cupressaceae and Cycads. Excluding poorly fitted M7 for Picea and Pinus, and M5 for Cuppressaceae and Cycads, the average ω among all models ranged from 0.66–0.67 (Pinus), 0.37–0.41 (Picea), 0.167–0.178 (Cuppresaceae), and 0.2–0.216 (Cycads). Overall, these estimates suggest a significantly higher chance of fixation of nonsynonymous mutations in gymnosperms (17–67%) than in angiosperm taxa. Parameters estimates for each of the six site models tested in each of the five angiosperm and four gymnosperm families are included in supplementary table S5, Supplementary Material Online. When comparing the proportion of conserved, neutral and positively selected sites under the discrete model (M3), we observed that all taxa had a higher proportion of conserved followed by neutral and then by positively selected sites. Conserved sites (ω < 1) in angiosperms ranged from 55% to 77%, whereas gymnosperm sites ranged from 66% to 76%; differences between taxa were not significant (P = 0.62). The proportion of neutral sites (ω = 1) ranged from 21% to 39% in angiosperms, and from 21% to 28% in gymnosperms; however the differences between medians were not significant (P = 0.22) (supplementary table S5, Supplementary Material Online).

Inference of the Proportion of Sites under Selection

Models that allow for sites under positive selection such as M2a, M3, and M8, all suggested the presence of a very small number of weakly positively selected sites in angiosperm taxa. Models M3 and M8 identified between 0.7–1.3% sites under weak diversifying selection (ω = 1.65 and ω = 1.38, respectively) in Brassicaceae. However, likelihood ratio test statistics for comparing M1a and M2a, and M7 and M8 did not show significant results (table 3). In addition, the NEB analyses did not identify any sites under positive selection. For Malvaceae, four sites under positive selection were identified by the BEB analysis (P > 95%), and the likelihood ratio test for comparison between M7–M8 was significant (2Δl = 33.063, P < 0.001, df = 2). However, none of the other tested models identified sites under positive selection. Weakly to moderate diversifying selection (ω = 1.89 and ω = 2.75) was found in a very small proportion of sites (1.1% and 0.5%) in Fabaceae, based on the results of M3 and M8. In addition, the NEB analysis identified two sites under positive selection using M3, and the likelihood ratio test for comparison between M7–M8 was significant (2Δl = 22.74, P < 0.001, df = 2). In Rosaceae, M3 and M8 identified 0.2% of sites under strong diversifying selection. In addition, the BEB analysis identified two sites under positive selection using M3, and the likelihood ratio test for comparison between M7–M8 was significant (2Δl = 6.42, P < 0.05, df = 2). Finally, weakly diversifying selection (ω = 1.53 and ω = 1.85) was found in 0.7 and 1.3% sites in Poaceae. Four sites were identified by the BEB analyses in both M3 and M8, and the likelihood ratio test for M7–M8 was significant (2Δl = 15.428, P < 0.001 df = 2) (table 3, see supplementary table S5, Supplementary Material Online).

Table 3

Results of the Likelihood Ratio Tests for Evidence of Positive Selection in Each of the Taxonomic Families Studied.

Taxa	M1a–M2a (2ΔL)	P Value	M7–M8 (2ΔL)	P Value
Brassicaceae	0	NS	2.577926	0.2766
Malvaceae	0	NS	33.063102	0.00000007
Fabaceae	0	NS	22.744606	0.00001151
Rosaceae	0	NS	6.429714	0.04016
Poaceae	0	NS	15.428744	0.0004463
Pinus	511.4601	0	549.377428	0
Picea	116.785146	0	116.646654	0
Cupressaceae	0	NS	15.11157	0.000523
Cycadales	0	NS	6.535688	0.03808

Note.—Comparisons between site models M1a and M2a, and M7 and M8 for each taxonomic family were evaluated.

Results of the Likelihood Ratio Tests for Evidence of Positive Selection in Each of the Taxonomic Families Studied. Note.—Comparisons between site models M1a and M2a, and M7 and M8 for each taxonomic family were evaluated. In contrast to angiosperm taxa, evidence for moderate to strong diversifying selection and a higher number of sites under positive selection was found in gymnosperm taxa. In Pinus, M2a, M3, and M8 identified 16.3%, 23.13%, and 22.21% of sites under moderate to strong diversifying selection (ω = 3.2, ω = 2.73, and ω = 2.8, respectively). This taxonomic family had the highest number of sites under positive selection (240, 245, and 1070) evidenced by the results of the BEB analyses under M2a and M8, and the NEB analysis under M3. Likelihood ratio tests were significant for both the M1a–M2a (2Δl = 511.46, P < 0.001 df = 2) and the M7–M8 (2Δl = 549.37, P < 0.001 df = 2) comparisons. Similarly, in Picea, M2a identified 21.48% of weakly selected sites (ω = 1.19), and 3.25% of sites under strong positive selection (ω = 3.29). Models M3 and M8 identified 5.6% of sites under moderate positive selection (ω = 2.8). The number of sites under positive selection was highly variable among models: 1, 301, and 885 based on the BEB analyses under M2a and M8, and by the NEB results based on M3. The LRT statistics were significant for both the M1a–M2a and M7–M8 comparisons (table 3). A small proportion (1.2–1.4%) of sites under moderate positive selection (ω = 2.21 and ω = 2.32) were evidenced by the results of M3 and M8 in Cuppresaceae. Three sites were found to be under positive selection based on the NEB analysis of M3. Finally, in Cycads, M3 identified 4.6% of weakly selected sites (ω = 1.71); and M8 identified 1.9% of sited under moderate selection (ω = 2.38). The LRT statistics for M7–M8 comparisons were significant for both Cuppressaceae and Cycads (table 3, see supplementary table S5, Supplementary Material Online).

Distribution of Selection Coefficients

Parameter estimates α and β from the gamma distribution of ω across sites in each taxonomic family were used to estimate the distribution of the selection coefficients of new mutations (Yang et al. 2000; Nielsen and Yang 2003). As expected, we found that α and β varied across taxonomic families, resulting in varying estimates of S and f(S). For all taxonomic families, the distributions of selection coefficients for new mutations had a peak at S = –5, suggesting that the majority of sites are under purifying selection. In all angiosperms, except Brassicaceae, selection coefficients are mostly distributed with S = [–10, 0]; whereas in gymnosperms, the majority of selection coefficients are distributed within a wider range with S = [–20, 10]. Gymnosperm families had a greater proportion of sites under strong purifying selection (S < –15) than angiosperms (t = –3.57, P < 0.05); whereas angiosperm families had a greater proportion of neutral and mildly deleterious (S ≈ 0 and S > –15) sites than gymnosperms (t = 4.119, P < 0.05) (fig. 2). The proportion of advantageous mutations (positive selection coefficients) also varied among families, with gymnosperm taxa having on average, more sites under positive selection than angiosperm taxa (t = –5.74, P < 0.001) (fig. 2).

(A) Boxplots showing differences in number of synonymous substitutions (dS), nonsynonymous substitutions (dN), absolute rate of silent-site divergence (μ), and substitution rate ratio (ω), among life forms defined as angiosperms herbs (green), angiosperms shrub/trees (light green), and gymnosperms (blue). Results of the statistical tests of comparisons among groups can be found in supplementary table S6, Supplementary Material Online. (B) Proportion of sites under negative selection, when all sites are evaluated (top left), when S < –15 (top right), and when S > –15 (bottom left); and proportion of sites under positive selection (bottom right). Green boxes represent angiosperm species, and blue boxes, gymnosperms. (C) Distribution of the selection coefficient (f(S)) of new mutations. It was calculated for all species in each taxonomic family studied, assuming a gamma distribution of ω among sites. Parameter estimates α and β were obtained from running the site model 5 in codeml (PAML) using a concatenated alignment of 29,000–31,000 sites per taxonomic family. Selection coefficients were then obtained replacing α and β in equation 1. Green dotted vertical lines are used to show the distribution of f(S) when S ≈ 0 and ω = 1 (neutrality). Taxonomic families containing less than 5 species were not analyzed due to difficulties in constructing the phylogenetic trees in RaxML.

Correlations with Life-History Traits

In order to explain the variation we observe in dN, dS, and ω between the major plant clades, we tested whether these variables showed any associations with life history traits. Our results suggest that generation time (time to reach reproductive maturity, measured in years) was strongly negatively correlated with μ (family level r = –0.78, P < 0.001) (table 4). Also, genome size (amount of DNA contained in a haploid nucleus and measured in picograms (1C)) was strongly negatively correlated with μ (family level r = –0.67) but positively correlated with ω (species level r = 0.56 P < 0.001, family level r = 0.52 P < 0.05) (table 4). Finally, species richness (number of extant species in each taxonomic family) was significantly correlated with dS (r = 0.7, P < 0.001) and dN (r = 0.48, P < 0.05) at the family level (not tested at species level).

Table 4

	Species Correlations			Taxonomic Family Correlations
	Parameters	Generation Time (years)	Genome Size (1C)	Generation Time (years)	Genome Size (1C)	Species Richness
dS	r	–0.172	–0.135	–0.153	–0.311	0.703
	P-value	0.223	0.341	0.558	0.225	0.002
	df	50	50	15	15	15
dN	r	0.005	0.041	0.189	–0.026	0.485
	P-value	0.973	0.771	0.467	0.922	0.048
	df	50	50	15	15	15
Μ	r	–0.183	–0.037	–0.781	–0.670	0.057
	P-value	0.194	0.797	0.000	0.003	0.829
	df	50	50	15	15	15
Ω	r	0.212	0.567	0.482	0.527	–0.224
	P-value	0.132	0.000	0.050	0.030	0.387
	df	50	50	15	15	15

Note.—Substitution rates and life-history traits were estimated by species pairs (species correlations table); and averaged within taxonomic family (taxonomic family correlations table). We reject a null hypothesis of correlation equal to zero when P < 0.05 (shaded areas).

Correlation among Number of Synonymous (dS) and Nonsynonymous Substitutions (dN), Nonsynonymous/synonymous Rate Ratio (ω), and the Absolute Rate of Silent-Site Divergence (μ), with Life-History Traits Using the Phylogenetically Independent Contrast Method (PIC). Note.—Substitution rates and life-history traits were estimated by species pairs (species correlations table); and averaged within taxonomic family (taxonomic family correlations table). We reject a null hypothesis of correlation equal to zero when P < 0.05 (shaded areas). We found significant differences (P < 0.05) in μ among groups with different life forms when using six groups (angiosperms dicots herbs, angiosperms dicots shrubs, angiosperms dicots trees, angiosperms monocots herbs, gymnosperms, and Gnetophytes). Differences in ω and μ were also significant among groups when comparing four groups (angiosperms herbs, angiosperms trees/shrubs, gymnosperms and Gnetophytes) and three groups (angiosperms herbs, angiosperms shrubs/trees, and gymnosperms) (see supplementary table S6, Supplementary Material Online). Number of synonymous (dS) and non-synonymous substitutions (dN) were not significantly different in any of the groups when assessed by P < 0.05. Boxplots showing differences among groups when groups are divided into three (angiosperms herbs, angiosperms trees/shrubs, and gymnosperms) are plotted in figure 2. In the Gnetophyte group, only Gnetum species were included and not Welwitschia and Ephedra (see Discussion part).

Discussion

Slower Rates of Molecular Evolution in Gymnosperms

Our study clearly suggests slower rates of molecular evolution in gymnosperm than in angiosperm protein-coding genes, evidenced by a lower number of synonymous substitutions (dS) and lower rates of silent-site divergence (μ). Based on the evolutionary and phylogenetic analyses of 31,968 nucleotide sites from a wide taxonomic sampling, our results suggest μ values of 4.1–14 × 10 − 10 in gymnosperm taxa, and 4–6.5 × 10 − 9 in angiosperm taxa (table 1). Taken together, this corresponds to a 7-fold average variation among gymnosperms and angiosperms (7.71 × 10 − 10 vs. 5.35 × 10 − 9 synonymous substitutions/site/year, respectively). Our results lay within the ranges obtained by previous studies comparing a few Pinaceae species, suggesting the variation found in this taxonomic family is a relatively good representation of the variation found in all the gymnosperm taxa, with the exception of Gnetophytes. For example, Willyard et al. (2007), while comparing different species of Pinus found absolute rates of silent-site divergence of 7–13.1 × 10 − 10. Similarly, Chen et al. (2012) estimated pairwise sequence divergence between three Picea and one Taxus species and obtained μ ranging from 5.5–12.4 × 10 − 10. Finally, Buschiazzo et al. (2012), obtained a 25-fold difference when comparing substitution rates between orthologs of Picea sitchensis and Pinus taeda with Arabidopsis thaliana, and a 4-fold difference when comparing the same Picea-Pinus orthologs with Populus trichocarpa. The pattern of slower rates of molecular evolution is consistent when grouping species according to their life forms, with gymnosperms showing lower dS and μ than angiosperms herbs and angiosperms shrubs and trees. This level of evolutionary conservation (which probably help explain the high levels of macrosynteny previously observed in some Pinaceae species) is surprising considering the ancient nature of the plant clade, which appeared on Earth much earlier than flowering plants.

Higher Substitution Rate Ratio (ω) in Gymnosperms than in Angiosperms

Equally surprising are the high substitution rate ratios (ω) found in gymnosperms despite their slow rates of evolution. Parameter estimates under models of variable ω between stem branches of the angiosperm and gymnosperms lineages based on a concatenated alignment of 31,737 nucleotide sites supported a model of different and higher ω in gymnosperms than in angiosperms (table 2). In addition, substitution parameters based on seven different site models, found angiosperms rate ratios varied from 0.09–0.13, whereas gymnosperms ratios varied from 0.17–0.67. Overall, these estimates suggest a significantly higher chance of fixation of nonsynonymous mutations than synonymous mutations in gymnosperms than in angiosperm taxa (supplementary table S5, Supplementary Material Online). High ω ratios could result from a high dN over a low dS, however this does not seem to be the case for gymnosperms. Instead, higher ω ratios result from a low to moderate dN over a very low dS in gymnosperm genes. Our results are in contrast with previous smaller scale studies either showing significantly lower dS and dN as a cause of a 4-fold higher ω in gymnosperms than in angiosperms (Buschiazzo et al. 2012), or the ones that did not find significant differences in ω ratios among gymnosperms and angiosperms (Chen et al. 2012). These studies were limited by a small number of species that would not allow for evolutionary analyses within a phylogenetic context. When using a small number of species, pairwise estimates of dN, dS, and ω will strongly depend on the selection of species. In addition, the evaluation of individual sites (site models) and the branch models have more power to test variable ω ratios than the pairwise sequence comparisons, in which ω rates are averaged over all amino acid sites (Yang 2002). In the case of the Buschiazzo’s study, highly diverged sets of species were used for comparisons (P. sitchensis–P.taeda, and A.thaliana–P.trichocarpa). High sequence divergence between species is often associated with difficulties in the alignment, different codon usage biases and nucleotide compositions. In addition, saturation of substitutions may be particularly problematic when using pairwise methods in comparison with branch methods.

Positive Selection and Adaptive Evolution in Gymnosperms

Under the Nearly Neutral Theory, the strength and efficacy of selection depends on the long-term effective population size (Ne). Theoretical approaches predict that as population sizes increase, the power of natural selection increases faster than the influx of new mutations (Akashi et al. 2012). Natural selection thus becomes more effective in removing deleterious mutations and in fixing advantageous mutations, which results in a lower substitution rate of deleterious mutations and also a higher substitution rate of advantageous mutations in large populations (Lanfear et al. 2014). Current estimates suggest that species with large population sizes such as Drosophila, mice, bacteria, and rabbits show signs of adaptive evolution (Bierne and Eyre-Walker 2004; Carneiro et al. 2012; Phifer-Rixey et al. 2012). In contrast, little evidence of adaptive amino acid substitutions has been found in flowering plants (mostly with modest Ne < 100,000) (Gossmann et al. 2010; Hough et al. 2013). Few exceptions have been found in angiosperms with Ne >100,000 such as Capsella grandiflora (Slotte et al. 2010; Williamson et al. 2014); Helianthus spp. (Strasburg et al. 2011); and Populus spp (Ingvarsson 2010; Wang et al. 2016a, 2016b). In gymnosperms, previous reports of Ne range from 120,000–560,000 for Picea and Pinus species (Brown et al. 2004; Bouille and Bosuquet 2005; Syring et al. 2007). In our study, we found a higher proportion of sites with positive selection coefficients and evidence of stronger diversifying selection in gymnosperms than in angiosperms (P = 0.033). Models that allow for sites under positive selection such as M2a, M3 and M8, all suggested the presence of a very small number (0.2–1.3%) of weakly positively selected sites (ω = 1.53–1.89) in angiosperm taxa, in contrast to a higher number (≤23.13%) of sites with moderate to strong diversifying selection (ω = 2.21–3.29) in gymnosperm taxa. In addition, results of the BEB analysis using site model 2a identified 241 (0.759%) sites under selection in gymnosperms and none in angiosperms. The same analysis using site model M8 identified 546 (1.72%) sites under selection in gymnosperms and 10 (0.03%) in angiosperms. Finally, the NEB analysis using site model M3 identified 1960 (6.175%) sites in gymnosperms and only six (0.018%) in angiosperms. Likelihood ratio tests for the presence of diversifying selection were significant for Picea and Pinus (M1a–M2a comparison) and for all taxa with the exception of Brassicaceae under M7 and M8 comparisons. Under M3, Picea and Pinus showed the highest proportions of sites under selection, whereas Rosaceae showed the lowest proportion. Our results are consistent with previous estimates of low levels of adaptive evolution in eleven angiosperm species (Gossmann et al. 2010); and with estimates in Pinus contorta and Pinus taeda, in which the proportions of sites fixed by positive selection were 13–52% and 22–37%, respectively (Eckert et al. 2013; Hodgins et al. 2016). There is a rich literature reporting the influence of positive selection in Pinaceae species, which in absence of reports in other gymnosperm families, has frequently been used as an example of all gymnosperm taxa. The recent sequencing of some Pinaceae genomes have paved the way to test the influence of positive selection at the genome level (De La Torre et al. 2015a; Hodgins et al. 2016), which have confirmed what was previously found at the genetic (small number of genes) and population levels (Eckert et al. 2010; Buschiazzo et al. 2012; Pavy et al. 2012; De La Torre et al. 2015b). Furthermore, studies on quantitative trait variation suggest natural selection is highly efficient in producing a relatively fast evolutionary response in contrast to the slow evolutionary rate (Savolainen and Pyhäjärvi 2007). Examples of this include the evolution of cold adapted genotypes in several long-lived tree species in the northern hemisphere after the Last Glaciation (Mimura and Aitken 2007; Wachowiak et al. 2009; Holliday et al. 2010; Kujala and Savolainen 2012; De La Torre et al. 2014b). In addition, because of their long generation times and low mutation rates, gymnosperm genomes may retain the consequences of demographic events for a long time, suggesting that even species that have contracted their ranges and currently have small distributions may resemble those with long-term large effective population sizes. Although our results suggest an important role of positive selection in the evolution of gymnosperm taxa, we do not believe that Pinaceae is a good representative of other gymnosperm taxa. Pinus and Picea have indeed significantly higher number of sites under diversifying selection than other gymnosperm taxa. Recent evolutionary radiations may have resulted in lower dS and higher ω in comparisons within these taxonomic families (Palme et al. 2009). We expect our estimation of the proportion of sites under selection to be conservative because of the use of single copy genes. Single copy genes are usually ubiquitously expressed genes that encode for basic cellular functions that are preserved across taxa. For this reason, they are likely to be under stronger purifying selection due to functional and structural constraints than paralogous genes in gene families in both angiosperms (De Smet et al. 2013; Li et al. 2016) and gymnosperms (De La Torre et al. 2015a). These genes also seem to experience less frequent positive selection than paralogous genes, which tend to be more narrowly expressed (Larracuente et al. 2007). For this reason, we expect a lower proportion of conserved sites and a higher proportion of selected sites in paralogous genes (which account for the majority of genes in plant genomes). In fact, it has been suggested that many of the genes under diversifying selection in conifers belong to large multi-copy gene families such as Leucine Rich Repeats, Cytochrome P450, among others (Pavy et al. 2012; Neale et al. 2014; De La Torre et al. 2015a). In addition, some sites under positive selection may go undetected because structural constraints may induce purifying selection to push ω to values lower than 1 in highly conserved genes (Echave et al. 2016). However, the introduction of paralogous genes in a phylogeny has confounding effects for both the phylogenetic reconstruction and also for the estimation of sequence divergence among taxa, therefore, we believe than the use of single-copy genes is probably the best method for comparisons among diverse taxa. In our efforts to identify a highly confident set of very well aligned orthologs across a broad taxonomic sampling, the number of genes used in this study was significantly reduced raising concerns about the representativeness of these genes in a genome-wide context. However, comparisons with previous studies, suggest our estimates of rates of divergence are quite comparable with estimates of rates with much larger data sets in specific taxa (Gossmann et al. 2010; Eckert et al. 2013; Hodgins et al. 2016).

Stronger and More Efficient Purifying Selection in Gymnosperms

When estimating the distribution of selection coefficients, we found that the majority of sites in both angiosperms and gymnosperms were highly conserved, suggesting an important role of purifying selection in the evolution of plant genomes. Previous studies in angiosperm species have suggested that the largest proportion of mutations is strongly deleterious, and that the number of slightly and mildly deleterious mutations seems to be relatively conserved among species (Gossmann et al. 2010). In contrast with these studies, our study has found differences in the proportion of sites under mildly and strong purifying selection. Gymnosperm families had a greater proportion of sites under strong purifying selection (S < –15) than angiosperms (t = –3.57, P < 0.05); whereas angiosperm families had a greater proportion of neutral and mildly deleterious (S ≈ 0 and S >–15) sites than gymnosperms (t = 4.119, P < 0.05) (fig. 2). This could be explained by the expectation that species with larger Ne generally experience stronger purifying selection (Slotte et al. 2010; Gossmann et al. 2010; Hough et al. 2013; Williamson et al. 2014). Slightly deleterious mutations will be less likely to segregate at higher frequencies and fix in large populations (Akashi et al. 2012). Such mutations are also selected against by the partial low degree of selfing and high early inbreeding depression in conifers (Williams and Savolainen 1996; Remington and O’Malley 2000). Larger population sizes may also explain the lower number of neutral sites in gymnosperms than in angiosperms found in this study, as it has been shown that the number of effectively neutral mutations is negatively correlated with Ne in many species (Piganeau and Eyre-Walker 2009; Gossmann et al. 2010).

Life Form Does Not Explain Differences in Molecular Evolution among Woody Angiosperms and Woody Gymnosperms, or within Gymnosperms

In flowering plants, it has been recently suggested that life form (also measured as plant height) is correlated with rates of molecular evolution, with shrubs and trees showing lower rates of molecular evolution than herbaceous plants (Gaut et al. 1992; Smith and Donoghue 2008; Lanfear et al. 2013). Similarly, in our study, we found significantly lower μ, on average, in angiosperm shrubs/trees compared to angiosperm herbs (see supplementary table S6, Supplementary Material Online, and, fig. 2). Both dS and dN showed large variation among angiosperm herbs, and differences in dS and dN among angiosperm shrubs/trees and herbs were not significant (see supplementary table S6, Supplementary Material Online). When incorporating gymnosperms into the analysis, life form on its own does not explain the differences in rates of molecular evolution among seed plants, as we observe significant differences in dN, dS, ω, and μ also among woody angiosperms and woody gymnosperms (fig. 2). Our analyses showed that Gnetophytes, the only gymnosperm division that contains non-woody genera (Welwitschia and Ephedra), shows higher dN and μ than what we observe in other gymnosperm families (table 1). Therefore, differences in life form may drive differences in the rate of molecular evolution found in Gnetophytes. However, we still observe these differences when only including the woody genus Gnetum (and excluding Ephedra and Welwitschia), suggesting other factors may influence the differences in rate of sequence evolution (table 1). In fact, Gnetophytes are different from the rest of gymnosperms in a number of features that include but are not restricted to their morphology, ecology, and the presence of angiosperm-like characteristics such as special water-conducting wood vessels, and reproductive structures organized in compound strobili (Doyle and Donoghue 1986; Friedman 1998). For these reasons, the position of Gnetophytes within gymnosperms is contentious (Doyle 1998; Braukmann et al. 2009; Cibrian-Jaramillo et al. 2010; Ran et al. 2010; Xi et al. 2013). In any case, the lack of close, extant relatives to Gnetophytes suggests that this problem might never really be solved satisfactorily, regardless of the amount of sequence data we have access to. Our results suggest that estimations of rates of sequence divergence among a few species from very diverged Gnetophyte taxa (such as Gnetum, Welwitschia and Ephedra) may lead to inaccurate estimates of dN and dS. For this reason, we decided to exclude the more distant Welwitschia and Ephedra from all correlations with life history traits.

Generation Times and Their Effect on Low Substitution Rates

The differences in rates of molecular evolution between angiosperms with different life forms have been presumed to reflect differences in generation times between herbaceous and woody plants (Gaut et al. 1992; Smith and Donoghue 2008; Lanfear et al. 2013). The generation time hypothesis suggests that species with shorter generation times (e.g. herbaceous plants) accumulate more replication errors per unit time because they copy their genomes more often, which results in higher mutation rates (Li et al. 1996). Although generation time may not be a good indicator of the overall rate of genome replication because the number of mitotic cell divisions can vary substantially between generations and among plant species; it remains strongly associated with the long-term rates of meiosis in plants (Petit and Hampe 2006; Lanfear et al. 2013). Therefore, if a significant proportion of heritable mutations occur during meiosis, plants with longer generation times (e.g. gymnosperms) would have lower mutation and substitution rates per unit of time (Lanfear et al. 2013). Alternatively, it is been proposed that differences in the rates of mitosis (mitotic cell divisions that occur in the apical meristem before gametogenesis) can account for the observed differences in rates of molecular evolution among plants of different height (Lanfear et al. 2013). Lower absolute growth rates in long-lived woody perennials would translate in fewer cell divisions and less opportunities for DNA replications errors than short-lived plants (Bromham et al. 2015). This would result in a lower mutation rate per unit of time in long-lived species. Our results are consistent with this expectation, as we found a strong negative correlation between generation times and μ (r = –0.78, P < 0.001) (table 4). Long generation times, large effective population sizes and low recombination rates may also help explain the low synonymous polymorphism in gymnosperms (Savolainen and Pyhäjärvi 2007; Jaramillo-Correa et al. 2010). In plants, in the absence of a segregated germline, gametes arise from the apical meristem late in development following periods of vegetative growth. Because of this, somatic mutations acquired during vegetative growth can be transmitted to the next generation (Watson et al. 2016). Longer growth periods are thought to result in a larger number of cell divisions, increasing the opportunities for mutations to occur per generation (Schultz and Scofield 2009). Our results suggest, than on average, for the species under study, long-lived gymnosperms accumulate four times more mutations per generation than short-lived angiosperms (assuming an average generation time of 20 years in gymnosperms, and an average μ equal to 1.57 × 10 − 8 in gymnosperms, and 3.92 × 10 − 9 in angiosperms). This is coincident with previous reports of increased number of somatic mutations in gymnosperms species (Cloutier et al. 2003; O’Connell and Ritland 2004) and other woody perennials (Ally et al. 2010; Bobiwash et al. 2013). However, because the per-generation increase in mutation rate in woody perennials may be less than predicted from their differences in generation time (Petit and Hampe 2006), we would expect our reported values to be an over-estimation of the actual difference between plant seed clades. Differences in generation times would predict a mutation rate >100 times larger in mangrove trees than in annuals, when estimates were only 25 times larger (Klekowski and Godfrey 1989). Alternatively, there may be selection to reduce DNA replication-dependent errors through minimizing the number of cell divisions required during development, as recently suggested in Arabidopsis (Watson et al. 2016).

Larger Genomes Are Correlated with Larger Omegas and Slower Rates of Sequence Divergence

In our study, we also explored the relationship between genome size and rates of molecular evolution. Gymnosperm genomes are characterized by their enormous genome sizes (20–40 Gb), unique genome silencing mechanisms and low unequal recombination (Leitch and Leitch 2012, Nystedt et al. 2013). While angiosperm genomes are highly dynamic and have efficient mechanisms to counteract the increase in DNA amount stemming from WGDs or transposable elements (e.g. replication or recombination-based errors generating indels, unequal recombination between sister chromosomes, Grovel and Wendel 2010), gymnosperm genomes seem to be less dynamic and may have evolved their own epigenetic mechanisms to silence retrotransposons (Leitch and Leitch 2012). Our results indicate that genome size is positively correlated with ω (r = 0.56, P < 0.001 at the species level, and r = 0.52, P < 0.05 at the family level). In addition, genome size is negatively correlated with μ (r = –0.67, P < 0.01), suggesting species with large genomes such as gymnosperms have slower rates of molecular evolution than species with small genomes, confirming previous studies on angiosperms species (Bromham et al. 2015). A possible explanation for this is that if large genome sizes correlate with larger cells and a reduction in growth, then plants with larger genomes might have fewer replications and therefore less accumulated mutations per unit of time (Bromham et al. 2015). Our study aims to contribute to the unexplored field of molecular evolution in seed plants by investigating the differences in rates of molecular evolution among gymnosperms and angiosperms using a phylogenetic framework. The recent genomic and transcriptomic resources in gymnosperm species opened a window to understand the evolution of this ancient and important plant clade. However, gymnosperm resources are still limited in comparison to their sister clade of flowering plants. We hope that with the development of new genomic resources, studies in molecular evolution will include a broader taxonomic sampling that provides us with a more complete understanding of the evolution of seed plants.

Materials and Methods

Generation of Phylogenetic Markers

To develop our set of phylogenetic markers, we performed deep sequencing and assembly of whole transcriptomes from two conifer species (Pinus pinaster and Pinus sylvestris) (Li Z et al., (in review), transcriptome data can be found at: http://bioinformatics.psb.ugent.be/supplementary_data/zheli/phylo/). We integrated this data with whole-transcriptome data from another 29 gymnosperms, 34 angiosperms, and 1 outgroup (Physcomitrella patens) obtained from public databases PlantGDB, oneKP, TreeGenes and PLAZA v3.0 (Proost et al. 2015). We used OrthoMCL (Li et al. 2003) to build orthologous gene families across the species. To reduce the number of single-copy genes, we selected only the gene families that were conserved and had low copy number. Later we used hidden Markov probabilistic models implemented in HMMER to build an HMM profile for each gene family based on multiple sequence alignment (Eddy 2009). These HMM profiles were then used to assign additional proteins to the existing gene families. Then, we used HMMSEARCH to find the best protein hit of an HMM profile in each species and selected the markers with only reciprocal best hits for phylogenomic analysis. In order to increase the spectrum of the phylogenetic markers, we only selected single-copy markers that were present in a majority of species. Multiple sequence alignments were carried out for each gene family based on amino acid sequences using Muscle v3.8.31 (Edgar et al. 2004). Trimal v1.4 (Capella-Gutierrez et al. 2009) was used to back translate the amino acid alignments into coding sequence alignments, and to remove low quality alignment regions and spurious sequences. We defined a spurious sequence as a sequence having less than 70% of the total alignment positions that were present in 75% of 66 species. Genes that lost their sequences in more than 10% of the species, after removing spurious sequences, were not used for further analyses. This way, we obtained 42 single-copy markers across 66 seed plant species that were used for evolutionary and phylogenetic analyses. Annotations of all genes are reported in supplementary table S1, Supplementary Material Online. Multiple sequence alignments for each of the 42 single-copy genes were divided in two groups containing either angiosperms (34 species) or gymnosperms (31 species), to allow further comparisons between major plant taxa. Alignment gaps and low quality regions were manually removed using Jalview version 2.8.1 (Waterhouse et al. 2009). Also, species containing more than 30% gaps or ambiguous sites were not kept for further analyses. Pairwise estimates (runmode= –2, seqtype = 1, model = 0, NSsites = 0) of the number of synonymous substitutions per site (dS), nonsynonymous substitutions per site (dN) and nonsynonymous/synonymous rate ratio (also called “omega” and denoted by ω or dN/dS) were calculated between species in terminal branches for each of the taxonomic families of the angiosperm and gymnosperm phylogeny using the maximum likelihood method of Goldman and Yang (1994) in the codeml program from the PAML package (version 4.8, Yang 2007). The analysis was repeated twice for each of the 42 genes. For each sequence pair, only the results with the higher lnL (log likelihood) were retained. We discarded genes with dS values lower than 0.01, as these values may result in inaccurate estimates of ω; and also genes with dS or dN > 3 which suggest saturation of substitutions. Abnormally high omega rates (ω > 10) were also discarded (Villanueva-Cañas et al. 2013). To evaluate the differences in synonymous (dS) and non-synonymous substitutions (dN) and their ratio (ω) between angiosperms and gymnosperms, we used a sign test in the R package PASWR (Ugarte et al. 2008) to test the null hypothesis that the median for the differences between the pairs equals zero. We tested the alternative hypotheses of a higher ω in gymnosperms than in angiosperms; and a lower dN and dS in gymnosperms than in angiosperm species for each of the genes. Significance values were calculated for all genes for each of the variables measured using a one-sided exact binomial test in R. In addition to the pairwise estimates, we also calculated dS, dN, and ω for each terminal branch of the phylogeny using the free-ratios branch model (Model 1) implemented in codeml. Differences in dS, dN, and ω between plant lineages were tested using a Welch two-sample t-test in R. Absolute rates of silent-site divergence were estimated for each taxonomic family, assuming the “molecular clock” hypothesis (Kimura 1968, Kimura and Ohta 1971), in which the change at the molecular level occurs constantly through time across evolutionary lineages. Synonymous substitutions (dS) for each gene were averaged for all pairs of species in each of the six taxonomic families within gymnosperms (Pinaceae, Cuppresaceae, Taxaceae, Podocarpaceae, Cycadaceae, and Gnetophytes); and in eight taxonomic families in angiosperms (Brassicaceae, Malvaceae, Euphorbiaceae, Fabaceae, Cucurbitaceae, Rosaceae, Solanaceae, and Poaceae). Poaceae was further divided in two groups (each containing a different ancestral branch) as Poaceae I and Poaceae II (see supplementary figure S1, Supplementary Material Online). Absolute rates of silent-site divergence were calculated for each of the genes using the formula μ = dS/2T where μ is the synonymous divergence rate per site per year, dS is the mean of synonymous substitutions per site, and T is the time of divergence between two species in years (Gaut et al. 2011). Overall rates of silent-site divergence for each family were calculated using the same formula, where dS is the mean of synonymous substitutions per synonymous site across all 42 genes, and T is the time of divergence in years. Estimations of divergence times (T) were based on fossil records and previously published divergence times (see supplementary table S2, Supplementary Material Online). Whenever possible, we selected the median time of divergence between the species in each taxonomic group, making sure the estimates are consistent with previous estimates for each taxa. To understand the variation in selective pressures among the gymnosperm and angiosperm plant lineages, we formulated the following hypotheses: H0: homogeneous selective pressure in both lineages (ω0= ωG = ωA); H1: selective pressure in the angiosperm clade (ω0= ωG, ωA); H2: selective pressure in the gymnosperm clade (ω0= ωA, ωG); H3: long-term shift in selective pressure resulting in both clades having different omegas (ω0, ωG, ωA); and H4: all terminal branches in both clades have different omegas. All estimated omegas for hypotheses H1 to H3 were calculated on the stem branches of the angiosperm (ωA) and the gymnosperm (ωG) clades according to the phylogenetic tree in figure 1. The estimated omega in unselected branches (background) is represented by ω0. We tested the different hypotheses using branch models (Yang 1998; Yang and Nielsen 1998) in PAML 4.8 (Yang 1997; Yang 2007) that allow ω to vary among branches in the tree. The one-ratio model (runmode = 0, seqtype = 1, model = 0), which assumes one ω ratio for all branches, was first used to estimate the branch lengths from a concatenated sequence alignment of all 42 single-copy genes (31,968 nucleotide sites), and a phylogenetic tree (see below for details on tree building). This model was also used to test the null hypothesis (H0). The resulting branch lengths were then used as initial values to run the program three times with model 2 (model = 2, NSSites = 0) to test hypotheses H1, H2, and H3. This model allows different branch groups to have different omegas, according to the different branch labels assigned. Omega in selected branches (foreground) was then compared with omega in unselected branches (background). Finally, we tested hypothesis 4 by fitting model 1 (model = 1, NSsites = 0), also called the free-ratios model. A likelihood ratio test was used to assess deviations from the null model (both angiosperms and gymnosperms had the same ω) for the gene set. Corrections for multiple testing were done using the Benjamini and Hochberg method (1995) with a false discovery rate threshold of 0.05. Phylogenetic trees used in all branch models were constructed applying a posteriori partitioning, inferred from Bayesian searches of the substitution rate matrix using a mixture model approach, to the concatenated alignment by carrying out BayesPhylogenies (Pagel and Meade 2004) with 10 million generations of Markov chain Monte Carlo analysis. We finally obtained 12 partitions by the Perl script provided in Xi et al. (2012). The phylogeny of seed plants was inferred by RAxML (8.2) with the concatenated alignment and the partitions, and edited for publication with FigTree v.1.4.2 (http://tree.bio.ed.ac.uk).

Estimation of Substitution Parameters and Inference of the Proportion of Sites under Selection for Each Taxonomic Family

We tested different site models in codeml to estimate substitution parameters; infer omega distributions and assess the proportion of conserved (ω < 1), neutral (ω = 1), and selected (ω > 1) nucleotide sites in each taxonomic family. We used the following parameters: runmode = 0 (user tree); model = 0; NSsites = 0,1,2,3,5,7,8,10; kappa (transition/transversion rate ratio) to be estimated from the data. Models M1a (nearly neutral), M2a (positive selection), M3 (discrete), M5 (gamma), M7 (beta), M8 (beta & ω), and M10 (beta & gamma + 1) detailed in Yang et al. (2000), were tested for each taxonomic family independently (all terminal branches in each clade), using a concatenated alignment of 42 genes. By testing all species that share a most recent common ancestor, we make sure that species had a similar amount of time to accrue genetic differences, and that therefore any difference in the amount of genetic change accumulated through time represents a difference in the evolutionary rate. For this analysis, three species were removed due to problems in the concatenated alignment (Beta vulgaris, Cryptomerica japonica and Welwitschia mirabilis). Input phylogenetic trees were constructed for each taxonomic family using the GTR + GAMMA model for maximum likelihood of phylogenetic inference in RAxML (Stamatakis 2014). We used the Akaike Information Criteria (AIC) to rank models and estimate the most probable distribution of omega for each taxonomic family. Runs were repeated to ensure the convergence of results. In addition, we also estimated the number of sites under positive selection using two different analyses. In the first one, we compared models M1a–M2a and M7–M8 using likelihood ratio tests. In the second one, we used the Bayes Empirical Bayes (BEB) (Yang et al. 2005), implemented in site models M2a and M8, to calculate the posterior probabilities for site classes and to identify sites under positive selection when the likelihood ratio test was significant. We also used the results of the Naïve Empirical Bayes (NEB) but only when BEB results were not available (site model M3). While omega (ω), as formulated in the codon models, is a property of a particular site or collection of sites in the DNA sequence; the selective coefficient (S) is a property of a particular allele or mutation (Nielsen and Yang 2003). Therefore, it is possible to infer the distribution of S from the distribution of ω if some assumptions are made in the mutation model. We assume no interference in the fixation process of multiple mutations at different sites, and further that there are no more than two alleles segregating at the same nucleotide site. We also assume that all non-synonymous mutations at the same amino acid site have equal selection coefficients S. If these assumptions are not met, we may under-estimate the selection coefficients. We use the inverse mapping from ω to S to obtain the distribution of the selective coefficient of new mutations from the distribution of ω. Model 5 (Yang et al. 2000) assumes that omega (ω) is gamma distributed among codon sites with parameters α and β, and therefore the probability density function for S is obtained by: where h(S) = (es – 1) (Nielsen and Yang 2003). We assumed a constant population size among lineages within each taxonomic family. In order to explain the variation in substitution rates and ω between major plant clades, we evaluated the associations of these with several biological and life history traits. Generation time (time to reach maturity), genome size (amount of DNA contained in a haploid nucleus measured in picograms), and species richness (number of extant species in each taxonomic group or family) were obtained from public databases. Because phylogenetic non-independence of data points can lead to an inflation of the significance of observed relationships between substitution rates and traits (Lanfear et al. 2010), we used the phylogenetically independent contrasts (PIC) method (Felsenstein 1985) implemented in the R package APE to estimate the correlations between variables (Paradis et al. 2004). This analysis was done twice, either using all species in the phylogeny or using only taxonomic families. For the latter analysis, a phylogenetic tree with average branch lengths for each family, and family-average estimates of all variables were used as input files in APE. Phylogenetic trees were imported in Newick format and then saved with FigTree v1.4.2 (Rambaut 2006, http://tree.bio.ed.ac.uk) before PIC analysis. We also tested for significant differences in dN, dS, ω, and μ among groups with different life forms using a Kruskal-Wallis one-way analysis of variance (rank sum test). The analysis was repeated three times taking in consideration different groups. In the first one there were six groups as follows: angiosperms dicots herbs, angiosperms dicots shrubs, angiosperms dicots trees, angiosperms monocots herbs, gymnosperms and Gnetophytes. In the second one, there were four groups: angiosperms herbs, angiosperms trees/shrubs, gymnosperms and Gnetophytes. And in the last test three groups were considered (angiosperms herbs, angiosperms shrubs/trees, and gymnosperms) while Gnetophytes were excluded.

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online. Click here for additional data file.

86 in total

1. Codon-substitution models for heterogeneous selection pressure at amino acid sites.

Authors: Z Yang; R Nielsen; N Goldman; A M Pedersen
Journal: Genetics Date: 2000-05 Impact factor: 4.562

2. Back to nature: ecological genomics of loblolly pine (Pinus taeda, Pinaceae).

Authors: Andrew J Eckert; Andrew D Bower; Santiago C González-Martínez; Jill L Wegrzyn; Graham Coop; David B Neale
Journal: Mol Ecol Date: 2010-08-13 Impact factor: 6.185

3. Evidence for widespread positive and purifying selection across the European rabbit (Oryctolagus cuniculus) genome.

Authors: Miguel Carneiro; Frank W Albert; José Melo-Ferreira; Nicolas Galtier; Philippe Gayral; Jose A Blanco-Aguiar; Rafael Villafuerte; Michael W Nachman; Nuno Ferrand
Journal: Mol Biol Evol Date: 2012-01-31 Impact factor: 16.240

4. Selection on nuclear genes in a Pinus phylogeny.

Authors: A E Palmé; T Pyhäjärvi; W Wachowiak; O Savolainen
Journal: Mol Biol Evol Date: 2009-01-23 Impact factor: 16.240

5. Taller plants have lower rates of molecular evolution.

Authors: Robert Lanfear; Simon Y W Ho; T Jonathan Davies; Angela T Moles; Lonnie Aarssen; Nathan G Swenson; Laura Warman; Amy E Zanne; Andrew P Allen
Journal: Nat Commun Date: 2013 Impact factor: 14.919

6. Evolution of the mutation rate.

Authors: Michael Lynch
Journal: Trends Genet Date: 2010-06-30 Impact factor: 11.639

7. Jalview Version 2--a multiple sequence alignment editor and analysis workbench.

Authors: Andrew M Waterhouse; James B Procter; David M A Martin; Michèle Clamp; Geoffrey J Barton
Journal: Bioinformatics Date: 2009-01-16 Impact factor: 6.937

8. Evidence for variation in the effective population size of animal mitochondrial DNA.

Authors: Gwenael Piganeau; Adam Eyre-Walker
Journal: PLoS One Date: 2009-02-09 Impact factor: 3.240

9. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses.

Authors: Salvador Capella-Gutiérrez; José M Silla-Martínez; Toni Gabaldón
Journal: Bioinformatics Date: 2009-06-08 Impact factor: 6.937

10. Sequencing of the needle transcriptome from Norway spruce (Picea abies Karst L.) reveals lower substitution rates, but similar selective constraints in gymnosperms and angiosperms.

Authors: Jun Chen; Severin Uebbing; Niclas Gyllenstrand; Ulf Lagercrantz; Martin Lascoux; Thomas Källman
Journal: BMC Genomics Date: 2012-11-02 Impact factor: 3.969

43 in total

1. Phylogenomics resolves the deep phylogeny of seed plants and indicates partial convergent or homoplastic evolution between Gnetales and angiosperms.

Authors: Jin-Hua Ran; Ting-Ting Shen; Ming-Ming Wang; Xiao-Quan Wang
Journal: Proc Biol Sci Date: 2018-06-27 Impact factor: 5.349

2. Evolutionary footprints of a cold relic in a rapidly warming world.

Authors: Eva Wolf; Emmanuel Gaquerel; Mathias Scharmann; Levi Yant; Marcus A Koch
Journal: Elife Date: 2021-12-21 Impact factor: 8.140

3. Adaptive changes of the autosomal part of the genome in a dioecious clade of Silene.

Authors: Jitka Zluvova; Zdenek Kubat; Roman Hobza; Bohuslav Janousek
Journal: Philos Trans R Soc Lond B Biol Sci Date: 2022-03-21 Impact factor: 6.237

4. Genomic insights into the genotype-environment mismatch and conservation units of a Qinghai-Tibet Plateau endemic cypress under climate change.

Authors: Heng Yang; Jialiang Li; Richard Ian Milne; Wenjing Tao; Yi Wang; Jibin Miao; Wentao Wang; Tsam Ju; Sonam Tso; Jian Luo; Kangshan Mao
Journal: Evol Appl Date: 2022-06-10 Impact factor: 4.929

5. A Comprehensive Study of the WRKY Transcription Factor Family in Strawberry.

Authors: José Garrido-Gala; José-Javier Higuera; Antonio Rodríguez-Franco; Juan Muñoz-Blanco; Francisco Amil-Ruiz; José L Caballero
Journal: Plants (Basel) Date: 2022-06-15

6. Phylogenomics and Genetic Diversity of Arnebiae Radix and Its Allies (Arnebia, Boraginaceae) in China.

Authors: Jiahui Sun; Sheng Wang; Yiheng Wang; Ruishan Wang; Kangjia Liu; Enze Li; Ping Qiao; Linyuan Shi; Wenpan Dong; Luqi Huang; Lanping Guo
Journal: Front Plant Sci Date: 2022-06-09 Impact factor: 6.627

7. Individualistic evolutionary responses of Central African rain forest plants to Pleistocene climatic fluctuations.

Authors: Andrew J Helmstetter; Kevin Béthune; Narcisse G Kamdem; Bonaventure Sonké; Thomas L P Couvreur
Journal: Proc Natl Acad Sci U S A Date: 2020-12-04 Impact factor: 12.779

Review 8. Evolution and diversity of the angiosperm anther: trends in function and development.

Authors: Johanna Åstrand; Christopher Knight; Jordan Robson; Behzad Talle; Zoe A Wilson
Journal: Plant Reprod Date: 2021-06-26 Impact factor: 3.767

9. Speciation along a latitudinal gradient: The origin of the Neotropical cycad sister pair Dioon sonorense-D. vovidesii (Zamiaceae).

Authors: José Said Gutiérrez-Ortega; Francisco Molina-Freaner; José F Martínez; Miguel Angel Pérez-Farrera; Andrew P Vovides; Antonio Hernández-López; Ayumi Tezuka; Atsushi J Nagano; Yasuyuki Watano; Yuma Takahashi; Masashi Murakami; Tadashi Kajita
Journal: Ecol Evol Date: 2021-05-01 Impact factor: 2.912

10. Comparative Genomics of an Unusual Biogeographic Disjunction in the Cotton Tribe (Gossypieae) Yields Insights into Genome Downsizing.

Authors: Corrinne E Grover; Mark A Arick; Justin L Conover; Adam Thrash; Guanjing Hu; William S Sanders; Chuan-Yu Hsu; Rubab Zahra Naqvi; Muhammad Farooq; Xiaochong Li; Lei Gong; Joann Mudge; Thiruvarangan Ramaraj; Joshua A Udall; Daniel G Peterson; Jonathan F Wendel
Journal: Genome Biol Evol Date: 2017-12-01 Impact factor: 3.416