Literature DB >> 33871608

Large-Scale Phylogenomic Analyses Reveal the Monophyly of Bryophytes and Neoproterozoic Origin of Land Plants.

Danyan Su1, Lingxiao Yang1, Xuan Shi1, Xiaoya Ma1, Xiaofan Zhou2, S Blair Hedges3, Bojian Zhong1.   

Abstract

The relationships among the four major embryophyte lineages (mosses, liverworts, hornworts, vascular plants) and the timing of the origin of land plants are enigmatic problems in plant evolution. Here, we resolve the monophyly of bryophytes by improving taxon sampling of hornworts and eliminating the effect of synonymous substitutions. We then estimate the divergence time of crown embryophytes based on three fossil calibration strategies, and reveal that maximum calibration constraints have a major effect on estimating the time of origin of land plants. Moreover, comparison of priors and posteriors provides a guide for evaluating the optimal calibration strategy. By considering the reliability of fossil calibrations and the influences of molecular data, we estimate that land plants originated in the Precambrian (980-682 Ma), much older than widely recognized. Our study highlights the important contribution of molecular data when faced with contentious fossil evidence, and that fossil calibrations used in estimating the timescale of plant evolution require critical scrutiny.
© The Author(s) 2021. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Entities:  

Keywords:  bryophytes; land plants; maximum bounds; phylogeny; timescale

Year:  2021        PMID: 33871608      PMCID: PMC8321542          DOI: 10.1093/molbev/msab106

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


Introduction

Streptophyta comprises all land plants (embryophytes) and six monophyletic groups of streptophyte green algae, and they have significantly contributed to global environmental change in geological history (Kenrick et al. 2012; Lenton et al. 2012, 2016). The colonization of the terrestrial realms by early land plants is one of most important events in the evolution of life on earth, causing soil formation, increasing primary productivity, impacting weathering and global climates, and establishing new habitats for animals that increased their diversity (Heckman et al. 2001; Parnell and Foster 2012). The origin and evolution of the various groups of land plants largely initiated our modern terrestrial ecosystems. It is widely accepted that land plants evolved from streptophyte green algae which adapted to freshwater conditions early in their history (Becker and Marin 2009). During the invasion of plants onto land, they had to overcome enhanced ultraviolet (UV) radiation, water deficit, salinity, and other environmental stresses (Fang et al. 2017). Deciding which plant innovations were fundamental in the transformation from freshwater algae to land-dwelling forms has been long-debated (Ligrone et al. 2012; Hori et al. 2014; Buschmann and Zachgo 2016; Chater et al. 2017; Jill 2017; Reski 2018; Szovenyi et al. 2019). However, studies of trait evolution are hindered by the lack of resolution of early-diverging land plant phylogeny (Niklas and Kutschera 2010; Rensing 2018). Support for Zygnematophyceae as sister to land plants is strengthened by recent phylogenomic analyses (Zhong et al. 2013, 2015; Wickett et al. 2014; Cheng et al. 2019; One Thousand Plant Transcriptomes Initiative 2019; Jiao et al. 2020), but the relationships among the four major groups of land plantsmosses, liverworts, hornworts, vascular plants—are still unsettled (Cox 2018; Puttick et al. 2018; de Sousa et al. 2019; One Thousand Plant Transcriptomes Initiative 2019; Bell et al. 2020). Bryophytes—that is, liverworts, mosses, and hornworts—are the second-most diverse group of land plants (Laenen et al. 2014; Tomescu et al. 2018). They play key roles in tracing the evolution of important characters associated with the terrestrialization process. However, the relationships of bryophytes (hornworts, liverworts, and mosses) and tracheophytes have long been controversial, hindering our ability to understand early land plant evolution. Among three bryophyte lineages, recent molecular studies have reached a consensus supporting mosses and liverworts as forming a natural group (Wickett et al. 2014; Cox 2018; Puttick et al. 2018; de Sousa et al. 2019; One Thousand Plant Transcriptomes Initiative 2019). Morphological similarities between the moss and liverwort locomotory apparatus (e.g., centrioles, flagella, unique microtubule, and lamellar arrays) also have supported the same conclusion (Renzaglia et al. 2018). In previous studies, hornworts were placed as sister to either the clade of ((mosses+liverworts)+tracheophytes) (Renzaglia et al. 2000; Wickett et al. 2014) or tracheophytes (Qiu et al. 2006; Puttick et al. 2018), or the clade of mosses and liverworts (Puttick et al. 2018; de Sousa et al. 2019; Zhang et al. 2020). The position of hornworts is the only uncertainty and the key point for resolving early land plant phylogeny. Puttick et al. (2018) and de Sousa et al. (2019) revisited the phylogenetic relationships among the four major lineages of land plants by considering compositional heterogeneity and substitutional saturation using the large-scale data set from Wickett et al. (2014). Although both studies recovered the monophyly of bryophytes, Puttick et al. (2018) could not reject two hypotheses where bryophytes were not monophyletic: liverworts-mosses or liverworts as sister group to the remaining land plants. Similarly, de Sousa et al. (2019) increased branch support for a monophyletic bryophyte group, but this hypothesis was supported using a small number of genes and taxa. Thus, a well-supported phylogenetic relationship among bryophytes (hornworts, liverworts, and mosses) and tracheophytes is crucial to better understand the evolutionary novelties for plant colonization of land. Establishing the timescale of Streptophyta is essential for testing the hypotheses of codiversification between plants and animals. Although the fossil record offers windows into the history of plants, it is well known to be incomplete and insufficient to provide a coherent picture of land plant history. Inferring ancient evolutionary timescales has often suffered from limited taxon sampling (Foster et al. 2017), biases in methodologies (dos Reis et al. 2015), and especially the rarity of unambiguous fossils (Massoni et al. 2015; dos Reis et al. 2016). Acritarchs are taxonomically ambiguous microfossils from the Palaeoproterozoic to Early Ordovician. Many have been interpreted as phytoplankton, but their exact biological affinities are difficult to confirm for establishing a well-resolved timetable of early-diverging streptophyte algae (Moczydłowska et al. 2011). Cryptospores are critical for understanding the nature of the earliest land floras. Nevertheless, the fossil record of cryptospores is scattered and incomplete, and their affinities are contentious (Edwards et al. 2014). The lack of unambiguous bryophyte fossils has been a problem for calibrating molecular clock studies of early land plants (Tomescu et al. 2018). The poor fossil records of these key lineages make it difficult to provide reliable fossil constraints for divergence time estimation. There have been a handful of molecular studies performed to estimate divergence times for early land plants (Heckman et al. 2001; Clarke et al. 2011; Magallon et al. 2013; Morris et al. 2018). Clarke et al. (2011) tested the impact of changing the maximum constraints on the three basal nodes (liverwort, moss, and hornwort) of early land plants. When the maximum constraints were changed from 1,042 to 509 Ma, the mean age estimate for the origin of embryophytes was younger and differed by 165 My. Compared with previous studies, Morris et al. (2018) estimated a much younger origin time of land plants (Middle Cambrian—Early Ordovician: 515–470 Ma), which was limited by a narrow temporal calibration (516–469 Ma). Hedges et al. (2018) questioned the validity of the maximum constraints of some nodes in Morris et al. (2018) because calibrations based on fossil absence are less reliable. They re-estimated the divergence times after removing some maximum bounds, and inferred an older origin time for embryophytes (793–560 Ma) than that obtained in Morris et al. (2018). In agreement with the findings from Battistuzzi et al. (2015), the maximum constraints of calibrated nodes have a pervasive and significant impact upon molecular clock estimates. Considering the rare reliable fossils and the difficulties of determining appropriate interpretations of the fossil record, there are great challenges for establishing the timescale of land plants. In this study, we inferred the phylogeny and divergence times of all major lineages of Streptophyta, especially focusing on relationships among the three bryophyte groups and the time of origin of land plants. We increased taxon sampling among bryophytes (especially hornworts) and explored the effect of analytical errors, stemming from sequence biases, among the 1,440 genes in the data set. Species tree inferences under the multispecies coalescent model and concatenation approach supported the monophyly of bryophytes. To test the impact of fossil calibrations, we implemented three different maximum bounds in divergence time estimation using 100 clock-like genes and 22 fossil calibrations. Our new well-resolved phylogeny and timescale of land plants provide the foundation for accurate interpretation of the development of plant traits during the water-to-land transition.

Results and Discussion

The Well-Supported Monophyletic Bryophytes

Transcriptomes provide a large amount of nuclear data for phylogenomic reconstruction of plants, whereas there exists a serious deficiency of available molecular data from hornworts. Only one study that used the transcriptomes of 1,124 species (One Thousand Plant Transcriptomes Initiative 2019) made efforts in increasing taxon sampling of hornworts so far. We expanded taxon sampling within bryophytes (72 species, including nine additional hornwort species) compared with Puttick et al. (2018) and de Sousa et al. (2019). Our molecular data included 1,440 nuclear genes from 120 streptophyte species with three Chlorophyta as outgroups. The multispecies coalescent model and concatenation approach were both used for reconstructing species trees. The concatenation analyses of nucleotide data using maximum likelihood and Bayesian methods supported hornworts as sister to all other land plants (supplementary figs. S1 and S2, Supplementary Material online). It has been widely reported that substitutional saturation can affect branch support and induce a phylogenetic artifact (Jeffroy et al. 2006; Liu et al. 2014). Recent molecular studies have indicated that fast-evolving synonymous substitutions may lead to the nonmonophyly of bryophytes using nucleotide data (e.g., Cox et al. 2014; Li et al. 2014; Liu et al. 2014; de Sousa et al. 2019). To mitigate this error stemming from fast-evolving synonymous substitutions, we reanalyzed our nucleotide data in which synonymous nucleotides at codon sites were recoded using nucleotide ambiguity codes and synonymous substitutions were eliminated. The three bryophyte lineages were recovered as a monophyletic group with strong support using codon-degenerated data (fig. 1 and supplementary figs. S3 and S4, Supplementary Material online). In addition, maximum likelihood and Bayesian analyses based on amino acid data also strongly recovered the monophyly of bryophytes and achieved well-supported resolution of relationships among all major lineages of bryophytes (fig. 1 and supplementary figs. S5 and S6, Supplementary Material online). These results indicated that reducing fast-evolving synonymous substitutions by codon-degeneracy recovered monophyly of bryophytes in agreement with the results using an amino acid matrix. To assess the effect of compositional heterogeneity, maximum likelihood analyses were employed on the concatenated amino acid data using a site-heterogeneous mixture model (LG+C20+F+R5) and Dayhoff recoding strategy, respectively. These analyses resulted in bryophyte monophyly with high support (supplementary figs. S7 and S8, Supplementary Material online), indicating that the compositional heterogeneity of 1,440 concatenated genes does not impose a significant bias in reconstructing the relationships among major lineages of land plants.
Fig. 1.

The concatenation species tree of land plants and their algal relatives. The cladogram is reconstructed using an amino acid supermatrix of 1,440 genes by IQ-TREE. Nodal support values are estimated by SH-aLRT test (SH) and ultrafast bootstrap (UFBS) in IQ-TREE, and Bayesian posterior probabilities (BPP) in ExaBayes. The first three are SH, UFBS, and BPP values based on codon-degenerate nucleotide data, and the last three are SH, UFBS, and BPP values based on amino acid data. The nodes without values indicate full support.

The concatenation species tree of land plants and their algal relatives. The cladogram is reconstructed using an amino acid supermatrix of 1,440 genes by IQ-TREE. Nodal support values are estimated by SH-aLRT test (SH) and ultrafast bootstrap (UFBS) in IQ-TREE, and Bayesian posterior probabilities (BPP) in ExaBayes. The first three are SH, UFBS, and BPP values based on codon-degenerate nucleotide data, and the last three are SH, UFBS, and BPP values based on amino acid data. The nodes without values indicate full support. Considering the substitutional saturation in nucleotide data, only the amino acid data were used for coalescent analysis. The coalescent-based species tree resolved bryophytes as a robust monophyletic group (fig. 2 and supplementary figs. S9 and S10, Supplementary Material online). Mosses and liverworts constitute a clade with full support, which is consistent with previous analyses based on chloroplast genes (Nishiyama et al. 2004; Goremykin and Hellwig 2005; Karol et al. 2010; Ruhfel et al. 2014) and nuclear genes (Wickett et al. 2014; Puttick et al. 2018; de Sousa et al. 2019). The hornwort clade is identified as the sister-group to the clade of mosses plus liverworts (fig. 2). In mosses, the relationships among the 17 orders are similar to those obtained by Liu et al. (2019). In addition, our phylogenies provided strong support for a sister relationship between Zygnematophyceae and land plants that was strengthened by recent studies (Zhong et al. 2013, 2014; Wickett et al. 2014; Puttick et al. 2018; Cheng et al. 2019; Jiao et al. 2020).
Fig. 2.

The coalescent species tree of land plants and their algal relatives. The cladogram is reconstructed using 1,440 genes by ASTRAL. Nodal support values are estimated by local posterior probability and multilocus bootstrapping (gene+site resampling) (PP/MLBS), and the nodes without values indicate full support.

The coalescent species tree of land plants and their algal relatives. The cladogram is reconstructed using 1,440 genes by ASTRAL. Nodal support values are estimated by local posterior probability and multilocus bootstrapping (gene+site resampling) (PP/MLBS), and the nodes without values indicate full support.

New Timescale for Early Land Plant Evolution

The Bayesian methods of clock dating can incorporate complex parameters (e.g., birth rate, death rate, sample fraction, and rate drift) to establish various relaxed clock models for describing the uncertainty in the fossil record and variation in evolutionary rate (Ho 2014). Many studies have found that fossil calibrations are the most important variables in molecular clock dating (Magallon et al. 2013; Barba-Montoya et al. 2018; Nie et al. 2020). Fossil evidence is the most common type of information for calibration, converting molecular sequence change into estimates of absolute times and rates (Barba-Montoya et al. 2017). We applied fossil calibrations following recommendations (Parham et al. 2012), emphasizing fewer but more reliable calibration points. However, well-dated fossils only provide reliable minimum constraints for divergence times, and a simple hard minimum bound is insufficient information for establishing a robust timescale. Two approaches are typically used for maximum constraints. One is to assign a parametric distribution on the age of the fossil (minimum) calibration, such as the gamma, lognormal, or the truncated Gaussian distribution, whereas the shapes of these prior distributions are often established without basic biological justification (Chazot et al. 2019). The second approach assumes that a clade has not yet evolved in a geologic period if there is an absence of available fossils in that period. In this case, the upper bound of the period is the maximum constraint for the clade. It is challenging to prove that the absence of a taxon is true rather than due to the incompleteness of the fossil and rock records (Marshall 2019). Despite these two approaches, researchers have not reached a consensus on how to establish a suitable maximum age for a lineage divergence (Donoghue and Yang 2016). Because the choice of a maximum constraint for crown embryophytes has been controversial (e.g., Clarke et al. 2011: 1,042 Ma; Morris et al. 2018: 515.5 Ma), we focused on exploring the influence of different maximum bounds for the origin of land plants. The extensive sampling of early embryophyte lineages in our study provides substantial transcriptomic data to estimate the divergence times for early land plants. Molecular clock analyses were performed using a Bayesian relaxed clock method (MCMCTree) (Yang 2007) based on 100 clock-like genes and the coalescent-based species tree (fig. 2). We employed three fossil calibration strategies to accommodate different maximum bounds of crown embryophytes and the internal calibration nodes among monophyletic bryophytes. In the first calibration strategy (hereinafter referred to as “Strategy 1”), the soft maximum constraints (515.5 Ma) were based on the first appearance of cryptospores following Morris et al. (2018). For the second calibration strategy (hereinafter referred to as “Strategy 2”), we applied a more conservative maximum bound (1,042 Ma) used by Clarke et al. (2011). In the third calibration strategy (hereinafter referred to as “Strategy 3”), we specified a truncated Cauchy distribution on the nodes. Our results indicate that the divergence times of early embryophyte lineages are highly sensitive to the maximum limits of calibration nodes (fig. 3 and table 1). Strategy 1 estimated that crown embryophytes originated at 518–500 Ma (Cambrian) (fig. 3 and supplementary fig. S11, Supplementary Material online), which is consistent with the estimated time obtained from Morris et al. (2018) that used a similar maximum calibration. However, Strategies 2 and 3 produced much older estimates, inferring a Neoproterozoic origin of embryophytes (Strategy 2: 980–682 Ma; Strategy 3: 919–639 Ma) (figs. 3 and supplementary fig. S12, Supplementary Material online).
Fig. 3.

Comparison of results from three different strategies. (a) Comparison of divergence time estimates from three fossil strategies. Node ages are plotted at the posterior means, with horizontal bars representing 95% credibility intervals. (b) Comparison of the age distributions on the crown node of land plants for the specified priors (black line), effective priors (purple), and posteriors (orange red).

Table 1.

The 95% HPD Age Estimates of Major Lineages Using Three Strategies.

CladeStrategy 1 (Ma)Strategy 2 (Ma)Strategy 3 (Ma)
Embryophyta518–500980–682919–639
Bryophytes507–472902–614855–584
Mosses460–397683–450643–419
Liverworts439–405580–405550–373
Hornworts418–225569–264538–246
Marchantiopsida303–227311–227295–191
Jungermanniopsida386–241454–280418–251
Tracheophyta503–462880–593830–566
Lycopodiophyta459–386722–388688–382
Euphyllophyta473–416726–467693–455
Monilophyta425–376568–372557–361
Spermatophyta368–332369–338369–338
Gymnospermae338–308339–308340–308
Angiospermae256–209258–217257–216
Comparison of results from three different strategies. (a) Comparison of divergence time estimates from three fossil strategies. Node ages are plotted at the posterior means, with horizontal bars representing 95% credibility intervals. (b) Comparison of the age distributions on the crown node of land plants for the specified priors (black line), effective priors (purple), and posteriors (orange red). Timetree of Streptophyta inferred from Strategy 2. Node ages are plotted at the posterior means and horizontal bars represent 95% credibility intervals. A total of 22 fossil calibration nodes are marked by red dots. The 95% HPD Age Estimates of Major Lineages Using Three Strategies. We further compared the user-specified priors, effective priors, and posteriors (supplementary figs. S13–S15, Supplementary Material online). The user-specified prior (also called “user prior”) is the temporal fossil calibration prior on individual node, and it is truncated in the construction of the effective prior (also called “marginal prior” or “joint prior”) on times by ensuring that ancestral nodes are older than descendant nodes (dos Reis et al. 2015; Warnock et al. 2015; Brown and Smith 2018). Effective priors are dependent on the interaction among user priors, topology, and the birth–death process that can specify the distribution of the ages of the noncalibration nodes. Although the effective and user-specified priors should be the same ideally, the effective priors are usually different from the user-specified calibration densities due to the effects of truncation (Barba-Montoya et al. 2017), and sometimes it implies the interaction of “pseudodata” in the user priors (Brown and Smith 2018). Posterior distributions are generated when using the data in divergence time estimation. A noticeable discordance between effective priors and posteriors indicates that posterior time estimates are not simply dependent on priors. In calibration Strategy 1, there is a nearly half-overlap of the effective prior and posterior distribution, especially in their estimated peak values (fig. 3). We did not expect such overfitting of joint priors and posteriors in view of uncertainties in the fossil record. Instead, the ideal situation is one in which the posteriors are different from the effective priors. Our results indicate that Strategy 1 is being largely constrained by the fossil calibrations instead of supported by the molecular data. If there are potential problems with the calibration information of Strategy 1, these comparisons imply that the sequence data lack sufficient information to overrule the narrow temporal range (515.5–469 Ma). Similar to Morris et al. (2018), we employed the maximum age of the oldest-possible nonmarine palynomorphs as the maximum bound for constraining the crown embryophytes in Strategy 1. Cryptospores are nonmarine sporomorphs, and there is little knowledge about their producers and affinity (Edwards et al. 2014). Yin et al. (2013) presented new cryptospore-like microfossils during the Cambrian period, which may have originated from land plants or primitive plant sporoderm types. These microfossils suggest that land plants possibly occurred earlier than the Cambrian. Using this age as maximum bound of crown embryophytes may mistakenly estimate a younger origin time if they actually belong to the crown group rather than the stem lineage of land plants. In contrast, there is a considerable distinction between effective prior and posterior densities in Strategies 2 and 3 (fig. 3), indicating that the molecular data contributed the information relevant to the age of embryophytes and the age estimates were not entirely driven by priors. The difference between the effective prior and user-specified calibration density in Strategy 3 implied the interaction of “pseudodata” (fig. 2). The truncated Cauchy distributions (Strategy 3) employed at nodes 4–12 might be unwarranted. Although the Cauchy distribution attempts to assume a genuine condition rather than a diffuse uniform fossil calibration prior, it greatly constrained the age bounds. Compared with Strategies 1 and 3, employing broad uniform priors (i.e., Strategy 2) is a more fruitful approach for relaxing the excessive influence of fossil calibration distributions (Brown and Smith 2018). The fossil record provides important biological evidence to establish user-specified priors, and the user priors interact with the tree priors to establish effective priors. Although the data may correct prior assumptions if priors are unrealistic (Bromham 2019), the reliability and precision of fossil calibrations still have a great impact on the estimated divergence times even with an infinite amount of data (Rannala and Yang 2007). We should consider the reliability of fossil calibrations firstly, followed closely by the influences of molecular data. The maximum constraint of crown embryophytes in Strategy 1 is controversial, and the prior information of crown embryophytes from Strategy 1 affected posteriors most obviously among three strategies. Taken together, our analyses support that crown embryophytes originated in the Neoproterozoic (fig. 4, Strategy 2: 980–682 Ma).
Fig. 4.

Timetree of Streptophyta inferred from Strategy 2. Node ages are plotted at the posterior means and horizontal bars represent 95% credibility intervals. A total of 22 fossil calibration nodes are marked by red dots.

Bryophytes are the earliest-diverging embryophytes and the second-most diverse group of land plants, but they have few reliable fossils for deepening our understanding of the colonization of the land by plants. The important character of embryophytes, sporopollenin, is an extremely resistant polymer for the outer wall of all land-plant spores and pollen grains, improving the preservation potential of the plants themselves, and especially the spores (Li et al. 2019). The discovery of fossil spore assemblages offers new windows into the origin of early land plants (Edwards et al. 2014). Cryptospores with permanent tetrads are regarded as spores of land plants (Morris et al. 2018). The well-preserved cryptospores from the Dapingian Zanjon Formation in Argentina (Tetrahedraletes cf. Medinensis: 469 Ma) provide the earliest fossil record for land plants (Rubinstein et al. 2010; Morris et al. 2018). However, the accurate phylogenetic position of these cryptophytes remains unclear because of the highly fragmented fossils. Clarke et al. (2011) also indicated that crown land plants gradually enhanced fossilization potential, such as the development of a thickened cuticle and spore walls. The earliest land plants may have had low potential for fossil preservation. The source plants of many enigmatic cryptospores are not likely to belong to the first land plants. Paleontologists have re-examined the fossil evidence of bryophytes, and suggested that the traditional view about the poor bryophyte fossil record needs to be revised (Tomescu et al. 2018). High potential and various modes for fossil preservation of bryophytic material, along with the extensive stratigraphic range of fossils, fully imply that many exquisite fossils of bryophytes will be discovered in the future (Tomescu et al. 2018). The similarities between the Dapingian cryptospore assemblage and younger cryptospore occurrences possibly indicate that phenotypic change in the early evolution of embryophytes was slow (Rubinstein et al. 2010). This evidence implies that a time gap may exist between the actual origin of land plants and the earliest fossil record based on cryptospores, which is consistent with our new timescale of land plant origin. The recent discovery of a remarkably well-preserved green algal fossil from the Precambrian (1,000 Ma) (Tang et al. 2020) also offers circumstantial evidence. This fossil is a member of the Order Siphonocladales, Class Ulvophyceae, and is nested within the crown group Chlorophyta. The occurrence of this multicellular fossil confirms that Chlorophyta, at least, had originated before 1,000 Ma. Although fully consistent with our molecular timescale of plant evolution, it contradicts other studies including that of Morris et al. (2018), who obtained time estimates for the origin of Chlorophyta hundreds of millions of years younger than the new chlorophyte fossil.

Conclusions

Our phylogenomic analyses markedly improved the robustness of early land plant relationships, and strongly supported the monophyly of bryophytes using a large nuclear data set and dense taxon sampling. We further show that the controversy over land plant relationships has largely stemmed from nucleotide analyses that did not fully accommodate biases from fast-evolving synonymous substitutions. In evolutionary time estimation, fossil calibrations and molecular data are the crucial sources of actual biological information. Changing the calibrations will obviously influence time estimation, considering the importance of fossil calibrations in time priors. High consistency among different analyses will strengthen the confidence for a certain result, whereas inconsistency demands reasonable explanation. By comparing user priors, effective priors, and posteriors, we gained insight into why past studies have differed widely in their estimated time of the origin of land plants. We found that studies favoring a Neoproterozoic origin of land plants (980–682 Ma) are informed more by molecular data whereas those favoring a Phanerozoic origin (518–500 Ma) are informed more by fossil constraints. Our divergence time analyses highlighted the important contribution of the molecular data (time-dependent molecular change) when faced with contentious fossil evidence. Fossil calibrations used in estimating the timescale of plant evolution require greater scrutiny, and more efforts are needed to explain the results of disparate estimated times. A careful integration of fossil and molecular evidence will revolutionize our understanding of how land plants evolved.

Materials and Methods

Algal Strains and Culture Conditions

Lamprothamnium succinctum (NIES-1606), Nitella flexilis (NIES-1611), Nitellopsis obtusa (NIES-1638), and Gonatozygon brebissonii (NIES-138) strains were obtained from the Microbial Culture Collection at the National Institute for Environmental Studies, Tsukuba, Japan. All strains were grown in soil and cultured at 20 °C under alternating 12 h-light/12 h-dark periods. Lamprothamnium succinctum, N. flexilis, and Nitellopsis obtusa were grown in mSWC-2 medium (Okazaki et al. 1984; Sakayama et al. 2004), and G. brebissonii was grown in C medium (Ichimura 1971).

Data Processing, Assembly, and Annotation

We sequenced transcriptomes of four charophyte species using Illumina HiSeq technologies. Library construction and sequencing were performed at Novogene Bioinformatics Technology Co., Ltd (Beijing, China). Illumina raw data from Lamprothamnium succinctum, N. flexilis, Nitellopsis obtuse, and G. brebissonii were filtered by removing reads containing adapters, reads with more than 10% ambiguous bases (N), and low-quality reads (more than 50% bases with small Qphyred ≤20). We assembled four transcriptomes using Trinity with default settings (Grabherr et al. 2011), except that min_kmer_cov was set to 2. The transcripts were clustered into genes by using Corset (Davidson and Oshlack 2014) with default parameters. All of the assembled unigenes were BLAST against the NCBI nonredundant protein database (Nr), NCBI nonredundant nucleotide database (Nt) and Swiss-Prot to predict protein function with the E-value cutoff of 10−5, and Eukaryotic Ortholog Groups of protein database (KOG) with the E-value cutoff of 10−3 (Altschul et al. 1997). We identified for each gene the protein domains and unannotated regions using the Pfam database (Finn et al. 2014). All the unigenes were functionally annotated in the KEGG database (Moriya et al. 2007). The gene ontology (GO) classification of each gene model was carried out using Blast2GO v2.5 (Gotz et al. 2008).

Taxon Sampling and Data Collection

The 124 Streptophyta taxa were sampled from 63 orders including 68% bryophyte orders (supplementary table S1, Supplementary Material online). Three Chlorophyta taxa were designated as outgroups. The public genome data were downloaded from Phytozome (http://phytozome.jgi.doe.gov/pz/portal.html, last accessed January 1, 2019), GenBank (GCA_000708835.1), GigaDB (http://gigadb.org/dataset/view/id/100209, last accessed January 1, 2019), Dryad Digital Repository (https://datadryad.org/resource/doi:10.5061/dryad.0vm37.2, last accessed January 1, 2019), and Orcae (https://bioinformatics.psb.ugent.be/orcae/overview/Chbra, last accessed January 1, 2019). The 90 transcriptomes were obtained from the 1KP project. To address the limitation of sparse taxon sampling of bryophytes and charophytes, we newly sequenced four transcriptomes from previously unsampled charophyte species, and assembled 13 transcriptomes of bryophytes that were downloaded from NCBI SRA database (supplementary table S1, Supplementary Material online) by using Trinity with default settings (min_kmer_cov=2).

Ortholog Identification

We firstly used protein sequences predicted from the complete genomes of 15 selected species (Arabidopsis thaliana, Daucus carota, Oryza sativa, Amborella trichopoda, Gnetum montanum, Ginkgo biloba, Selaginella moellendorffii, Physcomitrella patens, Sphagnum fallax, Marchantia polymorpha, Chara braunii, Klebsormidium nitens, Chlamydomonas reinhardtii, Coccomyxa subellipsoidea, Ostreococcus lucimarinus) to generate putative ortholog groups by utilizing the tree-based approach (Yang and Smith 2014). The reduction of sequence redundancy was carried out using CD-HIT (-c 0.995 -n 5) (Fu et al. 2012). Homology searches were conducted using all-by-all BlastP from peptides of 15 complete genomes with an E value cutoff of 10 and -max_target_seqs 1,000. We used a hit_fraction cutoff of 0.5 to filter BlastP hits and set IGNORE_INTRASPECIFIC_HITS to be True. Markov clustering (MCL v14) (van Dongen 2000) was performed on filtered data with the E value cutoff of 10−5 and an inflation value of 2. We excluded small clusters that contained less than 13 species and employed the “fasta_to_tree_pxclsq.py” (Yang and Smith 2014) for building homolog tree of each remaining cluster with default parameters. Based on the resulting trees, we trimmed long tips longer than a relative length cutoff 0.5 and more than ten times longer than its sister or exceeded an absolute value set to 2. The monophyletic tips were masked by the “mask_tips_by_taxonID_genomes.py” (Yang and Smith 2014). We cut long internal branches that were longer than 0.8 for reducing deep paralogs, and only retained the clusters including no less than 13 species. We repeated the above processes including building homolog trees, trimming (a relative length cutoff: 0.5; an absolute value: 1.7), masking, and cutting deep paralogs (long internal branches >0.7). We further pruned final homologous gene trees using the “prune_paralogs_MO.py” (minimal_taxa: 13) (Yang and Smith 2014) and resulted in a set of 2,148 clusters of putative orthologs. The transcriptomic and genomic data of other 112 species were then incorporated into the 2,148 core orthologous clusters using Orthograph v0.6.3 (Petersen et al. 2017) by default value.

Phylogenetic Analyses

Amino acid sequences of each orthologous group (OG) were aligned using MAFFT v7.310 (Katoh and Standley 2013), with the option “-localpair -maxiterate 1,000.” We excluded poorly aligned regions using Gblocks 0.91b (Castresana 2000) with “Allowed Gap Positions” set to “half” and other default parameters. OGs were removed when their longest sequence was shorter than 100 amino acids. We eliminated short sequences of each OG using trimAl v1.4 (Capella-Gutierrez et al. 2009) with the option -resoverlap 0.5 -seqoverlap 50. All alignments that did not contain ≥80% species were discarded, leaving 1,440 OGs. To reduce the potential effects of missing data, we further removed four species, each of which covered less than 70% OGs (<1,008 OGs), thereby reducing the number of species to 123. The corresponding nucleotide alignments of 1,440 OGs were generated using PAL2NAL (Suyama et al. 2006), and poorly aligned positions were excluded using Gblocks 0.91b with the “codon” model, half gaps allowed, and otherwise default settings. A concatenation-based method was used to infer phylogenetic trees for both nucleotide and amino acid data sets. Maximum likelihood (ML) analysis was performed using IQ-TREE v2.0.5 (Minh et al. 2020). Nodal support values were estimated using SH-aLRT test (Guindon et al. 2010) and ultrafast bootstrap (Minh et al. 2013) with 1,000 replicates. We applied ModelFinder (Kalyaanamoorthy et al. 2017) to select optimal partitioning schemes and appropriate amino acid or nucleotide substitution models, using -TESTMERGE and -rcluster 10 options (Lanfear et al. 2014). The best-fitting models of substitution for each matrix are presented in supplementary table S2, Supplementary Material online. Bayesian inferences were conducted by the MPI version of ExaBayes v1.5.1 (Aberer et al. 2014) under the GTR (nucleotide) and LG (amino acid) models with the same partitioning schemes as in ML analyses. Two independent MCMC runs with two chains were conducted from parsimony starting topologies sampling every 500 generations. ExaBayes runs continued until the termination condition of mean topological differences was less than 5% with at least 500,000 and 200,000 generations for nucleotide and amino acid matrix, respectively. Posterior distributions of trees were summarized using the “consense” script with 25% burn-in. Convergence was assumed when all parameters had effective sampling sizes (ESS) greater than 100 estimated with Tracer v1.6 (Rambaut et al. 2018), and potential scale reduction factors (PSRF) close to 1 using the “postProcParam” program in ExaBayes. We used three strategies to alleviate the effect of systematic errors: 1) analyzing the concatenated amino acid data under a site-heterogeneous mixture model (LG+C20+F+R5) that accounts for compositional heterogeneity across sites, 2) recoding the 20 amino acids of the original supermatrix into six-state groups under the Dayhoff recoding scheme, which were assigned numbers 0–5: C, FWY, HKR, ILMV, EDNQ, and AGPST (Dayhoff et al. 1978), and 3) recoding the nucleotide matrix with codon-degenerate characters (de Sousa et al. 2019), which use nucleotide ambiguity codes to eliminate all possible synonymous substitutions among codon variants of amino acids. The ML analysis of amino acid supermatrix under LG+C20+F+R5 model was performed using IQ-TREE with same parameter settings as above. The Dayhoff recoding strategy recodes the 20 amino acids into six groups on the basis of their chemical and physical properties. This strategy is commonly used in phylogenomic studies to reduce the effects of lineage-specific compositional heterogeneity (Hrdy et al. 2004;Susko and Roger 2007; Rota-Stabelli et al. 2013; Puttick et al. 2018). The phylogenetic reconstruction using the Dayhoff-recoded data set was executed under a 6-state GTR model using IQ-TREE, and all the other aspects (e.g., tree search strategy) remain the same as original amino acid matrix. In addition, the codon-degenerate nucleotide data were analyzed both using the ML and Bayesian methods under the optimal partitioning schemes and appropriate substitution models. For the coalescent method, only amino acid alignments of 1,440 OGs were used. Individual gene trees were reconstructed by using RAxML v8.2.12 (Stamatakis 2014) with the best fitting model (-m PROTGAMMAAUTO –auto-prot=bic) and 200 rapid bootstrap replicates. The species tree was inferred using ASTRAL-III v5.6.3 (Zhang et al. 2018) with nodal support values estimated by local posterior probability and multilocus bootstrapping (gene+site resampling). To reduce gene tree estimation error, we further considered contracting low support branches (below 20% bootstrap support) from individual gene trees.

Divergence Time Estimation

Data Set Assembly for Molecular Dating

We used MCMCTree v.4.9h from the PAML package (Yang 2007) to estimate divergence times, and conducted preliminary tests with the concatenated data set (1,440 OGs of 123 species) and fossil calibrations. Increasing the number of generations failed to improve convergence but raised computational burden for our large-scale data set (376,109 amino acid positions). Large amounts of data will generate an unpredictable computational burden under parameter-rich models, and different topologies and rate heterogeneity across genes may lead to mis-specified models (Smith et al. 2018). Foster et al. (2017) suggested selecting a subset of informative genes in molecular dating analyses, considering that time estimates are largely consistent in full compared with reduced data sets. “Clock-like” genes are defined as those that evolve in a clock-like manner (Jarvis et al. 2014), and they can minimize errors associated with model mis-specification (Smith et al. 2018). Therefore, we identified clock-like or nearly clock-like genes with low root-to-tip variance and minimal conflict using SortaDate (Smith et al. 2018). We calculated the variance of root-to-tip length for each rooted gene tree, and compared the individual gene trees with species trees by implementing bipartition-comparison analyses on each tree. The screening criteria were “1 = root-to-tip variance, 3 = bipartiton, 2 = tree-length.” Finally, 100 genes of 1,440 OGs were selected to be eligible as clock-like genes. These selected genes were likely to reduce the deviations caused by lineage-specific rate variation. The MCMC analyses were run for 20 million generations sampled every 500 generations after a burnin of 2,000,000 iterations. The chain convergence was assessed by running MCMC analyses twice simultaneously, and the effective sample size (ESS) of all parameters was confirmed to be >200 using Tracer v.1.6 (Rambaut et al. 2018).

Rate Priors

Two relaxed-clock models are often used for divergence time estimation, that is, the independent-rates (IR) model and the autocorrelated-rates (AR) model. Recent phylogenomic analyses have shown that the AR model is more suitable for the analysis of closely related species, and the IR model better fits distantly related taxa (e.g., Foster et al. 2017; Barba-Montoya et al. 2018; Nie et al. 2020). Our phylogenomic data set consists of many distantly related species, and the degree of autocorrelation would decrease as the rate differences among lineages amplify. Therefore, we used the IR model to estimate the divergence time of Streptophyta. The time unit was set to 100 My. Parameter σ2 was assigned a gamma prior G (1, 10) to determine the degree of rate variation across branches. In order to obtain a suitable prior on the mean of the rate μ (representing the overall rate), we compared the amino acid pairwise distance between Arabidopsis thaliana and Bryoandersonia illecebra (0.138 substitutions/site) using the LG + Γ4 + F model in CODEML (Yang 2007). The assumed divergence time between the two species is approximately 469 Ma (Edwards et al. 2014; Morris et al. 2018; Tomescu et al. 2018), and hence the mean rate was 0.138/4.69 = 0.0294 for 100 clock-like genes (meaning 2.94 × 10−10 amino acid substitutions per site per year). The shape parameter of the gamma distribution prior on rate was fixed to 2 following Morris et al. (2018), so that the scale parameter was set to 68.

Time Priors

Fossil calibrations and the birth–death process are the important sources of information for constructing the prior on times. The parameters for the birth–death process were set as λ = μ = 1 and ρ = 0.0003. The sampling proportion (ρ) of 0.03% was based on our sample size (120 taxa) compared with the number of extant Streptophyte species (∼386,969) (http://www.theplantlist.org, last accessed September 20, 2019; Shaw et al. 2011; Pteridophyte Phylogeny Group 2016). The ML estimates of the branch lengths were calculated based on the LG + Γ4 + F amino acid substitution model using CODEML program. The MCMCTree combined the calibration distributions and the birth–death process model to generate the joint priors. We ran the MCMC analyses without sequence data to obtain the effective priors.

Fossil Constraints

We applied 22 fossil calibrations in our analyses (fig. 3). The details of 22 selected fossil calibrations following contemporary standards (Parham et al. 2012) are available in the supplementary table S3, Supplementary Material online. Uniform distributions were employed at nodes 1–3 and 17–22 with a hard minimum age (pL = 1e-300) and a soft maximum age (pU = 0.025). We applied Cauchy distributions with 2.5% left tail probability at nodes 13–16. We tested the impact of different maximum bounds on nodes 4–12 owing to the controversy on the maximum constraint of land plants (Clarke et al. 2011; Hedges et al. 2018; Morris et al. 2018). In Strategy 1, we used uniform distributions with a hard minimum bound (pL = 1e-300) and a soft maximum bound (pU = 0.025) corresponding to 515.5 Ma for nodes 4–12. In Strategy 2, we changed the soft maximum ages from 515.5 to 1,042 Ma. In Strategy 3, no maximum bound was imposed on these nodes, and the minimum bound was represented using a truncated Cauchy distribution with 2.5% left tail probability. We constructed calibration densities for three calibration strategies by MCMCTreeR (Puttick 2019).

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online. Click here for additional data file.
  91 in total

1.  Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis.

Authors:  J Castresana
Journal:  Mol Biol Evol       Date:  2000-04       Impact factor: 16.240

2.  Molecular evidence for the early colonization of land by fungi and plants.

Authors:  D S Heckman; D M Geiser; B R Eidell; R L Stauffer; N L Kardos; S B Hedges
Journal:  Science       Date:  2001-08-10       Impact factor: 47.728

Review 3.  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Authors:  S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman
Journal:  Nucleic Acids Res       Date:  1997-09-01       Impact factor: 16.971

4.  Establishing a time-scale for plant evolution.

Authors:  John T Clarke; Rachel C M Warnock; Philip C J Donoghue
Journal:  New Phytol       Date:  2011-07-06       Impact factor: 10.151

5.  Chloroplast phylogeny indicates that bryophytes are monophyletic.

Authors:  Tomoaki Nishiyama; Paul G Wolf; Masanori Kugita; Robert B Sinclair; Mamoru Sugita; Chika Sugiura; Tatsuya Wakasugi; Kyoji Yamada; Koichi Yoshinaga; Kazuo Yamaguchi; Kunihiko Ueda; Mitsuyasu Hasebe
Journal:  Mol Biol Evol       Date:  2004-07-07       Impact factor: 16.240

6.  Posterior Summarization in Bayesian Phylogenetics Using Tracer 1.7.

Authors:  Andrew Rambaut; Alexei J Drummond; Dong Xie; Guy Baele; Marc A Suchard
Journal:  Syst Biol       Date:  2018-09-01       Impact factor: 15.683

7.  Orthograph: a versatile tool for mapping coding nucleotide sequences to clusters of orthologous genes.

Authors:  Malte Petersen; Karen Meusemann; Alexander Donath; Daniel Dowling; Shanlin Liu; Ralph S Peters; Lars Podsiadlowski; Alexandros Vasilikopoulos; Xin Zhou; Bernhard Misof; Oliver Niehuis
Journal:  BMC Bioinformatics       Date:  2017-02-16       Impact factor: 3.169

8.  Whole-genome analyses resolve early branches in the tree of life of modern birds.

Authors:  Erich D Jarvis; Siavash Mirarab; Andre J Aberer; Bo Li; Peter Houde; Cai Li; Simon Y W Ho; Brant C Faircloth; Benoit Nabholz; Jason T Howard; Alexander Suh; Claudia C Weber; Rute R da Fonseca; Jianwen Li; Fang Zhang; Hui Li; Long Zhou; Nitish Narula; Liang Liu; Ganesh Ganapathy; Bastien Boussau; Md Shamsuzzoha Bayzid; Volodymyr Zavidovych; Sankar Subramanian; Toni Gabaldón; Salvador Capella-Gutiérrez; Jaime Huerta-Cepas; Bhanu Rekepalli; Kasper Munch; Mikkel Schierup; Bent Lindow; Wesley C Warren; David Ray; Richard E Green; Michael W Bruford; Xiangjiang Zhan; Andrew Dixon; Shengbin Li; Ning Li; Yinhua Huang; Elizabeth P Derryberry; Mads Frost Bertelsen; Frederick H Sheldon; Robb T Brumfield; Claudio V Mello; Peter V Lovell; Morgan Wirthlin; Maria Paula Cruz Schneider; Francisco Prosdocimi; José Alfredo Samaniego; Amhed Missael Vargas Velazquez; Alonzo Alfaro-Núñez; Paula F Campos; Bent Petersen; Thomas Sicheritz-Ponten; An Pas; Tom Bailey; Paul Scofield; Michael Bunce; David M Lambert; Qi Zhou; Polina Perelman; Amy C Driskell; Beth Shapiro; Zijun Xiong; Yongli Zeng; Shiping Liu; Zhenyu Li; Binghang Liu; Kui Wu; Jin Xiao; Xiong Yinqi; Qiuemei Zheng; Yong Zhang; Huanming Yang; Jian Wang; Linnea Smeds; Frank E Rheindt; Michael Braun; Jon Fjeldsa; Ludovic Orlando; F Keith Barker; Knud Andreas Jønsson; Warren Johnson; Klaus-Peter Koepfli; Stephen O'Brien; David Haussler; Oliver A Ryder; Carsten Rahbek; Eske Willerslev; Gary R Graves; Travis C Glenn; John McCormack; Dave Burt; Hans Ellegren; Per Alström; Scott V Edwards; Alexandros Stamatakis; David P Mindell; Joel Cracraft; Edward L Braun; Tandy Warnow; Wang Jun; M Thomas P Gilbert; Guojie Zhang
Journal:  Science       Date:  2014-12-12       Impact factor: 47.728

9.  CD-HIT: accelerated for clustering the next-generation sequencing data.

Authors:  Limin Fu; Beifang Niu; Zhengwei Zhu; Sitao Wu; Weizhong Li
Journal:  Bioinformatics       Date:  2012-10-11       Impact factor: 6.937

10.  Uncertainty in the Timing of Origin of Animals and the Limits of Precision in Molecular Timescales.

Authors:  Mario dos Reis; Yuttapong Thawornwattana; Konstantinos Angelis; Maximilian J Telford; Philip C J Donoghue; Ziheng Yang
Journal:  Curr Biol       Date:  2015-10-22       Impact factor: 10.834

View more
  6 in total

Review 1.  Plant protein-coding gene families: Their origin and evolution.

Authors:  Yuanpeng Fang; Junmei Jiang; Xiaolong Hou; Jiyuan Guo; Xiangyang Li; Degang Zhao; Xin Xie
Journal:  Front Plant Sci       Date:  2022-09-07       Impact factor: 6.627

2.  Origin and adaptive evolution of UV RESISTANCE LOCUS 8-mediated signaling during plant terrestrialization.

Authors:  Zhenhua Zhang; Chenjie Xu; Shiyu Zhang; Chen Shi; Hong Cheng; Hongtao Liu; Bojian Zhong
Journal:  Plant Physiol       Date:  2022-01-20       Impact factor: 8.005

Review 3.  Evolutionary insight of plant cuticle biosynthesis in bryophytes.

Authors:  Haoyu Li; Cheng Chang
Journal:  Plant Signal Behav       Date:  2021-06-23

4.  Microbiome and related structural features of Earth's most archaic plant indicate early plant symbiosis attributes.

Authors:  Anchittha Satjarak; G Karen Golinski; Marie T Trest; Linda E Graham
Journal:  Sci Rep       Date:  2022-04-20       Impact factor: 4.996

5.  Cryogenian Glacial Habitats as a Plant Terrestrialisation Cradle - The Origin of the Anydrophytes and Zygnematophyceae Split.

Authors:  Jakub Žárský; Vojtěch Žárský; Martin Hanáček; Viktor Žárský
Journal:  Front Plant Sci       Date:  2022-01-27       Impact factor: 5.753

Review 6.  Pervasive genome duplications across the plant tree of life and their links to major evolutionary innovations and transitions.

Authors:  Xin Qiao; Shaoling Zhang; Andrew H Paterson
Journal:  Comput Struct Biotechnol J       Date:  2022-06-15       Impact factor: 6.155

  6 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.