Ting Yang (1, 2, 3), Sunil Kumar Sahu (1, 2), Lingxiao Yang (4), Yang Liu (1, 2), Weixue Mu (1, 2), Xin Liu (1, 2), Mikael Lenz Strube (3), Huan Liu (1, 2, 5), Bojian Zhong (4)
Abstract
The plastid is an organelle essential for many vital cellular processes and for plant growth and development. The large number of complete plastid genomes now available can be used to study the evolution of plastid genomes and the phylogenetic relationships among plants. We comprehensively analyzed the plastid genomes of 3,654 Viridiplantae taxa from 298 families and 111 orders and compared their genomic organization among major clades, including gene gain/loss, gene copy number, GC content, and gene blocks. We found that some important genes with similar functions likely form gene blocks; for example, genes of the psb family appear to co-occur and form gene blocks across Viridiplantae. The inverted repeats (IRs) of plastid genomes have doubled in size across land plants, and genes located in the IRs have substantially higher GC content than non-IR genes. Using three data sets [all nucleotide positions (nt123), only the first and second codon positions (nt12), and amino acids (AA)], our phylogenomic analyses recovered Chlorokybales + Mesostigmatales as the earliest-branching lineage of streptophytes. Hornworts, mosses, and liverworts formed a monophyletic group sister to tracheophytes. Based on the nt12 and AA data sets, monocots, Chloranthales, and magnoliids are successive sister lineages to the eudicots + Ceratophyllales clade. The comprehensive taxon sampling and the analysis of different plastid data sets recovered well-supported relationships among green plants, thereby helping to resolve some long-standing uncertainties in plant phylogeny.
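The nt12 data set is derived from the nt123 alignment by discarding every third codon position. The Python sketch below illustrates that step under stated assumptions: each sequence is an in-frame, codon-aligned coding region whose length is a multiple of three, and the function names and toy sequences are hypothetical rather than taken from the study's pipeline. Translation to the AA data set is not shown.

```python
"""Sketch: derive an nt12 matrix from a codon-aligned nt123 matrix.

Assumption: every sequence is an in-frame coding alignment whose length is a
multiple of 3. Names and sequences here are hypothetical examples.
"""


def drop_third_positions(seq: str) -> str:
    """Keep codon positions 1 and 2 (nt12) and discard position 3."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    # 0-based index i is a third codon position when i % 3 == 2.
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)


def build_nt12(nt123: dict[str, str]) -> dict[str, str]:
    """Apply drop_third_positions to every taxon in an aligned matrix."""
    return {taxon: drop_third_positions(seq) for taxon, seq in nt123.items()}


if __name__ == "__main__":
    nt123 = {
        "taxon_A": "ATGGCT---TAA",  # gaps are kept as alignment columns
        "taxon_B": "ATGGCCAAATAG",
    }
    for taxon, seq in build_nt12(nt123).items():
        print(taxon, seq)
```

Because whole alignment columns are removed, the taxa in the nt12 matrix stay aligned to one another.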
Keywords: Viridiplantae; gene blocks; inverted repeats; phylogenetics; plastid genome
Year: 2022 PMID: 35498716 PMCID: PMC9038950 DOI: 10.3389/fpls.2022.808156
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 6.627
FIGURE 1. Characteristic features of plastid genomes: genome size, number of protein-coding genes, gene copy number, and number of introns in Viridiplantae. Boxplots represent minimum, median, and maximum values.
FIGURE 2. Overview of GC content in Viridiplantae. (A) GC content variation among the 14 major lineages of Viridiplantae. (B) GC content of the 72 protein-coding genes calculated for five position sets: the first (GC1), second (GC2), and third (GC3) codon positions, all positions (GC123), and the first and second positions combined (GC12). (C) GC content variation in psb family genes. (D) GC content of five genes located in the IR and non-IR regions. Boxplots represent the minimum, median, and maximum GC content. Asterisks indicate significant differences between the respective genes (Student's t-test; ***p < 0.001); ns = not significant.
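The five GC measures in panel (B) differ only in which codon positions are counted. A minimal Python sketch for a single coding sequence follows; it assumes an in-frame sequence and excludes gaps and ambiguous bases from each denominator, choices the caption does not specify. The function name `positional_gc` is hypothetical.

```python
def positional_gc(cds: str) -> dict[str, float]:
    """Return GC1, GC2, GC3, GC12 and GC123 (as fractions) for one coding sequence.

    Assumptions: the sequence is in frame; gaps and ambiguous bases are
    excluded from each denominator. Returns NaN when a denominator is zero.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    gc = [0, 0, 0]     # G+C counts at codon positions 1, 2, 3
    total = [0, 0, 0]  # unambiguous A/C/G/T counts at positions 1, 2, 3
    for i, base in enumerate(cds):
        pos = i % 3
        if base in "ACGT":
            total[pos] += 1
            if base in "GC":
                gc[pos] += 1

    def frac(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "GC1": frac(gc[0], total[0]),
        "GC2": frac(gc[1], total[1]),
        "GC3": frac(gc[2], total[2]),
        "GC12": frac(gc[0] + gc[1], total[0] + total[1]),
        "GC123": frac(sum(gc), sum(total)),
    }


if __name__ == "__main__":
    print(positional_gc("ATGGCCAAATTT"))
```

For "ATGGCCAAATTT" this gives GC1 = GC2 = GC12 = 0.25, GC3 = 0.5, and GC123 = 1/3. Pooling counts over a concatenated matrix, rather than averaging per-gene fractions, would weight longer genes more heavily.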
FIGURE 3. Plastid phylogenomic tree of 3,654 green plants and six Rhodophyta inferred with IQ-TREE from the nt12 matrix of 72 protein-coding genes. Colors in the inner circle indicate families, and colors in the outer circle indicate orders (further details are given in Supplementary Figure 11). Green branches have ultrafast bootstrap (UFBoot) support greater than 95%.
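The branch coloring in Figure 3 thresholds ultrafast bootstrap support at 95%. A rough Python sketch for extracting such values from a Newick tree is given below. It assumes support values are written as internal-node labels immediately after a closing parenthesis, as IQ-TREE does in its `.treefile` when ultrafast bootstrap is enabled (when SH-aLRT is also enabled, labels take the form `SH-aLRT/UFBoot`, and the last field is used). It does not handle quoted labels, the helper names are hypothetical, and a dedicated tree library would be more robust.

```python
import re

# An internal-node label follows a closing parenthesis, e.g. "(A,B)98:0.01".
# With SH-aLRT and UFBoot both enabled, the label looks like "80.5/98".
_NODE_LABEL = re.compile(r"\)([0-9.]+(?:/[0-9.]+)*)")


def ufboot_values(newick: str) -> list[float]:
    """Return the UFBoot value of each labelled internal node (last '/' field)."""
    return [float(label.split("/")[-1]) for label in _NODE_LABEL.findall(newick)]


def well_supported(newick: str, threshold: float = 95.0) -> list[float]:
    """Return support values strictly greater than `threshold`."""
    return [v for v in ufboot_values(newick) if v > threshold]


if __name__ == "__main__":
    tree = "((A:0.1,B:0.1)98:0.05,(C:0.1,D:0.1)80.5/90:0.05)100;"
    print(ufboot_values(tree))
    print(well_supported(tree))
```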
FIGURE 4. Summary of the phylogenomic trees inferred with IQ-TREE from three data sets (nt12, nt123, and AA) of 72 plastid protein-coding genes of 3,654 green plants and six Rhodophyta. Colored branches and vertical lines (right side of the tree) mark clades whose phylogenetic placement conflicts among the three data sets. In total, 631 taxa were retained by selecting one to three representatives from each family (at least one taxon for sparsely sampled families), and the tree is shown at the order level.
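Reducing the 3,654 taxa to an order-level summary relies on per-family subsampling. The caption does not state how representatives were chosen, so the Python sketch below assumes a seeded random draw of up to three taxa per family; families with fewer taxa keep all of them, which guarantees at least one representative per family. The function and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def subsample_by_family(taxa, max_per_family=3, seed=0):
    """Select up to `max_per_family` taxa per family (hypothetical helper).

    taxa: iterable of (taxon_name, family_name) pairs.
    Families with fewer than `max_per_family` taxa keep every taxon, so each
    family contributes at least one representative.
    """
    if max_per_family < 1:
        raise ValueError("max_per_family must be at least 1")
    by_family = defaultdict(list)
    for name, family in taxa:
        by_family[family].append(name)

    rng = random.Random(seed)  # seeded so the subsample is reproducible
    selected = []
    for family in sorted(by_family):         # fixed family order
        members = sorted(by_family[family])  # fixed member order before sampling
        k = min(max_per_family, len(members))
        selected.extend(sorted(rng.sample(members, k)))
    return selected


if __name__ == "__main__":
    toy = [("a1", "F1"), ("a2", "F1"), ("a3", "F1"), ("a4", "F1"), ("b1", "F2")]
    print(subsample_by_family(toy))  # three F1 taxa followed by b1
```

A real study would more likely rank candidates by criteria such as genome completeness; the random draw here only stands in for whatever rule was actually used.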
FIGURE 5. Alternative branching orders for phylogenetically discordant relationships: (A) early Viridiplantae diversification, (B) early diversification of green algae, (C) angiosperm lineages, and (D) early embryophyte diversification. The summarized topologies are based on three data sets (nt12, nt123, and AA) of 72 protein-coding genes of 3,654 green plants and six Rhodophyta inferred with IQ-TREE, together with the nuclear gene-based 1KP data set.