Jian Liu, Anders J Lindstrom, Xun Gong.
Abstract
BACKGROUND: Plastid genomes (plastomes) have great potential for resolving phylogenetic relationships at multiple scales, but few studies have focused on how the genetic characteristics of plastid genes, such as genetic variation and phylogenetic discordance, influence the resolution of phylogeny within a lineage. Here we examine plastome characteristics of Cycas L., the most diverse genus among extant cycads, and investigate the deep phylogenetic relationships within Cycas by sampling 47 plastomes representing all major clades from six sections.
Keywords: Cycads; Cycas; Gene tree discordance; Plastid phylogenomics; Plastome evolution
Year: 2022 PMID: 35291941 PMCID: PMC8922756 DOI: 10.1186/s12870-022-03491-2
Source DB: PubMed Journal: BMC Plant Biol ISSN: 1471-2229 Impact factor: 4.215
Fig. 1 Sliding-window analysis of the whole chloroplast genomes of 47 Cycas. The window length is 600 bp and the step size is 100 bp. The X-axis denotes the midpoint position of each window; the Y-axis shows the nucleotide diversity (Pi) of each window. The orange line denotes the Pi threshold of 0.006 used to screen highly variable regions
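The sliding-window scan described in this caption can be illustrated with a minimal Python sketch. It assumes Pi is the mean proportion of pairwise differences per site, that sites with gaps or `N` are skipped, and that window midpoints are reported as 0-based offsets; the function names are hypothetical and this is not necessarily the software or exact definition the authors used.

```python
from itertools import combinations

def pairwise_diff_fraction(a, b):
    """Proportion of differing sites between two aligned sequences,
    skipping any site where either sequence has a gap or N."""
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0

def nucleotide_diversity(seqs):
    """Mean pairwise difference per site over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(pairwise_diff_fraction(a, b) for a, b in pairs) / len(pairs)

def sliding_window_pi(seqs, window=600, step=100):
    """Return (midpoint, Pi) for each window along an alignment."""
    length = len(seqs[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results

def high_variation_windows(results, threshold=0.006):
    """Keep only windows whose Pi exceeds the screening threshold."""
    return [(mid, pi) for mid, pi in results if pi > threshold]
```

With the caption's settings, `sliding_window_pi(aligned_seqs)` followed by `high_variation_windows(...)` would list candidate hotspot windows, where `aligned_seqs` is a list of equal-length aligned plastome strings.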
Fig. 2 Genetic variation and substitution rates among plastid protein-coding genes (PCGs) and among functional groups. a Estimates of nonsynonymous (dN) and synonymous (dS) substitution rates and dN/dS of 69 plastid PCGs with dN > 0.0003. b dN/dS, c nucleotide diversity (π), d percentage of variation (PV) and e gene length of functional groups. Detailed information on the functional groups is provided in Table S3
Fig. 3 Phylogram of the maximum likelihood (ML) tree of Cycas based on the full plastomic dataset. Colored dots on the branches represent different bootstrap percentage (BP) ranges, indicated at the bottom left. Numbers on the nodes represent BP values lower than 90% and the corresponding posterior probabilities (PP) from MrBayes, respectively. The ‘-’ symbol next to a BP value indicates that the clade is not supported by Bayesian inference. The inset map depicts the distribution of the revealed clades in the corresponding colors
Fig. 4 Tanglegram comparing Cycas phylogenies based on plastid protein-encoding genes. a Maximum likelihood cladogram of Cycas based on concatenated genes using IQTREE. Maximum likelihood bootstrap (BS) values and posterior probabilities (PP) calculated by MrBayes are shown at nodes, respectively, except for nodes with 100% (BS) and 1.0 (PP); ‘-’ indicates no support value. b Cladogram generated by the coalescent method in ASTRAL-III. Numbers on the branches depict local posterior probabilities (LPP); LPP values below 0.9 are not shown. Conflicting lineages are highlighted in red font. The highlighted clades I-IV on the left correspond to Fig. 3, and the subclades corresponding to the morphological classification are indicated on the right
Fig. 5 Substitution rates of plastid protein-coding genes in different Cycas taxa. a Estimates of nonsynonymous (dN) and synonymous (dS) substitution rates of 47 Cycas. Taxa are grouped by section; b dN/dS of the six sections of Cycas; c dN/dS of the four major phylogenetic clades; d dN/dS of the seven revealed phylogenetic subclades. See Fig. 3 for the definition of sections and phylogenetic (sub)clades
Fig. 6 Gene tree clusters revealed by TREESPACE landscape analyses (Robinson-Foulds) based on maximum likelihood (ML) tree topologies, and comparisons of characteristics among the three clusters. a Principal coordinate analysis depicting ordinations of species trees versus 82 plastid protein-coding gene trees. Note that 11 genes overlap in Cluster 3 (red dots). The gene names of each cluster can be found in Table 1. b-e Boxplots comparing the three gene clusters for variation in phylogenetic signal across the plastid genes, as identified by the TREESPACE analysis. Box-and-whisker plots indicate the median (horizontal line), 25th and 75th percentiles (bottom and top of the box), and limits of the 95% confidence intervals (lower and upper whiskers). Dots beyond the 95% confidence intervals are outliers. Asterisks indicate levels of significant difference between clusters (*: <0.05; **: <0.01; ***: <0.001). b Comparison of the mean GC content of genes among clusters. c Comparison of the aligned sequence lengths (numbers of sites) of genes in the three clusters. d Comparison of the proportion of genetic variance for genes among clusters. e Comparison of the ratio of nonsynonymous to synonymous substitution rates for each gene in the three clusters
Fig. 7 Gene tree clusters revealed by TREESPACE landscape analyses (Robinson-Foulds) based on Bayesian inference (BI) tree topologies, and comparisons of characteristics among the three clusters. a Principal coordinate analysis depicting ordinations of species trees versus 82 plastid protein-coding gene trees. Note that 12 genes overlap in Cluster 3 (red dots). The gene names of each cluster can be found in Table 1. b-e Boxplots comparing the three gene clusters for variation in phylogenetic signal across the plastid genes, as identified by the TREESPACE analysis. Box-and-whisker plots indicate the median (horizontal line), 25th and 75th percentiles (bottom and top of the box), and limits of the 95% confidence intervals (lower and upper whiskers). Dots beyond the 95% confidence intervals are outliers. Asterisks indicate levels of significant difference between clusters (*: <0.05; **: <0.01; ***: <0.001). b Comparison of the mean GC content of genes among clusters. c Comparison of the aligned sequence lengths (numbers of sites) of genes in the three clusters. d Comparison of the proportion of genetic variance for genes among clusters. e Comparison of the ratio of nonsynonymous to synonymous substitution rates for each gene in the three clusters
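The Robinson-Foulds metric named in the Fig. 6 and Fig. 7 captions counts the bipartitions (splits) found in one tree but not the other. The following Python sketch shows the idea on toy trees written as nested tuples of leaf names. It is an illustration only, not the TREESPACE implementation: the tree encoding and function names are assumptions, and both trees are assumed to share the same leaf set.

```python
def leaf_set(tree):
    """All leaf names under a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def bipartitions(tree):
    """Non-trivial splits of an unrooted tree, each stored as the side
    that does not contain a fixed anchor leaf (canonical form)."""
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - leaves if anchor in leaves else leaves
        # Keep only splits with at least two leaves on each side.
        if 1 < len(side) < len(all_leaves) - 1:
            splits.add(side)
        return leaves

    visit(tree)
    return splits

def rf_distance(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(bipartitions(tree_a) ^ bipartitions(tree_b))

def rf_matrix(trees):
    """Symmetric matrix of pairwise RF distances among trees."""
    return [[rf_distance(a, b) for b in trees] for a in trees]
```

A matrix such as the one returned by `rf_matrix` is the kind of input a principal coordinate analysis would ordinate, so gene trees with similar topologies fall close together.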
Table 1 Gene clusters inferred by different tree inference methods (maximum likelihood and Bayesian inference) and lists of genes in each cluster. Gene names in bold are those that differ between the two methods within a cluster. Note that the ycf1 gene clusters with the combined tree dataset under the maximum likelihood method and is therefore not listed here
| Gene cluster | Genes grouped by Maximum Likelihood | Genes grouped by Bayesian Inference |
|---|---|---|
| Cluster 1 (8 shared) | 8 genes: | 14 genes: |
| Cluster 2 (56 shared) | 62 genes: | 56 genes: |
| Cluster 3 (11 shared) | 11 genes: | 12 genes: |