Joseph F Walker, Nathanael Walker-Hale, Oscar M Vargas, Drew A Larson, Gregory W Stull.
Abstract
Evolutionary relationships among plants have been inferred primarily using chloroplast data. To date, no study has comprehensively examined the plastome for gene tree conflict. Using a broad sampling of angiosperm plastomes, we characterize gene tree conflict among plastid genes at various time scales and explore correlates of conflict (e.g., evolutionary rate, gene length, molecule type). We uncover notable gene tree conflict against a backdrop of largely uninformative genes. We find that alignment length and tree length are strong predictors of concordance, and that nucleotides outperform amino acids. Of the most commonly used markers, matK greatly outperforms rbcL; however, the rarely used gene rpoC2 is the top-performing gene in every analysis. We find that rpoC2 reconstructs angiosperm phylogeny as well as the entire concatenated set of protein-coding chloroplast genes. Our results suggest that longer genes are superior for phylogeny reconstruction. The alleviation of some conflict through the use of nucleotides suggests that stochastic and systematic error is likely the root of most of the observed conflict, but further research on biological conflict within the plastome is warranted given documented cases of heteroplasmic recombination. We suggest that researchers should filter genes for topological concordance when performing downstream comparative analyses on phylogenetic data, even when using chloroplast genomes. © 2019 Walker et al.
Keywords: Angiosperms; Chloroplast; Gene tree conflict; Phylogenomics; Plastome; matK; rbcL; rpoC2
Year: 2019 PMID: 31579615 PMCID: PMC6764362 DOI: 10.7717/peerj.7747
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Summary of chloroplast conflict against the reference phylogeny of angiosperms.
Green, orange, and purple boxes indicate where the amino acid, nucleotide, or MQSST phylogenies conflict with the reference phylogeny. Pie charts depict the amount of gene tree conflict observed in the nucleotide analysis, with the blue, green, red and gray slices representing, respectively, the proportion of gene trees concordant, conflicting (supporting a single main alternative topology), conflicting (supporting various alternative topologies), and uninformative (BS < 70 or missing taxon) at each node in the species tree. The dashed lines represent 30 myr time intervals (positioned based on Magallón et al., 2015 and Vargas, Ortiz & Simpson, 2017) used to bin nodes for examinations of conflict at different levels of divergence.
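The four pie-chart categories described in this caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the representation of a gene tree as a list of (clade, bootstrap) pairs, the rule that a gene lacking any member of the reference clade is uninformative, and the toy data are all assumptions made for illustration. Only the BS < 70 cutoff and the four categories come from the caption.

```python
from collections import Counter

BS_CUTOFF = 70  # support threshold named in the caption


def classify_gene(ref_clade, gene_taxa, gene_clades, cutoff=BS_CUTOFF):
    """Label one gene tree against one reference clade.

    gene_clades is a list of (frozenset_of_taxa, bootstrap) pairs.
    Returns (label, conflicting_clade_or_None).
    """
    ref_clade = frozenset(ref_clade)
    # Simplifying assumption: a gene missing any member of the clade
    # cannot speak to it and is treated as uninformative.
    if not ref_clade <= gene_taxa:
        return "uninformative", None
    conflict = None
    for clade, bs in gene_clades:
        if bs < cutoff:
            continue  # poorly supported clades are ignored
        if clade == ref_clade:
            return "concordant", None
        # A supported clade that partially overlaps the reference clade
        # (neither nested nor disjoint) contradicts it; keep the first one.
        if clade & ref_clade and clade - ref_clade and ref_clade - clade:
            if conflict is None:
                conflict = clade
    if conflict is not None:
        return "conflicting", conflict
    return "uninformative", None


def summarize_node(ref_clade, genes):
    """Tally the four categories for one node of the reference tree."""
    counts = Counter()
    alternatives = Counter()
    for gene_taxa, gene_clades in genes.values():
        label, alt = classify_gene(ref_clade, gene_taxa, gene_clades)
        counts[label] += 1
        if alt is not None:
            alternatives[alt] += 1
    # The most frequent conflicting clade stands in for the
    # "single main alternative topology".
    main_alt = alternatives.most_common(1)[0][1] if alternatives else 0
    return {
        "concordant": counts["concordant"],
        "main_conflict": main_alt,
        "other_conflict": counts["conflicting"] - main_alt,
        "uninformative": counts["uninformative"],
    }


# Hypothetical toy data: four taxa A-D, reference clade {A, B}.
toy_genes = {
    "gene1": (frozenset("ABCD"), [(frozenset("AB"), 95)]),
    "gene2": (frozenset("ABCD"), [(frozenset("BC"), 80)]),
    "gene3": (frozenset("ABCD"), [(frozenset("BC"), 90)]),
    "gene4": (frozenset("ABCD"), [(frozenset("AC"), 75)]),
    "gene5": (frozenset("ABCD"), [(frozenset("AB"), 50)]),  # BS < 70
    "gene6": (frozenset("ACD"), [(frozenset("CD"), 99)]),   # taxon B missing
}
summary = summarize_node("AB", toy_genes)
```

Treating clades as rooted taxon sets keeps the sketch short; an unrooted bipartition comparison would need the complement of each clade as well.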
Figure 2. Gene tree concordance/conflict at varying time scales.
Each diagram represents a different molecule type and shows the proportion of concordance each gene exhibits at the five time slices shown in Fig. 1: (1) 150–121 mya, (2) 120–91 mya, (3) 90–61 mya, (4) 60–31 mya, and (5) 30–0 mya; C is the concordance summed over all time slices. The individual genes are scaled by length of alignment; however, ycf1 and ycf2 are cut to approximately the length of rpoC2 due to their extremely long alignments. (A) Results from amino acid data considering only nodes with bootstrap support ≥70%. (B) Results from nucleotide data considering only nodes with bootstrap support ≥70%. The plots along the bottom show relationships between gene concordance levels and alignment length and tree length, excluding outlying genes (see Methods). Each point represents the proportion of concordance considering only nodes with bootstrap support ≥70%. Red lines show the predicted values from logistic regression and asterisks give the p-value of the relationship from univariate logistic regression, ∗∗∗ = p < 0.001. (C) Logistic regression of concordant nodes from amino acid data against alignment length. (D) Logistic regression of concordant nodes from amino acid data against tree length. (E) Logistic regression of concordant nodes from nucleotide data against alignment length. (F) Logistic regression of concordant nodes from nucleotide data against tree length.
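As a rough illustration of the time-slice binning behind this figure, the Python sketch below assigns node ages to the five 30 myr slices and computes a per-slice proportion of concordance for one gene. The function names, the handling of non-integer ages at slice boundaries, the definition of the proportion as concordant nodes over concordant plus conflicting nodes, and the toy ages are assumptions, not details taken from the paper.

```python
import math
from collections import defaultdict

N_SLICES = 5     # slices (1)-(5) listed in the caption
WIDTH_MYR = 30   # width of each slice in million years


def time_slice(age_mya):
    """Map a node age to its slice: 1 = 150-121 mya, ..., 5 = 30-0 mya.

    Assumption: a fractional age above a boundary (e.g. 120.5) falls
    in the older slice.
    """
    if not 0 <= age_mya <= N_SLICES * WIDTH_MYR:
        raise ValueError(f"age {age_mya} is outside 0-150 mya")
    bins_from_present = max(1, math.ceil(age_mya / WIDTH_MYR))
    return N_SLICES - bins_from_present + 1


def concordance_by_slice(nodes):
    """Proportion of concordant nodes per slice for one gene.

    nodes is an iterable of (age_mya, label), where label is
    'concordant', 'conflicting', or 'uninformative'. Uninformative
    nodes (e.g. BS < 70) are left out of the denominator.
    """
    tallies = defaultdict(lambda: [0, 0])  # slice -> [concordant, informative]
    for age, label in nodes:
        if label == "uninformative":
            continue
        tally = tallies[time_slice(age)]
        tally[1] += 1
        if label == "concordant":
            tally[0] += 1
    return {s: c / n for s, (c, n) in sorted(tallies.items())}


# Hypothetical node ages and labels for a single gene.
toy_nodes = [
    (140, "concordant"),
    (125, "conflicting"),
    (100, "concordant"),
    (45, "uninformative"),
    (10, "concordant"),
    (5, "concordant"),
]
toy_result = concordance_by_slice(toy_nodes)
```

Slices with no informative nodes are simply absent from the returned dictionary, which avoids reporting a proportion where none can be computed.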
Figure 3. Histograms depicting number of concordant edges each gene tree contains compared to the reference phylogeny (i.e., the AT).
Histograms are binned by number of concordant nodes; bar heights give the number of genes in each bin. Commonly used markers (matK, ndhF, rbcL, and ycf1) are labeled on the graph, along with the most concordant gene (rpoC2) and the number of concordant nodes for the complete chloroplast (CC) compared to the AT. (A) Concordance in the nucleotide dataset not considering bootstrap support. (B) Concordance in the amino acid dataset not considering bootstrap support. (C) Concordance in the nucleotide dataset considering bootstrap support ≥ 70%. (D) Concordance in the amino acid dataset considering bootstrap support ≥70%.
Log-likelihood scores, rounded to the nearest whole number, for alternative resolutions of contentious nodes based on edge-based analyses using a modification of the Maximum Gene-Wise Edge (MGWE) method (Walker, Brown & Smith, 2018).
The best-supported relationship (highest log-likelihood score) for each case is presented in bold.
| Contentious relationships | GT | AT | CC | Alternative topology (>3 genes supporting) |
|---|---|---|---|---|
| −618371 | −618371 | |||
| Lamiid relationships | −618235 | |||
| −618535 | N/A (BS <70) | N/A | ||
| NA | −618308 | −618319 |
Contentious relationships tested using edge-based analyses.
Individual genes supporting, with BS ≥70, the alternative topologies examined for each contentious relationship.
| Contentious relationships | GT | AT | CC | Alternative topology (>3 genes supporting) |
|---|---|---|---|---|
| Lamiid relationships | ||||
| None | N/A | N/A |