Olle Thureborn, Sylvain G Razafimandimbison, Niklas Wikström, Catarina Rydin.
Abstract
Subfamily Rubioideae is the largest of the main lineages in the coffee family (Rubiaceae), with over 8,000 species and 29 tribes. Phylogenetic relationships among tribes and other major clades within this group of plants are still only partly resolved despite considerable efforts. While previous studies have mainly utilized data from the organellar genomes and nuclear ribosomal DNA, we here use a large number of low-copy nuclear genes obtained via a target capture approach to infer phylogenetic relationships within Rubioideae. We included 101 Rubioideae species representing all but two (the monogeneric tribes Foonchewieae and Aitchisonieae) of the currently recognized tribes, and all but one non-monogeneric tribe were represented by more than one genus. Using data from the 353 genes targeted with the universal Angiosperms353 probe set, we investigated the impact of data type, analytical approach, and potential paralogs on phylogenetic reconstruction. We inferred a robust phylogenetic hypothesis of Rubioideae, with the vast majority (or all) of the nodes being highly supported across all analyses and datasets and few incongruences between the inferred topologies. The results were similar to those of previous studies, but novel relationships were also identified. We found that supercontigs [coding sequence (CDS) + non-coding sequence] clearly outperformed CDS data in levels of support and gene tree congruence. The full datasets (353 genes) outperformed the datasets with potentially paralogous genes removed (186 genes) in levels of support but showed slightly higher gene tree incongruence. The pattern of gene tree conflict at short internal branches was often consistent with high levels of incomplete lineage sorting (ILS) due to rapid speciation in the group. While concatenation- and coalescence-based trees mainly agreed, the observed phylogenetic discordance between the two approaches may be best explained by their differences in accounting for ILS.
The use of target capture data greatly improved our confidence and understanding of the Rubioideae phylogeny, highlighted by the increased support for previously uncertain relationships and the increased possibility to explore sources of underlying phylogenetic discordance.
Keywords: Angiosperms353; Rubiaceae; Rubioideae; incomplete lineage sorting; non-coding DNA; nuclear phylogeny; phylogenomics; target capture
Year: 2022 PMID: 36160958 PMCID: PMC9493367 DOI: 10.3389/fpls.2022.967456
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 6.627
Characteristics of assembled datasets used for phylogenetic inference.
| Dataset | # of loci | Concatenated length (bp) | # of PIS (%) | Average taxon coverage (%) | Average alignment length (bp) | Average # of PIS | Average PIS (%) |
|---|---|---|---|---|---|---|---|
| Full supercontig | 353 | 1,055,164 | 876,813 (83.1%) | 117/124 (94.4%) | 2,989 | 2,484 | 82.7 |
| Full CDS | 353 | 310,806 | 169,772 (54.6%) | 117/124 (94.4%) | 880 | 481 | 53.6 |
| Paralog-filtered supercontig | 186 | 632,932 | 526,877 (83.2%) | 115/124 (92.7%) | 3,403 | 2,833 | 82.9 |
| Paralog-filtered CDS | 186 | 181,088 | 99,394 (54.9%) | 115/124 (92.7%) | 974 | 534 | 53.7 |
PIS, parsimony informative sites.
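
The PIS counts in the table rest on the standard definition of a parsimony informative site: an alignment column in which at least two character states each occur in at least two sequences. The following is a minimal sketch of that definition in Python, not the authors' pipeline. The function names and the toy alignment are hypothetical, and treating gaps, `?`, and `N` as missing data is an assumption.

```python
from collections import Counter

# Characters treated as missing data (an assumption for this sketch).
MISSING = {"-", "?", "N"}


def is_parsimony_informative(column):
    """Return True if at least two states each occur in at least two sequences."""
    counts = Counter(c.upper() for c in column if c.upper() not in MISSING)
    states_seen_twice = sum(1 for n in counts.values() if n >= 2)
    return states_seen_twice >= 2


def count_pis(alignment):
    """Count parsimony informative sites in a list of equal-length sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences in an alignment must have the same length")
    # zip(*alignment) iterates over alignment columns.
    return sum(is_parsimony_informative(col) for col in zip(*alignment))


# Hypothetical toy alignment: four taxa, five sites.
toy_alignment = ["ACGTA", "ACGTC", "ATGAC", "ATG-A"]
n_pis = count_pis(toy_alignment)
pct_pis = 100 * n_pis / len(toy_alignment[0])
print(f"{n_pis} PIS ({pct_pis:.1f}% of sites)")
```

The percentage reported next to each PIS count in the table is the same ratio applied to the concatenated alignment, for example 876,813 / 1,055,164 ≈ 83.1% for the full supercontig dataset.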
Phylogenetic inference performance of the assembled datasets for attributes under consideration.
| Dataset | Coalescence (ASTRAL): normalized quartet score | Coalescence (ASTRAL): # of branches with support < 95% (ingroup / global) | Coalescence (ASTRAL): # of branches for which a polytomy could not be rejected (ingroup / global) | Coalescence (ASTRAL): average LPP | Concatenation (IQ-TREE): # of branches with support < 95% (ingroup / global) | Concatenation (IQ-TREE): average BS |
|---|---|---|---|---|---|---|
| Full supercontig | 0.930 | 6 / 8 | 4 / 5 | 0.983 | 0 / 0 | 99.9 |
| Full CDS | 0.880 | 13 / 17 | 8 / 11 | 0.964 | 7 / 10 | 97.8 |
| Paralog-filtered supercontig | 0.939 | 8 / 11 | 9 / 10 | 0.973 | 1 / 4 | 99.6 |
| Paralog-filtered CDS | 0.882 | 17 / 23 | 14 / 19 | 0.945 | 6 / 10 | 97.5 |
FIGURE 1. Coalescent-based species tree estimated using ASTRAL on the full supercontig dataset. Numbers below branches denote local posterior probability (LPP) support values. Only support values smaller than 100% are shown. Pie charts show relative frequencies of the three quartet topologies around the branch (blue = congruent with species tree, yellow = first alternative topology, red = second alternative topology). Asterisks next to pie charts indicate failure to reject the hypothesis that the branch is a polytomy. Bullets after species names indicate samples downloaded from ENA. Inset shows branch lengths in coalescent units.
FIGURE 2. Concatenation-based tree estimated using IQ-TREE on the full supercontig dataset. Numbers above branches denote ultrafast bootstrap (BS) support values. Only support values smaller than 100% are shown. Bullets after species names indicate samples downloaded from ENA. Inset shows branch lengths in number of substitutions per site.
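
The pie charts and asterisks in Figure 1 summarize, for each branch, how often gene trees support each of the three possible quartet topologies and whether those frequencies differ from the equal thirds expected under a polytomy. The sketch below illustrates that idea with a plain chi-square goodness-of-fit test. It is a simplified illustration and not ASTRAL's own implementation; the function name, the quartet counts, and the 0.05 threshold are hypothetical.

```python
import math


def quartet_summary(q_main, q_alt1, q_alt2):
    """Summarize quartet support around one branch.

    Returns the relative frequencies of the three topologies, the chi-square
    statistic against equal expected counts, and the corresponding p-value.
    """
    counts = (q_main, q_alt1, q_alt2)
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one quartet count must be positive")
    freqs = tuple(q / total for q in counts)
    expected = total / 3  # a polytomy predicts equal support for all three
    chi2 = sum((q - expected) ** 2 / expected for q in counts)
    # For 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    p_value = math.exp(-chi2 / 2)
    return freqs, chi2, p_value


# Hypothetical counts for two branches (not taken from the study).
branches = {
    "well resolved branch": (300, 30, 23),
    "short branch": (120, 115, 118),
}

ALPHA = 0.05  # assumed significance threshold
for name, counts in branches.items():
    freqs, chi2, p = quartet_summary(*counts)
    verdict = "polytomy rejected" if p < ALPHA else "polytomy not rejected"
    shares = ", ".join(f"{f:.2f}" for f in freqs)
    print(f"{name}: frequencies = ({shares}), p = {p:.3g}, {verdict}")
```

A branch whose three frequencies are close to one third each, as in the second hypothetical case, is the kind of branch that would carry an asterisk in the figure. Two roughly equal minority frequencies alongside a larger majority frequency are the pattern generally expected under incomplete lineage sorting.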