Olga K Kamneva, John Syring, Aaron Liston, Noah A Rosenberg.
Abstract
BACKGROUND: Hybridization is observed in many eukaryotic lineages and can lead to the formation of polyploid species. The study of hybridization and polyploidization faces challenges both in data generation and in accounting for population-level phenomena such as coalescence processes in phylogenetic analysis. Genus Fragaria is one example of a set of plant taxa in which a range of ploidy levels is observed across species, but phylogenetic origins are unknown.
Keywords: Gene trees; Haplotypes; Hybridization; Polyploidy; Species networks; Strawberry; Target capture sequencing
Year: 2017 PMID: 28778145 PMCID: PMC5543553 DOI: 10.1186/s12862-017-1019-7
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Fig. 1 Overview of the analysis. Step 1 of the workflow illustrates DNA extraction and target capture sequencing. Step 2 includes standard NGS data clean-up steps such as read trimming, mapping to a reference, and variant calling, performed as described in the “Sequence and haplotype assembly” subsection of the Methods. Haplotypes were phased by assembly with the HapCompass program, which exploits the fact that alleles of different markers co-occur in NGS reads if they are present within the same haplotype. Step 3 of the workflow includes a procedure for identifying regions of contiguous haplotype assembly across individuals represented by haplotypes within the alignments. Standard phylogeny reconstruction steps are then applied to obtain gene phylogenies. However, because some alignments were short, with few informative sites, we used the SH test to evaluate whether the alignments contain detectable phylogenetic signal. Step 4 illustrates inference of a species tree from a set of gene trees. Step 5 depicts the summarizing of gene trees into a consensus network. Step 6 shows the testing of candidate hybridizations in an ILS-aware framework
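The read co-occurrence idea behind phasing in Step 2 can be illustrated with a toy example. This is only a sketch of the principle, not the HapCompass algorithm; the read representation (a dict from site position to observed allele), the function name `pair_support`, and the example data are all hypothetical.

```python
from collections import Counter

def pair_support(reads, site_a, site_b):
    """Count reads supporting each allele pairing at two variant sites.

    Each read is a dict mapping a site position to the allele observed there
    (an assumed, simplified representation of an aligned NGS read).
    """
    pairs = Counter()
    for read in reads:
        # Only reads spanning both sites carry linkage information.
        if site_a in read and site_b in read:
            pairs[(read[site_a], read[site_b])] += 1
    return pairs

# Hypothetical reads covering two heterozygous sites at positions 100 and 150.
reads = [
    {100: "A", 150: "G"},
    {100: "A", 150: "G"},
    {100: "A", 150: "G"},
    {100: "C", 150: "T"},
    {100: "C", 150: "T"},
    {100: "A", 150: "T"},  # a discordant read, e.g. a sequencing error
    {100: "C"},            # covers one site only, so it is uninformative
]

support = pair_support(reads, 100, 150)
# For a diploid, the two best-supported pairings are the candidate haplotypes.
candidate_haplotypes = [pair for pair, _ in support.most_common(2)]
print(candidate_haplotypes)
```

In a polyploid, more than two pairings can be real, which is one reason dedicated assemblers are needed rather than a simple count like this.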
Fig. 2 Taxonomic and sequence composition of different datasets used for phylogenetic analysis. Sampled species, their geographic range, ploidy level, the number of individuals included from each species, and taxonomic sampling in every dataset are shown. For each dataset, the total number of aligned base pairs is shown as well. Consensus sequences (or haplotypes in the case of dataset 5) from one individual from a diploid outgroup species, Drymocallis glandulosa, were included in every dataset and were used to root gene trees
STEMhy tests of putative hybridizations
| Hybrid (ploidy) | Parents (ploidy) | Outgroup (ploidy of 2 is not shown) | Dataset | Level of Support | Δ AIC | Selected model (% from all bootstrap runs) | Estimated mixture fractions [95% bootstrap interval] |
|---|---|---|---|---|---|---|---|
| | | | 2 | 20% | 302 | hybr (88) | |
| | | | 2 | 20% | 220 | hybr (100) | |
| | | | 2 | 20% | Run failed | | |
| | | | 2 | 40% | 4317 | hybr (100) | |
| | | | 6 | 15% | | | |
| | | | 7 | 15% | | | |
| | | | 8 | 20% | | | |
| | | | 9 | 15% | | | |
| | | | 10 | 15% | 9 | no-hybr (63) | |
| | | | 11 | 20% | 1 | no-hybr (84) | |
| | | | 11 | 20% | 12,450 | hybr (99) | |
| | | | 11 | 30% | 1 | no-hybr (84) | |
| | | | 12 | 20% | 21,892 | hybr (100) | |
| | | | 13 | 20% | 38,048 | hybr (94) | |
| | | | 14 | <15% | 5996 | hybr (67) | |
PhyloNet tests of putative hybridizations
| Hybrid (ploidy) | Parents (ploidy) | Outgroup (ploidy = 2) | Dataset | Level of Support | Δ AIC | Selected model | Estimated mixture fractions |
|---|---|---|---|---|---|---|---|
| | | | 2 | 20% | 10 | hybr | |
| | | | 2 | 20% | 8 | no-hybr | |
| | | | 2 | 20% | 20 | hybr | |
| | | | 2 | 40% | 126 | hybr | |
| | | | 6 | 15% | 57 | hybr | |
| | | | 7 | 15% | 44 | hybr | |
| | | | 8 | 20% | 33 | hybr | |
| | | | 9 | 15% | 56 | hybr | |
| | | | 10 | 15% | 24 | hybr | |
| | | | 11 | 20% | 151 | hybr | |
| | | | 11 | 20% | 340 | hybr | |
| | | | 11 | 30% | 66 | hybr | |
| | | | 12 | 20% | 511 | hybr | |
| | | | 13 | 20% | 408 | hybr | |
| | | | 14 | <15% | 285 | hybr | |
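The Δ AIC and "Selected model" columns in the two tables above rest on a standard comparison between a model with hybridization ("hybr") and one without ("no-hybr"). A minimal sketch follows, assuming the usual definition AIC = 2k − 2 ln L and assuming Δ AIC is reported as the absolute difference between the two models; the function names, log-likelihoods, and parameter counts are hypothetical and are not taken from STEMhy or PhyloNet output.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def compare_models(lnl_no_hybr, k_no_hybr, lnl_hybr, k_hybr):
    """Return (delta AIC, name of the model with the lower AIC)."""
    aic_no_hybr = aic(lnl_no_hybr, k_no_hybr)
    aic_hybr = aic(lnl_hybr, k_hybr)
    selected = "hybr" if aic_hybr < aic_no_hybr else "no-hybr"
    return abs(aic_no_hybr - aic_hybr), selected

# Hypothetical values: the hybridization model has one extra parameter
# and fits better by 10 log-likelihood units.
delta, selected = compare_models(
    lnl_no_hybr=-1050.0, k_no_hybr=5,
    lnl_hybr=-1040.0, k_hybr=6,
)
print(delta, selected)
```

The extra parameter of the hybridization model is penalized, so it is selected only when the gain in likelihood outweighs that penalty.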
Fig. 3 Species trees reconstructed using ASTRAL for various sets of taxa. Topologies were tested using bootstrap trees generated by PhyML. Clades observed in >85% of bootstrap replicates are not labeled, clades observed in 70 to 85% of replicates are marked with small circles, and those observed in <70% of replicates are marked with large circles. Ploidy levels are shown for non-diploid species. Branch lengths are not to scale. Species trees for datasets 1 and 5 are not shown, but they are included in the supplementary material (Additional file 4: Figure S11). Distances between topologies of reconstructed species trees are calculated as described in the text, and they are plotted in the heat map at the top left. Distance ranges are encoded as follows: XS: [0, 0.1], S: (0.1, 0.2], M: (0.2, 0.3], L: (0.3, 0.4], XL: (0.4, 0.5], XXL: (0.5, 0.6]
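The two encodings used in this figure (distance ranges and clade-support markers) can be written out as simple lookups. This is a sketch of the caption's rules only; the function names and marker strings are hypothetical, and a support of exactly 70% or 85% is assumed to fall in the "70 to 85%" class.

```python
# Upper bounds of the half-open intervals (lower, upper] given in the caption;
# the first bin also includes 0.
DISTANCE_BINS = [
    (0.1, "XS"), (0.2, "S"), (0.3, "M"),
    (0.4, "L"), (0.5, "XL"), (0.6, "XXL"),
]

def distance_bin(distance):
    """Map a tree-to-tree distance in [0, 0.6] to its heat-map label."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    for upper, label in DISTANCE_BINS:
        if distance <= upper:
            return label
    raise ValueError("distance exceeds the largest encoded range (0.6)")

def support_marker(bootstrap_pct):
    """Map bootstrap support (percent of replicates) to the clade marker."""
    if bootstrap_pct > 85:
        return "unlabeled"
    if bootstrap_pct >= 70:
        return "small circle"
    return "large circle"

print(distance_bin(0.25), support_marker(78))
```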
Fig. 4 Cluster networks for dataset 8, constructed using all fragments passing the SH test against 100 random trees. The percent indicates the minimum support required for a cluster to be included in the procedure for identifying putative hybridizations: (a) 15%, (b) 20%, (c) 30%, (d) 40%, (e) 50%. The asterisk indicates a potential hybridization event leading to formation of tetraploid F. moupinensis at the highest confidence level. Networks for all datasets appear in Additional file 4: Figure S7
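The role of the minimum-support threshold can be sketched as follows: collect the clusters (sets of taxa below a node) found in each gene tree, then keep those present in at least the given percentage of trees. This is an illustration under stated assumptions, not the network software used in the study; the nested-tuple tree encoding, the function names, the taxon labels A to D, and the use of "greater than or equal to" for the cutoff are all hypothetical.

```python
from collections import Counter

def clusters(tree):
    """Return (leaf set of this subtree, set of clusters found within it).

    A tree is a nested tuple of taxon-name strings, e.g. (("A", "B"), "C").
    """
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clusters = clusters(child)
        leaves |= child_leaves
        found |= child_clusters
    found.add(leaves)
    return leaves, found

def supported_clusters(trees, min_support_pct):
    """Map each cluster to its support (%) if it meets the threshold."""
    counts = Counter()
    for tree in trees:
        all_leaves, found = clusters(tree)
        found.discard(all_leaves)  # the full taxon set is in every tree
        counts.update(found)
    n = len(trees)
    return {c: 100.0 * k / n for c, k in counts.items()
            if 100.0 * k / n >= min_support_pct}

# Five hypothetical gene trees on taxa A-D.
gene_trees = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    ((("A", "B"), "C"), "D"),
]

for threshold in (20, 50, 70):
    kept = supported_clusters(gene_trees, threshold)
    print(threshold, sorted("".join(sorted(c)) for c in kept))
```

At low thresholds, mutually incompatible clusters (here AB and AC) are both retained, and a network needs a reticulation to display them together; raising the threshold removes weakly supported clusters and leaves a more tree-like result.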