Kevin Schneider, Stephan Koblmüller, Kristina M Sefc.
Abstract
The homoplasy excess test (HET) is a tree-based screen for hybrid taxa in multilocus nuclear phylogenies. Homoplasy between a hybrid taxon and the clades containing the parental taxa reduces bootstrap support in the tree. The HET is based on the expectation that excluding the hybrid taxon from the data set increases the bootstrap support for the parental clades, whereas excluding non-hybrid taxa has little effect on statistical node support. To carry out a HET, bootstrap trees are calculated with taxon-jackknife data sets, that is, excluding one taxon (species, population) at a time. Excess increase in bootstrap support for certain nodes upon exclusion of a particular taxon indicates the hybrid (the excluded taxon) and its parents (the clades with increased support).

We introduce a new software program, hext, which generates the taxon-jackknife data sets, runs the bootstrap tree calculations, and identifies excess bootstrap increases as outlier values in boxplot graphs. hext is written in the R language and accepts binary data (0/1; e.g. AFLP) as well as co-dominant SNP and genotype data.

We demonstrate the usefulness of hext in large SNP data sets containing putative hybrids and their parents. For instance, using published data of the genus Vitis (~6,000 SNP loci), hext output supports V. × champinii as a hybrid between V. rupestris and V. mustangensis.

With simulated SNP and AFLP data sets, excess increases in bootstrap support were not always connected with the hybrid taxon (false positives), whereas the expected bootstrap signal failed to appear on several occasions (false negatives). Potential causes for both types of spurious results are discussed. With both empirical and simulated data sets, the taxon-jackknife output generated by hext provided additional signatures of hybrid taxa, including changes in tree topology across trees, consistent effects of exclusions of the hybrid and the parent taxa, and moderate (rather than excessive) increases in bootstrap support.

hext significantly facilitates the taxon-jackknife approach to hybrid taxon detection, even though the simple test for excess bootstrap increase may not reliably identify hybrid taxa in all applications.
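The taxon-jackknife step described in the abstract can be illustrated with a minimal Python sketch. This is not hext code (hext is written in R): the function name `taxon_jackknife`, the toy marker profiles and the taxon labels are hypothetical, and tree building and bootstrapping are deliberately left out.

```python
# Minimal sketch of taxon-jackknife data set generation (illustrative only).
# Each sample is a (taxon label, 0/1 marker profile) pair; all values are toy data.

def taxon_jackknife(samples):
    """Return {excluded_taxon: list of samples with that taxon removed}."""
    taxa = sorted({taxon for taxon, _ in samples})
    return {
        excluded: [(taxon, profile) for taxon, profile in samples if taxon != excluded]
        for excluded in taxa
    }

samples = [
    ("parent1", [1, 1, 0, 0]),
    ("parent1", [1, 1, 0, 1]),
    ("parent2", [0, 0, 1, 1]),
    ("hybrid", [1, 0, 1, 0]),
    ("outgroup", [0, 1, 0, 1]),
]

jackknife_sets = taxon_jackknife(samples)
for excluded, subset in jackknife_sets.items():
    # One bootstrap tree would be computed from each reduced data set.
    print(f"excluded {excluded}: {len(subset)} samples remain")
```

In a full analysis, each reduced data set would be passed to a bootstrap tree calculation, and the node support values would then be compared across the resulting trees.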
Keywords: AFLP; Canidae; SNP; Vitis champinii; bootstrap support; homoplasy excess test; hybridization; phylogenetics
Year: 2015 PMID: 27066216 PMCID: PMC4824276 DOI: 10.1111/2041-210X.12490
Source DB: PubMed Journal: Methods Ecol Evol Impact factor: 7.781
Figure 1. Excess homoplasy introduced by a hybrid taxon in a multilocus phylogenetic tree. (a) The hybrid is placed intermediate to the parental taxa. Bootstrap support (BS; numbers above branches) for clades containing the parental taxa is low due to homoplasy with the hybrid. Circled numbers identify nodes. (b) Exclusion of the hybrid increases bootstrap support for clades containing the parental taxa. (c) Exclusion of one parent taxon causes changes in bootstrap support and tree topology: increased bootstrap support for both parental clades, and placement of the hybrid with its other parent. (d) BS values for each node observed in the full tree. BS values in the full tree (first line) and in each taxon‐jackknife tree are compiled in the table. SC, support carryover: BS values were not scored for nodes that were sister to the excluded taxon. NA, node had joined the excluded taxon. (e) Boxplots representing the distribution of BS values scored in the taxon‐jackknife trees for each node observed in the full tree. The boxes encompass the 50% of observed values that lie between the first and the third quartile and define the interquartile range (IQR). Vertical bars within boxes mark the median value. Whiskers extend to the smallest and the largest BS value located within 1·5 × IQR of the boxes, whereas values beyond this distance are considered outliers and represented by dots. Dot colour indicates whether an outlier was caused by the exclusion of the hybrid, parent 1 or parent 2.
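The boxplot rule in panel (e) can be sketched in a few lines of Python. This is an illustrative sketch, not the hext implementation: the function name `upper_outliers`, the BS values and the taxon labels are hypothetical, unscored entries (SC or NA) are represented as `None`, and the quartile convention (`method="inclusive"`) is an assumption that may differ from the one used by a given boxplot routine.

```python
from statistics import quantiles

def upper_outliers(bs_by_excluded, k=1.5):
    """Return the excluded taxa whose BS value lies more than k x IQR above Q3.

    bs_by_excluded maps each excluded taxon to the BS value of one node in the
    corresponding taxon-jackknife tree; None marks an unscored value (SC or NA).
    """
    scored = {taxon: bs for taxon, bs in bs_by_excluded.items() if bs is not None}
    q1, _, q3 = quantiles(list(scored.values()), n=4, method="inclusive")
    fence = q3 + k * (q3 - q1)
    return sorted(taxon for taxon, bs in scored.items() if bs > fence)

# Hypothetical BS values for a single node across taxon-jackknife trees.
node_bs = {
    "hybrid": 98, "parent1": None,  # parent1 unscored (support carryover)
    "t1": 60, "t2": 62, "t3": 58, "t4": 61, "t5": 63, "t6": 59,
}

print(upper_outliers(node_bs))        # 1.5 x IQR threshold
print(upper_outliers(node_bs, k=3))   # stricter 3 x IQR threshold
```

Passing `k=3` applies the stricter threshold that some studies report instead of the default 1·5 × IQR.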
Table 1. Examples of HETs used to screen multilocus phylogenies for hybrid taxa
| Taxon group | Reported upper outliers/examined nodes (if reported) | Polymorphic markers used in HET | Tree size (number of individuals, taxa) | Number of exclusion experiments (exclusion sets) | Outlier detection method | Genetic distance; tree‐building algorithm | References |
|---|---|---|---|---|---|---|---|
| Cichlid fish (Cichlidae) | | | | | | | |
| Cameroon crater lake cichlids | 10/30 nodes | (a) 2355 AFLP | (a) | (a) 16 (taxa) | Boxplot, >1·5 IQR | Link | Schliewen & Klee ( |
| Austrotilapiini | 2/selected focus nodes | 5 nuclear gene sequences | | 63 (individuals) | Boxplot, >1·5 IQR | GTR+3; ML | Schwarzer |
| | 54/48 nodes | 1706 AFLP | | 45 (clades) | Boxplot, >1·5 IQR | Link | Schwarzer |
| Haplochromini | 61/60 nodes | 1984 AFLP | | 86 (species and clades) | Boxplot, >1·5 IQR | Link | Schwarzer |
| | 5 | 1351 AFLP | | 102 (individuals) | Boxplot, >1·5 IQR | Link | Geiger, McCrary & Schliewen ( |
| Bower‐building Lake Malawi cichlids | 1 | 3171 AFLP | | 19 (species) | Boxplot, reported outlier >3 IQR | Nei & Li ( | Kidd, Kidd & Kocher ( |
| Tropheini | 6/selected focus nodes | 1258 AFLP | | 21 (species and undescribed taxa) | Boxplot, >3 IQR | Nei & Li ( | Koblmüller |
| | 14 reported (selected examples)/15 nodes | 768 AFLP | | 53 (populations and combinations of populations) | Boxplot, >3 IQR | Nei & Li ( | Egger |
| East African cichlids | 34 | 3282 AFLP | | 414 (species, clades, random sets) | Boxplot, >3 IQR | Jaccard, NJ | Weiss, Cotterill & Schliewen ( |
| | 0 | 2478 AFLP | | 11 (species) | Boxplot, >1·5 IQR | Nei & Li ( | Kidd |
| Limnochromini | 0/8 nodes | 1128 AFLP | | 9 (species) | Boxplot, >1·5 IQR | Nei & Li ( | P. C. Kirchberger & S. Koblmüller unpublished |
| | 0/6 nodes | 659 AFLP | | 7 (species) | Boxplot, >1·5 IQR | Nei & Li ( | Kirchberger |
| Sailfin silversides, Telmatherinidae | 1/1 node | 1327 AFLP | | >100 (taxa and random sets) | Histogram | Link | Herder |
| Fruit‐eating bats, Phyllostomidae | 2/6 nodes | 374 AFLP | | 8 (species) | Boxplot, reported outliers ≫3 IQR | Nei & Li | Larsen, Marchán‐Rivadeneira & Baker ( |
| Clownfish, Pomacentridae | 13/40 nodes | 7 nuclear gene sequences | | >27 (27 species plus combinations of species) | Boxplot, >1·5 IQR | GTR+G; ML | Litsios & Salamin ( |
Studies often report outliers only for selected nodes.
Exclusion experiments were conducted by excluding one taxon at a time (e.g. individual/species/population/clade), by excluding multiple taxa at a time, or by excluding random sets of individuals.
This study combined the results of two HETs with data sets (a) and (b).
Figure 2. Homoplasy excess test results for the Vitis SNP data set. V. × champinii is a natural hybrid between V. rupestris and V. mustangensis. Trees were rooted with V. rotundifolia, and nodes are labelled with BS values. (a) The full tree, containing hybrid and parent taxa. (b) Tree obtained after exclusion of the hybrid, V. × champinii. (c) Graphical output of hext, showing boxplots of BS values across taxon‐jackknife trees for nodes at which at least one outlier value was detected at a distance >1·5 IQR from the box. Description of the nodes (on left vertical axis) and identification of species whose exclusion caused the upper BS outlier(s) were added manually. Bold font highlights nodes and taxa contributing the expected hybrid signals. Nodes A and B are identified in 2a; node “V. rupestris” joins the five V. rupestris accessions. gird., V. girdiana; rup., V. rupestris; syl., V. sylvestris; pal., V. palmata; mon., V. monticola; rip., V. riparia; acer., V. acerifolia; pia., V. piasezkii; champ., V. × champinii; must., V. mustangensis. (d, e) Annotated boxplots revealing reduced homoplasy and tree topology changes after exclusion of the putative hybrid taxon V. × champinii (d) and its parental species V. rupestris and V. mustangensis (e). The node defining monophyly of V. × champinii did not occur in the full tree (2a) and is therefore not included in the default hext output shown in 2c. The boxplot for this node was obtained using the hext option to create custom boxplots for defined nodes. Both exclusion of V. rupestris and of V. mustangensis yielded a BS of 100% (overlapping signals drawn as light‐blue circle with dark‐blue ring). The smaller of the two upper outliers at the V. × champinii node originates from the tree excluding V. acerifolia, and is not interpreted as evidence of the putative hybrid origin of V. × champinii. Similarly, in (d), the smaller of the two upper outliers at the V. rupestris node originates from the tree excluding V. piasezkii, and is also not interpreted as evidence of the putative hybrid origin of V. × champinii. Other upper outliers not highlighted in (d) are due to the exclusion of a parent as annotated in (e). Lower outliers highlighted in neither (d) nor (e) are also not considered connected to the putative hybrid origin of V. × champinii.
Table 2. Homoplasy excess tests (HETs) with simulated data sets. Except for #7, polymorphisms were simulated on the tree topology shown in Fig. 3a. Columns headed ‘simulated data sets’ provide information on simulations and resulting trees (number of polymorphic loci resulting from the simulation; θ, MCcoal population size parameter; hybrid τ, MCcoal divergence time parameter for the origin of the hybrid taxon; mean BS, average bootstrap values for nodes in the full tree). Columns headed ‘signals related to hybrid taxon’ report the number of nodes at which upper BS outliers or maximum BS values were observed upon exclusion of the hybrid taxon or of a taxon descending from one of the hybrid's parents (IQR, interquartile range). Columns headed ‘false‐positive upper boxplot outliers’ report the total number of BS outliers upon exclusion of non‐hybrid taxa. In the last column, only nodes joining at least two different taxa were considered.
| Simulated data sets | | | | | Signals related to hybrid taxon | | | | False‐positive upper boxplot outliers | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | Exclusion of hybrid taxon | | | Exclusion of parent taxon | | | |
| No. | Marker loci | θ | Hybrid τ | Mean BS | Upper boxplot outliers >1·5 × IQR (indicated parent lineage) | Upper boxplot outliers >3 × IQR | Maximum BS, but not outlier, at >1·5 × IQR (indicated parent lineage) | Upper boxplot outliers >1·5 × IQR (excluded parent) | >1·5 × IQR | >3 × IQR | for nodes ≥2 taxa; >1·5 × IQR |
| No hybrid taxon | |||||||||||
| 1 | 362 SNP | 0·00001 | n.a. | 80·48 | n.a. | n.a. | n.a. | n.a. | 0 | 0 | 0 |
| 2 | 987 AFLP | 0·00001 | n.a. | 97·51 | n.a. | n.a. | n.a. | n.a. | 2 | 0 | 1 |
| 3 | 645 SNP | 0·0005 | n.a. | 58·88 | n.a. | n.a. | n.a. | n.a. | 21 | 12 | 14 |
| 4 | 1310 SNP | 0·0005 | n.a. | 73·25 | n.a. | n.a. | n.a. | n.a. | 13 | 8 | 7 |
| 5 | 4001 AFLP | 0·0005 | n.a. | 80·48 | n.a. | n.a. | n.a. | n.a. | 9 | 4 | 4 |
| 6 | 4910 SNP | 0·0005 | n.a. | 84·02 | n.a. | n.a. | n.a. | n.a. | 3 | 3 | 0 |
| 7 | 5293 AFLP (radiation) | 0·0005 | n.a. | 65·95 | n.a. | n.a. | n.a. | n.a. | 10 | 6 | 10 |
| Hybrid taxon: s × l | |||||||||||
| 8 | 1726 AFLP | 0·0005 | 0·00001 | 63·25 | 1 (s) | 0 | 2 (l) | 0 | 18 | 11 | 10 |
| 9 | 1285 SNP | 0·0005 | 0·00001 | 76·55 | 2 (s) | 2 | 0 | 2 (s) | 12 | 7 | 3 |
| 10 | 5002 AFLP | 0·0005 | 0·00001 | 79·81 | 0 | 0 | 1 (s), 1 (l) | 0 | 4 | 2 | 2 |
| 11 | 5088 SNP | 0·0005 | 0·00001 | 89·26 | 1 (l) | 0 | 4 (s) | 1 (l) | 5 | 1 | 0 |
| 12 | 5003 AFLP | 0·0005 | 0·000002 | 79·52 | 0 | 0 | 3 (s), 2 (l) | 0 | 4 | 0 | 0 |
| Hybrid taxon: (r,q) × k | |||||||||||
| 13 | 640 SNP | 0·0005 | 0·000025 | 58·02 | 3 (r,q) | 2 | 0 | 1 (r) | 19 | 9 | 9 |
| 14 | 1328 AFLP | 0·0005 | 0·000025 | 64·16 | 2 (r,q) | 0 | 1 (r,q) | 1 (q) | 16 | 4 | 9 |
| 15 | 1284 SNP | 0·0005 | 0·000025 | 70·07 | 2 (r,q) | 1 | 0 | 1 (r) | 12 | 6 | 2 |
| 16 | 5047 SNP | 0·0005 | 0·000025 | 82·95 | 0 | 0 | 1 (r,q) | 0 | 3 | 1 | 0 |
| 17 | 409 SNP | 0·00001 | 0·000025 | 88·66 | 2 (r,q) | 0 | 1 (r,q), 1 (k) | 0 | 5 | 1 | 0 |
| 18 | 385 SNP | 0·00001 | 0·000025 | 91·58 | 1 (k) | 0 | 3 (r,q) | 0 | 7 | 0 | 0 |
| 19 | 787 SNP | 0·00001 | 0·000025 | 95·24 | 0 | 0 | 2 (r,q), 1 (k) | 0 | 2 | 0 | 0 |
| 20 | 784 SNP | 0·00001 | 0·000025 | 97·13 | 0 | 0 | 3 (r,q), 1 (k) | 0 | 0 | 0 | 0 |
| 21 | 964 AFLP | 0·00001 | 0·000025 | 98·28 | 1 (k) | 1 | 2 (r,q) | 0 | 5 | 0 | 0 |
| Hybrid taxon: (r,q) × (l,m,n) | |||||||||||
| 22 | 1672 AFLP | 0·0005 | 0·000037 | 63·91 | 0 | 0 | 1 (r,q) | 2 (m); 1 (l) | 18 | 5 | 7 |
| 23 | 1279 SNP | 0·0005 | 0·000037 | 64·02 | 2 (r,q) | 1 | 0 | 0 | 16 | 10 | 2 |
| 24 | 5160 SNP | 0·0005 | 0·000037 | 85·20 | 0 | 0 | 1 (r,q) | 0 | 4 | 0 | 0 |
| 25 | 375 SNP | 0·00001 | 0·000037 | 90·37 | 0 | 0 | 3 (r,q) | 0 | 2 | 0 | 0 |
| 26 | 400 SNP | 0·00001 | 0·000037 | 91·02 | 0 | 0 | 1 (r,q) | 0 | 4 | 3 | 1 |
Figure 3. (a) Tree topology used to simulate AFLP and SNP data sets. Simulations assumed that hybridization between two lineages marked with corresponding symbols gave rise to a novel hybrid taxon. (b) Full tree obtained from the SNP data set in simulation #11 (Table 2) including a hybrid taxon originating from hybridization between the lineages marked with red diamonds (l × s). Bootstrap support is given near nodes (% in 1000 bootstrap replicates). BS boxplots are shown for selected nodes to illustrate the patterns described in the text.
Figure 4. (a) Negative correlation between numbers of false‐positive boxplot outliers (upper BS outliers unconnected to hybrid taxon) and average node support across the full tree in simulations of AFLP and SNP data. Outliers were identified as values at a distance >1·5 × IQR (open symbols) and >3 × IQR (filled symbols) from the third quartile. (b) True positive boxplot outliers (upper BS outliers upon exclusion of hybrid taxon) were more likely to occur when average node support in the full tree was low.
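The relationship in panel (a) can be explored directly from the tabulated simulation results. The Python sketch below computes a Pearson correlation for the seven simulations without a hybrid taxon (#1 to #7), pairing mean BS with the number of false-positive outliers at >1·5 × IQR. Restricting the calculation to these seven rows and using Pearson's coefficient are assumptions made for illustration; the statistic underlying the figure is not stated here, and the helper name `pearson_r` is hypothetical.

```python
from math import sqrt

# Simulations #1-#7 (no hybrid taxon): mean BS of the full tree and the
# number of false-positive upper outliers at >1.5 x IQR, as tabulated.
mean_bs = [80.48, 97.51, 58.88, 73.25, 80.48, 84.02, 65.95]
false_positives = [0, 2, 21, 13, 9, 3, 10]

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

r = pearson_r(mean_bs, false_positives)
print(f"Pearson r = {r:.2f}")  # a negative value indicates fewer false positives at higher support
```

With only seven data points the coefficient is a rough descriptive summary rather than a test of the relationship.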