Barbara R Holland, Steffi Benthin, Peter J Lockhart, Vincent Moulton, Katharina T Huber.
Abstract
BACKGROUND: A simple and widely used approach for detecting hybridization in phylogenies is to reconstruct gene trees from independent gene loci, and to look for gene tree incongruence. However, this approach may be confounded by factors such as poor taxon-sampling and/or incomplete lineage-sorting.
Year: 2008 PMID: 18625077 PMCID: PMC2500029 DOI: 10.1186/1471-2148-8-202
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Figure 1. An example of two applications of the Z-rule, which underpins the Z-closure supernetwork method, where two partial splits displayed in the input trees (A) are extended to complete splits as shown in (B) and (C). The bold lines that form the 'Z' shape indicate that the intersection of the taxon sets is non-empty, e.g. in (B) {C,D} ∩ {D,M} = {D}, {D,M} ∩ {M,P,T} = {M}, {M,P,T} ∩ {O,P,T} = {P,T}, but {C,D} ∩ {O,P,T} = ∅, so the Z-rule can be applied.
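The condition in the Figure 1 caption can be checked mechanically. The following is a minimal sketch, not the authors' implementation: it assumes the usual statement of the Z-rule from the Z-closure literature, in which partial splits A1|B1 and A2|B2 satisfying the three non-empty intersections and the one empty intersection are extended to A1|(B1 ∪ B2) and (A1 ∪ A2)|B2. The function name `z_rule` and the pairing of the caption's taxon sets into two splits are assumptions made for illustration.

```python
def z_rule(split1, split2):
    """Apply the Z-rule to two partial splits, each given as a pair (A, B).

    Returns the two extended splits, or None if the 'Z' condition fails.
    Assumed form of the rule: A1|B1, A2|B2 -> A1|(B1 u B2), (A1 u A2)|B2.
    """
    a1, b1 = (frozenset(side) for side in split1)
    a2, b2 = (frozenset(side) for side in split2)
    # Three non-empty intersections form the 'Z'; the fourth must be empty.
    if (a1 & a2) and (a2 & b1) and (b1 & b2) and not (a1 & b2):
        return (a1, b1 | b2), (a1 | a2, b2)
    return None


# Taxon sets from panel (B), read as the splits {C,D}|{M,P,T} and {D,M}|{O,P,T}.
extended = z_rule(({"C", "D"}, {"M", "P", "T"}), ({"D", "M"}, {"O", "P", "T"}))
# extended is ({C,D} | {M,O,P,T}, {C,D,M} | {O,P,T})
```

In a full closure procedure the rule would be tried on every pair of splits, in every orientation of their sides, until no split changes; that loop is omitted here.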
Figure 2. (A) A hybridization network (number 7 from Table 1) with two hybridization nodes. (B) The principal trees of the hybridization network; these are found by choosing a single parent at each hybridization node and then suppressing the resulting internal nodes of degree 2. (C) The splits associated with the hybridization network are those displayed by the principal trees in (B). (D) A split network displaying the splits in (C).
Figure 3. Flowchart indicating the steps used in the simulation study.
Table 1. Hybridization networks used in simulations.
| ID | H | S | Principal trees (given in Newick format) |
|---|---|---|---|
| 1 | 0 | 5 | (((a,b),(c,d)),((e,f),(g,h))); |
| 2 | 1 | 8 | (((a,b),(c,d)),((e,f),(g,h))); (((a,b),c),(((e,d),f),(g,h))); |
| 3 | 1 | 7 | (((a,b),(c,d)),((e,f),(g,h))); (((a,(b,d)),c),((e,f),(g,h))); |
| 4 | 0 | 5 | (((((((a,b),c),d),e),f),g),h); |
| 5 | 1 | 7 | (((((((a,b),c),d),e),f),g),h); ((((((a,c),(b,d)),e),f),g),h); |
| 6 | 1 | 10 | (((((((a,b),c),d),e),f),g),h); ((((((a,c),d),e),f),(g,b)),h); |
| 7 | 2 | 8 | (((a,b),(c,d)),((e,f),(g,h))); (((a,b),(c,d)),(((e,f),g),h)); |
|  |  |  | ((a,((b,c),d)),((e,f),(g,h))); ((a,((b,c),d)),(((e,f),g),h)); |
| 8 | 2 | 9 | (((a,b),(c,d)),((e,f),(g,h))); (((a,b),c),(((d,e),f),(g,h))); |
|  |  |  | ((((a,b),(g,h)),(c,d)),(e,f)); ((((a,b),(g,h)),c),((d,e),f)); |
| 9 | 3 | 9 | ((((a,b),(c,d)),(e,f)),(g,h)); ((((a,b),(c,d)),e),(f,(g,h))); |
|  |  |  | (((a,b),(c,d)),((e,f),(g,h))); (((a,b),(c,d)),(e,(f,(g,h)))); |
|  |  |  | ((((a,b),c),(d,(e,f))),(g,h)); ((((a,b),c),(d,e)),(f,(g,h))); |
|  |  |  | (((a,b),c),((d,(e,f)),(g,h))); (((a,b),c),((d,e),(f,(g,h)))); |
| 10 | 3 | 24 | ((((b,e),(a,c)),((d,f),g)),h); (((b,(a,c)),((e,g),(d,f))),h); |
|  |  |  | (((a,(b,e)),(((c,d),f),g)),h); (((a,b),(((c,d),f),(e,g))),h); |
|  |  |  | (((a,c),((((b,e),d),f),g)),h); (((a,c),(((b,d),f),(e,g))),h); |
|  |  |  | ((a,((((b,e),(c,d)),f),g)),h); ((a,(((b,(c,d)),f),(e,g))),h); |
The column H gives the number of hybridization events, and the column S gives the number of unique non-trivial splits contained in the principal trees.
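As an illustration of how the S column can be reproduced, the sketch below collects the non-trivial splits displayed by a set of Newick trees and counts the distinct ones. It is a minimal sketch under stated assumptions: the trees carry no branch lengths or internal labels, all trees share one taxon set, and the helper names (`clusters`, `nontrivial_splits`, `count_splits`) are hypothetical.

```python
def clusters(newick):
    """Return (taxon set, list of clusters) for a Newick string without branch lengths."""
    stack, found, leaves, token = [[]], [], set(), ""
    for ch in newick.strip().rstrip(";"):
        if ch == "(":
            stack.append([])            # open a new internal node
        elif ch in ",)":
            if token:                   # a leaf name just ended
                stack[-1].append(frozenset([token]))
                leaves.add(token)
                token = ""
            if ch == ")":               # close the node: its cluster is the union of its children
                cluster = frozenset().union(*stack.pop())
                found.append(cluster)
                stack[-1].append(cluster)
        elif not ch.isspace():
            token += ch
    return frozenset(leaves), found


def nontrivial_splits(newick):
    """Splits A|B of the taxon set with at least two taxa on each side."""
    taxa, found = clusters(newick)
    splits = set()
    for c in found:
        other = taxa - c
        if len(c) >= 2 and len(other) >= 2:
            splits.add(frozenset([c, other]))   # unordered pair of sides
    return splits


def count_splits(trees):
    """Number of distinct non-trivial splits displayed by any of the trees."""
    return len(set().union(*(nontrivial_splits(t) for t in trees)))


network7 = [
    "(((a,b),(c,d)),((e,f),(g,h)));",
    "(((a,b),(c,d)),(((e,f),g),h));",
    "((a,((b,c),d)),((e,f),(g,h)));",
    "((a,((b,c),d)),(((e,f),g),h));",
]
print(count_splits(network7))  # 8, matching S for network 7
```

Representing each split as an unordered pair of frozensets means that a cluster and its complement, which arise from the two sides of the root, are counted once.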
Figure 4. False positives (A) and false negatives (B) with increasing numbers of input trees for Z-closure (ZC) and Q-imputation (Q) on hybridization network 7, under five filters: keeping splits with no homoplasy on any tree (HF1); keeping splits with no homoplasy on 75% or more of the trees (HF2); keeping splits with no homoplasy on 50% or more of the trees (HF3); keeping splits with a homoplasy score of 1 or less on all of the trees (HF4); or keeping the 8 highest-weight splits (CF). Values are averages over the 12 combinations of coalescent branch length b and number of missing taxa m. The maximum possible number of false negatives for this hybridization network is 8.
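The error counts reported in these figures can be illustrated with a small sketch. It assumes each candidate split already has a numeric weight (how weights are computed is not described here), treats splits as opaque hashable labels, and uses hypothetical function names and toy data; it is not the authors' code.

```python
def counting_filter(weights, k):
    """Keep the k highest-weight splits; ties are broken by insertion order."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    return set(ranked[:k])


def error_counts(kept, true_splits):
    """Return (false positives, false negatives) of the kept splits against the true splits."""
    false_pos = len(kept - true_splits)    # kept but not in the network
    false_neg = len(true_splits - kept)    # in the network but not kept
    return false_pos, false_neg


# Toy example with made-up weights: three true splits and one spurious split "x".
weights = {"s1": 5, "s2": 4, "s3": 1, "x": 3}
true_splits = {"s1", "s2", "s3"}
kept = counting_filter(weights, k=len(true_splits))
print(error_counts(kept, true_splits))  # (1, 1)
```

Because the number of false negatives can never exceed the number of true splits, the ceiling for network 7 is 8. When k equals the number of true splits, as in the counting filter above, every false positive displaces a true split, so the two counts are equal.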
Figure 5. False positives (A) and false negatives (B) with increasing number of input trees for the highest setting of missing taxa (m = 3) and the smallest setting for coalescent branch lengths (b = 0.5) for hybridization network 7. Abbreviations are as described in Figure 4. The maximum possible number of false negatives for this hybridization network is 8.
Figure 6. False positives (A) and false negatives (B) as the number of missing taxa m increases from 0 to 3 for hybridization network 7. Results are averaged over the 12 possible settings for number of gene trees g and coalescent branch lengths b. Abbreviations are as described in Figure 4. The maximum possible number of false negatives for this hybridization network is 8.
Figure 7. False positives (A) and false negatives (B) for the two different branch length settings used in the coalescent simulation (b = 0.5 and b = 1), and for the control without incomplete lineage-sorting (b = 8) for hybridization network 7. Results are averaged over the 16 possible settings for number of gene trees g and number of missing taxa m. Abbreviations are as described in Figure 4. The maximum possible number of false negatives for this hybridization network is 8.
False positives for hybridization network 7 using the counting filter to select the 8 highest weight splits.
| Q-imputation |  |  |  | Z-closure |  |  |  |
|---|---|---|---|---|---|---|---|
| 0 | 0.2 | 0.64 | 2.18 | 0 | 0.23 | 0.56 | 0.66 |
| 0 | 0.07 | 0.33 | 1.25 | 0 | 0.17 | 0.56 | 1.26 |
| 0 | 0.08 | 0.25 | 0.86 | 0 | 0.12 | 0.49 | 0.98 |
| 0 | 0.01 | 0.18 | 0.47 | 0 | 0.04 | 0.29 | 0.76 |
| 1.62 | 2.05 | 2.29 | 3.38 | 1.61 | 1.97 | 2.25 | 2.25 |
| 0.68 | 1.44 | 1.98 | 2.69 | 0.67 | 1.37 | 1.96 | 2.6 |
| 0.34 | 0.97 | 1.37 | 2 | 0.33 | 0.84 | 1.43 | 2.1 |
| 0.18 | 0.74 | 1.05 | 1.93 | 0.18 | 0.68 | 1.12 | 2.01 |
| 2.85 | 3.24 | 3.74 | 4.56 | 2.79 | 3.04 | 3.71 | 3.78 |
| 1.73 | 2.25 | 2.68 | 3.76 | 1.71 | 2.18 | 2.64 | 3.56 |
| 1.3 | 1.67 | 2.24 | 3.28 | 1.31 | 1.65 | 2.35 | 3.16 |
| 0.94 | 1.43 | 2.12 | 2.57 | 0.95 | 1.33 | 2.05 | 2.54 |
Figure 8. False positives (A) and false negatives (B) averaged over 48 combinations of number of missing taxa m, number of gene trees g, and coalescent branch lengths b versus the number of true splits for hybridization networks 1–9. Note that the number of true splits is the maximum possible number of false negatives.