Mukul S Bansal, Eric J Alm, Manolis Kellis.
Abstract
MOTIVATION: Gene family evolution is driven by evolutionary events such as speciation, gene duplication, horizontal gene transfer and gene loss, and inferring these events in the evolutionary history of a given gene family is a fundamental problem in comparative and evolutionary genomics with numerous important applications. Solving this problem requires the use of a reconciliation framework, where the input consists of a gene family phylogeny and the corresponding species phylogeny, and the goal is to reconcile the two by postulating speciation, gene duplication, horizontal gene transfer and gene loss events. This reconciliation problem is referred to as duplication-transfer-loss (DTL) reconciliation and has been extensively studied in the literature. Yet, even the fastest existing algorithms for DTL reconciliation are too slow for reconciling large gene families and for use in more sophisticated applications such as gene tree or species tree reconstruction.
Year: 2012 PMID: 22689773 PMCID: PMC3371857 DOI: 10.1093/bioinformatics/bts225
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Simple DTL scenarios. (a) and (b) depict two possible reconciliations of G and S: the dotted arcs show the mapping (with the leaf mapping being specified by the leaf labels on the gene tree), and the label at each internal node of G specifies the type of event represented by that node. The reconciliation in (a) requires two transfers and one loss and the one in (b) requires one duplication and two losses.
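To make the comparison between the two scenarios in Fig. 1 concrete, the sketch below scores each reconciliation as a weighted sum of its event counts. The event counts come from the caption; the per-event costs (duplication 2, transfer 3, loss 1, speciation free) and the names `EventCosts` and `reconciliation_cost` are assumptions chosen for illustration, not values stated in this record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventCosts:
    """Hypothetical per-event costs; speciation is assumed to cost nothing."""
    duplication: float = 2.0
    transfer: float = 3.0
    loss: float = 1.0


def reconciliation_cost(duplications, transfers, losses, costs=EventCosts()):
    """Return the total cost of a reconciliation as a weighted event count."""
    return (duplications * costs.duplication
            + transfers * costs.transfer
            + losses * costs.loss)


# Event counts taken from the Fig. 1 caption.
cost_a = reconciliation_cost(duplications=0, transfers=2, losses=1)
cost_b = reconciliation_cost(duplications=1, transfers=0, losses=2)

print(f"(a) two transfers, one loss:     {cost_a}")
print(f"(b) one duplication, two losses: {cost_b}")
```

Under these assumed costs scenario (b) is cheaper, but the ranking depends entirely on the weights: lowering the transfer cost enough would favour (a) instead.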
Runtime comparison
| Dataset type | Dataset size | RANGER-DTL-U | AnGST | Mowgli |
|---|---|---|---|---|
| Simulated | 50 taxa (100 datasets) | 2 s | 3 m:26 s | 28 m:30 s |
| Simulated | 100 taxa (100 datasets) | 3 s | 15 m:4 s | 3 h:52 m |
| Simulated | 200 taxa (100 datasets) | 9 s | 1 h:2 m | 29 h:43 m |
| Simulated | 500 taxa (100 datasets) | 35 s | >800 h | >400 h |
| Simulated | 1000 taxa (100 datasets) | 2 m:57 s | — | >6000 h |
| Simulated | 10 000 taxa (1 dataset) | 4 h:7 m | — | — |
| Biological | 4733 gene trees, 100 taxa species tree | 1 m:03 s | 3 h:45 m | 41 h:36 m |
This table shows the runtimes of RANGER-DTL-U, AnGST and Mowgli on simulated and biological datasets. Times are shown in hours (h), minutes (m) and seconds (s). Experiments were performed on a desktop computer with a 3.2 GHz Intel Core i3 processor and 4 GB of RAM.
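The runtimes in the table mix hours, minutes and seconds, which makes the ratios hard to read at a glance. The following sketch converts the reported strings to seconds and derives speedup factors for two rows where all three programs finished. The helper names `to_seconds` and `speedup` are hypothetical, and the ratios are simple arithmetic on the table values rather than figures reported by the authors.

```python
import re

# Seconds per unit used in the table's "h", "m", "s" notation.
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def to_seconds(runtime: str) -> int:
    """Convert a runtime such as '3 m:26 s' or '4 h:7 m' to seconds."""
    parts = re.findall(r"(\d+)\s*([hms])", runtime)
    if not parts:
        raise ValueError(f"unrecognised runtime: {runtime!r}")
    return sum(int(value) * _UNIT_SECONDS[unit] for value, unit in parts)


def speedup(slower: str, faster: str) -> float:
    """Return how many times longer `slower` ran than `faster`."""
    return to_seconds(slower) / to_seconds(faster)


# Rows copied from the table: (RANGER-DTL-U, AnGST, Mowgli).
rows = {
    "Simulated, 50 taxa": ("2 s", "3 m:26 s", "28 m:30 s"),
    "Biological, 100 taxa": ("1 m:03 s", "3 h:45 m", "41 h:36 m"),
}

for name, (ranger, angst, mowgli) in rows.items():
    print(f"{name}: AnGST/RANGER = {speedup(angst, ranger):.0f}x, "
          f"Mowgli/RANGER = {speedup(mowgli, ranger):.0f}x")
```

Entries given only as lower bounds (for example `>800 h`) or left blank are deliberately excluded, since no exact ratio can be computed from them.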