Jarosław Paszek, Paweł Górecki.
Abstract
BACKGROUND: Discovering the locations of gene duplications and of multiple gene duplication episodes is a fundamental issue in evolutionary molecular biology. The problem, introduced by Guigó et al. in 1996, is to map gene duplication events from a collection of rooted, binary gene family trees onto their corresponding rooted, binary species tree so that the total number of multiple gene duplication episodes is minimized. Several models in the literature specify how gene duplications from gene families can be interpreted as a single duplication episode. However, all existing duplication episode problems assume that the gene trees are rooted. This restriction limits their applicability, since phylogenetic methods frequently infer unrooted gene family trees.
Year: 2016 PMID: 26818591 PMCID: PMC4895600 DOI: 10.1186/s12864-015-2308-4
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1. Gene and species tree reconciliation. Left: the lca-mapping between a gene tree $G$ and a species tree $S$, shown for internal nodes. Decorated nodes indicate gene duplication events. Right: an embedding of $G$ into $S$. Reconciling these trees requires 6 gene duplications and 17 gene losses (not shown), i.e., $D(G,S)=6$ and $DL(G,S)=23$. See also $G_4^*$ in Fig. 3.
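The lca-mapping and the two costs in Fig. 1 can be illustrated with a short Python sketch. It assumes that trees are nested tuples whose leaves are species names, identifies each species-tree node by its cluster (set of leaf labels), and applies the standard duplication-loss counting rule, with no losses counted above the root. The function names `species_index`, `lca` and `reconcile` are hypothetical; this is not the authors' implementation.

```python
def species_index(S):
    """Map every node of the rooted binary species tree S (keyed by its cluster) to its depth."""
    depth = {}

    def walk(node, d):
        if isinstance(node, str):
            c = frozenset([node])
        else:
            c = walk(node[0], d + 1) | walk(node[1], d + 1)
        depth[c] = d
        return c

    walk(S, 0)
    return depth


def lca(depth, species):
    """Deepest species node whose cluster contains every species in `species`."""
    return max((c for c in depth if species <= c), key=depth.__getitem__)


def reconcile(G, depth):
    """Return (D, DL) for a rooted binary gene tree G: duplications, and duplications plus losses."""

    def walk(g):
        if isinstance(g, str):  # gene leaf labelled by its species
            return frozenset([g]), 0, 0
        ma, da, la = walk(g[0])
        mb, db, lb = walk(g[1])
        m = lca(depth, ma | mb)  # lca-mapping M(g)
        dup = m == ma or m == mb  # duplication iff a child shares g's mapping
        losses = la + lb
        for mc in (ma, mb):
            # speciation: one loss per species node strictly between M(g) and M(child);
            # duplication: one loss per edge on the path from M(g) down to M(child)
            losses += depth[mc] - depth[m] - 1 + dup
        return m, da + db + dup, losses

    _, dups, losses = walk(G)
    return dups, dups + losses


S = (("a", "b"), "c")
depth = species_index(S)
print(reconcile((("a", "b"), ("a", "c")), depth))  # → (1, 3)
```

In this toy example the root of the gene tree is a duplication mapped to the root of $S$; the copy leading to (a,b) loses c and the copy leading to (a,c) loses b, giving $D=1$ and $DL=3$. The lca lookup scans all clusters, which keeps the sketch short at the price of quadratic running time.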
Fig. 3. An example of unrooted episode clustering. A species tree $S$ and four unrooted gene trees $G_1$, $G_2$, $G_3$, $G_4$ with all D-minimal rootings. For every gene tree, two star topologies are shown: one for the duplication-loss cost (left) and one for the duplication cost (right). Every edge of a gene tree is labeled with the cost of rooting the tree at that edge. Every duplication node in the rootings of the gene trees is labeled with all possible locations (i.e., valid mappings) of its duplication cluster in optimal solutions of single-UEC. Note that the rooting $G_4^*$, whose lca-mappings are shown in Fig. 1, has two duplications, at (c,(b,a)) and (h,(f,g)), that are raised here to create two duplication clusters. Let $\{G_2, G_4\}$ be an instance of the UEC problem. Then the ⊤-cluster, which is present in $G_2^*$, contributes to the optimal solution. In this case, the solution is induced by one of two instances of the EC problem: $\{G_2^*, G_{4,1}\}$ or $\{G_2^*, G_{4,7}\}$. This property is proved in Theorem 5 and Lemma 6.
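The per-edge rooting costs in Fig. 3 can be illustrated by brute force: root the unrooted gene tree at each edge in turn and count duplications under the lca-mapping. The sketch below is a naive quadratic-time illustration, not the authors' algorithm. It assumes an unrooted binary gene tree given as an adjacency dict over integer node ids, with a separate `label` dict giving the species of each leaf; all function names are hypothetical.

```python
def species_clusters(S):
    """Clusters (frozensets of leaf labels) of all nodes of the rooted species tree S."""
    out = []

    def walk(node):
        if isinstance(node, str):
            c = frozenset([node])
        else:
            c = walk(node[0]) | walk(node[1])
        out.append(c)
        return c

    walk(S)
    return out


def dup_count(G, clusters):
    """Number of duplications in the rooted gene tree G under the lca-mapping."""

    def walk(g):
        if isinstance(g, str):
            return frozenset([g]), 0
        (ma, da), (mb, db) = walk(g[0]), walk(g[1])
        # lca = smallest species cluster containing both child mappings
        m = min((c for c in clusters if ma | mb <= c), key=len)
        # duplication iff g maps to the same species node as one of its children
        return m, da + db + (m == ma or m == mb)

    return walk(G)[1]


def subtree(adj, label, node, parent):
    """Rooted nested-tuple view of the part of the tree reached from `node` without crossing `parent`."""
    kids = [n for n in adj[node] if n != parent]
    if not kids:  # a leaf: its only neighbour is `parent`
        return label[node]
    return (subtree(adj, label, kids[0], node), subtree(adj, label, kids[1], node))


def rooting_costs(adj, label, S):
    """Duplication cost of rooting the unrooted gene tree at each edge (u, v), u < v."""
    clusters = species_clusters(S)
    costs = {}
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                G = (subtree(adj, label, u, v), subtree(adj, label, v, u))
                costs[(u, v)] = dup_count(G, clusters)
    return costs


def d_minimal_rootings(costs):
    """Edges whose rooting attains the minimum duplication cost."""
    best = min(costs.values())
    return sorted(e for e, c in costs.items() if c == best)


S = (("a", "b"), "c")
adj = {0: [4], 1: [4], 2: [5], 3: [5], 4: [0, 1, 5], 5: [4, 2, 3]}
label = {0: "a", 1: "b", 2: "a", 3: "c"}
costs = rooting_costs(adj, label, S)
print(d_minimal_rootings(costs))  # → [(2, 5), (3, 5), (4, 5)]
```

Here the unrooted quartet ab|ac has five edges; rooting at the leaf edges of a (node 0) or b costs 2 duplications, while the other three rootings cost 1, so the tree has three D-minimal rootings.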
Fig. 2. Types of stars: a star topology with center v, together with the types of edges and stars.
Fig. 4. Trees from Lemmas 1 and 2. A gene tree G (left) and the rootings of G (right) used in Lemma 1 and Lemma 2.
Fig. 5. Trees from Theorem 5 and Lemma 6. The rootings of G used in Theorem 5 and Lemma 6. We use the notation G instead of G〈. See Fig. 4 for a legend of the symbols used.
Experimental results
| Set | # Species trees | # Leaves | # Gene trees | | Our model | % Locations | Model [ | % Locations |
|---|---|---|---|---|---|---|---|---|
| Guigó | 71 | 16 | 53 | 0 | 4 | 12.9 % | 5 | 16.1 % |
| Génolevures | 1 [ | 9 | 4144 | 55 | 17 | 100 % | 17 | 100 % |
| | 1 [ | 9 | 4144 | 156 | 17 | 100 % | 17 | 100 % |
| TreeFam | 1 | 28 | 1274 | 67 | 45 | 81.8 % | 45 | 81.8 % |
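The % Locations values are consistent with dividing each model's count by the number of nodes in a rooted binary species tree with n leaves, which is 2n − 1 (for Guigó, 16 leaves give 31 nodes and 4/31 ≈ 12.9 %). This reading is an inference from the numbers rather than a definition stated with the table; the helper `location_percentage` below is hypothetical and simply reproduces the arithmetic.

```python
def location_percentage(locations, leaves):
    """Share (in %) of species-tree nodes hosting a duplication cluster.

    Assumption inferred from the table: a rooted binary species tree with
    `leaves` leaves has 2 * leaves - 1 nodes, and % Locations is taken over them.
    """
    return 100 * locations / (2 * leaves - 1)


# (set, # leaves, count for our model, count for the other model) from the table rows
rows = [("Guigó", 16, 4, 5), ("Génolevures", 9, 17, 17), ("TreeFam", 28, 45, 45)]
for name, n, ours, other in rows:
    print(name, round(location_percentage(ours, n), 1), round(location_percentage(other, n), 1))
```

Rounded to one decimal, this gives 12.9 and 16.1 for Guigó, 100.0 for both Génolevures columns, and 81.8 for both TreeFam columns, matching the table.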
Fig. 6. Duplication clusters in empirical datasets. Duplication clusters (marked by red circles) inferred in the experiments. (a) Guigó species tree (chosen from the 71 species trees in [35] as the most biologically reasonable [40]). (b) TreeFam species tree based on the NCBI taxonomy.