Abstract
BACKGROUND: Even for moderate-size inputs, there is a tremendous number of optimal rearrangement scenarios, regardless of the model and of the specific question to be answered. Reporting a single optimal solution might therefore be misleading and cannot be used for statistical inference. Statistically well-founded methods are necessary to sample uniformly from the solution space; a small number of samples is then sufficient for statistical inference.
CONTRIBUTION: In this paper, we give a mini-review of the state of the art in sampling and counting rearrangement scenarios, focusing on the reversal, DCJ and SCJ models. In addition, we give a Gibbs sampler for sampling most parsimonious labelings of evolutionary trees under the SCJ model. The method has been implemented and tested on real-life data. The software package, together with example data, can be downloaded from http://www.renyi.hu/~miklosi/SCJ-Gibbs/.
Year: 2015 PMID: 26452124 PMCID: PMC4603625 DOI: 10.1186/1471-2105-16-S14-S6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The computational complexity of five specific counting problems under three different rearrangement models, as described in detail in the text.
| | Reversal | DCJ | SCJ |
|---|---|---|---|
| Pairwise rearrangement | C: #P-complete | C: #P-complete | T: in FP [ |
| | C: in FPRAS | T: in FPRAS [ | |
| Median | T: not in FP‡ | T: not in FP‡ | T: in FP* |
| | T: not in FPRAS‡ | T: not in FPRAS‡ | |
| Median scenario | T: not in FP‡ | T: not in FP‡ | T: #P-complete[ |
| | T: not in FPRAS‡ | T: not in FPRAS‡ | U: in/not in FPRAS |
| Tree labeling | T: not in FP‡ | T: not in FP‡ | U: FP/#P-complete |
| | T: not in FPRAS‡ | T: not in FPRAS‡ | U: in/not in FPRAS |
| Tree scenario | T: not in FP‡ | T: not in FP‡ | T: #P-complete[ |
| | T: not in FPRAS‡ | T: not in FPRAS‡ | T: not in FPRAS [ |
Notation: T: theorem; C: conjecture; U: unknown complexity, with no evidence to support a conjecture favoring either possibility. All theorems are referenced except ‡: based on the fact that the corresponding optimization problem is NP-hard, and *: proved in this paper. In all cases, "not in FP" should be read under the assumption that P ≠ NP. Similarly, "not in FPRAS" should be read under the assumption that RP ≠ NP.
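For readers unfamiliar with the class names in the table, one common textbook formulation of an FPRAS is sketched below. It is not quoted from the paper; the symbols $f$ (the counting function), $\hat{f}$ (the randomized estimate) and $\varepsilon$ (the relative error) are notation assumed here for illustration.

```latex
% A counting function f is in FPRAS if there is a randomized algorithm that,
% for every input x and every \varepsilon > 0, outputs \hat{f}(x, \varepsilon) with
\[
  \Pr\!\left[\, (1-\varepsilon)\, f(x) \;\le\; \hat{f}(x,\varepsilon) \;\le\; (1+\varepsilon)\, f(x) \,\right] \;\ge\; \frac{3}{4},
\]
% in running time polynomial in |x| and 1/\varepsilon.
% FP, by contrast, requires f(x) to be computed exactly in time polynomial in |x|.
```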
Figure 1. A rooted binary tree with two most parsimonious labelings of internal nodes. a) The B functions of the Fitch algorithm calculated in the bottom-up phase. b) The (canonical) Fitch solution. c) The values calculated in the Sankoff-Rousseau algorithm and the edges in the metagraph M (see text for details). For readability, only those values are shown that contribute to estimating the number of most parsimonious solutions. Also, the vertices of the tree are not indicated, i.e. s(k) is written instead of s(v, k); the position of each s value indicates the vertex to which it belongs. d) The most parsimonious solution that can be obtained only by the Sankoff-Rousseau algorithm and not by the Fitch algorithm.
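The difference between the two algorithms named in this caption can be illustrated with a minimal Python sketch for a single binary character (for example, the presence or absence of one adjacency). The tree encoding as nested tuples, the function names and the example tree are assumptions made here for illustration; they are not taken from the paper or its software, and the example tree is not claimed to be the one in the figure. The sketch treats one character in isolation and does not check whether labels chosen for different adjacencies are compatible with each other.

```python
from math import inf

STATES = (0, 1)  # hypothetical encoding: 0 = adjacency absent, 1 = present


def fitch_sets(tree):
    """Bottom-up Fitch phase: return (B set of the root, parsimony score)."""
    if not isinstance(tree, tuple):  # leaf: the observed state
        return {tree}, 0
    (left_set, left_cost), (right_set, right_cost) = map(fitch_sets, tree)
    common = left_set & right_set
    if common:  # the children agree on at least one state: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def sankoff(tree):
    """Return {k: (s(v, k), number of optimal labelings below v given label k)}."""
    if not isinstance(tree, tuple):  # leaf: only the observed state is allowed
        return {k: (0 if k == tree else inf, 1 if k == tree else 0) for k in STATES}
    tables = [sankoff(child) for child in tree]
    result = {}
    for k in STATES:
        cost, count = 0, 1
        for table in tables:
            # cost of giving the child label j, plus one change if j differs from k
            options = {j: table[j][0] + (j != k) for j in STATES}
            best = min(options.values())
            cost += best
            count *= sum(table[j][1] for j in STATES if options[j] == best)
        result[k] = (cost, count)
    return result


def count_most_parsimonious(tree):
    """Return (parsimony score, number of most parsimonious labelings)."""
    table = sankoff(tree)
    best = min(cost for cost, _ in table.values())
    return best, sum(n for cost, n in table.values() if cost == best)
```

On the hypothetical tree `((0, (0, 1)), 1)`, the Fitch set of the inner node `(0, (0, 1))` is `{0}`, so the Fitch top-down phase can only assign 0 to it, whereas the Sankoff-Rousseau tables also count the optimal labeling in which that node and all other internal nodes are labeled 1.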
Figure 2. Assessing the performance of the Gibbs sampler on 8 vertebrate genomes. See the text for a detailed description of the data and the method. Left: The number of SCJ operations on each of the 14 edges of the evolutionary tree in the samples of the Gibbs sampler. One sample was collected after every 10000 Gibbs sampling steps, for a total of 1000 samples. For readability, the numbers of SCJ operations on the edges have been shifted so that their averages are 20, 40, 60, ..., 280. Right: Autocorrelations of the number of SCJ operations on edges in the samples. One unit on the first axis means 10000 Gibbs sampling steps.
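The autocorrelation diagnostic described in this caption can be sketched as follows. This is a generic sample autocorrelation estimator, not the code used in the paper; the function name and the input series are hypothetical. Here `series` would hold the number of SCJ operations on one edge in consecutive collected samples, so lag 1 corresponds to one sampling interval.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation of `series` for lags 0..max_lag (lag 0 is always 1)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance = sum(c * c for c in centered)
    if variance == 0:
        raise ValueError("a constant series has no defined autocorrelation")
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / variance
        for lag in range(max_lag + 1)
    ]
```

Values that drop close to zero at small lags suggest that consecutive collected samples are only weakly correlated, which is the behaviour one hopes for when samples are spaced many Gibbs steps apart.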