Sahand Khakabimamaghani1, Salem Malikic1, Jeffrey Tang2, Dujian Ding1, Ryan Morin2,3, Leonid Chindelevitch1, Martin Ester1,4.
Abstract
MOTIVATION: Despite remarkable advances in sequencing and computational techniques, noise in the data and the complexity of the underlying biological mechanisms make it difficult to deconvolve the phylogenetic relationships between cancer mutations. Moreover, most existing datasets consist of bulk sequencing data from a single tumor sample per individual. Accurately inferring the phylogenetic order of mutations is particularly challenging in these cases, and existing methods face several theoretical limitations. Overcoming these limitations requires new methods that integrate and harness the full potential of the existing data.
Year: 2019 PMID: 31510674 PMCID: PMC6612880 DOI: 10.1093/bioinformatics/btz355
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Probabilistic graphical model of Hintra. Latent and observed variables are indicated by white and shaded circles, respectively
Fig. 2. Two sample scenarios in which tree factorization and parameter learning as in Caravagna result in undesired inference. The small circles denote the tumor subclones and the empty circle is the germline cell. The edges are labeled with the mutations, denoted by letters within larger circles. The true tree topologies are shown with solid edges. Each ambiguous situation is shown in a different color, with dashed ovals indicating the conflicting evidence (source of ambiguity) and the dashed edge indicating the possible mistake due to that evidence
Fig. 3. A sample phylogenetic tree and its factorization
Fig. 4. Bias in the topology prior probabilities and how the Bayesian approach mitigates this bias
Fig. 5. Results for the synthetic datasets from Caravagna
Fig. 6. Results for the synthetic datasets based on Scenario A from Figure 2
Fig. 7. Results for the synthetic datasets based on Scenario B from Figure 2
Problem size factors considered in the running time analysis
| Factor | Values |
|---|---|
| Samples ( | 20, |
| Mutations per sample | 3, |
| Combinations | 1, |
| Δ | 0.025, |
| CPU cores | 2, |
Note: Default values are italicized.
Fig. 8. Results of running time and memory analysis