Yuanyuan Qi (1), Dikshant Pradhan (2), Mohammed El-Kebir (1). (1) Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA. (2) Department of Bioengineering, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.
Abstract
BACKGROUND: Tumors exhibit extensive intra-tumor heterogeneity, the presence of groups of cellular populations with distinct sets of somatic mutations. This heterogeneity is the result of an evolutionary process, described by a phylogenetic tree. In addition to enabling clinicians to devise patient-specific treatment plans, phylogenetic trees of tumors enable researchers to decipher the mechanisms of tumorigenesis and metastasis. However, the problem of reconstructing a phylogenetic tree T given bulk sequencing data from a tumor is more complicated than the classic phylogeny inference problem. Rather than observing the leaves of T directly, we are given mutation frequencies that are the result of mixtures of the leaves of T. The majority of current tumor phylogeny inference methods employ the perfect phylogeny evolutionary model. The underlying Perfect Phylogeny Mixture (PPM) combinatorial problem typically has multiple solutions.
RESULTS: We prove that determining the exact number of solutions to the PPM problem is #P-complete and hard to approximate within a constant factor. Moreover, we show that sampling solutions uniformly at random is hard as well. On the positive side, we provide a polynomial-time computable upper bound on the number of solutions and introduce a simple rejection-sampling based scheme that works well for small instances. Using simulated and real data, we identify factors that contribute to and counteract non-uniqueness of solutions. In addition, we study the sampling performance of current methods, identifying significant biases.
CONCLUSIONS: Awareness of non-uniqueness of solutions to the PPM problem is key to drawing accurate conclusions in downstream analyses based on tumor phylogenies. This work provides the theoretical foundations for non-uniqueness of solutions in tumor phylogeny inference from bulk DNA samples.