Yi Li1, Xiaohui Xie2. 1. Department of Computer Science, Institute for Genomics and Bioinformatics and Center for Machine Learning and Intelligent Systems, University of California, Irvine, CA 92697, USA. 2. Department of Computer Science, Institute for Genomics and Bioinformatics and Center for Machine Learning and Intelligent Systems, University of California, Irvine, CA 92697, USA.
Abstract
MOTIVATION: Next-generation sequencing (NGS) has revolutionized the study of cancer genomes. However, the reads obtained from NGS of tumor samples often come from a mixture of normal and tumor cells, and the tumor cells themselves can be of multiple clonal types. A prominent problem in the analysis of cancer genome sequencing data is deconvolving this mixture to identify the reads associated with tumor cells or with a particular subclone of tumor cells. Solving the problem is challenging because of the so-called 'identifiability problem', where different combinations of tumor purity and ploidy often explain the sequencing data equally well.

RESULTS: We propose a new model to resolve the identifiability problem by integrating two types of sequencing information, somatic copy number alterations and loss of heterozygosity, within a unified probabilistic framework. We derive algorithms to solve our model and implement them in a software package called PyLOH. We benchmark the performance of PyLOH using both simulated data and 12 breast cancer sequencing datasets and show that PyLOH outperforms existing methods in resolving the identifiability problem and estimating tumor purity.

AVAILABILITY AND IMPLEMENTATION: The PyLOH package is written in Python and is publicly available at https://github.com/uci-cbcl/PyLOH.

CONTACT: xhx@ics.uci.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
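The identifiability problem can be illustrated with a small numerical sketch. This is a toy calculation and not the PyLOH model: it assumes the standard two-population mixture in which normal cells carry two copies of every segment, and the function names (`mean_copies`, `depth_ratio`, `possible_bafs`) and the example values are hypothetical. It shows two purity and copy-number combinations that predict the same read-depth ratio, and an allele fraction at a germline-heterozygous site that only one of them can produce.

```python
from fractions import Fraction


def mean_copies(purity, tumor_cn, normal_cn=2):
    """Average copy number per cell in a normal/tumor mixture (assumed model)."""
    return (1 - purity) * normal_cn + purity * tumor_cn


def depth_ratio(purity, cn_segment, cn_reference):
    """Expected read-depth ratio between a segment and a reference segment."""
    return mean_copies(purity, cn_segment) / mean_copies(purity, cn_reference)


def possible_bafs(purity, tumor_cn):
    """Expected B-allele fractions at a germline-heterozygous site,
    one value for each possible number of B copies in the tumor."""
    total = mean_copies(purity, tumor_cn)
    return {((1 - purity) * 1 + purity * b) / total for b in range(tumor_cn + 1)}


half, full = Fraction(1, 2), Fraction(1)

# Candidate A: purity 0.5, tumor copy numbers 2 (reference) and 4.
# Candidate B: purity 1.0, tumor copy numbers 2 (reference) and 3.
ratio_a = depth_ratio(half, 4, 2)
ratio_b = depth_ratio(full, 3, 2)
print(ratio_a, ratio_b)  # both candidates predict the same depth ratio

# Allele fractions each candidate can produce in the amplified segment.
bafs_a = possible_bafs(half, 4)
bafs_b = possible_bafs(full, 3)
print(sorted(bafs_a))
print(sorted(bafs_b))
```

Under these assumptions, read depth alone cannot separate the two candidates. An observed B-allele fraction of 1/6 (complete loss of the B allele in the tumor) is reachable only under candidate A, while a fraction of 1/3 is reachable under both, so allelic information narrows the candidate set without always reducing it to one.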