Karen E van Rens1, Veli Mäkinen2, Alexandru I Tomescu2. 1. Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland and HAN University of Applied Sciences, Nijmegen, The Netherlands Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland and HAN University of Applied Sciences, Nijmegen, The Netherlands. 2. Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland and HAN University of Applied Sciences, Nijmegen, The Netherlands.
Abstract
MOTIVATION: Recent studies sequenced tumor samples from the same progenitor at different development stages and showed that by taking into account the phylogeny of this development, single-nucleotide variant (SNV) calling can be improved. Accurate SNV calls can better reveal early-stage tumors, identify mechanisms of cancer progression or help in drug targeting. RESULTS: We present SNV-PPILP, a fast and easy to use tool for refining GATK's Unified Genotyper SNV calls, for multiple samples assumed to form a phylogeny. We tested SNV-PPILP on simulated data, with a varying number of samples, SNVs, read coverage and violations of the perfect phylogeny assumption. We always match or improve the accuracy of GATK, with a significant improvement on low read coverage. AVAILABILITY AND IMPLEMENTATION: SNV-PPILP, available at cs.helsinki.fi/gsa/snv-ppilp/, is written in Python and requires the free ILP solver lp_solve. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
MOTIVATION: Recent studies sequenced tumor samples from the same progenitor at different development stages and showed that by taking into account the phylogeny of this development, single-nucleotide variant (SNV) calling can be improved. Accurate SNV calls can better reveal early-stage tumors, identify mechanisms of cancer progression or help in drug targeting. RESULTS: We present SNV-PPILP, a fast and easy to use tool for refining GATK's Unified Genotyper SNV calls, for multiple samples assumed to form a phylogeny. We tested SNV-PPILP on simulated data, with a varying number of samples, SNVs, read coverage and violations of the perfect phylogeny assumption. We always match or improve the accuracy of GATK, with a significant improvement on low read coverage. AVAILABILITY AND IMPLEMENTATION: SNV-PPILP, available at cs.helsinki.fi/gsa/snv-ppilp/, is written in Python and requires the free ILP solver lp_solve. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.