Justine Rudewicz¹, Hayssam Soueidan², Raluca Uricaru², Hervé Bonnefoi³, Richard Iggo³, Jonas Bergh⁴, Macha Nikolski²
1. Centre de BioInformatique de Bordeaux, University of Bordeaux, Bordeaux, France; Laboratoire Bordelais de Recherche en Informatique, Centre National de la Recherche Scientifique, University of Bordeaux, Bordeaux, France; Bergonié Cancer Institute, Institut National de la Santé et de la Recherche Médicale U1218, University of Bordeaux, Bordeaux, France.
2. Centre de BioInformatique de Bordeaux, University of Bordeaux, Bordeaux, France; Laboratoire Bordelais de Recherche en Informatique, Centre National de la Recherche Scientifique, University of Bordeaux, Bordeaux, France.
3. Bergonié Cancer Institute, Institut National de la Santé et de la Recherche Médicale U1218, University of Bordeaux, Bordeaux, France.
4. Karolinska Institute and University Hospital, Stockholm, Sweden.
Abstract
Targeted sequencing is commonly used in clinical applications of NGS technology because it provides sufficient sequencing depth in the targeted genes of interest and thus supports the best possible downstream analysis. Nevertheless, the accurate discovery and annotation of disease-causing mutations remains a challenging problem even in such a favorable context. The difficulty is particularly salient for third-generation sequencing technologies such as PacBio. We present MICADo, a de Bruijn graph-based method, implemented in Python, that makes it possible to distinguish patient-specific mutations from other alterations in targeted sequencing of a cohort of patients. MICADo analyses the NGS reads of each sample in the context of the data of the whole cohort in order to capture what is specific to the sample with respect to the cohort. MICADo is particularly suitable for sequencing data from highly heterogeneous samples, especially when the data involve high rates of non-uniform sequencing errors. It was validated on PacBio sequencing datasets from several cohorts of patients. The comparison with two widely used tools, VarScan and GATK, shows that MICADo is more accurate, especially when true mutations have frequencies close to background noise. The source code is available at http://github.com/cbib/MICADo.
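To illustrate the general idea of comparing one sample against a cohort in a de Bruijn graph, here is a minimal sketch. It is not MICADo's implementation: the function names, the `min_count` threshold, and the toy reads are all hypothetical, and the comparison is reduced to a plain set difference on graph edges. It assumes error-free orientation of reads and ignores reverse complements.

```python
from collections import Counter


def kmers(read, k):
    """Yield every k-mer (substring of length k) of a read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k]


def de_bruijn_edges(reads, k):
    """Count de Bruijn graph edges: each k-mer links its (k-1)-prefix to its (k-1)-suffix."""
    edges = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges


def sample_specific_edges(sample_reads, cohort_reads, k, min_count=2):
    """Return sample edges that are absent from the cohort graph.

    min_count is a hypothetical noise filter: edges seen fewer times
    in the sample are treated as likely sequencing errors and dropped.
    """
    sample = de_bruijn_edges(sample_reads, k)
    cohort = de_bruijn_edges(cohort_reads, k)
    return {
        edge: count
        for edge, count in sample.items()
        if count >= min_count and edge not in cohort
    }


# Toy data (invented for illustration): the sample carries a T>A
# substitution supported by two reads, plus one read with a single error.
cohort_reads = ["ACGTTGCA", "ACGTTGCA"]
sample_reads = ["ACGTAGCA", "ACGTAGCA", "ACGTTGCC"]

specific = sample_specific_edges(sample_reads, cohort_reads, k=4)
for (source, target), count in sorted(specific.items()):
    print(source, "->", target, "count:", count)
```

In this toy setting the substitution shows up as a short path of edges that exists only in the sample graph, while the singleton error is removed by the count threshold. A real method would need a more careful noise model, since the abstract stresses that true mutations can have frequencies close to background noise.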
Keywords:
cancer; code:python; de Bruijn graphs; patients' cohort; targeted sequencing; third generation sequencing