Kathrin Trappe (1), Anne-Katrin Emde (2), Hans-Christian Ehrlich (2), Knut Reinert (2).
1. Department of Computer Science, Freie Universität Berlin, 14195 Berlin, Germany; Research Group Bioinformatics (NG4), Robert Koch Institute, 13353 Berlin, Germany; and New York Genome Center, New York, NY 10013, USA.
2. Department of Computer Science, Freie Universität Berlin, 14195 Berlin, Germany; Research Group Bioinformatics (NG4), Robert Koch Institute, 13353 Berlin, Germany; and New York Genome Center, New York, NY 10013, USA.
Abstract
MOTIVATION: The landscape of structural variation (SV), including complex duplication and translocation patterns, is far from resolved. SV detection tools usually exhibit low agreement, are often geared toward certain types or size ranges of variation, and struggle to correctly classify the type and exact size of SVs.
RESULTS: We present Gustaf (Generic mUlti-SpliT Alignment Finder), a sound generic multi-split SV detection tool that detects and classifies deletions, inversions, dispersed duplications and translocations of ≥30 bp. Our approach is based on a generic multi-split alignment strategy that can identify SV breakpoints with base-pair resolution. We show that Gustaf correctly identifies SVs, especially in the range from 30 to 100 bp, which we call the next-generation sequencing (NGS) twilight zone of SVs, as well as larger SVs >500 bp. Gustaf performs better than similar tools in our benchmark and is furthermore able to correctly identify the size and location of dispersed duplications and translocations, which otherwise might be wrongly classified, for example, as large deletions.
Authors: Biao Liu; Jeffrey M Conroy; Carl D Morrison; Adekunle O Odunsi; Maochun Qin; Lei Wei; Donald L Trump; Candace S Johnson; Song Liu; Jianmin Wang Journal: Oncotarget Date: 2015-03-20