Kim Wong, Thomas M Keane, James Stalker, David J Adams.
Abstract
We present SVMerge, a pipeline that detects structural variants by integrating calls from several existing structural variant callers; the calls are then validated and their breakpoints refined using local de novo assembly. SVMerge is modular and extensible, allowing new callers to be incorporated as they become available. We applied SVMerge to the analysis of a HapMap trio, demonstrating enhanced structural variant detection, breakpoint refinement, and a lower false discovery rate. SVMerge can be downloaded from http://svmerge.sourceforge.net.
Year: 2010 PMID: 21194472 PMCID: PMC3046488 DOI: 10.1186/gb-2010-11-12-r128
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. An overview of the SVMerge pipeline. SVMerge uses a suite of software tools to detect structural variants (SVs) from mapped reads. The calls are filtered, merged and then validated computationally by local de novo assembly. The output is in BED format, allowing easy downstream analysis or viewing in a genome browser. The SVMerge pipeline is extensible, so calls made by other software can be included in the downstream analysis. BAM, Binary Alignment/Map format.
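Because the pipeline's output is BED, a call from any caller (including a newly added one) only needs to be expressed in that common form. The Python sketch below illustrates that conversion; the `SVCall` record, its 1-based inclusive coordinates, the name-field format and the one-base representation of insertions are illustrative assumptions, not SVMerge's actual code. The detail that matters is that BED intervals are 0-based and half-open, so only the start coordinate shifts.

```python
from dataclasses import dataclass


@dataclass
class SVCall:
    """Hypothetical caller-independent SV record (1-based, inclusive coordinates)."""
    chrom: str
    start: int    # first affected base, 1-based
    end: int      # last affected base, 1-based inclusive
    sv_type: str
    caller: str


def to_bed_line(call: SVCall) -> str:
    """Return a 4-column BED line (chrom, chromStart, chromEnd, name)."""
    # BED is 0-based and half-open: shift the start by one, keep the end.
    name = f"{call.sv_type}_{call.caller}"
    return "\t".join([call.chrom, str(call.start - 1), str(call.end), name])


calls = [
    SVCall("chr20", 1001, 1500, "DEL", "BDMax"),
    # An insertion is represented here by the single base at the insertion point.
    SVCall("chr20", 52001, 52001, "INS", "SECluster"),
]
for call in calls:
    print(to_bed_line(call))
```

The resulting lines can be concatenated into one file and loaded directly as a genome browser track.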
Structural variation callers used in SVMerge
| Software | Analysis method | SV types called | Size detection limitations |
|---|---|---|---|
| BDMax | Paired-end mapping | D, I, Inv, T | Insertions limited by library insert size |
| Pindel | Split-mapping | D, I, D+I | Insertions limited by read size; deletions <1 Mb |
| SECluster | Clusters of one-end-mapped reads | I | Minimum size dependent on insert size |
| RetroSeq | Targeted insertion calling | RI | Minimum size dependent on insert size |
| RDXplorer | Read depth | D, G | Minimum size approximately 1 kb |
Listed are the software tools used to call structural variants (SVs), the analysis method each uses, the SV types called, and the size-detection limitations of each caller. 'BDMax' is BreakDancerMax. D, deletion; D+I, deletion with small insertion; G, copy number gain; I, insertion; Inv, inversion; RI, repeat insertion; T, translocation.
Improvement of breakpoint resolution using local de novo assembly and breakpoint refinement in SVMerge
| SV type | Raw: called | Raw: correct | Raw: mean distance (bp) | Refined: breakpoints detected | Refined: correct | Refined: mean distance (bp) |
|---|---|---|---|---|---|---|
| *Homozygous* | | | | | | |
| Deletion (random) | 99 | 9 | +5/-3 | 99 | 77 | -1/-1 |
| Deletion (repeat) | 99 | 4 | +11/-8 | 99 | 89 | 0/0 |
| Inversion | 100 | 0 | -169/175 | 85 | 46 | 51/24 |
| Insertion | 99 | 0 | 0/205 | 97 | 60 | -1/1 |
| *Heterozygous* | | | | | | |
| Deletion (random) | 96 | 2 | +6/-4 | 96 | 40 | -35/+18 |
| Deletion (repeat) | 94 | 0 | +19/-15 | 91 | 35 | 0/2 |
| Inversion | 99 | 0 | -166/+165 | 73 | 30 | -58/+287 |
| Insertion | 96 | 0 | +1/+202 | 18 | 18 | 0/0 |
To evaluate the performance of the local assembly and breakpoint refinement step in SVMerge, structural variants (SVs) were simulated on human chromosome 20. For each category, 100 SVs were generated by random selection of location and size; repeat deletions were selected from a list of LINEs and SINEs on chromosome 20. The raw, unfiltered calls are from the raw BreakDancer output, except for insertion calls, which are from the raw SECluster output. 'Called' is the number of the 100 simulated SVs that were found in the raw output; 'Breakpoints detected' is the number of called SVs for which the SVMerge pipeline was able to detect breakpoints using the local assembly, contig alignment and analysis steps; 'Correct' is the number of predictions that matched the actual breakpoint coordinates; 'Mean distance' is the mean distance from the actual breakpoints, given as 5'/3' breakpoint values. A '+' indicates that the mean distance was upstream of the actual breakpoint, and a '-' that it was downstream. Raw and refined breakpoints were considered 'correct' if the direction and deviation at both the 5' and 3' breakpoints were equal.
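The sign convention and the 'correct' criterion in this evaluation can be made concrete with a short Python sketch. It assumes that 'upstream' means a lower forward-strand coordinate (so a positive offset means the prediction lies before the true breakpoint) and reads 'direction and deviation ... equal' as requiring identical signed offsets at the 5' and 3' ends. The `evaluate` helper and the three toy predictions are hypothetical.

```python
from statistics import mean


def signed_offset(predicted: int, actual: int) -> int:
    # Assumed convention: positive when the prediction lies upstream
    # (lower forward-strand coordinate) of the true breakpoint.
    return actual - predicted


def evaluate(pred, truth):
    """pred and truth are (5' breakpoint, 3' breakpoint) coordinate pairs."""
    d5 = signed_offset(pred[0], truth[0])
    d3 = signed_offset(pred[1], truth[1])
    # 'Correct': same direction and deviation at both ends (0/0 included).
    return d5, d3, d5 == d3


# Hypothetical (prediction, truth) pairs for three simulated deletions.
sims = [
    ((1000, 1500), (1000, 1500)),  # exact
    ((998, 1498), (1000, 1500)),   # both ends shifted 2 bp upstream
    ((1005, 1496), (1000, 1500)),  # unequal offsets at the two ends
]
results = [evaluate(p, t) for p, t in sims]
n_correct = sum(ok for _, _, ok in results)
mean5 = mean(d5 for d5, _, _ in results)
mean3 = mean(d3 for _, d3, _ in results)
print(n_correct, f"{mean5:+g}/{mean3:+g}")
```

With these toy inputs two of the three calls count as correct, and the mean offsets print as `-1/+2`, in the same 5'/3' form used in the table.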
Structural variant calls for individual NA18506
| Call set | Deletion | Insertion | Inversion | CNG | Complex | Total |
|---|---|---|---|---|---|---|
| BDMax | 4,141 | 1,844 | 324 | - | - | 6,309 |
| Pindel | 458 | 0 | - | - | - | 458 |
| SECluster | - | 1,215 | - | - | - | 1,215 |
| RetroSeq | - | 2,297 | - | - | - | 2,297 |
| RDXplorer | 575 | - | - | 280 | - | 855 |
| Merged raw | 4,717 | 5,252 | 324 | 280 | - | 10,573 |
| SVMerge final | 4,184 | 575 | 38 | 280 | 99 | 5,176 |
Shown are the numbers of raw calls (>100 bp) from each structural variant (SV) caller, filtered by score and location only (see Materials and methods), for NA18506, the child in the HapMap trio dataset. 'BDMax' is BreakDancerMax. Pindel is able to identify deletions that also contain small insertions; these are included in the deletion count. 'Merged raw' is the number of calls remaining after these calls are merged by their coordinates (see Materials and methods). 'SVMerge final' is the total number of calls made after refinement of the SV call list by local assembly and read depth analysis. Copy number gains (CNG) are not subject to validation by local assembly. 'Complex' refers to any locus with more than one SV type, for example an inversion with a deletion.
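The exact coordinate-merging rule is given in the paper's Materials and methods, which are not reproduced here, so the Python sketch below assumes a simple one: after discarding calls of 100 bp or less, calls on the same chromosome whose (0-based, half-open) intervals overlap are clustered into one locus, and a locus supported by more than one SV type is labelled complex, following the definition above. In SVMerge itself complex loci appear only in the final call set; labelling them at merge time is a simplification. All names and example calls are hypothetical.

```python
from itertools import groupby

# Hypothetical raw calls: (chrom, start, end, sv_type, caller), 0-based half-open.
raw_calls = [
    ("chr20", 1000, 1500, "DEL", "BDMax"),
    ("chr20", 1010, 1490, "DEL", "Pindel"),
    ("chr20", 5000, 5400, "INV", "BDMax"),
    ("chr20", 5100, 5300, "DEL", "BDMax"),
    ("chr20", 9000, 9050, "DEL", "Pindel"),  # 50 bp: dropped by the size filter
]


def summarize(chrom, cluster):
    """Collapse overlapping calls into one locus; mixed SV types become COMPLEX."""
    types = {c[3] for c in cluster}
    sv_type = next(iter(types)) if len(types) == 1 else "COMPLEX"
    start = min(c[1] for c in cluster)
    end = max(c[2] for c in cluster)
    callers = sorted({c[4] for c in cluster})
    return (chrom, start, end, sv_type, callers)


def merge_calls(calls, min_size=100):
    """Cluster calls longer than min_size whose intervals overlap (assumed rule)."""
    kept = sorted(c for c in calls if c[2] - c[1] > min_size)
    loci = []
    for chrom, group in groupby(kept, key=lambda c: c[0]):
        cluster, cluster_end = [], None
        for call in group:
            # A call starting at or after the cluster's end does not overlap it.
            if cluster and call[1] >= cluster_end:
                loci.append(summarize(chrom, cluster))
                cluster, cluster_end = [], None
            cluster.append(call)
            cluster_end = call[2] if cluster_end is None else max(cluster_end, call[2])
        if cluster:
            loci.append(summarize(chrom, cluster))
    return loci


for locus in merge_calls(raw_calls):
    print(locus)
```

Here the two overlapping deletions collapse into one locus supported by BDMax and Pindel, and the overlapping inversion and deletion become a single complex locus.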
Contribution of individual structural variant callers to the 'SVMerge final' call set for NA18506
| Caller | Unique: deletion | Unique: insertion | Unique: inversion | Unique: CNG | Shared SV calls | Total SVs |
|---|---|---|---|---|---|---|
| BDMax | 3,283 | 45 | 124 | - | 442 | 3,874 |
| Pindel | 25 | 0 | - | - | 404 | 429 |
| SECluster | - | 449 | - | - | 40 | 489 |
| RetroSeq | - | 44 | - | - | 7 | 51 |
| RDXplorer | 526 | - | - | 280 | 49 | 855 |
Shown are the numbers of structural variant (SV) calls (>100 bp) from each SV caller that are included in the 'SVMerge final' call set for NA18506. 'Unique SV calls' are those made by a single SV caller only, and 'Shared SV calls' are SVs that were found by more than one method. 'BDMax' is BreakDancerMax. CNG, copy number gain.
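The unique/shared breakdown can be tallied from a list of final loci annotated with their supporting callers, as in the Python sketch below. It assumes that a locus found by several callers is counted once in each supporting caller's 'Shared' column, and that a caller's total is its unique calls of every type plus its shared calls; the data and function names are hypothetical.

```python
from collections import Counter

# Hypothetical final loci: (SV type, set of callers that support the locus).
final_calls = [
    ("DEL", {"BDMax"}),
    ("DEL", {"BDMax", "Pindel"}),
    ("INS", {"SECluster"}),
    ("INS", {"SECluster", "RetroSeq"}),
    ("CNG", {"RDXplorer"}),
]


def caller_contributions(calls):
    unique = Counter()  # (caller, SV type) -> loci found by that caller alone
    shared = Counter()  # caller -> loci it shares with at least one other caller
    for sv_type, callers in calls:
        if len(callers) == 1:
            (caller,) = callers
            unique[(caller, sv_type)] += 1
        else:
            for caller in callers:
                shared[caller] += 1
    return unique, shared


def total_for(caller, unique, shared):
    """Unique calls of every type plus shared calls for one caller."""
    return sum(n for (c, _), n in unique.items() if c == caller) + shared[caller]


unique, shared = caller_contributions(final_calls)
for name in ["BDMax", "Pindel", "SECluster", "RetroSeq", "RDXplorer"]:
    print(name, total_for(name, unique, shared))
```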
Velvet parameters for each individual
| Sample | hash_length | ins_length | exp_cov | cov_cutoff |
|---|---|---|---|---|
| NA18506 | 29 | 220 | 35 | 2 |
| NA18507 | 27 | 200 | 35 | 2 |
| NA18508 | 29 | 200 | 35 | 2 |
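These Velvet settings are split across Velvet's two programs: the hash length is the k-mer length passed to `velveth`, while the insert length, expected coverage and coverage cut-off are `velvetg` options (`-ins_length`, `-exp_cov`, `-cov_cutoff`). The Python sketch below only builds the command lines; the output directory name, the read file name and the `-shortPaired -fastq` input flags are assumptions, since the reads passed to each local assembly are not described here.

```python
import shlex

# Per-sample settings from the table above.
VELVET_PARAMS = {
    "NA18506": {"hash_length": 29, "ins_length": 220, "exp_cov": 35, "cov_cutoff": 2},
    "NA18507": {"hash_length": 27, "ins_length": 200, "exp_cov": 35, "cov_cutoff": 2},
    "NA18508": {"hash_length": 29, "ins_length": 200, "exp_cov": 35, "cov_cutoff": 2},
}


def velvet_commands(sample, reads="region_reads.fastq", outdir=None):
    """Return (velveth, velvetg) argument lists; file names are placeholders."""
    p = VELVET_PARAMS[sample]
    outdir = outdir or f"velvet_{sample}"
    # velveth hashes the reads into k-mers of length hash_length.
    velveth = ["velveth", outdir, str(p["hash_length"]),
               "-shortPaired", "-fastq", reads]
    # velvetg builds the graph and writes contigs using the assembly settings.
    velvetg = ["velvetg", outdir,
               "-ins_length", str(p["ins_length"]),
               "-exp_cov", str(p["exp_cov"]),
               "-cov_cutoff", str(p["cov_cutoff"])]
    return velveth, velvetg


for cmd in velvet_commands("NA18506"):
    print(shlex.join(cmd))
```

Each argument list could then be executed with `subprocess.run(cmd, check=True)`, running `velveth` before `velvetg` for the same output directory.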
Exonerate (v.2.2.0) parameters used for mapping de novo contigs back to the reference genome
| Parameter | Value | Parameter | Value |
|---|---|---|---|
| model | affine:local | gappedextension | FALSE |
| bestn | 50; 100 for inversions | joinrangeext | 300 |
| gapextend | -3 | score | 15 |
| dnahspdropoff | 10 | dna submatrix values (inversions) | 5 for match, -15 for mismatch |
| hspfilter | 200 | | |
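The Exonerate settings above can be assembled into a single command line, as in the Python sketch below, which only builds the argument list. The query and target file names are placeholders, and the inversion-specific DNA substitution values (5 for a match, -15 for a mismatch) are deliberately left out because the table does not say how they were supplied to Exonerate.

```python
import shlex


def exonerate_command(query_fa, target_fa, inversion=False):
    """Build an Exonerate argument list from the tabulated parameter values."""
    args = [
        "exonerate",
        "--model", "affine:local",
        "--bestn", "100" if inversion else "50",  # more hits kept for inversions
        "--gapextend", "-3",
        "--dnahspdropoff", "10",
        "--hspfilter", "200",
        "--gappedextension", "FALSE",
        "--joinrangeext", "300",
        "--score", "15",
        "--query", query_fa,    # de novo contigs (placeholder file name)
        "--target", target_fa,  # reference region (placeholder file name)
    ]
    # Inversion-specific substitution scores are not reproduced here.
    return args


print(shlex.join(exonerate_command("contigs.fa", "reference_region.fa")))
```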