Fritz J Sedlazeck, Andi Dhroso, Dale L Bodian, Justin Paschall, Farrah Hermes, Justin M Zook.
Abstract
The impact of structural variants (SVs) on a variety of organisms and on diseases such as cancer has become increasingly evident. Methods for SV detection when studying genomic differences across cells, individuals or populations are being actively developed. Currently, only a few methods are available to compare different SV callsets, and no specialized methods are available to annotate SVs in a way that accounts for the unique characteristics of these variant types. Here, we introduce SURVIVOR_ant, a tool that compares types and breakpoints for candidate SVs from different callsets and enables fast comparison of SVs to genomic features such as genes and repetitive regions, as well as to previously established SV datasets such as those from the 1000 Genomes Project. As a proof of concept, we compared 16 SV callsets generated by different SV calling methods on a single genome, the Genome in a Bottle sample HG002 (Ashkenazi son), and annotated the SVs with gene annotations, 1000 Genomes Project SV calls, and four different types of repetitive regions. Computation time to annotate 134,528 SVs with 33,954 annotations was 22 seconds on a laptop.
Keywords: NGS; annotation; bioinformatics; structural variants; whole genome sequencing
Year: 2017 PMID: 29123647 PMCID: PMC5668921 DOI: 10.12688/f1000research.12516.1
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Figure 1. SV type-specific overlap schema of SURVIVOR_ant, used to identify which genomic annotations overlap with which type of SV.
By default, SURVIVOR_ant takes 1 kbp surrounding the start and stop coordinates into account. Furthermore, for deletions and duplications, the region spanned by the SV is also taken into account.
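The overlap schema described in this caption can be illustrated with a minimal Python sketch. It is not SURVIVOR_ant's actual implementation: the `Interval` class, the function names, the `"DEL"`/`"DUP"` type labels and the closed-interval coordinate convention are all assumptions made for illustration. Only the 1 kbp default padding and the special handling of deletions and duplications come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed genomic interval (hypothetical helper, not part of SURVIVOR_ant)."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two closed intervals on the same chromosome share a position."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def query_windows(chrom, start, stop, sv_type, pad=1000):
    """Build the regions checked for one SV.

    Every SV gets a window of +/- pad around its start and stop coordinates.
    Deletions and duplications (assumed labels "DEL" and "DUP") also get the
    region between the two coordinates.
    """
    windows = [
        Interval(chrom, max(0, start - pad), start + pad),
        Interval(chrom, max(0, stop - pad), stop + pad),
    ]
    if sv_type in {"DEL", "DUP"}:
        windows.append(Interval(chrom, start, stop))
    return windows


def annotate(chrom, start, stop, sv_type, features, pad=1000):
    """Return names of features that overlap any query window of the SV.

    features is a list of (name, Interval) pairs.
    """
    windows = query_windows(chrom, start, stop, sv_type, pad)
    return [name for name, iv in features
            if any(overlaps(w, iv) for w in windows)]


# Toy annotations, invented for this example.
features = [
    ("geneA", Interval("1", 5000, 6000)),
    ("geneB", Interval("1", 20000, 21000)),
    ("geneC", Interval("1", 50000, 51000)),
]

del_hits = annotate("1", 10000, 40000, "DEL", features)
inv_hits = annotate("1", 10000, 40000, "INV", features)
```

In this toy setup, a feature lying strictly between the breakpoints is reported for the deletion but not for an inversion with the same coordinates, because only the padded breakpoint windows are checked for the latter.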
Structural variant callsets for the Ashkenazi son.
| Sequencing | Structural Variant | Call Set Name | Reference |
|---|---|---|---|
| Illumina | Mobile Element | HG002.TE_insertions.recover_filt_mod | ( |
| Illumina | CommonLaw | HG002.commonlaw.deletions.bilkentuniv.082815 | ( |
| Illumina | FermiKit | HG002.fermikit.sv | ( |
| Illumina | FreeBayes | HG002_ALLCHROM_hs37d5_novoalign_ | ( |
| Illumina | GATK Haplotype | HG002_ALLCHROM_hs37d5_novoalign_ | ( |
| Illumina | CNVnator | HG002_CNVnator_deletions.hs37d5.sort | ( |
| Illumina | MetaSV | MetaSV_151207_variants | ( |
| PacBio | Assemblytics | hg002.Assemblytics_structural_variants | ( |
| PacBio | MultibreakSV | hg002_attempt1.1_MultibreakSV_mod | ( |
| PacBio | Parliament - forced | parliament.assembly.H002 | ( |
| PacBio | Parliament - forced | parliament.pacbio.H002 | ( |
| PacBio | smrt-sv | smrt-sv.dip_indel | ( |
| PacBio | Assemblytics | trio2.Assemblytics_structural_variants | ( |
| PacBio | PBHoney | PBHoney_15.8.24_HG002.tails_20 | ( |
| Complete | Complete | vcfBeta-GS000037263-ASM_delgt19 | ( |
| Bionano | | son_hap_refsplit20160129_1kb | ( |
Summary of the overlapping annotations for the SV data set.
| Annotation type | # of overlapping |
|---|---|
| Ensembl genes | 22,184 |
| Repeats | 7,264 |
| 1000 genomes SVs | 4,506 |
Figure 2. Number of calls per callset for each type of SV, including filtered calls.
Figure 3. Histogram of distance from the median start position for deletion calls 400 to 999 bp in size for a PacBio-based Assemblytics callset and for an Illumina-based MetaSV callset.
Only sites with calls from at least 4 different callsets were included in order to calculate a useful median value at each site.
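The filtering step behind Figure 3 can be sketched as follows. This is an illustrative reconstruction rather than the authors' analysis code: the `(site_id, callset, start)` input format and the function name are assumptions, and the source does not state how calls were grouped into sites. The restriction to deletions of 400 to 999 bp is assumed to have been applied before this step.

```python
from collections import defaultdict
from statistics import median


def start_distances(calls, min_callsets=4):
    """Distance of each call's start position from the per-site median.

    calls: iterable of (site_id, callset, start) tuples, where site_id is a
    hypothetical identifier for a group of calls describing the same SV.
    Sites supported by fewer than min_callsets distinct callsets are skipped,
    since a median over very few calls is not informative.
    """
    by_site = defaultdict(dict)
    for site_id, callset, start in calls:
        # Keep the first call per callset so each callset counts once per site.
        by_site[site_id].setdefault(callset, start)

    distances = []
    for site_id, starts in by_site.items():
        if len(starts) < min_callsets:
            continue
        site_median = median(starts.values())
        for callset, start in starts.items():
            distances.append((site_id, callset, start - site_median))
    return distances


# Toy input, invented for this example.
calls = [
    ("s1", "callset_a", 100),
    ("s1", "callset_b", 102),
    ("s1", "callset_c", 104),
    ("s1", "callset_d", 110),
    ("s2", "callset_a", 500),
    ("s2", "callset_b", 505),
    ("s2", "callset_c", 510),
]

result = start_distances(calls)
```

The distances for a single callset can then be selected from `result` and binned into a histogram with any plotting library.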