Jan Schröder, Adrianto Wirawan, Bertil Schmidt, Anthony T Papenfuss.
Abstract
BACKGROUND: A precise understanding of structural variants (SVs) in DNA is important in the study of cancer and population diversity. Many methods have been designed to identify SVs from DNA sequencing data. However, the problem remains challenging because existing approaches suffer from low sensitivity, precision, and positional accuracy. Furthermore, many existing tools only identify breakpoints and so do not collect related breakpoints and classify them as a particular type of SV. Due to the rapidly increasing usage of high-throughput sequencing technologies in this area, there is an urgent need for algorithms that can accurately classify complex genomic rearrangements (involving more than one breakpoint or fusion).
Keywords: Genomic rearrangements; Structural variations
Year: 2017 PMID: 28728542 PMCID: PMC5520322 DOI: 10.1186/s12859-017-1760-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Example of a simple structural variant that illustrates how the signatures of fusions are defined. The horizontal structure in the middle represents the double-stranded DNA of a chromosome. Two read pairs are depicted as horizontal arrows mapping to the positive and negative strands of the DNA (the reads in the pairs are enumerated as pxry, meaning pair x, read y). Assuming that the insert sizes of the two pairs are significantly above their expected value, an SV caller would call a deletion event from the two read pairs. The two dashed vertical lines indicate two breakends in the DNA. The breakends are connected by an arrow labelled “fusion”, which corresponds to the deletion event. The orientation signature of the fusion is indicated as “+” and “−” next to the breakends, according to the mapping orientation of the reads that constitute the evidence for the fusion call
Simple SV event labels defined by their breakend signature
| Chromosomes | First breakend orientation | Second breakend orientation: + | Second breakend orientation: − |
|---|---|---|---|
| chr1 = chr2 | + | Inversion type 1 (INV1) | Deletion type (DEL) |
| chr1 = chr2 | − | Tandem duplication (TAN) | Inversion type 2 (INV2) |
| chr1 ≠ chr2 | + | Inter-chromosomal inversion type 1 (INVTX1) | Inter-chromosomal translocation type 1 (ITX1) |
| chr1 ≠ chr2 | − | Inter-chromosomal translocation type 2 (ITX2) | Inter-chromosomal inversion type 2 (INVTX2) |
Rows refer to the orientation of the first breakend and columns to the orientation of the second breakend. Simple events may be combined into complex event types; for example, an inversion (an inverted segment of DNA) comprises the two simple events INV1 and INV2
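The signature table above amounts to a lookup keyed on whether the two breakends share a chromosome and on their two orientations. The following is a minimal sketch of that lookup, not the CLOVE implementation: the function name `classify_fusion`, the dictionary `SIMPLE_EVENTS`, and the use of ASCII `"+"` and `"-"` for orientations are all assumptions made for illustration.

```python
# Hypothetical lookup mirroring the simple-event table.
# Key: (breakends on same chromosome?, first orientation, second orientation).
SIMPLE_EVENTS = {
    (True, "+", "+"): "INV1",
    (True, "+", "-"): "DEL",
    (True, "-", "+"): "TAN",
    (True, "-", "-"): "INV2",
    (False, "+", "+"): "INVTX1",
    (False, "+", "-"): "ITX1",
    (False, "-", "+"): "ITX2",
    (False, "-", "-"): "INVTX2",
}


def classify_fusion(chrom1: str, ori1: str, chrom2: str, ori2: str) -> str:
    """Return the simple event label for one fusion (a pair of breakends)."""
    if ori1 not in "+-" or ori2 not in "+-" or len(ori1) != 1 or len(ori2) != 1:
        raise ValueError("orientations must be '+' or '-'")
    return SIMPLE_EVENTS[(chrom1 == chrom2, ori1, ori2)]


if __name__ == "__main__":
    # The deletion in Fig. 1 has signature (+, -) on a single chromosome.
    print(classify_fusion("chr1", "+", "chr1", "-"))
    # Breakends on different chromosomes with signature (-, +).
    print(classify_fusion("chr1", "-", "chr2", "+"))
```

Complex events would then be recognised by grouping several such labels, for example an INV1 together with an INV2 for an inversion; that grouping step is not sketched here.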
Fig. 2 Rearrangement patterns (or signatures) for complex SVs. The connectors identify simple event types by color and arrowheads (for “−” orientation). Panels a-d show events on the same chromosome; panels e-g show inter-chromosomal rearrangements. Intra-chromosomal duplications upstream of the insertion site are not shown in a and are slightly different, as they reverse the order of the deletion and tandem duplication events. Translocated rearrangements are not explicitly shown in d-f, as they simply require a single deletion in addition to the event types shown (see b)
Fig. 3 Outline of the workflow and key components of the CLOVE algorithm
Fig. 4 SV detection accuracy in simulated data before (raw) and after classification with CLOVE (classified). The scatter plots indicate performance for individual runs and the lines show the average over the data. The significance of the change in means is indicated by a p-value at the top of each panel (except for MetaSV, which is shown without CLOVE classification)
Results for SV recovery from the output of Socrates and Crest before (R) and after (C) classification with CLOVE in real sequencing data
| Organism | Tool | Data | TP | HTP | FP | FN | Sn (95% CI) | Pr (95% CI) | Acc (95% CI) |
|---|---|---|---|---|---|---|---|---|---|
| E. coli | Socrates | M5R | 8 | 7 | 33 | 6 | 0.71 (.50,.86) | 0.31 (.20,.45) | 0.28 (.18,.40) |
| E. coli | Socrates | M5C | 12 | 0 | 0 | 8 | 0.60 (.39,.78) | 1.00 (.76,1.0) | 0.60 (.39,.78) |
| E. coli | Socrates | M0R | 8 | 13 | 49 | 0 | 1.00 (.85,1.0) | 0.30 (.21,.42) | 0.30 (.21,.42) |
| E. coli | Socrates | M0C | 17 | 2 | 0 | 2 | 0.90 (.71,.97) | 1.00 (.83,1.0) | 0.90 (.71,.97) |
| E. coli | Crest | R | 7 | 7 | 4 | 6 | 0.70 (.48,.85) | 0.77 (.55,.91) | 0.58 (.39,.76) |
| E. coli | Crest | C | 10 | 0 | 0 | 10 | 0.50 (.30,.70) | 1.00 (.72,1.0) | 0.50 (.30,.70) |
| E. coli | Socrates + Crest | R | 8 | 13 | 50 | 0 | 1.00 (.85,1.0) | 0.30 (.20,.41) | 0.30 (.20,.41) |
| E. coli | Socrates + Crest | C | 17 | 3 | 0 | 1 | 0.95 (.77,1.0) | 1.00 (.84,1.0) | 0.95 (.77,1.0) |
| Human | Delly | R | 1447 | 0 | 6818 | 1932 | 0.43 (.41,.44) | 0.18 (.17,.18) | 0.14 (.14,.15) |
| Human | Delly | C | 1437 | 0 | 0 | 1942 | 0.43 (.41,.44) | 1.00 (1.0,1.0) | 0.43 (.41,.44) |
| Human | Socrates | R | 900 | 0 | 3781 | 2361 | 0.28 (.26,.29) | 0.19 (.18,.20) | 0.13 (.12,.14) |
| Human | Socrates | C | 894 | 0 | 0 | 2403 | 0.27 (.26,.29) | 1.00 (1.0,1.0) | 0.27 (.26,.29) |
| Human | Delly + Socrates | R | 1819 | 0 | 10,394 | 1464 | 0.55 (.54,.57) | 0.14 (.14,.16) | 0.13 (.13,.14) |
| Human | Delly + Socrates | C | 1816 | 0 | 0 | 1493 | 0.55 (.53,.57) | 1.00 (1.0,1.0) | 0.55 (.53,.57) |
Columns refer to true positive events (TP), half true positives (HTP), false positives (FP), false negatives (FN), sensitivity/recall (Sn), precision (Pr), and accuracy (Acc). Confidence intervals calculated through binconf in R are supplied for the latter three columns
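The three rate columns can be reproduced from the count columns. The sketch below assumes that half true positives are counted together with true positives as hits, which is an inference from the reported numbers rather than a definition stated here, and it assumes a Wilson score interval, which is the default method of `binconf` in the R package Hmisc. The function names `recovery_metrics` and `wilson_interval` are hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple:
    """Wilson score interval for a binomial proportion (95% by default)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def recovery_metrics(tp: int, htp: int, fp: int, fn: int) -> dict:
    """Sensitivity, precision and accuracy, each with a Wilson interval.

    Assumption: hits = TP + HTP; accuracy has no true-negative term.
    """
    hits = tp + htp
    totals = {"Sn": hits + fn, "Pr": hits + fp, "Acc": hits + fp + fn}
    return {
        name: (hits / total, wilson_interval(hits, total))
        for name, total in totals.items()
    }


if __name__ == "__main__":
    # Counts from the E. coli / Socrates / M5R row of the table.
    for name, (value, (lo, hi)) in recovery_metrics(8, 7, 33, 6).items():
        print(f"{name}: {value:.2f} ({lo:.2f}, {hi:.2f})")
```

Small differences in the last digit against the table are possible, since the rounding applied to the published values is not stated.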