Ryan P Abo, Matthew Ducar, Elizabeth P Garcia, Aaron R Thorner, Vanesa Rojas-Rudilla, Ling Lin, Lynette M Sholl, William C Hahn, Matthew Meyerson, Neal I Lindeman, Paul Van Hummelen, Laura E MacConaill.
Abstract
Genomic structural variation (SV), a common hallmark of cancer, has important predictive and therapeutic implications. However, accurately detecting SV using high-throughput sequencing data remains challenging, especially for 'targeted' resequencing efforts. This is critically important in the clinical setting, where targeted resequencing is frequently applied to rapidly assess clinically actionable mutations in tumor biopsies in a cost-effective manner. We present BreaKmer, a novel approach that uses a 'kmer' strategy to assemble misaligned sequence reads for predicting insertions, deletions, inversions, tandem duplications and translocations at base-pair resolution in targeted resequencing data. Variants are predicted by realigning an assembled consensus sequence created from sequence reads that were abnormally aligned to the reference genome. Using targeted resequencing data from tumor specimens with orthogonally validated SV, non-tumor samples and whole-genome sequencing data, BreaKmer had a 97.4% overall sensitivity for known events and predicted 17 positively validated, novel variants. Relative to four publicly available algorithms, BreaKmer detected SV with increased sensitivity and limited calls in non-tumor samples, key features for variant analysis of tumor specimens in both the clinical and research settings.
Year: 2014 PMID: 25428359 PMCID: PMC4330340 DOI: 10.1093/nar/gku1211
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. (A) Algorithm workflow for a given target region. (B) Illustration of the reads extracted for building contigs: reads with ‘misaligned’ sequences that are soft-clipped by the alignment tool, and paired-end reads with unmapped mates. The locations of the discordantly mapped paired-end reads with signatures suggestive of inversions, tandem duplications and translocations are stored and used for downstream analysis and filtering. (C) BreaKmer assembly process using the kmer subtraction procedure to iteratively build contigs.
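The kmer subtraction idea named in panel (C) can be illustrated with a minimal Python sketch. It is not BreaKmer's implementation: the function names, the toy sequences and the choice of k = 4 are all assumptions made for illustration, and only the subtraction step is shown, not the iterative contig building. The sketch collects the kmers present in the extracted reads, removes every kmer that also occurs in the reference sequence of the target region, and keeps the reads that carry at least one of the remaining sample-only kmers.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings (kmers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sample_only_kmers(reads, reference, k):
    """Subtract reference kmers from the kmers found in the extracted reads."""
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)
    return read_kmers - kmers(reference, k)


def reads_with_sample_only_kmers(reads, reference, k):
    """Keep reads carrying at least one kmer that is absent from the reference."""
    novel = sample_only_kmers(reads, reference, k)
    return [read for read in reads if kmers(read, k) & novel]


# Toy data (hypothetical): the second read spans a 3-bp deletion of "GCA".
reference = "ACGTTGCAAGCT"
reads = ["ACGTTGCA", "CGTTAGCT"]

novel = sample_only_kmers(reads, reference, k=4)
candidates = reads_with_sample_only_kmers(reads, reference, k=4)
print(sorted(novel))   # kmers that cross the deletion junction
print(candidates)      # reads that would seed an assembly in this sketch
```

In this toy case only the read that crosses the deletion junction survives the subtraction, which is why sample-only kmers are a natural starting point for assembling a contig around a breakpoint.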
A list of sequenced tumor and non-tumor samples with known alterations, clinical annotations and number of replicates
| Sample ID | Known alteration | Diagnosis | Tumor percentage | Tissue | Detection method | Panel | Nt | Nd |
|---|---|---|---|---|---|---|---|---|
| 1 | FLT3 indel | AML | 50 | Blood | Sanger sequence<sup>a</sup> | OPv2.1 | 1 | 0 |
| 2 | FLT3 indel | AML | 50 | Blood | Sanger sequence<sup>a</sup> | OPv2.1 | 1 | 0 |
| 3 | FLT3 indel | AML | 50 | Blood | Sanger sequence<sup>a</sup> | OPv2.1 | 1 | 0 |
| 4 | FLT3 indel | AML | 50 | Blood | Sanger sequence<sup>a</sup> | OPv2.1 | 1 | 0 |
| 5 | FLT3 indel | AML | 50 | Blood | Sanger sequence<sup>a</sup> | OPv2.1 | 1 | 0 |
| 6 | FLT3 indel | AML | 50 | Blood | Sanger sequence<sup>a</sup> | OPv2.1 | 1 | 0 |
| 7 | FLT3 indel | AML | 50 | Blood | Sanger sequence<sup>a</sup> | OPv2.1 | 1 | 0 |
| 8 | FLT3 indel | AML | 50 | Blood | Sanger sequence<sup>a</sup> | OPv2.1 | 1 | 0 |
| 9 | KIT deletion | GIST | NA | FFPE | Sanger sequence | OPv2.1 | 1 | 0 |
| 10 | t(2;2) | LA | 50 | FF | FISH<sup>b</sup> | OPv2.1 | 16 | 6 |
| 11 | t(2;2) | LA | 60 | FFPE | FISH<sup>b</sup> | OPv2.1 | 1 | 0 |
| 12 | t(2;2) | LA | 100 | Cell line | See reference ( | OPv2.2 | 2 | 0 |
| 13 | t(2;2) ALK-EML4 | LA | 100 | Cell line | See reference ( | OPv2.2 | 1 | 0 |
| 14 | t(2;2) ALK-EML4 | LA | 70 | FFPE | FISH<sup>b</sup> | OPv2.2 | 1 | 0 |
| 15 | t(9;22) BCR-ABL1 | CML | NA | Blood | qRT-PCR | OPv2.2 | 3 | 0 |
| 16 | t(9;22) BCR-ABL1 | ALL | NA | Heme | qRT-PCR | OPv2.1 | 2 | 0 |
| 17 | t(9;22) BCR-ABL1 | CML | NA | Heme | qRT-PCR | OPv2.1 | 1 | 0 |
| 18 | t(9;22) BCR-ABL1 | CML | NA | Heme | qRT-PCR | OPv2.1 | 10 | 6 |
| 19 | t(9;22) BCR-ABL1 | CML | NA | Heme | qRT-PCR | OPv2.1 | 7 | 0 |
| 20 | t(9;22) BCR-ABL1 | CML | NA | Heme | qRT-PCR | OPv2.1 | 6 | 0 |
| 21 | t(9;22) BCR-ABL1 | CML | 50 | Bone Marrow | qRT-PCR | OPv2.2 | 1 | 0 |
| 22 | t(7;16) | LA | 70 | FFPE | PCR | OPv2.1 | 15 | 6 |
| 23 | t(21;22) | Ewing's Sarcoma, PNET | >90 | FFPE | FISH<sup>c</sup> | OPv2.2 | 1 | 0 |
| 24 | t(11;22) | Ewing's Sarcoma | 100 | FFPE | FISH<sup>c</sup> | OPv2.1 | 1 | 0 |
| 25 | t(11;22) | DSRCT | 90 | FFPE | FISH<sup>c</sup> | OPv2.1 | 1 | 0 |
| 26 | t(11;22) | Ewing's Sarcoma, PNET | >90 | FFPE | Karyotype | OPv2.2 | 1 | 0 |
| 27 | t(8;17) | CML | 90 | MeoH Ac-acid fixed | Karyotype, FISH | OPv2.1 | 1 | 0 |
| 28 | t(4;4) | CML | NA | MeoH Ac-acid fixed | FISH | OPv2.1 | 1 | 0 |
| 29 | t(16;21) | AML | 100 | Bone Marrow | Karyotype | OPv2.1 | 1 | 0 |
| 30 | t(14;18) | CAP survey sample | NA | DNA | Qualitative DNA PCR assay | OPv2.1 | 3 | 0 |
| 31 | t(14;18) | FL | NA | Frozen lymph node | Qualitative DNA PCR assay | OPv2.1 | 8 | 0 |
| 32 | t(10;11) | AML | NA | MeoH Ac-acid fixed | FISH | OPv2.1 | 1 | 0 |
| 33 | t(11;17) | AML | NA | MeoH Ac-acid fixed | FISH | OPv2.1 | 1 | 0 |
| 34 | t(6;11) | AML | 50 | Bone Marrow | FISH | OPv2.1 | 1 | 0 |
| 35 | t(9;11) | AML | 100 | Cell line | See reference ( | OPv2.2 | 2 | 0 |
| 36 | t(8;14) | DLBCL | 70 | FFPE | FISH | OPv2.1 | 1 | 0 |
| 37 | t(15;17) | APML/AML M3 | 50 | Bone Marrow | RT-PCR | OPv2.2 | 1 | 0 |
| 38 | t(15;17) | APML/AML M3 | NA | Bone Marrow | Qualitative DNA PCR assay | OPv2.1 | 10 | 6 |
| 39–118 | Normal ‘controls’ | NA | NA | Blood | NA | OPv2.1 | 80 | 0 |
Nt: total number of replicates; Nd: number of dilution replicates; LA: lung adenocarcinoma; AML: acute myeloid leukemia; GIST: gastrointestinal stromal tumor; NSCLC: non-small cell lung cancer; DSRCT: desmoplastic small round cell tumor; CML: chronic myelogenous leukemia; ALL: acute lymphoblastic leukemia; PNET: primitive neuroectodermal tumor; DLBCL: diffuse large B-cell lymphoma; APML: acute promyelocytic leukemia; FL: follicular B-cell lymphoma; OPv2.1: OncoPanel-clinical; OPv2: OncoPanel v2; FFPE: formalin-fixed paraffin-embedded; FF: fresh frozen; FISH: fluorescence in situ hybridization; qRT-PCR: quantitative real-time PCR; NA: not available.
<sup>a</sup> Non-CLIA validated assay.
<sup>b</sup> Vysis LSI ALK Dual Color, Break Apart Rearrangement Probe (Abbott Molecular) at 2p23.
<sup>c</sup> LSI EWSR1 Dual Color, Break Apart Rearrangement Probe (Abbott Molecular) at 22q12.
Figure 2. (A) A circos plot displaying links between gene partners and their genomic locations for the known translocations. (B) BreaKmer analysis results for the 38 cancer specimens and 80 ‘normal’ controls. For the 18 known SV events listed in the table rows, the true-positive (gray rectangle) and false-negative (red rectangle) results are shown for each replicate analyzed with the corresponding SV. The rectangles in the center are spaced to indicate separate samples. Boxplots on the right show the distributions of total read support (black boxplots) and read depth (gray boxplots) at the inferred breakpoints for each of the known variants detected by BreaKmer. (C) A circos plot showing the validated novel translocation partners and their genomic locations identified by BreaKmer.
Figure 3. Plots displaying the relations between sequence read evidence and read depths. (A) A scatterplot showing the relation between the total read support (RS) for the known SV events identified from the BreaKmer analysis and the maximum sequence read depth (RD) observed at the inferred SV breakpoints on the log scale. Each point represents a replicate in which a true-positive call was made by BreaKmer, and the point color corresponds to the known SV of the sample replicate. (B) A scatterplot showing the relation between the quantities of the two types of sequence read evidence identified by BreaKmer for translocations. Each point represents a replicate with a known translocation that BreaKmer properly identified, with the log-transformed number of assembled reads (AS) on the x-axis and the log-transformed number of discordantly mapped read pairs (DR) on the y-axis. (C) Boxplots showing the distributions of the BreaKmer inferred breakpoint read depth (RD, top panel) in relation to the amount of total read support (RS, bottom panel) of the identified known translocations for the four samples with tumor purity dilution replicates.
Counts for the number of true-positive results for all the replicates, listed by the known alterations and four SV detection methods
| Known alteration | Total replicates ND | Total replicates D50 | Total replicates D20 | BreaKmer ND | BreaKmer D50 | BreaKmer D20 | CREST ND | CREST D50 | CREST D20 | Meerkat ND | Meerkat D50 | Meerkat D20 | BreakDancer ND | BreakDancer D50 | BreakDancer D20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | 24 | 3 | 3 | 24 | 3 | 3 | 24 | 3 | 3 | 22 | 3 | 3 | 24 | 3 | 3 |
|  | 15 | 3 | 3 | 13 | 3 | 2 | 13 | 2 | 2 | 13 | 3 | 1 | 10 | 0 | 1 |
|  | 9 | 3 | 3 | 9 | 3 | 3 | 7 | 2 | 0 | 8 | 3 | 3 | 9 | 3 | 1 |
|  | 11 | 0 | 0 | 11 | 0 | 0 | 1 | 0 | 0 | 10 | 0 | 0 | 6 | 0 | 0 |
|  | 5 | 3 | 3 | 5 | 3 | 3 | 5 | 3 | 3 | 5 | 3 | 3 | 5 | 3 | 3 |
|  | 8 | 0 | 0 | 8 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
|  | 2 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 0 |
|  | 2 | 0 | 0 | 2 | 0 | 0 | 2 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
|  | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
|  | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
|  | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
|  | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
|  | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
|  | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
|  | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
|  | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
|  | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
|  | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total replicates | 86 | 12 | 12 | 84 | 12 | 11 | 66 | 10 | 8 | 70 | 12 | 10 | 64 | 9 | 8 |
| Total samples | 38 | 4 | 4 | 37 | 4 | 4 | 30 | 4 | 3 | 27 | 4 | 4 | 26 | 3 | 4 |
ND: non-dilution replicates; D50: dilution replicates with 50% tumor purity; D20: dilution replicates with 20% tumor purity.
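As a worked check on how the totals above relate to a sensitivity figure, the short Python sketch below divides true-positive counts by the number of known events. The numbers are copied from the ‘Total samples’ row (ND columns); the function name is hypothetical, and treating the abstract's 97.4% as a sample-level figure is an assumption, although the BreaKmer result is consistent with it.

```python
def sensitivity(true_positives, total_known):
    """Percentage of known events that were detected."""
    return 100.0 * true_positives / total_known


# Non-dilution (ND) sample totals taken from the 'Total samples' row.
total_samples = 38
detected_samples = {
    "BreaKmer": 37,
    "CREST": 30,
    "Meerkat": 27,
    "BreakDancer": 26,
}

for method, detected in detected_samples.items():
    print(f"{method}: {sensitivity(detected, total_samples):.1f}%")
```

The same function applies to the ‘Total replicates’ row if a replicate-level figure is wanted instead.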
A list of the total number of previously unidentified SV calls made by the SV detection methods for the tumor and non-tumor replicates
| Sample ID(s) | Known alteration | N | BreaKmer T | BreaKmer M(S) | BreaKmer MTG-TRL | CREST T | CREST M(S) | CREST MTG-TRL | Meerkat T | Meerkat M(S) | Meerkat MTG-TRL | BreakDancer T | BreakDancer M(S) | BreakDancer MTG-TRL |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | t(2;2) | 10 | 3 | 0.3(0) | 0 | 1797 | 179.7(99.9) | 96.3 | 826 | 82.6(98.3) | 42.8 | 337 | 33.7(95) | 0.6 |
| 22 | t(7;16) | 9 | 9 | 1(30) | 1.22 | 886 | 98.4(99.5) | 62.78 | 420 | 46.7(96.7) | 22.56 | 264 | 29.3(94.3) | 1.78 |
| 31 | t(14;18) | 8 | 3 | 0.38(0) | 0 | 699 | 87.38(99.9) | 57.75 | 421 | 52.6(97.4) | 22.13 | 1060 | 132.5(97.3) | 0.63 |
| 19 | t(9;22) BCR-ABL1 | 7 | 5 | 0.7(20) | 0 | 991 | 141.6(99.8) | 78 | 542 | 77.4(98.5) | 31 | 589 | 84.1(99) | 0.29 |
| 20 | t(9;22) BCR-ABL1 | 6 | 4 | 0.7(50) | 0 | 360 | 60(1) | 44.5 | 121 | 20.2(95) | 10 | 430 | 71.7(98.4) | 0 |
| 18 | t(9;22) BCR-ABL1 | 4 | 2 | 0.5(50) | 0 | 822 | 205.5(1) | 111.5 | 259 | 64.8(98.5) | 24.5 | 187 | 46.8(98.4) | 0.25 |
| 38 | t(15;17) | 4 | 2 | 0.5(0) | 0 | 277 | 69.3(1) | 48 | 135 | 33.8(95.6) | 13 | 349 | 87.25(98.6) | 1.25 |
| 15 | t(9;22) BCR-ABL1 | 3 | 3 | 1(0) | 0 | 513 | 171(99.8) | 105.3 | 334 | 111.3(97.6) | 51.33 | 1358 | 452.67(97.9) | 3 |
| 30 | t(14;18) | 3 | 2 | 0.7(0) | 0 | 296 | 98.7(1) | 61 | 173 | 57.7(98.8) | 20 | 416 | 138.7(98) | 0 |
| 12 | t(2;2) | 2 | 6 | 3 | 0 | 207 | 103.5 | 72.5 | 156 | 78 | 36.5 | 437 | 218.5 | 5.5 |
| 16 | t(9;22) BCR-ABL1 | 2 | 2 | 1 | 0 | 161 | 80.5 | 54 | 104 | 52 | 17 | 555 | 277.5 | 1.5 |
| 35 | t(9;11) | 2 | 5 | 3 | 1 | 538 | 269 | 142.5 | 336 | 168 | 75.5 | 1601 | 800.5 | 6.5 |
| 1 | FLT3 indel | 1 | 2 | 2 | 0 | 86 | 86 | 70 | 27 | 27 | 8 | 31 | 31.0 | 1 |
| 2 | FLT3 indel | 1 | 2 | 2 | 0 | 74 | 74 | 45 | 32 | 32 | 10 | 118 | 118.0 | 0 |
| 3 | FLT3 indel | 1 | 2 | 2 | 0 | 93 | 93 | 58 | 30 | 30 | 12 | 25 | 25.0 | 0 |
| 4 | FLT3 indel | 1 | 2 | 2 | 0 | 275 | 275 | 129 | 79 | 79 | 30 | 81 | 81.0 | 0 |
| 5 | FLT3 indel | 1 | 2 | 2 | 0 | 83 | 83 | 58 | 30 | 30 | 13 | 45 | 45.0 | 1 |
| 6 | FLT3 indel | 1 | 2 | 2 | 0 | 79 | 79 | 53 | 26 | 26 | 11 | 27 | 27.0 | 1 |
| 7 | FLT3 indel | 1 | 3 | 3 | 0 | 160 | 160 | 101 | 38 | 38 | 15 | 100 | 100.0 | 0 |
| 8 | FLT3 indel | 1 | 1 | 1 | 0 | 87 | 87 | 65 | 44 | 44 | 16 | 17 | 17.0 | 0 |
| 9 | KIT deletion | 1 | 1 | 1 | 0 | 59 | 59 | 43 | 19 | 19 | 5 | 10 | 10.0 | 0 |
| 11 | t(2;2) | 1 | 3 | 3 | 0 | 132 | 132 | 75 | 99 | 99 | 28 | 128 | 128.0 | 1 |
| 13 | t(2;2) ALK-EML4 | 1 | 4 | 4 | 0 | 220 | 220 | 138 | 151 | 151 | 70 | 1558 | 1558.0 | 5 |
| 14 | t(2;2) ALK-EML4 | 1 | 5 | 5 | 0 | 758 | 758 | 266 | 429 | 429 | 142 | 104 | 104.0 | 3 |
| 17 | t(9;22) BCR-ABL1 | 1 | 3 | 3 | 0 | 86 | 86 | 58 | 66 | 66 | 27 | 135 | 135.0 | 0 |
| 21 | t(9;22) BCR-ABL1 | 1 | 3 | 3 | 0 | 134 | 134 | 92 | 32 | 32 | 5 | 61 | 61.0 | 3 |
| 23 | t(21;22) | 1 | 3 | 3 | 0 | 1034 | 1034 | 155 | 2733 | 2733 | 287 | 373 | 373.0 | 1 |
| 24 | t(11;22) | 1 | 2 | 2 | 1 | 1583 | 1583 | 208 | 884 | 884 | 127 | 99 | 99.0 | 3 |
| 25 | t(11;22) | 1 | 2 | 2 | 0 | 481 | 481 | 161 | 186 | 186 | 70 | 192 | 192.0 | 0 |
| 26 | t(11;22) | 1 | 3 | 3 | 1 | 3506 | 3506 | 407 | 2140 | 2140 | 346 | 266 | 266.0 | 6 |
| 27 | t(8;17) | 1 | 2 | 2 | 0 | 1973 | 1973 | 249 | 1006 | 1006 | 185 | 137 | 137.0 | 10 |
| 28 | t(4;4) | 1 | 3 | 3 | 0 | 207 | 207 | 108 | 107 | 107 | 39 | 14 | 14.0 | 1 |
| 29 | t(16;21) | 1 | 2 | 2 | 0 | 38 | 38 | 33 | 17 | 17 | 4 | 29 | 29.0 | 0 |
| 32 | t(10;11) | 1 | 5 | 5 | 1 | 280 | 280 | 138 | 110 | 110 | 43 | 36 | 36.0 | 1 |
| 33 | t(11;17) | 1 | 2 | 2 | 0 | 409 | 409 | 159 | 85 | 85 | 42 | 11 | 11.0 | 1 |
| 34 | t(6;11) | 1 | 2 | 2 | 0 | 72 | 72 | 51 | 45 | 45 | 11 | 16 | 16.0 | 1 |
| 36 | t(8;14) | 1 | 3 | 3 | 0 | 70 | 70 | 37 | 50 | 50 | 14 | 15 | 15.0 | 0 |
| 37 | t(15;17) | 1 | 4 | 4 | 0 | 137 | 137 | 99 | 99 | 99 | 49 | 73 | 73.0 | 3 |
| 39–118 | - | 80 | 151 | 1.9 | 0.025 | 3706 | 46.33 | 32.69 | 2204 | 27.55 | 8.9875 | 3217 | 40.21 | 0.44 |
N: number of replicates/samples; T: total number of ‘additional calls’; M: average number of ‘additional’ calls made per replicate/sample; S: percentage of ‘additional’ calls made that were uniquely called by a single replicate; MTG-TRL: average number of target genes involved in ‘additional’ translocation calls made per replicate/sample.
For samples with more than three replicates, the average per replicate/sample is calculated as well as the percentage of unique calls among the sample replicates.
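The per-replicate average M defined in the footnote is the total number of ‘additional’ calls T divided by the number of replicates N. The Python sketch below reproduces two M values for sample 10, using T and N as listed in the table; the function name is hypothetical, and rounding to a fixed number of decimals is an assumption about how the table values were prepared.

```python
def average_calls_per_replicate(total_calls, n_replicates, digits=2):
    """M = T / N, rounded for display."""
    return round(total_calls / n_replicates, digits)


# Sample 10 has N = 10 replicates in the table above.
breakmer_m = average_calls_per_replicate(3, 10)     # BreaKmer: T = 3
crest_m = average_calls_per_replicate(1797, 10)     # CREST: T = 1797
print(breakmer_m, crest_m)
```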