Ken Chen, John W Wallis, Michael D McLellan, David E Larson, Joelle M Kalicki, Craig S Pohl, Sean D McGrath, Michael C Wendl, Qunyuan Zhang, Devin P Locke, Xiaoqi Shi, Robert S Fulton, Timothy J Ley, Richard K Wilson, Li Ding, Elaine R Mardis.
Abstract
Detection and characterization of genomic structural variation are important for understanding the landscape of genetic variation in human populations and in complex diseases such as cancer. Recent studies demonstrate the feasibility of detecting structural variation using next-generation, short-insert, paired-end sequencing reads. However, the utility of these reads is not entirely clear, nor are the analysis methods with which accurate detection can be achieved. The algorithm BreakDancer predicts a wide variety of structural variants including insertion-deletions (indels), inversions and translocations. We examined BreakDancer's performance in simulation, in comparison with other methods and in analyses of a sample from an individual with acute myeloid leukemia and of samples from the 1,000 Genomes trio individuals. BreakDancer sensitively and accurately detected indels ranging from 10 base pairs to 1 megabase pair that are difficult to detect via a single conventional approach.
Year: 2009 PMID: 19668202 PMCID: PMC3661775 DOI: 10.1038/nmeth.1363
Source DB: PubMed Journal: Nat Methods ISSN: 1548-7091 Impact factor: 28.547
Figure 1. Overview of BreakDancer algorithm. (a) The workflow. (b) Five types of anomalous read pairs recognized by BreakDancerMax. A pair of arrows represents the location and the orientation of a read pair. A dotted line represents a chromosome in the subject genome. A solid line represents a chromosome in the reference genome.
Figure 2. Performance of BreakDancer in simulation. TPR and FPR of BreakDancerMax (BDMax) at the confidence threshold of Q ≥ 30 are shown. TPR analytic refers to the percentage of variants that can hypothetically be detected by BDMax under an analytic model (Online Methods). TPR detectable is the percentage of variants whose flanking regions (300 bp both to the left and to the right) contain 2 or more confidently mapped ARPs in the MAQ alignment. The performance of BreakDancerMini (BDMini) is characterized by its TPR and FPR. The combined performance (BD all) is obtained by merging the results of these two programs.
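The "TPR detectable" criterion in the caption above can be illustrated with a small sketch. The function name, its arguments and the choice to pool both flanks into a single count are assumptions made for illustration; the caption does not say whether the two ARPs must fall in one flank or may be split across both, and this is not BreakDancer's own code.

```python
def is_detectable(sv_start, sv_end, arp_positions, flank=300, min_arps=2):
    """Return True if enough anomalous read pairs (ARPs) map near a variant.

    sv_start, sv_end : variant coordinates on the reference (hypothetical inputs)
    arp_positions    : mapped positions of confidently aligned ARPs
    flank            : size of each flanking window in bp (300 in the caption)
    min_arps         : minimum number of ARPs required (2 in the caption)

    Assumption: ARPs from the left and right flank are counted together.
    """
    count = 0
    for pos in arp_positions:
        in_left = sv_start - flank <= pos < sv_start
        in_right = sv_end < pos <= sv_end + flank
        if in_left or in_right:
            count += 1
    return count >= min_arps


def detectable_fraction(variants, arp_positions):
    """Fraction of (start, end) variants that pass is_detectable."""
    if not variants:
        return 0.0
    hits = sum(is_detectable(s, e, arp_positions) for s, e in variants)
    return hits / len(variants)
```

Under these assumptions, a variant at 1000 to 2000 with ARPs at 800 and 2100 counts as detectable, while one with a single nearby ARP does not.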
Comparison of BreakDancer with other tools. Structural variants predicted by BreakDancer on the Yoruban (NA18507) sample were compared to sets of variants discovered by alternative approaches[14,21]. ESP refers to large structural variants found by analyzing discordant fosmid clone-end alignments; DIP refers to small deletion/insertion polymorphisms found as gaps in the paired alignment between the fosmid end sequences and the reference. MPSV weighted, MPSV unweighted and Probabilistic refer to sets of SVs predicted by VariationHunter[24], and MoDIL refers to the set predicted by MoDIL[25]. Call sets for these tools were downloaded from http://compbio.cs.sfu.ca/strvar.htm and http://compbio.cs.toronto.edu/modil/. The dbSNP v129 set refers to indels that are 10 bp or longer in dbSNP version 129. The BGI set refers to intra-contig indels of 10 bp or longer produced by Beijing Genome Institute through whole-genome de novo assembly of the same sample. The Strict* criteria require the intersection between the validated and the predicted variants to cover at least 50% of the length of the union of the two intervals, or the predicted variant to be entirely encompassed by the fosmid interval. In each cell, the number before the slash (/) is the count of overlapping variants and the number after it is the total number of predictions in the corresponding category.
| Type | Deletion | Deletion | Deletion | Deletion | Deletion | Insertion | Insertion | Insertion | Inversion |
|---|---|---|---|---|---|---|---|---|---|
| Method | ESP | DIP | Assembly | Assembly | ESP | DIP | Assembly | Assembly | ESP |
| | | | dbSNP v129 | BGI | | | dbSNP v129 | BGI | |
| | | | >=10bp | >=10bp | | | >=10bp | >=10bp | |
| | 92 | 116,395 | 82,956 | 107,760 | 5,704 | 107,458 | 82,956 | 41,134 | 13 |
| | strict* | 1bp | 1bp | 1bp | 1bp | 1bp | 1bp | 1bp | 1bp |
| | 55/9,202 | 955/9,202 | 2,039/9,202 | 3,123/9,202 | 5,015/9,202 | 339/4,901 | 903/4,901 | 827/4,901 | 2/665 |
| | 21/21,433 | 4,528/21,433 | 7,379/21,433 | 9,344/21,433 | 1,598/21,433 | 2,876/17,029 | 5,083/17,029 | 3,878/17,029 | N/A |
| | 59/27,092 | 4,970/27,092 | 7,998/27,092 | 10,792/27,092 | 5,064/27,092 | 2,983/19,305 | 5,336/19,305 | 4,104/19,305 | 2/655 |
| | 57/8,959 | 711/8,959 | 1,332/8,959 | 2,246/8,959 | 4,819/8,959 | 121/5,575 | 192/5,575 | 192/5,575 | 2/504 |
| | 55/7,599 | 588/7,599 | 1,022/7,599 | 1,835/7,599 | 4,537/7,599 | 70/3,772 | 88/3,772 | 93/3,772 | 4/433 |
| | 58/8,537 | 703/8,537 | 1,217/8,537 | 2,061/8,537 | 4,703/8,537 | 100/7,142 | 124/7,142 | 131/7,142 | 1/181 |
| | 20/13,147 | 622/13,147 | 967/13,147 | 1,162/13,147 | 540/13,147 | 282/3,981 | 687/3,981 | 571/3,981 | N/A |
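The two overlap criteria named in the table (strict* and 1bp) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and the half-open `(start, end)` interval convention are assumptions, and reading "1bp" as "the two intervals share at least one base" is also an assumption, since the caption does not define it.

```python
def overlaps_1bp(pred, known):
    """True if two half-open (start, end) intervals share at least 1 bp.

    Assumed reading of the "1bp" criterion; both intervals are taken to
    lie on the same chromosome.
    """
    return min(pred[1], known[1]) > max(pred[0], known[0])


def strict_match(pred, known):
    """Strict* criterion as described in the table caption.

    A predicted variant matches a validated one if either
      - intersection length / union length >= 0.5, or
      - the prediction lies entirely inside the validated (fosmid) interval.
    """
    if not overlaps_1bp(pred, known):
        return False
    inter = min(pred[1], known[1]) - max(pred[0], known[0])
    # For overlapping intervals the union is one contiguous span.
    union = max(pred[1], known[1]) - min(pred[0], known[0])
    encompassed = known[0] <= pred[0] and pred[1] <= known[1]
    return inter / union >= 0.5 or encompassed


def count_matches(predictions, validated, criterion=strict_match):
    """Number of predictions matching at least one validated variant."""
    return sum(any(criterion(p, k) for k in validated) for p in predictions)
```

For example, a prediction at (100, 200) against a validated interval (120, 210) has an intersection of 80 bp and a union of 110 bp, so it passes the strict test; against (150, 250) the ratio is 50/150 and it fails, although it still passes the 1 bp test.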
Figure 3. Size distribution of deletions detected in an AML genome. BreakDancerMax detected 3,170 deletions from the sequence data, ranging from 58 bp to 959,498 bp. Two signature peaks, at 300 bp and at 6,000 bp, correspond respectively to the AluY and the L1Hs retrotransposons. In comparison, only 116 inherited CNVs were detected on this sample using the Affymetrix 6.0 array.