Nicholas Stoler, Barbara Arbeithuber, Wilfried Guiblet, Kateryna D Makova, Anton Nekrutenko.
Abstract
Duplex sequencing was originally developed to detect rare nucleotide polymorphisms that are normally obscured by the noise of high-throughput sequencing. Here we describe a new, streamlined, reference-free approach for the analysis of duplex sequencing data. We show that the approach performs well on simulated data and precisely reproduces previously published results, and we apply it to a newly produced dataset, enabling us to type low-frequency variants in human mitochondrial DNA. Finally, we provide all necessary tools as stand-alone components and integrate them into the Galaxy platform. All analyses performed in this manuscript can be repeated exactly as described at http://usegalaxy.org/duplex.
Keywords: Duplex sequencing; Genomic data analysis; Low frequency polymorphism discovery; Next generation sequencing
Year: 2016 PMID: 27566673 PMCID: PMC5000403 DOI: 10.1186/s13059-016-1039-4
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 The relationship between the minor allele frequency (MAF) threshold (x-axis) and the total number of variable sites (y-axis) detected by [7]. Lowering the MAF threshold leads to an exponential increase in the number of variable positions. The image was generated by applying a range of MAF thresholds to data from 156 human samples and plotting the average number of variable sites at each threshold. The line thickness corresponds to the 95% confidence interval around the mean value.
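The thresholding procedure described in the Fig. 1 caption can be sketched as follows. This is a minimal illustration, not the code used to produce the figure: the input layout (one list of per-site minor allele frequencies per sample), the function names, and the toy values are all hypothetical, and the confidence band is omitted.

```python
from statistics import mean


def count_variable_sites(site_mafs, threshold):
    """Number of sites whose minor allele frequency is at or above `threshold`."""
    return sum(1 for maf in site_mafs if maf >= threshold)


def mean_sites_per_threshold(samples, thresholds):
    """Average the variable-site count over all samples for each threshold.

    `samples` is a hypothetical input: one list of per-site MAFs per sample.
    """
    return {t: mean(count_variable_sites(s, t) for s in samples) for t in thresholds}


# Toy MAF values for two samples (illustrative only, not data from the paper).
samples = [
    [0.001, 0.004, 0.02, 0.3],
    [0.002, 0.01, 0.05],
]
curve = mean_sites_per_threshold(samples, [0.001, 0.01, 0.1])
print(curve)
```

Because every site that passes a high threshold also passes every lower one, the count can only grow as the threshold drops, which is the monotone trend the figure plots.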
Fig. 2 The Du Novo approach. First, reads tagged with identical barcodes are grouped into strand-specific families. Reads within each family are aligned, and single-stranded consensus sequences (SSCSs) are generated. Finally, the SSCSs are reduced into duplex consensus sequences (DCSs).
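The Fig. 2 steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Du Novo implementation: it assumes the reads in each family are already aligned to equal length (the alignment step itself is skipped), builds each SSCS by a column-wise majority vote with a hypothetical agreement threshold `min_fraction`, and assumes the two strands' SSCSs are already in the same orientation before they are compared. The tags `"ab"` and `"ba"` are toy stand-ins for the two strand-specific families of one molecule.

```python
from collections import Counter, defaultdict


def group_families(reads):
    """Group (tag, sequence) pairs into strand-specific families keyed by tag."""
    families = defaultdict(list)
    for tag, seq in reads:
        families[tag].append(seq)
    return families


def single_strand_consensus(seqs, min_fraction=0.7):
    """Column-wise majority vote over reads ASSUMED to be pre-aligned and equal length.

    `min_fraction` is a hypothetical parameter: a column whose most common
    base falls below this fraction of reads is masked as 'N'.
    """
    consensus = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)


def duplex_consensus(sscs_a, sscs_b):
    """Keep bases on which both strand consensuses agree; mask the rest as 'N'.

    Assumes both SSCSs are already in the same orientation.
    """
    return "".join(a if a == b else "N" for a, b in zip(sscs_a, sscs_b))


# Toy reads: each family contains one outlier read (illustrative only).
reads = [
    ("ab", "ACGTA"), ("ab", "ACGTA"), ("ab", "ACGTA"), ("ab", "ACCTA"),
    ("ba", "ACGTT"), ("ba", "ACGTT"), ("ba", "ACGTT"), ("ba", "ACGTA"),
]
families = group_families(reads)
sscs = {tag: single_strand_consensus(seqs) for tag, seqs in families.items()}
dcs = duplex_consensus(sscs["ab"], sscs["ba"])
print(sscs, dcs)
```

The outlier base within each family is outvoted at the SSCS stage, while the last position, where the two strand consensuses disagree, is masked at the DCS stage: a base is trusted only when both independently derived strand consensuses agree.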
Characteristics of ABL1 and SC8 duplex sequencing experiments
| Number of | ABL1 | SC8 |
|---|---|---|
| Read pairs | 6,921,891 | 17,385,100 |
| Unique tags | 1,467,768 | 2,100,705 |
| Unique αβ configurations | 748,411 | 1,148,444 |
| Unique αβ configurations with 1 read pair | 677,069 | 884,295 |
| Unique αβ configurations with ≥3 read pairs | 60,333 | 222,823 |
| Unique βα configurations | 743,669 | 1,092,748 |
| Unique βα configurations with 1 read pair | 672,946 | 832,875 |
| Unique βα configurations with ≥3 read pairs | 60,032 | 140,486 |
| Unique αββα | 24,313 | 140,485 |
| Unique αββα with ≥3 read pairs on both strands | 20,746 | 109,999 |
| Reads within αββα families with ≥3 read pairs on both strands | 2,156,105 | 8,636,692 |
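The table distinguishes the two orders in which a molecule's tag halves (α and β) can appear in a read pair, and counts αββα families in which both orders are observed. One way to derive counts of this kind is sketched below, under assumptions: each read pair is reduced to a tuple of its two tag halves, the halves are sorted to obtain a strand-independent family key, and the original order is recorded as `"ab"` or `"ba"`. The function names, the `min_reads` parameter, and the toy tags are hypothetical, and the sketch does not reproduce every row of the table.

```python
from collections import Counter


def strand_key(alpha, beta):
    """Canonical family key (sorted tag halves) plus the observed order.

    Sorting the halves is an ASSUMED canonicalization rule for this sketch.
    """
    if alpha <= beta:
        return alpha + beta, "ab"
    return beta + alpha, "ba"


def summarize(tag_pairs, min_reads=3):
    """Count orientations per canonical tag and summarize duplex (αββα) families."""
    counts = Counter(strand_key(a, b) for a, b in tag_pairs)
    tags = {tag for tag, _ in counts}
    # A duplex family has read pairs in both orders; Counter returns 0 for missing keys.
    duplex = [t for t in tags if counts[(t, "ab")] and counts[(t, "ba")]]
    passing = [t for t in duplex
               if counts[(t, "ab")] >= min_reads and counts[(t, "ba")] >= min_reads]
    return {
        "ab_configurations": sum(1 for t in tags if counts[(t, "ab")]),
        "ba_configurations": sum(1 for t in tags if counts[(t, "ba")]),
        "duplex_families": len(duplex),
        "duplex_families_passing": len(passing),
    }


# Toy tag halves (illustrative only): one well-supported duplex, one sparse
# duplex, and one family observed on a single strand.
pairs = ([("AAA", "TTT")] * 3 + [("TTT", "AAA")] * 3
         + [("CCC", "GGG")] * 2 + [("GGG", "CCC")]
         + [("ACA", "TGT")])
stats = summarize(pairs)
print(stats)
```

Here the `min_reads` filter plays the role of the table's "≥3 read pairs on both strands" criterion: the sparse duplex counts as an αββα family but does not pass it.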
Fig. 3 Distribution of family sizes (number of reads per family) supporting A and G alleles on both strands (plus and minus) for (a) site 130,872,141 in the ABL1 dataset and (b) site 13,708 in the SC8 dataset.
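Fig. 3 summarizes, for a single site, the sizes of the families that support each allele on each strand. A minimal tally of that kind is sketched below; it assumes a hypothetical input of one `(allele, strand, family_size)` record per family overlapping the site, and the records shown are toy values, not data from either dataset.

```python
from collections import Counter, defaultdict


def family_size_distribution(records):
    """Tally family sizes for each (allele, strand) combination.

    `records` is a hypothetical input: one (allele, strand, family_size)
    tuple per family covering the site of interest.
    """
    dist = defaultdict(Counter)
    for allele, strand, size in records:
        dist[(allele, strand)][size] += 1
    return dist


# Toy records for one site (illustrative only).
records = [
    ("A", "+", 3), ("A", "+", 3), ("A", "-", 5),
    ("G", "+", 4), ("G", "-", 4),
]
dist = family_size_distribution(records)
for (allele, strand), sizes in sorted(dist.items()):
    print(allele, strand, dict(sizes))
```

Keeping the strands separate makes it easy to check whether a minor allele is supported on both the plus and minus strands, which is the property duplex consensus relies on.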
Fig. 4 A complete workflow implementing the Du Novo approach to variant discovery from duplex sequence data. It is accessible from http://usegalaxy.org/duplex