Ken Chen, Lei Chen, Xian Fan, John Wallis, Li Ding, George Weinstock.
Abstract
Recent progress in next-generation sequencing has greatly facilitated our study of genomic structural variation. Unlike single nucleotide variants and small indels, many structural variants have not been completely characterized at nucleotide resolution. Deriving the complete sequences underlying such breakpoints is crucial not only for accurate discovery but also for the functional characterization of altered alleles. However, our current ability to determine such breakpoint sequences is limited because of challenges in aligning and assembling short reads. To address this issue, we developed a targeted iterative graph routing assembler, TIGRA, which implements a set of novel data analysis routines to achieve effective breakpoint assembly from next-generation sequencing data. In our assessment using data from the 1000 Genomes Project, TIGRA was able to accurately assemble the majority of deletion and mobile element insertion breakpoints, with a substantively better success rate and accuracy than other algorithms. TIGRA has been applied in the 1000 Genomes Project and other projects and is freely available for academic use.
Year: 2013; PMID: 24307552; PMCID: PMC3912421; DOI: 10.1101/gr.162883.113
Source DB: PubMed; Journal: Genome Res; ISSN: 1088-9051; Impact factor: 9.043
Figure 1. Schematic view of TIGRA. (A) Reads (arrow-shaped boxes) at a breakpoint (vertical dashed line in the center), including those normally mapped (gray), mate-unmapped (gray with red outline), soft-clipped (multicolored), and interchromosomally mapped (colored), are extracted from BAM files and sent to the assembly algorithm. (B) A de Bruijn graph is constructed using an iterative multiple-k-mer assembly algorithm. Each contig (oval indexed node) has a specified length and average k-mer coverage (x). A contig is connected to another contig by an edge if the two overlap by k-1 bp; each edge has a particular orientation (arrow) and a particular coverage (weight). In this example, a mobile element insertion (of C2) with homology regions (C1) is successfully assembled. Two contig strings are decoded from the graph by TIGRA, representing two alternative alleles.
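The graph construction in panel (B) can be illustrated with a minimal Python sketch. This is not TIGRA's implementation: it uses a single fixed k rather than the iterative multiple-k-mer scheme, ignores reverse complements and edge orientation, and works on raw k-mers rather than contigs. The function names `build_de_bruijn` and `extend_unambiguous` and the toy reads are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a single-strand de Bruijn graph from reads.

    Nodes are (k-1)-mers. Each k-mer in a read adds an edge from its
    prefix to its suffix, so adjacent nodes overlap by k-2 bases and
    adjacent k-mers overlap by k-1 bases. The edge weight counts how
    often that k-mer was seen (a crude stand-in for coverage).
    """
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph


def extend_unambiguous(graph, start):
    """Spell out a sequence by walking from `start` while the current
    node has exactly one successor. Stops at a branch, a dead end, or
    a node that was already visited (to avoid looping forever)."""
    seq = start
    node = start
    seen = {start}
    while len(graph.get(node, {})) == 1:
        (nxt,) = graph[node].keys()
        if nxt in seen:
            break
        seq += nxt[-1]
        seen.add(nxt)
        node = nxt
    return seq


# Two toy reads that overlap by five bases (GGCGT).
toy_reads = ["ATGGCGT", "GGCGTGC"]
toy_graph = build_de_bruijn(toy_reads, k=4)
print(extend_unambiguous(toy_graph, "ATG"))  # prints ATGGCGTGC
```

In this toy case the two reads merge into one unbranched path. At a real breakpoint the graph would typically branch, and alternative paths through it would correspond to alternative alleles, as in the two contig strings described in the caption.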
Comparison of deletion breakpoint assembly using low-coverage population sequencing data from the 1000 Genomes Project based on a set of 245 known breakpoints in 45 CEU pilot samples (A) and a set of 562 known breakpoints in eight phase 3 samples (B)
Figure 2. Comparison of assembly success rate at various allele frequencies in 45 CEU samples. Six assemblers are plotted: TIGRA (purple), Velvet (blue), SGA (cyan), SGA.all (yellow), Phrap (red), and SPAdes (brown). Allele frequencies (x-axis) are derived from the deletion genotypes released by The 1000 Genomes Project Consortium, and the fraction of success (y-axis) is estimated from 245 control deletion sites.
Comparison of assembler accuracy based on mobile element subfamily classification of 158 mobile element insertion breakpoints in NA12878