Roye Rozov, Aya Brown Kav, David Bogumil, Naama Shterzer, Eran Halperin, Itzhak Mizrahi, Ron Shamir.
Abstract
Motivation: Plasmids and other mobile elements are central contributors to microbial evolution and genome innovation. Recently, they have been found to have important roles in antibiotic resistance and in affecting production of metabolites used in industrial and agricultural applications. However, their characterization through deep sequencing remains challenging, in spite of rapid drops in cost and throughput increases for sequencing. Here, we attempt to ameliorate this situation by introducing a new circular element assembly algorithm, leveraging assembly graphs provided by a conventional de novo assembler and alignments of paired-end reads to assemble cyclic sequences likely to be plasmids, phages and other circular elements.
Year: 2017 PMID: 28003256 PMCID: PMC5408804 DOI: 10.1093/bioinformatics/btw651
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Recycler workflow. An example is shown of generating candidate cycles and peeling off cycles iteratively. For simplicity, all lengths are assumed to be equal and are not shown. Here, we consider only candidate cycles that pass through vertex x, but ordinarily such candidates would be generated for each vertex in the component, and the cycle with the lowest coefficient of variation (CV) of coverage would be chosen and peeled off. (A) The assembly graph. (B) A single component is selected from the assembly graph (framed in A) and represented with vertices for contigs and edges for connecting k-mers. (C) The reduced component after tip removal. The numbers next to vertices are their observed contig coverage. Since vertex x has two incoming edges from vertices b and c, two candidate cycles are generated that pass through edges (b, x) and (c, x), respectively. This is done by computing shortest paths from x to b and from x to c. Two successive steps of peeling cycles are shown with their respective latent coverage assignments. First, the cycle in D is peeled off because the CV calculated from initially observed coverage is lowest for this cycle. Uncolored vertices correspond to contigs with zero coverage that are removed.
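The candidate-generation and peeling steps described in the Fig. 1 caption can be illustrated with a minimal Python sketch. It is not the Recycler implementation. The function names (`shortest_path`, `candidate_cycles`, `cv`, `best_cycle`, `peel`) and the toy graph are hypothetical. The sketch assumes equal contig lengths, as in the figure, so shortest paths are found by breadth-first search, and it assumes that peeling subtracts the cycle's mean coverage and drops vertices whose coverage falls to zero or below.

```python
from collections import deque
from statistics import mean, pstdev


def shortest_path(graph, source, target):
    """Fewest-hop path from source to target (BFS), or None if unreachable."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for succ in graph.get(node, ()):
            if succ not in parents:
                parents[succ] = node
                queue.append(succ)
    return None


def candidate_cycles(graph, x):
    """One candidate per incoming edge (p, x): shortest path x -> p, closed by (p, x)."""
    cycles = []
    for p, succs in graph.items():
        if x in succs:
            path = shortest_path(graph, x, p)
            if path is not None:
                cycles.append(path)
    return cycles


def cv(cycle, coverage):
    """Coefficient of variation of coverage over the cycle's vertices."""
    values = [coverage[v] for v in cycle]
    return pstdev(values) / mean(values)


def best_cycle(graph, coverage):
    """Lowest-CV candidate over all vertices, or None if the graph has no cycle."""
    cycles = [c for x in graph for c in candidate_cycles(graph, x)]
    return min(cycles, key=lambda c: cv(c, coverage), default=None)


def peel(graph, coverage, cycle):
    """Assumed peeling rule: subtract the cycle's mean coverage from its vertices
    (in place) and return the graph without vertices whose coverage is <= 0."""
    m = mean(coverage[v] for v in cycle)
    for v in cycle:
        coverage[v] -= m
    dead = {v for v in cycle if coverage[v] <= 0}
    for v in dead:
        del coverage[v]
    return {u: [w for w in succs if w not in dead]
            for u, succs in graph.items() if u not in dead}


# Toy component: x has incoming edges from b and c, as in the caption.
graph = {"x": ["d"], "d": ["b", "c"], "b": ["x"], "c": ["x"]}
coverage = {"x": 30.0, "d": 30.0, "b": 10.0, "c": 20.0}

chosen = best_cycle(graph, coverage)       # the cycle through c has the lower CV
graph = peel(graph, coverage, chosen)      # c is exhausted and removed
```

In this toy case the cycle through c is peeled first because its coverage values (30, 30, 20) vary less than those of the cycle through b (30, 30, 10). A fuller version would repeat `best_cycle` and `peel` until no candidate remains or the lowest CV exceeds a chosen threshold.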
Fig. 2. Methods performance on simulated data. Results are shown for SPAdes without repeat resolution (RR), SPAdes with repeat resolution, the method of Jørgensen et al., and Recycler. The contigs of SPAdes before RR were used as input for the three other methods. Recycler also relied on the graph produced at this stage. F1 score calculation is described in the main text. The x axis shows the number of simulated reference sequences in each case.
Fig. 3. PCR-based validation of Recycler’s plasmid predictions. Coverage bins: high, 60–1000x; med–high, 15–60x; med–low, 5–15x; low, 1–5x.