Martin Hunt, Nishadi De Silva, Thomas D Otto, Julian Parkhill, Jacqueline A Keane, Simon R Harris.
Abstract
The assembly of DNA sequence data is undergoing a renaissance thanks to emerging technologies capable of producing reads tens of kilobases long. Assembling complete bacterial and small eukaryotic genomes is now possible, but the final step of circularizing sequences remains unsolved. Here we present Circlator, the first tool to automate assembly circularization and produce accurate linear representations of circular sequences. Using Pacific Biosciences and Oxford Nanopore data, Circlator correctly circularized 26 of 27 circularizable sequences, comprising 11 chromosomes and 12 plasmids from bacteria, the apicoplast and mitochondrion of Plasmodium falciparum and a human mitochondrion. Circlator is available at http://sanger-pathogens.github.io/circlator/.
Year: 2015 PMID: 26714481 PMCID: PMC4699355 DOI: 10.1186/s13059-015-0849-0
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Typical issues in contigs produced by long-read assemblers representing circular sequences. In each example, the assembly is in a single contig, colored with a mix of green and blue, and the reference is shown in gray. Matches between the reference and assembly are shown in light blue. The plot below each reference sequence shows the number of matches to the assembly at each position of the reference sequence. a The contig has low-quality ends representing the same sequence, which needs resolving into one sequence. b The contig has missing sequence. c A small circular sequence is assembled into multiple tandem copies
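The redundancy in panels a and c can be illustrated with a toy example. The sketch below is not Circlator's method: it assumes error-free sequence and exact string matching, whereas real contig ends are low quality and need alignment-based comparison. The function names `end_overlap_length` and `trim_end_overlap` and the `min_overlap` parameter are hypothetical, chosen only for illustration.

```python
def end_overlap_length(contig: str, min_overlap: int = 3) -> int:
    """Length of the longest proper suffix of `contig` that equals its prefix.

    Toy assumption: the two ends are identical base for base. Returns 0 if
    no overlap of at least `min_overlap` bases exists.
    """
    longest = 0
    # k is a candidate overlap length; k == len(contig) is excluded because
    # the whole contig trivially matches itself.
    for k in range(min_overlap, len(contig)):
        if contig[-k:] == contig[:k]:
            longest = k
    return longest


def trim_end_overlap(contig: str, min_overlap: int = 3) -> str:
    """Drop the redundant copy of the overlap from the end of the contig."""
    k = end_overlap_length(contig, min_overlap)
    return contig[: len(contig) - k] if k else contig


circle = "ACGTTGCA"                  # one full turn of a toy circular molecule
overlapping = circle + circle[:4]    # ends represent the same sequence (panel a)
tandem = circle * 2                  # two tandem copies (panel c)

trimmed_overlap = trim_end_overlap(overlapping)
trimmed_tandem = trim_end_overlap(tandem)
```

In this idealized setting both the overlapping contig and the tandem duplicate reduce to a single copy of the circle. The case in panel b, missing sequence, cannot be repaired from the contig alone.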
Summary of results on 14 bacterial genome assemblies
| NCTC ID | Ref contigs | HGAP contigs | Circularizable contigs^a,b: BLAST | Circularizable contigs^a,b: Circlator | Circularizable contigs^a,b: Minimus2 | Correctly circularized^a: BLAST | Correctly circularized^a: Circlator | Correctly circularized^a: Minimus2 | Errors^c: BLAST | Errors^c: Circlator | Errors^c: Minimus2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| NCTC3610 | 1 | 19 | 1 (0,1) | 1 (0,1) | 1 (0,1) | 1 (0,1) | 1 (0,1) | 1 (0,1) | 5 | 1 | 2 |
| NCTC13307 | 2 | 11 | 0 (0,0) | 0 (0,0) | 0 (0,0) | 0 (0,0) | 0 (0,0) | 0 (0,0) | 0 | 0 | 0 |
| NCTC13360 | 7 | 9 | 8 (1,7) | 8 (1,7) | 8 (1,7) | 7 (1,6) | 8 (1,7) | 4 (1,3) | 0 | 0 | 4 |
| NCTC10005 | 3 | 8 | 1 (0,1) | 2 (1,1) | 2 (1,1) | 1 (0,1) | 1 (0,1) | 1 (0,1) | 0 | 0 | 0 |
| NCTC13348 | 2 | 7 | 1 (0,1) | 1 (0,1) | 2 (1,1) | 1 (0,1) | 1 (0,1) | 1 (0,1) | 2 | 0 | 1 |
| NCTC10963 | 2 | 4 | 1 (1,0) | 1 (1,0) | 1 (1,0) | 1 (1,0) | 1 (1,0) | 1 (1,0) | 0 | 0 | 0 |
| NCTC10833 | 1 | 3 | 2 (1,1) | 2 (1,1) | 2 (1,1) | 2 (1,1) | 2 (1,1) | 1 (0,1) | 0 | 0 | 0 |
| NCTC13626 | 3 | 2 | 1 (1,0) | 1 (1,0) | 1 (1,0) | 0 (0,0) | 1 (1,0) | 0 (0,0) | 0 | 0 | 0 |
| NCTC13349 | 2 | 2 | 2 (1,1) | 2 (1,1) | 2 (1,1) | 2 (1,1) | 2 (1,1) | 1 (0,1) | 0 | 0 | 0 |
| NCTC11192 | 1 | 2 | 1 (1,0) | 1 (1,0) | 1 (1,0) | 0 (0,0) | 1 (1,0) | 0 (0,0) | 0 | 0 | 0 |
| NCTC13616 | 1 | 1 | 1 (1,0) | 1 (1,0) | 1 (1,0) | 1 (1,0) | 1 (1,0) | 0 (0,0) | 0 | 0 | 0 |
| NCTC13277 | 1 | 1 | 1 (1,0) | 1 (1,0) | 1 (1,0) | 1 (1,0) | 1 (1,0) | 0 (0,0) | 0 | 0 | 0 |
| NCTC13251 | 1 | 1 | 1 (1,0) | 1 (1,0) | 1 (1,0) | 1 (1,0) | 1 (1,0) | 0 (0,0) | 0 | 0 | 0 |
| NCTC12419 | 1 | 1 | 1 (1,0) | 1 (1,0) | 1 (1,0) | 0 (0,0) | 1 (1,0) | 0 (0,0) | 0 | 0 | 0 |
| Total | 28 | 71 | 22 | 23 | 24 | 18 | 22 | 10 | 7 | 1 | 7 |

^a The first and second numbers in parentheses are counts of contigs corresponding to chromosomes and plasmids, respectively
^b A contig is defined as circularizable if it includes the entire sequence of a chromosome or plasmid, irrespective of the presence or size of an overlap between its start and end
^c All errors except for those on sample NCTC13360 were false circularizations, where a tool attempted to circularize a contig that should not be circular. All four errors made on sample NCTC13360 were incorrect circularizations, where an attempt was made to circularize a circular contig, but the output contained errors
Fig. 2 Comparison of HGAP assembly of P. falciparum mitochondrion and Circlator output. The HGAP and Circlator assemblies are shown in gray and white, respectively, with the numbers showing the lengths in kilobases. Nucmer matches between the genomes are shown as blue (hits in the same orientation) and pink (hits in opposing orientations). Matches to the three mitochondrial genes, cox1 (blue), cox3 (green), and cob (orange), are shown as a colored track inside the assemblies. The corrected reads mapped to each of the assemblies are shown in gray outside the assemblies. This figure was generated using Circos [31]
Fig. 3 Key stages of the Circlator pipeline. a Before circularization, input contigs are merged using de novo assemblies of filtered reads. b Circular contigs are resolved using matches to contigs assembled from filtered reads. c Circularized contigs are rearranged to start at the dnaA gene, or a different gene specified by the user
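The rearrangement in stage c amounts to rotating a circular sequence so that its linear representation begins at a chosen gene. The following is a minimal sketch of that idea, not Circlator's implementation. It assumes the gene's coordinates on the contig are already known (locating the gene is outside the sketch), and the function name `rotate_to_gene` and its parameters are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (unambiguous bases only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def rotate_to_gene(seq: str, start: int, end: int, reverse_strand: bool = False) -> str:
    """Rotate a circularized sequence so it begins at the gene's first base.

    `start` and `end` are 0-based, inclusive forward-strand coordinates of
    the gene (assumed inputs). If the gene lies on the reverse strand, the
    sequence is reverse complemented first so the gene reads forward.
    """
    n = len(seq)
    if not (0 <= start <= end < n):
        raise ValueError("gene coordinates must lie within the sequence")
    if reverse_strand:
        seq = reverse_complement(seq)
        # Forward position i maps to n - 1 - i after reverse complementing,
        # so the gene's first base (forward position `end`) lands here.
        origin = n - 1 - end
    else:
        origin = start
    # Because the molecule is circular, moving the prefix to the end
    # changes only the linear representation, not the sequence itself.
    return seq[origin:] + seq[:origin]


forward_example = rotate_to_gene("GGATGCCC", start=2, end=4)
reverse_example = rotate_to_gene("GGCATCCC", start=2, end=4, reverse_strand=True)
```

Here the toy "gene" is the three bases `ATG`. In both calls the returned string begins with `ATG`, and its length is unchanged.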