Yu-Chieh Liao, Hung-Wei Cheng, Han-Chieh Wu, Shu-Chen Kuo, Tsai-Ling Yang Lauderdale, Feng-Jui Chen.
Abstract
The Oxford Nanopore MinION is an affordable and portable DNA sequencer that can produce very long reads (tens of kilobase pairs), which enable de novo bacterial genome assembly. Although many algorithms and tools have been developed for base calling, read mapping, de novo assembly, and polishing, no automated pipeline is available for one-stop reconstruction of circular bacterial genomes. In this paper, we present CCBGpipe, a pipeline for completing circular bacterial genomes. Raw current signals are demultiplexed and base called to generate sequencing data. Sequencing reads are de novo assembled several times by using a sampling strategy to produce circular contigs, that is, contigs whose start and end share a common sequence. The circular contigs are polished by using raw signals and sequencing reads; then, duplicated sequences are removed to form a linear representation of circular sequences. The circularized contigs are finally rearranged to start at the start position of dnaA/repA or at a replication origin based on the GC skew. CCBGpipe, implemented in Python, is available at https://github.com/jade-nhri/CCBGpipe. Using sequencing data produced from a single MinION run, we obtained 48 circular sequences, comprising 12 chromosomes and 36 plasmids of 12 bacteria, including Acinetobacter nosocomialis, Acinetobacter pittii, and Staphylococcus aureus. With adequate quantities of sequencing reads (80×), CCBGpipe can provide a complete and automated assembly of circular bacterial genomes.
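The circularization and rearrangement steps described in the abstract can be illustrated with a minimal Python sketch. It is not taken from CCBGpipe: the function names are hypothetical, the overlap is found by exact string matching (real nanopore contigs would need an alignment-based comparison), and the replication origin is approximated by the minimum of the cumulative G minus C count, a common heuristic that may differ from what the pipeline actually does.

```python
def trim_circular_overlap(seq, min_overlap=4):
    """Remove the duplicated end of a contig whose start and end share a sequence.

    Assumption: the shared sequence matches exactly. Returns seq unchanged
    if no overlap of at least min_overlap bases is found.
    """
    # Try the longest candidate overlap first, down to min_overlap.
    for k in range(len(seq) // 2, min_overlap - 1, -1):
        if seq[:k] == seq[-k:]:
            return seq[:-k]  # keep one copy of the shared sequence
    return seq


def gc_skew_minimum(seq):
    """Return the position after which the cumulative (G - C) count is lowest."""
    total = 0
    min_total = 0
    min_pos = 0
    for i, base in enumerate(seq):
        if base == "G":
            total += 1
        elif base == "C":
            total -= 1
        if total < min_total:
            min_total = total
            min_pos = i + 1
    return min_pos


def rotate(seq, start):
    """Rearrange a circular sequence so that it begins at index start."""
    return seq[start:] + seq[:start]


# Toy example: a 12-base "genome" whose first 5 bases are repeated at the end.
unit = "ATGCCGTTAGGA"
contig = unit + unit[:5]
linear = trim_circular_overlap(contig)
rearranged = rotate(linear, gc_skew_minimum(linear))
```

In this toy case `linear` equals `unit`, because the 5-base repeat at the end is dropped. When a dnaA or repA gene is annotated, its start coordinate would be passed to `rotate` instead of the GC-skew estimate.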
Keywords: MinION sequencing; assembly complexity; bacterial genome; de novo assembly; one-stop analysis
Year: 2019 PMID: 31551994 PMCID: PMC6737777 DOI: 10.3389/fmicb.2019.02068
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
FIGURE 1. A schematic workflow of CCBGpipe.
Sequencing statistics and assembly information.
| Strain | 2012C01-137 | 2014S07-126 | 2010C01-170 | 2012N21-164 | 2012N08-034 | 2014N21-145 | 2010S01-197 | 2014N23-120 | ||||
| Biosample∗ | SAMN09069669 | SAMN09069676 | SAMN09069679 | SAMN09069675 | SAMN09069678 | SAMN09069674 | SAMN09069668 | SAMN09069671 | ||||
| No. of reads | 68531 | 84973 | 99127 | 65317 | 81740 | 109519 | 59840 | 66798 | 103544 | 102244 | 81094 | 68573 |
| Total bases (Mbp) | 326 | 597 | 537 | 450 | 461 | 652 | 336 | 385 | 696 | 646 | 564 | 443 |
| Mean length (bp) | 4757 | 7024 | 5402 | 6886 | 5642 | 5954 | 5617 | 5761 | 6717 | 6317 | 6955 | 6467 |
| Max. length (bp) | 58035 | 89795 | 74822 | 78328 | 65545 | 55832 | 51710 | 54864 | ||||
| SRA run∗ | SRR7119551 | SRR7119558 | SRR7119559 | SRR7119553 | SRR7119560 | SRR7119554 | SRR7119552 | SRR7119549 | ||||
| Assembly | 3,865,154 | 3,900,436 | 4,176,213 | 3,907,768 | 3,866,831 | 3,854,880 | 4,231,974 | 4,037,158 | 2,880,969 | 2,866,475 | 2,879,007 | 2,907,760 |
| 91,208 | 284,051 | 97,329 | 15,039 | 323,995 | 134,185 | 158,983 | 27,088 | 27,090 | 27,085 | 27,002 | ||
| 72,978 | 96,775 | 73,223 | 8,159 | 72,034 | 92,044 | 79,057 | 3,122 | 3,120 | 3,118 | 3,127 | ||
| 8,080 | 32,766 | 12,096 | 4,340 | 9,166 | 6,077 | 39,960 | ||||||
| 9,207 | 6,632 | 3,867 | 5,449 | 5,097 | ||||||||
| 3,860 | ||||||||||||
| 2,727 | ||||||||||||
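As a rough worked example of how the table values relate to the 80× figure quoted in the abstract, sequencing depth can be estimated as total bases divided by assembled genome length. The sketch below uses only the first value of the "Total bases" row (326 Mbp) and the first value of the "Assembly" row (3,865,154 bp); plasmid lengths are left out because their column alignment is not recoverable from the table as shown here, so the result slightly overestimates the true depth. The function name is hypothetical.

```python
def estimated_depth(total_bases_mbp, genome_length_bp):
    """Approximate sequencing depth: sequenced bases / genome length."""
    return total_bases_mbp * 1_000_000 / genome_length_bp


# First column of the table: 326 Mbp of reads, 3,865,154 bp chromosome.
depth = estimated_depth(326, 3_865_154)
print(f"estimated depth: {depth:.1f}x")  # about 84.3x

# The abstract cites 80x as an adequate quantity of reads.
meets_threshold = depth >= 80
```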
Circular sequences deduced by assemblers.
| Barcode01 | 1 chromosome 2 plasmids | 1 chromosome 1 plasmid | ||
| Barcode02 | 3 plasmids | 1 chromosome 2 plasmids | 1 chromosome 2 plasmids | |
| Barcode03 | 0 | 0 | ||
| Barcode04 | 1 chromosome 1 plasmid | 1 chromosome 2 plasmids | 1 chromosome | 1 chromosome 3 plasmids |
| Barcode05 | 3 plasmids | 1 chromosome 1 plasmid | 1 chromosome 3 plasmids | |
| Barcode06 | 1 chromosome 2 plasmids | 1 chromosome 2 plasmids | 1 chromosome 2 plasmids | 3 plasmids |
| Barcode07 | 1 chromosome 3 plasmids | 1 chromosome 5 plasmids | 2 plasmids | 3 plasmids |
| Barcode08 | 2 plasmids | 1 chromosome 2 plasmids | 1 chromosome 2 plasmids | 4 plasmids |
| Barcode09 | 1 chromosome 1 plasmid | 1 chromosome 1 plasmid | 2 plasmids | |
| Barcode10 | 1 chromosome 1 plasmid | 1 chromosome | 1 chromosome 1 plasmid | 2 plasmids |
| Barcode11 | 0 | 1 chromosome 1 plasmid | 1 chromosome 1 plasmid | |
| Barcode12 | 1 chromosome 1 plasmid | 1 chromosome 1 plasmid | 1 chromosome 1 plasmid | |
| Total | 28 | 31 | 27 | 37 |
FIGURE 2. Schematic relationships between assemblies and final release assemblies for (A) barcode01 and (B) barcode10. Full alignment represents a near-perfect correlation between a circular contig and a circular sequence; partial alignment represents a partial correlation between a linear contig and a circular sequence.
Number of circular contigs produced by miniasm and Canu with different subsets of reads.
| All reads | 0 | 4 | 3 | 3 | 4 | 2 | 2 | 2 | 37 | ||||
| A∗ reads | 3 | 3 | 0 | 3 | 2 | 3 | 1 | 2 | 2 | 2 | 2 | 2 | 25 |
| B∗ reads | 3 | 4 | 0 | 4 | 1 | 1 | 1 | 1 | 30 | ||||
| A∗ + B∗ reads | 0 | 4 | 3 | 3 | 2 | 1 | 2 | 2 | 35 | ||||
| A + B reads | 4 | 5 | 0 | 4 | 5 | 4 | 3 | 4 | 3 | 2 | 2 | 2 | 38 |
| 40× sampling# all | 4 | 4 | 3 | 2 | 41 | ||||||||
| 40× sampling# A∗ + B∗ | 4 | 2 | 2 | 2 | 2 | 39 | |||||||
| 40× sampling# A + B | 4 | 4 | 3 | 2 | 2 | 2 | 39 | ||||||
| A∗ reads | 2 | 3 | 2 | 2 | 3 | 2 | 1 | 2 | 1 | 0 | 1 | 20 | |
| B∗ reads | 2 | 2 | 0 | 3 | 2 | 2 | 4 | 2 | 2 | 1 | 2 | 2 | 24 |
| A reads | 3 | 2 | 2 | 1 | 2 | 0 | 1 | 0 | 0 | 0 | 0 | 12 | |
| B reads | 3 | 4 | 0 | 4 | 2 | 2 | 3 | 2 | 2 | 1 | 2 | 28 | |
| A + B reads | 3 | 3 | 2 | 2 | 6 | 3 | 2 | 2 | 2 | 2 | 32 | ||
| All reads | 3 | 2 | 3 | 3 | 4 | 2 | 2 | 0 | 2 | 28 | |||
| A reads | 3 | 2 | 2 | 1 | 2 | 0 | 1 | 0 | 0 | 0 | 1 | 13 | |
| B reads | 3 | 4 | 0 | 4 | 1 | 2 | 3 | 2 | 2 | 2 | 2 | 28 | |
| A + B reads | 3 | 3 | 1 | 4 | 3 | 6 | 3 | 1 | 2 | 2 | 32 | ||
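The "40× sampling" rows refer to assembling from a subset of reads at a fixed depth. How CCBGpipe draws these subsets is not described in this record, so the following is only one plausible sketch: shuffle the reads and keep them until their combined length reaches the target depth times an assumed genome size. The function name, the `seed` parameter, and the uniform random selection are all assumptions.

```python
import random


def subsample_reads(read_lengths, genome_size, target_depth=40, seed=0):
    """Pick read indices at random until target_depth x genome_size bases are reached.

    Assumption: reads are chosen uniformly at random, without weighting by
    length or quality.
    """
    rng = random.Random(seed)
    order = list(range(len(read_lengths)))
    rng.shuffle(order)

    budget = target_depth * genome_size
    picked, total = [], 0
    for idx in order:
        if total >= budget:
            break
        picked.append(idx)
        total += read_lengths[idx]
    return picked


# Toy example: 100 reads of 1,000 bp against a 1,000 bp "genome".
picked = subsample_reads([1000] * 100, genome_size=1000, target_depth=40)
```

Repeating the call with different `seed` values gives different subsets, which is one way to run the assembly several times on sampled reads, as the abstract describes.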