Mari Miyamoto, Daisuke Motooka, Kazuyoshi Gotoh, Takamasa Imai, Kazutoshi Yoshitake, Naohisa Goto, Tetsuya Iida, Teruo Yasunaga, Toshihiro Horii, Kazuharu Arakawa, Masahiro Kasahara, Shota Nakamura.
Abstract
BACKGROUND: The availability of diverse second- and third-generation sequencing technologies enables the rapid determination of the sequences of bacterial genomes. However, identifying the sequencing technology most suitable for producing a finished genome with multiple chromosomes remains a challenge. We evaluated three second-generation sequencers, the Roche 454 GS Junior (GS Jr), Life Technologies Ion PGM (Ion PGM), and Illumina MiSeq (MiSeq), and one third-generation sequencer, the Pacific Biosciences RS sequencer (PacBio), by sequencing and assembling the 5-Mb genome of Vibrio parahaemolyticus, which comprises two circular chromosomes.
Year: 2014 PMID: 25142801 PMCID: PMC4159541 DOI: 10.1186/1471-2164-15-699
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Data statistics for sequence run and assemblies
| Sequencer | GS Jr | Ion PGM | MiSeq | PacBio |
|---|---|---|---|---|
| Number of reads | 115611 | 4982888 | 39656630 | 120230* |
| Total bp | 48285593 | 1443005019 | 9953814130 | 374942687 |
| Coverage | 9 | 279 | 1927 | 73 |
| Mean length | 418 | 290 | 251 | 3119 |
| Best assembly | | | | |
| Number of bp used for assembly | 48285593 | 400000107 | 299809460 | 374942687 |
| Number of reads used | 115611 | 1380757 | 1194460 | 120230* |
| Coverage | 9 | 77 | 58 | 73 |
| Number of contigs | 309 | 61 | 34 | 31 |
| Total bases | 5053921 | 5075085 | 5103771 | 5298335 |
| Max length | 164926 | 895358 | 732626 | 3288561 |
| N50 contig length | 30451 | 392606 | 431440 | 3288561 |
GS Jr, Ion PGM, and MiSeq data are based on a single run. PacBio data are from three cells. The upper part of the table shows read statistics and the lower part shows the statistics of the best assembly. *For PacBio, the number of reads is the number of subreads longer than 500 bp.
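As a quick consistency check on the table above, the mean read length can be recomputed as total bp divided by number of reads, and the share of sequenced bases that went into the best assembly as bp used divided by total bp. The sketch below is only an illustration built from the table values; the dictionary layout and function names are assumptions of this example, not part of the original analysis.

```python
# Values copied from the "Data statistics for sequence run and assemblies" table.
# The dictionary layout and helper names are illustrative assumptions.
RUN_STATS = {
    "GS Jr":   {"reads": 115_611,    "total_bp": 48_285_593,    "bp_used": 48_285_593,  "mean_length": 418},
    "Ion PGM": {"reads": 4_982_888,  "total_bp": 1_443_005_019, "bp_used": 400_000_107, "mean_length": 290},
    "MiSeq":   {"reads": 39_656_630, "total_bp": 9_953_814_130, "bp_used": 299_809_460, "mean_length": 251},
    "PacBio":  {"reads": 120_230,    "total_bp": 374_942_687,   "bp_used": 374_942_687, "mean_length": 3119},
}


def mean_read_length(total_bp: int, reads: int) -> float:
    """Mean read length in bp: total sequenced bases divided by read count."""
    return total_bp / reads


def fraction_used(bp_used: int, total_bp: int) -> float:
    """Fraction of sequenced bases that were used for the best assembly."""
    return bp_used / total_bp


if __name__ == "__main__":
    for name, s in RUN_STATS.items():
        mean_len = mean_read_length(s["total_bp"], s["reads"])
        used = fraction_used(s["bp_used"], s["total_bp"])
        print(f"{name}: mean length {mean_len:.1f} bp, {used:.1%} of bases used")
```

Under this calculation the GS Jr and PacBio assemblies use all of their reads, whereas the Ion PGM and MiSeq assemblies use only a subset of the run.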
Accuracy of assembled contigs with respect to the reference genome
| Metric | GS Jr | Ion PGM | MiSeq | PacBio | PacBio (contigs >1 Mbp) |
|---|---|---|---|---|---|
| Number of contigs | 309 | 61 | 34 | 31 | 2 |
| Number of mismatches | 133 | 108 | 230 | 389 | 157 |
| Number of indels | 824 | 2853 | 184 | 715 | 698 |
| Indels length | 977 | 3018 | 241 | 818 | 794 |
| Number of mismatches per 100 kbp | 2.6 | 2.1 | 4.5 | 7.5 | 3.0 |
| Number of indels per 100 kbp | 16.3 | 56.2 | 3.6 | 13.8 | 13.5 |
| Number of misassemblies | 0 | 0 | 1 | 13 | 10 |
| Number of relocations | 0 | 0 | 1 | 11 | 10 |
| Number of translocations | 0 | 0 | 0 | 1 | 0 |
| Number of inversions | 0 | 0 | 0 | 1 | 0 |
| Number of misassembled contigs | 0 | 0 | 1 | 5 | 2 |
| Genome coverage (%) | 97.844 | 98.290 | 98.499 | 99.999 | 99.848 |
| Duplication ratio | 1.004 | 1.000 | 1.003 | 1.032 | 1.007 |
Generated contigs were compared with the reference genome using QUAST v2.3 [23]. The number of indels is the total number of insertions and deletions in the aligned bases. Relocations, inversions, and translocations are classified as misassemblies. A relocation is a misassembly in which the left and right flanking sequences both align to the same chromosome on the reference but are either >1 kb apart or overlap by >1 kb. An inversion is a misassembly in which the left and right flanking sequences both align to the same chromosome but on opposite strands. A translocation is a misassembly in which the flanking sequences align to different chromosomes. Genome coverage is the percentage of reference genome bases covered by aligned contigs.
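The three misassembly definitions above can be read as a small decision rule applied to the two flanking alignments at a breakpoint. The following is a minimal sketch of that rule, not QUAST's implementation: the `Flank` record, the function name, the order in which the checks are applied, and the handling of coordinates (linear, ignoring chromosome circularity) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flank:
    """Hypothetical record for one flanking alignment on the reference."""
    chromosome: str
    strand: str  # "+" or "-"
    start: int   # reference coordinates, start <= end
    end: int


def classify_breakpoint(left: Flank, right: Flank, threshold: int = 1000) -> Optional[str]:
    """Classify a breakpoint following the definitions in the table note.

    Assumed check order: translocation, then inversion, then relocation.
    Returns None when the flanks are consistent with the reference.
    """
    if left.chromosome != right.chromosome:
        return "translocation"
    if left.strand != right.strand:
        return "inversion"
    # Same chromosome and strand: measure the gap (positive) or overlap (negative).
    first, second = sorted((left, right), key=lambda f: f.start)
    gap = second.start - first.end - 1
    if abs(gap) > threshold:
        return "relocation"
    return None


if __name__ == "__main__":
    a = Flank("chr1", "+", 1, 5000)
    print(classify_breakpoint(a, Flank("chr2", "+", 1, 5000)))
    print(classify_breakpoint(a, Flank("chr1", "-", 5001, 9000)))
    print(classify_breakpoint(a, Flank("chr1", "+", 7000, 9000)))
    print(classify_breakpoint(a, Flank("chr1", "+", 5001, 9000)))
```

The default `threshold` of 1000 bp mirrors the 1 kb cutoff stated in the note; a real tool would also need to account for circular chromosomes and alignment ambiguity, which this sketch deliberately omits.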
Figure 1. Contig alignment against the genome. (A) Alignment of contigs to V. parahaemolyticus chromosome 1. PacBio, MiSeq, Ion PGM, and GS Jr contigs are aligned to chromosome 1 and visualized with Circos [28]. From outer to inner rings: forward CDS, reverse CDS, tRNA, rRNA, PacBio contigs, MiSeq contigs, Ion PGM contigs, GS Jr contigs, %GC plot, and GC skew. (B) Alignment of contigs to V. parahaemolyticus chromosome 2. PacBio, MiSeq, Ion PGM, and GS Jr contigs are aligned to chromosome 2 and visualized with Circos. From outer to inner rings: forward CDS, reverse CDS, tRNA, rRNA, PacBio contigs, MiSeq contigs, Ion PGM contigs, GS Jr contigs, %GC plot, and GC skew.