Lennard Epping, Julia C Golz, Marie-Theres Knüver, Charlotte Huber, Andrea Thürmer, Lothar H Wieler, Kerstin Stingl, Torsten Semmler.
Abstract
BACKGROUND: Campylobacter jejuni is a zoonotic pathogen that infects the human gut through the food chain, mainly through consumption of undercooked chicken meat, ready-to-eat food cross-contaminated by raw chicken, or raw milk. In recent decades, C. jejuni has become the most common bacterial cause of foodborne infections in high-income countries, costing public health systems billions of euros each year. Currently, different whole genome sequencing techniques, such as short-read bridge amplification and long-read single-molecule real-time sequencing, are applied for in-depth analysis of bacterial species, in particular Illumina MiSeq, PacBio and MinION.
Keywords: Antibiotic resistance; Assembler comparison; Campylobacter jejuni; Hybrid assemblies; Long read sequencing
Year: 2019 PMID: 31890037 PMCID: PMC6913002 DOI: 10.1186/s13099-019-0340-7
Source DB: PubMed Journal: Gut Pathog ISSN: 1757-4749 Impact factor: 4.181
Table 1 Summary of the raw output from Illumina, MinION, and PacBio sequencing technologies
| Technology | Number of reads | Total number of bases | Median read length (bp) | Calculated mean genome coverage |
|---|---|---|---|---|
| Illumina MiSeq | 658,314 | 165,840,055 | 285 | 98× |
| PacBio RS II | 88,482 | 802,118,168 | 9065 | 475× |
| MinION | 61,960 | 737,318,830 | 8073 | 436× |
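
The mean genome coverage in the last column can be approximated as the total number of sequenced bases divided by the genome length. The following is a minimal sketch of that arithmetic, not the authors' calculation. It assumes a genome length of 1,687,752 bp, which is the total length of the MinION + Illumina Unicycler assembly and is used here only for illustration; the genome length actually used for the table is not stated in this record.

```python
# Rough check of mean coverage: total sequenced bases / genome length.
# ASSUMPTION: the genome length is taken from the MinION + Illumina Unicycler
# assembly (1,687,752 bp); the value used for the table is not stated.
ASSUMED_GENOME_LENGTH_BP = 1_687_752

# Total number of bases per technology, as listed in the table above.
total_bases = {
    "Illumina MiSeq": 165_840_055,
    "PacBio RS II": 802_118_168,
    "MinION": 737_318_830,
}


def mean_coverage(bases: int, genome_length: int = ASSUMED_GENOME_LENGTH_BP) -> float:
    """Return the mean depth of coverage as a fold value."""
    return bases / genome_length


coverage = {tech: mean_coverage(bases) for tech, bases in total_bases.items()}

if __name__ == "__main__":
    for tech, cov in coverage.items():
        print(f"{tech}: {cov:.1f}x")
```

Under this assumption the computed values fall within one fold of the reported 98×, 475× and 436×; the remaining difference depends on the genome length and rounding that were used.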
Table 2 Summary of the assembler performance based on different sequencing technologies
| Index | Data | Assembler | #Contigs | #bp total length | #Chromosomal contigs; #bp | #Plasmids; #bp | Insertions, deletions and SNPs | Covered by Illumina reads (%) | Sequence identity of |
|---|---|---|---|---|---|---|---|---|---|
| A | Illumina | SPAdes | 30 | 1,666,0451 | 30; 77,674 (N50)<sup>a</sup> | Cannot be directly detected | 0<sup>b</sup> | 99.9<sup>c</sup> | 100 |
| B | PacBio | HGAP | 2 | 1,733,585 | 1; 1,668,827 | 1; 64,758 | 155 | 99.46 | 100 |
| C | PacBio | Flye | 2 | 1,687,377 | 1; 1,645,611 | 1; 41,766 | 255 | 99.99 | 100 |
| D | PacBio | CLC | 2 | 1,688,161 | 1; 1,646,367 | 1; 41,794 | 253 | 99.97 | 100 |
| E | PacBio + Illumina | Unicycler | 3 | 1,684,748 | 2; 1,631,764/ 11,212 | 1; 41,772 | 0 | 99.9 | 100 |
| F | PacBio + Illumina | wtdbg2 | 3 | 1,693,078 | 2; 1,644,895/ 6,442 | 1; 41,741 | 47 | 99.65 | 100 |
| G | MinION | Flye | 2 | 1,720,675 | 1; 1,678,003 | 1; 42,673 | 24,439 | 99.36 | 99.6 |
| H | MinION + Illumina | Unicycler | 2 | 1,687,752 | 1; 1,645,980 | 1; 41,722 | 20 | 99.94 | 100 |
| I | MinION + Illumina | wtdbg2 | 5 | 1,672,121 | 4; 1,648,160/ 15,620/ 12,211/ 6,130 | 1; 41,957 | 169 | 98.15 | 100 |
<sup>a</sup> Quality of draft genomes can be measured by the N50 value
<sup>b</sup> Illumina paired-end data are taken as ground truth for identification of SNPs, insertions and deletions
<sup>c</sup> As a result of the scaffolding process performed by the SPAdes assembler, contigs with known distance but unknown sequence content are connected by “N”s. Thus, the SPAdes assembly is not 100% covered by Illumina data
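
The N50 value mentioned in the first footnote is commonly defined as the contig length at which the longest contigs, added up in descending order, first cover at least half of the total assembly length. The sketch below illustrates that common definition; the contig lengths are hypothetical and are not taken from the study, and assemblers or quality tools may differ in details such as minimum contig length filters.

```python
from typing import Sequence


def n50(contig_lengths: Sequence[int]) -> int:
    """Return the N50 of an assembly.

    Contigs are sorted from longest to shortest and summed until the running
    total reaches at least half of the assembly length; the length of the
    contig that crosses this threshold is the N50.
    """
    if not contig_lengths:
        raise ValueError("no contig lengths given")
    half_total = sum(contig_lengths) / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig lengths must be positive")


# HYPOTHETICAL contig lengths in bp, for illustration only.
example_lengths = [100, 80, 50, 30, 20]
example_n50 = n50(example_lengths)  # total 280, half 140: 100 + 80 >= 140, so 80
```

A larger N50 means that fewer, longer contigs make up the bulk of the assembly, which is why it is used as a contiguity measure for draft genomes.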
Fig. 1 Genome map, generated by CGView [33], of (a) chromosomal DNA and (b) plasmid DNA from C. jejuni BfR-CA-14430. Circles from outside to inside show: (1, 2) coding regions (light blue) predicted on the forward (outer circle) and reverse (inner circle) strands; (3) tRNAs (dark red); (4) rRNAs (light green); (5) regions above (green) and below (purple) the average GC skew; (6) GC content (black); and (7) DNA coordinates
Fig. 2 Progressive Mauve alignment of chromosomal genomes generated by different assemblers. The misassembly made by SPAdes is marked by the red square. Assemblies are indexed by letters as shown in Table 2. Color-coded blocks indicate homology between the genomes
Fig. 3 The dotplot shows a global alignment of the plasmid sequence, generated from PacBio reads by HGAP (Table 2, B), against itself. It reveals one dark blue diagonal line running from start to end of the sequence, as well as two additional dark blue lines in the top left and bottom right parts of the plot. These lines show a repeat from 42 to 65 kb and from 1 to 23 kb, respectively. The first 23 kb of the sequence are therefore identical to the last 23 kb, indicating a large repeat region that is likely caused by an assembly error
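
The kind of terminal duplication described for the HGAP plasmid contig (64,758 bp, roughly 23 kb longer than the Flye plasmid contig of 41,766 bp) can be illustrated with a simple prefix/suffix comparison. This is a minimal sketch and not the method used in the study: the function names are hypothetical, the toy sequence is invented, and exact string matching is assumed, whereas real contigs would usually need an alignment that tolerates mismatches and indels.

```python
def terminal_repeat_length(seq: str, min_len: int = 4) -> int:
    """Return the largest k (min_len <= k <= len(seq) // 2) for which the
    first k bases are identical to the last k bases, or 0 if none exists.

    ASSUMPTION: exact matching only; sequencing errors are not handled.
    """
    min_len = max(min_len, 1)
    for k in range(len(seq) // 2, min_len - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def trim_terminal_repeat(seq: str, min_len: int = 4) -> str:
    """Remove the duplicated end so that the repeat is kept only once."""
    k = terminal_repeat_length(seq, min_len)
    return seq[:-k] if k else seq


# HYPOTHETICAL toy contig: the first 8 bases are repeated at the end.
toy_contig = "ACGTTGCA" + "GGGCCC" + "ACGTTGCA"
repeat_len = terminal_repeat_length(toy_contig)
trimmed = trim_terminal_repeat(toy_contig)
```

For a circular replicon such as a plasmid, a long identical stretch at both contig ends often indicates that the assembler ran past the start of the circle; trimming one copy is the usual remedy, but the result should be checked against the reads.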