Helena M B Seth-Smith, Ferdinando Bonfiglio, Aline Cuénod, Josiane Reist, Adrian Egli, Daniel Wüthrich.
Abstract
Whole genome sequencing (WGS) has become the new gold standard for bacterial outbreak investigation, due to the high resolution available for typing. While sequencing is currently predominantly performed on Illumina devices, the preceding library preparation can be performed using various protocols. Enzymatic fragmentation library preparation protocols are fast, have minimal hands-on time, and work with small quantities of DNA. The aim of our study was to compare three library preparation protocols for molecular typing: Nextera XT (Illumina); Nextera Flex (Illumina); and QIAseq FX (Qiagen). We selected 12 ATCC strains from human Gram-positive and Gram-negative pathogens with %G+C-content ranging from 27% (Fusobacterium nucleatum) to 73% (Micrococcus luteus), each having a high quality complete genome assembly available, to allow in-depth analysis of the resulting Illumina sequence data quality. Additionally, we selected isolates from previously analyzed cases of vancomycin-resistant Enterococcus faecium (VRE) (n = 7) and a local outbreak of Klebsiella aerogenes (n = 5). The number of protocol steps and time required were compared, in order to test the suitability for routine laboratory work. Data analyses were performed with standard tools commonly used in outbreak situations: Ridom SeqSphere+ for cgMLST; CLC genomics workbench for SNP analysis; and open source programs. Nextera Flex and QIAseq FX were found to be less sensitive than Nextera XT to variable %G+C-content, resulting in an almost uniform distribution of read-depth. Therefore, low coverage regions are reduced to a minimum resulting in a more complete representation of the genome. Thus, with these two protocols, more alleles were detected in the cgMLST analysis, producing a higher resolution of closely related isolates. Furthermore, they result in a more complete representation of accessory genes. 
In particular, the high data quality and relative simplicity of the workflow of Nextera Flex stood out in this comparison. This thorough comparison within an ISO/IEC 17025 accredited environment will be of interest to those aiming to optimize their clinical microbiological genome sequencing.
Keywords: Illumina; NGS; bacteria; comparison; library; next generation sequencing; prokaryotes; whole genome sequencing
Year: 2019 PMID: 31508405 PMCID: PMC6719548 DOI: 10.3389/fpubh.2019.00241
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
List of sequenced isolates, characteristics, reference genomes, and sample accessions.
| ATCC25586 | 36.4 | ATCC25586 | 27.15 | 11,659,182 | 5,471,621 | 3,892,304 |
| ATCC700819 | 34.2 | ATCC700819 | 30.55 | 5,104,723 | 8,067,749 | 537,051 |
| ATCC25923 | 88.4 | ATCC25923 | 32.86 | 9,093,138 | 5,742,025 | 7,139,563 |
| ATCC29212 | 39.8 | ATCC29212 | 37.35 | 7,199,132 | 6,806,105 | 6,981,047 |
| ATCC19615 | 20.8 | ATCC19615 | 38.48 | 7,895,584 | 6,046,735 | 9,481,835 |
| ATCC25845 | 92.0 | ATCC25845 | 40.98 | 6,993,760 | 2,162,813 | 5,257,867 |
| ATCC25922 | 27.2 | ATCC25922 | 50.37 | 6,419,681 | 5,321,879 | 6,112,711 |
| ATCC700603 | 42.8 | ATCC700603 | 57.73 | 4,825,887 | 5,853,388 | 8,417,937 |
| ATCC25177 (H37Ra) | 1.2 | ATCC25177 | 65.61 | 4,794,204 | 9,695,720 | 25,469,645 |
| ATCC27853 | 42.4 | ATCC27853 | 66.08 | 4,532,729 | 4,888,025 | 6,812,269 |
| ATCCBAA-67 | 72.0 | ATCCBAA-67 | 66.42 | 8,799,551 | 5,577,758 | 6,387,296 |
| ATCC4698 | 45.6 | ATCC4698 | 73.00 | 5,081,588 | 9,396,130 | 8,584,319 |
| NMB004374 | 55.8 | Aus0004 | 37.80 | 5,387,832 | 5,212,078 | 7,760,492 |
| NMB004375 | 55.8 | Aus0004 | 37.80 | 5,285,502 | 4,462,856 | 5,505,430 |
| NMB004376 | 55.4 | Aus0004 | 37.80 | 4,936,762 | 2,848,407 | 88,145 |
| NMB003061 | 56.2 | Aus0004 | 37.80 | 4,198,651 | 5,213,009 | 8,472,370 |
| NMB003076 | 47.2 | Aus0004 | 37.80 | 6,197,648 | 6,457,841 | 7,528,868 |
| NMB003240 | 57.6 | Aus0004 | 37.80 | 7,197,873 | 7,530,750 | 8,114,687 |
| NMB003062 (VRECH001) | 40.6 | Aus0004 | 37.80 | 5,180,248 | 7,799,226 | 7,615,044 |
| NMB004427 | 38.6 | KCTC2190 | 55.00 | 6,265,626 | 3,405,502 | 6,837,549 |
| NMB004428 | 25 | KCTC2190 | 55.00 | 964,566 | 7,895,179 | 10,931,859 |
| NMB004429 | 29 | KCTC2190 | 55.00 | 427,144 | 6,701,537 | 6,070,575 |
| NMB004430 | 24.8 | KCTC2190 | 55.00 | 3,339,975 | 3,892,829 | 6,909,301 |
| NMB004431 | 28.2 | KCTC2190 | 55.00 | 4,670,831 | 8,619,114 | 5,684,285 |
Figure 1. Quality assessment of WGS data. (A) The reads of the three library kits, subsampled to 100-fold, were mapped against the 12 reference genomes and the read depth was measured. The colors indicate the different library preparation kits. The x-axis reflects the position along the genomes and the y-axis the read depth. (B) The insert size of the different libraries was calculated using the alignment of the paired-end reads to the reference. The boxplots represent the calculations from the different species, with the lowest %G+C-content on the left, and the highest on the right. In the boxplots the lower and upper hinges correspond to the first and third quartiles. The whiskers are located at 1.5x the interquartile range. (C) The base composition of all the nucleotide sites in the reads was determined. The bases on the left side show the composition around the fragmentation site.
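The fold values used for subsampling relate read count to genome size through the usual mean-depth estimate (number of reads times read length, divided by genome size). The sketch below illustrates that arithmetic only; the genome size and read length are hypothetical values chosen for the example, not figures from the study, and the function names are illustrative.

```python
def reads_for_depth(genome_size_bp: int, read_length_bp: int, target_depth: int) -> int:
    """Smallest read count whose mean depth reaches target_depth.

    Mean depth is approximated as n_reads * read_length / genome_size.
    Integer ceiling division avoids floating-point rounding.
    """
    total_bases = target_depth * genome_size_bp
    return (total_bases + read_length_bp - 1) // read_length_bp


def mean_depth(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Approximate mean read depth for a given number of reads."""
    return n_reads * read_length_bp / genome_size_bp


# Hypothetical example: a 2.2 Mb genome sequenced with 150 bp reads.
n_reads = reads_for_depth(2_200_000, 150, 100)
print(n_reads, mean_depth(n_reads, 150, 2_200_000))
```

This estimate ignores unmapped reads, adapter trimming, and uneven coverage, so the realized depth at any given position can differ from the target.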
Figure 2. Comparison of the sequencing content using k-mers. (A) All k-mers identified within the reads were compared to the k-mers from the reference genomes. The x-axis shows the different subsampling of the reads and the y-axis shows the percent of k-mers that were found in the reads. (B) The assemblies of the sequenced strains were compared against the reference assemblies using the Jaccard index of the k-mers. The x-axis shows the different subsampling of the reads used for each assembly. The y-axis shows the Jaccard index. The colors indicate the different library preparation kits. In the boxplots the lower and upper hinges correspond to the first and third quartiles. The whiskers are located at 1.5x the interquartile range.
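The two k-mer measures in this figure can be illustrated with a small sketch: the fraction of reference k-mers recovered (panel A) and the Jaccard index between two k-mer sets (panel B). This is a simplified illustration and not the tooling used in the study: it works on plain strings, uses a toy k of 3, and does not collapse reverse-complement (canonical) k-mers, which dedicated k-mer counters typically do.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all k-length substrings of seq (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fraction_recovered(reference: set[str], observed: set[str]) -> float:
    """Share of reference k-mers that also occur in the observed set."""
    return len(reference & observed) / len(reference)


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Toy sequences, purely illustrative.
ref = kmers("ACGTAC", 3)   # ACG, CGT, GTA, TAC
asm = kmers("ACGTAA", 3)   # ACG, CGT, GTA, TAA
print(fraction_recovered(ref, asm))  # 3 of 4 reference k-mers
print(jaccard(ref, asm))             # 3 shared out of 5 distinct
```

A Jaccard index of 1 means the two k-mer sets are identical; missing regions and assembly errors both lower it.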
Prediction of AMR determinants in sequenced ATCC strains compared to reference genomes.
| ATCC25177 | 100 | 100 | Y | Y | Y | Y | N | Y | N | Y | Y | Y | Y | Y | Y | Y | Y | |
| 100 | 100 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | N | N | Y | Y | Y | ||
| 100 | 100 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||
| ATCC25922 | 100 | 100 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | |
| ATCC25923 | 100 | 100 | 2 | 2 | 2 | 2 | Y | N | Y | Y | Y | Y | Y | Y | Y | Y | Y | |
| 100 | 79.05 | N | N | N | N | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||
| ATCC27853 | 100 | 98.53 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | |
| 100 | 99.22 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||
| 100 | 100 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||
| 100 | 98.39 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||
| 100 | 99.92 | P | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||
| ATCC29212 | 100 | 97.98 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | P | Y | Y | Y | Y | |
| 100 | 100 | Y | Y | Y | Y | Y | P | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||
| 100 | 99.8 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||
| ATCC700603 | 100 | 99.42 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | |
| 100 | 93.79 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||
| 100 | 95.94 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||
| 100 | 95.24 | P | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||
| 100 | 100 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||
| 100 | 100 | Y | Y | Y | Y | Y | N | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||
| 100 | 100 | N | N | 2 | 2 | Y | N | Y | Y | Y | Y | 2 | Y | Y | Y | Y | ||
| 100 | 87.59 | N | Y | Y | Y | Y | N | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||
| 100 | 100 | N | Y | Y | Y | Y | N | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||
| 100 | 100 | N | Y | Y | Y | Y | N | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||
| 100 | 100 | N | Y | Y | Y | Y | N | Y | Y | Y | Y | Y | Y | Y | Y | Y | ||
| ATCC700819 | 99.75 | 99.63 | P | N | N | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | |
| ATCCBAA-67 | 95.74 | 84.89 | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | |
All hits under 70% coverage and/or 70% identity were screened out. Y, identified; N, not identified; P, partial; 2, split over 2 contigs.
This sequence also assembled a contig of 896 bp which is predicted to carry a dfrC resistance determinant: % coverage 91; % identity 76.
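The screening rule in the table footnote (hits below 70% coverage and/or 70% identity are discarded) amounts to requiring both values to reach the threshold. A minimal sketch follows; the `Hit` record and `passes_screen` function are hypothetical names for illustration, and the second hit is invented to show a rejection. Only the dfrC values (91% coverage, 76% identity) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    gene: str
    coverage: float  # percent of the reference determinant covered
    identity: float  # percent sequence identity


def passes_screen(hit: Hit, min_coverage: float = 70.0, min_identity: float = 70.0) -> bool:
    """Keep a hit only if both coverage and identity reach their thresholds."""
    return hit.coverage >= min_coverage and hit.identity >= min_identity


hits = [
    Hit("dfrC", coverage=91.0, identity=76.0),           # values from the footnote
    Hit("hypothetical", coverage=85.0, identity=65.0),   # invented example, fails identity
]
kept = [h.gene for h in hits if passes_screen(h)]
print(kept)
```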
Figure 3. cgMLST alleles identified from the patient isolates. The different subsamples (x-axis) were used to determine the alleles of the core genome. The different strains are depicted as bars. The y-axis shows the percentage of core genes that can be used for allelic typing. The colors indicate the different library preparation kits. The E. faecium isolates were analyzed using Mentalist (A) and Ridom SeqSphere+ (B). The K. aerogenes isolates were analyzed only using Ridom SeqSphere+ (C). The failed Qia library is labeled with “*”.
Figure 4. Analysis of the K. aerogenes outbreak isolates. (A) The isolates (200-fold subsamples) were analyzed using cgMLST in Ridom SeqSphere+ and are depicted in a minimum spanning tree (MST). The isolates are shown as circles. If two strains are identical they collapse into one circle. The numbers on the lines connecting the different circles show the number of different alleles between two isolates (not to scale). (B) The genomic distances between the isolates (200-fold subsamples) are shown as a phylogenetic tree representing all SNP differences across the whole genome. (C) SNP numbers across the tree called using the different subsamples.
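The edge labels in panel A count the core-genome loci at which two isolates carry different alleles. The sketch below shows one way such a pairwise distance can be computed from allele profiles. It is an illustration and not Ridom SeqSphere+'s implementation: the profile layout (locus name mapped to an allele number, `None` for an uncalled locus) and the choice to ignore loci missing in either isolate are assumptions made for the example.

```python
from typing import Optional

Profile = dict[str, Optional[int]]


def allele_distance(profile_a: Profile, profile_b: Profile) -> int:
    """Count loci called in both profiles whose allele numbers differ.

    Loci absent or uncalled (None) in either profile are ignored (assumption).
    """
    shared_loci = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared_loci
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


# Hypothetical allele profiles for two isolates.
isolate_1 = {"locus1": 1, "locus2": 4, "locus3": 7, "locus4": None}
isolate_2 = {"locus1": 1, "locus2": 5, "locus3": 7, "locus4": 2}
print(allele_distance(isolate_1, isolate_2))
```

Under this convention, a library that leaves more loci uncalled can hide real differences, which is consistent with the abstract's point that detecting more alleles gives higher resolution among closely related isolates.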
Key features of the compared library preparation kits.
| Feature | Nextera XT | Nextera Flex | QIAseq FX |
| --- | --- | --- | --- |
| Time required | 2.5 h | 4 h | 4 h |
| DNA input amount range (ng) | 1–1 | 1–500 | 1–1,000 |
| Adjustments required for variable input | No variable input supported | PCR cycles required to be adjusted, using <50 ng | Additional PCR step is required if using <100 ng (+ 90 min) |
| Insert size behavior | Affected by DNA input amount and %G+C-content | Barely affected by the input DNA | Affected by DNA input amount |
| Available barcodes | 384 | 384 | 96 |
| Limitations | Highly affected by input DNA | Bead-linked transposomes (BLT) handling needs practice | Reagent volumes are tight |
| Key advantage | Simple protocol | Highly standardized output (input DNA independent) | PCR-free (>100 ng input DNA) |
| Special feature | Fast protocol | Produces normalized libraries (>100 ng input DNA) | Insert size can easily be adjusted to needs |
| Data quality | Highly variable read depth | High quality data | High quality data |
| Recommended read depth | G+C < 50%: 200 x | 50 x | 50 x |