Malte B Hallgren, Søren Overballe-Petersen, Ole Lund, Henrik Hasman, Philip T L C Clausen.
Abstract
For detection of clonal outbreaks in clinical settings, we present a complete pipeline that generates a single-nucleotide polymorphism (SNP) distance matrix from a set of sequencing reads. Importantly, the program can handle a mix of short reads from Illumina sequencing platforms and long reads from Oxford Nanopore Technologies (ONT) platforms as input. MINTyper performs automated reference identification, alignment, alignment trimming, optional methylation masking, and pairwise distance calculations. With this approach, we rapidly and accurately clustered a set of sequenced isolates whose known epidemiological relationship was used to confirm the clustering. Functions were built to allow both high-accuracy methylation-aware base-called MinION reads (hac_m Q10) and fast-generated lower-quality reads (fast Q8) to be used, also in combination with Illumina data. With fast Q8 reads, a higher number of base pairs was excluded from the calculated distance matrix than with the high-accuracy methylation-aware Q10 base-calling of ONT data. Nonetheless, when ONT data of different qualities were used with corresponding input parameters, the clustering of isolates was nearly identical.
Keywords: ONT; SNP; bioinformatics; clustering
Year: 2021 PMID: 33981853 PMCID: PMC8106442 DOI: 10.1093/biomethods/bpab008
Source DB: PubMed Journal: Biol Methods Protoc ISSN: 2396-8923
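The pairwise distance step named in the abstract can be illustrated with a minimal sketch. It assumes the consensus sequences are already aligned to the same reference and have equal length, and that positions where either sequence lacks a called base (for example `N`) are skipped. The function names and toy sequences are hypothetical and are not taken from MINTyper itself.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count positions where both sequences have a called base and differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    called = set("ACGT")
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x in called and y in called and x != y
    )


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], int]:
    """Build a symmetric SNP-distance matrix keyed by (name, name)."""
    names = sorted(seqs)
    matrix = {(n, n): 0 for n in names}
    for p, q in combinations(names, 2):
        d = snp_distance(seqs[p], seqs[q])
        matrix[(p, q)] = matrix[(q, p)] = d
    return matrix


# Toy aligned consensus sequences (hypothetical data).
consensus = {
    "iso_A": "ACGTACGTAC",
    "iso_B": "ACGTACGTAC",
    "iso_C": "ACCTANGTAT",
}
matrix = distance_matrix(consensus)
```

Skipping uncalled positions is a design choice in this sketch: a masked or low-confidence base then never contributes to the distance between two isolates.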
Overview of the number of SNP differences between the consensus sequences generated by sequencing the same isolate on an Illumina platform and on an ONT MinION platform, under three settings: fast Q8 data without alignment trimming or DCM methylation masking; fast Q8 data with alignment trimming and DCM methylation masking; and hac_m Q10 data with alignment trimming but without DCM methylation masking
| Isolate name | ΔSNP Q8 | ΔSNP Q8 with masking | ΔSNP Q10 |
|---|---|---|---|
| Ec01_ST410_CT587 | 28 | 0 | 0 |
| Ec02_ST410_CT587 | 28 | 0 | 0 |
| Ec03_ST410_CT587 | 28 | 0 | 0 |
| Ec04_ST410_CT587 | 28 | 0 | 0 |
| Ec05_ST410_CT587 | 28 | 0 | 0 |
| Ec06_ST410_CT587 | 28 | 0 | 0 |
| Ec07_ST410_CT527 | 30 | 0 | 3 |
| Ec08_ST410_CT611 | 28 | 0 | 0 |
| Ec09_ST410_CT512 | 29 | 1 | 1 |
| Ec10_ST410_CT596 | 28 | 0 | 0 |
| Ec11_ST410_CT523 | 29 | 0 | 3 |
| Ec12_ST410_CT278 | 34 | 0 | 2 |
Alignment trimming was performed with a minimum distance of 10 between accepted SNPs.
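The trimming rule above can be sketched as a filter over SNP positions. This is one possible reading, stated as an assumption: every SNP that has a neighbouring SNP closer than the minimum distance is discarded, so that all accepted SNPs are at least 10 positions from the nearest other SNP. The function name and the example positions are hypothetical; the pipeline's actual pruning logic may differ.

```python
def prune_clustered_snps(positions, min_distance=10):
    """Keep only SNPs whose nearest neighbouring SNP is at least
    `min_distance` positions away (assumed interpretation)."""
    ordered = sorted(set(positions))
    kept = []
    for i, pos in enumerate(ordered):
        left_ok = i == 0 or pos - ordered[i - 1] >= min_distance
        right_ok = i == len(ordered) - 1 or ordered[i + 1] - pos >= min_distance
        if left_ok and right_ok:
            kept.append(pos)
    return kept


# Hypothetical SNP coordinates on a reference.
snps = [100, 104, 250, 260, 400, 405, 409, 900]
kept = prune_clustered_snps(snps)
```

Here the pairs and triplets at 100/104 and 400/405/409 are removed as clustered, while 250 and 260 (exactly 10 apart) and the isolated SNP at 900 are kept.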
Figure 1: Clustering of sequences from Illumina (denoted int) and fast base-called Q8 ONT sequences (denoted Q8) of 12 E. coli, based on core genome SNPs without alignment trimming. Isolates Ec01–Ec06 are from an outbreak in Denmark, while Ec07–Ec12 originate from different foreign countries.
Figure 2: Clustering of sequences from Illumina (denoted int) and fast base-called Q8 ONT sequences (denoted Q8) of 12 E. coli, based on core genome SNPs. SNPs were trimmed away if they were within a proximity of 10, together with masking of DCM methylation sites. Isolates Ec01–Ec06 are from an outbreak in Denmark, while Ec07–Ec12 originate from different foreign countries.
Figure 3: Clustering of sequences from Illumina (denoted int) and high-accuracy methylation-aware (hac_m) base-called Q10 ONT sequences (denoted Q10) of 12 E. coli, based on core genome SNPs. SNPs were trimmed away if they were within a proximity of 10. Isolates Ec01–Ec06 are from an outbreak in Denmark, while Ec07–Ec12 originate from different foreign countries.
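The DCM methylation masking applied for Figure 2 can be illustrated with a short sketch. Dcm methylase recognises the motif CCWGG (W = A or T), which is its own reverse complement, so scanning one strand covers both. The sketch assumes that the whole five-base motif is replaced with `N`; whether the pipeline masks the full motif or only the methylated base is not stated here, and the function name is hypothetical.

```python
import re

# CCWGG with W = A or T; the motif is palindromic, so one strand suffices.
DCM_MOTIF = re.compile(r"CC[AT]GG", re.IGNORECASE)


def mask_dcm_sites(seq: str, mask_char: str = "N") -> str:
    """Replace every CCWGG occurrence with mask characters of equal length."""
    return DCM_MOTIF.sub(lambda m: mask_char * len(m.group()), seq)


# Hypothetical consensus fragment with two DCM sites and one non-site (CCGGG).
example = "ATCCAGGTTACCTGGAACCGGG"
masked = mask_dcm_sites(example)
```

Masked positions can then be excluded when SNPs are counted, so that base-calling errors at methylated sites do not inflate the distance between isolates.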
Computational requirements of the tested methods on 12 E. coli isolates sequenced on Illumina and ONT MinION, with fast base-called Q8 and high-accuracy methylation-aware Q10 base-called data
| Method | Correct clustering | CPU time (h:mm:ss) | Peak memory |
|---|---|---|---|
| Illumina and fast base-called Q8 ONT data | | | |
| MINTyper (nj) | No | 1:56:06 | 10.7 GB |
| MINTyper (1, nj) | Yes | 1:56:08 | 10.7 GB |
| MINTyper (iq) | No | 2:14:46 | 10.7 GB |
| MINTyper (1, iq) | Yes | 1:57:58 | 10.7 GB |
| MINTyper (ft) | No | 3:47:52 | 24.9 GB |
| MINTyper (1, ft) | Yes | 3:34:46 | 23.9 GB |
| MINTyper (3, nj, *) | Yes | 1:54:56 | 1.5 GB |
| MASH (5, nj) | No | 0:20:14–0:48:32 | 2.8 MB–2.3 GB |
| MASH (6, nj) | No | 0:24:06–4:12:51 | 0.3–11.6 GB |
| MASH (7, nj) | No | 0:44:26–5:03:50 | 1.3–29.6 GB |
| Illumina and high-accuracy Q10 ONT data | | | |
| MINTyper (2, nj) | Yes | 1:23:01 | 10.1 GB |
| MINTyper (2, iq) | Yes | 1:27:47 | 10.1 GB |
| MINTyper (2, ft) | Yes | 2:54:16 | 24.1 GB |
| MINTyper (4, nj, *) | Yes | 1:27:48 | 1.8 GB |
| MASH (5, nj) | No | 0:17:45–0:33:43 | 3.0 MB–1.2 GB |
| MASH (6, nj) | No | 0:22:16–2:40:07 | 0.3–5.9 GB |
| MASH (7, nj) | No | 0:41:15–3:28:45 | 1.3–16.3 GB |
1: Alignment trimming; core-genome SNPs with a minimum distance of 10 between called SNPs and DCM-methylation masking.
3: Alignment trimming; core-genome SNPs with a minimum distance of 10 between SNPs and DCM-methylation masking.
4: Alignment trimming; core-genome SNPs with a minimum distance of 10 between SNPs.
5: Sketch size of 1024 with minimum k-mer occurrence thresholds varying from 1 to 32.
6: Sketch size of 1048576 with minimum k-mer occurrence thresholds varying from 1 to 32.
7: Sketch size of 4194304 with minimum k-mer occurrence thresholds varying from 1 to 32.
nj: Neighbor-joining was used for tree construction.
iq: IQ-TREE was used for tree construction.
ft: FastTree was used for tree construction.
*: Ec01 from the ONT data was assembled with Unicycler, polished with Medaka, and used as reference.
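The two MASH parameters varied in the footnotes, sketch size and minimum k-mer occurrence, can be illustrated with a small bottom-k MinHash sketch. This is a simplified stand-in and not MASH's implementation: it uses `hashlib.blake2b` instead of MASH's hash function, ignores reverse-complement (canonical) k-mers, and all function names and the toy sequences are hypothetical.

```python
import hashlib
from collections import Counter


def kmer_hashes(seq: str, k: int, min_count: int = 1) -> set[int]:
    """Hash every k-mer that occurs at least `min_count` times."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {
        int.from_bytes(
            hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big"
        )
        for kmer, n in counts.items()
        if n >= min_count
    }


def bottom_sketch(hashes: set[int], size: int) -> list[int]:
    """Keep the `size` smallest hash values (bottom-k sketch)."""
    return sorted(hashes)[:size]


def jaccard_estimate(a: list[int], b: list[int], size: int) -> float:
    """Estimate Jaccard similarity from the smallest hashes of the union."""
    union = sorted(set(a) | set(b))[:size]
    if not union:
        return 0.0
    shared = set(a) & set(b)
    return sum(1 for h in union if h in shared) / len(union)


# Toy sequences (hypothetical): seq_b differs from seq_a at one base.
seq_a = "ACGTTGCATGCCGATAGCTAGCTAGGATCCGATCGATTGCA"
seq_b = "ACGTTGCATGCCGATAGCTAGATAGGATCCGATCGATTGCA"
size, k = 16, 7
sk_a = bottom_sketch(kmer_hashes(seq_a, k), size)
sk_b = bottom_sketch(kmer_hashes(seq_b, k), size)
same = jaccard_estimate(sk_a, sk_a, size)
similar = jaccard_estimate(sk_a, sk_b, size)
```

A larger sketch size keeps more hashes and gives a finer-grained estimate at a higher memory cost, while a higher minimum occurrence threshold discards rare k-mers, which in raw reads are more likely to be sequencing errors.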