Philip T L C Clausen, Frank M Aarestrup, Ole Lund.
Abstract
BACKGROUND: As the cost of sequencing has declined, clinical diagnostics based on next-generation sequencing (NGS) have become a reality. Such diagnostics require rapid and precise mapping against redundant databases, because some of the most important determinants, such as antimicrobial resistance and core genome multilocus sequence typing (MLST) alleles, are highly similar to one another. To facilitate this, a novel mapping method, KMA (k-mer alignment), was designed. KMA maps raw reads directly against redundant databases and scales well as those databases grow. It uses k-mer seeding to speed up mapping and the Needleman-Wunsch algorithm to accurately align extensions from k-mer seeds. Multi-mapping reads are resolved using a novel sorting scheme (the ConClave scheme), which ensures accurate template selection.
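The k-mer seeding idea mentioned in the abstract can be illustrated with a minimal sketch. This is not KMA's implementation: the function names, the value `k=4`, and the two toy templates are assumptions chosen only to show how near-identical templates in a redundant database can be told apart by counting shared k-mers.

```python
from collections import defaultdict


def build_kmer_index(templates, k):
    """Map every k-mer to the set of (template name, position) pairs where it occurs."""
    index = defaultdict(set)
    for name, seq in templates.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add((name, i))
    return index


def seed_hits(read, index, k):
    """Count, per template, how many k-mers of the read also occur in that template."""
    hits = defaultdict(int)
    for i in range(len(read) - k + 1):
        # Each read k-mer counts at most once per template.
        templates_seen = {name for name, _ in index.get(read[i:i + k], ())}
        for name in templates_seen:
            hits[name] += 1
    return dict(hits)


# Hypothetical toy database: two templates that differ by a single base.
templates = {
    "geneA": "ACGTACGTTAGC",
    "geneB": "ACGTACGATAGC",
}
index = build_kmer_index(templates, k=4)
hits = seed_hits("ACGTTAGC", index, k=4)
print(hits)
```

In this toy case the read shares more k-mers with `geneA` than with `geneB`, so `geneA` would be the stronger seed candidate. A real mapper would go on to extend and align the seeds rather than rely on counts alone.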
Year: 2018 PMID: 30157759 PMCID: PMC6116485 DOI: 10.1186/s12859-018-2336-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Overview of steps 1–4 of the KMA algorithm. 1: Trim reads. 2: Match k-mers between query and database. 3a: Extend matching k-mer seeds and identify regions with mismatches. 3b: Use the Needleman-Wunsch algorithm to align regions of mismatching k-mers. 4: Use ConClave scoring to choose the single best-aligning template per query sequence.
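Step 3b relies on the Needleman-Wunsch algorithm. The following is a minimal textbook sketch of global alignment with a linear gap penalty. The scoring values (`match=1`, `mismatch=-1`, `gap=-1`) are assumptions for illustration; the scoring scheme KMA actually uses is not given here.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic programming matrix.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # match or mismatch
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


nw_score, aligned_a, aligned_b = needleman_wunsch("ACGT", "AGT")
print(nw_score, aligned_a, aligned_b)
```

Because the full matrix costs time and memory proportional to the product of the two lengths, restricting it to the short mismatching regions between seeds, as the figure describes, keeps the alignment step cheap.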
Performance of KMA, SRST2, MGmapper, BWA-MEM, Bowtie2, Minimap2 and Salmon, on simulated data generated from the ResFinder database. A minimum mapping quality of 1 was used to ensure reproducibility
| Method | Single end MCC | Single end sensitivity | Single end PPV | Paired end MCC | Paired end sensitivity | Paired end PPV |
|---|---|---|---|---|---|---|
| KMA | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| SRST2 | 0.591 | 0.999 | 0.350 | 0.659 | 0.999 | 0.436 |
| MGmapper | 0.676 | 0.457 | 1.000 | 0.443 | 0.196 | 1.000 |
| BWA-MEM | 0.585 | 0.342 | 1.000 | 0.580 | 0.337 | 1.000 |
| Bowtie2 | 0.480 | 0.480 | 0.480 | 0.577 | 0.577 | 0.577 |
| Minimap2 | 0.591 | 0.353 | 0.988 | 0.671 | 0.455 | 0.991 |
| BWA-MEM / Salmon | 0.720 | 0.720 | 0.720 | 0.500 | 0.353 | 0.707 |
| Bowtie2 / Salmon | 0.390 | 0.389 | 0.398 | 0.250 | 0.177 | 0.368 |
MCC: Matthews correlation coefficient; PPV: positive predictive value.
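For readers unfamiliar with the three measures in the table, the sketch below computes them from confusion-matrix counts using their standard definitions. The function name and the example counts are hypothetical and are not taken from the study; returning 0 when a denominator is zero is a common convention assumed here.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return sensitivity, PPV and MCC from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is reported as 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "ppv": ppv, "mcc": mcc}


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(metrics)
```

A method that reports every true gene but also many false ones has high sensitivity and low PPV, while a method that reports only a few genes, all of them correct, has the opposite profile. MCC combines all four counts into a single value.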
Fig. 2 Distribution of false positives (FP) and false negatives (FN) for KMA, SRST2, MGmapper, Bowtie2, BWA-MEM and Minimap2, and for Bowtie2 and BWA-MEM post-processed with Salmon, when mapping simulated reads from the ResFinder database back to the ResFinder database. A minimum mapping quality of 1 was used to ensure reproducibility.
Performance measures of KMA, SRST2, MGmapper, BWA-MEM, Bowtie2, Minimap2 and Salmon for predicting genes directly from raw reads. The thresholds for predicting a gene were set to 90% coverage, 90% identity and a minimum depth of 5. A minimum mapping quality of 10 was used for methods relying on post-processing with SAMtools and BEDTools, as this gave the best performance across the tested thresholds.
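The gene-prediction thresholds stated above can be expressed as a simple filter. This is a sketch under assumptions: the `TemplateHit` record, its field names, the example hits, and the choice to treat the thresholds as inclusive are all hypothetical, since the caption does not specify them.

```python
from dataclasses import dataclass


@dataclass
class TemplateHit:
    name: str
    coverage: float  # percent of the template covered by reads
    identity: float  # percent identity of the consensus to the template
    depth: float     # mean read depth over the template


def predict_genes(hits, min_coverage=90.0, min_identity=90.0, min_depth=5.0):
    """Return the names of hits that pass all three thresholds (assumed inclusive)."""
    return [
        h.name
        for h in hits
        if h.coverage >= min_coverage
        and h.identity >= min_identity
        and h.depth >= min_depth
    ]


# Hypothetical hits, for illustration only.
example_hits = [
    TemplateHit("gene_1", coverage=100.0, identity=99.8, depth=32.0),
    TemplateHit("gene_2", coverage=95.0, identity=88.0, depth=40.0),  # fails identity
    TemplateHit("gene_3", coverage=100.0, identity=100.0, depth=3.0),  # fails depth
]
predicted = predict_genes(example_hits)
print(predicted)
```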
| Mapping method | Post-processing method | Avg. mapping CPU time | Avg. post-processing CPU time | Peak memory | MCC |
|---|---|---|---|---|---|
| Predicting antimicrobial resistance | |||||
| KMA | NA | 00:00:24.6 | NA | 42.3 MB | 1.000 |
| SRST2 | NA | 00:10:21.3 | NA | 165.0 MB | 1.000 |
| MGmapper (a) | NA | 00:13:14.2 | NA | 101.4 MB | 0.288 |
| BWA-MEM | SAMtools / BEDTools | 00:07:35.5 | 00:00:06.1 | 113.0 MB | 0.000 |
| BWA-MEM (b) | Salmon | 00:07:34.5 | 00:00:12.2 | 694.9 MB | 0.828 |
| Bowtie2 | SAMtools / BEDTools | 00:02:35.5 | 00:00:06.7 | 33.7 MB | 0.000 |
| Bowtie2 (b) | Salmon | 00:03:16.4 | 00:02:24.5 | 935.8 MB | 0.623 |
| Minimap2 | SAMtools / BEDTools | 00:02:18.6 | 00:00:06.0 | 517.3 MB | 0.000 |
| Mapping towards cgMLST alleles | |||||
| KMA | NA | 00:07:02.1 | NA | 8.3 GB | 0.998 |
| SRST2 | NA | > 99:99:99.9 | NA | NA | NA |
| MGmapper (a) | NA | 01:23:21.5 | NA | 8.7 GB | 0.062 |
| BWA-MEM | SAMtools / BEDTools | 02:14:50.8 | 00:14:06.7 | 8.9 GB | 0.021 |
| BWA-MEM (b) | Salmon | 03:22:45.4 | 04:41:09.6 | 104.2 GB (P) | 0.530 |
| Bowtie2 | SAMtools / BEDTools | 01:50:56.8 | 00:15:23.5 | 4.1 GB | 0.035 |
| Bowtie2 (b) | Salmon | > 99:99:99.9 | NA | NA | NA |
| Minimap2 | SAMtools / BEDTools | 01:20:56.2 | 00:13:11.7 | 33.6 GB | 0.035 |
(a) MGmapper was executed on the forward reads only, as paired end mode crashed.
(b) Configured to report all alignments.
(P) The post-processing method was responsible for the peak memory consumption.
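To compare the CPU times in the table numerically, the `HH:MM:SS.s` strings can be converted to seconds. The helper below is a hypothetical sketch; treating entries that start with `>` as having no measured time is an assumption about what `> 99:99:99.9` denotes.

```python
def cpu_time_to_seconds(stamp):
    """Convert 'HH:MM:SS.s' to seconds; return None for '>' entries (assumed unmeasured)."""
    stamp = stamp.strip()
    if stamp.startswith(">"):
        return None
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Average mapping CPU times for antimicrobial resistance prediction, from the table.
kma_seconds = cpu_time_to_seconds("00:00:24.6")
srst2_seconds = cpu_time_to_seconds("00:10:21.3")
ratio = srst2_seconds / kma_seconds
print(f"SRST2 used about {ratio:.1f} times the mapping CPU time of KMA")
```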