Ivan Sserwadda1,2, Gerald Mboowa3,2.
Abstract
The recent re-emergence of multidrug-resistant pathogens has exacerbated their threat to worldwide public health. The evolution of the genomics era has led to the generation of huge volumes of sequencing data at an unprecedented rate due to the ever-reducing costs of whole-genome sequencing (WGS). We have developed the Rapid Microbial Analysis Pipeline (rMAP), a user-friendly pipeline capable of profiling the resistomes of ESKAPE pathogens (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa and Enterobacter species) using WGS data generated from Illumina's sequencing platforms. rMAP is designed for individuals with little bioinformatics expertise, and automates the steps required for WGS analysis directly from the raw genomic sequence data, including adapter and low-quality sequence read trimming, de novo genome assembly, genome annotation, single-nucleotide polymorphism (SNP) variant calling, phylogenetic inference by maximum likelihood, antimicrobial resistance (AMR) profiling, plasmid profiling, virulence factor determination, multi-locus sequence typing (MLST), pangenome analysis and insertion sequence (IS) characterization. Once the analysis is finished, rMAP generates an interactive web-like HTML report. rMAP is simple to install and can be run with a few simple commands. It represents a rapid and easy way to perform comprehensive bacterial WGS analysis using a personal laptop in low-income settings where high-performance computing infrastructure is limited.
Keywords: ESKAPE; command line; pipeline; rMAP; rapid microbial analysis; whole-genome sequencing
Year: 2021 PMID: 34110280 PMCID: PMC8461470 DOI: 10.1099/mgen.0.000583
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
Comprehensive list of third-party tools and algorithms used in rMAP
| Software | Version | Summary |
|---|---|---|
| Abricate | 1.0.1 | Detection of antimicrobial resistance genes, plasmids and virulence factors |
| AMRfinder | 3.8.4 | Detection of antimicrobial resistance genes from assembled contigs |
| Any2fasta | 0.4.2 | Converts any genomic data format to fasta format |
| Assembly-stats | 1.0.1 | Summarizes quality assembly metrics from contigs |
| Biopython.convert | 1.0.3 | Conversion and manipulation of different genomic data formats |
| BMGE | 1.12 | Block mapping and gathering with entropy for removal of ambiguously aligned regions from multiple sequence alignments |
| BWA | 0.7.17 | Burrows–Wheeler algorithm for fast alignment of short sequence reads |
| Cairosvg | 2.4.2 | Converts SVG to PDF and PNG formats |
| Fastqc | 0.11.9 | Quality control and visualization of HTS data |
| Fasttree | 2.1.10 | Ultra-fast inference of phylogeny using the maximum-likelihood method |
| Freebayes | 1.3.2 | Bayesian-based haplotype prediction of nucleotide variants |
| ISMapper | 2.0.1 | Detection of insertion sequences within genomes |
| IQtree | 2.0.3 | Inference of phylogeny using the maximum-likelihood method |
| Kleborate | 1.0.0 | Screening for AMR genes and MLSTs from genome assemblies |
| Lxml | 4.5.2 | Parsing of XML and HTML using Python |
| Mafft | 7.471 | Algorithm for performing multiple sequence alignments |
| Multiqc | 1.9 | Aggregates numerous HTML quality reports into a single file |
| Megahit | 1.2.9 | Ultra-fast genome assembly algorithm |
| Mlst | 2.19.0 | Characterization and detection of clones within a population of pathogenic isolates |
| Nextflow | 20.07.1 | Portable next-generation workflow language that enables reproducibility and development of pipelines |
| Parallel | 20200722 | Executes jobs in parallel |
| Prinseq | 0.20.4 | Trims, filters and reformats genomic sequence data |
| Prodigal | 2.6.3 | Prediction of protein-coding genes in prokaryotic genomes |
| Prokka | 1.14.6 | Fast and efficient annotation of prokaryotic assembled genomes |
| Quast | 5.0.2 | Quality assembly assessment tool |
| Roary | 3.13.0 | Large-scale pangenome analysis |
| R-base | 4.0.2 | Statistical data computing and graphical software |
| Samclip | 0.4.0 | Filters SAM file for soft and hard clipped alignments |
| Samtools | 1.9 | Tools for manipulation of next-generation sequence data |
| Shovill | 1.0.9 | Illumina short-read assembler for bacterial genomes |
| Snippy | 4.3.6 | Rapid haploid bacterial variant caller |
| Snpeff | 4.5covid19 | Functional effect and variant predictor suite |
| SRA-tools | 2.10.8 | Toolbox for acquisition and manipulation of sequences from the NCBI |
| Trimmomatic | 0.39 | Illumina short-read adapter trimming algorithm |
| Unicycler | 0.4.8 | A hybrid assembly pipeline for Illumina and long-read sequence data |
| Vt | 2015.11.10 | A tool for normalizing variants in genomic sequence data |
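Because the pipeline depends on many third-party programs, it can help to confirm which of them are visible on the `PATH` before starting a run. The following is a minimal sketch, not part of rMAP itself. The executable names in `ASSUMED_EXECUTABLES` are assumptions based on the usual command names of a subset of the listed tools, and `check_tools` is a hypothetical helper.

```python
import shutil
from typing import Dict, Iterable

# Assumption: these are the command names under which a subset of the
# listed tools is usually installed; adjust them for your environment.
ASSUMED_EXECUTABLES = (
    "abricate", "bwa", "fastqc", "freebayes", "mafft", "megahit",
    "mlst", "multiqc", "prokka", "roary", "samtools", "shovill", "snippy",
)


def check_tools(names: Iterable[str] = ASSUMED_EXECUTABLES) -> Dict[str, bool]:
    """Map each executable name to True if it can be found on the PATH."""
    return {name: shutil.which(name) is not None for name in names}


if __name__ == "__main__":
    for tool, found in sorted(check_tools().items()):
        print(f"{tool:<12} {'found' if found else 'MISSING'}")
```

The check only reports whether a command is present. It does not verify that the installed version matches the one listed in the table.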
Fig. 1. Schematic graphical representation of the rMAP pipeline workflow and associated tools.
ESKAPE group insertion sequence families (both Gram-positive and Gram-negative) used by rMAP
| Sequence name | Determinant genes | Conferred resistance |
|---|---|---|
| IS903 | | Kanamycin |
| ISApl1 | | Colistin |
| ISEc69 | | Colistin |
| ISAba14 | | Kanamycin |
| ISAba1 | | Carbapenems, beta-lactams |
| IS16 | | Vancomycin |
| IS256 | | Phenicols, lincosamides, oxazolidinones, pleuromutilins, streptogramin A |
| IS257-2 | | Kanamycin, bleomycin, fosfomycin, fusidic acid, tetracycline, gentamicin, streptogramin A, trimethoprim |
| IS1182 | | Streptomycin, kanamycin, neomycin, streptothricin |
| IS1216 | | Phenicols, lincosamides, oxazolidinones, pleuromutilins, streptomycin, streptogramin A |
| IS1272 | | Methicillin |
| IS1182 | | Aminoglycoside |
| ISEnfa4 | | Phenicols, lincosamides, oxazolidinones, pleuromutilins, streptogramin A |
| ISEcp1 | | Cefotaxime, ceftriaxone, aztreonam |
| ISSau1 | | Methicillin |
| ISKpn23 | | Carbapenems, cephalosporins, monobactams |
Summary of some stages of intermediate files generated from rMAP
| Analysis | Metrics | Description |
|---|---|---|
| Assembly | Genome length, average genome length, N50, GC content and sequencing depth | Genome length: an estimate of the draft genome assembly length; Average genome length: average read length of genomes; N50: length of smallest contig covering 50 % of genome; GC content: guanine–cytosine content of draft genome; Depth: no. of times each nucleotide position in the draft genome has a read that aligns to that position |
| Phylogeny | | SNPs are used to infer phylogenetic relationships between samples |
| Variant calling | SNPs | SNP: a single-nucleotide base change from the reference genome that occurs anywhere within the genome |
| Antimicrobial resistance profiling | Contig, gene, identity, product | Contig: continuous consensus nucleotide sequences without gaps; Gene: antibiotic resistance gene identified within the assembly; Identity: percentage representing exact nucleotide matches; Product: artefact produced from antibiotic resistance gene |
| Pangenome analysis | Core genes, soft core genes, shell genes, cloud genes | The genes are compared against each other across samples to predict genome plasticity and to detect how much of the accessory genome has been taken up by organisms over the course of time |
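The N50 and GC content definitions in the table above can be made concrete with a short calculation. This is an illustrative sketch only and does not reproduce how rMAP or its underlying tools compute these metrics. The function names `n50` and `gc_content` and the toy contigs are hypothetical.

```python
from typing import Dict, List


def n50(contig_lengths: List[int]) -> int:
    """Length of the shortest contig in the smallest set of longest
    contigs that together cover at least 50 % of the assembly."""
    if not contig_lengths:
        raise ValueError("no contigs supplied")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig downwards until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def gc_content(contigs: Dict[str, str]) -> float:
    """GC percentage over all contigs, ignoring characters other than A/C/G/T."""
    gc = at = 0
    for seq in contigs.values():
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("no A/C/G/T bases found")
    return 100.0 * gc / (gc + at)


# Toy assembly (made-up sequences) with contig lengths 600, 200 and 100 bp.
toy_assembly = {
    "contig_1": "ATGCGC" * 100,
    "contig_2": "ATAT" * 50,
    "contig_3": "GGCC" * 25,
}

if __name__ == "__main__":
    lengths = [len(s) for s in toy_assembly.values()]
    print("Genome length:", sum(lengths))
    print("N50:", n50(lengths))
    print(f"GC content: {gc_content(toy_assembly):.2f} %")
```

In the toy assembly the longest contig alone covers more than half of the 900 bp total, so the N50 equals that contig's length of 600 bp.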
Fig. 2. Selected interactive species HTML reports. (a) Genome assembly summary statistics for the different species isolates. These include common key metrics for checking assembly quality. (b) Table of multi-locus sequence typing (MLST) distribution. (c) SNP-based approximately maximum-likelihood phylogenetic tree. Three different formats are available, i.e. circular (phylogram), circular (cladogram) and rectangular (phylogram). An approximately maximum-likelihood phylogenetic tree is computed based on SNPs detected via read mapping against a reference genome and stored in a standard Newick file format. (d) Pangenome analysis including a schematic representation of gene presence (colour) or absence (blank) between samples. (e) Antibiotic resistance profile. Presence/absence of antibiotic resistance genes (coverage and identity >90 %) for each sample. An antibiotic resistance profile is computed based on Resfinder, CARD, ARG-ANNOT, NCBI and MEGARES annotations for each isolate and transformed into an overview that allows a rapid resistome comparison of all analysed isolates.
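The presence/absence overview described for panel (e) can be illustrated with a small filter that keeps only hits above the stated coverage and identity thresholds. This is a sketch under assumptions, not rMAP's implementation. The `Hit` record, its field names, the `presence_absence` helper and the example hits are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set

# Thresholds taken from the caption: coverage and identity above 90 %.
MIN_COVERAGE = 90.0
MIN_IDENTITY = 90.0


@dataclass(frozen=True)
class Hit:
    """One resistance-gene hit for one sample (hypothetical record layout)."""
    sample: str
    gene: str
    coverage: float  # percentage of the reference gene covered
    identity: float  # percentage of exact nucleotide matches


def presence_absence(
    hits: Iterable[Hit],
    min_coverage: float = MIN_COVERAGE,
    min_identity: float = MIN_IDENTITY,
) -> Dict[str, Dict[str, int]]:
    """Build a sample-by-gene matrix: 1 if the gene passes both thresholds."""
    kept: Dict[str, Set[str]] = {}
    for hit in hits:
        # Register every sample, even if none of its hits pass the filter.
        genes = kept.setdefault(hit.sample, set())
        if hit.coverage > min_coverage and hit.identity > min_identity:
            genes.add(hit.gene)
    all_genes = sorted(set().union(*kept.values())) if kept else []
    return {
        sample: {gene: int(gene in found) for gene in all_genes}
        for sample, found in kept.items()
    }


# Made-up example hits for two isolates.
example_hits = [
    Hit("isolate_A", "blaKPC-2", 100.0, 99.8),
    Hit("isolate_A", "tet(A)", 85.0, 99.0),  # dropped: coverage too low
    Hit("isolate_B", "tet(A)", 100.0, 100.0),
]

if __name__ == "__main__":
    for sample, row in presence_absence(example_hits).items():
        print(sample, row)
```

Samples whose hits all fail the filter still appear in the matrix with zeros, which keeps the comparison across isolates complete.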
rMAP’s wall clock runtimes for different bacterial species across different operating system platforms
| Genomes | Genome size | Ubuntu | macOS Mojave |
|---|---|---|---|
| 15 | ~2.9 Mbp | 22 h | 18 h |
| 9 | ~3.9 Mbp | 22 h | 19 h |
| 14 | ~2.9 Mbp | 21 h | 17 h |
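To compare the rows above on a common scale, the wall clock time can be divided by the number of genomes in each batch. The sketch below does this arithmetic with the values from the table. It assumes that the "Genomes" column is the number of genomes analysed in one run, and the resulting figures are simple averages rather than measured per-genome runtimes. The names `Run` and `hours_per_genome` are hypothetical.

```python
from typing import List, NamedTuple


class Run(NamedTuple):
    genomes: int            # assumed: number of genomes in the batch
    genome_size_mbp: float  # approximate genome size in Mbp
    ubuntu_h: float         # wall clock runtime on Ubuntu, in hours
    macos_h: float          # wall clock runtime on macOS Mojave, in hours


# Values copied from the table above.
RUNS: List[Run] = [
    Run(15, 2.9, 22, 18),
    Run(9, 3.9, 22, 19),
    Run(14, 2.9, 21, 17),
]


def hours_per_genome(total_hours: float, genomes: int) -> float:
    """Average wall clock hours per genome for one batch."""
    if genomes <= 0:
        raise ValueError("genomes must be positive")
    return total_hours / genomes


if __name__ == "__main__":
    for run in RUNS:
        print(
            f"{run.genomes:>2} genomes (~{run.genome_size_mbp} Mbp): "
            f"Ubuntu {hours_per_genome(run.ubuntu_h, run.genomes):.2f} h/genome, "
            f"macOS {hours_per_genome(run.macos_h, run.genomes):.2f} h/genome"
        )
```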