Thomas Andreas Kohl, Christian Utpatel, Viola Schleusener, Maria Rosaria De Filippo, Patrick Beckert, Daniela Maria Cirillo, Stefan Niemann.
Abstract
Analyzing whole-genome sequencing data of Mycobacterium tuberculosis complex (MTBC) isolates in a standardized workflow enables both comprehensive antibiotic resistance profiling and outbreak surveillance at the highest resolution, down to the identification of recent transmission chains. Here, we present MTBseq, a bioinformatics pipeline for next-generation genome sequence data analysis of MTBC isolates. Employing a reference mapping-based workflow, MTBseq reports detected variant positions annotated with known associations to antibiotic resistance and performs a lineage classification based on phylogenetic single nucleotide polymorphisms (SNPs). When comparing multiple datasets, MTBseq provides a joint list of variants and a FASTA alignment of SNP positions for use in phylogenomic analysis, and identifies groups of related isolates. The pipeline is customizable and expandable, and can be used on a desktop computer or laptop without any internet connection, ensuring mobile usage and data security. MTBseq and accompanying documentation are available from https://github.com/ngs-fzb/MTBseq_source.
Keywords: Antibiotic resistance profiling; Automated analysis pipeline; Bacterial epidemiology; Bacterial genome analysis; Mycobacterium tuberculosis complex; Next-generation sequencing; Phylogeny; Whole-genome sequencing
Year: 2018 PMID: 30479891 PMCID: PMC6238766 DOI: 10.7717/peerj.5895
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Schematic representation of the MTBseq workflow.
Modules encapsulating specific functionality are shown in blue boxes.
Sensitivity and specificity for resistance prediction of different tools.
| Antibiotic | Sanger #R | Sanger #S | CASTB Sens | CASTB Spec | PhyResSE Sens | PhyResSE Spec | KvarQ Sens | KvarQ Spec | Mykrobe Predictor TB Sens | Mykrobe Predictor TB Spec | TBProfiler Sens | TBProfiler Spec | MTBseq Sens | MTBseq Spec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| INH | 28 | 63 | 89 (72, 98) | 100 (92, 100) | 100 (82, 100) | 98 (91, 100) | 86 (67, 96) | 100 (92, 100) | 89 (72, 98) | 100 (92, 100) | 93 (76, 99) | 84 (73, 92) | 100 (82, 100) | 98 (91, 100) |
| RMP | 18 | 73 | 94 (73, 100) | 100 (93, 100) | 100 (74, 100) | 99 (93, 100) | 94 (73, 100) | 100 (93, 100) | 100 (74, 100) | 99 (93, 100) | 100 (74, 100) | 99 (93, 100) | 100 (74, 100) | 100 (93, 100) |
| SM | 37 | 54 | 30 (16, 47) | 100 (90, 100) | 100 (86, 100) | 100 (90, 100) | 57 (39, 73) | 100 (90, 100) | 57 (39, 73) | 100 (90, 100) | 57 (39, 73) | 100 (90, 100) | 100 (86, 100) | 100 (90, 100) |
| EMB | 15 | 76 | 53 (27, 79) | 100 (93, 100) | 94 (70, 100) | 100 (93, 100) | 53 (27, 79) | 100 (93, 100) | 47 (21, 73) | 99 (93, 100) | 94 (70, 100) | 99 (93, 100) | 100 (71, 100) | 100 (93, 100) |
| PZA | 11 | 80 | 45 (17, 77) | 100 (93, 100) | 100 (62, 100) | 99 (93, 100) | 45 (17, 77) | 100 (93, 100) | n.a. | n.a. | 100 (62, 100) | 99 (93, 100) | 100 (62, 100) | 99 (93, 100) |
Notes:
Evaluation of resistance deduction from whole-genome sequence data by the programs CASTB, PhyResSE, KvarQ, Mykrobe Predictor TB, TBProfiler, and MTBseq. Sensitivity (Sens) and specificity (Spec) are given in percent, with 95% confidence intervals in parentheses, relative to Sanger sequencing results (#R, resistant; #S, sensitive).
INH, isoniazid; RMP, rifampicin; SM, streptomycin; EMB, ethambutol; PZA, pyrazinamide.
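As a reading aid for the table above, the following minimal Python sketch shows one common way to compute sensitivity or specificity with a 95% confidence interval. It assumes an exact (Clopper-Pearson) binomial interval. The source does not state which interval method was used, and the counts in the usage lines are hypothetical, so the output is not expected to reproduce the published values.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, iters=60):
    """Bisection root finder for a function g that increases on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial CI for k successes out of n trials.

    Assumption: Clopper-Pearson; the source does not name its CI method.
    """
    if k == 0:
        lower = 0.0
    else:
        # Lower bound p solves P(X >= k | p) = alpha / 2.
        lower = _solve_increasing(lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    if k == n:
        upper = 1.0
    else:
        # Upper bound p solves P(X <= k | p) = alpha / 2.
        upper = _solve_increasing(lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper


def proportion_with_ci(hits, total):
    """Return (estimate, lower, upper) as percentages.

    For sensitivity: hits = true positives, total = reference-resistant isolates.
    For specificity: hits = true negatives, total = reference-sensitive isolates.
    """
    lower, upper = clopper_pearson(hits, total)
    return 100 * hits / total, 100 * lower, 100 * upper


# Hypothetical counts, for illustration only.
sens = proportion_with_ci(hits=18, total=20)
print("Sens: %.0f (%.0f, %.0f)" % sens)
```

The exact interval is conservative for small samples, which suits per-drug comparisons where only a few dozen resistant isolates are available.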
Figure 2. Maximum likelihood phylogenetic tree.
Maximum likelihood phylogenetic tree constructed from the aligned set of SNP positions determined by MTBseq from a collection of 26 MTBC isolates suspected to form an outbreak (Kohl et al., 2014). For tree construction, we used FastTree version 2 (Price, Dehal & Arkin, 2010) in its double-precision build with a general time-reversible (GTR) substitution model, 1,000 resamples, and Gamma20 likelihood optimization. The resulting tree was visualized with the FigTree and EvolView (He et al., 2016) tools. d12_groups: Groups of clustered isolates were determined by MTBseq using single-linkage clustering with a maximum distance threshold of 12 SNPs; the detected groups are indicated by the colored sample labels. Support value: Resampling-based reliability values for splits are shown where they exceed 80%.
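The grouping described in this caption can be illustrated with a minimal Python sketch of single-linkage clustering under a distance threshold: two samples fall into the same group if they are connected by a chain of pairs that each differ by at most the threshold. This is not MTBseq's own code. The sample names and distances are hypothetical, and treating the 12-SNP threshold as inclusive is an assumption.

```python
from itertools import combinations


def single_linkage_groups(names, dist, threshold=12):
    """Group samples by single linkage.

    `dist` maps frozenset({a, b}) to the SNP distance between a and b.
    Assumption: a pair is linked when its distance is <= threshold.
    """
    parent = {n: n for n in names}

    def find(x):
        # Union-find lookup with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if dist[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical samples and pairwise SNP distances.
names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 3,
    frozenset(("B", "C")): 10,
    frozenset(("A", "C")): 13,  # above threshold, yet A and C are chained via B
    frozenset(("A", "D")): 40,
    frozenset(("B", "D")): 41,
    frozenset(("C", "D")): 45,
}
print(single_linkage_groups(names, dist))  # [['A', 'B', 'C'], ['D']]
```

Single linkage lets a group grow through intermediate isolates, so two members of one group may differ by more than the threshold, as A and C do here.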
Figure 3. Pairwise distance matrix.
Pairwise distance matrix calculated by MTBseq from a set of 26 MTBC isolates with identical traditional genotyping patterns suspected to form an outbreak (Kohl et al., 2014). The distance between samples is calculated from the detected variants and smaller distances indicate more closely related samples. Out of the 26 isolates, 22 have overall small pairwise distances indicative of a common cluster. The respective entries for the four remaining isolates are marked in blue (1024-01, 3929-10, 6631-04, 6821-03).
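A pairwise distance matrix of this kind can be sketched in Python as a count of differing positions between aligned SNP sequences. The caption only says that distances are calculated from the detected variants, so the plain Hamming distance used here is an assumption. The sample names and sequences are hypothetical, and ambiguous bases or gaps get no special treatment.

```python
def snp_distance(seq_a, seq_b):
    """Number of alignment positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)


def distance_matrix(alignment):
    """Return a nested dict: matrix[a][b] is the SNP distance between a and b."""
    names = list(alignment)
    return {
        a: {b: snp_distance(alignment[a], alignment[b]) for b in names}
        for a in names
    }


# Hypothetical SNP alignment (one character per variant position).
alignment = {
    "s1": "ACGTAC",
    "s2": "ACGTAT",
    "s3": "TTGAAC",
}
matrix = distance_matrix(alignment)
for name, row in matrix.items():
    print(name, [row[other] for other in alignment])
```

Small off-diagonal values mark closely related samples, while a row with uniformly large values marks a sample unlikely to belong to the same cluster.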