Rahim Rajwani, Sheeba Shehzad, Gilman Kit Hang Siu.
Abstract
BACKGROUND: Tuberculosis (TB) caused an estimated 1.7 million deaths in 2016. The disease is caused by members of the Mycobacterium tuberculosis complex, which includes Mycobacterium tuberculosis, Mycobacterium bovis and other closely related TB-causing organisms. To understand the epidemiological dynamics of TB, national TB control programs often conduct standardized genotyping at 24 Mycobacterial-Interspersed-Repetitive-Units (MIRU)-Variable-Number-of-Tandem-Repeats (VNTR) loci. With the advent of next-generation sequencing technology, whole-genome sequencing (WGS) has been widely used for studying TB transmission. However, open-source software that connects WGS and MIRU-VNTR typing is currently unavailable, which hinders interlaboratory communication. In this manuscript, we introduce the MIRU-profiler program, which can be used to predict the MIRU-VNTR profile from WGS of M. tuberculosis.
IMPLEMENTATION: The MIRU-profiler is implemented as a shell script and depends on the EMBOSS software. Its in-silico workflow is similar to those described in laboratory manuals for genotyping M. tuberculosis. Given an input genome sequence, the MIRU-profiler computes alleles at the standard 24 loci based on in-silico PCR amplicon lengths. The final output is a tab-delimited text file detailing the 24-loci MIRU-VNTR pattern of the input sequence.
VALIDATION: The MIRU-profiler was validated on four datasets: complete genomes from NCBI-GenBank (n = 11), complete genomes of locally isolated strains sequenced using PacBio (n = 4), complete genomes of BCG vaccine strains (n = 2) and draft genomes based on 250 bp paired-end Illumina reads (n = 106).
Keywords: MIRU-VNTR; Mycobacterium tuberculosis; TB transmission; Whole genome sequencing
Year: 2018 PMID: 30018852 PMCID: PMC6045920 DOI: 10.7717/peerj.5090
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. A schematic representation of the MIRU-profiler workflow in comparison with the experimental protocol for the standardized genotyping of M. tuberculosis using 24 MIRU-VNTR loci.
The similarity between the MIRU-profiler and the experimental workflow is illustrated at four stages: input, PCR, allele-calling and output. By simulating the experimental workflow in-silico, the MIRU-profiler is designed to infer the 24-loci MIRU-VNTR pattern from whole-genome sequences of M. tuberculosis. As indicated at stage 4 (output), the 24-digit numeric pattern inferred by the MIRU-profiler can be identical, and is directly comparable, to that obtained by a wet-laboratory experiment.
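The PCR and allele-calling stages can be illustrated with a minimal Python sketch. This is not the MIRU-profiler code (which is a shell script built on EMBOSS). The primers, the flank length and the repeat unit below are made-up values for a toy locus, and the formula copies = (amplicon length - flanking length) / repeat-unit length, together with the placeholder tokens "ND" (no product) and "?" (length does not fit a whole copy number), are assumptions made for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def amplicon_length(genome: str, fwd_primer: str, rev_primer: str):
    """Exact-match in-silico PCR on the forward strand; None if no product."""
    genome = genome.upper()
    start = genome.find(fwd_primer.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so search for its
    # reverse complement downstream of the forward primer.
    rev_site = reverse_complement(rev_primer)
    end = genome.find(rev_site, start + len(fwd_primer))
    if end == -1:
        return None
    return end + len(rev_site) - start


def call_allele(length, flank_len: int, repeat_len: int):
    """Convert an amplicon length into a repeat copy number (assumed formula)."""
    if length is None:
        return "ND"  # hypothetical token: locus not detected
    if length < flank_len:
        return "?"   # hypothetical token: unknown allele number
    copies, remainder = divmod(length - flank_len, repeat_len)
    return copies if remainder == 0 else "?"


# Toy locus: every sequence and size here is hypothetical.
FWD, REV = "ACGTAC", "CCGGAT"
REPEAT = "GATTACAGG"  # 9 bp repeat unit, present in 3 copies
genome = "TTTT" + FWD + "GG" + REPEAT * 3 + "CA" + reverse_complement(REV) + "AAAA"
FLANK = len(FWD) + 2 + 2 + len(REV)  # primers plus non-repeat flanks = 16 bp

length = amplicon_length(genome, FWD, REV)
allele = call_allele(length, FLANK, len(REPEAT))
print(length, allele)
```

A real workflow would tolerate primer mismatches and use per-locus allele tables rather than a single flank length; the sketch only shows how an amplicon length maps to a copy number.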
Comparisons among the MIRU-profiler, CASTB and experimental results.
| GenBank accession | MIRU-profiler result | CASTB result | Experimental result | Mismatched locus |
|---|---|---|---|---|
| Dataset 1: Genome assemblies downloaded from the NCBI database. | | | | |
| | 2-4-3-2-5-5-3-3-2-3-2-3-4-2-5-1-5-3-2-3-3-6-3-2 | 2-4-3-2-5-5-3-3-2-3-2-3-4-2-5-1-5-3-2-3-3-6-3-2 | 2-4-3-2-5-5-3-3-2-3-2-3-4-2-5-1-5-3-2-3-3-6-3-2 | None |
| | 2-4-4-2-3-3-3-5-2-6-4-4-4-2-5-1-7-3-3-5-3-7-2-3 | 2-4-4-2-3-3-3-5-2-6-4-4-4-2-5-1-7-3-3-5-3-7-2-3 | 2-4-4-2-3-3-3-5-2-6-4-4-4-2-5-1-7-3-3-5-3-7-2-3 | None |
| | 1-3-2-2-5-4-2-3-2-1-2-4-1-2-5-1-5-3-3-2- | 1-3-2-2-5-4-2-3-2-1-2-4-1-2-5-1-5-3-3-2- | 1-3-2-2-5-4-2-3-2-1-2-4-1-2-5-1-5-3-3-2- | Mtub39 |
| | 2-2-3-2-2-4-3-3-2-6-3-4-4-2-2-1-5-3-3-3-4-7-3-2 | 2-2-3-2-2-4-3-3-2-6-3-4-4-2-2-1-5-3-3-3-4-7-3-2 | 2-2-3-2-2-4-3-3-2-6-3-4-4-2-2-1-5-3-3-3-4-7-3-2 | None |
| | 2-2-3-2-3-4-3-3-1-4-3-2-4-2-3-1-5-3-3-3- | 2-2-3-2-3-4-3-3-1-4-3-2-4-2-3-1-5-3-3-3- | 2-2-3-2-3-4-3-3-1-4-3-2-4-2-3-1-5-3-3-3- | Mtub39, QUB26 |
| | 2-2-4-2-3-2-3-3-2-3-3-4-2-1-5-1-6-3-3-2-3- | 2-2-4-2-3-2-3-3-2-3-3-4-2-1-5-1-6-3-3-2-3- | 2-2-4-2-3-2-3-3-2-3-3-4-2-1-5-1-6-3-3-2-3- | QUB26 |
| | 2-2-4-2-1-3-3-5-2-9-4-4-4-2-5-1-7-1-3-4-3-8-2-3 | 2-2-4-2-1-3-3-5-2-9-4-4-4-2-5-1-7-1-3-4-3-8-2-3 | 2-2-4-2-1-3-3-5-2-9-4-4-4-2-5-1-7-1-3-4-3-8-2-3 | None |
| | 2-2-4-2-1-3-3-5-2-8-4- | 2-2-4-2-1-3-3-5-2-8-4- | 2-2-4-2-1-3-3-5-2-8- | Mtub29, Mtub30, QUB26 |
| | 2-2-4-2-1-3-3-5-2-9-4- | 2-2-4-2-1-3-3-5-2-9-4- | 2-2-4-2-1-3-3-5-2-9- | Mtub29 |
| | 2-2-4-2-1-3-3-5-2-9-4-2-2-2-5-1-7-1-3-4-3- | 2-2-4-2-1-3-3-5-2-9-4-2-2-2-5-1-7-1-3-4-3- | 2-2-4-2-1-3-3-5-2-9-4-2-2-2-5-1-7-1-3-4-3- | QUB26 |
| | 2-2-4-2-1-3-3-5-2-9-4-2-2-2-5-1-7-1-3-4-3-1-2-3 | 2-2-4-2-1-3-3-5-2-9-4-2-2-2-5-1-7-1-3-4-3-1-2-3 | 2-2-4-2-1-3-3-5-2-9-4-2-2-2-5-1-7-1-3-4-3-1-2-3 | None |
| Dataset 2: Complete genome sequences of four local strains isolated from Hong Kong. | | | | |
| | 2-4-4-2-3-3-3-2-2-5-4-4-4-2-5-1-6-3-3-5-3-7-2-3 | 2-4-4-2-3-3-3-2-2-5-4-4-4-2-5-1-6-3-3-5-3-7-2-3 | 2-4-4-2-3-3-3-2-2-5-4-4-4-2-5-1-6-3-3-5-3-7-2-3 | None |
| | 2-4-4-2-1-3-3-5-2-5-4-4-4-2-5-1-7-1-3-5-3-8-2-3 | 2-4-4-2-1-3-3-5-2-5-4-4-4-2-5-1-7-1-3-5-3-8-2-3 | 2-4-4-2-1-3-3-5-2-5-4-4-4-2-5-1-7-1-3-5-3-8-2-3 | None |
| | 2-2-4-2-2-3-3-5-2-5-4-4-4-2-5-1-7-3-3-4-5-8-2-3 | 2-2-4-2-2-3-3-5-2-5-4-4-4-2-5-1-7-3-3-4-5-8-2-3 | 2-2-4-2-2-3-3-5-2-5-4-4-4-2-5-1-7-3-3-4-5-8-2-3 | None |
| | 2-2-4-2-2-3-3-5-2-5-4-4-4-2-5-1-7-3-3-4-5-8-2-3 | 2-2-4-2-2-3-3-5-2-5-4-4-4-2-5-1-7-3-3-4-5-8-2-3 | 2-2-4-2-2-3-3-5-2-5-4-4-4-2-5-1-7-3-3-4-5-8-2-3 | None |
| Dataset 3: Complete genome sequences for BCG-vaccine strains | | | | |
| | 2-0-6-2s-2-2-3-1-2-3-5-2-2-5-4-2-5-3-3-3-2-5-0-2 | 2-0-6-2s-2-2-3-1-2-3-5-2-2-5-4-2-5-3-3-3-2-5-0-2 | 2-0-6-2s-2-2-3-1-2-3-5-2-2-5-4-2-5-3-3-3-2-5-0-2 | None |
| | 2-0-5-3s-2-2-3-1-2-3-5-2-2-5-4-2-5-3-3-3-2-4-0-2 | 2-0-5-3s-2-2-3-1-2-3-5-2-2-5-4-2-5-3-3-3-2-4-0-2 | 2-0-5-3s-2-2-3-1-2-3-5-2-2-5-4-2-5-3-3-3-2-4-0-2 | None |
Notes.
The numbers of repeats at the 24 MIRU-VNTR loci are presented as a 24-digit numeric pattern in which each digit represents the number of repeats at a particular locus, in the following order: MIRU02-Mtub04-ETRC-MIRU04-MIRU40-MIRU10-MIRU16-Mtub21-MIRU20-QUB11b-ETRA-Mtub29-Mtub30-ETRB-MIRU23-MIRU24-MIRU26-MIRU27-Mtub34-MIRU31-Mtub39-QUB26-QUB4156-MIRU39. The mismatched loci are bold and underlined.
Mismatched locus = any locus that does not agree between the MIRU-profiler and experimental results.
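The locus order and the definition of a mismatched locus given in these notes can be sketched in Python as follows. The function names are hypothetical and are not part of the MIRU-profiler. The first pattern is the complete one from the first row of Dataset 1; the second is a made-up variant in which QUB26 was changed from 6 to 7 purely to demonstrate the comparison. Alleles are kept as strings because some entries (for example "2s") are not plain integers.

```python
# Locus order as stated in the table notes.
LOCI = (
    "MIRU02-Mtub04-ETRC-MIRU04-MIRU40-MIRU10-MIRU16-Mtub21-MIRU20-QUB11b-"
    "ETRA-Mtub29-Mtub30-ETRB-MIRU23-MIRU24-MIRU26-MIRU27-Mtub34-MIRU31-"
    "Mtub39-QUB26-QUB4156-MIRU39"
).split("-")


def parse_profile(pattern: str) -> dict:
    """Map each of the 24 loci to its allele (kept as a string)."""
    alleles = pattern.strip().split("-")
    if len(alleles) != len(LOCI):
        raise ValueError(f"expected {len(LOCI)} alleles, got {len(alleles)}")
    return dict(zip(LOCI, alleles))


def mismatched_loci(predicted: str, experimental: str) -> list:
    """List the loci whose alleles differ between two 24-loci patterns."""
    pred = parse_profile(predicted)
    expt = parse_profile(experimental)
    return [locus for locus in LOCI if pred[locus] != expt[locus]]


predicted = "2-4-3-2-5-5-3-3-2-3-2-3-4-2-5-1-5-3-2-3-3-6-3-2"

# Hypothetical experimental result: QUB26 (22nd locus) altered for the demo.
tokens = predicted.split("-")
tokens[LOCI.index("QUB26")] = "7"
altered = "-".join(tokens)

print(mismatched_loci(predicted, predicted))  # identical patterns
print(mismatched_loci(predicted, altered))    # differs only at QUB26
```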
Figure 2. Evaluation of the MIRU-profiler on genome assemblies based on Illumina MiSeq reads.
(A) A boxplot summarizing the evaluation results, showing the number of matched, mismatched and indeterminate loci (unknown allele number or not detected) per sample. (B) Percentage agreement between the MIRU-profiler and experimental results for each locus. (C) The effect of average sequencing depth on the number of loci that could be detected by the MIRU-profiler. The analysis was performed by downsampling reads from one of the samples to 5×, 10×, 15×, 20×, 25× and 30×.
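The per-sample counts in panel (A) and the per-locus agreement in panel (B) could be tallied along the following lines. This is a sketch under stated assumptions, not the authors' analysis: the tokens "ND" and "?" for indeterminate loci are hypothetical, the denominator for percentage agreement is assumed to be all samples, and the two four-locus profiles are toy data rather than results from the paper.

```python
from collections import Counter

INDETERMINATE = {"ND", "?"}  # assumed placeholder tokens, not from the paper


def classify(predicted: str, experimental: str) -> Counter:
    """Count matched, mismatched and indeterminate loci for one sample."""
    counts = Counter(matched=0, mismatched=0, indeterminate=0)
    for pred, expt in zip(predicted.split("-"), experimental.split("-")):
        if pred in INDETERMINATE:
            counts["indeterminate"] += 1
        elif pred == expt:
            counts["matched"] += 1
        else:
            counts["mismatched"] += 1
    return counts


def percent_agreement(pairs: list) -> list:
    """Per-locus percentage of samples where prediction equals experiment."""
    n_loci = len(pairs[0][0].split("-"))
    agree = [0] * n_loci
    for predicted, experimental in pairs:
        for i, (pred, expt) in enumerate(
            zip(predicted.split("-"), experimental.split("-"))
        ):
            if pred == expt:
                agree[i] += 1
    return [100.0 * hits / len(pairs) for hits in agree]


# Toy data: (predicted, experimental) patterns for two samples, four loci each.
pairs = [
    ("2-4-3-2", "2-4-3-2"),
    ("2-ND-3-2", "2-4-3-1"),
]

per_sample = [classify(pred, expt) for pred, expt in pairs]
per_locus = percent_agreement(pairs)
print(per_sample)
print(per_locus)
```

Counting an indeterminate locus as a disagreement in the per-locus percentage is a design choice of this sketch; excluding such loci from the denominator would be an equally defensible convention.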