Rana Jajou, Thomas A Kohl, Timothy Walker, Anders Norman, Daniela Maria Cirillo, Elisa Tagliani, Stefan Niemann, Albert de Neeling, Troels Lillebaek, Richard M Anthony, Dick van Soolingen.
Abstract
Background: Whole genome sequencing (WGS) is a reliable tool for studying tuberculosis (TB) transmission. WGS data are usually processed by custom-built analysis pipelines with little standardisation between them.

Aim: To compare the impact of variability of several WGS analysis pipelines used internationally to detect epidemiologically linked TB cases.

Methods: From the Netherlands, 535 Mycobacterium tuberculosis complex (MTBC) strains from 2016 were included. Epidemiological information obtained from municipal health services was available for all mycobacterial interspersed repeat unit-variable number of tandem repeat (MIRU-VNTR) clustered cases. WGS data were analysed using five different pipelines: one core genome multilocus sequence typing (cgMLST) approach and four single nucleotide polymorphism (SNP)-based pipelines developed in Oxford, United Kingdom; Borstel, Germany; Bilthoven, the Netherlands; and Copenhagen, Denmark. WGS clusters were defined using a maximum pairwise distance of 12 SNPs/alleles.

Results: The cgMLST approach and the Oxford pipeline clustered all epidemiologically linked cases. In the other three SNP-based pipelines, one epidemiological link was missed due to insufficient coverage. In general, the genetic distances varied between pipelines, reflecting different clustering rates: the cgMLST approach clustered 92 cases, followed by 84, 83, 83 and 82 cases in the SNP-based pipelines from Copenhagen, Oxford, Borstel and Bilthoven, respectively.

Conclusion: Concordance in ruling out epidemiological links was high between pipelines, which is an important step in the international validation of WGS data analysis. To increase accuracy in identifying TB transmission clusters, standardisation of crucial WGS criteria and creation of a reference database of representative MTBC sequences would be advisable.
Keywords: TB; Whole genome sequencing; analysis pipelines; epidemiology; international; tuberculosis
Year: 2019 PMID: 31847944 PMCID: PMC6918587 DOI: 10.2807/1560-7917.ES.2019.24.50.1900130
Source DB: PubMed Journal: Euro Surveill ISSN: 1025-496X
Summary of whole genome sequencing pipeline settings applied for each SNP pipeline
| Settings | RIVM SNP | Oxford University SNP | Research Center Borstel (MTBseq) SNP | SSI SNP |
|---|---|---|---|---|
| H37Rv reference genome version | 3 | 2 | 3 | 3 |
| Alignment software | Bowtie | Stampy | BWA | BWA |
| SNP calling software | Breseq | Samtools | Samtools | Samtools |
| Minimum mean sample coverage depth | ≥ 20x | NA | ≥ 30x | ≥ 20x |
| Minimum sample coverage breadth | NA | > 88% | ≥ 80% fulfilling thresholds for variant detection | ≥ 95% |
| Genomic regions excluded | Repeats | Repeats | Repeats, resistance genes | Repeats |
| Minimum coverage depth to support a SNP | NA | 5x | 8x | 8x |
| Excluding SNPs within 12 bp of each other | Yes | No | Yes | Yes |
| Allele frequency | ≥ 80% | ≥ 90% | ≥ 75% | ≥ 85% |
| Dealing with low coverage positions or positions not meeting variant call criteria when calculating the genetic distance | Report reference base | Report consensus base | Report consensus base or exclude position if data quality is below thresholds in >5% of samples | Complement with data from aligned reads if coverage is > 5x or exclude position if data quality is below threshold |
BWA: Burrows-Wheeler Alignment; MTB: Mycobacterium tuberculosis; NA: not applicable; RIVM: National Institute for Public Health and the Environment; SNP: single nucleotide polymorphism; SSI: Statens Serum Institut.
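The effect of the differing SNP-support thresholds in the table can be illustrated with a minimal sketch. It encodes only the "Minimum coverage depth to support a SNP" and "Allele frequency" rows; the names `SnpFilter`, `PIPELINE_FILTERS`, `passes` and `calls_by_pipeline` are hypothetical, and treating the depth threshold as the read depth at the position is a simplifying assumption, not a description of how any of the pipelines is implemented.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SnpFilter:
    """Hypothetical container for two SNP-support thresholds from the settings table."""
    min_depth: Optional[int]   # None where the table lists NA
    min_allele_freq: float     # fraction of reads supporting the variant


# Thresholds copied from the settings table; all other settings are ignored here.
PIPELINE_FILTERS: Dict[str, SnpFilter] = {
    "RIVM": SnpFilter(min_depth=None, min_allele_freq=0.80),
    "Oxford": SnpFilter(min_depth=5, min_allele_freq=0.90),
    "Borstel": SnpFilter(min_depth=8, min_allele_freq=0.75),
    "SSI": SnpFilter(min_depth=8, min_allele_freq=0.85),
}


def passes(depth: int, allele_freq: float, f: SnpFilter) -> bool:
    """Return True if a candidate SNP meets both thresholds (assumed semantics)."""
    if f.min_depth is not None and depth < f.min_depth:
        return False
    return allele_freq >= f.min_allele_freq


def calls_by_pipeline(depth: int, allele_freq: float) -> Dict[str, bool]:
    """Apply every pipeline's thresholds to the same candidate SNP."""
    return {name: passes(depth, allele_freq, f) for name, f in PIPELINE_FILTERS.items()}


# A made-up candidate site: 6x depth, 88% of reads carrying the variant.
result = calls_by_pipeline(depth=6, allele_freq=0.88)
```

Under these assumptions, the same made-up site would be accepted by one set of thresholds and rejected by the other three, which is one way pairwise distances can diverge between pipelines.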
Figure 1. Clustering of cases by WGS in analysed samples using five distinct international WGS data analysis pipelines (n = 535)
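How a 12 SNP/allele cut-off turns pairwise distances into clusters can be sketched as follows. The sketch assumes single-linkage clustering (an isolate joins a cluster if it is within the threshold of at least one member); the source states only that a maximum pairwise distance of 12 was used, so the linkage rule is an assumption. The function name `single_linkage_clusters` is hypothetical, and the input is a small subset of RIVM SNP distances taken from the table of pairs clustered by WGS only.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def single_linkage_clusters(
    distances: Dict[Tuple[str, str], int], threshold: int = 12
) -> List[Set[str]]:
    """Group isolates linked by chains of pairwise distances <= threshold."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Union-find lookup with path halving; registers unseen isolates.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        root_a, root_b = find(a), find(b)
        if d <= threshold and root_a != root_b:
            parent[root_a] = root_b

    groups: Dict[str, Set[str]] = defaultdict(set)
    for isolate in list(parent):
        groups[find(isolate)].add(isolate)
    # Singletons are not clusters.
    return [members for members in groups.values() if len(members) > 1]


# Subset of RIVM SNP distances from the table of WGS-only clustered pairs.
rivm_distances = {
    ("ERX2465622", "ERX2465178"): 6,
    ("ERX2465622", "ERX2465568"): 10,
    ("ERX2465178", "ERX2465568"): 14,
    ("ERX2465292", "ERX2465259"): 17,
}
clusters = single_linkage_clusters(rivm_distances)
```

With single linkage, ERX2465178 and ERX2465568 end up in the same cluster through ERX2465622 even though their direct distance (14) exceeds the threshold; a stricter rule requiring every pair to be within 12 would split them.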
Genetic distances of pairs of isolates clustered by WGS only and not by MIRU-VNTR in five distinct international WGS data analysis pipelines and the associated 24-loci MIRU-VNTR patterns
| Sample 1 | Sample 2 | Genetic distance: RIVM SNP | Genetic distance: Oxford University SNP | Genetic distance: Research Center Borstel (MTBseq) SNP | Genetic distance: SSI SNP | Genetic distance: cgMLST (allele) | 24-loci MIRU-VNTR pattern, sample 1<sup>a</sup> | 24-loci MIRU-VNTR pattern, sample 2<sup>a</sup> |
|---|---|---|---|---|---|---|---|---|
| ERX2465161 | ERX2465207 | 12 | 14 | 12 | 8 | 8 | 2-5-3-5-3-3-2-3-3-4-1-3-6-3- | 2-5-3-5-3-3-2-3-3-4-1-3-6-3- |
| ERX2465178<sup>b</sup> | ERX2465568<sup>b</sup> | 14 | 12 | 8 | 5 | 5 | 2-1-4-7-4- | 2-1-4-7-4- |
| ERX2465292 | ERX2465259 | 17 | 19 | 16 | 15 | 12 | 2- | 2- |
| ERX2465308 | ERX2465278 | 16 | 15 | 11 | 9 | 12 | 2-5-2- | 2-5-2- |
| ERX2465418<sup>c,d</sup> | ERX2465573 | 133 | 0 | 2 | 41 | 0 | 2- | 2- |
| ERX2465418<sup>c,d</sup> | ERX2465330 | 132 | 0 | 3 | 41 | 1 | 2- | 2- |
| ERX2465418<sup>c,d</sup> | ERX2465391<sup>d</sup> | 123 | 0 | 255 | 189 | 185 | 2-5- | 2-5- |
| ERX2465512 | ERX2465223 | 7 | 7 | 6 | 4 | 4 | 2-5-4-3-1- | 2-5-4-3-1- |
| ERX2465622<sup>b</sup> | ERX2465178<sup>b</sup> | 6 | 6 | 4 | 3 | 4 | 2-1-4-7-4- | 2-1-4-7-4- |
| ERX2465622<sup>b</sup> | ERX2465568<sup>b</sup> | 10 | 8 | 6 | 4 | 5 | 2-1-4-7-4-4-4-2-4-2-2-4-2-3-5-2-5- | 2-1-4-7-4-4-4-2-4-2-2-4-2-3-5-2-5- |
| ERX2465631<sup>e</sup> | ERX2465366 | 1 | 0 | 0 | 0 | 0 | 2-6-2-7-3-4-2-3-3-4-7-3-2- | 2-6-2-7-3-4-2-3-3-4-7-3-2- |
| ERX2465631<sup>e</sup> | ERX2465636 | 2 | 1 | 1 | 2 | 0 | 2-6-2-7-3-4-2-3-3-4-7-3-2- | 2-6-2-7-3-4-2-3-3-4-7-3-2- |
MIRU-VNTR: mycobacterial interspersed repeat unit-variable number of tandem repeat; MTB: Mycobacterium tuberculosis; RIVM: National Institute for Public Health and the Environment; SNPs: single nucleotide polymorphisms; SSI: Statens Serum Institut; WGS: whole genome sequencing.
a The 24-loci MIRU-VNTR order was 580-2996-802-960-1644-3192-424-577-2165-2401-3690-4156-2163b-1955-4052-154-2531-4348-2059-2687-3007-2347-2461-3171.
b This isolate clustered by WGS only with two isolates belonging to two different MIRU-VNTR clusters.
c This isolate clustered by WGS only with three isolates belonging to two different MIRU-VNTR clusters.
d This isolate likely contains subpopulations due to the presence of low frequency variants.
e This isolate clustered by WGS only with two isolates belonging to the same MIRU-VNTR cluster.
Variations in the 24-loci MIRU-VNTR patterns between the pairs of isolates are bold and underlined.
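The discordance between pipelines for individual pairs can be read off the table by applying the 12 SNP/allele cut-off to each distance. The following sketch does this for four rows copied from the table; the names `PAIR_DISTANCES`, `clustering_pipelines` and `summary` are hypothetical, and footnote markers are omitted from the accession numbers.

```python
from typing import Dict, List, Tuple

THRESHOLD = 12  # maximum pairwise distance (SNPs/alleles) used to define a WGS cluster

# Column order of the distances in the table.
PIPELINES = ("RIVM", "Oxford", "Borstel", "SSI", "cgMLST")

# Four rows copied from the table of pairs clustered by WGS only.
PAIR_DISTANCES: Dict[Tuple[str, str], Tuple[int, ...]] = {
    ("ERX2465161", "ERX2465207"): (12, 14, 12, 8, 8),
    ("ERX2465292", "ERX2465259"): (17, 19, 16, 15, 12),
    ("ERX2465418", "ERX2465573"): (133, 0, 2, 41, 0),
    ("ERX2465512", "ERX2465223"): (7, 7, 6, 4, 4),
}


def clustering_pipelines(distances: Tuple[int, ...], threshold: int = THRESHOLD) -> List[str]:
    """Return the pipelines in which a pair falls within the clustering threshold."""
    return [name for name, d in zip(PIPELINES, distances) if d <= threshold]


summary = {pair: clustering_pipelines(d) for pair, d in PAIR_DISTANCES.items()}

for pair, names in summary.items():
    print(f"{pair[0]} / {pair[1]}: clustered by {', '.join(names)}")
```

Only one of these four pairs is clustered by all five pipelines; the pair involving ERX2465418, which the footnotes flag as likely containing subpopulations, shows the widest spread of distances.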
Figure 2. Association between the pairwise genetic distance and epidemiological links for the 134 MIRU-VNTR clustered tuberculosis cases, by WGS pipeline
Results from five distinct international whole genome sequencing data analysis pipelines for the 134 isolates clustered by MIRU-VNTR with (n = 41) and without (n = 93) epidemiological link
| Pipeline | WGS clustered (≤ 12 SNPs/alleles), epidemiological link | WGS clustered (≤ 12 SNPs/alleles), no epidemiological link | Non-WGS clustered (> 12 SNPs/alleles), epidemiological link | Non-WGS clustered (> 12 SNPs/alleles), no epidemiological link | NA<sup>a</sup> | Genetic distance in SNPs/alleles, mean (range): epidemiologically linked cases | Genetic distance in SNPs/alleles, mean (range): non-epidemiologically linked cases |
|---|---|---|---|---|---|---|---|
| RIVM SNP | 39<sup>b</sup> | 34 | 0 | 59 | 2 | 2.4 (0–6) | 65.9 (0–198) |
| Oxford University SNP | 41 | 34 | 0 | 59 | NR | 0.3 (0–3) | 63.6 (0–209) |
| Research Center Borstel (MTBseq) SNP | 39<sup>b</sup> | 32<sup>b</sup> | 0 | 59 | 4 | 0.9 (0–3) | 55.7 (0–174) |
| cgMLST (allele) | 41 | 39 | 0 | 54 | NR | 0.4 (0–2) | 42.5 (0–132) |
| SSI SNP | 39<sup>b</sup> | 34 | 0 | 59 | 2 | 0.7 (0–4) | 46.6 (0–151) |
cgMLST: core genome multilocus sequence typing; NA: not applicable; NR: not recorded; MTB: Mycobacterium tuberculosis; RIVM: National Institute for Public Health and the Environment; SNP: single nucleotide polymorphism; SSI: Statens Serum Institut.
a Not applicable for analysis since paired isolates were excluded due to low mean coverage depth.
b One paired isolate was excluded due to low mean coverage depth.
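The counts in the table can be turned into two simple proportions per pipeline: the share of evaluable epidemiologically linked cases that were WGS clustered, and the share of evaluable non-linked cases that were ruled out (> 12 SNPs/alleles). The sketch below is arithmetic on the tabulated counts only; these proportions are derived here for illustration and are not reported in the source, cases listed as NA are left out of the denominators, and the names `COUNTS`, `summarise` and `rates` are hypothetical.

```python
from typing import Dict, Tuple

# (linked & clustered, not linked & clustered, linked & not clustered, not linked & not clustered)
# copied from the table; cases excluded for low coverage (NA column) are not counted.
COUNTS: Dict[str, Tuple[int, int, int, int]] = {
    "RIVM SNP": (39, 34, 0, 59),
    "Oxford University SNP": (41, 34, 0, 59),
    "Research Center Borstel (MTBseq) SNP": (39, 32, 0, 59),
    "cgMLST (allele)": (41, 39, 0, 54),
    "SSI SNP": (39, 34, 0, 59),
}


def summarise(
    linked_in: int, unlinked_in: int, linked_out: int, unlinked_out: int
) -> Dict[str, float]:
    """Proportions among evaluable cases only."""
    linked_total = linked_in + linked_out
    unlinked_total = unlinked_in + unlinked_out
    return {
        "linked_clustered": linked_in / linked_total,
        "unlinked_ruled_out": unlinked_out / unlinked_total,
    }


rates = {name: summarise(*counts) for name, counts in COUNTS.items()}

for name, r in rates.items():
    print(f"{name}: linked clustered {r['linked_clustered']:.1%}, "
          f"non-linked ruled out {r['unlinked_ruled_out']:.1%}")
```

In this derived view, every pipeline clusters all evaluable linked cases, and the pipelines differ only in how many non-linked MIRU-VNTR clustered cases they separate.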