Charlotte Genestet1,2, Yannick Baffert2, Maxime Vallée2, Albin Bernard1, Yvonne Benito1, Gérard Lina1,2, Elisabeth Hodille1,2, Oana Dumitrescu1,2.
Abstract
Epidemiological studies investigating transmission chains of tuberculosis are undertaken worldwide to tackle its spread. Spoligotyping, which assays CRISPR locus diversity, is a widely used genotyping method for Mycobacterium tuberculosis complex (MTBC) characterization. Herein, we developed an in-house targeted next-generation sequencing (tNGS) spoligotyping assay and compared its outputs with those of membrane-based spoligotyping. A total of 144 clinical MTBC strains were retrospectively selected to be representative of the local epidemiology. Data analysis of a training set allowed "presence"/"absence" thresholds to be set for each spacer so as to maximize sensitivity and specificity relative to membrane-based spoligotyping. The thresholds above which a spacer was considered present were 50 reads per million for spacers 10 and 14, 20,000 for spacers 20, 21, and 31, and 1000 for the other spacers. These thresholds were then confirmed using a validation set. The overall agreement on the training and validation sets was 97.5% and 93.8%, respectively. The discrepancies concerned six strains: two for spacer 14, two for spacer 31, and two for spacer 32. The tNGS spoligotyping, whose thresholds were finely tuned during a careful bioinformatics pipeline development process, appears to be a technique that is reliable, inexpensive, free of handling errors, and automatable through automatic transfer into the laboratory computer system.
Keywords: CRISPR locus diversity; Mycobacterium tuberculosis complex; in silico spoligotyping; membrane-based spoligotyping; spoligotyping; targeted next-generation sequencing; tuberculosis
Year: 2022 PMID: 36232601 PMCID: PMC9569608 DOI: 10.3390/ijms231911302
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 6.208
Figure 1. Epidemiology of the 144 MTBC strains included in the study, separated into (A) the training set (n = 80 strains) and (B) the validation set (n = 64 strains). For both sets, identification was based on membrane-based spoligotyping.
Figure 2. Determination of the reads per million (RPM) threshold for spacers 10 and 14. On the left: Violin plot of the RPM distribution for spacers 10 and 14. Purple circles represent the “present” spacers according to the membrane-based spoligotyping. Green circles represent the “absent” spacers according to the membrane-based spoligotyping. Dotted red line: Chosen RPM threshold (50 RPM). In the middle: Receiver operating characteristic (ROC) curve for spacers 10 and 14. Red circle represents the best RPM threshold. On the right: Sensitivity (red line) and specificity (blue line) relative to the membrane-based spoligotyping as a function of the RPM threshold. Dotted purple lines: RPM thresholds between which the sensitivity and the specificity are maximized. Dotted red line: Chosen RPM threshold (50 RPM).
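The threshold search described in this caption can be sketched as a sweep over candidate RPM cut-offs, scoring each one against the membrane-based call. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the RPM values in the usage note are made-up toy numbers, and Youden's J (sensitivity + specificity − 1) is assumed as the selection criterion because the source only states that sensitivity and specificity were maximized.

```python
def sens_spec(rpm_values, membrane_present, threshold):
    """Sensitivity and specificity of 'rpm > threshold' against the membrane call."""
    tp = fn = tn = fp = 0
    for rpm, present in zip(rpm_values, membrane_present):
        called = rpm > threshold  # assumption: strictly above the threshold means present
        if present and called:
            tp += 1
        elif present:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    # Both classes must be represented, otherwise the ratios are undefined.
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one present and one absent spacer")
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(rpm_values, membrane_present, candidates):
    """Return the candidate threshold with the highest Youden's J (assumed criterion)."""
    def youden_j(threshold):
        sensitivity, specificity = sens_spec(rpm_values, membrane_present, threshold)
        return sensitivity + specificity - 1
    return max(candidates, key=youden_j)
```

With toy data such as `rpm = [5, 12, 30, 800, 1500, 4000]` and membrane calls `[False, False, False, True, True, True]`, calling `best_threshold(rpm, calls, [10, 50, 1000])` selects 50, the only candidate that separates the two groups without error.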
Figure 3. Determination of the reads per million (RPM) threshold for spacers 20, 21, and 31. On the left: Violin plot of the RPM distribution for spacers 20, 21, and 31. Purple circles represent the “present” spacers according to the membrane-based spoligotyping. Green circles represent the “absent” spacers according to the membrane-based spoligotyping. Dotted red line: Chosen RPM threshold (20,000 RPM). In the middle: Receiver operating characteristic (ROC) curve for spacers 20, 21, and 31. Red circle represents the best RPM threshold. On the right: Sensitivity (red line) and specificity (blue line) relative to the membrane-based spoligotyping as a function of the RPM threshold. Dotted purple lines: RPM thresholds between which the sensitivity and the specificity are maximized. Dotted red line: Chosen RPM threshold (20,000 RPM).
Figure 4. Determination of the reads per million (RPM) threshold for the other spacers. On the left: Violin plot of the RPM distribution for the other spacers. Purple circles represent the “present” spacers according to the membrane-based spoligotyping. Green circles represent the “absent” spacers according to the membrane-based spoligotyping. Dotted red line: Chosen RPM threshold (1000 RPM). In the middle: Receiver operating characteristic (ROC) curve for the other spacers. Red circle represents the best RPM threshold. On the right: Sensitivity (red line) and specificity (blue line) relative to the membrane-based spoligotyping as a function of the RPM threshold. Dotted purple lines: RPM thresholds between which the sensitivity and the specificity are maximized. Dotted red line: Chosen RPM threshold (1000 RPM).
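Taken together, the three RPM thresholds (50 for spacers 10 and 14, 20,000 for spacers 20, 21, and 31, and 1000 for all others) amount to a simple per-spacer calling rule. The sketch below illustrates that rule under stated assumptions: the names are hypothetical, the input is assumed to be already normalized to RPM, the standard 43-spacer spoligotype layout is assumed, and "above the threshold" is read as strictly greater than.

```python
N_SPACERS = 43  # assumption: standard 43-spacer spoligotyping scheme

# Thresholds as reported in the abstract and figure captions.
LOW_THRESHOLD_SPACERS = {10, 14}       # 50 RPM
HIGH_THRESHOLD_SPACERS = {20, 21, 31}  # 20,000 RPM
DEFAULT_THRESHOLD = 1_000              # all other spacers


def rpm_threshold(spacer):
    """Return the RPM threshold for a given 1-based spacer number."""
    if spacer in LOW_THRESHOLD_SPACERS:
        return 50
    if spacer in HIGH_THRESHOLD_SPACERS:
        return 20_000
    return DEFAULT_THRESHOLD


def call_spoligotype(rpm_by_spacer):
    """Convert {spacer number: RPM} into a binary string, '1' = present, '0' = absent.

    Spacers missing from the mapping are treated as 0 RPM (assumption).
    """
    return "".join(
        "1" if rpm_by_spacer.get(spacer, 0) > rpm_threshold(spacer) else "0"
        for spacer in range(1, N_SPACERS + 1)
    )
```

For example, an RPM of 5000 yields "present" for spacer 1 but "absent" for spacer 20, which is why spacer-specific thresholds matter when comparing against the membrane-based pattern.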
Discordant spacers between membrane-based spoligotyping and tNGS spoligotyping for the training set and the validation set.
| Set | Type of Discordance | Discordant Spacer (Number) | Prevalence in Membrane-Based Spoligotyping, n | Number of Concerned Isolates | Overall Agreement, % | Cohen’s Kappa |
|---|---|---|---|---|---|---|
| Training set | “0” in membrane, “1” in tNGS | 14 | 62 | 2 | 97.5 | 0.911 |
| Validation set | “0” in membrane, “1” in tNGS | 32 | 51 | 2 | 96.9 | 0.898 |
| Validation set | “1” in membrane, “0” in tNGS | 31 | 44 | 2 | 96.9 | 0.929 |
“0” corresponds to the absence of the spacer; “1” corresponds to the presence of the spacer. NGS, next-generation sequencing.
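The overall agreement and Cohen's kappa columns can be recomputed from a 2×2 table of membrane versus tNGS calls. The sketch below is illustrative only: the function name is hypothetical, and the cell counts in the usage note are derived from the table under the assumptions that the rows for spacers 32 and 31 refer to the 64 validation strains and that every isolate not listed as discordant is concordant.

```python
def agreement_and_kappa(both_present, membrane_only, tngs_only, both_absent):
    """Return (overall agreement, Cohen's kappa) for two binary raters.

    membrane_only: '1' in membrane, '0' in tNGS
    tngs_only:     '0' in membrane, '1' in tNGS
    """
    n = both_present + membrane_only + tngs_only + both_absent
    observed = (both_present + both_absent) / n

    membrane_present = both_present + membrane_only
    tngs_present = both_present + tngs_only
    # Agreement expected by chance from the marginal frequencies.
    expected = (
        membrane_present * tngs_present
        + (n - membrane_present) * (n - tngs_present)
    ) / n**2

    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return observed, (observed - expected) / (1 - expected)
```

Under those assumptions, spacer 32 (51 present by membrane, 2 extra calls by tNGS) corresponds to `agreement_and_kappa(51, 0, 2, 11)`, which gives an agreement of about 96.9% and a kappa of about 0.898; spacer 31 (44 present by membrane, 2 missed by tNGS) corresponds to `agreement_and_kappa(42, 2, 0, 20)`, which gives about 96.9% and 0.929. Both match the reported values.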
Figure 5. Venn diagrams of the discordant spacers called “present” by membrane-based (green), targeted next-generation sequencing (tNGS; red), and whole genome sequencing (WGS; blue) spoligotyping: (A) for the training set, (B) for the validation set.