Arnaud Felten, Laurent Guillier, Nicolas Radomski, Michel-Yves Mistou, Renaud Lailler, Sabrina Cadel-Six.
Abstract
Most bacterial typing methods used to discriminate isolates in medical or food safety microbiology are based on genetic markers used as targets in PCR or hybridization experiments. These DNA typing methods are important tools for studying prevalence and epidemiology and for conducting surveillance, investigations and control of biological hazard sources. It is therefore crucial to ensure that the chosen genetic markers have the greatest possible specificity and sensitivity. The wealth of whole-genome sequences available for many bacterial species offers the opportunity to evaluate the performance of these genetic markers. In the present study, we developed GTEvaluator, a bioinformatics workflow that ranks genetic markers according to their sensitivity and specificity towards groups of well-defined genomes. GTEvaluator identifies the best-performing genetic markers for targeting individuals within a population. The individuals (i.e. a group of genomes within a collection) are defined by any particular phenotypic or biological property within a related population (i.e. a collection of genomes). The performance of each genetic marker is expressed as a distance value that takes into account both sensitivity and specificity. We report two examples of GTEvaluator application. In the first, Bacillus phenotypic markers were evaluated for their capacity to distinguish B. cereus from B. thuringiensis. In the second, GTEvaluator measured the performance of genetic markers dedicated to the molecular serotyping of Salmonella enterica. In a single in silico experiment, it was possible to test 64 markers against 134 genomes corresponding to 14 different serotypes.
Year: 2017 PMID: 28750049 PMCID: PMC5531552 DOI: 10.1371/journal.pone.0182082
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. GTEvaluator workflow.
The lists of genetic markers and genomes are the input files of the ‘GTEvaluator’ script, which is based on the ‘fuzznuc’ pattern finder and consists of the ‘GTEvaluator_matrixMaker’ and ‘GTEvaluator_statistic’ scripts, used for matrix file production (i.e. presence or absence of genomic markers for each genome) and statistical computation (i.e. specificity, sensitivity, statistical distances, and confidence intervals), respectively.
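The matrix-production step can be illustrated with a minimal sketch. This is not the ‘GTEvaluator_matrixMaker’ script: it replaces ‘fuzznuc’ with exact substring matching on both strands (no mismatches allowed), and the function names, marker names and toy sequences are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def presence_matrix(markers, genomes):
    """Build a presence (1) / absence (0) matrix of markers per genome.

    Assumption: a marker is 'present' when its exact sequence, or its
    reverse complement, occurs in the genome. fuzznuc can also tolerate
    mismatches, which this stand-in does not model.
    """
    matrix = {}
    for genome_name, genome_seq in genomes.items():
        genome_seq = genome_seq.upper()
        row = {}
        for marker_name, marker_seq in markers.items():
            marker_seq = marker_seq.upper()
            found = (marker_seq in genome_seq
                     or reverse_complement(marker_seq) in genome_seq)
            row[marker_name] = int(found)
        matrix[genome_name] = row
    return matrix


# Hypothetical toy inputs, for illustration only.
toy_genomes = {"genome_a": "ATGCCGTTAGC", "genome_b": "TTTTGGGG"}
toy_markers = {"marker_1": "CCGTT", "marker_2": "CCCCAA"}

toy_matrix = presence_matrix(toy_markers, toy_genomes)
for genome_name, row in toy_matrix.items():
    print(genome_name, row)
```

Each row of the resulting matrix is one genome and each column one marker, which is the shape needed to count presences and absences per subgroup in the statistical step.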
Typological variables describing the ‘presence’ (i.e. i = 1 or j = 1) and ‘absence’ (i.e. i = 0 or j = 0) of genetic markers (xij) across subgroups of studied genomes (g).
The genomes from the targeted subgroup and other subgroups are called g1 and g0, respectively.
| | Genetic marker present | Genetic marker absent | Total |
|---|---|---|---|
| Genomes of the subgroup of interest (g1) | x11 | x10 | n1 |
| Other genomes from other subgroup(s) (g0) | x01 | x00 | n0 |
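The four counts in this table are enough to compute sensitivity and specificity. The sketch below assumes the usual definitions (Se = x11 / n1, the fraction of targeted genomes carrying the marker, and Sp = x00 / n0, the fraction of other genomes lacking it); the function name and the example counts are hypothetical.

```python
def sensitivity_specificity(x11, x10, x01, x00):
    """Return (Se, Sp) from the four presence/absence counts.

    x11: marker present in targeted genomes (g1)
    x10: marker absent from targeted genomes (g1)
    x01: marker present in other genomes (g0)
    x00: marker absent from other genomes (g0)
    """
    n1 = x11 + x10  # size of the targeted subgroup
    n0 = x01 + x00  # size of the remaining subgroups
    if n1 == 0 or n0 == 0:
        raise ValueError("both subgroups must contain at least one genome")
    return x11 / n1, x00 / n0


# Hypothetical counts: 20 targeted genomes and 80 other genomes.
se, sp = sensitivity_specificity(x11=18, x10=2, x01=4, x00=76)
print(f"Se = {se:.3f}, Sp = {sp:.3f}")
```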
Fig 2. Simulated distances and uncertainties of specificity and sensitivity implemented in GTEvaluator.
A distance value (d) defines the performance of a marker in terms of specificity (Sp) and sensitivity (Se) across the considered subgroups of genomes (Fig 2a). The uncertainty on specificity and sensitivity is presented for 100 (Fig 2b), 200 (Fig 2c), and 300 (Fig 2d) genomes in the dataset. A potential genomic marker of a given subgroup of genomes (xij) is defined by its presence (i or j = 1) and absence (i or j = 0) in genomes of this subgroup (i) and others (j). Specificity and sensitivity are constant values (i.e. Se = 0.900 and Sp = 0.977), and the targeted subgroup represents 20% of the genome dataset in the present simulation.
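The effect of dataset size on uncertainty can be illustrated numerically. The text here does not state how GTEvaluator derives its uncertainties, so the sketch below substitutes a Wilson score interval, a standard interval for binomial proportions, and applies it to the simulation settings quoted above (Se = 0.900, Sp = 0.977, targeted subgroup = 20% of the dataset). The function name and the 95% level (z = 1.96) are assumptions.

```python
import math


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a proportion p_hat estimated on n genomes.

    Assumption: used here as a generic stand-in, not as GTEvaluator's method.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


SE, SP, TARGET_FRACTION = 0.900, 0.977, 0.20

widths = {}
for total in (100, 200, 300):
    n1 = round(total * TARGET_FRACTION)  # genomes in the targeted subgroup
    n0 = total - n1                      # genomes in the other subgroups
    se_lo, se_hi = wilson_interval(SE, n1)
    sp_lo, sp_hi = wilson_interval(SP, n0)
    widths[total] = (se_hi - se_lo, sp_hi - sp_lo)
    print(f"{total} genomes: Se in [{se_lo:.3f}, {se_hi:.3f}], "
          f"Sp in [{sp_lo:.3f}, {sp_hi:.3f}]")
```

Because the targeted subgroup is only a fifth of the dataset, the interval on Se is estimated from far fewer genomes than the interval on Sp and stays wider at every dataset size.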
Previously published targets presenting the lowest distances (d) calculated by GTEvaluator based on combinations of their respective sensitivity (Se) and specificity (Sp).
Uncertainties on sensitivity and specificity are presented for distances lower than 0.0140 and 0.0707. The bold characters indicate the promising values.
| Group | Subgroup | Number of genomes | Target | Se | Sp | d | Probability d<0.014 | Probability d<0.0707 |
|---|---|---|---|---|---|---|---|---|
| | | 22 | HlyII | 0.63 | 0.38 | 0.71 | ND | ND |
| | | 21 | HB1 | 0.85 | 0.63 | 0.39 | ND | ND |
| | Agona | 20 | G-comp | 1.00 | 0.81 | 0.18 | <0.0001 | 0.0002 |
| | Derby | 1 | G-comp | 1.00 | 0.69 | 0.30 | <0.0001 | <0.0001 |
| | Enteritidis | 20 | m-g_m | | | | | |
| | | | SEN1383 | | | | | |
| | | | SEN1383_probe | | | | | |
| | Hadar | 2 | EN-comp-1 | 1.00 | 1.00 | 0 | 0.03 | 0.21 |
| | | | z10 | 1.00 | 1.00 | 0 | 0.03 | 0.21 |
| | Infantis | 3 | SCH-2097-probe | 1.00 | 0.97 | 0.02 | 0.03 | 0.22 |
| | | | r | 1.00 | 0.97 | 0.02 | 0.03 | 0.22 |
| | | | SCH-2097 | 1.00 | 0.97 | 0.02 | 0.03 | 0.22 |
| | Kentucky | 11 | z6 | | | | | |
| | Newport | 20 | e-h | 1.00 | 0.97 | 0.02 | 0.13 | 0.69 |
| | Panama | 1 | L-comp | 1.00 | 1.00 | 0 | 0.02 | 0.13 |
| | Paratyphi A | 9 | PA | | | | | |
| | | | a-1 | | | | | |
| | | | a-2 | | | | | |
| | Saintpaul | 3 | e-h | 1.00 | 0.84 | 0.15 | <0.0001 | <0.0001 |
| | Typhi | 17 | TY | | | | | |
| | | | d | | | | | |
| | | | j | | | | | |
| | Typhimurium | 20 | FliC | 0.95 | 0.88 | 0.12 | <0.0001 | 0.0048 |
| | S 4,[5],12:i:-* | 4 | FliC | 1.00 | 0.78 | 0.21 | ND | ND |
| | Virchow | 3 | SCH-2097-probe | 1.00 | 0.97 | 0.02 | 0.0047 | 0.2156 |
| | | | r | 1.00 | 0.97 | 0.02 | 0.0047 | 0.2156 |
| | | | SCH-2097 | 1.00 | 0.97 | 0.02 | 0.0047 | 0.2156 |
* S 4,[5],12:i:- corresponds to a monophasic variant of the Typhimurium serotype.
# ND stands for not determined.
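The formula for d is not given here. One reading that fits the reported numbers is the Euclidean distance between (Se, Sp) and the ideal marker (1, 1): under that assumption the two thresholds correspond to Se = Sp = 0.99 (d ≈ 0.0141) and Se = Sp = 0.95 (d ≈ 0.0707), and the tabulated distances are reproduced to within about 0.01, which is compatible with rounding of Se and Sp. The sketch below checks this on a few rows; the function name is hypothetical and the formula remains an assumption.

```python
import math


def marker_distance(se, sp):
    """Euclidean distance between (se, sp) and the ideal marker (1, 1).

    Assumption: inferred from the thresholds and table values, not quoted
    from the GTEvaluator source.
    """
    if not (0.0 <= se <= 1.0 and 0.0 <= sp <= 1.0):
        raise ValueError("se and sp must lie in [0, 1]")
    return math.hypot(1.0 - se, 1.0 - sp)


# (target, Se, Sp, tabulated d) copied from the table above.
table_rows = [
    ("HlyII", 0.63, 0.38, 0.71),
    ("HB1", 0.85, 0.63, 0.39),
    ("G-comp (Agona)", 1.00, 0.81, 0.18),
    ("FliC (Typhimurium)", 0.95, 0.88, 0.12),
]

for target, se, sp, d_table in table_rows:
    d = marker_distance(se, sp)
    print(f"{target}: computed d = {d:.3f}, tabulated d = {d_table:.2f}")
```

A lower d therefore means a marker closer to perfect sensitivity and specificity, which is why the table ranks targets by their lowest distances.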
Fig 3. Graphical representation of the distances and uncertainties implemented in GTEvaluator for the genetic markers fliC and fljB for Salmonella enterica serotype Typhimurium.
Confidence intervals of the sensitivity and specificity of the FliC (black) and FljB (grey) markers are shown for their ability to distinguish 20 genomes of S. Typhimurium from 114 genomes of other Salmonella enterica serotypes.