Laurent Guillier1,2, Michèle Gourmelon3, Solen Lozach3, Sabrina Cadel-Six2, Marie-Léone Vignaud2, Nanna Munck4, Tine Hald4, Federica Palma2.
Abstract
Attributing pathogenic strains isolated from the environment or from human cases to their sources is challenging. The pathogens usually colonize multiple animal hosts, including livestock, which contaminate the food-production chain and the environment (e.g. soil and water), posing an additional public-health burden and major challenges in the identification of the source. Genomic data open up new opportunities for the development of statistical models aiming to indicate the likely source of pathogen contamination. Here, we propose a computationally fast and efficient multinomial logistic regression source-attribution classifier to predict the animal source of bacterial isolates based on 'source-enriched' loci extracted from the accessory-genome profiles of a pangenomic dataset. Depending on the accuracy of the model's self-attribution step, the modeller selects the number of candidate accessory genes that best fit the model for calculating the likelihood of (source) category membership. The Accessory genes-Based Source Attribution (AB_SA) method was applied to a dataset of strains of Salmonella enterica Typhimurium and its monophasic variant (S. enterica 1,4,[5],12:i:-). The model was trained on 69 strains with known animal-source categories (i.e. poultry, ruminant and pig). The AB_SA method helped to identify 8 genes as predictors among the 2802 accessory genes. The self-attribution accuracy was 80 %. The AB_SA model was then able to classify 25 of the 29 S. enterica Typhimurium and S. enterica 1,4,[5],12:i:- isolates collected from the environment (considered to be of unknown source) into a specific category (i.e. animal source), with a probability of more than 85 %. The AB_SA method herein described provides a user-friendly and valuable tool for performing source-attribution studies in only a few steps. AB_SA is written in R and freely available at https://github.com/lguillier/AB_SA.
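As a rough illustration of the kind of classifier the abstract describes, the following is a minimal sketch of multinomial (softmax) logistic regression fitted to binary gene presence/absence profiles, followed by a self-attribution check on the training data. It is not the AB_SA code, which is written in R. The toy profiles, the function names (`softmax`, `predict_proba`, `fit`) and the plain gradient-descent fitting are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Convert a list of raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(weights, x):
    """Source-membership probabilities for one presence/absence profile x."""
    features = [1.0] + list(x)  # the leading 1.0 is the intercept term
    scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
    return softmax(scores)

def fit(X, y, n_classes, lr=0.5, epochs=500):
    """Fit softmax regression by batch gradient descent (illustrative only)."""
    n_features = len(X[0]) + 1
    weights = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(epochs):
        grads = [[0.0] * n_features for _ in range(n_classes)]
        for x, label in zip(X, y):
            probs = predict_proba(weights, x)
            features = [1.0] + list(x)
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == label else 0.0)
                for j, f in enumerate(features):
                    grads[k][j] += err * f
        for k in range(n_classes):
            for j in range(n_features):
                weights[k][j] -= lr * grads[k][j] / len(X)
    return weights

SOURCES = ["pig", "poultry", "ruminant"]

# Made-up toy data: each column is the presence (1) or absence (0) of one
# hypothetical source-enriched gene; these are not the study's profiles.
X_train = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
y_train = [0, 0, 1, 1, 2, 2]  # indices into SOURCES

weights = fit(X_train, y_train, n_classes=len(SOURCES))

# Self-attribution: predict the source of the training strains themselves.
predicted = []
for x in X_train:
    probs = predict_proba(weights, x)
    predicted.append(probs.index(max(probs)))
self_attribution_accuracy = sum(
    p == t for p, t in zip(predicted, y_train)
) / len(y_train)
```

In this sketch the self-attribution accuracy is simply the fraction of training strains assigned back to their known source, which is the quantity the abstract uses to choose how many candidate genes to keep.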
Keywords: Salmonella Typhimurium; environmental contamination; multinomial logistic regression; pangenome-wide enrichment analysis; source attribution
Year: 2020 PMID: 32320376 PMCID: PMC7478624 DOI: 10.1099/mgen.0.000366
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
Fig. 1. Venn diagram showing the number of source-enriched genes for each animal category. Dark grey, pigs; grey, poultry; light grey, ruminants.
Tested multinomial logistic models, with the accuracy obtained on the training data and the AIC value for each selected number of genes
| No. of genes/sources | Group_176 | Group_763 | Group_1926 | ymdB_2 | ylcG_1 | Group_852 | Group_6195 | Group_160 | Group_158 | Accuracy [95% CI] | AIC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | × | – | × | – | – | – | × | – | – | 0.82 [0.67,0.91] | 92.7 |
| 2 | × | × | × | – | – | – | × | × | – | 0.82 [0.58,0.92] | 89.5 |
| 3 | × | × | × | – | – | – | × | × | × | 0.75 [0.64,0.91] | 91.2 |
| 4 | × | × | × | × | – | × | × | × | × | 0.74 [0.55,1] | 81.2 |
| 5 | × | × | × | × | × | × | × | × | × | 0.71 [0.36,1] | 83.6 |
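The AIC column trades goodness of fit against the number of fitted parameters. A minimal sketch of the standard definition, AIC = 2k − 2 ln L, for a multinomial logistic model follows. The parameter count assumes one intercept and one coefficient per gene for every non-reference source; the function names and the example log-likelihood are hypothetical and are not values from the study.

```python
import math

def log_likelihood(probabilities, labels):
    """Sum of log-probabilities assigned to each strain's true source."""
    return sum(math.log(p[label]) for p, label in zip(probabilities, labels))

def multinomial_aic(log_lik, n_genes, n_sources):
    """AIC = 2k - 2 ln L for a multinomial logistic model.

    Assumption: k counts one intercept plus one coefficient per gene
    for each of the (n_sources - 1) non-reference source categories.
    """
    n_params = (n_genes + 1) * (n_sources - 1)
    return 2 * n_params - 2 * log_lik

# Hypothetical example: 8 predictor genes, 3 sources, made-up log-likelihood.
example_aic = multinomial_aic(-10.0, n_genes=8, n_sources=3)
```

Under this definition a lower AIC indicates a better balance between fit and model size, so adding genes only pays off when the log-likelihood improves by more than the number of extra parameters.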
Predictors of animal sources
Numbers of isolates harbouring the predictors with the relative percentage of isolates from the different animal sources and the environment, along with gene annotation from nucleotide and amino acid sequences obtained with Prokka and KEGG, are shown.
| Predictor | Total no. of isolates | Pig isolates | Poultry isolates | Ruminant isolates | Env. isolates | Prokka annotation | KEGG protein homology |
|---|---|---|---|---|---|---|---|
| Group_176 | 57 | 0.58 | 0.05 | 0.04 | 0.33 | Hypothetical protein | Putative Gifsy-2 prophage protein/DNA breaking–rejoining protein |
| Group_763 | 12 | 0.33 | 0.58 | 0.08 | 0 | | Bacteriophage lysozyme |
| ymdB_2 | 12 | 0.17 | 0.25 | 0.25 | 0.33 | Hypothetical protein | Gifsy-1 prophage tail assembly-like protein |
| Group_1926 | 6 | 0 | 0.67 | 0 | 0.33 | Hypothetical protein | Gifsy-2 prophage protein/DNA breaking–rejoining protein |
| Group_852 | 5 | 0 | 0.6 | 0 | 0.4 | Hypothetical protein | Uncharacterized protein from |
| Group_6195 | 3 | 0 | 0 | 0.67 | 0.33 | | Oxaloacetate decarboxylase (Na+ extruding) subunit alpha |
| Group_160 | 8 | 0.38 | 0 | 0.38 | 0.25 | Hypothetical protein | Uncharacterized protein |
| Group_158 | 15 | 0.33 | 0.2 | 0.27 | 0.2 | Hypothetical protein | Uncharacterized protein |
Fig. 2. Performance estimates of the parameters of the multinomial logistic model plotting the effect of each predictor. The parameter estimates are relative to the reference category Ruminants_FR. Parameters with significant negative coefficients decrease the likelihood of that response category with respect to the reference category. Parameters with positive coefficients increase the likelihood of that response category.
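The role of the reference category can be made explicit with the usual baseline-category form of the multinomial logit. The notation is an assumption introduced here for illustration: $x_j \in \{0, 1\}$ is the presence or absence of predictor gene $j$, $r$ is the reference category (ruminants), and $\beta_{c,j}$ is the coefficient of gene $j$ for source $c$.

```latex
\[
\eta_c(x) \;=\; \ln \frac{P(Y = c \mid x)}{P(Y = r \mid x)}
  \;=\; \beta_{c,0} + \sum_{j} \beta_{c,j}\, x_j ,
  \qquad c \in \{\text{pig}, \text{poultry}\}
\]
\[
P(Y = c \mid x) \;=\; \frac{\exp\bigl(\eta_c(x)\bigr)}{1 + \sum_{c' \neq r} \exp\bigl(\eta_{c'}(x)\bigr)} ,
\qquad
P(Y = r \mid x) \;=\; \frac{1}{1 + \sum_{c' \neq r} \exp\bigl(\eta_{c'}(x)\bigr)}
\]
```

Read this way, a negative $\beta_{c,j}$ means that carrying gene $j$ lowers the odds of source $c$ relative to the reference category, and a positive $\beta_{c,j}$ raises them, which matches the interpretation given in the caption.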
Fig. 3. Histogram plot of the individual source-attribution probabilities of the 29 environmental strains of S. Typhimurium to the three animal sources. The membership probabilities were estimated according to the AB_SA method carried out on the eight source-enriched genes from the accessory genomes. Letters associated with strain numbers refer to the different types of environmental samples: S, soil; W, fresh water; C, crustacean; B, brackish water.
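One way to turn per-strain membership probabilities such as those plotted here into a source call is to keep the most probable source only when it clears a cut-off. The sketch below uses the 85 % probability mentioned in the abstract as that cut-off. The function name, the `"unassigned"` label, the strain identifiers and the probabilities are hypothetical and are not taken from the study.

```python
def attribute(probabilities, threshold=0.85):
    """Return the most probable source if its probability exceeds threshold.

    probabilities maps each source name to its membership probability.
    Strains whose best source does not clear the threshold are left unassigned.
    """
    source, best = max(probabilities.items(), key=lambda item: item[1])
    return source if best > threshold else "unassigned"

# Made-up membership probabilities for two hypothetical environmental strains.
strains = {
    "strain_A": {"pig": 0.92, "poultry": 0.05, "ruminant": 0.03},
    "strain_B": {"pig": 0.48, "poultry": 0.40, "ruminant": 0.12},
}
calls = {name: attribute(probs) for name, probs in strains.items()}
```

Here `strain_A` is attributed to pigs, while `strain_B` stays unassigned because no single source reaches the cut-off.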