Arnaud Felten, Meryl Vila Nova, Kevin Durimel, Laurent Guillier, Michel-Yves Mistou, Nicolas Radomski.
Abstract
BACKGROUND: Many bacterial genomic studies exploring the evolutionary processes of host adaptation focus on the accessory genome, describing how gene gains and losses can explain the colonization of new habitats. Consequently, we developed a new approach focusing on the coregenome in order to describe the host adaptation of Salmonella serovars.
Keywords: Bacterial fixed variants; Bacterial genomics; Gene-ontology enrichment analysis
Year: 2017 PMID: 29183286 PMCID: PMC5706153 DOI: 10.1186/s12866-017-1132-1
Source DB: PubMed Journal: BMC Microbiol ISSN: 1471-2180 Impact factor: 3.605
Fig. 1 Boxplots (median, 25th percentile, 75th percentile, minimum and maximum) of pairwise distances expressed as single nucleotide polymorphisms (SNPs) (a) or small insertions/deletions (InDels) (b) within Salmonella enterica subsp. enterica serovars Dublin (n = 60), Enteritidis (n = 528), Pullorum (n = 10) and Gallinarum (n = 28). Normality of distribution and equality of variances were checked with Shapiro-Wilk and Fisher tests, respectively. Statistical differences (*: p < 5.0×10−2; **: p < 1.0×10−2; ***: p < 1.0×10−3; ****: p < 1.0×10−4; *****: p < 1.0×10−5; ******: p < 1.0×10−6) were calculated with Wilcoxon rank sum (i.e. non-normal distribution with equality of variances) or Kolmogorov-Smirnov (i.e. non-normal distribution without equality of variances) tests
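The pairwise distances summarized in Fig. 1 can be illustrated with a minimal sketch. It assumes the genomes are already available as aligned sequences of equal length; the function names, the toy sequences, and the choice to skip `N` and `-` sites are illustrative assumptions, not the behaviour of the ‘VCFtoMATRIX’ script.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count aligned positions where two sequences differ, skipping ambiguous sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a not in ignore and b not in ignore
    )


def pairwise_distances(sequences):
    """Return {(name_i, name_j): SNP distance} for every unordered pair of genomes."""
    return {
        (n1, n2): snp_distance(sequences[n1], sequences[n2])
        for n1, n2 in combinations(sorted(sequences), 2)
    }


# Toy alignment (hypothetical data, for illustration only).
toy = {"g1": "ACGTAC", "g2": "ACGTTC", "g3": "ACCTTN"}
distances = pairwise_distances(toy)
```

The resulting dictionary holds one distance per genome pair, which is the kind of value a per-serovar boxplot would summarize.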
Fig. 2 Phylogenetic inference based on coregenome single nucleotide polymorphisms (SNPs) identified in Salmonella enterica subsp. enterica serovars Dublin, Enteritidis, Pullorum, and Gallinarum. The color legend corresponds to serovars presented by Langridge et al. (Proc. Natl. Acad. Sci. 2015;112:863–8). The variants were identified by the ‘VARCall’ workflow against the reference genome S. Enteritidis (strain P125109, accession NC_011294.1). The phylogeny was inferred from the produced pseudogenomes (4,685,848 bp) with RAxML, based on a bootstrap analysis and a search for the best-scoring Maximum Likelihood tree with the General Time-Reversible model of substitution and the secondary structure 16-state model. Bootstraps higher than 90% are represented by black circles. The phylogenetic inference converged after 200 bootstrap replicates with a log likelihood score of −8.106 for 1000 computed trees. The tree is rooted on the branch of S. Dublin
Fig. 3 Densities of single nucleotide polymorphisms (SNPs) per 1000 bp (curves), Salmonella pathogenic islands (dotted lines), and recombination events (rectangles) across Salmonella enterica subsp. enterica serovars (a: 59 genomes, 12,929 SNPs), including Dublin (b: 13 genomes, 5084 SNPs), Enteritidis (c: 33 genomes, 5136 SNPs), Pullorum (d: 5 genomes, 2225 SNPs), and Gallinarum (e: 8 genomes, 671 SNPs). The pathogenicity island database from KonKuk University (Seoul, South Korea) was used to detect Salmonella Pathogenic Islands (SPIs) SPI-1 (2,890,501–2,934,879), SPI-2 (1,727,425–1,769,273), SPI-4 (4,333,507–4,361,514), SPI-5 (1,053,174–1,074,167), SPI-6 (299,796–330,890), SPI-11 (1,904,313–1,912,607), SPI-12 (2,328,077–2,347,757) and PAI III 536 (2,801,306–2,810,695) of the reference genome S. Enteritidis (strain P125109, accession NC_011294.1)
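A SNP density per 1000 bp, as plotted in Fig. 3, can be sketched as a count of variant positions per fixed-size window. The function names, the use of non-overlapping windows, and the toy positions below are assumptions made for illustration; the caption does not specify how the curves were computed.

```python
from collections import Counter


def snp_density(positions, genome_length, window=1000):
    """Return the SNP count of each consecutive window (positions are 1-based)."""
    n_windows = -(-genome_length // window)  # ceiling division
    counts = Counter((pos - 1) // window for pos in positions)
    return [counts.get(i, 0) for i in range(n_windows)]


def overlaps(window_index, start, end, window=1000):
    """True if the window intersects the 1-based inclusive region [start, end]."""
    w_start = window_index * window + 1
    w_end = (window_index + 1) * window
    return w_start <= end and start <= w_end


# Toy SNP positions on a hypothetical 3200 bp genome.
toy_positions = [5, 999, 1000, 1001, 2500]
density = snp_density(toy_positions, genome_length=3200)
```

The `overlaps` helper could be used to flag windows that fall inside a region of interest, for example the SPI-1 interval (2,890,501–2,934,879) given in the caption.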
Fig. 4 Homoplastic (grey bars) and non-homoplastic (white bars) variants (SNPs versus InDels, intragenic versus intergenic, non-synonymous versus synonymous) fixed across all branches of the phylogenetic inference including genomes of Salmonella enterica subsp. enterica serovars Enteritidis (n = 33), Pullorum (n = 5), Gallinarum (n = 8) and Dublin (n = 13). The variant annotation was performed with SnpEff against reference genome S. Enteritidis (strain P125109, accession NC_011294.1). The fixed non-homoplastic variants are defined by common genotypes across the considered group of genomes, as well as different genotypes in all the other compared genomes. The fixed homoplastic variants are defined by common genotypes across the considered group of genomes and genomes of independent phylogenetic clades, as well as different genotypes in genomes of the compared child-branches. The term ‘reference genotype’ refers to fixed variants presenting the genotype of the reference genome. This analysis was performed with the script ‘phyloFixedVar’ (i.e. dependently of the phylogenetic inference). Statistical differences (*: p < 1.0×10−6; **: p < 1.0×10−7; ***: p < 1.0×10−8; ****: p < 1.0×10−9; *****: p < 1.0×10−10) were calculated with Wilcoxon signed rank tests. The vertical bars represent the standard deviation
Single nucleotide polymorphisms (SNPs) and small insertions/deletions (InDels) fixed at phylogenetic branches where genomes of Salmonella enterica subsp. enterica serovars Enteritidis (n = 33), Pullorum (n = 5), Gallinarum (n = 8) and Dublin (n = 13) diverged
| Category | Serovars | Intragenic sSNP | Intragenic nsSNP | Intragenic nsInDel | Intragenic rSNP | Intragenic rInDel | Intergenic SNP | Intergenic InDel | Total |
|---|---|---|---|---|---|---|---|---|---|
| Homoplastic | Dublin | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Homoplastic | Enteritidis | 0 | 0 | 0 | 3948 | 93 | 439 | 117 | 4597 |
| Homoplastic | Pullorum + Gallinarum | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 |
| Homoplastic | Pullorum | 1 | 1 | 5 | 6 | 113 | 4 | 47 | 117 |
| Homoplastic | Gallinarum | 0 | 0 | 8 | 236 | 84 | 16 | 38 | 382 |
| Non-homoplastic | Dublin versus all (a) | 3129 | 819 | 87 | 0 | 0 | 438 | 115 | 4588 |
| Non-homoplastic | Enteritidis | 0 | 0 | 0 | 0 | 1 | 0 | 2 | 3 |
| Non-homoplastic | Pullorum + Gallinarum | 0 | 0 | 31 | 0 | 0 | 0 | 5 | 36 |
| Non-homoplastic | Pullorum | 95 | 139 | 81 | 0 | 0 | 15 | 27 | 357 |
| Non-homoplastic | Gallinarum | 5 | 1 | 108 | 0 | 0 | 3 | 38 | 155 |
The variant calling analysis was performed with the ‘VARCall’ workflow (i.e. 12,929 SNPs and 1157 small InDels). The fixed non-homoplastic variants are defined by common genotypes across the considered group of genomes, as well as different genotypes in all the other compared genomes. The fixed homoplastic variants are defined by common genotypes across the considered group of genomes and genomes of independent phylogenetic clades, as well as different genotypes in genomes of the compared child-leaves. According to the variant annotation performed with SnpEff against reference genome S. Enteritidis (strain P125109, accession NC_011294.1), the fixed variants presenting reference (r) and alternative (synonymous: s; non-synonymous: ns) genotypes are presented
a: Analysis performed with the script ‘phyloFixedVar’ (i.e. dependently of the phylogenetic inference)
b: Analysis performed with the script ‘FixedVar’ (i.e. independently of the phylogenetic inference)
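The definition of a fixed non-homoplastic variant given in the table notes can be expressed as a short check at a single position. This is a minimal sketch under stated assumptions: the function name, the dictionary layout, and the genome labels are hypothetical, and the sketch ignores the phylogeny, so it covers only the phylogeny-independent case and not the homoplastic definition.

```python
def is_fixed_non_homoplastic(genotypes, group):
    """Check one position.

    genotypes: {genome_name: allele} for all compared genomes at this position.
    group: set of genome names forming the considered group.
    """
    inside = {genotypes[g] for g in group}
    outside = {allele for g, allele in genotypes.items() if g not in group}
    # Fixed: one shared genotype in the group.
    # Non-homoplastic: that genotype is absent from every other genome.
    return len(inside) == 1 and inside.isdisjoint(outside)


# Hypothetical genotypes at two positions (labels are illustrative only).
site = {"Dub1": "T", "Dub2": "T", "Ent1": "C", "Pul1": "C", "Gal1": "C"}
shared_elsewhere = {"Dub1": "T", "Dub2": "T", "Ent1": "C", "Pul1": "T", "Gal1": "C"}
dublin = {"Dub1", "Dub2"}
```

With these toy inputs, `site` qualifies for the `dublin` group, whereas `shared_elsewhere` does not because the group genotype also occurs in another genome.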
Gene-ontology (GO) terms of intragenic and non-homoplastic variants (SNPs and InDels) fixed in Salmonella enterica subsp. enterica serovars Dublin versus all the other genomes (Ontology 1 called ‘Dub_All’), Pullorum/Gallinarum versus Enteritidis (Ontology 2 called ‘Ent_Pull/Gall’), Pullorum versus Gallinarum (Ontology 3 called ‘Pull_Gall’), and Gallinarum versus Pullorum (Ontology 4 called ‘Gall_Pull’)
| Gene-ontology enrichment analysis | GO ID | GO-term | Number of hits | Expected number of hits | GO-level | p-value | Corrected p-value | Ontology |
|---|---|---|---|---|---|---|---|---|
| 1 | GO:0006105 | succinate metabolic process | 36 | 14.692 | 8 | 5.8×10−13 | 8.1×10−10 | BP |
| 1 | GO:0006307 | DNA dealkylation involved in DNA repair | 4 | 1.433 | 9 | 1.0×10−42 | 1.0×10−39 | BP |
| 1 | GO:0006468 | protein phosphorylation | 61 | 40.850 | 8 | 3.9×10−05 | 5.5×10−02 | BP |
| 1 | GO:0006520 | cellular amino acid metabolic process | 533 | 445.766 | 7 | 6.4×10−08 | 9.0×10−05 | BP |
| 1 | GO:0006525 | arginine metabolic process | 104 | 53.392 | 10 | 8.4×10−18 | 1.1×10−14 | BP |
| 1 | GO:0006527 | arginine catabolic process | 62 | 25.800 | 11 | 1.3×10−19 | 1.9×10−16 | BP |
| 1 | GO:0006545 | glycine biosynthetic process | 5 | 1.792 | 11 | 1.0×10−42 | 1.0×10−39 | BP |
| 1 | GO:0006560 | proline metabolic process | 81 | 46.583 | 10 | 2.4×10−10 | 3.4×10−07 | BP |
| 1 | GO:0006562 | proline catabolic process | 27 | 12.183 | 11 | 3.4×10−08 | 4.9×10−05 | BP |
| 1 | GO:0009064 | glutamine family amino acid metabolic process | 190 | 116.817 | 9 | 4.6×10−17 | 6.5×10−14 | BP |
| 1 | GO:0009065 | glutamine family amino acid catabolic process | 89 | 37.983 | 10 | 2.4×10−25 | 3.4×10−22 | BP |
| 1 | GO:0009233 | menaquinone metabolic process | 27 | 13.258 | 6 | 9.0×10−07 | 1.2×10−03 | BP |
| 1 | GO:0009234 | menaquinone biosynthetic process | 27 | 13.258 | 7 | 9.0×10−07 | 1.2×10−03 | BP |
| 1 | GO:0010133 | proline catabolic process to glutamate | 27 | 12.183 | 11 | 3.4×10−08 | 4.9×10−05 | BP |
| 1 | GO:0019544 | arginine catabolic process to glutamate | 10 | 3.942 | 12 | 1.2×10−05 | 1.7×10−02 | BP |
| 1 | GO:0019545 | arginine catabolic process to succinate | 36 | 14.692 | 9 | 5.8×10−13 | 8.1×10−10 | BP |
| 1 | GO:0035510 | DNA dealkylation | 4 | 1.433 | 8 | 1.0×10−42 | 1.0×10−39 | BP |
| 1 | GO:1901565 | organonitrogen compound catabolic process | 162 | 112.517 | 5 | 3.8×10−09 | 5.3×10−06 | BP |
| 1 | GO:1901605 | alpha-amino acid metabolic process | 320 | 240.441 | 8 | 8.0×10−11 | 1.1×10−07 | BP |
| 1 | GO:1901606 | alpha-amino acid catabolic process | 100 | 62.350 | 9 | 1.9×10−09 | 2.7×10−06 | BP |
| 1 | GO:0009379 | Holliday junction helicase complex | 4 | 1.408 | 5 | 1.0×10−42 | 1.0×10−39 | CC |
| 1 | GO:0003842 | 1-pyrroline-5-carboxylate dehydrogenase activity | 27 | 12.957 | 6 | 1.5×10−07 | 1.8×10−04 | MF |
| 1 | GO:0003908 | methylated-DNA-[protein]-cysteine S-methyltransferase activity | 4 | 1.524 | 7 | 1.0×10−42 | 1.0×10−39 | MF |
| 1 | GO:0004020 | adenylylsulfate kinase activity | 4 | 1.524 | 6 | 1.0×10−42 | 1.0×10−39 | MF |
| 1 | GO:0004072 | aspartate kinase activity | 6 | 2.286 | 6 | 1.0×10−42 | 1.0×10−39 | MF |
| 1 | GO:0004372 | glycine hydroxymethyltransferase activity | 5 | 1.905 | 6 | 1.0×10−42 | 1.0×10−39 | MF |
| 1 | GO:0004412 | homoserine dehydrogenase activity | 6 | 2.286 | 6 | 1.0×10−42 | 1.0×10−39 | MF |
| 1 | GO:0004657 | proline dehydrogenase activity | 27 | 12.957 | 5 | 1.5×10−07 | 1.8×10−04 | MF |
| 1 | GO:0004743 | pyruvate kinase activity | 10 | 3.811 | 6 | 1.0×10−42 | 1.0×10−39 | MF |
| 1 | GO:0004815 | aspartate-tRNA ligase activity | 5 | 1.905 | 7 | 1.0×10−42 | 1.0×10−39 | MF |
| 1 | GO:0008770 | [acyl-carrier-protein] phosphodiesterase activity | 4 | 1.524 | 7 | 1.0×10−42 | 1.0×10−39 | MF |
| 1 | GO:0009015 | N-succinylarginine dihydrolase activity | 12 | 4.954 | 6 | 3.5×10−06 | 4.1×10−03 | MF |
| 1 | GO:0009017 | succinylglutamate desuccinylase activity | 10 | 4.192 | 6 | 2.4×10−05 | 2.8×10−02 | MF |
| 1 | GO:0015166 | polyol transmembrane transporter activity | 12 | 4.573 | 6 | 1.0×10−42 | 1.0×10−39 | MF |
| 1 | GO:0015169 | glycerol-3-phosphate transmembrane transporter activity | 12 | 4.573 | 8 | 1.0×10−42 | 1.0×10−39 | MF |
| 1 | GO:0015430 | glycerol-3-phosphate-transporting ATPase activity | 6 | 2.286 | 9 | 1.0×10−42 | 1.0×10−39 | MF |
| 1 | GO:0015605 | organophosphate ester transmembrane transporter activity | 12 | 4.573 | 5 | 1.0×10−42 | 1.0×10−39 | MF |
| 1 | GO:0016749 | N-succinyltransferase activity | 5 | 1.905 | 7 | 1.0×10−42 | 1.0×10−39 | MF |
| 1 | GO:0018480 | 5-carboxymethyl-2-hydroxymuconic-semialdehyde dehydrogenase activity | 8 | 3.049 | 6 | 1.0×10−42 | 1.0×10−39 | MF |
| 2 | GO:0003973 | (S)-2-hydroxy-acid oxidase activity | 4 | 0.013 | 6 | 1.1×10−12 | 1.3×10−09 | MF |
| 2 | GO:0016899 | oxidoreductase activity, acting on the CH-OH group of donors, oxygen as acceptor | 4 | 0.013 | 5 | 1.1×10−12 | 1.3×10−09 | MF |
| 3 | GO:0043603 | cellular amide metabolic process | 36 | 17.451 | 5 | 1.9×10−05 | 2.7×10−02 | BP |
| 3 | GO:0006428 | isoleucyl-tRNA aminoacylation | 4 | 0.445 | 11 | 4.7×10−05 | 6.7×10−02 | BP |
| 3 | GO:0006522 | alanine metabolic process | 4 | 0.376 | 10 | 1.8×10−05 | 2.5×10−02 | BP |
| 3 | GO:0009078 | pyruvate family amino acid metabolic process | 4 | 0.376 | 9 | 1.8×10−05 | 2.5×10−02 | BP |
| 3 | GO:0004822 | isoleucine-tRNA ligase activity | 4 | 0.475 | 7 | 6.5×10−05 | 7.5×10−02 | MF |
| 3 | GO:0015079 | potassium ion transmembrane transporter activity | 8 | 1.680 | 9 | 3.7×10−05 | 4.2×10−02 | MF |
| 3 | GO:0008079 | translation termination factor activity | 5 | 0.657 | 7 | 3.0×10−05 | 3.4×10−02 | MF |
| 3 | GO:0003747 | translation release factor activity | 5 | 0.657 | 8 | 3.0×10−05 | 3.4×10−02 | MF |
| 4 | GO:0043603 | cellular amide metabolic process | 36 | 17.473 | 5 | 2.0×10−05 | 2.8×10−02 | BP |
| 4 | GO:0006428 | isoleucyl-tRNA aminoacylation | 4 | 0.445 | 11 | 4.8×10−05 | 6.7×10−02 | BP |
| 4 | GO:0006522 | alanine metabolic process | 4 | 0.377 | 10 | 1.8×10−05 | 2.5×10−02 | BP |
| 4 | GO:0009078 | pyruvate family amino acid metabolic process | 4 | 0.377 | 9 | 1.8×10−05 | 2.5×10−02 | BP |
| 4 | GO:0004822 | isoleucine-tRNA ligase activity | 4 | 0.475 | 7 | 6.5×10−05 | 7.5×10−02 | MF |
| 4 | GO:0015079 | potassium ion transmembrane transporter activity | 8 | 1.681 | 9 | 3.7×10−05 | 4.2×10−02 | MF |
| 4 | GO:0008079 | translation termination factor activity | 5 | 0.658 | 7 | 3.0×10−05 | 3.4×10−02 | MF |
| 4 | GO:0003747 | translation release factor activity | 5 | 0.658 | 8 | 3.0×10−05 | 3.4×10−02 | MF |
The identification of variants, detection of fixed variants, assignment of GO-terms to variants, and gene-ontology enrichment analysis were performed with the scripts ‘VARCall’, ‘phyloFixedVar’, ‘GetGOxML’, and ‘EveryGO’, respectively. The level, biological process (BP), molecular function (MF), and cellular component (CC) of GO-terms are represented. The p-values of hypergeometric tests were adjusted by Bonferroni correction. The lowest corrected p-values representing GO-terms highly impacted by fixed variants (i.e. < 5.0×10−2), the highest GO-levels presenting the most accurate GO-terms (i.e. ≥ 5), and the highest number of hits representing relevant GO-terms quantitatively (i.e. ≥ 4) are presented
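The hypergeometric test and Bonferroni correction mentioned in these notes can be sketched in pure Python. This is an illustrative stand-in, not the ‘EveryGO’ implementation: it uses a plain hypergeometric upper tail rather than the parent-child approach, and all function and parameter names, as well as the toy numbers, are assumptions.

```python
from math import comb


def expected_hits(M, n, N):
    """Expected number of annotated items when drawing N from M, n of which are annotated."""
    return N * n / M


def hypergeom_upper_tail(k, M, n, N):
    """P(X >= k): probability of at least k annotated items among N drawn from M."""
    upper = min(n, N)
    total = comb(M, N)
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favourable / total


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Toy example: 10 genes in total, 4 carry the GO-term, 3 genes are hit by fixed variants,
# and 2 of those 3 carry the term.
p = hypergeom_upper_tail(k=2, M=10, n=4, N=3)
corrected = bonferroni([0.01, 0.5, 0.04])
```

In the toy case the upper-tail probability is 40/120, and the correction multiplies each raw p-value by the three tests performed.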
Impacts of translation and function of proteins encoded by genes presenting GO-terms highly impacted by intragenic and non-homoplastic fixed variants in Salmonella enterica subsp. enterica serovars Pullorum and Gallinarum
| GO | Position | Genes | Reference genotype | Genotype in Gallinarum | Genotype in Pullorum | Protein translation impact in Gallinarum | Protein translation impact in Pullorum | Impact on protein function |
|---|---|---|---|---|---|---|---|---|
| GO:0006428 | 54,044 | SEN_RS00235 | G | G | T | Null | Missense variant | Modification in Pullorum |
| GO:0006428 | 54,289 | SEN_RS00235 | T | T | C | Null | Synonymous variant | Potential modification |
| GO:0006428 | 54,658 | SEN_RS00235 | C | C | T | Null | Synonymous variant | Potential modification |
| GO:0006428 | 55,063 | SEN_RS00235 | G | G | A | Null | Synonymous variant | Potential modification |
| GO:0006522 | 1,313,705 | SEN_RS06395 | C | C | T | Null | Stop gained | Partial loss in Pullorum |
| GO:0006522 | 1,313,706 | SEN_RS06395 | A | A | G | Null | Missense variant | Modification in Pullorum |
| GO:0004822 | 54,044 | SEN_RS00235 | G | G | T | Null | Missense variant | Modification in Pullorum |
| GO:0004822 | 54,289 | SEN_RS00235 | T | T | C | Null | Synonymous variant | Potential modification |
| GO:0004822 | 54,658 | SEN_RS00235 | C | C | T | Null | Synonymous variant | Potential modification |
| GO:0004822 | 55,063 | SEN_RS00235 | G | G | A | Null | Synonymous variant | Potential modification |
| GO:0015079 | 99,941 | SEN_RS00445 | C | CGCTGGG | C | Disruptive inframe insertion | Null | Partial loss in Gallinarum |
| GO:0015079 | 3,489,575 | SEN_RS17095 | C | CG | C | Frameshift variant | Null | Modification in Gallinarum |
| GO:0003747 | 1,343,057 | SEN_RS06530 | C | C | A | Null | Synonymous variant | Potential modification |
| GO:0003747 | 1,343,237 | SEN_RS06530 | A | A | G | Null | Synonymous variant | Potential modification |
| GO:0003747 | 339,779 | SEN_RS01530 | G | G | T | Null | Missense variant | Modification in Pullorum |
The identification of variants, detection of fixed variants, assignment of GO-terms to variants, and gene-ontology enrichment analysis were performed with the scripts ‘VARCall’, ‘phyloFixedVar’, ‘GetGOxML’, and ‘EveryGO’, respectively. The variant annotation was performed with SnpEff against reference genome S. Enteritidis (strain P125109, accession NC_011294.1). The p-values of hypergeometric tests were adjusted by Bonferroni correction. The lowest corrected p-values representing GO-terms highly impacted by fixed variants (i.e. < 5.0×10−2), the highest GO-levels presenting the most accurate GO-terms (i.e. ≥ 5), and the highest number of hits representing relevant GO-terms quantitatively (i.e. ≥ 4) are presented
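The three selection thresholds stated in these notes (corrected p-value < 5.0×10−2, GO-level ≥ 5, number of hits ≥ 4) can be written as a simple filter. The record type, field names, and the toy records with placeholder identifiers are hypothetical and serve only to illustrate the criteria.

```python
from dataclasses import dataclass


@dataclass
class GoResult:
    go_id: str
    hits: int
    level: int
    corrected_p: float


def passes_thresholds(result, max_p=5.0e-2, min_level=5, min_hits=4):
    """Apply the three criteria: corrected p-value, GO-level, and number of hits."""
    return (
        result.corrected_p < max_p
        and result.level >= min_level
        and result.hits >= min_hits
    )


# Toy records with placeholder identifiers (not taken from the table).
records = [
    GoResult("GO:toy_A", hits=12, level=7, corrected_p=1.0e-4),  # passes all three
    GoResult("GO:toy_B", hits=3, level=9, corrected_p=1.0e-6),   # too few hits
    GoResult("GO:toy_C", hits=20, level=4, corrected_p=1.0e-3),  # level too low
    GoResult("GO:toy_D", hits=8, level=6, corrected_p=0.2),      # not significant
]
selected = [r.go_id for r in records if passes_thresholds(r)]
```

Keeping the thresholds as keyword arguments makes it easy to test how sensitive a selection is to each cutoff.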
Fig. 5 Genes impacted by single nucleotide polymorphisms (SNPs), involved in the amino acid catabolism, and fixed at the branch representing divergence between Salmonella serovars Dublin and Enteritidis/Pullorum/Gallinarum. Round bars represent missense (white) and synonymous (grey) SNPs
Fig. 6 Amino acid pathways in which intragenic and non-homoplastic fixed single nucleotide polymorphisms (SNPs) differentiating Salmonella serovars Dublin versus Enteritidis have been detected. The dotted lines represent enzymatic steps for which the corresponding genes encoding enzymes have been specifically mutated. AST, NADH, OAA and PPi stand for ammonia-producing arginine succinyltransferase, nicotinamide adenine dinucleotide, oxaloacetic acid and pyrophosphate, respectively. The KEGG database was used as the reference pathway database (Nucl. Acids Res. 2016;44:D457–62)
Fig. 7 Programs (i.e. black letters) and commands (i.e. grey letters) implemented in the ‘VARCall’ workflow, which calls single nucleotide polymorphisms (SNPs) and small insertions/deletions (InDels). The scripts handling alignment against the reference genome (i.e. ‘BAMmaker’), variant calling (i.e. ‘VCFmaker_SNP’ and ‘VCFmaker_INDEL’), variant combination (i.e. ‘SNP-INDEL_merge’), pairwise distances (i.e. ‘VCFtoMATRIX’), variant concatenation (i.e. ‘VCFtoFASTA’), pseudogenome assemblies (i.e. ‘VCFtoPseudoGenome’), and the report on breadth and depth of coverage (i.e. ‘reportMaker’) were written in Python 2.7 and are invoked by the driver script ‘VARCall’ (i.e. black arrow). The script ‘BAMmaker’ is run for each genome (i.e. circular arrow)
Fig. 8 Programs (i.e. black letters) and corresponding effects (i.e. grey letters) implemented in the scripts ‘phyloFixedVar’, ‘GetGOxML’ and ‘EveryGO’, which respectively identify sensitive (Se) and specific (Sp) variants (SNPs and InDels) at each branch of the corresponding phylogenetic inference, associate prokaryotic gene ontology (GO) terms with these homoplastic and/or non-homoplastic variants, and perform gene-ontology enrichment analysis based on the parent-child approach integrating hypergeometric tests and Bonferroni corrections. Online databases are queried by the scripts ‘GOSlimer’ and ‘GOxML’ (i.e. clouds). The GO database of the Gene Ontology Consortium is used by the script ‘GOSlimer’ to identify prokaryotic GO-terms. The QuickGO browser of the UniProt GO annotation program is queried by the script ‘GOxML’ to associate the variants with the corresponding GO-terms. These scripts were written in Python 2.7 and call the R functions ‘p.adjust’ and ‘phyper’. The whole workflow is semi-automated (i.e. grey arrows) and the scripts ‘GetGOxML’ and ‘EveryGO’ can be run for each variant and each branch, respectively (i.e. circular arrow)