Julian Parkhill1, Sharon J Peacock2, Claire Chewapreecha3,4,5,2, Alison E Mather6,7, Simon R Harris5, Martin Hunt5, Matthew T G Holden8, Chutima Chaichana9, Vanaporn Wuthiekanun3, Gordon Dougan2, Nicholas P J Day3,10, Direk Limmathurotsakul3,10.
Abstract
The environmental bacterium Burkholderia pseudomallei causes melioidosis, an important endemic human disease in tropical and sub-tropical countries. This bacterium occupies broad ecological niches including soil, contaminated water, single-cell microbes, plants and infection in a range of animal species. Here, we performed genome-wide association studies for genetic determinants of environmental and human adaptation using a combined dataset of 1,010 whole genome sequences of B. pseudomallei from Northeast Thailand and Australia, representing two major disease hotspots. With these data, we identified 47 genes from 26 distinct loci associated with clinical or environmental isolates from Thailand and replicated 12 genes in an independent Australian cohort. We next outlined the selective pressures on the genetic loci (dN/dS) and the frequency at which they had been gained or lost throughout their evolutionary history, reflecting the bacterial adaptability to a wide range of ecological niches. Finally, we highlighted loci likely implicated in human disease.
Keywords: Bacterial genetics; Evolutionary genetics; Genome-wide association studies
Year: 2019; PMID: 31799430; PMCID: PMC6874650; DOI: 10.1038/s42003-019-0678-x
Source DB: PubMed; Journal: Commun Biol; ISSN: 2399-3642
Fig. 1 Sampling framework for B. pseudomallei isolates from the case-control study. a The chart shows the number of clinical and environmental isolates from patients and/or household water supplies of cases (patients with melioidosis) and controls (patients with non-infectious conditions admitted during the same period). b Temporal distribution of environmental and disease isolates in the discovery dataset collected from June 2010 to January 2012. Excluding months with no house visits, the monthly numbers of clinical and environmental samples collected were positively correlated (linear regression, adjusted R-square = 0.259, p value = 0.026). c Spatial and temporal distribution of environmental and disease isolates in the validation dataset from the public database.
Fig. 2 Population structure and phylogeny of B. pseudomallei isolated from patients and their household water supplies in northeast Thailand. a Multi-dimensional scaling based on the two dimensions that best explained data variability. b Maximum likelihood phylogeny generated from core gene SNPs in northeast Thailand isolates, rooted on an Australasian outgroup, Bp668. Nodes with bootstrap support of >70 are shown by black dots, with inner black rings marking the 5 monophyletic groups in which detailed phylogeny-based analyses were performed. The outer coloured ring shows the isolate source. The grey arches represent an analysis of 27 cases who each had a clinical isolate and up to 10 isolates cultured from their water supply, showing the connections between isolates from each case. Source data used to plot (a) is available in Supplementary Data 11.
Fig. 3 Genetic relatedness between clinical and environmental isolates from households. a Boxplot summarises the pairwise SNP distance between each clinical isolate and its closest environmental isolate from each monophyletic group after removing recombination signals. The pairwise SNP distance between two clinical isolates cultured from the same patient was included as a threshold. b Correlation between pairwise SNP distance and geographical distance for each clinical isolate and its closest environmental isolate. c–g Geographical distance between each clinical isolate and its closest environmental isolate, by monophyletic group. Red and blue dots represent clinical and environmental isolates, respectively. Colour shade of the links indicates the pairwise SNP distance between the pair. Source data used to plot (a) and (b) is available in Supplementary Data 12 and 13, respectively.
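The pairwise SNP distance used in Fig. 3 can be illustrated with a minimal Python sketch. It assumes the isolates are already aligned to the same length and simply counts differing sites, skipping ambiguous bases; it does not reproduce the recombination filtering mentioned in the caption. The function names and the toy sequences are hypothetical and are not taken from the study.

```python
def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count aligned sites where two sequences differ, skipping ambiguous sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a not in ignore and b not in ignore
    )


def closest_environmental(clinical_seq, environmental):
    """Return (name, distance) for the environmental isolate nearest to a clinical one."""
    return min(
        ((name, snp_distance(clinical_seq, seq)) for name, seq in environmental.items()),
        key=lambda pair: pair[1],
    )


# Toy alignment (hypothetical data, for illustration only)
clinical = "ACGTACGTAC"
environmental = {
    "env_1": "ACGTACGTTC",  # differs at one site
    "env_2": "TCGAACGTTC",  # differs at three sites
    "env_3": "ACGNACGTTG",  # differs at two sites; the N is skipped
}

name, distance = closest_environmental(clinical, environmental)
print(name, distance)
```

In practice the distances would be computed from a core-genome SNP alignment, and the closest pair for each clinical isolate would then be compared against the within-patient threshold described in panel a.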
Fig. 4 B. pseudomallei disease- and environment-associated genes. a Bar charts summarise the frequency of disease- or environment-associated genes by functional category. The plots are ranked by categorical gene frequency: unknown category (n = 13 genes), potential roles in pathogenicity (n = 13 genes), replication, recombination and repair (n = 13 genes), cell wall membrane envelope biogenesis (n = 3 genes), secondary metabolite biosynthesis (n = 3 genes), and energy production and conservation (n = 2 genes). b Distance network reveals genetic loci enriched in disease- and environment-associated isolates. The network was constructed from distances between disease- and environment-associated genes that fell within the operon size described by the transcriptional unit, as reported in Ooi et al. 2013. Each node represents a gene, with the edge thickness proportional to the frequency of each gene pair observed in the population. The largest disease-associated locus identified in this dataset was the toxin complex. For a and b, the colour indicates the effect size and directionality of association on the scale of log10(Odds ratio), with red and blue representing association with disease and the environment, respectively.
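The colour scale in Fig. 4, log10(Odds ratio), can be illustrated with a minimal Python sketch based on a plain 2x2 presence/absence table. This is only an illustration of the scale and not the study's association method, which is not described in this record and may adjust for population structure. The function name, the optional continuity correction (0.5 added to every cell) and the example counts are assumptions.

```python
import math


def log10_odds_ratio(clin_with, clin_without, env_with, env_without, correction=0.5):
    """log10 odds ratio of carrying a gene in clinical vs environmental isolates.

    `correction` is added to every cell to avoid division by zero
    (an assumed convenience, not something taken from the study).
    """
    a = clin_with + correction
    b = clin_without + correction
    c = env_with + correction
    d = env_without + correction
    return math.log10((a * d) / (b * c))


# Hypothetical counts: gene present in 40 of 100 clinical and 10 of 100 environmental isolates
disease_like = log10_odds_ratio(40, 60, 10, 90, correction=0)
# Mirror image: gene enriched in environmental isolates
environment_like = log10_odds_ratio(10, 90, 40, 60, correction=0)

print(round(disease_like, 3), round(environment_like, 3))
```

Positive values correspond to the disease-associated (red) end of the scale and negative values to the environment-associated (blue) end.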
Fig. 5 The selective pressure on disease- and environment-associated genes and the frequency at which they had been gained or lost throughout their evolutionary history. a The dN/dS of core genes, accessory genes, disease-associated genes, and environment-associated genes are plotted on a log10 scale. A two-sided Mann–Whitney U test was used to compare categories. b The ratio of gene gain minus gene loss over the total gain and loss events for disease-associated genes and environment-associated genes. Independent observations were drawn from five monophyletic groups. ANOVA was used to test for differences between groups, with observations from different monophyletic groups treated as replicates for each gene where available. Where a gene had multiple observations, the mean across monophyletic groups was taken. For a and b, boxplots summarise the distribution of data based on the first quartile, median and third quartile. c A summary of net gain or loss events across all five groups. Yellow and purple bars indicate greater net gain and greater net loss of each gene, respectively. Source data used to plot (a) and (b) is available in Supplementary Data 14 and 15, respectively.
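The ratio plotted in Fig. 5b, (gain - loss) / (gain + loss), together with the averaging across monophyletic groups, can be sketched in Python as follows. The function names, the handling of groups with no events (skipped) and the example counts are assumptions made for illustration and are not taken from the study.

```python
from statistics import mean


def net_gain_ratio(gains, losses):
    """Return (gains - losses) / (gains + losses), or None when no events were observed."""
    total = gains + losses
    if total == 0:
        return None
    return (gains - losses) / total


def mean_ratio_across_groups(events):
    """Average the ratio over monophyletic groups that have at least one event.

    `events` is a list of (gains, losses) tuples, one per monophyletic group.
    """
    ratios = [net_gain_ratio(g, l) for g, l in events]
    ratios = [r for r in ratios if r is not None]
    return mean(ratios) if ratios else None


# Hypothetical gene observed in three groups: (gains, losses) per group
gene_events = [(3, 1), (0, 2), (0, 0)]
print(mean_ratio_across_groups(gene_events))
```

The ratio ranges from -1 (only losses) to +1 (only gains), so a value near zero indicates a gene gained and lost at similar frequencies.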