Literature DB >> 25803742

Genomic characterization of Burkholderia pseudomallei isolates selected for medical countermeasures testing: comparative genomics associated with differential virulence.

Jason W Sahl1, Christopher J Allender2, Rebecca E Colman3, Katy J Califf2, James M Schupp3, Bart J Currie4, Kristopher E Van Zandt5, H Carl Gelhaus5, Paul Keim1, Apichai Tuanyok6.   

Abstract

Burkholderia pseudomallei is the causative agent of melioidosis and a potential bioterrorism agent. In the development of medical countermeasures against B. pseudomallei infection, the US Food and Drug Administration (FDA) animal Rule recommends using well-characterized strains in animal challenge studies. In this study, whole genome sequence data were generated for 6 B. pseudomallei isolates previously identified as candidates for animal challenge studies; an additional 5 isolates were sequenced that were associated with human inhalational melioidosis. A core genome single nucleotide polymorphism (SNP) phylogeny inferred from a concatenated SNP alignment from the 11 isolates sequenced in this study and a diverse global collection of isolates demonstrated the diversity of the proposed Animal Rule isolates. To understand the genomic composition of each isolate, a large-scale blast score ratio (LS-BSR) analysis was performed on the entire pan-genome; this demonstrated the variable composition of genes across the panel and also helped to identify genes unique to individual isolates. In addition, a set of ~550 genes associated with pathogenesis in B. pseudomallei were screened against the 11 sequenced genomes with LS-BSR. Differential gene distribution for 54 virulence-associated genes was observed between genomes and three of these genes were correlated with differential virulence observed in animal challenge studies using BALB/c mice. Differentially conserved genes and SNPs associated with disease severity were identified and could be the basis for future studies investigating the pathogenesis of B. pseudomallei. Overall, the genetic characterization of the 11 proposed Animal Rule isolates provides context for future studies involving B. pseudomallei pathogenesis, differential virulence, and efficacy to therapeutics.

Entities:  

Mesh:

Year:  2015        PMID: 25803742      PMCID: PMC4372212          DOI: 10.1371/journal.pone.0121052

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Burkholderia pseudomallei is a pathogen endemic to Southeast Asia and Northern Australia but is increasingly found in other parts of the world including India, South America, and Africa, where it is naturally found in soil and water [1]. The bacterium is the causative agent of melioidosis [2-5], a potentially fatal disease in humans. B. pseudomallei is also considered to be a Tier 1 biothreat agent due to its ease of attainment, ability to cause lethal disease, intrinsic antibiotic resistance [6], and lack of a melioidosis vaccine [7]. The development of appropriate medical countermeasures against melioidosis has been hampered by access to human patients for clinical trials with compounds that are not currently approved for the treatment of melioidosis. To address this concern, the US Food and Drug Administration (FDA) has instituted the “Animal Rule” 21 CFR that calls for well-characterized strains to be used in animal challenge studies [8], including BALB/c mice, which have shown to represent acute human melioidosis [9]. Based on several selection criteria, a recent study selected a panel of six B. pseudomallei strains that would be appropriate for challenge studies under the FDA Animal Rule [7]. In the current study, we used whole-genome sequencing (WGS) to genetically characterize a panel of B. pseudomallei strains to be used as challenge material in therapeutic efficacy studies under the Animal Rule. In addition, we sequenced 5 B. pseudomallei strains associated with inhalational disease for evaluation as potential challenge strains. The purpose of WGS on these isolates was to (1) characterize the genomic background in each isolate; (2) identify the phylogenetic diversity of panel isolates in the context of a global set of genomes and; (3) identify the distribution of characterized virulence factors for correlation with virulence data obtained in animal challenge studies.

Methods

Strain selection

Eleven diverse isolates were selected for sequencing (Table 1). Six of these isolates were previously selected as part of a proposed B. pseudomallei strain panel, based on several selection criteria [7]. For five of these isolates, there are finished genome assemblies available in public databases [10]; these genomes were sequenced to identify any mutations compared to the published genomes. The genome for an additional isolate, NCTC 13392, has previously been published [11]. An additional 5 isolates were selected based on recent isolation and suspected inhalational disease and were associated with acute pneumonia sepsis.
Table 1

Details of isolates sequenced in current study.

IsolateIsolation sourceIsolation countryIsolation YearPassagesSRAGenbank AccessionBEI accession
1106a* liver abscessThailand199310SRX263957N/ANR-44208
K96243* bloodThailand199622SRR797065N/ANR-44206
MSHR305* brainAustralia199410SRX263963N/ANR-44225
MSHR668* bloodAustralia199511SRX259746N/ANR-44224
406e* toe swabThailand198810SRX256398AQTK00000000NR-44207
NCTC 13392* unknownThailand1996unknownSRX245558AOUG00000000N/A
1026b* bloodThailand19939SRX259661N/ANR-9910
MSHR5855sputumAustralia20117SRX526461JMMV00000000NR-45120
MSHR5858sputumAustralia20112SRX264496AVAK00000000NR-45120
HBPUB10134asputumThailand201010SRR796658AVAL00000000NR-44220
HBPUB10303asputumThailand201110SRX254948AVAM00000000NR-44221

*genome has been sequenced previously

*genome has been sequenced previously

Animal challenge studies

285 BALB/c mice (100% female) were purchased from Charles River Laboratories and were randomly selected and placed into challenge groups (n = 7) based on different isolates and dosing. Mice here housed in Innovive IVC mouse racks using disposable caging (7 mice per cage). Sedated mice were challenged by intranasal inoculation (15 μl per nare) of target doses diluted in Dulbecco’s Phosphate-Buffered Saline (PBS); mice were anesthetized intraperitoneally with ketamine (50–120 mg/kg) and xylazine (5–10 mg/kg). Prior to challenge, cultures were grown for 22 hours shaking at 37°C at 250xRPM; no mice were mock-treated in this study. The culture was then centrifuged and re-suspended in PBS containing 0.01% gelatin. The concentration of each challenge dilution was determined by spread plate enumeration. Following challenge, mice were monitored every 8 hours between days 1 and 7, then twice daily between days 8 and 21; sample HBPUB10303a was only challenged for 14 days due to unforeseen delays in starting the experiment. Observations were made for clinical signs of illness, including respiratory distress, loss of appetite and activity, and seizures; any animal judged to be moribund by a trained animal technician was humanely euthanized. All study survivors were humanely euthanized with CO2 inhalation on Study Day 21. Kaplan-Meier survival curves were created using the ‘survival’ package in R [12]. Animal challenge studies were conducted at the Battelle Biomedical Research Center (BBRC). All animal work was approved by Battelle’s IACUC prior to study initiation.

DNA extraction, library creation, sequencing

DNA library constructions were performed using the KAPA Library Preparation Kits with Standard PCR Library Amplification/Illumina series (KAPA biosystems, Boston MA, code KK8201). Quality and quantity of genomic DNA were evaluated by agarose gel analysis. One to two micrograms of DNA per sample were fragmented using a SonicMan (Matrical) with following parameters: 75.0 seconds pre chill, 16 cycles, 10.0 sec sonication, 100% power, 75.0 sec lid chill, 10.0 sec plate chill, and 75.0 sec post chill. The fragmented DNA was purified using QIAGEN QIAquick PCR purification columns (QIAGEN, cat. no. 28104) and eluted into 42.5 μl of Elution Buffer. The adapter ligation used 1.5 μl of the 40 μM adapter oligo mix [13]. Only one post-ligation bead cleanup was done. All purification steps were done with the 1.8x SPRI bead protocol in the KAPA protocol. Size selection of fragments was gel based; 30 μl of clean ligated material was run onto a 2% agarose gel. Several gel slices, corresponding to different average DNA fragment sizes (300, 600, and 1000bp fragments) were extracted from the gel and purified with a QIAGEN Gel Extraction kit (QIAGEN, cat. no. 28704) and eluted in 30 μl of Elution Buffer. Due to the high GC content of the samples, the PCR was optimized to improve yield and genomic coverage. Two microliters of DNA, 2 μl of 10 μM of both primers, 25 μl of NEBNext High-Fidelity 2X PCR Master Mix (New England Biolabs, Ipswich, MA, cat. no. M0541S), and 22 μl of 5 M Betaine (Sigma-Aldrich, St. Louis, MO, cat. no. B0300-1VL) were combined. The following PCR parameters were used: initial denaturation of 2 min at 98°C, 12 cycles of 30 sec at 98°C, 20 sec at 65°C, 30 sec at 72°C, with a final extension of 5 min at 72°C.

Genome assembly

For strains that have been sequenced previously, a comparative assembly approach was employed. Reads were assembled against the reference genome (S1 Table) with AMOScmp [14]. Assembled contigs were then aligned against the reference genome with ABACAS [15] to obtain a genomic scaffold. Gaps in scaffolds were filled with IMAGE [16], which also splits un-filled scaffolds into contigs. In addition to the comparative assembly, reads were also assembled with Abyss v. 1.3.4 [17]. The two assemblies were aligned with Mugsy [18] and regions specific to the de novo assembly were parsed from the MAF file [19], as has been done previously [20]. Putative unique regions in the de novo assembly were aligned against the comparative assembly with BLASTN [21]. Regions that significantly aligned (>90% ID, >90% query length) to the comparative assembly were filtered from the analysis. Remaining regions were combined with the comparative assembly. Assembly errors were corrected from this concatenated assembly with iCORN [22], using ten iterations. For strains that had not been sequenced previously, genomes were assembled de novo with Abyss v 1.3.4 and assembly errors were corrected with iCORN. Assembly details are shown in S1 Table.

In silico multi-locus sequence typing (isMLST)

BLASTN [21] was used to extract sequences from the seven loci in the B. pseudomallei MLST scheme [23] from all genome assemblies. To be considered a match, the alignment from the query genome must match a reference allele 100%. Sequence types were assigned to genomes when exact profile matches were identified. The isMLST functionality was performed with a custom Python script (https://gist.github.com/jasonsahl/33b0d9a8e3ac035bb92c). MLST typing information is shown in S1 Table.

Single nucleotide polymorphism (SNP) and indel identification and annotation

For re-sequencing efforts (Table 1), raw reads were mapped to the finished genome with BWA-MEM v0.7.5 [24]. SNPs and indels were then called with the UnifiedGenotyper in GATK v. 2.7 [25]; nucmer [26] was used to find duplicate regions in the reference genome and any SNPs falling within duplicate regions were filtered from the analysis. For a SNP or indel to be called, we required a minimum coverage of 6x and a minimum proportion threshold of 0.90. Nucleotide variants were annotated with snpEFF [27]. All variants were visually confirmed from BAM files with Tablet [28].

Synteny between previously sequenced genomes

In addition to identifying variants between finished genomes and re-sequencing projects, genome assemblies were aligned to completed genomes with MUMmer [29] and dot plots were visualized with mummerplot to identify any structural variation.

Core genome SNP phylogeny

To visualize the phylogenetic diversity of genomes sequenced in this study, a core genome phylogenetic approach was employed; core regions are defined as sequence conserved in all examined genomes. A diverse set of finished and draft genomes was compiled (S2 Table). Raw reads were mapped to B. pseudomallei K96243 [30] with BWA-MEM [24]. SNPs were called from each BAM file with GATK, using the EMIT_ALL_CONFIDENT_SITES method, with a minimum coverage of 6x and a minimum proportion of 0.90. For genomic assemblies, SNPs were identified from nucmer alignments. Positions in K96243 were directly mapped to the corresponding position in each query genome assembly. A matrix was generated (S1 Dataset) with NASP (http://tgennorth.github.io/NASP/) from all reference positions called and polymorphic sites were identified. SNPs that could not be called by GATK, or failed to pass the depth or proportion filters, were filtered from the matrix, as well as SNPs that fell within identified duplications. The remaining dataset consisted of 62,663 SNPs, 50,290 of them being informative. A maximum likelihood phylogeny was inferred on this dataset with RAxML v8.0.17 [31, 32] using the ASC_GTRGAMMA model and 100 bootstrap replicates. The retention index (RI) value [33] was calculated with Phangorn [34].

SNP and homoplasy density

To identify the conservation of the reference chromosomes, as well as to potentially identify any lateral gene transfer events that may confound the phylogeny, a SNP density (SD) and homoplasy density (HD) approach was employed. The SNP matrix was parsed over 1-kb non-overlapping windows of each chromosome and the number of informative SNPs was then calculated. The dataset was then processed with Paup v4.0b10 [35] to calculate the retention index (RI) value for each SNP. An RI value < 0.5 was considered to be homoplasious and the number of homoplasious SNPs over the same 1-kb window was then calculated. The HD value for each 1-kb window was calculated by dividing the number of homoplasious SNPs by the total number of informative SNPs. The distribution of SD and HD across the two chromosomes in K96243 was visualized with Circos [36].

In silico gene screen

A set of previously described virulence factors [1, 30, 37–42] characterized in B. pseudomallei were compiled (S3 Table). Genes were screened against the genomes sequenced in this study with a large-scale blast score ratio (LS-BSR) approach [43]. Genes were translated with BioPython (www.biopython.org) and aligned against its nucleotide sequence with TBLASTN in order to obtain the maximum alignment (reference) bit score. Each gene was then aligned against each genome with TBLASTN in order to obtain the query alignment bit score. The BSR [44] was obtained by dividing the reference bit score by the query bit score. Genes with a BSR value > 0.90 or < 0.80 in all genomes were removed from the analysis; the complete LS-BSR matrix is available as S2 Dataset. The genes were then correlated with the tree to identify phylogenetic patterns of gene presence/absence.

Genotype and phenotype correlations

Two approaches were performed to determine if there were correlations between genomic information and survival information obtained from animal challenge studies. The survival data were split into three categories: low virulence (100% mouse survival after 21 days), intermediate virulence (<100%, >0% survival after 21 days), and high virulence (0% mouse survival after 21 days). LS-BSR values across all genomes were multiplied by 100 in order to convert all float values to integers. The adjusted LS-BSR values were then correlated with the categorical virulence data using a Kruskal-Wallis test [45] implemented in QIIME v. 1.8.0 [46]. Core genome SNP data were also correlated to categorical data with a chi-square test implemented in SciPy. P-values were corrected with the Benjamini-Hochberg correction [47]. To test for false positives, genomes were randomly assigned to two groups of equal size and the average number of SNPs unique to each group was calculated over 10 iterations.

Unique genomic regions

In addition to screening characterized virulence genes in assembled genomes, a de novo approach was also performed. All coding regions (CDSs) from all genomes in the phylogeny were compared with LS-BSR. Regions were determined to be unique to a given genome if they contained a BSR < 0.4 in all non-targeted genomes. Each unique CDS was then aligned against the GenBank [48] nucleotide database with BLASTN, and the closest hit, based on highest bit score, was identified.

Ethics Statement

The animal protocol (2934–100007643) was approved by the Battelle Institutional Animal Care and Use Committee. The research was conducted in compliance with the Animal Welfare Act and followed the principles in the Guide for the Care and Use of Laboratory Animals from the National Research Council, Office of Laboratory Animal Welfare (OLAW), and USDA. Additionally, the research was conducted following an Institutional Animal Care and Use Committee (IACUC) approved protocol. The institution where the research was conducted is fully accredited by the Association for the Assessment and Accreditation of Laboratory Animal Care International (AAALAC).

Results

Comparisons of re-sequenced isolates with finished genomes

Five of the genomes sequenced in this study represent re-sequencing projects of finished genomes available in public databases (S1 Table). However, due to standard laboratory passages, new nucleotide variants can accumulate [49], and were identified in the current study using raw read data. The results demonstrate that many re-sequenced isolates show little mutation since the genomes were published (Table 2). However, the version of K96243 that was sequenced in the current study showed numerous variant positions (33) compared to the completed genome (Table 2), including the loss of two annotated stop codons. Some of these differences could be errors in the original genome sequence, which we are unable to verify. In addition to the analysis of nucleotide variants, the synteny of genomes was visualized as dot plots (S1 Fig) and demonstrated high synteny between all re-sequenced genome assemblies and finished genomes.
Table 2

Nucleotide variant information for re-sequencing projects conducted in current study.

NameChromosomeCoordinateReferenceQueryLocusEffectAnnotationProportionDepth
K96243NC006350.1549058GABPSL0500non-synonymoushexosaminidase0.99165
K96243NC006350.1549059TGBPSL0500synonymoushexosaminidase1.00163
K96243NC006350.1549061CTBPSL0500non-synonymoushexosaminidase1.00162
K96243NC006350.1549062CTBPSL0500synonymoushexosaminidase1.00165
K96243NC006350.12399742CGBPSL2010non-synonymouslipid metabolism-like protein0.99138
K96243NC006350.12399743CGBPSL2010non-synonymouslipid metabolism-like protein0.99140
K96243NC006351.11607761TCBPSS1194non-synonymouspeptide synthase/polyketide synthase1.008
K96243NC006351.11607796GCBPSS1194non-synonymouspeptide synthase/polyketide synthase0.98100
K96243NC006351.11607820GCBPSS1194non-synonymouspeptide synthase/polyketide synthase0.9999
K96243NC006351.11607822GCBPSS1194synonymouspeptide synthase/polyketide synthase0.9999
K96243NC006351.11607825GCBPSS1194synonymouspeptide synthase/polyketide synthase0.97102
K96243NC006351.11607838TGBPSS1194non-synonymouspeptide synthase/polyketide synthase0.99117
K96243NC006351.11607851TCBPSS1194non-synonymouspeptide synthase/polyketide synthase0.99111
K96243NC006351.11607874GTBPSS1194non-synonymouspeptide synthase/polyketide synthase0.9762
K96243NC006351.11607887GCBPSS1194non-synonymouspeptide synthase/polyketide synthase1.0058
K96243NC006351.11607894GABPSS1194synonymouspeptide synthase/polyketide synthase0.9972
K96243NC006351.11607902GCBPSS1194non-synonymouspeptide synthase/polyketide synthase1.0068
K96243NC006351.11607910CABPSS1194non-synonymouspeptide synthase/polyketide synthase0.9557
K96243NC006351.11607917GCBPSS1194non-synonymouspeptide synthase/polyketide synthase1.0048
K96243NC006351.11607997CAintergenicN/AN/A0.9649
K96243NC006351.11608005TCBPSS1195stop codon destroyednon-ribosomal peptide synthase1.0053
K96243NC006351.11608012GCBPSS1195non-synonymousnon-ribosomal peptide synthase0.9855
K96243NC006351.11608015GTBPSS1195non-synonymousnon-ribosomal peptide synthase1.0055
K96243NC006351.11608017TABPSS1195non-synonymousnon-ribosomal peptide synthase1.0055
K96243NC006351.11608029GCBPSS1195synonymousnon-ribosomal peptide synthase1.0098
K96243NC006351.11615675CTBPSS1197non-synonymousBPSS11970.98192
K96243NC006351.11764438GTintergenicN/AN/A1.0096
K96243NC006351.11764448GTintergenicN/AN/A0.93101
K96243NC006351.12337386ACBPSS1703stop codon destroyedhypothetical protein1.00188
1106aNC_009076.1797819TGBURPS1106A_0812non-synonymoussensor histidine kinase0.99125
1026bNC_017832.12020919GABP1026B_II1596non-synonymoustype VI secretion system, VGR0.91297
668NC_009074.13755785GCBURPS668_3852synonymouschemotaxis protein methyltransferase0.99289
668NC_009075.192668CGCintergenicN/AN/A0.91276

Core genome single nucleotide polymorphism (SNP) phylogeny

To phylogenetically characterize the isolates sequenced in this study, a maximum likelihood phylogeny was inferred from ~63,000 core genome SNPs (Fig. 1) identified from 44 genomes. The results demonstrate that the isolates sequenced in the current study show a broad phylogenetic history compared to previously sequenced isolates. By including phylogenetically diverse isolates in the isolate panel, local patterns of gene distribution do not bias the analysis. The retention index (RI) value of the data and maximum likelihood phylogeny demonstrated signs of homoplasy (RI = 0.62). Recombination in B. pseudomallei has been previously described [23] and homoplasy was anticipated due the recombinatorial nature of the species.
Fig 1

A maximum likelihood phylogeny inferred from a concatenation of ~63,000 core-genome single nucleotide polymorphisms (SNPs) identified in the eleven genomes sequenced in this study, shown in red, and a reference set of genomes (S2 Table).

The tree was inferred with RAxML v8 [31, 32] using the ASC_GTRGAMMA model and 100 bootstrap replicates. Filled circles are placed at nodes where the bootstrap support values are >90%.

A maximum likelihood phylogeny inferred from a concatenation of ~63,000 core-genome single nucleotide polymorphisms (SNPs) identified in the eleven genomes sequenced in this study, shown in red, and a reference set of genomes (S2 Table).

The tree was inferred with RAxML v8 [31, 32] using the ASC_GTRGAMMA model and 100 bootstrap replicates. Filled circles are placed at nodes where the bootstrap support values are >90%. The RI value of the phylogeny demonstrated the presence of homoplasy. Based on this dataset, the presence of homoplasy across the reference genome, K96243, was investigated with a SNP and homoplasy density approach. The results demonstrate that with the isolates tested, chromosome 1 of B. pseudomallei K96243 is more highly conserved than chromosome 2 (Fig. 2). Additionally, the homoplasy is distributed across both chromosomes, with no clear regions associated with specific recombination or lateral gene transfer events.
Fig 2

Plots of single nucleotide polymorphism (SNP) density and homoplasy density (HD), across the two chromosomes of the reference isolate, K96243 [30].

The outer ring represents the number of informative SNPs across 1-kb genomic intervals. The inner ring indicates the number of homoplasious SNPs, as determined by a retention index (RI) value <0.5 calculated by Paup [35], divided by the total number of informative SNPs over the same 1-kb genomic interval. HD and SD values were visualized with Circos [36].

Plots of single nucleotide polymorphism (SNP) density and homoplasy density (HD), across the two chromosomes of the reference isolate, K96243 [30].

The outer ring represents the number of informative SNPs across 1-kb genomic intervals. The inner ring indicates the number of homoplasious SNPs, as determined by a retention index (RI) value <0.5 calculated by Paup [35], divided by the total number of informative SNPs over the same 1-kb genomic interval. HD and SD values were visualized with Circos [36].

Unique coding sequences (CDSs)

B. pseudomallei has a highly plastic genome and has the ability to acquire new genes horizontally from other microorganisms, especially as the pathogen persists in the environment. A large-scale blast score ratio (LS-BSR) analysis was performed on the 44 B. pseudomallei genomes in the phylogeny (Fig. 1) to identify any unique CDSs in the 11 isolates sequenced in the current study; the criteria for a CDS to be considered unique is that it must have a BSR value < 0.4 in all non-targeted genomes. A list of closest BLAST hits to unique CDSs not associated with either B. pseudomallei or B. mallei, based on the highest bit score, is shown in Table 3. These regions are likely associated with genomic islands horizontally transferred from related organisms [50].
Table 3

Annotation for unique genes identified in genomes sequenced in the current study.

Genomeclosest BLAST matchclosest BLAST annotationnearest BLAST organismprotein ID (%)query length
MSHR5858BUPH_05469integrase catalytic subunit Burkholderia phenoliruptrix 9999
NCTC13392BDB_110343hypotheticalblood disease bacterium R2298399
NCTC13392BDB_110341hypotheticalblood disease bacterium R2298298
10134aYP_582472tyrosine-based site-specific recombinase Cupriavidus metallidurans CH346498
10134aYP_005995055hypothetical Ralstonia solanacearum CMR156298
10134aYP_005995053putative integrase Ralstonia solanacearum CMR154998
10134aWP_017232947hypothetical Pandoraea sp. B-6]9199
10134aWP_017232948DEAD/DEAH box helicase Pandoraea sp. B-6]91100
10134aWP_008918033N-6 DNA methylase Burkholderia sp. H1607899
10134aWP_017232950ATP-dependent helicase Pandoraea sp. B-6]8799
10134aWP_006395564hypothetical Achromobacter xylosoxidans 6083
HBPUB10303aYP_443256type I restriction-modification system Burkholderia thailandensis E2647872
HBPUB10303aYP_443255hypothetical Burkholderia thailandensis E2649199
HBPUB10303aYP_005028223hypothetical Dechlorosoma suillum PS7499
HBPUB10303aWP_008248767hypothetical Limnobacter sp. MED1055199
HBPUB10303aYP_006030638helicase domain-containing protein Ralstonia solanacearum Po827999
MSHR5855Bamb_2400phage integrase family protein Burkholderia ambifaria AMMD8996
MSHR5855BTQ_1983putative membrane proten Burkholderia thailandensis 200272172399100
MSHR5855BTJ_373hypothetical Burkholderia thailandensis E44499100
MSHR5855bglu_1g24070hypothetical Burkholderia glumae BGR19199
MSHR5855BTI_1944hypothetical Burkholderia thailandensis MSMB12199100
MSHR5855BTI_1943hypothetical Burkholderia thailandensis MSMB12178100
MSHR5855BTI_1942helix-turn-helix family protein Burkholderia thailandensis MSMB12199100
MSHR5855Rpic12D_1056lipoprotein releasing system Ralstonia pickettii 12D98100

Virulence gene profile

A comprehensive set of virulence-associated genes (S3 Table) was screened against the 11 genomes sequenced in this study with LS-BSR. To only compare differentially conserved regions, genes were filtered if they had a BSR value > 0.90 in all 11 genomes. The resulting variable set of genes (n = 54) was correlated to the phylogeny and LS-BSR values were visualized as a heatmap (Fig. 3). The results demonstrate that phylogenetically-distinct isolates contain a variable composition of virulence-associated genes.
Fig 3

A heatmap of blast score ratio (BSR) values [44] calculated from a known set of virulence factors characterized in B. pseudomallei (S3 Table) with the large-scale blast score ratio (LS-BSR) pipeline [43].

A maximum likelihood phylogeny was inferred on a concatenation of single nucleotide polymorphisms (SNPs) and was correlated to the heatmap.

A heatmap of blast score ratio (BSR) values [44] calculated from a known set of virulence factors characterized in B. pseudomallei (S3 Table) with the large-scale blast score ratio (LS-BSR) pipeline [43].

A maximum likelihood phylogeny was inferred on a concatenation of single nucleotide polymorphisms (SNPs) and was correlated to the heatmap. Every B. pseudomallei isolate in this study contained the B. pseudomallei bimA (BimABp) allele [51], except B. pseudomallei MSHR668, which contained the alternative B. mallei-type (BimABm). The most severe clinical presentations have been associated with the co-occurrence of BimABm with another virulence-associated gene, filamentous hemagglutinin fhaB3 (BPSS2053 in B. pseudomallei K96243), which is linked with adhesion and heightened virulence [52, 53]. While B. pseudomallei MSHR668 is missing fhaB3, it does contain another fhaB gene (similar to fhaB1 from B. pseudomallei MSHR305 [54]). fhaB3 was observed in all Asian isolates in this study, which is consistent with previous work [54, 55]. Isolates sequenced in this study either contained the Yersinia-like fimbriae cluster (YLF) or the B. thailandensis-like flagellum and chemotaxis (BTFC) gene cluster. These genes were included in our analysis because they are suggested as being active during melioidosis. Two isolates in this study, 1026b and MSHR305, exhibited reduced sequence homology to the T6SS-1 gene, BPSS1511. The T6SS-1 representative sequence, icmF gene (BPSS1511), which is required for intracellular growth of many pathogens associated with eukaryotic cells [56], showed homology, but lower sequence identity, in 1026b and MSHR305. Four isolates (MSHR5855, MSHR305, 1106a, and HBPUB10134a) exhibited reduced sequence homology for BPSS1493, a hypothetical protein associated with type VI secretion. To identify differential virulence between ten of the eleven isolates sequenced in this study, BALB/c mice (seven per group) were challenged at different concentrations of inoculum (Table 4). At an average of ~10 colony forming units (CFUs) per group, four of the ten isolates killed all of the mice in the group, 5 of the isolates killed an intermediate number of mice, and one isolate (1106a) killed none of the mice (Table 4, S2 Fig, S4 Table); HBPUB10303a was treated as intermediate in terms of virulence, despite the fact that the isolate was challenged for only 14 days instead of 21 in this experiment. At a high concentration of inoculum (~12,000 CFUs), none of the mice survived when challenged with any of the ten panel isolates. This demonstrates that all of the isolates are virulent by intranasal inoculation, but there is a dose-dependent virulence response.
Table 4

Survival data of 10 strains injected intranasally in BALB/c mice.

Challenge StrainTarget CFU:10Target CFU: 100Target CFU: 1000Target CFU: 10000
CFUs%survivalCFUs%survivalCFUs%survivalCFUs%survival
K9624313(+/-0)57170(+/-21)141581(+/-212)011940(+/-1485)0
MSHR406e13(+/-0)0135(+/-4)01365(+/-106)012840(+/-636)0
1026b12(+/-0)57127(+/-11)01215(+/-106)011355(+/-785)0
1106a12(+/-0)100128(+/-23)711220(+/-112)2912090(+/-2121)0
MSHR30516(+/-1)29149(+/-15)01524(+/-21)014100(+/-2121)0
MSHR66814(+/-0)0122(+/-2)01095(+/-8)012945(+/-912)0
MSHR585516(+/-1)0185(+/-48)01575(+/-136)014550(+/-933)0
MSHR585814(+/-2)86154(+/-8)141485(+/-119)015750(+/-2630)0
HBPUB10303a6(+/-0)7159(+/-1)14537(+/-38)05280(+/-424)0
HBPUB10134a160163017790188100
Differences were observed in both the virulence gene profile and the animal challenge studies. To identify if any CDSs were associated with differential virulence, a combined LS-BSR/QIIME analysis was performed. A Kruskal-Wallis test [45] demonstrated that numerous CDSs were significantly (false detection rate adjusted (FDR) p<0.05) differentially conserved between groups (Table 5); three of these CDSs (BPSS0771, BPSS1185, BPSS1269) have previously been associated with virulence (Table 5). Additionally, an association was made between core genome SNPs and differential virulence. Forty SNPs were only identified in high virulence isolates (Table 6), which could be due to descent and subsequent loss by intermediate and low virulence isolates, but may also be associated with convergent evolution and virulence (Fig. 3). By randomly assigning genomes to high and low virulence groups, an average of 31 correlated SNPs were identified over ten iterations. This demonstrates that with small sample sets, identified correlations would definitely need to be corroborated with functional characterization.
Table 5

Correlations of LS-BSR values with observed differential virulence in BALB/c mice.

AccessionAnnotationBSR average (high)* BSR average (intermediate)* BSR average (low)* FDR p-value
BPSS1185undefined product73.587.2140.0000
BPSL2990histone H1-like protein10071410.0003
BPSL0016general secretory pathway protein L10099.8510.0022
BDL_4286response regulator97.2587.2490.0082
BPSS1308isoaspartyl peptidase10099.6540.0082
BPSL0859N-acetylmuramoyl-L-alanine amidase100100550.0106
BPSS0771hypothetical protein9881.6500.0147
BPSS1269peptide_synthase/polyketide_synthase87.7599.2540.0324
BPSS1212hypothetical protein99.7578.4540.0454

*High, intermediate and low virulence determined by intranasal challenge at ~10 colony forming units.

Table 6

Single nucleotide polymorphisms (SNPs) unique to high virulence isolates.

ChromCoordinateK96243 callquery callrefAAderivedAAlocus tagAnnotation
NC_006350.1220073CTRQBPSL0211lipid A biosynthesis lauroyl acyltransferase
NC_006350.1220163CTRQBPSL0211lipid A biosynthesis lauroyl acyltransferase
NC_006350.1240744TCMVBPSL0230fliF; flagellar MS-ring protein
NC_006350.1534205GTADBPSL0492hypothetical protein
NC_006350.1554241TCAABPSL0504rpoH; RNA polymerase factor sigma-32
NC_006350.11839157CGN/AN/AintergenicN/A
NC_006350.11839166GAN/AN/AintergenicN/A
NC_006350.11839172TCN/AN/AintergenicN/A
NC_006350.11839197ACN/AN/AintergenicN/A
NC_006350.11839256AGN/AN/AintergenicN/A
NC_006350.11839334CTN/AN/AintergenicN/A
NC_006350.11839988TCKRBPSL1583hypothetical protein
NC_006350.11840050GAAABPSL1583hypothetical protein
NC_006350.12402720TCN/AN/AintergenicN/A
NC_006350.12440328AGDDBPSL2041hypothetical protein
NC_006350.12553415CGAABPSL2126transport-related, membrane protein
NC_006350.12555149GAN/AN/AintergenicN/A
NC_006350.12555151CTN/AN/AintergenicN/A
NC_006350.12820597AGITBPSL2334hypothetical protein
NC_006350.12847891TCN/AN/AintergenicN/A
NC_006350.13021077ACN/AN/AintergenicN/A
NC_006350.13403668GTTTBPSL2842FAD-binding oxidase
NC_006350.13654489CGN/AN/AintergenicN/A
NC_006350.13924581GAHHBPSL3305cheW; chemotaxis protein
NC_006350.14073158TCEGBPSL3430glutamine amidotransferase
NC_006351.1702758GAGSBPSS0515hypothetical protein
NC_006351.11236140TGPPBPSS0936hypothetical protein
NC_006351.11269935TCN/AN/AintergenicN/A
NC_006351.11398575CTPPBPSS1026hypothetical protein
NC_006351.11398581AG*WBPSS1026hypothetical protein
NC_006351.11917766TCN/AN/AintergenicN/A
NC_006351.12455301TGSABPSS1795hypothetical protein
NC_006351.12514965GAN/AN/AintergenicN/A
NC_006351.12695450CTN/AN/AintergenicN/A
NC_006351.13000485GAN/AN/AintergenicN/A
NC_006351.13042744AGDDBPSS2265monooxygenase
NC_006351.13097464GAN/AN/AintergenicN/A
NC_006351.13097471CTN/AN/AintergenicN/A
NC_006351.13097776CAN/AN/AintergenicN/A
NC_006351.13160929GAN/AN/AintergenicN/A
*High, intermediate and low virulence determined by intranasal challenge at ~10 colony forming units.

Discussion

Burkholderia pseudomallei is an important pathogen as both the causative agent of melioidosis and as a potential biothreat agent. In the development of medical countermeasures against melioidosis, a panel of clinically relevant isolates have been identified [7] for challenge studies under the FDA Animal Rule [8]. In this study, we sequenced all 6 of these isolates as well as 5 additional isolates associated with human inhalational melioidosis. A comparative genomics approach was employed to understand the genetic composition of each genome and the distribution of genetic elements between genomes. These results were correlated with animal survival data to determine if phenotype/genotype correlations could be identified. Ten of the 11 isolates were passed through a BALB/c mouse model in groups of seven mice per isolate. Differential virulence was observed between isolates, with MSHR668 demonstrating the highest virulence (S2 Fig, Table 2), based on time to death. An attempt was made to correlate both the distribution of coding sequences (CDSs), based on large-scale blast score ratio (LS-BSR) values, and single nucleotide polymorphisms (SNPs), with differential virulence. Three CDSs previously associated with virulence were differentially conserved between disease severity groups (Table 4). Additionally, SNPs were identified that were only present in high-virulence isolates (Table 6). While the limited number of isolates tested in this study precludes definitive correlations between genotype and phenotype, differentially conserved CDSs and/or SNPs may inform larger-scale targeted functional studies, which may help to better understand the pathogenesis of B. pseudomallei, and subsequently, may improve human health. A maximum likelihood phylogeny inferred from a concatenation of ~60,000 core-genome SNPs demonstrated that the eleven isolates sequenced in the current study represent broad phylogenetic diversity. The retention index (RI) value, which provides a representation of the homoplasy in the dataset, demonstrated signs of homoplasy, which can confound accurate phylogenetic reconstruction. Plotting the observed homoplasy density (HD) across both chromosomes of B. pseudomallei K96243 demonstrated that the homoplasy was evenly distributed, with no isolated regions of recombination in the core genome. Although this underlying homoplasy may confound phylogenetic relationships, especially in deeply branching nodes, the phylogeny still demonstrates the overall diversity of the eleven isolates sequenced in the current study. Differences in the distribution of virulence-associated genes were observed based a LS-BSR analysis. One clear difference was the presence of the B. mallei bimA (BimA Bm) allele in MSHR668 and the B. pseudomallei version (BimA Bp) in all other isolates (Fig. 3). In previous studies, 12% of Australian isolates contained BimA Bm [55, 57], although both versions appear to perform actin-based motility effectively. An association between neurological melioidosis and strains with BimA Bm was recently reported [55]. Severe clinical presentations have been associated with the co-occurrence of BimA Bm and the hemagglutinin, fhaB3. The lack of fhaB3 in isolates exhibiting BimA Bm was correlated with cutaneous melioidosis without sepsis [55]. Testing isolates with varied distributions of these virulence components will help corroborate these associations. The Inv/Mxi-Spa-like type III secretion system (T3SS-3) [58] is essential for the survival of B. pseudomallei in the host [59, 60] and closely resembles secretion systems found in other animal pathogens (Salmonella spp. and Shigella spp.). B. pseudomallei isolates 1026b and HBPUB10134a appear to have reduced homology for BPSS1528, which is described as a (HNS-like regulatory) hypothetical protein in the T3SS-3 system. Several proteins act together to form a pore that becomes bound to the host membrane, thus facilitating the delivery of effector proteins [61, 62]. This system is also likely involved in defenses against autophagy by transporting the BopA effector [63, 64]. In this study, we observed sequence homology variation among many of the isolates in the gene, BPSS1629, from the T3SS-2 cluster. One of the most dramatic differences observed between isolates was from representative genes in the Yersinia-like fimbriae (YLF) gene cluster and the BTFC gene cluster. This division is mutually exclusive [54, 55] and it is unclear whether one cluster confers enhanced virulence over the other and no correlations have been identified between gene cluster and disease severity [55]. While YLF genes are generally associated with isolates from Thailand [55], we found no geographical correlation in the small sample set that we analyzed in the current study (Fig. 3). The FDA Animal Rule was set up to identify a set of relevant isolates that could be used in lieu of human clinical trials in the development of effective medical countermeasures against human disease, including melioidosis. The data presented in this study will provide a genomic background to better understand virulence in B. pseudomallei and may also help in the development of more effective medical countermeasures.

The complete LS-BSR matrix for all coding regions in each genome investigated.

(BZ2) Click here for additional data file.

A NASP (http://tgennorth.github.io/NASP/) matrix containing all SNPs from non-duplicated regions from all genomes queried.

(BZ2) Click here for additional data file.

Synteny dot plots between finished genomes available in GenBank and draft genomes generated in this study from re-sequencing studies.

Dot plots were generated using the mummerplot method in MUMmer. (TIF) Click here for additional data file.

A Kaplan-Meier curve of survival probabilities based on the BALB/c mice challenge studies conducted in the current study.

The survival probabilities were calculated using the ‘survival’ package in R [12]. (TIF) Click here for additional data file.

Sequencing information for isolates sequenced in the current study.

(PDF) Click here for additional data file.

Accession information for reference genomes.

(PDF) Click here for additional data file.

Virulence associated genes in the current study.

(PDF) Click here for additional data file.

Survival information over the course of BALB/c challenge studies for all strains challenged.

(PDF) Click here for additional data file.
  56 in total

1.  Fast algorithms for large-scale genome alignment and comparison.

Authors:  Arthur L Delcher; Adam Phillippy; Jane Carlton; Steven L Salzberg
Journal:  Nucleic Acids Res       Date:  2002-06-01       Impact factor: 16.971

2.  The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

Authors:  Aaron McKenna; Matthew Hanna; Eric Banks; Andrey Sivachenko; Kristian Cibulskis; Andrew Kernytsky; Kiran Garimella; David Altshuler; Stacey Gabriel; Mark Daly; Mark A DePristo
Journal:  Genome Res       Date:  2010-07-19       Impact factor: 9.043

3.  Circos: an information aesthetic for comparative genomics.

Authors:  Martin Krzywinski; Jacqueline Schein; Inanç Birol; Joseph Connors; Randy Gascoyne; Doug Horsman; Steven J Jones; Marco A Marra
Journal:  Genome Res       Date:  2009-06-18       Impact factor: 9.043

4.  Improving draft assemblies by iterative mapping and assembly of short reads to eliminate gaps.

Authors:  Isheng J Tsai; Thomas D Otto; Matthew Berriman
Journal:  Genome Biol       Date:  2010-04-13       Impact factor: 13.583

5.  Multilocus sequence typing and evolutionary relationships among the causative agents of melioidosis and glanders, Burkholderia pseudomallei and Burkholderia mallei.

Authors:  Daniel Godoy; Gaynor Randle; Andrew J Simpson; David M Aanensen; Tyrone L Pitt; Reimi Kinoshita; Brian G Spratt
Journal:  J Clin Microbiol       Date:  2003-05       Impact factor: 5.948

6.  A genomic survey of positive selection in Burkholderia pseudomallei provides insights into the evolution of accidental virulence.

Authors:  Tannistha Nandi; Catherine Ong; Arvind Pratap Singh; Justin Boddey; Timothy Atkins; Mitali Sarkar-Tyson; Angela E Essex-Lopresti; Hui Hoon Chua; Talima Pearson; Jason F Kreisberg; Christina Nilsson; Pramila Ariyaratne; Catherine Ronning; Liliana Losada; Yijun Ruan; Wing-Kin Sung; Donald Woods; Richard W Titball; Ifor Beacham; Ian Peak; Paul Keim; William C Nierman; Patrick Tan
Journal:  PLoS Pathog       Date:  2010-04-01       Impact factor: 6.823

7.  Making whole genome multiple alignments usable for biologists.

Authors:  Daniel Blankenberg; James Taylor; Anton Nekrutenko
Journal:  Bioinformatics       Date:  2011-07-19       Impact factor: 6.937

8.  GenBank.

Authors:  Dennis A Benson; Ilene Karsch-Mizrachi; Karen Clark; David J Lipman; James Ostell; Eric W Sayers
Journal:  Nucleic Acids Res       Date:  2011-12-05       Impact factor: 16.971

9.  RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies.

Authors:  Alexandros Stamatakis
Journal:  Bioinformatics       Date:  2014-01-21       Impact factor: 6.937

10.  ABACAS: algorithm-based automatic contiguation of assembled sequences.

Authors:  Samuel Assefa; Thomas M Keane; Thomas D Otto; Chris Newbold; Matthew Berriman
Journal:  Bioinformatics       Date:  2009-06-03       Impact factor: 6.937

View more
  12 in total

1.  Phylogenetic and genomic diversity in isolates from the globally distributed Acinetobacter baumannii ST25 lineage.

Authors:  Jason W Sahl; Mariateresa Del Franco; Spyros Pournaras; Rebecca E Colman; Nabil Karah; Lenie Dijkshoorn; Raffaele Zarrilli
Journal:  Sci Rep       Date:  2015-10-14       Impact factor: 4.379

2.  Draft genome sequences of three fungal-interactive Paraburkholderia terrae strains, BS007, BS110 and BS437.

Authors:  Akbar Adjie Pratama; Irshad Ul Haq; Rashid Nazir; Maryam Chaib De Mares; Jan Dirk van Elsas
Journal:  Stand Genomic Sci       Date:  2017-12-18

3.  A novel inducible prophage from the mycosphere inhabitant Paraburkholderia terrae BS437.

Authors:  Akbar Adjie Pratama; Jan Dirk van Elsas
Journal:  Sci Rep       Date:  2017-08-22       Impact factor: 4.379

4.  Structural diversity of Burkholderia pseudomallei lipopolysaccharides affects innate immune signaling.

Authors:  Michael H Norris; Herbert P Schweizer; Apichai Tuanyok
Journal:  PLoS Negl Trop Dis       Date:  2017-04-28

5.  Increased Neurotropic Threat from Burkholderia pseudomallei Strains with a B. mallei-like Variation in the bimA Motility Gene, Australia.

Authors:  Jodie L Morris; Anne Fane; Derek S Sarovich; Erin P Price; Catherine M Rush; Brenda L Govan; Elizabeth Parker; Mark Mayo; Bart J Currie; Natkunam Ketheesan
Journal:  Emerg Infect Dis       Date:  2017-05       Impact factor: 6.883

6.  Characterization of in vitro phenotypes of Burkholderia pseudomallei and Burkholderia mallei strains potentially associated with persistent infection in mice.

Authors:  R C Bernhards; C K Cote; K Amemiya; D M Waag; C P Klimko; P L Worsham; S L Welkos
Journal:  Arch Microbiol       Date:  2016-10-13       Impact factor: 2.552

7.  Characterization of pathogenesis of and immune response to Burkholderia pseudomallei K96243 using both inhalational and intraperitoneal infection models in BALB/c and C57BL/6 mice.

Authors:  Jeremy J Bearss; Melissa Hunter; Jennifer L Dankmeyer; Kristen A Fritts; Christopher P Klimko; Chris H Weaver; Jennifer L Shoe; Avery V Quirk; Ronald G Toothman; Wendy M Webster; David P Fetterer; Joel A Bozue; Patricia L Worsham; Susan L Welkos; Kei Amemiya; Christopher K Cote
Journal:  PLoS One       Date:  2017-02-24       Impact factor: 3.240

8.  Pathogen to commensal? Longitudinal within-host population dynamics, evolution, and adaptation during a chronic >16-year Burkholderia pseudomallei infection.

Authors:  Talima Pearson; Jason W Sahl; Crystal M Hepp; Karthik Handady; Heidie Hornstra; Adam J Vazquez; Erik Settles; Mark Mayo; Mirjam Kaestli; Charles H D Williamson; Erin P Price; Derek S Sarovich; James M Cook; Spenser R Wolken; Richard A Bowen; Apichai Tuanyok; Jeffrey T Foster; Kevin P Drees; Timothy J Kidd; Scott C Bell; Bart J Currie; Paul Keim
Journal:  PLoS Pathog       Date:  2020-03-05       Impact factor: 6.823

9.  Comparative pan-genomic analyses of Orientia tsutsugamushi reveal an exceptional model of bacterial evolution driving genomic diversity.

Authors:  Amy Fleshman; Kristin Mullins; Jason Sahl; Crystal Hepp; Nathan Nieto; Kristin Wiggins; Heidie Hornstra; Daryl Kelly; Teik-Chye Chan; Rattanaphone Phetsouvanh; Sabine Dittrich; Phonepasith Panyanivong; Daniel Paris; Paul Newton; Allen Richards; Talima Pearson
Journal:  Microb Genom       Date:  2018-07-23

10.  Delineation of a Subgroup of the Genus Paraburkholderia, Including P. terrae DSM 17804T, P. hospita DSM 17164T, and Four Soil-Isolated Fungiphiles, Reveals Remarkable Genomic and Ecological Features-Proposal for the Definition of a P. hospita Species Cluster.

Authors:  Akbar Adjie Pratama; Diego Javier Jiménez; Qian Chen; Boyke Bunk; Cathrin Spröer; Jörg Overmann; Jan Dirk van Elsas
Journal:  Genome Biol Evol       Date:  2020-04-01       Impact factor: 3.416

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.