Liliana Losada, Catherine M Ronning, David DeShazer, Donald Woods, Natalie Fedorova, H Stanley Kim, Svetlana A Shabalina, Talima R Pearson, Lauren Brinkac, Patrick Tan, Tannistha Nandi, Jonathan Crabtree, Jonathan Badger, Steve Beckstrom-Sternberg, Muhammad Saqib, Steven E Schutzer, Paul Keim, William C Nierman.
Abstract
Burkholderia mallei (Bm), the causative agent of the predominately equine disease glanders, is a genetically uniform species that is very closely related to the much more diverse species Burkholderia pseudomallei (Bp), an opportunistic human pathogen and the primary cause of melioidosis. To gain insight into the relative lack of genetic diversity within Bm, we performed whole-genome comparative analysis of seven Bm strains and contrasted these with eight Bp strains. The Bm core genome (shared by all seven strains) is smaller in size than that of Bp, but the inverse is true for the variable gene sets that are distributed across strains. Interestingly, the biological roles of the Bm variable gene sets are much more homogeneous than those of Bp. The Bm variable genes are found mostly in contiguous regions flanked by insertion sequence (IS) elements, which appear to mediate excision and subsequent elimination of groups of genes that are under reduced selection in the mammalian host. The analysis suggests that the Bm genome continues to evolve through random IS-mediated recombination events, and differences in gene content may contribute to differences in virulence observed among Bm strains. The results are consistent with the view that Bm recently evolved from a single strain of Bp upon introduction into an animal host followed by expansion of IS elements, prophage elimination, and genome rearrangements and reduction mediated by homologous recombination across IS elements.
Keywords: bacterial evolution; bacterial virulence; comparative genomics; genome erosion
Year: 2010 PMID: 20333227 PMCID: PMC2839346 DOI: 10.1093/gbe/evq003
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Burkholderia mallei and B. pseudomallei Strains Used in This Study
| Strain | GenBank accession number | Virulent | Source | MLST | Chromosome I size (bp) | Chromosome II size (bp) | Total genes | Variable genes (% of genome) |
| ATCC23344 | NC_006348, NC_006349 | Yes | Burma 1944 | 40 | 3,510,148 | 2,325,379 | 5,229 | 1,773 (34%) |
| NCTC10229 | NC_008836, NC_008835 | Yes | Hungary 1961 | 40 | 3,458,208 | 2,284,095 | 5,519 | 2,063 (37%) |
| NCTC10247 | NC_009080, NC_009079 | Attenuated | Turkey 1960 | 100 | 3,495,687 | 2,352,693 | 5,869 | 2,413 (41%) |
| SAVP1 | NC_008785, NC_008784 | No | | 40 | 3,497,479 | 1,734,922 | 5,200 | 1,744 (33%) |
| 2002721280 | NZ_AANX00000000 | No | Pasteur Institute | 40 | — | — | 5,300 | 2,239 (35%) |
| ATCC10399 | NZ_AAHN00000000 | Yes | China 1942 | 40 | — | — | 5,749 | 1,844 (40%) |
| PRL-20 | NZ_AAZP00000000 | Yes | Pakistan 2005 | 40 | — | — | 5,469 | 2,013 (37%) |
| K96243 | NC_006350, NC_006351 | Yes | Thailand 1996 | 10 | 4,074,542 | 3,173,005 | 6,324 | 688 (11%) |
| 1106a | NC_009076, NC_009078 | Yes | Thailand 1993 | 70 | 3,988,455 | 3,100,794 | 7,187 | 1,551 (21%) |
| 1710b | NC_007434, NC_007435 | Yes | Thailand 1999 | 177 | 4,126,292 | 3,181,762 | 7,088 | 1,452 (20%) |
| 668 | NC_009074, NC_009075 | Yes | Australia 1995 | 129 | 3,912,947 | 3,127,456 | 7,232 | 1,388 (19%) |
| 1655 | NZ_AAHR00000000 | Yes | Australia 2003 | 131 | — | — | 6,980 | 1,344 (19%) |
| 406e | NZ_AAMM00000000 | Yes | Thailand 1988 | 211 | — | — | 6,880 | 1,244 (18%) |
| S13 | NZ_AAHW00000000 | Yes | Singapore | 51 | — | — | 7,217 | 1,581 (22%) |
| Pasteur 52237 | NZ_AAHV00000000 | Yes | Viet Nam | 411 | — | — | 7,154 | 1,518 (21%) |
Core genome is 3,456 genes for Bm and 5,636 for Bp.
Virulence was determined with the Syrian hamster infection model. Three groups of female Syrian hamsters (five per group) were infected by the intraperitoneal route with a range of 10¹–10³ cfu for each strain of B. mallei examined. Mortality was recorded daily for 14 days, and on day 15 the surviving animals from each group were euthanized.
WGS, whole-genome shotgun sequencing (unfinished).
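The variable-gene column in the table above appears to follow from the core-genome footnote: for most rows, the variable count equals the strain's total gene count minus the species core size. The following is a minimal sketch of that arithmetic for a few rows, assuming this subtraction is how the column was derived (the source does not state it explicitly); the function name `variable_genes` is hypothetical.

```python
# Core-genome sizes as stated in the table footnote.
BM_CORE_GENES = 3456
BP_CORE_GENES = 5636


def variable_genes(total_genes: int, core_genes: int) -> tuple[int, int]:
    """Return (variable gene count, percent of genome rounded to an integer).

    Assumption: variable genes = total genes - core genes.
    """
    variable = total_genes - core_genes
    percent = round(100 * variable / total_genes)
    return variable, percent


# Total gene counts copied from the table for three Bm strains and one Bp strain.
examples = {
    "ATCC23344": (5229, BM_CORE_GENES),
    "NCTC10229": (5519, BM_CORE_GENES),
    "NCTC10247": (5869, BM_CORE_GENES),
    "K96243": (6324, BP_CORE_GENES),
}

for strain, (total, core) in examples.items():
    count, pct = variable_genes(total, core)
    print(f"{strain}: {count} variable genes ({pct}%)")
```

Rows that do not match this arithmetic may reflect a different derivation or a transcription difference; the sketch is not a substitute for the published values.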
Figure. Multigenome alignment of eight Bp and seven Bm strains. Each circle represents a genome as presented in Materials and Methods. All genomes are aligned with the Bp K96243 genome as a reference, which appears as the outermost multicolored circle. The Bp genomes are the eight outermost circles, and the Bm genomes are internal. Areas in each color represent homologies between the subject genome and the reference. Areas in black in the reference chromosome (outermost circle) are regions present in K96243 but absent in the query genome. Areas in black in each of the concentric circles are regions present in the query genome but absent from K96243. Representative Bp GIs are shown with red arrows. Representative clusters of Bp-specific genes absent from all Bm genomes (black on the K96243 ring) are highlighted with a yellow arrow.
Figure. Pan-genome analysis of seven Bm and eight Bp strains. The CDSs in all Bm genomes (blue line) and Bp genomes (red line) were compared, and the number of new genes was plotted against the number of genomes used. The blue dashed line represents the extrapolated number of Bm strain-specific genes. The red dashed line represents the extrapolated minimum number of new genes discovered with each Bp genome.
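The quantity plotted in a pan-genome curve like the one described above can be illustrated with a small sketch: add genomes one at a time and count the genes not seen in any earlier genome, while the core genome is the intersection of all gene sets. The gene identifiers and the function name `new_genes_per_genome` below are hypothetical toy values, not data or methods from the study, and the authors' extrapolation procedure is not reproduced here.

```python
def new_genes_per_genome(genomes):
    """For genomes added in the given order, count genes not seen before."""
    seen = set()
    counts = []
    for genes in genomes:
        counts.append(len(genes - seen))  # genes new to the growing pan-genome
        seen |= genes
    return counts


# Hypothetical toy genomes, each a set of gene-family identifiers.
toy_genomes = [
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "c", "e", "f"},
]

core_genome = set.intersection(*toy_genomes)  # genes shared by every genome
pan_genome = set.union(*toy_genomes)          # genes found in at least one genome

print(new_genes_per_genome(toy_genomes))
print(sorted(core_genome), len(pan_genome))
```

Because the counts depend on the order in which genomes are added, analyses of this kind usually average over many orderings before fitting a curve; that step is omitted here for brevity.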
Percentages of Total Variable Genes within Each Functional Role Category
| Role category | Bp mean (%) | Bp standard deviation (%) | Bm mean (%) | Bm standard deviation (%) |
| Amino acid biosynthesis | 1.49 | 2.37 | 2.16 | 0.89 |
| Biosynthesis of cofactors, prosthetic groups, and carriers | 0.83 | 1.55 | 1.02 | 0.16 |
| Cell envelope | 6.80 | 4.78 | 11.46 | 2.44 |
| Cellular processes | 6.12 | 3.44 | 12.42 | 3.16 |
| Central intermediary metabolism | 2.10 | 3.12 | 2.67 | 0.35 |
| DNA metabolism | 24.51 | 14.39 | 0.90 | 0.21 |
| Energy metabolism | 3.57 | 4.83 | 14.42 | 1.86 |
| Fatty acid and phospholipid metabolism | 0.66 | 1.33 | 3.89 | 0.78 |
| Mobile and extrachromosomal element functions | 29.34 | 14.70 | 0.82 | 0.41 |
| Protein fate | 3.56 | 4.18 | 6.56 | 1.26 |
| Protein synthesis | 0.76 | 2.14 | 1.09 | 0.41 |
| Purines, pyrimidines, nucleosides, and nucleotides | 0.00 | 0.00 | 1.85 | 0.78 |
| Regulatory functions | 8.84 | 5.75 | 16.78 | 1.23 |
| Signal transduction | 0.00 | 0.00 | 5.37 | 3.50 |
| Transcription | 2.36 | 2.39 | 0.75 | 0.29 |
| Transport and binding proteins | 7.33 | 6.52 | 17.85 | 1.83 |
| Viral functions | 1.73 | 3.97 | 0.00 | 0.00 |
NOTE.—Mean, standard deviation, and range are given for eight Bp strains and seven Bm strains. Hypothetical and unknown proteins and proteins of unknown function have been excluded.
Variable Gene Clusters in Bm
| Cluster | 5' end | 3' end | Size (bp) | Boundary (5'/3') | ATCC23344 | SAVP1 | NCTC10229 | NCTC10247 | ATCC10399 | 2002721280 | PRL-20 | Number of putative virulence genes | NRPS/PKS/multidrug efflux pump |
| A | 600,776 | 612,728 | 11,953 | IS407A | X | X | X | X | X | X | 1 | ||
| B | 1,000,692 | 1,080,040 | 79,349 | IS407A | X | X | X | X | 11 | RND | |||
| C | 1,269,317 | 1,277,504 | 8,188 | IS407A | X | X | X | X | X | X | 4 | ||
| D | 2,053,557 | 2,070,428 | 16,872 | IS407A | X | X | X | X | X | X | 5 | ||
| E | 2,335,045 | 2,354,063 | 19,019 | IS407A | X | X | X | X | X | X | 2 | PKS | |
| F | 2,527,011 | 2,629,142 | 102,132 | ISBm2/IS407A | X | X | X | X | X | X | 20 | ||
| G | 3,320,410 | 3,346,619 | 26,210 | ISBm2 | X | X | X | X | X | X | 6 | ||
| S | 1,136,910 | 1,145,707 | 8,798 | None/IS407A | X | X | X | X | X | 2 | |||
| V | 947,304 | 951,928 | 4,625 | IS407A/none | X | X | X | 1 | |||||
| W | 1,809,469 | 1,823,849 | 14,381 | A, transposase OrfB/IS407A | X | X | X | 0 | |||||
NOTE: Each variable cluster was assigned a letter. Genomic locations for clusters A–R are from ATCC23344, where the bold font represents those located on chromosome II. Genomic locations for clusters S–W are from NCTC10247 (bold, chromosome II), and cluster X from NCTC10399 chromosome II. An X under a strain signifies that the cluster is present in that genome.
Virulence genes were determined by using MVirDB as described in Materials and Methods.
NRPS, nonribosomal peptide synthase; PKS, polyketide synthase; RND, resistance nodulation-division like pump.
Figure. IS407A rearrangement of whole genomes. (A) The relative occurrence of each nucleotide at each position of the 4-bp direct repeat at IS407A insertion sites is shown as bar graphs in the box below. (B) Four fully sequenced Bm genomes were aligned using WebACT. Red lines denote homology between chromosomes organized in the same orientation. Blue lines show homology but inverse orientation in each chromosome. Yellow lines show the presence of IS407A elements. Regions with no homology are shown by the absence of red or blue lines. (C) Four fully sequenced Bp genomes were aligned, as described for Bm.
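Panel A of the caption above describes per-position nucleotide frequencies in the 4-bp direct repeat. A minimal sketch of that tabulation follows; the repeat sequences are hypothetical toy values, not repeats reported in the study, and the function name `position_frequencies` is illustrative.

```python
from collections import Counter


def position_frequencies(repeats):
    """Return, for each position, the fraction of repeats carrying each base."""
    length = len(repeats[0])
    if any(len(r) != length for r in repeats):
        raise ValueError("all repeats must have the same length")
    frequencies = []
    for i in range(length):
        counts = Counter(r[i].upper() for r in repeats)
        frequencies.append(
            {base: counts.get(base, 0) / len(repeats) for base in "ACGT"}
        )
    return frequencies


# Hypothetical 4-bp direct repeats flanking insertion sites.
toy_repeats = ["ACGT", "ACGA", "TCGT", "GCGT"]

for position, freqs in enumerate(position_frequencies(toy_repeats), start=1):
    print(position, freqs)
```

Each dictionary corresponds to one group of bars in a plot of this kind: one bar per base at that position.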
Figure. IS407A-mediated rearrangements of rrn and replichores among Bm strains. (A) rrn rearrangements due to IS407A recombinations. The outermost ring corresponds to Bp K96243 but is representative of all Bp genomes. Green, ATCC23344; orange, NCTC10229; purple, NCTC10247; brown, SAVP1. The brown rrn cluster represents the locus rearranged into chromosome II in Bm. Red bars represent degenerate rrn loci. (B) Guanine/cytosine (GC)-skew representation of the NCTC10247 genome generated in DNAplotter (Carver et al. 2009). Green represents a negative GC skew, suggesting that ORFs are oriented on the negative strand, and purple represents a positive GC skew, suggesting that ORFs are oriented on the positive strand. The origin of replication for NCTC10247 chromosome I is predicted at around 2.3 Mb and the terminus at around 1.0 Mb. (C) Alignment of chromosome II of ATCC23344 with chromosome I of Bp K96243 as the Bp representative. Regions of homology are shown in blue. For the sake of clarity, only the genomic regions of interest are depicted.
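GC skew, as plotted in panel B above, is conventionally computed per window as (G − C) / (G + C). The sketch below uses that standard definition with non-overlapping windows by default; the window size and step that DNAplotter applied for the published figure are not given in the caption, so the values here are assumptions, and the sequence is a toy example.

```python
def gc_skew(sequence, window, step=None):
    """Return (G - C) / (G + C) for successive windows along the sequence.

    Windows with no G or C are reported as 0.0 to avoid division by zero.
    """
    step = step or window  # assumption: non-overlapping windows by default
    sequence = sequence.upper()
    values = []
    for start in range(0, len(sequence) - window + 1, step):
        chunk = sequence[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        values.append((g - c) / (g + c) if (g + c) else 0.0)
    return values


# Toy sequence: a G-rich window followed by a C-rich window.
print(gc_skew("GGGCAACC", window=4))
```

A sign change in the skew along a chromosome is commonly read as marking the replication origin or terminus, which is how the caption's positions of roughly 2.3 Mb and 1.0 Mb would be located on such a plot.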
Figure. Genomic organization of the fliP locus in Bp and Bm. The wild-type fliP locus is present in all Bp strains. The fliP CDS is represented by dark purple rectangles. The NCTC10247 allele is interrupted by an IS407A element (aquamarine). In ATCC23344, an ISBma1 element (gray) is located upstream of the IS407A element, and an additional 65 kb is inserted at this location. PRL-20 has additional IS407A-mediated insertions into fliP. Figures are not to scale, and the IS407A elements in PRL-20 are drawn smaller.
Figure. Distribution of dN/dS in variable and core genes of Bm genomes aligned with corresponding regions of the reference strain ATCC 23344. dN and dS rates were calculated as described in Materials and Methods. Cumulative data for the seven Bm strains are shown.
Figure. Evolutionary tree of Bm showing the number of genes deleted and the evolutionary point of change. In total, 5,686 gene changes can be mapped onto this tree in a manner that assumes only single evolutionary deletion events. Conversely, 997 gene changes require 2 or 3 independent deletions of the same gene. Because we did not compare these genes with Bp, we do not know the ancestral state for 45 of these genes. These 45 genes could be additions or deletions with equal parsimony with mutations occurring along the basal branches of this tree. Letters in red represent the variable regions lost in each branch.