Stephen M Beckstrom-Sternberg, Raymond K Auerbach, Shubhada Godbole, John V Pearson, James S Beckstrom-Sternberg, Zuoming Deng, Christine Munk, Kristy Kubota, Yan Zhou, David Bruce, Jyothi Noronha, Richard H Scheuermann, Aihui Wang, Xianying Wei, Jianjun Wang, Jicheng Hao, David M Wagner, Thomas S Brettin, Nancy Brown, Paul Gilna, Paul S Keim.
Abstract
Francisella tularensis is the causative agent of tularemia, a highly lethal disease that occurs naturally and could potentially be spread as a biological weapon. This species contains four recognized subspecies, including the North American endemic F. tularensis subsp. tularensis (type A), whose genetic diversity is correlated with its geographic distribution, including a major population subdivision referred to as A.I and A.II. The biological significance of the A.I - A.II genetic differentiation is unknown, though there are suggestive ecological and epidemiological correlations. In order to understand the differentiation at the genomic level, we have determined the complete sequence of an A.II strain (WY96-3418) and compared it to the genome of Schu S4 from the A.I population. We find that this A.II genome is 1,898,476 bp in size with 1,820 genes, 1,768 of which code for proteins (1,303 with an assigned function). While extensive genomic variation exists between WY96 and Schu S4, there is only one whole gene difference. This one gene difference is a hypothetical protein of unknown function. In contrast, there are numerous SNPs (3,367), small indels (1,015), IS element differences (7) and large chromosomal rearrangements (31), including both inversions and translocations. The rearrangement borders are frequently associated with IS elements, which would facilitate intragenomic recombination events. The pathogenicity island duplicated regions (DR1 and DR2) are essentially identical in WY96 but vary relative to Schu S4 at 60 nucleotide positions. Other potential virulence-associated genes (231) varied at 559 nucleotide positions, including 357 non-synonymous changes. Molecular clock estimates for the divergence time between A.I and A.II genomes for different chromosomal regions ranged from 866 to 2131 years before present. This paper is the first complete genomic characterization of a member of the A.II clade of Francisella tularensis subsp. tularensis.
Year: 2007 PMID: 17895988 PMCID: PMC1978527 DOI: 10.1371/journal.pone.0000947
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
General features of WY96 genome.
| Length (bp) | 1,898,476 |
| GC Content (%) | 32.3 |
| Total Genes | 1,820 |
| Protein Coding Genes | 1,768 |
| Genes Assigned Function | 1,303 |
| Hypothetical Protein | 282 |
| Unclassified or Unknown Function | 273 |
| Disrupted ORFs | 186 |
| Large Duplicated regions | 3 |
| Transposons (IS elements) | 82 |
| tRNA | 38 |
| rRNA | 10 |
| Structural RNA (sRNA) | 4 |
| Average Gene Length (nt) | 938 |
| Percent Coding | 90.20% |
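The gene counts in this table can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration, not part of the original analysis: the dictionary keys and function names are hypothetical, and the percent-coding figure is a rough estimate that multiplies the rounded average gene length by the gene count and ignores overlapping genes.

```python
# Values copied from the "General features of WY96 genome" table.
# Key names are hypothetical labels chosen for this sketch.
features = {
    "length_bp": 1_898_476,
    "total_genes": 1_820,
    "protein_coding": 1_768,
    "tRNA": 38,
    "rRNA": 10,
    "sRNA": 4,
    "avg_gene_length_nt": 938,
}


def summed_gene_count(f):
    """Add protein-coding and RNA gene counts."""
    return f["protein_coding"] + f["tRNA"] + f["rRNA"] + f["sRNA"]


def approx_percent_coding(f):
    """Rough estimate: total genes x rounded average length / genome length.

    This ignores overlapping genes and rounding in the average,
    so it only approximates the reported value.
    """
    return 100.0 * f["total_genes"] * f["avg_gene_length_nt"] / f["length_bp"]


count_matches = summed_gene_count(features) == features["total_genes"]
estimate = approx_percent_coding(features)
print(count_matches, round(estimate, 2))
```

The component counts add up to the reported total of 1,820 genes, and the rough estimate lands close to the reported 90.20% coding fraction.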
IS Element Summary Table.
| IS Elements | Number in WY96 | Comment | Number in Schu S4 |
| ISFtu1 (IS630 family) | 52 | Programmed frameshift | 50 |
| ISFtu2 | 19 | Each with premature stop at same position compared to elements in Schu S4 genome | 16 |
| ISFtu3 (ISNCY family, ISHpal-IS1016) | 5 | All pseudogenes | 3 |
| ISFtu4 (IS982 family) | 1 | Stop codon, insertion and deletion (pseudogene) | 1 |
| ISFtu5 (IS4 family) | 1 | Multiple frameshifts and premature stop (pseudogene) | 1 |
| ISFtu6 (IS1595 family) | 3 | All pseudogenes | 3 |
| ISSod13 | 1 | Conserved (in | 1 |
| Total | 82 | | 75 |
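As a quick consistency check on the table above, the per-family counts can be summed and compared between strains. This is a minimal sketch using only the numbers in the table; the variable names are hypothetical.

```python
# (family, copies in WY96, copies in Schu S4), copied from the table above.
is_elements = [
    ("ISFtu1", 52, 50),
    ("ISFtu2", 19, 16),
    ("ISFtu3", 5, 3),
    ("ISFtu4", 1, 1),
    ("ISFtu5", 1, 1),
    ("ISFtu6", 3, 3),
    ("ISSod13", 1, 1),
]

total_wy96 = sum(wy96 for _, wy96, _ in is_elements)
total_schu = sum(schu for _, _, schu in is_elements)

# Families whose copy number differs between the two strains.
differing = {name: wy96 - schu for name, wy96, schu in is_elements if wy96 != schu}

print(total_wy96, total_schu, differing)
```

The totals reproduce the 82 and 75 in the last row, and the net copy-number difference of 7 is confined to ISFtu1, ISFtu2 and ISFtu3, which agrees with the seven IS element differences reported in the abstract.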
Figure 1. Circular genome diagram of the Francisella tularensis subsp. tularensis WY96 genome.
The layers, beginning with the outermost, depict: the locations and types of SNPs found between WY96 and Schu S4, a line graph of AT and GC skew throughout the genome, the locations of start and stop codons in the three forward reading frames, ORF locations in the three forward reading frames, ORF locations in the three reverse reading frames, start and stop codons in the three reverse reading frames, and a line graph of AT and GC content throughout the genome. The SNP positions are relative to the Schu S4 genome (Table S2) and are classified as synonymous (sSNP), non-synonymous (nSNP), intergenic (iSNP), ribosomal (rSNP), or tRNA (tSNP). SNPs in duplicated regions (Table S5) are not shown here; they are shown in Figure 4. Figure created using CGView [43].
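The five SNP classes named in this caption can be illustrated with a small classifier. The sketch below is an assumption-laden illustration, not the authors' pipeline: the function name, the feature-type labels, and the idea of passing in the reference and variant codons are all hypothetical. Codons are translated with the standard codon assignments, which the bacterial genetic code shares for amino acids.

```python
from itertools import product

# Standard codon table built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(feature_type, ref_codon=None, alt_codon=None):
    """Return sSNP, nSNP, iSNP, rSNP or tSNP for one SNP.

    feature_type is a hypothetical label: "CDS", "rRNA", "tRNA" or
    "intergenic". Codons are only needed for SNPs inside a CDS.
    """
    if feature_type == "rRNA":
        return "rSNP"
    if feature_type == "tRNA":
        return "tSNP"
    if feature_type == "intergenic":
        return "iSNP"
    if feature_type == "CDS":
        same_aa = CODON_TABLE[ref_codon.upper()] == CODON_TABLE[alt_codon.upper()]
        return "sSNP" if same_aa else "nSNP"
    raise ValueError(f"unknown feature type: {feature_type}")


print(classify_snp("CDS", "GAA", "GAG"))  # Glu -> Glu
print(classify_snp("CDS", "GAA", "GAT"))  # Glu -> Asp
```

A real annotation workflow would also need the strand and reading frame of each gene to extract the codons; that step is left out here.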
Figure 4. Diagram of the 33,911 bp duplicated region containing the predicted pathogenicity island.
SNPs are classified as synonymous (sSNP, light blue), non-synonymous (nSNP, gold), ribosomal (rSNP, green), and intergenic (iSNP, violet). The IglABCD operon is shown in red. SNPs in putative and hypothetical genes are classified as intergenic because their effect is unknown (Table S5).
Figure 2. Mauve comparison of the WY96 (top) and Schu S4 (bottom) genomes.
Locally collinear blocks (LCBs) are the same color and size between genomes. Labeled blocks contain groups of genes. Blocks are labeled as forward (F) or inverted (I), and the numbers correspond to names in the supplementary table (Table S3).
Figure 3. Dot plot comparison of MUMmer nucmer output [26] between Francisella tularensis subsp. tularensis strains Schu S4 and WY96.
The plot was generated from the MUMmer coords file output using a script written in R. Shared blocks of 1.5 kb or greater were plotted as diagonal lines outlined in red (forward matches) and green (inversions). Positions of ISFtu elements (transposons) were plotted as colored spots (see legend). Red and green blocks on the axes correspond to forward and inversion matches, respectively, to the other genome, and different levels show overlapping matches. The black and white bars on each axis show overall matching regions (black) and gapped regions of no match (white). The two direct repeat (DR) diagonals are labeled DR1 and DR2, along with the partial DR3 repeat (Table S4).
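The block-selection step described in this caption can be sketched as follows. The original script was written in R and is not reproduced here; this Python version is an independent illustration. It assumes tab-delimited rows whose first four columns are the reference start, reference end, query start and query end, and it treats a match as an inversion when the two coordinate pairs run in opposite directions. The function name, the `Block` type and the example rows are hypothetical.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int
    orientation: str  # "forward" or "inversion"


def parse_blocks(lines, min_len=1500) -> List[Block]:
    """Keep matches of at least min_len bp and tag their orientation."""
    blocks = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue
        try:
            s1, e1, s2, e2 = (int(x) for x in fields[:4])
        except ValueError:
            continue  # skip header or malformed rows
        if abs(e1 - s1) + 1 < min_len:
            continue  # shorter than the 1.5 kb plotting threshold
        same_direction = (e1 >= s1) == (e2 >= s2)
        blocks.append(
            Block(s1, e1, s2, e2, "forward" if same_direction else "inversion")
        )
    return blocks


# Hypothetical rows for illustration only, not data from the paper.
example_rows = [
    "1\t5000\t101\t5100",        # forward match, 5000 bp
    "6000\t9000\t20000\t17000",  # query runs backwards: inversion
    "9500\t10000\t300\t800",     # 501 bp, below threshold
]
blocks = parse_blocks(example_rows)
for b in blocks:
    print(b)
```

Each retained block could then be drawn as a line segment from (ref_start, qry_start) to (ref_end, qry_end), colored by orientation, to reproduce the diagonal layout of a dot plot.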