Mia D Champion, Qiandong Zeng, Eli B Nix, Francis E Nano, Paul Keim, Chinnappa D Kodira, Mark Borowsky, Sarah Young, Michael Koehrsen, Reinhard Engels, Matthew Pearson, Clint Howarth, Lisa Larson, Jared White, Lucia Alvarado, Mats Forsman, Scott W Bearden, Anders Sjöstedt, Richard Titball, Stephen L Michell, Bruce Birren, James Galagan.
Abstract
Tularemia is a geographically widespread, severely debilitating, and occasionally lethal disease in humans. It is caused by infection with the Gram-negative bacterium Francisella tularensis. To better understand its potency as an etiological agent and its potential as a biological weapon, we have completed draft assemblies and report the first complete genomic characterization of five strains belonging to different Francisella subspecies (subsp.): F. tularensis subsp. tularensis FSC033, F. tularensis subsp. holarctica FSC257 and FSC022, and F. tularensis subsp. novicida GA99-3548 and GA99-3549. Here, we report the sequencing of these strains and a comparative genomic analysis with recently available public Francisella sequences, including the rare F. tularensis subsp. mediasiatica FSC147 isolate from the Central Asian region. We report evidence for large-scale rearrangement events in strains of the holarctica subspecies, supporting previous proposals that further phylogenetic subdivisions of the Type B clade are likely. We also find a significant enrichment of disrupted or absent ORFs proximal to predicted breakpoints in the FSC022 strain, including a genetic component of the Type I restriction-modification defense system. Many of the pseudogenes identified are also disrupted in the closely related F. tularensis subsp. mediasiatica FSC147 strain, which is rarely pathogenic in humans. These include modulator of drug activity B (mdaB) (FTT0961), which encodes a known NADPH quinone reductase involved in oxidative stress resistance. We have also identified genes exhibiting sequence similarity to effectors of the Type III secretion system (T3SS) and to components of the Type IV secretion system (T4SS). One of these genes, msrA2 (FTT1797c), is disrupted in F. tularensis subsp. mediasiatica and has recently been shown to mediate bacterial pathogen survival in host organisms.
Our findings suggest that in addition to the duplication of the Francisella Pathogenicity Island, and acquisition of individual loci, adaptation by gene loss in the more recently emerged tularensis, holarctica, and mediasiatica subspecies occurred and was distinct from evolutionary events that differentiated these subspecies, and the novicida subspecies, from a common ancestor. Our findings are applicable to future studies focused on variations in Francisella subspecies pathogenesis, and are of broader interest to studies of genomic pathoadaptation in bacteria.
Year: 2009 PMID: 19478886 PMCID: PMC2682660 DOI: 10.1371/journal.ppat.1000459
Source DB: PubMed Journal: PLoS Pathog ISSN: 1553-7366 Impact factor: 6.823
Gene and Assembly Statistics Summary.
| Human virulence | low | low | high | rarely | rarely |
| Strains | F. | F. | F. | F. |
| | 1.89 | 1.87 | 1.85 | 1.86 | 1.90 |
| | 32.10 | 32.07 | 32.17 | 32.34 | 32.23 |
| | 1,764 | 1,745 | 1,715 | 1,705 | 1,720 |
| | 916 | 953 | 964 | 986 | 1000 |
| | 1487 | 1510 | 1514 | 1649 | 1661 |
| | 277 | 235 | 201 | 56 | 59 |
| | 84 | 87 | 88 | 97 | 97 |
| | 427 | 604 | 598 | 521 | 537 |
| | 2 | 2 | 2 | 1 | 1 |
| | 113 | 110 | 74 | 9 | 24 |
| | 27 | 32 | 27 | 36 | 30 |
| | 5 | 5 | 4 | 7 | 6 |
| | 4 | 4 | 4 | 4 | 4 |
| | 10.01× | 10.31× | 10.48× | 9.88× | 9.88× |
| | 1.89 | 1.87 | 1.85 | 1.86 | 1.90 |
| | 1.89 | 1.86 | 1.84 | 1.85 | 1.90 |
| | 21 | 9 | 8 | 5 | 9 |
| | 245.19 | 488.1 | 387.07 | 554.42 | 298.79 |
| | 31 | 19 | 15 | 18 | 15 |
| | 116.80 | 293.90 | 295.52 | 238.33 | 209.54 |
| | 97.53 | 98.76 | 98.76 | 98.78 | 98.93 |
| | 27,832 | 24,312 | 24,672 | 26,882 | 23,927 |
Figure 1. Pairwise alignments between five new Francisella genome sequences and a reference genome exhibit >95% sequence conservation.
Genome comparative maps were constructed using CGview software to map pairwise blastn alignments between several Francisella genomes (minimum percent identity = 95 and expected threshold = 1e-5). Specifically, five newly sequenced Francisella genomes (F. tularensis subsp. holarctica FSC257 and FSC022; F. tularensis subsp. tularensis FSC033; and F. tularensis subsp. novicida GA99-3548, and GA99-3549 strains) were aligned to the F. tularensis subsp. holarctica OSU18 reference sequence (outside blue track of genome map). A high degree of similarity between the genomes (>95%) is evident from the continuous blocks of synteny (colored regions).
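The two blastn thresholds quoted in this caption (minimum percent identity of 95 and expect threshold of 1e-5) can be illustrated as a post-filter on tabular BLAST output. The sketch below is not the authors' pipeline. It assumes the standard 12-column tabular format (`-outfmt 6`), in which percent identity is the third column and the E-value is the eleventh. The names `Hit`, `parse_hits`, and `filter_hits`, and the example rows, are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Hit(NamedTuple):
    """One alignment row from BLAST tabular output (hypothetical container)."""
    query: str
    subject: str
    pident: float   # percent identity, column 3
    length: int     # alignment length, column 4
    evalue: float   # expect value, column 11


def parse_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse tab-separated BLAST rows, skipping blank and comment lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        yield Hit(fields[0], fields[1], float(fields[2]),
                  int(fields[3]), float(fields[10]))


def filter_hits(hits: Iterable[Hit],
                min_identity: float = 95.0,
                max_evalue: float = 1e-5) -> List[Hit]:
    """Keep hits meeting both thresholds quoted in the caption."""
    return [h for h in hits
            if h.pident >= min_identity and h.evalue <= max_evalue]


if __name__ == "__main__":
    # Made-up rows for illustration only; not data from the study.
    example = [
        "q1\ts1\t98.50\t1200\t18\t0\t1\t1200\t5\t1204\t0.0\t2100",
        "q1\ts2\t93.10\t800\t55\t0\t1\t800\t1\t800\t1e-50\t900",
        "q2\ts1\t96.00\t30\t1\t0\t1\t30\t10\t39\t0.002\t40",
    ]
    for hit in filter_hits(parse_hits(example)):
        print(hit.query, hit.subject, hit.pident, hit.evalue)
```

Only the first example row passes both thresholds. The second fails on identity and the third on E-value.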
Figure 2. Dotplot comparison between the Francisella tularensis subsp. holarctica strains showing the occurrence of significant rearrangement events.
Whole genome alignments and dotplot comparisons between Type B strains of Francisella: (A) F. tularensis subsp. holarctica OSU18 and F. tularensis subsp. holarctica FSC022 (reference genome) and (B) F. tularensis subsp. holarctica LVS, F. tularensis subsp. holarctica OSU18 (reference genome), F. tularensis subsp. holarctica FSC257, F. tularensis subsp. holarctica FTA. Alignments were filtered for overlap percentages greater than or equal to 90%. Dotplot (B) shows a nearly linear, overlapping alignment between all of the Type B strains of the main holarctica lineage (not all of the strains of the Type B radiation lineage are shown for clarity), with the notable exception of comparisons between the F. tularensis subsp. holarctica OSU18 and F. tularensis subsp. holarctica FSC257 strains. In contrast, numerous rearrangements are evident from comparisons between the F. tularensis subsp. holarctica FSC022 japonica strain and all of the other Type B strains of the main holarctica lineage (A) (only comparison to the F. tularensis subsp. holarctica OSU18 strain is shown here).
IS Element Summary Table.
| Subtype_Strain | ISFtu1 (IS630 family) | ISFtu2 (IS5 family) | ISFtu3 (ISNCY family) | ISFtu4 (IS982 family) | ISFtu5 (IS4 family) | ISFtu6 (IS1595 family) | Total |
| A.I_FSC033 | 55 | 17 | 3 | 1 | 1 | 4 | 81 |
| A.I_SchuS4 | 53 | 16 | 3 | 1 | 1 | 4 | 78 |
| A.I_FSC198 | 53 | 16 | 3 | 1 | 1 | 4 | 78 |
| A.II_WY96 | 56 | 18 | 3 | 1 | 1 | 4 | 83 |
| FSC257 | 64 | 44 | 2 | 2 | 1 | 3 | 116 |
| FSC022 | 58 | 57 | 2 | 2 | 1 | 3 | 123 |
| OSU18 | 66 | 42 | 2 | 2 | 1 | 3 | 116 |
| LVS | 59 | 46 | 2 | 2 | 1 | 3 | 113 |
| FTA | 59 | 42 | 2 | 2 | 1 | 3 | 109 |
| FSC200 | 59 | 41 | 2 | 2 | 1 | 3 | 108 |
| GA99-3548 | 1 | 7 | 0 | 2 | 0 | 1 | 11 |
| GA99-3549 | 0 | 21 | 3 | 2 | 0 | 3 | 29 |
| U112 | 2 | 20 | 4 | 1 | 0 | 2 | 29 |
| ATCC25017 | 0 | 1 | 4 | 1 | 0 | 2 | 8 |
Phylogenetic Differentiation Associated with Gene Gain and Loss.
| Presence of gene (Intact, disrupted, or absent) and predicted protein family name subcategory | Subspecies strains | LocusID (SchuS4 Reference) | Description of predicted protein product (gene name) |
|
|
| FTT0496 | hypothetical protein |
| FTT0677c | hypothetical protein | ||
| FTT0939c | adenosine deaminase ( | ||
| FTT1068c | hypothetical protein (A.I subspecies specific) | ||
| FTT1080c | hypothetical membrane protein | ||
| FTT1122c | hypothetical lipoprotein | ||
| FTT1766 | O-methyltransferase | ||
| FTT1791 | hypothetical protein | ||
|
|
| FTT0524 | hypothetical protein |
| FTT1172c | cold shock protein ( | ||
|
|
| FTT0755 | hypothetical membrane protein |
| FTT1011 | hypothetical protein | ||
| FTT1580c | hypothetical protein | ||
|
|
| FTT1175c | hypothetical membrane protein |
|
|
| FTT0214 | pseudogene, transporter protein |
| FTT0514 | L-lactate dehydrogenase ( | ||
| FTT0529c | DNA polymerase IV, devoid of proofreading, damage inducible protein P ( | ||
| FTT0652c | ferritin-like protein ( | ||
| FTT1378 | pseudogene, hypothetical protein | ||
| FTT1429c | pseudogene, hypothetical protein | ||
| FTT1516c | mercuric reductase ( | ||
| FTT1619 | pseudogene, acetyltransferase | ||
| FTT1661 | thiopurine S-methyltransferase ( | ||
| FTT1768c | chitinase | ||
| FTT1786 | pseudogene, hypothetical protein | ||
| FTT1793c | aminopeptidase N ( | ||
| FTT1799c | pseudogene, hypothetical protein | ||
| Transporters: The ATP binding Cassette (ABC) Superfamily | FTT0276c | cyclohexadienyl dehydratase precursor | |
| FTT0445 | ABC transporter, ATP-binding component | ||
| Transporters: The Major Facilitator Superfamily (MFS) | FTT0657 | major facilitator superfamily (MFS) transporter | |
| FTT0775c | major facilitator superfamily (MFS) transporter ( | ||
| FTT1380 | major facilitator superfamily (MFS) transporter | ||
| FTT1618 | major facilitator superfamily (MFS) transporter | ||
|
|
| FTT0178c | 30S ribosomal protein S6 modification protein-related protein ( |
| FTT0221 | acid phosphatase precursor ( | ||
| FTT0544 | phosphonoacetate hydrolase ( | ||
| FTT0553 | hypothetical protein | ||
| FTT0568 | hypothetical protein | ||
| FTT0747c | hypothetical protein | ||
| FTT0783 | Arylsulfatase ( | ||
| FTT0786 | hypothetical protein | ||
| FTT0846 | deoxyribodipyrimidine photolyase | ||
| FTT0898c | hypothetical protein | ||
| FTT0902 | hypothetical protein | ||
| FTT0949c | hypothetical membrane protein | ||
| FTT1007c | hypothetical protein | ||
| FTT1109 | choloylglycine hydrolase family protein | ||
| FTT1171c | DNA-methyltransferase, Type I restriction-modification Enzyme subunit M ( | ||
| FTT1202 | transcriptional regulator lysR family | ||
| FTT1267 | transcriptional regulator lysR family | ||
| FTT1293c | hypothetical protein, sua5_yciO_yrdC family protein | ||
| FTT1383 | Sun protein | ||
| FTT1413 | aminotransferase | ||
| FTT1428c | acetyltransferase | ||
| FTT1591 | lipoprotein | ||
| FTT1623c | hypothetical protein | ||
| FTT1625c | hypothetical protein | ||
| FTT1796c | hypothetical protein | ||
| Transporters: The ATP binding Cassette (ABC) Superfamily | FTT0017 | ABC transporter ATP-binding protein for toxin secretion | |
| FTT0125 | oppD, oligopeptide transporter, subunit D | ||
| FTT0475 | the small conductance mechanosensitive ion channel (MscS) family transporter | ||
| FTT1775c | the chloride channel family transporter | ||
| Transporters: The Major Facilitator Superfamily (MFS) | FTT0129 | major facilitator superfamily (MFS) transporter | |
| FTT0487 | major facilitator superfamily (MFS) transporter | ||
| FTT0488c | major facilitator superfamily (MFS) transporter | ||
| FTT0671 | major facilitator superfamily (MFS) transporter | ||
| Transporters: Proton-dependent oligopeptide transport (POT) family | FTT0651 | proton-dependent oligopeptide transport (POT) family protein | |
| FTT1005c | proton-dependent oligopeptide transport (POT) family protein ( | ||
|
|
| FTT0262 | hypothetical lipoprotein |
| FTT0495 | hypothetical protein | ||
| FTT0706 | ( | ||
| FTT0865 | pseudogene, hypothetical protein | ||
| FTT0883 | pseudogene, alcohol dehydrogenase | ||
| FTT1577 | hypothetical protein | ||
|
|
| FTT0095 | hypothetical protein |
| FTT0122 | ( | ||
| FTT0177c | acetyltransferase | ||
| FTT0223c | hypothetical protein ( | ||
| FTT0464 | ( | ||
| FTT0673c | hypothetical protein | ||
| FTT0829c | aspartate:alanine antiporter | ||
| FTT0850 | hypothetical protein | ||
| FTT0864c | transcriptional regulator lysR family | ||
| FTT0911 | hypothetical protein | ||
| FTT0961 | modulator of drug activity B (mdaB) | ||
| FTT0995 | major facilitator superfamily (MFS) transporter | ||
| FTT1119 | transcriptional regulator lysR family | ||
| FTT1266c | ( | ||
| FTT1285c | transcriptional regulator lysR family | ||
| FTT1592c | pseudogene, hypothetical protein | ||
| FTT1645 | major facilitator superfamily (MFS) transporter | ||
| FTT1703 | hypothetical protein | ||
| FTT1781c | hypothetical protein | ||
| Membrane proteins | FTT1426c | hypothetical membrane protein | |
| FTT1626c | hypothetical membrane protein |
Phylogenetic Differentiation Associated with Gain and Loss of Genes with Sequence Similarity to T3SS Effectors.
|
|
| FTT0023c | lipase/acyltransferase |
| FTT1524c | ATP-dependent helicase ( | ||
|
|
| FTT0612 | hypothetical protein (present in three copies in novicida strains) |
|
|
| FTT0211c | outer membrane lipoprotein |
| FTT0393 | methionine aminopeptidase ( | ||
| FTT0541c | haloacid dehalogenase ( | ||
| FTT0659 | DNA recombination protein ( | ||
| FTT0910 | hypothetical protein | ||
| FTT1132c | glycerophosphoryl diester phosphodiesterase ( | ||
| FTT1156c | Type IV pilin multimeric outer membrane protein ( | ||
| FTT1268c | chaperone protein ( | ||
| FTT1376 | acyl carrier protein ( | ||
| FTT1512c | chaperone protein ( | ||
| FTT1671 | riboflavin biosynthesis protein ( |
Phylogenetic Differentiation Associated with Gain and Loss of Genes with Sequence Similarity to T4SS Components.
|
|
| FTT0046 | magnesium chelatase family protein ( |
| FTN_1756 | bacterioferritin comigratory protein ( | ||
|
|
| FTT1797c | peptide methionine sulfoxide reductase (msrA2) |
|
|
| FTT0542 | peroxiredoxin (alkyl hydroperoxide reductase subunit C) ( |
|
|
| FTT0458 | stringent starvation protein A regulator of transcription ( |
| FTT0557 | hypothetical protein ahpC/TSA family | ||
| FTT0623 | trigger factor (TF) protein (peptidyl-prolyl cis/trans isomerase) | ||
| FTT0628 | peptidyl-prolyl cis-trans isomerase D | ||
| FTT0633 | membrane protease subunit ( | ||
| FTT0634 | membrane protease subunit ( | ||
| FTT0832 | FKBP-type 16 kDa peptidyl-prolyl cis-trans isomerase ( | ||
| FTT0878c | peptide methionine sulfoxide reductase ( | ||
| FTT1186 | SsrA (tmRNA)-binding protein ( | ||
| FTT1422 | SM-20-related protein | ||
| FTT1725c | protein-L-isoaspartate O-methyltransferase ( |
Phylogenetic Differentiation Associated with Gain and Loss of Genes Regulating Competence (E values>1e-10).
|
|
| FTT0046 | magnesium chelatase family protein ( |
| FTT0179 | DNA internalization-related competence protein ( | ||
| FTT0830c | DNA uptake protein ( | ||
| FTT1301c | amidophosphoribosyl-transferase (similar to | ||
|
|
| FTT1057c | fimbrial biogenesis and twitching motility protein ( |
| FTT1156c | Type IV pilin multimeric outer membrane protein |
Figure 3. Phylogeny of 20 Francisella strains based on genome-wide SNP sequences, inferred using the Maximum Parsimony method.
The evolutionary history of 20 Francisella strains was inferred from analysis of genome-wide SNP sequences. Genome-wide SNPs at least 20 bp apart were selected for further analysis using MEGA4 software. The tree is drawn to scale, with branch lengths calculated using the average pathway method and expressed in units of the number of changes over the whole sequence. Francisella strains sequenced at the Broad Institute are in red. The differentiation of F. tularensis subsp. novicida predated differentiation of the more pathogenic F. tularensis subsp. tularensis and F. tularensis subsp. holarctica subspecies from a common ancestor. Our data indicate differential rates of evolution along the Type A.II and Type A.I lineages, as evident from the branch lengths. The Type B FSC022 strain diverged prior to the radiation of the main holarctica group and also has a greatly reduced branch length, consistent with a much slower rate of evolution. The F. tularensis subsp. mediasiatica FSC147 strain is phylogenetically more closely related to the Type A.II lineage even though its pathogenicity is characteristic of Type B strains.
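The caption states that only SNPs at least 20 bp apart were carried forward, without saying how that spacing was enforced. One possible reading, sketched below as an assumption, is that a SNP is kept only if both of its nearest neighbors lie at least 20 bp away, so that clustered SNPs are dropped altogether. A greedy thinning that keeps the first SNP of each cluster would be an equally plausible interpretation. The function name `select_spaced_snps` and the example positions are hypothetical.

```python
from typing import Iterable, List


def select_spaced_snps(positions: Iterable[int], min_gap: int = 20) -> List[int]:
    """Return SNP positions whose nearest neighbors are >= min_gap bp away.

    Assumed interpretation: any SNP closer than min_gap to another SNP is
    discarded, along with that neighbor.
    """
    pos = sorted(set(positions))
    kept = []
    for i, p in enumerate(pos):
        # A SNP at either end of the list has no neighbor on that side.
        left_ok = i == 0 or p - pos[i - 1] >= min_gap
        right_ok = i == len(pos) - 1 or pos[i + 1] - p >= min_gap
        if left_ok and right_ok:
            kept.append(p)
    return kept


if __name__ == "__main__":
    # Made-up coordinates for illustration only.
    snps = [100, 110, 200, 220, 500]
    print(select_spaced_snps(snps))
```

In the example, positions 100 and 110 are only 10 bp apart, so both are discarded. Positions 200 and 220 are exactly 20 bp apart and are retained.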
Figure 4. Multiple alignment of Francisella genomes of the Type B lineage identifies conserved sequence regions with rearrangements; pseudogenes are found proximal to predicted breakpoints.
A comparison of genome rearrangement patterns between the more ancestral F. tularensis subsp. holarctica FSC022 japonica strain and representative strains of the main holarctica group (F. tularensis subsp. holarctica LVS and F. tularensis subsp. holarctica OSU18) was performed using MAUVE (A). MAUVE uses an anchored alignment algorithm that permits reordering of the alignment anchors for identification of rearrangements. Colored Local Collinear Blocks (LCBs) are regions of sequence alignment that are free of rearrangements. Each LCB is defined by the anchor regions or predicted sites of rearrangement. A default LCB cutoff of 175 was applied in MAUVE, and the results were filtered to retain blocks of 10 kb or larger. Sequence inversions are denoted by differential positioning of the LCBs relative to a reference axis. A zoomed-in section of the whole-genome alignment is shown so that the annotated ORFs are visible (black outlined boxes). The small red ORFs are rRNA genes. ORFs proximal to predicted rearrangement breakpoints (red circles) have been color coded and labeled (FWD, INV). A summary of the predicted protein products for these genes is provided in (B). These include genes that have been identified as being either disrupted or absent in the F. tularensis subsp. holarctica subspecies.
Figure 5. Comparison of pairwise sequence alignments provides evidence of pseudogene enrichment within 1 kb of predicted breakpoints in Type B strains.
Pairwise sequence alignments using F. tularensis subsp. holarctica (Type B) strains as a reference were further analyzed for evidence of gene decay proximal to predicted breakpoint sites. Total ORF counts were used to determine the percentage of pseudogenes (y-axis) relative to the distance from the predicted breakpoint (x-axis). Pseudogene enrichment was determined using Fisher's Exact Test to derive p-values. Our findings show that the highest percentages of pseudogenes are found within 1 kb of the predicted breakpoints, indicating that genome rearrangement events have likely promoted gene decay in strains of the Type B lineage.
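As an illustration of how an enrichment p-value of this kind can be derived, the sketch below computes a one-sided Fisher's exact test on a 2×2 table of ORF counts (pseudogene versus intact, within versus beyond a distance cutoff from a breakpoint). It is a minimal sketch and not the authors' analysis. The caption does not say whether a one-sided or two-sided test was used, so the one-sided form is an assumption. The function name `fisher_exact_enrichment` is hypothetical and the counts in the example are invented.

```python
from math import comb


def fisher_exact_enrichment(near_pseudo: int, near_intact: int,
                            far_pseudo: int, far_intact: int) -> float:
    """One-sided Fisher's exact test p-value for pseudogene enrichment.

    Returns the hypergeometric probability of observing at least
    `near_pseudo` pseudogenes among the ORFs near a breakpoint, given the
    row and column totals of the 2x2 table.
    """
    n_near = near_pseudo + near_intact      # ORFs near a breakpoint
    n_pseudo = near_pseudo + far_pseudo     # all pseudogenes
    n_total = n_near + far_pseudo + far_intact
    upper = min(n_near, n_pseudo)
    # Sum hypergeometric terms with exact integers, then divide once.
    numerator = sum(
        comb(n_pseudo, k) * comb(n_total - n_pseudo, n_near - k)
        for k in range(near_pseudo, upper + 1)
    )
    return numerator / comb(n_total, n_near)


if __name__ == "__main__":
    # Invented counts for illustration only; not values from the study.
    p = fisher_exact_enrichment(near_pseudo=12, near_intact=18,
                                far_pseudo=150, far_intact=1500)
    print(f"one-sided p-value: {p:.3g}")
```

Exact integer arithmetic via `math.comb` avoids floating-point underflow in the individual terms. For genome-scale counts, a statistics library would be the more usual choice.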
GenBank Accession Numbers for Francisella Strains Used in This Study.
| GenBank accession numbers | Strain name (strains sequenced by the Broad in bold) |
| AJ749949 |
| AM286280 |
| CP000608 |
| CP000437 |
| CP000803 |
| NC_007880 |
| AASP00000000 |
| CP000439 |
| CP000937 |