Mathieu Picardeau, Dieter M Bulach, Christiane Bouchier, Richard L Zuerner, Nora Zidane, Peter J Wilson, Sophie Creno, Elizabeth S Kuczek, Simona Bommezzadri, John C Davis, Annette McGrath, Matthew J Johnson, Caroline Boursaux-Eude, Torsten Seemann, Zoé Rouy, Ross L Coppel, Julian I Rood, Aurélie Lajus, John K Davies, Claudine Médigue, Ben Adler.
Abstract
Leptospira biflexa is a free-living saprophytic spirochete present in aquatic environments. We determined the genome sequence of L. biflexa, making it the first saprophytic Leptospira to be sequenced. The L. biflexa genome has 3,590 protein-coding genes distributed across three circular replicons: the major 3,604-kb chromosome, a smaller 278-kb replicon that also carries essential genes, and a third 74-kb replicon. Comparative sequence analysis provides evidence that L. biflexa is an excellent model for the study of Leptospira evolution; we conclude that 2,052 genes (61%) represent a progenitor genome that existed before divergence of pathogenic and saprophytic Leptospira species. Comparisons of the L. biflexa genome with two pathogenic Leptospira species reveal several major findings. Nearly one-third of the L. biflexa genes are absent in pathogenic Leptospira. We suggest that once incorporated into the L. biflexa genome, laterally transferred DNA undergoes minimal rearrangement due to physical restrictions imposed by high gene density and limited presence of transposable elements. In contrast, the genomes of pathogenic Leptospira species undergo frequent rearrangements, often involving recombination between insertion sequences. Identification of genes common to the two pathogenic species, L. borgpetersenii and L. interrogans, but absent in L. biflexa, is consistent with a role for these genes in pathogenesis. Differences in the environmental sensing capacities of L. biflexa, L. borgpetersenii, and L. interrogans suggest a model in which loss of signal transduction functions in L. borgpetersenii has impaired its survival outside a mammalian host, whereas L. interrogans has retained environmental sensory functions that facilitate disease transmission through water.
Year: 2008 PMID: 18270594 PMCID: PMC2229662 DOI: 10.1371/journal.pone.0001607
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Circular maps of the three L. biflexa replicons.
(1) The coordinates in bp, beginning at 0 = oriC. (2) Dark pink: genes unique to L. biflexa, not found in L. interrogans serovar Copenhageni and L. borgpetersenii serovar Hardjobovis (identity >40% over 80% of the length of the smallest protein). (3) Dark purple: genes found in L. biflexa, L. interrogans and L. borgpetersenii (identity >40% over 80% of the length of the smallest protein). (4) Red: genes found in L. biflexa and L. borgpetersenii, but not in L. interrogans (identity >40% over 80% of the length of the smallest protein). (5) Brown: genes found in L. biflexa and L. interrogans, but not in L. borgpetersenii (identity >40% over 80% of the length of the smallest protein). (6) Blue: genes found in L. biflexa and other sequenced spirochetes (Borrelia afzelii PKo, Borrelia burgdorferi, Borrelia garinii, Treponema denticola and Treponema pallidum) (identity >40% over 80% of the length of the smallest protein). (7) The innermost ring shows GC skew; positive skew is shown in grey, and negative skew is shown in black.
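The legend does not say how the GC skew ring was computed. A minimal sketch, assuming the conventional definition (G - C) / (G + C) over non-overlapping windows, could look like this; the function name `gc_skew` and the window size are illustrative assumptions, not taken from the paper.

```python
def gc_skew(seq: str, window: int = 1000) -> list[float]:
    """Return (G - C) / (G + C) for consecutive non-overlapping windows.

    Assumption: the conventional GC-skew definition; the window size is a
    hypothetical choice, not a value reported in the source.
    """
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # Windows without any G or C are reported as zero skew.
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


if __name__ == "__main__":
    # Toy sequence: a G-rich window followed by a C-rich window.
    print(gc_skew("GGGC" + "CCCG", window=4))
```

Positive values correspond to the grey segments of the ring and negative values to the black segments.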
Summary of genome features of pathogenic and saprophytic Leptospira
| Features | L. borgpetersenii CI | L. borgpetersenii CII | L. interrogans CI | L. interrogans CII | L. biflexa CI | L. biflexa CII | L. biflexa P74 | LE-1 prophage |
| Size (bp) | 3,614,456 | 317,335 | 4,277,185 | 350,181 | 3,603,977 | 277,995 | 74,116 | 73,623 |
| G+C content (%) | 41.0 | 41.2 | 35.1 | 35.0 | 38.9 | 39.3 | 37.5 | 38.5 |
| Protein-coding percentage | 80 | 80 | 74.9 | 75.5 | 92.3 | 93.3 | 90.9 | 93.4 |
| CDS | 2,607 | 237 | 3,105 | 274 | 3,268 | 266 | 56 | 82 |
| With assigned function | 1,644 | 135 | 1,817 | 159 | 2,042 | 141 | 31 | 19 |
| Conserved hypothetical | 373 | 32 | 484 | 34 | 464 | 43 | 5 | 2 |
| Unique hypothetical | 590 | 70 | 804 | 81 | 762 | 82 | 20 | 61 |
| Transposases | 215 | 26 | 26 | 0 | 8 | 1 | 1 | 0 |
| Pseudogenes | 340 | 28 | 38 | 3 | 32 | 1 | 0 | 0 |
| tRNA genes | 37 | 0 | 37 | 0 | 35 | 0 | 0 | 0 |
| 23S rRNA genes | 2 | 0 | 2 | 0 | 2 | 0 | 0 | 0 |
| 16S rRNA genes | 2 | 0 | 2 | 0 | 2 | 0 | 0 | 0 |
| 5S rRNA genes | 1 | 0 | 1 | 0 | 2 | 0 | 0 | 0 |
L. borgpetersenii serovar Hardjo strain L550, L. interrogans serovar Copenhageni strain Fiocruz, L. biflexa serovar Patoc strain Ames
CDS counts exclude transposases and pseudogenes.
[11], [38]
CDS from Replicon III (p74) that have an ortholog in Chromosome I in other Leptospira
| Stop | start | locus_tag | ortholog_tag | product |
| 4267 | 3794 | LBF_5005 | SPN2759 | Conserved hypothetical protein |
| 5304 | 4993 | LBF_5007 | SPN2285 | Conserved hypothetical protein |
| 7485 | 5608 | LBF_5009 | SPN2142 | Serine phosphatase RsbU, regulator of sigma subunit |
| 10908 | 11909 | LBF_5013 | SPN2858 | ABC-type Fe3+-siderophore transport system, permease component |
| 16746 | 18548 | LBF_5018 | SPN2289 | Exodeoxyribonuclease V, alpha subunit |
| 18545 | 22168 | LBF_5019 | SPN2290 | Exodeoxyribonuclease V, beta subunit |
| 22171 | 25470 | LBF_5020 | SPN2291 | Exodeoxyribonuclease V, gamma subunit |
| 41270 | 42064 | LBF_5030 | SPN0228 | Bacteriophage-related protein |
| 53226 | 52999 | LBF_5037 | SPN1718 | Conserved hypothetical protein |
| 60761 | 60456 | LBF_5044 | SPN3221 | Antitoxin of toxin-antitoxin stability system |
| 61204 | 60767 | LBF_5045 | SPN3222 | Hypothetical protein |
| 62047 | 63093 | LBF_5047 | SPN1129 | Homoserine kinase |
| 63112 | 64008 | LBF_5048 | SPN1151 | GGDEF domain receiver component of a two-component response regulator |
Ortholog found on Chromosome II or Chromosome I in other Leptospira.
Figure 2. Venn diagram showing numbers of unique and shared genes amongst L. interrogans, L. borgpetersenii and L. biflexa.
Orthologous CDS were identified in a pair-wise fashion using Whole-Genome Reciprocal Best-Hit BLAST Analysis [37]. Manual curation ensured a one-to-one relationship for orthologous CDS, particularly where sets of paralogous CDS existed, and also evaluated the nature of the relationship between CDS with reciprocal best hits but low expect values. This analysis was performed using the L. interrogans serovar Copenhageni strain Fiocruz, L. borgpetersenii serovar Hardjo strain L550 and L. biflexa serovar Patoc strain Ames genome sequences.
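The reciprocal best-hit step described in the legend can be illustrated with a small sketch. It assumes that BLAST hits for each direction have already been parsed into (query, subject, bit score) tuples; the function names and the gene identifiers in the example are hypothetical, and the manual curation step is not modelled.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query_id, subject_id, bit_score) tuples.
    Ties keep the first hit seen (an arbitrary choice for this sketch).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    # Hypothetical identifiers and scores, for illustration only.
    a_vs_b = [("A1", "B1", 200), ("A1", "B2", 150),
              ("A2", "B2", 90), ("A3", "B1", 120)]
    b_vs_a = [("B1", "A1", 198), ("B2", "A1", 160), ("B2", "A2", 80)]
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy input only A1 and B1 are reported: A2 and A3 each have a best hit in the other genome, but that hit's own best match is A1, which is the situation where paralogous sets call for manual review.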
Distribution of the orthologs over the two chromosomes of Leptospira spp.
| | | | | | |
| No. of CDS shared between CI replicons (1) | 1411 (41.58%) | 1429 (39.13%) | 1429 (37.41%) | 1482 (38.73%) | 1448 (38.20%) |
| No. of CDS shared between CII replicons (2) | 80 (28.46%) | 83 (27.66%) | 83 (26.17%) | 82 (24.47%) | 82 (25.07%) |
(1) Number of CDS (orthologs) found in the large chromosome (CI) of one leptospire that are also found in the large chromosomes of the other four leptospires. (2) Number of CDS (orthologs) found in the small chromosome (CII) of one leptospire that are also found in the small chromosomes of the other four leptospires. Putative orthologous relations between two genomes are defined as gene couples satisfying the bi-directional best hit (BBH) criterion or a blastP alignment threshold, a minimum of 40% sequence identity on 80% of the length of the smallest protein. CDS, coding sequences.
Figure 3. Synteny plot between the five Leptospira genomes.
The line plots were obtained using synteny results between the large CI (A) or small CII (B) chromosomes of L. biflexa serovar Patoc strain Patoc1, L. interrogans serovar Lai strain 56601, L. interrogans serovar Copenhageni strain Fiocruz L1-130, L. borgpetersenii serovar Hardjo strain L550, and L. borgpetersenii serovar Hardjo strain JB197. A line plot (C) compares synteny between L. borgpetersenii serovar Hardjo strain JB197 and L. interrogans serovar Copenhageni strain Fiocruz L1-130. Comparative analysis was performed using the MaGe interface [35] in the SpiroScope database (https://www.genoscope.cns.fr/agc/mage). The minimum size of the synteny groups is set to five genes. In green: synteny groups are organized on the same strand; in red: synteny groups are organized on opposite strands.
Comparative genomics of Leptospira spp. Putative orthologous relations between two genomes are defined as gene couples satisfying the bi-directional best hit (BBH) criterion or a blastP alignment threshold, a minimum of 40% sequence identity on 80% of the length of the smallest protein. Putative paralogous relations between two genomes are defined as gene couples satisfying the bi-directional best hit (BBH) criterion or a blastP alignment threshold, a minimum of 60% sequence identity on 80% of the length of the smallest protein.
| Paralogs/orthologs | | | | | |
| | 62 (1.5%) | 1650 (44.04%) | 1635 (43.64%) | 1631 (43.53%) | 1652 (44.10%) |
| | 1633 (40.35%) | 203 (5.01%) | 2913 (71.97%) | 2907 (71.83%) | 3745 (92.53%) |
| | 1674 (40.23%) | 3084 (74.11%) | 438 (10.52%) | 3936 (94.59%) | 3077 (73.94%) |
| | 1636 (39.73%) | 3052 (74.13%) | 3922 (95.26%) | 354 (8.59%) | 3044 (73.93%) |
| | 1638 (39.60%) | 3778 (91.34%) | 2942 (71.13%) | 2934 (70.93%) | 204 (4.93%) |
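The blastP alignment thresholds given in the caption of this table (40% identity for orthologs, 60% for paralogs, each over 80% of the length of the smallest protein) can be expressed as a simple filter. The sketch below is an illustration under stated assumptions: the `Alignment` fields and function names are hypothetical, coverage is taken as aligned length divided by the shorter protein length, "minimum" is read as greater than or equal to, and the alternative BBH criterion is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """One pairwise protein alignment (field names are assumptions)."""
    identity_pct: float  # percent identical residues in the alignment
    aligned_len: int     # number of aligned residues
    query_len: int       # length of the query protein
    subject_len: int     # length of the subject protein


def passes_threshold(aln: Alignment, min_identity: float,
                     min_coverage: float = 0.80) -> bool:
    """Check identity and coverage relative to the smaller protein."""
    shortest = min(aln.query_len, aln.subject_len)
    coverage = aln.aligned_len / shortest
    return aln.identity_pct >= min_identity and coverage >= min_coverage


def is_putative_ortholog(aln: Alignment) -> bool:
    # 40% identity over 80% of the smallest protein (between genomes).
    return passes_threshold(aln, 40.0)


def is_putative_paralog(aln: Alignment) -> bool:
    # 60% identity over 80% of the smallest protein.
    return passes_threshold(aln, 60.0)


if __name__ == "__main__":
    aln = Alignment(identity_pct=45.0, aligned_len=250,
                    query_len=300, subject_len=280)
    print(is_putative_ortholog(aln), is_putative_paralog(aln))
```

Measuring coverage against the shorter protein means a short protein that aligns in full to part of a longer one can still qualify, which is one reason manual curation of such pairs remains useful.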
Distribution of general protein functions between leptospiral species based on the COG function classification scheme.
| | ♣COG Function Classification | L. biflexa | L. borgpetersenii | L. interrogans |
| | Information storage and processing | | | |
| J | Translation, ribosomal structure and biogenesis | 154 | 174 | 153 |
| K | Transcription | 166 | 104 | 109 |
| L | Replication, recombination and repair | 94 | 91 | 102 |
| B | Chromatin structure and dynamics | 2 | 2 | 2 |
| | Cellular processes and signaling | | | |
| D | Cell cycle control, cell division, chromosome partitioning | 21 | 22 | 22 |
| V | Defense mechanisms | 39 | 32 | 37 |
| T | Signal transduction mechanisms | 287 | 167 | 214 |
| M | Cell wall/membrane/envelope biogenesis | 230 | 199 | 218 |
| N | Cell motility | 93 | 84 | 89 |
| U | Intracellular trafficking, secretion, and vesicular transport | 71 | 73 | 71 |
| O | Posttranslational modification, protein turnover, chaperones | 105 | 96 | 100 |
| | Metabolism | | | |
| C | Energy production and conversion | 132 | 115 | 119 |
| G | Carbohydrate transport and metabolism | 91 | 76 | 91 |
| E | Amino acid transport and metabolism | 163 | 136 | 150 |
| F | Nucleotide transport and metabolism | 46 | 52 | 52 |
| H | Coenzyme transport and metabolism | 119 | 112 | 120 |
| I | Lipid transport and metabolism | 101 | 83 | 99 |
| P | Inorganic ion transport and metabolism | 120 | 72 | 88 |
| Q | Secondary metabolites biosynthesis, transport and catabolism | 35 | 23 | 27 |
| | Poorly characterized | | | |
| R | General function prediction only | 311 | 237 | 294 |
| S | Function unknown | 174 | 157 | 192 |
| | CDS not classified (not related to any COG) | 1,266 | 931 | 1,245 |
| | Total CDS (count excludes transposases and pseudogenes) | 3,590 | 2,843 | 3,378 |
Each COG assignment has been manually curated to ensure consistent classification across orthologous proteins. A feature of the COG scheme is that some COGs have multiple functional classifications.
L. borgpetersenii serovar Hardjo strain L550, L. interrogans serovar Copenhageni strain Fiocruz, L. biflexa serovar Patoc strain Ames