Gilbert Greub, François Collyn, Lionel Guy, Claude-Alain Roten.
Abstract
BACKGROUND: The genome of Protochlamydia amoebophila UWE25, a Parachlamydia-related endosymbiont of free-living amoebae, was recently published, providing the opportunity to search for genomic islands (GIs).
Year: 2004 PMID: 15615594 PMCID: PMC548262 DOI: 10.1186/1471-2180-4-48
Source DB: PubMed Journal: BMC Microbiol ISSN: 1471-2180 Impact factor: 3.605
Figure 1. The genomic island (GI) present in the chromosome of the endosymbiont UWE25. (A) Position of the GI on the UWE25 genome, a 100-kb region (grey area) delimited by direct repeats (Table 1) at both ends and by two gly-tRNA genes in tandem (all tRNA genes are represented by '+') at its proximal end. A third copy of the direct repeat (Table 1) is indicated by a white line disrupting the grey area. The region is characterized by a different slope in the cumulative GC skew analysis (black curve) and by a higher G+C content (grey curve, windows of 20 kb, 0.1-kb step). The horizontal line indicates the average genomic G+C content. (B) Closer view of the 100-kb region (black curve, cumulative GC skew; grey curve, G+C content in windows of 5 kb, 0.1-kb step; horizontal line, average genomic G+C content). (C) Residual cumulative G+C content (GC') and genomic features of the 100-kb GI. This region encompasses the region with the highest G+C content in the 20-kb window analysis of the UWE25 genome. The position of each gene is represented by an 'X' on the upper line if it is encoded on the positive strand, otherwise by an 'X' on the bottom line (for details, see the table in Supplementary Material 1). A large majority of genes are co-oriented in the genome region flanked by the direct repeats. The tra operon (thick line), present on this GI, exhibits a G+C content (40.0%) clearly higher than that of the whole genome (34.7%). The positions of transposases (open circles) and of phage-related genes (full circles) are indicated.
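The two curves described in the caption can be illustrated with a minimal Python sketch. It assumes the conventional skew definition (G − C)/(G + C) summed over consecutive windows; the caption does not state the exact formula or software the authors used, and the function names and toy sequences are hypothetical.

```python
def gc_content_windows(seq, window, step):
    """Return (start, G+C fraction) for every full sliding window along seq."""
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, gc / window))
    return results


def cumulative_gc_skew(seq, window):
    """Running sum of (G - C) / (G + C) over consecutive, non-overlapping windows.

    Assumption: this is the conventional GC skew definition, not necessarily
    the exact one used for the figure.
    """
    seq = seq.upper()
    total = 0.0
    curve = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        total += skew
        curve.append(total)
    return curve
```

For a real genome, a call such as `gc_content_windows(genome, 20_000, 100)` would mirror the 20-kb window and 0.1-kb step named in the caption. A change of slope in the cumulative skew curve, or a run of windows above the genomic average, would then flag a candidate island.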
Table 1. Description of the main features of the parachlamydial 100-kb genomic island. The chromosome locations of direct repeats, tRNA genes, the tra operon, transposases, bacteriophage-related proteins and proteins involved in DNA metabolism are listed below.
| Feature | Protein number<sup>a</sup> | Position<sup>a</sup> |
| Direct repeat | - | 1648147–1648157 |
| gly-tRNA gene | - | 1648172–1648243 |
| gly-tRNA gene | - | 1648332–1648403 |
| Transposase<sup>c</sup> | pc1402 | 1679924–1680400 |
| Phage-related protein<sup>c</sup> | pc1404 | 1681569–1682441 |
| Putative transcriptional regulator<sup>c,d</sup> | pc1405 | 1679924–1680400 |
| Phage-related protein<sup>c</sup> | pc1410 | 1685329–1686447 |
| Putative transposase<sup>c</sup> | pc1419 | 1695418–1696245 |
| tra operon | pc1420-1441 | 1696410–1716241 |
| Transposases<sup>e</sup> | pc1426-1427 | 1700887–1701896 |
| Putative DNA-binding protein<sup>c</sup> | pc1443 | 1716648–1717004 |
| Phage-related protein<sup>c</sup> | pc1444 | 1717137–1717400 |
| Direct repeat | - | 1723093–1723103 |
| Putative ATPase involved in DNA repair<sup>c</sup> | pc1451 | 1723169–1723504 |
| Probable Doc (death on cure) protein, bacteriophage P1<sup>a</sup> | pc1456 | 1732622–1732999 |
| Putative DNA-binding protein | pc1461 | 1735745–1736065 |
| Putative transposase<sup>c</sup> | pc1465 | 1740371–1741198 |
| Probable DNA double-strand break repair ATPase | pc1467 | 1742079–1744181 |
| Putative transposase | pc1468 | 1744634–1745023 |
| Probable resolvase<sup>a</sup> | pc1469 | 1745398–1745955 |
| Probable transposases, partial length<sup>a</sup> | pc1470-1471 | 1745807–1746692 |
| Probable Doc (death on cure) protein, bacteriophage P1<sup>a</sup> | pc1473 | 1747135–1747512 |
| Phage-related protein<sup>c</sup> | pc1474 | 1747618–1747809 |
| Direct repeat | - | 1747915–1747925 |
a, according to Horn et al. [14];
b, positive strand (co-oriented with the majority of the genes of the GI); gly-tRNAs are separated by 88 nt;
c, identified in this study by BLAST [35, 36] and CLUSTALW [39];
d, phage-related protein based on additional BLAST hit;
e, partially annotated by Horn et al. [14] and further characterized in this study by BLAST [35, 36].
The genomic island of the UWE25 endosymbiont comprises seven distinct modules. The limits of each module were determined by residual cumulative G+C content analysis.
| Modules | 1st mod. | 2nd mod. | 3rd mod. | 4th mod. | 5th mod. | 6th mod. | 7th mod. |
| Length | 32 kb | 16 kb | 19 kb | 10 kb | 6 kb | 12 kb | 2 kb |
| Mean G+C content (%) | 36.4%<sup>a</sup> | 34.1% | 40.9% | 33.4% | 41.8% | 33.3% | 38.7% |
| Number of ORFs | 28 | 18 | 21 | 13 | 1 | 13 | 6 |
| Number of genes having a homolog | 16 (57%) | 5 (28%) | 16 (76%) | 7 (54%) | 1 (100%) | 5 (38%) | 5 (83%) |
| Number of genes having no homolog | 12 | 13 | 5 | 6 | 0 | 8 | 1 |
| Best hits with homologs in: | | | | | | | |
| - chlamydiae | 12 | 0 | 0 | 0 | 0 | 0 | 0 |
| - cyanobacteria | 2 | 2 | 0 | 3 | 0 | 2 | 0 |
| - plants<sup>b</sup> | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| - α-proteobacteria | 0 | 0 | 6 | 0 | 0 | 0 | 0 |
| - β-proteobacteria | 1 | 0 | 3 | 1 | 0 | 0 | 0 |
| - γ-proteobacteria | 0 | 1 | 6 | 2 | 0 | 0 | 4<sup>c</sup> |
| - Bacteroidetes group | 0 | 0 | 1 | 0 | 0 | 2 | 1 |
| - others | 1 | 2 | 0 | 1 | 1 | 1 | 0 |
| Homologous<sup>d</sup> to: | | | | | | | |
| - phage-related protein | 0 | 3 | 0 | 1 | 0 | 1 | 2 |
| - putative transposase | 1 | 1 | 2 | 0 | 0 | 2<sup>e</sup> | 2 |
| - resolvase | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| - protein involved in DNA metabolism | 0 | 1 | 0 | 2 | 0 | 2 | 0 |
a the G+C content is similar to that of the remaining genome (36.1%, calculated on 1931 ORFs);
b 5% of the 2031 ORFs of the genome of UWE25 have products homologous to plant proteins, but no ORFs of the GI were homologous to plant counterparts;
c two ORFs whose best homologs are encoded by a plasmid of an uncultured bacterium present in activated sludge have a second-best BLAST hit encoded by a γ-proteobacterial ORF (Pseudomonas sp.);
d the best BLAST hit was not the only one taken into consideration to determine the putative function encoded by the ORF;
e one of them has an e-value above 0.001.
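As a quick consistency check on the module table, the per-module rows can be totalled with a short Python sketch. The values are transcribed from the table above. The variable names are illustrative, and because the module lengths are rounded to the nearest kb, the length-weighted G+C content is only an approximation.

```python
# Per-module values transcribed from the table (modules 1 to 7).
length_kb = [32, 16, 19, 10, 6, 12, 2]
gc_percent = [36.4, 34.1, 40.9, 33.4, 41.8, 33.3, 38.7]
orfs = [28, 18, 21, 13, 1, 13, 6]
with_homolog = [16, 5, 16, 7, 1, 5, 5]
without_homolog = [12, 13, 5, 6, 0, 8, 1]

# Totals over the whole island.
total_length_kb = sum(length_kb)
total_orfs = sum(orfs)

# Each module's ORF count should equal homolog + no-homolog counts.
rows_consistent = all(
    n == h + m for n, h, m in zip(orfs, with_homolog, without_homolog)
)

# Approximate length-weighted mean G+C content of the island (rounded lengths).
weighted_gc = sum(l * g for l, g in zip(length_kb, gc_percent)) / total_length_kb

print(total_length_kb, total_orfs, rows_consistent, round(weighted_gc, 1))
```

The ORF total agrees with the footnotes: the island holds 100 ORFs, which is the difference between the 2031 ORFs of the genome and the 1931 ORFs of the remaining genome.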
Figure 2. Comparison of the tra unit of the endosymbiont UWE25 with similar operons of the pNL1 (Novosphingobium aromaticivorans), F (Escherichia coli), R391 (Providencia rettgeri) and R27 (Salmonella Typhi) plasmids. In the upper part of the figure: the G+C content of the UWE25 chromosome around the 1.71 Mb location (1-kb sliding window average, 0.1-kb step). The horizontal line represents the average genomic G+C content. Only ORFs of more than one hundred amino acids are shown on the genetic maps of the tra units/operons, as arrows pointing in their direction of transcription (adapted from Lawley et al. [20]). Colors or patterns indicate tra gene homologs. White genes represent non-conserved transfer genes. Upper case letters refer to the corresponding tra genes, whereas lower case letters f and c stand for trsF and trbC, respectively. Double slashes indicate non-contiguous regions. Interestingly, the G+C-rich genes encoded by the UWE25 chromosome correspond to the ORFs presenting tra homologs.
Figure 3. Similarity and phylogenetic analyses of tra units showing the close relatedness of the UWE25 tra unit to the operons involved in the F-like conjugative systems: (A) UPGMA tree of the gene order analysis and (B) UPGMA tree comparing the Kimura-corrected p-distances of the concatenated traA, traK, traB, traV and traC genes present in the UWE25 tra unit and in the F-like, I-like and P-like plasmids [20]. The bar represents the estimated evolutionary distance scale. The numbers at each node are bootstrap values, each derived from 100 samples.
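The distance-and-clustering procedure named in the caption can be sketched in Python as follows. This is a minimal illustration, not the authors' pipeline. It assumes aligned protein sequences and Kimura's empirical protein correction, d = −ln(1 − p − 0.2p²), neither of which the caption specifies. The function names are hypothetical, and bootstrapping is omitted.

```python
import math
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences, skipping gap columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def kimura_protein_distance(p):
    """Kimura's empirical correction (assumed here): d = -ln(1 - p - 0.2 * p^2)."""
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0:
        raise ValueError("p-distance too large for the correction")
    return -math.log(inner)


def upgma(names, dist):
    """UPGMA clustering; dist maps frozenset({a, b}) -> distance.

    Returns the tree topology as nested tuples (branch lengths are not tracked).
    """
    clusters = {name: 1 for name in names}  # cluster label -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # Distance to the new cluster is the size-weighted average of its parts.
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))
```

In use, each pair of concatenated alignments would be passed through `p_distance` and `kimura_protein_distance`, the results stored in a dictionary keyed by `frozenset` pairs, and that dictionary handed to `upgma`. Bootstrap support like that in the figure would require resampling alignment columns and repeating the procedure.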