Marcin Feder, Janusz M Bujnicki.
Abstract
BACKGROUND: Prediction of structure and function for uncharacterized protein families by identification of evolutionary links to characterized families and known structures is one of the cornerstones of genomics. Theoretical assignment of three-dimensional folds and prediction of protein function even at a very general level can facilitate the experimental determination of the molecular mechanism of action and the role that members of a given protein family fulfill in the cell. Here, we predict the three-dimensional fold and study the phylogenomic distribution of members of a large family of uncharacterized proteins classified in the Clusters of Orthologous Groups database as COG4636.
Year: 2005 PMID: 15720711 PMCID: PMC551604 DOI: 10.1186/1471-2164-6-21
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Multiple sequence alignment of selected representatives of the extended COG4636+ family. The selection of representative sequences includes the modeled protein from Nostoc (motif H-PD-EXX-K), members from G. violaceus with different order of putative catalytic residues (Gv1: H-PD-EXK; Gv2: S-PD-EXD-K; Gv3: H-PD-EXD; Gv4: H-PD-EXX-N; Gv5: Q-PD-EXX-K), and members of monophyletic clusters from D. hafniense, C. aurantiacus, S. coelicolor, T. thermophilus, and G. violaceus. The positions of putative catalytic residues are labeled with "*". The variable termini, which could not be confidently aligned, are not shown; the number of omitted residues is indicated. A complete alignment of full-length sequences is available for download from . Amino acids are colored according to the physico-chemical properties of their side-chains (negatively charged: red, positively charged: blue, polar: magenta, hydrophobic: green). Conserved residues are highlighted. Elements of predicted secondary structure (helices and strands) are indicated by tubes and arrows, respectively.
Figure 2. Fold-recognition alignment between all3650 and structures of Hjc and Hje. Amino acids are colored according to the physico-chemical properties of their side-chains. Conserved residues are highlighted. Secondary structure elements experimentally identified in Hjc and Hje and predicted for all3650 are shown between the target and the template sequences. Known and predicted catalytic residues are indicated by "*" (above the alignment for the target, below the alignment for the templates).
Figure 3. Homology model of all3650. Helices and strands are shown in green and yellow, respectively. The predicted catalytic residues are shown in the wireframe representation and labeled. The termini are indicated.
Figure 4. Spatial conservation of the PD-(D/E)XK active site in all3650, Hje, and NgoMIV. A) The predicted structure of all3650 is shown in the same orientation as the crystal structures of the bona fide PD-(D/E)XK nucleases: B) Holliday junction resolvase Hje (1ob8 in PDB [9]) and C) REase NgoMIV (1fiu in PDB [78]), to illustrate the spatial conservation of side-chains in the active site (the carboxylate residues in red and the Lys residue in blue), despite the lack of their conservation in the PD-EXX-K, PD-DXK, and PD-XXK-E variants of the sequence motif. Only the common core is shown; terminal regions and insertions have been omitted for clarity.
Figure 5. Localization of COG4636+ family members in the chromosomes of Cyanobacteria with completed genomes. Circular chromosome maps of genomes with at least three genes encoding COG4636+ members (indicated by dots). Genes shown in dark blue are transcribed clockwise (positive reading frame) and those in red are transcribed anticlockwise (negative reading frame). Dots plotted inside the circle indicate that more than one gene is localized in the same region of the map (1/360 of the genome length).
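The 1/360 binning described in this caption can be illustrated with a minimal sketch. It assumes 0-based gene start coordinates and a known genome length; the function names, the coordinates, and the 3,600,000 bp genome are hypothetical and are not taken from the paper, which does not describe how its maps were drawn.

```python
from collections import Counter


def bin_genes(start_positions, genome_length, n_bins=360):
    """Count genes per angular bin of a circular chromosome map.

    Each bin covers 1/n_bins of the genome length (one degree for n_bins=360).
    """
    counts = Counter()
    for pos in start_positions:
        # Map a 0-based coordinate to a bin index in [0, n_bins).
        idx = (pos % genome_length) * n_bins // genome_length
        counts[idx] += 1
    return counts


def crowded_bins(counts):
    """Return the bins that hold more than one gene, in ascending order."""
    return sorted(b for b, n in counts.items() if n > 1)


# Hypothetical coordinates on a made-up 3,600,000 bp genome (10,000 bp per bin).
example = bin_genes([5_000, 9_999, 10_000, 1_800_000, 3_599_999], 3_600_000)
print(crowded_bins(example))  # bins that would need a dot inside the circle
```

In this sketch the first two genes fall into the same one-degree bin, so that bin would be the one drawn with an additional dot inside the circle.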
Figure 6. The crystal structure of Tt1808 (1wdj in PDB). Tt1808 is shown in the same orientation and is colored and labeled in the same way as the homology model of all3650 in Figure 3. Two regions of difference between Tt1808 and the model of all3650 are indicated: the N-terminal subdomain has a similar fold but a different orientation (magenta line), and the C-terminal region folds as a β-hairpin (cyan line) rather than as an α-helix.
Distribution of COG4636+ family members among different bacteria.
| organism / genome | phylum | habitat | data source | COG4636+ members (total) | COG4636+ members (disrupted) |
| --- | --- | --- | --- | --- | --- |
| Gloeobacter violaceus PCC 7421 | Cyanobacteria | calcareous rock | C | 95 | 1 |
| | Cyanobacteria | cycad (endosymbiont) | WGS | 71 | 7 |
| | Cyanobacteria | marine water | WGS | 62 | 1 |
| | Cyanobacteria | fresh water | C | 58 | 1 |
| | Cyanobacteria | fresh water | WGS | 45 | 5 |
| | Cyanobacteria | fresh water | C | 36 | 1 |
| | Deinococcus-Thermus | thermal environment | C | 14 | - |
| | Cyanobacteria | marine water | WGS | 10 | 3 |
| | Firmicutes | sewage sludge | WGS | 8 | - |
| | Chloroflexi | fresh water (hot springs) | WGS | 7 | - |
| | Actinobacteria | soil | C | 6 | - |
| | Planctomycetes | marine water | C | 5 | 1 |
| | Firmicutes | fresh water (ponds) | WGS | 3 | - |
| | Deinococcus-Thermus | unknown | C | 3 | - |
| | Proteobacteria | fresh water (ponds) | WGS | 2 | 1 |
| | Cyanobacteria | fresh water | WGS | 2 | - |
| | Aquificae | fresh water (hot springs) | C | 2 | - |
| | Actinobacteria | unknown (isolated from radioactive work area) | WGS | 2 | - |
| | Proteobacteria | fresh water | C | 1 | - |
| | Cyanobacteria | fresh water (hot springs) | C | 1 | - |
| | Cyanobacteria | brackish (euryhaline) and/or marine water | UGS | 1 | - |
| | Cyanobacteria | fresh water (lakes, ponds and rivers) | NR | 1 | - |
| Prochlorococcus marinus str. MIT9313 | Cyanobacteria | marine water | C | - | - |
| | Cyanobacteria | marine water | C | - | - |
| | Cyanobacteria | marine water | C | - | - |
| | Cyanobacteria | marine water | C | - | - |
C – Completed genomic sequence, WGS – Whole Genome Shotgun, UGS – Unfinished Genomic Sequence, NR – non-redundant database (NCBI). ORFs were regarded as "disrupted" if they bear frameshift mutations or stop codons.
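One of the two "disrupted" criteria in this note, an in-frame stop codon before the end of the ORF, can be illustrated with a short sketch. It assumes the standard stop codons (TAA, TAG, TGA) and a nucleotide sequence already in reading frame 0; the function name and the example sequences are hypothetical. Detecting frameshifts generally requires comparison with intact homologs and is not attempted here, and this is not presented as the procedure used by the authors.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard stop codons


def internal_stop_positions(orf_dna):
    """Return 0-based codon indices of in-frame stops before the final codon."""
    seq = orf_dna.upper()
    # Split into complete codons, ignoring any trailing partial codon.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # The last codon is expected to be the legitimate terminator, so skip it.
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


# Hypothetical sequences: the first is intact, the second carries a premature TAA.
intact = internal_stop_positions("ATGAAAGGGTGA")
disrupted = internal_stop_positions("ATGAAATAAGGGTGA")
print(intact, disrupted)
```

An empty list means no premature stop was found in the given frame; a non-empty list gives the codon positions that would mark the ORF as disrupted under this single criterion.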