Richard Bonneau, Nitin S Baliga, Eric W Deutsch, Paul Shannon, Leroy Hood.
Abstract
BACKGROUND: Large fractions of all fully sequenced genomes code for proteins of unknown function. Annotating these proteins of unknown function remains a critical bottleneck for systems biology and is crucial to understanding the biological relevance of genome-wide changes in mRNA and protein expression, protein-protein and protein-DNA interactions. The work reported here demonstrates that de novo structure prediction is now a viable option for providing general function information for many proteins of unknown function.
Year: 2004 PMID: 15287974 PMCID: PMC507877 DOI: 10.1186/gb-2004-5-8-r52
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Flow chart depicting the annotation pipeline implemented in this study. Sequence-based methods are employed first (top); domains that elude primary-sequence-based methods are then predicted by structure-prediction methods (bottom). For any given genome, data from all levels in this method hierarchy are integrated using SBEAMS (Systems Biology Experiment Analysis and Management System). Implicit in this annotation hierarchy is the idea that protein annotation should be domain-centric (that is, multi-domain proteins should be divided into domains as early as possible in the annotation process). For a given domain, SBEAMS produces a table of annotations that displays only the results from the topmost level in the method hierarchy (PDB-BLAST → Pfam → Rosetta) that produces a significant hit.
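The display rule in the Figure 1 caption (show only the topmost method in the hierarchy that yields a significant hit) can be sketched as follows. This is a minimal illustration, not the SBEAMS implementation: the function name `topmost_annotation`, the dictionary layout, and the example domains and hits are all hypothetical, and "significant" is assumed to have been decided upstream (a missing hit is represented as `None`).

```python
from typing import Dict, Optional, Tuple

# Method hierarchy as given in the Figure 1 caption, highest priority first.
METHOD_HIERARCHY = ["PDB-BLAST", "Pfam", "Rosetta"]


def topmost_annotation(
    hits: Dict[str, Optional[str]]
) -> Optional[Tuple[str, str]]:
    """Return (method, annotation) for the first method in the hierarchy
    that has a significant hit, or None if no method produced one."""
    for method in METHOD_HIERARCHY:
        annotation = hits.get(method)
        if annotation:
            return method, annotation
    return None


# Hypothetical per-domain results; None means "no significant hit".
domain_hits = {
    "domain_1": {"PDB-BLAST": None, "Pfam": "PAS domain", "Rosetta": "heme-binding fold"},
    "domain_2": {"PDB-BLAST": None, "Pfam": None, "Rosetta": "winged helix"},
    "domain_3": {"PDB-BLAST": None, "Pfam": None, "Rosetta": None},
}

# One table entry per domain, keeping only the topmost significant hit.
annotation_table = {
    name: topmost_annotation(hits) for name, hits in domain_hits.items()
}

for name, result in annotation_table.items():
    print(name, result)
```

In this sketch a domain with both a Pfam and a Rosetta hit reports only the Pfam result, which mirrors the domain-centric, hierarchy-first display described in the caption.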
Figure 2. Chemotaxis methyl-accepting domains. (a) Htr10 (VNG1505g) domain 1 hit to 1ljwA, hemoglobin. The recently deposited structure for the HemAT sensor domain (1OR4-A) is also shown (red box). The position of the heme (black spheres) is similar in both our predicted fold match (1LJW-A) and the match detected by PSI-BLAST (1OR4-A). (b) Htr13 (VNG1013g) hit to Gga1 (1jwfA, involved in protein transport, binding of dipeptide signal sequence). (c) The association network surrounding CheA and its interactions with the Htr methyl-accepting domains found in the Halobacterium genome, as predicted by the phylogenetic profile method (red lines). Also shown are predicted operon edges (black lines). The expression levels (where red corresponds to a high level of expression and green to a low level of expression relative to a reference; white indicates no change/no measurement) are from a previously described microarray experiment. Nodes marked with asterisks indicate proteins where a domain was folded with Rosetta (resulting in a significant fold match) or annotated using fold recognition. Nodes marked with a 'P' are proteins that were annotated using Pfam. The '!' by yufN indicates that the prior annotation does not agree with our current analysis.
Htr1-Htr18 chemotaxis annotations
| Gene name | Sensing domain | Length | HAMP domain | Membrane regions | Method | Role/responds to | Annotation |
| VNG1659g | sop1 | 536 | 35-104 | 12-31 | Known | Light | Responds to light via sensory rhodopsin |
| VNG1765g | sop2 | 764 | 283-352 | 13-35 | Known | Light | Responds to light via sensory rhodopsin |
| VNG1856g | self/? | 633 | 125-195 | 125-144 | Known | Amino acids | |
| VNG0806g | self/? | 778 | 298-367 | 29-48, 297-319 | - | - | |
| VNG1760g | ProX | 810 | 325-394 | 35-57, 325-344 | Known | Amino acids, osmoprotectants | ProX is a putative glycine betaine/choline/proline substrate-binding protein |
| VNG0793g | yufN | 545 | 295-365 | 21-43, 297-319 | 3D-Jury | Sugars | yufN is annotated as an ABC transporter and lipoprotein |
| VNG1759g | VNG1758H | 789 | - | 1-91 | Rosetta | - | Weak hit to sensory rhodopsin |
| VNG1523g | self | 633 | - | 48-206 | Known | Oxygen | Experimentally known to play a role in aerotaxis |
| VNG1395g | self/? | 481 | - | - | Pfam | Redox/O2/light | PF00989, PAS domain |
| VNG1505g | self/? | 489 | - | - | Rosetta, PSI-BLAST | Oxygen | Domain 1 Rosetta hit to 1ljwA (hemoglobin); domain 1 hit to 1or4A via PSI-BLAST (recent PDB) |
| VNG1442g | self/VNG1440H | 420 | - | - | Rosetta, 3D-Jury, Pfam | Redox/O2/light | Htr12 has an amino-terminal (domain 1) hit to PF00989 (PAS domain); Htr12 domain 1 also has a 3D-Jury hit to 1dp6A (FixL heme domain) |
| VNG1013g | self/? | 423 | - | - | Rosetta | Peptides/? | Hit to 1jwfA (Gga1, involved in protein transport) |
| VNG0355g | self/? | 627 | 58-129 | 36-58 | Pfam | Peptides/? | Weak hits to PF01920 (KE2 domain) and PF02996 (prefoldin) |
| VNG0958g | VNG0959H | 636 | - | - | 3D-Jury | Oxygen | Domain 2 has a 3D-Jury hit to 1dp6A (FixL heme domain) |
| VNG0614g | self/VNG0613H | 628 | 129-199 | 130-152 | Pfam | Lipids/? | Hit to PF01442 in domain 2 of Htr16 (PF01442 is an apolipoprotein involved in the uptake of lipids and/or cholesterol) |
| VNG1733g | VNG1734H | 536 | - | 1-91 | - | - | Not applicable |
| VNG0812g | PotD | 790 | 257-327 | - | Known | Lipids | PotD is a spermidine/putrescine binding protein |
Annotations for IS element rich regions
| Name | IS-element | Method | Annotation |
| VNG5101H/6098H | ISH2 | Pfam | PF01402 CopG ribbon helix, regulates plasmid copy number |
| VNG5102H/6099H | - | TMHMM | Membrane protein, unknown function |
| VNG5104H/6101H | - | - | - |
| VNG5105H/6102H | - | Meta-Server | Hit to 1dhx, copper binding protein |
| VNG5106H/6103H | - | TMHMM | Membrane protein, unknown function |
| VNG5108H/6105H | - | Rosetta | Hit to 1d1A2, capsid protein/transcription factor |
| VNG5109H/6106H | ISH8 | Pfam | PF01609, transposase DDE domain |
| VNG5112H/6109H | - | Rosetta | Hit to 1dt9A1, translation initiation factor |
| VNG5114H/6111H | ISH3 | - | - |
| VNG5115H/6112H | - | Pfam | PF00589, phage integrase family |
| VNG5116H/6113H | - | Meta-Server | Hit to 1d1qA, phosphotyrosine protein phosphatase |
| VNG5118H/6115H | - | Rosetta | 1he8A3, serine/threonine protein phosphatase |
| VNG5119H/6116H | - | Meta-Server | 1smtA, winged helix (DNA binding) in domain 1 |
| VNG5120H/6117H | - | Rosetta | small protein, 2 hits to 1asu00 phage integrase (weak hits) |
| VNG5122H/6119H | ISH7 | Pfam | PF01609 |
| VNG5123H/6120H | ISH7 | TMHMM | membrane protein, unknown function |
| VNG5124H/6121H | - | - | - |
| VNG5040H | ISH8 | Pfam | PF01609 |
| VNG5041H/5256H | - | Rosetta | - |
| VNG5042H/5255H Domain 1 | ISH9 | Rosetta | Hit to 1ez3A0, 2 long helices, no function annotation (domain 1) |
| VNG5042H/5255H Domain 2 | ISH9 | Pfam | PF01609 (domain 2) |
| VNG5044H/5253H | ISH5 | Pfam | PF01609 |
| VNG5045H/5252H | ISH11 (in ISH5) | Pfam | PF01609 |
| VNG5047H/5250H | - | Rosetta | Hit to 1am3 (HIV capsid protein), 1.10.1200.30 |
| VNG5048H/5249H | - | Rosetta | Hit to 1ais (cyclin-like fold/TBP fragment), 1.10.472.10 |
| VNG5049H/5248H | - | Rosetta | Hit to 2ezh (transposase/transcription factor), 1.10.10.60 |
| VNG5050H/5247H | - | Pfam | PF03551, PadR repressor |
| tbpB | - | Known | TATA-box binding protein B |
Figure 3. IS-element (insertion sequence) rich regions on the minichromosome. (a) Segment of Halobacterium genome corresponding to genes VNG5101H - sojD (and duplicate region VNG6098H - sojD). IS-elements are shown above as colored boxes. Open reading frames are indicated as red/pink (on 3' strand) or blue/sky-blue boxes (on 5' strand). (b) Top ranked Rosetta prediction for VNG6109H shown next to its closest match in the PDB, 1dt9A1 (translation initiation factor sub-domain). (c) Segment of Halobacterium genome corresponding to genes VNG5244H - VNG5256H (duplicated on the opposite strand elsewhere on the minichromosome, VNG5053H - VNG5041H). (d) Top ranked Rosetta prediction for VNG5049H shown next to its closest hit in the PDB, 2ezh. (e) Top ranked Rosetta prediction for VNG5047H shown next to its closest hit in the PDB, 1am3.
Predicted transcriptional regulators
| Protein | Cluster number | CATH ID | Z-score | Confidence | Other hits |
| VNG0389C | 1 | 1.10.10.10 | 6.55 | 0.343 | Weak FFAS hit to 1mzbA (ferric uptake gene regulator) |
| | 2 | 1.10.10.10 | 7.40 | 0.4 | |
| | 4 | 1.10.10.10 | 6.76 | 0.357 | |
| VNG2614H | 1 | 1.10.10.10 | 6.64 | 0.313 | PSI-BLAST to 1jgsA (winged helix, MarR) |
| | 4 | 1.10.10.60 | 7.12 | 0.419 | |
| | 5 | 1.10.10.10 | 8.09 | 0.469 | |
| VNG0768H | 1 | 1.10.10.10 | 7.14 | 0.385 | Not applicable |
| | 9 | 1.10.10.10 | 8.09 | 0.481 | |
| VNG2369C | 1 | 1.10.10.10 | 7.14 | 0.323 | Not applicable |
| | 5 | 1.10.10.10 | 7.69 | 0.424 | |
| VNG2641H | 1 | 1.10.10.10 | 7.58 | 0.321 | Not applicable |
| VNG1640H | 1 | 1.10.10.10 | 7.86 | 0.34 | Not applicable |
| VNG5156H | 2 | 1.10.10.10 | 7.94 | 0.401 | Weak FFAS hit to 1i1gA (LRP-like transcriptional regulator) |
| | 3 | 1.10.10.10 | 9.36 | 0.502 | |
| | 4 | 1.10.10.10 | 8.55 | 0.444 | |
| VNG5108H | 14 | 1.10.10.10 | 8.04 | 0.388 | Not applicable |
| VNG6047H | 1 | 1.10.10.10 | 8.15 | 0.446 | Weak FFAS hit to 1id3D (histone fold) |
| | 3 | 1.10.10.60 | 8.69 | 0.524 | |
| | 14 | 1.10.10.10 | 10.03 | 0.58 | |
| VNG0703H | 1 | 1.10.10.10 | 8.17 | 0.439 | PSI-BLAST to 1i1gA (LRP-like regulator) |
| | 2 | 1.10.10.10 | 7.56 | 0.37 | |
| | 4 | 1.10.10.10 | 7.88 | 0.428 | |
| | 5 | 1.10.10.10 | 7.16 | 0.369 | |
| | 6 | 1.10.10.10 | 8.96 | 0.505 | |
| VNG0462C | 1 | 1.10.10.10 | 8.42 | 0.527 | PSI-BLAST to 1lnwA (MexR repressor, winged helix fold); PSI-BLAST to ArsR |
| | 2 | 1.10.10.10 | 9.77 | 0.621 | |
| | 4 | 1.10.10.10 | 7.66 | 0.489 | |
| | 5 | 1.10.10.10 | 8.57 | 0.545 | |
| VNG2014H | 14 | 1.10.10.10 | 8.72 | 0.379 | Not applicable |
| VNG1229H | 1 | 1.10.10.10 | 8.80 | 0.463 | Not applicable |
| | 4 | 1.10.10.10 | 7.95 | 0.339 | |
| VNG0837H | 2 | 1.10.10.10 | 9.02 | 0.476 | Not applicable |
| | 4 | 1.10.10.10 | 7.53 | 0.377 | |
| | 5 | 1.10.10.10 | 8.39 | 0.436 | |
| VNG6479H | 1 | 1.10.10.10 | 9.29 | 0.53 | Not applicable |
| VNG0039H | 1 | 1.10.10.10 | 9.36 | 0.478 | |
| | 2 | 1.10.10.10 | 8.60 | 0.438 | Weak FFAS hit to 1mkmA (transcriptional regulator IclR, amino-terminal domain) |
| | 3 | 1.10.10.10 | 9.28 | 0.438 | |
| | 5 | 1.10.10.10 | 9.96 | 0.522 | |
| VNG0293H | 3 | 1.10.10.60 | 6.14 | 0.395 | PSI-BLAST to 1i1gA (LRP-like regulator) |
| | 4 | 1.10.10.10 | 7.01 | 0.427 | |
| | 5 | 1.10.10.10 | 6.63 | 0.393 | |
| VNG2074H | 11 | 1.10.10.60 | 7.09 | 0.354 | Not applicable |
| | 1 | 1.10.10.60 | 6.03 | 0.286 | |
| VNG6251H | 3 | 1.10.10.60 | 8.06 | 0.378 | TMHMM predicts 1 TM helix |
| VNG2133H | 2 | 1.10.472.10 | 8.25 | 0.351 | Not applicable |
| VNG6287H | 15 | 1.10.472.10 | 9.14 | 0.306 | Weak FFAS hit to 1smt-A (SMTB repressor) |
| | 4 | 1.10.472.10 | 8.93 | 0.283 | |
| | 2 | 1.10.472.10 | 8.50 | 0.269 | |
| VNG0511H | 8 | 1.10.472.10 | 9.25 | 0.512 | PSI-BLAST hit to 1i1gA |
| VNG1865H | 8 | 1.10.472.10 | 9.55 | 0.365 | Not applicable |
| | 4 | 1.10.472.10 | 7.18 | 0.18 | |
Figure 4. Predicted transcriptional regulators. Rosetta predictions for three Halobacterium NRC-1 proteins that are consistent with transcription regulation and/or DNA binding. (a) The top ranked Rosetta structure prediction for VNG0462C (according to the Rosetta confidence function) is shown next to the diphtheria toxin repressor (1bi2-B). The predicted operon for VNG0462 is shown below; red/pink boxes above the line in this diagram are genes on the 3' strand, while genes indicated by rectangles below the line are genes on the 5' strand. (b) The top ranked model for VNG5156H (left) is shown next to 1bi2-B; the predicted operon containing VNG5156, VNG5154, VNG5153 and VNG5152 is shown below. (c) The top ranked Rosetta prediction for VNG0039H is shown next to its closest match in the PDB, 1bi2-B.
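The table of predicted transcriptional regulators lists several Rosetta clusters per protein, each with a Z-score and a confidence value. One simple way to read such a table is to keep, for each protein, the cluster with the highest confidence. The sketch below does this for two proteins whose rows are copied from the table; the function name `best_cluster_per_protein` is hypothetical, and selecting by the tabulated confidence is an assumption made for illustration, not necessarily how the "top ranked" models in Figure 4 were chosen.

```python
from typing import Dict, List, Tuple

# (protein, cluster number, CATH ID, Z-score, confidence)
Row = Tuple[str, int, str, float, float]

# Rows copied from the predicted transcriptional regulators table.
rows: List[Row] = [
    ("VNG0462C", 1, "1.10.10.10", 8.42, 0.527),
    ("VNG0462C", 2, "1.10.10.10", 9.77, 0.621),
    ("VNG0462C", 4, "1.10.10.10", 7.66, 0.489),
    ("VNG0462C", 5, "1.10.10.10", 8.57, 0.545),
    ("VNG5156H", 2, "1.10.10.10", 7.94, 0.401),
    ("VNG5156H", 3, "1.10.10.10", 9.36, 0.502),
    ("VNG5156H", 4, "1.10.10.10", 8.55, 0.444),
]


def best_cluster_per_protein(table: List[Row]) -> Dict[str, Row]:
    """Keep the row with the highest confidence for each protein.
    Assumption: tabulated confidence is the ranking criterion."""
    best: Dict[str, Row] = {}
    for row in table:
        protein, confidence = row[0], row[4]
        if protein not in best or confidence > best[protein][4]:
            best[protein] = row
    return best


best = best_cluster_per_protein(rows)
for protein, row in best.items():
    print(protein, "cluster", row[1], "CATH", row[2], "confidence", row[4])
```

For these rows the highest-confidence clusters are cluster 2 for VNG0462C and cluster 3 for VNG5156H, both assigned to CATH 1.10.10.10.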