Svetlana Gerdes, Basma El Yacoubi, Marc Bailly, Ian K Blaby, Crysten E Blaby-Haas, Linda Jeanguenin, Aurora Lara-Núñez, Anne Pribat, Jeffrey C Waller, Andreas Wilke, Ross Overbeek, Andrew D Hanson, Valérie de Crécy-Lagard.
Abstract
BACKGROUND: Identifying functions for all gene products in all sequenced organisms is a central challenge of the post-genomic era. However, at least 30-50% of the proteins encoded by any given genome are of unknown or vaguely known function, and a large number are wrongly annotated. Many of these 'unknown' proteins are common to prokaryotes and plants. We set out to predict and experimentally test the functions of such proteins. Our approach to functional prediction integrates comparative genomics based mainly on microbial genomes with functional genomic data from model microorganisms and post-genomic data from plants. This approach bridges the gap between automated homology-based annotations and the classical gene discovery efforts of experimentalists, and is more powerful than purely computational approaches to identifying gene-function associations.
Year: 2011 PMID: 21810204 PMCID: PMC3223725 DOI: 10.1186/1471-2164-12-S1-S2
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Project workflow. The overall strategy, which combined in silico and experimental validation, is presented, showing the number of genes that were analyzed at each stage.
Selection of candidate hypothetical gene families conserved in Arabidopsis (AT) and prokaryotes for in silico functional predictions and potential experimental verification: an overview.
| AT gene families in this study | AT genes (and families) screened | AT genes with prokaryote homolog(s) | AT genes selected for | Gene families connected to metabolic areas | Families with specific hypotheses formulated | Families experimentally tested in this study | Families with validated functions: in this study | Families with validated functions: by others |
|---|---|---|---|---|---|---|---|---|
| Singletons | 3,625 | 666 (18.4%) | 178 | 42 | 21 | 10 | 3 | 5 |
| Duplets | 3,204 (1,602 x2) | 909 (28.4%) | 190 | 21 | 13 | 7 | 3 | 2 |
| Triplets a | 2,421 (807 x3) | 849 (35.0%) | 262 | 14 | 6 | 3 | 2 | 3 |
| | | | | | (+1) b | (+1) b | (+1) b | |
a Includes several Arabidopsis gene families with 4 or more paralogs
b One candidate was not in the Arabidopsis set (number 4 in Table 2).
c Includes 9 families with functions experimentally validated, 4 invalidated, and 8 for which experimental validation is currently in progress (see Table 2 and Table 3 for details).
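The counts in the table and in footnote c can be cross-checked with a few lines of arithmetic. The sketch below is only an illustrative consistency check, not part of the study's workflow. It assumes that the "(+1) b" candidate (the one outside the Arabidopsis set) counts toward both the "experimentally tested" and "validated in this study" totals; that column assignment is an assumption, and all variable names are hypothetical.

```python
# Counts transcribed from the table above, in row order:
# Singletons, Duplets, Triplets.
tested = [10, 7, 3]               # families experimentally tested in this study
validated_here = [3, 3, 2]        # families validated in this study
validated_by_others = [5, 2, 3]   # families validated by others

# Assumption: the "(+1) b" candidate outside the Arabidopsis set adds one
# family to the tested and validated-in-this-study totals.
extra = 1

total_tested = sum(tested) + extra
total_validated_here = sum(validated_here) + extra
total_validated_by_others = sum(validated_by_others)

# Footnote c: 9 families validated, 4 invalidated, 8 in progress.
footnote_c = {"validated": 9, "invalidated": 4, "in_progress": 8}

print("tested:", total_tested, "| footnote c total:", sum(footnote_c.values()))
print("validated in this study:", total_validated_here)
print("validated by others:", total_validated_by_others)
```

Under that assumption the totals agree with the case numbering used in the following tables: 9 families verified in this study (cases 1-9), 10 verified by others (cases 10-19), 4 invalidated (cases 20-23), and 8 in progress (cases 24-31).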
Status of the experimentally validated families: cases 1-9 verified by us; cases 10-19 verified by others.
| Case no. | TAIR ID | COG number/ gene name | Subsystem in SEED | Working functional prediction | Experimental verification status | Homologs annotated | Reference |
|---|---|---|---|---|---|---|---|
| 1 | At4g12130 At1g60990 | 0354 ygfZ | YgfZ | Folate-dependent protein for Fe/S cluster synthesis/repair in oxidative stress | Validated in | 327 | [ |
| 2 | At2g20830 | 3643 | Experimental-histidine degradation | Alternative to 5-FCL (EC 6.3.3.2) as a way to metabolize 5-formyltetrahydrofolate | Verified in 5 prokaryotes | 65 | [ |
| 3 | At1g29810 At5g51110 | 2154 phhB | Pterin carbinolamine dehydratase | Pterin-4-alpha-carbinolamine dehydratase (EC 4.2.1.96) with a role in Moco metabolism | Validated in 7 eukaryotes and 8 prokaryotes | 217 | [ |
| 4 | none | 0720 | Experimental-PTPS | Replacement for FolB (EC 4.1.2.25) | Validated in 1 eukaryote and 8 prokaryotes | 65 | [ |
| 5 | At5g60590 | 0009 yrdC | YrdC-YciO-Sua5 protein family | Required for threonylcarbamoyl-adenosine (t(6)A) formation in tRNA | Validated in yeast, archaea and 2 bacteria. | 745 | [ |
| 6 | At2g45270 At4g22720 | 0533 ygjD | YrdC-YciO-Sua5 protein family | Required for threonylcarbamoyl-adenosine (t(6)A) formation in tRNA | Validated in yeast | 691 | [ |
| 7 | At1g15730 At1g26520 At1g80480 | 0523 | COG0523 | Diverse metal chaperones | Validated in several bacteria | 718 | [ |
| 8 | At3g13050 | MFS superfamily NiaP homolog | Niacin-choline transport and metabolism | Niacin and/or choline transporter | Niacin but not choline transport shown for 3 bacterial proteins and the mouse protein | 133 | Manuscript in prep |
| 9 | At1g76730 | 0212 | 5-FCL-like protein | Not a 5-FCL enzyme; involved in thiamine salvage | Cannot replace 5-FCL and lacks detectable 5-FCL activity | 41 | Manuscript submitted |
| 10 | At4g36400 | 0277 bll2569 | COG0277 | D-2-hydroxyglutarate dehydrogenase | D-2-hydroxyglutarate dehydrogenase | 158 | [ |
| 11 | At5g10910 | 0275 mraW | 16S rRNA modification within P site of ribosome | SAM-dependent methyltransferase involved in a process common to eubacteria and chloroplasts | 16S rRNA m(4) C1402 methyltransferase (modification within P site of ribosome) | 877 | [ |
| 12 | At1g45110 | 0313 | 16S rRNA modification within P site of ribosome | Tetrapyrrole family methyltransferase involved in a process common to eubacteria, chloroplasts, and possibly mitochondria | 16S rRNA 2'-O-ribose C1402 methyltransferase (modification within P site of ribosome) | 836 | [ |
| 13 | At5g18570 At1g07620 | 0536 | Iojap | At5g18570 predicted to be plastidial, At1g07615 mitochondrial. Association evidence connects At5g18570 with plastidial iojap (At3g12930) | Essential for embryo development but specific function unclear | 721 | [ |
| 14 | At1g49350 | 2313 yeiN | Pseudouridine catabolism | Sugar catabolism | Involved in pseudouridine metabolism in uropathogenic | 108 | In EC: [ |
| 15 | At1g50510 | 0524 yeiC | Pseudouridine catabolism | Sugar catabolism | Involved in pseudouridine metabolism in uropathogenic | 108 | In EC: [ |
| 16 | At4g10620 At3g57180 At3g47450 | 1161 yqeH | At4g10620 At3g57180 At3g47450 | GTP-binding protein YqeH, involved in replication initiation | At3g57180 (BPG2) functions in brassinosteroid-mediated post-transcriptional accumulation of chloroplast rRNA. At3g47450 (AtNOA1) is a GTPase that regulates nucleic acid recognition | 180 | [ |
| 17 | At3g24430 At4g19540 At5g50960 | 2151 apbC | Scaffold proteins for [4Fe-4S] cluster assembly (MRP family) | Fe-S cluster assembly proteins. The DUF59 (PaaD-like) domain of At3g24430 and its prokaryotic counterparts are also predicted to function in Fe-S cluster assembly. | At5g50960 (Nbp35) functions in Fe-S cluster assembly as a bifunctional molecular scaffold At3g24430 acts as a scaffold protein for [4Fe-4S] cluster assembly in chloroplasts | 276 | [ |
| 18 | At3g57000 | 1756 | rRNA modification Archaea; rRNA methylation in clusters | rRNA modification enzyme | The | 31 | [ |
| 19 | At5g12040 | 0388 | Omega-amidase | Omega amidase in methionine salvage pathway | Biochemical characterization of the rat and mouse orthologs | 113 | [ |
Status of the families invalidated by us (cases 20-23) or in progress (cases 24-31).
| Case no. | TAIR ID | COG number/ gene name | Subsystem in SEED | Working functional prediction | Experimental verification status | Homologs annotated | |
|---|---|---|---|---|---|---|---|
| 20 | At5g43600 | 0624 | Experimental - Histidine Degradation | Alternative form of N-formylglutamate deformylase (EC 3.5.1.68) | No deformylase activity detected in ||
| 21 | At2g23390 | 3146 | COG3146 | Pterin-dependent enzyme | |||
| 22 | At2g04900 | 2363 ywdK | COG2363 | Thiamine-related transporter | |||
| 23 | At1g09150 | 2016 | rRNA modification Archaea; DOE-COG2016 | Ribosome assembly/translation termination | In progress in yeast and | ||
| 24 | At4g26860 At1g11930 | 0325 yggS | PROSC | Pyridoxal phosphate enzyme related to glutamate metabolism | In progress in | ||
| 25 | At1g78620 At5g19930 | 1836 alr1612 | COG1836 | Phytol-phosphate metabolism | Shown to be an essential gene in | ||
| 26 | At5g12950 At5g12960 | 3533 SAV1144 | DOE COG3533 | Hydroxyproline-galactosyl hydrolase | In progress in | ||
| 27 | At3g09250 | 4319 gll0142 | COG4319 | Folate or pterin metabolism enzyme, possibly an alternative DHFR (EC 1.5.1.3), a pterin reductase, or a dihydroneopterin triphosphate hydrolase | |||
| 28 | At3g12930 At1g67620 | 0799 alr4169 | Iojap | NAD-dependent ribosomal modification, possibly involving phosphoester hydrolysis | No pyrophosphatase or NAD cleavage activity detected in | ||
| 29 | At3g01920 | 0009 yciO | YrdC-YciO-Sua5 protein family | RNA/protein modification | In progress in | ||
| 30 | At1g03030 | 1072 yggC | Experimental-yggC | Sugar/polyol kinase | In progress in | ||
| 31 | At4g28830 | 2263 | rRNA modification Archaea | Predicted RNA methylase COG2263 | In progress in |
a Numbers in italics are for members of families for which the prediction has been invalidated or is in progress; they have not been included in the final count.
b S. Fournier and W. Decatur, University of Massachusetts (unpublished)
Figure 2. Clustering arrangements of genes encoding COG0354 and functional complementation of an (A) Clustering of COG0354 genes with Fe/S-related genes. Blue, COG0354; red, Fe/S proteins; rose, proteins in same complex or pathway as Fe/S proteins; turquoise, Fe/S cluster assembly proteins. Rx, Rubrobacter xylanophilus; Sm, Stenotrophomonas maltophilia; Pu, Pelagibacter ubique. (B) Growth of an E. coli COG0354 (ygfZ) deletant harboring plasmid-borne E. coli ygfZ, Arabidopsis mitochondrial COG0354, or vector alone on LB medium or LB plus the oxidative stress agent plumbagin (OX) (30 μM), arabinose (0.02% w/v), and appropriate antibiotics.
Figure 3. COG3643 in relation to the Hut pathway. (A) Hut pathway; note the three different routes. (B) The distribution of histidine utilization genes among bacterial and eukaryal genomes in relation to that of the ygfA gene for 5-formyltetrahydrofolate disposal. Gene colors correspond to different parts of the pathway as in part A. Lines between boxes denote gene fusions. (C) Growth of an E. coli ygfA deletant harboring plasmid-borne E. coli ygfA, Acidobacterium COG3643, or vector alone on minimal medium with NH4Cl or glycine as sole nitrogen source. The medium contained 1 mM IPTG and appropriate antibiotics.
Figure 4. Separation of the COG0009 family into two subgroups, YrdC and YciO, based on motifs and functional assays. (A) Complementation of the t6A- phenotype of the yeast Δsua5 (YGN63) strain by the E. coli yrdC gene but not the E. coli yciO gene. (B) Complementation of the yrdC essentiality phenotype in E. coli by yrdC subfamily members from E. coli (EcyrdC), Bacillus subtilis (BsywlC), Methanococcus maripaludis (MmyrdC) and yeast (Scsua5) but not by yciO from E. coli (EcyciO). All genes were cloned in pBAD24 [95], and were therefore expressed in the presence of arabinose (Ara, 0.2%), and were transformed into an E. coli strain carrying the chromosomal copy of yrdC under PTet control [96], which does not grow in the absence of anhydrotetracycline (ATc, 50 ng/ml). (C) Signature motif of the functional homologs of YrdC (KxR/SxN) that is not found in the YciO subfamily. In green are the two homologs from Arabidopsis and their distribution.
Comparison of functional predictions in eNet and AraNet for 10 of the protein families.
| TAIR ID | E. coli ortholog | Working functional prediction | eNet predictions | AraNet predictions a |
|---|---|---|---|---|
| | YgfZ | Folate-dependent protein for Fe/S cluster synthesis/repair in oxidative stress | | |
| At2g20830 | none | Alternative to 5-FCL (EC 6.3.3.2) as a way to metabolize 5-formyltetrahydrofolate | n/a | Response to wounding (1.86), defense response, response to oxidative stress, phenylpropanoid biosynthesis, response to other organism, boron transport, glucosinolate biosynthesis (0.89) |
| At1g29810, At5g51110 | none | Pterin-4-alpha-carbinolamine dehydratase (EC 4.2.1.96) with a role in Moco metabolism | n/a | |
| At5g12040 | YafV | Omega amidase in methionine salvage pathway | Predicted C-N hydrolase family amidase, NAD(P)-binding | Indoleacetic acid biosynthesis (4.27), cellular response to sulfate starvation, cyanide metabolic process, glucosinolate catabolic process, detoxification of nitrogen compound, methylglyoxal catabolic process to D-lactate (1.59) |
| At5g60590 | YrdC | Required for threonylcarbamoyladenosine (t(6)A) formation in tRNA | | rRNA processing (3.88), dATP biosynthesis from ADP, histidine biosynthesis, mitochondrial ATP synthesis coupled proton transport, cellular respiration, ATP synthesis coupled proton transport, regulation of transcription (2.07) |
| At2g45270, At4g22720 | YgjD | Required for threonylcarbamoyladenosine (t(6)A) formation in tRNA | ||
| At1g15730, At1g26520, At1g80480 | YjiA YeiR | Metal chaperone-Zinc homeostasis | ||
| At1g76730 | none | Not a 5-FCL enzyme; involved in thiamine salvage | n/a | Tetrahydrofolate metabolic process (4.56), negative regulation of transcription, response to abscisic acid stimulus (0.89) |
| At4g36400 | none | D-2-hydroxyglutarate dehydrogenase | n/a | Cytoskeleton organization and biogenesis (2.39), actin cytoskeleton organization and biogenesis, ubiquitin-dependent protein catabolic process, response to light stimulus, response to wounding, seed germination (1.24) |
| At1g45110 | yraL | Tetrapyrrole family methyltransferase involved in a process common to eubacteria, chloroplasts, and possibly mitochondria | | Toxin catabolic process (5.49), response to oxidative stress, cellular response to water deprivation, response to jasmonic acid stimulus, response to ozone, isoprenoid biosynthesis, electron transport (1.45) |
a Only a few top predictions (out of 30 routinely returned) for AraNet are shown. They are sorted by the AraNet score estimating the gene’s association with each particular process (given in brackets for the first and last predictions shown here).