Chern-Sing Goh, Tara A Gianoulis, Yang Liu, Jianrong Li, Alberto Paccanaro, Yves A Lussier, Mark Gerstein.
Abstract
BACKGROUND: The ability to rapidly characterize an unknown microorganism is critical in both responding to infectious disease and biodefense. To do this, we need some way of anticipating an organism's phenotype based on the molecules encoded by its genome. However, the link between molecular composition (i.e. genotype) and phenotype for microbes is not obvious. While there have been several studies that address this challenge, none have yet proposed a large-scale method integrating curated biological information. Here we utilize a systematic approach to discover genotype-phenotype associations that combines phenotypic information from a biomedical informatics database, GIDEON, with the molecular information contained in National Center for Biotechnology Information's Clusters of Orthologous Groups database (NCBI COGs).
Year: 2006 PMID: 17038185 PMCID: PMC1630430 DOI: 10.1186/1471-2164-7-257
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
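The abstract describes combining GIDEON phenotypes with NCBI COGs, and the tables below report associations by correlation score. The abstract does not give the scoring formula, so the following is only an illustrative sketch: it assumes each COG and each phenotype is encoded as a 0/1 profile across the same list of organisms and scores a pair with the Pearson correlation of the two profiles (for binary vectors this is the phi coefficient). The function name `phi_correlation` and the example profiles are hypothetical.

```python
from math import sqrt
from typing import Sequence


def phi_correlation(cog_profile: Sequence[int], phenotype_profile: Sequence[int]) -> float:
    """Pearson correlation of two 0/1 profiles, i.e. the phi coefficient.

    Each position is one organism (an assumption for this sketch):
    1 = COG present / phenotype positive, 0 = absent / negative.
    """
    if len(cog_profile) != len(phenotype_profile):
        raise ValueError("profiles must cover the same organisms")
    pairs = list(zip(cog_profile, phenotype_profile))
    # 2x2 contingency table of COG presence versus phenotype
    n11 = sum(1 for c, p in pairs if c and p)
    n10 = sum(1 for c, p in pairs if c and not p)
    n01 = sum(1 for c, p in pairs if not c and p)
    n00 = len(pairs) - n11 - n10 - n01
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # Constant profile: correlation is undefined; 0.0 is this sketch's convention
        return 0.0
    return (n11 * n00 - n10 * n01) / denom


# Hypothetical profiles over six organisms
cog_profile = [1, 1, 1, 0, 0, 0]
phenotype_profile = [1, 1, 1, 0, 0, 1]
score = phi_correlation(cog_profile, phenotype_profile)
print(f"correlation = {score:.3f}")                # correlation = 0.707
print("kept at the 0.8 threshold:", score >= 0.8)  # kept at the 0.8 threshold: False
```

Returning 0.0 for a constant profile is a convention chosen here; the Pearson correlation is undefined in that case.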
Number of validated associations at the 0.8 and 0.9 thresholds
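| Threshold | Total | Characterized | Annotated | Confirmed | Accuracy (confirmed/annotated) |
|---|---|---|---|---|---|
| Correlation score ≥ 0.8 | 290 | 154 | 100 | 66 | 66% |
| Correlation score ≥ 0.9 | 74 | 36 | 36 | 31 | 86% |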
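The accuracy column equals confirmed divided by annotated pairs (66/100 = 66%; 31/36 ≈ 86%). A minimal sketch of that arithmetic; the helper name `accuracy_percent` is hypothetical, and rounding half up to whole percentages is an assumption. Integer arithmetic is used because Python's built-in `round()` rounds halves to even (`round(62.5)` gives 62), which would not reproduce a value such as 5/8 reported as 63%.

```python
def accuracy_percent(confirmed: int, annotated: int) -> int:
    """Confirmed / annotated as a whole percentage, rounded half up.

    floor(100*c/a + 1/2) computed with integers: (200*c + a) // (2*a).
    """
    if annotated <= 0:
        raise ValueError("annotated must be positive")
    return (200 * confirmed + annotated) // (2 * annotated)


# (total, characterized, annotated, confirmed), transcribed from the table above
summary = {
    ">= 0.8": (290, 154, 100, 66),
    ">= 0.9": (74, 36, 36, 31),
}
for threshold, (_total, _characterized, annotated, confirmed) in summary.items():
    print(threshold, f"{accuracy_percent(confirmed, annotated)}%")
# >= 0.8 66%
# >= 0.9 86%
```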
Accuracy of associations, as confirmed by the literature, broken down by individual condition. Characterized pairs are those in which the COG has a known function; confirmed associations are those verified in the literature.
| Characterized (≥0.8) | Confirmed (≥0.8) | Accuracy (≥0.8) | Characterized (≥0.9) | Confirmed (≥0.9) | Accuracy (≥0.9) |
|---|---|---|---|---|---|
| 17 | 16 | 94% | 12 | 12 | 100% |
| 6 | 3 | 50% | 2 | 2 | 100% |
| 5 | 0 | 0% | NA | NA | NA |
| 31 | 16 | 52% | 4 | 4 | 100% |
| 2 | 2 | 100% | NA | NA | NA |
| 11 | 7 | 64% | 8 | 5 | 63% |
| 1 | 1 | 100% | 1 | 1 | 100% |
| 1 | 0 | 0% | NA | NA | NA |
| 1 | 0 | 0% | NA | NA | NA |
| 2 | 2 | 100% | NA | NA | NA |
| 1 | 0 | 0% | 2 | 0 | 0% |
| 2 | 2 | 100% | 2 | 2 | 100% |
| 1 | 0 | 0% | NA | NA | NA |
| 17 | 17 | 100% | 5 | 5 | 100% |
| 1 | 0 | 0% | NA | NA | NA |
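Summing the per-condition rows gives a consistency check against the preceding summary table. A sketch, assuming (as the totals suggest) that the first three numeric columns refer to the ≥0.8 set and the last three to the ≥0.9 set, with NA treated as no pairs: the confirmed totals come to 66 and 31, matching the summary table, and the ≥0.9 checked total is 36; at ≥0.8 the checked column sums to 99 against the 100 annotated pairs reported there. The names `per_condition` and `column_totals` are hypothetical.

```python
from typing import List, Optional, Tuple

Counts = Optional[Tuple[int, int]]  # (checked, confirmed); None stands for "NA"

# Rows transcribed from the table above, in order: (>=0.8 counts, >=0.9 counts)
per_condition: List[Tuple[Counts, Counts]] = [
    ((17, 16), (12, 12)),
    ((6, 3), (2, 2)),
    ((5, 0), None),
    ((31, 16), (4, 4)),
    ((2, 2), None),
    ((11, 7), (8, 5)),
    ((1, 1), (1, 1)),
    ((1, 0), None),
    ((1, 0), None),
    ((2, 2), None),
    ((1, 0), (2, 0)),
    ((2, 2), (2, 2)),
    ((1, 0), None),
    ((17, 17), (5, 5)),
    ((1, 0), None),
]


def column_totals(cells: List[Counts]) -> Tuple[int, int]:
    """Sum (checked, confirmed) over conditions, skipping NA cells."""
    present = [c for c in cells if c is not None]
    return sum(c[0] for c in present), sum(c[1] for c in present)


at_08 = column_totals([row[0] for row in per_condition])
at_09 = column_totals([row[1] for row in per_condition])
print(at_08, at_09)  # (99, 66) (36, 31)
```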
Overview of representative hits with correlation scores above 0.8
| COG/description | Correlation score | P-value | Function [references] |
|---|---|---|---|
| COG0763/Lipid A | 0.95 | 1.71E-09 | Involved in lipid A biosynthesis [26–28] |
| COG0774/UDP-3-O-acyl-N-acetylglucosamine | 0.95 | 1.71E-09 | Involved in lipid A biosynthesis [27,29,30] |
| COG1212/CMP-2-keto-3-deoxyoctulosonic | 0.95 | 1.71E-09 | Involved in lipopolysaccharide biosynthesis [31] |
| COG2877/3-deoxy-D-manno-octulosonic acid | 0.95 | 1.71E-09 | Involved in lipopolysaccharide biosynthesis [32] |
| COG0848/Biopolymer | 0.95 | 1.71E-09 | Outer membrane transporters [33,34] |
| COG2885/Outer membrane | 0.84 | 2.46E-09 | Outer membrane protein [35] |
| COG3764/Sortase | 1.0 | 2.59E-08 | Plasma membrane protein [36] |
| COG2344/AT-rich DNA-binding | 0.92 | 7.77E-07 | Transcriptional regulator [37] |
| COG3966/Protein involved in D-alanine | 0.84 | 1.2E-05 | Cell wall and membrane component protein [38] |
| COG4206/Outer membrane | 0.99 | 8.04E-09 | Outer membrane protein [39] |
| COG4787/Flagellar basal body | 0.97 | 2.33E-07 | Periplasmic protein [40] |
| COG3166/Tfp pilus assembly | 0.83 | 9.77E-06 | Outer membrane proteins [41] |
| COG3278/Cbb3-type cytochrome oxidase, | 0.85 | 7.84E-06 | Oxidase protein subunit |
| COG2993/Cbb3-type cytochrome oxidase, | 0.85 | 7.84E-06 | Oxidase protein subunit |
| COG0753/Catalase | 0.97 | 7.69E-06 | Peroxisomal marker enzyme |
| COG1607/Acyl-CoA hydrolase | 0.97 | 7.69E-06 | Enzyme involved in lipid metabolism [13] |
| COG3717/5-keto 4-deoxyuronate isomerase | 0.97 | 3.95E-05 | Enzyme involved in carbohydrate metabolism [14,15] |
| COG0246/Mannitol-1-phosphate/altronate | 0.85 | 4.68E-05 | Oxidizes mannitol to mannose [42] |
| COG2182/Maltose-binding periplasmic | 0.94 | 3.4E-05 | Maltose-related protein [16] |
| COG0835/Chemotaxis signal | 0.94 | 4.93E-09 | Chemotaxis-related protein |
| COG1345/Flagellar capping protein | 0.94 | 4.93E-09 | Flagella-related protein |
| COG1516/Flagellin-specific chaperone FliS | 0.94 | 4.93E-09 | Flagella-related protein |
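To see which of these representative hits would also pass the stricter 0.9 threshold used in the summary table, the rows can be filtered on the correlation score and ordered by the third column. A sketch, assuming that column is a p-value (smaller means stronger support); the function `strict_hits` is hypothetical, and the values are transcribed from the table above.

```python
from typing import List, Tuple

Hit = Tuple[str, float, float]  # (COG, correlation score, p-value)

hits: List[Hit] = [
    ("COG0763", 0.95, 1.71e-09), ("COG0774", 0.95, 1.71e-09),
    ("COG1212", 0.95, 1.71e-09), ("COG2877", 0.95, 1.71e-09),
    ("COG0848", 0.95, 1.71e-09), ("COG2885", 0.84, 2.46e-09),
    ("COG3764", 1.0, 2.59e-08), ("COG2344", 0.92, 7.77e-07),
    ("COG3966", 0.84, 1.2e-05), ("COG4206", 0.99, 8.04e-09),
    ("COG4787", 0.97, 2.33e-07), ("COG3166", 0.83, 9.77e-06),
    ("COG3278", 0.85, 7.84e-06), ("COG2993", 0.85, 7.84e-06),
    ("COG0753", 0.97, 7.69e-06), ("COG1607", 0.97, 7.69e-06),
    ("COG3717", 0.97, 3.95e-05), ("COG0246", 0.85, 4.68e-05),
    ("COG2182", 0.94, 3.4e-05), ("COG0835", 0.94, 4.93e-09),
    ("COG1345", 0.94, 4.93e-09), ("COG1516", 0.94, 4.93e-09),
]


def strict_hits(rows: List[Hit], threshold: float = 0.9) -> List[Hit]:
    """Keep rows with score >= threshold, smallest p-value first.

    sorted() is stable, so rows with equal p-values keep their table order.
    """
    return sorted((r for r in rows if r[1] >= threshold), key=lambda r: r[2])


for cog, score, p in strict_hits(hits):
    print(f"{cog}\t{score:.2f}\t{p:.2e}")
```

Six of the 22 listed hits (scores 0.83 to 0.85) drop out at the 0.9 threshold.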
Figure 2. Number of associated COG-phenotype pairs in each subset of the 0.8 and 0.9 correlation-score threshold data sets. The data sets obtained at the (a) 0.8 and (b) 0.9 correlation thresholds are each broken down into four subsets. Total number (dark blue) is the total number of associated COG-phenotype pairs found at the respective threshold. Characterized (light purple) refers to pairs in which the COG has a known function. Annotated (blue-green) are the pairs selected for literature verification. Confirmed (light blue) are the associations validated in the literature. The breakdown is shown for each lab test, labeled with its GIDEON identifier.
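The caption describes four nested subsets: confirmed pairs are drawn from the annotated ones, annotated from the characterized ones, and characterized from the total (the decreasing counts 290, 154, 100 and 66 in the summary table fit this reading). A sketch of how such a breakdown could be tallied from per-pair flags, assuming that nesting; the `Pair` record, its field names, and the toy data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Pair:
    """One COG-phenotype association (hypothetical record layout)."""
    cog: str
    phenotype: str
    characterized: bool  # the COG has a known function
    annotated: bool      # selected for literature verification
    confirmed: bool      # validated in the literature


def subset_counts(pairs: List[Pair]) -> Dict[str, int]:
    """Count the four nested subsets; each stage filters the previous one."""
    characterized = [p for p in pairs if p.characterized]
    annotated = [p for p in characterized if p.annotated]
    confirmed = [p for p in annotated if p.confirmed]
    return {
        "total": len(pairs),
        "characterized": len(characterized),
        "annotated": len(annotated),
        "confirmed": len(confirmed),
    }


def counts_by_phenotype(pairs: List[Pair]) -> Dict[str, Dict[str, int]]:
    """Per-phenotype breakdown, analogous to one group of bars per identifier."""
    phenotypes = sorted({p.phenotype for p in pairs})
    return {ph: subset_counts([p for p in pairs if p.phenotype == ph]) for ph in phenotypes}


# Toy records with hypothetical COG and phenotype labels
pairs = [
    Pair("COG_A", "pheno_1", True, True, True),
    Pair("COG_B", "pheno_1", True, True, False),
    Pair("COG_C", "pheno_2", True, False, False),
    Pair("COG_D", "pheno_2", False, False, False),
    Pair("COG_E", "pheno_3", True, True, True),
]
print(subset_counts(pairs))
print(counts_by_phenotype(pairs))
```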