Makio Tamura, Patrik D'haeseleer.
Abstract
MOTIVATION: Microbial phenotypes are typically due to the concerted action of multiple gene functions, yet the presence of each gene may have only a weak correlation with the observed phenotype. Hence, it may be more appropriate to examine co-occurrence between sets of genes and a phenotype (multiple-to-one) instead of pairwise relations between a single gene and the phenotype. Here, we propose an efficient class association rule mining algorithm, netCAR, to extract sets of COGs (clusters of orthologous groups of proteins) associated with a phenotype from COG phylogenetic profiles and a phenotype profile. netCAR takes into account the phylogenetic co-occurrence graph between COGs to restrict the hypothesis space, and uses mutual information to evaluate the biconditional relation.
Year: 2008 PMID: 18467347 PMCID: PMC2718668 DOI: 10.1093/bioinformatics/btn210
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
An example phenotype profile and phylogenetic profiles for three COGs (COG1–COG3) across six organisms (O1–O6)
| Organism | O1 | O2 | O3 | O4 | O5 | O6 |
|---|---|---|---|---|---|---|
| Phenotype | 0 | 0 | 1 | 1 | 0 | 0 |
| COG1 | 1 | 1 | 1 | 1 | 0 | 0 |
| COG2 | 0 | 0 | 1 | 1 | 1 | 1 |
| COG3 | 1 | 1 | 1 | 1 | 1 | 0 |
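The example profiles above illustrate why a multiple-to-one rule can be more informative than a pairwise one. The following is a minimal sketch, not the netCAR implementation: it assumes that a rule's antecedent is the logical AND of its COG profiles, that the three COG rows are COG1, COG2 and COG3 in table order, and that mutual information is measured in bits. The names `mutual_information` and `conjunction` are hypothetical helpers introduced for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information (in bits) between two equal-length binary profiles."""
    n = len(x)
    joint = Counter(zip(x, y))  # counts of each observed (x, y) pair
    px = Counter(x)             # marginal counts for x
    py = Counter(y)             # marginal counts for y
    # Sum p(a,b) * log2(p(a,b) / (p(a) * p(b))) over observed pairs only.
    return sum(
        (c / n) * log2((c * n) / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def conjunction(*profiles):
    """Assumed rule antecedent: 1 where every COG in the set is present."""
    return [int(all(bits)) for bits in zip(*profiles)]


# Profiles across organisms O1..O6, copied from the example table.
phenotype = [0, 0, 1, 1, 0, 0]
cog1 = [1, 1, 1, 1, 0, 0]
cog2 = [0, 0, 1, 1, 1, 1]
cog3 = [1, 1, 1, 1, 1, 0]

mi_single = mutual_information(cog1, phenotype)                   # about 0.25 bits
mi_pair = mutual_information(conjunction(cog1, cog2), phenotype)  # about 0.92 bits

print(f"COG1 alone:    {mi_single:.3f} bits")
print(f"COG1 AND COG2: {mi_pair:.3f} bits")
```

Under these assumptions, neither COG1 nor COG2 alone matches the phenotype, but their conjunction reproduces the phenotype profile exactly, so its mutual information equals the entropy of the phenotype itself.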
Fig. 1. (a) Number of unique COGs in extracted rules within FDR levels. Blue, orange and green lines are 1-COG, 2-COG and 3-COG rules, respectively, and broken lines are the number of uncharacterized COGs in the same-colored association rules. (b) Number of unique COGs that are in 2- and 3-COG association rules with an FDR of 1.0 × 10⁻⁴ but are not in pairwise association rules with a much more relaxed FDR level ranging between 1.0 × 10⁻³ and 1.0 × 10⁻¹. (Values for motility are 0, hence the missing curves for that phenotype.)
Fig. 2. (a–f) COG association graphs for the six phenotypes. The nodes are COGs involved in the rules within FDR levels of 1.5 × 10⁻⁵, 5.0 × 10⁻⁸, 1.5 × 10⁻⁵, 1.0 × 10⁻⁵, 5.0 × 10⁻⁵ and 5.0 × 10⁻¹⁴ for the aerobic, anaerobic, facultative, endospore, motility and Gram negativity phenotypes, respectively, and edges show that the linked COGs are used in the same rule. The orange nodes are COGs covered by pairwise association with a 100 times relaxed FDR, except for anaerobic and Gram negativity, which use an FDR of 0.01, while the green nodes represent the other COGs with a weaker pairwise correlation. The size of each node and the width of each edge are proportional to the frequencies of the corresponding COG and link in the extracted rules, respectively. Darker edges indicate a closer profile similarity between the linked COGs.