Norberto Díaz-Díaz, Jesús S Aguilar-Ruiz.
Abstract
BACKGROUND: The Gene Ontology (GO) provides a controlled vocabulary for describing the functions of genes and can be used to evaluate the functional coherence of gene sets. Many functional coherence measures consider each pair of gene functions in a set and produce an output based on all pairwise distances. A single gene can encode multiple proteins that may differ in function. For each functionality, other proteins that exhibit the same activity may also participate. Therefore, identifying the most common function for all of the genes involved in a biological process is important in evaluating the functional similarity of groups of genes, and quantifying functional coherence can help to clarify the role of a group of genes working together.
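The pairwise style of measure mentioned in the background can be illustrated with a minimal sketch. This is not the GFD measure or any of the published approaches named in this record. It only shows the general idea of averaging a distance over every pair of genes in a set. The Jaccard distance between annotation sets, the function names, and the toy gene and term labels are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Assumed distance between two annotation sets: 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0  # two empty annotation sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def mean_pairwise_distance(annotations, distance=jaccard_distance):
    """Average the distance over every unordered pair of genes in the set."""
    pairs = list(combinations(annotations.values(), 2))
    if not pairs:
        return 0.0  # fewer than two genes: no pair to compare
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy data: gene names and term labels are placeholders,
# not real GO annotations.
toy_annotations = {
    "geneA": {"T1", "T2"},
    "geneB": {"T1", "T2"},
    "geneC": {"T3"},
}

score = mean_pairwise_distance(toy_annotations)
print(round(score, 3))
```

A lower score means the genes share more annotations under this assumed distance. Because every pair is visited, the cost grows quadratically with the number of genes in the set.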
Year: 2011 PMID: 21884611 PMCID: PMC3248071 DOI: 10.1186/1471-2105-12-360
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Overall scheme of the method. Overall scheme of the method used to calculate the gene-set functional dissimilarity measure GFD. The first three steps are used for all three ontologies. The last two steps are only illustrated for the Biological Process ontology.
Figure 2. ROC analysis. ROC analysis for the GFD, GS2, Wang and Resnik approaches, as applied to each of the three GO ontologies. The area under the ROC curve is indicated in brackets.
Computational analysis.
| Pathway | Genes | Annotations | Representations | Combinations ( | Time (sec) |
|---|---|---|---|---|---|
| sce01100 | 645 | 2544 | 72354 | 1131.73 | 1126 |
| | | 2046 | 23578 | 727.49 | 645 |
| sce01110 | 235 | 1005 | 26745 | 430.06 | 114 |
| | | 716 | 10502 | 277.62 | 108 |
| sce03008 | 157 | 312 | 4090 | 208.11 | 3 |
| | | 557 | 7883 | 189.89 | 41 |
| sce04113 | 127 | 884 | 8850 | 204.08 | 19 |
| | | 450 | 4917 | 152.66 | 33 |
| sce04111 | 125 | 909 | 10410 | 216.56 | 20 |
| | | 461 | 5801 | 149.46 | 29 |
The five Saccharomyces cerevisiae pathways with the highest number of known genes. For each set of genes (each pathway) the upper row represents real data and the lower row illustrates the pseudorandomly generated data.