Zhen Tian, Chunyu Wang, Maozu Guo, Xiaoyan Liu, Zhixia Teng.
Abstract
BACKGROUND: In recent years, many measures of gene functional similarity have been proposed and widely used in a broad range of research. These methods fall into two main categories: pairwise approaches and group-wise approaches. A common problem with all of them is running time, especially when functional similarity must be measured for a large number of gene pairs. Computational efficiency is an even greater concern for pairwise approaches, because they depend on combining the semantic similarities of term pairs. The efficient measurement of gene functional similarity therefore remains a challenging problem.
Keywords: Gene functional similarity; Gene ontology; Hash table
Year: 2016 PMID: 27814675 PMCID: PMC5096311 DOI: 10.1186/s12859-016-1294-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Key information affecting computational efficiency for each method
| Methods | Key information affecting computational efficiency |
|---|---|
| Resnik | LCA( |
| Lin | LCA( |
| Jiang | LCA( |
| Pekar | LCA( |
| Wang | SV( |
| simUI | A( |
| simGIC | A( |
For pairwise approaches, we focus on the semantic similarity between terms t1 and t2. For group-wise approaches, the functional similarity between genes g1 and g2 requires special attention.
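To make the group-wise case concrete, here is a minimal Python sketch of simUI and simGIC computed from a precomputed ancestor table. It assumes the commonly used definitions (simUI as the Jaccard index of the two genes' ancestor sets, simGIC as the same ratio weighted by IC). The names `ANCESTORS`, `IC`, `gene_ancestors`, the term IDs, and all numeric values are hypothetical and not taken from the paper.

```python
# Hypothetical precomputed hash table: GO term -> the term itself plus all its ancestors.
ANCESTORS = {
    "t1": frozenset({"t1", "p", "root"}),
    "t2": frozenset({"t2", "p", "root"}),
    "t3": frozenset({"t3", "root"}),
}

# Hypothetical information content (IC) per term; the values are made up.
IC = {"root": 0.0, "p": 1.0, "t1": 2.0, "t2": 2.0, "t3": 1.5}


def gene_ancestors(annotations, ancestors=ANCESTORS):
    """Union of the ancestor sets of every term annotating a gene."""
    result = set()
    for term in annotations:
        result |= ancestors[term]  # one hash lookup per term, no graph traversal
    return result


def sim_ui(g1_terms, g2_terms):
    """Jaccard index of the two genes' ancestor sets."""
    a1, a2 = gene_ancestors(g1_terms), gene_ancestors(g2_terms)
    union = a1 | a2
    return len(a1 & a2) / len(union) if union else 0.0


def sim_gic(g1_terms, g2_terms, ic=IC):
    """IC-weighted Jaccard index of the two genes' ancestor sets."""
    a1, a2 = gene_ancestors(g1_terms), gene_ancestors(g2_terms)
    denominator = sum(ic[t] for t in a1 | a2)
    if denominator == 0:
        return 0.0
    return sum(ic[t] for t in a1 & a2) / denominator


g1 = {"t1"}        # hypothetical annotations of gene g1
g2 = {"t2", "t3"}  # hypothetical annotations of gene g2
print(sim_ui(g1, g2), sim_gic(g1, g2))
```

Because the ancestor sets are looked up rather than recomputed, the cost per gene pair is dominated by set operations on the annotations.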
Fig. 1 The flowchart of the proposed strategy
Direct information and essential information for each method
| Method | Direct information | Essential information | Explanation |
|---|---|---|---|
| Resnik | IC(LCA( | IC( | The IC values of |
| Lin | IC( | ||
| Jiang | IC( | ||
| Pekar | Dep (LCA ( | Depth ( | The depth of |
| Wang | ∑( | | The S-values of |
| simUI | | | |
| simGIC | | | The IC values of |
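As an illustration of how the IC-based pairwise measures reduce to table lookups, the sketch below computes Resnik similarity, Lin similarity, and the Jiang distance from two precomputed hash tables. It assumes the standard textbook forms of these measures. The tables `IC` and `LCA`, the helper `lca`, the term IDs, and the numbers are all hypothetical.

```python
# Hypothetical precomputed hash tables; the term IDs and values are made up.
IC = {"root": 0.0, "a": 1.2, "b": 2.5, "c": 3.1}
LCA = {frozenset(("b", "c")): "a"}  # unordered term pair -> lowest common ancestor


def lca(t1, t2):
    """Look up the lowest common ancestor; a term is its own LCA."""
    return t1 if t1 == t2 else LCA[frozenset((t1, t2))]


def resnik(t1, t2):
    """Resnik similarity: the IC of the lowest common ancestor."""
    return IC[lca(t1, t2)]


def lin(t1, t2):
    """Lin similarity: 2 * IC(LCA) / (IC(t1) + IC(t2))."""
    denominator = IC[t1] + IC[t2]
    return 2.0 * IC[lca(t1, t2)] / denominator if denominator else 0.0


def jiang_distance(t1, t2):
    """Jiang distance: IC(t1) + IC(t2) - 2 * IC(LCA)."""
    return IC[t1] + IC[t2] - 2.0 * IC[lca(t1, t2)]


print(resnik("b", "c"), lin("b", "c"), jiang_distance("b", "c"))
```

All three functions share the same two lookups, which is why one IC table can serve Resnik, Lin, and Jiang alike. How the Jiang distance is converted into a similarity varies between tools, so the sketch stops at the distance.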
Fig. 2 The main idea of the proposed strategy as applied to the Wang method. (a) depicts the DAG for the GO term intracellular membrane-bound organelle (0043231). (b) depicts the hash table established from (a). Each row in (b) is called a record: its key is the ID of a GO term, and its value is a linked list containing all of the S-values of that key. Each term in (a) has a corresponding record in (b). The essential information can therefore be read directly from the hash table rather than from the DAG in (a). The proposed strategy converts the information stored in the GO graph into hash tables to speed up the calculation
S-values for GO terms in the DAG for intracellular membrane-bound organelle: 0043231
| GO terms | 0043231 | 0043229 | 0043227 | 0005622 | 0005623 | 0043226 | 0005575 |
|---|---|---|---|---|---|---|---|
| S-value | 1.0 | 0.8 | 0.8 | 0.48 | 0.288 | 0.64 | 0.512 |
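The S-values above can be reproduced with a short Python sketch that builds one hash-table record per term. It is only an illustration under stated assumptions, not the authors' algorithm from Fig. 3. The edge list `PARENTS` and the weights 0.8 for is_a and 0.6 for part_of are assumptions chosen to be consistent with the table. A dict stands in for the linked list described in Fig. 2. The function names are hypothetical.

```python
# Assumed semantic contribution factors per relation type.
WEIGHTS = {"is_a": 0.8, "part_of": 0.6}

# Assumed DAG edges (child -> [(parent, relation)]), chosen to match the table above.
PARENTS = {
    "0043231": [("0043229", "is_a"), ("0043227", "is_a")],
    "0043229": [("0043226", "is_a"), ("0005622", "part_of")],
    "0043227": [("0043226", "is_a")],
    "0005622": [("0005623", "part_of")],
    "0043226": [("0005575", "is_a")],
    "0005623": [("0005575", "is_a")],
    "0005575": [],
}


def s_values(term, parents=PARENTS, weights=WEIGHTS):
    """Return {term or ancestor: S-value} for the DAG of `term`."""
    s = {term: 1.0}  # a term contributes fully to itself
    stack = [term]
    while stack:
        node = stack.pop()
        for parent, relation in parents.get(node, []):
            candidate = s[node] * weights[relation]
            # Keep the maximum contribution over all paths to this ancestor.
            if candidate > s.get(parent, 0.0):
                s[parent] = candidate
                stack.append(parent)
    return s


def build_hash_table(parents=PARENTS):
    """One record per term: key = term ID, value = its S-values."""
    return {term: s_values(term, parents) for term in parents}


def wang_similarity(table, t1, t2):
    """Sum of S-values over shared ancestors divided by SV(t1) + SV(t2)."""
    a, b = table[t1], table[t2]
    shared = a.keys() & b.keys()
    numerator = sum(a[t] + b[t] for t in shared)
    return numerator / (sum(a.values()) + sum(b.values()))


table = build_hash_table()
print(table["0043231"])
print(wang_similarity(table, "0043229", "0043227"))
```

Once the table is built, a term-pair similarity needs only two record lookups and a set intersection, with no further traversal of the GO graph.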
Fig. 3 The algorithm for establishing a hash table from the GO graph for the Wang method
Time complexity for measuring gene functional similarity of each method
| Method | With the proposed strategy: step one | With the proposed strategy: step two | Without the proposed strategy |
|---|---|---|---|
| Resnik | O(n^3) | O(m*k^2*n*log n) | O(m*n^3*k^2*n*log n) |
| Jiang | O(n^3) | O(m*k^2*n*log n) | O(m*n^3*k^2*n*log n) |
| Lin | O(n^3) | O(m*k^2*n*log n) | O(m*n^3*k^2*n*log n) |
| Pekar | O(n^4) | O(m*k^2*n*log n) | O(m*n^4*k^2*n*log n) |
| Wang | O(n^4) | O(m*k^2*n*log n) | O(m*n^4*k^2*n*log n) |
| simUI | O(n^3) | O(m*k^2*n*log n) | O(m*n^3*k^2*n*log n) |
| simGIC | O(n^3) | O(m*k^2*n*log n) | O(m*n^3*k^2*n*log n) |
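One informal way to read the table, taking the Resnik row as an example: the strategy moves the cubic factor out of the per-query product and into a separate precomputation term. The worked comparison below assumes that step one is paid once, and that m, n and k are used as in the table (they are not defined in this excerpt). Dividing one bound by another is only a heuristic.

```latex
\begin{align*}
T_{\text{with}} &= \underbrace{O(n^{3})}_{\text{step one, paid once}}
                 + \underbrace{O(m\,k^{2}\,n\log n)}_{\text{step two}},\\
T_{\text{without}} &= O(m\,n^{3}\,k^{2}\,n\log n),\\
\frac{m\,n^{3}\,k^{2}\,n\log n}{m\,k^{2}\,n\log n} &= n^{3}.
\end{align*}
```

For Pekar and Wang the same reading applies with n^4 in place of n^3.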
Time in seconds required to establish the hash table for each method on BP, CC and MF ontologies
| Type | Resnik | Pekar | Wang | simUI | simGIC |
|---|---|---|---|---|---|
| BP | 441 | 876 | 181 | 84 | 562 |
| CC | 264 | 46 | 2.3 | 1.6 | 267 |
| MF | 379 | 179 | 4.9 | 4.7 | 384 |
Running time to measure semantic similarity between term pairs for three ontologies
| Method | # of term pairs | BP: # of related terms | BP: time (s) | MF: # of related terms | MF: time (s) | CC: # of related terms | CC: time (s) |
|---|---|---|---|---|---|---|---|
| Resnik | 10^6 | 27,864 | 5.9 | 9,943 | 2.5 | 3,817 | 4.4 |
| Jiang | 10^6 | 27,864 | 6.7 | 9,943 | 2.9 | 3,817 | 3.8 |
| Lin | 10^6 | 27,864 | 6.0 | 9,943 | 2.7 | 3,817 | 3.6 |
| Wang | 10^6 | 27,864 | 6.2 | 9,943 | 2.6 | 3,817 | 4.1 |
| Pekar | 10^6 | 27,864 | 5.8 | 9,943 | 2.6 | 3,817 | 3.7 |
Time in seconds to measure gene functional similarity of five organisms
| Organism | Type | # of annotated genes | Average # of annotations | # of gene pairs | Time Resnik | Time | Time | Time | Time | Time | Time |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Human | BP | 39337 | 4.35 | 7.74 × 10^8 | 36702 | 36487 | 36366 | 36427 | 36972 | 19088 | 19762 |
| | CC | 35975 | 2.56 | 6.47 × 10^8 | 5843 | 5815 | 5814 | 5810 | 5869 | 4921 | 5815 |
| | MF | 38404 | 2.32 | 7.37 × 10^8 | 323 | 326 | 327 | 321 | 326 | 4519 | 4828 |
| Arabidopsis | BP | 25532 | 3.24 | 3.26 × 10^8 | 6945 | 6957 | 5948 | 6956 | 6953 | 5238 | 5590 |
| | CC | 17683 | 2.00 | 1.56 × 10^8 | 812 | 808 | 841 | 816 | 816 | 962 | 989 |
| | MF | 20305 | 1.92 | 2.06 × 10^8 | 632 | 616 | 619 | 622 | 628 | 1048 | 1091 |
| Yeast | BP | 5906 | 3.20 | 1.74 × 10^7 | 578 | 580 | 586 | 583 | 579 | 353 | 383 |
| | CC | 5660 | 2.22 | 1.60 × 10^7 | 143 | 146 | 145 | 146 | 144 | 432 | 142 |
| | MF | 5902 | 2.30 | 1.74 × 10^7 | 66 | 64 | 66 | 65 | 64 | 94 | 100 |
| Rat | BP | 23319 | 4.62 | 2.65 × 10^8 | 13650 | 13690 | 13641 | 13714 | 13705 | 7219 | 7954 |
| | CC | 22217 | 2.60 | 2.47 × 10^8 | 2319 | 2314 | 2299 | 2411 | 2319 | 1911 | 2083 |
| | MF | 23065 | 2.49 | 2.66 × 10^8 | 1334 | 1315 | 1351 | 1339 | 1330 | 1750 | 1874 |
| Oryza | BP | 1909 | 1.44 | 1.82 × 10^6 | 8 | 8 | 7 | 8 | 8 | 13 | 13 |
| | CC | 39995 | 1.06 | 8.00 × 10^8 | 1262 | 1343 | 1307 | 1258 | 1254 | 2976 | 3115 |
| | MF | 2041 | 1.60 | 2.08 × 10^6 | 5 | 5 | 5 | 4 | 5 | 9 | 10 |
Running time in seconds for each tool to measure semantic similarity
| Tool | 10^2 term pairs | 10^4 term pairs | 10^6 term pairs |
|---|---|---|---|
| SGFSC | <1 | 2 | 9.4 |
| GFSAT | 68 | 6,387 | X |
| GOSemSim | 52 | 3,634 | X |
Running time in seconds for each tool to measure gene functional similarity
| Tool | 10^2 gene pairs | 10^4 gene pairs | 10^6 gene pairs |
|---|---|---|---|
| SGFSC | <1 | 29 | 768 |
| GFSAT | 163 | 36,514 | X |
| GOSemSim | 78 | 13,056 | X |
Fig. 4 Bar plots of running time in measuring semantic similarity between term pairs
Fig. 5 Bar plots of running time in measuring gene functional similarity using the Wang method on each organism
Fig. 6 Bar plots of running time for each method in measuring functional similarity on the human genomic scale
Fig. 7 Bar plots of running time on different datasets using SGFSC, GFSAT and GOSemSim