Zhen Tian, Haichuan Fang, Yangdong Ye, Zhenfeng Zhu.
Abstract
BACKGROUND: With the establishment and development of Gene Ontology (GO) resources, numerous methods have been proposed to compute the functional similarity of genes, and they have achieved a series of successes in several research fields. The main idea of these methods is to calculate the information content (IC) of terms, which is essential for measuring the functional similarity of genes. However, most approaches have deficiencies, especially when measuring the IC of both GO terms and their corresponding annotated term sets. As a result, measuring the functional similarity of genes accurately remains challenging.
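For orientation, the IC idea mentioned in the abstract can be sketched with the common corpus-based definition, IC(t) = log(N / n_t), where N is the number of annotated genes and n_t is the number of genes annotated with term t. The sketch below also shows a simGIC-style set similarity (summed IC of shared terms divided by summed IC of all terms). It is a minimal illustration and not the method proposed in this paper. The gene names, term names, and annotation sets are hypothetical, and each set is assumed to already include ancestor terms.

```python
import math


def information_content(annotations):
    """Return IC(t) = log(N / n_t) for every term seen in the annotations.

    annotations: dict mapping a gene name to the set of terms annotating it
    (assumed here to already include all ancestor terms).
    """
    n_genes = len(annotations)
    counts = {}
    for terms in annotations.values():
        for term in terms:
            counts[term] = counts.get(term, 0) + 1
    return {term: math.log(n_genes / c) for term, c in counts.items()}


def sim_gic(terms_a, terms_b, ic):
    """simGIC-style similarity: IC mass of shared terms over IC mass of all terms."""
    union_ic = sum(ic[t] for t in terms_a | terms_b)
    if union_ic == 0:
        return 0.0  # only zero-IC terms (e.g. the root) are involved
    return sum(ic[t] for t in terms_a & terms_b) / union_ic


# Hypothetical toy data, for illustration only.
annotations = {
    "g1": {"root", "A", "B"},
    "g2": {"root", "A"},
    "g3": {"root", "C"},
    "g4": {"root"},
}
ic = information_content(annotations)
print(ic)
print(sim_gic(annotations["g1"], annotations["g2"], ic))
```

Under this definition a term that annotates every gene, such as the root, carries no information, while rarer terms receive larger IC values.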
Keywords: Gene functional similarity; Gene ontology; Information content; Specificity of terms and edges
Year: 2022 PMID: 35057740 PMCID: PMC8772239 DOI: 10.1186/s12859-022-04557-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Distribution of the number of terms by IC for the ontologies a BP, b CC, and c MF. The X-axis and Y-axis indicate the range of IC values and the number of terms, respectively
Fig. 2 Distribution of the number of terms by depth for the BP, CC and MF ontologies
Fig. 3 Detailed information on the phenylalanine degradation pathway
Fig. 4 Gene function classification results with respect to the MF ontology. a Resnik, b Wang, c VSM and d STE
AUC values in S. cerevisiae datasets with respect to ontology BP, CC and MF (IEA+ and IEA-)
| Methods | BP_IEA+ | CC_IEA+ | MF_IEA+ | BP_IEA- | CC_IEA- | MF_IEA- |
|---|---|---|---|---|---|---|
| STE | 0.7441 | 0.8343 | ||||
| simGIC | 0.8198 | 0.8223 | 0.8647 | 0.7023 | ||
| Resnik | 0.7888 | 0.8211 | 0.6987 | 0.7949 | 0.8043 | 0.6182 |
| WIS | 0.8184 | 0.8249 | 0.7371 | 0.8643 | 0.8122 | 0.7259 |
| simUI | 0.8095 | 0.8213 | 0.7253 | 0.8447 | 0.8004 | 0.7098 |
| VSM | 0.8115 | 0.8246 | 0.7294 | 0.8477 | 0.8033 | 0.7088 |
| Wang | 0.7932 | 0.8028 | 0.7110 | 0.8262 | 0.7948 | 0.6905 |
The best results are in bold
AUC values in H. sapiens datasets with respect to ontology BP, CC and MF (IEA+ and IEA-)
| Methods | BP_IEA+ | CC_IEA+ | MF_IEA+ | BP_IEA- | CC_IEA- | MF_IEA- |
|---|---|---|---|---|---|---|
| STE | 0.7504 | 0.7228 | 0.6839 | |||
| simGIC | 0.8381 | 0.7839 | 0.6730 | |||
| Resnik | 0.6696 | 0.6714 | 0.7033 | 0.7264 | 0.6638 | 0.6662 |
| WIS | 0.8049 | 0.6734 | 0.6637 | 0.7718 | 0.6604 | 0.6835 |
| simUI | 0.7921 | 0.6484 | 0.6208 | 0.7734 | 0.6586 | 0.6836 |
| VSM | 0.7896 | 0.6564 | 0.6297 | 0.7825 | 0.6675 | 0.6732 |
| Wang | 0.7334 | 0.6260 | 0.5824 | 0.7404 | 0.6466 | 0.6474 |
The best results are in bold
Pearson’s correlation coefficient with gene expression dataset with respect to ontology BP, CC and MF (IEA+ and IEA-)
| Methods | BP_IEA+ | CC_IEA+ | MF_IEA+ | BP_IEA- | CC_IEA- | MF_IEA- |
|---|---|---|---|---|---|---|
| STE | 0.4048 | 0.4197 | 0.2411 | 0.4403 | 0.5412 | |
| simGIC | 0.4212 | 0.1972 | ||||
| Resnik | 0.3135 | 0.2219 | 0.3818 | 0.5286 | 0.1439 | |
| WIS | 0.3993 | 0.4125 | 0.2457 | 0.4318 | 0.5162 | 0.1980 |
| simUI | 0.3799 | 0.4003 | 0.2241 | 0.4252 | 0.5151 | 0.1889 |
| VSM | 0.3416 | 0.3621 | 0.1941 | 0.4024 | 0.4999 | 0.1806 |
| Wang | 0.2160 | 0.2292 | 0.0695 | 0.3141 | 0.4046 | 0.0563 |
The best results are in bold
Fig. 5 DAG for the GO term organelle assembly (GO:0070925)
The IC values of corresponding terms in Fig. 5
| Term | ||||||||
|---|---|---|---|---|---|---|---|---|
| IC | 0.0 | 0.01 | 0.02 | 0.04 | 0.05 | 0.07 | 0.09 | 0.18 |
The weight values of corresponding edges in Fig. 5
| Edge | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Weight | 0.85 | 0.98 | 0.97 | 0.84 | 0.71 | 0.65 | 0.71 | 0.72 | 0.67 | 0.84 |
The computational process for measuring the IC of term set S
| Step | Term | |||
|---|---|---|---|---|
| 1 | 0.0 | 0 | 0.000 | |
| 2 | 0.01 | 0.010 | ||
| 3 | 0.02 | 0.030 | ||
| 4 | 0.04 | 0.044 | ||
| 5 | 0.05 | 0.080 | ||
| 6 | 0.07 | 0.124 | ||
| 7 | 0.09 | 0.150 | ||
| 8 | 0.18 | 0.208 |