Ahmad Pesaranghader, Stan Matwin, Marina Sokolova, Jean-Christophe Grenier, Robert G Beiko, Julie Hussin.
Abstract
MOTIVATION: There is a plethora of measures to evaluate functional similarity (FS) of genes based on their co-expression, protein-protein interactions, and sequence similarity. These measures are typically derived from hand-engineered and application-specific metrics to quantify the degree of shared information between two genes using their Gene Ontology (GO) annotations.
Year: 2022 PMID: 35536192 PMCID: PMC9154256 DOI: 10.1093/bioinformatics/btac304
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.931
Experiments and datasets used for the evaluation of FS measures
| Experiment | Yeast: no. of gene pairs | Yeast: no. of genes | Human: no. of gene pairs | Human: no. of genes | Task |
|---|---|---|---|---|---|
| PPI | 50 154 | 4591 | 65 542 | 14 096 | PPI classification |
| Sequence homology | 26 757 | 3972 | 381 379 | 13 626 | Sequence similarity estimation |
| Gene expression | 37 405 | 2239 | 62 470 | 2361 | Level of co-expression prediction |
Sense similarity results for three BP terms over pretrained embeddings
| Query / rank | GO term ID | GO term name |
|---|---|---|
| Query 1 | | |
| 1 | GO:0072523 | Purine-containing compound catabolic process |
| 2 | GO:0072527 | Pyrimidine-containing compound metabolic process |
| 3 | GO:0072529 | Pyrimidine-containing compound catabolic process |
| 4 | GO:0052803 | Imidazole-containing compound metabolic process |
| 5 | GO:0046453 | Dipyrrin metabolic process |
| Query 2 | | |
| 1 | GO:0000398 | mRNA splicing, via spliceosome |
| 2 | GO:0048024 | Regulation of mRNA splicing, via spliceosome |
| 3 | GO:0000380 | Alternative mRNA splicing, via spliceosome |
| 4 | GO:0090615 | Mitochondrial mRNA processing |
| 5 | GO:0000395 | mRNA 5′-splice site recognition |
| Query 3 | | |
| 1 | GO:0001115 | Protein–DNA–RNA complex subunit organization |
| 2 | GO:0001117 | Protein–DNA–RNA complex disassembly |
| 3 | GO:0071165 | GINS complex assembly |
| 4 | GO:0071824 | Protein–DNA complex subunit organization |
| 5 | GO:0032986 | Protein–DNA complex disassembly |
PPI F1-score prediction of the yeast data (FS aggregation uses MAX)
| FS measure | ALL, incl. IEA (%) | BP, incl. IEA (%) | CC, incl. IEA (%) | MF, incl. IEA (%) | ALL, excl. IEA (%) | BP, excl. IEA (%) | CC, excl. IEA (%) | MF, excl. IEA (%) |
|---|---|---|---|---|---|---|---|---|
| Resnik | 87.29 | 85.65 | 81.57 | 74.06 | 86.91 | 83.28 | 79.96 | 72.00 |
| Lin | 78.75 | 85.53 | 79.12 | 73.37 | 81.24 | 82.68 | 77.44 | 73.47 |
| Jiang and Conrath | 78.75 | 84.77 | 79.06 | 72.26 | 80.79 | 81.27 | 76.65 | 74.11 |
| GraSM | 87.55 | 85.33 | 81.35 | 74.16 | 86.83 | 83.26 | 80.08 | 72.16 |
| AIC | 78.39 | 85.71 | 79.13 | 72.99 | 81.18 | 82.40 | 77.70 | 73.73 |
| clusteredGO | 78.98 | 84.70 | 78.93 | 72.68 | 80.92 | 81.13 | 76.59 | 74.59 |
| simGIC | 68.22 | 63.31 | 61.56 | 59.27 | 67.84 | 62.52 | 61.22 | 58.62 |
| simDEF | 88.56 | 86.74 | 82.67 | 75.42 | 88.38 | 84.45 | 81.43 | 74.31 |
| AicInferSentGO | 88.61 | 86.71 | 82.75 | 75.47 | 88.31 | 84.38 | 81.28 | 74.36 |
| deepSimDEF (random emb.) | 90.05 | 88.88 | 88.08 | 84.77 | 90.07 | 86.71 | 86.45 | 83.57 |
| deepSimDEF (LSA emb.) | 92.78 | 91.57 | 89.58 | 87.64 | 92.99 | 91.68 | 89.35 | 86.69 |
PPI F1-score prediction of the human data (FS aggregation uses MAX)
| FS measure | ALL, incl. IEA (%) | BP, incl. IEA (%) | CC, incl. IEA (%) | MF, incl. IEA (%) | ALL, excl. IEA (%) | BP, excl. IEA (%) | CC, excl. IEA (%) | MF, excl. IEA (%) |
|---|---|---|---|---|---|---|---|---|
| Resnik | 88.02 | 86.59 | 81.70 | 75.14 | 87.96 | 84.33 | 80.60 | 73.31 |
| Lin | 79.02 | 85.69 | 79.62 | 73.88 | 81.88 | 82.87 | 77.58 | 73.57 |
| Jiang and Conrath | 79.48 | 85.47 | 79.72 | 72.82 | 81.35 | 81.45 | 77.57 | 74.81 |
| GraSM | 87.97 | 86.58 | 81.83 | 75.12 | 87.59 | 83.53 | 80.38 | 73.02 |
| AIC | 79.78 | 86.05 | 79.37 | 74.10 | 81.52 | 83.50 | 77.66 | 73.69 |
| clusteredGO | 79.15 | 84.94 | 80.05 | 72.53 | 81.03 | 82.21 | 76.96 | 74.45 |
| simGIC | 69.33 | 64.19 | 62.34 | 60.66 | 69.16 | 63.56 | 62.47 | 59.32 |
| simDEF | 88.74 | 87.12 | 82.72 | 76.19 | 88.53 | 85.12 | 81.24 | 74.21 |
| AicInferSentGO | 88.83 | 87.14 | 82.04 | 75.96 | 88.31 | 84.54 | 81.04 | 74.45 |
| deepSimDEF (random emb.) | 90.69 | 87.63 | 86.71 | 85.13 | 89.91 | 87.12 | 86.51 | 84.54 |
| deepSimDEF (LSA emb.) | 93.68 | 90.60 | 89.12 | 87.80 | 93.12 | 90.19 | 88.26 | 87.38 |
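The F1 scores in the two PPI tables summarize a binary classification of gene pairs as interacting or non-interacting. As a minimal sketch of how such a score can be derived from FS values, the snippet below assumes that a pair is predicted positive when its FS score reaches a fixed threshold. The threshold, the function name `f1_from_scores`, and the toy data are illustrative assumptions and do not reproduce the authors' evaluation protocol.

```python
def f1_from_scores(scores, labels, threshold=0.5):
    """F1 of the positive class; a pair is predicted positive if score >= threshold."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical FS scores for six gene pairs with made-up interaction labels.
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
print(f"F1 = {f1_from_scores(toy_scores, toy_labels):.4f}")
```

With these toy values there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3.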
Spearman correlation of FS measures versus yeast sequence homology
| FS measure | Aggregation | LRBS: ALL | LRBS: BP | LRBS: CC | LRBS: MF | RRBS: ALL | RRBS: BP | RRBS: CC | RRBS: MF |
|---|---|---|---|---|---|---|---|---|---|
| Resnik | MAX | 0.7089 | 0.7269 | 0.5337 | 0.4743 | 0.6088 | 0.6378 | 0.5132 | 0.3514 |
| Resnik | BMA | 0.6066 | 0.5862 | 0.4771 | 0.5193 | 0.5236 | 0.5312 | 0.4752 | 0.4278 |
| Lin | MAX | 0.3831 | 0.6463 | 0.3763 | 0.6026 | 0.2512 | 0.4900 | 0.2892 | 0.4085 |
| Lin | BMA | 0.5952 | 0.5756 | 0.4490 | 0.5866 | 0.4862 | 0.4919 | 0.4048 | 0.4478 |
| Jiang and Conrath | MAX | 0.3500 | 0.6504 | 0.2997 | 0.4975 | 0.1814 | 0.5030 | 0.2325 | 0.2845 |
| Jiang and Conrath | BMA | 0.6190 | 0.6298 | 0.4733 | 0.5595 | 0.4958 | 0.5317 | 0.4126 | 0.3978 |
| GraSM | MAX | 0.3465 | 0.6584 | 0.2978 | 0.4895 | 0.1799 | 0.5002 | 0.2231 | 0.2911 |
| GraSM | BMA | 0.6277 | 0.6258 | 0.4659 | 0.5651 | 0.4990 | 0.5240 | 0.4154 | 0.3944 |
| AIC | MAX | 0.3434 | 0.6423 | 0.3094 | 0.5044 | 0.1873 | 0.5099 | 0.2275 | 0.2927 |
| AIC | BMA | 0.6197 | 0.6215 | 0.4727 | 0.5694 | 0.5028 | 0.5348 | 0.4047 | 0.3920 |
| clusteredGO | MAX | 0.3591 | 0.6449 | 0.3027 | 0.4970 | 0.1720 | 0.5061 | 0.2277 | 0.2865 |
| clusteredGO | BMA | 0.6198 | 0.6282 | 0.4784 | 0.5504 | 0.4998 | 0.5237 | 0.4081 | 0.3917 |
| simGIC | n/a | 0.3140 | 0.6036 | 0.2519 | 0.4404 | 0.1237 | 0.4643 | 0.1703 | 0.2296 |
| simDEF | MAX | 0.4505 | 0.7339 | 0.4082 | 0.5946 | 0.2770 | 0.6126 | 0.3396 | 0.3845 |
| simDEF | BMA | 0.7252 | 0.7308 | 0.5661 | 0.6637 | 0.5974 | 0.6418 | 0.5079 | 0.4880 |
| AicInferSentGO | MAX | 0.4499 | 0.7314 | 0.4073 | 0.6018 | 0.2886 | 0.5996 | 0.3289 | 0.3841 |
| AicInferSentGO | BMA | 0.7252 | 0.7354 | 0.5775 | 0.6681 | 0.6049 | 0.6412 | 0.5220 | 0.5076 |
| deepSimDEF (random emb.) | n/a | 0.7590 | 0.6600 | 0.5918 | 0.7102 | 0.6813 | 0.6050 | 0.5438 | 0.6846 |
| deepSimDEF (LSA emb.) | n/a | 0.8078 | 0.7532 | 0.6077 | 0.7844 | 0.7255 | 0.6498 | 0.5409 | 0.6943 |
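The MAX and BMA labels in the table refer to two ways of aggregating term-pair similarities into a single gene-pair score. The sketch below illustrates both on a toy similarity matrix whose rows are the GO terms of one gene and whose columns are the GO terms of the other. It assumes one common definition of BMA (best-match average): the mean of the row-wise best matches averaged with the mean of the column-wise best matches. Other variants exist, and the function names and numbers are illustrative, not taken from the paper.

```python
def aggregate_max(sim):
    """MAX: the single highest term-pair similarity between the two genes."""
    return max(max(row) for row in sim)


def aggregate_bma(sim):
    """BMA (assumed variant): average of the mean best match in each direction."""
    row_best = [max(row) for row in sim]        # best match for each term of gene A
    col_best = [max(col) for col in zip(*sim)]  # best match for each term of gene B
    mean_rows = sum(row_best) / len(row_best)
    mean_cols = sum(col_best) / len(col_best)
    return 0.5 * (mean_rows + mean_cols)


# Hypothetical term-pair similarities: gene A has 2 terms, gene B has 3 terms.
toy_sim = [
    [0.9, 0.2, 0.1],
    [0.3, 0.6, 0.4],
]
print(f"MAX = {aggregate_max(toy_sim):.4f}")
print(f"BMA = {aggregate_bma(toy_sim):.4f}")
```

MAX rewards a single strongly related term pair, while BMA also accounts for terms that have no close counterpart in the other gene, which is one plausible reason the two aggregations can rank gene pairs differently.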
Spearman correlation of FS measures versus human sequence homology
| FS measure | Aggregation | LRBS: ALL | LRBS: BP | LRBS: CC | LRBS: MF | RRBS: ALL | RRBS: BP | RRBS: CC | RRBS: MF |
|---|---|---|---|---|---|---|---|---|---|
| Resnik | MAX | 0.4941 | 0.4768 | 0.2531 | 0.5834 | 0.4530 | 0.4605 | 0.2962 | 0.5222 |
| Resnik | BMA | 0.5095 | 0.4606 | 0.3288 | 0.5262 | 0.5196 | 0.4525 | 0.4212 | 0.5332 |
| Lin | MAX | 0.3087 | 0.5149 | 0.3500 | 0.3231 | 0.2820 | 0.5065 | 0.3374 | 0.2789 |
| Lin | BMA | 0.5052 | 0.5081 | 0.3970 | 0.3777 | 0.5278 | 0.5035 | 0.4618 | 0.4194 |
| Jiang and Conrath | MAX | 0.2933 | 0.4981 | 0.2884 | 0.3734 | 0.2153 | 0.4865 | 0.2531 | 0.2730 |
| Jiang and Conrath | BMA | 0.4847 | 0.5418 | 0.3995 | 0.3714 | 0.5280 | 0.5492 | 0.4506 | 0.3894 |
| GraSM | MAX | 0.2841 | 0.3787 | 0.2909 | 0.5071 | 0.2148 | 0.2713 | 0.2617 | 0.4876 |
| GraSM | BMA | 0.4884 | 0.3636 | 0.3907 | 0.5449 | 0.5311 | 0.3992 | 0.4517 | 0.5403 |
| AIC | MAX | 0.2931 | 0.3650 | 0.2797 | 0.4952 | 0.2146 | 0.2655 | 0.2449 | 0.4941 |
| AIC | BMA | 0.4875 | 0.3737 | 0.4089 | 0.5514 | 0.5247 | 0.3923 | 0.4483 | 0.5563 |
| clusteredGO | MAX | 0.2944 | 0.3830 | 0.2788 | 0.4927 | 0.2101 | 0.2735 | 0.2589 | 0.4840 |
| clusteredGO | BMA | 0.4918 | 0.3731 | 0.3960 | 0.5330 | 0.5271 | 0.3801 | 0.4575 | 0.5561 |
| simGIC | n/a | 0.2356 | 0.3296 | 0.2307 | 0.4472 | 0.1530 | 0.2279 | 0.2013 | 0.4281 |
| simDEF | MAX | 0.3505 | 0.4383 | 0.3578 | 0.5528 | 0.2850 | 0.3374 | 0.3129 | 0.5412 |
| simDEF | BMA | 0.5541 | 0.4335 | 0.4320 | 0.6011 | 0.5823 | 0.4446 | 0.4956 | 0.5944 |
| AicInferSentGO | MAX | 0.3574 | 0.4387 | 0.3428 | 0.5515 | 0.2723 | 0.3288 | 0.3138 | 0.5430 |
| AicInferSentGO | BMA | 0.5440 | 0.4250 | 0.4383 | 0.6011 | 0.5897 | 0.4522 | 0.4956 | 0.5927 |
| deepSimDEF (random emb.) | n/a | 0.6437 | 0.5241 | 0.4232 | 0.6268 | 0.6425 | 0.5222 | 0.4986 | 0.6346 |
| deepSimDEF (LSA emb.) | n/a | 0.6723 | 0.5300 | 0.4480 | 0.6623 | 0.6514 | 0.5306 | 0.5126 | 0.6432 |
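The values in the two sequence-homology tables are Spearman rank correlations between FS scores and sequence-similarity scores (the LRBS and RRBS columns). As a rough illustration of the statistic itself, the following standard-library sketch computes Spearman's rho as the Pearson correlation of average ranks. The helper names and the toy numbers are hypothetical and carry no relation to the reported results.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical FS scores and sequence-similarity scores for five gene pairs.
toy_fs = [0.10, 0.40, 0.35, 0.80, 0.90]
toy_seq = [10.0, 30.0, 40.0, 70.0, 90.0]
print(f"Spearman rho = {spearman(toy_fs, toy_seq):.4f}")
```

Because only ranks matter, any monotonic rescaling of either score leaves the correlation unchanged.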
Fig. 1. Pearson’s correlation results for the prediction of gene–gene co-expressions in yeast data
Fig. 2. Pearson’s correlation results for the prediction of gene–gene co-expressions in human data
Fig. 3. Definition-based embedding model of the Gene Ontology terms
Fig. 4. Paired single-channel deepSimDEF network architecture for BP
Fig. 5. Paired multi-channel deepSimDEF network architecture