Guoxian Yu, Wei Luo, Guangyuan Fu, Jun Wang.
Abstract
BACKGROUND: Gene Ontology (GO) is a collaborative project that maintains and develops a controlled vocabulary (terms) to describe the molecular functions, biological roles and cellular locations of gene products in a hierarchical ontology. GO also provides annotations that associate genes with GO terms. Members of the GO consortium independently and collaboratively annotate gene products with terms, mainly for the model organisms (species) they are interested in. Owing to experimental ethics, the research interests of biologists and resource limitations, homologous genes from different species are currently annotated with different terms. These differences can be attributed more to incomplete annotations of genes than to functional differences between them.
Keywords: GO annotations; Gene function prediction; Interspecies; Semantic similarity
Year: 2016 PMID: 28155711 PMCID: PMC5260010 DOI: 10.1186/s12918-016-0361-5
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Fig. 1 GO annotations of a human gene (a) and a mouse gene (b). GO terms in white ellipses are the currently available annotations of the gene, and terms in gray ellipses are the missing annotations. The human gene should also be annotated with ‘GO:f’, and the mouse gene is missing the annotations ‘GO:e’ and ‘GO:g’. The annotations of the two genes differ but complement each other
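The complementarity described in the Fig. 1 caption can be illustrated with plain set operations. This is only a sketch under stated assumptions: the caption names just ‘GO:e’, ‘GO:f’ and ‘GO:g’, so the shared term ‘GO:c’ and the helper `candidate_terms` are hypothetical, and the snippet is not the prediction method evaluated in the tables.

```python
# Hypothetical current annotations of two homologous genes (after Fig. 1).
# 'GO:c' is an invented shared term; only GO:e, GO:f and GO:g appear in the caption.
human_gene_a = {"GO:c", "GO:e", "GO:g"}
mouse_gene_b = {"GO:c", "GO:f"}


def candidate_terms(target, homolog):
    """Return terms the homolog carries that the target gene currently lacks."""
    return homolog - target


human_candidates = candidate_terms(human_gene_a, mouse_gene_b)
mouse_candidates = candidate_terms(mouse_gene_b, human_gene_a)

print("candidates for the human gene:", sorted(human_candidates))
print("candidates for the mouse gene:", sorted(mouse_candidates))
```

Each gene's candidate set is exactly what the other gene contributes, which is the sense in which the two annotation sets are complementary.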
Statistics of GO annotations of genes from four species
| Species | ♯genes | CC history | CC recent | CC ♯terms | MF history | MF recent | MF ♯terms | BP history | BP recent | BP ♯terms |
|---|---|---|---|---|---|---|---|---|---|---|
| Human | 19158 | 231057 | 298776 | 1160 | 115066 | 153722 | 1999 | 745989 | 920663 | 8696 |
| Mouse | 21357 | 164539 | 291488 | 1193 | 95511 | 158471 | 2033 | 622591 | 933356 | 9304 |
| Danio rerio | 18776 | 27434 | 77539 | 627 | 24652 | 59710 | 1174 | 172557 | 301163 | 4382 |
| Arabidopsis thaliana | 24532 | 114777 | 150144 | 528 | 42720 | 64737 | 1189 | 197277 | 281562 | 3305 |
♯genes is the number of genes in the recent GOA file (archived date: 2016-01-04), ♯terms is the number of involved terms. ‘history’ is the number of GO annotations of genes from historical GOA file (archived date: 2014-01-20), ‘recent’ is the number of GO annotations of genes from recent GOA files
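To make the gap between the two archives concrete, the relative growth in annotation counts can be computed directly from the table. The sketch below copies the ‘history’ and ‘recent’ counts from the rows above; reading each sub-ontology's three values as history, recent, ♯terms is an assumption based on the row layout, and `relative_growth` is a hypothetical helper name.

```python
# (history, recent) annotation counts per sub-ontology, copied from the table.
# Column assignment (history, recent, #terms per sub-ontology) is an assumption.
counts = {
    "Human": {"CC": (231057, 298776), "MF": (115066, 153722), "BP": (745989, 920663)},
    "Mouse": {"CC": (164539, 291488), "MF": (95511, 158471), "BP": (622591, 933356)},
    "Danio rerio": {"CC": (27434, 77539), "MF": (24652, 59710), "BP": (172557, 301163)},
    "Arabidopsis thaliana": {"CC": (114777, 150144), "MF": (42720, 64737), "BP": (197277, 281562)},
}


def relative_growth(history, recent):
    """Fractional increase in annotations between the two archived files."""
    return (recent - history) / history


growth = {
    species: {ont: relative_growth(*pair) for ont, pair in onts.items()}
    for species, onts in counts.items()
}

for species, onts in growth.items():
    summary = ", ".join(f"{ont}: {value:.1%}" for ont, value in onts.items())
    print(f"{species}: {summary}")
```

Every species gains annotations in every sub-ontology between the two archive dates, which is consistent with the abstract's point that current annotations are incomplete.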
Prediction on archived GOA files using BMA (see Eq. (4))
| | | MicroAvgF1 | MacroAvgF1 | 1-RankLoss | Fmax | RAccuracy |
|---|---|---|---|---|---|---|
| CC | H →H | 0.8328 | 0.7203 | 0.8780 | 0.8747 | 0.1542 |
| | M →H | 0.8368 | 0.7125 | 0.8751 | 0.8750 | 0.1602 |
| | M+H →H | | | | | |
| | D →H | 0.8316 | 0.7197 | 0.8729 | 0.8738 | 0.1536 |
| | D+H →H | 0.8524 | 0.7236 | 0.9068 | 0.8625 | 0.2530 |
| | A →H | 0.8259 | 0.7068 | 0.8632 | 0.8532 | 0.1369 |
| | A+H →H | 0.8363 | 0.7222 | 0.8797 | 0.8773 | 0.1717 |
| | M →M | 0.7712 | 0.6084 | 0.8588 | 0.8571 | 0.1936 |
| | H →M | 0.7676 | 0.6192 | 0.8580 | 0.8339 | 0.1864 |
| | H+M →M | | | | | |
| | D →M | 0.7718 | 0.6105 | 0.8416 | 0.8082 | 0.1868 |
| | D+M →M | 0.8003 | 0.6160 | 0.8926 | 0.8315 | 0.2963 |
| | A →M | 0.7660 | 0.6341 | 0.8444 | 0.8404 | 0.1761 |
| | A+M →M | 0.7713 | 0.6143 | 0.8606 | 0.8523 | 0.1942 |
| MF | H →H | 0.8523 | 0.8179 | 0.9192 | 0.8915 | 0.1416 |
| | M →H | 0.8513 | 0.8170 | 0.9145 | 0.8905 | 0.1311 |
| | M+H →H | | | | | |
| | D →H | 0.8502 | 0.8174 | 0.9123 | 0.8909 | 0.1295 |
| | D+H →H | 0.8668 | 0.8355 | 0.9523 | 0.8742 | 0.2259 |
| | A →H | 0.8416 | 0.8207 | 0.8964 | 0.8968 | 0.0793 |
| | A+H →H | 0.8490 | 0.8151 | 0.9116 | 0.8894 | 0.1227 |
| | M →M | 0.7654 | 0.6849 | 0.8755 | 0.8656 | 0.1344 |
| | H →M | 0.7601 | 0.6821 | 0.8797 | 0.8545 | 0.1396 |
| | H+M →M | 0.7784 | | | | 0.1801 |
| | D →M | 0.7607 | 0.6891 | 0.8592 | 0.8369 | 0.1287 |
| | D+M →M | | 0.7072 | 0.9200 | 0.8580 | |
| | A →M | 0.7534 | 0.6880 | 0.8553 | 0.8607 | 0.0876 |
| | A+M →M | 0.7639 | 0.6716 | 0.8712 | 0.8551 | 0.1264 |
| BP | H →H | 0.8373 | 0.7979 | 0.9507 | 0.8012 | 0.2044 |
| | M →H | 0.8346 | 0.7943 | 0.9489 | 0.7981 | 0.1912 |
| | M+H →H | | | | | |
| | D →H | 0.8368 | 0.8027 | 0.9568 | 0.8031 | 0.2020 |
| | D+H →H | 0.8368 | 0.7978 | 0.9496 | 0.8093 | 0.2018 |
| | A →H | 0.8290 | 0.7903 | 0.9239 | 0.7799 | 0.1641 |
| | A+H →H | 0.8325 | 0.7839 | 0.9308 | 0.7944 | 0.1809 |
| | M →M | 0.7812 | 0.6965 | 0.9350 | 0.7905 | 0.1855 |
| | H →M | 0.7842 | 0.6987 | 0.9401 | 0.7863 | 0.1965 |
| | H+M →M | | | | | |
| | D →M | 0.7816 | 0.7036 | 0.9348 | 0.7885 | 0.1867 |
| | D+M →M | 0.7830 | 0.7108 | 0.9423 | 0.7875 | 0.1923 |
| | A →M | 0.7768 | 0.6929 | 0.9027 | 0.7594 | 0.1692 |
| | A+M →M | 0.7779 | 0.6871 | 0.9183 | 0.7807 | 0.1733 |
H →H directly uses GO annotations of Human genes to predict annotations of Human genes. M →H employs only annotations of Mouse genes to predict annotations of Human genes. M+H →H uses GO annotations of genes from Mouse and Human to predict annotations of Human genes. D+H →H uses annotations of genes from Danio rerio and Human to predict annotations of Human genes. A+H →H uses annotations of genes from Arabidopsis thaliana and Human to predict annotations of Human genes. M →M, H+M →M, D+M →M and A+M →M follow a similar protocol but predict annotations of Mouse genes. Values in boldface are the statistically significant best among the compared methods for a particular target species; significance is assessed by a paired t-test at the 95% confidence level
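The tables report both MicroAvgF1 and MacroAvgF1, and the two can diverge widely. The sketch below uses the standard textbook definitions (micro pools counts over all terms, macro averages per-term F1); this excerpt does not give the paper's own formulas, so treat the definitions, the toy matrices and the convention of scoring an empty term as 0 as assumptions.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when there is nothing to score (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """y_true and y_pred are gene-by-term 0/1 matrices given as lists of rows."""
    n_terms = len(y_true[0])
    per_term = []
    for j in range(n_terms):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        per_term.append((tp, fp, fn))
    # Micro: pool the counts over all terms, then compute a single F1.
    micro = f1(*(sum(col) for col in zip(*per_term)))
    # Macro: compute F1 per term, then take the unweighted mean.
    macro = sum(f1(*counts) for counts in per_term) / n_terms
    return micro, macro


# Hypothetical toy data: 3 genes (rows) by 3 GO terms (columns).
y_true = [[1, 0, 1], [1, 1, 0], [1, 0, 0]]
y_pred = [[1, 0, 0], [1, 1, 0], [1, 1, 0]]

micro, macro = micro_macro_f1(y_true, y_pred)
print(f"MicroAvgF1 = {micro:.4f}, MacroAvgF1 = {macro:.4f}")
```

In the toy data a frequent term is predicted perfectly while a rare term is missed entirely, so the macro average falls well below the micro average. The same mechanism is one possible reason macro scores sit below micro scores in the tables.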
Prediction on archived GOA files using TO (see Eq. (5))
| | | MicroAvgF1 | MacroAvgF1 | 1-RankLoss | Fmax | RAccuracy |
|---|---|---|---|---|---|---|
| CC | H →H | 0.8374 | 0.7212 | 0.8968 | 0.8729 | 0.1773 |
| | M →H | 0.8351 | 0.7241 | 0.8969 | 0.8743 | 0.1762 |
| | M+H →H | | | | | |
| | D →H | 0.8351 | 0.7322 | 0.8941 | 0.8693 | 0.1662 |
| | D+H →H | 0.8512 | 0.7476 | 0.9422 | 0.8654 | 0.2469 |
| | A →H | 0.8317 | 0.6982 | 0.8832 | 0.8860 | 0.1488 |
| | A+H →H | 0.8366 | 0.7223 | 0.8962 | 0.8726 | 0.1732 |
| | M →M | 0.7765 | 0.6075 | 0.8826 | 0.8526 | 0.2122 |
| | H →M | 0.7805 | 0.6130 | 0.8836 | 0.8295 | 0.2166 |
| | H+M →M | | | | | |
| | D →M | 0.7726 | 0.6142 | 0.8659 | 0.8320 | 0.2092 |
| | D+M →M | 0.7993 | 0.6357 | 0.9252 | 0.8384 | 0.2928 |
| | A →M | 0.7758 | 0.6278 | 0.8700 | 0.8324 | 0.2105 |
| | A+M →M | 0.7770 | 0.6088 | 0.8807 | 0.8447 | 0.2142 |
| MF | H →H | 0.8569 | 0.8228 | 0.9293 | 0.8952 | 0.1687 |
| | M →H | 0.8542 | 0.8213 | 0.9262 | 0.8941 | 0.1527 |
| | M+H →H | | | | | |
| | D →H | 0.8524 | 0.8348 | 0.9413 | 0.8717 | 0.1426 |
| | D+H →H | 0.8606 | 0.8349 | 0.9588 | 0.8979 | 0.1901 |
| | A →H | 0.8456 | 0.8225 | 0.9124 | 0.8941 | 0.1026 |
| | A+H →H | 0.8535 | 0.8181 | 0.9260 | 0.8933 | 0.1489 |
| | M →M | 0.7756 | 0.6946 | 0.8985 | 0.8692 | 0.1697 |
| | H →M | 0.7804 | 0.6957 | 0.9096 | 0.8569 | 0.1677 |
| | H+M →M | | | | | |
| | D →M | 0.7695 | 0.6811 | 0.8963 | 0.8602 | 0.1538 |
| | D+M →M | | 0.7082 | 0.9356 | 0.8731 | |
| | A →M | 0.7635 | 0.6941 | 0.8816 | 0.8588 | 0.1249 |
| | A+M →M | 0.7752 | 0.6840 | 0.8993 | 0.8616 | 0.1683 |
| BP | H →H | 0.8460 | 0.8019 | 0.9605 | 0.8729 | 0.2472 |
| | M →H | 0.8428 | 0.7998 | 0.9586 | 0.7818 | 0.2316 |
| | M+H →H | | | | | |
| | D →H | 0.8385 | 0.8036 | 0.9605 | 0.7901 | 0.2101 |
| | D+H →H | 0.8443 | 0.8016 | 0.9613 | 0.7877 | 0.2387 |
| | A →H | 0.8314 | 0.7943 | 0.9333 | 0.7591 | 0.1755 |
| | A+H →H | 0.8389 | 0.7933 | 0.9520 | 0.7813 | 0.2120 |
| | M →M | 0.7960 | 0.7101 | 0.9519 | 0.7813 | 0.2405 |
| | H →M | 0.7980 | 0.7073 | 0.9532 | 0.7767 | 0.2481 |
| | H+M →M | | | | | |
| | D →M | 0.7886 | 0.7137 | 0.9508 | 0.7756 | 0.2129 |
| | D+M →M | 0.7832 | 0.7059 | 0.9410 | 0.7765 | 0.2318 |
| | A →M | 0.7795 | 0.7020 | 0.9158 | 0.7373 | 0.1791 |
| | A+M →M | 0.7723 | 0.6923 | 0.9326 | 0.7716 | 0.2106 |
H →H directly uses GO annotations of Human genes to predict annotations of Human genes. M →H employs only annotations of Mouse genes to predict annotations of Human genes. M+H →H uses GO annotations of genes from Mouse and Human to predict annotations of Human genes. D+H →H uses annotations of genes from Danio rerio and Human to predict annotations of Human genes. A+H →H uses annotations of genes from Arabidopsis thaliana and Human to predict annotations of Human genes. M →M, H+M →M, D+M →M and A+M →M follow a similar protocol but predict annotations of Mouse genes. Values in boldface are the statistically significant best among the compared methods for a particular target species; significance is assessed by a paired t-test at the 95% confidence level
Prediction on archived GOA files using BMA (see Eq. (6)) by combining the GO annotations in CC, MF and BP together and then evaluating in each sub-ontology
| | | MicroAvgF1 | MacroAvgF1 | 1-RankLoss | Fmax | RAccuracy |
|---|---|---|---|---|---|---|
| CC | H →H | 0.8700 | 0.4416 | 0.9682 | 0.8619 | 0.2057 |
| | M →H | 0.8550 | 0.4407 | 0.9310 | 0.8551 | 0.1963 |
| | M+H →H | | | | | |
| | D →H | 0.8543 | 0.4372 | 0.9387 | 0.8610 | 0.1626 |
| | D+H →H | 0.8666 | 0.4412 | 0.9652 | 0.8773 | 0.1852 |
| | A →H | 0.8424 | 0.4388 | 0.8862 | 0.8595 | 0.1428 |
| | A+H →H | 0.8673 | 0.4358 | 0.9518 | 0.8761 | 0.1895 |
| | M →M | 0.8193 | 0.4430 | 0.9487 | 0.8481 | 0.1556 |
| | H →M | 0.8155 | 0.4416 | 0.9475 | 0.8514 | 0.1582 |
| | H+M →M | | | | | |
| | D →M | 0.8085 | 0.4433 | 0.9289 | 0.8490 | 0.1446 |
| | D+M →M | 0.8170 | 0.4474 | 0.9460 | 0.8560 | 0.1452 |
| | A →M | 0.7963 | 0.4258 | 0.9121 | 0.8160 | 0.1157 |
| | A+M →M | 0.8162 | 0.4377 | 0.9241 | 0.8385 | 0.1410 |
| MF | H →H | 0.8539 | 0.4287 | 0.9569 | 0.8394 | 0.1983 |
| | M →H | 0.8514 | 0.4282 | 0.9468 | 0.8352 | 0.1721 |
| | M+H →H | | | | | |
| | D →H | 0.8513 | 0.4232 | 0.9507 | 0.8290 | 0.1358 |
| | D+H →H | 0.8532 | 0.4294 | 0.9540 | 0.8451 | 0.1945 |
| | A →H | 0.8435 | 0.4217 | 0.9060 | 0.8049 | 0.0921 |
| | A+H →H | 0.8453 | 0.4239 | 0.9394 | 0.8187 | 0.1508 |
| | M →M | 0.7980 | 0.4015 | 0.9426 | 0.8066 | 0.1528 |
| | H →M | 0.7963 | 0.3927 | 0.9246 | 0.8001 | 0.1501 |
| | H+M →M | | | | | |
| | D →M | 0.7596 | 0.3936 | 0.9096 | 0.7748 | 0.1108 |
| | D+M →M | 0.7989 | 0.4053 | 0.9427 | 0.8216 | 0.1563 |
| | A →M | 0.7452 | 0.3883 | 0.8856 | 0.7528 | 0.0815 |
| | A+M →M | 0.7949 | 0.3984 | 0.9274 | 0.7829 | 0.1395 |
| BP | H →H | 0.8376 | 0.7977 | 0.9522 | 0.8023 | 0.2058 |
| | M →H | 0.8320 | 0.7861 | 0.9267 | 0.8134 | 0.1791 |
| | M+H →H | | | | | 0.2421 |
| | D →H | 0.8374 | 0.7917 | 0.9513 | 0.8041 | 0.1948 |
| | D+H →H | 0.8370 | 0.7978 | 0.9502 | 0.8098 | 0.2032 |
| | A →H | 0.8248 | 0.7840 | 0.8998 | 0.8119 | 0.1433 |
| | A+H →H | 0.8322 | 0.7830 | 0.9328 | 0.7941 | 0.1796 |
| | M →M | 0.7814 | 0.6968 | 0.9372 | 0.7916 | 0.1864 |
| | H →M | 0.7892 | 0.6884 | 0.9384 | 0.7901 | 0.1897 |
| | H+M →M | | | | | |
| | D →M | 0.7829 | 0.6999 | 0.9364 | 0.7853 | 0.1818 |
| | D+M →M | 0.7820 | 0.7033 | 0.9365 | 0.7910 | 0.1883 |
| | A →M | 0.7694 | 0.6897 | 0.9023 | 0.7769 | 0.1417 |
| | A+M →M | 0.7779 | 0.6874 | 0.9199 | 0.7822 | 0.1732 |
H →H directly uses GO annotations of Human genes to predict annotations of Human genes. M →H employs only annotations of Mouse genes to predict annotations of Human genes. M+H →H uses GO annotations of genes from Mouse and Human to predict annotations of Human genes. D+H →H uses annotations of genes from Danio rerio and Human to predict annotations of Human genes. A+H →H uses annotations of genes from Arabidopsis thaliana and Human to predict annotations of Human genes. M →M, H+M →M, D+M →M and A+M →M follow a similar protocol but predict annotations of Mouse genes. Values in boldface are the statistically significant best among the compared methods for a particular target species; significance is assessed by a paired t-test at the 95% confidence level
Prediction on simulated missing GO annotations under BMA in CC sub-ontology
| q | | MicroAvgF1 | MacroAvgF1 | 1-RankLoss | Fmax | RAccuracy |
|---|---|---|---|---|---|---|
| 1 | H →H | 96.03 ±0.09 | 86.84 ±0.19 | 96.49 ±0.02 | 95.36 ±0.09 | 17.12 ±1.89 |
| | M+H →H | | | | | |
| | M →M | 95.48 ±0.04 | 86.19 ±0.22 | 93.83 ±0.01 | 94.98 ±0.04 | 12.22 ±0.73 |
| | H+M →M | | | | | |
| 2 | H →H | 89.09 ±0.02 | 67.85 ±0.37 | 86.95 ±0.06 | 87.84 ±0.02 | 23.04 ±0.17 |
| | M+H →H | | | | | |
| | M →M | 87.31 ±0.06 | 66.78 ±0.48 | 82.46 ±0.04 | 85.95 ±0.06 | 16.69 ±0.41 |
| | H+M →M | | | | | |
| 3 | H →H | 82.54 ±0.06 | 53.68 ±0.25 | 79.52 ±0.02 | 81.76 ±0.06 | 25.08 ±0.27 |
| | M+H →H | | | | 83.74 ±0.05 | |
| | M →M | 81.45 ±0.05 | 52.84 ±0.55 | 76.54 ±0.07 | 77.69 ±0.05 | 24.71 ±0.19 |
| | H+M →M | | | | | |
q is the number of simulated missing annotations per gene. H →H directly uses GO annotations of Human genes to predict annotations of Human genes. M+H →H uses GO annotations of genes from Mouse and Human to predict annotations of Human genes. M →M and H+M →M follow a similar protocol but make predictions for Mouse genes. Values in boldface are the statistically significant best among the compared methods for a particular target species; significance is assessed by a paired t-test at the 95% confidence level
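The idea of simulating q missing annotations per gene can be sketched as follows. This excerpt does not describe the paper's masking protocol, so everything here is an assumption: terms are hidden uniformly at random, genes with q or fewer terms are left untouched, and the function name `mask_annotations` and the toy annotations are hypothetical. The sketch also ignores the GO hierarchy, whereas a realistic protocol would have to keep the remaining annotations consistent with ancestor terms.

```python
import random


def mask_annotations(annotations, q, seed=0):
    """Hide q randomly chosen terms per gene (assumed protocol, not the paper's).

    Returns (observed, hidden): the annotations left visible to a predictor
    and the held-out terms it should recover.
    """
    rng = random.Random(seed)
    observed, hidden = {}, {}
    for gene, terms in annotations.items():
        ordered = sorted(terms)  # fixed order so the seed gives reproducible masks
        if len(ordered) <= q:
            # Assumption: never strip a gene of all its annotations.
            observed[gene], hidden[gene] = set(ordered), set()
            continue
        removed = set(rng.sample(ordered, q))
        observed[gene] = set(ordered) - removed
        hidden[gene] = removed
    return observed, hidden


# Hypothetical toy annotations.
annotations = {
    "gene1": {"GO:a", "GO:b", "GO:c", "GO:d"},
    "gene2": {"GO:a", "GO:e"},
    "gene3": {"GO:b"},
}

observed, hidden = mask_annotations(annotations, q=1)
for gene in annotations:
    print(gene, "observed:", sorted(observed[gene]), "hidden:", sorted(hidden[gene]))
```

A predictor would then be scored on how well it recovers the `hidden` terms from the `observed` ones, with larger q making the task harder.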
Prediction on simulated missing GO annotations under BMA in MF sub-ontology
| q | | MicroAvgF1 | MacroAvgF1 | 1-RankLoss | Fmax | RAccuracy |
|---|---|---|---|---|---|---|
| 1 | H →H | 91.71 ±0.02 | 82.31 ±0.22 | 90.98 ±0.13 | 91.37 ±0.02 | 10.56 ±0.21 |
| | M+H →H | | | | | |
| | M →M | 92.01 ±0.10 | 80.50 ±0.41 | 93.04 ±0.02 | 92.71 ±0.10 | 7.85 ±1.15 |
| | H+M →M | | | | | |
| 2 | H →H | 80.69 ±0.05 | 57.31 ±0.41 | 80.40 ±0.17 | 80.04 ±0.05 | 25.12 ±0.21 |
| | M+H →H | | | | | |
| | M →M | 79.02 ±0.01 | 54.82 ±0.41 | 78.43 ±0.02 | 79.26 ±0.01 | 15.78 ±0.05 |
| | H+M →M | | | | | |
| 3 | H →H | 70.09 ±0.03 | 40.45 ±0.42 | 68.16 ±0.03 | 70.32 ±0.03 | 23.70 ±0.08 |
| | M+H →H | | | | | |
| | M →M | 68.89 ±0.08 | 39.27 ±0.25 | 65.40 ±0.10 | 68.25 ±0.08 | 20.59 ±0.21 |
| | H+M →M | | | | | |
q is the number of simulated missing annotations per gene. H →H directly uses GO annotations of Human genes to predict annotations of Human genes. M+H →H uses GO annotations of genes from Mouse and Human to predict annotations of Human genes. M →M and H+M →M follow a similar protocol but make predictions for Mouse genes. Values in boldface are the statistically significant best among the compared methods for a particular target species; significance is assessed by a paired t-test at the 95% confidence level