Guangjie Zhou, Jun Wang, Xiangliang Zhang, Maozu Guo, Guoxian Yu.
Abstract
BACKGROUND: Maize (Zea mays ssp. mays L.) is the most widely grown and highest-yielding crop in the world, as well as an important model organism for fundamental research on gene function. The functions of Maize proteins are annotated using the Gene Ontology (GO), which has more than 40,000 terms and organizes them in a directed acyclic graph (DAG). Accurately annotating a Maize protein with the relevant GO terms from such a large set of candidates is a major challenge. Several deep learning models have been proposed to predict protein function, but their effectiveness is unsatisfactory. One major reason is that they make inadequate use of the GO hierarchy.
Keywords: Convolutional neural network; GO terms; Gene ontology; Graph convolutional network; Maize; Protein function prediction
Year: 2020 PMID: 33323113 PMCID: PMC7739465 DOI: 10.1186/s12859-020-03745-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 An example of hierarchical GO annotations of proteins. ‘Zm00008a000131-p01’ is a Maize protein annotated with ‘GO:0005886’. According to the True Path Rule, the protein is also annotated with the ancestor terms of ‘GO:0005886’ (‘GO:0071944’, ‘GO:0044464’, ‘GO:0005623’, ‘GO:0016020’ and ‘GO:0005575’)
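The True Path Rule in the caption amounts to closing a set of direct annotations under the ancestor relation of the GO DAG. A minimal Python sketch of that closure follows. The parent links in `PARENTS` are hypothetical, chosen only so that the ancestor set matches the one listed in the caption; they are not taken from the GO release used in the paper.

```python
from collections import deque

# Hypothetical parent links (child -> parents), chosen for illustration so that
# the ancestors of GO:0005886 match the set listed in the Fig. 1 caption.
PARENTS = {
    "GO:0005886": ["GO:0016020", "GO:0071944"],
    "GO:0071944": ["GO:0044464"],
    "GO:0044464": ["GO:0005623"],
    "GO:0005623": ["GO:0005575"],
    "GO:0016020": ["GO:0005575"],
    "GO:0005575": [],
}


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    queue = deque(parents.get(term, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents.get(current, []))
    return seen


def propagate(direct_annotations, parents):
    """Apply the True Path Rule: add all ancestors of each direct annotation."""
    full = set(direct_annotations)
    for term in direct_annotations:
        full |= ancestors(term, parents)
    return full


direct = {"GO:0005886"}
full = propagate(direct, PARENTS)
print(sorted(full))
```

Because a DAG allows several paths to the same ancestor, the `seen` set keeps each term from being expanded more than once.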
Experimental results of predicting GO annotations for the Maize and Human genomes
| Ontology | Method | Maize PR50 | Maize AUC | Maize AUPRC | Maize Smin | Maize Fmax | Human PR50 | Human AUC | Human AUPRC | Human Smin | Human Fmax |
|---|---|---|---|---|---|---|---|---|---|---|---|
| BP | DeepGOA | | | | | | | | | | |
| | DeepGOPlus | 76.47 | 89.52 | 69.64 | 1.2193 | 59.61 | 53.61 | 68.74 | 60.75 | 20.0152 | 36.23 |
| | DeepGO | 64.41 | 85.39 | 62.91 | 1.2586 | 59.49 | 50.25 | 63.85 | 57.13 | 20.6061 | 32.71 |
| | Deepred | 67.39 | 84.95 | 63.02 | 1.3509 | 58.21 | 55.60 | 68.33 | 56.96 | 19.9538 | 38.07 |
| | BLAST | 32.61 | 71.77 | 28.96 | 1.1745 | 61.10 | 46.50 | 57.72 | 48.94 | 20.2695 | 33.92 |
| | Naive | 27.08 | 49.93 | 27.67 | 1.8957 | 29.32 | 51.94 | 49.98 | 56.61 | 20.4729 | 34.45 |
| CC | DeepGOA | | | 75.78 | | | | 75.69 | | | |
| | DeepGOPlus | 91.21 | 82.51 | | 0.8105 | 70.82 | 50.18 | 65.15 | 48.70 | 4.9488 | 62.75 |
| | DeepGO | 86.57 | 82.91 | 72.07 | 0.7759 | 71.08 | 43.60 | 69.51 | 44.81 | 5.1828 | 58.86 |
| | Deepred | 84.77 | 86.74 | 73.86 | 0.6952 | 69.85 | 44.58 | | 44.58 | 5.8166 | 61.77 |
| | BLAST | 39.48 | 70.82 | 39.18 | 0.7904 | 62.02 | 21.25 | 56.27 | 26.91 | 5.0593 | 44.18 |
| | Naive | 48.14 | 49.98 | 43.74 | 1.2458 | 49.84 | 36.27 | 48.69 | 37.70 | 5.4474 | 55.15 |
| MF | DeepGOA | | | | 1.7024 | | | | | | |
| | DeepGOPlus | 72.70 | 83.67 | 64.42 | | 51.25 | 67.84 | 81.86 | 69.38 | 4.8426 | 46.82 |
| | DeepGO | 68.78 | 88.22 | 59.91 | 1.8551 | 52.82 | 54.56 | 75.98 | 62.47 | 5.2581 | 40.43 |
| | Deepred | 62.89 | 89.73 | 57.65 | 2.287 | 45.49 | 62.68 | 81.30 | 62.01 | 5.1711 | 45.14 |
| | BLAST | 27.40 | 67.76 | 32.92 | 1.8274 | 51.40 | 42.33 | 62.34 | 46.11 | 4.9195 | 41.07 |
| | Naive | 28.44 | 51.04 | 28.84 | 2.7430 | 26.13 | 46.86 | 49.87 | 52.77 | 5.7466 | 32.59 |
The best results for each metric are in boldface
GO terms predicted for the Maize protein (Zm00008a011322-p01) by different methods
| Ontology | Real annotation | DeepGOA | DeepGOPlus | DeepGO | Deepred |
|---|---|---|---|---|---|
| CC | GO:0005622 | GO:0005622 | GO:0005622 | GO:0005622 | GO:0005622 |
| | GO:0044464 | GO:0044464 | GO:0044464 | GO:0044464 | GO:0044464 |
| | GO:0005623 | GO:0005623 | GO:0005623 | GO:0005623 | GO:0005623 |
| | GO:0044424 | GO:0044424 | GO:0044424 | GO:0044424 | |
| | GO:0043229 | GO:0043229 | GO:0005737 | GO:0043229 | |
| | GO:0005737 | GO:0005737 | GO:0005737 | | |
| | GO:0043231 | GO:0043231 | GO:0043231 | | |
| | GO:0043227 | GO:0043227 | | | |
| | GO:0005634 | | | | |
Prediction results of DeepGOA and its variants
| Ontology | Method | AUC | AUPRC | Smin | Fmax |
|---|---|---|---|---|---|
| BP | DeepGOA | 69.79 | | | |
| | DeepGOA-GO | 69.72 | 60.69 | 20.1579 | 36.79 |
| | DeepGOA-Label | | 61.72 | 20.2206 | 38.14 |
| | DeepGOA-CNN | 69.19 | 61.06 | 20.2332 | 36.12 |
| CC | DeepGOA | 75.69 | 49.97 | | |
| | DeepGOA-GO | 75.94 | 48.64 | 4.9127 | 62.43 |
| | DeepGOA-Label | | | 4.9707 | 62.67 |
| | DeepGOA-CNN | 74.85 | 49.19 | 5.0134 | 61.43 |
| MF | DeepGOA | | | | |
| | DeepGOA-GO | 81.75 | 70.28 | 4.8201 | 46.98 |
| | DeepGOA-Label | 81.46 | 70.81 | 4.9661 | 46.88 |
| | DeepGOA-CNN | 77.65 | 63.12 | 5.2867 | 41.54 |
The best results for each metric are in boldface
Fig. 2 The AUC and AUPRC under different dimensionalities of the low-dimensional vectors
Fig. 3 The network architecture of DeepGOA. The upper yellow subnetwork is the convolutional part: features of the amino acid sequence are extracted by convolution kernels of different sizes, and a fully connected layer learns the mapping from sequence features to the semantic representations of GO terms. The lower blue subnetwork is the graph convolutional part, which uses the GO hierarchy and the empirical correlations between GO terms to learn a semantic representation of each GO term. Finally, the dot product is used to guide the mapping between proteins and GO terms and, in turn, to adjust the representations of both. In this way, the associations between GO terms and proteins are also predicted
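As a rough illustration of the lower subnetwork and the final dot product in Fig. 3, the sketch below runs one graph-convolution layer over a toy GO graph and scores each term against a protein vector. Everything in it is an assumption made for illustration: the four-term graph, the symmetric normalisation with self-loops, the random weights, the sigmoid applied to the dot product, and the random `protein_vec` that stands in for the output of the convolutional subnetwork. It is not the authors' implementation.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, an assumed normalisation scheme."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_norm, features, weights):
    """One graph-convolution layer: ReLU(A_norm @ X @ W)."""
    hidden = matmul(matmul(a_norm, features), weights)
    return [[max(0.0, v) for v in row] for row in hidden]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def score_terms(protein_vec, term_embeddings):
    """Score every GO term by the sigmoid of its dot product with the protein."""
    return [sigmoid(sum(p * t for p, t in zip(protein_vec, emb))) for emb in term_embeddings]


rng = random.Random(0)
n_terms, dim = 4, 3

# Toy graph (hypothetical): term 0 is the root, terms 1 and 2 are its children,
# and term 3 is a child of term 1. Edges are treated as undirected here.
adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
]
features = [[1.0 if i == j else 0.0 for j in range(n_terms)] for i in range(n_terms)]
weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_terms)]

a_norm = normalize_adjacency(adj)
term_embeddings = gcn_layer(a_norm, features, weights)

# Stand-in for the sequence representation produced by the convolutional part.
protein_vec = [rng.uniform(-1, 1) for _ in range(dim)]
scores = score_terms(protein_vec, term_embeddings)
print(scores)
```

In a trained model the weights and the protein representation would be learned jointly, so that the loss on these scores updates both sides of the dot product.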