Yingwen Zhao¹, Jun Wang¹, Jian Chen², Xiangliang Zhang³, Maozu Guo⁴, Guoxian Yu¹,³
Abstract
Annotating the functional properties of gene products (i.e., RNAs and proteins) is a fundamental task in biology. The Gene Ontology (GO) database was developed to systematically describe the functional properties of gene products across species and to facilitate the computational prediction of gene function. Because it is routinely updated, GO serves as the gold standard and main knowledge source in functional genomics. Many gene function prediction methods that make use of GO have been proposed, but no literature review has yet summarized these methods and the possibilities for future efforts from the perspective of GO. To bridge this gap, we review the existing methods with an emphasis on recent solutions. First, we introduce the conventions of GO and the widely adopted evaluation metrics for gene function prediction. Next, we summarize current gene function prediction methods that apply GO in different ways, such as using hierarchical or flat inter-relationships between GO terms, compressing massive numbers of GO terms, and quantifying semantic similarities. Although many efforts have improved performance by harnessing GO, we conclude that many important but largely overlooked topics remain for future research.
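Two of the GO conventions mentioned above can be made concrete with a short example: annotations are propagated up the DAG (the "true path rule": a gene annotated with a term is implicitly annotated with all of that term's ancestors), and semantic similarity can then be quantified with Resnik's measure, the information content of the most informative common ancestor. This is a minimal sketch under stated assumptions: the toy ontology, the annotations, and all function names are hypothetical; relation types such as is_a and part_of are collapsed into plain parent links; and Resnik's measure is only one of several semantic similarity measures, not necessarily the one used by any method reviewed here.

```python
import math

# Hypothetical toy ontology: term -> list of parent terms (edges point toward the root).
PARENTS = {
    "root": [],
    "t1": ["root"],
    "t2": ["root"],
    "t3": ["t1"],
    "t4": ["t1", "t2"],
}

# Hypothetical direct annotations: gene -> most specific GO terms it carries.
ANNOTATIONS = {
    "g1": {"t3"},
    "g2": {"t4"},
    "g3": {"t2"},
}


def ancestors(term, parents):
    """Return `term` together with all of its ancestors in the DAG."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents[t])
    return seen


def propagate(annotations, parents):
    """True path rule: annotating a gene with a term also annotates it with every ancestor."""
    return {
        gene: set().union(*(ancestors(t, parents) for t in terms))
        for gene, terms in annotations.items()
    }


def information_content(propagated, parents):
    """IC(t) = log(N / n_t), where n_t of the N genes carry t after propagation."""
    n_genes = len(propagated)
    ic = {}
    for t in parents:
        n_t = sum(1 for terms in propagated.values() if t in terms)
        # Terms with no annotations get infinite IC in this toy version.
        ic[t] = math.log(n_genes / n_t) if n_t else math.inf
    return ic


def resnik(term_a, term_b, parents, ic):
    """Resnik similarity: IC of the most informative common ancestor of two terms."""
    common = ancestors(term_a, parents) & ancestors(term_b, parents)
    return max(ic[t] for t in common)


propagated = propagate(ANNOTATIONS, PARENTS)
ic = information_content(propagated, PARENTS)
print(resnik("t3", "t4", PARENTS, ic))  # most informative shared ancestor is t1: log(3/2)
print(resnik("t3", "t2", PARENTS, ic))  # only the root is shared: 0.0
```

Because the root is carried by every gene after propagation, its information content is zero, so two terms that share nothing but the root get a similarity of zero.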
Keywords: directed acyclic graph; functional genomics; gene function prediction; gene ontology; inter-relationships; semantic similarity
Year: 2020 PMID: 32391061 PMCID: PMC7193026 DOI: 10.3389/fgene.2020.00400
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Snapshot of a directed acyclic graph (DAG) from the Gene Ontology. Each ontological term is represented by an alphanumeric identifier, and its biological function is described in a controlled vocabulary. GO terms are hierarchically connected by different types of directed edges. The level of a GO term in the DAG is determined by its longest distance to the root term (“GO:0008150” in the biological process ontology, BPO; “GO:0005575” in the cellular component ontology, CCO; and “GO:0003674” in the molecular function ontology, MFO). For example, “GO:0048087” is both a direct child and a grandchild of “GO:0048066”, and its longest distance to the root is 5, whereas “GO:0006856”, another direct child of “GO:0048066”, has a longest distance to the root of 4. “GO:0006856” is therefore plotted at a higher level (closer to the root) than “GO:0048087”.
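The level rule in the caption is a longest-path depth, which is well defined because the graph is acyclic. A minimal sketch, assuming a hypothetical toy DAG whose placeholder terms reproduce the caption's pattern (they are not real GO identifiers) and ignoring edge types; `term_levels` and `shortest_depth` are illustrative names:

```python
# Hypothetical toy DAG (term -> parents). "C" plays the role of GO:0048066;
# "D" is a direct child of C, and "E" is both a direct child of C and a
# grandchild of C via the placeholder intermediate "X".
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["A"],
    "C": ["B"],
    "D": ["C"],
    "X": ["C"],
    "E": ["C", "X"],
}


def term_levels(parents):
    """Level of each term = length of the LONGEST path from the term up to the root."""
    memo = {}

    def level(term):
        if term not in memo:
            ps = parents[term]
            memo[term] = 0 if not ps else 1 + max(level(p) for p in ps)
        return memo[term]

    return {t: level(t) for t in parents}


def shortest_depth(term, parents):
    """For contrast: length of the SHORTEST path from the term up to the root."""
    ps = parents[term]
    return 0 if not ps else 1 + min(shortest_depth(p, parents) for p in ps)


levels = term_levels(PARENTS)
print(levels["D"], levels["E"])      # 4 5
print(shortest_depth("E", PARENTS))  # 4
```

A term reachable by several routes (like E) takes the deepest one, which is why E is placed below D even though its shortest route to the root is no longer than D's.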
Figure 2. The number of published papers related to GO-based gene function prediction over 10 years.
Figure 3. Three issues in gene function prediction (left), and a categorization of existing computational solutions based on GO (right).
Figure 4. Exemplar tasks of gene function prediction, which include predicting missing, negative, and noisy annotations.
Categories of solutions that use different inter-relations between GO terms.
| Task | Method | GO term relations | Technique |
|---|---|---|---|
| Predicting | ProWL (Yu et al., | Flat | Weak label learning |
| | ProDM (Yu et al., | Flat | Weak label learning |
| | ProHG (Liu et al., | Flat | Random walks |
| | ITSS (Tao et al., | Hierarchical | Semantic similarity |
| | NtN (Done et al., | Hierarchical | Singular value decomposition |
| | dRW (Yu et al., | Hierarchical | Random walks |
| | PILL (Yu et al., | Hierarchical | Random walks |
| | DeepGO (Kulmanov et al., | Hierarchical | Deep learning |
| | NewGOA (Yu et al., | Hierarchical | Bi-random walks |
| | AsyRW (Zhao et al., | Hierarchical | Bi-random walks |
| Identifying | NoisyGOA (Lu et al., | Hierarchical | Semantic-based kNN |
| | NoGOA (Yu et al., | Hierarchical | Sparse representation |
| | NFA (Lu et al., | Hierarchical | Sparse representation |
| Selecting | ALBias (Youngs et al., | Flat | Bayesian model |
| | ProPN (Fu et al., | Flat | Random walks |
| | SNOB (Youngs et al., | Hierarchical | Bayesian model |
| | NETL (Youngs et al., | Hierarchical | Topic model |
| | IFDR (Yu et al., | Hierarchical | Semi-supervised linear regression |
| | NegGOA (Fu et al., | Hierarchical | Random walks |
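Several entries in the table above (ProHG, dRW, PILL, ProPN, NegGOA, and the bi-random-walk methods NewGOA and AsyRW) are built on random walks. As a hedged illustration of that shared core only, the sketch below runs a generic random walk with restart on a hypothetical gene network, seeded with the genes already known to carry a GO term. It is not a reimplementation of any listed method, which additionally exploit relations between GO terms; the function name, the restart probability of 0.3, and the chain network are all assumptions.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Generic random walk with restart (RWR) on an undirected weighted graph.

    Iterates p <- (1 - restart) * W p + restart * p0, where W is the
    column-normalised adjacency matrix and p0 spreads unit mass over the seeds.
    Returns the steady-state visiting probability of each node.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        new_p = [
            (1 - restart) * sum(adj[i][j] / col_sums[j] * p[j]
                                for j in range(n) if col_sums[j] > 0)
            + restart * p0[i]
            for i in range(n)
        ]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


# Hypothetical gene network: a chain g0 - g1 - g2 - g3 - g4.
# Only g0 is known to carry the GO term of interest (the seed).
CHAIN = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
scores = random_walk_with_restart(CHAIN, seeds={0})
for gene, s in enumerate(scores):
    print(f"g{gene}: {s:.3f}")
```

Genes closer to the seed in the network receive higher steady-state scores, so ranking unannotated genes by these scores yields candidate annotations for the term.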
Exemplar solutions based on compressing GO terms.
| Category | Method | GO term relations |
|---|---|---|
| Matrix factorization | ProCMF (Yu et al., | Flat |
| | clusDCA (Wang et al., | Hierarchical |
| | NtN (Done et al., | Hierarchical |
| | clusDCA (Wang et al., | Hierarchical |
| | ProsNet (Wang et al., | Hierarchical |
| | IFDR (Yu et al., | Hierarchical |
| | NMFGO (Yu et al., | Hierarchical |
| | ZOMF (Zhao et al., | Hierarchical |
| | LSDRs (Makrodimitris et al., | Hierarchical |
| Hash learning | HashGO (Yu et al., | Hierarchical |
| | HPHash (Zhao et al., | Hierarchical |
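The compression methods in the table above map the many GO terms onto a few latent factors. NtN is also listed under singular value decomposition in the preceding table, so a plain truncated SVD is a natural illustration. This is a minimal sketch assuming NumPy and a hypothetical binary gene-by-term matrix; it is not a reimplementation of ProCMF, NMFGO, ZOMF, or any other listed method, which incorporate further information, such as relations between GO terms, that this sketch ignores. `low_rank_scores` is an illustrative name.

```python
import numpy as np


def low_rank_scores(Y, k):
    """Rank-k truncated SVD reconstruction of a gene-by-term annotation matrix.

    The top k right singular vectors act as a compressed, k-dimensional label
    space; the reconstructed entries serve as annotation scores.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


# Hypothetical annotations: 4 genes (rows) x 4 GO terms (columns), 1 = annotated.
# Genes 0-2 carry terms 0, 1 and 2; gene 3 carries terms 0 and 1 but lacks term 2.
Y = np.array([
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
], dtype=float)

Y_hat = low_rank_scores(Y, k=1)
print(np.round(Y_hat, 2))
# Y_hat[3, 2] is clearly positive, so term 2 becomes a candidate missing
# annotation for gene 3; term 3, carried by no gene, stays (numerically) zero.
```

Choosing k trades compression against fidelity: with k equal to the rank of Y the reconstruction is exact, and no missing annotation would be suggested.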