Jiajie Peng (1,2), Hongxiang Li (3), Yongzhuang Liu (3), Liran Juan (4), Qinghua Jiang (4), Yadong Wang (5), Jin Chen (6,7).
Abstract
BACKGROUND: The Gene Ontology (GO) has been used as a major bioinformatics resource in high-throughput omics research. The hierarchical structure of GO gives users a convenient platform for abstracting biological information and testing hypotheses. Computational methods have been developed to identify functionally similar genes; however, none of the existing measures takes into account all of the rich information in GO. Likewise, the web-based applications built on these existing methods compute gene functional similarities but provide only plain-text output. Without a graphical visualization interface, the results are difficult to interpret.
Keywords: Gene Ontology; Semantic similarity; Web tool
Year: 2016 PMID: 27586009 PMCID: PMC5009821 DOI: 10.1186/s12864-016-2828-6
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 The user input interface of InteGO2. The inputs are grouped into three categories: (a) the input genes and related information, (b) the choice of similarity measure and GO category, and (c) user information.
List of available organisms in InteGO2. The annotated entity count field gives the number of annotated entities. The annotation count field gives the total number of annotations in the annotation file. (Note that this table may change, since the annotation files are updated automatically from the official Gene Ontology website. This table was generated on Feb. 6th, 2015.)
| # | Taxon | Annotated entity count | Annotation count |
|---|---|---|---|
| 1 | Schizosaccharomyces pombe | 5382 | 39377 |
| 2 | Aspergillus nidulans | 139805 | 512389 |
| 3 | Candida albicans | 46843 | 249940 |
| 4 | Dictyostelium discoideum | 8176 | 62688 |
| 5 | Saccharomyces cerevisiae | 6380 | 94253 |
| 6 | Arabidopsis thaliana | 30495 | 239953 |
| 7 | Rattus norvegicus | 20897 | 352450 |
| 8 | Gallus gallus | 14555 | 115998 |
| 9 | Canis lupus familiaris | 20342 | 146570 |
| 10 | Bos taurus | 20418 | 159295 |
| 11 | Homo sapiens | 45085 | 455674 |
| 12 | Sus scrofa | 20128 | 138431 |
| 13 | Danio rerio | 19392 | 152332 |
| 14 | Drosophila melanogaster | 14614 | 101879 |
| 15 | Caenorhabditis elegans | 20341 | 135664 |
| 16 | Pseudomonas aeruginosa PAO1 | 1043 | 1979 |
| 17 | Leishmania major | 644 | 1905 |
| 18 | Plasmodium falciparum | 2305 | 5976 |
| 19 | Trypanosoma brucei | 3531 | 8667 |
| 20 | Escherichia coli | 3770 | 45976 |
| 21 | Solanaceae | 309 | 561 |
| 22 | Dickeya dadantii | 124 | 296 |
| 23 | Oryza sativa | 41141 | 49292 |
| 24 | Magnaporthe grisea | 11274 | 27618 |
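The two count columns describe an annotation file in simple terms: distinct annotated entities and total annotation lines. A minimal sketch of how such counts could be computed, assuming the file follows the GAF 2.x layout (tab-separated, `!` header/comment lines, column 1 = DB, column 2 = DB Object ID) and that an "annotated entity" is a distinct (DB, DB Object ID) pair. The helper `count_gaf` and the toy rows are hypothetical; InteGO2's own counting procedure is not described here.

```python
from typing import Iterable, Set, Tuple


def count_gaf(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (annotated entity count, annotation count) for GAF-formatted lines.

    Assumption: GAF 2.x layout, tab-separated, header/comment lines start with '!',
    column 1 is the DB and column 2 is the DB Object ID.
    """
    entities: Set[Tuple[str, str]] = set()
    annotations = 0
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header/comment lines
        cols = line.rstrip("\n").split("\t")
        entities.add((cols[0], cols[1]))  # one entity per (DB, DB Object ID) pair
        annotations += 1  # every data line counts as one annotation
    return len(entities), annotations


# Hypothetical toy rows, truncated to the first five GAF columns.
sample = [
    "!gaf-version: 2.2",
    "UniProtKB\tGENE_A\tgeneA\t\tGO:0000001",
    "UniProtKB\tGENE_A\tgeneA\t\tGO:0000002",
    "UniProtKB\tGENE_B\tgeneB\t\tGO:0000001",
]
print(count_gaf(sample))  # (2, 3)
```

Counting distinct identifiers rather than lines is what makes the entity count smaller than the annotation count in every row of the table.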
Fig. 2 The visualization interface of InteGO2 for exploring gene functional similarities based on GO. The network is shown in panel (c), in which a node represents a gene and an edge indicates that the similarity score between the two corresponding genes is larger than the edge similarity threshold, which can be changed in panel (a). The distribution of edge similarity scores, shown in panel (b), helps users choose an appropriate threshold. The gene information panels (d) and (e) show the recently chosen genes and the current gene, respectively. Panel (f) shows the neighbors of the recently chosen genes. The node operation panel (g) allows users to flag, lock or unlock a gene. The selected subnetwork is shown in panel (h).
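The thresholding rule in the caption (an edge is drawn only when a pair's score is larger than the threshold) and the score distribution in panel (b) can be sketched as follows. This is a minimal illustration assuming scores lie in [0, 1] and are stored per unordered gene pair; `build_network`, `score_histogram` and the toy scores are hypothetical, not InteGO2's code.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

Scores = Dict[FrozenSet[str], float]


def build_network(genes: Sequence[str], sim: Scores, threshold: float) -> List[Tuple[str, str, float]]:
    """Keep an edge (a, b, score) only when the pair's score is larger than threshold."""
    edges = []
    for a, b in combinations(genes, 2):
        score = sim.get(frozenset((a, b)), 0.0)  # assumption: missing pairs score 0
        if score > threshold:
            edges.append((a, b, score))
    return edges


def score_histogram(scores: Sequence[float], bins: int = 10) -> List[int]:
    """Count scores per equal-width bin over [0, 1]; a score of 1.0 goes in the last bin."""
    counts = [0] * bins
    for s in scores:
        counts[min(int(s * bins), bins - 1)] += 1
    return counts


sim = {frozenset(("A", "B")): 0.95, frozenset(("A", "C")): 0.85, frozenset(("B", "C")): 0.5}
print(build_network(["A", "B", "C"], sim, 0.9))  # [('A', 'B', 0.95)]
print(build_network(["A", "B", "C"], sim, 0.8))  # [('A', 'B', 0.95), ('A', 'C', 0.85)]
print(score_histogram(list(sim.values())))
```

Lowering the threshold can only add edges, so the histogram gives a quick sense of how dense the network will become before it is drawn.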
The layouts supported in the visualization interface. The six layouts supported in the visualization interface of InteGO2
| Name | Description |
|---|---|
| concentric | The concentric layout positions nodes in concentric circles. Users can select this layout to place the graph in the middle of the explorer. |
| breadth-first | The breadth-first layout puts nodes in a hierarchy based on a breadth-first traversal of the graph. This layout shows the hierarchical structure of the gene functional association network. |
| circle | The circle layout puts nodes in a circle. From a circle layout, users can easily identify high-degree and low-degree nodes. |
| cose | The cose (Compound Spring Embedder) layout uses a force-directed simulation to lay out compound graphs. This layout helps users find dense regions of the network. |
| cola | The cola layout uses a force-directed physics simulation with several sophisticated constraints. |
| grid | The grid layout puts nodes in a well-spaced grid. |
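The row assignment behind a breadth-first layout can be illustrated by grouping nodes by their breadth-first depth from a root, with each group becoming one row of the hierarchy. A minimal sketch, assuming an undirected adjacency mapping and a user-chosen root; `bfs_levels` is a hypothetical helper, and root selection, spacing and disconnected components, which a real layout engine must handle, are not covered.

```python
from collections import deque
from typing import Dict, Iterable, List, Mapping


def bfs_levels(adj: Mapping[str, Iterable[str]], root: str) -> Dict[int, List[str]]:
    """Group nodes reachable from root by breadth-first depth (row 0 holds the root)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nb in sorted(adj.get(node, ())):  # sorted for a deterministic order
            if nb not in depth:
                depth[nb] = depth[node] + 1
                queue.append(nb)
    rows: Dict[int, List[str]] = {}
    for node, d in depth.items():
        rows.setdefault(d, []).append(node)
    return rows


adj = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A"}, "D": {"B"}}
print(bfs_levels(adj, "A"))  # {0: ['A'], 1: ['B', 'C'], 2: ['D']}
```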
Fig. 3 An illustrative example of two networks with different thresholds. Two gene functional association networks built with different gene-to-gene similarity thresholds (all other parameters are the same). The thresholds used in the left and right figures are 0.9 and 0.8, respectively.
Fig. 4 An illustrative example of visualizing a network with two different graph layouts. The gene functional association network from the left figure in Fig. 3, visualized with two different graph layouts.
Fig. 5 An illustrative example of selecting genes of interest to construct subnetworks. The right figure shows three selected genes of interest (Q6IQ55, P49840 and Q9BZX2), and the left figure shows that all direct neighbors of these genes are selected as well.
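Selecting genes of interest together with their direct neighbors amounts to taking the subgraph induced by the selected nodes' closed neighborhood. A minimal sketch, assuming the network is an undirected edge list; `select_subnetwork`, the `include_neighbors` flag and the gene labels are hypothetical, and whether InteGO2 also keeps edges between two neighbors (as the induced subgraph here does) is an assumption.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def select_subnetwork(edges: Iterable[Edge], selected: Set[str],
                      include_neighbors: bool = True) -> Tuple[Set[str], List[Edge]]:
    """Return nodes and induced edges for the selected genes, optionally plus direct neighbors."""
    edges = list(edges)
    nodes = set(selected)
    if include_neighbors:
        for a, b in edges:
            if a in selected:
                nodes.add(b)
            if b in selected:
                nodes.add(a)
    # Keep every edge whose endpoints are both in the node set (induced subgraph).
    sub_edges = [(a, b) for a, b in edges if a in nodes and b in nodes]
    return nodes, sub_edges


edges = [("G1", "G2"), ("G2", "G3"), ("G3", "G4"), ("G4", "G5")]
nodes, sub = select_subnetwork(edges, {"G3"})
print(sorted(nodes), sub)  # ['G2', 'G3', 'G4'] [('G2', 'G3'), ('G3', 'G4')]
```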
Fig. 6 Framework of InteGO2. Framework of InteGO2 for calculating gene-to-gene similarities for an input gene set (left) and for estimating the parameters of the integration model (right).
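The left half of the framework produces a score for every gene pair in the input set. As a hedged sketch only: InteGO2 integrates several measures with estimated parameters that this excerpt does not specify, so the code below substitutes a single best-match-average style aggregation over a pluggable term-similarity function. `best_match_average`, `all_pair_similarities`, `exact_match` and the toy GO IDs are hypothetical placeholders, not InteGO2's measures.

```python
from itertools import combinations
from typing import Callable, Dict, Mapping, Set, Tuple

TermSim = Callable[[str, str], float]


def best_match_average(terms_a: Set[str], terms_b: Set[str], term_sim: TermSim) -> float:
    """Mean of the two directional average best-match scores between two GO term sets."""
    if not terms_a or not terms_b:
        return 0.0  # assumption: a gene without annotations scores 0
    a_to_b = sum(max(term_sim(t, u) for u in terms_b) for t in terms_a) / len(terms_a)
    b_to_a = sum(max(term_sim(u, t) for t in terms_a) for u in terms_b) / len(terms_b)
    return (a_to_b + b_to_a) / 2


def all_pair_similarities(annotations: Mapping[str, Set[str]],
                          term_sim: TermSim) -> Dict[Tuple[str, str], float]:
    """Gene-to-gene similarity for every unordered pair in the input gene set."""
    genes = sorted(annotations)
    return {(a, b): best_match_average(annotations[a], annotations[b], term_sim)
            for a, b in combinations(genes, 2)}


def exact_match(t: str, u: str) -> float:
    """Placeholder term similarity: 1 for identical terms, 0 otherwise."""
    return 1.0 if t == u else 0.0


annotations = {"g1": {"GO:A", "GO:B"}, "g2": {"GO:B"}, "g3": {"GO:C"}}
print(all_pair_similarities(annotations, exact_match))
# {('g1', 'g2'): 0.75, ('g1', 'g3'): 0.0, ('g2', 'g3'): 0.0}
```

Keeping the term-level function pluggable mirrors the idea that different semantic similarity measures could be swapped in or combined; the combination step itself is left out because its form is not given here.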