GOssTo: a stand-alone application and a web tool for calculating semantic similarities on the Gene Ontology.

Horacio Caniza, Alfonso E Romero, Samuel Heron, Haixuan Yang, Alessandra Devoto, Marco Frasca, Marco Mesiti, Giorgio Valentini, Alberto Paccanaro.

Abstract

SUMMARY: We present GOssTo, the Gene Ontology semantic similarity Tool, a user-friendly software system for calculating semantic similarities between gene products according to the Gene Ontology. GOssTo is bundled with six semantic similarity measures, including both term- and graph-based measures, and has extension capabilities to allow the user to add new similarities. Importantly, for any measure, GOssTo can also calculate the Random Walk Contribution that has been shown to greatly improve the accuracy of similarity measures. GOssTo is very fast, easy to use, and it allows the calculation of similarities on a genomic scale in a few minutes on a regular desktop machine. CONTACT: alberto@cs.rhul.ac.uk AVAILABILITY: GOssTo is available both as a stand-alone application running on GNU/Linux, Windows and MacOS from www.paccanarolab.org/gossto and as a web application from www.paccanarolab.org/gosstoweb. The stand-alone application features a simple and concise command line interface for easy integration into high-throughput data processing pipelines.
© The Author 2014. Published by Oxford University Press.

Year: 2014; PMID: 24659104; PMCID: PMC4103586; DOI: 10.1093/bioinformatics/btu144

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.937


1 INTRODUCTION

Semantic similarity measures have become important in bioinformatics because they quantify the functional relatedness between genes in a way that is complementary to both experimental information and sequence-based approaches. This is done by annotating genes to the terms of a chosen ontology and then quantifying the similarity between these terms. Among the available ontologies, the Gene Ontology (GO) (Ashburner et al., 2000) has become a standard and is the focus of this work. Several semantic similarity measures have been proposed. For example, those by Resnik (1999), Jiang and Conrath (1997) and Lin (1998) are based on the information content of the most informative common ancestor of a pair of terms, and are often referred to as ‘term-based’; GraSM (Couto and Silva, 2005) extends such term-based measures by also exploiting the structure of the GO graph. simUI and simGIC (Pesquita et al., 2008) compare sets of terms rather than single terms using graph comparison approaches, and are often referred to as ‘graph-based’. An important recent development has been the introduction of the Random Walk Contribution, which greatly improves semantic similarity measures (Yang et al., 2012). Yang et al. argued that existing measures have two important deficiencies: first, they do not take into account the descendants of the terms; second, they do not model the inherent uncertainty in the current annotations and ontology structure. The Random Walk Contribution is a kind of ‘add-on’ for existing similarity measures that corrects these two issues. A few software tools have been proposed for calculating semantic similarities, including ProteinOn (Faria et al.), IT-GOM (Mazandu and Mulder, 2013) and G-SESAME (Du et al., 2009). However, none of them can compute the Random Walk Contribution proposed by Yang et al. (2012). Moreover, tools provided as stand-alone applications are either not readily extendable with new semantic similarity measures or are available only as packages running within environments such as R or MATLAB.
Other tools are available exclusively online, which makes them impractical for high-throughput analysis of large bodies of data. Most tools do not allow a straightforward calculation of semantic similarities for a whole genome, or an easy updating of the GO annotations. In this article, we present GOssTo (Gene Ontology semantic similarity Tool), a new tool for calculating semantic similarities that overcomes all of the above limitations. GOssTo includes the Random Walk Contribution by Yang et al. (2012) and supports both term- and graph-based similarity measures. GOssTo is available in downloadable binary form, with the entire source code released under GPLv3. GOssTo is easy to use and very fast: Table 1 shows the time required for calculating the Resnik semantic similarity, including the Random Walk Contribution, for a few model organisms. GOssTo features a simple and concise command-line interface and an application programming interface (API) for easy integration into high-throughput data-processing pipelines. GOssTo’s design allows user-provided similarity measures to be independently developed, compiled and linked at runtime. These features make GOssTo a practical environment both for the development of novel semantic similarity measures and for the calculation of semantic similarities on a genomic scale.
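To make the two families of measures concrete, the following Java sketch computes the term-based measures of Resnik, Lin and Jiang and Conrath, and the graph-based measures simUI and simGIC, on a small made-up ontology. It is not GOssTo's code: the terms R, A, B, C, D, the genes g1 to g4 and the class name are hypothetical, is_a and part_of edges are treated alike, and the information content of a term t is assumed to be IC(t) = -ln p(t), where p(t) is the fraction of genes annotated to t or to one of its descendants. The Random Walk Contribution is not shown.

```java
import java.util.*;

// Minimal sketch of IC-based GO similarity measures on a hypothetical toy ontology.
// Not GOssTo's implementation; all terms and genes below are made up.
class GoSimilaritySketch {
    // Toy DAG: term -> direct parents (is_a and part_of edges treated alike).
    static final Map<String, List<String>> PARENTS = Map.of(
            "R", List.of(),
            "A", List.of("R"),
            "B", List.of("R"),
            "C", List.of("A"),
            "D", List.of("A", "B"));

    // Toy direct annotations: gene -> terms.
    static final Map<String, List<String>> ANNOTATIONS = Map.of(
            "g1", List.of("C"),
            "g2", List.of("D"),
            "g3", List.of("B"),
            "g4", List.of("C"));

    // IC of every term reached by at least one propagated annotation.
    static final Map<String, Double> IC = informationContent();

    /** The given terms together with all of their ancestors. */
    static Set<String> closure(Collection<String> terms) {
        Set<String> seen = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(terms);
        while (!pending.isEmpty()) {
            String t = pending.pop();
            if (seen.add(t)) pending.addAll(PARENTS.get(t));
        }
        return seen;
    }

    /** IC(t) = -ln p(t) = ln(N / n_t), n_t = genes annotated to t or to a descendant of t. */
    static Map<String, Double> informationContent() {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> terms : ANNOTATIONS.values())
            for (String t : closure(terms)) counts.merge(t, 1, Integer::sum); // each gene counted once per term
        double n = ANNOTATIONS.size();
        Map<String, Double> ic = new HashMap<>();
        counts.forEach((t, c) -> ic.put(t, Math.log(n / c)));
        return ic;
    }

    // Term-based: Resnik = IC of the most informative common ancestor (MICA).
    // Assumes both terms carry propagated annotations, so every ancestor has an IC.
    static double resnik(String t1, String t2) {
        Set<String> common = closure(List.of(t1));
        common.retainAll(closure(List.of(t2)));
        return common.stream().mapToDouble(IC::get).max().orElse(0.0);
    }

    static double lin(String t1, String t2) {
        double denom = IC.get(t1) + IC.get(t2);
        return denom == 0.0 ? 1.0 : 2.0 * resnik(t1, t2) / denom; // convention when both ICs are 0
    }

    // Jiang and Conrath distance; tools convert it into a similarity in different ways.
    static double jiangConrathDistance(String t1, String t2) {
        return IC.get(t1) + IC.get(t2) - 2.0 * resnik(t1, t2);
    }

    // Graph-based simUI: shared terms / all terms of the two ancestor-closed sets.
    static double simUI(String g1, String g2) {
        Set<String> a = closure(ANNOTATIONS.get(g1)), b = closure(ANNOTATIONS.get(g2));
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        a.retainAll(b); // a now holds the intersection
        return (double) a.size() / union.size();
    }

    // Graph-based simGIC: like simUI, but each term is weighted by its IC.
    static double simGIC(String g1, String g2) {
        Set<String> a = closure(ANNOTATIONS.get(g1)), b = closure(ANNOTATIONS.get(g2));
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        a.retainAll(b);
        double total = union.stream().mapToDouble(IC::get).sum();
        return total == 0.0 ? 0.0 : a.stream().mapToDouble(IC::get).sum() / total;
    }

    public static void main(String[] args) {
        System.out.printf("Resnik(C, D)   = %.4f%n", resnik("C", "D"));
        System.out.printf("Lin(C, D)      = %.4f%n", lin("C", "D"));
        System.out.printf("JC(C, D)       = %.4f%n", jiangConrathDistance("C", "D"));
        System.out.printf("simUI(g1, g2)  = %.4f%n", simUI("g1", "g2"));
        System.out.printf("simGIC(g1, g2) = %.4f%n", simGIC("g1", "g2"));
    }
}
```

The Jiang and Conrath measure is returned as a distance because tools differ in how they turn it into a similarity; the graph-based measures work on whole genes, while the term-based ones compare single terms.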
Table 1.

Time, in minutes, required for calculating semantic similarities for a few model organisms

Organism      Number of GO terms   Number of annotated genes   Time term-wise   Time gene-wise
Arabidopsis   6610                 9703                        3 m 48 s         43 m 35 s
Rat           9422                 5270                        58 m 19 s        29 m 54 s
Mouse         12961                15020                       24 m 35 s        689 m 26 s
Fly           7304                 8235                        4 m 56 s         47 m 46 s
Yeast         7077                 4898                        4 m 0 s          23 m 55 s
Worm          4467                 4370                        1 m 29 s         5 m 1 s

Note: For each organism, the table gives the number of unique GO terms appearing in the GO annotation, the number of annotated genes, and the time (in minutes and seconds) required for calculating the Resnik semantic similarity, including the Random Walk Contribution, term-wise and gene-wise. Calculations used the GO experimental evidence codes (EXP, IDA, IPI, IMP, IGI, IEP, TAS) and the is_a and part_of GO relationships. Data were downloaded in February 2014. Experiments were run on an AMD Opteron 6128 HE.
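Table 1 distinguishes term-wise from gene-wise calculations. Because a gene is usually annotated to several GO terms, a gene-wise value has to combine term-wise similarities in some way. The Java sketch below shows two common combination strategies, the best-match average and the maximum, applied to a precomputed term-by-term matrix. Which strategy GOssTo uses is not stated in this note, so both strategies and the class name GeneWise are assumptions for illustration.

```java
import java.util.List;

// Sketch: turning a precomputed term-by-term similarity matrix into a gene-by-gene value.
// The combination strategies and class name are illustrative assumptions, not GOssTo's API.
class GeneWise {
    /** Best-match average: mean, over both genes' terms, of each term's best match in the other gene. */
    static double bestMatchAverage(double[][] termSim, List<Integer> geneA, List<Integer> geneB) {
        if (geneA.isEmpty() || geneB.isEmpty()) return 0.0;
        double total = 0.0;
        for (int a : geneA) total += bestMatch(termSim, a, geneB);
        for (int b : geneB) total += bestMatch(termSim, b, geneA);
        return total / (geneA.size() + geneB.size());
    }

    /** Maximum similarity over all term pairs (assumes similarities are non-negative). */
    static double maximum(double[][] termSim, List<Integer> geneA, List<Integer> geneB) {
        double best = 0.0;
        for (int a : geneA) best = Math.max(best, bestMatch(termSim, a, geneB));
        return best;
    }

    // Highest similarity between term t and any term in others (matrix assumed symmetric).
    private static double bestMatch(double[][] termSim, int t, List<Integer> others) {
        double best = Double.NEGATIVE_INFINITY;
        for (int o : others) best = Math.max(best, termSim[t][o]);
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical 3-term matrix; gene A is annotated to terms 0 and 1, gene B to term 2.
        double[][] s = {
            {1.0, 0.2, 0.6},
            {0.2, 1.0, 0.4},
            {0.6, 0.4, 1.0}};
        System.out.println("BMA = " + bestMatchAverage(s, List.of(0, 1), List.of(2)));
        System.out.println("max = " + maximum(s, List.of(0, 1), List.of(2)));
    }
}
```

Computing the term-wise matrix once and reusing it for every gene pair is one plausible reason why gene-wise times in Table 1 exceed term-wise ones, although the note itself does not break the timings down further.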

GOssTo is also available online through a clean web interface on our server at www.paccanarolab.org/gosstoweb. GOssToWeb provides the same functionality as the stand-alone application and allows extensive configuration of the experiments through a user-friendly web form. The user can select the GO evidence codes, the GO relationships and a genome from the list of organisms available in UniProt-GOA. GOssToWeb automatically fetches the most recent version of the functional annotation from UniProt-GOA and of the GO from its official repository, ensuring that the most up-to-date data are used. Results are provided on a page to which the user is redirected and from which they can be downloaded; the system can also notify the user by email with a link to this page.
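For users of the stand-alone application who want to reproduce the kind of evidence-code selection offered by GOssToWeb, the Java sketch below filters an annotation file in the tab-separated GAF 2.x format used by UniProt-GOA (column 2 holds the gene product identifier, column 4 the qualifier, column 5 the GO ID and column 7 the evidence code). It is an illustrative helper, not part of GOssTo: the class name is hypothetical, the default codes are those listed in the note to Table 1, and dropping annotations qualified with NOT is an assumption.

```java
import java.util.*;

// Sketch: build gene -> GO term sets from GAF 2.x lines, keeping only selected evidence codes.
// Not GOssTo's parser; expects an uncompressed GAF file (GOA releases are distributed gzipped).
class GafFilter {
    // Evidence codes listed in the note to Table 1.
    static final Set<String> EXPERIMENTAL = Set.of("EXP", "IDA", "IPI", "IMP", "IGI", "IEP", "TAS");

    static Map<String, Set<String>> filter(Iterable<String> lines, Set<String> accepted) {
        Map<String, Set<String>> byGene = new TreeMap<>();
        for (String line : lines) {
            if (line.isEmpty() || line.startsWith("!")) continue;   // header and comment lines
            String[] cols = line.split("\t", -1);                    // -1 keeps empty columns
            if (cols.length < 7) continue;                           // not an annotation line
            // Column 4 is the qualifier; a NOT qualifier negates the annotation, so skip it.
            boolean negated = Arrays.asList(cols[3].split("\\|")).contains("NOT");
            if (negated || !accepted.contains(cols[6])) continue;   // column 7: evidence code
            // Column 2: DB object ID; column 5: GO ID.
            byGene.computeIfAbsent(cols[1], id -> new TreeSet<>()).add(cols[4]);
        }
        return byGene;
    }

    public static void main(String[] args) throws java.io.IOException {
        List<String> lines = java.nio.file.Files.readAllLines(java.nio.file.Paths.get(args[0]));
        filter(lines, EXPERIMENTAL).forEach((gene, terms) -> System.out.println(gene + "\t" + terms));
    }
}
```

Passing a different set to filter, for example one that also contains IEA, changes which annotations enter the similarity calculation, mirroring the evidence-code choice in the web form.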

2 METHODS

The downloadable version of GOssTo is bundled with six commonly used semantic similarity measures: the term-based measures by Resnik (1999), Lin (1998), Jiang and Conrath (1997) and GraSM (Couto and Silva, 2005), and the graph-based measures simUI and simGIC (Pesquita et al., 2008). All of these measures can be extended with the Random Walk Contribution of Yang et al. (2012). GOssTo was designed to be fast and flexible. This resulted in a highly modular architecture with very low coupling between individual modules, which can be removed or replaced without affecting the overall behaviour of the system. The user can interact with GOssTo either through a command-line interface or through an API. The command-line interface provides UNIX-like console parameter options as well as an interactive menu; the API offers access to all the functionalities in the different modules through a set of well-defined functions. Thus, GOssTo can be used in three different ways: as part of a larger data-processing pipeline, as a stand-alone application, or as a static library for existing software. For easy processing of the results, all output is written to structured plain-text files. GOssTo includes a powerful extension mechanism for adding new semantic similarity measures. A well-defined interface grants the user access to the data structures upon which new measures can be developed. After a new measure has been independently compiled, it can be dynamically linked to GOssTo’s application core and then used in the same way as the measures bundled with GOssTo. The current version of GOssTo focuses on traditional semantic similarity measures, which rely mostly on the GO structure. Future versions will include the possibility of handling Description Logic axioms, which are being added to existing ontologies (Ferreira et al., 2013). GOssTo was developed using the Java programming language.
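The extension mechanism described above follows a common Java pattern: a separately compiled plug-in class implements a known interface and is instantiated by name at runtime. The sketch below illustrates only that general pattern. GOssTo's actual extension interface is not specified in this note, so the SimilarityMeasure interface, the CosineMeasure example and the MeasureLoader class are hypothetical; only standard Java reflection (Class.forName and getDeclaredConstructor().newInstance()) is used.

```java
import java.util.Set;

// Hypothetical plug-in contract for a gene-level similarity measure
// (illustrative only; this is not GOssTo's actual extension interface).
interface SimilarityMeasure {
    String name();

    /** Similarity of two genes, each represented by its set of GO term IDs. */
    double similarity(Set<String> termsA, Set<String> termsB);
}

// Example user-written measure: cosine similarity of binary annotation vectors.
class CosineMeasure implements SimilarityMeasure {
    public String name() { return "cosine"; }

    public double similarity(Set<String> a, Set<String> b) {
        if (a.isEmpty() || b.isEmpty()) return 0.0;
        long shared = a.stream().filter(b::contains).count();
        return shared / Math.sqrt((double) a.size() * b.size());
    }
}

// Instantiates a measure by class name at runtime through the given class loader.
class MeasureLoader {
    static SimilarityMeasure load(String className, ClassLoader loader) {
        try {
            Class<?> cls = Class.forName(className, true, loader);
            if (!SimilarityMeasure.class.isAssignableFrom(cls)) {
                throw new IllegalArgumentException(className + " does not implement SimilarityMeasure");
            }
            return (SimilarityMeasure) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // Here the plug-in lives on the application class path; an external JAR could be
        // reached with a java.net.URLClassLoader instead.
        SimilarityMeasure m = load(CosineMeasure.class.getName(), MeasureLoader.class.getClassLoader());
        System.out.println(m.name() + ": "
                + m.similarity(Set.of("GO:1", "GO:2", "GO:3"), Set.of("GO:2", "GO:3", "GO:4")));
    }
}
```

Taking the ClassLoader as a parameter keeps the core independent of where a plug-in was compiled: for a measure shipped in its own JAR, a URLClassLoader pointing at that JAR can be passed instead of the application class loader.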
The JAMA package provides the internal data types and the required mathematical routines. GOssTo’s source code is freely available from GitHub at https://github.com/pwac092/gossto and is released under the GPLv3 license. GOssTo runs on multiple platforms, and we have extensively tested it on both GNU/Linux and Windows. More information about GOssTo, including a comprehensive manual, is available from www.paccanarolab.org/gossto.

Funding: Biotechnology and Biological Sciences Research Council (BBSRC) (grant number BB/K004131/1 to A.P.); PASCAL2 Network of Excellence (EC grant number 216886 to G.V.).

Conflicts of Interest: none declared.
REFERENCES

1. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors: M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal: Nat Genet; Date: 2000-05

2. Exploiting disjointness axioms to improve semantic similarity measures.

Authors: João D Ferreira; Janna Hastings; Francisco M Couto
Journal: Bioinformatics; Date: 2013-09-03

3. Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty.

Authors: Haixuan Yang; Tamás Nepusz; Alberto Paccanaro
Journal: Bioinformatics; Date: 2012-04-19

4. Information content-based gene ontology semantic similarity approaches: toward a unified framework theory.

Authors: Gaston K Mazandu; Nicola J Mulder
Journal: Biomed Res Int; Date: 2013-09-02

5. G-SESAME: web tools for GO-term-based gene similarity analysis and knowledge discovery.

Authors: Zhidian Du; Lin Li; Chin-Fu Chen; Philip S Yu; James Z Wang
Journal: Nucleic Acids Res; Date: 2009-06-02

6. Metrics for GO based protein semantic similarity: a systematic evaluation.

Authors: Catia Pesquita; Daniel Faria; Hugo Bastos; António E N Ferreira; André O Falcão; Francisco M Couto
Journal: BMC Bioinformatics; Date: 2008-04-29
