SimCT: a generic tool to visualize ontology-based relationships for biological objects.

Carl Herrmann, Sèverine Bérard, Laurent Tichit.

Abstract

We present SimCT, a web-based service that graphically displays the relationships between biological objects (e.g. genes or proteins) based on their annotations to a biomedical ontology. The result is presented as a tree of these objects, which can be viewed and explored through a dedicated Java applet designed to highlight relevant features. Unlike the numerous tools that search for overrepresented terms, SimCT draws a simplified representation of the biological terms present in the set of objects, and can be applied to any ontology for which annotation data are available. Being web-based, it requires no prior installation and provides an intuitive, easy-to-use service. AVAILABILITY: http://tagc.univ-mrs.fr/SimCT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year:  2009        PMID: 19776214      PMCID: PMC2778334          DOI: 10.1093/bioinformatics/btp553

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

The wealth of data available from large-scale experiments in recent years has made it crucial to develop efficient tools to visualize, analyze, interpret and share post-genomic data. Among these, biomedical ontologies have been increasingly developed to annotate genome-wide features, and numerous fields of biology are now covered by a dedicated ontology. While Gene Ontology (GO) was the pioneering project, many other projects are being actively pursued (e.g. Robinson et al., 2008; see http://www.obofoundry.org/ for available biomedical ontologies), and their adoption by genome databases such as MGI, Wormbase and Flybase will ensure that they are increasingly used in the community.

In this note, we present a generic, web-based tool called SimCT (Similarity Clustering Tool), which visualizes the relations between biological objects (e.g. genes, proteins, etc.) based on their annotations to an ontology, in the form of a clustering tree. Our clustering procedure turns the ontology into a simplified tree (a subgraph of the ontology) that better represents the terms associated with a list of objects, thereby highlighting their relationships. This representation could be obtained neither by mapping the annotations onto the ontology, owing to its complexity, nor by searching for overrepresented terms, which by definition overlooks terms that are not statistically significant. The visualization is done using a dedicated Java applet. Although many tools have been developed for GO, very few comparable tools exist yet for other biomedical ontologies.

2 METHODS

To measure the specificity of a term t in an ontology O, we have introduced the notion of precision, p(t) (see Supplementary Material for details, in particular the glossary for definitions of the terms used). Precision is computed from Nd(t), the number of descendant terms of t; Na(t), the number of ancestor terms of t; N, the total number of terms in O; and Nmax, the maximal number of ancestors a term can have in O. Interestingly, our definition of precision depends only on the structure of the ontology and not on annotation statistics, unlike the measure of Lord et al. (2003). Therefore, it can be applied to any existing ontology. Additionally, precision differs from information content, which gives equal specificity to all leaves of the ontology (Resnik, 1999; Schlicker et al., 2006; Wang et al., 2007). Based on precision, we define the similarity of two terms as the precision of their most precise common ancestor.

Given a list of objects annotated to an ontology, we consider the set of (object|ontology term) pairs. If an object has several annotations, it generates several (object|ontology term) pairs. We have implemented an aggregative clustering algorithm that builds the clustering tree based on the similarity between terms. The leaves of the resulting tree are the (object|ontology term) pairs and the internal nodes are ontology terms. We attach to each internal node a numerical index called the Subtree Relevance Index (SRI). For a subtree T, the SRI is computed from p(t), the precision of the ontology term t attached to T, and N(T), the number of leaves of the subtree. It measures the relevance of each term for the list of objects submitted (Supplementary Material). The topology of the tree respects that of the underlying ontology (i.e. it is included in the directed acyclic graph (DAG) of the ontology).
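As an illustration of the definitions above, the following Python sketch computes the structural quantities that enter the precision (numbers of descendants and ancestors of a term, total number of terms, maximal number of ancestors) and the similarity of two terms as the precision of their most precise common ancestor. The toy ontology, the term names and the precision values are hypothetical placeholders; the precision formula itself is not reproduced here, so precision is passed in as a plain mapping. Treating a term as its own ancestor when looking for common ancestors is also an assumption of this sketch.

```python
from typing import Dict, Mapping, Set, Tuple

# Toy ontology given as term -> set of direct parents (hypothetical terms).
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "AB": {"A", "B"},  # two parents, as a DAG allows
}

# Placeholder precision values, NOT computed with the published formula.
PRECISION: Dict[str, float] = {
    "root": 0.0, "A": 0.4, "B": 0.3, "A1": 0.9, "A2": 0.8, "AB": 0.7,
}


def ancestors(term: str, parents: Mapping[str, Set[str]]) -> Set[str]:
    """Return all proper ancestors of `term` (the term itself is excluded)."""
    seen: Set[str] = set()
    stack = list(parents[term])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def descendants(term: str, parents: Mapping[str, Set[str]]) -> Set[str]:
    """Return all terms that have `term` among their ancestors."""
    return {t for t in parents if term in ancestors(t, parents)}


def structural_counts(
    term: str, parents: Mapping[str, Set[str]]
) -> Tuple[int, int, int, int]:
    """Return (descendants of term, ancestors of term, N, Nmax)."""
    n_max = max(len(ancestors(t, parents)) for t in parents)
    return (
        len(descendants(term, parents)),
        len(ancestors(term, parents)),
        len(parents),
        n_max,
    )


def most_precise_common_ancestor(
    t1: str,
    t2: str,
    parents: Mapping[str, Set[str]],
    precision: Mapping[str, float],
) -> str:
    """Pick the common ancestor with the highest precision.

    Assumption: each term counts as its own ancestor, so that the
    similarity of a term with itself equals its own precision.
    """
    common = (ancestors(t1, parents) | {t1}) & (ancestors(t2, parents) | {t2})
    # Sorting first makes tie-breaking deterministic (alphabetical).
    return max(sorted(common), key=lambda t: precision[t])


def similarity(
    t1: str,
    t2: str,
    parents: Mapping[str, Set[str]],
    precision: Mapping[str, float],
) -> float:
    """Similarity = precision of the most precise common ancestor."""
    return precision[most_precise_common_ancestor(t1, t2, parents, precision)]


if __name__ == "__main__":
    print(structural_counts("A", PARENTS))
    print(similarity("A1", "A2", PARENTS, PRECISION))
    print(similarity("A1", "B", PARENTS, PRECISION))
```

In this toy graph, A1 and A2 share the ancestors A and root, so their similarity is the placeholder precision of A, whereas A1 and B only share the root.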

3 IMPLEMENTATION

SimCT can be used in two different ways, depending on the ontology the user is interested in:

With GO, the user can input a list of genes/proteins, select the corresponding organism (29 are currently available) and the GO sub-ontologies. The system retrieves the available annotations.

For other ontologies, the user must provide a two-column list of objects and their annotations, and select the corresponding ontology among the 25 currently available. The user can also provide custom GO annotations.

In both cases, the (object|ontology term) list is processed by the clustering algorithm. Once this is done, the user can open a Java applet that displays the tree(s). As an example, the clustering of 300 leaves takes ∼40 s.
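To make the two-column input more concrete, here is a minimal Python sketch that turns such a list into (object|ontology term) pairs. The tab separator, the function name `parse_annotations` and the identifiers in the example are assumptions made for illustration; the exact file format accepted by the SimCT server is not specified in this note.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (object, ontology term)


def parse_annotations(text: str) -> List[Pair]:
    """Parse a two-column list into (object, ontology term) pairs.

    Assumptions of this sketch: one pair per line, columns separated
    by a tab, blank lines ignored, exact duplicate pairs kept once.
    An object with several annotations yields several pairs.
    """
    pairs: List[Pair] = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 columns, got {len(fields)}"
            )
        pair = (fields[0].strip(), fields[1].strip())
        if pair not in pairs:
            pairs.append(pair)
    return pairs


if __name__ == "__main__":
    # Hypothetical object and term identifiers.
    example = "geneA\tTERM:0001\ngeneA\tTERM:0002\n\ngeneB\tTERM:0001\n"
    for obj, term in parse_annotations(example):
        print(f"({obj}|{term})")
```

Here geneA carries two annotations and therefore contributes two leaves to the clustering input, as described in the Methods.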

4 APPLICATIONS

4.1 Disease ontology

To illustrate the use of SimCT, we extracted from http://www.genome.gov/gwastudies/ a list of 79 single nucleotide polymorphisms (SNPs) associated with a disease, described using the Disease Ontology (http://diseaseontology.sourceforge.net/). Figure 1 shows the resulting tree, in which two subtrees are highlighted: Diabetes Mellitus and Noninfectious enteritis and colitis. Taking the intersection of both subtrees through the applet menu reveals that three SNPs are associated with both diseases (rs3024505, rs2542151, rs2476601). Interestingly, the latter two are located close to or inside the genes PTPN2 and PTPN22, respectively. Inspection of the OMIM entries related to both genes shows that only PTPN2 is explicitly associated with both diabetes and enteritis, while PTPN22 is associated only with diabetes. Thus, our result suggests that an additional annotation, namely enteritis, could be added to PTPN22.
Fig. 1.

A clustering tree of SNPs associated to diseases. Highlighted are two subtrees, Diabetes Mellitus (top) and Noninfectious enteritis and colitis (bottom).
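The subtree intersection used in this example can be pictured as a set operation on the leaves of the two subtrees. The Python sketch below assumes that the intersection is taken on object identifiers, i.e. an object is shared when it appears as a leaf in both subtrees under any term; how the applet implements this internally is not described here, and all identifiers in the example are hypothetical.

```python
from typing import Iterable, Set, Tuple

Leaf = Tuple[str, str]  # (object, ontology term)


def objects_in(leaves: Iterable[Leaf]) -> Set[str]:
    """Collect the distinct objects found among the leaves of a subtree."""
    return {obj for obj, _term in leaves}


def shared_objects(subtree_a: Iterable[Leaf], subtree_b: Iterable[Leaf]) -> Set[str]:
    """Objects annotated in both subtrees (assumed meaning of 'intersection')."""
    return objects_in(subtree_a) & objects_in(subtree_b)


if __name__ == "__main__":
    # Hypothetical leaves of two highlighted subtrees.
    subtree_a = [("obj1", "termX"), ("obj2", "termX"), ("obj3", "termY")]
    subtree_b = [("obj2", "termZ"), ("obj3", "termZ"), ("obj4", "termZ")]
    print(sorted(shared_objects(subtree_a, subtree_b)))
```

Because an object with several annotations appears as several leaves, comparing objects rather than whole (object|ontology term) pairs is what allows the same SNP to be found under two different disease terms.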

4.2 Gene Ontology

We chose a set of 69 coregulated genes extracted from TranscriptomeBrowser (Lopez et al., 2008) around the natural killer (NK) gene NCR3 in human. We compared the P-values of the nodes with SRI ≥ 2.5 with the P-values given by DAVID (Dennis et al., 2003) and GO::TermFinder (Boyle et al., 2004). The differences between the SimCT approach and the search for overrepresented terms are highlighted in Supplementary Table S2. In particular, although DAVID finds no overrepresented term related to biopolymer synthesis, SimCT detects that five genes of the list are related to transcriptional (TBX21, CEBPD, TAF6L, GFI1) or translational (EIF5B, RPS8) processes, which are child terms of biopolymer synthesis. These are the effectors at the end of the cascade of NK activity, leading for instance to the production of gamma-interferon.

5 CONCLUSION

Our approach can be compared with GoSurfer (Zhong et al., 2004), GOTreePlus (Lee et al., 2008) or GO::TermFinder (Boyle et al., 2004). However, SimCT can also work with biomedical ontologies other than GO, and is web-based. It therefore provides an intuitive, easy-to-use and immediately available service that draws a clear picture of the ontological terms represented in a list of biological objects annotated to an ontology, for any biomedical ontology. The viewer applet makes it easy to explore and annotate the resulting tree and to highlight its most relevant features. As more and more ontologies are being developed, we believe that this tool will prove very useful for working with them.
REFERENCES

1. DAVID: Database for Annotation, Visualization, and Integrated Discovery.
Authors: Glynn Dennis; Brad T Sherman; Douglas A Hosack; Jun Yang; Wei Gao; H Clifford Lane; Richard A Lempicki
Journal: Genome Biol. Date: 2003-04-03

2. Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation.
Authors: P W Lord; R D Stevens; A Brass; C A Goble
Journal: Bioinformatics. Date: 2003-07-01

3. GO::TermFinder--open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes.
Authors: Elizabeth I Boyle; Shuai Weng; Jeremy Gollub; Heng Jin; David Botstein; J Michael Cherry; Gavin Sherlock
Journal: Bioinformatics. Date: 2004-08-05

4. GoSurfer: a graphical interactive tool for comparative analysis of large gene sets in Gene Ontology space.
Authors: Sheng Zhong; Kai-Florian Storch; Ovidiu Lipan; Ming-Chih J Kao; Charles J Weitz; Wing H Wong
Journal: Appl Bioinformatics. Date: 2004

5. GOTreePlus: an interactive gene ontology browser.
Authors: Bongshin Lee; Kristy Brown; Yetrib Hathout; Jinwook Seo
Journal: Bioinformatics. Date: 2008-02-21

6. A new method to measure the semantic similarity of GO terms.
Authors: James Z Wang; Zhidian Du; Rapeeporn Payattakool; Philip S Yu; Chin-Fu Chen
Journal: Bioinformatics. Date: 2007-03-07

7. The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease.
Authors: Peter N Robinson; Sebastian Köhler; Sebastian Bauer; Dominik Seelow; Denise Horn; Stefan Mundlos
Journal: Am J Hum Genet. Date: 2008-10-23

8. A new measure for functional similarity of gene products based on Gene Ontology.
Authors: Andreas Schlicker; Francisco S Domingues; Jörg Rahnenführer; Thomas Lengauer
Journal: BMC Bioinformatics. Date: 2006-06-15

9. TranscriptomeBrowser: a powerful and flexible toolbox to explore productively the transcriptional landscape of the Gene Expression Omnibus database.
Authors: Fabrice Lopez; Julien Textoris; Aurélie Bergon; Gilles Didier; Elisabeth Remy; Samuel Granjeaud; Jean Imbert; Catherine Nguyen; Denis Puthier
Journal: PLoS One. Date: 2008-12-23
