Kieran O'Neill, Alexander Garcia, Anita Schwegmann, Rafael C Jimenez, Dan Jacobson, Henning Hermjakob.
Abstract
BACKGROUND: Ontologies such as the Gene Ontology can enable the construction of complex queries over biological information in a conceptual way; however, existing systems for doing this are too technical. Within the biological domain there is an increasing need for software that facilitates the flexible retrieval of information. OntoDas aims to fulfil this need by allowing queries to be defined by selecting valid ontology terms.
Year: 2008 PMID: 18925933 PMCID: PMC2579441 DOI: 10.1186/1471-2105-9-437
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Comparison of ontology-based query interfaces. This table compares OntoDas with existing systems designed to facilitate ontology-based queries, across seven criteria.
| 1. Problem domains: | Gene Ontology (25 K terms) to multi-species gene DB (2 M entries) | Gene Ontology (25 K terms), individual species gene DBs (30 K entries each) | Gene Ontology (25 K terms), several others (5 K each); rat gene and QTL data (9 K entries) | EMTREE ontology (50 K terms), custom literature DB (10 M entries) | image metadata thesauri (4 K terms each); image databases (35 K entries each) | Gene Ontology (25 K terms) to multi-species gene DB (2 M entries) |
| 2. Types of queries: | single term only; narrowing by evidence code, species. | multiple terms, only one term, evidence code per ontology. Only one species per query | AND, OR, NOT over several ontologies; no species/evidence code narrowing | AND queries of any number of terms; no narrowing criteria | AND queries across any number of terms; only one term per orthogonal ontology; supplementation with keyword search | AND queries across any number of terms |
| 3. Initial term finding: | forms, tree navigation | QuickGO browse, search, but no tie-in with MartView interface | no support | Form with intelligent term suggestion; tree navigation | keyword search, dynamic tree navigation | not yet implemented |
| 4. Combination finding: | no support | no support | no support | valid combinations with first term shown, but limited support for 3 or more | all valid combinations with current query displayed, also size of result set (query previewing) | extensive; all valid combinations displayed, as well as size of result set (query previewing) |
| 5. Display of results: | paged table, links to detail on each query, links to external information. | simplistic but configurable to be richer; spreadsheet export | highly visual SVG; no table but proprietary spreadsheet export | interactive, visual cluster map; problems with scalability | page-able table, links to detail on each entry, ability to construct new queries from annotations of entry | table with links out; paging and CSV download not yet implemented |
| 6. User involvement: | minimal, though possibly via mailing list | no evidence of any | no evidence of any | usability evaluation post-development | multiple cycles of testing, re-development, evaluated against a baseline | extensive participatory design throughout the life cycle |
| 7. Technologies: | web-based: Perl, MySQL | web-based: Perl, BioMart | web-based: Java JSP and Oracle 9i PL/SQL | Desktop-based: Java/Swing, ClusterMap, Sesame RDF store | Web-based: Python WebWare, MySQL, Java/Lucene optional | Web-based: Ajax (MochiKit and others), Python TurboGears, MySQL, web services |
Figure 1. Substitute term view. The "substitute term" panel for the term GTPase activator activity in a complex query. Presented are the parents, siblings and lexical neighbours of the term, as well as the size of the queries which could be created by selecting any of these as a substitute for the term in focus. Substitution with guanyl-nucleotide exchange factor activity returns 7 gene products.
Figure 2. OntoDas Architecture. OntoDas has a three-tier architecture: The display tier consists of Ajax running within a web browser, being used by a researcher. The "business" tier runs on an Apache Tomcat web server, and provides query execution as well as DAS proxying. The final tier consists of external systems: The Ontology Lookup Service as well as various DAS services are accessed remotely via HTTP. The GO MySQL database is intended to be installed on a MySQL server on a local network with the Tomcat server.
Figure 3. Summary of the biological scenario workflow. The biological scenario detailed in the paper is illustrated here. This workflow represents just one of many paths that a biologist could take when using OntoDas.
Figure 4. Overview for a single-term query. The OntoDas interface, showing a query involving just one term, membrane fusion. The query is quite general, returning 262 gene products. At this stage, the researcher is interested in narrowing the result set down using terms related to GTPase regulation.
Figure 5. Combinable terms panel. The "combinable terms" panel of the membrane fusion query. The biologist has used the "group alphabetically" feature to look for terms beginning with "G". Clicking the book icon next to the terms GTPase activator activity and guanyl-nucleotide exchange factor activity brings up details for both terms. The definitions and synonyms appear in a pop-up window and give the researcher additional guidance as to which term best represents the concepts they are interested in.
Figure 6. Dasty2 additions. Dasty2, showing the "Ontology Annotations" panel, which forms part of OntoDas. Shown here is the Dasty2 view for the protein encoded by the gene Vam6. Each ontology term used to annotate the given protein is displayed. Check boxes enable a combination of terms to be selected to construct a new ontology-term based query, the results of which are previewed below. Two terms have been selected for the final query.
Figure 7. Results of a focused query. The overall OntoDas view, showing the narrowed down result set of just one gene product.
Figure 8. Pseudocode showing how OntoDas computes combinable terms. As described in the text, OntoDas works backwards from the result set of the current query to determine all of the terms which can be combined with it to produce a query with a non-empty result set. Ancestors of the query terms themselves are excluded, since adding them to the query would be redundant.
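The pseudocode itself is not reproduced in this record. The following is a minimal Python sketch of the idea as the caption describes it, not the OntoDas implementation. It assumes in-memory dictionaries rather than the GO MySQL database, and it assumes that an annotation to a term also counts as an annotation to all of that term's ancestors. The function names, the gene product identifiers (`gpA` to `gpD`) and the small term hierarchy are all invented for illustration and are not taken from the real Gene Ontology graph.

```python
from collections import defaultdict


def ancestors(term, parents):
    """Return every proper ancestor of `term` in the DAG described by `parents`."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def expand(terms, parents):
    """Return `terms` together with all of their ancestors."""
    closed = set(terms)
    for term in terms:
        closed |= ancestors(term, parents)
    return closed


def run_query(query_terms, annotations, parents):
    """AND query: gene products annotated (directly or via a descendant) to every term."""
    wanted = set(query_terms)
    return {gp for gp, terms in annotations.items()
            if wanted <= expand(terms, parents)}


def combinable_terms(query_terms, annotations, parents):
    """Work backwards from the result set to the terms that keep it non-empty.

    Returns {term: size of the result set if `term` were added to the query}.
    Query terms and their ancestors are skipped because adding them is redundant.
    """
    results = run_query(query_terms, annotations, parents)
    excluded = expand(query_terms, parents)
    counts = defaultdict(int)
    for gp in results:
        for term in expand(annotations[gp], parents) - excluded:
            counts[term] += 1
    return dict(counts)


# Toy data (hypothetical): child term -> list of parent terms.
parents = {
    "GTPase activator activity": ["GTPase regulator activity"],
    "guanyl-nucleotide exchange factor activity": ["GTPase regulator activity"],
    "vesicle fusion": ["membrane fusion"],
}
# Toy data (hypothetical): gene product -> directly annotated terms.
annotations = {
    "gpA": ["vesicle fusion", "GTPase activator activity"],
    "gpB": ["membrane fusion", "guanyl-nucleotide exchange factor activity"],
    "gpC": ["membrane fusion"],
    "gpD": ["GTPase activator activity"],
}

preview = combinable_terms(["membrane fusion"], annotations, parents)
for term, size in sorted(preview.items()):
    print(f"{term}: {size}")
```

Because each count is taken over the current result set, it equals the size of the narrowed query, which is the "query previewing" figure listed for OntoDas under criterion 4 of the comparison table. A term that no gene product in the result set carries never enters the dictionary, so every term offered yields a non-empty result.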