Annotare--a tool for annotating high-throughput biomedical investigations and resulting data.

Ravi Shankar, Helen Parkinson, Tony Burdett, Emma Hastings, Junmin Liu, Michael Miller, Rashmi Srinivasa, Joseph White, Alvis Brazma, Gavin Sherlock, Christian J Stoeckert, Catherine A Ball.

Abstract

SUMMARY: Computational methods in molecular biology will increasingly depend on standards-based annotations that describe biological experiments in an unambiguous manner. Annotare is a software tool that enables biologists to easily annotate their high-throughput experiments, biomaterials and data in a standards-compliant way that facilitates meaningful search and analysis.
AVAILABILITY AND IMPLEMENTATION: Annotare is available from http://code.google.com/p/annotare/ under the terms of the open-source MIT License (http://www.opensource.org/licenses/mit-license.php). It has been tested on both Mac and Windows.

Year: 2010 PMID: 20733062 PMCID: PMC2944206 DOI: 10.1093/bioinformatics/btq462

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

Metadata describing high-throughput investigations enable unambiguous interpretation of experiments, experiment reproducibility and meaningful searching and analysis of the resulting data. The microarray community has developed MAGE-TAB (Rayner et al., 2006), an annotation format for microarray data. MAGE-TAB allows laboratories to manage, exchange and publish well-annotated biomedical data using a spreadsheet-based paradigm. Several public repositories and analysis tools for microarray data, such as ArrayExpress (Parkinson et al., 2009), Stanford Microarray Database (SMD) (Hubble et al., 2009), MeV (Saeed et al., 2006), Bioconductor (Gentleman et al., 2004) and caArray (Klemm et al., 2010), support microarray data submissions with MAGE-TAB annotations, and open-source tools are available for converting legacy formats into MAGE-TAB (Rayner et al., 2009). To improve the volume, quality and granularity of annotations, there is a compelling need for software that enables biologists to easily annotate such data. We describe Annotare, a tool that facilitates annotation of gene expression data in MAGE-TAB format. Annotare is available under the terms of the MIT License at http://code.google.com/p/annotare/.

2 SOFTWARE COMPONENTS

Annotare is a stand-alone desktop application that features (i) a set of intuitive editor forms to create and modify annotations; (ii) support for easy incorporation of terms from biomedical ontologies; (iii) standard templates for common experiment types; (iv) a design wizard to help create a new document; and (v) a validator that checks for syntactic and semantic violations (Fig. 1).

Fig. 1.

Annotare software components. Rectangles represent the various components and the ovals represent the resources that these components consume.

The front-end graphical user interface (GUI) uses Adobe AIR. This enables Annotare to run on multiple operating systems, and also sets the stage for future work to translate the desktop version to the web (see Section 3). Backend modules are built using Java, and the data communication between Adobe AIR and Java modules is supported by the Merapi messaging technology. Annotare has been tested on Windows XP2 and Mac OS (10.5 or greater).

2.1 Annotations editor

Annotare has a set of easy-to-use GUIs to view and modify annotations for an experiment. Using the forms, users can record details such as author's contact information, experimental design, protocols used, publications and relationships between biological materials used and data produced. The GUI hides the syntactic complexity of MAGE-TAB as much as possible. A spreadsheet edit-and-view paradigm allows annotation of the relationships between biomaterials and data. A column designer complements the spreadsheet functionality by grouping relevant MAGE-TAB column options together, facilitating the addition or deletion of columns, while obviating the need to know the correct column ordering.
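The idea behind the column designer can be illustrated with a minimal sketch. This is not Annotare's code: the function name order_columns, the NODE_ORDER list and the simplified rule (each node column followed directly by its attribute columns) are assumptions made for illustration, and the column headings reflect a simplified reading of the MAGE-TAB sample and data relationship sheet.

```python
# Hypothetical sketch of a column designer; not Annotare's actual code.
# Node columns in the order they usually appear in a MAGE-TAB SDRF sheet
# (simplified; the full specification defines more column types).
NODE_ORDER = [
    "Source Name",
    "Extract Name",
    "Labeled Extract Name",
    "Hybridization Name",
    "Array Data File",
]


def order_columns(selected):
    """Arrange node columns in NODE_ORDER, each followed by its attributes.

    `selected` maps a node column to the attribute columns the user picked
    for it, e.g. {"Source Name": ["Characteristics[organism]"]}.
    """
    unknown = [node for node in selected if node not in NODE_ORDER]
    if unknown:
        raise ValueError(f"unknown node column(s): {unknown}")
    ordered = []
    for node in NODE_ORDER:
        if node in selected:
            ordered.append(node)
            ordered.extend(selected[node])
    return ordered


if __name__ == "__main__":
    # The user picks columns in any order; the designer fixes the layout.
    picked = {
        "Hybridization Name": ["Array Design REF"],
        "Source Name": ["Characteristics[organism]"],
        "Labeled Extract Name": ["Label"],
    }
    print("\t".join(order_columns(picked)))
```

Grouping attributes under their node in the input is what lets the user add or delete a column without thinking about where it belongs in the sheet.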

2.2 Ontology support

The most challenging part of creating MAGE-TAB annotations can be using the correct terms from appropriate biomedical ontologies to describe an experiment in an unambiguous fashion. Examples of information that use controlled vocabularies include experimental design, experimental factor types, protocol types and sample characteristics. To support the use of controlled vocabularies, Annotare includes the Experimental Factor Ontology (Malone et al., 2010) and offers an auto-complete function that suggests matching ontology terms as the user types. Annotare also supports an ontology widget that is enabled with ontology look-up services of the NCBO Bioportal (http://bioportal.bioontology.org/). The widget allows users to search for and use appropriate terms from many ontologies, such as the MGED Ontology (Whetzel et al., 2006).
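A prefix-based auto-complete over term labels can be sketched as follows. This is an illustration only: the TermCompleter class is hypothetical, the labels are placeholder examples rather than terms read from an ontology file, and the sketch does not call the NCBO Bioportal services.

```python
import bisect


class TermCompleter:
    """Case-insensitive prefix auto-complete over ontology term labels."""

    def __init__(self, labels):
        # Sort (lowercased key, original label) pairs once so that every
        # lookup can start from a binary search.
        self._entries = sorted((label.lower(), label) for label in labels)
        self._keys = [key for key, _ in self._entries]

    def complete(self, prefix, limit=5):
        """Return up to `limit` labels that start with `prefix`."""
        key = prefix.lower()
        start = bisect.bisect_left(self._keys, key)
        matches = []
        for lowered, label in self._entries[start:]:
            if not lowered.startswith(key) or len(matches) >= limit:
                break
            matches.append(label)
        return matches


if __name__ == "__main__":
    # Placeholder labels chosen for illustration.
    completer = TermCompleter(
        ["time", "temperature", "dose", "compound", "genotype", "growth condition"]
    )
    print(completer.complete("g"))
```

Because the labels are sorted once, all matches for a prefix sit next to each other, so each keystroke costs one binary search plus a short scan.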

2.3 Standard templates

A researcher should not have to start from a blank slate in order to annotate experiments. Annotare provides a set of standard templates covering common species and experimental designs (e.g. a time series). Users can select the template that best matches their experiment and obtain a pre-formatted MAGE-TAB document that can then be completed with experiment-specific data. Custom templates can also be created and saved for use in future experiments.
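As a rough sketch of how a template might work, the example below pre-fills design-related fields and leaves experiment-specific fields blank. The template contents, the function names and the choice of fields are assumptions for illustration; Annotare's own template format is not described here.

```python
# Hypothetical template: investigation-level fields with the design
# pre-filled and experiment-specific fields left blank (None).
TIME_SERIES_TEMPLATE = {
    "Investigation Title": None,
    "Experimental Design": "time_series_design",
    "Experimental Factor Name": "time",
    "Experimental Factor Type": "time",
    "Person Last Name": None,
}


def fill_template(template, values):
    """Return a copy of `template` with `values` applied.

    Fields that the template does not define are rejected.
    """
    extra = set(values) - set(template)
    if extra:
        raise KeyError(f"fields not in template: {sorted(extra)}")
    filled = dict(template)
    filled.update(values)
    return filled


def missing_fields(filled):
    """List the fields the user still has to complete."""
    return [field for field, value in filled.items() if value is None]


def to_tab_delimited(filled):
    """Render one field per line as tab-delimited text."""
    return "\n".join(f"{field}\t{value or ''}" for field, value in filled.items())


if __name__ == "__main__":
    # The title is a placeholder value.
    doc = fill_template(TIME_SERIES_TEMPLATE, {"Investigation Title": "Example study"})
    print(to_tab_delimited(doc))
    print("Still missing:", missing_fields(doc))
```

Copying the template before filling it keeps the stored template reusable for later experiments.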

2.4 Design wizard

In addition to experiment templates, Annotare has a design wizard that helps users create a MAGE-TAB document. The wizard takes the user through a series of questions eliciting information about the experimental design, the number of channels, the labels used for each channel, and platform and protocol information. Based on the user's answers, the wizard generates partial annotations that the user can then complete using the editor. In the process of generating annotations, the wizard taps into an internally stored knowledge base of rules and mappings that connect various experiment designs, species, technology vendors, array designs and protocols.
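The step from wizard answers to partial annotations might look like the sketch below. The function name, the row layout and the hyb_N naming scheme are hypothetical, and the single rule shown (one row per hybridization and channel) stands in for the much richer knowledge base described above.

```python
def generate_partial_rows(num_hybridizations, labels):
    """Build partial sample-and-data rows from wizard answers.

    Produces one row per (hybridization, channel). Sample names are left
    blank so that the user can complete them in the editor.
    """
    if num_hybridizations < 1:
        raise ValueError("need at least one hybridization")
    if not labels:
        raise ValueError("need one label per channel")
    if len(set(labels)) != len(labels):
        raise ValueError("each channel needs a distinct label")
    rows = []
    for hyb in range(1, num_hybridizations + 1):
        for label in labels:
            rows.append(
                {
                    "Source Name": "",  # to be completed by the user
                    "Label": label,
                    "Hybridization Name": f"hyb_{hyb}",  # hypothetical naming
                }
            )
    return rows


if __name__ == "__main__":
    # Answers for a two-channel design with two hybridizations.
    for row in generate_partial_rows(2, ["Cy3", "Cy5"]):
        print(row)
```

The blank "Source Name" cells mark exactly what remains for the user to fill in.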

2.5 Validator

The MAGE-TAB specification imposes a set of syntactic and semantic rules on the layout and content of MAGE-TAB documents. Users can invoke Annotare's validator component at any time in order to check if a document complies with these rules. The validator flags any violations as errors, warnings or missing data. It employs the Limpopo Parser, a library for MAGE-TAB parsing and validation, developed by ArrayExpress.
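The three kinds of findings (errors, warnings and missing data) can be illustrated with a small sketch. It does not use the Limpopo Parser, and the three rules it checks are simplified assumptions chosen for illustration rather than rules taken from the MAGE-TAB specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    severity: str  # "error", "warning" or "missing"
    row: int       # 1-based data row; 0 for header-level findings
    message: str


def validate_rows(header, rows):
    """Check tabular annotations against a few illustrative rules."""
    findings = []
    if "Source Name" not in header:
        findings.append(Finding("error", 0, "no 'Source Name' column"))
    for index, row in enumerate(rows, start=1):
        if len(row) != len(header):
            # Structural problem: cells cannot be matched to columns.
            findings.append(
                Finding("error", index, f"expected {len(header)} cells, found {len(row)}")
            )
            continue
        for column, cell in zip(header, row):
            if cell.strip() == "":
                findings.append(Finding("missing", index, f"empty '{column}'"))
            elif cell != cell.strip():
                findings.append(Finding("warning", index, f"stray whitespace in '{column}'"))
    return findings


if __name__ == "__main__":
    header = ["Source Name", "Label"]
    rows = [["s1", "Cy3"], ["s2", ""], ["s3"], [" s4", "Cy5"]]
    for finding in validate_rows(header, rows):
        print(finding.severity, finding.row, finding.message)
```

Returning every finding rather than stopping at the first lets the user see all problems in one pass, which suits a validator that can be invoked at any time during editing.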

3 DISCUSSION

Annotare is a collaborative open-source software development effort involving many institutions. The tool is freely available from Annotare's project web site, http://code.google.com/p/annotare/. Updates and improvements are planned in response to current usability studies. A web-based version of Annotare is also under development. Not only will a web-based version be able to find key ontology terms or publications via the internet, but it will also be configurable so that it can be connected directly to a software package or database. Both ArrayExpress and SMD will provide access to the web-based Annotare tool to construct and view high-throughput experimental annotations. In addition to the web-based version of Annotare, future work will provide support for MAGE-TAB version 1.1 as well as RNA-seq data. In particular, Annotare will be extended to allow researchers to annotate their RNA-seq or ChIP-seq experiments to satisfy the MINSEQE data sharing requirements for high-throughput sequence data (A. Brazma et al., submitted for publication).

REFERENCES

1. The MGED Ontology: a resource for semantics-based description of microarray experiments.

Authors: Patricia L Whetzel; Helen Parkinson; Helen C Causton; Liju Fan; Jennifer Fostel; Gilberto Fragoso; Laurence Game; Mervi Heiskanen; Norman Morrison; Philippe Rocca-Serra; Susanna-Assunta Sansone; Chris Taylor; Joseph White; Christian J Stoeckert
Journal: Bioinformatics Date: 2006-01-21 Impact factor: 6.937

2. TM4 microarray software suite.

Authors: Alexander I Saeed; Nirmal K Bhagabati; John C Braisted; Wei Liang; Vasily Sharov; Eleanor A Howe; Jianwei Li; Mathangi Thiagarajan; Joseph A White; John Quackenbush
Journal: Methods Enzymol Date: 2006 Impact factor: 1.600

3. Modeling sample variables with an Experimental Factor Ontology.

Authors: James Malone; Ele Holloway; Tomasz Adamusiak; Misha Kapushesky; Jie Zheng; Nikolay Kolesnikov; Anna Zhukova; Alvis Brazma; Helen Parkinson
Journal: Bioinformatics Date: 2010-03-03 Impact factor: 6.937

4. Bioconductor: open software development for computational biology and bioinformatics.

Authors: Robert C Gentleman; Vincent J Carey; Douglas M Bates; Ben Bolstad; Marcel Dettling; Sandrine Dudoit; Byron Ellis; Laurent Gautier; Yongchao Ge; Jeff Gentry; Kurt Hornik; Torsten Hothorn; Wolfgang Huber; Stefano Iacus; Rafael Irizarry; Friedrich Leisch; Cheng Li; Martin Maechler; Anthony J Rossini; Gunther Sawitzki; Colin Smith; Gordon Smyth; Luke Tierney; Jean Y H Yang; Jianhua Zhang
Journal: Genome Biol Date: 2004-09-15 Impact factor: 13.583

5. A simple spreadsheet-based, MIAME-supportive format for microarray data: MAGE-TAB.

Authors: Tim F Rayner; Philippe Rocca-Serra; Paul T Spellman; Helen C Causton; Anna Farne; Ele Holloway; Rafael A Irizarry; Junmin Liu; Donald S Maier; Michael Miller; Kjell Petersen; John Quackenbush; Gavin Sherlock; Christian J Stoeckert; Joseph White; Patricia L Whetzel; Farrell Wymore; Helen Parkinson; Ugis Sarkans; Catherine A Ball; Alvis Brazma
Journal: BMC Bioinformatics Date: 2006-11-06 Impact factor: 3.169

6. Implementation of GenePattern within the Stanford Microarray Database.

Authors: Jeremy Hubble; Janos Demeter; Heng Jin; Maria Mao; Michael Nitzberg; T B K Reddy; Farrell Wymore; Zachariah K Zachariah; Gavin Sherlock; Catherine A Ball
Journal: Nucleic Acids Res Date: 2008-10-25 Impact factor: 16.971

7. ArrayExpress update--from an archive of functional genomics experiments to the atlas of gene expression.

Authors: Helen Parkinson; Misha Kapushesky; Nikolay Kolesnikov; Gabriella Rustici; Mohammad Shojatalab; Niran Abeygunawardena; Hugo Berube; Miroslaw Dylag; Ibrahim Emam; Anna Farne; Ele Holloway; Margus Lukk; James Malone; Roby Mani; Ekaterina Pilicheva; Tim F Rayner; Faisal Rezwan; Anjan Sharma; Eleanor Williams; Xiangqun Zheng Bradley; Tomasz Adamusiak; Marco Brandizi; Tony Burdett; Richard Coulson; Maria Krestyaninova; Pavel Kurnosov; Eamonn Maguire; Sudeshna Guha Neogi; Philippe Rocca-Serra; Susanna-Assunta Sansone; Nataliya Sklyar; Mengyao Zhao; Ugis Sarkans; Alvis Brazma
Journal: Nucleic Acids Res Date: 2008-11-10 Impact factor: 16.971

8. MAGETabulator, a suite of tools to support the microarray data format MAGE-TAB.

Authors: Tim F Rayner; Faisal Ibne Rezwan; Margus Lukk; Xiangqun Zheng Bradley; Anna Farne; Ele Holloway; James Malone; Eleanor Williams; Helen Parkinson
Journal: Bioinformatics Date: 2008-11-27 Impact factor: 6.937
