ArrayExpress--a public repository for microarray gene expression data at the EBI.

H Parkinson, U Sarkans, M Shojatalab, N Abeygunawardena, S Contrino, R Coulson, A Farne, G Garcia Lara, E Holloway, M Kapushesky, P Lilja, G Mukherjee, A Oezcimen, T Rayner, P Rocca-Serra, A Sharma, S Sansone, A Brazma.

Abstract

ArrayExpress is a public repository for microarray data that supports the MIAME (Minimum Information About a Microarray Experiment) requirements and stores well-annotated raw and normalized data. As of November 2004, ArrayExpress contains data from approximately 12,000 hybridizations covering 35 species. Data can be submitted online or directly from local databases or LIMS in a standard format, and password-protected access to prepublication data is provided for reviewers and authors. The data can be retrieved by accession number or queried by various parameters such as species, author and array platform. A facility to query experiments by gene and sample properties is provided for a growing subset of curated data that is loaded into the ArrayExpress data warehouse. Data can be visualized and analysed using Expression Profiler, the integrated data analysis tool. ArrayExpress is available at http://www.ebi.ac.uk/arrayexpress.

Year:  2005        PMID: 15608260      PMCID: PMC540010          DOI: 10.1093/nar/gki056

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

ArrayExpress is an international public repository for microarray data established at the European Bioinformatics Institute (EBI) in 2002 (1). ArrayExpress supports standards and recommendations developed by the Microarray Gene Expression Data (MGED) society (www.mged.org), including the Minimum Information About a Microarray Experiment (MIAME) (2) and Microarray Gene Expression Markup Language (MAGE-ML) (3). Along with Gene Expression Omnibus (4) and CIBEX (5), it is one of the three repositories recommended by the MGED society (6) for storing data related to publications.
The ArrayExpress suite of databases and applications comprises: (i) MIAMExpress, a web-based, MIAME-supportive data-submission tool; (ii) the ArrayExpress repository, which provides public and password-protected access to the submitted data; (iii) a query-optimized data warehouse containing a curated subset of normalized data; and (iv) Expression Profiler, an integrated online visualization and analysis tool. All the software in the ArrayExpress suite is open source. Here we focus on describing MIAMExpress, the repository and the data warehouse; Expression Profiler has been reviewed recently (7).
As more journals require submission to public repositories, the cost of microarray experiments falls and data submission tools improve, the volume of data in ArrayExpress is growing rapidly. During the last 12 months the ArrayExpress content has grown more than 10-fold (Figure 1a), and as of November 2004, the repository contains ∼12 000 hybridizations comprising more than 300 studies from 35 species (Figure 1b). The majority of studies concern samples from Homo sapiens or Mus musculus. Slightly more than 25% of the experiments have been performed using Affymetrix arrays. Although the majority of experiments study gene expression, there is a growing volume of ChIP-on-chip and Comparative Genome Hybridization data in ArrayExpress.
Figure 1

(a) The number of hybridizations from October 2003 to September 2004. (b) The content of the database is shown broken down by species.

SUBMISSION AND CURATION

There are two major submission routes to ArrayExpress: (i) online via the MIAMExpress data submission tool, and (ii) via a MAGE-ML-based pipeline set up with an external application or database. Currently, more than half of all submissions have been made online. MIAMExpress is primarily aimed at users with no substantial local bioinformatics support and with no access to a local database providing direct deposition. No prior knowledge of the MIAME guidelines is required, as contextual help on the information required and on the use of MIAMExpress is provided via links from the web interface. Submitters progress through a series of simple web forms to describe their experiment and upload the data files. MIAMExpress is open source software that can be customized for use by a single laboratory, or for particular application domains. Examples of customization hosted at the EBI include the toxicology (8) and plant-specific MIAMExpress versions. MIAMExpress is now installed in 35 locations worldwide. Source code and installation information can be found at http://sourceforge.net/projects/miamexpress/. The ArrayExpress curation team processes each submission before it is loaded into the repository. Submissions are checked for MIAME compliance, for accuracy and completeness of the biological information provided, and for data consistency (e.g. whether submitted data files match the specified array designs). During the curation process, the curation team may contact the submitter if inconsistencies in the data are found. Once the data are successfully loaded into the ArrayExpress repository, the experiment is issued an accession number and a password is provided to the submitter if requested. The data in the repository are owned by the submitter and are released on the date specified or upon publication in a journal; no changes are made without the submitter's consent.
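The data-consistency check mentioned in this section (whether a submitted data file matches the specified array design) can be illustrated with a minimal sketch. This is not the ArrayExpress curation code, which the article does not describe; the function name, the representation of a design as a set of feature identifiers and the identifiers themselves are all hypothetical.

```python
def check_data_against_design(design_features, data_rows):
    """Compare the feature IDs in a data file with those declared by a design.

    design_features: set of feature identifiers declared by the array design.
    data_rows: iterable of (feature_id, value) pairs read from a data file.
    Returns (unknown, missing): IDs present only in the data, and IDs
    present only in the design, each as a sorted list.
    """
    seen = {feature_id for feature_id, _ in data_rows}
    unknown = sorted(seen - design_features)   # in the data, not in the design
    missing = sorted(design_features - seen)   # in the design, not in the data
    return unknown, missing


# Hypothetical example: one identifier is mistyped in the data file
# (letter "O" instead of digit "0").
design = {"F001", "F002", "F003"}
rows = [("F001", 1.2), ("F002", 0.7), ("F0O3", 2.4)]
unknown, missing = check_data_against_design(design, rows)
```

A non-empty result on either side is the kind of inconsistency that, according to the text, may prompt curators to contact the submitter.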
Where array designs are commercially available, they are pre-loaded into ArrayExpress by the curators in response to user requests. Information on custom-made arrays is submitted as a tab-delimited file containing position and annotation information. The ArrayExpress curators work with external databases when setting up a direct data-submission pipeline to ensure that data are MIAME compliant and well formatted. Once a pipeline is established, the submissions are curated at the source database and monitored by ArrayExpress curators. MAGE-ML-based pipelines have been established from 15 external databases, manufacturers or tools, including the Stanford Microarray Database (SMD) (9), MIDAS at TIGR (10), externally installed MIAMExpress systems at Cambridge University and the European Molecular Biology Laboratory at Heidelberg, and the array manufacturers Affymetrix and Agilent. Further data curation is performed when populating the ArrayExpress warehouse from the repository. The curators select data based on their MIAME compliance, the presence of normalized data and the quality of the biological annotation. Array designs are additionally annotated against the current version of the sequence databases at the EBI, and up-to-date gene annotation, such as InterPro (11), Gene Ontology (GO) terms (12) and gene names, is added, while the original array annotation supporting the publication is maintained in the repository.
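As an illustration of the tab-delimited format for custom array designs, the sketch below parses a small file into a lookup keyed by feature position. The column names (Row, Column, Reporter, GeneName) and the example content are assumptions made for this example; the actual submission template is not given in the article and may differ.

```python
import csv
import io

# Hypothetical column layout and content, for illustration only.
EXAMPLE_DESIGN = (
    "Row\tColumn\tReporter\tGeneName\n"
    "1\t1\tR0001\tjun\n"
    "1\t2\tR0002\tfos\n"
    "2\t1\tR0003\t\n"
)


def parse_array_design(text):
    """Parse a tab-delimited array design into a dict keyed by (row, column)."""
    features = {}
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for record in reader:
        position = (int(record["Row"]), int(record["Column"]))
        if position in features:
            # Two features cannot occupy the same spot on the array.
            raise ValueError(f"duplicate position {position}")
        features[position] = {
            "reporter": record["Reporter"],
            # An empty annotation cell is stored as None.
            "gene_name": record["GeneName"] or None,
        }
    return features


design = parse_array_design(EXAMPLE_DESIGN)
```

Keying on position makes duplicate coordinates easy to detect at load time, and it keeps the position information separate from the annotation, which the text says is later refreshed for the warehouse.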

DATA ACCESS AND QUERY

The highest level of organization in the ArrayExpress repository is the Experiment, which consists of one or more hybridizations, usually linked to a publication. The ArrayExpress query interface provides the ability to query for Experiments, Protocols and Array designs by their various attributes, such as species, authors or array platforms. Once an experiment has been selected, users can examine the description of the samples and protocols by navigating through the experiment, or they can download the data for analysis locally. The data can also be analysed and visualized online using Expression Profiler. Password-protected access to pre-publication data is provided for submitters and reviewers. The ArrayExpress data warehouse [which is based on the BioMart technology (11)] supports queries on gene attributes, such as gene names, gene function (GO annotations), or the family a gene belongs to and the motifs and domains it contains (InterPro terms), and on sample properties. The user can retrieve and visualize the gene expression values for multiple experiments. For example, querying the gene name ‘jun’ and sample property ‘leukemia’ retrieves all experiments that contain data for a gene annotated with this name and that are described using the term ‘leukemia’. A list of genes that match the query is returned. These can be visualized using line plots, and data can be selected for further analysis. Links are provided back to the repository, where users can access the full annotation and supporting raw data. Experimental data and corresponding array designs selected by the curators on the basis of MIAME compliance, annotation quality and comparability are loaded periodically into the warehouse. A schematic diagram of the software architecture is shown in Figure 2.
Figure 2

(a) The ArrayExpress architecture and database-side activities are shown. (b) The functionality experienced by the user is shown.
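The combined gene and sample-property query described in this section (gene name ‘jun’, sample property ‘leukemia’) can be sketched with a small in-memory stand-in for the warehouse. This does not reflect the BioMart-based implementation; the record layout, the experiment labels, the expression values and the choice of case-insensitive, substring-based matching are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    experiment: str        # made-up experiment label, not a real accession
    gene_name: str
    sample_property: str
    value: float           # made-up normalized expression value


# Hypothetical warehouse content.
WAREHOUSE = [
    Measurement("EXP-1", "jun", "leukemia", 2.1),
    Measurement("EXP-1", "fos", "leukemia", 0.4),
    Measurement("EXP-2", "jun", "normal", 1.0),
    Measurement("EXP-3", "JUN", "acute myeloid leukemia", 3.3),
]


def query(warehouse, gene_name, sample_term):
    """Group expression values by experiment for matching gene and sample term.

    Assumption: gene names match exactly but case-insensitively, and the
    sample term matches as a case-insensitive substring.
    """
    gene = gene_name.lower()
    term = sample_term.lower()
    hits = {}
    for m in warehouse:
        if m.gene_name.lower() == gene and term in m.sample_property.lower():
            hits.setdefault(m.experiment, []).append(m.value)
    return hits


result = query(WAREHOUSE, "jun", "leukemia")
```

Grouping the values by experiment mirrors the behaviour described in the text, where expression values are retrieved for multiple experiments and links lead back to the repository record for each.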

FUTURE

The online submission tool MIAMExpress is being extended to allow spreadsheet-based batch data upload to facilitate large-scale experiment submissions. A graph-based visualization tool is being added to MIAMExpress and ArrayExpress. The ArrayExpress repository and data warehouse interfaces will be unified. The gene-based query facility in the warehouse will be used as the basis for integrating ArrayExpress more closely into all EBI services; for instance, expression data will be accessible from the UniProt and Ensembl databases via a Distributed Annotation System (DAS) (http://www.biodas.org) server. As the volume of submissions continues to grow, we expect that the curation phase at the point of submission to the repository will be fully automated and that curation efforts will focus on adding value to submitted data made available through the data warehouse.
  13 in total

1.  The Stanford Microarray Database.

Authors:  G Sherlock; T Hernandez-Boussard; A Kasarskis; G Binkley; J C Matese; S S Dwight; M Kaloper; S Weng; H Jin; C A Ball; M B Eisen; P T Spellman; P O Brown; D Botstein; J M Cherry
Journal:  Nucleic Acids Res       Date:  2001-01-01       Impact factor: 16.971

2.  TM4: a free, open-source system for microarray data management and analysis.

Authors:  A I Saeed; V Sharov; J White; J Li; W Liang; N Bhagabati; J Braisted; M Klapa; T Currier; M Thiagarajan; A Sturn; M Snuffin; A Rezantsev; D Popov; A Ryltsov; E Kostukovich; I Borisovsky; Z Liu; A Vinsavich; V Trush; J Quackenbush
Journal:  Biotechniques       Date:  2003-02       Impact factor: 1.993

3.  CIBEX: center for information biology gene expression database.

Authors:  Kazuho Ikeo; Jun Ishi-i; Takurou Tamura; Takashi Gojobori; Yoshio Tateno
Journal:  C R Biol       Date:  2003 Oct-Nov       Impact factor: 1.583

4.  The Gene Ontology (GO) database and informatics resource.

Authors:  M A Harris; J Clark; A Ireland; J Lomax; M Ashburner; R Foulger; K Eilbeck; S Lewis; B Marshall; C Mungall; J Richter; G M Rubin; J A Blake; C Bult; M Dolan; H Drabkin; J T Eppig; D P Hill; L Ni; M Ringwald; R Balakrishnan; J M Cherry; K R Christie; M C Costanzo; S S Dwight; S Engel; D G Fisk; J E Hirschman; E L Hong; R S Nash; A Sethuraman; C L Theesfeld; D Botstein; K Dolinski; B Feierbach; T Berardini; S Mundodi; S Y Rhee; R Apweiler; D Barrell; E Camon; E Dimmer; V Lee; R Chisholm; P Gaudet; W Kibbe; R Kishore; E M Schwarz; P Sternberg; M Gwinn; L Hannick; J Wortman; M Berriman; V Wood; N de la Cruz; P Tonellato; P Jaiswal; T Seigfried; R White
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

5.  EnsMart: a generic system for fast and flexible access to biological data.

Authors:  Arek Kasprzyk; Damian Keefe; Damian Smedley; Darin London; William Spooner; Craig Melsopp; Martin Hammond; Philippe Rocca-Serra; Tony Cox; Ewan Birney
Journal:  Genome Res       Date:  2004-01       Impact factor: 9.043

6.  The InterPro Database, 2003 brings increased coverage and new features.

Authors:  Nicola J Mulder; Rolf Apweiler; Teresa K Attwood; Amos Bairoch; Daniel Barrell; Alex Bateman; David Binns; Margaret Biswas; Paul Bradley; Peer Bork; Phillip Bucher; Richard R Copley; Emmanuel Courcelle; Ujjwal Das; Richard Durbin; Laurent Falquet; Wolfgang Fleischmann; Sam Griffiths-Jones; Daniel Haft; Nicola Harte; Nicolas Hulo; Daniel Kahn; Alexander Kanapin; Maria Krestyaninova; Rodrigo Lopez; Ivica Letunic; David Lonsdale; Ville Silventoinen; Sandra E Orchard; Marco Pagni; David Peyruc; Chris P Ponting; Jeremy D Selengut; Florence Servant; Christian J A Sigrist; Robert Vaughan; Evgueni M Zdobnov
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

7.  Expression Profiler: next generation--an online platform for analysis of microarray data.

Authors:  Misha Kapushesky; Patrick Kemmeren; Aedín C Culhane; Steffen Durinck; Jan Ihmels; Christine Körner; Meelis Kull; Aurora Torrente; Ugis Sarkans; Jaak Vilo; Alvis Brazma
Journal:  Nucleic Acids Res       Date:  2004-07-01       Impact factor: 16.971

8.  ArrayExpress--a public repository for microarray gene expression data at the EBI.

Authors:  Alvis Brazma; Helen Parkinson; Ugis Sarkans; Mohammadreza Shojatalab; Jaak Vilo; Niran Abeygunawardena; Ele Holloway; Misha Kapushesky; Patrick Kemmeren; Gonzalo Garcia Lara; Ahmet Oezcimen; Philippe Rocca-Serra; Susanna-Assunta Sansone
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

9.  Design and implementation of microarray gene expression markup language (MAGE-ML).

Authors:  Paul T Spellman; Michael Miller; Jason Stewart; Charles Troup; Ugis Sarkans; Steve Chervitz; Derek Bernhart; Gavin Sherlock; Catherine Ball; Marc Lepage; Marcin Swiatek; W L Marks; Jason Goncalves; Scott Markel; Daniel Iordan; Mohammadreza Shojatalab; Angel Pizarro; Joe White; Robert Hubley; Eric Deutsch; Martin Senger; Bruce J Aronow; Alan Robinson; Doug Bassett; Christian J Stoeckert; Alvis Brazma
Journal:  Genome Biol       Date:  2002-08-23       Impact factor: 13.583

10.  Submission of microarray data to public repositories.

Authors:  Catherine A Ball; Alvis Brazma; Helen Causton; Steve Chervitz; Ron Edgar; Pascal Hingamp; John C Matese; Helen Parkinson; John Quackenbush; Martin Ringwald; Susanna-Assunta Sansone; Gavin Sherlock; Paul Spellman; Chris Stoeckert; Yoshio Tateno; Ronald Taylor; Joseph White; Neil Winegarden
Journal:  PLoS Biol       Date:  2004-08-31       Impact factor: 8.029

