Literature DB >> 15608273

BarleyBase--an expression profiling database for plant genomics.

Lishuang Shen¹, Jian Gong, Rico A Caldo, Dan Nettleton, Dianne Cook, Roger P Wise, Julie A Dickerson.

Abstract

BarleyBase (BB) (www.barleybase.org) is an online database for plant microarrays with integrated tools for data visualization and statistical analysis. BB houses raw and normalized expression data from the two publicly available Affymetrix genome arrays, Barley1 and Arabidopsis ATH1 with plans to include the new Affymetrix 61K wheat, maize, soybean and rice arrays, as they become available. BB contains a broad set of query and display options at all data levels, ranging from experiments to individual hybridizations to probe sets down to individual probes. Users can perform cross-experiment queries on probe sets based on observed expression profiles and/or based on known biological information. Probe set queries are integrated with visualization and analysis tools such as the R statistical toolbox, data filters and a large variety of plot types. Controlled vocabularies for gene and plant ontologies, as well as interconnecting links to physical or genetic map and other genomic data in PlantGDB, Gramene and GrainGenes, allow users to perform EST alignments and gene function prediction using Barley1 exemplar sequences, thus, enhancing cross-species comparison.

Entities: Chemical Disease Gene Species

Mesh：

Year: 2005 PMID： 15608273 PMCID： PMC540077 DOI： 10.1093/nar/gki123

Source DB: PubMed Journal: Nucleic Acids Res ISSN： 0305-1048 Impact factor: 16.971

INTRODUCTION

BarleyBase (BB) is a USDA-funded public database for cereal microarray data. BB was first developed to support the Affymetrix Barley1 GeneChip, and is being expanded to new plant GeneChips and other microarray platforms. The Barley1 GeneChip is a new community-designed, Affymetrix probe array (1), which pioneered the GeneChip design for plants without a fully sequenced genome. Several new GeneChips for wheat, soybean and maize will be released in 2005. BB includes MIAME-compliant microarray experiment annotations as well as Plant Ontology terms through BarleyExpress, its web-based submission tool (2). Links with other sequence and crop databases give BB users the ability to quickly discover all the known facts about any probe set or exemplar sequence on the chip and to compare with other plant species such as rice or wheat. Data queries are integrated with analysis and visualization tools to allow users to explore their experimental data. As of September, 2004, BB hosts 23 completed experiment submissions with a total of 972 hybridizations. There are many public databases that provide access to microarray data. These include general repositories, such as the Gene Expression Omnibus (GEO) (3), Stanford Microarray Database (4) and ArrayExpress (5) and species-specific resources, such as TAIR (6) and NASCArrays (7). Repositories typically store data for download and later analysis. The general repositories such as GEO and ArrayExpress are intended to act as central data distribution hubs, not to replace gene expression databases that are constructed to facilitate particular analytic methods or comparisons. BB is designed to meet the needs of plant biologists in their analysis of gene expression data and to put the expression data in the context of functional genomics by using controlled gene and plant ontologies to describe experimental conditions. Interconnecting links to plant genomic resources such as PlantGDB (8), Gramene (9) and GrainGenes (10) facilitate access to contig alignments, oligo probe information and a variety of BLAST tools from the NCBI, PlantGDB, TIGR, TAIR or Rice genome databases.

DATABASE DESCRIPTION

BB stores microarray gene expression data in a MIAME-compliant and Plant Ontology enhanced format for plants, and integrates the data with exploration and analysis tools across experiments. BB stores the following types of information: GeneChip and/or microarray structure data, experimental and labeling protocols, raw and normalized gene expression data and experiment and sample annotations such as summary statistics from R and MAS5.0. BB uses a hierarchical data model to organize and display microarray gene expression data. The top-level data structure is the experiment, which consists of a set of hybridizations with a treatment structure designed to answer one or more related biological questions. A factorial treatment structure is used to describe BB experiments. Each treatment is associated with a specific level of each of one or more experimental factors. Each treatment has one or more samples as biological replicates; each sample has one or more hybridizations as technical replicates. To facilitate smooth data exchange across databases, plant ontologies for growth stage and organism parts (11), and other controlled vocabularies are required in the experiment description and sample annotation in BarleyExpress. BB follows the MIAME standards (12) and the implementation used in MIAMExpress (http://www.ebi.ac.uk/miamexpress). BarleyExpress adds plant-specific fields such as links to the Plant Ontology terms on growth stages and tissue types are added in the experiment submission process (2). The use of controlled vocabularies allows cross-experiment comparisons based upon common identifiers, facilitating interoperability between existing plant databases to identify homologous genes. Biological annotation for probe sets and exemplars includes sequence description, BLAST hits from related sequence databases or species, Gene Ontology, and pathway and gene family information. BB requires raw CEL data files for gene expression data for which EXP and DAT files are recommended. BB processes all submissions in a standardized way which ensures ease of cross-experiment comparison. After the submitter uploads the experiment data, the curator checks the data integrity and computes the normalized expression measures, summary statistics and graphs. Unique accession numbers are assigned to each experiment for data access. Processed data, sequence annotation and pre-computed analyses results are stored for online access and analysis. Finally, BB generates MAGE-ML and text files for batch download and data exchange. The MAGE-ML files can be submitted to ArrayExpress or read by many microarray data analysis programs. BB uses open-source tools or tools free for academic institutions. The server uses RedHat Linux version 9. The website is powered by an Apache server, PHP and Javascript for dynamic web pages, and a MySQL 4.0 relational database as the back end. The data pre-processing uses R (13), Bioconductor and Perl. R is an open platform for statistical computation and Bioconductor is a project written in R for microarray data analysis. Robust Multichip Average (RMA) (14) in the affy package of Bioconductor and Affymetrix MAS 5.0 (15) are used to compute normalized expression measures from the raw expression values.

Data access policy

BB has secure and flexible account and data access management, which allows data owners to protect their data before publication and yet enables dispersed collaboration. The submitter can specify the accessibility to data of an experiment as ‘public’, ‘private’ or ‘group accessible’. Public access allows any users to access data; private allows data to be viewed only by the data owner; and group access allows group members to access the data. Registered users can create groups and add selected users to the groups to grant access to data from designated experiments. Reviewers can anonymously access datasets referenced by a manuscript to verify the conclusions using reviewer's login ID. All users are strongly encouraged to make their data public as soon as possible.

DATA ANALYSIS AND VISUALIZATION

Microarray gene expression datasets are large and multivariate in nature and require flexible approaches for analysis. Instructive data visualization and presentation of the data are indispensable for users to efficiently mine the data and derive meaningful biological interpretations. The visualization pages are provided at different levels based on the query hierarchy. Visualizing the expression data will aid users in choosing suitable parameters for gene filtering and analysis. The analysis and visualization tools can be accessed by using a traditional pipeline of experiment analysis or searching for a gene(s) of interest using sequence comparisons then finding genes that behave similarly. The Supplementary Material guides a user through some of the analysis tools available at BB (http://www.barleybase.org/quicktour.php).

Data visualization for experiments

Experiment queries are typically the starting point for data retrieval and analysis flow. Based on the information captured for experiment design in BarleyExpress, the Experiment Query allows users to search and browse the experiments, protocols and array designs. Quality checking and understanding the experimental data are essential before conducting gene-centric analysis. Users can navigate the expression values by hybridizations and experimental factor, and check sample annotation. The summary statistics and visualizations allow users to quickly assess experiment quality. Box plots and histograms of raw Perfect Match (PM) intensities and normalized expression values are used to check the distribution of the expression data and the quality across hybridizations in an experiment. Histograms of the PM values detect signal saturation, and help to quickly catch problems such as incorrect scanner parameters. Side-by-side boxplots of the normalized expression data are used to assess normalization results. These boxplots are ideally almost identical as shown in Figure 1.

Figure 1

Boxplots can be used to visualize overall experimental quality before expression data normalization. The central box shows the middle 50% of the data. The dotted lines or whiskers, extend to the 10th and 90th percentiles, respectively. The circles represent outliers. Boxes with the same color are technical replicates with the same treatment factors. These data summarize the logarithm of the raw probe set perfect match (PM) expression values in experiment BB4 before normalization.

At the hybridization level, pseudo-color images of PM intensities are used for visual detection of spatial abnormalities. Scatter plots and MVA plots show reproducibility and variability among and/or between hybridizations or treatments. These comparative scatter plots can range across experiments, with x- and y-axes using hybridizations or treatment means from different experiments sharing similarity in experimental material or factors. In the MVA plots, the M is the log ratio between two hybridizations and A is average of the logged signal intensities. MVA plots can be regarded as a 45° clockwise rotation of scatter plots for easier viewing of differential expression.

Gene-centric expression data analysis tools

Following the initial experiment and hybridization exploration, users can further filter data and create gene lists. Creating gene lists is the first step in most gene-centric analysis for microarray experiments. Saved gene lists can be fed to advanced microarray data analysis and visualization methods. BB provides a full range of gene filters by expression profiles and by biological criteria. Gene-centric expression profiles for single genes or gene lists are displayed as profile-plots (line graphs) and heatmaps. Interactive profile-plots allow the user to gain insight into the way treatments affect expression. An ‘Expression view’ (heatmap) explores genes with similar expression profiles that may represent co-regulated genes. Expression profile filters are mainly used to identify differentially expressed genes (probe sets). The filters usually operate on a single experiment, but users may do cross-experiment query for hypothesis generation. The filter can be a single filter or a composite filter that is a combination of several filters linked with various Boolean operators. Filters are based on absolute value range, relative and absolute variation, fold change, MAS5.0 Presence/Absence call or other variation measures. Statistical test filters include most standard two-sample and multiple-sample statistical methods for identifying differentially expressed genes with multiple test corrections. Co-regulated genes can also be identified. For cross-experiment filtering, hybridizations from several experiments are compared with each other. This functions like a virtual experiment in silico using hybridizations from different existing experiments. Biologically based filters use annotation keywords and sequence similarity to group genes into a gene list. For the ATH1 GeneChip, gene family and KEGG pathway filters are available to find probe sets corresponding to enzymes from interesting metabolic or regulatory pathways or a given gene family. Users may import their own list of gene or probe set names. Files of free text containing the gene names can be used directly without tedious editing. Users may export gene lists as tab-delimited text files for names, annotation or expression values. Gene lists can be compared in various combinations: union of two gene lists, intersection of both gene lists and unique genes in either gene lists. This is useful for combining the results of different filters, such as biological and expression-based filters, and for comparing pattern recognition methods. Analysis results are automatically saved, including information about the methods, parameters and gene list analyzed. Many of the standard supervised and unsupervised pattern recognition methods are implemented for online analysis. Methods include hierarchical and k-means clustering, principal component analysis (PCA), self-organizing maps (SOMs) and Sammon's non-linear mapping. For each of the methods, data can be transformed or scaled using logarithm-transformation, mean or median centering, and scaling based on the standard deviation of the probe set in an experiment. The pattern recognition results are visualized using expression profile line graphs, dendrograms and heatmaps for the entire gene list or for each subcluster. Each method also has its specialized visual presentation, such as clustering plots for partitions in k-means or partition grids for SOMs.

Gene function and GeneChip annotation

GeneChips may be searched for a sequence of interest by performing a BLAST search against a particular GeneChip (16). This search gives a list of exemplars and probe sets on a particular GeneChip that match that sequence. This page also allows users to access gene expression data from BB. This type of search is particularly important for organisms which have not been fully sequenced. Figure 2 shows the results of finding a particular exemplar and its accompanying annotation that links to plant genomic resources such as PlantGDB, Gramene and GrainGenes. Contig alignments from HarvEST:Barley [http://harvest.ucr.edu/Barley1.htm (1)] and oligo probe information from the Barley1 and Arabidopsis GeneChips can be displayed. The sequences can be blasted against the NCBI, PlantGDB, TAIR or Rice genome databases for additional annotation information.

Figure 2

Annotation for exemplar Barley1_11969. The exemplar annotation page includes the probe alignment, links to databases such as Gramene, PlantGDB and GrainGenes along with the best BLAST hits. The left-hand side picture shows the normalized expression data for some of the hybridizations in experiment BB4. The blocks at the bottom show the experimental factors laid out in a factorial experimental design. The raw probe intensity data PM (red) and MM (blue) levels for exemplar Barley1_11969 in hybridizations 17 and 18 in experiment BB4 show the different responses across the probe pairs.

The annotation page also links to expression data related to the probe exemplar as shown in Figure 2. The user can look at how this probe set is expressed in different experiments or search for genes that behave similarly in certain experiments. Probe sets with similar expression profiles as the selected exemplar can be identified using correlation tests. These genes may be used to create gene lists for further analysis on a particular experiment or groups of experiments. This type of analysis is critical in identifying co-regulated genes that may be involved in similar biochemical pathways. The results are displayed using heatmaps or profile plots. For more detail, the raw probe pair PM and MM data can also be displayed to further investigate GeneChip response to a particular hybridization as shown in Figure 2. Barplots with standard deviation are plotted by hybridizations or by probe pair numbers, allowing comparison of intensities across hybridizations for same probe, or across probe pairs for same hybridization. As our data and understanding of the GeneChips accumulate, we plan to exclude probe pairs that are known to be ineffective from the analysis available to BB users.

Comparative genomic analysis

BarleyBase supports comparative genomics capabilities by interconnecting links with established plant databases. Barley1 exemplars are aligned to the sequenced model plant rice genome browser in Gramene, and to other cereal genomes for annotation information integration. Barley1 exemplars can also be queried for Triticeae map positions in GrainGenes. Integrated links with PlantGDB facilitates detailed gene prediction and contig view of the exemplars. ATH1 exemplars, function and pathway information are supported through links with TAIR. A series of BLAST utilities allow users to perform cross-species queries by finding matches for any sequences on plant GeneChips with links to GenBank and other major databases. Choosing an experiment from different GeneChip platform will automatically initiate cross-platform gene list creation, where the best BLAST hits are used as match from other platforms. Cross-platform gene list creation enhances comparative gene expression analysis to fully utilize microarray data from different plant species.

SUMMARY

Adherence to MIAME standards and controlled plant ontologies facilitates the efficient presentation and organization of the volumes of data from a typical microarray-based investigation. BarleyBase captures and stores all applicable MIAME-compliant information and enforces plant ontology and controlled vocabulary for experiments. BB explicitly captures factorial experiment design information, enhancing the flow of experiment submission, data analysis and data presentation. It makes data accessible at each data level, from the experiment level to the individual probe level. The online pipeline integrates a broad set of gene query and display options with a full set of analysis and visualization tools. Cross-experiment gene filtering and cross-platform matching provide great flexibility in hypothesis generation.

FUTURE PLANS

BB is under active development, and several enhancements are planned for the near future. First, BB will expand to support the Affymetrix high-density GeneChips for maize, rice, soybean and wheat that will be available soon, and will evolve into PLEXdb, a comprehensive Plant Expression Data Base. Second, data from spotted cDNA and long oligo microarray platforms will also be added using open-source tools for integrating cDNA microarray data processing and management, such as the TM4 suite from TIGR (17) and BASE (18). Third, plant ontologies for other species beyond barley will be enhanced. These changes will begin to pave the way toward comparative expression data analysis. Gene Ontology and pathway information need to be adapted to BB for exemplar annotation, which will allow functional gene expression analysis with insight on how specific genes are involved in biological processes. Fourth, expression analysis and visualization tool development will add new methods for gene identification and pattern recognition, and enhance BB's web-based interactive visualization capabilities. Overlaying expression data with Gene Ontology, gene network and pathway analysis will be added to aid biological interpretation. Cross-experiment, cross-platform and cross-species data analysis and comparison capabilities will be enhanced for hypothesis generation.

15 in total

1. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data.

Authors: A Brazma; P Hingamp; J Quackenbush; G Sherlock; P Spellman; C Stoeckert; J Aach; W Ansorge; C A Ball; H C Causton; T Gaasterland; P Glenisson; F C Holstege; I F Kim; V Markowitz; J C Matese; H Parkinson; A Robinson; U Sarkans; S Schulze-Kremer; J Stewart; R Taylor; J Vilo; M Vingron
Journal: Nat Genet Date: 2001-12 Impact factor: 38.330

2. TM4: a free, open-source system for microarray data management and analysis.

Authors: A I Saeed; V Sharov; J White; J Li; W Liang; N Bhagabati; J Braisted; M Klapa; T Currier; M Thiagarajan; A Sturn; M Snuffin; A Rezantsev; D Popov; A Ryltsov; E Kostukovich; I Borisovsky; Z Liu; A Vinsavich; V Trush; J Quackenbush
Journal: Biotechniques Date: 2003-02 Impact factor: 1.993

3. PlantGDB, plant genome database and analysis tools.

Authors: Qunfeng Dong; Shannon D Schlueter; Volker Brendel
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

4. affy--analysis of Affymetrix GeneChip data at the probe level.

Authors: Laurent Gautier; Leslie Cope; Benjamin M Bolstad; Rafael A Irizarry
Journal: Bioinformatics Date: 2004-02-12 Impact factor: 6.937

5. Gramene: a resource for comparative grass genomics.

Authors: Doreen Ware; Pankaj Jaiswal; Junjian Ni; Xiaokang Pan; Kuan Chang; Kenneth Clark; Leonid Teytelman; Steve Schmidt; Wei Zhao; Samuel Cartinhour; Susan McCouch; Lincoln Stein
Journal: Nucleic Acids Res Date: 2002-01-01 Impact factor: 16.971

6. The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community.

Authors: Seung Yon Rhee; William Beavis; Tanya Z Berardini; Guanghong Chen; David Dixon; Aisling Doyle; Margarita Garcia-Hernandez; Eva Huala; Gabriel Lander; Mary Montoya; Neil Miller; Lukas A Mueller; Suparna Mundodi; Leonore Reiser; Julie Tacklind; Dan C Weems; Yihe Wu; Iris Xu; Daniel Yoo; Jungwon Yoon; Peifen Zhang
Journal: Nucleic Acids Res Date: 2003-01-01 Impact factor: 16.971

7. BioArray Software Environment (BASE): a platform for comprehensive management and analysis of microarray data.

Authors: Lao H Saal; Carl Troein; Johan Vallon-Christersson; Sofia Gruvberger; Ake Borg; Carsten Peterson
Journal: Genome Biol Date: 2002-07-15 Impact factor: 13.583

8. A new resource for cereal genomics: 22K barley GeneChip comes of age.

Authors: Timothy J Close; Steve I Wanamaker; Rico A Caldo; Stacy M Turner; Daniel A Ashlock; Julie A Dickerson; Rod A Wing; Gary J Muehlbauer; Andris Kleinhofs; Roger P Wise
Journal: Plant Physiol Date: 2004-03 Impact factor: 8.340

9. The Plant Ontology Consortium and plant ontologies.

Authors:
Journal: Comp Funct Genomics Date: 2002

10. NASCArrays: a repository for microarray data generated by NASC's transcriptomics service.

Authors: David J Craigon; Nick James; John Okyere; Janet Higgins; Joan Jotham; Sean May
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

36 in total

1. Genevestigator. Facilitating web-based gene-expression analysis.

Authors: Aleel K Grennan
Journal: Plant Physiol Date: 2006-08 Impact factor: 8.340

2. A combined strategy of "in silico" transcriptome analysis and web search engine optimization allows an agile identification of reference genes suitable for normalization in gene expression studies.

Authors: Primetta Faccioli; Gian Paolo Ciceri; Paolo Provero; Antonio Michele Stanca; Caterina Morcia; Valeria Terzi
Journal: Plant Mol Biol Date: 2006-12-02 Impact factor: 4.076

3. SFP genotyping from affymetrix arrays is robust but largely detects cis-acting expression regulators.

Authors: Z W Luo; E Potokina; A Druka; R Wise; R Waugh; M J Kearsey
Journal: Genetics Date: 2007-04-03 Impact factor: 4.562

4. Transcriptomic and metabolite analyses of Cabernet Sauvignon grape berry development.

Authors: Laurent G Deluc; Jérôme Grimplet; Matthew D Wheatley; Richard L Tillett; David R Quilici; Craig Osborne; David A Schooley; Karen A Schlauch; John C Cushman; Grant R Cramer
Journal: BMC Genomics Date: 2007-11-22 Impact factor: 3.969

Review 5. Web-queryable large-scale data sets for hypothesis generation in plant biology.

Authors: Siobhan M Brady; Nicholas J Provart
Journal: Plant Cell Date: 2009-04-28 Impact factor: 11.277

Review 6. Using metabolomics to estimate unintended effects in transgenic crop plants: problems, promises, and opportunities.

Authors: Owen A Hoekenga
Journal: J Biomol Tech Date: 2008-07

7. The plant structure ontology, a unified vocabulary of anatomy and morphology of a flowering plant.

Authors: Katica Ilic; Elizabeth A Kellogg; Pankaj Jaiswal; Felipe Zapata; Peter F Stevens; Leszek P Vincent; Shulamit Avraham; Leonore Reiser; Anuradha Pujar; Martin M Sachs; Noah T Whitman; Susan R McCouch; Mary L Schaeffer; Doreen H Ware; Lincoln D Stein; Seung Y Rhee
Journal: Plant Physiol Date: 2006-12-01 Impact factor: 8.340

8. Drought response in the spikes of barley: gene expression in the lemma, palea, awn, and seed.

Authors: Tilahun Abebe; Kalpalatha Melmaiee; Virginia Berg; Roger P Wise
Journal: Funct Integr Genomics Date: 2009-11-20 Impact factor: 3.410

9. Surveying expression level polymorphism and single-feature polymorphism in near-isogenic wheat lines differing for the Yr5 stripe rust resistance locus.

Authors: Tristan E Coram; Matthew L Settles; Meinan Wang; Xianming Chen
Journal: Theor Appl Genet Date: 2008-05-10 Impact factor: 5.699

10. Cloning and expression analysis of candidate genes involved in wax deposition along the growing barley (Hordeum vulgare) leaf.

Authors: Andrew Richardson; Alexandre Boscari; Lukas Schreiber; Gerhard Kerstiens; Mike Jarvis; Pawel Herzyk; Wieland Fricke
Journal: Planta Date: 2007-07-28 Impact factor: 4.116