
CATdb: a public access to Arabidopsis transcriptome data from the URGV-CATMA platform.

Séverine Gagnot, Jean-Philippe Tamby, Marie-Laure Martin-Magniette, Frédérique Bitton, Ludivine Taconnat, Sandrine Balzergue, Sébastien Aubourg, Jean-Pierre Renou, Alain Lecharny, Véronique Brunaud.

Abstract

CATdb is a free resource available at http://urgv.evry.inra.fr/CATdb that provides public access to a large collection of transcriptome data for Arabidopsis thaliana produced by a single Complete Arabidopsis Transcriptome Micro Array (CATMA) platform. CATMA probes consist of gene-specific sequence tags (GSTs) of 150-500 bp. The v2 version of CATMA contains 24 576 GST probes representing most of the predicted A. thaliana genes, and 615 probes tiling the chloroplastic and mitochondrial genomes. Data in CATdb are entirely processed with the same standardized protocol, from microarray printing to data analyses. CATdb contains the results of 53 projects including 1724 hybridized samples distributed between 13 different organs, 49 different developmental conditions, 45 mutants and 63 environmental conditions. All the data contained in CATdb can be downloaded from the web site, and subsets of data can be selected and displayed by keyword, by experiment, by gene or by list of up to 100 genes. CATdb gives easy access to the complete description of each experiment, with a picture of the experiment design.

Year: 2007 PMID: 17940091 PMCID: PMC2238931 DOI: 10.1093/nar/gkm757

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Transcriptome characterization by microarray technologies is a powerful tool for functional analysis of genes. The primary purpose of most experiments has been to find candidate genes for further experimental work. Nevertheless, with the accumulation of data, a complementary use of the transcriptome resource is the integration of large sets of data to infer, for instance, gene regulatory networks. Several databases dedicated to microarray data exist and fall into three general classes: (i) public repositories including ArrayExpress, Gene Expression Omnibus (GEO) and The Center for Information Biology Gene Expression Database (CIBEX) (1–3); (ii) general databases oriented toward tools for the analysis and display of different types of arrays, like Genevestigator or the Stanford Microarray Database (SMD) (4,5) and (iii) specific databases dedicated to a species, like The Arabidopsis Information Resource or the expression browser (eFP) from the Bio-Array Resource for Arabidopsis Functional Genomics, or specific to a life kingdom, like the Plant Expression Database (PLEXdb) (6–8). Despite considerable and valuable efforts made to define and apply the Minimal Information About Microarray Experiment (MIAME) (9) recommendations, a recent survey of the data in public repositories indicated that data submission and quality remain obstacles to integrating current microarray data (10). The diversity of transcriptome data and of the methods used to analyse them is one of the difficulties facing occasional users. We have developed CATdb to manage the microarray data resource generated by the URGV transcriptome platform (http://www.versailles.inra.fr/urgv) and to allow easy access to the data by the community of biologists. We took advantage of the single origin of the URGV-CATMA data to concentrate our effort on the quality of the data and to systematically collect a global view of each project with the details of the experiment design.
Thus, CATdb provides easy access to a large and growing set of data from microarrays named CATMA (Complete Arabidopsis Transcriptome Micro Array) (11). All the RNA samples are sent by collaborators to the URGV, where they are checked for quality, labelled and hybridized following standardized protocols. The scanning is performed with common settings, and a single normalization followed by a statistical analysis procedure is applied to each experiment as described subsequently.

CATMA MICROARRAYS

CATMA is a generic Arabidopsis thaliana microarray developed by a European consortium (12). The design of the probes for CATMA microarrays differs from the design of both the A. thaliana Agilent arrays (Palo Alto, CA, USA) and the ATH1 Affymetrix GeneChips (Santa Clara, CA, USA) (13), which use, respectively, 60-mer oligonucleotide probes and sets of 25-mer oligonucleotides. CATMA probes consist of gene-specific sequence tags (GSTs) of 150–500 bp that have been designed with SPADS (Specific Primers & Amplicons Design Software) (14). Tagged genes come from both the EuGene software prediction (15) and the annotation from The Institute for Genomic Research (TIGR). The v2 version of CATMA contains 24 576 GST probes representing ∼85% of the predicted genes, 615 probes tiling the chloroplastic and mitochondrial genomes (v2.1) and 44 probes of non-protein coding genes (v2.2). A thorough benchmark study established the CATMA array as a mature alternative to the Affymetrix and Agilent platforms (16). The CATMA GSTs are also the basic materials in the AGRIKOLA (Arabidopsis Genomic RNAi Knock-out Line Analysis) European project focusing on the large-scale systematic RNAi silencing of Arabidopsis genes (http://www.agrikola.org/).

DATABASE CHARACTERISTICS AND CONTENTS

Initially, CATdb was based on the schema and objects used in the ArrayExpress database (17). The ArrayExpress schema was then adapted to our platform to include some new features. The main differences are: (i) the systematic addition of a figure describing the design of an experiment in a standardized format, (ii) the possibility to manage a supplementary step with the pooling of samples or extracts and (iii) the storage of the statistical analyses using technical replicates (see the Data Analysis section). The complete description of the experiments is submitted via a private web interface that helps submitters comply with the MIAME recommendations. CATdb generates the SOFT (Simple Omnibus Format in Text) format developed by the GEO repository (Gene Expression Omnibus, http://www.ncbi.nlm.nih.gov/geo/). Data from the 1724 hybridizations in CATdb are also available at either GEO or ArrayExpress. The description of each project or experiment gives the corresponding accession number in GEO or ArrayExpress with a link to the respective web page. All the data submissions and analyses are performed in our laboratory, so the development of CATdb has been guided by the visual approaches used by biologists at URGV, like using colours to encode the data values, to facilitate the analysis and comparison of the data. The increasing number of research projects involving CATMA microarrays shows that the CATMA arrays are an important tool for biologists. CATdb gives public access to all the data produced by the URGV-CATMA platform, including data that remain unpublished after one year. CATdb contains 46 projects with 1724 hybridizations corresponding to 627 different samples. The samples of these projects concern 13 types of organs: cells (73 samples), protoplasts (10), roots (116), hypocotyls (24), stems (18), leaves (129), flowers (36), pollen (2), siliques (4), seeds (17), whole aerial plants (43), plantlets (39) and whole plants (116).
These samples are distributed between 49 different developmental conditions, 31 developmental stages, 45 mutants and 63 different abiotic/biotic stresses or treatments.

DATA ACCESS

CATdb is a free web resource available at the following address: http://urgv.evry.inra.fr/CATdb. There are four ways to select a subset of data. First, a list of all the available projects is displayed by default, and a shorter list may be obtained by querying the database by keywords. These keywords are searched for both in the description of the projects (coordinator name, experiment type, environmental or treatment factor, mutant name) and in the description of the samples (plant species, organs, treatments and type of arrays). Second, an experiment name may be selected in the project table, giving access to the entire description of the corresponding experiment including a picture of the experiment design (Figure 1A). The swap column gives access to all the results of hybridizations organized by dye-swap for the selected experiment. Normalized log2 intensities, log2-ratios and Bonferroni P-values are given for each probe (Figure 1B). As this table is rather large, only the probes with statistically significant differential expression for a dye-swap are displayed on the screen. Nevertheless, the complete table may be downloaded as a tabulated text file. Third, from either a gene or a probe accession, one may obtain signal intensities and Bonferroni P-values for all the dye-swaps processed in all the projects (Figure 2). Furthermore, data may be sorted by project, organ or any of the statistics. For each probe, the associated features, i.e. sequence, quality of PCR results and, if applicable, the tagged gene with functional annotation, are given. Fourth, from a list of genes or probe accessions, one obtains a table containing, for each selected probe, the log2-ratios for all the projects (Figure 3). The coloured display of the differential expression allows the comparison of the data for a list of up to 100 genes.
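The on-screen filter described above (only probes with a significant Bonferroni P-value are shown) can be reproduced on a downloaded table. The following is a minimal sketch only: the column names `probe`, `log2_ratio` and `bonferroni_pvalue` and the function name are hypothetical, since the actual layout of the CATdb tabulated file is not described here, and the 0.05 threshold is the one given in the Data Analysis section.

```python
import csv
import io


def significant_probes(tsv_text, p_column="bonferroni_pvalue", threshold=0.05):
    """Return the rows whose Bonferroni P-value is below the threshold.

    The column name is a hypothetical placeholder; rows with a missing or
    non-numeric P-value are skipped.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = []
    for row in reader:
        try:
            p_value = float(row[p_column])
        except (TypeError, ValueError):
            continue  # missing P-value (e.g. spot excluded from the test)
        if p_value < threshold:
            kept.append(row)
    return kept


# Made-up example table, tab-delimited, with hypothetical probe names.
example = (
    "probe\tlog2_ratio\tbonferroni_pvalue\n"
    "PROBE_A\t1.80\t0.001\n"
    "PROBE_B\t0.10\t1.0\n"
    "PROBE_C\t-2.10\t\n"
)
hits = significant_probes(example)
print([row["probe"] for row in hits])
```

With the made-up table above, only PROBE_A passes the filter; PROBE_C is skipped because its P-value is missing.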

Figure 1.

Results of a query of CATdb for an experiment called ‘Circadian cycle’ and belonging to the project named ‘AF30_Starch_circadian_rythm’. The result includes (A) the experimental design that describes all the hybridizations with two colours, red and green, indicating the dye used for labelling each sample and (B) a table displaying for each dye-swap of the experiment and from left to right, the log2-intensities for samples 1 and 2, the log2-ratio and the Bonferroni P-value. In the example, results are sorted by the ratio values in the dye-swap between the leaf sample T12, extracted at 12 h after the start of the experiment, and the leaf sample T9, extracted at 9 h.

Figure 2.

Results of a query of CATdb with the name of a gene. The query was the gene AT1G32900. Data are displayed for all the projects in CATdb, but only the first four projects sorted by the Bonferroni P-values are shown here. From the left to the right, columns contain the project and the experiment names, the array type, the organ used, the sample name, the log2-intensities for both samples, the log2-ratio and the Bonferroni P-value.

Figure 3.

Results of a query of CATdb with a list of gene identifiers. From the left to the right, columns are the probe name, the corresponding gene name, the differential expression log2-ratio in all the different projects contained in CATdb. The differential expression is colour coded in green, red or black corresponding to the levels of log2-ratio in each swap analysed, and in grey for missing values. The correspondence between the colours and the log2-ratio values is given in a bar above the table. By rolling over a cell of the table, as shown by the arrows, one may display more information about either the projects, dye-swaps or expression values, depending on the selected cell.

All the public data contained in CATdb can be downloaded from an anonymous FTP (File Transfer Protocol) site (ftp://urgv.evry.inra.fr/CATdb). Users who have subscribed to the CATdb e-mailing list receive news about updates and new tools.

DATA ANALYSIS

Statistical methods were developed in R (R Development Core Team, http://www.R-project.org) in collaboration with the group ‘Statistics and Genome’ at UMR AgroParisTech/INRA MIA 518 and are available in the R package ‘Anapuce’ on their web site (http://www.inapg.fr/ens_rech/maths/outil_A.html). For each CATMA array, the raw data include the logarithm of the median feature pixel intensity at wavelengths 635 nm (red) and 532 nm (green); no background is subtracted. A normalization per array is performed to remove systematic biases. First, spots that are considered badly formed features are excluded. Then, a global intensity-dependent normalization is performed using the lowess procedure (18) to correct the dye bias. Finally, for each block, the median log-ratio calculated over the entire block is subtracted from each individual log-ratio value to correct effects on each block, as well as print-tip, washing and/or drying effects. At the end of the normalization step, a normalized log-ratio, which is equivalent to an expression difference (in log base 2) between the two samples co-hybridized on the same array, is given for each spot. It is equal to the raw log-ratio minus the lowess correction minus the block correction. A normalized log intensity is also calculated for each sample, according to the within-array correction proposed by Yang and Thorne (19), which redistributes the correction calculated for the log2-ratio normalization over the two channels. To determine differentially expressed genes from a dye-swap, a paired t-test is performed on the log2-ratios. Since there are only two observations per spot, a spot-specific variance cannot be estimated reliably. For this reason, it is assumed that the variance of the log2-ratios is the same for all spots.
This solution has the main advantage of estimating the variance over a large number of data, leading to a robust estimate and to a gain in the power of the test. Nevertheless, it should be applied with some caution, since some spots display an extreme specific variance (too small or too large) that violates the assumption of common variance. Indeed, spots with too small a specific variance bias the estimate of the common variance downwards, which could increase the number of false positives, and spots with too large a specific variance bias it upwards, which could decrease the power of the test. For these reasons, spots with extreme specific variance are excluded from the statistical analysis. The excluded spots are those with a ‘specific variance/common variance’ ratio smaller than the alpha-quantile of a chi-squared distribution with one degree of freedom or greater than the (1-alpha)-quantile of the same distribution, with alpha equal to 0.0001. This rule stems from a direct application of Cochran's theorem. The raw P-values are adjusted by the Bonferroni method, which controls the Family Wise Error Rate (FWER) (20). When the Bonferroni P-value is lower than 0.05, the spot is declared differentially expressed. Spots with a missing P-value are spots with an extreme variance or genes for which only one observation is available, that is, when the spot corresponding to the gene was a badly formed feature on one of the two arrays.
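The block correction and the dye-swap test described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Anapuce implementation: the function names are hypothetical, the lowess step is not reimplemented (the block correction takes lowess-corrected log-ratios as input), the common variance is re-estimated after removing the extreme spots, the P-value uses a normal approximation (reasonable when the common variance is pooled over thousands of spots), and the Bonferroni factor is taken as the number of spots actually tested.

```python
from statistics import NormalDist, mean, median

_STD_NORMAL = NormalDist()


def chi2_1df_quantile(p):
    """Quantile of a chi-squared distribution with one degree of freedom.

    Uses the identity: if Z ~ N(0, 1), then Z**2 ~ chi-squared(1).
    """
    return _STD_NORMAL.inv_cdf((1.0 + p) / 2.0) ** 2


def block_median_correct(log_ratios, blocks):
    """Subtract the median log-ratio of each block from its spots."""
    by_block = {}
    for value, block in zip(log_ratios, blocks):
        by_block.setdefault(block, []).append(value)
    medians = {block: median(values) for block, values in by_block.items()}
    return [value - medians[block] for value, block in zip(log_ratios, blocks)]


def dye_swap_test(ratios_a, ratios_b, alpha_filter=0.0001, threshold=0.05):
    """Test every spot of a dye-swap with a variance common to all spots.

    ratios_a, ratios_b: normalized log2-ratios of the two arrays, with the
    swapped array already sign-flipped so both measure the same contrast.
    Spots with an extreme specific variance get p_bonferroni = None.
    """
    means = [(a + b) / 2.0 for a, b in zip(ratios_a, ratios_b)]
    # Sample variance of two observations (one degree of freedom).
    spec_var = [(a - b) ** 2 / 2.0 for a, b in zip(ratios_a, ratios_b)]
    common = mean(spec_var)
    low = chi2_1df_quantile(alpha_filter)
    high = chi2_1df_quantile(1.0 - alpha_filter)
    kept = [low <= v / common <= high for v in spec_var]
    # Assumption: re-estimate the common variance without the extreme spots.
    common = mean([v for v, keep in zip(spec_var, kept) if keep])
    n_tests = sum(kept)  # assumption: Bonferroni over tested spots only
    std_err = (common / 2.0) ** 0.5  # standard error of a mean of two values
    results = []
    for m, keep in zip(means, kept):
        if not keep:
            results.append({"log2_ratio": m, "p_bonferroni": None,
                            "significant": False})
            continue
        # Two-sided P-value under the normal approximation.
        p_raw = 2.0 * (1.0 - _STD_NORMAL.cdf(abs(m) / std_err))
        p_adj = min(1.0, p_raw * n_tests)
        results.append({"log2_ratio": m, "p_bonferroni": p_adj,
                        "significant": p_adj < threshold})
    return results
```

With alpha equal to 0.0001, the two cut-offs on the variance ratio are roughly 1.6e-8 and 15.1, so only spots whose two replicate ratios are almost identical or wildly discordant are set aside.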

DATA QUALITY

Information on the CATMA probes and the corresponding genes is available in CATdb. This includes probe sequences and their estimated specificity, amplification efficiency, and localization within genes (intron, exon) or between genes. All these annotations are graphically displayed in the genome database FLAGdb++ (21), and there are direct links from probes and genes in CATdb to the probe loci in FLAGdb++. To validate transcriptome data, biologists rely on quantitative RT-PCR applied to a set of genes exhibiting differential expression between two experimental situations. For the CATMA resource, quantitative RT-PCRs were done on more than 200 genes and CATMA results were confirmed in more than 90% of the validations. The details of RT-PCR for the tested genes are described in the publications associated with the different research projects using CATMA arrays. A list of these publications is available on the CATdb web site.

FUTURE PLANS

Based on the amount of not-yet-public data stored in CATdb (4336 hybridized samples), the number of public projects is expected to double in the coming year. Data release depends on the submission date of a project. As in most public repositories, data cannot be kept private for more than one year: all data are publicly released after this period, or earlier at the authors’ request. CATMA is an ongoing project and new array designs will be released soon, including 7189 new GSTs (collaboration with CATMA members) tagging the remaining annotated genes and different paralogues belonging to a gene family. Furthermore, probes for small RNA genes were designed by URGV in collaboration with O. Voinnet and L. Navarro (IBMP Strasbourg) and will be included in a future version. The CATdb developments required by the new designs are being carried out in parallel.

REFERENCES

1. Minimum information about a microarray experiment (MIAME)-toward standards for microarray data.

Authors: A Brazma; P Hingamp; J Quackenbush; G Sherlock; P Spellman; C Stoeckert; J Aach; W Ansorge; C A Ball; H C Causton; T Gaasterland; P Glenisson; F C Holstege; I F Kim; V Markowitz; J C Matese; H Parkinson; A Robinson; U Sarkans; S Schulze-Kremer; J Stewart; R Taylor; J Vilo; M Vingron
Journal: Nat Genet Date: 2001-12 Impact factor: 38.330

2. CIBEX: center for information biology gene expression database.

Authors: Kazuho Ikeo; Jun Ishi-i; Takurou Tamura; Takashi Gojobori; Yoshio Tateno
Journal: C R Biol Date: 2003 Oct-Nov Impact factor: 1.583

3. Automatic design of gene-specific sequence tags for genome-wide functional studies.

Authors: Vincent Thareau; Patrice Déhais; Carine Serizet; Pierre Hilson; Pierre Rouzé; Sébastien Aubourg
Journal: Bioinformatics Date: 2003-11-22 Impact factor: 6.937

4. FLAGdb++: a database for the functional analysis of the Arabidopsis genome.

Authors: Franck Samson; Véronique Brunaud; Sylvain Duchêne; Yannick De Oliveira; Michel Caboche; Alain Lecharny; Sébastien Aubourg
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

5. Microarray data representation, annotation and storage.

Authors: Alvis Brazma; Ugis Sarkans; Alan Robinson; Jaak Vilo; Martin Vingron; Jörg Hoheisel; Kurt Fellenberg
Journal: Adv Biochem Eng Biotechnol Date: 2002 Impact factor: 2.635

6. Development and evaluation of an Arabidopsis whole genome Affymetrix probe array.

Authors: Julia C Redman; Brian J Haas; Gene Tanimoto; Christopher D Town
Journal: Plant J Date: 2004-05 Impact factor: 6.417

7. The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community.

Authors: Seung Yon Rhee; William Beavis; Tanya Z Berardini; Guanghong Chen; David Dixon; Aisling Doyle; Margarita Garcia-Hernandez; Eva Huala; Gabriel Lander; Mary Montoya; Neil Miller; Lukas A Mueller; Suparna Mundodi; Leonore Reiser; Julie Tacklind; Dan C Weems; Yihe Wu; Iris Xu; Daniel Yoo; Jungwon Yoon; Peifen Zhang
Journal: Nucleic Acids Res Date: 2003-01-01 Impact factor: 16.971

8. CATMA: a complete Arabidopsis GST database.

Authors: Mark L Crowe; Carine Serizet; Vincent Thareau; Sébastien Aubourg; Pierre Rouzé; Pierre Hilson; Jim Beynon; Peter Weisbeek; Paul van Hummelen; Philippe Reymond; Javier Paz-Ares; Wilfried Nietfeld; Martin Trick
Journal: Nucleic Acids Res Date: 2003-01-01 Impact factor: 16.971

9. Versatile gene-specific sequence tags for Arabidopsis functional genomics: transcript profiling and reverse genetics applications.

Authors: Pierre Hilson; Joke Allemeersch; Thomas Altmann; Sébastien Aubourg; Alexandra Avon; Jim Beynon; Rishikesh P Bhalerao; Frédérique Bitton; Michel Caboche; Bernard Cannoot; Vasil Chardakov; Cécile Cognet-Holliger; Vincent Colot; Mark Crowe; Caroline Darimont; Steffen Durinck; Holger Eickhoff; Andéol Falcon de Longevialle; Edward E Farmer; Murray Grant; Martin T R Kuiper; Hans Lehrach; Céline Léon; Antonio Leyva; Joakim Lundeberg; Claire Lurin; Yves Moreau; Wilfried Nietfeld; Javier Paz-Ares; Philippe Reymond; Pierre Rouzé; Goran Sandberg; Maria Dolores Segura; Carine Serizet; Alexandra Tabrett; Ludivine Taconnat; Vincent Thareau; Paul Van Hummelen; Steven Vercruysse; Marnik Vuylsteke; Magdalena Weingartner; Peter J Weisbeek; Valtteri Wirta; Floyd R A Wittink; Marc Zabeau; Ian Small
Journal: Genome Res Date: 2004-10 Impact factor: 9.043

10. The Stanford Microarray Database: implementation of new analysis tools and open source release of software.

Authors: Janos Demeter; Catherine Beauheim; Jeremy Gollub; Tina Hernandez-Boussard; Heng Jin; Donald Maier; John C Matese; Michael Nitzberg; Farrell Wymore; Zachariah K Zachariah; Patrick O Brown; Gavin Sherlock; Catherine A Ball
Journal: Nucleic Acids Res Date: 2006-12-20 Impact factor: 16.971
