Philip Zimmermann1, Stefan Bleuler1, Oliver Laule1, Florian Martin2, Nikolai V Ivanov2, Prisca Campanoni2, Karen Oishi2, Nicolas Lugon-Moulin2, Markus Wyss1, Tomas Hruz3, Wilhelm Gruissem4.
Abstract
Reference datasets are often used to compare, interpret or validate experimental data and analytical methods. In the field of gene expression, several reference datasets have been published. Typically, they consist of individual baseline or spike-in experiments carried out in a single laboratory and representing a particular set of conditions. Here, we describe a new type of standardized dataset that is representative of the spatial and temporal dimensions of gene expression. These datasets result from integrating expression data from a large number of globally normalized and quality-controlled public experiments. Expression data is aggregated by anatomical part or stage of development to yield a representative transcriptome for each category. For example, we created a genome-wide expression dataset representing the FDA tissue panel across 35 tissue types. The proposed datasets were created for human and several model organisms and are publicly available at http://www.expressiondata.org.
Year: 2014 PMID: 25228922 PMCID: PMC4165432 DOI: 10.1186/1756-0381-7-18
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
Figure 1. Data transformation process from public repositories to unified data queried by GENEVESTIGATOR or to reference datasets available at http://www.expressiondata.org. A. Manual curation of public data, including sample annotation using ontologies, statistical quality control of samples and experiments, and data normalization. B. Storage of high-quality standardized expression experiments in the GENEVESTIGATOR database. C. Aggregation of expression data according to categories of anatomical parts, cell lines, neoplasms, perturbations (including diseases, drugs and genotypes) and stages of development. While all datasets are available through GENEVESTIGATOR, selected compiled reference datasets, such as those for anatomy or development, are made publicly available through http://www.expressiondata.org.
Figure 2. Principal component analysis of mouse tissues and organs (left: components 1 versus 2; right: components 1 versus 3). Major organ systems are colored, while individual tissues are numbered as follows: (1) cell culture / primary cell, (2) fibroblast, (3) myoblast, (4) adipocyte, (5) hindlimb skeletal muscle, (6) gastrocnemius, (7) muscle skeletal muscle, (8) diaphragm, (9) heart, (10) heart ventricle, (11) heart left ventricle, (12) haemolymphoid system, (13) blood, (14) leukocyte (white blood cell), (15) thymus, (16) integumental system, (17) mammary gland (breast), (18) telencephalon (cerebrum), (19) cerebral cortex (neopallium), (20) hippocampus, (21) cerebellum, (22) eye, (23) intestine, (24) liver, (25) reproductive system, (26) testis (male gonad), (27) lung, (28) endocrine system, (29) pancreas, (30) adipose tissue (fat), (31) bone marrow, (32) mean of all tissues.
Figure 3. Principal component analysis of Drosophila and mouse developmental stages. The expression vector for each stage represents an average of all samples annotated as belonging to that stage of development. Results were processed from data generated with the Affymetrix microarray platforms Drosophila Genome 2.0 (1431 samples) and Murine Genome U74Av2 (2357 samples).
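The per-stage averaging described in this caption can be sketched as follows. This is only a minimal illustration that assumes a plain arithmetic mean per gene; the function name `average_by_category`, the stage labels, the gene names and all values are hypothetical and do not come from the published datasets or from GENEVESTIGATOR.

```python
from collections import defaultdict
from statistics import fmean


def average_by_category(samples):
    """Average expression per gene within each annotated category.

    `samples` is an iterable of (category, {gene: signal}) pairs, where the
    category is the annotation of a sample (e.g. a stage of development).
    Assumption: a simple arithmetic mean is used for the aggregation.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for category, expression in samples:
        for gene, value in expression.items():
            grouped[category][gene].append(value)
    # One representative expression vector per category.
    return {
        category: {gene: fmean(values) for gene, values in genes.items()}
        for category, genes in grouped.items()
    }


# Hypothetical toy input: three samples annotated with two stages.
samples = [
    ("embryo", {"geneA": 8.0, "geneB": 5.0}),
    ("embryo", {"geneA": 10.0, "geneB": 7.0}),
    ("adult", {"geneA": 4.0, "geneB": 6.0}),
]
profiles = average_by_category(samples)
```

Each entry of `profiles` is then one expression vector per stage, which is the kind of input a principal component analysis would take.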
Figure 4. Oscillatory pattern of expression in Arabidopsis. A) Three genes with different patterns of oscillation are shown: CCA1 (AT2G46830; circadian clock associated 1; in red), CCR1 (AT1G15950; cinnamoyl CoA reductase 1; in green), and CAT3 (AT1G20620; catalase 3; in blue). Signal values represent expression levels as normalized using RMA. Open circles represent expression not significantly above background (p > 0.06, as obtained from the MAS5 p-values for detection calls). B) Twelve genes with different phase shifts in the circadian oscillation pattern relative to the first gene. Genes were clustered in GENEVESTIGATOR using hierarchical clustering with optimal leaf ordering.
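The open-circle rule in panel A amounts to a threshold on the detection p-value. A minimal sketch, assuming the signal values and detection p-values are already available as parallel lists; the function name `flag_detected` and the numbers are hypothetical, and only the 0.06 cutoff is taken from the caption.

```python
def flag_detected(values, p_values, threshold=0.06):
    """Pair each signal value with a flag telling whether it is detected.

    A value counts as detected when its detection p-value is at or below
    the threshold. Values with p > threshold correspond to the open circles
    (expression not significantly above background).
    """
    if len(values) != len(p_values):
        raise ValueError("values and p_values must have the same length")
    return [(v, p <= threshold) for v, p in zip(values, p_values)]


# Hypothetical time points: signal values with their detection p-values.
flags = flag_detected([7.2, 3.1, 6.4], [0.01, 0.30, 0.06])
```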
Figure 5. Non-circadian time course gene expression of two early cold-induced exons. A. Putative C-repeat-binding factor 4/dehydration-responsive element-binding protein 1D (CBF4/DREB1D). B. Putative Ethylene-Responsive element binding ERF2.