Sebastian Klie, Camila Caldana, Zoran Nikoloski.
Abstract
With the advent of high-throughput technologies for data acquisition from different components (i.e., genes, proteins, and metabolites) of a given biological system, the generation of hypotheses and biological interpretations based on multivariate data sets becomes increasingly important. These technologies allow for simultaneous gathering of data from the same biological components under different perturbations, including genotypic variation and/or changes in conditions, resulting in so-called multiple data tables. Moreover, these data tables are obtained over a well-chosen time domain to capture the dynamics of the response of the biological system to the perturbation. The computational problem we address in this study is twofold: (1) derive a single data table, referred to as a compromise, which captures information common to the investigated set of multiple tables and (2) identify biological components which contribute most to the determined compromise. Here we argue that recent extensions to principal component analysis called STATIS and dual-STATIS can be used to determine the compromise, to which classical techniques for data analysis, such as clustering and term over-enrichment, can subsequently be applied. In addition, we illustrate that STATIS and dual-STATIS facilitate interpretations of a publicly available transcriptomics data set capturing the time-resolved response of Arabidopsis thaliana to changing light and/or temperature conditions. We demonstrate that STATIS and dual-STATIS can be used not only to identify the components of a biological system whose behavior is similarly affected by the perturbation (e.g., in time or condition), but also to specify the extent to which each dimension of the data tables reflects the perturbation. These findings ultimately provide insights into the components and pathways which could be under tight control in plant systems.
Keywords: Arabidopsis; compromise of data tables; multi-way data analysis; transcriptomics time-series data
Year: 2012 PMID: 23162561 PMCID: PMC3499770 DOI: 10.3389/fpls.2012.00249
Source DB: PubMed Journal: Front Plant Sci ISSN: 1664-462X Impact factor: 5.753
Figure 1. Overview of the experimental conditions. Both light intensity (D = 0 μE, LL = 75 μE, L = 150 μE, and HL = 300 μE; *4-L = 80 μE) and temperature (4, 21, and 32°C) were varied, resulting in eight distinct environmental conditions including the control, denoted by 21-L.
Figure 2. STATIS analysis of the importance of tables, i.e., conditions. (A) Table weights and R coefficient allow assessing the influence of tables on the resulting compromise; the eigenvalues correspond to the variance captured by the principal components. (B) The interstructure is the projection of the individual conditions on the principal components (PC1 and PC2 on the left, PC1 and PC3 on the right), allowing characterization of the effects governing the separation of conditions.
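The quantities named in the Figure 2 caption can be illustrated with a minimal sketch. It assumes that each condition is a column-centered table with genes as rows and time points as columns, that tables are compared through the RV coefficient of their cross-product matrices (the "R coefficient" of the captions is assumed to be this quantity), and that the table weights are taken from the first eigenvector of the resulting similarity matrix. The function names and the use of power iteration are illustrative choices, not the authors' implementation.

```python
import math

def cross_product(table):
    """Return S = X X^T for a table given as a list of rows (observations)."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in table] for r1 in table]

def inner(a, b):
    """Frobenius inner product of two equally sized matrices."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def rv_coefficient(s1, s2):
    """Similarity of two cross-product matrices; 1 means identical up to scale."""
    return inner(s1, s2) / math.sqrt(inner(s1, s1) * inner(s2, s2))

def table_weights(rv, iterations=200):
    """First eigenvector of the RV matrix by power iteration, rescaled to sum to 1."""
    n = len(rv)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(rv[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    total = sum(v)
    return [x / total for x in v]

def compromise(tables):
    """Return (RV matrix, table weights, weighted sum of cross-product matrices).

    Assumption: all tables share the same rows (observations) and are
    already column-centered.
    """
    s = [cross_product(t) for t in tables]
    rv = [[rv_coefficient(a, b) for b in s] for a in s]
    alpha = table_weights(rv)
    n = len(s[0])
    comp = [[sum(alpha[k] * s[k][i][j] for k in range(len(s)))
             for j in range(n)] for i in range(n)]
    return rv, alpha, comp
```

In this sketch a table that resembles the others receives a larger weight, so the compromise is dominated by the structure that the conditions share. An eigen-decomposition of the returned compromise matrix would then give the principal components onto which the observations are projected.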
Figure 3. Heatmap and hierarchical clustering illustrating the similarity of the eight environmental conditions. The pairwise similarity is derived using the R coefficient of the corresponding cross-product matrices obtained for each condition. The hierarchical clustering was obtained using average linkage on the pairwise distances obtained using 1 − R.
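The clustering step described in the Figure 3 caption can be sketched as follows. The example assumes a precomputed symmetric similarity matrix with ones on the diagonal, converts it to distances as 1 − R, and merges clusters by average linkage. The function name and the pure-Python implementation are illustrative; a library routine would normally be used in practice.

```python
def average_linkage(similarity, labels):
    """Agglomerative clustering with average linkage on distances 1 - similarity.

    Returns the merges in order as (members_a, members_b, distance).
    """
    def dist(i, j):
        return 1.0 - similarity[i][j]

    clusters = [[i] for i in range(len(labels))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                pairs = len(clusters[a]) * len(clusters[b])
                d = sum(dist(i, j) for i in clusters[a] for j in clusters[b]) / pairs
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append(([labels[i] for i in clusters[a]],
                       [labels[i] for i in clusters[b]], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges
```

The order of the returned merges corresponds to reading a dendrogram from the leaves upward: conditions with the most similar cross-product matrices are joined first.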
Figure 4. Contributions of variables (i.e., time points 0–1280 min) by projection of the corresponding columns on the first and second principal components for each condition (panels 1–8) obtained by using the analysis based on STATIS. As the first time point (0 min) is the same across conditions, it resides at the origin. Arrows/trajectories illustrate the temporal progression of contributions of the variables.
Figure 5. Visualization of the compromise. Each point represents one of the 2,276 genes, i.e., observations, projected on the first and second principal components. Red (repression) and blue (induction) colors correspond to the direction of the average fold-change of expression levels (main panel); the color intensity corresponds to the magnitude of change. Genes that exhibit a significant contribution, as determined by bootstrapping procedures, are shown in dark gray, and non-significantly contributing genes in light gray (lower panel).
Overview of over-represented MapMan bins determined by an enrichment analysis using significantly contributing observations (i.e., genes) in the STATIS analysis at a significance level of 1%.
| MapMan bin | Description |
|---|---|
| 1.1.1 | PS.lightreaction.photosystem II |
| 1.1.30 | PS.lightreaction.state transition |
| 1.2.4.1 | PS.photorespiration.glycine cleavage.P subunit |
| 2.1.2 | Major CHO metabolism.synthesis.starch |
| 2.2.2.1 | Major CHO metabolism.degradation.starch.starch cleavage |
| 3.2.3 | Minor CHO metabolism.trehalose.potential TPS/TPP |
| 3.4.3 | Minor CHO metabolism.myo-inositol.InsP Synthases |
| 10.6.2 | Cell wall.degradation.mannan-xylose-arabinose-fucose |
| 10.7 | Cell wall.modification |
| 13.1.4.1.4 | Amino acid metabolism.synthesis.branched-chain group.common.branched-chain amino acid aminotransferase |
| 13.1.6.3.1 | Amino acid metabolism.synthesis.aromatic aa.phenylalanine.arogenate dehydratase/prephenate dehydratase |
| 14.15 | S-assimilation.AKN |
| 14.2 | S-assimilation.APR |
| 16.2.1 | Secondary metabolism.phenylpropanoids.lignin biosynthesis |
| 16.5.1.1.1 | Secondary metabolism.sulfur-containing.glucosinolates.synthesis.aliphatic |
| 16.5.1.1.4 | Secondary metabolism.sulfur-containing.glucosinolates.synthesis.shared |
| 16.5.1.2.1 | Secondary metabolism.sulfur-containing.glucosinolates.regulation.aliphatic |
| 16.8.1 | Secondary metabolism.flavonoids.anthocyanins |
| 16.8.2 | Secondary metabolism.flavonoids.chalcones |
| 16.8.3 | Secondary metabolism.flavonoids.dihydroflavonols |
| 17.1.1.1.10 | Hormone metabolism.abscisic acid.synthesis-degradation.synthesis.9-cis-epoxycarotenoid dioxygenase |
| 17.3.1.2.2 | Hormone metabolism.brassinosteroid.synthesis-degradation.sterols.SMT2 |
| 17.5.2 | Hormone metabolism.ethylene.signal transduction |
| 18.4.1 | Co-factor and vitamine metabolism.pantothenate.branched-chain amino acid aminotransferase |
| 26.21 | Misc.protease inhibitor/seed storage/lipid transfer protein (LTP) family protein |
| 26.25 | Misc.sulfotransferase |
| 26.3.2 | Misc.gluco-, galacto- and mannosidases.beta-galactosidase |
| 26.4.1 | Misc.beta 1,3 glucan hydrolases.glucan endo-1,3-beta-glucosidase |
| 26.8 | Misc.nitrilases, *nitrile lyases, berberine bridge enzymes, reticuline oxidases, troponine reductases |
| 26.9 | Misc.glutathione S transferases |
| 27.3.50 | RNA.regulation of transcription.General Transcription |
| 27.3.6 | RNA.regulation of transcription.bHLH,Basic Helix-Loop-Helix family |
| 27.3.80 | RNA.regulation of transcription.zf-HD |
| 28.1.3 | DNA.synthesis/chromatin structure.histone |
| 29.5.4 | Protein.degradation.aspartate protease |
| 30.2.17 | Signaling.receptor kinases.DUF 26 |
| 32 | Micro RNA, natural antisense etc |
| 34.13 | Transport.peptides and oligopeptides |
MapMan bins are sorted based on their bin number.
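How a bin might be flagged as over-represented can be illustrated with a one-sided hypergeometric test. This is a common choice for enrichment analysis, but it is an assumption here, since the test used by the authors is not stated in this record. The function name and the example counts below are hypothetical; only the total of 2,276 genes and the 1% significance level come from the text.

```python
from math import comb

def enrichment_p_value(total_genes, genes_in_bin, selected_genes, selected_in_bin):
    """One-sided hypergeometric p-value for over-representation.

    Probability of seeing at least `selected_in_bin` genes of a bin among
    `selected_genes` genes drawn without replacement from `total_genes`,
    of which `genes_in_bin` belong to the bin.
    """
    upper = min(genes_in_bin, selected_genes)
    favourable = sum(
        comb(genes_in_bin, k) * comb(total_genes - genes_in_bin, selected_genes - k)
        for k in range(selected_in_bin, upper + 1)
    )
    return favourable / comb(total_genes, selected_genes)

# Hypothetical counts for illustration only (not taken from the study):
# a bin with 20 of the 2,276 genes, 8 of which are among 200 significant genes.
p = enrichment_p_value(2276, 20, 200, 8)
is_enriched = p < 0.01  # 1% significance level, before any multiple-testing correction
```

When many bins are tested at once, a correction for multiple testing would normally be applied before comparing against the significance level.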
Figure 6. Dual-STATIS analysis of the data set. Here, the tables are transposed in order to characterize each condition by the time points (in this setting corresponding to observations). (A) Table weights and R coefficient discriminate the influence of tables. (B) The contribution of the individual time points (0–1280 min) to the principal components allows describing the progression of transcriptional adjustments in Arabidopsis over the complete time-course.
Figure 7. STATIS analysis of the importance of tables, i.e., conditions, of the re-normalized dataset. (A) Table weights and R coefficient allow assessing the influence of tables on the resulting compromise; the eigenvalues correspond to the variance captured by the principal components. (B) The interstructure is the projection of the individual conditions on the principal components (PC1 and PC2 on the left, PC1 and PC3 on the right), allowing characterization of the effects governing the separation of conditions.
Figure 8. (A) Contributions of variables (i.e., time points 0–1280 min) by projection of the corresponding columns on the first and second principal components for each condition (panels 1–8) obtained by using the re-normalized dataset. As the first time point (0 min) is the same across conditions, it resides at the origin. Arrows/trajectories illustrate the temporal progression of contributions of the variables. (B) Dual-STATIS analysis of the transposed and re-normalized dataset.
Overview of over-represented MapMan bins using significantly contributing observations (i.e., genes) in the STATIS analysis of the re-normalized dataset.
| MapMan bin | Description |
|---|---|
| 1.1.5.4 | PS.lightreaction.other electron carrier (ox/red).ferredoxin oxireductase |
| 2.1.2.2 | Major CHO metabolism.synthesis.starch.starch synthase |
| 2.1.2.3 | Major CHO metabolism.synthesis.starch.starch branching |
| 2.2.1.4 | Major CHO metabolism.degradation.sucrose.hexokinase |
| 2.2.2.3 | Major CHO metabolism.degradation.starch.glucan water dikinase |
| 8.1.9 | TCA/org transformation.TCA.malate DH |
| 10.2 | Cell wall.cellulose synthesis |
| 10.6.3 | Cell wall.degradation.pectate lyases and polygalacturonases |
| 10.8.99 | Cell wall.pectin*esterases.misc |
| 11.6 | Lipid metabolism.lipid transfer proteins etc |
| 16.2.1.3 | Secondary metabolism.phenylpropanoids.lignin biosynthesis.4CL |
| 21.2.1.2 | Redox.ascorbate and glutathione.ascorbate.GDP- |
| 23.5.4 | Nucleotide metabolism.deoxynucleotide metabolism.ribonucleoside-diphosphate reductase |
| 26.10 | Misc.cytochrome P450 |
| 26.2 | Misc.UDP glucosyl and glucoronyl transferases |
| 26.6 | Misc.O-methyl transferases |
| 27.3.3 | RNA.regulation of transcription.AP2/EREBP, APETALA2/Ethylene-responsive element binding protein family |
| 27.3.32 | RNA.regulation of transcription.WRKY domain transcription factor family |
| 29.2.1.1.3.2.35 | Protein.synthesis.ribosomal protein.prokaryotic.unknown organellar.50S subunit.L35 |
| 33.1 | Development.storage proteins |
| 34.2 | Transporter.sugars |
Note that this table contains bins which are derived in addition to those displayed in Table .