Marco Albrecht¹,², Damian Stichel³,⁴, Benedikt Müller⁵, Ruth Merkle⁶,⁷, Carsten Sticht⁸, Norbert Gretz⁸, Ursula Klingmüller⁶,⁷, Kai Breuhahn⁵, Franziska Matthäus³,⁹
Abstract
BACKGROUND: The analysis of microarray time series promises deeper insight into the dynamics of the cellular response following stimulation. A common observation in this type of data is that some genes respond with quick, transient dynamics, while others change their expression slowly over time. Existing methods for detecting significant expression dynamics often fail when the dynamics are highly heterogeneous. Moreover, these methods often cannot cope with irregular and sparse measurements.
Keywords: Differential expression; EGF; Gene ontology; Gene set analysis; Stimulation experiments; Time series
Year: 2017 PMID: 28088176 PMCID: PMC5237546 DOI: 10.1186/s12859-016-1440-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Trajectory of the transcriptomes. The axes are the first three principal components, which together explain 95.7% of the variability in the data. Each measurement point represents the entire transcriptome under one of three stimulation experiments, projected onto these components. At early time points, the three transcriptomes correlate very well with each other; over time, they diverge in a stimulus-dependent manner. Stimulus 1 leads to a strong change in the transcriptome, whereas stimulus 2 has a much smaller effect. Possible outliers are measurement points that lie far from the trajectory or from their replicates.
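The projection in Fig. 1 can be sketched with a plain principal component analysis. The sketch below assumes a samples × genes expression matrix (one row per measured transcriptome) and computes PCA via NumPy's SVD. The function name `project_transcriptomes` is hypothetical and this is not the authors' code; it only shows how PC coordinates and the explained-variance fraction (the 95.7% figure in the caption) are obtained.

```python
import numpy as np

def project_transcriptomes(X, n_components=3):
    """Project samples (rows) of a samples x genes matrix onto the leading PCs.

    Returns the projected coordinates and the fraction of the total variance
    explained by the retained components.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each gene across samples
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T        # coordinates in PC space
    var = s ** 2                             # variance carried by each PC (up to a constant)
    explained = var[:n_components].sum() / var.sum()
    return scores, explained
```

Plotting the rows of `scores` per stimulus and in time order gives one trajectory per experiment; points far from their trajectory or from their replicates are the outlier candidates mentioned in the caption.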
Fig. 2 Score characteristics. a) Dynamics score. The solid line represents the alternative hypothesis and the dashed line the null hypothesis (picture source [6]). b) Peak score, based on the largest distance (arrow) between measurement points for two different stimuli. The solid line represents the fit obtained by quantile regression with Eq. (1). c) Integral score. The area between two dynamics indicates the absolute change in mRNA production; it can be computed for different time intervals. d) Score distributions after z-transformation and the merged consensus score distribution.
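A minimal sketch of panels b) to d) follows, under stated assumptions: the two profiles (stimulus vs. control) are already fitted (e.g. by quantile regression) and evaluated on a shared time grid; the peak score is taken as the largest absolute difference between them; the integral score is the trapezoidal area between the curves over an optional interval; and the consensus is the per-gene mean of z-transformed scores. The merging rule and all function names here are hypothetical and may differ from the paper's Eq. (1) and its consensus procedure. The dynamics score of panel a) is not covered.

```python
import numpy as np

def peak_score(fit_s, fit_c):
    """Largest absolute distance between two fitted profiles on a shared time grid."""
    return float(np.max(np.abs(np.asarray(fit_s, dtype=float) - np.asarray(fit_c, dtype=float))))

def integral_score(t, fit_s, fit_c, t_start=None, t_end=None):
    """Trapezoidal area between two profiles, optionally restricted to [t_start, t_end].

    Integrates |fit_s - fit_c| sampled at the grid points, so a sign change
    between two grid points is only approximated.
    """
    t = np.asarray(t, dtype=float)
    d = np.abs(np.asarray(fit_s, dtype=float) - np.asarray(fit_c, dtype=float))
    mask = np.ones_like(t, dtype=bool)
    if t_start is not None:
        mask &= t >= t_start
    if t_end is not None:
        mask &= t <= t_end
    t, d = t[mask], d[mask]
    return float(np.sum((d[1:] + d[:-1]) / 2.0 * np.diff(t)))

def consensus_score(score_table):
    """z-transform each score type across genes, then average per gene.

    score_table: genes x score-types array (e.g. columns = peak, integral, ...).
    """
    S = np.asarray(score_table, dtype=float)
    Z = (S - S.mean(axis=0)) / S.std(axis=0)
    return Z.mean(axis=1)
```

The z-transformation puts scores with different units (a distance, an area) on a common scale before merging, which is what panel d) illustrates.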
Summary result table. The instability of SNORA11 is confirmed and its effect size is high, which indicates a false positive result; the SNORA11 profile plotted in Fig. 4 supports this suspicion. The effect size of the peak score covers up to 26% of the detection range.
| Consensus rank | Gene name | Consensus score | Consensus score | PubMed | Instability score | Effect size of peak score |
|---|---|---|---|---|---|---|
| 1 | CTGF | 1.00 | 3.57E-05 | 73 | 0.009 | 0.26 |
| 2 | EGR1 | 0.91 | 7.25E-05 | 101 | 0.006 | 0.23 |
| 3 | SNORA11 | 0.62 | 8.18E-04 | 0 | 0.038 | 0.26 |
| 4 | PTGS2 | 0.59 | 0.001 | 804 | 0.009 | 0.10 |
| 5 | JUN | 0.58 | 0.001 | 6789 | 0.005 | 0.11 |
| 6 | GLIPR1 | 0.57 | 0.001 | 0 | 0.006 | 0.13 |
| 7 | FOS | 0.55 | 0.002 | 920 | 0.002 | 0.14 |
| 8 | AREG | 0.53 | 0.002 | 549 | 0.006 | 0.10 |
| 13 | MIR4320 | 0.44 | 0.005 | 0 | 0.016 | 0.15 |
| 15 | F3 | 0.44 | 0.006 | 65 | 0.011 | 0.10 |
| 19 | IL8 | 0.41 | 0.007 | 43 | 0.018 | 0.13 |
| 20 | EGR2 | 0.41 | 0.008 | 7 | 0.005 | 0.12 |
| 21 | PCNA | 0.40 | 0.009 | 583 | 0.003 | 0.03 |
| 29 | DUSP5 | 0.37 | 0.013 | 4 | 0.012 | 0.10 |
| 36 | MYC | 0.34 | 0.017 | 984 | 0.002 | 0.06 |
| 37 | ROS1 | 0.34 | 0.017 | 84 | 0.005 | 0.03 |
| 38 | HIF1A | 0.34 | 0.017 | 185 | 0.007 | 0.08 |
| 42 | MIR554 | 0.34 | 0.018 | 0 | 0.004 | 0.15 |
| 45 | IL24 | 0.32 | 0.022 | 0 | 0.003 | 0.06 |
| 49 | TGFB2 | 0.31 | 0.025 | 121 | 0.008 | 0.04 |
| 51 | TGFB1 | 0.30 | 0.027 | 887 | 0.004 | 0.03 |
| 52 | JUNB | 0.30 | 0.028 | 54 | 0.008 | 0.05 |
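The table can be screened for suspicious hits as sketched below. The rows are copied from the table above (a subset); the cutoff of 0.02 on the instability score is a hypothetical value chosen for illustration, not a threshold from the paper. The `effect_size` helper assumes that the effect size is the peak-score distance divided by the width of the detection range, which is how the caption's "26% of the detection range" reads; both function names are hypothetical.

```python
# (consensus rank, gene, instability score, effect size of peak score), from the table above
TOP_HITS = [
    (1, "CTGF", 0.009, 0.26),
    (2, "EGR1", 0.006, 0.23),
    (3, "SNORA11", 0.038, 0.26),
    (4, "PTGS2", 0.009, 0.10),
    (5, "JUN", 0.005, 0.11),
    (13, "MIR4320", 0.016, 0.15),
    (19, "IL8", 0.018, 0.13),
]

INSTABILITY_CUTOFF = 0.02  # hypothetical threshold, not taken from the paper

def effect_size(peak_distance, detect_min, detect_max):
    """Peak-score distance expressed as a fraction of the detection range (assumed definition)."""
    return peak_distance / (detect_max - detect_min)

def flag_unstable(hits, cutoff=INSTABILITY_CUTOFF):
    """Return the gene names whose instability score exceeds the cutoff."""
    return [gene for _, gene, instability, _ in hits if instability > cutoff]

print(flag_unstable(TOP_HITS))  # prints ['SNORA11']
```

With this cutoff only SNORA11 is flagged, consistent with the captions identifying it as the false positive among the top hits.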
Fig. 4 Time course profiles of genes considered significant. Red: with EGF stimulation. Blue: control. Line: quantile regression fit. Points: measurements. SNORA11 is ranked as highly significant, but its instability score is high and identifies this finding as a false positive.
Fig. 3 Gene set analysis. a) Example analysis result for the average of n genes. b) Scheme for the minimal overlap calculation. The continuous lines represent the average expression of the gene group at one time point for either the stimulated sample (S) or the control (C). The dotted lines represent the upper (Sd) and lower (Sd) standard-deviation bounds around the average.
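The overlap scheme of panel b) can be sketched as follows, assuming each group is a genes × time-points matrix, the band at each time point is the group mean plus or minus one sample standard deviation across genes, and the overlap is the signed length of the intersection of the stimulated and control bands (negative values mean the bands are separated by a gap). Taking the minimum over time points as the "minimal overlap" is an assumption for illustration; the function names are hypothetical.

```python
import numpy as np

def band(expr):
    """Mean and sample standard deviation across genes (rows) at each time point (columns)."""
    expr = np.asarray(expr, dtype=float)
    return expr.mean(axis=0), expr.std(axis=0, ddof=1)

def band_overlap(stim, ctrl):
    """Signed overlap of the mean +/- sd bands per time point (negative = gap)."""
    s_mean, s_sd = band(stim)
    c_mean, c_sd = band(ctrl)
    upper = np.minimum(s_mean + s_sd, c_mean + c_sd)  # lower of the two upper bounds
    lower = np.maximum(s_mean - s_sd, c_mean - c_sd)  # higher of the two lower bounds
    return upper - lower

def minimal_overlap(stim, ctrl):
    """Smallest band overlap over all time points (assumed aggregation)."""
    return float(np.min(band_overlap(stim, ctrl)))
```

A strongly negative minimal overlap indicates that, at some time point, the stimulated and control group averages are separated by more than their combined spread.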