Debashis Sahoo, David L Dill, Rob Tibshirani, Sylvia K Plevritis.
Abstract
This article presents a new method for analyzing microarray time courses by identifying genes that undergo abrupt transitions in expression level, and the time at which the transitions occur. The algorithm matches the sequence of expression levels for each gene against temporal patterns having one or two transitions between two expression levels. The algorithm reports a P-value for the matching pattern of each gene, and a global false discovery rate can also be computed. After matching, genes can be sorted by the direction and time of transitions. Genes can be partitioned into sets based on the direction and time of change for further analysis, such as comparison with Gene Ontology annotations or binding site motifs. The method is evaluated on simulated and actual time-course data. On microarray data for budding yeast, it is shown that the groups of genes that change in similar ways and at similar times have significant and relevant Gene Ontology annotations.
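The pattern matching described in the abstract can be illustrated with a minimal Python sketch of the one-step case. This is not the authors' implementation: the function names (`fit_one_step`, `f_statistic`) and the default `model_df=3` are assumptions made for illustration only. The sketch tries every possible transition position, fits a separate mean on each side by least squares, and summarizes the fit with a regression F-statistic, from which a P-value could be obtained using the F distribution.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations of `values` from their mean."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def fit_one_step(x):
    """Least-squares fit of a single step to the series `x`.

    Tries every split position k (first k points on one level, the rest
    on another) and keeps the split with the smallest residual error.
    """
    best_k, best_err = None, None
    for k in range(1, len(x)):
        err = sse(x[:k]) + sse(x[k:])
        if best_err is None or err < best_err:
            best_k, best_err = k, err
    left, right = fmean(x[:best_k]), fmean(x[best_k:])
    return {
        "position": best_k,  # index of the first point after the step
        "direction": "up" if right > left else "down",
        "left": left,
        "right": right,
        "sse": best_err,
    }


def f_statistic(x, sse_fit, model_df=3):
    """Regression F-statistic for a fitted step.

    `model_df` (degrees of freedom of the step model) is an assumed
    value here, not one taken from the source.
    """
    n = len(x)
    ss_regression = sse(x) - sse_fit
    if sse_fit == 0.0:
        return float("inf")  # perfect fit
    return (ss_regression / (model_df - 1)) / (sse_fit / (n - model_df))


# Hypothetical expression series with a low-to-high transition.
series = [0.1, -0.2, 0.0, 0.1, 5.1, 4.9, 5.0, 5.2]
fit = fit_one_step(series)
print(fit["position"], fit["direction"])  # 4 up
print(f_statistic(series, fit["sse"]))
```

A two-step pattern could be handled the same way by searching over pairs of positions and fitting one mean to the middle segment and one shared mean to the two outer segments.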
Year: 2007 PMID: 17517782 PMCID: PMC1920252 DOI: 10.1093/nar/gkm284
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Signals of interest. Different types of binary temporal patterns that need to be extracted from the time-course microarray data. (a) Gene expression transitions from a low value to a high value. (b) Gene expression transitions from a high value to a low value. (c) Gene expression transitions from low to high and returns to the same low value. (d) Gene expression transitions from high to low and returns to the same high value.
Figure 2.Estimating degrees of freedom. (Left) Estimating degrees of freedom for the one-step model. (Right) Estimating degrees of freedom for the two-step model.
Figure 3. Evaluating StepMiner on artificial data. (A) Proportion of correctly classified steps using 15 time points and different step heights, with the step position fixed at 5 for one-step patterns. For two-step patterns, the step positions were ‘up’ at 5 and ‘down’ at 9. σ is the SD of the zero-mean additive Gaussian noise. The proportion of false steps found in random Gaussian data was 10%. (B) Proportion of correctly classified steps using different step heights with random step position and 15 time points. σ is the SD of the zero-mean additive Gaussian noise. Ten percent of the steps are false. (C) Sensitivity of StepMiner to the number of time points, using random step positions and step height 5σ. A total of 2000 one-step and 2000 two-step functions were used in the analysis. (D) Sensitivity of StepMiner to the spacing between steps. The first step is fixed at the fourth position and the second step is varied according to the spacing. The height of the step is varied from 1σ to 5σ in a data set of 15 time points.
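The artificial data described in the Figure 3 caption can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the function names are hypothetical, and the caption's step positions are assumed to be 0-based indices of the first point at the new level.

```python
import random


def simulate_one_step(n_points=15, position=5, height=5.0, sigma=1.0, rng=None):
    """Low-to-high step at `position` plus zero-mean Gaussian noise.

    With the defaults the step height is 5 times the noise SD (5 sigma).
    """
    rng = rng or random.Random()
    return [
        (height if t >= position else 0.0) + rng.gauss(0.0, sigma)
        for t in range(n_points)
    ]


def simulate_two_step(n_points=15, up=5, down=9, height=5.0, sigma=1.0, rng=None):
    """Step 'up' at `up` and back 'down' at `down`, plus Gaussian noise."""
    rng = rng or random.Random()
    return [
        (height if up <= t < down else 0.0) + rng.gauss(0.0, sigma)
        for t in range(n_points)
    ]


# A seeded generator makes the simulated series reproducible.
rng = random.Random(0)
one_step = simulate_one_step(rng=rng)
two_step = simulate_two_step(rng=rng)
print(len(one_step), len(two_step))  # 15 15
```

Varying `height` relative to `sigma` reproduces the kind of sweep over step heights (1σ to 5σ) that the caption describes.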
Figure 4.Application of StepMiner to real microarray time course data. Comparison of StepMiner to hierarchical clustering for the analysis of diauxic shift time course microarray data on glucose-limited budding yeast. The expression level of each gene in StepMiner is centered around the midpoint of the step to display the transitions clearly. Fitted steps for three example genes are shown on the right.
GO annotations of different groups and P-values according to the GO-TermFinder Perl module
| GO Annotations | Group | P-value | P-value1 |
|---|---|---|---|
| Protein biosynthesis | Down-9.25 | 3.4 | 9.7 |
| Ribosome biogenesis and assembly | Down | 1.2 | 1.4 |
| Generation of precursor metabolites and energy | Up | 7.4 | 6.1 |
| Oxidative phosphorylation | Up | 4.9 | 6 |
| Amino acid and derivative metabolism | Up-Down | 1.7 | 6.2 |
| Amine biosynthesis | U-D-9 | 1.7 | 1.1 |
| Hexose catabolism | U-8.25-D | 0.00046 | 0.044 |
| Monosaccharide catabolism | U-8.25-D | 0.0012 | 0.091 |
| Siderophore transport | – | – | 0.013 |
| Intracellular transport | – | – | 1.6 |
| Secretory pathways | – | – | 1.5 |
‘P-value1’ is the P-value using the list of genes from the clusters reported by Brauer et al. ‘Down-9.25’ uses all the genes that turn off significantly at the 9.25 h time step. ‘Up’ uses all the genes that turn on at some time step. ‘U-D-9’ uses the list of genes that turn on at some point before 9 h but turn off at 9 h. ‘U-8.25-D’ genes turn on at 8.25 h and turn off later.
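Group labels such as ‘Down-9.25’ come from partitioning genes by the direction and time of their fitted transition. A minimal sketch of that bookkeeping is shown below; the function names, the gene names, and the `(direction, time)` input format are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict


def group_label(direction, time_h):
    """Build a label such as 'Down-9.25' from a direction and a time in hours."""
    return f"{direction.capitalize()}-{time_h:g}"


def partition_by_transition(fits):
    """Group genes that share the same transition direction and time.

    `fits` maps a gene name to a (direction, time_in_hours) pair, as might
    be produced by a step-fitting procedure.
    """
    groups = defaultdict(list)
    for gene, (direction, time_h) in fits.items():
        groups[group_label(direction, time_h)].append(gene)
    return dict(groups)


# Hypothetical fitted transitions for three placeholder genes.
fits = {
    "geneA": ("down", 9.25),
    "geneB": ("down", 9.25),
    "geneC": ("up", 8.25),
}
print(partition_by_transition(fits))
# {'Down-9.25': ['geneA', 'geneB'], 'Up-8.25': ['geneC']}
```

Each resulting gene list could then be passed to an annotation tool for enrichment analysis.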
Identification of steps and average deviation from the true step positions by StepMiner with replication versus the addition of more time points
| Type | True Step | Missed Step | False Step | Average Deviation (min) |
|---|---|---|---|---|
| Addition | 99% | 1% | 8% | 11 |
| Replication | 100% | 0% | 8% | 34 |
The Addition method uses 30 different time points. The Replication method uses 10 time points with three replicates. The analysis was performed on artificial data for 1000 genes with 500 single steps (step height = 5σ) placed uniformly at random across the time course of 10 h. ‘True Step’ is the percentage of correctly identified steps. ‘Missed Step’ is the percentage of steps missed. ‘False Step’ is the percentage of steps detected in random data.