Mikhail G Dozmorov, Kimberly D Kyker, Paul J Hauser, Ricardo Saban, David D Buethe, Igor Dozmorov, Michael B Centola, Daniel J Culkin, Robert E Hurst.
Abstract
A statistically robust and biologically based approach for analysis of microarray data is described that integrates independent biological knowledge and data with a global F-test for finding genes of interest; when used for hypothesis generation, it minimizes the need for replicates. First, each microarray is normalized to its noise level around zero. The microarray dataset is then globally adjusted by robust linear regression. Second, genes of interest that capture significant responses to experimental conditions are selected by finding those that show significantly higher variance than genes exhibiting only technical variability. Clustering the expression data and identifying expression-independent properties of the genes of interest, including upstream transcriptional regulatory elements (TREs), ontologies, and networks or pathways, organizes the data into a biologically meaningful system. We demonstrate that when the number of genes of interest is inconveniently large, identifying a subset of "beacon genes" representing the largest changes will identify pathways or networks altered by biological manipulation. The entire dataset is then used to complete the picture outlined by the "beacon genes." This allows construction of a structured model of a system that can generate biologically testable hypotheses. We illustrate this approach by comparing cells cultured on plastic or on an extracellular matrix, organizing a dataset of over 2,000 genes of interest from a genome-wide scan of transcription. The resulting model was confirmed by comparing the predicted pattern of TREs with experimental determination of active transcription factors.
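The global adjustment step ("globally adjusted by robust linear regression") can be illustrated with a minimal sketch. The abstract does not name the robust estimator or the reference array, so this example assumes a Theil-Sen fit (median of pairwise slopes) of each array against the per-gene median across arrays; `theil_sen` and `adjust_arrays` are hypothetical helper names, not the authors' code.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Return (intercept, slope) of a Theil-Sen fit of y on x.

    Assumption: Theil-Sen stands in for the unspecified robust regression.
    The slope is the median of all pairwise slopes, so a minority of
    outlying points (e.g. truly changed genes) barely moves the fit.
    """
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope


def adjust_arrays(arrays):
    """Map every array onto a common reference scale.

    arrays: list of equal-length lists, one per microarray, same gene order.
    Hypothetical reference: the per-gene median across all arrays.
    Each array is fitted as array = intercept + slope * reference and then
    mapped back with (value - intercept) / slope.
    """
    reference = [median(values) for values in zip(*arrays)]
    adjusted = []
    for arr in arrays:
        intercept, slope = theil_sen(reference, arr)
        adjusted.append([(v - intercept) / slope for v in arr])
    return adjusted
```

The pairwise-slope loop is quadratic in the number of genes, so for a genome-wide array a random subsample of genes, or `scipy.stats.theilslopes`, would be the more practical choice.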
Year: 2008 PMID: 18793468 PMCID: PMC2537575 DOI: 10.1186/1471-2105-9-S9-S4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Schematic diagram of steps in microarray analysis.
Figure 2. Identification of noise level and microarray adjustment. A) Frequency histogram of gene expression levels from one microarray dataset. The first peak in the bimodal distribution represents the normal distribution of system noise centered around zero. Genes expressed 3 SD above the noise level are defined as expressed genes. B) Box plots of microarray datasets before and after linear regression; values are log10-transformed.
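The noise-level rule in Figure 2A (a normal noise peak near zero, with expressed genes lying 3 SD above it) can be sketched in plain Python. How the noise peak is located here (tallest histogram bin) and how its SD is estimated (from the left flank only, treated as half of a normal distribution) are assumptions for illustration, not the paper's documented procedure; `bin_width` and all function names are hypothetical.

```python
import math
from collections import Counter


def noise_parameters(values, bin_width=0.1):
    """Estimate the center (mu) and SD (sigma) of the noise peak.

    Assumptions (hypothetical): the noise peak is the tallest histogram bin,
    and the left flank of that peak is free of expressed genes, so sigma is
    estimated from values at or below mu only (half-normal estimate).
    """
    bin_of = [math.floor(v / bin_width) for v in values]
    counts = Counter(bin_of)
    mode_bin = max(counts, key=counts.get)
    # Center: mean of the values that fall in the tallest bin.
    in_mode = [v for v, b in zip(values, bin_of) if b == mode_bin]
    mu = sum(in_mode) / len(in_mode)
    # Spread: root-mean-square deviation of the left flank around mu.
    left = [v for v in values if v <= mu]
    sigma = math.sqrt(sum((v - mu) ** 2 for v in left) / len(left))
    return mu, sigma


def normalize_to_noise(values, bin_width=0.1):
    """Center an array on its noise peak and express values in noise-SD units."""
    mu, sigma = noise_parameters(values, bin_width)
    return [(v - mu) / sigma for v in values]


def expressed_genes(normalized, n_sd=3.0):
    """Indices of genes lying more than n_sd noise SDs above zero."""
    return [i for i, z in enumerate(normalized) if z > n_sd]
```

Using only the left flank avoids inflating sigma with the second (expressed-gene) peak of the bimodal distribution.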
Figure 3. Identification of hypervariable and differentially expressed genes. A) Frequency histogram of gene variances across the time course. The normal distribution of low-variance genes is identified from the left part of the histogram; the white vertical line marks the threshold for hypervariable genes, set 3.8 SD above the distribution of constant genes that show only technical variability. B) Log10 ratio of average gene expression of cells grown on plastic and on crECM, presented as a frequency histogram. Ratio values > 2 or < -2 were truncated and set to 2 and -2, respectively.
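Both panels of Figure 3 reduce to simple computations, sketched below. The caption does not say exactly which genes form the "left part of the histogram", so this example assumes, crudely, that the lowest `constant_fraction` of variances approximates the constant-gene distribution. The ratio direction (plastic over crECM) follows the caption's wording but is also an assumption, and the log requires positive mean expression. All names are hypothetical.

```python
import math
from statistics import mean, stdev, variance


def hypervariable_genes(profiles, constant_fraction=0.5, n_sd=3.8):
    """Flag genes whose variance across the time course exceeds technical noise.

    profiles: list of per-gene expression lists (one value per sample).
    Assumption (hypothetical): the lowest `constant_fraction` of variances
    stands in for the normal distribution of constant genes.
    Returns (indices of hypervariable genes, variance threshold).
    """
    variances = [variance(p) for p in profiles]
    ranked = sorted(variances)
    constant = ranked[: max(2, int(len(ranked) * constant_fraction))]
    threshold = mean(constant) + n_sd * stdev(constant)
    return [i for i, v in enumerate(variances) if v > threshold], threshold


def truncated_log10_ratio(plastic, ecm, limit=2.0):
    """log10(mean plastic / mean crECM), clipped to [-limit, limit].

    Assumes both means are positive.
    """
    ratio = math.log10(mean(plastic) / mean(ecm))
    return max(-limit, min(limit, ratio))
```

Clipping at plus or minus 2 only affects display in the histogram; it does not change which genes are called hypervariable.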
Figure 4. Visualization of changes in gene expression level identified by microarray analysis. A) Schematic graphs of the main changes observed in the system from clustering of hypervariable genes. B) Part of the clustering heatmap of "beacon" state-change genes: P 1, P 2 – duplicate gene expression profiles of RT4 cells grown on plastic; M 0.5, M 1, M 2, etc. – cells grown on crECM for the indicated number of days. Red/green intensity indicates the level of up-/downregulation of gene expression, respectively.
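Selecting "beacon" genes, the subset of genes of interest with the largest changes, can be sketched as a ranking step. The source does not state which change statistic was ranked or how many beacon genes were kept, so ranking by absolute log10 ratio and the cutoff `n` are assumptions; `beacon_genes` is a hypothetical name.

```python
def beacon_genes(log_ratios, genes_of_interest, n=50):
    """Return up to n genes of interest with the largest absolute log ratio.

    log_ratios: dict mapping gene symbol -> log10 expression ratio.
    genes_of_interest: iterable of gene symbols (e.g. hypervariable genes).
    Genes without a ratio are skipped. `n` is a hypothetical cutoff.
    """
    candidates = [g for g in genes_of_interest if g in log_ratios]
    # Largest absolute change first; sort is stable, so ties keep input order.
    candidates.sort(key=lambda g: abs(log_ratios[g]), reverse=True)
    return candidates[:n]
```

Restricting the ranking to genes of interest first keeps large but technically noisy ratios from dominating the beacon set.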
Major functional groups overrepresented among state-change genes and corresponding overrepresented TREs. Groups are listed in order of significance as identified by DAVID. Overrepresented TREs in bold correspond to TFs that either switched on ("off-on") or increased in level; regular type: not present in the Panomics set; italics: not present under either condition.
| 12 | 73 | Transcription factors | |
| 7 | 5 | Translation initiation factors | |
| 9 | 6 | RNA processing – ribosome biogenesis | NGFI-C, GR, |
| 11 | 8 | Zinc binding | MIF-1, |
| 3 | 16 | G-protein receptor | |
| 2 | 7 | GABA receptor/ion channels | N-Myc |
| 6 | 9 | Ion channels, K | |
| 8 | 5 | Glycosyltransferases | |
| 10 | 28 | Kinases | |
| 4 | 4 | Cadherins | |
| 1 | 15 | Transmembrane immunoglobulin-like proteins | |
| 5 | 14 | Transmembrane proteins | |
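Overrepresentation of a functional group (or a TRE) among a gene list is commonly scored with a one-sided hypergeometric test, sketched below. DAVID's own EASE score is a modified Fisher exact test, so its p-values will differ from this plain version; the function name and argument order are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes on the array (population)
    K: population genes annotated to the category
    n: genes in the list of interest
    k: list genes annotated to the category
    """
    upper = min(K, n)
    # Exact integer arithmetic; math.comb returns 0 when asked for more
    # items than are available, so impossible terms vanish.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```

A small p-value means the category contains more list genes than random draws from the array would be expected to give.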
Figure 5. The NFκB canonical signaling pathway from IPA. Dark red: > 3-fold increase in gene expression; light red: < 3-fold increase; dark green: > 3-fold decrease; light green: < 3-fold decrease; gray: unchanged gene expression; no color: gene not on the array. Gene symbols with a single border represent single genes; a double border represents a complex of genes or the possibility that alternative genes might act in the pathway.
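The Figure 5 color legend amounts to a small classification rule, sketched here. The caption does not say which criterion marked a gene as unchanged or how an exactly 3-fold change was binned, so the `changed` flag and the "exactly 3-fold counts as light" choice are assumptions, as is treating the fold change as crECM relative to plastic; `pathway_color` is a hypothetical name.

```python
def pathway_color(fold_change, changed=True):
    """Map a fold change to the Figure 5 color categories.

    fold_change: positive ratio (> 1 increase, < 1 decrease), or None if the
    gene is not on the array. `changed` is a hypothetical flag standing in
    for whatever criterion classified the gene as changed.
    """
    if fold_change is None:
        return "no color"
    if not changed or fold_change == 1:
        return "gray"
    if fold_change > 1:
        # Exactly 3-fold is binned as "light" here (an assumption).
        return "dark red" if fold_change > 3 else "light red"
    # Decreases: express as an n-fold decrease before comparing with 3.
    return "dark green" if 1 / fold_change > 3 else "light green"
```

Expressing a decrease as its reciprocal keeps the 3-fold cutoff symmetric for up- and downregulated genes.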
Figure 6. Transcription factor activity identified. A) Example of TREs overrepresented in the first ontological cluster. Several genes (vertical) share common TREs (horizontal), highlighted in red. Results were filtered to show only TREs overrepresented at p < 0.05 and FDR < 0.3. TREs in bold show a significant increase in expression on crECM compared with plastic, as confirmed by a transcription factor array experiment. B) Example of changes in binding activity of a few TFs on plastic and crECM. Gray/black bars show binding activity on plastic/crECM, respectively.
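The filter in Figure 6A (p < 0.05 and FDR < 0.3) can be sketched with Benjamini-Hochberg adjusted p-values. The caption does not name the FDR procedure, so Benjamini-Hochberg is an assumption, and the TRE names in the usage check are placeholders rather than real elements.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping q-values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def filter_tres(tre_pvalues, p_cut=0.05, fdr_cut=0.3):
    """Keep TREs with raw p < p_cut and BH-adjusted FDR < fdr_cut.

    tre_pvalues: dict mapping TRE name -> raw overrepresentation p-value.
    """
    names = list(tre_pvalues)
    q = benjamini_hochberg([tre_pvalues[name] for name in names])
    return [
        name
        for name, qv in zip(names, q)
        if tre_pvalues[name] < p_cut and qv < fdr_cut
    ]
```

A lenient FDR cutoff such as 0.3 suits hypothesis generation: it tolerates some false TREs in exchange for fewer missed ones, which the transcription factor array then checks experimentally.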