Igor Dozmorov, Nicholas Knowlton, Yuhong Tang, Michael Centola.
Abstract
BACKGROUND: Several aspects of microarray data analysis are dependent on identification of genes expressed at or near the limits of detection. For example, regression-based normalization methods rely on the premise that most genes in compared samples are expressed at similar levels and therefore require accurate identification of nonexpressed genes (additive noise) so that they can be excluded from the normalization procedure. Moreover, key regulatory genes can maintain stringent control of a given response at low expression levels. If arbitrary cutoffs are used for distinguishing expressed from nonexpressed genes, some of these key regulatory genes may be unnecessarily excluded from the analysis. Unfortunately, no accurate method for differentiating additive noise from genes expressed at low levels is currently available.
Year: 2004 PMID: 15128432 PMCID: PMC415561 DOI: 10.1186/1471-2105-5-53
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Normalization procedure for array data as shown on Atlas Clontech membranes. A. Histogram of averaged expression data from duplicated spots. B. Normality plot of A. C. Histogram of the trimmed data, ±2 SD about the mean. D. Histogram of the data after z transformation. E. Scatter plot showing the regression line for duplicated expressions (log transformed) after normalization. F. Scatter plot of background genes exhibiting the normality of their profile, with a mean of 0 and an SD of 1.
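The trimming and z-transformation steps of panels C and D can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes a single trimming pass and uses the mean and SD of the trimmed subset as the background parameters. The function name `trim_and_z_transform` and the parameter `n_sd` are hypothetical.

```python
import numpy as np

def trim_and_z_transform(values, n_sd=2.0):
    """Trim to mean +/- n_sd * SD, then z-transform all values.

    Assumption: one trimming pass; the trimmed subset stands in for the
    background (additive noise) spots.
    """
    values = np.asarray(values, dtype=float)
    mean, sd = values.mean(), values.std(ddof=1)
    # Keep only spots within n_sd standard deviations of the overall mean.
    keep = np.abs(values - mean) <= n_sd * sd
    trimmed = values[keep]
    # Background parameters are estimated from the trimmed subset only.
    bg_mean, bg_sd = trimmed.mean(), trimmed.std(ddof=1)
    # Every spot is rescaled against the background parameters.
    z = (values - bg_mean) / bg_sd
    return z, keep
```

By construction, the retained spots have a mean of 0 and an SD of 1 after the transformation, as in panel F, while spots well above the background receive large positive z scores.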
Figure 2. "Contamination" of the normally distributed additive noise with weak expressions on Micromax cDNA arrays. A. Histogram of the gene expression distribution. B. Variability of the gene expression ratios on two arrays, ordered by expression level. C. Scatter plot of the gene expressions used to calculate the ratios in B, with gene order preserved.
Figure 3. Localization of additive noise through an F-test. A. Histogram of gene expressions from a microarray experiment. B. P value distribution showing correlation of additive noise as determined by the F-test. A window of 20 genes was created to calculate the group's SD, and further windows were created by shifting the index of the ordered SDs by one gene. For example, window 1 contains genes 1–20 and window 2 contains genes 2–21. As the expression level in these windows increases, the SDs become correlated and a p-value threshold becomes apparent. C. Close-up histogram of the low-intensity spots used as background. The Gaussian distribution can clearly be seen; furthermore, the right-tail cutoff of the Gaussian distribution lies at the expression level that corresponds to the p-value threshold for the ratio calculation in B.
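The sliding-window scheme in panel B (window 1 = genes 1–20, window 2 = genes 2–21, and so on) might look roughly like the sketch below. The caption does not state which quantity enters each window or which group serves as the reference, so several choices here are assumptions: genes are ordered by the mean of two replicate measurements, variability is measured on the replicate difference, the lowest-expression window is the reference, and the F-test is two-sided. The function name `sliding_window_f_test` and its parameters are hypothetical.

```python
import numpy as np
from scipy import stats

def sliding_window_f_test(rep_a, rep_b, window=20):
    """Two-sided F-test p-value for each sliding window vs. the first window.

    Assumptions: genes are ordered by the mean of two replicates, variability
    is measured on the replicate difference, and the lowest-expression
    window is the reference group.
    """
    rep_a = np.asarray(rep_a, dtype=float)
    rep_b = np.asarray(rep_b, dtype=float)
    # Order genes by expression level (mean of the two replicates).
    order = np.argsort((rep_a + rep_b) / 2.0)
    diff = (rep_a - rep_b)[order]
    n_windows = diff.size - window + 1
    # Window i covers ordered genes i .. i + window - 1 (shifted by one gene).
    variances = np.array(
        [diff[i:i + window].var(ddof=1) for i in range(n_windows)]
    )
    # Variance ratio of each window against the reference window.
    f_ratio = variances / variances[0]
    df = window - 1
    upper = stats.f.sf(f_ratio, df, df)
    lower = stats.f.cdf(f_ratio, df, df)
    # Two-sided p-value, capped at 1.
    return np.minimum(1.0, 2.0 * np.minimum(upper, lower))
```

Plotting the returned p-values against window index gives a curve of the kind described for panel B: windows that still consist of additive noise behave like the reference, and the p-value drops once the windows move into expressed genes.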
Figure 4. Within-slide stability of the additive noise. Localization of the additive noise distribution and estimation of the normal distribution parameters were carried out as described in Materials and Methods. Parameters for the additive noise distribution (mean/SD) were estimated from the total data set (left histogram) and from each half or quarter of the slide (intermediate and right histograms, respectively).
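The comparison in this figure amounts to re-estimating the noise mean and SD on sub-regions of the slide. The sketch below is only an illustration under stated assumptions: the noise spots have already been localized (the Materials and Methods procedure is not reproduced here), and they are listed in slide order so that contiguous chunks correspond to physical regions. The function name `noise_params_by_region` is hypothetical.

```python
import numpy as np

def noise_params_by_region(noise_values, n_parts):
    """Return (mean, SD) of the noise spots in each of n_parts slide regions.

    Assumption: noise_values lists the already-localized noise spots in slide
    order, so contiguous chunks correspond to physical regions of the slide.
    """
    chunks = np.array_split(np.asarray(noise_values, dtype=float), n_parts)
    return [(chunk.mean(), chunk.std(ddof=1)) for chunk in chunks]
```

Calling it with `n_parts` set to 1, 2, and 4 yields the parameters for the total data set, the halves, and the quarters; similar values across regions would indicate within-slide stability.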