Casper J Albers, Ritsert C Jansen, Jan Kok, Oscar P Kuipers, Sacha A F T van Hijum.
Abstract
BACKGROUND: Simulation of DNA-microarray data serves at least three purposes: (i) optimizing the design of an intended DNA microarray experiment, (ii) comparing existing pre-processing and processing methods for best analysis of a given DNA microarray experiment, and (iii) educating students, lab-workers and other researchers by making them aware of the many factors influencing DNA microarray experiments.
Year: 2006 PMID: 16613602 PMCID: PMC1479841 DOI: 10.1186/1471-2105-7-205
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Schematic overview of the [...]. The blue-marked boxes (A) indicate layers that are further visualized (B and C). A simulation of the entire 'non-biological' signal is shown in B and C. Top row: sum of the gradient effects and density effects; second row: spot pin effects; third row: Gaussian noise. The bottom row shows the sum of all the effects pictured in the top three rows. The signals are plotted three-dimensionally (left side view) and two-dimensionally (right side view).
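The layered composition described in this caption can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the grid size, the linear tilt, the per-pin offsets, and the noise level are assumed values chosen for readability, the function and parameter names are hypothetical, and the density effects are left out.

```python
import random

def simulate_nonbiological_signal(rows=8, cols=8, n_pins=4,
                                  max_slope=0.5, pin_sd=0.2, noise_sd=0.1,
                                  seed=0):
    """Return (gradient, pin, noise, total) layers as rows x cols nested lists.

    All names and default values are illustrative assumptions.
    """
    rng = random.Random(seed)

    # Layer 1: a linear tilt across the slide (stand-in for the gradient effects).
    slope_r = rng.uniform(-max_slope, max_slope)
    slope_c = rng.uniform(-max_slope, max_slope)
    gradient = [[slope_r * r / rows + slope_c * c / cols for c in range(cols)]
                for r in range(rows)]

    # Layer 2: one random offset per spot pin; here each pin prints a block of columns.
    pin_offsets = [rng.gauss(0.0, pin_sd) for _ in range(n_pins)]
    pin = [[pin_offsets[c * n_pins // cols] for c in range(cols)]
           for _ in range(rows)]

    # Layer 3: independent Gaussian noise per spot.
    noise = [[rng.gauss(0.0, noise_sd) for _ in range(cols)]
             for _ in range(rows)]

    # Bottom row of the figure: the sum of the three layers.
    total = [[gradient[r][c] + pin[r][c] + noise[r][c] for c in range(cols)]
             for r in range(rows)]
    return gradient, pin, noise, total
```

Keeping the layers separate before summing mirrors the figure, where each effect is shown on its own row before the combined signal.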
Table 1. Comparison of the properties of different DNA-microarray simulation models described in the literature. '+' and '-' indicate whether the indicated feature is available in the specified model. Note that a given feature is usually not modeled in the same way across the models.
| Method | | [...] | Lalush | Balagurunathan | Lonnstedt | Wierling | GE2 |
| model implementation | tabulated gene expression data | + | - | - | - | - | + |
| | modeling of gene networks | - | - | - | - | - | + |
| | TIFF output | - | + | + | + | + | - |
| | software available | + [5] | + 1) | - | - | - | + [27] |
| | adjustable model | + | + | + | + | - | +/- |
| model effects | background surface pattern | + | - | + | - | + | - |
| | spot pin / grid effects | + | + | + | - | - | - |
| | channel effects | + | - | + | - | - | - |
| | non-linearity effects | + | - | + | - | - | + |
| | missing data | + | - | + | - | - | - |
| | 'fishtailing' | + | - | + | - | - | + |
| | spot shape / size | - | + | + | - | - | - |
1) C++ code available from author.
Table 2. Overview of parameters in the SIMAGE model. Some parameters are known (such as the number of spots per grid); others must be set by the user. A 'v' in the last column indicates that the parameter can also be estimated from the data. Details concerning these parameters are discussed in the implementation section.
| Parameter | Description | Can be estimated |
| | Number of grids per row | |
| | Number of grids per column | |
| | Number of spots per grid | |
| | Number of slides | |
| | Number of technical replications | |
| | Number of spot pins | |
| | Mean expression signal | v |
| | Logratio-shift due to down- and upregulation | v |
| | Proportion of down-, up-regulated genes | v |
| | Variation in gene expression | v |
| | Correlation between Cy3 and Cy5 expression | v |
| | Replication variation | v |
| | Number of background 'densities' | |
| | Mean standard deviation per background density | |
| | Maximum slope of the linear tilt | |
| | Channel variation | v |
| | Spot pin variation | v |
| | Gene × dye variation | |
| | Non-linearity parameter 'curvature' | v |
| | Non-linearity parameter 'tilt' | v |
| | Fishtailing parameter | v |
| | Scanning device bias | v |
| | Maximum number of hairs, donuts and missing spots | |
| | Maximum length of hairs and radius of donuts | |
Figure 2. The gene expression parameters. These parameters (Table 2) were estimated by using the EM-algorithm (see "gene expressions" in the implementation section). The vertical lines constitute a stem-plot of the data. The red, green and blue curves indicate down-regulated, non-regulated, and up-regulated genes, respectively. The black curve is the combination of the three curves and hence the distribution of the logratios.
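The caption refers to the EM-algorithm without giving its details here. As a rough illustration only, the following Python sketch fits a three-component Gaussian mixture (down-regulated, non-regulated, up-regulated) to a list of logratios. It is a generic textbook EM and not the authors' implementation; the function names, the starting values, the synthetic-data helper, and the choice of a separate standard deviation per component are all assumptions.

```python
import math
import random

def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def simulate_logratios(n_down=100, n_null=800, n_up=100, shift=2.0, sd=0.4, seed=1):
    """Hypothetical helper: synthetic logratios from three normal components."""
    rng = random.Random(seed)
    return ([rng.gauss(-shift, sd) for _ in range(n_down)]
            + [rng.gauss(0.0, sd) for _ in range(n_null)]
            + [rng.gauss(shift, sd) for _ in range(n_up)])

def em_three_components(x, mus=(-1.5, 0.0, 1.5), sds=(0.5, 0.5, 0.5),
                        weights=(1 / 3, 1 / 3, 1 / 3), n_iter=50):
    """Fit weights, means and standard deviations of a 3-component mixture."""
    mus, sds, weights = list(mus), list(sds), list(weights)
    n = len(x)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each logratio.
        resp = []
        for xi in x:
            dens = [w * normal_pdf(xi, m, s) for w, m, s in zip(weights, mus, sds)]
            total = sum(dens) or 1e-300  # guard against numerical underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate each component from its weighted data.
        for k in range(3):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            mus[k] = sum(r[k] * xi for r, xi in zip(resp, x)) / nk
            var = sum(r[k] * (xi - mus[k]) ** 2 for r, xi in zip(resp, x)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)
    return weights, mus, sds

weights, mus, sds = em_three_components(simulate_logratios())
print([round(w, 2) for w in weights], [round(m, 2) for m in mus])
```

The weighted sum of the three fitted densities corresponds to the black curve in the figure, and the individual components correspond to the coloured curves.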
Table 3. Estimation of parameters from the simulation of 100 DNA-microarray slides. The deviations give the number of estimated standard deviations by which the mean and the median of the estimates, respectively, lie from the true value of the parameter.
| Parameter | Deviation (mean) | Deviation (median) |
| | 0.1 | 0.1 |
| | -0.3 | -0.5 |
| | 1.5 | 1.5 |
| | -1.6 | -1.6 |
| | -1.6 | -1.6 |
| | -0.7 | -0.6 |
| | -1.1 | -1.0 |
| | 1.0 | 0.7 |
Figure 3. Distribution of the deviations of several of the model parameters estimated from 100 simulated DNA-microarray slides. The deviation is calculated as (estimate - true value) / (standard deviation of 100 estimates).
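As a small illustration of the deviation defined in this caption, the Python sketch below standardizes a list of estimates against a known true value. The function name is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the caption does not say which estimator was used.

```python
import statistics

def standardized_deviations(estimates, true_value):
    """(estimate - true value) / (standard deviation of all estimates)."""
    sd = statistics.stdev(estimates)  # sample standard deviation: an assumption
    return [(e - true_value) / sd for e in estimates]

# Toy example with three estimates of a parameter whose true value is 10.
print(standardized_deviations([9.0, 10.0, 11.0], 10.0))  # [-1.0, 0.0, 1.0]
```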
Figure 4. Experiment-dependency of the parameters of the [...]. The bar graph shows the CVs ((standard deviation / average) × 100%) of the parameters, estimated from the individual datasets. The resulting CV was determined from the average estimates for each of the parameters obtained from the experiments. The p-value obtained by ANOVA is displayed below the parameter symbols; p-values below 0.05 are considered significant.
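The CV used in this caption is straightforward to compute. The following Python sketch shows one way to do so for a set of per-experiment estimates of a single parameter; the function name and the example values are hypothetical, and the sample standard deviation is assumed.

```python
import statistics

def coefficient_of_variation(values):
    """CV in percent: (standard deviation / average) * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0

# Toy example: average estimates of one parameter from three experiments.
print(coefficient_of_variation([9.0, 10.0, 11.0]))  # 10.0
```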
Figure 5. Visualization of the signals of a simulated slide. The upper picture shows the measured expressions, while the lower picture shows the measured background signals. The areas designated as 'missing' are grey.
Figure 6. Distribution of [...]. Data for 2200 genes in 6 slides, with technical duplicates hybridized in dye-swaps, were simulated using the MolGen experiment profile (supplementary Table T1) with some changes: π- = 1% and π+ = 2%, μ- = -2 and μ+ = 2, σ = 700, and s = 30% × μ. The main graph shows the resulting ratios after normalization plotted versus the p-value. The graph was simplified by removing genes with ratios between 2/3 and 3/2. The 66 genes for which differential expression was modeled are depicted as blue diamonds. The remaining genes are depicted as purple squares. The small graph on the right shows the inverse dependency of the p-value on the average signal for the 66 modeled differentially expressed genes. The average signal was calculated for each of the 66 genes over the maximum of 12 normalized measurements. Normalization was performed using Lowess normalization, and differential expression was tested with the non-Bayesian Cyber-T implementation of a variant of the t-test [3]. The Cyber-T test provides the p-values, which indicate the probability that an observed ratio is caused by chance rather than by differential expression. Genes with fewer than 8 measurements were excluded from these tests and assigned a p-value of 1 so that they could still be presented in the graph.
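The count of 66 modeled genes follows from the stated proportions: 1% and 2% of 2200 genes give 22 down-regulated and 44 up-regulated genes. The short Python sketch below reproduces that arithmetic together with the two filtering rules mentioned in the caption. The function names are hypothetical, and treating the band limits 2/3 and 3/2 as inclusive is an assumption, since the caption does not specify it.

```python
def expected_regulated(n_genes, pi_down, pi_up):
    """Expected numbers of down- and up-regulated genes."""
    return round(n_genes * pi_down), round(n_genes * pi_up)

def shown_in_main_graph(ratio, low=2 / 3, high=3 / 2):
    """True when the ratio lies outside the removed band [low, high].

    Inclusive band limits are an assumption.
    """
    return ratio < low or ratio > high

def reported_p_value(p_value, n_measurements, min_measurements=8):
    """Genes with fewer than min_measurements are assigned p = 1."""
    return p_value if n_measurements >= min_measurements else 1.0

n_down, n_up = expected_regulated(2200, 0.01, 0.02)
print(n_down, n_up, n_down + n_up)  # 22 44 66
```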