Ricardo A Verdugo, Christian F Deschepper, Gloria Muñoz, Daniel Pomp, Gary A Churchill.
Abstract
Measurements of gene expression from microarray experiments are highly dependent on experimental design. Systematic noise can be introduced into the data at numerous steps. On Illumina BeadChips, multiple samples are assayed in an ordered series of arrays. Two experiments were performed using the same samples but different hybridization designs. An experiment confounding genotype with BeadChip and treatment with array position was compared to another experiment in which these factors were randomized to BeadChip and array position. An ordinal effect of array position on intensity values was observed in both experiments. We demonstrate that there is an increased rate of false-positive results in the confounded design and that attempts to correct for confounded effects by statistical modeling reduce the power of detection for true differential expression. Simple analysis models without post hoc corrections provide the best results possible for a given experimental design. Normalization improved differential expression testing in both experiments but randomization was the most important factor for establishing accurate results. We conclude that lack of randomization cannot be corrected by normalization or by analytical methods. Proper randomization is essential for successful microarray experiments.
Year: 2009 PMID: 19617374 PMCID: PMC2761262 DOI: 10.1093/nar/gkp573
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Experimental design. Layout of samples for the Confounded and Randomized experiments. Black rectangles represent BeadChips. Sentrix positions for individual arrays are displayed along the left side of the BeadChips (A–H). Each experiment used two BeadChips. Colors represent genotype [blue = C57BL/6J (B); green = C57BL/6J-chrYA/J/NaJ (BY)] and castration treatment [yellow = intact (I); red = castrated (C)].
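The randomized layout described in this caption can be illustrated with a small sketch. It assumes a balanced design with four replicates per genotype-by-treatment group (inferred from 16 array slots and 4 groups, not stated in the caption) and uses complete randomization over all slots. The function name `randomized_layout` and the seed are hypothetical, and the authors' actual randomization procedure may have differed.

```python
import random
from itertools import product


def randomized_layout(n_chips=2, positions="ABCDEFGH", reps=4, seed=0):
    """Randomly assign samples to (chip, position) slots.

    Assumption: 4 replicates per group, so 4 groups x 4 = 16 samples
    fill 2 BeadChips x 8 positions. This is a sketch, not the authors'
    procedure.
    """
    genotypes = ["B", "BY"]   # labels taken from the figure legend
    treatments = ["I", "C"]   # intact, castrated
    samples = [
        (g, t, r)
        for g, t in product(genotypes, treatments)
        for r in range(1, reps + 1)
    ]
    slots = [(chip, pos) for chip in range(1, n_chips + 1) for pos in positions]
    if len(samples) != len(slots):
        raise ValueError("number of samples must equal number of array slots")
    rng = random.Random(seed)  # fixed seed so the layout can be reproduced
    rng.shuffle(samples)
    return dict(zip(slots, samples))


layout = randomized_layout()
for (chip, pos), (genotype, treatment, rep) in sorted(layout.items()):
    print(f"chip {chip} position {pos}: {genotype}-{treatment} replicate {rep}")
```

Because every sample has the same chance of landing in any slot, neither genotype nor treatment is tied to a particular BeadChip or array position by construction, which is the property the confounded layout lacks.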
Table 1. Statistical models applied to probe level data
| Experiment | Model | df1 | df2 | Abbreviation |
|---|---|---|---|---|
| Confounded | | 3 | 12 | c.unadj |
| Confounded | | 5 | 10 | c.linreg |
| Confounded | | 9 | 6 | c.full |
| Randomized | | 3 | 12 | r.unadj |
| Randomized | | 6 | 9 | r.linreg |
| Randomized | | 11 | 4 | r.full |
df1, model degrees of freedom; df2, error degrees of freedom; y, log2 probe level intensity; μ, overall mean; μ_k, mean for experimental group k; β, coefficients of regression within chip; p, position covariate (values 1–8); P, random position effect (levels A–H); C, random chip effect; lower case indicates fixed and upper case random effects; c, confounded; r, randomized; unadj, unadjusted; linreg, adjustment by linear regression; full, adjustment by full mixed model.
The prefix ‘raw.’ or ‘norm.’ is applied to the model abbreviation in the text and figures to indicate if the model was fit to raw or normalized data, respectively (Figure 4).
Figure 4. Hierarchical clustering of fitted models for adjustment of chip and position effects (Table 1). Model distance was measured as 1 − Spearman correlation between P-values. Negative correlations produce distances greater than 1. Branches are labeled to indicate normalization method (norm, raw), experiment (c, r) and analysis model (unadj, linreg, full). The number of probes selected at FDR < 0.1 is shown at right.
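The distance used in this figure can be sketched in a few lines. The example below computes 1 − Spearman correlation between two vectors of P-values using only the standard library. The helper names and the toy P-values are illustrative assumptions. The clustering step itself is omitted because the caption does not state the linkage method.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_distance(p_a, p_b):
    """Distance between two models: 1 - Spearman correlation of their P-values."""
    return 1.0 - pearson(average_ranks(p_a), average_ranks(p_b))


# Toy P-values for the same four probes under three hypothetical models.
model_a = [0.01, 0.20, 0.50, 0.90]
model_b = [0.02, 0.30, 0.60, 0.80]   # same ordering as model_a
model_c = [0.90, 0.50, 0.20, 0.01]   # reversed ordering

print(spearman_distance(model_a, model_b))
print(spearman_distance(model_a, model_c))
```

Two models that rank probes identically are at distance 0, while models that rank them in opposite order are at distance 2, which is why negative correlations give distances above 1.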
Figure 2. Boxplots of raw data from both experiments. Outliers are not shown for clarity. Boxplots of raw intensity values for negative probes in the Confounded and Randomized experiments are shown by position in the four chips. Color differentiates castration treatment (yellow = castrated; green = intact). Blue lines are the best linear fits to the medians by position.
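A linear fit to per-position medians, as drawn in this figure, can be sketched as ordinary least squares on eight (position, median) points per chip. The function names and the toy intensities below are made up for illustration and are not taken from the experiments.

```python
from statistics import median


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def position_trend(intensities_by_position):
    """Fit a line to the median intensity at each array position (1-8)."""
    positions = sorted(intensities_by_position)
    medians = [median(intensities_by_position[p]) for p in positions]
    return fit_line(positions, medians)


# Toy data: three negative-probe intensities per position, with a
# built-in upward drift of 2 units per position (purely illustrative).
toy_chip = {p: [2 * p + 9, 2 * p + 10, 2 * p + 11] for p in range(1, 9)}
slope, intercept = position_trend(toy_chip)
print(f"slope per position: {slope:.2f}, intercept: {intercept:.2f}")
```

A nonzero slope in such a fit is one simple way to summarize an ordinal effect of array position on intensity.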
Figure 3. Confounded experiment is enriched with false-positive results. Genes selected for differential expression in two replicated experiments. Genes were selected by the UnAdj model on normalized data. Venn diagrams group the number of unique genes selected (a–c) and the GO Biological Processes associated with those genes (d–f).
Figure 5. Variance due to chip effects. The variance component associated with chip effects (S²_chip) was estimated by REML from the Full model in raw and normalized data from the Randomized experiment. The histogram shows the distribution of S²_chip/S² across probes.
Figure 6. Association between treatment and position effects introduced by a confounded design. Scatter plots show position in Chip 2 (Confounded) and Chip 4 (Randomized) versus treatment effects in the BY genotype (Treatment_BY) from each experiment. Dotted blue lines cross the y- and x-axes at 0. Solid blue lines denote the median position (vertical) and median Treatment_BY effect (horizontal).
Differentially expressed genes tested for Treatment effects before and after normalization
| Model | Normalization | Confounded experiment: π1 | Confounded experiment: probes selected | Randomized experiment: π1 | Randomized experiment: probes selected |
|---|---|---|---|---|---|
| UnAdj | Raw | 0.25 | 959 | 0.52 | 1472 |
| UnAdj | Normalized | 0.43 | 3615 | 0.39 | 3123 |
| LinReg | Raw | 0.06 | 44 | 0.68 | 7417 |
| LinReg | Normalized | 0.34 | 789 | 0.41 | 2892 |
| Full | Raw | 0.11 | 87 | 0.67 | 6225 |
| Full | Normalized | 0.33 | 1166 | 0.41 | 888 |
Estimated π1 and number of probes selected by FDR < 0.1 are shown by experiment, normalization procedure and model used.
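The two quantities reported in this table can be illustrated with a rough sketch. The source does not state which estimator was used, so the code below assumes a Storey-type estimate of the null proportion π0 at a fixed threshold λ, takes π1 = 1 − π0, and selects probes whose q-value falls below the FDR cutoff. The function names, λ = 0.5, and the toy P-values are all assumptions.

```python
def estimate_pi1(pvalues, lam=0.5):
    """Estimate the proportion of non-null tests as 1 - pi0.

    Assumption: pi0 = #{p > lam} / (m * (1 - lam)), capped at 1
    (a fixed-lambda Storey-type estimator).
    """
    m = len(pvalues)
    pi0 = sum(p > lam for p in pvalues) / (m * (1.0 - lam))
    return 1.0 - min(1.0, pi0)


def q_values(pvalues, pi0=1.0):
    """q-values: step-up adjusted P-values scaled by pi0 (pi0=1 gives BH)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # walk from the largest P-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[i] / rank)
        q[i] = running_min
    return q


def select_probes(pvalues, fdr=0.1, lam=0.5):
    """Return indices of probes with q-value below the FDR threshold."""
    pi0 = 1.0 - estimate_pi1(pvalues, lam)
    return [i for i, q in enumerate(q_values(pvalues, pi0)) if q < fdr]


# Toy P-values for ten probes (illustrative only).
toy_p = [0.001, 0.002, 0.003, 0.004, 0.01, 0.2, 0.3, 0.6, 0.7, 0.9]
print("estimated pi1:", estimate_pi1(toy_p))
print("selected probe indices:", select_probes(toy_p))
```

Under this reading, a larger π1 means more of the P-value distribution is attributed to true differential expression, so π1 and the number of selected probes tend to rise and fall together within an experiment.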