Xiaohui Fan, Leming Shi, Hong Fang, Stephen Harris, Roger Perkins, Weida Tong.
Abstract
Recent publications have raised concerns about the reliability of microarray technology because differentially expressed genes (DEGs) from highly similar studies show poor reproducibility across laboratories and platforms. The rat toxicogenomics study of the MicroArray Quality Control (MAQC) project showed empirically that DEGs selected using a fold change (FC)-based criterion were more reproducible than those selected solely by statistical significance, such as the P-value from a simple t-test. In this study, we generate a set of simulated microarray datasets to compare gene selection/ranking rules, including P-value, FC and their combinations, using the percentage of overlapping genes (POG) between DEG lists from two similar simulated datasets as the measure of reproducibility. The results support the MAQC's conclusion that DEG lists are more reproducible across laboratories and platforms when FC-based ranking coupled with a nonstringent P-value cutoff is used for gene selection than when selection relies on P-value-based ranking alone. We conclude that the MAQC recommendation should be considered when reproducibility is an important study objective.
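The reproducibility measure and two of the ranking rules named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `stats` layout (gene mapped to a log2 fold change and a P-value), and the five-gene toy inputs `site_1` and `site_2` are all hypothetical and carry no experimental meaning.

```python
def pog(list_a, list_b):
    """Percentage of overlapping genes (POG) between two equal-length DEG lists."""
    if not list_a or len(list_a) != len(list_b):
        raise ValueError("DEG lists must be non-empty and of equal length")
    return 100.0 * len(set(list_a) & set(list_b)) / len(list_a)


def rank_by_p(stats, n):
    """Top n genes by ascending P-value (P rank ordering only)."""
    return sorted(stats, key=lambda g: stats[g][1])[:n]


def rank_by_fc_with_p_cutoff(stats, n, p_cutoff=0.05):
    """Top n genes by descending |log2 FC| among genes with P < p_cutoff."""
    passing = [g for g in stats if stats[g][1] < p_cutoff]
    return sorted(passing, key=lambda g: abs(stats[g][0]), reverse=True)[:n]


# Hypothetical toy inputs: gene -> (log2 fold change, P-value).
# These numbers are made up purely to exercise the functions.
site_1 = {"g1": (2.0, 0.001), "g2": (1.5, 0.04), "g3": (0.3, 0.0005),
          "g4": (-1.8, 0.03), "g5": (0.1, 0.6)}
site_2 = {"g1": (1.9, 0.02), "g2": (1.4, 0.03), "g3": (0.2, 0.2),
          "g4": (-1.7, 0.0001), "g5": (0.2, 0.001)}

pog_p = pog(rank_by_p(site_1, 2), rank_by_p(site_2, 2))
pog_fc = pog(rank_by_fc_with_p_cutoff(site_1, 2),
             rank_by_fc_with_p_cutoff(site_2, 2))
print(f"POG, P ranking: {pog_p:.0f}%; POG, FC ranking with P < 0.05: {pog_fc:.0f}%")
```

In this sketch the P-value acts only as a filter in the combined rule, while the ordering of the list is driven by the fold change. `pog` deliberately rejects lists of unequal length, which can arise when fewer than `n` genes pass the P-value cutoff.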
Year: 2009 PMID: 19278560 PMCID: PMC2654487 DOI: 10.1186/1753-6561-3-s2-s4
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
Table 1. Summary of the parameters used in this study.
| Noise level | Low | Medium | High | Very High |
| CV | ~2% | ~10% | ~30% | ~100% |
| Expression magnitude difference | ~1.5 (MAQC main study) | ~0.6 (MAQC Rat toxicogenomics) | ~0.2 (Clinical application) | |
| Sample size | 5 per group | 50 per group | | |
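To make the role of these parameters concrete, the sketch below generates one simulated control/treated dataset from a noise level (CV), an expression magnitude difference, and a sample size per group. The generative model is an assumption, since the text above does not specify one: intensities are taken to be Gaussian on the log2 scale, the magnitude is treated as a log2 shift applied to the true DEGs, and the baseline intensity range, the number of genes, and all function and parameter names are hypothetical.

```python
import math
import random


def simulate_group_pair(n_genes, n_deg, magnitude, cv, n_per_group, seed=0):
    """Return (control, treated) as per-gene lists of log2 intensities.

    Assumed model (not taken from the source): log-normal intensities, with
    the first n_deg genes shifted by `magnitude` on the log2 scale.
    """
    rng = random.Random(seed)
    # For log-normal data, a raw-scale CV corresponds to this SD on the log2 scale.
    sd_log2 = math.sqrt(math.log(1.0 + cv ** 2)) / math.log(2.0)
    control, treated = [], []
    for gene in range(n_genes):
        baseline = rng.uniform(6.0, 14.0)  # hypothetical log2 intensity range
        shift = magnitude if gene < n_deg else 0.0  # only true DEGs are shifted
        control.append([rng.gauss(baseline, sd_log2) for _ in range(n_per_group)])
        treated.append([rng.gauss(baseline + shift, sd_log2) for _ in range(n_per_group)])
    return control, treated


def log2_fold_changes(control, treated):
    """Per-gene difference of group means on the log2 scale."""
    return [sum(t) / len(t) - sum(c) / len(c) for c, t in zip(control, treated)]


# Example settings: low noise (CV ~2%), magnitude ~1.5, 50 samples per group.
control, treated = simulate_group_pair(
    n_genes=100, n_deg=10, magnitude=1.5, cv=0.02, n_per_group=50, seed=1
)
fold_changes = log2_fold_changes(control, treated)
```

Under these assumptions, raising `cv` or lowering `n_per_group` widens the spread of the estimated fold changes around their true values, which is the effect the figures examine through the overlap of gene lists.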
Figure 1. The relationship of POG with the noise level in the simulated datasets: (A) Low noise (CV = 2%); (B) Medium noise (CV = 10%); (C) High noise (CV = 30%); and (D) Very high noise (CV = 100%). The simulated datasets had an expression magnitude difference of 1.5 between the treated and control groups and a sample size of 50. The x-axis represents the number of genes selected as differentially expressed, and the y-axis represents the POG (%) of two gene lists for a given number of differentially expressed genes. Each line on the graph represents the overlap of differentially expressed gene lists based on one of six different gene ranking/selection methods. The red and blue numbers give the POG (%) when 500 genes (red dashed line) are selected as DEGs using P rank ordering only and FC rank ordering with P < 0.05, respectively.
Figure 2. The relationship of POG with the degree of difference in expression magnitude between the treated and control groups: (A) Magnitude = 0.6; (B) Magnitude = 1.5; and (C) Magnitude = 0.2. The simulated datasets had CV = 30% and sample size = 50. The x-axis represents the number of genes selected as differentially expressed, and the y-axis represents the POG (%) of two gene lists for a given number of differentially expressed genes. Each line on the graph represents the overlap of differentially expressed gene lists based on one of six different gene ranking/selection methods. The red and blue numbers give the POG (%) when 500 genes (red dashed line) are selected as DEGs using P rank ordering only and FC rank ordering with P < 0.05, respectively.
Figure 3. The relationship of POG with the sample size: (A) 50 samples/group and (B) 5 samples/group. The simulated datasets had CV = 30% and magnitude = 50 (see Table 1). The x-axis represents the number of genes selected as differentially expressed, and the y-axis represents the POG (%) of two gene lists for a given number of differentially expressed genes. Each line on the graph represents the overlap of differentially expressed gene lists based on one of six different gene ranking/selection methods. The red and blue numbers give the POG (%) when 500 genes (red dashed line) are selected as DEGs using P rank ordering only and FC rank ordering with P < 0.05, respectively.