Leming Shi, Wendell D Jones, Roderick V Jensen, Stephen C Harris, Roger G Perkins, Federico M Goodsaid, Lei Guo, Lisa J Croner, Cecilie Boysen, Hong Fang, Feng Qian, Shashi Amur, Wenjun Bao, Catalin C Barbacioru, Vincent Bertholet, Xiaoxi Megan Cao, Tzu-Ming Chu, Patrick J Collins, Xiao-Hui Fan, Felix W Frueh, James C Fuscoe, Xu Guo, Jing Han, Damir Herman, Huixiao Hong, Ernest S Kawasaki, Quan-Zhen Li, Yuling Luo, Yunqing Ma, Nan Mei, Ron L Peterson, Raj K Puri, Richard Shippy, Zhenqiang Su, Yongming Andrew Sun, Hongmei Sun, Brett Thorn, Yaron Turpaz, Charles Wang, Sue Jane Wang, Janet A Warrington, James C Willey, Jie Wu, Qian Xie, Liang Zhang, Lu Zhang, Sheng Zhong, Russell D Wolfinger, Weida Tong.
Abstract
BACKGROUND: Reproducibility is a fundamental requirement in scientific experiments. Some recent publications have claimed that microarrays are unreliable because lists of differentially expressed genes (DEGs) are not reproducible in similar experiments. Meanwhile, new statistical methods for identifying DEGs continue to appear in the scientific literature. The resulting variety of existing and emerging methods adds to the confusion and continuing debate in the microarray community over the appropriate choice of methods for identifying reliable DEG lists.
Year: 2008 PMID: 18793455 PMCID: PMC2537561 DOI: 10.1186/1471-2105-9-S9-S10
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Concordance for inter-site comparisons. Each panel shows, for one commercial platform, the POG (percentage of overlapping genes) results for inter-site consistency of DEGs between samples B and A. For each of the six gene selection methods, there are three possible inter-site comparisons: S1–S2, S1–S3, and S2–S3 (S = Site). Therefore, each panel consists of 18 POG lines that are colored by gene ranking/selection method. Results shown here are based on the entire set of "12,091" genes commonly mapped across the microarray platforms, without noise (absent call) filtering. POG results improve when the analyses are performed using the subset of genes that are commonly detectable by the two test sites, as shown in Figure 2. The x-axis represents the number of selected DEGs, and the y-axis is the percentage (%) of genes common to the two gene lists derived from two test sites at a given number of DEGs.
Figure 2. Concordance for inter-site comparisons based on genes commonly detectable by the two test sites compared. Each panel shows, for one commercial platform, the POG results for inter-site consistency of DEGs between samples B and A. For each of the six gene selection methods, there are three possible inter-site comparisons: S1–S2, S1–S3, and S2–S3. Therefore, each panel consists of 18 POG lines that are colored by gene ranking/selection method. The x-axis represents the number of selected DEGs, and the y-axis is the percentage (%) of genes common to the two gene lists derived from two test sites at a given number of DEGs.
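The overlap measure plotted on the y-axis of Figures 1 and 2 can be illustrated with a minimal Python sketch. It assumes each site supplies one score per gene (for example a log2 fold change) and that genes are ranked by absolute score; the gene IDs, the scores, and the function names `top_genes` and `pog` are hypothetical and chosen only for illustration. This simplified version ignores the direction of regulation.

```python
def top_genes(scores, n):
    """Return the set of n gene IDs with the largest absolute score."""
    ranked = sorted(scores, key=lambda gene: abs(scores[gene]), reverse=True)
    return set(ranked[:n])


def pog(scores_a, scores_b, n):
    """Percentage of genes shared by the top-n lists of two sites."""
    shared = top_genes(scores_a, n) & top_genes(scores_b, n)
    return 100.0 * len(shared) / n


# Hypothetical per-gene scores (e.g. log2 fold changes) from two test sites.
site1 = {"g1": 3.2, "g2": -2.5, "g3": 0.4, "g4": 1.9, "g5": -0.1}
site2 = {"g1": 2.9, "g2": -0.3, "g3": 0.5, "g4": 2.2, "g5": -2.0}

# One point per list size: this is the shape of a single POG line.
curve = [(n, pog(site1, site2, n)) for n in range(1, len(site1) + 1)]
for n, value in curve:
    print(n, round(value, 1))
```

Computing `curve` once per gene selection method and per site pair would give the 18 lines of a panel (six methods times three site pairs).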
Figure 3. Concordance for inter-site comparison with samples C and D. The largest fold change between samples C and D is small (three-fold). For each platform, DEG lists from sites 1 and 2 are compared. Analyses are performed using the subset of genes that are commonly detectable by the two test sites.
Figure 4. Concordance for cross-platform comparisons. Panel a: based on the data set of "12,091" genes (without noise filtering); Panel b: based on subsets of genes commonly detected ("Present") by two platforms. For each platform, the data from test site 1 are used for cross-platform comparison. Each POG line corresponds to a comparison of the DEGs from two microarray platforms using one of the six gene selection methods. There are ten platform-platform comparison pairs, resulting in 60 POG lines for each panel. The x-axis represents the number of selected DEGs, and the y-axis is the percentage (%) of genes common to the two gene lists derived from two platforms at a given number of DEGs. POG lines circled by the blue oval are from FC-based gene selection methods with or without a P cutoff, whereas POG lines circled by the teal oval are from P-based gene selection methods with or without an FC cutoff. Shown here are results for comparing sample B and sample A.
Figure 5. Concordance between microarray and TaqMan® assays. Each panel represents the comparison of one microarray platform to TaqMan® assays. For each microarray platform, the data from test site 1 are used for comparison to TaqMan® assays. Each POG line corresponds to a comparison of the DEGs from one microarray platform and those from the TaqMan® assays using one of the six gene selection methods. The x-axis represents the number of selected DEGs, and the y-axis is the percentage (%) of genes common to DEGs derived from a microarray platform and those from TaqMan® assays. Shown here are results for comparing sample B and sample A using a subset of genes that are detectable by both the microarray platform and TaqMan® assays. Results based on the entire set of 906 genes are provided in Figure 6.
Figure 6. Concordance between microarray and TaqMan® assays without noise filtering. Each panel represents the comparison of one microarray platform to TaqMan® assays. The x-axis represents the number of selected DEGs, and the y-axis is the percentage (%) of genes common to DEGs derived from a microarray platform and those from TaqMan® assays. Shown here are results for comparing sample B and sample A using the entire set of 906 genes for which TaqMan® assay data are available.
Figure 7. Inter-site reproducibility of log2 FC and log2 t-statistic. a: log2 FC of test site 1 versus log2 FC of test site 2; b: log2 t-statistic of test site 1 versus log2 t-statistic of test site 2; and c: log2 FC of test site 1 versus log2 t-statistic of test site 1. Shown here are results for comparing sample B and sample A for all "12,091" genes commonly probed by the five microarray platforms. The inter-site reproducibility of log2 FC (a) is much higher than that of log2 t-statistic (b). The relationship between log2 FC and log2 t-statistic from the same test site is non-linear and the correlation appears to be low (c).
Figure 8. Concordance between FC- and P-based gene ranking methods ("12,091" genes; test site 1). Each POG line represents a platform using data from its first test site. The x-axis represents the number of selected DEGs, and the y-axis is the percentage (%) of genes common to the DEGs derived from FC-ranking and P-ranking. Shown here are results for comparing sample B and sample A for all "12,091" genes commonly probed. When a smaller number of genes (up to a few hundred or a few thousand) is selected, POG for the cross-method comparison (FC vs. P) is low. For example, only about 50% of the genes are in common between the top 500 genes selected by FC and by P separately, indicating that FC and P rank DEGs dramatically differently. As the number of selected DEGs increases, the overlap between the two methods increases and eventually approaches 100%, as expected. The low concordance between FC- and P-based gene ranking methods is not unexpected considering their different definitions and low correlation (Figure 7c).
Figure 9. Volcano plot illustration of the joint FC and P gene selection rule. Genes in sectors A and C are selected as differentially expressed. The colors correspond to the negative log10 P and log2 fold change values: Red: 20 < -log10 P < 50 and 3 < log2 fold < 9 or -9 < log2 fold < -3. Blue: 10 < -log10 P < 50 and 2 < log2 fold < 3 or -3 < log2 fold < -2. Yellow: 4 < -log10 P < 50 and 1 < log2 fold < 2 or -2 < log2 fold < -1. Pink: 10 < -log10 P < 20 and 3 < log2 fold or log2 fold < -3. Light blue: 4 < -log10 P < 10 and 2 < log2 fold or log2 fold < -2. Light green: 2 < -log10 P < 4 and 1 < log2 fold or log2 fold < -1. Gray)
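A joint FC and P selection rule of the kind illustrated by the volcano plot can be sketched as follows. This is an illustrative sketch, not the authors' implementation: it assumes the selected sectors are those with both a small P value and a large absolute log2 fold change, and the cutoffs (`p_cutoff`, `min_abs_log2fc`), the gene IDs, and the function name `select_degs` are hypothetical.

```python
def select_degs(genes, p_cutoff=0.001, min_abs_log2fc=1.0):
    """Apply a joint rule: keep genes passing both cutoffs, then rank by |log2 FC|.

    genes maps a gene ID to a (log2_fold_change, p_value) pair.
    Both cutoffs are illustrative assumptions, not values from the source.
    """
    passing = [
        (gene, log2_fc)
        for gene, (log2_fc, p_value) in genes.items()
        if p_value < p_cutoff and abs(log2_fc) > min_abs_log2fc
    ]
    # Rank the surviving genes by magnitude of fold change, largest first.
    passing.sort(key=lambda row: abs(row[1]), reverse=True)
    return [gene for gene, _ in passing]


# Hypothetical measurements: (log2 fold change, P value).
example = {
    "g1": (3.5, 1e-12),   # large change, significant: selected
    "g2": (-2.2, 1e-6),   # down-regulated, significant: selected
    "g3": (4.0, 0.2),     # large change but not significant: rejected
    "g4": (0.3, 1e-9),    # significant but tiny change: rejected
    "g5": (-1.4, 5e-4),   # passes both cutoffs: selected
}

print(select_degs(example))
```

Under this rule the P cutoff acts as a filter, while the fold change determines the ranking of the genes that pass it.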
Figure 10. Inter-site concordance based on FC, t-test, Wilcoxon rank-sum test, and SAM. Affymetrix data on samples A and B from site 1 and site 2 for the "12,091" commonly mapped genes were used [13]. No flagged ("Absent") genes were excluded from the analysis. For the Wilcoxon rank-sum tests, there were many ties, i.e., many genes exhibited the same level of statistical significance because of the small sample sizes (five replicates for each group). Ties among genes from each test site were broken (ranked) by random ordering. Concordance between genes selected completely at random is shown in red and reaches 50% when all candidate genes are declared differentially expressed; the other 50% of the genes are in opposite regulation directions. SAM improves inter-site reproducibility compared to the t-test, and approaches, but does not exceed, that of fold change.
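The 50% ceiling of the random baseline follows from counting a gene as concordant only when both sites also agree on its direction of regulation. The simulation below is a minimal sketch of that reasoning under stated assumptions: each site ranks the 12,091 genes in random order and assigns each gene an up or down direction with equal probability. The function names and the random seed are hypothetical.

```python
import random


def directional_pog(ranked_a, ranked_b, n_selected):
    """Percentage of genes in both top-n lists with the same direction.

    Each ranked list holds (gene, direction) pairs, direction being +1 or -1.
    """
    top_a = dict(ranked_a[:n_selected])
    top_b = dict(ranked_b[:n_selected])
    agree = sum(1 for gene, d in top_a.items() if top_b.get(gene) == d)
    return 100.0 * agree / n_selected


def random_ranking(genes, rng):
    """Random gene order with a random up/down call for each gene."""
    order = list(genes)
    rng.shuffle(order)
    return [(gene, rng.choice((1, -1))) for gene in order]


rng = random.Random(0)          # fixed seed, chosen arbitrarily
genes = range(12091)            # number of commonly mapped genes
site1 = random_ranking(genes, rng)
site2 = random_ranking(genes, rng)

# With every gene selected, only direction agreement matters: about 50%.
full = directional_pog(site1, site2, 12091)
# With a short list, random lists barely overlap at all.
small = directional_pog(site1, site2, 100)
print(round(full, 1), round(small, 1))
```

When every gene is selected the two lists contain the same genes, so concordance depends only on the direction calls, which match by chance about half the time.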
Figure 11. Gene selection and percentage of agreement in gene lists in simulated data sets. Illustrations of the effect of biological context, replicate CV distribution, gene list size, and gene selection rules/methods on the reproducibility of gene lists using simulated microarray data. These three graphs represent both extreme and typical scenarios in differential expression assays. However, FC sorting with low P thresholds (0.001 or 0.0001; light and medium gray boxes) consistently performed better overall than the other rules, even when FC-ranking or P-ranking by itself did not perform as well.