Koji Kadota, Yuji Nakai, Kentaro Shimizu.
Abstract
BACKGROUND: To identify differentially expressed genes (DEGs) from microarray data, users of the Affymetrix GeneChip system need to select both a preprocessing algorithm to obtain expression-level measurements and a way of ranking genes to obtain the most plausible candidates. We recently recommended suitable combinations of a preprocessing algorithm and gene ranking method that can be used to identify DEGs with a higher level of sensitivity and specificity. However, in addition to these recommendations, researchers also want to know which combinations enhance reproducibility.
Year: 2009 PMID: 19386098 PMCID: PMC2679019 DOI: 10.1186/1748-7188-4-7
Source DB: PubMed Journal: Algorithms Mol Biol ISSN: 1748-7188 Impact factor: 1.405
Frequency of preprocessing algorithms used during 2003 – 2008
| | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 |
| MAS (2002) | 8 | 34 | 53 | 42 | 47 | 16 |
| RMA (2003) | 8 | 15 | 29 | 20 | 9 | |
| MBEI (2001) | 0 | 3 | 7 | 16 | 8 | 3 |
| GCRMA (2004) | 0 | 5 | 8 | 4 | ||
| VSN (2002) | 0 | 0 | 0 | 4 | 0 | 2 |
We surveyed 394 papers that reported analyses using the Affymetrix HG-U133A array (Gene Expression Omnibus (GEO) ID: GPL96) [32]. The counts were obtained by reading the full text of each paper. Statistics for the five most frequently used algorithms are shown; years in parentheses give each algorithm's year of publication. Suitable methods of ranking genes for the first two algorithms, MAS and RMA, have been discussed [1].
Average AUC values for Datasets 3–26 and 27–38
| Method | PLIER | VSN | FARMS | mmgMOS | MBEI | GCRMA | MAS | RMA | DFW | Average |
| Datasets 3–26 | ||||||||||
| WAD | 89.15 | 90.97 | 91.58 | 88.19 | 92.67 | 91.37 | 91.41 | 91.88 | ||
| AD | 87.32 | 92.47 | 93.13 | 89.14 | 93.76 | 93.10 | 92.24 | 91.95 | ||
| FC | 86.20 | 92.92 | 92.49 | 92.82 | 89.71 | 93.16 | 93.63 | 92.24 | 91.81 | |
| RP | 92.48 | 92.20 | 93.23 | 91.51 | 92.54 | |||||
| modT | 85.43 | 90.70 | 89.23 | 91.91 | 86.79 | 91.19 | 95.67 | 91.38 | 90.11 | 90.27 |
| samT | 85.25 | 90.94 | 89.40 | 92.99 | 87.09 | 91.07 | 95.95 | 91.23 | 89.96 | 90.43 |
| shrinkT | 84.65 | 90.62 | - | 92.56 | 86.89 | 92.12 | 95.73 | 91.32 | 91.45 | 90.67 |
| ibmT | 86.27 | 91.02 | 90.04 | 93.11 | 87.10 | 91.06 | 96.34 | 91.77 | 90.25 | 90.78 |
| Datasets 27–38 | ||||||||||
| WAD | 91.30 | 95.41 | 93.55 | 93.39 | 95.75 | 96.73 | 94.09 | |||
| AD | 89.01 | 96.22 | 93.81 | 86.99 | 93.11 | 96.18 | 87.41 | 94.22 | 92.64 | |
| FC | 88.82 | 96.20 | 93.81 | 85.92 | 92.94 | 96.06 | 88.23 | 96.73 | 94.22 | 92.55 |
| RP | 86.94 | 84.55 | 96.53 | 92.81 | ||||||
| modT | 89.28 | 94.53 | 91.62 | 89.61 | 90.50 | 93.33 | 90.90 | 95.28 | 92.36 | 91.93 |
| samT | 88.53 | 95.08 | 91.90 | 89.14 | 90.49 | 93.60 | 90.31 | 95.70 | 92.05 | 91.87 |
| shrinkT | 88.56 | 94.04 | - | 89.85 | 90.13 | 94.48 | 90.97 | 94.85 | 93.68 | 92.07 |
| ibmT | 89.89 | 94.80 | 89.95 | 90.60 | 90.66 | 93.67 | 91.92 | 95.49 | 92.43 | 92.16 |
| Average | 88.33 | 93.46 | 91.98 | 90.94 | 89.98 | 93.58 | 92.25 | 93.99 | 92.37 | |
As statistics for shrinkT frequently include infinities, AUC values for FARMS/shrinkT could not be calculated. Note that AUC values for each dataset are given in the additional file [see Additional file 1].
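As a rough illustration of how an AUC value like those in the table can be obtained from a gene ranking, the following Python sketch uses the rank-sum (Mann-Whitney) identity. It is not the authors' code: the function name `auc_percent`, the convention that a larger score means stronger evidence of differential expression, and the choice to refuse non-finite scores are all assumptions made for this example.

```python
import math


def auc_percent(scores, is_deg):
    """Return the AUC (in percent) for gene scores against known DEG labels.

    Assumption: a larger score means stronger evidence of differential
    expression. The function name and interface are hypothetical.
    """
    if len(scores) != len(is_deg):
        raise ValueError("scores and is_deg must have the same length")
    if any(not math.isfinite(s) for s in scores):
        # Assumption for this sketch: when a statistic contains infinities
        # (or NaN), no AUC is reported, as with FARMS/shrinkT in the table.
        raise ValueError("non-finite score; AUC not calculated")

    # Rank genes from lowest to highest score, averaging ranks over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # 1-based average rank of the tie group
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1

    n_pos = sum(1 for d in is_deg if d)
    n_neg = len(is_deg) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one DEG and one non-DEG")

    # Mann-Whitney U for the DEG group, scaled to a percentage.
    rank_sum = sum(r for r, d in zip(ranks, is_deg) if d)
    u_stat = rank_sum - n_pos * (n_pos + 1) / 2
    return 100.0 * u_stat / (n_pos * n_neg)
```

For example, `auc_percent([5.0, 4.0, 3.0, 2.0, 1.0], [True, True, False, False, False])` ranks both true DEGs above every non-DEG, while a ranking that mixes the two groups scores lower.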
Figure 1. POG values for FARMS-preprocessed data. (a) Sample A vs. Sample B; (b) Sample C vs. Sample D. The number of DEGs is shown on the x-axis, where L is the number of DEGs up-regulated in one sample (Sample A or C) or in the other sample (Sample B or D) [21]. The percentage of genes (POG) common to the six gene lists derived from the six test sites at a given number of DEGs is shown on the y-axis. POG values for shrinkT are not shown because its statistics frequently include infinities. Note also that the reproducibility (POG) shown by only the w term (light blue line) in WAD is clearly higher than that shown by the AD (black line) recommended by the MAQC study.
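The POG measure described in the caption can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: it assumes each test site supplies a single list of gene identifiers ordered from most to least differentially expressed, and it ignores the separate handling of up- and down-regulated genes. The names `pog_percent`, `ranked_lists`, and `n_top` are hypothetical.

```python
def pog_percent(ranked_lists, n_top):
    """Percentage of genes common to the top n_top genes of every list.

    Assumption: each element of ranked_lists is one site's gene IDs,
    ordered from most to least differentially expressed.
    """
    if n_top <= 0:
        raise ValueError("n_top must be positive")
    if not ranked_lists:
        raise ValueError("need at least one ranked list")
    if any(len(genes) < n_top for genes in ranked_lists):
        raise ValueError("every list must contain at least n_top genes")

    # Take the top-ranked genes per site and keep those shared by all sites.
    top_sets = [set(genes[:n_top]) for genes in ranked_lists]
    common = set.intersection(*top_sets)
    return 100.0 * len(common) / n_top


# Toy example with three sites and four genes (made-up identifiers).
sites = [
    ["g1", "g2", "g3", "g4"],
    ["g2", "g1", "g4", "g3"],
    ["g1", "g3", "g2", "g4"],
]
print(pog_percent(sites, 2))  # only g1 is in every top-2 list
```

With six lists, one per test site, and `n_top` set to 100, the same calculation would correspond to the kind of value reported for the 100 top-ranked genes.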
POG values for the 100 top-ranked genes among six test sites
| Method | PLIER | VSN | FARMS | mmgMOS | MBEI | GCRMA | MAS | RMA | DFW |
| Sample A vs. B | |||||||||
| WAD | 64 | 56 | 65 | 69 | 58 | 61 | 62 | 60 | |
| AD | 45 | 59 | 65 | 62 | 58 | 65 | 45 | 64 | 59 |
| FC | 40 | 60 | 65 | 60 | 58 | 65 | 45 | 64 | 59 |
| RP | 43 | 58 | 65 | 65 | 59 | 65 | 42 | 63 | 59 |
| modT | 1 | 22 | 0 | 3 | 15 | 5 | 6 | 1 | 1 |
| samT | 11 | 11 | 0 | 55 | 39 | 6 | 31 | 10 | 1 |
| shrinkT | 0 | 12 | - | 37 | 20 | 10 | 5 | 4 | 1 |
| ibmT | - | 21 | 0 | - | 15 | 15 | 8 | 3 | 1 |
| Sample C vs. D | |||||||||
| WAD | 8 | 36 | 12 | 33 | 9 | 8 | 20 | 10 | |
| AD | 1 | 6 | 17 | 13 | 4 | 4 | 1 | 4 | 4 |
| FC | 1 | 8 | 16 | 14 | 2 | 3 | 0 | 5 | 4 |
| RP | 2 | 10 | 17 | 12 | 4 | 3 | 1 | 5 | 4 |
| modT | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 |
| samT | 2 | 0 | 0 | 13 | 1 | 0 | 3 | 3 | 0 |
| shrinkT | 0 | 0 | - | 0 | 2 | 0 | 0 | 1 | 0 |
| ibmT | - | 0 | 0 | - | 1 | 1 | - | 6 | 0 |
Some POG values, such as those for FARMS/shrinkT, could not be calculated because the corresponding statistics include infinities. Note POG values for MAS-, RMA-, and DFW-preprocessed data.