Akihiro Hirakawa, Yasunori Sato, Takashi Sozu, Chikuma Hamada, Isao Yoshimura.
Abstract
Recent developments in DNA microarray technology allow us to measure the expression levels of thousands of genes simultaneously and to identify, from many candidate genes, those truly correlated with anticancer drug response (differentially expressed genes). Significance Analysis of Microarray (SAM) is often used to estimate the false discovery rate (FDR), an index for optimizing the identifiability of differentially expressed genes, but the accuracy of the FDR estimated by SAM has not necessarily been confirmed. We propose a new method for estimating the FDR that assumes a mixed normal distribution for the test statistic, and we examine the performance of the proposed method and SAM using simulated data. The simulation results indicate that the accuracy of the FDR estimated by the proposed method and by SAM varied depending on the experimental conditions. We applied both methods to actual data comprising the expression levels of 12,625 genes in 10 responders and 14 non-responders to docetaxel for breast cancer. Using a cut-off value that achieved FDR < 0.01 to guard against false-positive genes, the proposed method identified 280 differentially expressed genes correlated with docetaxel response, whereas 92 genes were previously thought to be correlated with docetaxel response.
Keywords: differentially expressed genes; false discovery rate; microarray; mixed normal distribution; significance analysis of microarray
Year: 2008 PMID: 19455258 PMCID: PMC2675830
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Results of simulation situation 1.
| 0.01 | −0.0012 | 0.0057 | 0.0005 | 0.0071 |
| 0.05 | −0.0044 | 0.0163 | 0.0019 | 0.0184 |
| 0.10 | −0.0045 | 0.0214 | 0.0027 | 0.0247 |
| 0.20 | −0.0055 | 0.0239 | 0.0035 | 0.0321 |
| 0.50 | −0.0035 | 0.0154 | 0.0142 | 0.0397 |
Results of simulation situation 2.
| 5 | −0.0308 | 0.0361 | 0.0005 | 0.0340 |
| 10 | −0.0122 | 0.0257 | 0.0013 | 0.0259 |
| 20 | −0.0045 | 0.0214 | 0.0027 | 0.0247 |
| 40 | −0.0034 | 0.0198 | 0.0042 | 0.0260 |
| 80 | −0.0032 | 0.0205 | 0.0085 | 0.0258 |
Results of simulation situation 3.
| 30 | −0.0094 | 0.0456 | −0.0004 | 0.0549 |
| 75 | −0.0072 | 0.0290 | 0.0025 | 0.0346 |
| 150 | −0.0045 | 0.0214 | 0.0027 | 0.0247 |
| 300 | −0.0032 | 0.0138 | −0.0025 | 0.0176 |
| 600 | −0.0022 | 0.0087 | −0.0102 | 0.0129 |
Figure 1.Histogram of the t-type score and the density function of a two-component mixed normal distribution. The solid line is f, the dotted line is f0, and the broken line is f1 in a two-component mixed normal distribution.
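To make the roles of f, f0, and f1 concrete, the following is a minimal sketch of how an FDR could be read off a fitted two-component normal mixture f = π0·f0 + (1 − π0)·f1. The record does not give the authors' estimator, so everything here is an assumption for illustration: the two-sided tail-area form of the FDR, the function names, and the parameter values in the usage lines are all hypothetical, and fitting the mixture (for example by an EM algorithm) is left out.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def two_sided_tail(c, mu, sigma):
    """P(|T| >= c) for T ~ N(mu, sigma^2)."""
    return (1.0 - normal_cdf(c, mu, sigma)) + normal_cdf(-c, mu, sigma)


def mixture_fdr(c, pi0, mu0, sigma0, mu1, sigma1):
    """Tail-area FDR at cut-off c under f = pi0*f0 + (1 - pi0)*f1.

    Assumption (not taken from the paper): a gene is called when |score| >= c,
    and FDR(c) is the share of the called tail mass that comes from f0.
    """
    null_tail = pi0 * two_sided_tail(c, mu0, sigma0)
    alt_tail = (1.0 - pi0) * two_sided_tail(c, mu1, sigma1)
    total = null_tail + alt_tail
    return null_tail / total if total > 0.0 else 0.0


# Hypothetical mixture parameters, chosen only to show the behaviour.
fdr_loose = mixture_fdr(0.5, pi0=0.9, mu0=0.0, sigma0=0.5, mu1=0.0, sigma1=2.0)
fdr_strict = mixture_fdr(2.0, pi0=0.9, mu0=0.0, sigma0=0.5, mu1=0.0, sigma1=2.0)
```

With these illustrative parameters, raising the cut-off lowers the estimated FDR, because the wider component f1 dominates the tails.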
Figure 2.Scatter plot of the ordered t-type score versus the expected ordered t-type score in SAM.
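The two axes of such a plot can be sketched as follows. This follows the commonly described SAM procedure (a t-type score with an added constant s0 in the denominator, and expected ordered scores obtained by averaging the ordered scores over label permutations); it is not taken from this record, and the function names, the default s0, and the permutation count are illustrative assumptions.

```python
import random
import statistics


def t_type_score(values, labels, s0=0.0):
    """Mean difference between groups divided by (pooled standard error + s0)."""
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    g0 = [v for v, lab in zip(values, labels) if lab == 0]
    n1, n0 = len(g1), len(g0)
    m1, m0 = statistics.fmean(g1), statistics.fmean(g0)
    ss = sum((v - m1) ** 2 for v in g1) + sum((v - m0) ** 2 for v in g0)
    pooled_se = ((1.0 / n1 + 1.0 / n0) * ss / (n1 + n0 - 2)) ** 0.5
    return (m1 - m0) / (pooled_se + s0)


def ordered_scores(matrix, labels, s0=0.0):
    """Scores for every gene (one row per gene), sorted in ascending order."""
    return sorted(t_type_score(row, labels, s0) for row in matrix)


def expected_ordered_scores(matrix, labels, n_perm=100, s0=0.0, seed=0):
    """Average the i-th ordered score over random permutations of the labels."""
    rng = random.Random(seed)
    totals = [0.0] * len(matrix)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        for i, score in enumerate(ordered_scores(matrix, shuffled, s0)):
            totals[i] += score
    return [total / n_perm for total in totals]


# Toy data: 20 hypothetical genes measured in 4 + 4 samples.
_rng = random.Random(1)
toy_matrix = [[_rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(20)]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
observed = ordered_scores(toy_matrix, toy_labels)
expected = expected_ordered_scores(toy_matrix, toy_labels, n_perm=50)
```

Plotting `observed` against `expected` gives a scatter of this kind; genes whose observed score departs from the expected one by more than a chosen cut-off are the candidates for differential expression.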
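Results of applying the proposed method and SAM: the FDR estimated by each method and the number of identified genes for each cut-off value.
| Cut-off value | Estimated FDR (proposed method) | Estimated FDR (SAM) | Number of identified genes |
| --- | --- | --- | --- |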
| 0.1 | 0.504 | 0.748 | 6,433 |
| 0.2 | 0.464 | 0.612 | 5,685 |
| 0.3 | 0.420 | 0.487 | 4,935 |
| 0.4 | 0.373 | 0.381 | 4,227 |
| 0.5 | 0.324 | 0.291 | 3,612 |
| 0.6 | 0.274 | 0.225 | 3,008 |
| 0.7 | 0.226 | 0.172 | 2,506 |
| 0.8 | 0.181 | 0.132 | 2,063 |
| 0.9 | 0.141 | 0.101 | 1,680 |
| 1.0 | 0.106 | 0.076 | 1,371 |
| 1.1 | 0.078 | 0.059 | 1,087 |
| 1.2 | 0.056 | 0.044 | 877 |
| 1.3 | 0.039 | 0.033 | 691 |
| 1.4 | 0.026 | 0.024 | 571 |
| 1.5 | 0.017 | 0.018 | 451 |
| 1.6 | 0.011 | 0.014 | 357 |
| 1.7 | 0.007 | 0.011 | 280 |
| 1.8 | 0.004 | 0.008 | 218 |
| 1.9 | 0.003 | 0.006 | 171 |
| 2.0 | 0.002 | 0.006 | 119 |
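The choice of cut-off described in the abstract can be reproduced from the last rows of this table. The sketch below assumes the second column is the FDR estimated by the proposed method and the third is the FDR estimated by SAM, which is consistent with the abstract's report of 280 genes at FDR < 0.01; the helper name `smallest_cutoff` is hypothetical.

```python
# Rows copied from the table above:
# (cut-off, FDR in column 2, FDR in column 3, number of identified genes).
ROWS = [
    (1.5, 0.017, 0.018, 451),
    (1.6, 0.011, 0.014, 357),
    (1.7, 0.007, 0.011, 280),
    (1.8, 0.004, 0.008, 218),
    (1.9, 0.003, 0.006, 171),
    (2.0, 0.002, 0.006, 119),
]


def smallest_cutoff(rows, fdr_index, target):
    """Return (cut-off, genes) for the smallest cut-off with FDR below target."""
    for row in sorted(rows):
        if row[fdr_index] < target:
            return row[0], row[3]
    return None


# Assumed column order: index 1 = proposed method, index 2 = SAM.
proposed = smallest_cutoff(ROWS, 1, 0.01)  # (1.7, 280)
sam = smallest_cutoff(ROWS, 2, 0.01)       # (1.8, 218)
```

Under that column assumption, the proposed method reaches FDR < 0.01 at a cut-off of 1.7 (280 genes), while the SAM estimate first falls below 0.01 at 1.8 (218 genes).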