F De Smet, Y Moreau, K Engelen, D Timmerman, I Vergote, B De Moor.
Abstract
A basic problem in microarray data analysis is to identify genes whose expression is affected by the distinction between malignancies with different properties. These genes are said to be differentially expressed. Differential expression can be detected by selecting the genes with P-values (derived using an appropriate hypothesis test) below a certain rejection level. This selection, however, is not possible without accepting some false positives and negatives, because the two sets of P-values (those of the genes whose expression is affected by the distinction between the malignancies and those of the genes whose expression is not) overlap. We describe a procedure for the study of differential expression in microarray data based on receiver-operating characteristic (ROC) curves. This approach can be used to select a rejection level that balances the number of false positives and negatives, and to assess the degree of overlap between the two sets of P-values. Since this degree of overlap characterises the balance that can be reached between the number of false positives and negatives, it can be seen as a quality measure of microarray data with respect to the detection of differential expression. As an example, we apply our method to data sets studying acute leukaemia.
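To make the overlap argument concrete, the following Python sketch simulates two sets of P-values and shows that every rejection level trades false positives against false negatives. The distributions are assumptions chosen for illustration (uniform P-values for unaffected genes, Beta(0.1, 1) for affected genes), not data from the study, and `confusion_at` and `overlap_auc` are hypothetical helper names. The second helper estimates the degree of overlap as the probability that an affected gene has a smaller P-value than an unaffected one, which equals the area under the empirical ROC curve (the Mann-Whitney identity).

```python
import numpy as np

# Hypothetical simulation, not data from the paper: unaffected genes get
# uniform P-values, affected genes get P-values drawn from Beta(0.1, 1),
# which piles them up near 0 but still lets some reach large values.
rng = np.random.default_rng(0)
N0, N1 = 4000, 3000
p_null = rng.uniform(0.0, 1.0, N0)
p_alt = rng.beta(0.1, 1.0, N1)

def confusion_at(alpha):
    """Counts when every gene with P <= alpha is declared differentially expressed."""
    tp = int(np.sum(p_alt <= alpha))
    fp = int(np.sum(p_null <= alpha))
    return tp, fp, N1 - tp, N0 - fp  # TP, FP, FN, TN

def overlap_auc(p_pos, p_neg):
    """P(P-value of an affected gene < P-value of an unaffected gene), ties count 1/2.

    This Mann-Whitney statistic equals the area under the empirical ROC curve.
    """
    neg = np.sort(np.asarray(p_neg, dtype=float))
    pos = np.asarray(p_pos, dtype=float)
    right = np.searchsorted(neg, pos, side="right")  # number of neg <= p
    left = np.searchsorted(neg, pos, side="left")    # number of neg < p
    greater = neg.size - right                       # number of neg > p
    ties = right - left                              # number of neg == p
    return float((greater.sum() + 0.5 * ties.sum()) / (pos.size * neg.size))

for alpha in (0.001, 0.01, 0.05, 0.2):
    tp, fp, fn, tn = confusion_at(alpha)
    print(f"alpha={alpha}: TP={tp} FP={fp} FN={fn} TN={tn}")

print("overlap AUC:", round(overlap_auc(p_alt, p_null), 3))
```

With these assumed distributions the theoretical value of that probability is 1 − 0.1/1.1 ≈ 0.91, so the printed estimate should land close to it; lowering α removes false positives only by adding false negatives.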
Year: 2004 PMID: 15354216 PMCID: PMC2747693 DOI: 10.1038/sj.bjc.6602140
Source DB: PubMed Journal: Br J Cancer ISSN: 0007-0920 Impact factor: 7.640
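Definition of true- and false-positive genes (TP and FP) and of true- and false-negative genes (TN and FN) at rejection level α = p_i, where p_i is the P-value of the ith gene after ranking all genes in ascending order of P-value (for each quantity, the formula for its expected value is given)

| Declared differentially expressed? | Actual differential expression: yes (N1 genes) | Actual differential expression: no (N0 genes) | Total |
|---|---|---|---|
| Yes (P ≤ α) | TP | FP | Pos |
| No (P > α) | FN | TN | Neg |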
N = total number of genes; N0 = number of genes without actual differential expression; N1 = number of genes with actual differential expression (N = N0 + N1); Pos = number of genes declared positive (differentially expressed) at rejection level α = p_i; Neg = number of genes declared negative at rejection level α = p_i.
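The expected counts in the table above can be sketched in Python under one common assumption, stated here rather than taken from the paper: P-values of genes without differential expression are uniform on [0, 1], so that E[FP] = N0·α. At α = p_i exactly i genes are declared positive (ignoring ties), and the remaining expectations follow from the row and column totals. N0 must be supplied; how it is estimated is not covered here, and `expected_confusion` is a hypothetical name.

```python
import numpy as np

def expected_confusion(pvalues, n0):
    """Expected TP, FP, FN, TN at every rejection level alpha = p_i.

    Assumes P-values of genes without differential expression are uniform
    on [0, 1], so E[FP] = N0 * alpha. n0 (N0) must be supplied by the caller.
    """
    p = np.sort(np.asarray(pvalues, dtype=float))  # p_1 <= p_2 <= ... <= p_N
    n = p.size
    n1 = n - n0                                    # N1 = N - N0
    pos = np.arange(1, n + 1)                      # Pos = i at alpha = p_i
    fp = n0 * p                                    # E[FP] = N0 * p_i
    tp = pos - fp                                  # E[TP] = Pos - E[FP]
    tn = n0 - fp                                   # E[TN] = N0 * (1 - p_i)
    fn = n1 - tp                                   # E[FN] = N1 - E[TP] = Neg - E[TN]
    return p, tp, fp, fn, tn

# Toy input (hypothetical): six genes, N0 = 3 assumed known
p, tp, fp, fn, tn = expected_confusion([0.001, 0.01, 0.02, 0.3, 0.6, 0.9], n0=3)
for row in zip(p, tp, fp, fn, tn):
    print("alpha=%.3f  TP=%.2f  FP=%.2f  FN=%.2f  TN=%.2f" % row)
```

Because these are expectations rather than observed counts, E[TP] can turn negative at very small α or exceed N1 near α = 1 for a given sample; a real analysis would have to handle those edge cases.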
Results for the data from Golub et al (detection of differential expression between ALL and AML) and from Armstrong et al (detection of differential expression between ALL and AML, between ALL and MLL and between MLL and AML)
| | Golub et al: ALL vs AML | Armstrong et al: ALL vs AML | Armstrong et al: ALL vs MLL | Armstrong et al: MLL vs AML |
|---|---|---|---|---|
| N | 7129 | 12 582 | 12 582 | 12 582 |
| N0 | 3876 | 3084 | 8119 | 4527 |
| N1 | 3253 | 9498 | 4463 | 8055 |
| AUC (%) (95% CI) | 91.39 (90.68–92.10) | 95.13 (94.78–95.48) | 85.98 (85.24–86.72) | 94.83 (94.46–95.20) |
| αopt | 0.18 | 0.11 | 0.22 | 0.13 |
| SENSopt (%) | 84.03 | 87.26 | 76.75 | 86.97 |
| SPECopt (%) | 82.06 | 88.56 | 77.97 | 86.78 |
| SENSopt + SPECopt (%) | 166.09 | 175.82 | 154.71 | 173.76 |
N = total number of genes; N0 = number of genes without actual differential expression; N1 = number of genes with actual differential expression (N = N0 + N1); AUC = area under the ROC curve; αopt = rejection level at which the optimal balance between sensitivity and specificity is reached (i.e., the rejection level that maximises the sum of sensitivity and specificity; for the first two columns, these are also the rejection levels of the points in Figure 1 where the tangent to the ROC curve has slope 1); SENSopt = sensitivity at αopt; SPECopt = specificity at αopt.
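The quantities defined in this footnote can be computed from the same expected counts. The Python sketch below again assumes uniform null P-values and a known N0 (assumptions, not the paper's stated procedure). It sets sensitivity to E[TP]/N1 and specificity to E[TN]/N0 = 1 − α, clips sensitivity to [0, 1], integrates the ROC curve with the trapezoidal rule, and picks αopt as the rejection level that maximises sensitivity + specificity. It does not reproduce the confidence intervals in the table, and `roc_summary` is a hypothetical name.

```python
import numpy as np

def roc_summary(pvalues, n0):
    """Return (AUC, alpha_opt, SENS_opt, SPEC_opt) from expected counts.

    Assumes uniform null P-values (E[FP] = N0 * alpha) and a known N0.
    """
    p = np.sort(np.asarray(pvalues, dtype=float))  # candidate levels alpha = p_i
    n = p.size
    n1 = n - n0
    i = np.arange(1, n + 1)                        # genes declared positive
    fp = n0 * p                                    # E[FP]
    sens = np.clip((i - fp) / n1, 0.0, 1.0)        # E[TP] / N1, clipped to [0, 1]
    spec = 1.0 - fp / n0                           # E[TN] / N0 = 1 - alpha
    # ROC curve: x = 1 - specificity, y = sensitivity, closed at (0,0) and (1,1)
    x = np.concatenate(([0.0], 1.0 - spec, [1.0]))
    y = np.concatenate(([0.0], sens, [1.0]))
    auc = float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))  # trapezoidal rule
    k = int(np.argmax(sens + spec))                # best balance of SENS and SPEC
    return auc, float(p[k]), float(sens[k]), float(spec[k])

# Toy input (hypothetical): three small P-values, three large ones, N0 = 3
auc, alpha_opt, sens_opt, spec_opt = roc_summary(
    [0.001, 0.002, 0.003, 0.2, 0.5, 0.8], n0=3)
print(f"AUC={auc:.3f} alpha_opt={alpha_opt} SENS={sens_opt:.3f} SPEC={spec_opt:.3f}")
```

Maximising SENS + SPEC is the same as maximising SENS − (1 − SPEC), the height of the ROC curve above the diagonal, which is why on a smooth concave curve the optimum sits where the tangent has slope 1.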
Figure 1. Receiver-operating characteristic curves for the data from Golub et al and from Armstrong et al with respect to the detection of differential expression between ALL and AML.