Gabriela Alexe, Sorin Alexe, David E Axelrod, Tibérius O Bonates, Irina I Lozina, Michael Reiss, Peter L Hammer.
Abstract
INTRODUCTION: The potential of applying data analysis tools to microarray data for diagnosis and prognosis is illustrated on the recent breast cancer dataset of van 't Veer and coworkers. We re-examine that dataset using the novel technique of logical analysis of data (LAD), with two objectives: to discover patterns characteristic of cases with good or poor outcome and use them for accurate and justifiable predictions; and to derive novel information about the role of genes, the existence of special classes of cases, and other factors.
Year: 2006 PMID: 16859500 PMCID: PMC1779471 DOI: 10.1186/bcr1512
Source DB: PubMed Journal: Breast Cancer Res ISSN: 1465-5411 Impact factor: 6.466
The six-gene support set of the demonstration model
| Gene index | Van 't Veer ID / GenBank accession | DAVID gene name |
| 1 | AF018081 | Collagen, type XVIII, alpha 1 |
| 2 | NM_003239 | Transforming growth factor, beta 3 |
| 3 | NM_004035 | Acyl-Coenzyme A oxidase 1, palmitoyl |
| 4 | Contig26768_RC | Exostoses (multiple) 1 |
| 5 | Contig15031_RC | Oligodendrocyte myelin glycoprotein |
| 6 | Contig27639_RC | Ectonucleoside triphosphate diphosphohydrolase 2 |
Demonstration LAD model consisting of nine positive and seven negative patterns on the support set of six genes
| Patterns | Definition of patterns | Patterns' coverages (prevalences) on training set | ||||||
| AF018081 | NM_003239 | NM_004035 | Contig26768_RC | Contig15031_RC | Contig27639_RC | |||
| Attr. 1 | Attr. 2 | Attr. 3 | Attr. 4 | Attr. 5 | Attr. 6 | Positive prevalence | Negative prevalence | |
| P1 | ≤-0.014 | >-0.106 | ≤0.055 | 22 (64.71%) | 0 | |||
| P2 | >-0.232, ≤0.0575 | >-0.106 | >-0.2305 | 17 (50%) | 0 | |||
| P3 | ≤-0.0945 | ≤0.0915 | >-0.1555 | 13 (38.24%) | 0 | |||
| P4 | >-0.12 | >-0.1555, ≤-0.014 | ≤0.1145 | 12 (35.29%) | 0 | |||
| P5 | >-0.12, ≤0.0055 | ≤0.0575 | ≤0.1485 | 11 (32.35%) | 0 | |||
| P6 | >-0.2025 | >-0.106, ≤0.0455 | >-0.0065, ≤0.055 | 11 (32.35%) | 0 | |||
| P7 | >-0.08 | >-0.0345 | >-0.1555, ≤0.1445 | 9 (26.47%) | 0 | |||
| P8 | >0.071 | >-0.106, ≤0.0775 | >-0.1555 | 9 (26.47%) | 0 | |||
| P9 | >-0.319 | >0.035 | >-0.1555, ≤0.1445 | 6 (17.65%) | 0 | |||
| N1 | ≤0.071 | ≤0.098 | >0.0915 | 0 | 15 (34.09%) | |||
| N2 | >0.1145 | ≤0.037 | 0 | 15 (34.09%) | ||||
| N3 | ≤0.071 | >0.0575 | >-0.0635 | 0 | 14 (31.82%) | |||
| N4 | ≤-0.106 | 0 | 13 (29.55%) | |||||
| N5 | >-0.014 | ≤-0.1555 | 0 | 12 (27.27%) | ||||
| N6 | >-0.1335, ≤0.098 | >0.055 | ≤0.037 | 0 | 11 (25%) | |||
| N7 | >-0.319 | >0.035 | >0.055, ≤0.1485 | 0 | 7 (15.91%) | |||
LAD, logical analysis of data.
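The table above describes each pattern as a conjunction of interval conditions on a few attributes, together with the number and share of training cases it covers. As a minimal sketch of how such a pattern could be evaluated, the Python below treats lower bounds as strict (>) and upper bounds as inclusive (≤), following the notation in the table. The attribute names `g1` and `g2`, the toy cases, and the helper names `covers` and `prevalence` are hypothetical and are not taken from the paper. The class sizes of 34 positive and 44 negative training cases are inferred from the reported percentages (for example, 22 cases at 64.71%), not stated in this excerpt.

```python
from typing import Dict, List, Optional, Tuple

# A pattern maps an attribute name to (lower, upper); None means unbounded.
# Lower bounds are strict (>) and upper bounds inclusive (<=), as in the table.
Pattern = Dict[str, Tuple[Optional[float], Optional[float]]]
Case = Dict[str, float]


def covers(pattern: Pattern, case: Case) -> bool:
    """Return True if the case satisfies every condition of the pattern."""
    for attr, (lower, upper) in pattern.items():
        value = case[attr]
        if lower is not None and not value > lower:
            return False
        if upper is not None and not value <= upper:
            return False
    return True


def prevalence(pattern: Pattern, cases: List[Case]) -> Tuple[int, float]:
    """Count covered cases and express the count as a percentage of the class."""
    hits = sum(covers(pattern, c) for c in cases)
    return hits, round(100.0 * hits / len(cases), 2)


# Hypothetical pattern and cases (attribute names and values are made up).
toy_pattern: Pattern = {"g1": (None, -0.014), "g2": (-0.106, None)}
toy_cases: List[Case] = [
    {"g1": -0.20, "g2": 0.10},   # covered
    {"g1": -0.20, "g2": -0.30},  # fails g2 > -0.106
    {"g1": 0.05, "g2": 0.10},    # fails g1 <= -0.014
    {"g1": -0.014, "g2": 0.00},  # covered (upper bound is inclusive)
]
print(prevalence(toy_pattern, toy_cases))

# Percentages in the table, recomputed from the inferred class sizes.
p1_share = round(100.0 * 22 / 34, 2)  # pattern P1, positive class
n1_share = round(100.0 * 15 / 44, 2)  # pattern N1, negative class
print(p1_share, n1_share)
```

Every pattern in the table has zero prevalence on the opposite class, so under this reading a positive pattern covers no negative training case and vice versa.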
The 17-gene support set of the enhanced model.
| Gene index | Van 't Veer ID / GenBank accession | DAVID gene name |
| 1 | AB033007 | KIAA1181 protein |
| 2 | NM_001661 | ADP-ribosylation factor 4-like |
| 3 | NM_001756 | Serine (or cysteine) proteinase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 6 |
| 4 | AF148505 | Aldehyde dehydrogenase 6 family, member A1 |
| 5 | Contig42421_RC | F-box protein 16 |
| 6 | NM_003748 | Aldehyde dehydrogenase 4 family, member A1 |
| 7 | NM_020974 | Signal peptide, CUB domain, EGF-like 2 |
| 8 | AL080059 | TSPY-like 5 |
| 9 | AL110129 | Mitochondrial ribosomal protein S22 |
| 10 | Contig15031_RC | Oligodendrocyte myelin glycoprotein |
| 11 | Contig65439 | Chromosome 20 open reading frame 178 |
| 12 | Contig37063_RC | Poly (ADP-ribose) glycohydrolase |
| 13 | Contig41383_RC | Asparaginase like 1 |
| 14 | AL049689 | Tenascin N |
| 15 | Contig63102_RC | Hypothetical protein FLJ11354 |
| 16 | Contig55574_RC | F-box protein 41 |
| 17 | Contig38451_RC | Not available |
'Enhanced LAD model' consisting of 20 positive and 20 negative patterns on support set of 17 genes
| AB033007 | NM_001661 | NM_001756 | AF148505 | Contig42421_RC | NM_003748 | NM_020974 | AL080059 | AL110129 | Contig15031_RC | Contig65439 | Contig37063_RC | Contig41383_RC | AL049689 | Contig63102_RC | Contig55574_RC | Contig38451_RC | |||
| Attr. 1 | Attr. 2 | Attr. 3 | Attr. 4 | Attr. 5 | Attr. 6 | Attr. 7 | Attr. 8 | Attr. 9 | Attr. 10 | Attr. 11 | Attr. 12 | Attr. 13 | Attr. 14 | Attr. 15 | Attr. 16 | Attr. 17 | Pos Prev | Neg Prev | |
| P1 | >-0.42 | ≤0.09 | ≤0.06 | 19 (55.9%) | 0 | ||||||||||||||
| P2 | ≤0.07 | ≤-0.01 | ≤0.07 | 18 (52.9%) | 0 | ||||||||||||||
| P3 | >-0.42 | ≤0.06 | ≤0.06 | 18 (52.9%) | 0 | ||||||||||||||
| P4 | ≤-0.01 | ≤0.38 | ≤0.07 | 18 (52.9%) | 0 | ||||||||||||||
| P5 | ≤0.07 | ≤-0.45 | ≤0.06 | 17 (50.0%) | 0 | ||||||||||||||
| P6 | ≤0.33 | ≤-0.01 | ≤-0.104 | 16 (47.1%) | 0 | ||||||||||||||
| P7 | ≤0.07 | ≤-0.01 | >-0.02 | 16 (47.1%) | 0 | ||||||||||||||
| P8 | ≤0.07 | >-0.295 | ≤0.033 | 16 (47.1%) | 0 | ||||||||||||||
| P9 | >-0.42 | ≤0.06 | ≤-0.001 | 14 (41.2%) | 0 | ||||||||||||||
| P10 | >-0.1 | ≤-0.11 | ≤-0.01 | 14 (41.2%) | 0 | ||||||||||||||
| P11 | ≤0.03 | ≤-0.45 | ≤0.19 | 13 (38.2%) | 0 | ||||||||||||||
| P12 | ≤0.07 | >-0.295 | >0.08 | 13 (38.2%) | 0 | ||||||||||||||
| P13 | ≤0.35 | >-0.295 | >0.08 | 13 (38.2%) | 0 | ||||||||||||||
| P14 | ≤0.3 | ≤-0.001 | >0.08 | 13 (38.2%) | 0 | ||||||||||||||
| P15 | ≤-0.03 | >-0.16, ≤0.07 | 12 (35.3%) | 0 | |||||||||||||||
| P16 | ≤0.35 | >-0.96, ≤-0.7 | 10 (29.4%) | 0 | |||||||||||||||
| P17 | >-0.22 | ≤0.055 | >-0.295 | 10 (29.4%) | 0 | ||||||||||||||
| P18 | >-0.22 | >-0.48 | >-0.1 | 10 (29.4%) | 0 | ||||||||||||||
| P19 | >-0.22 | ≤0.32 | ≤0.055 | 10 (29.4%) | 0 | ||||||||||||||
| P20 | >-0.22 | >-0.1 | >-0.27 | 10 (29.4%) | 0 | ||||||||||||||
| N1 | >0.09 | >-0.005 | 0 | 15 (34.1%) | |||||||||||||||
| N2 | ≤-0.295 | >0.06 | ≤-0.02 | 0 | 15 (34.1%) | ||||||||||||||
| N3 | >-0.11 | ≤0.14 | ≤-0.02 | 0 | 15 (34.1%) | ||||||||||||||
| N4 | >0.055 | ≤-0.02 | ≤1.88 | 0 | 14 (31.8%) | ||||||||||||||
| N5 | >0.07 | >-0.001, ≤0.17 | 0 | 14 (31.8%) | |||||||||||||||
| N6 | >0.055 | ≤-0.02 | 0 | 14 (31.8%) | |||||||||||||||
| N7 | ≤-0.22 | >0.06 | >-0.005 | 0 | 14 (31.8%) | ||||||||||||||
| N8 | >0.055 | ≤-0.02 | 0 | 14 (31.8%) | |||||||||||||||
| N9 | >0.09 | >-0.005 | >-0.083 | 0 | 14 (31.8%) | ||||||||||||||
| N10 | >0.09 | >-0.12, ≤0.08 | 0 | 13 (29.5%) | |||||||||||||||
| N11 | >-0.03 | >0.06, ≤0.14 | 0 | 13 (29.5%) | |||||||||||||||
| N12 | >0.077 | ≤0.35 | >0.055 | 0 | 13 (29.5%) | ||||||||||||||
| N13 | >0.077 | ≤0.34 | ≤-0.0213 | 0 | 13 (29.5%) | ||||||||||||||
| N14 | ≤-0.22 | ≤0.18 | >0.055 | 0 | 13 (29.5%) | ||||||||||||||
| N15 | ≤0.21 | >0.077 | ≤-0.0213 | 0 | 12 (27.3%) | ||||||||||||||
| N16 | ≤-0.49 | >-0.1207 | ≤1.877 | 0 | 12 (27.3%) | ||||||||||||||
| N17 | >0.06, ≤0.14 | ≤-0.0213 | 0 | 12 (27.3%) | |||||||||||||||
| N18 | ≤0.22 | >-0.12, ≤-0.02 | 0 | 12 (27.3%) | |||||||||||||||
| N19 | ≤0.16 | ≤-0.42 | >-0.197 | 0 | 12 (27.3%) | ||||||||||||||
| N20 | >-0.204 | >-0.197 | >0.13 | 0 | 11 (25.0%) | ||||||||||||||
LAD, logical analysis of data.
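This excerpt lists the patterns of the enhanced model but does not say how they are combined into a prediction. One convention often used with LAD models is a discriminant equal to the share of positive patterns a case triggers minus the share of negative patterns it triggers, with the sign giving the class. The sketch below illustrates that convention only and assumes it applies here. The attribute names `g1` to `g3`, the three toy patterns, and the function names are hypothetical.

```python
from typing import Callable, Dict, Sequence

Case = Dict[str, float]
Pattern = Callable[[Case], bool]  # a pattern is modelled as a predicate on a case


def discriminant(case: Case, positive: Sequence[Pattern], negative: Sequence[Pattern]) -> float:
    """Share of positive patterns triggered minus share of negative patterns triggered."""
    pos_share = sum(p(case) for p in positive) / len(positive)
    neg_share = sum(n(case) for n in negative) / len(negative)
    return pos_share - neg_share


def classify(case: Case, positive: Sequence[Pattern], negative: Sequence[Pattern]) -> str:
    """Classify by the sign of the discriminant; zero is left unclassified."""
    score = discriminant(case, positive, negative)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "unclassified"


# Hypothetical patterns on made-up attributes g1..g3 (not the paper's genes).
toy_positive = [
    lambda c: c["g1"] <= 0.07 and c["g2"] <= -0.01,
    lambda c: c["g1"] > -0.42 and c["g3"] <= 0.06,
]
toy_negative = [
    lambda c: c["g1"] > 0.09 and c["g3"] > -0.005,
]

case_a = {"g1": 0.00, "g2": -0.20, "g3": 0.00}  # triggers both positive patterns
case_b = {"g1": 0.20, "g2": 0.30, "g3": 0.10}   # triggers only the negative pattern
case_c = {"g1": 0.08, "g2": 0.30, "g3": 0.10}   # triggers no pattern at all

for case in (case_a, case_b, case_c):
    print(classify(case, toy_positive, toy_negative))
```

Normalising by the number of patterns on each side keeps a model with unequal pattern counts from favouring the larger side; the enhanced model has 20 patterns on each side, so this makes no difference there.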
Description of the cases in the special positive class P+++
| Group | Bound | AB033007 | NM_001661 | NM_001756 | AF148505 | Contig42421_RC | NM_003748 | NM_020974 | AL080059 | AL110129 | Contig15031_RC | Contig65439 | Contig37063_RC | Contig41383_RC | AL049689 | Contig63102_RC | Contig55574_RC | Contig38451_RC |
| P+++ | Lower bound | -0.13 | -0.123 | -0.193 | -0.362 | -0.281 | -0.372 | -1.125 | -0.066 | -0.078 | -0.077 | -0.268 | -0.193 | -0.095 | -0.242 | -0.453 | -0.369 | -0.119 |
| P+++ | Upper bound | 0.108 | 0.044 | 0.381 | 0.116 | -0.058 | 0.041 | 0.783 | 0.518 | 0.054 | 0.071 | -0.009 | 0.116 | 0.07 | 0.115 | 0.048 | 0.525 | 0.268 |
| Positive cases not in P+++ | Lower bound | -0.174 | -0.129 | -0.708 | -0.514 | -0.601 | -2 | -1.337 | -0.783 | -2 | -0.044 | -0.263 | -0.222 | -0.567 | -0.291 | -0.345 | -0.334 | -0.147 |
| Positive cases not in P+++ | Upper bound | 0.363 | 0.329 | 0.638 | 0.386 | 0.671 | 0.487 | 0.942 | 0.776 | 0.418 | 0.418 | 0.211 | 0.494 | 0.393 | 0.431 | 0.444 | 0.256 | 2 |
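The lower and upper bounds above can be read as a per-gene bounding box around a group of cases. A minimal sketch of that reading follows: it computes the box as the per-gene minimum and maximum and tests whether another case falls inside it. Treating both bounds as inclusive is an assumption. The two toy cases are invented so that the resulting box reproduces the tabulated P+++ bounds for AB033007 and Contig42421_RC; they are not actual cases from the dataset, and `bounding_box` and `inside` are hypothetical helper names.

```python
from typing import Dict, List, Tuple

Case = Dict[str, float]
Box = Dict[str, Tuple[float, float]]


def bounding_box(cases: List[Case]) -> Box:
    """Per-gene (minimum, maximum) over a non-empty list of cases."""
    genes = cases[0].keys()
    return {g: (min(c[g] for c in cases), max(c[g] for c in cases)) for g in genes}


def inside(box: Box, case: Case) -> bool:
    """True if every gene value lies within the box (bounds assumed inclusive)."""
    return all(low <= case[g] <= high for g, (low, high) in box.items())


# Invented cases restricted to two of the 17 genes, for illustration only.
toy_cases: List[Case] = [
    {"AB033007": -0.13, "Contig42421_RC": -0.058},
    {"AB033007": 0.108, "Contig42421_RC": -0.281},
]
box = bounding_box(toy_cases)
print(box)

# A made-up case whose Contig42421_RC value exceeds the P+++ upper bound.
outsider: Case = {"AB033007": 0.05, "Contig42421_RC": 0.30}
print(inside(box, toy_cases[0]), inside(box, outsider))
```

A case leaves the box as soon as one gene is out of range, which is why a single narrow interval (here Contig42421_RC, entirely below zero in P+++) can separate the special class from the remaining positive cases.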
Contrastors differentiating the positive cases in P+++ from the positive cases outside P+++
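| Group | Statistic | Diameter (mm) | Grade |
| P+++ | Average | 30.71 | 3.00 |
| P+++ | 95% CI, lower bound | 25.31 | 3.00 |
| P+++ | 95% CI, upper bound | 36.12 | 3.00 |
| Positive cases outside P+++ | Average | 22.67 | 2.81 |
| Positive cases outside P+++ | 95% CI, lower bound | 20.11 | 2.67 |
| Positive cases outside P+++ | 95% CI, upper bound | 25.22 | 2.96 |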
CI, confidence interval.
Contrastors differentiating the positive cases in P+ from the positive cases outside P+
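| Group | Statistic | PRp |
| P+ | Average | 55.56 |
| P+ | 95% CI, lower bound | 26.50 |
| P+ | 95% CI, upper bound | 84.61 |
| Positive cases outside P+ | Average | 27.60 |
| Positive cases outside P+ | 95% CI, lower bound | 12.59 |
| Positive cases outside P+ | 95% CI, upper bound | 42.61 |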
CI, confidence interval; PRp, progesterone receptor.
Description of the cases in the special negative class N---
| Group | Bound | AB033007 | NM_001661 | NM_001756 | AF148505 | Contig42421_RC | NM_003748 | NM_020974 | AL080059 | AL110129 | Contig15031_RC | Contig65439 | Contig37063_RC | Contig41383_RC | AL049689 | Contig63102_RC | Contig55574_RC | Contig38451_RC |
| N--- | Lower bound | 0.041 | -0.112 | -0.65 | 0.007 | -0.126 | 0.059 | -0.976 | 0.05 | 0.05 | 0.085 | -0.266 | 0.062 | -0.251 | 0.039 | -0.022 | -0.255 | 0.02 |
| N--- | Upper bound | 0.21 | 0.228 | 0.166 | 0.453 | 0.386 | 0.675 | -0.038 | 0.394 | 0.394 | 0.285 | 0.293 | 0.247 | 0.401 | 0.35 | 0.278 | -0.024 | 0.303 |
| Negative cases not in N--- | Lower bound | -0.144 | -0.106 | -0.734 | -0.294 | -0.407 | -1.253 | -0.844 | -0.214 | -0.214 | -0.115 | -0.307 | -0.179 | -0.291 | -0.21 | -0.395 | -0.343 | -0.206 |
| Negative cases not in N--- | Upper bound | 0.345 | 0.443 | 1.135 | 0.363 | 0.521 | 0.881 | 0.311 | 0.477 | 0.477 | 0.273 | 0.293 | 0.433 | 0.482 | 0.455 | 0.335 | 0.22 | 0.323 |
Contrastors differentiating the negative cases in N--- from the negative cases outside N---
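| Group | Statistic | Grade | ERp | Lymphocytic infiltrate |
| N--- | Average | 1.75 | 78.75 | 0.00 |
| N--- | 95% CI, lower bound | 1.43 | 61.60 | 0.00 |
| N--- | 95% CI, upper bound | 2.07 | 95.90 | 0.00 |
| Negative cases outside N--- | Average | 2.42 | 57.22 | 0.14 |
| Negative cases outside N--- | 95% CI, lower bound | 2.17 | 44.98 | 0.02 |
| Negative cases outside N--- | 95% CI, upper bound | 2.67 | 69.46 | 0.25 |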
CI, confidence interval; ERp, estrogen receptor.
Contrastors differentiating the negative cases in N- from the negative cases outside N-
| Group | Statistic | Follow-up time (years) | Grade | ERp | PRp | Lymphocytic infiltrate |
| N- | Average | 8.16 | 2.58 | 47.31 | 36.92 | 0.19 |
| N- | 95% CI, lower bound | 7.28 | 2.31 | 32.62 | 23.19 | 0.04 |
| N- | 95% CI, upper bound | 9.04 | 2.85 | 62.00 | 50.65 | 0.35 |
| Negative cases outside N- | Average | 9.48 | 1.89 | 81.11 | 56.94 | 0.00 |
| Negative cases outside N- | 95% CI, lower bound | 8.20 | 1.58 | 71.23 | 40.61 | 0.00 |
| Negative cases outside N- | 95% CI, upper bound | 10.76 | 2.20 | 90.99 | 73.28 | 0.00 |
CI, confidence interval; ERp, estrogen receptor; PRp, progesterone receptor.
Comparison of weighted accuracies of the van 't Veer Classifier and the enhanced LAD model
| Model | Training set (78 cases): direct classification (%) | Training set (78 cases): cross-validation (%) | Test set (19 cases): direct classification (%) | Entire dataset (78 + 19 cases): cross-validation (%) |
| Van 't Veer classifier [4] | 83.6 | Not reported | 88.7 | Not reported |
| Enhanced LAD model | 100 | 82.52 | 92.86 | 81.74 |
LAD, logical analysis of data.
Comparison of weighted accuracies of the LAD models constructed on three different support sets
| Support set | Training set (78 cases): direct classification (%) | Training set (78 cases): cross-validation (%) | Test set (19 cases): direct classification (%) | Entire dataset (78 + 19 cases): cross-validation (%) |
| 231 genes (van 't Veer [4]) | 100.00 | 79.48 | 84.52 | 78.35 |
| 70 genes (van 't Veer [4]) | 99.26 | 75.43 | 84.52 | 74.06 |
| Proposed support set of 17 genes | 100.00 | 82.52 | 92.86 | 81.74 |
LAD, logical analysis of data.
Weighted accuracies of various models constructed on the support set identified by LAD
| Method (support set of 17 genes, LAD) | Training set (78 cases): direct classification (%) | Training set (78 cases): cross-validation (%) | Test set (19 cases): direct classification (%) | Entire dataset (78 + 19 cases): cross-validation (%) |
| Artificial neural networks (1 hidden layer) | 100.00 | 76.55 | 84.21 | 78.65 |
| Support vector machines (linear kernel) | 87.18 | 76.43 | 63.16 | 77.27 |
| Logistic regression | 94.87 | 76.87 | 73.68 | 77.95 |
| Nearest neighbors | 100.00 | 80.55 | 63.16 | 76.34 |
| Decision trees (C4.5) | 96.15 | 67.48 | 57.90 | 67.01 |
| 95% CI | 91.03–100 | 71.33–79.82 | 59.20–77.65 | 71.25–79.64 |
CI, confidence interval; LAD, logical analysis of data.
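The excerpt does not say how the 95% CI row is obtained. The reported values appear consistent with a normal-approximation interval over the five method accuracies in each column, that is, the mean plus or minus 1.96 standard errors, with the upper bound capped at 100%. The sketch below recomputes the two training-set columns under that assumption; the function name `ci95` and the capping rule are hypothetical choices, not the authors' stated procedure.

```python
import math
import statistics
from typing import Sequence, Tuple


def ci95(values: Sequence[float], cap: float = 100.0) -> Tuple[float, float]:
    """Mean +/- 1.96 * (sample standard deviation / sqrt(n)), upper bound capped."""
    mean = statistics.mean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return mean - half_width, min(mean + half_width, cap)


# Accuracies of the five methods on the 17-gene support set, copied from the table.
training_direct = [100.00, 87.18, 94.87, 100.00, 96.15]
training_cv = [76.55, 76.43, 76.87, 80.55, 67.48]

direct_low, direct_high = ci95(training_direct)
cv_low, cv_high = ci95(training_cv)
print(f"{direct_low:.2f} to {direct_high:.2f}")
print(f"{cv_low:.2f} to {cv_high:.2f}")
```

Under these assumptions the recomputed bounds agree with the tabulated 91.03–100 and 71.33–79.82 to within about 0.01, which is the size of difference expected from rounding. With only five values, an interval based on the t distribution would be noticeably wider.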
Weighted accuracies of various models constructed on the support set of 70 genes identified by van 't Veer et al.
| Method (support set of 70 genes, van 't Veer [33]) | Training set (78 cases): direct classification (%) | Training set (78 cases): cross-validation (%) | Test set (19 cases): direct classification (%) | Entire dataset (78 + 19 cases): cross-validation (%) |
| Artificial neural networks (1 hidden layer) | 100.00 | 80.16 | 42.11 | 71.65 |
| Support vector machines (linear kernel) | 96.15 | 82.01 | 57.90 | 77.03 |
| Logistic regression | 100.00 | 73.52 | 47.37 | 73.79 |
| Nearest neighbors | 100.00 | 71.58 | 63.16 | 71.77 |
| Decision trees (C4.5) | 96.15 | 60.49 | 42.11 | 61.89 |
| 95% CI | 96.61–100 | 66.09–81.01 | 42.15–58.91 | 66.27–76.18 |
The 70-gene set was reported by van 't Veer and coworkers elsewhere [4]. CI, confidence interval; LAD, logical analysis of data.
Weighted accuracies of various models constructed on the support set of 231 genes identified by van 't Veer and coworkers
| Method (support set of 231 genes, van 't Veer [33]) | Training set (78 cases): direct classification (%) | Training set (78 cases): cross-validation (%) | Test set (19 cases): direct classification (%) | Entire dataset (78 + 19 cases): cross-validation (%) |
| Artificial neural networks (1 hidden layer) | 100.00 | 72.24 | 73.68 | 73.96 |
| Support vector machines (linear kernel) | 100.00 | 72.79 | 73.68 | 74.88 |
| Logistic regression | 100.00 | 71.21 | 73.68 | 75.63 |
| Nearest neighbors | 100.00 | 72.94 | 78.94 | 77.15 |
| Decision trees (C4.5) | 97.44 | 60.70 | 73.68 | 66.64 |
| 95% CI | 98.48–100.00 | 65.39–74.56 | 72.67–76.79 | 70.07–77.24 |
The 231-gene set was reported by van 't Veer and coworkers elsewhere [4]. CI, confidence interval; LAD, logical analysis of data.
Interval containing all the 19 cases in the test set and none of the 78 cases in the training set
| Group | Bound | AB033007 | NM_001661 | NM_001756 | AF148505 | Contig42421_RC | NM_003748 | NM_020974 | AL080059 | AL110129 | Contig15031_RC | Contig65439 | Contig37063_RC | Contig41383_RC | AL049689 | Contig63102_RC | Contig55574_RC | Contig38451_RC |
| Interval | Lower bound | -0.212 | -0.227 | -0.541 | -0.268 | -0.301 | -0.295 | -1.085 | -0.606 | -0.144 | -0.282 | -0.325 | -0.307 | -0.106 | -0.219 | -0.347 | -0.325 | -0.245 |
| Interval | Upper bound | 0.187 | 0.017 | 0.29 | 0.22 | 0.394 | 0.309 | 0.557 | 0.401 | 0.241 | 0.062 | 0.303 | 0.272 | 0.117 | 0.304 | 0.331 | 0.576 | 0.183 |