Rosalba Giugno, Alfredo Pulvirenti, Luciano Cascione, Giuseppe Pigola, Alfredo Ferro.
Abstract
We present a new classification method for expression profiling data, called MIDClass (Microarray Interval Discriminant CLASSifier), based on association rules. It classifies expression profiles by exploiting the idea that transcript expression intervals better discriminate subtypes in the same class. A wide experimental analysis shows the effectiveness of MIDClass compared to the most prominent classification approaches.
Year: 2013 PMID: 23936357 PMCID: PMC3735555 DOI: 10.1371/journal.pone.0069873
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. MIDClass flowchart.
Figure 2. Example of MIDClass flowchart on the Breast Cancer 2 dataset (data are partially shown).
Let $M_{ij}$ denote the expression value of sample $i$ on the $j$-th gene (an example entry of $M$ is shown as a black box). Samples are divided into classes corresponding to disease phenotypes. After the discretization process, MIDClass constructs a matrix $M'$ from $M$ by replacing each $M_{ij}$ with the unique interval containing it; $M'_{ij}$ denotes an entry of $M'$ (an example entry of $M'$ is shown as a black box). Then, for each class, MIDClass computes the sets of intervals $M'_{ij}$ that are frequent and have maximal size. MIDClass filters out gene expression intervals whose size is below a given threshold. Because association rules express interesting relationships between gene expressions and class labels, MIDClass uses them for classification and extracts a set of rules per class. Each rule has quantitative attributes in the antecedent (i.e. discretized values) and one categorical attribute in the consequent (i.e. the class). Finally, it returns only the rules that have a maximal score. The score takes into account the number of items of each sample that are contained in the rule, together with the cardinality of the rule (the computation of the score is described in detail in the Methods section).
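The first two steps of the caption (discretization into intervals, then per-class maximal frequent sets of intervals) can be illustrated with a minimal sketch. It is not the authors' implementation: equal-width binning is used purely as a simple stand-in for the discretizers named in the source, the maximal-itemset search is brute force and only suitable for toy inputs, and the function names, the toy matrix and the `min_support` parameter are all hypothetical.

```python
from itertools import combinations

def equal_width_intervals(values, n_bins):
    """Return n_bins (low, high) intervals covering the range of values."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [(lo + k * width, lo + (k + 1) * width) for k in range(n_bins)]

def interval_of(value, intervals):
    """Return the interval containing value (the last one is closed on the right)."""
    for low, high in intervals[:-1]:
        if low <= value < high:
            return (low, high)
    return intervals[-1]

def discretize(matrix, n_bins):
    """Replace each expression value with an item (gene index, interval)."""
    n_genes = len(matrix[0])
    per_gene = [equal_width_intervals([row[j] for row in matrix], n_bins)
                for j in range(n_genes)]
    return [[(j, interval_of(row[j], per_gene[j])) for j in range(n_genes)]
            for row in matrix]

def maximal_frequent_itemsets(rows, min_support):
    """Brute-force maximal frequent itemsets over the discretized rows of one class."""
    items = sorted({item for row in rows for item in row})
    found = []
    # Largest candidates first, so any frequent subset of a found set is skipped.
    for size in range(len(items), 0, -1):
        for cand in combinations(items, size):
            cand_set = frozenset(cand)
            if any(cand_set <= f for f in found):
                continue  # contained in a larger frequent set: not maximal
            support = sum(cand_set <= set(row) for row in rows) / len(rows)
            if support >= min_support:
                found.append(cand_set)
    return found

# Toy expression matrix (rows: samples, columns: genes); values are made up.
matrix = [[0.0, 10.0], [1.0, 10.0], [4.0, 20.0], [3.0, 20.0]]
labels = ["A", "A", "B", "B"]

discretized = discretize(matrix, n_bins=2)
class_a_rows = [row for row, y in zip(discretized, labels) if y == "A"]
mfi_a = maximal_frequent_itemsets(class_a_rows, min_support=1.0)
print(mfi_a)
```

In this toy case both class A samples fall into the same pair of intervals, so the only maximal frequent itemset for class A contains both items.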
Dataset description.
| Dataset | Description |
| Brain Cancer | 60 samples, 46 patients with classic and 14 patients with desmoplastic brain cancer |
| Breast Cancer 1 | 99 samples, patients that did (n = 45) and did not relapse (n = 54) |
| Breast Cancer 2 | 60 samples, disease-free (n = 32) or cancer recurred (n = 38) |
| Gastric Tumor | 132 samples, 103 tumor samples and 20 normal controls |
| Lymphoma | 58 samples, patients who were (n = 32) and were not cured (n = 26) |
| Lung Cancer 1 | 41 samples, squamous cell lung carcinoma (21) or pulmonary carcinoid (20) |
| Lung Cancer 2 | 181 samples, 31 mesothelioma samples and 150 adenocarcinoma |
| Melanoma | 70 samples, 45 cases of malignant melanoma patients and 25 of non-malignant patients |
| Myeloma | 173 samples, 137 patients with bone lytic lesions, 36 patients without |
| Pancreatic Cancer | 49 samples, 24 ductal carcinoma samples and 24 normal controls |
| Prostate Cancer | 102 samples, 50 non-tumor prostate and 52 prostate tumors |
Number of genes used by classifiers in each tested dataset.
| Dataset | MIDClass | SGC-t | SGC-W | DLDA | k-NN | SVM | RF |
| Melanoma | 55 | 1 | 1 | 7200 | 7200 | 7200 | 7200 |
| Breast Cancer 1 | 8 | 1 | 1 | 17 | 17 | 17 | 15 |
| Brain Cancer | 239 | 1 | 1 | 14 | 14 | 14 | 14 |
| Breast Cancer 2 | 16 | 1 | 1 | 176 | 176 | 176 | 176 |
| Gastric Tumor | 23 | 1 | 1 | 848 | 848 | 848 | 848 |
| Lung Cancer 1 | 101 | 1 | 1 | 7472 | 7472 | 7472 | 7472 |
| Lung Cancer 2 | 55 | 1 | 1 | 3207 | 3207 | 3207 | 3207 |
| Lymphoma | 3 | 1 | 1 | 2 | 2 | 2 | 2 |
| Myeloma | 27 | 1 | 1 | 169 | 169 | 169 | 169 |
| Pancreatic Cancer | 22 | 1 | 1 | 56 | 56 | 56 | 44 |
| Prostate Cancer | 45 | 1 | 1 | 798 | 798 | 798 | 798 |
Figure 3. Running time of MIDClass to (a) build the model and establish its reliability using LOOCV and (b) create the model and classify a new instance.
Comparisons of MIDClass, single gene classifiers and standard classifiers.
| Dataset | MIDClass | SGC-t | SGC-W | DLDA | k-NN | SVM | RF |
| Melanoma | | 97 | 96 | 97 | 97 | 97 | 97 |
| Breast Cancer 1 | | 63 | 69 | 61 | 53 | 52 | 43 |
| Brain Cancer | | 80 | 77 | 65 | 73 | 60 | 70 |
| Breast Cancer 2 | | 58 | 50 | 73 | 67 | 73 | 67 |
| Gastric Tumor | 94 (ID3, 0.05, 2) | 89 | 80 | 81 | 96 | | 95 |
| Lung Cancer 1 | | | 95 | 95 | | | |
| Lung Cancer 2 | | 93 | 93 | | | | |
| Lymphoma | 69 (ID3, 0.1, 2) | | 71 | 66 | 52 | 59 | 57 |
| Myeloma | | 68 | 67 | 75 | 78 | 74 | 79 |
| Pancreatic Cancer | 78 (ID3, 0.05, 1) | 69 | | 63 | 61 | 65 | 55 |
| Prostate Cancer | 92 (EWIB, 0.01, 2) | 89 | 89 | 78 | | | |
We report the average accuracy of all tested classifiers on the selected datasets, obtained with standard LOOCV. The performances of the compared algorithms have been retrieved from [17]. For MIDClass, we report in brackets the discretization algorithm, the MFI threshold and the function used (1 or 2).
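The leave-one-out cross-validation (LOOCV) protocol behind these accuracies can be sketched as follows. This is a generic illustration, not the evaluation code used in the study: the nearest-centroid classifier is only a stand-in so the example runs, and the function names and toy data are hypothetical.

```python
def loocv_accuracy(samples, labels, fit, predict):
    """Hold out each sample once, train on the rest, and return the hit rate."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)

def fit_centroids(xs, ys):
    """Stand-in classifier: compute one mean vector per class."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict_centroid(model, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda y: sum((a - b) ** 2 for a, b in zip(model[y], x)))

# Toy data with two well-separated classes; values are made up.
samples = [[0.0], [1.0], [0.5], [10.0], [11.0], [10.5]]
labels = ["A", "A", "A", "B", "B", "B"]

accuracy = loocv_accuracy(samples, labels, fit_centroids, predict_centroid)
print(accuracy)
```

Any classifier exposing a `fit`/`predict` pair of this shape could be plugged into `loocv_accuracy`; the model is rebuilt from scratch for every held-out sample.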
Figure 4. MIDClass ROC curves.
MIDClass classification rules in the Breast Cancer 2 dataset.
| Rule | Genes | Class |
| Rule1 | IL17BR[0.79,0.98], DOK2[2.29,2.44], HOXB13[−0.68,−0.09], CHDH[1.58,1.89], SCYA4[7.64,8.13], GUCY2D[4.19,2.14E7], ABCC11[5.68,6.56], IL1R2[1.49,2.14E7], APS[0.18,2.14E7] | NonRecurrence |
| Rule2 | ABCC11[2.84,3.19], IL17BR[0.0,2.14E7], CHDH[0.94,1.2], GUCY2D[3.53,3.8], SCYA4[7.64,8.13], APS[0.18,2.14E7] | NonRecurrence |
| Rule3 | DOK2[2.23,2.25], APS[−0.46,−0.38], IL1R2[1.09,1.38], IL17BR[0.0,−2.29], SCYA4[8.16,2.14E7], ABCC11[5.68,6.56], HOXB13[1.1,2.14E7] | NonRecurrence |
| Rule4 | IL17BR[−0.43,−0.34], CHDH[0.0,2.14E7], SCYA4[6.91,7.06], APS[−0.74,−0.64], GUCY2D[4.19,2.14E7], HOXB13[1.1,2.14E7] | NonRecurrence |
| Rule5 | GUCY2D[0.56,0.7], APS[−1.34,−1.15], HOXB13[−2.2,−2.09], DOK2[2.0,2.11], ABCC11[4.96,5.25], SCYA4[6.91,7.06], CHDH[1.58,1.89] | NonRecurrence |
| Rule6 | HOXB13[−0.09,0.21], ABCC11[3.61,3.97], APS[0.0,2.14E7], IL17BR[0.12,0.79] | Recurrence |
| Rule7 | GUCY2D[2.75,2.84], HOXB13[0.56,0.85], ABCC11[3.44,3.51], IL17BR[−1.03,−0.76], CHDH[1.2,1.35], APS[0.0,2.14E7], DOK2[1.21,1.45] | Recurrence |
| Rule8 | IL17BR[1.18,1.24], ABCC11[0.0,2.14E7], APS[0.0,2.14E7], GUCY2D[3.07,3.25], DOK2[0.0,1.2], CHDH[1.89,2.15], HOXB13[−2.77,−2.58], IL1R2[0.0,−0.37] | Recurrence |
| Rule9 | GUCY2D[2.0,2.41], IL17BR[1.46,2.14E7], APS[−0.53,−0.46], CHDH[2.36,2.14E7], ABCC11[0.57,2.84] | Recurrence |
| Rule10 | SCYA4[0.0,5.99], DOK2[0.0,1.2], IL17BR[0.12,0.79], IL1R2[0.0,−0.37], APS[−1.15,−0.74] | Recurrence |
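To show how rules of this form could be applied to a new sample, the sketch below parses two rules from the table and picks the class of the rule that the sample matches best. Several things here are assumptions rather than the authors' method: the intervals are treated as closed on both ends, the score is simply the fraction of a rule's items satisfied by the sample (the actual score is defined in the Methods section, which is not reproduced here), and the sample expression values are invented for illustration.

```python
import re

# Matches items such as "HOXB13[-0.09,0.21]" or "ABCC11 [3.61,3.97]".
ITEM = re.compile(r"(\w+)\s*\[\s*([^,\]]+),\s*([^\]]+)\]")

def parse_rule(text):
    """Turn 'GENE[low,high], ...' into {gene: (low, high)}."""
    text = text.replace("\u2212", "-")  # the table uses the Unicode minus sign
    return {gene: (float(low), float(high)) for gene, low, high in ITEM.findall(text)}

def match_fraction(rule_items, sample):
    """Hypothetical score: share of rule items whose interval contains the sample value."""
    hits = sum(1 for gene, (low, high) in rule_items.items()
               if gene in sample and low <= sample[gene] <= high)
    return hits / len(rule_items)

def classify(rules, sample):
    """Return the class of the best-matching rule (ties go to the first rule)."""
    best_items, best_class = max(rules, key=lambda r: match_fraction(r[0], sample))
    return best_class

# Rule2 and Rule6, copied from the table.
rules = [
    (parse_rule("ABCC11[2.84,3.19], IL17BR[0.0,2.14E7], CHDH[0.94,1.2], "
                "GUCY2D[3.53,3.8], SCYA4[7.64,8.13], APS[0.18,2.14E7]"), "NonRecurrence"),
    (parse_rule("HOXB13[\u22120.09,0.21], ABCC11[3.61,3.97], APS[0.0,2.14E7], "
                "IL17BR[0.12,0.79]"), "Recurrence"),
]

# Invented expression values for one sample.
sample = {"HOXB13": 0.1, "ABCC11": 3.7, "APS": 0.5, "IL17BR": 0.5}

predicted = classify(rules, sample)
print(predicted)
```

With these invented values the sample satisfies all four items of Rule6 but only two of the six items of Rule2, so the sketch assigns it to the Recurrence class.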