Abstract
BACKGROUND: Analysing gene expression data from microarray technologies is an important task in biology and medicine, particularly in cancer diagnosis. Unlike most other popular methods for high-dimensional biomedical data analysis, such as the analysis of microarray gene expression or proteomics mass spectroscopy data, fuzzy rule-based models can not only provide good classification results but can also be explained and interpreted in human-understandable terms through their fuzzy rules. However, the advantages offered by fuzzy-based techniques in microarray data analysis have not yet been fully explored in the literature. Although some recently developed fuzzy-based modeling approaches provide satisfactory classification results, the rule bases generated by most of the reported fuzzy models for gene expression data are still too large to be easily comprehensible.
Year: 2011 PMID: 21989191 PMCID: PMC3194236 DOI: 10.1186/1471-2164-12-S2-S5
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
A typical gene expression matrix X, where rows represent samples obtained under different experimental conditions and columns represent genes.
| Sample | Gene 1 | Gene 2 | … | Gene n-1 | Gene n | Class |
|---|---|---|---|---|---|---|
| 1 | 165.1 | 276.4 | … | 636.6 | 784.9 | 1 |
| 2 | 653.6 | 1735.1 | … | 524.1 | 104.5 | -1 |
| … | … | … | … | … | … | … |
| m-1 | 675.0 | 45.1 | … | 841.9 | 782.8 | -1 |
| m | 78.2 | 893.8 | … | 467.9 | 330.1 | 1 |
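As an illustration of this layout, the visible cells of the table can be held as a list of sample rows with a parallel list of class labels. This is only a sketch: the names `X`, `y`, and `min_max_scale` are hypothetical, and the per-gene min-max scaling is an assumption, not a step stated here. It is included because the membership functions in Figure 1 are defined on the interval [0, 1].

```python
# Visible cells of the example matrix: rows are samples, columns are genes.
# Only the four genes and four samples printed in the table are included;
# the elided rows and columns are not filled in.
X = [
    [165.1, 276.4, 636.6, 784.9],   # sample 1
    [653.6, 1735.1, 524.1, 104.5],  # sample 2
    [675.0, 45.1, 841.9, 782.8],    # sample m-1
    [78.2, 893.8, 467.9, 330.1],    # sample m
]
y = [1, -1, -1, 1]  # class label of each sample


def min_max_scale(rows):
    """Rescale every column (gene) to [0, 1]. Assumed preprocessing step."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return scaled


X_scaled = min_max_scale(X)
```

After scaling, each gene's smallest observed value maps to 0 and its largest to 1, so every entry can be passed directly to a membership function defined on [0, 1].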
Figure 1. Membership functions from multiple fuzzy partitions. Fifteen membership functions from four fuzzy partitions of the domain interval [0, 1]. S, MS, M, ML and L denote Small, Medium Small (relatively small), Medium, Medium Large (relatively large) and Large, respectively. DC denotes the “Don’t Care” membership function.
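One reading that yields the count of 15 is four partitions with 2, 3, 4 and 5 fuzzy sets each (14 in total) plus the "Don't Care" function. The sketch below builds that set under two assumptions that the caption does not confirm: the fuzzy sets are evenly spaced, and they are triangular. The function names and the `(K, k)` keys are hypothetical.

```python
def triangular_membership(x, k, K):
    """Membership of x in the k-th (1-based) of K evenly spaced
    triangular fuzzy sets on [0, 1]. Triangular shape is an assumption."""
    centre = (k - 1) / (K - 1)
    half_width = 1 / (K - 1)
    return max(0.0, 1.0 - abs(x - centre) / half_width)


def dont_care(x):
    """'Don't Care' membership: every value is fully compatible."""
    return 1.0


def build_membership_functions():
    """Return 14 partition functions keyed by (K, k), plus 'DC'."""
    functions = {}
    for K in (2, 3, 4, 5):          # assumed partition sizes
        for k in range(1, K + 1):
            # Bind k and K as defaults so each lambda keeps its own values.
            functions[(K, k)] = lambda x, k=k, K=K: triangular_membership(x, k, K)
    functions["DC"] = dont_care
    return functions


membership_functions = build_membership_functions()
```

Under these assumptions, `membership_functions[(3, 2)]` plays the role of "Medium" in the three-set partition, and the memberships within any one partition sum to 1 at every point of [0, 1].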
Classification accuracy and interpretability of models on the lung cancer data set.
| | | | Number of Rules | Average Rule Length | Testing Accuracy (%) |
|---|---|---|---|---|---|
| 0.1 | 0.7 | 0.2 | 2 | 1.5 | 89.26 |
| 0.5 | 0.1 | 0.4 | 6 | 1.8 | 90.06 |
| 0.5 | 0.4 | 0.1 | 3 | 2 | 90.06 |
| 0.5 | 0.2 | 0.2 | 3 | 2 | 89.93 |
| 0.7 | 0.1 | 0.2 | 3 | 2 | 91.28 |
| 1 | 0 | 0 | 23 | 2 | 90.06 |
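The three leading values in each row of the table above are not named by the header. All but one row sum to 1, and the row 1, 0, 0 gives by far the largest rule base, which suggests they are weights in a weighted-sum fitness whose first weight applies to accuracy. The sketch below assumes such a form. Both the formula and the assignment of the second and third weights to rule count and total rule length are assumptions, and the candidate rule bases are hypothetical, not results from the table.

```python
def fitness(n_correct, n_rules, total_length, w_acc, w_rules, w_len):
    """Assumed weighted-sum fitness: reward correctly classified samples,
    penalise the number of rules and the summed rule length."""
    return w_acc * n_correct - w_rules * n_rules - w_len * total_length


# Hypothetical candidate rule bases, for illustration only.
compact = {"n_correct": 135, "n_rules": 3, "total_length": 6}
large = {"n_correct": 136, "n_rules": 23, "total_length": 46}


def better(weights):
    """Name of the candidate with the higher assumed fitness."""
    f_compact = fitness(**compact, w_acc=weights[0], w_rules=weights[1], w_len=weights[2])
    f_large = fitness(**large, w_acc=weights[0], w_rules=weights[1], w_len=weights[2])
    return "compact" if f_compact > f_large else "large"


accuracy_only = better((1.0, 0.0, 0.0))
with_penalties = better((0.5, 0.4, 0.1))
```

With all weight on accuracy, the slightly more accurate but much larger rule base is preferred; once the complexity terms carry weight, the compact rule base is preferred. That is the qualitative trade-off the table rows show.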
Classification accuracy and interpretability of models on the ovarian cancer data set.
| | | | Number of Rules | Average Rule Length | Testing Accuracy (%) |
|---|---|---|---|---|---|
| 0.7 | 0.2 | 0.1 | 36 | 2.3 | 86.71 |
| 0.5 | 0.2 | 0.3 | 16 | 2 | 78.03 |
| 0.3 | 0.4 | 0.3 | 8 | 2 | 63.75 |
Figure 2. The rule extraction process for the lung cancer data set. Top left: the fitness value of the best rule base found in the population. Top right: the testing accuracy given by the best rule base. Bottom left: the total number of fuzzy rules in the rule base. Bottom right: the sum of the lengths of all rules in the rule base.
Figure 3. The rule extraction process for the ovarian cancer data set. Top left: the fitness value of the best rule base found in the population. Top right: the testing accuracy given by the best rule base. Bottom left: the total number of fuzzy rules in the rule base. Bottom right: the sum of the lengths of all rules in the rule base.
The selected rule subset for the lung cancer data at a testing accuracy of 0.8993; “-” denotes a “don’t care” condition.
| 40256.at | 1018.at | 35792.at | 33357.at | CF | Class | |
|---|---|---|---|---|---|---|
| Rule 1 | - | - | 0.9999 | 1 | ||
| Rule 2 | - | - | 0.9829 | -1 | ||
| Rule 3 | - | - | 0.9725 | -1 | ||
The selected rule subset for the ovarian cancer data at a testing accuracy of 0.6375; “-” denotes a “don’t care” condition.
| MZ820.8 | MZ6880.2 | MZ1730.9 | MZ1866.7 | MZ18871.5 | MZ827.3 | Class | |
|---|---|---|---|---|---|---|---|
| Rule 1 | - | - | - | - | 1 | ||
| Rule 2 | - | - | - | - | 1 (0.9995) | ||
| Rule 3 | - | - | - | - | 1 (0.9994) | ||
| Rule 4 | - | - | - | - | -1 (0.9999) | ||
| Rule 5 | - | - | - | - | -1 (0.9999) | ||
| Rule 6 | - | - | - | - | -1 (0.9997) | ||
| Rule 7 | - | - | - | - | -1 (0.9996) | ||
| Rule 8 | - | - | - | - | -1 (0.9994) | ||
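To show how rules with "don't care" conditions, a certainty factor (CF) and a class label could be applied to a sample, the sketch below uses a single-winner scheme: each rule's compatibility is the product of its antecedent memberships, and the sample takes the class of the rule with the largest compatibility times CF. The scheme, the triangular membership functions and the two rules are all assumptions for illustration; the antecedent labels of the rules in the tables above are not recoverable here and are not reproduced.

```python
def triangular(x, k, K):
    """k-th (1-based) of K evenly spaced triangular sets on [0, 1] (assumed shape)."""
    centre = (k - 1) / (K - 1)
    return max(0.0, 1.0 - abs(x - centre) * (K - 1))


def compatibility(rule, sample):
    """Product of antecedent memberships; None marks a don't-care (membership 1)."""
    degree = 1.0
    for antecedent, value in zip(rule["antecedents"], sample):
        if antecedent is not None:
            k, K = antecedent
            degree *= triangular(value, k, K)
    return degree


def classify(rules, sample):
    """Single-winner: class of the rule maximising compatibility * CF.
    Returns None if no rule fires."""
    best_score, best_class = 0.0, None
    for rule in rules:
        score = compatibility(rule, sample) * rule["cf"]
        if score > best_score:
            best_score, best_class = score, rule["cls"]
    return best_class


def average_rule_length(rules):
    """Mean number of antecedents per rule that are not don't-care."""
    lengths = [sum(a is not None for a in r["antecedents"]) for r in rules]
    return sum(lengths) / len(lengths)


# Hypothetical rules over four features scaled to [0, 1].
rules = [
    {"antecedents": [(1, 2), None, (3, 3), None], "cf": 0.9, "cls": 1},
    {"antecedents": [None, (2, 2), None, (1, 3)], "cf": 0.8, "cls": -1},
]

label_a = classify(rules, [0.1, 0.2, 0.9, 0.5])
label_b = classify(rules, [0.9, 0.9, 0.1, 0.1])
```

Because a don't-care antecedent contributes a factor of 1, it drops out of the product, which is why short rules involve only a few features and are easier to read.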