Seyed Abbas Mahmoodi, Kamal Mirzaie, Seyed Mostafa Mahmoudi.
Abstract
Cancer is the leading cause of death in economically developed countries and the second leading cause of death in developing countries. Gastric cancers are among the most devastating and incurable forms of cancer, and their treatment can be extremely complex and costly. Data mining, a technology used to extract analytically useful information, has been applied successfully to medical data. Although traditional data mining techniques such as association rules help extract knowledge from large data sets, the number of rules they produce is sometimes so large that it becomes a major problem in itself. In particular, a key disadvantage of this technique is that it produces many meaningless and redundant rules, because it ignores the concepts and meanings of the items or samples. To address these problems, this paper presents a new method that uses an ontology to discover association rules. This paper reports ontology-based data mining on a medical database containing clinical data on patients referred to Imam Reza Hospital in Tabriz. The data set consists of 490 randomly selected visitors to Imam Reza Hospital in Tabriz who were suspected of having gastric cancer. The proposed ontology-based data mining algorithm makes rules more intuitive, appealing, and understandable, eliminates redundant and useless rules, and, as a secondary result, significantly reduces the running time of the Apriori algorithm. The experimental results confirm the efficiency and advantages of this algorithm.
Keywords: Apriori; Data mining; Gastric cancer; Ontology
Year: 2016 PMID: 27066344 PMCID: PMC4786510 DOI: 10.1186/s40064-016-1943-9
Source DB: PubMed Journal: Springerplus ISSN: 2193-1801
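The abstract relies on the two standard measures used by Apriori-style association-rule mining. For reference, with D the set of transactions (here, patient records encoded as sets of items) and X, Y disjoint itemsets, they are usually defined as below; the last relation is the downward-closure property that lets Apriori discard candidates early. These are the textbook definitions, not notation taken from the paper.

```latex
\operatorname{supp}(X) = \frac{\lvert \{\, t \in D : X \subseteq t \,\} \rvert}{\lvert D \rvert},
\qquad
\operatorname{conf}(X \Rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)},
\qquad
X \subseteq Z \;\Rightarrow\; \operatorname{supp}(Z) \le \operatorname{supp}(X).
```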
Features of the S-Abbas Mahmoodi dataset
| Feature type | Feature name | Range |
|---|---|---|
| Personal characteristics and behavior | Sex | Male, female |
| | Blood group | A, B, AB, O |
| | Smoking | Yes, no |
| | Alcohol consumption | Yes, no |
| | Exposed to chemicals | Yes, no |
| | BMI | BMI > 30, 25 < BMI < 29.5, 18.5 < BMI < 24.9, BMI < 18.5 |
| | Motility | Light, medium, high |
| | Age | Age < 40, 41–60, Age > 61 |
| | Salt consumption | Not eat, high, low |
| | Consumption of vegetables | Daily, 1–3 times a week, 1–3 times a month |
| | Consumption of smoked food | Not eat, 1–3 times a week, 1–3 times a month |
| | Milk consumption | Yes, no |
| | Fast food consumption | Not eat, 1–3 times a week, 1–3 times a month |
| | Consumption of fried foods | Not eat, 1–3 times a week, 1–3 times a month |
| | Fruit consumption | Daily, 1–3 times a week, 1–3 times a month |
| | Food storage container | Aluminum, plastic, china, steel, copper |
| | Cooking dish | Aluminum, Teflon, copper |
| Systemic features and the stomach | History of allergy | Yes, no |
| | Family history of cancer | Yes, no |
| | Family history of gastric cancer | Yes, no |
| | History of cardiovascular disease | Yes, no |
| | Gastric cancer | Yes, no |
| | General status of cancer | Good, so–so, poor |
| | History of gastric reflux | Yes, no |
| | History of stomach surgery | Yes, no |
| | History of gastritis | Yes, no |
| | History of stomach infection | Yes, no |
| | Mucosa status | Normal, swollen, red, sore |
| | Cancer site | Cardia, non cardia |
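The rules reported later name items as `Feature_value` strings (for example `Cancer site_cardia` or `Milk consumption_no`), which suggests that each categorical value in the table above becomes one item. A minimal sketch of that encoding, assuming this naming scheme; `FEATURE_VALUES` is a hypothetical subset of the table and `record_to_items` is an illustrative helper, not code from the paper:

```python
# Hypothetical subset of the feature table: feature name -> allowed values.
FEATURE_VALUES = {
    "Sex": ["male", "female"],
    "Smoking": ["yes", "no"],
    "Milk consumption": ["yes", "no"],
    "Cancer site": ["cardia", "non cardia"],
}


def record_to_items(record):
    """Encode one patient record as a set of 'Feature_value' items.

    Raises ValueError if a value is not in the feature's listed range.
    """
    items = set()
    for feature, value in record.items():
        if value not in FEATURE_VALUES[feature]:
            raise ValueError(f"{value!r} is not a valid value for {feature!r}")
        items.add(f"{feature}_{value}")
    return items


patient = {"Sex": "male", "Smoking": "no", "Milk consumption": "no", "Cancer site": "cardia"}
print(sorted(record_to_items(patient)))
# ['Cancer site_cardia', 'Milk consumption_no', 'Sex_male', 'Smoking_no']
```

Because each value becomes its own item, a feature with k values contributes k mutually exclusive items, so a single transaction never contains two items from the same feature.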
Fig. 1 Ontology design process
Fig. 2 Disease ontology
Fig. 3 Cancer ontology
Fig. 4 Gastric cancer ontology
Fig. 5 Ontology of the risk factors of gastric cancer
Fig. 6 Concepts and relationships of the ontologies
Fig. 7 Workflow of the proposed algorithm
Fig. 8 Algorithm process
Fig. 9 Itemset, item structure
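The exact ontology-based pruning criterion is not spelled out in this section. As a hedged illustration of how concept labels from an ontology can constrain which rules survive, the sketch below uses a swapped-in technique, template (constraint) filtering: each feature is mapped to a concept, and a rule is kept only if its antecedent and consequent items fall in chosen concepts. The `CONCEPT` mapping, its labels, and `keep_rule` are hypothetical; they are not the paper's ontologies or algorithm.

```python
# Hypothetical mini-ontology: feature name -> concept label.
# Illustrative only; these are not the concepts of the paper's ontologies.
CONCEPT = {
    "Smoking": "risk factor",
    "Salt consumption": "risk factor",
    "Milk consumption": "risk factor",
    "History of gastric reflux": "medical history",
    "Gastric cancer": "diagnosis",
}


def concept_of(item):
    """Return the concept of a 'Feature_value' item, or None if unmapped."""
    feature, _, _value = item.rpartition("_")
    return CONCEPT.get(feature)


def keep_rule(antecedent, consequent, lhs_concepts, rhs_concepts):
    """Keep a rule only if every item on each side belongs to an allowed concept."""
    return (all(concept_of(i) in lhs_concepts for i in antecedent)
            and all(concept_of(i) in rhs_concepts for i in consequent))


# Toy rules: the first mirrors a rule reported below; the others are made up.
rules = [
    ({"Salt consumption_high"}, {"History of gastric reflux_yes"}),
    ({"History of gastric reflux_yes"}, {"Salt consumption_high"}),
    ({"Smoking_yes", "Milk consumption_no"}, {"Gastric cancer_yes"}),
]
kept = [r for r in rules
        if keep_rule(r[0], r[1], {"risk factor"}, {"medical history", "diagnosis"})]
print(len(kept))  # 2
```

The reversed second rule is dropped because its antecedent is a medical-history item rather than a risk factor; unmapped features map to `None` and therefore never pass the filter.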
The dataset used in this example
| S. no. | Gastric cancer | History of stomach surgery | Milk consumption | Smoking | Family history of gastric cancer |
|---|---|---|---|---|---|
| 1 | ✓ | ✓ | ✗ | ✓ | ✓ |
| 2 | ✓ | ✓ | ✗ | ✓ | ✓ |
| 3 | ✓ | ✗ | ✗ | ✓ | ✓ |
| 4 | ✓ | ✓ | ✓ | ✗ | ✗ |
| 5 | ✓ | ✓ | ✓ | ✓ | ✓ |
| 6 | ✓ | ✓ | ✗ | ✓ | ✓ |
| 7 | ✗ | ✗ | ✓ | ✗ | ✓ |
| 8 | ✗ | ✗ | ✓ | ✗ | ✗ |
| 9 | ✗ | ✗ | ✓ | ✗ | ✗ |
| 10 | ✗ | ✗ | ✗ | ✓ | ✗ |
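A compact Apriori sketch run on the 10-record example above makes the level-wise search concrete. It assumes a ✓ means the item is present and ignores the ✗ values (the paper's items may also include `_no` values), and the short item names are hypothetical labels for the five columns. This is plain Apriori, not the ontology-based variant; with a minimum support of 0.5 it finds four 2-large itemsets and one frequent 3-itemset.

```python
from itertools import combinations

# Rows of the example table; a ✓ is encoded as the item being present.
# Item names are hypothetical short labels for the five columns.
TRANSACTIONS = [
    {"cancer", "surgery", "smoking", "family_gc"},           # 1
    {"cancer", "surgery", "smoking", "family_gc"},           # 2
    {"cancer", "smoking", "family_gc"},                      # 3
    {"cancer", "surgery", "milk"},                           # 4
    {"cancer", "surgery", "milk", "smoking", "family_gc"},   # 5
    {"cancer", "surgery", "smoking", "family_gc"},           # 6
    {"milk", "family_gc"},                                   # 7
    {"milk"},                                                # 8
    {"milk"},                                                # 9
    {"smoking"},                                             # 10
]


def apriori(transactions, min_support):
    """Return {frozenset itemset: support} for all frequent itemsets."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
    frequent = {s: support(s) for s in level}
    k = 2
    while level:
        previous = set(level)
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in previous for s in combinations(c, k - 1))}
        level = [c for c in candidates if support(c) >= min_support]
        frequent.update((c, support(c)) for c in level)
        k += 1
    return frequent


frequent = apriori(TRANSACTIONS, min_support=0.5)
for itemset in sorted((s for s in frequent if len(s) == 2), key=sorted):
    print(sorted(itemset), frequent[itemset])
```

The prune step is where downward closure pays off: for example, {cancer, surgery, smoking} is never counted because its subset {surgery, smoking} appears in only 4 of the 10 records.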
2-large itemset
Selective itemset
Candidate itemsets
The remaining candidate itemsets
Set of production rules
The final set of association rules
The best extracted rules
| Rule antecedent | Consequent | Support |
|---|---|---|
| Cancer site_cardia, history of gastric reflux_yes, history of stomach infection_no | Patient | 0.20612 |
| Cancer site_cardia, history of cardiovascular disease_no, family history of cancer_no | Patient | 0.21020 |
| Cancer site_non cardia, history of gastric reflux_yes, history of stomach infection_no | Patient | 0.17755 |
| Cancer site_non cardia, history of cardiovascular disease_no, family history of cancer_no | Patient | 0.18775 |
| Cancer site_cardia, history of cardiovascular disease_no, family history of cancer_no, family history of gastric cancer_no | Patient | 0.17755 |
| Cancer site_non cardia, history of cardiovascular disease_no, family history of cancer_no, family history of gastric cancer_no | Patient | 0.18775 |
| Cancer site_cardia, History of gastric reflux_yes | Patient | 0.2224 |
| Cancer site_cardia, history of cardiovascular disease_no | Patient | 0.2020 |
| Cancer site_cardia, history of stomach infection_no | Patient | 0.20612 |
| Cancer site_non cardia, history of gastric reflux_yes | Patient | 0.2244 |
| Cancer site_non cardia, history of cardiovascular disease_no | Patient | 0.2142 |
| Cancer site_non cardia, history of stomach infection_no | Patient | 0.21224 |
| 30 > BMI | History of gastric reflux_yes | 0.17755 |
| Salt consumption_not eat | History of gastric reflux_yes | 0.17755 |
| Salt consumption_high | History of gastric reflux_yes | 0.20612 |
| Milk consumption_no | History of gastric reflux_yes | 0.2224 |
| Sex_male, mucosa status_normal, exposed to chemicals_no, history of gastritis_no, alcohol consumption_no, 25 < BMI < 29.5 | History of gastric reflux_yes | 0.17346 |
| Mucosa status_normal, alcohol consumption_no, motility_medium, age >61, history of stomach infection_no | History of gastric reflux_yes | 0.18367 |
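Assuming support is computed over all 490 records, each support value in the table corresponds to a whole number of patients, which can make the values easier to interpret. The hypothetical helper `implied_count` below recovers that number by rounding support × 490; these counts are derived here, not reported in the paper.

```python
TOTAL_RECORDS = 490  # size of the data set described in the abstract


def implied_count(support, total=TOTAL_RECORDS):
    """Number of records that a reported support value corresponds to."""
    return round(support * total)


for support in (0.20612, 0.21020, 0.17755, 0.18775, 0.17346, 0.18367):
    count = implied_count(support)
    print(f"support {support} -> about {count} of {TOTAL_RECORDS} records")
```

For example, a support of 0.20612 corresponds to 101 of the 490 records.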
Run time comparison
| Number of records | 100 | 150 | 200 | 250 | 300 | 350 | 400 | 490 |
|---|---|---|---|---|---|---|---|---|
| Apriori | 206.1 | 168 | 416.62 | 823.58 | 715.4 | 1150.8 | 1829.55 | 2089.46 |
| Proposed algorithm | 16.26 | 118.99 | 31.7 | 44.5 | 50.5 | 56.2 | 74.58 | 125.04 |
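The run-time table does not state its time unit, but the ratio between the two rows is unit-free. A short sketch computing the speed-up of the proposed algorithm over Apriori at each data-set size, using the values exactly as tabulated:

```python
records = [100, 150, 200, 250, 300, 350, 400, 490]
apriori = [206.1, 168, 416.62, 823.58, 715.4, 1150.8, 1829.55, 2089.46]
proposed = [16.26, 118.99, 31.7, 44.5, 50.5, 56.2, 74.58, 125.04]

# Speed-up = Apriori run time divided by the proposed algorithm's run time.
speedup = {n: a / p for n, a, p in zip(records, apriori, proposed)}
for n, s in speedup.items():
    print(f"{n:>3} records: {s:.1f}x faster")
```

At the full 490 records the ratio is roughly 16.7; across the table it ranges from about 1.4 (150 records) to about 24.5 (400 records).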
Fig. 10 Run time
The number of rules generated
| Rules generated (proposed algorithm) | Rules generated (Apriori) | Minimum support | Minimum confidence |
|---|---|---|---|
| 1 | 2 | 0.9 | 1 |
| 2 | 33 | 0.8 | 1 |
| 30 | 64 | 0.9 | 0.9 |
| 315 | 1422 | 0.8 | 0.9 |
| 648 | 8342 | 0.7 | 0.9 |
Effect of proposed algorithm on the number of extracted rules
| Rules removed (% of Apriori rules) | Number of rules eliminated | Rules generated (proposed algorithm) | Rules generated (Apriori) | Minimum support | Minimum confidence |
|---|---|---|---|---|---|
| 77.8 | 1107 | 315 | 1422 | 0.8 | 0.9 |
| 92.2 | 7694 | 648 | 8342 | 0.7 | 0.9 |
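The two derived columns follow from the rule counts: the number eliminated is the Apriori count minus the proposed algorithm's count, and the percentage is that difference divided by the Apriori count. A quick check that reproduces the tabulated values:

```python
def reduction(apriori_rules, proposed_rules):
    """Return (rules eliminated, percent of Apriori's rules eliminated)."""
    eliminated = apriori_rules - proposed_rules
    return eliminated, 100 * eliminated / apriori_rules


for apriori_rules, proposed_rules in [(1422, 315), (8342, 648)]:
    eliminated, percent = reduction(apriori_rules, proposed_rules)
    print(f"{eliminated} rules removed ({percent:.1f}%)")
# 1107 rules removed (77.8%)
# 7694 rules removed (92.2%)
```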
Statistics obtained from the proposed algorithm
| Number of rules generated | Number of large (frequent) itemsets | Number of 2-large itemsets | Minimum support |
|---|---|---|---|
| 2635 | 527 | 92 | 0.5 |
| 1699 | 341 | 62 | 0.55 |
| 1486 | 260 | 46 | 0.6 |
| 1129 | 213 | 44 | 0.65 |
| 648 | 144 | 37 | 0.7 |
| 434 | 90 | 25 | 0.75 |
| 315 | 68 | 19 | 0.8 |
| 130 | 43 | 17 | 0.85 |
| 30 | 19 | 9 | 0.9 |