Rasha Elnemr, Mohammed M Nasef, Passant Elkafrawy, Mahmoud Rafea, Amani Tariq Jamal.
Abstract
Malignant tumors develop over several years due to unknown biological factors. These factors induce changes in the body that eventually lead to malignant tumors, and some habits and behaviors initiate them. As a result, the immune system cannot recognize a malignant tumor as foreign tissue. Discovering interesting patterns among these habits, behaviors, and diseases, and making effective decisions from them, calls for machine learning techniques. This research attempts to find associations between normal proteins (environmental factors) and diseases that are difficult to diagnose, and to propose justifications for those diseases. The paper proposes a technique for medical data mining using association rules that overcomes some limitations of current association algorithms such as the Apriori algorithm and the Equivalence CLAss Transformation (ECLAT) algorithm. A modification to the Apriori algorithm is proposed to mine Erythrocytes Dynamic Antigens Store (EDAS) data in a more efficient and tractable way. The experiments indicate a relation between normal proteins (environment proteins, food proteins, commensal proteins, and tissue proteins) and disease proteins. The experiments also show that habits and behaviors are associated with certain diseases. The presented tool can be used in clinical laboratories to discover the biological causes of malignant diseases.
Keywords: apriori algorithm; association rule; data mining; eclat algorithm; malignant tumors
Year: 2020 PMID: 33195426 PMCID: PMC7643003 DOI: 10.3389/fmolb.2020.582593
Source DB: PubMed Journal: Front Mol Biosci ISSN: 2296-889X
FIGURE 1. Workflow pipeline of the experiment.
Results of the experiment for malignant tumors (Rafea et al., 2019).
| Disease | M1 | M2 | M3 | M4 | M5 | M6 | M7 | M8 | M9 | M10 |
| Number of records | 2063 | 2109 | 2083 | 2053 | 2035 | 2094 | 2062 | 2135 | 1982 | 2096 |
| Disease | M11 | M12 | M13 | M14 | M15 | M16 | M17 | M18 | M19 | M20 |
| Number of records | 2040 | 2084 | 2076 | 2149 | 2130 | 2115 | 2059 | 2080 | 2116 | 2181 |
Results of phase two (malignant tumor subtypes).
| Disease | Its subtypes | Number of subtypes |
| Malignant Tumor (M1) | M1T1, M1T2, M1T3, M1T4, and M1T5 | 5 |
| Malignant Tumor (M7) | M7T1, M7T2, M7T3, M7T4, M7T5, and M7T6 | 6 |
| Malignant Tumor (M18) | M18T1, M18T2, M18T3, M18T4, M18T5, and M18T6 | 6 |
| Malignant Tumor (M20) | M20T1, M20T2, M20T3, M20T4, M20T5, M20T6, and M20T7 | 7 |
Results of phase three (association rule mining).
| Disease | Disease subtype | Normal proteins | Diseased proteins |
| Malignant Tumor (M1) | M1T1 | P3580 | P119913, P119662, P119786, P119535, P119939 |
| Malignant Tumor (M1) | M1T2 | P3887 | P119640, P119513, P119637, P119515, P119790, P119541, P119917, P119886, P119792, P119668 |
| Malignant Tumor (M7) | M7T5 | P3734, P6006 | P719501, P719807 |
| Malignant Tumor (M7) | M7T6 | P625, P3886 | P719785, P719964 |
| Malignant Tumor (M18) | M18T2 | P6376 | P1819795, P1819919, P1819668, P1819546 |
| Malignant Tumor (M18) | M18T5 | P327, P3479 | P1819668, P1819696 |
| Malignant Tumor (M20) | M20T3 | P777 | P2019891, P2019964, P2019765, P2019863, P2019640, P2019643, P2019516, P2019741, P2019614, P2019518, P2019638, P2019889, P2019767, P2019990 |
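The frequent itemsets behind associations like those in the table above are typically found with a level-wise search. The following is a minimal sketch of the classic Apriori procedure (join, prune, count), not the modified algorithm the paper proposes. The function name `apriori`, the variable `toy_transactions`, and the toy data are illustrative assumptions. The protein identifiers are borrowed from the table only as labels and do not reproduce EDAS records.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    Classic level-wise Apriori; a sketch, not the paper's modified algorithm.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        # Count step: keep only candidates that reach the support threshold.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy transactions (assumption: NOT taken from the EDAS data).
toy_transactions = [
    {"P3479", "P327", "P1819668"},
    {"P3479", "P327", "P1819668", "P1819696"},
    {"P3479", "P1819696"},
    {"P327", "P1819668"},
    {"P3479", "P327", "P1819696"},
]

if __name__ == "__main__":
    for itemset, s in sorted(apriori(toy_transactions, 0.6).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), round(s, 2))
```

The prune step is what keeps the search tractable: a candidate is counted against the data only if all of its smaller subsets already met the threshold.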
Rules, confidence, lift, and leverage.
| Rule Number | Rule | Confidence | Lift | Leverage |
| R1 | P3479 ^ P327 → P1819668 | 86.36% | 1.2164 | 0.0711 |
| R2 | P3479 ^ P327 → P1819696 | 84.81% | 1.1751 | 0.0622 |
| R3 | P3479 → P1819696 | 81.48% | 1.1111 | 0.0533 |
| R4 | P3479 → P1819668 | 81.48% | 1.062 | 0.0433 |
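Confidence, lift, and leverage for a rule A → C are conventionally defined from itemset supports: confidence = supp(A ∪ C) / supp(A), lift = confidence / supp(C), and leverage = supp(A ∪ C) − supp(A) · supp(C). The sketch below computes them under those standard definitions. The source does not state which exact formulas were used, so treat this as an assumption. The function name `rule_metrics` and the toy transactions are hypothetical and are not the data behind the table.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Compute support, confidence, lift and leverage for antecedent -> consequent.

    Uses the standard textbook definitions (assumed, not confirmed by the source).
    """
    transactions = [frozenset(t) for t in transactions]
    a, c = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)

    n_a = sum(a <= t for t in transactions)          # transactions containing A
    n_c = sum(c <= t for t in transactions)          # transactions containing C
    n_ac = sum((a | c) <= t for t in transactions)   # transactions containing both

    if n_a == 0 or n_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")

    support_a, support_c, support_ac = n_a / n, n_c / n, n_ac / n
    confidence = n_ac / n_a
    return {
        "support": support_ac,
        "confidence": confidence,
        "lift": confidence / support_c,
        "leverage": support_ac - support_a * support_c,
    }


# Hypothetical toy transactions (assumption: NOT taken from the EDAS data).
toy_transactions = [
    {"P3479", "P327", "P1819668"},
    {"P3479", "P327", "P1819668", "P1819696"},
    {"P3479", "P1819696"},
    {"P327", "P1819668"},
    {"P3479", "P327", "P1819696"},
]

if __name__ == "__main__":
    print(rule_metrics(toy_transactions, {"P3479", "P327"}, {"P1819668"}))
```

A lift above 1 and a positive leverage both indicate that antecedent and consequent co-occur more often than they would if independent, which is the case for all four rules in the table.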
Precision, recall, F-measure, and accuracy of the three algorithms at different minimum support values (40%, 50%, 60%, and 70%).
| Algorithm | Precision % | | | | Recall % | | | | F-measure % | | | | Accuracy % | | | |
| Minimum support | 40% | 50% | 60% | 70% | 40% | 50% | 60% | 70% | 40% | 50% | 60% | 70% | 40% | 50% | 60% | 70% |
| | 42 | 54 | 62 | 64 | 67 | 69 | 84 | 87 | 52 | 61 | 71 | 74 | 91 | 95 | 98 | 98 |
| ITDApriori | 42 | 54 | 62 | 64 | 67 | 69 | 84 | 87 | 52 | 61 | 71 | 74 | 91 | 95 | 98 | 98 |
| Proposed Algorithm | | | | | | | | | | | | | | | | |
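The F-measure column above is consistent with the usual harmonic mean of precision and recall, F = 2PR / (P + R), rounded to the nearest integer. The sketch below checks this against the reported ITDApriori row. The function name `f_measure` is illustrative, and the source does not confirm that this exact formula was used.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both in the same units, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (%) as reported for ITDApriori at 40%, 50%, 60%, 70% support.
reported_precision = [42, 54, 62, 64]
reported_recall = [67, 69, 84, 87]
reported_f = [52, 61, 71, 74]

computed_f = [round(f_measure(p, r))
              for p, r in zip(reported_precision, reported_recall)]

if __name__ == "__main__":
    print(computed_f == reported_f, computed_f)
```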
Execution time (in seconds) of the Apriori algorithm, ITDApriori, and the proposed ARM algorithm on 500 transactions.
| Minimum support | ITDApriori (s) | Proposed ARM (s) | Improvement (%) |
| 40% | 12174 | 9296 | ||
| 50% | 10979 | 7312 | ||
| 60% | 8472 | 4620 | ||
| 70% | 3125 | 1368 |
FIGURE 2. Execution times at different minimum support values for the Apriori algorithm, ITDApriori, and the proposed algorithm on 500 transactions.
Execution time (in seconds) of the Apriori algorithm, ITDApriori, and the proposed ARM algorithm on 1000 transactions.
| Minimum support | ITDApriori (s) | Proposed ARM (s) | Improvement (%) |
| 40% | 20067 | 10849 | ||
| 50% | 16955 | 9452 | ||
| 60% | 13433 | 6028 | ||
| 70% | 7628 | 3730 |
FIGURE 3. Execution times at different minimum support values for the Apriori algorithm, ITDApriori, and the proposed algorithm on 1000 transactions.
Execution time (in seconds) of the Apriori algorithm, ITDApriori, and the proposed ARM algorithm on 1500 transactions.
| Minimum support | ITDApriori (s) | Proposed ARM (s) | Improvement (%) |
| 40% | 28176 | 13880 | ||
| 50% | 24013 | 10277 | ||
| 60% | 17642 | 8095 | ||
| 70% | 10384 | 5260 |
FIGURE 4. Execution times at different minimum support values for the Apriori algorithm, ITDApriori, and the proposed algorithm on 1500 transactions.
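The improvement column of the execution-time tables is empty in this record. One common way to express such a figure is the relative reduction in run time, (baseline − proposed) / baseline × 100. The sketch below applies that formula to the reported timings, taking the column labels at face value. The formula is an assumption, since the source does not state how improvement was defined, and the names `relative_improvement` and `timings` are illustrative.

```python
def relative_improvement(baseline_seconds, proposed_seconds):
    """Percentage reduction in run time relative to the baseline (assumed definition)."""
    if baseline_seconds <= 0:
        raise ValueError("baseline time must be positive")
    return (baseline_seconds - proposed_seconds) / baseline_seconds * 100


# Reported timings in seconds: {transactions: {min support %: (ITDApriori, proposed ARM)}}.
timings = {
    500: {40: (12174, 9296), 50: (10979, 7312), 60: (8472, 4620), 70: (3125, 1368)},
    1000: {40: (20067, 10849), 50: (16955, 9452), 60: (13433, 6028), 70: (7628, 3730)},
    1500: {40: (28176, 13880), 50: (24013, 10277), 60: (17642, 8095), 70: (10384, 5260)},
}

if __name__ == "__main__":
    for n_transactions, by_support in timings.items():
        for min_support, (baseline, proposed) in by_support.items():
            pct = relative_improvement(baseline, proposed)
            print(f"{n_transactions} transactions, support {min_support}%: {pct:.1f}%")
```

Under this definition the values are derived figures, not results reported by the authors.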