Giovanni Scala1, Antonio Federico2, Vittorio Fortino3, Dario Greco2,4, Barbara Majello1.
Abstract
The explosion of omics data availability in cancer research has advanced our knowledge of the molecular basis of cancer, although strategies for its definitive resolution are still not well established. The complexity of cancer biology, driven by the high heterogeneity of cancer cells, leads to the development of pharmacoresistance in many patients, hampering the efficacy of therapeutic approaches. Machine learning techniques have been applied to extract knowledge from cancer omics data in order to address fundamental issues in cancer research, as well as for the classification of clinically relevant sub-groups of patients and for the identification of biomarkers for disease risk and prognosis. Rule induction algorithms are a group of pattern discovery approaches that represent discovered relationships in the form of human-readable associative rules. Applying such techniques to the large body of cancer omics data now being collected can effectively boost our understanding of cancer-related mechanisms. In fact, the capability of these methods to extract a large amount of human-readable knowledge will eventually help to uncover unknown relationships between molecular attributes and the malignant phenotype. In this review, we describe applications and strategies for the use of rule induction approaches in cancer omics data analysis. In particular, we explore the canonical applications and the future challenges and opportunities posed by multi-omics integration problems.
Keywords: TCGA (The Cancer Genome Atlas); cancer; machine learning; omics data; patients classification; rule induction
Year: 2019 PMID: 31861438 PMCID: PMC6981587 DOI: 10.3390/ijms21010018
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Typical shape of an omics data matrix. Blue arrows link the columns of the matrix to the different omics data types that are frequently found in a multi-omics experiment.
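As a rough illustration of the matrix layout described in the caption, the sketch below assembles a small multi-omics matrix in plain Python. It assumes that rows are samples and that columns are features grouped by omics layer; the caption itself only states that columns map to omics data types. The layer names, sample IDs, feature names, and all values are invented for the example.

```python
# Hypothetical per-layer measurements, keyed by layer -> sample -> feature.
# All identifiers and values are made up for illustration only.
layers = {
    "expression": {
        "S1": {"TP53": 5.2, "MYC": 8.1},
        "S2": {"TP53": 6.0, "MYC": 7.4},
        "S3": {"TP53": 4.8, "MYC": 9.0},
    },
    "methylation": {
        "S1": {"cg_a": 0.12},
        "S2": {"cg_a": 0.85},
        # S3 has no methylation profile and is dropped below.
    },
    "mutation": {
        "S1": {"TP53_mut": 0},
        "S2": {"TP53_mut": 1},
        "S3": {"TP53_mut": 1},
    },
}

# Keep only samples measured in every layer (assumed rows of the matrix).
samples = sorted(set.intersection(*(set(by_sample) for by_sample in layers.values())))

# Columns are (layer, feature) pairs, so each column stays linked to its omics type.
columns = [
    (layer, feature)
    for layer, by_sample in layers.items()
    for feature in sorted(by_sample[samples[0]])
]

# One row per sample, one value per (layer, feature) column.
matrix = [
    [layers[layer][sample][feature] for layer, feature in columns]
    for sample in samples
]

for sample, row in zip(samples, matrix):
    print(sample, row)
```

Keeping the layer name attached to every column is one simple way to preserve the column-to-omics-type link that the figure draws with arrows; real pipelines typically also need per-layer normalization, which is omitted here.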
Rule induction methods applicable to cancer omics data.
| Tool | Strategy | Output | Implementation | Language |
|---|---|---|---|---|
| C4.5 | Decision tree | Decision trees | WEKA | Java, R, Python |
| RIPPER (Repeated Incremental Pruning to Produce Error Reduction) | Sequential covering | Rule set | WEKA/JRip | Java, R, Python |
| PART (Partial Decision Trees) | Sequential covering | Rule set | WEKA/PART | Java, R, Python |
| CAMUR (Classifier with Alternative and MUltiple Rule-based models) | Sequential covering | Rule set | CAMUR website 1,2 | Java |
| BIGBIOCL | Sequential covering | Rule set | BIGBIOCL github 3 | Java |
| FURIA (Fuzzy Unordered Rule Induction Algorithm) | Sequential covering | Fuzzy rule set | WEKA/FURIA | Java, R, Python |
| MLRules (Maximum Likelihood Rule Ensembles) | Sequential covering and probability estimation | Rule set | MLRules website 4 | Java |
| LERS (Learning from Examples based on Rough Sets) | Rough set theory | Rule set | R/RoughSets package | R |
| TSP (Top Scoring Pairs) | Rank based | Rule set | R/tspair package | R |
| k-TSP (k-Top Scoring Pairs) | Rank based | Rule set | R/switchbox package | R |
| BIOHEL (Bioinformatics-oriented Hierarchical Evolutionary Learning) | Evolutionary rule learning | Rule set | BIOHEL website 5 | C++ |
| CN2-SD (Clark & Niblett – Subgroup Discovery) | Subgroup discovery | Rule set | CN2-SD website 6 | Java |
| SDEFSR (Subgroup Discovery with Evolutionary Fuzzy Systems) | Subgroup discovery | Fuzzy rule set | R/SDEFSR | R |
1 http://dmb.iasi.cnr.it/camur.php; 2 http://bioinformatics.iasi.cnr.it/camurweb/home; 3 https://github.com/fcproj/BIGBIOCL; 4 http://www.cs.put.poznan.pl/wkotlowski/software-mlrules.html; 5 https://ico2s.org/software/biohel.html; 6 http://www.keel.es/algorithms.php.
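To make the idea of a human-readable rule concrete, the following minimal Python sketch illustrates the rank-based strategy listed in the table for TSP. It searches for the single feature pair whose relative ordering differs most between two classes, then prints it as an IF/THEN rule. This is only a sketch under stated assumptions: the toy matrix, the feature names, and the helpers `fit_tsp`, `predict_tsp`, and `describe_rule` are hypothetical and do not reproduce the interface of the R packages named above.

```python
from itertools import combinations


def fit_tsp(X, y):
    """Find the top scoring pair for a two-class problem.

    X: list of samples, each a list of feature values.
    y: list of class labels, 0 or 1, with both classes present.
    """
    best = None
    for i, j in combinations(range(len(X[0])), 2):
        # Frequency of the ordering x[i] < x[j] within each class.
        freq = []
        for label in (0, 1):
            rows = [x for x, c in zip(X, y) if c == label]
            freq.append(sum(x[i] < x[j] for x in rows) / len(rows))
        score = abs(freq[0] - freq[1])
        # Strict comparison keeps the first pair found when scores tie.
        if best is None or score > best["score"]:
            best = {
                "i": i,
                "j": j,
                "score": score,
                # Class in which the ordering x[i] < x[j] is more frequent.
                "class_if_less": 0 if freq[0] > freq[1] else 1,
            }
    return best


def predict_tsp(rule, x):
    """Classify one sample by comparing the two selected features."""
    if x[rule["i"]] < x[rule["j"]]:
        return rule["class_if_less"]
    return 1 - rule["class_if_less"]


def describe_rule(rule, names):
    """Render the learned pair as a human-readable rule."""
    a, b, c = names[rule["i"]], names[rule["j"]], rule["class_if_less"]
    return f"IF {a} < {b} THEN class {c} ELSE class {1 - c}"


# Toy data (invented): three features, three samples per class.
names = ["geneA", "geneB", "geneC"]
X = [
    [1.0, 5.0, 3.0],
    [2.0, 6.0, 1.0],
    [0.5, 4.0, 9.0],
    [7.0, 2.0, 3.5],
    [8.0, 1.0, 0.2],
    [6.0, 3.0, 8.0],
]
y = [0, 0, 0, 1, 1, 1]

rule = fit_tsp(X, y)
print(describe_rule(rule, names))
print(predict_tsp(rule, [0.9, 3.0, 100.0]))
```

Because the rule depends only on the ordering of two values within a sample, it is unaffected by any monotone rescaling applied to that sample. The k-TSP entry in the table extends the same idea to several pairs; that extension is not shown here.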