Joseph A Cruz, David S Wishart.
Abstract
Machine learning is a branch of artificial intelligence that employs a variety of statistical, probabilistic and optimization techniques that allow computers to "learn" from past examples and to detect hard-to-discern patterns in large, noisy or complex data sets. This capability is particularly well-suited to medical applications, especially those that depend on complex proteomic and genomic measurements. As a result, machine learning is frequently used in cancer diagnosis and detection. More recently, machine learning has been applied to cancer prognosis and prediction. This latter approach is particularly interesting as it is part of a growing trend towards personalized, predictive medicine. In assembling this review we conducted a broad survey of the different types of machine learning methods being used, the types of data being integrated and the performance of these methods in cancer prediction and prognosis. A number of trends are noted, including a growing dependence on protein biomarkers and microarray data, a strong bias towards applications in prostate and breast cancer, and a heavy reliance on "older" technologies such as artificial neural networks (ANNs) instead of more recently developed or more easily interpretable machine learning methods. A number of published studies also appear to lack an appropriate level of validation or testing. Among the better designed and validated studies it is clear that machine learning methods can be used to substantially (15-25%) improve the accuracy of predicting cancer susceptibility, recurrence and mortality. At a more fundamental level, it is also evident that machine learning is helping to improve our basic understanding of cancer development and progression.
Keywords: Cancer; machine learning; prediction; prognosis; risk
Year: 2007 PMID: 19458758 PMCID: PMC2675494
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Figure 1. A histogram showing the steady increase in published papers using machine learning methods to predict cancer risk, recurrence and outcome. The data were collected using a variety of keyword searches through PubMed, CiteSeer, Google Scholar, Science Citation Index and other online resources. Each bar represents the cumulative total of papers published over a two-year period. The earliest papers appeared in the early 1990s.
Figure 2. An example of how a machine learner is trained to recognize images using a training set (a corrupted image of the number “8”) which is labeled or identified as the number “8”.
Figure 3. An example of a simple decision tree that might be used in breast cancer diagnosis and treatment. This is an example of a tree that might be formulated via expert assessment. Similar tree structures can be generated by decision tree learners.
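As a rough illustration of how a decision tree learner might choose the attribute to test at the root of such a tree, the sketch below ranks attributes by information gain (an ID3-style criterion, chosen here as an assumption; the source does not say which learner or criterion is meant). The attribute names, values and labels are hypothetical placeholders, not clinical data.

```python
import math
from collections import Counter

# Hypothetical toy records (not from the paper): two categorical attributes.
ROWS = [
    {"feature_a": "yes", "feature_b": "x"},
    {"feature_a": "yes", "feature_b": "y"},
    {"feature_a": "no", "feature_b": "x"},
    {"feature_a": "no", "feature_b": "y"},
]
LABELS = ["pos", "pos", "neg", "neg"]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one categorical attribute."""
    total = len(labels)
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def best_split(rows, labels):
    """Attribute with the highest information gain; it becomes the tree's root test."""
    return max(rows[0].keys(), key=lambda a: information_gain(rows, labels, a))


root_attribute = best_split(ROWS, LABELS)
```

In this toy set `feature_a` separates the two classes perfectly while `feature_b` carries no information, so the learner would test `feature_a` first. A full learner would repeat the same choice recursively on each branch.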
Survey of machine learning methods used in cancer prediction showing the types of cancer, clinical endpoints, choice of algorithm, performance and type of training data.
| Cancer Type | Clinical Endpoint | Machine Learning Algorithm | Benchmark | Improvement (%) | Training Data | Reference |
|---|---|---|---|---|---|---|
| bladder | recurrence | fuzzy logic | statistics | 16 | mixed | |
| bladder | recurrence | ANN | N/A | N/A | clinical | |
| bladder | survivability | ANN | N/A | N/A | clinical | |
| bladder | recurrence | ANN | N/A | N/A | clinical | |
| brain | survivability | ANN | statistics | N/A | genomic | |
| breast | recurrence | clustering | statistics | N/A | mixed | |
| breast | survivability | decision tree | statistics | 4 | clinical | |
| breast | susceptibility | SVM | random | 19 | genomic | |
| breast | recurrence | ANN | N/A | N/A | clinical | |
| breast | recurrence | ANN | N/A | N/A | mixed | |
| breast | recurrence | ANN | statistics | 1 | clinical | |
| breast | survivability | ANN | statistics | N/A | clinical | |
| breast | treatment response | ANN | N/A | N/A | proteomic | |
| breast | survivability | clustering | statistics | 0 | clinical | |
| breast | survivability | fuzzy logic | statistics | N/A | proteomic | |
| breast | survivability | SVM | N/A | N/A | clinical | |
| breast | recurrence | ANN | expert | 5 | mixed | |
| breast | survivability | ANN | statistics | 1 | clinical | |
| breast | recurrence | ANN | statistics | 23 | mixed | |
| breast | recurrence | ANN | N/A | N/A | clinical | |
| breast | survivability | ANN | N/A | N/A | clinical | |
| breast | survivability | ANN | expert | 5 | clinical | |
| breast | recurrence | ANN | statistics | N/A | mixed | |
| breast | recurrence | ANN | expert | 10 | clinical | |
| cervical | survivability | ANN | N/A | N/A | mixed | |
| colorectal | recurrence | ANN | statistics | 12 | clinical | |
| colorectal | survivability | ANN | statistics | 9 | clinical | |
| colorectal | survivability | clustering | N/A | N/A | clinical | |
| colorectal | recurrence | ANN | statistics | 9 | mixed | |
| colorectal | survivability | ANN | expert | 11 | clinical | |
| esophageal | treatment response | SVM | N/A | N/A | proteomic | |
| esophageal | survivability | ANN | statistics | 3 | clinical | |
| leukemia | recurrence | decision tree | N/A | N/A | proteomic | |
| liver | recurrence | ANN | statistics | 25 | genomic | |
| liver | recurrence | SVM | N/A | N/A | genomic | |
| liver | susceptibility | ANN | statistics | –2 | clinical | |
| liver | survivability | ANN | N/A | N/A | clinical | |
| lung | survivability | ANN | N/A | N/A | clinical | |
| lung | survivability | ANN | statistics | 9 | mixed | |
| lung | survivability | ANN | N/A | N/A | mixed | |
| lung | survivability | ANN | statistics | N/A | mixed | |
| lung | survivability | ANN | N/A | N/A | clinical | |
| lymphoma | survivability | ANN | statistics | 22 | genomic | |
| lymphoma | survivability | ANN | expert | 10 | mixed | |
| lymphoma | survivability | ANN | N/A | N/A | genomic | |
| lymphoma | survivability | ANN | expert | N/A | genomic | |
| lymphoma | survivability | clustering | N/A | N/A | genomic | |
| head/neck | survivability | ANN | statistics | 11 | clinical | |
| neck | treatment response | ANN | N/A | N/A | clinical | |
| ocular | survivability | SVM | N/A | N/A | genomic | |
| osteosarcoma | treatment response | SVM | N/A | N/A | genomic | |
| pleural mesothelioma | survivability | clustering | N/A | N/A | genomic | |
| prostate | treatment response | ANN | N/A | N/A | mixed | |
| prostate | recurrence | ANN | statistics | 0 | clinical | |
| prostate | treatment response | ANN | N/A | N/A | clinical | |
| prostate | recurrence | ANN | statistics | 16 | mixed | Poulakis et al, 2004a |
| prostate | recurrence | ANN | statistics | 11 | mixed | Poulakis et al, 2004b |
| prostate | recurrence | SVM | statistics | 6 | clinical | |
| prostate | recurrence | ANN | statistics | 0 | clinical | |
| prostate | recurrence | genetic algorithm | N/A | N/A | mixed | |
| prostate | recurrence | ANN | statistics | 0 | clinical | |
| prostate | susceptibility | decision tree | N/A | N/A | clinical | |
| prostate | recurrence | ANN | statistics | 13 | clinical | |
| prostate | treatment response | ANN | N/A | N/A | proteomic | |
| prostate | recurrence | naïve Bayes | statistics | 1 | clinical | |
| prostate | recurrence | ANN | N/A | N/A | clinical | |
| prostate | recurrence | ANN | statistics | 17 | clinical | |
| prostate | recurrence | ANN | N/A | N/A | mixed | |
| skin | survivability | ANN | expert | 14 | clinical | |
| skin | recurrence | ANN | expert | 27 | proteomic | |
| skin | survivability | ANN | expert | 0 | clinical | |
| skin | survivability | genetic algorithm | N/A | N/A | clinical | |
| stomach | recurrence | ANN | expert | 28 | clinical | |
| throat | recurrence | fuzzy logic | N/A | N/A | clinical | |
| throat | recurrence | ANN | statistics | 0 | genomic | |
| throat | survivability | decision tree | statistics | N/A | proteomic | |
| thoracic | treatment response | ANN | N/A | N/A | clinical | |
| thyroid | survivability | decision tree | statistics | N/A | clinical | Kukar et al, 1997 |
| trophoblastic | survivability | genetic algorithm | N/A | N/A | clinical | |
Figure 4. A simplified illustration of how an SVM might work in distinguishing between basketball players and weightlifters using height/weight support vectors. In this simple case the SVM has identified a hyperplane (actually a line) which maximizes the separation between the two clusters.
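The idea in this caption can be sketched in a few lines of Python. The example below fits a linear soft-margin SVM by subgradient descent on the regularized hinge loss, which is one simple way to approximate the maximum-margin line; it is not necessarily how any SVM package or any study surveyed here trains its models. The heights, weights and hyperparameters (`lam`, `lr`, `epochs`) are made-up values for illustration only.

```python
# Hypothetical toy data (not from the paper): (height in cm, weight in kg).
# Label +1 = basketball player, -1 = weightlifter.
SAMPLES = [
    (201.0, 95.0), (198.0, 92.0), (206.0, 100.0), (195.0, 88.0),
    (170.0, 105.0), (175.0, 110.0), (168.0, 98.0), (178.0, 115.0),
]
LABELS = [1, 1, 1, 1, -1, -1, -1, -1]


def make_scaler(points):
    """Return a function that rescales each feature to zero mean and unit variance."""
    n = len(points)
    dims = range(len(points[0]))
    means = [sum(p[d] for p in points) / n for d in dims]
    stds = [(sum((p[d] - means[d]) ** 2 for p in points) / n) ** 0.5 for d in dims]

    def scale(point):
        return tuple((point[d] - means[d]) / stds[d] for d in dims)

    return scale


def train_linear_svm(xs, ys, lam=0.01, lr=0.1, epochs=500):
    """Minimize lam/2 * ||w||^2 + mean hinge loss with full-batch subgradient descent."""
    w = [0.0] * len(xs[0])
    b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(xs, ys):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified
                for d in range(len(w)):
                    grad_w[d] -= y * x[d] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Report which side of the separating line a scaled point falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


scale = make_scaler(SAMPLES)
w, b = train_linear_svm([scale(p) for p in SAMPLES], LABELS)
```

A new athlete is classified with `predict(w, b, scale((height, weight)))`. Only the points that sit on or inside the margin contribute to the updates, which mirrors the role of the support vectors in the figure.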
Summary of benefits, assumptions and limitations of different machine learning algorithms
| Machine Learning Algorithm | Benefits | Assumptions and/or Limitations |
|---|---|---|
| Decision Tree | easy to understand and efficient training algorithm; order of training instances has no effect on training; pruning can deal with the problem of overfitting | classes must be mutually exclusive; final decision tree dependent upon order of attribute selection; errors in training set can result in overly complex decision trees; missing values for an attribute make it unclear which branch to take when that attribute is tested |
| Naïve Bayes | foundation based on statistical modelling; easy to understand and efficient training algorithm; order of training instances has no effect on training; useful across multiple domains | assumes attributes are statistically independent*; assumes normal distribution on numeric attributes; classes must be mutually exclusive; redundant attributes mislead classification; attribute and class frequencies affect accuracy |
| k-Nearest Neighbor | fast classification of instances; useful for non-linear classification problems; robust with respect to irrelevant or novel attributes; tolerant of noisy instances or instances with missing attribute values; can be used for both regression and classification | slower to update concept description; assumes that instances with similar attributes will have similar classifications; assumes that attributes will be equally relevant; too computationally complex as number of attributes increases |
| Neural Network | can be used for classification or regression; able to represent Boolean functions (AND, OR, NOT); tolerant of noisy inputs; instances can be classified by more than one output | difficult to understand structure of algorithm; too many attributes can result in overfitting; optimal network structure can only be determined by experimentation |
| Support Vector Machine | models nonlinear class boundaries; overfitting is unlikely to occur; computational complexity reduced to quadratic optimization problem; easy to control complexity of decision rule and frequency of error | training is slow compared to Bayes and Decision Trees; difficult to determine optimal parameters when training data is not linearly separable; difficult to understand structure of algorithm |
| Genetic Algorithm | simple algorithm, easy to implement; can be used in feature classification and feature selection; primarily used in optimization; always finds a “good” solution (not always the best solution) | computation or development of scoring function is non-trivial; not the most efficient method to find some optima, tends to find local optima rather than global; complications involved in the representation of training/output data |
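Two of the naïve Bayes assumptions listed in the table (statistically independent attributes, normally distributed numeric attributes) are easier to see in code. The following is a minimal Gaussian naïve Bayes sketch under those assumptions; the data, class names and the small variance floor are hypothetical choices made for illustration and do not come from the paper.

```python
import math
from collections import defaultdict

# Hypothetical toy data (not from the paper): two numeric attributes per instance.
ROWS = [(1.0, 2.1), (1.2, 1.9), (0.8, 2.0), (3.0, 4.1), (3.2, 3.9), (2.8, 4.0)]
CLASSES = ["A", "A", "A", "B", "B", "B"]


def fit_gaussian_nb(rows, labels):
    """Estimate a prior per class and a (mean, variance) pair per attribute."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        n = len(members)
        stats = []
        for column in zip(*members):
            mean = sum(column) / n
            # Small floor (an arbitrary choice) keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in column) / n + 1e-9
            stats.append((mean, var))
        model[label] = (n / len(rows), stats)
    return model


def log_gaussian(x, mean, var):
    """Log density of a normal distribution: the 'normal distribution' assumption."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def predict_nb(model, row):
    """Choose the class with the highest log posterior, up to a shared constant."""
    def score(label):
        prior, stats = model[label]
        # Summing per-attribute log densities is the independence assumption.
        return math.log(prior) + sum(
            log_gaussian(x, m, v) for x, (m, v) in zip(row, stats)
        )
    return max(model, key=score)


nb_model = fit_gaussian_nb(ROWS, CLASSES)
```

Because each attribute contributes its own term to the sum, two redundant (highly correlated) attributes would be counted twice, which is one way to read the table's remark that redundant attributes mislead classification.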
Figure 5. A histogram showing the frequency with which different types of machine learning methods are used to predict different types of cancer. Breast and prostate cancer dominate; however, a good range of cancers from different organs or tissues also appear to be compatible with machine learning prognoses. The “other” cancers include brain, cervical, esophageal, leukemia, head, neck, ocular, osteosarcoma, pleural mesothelioma, thoracic, thyroid, and trophoblastic (uterine) malignancies.