Gang Luo.
Abstract
BACKGROUND: Predictive modeling is fundamental to transforming large clinical data sets, or "big clinical data," into actionable knowledge for various healthcare applications. Machine learning is a major predictive modeling approach, but two barriers make its use in healthcare challenging. First, a machine learning tool user must choose an algorithm and assign one or more model parameters called hyper-parameters before model training. The algorithm and hyper-parameter values used typically impact model accuracy by over 40%, but their selection requires many labor-intensive manual iterations that can be difficult even for computer scientists. Second, many clinical attributes are repeatedly recorded over time, requiring temporal aggregation before predictive modeling can be performed. Many labor-intensive manual iterations are required to identify a good pair of aggregation period and operator for each clinical attribute. Both barriers result in time and human resource bottlenecks, and preclude healthcare administrators and researchers from asking a series of what-if questions when probing opportunities to use predictive models to improve outcomes and reduce costs.
Keywords: Automated temporal aggregation; Automatic algorithm selection; Automatic hyper-parameter value selection; Big clinical data; Machine learning
Year: 2016; PMID: 27280018; PMCID: PMC4897944; DOI: 10.1186/s13755-016-0018-1
Source DB: PubMed; Journal: Health Inf Sci Syst; ISSN: 2047-2501
Some example machine learning (ML) algorithms, ordinary parameters and hyper-parameters
| ML algorithm | Example ordinary parameters | Example hyper-parameters |
|---|---|---|
| Random forest | Input variable and threshold value selected at every internal node of a decision tree | Number of decision trees, number of input variables to evaluate at every internal node of a decision tree |
| Support vector machine | Support vectors, Lagrange multiplier for every support vector | Kernel to use, degree of a polynomial kernel, |
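The distinction in the table between ordinary parameters and hyper-parameters can be illustrated with a minimal sketch. It is not PredicT-ML code: it fits a single-node decision tree (a "stump") in plain Python, and the function name `fit_stump`, the argument `n_vars_to_try`, and the toy data are all hypothetical. The argument `n_vars_to_try` plays the role of the hyper-parameter "number of input variables to evaluate at every internal node", while the returned variable and threshold are the ordinary parameters learned from the data.

```python
import random


def fit_stump(X, y, n_vars_to_try, seed=0):
    """Fit a one-node decision tree (a "stump") on binary labels.

    n_vars_to_try is a hyper-parameter: the user fixes it before training.
    The returned variable index and threshold are ordinary parameters:
    they are learned from the data during training.
    """
    rng = random.Random(seed)
    n_vars = len(X[0])
    # Evaluate only a random subset of the input variables at this node.
    candidates = rng.sample(range(n_vars), n_vars_to_try)
    best = None  # (training errors, variable index, threshold)
    for j in candidates:
        for threshold in sorted({row[j] for row in X}):
            # Predict class 1 when the value exceeds the threshold.
            errors = sum((row[j] > threshold) != bool(label)
                         for row, label in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, j, threshold)
    return {"variable": best[1], "threshold": best[2],
            "training_errors": best[0]}


# Toy data (hypothetical): variable 1 separates the classes, variable 0 does not.
X = [[5, 1], [3, 2], [4, 8], [6, 9]]
y = [0, 0, 1, 1]

stump = fit_stump(X, y, n_vars_to_try=2)
print(stump)
```

Changing `n_vars_to_try` changes which splits the training procedure is allowed to consider, which is why hyper-parameter values have to be chosen, manually or automatically, before a model can be trained.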
Fig. 1 The current approach of building machine learning models vs. PredicT-ML’s
Fig. 2 Architecture of PredicT-ML
Fig. 3 Progressive sampling used in our automatic selection method
Fig. 4 Use MapReduce to obtain temporal aggregate values of two clinical parameters ‘test 1’ and ‘test 2’
Fig. 5 The highest model accuracy achieved by PredicT-ML over time
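The temporal aggregation named in the abstract and in the Fig. 4 caption can be sketched in a map/reduce style. This is a single-process illustration in plain Python, not the paper's MapReduce implementation. The record layout, the 30-day aggregation period, and the operators chosen for ‘test 1’ and ‘test 2’ are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (patient id, attribute, days before prediction time, value).
records = [
    ("p1", "test 1", 3, 7.0),
    ("p1", "test 1", 20, 9.0),
    ("p1", "test 2", 5, 1.5),
    ("p2", "test 1", 2, 6.0),
    ("p2", "test 2", 40, 2.5),
]

# Assumed (aggregation period in days, operator) pair for each clinical attribute.
spec = {"test 1": (30, max), "test 2": (30, mean)}


def map_phase(records, spec):
    """Emit ((patient, attribute), value) for values inside the aggregation period."""
    for patient, attribute, days_before, value in records:
        period, _ = spec[attribute]
        if days_before <= period:
            yield (patient, attribute), value


def reduce_phase(pairs, spec):
    """Group values by key and apply each attribute's aggregation operator."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: spec[key[1]][1](values) for key, values in groups.items()}


aggregates = reduce_phase(map_phase(records, spec), spec)
print(aggregates)
```

Each (period, operator) pair in `spec` is one candidate of the kind the abstract says must be identified per clinical attribute. Trying another candidate only requires changing `spec` and rerunning the two phases.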