Gang Luo
Abstract
BACKGROUND: Predictive modeling is fundamental for extracting value from large clinical data sets, or "big clinical data," advancing clinical research, and improving healthcare. Machine learning is a powerful approach to predictive modeling. Two factors make machine learning challenging for healthcare researchers. First, before training a machine learning model, the values of one or more model parameters called hyper-parameters must typically be specified. Because healthcare researchers are often inexperienced with machine learning, it is hard for them to choose an appropriate algorithm and hyper-parameter values. Second, many clinical data are stored in a special format. These data must be iteratively transformed into the relational table format before predictive modeling can be conducted. This transformation is time-consuming and requires computing expertise.
Keywords: Automatic algorithm selection; Automatic hyper-parameter value selection; Big clinical data; Entity–Attribute–Value; Machine learning; Pivot
Year: 2015 PMID: 26417431 PMCID: PMC4584489 DOI: 10.1186/s13755-015-0011-0
Source DB: PubMed Journal: Health Inf Sci Syst ISSN: 2047-2501
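The transformation mentioned in the abstract can be illustrated with a small sketch. It assumes the "special format" is the Entity–Attribute–Value layout named in the keywords, where each row holds one (entity, attribute, value) triple. The function name `pivot_eav`, the patient identifiers, and the values are all made up for illustration; this is not MLBCD's implementation.

```python
from collections import defaultdict

def pivot_eav(rows, attributes):
    """Pivot (entity, attribute, value) triples into one row per entity.

    Hypothetical helper: attributes not listed in `attributes` are ignored,
    missing values become None, and a repeated attribute keeps its last value.
    Entities with none of the requested attributes do not appear in the output.
    """
    by_entity = defaultdict(dict)
    for entity, attribute, value in rows:
        if attribute in attributes:
            by_entity[entity][attribute] = value
    return [
        {"entity": entity, **{a: values.get(a) for a in attributes}}
        for entity, values in sorted(by_entity.items())
    ]

# Made-up EAV rows: one clinical parameter per row.
eav_rows = [
    ("patient_1", "test 1", 5.2),
    ("patient_1", "test 2", 140),
    ("patient_2", "test 1", 4.8),
    ("patient_2", "test 3", 0.9),
]

# One relational row per patient, one column per requested clinical parameter.
table = pivot_eav(eav_rows, ["test 1", "test 2", "test 3"])
for row in table:
    print(row)
```

On real data the same reshaping is usually done inside the database or with a data-frame library. The part that needs computing expertise is choosing which attributes become columns and how to resolve repeated measurements.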
Fig. 1. Pivot to obtain the columns for the three clinical parameters ‘test 1,’ ‘test 2,’ and ‘test 3’
Fig. 2. Architecture of MLBCD
Fig. 3. An illustration of progressive sampling used in our automatic search method
Fig. 4. An example dependency graph formed by all hyper-parameters of a machine learning algorithm
Fig. 5. Training set size vs. model’s accuracy
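The captions on progressive sampling and on training set size versus accuracy point to a general idea: train on progressively larger samples and stop once accuracy stops improving. The sketch below shows generic progressive sampling under stated assumptions, not the specific search method of the paper. The function name, the geometric schedule (`start`, `factor`), the stopping tolerance `tol`, and the toy learning curve are all hypothetical.

```python
import random

def progressive_sampling(n_total, evaluate, start=100, factor=2, tol=0.005, seed=0):
    """Evaluate a model on growing random samples until the gain falls below tol.

    `evaluate` is a caller-supplied function that takes a list of row indices,
    trains on them, and returns an accuracy-like score (higher is better).
    Returns the list of (sample_size, score) pairs that were tried.
    """
    rng = random.Random(seed)
    indices = list(range(n_total))
    rng.shuffle(indices)  # each sample is a prefix of one fixed random order

    history = []
    size = min(start, n_total)
    while True:
        score = evaluate(indices[:size])
        history.append((size, score))
        # Stop when the improvement over the previous sample is negligible.
        if len(history) > 1 and score - history[-2][1] < tol:
            break
        if size == n_total:
            break
        size = min(size * factor, n_total)
    return history

# Toy stand-in for "train and score": a saturating learning curve.
def toy_evaluate(sample):
    return 0.9 - 0.4 / (len(sample) ** 0.5)

history = progressive_sampling(10_000, toy_evaluate)
for size, score in history:
    print(size, round(score, 4))
```

Because each sample is a prefix of the same shuffled order, larger samples contain the smaller ones, which keeps successive scores comparable.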
Description of the dependent variables
| Variable | Description |
|---|---|
| Prediction accuracy | AUC achieved by the predictive model built |
| Time | Number of hours spent on building the predictive model |
| Satisfaction | Responses to three questions: (1) How satisfied were you with the predictive model built? (2) How easy was the predictive model building process? and (3) How much effort did it take to complete the predictive modeling task? Ratings are on a 1–7 scale with anchors of not at all/completely; difficult/easy; and a lot of effort/little effort |
| Self-efficacy for building machine learning predictive models with big clinical data | Response to the question: overall how confident are you about your ability to build machine learning predictive models with big clinical data [ |
| Adequacy | How sufficiently do you think MLBCD supports building machine learning predictive models with big clinical data? Rating is on a 1–7 scale with anchors of not at all/sufficiently |
| Trustworthiness | How much sense do you think the predictive models make clinically? Rating is on a 1–7 scale with anchors of not at all/completely |
| Documentation quality | Responses to two questions: (1) How comprehensive is MLBCD’s user manual? (2) How easy is MLBCD’s user manual to understand? Ratings are on a 1–7 scale with anchors of not at all/comprehensive; and difficult/easy |
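The table above measures prediction accuracy as AUC. As a minimal sketch, AUC can be computed as the fraction of (positive, negative) pairs in which the positive case gets the higher predicted score, with ties counted as half. The function name and the example labels and scores are made up for illustration.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (quadratic time).

    labels: iterable of 0/1 outcomes; scores: predicted scores, same length.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Made-up example: three of the four positive/negative pairs are ranked correctly.
example_auc = auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
print(example_auc)
```

The pairwise form is meant for reading, not for large data sets, where a rank-based computation is the usual choice.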
MLBCD vs. existing automatic selection methods for machine learning algorithms and/or hyper-parameter values
| Method | Select algorithms | Select hyper-parameter values | Can efficiently handle big data | Can handle a wide range of algorithms | Can handle various types of hyper-parameters |
|---|---|---|---|---|---|
| MLBCD | ✓ | ✓ | ✓ | ✓ | ✓ |
| [ | ✓ | × | × | × | × |
| [ | ✓ | × | × | ✓ | × |
| [ | ✓ | × | ✓ | ✓ | × |
| [ | × | ✓ | × | × | ✓ |
| [ | × | ✓ | × | × | × |
| [ | ✓ | ✓ | × | ✓ | ✓ |