| Literature DB >> 28851678 |
Gang Luo1, Bryan L Stone2, Michael D Johnson2, Peter Tarczy-Hornoch1,3,4, Adam B Wilcox1, Sean D Mooney1, Xiaoming Sheng2, Peter J Haug5,6, Flory L Nkoy2.
Abstract
BACKGROUND: To improve health outcomes and cut health care costs, we often need to conduct prediction/classification using large clinical datasets (aka, clinical big data), for example, to identify high-risk patients for preventive interventions. Machine learning has been proposed as a key technology for doing this. Machine learning has won most data science competitions and could support many clinical activities, yet only 15% of hospitals use it for even limited purposes. Despite familiarity with data, health care researchers often lack machine learning expertise to directly use clinical big data, creating a hurdle in realizing value from their data. Health care researchers can work with data scientists with deep machine learning knowledge, but it takes time and effort for both parties to communicate effectively. Facing a shortage in the United States of data scientists and hiring competition from companies with deep pockets, health care systems have difficulty recruiting data scientists. Building and generalizing a machine learning model often requires hundreds to thousands of manual iterations by data scientists to select the following: (1) hyper-parameter values and complex algorithms that greatly affect model accuracy and (2) operators and periods for temporally aggregating clinical attributes (eg, whether a patient's weight kept rising in the past year). This process becomes infeasible with limited budgets.
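The abstract's example of a temporal aggregation (whether a patient's weight kept rising in the past year) can be sketched as one aggregation operator applied over a period of measurements. This is an illustrative sketch in pandas, not the study's implementation; the column names and data are assumptions.

```python
import pandas as pd

# Hypothetical visit-level weight measurements for one patient.
weights = pd.DataFrame({
    "visit_date": pd.to_datetime(
        ["2016-01-10", "2016-04-02", "2016-07-15", "2016-11-30"]),
    "weight_kg": [81.0, 82.5, 84.1, 86.0],
})

def kept_rising(df, date_col="visit_date", value_col="weight_kg"):
    """One temporal aggregation operator: True if every successive
    measurement in the period increased."""
    ordered = df.sort_values(date_col)[value_col]
    return bool((ordered.diff().dropna() > 0).all())

print(kept_rising(weights))  # True for this strictly increasing series
```

In an automated pipeline of the kind the abstract describes, the choice of operator (`kept_rising`, mean, max, trend slope) and of period (past year, past 3 months) would itself be searched over rather than hand-picked.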
Keywords: automated temporal aggregation; automatic model selection; care management; clinical big data; machine learning
Year: 2017 PMID: 28851678 PMCID: PMC5596298 DOI: 10.2196/resprot.7757
Source DB: PubMed Journal: JMIR Res Protoc ISSN: 1929-0748
Two learning algorithms and their example normal parameters and hyper-parameters.
| Learning algorithm | Example hyper-parameters | Example normal parameters |
| --- | --- | --- |
| Support vector machine | Regularization constant C, kernel to use, tolerance parameter, ε for round-off error, a polynomial kernel’s degree | Support vectors and their Lagrange multipliers |
| Random forest | Number of independent variables to examine at each inner node of a classification and regression tree, number of trees | Threshold value and input variable used at each inner node of a tree |
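The distinction the table draws can be made concrete in code: hyper-parameters are fixed before training, while normal parameters are learned from the data during fitting. This sketch uses scikit-learn's support vector machine as an assumed illustration; the study does not prescribe a particular library.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic data standing in for a clinical dataset.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hyper-parameters: chosen BEFORE training (here, by hand; in Auto-ML, by search).
# C = regularization constant, kernel + degree, tol = tolerance parameter.
model = SVC(C=1.0, kernel="poly", degree=3, tol=1e-3)

# Normal parameters: learned FROM the data during fitting.
model.fit(X, y)
print(model.support_vectors_.shape)  # the learned support vectors
print(model.dual_coef_.shape)        # their learned Lagrange-multiplier weights
```

Automatic model selection searches over the first group (hyper-parameter values and the algorithm itself); the second group falls out of training once those choices are fixed.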
Figure 1. Auto-ML's approach of constructing machine learning models versus the present one.
Figure 2. Progressive sampling adopted in our draft automatic model selection method.
Figure 3. The highest model accuracy gained by Auto-ML over time.
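The progressive sampling of Figure 2 can be sketched as evaluating a candidate model on geometrically growing training samples and stopping early once accuracy stops improving, so that unpromising configurations are discarded cheaply. The classifier, sample sizes, and stopping threshold below are illustrative assumptions, not the authors' exact method.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def progressive_accuracy(model, sizes=(100, 200, 400, 800, 1600),
                         min_gain=0.005):
    """Fit on growing training subsets; stop once the accuracy gain
    over the previous round falls below min_gain."""
    history = []
    for n in sizes:
        model.fit(X_tr[:n], y_tr[:n])
        acc = model.score(X_te, y_te)
        history.append(acc)
        if len(history) > 1 and history[-1] - history[-2] < min_gain:
            break  # learning curve has flattened; stop sampling
    return history

accs = progressive_accuracy(LogisticRegression(max_iter=1000))
print(accs)  # accuracy after each sample size tried
```

Run once per candidate hyper-parameter configuration, this lets a search procedure spend full-data training time only on configurations whose learning curves look promising.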
The dependent variable list.
| Variable | Description |
| --- | --- |
| Impact on enrollment decision | Response to the following question: Will the prediction result and automatically generated explanations change your enrollment decision on the patient? |
| Usefulness of the prediction result | Response to the following question: How useful is the prediction result? Rating is on a 7-point Likert scale, ranging from “not at all” (1) to “very useful” (7). |
| Usefulness of the automatically generated explanations | Response to the following question: How useful are the automatically generated explanations? Rating is on a 7-point Likert scale, ranging from “not at all” (1) to “very useful” (7). |
| Trustworthiness of the prediction result | Response to the following question: In your opinion, how much clinical sense does the prediction result make? Rating is on a 7-point Likert scale, ranging from “not at all” (1) to “completely” (7). |
| Trustworthiness of the automatically generated explanations | Response to the following question: In your opinion, how much clinical sense do the automatically generated explanations make? Rating is on a 7-point Likert scale, ranging from “not at all” (1) to “completely” (7). |