Xueqiang Zeng (1), Gang Luo (2). (1) Computer Center, Nanchang University, 999 Xuefu Road, Nanchang, 330031 Jiangxi, People's Republic of China. (2) Department of Biomedical Informatics and Medical Education, University of Washington, UW Medicine South Lake Union, 850 Republican Street, Building C, Box 358047, Seattle, WA 98109, USA.
Abstract
PURPOSE: Machine learning is broadly used for clinical data analysis. Before a model is trained, a machine learning algorithm must be selected, and the values of one or more model parameters, termed hyper-parameters, must be set. Selecting algorithms and hyper-parameter values requires advanced machine learning knowledge and many labor-intensive manual iterations. To lower the barrier to using machine learning, various automatic selection methods for algorithms and/or hyper-parameter values have been proposed. Existing automatic selection methods are inefficient on large data sets, which poses a challenge for using machine learning in the clinical big data era. METHODS: To address this challenge, this paper presents progressive sampling-based Bayesian optimization, an efficient and automatic selection method for both algorithms and hyper-parameter values. RESULTS: We report an implementation of the method. We show that, compared to a state-of-the-art automatic selection method, our method can significantly reduce search time, classification error rate, and the standard deviation of error rate due to randomization. CONCLUSIONS: This is major progress towards enabling fast turnaround in identifying high-quality solutions required by many machine learning-based clinical data analysis tasks.
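The abstract does not spell out the search procedure, so the following is only a minimal sketch of the general idea behind progressive sampling for model selection: score candidate (algorithm, hyper-parameter) pairs on a small sample first, discard the weaker ones, and spend larger samples only on the survivors. It is not the authors' method. In particular, it replaces Bayesian optimization with a plain rank-and-discard step. The names `progressive_search`, `toy_evaluate`, and `TRUE_ERROR`, the sample sizes, and the error values are all hypothetical and chosen purely for illustration.

```python
import math
from typing import Callable, Hashable, List, Sequence, Tuple


def progressive_search(
    configs: Sequence[Hashable],
    evaluate: Callable[[Hashable, int], float],
    sample_sizes: Sequence[int],
    keep_fraction: float = 0.5,
) -> Tuple[Hashable, float, List[Tuple[int, int]]]:
    """Rank-and-discard search over growing training samples.

    Returns (best_config, its_error_at_the_largest_sample, trace), where
    trace lists (sample_size, number_of_configs_evaluated) per round.
    """
    if not configs or not sample_sizes:
        raise ValueError("need at least one config and one sample size")

    survivors = list(configs)
    trace: List[Tuple[int, int]] = []
    scored: List[Tuple[float, int]] = []
    for n in sample_sizes:
        # Score every surviving config on a sample of size n (lower is better).
        scored = sorted((evaluate(c, n), i) for i, c in enumerate(survivors))
        trace.append((n, len(scored)))
        # Keep only the best fraction, but never fewer than one config.
        keep = max(1, math.ceil(len(scored) * keep_fraction))
        survivors = [survivors[i] for _, i in scored[:keep]]
    return survivors[0], scored[0][0], trace


# Hypothetical error rates; a real evaluator would train and validate a model.
TRUE_ERROR = {
    ("tree", 3): 0.30,
    ("tree", 8): 0.22,
    ("knn", 5): 0.25,
    ("svm", 1.0): 0.18,
}


def toy_evaluate(config: Hashable, n: int) -> float:
    # Assumed behavior: the estimated error shrinks as the sample grows.
    return TRUE_ERROR[config] + 1.0 / math.sqrt(n)


best, err, trace = progressive_search(
    list(TRUE_ERROR), toy_evaluate, sample_sizes=[100, 400, 1600]
)
```

In this toy run, four candidates are scored on 100 samples, two on 400, and one on 1600, for seven evaluations in total instead of the twelve needed to score every candidate at every size. The trade-off is that a candidate that only performs well on large samples can be discarded too early.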
Keywords:
Automatic machine learning model selection; Bayesian optimization; Clinical big data; Progressive sampling