André Rodrigues Olivera (1), Valter Roesler (2), Cirano Iochpe (2), Maria Inês Schmidt (3), Álvaro Vigo (4), Sandhi Maria Barreto (5), Bruce Bartholow Duncan (3)
(1) MSc. IT Analyst, Postgraduate Computing Program, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre (RS), Brazil.
(2) PhD. Professor, Postgraduate Computing Program, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre (RS), Brazil.
(3) PhD. Professor, Postgraduate Epidemiology Program and Hospital de Clínicas, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre (RS), Brazil.
(4) PhD. Professor, Postgraduate Epidemiology Program, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre (RS), Brazil.
(5) PhD. Professor, Department of Social and Preventive Medicine & Postgraduate Program in Public Health, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte (MG), Brazil.
Abstract
CONTEXT AND OBJECTIVE: Type 2 diabetes is a chronic disease associated with a wide range of serious health complications that have a major impact on overall health. The aims here were to develop and validate predictive models for detecting undiagnosed diabetes using data from the Longitudinal Study of Adult Health (ELSA-Brasil) and to compare the performance of different machine-learning algorithms in this task.
DESIGN AND SETTING: Comparison of machine-learning algorithms to develop predictive models using data from ELSA-Brasil.
METHODS: After selecting a subset of 27 candidate variables from the literature, models were built and validated in four sequential steps: (i) parameter tuning with tenfold cross-validation, repeated three times; (ii) automatic variable selection using forward selection, a wrapper strategy with four different machine-learning algorithms and tenfold cross-validation (repeated three times), to evaluate each subset of variables; (iii) error estimation of model parameters with tenfold cross-validation, repeated ten times; and (iv) generalization testing on an independent dataset. The models were created with the following machine-learning algorithms: logistic regression, artificial neural network, naïve Bayes, K-nearest neighbor and random forest.
RESULTS: The best models were created using artificial neural networks and logistic regression. These achieved mean areas under the curve of, respectively, 75.24% and 74.98% in the error estimation step and 74.17% and 74.41% in the generalization testing step.
CONCLUSION: Most of the predictive models produced similar results and demonstrated the feasibility of identifying the individuals with the highest probability of having undiagnosed diabetes using easily obtained clinical data.
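As a rough illustration of step (ii) in the methods, the sketch below runs greedy forward selection as a wrapper around repeated tenfold cross-validation, scoring each candidate subset by its mean area under the ROC curve. It is not the study's code, and everything in it is an assumption made for illustration: the data are synthetic, the folds are stratified by outcome (the abstract does not say whether they were), and `fit_centroid_scorer` is a hypothetical toy learner standing in for the algorithms named above.

```python
import random
from statistics import mean


def auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) identity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # A positive case outranking a negative one counts 1; a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_stratified_kfold(labels, k=10, repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs; stratification is an assumption here."""
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        for cls in (0, 1):
            members = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(members)
            for position, i in enumerate(members):
                folds[position % k].append(i)  # deal each class round-robin
        for f, test in enumerate(folds):
            train = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield train, test


def fit_centroid_scorer(X, y, cols):
    """Hypothetical toy learner: weights are the class-mean differences."""
    pos = [row for row, lab in zip(X, y) if lab == 1]
    neg = [row for row, lab in zip(X, y) if lab == 0]
    w = [mean(r[c] for r in pos) - mean(r[c] for r in neg) for c in cols]
    return lambda row: sum(wi * row[c] for wi, c in zip(w, cols))


def cv_auc(X, y, cols, k=10, repeats=3, seed=0):
    """Mean test-fold AUC for one subset of variables."""
    aucs = []
    for train, test in repeated_stratified_kfold(y, k, repeats, seed):
        score = fit_centroid_scorer([X[i] for i in train], [y[i] for i in train], cols)
        aucs.append(auc([score(X[i]) for i in test], [y[i] for i in test]))
    return mean(aucs)


def forward_select(X, y, n_vars, **cv_options):
    """Add the variable that most improves CV AUC; stop when nothing improves."""
    selected, best = [], 0.0
    while len(selected) < n_vars:
        candidates = [c for c in range(n_vars) if c not in selected]
        top_auc, top_col = max((cv_auc(X, y, selected + [c], **cv_options), c)
                               for c in candidates)
        if top_auc <= best:
            break
        selected.append(top_col)
        best = top_auc
    return selected, best


# Synthetic data: column 0 carries signal, columns 1 and 2 are pure noise.
rng = random.Random(42)
y = [1 if i % 4 == 0 else 0 for i in range(200)]
X = [[rng.gauss(1.5 * lab, 1.0), rng.gauss(0, 1.0), rng.gauss(0, 1.0)] for lab in y]
selected, best_auc = forward_select(X, y, n_vars=3, k=10, repeats=3)
print("selected columns:", selected, "mean CV AUC:", round(best_auc, 3))
```

The same `cv_auc` helper could be called with `repeats=10` to mirror the error estimation step (iii). Generalization testing (iv) would instead fit once on the full development set and score a held-out dataset that played no part in tuning or selection.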