A A Veloso (1), K B Gomes (2, 3), I S Silva (1), C N Ferreira (4), L B X Costa (5), M O Sóter (6), L M L Carvalho (5), J de C Albuquerque (6), M F Sales (5), A L Candido (7), F M Reis (8).
1. Departamento das Ciências da Computação, Instituto de Ciências Exatas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil.
2. Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil. karinabgb@gmail.com.
3. Departamento de Análises Clínicas e Toxicológicas, Faculdade de Farmácia, Universidade Federal de Minas Gerais, Av. Antônio Carlos, 6627, Pampulha, Belo Horizonte, MG, 31270-901, Brasil. karinabgb@gmail.com.
4. Colégio Técnico, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil.
5. Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil.
6. Departamento de Análises Clínicas e Toxicológicas, Faculdade de Farmácia, Universidade Federal de Minas Gerais, Av. Antônio Carlos, 6627, Pampulha, Belo Horizonte, MG, 31270-901, Brasil.
7. Departamento de Clínica Médica, Faculdade de Medicina, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil.
8. Departamento de Ginecologia e Obstetrícia, Faculdade de Medicina, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil.
Abstract
PURPOSE: Polycystic Ovary Syndrome (PCOS) is the most frequent endocrinopathy in women of reproductive age. Machine learning (ML) is the area of artificial intelligence that focuses on predictive computing algorithms. We aimed to identify the most relevant clinical and laboratory variables related to PCOS diagnosis and to stratify patients into distinct phenotypic groups (clusters) using ML algorithms.
METHODS: Variables from a database comparing 72 patients with PCOS and 73 healthy women were included. The BorutaShap method, followed by the Random Forest algorithm, was applied to the prediction and clustering of PCOS.
RESULTS: Among the 58 variables investigated, the algorithm selected the following as associated with PCOS diagnosis, in decreasing order of importance: lipid accumulation product (LAP); abdominal circumference; thrombin-activatable fibrinolysis inhibitor (TAFI) levels; body mass index (BMI); C-reactive protein (CRP), high-density lipoprotein cholesterol (HDL-c), follicle-stimulating hormone (FSH) and insulin levels; HOMA-IR value; age; prolactin, 17-OH progesterone and triglyceride levels; and family history of diabetes mellitus in a first-degree relative. Using these variables in combination, the algorithm achieved an accuracy of 86% and an area under the ROC curve of 97%. Next, PCOS patients were grouped into two clusters. Patients in the first cluster had higher BMI, abdominal circumference, LAP and HOMA-IR index, as well as higher CRP and insulin levels, than those in the second cluster.
CONCLUSION: The developed algorithm could be applied to select the most important clinical and biochemical variables related to PCOS and to classify patients into phenotypically distinct clusters. These results could guide more personalized and effective approaches to the treatment of PCOS.
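The sketch below illustrates two of the derived variables the algorithm ranked highly (LAP and HOMA-IR) and the two performance metrics reported (accuracy and area under the ROC curve). It is a minimal sketch built on the standard published formulas, not the authors' BorutaShap/Random Forest pipeline. It assumes the female form of the LAP formula (waist circumference minus 58 cm, times triglycerides in mmol/L), fasting insulin in µU/mL and fasting glucose in mmol/L for HOMA-IR. All function names are hypothetical, and the numbers in the usage lines are toy values, not study data.

```python
"""Illustrative helpers for quantities named in the abstract.

All function names are hypothetical; the values used below are toy inputs.
"""


def lipid_accumulation_product(waist_cm: float, tg_mmol_l: float) -> float:
    """LAP for women (assumed Kahn formula): (waist circumference in cm - 58) x triglycerides in mmol/L."""
    return (waist_cm - 58.0) * tg_mmol_l


def homa_ir(fasting_insulin_uu_ml: float, fasting_glucose_mmol_l: float) -> float:
    """HOMA-IR = fasting insulin (uU/mL) x fasting glucose (mmol/L) / 22.5."""
    return fasting_insulin_uu_ml * fasting_glucose_mmol_l / 22.5


def accuracy(labels, predictions) -> float:
    """Fraction of predicted classes that match the true classes."""
    if len(labels) != len(predictions) or not labels:
        raise ValueError("labels and predictions must be non-empty and equal in length")
    return sum(int(y == p) for y, p in zip(labels, predictions)) / len(labels)


def roc_auc(labels, scores) -> float:
    """AUC as the probability that a random positive case outscores a random negative one.

    Ties count as half a win (Mann-Whitney formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(lipid_accumulation_product(90.0, 1.5))  # 48.0
    print(f"{homa_ir(10.0, 5.0):.2f}")  # 2.22
    y_true = [1, 1, 0, 0]                # 1 = PCOS, 0 = control (toy labels)
    y_score = [0.9, 0.4, 0.6, 0.1]       # toy classifier probabilities
    y_pred = [int(s >= 0.5) for s in y_score]
    print(accuracy(y_true, y_pred))      # 0.5
    print(roc_auc(y_true, y_score))      # 0.75
```

The toy example shows why the two reported metrics can differ. Accuracy depends on the 0.5 decision threshold. AUC measures only how well the scores rank PCOS cases above controls, so a classifier can have a high AUC but a lower accuracy.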