Undersampling: case studies of flaviviral inhibitory activities.

Stephen J Barigye, José Manuel García de la Vega, Juan A Castillo-Garit.

Abstract

Imbalanced datasets, which contain many more inactive compounds than active ones, are a common challenge in ligand-based model-building workflows for drug discovery. This is particularly true for neglected tropical diseases, since efforts to identify therapeutics for these diseases are often limited. In this report, we analyze the performance of several undersampling strategies in modeling Dengue virus 2 (DENV2) inhibitory activity, as well as anti-flaviviral activity against West Nile virus (WNV) and Zika virus (ZIKV). To this end, we build datasets comprising 1218 (159 actives and 1059 inactives), 1044 (132 actives and 912 inactives) and 302 (75 actives and 227 inactives) molecules with known DENV2, WNV and ZIKV inhibitory activity profiles, respectively. We develop ensemble classifiers for these endpoints and compare the performance of the different undersampling algorithms on external sets. Data pruning algorithms are observed to outperform data selection algorithms. The best overall performance is provided by the one-sided selection algorithm, with test set balanced accuracy (BACC) values of 0.84, 0.74 and 0.77 for DENV2, WNV and ZIKV inhibitory activity, respectively. For model building, we use the recently proposed GT-STAF information indices and compare the predictivity of three molecular fragmentation approaches: connected subgraphs, substructure and alogp atom types, which show comparable performance. However, combining indices derived from these fragmentation strategies improves the predictivity of the resulting ensembles. The models could be useful for screening new molecules for possible DENV, WNV and ZIKV inhibitory activity. ADMET modelers are encouraged to adopt undersampling algorithms in their workflows when dealing with imbalanced datasets.
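The datasets above are imbalanced by roughly 6.7:1 (DENV2), 6.9:1 (WNV) and 3.0:1 (ZIKV) inactives to actives, and the abstract reports that one-sided selection gave the best balanced accuracy. The Python sketch below is a minimal, illustrative rendering of one-sided selection as it is commonly described (a single 1-nearest-neighbour condensing pass followed by Tomek-link cleaning), together with BACC computed as the mean of sensitivity and specificity. It is not the authors' implementation: the Euclidean distance, the use of 0 as the majority (inactive) label, the single random seed example, and the function names are assumptions for illustration, and the GT-STAF descriptors and ensemble classifiers are not reproduced.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(i, candidates, X):
    """Return the index in `candidates` closest to X[i], ignoring i itself."""
    return min((j for j in candidates if j != i), key=lambda j: euclidean(X[i], X[j]))


def one_sided_selection(X, y, majority=0, seed=0):
    """Hypothetical OSS sketch: returns the sorted indices of retained molecules."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label != majority]
    majority_idx = [i for i, label in enumerate(y) if label == majority]

    # Step 1: start from all minority (active) examples plus one random majority example.
    initial = set(minority_idx) | {rng.choice(majority_idx)}

    # Step 2: a single 1-NN pass using only `initial`; add to the store every
    # majority example that this 1-NN rule misclassifies.
    misclassified = {
        i for i in majority_idx
        if i not in initial and y[nearest(i, initial, X)] != y[i]
    }
    kept = sorted(initial | misclassified)

    # Step 3: Tomek-link cleaning. Two examples of opposite classes that are each
    # other's nearest neighbour form a Tomek link; drop the majority member.
    to_drop = set()
    for a in kept:
        b = nearest(a, kept, X)
        if y[a] != y[b] and nearest(b, kept, X) == a:
            to_drop.add(a if y[a] == majority else b)
    return [i for i in kept if i not in to_drop]


def balanced_accuracy(y_true, y_pred, positive=1):
    """BACC = (sensitivity + specificity) / 2."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


if __name__ == "__main__":
    # Toy 1-D "descriptors": five inactives (label 0) and one active (label 1) at 3.5.
    X = [[0.0], [1.0], [2.0], [3.0], [10.0], [3.5]]
    y = [0, 0, 0, 0, 0, 1]
    # The inactive at 3.0 (index 3) forms a Tomek link with the active and is pruned.
    print(one_sided_selection(X, y))
    print(balanced_accuracy([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))  # 0.625
```

In practice, the `OneSidedSelection` class in imbalanced-learn and `balanced_accuracy_score` in scikit-learn provide tested versions of these two pieces; the pure-Python sketch only makes the individual steps explicit.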

Keywords:  Dengue virus; Information index; Support vector machine; Undersampling; West Nile virus; Zika virus

Year:  2019        PMID: 31773464     DOI: 10.1007/s10822-019-00255-3

Source DB:  PubMed          Journal:  J Comput Aided Mol Des        ISSN: 0920-654X            Impact factor:   3.686


