Wojciech Lesiński1, Krzysztof Mnich2, Agnieszka Kitlas Golińska3, Witold R Rudnicki3,2. 1. Institute of Computer Science, University of Białystok, Ciołkowskiego 1M, Białystok, Poland. w.lesinski@uwb.edu.pl. 2. Computational Center, University of Białystok, Ciołkowskiego 1M, Białystok, Poland. 3. Institute of Computer Science, University of Białystok, Ciołkowskiego 1M, Białystok, Poland.
Abstract
MOTIVATION: Drug-induced liver injury (DILI) is one of the primary problems in drug development. Early prediction of DILI can significantly reduce the cost of clinical trials. In this work we examined whether the occurrence of DILI can be predicted from gene expression profiles in cancer cell lines and the chemical properties of drugs.
METHODS: We used gene expression profiles from 13 human cell lines, as well as molecular properties of drugs, to build machine learning models of DILI. To this end, we used a robust cross-validated protocol based on feature selection and the Random Forest algorithm. In this protocol we first identify the most informative variables and then use them to build predictive models. The models are first built separately from the data for single cell lines and from chemical properties. They are then integrated using the Super Learner method with several underlying methods for integration. The entire modelling process is performed within nested cross-validation.
RESULTS: We obtained weakly predictive models when using either molecular descriptors or some individual cell lines (AUC in the range 0.55-0.61). Models obtained with the Super Learner approach are significantly more accurate (AUC = 0.73), which allows substances to be divided into two categories: low-risk and high-risk.
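The protocol described in METHODS (one base model per data block, integration of their predictions, everything inside nested cross-validation) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the base learner is a toy class-mean score standing in for feature selection plus Random Forest, the integration step is a simple AUC-weighted average standing in for the Super Learner's combiners, the data are synthetic, and all names (`fit_base`, `super_learner_cv`, `cell_line_A`, and so on) are hypothetical.

```python
import random
from statistics import mean, pstdev

def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.5  # undefined without both classes; report chance level
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def make_folds(indices, k, rng):
    """Shuffle the given indices and split them into k folds."""
    idx = list(indices)
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_base(X, y, train):
    """Toy base learner (ASSUMPTION: stand-in for feature selection + Random
    Forest): linear score along the difference of class means, unit spread."""
    pos = [X[i] for i in train if y[i] == 1]
    neg = [X[i] for i in train if y[i] == 0]
    if not pos or not neg:
        return lambda row: 0.0
    d = len(X[0])
    w = [mean(r[j] for r in pos) - mean(r[j] for r in neg) for j in range(d)]
    raw = lambda row: sum(wj * xj for wj, xj in zip(w, row))
    scale = pstdev([raw(X[i]) for i in train]) or 1.0
    return lambda row: raw(row) / scale

def super_learner_cv(blocks, y, k_outer=3, k_inner=3, seed=0):
    """Nested CV: inner folds fit the combiner weights, outer folds score it."""
    rng = random.Random(seed)
    n = len(y)
    outer_scores = [0.0] * n
    for test in make_folds(range(n), k_outer, rng):
        test_set = set(test)
        train = [i for i in range(n) if i not in test_set]
        # Inner loop: out-of-fold base predictions, training samples only.
        oof = {name: {} for name in blocks}
        for held in make_folds(train, k_inner, rng):
            held_set = set(held)
            inner_train = [i for i in train if i not in held_set]
            for name, X in blocks.items():
                model = fit_base(X, y, inner_train)
                for i in held:
                    oof[name][i] = model(X[i])
        # Integration step (ASSUMPTION: AUC-weighted average as the combiner).
        y_train = [y[i] for i in train]
        weights = {name: max(0.0, auc([oof[name][i] for i in train], y_train) - 0.5)
                   for name in blocks}
        if sum(weights.values()) == 0:
            weights = {name: 1.0 for name in blocks}
        total = sum(weights.values())
        # Refit base models on the full outer-training set, score the test fold.
        models = {name: fit_base(X, y, train) for name, X in blocks.items()}
        for i in test:
            outer_scores[i] = sum(weights[name] * models[name](blocks[name][i])
                                  for name in blocks) / total
    return auc(outer_scores, y)

def synthetic_block(y, n_features, signal, rng):
    """Synthetic data block: only feature 0 carries class signal."""
    return [[rng.gauss(signal * label if j == 0 else 0.0, 1.0)
             for j in range(n_features)] for label in y]

rng = random.Random(1)
y = [i % 2 for i in range(60)]
blocks = {
    "cell_line_A": synthetic_block(y, 5, 0.8, rng),
    "cell_line_B": synthetic_block(y, 5, 0.5, rng),
    "descriptors": synthetic_block(y, 5, 0.6, rng),
}
print("nested-CV AUC of the combined model:", round(super_learner_cv(blocks, y), 3))
```

The point of the nesting is that the combiner weights are estimated only from inner out-of-fold predictions on the outer-training samples, so the outer test fold never influences either the base models or their integration.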
Keywords:
Data integration; Machine learning; Random forest