Florence Jornod1, Thomas Jaylet1, Ludek Blaha2, Denis Sarigiannis3, Luc Tamisier4, Karine Audouze1. 1. Université de Paris, T3S, Inserm UMR-S1124, 45 rue des Saints-Pères, Paris, F-75006, France. 2. RECETOX, Faculty of Science, Masaryk University, Kamenice 5, Brno, CZ62500, Czech Republic. 3. HERACLES Research Center on the Exposome and Health, Aristotle University of Thessaloniki, Center for Interdisciplinary Research and Innovation, Thessaloniki, 57001, Greece. 4. Université de Paris, SPPIN CNRS UMR 8003, 45 rue des Saints-Pères, Paris, F-75006, France.
Abstract
MOTIVATION: Adverse Outcome Pathways (AOPs) are a conceptual framework developed to support the use of alternative toxicology approaches in risk assessment. AOPs are structured linear organizations of existing knowledge illustrating causal pathways from the initial molecular perturbation triggered by various stressors, through key events (KEs) at different levels of biology, to the ultimate health or ecotoxicological adverse outcome. RESULTS: Artificial intelligence can be used to systematically explore the toxicological data available in the scientific literature. Recently, a tool called AOP-helpFinder was developed to identify associations between stressors and KEs, thus supporting the documentation of AOPs. To facilitate the use of this bioinformatics tool by the scientific and regulatory communities, a webserver was created. The proposed AOP-helpFinder webserver uses a better-performing version of the tool, which reduces the need for manual curation of the results. As an example, the server was successfully applied to explore relationships between a set of endocrine disruptors and metabolism-related events. The AOP-helpFinder webserver assists in the rapid evaluation of existing knowledge stored in the PubMed database, a global resource of scientific information, to build AOPs and Adverse Outcome Networks (AONs) supporting chemical risk assessment. AVAILABILITY AND IMPLEMENTATION: AOP-helpFinder is available at http://aop-helpfinder.u-paris-sciences.fr/index.php.
1 Introduction
Structured organization of toxicological and ecotoxicological data is now feasible using the adverse outcome pathway (AOP) framework (Ankley ). An AOP is defined as a linear sequence of biological events, starting from a molecular initiating event (MIE) triggered by stressors (pollutants, ionizing radiation, nanomaterials or climate stressors) and connected through a series of key events (KEs), occurring at various levels of biological organization, to an adverse outcome (AO). Biological events (MIE, KE and AO) are not tied to a single AOP but can be shared, allowing the construction of Adverse Outcome Networks (AONs) that better reflect the true complexity of biology. Combined with new approach methodologies (Parish ), AOPs and AONs are extremely useful for establishing integrated approaches to testing and assessment (IATA) for environmental and risk assessment, and they aid the development of novel nonanimal toxicity testing strategies (Delrue ).
With advances in technology, huge amounts of data have become available, compiled in well-structured toxicological databases (e.g. CTD, CompTox), in AOP-oriented webservers (AOP-wiki, sAOP, AOP4EUpest) and in scientific publications (Williams ). Innovative data mining tools are needed to identify sparse but complementary data. Examples include Abstract Sifter, which gives a view of the toxicological information landscape for a set of entities such as chemicals (Baker ), and ComptoxAI (https://comptox.ai/index.html). Artificial intelligence (AI) technology that uses natural language processing (NLP) is a promising way to facilitate the identification of links between relevant pieces of information that can be used to build novel AOPs (Song ) and to identify knowledge gaps and research needs (Zgheib ). Several tools use text mining (TM), an AI method that transforms unstructured text into structured text. For example, Limtox provides a biomedical search for adverse hepatobiliary reactions (Cañada ).
Recently, the AOP-helpFinder tool, based on TM and graph theory, was proposed to identify stressor-KE relationships by examining large collections of scientific abstracts, and was applied to bisphenol A substituents and pesticides (Carvaillo ; Jornod ; Rugard ).
Here, we present the AOP-helpFinder webserver, which uses an updated version of the tool to provide an easy but effective resource for identifying and compiling existing knowledge from the scientific literature. The main optimized features are (i) the choice between searching full abstracts and searching abstracts without their introductory parts, (ii) the possibility of performing a refined search using machine learning and (iii) an automatic update of the PubMed database before each search. A case study with endocrine disruptors (ED) and metabolism-related events is provided, illustrating the capacity of the tool to quickly collate an overview of the existing information.
2 Materials and methods
2.1 The AOP-helpFinder webserver
The proposed webserver is easy to use: it requires only the user's email address, which is used to access the upload page and to send a notification when the results are available for download. This simple procedure is in line with digital sobriety, which aims to reduce environmental impact by limiting computing use. Two input files are needed: one listing the stressors of interest and a second listing the biological events (i.e. MIE, KE and/or AO). Before running the tool to determine whether knowledge connecting stressors and biological events exists, the user can select two options, ‘reduced search’ and ‘refinement filter’ (see the following section and Supplementary Material), as well as the output format (date, title, PMID, etc.).
2.2 The AOP-helpFinder tool
To increase the performance of the previously developed version, several methods were tested using a set of ED and biological events related to metabolism, and two were retained (see Supplementary Material, https://github.com/jornod/aop-helpFinder): (i) ‘reduced search’: searches are performed either in the full abstracts or without considering the introductory part, which usually appears to be covered by the first 20% of an abstract. This option avoids many false positives, as the introduction often reflects a working hypothesis rather than the conclusions of the publication; and (ii) ‘refinement filter’: after the preprocessing, which uses a stemming process (Carvaillo ), the tool can refine the searches by combining the deletion of sentences containing context words with a lemmatization process. Lemmatization is a machine learning method for text normalization used in NLP that considers the context and converts a word to its meaningful base form. This option is very useful when terms share a common stem (e.g. tests, testis → test), which can lead to incorrect meanings and spelling errors. Further, an automatic daily update of the PubMed database was newly implemented using the NCBI API, so that the full body of existing knowledge is screened.
The current version of the AI tool mines the PubMed database, which is a global source of scientific literature. Nevertheless, because the developed method screens text-based knowledge, the AOP-helpFinder server could be extended to mine multiple sources (databases, literature), including studies reporting negative findings, to accelerate information gathering when data are limited and scattered across diverse sources (Carvaillo ).
An advantage of the proposed method is that it can be adapted to literature searches in general, independent of AOP development, in order to identify interconnections between the query keywords, as was successfully done to identify nonvalidated test methods for ED (Zgheib ).
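The two options described above can be illustrated with a minimal Python sketch. It is not the AOP-helpFinder implementation: all function names are hypothetical, the 20% cut is assumed here to be applied to sentences (the text does not state the unit), and the stemmer and lemma table are deliberately crude toys that only reproduce the tests/testis collision mentioned above.

```python
import re


def split_sentences(abstract: str) -> list:
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", abstract.strip()) if s]


def reduced_text(abstract: str, skip_fraction: float = 0.20) -> str:
    """Drop the leading share of sentences (assumed to be the introduction)."""
    sentences = split_sentences(abstract)
    n_skip = int(len(sentences) * skip_fraction)
    return " ".join(sentences[n_skip:])


def mentions(term: str, text: str) -> bool:
    """Case-insensitive match of the whole term."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def comentioned(stressor: str, event: str, text: str) -> bool:
    """True when both the stressor and the event occur in the text."""
    return mentions(stressor, text) and mentions(event, text)


def toy_stem(word: str) -> str:
    """Crude suffix stripping (an assumption, NOT the tool's stemmer)."""
    for suffix in ("is", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


# Toy lemma table (an assumption): a real lemmatizer uses a vocabulary and context.
TOY_LEMMAS = {"tests": "test", "testes": "testis"}


def toy_lemma(word: str) -> str:
    """Look the word up in the toy table; unknown words are returned unchanged."""
    return TOY_LEMMAS.get(word, word)


# Invented five-sentence abstract: the hypothesis sits in the first sentence.
abstract = (
    "Bisphenol S may promote obesity. "
    "Mice were exposed to bisphenol S for eight weeks. "
    "Body weight was recorded weekly. "
    "Liver lipids were quantified. "
    "Exposure increased oxidative stress in the liver."
)

full_hit = comentioned("bisphenol S", "obesity", abstract)
reduced = reduced_text(abstract)
reduced_hit = comentioned("bisphenol S", "obesity", reduced)

print("full abstract:", full_hit)      # co-mention found in the introduction
print("reduced search:", reduced_hit)  # introduction skipped, no co-mention
print(toy_stem("tests"), toy_stem("testis"))    # both collapse to the same stem
print(toy_lemma("tests"), toy_lemma("testis"))  # lemmas stay distinct
```

In this toy abstract the obesity link is stated only as a hypothesis in the first sentence, so it is reported by the full search but not by the reduced search, whereas the oxidative stress result in the final sentence is retained in both modes.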
2.3 Case study on endocrine disruptors and metabolism
The AOP-helpFinder webserver was used for a case study aiming to automatically identify existing relationships and knowledge gaps between 10 ED (Supplementary Table S1) and 294 biological events related to metabolism (Supplementary Table S2). The webserver was launched using ‘reduced search’ (omitting searches in the first 20% of the abstracts) and ‘refinement filter’. Among the 83 970 abstracts retrieved from the PubMed database as of May 10, 2021 that related to at least one ED (Supplementary Table S1), a total of 4622 were retained (comentioning an ED and an event). Among the 294 events, 108 were identified as comentioned with at least one ED (Supplementary Table S2). Figure 1 illustrates the large disparity of knowledge for the 10 selected ED in the area of metabolism (see Supplementary Fig. S1 for all results). For example, cadmium, bisphenol A and di(2-ethylhexyl) phthalate (DEHP) are well-studied chemicals, as the webserver retrieved scientific articles for almost all biological events of interest. Other chemicals (bisphenol F, bisphenol S, butyl-paraben) appear to be less studied (Supplementary Table S1), and the information was essentially identified for extensively studied biological events such as oxidative stress or obesity.
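The counts reported above (abstracts retained, events comentioned with at least one ED) come from a stressor-by-event tally. A minimal Python sketch of such a tally is shown below, assuming simple whole-term matching; the function names and the four toy abstracts are hypothetical and are not taken from the case study.

```python
import re
from collections import Counter


def mentions(term: str, text: str) -> bool:
    """Case-insensitive match of the whole term."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None


def count_comentions(abstracts, stressors, events):
    """Return (pair_counts, retained).

    pair_counts[(stressor, event)] is the number of abstracts mentioning both;
    retained is the number of abstracts with at least one such pair.
    """
    pair_counts = Counter()
    retained = 0
    for text in abstracts:
        found_stressors = [s for s in stressors if mentions(s, text)]
        found_events = [e for e in events if mentions(e, text)]
        if found_stressors and found_events:
            retained += 1
        for s in found_stressors:
            for e in found_events:
                pair_counts[(s, e)] += 1
    return pair_counts, retained


# Toy data (invented for illustration only).
abstracts = [
    "Cadmium exposure increased oxidative stress in rat liver.",
    "Bisphenol A was associated with obesity and oxidative stress.",
    "Cadmium levels were measured in soil samples.",
    "Obesity prevalence is rising worldwide.",
]
stressors = ["cadmium", "bisphenol A"]
events = ["oxidative stress", "obesity", "steatosis"]

pair_counts, retained = count_comentions(abstracts, stressors, events)
linked_events = {event for (_, event) in pair_counts}

print("retained abstracts:", retained)
print("events linked to at least one stressor:", sorted(linked_events))
```

Events that never appear in `pair_counts` (here, steatosis) correspond to potential knowledge gaps for the selected stressors.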
Fig. 1.
Example of ED comentioned in PubMed scientific abstracts with biological events related to metabolism, identified by the AOP-helpFinder webserver. Each number is the percentage of retrieved abstracts mentioning both the stressor (column) and the event (row) among all abstracts identified for that stressor (cells are colored according to the percentage for better visualization). For example, among all identified abstracts that comentioned bisphenol S (BPS) and at least one event from the list, 13% comentioned BPS (fourth column) and obesity (second row from the bottom).
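The normalization described in the caption can be sketched as follows: for each stressor, the denominator is the number of abstracts comentioning that stressor with at least one event, and the numerator is the number of those abstracts that mention the given event. The function name and the toy input are hypothetical; the values are not those of Fig. 1.

```python
from collections import Counter


def comention_percentages(events_per_abstract):
    """Map (stressor, event) to the percentage of the stressor's abstracts.

    events_per_abstract: stressor -> list of event sets, one set per abstract
    that comentions the stressor and at least one event.
    """
    percentages = {}
    for stressor, event_sets in events_per_abstract.items():
        total = len(event_sets)
        counts = Counter(e for event_set in event_sets for e in event_set)
        for event, n in counts.items():
            percentages[(stressor, event)] = 100.0 * n / total
    return percentages


# Toy input (invented): four abstracts retained for one stressor.
toy = {
    "bisphenol S": [
        {"obesity"},
        {"oxidative stress"},
        {"obesity", "oxidative stress"},
        {"oxidative stress"},
    ]
}

pct = comention_percentages(toy)
for (stressor, event), value in sorted(pct.items()):
    print(f"{stressor} / {event}: {value:.0f}%")
```

Because one abstract can mention several events, the percentages in a column need not sum to 100.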
3 Conclusion
The AOP-helpFinder webserver uses automatic AI screening to rapidly retrieve existing knowledge on links between stressors and biological events in order to build AOPs and AONs. The webserver allows highly effective searches in PubMed, as it considerably reduces the time needed to find relevant scientific articles. The comprehensive AI-based analysis of existing literature supports various needs of risk assessment, such as establishing causality between chemicals and AOs through AOPs and AONs, identifying knowledge gaps, and prioritizing and designing future experimental and epidemiological studies.
Authors: Gerald T Ankley; Richard S Bennett; Russell J Erickson; Dale J Hoff; Michael W Hornung; Rodney D Johnson; David R Mount; John W Nichols; Christine L Russom; Patricia K Schmieder; Jose A Serrano; Joseph E Tietge; Daniel L Villeneuve Journal: Environ Toxicol Chem Date: 2010-03 Impact factor: 3.742
Authors: Elias Zgheib; Min Ji Kim; Florence Jornod; Kévin Bernal; Céline Tomkiewicz; Sylvie Bortoli; Xavier Coumoul; Robert Barouki; Kelly De Jesus; Elise Grignard; Philippe Hubert; Efrosini S Katsanou; Francois Busquet; Karine Audouze Journal: Environ Int Date: 2021-04-22 Impact factor: 9.621
Authors: Antony J Williams; Christopher M Grulke; Jeff Edwards; Andrew D McEachran; Kamel Mansouri; Nancy C Baker; Grace Patlewicz; Imran Shah; John F Wambaugh; Richard S Judson; Ann M Richard Journal: J Cheminform Date: 2017-11-28 Impact factor: 5.514
Authors: Andres Cañada; Salvador Capella-Gutierrez; Obdulia Rabal; Julen Oyarzabal; Alfonso Valencia; Martin Krallinger Journal: Nucleic Acids Res Date: 2017-07-03 Impact factor: 16.971
Authors: Christopher D Kassotis; Frederick S Vom Saal; Patrick J Babin; Dominique Lagadic-Gossmann; Helene Le Mentec; Bruce Blumberg; Nicole Mohajer; Antoine Legrand; Vesna Munic Kos; Corinne Martin-Chouly; Normand Podechard; Sophie Langouët; Charbel Touma; Robert Barouki; Min Ji Kim; Karine Audouze; Mahua Choudhury; Nitya Shree; Amita Bansal; Sarah Howard; Jerrold J Heindel Journal: Biochem Pharmacol Date: 2022-04-05 Impact factor: 6.100
Authors: Karine Audouze; Elias Zgheib; Khaled Abass; Asma H Baig; Isabel Forner-Piquer; Henrik Holbech; Dries Knapen; Pim E G Leonards; Diana I Lupu; Saranya Palaniswamy; Arja Rautio; Maria Sapounidou; Olwenn V Martin Journal: Front Toxicol Date: 2021-12-21