The AOP-DB RDF: Applying FAIR Principles to the Semantic Integration of AOP Data Using the Research Description Framework.

Holly M Mortensen1, Marvin Martens2, Jonathan Senn3, Trevor Levey3,4, Chris T Evelo2,5, Egon L Willighagen2, Thomas Exner6.   

Abstract

Computational toxicology is central to the current transformation occurring in toxicology and chemical risk assessment. There is a need for more efficient use of existing data to characterize human toxicological response data for environmental chemicals in the US and Europe. The Adverse Outcome Pathway (AOP) framework helps to organize existing mechanistic information and contributes to what is currently being described as New Approach Methodologies (NAMs). AOP knowledge and data are currently submitted directly by users and stored in the AOP-Wiki (https://aopwiki.org/). Automatic and systematic parsing of AOP-Wiki data is challenging, so we have created the EPA Adverse Outcome Pathway Database. The AOP-DB, developed by the US EPA to assist in the biological and mechanistic characterization of AOP data, provides a broad, systems-level overview of the biological context of AOPs. Here we describe the recent semantic mapping efforts for the AOP-DB, and how this process facilitates the integration of AOP-DB data with other toxicologically relevant datasets through a use case example.
Copyright © 2022 Mortensen, Martens, Senn, Levey, Evelo, Willighagen and Exner.

Keywords:  adverse outcome pathway; disease; ontological mapping; pathway; semantic web; toxcast assays

Year:  2022        PMID: 35295213      PMCID: PMC8915825          DOI: 10.3389/ftox.2022.803983

Source DB:  PubMed          Journal:  Front Toxicol        ISSN: 2673-3080


Introduction

There is a need for more efficient use of existing data through improved data integration and compatibility of data structures to characterize human toxicological response data for environmental chemicals. Assessors in the US are moving towards the use of existing mechanistic data (in vitro and in silico) that provide insights into adverse outcomes in humans (National Research Council (NRC), 2007; National Research Council (NRC), 2009; National Research Council (NRC), 2010; National Research Council (NRC), 2017), and towards reduced animal testing (Wheeler, 2019). The Adverse Outcome Pathway (AOP) framework helps to organize existing mechanistic information and contributes to what is currently being described as New Approach Methodologies (NAMs) (Thomas et al., 2019). The US EPA Adverse Outcome Pathway-Database (AOP-DB) is a decision support tool for risk assessors, developed by the EPA’s Center for Public Health and Environmental Assessment, which contributes to NAMs (e.g., computational toxicology tools) used for the Toxic Substances Control Act (Public Law 114–182, 2016). The AOP-DB has been made available through the Office of Science Management as a public EPA database since November 2021. Pertinent AOP-DB data is currently integrated with the CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard/chemical_lists/AOPSTRESSORS), which maps the Distributed Structure-Searchable Toxicity records to the most current list of AOP-DB stressors. The AOP-DB integrates AOP content to help users characterize AOPs from the OECD-funded AOP-KB (https://aopkb.oecd.org/index.html) effort, where the AOP-Wiki (https://aopwiki.org/) is the primary repository for direct user submission of AOP information to the AOP-KB.
Because the AOP-Wiki data is challenging to parse in its current format (Ives et al., 2017; Martens et al., 2018), the AOP-DB was developed to assist in automating and organizing AOP data, as well as integrating with publicly available datasets to allow biological and mechanistic characterization of AOPs and provide a systems-level overview of the biological context of AOPs (Mortensen et al., 2018; Pittman et al., 2018). Recent updates to AOP-DB in version 2 (Mortensen, 2021; Mortensen et al., 2021) include 280 AOPs (1,111 KEs) from the AOP-Wiki XML. The semantic mapping of AOP-DB data, described herein, extends AOP capabilities to users through the incorporation of the Resource Description Framework (RDF), which creates additional ontological linkages and improves capabilities for computational analyses (Figure 1). These tools are useful to AOP users trying to retrieve information for AOP development or to understand and characterize existing AOPs. Here we describe the recent semantic mapping efforts for the AOP-DB, and how this process integrates AOP-DB data with other toxicologically relevant datasets.
FIGURE 1

The OECD-funded AOP-KB currently supports the AOP-Wiki. The EPA AOP-DB, currently slated as a third-party tool for integration with the AOP-KB 2.0, automatically and programmatically pulls AOP data from the AOP-KB XML, and extends AOP capabilities to users with semantic resources like WikiPathways and the OpenRiskNet e-infrastructure that incorporate the Resource Description Framework (RDF). Integration of data across the AOP-KB (AOP-Wiki), AOP-DB, and expanding research frameworks through WikiPathways and the EU-funded OpenRiskNet creates additional ontological linkages and improves capabilities for computational analyses. These tools are useful to AOP users trying to retrieve information for AOP development, as well as those trying to understand and characterize existing AOPs.

As part of OpenRiskNet, a 3-year project supported by the European Commission within the Horizon2020 EINFRA-22-2016 Programme, the US EPA AOP-DB was selected as an Implementation Challenge winner. The Implementation Challenge was created to select external risk assessment tools to be prioritized for integration into the OpenRiskNet e-Infrastructure (https://openrisknet.org/) and to foster collaborative interaction between project partners. In contribution to this effort, US EPA and Maastricht University project partners have completed the semantic mapping of several AOP-DB data tables into RDF, which is a standard model for data interchange (W3C, 2014). RDF defines relationships between data objects as triples, statements made up of three positional elements (subject, predicate, and object), which are stored in triplestores. The mapping of AOP-DB data to the RDF data model stores relevant AOP information in a computer-readable format, and contributes to the identification, disambiguation, and meaningful linkage of AOP data with other data structures, following FAIR (findable, accessible, interoperable, and reusable) principles (Wilkinson et al., 2019a; Wilkinson et al., 2019b).
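As an illustration of the subject-predicate-object model, the following minimal Python sketch represents two statements and serializes them in the N-Triples syntax. The `Triple` class, the `to_ntriples` helper, and the example gene statements are hypothetical and chosen only for illustration; they are not taken from the AOP-DB RDF itself.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement. Subject and predicate are IRIs; the object is an IRI or a literal."""
    subject: str
    predicate: str
    obj: str
    obj_is_literal: bool = False


def to_ntriples(t: Triple) -> str:
    """Serialize a triple as a single N-Triples line."""
    if t.obj_is_literal:
        # Escape backslashes and double quotes inside plain string literals.
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


# Illustrative statements about one gene (identifiers are examples only).
gene = "https://identifiers.org/ncbigene/1576"
triples = [
    Triple(gene, "http://www.w3.org/2000/01/rdf-schema#label", "CYP3A4", obj_is_literal=True),
    Triple(gene, "http://www.w3.org/2004/02/skos/core#exactMatch",
           "https://identifiers.org/uniprot/P08684"),
]

for t in triples:
    print(to_ntriples(t))
```

The first statement has a literal object (a label), the second an IRI object that links the gene to a matching identifier in another database, which is the kind of linkage that makes the data interoperable.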

Materials and Methods

We selected seven AOP-DB data tables for semantic integration, specifically the Gene Interaction, Biological Pathway, ToxCast Assay, Taxonomy, Chemical-Gene, Gene Info, and Key Event tables. In developing the AOP-DB RDF, we used the most recent version of the SQL AOP-DB (Mortensen, 2020) to map each table of interest into RDF triples. Each table was filtered using R version 3.6 and RStudio version 1.2.83 (R Core Team, 2020) to include only records involving a molecular initiating event (MIE) or key event (KE) that maps to a molecular identifier (e.g., gene, protein, cytokine). Code was developed to take each record as input, modify and filter the AOP-DB table data, and output each modified record as an RDF triple. Additionally, subjects were created for Ensembl and UniProt identifiers. BioPortal (Whetzel et al., 2011) was used to find the most appropriate ontology term for each entity, in line with the AOP-Wiki RDF (Martens et al., 2021a) for optimal interoperability between the two resources. Terms with the most accurate description were selected from ontologies that are relevant to the context of the field. For the development of the AOP-DB RDF, several ontologies and consistent vocabularies have been included. Furthermore, publicly available datasets included in the AOP-DB for RDF mapping are described in detail in Mortensen . Table 1 provides an overview of the included ontologies and database links, including their prefix in the RDF and their corresponding Internationalized Resource Identifier (IRI).
TABLE 1

Overview of ontologies, consistent vocabularies and databases included in the AOP-DB RDF.

Name | Prefix in RDF | IRI

Ontologies and Vocabularies
AOP Ontology (Burgoon, 2017) | aopo | http://aopkb.org/aop_ontology#
BioAssay Ontology (Abeyruwan et al., 2014) | bao | http://www.bioassayontology.org/bao#
Chemical Information Ontology (Hastings et al., 2011) | cheminf | http://semanticscience.org/resource/CHEMINF_
Dublin Core | dc | http://purl.org/dc/elements/1.1
EDAM Ontology (Ison et al., 2013) | edam | http://edamontology.org
Friend Of A Friend | foaf | http://xmlns.com/foaf/0.1
Logical Observation Identifier Names and Codes (McDonald et al., 2003) | loinc | http://purl.bioontology.org/ontology/LNC
Molecular Interactions (Millán, 2020) | mi | http://purl.obolibrary.org/obo/MI_
Measurement Method Ontology (Smith et al., 2013) | mmo | http://purl.obolibrary.org/obo/MMO_
NCBI Taxonomy (Bodenreider, 2004) | ncbitaxon | http://purl.bioontology.org/ontology/NCBITAXON
Pathway Ontology (Petri et al., 2014) | pw | http://purl.obolibrary.org/obo/PW_
RDF Schema | rdfs | http://www.w3.org/2000/01/rdf-schema#
Semanticscience Integrated Ontology (Dumontier et al., 2014) | sio | http://semanticscience.org/resource
Simple Knowledge Organization System | skos | http://www.w3.org/2004/02/skos/core#
Uber Anatomy Ontology (Mungall et al., 2012) | uberon | http://purl.obolibrary.org/obo/UBERON_

Databases
AOP-Wiki | aop.events | http://identifiers.org/aop.events
CompTox Dashboard (Williams et al., 2017) | assay | https://comptox.epa.gov/dashboard/assay_endpoints
CAS Common Chemistry | cas | https://identifiers.org/cas
Ensembl (Yates et al., 2020) | ensembl | http://identifiers.org/ensembl
HUGO Gene Nomenclature Committee (Braschi et al., 2019) | hgnc | https://identifiers.org/hgnc
NCBI Gene | ncbigene | https://identifiers.org/ncbigene
UniProt (UniProt Consortium, 2019) | uniprot | https://identifiers.org/uniprot
KEGG Pathways (Kanehisa et al., 2021) | kegg.pathway | https://identifiers.org/kegg.pathway
PharmGKB Pathways (Whirl-Carrillo et al., 2021) | pharmgkb.pathways | https://identifiers.org/pharmgkb.pathways
Small Molecule Pathway Database (Jewison et al., 2014) | smpdb | https://identifiers.org/smpdb
BioCyc (Karp et al., 2019) | biocyc | https://identifiers.org/biocyc
BioCarta Pathways | biocarta.pathway | https://identifiers.org/biocarta.pathway
Reactome (Jassal et al., 2020) | reactome | https://identifiers.org/reactome
NCI Pathway Interaction Database (Schaefer et al., 2009) | pid.pathway | https://identifiers.org/pid.pathway
NetPath (Kandasamy et al., 2010) | netpath | http://netpath.org/pathways?path_id=
WikiPathways (Martens et al., 2021b) | wikipathways | https://identifiers.org/wikipathways
AOP-DB Chemical-Gene association | chemicalgeneassociation | http://example.org/ChemicalGeneAssociation
AOP-DB Protein Interaction | proteinInteraction | http://example/proteinInteraction
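The filter-and-map step described in the Methods can be sketched as follows. The original work used R; this Python version is only an illustration under stated assumptions: the column names (`ke_id`, `entrez`), the example rows, and the trailing `/` appended to some Table 1 IRIs are hypothetical, and the choice of `edam:data_1027` as the predicate follows the schema description in the Results.

```python
# Subset of the Table 1 prefixes. A trailing "/" is assumed here where Table 1
# lists the IRI without one, so that prefix + local identifier forms a full IRI.
PREFIXES = {
    "aop.events": "http://identifiers.org/aop.events/",
    "edam": "http://edamontology.org/",
    "ncbigene": "https://identifiers.org/ncbigene/",
}


def expand(curie: str) -> str:
    """Expand a compact identifier such as 'edam:data_1027' into a full IRI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def rows_to_triples(rows):
    """Keep rows whose key event maps to a gene and yield one triple per row."""
    for row in rows:
        if not row.get("entrez"):  # filter: no molecular identifier, skip the record
            continue
        yield (
            expand(f"aop.events:{row['ke_id']}"),
            expand("edam:data_1027"),
            expand(f"ncbigene:{row['entrez']}"),
        )


# Hypothetical Key Event table rows; the second has no gene and is filtered out.
rows = [
    {"ke_id": 18, "entrez": 196},
    {"ke_id": 99, "entrez": None},
]

for subject, predicate, obj in rows_to_triples(rows):
    print(f"<{subject}> <{predicate}> <{obj}> .")
```

Keeping the prefix map in one place mirrors the role of Table 1: every identifier in the output can be traced back to a declared ontology or database namespace.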

Testing the AOP-DB RDF

The AOP-DB SPARQL endpoint was tested by executing SPARQL queries from a Jupyter notebook (Jupyterlab version 3.2.5, Python version 3.8.5) using the SPARQLWrapper Python library (version 1.8.5). SPARQL queries were used to extract statistics of the data, and a federated SPARQL query was constructed to explore the integrative capabilities of the AOP-DB RDF. The Jupyter notebook, SPARQL queries for extracting data counts, and instructions for setting up the AOP-DB SPARQL endpoint are available at https://github.com/BiGCAT-UM/AOP-DB-RDF.
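A query of this kind can be sketched with the Python standard library alone. This is not the notebook code from the repository: it swaps SPARQLWrapper for a plain SPARQL 1.1 Protocol GET request via `urllib`, and the helper names `build_request` and `run_query` are hypothetical. The counting query is deliberately generic so that it assumes nothing about the AOP-DB RDF schema.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL as given in the Results section of this article.
ENDPOINT = "https://aopdb.rdf.bigcat-bioinformatics.org/sparql"

# Generic statistics query: total number of triples in the store.
COUNT_TRIPLES = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> list:
    """Send the query and return the result bindings (requires network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        return json.load(resp)["results"]["bindings"]
```

Calling `run_query(ENDPOINT, COUNT_TRIPLES)` returns a list of bindings in the SPARQL JSON results format, where each variable maps to a dictionary with a `value` key. Whether the endpoint is reachable and how it responds depends on the current deployment.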

Results

The AOP-DB Semantic Mapping

The AOP-DB RDF schema, developed according to the methods described above, has the primary and secondary table structure illustrated in Figure 2. The AOP-DB RDF extends the AOP-Wiki RDF with the inclusion of gene/protein, chemical, ToxCast, biological pathway, and taxonomy information. In total, the RDF contains 157 KEs, 376 NCBI genes linked to KEs, 93,449 Chemical-Gene Interactions (3,982 unique chemicals and 122 unique genes), 763,446 Protein-Protein Interactions, 1,143 ToxCast Assays, 110,833 Biological Pathways from 10 sources, and 22 taxonomies. Also, the NCBI Gene IDs were matched to 299 Ensembl IDs and 1,026 UniProt IDs. The AOP-DB RDF data tables associate the gene and protein information of AOP genes with chemical, pathway, and assay information organized within the AOP-DB (Mortensen, 2020; Mortensen, 2021).
FIGURE 2

AOP-DB semantic mapping, illustrating the predicates and objects of the nine core subject types in the AOP-DB RDF (in blue). Vertical columns show subjects, and the middle and right columns indicate predicates and objects, respectively. Where applicable, the type of entry is indicated (literal or IRI). Yellow objects with an asterisk (*) indicate the connection between their subjects and the subjects of other tables. The interaction with the AOP-Wiki RDF is highlighted at the Key Events and Adverse Outcome Pathways (in green). Forward slashes indicate the inclusion of multiple objects as part of the subject-predicate-object triple.

The Key Event subjects are linked to NCBI Genes through the ‘data_1027’ term of the EDAM ontology; the genes are in turn linked to pathways and assays through the terms ‘pw:0000001’ from the Pathway Ontology and ‘mmo:0000441’ from the Measurement Method Ontology, respectively. Furthermore, matching identifiers were linked with ‘skos:exactMatch’, providing IRIs of Ensembl IDs, HGNC Symbols, and UniProt IDs. Conversely, Chemical-Gene interactions, Protein-Protein interactions, ToxCast assays, and Pathways link to NCBI Gene subjects through the term ‘data_1027’ from the EDAM ontology. Finally, taxonomy is referenced by ToxCast assay and pathway subjects through the term ‘ncbitaxon:131567’, indicating cellular organism.
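The linkage pattern described above (Key Event to gene, then gene to pathways and assays) can be illustrated with a small in-memory graph. This is a sketch only: the three predicates are the ontology terms named in the text, written as compact identifiers without thousands separators, while the subjects and objects in `GRAPH` and both helper functions are hypothetical placeholders rather than actual AOP-DB RDF content.

```python
# Predicates named in the schema description.
EDAM_GENE_ID = "edam:data_1027"  # links a subject to an NCBI Gene
PW_PATHWAY = "pw:0000001"        # links a gene to a pathway
MMO_ASSAY = "mmo:0000441"        # links a gene to an assay

# Toy graph of (subject, predicate, object) triples with placeholder identifiers.
GRAPH = {
    ("aop.events:18", EDAM_GENE_ID, "ncbigene:196"),
    ("ncbigene:196", PW_PATHWAY, "wikipathways:WP0000"),
    ("ncbigene:196", MMO_ASSAY, "assay:0000"),
}


def objects(graph, subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def annotations_for_key_event(graph, key_event):
    """Follow Key Event -> gene -> pathways/assays and collect the results per gene."""
    return {
        gene: {
            "pathways": objects(graph, gene, PW_PATHWAY),
            "assays": objects(graph, gene, MMO_ASSAY),
        }
        for gene in objects(graph, key_event, EDAM_GENE_ID)
    }


print(annotations_for_key_event(GRAPH, "aop.events:18"))
```

In the actual resource this traversal would be expressed as a SPARQL graph pattern against the endpoint; the two-hop structure is the same.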

The AOP-DB SPARQL Endpoint

The AOP-DB RDF can be explored through the AOP-DB SPARQL endpoint (https://aopdb.rdf.bigcat-bioinformatics.org/sparql). The endpoint accepts custom SPARQL queries and returns output tables in a variety of formats, and federated SPARQL queries make it possible to combine different resources directly.

AOP-DB RDF Use Case Example

SPARQL queries can be used to query the RDF in order to answer biological and toxicological questions, such as which molecular targets (e.g. genes/proteins), chemical stressors, key events, or in vitro assays are relevant for adverse outcomes of interest. The use case examples provided herein (Supplementary 1) illustrate the utility of the AOP-DB RDF content, as well as the power of integrating these data with other diverse, external databases using federated queries. Our first use case implements the AOP-DB RDF to identify AOP-relevant molecular targets that have associated ToxCast assay targets, which has previously not been possible. The automated linkage of ToxCast assays and KEs in AOP-Wiki can serve as a prioritization tool by exploring the activation of KEs by the many chemicals that have been investigated in ToxCast. The second use case shows the integration of the AOP-DB RDF with other databases that provide access to their data through SPARQL endpoints. A single SPARQL query can be executed to extract AOP IDs, KE IDs, KE titles and protein names from the AOP-Wiki RDF, extract protein descriptions from the Protein Ontology, and the names and descriptions of pathways in WikiPathways, all based on the NCBI Gene IDs captured in the AOP-DB. Through the integration of these diverse data sources, we can effectively explore the data and build automated computational workflows to address questions of toxicological concern.
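The general shape of a federated query, in which part of the graph pattern is delegated to a second endpoint through a `SERVICE` clause, can be sketched as below. The `federated_query` helper is hypothetical, and so are the example patterns, the use of `dc:title` for Key Event titles, and the AOP-Wiki endpoint URL; the queries actually used are those in the supplementary material and the project repository.

```python
def federated_query(prefixes, local_pattern, service_iri, remote_pattern, variables):
    """Compose a SPARQL SELECT query with one SERVICE block for a remote endpoint."""
    header = "\n".join(f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items())
    return (
        f"{header}\n"
        f"SELECT DISTINCT {' '.join(variables)} WHERE {{\n"
        f"  {local_pattern}\n"
        f"  SERVICE <{service_iri}> {{\n"
        f"    {remote_pattern}\n"
        f"  }}\n"
        f"}}"
    )


# Assumed example: genes per Key Event from the local store, titles from a remote one.
query = federated_query(
    prefixes={
        "edam": "http://edamontology.org/",
        "dc": "http://purl.org/dc/elements/1.1/",
    },
    local_pattern="?ke edam:data_1027 ?gene .",
    service_iri="https://aopwiki.rdf.bigcat-bioinformatics.org/sparql",  # assumed URL
    remote_pattern="?ke dc:title ?keTitle .",
    variables=["?ke", "?keTitle", "?gene"],
)
print(query)
```

The shared variable `?ke` is what joins the two sources: the local pattern binds it from the AOP-DB RDF, and the remote endpoint resolves the same IRI to its own annotations.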

Conclusion

A central goal of computational toxicology is to predict and explain, in silico, how the human body responds after exposure to specific xenobiotics or other chemicals. This effort has been hampered by several major limiting factors, including fragmented and poorly structured data, and insufficient access to computational resources and expertise. The AOP-DB RDF and SPARQL endpoint created and discussed herein allow improved access to rigorously structured AOP data and other associated data of toxicological interest. This work improves computational organization and efficiency, through improved data integration, for toxicological and related datasets, and contributes to continued progress in computational toxicology, chemical screening, and the improvement of human health risk assessment. The AOP-DB RDF will be improved with regular data updates and continued data integration with relevant datasets. Future work includes semantic integration of AOP-DB disease-gene data, tissue-specific gene interaction networks, AOP functional single nucleotide polymorphism (SNP) and population SNP frequency information, and chemical-specific datasets.
  30 in total

1.  LOINC, a universal standard for identifying laboratory observations: a 5-year update.

Authors:  Clement J McDonald; Stanley M Huff; Jeffrey G Suico; Gilbert Hill; Dennis Leavelle; Raymond Aller; Arden Forrey; Kathy Mercer; Georges DeMoor; John Hook; Warren Williams; James Case; Pat Maloney
Journal:  Clin Chem       Date:  2003-04       Impact factor: 8.327

2.  Uberon, an integrative multi-species anatomy ontology.

Authors:  Christopher J Mungall; Carlo Torniai; Georgios V Gkoutos; Suzanna E Lewis; Melissa A Haendel
Journal:  Genome Biol       Date:  2012-01-31       Impact factor: 13.583

3.  The clinical measurement, measurement method and experimental condition ontologies: expansion, improvements and new applications.

Authors:  Jennifer R Smith; Carissa A Park; Rajni Nigam; Stanley Jf Laulederkind; G Thomas Hayman; Shur-Jen Wang; Timothy F Lowry; Victoria Petri; Jeff De Pons; Marek Tutaj; Weisong Liu; Elizabeth A Worthey; Mary Shimoyama; Melinda R Dwinell
Journal:  J Biomed Semantics       Date:  2013-10-08

4.  UniProt: a worldwide hub of protein knowledge.

Authors:  UniProt Consortium
Journal:  Nucleic Acids Res       Date:  2019-01-08       Impact factor: 16.971

5.  Creating a Structured AOP Knowledgebase via Ontology-Based Annotations.

Authors:  Cataia Ives; Ivana Campia; Rong-Lin Wang; Clemens Wittwehr; Stephen Edwards
Journal:  Appl In Vitro Toxicol       Date:  2017-12-01

6.  EDAM: an ontology of bioinformatics operations, types of data and identifiers, topics and formats.

Authors:  Jon Ison; Matús Kalas; Inge Jonassen; Dan Bolser; Mahmut Uludag; Hamish McWilliam; James Malone; Rodrigo Lopez; Steve Pettifer; Peter Rice
Journal:  Bioinformatics       Date:  2013-03-11       Impact factor: 6.937

7.  PID: the Pathway Interaction Database.

Authors:  Carl F Schaefer; Kira Anthony; Shiva Krupa; Jeffrey Buchoff; Matthew Day; Timo Hannay; Kenneth H Buetow
Journal:  Nucleic Acids Res       Date:  2008-10-02       Impact factor: 16.971

8.  The pathway ontology - updates and applications.

Authors:  Victoria Petri; Pushkala Jayaraman; Marek Tutaj; G Thomas Hayman; Jennifer R Smith; Jeff De Pons; Stanley Jf Laulederkind; Timothy F Lowry; Rajni Nigam; Shur-Jen Wang; Mary Shimoyama; Melinda R Dwinell; Diane H Munzenmaier; Elizabeth A Worthey; Howard J Jacob
Journal:  J Biomed Semantics       Date:  2014-02-05

9.  Evaluating FAIR maturity through a scalable, automated, community-governed framework.

Authors:  Mark D Wilkinson; Michel Dumontier; Susanna-Assunta Sansone; Luiz Olavo Bonino da Silva Santos; Mario Prieto; Dominique Batista; Peter McQuilton; Tobias Kuhn; Philippe Rocca-Serra; Mercѐ Crosas; Erik Schultes
Journal:  Sci Data       Date:  2019-09-20       Impact factor: 6.444

10.  KEGG: integrating viruses and cellular organisms.

Authors:  Minoru Kanehisa; Miho Furumichi; Yoko Sato; Mari Ishiguro-Watanabe; Mao Tanabe
Journal:  Nucleic Acids Res       Date:  2021-01-08       Impact factor: 16.971
