Barry Hardy, Nicki Douglas, Christoph Helma, Micha Rautenberg, Nina Jeliazkova, Vedrin Jeliazkov, Ivelina Nikolova, Romualdo Benigni, Olga Tcheremenskaia, Stefan Kramer, Tobias Girschick, Fabian Buchwald, Joerg Wicker, Andreas Karwath, Martin Gütlein, Andreas Maunz, Haralambos Sarimveis, Georgia Melagraki, Antreas Afantitis, Pantelis Sopasakis, David Gallagher, Vladimir Poroikov, Dmitry Filimonov, Alexey Zakharov, Alexey Lagunin, Tatyana Gloriozova, Sergey Novikov, Natalia Skvortsova, Dmitry Druzhilovsky, Sunil Chawla, Indira Ghosh, Surajit Ray, Hitesh Patel, Sylvia Escher.
Abstract
OpenTox provides an interoperable, standards-based Framework for the support of predictive toxicology data management, algorithms, modelling, validation and reporting. It is relevant to satisfying the chemical safety assessment requirements of the REACH legislation as it supports access to experimental data, (Quantitative) Structure-Activity Relationship models, and toxicological information through an integrating platform that adheres to regulatory requirements and OECD validation principles. Initial research defined the essential components of the Framework including the approach to data access, schema and management, use of controlled vocabularies and ontologies, architecture, web service and communications protocols, and selection and integration of algorithms for predictive modelling. OpenTox provides end-user oriented tools to non-computational specialists, risk assessors, and toxicological experts in addition to Application Programming Interfaces (APIs) for developers of new applications. OpenTox actively supports public standards for data representation, interfaces, vocabularies and ontologies, Open Source approaches to core platform components, and community-based collaboration approaches, so as to progress system interoperability goals.
The OpenTox Framework includes APIs and services for compounds, datasets, features, algorithms, models, ontologies, tasks, validation, and reporting which may be combined into multiple applications satisfying a variety of different user needs. OpenTox applications are based on a set of distributed, interoperable OpenTox API-compliant REST web services. The OpenTox approach to ontology allows for efficient mapping of complementary data coming from different datasets into a unifying structure having a shared terminology and representation.
Two initial OpenTox applications are presented as an illustration of the potential impact of OpenTox for high-quality and consistent structure-activity relationship modelling of REACH-relevant endpoints: ToxPredict which predicts and reports on toxicities for endpoints for an input chemical structure, and ToxCreate which builds and validates a predictive toxicity model based on an input toxicology dataset. Because of the extensible nature of the standardised Framework design, barriers of interoperability between applications and content are removed, as the user may combine data, models and validation from multiple sources in a dependable and time-effective way.
Year: 2010 PMID: 20807436 PMCID: PMC2941473 DOI: 10.1186/1758-2946-2-7
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Overall Use Case process template for predicting an endpoint for a chemical structure
| Activity Name: | Overall Use Case - Given a chemical structure, predict endpoints. |
|---|---|
| User needs toxicity prediction for one compound and initiates service request. | |
| Assume user has at least basic toxicity and chemistry knowledge but is not an expert QSAR user. | |
| 2D Chemical Structure, toxicity endpoint(s). | |
| Computer interface for user entry of structure, selection of endpoints and return of results. OpenTox Data Resources, Prediction Model Building and Report Generation. | |
| Incorrect chemical structure. Endpoint unavailable. Unable to predict endpoint. | |
| In case of exception events direct user to further consulting and advice services. | |
| Report on endpoint predictions. | |
| Suggestion of further Use Cases when applicable. | |
| OpenTox API, Data Resources, Prediction Model Building, Validation and Report Generation. | |
Figure 1. Workflow for Use Case for predicting an endpoint for a chemical structure.
Figure 2. Relationships between OpenTox Resources modelled in the OpenTox Ontology.
Figure 3. OpenTox Algorithm Type Ontology.
Measures for evaluating the Quality of OpenTox Models
| Measures for Classification Tasks | |
|---|---|
| Confusion Matrix | A confusion matrix is a matrix, where each row of the matrix represents the instances in a predicted class, while each column represents the instances in an actual class. One benefit of a confusion matrix is that it is easy to see if the system is confusing two or more classes. |
| Absolute number and percentage of unpredicted compounds | Some compounds might fall outside the applicability domain of the algorithm or model. These numbers provide an overview on the applicability domain fit for the compound set requiring prediction. |
| Precision, recall, and F2-measure | These three measures give an overview on how pure and how sensitive the model is. The F2-measure combines the other two measures. |
| ROC curve plot and AUC | A receiver operating characteristic (ROC) curve is a graphical plot of the true-positive rate against the false-positive rate as its discrimination threshold is varied. This gives a good understanding of how well a model is performing. As a summarisation performance scalar metric, the area under curve (AUC) is calculated from the ROC curve. A perfect model would have area 1.0, while a random one would have area 0.5. |
| Measures for Regression Tasks | |
| MSE and RMSE | The mean square error (MSE) and root mean squared error (RMSE) of a regression model are popular ways to quantify the difference between the predictor and the true value. |
| R² | The explained variance (R²) provides a measure of how well future outcomes are likely to be predicted by the model. It compares the explained variance (variance of the model's predictions) with the total variance (of the data). |
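The measures in the table above can be illustrated with a minimal, dependency-free Python sketch. This is not the OpenTox validation service implementation: all function names and the toy data are hypothetical, the confusion matrix follows the table's convention (rows are predicted classes, columns are actual classes), AUC is computed through its rank interpretation rather than by integrating a plotted curve, and R² uses the common coefficient-of-determination form, which may differ from what a given OpenTox service reports.

```python
from math import sqrt

def confusion_matrix(actual, predicted, labels):
    """Rows index the predicted class, columns the actual class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[p]][index[a]] += 1
    return matrix

def unpredicted(predictions):
    """Count and percentage of compounds with no prediction (None),
    e.g. because they fall outside the applicability domain."""
    count = sum(1 for p in predictions if p is None)
    return count, 100.0 * count / len(predictions)

def precision_recall_fbeta(actual, predicted, positive, beta=2.0):
    """Precision, recall and F-beta (beta=2 gives the F2-measure)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if p == positive and a == positive)
    fp = sum(1 for a, p in pairs if p == positive and a != positive)
    fn = sum(1 for a, p in pairs if p != positive and a == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta ** 2 * precision + recall
    fbeta = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, fbeta

def auc(actual, scores, positive):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive compound receives the higher score; ties count one half."""
    pos = [s for a, s in zip(actual, scores) if a == positive]
    neg = [s for a, s in zip(actual, scores) if a != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mse(actual, predicted):
    """Mean squared error of a regression model."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return sqrt(mse(actual, predicted))

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (an assumed form)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Toy data, invented purely for illustration.
actual = ["active", "active", "inactive", "inactive", "active", "inactive"]
predicted = ["active", "inactive", "inactive", "active", "active", "inactive"]
scores = [0.9, 0.4, 0.2, 0.6, 0.8, 0.1]  # model confidence for "active"

print(confusion_matrix(actual, predicted, ["active", "inactive"]))
print(precision_recall_fbeta(actual, predicted, "active"))
print(auc(actual, scores, "active"))

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]
print(mse(y_true, y_pred), rmse(y_true, y_pred), r_squared(y_true, y_pred))
```

Keeping each measure as a separate small function mirrors the table's one-row-per-measure layout; in practice an established statistics library would normally be preferred over hand-written metrics.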
Figure 4. Workflow diagram illustrating the training test set validation of a prediction algorithm.
Summary of Different Types of OpenTox Reports
| Standard reports | |
|---|---|
| Prediction of a single (unseen) compound | Activity, applicability domain, confidence |
| Prediction of multiple (unseen) compounds | Ranking according to activity/confidence |
| Validation of a model | Different performance criteria (on various datasets), based on cross-validation/external test set validation |
| Making predictions on a particular dataset | Prediction results of various algorithms |
| Comparison of different models/algorithms | Ranking according to different performance criteria |
| Evaluation of a feature generation algorithm | Performance of various algorithms using the generated features compared to other features |
| Evaluation of a feature selection algorithm | Performance of various algorithms using the selected features compared to no feature selection |
Figure 5. Display of results from Step 5 of ToxPredict Application.
Figure 6. ToxPredict Step 1 - Enter Compound, Interaction of OpenTox Services.
Figure 7. ToxPredict Step 2 - Structure Selection, Interaction of OpenTox Services.
Figure 8. ToxPredict Step 3 - Model Selection, Interaction of OpenTox Services: User-System Interaction.
Figure 9. ToxPredict Step 3 - Behind the scenes: previously, algorithm, model and feature services had registered a list of algorithms, models and features into the Ontology service, by POSTing the URIs of these objects.
Figure 10. ToxPredict Step 4 - Model Estimation, Interaction of OpenTox Services: User-System Interaction.
Figure 11. ToxPredict Step 4 - Model Estimation, Interaction of OpenTox Services: Behind the scenes.
Figure 12. ToxPredict Step 5 - Display Results, Interaction of OpenTox Services.