
ProbOnto: ontology and knowledge base of probability distributions.

Maciej J Swat, Pierre Grenon, Sarala Wimalaratne

Abstract

MOTIVATION: Probability distributions play a central role in mathematical and statistical modelling. The encoding, annotation and exchange of such models could be greatly simplified by a resource providing a common reference for the definition of probability distributions. Although some resources exist, there is neither a suitably detailed ontology nor a database allowing programmatic access.
RESULTS: ProbOnto is an ontology-based knowledge base of probability distributions, featuring more than 80 uni- and multivariate distributions with their defining functions, characteristics, relationships and re-parameterization formulas. It can be used for model annotation and facilitates the encoding of distribution-based models, related functions and quantities.
AVAILABILITY AND IMPLEMENTATION: http://probonto.org
CONTACT: mjswat@ebi.ac.uk
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2016. Published by Oxford University Press.

Year:  2016        PMID: 27153608      PMCID: PMC5013898          DOI: 10.1093/bioinformatics/btw170

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

When encoding probabilistic uncertainties using a given parametric distribution, its name and parameters are usually sufficient to specify the intended distribution unambiguously, as in many cases such a parameter set is unique. However, in a number of cases two or more parameterizations exist, and it becomes essential to state which parameters are used in order to obtain the correct model. A well-structured, independent, standard reference would greatly facilitate the specification of a distribution and its declaration in a programming language or exchange format, as is shown by interoperability issues between tools whose distributions differ in their parameterizations (LeBauer et al.). Many resources are available online (Dinov et al.; Williams et al.), as tools (Marichev and Trott, 2013) and in printed format (Forbes et al.; Johnson et al.; Leemis and McQueston, 2008), but a comprehensive ontology which formalizes the theory of kinds and relations of probability distributions does not yet exist. Some BioPortal (Noy et al.) ontologies provide a simple classification and little information; in most cases parameters, defining functions or quantities are not defined, and information about the relationships between distributions is not available. Because of their simplicity, such ontologies are insufficient for our purposes, and the same holds for existing databases of distributions.
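The ambiguity described above can be made concrete with a small sketch. The normal distribution is used here purely as an illustration (it is not an example taken from the text), and the function names are hypothetical. The same two numbers yield three different densities depending on whether the second one is read as a standard deviation, a variance or a precision.

```python
import math


def normal_pdf_sd(x, mean, stdev):
    """Normal density parameterized by mean and standard deviation."""
    z = (x - mean) / stdev
    return math.exp(-0.5 * z * z) / (stdev * math.sqrt(2.0 * math.pi))


def normal_pdf_var(x, mean, var):
    """Same density, second parameter read as a variance."""
    return normal_pdf_sd(x, mean, math.sqrt(var))


def normal_pdf_precision(x, mean, tau):
    """Same density, second parameter read as a precision (1 / variance)."""
    return normal_pdf_sd(x, mean, 1.0 / math.sqrt(tau))


# The pair (0, 4) describes three different models, depending on
# which parameterization the receiving tool assumes.
densities = {
    "stdev": normal_pdf_sd(1.0, 0.0, 4.0),
    "variance": normal_pdf_var(1.0, 0.0, 4.0),
    "precision": normal_pdf_precision(1.0, 0.0, 4.0),
}
```

Unless the parameterization is declared alongside the name, a model exchanged between two tools can silently change meaning.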

2 Methods

We compiled an initial ‘common denominator’ collection of parametric distributions from UncertML (Williams et al.) and Matlab Statistical Toolbox (MathWorks, 2015) and then extended this set to cover models used in various statistical modelling areas, in particular pharmacometrics. A number of important and more exotic discrete data models were accommodated with alternative versions of the Negative Binomial and the Generalized Poisson, or with the Conway–Maxwell–Poisson distribution. We further included all distributions and relevant parameterizations used in the following tools: Monolix (Lixoft Team, 2014), NONMEM (Beal et al.) and WinBUGS (Lunn et al.), and a few found in MCSim (Bois, 2010). We completed this collection with many relationships and re-parameterizations between distributions (Leemis and McQueston, 2008). We used the following sources to populate the database with probability functions, their relationships and quantities: Forbes et al., Johnson et al. and the probability distribution pages of Wikipedia.

The following list summarizes the features of ProbOnto 1.0:
- Over 80 uni- and multivariate distributions and alternative parameterizations.
- Probability density or mass functions and, where available, cumulative distribution, hazard and survival functions.
- Support for the encoding of univariate mixture distributions.
- Over 130 relationships and re-parameterization formulas.
- Related quantities such as mean, median, mode and variance.
- Parameter and support/range definitions and distribution type.
- LaTeX and R code for mathematical functions.

LeBauer et al. showed, using only two tools and five distributions, that re-parameterizations between related distributions are essential for interoperability between existing tools. ProbOnto extends coverage in this respect beyond proof of concept, with a wider scope and a larger number of tools and distributions. ProbOnto contains, for example, six alternative parameterizations of the log-normal and five of the negative binomial distribution, all in use in different contexts.

Scientists and tool developers can look all of these up in ProbOnto (Supplementary Material).
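As an illustration of what a re-parameterization formula looks like in practice, the sketch below converts between two common log-normal parameterizations: mean and standard deviation of the logarithm, and mean and standard deviation on the natural scale. The formulas are the standard ones; the function names are hypothetical and are not ProbOnto code names.

```python
import math


def lognormal_log_to_natural(mu, sigma):
    """(mu, sigma) of log(X) -> (mean, stdev) of X itself."""
    mean = math.exp(mu + 0.5 * sigma ** 2)
    stdev = mean * math.sqrt(math.exp(sigma ** 2) - 1.0)
    return mean, stdev


def lognormal_natural_to_log(mean, stdev):
    """(mean, stdev) of X -> (mu, sigma) of log(X)."""
    sigma_squared = math.log(1.0 + (stdev / mean) ** 2)
    mu = math.log(mean) - 0.5 * sigma_squared
    return mu, math.sqrt(sigma_squared)


# Round trip: converting there and back should recover the inputs.
mean, stdev = lognormal_log_to_natural(1.0, 0.5)
mu_back, sigma_back = lognormal_natural_to_log(mean, stdev)
```

A tool that expects one of these forms can accept a model written in the other as long as such a mapping is available and the source parameterization is declared.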

3 Ontological model

ProbOnto is a knowledge base built from a simple ontological model. At its core, a probability distribution is an instance of the class of probability distributions, itself a specialization of the class of mathematical objects. A distribution relates to a number of other individuals, which are instances of various categories in the ontology. Examples are the parameters and related functions associated with a given probability distribution. This strategy allows for the rich representation of attributes and relationships between domain objects. The ontology can be seen as a conceptual schema in the domain of mathematics and has been implemented as a PowerLoom (MacGregor et al.) knowledge base. An OWL version is generated programmatically using the Jena API (McBride, 2001). Outputs of ProbOnto are provided as supplementary materials and published on or linked from the ProbOnto website. The OWL version of ProbOnto is available via the Ontology Lookup Service (OLS) to facilitate simple searching and visualization of the content (Jupp et al.). In addition, the OLS API provides methods to access ProbOnto programmatically and to integrate it into applications.
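The shape of this model can be sketched in a few lines of Python. This is only an illustration of the idea that a distribution is an individual linked to parameter and function individuals. It does not reproduce the PowerLoom or OWL implementation, and all class names, field names and code names below are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parameter:
    code_name: str          # hypothetical identifier, e.g. "rate"
    description: str = ""


@dataclass
class Function:
    kind: str               # e.g. "PMF", "PDF", "CDF"
    latex: str              # defining formula as a LaTeX string


@dataclass
class Distribution:
    code_name: str
    parameters: List[Parameter] = field(default_factory=list)
    functions: List[Function] = field(default_factory=list)

    def parameter_names(self):
        """Code names of all parameters linked to this distribution."""
        return [p.code_name for p in self.parameters]


# One distribution individual, related to one parameter and one function.
poisson = Distribution(
    code_name="Poisson",
    parameters=[Parameter("rate", "expected number of events")],
    functions=[Function("PMF", r"\frac{\lambda^k e^{-\lambda}}{k!}")],
)
```

Keeping parameters and functions as separate individuals, rather than as plain strings on the distribution, is what lets relationships between them be represented explicitly.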

4 Use case

ProbOnto was first designed to facilitate the encoding of nonlinear mixed effect models and their annotation in PharmML, the Pharmacometrics Markup Language (Swat et al.), developed by DDMoRe (Harnisch et al.). The scope and features of the language made ProbOnto invaluable in the encoding of diverse models applicable to discrete (e.g. count, categorical and time-to-event) and continuous data (Supplementary Material). Despite its origin in PharmML, ProbOnto is purpose independent and does not put implementation constraints on tool designers, thus allowing an open-ended number of usage scenarios. When using ProbOnto with PharmML, a small generic XML schema was enough to allow for flexible encoding of the distributions relevant in pharmacometric modelling, their parameters and functions (see Supplementary Material, Appendix). For example, the negative binomial distribution is encoded by using its code name and declaring those of its parameters (‘rate’ and ‘overdispersion’). To specify any given distribution unambiguously using ProbOnto, it is sufficient to declare its code name and the code names of its parameters.
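The idea of declaring a distribution by code name plus parameter code names can be sketched as follows. This is not the PharmML schema: the element names, attribute names, the code name string "NegativeBinomial" and the numeric values are all assumptions made for illustration. Only the parameter names ‘rate’ and ‘overdispersion’ come from the text.

```python
import xml.etree.ElementTree as ET


def encode_distribution(code_name, parameters):
    """Build an XML fragment from a distribution code name and a
    mapping of parameter code names to values (hypothetical layout)."""
    dist = ET.Element("Distribution", {"name": code_name})
    for param_name, value in parameters.items():
        param = ET.SubElement(dist, "Parameter", {"name": param_name})
        param.text = str(value)
    return dist


# Negative binomial declared by its code name and parameter code names.
fragment = encode_distribution(
    "NegativeBinomial",
    {"rate": 2.5, "overdispersion": 0.3},
)
xml_text = ET.tostring(fragment, encoding="unicode")
```

A consuming tool would look up the declared code names in the knowledge base to find the matching parameterization instead of guessing it from the distribution name alone.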

5 Future plans and conclusions

ProbOnto provides a means for the encoding and annotation of statistical models, thus facilitating their exchange between software tools. Owing to its generic construction, which does not enforce a specific implementation in target software, ProbOnto can be applied across various modelling platforms and databases. Although it already incorporates a large number of distributions, the collection is still growing, with new distributions added on a regular basis; ProbOnto will ultimately include all distributions supported by tools such as STAN (STAN Development Team, 2015) and R (R Core Team, 2015). In the future, the knowledge base will be extended with additional features, such as the applications of a probability distribution, its mathematical properties, for example linear combination, convolution and scaling (Leemis and McQueston, 2008), and related data types.

1.  The BUGS project: Evolution, critique and future directions.

Authors:  David Lunn; David Spiegelhalter; Andrew Thomas; Nicky Best
Journal:  Stat Med       Date:  2009-11-10       Impact factor: 2.373

2.  Physiologically based modelling and prediction of drug interactions.

Authors:  Frédéric Y Bois
Journal:  Basic Clin Pharmacol Toxicol       Date:  2009-11-11       Impact factor: 4.080

3.  Probability Distributome: A Web Computational Infrastructure for Exploring the Properties, Interrelations, and Applications of Probability Distributions.

Authors:  Ivo D Dinov; Kyle Siegrist; Dennis K Pearl; Alexandr Kalinin; Nicolas Christou
Journal:  Comput Stat       Date:  2015-06-26       Impact factor: 1.000

4.  Drug and disease model resources: a consortium to create standards and tools to enhance model-based drug development.

Authors:  L Harnisch; I Matthews; J Chard; M O Karlsson
Journal:  CPT Pharmacometrics Syst Pharmacol       Date:  2013-03-20

5.  BioPortal: ontologies and integrated data resources at the click of a mouse.

Authors:  Natalya F Noy; Nigam H Shah; Patricia L Whetzel; Benjamin Dai; Michael Dorf; Nicholas Griffith; Clement Jonquet; Daniel L Rubin; Margaret-Anne Storey; Christopher G Chute; Mark A Musen
Journal:  Nucleic Acids Res       Date:  2009-05-29       Impact factor: 16.971

6.  Pharmacometrics Markup Language (PharmML): Opening New Perspectives for Model Exchange in Drug Development.

Authors:  M J Swat; S Moodie; S M Wimalaratne; N R Kristensen; M Lavielle; A Mari; P Magni; M K Smith; R Bizzotto; L Pasotti; E Mezzalana; E Comets; C Sarr; N Terranova; E Blaudez; P Chan; J Chard; K Chatel; M Chenel; D Edwards; C Franklin; T Giorgino; M Glont; P Girard; P Grenon; K Harling; A C Hooker; R Kaye; R Keizer; C Kloft; J N Kok; N Kokash; C Laibe; C Laveille; G Lestini; F Mentré; A Munafo; R Nordgren; H B Nyberg; Z P Parra-Guillen; E Plan; B Ribba; G Smith; I F Trocóniz; F Yvon; P A Milligan; L Harnisch; M Karlsson; H Hermjakob; N Le Novère
Journal:  CPT Pharmacometrics Syst Pharmacol       Date:  2015-06-15

