
SYSBIONS: nested sampling for systems biology.

Rob Johnson¹, Paul Kirk¹, Michael P H Stumpf¹.

Abstract

MOTIVATION: Model selection is a fundamental part of the scientific process in systems biology. Given a set of competing hypotheses, we routinely wish to choose the one that best explains the observed data. In the Bayesian framework, models are compared via Bayes factors (the ratio of evidences), where a model's evidence is the support given to the model by the data. A parallel interest is inferring the distribution of the parameters that define a model. Nested sampling is a method for the computation of a model's evidence and the generation of samples from the posterior parameter distribution.
RESULTS: We present a C-based, GPU-accelerated implementation of nested sampling that is designed for biological applications. The algorithm follows a standard routine with optional extensions and additional features. We provide a number of methods for sampling from the prior subject to a likelihood constraint.
AVAILABILITY AND IMPLEMENTATION: The software SYSBIONS is available from http://www.theosysbio.bio.ic.ac.uk/resources/sysbions/
CONTACT: m.stumpf@imperial.ac.uk, robert.johnson11@imperial.ac.uk

Year: 2014 PMID: 25399028 PMCID: PMC4325544 DOI: 10.1093/bioinformatics/btu675

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

Given a set of models proposed to explain some observation, we seek to rank them according to the extent to which they are supported by the data. Likelihood-based approaches find the point at which the likelihood function is maximized and compare models based on these maxima (Burnham and Anderson, 2002). Bayesian approaches to model selection rest on Bayes factors: the ratios of the evidences of competing models. The evidence is a measure of the support afforded to a model by the data, and a number of methods exist to estimate it (Kirk et al.). Nested sampling is a Bayesian method for evidence estimation and parameter inference for systems where a likelihood function can be defined (Skilling, 2006). As the algorithm progresses, it generates samples from the posterior parameter distribution directly.

We present a C-based nested sampling tool for computational biologists. The user supplies a likelihood function, some experimental data and the prior parameter distribution. The program returns a value for the evidence alongside samples from the posterior parameter distribution. A Fortran-based nested sampling package, MultiNest (Feroz et al.), already exists and is used in the astrophysics community. Our work is aimed specifically at the biological community and includes an SBML (Systems Biology Markup Language; Rodriguez et al., 2007) parser so that models can be specified according to current standards. The growing use of nested sampling in systems biology invites the release of a tool implementing the method (Aitken and Akman, 2013; Burkoff et al., 2012; Dybowski et al., 2013; Kirk et al.; Pullen and Morris, 2014).
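As a small illustration of how evidences are compared, the sketch below turns two log-evidences into a log Bayes factor and into posterior model probabilities. The function names and the numerical values are hypothetical and chosen only for illustration; equal prior model probabilities are assumed unless others are supplied.

```python
import math

def log_bayes_factor(log_z1, log_z2):
    """ln(B_12) = ln(Z_1) - ln(Z_2) for two models' log-evidences."""
    return log_z1 - log_z2

def posterior_model_probs(log_evidences, log_priors=None):
    """Normalise evidences into posterior model probabilities.

    Equal prior model probabilities are assumed when log_priors is None.
    """
    n = len(log_evidences)
    if log_priors is None:
        log_priors = [-math.log(n)] * n
    log_post = [lz + lp for lz, lp in zip(log_evidences, log_priors)]
    shift = max(log_post)  # subtract the maximum for numerical stability
    weights = [math.exp(v - shift) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical log-evidences for two competing models (illustrative only).
log_z = [-104.2, -107.9]
print("ln Bayes factor:", log_bayes_factor(log_z[0], log_z[1]))
print("model probabilities:", posterior_model_probs(log_z))
```

Working with log-evidences avoids underflow, since evidences of realistic models are often far smaller than the smallest representable floating-point number.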

2 APPROACH

The evidence is defined as $Z = \int_{\Theta} L(\theta)\,\pi(\theta)\,d\theta$, where $\theta$ is the parameter set (and $\Theta$ the parameter space), $L$ the likelihood function and $\pi$ the prior. The change in notation $dX = \pi(\theta)\,d\theta$, where $X$ is the cumulative density function, allows the integral to be written $Z = \int_0^1 L(X)\,dX$. This can be approximated as a sum, $Z \approx \sum_i L_i W_i$, where $N$ points are sampled and $W_i$ is the proportion of prior mass represented by point $i$, calculated as the difference between the volume enclosed by the contour of constant likelihood through $\theta_{i-1}$ and that through $\theta_i$, that is $W_i = X_{i-1} - X_i$. Nested sampling is a method for generating the sequence of points $\theta_i$. For a thorough presentation of nested sampling, we refer the reader to the work of Skilling (2006) and Sivia and Skilling (2006). For our purposes, we follow the general algorithm:

1. Initialise $Z = 0$
2. Generate $N$ points from $\pi(\theta)$
3. for $i = 1, 2, \dots$
   a. Find the point $\theta_i$ with lowest likelihood, $L_i = L(\theta_i)$
   b. Calculate $W_i = X_{i-1} - X_i$
   c. Set $Z = Z + L_i W_i$
   d. Resample $\theta_i$ from $\pi(\theta)$ subject to $L(\theta) > L_i$
4. end for
5. Set $Z = Z +$ the contribution of the remaining live points

Our program is written primarily in C with additional capability for GPU acceleration. Other features include an SBML parser for automated generation of likelihood functions (Liepe et al., 2010) and plotting tools. For the task of sampling from the prior subject to a likelihood constraint (step 3d), we provide three methods. The accuracy of the approximation in step 3b depends on the population of $N$ points (live points) being truly distributed as the prior within the given likelihood constraint (Skilling, 2006).
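The general algorithm above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SYSBIONS implementation: the prior volume is taken as $X_i = e^{-i/N}$ (the deterministic approximation of Skilling, 2006), step 3d uses plain rejection sampling from the prior, the final step adds the mean live-point likelihood times the remaining prior volume, and the toy Gaussian likelihood, the uniform prior on $[-5, 5]^2$ and all function names are hypothetical.

```python
import math
import random

def nested_sampling(log_like, sample_prior, n_live=100, n_iter=500, rng=None):
    """Toy nested sampling; returns the evidence estimate and the discarded points."""
    rng = rng or random.Random()
    live = [sample_prior(rng) for _ in range(n_live)]        # step 2
    live_ll = [log_like(p) for p in live]
    z, x_prev, dead = 0.0, 1.0, []                           # step 1
    for i in range(1, n_iter + 1):                           # step 3
        worst = min(range(n_live), key=live_ll.__getitem__)  # 3a: lowest likelihood
        l_star = live_ll[worst]
        x_i = math.exp(-i / n_live)   # assumed expected prior volume at iteration i
        w_i = x_prev - x_i                                   # 3b
        z += math.exp(l_star) * w_i                          # 3c
        dead.append((live[worst], l_star, w_i))
        while True:                   # 3d: rejection sampling from the prior
            cand = sample_prior(rng)
            cand_ll = log_like(cand)
            if cand_ll > l_star:
                break
        live[worst], live_ll[worst] = cand, cand_ll
        x_prev = x_i
    # Step 5: remaining live points share the remaining prior volume equally.
    z += x_prev * sum(math.exp(ll) for ll in live_ll) / n_live
    return z, dead

def log_like(theta):
    """Unnormalised 2D Gaussian log-likelihood with unit variance (toy example)."""
    x, y = theta
    return -0.5 * (x * x + y * y)

def sample_prior(rng):
    """Uniform prior on the square [-5, 5] x [-5, 5]."""
    return (rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0))

z_hat, dead_points = nested_sampling(log_like, sample_prior, rng=random.Random(1))
# Analytic evidence: 2*pi / 100, ignoring the negligible truncation at the box edge.
z_true = 2.0 * math.pi / 100.0
print("ln Z estimate:", math.log(z_hat), " analytic:", math.log(z_true))
```

Rejection sampling keeps the sketch faithful to the requirement in step 3d, but its cost grows as the constrained region shrinks, which is why the number of iterations here is kept small.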

3 METHODS

Our nested sampling package is a command-line tool for Linux and MacOSX platforms. Pre-requisites are listed in the accompanying manual. The user supplies a likelihood function, either by editing a template file or using an SBML file. An executable is then made that receives input from the command line. When the program is run, live points are generated and their likelihoods evaluated according to the function supplied by the user. On completion, it returns the calculated evidence with standard deviation, samples from the posterior, trajectories generated by points from the posterior and files from which the algorithm can be restarted.

3.1 Algorithm options

Options available to the user are listed in Table 1. The only required input is the parameter set (all other variables have default values). Parameters may be constant or inferred subject to a uniform prior distribution, for which lower and upper bounds must be supplied. The algorithm can be terminated either by specifying the number of iterations, or by monitoring the rate at which the evidence accumulates: the loop terminates at iteration $m$ once this rate falls below the threshold set by the tolerance (tol).
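The two termination modes can be illustrated as follows. The stopping rule used here, that the latest increment $L_m W_m$ is smaller than tol times the running evidence, is an assumption made for illustration and is not necessarily the criterion used by SYSBIONS; the function name and the geometric test sequence are likewise hypothetical. The defaults mirror Table 1 (iteration limit on at 10 000, tolerance off).

```python
def accumulate_evidence(increments, max_iter=10_000, tol=None):
    """Sum evidence increments L_i * W_i until a stopping condition is met.

    Stops after max_iter iterations or, if tol is given, as soon as an
    increment is smaller than tol times the running evidence (assumed rule).
    Returns the accumulated evidence and the iteration at which it stopped.
    """
    z, m = 0.0, 0
    for dz in increments:
        m += 1
        z += dz
        if tol is not None and dz < tol * z:
            break
        if m >= max_iter:
            break
    return z, m

# Illustrative increments that halve at every iteration.
increments = [0.5 ** i for i in range(1, 60)]
z_tol, m_tol = accumulate_evidence(increments, tol=0.001)
z_cap, m_cap = accumulate_evidence(increments, max_iter=3)
print(z_tol, m_tol)
print(z_cap, m_cap)
```

A relative rule of this kind is scale-free, so it behaves the same whether or not the likelihood is normalised.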

Table 1. Input options

Variable	Tag	Input	Default
Number of live points	nLive	**integer	1000
Number of iterations	maxIter	**integer	on, 10 000
Tolerance	tol	decimal	off, 0.001
*Parameters	constant	**value	none
*Parameters	uniform	**bounds	none
Sampling method	rejection	none	off
Sampling method	rw	none	off
Sampling method	ellipsoid	expansion factor	on, 2
Restart from file	Restart	file paths	_restart_points.txt, _restart_input.txt
Write restart	write_restart	file path root	_restart
Points to leap	nLeap	**integer	1
Adaptive leaping	adaptive	none	off
CUDA	cuda	**nLeap, **max. threads	off, none, none

*required; **required if tag given.

3.2 Sampling methods

We include three sampling methods for step 3d of the algorithm: rejection, for perfectly sampling from the prior, and random walk (following Sivia and Skilling, 2006) and ellipsoidal (following Feroz et al.) for refined sampling with reduced computational cost.

Rejection: The rejection method samples from the prior as initially defined, accepting the point if its likelihood value is within the constraint and rejecting it otherwise. This method remains true to the requirement that samples are taken from the prior subject to the likelihood constraint, but its efficiency is poor: as the lowest likelihood increases, the acceptance rate becomes prohibitively small.

Random walk: The random-walk method duplicates a point randomly chosen from the current live-point population and walks it 20 steps, accepting the new point at each step if its likelihood is within the constraint. The steps are scaled according to the covariance among the present population, and scaled further to converge to an acceptance rate of 0.5 (Sivia and Skilling, 2006).

Ellipsoidal: The ellipsoidal method (Mukherjee et al.) creates an ellipsoid surrounding the current population of live points, expanded by some user-supplied factor. The new point is sampled from within the ellipsoid. This increases the acceptance rate but risks excluding areas of prior mass that lie inside the current likelihood constraint.
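The random-walk and ellipsoidal ideas can be sketched as below. This is an illustrative simplification, not the SYSBIONS code: the random walk scales its steps by the per-parameter standard deviation of the live points (the diagonal of the covariance only), the step-size update nudges the acceptance rate towards 0.5, and the expansion factor is assumed to scale the ellipsoid axes. All function names, the toy likelihood and the prior box are hypothetical.

```python
import math
import random

def random_walk(live, log_like, in_prior, l_star, rng, n_steps=20, step=1.0):
    """Walk a copy of a random live point, keeping moves inside the constraint."""
    n, d = len(live), len(live[0])
    mean = [sum(p[k] for p in live) / n for k in range(d)]
    sd = [math.sqrt(sum((p[k] - mean[k]) ** 2 for p in live) / n) for k in range(d)]
    point, accept, reject = list(rng.choice(live)), 0, 0
    for _ in range(n_steps):
        trial = [point[k] + step * sd[k] * rng.gauss(0.0, 1.0) for k in range(d)]
        if in_prior(trial) and log_like(trial) > l_star:
            point, accept = trial, accept + 1
        else:
            reject += 1
        if accept > reject:        # too many acceptances: lengthen the steps
            step *= math.exp(1.0 / accept)
        elif reject > accept:      # too many rejections: shorten the steps
            step /= math.exp(1.0 / reject)
    return point, step

def cholesky(a):
    """Lower-triangular factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def ellipsoid_sample(live, log_like, in_prior, l_star, rng, expansion=2.0):
    """Draw uniformly from an expanded ellipsoid around the live points."""
    n, d = len(live), len(live[0])
    mu = [sum(p[k] for p in live) / n for k in range(d)]
    cov = [[sum((p[a] - mu[a]) * (p[b] - mu[b]) for p in live) / n
            for b in range(d)] for a in range(d)]
    low = cholesky(cov)

    def whiten(p):  # solve low * y = p - mu by forward substitution
        y = []
        for i in range(d):
            y.append((p[i] - mu[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
        return y

    # Smallest scaled ellipsoid containing every live point, then expanded.
    radius = expansion * max(math.sqrt(sum(v * v for v in whiten(p))) for p in live)
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        r = radius * rng.random() ** (1.0 / d)   # uniform within the ball
        y = [r * v / norm for v in g]
        trial = [mu[i] + sum(low[i][k] * y[k] for k in range(i + 1)) for i in range(d)]
        if in_prior(trial) and log_like(trial) > l_star:
            return trial

def log_like(theta):
    return -0.5 * sum(v * v for v in theta)

def in_prior(theta):
    return all(-5.0 <= v <= 5.0 for v in theta)

rng = random.Random(7)
l_star = -2.0                      # current likelihood constraint (toy value)
live = []
while len(live) < 50:              # build a live population by rejection
    cand = [rng.uniform(-5.0, 5.0), rng.uniform(-5.0, 5.0)]
    if log_like(cand) > l_star:
        live.append(cand)

rw_point, tuned_step = random_walk(live, log_like, in_prior, l_star, rng)
ell_point = ellipsoid_sample(live, log_like, in_prior, l_star, rng)
print(rw_point, ell_point)
```

Both samplers still check the likelihood constraint and the prior bounds on every candidate; what changes relative to rejection sampling is only where candidates are proposed.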

3.3 Output

A summary file of input and output information is created, documenting the number of live points, number of iterations, tolerance, sampling method and parameter ranges, followed by the evidence with standard deviation, the prior-to-posterior information gain and the means of all parameters and their standard deviations. Posterior distributions of the parameters can be plotted individually as histograms and in pair-wise scatter plots using the data stored in the posterior file. Finally, a file of trajectories is created that can be compared against the input data. Restart files are created, documenting input parameters that must persist upon restart (such as the number of live points) and listing all points, live and discarded. These files can be used to restart the program from where it completed. It is also possible to specify the path to where the restart files are written.
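The summary quantities listed above can be computed from the discarded points, as in the following sketch. It assumes the standard estimators of Skilling (2006): posterior weights $p_i = L_i W_i / Z$, information gain $H \approx \sum_i p_i \ln(L_i / Z)$ and a standard deviation of $\sqrt{H/N}$ on $\ln Z$. The function name, the dictionary keys and the idealised one-parameter test run are hypothetical, and the final live-point contribution is ignored for brevity.

```python
import math

def summarise(dead, n_live):
    """Summarise a run from its discarded points.

    dead is a list of (theta, log_likelihood, prior_weight) tuples, where
    theta is a sequence of parameter values.
    """
    contrib = [math.exp(ll) * w for _, ll, w in dead]
    z = sum(contrib)
    post = [c / z for c in contrib]                  # posterior weights p_i
    info = sum(p * (ll - math.log(z))                # information gain H
               for p, (_, ll, _) in zip(post, dead) if p > 0.0)
    d = len(dead[0][0])
    mean = [sum(p * th[k] for p, (th, _, _) in zip(post, dead)) for k in range(d)]
    var = [sum(p * (th[k] - mean[k]) ** 2 for p, (th, _, _) in zip(post, dead))
           for k in range(d)]
    return {
        "Z": z,
        "logZ_sd": math.sqrt(info / n_live),
        "H": info,
        "mean": mean,
        "sd": [math.sqrt(v) for v in var],
    }

# Idealised run: uniform prior on [0, 1] and L(theta) = exp(-theta / s), so the
# enclosed prior volume X equals theta and the sequence X_i = exp(-i / N) is exact.
n_live, s = 100, 0.1
dead, x_prev = [], 1.0
for i in range(1, 1501):
    x = math.exp(-i / n_live)
    dead.append(((x,), -x / s, x_prev - x))
    x_prev = x

summary = summarise(dead, n_live)
print(summary)
```

For this idealised case the evidence is close to $s = 0.1$ and the posterior mean and standard deviation of the parameter are both close to $s$, which gives a quick sanity check on the estimators.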

4 SUMMARY

We present SYSBIONS, a computational tool for model selection and parameter inference using nested sampling. Using a data-based likelihood function, our package calculates the evidence of a model and the corresponding posterior parameter distribution.

REFERENCES

1. Exploring the energy landscapes of protein folding simulations with Bayesian computation.

Authors: Nikolas S Burkoff; Csilla Várnai; Stephen A Wells; David L Wild
Journal: Biophys J Date: 2012-02-21 Impact factor: 4.033

2. ABC-SysBio--approximate Bayesian computation in Python with GPU support.

Authors: Juliane Liepe; Chris Barnes; Erika Cule; Kamil Erguler; Paul Kirk; Tina Toni; Michael P H Stumpf
Journal: Bioinformatics Date: 2010-07-15 Impact factor: 6.937

3. SBMLeditor: effective creation of models in the Systems Biology Markup language (SBML).

Authors: Nicolas Rodriguez; Marco Donizelli; Nicolas Le Novère
Journal: BMC Bioinformatics Date: 2007-03-06 Impact factor: 3.169

4. Nested sampling for parameter inference in systems biology: application to an exemplar circadian model.

Authors: Stuart Aitken; Ozgur E Akman
Journal: BMC Syst Biol Date: 2013-07-30

5. Bayesian model comparison and parameter inference in systems biology using nested sampling.

Authors: Nick Pullen; Richard J Morris
Journal: PLoS One Date: 2014-02-11 Impact factor: 3.240

6. Nested sampling for Bayesian model comparison in the context of Salmonella disease dynamics.

Authors: Richard Dybowski; Trevelyan J McKinley; Pietro Mastroeni; Olivier Restif
Journal: PLoS One Date: 2013-12-20 Impact factor: 3.240
