Christopher Yau, Kieran Campbell.
Abstract
Bayesian statistical learning provides a coherent probabilistic framework for modelling uncertainty in systems. This review describes the theoretical foundations underlying Bayesian statistics and outlines the computational frameworks for implementing Bayesian inference in practice. We then describe the use of Bayesian learning in single-cell biology for the analysis of high-dimensional, large data sets.
Keywords: Bayesian; Computational biology; Statistical modelling
Year: 2019 PMID: 30729409 PMCID: PMC6381359 DOI: 10.1007/s12551-019-00499-1
Source DB: PubMed Journal: Biophys Rev ISSN: 1867-2450
Fig. 1 (a) Overview of Bayesian modelling. Data is assumed to be generated by a stochastic model which describes various underlying processes and is specified by some unknown parameters. Bayesian inference seeks to recover those parameters from the observed data. (b) Prior beliefs are expressed as a probability distribution over parameters 𝜃 = (𝜃1, 𝜃2), which is updated via the likelihood function when data is collected to give a posterior distribution over 𝜃. (c) Real-world posterior distributions often contain a number of separated high-probability regions. An ideal Metropolis-Hastings algorithm would possess a proposal mechanism that allows regular movement between different high-probability regions without the need to traverse low-probability intermediate regions. (d) Variational methods build approximations of the true posterior distribution. In this example, a mean-field approximation breaks the dependencies between the parameters (𝜃1, 𝜃2), so the variational posterior models each dimension separately.
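The behaviour described for the Metropolis-Hastings panel of Fig. 1 can be illustrated with a minimal sketch. It is not taken from the review: the one-dimensional two-mode target, the Gaussian random-walk proposal, and all names (`log_target`, `metropolis_hastings`, `fraction_positive`, `SEPARATION`) are assumptions chosen for illustration. A proposal with a small step size tends to remain in the mode where it starts, whereas a proposal wide enough to jump across the low-probability gap visits both modes.

```python
import math
import random

SEPARATION = 6.0  # assumed distance of each mode from zero (illustrative)


def log_target(theta):
    """Unnormalised log density of an equal mixture of
    N(-SEPARATION, 1) and N(+SEPARATION, 1)."""
    a = -0.5 * (theta - SEPARATION) ** 2
    b = -0.5 * (theta + SEPARATION) ** 2
    m = max(a, b)  # log-sum-exp trick for numerical stability
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def metropolis_hastings(log_density, start, step, n_samples, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    theta = start
    current = log_density(theta)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_density(proposal)
        # The proposal is symmetric, so the acceptance probability
        # reduces to min(1, target(proposal) / target(theta)).
        accept_prob = math.exp(min(0.0, candidate - current))
        if rng.random() < accept_prob:
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


def fraction_positive(samples):
    """Share of samples that fall in the right-hand mode."""
    return sum(s > 0 for s in samples) / len(samples)


# Both chains start in the left-hand mode.
narrow = metropolis_hastings(log_target, start=-SEPARATION, step=0.2, n_samples=20000)
wide = metropolis_hastings(log_target, start=-SEPARATION, step=6.0, n_samples=20000)

print("narrow proposal, share in right mode:", fraction_positive(narrow))
print("wide proposal, share in right mode:", fraction_positive(wide))
```

Since the two modes carry equal mass, a well-mixing chain should spend roughly half its time in each. The wide proposal is only one crude way to achieve this in a toy problem; in higher dimensions such large steps are usually rejected, which is why more elaborate proposal mechanisms are of interest.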
Fig. 2 (a) Single-cell differential expression analysis aims to identify differences in expression level and variability between cell types. Confounding effects such as dropout and batch effects must be accounted for in order to avoid false conclusions. (b) Variational autoencoders use deep neural networks to encode input expression data vectors into low-dimensional latent representations whilst simultaneously learning decoders that can generate realistic expression data from these latent representations. (c) Pseudotemporal models aim to identify latent uni-dimensional representations that correspond to physical time variation from high-dimensional cross-sectional single-cell data. (d) Probabilistic approaches to tumour phylogeny inference are essential in the presence of sequencing noise, since genotyping errors can lead to uncertainties in phylogenetic reconstruction. Here, allelic dropout leading to genotyping error in a single cell type could lead to alternate phylogenetic histories and different interpretations of the importance of acquired mutations.