Yuriy Sverchkov, Mark Craven.
Abstract
Various types of biological knowledge describe networks of interactions among elementary entities. For example, transcriptional regulatory networks consist of interactions among proteins and genes. Current knowledge about the exact structure of such networks is highly incomplete, and laboratory experiments that manipulate the entities involved are conducted to test hypotheses about these networks. In recent years, various automated approaches to experiment selection have been proposed. Many of these approaches can be characterized as active machine learning algorithms. Active learning is an iterative process in which a model is learned from data, hypotheses are generated from the model to propose informative experiments, and the experiments yield new data that is used to update the model. This review describes the various models, experiment selection strategies, validation techniques, and successful applications described in the literature; highlights common themes and notable distinctions among methods; and identifies likely directions of future research and open problems in the area.
Year: 2017 PMID: 28570593 PMCID: PMC5453429 DOI: 10.1371/journal.pcbi.1005466
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1. The active learning loop.
In active machine learning, data from experiments informs a learner that formulates queries for further experiments that are expected to be most informative for refining a model.
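The loop in Fig 1 can be illustrated with a small Python sketch. This is a hypothetical toy, not any specific method from the review. It assumes a hypothetical three-gene system and a deterministic model class consisting of every directed graph over those genes. It also assumes a uniform prior over those graphs, and single-gene knockouts whose observed outcome is the set of genes downstream of the knocked-out gene. The names `all_networks`, `knockout_outcome`, `outcome_entropy`, and `active_learning_loop` are illustrative inventions. For deterministic hypotheses under a uniform prior, the entropy of the predicted outcome equals the expected information gain of the experiment, so the selection step picks the knockout with the highest outcome entropy.

```python
import itertools
import math
from collections import Counter

GENES = ("A", "B", "C")  # hypothetical three-gene system (assumption)

def all_networks(genes):
    """Enumerate every directed graph without self-loops over `genes`."""
    possible_edges = [(u, v) for u, v in itertools.product(genes, repeat=2) if u != v]
    for mask in range(2 ** len(possible_edges)):
        yield frozenset(e for i, e in enumerate(possible_edges) if (mask >> i) & 1)

def knockout_outcome(network, gene):
    """Toy deterministic model (assumption): knocking out `gene`
    perturbs every other gene reachable from it in `network`."""
    affected, frontier = set(), [gene]
    while frontier:
        u = frontier.pop()
        for src, dst in network:
            if src == u and dst != gene and dst not in affected:
                affected.add(dst)
                frontier.append(dst)
    return frozenset(affected)

def outcome_entropy(hypotheses, gene):
    """Entropy (bits) of the predicted knockout outcome under a uniform prior.
    For deterministic hypotheses this equals the expected information gain."""
    counts = Counter(knockout_outcome(h, gene) for h in hypotheses)
    n = len(hypotheses)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def active_learning_loop(true_network, genes=GENES):
    # Model: the version space of networks consistent with all data so far.
    hypotheses = list(all_networks(genes))
    performed = []
    while len(hypotheses) > 1:
        # Query selection: the knockout whose outcome is most uncertain.
        gene = max(genes, key=lambda g: outcome_entropy(hypotheses, g))
        if outcome_entropy(hypotheses, gene) == 0:
            break  # no available knockout can distinguish the remaining networks
        # Experiment: simulated here by querying the hidden "true" network.
        observed = knockout_outcome(true_network, gene)
        performed.append((gene, observed))
        # Model update: discard hypotheses inconsistent with the new data.
        hypotheses = [h for h in hypotheses if knockout_outcome(h, gene) == observed]
    return hypotheses, performed

if __name__ == "__main__":
    true_net = frozenset({("A", "B"), ("B", "C")})
    remaining, performed = active_learning_loop(true_net)
    print(performed)
    print(len(remaining), "networks remain consistent with the data")
```

In this toy, single-gene knockouts reveal only which genes are downstream of which. For the chain A→B→C the loop therefore stops with two candidates: the chain itself and the chain plus a direct A→C edge. This shows how the kinds of experiments a learner is allowed to request limit what it can identify.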
Fig 2. A brief summary of reviewed methods.
Icons arranged in the table represent individual methods. The columns represent the various experiment selection criteria, and the methods are divided vertically between de novo methods and methods that use prior knowledge. Visual elements in each icon indicate whether the method is deterministic (cog) or stochastic (die), whether it models continuous (circle) or discrete (diamond) variables, what is specified in a query for an experiment (G for genetic and E for environmental perturbations), and the dimensionality of the data used (dot array for multidimensional data and a ruler for one-dimensional data).