Santiago Videla, Irina Konokotina, Leonidas G Alexopoulos, Julio Saez-Rodriguez, Torsten Schaub, Anne Siegel, Carito Guziolowski.
Abstract
Logic models are a promising way of building effective in silico functional models of a cell, in particular of its signaling pathways. Boolean logic models describing signaling pathways can be learned automatically by training on phosphoproteomics data, which are particularly useful when measured under different combinations of perturbations in a high-throughput fashion. However, in practice, the number and type of allowed perturbations are not exhaustive. Moreover, experimental data are unavoidably subject to noise. As a result, the learning process yields a family of feasible logical networks rather than a single model. This family is composed of logic models implementing different internal wirings for the system; the predictions of experiments from this family may therefore present a significant level of variability, and hence uncertainty. In this paper, we introduce a method based on Answer Set Programming to propose an optimal experimental design that aims to narrow down the variability (in terms of input-output behaviors) within families of logical models learned from experimental data. We study how the fitness with respect to the data can be improved after an optimal selection of signaling perturbations and how we learn optimal logic models with a minimal number of experiments. The methods are applied to signaling pathways in human liver cells and phosphoproteomics experimental data. Using 25% of the experiments, we obtained logical models with fitness scores (mean square error) within 15% of those obtained using all experiments, illustrating the impact that our approach can have on the design of experiments for efficient model calibration.
Keywords: Boolean logic models; answer set programming; experimental design; phosphoproteomic; signaling networks
Year: 2015 PMID: 26389116 PMCID: PMC4560026 DOI: 10.3389/fbioe.2015.00131
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
Figure 1. The loop for learning and discriminating input–output behaviors. The loop starts by learning optimal input–output behaviors from a given PKN and initial dataset. Then, we try to discriminate among learned input–output behaviors as soon as we find more than one. Every time we discriminate among a set of behaviors, an optimal set of signaling perturbations is proposed. Next, both the set of perturbations and the corresponding measurements are added to the dataset used for learning and the loop starts over. When the learning method returns a single optimal input–output behavior, the workflow explores nearly optimal behaviors by considering a range of tolerances, first over the optimum model size and then over the optimum MSE.
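The loop described in this caption can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the Answer Set Programming method of the paper: the four candidate behaviors, the two binary inputs, and the names `learn`, `discriminate`, and `design_loop` are all hypothetical, and the exploration of nearly optimal behaviors through tolerances is left out.

```python
from itertools import combinations, product

# Hypothetical toy setup: two binary inputs (a, b) and one binary readout.
# Each candidate stands for one input-output behavior.
CANDIDATES = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "a_only": lambda a, b: a,
    "b_only": lambda a, b: b,
}
PERTURBATIONS = list(product([0, 1], repeat=2))


def mse(behavior, dataset):
    """Mean squared error of one behavior over the experiments seen so far."""
    return sum((behavior(*p) - y) ** 2 for p, y in dataset.items()) / len(dataset)


def learn(dataset):
    """Return the names of all candidates that reach the minimum MSE."""
    scores = {name: mse(f, dataset) for name, f in CANDIDATES.items()}
    best = min(scores.values())
    return sorted(name for name, score in scores.items() if score == best)


def discriminate(names, dataset):
    """Pick the untested perturbation with the most pairwise disagreements."""
    def differences(p):
        return sum(
            CANDIDATES[m](*p) != CANDIDATES[n](*p)
            for m, n in combinations(names, 2)
        )

    untested = [p for p in PERTURBATIONS if p not in dataset]
    best = max(untested, key=differences, default=None)
    if best is None or differences(best) == 0:
        return None  # nothing left that separates the learned behaviors
    return best


def design_loop(measure, initial):
    """Alternate learning and discrimination until one behavior remains."""
    dataset = dict(initial)
    while True:
        learned = learn(dataset)
        if len(learned) == 1:
            return learned, dataset
        perturbation = discriminate(learned, dataset)
        if perturbation is None:
            return learned, dataset
        # Run the proposed experiment and add its measurement to the dataset.
        dataset[perturbation] = measure(perturbation)


# Usage: the hidden "true" system behaves like an OR gate.
learned, dataset = design_loop(lambda p: p[0] | p[1], {(0, 0): 0})
print(learned, len(dataset))
```

In this toy run the perturbation (1, 1) is never requested, because all remaining candidates agree on it; only experiments that can separate the learned behaviors are proposed.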
Figure 2. Learning MSE for . The X-axis shows the number of experiments (optimal signaling perturbations and measurements associated) used for learning at each iteration. The Y-axis shows the learning MSE obtained at each iteration; it represents the quality of the learned models with respect to the experiments used in the learning step.
Figure 3. Testing MSE for . The red line represents the optimal MSE learned using the full available experimental datasets (214 experiments for in silico and 120 for real datasets). The X-axis shows the number of experiments (optimal signaling perturbations and measurements associated) used for learning at each iteration. The Y-axis shows the testing MSE obtained at each iteration; it represents the quality of the learned models with respect to the full experimental dataset at each iteration. Red boxplots are the results obtained when the set of signaling perturbations was composed of randomly selected experiments of size 74 or 80 for the in silico case, and 32 or 49 for the real case.
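The distinction between learning MSE (computed on the experiments used for learning) and testing MSE (computed on the full dataset) can be illustrated as follows. This is a minimal sketch that assumes measurements normalized to [0, 1]; the species names `stim`, `inhib`, and `readout`, the toy model, and the data values are hypothetical and do not come from the paper.

```python
def mean_squared_error(predict, experiments):
    """Average squared gap between predictions and measurements.

    `experiments` is a list of (perturbation, measurements) pairs, where
    `measurements` maps each measured species to its observed value.
    """
    total = 0.0
    count = 0
    for perturbation, measurements in experiments:
        predicted = predict(perturbation)
        for species, observed in measurements.items():
            total += (predicted[species] - observed) ** 2
            count += 1
    return total / count


def toy_model(perturbation):
    """Hypothetical Boolean model: readout is on if stimulated and not inhibited."""
    return {"readout": perturbation["stim"] & (1 - perturbation["inhib"])}


# Hypothetical full dataset of four experiments.
full_dataset = [
    ({"stim": 0, "inhib": 0}, {"readout": 0.0}),
    ({"stim": 1, "inhib": 0}, {"readout": 1.0}),
    ({"stim": 1, "inhib": 1}, {"readout": 0.5}),
    ({"stim": 0, "inhib": 1}, {"readout": 0.0}),
]
learning_subset = full_dataset[:2]  # experiments used in the learning step

learning_mse = mean_squared_error(toy_model, learning_subset)
testing_mse = mean_squared_error(toy_model, full_dataset)
print(learning_mse, testing_mse)
```

Here the model fits the two learning experiments exactly, while the testing MSE is higher because the third experiment, unseen during learning, is only partially explained.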
Figure 4. Optimal experimental design to discriminate between 32 input–output behaviors. (A) Description of each experimental perturbation. Black squares indicate the presence of the corresponding stimulus (green header) or inhibitor (red header). (B) Number of pairwise differences by measured species with each experimental perturbation.
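The quantity in panel (B), the number of pairwise differences by measured species for a given perturbation, can be computed as in the sketch below. The three behaviors, the species `x` and `y`, and the function name `pairwise_differences` are hypothetical and serve only to make the counting concrete.

```python
from itertools import combinations


def pairwise_differences(behaviors, perturbation, species):
    """Count, per species, the pairs of behaviors that predict different values."""
    predictions = [behavior(perturbation) for behavior in behaviors]
    return {
        s: sum(p[s] != q[s] for p, q in combinations(predictions, 2))
        for s in species
    }


# Three hypothetical input-output behaviors over two stimuli (s[0], s[1]).
behaviors = [
    lambda s: {"x": s[0], "y": s[0] & s[1]},
    lambda s: {"x": s[0], "y": s[0] | s[1]},
    lambda s: {"x": s[1], "y": s[0] | s[1]},
]

print(pairwise_differences(behaviors, (1, 0), ["x", "y"]))  # {'x': 2, 'y': 2}
print(pairwise_differences(behaviors, (1, 1), ["x", "y"]))  # {'x': 0, 'y': 0}
```

A perturbation such as (1, 1), on which every behavior agrees, contributes no differences and therefore cannot help discriminate among the behaviors.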
Figure 5. Trajectories of the testing MSE for three significant cases for . The X-axis shows the number of experiments (optimal signaling perturbations and measurements associated) used for learning at each iteration. The Y-axis shows the testing MSE obtained at each iteration; it represents the quality of the learned models with respect to the full experimental dataset at each iteration.