Taylor Firman, Jonathan Huihui, Austin R Clark, Kingshuk Ghosh.
Abstract
Learning the underlying details of a gene network with feedback is critical in designing new synthetic circuits. Yet quantitative characterization of these circuits remains limited, because experiments measure only partial information from which the details of the circuit must be inferred. One potentially useful avenue is to harness hidden information from single-cell stochastic gene expression time trajectories measured over long periods, recorded at frequent intervals, across multiple cells. This raises a feasibility vs. accuracy dilemma when deciding between different models for mining these stochastic trajectories. We demonstrate that inference based on the Maximum Caliber (MaxCal) principle is the method of choice by critically evaluating its computational efficiency and accuracy against two other typical modeling approaches: (i) a detailed model (DM) with explicit consideration of multiple molecules, including protein-promoter interaction, and (ii) a coarse-grained model (CGM) using Hill-type functions to model feedback. MaxCal provides a reasonably accurate model while being significantly more computationally efficient than DM and CGM. Furthermore, MaxCal requires minimal assumptions since it is a top-down approach, and it allows systematic model improvement by including higher-order constraints, in contrast to traditional bottom-up approaches that require more parameters or ad hoc assumptions. Thus, based on efficiency, accuracy, and the ability to build minimal models, we propose MaxCal as a superior alternative to traditional approaches (DM, CGM) for inferring the underlying details of gene circuits with feedback from limited data.
Keywords: Maximum Caliber; gene network; inference
Year: 2021 PMID: 33802879 PMCID: PMC8002683 DOI: 10.3390/e23030357
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Comparison of accuracy between three models for SGAA. The first row reports the three known (“True”) rates used to generate the synthetic data: effective production in the basal state, production in the activated state, and protein degradation. The rates inferred using the three models, DM (second row), CGM (third row), and MaxCal (fourth row), are compared against each other and against the “True” rates, indicating that CGM is less accurate than DM and MaxCal. Error bars for rates were obtained by running inference on ten replicates of the input trajectory data.
| Method | | | |
|---|---|---|---|
| True | | | |
| DM | | | |
| CGM | | | |
| MaxCal | | | |
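
The CGM compared above models feedback with Hill-type functions. As a minimal sketch, assuming a self-activating gene whose production rate rises from a basal value to an activated value as the protein count grows, such a function might look like the following. The names `hill_production_rate`, `half_max`, and `hill_coeff` are illustrative and are not taken from the paper, whose exact functional form may differ.

```python
def hill_production_rate(n, basal_rate, activated_rate, half_max, hill_coeff):
    """Hill-type production rate as a function of the protein count n.

    Assumed form (not confirmed by the source): the rate interpolates between
    basal_rate (n = 0) and activated_rate (n much larger than half_max).
    """
    if n < 0 or half_max <= 0:
        raise ValueError("n must be non-negative and half_max must be positive")
    # Fraction of full activation: 0 at n = 0, 0.5 at n = half_max, approaches 1.
    activation = n ** hill_coeff / (half_max ** hill_coeff + n ** hill_coeff)
    return basal_rate + (activated_rate - basal_rate) * activation
```

For a repressing circuit, the same sketch could be adapted by letting the rate fall from the basal value toward a lower repressed value as `n` grows.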
Comparison of efficiency between three models for SGAA. The first column names the model. The second column reports the maximum number of proteins used in FSP, the third the overall matrix dimension, the fourth the average time (on a CPU platform) for the basic matrix operation needed for one likelihood calculation, and the fifth the total time taken by the entire likelihood-maximization process to infer the model parameters.
| Method | Max proteins (FSP) | Matrix Size | Unit Operation Time (ms) | Total Time (ms) |
|---|---|---|---|---|
| DM | 92 | | | |
| CGM | 92 | | | |
| MaxCal | 92 | | | |
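
To make the quantities in this table concrete, the following is a minimal sketch of a likelihood calculation on a truncated state space. It assumes a single protein species with birth-death dynamics, a trajectory of protein counts sampled at a fixed interval `dt`, and a plain Taylor-series matrix exponential. All function names are hypothetical, and this is not the authors' implementation of DM, CGM, or MaxCal.

```python
import math


def build_generator(n_max, production_rate, degradation_rate):
    """Rate matrix q[i][j] (rate of jumping from i to j proteins) on states 0..n_max."""
    size = n_max + 1
    q = [[0.0] * size for _ in range(size)]
    for n in range(size):
        birth = production_rate(n)      # may depend on n to encode feedback
        death = degradation_rate * n    # each protein degrades independently
        if n < n_max:
            q[n][n + 1] = birth
        if n > 0:
            q[n][n - 1] = death
        # Outflow past n_max stays in the diagonal, so probability leaking
        # beyond the truncation is lost rather than reflected back.
        q[n][n] = -(birth + death)
    return q


def mat_mul(a, b):
    """Dense matrix product; this is the unit operation whose cost grows with size."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def transition_matrix(q, dt, terms=18):
    """Approximate exp(q * dt) by a Taylor series with scaling and squaring."""
    size = len(q)
    norm = max(abs(q[i][i]) for i in range(size)) * dt
    squarings = max(0, math.ceil(math.log2(norm / 0.5))) if norm > 0 else 0
    scale = dt / 2 ** squarings
    a = [[q[i][j] * scale for j in range(size)] for i in range(size)]
    identity = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    result = [row[:] for row in identity]
    term = identity
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def log_likelihood(trajectory, p):
    """Sum of log transition probabilities along consecutive observations."""
    return sum(math.log(max(p[i][j], 1e-300))
               for i, j in zip(trajectory, trajectory[1:]))
```

Every entry of `trajectory` must lie between 0 and `n_max`. In this sketch the cost of `mat_mul` grows with the cube of the matrix dimension, which is one reason the matrix size reported in the table matters for run time; a parameter search would call `log_likelihood` repeatedly with different rates.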
Comparison of accuracy between three models for TS. The first row reports the three known (“True”) rates used to generate the synthetic data: effective production in the basal state, production in the repressed state, and protein degradation. The rates inferred using the three models, DM (second row), CGM (third row), and MaxCal (fourth row), are compared against each other and against the “True” rates, indicating that CGM is less accurate than DM and MaxCal. Error bars for rates were obtained by running inference on ten replicates of the input trajectory data.
| Method | | | |
|---|---|---|---|
| True | | | |
| DM | | | |
| CGM | | | |
| MaxCal | | | |
Comparison of efficiency between three models for TS. The first column names the model. The second column reports the maximum number of proteins used in FSP, the third the overall matrix dimension, the fourth the average time (on a GPU platform) for the basic matrix operation needed for one likelihood calculation, and the fifth the average total time taken by the entire likelihood-maximization process to infer the model parameters.
| Method | Max proteins (FSP) | Matrix Size | Unit Operation Time (s) | Total Time (s) |
|---|---|---|---|---|
| DM | 59 | 13,925 × 13,925 | | 106,878 ± 19,765 |
| CGM | 59 | | | 14,124 ± 12,493 |
| MaxCal | 59 | | | |