Abstract
Expression of numerous genes is precisely controlled in a cell in various contexts. While genetic and epigenetic mechanisms contribute to this regulation, how each mechanism cooperates to ensure the proper expression patterns of the whole gene remains unclear. Here, I theoretically show that the repetition of simple biological processes makes cells functional with the appropriate expression patterns of all genes if the inappropriateness of current expression ratios is roughly fed back to the epigenetic states. A learning pair model is developed, in which two factors autonomously approach the target ratio by repeating two stochastic processes: competitive amplification with a small addition term, and decay depending on the difference between the current and target ratios. Furthermore, thousands of factors are self-regulated in a hierarchical-pair architecture, in which the activation degrees competitively amplify, while transducing the activation signal, and decay at four different probabilities. Changes in whole-gene expression during human early embryogenesis and hematopoiesis are reproduced in simulation using this epigenetic learning process in a single genetically-determined hierarchical-pair architecture of gene regulatory cascades. On the background of this learning process, I propose the law of biological inertia, which means that a living cell basically maintains the expression pattern while renewing its contents.
Year: 2022 PMID: 35534510 PMCID: PMC9085877 DOI: 10.1038/s41598-022-10998-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Regulation of two factors through stochastic processes of increase and decrease. (a) Scheme of the simulation to reveal the minimum processes required for learning. The non-negative integer values of factors A and B, x_A and x_B, change by repeating stochastic processes of increase and decrease. Once every 10 repeats on average, A or B is selected at an A:B ratio, and the selected x_A or x_B increases by one unit. At every repeat, each unit in x_A and x_B disappears by decay at a probability of 0.1ε. See text for concrete examples, Table 1 for the definition of parameters, and Fig. 6 for the mathematical formula. (b–m) The number of repetitions of these processes is indicated on the x-axis (b–k, m). The values of x_A (blue) and x_B (orange) (b–h, j, k, m) and the x_A/x_B ratio (i, l) are shown. (b) Additive increase at a 1:1 ratio and decay with a constant probability. (c) Additive increase at a 1:2 ratio and decay with a constant probability. (d) Additive increase at a 1:1 ratio, and x_A and x_B decay with constant probabilities of 0.002 and 0.001, respectively. (e) Additive increase at a 1:1 ratio and decay with probability equal to the mean squared error (MSE) between the current (x_A:x_B) and target (1:2) ratios. (f, g) Competitive amplification with a bias term (increase by one, selecting A or B at a (x_A + 1):(x_B + 1) ratio), and decay with a constant probability. (h, i) Competitive amplification and MSE-dependent decay. Target ratio T_A:T_B is changed from 1:2 to 10:1 after 10^5 repeats. (i) Ratios of x_A to x_B in 10 tests are shown in box and whisker plots indicating the interquartile range, 1.5 × the interquartile range, the mean (cross), and the data in outlier region (circle). (j) Non-competitive amplification and MSE-dependent decay. (k) Competitive amplification without additive increase, setting bias β = 10^-7, and MSE-dependent decay. (l, m) Competitive amplification without bias (A:B = (x_A + 10^-7):(x_B + 10^-7)) or additive increase (A:B = 1:1) is selected at a (1 − γ):γ ratio in an increase process. The x_A/x_B ratio at 10^5 repeats in 10 tests are shown in circles with the mean (cross) and median (red bar) in (m). The target ratio 0.5 is indicated by a dotted line.
Variables and parameters in the models.
| Indicator | Meaning | Values | Comments |
|---|---|---|---|
| A, B | Identifier of two factors | | |
| x_A, x_B | Value of each factor | Non-negative integer variables | Changing by increase and decrease |
| T_A, T_B | Target ratio of each factor | 1:2 (Fig. | Calculated from RNAseq data (Figs. |
| | Probability to enter increase process at each repetition | Constant value 0.1 in Figs. | |
| | | Variable (0.01–0.101) depending on the coverage of the pair in the whole in Fig. | |
| | Constant coefficient of decay probability | 0.1 | Applied in Figs. |
| | Probability to enter decrease process at each repetition | Applied in Figs. | |
| β | Bias. White noise, when In amplification, select A or B at a ( | 1 in Figs. | Constant value to increase additively and to avoid extinction in amplification |
| MSE | Mean squared error between current and target ratios | Same value for | |
| ε | Error, which is equivalent to a parameter of decay probability | Constant (Fig. | |
| γ | Probability to choose additive increase among increase processes | 0 or an indicated constant in range from 0 to 1 | This |
| | | 0 in other figures | |
| Initial ratio | Ratio of each factor in the total at the initial setting | Even distribution in Figs. | |
| | | Expression ratio of genes from RNA-seq data in Figs. | |
| | Initial value of a branch in a pair | 1 in Figs. | |
| | | Initial ratios are summed for genes in the branch of the pair, multiplied by the number of genes (11,281), and rounded to make an integer value, in Fig. | |
In the learning pair model, the values of two factors, x_A and x_B, repeat stochastic processes of increase and decrease. In the increase process, competitive amplification or additive increase is chosen at a (1 − γ):γ ratio; either A or B is then selected, and the value of the selected factor increases by one. In competitive amplification, A or B is selected at the (x_A + β):(x_B + β) ratio. In the additive increase, A or B is selected at a 1:1 ratio. In the decrease process, x decreases by decay at a probability depending on the error value ε.
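The two processes described above can be sketched as a short simulation. This is a minimal illustration under stated assumptions, not the author's code: the names `alpha` (probability of entering the increase process) and `delta` (decay coefficient), the normalised form of the MSE, the value returned for an extinct pair, and the helper `sample_decay` are all hypothetical choices made for the example.

```python
import math
import random


def mean_squared_error(x_a, x_b, t_a, t_b):
    """MSE between current and target ratios, each normalised to sum to 1."""
    total = x_a + x_b
    if total == 0:
        return 1.0  # assumption: an extinct pair is treated as maximally wrong
    current = (x_a / total, x_b / total)
    target = (t_a / (t_a + t_b), t_b / (t_a + t_b))
    return sum((c - t) ** 2 for c, t in zip(current, target)) / 2


def sample_decay(rng, n, p):
    """Number of units lost when each of n units decays independently with probability p."""
    if n == 0 or p <= 0:
        return 0
    if p >= 1:
        return n
    log_q = math.log1p(-p)
    lost, position = 0, -1
    while True:
        u = 1.0 - rng.random()                        # uniform in (0, 1]
        position += int(math.log(u) / log_q) + 1      # jump to the next decaying unit
        if position >= n:
            return lost
        lost += 1


def simulate_pair(t_a, t_b, repeats=100_000, alpha=0.1, delta=0.1,
                  beta=1.0, gamma=0.0, seed=0):
    """Repeat the increase and decrease processes; return the final (x_A, x_B)."""
    rng = random.Random(seed)
    x = [1, 1]
    for _ in range(repeats):
        # Increase process, entered with probability alpha.
        if rng.random() < alpha:
            denominator = x[0] + x[1] + 2 * beta
            if rng.random() < gamma or denominator == 0:
                p_a = 0.5                             # additive increase, A:B = 1:1
            else:
                p_a = (x[0] + beta) / denominator     # competitive amplification
            x[0 if rng.random() < p_a else 1] += 1
        # Decrease process: every unit decays with probability delta * error.
        p_decay = delta * mean_squared_error(x[0], x[1], t_a, t_b)
        x[0] -= sample_decay(rng, x[0], p_decay)
        x[1] -= sample_decay(rng, x[1], p_decay)
    return tuple(x)


if __name__ == "__main__":
    x_a, x_b = simulate_pair(t_a=1, t_b=2)
    print(x_a, x_b)
```

Because the decay probability shrinks as the current ratio approaches the target, the pair is disturbed less the closer it is to the target; individual runs are stochastic, so the final ratio varies with `seed`.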
Figure 6. The law of biological inertia. (a) A conventional model of expression regulation of genes A, B and C. Each gene is regulated by others. The networks become complicated as the number of genes increases. (b) In the learning hierarchical-pair model, each gene or gene-cluster changes the expression level by amplification and decay, which are represented by self-regulation. Competition in each pair is presented as mutual inhibition. Two components in each dotted oval indicate a pair in a hierarchical architecture. The process in each pair proceeds in parallel. (c) The equation represents the law of biological inertia, which means that a living cell basically keeps the expression pattern while renewing the contents. The equation indicates that the ith gene transcription level changes by competitive amplification, error-dependent decay, and additive increase. In Fig. 1 in the range Σx >> Σβ, x ∈ {x_A, x_B}, A = α(1 − γ), E = αε, and B = γαβ/Σ(x + β). The openness degree of chromatin at the ith gene locus, x_i, is maintained when A·x_i/Σx >> B, where B is small noise, and E is the common lowest value.
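The caption names three contributions to the change in the ith gene: competitive amplification, error-dependent decay, and additive increase. One form consistent with that description is given here as an assumption, not as the published equation. It writes x_i for the value of the ith gene, uses the coefficients A, E and B as defined in the caption, and assumes the three terms combine additively per repetition:

```latex
\Delta x_i \;\approx\;
  \underbrace{A\,\frac{x_i}{\sum_j x_j}}_{\text{competitive amplification}}
  \;-\; \underbrace{E\,x_i}_{\text{error-dependent decay}}
  \;+\; \underbrace{B}_{\text{additive increase}}
```

Under this reading, the caption's condition A·x_i/Σx >> B says that the amplification term dominates the noise term, so the current share of each gene tends to be preserved while E stays at its lowest value.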
Assumptions in the learning hierarchical-pair model are supported by biological knowledge.
| Model assumptions | Biological findings | Regulation |
|---|---|---|
| Competition | A transcription factor chooses a binding locus among candidates, depending on the openness ratio of the chromatin | Epigenomic |
| Amplification | Transcriptional coactivators with histone acetyltransferase activity relax the chromatin structure | |
| | Transcription opens the chromatin, and the open chromatin structure induces transcription | |
| Bias (no extinction); additive increase | Whole-genome in every somatic cell | Genetic |
| | Conventional genetic regulation of transcription | |
| Error (approximated)-dependent decay | Cellular stress responses | Dependent on cell and environment; feedback from the current fitness |
| | Rough evaluation of the current state | |
| | Histone deacetylases and DNA methyltransferases close the chromatin structure | |
| | Non-coding RNA-dependent cleavage | |
| | RNA-mediated epigenomic modification | |
| Hierarchical-pair architecture | Signal transduction cascades for gene expression | Genetic |
| | Topologically associated domains (TADs) | |
| Competitive amplification in hierarchical pairs | Active and expressed cascades are preferentially selected and activated | Cell-type dependent; post-translational |
| | Kinase is activated by phosphorylation at multiple sites | |
| Error-dependent decay in hierarchical pairs | Cellular stress responses | Dependent on cell and environment |
| | Dephosphorylation | |
| | Polyubiquitin-dependent degradation | |
| | RNA-mediated epigenomic modification | |
Epigenetic regulations, which are highly variable depending on cell type, can be interpreted as a process of competitive amplification. The decay rate is roughly regulated at several levels by the fitness of the current expression pattern in each pair. The correct expression level of each gene is not supervised in real cells. Instead, two functionally related gene-groups are regulated in a pair, in which the inappropriate expression ratio induces cellular stress, increases the decay, and destabilizes the ratio. As a possible feedback regulation for the error-dependent decay, cleaved mRNA fragments coding excessive proteins may close the genome loci. Hierarchical pairs are genetically determined and consistent in all cell-types.
Figure 2. Regulation of multiple factors by using hierarchical pairs and approximated error. (a–d) Stochastic processes of competitive amplification with β = 1 and MSE-dependent decay are repeated in the model with two, four, and eight factors. (a) Target ratios of each factor in three simulation conditions are shown as three lines with markers. (b) Competition and MSE are calculated in one list that includes all factors. The ratios of each factor after 10^6 repeats are shown. (c, d) The ratios of eight or four factors are determined by hierarchical pairs (d). Each pair independently repeats the stochastic processes 10^5 times. (e–h) The results after 10^5 repeats in the model regulating 64 factors in hierarchical pairs. In MSE, the error is calculated with full accuracy. In “stepwise”, MSE is rounded every 10 folds. The number of steps of error (g) indicates the possible error levels. In “shuffle” (h), the factor with each target value is randomly set in the hierarchical pairs. In (e), correlation coefficients between the target and result ratios in 10 tests are shown in box and whisker plots indicating the interquartile range, 1.5 × the interquartile range, the mean (cross), and the data in outlier region (circle). Black lines in (f–h) indicate the target ratios. (i–m) The results in the model with 2^12 factors after 10^5 repeats. Accurate MSE (i) or approximated stepwise error (j–m) is applied to decay probability. (i, j) The factor with each target value (range 1–4096 as indicated by a red line) is randomly set in the hierarchical pairs. (k) The target ratio is set using the expression ratio in E. coli without antibiotics. Initial condition of pairs is an even distribution. (l, m) Subsequently, from the (k) state, the target ratios for the next 10^5 repeats are reset using gene expression data in the presence of antibiotics. The ratios of the factors before (l) and after (m) the second 10^5 repeats are shown, where r is the correlation coefficient.
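The two ideas in this caption, hierarchical pairs and stepwise error, can be illustrated with a small sketch. It rests on assumptions that the caption does not confirm: factors are paired in list order as a balanced binary tree, "rounded every 10 folds" is read as rounding the MSE down to a power of ten, and the number of steps caps how small the error level can get. The function names `pair_targets`, `leaf_ratios` and `stepwise_error` are hypothetical.

```python
import math


def pair_targets(targets):
    """Targets for every pair, top layer first: (sum of left branch, sum of right branch)."""
    n = len(targets)
    if n < 2 or n & (n - 1):
        raise ValueError("number of factors must be a power of two")
    layers, width = [], n
    while width >= 2:
        half = width // 2
        layers.append([
            (sum(targets[s:s + half]), sum(targets[s + half:s + width]))
            for s in range(0, n, width)
        ])
        width = half
    return layers


def leaf_ratios(layers):
    """Share of each factor in the total: product of branch fractions along its path."""
    shares = [1.0]
    for layer in layers:
        next_shares = []
        for share, (left, right) in zip(shares, layer):
            total = left + right
            if total == 0:
                next_shares += [0.0, 0.0]   # assumption: an empty pair passes on nothing
            else:
                next_shares += [share * left / total, share * right / total]
        shares = next_shares
    return shares


def stepwise_error(mse, steps=4):
    """Approximate the MSE by one of `steps` levels: 1, 0.1, ..., 10**-(steps - 1)."""
    lowest_exponent = -(steps - 1)
    if mse <= 0:
        return 10.0 ** lowest_exponent
    exponent = math.floor(math.log10(mse))           # round down to a power of ten
    return 10.0 ** max(lowest_exponent, min(0, exponent))


if __name__ == "__main__":
    layers = pair_targets([1, 2, 3, 4])
    print(layers)
    print(leaf_ratios(layers))
    print(stepwise_error(0.05), stepwise_error(0.0))
```

In this layout each pair only needs the ratio between its two branches, so every pair can run the two-factor process independently, and the ratio of an individual factor follows from multiplying the branch fractions from the top pair down to that factor.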
Figure 3. Hierarchical clustering of genes for the learning pair model. (a, b) Dendrograms (a) are generated using hierarchical clustering methods from the gene expression pattern (b). (c) Using the expression of 16,921 genes in 20 cells from human early embryos, 6 clustering methods generated hierarchical pairs with the indicated number of layers. In 12 tests using 3 zygote data for initial setting and 12 4-cell data for target ratio, each pair in hierarchical pairs changes the values for 10^5 repeats in the learning pair model with stepwise error. The correlations to the target ratio are shown with the mean (red bar). Paired t-test is used for statistical analysis. (d) Using hierarchical pairs generated with the AreaSum method and stepwise error, the change from zygote to 4-cell stage is tested. Relative expression ratios of 16,921 genes before and after 10^5 repeats are plotted against the target ratio. (e) Gene expression ratios from another dataset of the 4-cell stage are plotted against the target ratio using published scRNA-seq data[19]. (f) In the same settings as in (d), the initial and target ratios are independently shuffled. In the dot plots, relative ratios are plotted after adding 10^-6 to all genes.
Figure 4. Model with an mRNA pool. In the model with an mRNA pool and hierarchical pairs generated by the AreaSum method, stochastic processes are repeated 5 × 10^5 times. The expression probability is equivalent to the expression ratio in Fig. 3d. (a) The change from zygote to 4-cell stage with stepwise error. (b–g) Model with 3-step error. (b) The change from zygote to 4-cell stage is tested. (c) Initial and target ratios are set with independently shuffled zygote and 4-cell data. Ratios after 10^6 repeats are shown. (d–f) Initial and target ratios are set with zygote and blastocyst. Simulation data of mRNA and expression probability are recorded every 250 repeats. (d) Correlation coefficient between the mRNA ratio and the indicated scRNA-seq data. (e) Correlation coefficients between the expression probability and scRNA-seq data are plotted until 2 × 10^4 repeats. (f) The amount of indicated genes in the mRNA pool is plotted. The numbers in parentheses indicate the target level of each gene. The dendrogram indicates the layers in which the genes are paired in the hierarchical pairs. (g–i) Bias β is set to 10^-7, not 1, for the homeostatic state. (g) Model with 3-step error. Initial and target ratios are set with the same 4-cell data. (h, i) Model with 4-step error. (h) Initial and target ratios are set with the 4-cell data. (i) Initial and target ratios are set with zygote and 2-cell stage. In the dot plots, relative ratios are plotted after adding 10^-6 to all genes.
Figure 5. Single model of whole gene expression in early embryogenesis and hematopoiesis. (a–d) Learning hierarchical-pair model with 11,281 genes, an mRNA pool, and 4-step error is applied to differentiation from human zygote to blastocyst. (a) The initial ratio is set with zygote data. Target ratio is changed every 5 × 10^5 repeats from zygote to 2-cell, 4-cell, 8-cell, morula, and blastocyst stages. Bias β is set to 10^-7. Correlation coefficients between the ratios of mRNA and each target are calculated every 250 repeats. (b) Ratios of 11,281 genes in mRNA pool after 1.5 × 10^6 repeats are plotted against the target 4-cell data. (c) Bias β is changed to 1 after 1.5 × 10^6 repeats in (a). Target ratio is changed from 8-cell to morula and blastocyst stages every 5 × 10^5 repeats. (d) Ratios of mRNA at 3 × 10^6 repeats in (c) are plotted against the target blastocyst data. (e–j) The same model is applied to hematopoietic differentiation from progenitors (MLP and GMP) to PBMCs (B cell, T cell, and myeloid cell). Initial ratio is set with a progenitor. During the first 5 × 10^5 repeats, bias β is set to 10^-7 and the target ratio is set with the same progenitor. During the next 5 × 10^5 repeats, bias β is set to 1 and the target ratio is changed to a PBMC. During the last 5 × 10^5 repeats, bias β is set to 10^-7, keeping the same PBMC target. In the dot plots, relative ratios are plotted after adding 10^-6 to all genes.