Hyobin Kim, Stalin Muñoz, Pamela Osuna, Carlos Gershenson.
Abstract
Robustness and evolvability are essential properties for the evolution of biological networks. To determine whether a biological network is robust and/or evolvable, its functions must be compared before and after mutations. However, this comparison can become computationally expensive as the network size grows. Here, we develop a predictive method to estimate the robustness and evolvability of biological networks without an explicit comparison of functions. We measure antifragility in Boolean network models of biological systems and use it as the predictor. Antifragility occurs when a system benefits from external perturbations. Using the differences in antifragility between the original and mutated biological networks, we train a convolutional neural network (CNN) and test it on classifying the properties of robustness and evolvability. We found that our CNN model successfully classified the properties. Thus, we conclude that our antifragility measure can be used as a predictor of the robustness and evolvability of biological networks.
Keywords: Boolean networks; antifragility; complexity; convolutional neural networks; evolvability; gene regulatory networks; prediction; robustness
Year: 2020 PMID: 33286756 PMCID: PMC7597304 DOI: 10.3390/e22090986
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. An example random Boolean network (RBN) with N = 3 and K = 2 and its state transition diagram. The topology is randomly generated and Boolean functions are randomly assigned to each node. The state transition diagram is composed of 2^3 = 8 state configurations, from 000 to 111, and the transitions among them. In the state transition diagram, attractors are the configurations drawn with bold dashed lines; the remaining configurations make up the basins of attraction.
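The construction described in the Figure 1 caption can be sketched in a few lines. This is a minimal illustration, assuming synchronous updates, K distinct inputs per node (self-inputs allowed), and uniformly random truth tables; the function names `make_rbn`, `step`, and `find_attractors` are hypothetical and not taken from the paper.

```python
import itertools
import random

def make_rbn(n, k, seed=0):
    """Random topology and random Boolean functions (assumed uniform)."""
    rng = random.Random(seed)
    inputs = [rng.sample(range(n), k) for _ in range(n)]          # K inputs per node
    tables = [[rng.randint(0, 1) for _ in range(2 ** k)] for _ in range(n)]
    return inputs, tables

def step(state, inputs, tables):
    """Synchronous update: every node reads its inputs from the same state."""
    nxt = []
    for node_inputs, table in zip(inputs, tables):
        idx = 0
        for i in node_inputs:
            idx = (idx << 1) | state[i]                           # truth-table row
        nxt.append(table[idx])
    return tuple(nxt)

def find_attractors(n, inputs, tables):
    """Follow every one of the 2^n configurations until it revisits a state."""
    attractors = set()
    for start in itertools.product((0, 1), repeat=n):
        seen = {}
        state, t = start, 0
        while state not in seen:
            seen[state] = t
            state = step(state, inputs, tables)
            t += 1
        first = seen[state]                                       # cycle entry time
        attractors.add(frozenset(s for s, time in seen.items() if time >= first))
    return attractors

inputs, tables = make_rbn(3, 2, seed=1)
attractors = find_attractors(3, inputs, tables)
print(len(attractors), "attractor(s) among", 2 ** 3, "configurations")
```

Exhaustive enumeration is only practical for small N, since the state space doubles with every added node.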
Different Boolean network models of 37 biological systems used for simulations.
| Biological Network ¹ | No. of Nodes | No. of Links | Ref. |
|---|---|---|---|
| #1. Cortical area development (cortical) | 5 | 14 | [ |
| #2. Cell cycle transcription by coupled CDK and network oscillators (cycle-cdk) | 9 | 19 | [ |
| #3. Mammalian cell cycle (core-cell-cycle) | 10 | 35 | [ |
| #4. Toll pathway of Drosophila signaling pathway (toll-drosophila) | 11 | 11 | [ |
| #5. Metabolic interactions in the gut microbiome (metabolic) | 12 | 30 | [ |
| #6. Regulation of the L-arabinose operon of Escherichia coli (l-arabinose-operon) | 13 | 18 | [ |
| #7. Lac operon (lac-operon-bistability) | 13 | 22 | [ |
| #8. Arabidopsis thaliana cell-cycle (arabidopsis) | 14 | 66 | [ |
| #9. Fanconi anemia and checkpoint recovery (anemia) | 15 | 66 | [ |
| #10. Cardiac development (cardiac) | 15 | 38 | [ |
| #11. BT474 breast cell line short-term ErbB network (bt474-ErbB) | 16 | 46 | [ |
| #12. SKBR3 breast cell line short-term ErbB network (skbr3-short) | 16 | 41 | [ |
| #13. Neurotransmitter signaling pathway (neurotransmitter) | 16 | 22 | [ |
| #14. HCC1954 breast cell line short-term ErbB network (hcc1954-ErbB) | 16 | 46 | [ |
| #15. Body segmentation in Drosophila (body-drosophila) | 17 | 29 | [ |
| #16. CD4+ T cell differentiation and plasticity (cd4) | 18 | 78 | [ |
| #17. Budding yeast cell cycle (budding-yeast) | 18 | 59 | [ |
| #18. T-LGL survival network (t-lgl-survival) | 18 | 43 | [ |
| #19. VEGF pathway of Drosophila signaling pathway (vegf-drosophila) | 18 | 18 | [ |
| #20. Oxidative stress pathway (oxidative-stress) | 19 | 32 | [ |
| #21. Human gonadal sex determination (gonadal) | 19 | 79 | [ |
| #22. Mammalian cell-cycle (mammalian) | 20 | 51 | [ |
| #23. Budding yeast cell cycle (yeast-cycle) | 20 | 42 | [ |
| #24. B cell differentiation (b-cell) | 22 | 39 | [ |
| #25. Iron acquisition and oxidative stress response in Aspergillus fumigatus (aspergillus-fumigatus) | 22 | 38 | [ |
| #26. FGF pathway of Drosophila signaling pathways (fgf-drosophila) | 23 | 24 | [ |
| #27. T cell differentiation (t-cell-differentiation) | 23 | 34 | [ |
| #28. Aurora kinase A in neuroblastoma (aurka) | 23 | 43 | [ |
| #29. Processing of Spz Network from the Drosophila signaling pathway (spz-drosophila) | 24 | 28 | [ |
| #30. TOL regulatory network (tol) | 24 | 48 | [ |
| #31. HH pathway of Drosophila signaling pathways (hh-drosophila) | 24 | 32 | [ |
| #32. HCC1954 breast cell line long-term ErbB network (hcc1954) | 25 | 70 | [ |
| #33. SKBR3 breast cell line long-term ErbB network (skbr3-long) | 25 | 81 | [ |
| #34. BT474 breast cell line long-term ErbB network (bt474) | 25 | 70 | [ |
| #35. Wg pathway of Drosophila signaling pathways (wg-drosophila) | 26 | 29 | [ |
| #36. Trichostrongylus retortaeformis (trichostrongylus) | 26 | 58 | [ |
| #37. Pro-inflammatory tumor microenvironment in acute lymphoblastic leukemia (leukemia) | 26 | 81 | [ |
¹ Data were obtained from Cell Collective (https://research.cellcollective.org/?dashboard=true#).
Figure 2. Schematic diagram of the four classes of robustness and evolvability. Depending on how the attractors change between the original and mutated networks, each network falls into exactly one of four classes: not robust & not evolvable, not robust & evolvable, robust & not evolvable, or robust & evolvable.
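The caption does not spell out the decision rule. A minimal sketch follows, assuming a network counts as robust when every original attractor survives the mutation and as evolvable when the mutation produces at least one attractor that the original lacked. Both criteria and the function name `classify` are assumptions made for illustration, not the authors' stated definitions.

```python
def classify(original_attractors, mutated_attractors):
    """Assign one of the four classes from two collections of attractors.

    Each attractor is any hashable value, e.g. a frozenset of state strings.
    Assumed rule: robust = no original attractor lost,
                  evolvable = at least one new attractor gained.
    """
    original = set(original_attractors)
    mutated = set(mutated_attractors)
    robust = original <= mutated
    evolvable = bool(mutated - original)
    left = "robust" if robust else "not robust"
    right = "evolvable" if evolvable else "not evolvable"
    return f"{left} & {right}"

# Hypothetical attractors: one fixed point and one 2-cycle.
a = frozenset({"000"})
b = frozenset({"011", "101"})
c = frozenset({"111"})

print(classify([a, b], [a, b]))      # attractors unchanged
print(classify([a, b], [a, b, c]))   # originals kept, one new
print(classify([a, b], [a]))         # one lost, none new
print(classify([a, b], [a, c]))      # one lost, one new
```

Under this rule the two Boolean tests are independent, which is why every mutated network lands in exactly one of the four classes.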
Figure 3. An example showing how to calculate the emergence of each node and their average, E. Since the average emergence E of the network is 0.612, the complexity is C = 4 × E × (1 − E) = 4 × 0.612 × 0.388 ≈ 0.950. The state transitions were obtained from t = 0 to t = 6 in the example RBN of Figure 1, starting from the initial configuration 100.
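The calculation in Figure 3 can be sketched as follows, assuming that a node's emergence is the base-2 Shannon entropy of its observed states over the trajectory and that complexity is C = 4 × E × (1 − E). Both definitions are stated here as assumptions, and the trajectory below is hypothetical rather than the one drawn in the figure.

```python
from math import log2

def emergence(series):
    """Base-2 Shannon entropy of one node's states over time (assumed definition)."""
    n = len(series)
    e = 0.0
    for value in set(series):
        p = series.count(value) / n
        e -= p * log2(p)
    return e

def average_emergence(trajectory):
    """Per-node emergence and its mean for a list of configuration strings."""
    n_nodes = len(trajectory[0])
    per_node = [emergence([cfg[i] for cfg in trajectory]) for i in range(n_nodes)]
    return per_node, sum(per_node) / n_nodes

def complexity(e):
    """Assumed definition: C = 4 * E * (1 - E), maximal at E = 0.5."""
    return 4.0 * e * (1.0 - e)

# Hypothetical trajectory for t = 0..6, NOT the one shown in Figure 3.
trajectory = ["100", "001", "010", "101", "010", "101", "010"]
per_node, avg_e = average_emergence(trajectory)
print(per_node, avg_e, complexity(avg_e))

# The value quoted in the caption, E = 0.612, gives C of about 0.9498.
print(complexity(0.612))
```

The factor 4 scales C to the range [0, 1], so a node that never changes (E = 0) and a node that looks like a fair coin (E = 1) both yield zero complexity.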
Figure 4. Data points related to the antifragility of the mammalian cell cycle network with N = 20. In the simulations, the parameters were set to number of perturbed nodes X = [1, 2, …, 20], simulation time for state transitions T = 200, and perturbation frequency O = 1: (a) 20 data points on the antifragility of the original network; (b) 20 data points on the antifragility of the mutated network; (c) 30 data points on the differences in antifragility, estimated through interpolation in the normalized range.
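Panel (c) turns N antifragility values per network into a fixed 30 points, which presumably lets networks of different sizes share one CNN input length. A rough sketch follows, assuming the x-axis is normalized as X/N, the interpolation is piecewise linear, and the difference is taken as mutated minus original. All three choices, the helper names, and the sample values are assumptions for illustration.

```python
def interpolate(xs, ys, x):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    x = min(max(x, xs[0]), xs[-1])          # clamp against rounding at the ends
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        if x0 <= x <= x1:
            return ys[i] + (ys[i + 1] - ys[i]) * (x - x0) / (x1 - x0)
    raise ValueError("x outside interpolation range")

def resample(values, m=30):
    """Map N values at X = 1..N onto m evenly spaced points of X/N."""
    n = len(values)
    xs = [(i + 1) / n for i in range(n)]    # normalized perturbed-node fraction
    lo, hi = xs[0], xs[-1]
    grid = [lo + (hi - lo) * j / (m - 1) for j in range(m)]
    return [interpolate(xs, values, g) for g in grid]

# Hypothetical antifragility curves for a network with N = 20.
original = [0.01 * x for x in range(1, 21)]
mutated = [0.012 * x for x in range(1, 21)]

diff = [mu - o for o, mu in zip(resample(original), resample(mutated))]
print(len(diff))
```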
Figure 5. Illustration of the nested k-fold cross-validation process (k = 4). In the inner loop, the hyperparameter values are selected and the model parameters are fitted. In the outer loop, the model performance is evaluated.
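The index bookkeeping behind nested cross-validation can be sketched without any machine learning library. The sketch assumes that each outer iteration holds out one of the four folds for testing and that the inner loop rotates the validation role among the remaining three folds, which would be consistent with the 4 × 3 layout of the accuracy tables; the generator name `nested_splits` is hypothetical.

```python
def nested_splits(n_samples, k=4):
    """Yield (inner_splits, test_indices) for each of the k outer folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]   # interleaved folds
    for outer in range(k):
        test = folds[outer]
        remaining = [f for j, f in enumerate(folds) if j != outer]
        inner = []
        for v, validation in enumerate(remaining):
            train = [x for j, f in enumerate(remaining) if j != v for x in f]
            inner.append((train, validation))
        yield inner, test

for inner, test in nested_splits(12, k=4):
    for train, validation in inner:
        # Inner loop: fit on `train`, compare hyperparameters on `validation`.
        pass
    # Outer loop: refit with the chosen hyperparameters, score once on `test`.
    print(len(inner), "inner splits, test fold:", test)
```

Keeping the test fold out of the inner loop entirely is what prevents hyperparameter selection from leaking into the performance estimate.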
Figure 6. Convolutional neural network (CNN) architectures used for the simulations: (a) our simple CNN model has two convolution layers and one pooling layer; (b) our complex CNN model has four convolution layers and two pooling layers.
Twelve hyperparameter sets for the simulations in the inner loop.
| Set | Epoch | Batch Size | Balancing | Architecture | AUC final |
|---|---|---|---|---|---|
| hyp1 | 128 | 32 | SMOTE | simple | 0.8265 |
| hyp2 | 128 | 64 | SMOTE | simple | 0.8295 |
| hyp3 | 128 | 128 | SMOTE | simple | 0.8251 |
| hyp4 | 128 | 32 | SMOTE | complex | 0.8075 |
| hyp5 | 128 | 64 | SMOTE | complex | 0.8101 |
| hyp6 | 128 | 128 | SMOTE | complex | 0.8056 |
| hyp7 | 128 | 32 | ADASYN | simple | 0.8151 |
| hyp8 | 128 | 64 | ADASYN | simple | 0.8124 |
| hyp9 | 128 | 128 | ADASYN | simple | 0.8137 |
| hyp10 | 128 | 32 | ADASYN | complex | 0.7922 |
| hyp11 | 128 | 64 | ADASYN | complex | 0.7931 |
| hyp12 | 128 | 128 | ADASYN | complex | 0.7941 |
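The twelve sets are the full grid of one epoch setting, three batch sizes, two balancing methods, and two architectures. The sketch below regenerates the grid in table order and picks the set with the highest final AUC; treating the highest AUC as the selection criterion is an assumption, and the AUC values are copied from the table.

```python
from itertools import product

# Order matches the table: balancing varies slowest, batch size fastest.
grid = [
    {"name": f"hyp{i}", "epochs": 128, "batch_size": batch,
     "balancing": balancing, "architecture": architecture}
    for i, (balancing, architecture, batch) in enumerate(
        product(["SMOTE", "ADASYN"], ["simple", "complex"], [32, 64, 128]),
        start=1,
    )
]

# Final AUC per set, copied from the table above.
auc = {
    "hyp1": 0.8265, "hyp2": 0.8295, "hyp3": 0.8251,
    "hyp4": 0.8075, "hyp5": 0.8101, "hyp6": 0.8056,
    "hyp7": 0.8151, "hyp8": 0.8124, "hyp9": 0.8137,
    "hyp10": 0.7922, "hyp11": 0.7931, "hyp12": 0.7941,
}

best = max(grid, key=lambda h: auc[h["name"]])
print(best)  # hyp2: SMOTE, simple architecture, batch size 64
```

In this table the simple architecture scores higher than the complex one for every balancing method and batch size, and SMOTE scores higher than ADASYN for every architecture and batch size.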
Training accuracy.
| Row 1 | Row 2 | Row 3 |
|---|---|---|
| 0.7266 | 0.7288 | 0.6929 |
| 0.7410 | 0.7362 | 0.7433 |
| 0.7324 | 0.7085 | 0.7134 |
| 0.6889 | 0.7030 | 0.7212 |
| avg. = 0.7197 | | |
Validation accuracy.
| Row 1 | Row 2 | Row 3 |
|---|---|---|
| 0.5688 | 0.5802 | 0.5815 |
| 0.5965 | 0.5661 | 0.6003 |
| 0.6078 | 0.5828 | 0.5547 |
| 0.5682 | 0.5742 | 0.5630 |
| avg. = 0.5787 | | |
Figure 7. Attractors and basins of attraction of the 37 biological networks. The number in parentheses on the y-axis indicates the number of nodes in the network: (a) the number of attractors, shown as natural logarithms (base e ≈ 2.718), so 0 means that the biological network has a single attractor; (b) the average length of attractors; (c) normalized basin entropy, which ranges over [0, 1]. The more evenly the basin sizes of the attractors are distributed in the state space, the larger the normalized entropy.
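The caption gives the range of the normalized basin entropy but not the normalization itself. One way to obtain a value in [0, 1] is sketched below, assuming the Shannon entropy of the basin-size fractions is divided by the logarithm of the number of attractors and that a single-attractor network is assigned 0. This normalization and the function name are assumptions, not the paper's stated formula.

```python
from math import log

def normalized_basin_entropy(basin_sizes):
    """Entropy of basin-size fractions, scaled to [0, 1] (assumed normalization)."""
    if len(basin_sizes) < 2:
        return 0.0                      # a single basin carries no uncertainty
    total = sum(basin_sizes)
    entropy = 0.0
    for size in basin_sizes:
        if size > 0:
            p = size / total
            entropy -= p * log(p)
    return entropy / log(len(basin_sizes))

# Hypothetical basin sizes in an 8-state space (N = 3).
print(normalized_basin_entropy([8]))        # single attractor
print(normalized_basin_entropy([4, 4]))     # perfectly even basins
print(normalized_basin_entropy([7, 1]))     # one dominant basin
```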
Figure 8. Percentage frequency distribution of the four classes of robustness and evolvability for the 37 biological networks. A different internal perturbation was applied to each network 1000 times, so 1000 different mutated networks were generated per biological network. The mutated networks were classified as not robust & not evolvable, not robust & evolvable, robust & not evolvable, or robust & evolvable.
A contingency table for the four classes depending on the mutation type.
| Add | Delete | Change | Flip |
|---|---|---|---|
| 2180 | 549 | 1731 | 532 |
| 11443 | 1204 | 6067 | 1889 |
| 2380 | 642 | 1873 | 3893 |
| 958 | 241 | 754 | 664 |
Test accuracy.
| Accuracy |
|---|
| 0.5794 |
| 0.5676 |
| 0.5897 |
| 0.6015 |
| avg. = 0.5845 |
Figure 9. Model evaluation: (a) normalized confusion matrix; (b) micro-averaged precision-recall curve for the four classes and its average precision (AP) score for the test data.
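For readers unfamiliar with panel (a), a normalized confusion matrix divides each count by a row or column total. The sketch below assumes normalization over true labels, so that each row sums to 1 and the diagonal reads as per-class recall; the caption does not say which normalization was used, and the counts are hypothetical.

```python
def normalize_rows(matrix):
    """Divide each row by its sum; rows are true classes (assumed convention)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized

# Hypothetical counts: rows = true class, columns = predicted class.
counts = [
    [60, 25, 10, 5],
    [40, 300, 50, 10],
    [10, 30, 150, 10],
    [5, 20, 15, 10],
]

normalized = normalize_rows(counts)
recall_per_class = [normalized[i][i] for i in range(len(normalized))]
print(recall_per_class)
```

Row normalization matters here because the four classes are strongly imbalanced, so raw counts would be dominated by the largest class.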
Figure 10. Precision-recall curves and average precision (AP) scores for the four classes for the test data: (a) not robust & not evolvable; (b) not robust & evolvable; (c) robust & not evolvable; (d) robust & evolvable.
Comparison of AP values between our model and random classifiers.
| AP of Our Classifier | AP of Random Classifier |
|---|---|
| 0.52 | 0.135 |
| 0.64 | 0.557 |
| 0.81 | 0.238 |
| 0.17 | 0.071 |
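The random-classifier column can be cross-checked against the contingency table. The sketch assumes that a random classifier's AP equals the prevalence of the class and that both tables list the four classes in the same row order; the counts and AP values are copied from the tables, and the `lift` ratio is an illustrative quantity not reported in the source.

```python
# Rows: the four classes; columns: Add, Delete, Change, Flip (from the contingency table).
counts = [
    [2180, 549, 1731, 532],
    [11443, 1204, 6067, 1889],
    [2380, 642, 1873, 3893],
    [958, 241, 754, 664],
]

total = sum(sum(row) for row in counts)          # 37 networks x 1000 mutations
prevalence = [sum(row) / total for row in counts]

# AP of the trained classifier, copied from the comparison table.
model_ap = [0.52, 0.64, 0.81, 0.17]

# How many times better than the prevalence baseline each class is (illustrative).
lift = [ap / base for ap, base in zip(model_ap, prevalence)]

print(total)                                     # 37000
print([round(p, 3) for p in prevalence])         # [0.135, 0.557, 0.238, 0.071]
print([round(x, 2) for x in lift])
```

The row totals reproduce the random-classifier column to three decimals, which supports reading that column as class prevalence.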