Gustavo T Naozuka, Heber L Rocha, Renato S Silva, Regina C Almeida.
Abstract
Machine learning methods have revolutionized studies in several areas of knowledge, helping to understand and extract information from experimental data. Recently, these data-driven methods have also been used to discover the structure of mathematical models. The sparse identification of nonlinear dynamics (SINDy) method was proposed to identify nonlinear dynamical systems, assuming that only a few important terms in the equations govern the dynamics. Given a library of candidate terms, the SINDy approach solves a sparse regression problem by eliminating terms whose coefficients are smaller than a threshold. However, the choice of this threshold is decisive for the correct identification of the model structure. In this work, we build on the SINDy method by integrating it with a global sensitivity analysis (SA) technique that ranks terms according to their importance for the desired quantity of interest, thus circumventing the need to define the SINDy threshold. The proposed SINDy-SA framework also includes the formulation of different experimental settings, recalibration of each identified model, and the use of model selection techniques to select the best and most parsimonious model. We investigate the use of the proposed SINDy-SA framework in a variety of applications and compare the results against the original SINDy method. The results demonstrate that the SINDy-SA framework is a promising methodology to accurately identify interpretable data-driven models. Supplementary Information: The online version contains supplementary material available at 10.1007/s11071-022-07755-2.
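The thresholding step that the abstract attributes to the original SINDy method can be illustrated with a minimal sketch of sequentially thresholded least squares, the regression commonly paired with SINDy. The function name `stlsq`, the toy equation, the candidate library, and the threshold value are all assumptions chosen for illustration; none of them is taken from the paper.

```python
import numpy as np

def stlsq(theta, dxdt, threshold, max_iter=10):
    """Fit coefficients, zero those below `threshold`, refit the rest, repeat."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        big = ~small
        if not big.any():
            break  # every candidate term was eliminated
        xi[big] = np.linalg.lstsq(theta[:, big], dxdt, rcond=None)[0]
    return xi

# Hypothetical toy problem: dx/dt = 1.0*x - 0.5*x**2, sampled without noise.
x = np.linspace(0.1, 1.9, 50)
dxdt = 1.0 * x - 0.5 * x**2

# Candidate library with columns 1, x, x^2, x^3.
theta = np.column_stack([np.ones_like(x), x, x**2, x**3])

xi = stlsq(theta, dxdt, threshold=0.1)
print(xi)  # coefficients for the terms 1, x, x^2, x^3
```

In this sketch the result depends directly on `threshold`: a value above 0.5 would also remove the true quadratic term, which is the sensitivity to the threshold that the abstract describes.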
Keywords: Data-driven methods; Differential equations; Model selection; Sensitivity analysis; Sparse identification
Year: 2022 PMID: 36060282 PMCID: PMC9424817 DOI: 10.1007/s11071-022-07755-2
Source DB: PubMed Journal: Nonlinear Dyn ISSN: 0924-090X Impact factor: 5.741
Mathematical definitions for the mean and the standard deviation for each iteration of the SINDy-SA method, according to three conditions: , , and . W denotes the window size of previous iterations used to check the condition (8)
Fig. 1 Schematic representation of the framework for solving the problem of identifying nonlinear dynamical systems, using the proposed SINDy-SA approach. The flowchart detailing the SINDy-SA method is shown in Figure SM-1 in the Supplementary Material
Behavior of the SINDy-SA method for the prey–predator model (10), using the maximum degree of the polynomial and the tuning parameter We present , , , and at each iteration of the method, dropping the subscript to ease notation. The identified model is highlighted in italics
| Iteration | Model | SSE | Mean | Margin of error |
|---|---|---|---|---|
| 0 | | 0.065 | – | – |
| 1 | | 0.125 | 0.065 | 0.651 |
| 2 | | 0.153 | 0.095 | 2.990 |
| 3 | | 0.205 | 0.114 | 3.648 |
| 4 | | | | |
| 5 | | 1115.975 | 0.188 | 2.501 |
Fig. 2 SSE between the derivatives and , mean and margin of error at each iteration of the SINDy-SA method, considering the iteration window and the scaling factor
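One way to read the quantities in this caption is as a jump detector: the method keeps eliminating terms until the SSE leaves the band given by the mean plus a margin of error computed over a window of previous iterations. The sketch below illustrates that reading only. The exact stopping condition is not reproduced in this text, so the rule `SSE > mean + scale * std`, the function name `first_jump`, the default `window` and `scale`, and the SSE values are all assumptions.

```python
import statistics

def first_jump(sse, window=4, scale=100.0):
    """Return the first iteration whose SSE exceeds mean + scale*std of the
    previous `window` SSE values, or None if no such iteration exists."""
    for tau in range(1, len(sse)):
        prev = sse[max(0, tau - window):tau]
        if len(prev) < 2:
            continue  # a spread needs at least two earlier values
        mean = statistics.fmean(prev)
        margin = scale * statistics.pstdev(prev)
        if sse[tau] > mean + margin:
            return tau
    return None

# Illustrative SSE history (hypothetical values): slow growth, then a sharp jump.
sse_history = [0.07, 0.12, 0.15, 0.20, 0.25, 1100.0]
jump = first_jump(sse_history)
print(jump)  # iteration at which the SSE leaves the band
```

Under this assumed rule, the model kept would be the one from the iteration just before the jump, since removing one more term is what degrades the fit.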
Fig. 3 Heatmap of the total scores for all candidate functions of the dynamical system in each iteration of the SINDy-SA method. The corresponding combined sensitivity indices are indicated inside each heatmap cell. The darkest color indicates terms to be eliminated in the current iteration; white indicates terms eliminated in previous iterations; the lightest gray indicates the more important terms
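The idea of ranking candidate terms by their influence rather than by coefficient size can be sketched with a simple one-at-a-time perturbation. This is a local stand-in chosen for brevity, not the global sensitivity analysis technique used by SINDy-SA, and the function name `oat_sensitivity`, the library, and the coefficient values are hypothetical.

```python
import numpy as np

def oat_sensitivity(theta, xi, rel_step=0.1):
    """Mean absolute change in the predicted derivative when each coefficient
    is increased by the fraction `rel_step`, one coefficient at a time."""
    base = theta @ xi
    index = np.zeros(len(xi))
    for j in range(len(xi)):
        perturbed = xi.copy()
        perturbed[j] *= 1.0 + rel_step
        index[j] = np.mean(np.abs(theta @ perturbed - base))
    return index

# Hypothetical library with columns 1, x, x^3 and hypothetical coefficients.
x = np.linspace(1.0, 10.0, 50)
theta = np.column_stack([np.ones_like(x), x, x**3])
xi = np.array([0.5, 0.05, 0.01])

index = oat_sensitivity(theta, xi)
least_important = int(np.argmin(index))      # term a sensitivity ranking would drop
smallest_coeff = int(np.argmin(np.abs(xi)))  # term a coefficient threshold would drop
print(least_important, smallest_coeff)
```

In this toy setting the cubic term has the smallest coefficient but the largest influence on the derivative, so a coefficient threshold and a sensitivity ranking would eliminate different terms.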
Model selection results for the prey–predator application. Subscripts indicate the model selection criterion weights. Models selected as the best ones are indicated by italics
| Method | Model | No. of parameters | SSE | AIC | AICc | BIC | AIC_w | AICc_w | BIC_w |
|---|---|---|---|---|---|---|---|---|---|
| SINDy-SA | 1 | | | | | | | | |
| | 2 | 18 | 0.950 | | | | 0.000 | 0.000 | 0.000 |
| | 3 | 22 | 0.380 | | | | 0.000 | 0.000 | 0.000 |
| | 4 | 28 | 650.070 | 291.753 | 301.250 | 384.106 | 0.000 | 0.000 | 0.000 |
| SINDy | 1 | | | | | | | | |
| | 2 | 8 | 1339.752 | 396.384 | 397.138 | 422.771 | 0.000 | 0.000 | 0.000 |
| | 3 | 6 | 94.667 | | | | 0.000 | 0.000 | 0.000 |
| | 4 | 4 | 760.018 | 275.005 | 275.210 | 288.198 | 0.000 | 0.000 | 0.000 |
| | 5 | 1 | 821.749 | 284.624 | 284.644 | 287.922 | 0.000 | 0.000 | 0.000 |
| | 6 | 6 | 1062.334 | 345.981 | 346.417 | 365.771 | 0.000 | 0.000 | 0.000 |
| | 7 | 8 | 1171.715 | 369.581 | 370.335 | 395.968 | 0.000 | 0.000 | 0.000 |
| | 8 | 4 | 1489.375 | 409.559 | 409.764 | 422.752 | 0.000 | 0.000 | 0.000 |
| | 9 | 3 | 1407.354 | 396.230 | 396.352 | 406.125 | 0.000 | 0.000 | 0.000 |
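The tabulated criteria are consistent with the standard least-squares forms of AIC, AICc, and BIC, which the following sketch computes together with Akaike-type weights. Two points are assumptions rather than statements from the text: that the first two numeric columns are the number of parameters and the SSE, and that the number of data points is m = 200, a value inferred here because it reproduces the tabulated numbers. The function names are hypothetical.

```python
import math

def information_criteria(sse, k, m):
    """Least-squares forms of AIC, AICc and BIC for k parameters and m points."""
    base = m * math.log(sse / m)
    aic = base + 2 * k
    aicc = aic + 2 * k * (k + 1) / (m - k - 1)
    bic = base + k * math.log(m)
    return aic, aicc, bic

def criterion_weights(scores):
    """Relative weights exp(-delta/2), normalised so that they sum to one."""
    best = min(scores)
    rel = [math.exp(-(s - best) / 2.0) for s in scores]
    total = sum(rel)
    return [r / total for r in rel]

# SINDy model 4 of the prey–predator table: 4 parameters, SSE = 760.018.
# m = 200 is an inferred assumption, not a value stated in the text.
aic, aicc, bic = information_criteria(760.018, 4, 200)
print(round(aic, 3), round(aicc, 3), round(bic, 3))

# Weights for three of the tabulated AIC values (SINDy models 4, 5 and 6).
weights = criterion_weights([275.005, 284.624, 345.981])
print(weights)
```

Because the weights decay exponentially with the difference from the best score, a gap of a few tens of units is enough to round a model's weight to 0.000, which matches the pattern in the weight columns.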
Fig. 4 Comparison between the observed data and the numerical solution of the best prey–predator model (shown in the bottom right) identified by the proposed framework using either the SINDy-SA method or the original SINDy method. The observed data are simulated from the prey–predator model (10), which corresponds to the best identified dynamical system
Mathematical models identified by the SINDy-SA and SINDy methods after running all experiments using simulated data from the logistic model (11)
| Method | Model | Equation |
|---|---|---|
| SINDy-SA | 1 | |
| | 2 | |
| SINDy | 1 | |
| | 2 | |
| | 3 | |
Model selection results for the tumor growth application. Subscripts indicate the model selection criterion weights. Models selected as the best ones are indicated by italics
| Method | Model | No. of parameters | SSE | AIC | AICc | BIC | AIC_w | AICc_w | BIC_w |
|---|---|---|---|---|---|---|---|---|---|
| SINDy-SA | 1 | | | | | | | | |
| | 2 | 3 | | | | | 0.000 | 0.000 | 0.000 |
| SINDy | 1 | | | | | | | | |
| | 2 | 2 | 153697672.305 | 3948.015 | 3948.055 | 3955.422 | 0.000 | 0.000 | 0.000 |
| | 3 | 1 | 2089401837.561 | 4728.908 | 4728.922 | 4732.612 | 0.000 | 0.000 | 0.000 |
Fig. 5 Comparison between the observed data and the numerical solution of the best tumor growth models selected by the information criteria used, which are shown inside the graphs. Although the dynamics are quite similar, the model in (b) includes a source term (shown in gray) that is not present in the true model
Model selection results for the pendulum motion application. Subscripts indicate the model selection criterion weights. Models selected as the best ones are indicated by italics
| Method | Model | No. of parameters | SSE | AIC | AICc | BIC | AIC_w | AICc_w | BIC_w |
|---|---|---|---|---|---|---|---|---|---|
| SINDy-SA | 1 | | | | | | | | |
| | 2 | 17 | 0.057 | | | | 0.000 | 0.000 | 0.000 |
| | 3 | 35 | 0.115 | | | | 0.000 | 0.000 | 0.000 |
| | 4 | 43 | 0.014 | | | | 0.000 | 0.000 | 0.000 |
| SINDy | 1 | 10 | 0.003 | | | | 0.000 | 0.000 | 0.000 |
| | 2 | 6 | 0.204 | | | | 0.000 | 0.000 | 0.000 |
| | 4 | 11 | 0.248 | | | | 0.000 | 0.000 | 0.000 |
| | 5 | 12 | 0.331 | | | | 0.000 | 0.000 | 0.000 |
| | 80 | 43 | 0.015 | | | | 0.000 | 0.000 | 0.000 |
| | 81 | 25 | 0.005 | | | | 0.000 | 0.000 | 0.000 |
| | 82 | 10 | | | | | 0.000 | 0.000 | 0.000 |
| | 83 | 41 | 0.001 | | | | 0.000 | 0.000 | 0.000 |
| | 84 | 29 | 0.010 | | | | 0.000 | 0.000 | 0.000 |
Fig. 6 Comparison between the observed data and the numerical solution of the best pendulum motion model, identified by both the SINDy-SA and SINDy frameworks, shown inside the graphs. The best identified dynamical system corresponds to the true model
Model selection results for the SIR application. Subscripts indicate the model selection criterion weights. Models selected as the best ones are indicated by italics
| Method | Model | No. of parameters | SSE | AIC | AICc | BIC | AIC_w | AICc_w | BIC_w |
|---|---|---|---|---|---|---|---|---|---|
| SINDy-SA | 1 | | | | | | | | |
| | 2 | 8 | 29.585 | | | | 0.000 | 0.000 | 0.000 |
| | 3 | 8 | 28.116 | | | | 0.000 | 0.000 | 0.000 |
| | 4 | 16 | 2.799 | | | | 0.000 | 0.000 | 0.000 |
| | 5 | 19 | 2.201 | | | | 0.000 | 0.000 | 0.000 |
| | 6 | 29 | 2.318 | | | | 0.000 | 0.000 | 0.000 |
| | 7 | 35 | 24577261.088 | 4480.347 | 4487.270 | 4620.048 | 0.000 | 0.000 | 0.000 |
| | 8 | 41 | 23109275.001 | 4467.712 | 4477.332 | 4631.362 | 0.000 | 0.000 | 0.000 |
| | 9 | 37 | 23261656.824 | 4462.341 | 4470.109 | 4610.025 | 0.000 | 0.000 | 0.000 |
| | 10 | 40 | 23261940.744 | 4468.346 | 4477.482 | 4628.004 | 0.000 | 0.000 | 0.000 |
| SINDy | 1 | 8 | 0.018 | -3987.467 | -3987.098 | -3955.535 | 0.000 | 0.000 | 0.000 |
| | 2 | 5 | | -8780.034 | -8779.881 | -8760.076 | | | |
| | 3 | 5 | 52532578.252 | 4724.192 | 4724.344 | 4744.149 | 0.000 | 0.000 | 0.000 |
| | 4 | 3 | 21898540.949 | 4370.186 | 4370.247 | 4382.161 | 0.000 | 0.000 | 0.000 |
| | 5 | 6 | | | | | 0.000 | 0.000 | 0.000 |
| | 6 | 4 | | | | | 0.000 | 0.000 | 0.000 |
| | 7 | 1 | 254775423.290 | 5347.771 | 5347.781 | 5351.763 | 0.000 | 0.000 | 0.000 |
| | 8 | | | | | | | | |
| | 9 | 5 | | 7893.475 | | | | | |
| | 10 | 5 | 35834684.742 | 4571.185 | 4571.337 | 4591.142 | 0.000 | 0.000 | 0.000 |
| | 11 | 3 | 219088760.235 | 5291.409 | 5291.470 | 5303.384 | 0.000 | 0.000 | 0.000 |
| | 12 | 5 | 4855374087.697 | 6534.755 | 6534.907 | 6554.712 | 0.000 | 0.000 | 0.000 |
| | 13 | 2 | 254775423.002 | 5349.771 | 5349.802 | 5357.754 | 0.000 | 0.000 | 0.000 |
| | 14 | 2 | 208797458.581 | 5270.164 | 5270.195 | 5278.147 | 0.000 | 0.000 | 0.000 |
Fig. 7 Comparison between the observed data and the numerical solution of the selected best SIR models, shown inside the graphs. Although the dynamics are quite similar, the model in (b) includes one term (shown in gray) that is not present in the true model
Best models identified by the SINDy-SA method for the applications considered in this work, whose structures correspond to the true models. Results were obtained using the lowest number of data points (m) and the highest noise intensities
| Application | m | Noise | Best identified model |
|---|---|---|---|
| Prey–predator | 70 | Multiplicative | |
| Tumor growth | 7 | Multiplicative | |
| Pendulum motion | 70 | Additive | |
| Compartmental | 22 | Multiplicative | |
Fig. 8 Comparison between the noisy and limited data (described by circular and triangular points) and the numerical solution (indicated by continuous and dashed lines) of the best identified models for the applications considered in this work