SINDy-SA framework: enhancing nonlinear system identification with sensitivity analysis.

Gustavo T Naozuka¹, Heber L Rocha², Renato S Silva¹, Regina C Almeida¹.

Abstract

Machine learning methods have revolutionized studies in several areas of knowledge, helping to understand and extract information from experimental data. Recently, these data-driven methods have also been used to discover structures of mathematical models. The sparse identification of nonlinear dynamics (SINDy) method has been proposed with the aim of identifying nonlinear dynamical systems, assuming that the equations have only a few important terms that govern the dynamics. By defining a library of possible terms, the SINDy approach solves a sparse regression problem by eliminating terms whose coefficients are smaller than a threshold. However, the choice of this threshold is decisive for the correct identification of the model structure. In this work, we build on the SINDy method by integrating it with a global sensitivity analysis (SA) technique that ranks terms according to their importance in relation to the desired quantity of interest, thus circumventing the need to define the SINDy threshold. The proposed SINDy-SA framework also includes the formulation of different experimental settings, recalibration of each identified model, and the use of model selection techniques to select the best and most parsimonious model. We investigate the use of the proposed SINDy-SA framework in a variety of applications and compare the results against the original SINDy method. The results demonstrate that the SINDy-SA framework is a promising methodology to accurately identify interpretable data-driven models. Supplementary Information: The online version contains supplementary material available at 10.1007/s11071-022-07755-2.

© The Author(s), under exclusive licence to Springer Nature B.V. 2022, Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Entities: Chemical

Keywords: Data-driven methods; Differential equations; Model selection; Sensitivity analysis; Sparse identification

Year: 2022 PMID: 36060282 PMCID: PMC9424817 DOI: 10.1007/s11071-022-07755-2

Source DB: PubMed Journal: Nonlinear Dyn ISSN: 0924-090X Impact factor: 5.741

Introduction

Machine learning methods have been commonly used to understand behaviors, recognize patterns and make predictions from experimental data. Furthermore, another application of these methods, which has become popular in recent years, is the structural identification of mathematical models [2, 6, 7, 20, 25, 29, 35, 37]. These models, in turn, help to interpret the dynamics and allow the use of tools for mathematical analysis. Recently, Brunton et al. [2] have developed the sparse identification of nonlinear dynamics (SINDy) method, combining sparsity-promoting techniques and machine learning with nonlinear dynamical systems to discover governing equations from noisy measurement data. The only assumption about the structure of the mathematical model is that there are only a few important terms that govern the dynamics. Thus, by defining a prior set of possible functions, the authors solved a sparse regression problem with the objective of determining the smallest number of terms in this set, for each equation of the dynamical system, required to accurately represent the data. The SINDy method has been applied to solve many system identification problems, such as the dynamics of COVID-19 global transmission [17], the Duffing oscillator [23], and empirical data from measles, varicella, rubella, and chickenpox datasets [12, 13].

Several other related approaches have been published in the literature in order to recover the governing equations from data. França et al. [7] combined several machine learning techniques to improve the robustness to noise in system identification problems. Cortiella et al. [6] proposed an iterative sparse-regularized regression method to recover nonlinear dynamical systems from noisy measurement data. The work aims to improve the accuracy and robustness of SINDy in the presence of noisy state measurements, by developing a reweighted $\ell_1$-regularized least-squares solver. Niven et al. [25, 26] used an inverse Bayesian method for system identification from time-series data and compared two Bayesian methods, based on the joint maximum a posteriori and variational Bayesian approximation, to the SINDy approach. These methods are also used to quantify the variances of the model parameters. Maddu et al. [20] proposed a statistical learning framework based on group-sparse regression to discover mathematical models from data. This framework can enforce conservation laws and ensure model equivalence and symmetries by using the group iterative hard thresholding algorithm and stability selection. Wang et al. [35, 36] presented a variational framework for system identification of partial differential equations, based on stepwise regression. The authors also addressed the influences of variable fidelity and noise in the measurement data. Yang et al. [37] presented a machine learning framework for Bayesian system identification from noisy, sparse, and irregular observation data. This framework uses differentiable programming and performs Bayesian inference using Hamiltonian Monte Carlo sampling. Rudy et al. [29] proposed a sparse regression method to extract partial differential equations from time-series measurements in the spatial domain. This method selects the nonlinear and partial derivative terms of the governing equations that best fit the data.

In addition, there are a number of extensions of the SINDy approach to improve its accuracy and robustness or to discover other types of mathematical models. Subramanian [33] incorporated a scientific machine learning approach in the context of clinical trials to discover governing equations of idiopathic pulmonary fibrosis disease progression.
This approach combines machine learning techniques, statistical methodologies, and scientific computing tools such as bootstrap sampling, cubic spline interpolation, Bayesian inference, and SINDy to discover the dynamics and quantify the uncertainty in the model parameters. Hoffmann et al. [11] extended the SINDy method to vector-valued ansatz functions, in order to estimate effective reaction networks from observations. In the proposed “reactive SINDy” approach, each function represents a particular reaction process. Boninsegna et al. [1] also extended the SINDy method to discover stochastic dynamical systems of biophysical processes. Jiang et al. [15] proposed a SINDy-LM modeling method, in which the SINDy approach is used to discover nonlinear dynamical systems from observation data and the Levenberg–Marquardt algorithm is used to improve the accuracy of the mathematical model identified by the SINDy algorithm. Hirsh [10] developed the theory of existing methods and proposed new techniques to model spatiotemporal data. These data-driven methods include dynamic mode decomposition (DMD), a dimensionality reduction method for time-varying linear dynamics, the Hankel Alternative View of Koopman (HAVOK) algorithm, and an uncertainty quantification for sparse identification of nonlinear dynamics (UQ-SINDy) method. The modified SINDy approach uses compressed sensing and Bayesian statistics to discover governing equations from data and quantifies model uncertainties. Brunton et al. [3] generalized the SINDy algorithm to identify nonlinear dynamical systems with external inputs and feedback control. Mangan et al. [21] proposed an alternative data-driven technique to identify networked nonlinear dynamical systems by using the SINDy algorithm. This technique can be used to recover governing equations that have rational function nonlinearities with cross terms. Kaheman et al. [16] also developed a variant of the SINDy algorithm to identify implicit dynamics and rational nonlinearities, by using multiple optimization algorithms and a model selection approach. This variant can be used to recover implicit differential equations and conservation laws from limited and noisy data. Quade et al. [28] proposed a conceptual framework to recover dynamical systems in response to abrupt changes from limited data. This framework first detects the abrupt change and then applies the SINDy method to update a previously identified mathematical model with the fewest changes.

Although the SINDy method is often used to identify the structure of the mathematical model, this identification depends on the suitable choice of a threshold, which is used in the process of eliminating terms from the governing equations. One way to bypass the difficulty of this choice is to define a set of values for the threshold, run the method for each defined value, and select the best model from the resulting set of models, given the experimental data [2, 22]. However, the best value for the threshold may not belong to the defined prior set, resulting in incorrect identification of the dynamical system. Thus, to discover the most parsimonious model that best fits the data, one must run the SINDy method for a sufficiently large set of threshold values, which consequently increases the computational cost of solving the problem.

On the other hand, sensitivity analysis (SA) is a technique that allows ranking the parameters of a mathematical model according to their importance in relation to the desired quantity of interest [5, 31]. In our case, the quantity of interest is associated with the formulation of the sparse regression problem, described in detail in Sect. 2.1. In this work, we propose the SINDy-SA method that leverages the original SINDy approach, replacing the need to determine the threshold with a global sensitivity analysis technique.
In particular, we use the elementary effects method, a simple global SA approach, able to rank the model parameters according to their importance. We integrate the SINDy-SA method in a general framework that encompasses other issues associated with the problem of nonlinear system identification. Specifically, we design different experimental settings, varying, for example, the set of possible candidate functions for the governing equations. After applying the system identification method for each experiment, a model recalibration step is performed to improve the accuracy of the parameter estimates. We then compare the resulting set of models against the experimental data in two ways: (i) constructing the Pareto curve that balances the accuracy and complexity of the models, and (ii) employing model selection criteria to select the best model. Any suitable model selection method can be used. In this work, we employ three different information criteria: the first- and second-order Akaike and the Bayesian information criteria, which weigh both the goodness-of-fit and the number of model parameters given the data. The framework using our SINDy-SA method is evaluated in a variety of applications using simulated data from a tumor growth model, a prey-predator model, a pendulum motion equation, and a compartmental model, and the obtained results are compared with the original SINDy method. To this end, we replace the SINDy-SA component with the original SINDy within the proposed general framework. We anticipate that the SINDy-SA framework is able to correctly identify the true model in all considered applications, outperforming the framework with the original SINDy method. This paper is organized as follows. In Sect. 2, we explain some preliminary concepts required to understand the development of the proposed SINDy-SA method. 
Moreover, we describe in detail the SINDy-SA method and its implementation and show our general framework for solving system identification problems. In Sect. 3, we present the results obtained for the different applications by employing the frameworks with our method and the original SINDy approach. Finally, in Sect. 4, we point out some final remarks.

Problem statement and the SINDy-SA framework

Before introducing the new SINDy-SA method, we present in Sect. 2.1 some preliminary concepts about the sparse regression problem. Section 2.2 details our proposed SINDy-SA approach and its implementation. Finally, Sect. 2.3 describes our developed general framework for solving system identification problems.

Problem statement

Consider the problem of determining the dynamics of a system of n variables $\mathbf{x}(t) = [x_1(t), \ldots, x_n(t)]^T \in \mathbb{R}^n$. Under the assumption of spatial homogeneity, we assume that the rate of change of these variables is given by the following dynamical system:
$$\frac{d\mathbf{x}(t)}{dt} = \mathbf{f}(\mathbf{x}(t)), \tag{1}$$
in which the vector function $\mathbf{f}$ that we want to determine defines the interplay among the state variables. We approach the problem of determining the mathematical model (1) using sparse regression techniques, based on the fact that $\mathbf{f}$ has only a few terms that govern the dynamics of the system. The sparse regression technique is built upon the following available measured data that display a temporal history of the state variable vector $\mathbf{x}(t)$ at multiple time instants $t_1, t_2, \ldots, t_m$:
$$\mathbf{X} = \begin{bmatrix} \mathbf{x}^T(t_1) \\ \mathbf{x}^T(t_2) \\ \vdots \\ \mathbf{x}^T(t_m) \end{bmatrix} = \begin{bmatrix} x_1(t_1) & x_2(t_1) & \cdots & x_n(t_1) \\ x_1(t_2) & x_2(t_2) & \cdots & x_n(t_2) \\ \vdots & \vdots & \ddots & \vdots \\ x_1(t_m) & x_2(t_m) & \cdots & x_n(t_m) \end{bmatrix}. \tag{2}$$
From these data, we determine the corresponding temporal history of the state variable derivatives $\dot{\mathbf{x}}(t)$, usually approximated numerically, which is likewise arranged in the form of an $m \times n$ matrix denoted by $\dot{\mathbf{X}}$. With these definitions, the sparse regression problem aims to determine the sparse matrix of coefficients $\Xi = [\xi_1 \; \xi_2 \; \cdots \; \xi_n] \in \mathbb{R}^{d \times n}$ so that
$$\dot{\mathbf{X}} = \Theta(\mathbf{X}) \, \Xi, \tag{3}$$
in which $\Theta(\mathbf{X}) \in \mathbb{R}^{m \times d}$ is the matrix of candidate nonlinear functions of the columns of $\mathbf{X}$, and d is the basis dimension for the function library $\Theta(\mathbf{X})$. This library includes candidate functions (polynomials, trigonometric functions, etc.) for the right-hand side of Eq. (1) and is built from the modeler’s a priori knowledge of potential functions capable of describing the experimental data behavior. The sparse regression problem (3) can be solved using classical regression methods such as least-squares, ridge, lasso, and elastic net [8, 14]. The original SINDy approach is implemented using the sequential thresholded ridge regression (STRidge) method [2]. Specifically, the SINDy method requires the definition of a threshold for the values of the components of each sparse vector $\xi_k$, below which the corresponding library functions are eliminated from the set of possible functions. The chosen threshold is decisive for the correct identification of the mathematical model.
Thus, determining its best value can be a hard task, especially when the dynamical system has parameter values with very different orders of magnitude. In such cases, additional approaches to overcome this difficulty are required [2, 22]. For more information, we refer the reader to the Supplementary Material (Section SM-1). After determining the sparse matrix $\Xi$, the desired identified system is obtained as
$$\frac{d\mathbf{x}}{dt} = \Xi^T \left( \Theta(\mathbf{x}^T) \right)^T. \tag{4}$$

The proposed SINDy-SA approach

The original SINDy method depends on the proper choice of the threshold, necessary to eliminate candidate functions from the governing equations. Due to the difficulty of this choice and the consequent increase in computational cost to determine its best value, as suggested in Brunton et al. [2], we propose in this work to integrate the sparse regression problem with a global SA technique. By using an SA technique that is able to rank the coefficient terms according to their importance with respect to $\dot{\mathbf{X}}$, the selected quantities of interest (QoIs), we overcome the need to use a pre-defined threshold to eliminate candidate functions. Instead, the non-influential terms can be eliminated and the accuracy of the regression verified. In particular, we use the Morris method, also known as the elementary effects (EE) method, due to its simplicity and low computational cost [31]. Moreover, the EE method assesses not only the overall importance of the coefficients but also their interactions and nonlinear effects. The implementation of the proposed approach was carried out in the Python programming language, based on the PySINDy module, a sparse regression package containing different implementations for the SINDy method [32]. The EE method was developed using the SALib module, composed of commonly used SA methods [9]. Our proposed SINDy-SA sparse regression method is an iterative process encompassing the ridge regression estimation, the error computation, and the SA technique. At every $\tau$-th iteration, non-influential terms are eliminated until only influential terms remain. Algorithm 1 presents a pseudocode of the SINDy-SA method for solving the sparse regression problem (3), in which $\hat{\Xi}^{\tau}$ describes the estimate for the sparse vectors of coefficients in the iteration $\tau$. Additionally, Figure SM-1 in the Supplementary Material illustrates a flowchart of our method, graphically representing each instruction of Algorithm 1.
We detail the three main features of our SINDy-SA method in the following.

Ridge regression

In our SINDy-SA approach, we start by assigning to each column of $\hat{\Xi}^{\tau}$ the corresponding least-squares solution with $\ell_2$-regularization. Specifically, defining $\dot{\mathbf{X}}_k$ as the k-th column of $\dot{\mathbf{X}}$, we use the ridge regression [8, 14], defined as
$$\hat{\xi}_k^{\tau} = \underset{\xi_k}{\arg\min} \; \left\| \dot{\mathbf{X}}_k - \Theta(\mathbf{X}) \, \xi_k \right\|_2^2 + \lambda \left\| \xi_k \right\|_2^2, \quad k = 1, \ldots, n, \tag{5}$$
to estimate the coefficient values, balancing two different criteria: the residual sum of squares and a penalty term that has the effect of preventing overfitting. Of note, we have evaluated commonly used statistical regression methods to identify nonlinear dynamical systems, and have chosen the ridge regression method for its robustness and accuracy. The tuning parameter $\lambda \ge 0$ serves to control the relative impact of these two terms on the coefficient estimates. Different values of $\lambda$ are likely to lead to different coefficient estimates and, ultimately, to different models. James et al. [14] suggest that a possible way to select a value for $\lambda$ is to use the cross-validation method. By defining a prior set of $\lambda$ values and computing the cross-validation error for each defined value, they select the value of $\lambda$ with the smallest cross-validation error. Inspired by this approach, our proposed framework also defines a set of $\lambda$ values that composes different simulation scenarios together with the possible set of candidate functions. Ultimately, the best identified model is chosen by further employing a model selection technique, as detailed in Sect. 2.3.
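As an illustration of the ridge step, the sketch below solves the $\ell_2$-regularized least-squares problem in closed form for every column of the derivative matrix at once. It is a minimal NumPy example rather than the PySINDy-based implementation used in this work; the function names, the synthetic data, and the value of the tuning parameter are assumptions made only for the example.

```python
import numpy as np


def polynomial_library(X):
    """Return [1, x, y, x^2, x*y, y^2] for a two-column state matrix X."""
    x, y = X[:, 0], X[:, 1]
    return np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])


def ridge_columns(Theta, X_dot, lam):
    """Closed-form ridge estimate, solved for all columns of X_dot at once."""
    d = Theta.shape[1]
    gram = Theta.T @ Theta + lam * np.eye(d)
    return np.linalg.solve(gram, Theta.T @ X_dot)


# Synthetic check: derivatives generated from known sparse coefficients.
rng = np.random.default_rng(0)
X = rng.uniform(0.5, 2.0, size=(200, 2))
Xi_true = np.zeros((6, 2))
Xi_true[1, 0], Xi_true[4, 0] = 1.0, -0.5    # dx/dt = x - 0.5 x y
Xi_true[2, 1], Xi_true[4, 1] = -1.0, 0.25   # dy/dt = -y + 0.25 x y
Theta = polynomial_library(X)
X_dot = Theta @ Xi_true

Xi_hat = ridge_columns(Theta, X_dot, lam=1e-8)    # weak penalty (hypothetical value)
Xi_shrunk = ridge_columns(Theta, X_dot, lam=10.0) # strong penalty shrinks coefficients
```

With noise-free data and a very small penalty, the estimate is close to the generating coefficients, while a larger penalty shrinks the coefficient norm. Note that ridge regression alone does not produce exact zeros, which is why a term-elimination step is still needed.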

Error computation

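After the ridge regression estimation, we calculate the $\tau$-th estimate $\hat{\dot{\mathbf{X}}}^{\tau}$ for the derivative based on the library $\Theta(\mathbf{X})$ and the estimates $\hat{\Xi}^{\tau}$ for the sparse vectors of coefficients obtained by the ridge regression, according to the following equation:
$$\hat{\dot{\mathbf{X}}}^{\tau} = \Theta(\mathbf{X}) \, \hat{\Xi}^{\tau}. \tag{6}$$
Next, we calculate the sum of squared errors (SSE) between the derivatives $\dot{\mathbf{X}}$, measured or approximated from the data, and $\hat{\dot{\mathbf{X}}}^{\tau}$, given by Eq. (6), through:
$$\text{SSE}_{\tau} = \sum_{j=1}^{m} \sum_{k=1}^{n} \left( \dot{X}_{jk} - \hat{\dot{X}}^{\tau}_{jk} \right)^{2}, \tag{7}$$
where $\dot{X}_{jk}$ and $\hat{\dot{X}}^{\tau}_{jk}$ denote the entries of $\dot{\mathbf{X}}$ and $\hat{\dot{\mathbf{X}}}^{\tau}$ for the j-th time instant and the k-th state variable. We store the SSE values for each iteration of the SINDy-SA method in order to determine whether the error between the current and previous iterations has significantly increased. In the first iteration ($\tau = 0$), no error comparison is performed. The iterative process continues until a significant increase in the SSE occurs, ending the process. Defining what is significant is a tricky task since SSE fluctuations can occur depending on the problem. To overcome this issue, we define an iteration window, denoted by W, in which we collect statistics from part of the previous evolution of the SSE, which are used as a reference. In mathematical terms, we check the condition:
$$\text{SSE}_{\tau} > \mathcal{M}_{\tau} + \varepsilon \, \mathcal{D}_{\tau}, \tag{8}$$
where $\mathcal{M}_{\tau}$ and $\mathcal{D}_{\tau}$ represent, respectively, the mean and the standard deviation of the errors of previous iterations belonging to W, and $\varepsilon$ is a scaling factor. Some adjustments are required when $\tau < W$, and $\mathcal{M}_{\tau}$ and $\mathcal{D}_{\tau}$ are given according to the conditions described in Table 1.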

Table 1

Mathematical definitions for the mean and the standard deviation for each iteration of the SINDy-SA method, according to three conditions: $\tau = 1$, $1 < \tau < W$, and $\tau \ge W$. W denotes the window size of previous iterations used to check the condition (8)

	$\tau = 1$	$1 < \tau < W$	$\tau \ge W$
$\mathcal{M}_{\tau}$	$\text{SSE}_{0}$	$\dfrac{1}{\tau} \sum\limits_{i = 0}^{\tau - 1} \text{SSE}_{i}$	$\dfrac{1}{W} \sum\limits_{i = \tau - W}^{\tau - 1} \text{SSE}_{i}$
$\mathcal{D}_{\tau}$	$0.1\ \text{SSE}_{0}$	$\sqrt{\dfrac{1}{\tau} \sum\limits_{i = 0}^{\tau - 1} \left( \text{SSE}_{i} - \mathcal{M}_{\tau} \right)^{2}}$	$\sqrt{\dfrac{1}{W} \sum\limits_{i = \tau - W}^{\tau - 1} \left( \text{SSE}_{i} - \mathcal{M}_{\tau} \right)^{2}}$

Note that, for the second iteration of the algorithm ($\tau = 1$), the standard deviation would be zero considering only the error of the previous iteration. For this reason, we consider $\mathcal{D}_{1}$ as 10% of the SSE obtained in the first iteration. The values of the window size W and the scaling factor $\varepsilon$ can be determined by observing the sequence of SSE values, numerically or graphically, and the goodness-of-fit of the resulting mathematical model simulation against the experimental data. In all our experiments, we keep the window size W fixed, while the parameter $\varepsilon$ varies for each experiment. If the condition (8) is satisfied, the algorithm ends, and the coefficient estimates of the iteration $\tau - 1$, whose error was considered significantly smaller compared to the iteration $\tau$, are used for the construction of governing equations. Otherwise, the procedure continues and performs the SA of model parameters.
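The stopping test can be sketched in a few lines of plain Python. The example follows the definitions of Table 1 and condition (8); the function names and the values chosen for the window size and the scaling factor are hypothetical and serve only to illustrate the logic.

```python
import math


def window_stats(sse_history, W):
    """Mean M_tau and deviation D_tau of previous SSE values, following Table 1.

    sse_history holds SSE_0, ..., SSE_{tau-1}, so tau = len(sse_history) >= 1.
    """
    tau = len(sse_history)
    if tau == 1:
        # Only one previous error: use 10% of SSE_0 as the deviation.
        return sse_history[0], 0.1 * sse_history[0]
    # Use all previous errors while tau < W, otherwise only the last W of them.
    window = sse_history[-W:] if tau >= W else sse_history
    mean = sum(window) / len(window)
    dev = math.sqrt(sum((s - mean) ** 2 for s in window) / len(window))
    return mean, dev


def significant_increase(sse_history, sse_tau, W, eps):
    """Condition (8): SSE_tau > M_tau + eps * D_tau."""
    mean, dev = window_stats(sse_history, W)
    return sse_tau > mean + eps * dev


# Hypothetical SSE sequence, window size and scaling factor.
history = [1.0, 1.2, 0.8]
small_jump = significant_increase(history, 1.4, W=5, eps=3.0)
large_jump = significant_increase(history, 2.0, W=5, eps=3.0)
```

In this example the window mean is 1.0 and the deviation is about 0.16, so an SSE of 1.4 is treated as a fluctuation, whereas an SSE of 2.0 triggers the stopping condition and the estimates of the previous iteration would be kept.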

Sensitivity analysis

In the SA of model parameters, we first determine admissible variations for the non-zero coefficients, defined as a percentage of the estimated value of each coefficient. Given one or more analysis time instants, we calculate the sensitivity indices $\mu^{*}_{ki}$ and $\sigma_{ki}$ that indicate the influence and nonlinear importance of the i-th parameter on the k-th QoI, respectively. The higher the values of these sensitivity indices, the more influential are the corresponding parameters, so that small variations in their values have a great impact on the estimation of $\dot{\mathbf{X}}$. For more details about the EE method, see Supplementary Material (Section SM-2). In order to facilitate sorting parameters by their order of importance with respect to QoI and to take into account both their direct and nonlinear effects, we combine the sensitivity indices into a single metric:
$$\mathcal{S}_{ki} = \sqrt{\left( \mu^{*}_{ki} \right)^{2} + \sigma_{ki}^{2}}. \tag{9}$$
In this way, the greater the value of $\mathcal{S}_{ki}$, the greater is the overall influence of the i-th parameter on the k-th QoI. For each equation of the dynamical system, we order the sensitivity indices from lowest to highest and eliminate the least important terms, ensuring that the influential terms are kept. Thus, each governing equation must contain at least one active candidate function at the end of the entire process, assuming that the modeler knows a priori the number of state variables in the system. We develop a scoring scheme to integrate SA at multiple time instants. Each model parameter receives a score, depending on its rank of importance, which is updated over the analysis times, starting from 0. Of note, $k = 1, \ldots, n$ and $i = 1, \ldots, d$, where n is the number of state variables, and d is the number of model parameters.
Once, at a given time of analysis, the model parameters are ordered from the least important to the most important, we ensure that the most important terms are kept. Next, we add an increasing score to the total score of the remaining parameters of the model. In this way, more influential parameters receive a lower score, and parameters with less influence receive a higher score. After analyzing all time instants, we check whether the total scores indicate, for all $k = 1, \ldots, n$ and $i = 1, \ldots, d$, that all parameters are important. Otherwise, we eliminate the terms that have the highest total score, being able to eliminate more than one candidate function for each state variable. Algorithm 2 presents the pseudocode of the procedure for sensitivity analysis at multiple time instants. If all parameters are considered important, the algorithm ends, in which case the coefficient estimates of the iteration $\tau$ are used to construct the governing equations. Otherwise, the procedure returns to the first step, calculating a least-squares solution with $\ell_2$-regularization for $\hat{\Xi}^{\tau + 1}$. It is important to observe that, in our approach, it is not necessary to specify a maximum number of iterations because eventually either all the model parameters are considered important or condition (8) is satisfied. Indeed, if a model parameter is erroneously eliminated according to SA, the SSE between the current and previous iterations tends to increase, and the condition (8) must be satisfied.
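To make the ranking step concrete, the sketch below computes elementary effects with a simplified one-at-a-time design and combines the mean absolute effect and the standard deviation into a single index. It is not the SALib-based implementation used in this work, and it does not reproduce the scoring scheme of Algorithm 2; the function name, the sampling design, the admissible variation of 10%, and the numerical values are assumptions chosen only for illustration.

```python
import numpy as np


def elementary_effects(qoi, bounds, r=20, delta=0.5, seed=0):
    """Simplified one-at-a-time elementary effects on the unit hypercube.

    qoi: function mapping a parameter vector to a scalar quantity of interest.
    bounds: array of shape (d, 2) with lower and upper parameter bounds.
    Returns (mu_star, sigma, S) with S = sqrt(mu_star**2 + sigma**2).
    """
    rng = np.random.default_rng(seed)
    bounds = np.asarray(bounds, dtype=float)
    lo, width = bounds[:, 0], bounds[:, 1] - bounds[:, 0]
    d = len(lo)
    effects = np.empty((r, d))
    for j in range(r):
        base = rng.uniform(0.0, 1.0 - delta, size=d)  # keep base + delta in [0, 1]
        y0 = qoi(lo + base * width)
        for i in range(d):
            step = base.copy()
            step[i] += delta                           # perturb one parameter only
            effects[j, i] = (qoi(lo + step * width) - y0) / delta
    mu_star = np.abs(effects).mean(axis=0)
    sigma = effects.std(axis=0)
    return mu_star, sigma, np.sqrt(mu_star**2 + sigma**2)


# Hypothetical example: the QoI is the derivative estimate theta . xi at one time instant.
theta_row = np.array([1.0, 2.0, 0.5, 4.0])     # library evaluated at one state
xi_hat = np.array([0.001, 1.0, -0.8, 0.002])   # current coefficient estimates
frac = 0.1                                      # assumed admissible variation
half = frac * np.abs(xi_hat)
bounds = np.column_stack([xi_hat - half, xi_hat + half])

mu_star, sigma, S = elementary_effects(lambda xi: float(theta_row @ xi), bounds)
ranking = np.argsort(S)  # indices ordered from least to most influential
```

Because the QoI is linear in the coefficients, every elementary effect of a given parameter is the same, so the standard deviation is essentially zero and the combined index reduces to the mean absolute effect. The two coefficients of order $10^{-3}$ come first in the ranking and would be the natural candidates for elimination.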

General framework for system identification

The proposed framework for solving the problem of identifying nonlinear dynamical systems is schematically represented in Fig. 1. It contains four distinct components that encompass the sequence of tasks required for ultimately obtaining the best identified model.

Fig. 1

Schematic representation of the framework for solving the problem of identifying nonlinear dynamical systems, using the proposed SINDy-SA approach. The flowchart detailing the SINDy-SA method is shown in Figure SM-1 in the Supplementary Material

The first component is the Input block in which we gather the available experimental data set and the modeler’s prior knowledge of a library of candidate basis functions that have the potential to explain the data. Such prior knowledge about expected terms in $\Theta(\mathbf{X})$ plays a crucial role in the dynamical system identification and supports the construction of the Experimental set. In this second component, we investigate different potential configurations in the modeling by varying $\Theta(\mathbf{X})$ and the regularization parameter $\lambda$ of the ridge regression. Each experiment is then submitted to the model identification block, the central core of the framework. In this step, the system identification method (SINDy-SA or the original SINDy) is employed for each experimental configuration. In Fig. 1, the SINDy-SA block illustrates its three main steps described in Sect. 2.2: the ridge regression for estimating the sparse vectors of coefficients $\hat{\Xi}$, the computation of SSE between the derivatives $\dot{\mathbf{X}}$ and $\hat{\dot{\mathbf{X}}}$, and the sensitivity analysis to eliminate less important terms from the governing equations. Once the mathematical model is identified, the coefficients learned by the optimizer are recalibrated to improve accuracy. In this work, we use the Levenberg–Marquardt algorithm due to its robustness, as suggested in Jiang et al. [15]. After performing the model identification step for the N experiments, we discover a set of at most N data-driven models (Model set component). The best and most parsimonious data-driven model, which composes the Output block of the framework, is then selected using model selection techniques, as suggested in Kaheman et al. [16] and Mangan et al. [21]. In this step, we only consider candidate models whose goodness-of-fit between their simulations and the available data is computable.
We use some of the most widely used model selection methods for comparing both nested and non-nested models: the Akaike Information Criterion (AIC), the second-order Akaike Information Criterion (AICc), and the Bayesian Information Criterion (BIC) [4]. The probability of the i-th model being the best given the data is obtained from the weights $w_i$ of each criterion, which provide additional information about the selected model. More details on the information criteria and their weights are provided in the Supplementary Material (Section SM-3) for completeness. An important remark still has to be made when the original SINDy is used as the system identification method in the developed framework, because of the need to define the threshold. To this end, the construction of the experimental set must also include a value for the threshold. This means that, when the original SINDy is used in the proposed framework, the experimental set is defined by triples comprising the library $\Theta(\mathbf{X})$, the regularization parameter $\lambda$, and the threshold. In this way, we also vary possible values for the SINDy threshold, as suggested in Brunton et al. [2].
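The model selection step can be illustrated with the common least-squares forms of the three criteria and the corresponding weights. The exact expressions used in this work are given in the Supplementary Material, so the formulas below are an assumption based on the standard definitions, and the candidate models, the number of data points, and the function names are hypothetical.

```python
import math


def information_criteria(sse, m, K):
    """Least-squares forms of AIC, AICc and BIC (assumed standard definitions).

    sse: sum of squared errors between model simulation and data,
    m: number of data points, K: number of estimated parameters.
    """
    aic = m * math.log(sse / m) + 2 * K
    aicc = aic + 2 * K * (K + 1) / (m - K - 1)
    bic = m * math.log(sse / m) + K * math.log(m)
    return aic, aicc, bic


def criterion_weights(values):
    """Weights w_i = exp(-Delta_i / 2) / sum_j exp(-Delta_j / 2)."""
    best = min(values)
    rel = [math.exp(-0.5 * (v - best)) for v in values]
    total = sum(rel)
    return [r / total for r in rel]


# Hypothetical candidate models: (SSE against the data, number of parameters).
models = [(0.52, 4), (0.50, 6), (3.10, 3)]
m = 100  # number of data points
aicc = [information_criteria(sse, m, K)[1] for sse, K in models]
weights = criterion_weights(aicc)
best_index = weights.index(max(weights))
```

In this example the second model fits slightly better than the first, but its two extra parameters are penalized, so the first model receives the largest weight. The third model is the most parsimonious but fits the data poorly and receives a negligible weight.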

Applications and results

In this section, we investigate the use of the proposed SINDy-SA framework in a variety of applications with different behaviors. We use simulated data from a prey–predator model (Sect. 3.1), a tumor growth model (Sect. 3.2), a pendulum motion equation (Sect. 3.3), and a compartmental model (Sect. 3.4). The results are compared against those obtained with the original SINDy approach as the system identification method within the proposed framework. We also investigate the proposed SINDy-SA framework in scenarios with limited and noisy data. In all applications, we use the LSODA (Livermore Solver for Ordinary Differential Equations) method [27] to numerically simulate the true models and generate measurement data. It is also employed to solve the identified models and compare their simulations against the data. LSODA is a well-known numerical solver for ordinary differential equations that uses adaptive step sizes and automatically switches between the nonstiff Adams method and the stiff BDF method. These properties provide robustness to the framework. We approximate the derivative $\dot{\mathbf{X}}$ from the generated data using the second-order finite difference method. As described in Sect. 2.3, in order to verify if the system identification methods are capable of discovering known mathematical models, we define the experimental settings by choosing candidate functions that include the terms present in the true model with additional functions to be eliminated from the governing equations. In addition, we define the values for the regularization parameter $\lambda$ and the threshold by varying them over different orders of magnitude. At the end of the process, the modeler may need to refine these choices and build new experimental settings.
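The data generation and derivative approximation described above can be sketched with SciPy and NumPy as follows. This is only an illustration: the classical prey–predator right-hand side is used as the true model, and the parameter values, initial condition, time grid, and solver tolerances are hypothetical choices rather than the settings of our experiments.

```python
import numpy as np
from scipy.integrate import solve_ivp


def prey_predator(t, z, alpha, beta, delta, gamma):
    """Classical prey-predator right-hand side (parameter names are illustrative)."""
    x, y = z
    return [alpha * x - beta * x * y, delta * x * y - gamma * y]


# Hypothetical parameter values, initial condition and time grid.
params = (1.0, 0.5, 0.25, 1.0)
t = np.linspace(0.0, 20.0, 2001)
sol = solve_ivp(prey_predator, (t[0], t[-1]), [2.0, 1.0], method="LSODA",
                t_eval=t, args=params, rtol=1e-10, atol=1e-10)
X = sol.y.T  # m x n matrix of simulated states

# Second-order finite differences (central inside, one-sided at the boundaries).
X_dot = np.gradient(X, t, axis=0, edge_order=2)

# Compare with the exact right-hand side evaluated on the simulated states.
exact = np.array([prey_predator(ti, zi, *params) for ti, zi in zip(t, X)])
max_error = np.max(np.abs(X_dot - exact))
```

Comparing the finite-difference derivatives with the exact right-hand side gives a quick check of the approximation error, which decreases quadratically with the time step. With noisy measurements, the differentiation step amplifies the noise, so smoothing or a more robust differentiation scheme may be needed.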

Prey–predator model

In our first application, we use the prey–predator model to generate the simulated data used in the sparse identification problem. This model, referred to as model (10), describes the evolution of the interactions between prey and predator over time t up to a fixed final time, where x(t) and y(t) denote the prey and predator populations, respectively. Its parameters are the prey birth rate, the predator death rate, and two interaction coefficients, and the model is solved from prescribed initial conditions. The training set is built by simulating the model and collecting the numerical solutions at equally distributed time points. Using these collected solutions, we approximate the derivative and build a library consisting only of polynomials, varying their maximum degree. For example, if the maximum degree of the polynomial is 2, with two state variables representing the prey and predator populations, there are six candidate functions, $1$, $x$, $y$, $x^{2}$, $xy$, and $y^{2}$, which form the columns of the matrix of potential candidate functions. For each chosen maximum degree, we also vary the tuning parameter of the ridge regression, totaling 20 experiments for the SINDy-SA method. For the original SINDy approach, we additionally vary the threshold, ending up with 60 experiments. In all SINDy-SA experiments, we keep the remaining method settings fixed and perform the SA at the final time of the experimental time frame. To detail the steps of Algorithm 1, consider the experiment in which the maximum degree of the polynomial is 2, for one value of the tuning parameter. With these settings, the SINDy-SA method leads to the sequence of models and values of $\text{SSE}(\dot{\boldsymbol{X}}, \widehat{\dot{\boldsymbol{X}}})$, $\mathcal{M}$, and $\varepsilon\mathcal{D}$ displayed in Table 2 for each iteration. Figure 2 graphically relates the SSE with the sum $\mathcal{M} + \varepsilon\mathcal{D}$, where the bar graphs represent the SSE, the points denote the mean $\mathcal{M}$, and the vertical lines describe the margin of error $\varepsilon\mathcal{D}$. Figure 3 displays the heatmap of the total scores calculated for all candidate functions in each iteration of the method. The corresponding combined sensitivity indices are indicated inside each heatmap cell. The total score depicts four possible situations:

Table 2

Behavior of the SINDy-SA method for the prey–predator model (10), using maximum polynomial degree 2 and one value of the tuning parameter. We present the model, $\text{SSE}(\dot{\boldsymbol{X}}, \widehat{\dot{\boldsymbol{X}}})$, $\mathcal{M}$, and $\varepsilon\mathcal{D}$ at each iteration $\tau$ of the method, dropping the iteration subscript to ease notation. The identified model is highlighted in italics

$\tau$	Model	$\text{SSE}(\dot{\boldsymbol{X}}, \widehat{\dot{\boldsymbol{X}}})$	$\mathcal{M}$	$\varepsilon\mathcal{D}$
0	$\left\{ \begin{array}{l} \dot{x} = -0.021 + 1.043x + 0.034y - 0.017x^{2} - 0.981xy - 0.014y^{2} \\ \dot{y} = 0.021 - 0.033x - 1.045y + 0.013x^{2} + 0.982xy + 0.018y^{2} \end{array} \right.$	0.065	–	–
1	$\left\{ \begin{array}{l} \dot{x} = 0.011 + 0.981x - 0.006y - 0.981xy - 0.002y^{2} \\ \dot{y} = -0.003 + 0.014x - 1.014y + 0.981xy + 0.009y^{2} \end{array} \right.$	0.125	0.065	0.651
2	$\left\{ \begin{array}{l} \dot{x} = 0.015 + 0.981x - 0.015y - 0.981xy \\ \dot{y} = -0.015 + 0.014x - 0.981y + 0.981xy \end{array} \right.$	0.153	0.095	2.990
3	$\left\{ \begin{array}{l} \dot{x} = -0.002 + 0.991x - 0.989xy \\ \dot{y} = -0.002 - 0.988y + 0.989xy \end{array} \right.$	0.205	0.114	3.648
4	$\left\{ \begin{array}{l} \dot{x} = 0.990x - 0.989xy \\ \dot{y} = -0.989y + 0.989xy \end{array} \right.$	0.206	0.161	3.325
5	$\left\{ \begin{array}{l} \dot{x} = -0.167x \\ \dot{y} = -0.006y \end{array} \right.$	1115.975	0.188	2.501
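The values in Table 2 can be used to check when the method stops. The sketch below assumes that stopping criterion (8), which is not restated in this section, is met at the first iteration where the SSE exceeds the sum of the mean and the margin of error; the function name `first_stopping_iteration` is hypothetical.

```python
# Values transcribed from Table 2 (tau = 0..5); None marks entries shown as "–".
sse = [0.065, 0.125, 0.153, 0.205, 0.206, 1115.975]
mean = [None, 0.065, 0.095, 0.114, 0.161, 0.188]
margin = [None, 0.651, 2.990, 3.648, 3.325, 2.501]


def first_stopping_iteration(sse, mean, margin):
    """Return the first tau with SSE > mean + margin, or None if it never occurs.

    Assumed form of the stopping criterion; see the lead-in above.
    """
    for tau, (s, m, e) in enumerate(zip(sse, mean, margin)):
        if m is None or e is None:
            continue  # no window statistics available at this iteration
        if s > m + e:
            return tau
    return None


tau_stop = first_stopping_iteration(sse, mean, margin)
identified = tau_stop - 1  # the model kept is the one before the SSE jump
print(tau_stop, identified)
```

Under this assumption the jump at the last row of the table is the only iteration that triggers the criterion, so the model from the preceding row is retained.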

Fig. 2

SSE between the derivatives $\dot{\boldsymbol{X}}$ and $\widehat{\dot{\boldsymbol{X}}}$, mean $\mathcal{M}$, and margin of error $\varepsilon\mathcal{D}$ at each iteration of the SINDy-SA method, for the chosen iteration window and scaling factor

Fig. 3

Heatmap of the total scores for all candidate functions of the dynamical system in each iteration of the SINDy-SA method. The corresponding combined sensitivity indices are indicated inside each heatmap cell. The darkest color indicates terms to be eliminated in the current iteration; the white color indicates terms eliminated in previous iterations; the lightest gray indicates more important terms

(i) the term is considered more important in relation to the QoI by SA and thus should be kept in the governing equations; (ii) the term was eliminated in some previous iteration; (iii) the term has medium importance and can be deleted or maintained; and (iv) the term is considered less important in relation to the QoI by SA and thus should be eliminated from the governing equations.

According to Table 2 and Figs. 2 and 3, the method performed six iterations, each eliminating two candidate functions from the mathematical model. The SSE increased as terms were eliminated, remaining relatively small through $\tau = 4$. At $\tau = 5$, the elimination of the nonlinear terms leads to a significant increase in the SSE, which satisfies the stopping criterion (8). Thus, the identified model is the one associated with iteration $\tau = 4$ (step 7 of Algorithm 1), which corresponds to the true model, although with parameters slightly different from 1.0. Those estimates are improved using the Levenberg–Marquardt optimization algorithm, through which model (10) is recovered. It is important to note that, throughout the algorithm, each eliminated candidate function is associated with a state variable of the dynamical system. This behavior is due to the fact that we performed the SA of the parameters only at the final time of the experimental time frame for this illustrative example. If SA is performed at multiple time instants, each iteration can eliminate varying numbers of candidate functions. For a generic application, in which the true model is not known a priori, we recommend that SA be performed at multiple time instants. Running the SINDy-SA method for all 20 experiments resulted in four different mathematical models. For comparison, running the original SINDy method for all 60 experiments resulted in nine different mathematical models. To select the best model among the resulting set of models, we numerically simulate each dynamical system, given the initial conditions and the discretized time domain defined in Eq. (10), and compare it with the measurement data by calculating $\text{SSE}(\boldsymbol{X}, \widehat{\boldsymbol{X}})$. From these data, we calculate AIC, AIC$_c$, and BIC, and their corresponding weights, as shown in Table 3. Of note, Figure SM-2 in the Supplementary Material illustrates the Pareto curve relating the SSE to the model complexity, measured in terms of the number of parameters d.
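The recalibration step mentioned above can be sketched with SciPy's Levenberg–Marquardt solver. This is an illustration under stated assumptions, not the authors' implementation: the synthetic data use unit parameters (suggested by the remark that the estimates are slightly different from 1.0), the initial conditions and time grid are hypothetical, and the finite-difference step `diff_step` is a choice made here to keep the numerical Jacobian well above the ODE solver tolerance.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares


def rhs(t, state, a, b, c, d):
    """Structure identified at iteration 4: x' = a x - b x y, y' = -c y + d x y."""
    x, y = state
    return [a * x - b * x * y, -c * y + d * x * y]


def simulate(params, x0, t):
    """Solve the model with LSODA and return the states with shape (m, 2)."""
    sol = solve_ivp(rhs, (t[0], t[-1]), x0, args=tuple(params), method="LSODA",
                    t_eval=t, rtol=1e-10, atol=1e-10)
    return sol.y.T


t = np.linspace(0.0, 10.0, 101)                # hypothetical time grid
x0 = [2.0, 1.0]                                # hypothetical initial conditions
data = simulate([1.0, 1.0, 1.0, 1.0], x0, t)   # synthetic data, unit parameters assumed


def residuals(params):
    """Stacked differences between the simulated states and the data."""
    return (simulate(params, x0, t) - data).ravel()


initial_guess = [0.990, 0.989, 0.989, 0.989]   # coefficients from Table 2, iteration 4
fit = least_squares(residuals, initial_guess, method="lm", diff_step=1e-4)
print(np.round(fit.x, 4))
```

Because the fit compares simulated states with the data rather than derivatives, it removes the bias that the finite-difference derivative introduces into the regression coefficients.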

Table 3

Model selection results for the prey–predator application. Subscripts indicate the model selection criterion weights. Models selected as the best ones are indicated by italics

Method	Model	d	$\text{SSE}(\boldsymbol{X}, \widehat{\boldsymbol{X}})$	AIC	AIC$_c$	BIC	AIC $w_i$	AIC$_c$ $w_i$	BIC $w_i$
SINDy-SA	1	4	$2.910 \times 10^{-24}$	-11890.428	-11890.223	-11877.235	1.000	1.000	1.000
	2	18	0.950	-1034.015	-1030.236	-974.646	0.000	0.000	0.000
	3	22	0.380	-1208.986	-1203.269	-1136.423	0.000	0.000	0.000
	4	28	650.070	291.753	301.250	384.106	0.000	0.000	0.000
SINDy	1	4	$2.910 \times 10^{-24}$	-11890.428	-11890.223	-11877.235	1.000	1.000	1.000
	2	8	1339.752	396.384	397.138	422.771	0.000	0.000	0.000
	3	6	94.667	-137.591	-137.156	-117.801	0.000	0.000	0.000
	4	4	760.018	275.005	275.210	288.198	0.000	0.000	0.000
	5	1	821.749	284.624	284.644	287.922	0.000	0.000	0.000
	6	6	1062.334	345.981	346.417	365.771	0.000	0.000	0.000
	7	8	1171.715	369.581	370.335	395.968	0.000	0.000	0.000
	8	4	1489.375	409.559	409.764	422.752	0.000	0.000	0.000
	9	3	1407.354	396.230	396.352	406.125	0.000	0.000	0.000

According to Table 3, the first model is the best according to all three information criteria considered for the SINDy-SA method, with probability equal to one. This is because the number of parameters in this model is relatively small and its SSE is significantly lower than that of the other identified models. Likewise, the model selected by the information criteria for the original SINDy approach also structurally corresponds to the true model (10). Figure 4 shows a comparison between the observed data and the numerical solution of the selected model for both the SINDy-SA and SINDy methods, since they led to the same best identified (true) model. As we will see in the following, this is not always the case. Moreover, the results obtained for this application demonstrate that the model recalibration step can be essential for the accurate identification of the true mathematical model.
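The information criteria and weights reported in Table 3 can be reproduced approximately with the standard least-squares forms sketched below. The formulas are not restated in this section, so their exact form is an assumption; the sample size m = 200 is likewise inferred from the differences between the AIC, AIC$_c$, and BIC columns of Table 3 rather than quoted from the text. The function names are hypothetical.

```python
import numpy as np


def information_criteria(sse, d, m):
    """Least-squares forms of AIC, AICc, and BIC for d parameters and m data points."""
    base = m * np.log(sse / m)
    aic = base + 2 * d
    aicc = aic + 2 * d * (d + 1) / (m - d - 1)
    bic = base + d * np.log(m)
    return aic, aicc, bic


def weights(values):
    """Akaike-type weights: relative likelihoods normalized over the candidate set."""
    values = np.asarray(values, dtype=float)
    rel = np.exp(-0.5 * (values - values.min()))
    return rel / rel.sum()


m = 200  # assumption inferred from the column differences in Table 3
aic, aicc, bic = information_criteria(650.070, 28, m)  # SINDy-SA model 4 in Table 3
print(round(aic, 3), round(aicc, 3), round(bic, 3))

# AIC column of the four SINDy-SA models in Table 3
w = weights([-11890.428, -1034.015, -1208.986, 291.753])
print(np.round(w, 3))
```

Because the weights depend exponentially on the differences between criterion values, gaps of thousands of units, as in Table 3, drive every weight except the best one to zero.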

Fig. 4

Comparison between the observed data and the numerical solution of the best predator–prey model (shown in the bottom right) identified by the proposed framework using either the SINDy-SA method or the original SINDy method. The observed data are simulated from the prey–predator model (10), which corresponds to the best identified dynamical system

Tumor growth model

In our next application, we consider the logistic growth model, previously calibrated using 14 tumor volume (in mm³) data points from breast cancer growth (GI-101A xenografts) in athymic mice, distributed over 114 days [24]. Denoting by x(t) the tumor volume at time t, in days, the evolution of the tumor volume is given by the logistic model (11), whose specified quantities are expressed in /day, /(day mm³), and mm³, respectively. The absolute value of the second quantity is obtained by dividing the growth rate by the carrying capacity, 8472.914 mm³. The derivatives are approximated from the numerical solution obtained by simulating model (11) at evenly distributed time points. As in the previous application, we define 20 experiments for the SINDy-SA method by building the library from polynomials of varying maximum degree and by varying the tuning parameter. We also vary the threshold, which yields 60 experiments for the original SINDy approach. The remaining SINDy-SA settings and the time at which the SA is performed are kept fixed across experiments. The execution of the SINDy-SA method for all 20 experiments resulted in only two different mathematical models. For comparison, the execution of the original SINDy method for all 60 experiments resulted in three different mathematical models. Table 4 presents the sets of models discovered by each system identification method. The identified models are then compared, and the results are displayed in Table 5. For completeness, Figure SM-3 in the Supplementary Material illustrates the corresponding Pareto curves.

Table 4

Mathematical models identified by the SINDy-SA and SINDy methods after running all experiments using simulated data from the logistic model (11)

Method	Model	Equation
SINDy-SA	1	$\dot{x} = 0.028 x - 3.305 \times 10^{-6} x^{2}$
	2	$\dot{x} = 0.028 x - 3.305 \times 10^{-6} x^{2} - 2.393 \times 10^{-16} x^{3}$
SINDy	1	$\dot{x} = 1.729 \times 10^{-5} + 0.028 x - 3.305 \times 10^{-6} x^{2}$
	2	$\dot{x} = 33.862 - 5.169 \times 10^{-4} x$
	3	$\dot{x} = 0.013 x$

Table 5

Model selection results for the tumor growth application. Subscripts indicate the model selection criterion weights. Models selected as the best ones are indicated by italics

Method	Model	d	$\text{SSE}(\boldsymbol{X}, \widehat{\boldsymbol{X}})$	AIC	AIC$_c$	BIC	AIC $w_i$	AIC$_c$ $w_i$	BIC $w_i$
SINDy-SA	1	2	$8.103 \times 10^{-17}$	-12822.639	-12822.598	-12815.231	1.000	1.000	1.000
	2	3	$1.842 \times 10^{-5}$	-4975.767	-4975.686	-4964.656	0.000	0.000	0.000
SINDy	1	3	$1.157 \times 10^{-4}$	-4424.609	-4424.527	-4413.497	1.000	1.000	1.000
	2	2	153697672.305	3948.015	3948.055	3955.422	0.000	0.000	0.000
	3	1	2089401837.561	4728.908	4728.922	4732.612	0.000	0.000	0.000

All three model selection criteria agreed in choosing the best models with probability equal to one, as shown in Table 5. While the SINDy-SA method identified the correct model, the SINDy approach could not capture the correct structure of the model, since it wrongly incorporates a source term. Figure 5 illustrates a comparison between the observed data and the numerical solution of the selected best model for each sparse identification method, as well as the corresponding mathematical description. The results are quite similar visually, but the best model identified by the SINDy method led to an SSE about 12 orders of magnitude higher than that identified by the SINDy-SA method ($1.157 \times 10^{-4}$ versus $8.103 \times 10^{-17}$). Of note, although the SINDy-SA method identified two mathematical models under different experimental settings, the information criteria correctly selected the most parsimonious model that best fits the simulated data. In comparison, the original SINDy approach fails to solve the system identification problem in this example, mainly due to the difference between the orders of magnitude of the two terms in the true model. Therefore, the results obtained for this application show a limitation of using a threshold to eliminate candidate functions from the governing equations and highlight the importance of performing the SA of model parameters as in our proposed approach, which allows us to correctly discover the governing equations that generated the data.

Fig. 5

Comparison between the observed data and the numerical solution of the best tumor growth models selected by the used information criteria, which are shown inside the graphs. Although the dynamics are quite similar, the model in (b) includes a source term (shown in gray) that is not present in the true model
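The threshold limitation discussed for the tumor growth application can be illustrated with a simplified sequentially thresholded least-squares loop. This is a sketch under stated assumptions, not the SINDy implementation used in the paper: it omits the ridge term, uses the closed-form logistic solution with exact derivatives, and takes a hypothetical initial volume and sampling grid, so the fitted numbers are not expected to match Table 4. The function name `stlsq` is illustrative.

```python
import numpy as np

r, K = 0.028, 8472.914             # growth rate (/day) and carrying capacity (mm^3)
b = r / K                          # magnitude of the quadratic coefficient
x0 = 200.0                         # hypothetical initial tumor volume (mm^3)
t = np.linspace(0.0, 114.0, 200)   # hypothetical sampling of the 114-day window

x = K / (1.0 + (K / x0 - 1.0) * np.exp(-r * t))       # closed-form logistic solution
x_dot = r * x - b * x**2                              # exact derivative
theta = np.column_stack([np.ones_like(x), x, x**2])   # library [1, x, x^2]


def stlsq(theta, x_dot, threshold, n_iter=10):
    """Sequentially thresholded least squares (no ridge term, for simplicity)."""
    coef = np.linalg.lstsq(theta, x_dot, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(coef) < threshold
        coef[small] = 0.0
        keep = ~small
        if not keep.any():
            break
        # Refit only the surviving candidate functions.
        coef[keep] = np.linalg.lstsq(theta[:, keep], x_dot, rcond=None)[0]
    return coef


print(b)
print(stlsq(theta, x_dot, threshold=1e-3))   # quadratic term falls below the threshold
print(stlsq(theta, x_dot, threshold=1e-7))   # quadratic term survives
```

The quadratic coefficient is about four orders of magnitude smaller than the linear one, so any threshold large enough to prune spurious terms of ordinary size also removes a term that the dynamics require.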

Pendulum motion model

In this subsection, we apply the pendulum equation to generate the simulated data used in the dynamical system identification problem. The pendulum motion over time t, driven by a damped harmonic oscillation, is described in terms of the angle x(t) and the angular velocity y(t) through a system of two differential equations [34], referred to as system (12). System (12) is integrated from prescribed parameters and initial conditions up to a fixed final time, and we collect the numerical solutions at evenly distributed time points, which are used to approximate the derivative. Due to the specific dynamics observed in this problem, we build the library using both polynomials, varying their maximum degree, and trigonometric functions (sine and cosine), varying their frequency. If, for example, the maximum degree of the polynomial is 1 and the frequency of the trigonometric functions is 1, the library contains seven candidate functions: $1$, $x$, $y$, $\sin(x)$, $\cos(x)$, $\sin(y)$, and $\cos(y)$. As in the previous examples, for each chosen maximum degree and frequency, we also vary the tuning parameter, which defines the set of experiments for the SINDy-SA method. These experiments are also combined with variations of the threshold for the original SINDy approach, leading to a larger set of experiments. In all SINDy-SA experiments, we keep the remaining method settings fixed and perform the SA at three time instants over the experimental time frame to keep track of the dynamics of the pendulum motion. The SINDy-SA and SINDy methods identified four and 84 different mathematical models, respectively. Table 6 details the model selection results and indicates the best model among the candidate set. To complement these results, Figure SM-4 in the Supplementary Material illustrates the corresponding Pareto curves relating the SSE with the model complexity.
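A library that mixes polynomial and trigonometric candidate functions, as described above, can be assembled as in the following sketch. The function `build_library`, the column ordering, and the label format are illustrative assumptions. The choice of degree 1 and frequency 1 in the usage lines is inferred from the stated count of seven candidate functions for two state variables (three polynomial terms plus a sine and a cosine per variable).

```python
import numpy as np
from itertools import combinations_with_replacement


def build_library(X, names, degree, frequency):
    """Stack polynomial terms up to `degree` and sin/cos terms up to `frequency`."""
    columns, labels = [np.ones(X.shape[0])], ["1"]
    for deg in range(1, degree + 1):
        for idx in combinations_with_replacement(range(X.shape[1]), deg):
            columns.append(np.prod(X[:, list(idx)], axis=1))
            labels.append(" ".join(names[i] for i in idx))
    for k in range(1, frequency + 1):
        for j, name in enumerate(names):
            arg = name if k == 1 else f"{k} {name}"
            columns.append(np.sin(k * X[:, j]))
            labels.append(f"sin({arg})")
            columns.append(np.cos(k * X[:, j]))
            labels.append(f"cos({arg})")
    return np.column_stack(columns), labels


# Hypothetical samples of the angle x and the angular velocity y.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 2))

theta, labels = build_library(X, ["x", "y"], degree=1, frequency=1)
print(theta.shape, labels)
```

Setting `frequency=0` recovers a purely polynomial library, which with `degree=2` gives the six candidate functions used in the prey–predator example.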

Table 6

Model selection results for the pendulum motion application. Subscripts indicate the model selection criterion weights. Models selected as the best ones are indicated by italics

Method	Model	d	SSE(X, X̂)	AIC	AIC_c	BIC	AIC w_i	AIC_c w_i	BIC w_i
SINDy-SA	1	3	6.554 × 10^{-7}	-5997.491	-5997.410	-5986.370	1.000	1.000	1.000
	2	17	0.057	-2546.523	-2544.360	-2483.502	0.000	0.000	0.000
	3	35	0.115	-2297.928	-2288.418	-2168.179	0.000	0.000	0.000
	4	43	0.014	-2926.289	-2911.565	-2766.883	0.000	0.000	0.000
SINDy	1	10	0.003	-3419.983	-3419.224	-3382.912	0.000	0.000	0.000
	2	6	0.204	-2184.844	-2184.558	-2162.601	0.000	0.000	0.000
	3	3	6.554 × 10^{-7}	-5997.491	-5997.410	-5986.370	1.000	1.000	1.000
	4	11	0.248	-2115.911	-2114.997	-2075.132	0.000	0.000	0.000
	5	12	0.331	-2026.392	-2025.309	-1981.907	0.000	0.000	0.000
	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮	⋮
	80	43	0.015	-2889.626	-2874.902	-2730.220	0.000	0.000	0.000
	81	25	0.005	-3281.282	-3276.554	-3188.604	0.000	0.000	0.000
	82	10	6.391 × 10^{-4}	-3911.838	-3911.079	-3874.767	0.000	0.000	0.000
	83	41	0.001	-3695.703	-3682.406	-3543.712	0.000	0.000	0.000
	84	29	0.010	-3044.384	-3037.964	-2936.878	0.000	0.000	0.000
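The criteria columns of Table 6 can be related to the SSE and the number of parameters d. The sketch below uses the common least-squares forms AIC = n ln(SSE/n) + 2d, AIC_c = AIC + 2d(d+1)/(n-d-1), and BIC = n ln(SSE/n) + d ln(n). Neither these formulas nor the sample size are restated in this section, so both are assumptions here; in particular, n = 301 is a value inferred from the tabulated numbers, and `information_criteria` is a hypothetical helper name.

```python
import math


def information_criteria(sse, n, d):
    """Least-squares forms of AIC, AICc and BIC for a model with d parameters."""
    fit_term = n * math.log(sse / n)
    aic = fit_term + 2 * d
    aicc = aic + 2 * d * (d + 1) / (n - d - 1)  # small-sample correction
    bic = fit_term + d * math.log(n)
    return aic, aicc, bic


# SSE and d of the best pendulum model in Table 6; n = 301 is an assumed sample size.
aic, aicc, bic = information_criteria(sse=6.554e-7, n=301, d=3)
print(f"AIC={aic:.3f}  AICc={aicc:.3f}  BIC={bic:.3f}")
```

Under these assumptions, the printed values should land close to the first row of Table 6, with small differences attributable to the rounding of the tabulated SSE.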

As shown in Table 6, the first pendulum motion model was selected as the best among the four models identified by the SINDy-SA method, with a weight of 1.000 under each of the three information criteria. The best identified model is indeed the true model, which was also correctly identified by the original SINDy approach (model 3 in Table 6). In this particular application, the model recalibration step plays a crucial role. Without this step, the model selection criteria would have selected the fourth model identified by the SINDy approach, which is a much more complex model with three and five additional terms in the governing equations for the evolution of the angle x(t) and the angular velocity y(t), respectively. Figure 6 displays a comparison between the experimental data and the numerical solution of the best model for both system identification methods, with an SSE of the order of 10^{-7}.

Fig. 6

Comparison between the observed data and the numerical solution of the best pendulum motion model, identified by both SINDy-SA and SINDy frameworks, shown inside the graphs. The best identified dynamical system corresponds to the true model

Compartmental model

Our fourth application uses the epidemiological SIR compartmental model to generate the simulated data given as input to both the SINDy-SA and SINDy methods. This model describes the evolution over time t of a fixed population of N individuals divided into three compartments: susceptible S(t), infected I(t), and recovered R(t). For , it is represented by: By construction, are the transition coefficients that satisfy and , so that R(t) is known from the infected individuals I(t). In this way, we can simplify the sparse identification problem of this model by removing the equation associated with the recovered individuals. To generate the training data, we collect the numerical solution of evenly distributed time points using , the infection coefficient and the recovery rate . We set and so that individuals (). From the collected solution, we approximate the derivative and build the SINDy-SA experiments by setting consisting of polynomials, varying their maximum degree , , and . Those experiments are solved using and performing the SA at to capture the overall solution behavior over the considered time frame. For the original SINDy method, we consider experiments by additionally varying .

Table 7 shows the model selection results for the 10 and 14 models identified by the SINDy-SA and original SINDy methods, respectively. The corresponding Pareto curves relating the SSE to the number of model parameters are illustrated in Figure SM-5 in the Supplementary Material. Of note, although both approaches identified many dynamical systems, the information criteria correctly selected the true model that originated the experimental data only for the SINDy-SA method. The model selected for the original SINDy approach does not correspond to the true model, since it incorporates a source term in the time evolution of the infected population I(t) that is not present in the true model. In both cases, the evidence supporting the selected model was strong, with weights of 1.000 under all three criteria. Figure 7 shows a comparison between the experimental data and the numerical solution of the selected best models, whose mathematical description is also shown inside the graphs.
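The data-generation steps described above (integrating the reduced system on an even time grid, then approximating the derivative) can be sketched as follows. The mass-action form dS/dt = -βSI, dI/dt = βSI - γI is the classic SIR structure and is assumed here; the exact parametrization and every numerical value in the snippet are hypothetical placeholders, since the settings are not recoverable from the text above. The helper names are likewise illustrative.

```python
def sir_rhs(s, i, beta, gamma):
    """Reduced SIR right-hand side; R is omitted because R = N - S - I."""
    new_infections = beta * s * i
    return -new_infections, new_infections - gamma * i


def rk4_trajectory(s0, i0, beta, gamma, dt, steps):
    """Integrate the reduced system with classical RK4 on an even time grid."""
    s, i = s0, i0
    s_vals, i_vals = [s], [i]
    for _ in range(steps):
        k1 = sir_rhs(s, i, beta, gamma)
        k2 = sir_rhs(s + 0.5 * dt * k1[0], i + 0.5 * dt * k1[1], beta, gamma)
        k3 = sir_rhs(s + 0.5 * dt * k2[0], i + 0.5 * dt * k2[1], beta, gamma)
        k4 = sir_rhs(s + dt * k3[0], i + dt * k3[1], beta, gamma)
        s += dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        i += dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        s_vals.append(s)
        i_vals.append(i)
    return s_vals, i_vals


def central_difference(values, dt):
    """Second-order derivative estimate at the interior grid points."""
    return [(values[k + 1] - values[k - 1]) / (2.0 * dt)
            for k in range(1, len(values) - 1)]


# All numbers below are hypothetical placeholders, not the paper's settings.
N, dt = 1000.0, 0.1
s_vals, i_vals = rk4_trajectory(s0=990.0, i0=10.0, beta=5e-4, gamma=0.1, dt=dt, steps=400)
ds_dt = central_difference(s_vals, dt)                 # derivative data for regression
r_vals = [N - s - i for s, i in zip(s_vals, i_vals)]   # recovered compartment, if needed
```

Because the population is fixed, the recovered compartment follows from S and I, which is why only two equations need to be identified.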

Table 7

Model selection results for the SIR application. Subscripts indicate the model selection criterion weights. Models selected as the best ones are indicated by italics

Method	Model	d	SSE(X, X̂)	AIC	AIC_c	BIC	AIC w_i	AIC_c w_i	BIC w_i
SINDy-SA	1	3	0.861	-2450.414	-2450.353	-2438.440	1.000	1.000	1.000
	2	8	29.585	-1025.674	-1025.305	-993.742	0.000	0.000	0.000
	3	8	28.116	-1046.044	-1045.676	-1014.113	0.000	0.000	0.000
	4	16	2.799	-1952.874	-1951.453	-1889.010	0.000	0.000	0.000
	5	19	2.201	-2043.051	-2041.051	-1967.214	0.000	0.000	0.000
	6	29	2.318	-2002.284	-1997.582	-1886.532	0.000	0.000	0.000
	7	35	24577261.088	4480.347	4487.270	4620.048	0.000	0.000	0.000
	8	41	23109275.001	4467.712	4477.332	4631.362	0.000	0.000	0.000
	9	37	23261656.824	4462.341	4470.109	4610.025	0.000	0.000	0.000
	10	40	23261940.744	4468.346	4477.482	4628.004	0.000	0.000	0.000
SINDy	1	8	0.018	-3987.467	-3987.098	-3955.535	0.000	0.000	0.000
	2	5	1.144 × 10^{-7}	-8780.034	-8779.881	-8760.076	9.049 × 10^{-105}	8.821 × 10^{-105}	1.230 × 10^{-105}
	3	5	52532578.252	4724.192	4724.344	4744.149	0.000	0.000	0.000
	4	3	21898540.949	4370.186	4370.247	4382.161	0.000	0.000	0.000
	5	6	1.540 × 10^{-4}	-5895.933	-5895.719	-5871.984	0.000	0.000	0.000
	6	4	3.717 × 10^{-6}	-7389.671	-7389.569	-7373.705	0.000	0.000	0.000
	7	1	254775423.290	5347.771	5347.781	5351.763	0.000	0.000	0.000
	8	4	3.470 × 10^{-8}	-9259.171	-9259.070	-9243.205	1.000	1.000	1.000
	9	5	1.049 × 10^{-6}	-7893.628	-7893.475	-7873.670	2.992 × 10^{-297}	2.916 × 10^{-297}	4.066 × 10^{-298}
	10	5	35834684.742	4571.185	4571.337	4591.142	0.000	0.000	0.000
	11	3	219088760.235	5291.409	5291.470	5303.384	0.000	0.000	0.000
	12	5	4855374087.697	6534.755	6534.907	6554.712	0.000	0.000	0.000
	13	2	254775423.002	5349.771	5349.802	5357.754	0.000	0.000	0.000
	14	2	208797458.581	5270.164	5270.195	5278.147	0.000	0.000	0.000
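The weight columns in Table 7 appear consistent with the standard Akaike-weight normalization, w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2), where Δ_i is the difference between the score of model i and the best (lowest) score. That formula is not restated in this section, so the sketch below should be read as an assumption; `akaike_weights` is a hypothetical helper name. The inputs are the AIC values of the 14 original-SINDy models in Table 7.

```python
import math


def akaike_weights(scores):
    """Convert information-criterion scores into normalized model weights."""
    best = min(scores)
    # Subtracting the best score first keeps the exponentials within range.
    rel_likelihoods = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(rel_likelihoods)
    return [r / total for r in rel_likelihoods]


# AIC values of the 14 original-SINDy models, copied from Table 7.
aic_sindy = [-3987.467, -8780.034, 4724.192, 4370.186, -5895.933,
             -7389.671, 5347.771, -9259.171, -7893.628, 4571.185,
             5291.409, 6534.755, 5349.771, 5270.164]

weights = akaike_weights(aic_sindy)
best_model = weights.index(max(weights)) + 1  # 1-based model number
print(best_model, f"{weights[1]:.3e}", f"{weights[8]:.3e}")
```

With these inputs, model 8 receives essentially all of the weight, while the weights of models 2 and 9 come out on the order of 10^{-105} and 10^{-297}, in line with the table. This illustrates why most entries are reported as 0.000: a gap of a few hundred units in a criterion leaves the weaker model with negligible support.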

Fig. 7

Comparison between the observed data and the numerical solution of the selected best SIR models, shown inside the graphs. Although the dynamics are quite similar, the model in (b) includes one term (shown in gray) that is not present in the true model

As in the tumor growth model application, the original SINDy approach also fails to solve the system identification problem. Therefore, the results obtained for this application show that the SA of model parameters in our proposed SINDy-SA framework circumvents the difficulty of choosing the threshold and allows the correct identification of the true dynamical system.

Comparison between the noisy and limited data (described by circular and triangular points) and the numerical solution (indicated by continuous and dashed lines) of the best identified models for the applications considered in this work

Remarks on limited and noise-contaminated data scenarios

It is worth commenting on the robustness of the SINDy-SA approach in scenarios with noisy or sparse data. For all applications, keeping the other settings fixed, we first carry out experiments that reduce the number of evenly distributed time points (m) down to the smallest value for which the SINDy-SA framework still correctly identifies the true mathematical model. We then perturb the prey–predator, tumor growth, and SIR compartmental data points with multiplicative noise following log-normal distributions, whereas we contaminate the data of the pendulum motion model with additive normal noise. We gradually increase the noise level up to the highest level at which the true model is still correctly identified. Table 8 summarizes the lowest values of m and the highest noise intensities for all applications, together with the corresponding best identified models, while Fig. 8 compares their simulations against the noisy and limited data. Although the parameter values of the identified models differ slightly from those of the true models, the SINDy-SA framework proved quite robust to noisy and limited data.
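The data degradation described above can be sketched as follows. This is only an illustration under stated assumptions: the second argument of Log-normal(0, ·) and of N(0, ·) is read here as the standard deviation (it could equally denote a variance), the test signal is hypothetical, and the helper names are not from the paper.

```python
import numpy as np


def subsample_evenly(t, x, m):
    """Keep m approximately evenly spaced samples, including both end points."""
    idx = np.linspace(0, len(t) - 1, m).round().astype(int)
    return t[idx], x[idx]


def multiplicative_lognormal_noise(x, sigma, rng):
    """Multiply each point by a factor exp(z), with z ~ N(0, sigma**2)."""
    return x * rng.lognormal(mean=0.0, sigma=sigma, size=x.shape)


def additive_normal_noise(x, std, rng):
    """Add zero-mean Gaussian noise with standard deviation std to each point."""
    return x + rng.normal(loc=0.0, scale=std, size=x.shape)


rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 401)
x = 2.0 + np.sin(t)  # hypothetical positive signal standing in for simulated data

t_sub, x_sub = subsample_evenly(t, x, m=70)
x_mult = multiplicative_lognormal_noise(x_sub, sigma=0.06, rng=rng)
x_add = additive_normal_noise(x_sub, std=0.01, rng=rng)
```

Multiplicative log-normal noise keeps positive quantities such as populations positive, which is one plausible reason to prefer it for the prey–predator, tumor growth, and SIR data, while additive noise suits a signed quantity such as a pendulum angle.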

Table 8

Best models identified by the SINDy-SA method for the applications considered in this work, whose structures correspond to the true models. Results were obtained using the lowest number of data points (m) and the highest noise intensities

Application	m	Noise	Best identified model
Prey-predator	70	Multiplicative $\text{Log-normal}(0, 0.06)$	$\dot{x} = 0.926x - 0.920xy$, $\dot{y} = -1.061y + 0.999xy$
Tumor growth	7	Multiplicative $\text{Log-normal}(0, 1.0)$	$\dot{x} = 0.034x - 4.586 \times 10^{-6} x^{2}$
Pendulum motion	70	Additive $\mathcal{N}(0, 0.01)$	$\dot{x} = 1.002y$, $\dot{y} = -0.250y - 4.989 \sin(x)$
Compartmental	22	Multiplicative $\text{Log-normal}(0, 0.001)$	$\dot{S} = -4.004 \times 10^{-4} SI$, $\dot{I} = 3.999 \times 10^{-4} SI - 0.040I$

Fig. 8

Comparison between the noisy and limited data (described by circular and triangular points) and the numerical solution (indicated by continuous and dashed lines) of the best identified models for the applications considered in this work

Final remarks

In this work, we built on the original SINDy method by integrating it with a global SA technique for model parameters. The SINDy method has recently been used to identify nonlinear dynamical system structures from noisy measurement data. However, it depends on choosing a threshold below which regression coefficients are eliminated in the process of identifying the model structure. The SA technique circumvents the need to define a threshold value by ranking terms according to their importance in relation to the rate of change in time of each state variable (the QoI) and eliminating the least influential ones. In our proposed SINDy-SA approach, we employed the EE method to analyze the sensitivity of the parameters, and the defined QoI is associated with the formulation of a sparse regression problem. We chose the EE method because it ranks the global influence of the model parameters on a QoI and requires a relatively small number of model evaluations compared to variance-based methods. We remark that the proposed methodology does not depend on the EE method, and other global SA techniques can also be used. To solve the sparse regression problem, we used the ridge regression method, also implemented in the PySINDy module, due to its robustness and accuracy in identifying nonlinear dynamical systems compared with other classical regression methods. Our SINDy-SA method is carried out through an iterative process, whose convergence is dictated by statistics of the dynamic change of previously computed errors. We integrated the proposed SINDy-SA method in a general framework, in which we formulated different experimental settings yielding a set of identified models, whose parameter estimates were then improved through model recalibration. Among all identified models, the best and most parsimonious one was selected using model selection techniques based on information theory.
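The ranking idea described above can be sketched as follows. This is a simplified, radial one-at-a-time variant of elementary effects written with NumPy only, not the authors' implementation. The coefficient ranges (±10% around each value), the spurious constant coefficient `1e-3`, the initial value `x_init = 200`, and the helper name `mu_star` are all assumptions made for illustration; the logistic coefficients are borrowed from Table 8.

```python
import numpy as np


def mu_star(theta, xi, rel_range=0.1, delta=0.5, n_base=4, seed=0):
    """Mean absolute elementary effect of each coefficient on the QoI theta @ xi.

    Each coefficient xi_j is assumed to vary in xi_j -/+ rel_range * |xi_j|,
    mapped to the unit interval; effects are averaged over time and base points.
    """
    rng = np.random.default_rng(seed)
    lower = xi - rel_range * np.abs(xi)
    width = 2.0 * rel_range * np.abs(xi)
    n_terms = xi.size
    total = np.zeros(n_terms)
    for _ in range(n_base):
        u = rng.uniform(0.0, 1.0 - delta, size=n_terms)  # random base point
        f_base = theta @ (lower + u * width)
        for j in range(n_terms):
            u_step = u.copy()
            u_step[j] += delta  # perturb one coefficient at a time
            f_step = theta @ (lower + u_step * width)
            total[j] += np.mean(np.abs(f_step - f_base) / delta)
    return total / n_base


# Logistic trajectory (closed form) used to evaluate the candidate library.
r, b, x_init = 0.034, 4.586e-6, 200.0
t = np.linspace(0.0, 400.0, 101)
x = r * x_init / (b * x_init + (r - b * x_init) * np.exp(-r * t))

names = ["1", "x", "x^2"]
theta = np.column_stack([np.ones_like(x), x, x**2])
xi = np.array([1.0e-3, r, -b])  # hypothetical fit with a spurious constant term

scores = mu_star(theta, xi)
ranking = [names[j] for j in np.argsort(scores)[::-1]]
print(ranking)
```

In this toy setting, a magnitude threshold would discard the x² coefficient before the constant, whereas the sensitivity ranking places the constant last because it contributes least to the rate of change.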
The general framework can also employ other system identification methods, including the original SINDy approach, as performed here for comparison. Both the SINDy-SA and SINDy frameworks were executed in a variety of applications, in which we generated simulated data from true models in order to analyze the ability of the methods to correctly discover the structure of the mathematical model, as well as the parameter values. The applications presented include models with different behaviors, encompassing a prey–predator model, a logistic model calibrated from tumor growth data, a pendulum motion model, and an SIR compartmental model. While the SINDy framework failed to identify the logistic and the SIR models, the SINDy-SA framework correctly identified the true model for all applications, emphasizing a relevant advantage of performing an SA of model parameters to solve the system identification problem. These results indicate the potential of our SINDy-SA method for real applications, in which the true model is not known a priori, in the search for interpretable and predictive mathematical models.

In the applications presented, we defined the library of candidate functions as the terms present in the true model plus some additional terms that we expected to be eliminated by the system identification methods. This was done in order to investigate the ability of the SINDy-SA and SINDy methods to discover known mathematical models. However, in a realistic scenario, the true model is generally not known. In this context, an important question arises about the behavior of the methods when some term of the true model is not part of the library of candidate functions. In such a case, both the SINDy-SA and the original SINDy approaches would seek the combination of candidate functions that best fits the experimental data. This emphasizes the crucial role the modeler has in system identification problems.
The modeler must define the set of possible functions based on the available data and prior knowledge about the expected behavior of the system, as well as assess whether the identified mathematical model is physically interpretable and sufficiently accurate.

The choice of the time points and state variables at which the global sensitivity analysis is performed can play a crucial role in model identification. For problems with multiple dependent outputs, Saltelli et al. [30] suggest composing multiple outputs into a single scalar and performing a sensitivity analysis. However, in some situations, it is more appropriate to quantify the sensitivity of the parameters at each output [19]. Although this analysis was not carried out in the present work, applying multi-output global sensitivity analysis methods in our framework is the target of future investigations.

The model recalibration step of our proposed framework is critical for correctly identifying the true dynamical system. Although we have successfully employed the Levenberg-Marquardt method, other parameter inference techniques can be used instead, such as Bayesian calibration procedures, through which parameter uncertainties can be quantified. This is particularly important when dealing with limited and noisy data. In this context, data preprocessing techniques, such as Gaussian process regression [18], could also make system identification more robust, a topic for further investigation.

Overall, our SINDy-SA framework can leverage new developments together with other recent SINDy works to deal with, for example, implicit dynamical systems [16], partial differential equations [29], and control [3].
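A recalibration step of the kind mentioned above can be sketched with SciPy's Levenberg-Marquardt solver. This is a minimal illustration, not the authors' code: it assumes a logistic model with a closed-form solution, borrows the reference coefficients from Table 8, and uses a hypothetical initial value `x_init = 200`, a mild 2% multiplicative noise level, and a hypothetical starting guess.

```python
import numpy as np
from scipy.optimize import least_squares


def logistic_solution(t, r, b, x_init):
    """Closed-form solution of dx/dt = r*x - b*x**2 with x(0) = x_init."""
    return r * x_init / (b * x_init + (r - b * x_init) * np.exp(-r * t))


def residuals(params, t, data, x_init):
    """Difference between the model prediction and the observations."""
    r, b = params
    return logistic_solution(t, r, b, x_init) - data


rng = np.random.default_rng(1)
t = np.linspace(0.0, 400.0, 50)
x_init = 200.0
r_ref, b_ref = 0.034, 4.586e-6  # reference values borrowed from Table 8

# Synthetic observations: exact solution times log-normal noise (assumed 2%).
data = logistic_solution(t, r_ref, b_ref, x_init) * rng.lognormal(0.0, 0.02, t.size)

# method="lm" selects Levenberg-Marquardt; x_scale="jac" lets the solver
# rescale the two parameters, whose magnitudes differ by orders of magnitude.
fit = least_squares(residuals, x0=[0.05, 3.0e-6], args=(t, data, x_init),
                    method="lm", x_scale="jac")
r_fit, b_fit = fit.x
print(r_fit, b_fit)
```

A Bayesian calibration would replace the point estimate `fit.x` with a posterior distribution over the parameters, which is the uncertainty quantification the text refers to.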

15 in total

1. Learning physically consistent differential equation models from data using group sparsity.

Authors: Suryanarayana Maddu; Bevan L Cheeseman; Christian L Müller; Ivo F Sbalzarini
Journal: Phys Rev E Date: 2021-04 Impact factor: 2.529

2. Sparse identification of nonlinear dynamics for rapid model recovery.

Authors: Markus Quade; Markus Abel; J Nathan Kutz; Steven L Brunton
Journal: Chaos Date: 2018-06 Impact factor: 3.642

3. Reactive SINDy: Discovering governing reactions from concentration data.

Authors: Moritz Hoffmann; Christoph Fröhner; Frank Noé
Journal: J Chem Phys Date: 2019-01-14 Impact factor: 3.488

4. Discovering governing equations from data by sparse identification of nonlinear dynamical systems.

Authors: Steven L Brunton; Joshua L Proctor; J Nathan Kutz
Journal: Proc Natl Acad Sci U S A Date: 2016-03-28 Impact factor: 11.205

5. Data-driven discovery of partial differential equations.

Authors: Samuel H Rudy; Steven L Brunton; Joshua L Proctor; J Nathan Kutz
Journal: Sci Adv Date: 2017-04-26 Impact factor: 14.136

6. Framework for enhancing the estimation of model parameters for data with a high level of uncertainty.

Authors: Gustavo B Libotte; Lucas Dos Anjos; Regina C C Almeida; Sandra M C Malta; Renato S Silva
Journal: Nonlinear Dyn Date: 2022-01-07 Impact factor: 5.741

7. System inference for the spatio-temporal evolution of infectious diseases: Michigan in the time of COVID-19.

Authors: Z Wang; X Zhang; G H Teichert; M Carrasco-Teja; K Garikipati
Journal: Comput Mech Date: 2020-08-12 Impact factor: 4.014

8. Modeling and prediction of the transmission dynamics of COVID-19 based on the SINDy-LM method.

Authors: Yu-Xin Jiang; Xiong Xiong; Shuo Zhang; Jia-Xiang Wang; Jia-Chun Li; Lin Du
Journal: Nonlinear Dyn Date: 2021-07-22 Impact factor: 5.022

9. SciPy 1.0: fundamental algorithms for scientific computing in Python. (Review)

Authors: Pauli Virtanen; Ralf Gommers; Travis E Oliphant; Matt Haberland; Tyler Reddy; David Cournapeau; Evgeni Burovski; Pearu Peterson; Warren Weckesser; Jonathan Bright; Stéfan J van der Walt; Matthew Brett; Joshua Wilson; K Jarrod Millman; Nikolay Mayorov; Andrew R J Nelson; Eric Jones; Robert Kern; Eric Larson; C J Carey; İlhan Polat; Yu Feng; Eric W Moore; Jake VanderPlas; Denis Laxalde; Josef Perktold; Robert Cimrman; Ian Henriksen; E A Quintero; Charles R Harris; Anne M Archibald; Antônio H Ribeiro; Fabian Pedregosa; Paul van Mulbregt
Journal: Nat Methods Date: 2020-02-03 Impact factor: 28.547

10. Discovering dynamic models of COVID-19 transmission.

Authors: Jinwen Liang; Xueliang Zhang; Kai Wang; Manlai Tang; Maozai Tian
Journal: Transbound Emerg Dis Date: 2021-08-11 Impact factor: 4.521