Literature DB >> 30327341

Sequential sampling strategy for extreme event statistics in nonlinear dynamical systems.

Mustafa A Mohamad1, Themistoklis P Sapsis2.   

Abstract

We develop a method for the evaluation of extreme event statistics associated with nonlinear dynamical systems from a small number of samples. From an initial dataset of design points, we formulate a sequential strategy that provides the "next-best" data point (set of parameters) that when evaluated results in improved estimates of the probability density function (pdf) for a scalar quantity of interest. The approach uses Gaussian process regression to perform Bayesian inference on the parameter-to-observation map describing the quantity of interest. We then approximate the desired pdf along with uncertainty bounds using the posterior distribution of the inferred map. The next-best design point is sequentially determined through an optimization procedure that selects the point in parameter space that maximally reduces uncertainty between the estimated bounds of the pdf prediction. Since the optimization process uses only information from the inferred map, it has minimal computational cost. Moreover, the special form of the metric emphasizes the tails of the pdf. The method is practical for systems where the dimensionality of the parameter space is of moderate size and for problems where each sample is very expensive to obtain. We apply the method to estimate the extreme event statistics for a very high-dimensional system with millions of degrees of freedom: an offshore platform subjected to 3D irregular waves. It is demonstrated that the developed approach can accurately determine the extreme event statistics using a limited number of samples.
Copyright © 2018 the Author(s). Published by PNAS.

Entities:  

Keywords:  adaptive sampling; extreme events; sequential experimental design

Year:  2018        PMID: 30327341      PMCID: PMC6217378          DOI: 10.1073/pnas.1813263115

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


For many natural and engineering systems, extreme events, corresponding to large excursions, have significant consequences and are important to predict. Examples include extreme economic events, such as credit shocks (1), rogue waves in the ocean (2), and extreme climate events (3). Extreme events “live” in the tails of a probability distribution function (pdf); thus, it is critical to quantify the pdf many standard deviations away from the mean. For most real-world problems, the underlying processes are far too complex to enable estimation of the tails through direct simulations or repeated experiments. This is a result of the low probabilities of extreme events, which necessitate a large number of experiments or ensembles to resolve their statistics. For random dynamical systems with inherently nonlinear dynamics (expressed through intermittent events, nonlinear energy transfers, broad energy spectrum, and large intrinsic dimensionality), we are usually limited to a few ensemble realizations. The setup in this article involves a stochastic dynamical system that depends on a set of random parameters with known probability distribution. We assume that the dimensionality of the random parameters is or can be reduced to a moderate size. Because of the inherent stochastic and transient character of extreme responses, it is not sufficient to consider the dynamical properties of the system independently from the statistical characteristics of solutions. A statistical approach to this problem has important limitations, such as requiring various extrapolation schemes due to insufficient sample numbers (see extreme value theorems) (4). Another strategy is large deviations theory (5, 6), a method for the probabilistic quantification of large fluctuations in systems, which involves identifying a large deviations principle that explains the least unlikely rare event. While applied to many problems, for complex systems, estimating the rate function can be very costly and the principle does not characterize the full probability distribution. The resulting distributions via such approaches cannot always capture the nontrivial shape of the tail, dictated by physical laws in addition to statistical characteristics. On the other hand, in a dynamical systems approach, there are no sufficiently generic efficient methods to infer statistical information from dynamics. For example, the Fokker–Planck equation (7) is challenging to solve even in moderate to low dimensions (8). To this end, it is essential to consider blended strategies. The utilization of combined dynamic–stochastic models for the prediction of extreme events have also been advocated and used in climate science and meteorology by others (9–11). In refs. 12 and 13, a probabilistic decomposition of extreme events was used to efficiently characterize the probability distribution of complex systems, which considered both the statistical characteristics of trajectories and the mechanism triggering the instabilities (extreme events). While effective, the proposed decomposition of intermittent regimes requires explicit knowledge of the dynamics triggering the extremes, which may not be available or easily determined for arbitrary dynamical systems. We formulate a sequential method for capturing the statistics of an observable that is, for example, a functional of the state of a dynamical system or a physical experiment. The response of the observable is modeled using a machine learning method that infers the functional form of the quantity of interest by using only a few strategically sampled numerical simulations or experiments. Combining the predictions from the machine learning model, using the Gaussian process regression (GPR) framework, with available statistical information on the random input parameters, we formulate an optimization problem that provides the next-best or most informative experiment that should be performed to maximally reduce uncertainty in the pdf prediction of extreme events (tails of the distribution) according to a proposed “distance” metric. To account for tail features, the metric uses a logarithmic transformation of the pdfs, which is similar to the style of working on the rate function in the large deviation principle. The optimization process relies exclusively on the inferred properties on the parameter-to-observation map, and no additional simulations are required in the optimization. For the optimization problem to be practically solvable, we require the parameter space to be of moderate size. The proposed method allows us to sequentially sample the parameter space to rapidly capture the pdf and, in particular, the tails of the distribution of the observable of interest.

Problem Setup and Method Overview

Consider a dynamical system with state variable ,where is the sample space in an appropriate probability space (we denote the density of the random variable by and its cumulative density by ). The random variable parameterizes sources of uncertainty, such as stochastic forcing terms or system parameters with a priori known distribution . For fixed , the response is a deterministic function in time. We are interested in estimating the pdf of a scalar quantity of interest or observable given bywhere is a continuous parameter-to-observation map and is an arbitrary functional of . In addition, it is possible to consider observations that are corrupted by some observational or numerical noise term, but for simplicity, we assume zero noise. In our setup, the unknown parameter-to-observation map is expensive to evaluate, representing, for example, a large-scale numerical simulation or a costly physical experiment, and so we desire to minimize the number of evaluations of this mapping. Note that the true statistics of the random variable induced by iswhere and is the cumulative density function of . Our objective is to estimate the statistics, especially non-Gaussian features, of the observable —that is, the pdf of the random variable induced by the mapping , which we denote by . Consider the quantity of interest with pdf induced by the unknown mapping , where is a random valued parameter with known pdf . Given a dataset of size , so that the estimated distribution of using a learning algorithm from this dataset is , determine the next input parameter such that when the map is evaluated at this new sample the error between and is minimized, placing special emphasis on the tails of the distribution where is large. If we consider the augmented -parameterized dataset for all and denote the learned density by pdf , the next parameter is obtained by minimizing a distance metric between two probability distributions, . Determining the next-best experiment should not involve the expensive-to-evaluate map nor . The main computational savings of our proposed method involves (1) using an inexpensive-to-compute surrogate model to replace appearing in the -parametrized dataset and using a version of the distance metric without explicit dependence on . Here, we use GPR as the learning algorithm to construct the surrogate for . Using the posterior distribution of the inferred map through the GPR scheme, we estimate the pdf for the quantity of interest as well as the pdfs that correspond to the confidence intervals on the map. The distance metric is then based on minimization of the logarithmic transform of the pdfs that correspond to the map upper and lower confidence intervals from the posterior variance, which does not involve . This optimization problem provides the “next-best” point in a sequential fashion. The overall aim is to accurately capture the tail statistics of through a minimum number of observations of . Note that the resulting method should not need to densely sample all regions in , since not all regions have significant probability ( may be negligible) or importance ( may be small). The formulated sampling strategy should accurately predict the tail region of the pdf taking into account both the magnitude of the map and the value of the probability of the sample (see Fig. 1).
Fig. 1.

Areas with large probability in are not necessarily associated with regions where is large. The proposed criterion focuses on sampling regions where both the probability and the magnitude of are significant.

Areas with large probability in are not necessarily associated with regions where is large. The proposed criterion focuses on sampling regions where both the probability and the magnitude of are significant.

Method Description

An important component of our proposed method is the construction of an inexpensive surrogate for the map . We use GPR as our learning method. GPR considers the function as a Gaussian process, in terms of a distribution in function space (see and numerous references, such as ref. 14). An important property of GPR is that the posterior distribution is a Gaussian process with an explicit mean and kernel function. The variance of the posterior can be used as a proxy for the error or uncertainty of the prediction, which we use along with to guide selection of the next sample point. We learn the parameter-to-observation map using GPR from the dataset . The GPR method then provides analytic expressions for the mean and kernel for the posterior distribution of the Gaussian random function (see for the expressions). We can then construct the following estimate for the pdf using the posterior mean of the learned surrogate model :where and is the cumulative distribution function. We now formulate the optimization problem for the next sample point . Consider the augmented -parameterized dataset , which approximates by using the GPR mean instead of . Then, let denote the random function trained on the augmented dataset . The pdf of the random variable , where , is now replaced by a -parameterized random probability measure induced by the Gaussian random function , where and is a sample point. Note that the mean of the random function in the -parameterized dataset is identical to for all , since the prediction of the value of the map at the sample point is given by the posterior mean at iteration . The proposed criterion is then based on minimization of a distance metric between the pdfs of the confidence bounds of . Specifically, let denote the pdfs of the two random variables , where , which are the upper and lower bounds of the confidence interval based on the -scaled SD of the posterior distribution. The pdfs corresponding to the confidence bounds are explicitly given bywhere . We use the 95% interval bounds, so that the SD is scaled by a factor . The cumulative distribution function corresponding to the upper confidence bound is a lower bound for , and we have the relation for all . Note that although the map mean of the Gaussian random function based on the -parameterized dataset is identical to the posterior mean at iteration , the value of the SD now vanishes at the sample point: . The distance metric we propose for the selection of the next sample point is given bywhere the integral is computed over the intersection of the two domains that the pdfs are defined over. The next sample point is then determined by solving the optimization problem:This is an -based metric of the logarithmic transform of the pdfs. The logarithmic transform of the densities in the criterion effectively emphasizes extreme and rare events, as we explicitly show in theorem 1. The computational savings come from the construction of a criterion that avoids and instead uses , which involves evaluating the GPR emulator (inexpensive) and an additional GPR prediction. We make a few comments on the optimization problem and the sequential algorithm. The sequential strategy is summarized in pseudocode in . For the optimization problem, we use a derivative-free method, specifically a particle swarm optimizer. The integrations for the pdfs are computed explicitly from the definition of the Lebesgue integral by partitioning the codomain of map . This is much more efficient compared with a Monte Carlo approach that would result in a very expensive computational task, as long as the dimensionality of the parameter space is low. For high-dimensional parameter spaces, the computational cost of the integration can become prohibitive, and in such cases, an order reduction in the parameter space should be first attempted, or alternatively an importance sampling algorithm could be used to compute the integral. In the numerical problems, the optimization problem is practically solvable for low-dimensional parameter spaces . The starting design plan size , if not already provided, should be small; a Latin–Hypercube (LH)-based sampling plan should be used for this purpose. We also recommend as a pretraining period to process a small number of iterations using a metric without the logarithmic transform of the densities, such as the following -based metric,to capture the main probability mass (low order moments) before using the proposed metric that emphasizes extreme and rare events. In addition, it is not necessary to retrain the GPR hyperparameters after every iteration, which can remain fixed after being calibrated from a few iterations. Updating the Gaussian process emulator after the addition of new data points can be done in if the hyperparameters are fixed; otherwise, the GPR emulator must be performed anew in (see ). (For low-dimensional , since we presume the dataset size is small, the cost difference may be negligible.)

Asymptotic Behavior.

The first theoretical result relates to the convergence of the proposed method to the true pdf as the number of samples goes to infinity (see for the proof). The second result shows that the asymptotic form of the criterion is given by (see for the proof): Theorem 1: Let and from the GPR scheme be sufficiently smooth functions of . The asymptotic behavior of for large (ensuring small ) and small is given bywhere denotes the expectation over the probability measure . Note that the pdf in the denominator under the integral in 9 is a direct implication of our choice to consider the difference between the logarithms of the pdfs in the optimization criterion. The pdf of the parameter-to-observation map in the denominator of the integrand guarantees that even values of the parameter , where the probability is low (rare events), are sampled. This point is demonstrated by the following corollary (proof in ). Corollary 1: Let be a one-dimensional random variable, and in addition to the assumptions of theorem 1, we assume that is a monotonically increasing function. Then, the asymptotic value of for large has the following property:where ′ denotes differentiation and where it is understood that the inequality above is based on the asymptotic result in Eq. . Therefore, for a large value of , bounds (within higher order error) a quantity that consists of boundary terms and two integral terms involving sampling the function over the contours of and . Consider the first term on the right, which can be discretized aswhere is an equipartition of the range of . This summation is summarized in Fig. 2, Left. The way the integral is computed guarantees that the integrand will be sampled even in locations where the pdf has a small value (rare events) or else the criterion would not converge.
Fig. 2.

(Left) Integration of over contours of implies sampling of in low probability regions of . (Right) On the other hand, low-order moments of rely only on values of close to high probability regions of ; thus, rare events are not sampled sufficiently.

(Left) Integration of over contours of implies sampling of in low probability regions of . (Right) On the other hand, low-order moments of rely only on values of close to high probability regions of ; thus, rare events are not sampled sufficiently. On the other hand, if we had instead chosen a criterion that focused directly on the convergence of low order statistics for , such as the same criterion, but without the logarithmic transformation of the densities, we would havewhere is the cumulative distribution function of . The corresponding integration is shown in Fig. 2, Right. In such a case, sampling the function in regions of high probability would be sufficient for the criterion to converge. However, such a strategy would most likely lead to large errors in regions associated with rare events since the sampling will be sparse.

Applications

We illustrate the proposed algorithm to two problems. The first example consists of a nonlinear oscillator stochastically forced by a colored noise process; this application serves to illustrate the main ideas of the proposed method. The second application, involving 3D hydrodynamic wave impacts on an offshore platform (a system with millions of degrees of freedom), showcases the applicability of the proposed method to real-world setups where computation of extreme event statistics using traditional approaches is prohibitively expensive, since the simulation times of experimental runs are on the order of several hours.

Nonlinear Oscillator Driven by Correlated Stochastic Noise.

Consider the nonlinear oscillatorforced by a stationary, colored noise with correlation function and the nonlinear restoring term given bySince the system is stochastically forced, it is necessary to use an expansion to obtain a parameterization in terms of a finite number of random variables. We use a Karhunen–Loève expansion (see ) to obtainwhich is truncated to a suitable number . For illustration, we take our quantity of interest as the average value of the response so that the parameter-to-observation map is defined by We consider a three-term truncation . The system parameters are given by , , , , and , and the forcing parameters are and , with . For comparisons, the exact pdf is obtained by sampling the true map from 64,000 points on a grid. In Fig. 3, we illustrate the sampling as determined by the proposed algorithm in addition to the log error between the exact pdf and the GPR mean prediction. In these simulations, we start from a dataset of 6 points selected according to an LH design. To capture the main mass of the pdf, before focusing on the tails of the distribution, we perform 12 iterations using the distance in Eq. before moving on to the criterion using the distance metric in Eq. involving the logarithms of the pdfs. Observe the unique shape that the sampling algorithm has identified in space, which spans regions in associated with finite probability and large values of . Fig. 4 demonstrates the progression of the estimated pdf as a function of the iteration count. Even after only 100 samples, we have already captured the qualitative features of the exact pdf and have very good quantitative agreement.
Fig. 3.

(Bottom) A scatterplot of the algorithm sampling of the parameter space (green points denote the initial random LH samples). (Top Left) The corresponding error of the logarithmic transform of the pdf between the GPR mean and truth, and (Top Right) value of the criterion in 5 as a function of the iteration number.

Fig. 4.

Progression of the pdf estimation as we iteratively sample points. We shade the region between the pdfs in red, as a visualization of the convergence of the pdfs . Dashed vertical lines denote 1 SD.

(Bottom) A scatterplot of the algorithm sampling of the parameter space (green points denote the initial random LH samples). (Top Left) The corresponding error of the logarithmic transform of the pdf between the GPR mean and truth, and (Top Right) value of the criterion in 5 as a function of the iteration number. Progression of the pdf estimation as we iteratively sample points. We shade the region between the pdfs in red, as a visualization of the convergence of the pdfs . Dashed vertical lines denote 1 SD. We have explored the convergence properties of the algorithm, and in Fig. 5, we compare the proposed sampling method to space-filling LH sampling. The LH strategy is not iterative and thus must be started anew, which puts the LH sampling at a large disadvantage. Nonetheless, this serves as a benchmark to a widely used reference method for the design of experiments due to its simplicity. In the figure, the purple curve represents the mean LH design error and the shaded region represents the SD about the mean, which are computed by evaluating 250 number of random LH designs per fixed dataset size. Even considering the variance of the LH curve, the proposed algorithm under various parameters (initial dataset size or number of “core” iterations where the criterion uses the metric) is observed to outperform the LH strategy by nearly an order of magnitude in the error of the logarithm of the pdfs. This demonstrates the favorable properties of the proposed sampling strategy for accurately estimating the tail of target distribution.
Fig. 5.

Comparison of the errors between LH sampling and proposed algorithm under different parameters measured against the exact pdf for the case where in Eq. . The parameter ncore is the number of core iterations performed according to an metric, and nstart is the initial dataset size (where the points are sampled from an LH design).

Comparison of the errors between LH sampling and proposed algorithm under different parameters measured against the exact pdf for the case where in Eq. . The parameter ncore is the number of core iterations performed according to an metric, and nstart is the initial dataset size (where the points are sampled from an LH design).

Hydrodynamic Forces and Moments on an Offshore Platform.

Here we apply the sampling algorithm to compute the probability distributions describing the loads on an offshore platform in irregular seas. The response of the platform is quantified through direct, 3D numerical simulations of Navier–Stokes using the smoothed particle hydrodynamics (SPH) method (15) (Fig. 6). Our numerical setup parallels that of a physical wave tank experiment and consists of a wave maker on one end and a sloping “beach” on the other end of the tank to quickly dissipate the energy of incident waves and avoid wave reflections. Further details regarding the simulations are provided in .
Fig. 6.

SPH simulation at with and .

SPH simulation at with and . Wind-generated ocean waves are empirically described by their energy spectrum. Here, we consider irregular seas with JONSWAP spectral density (see for details and parameters). While realizations of the random waves have the form of time series, an alternative description can be obtained by considering a sequence of primary wave groups, each characterized by a random group length scale and height (see, e.g., ref. 16). This formulation allows us to describe the input space through just two random variables (much fewer than what we would need with a Karhunen–Loeve expansion). Following ref. 16, we describe these primary wave groups by the representation which is an explicit parameterization in terms of and . Thus, and correspond to and in the notation of Eq. . The statistical characteristics of the wave groups associated with a random wave field (such as the one given by the JONSWAP spectrum) can be obtained by applying the scale-selection algorithm described in ref. 17. Specifically, by generating many realizations consistent with the used spectrum, we use a group detection algorithm to identify coherent group structures in the field along with their length scale and amplitude ( and ). This procedure provides us with the empirical probability distribution of the wave field and thus a nonlinear parametrization of the randomness in the input process. The quantities of interest in this problem are the forces and moments acting on the platform. The incident wave propagates in the direction, and as such, we consider the pdf of the force in the direction and the moment about the bottom center of the platform:In Fig. 7, we show the results of the progression of density prediction for the force variable. In these experiments, we begin by arbitrarily selecting four initial sample points from an LH sampling strategy. Next, we perform four iterations using the distance metric to quickly capture the main mass of the distribution before focusing on the distribution away from the mean that uses the metric of the logarithmic of the pdf. The lightly shaded red region in the pdf plots is a visualization of the uncertainty in the pdf, obtained by sampling the GPR prediction and computing the pdf for 200 realizations and then computing the upper (lower) locus of the maximum (minimum) value of the pdf at each value. The figures demonstrate that with 15 (i.e., 14 total sample points) iterations (together with the four samples in the initial configuration), we are able to approximate the pdf to good agreement with the “exact” pdf, which was computed from a densely sampled grid. In , we also present the sampled map where it can be seen that the algorithm selects points associated with large forces and nonnegligible probability of occurrence. In the same figures, results for the momentum and an additional spectrum are included. Note that for this problem, the GPR operates on the logarithm of the observable because the underlying function is always positive.
Fig. 7.

Progression for pdf density prediction for the force variable.

Progression for pdf density prediction for the force variable.

Conclusions

We developed and studied a computational algorithm for the evaluation of extreme event statistics associated with systems that depend on a set of random parameters. The algorithm is practical for high-dimensional systems but with parameter spaces of moderate dimensionality and where each sample is very expensive to obtain relative to the optimization of the next sample. The method provides a sequence of points that lead to improved estimates of the probability distribution for a scalar quantity of interest. The criterion for the selection of the next design point emphasizes the tail statistics. We prove asymptotic convergence of the algorithm and provided analysis for its asymptotic behavior. We also demonstrated its applicability to two problems, one involving a demanding system with millions degrees of freedom.
  4 in total

1.  Unsteady evolution of localized unidirectional deep-water wave groups.

Authors:  Will Cousins; Themistoklis P Sapsis
Journal:  Phys Rev E Stat Nonlin Soft Matter Phys       Date:  2015-06-15

2.  Simple stochastic dynamical models capturing the statistical diversity of El Niño Southern Oscillation.

Authors:  Nan Chen; Andrew J Majda
Journal:  Proc Natl Acad Sci U S A       Date:  2017-01-30       Impact factor: 11.205

3.  Extremes in dynamic-stochastic systems.

Authors:  Christian L E Franzke
Journal:  Chaos       Date:  2017-01       Impact factor: 3.642

4.  Rogue waves and large deviations in deep sea.

Authors:  Giovanni Dematteis; Tobias Grafke; Eric Vanden-Eijnden
Journal:  Proc Natl Acad Sci U S A       Date:  2018-01-16       Impact factor: 11.205

  4 in total
  5 in total

1.  Output-weighted optimal sampling for Bayesian regression and rare event statistics using few samples.

Authors:  Themistoklis P Sapsis
Journal:  Proc Math Phys Eng Sci       Date:  2020-02-19       Impact factor: 2.704

2.  Harnessing fluctuation theorems to discover free energy and dissipation potentials from non-equilibrium data.

Authors:  Shenglin Huang; Chuanpeng Sun; Prashant K Purohit; Celia Reina
Journal:  J Mech Phys Solids       Date:  2021-01-22       Impact factor: 5.471

3.  Nonlinear wave evolution with data-driven breaking.

Authors:  D Eeltink; H Branger; C Luneau; Y He; A Chabchoub; J Kasparian; T S van den Bremer; T P Sapsis
Journal:  Nat Commun       Date:  2022-04-29       Impact factor: 17.694

4.  Precluding rare outcomes by predicting their absence.

Authors:  Eric W Schoon; David Melamed; Ronald L Breiger; Eunsung Yoon; Christopher Kleps
Journal:  PLoS One       Date:  2019-10-10       Impact factor: 3.240

5.  Using machine learning to predict extreme events in complex systems.

Authors:  Di Qi; Andrew J Majda
Journal:  Proc Natl Acad Sci U S A       Date:  2019-12-23       Impact factor: 11.205

  5 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.