
Learning data-driven discretizations for partial differential equations.

Yohai Bar-Sinai, Stephan Hoyer, Jason Hickey, Michael P. Brenner.

Abstract

The numerical solution of partial differential equations (PDEs) is challenging because of the need to resolve spatiotemporal features over wide length- and timescales. Often, it is computationally intractable to resolve the finest features in the solution. The only recourse is to use approximate coarse-grained representations, which aim to accurately represent long-wavelength dynamics while properly accounting for unresolved small-scale physics. Deriving such coarse-grained equations is notoriously difficult and often ad hoc. Here we introduce data-driven discretization, a method for learning optimized approximations to PDEs based on actual solutions to the known underlying equations. Our approach uses neural networks to estimate spatial derivatives, which are optimized end to end to best satisfy the equations on a low-resolution grid. The resulting numerical methods are remarkably accurate, allowing us to integrate in time a collection of nonlinear equations in 1 spatial dimension at resolutions 4× to 8× coarser than is possible with standard finite-difference methods.
Copyright © 2019 the Author(s). Published by PNAS.


Keywords:  coarse graining; computational physics; machine learning

Year:  2019        PMID: 31311866      PMCID: PMC6681734          DOI: 10.1073/pnas.1814058116

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


Solutions of nonlinear partial differential equations can have enormous complexity, with nontrivial structure over a large range of length- and timescales. Developing effective theories that integrate out short lengthscales and fast timescales is a long-standing goal. As examples, geometric optics is an effective theory of Maxwell equations at scales much longer than the wavelength of light (1); density functional theory models the full many-body quantum wavefunction with a lower-dimensional object—the electron density field (2); and the effective viscosity of a turbulent fluid parameterizes how small-scale features affect large-scale behavior (3). These models derive their coarse-grained dynamics by more or less systematic integration of the underlying governing equations (by using, respectively, WKB theory, local density approximation, and a closure relation for the Reynolds stress). The gains from coarse graining are, of course, enormous. Conceptually, it allows a deep understanding of emergent phenomena that would otherwise be masked by irrelevant details. Practically, it allows computation of vastly larger systems. Averaging out unresolved degrees of freedom invariably replaces them by effective parameters that mimic typical behavior. In other words, we identify the salient features of the dynamics at short and fast scales and replace these with terms that have a similar average effect on the long and slow scales. Deriving reliable effective equations is often challenging (4). Here we approach this challenge from the perspective of statistical inference. The coarse-grained representation of the function contains only partial information about it, since short scales are not modeled. Deriving coarse-grained dynamics requires first inferring the small-scale structure using the partial information (reconstruction) and then incorporating its effect on the coarse-grained field. 
We propose to perform reconstruction using machine-learning algorithms, which have become extraordinarily efficient at identifying and reconstructing recurrent patterns in data. Having reconstructed the fine features, modeling their effect can be done using our physical knowledge about the system. We call our method data-driven discretization. It is qualitatively different from coarse-graining techniques that are currently in use: Instead of analyzing equations of motion to derive effective behavior, we directly learn from high-resolution solutions to these equations.

Related Work

Several related approaches for computationally extracting effective dynamics have been previously introduced. Classic works used neural networks for discretizing dynamical systems (5, 6). Similarly, equation-free modeling approximates coarse-scale derivatives by remapping coarse initial conditions to fine scales, which are integrated exactly (7). The method is similar in spirit to our approach, but it does not learn from the fine-scale dynamics and reuse the learned statistics at subsequent times to reduce the computational load. Recent works have applied machine learning to partial differential equations (PDEs), focusing either on speed (8–10) or on recovering unknown dynamics (11, 12). Models focused on speed often replace the slowest component of a physical model with machine learning, e.g., the solution of Poisson's equation in incompressible fluid simulations (9) or subgrid cloud models in climate simulations (10), or build reduced-order models that approximate dynamics in a lower-dimensional space (8, 13, 14). These approaches are promising but learn higher-level components than our proposed method does. An important development is the ability to satisfy some physical constraints exactly by plugging learned models into a fixed equation of motion. For example, valid fluid dynamics can be guaranteed by learning either velocity fields directly (12) or a vector potential for the velocity in the case of incompressible dynamics (8). Closely related to this work, neural networks can be used to calculate closure conditions for coarse-grained turbulent flow models (15, 16). However, these models rely on existing coarse-grained schemes specific to turbulent flows and do not discretize the equations directly. Finally, ref. 17 suggested discretizations whose solutions can be analytically guaranteed to converge to the center manifold of the governing equation, but not in a data-driven manner.

Data-Driven Subgrid-Scale Modeling

Consider a generic PDE describing the evolution of a continuous field v(x, t),

∂v/∂t = F(t, x, v, ∂v/∂x, ∂²v/∂x², …).

Most PDEs in the exact sciences can be cast in this form, including equations that describe hydrodynamics, electrodynamics, chemical kinetics, and elasticity. A common algorithm for numerically solving such equations is the method of lines (18): Given a spatial discretization x₁, …, x_N, the field is represented by its values at node points, v_i = v(x_i) (finite differences), or by its averages over a grid cell, v_i = (1/Δx) ∫ v(x) dx (finite volumes), where Δx is the spatial resolution (19). The time evolution of the v_i can be computed directly from the PDE by approximating the spatial derivatives at these points. There are various methods for this approximation—polynomial expansion, spectral differentiation, etc.—all yielding formulas resembling

∂ⁿv/∂xⁿ ≈ Σ_i α_i v_i,

where the α_i are precomputed coefficients. For example, the 1-dimensional (1D) finite-difference approximation for ∂v/∂x to first-order accuracy is (v_{i+1} − v_i)/Δx. Standard schemes use one set of precomputed coefficients for all points in space, while more sophisticated methods alternate between different sets of coefficients according to local rules (20, 21). This discretization transforms the PDE into a set of coupled ordinary differential equations for the v_i that can be numerically integrated using standard techniques. The accuracy of the solution depends on Δx, converging to a solution of the continuous equation as Δx → 0. Qualitatively, accuracy requires that Δx be smaller than the spatial scale of the smallest feature of the field v(x, t). However, the scale of the smallest features is often orders of magnitude smaller than the system size. High-performance computing has been driven by the ever-increasing need to accurately resolve smaller-scale features in PDEs. Even with petascale computational resources, the largest direct numerical simulations of turbulent fluid flow ever performed have Reynolds numbers of order 1,000, using hundreds of billions of grid points (22–24).
Simulations at higher Reynolds number require replacing the physical equations with effective equations that model the unresolved physics. These equations are then discretized and solved numerically, e.g., using the method of lines. This overall procedure essentially modifies the discretization above, changing the coefficients α_i to account for the unresolved degrees of freedom and thereby replacing one set of discrete equations with another. The main idea of this work is that unresolved physics can instead be learned directly from data. Instead of deriving an approximate coarse-grained continuum model and discretizing it, we suggest directly learning low-resolution discrete models that encapsulate unresolved physics. Rigorous mathematical work shows that the dimension of a solution manifold for a nonlinear PDE is finite (25, 26) and that approximate parameterizations can be constructed (27–29). If we knew the solution manifold, we could generate equation-specific approximations for the spatial derivatives, approximations that have the potential to hold even when the system is underresolved. In contrast to standard numerical methods, the coefficients α_i are equation dependent. Different regions in space (e.g., inside and outside a shock) will use different coefficients. To discover these formulas, we use machine learning: We first generate a training set of high-resolution data and then learn the discrete approximations to the derivatives from this dataset. This introduces a tradeoff in computational cost, which can be alleviated by carrying out high-resolution simulations on small systems to develop local approximations to the solution manifold and then using them to solve equations in much larger systems at significantly reduced spatial resolution.
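As a minimal illustration of the method of lines with fixed finite-difference coefficients, the sketch below (plain numpy, not the paper's code; the heat equation and all parameter values are chosen purely for illustration) discretizes space with a standard 3-point stencil and steps the resulting ODEs with forward Euler:

```python
import numpy as np

def second_derivative(v, dx):
    """Centered 3-point finite-difference stencil (alpha = [1, -2, 1] / dx^2),
    assuming periodic boundary conditions."""
    return (np.roll(v, -1) - 2 * v + np.roll(v, 1)) / dx**2

def integrate_heat(v, dx, dt, steps, eta):
    """Method of lines: discretize in space, then step the resulting
    coupled ODEs dv_i/dt = eta * (d^2 v / dx^2)_i with forward Euler."""
    for _ in range(steps):
        v = v + dt * eta * second_derivative(v, dx)
    return v

x = np.linspace(0, 2 * np.pi, 64, endpoint=False)
dx = x[1] - x[0]
v0 = np.sin(x)
v1 = integrate_heat(v0, dx, dt=0.01, steps=100, eta=0.1)
# Diffusion damps the sine mode toward exp(-eta * t) * sin(x) with t = 1.
```

The same skeleton applies to any PDE of the generic form above: only the stencil coefficients and the function F change.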

Burgers’ Equation.

For concreteness, we demonstrate this approach with a specific example in 1 spatial dimension. Burgers' equation is a simple nonlinear equation which models fluid dynamics in 1D and features shock formation. In its conservative form, it is written as

∂v/∂t + ∂J/∂x = f(x, t),   J = v²/2 − η ∂v/∂x,

where η is the viscosity, f(x, t) is an external forcing term, and J is the flux. Generically, solutions of Burgers' equation spontaneously develop sharp shocks, with specific relationships between the shock height, width, and velocity (19) that define the local structure of the solution manifold. With this in mind, consider a typical segment of a solution (Fig. 1). We want to compute the time derivative of the field given a low-resolution set of points (blue diamonds in Fig. 1). Standard finite-difference formulas predict this time derivative by approximating v(x) as a piecewise-polynomial function passing through the given points (orange curves in Fig. 1). But solutions to Burgers' equation are not polynomials: They are shocks with characteristic properties. By using this information, we can derive a more accurate, albeit equation-specific, formula for the spatial derivatives. For the method to work, it must be possible to reconstruct the fine-scale solution from low-resolution data. To this end, we ran many simulations of Burgers' equation and used the resulting data to train a neural network. Fig. 1 compares the predictions of our neural net (details below and in SI Appendix) to fourth-order polynomial interpolation. This learned model is clearly far superior to the polynomial approximation, demonstrating that the spatial resolution required for parameterizing the solution manifold can be greatly reduced with equation-specific approximations rather than finite differences.
Fig. 1.

Polynomial vs. neural net-based interpolation. (A) Interpolation between known points (blue diamonds) on a segment of a typical solution of Burgers’ equation. Polynomial interpolation exhibits spurious “overshoots” in the vicinity of shock fronts. These errors compound when integrated in time, such that a naive finite-difference method at this resolution quickly diverges. In contrast, the neural network interpolation is so close to the exact solution that it cannot be visually distinguished. (B) Histogram of exact vs. interpolated function values over our full validation dataset. The neural network vastly reduces the number of poor predictions. (C) Absolute error vs. local curvature. The thick line shows the median and the shaded region shows the central 90% of the distribution over the validation set. The neural network makes much smaller errors in regions of high curvature, which correspond to shocks.


Models for Time Integration

The natural question to ask next is whether such parameterizations can be used for time integration. For this to work well, integration in time must be numerically stable, and our models need strong generalization capacity: Even a single error could throw off the solution at later times. To achieve this, we use multilayer neural networks to parameterize the solution manifold, because of their flexibility, including the ability to impose physical constraints and to aid interpretability through the choice of model architecture. The high-level aspects of the network's design, which we believe are of general interest, are described below. Additional technical details are described in SI Appendix, and source code is available online at https://github.com/google/data-driven-discretization-1d.

Pseudolinear Representation.

Our network represents spatial derivatives with a generalized finite-difference formula: The output of the network is a list of coefficients α₁, …, α_N such that the nth derivative is expressed as the pseudolinear filter ∂ⁿv/∂xⁿ ≈ Σ_i α_i v_i, where the coefficients α_i depend on space and time through their dependence on the field values in the neighboring cells. Finding the optimal coefficients is the crux of our method. The pseudolinear representation is a direct generalization of the standard finite-difference scheme. Moreover, exactly as in the standard case, a Taylor expansion allows us to guarantee formal polynomial accuracy. That is, we can impose that approximation errors decay as O(Δxᵐ) for some order m by layering a fixed affine transformation on the network output (SI Appendix). We found the best results when imposing linear accuracy (m = 1) with a 6-point stencil, which we used for all results shown here. Finally, we note that this pseudolinear form is also a generalization of the popular essentially nonoscillatory (ENO) and weighted ENO (WENO) methods (20, 21), which choose a local linear filter (or a combination of filters) from a precomputed list according to an estimate of the solution's local curvature. WENO is an efficient, human-understandable way of adaptively choosing filters, inspired by nonlinear approximation theory. We improve on WENO by replacing its heuristics with directly optimized quantities.
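One way to realize such a fixed affine transformation is to write the coefficients as a particular solution of the Taylor-expansion constraints plus an arbitrary element of their null space. The numpy sketch below (an illustration of the idea, not the paper's implementation) does this for a first-derivative, 6-point stencil:

```python
import numpy as np

dx = 0.1
offsets = dx * (np.arange(6) - 2.5)   # 6-point stencil, centered on a boundary

# Taylor-expansion constraints for a first-order-accurate first derivative:
#   sum_i a_i * 1   = 0   (constants are annihilated)
#   sum_i a_i * x_i = 1   (linear functions are differentiated exactly)
A = np.stack([np.ones(6), offsets])
b = np.array([0.0, 1.0])

a0 = np.linalg.lstsq(A, b, rcond=None)[0]  # one particular solution
_, _, Vt = np.linalg.svd(A)
N = Vt[2:].T                               # basis of the 4-dim null space of A

def constrained_coeffs(z):
    """Affine map from an unconstrained network output z (4 numbers) to
    stencil coefficients that satisfy the accuracy constraints exactly."""
    return a0 + N @ z

# Any z yields a stencil that differentiates linear functions exactly.
rng = np.random.default_rng(0)
a = constrained_coeffs(rng.standard_normal(4))
v_linear = 3.0 + 2.0 * offsets   # v(x) = 3 + 2x sampled on the stencil
deriv = a @ v_linear             # estimate of dv/dx, exactly 2
```

The remaining degrees of freedom (z) are what the network learns; the constraints hold by construction for every network output.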

Physical Constraints.

Since Burgers' equation is an instance of the continuity equation, as with traditional methods, a major increase in stability is obtained by using a finite-volume scheme, ensuring the coarse-grained solution satisfies the conservation law implied by the continuity equation. That is, coarse-grained equations are derived for the cell averages of the field, rather than for its nodal values (19). During training, we provide the cell average to the network as the "true" value of the discretized field. Integrating the conservative form of the equation over a cell shows that the rate of change of the cell averages is completely determined by the fluxes at the cell boundaries. This is an exact relation, in which the only challenge is estimating the flux given the cell averages. Thus, prediction is carried out in 3 steps: First, the network reconstructs the spatial derivatives on the boundaries between grid cells (a staggered grid). Then, the approximated derivatives are used to calculate the flux J from its exact formula. Finally, the time derivative of each cell average is obtained by subtracting the flux J at the cell's left and right boundaries. The calculation of the time derivative from the flux can also be done using traditional techniques that promote stability, such as monotone numerical fluxes (19). For some experiments we use Godunov flux, inspired by finite-volume ENO schemes (20, 21), although it did not improve predictions for our neural network models. Dividing the inference procedure into these steps is favorable in several respects: First, it allows us to constrain the model at the various stages using traditional techniques; the conservative constraint, numerical flux, and formal polynomial accuracy constraints are what we use here, but other constraints are also conceivable. Second, this scheme limits the machine-learning part to reconstructing the unknown solution at cell boundaries, which is the main conceptual challenge, while the rest of the scheme follows either the exact dynamics or traditional approximations to them. Third, it makes the trained model more interpretable, since the intermediate outputs (e.g., the reconstructed derivatives and fluxes) have clear physical meaning. Finally, these physical constraints contribute to more accurate and stable models, as detailed in the ablation study in SI Appendix.
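The 3-step prediction can be sketched as follows (numpy; a simple centered reconstruction stands in for the neural network, and all parameter values are illustrative):

```python
import numpy as np

def time_derivative(v_bar, dx, eta=0.01):
    """Finite-volume time derivative for Burgers' equation with flux
    J = v^2/2 - eta * dv/dx, on a periodic grid of cell averages v_bar."""
    # 1) Reconstruct v and dv/dx on the staggered grid of cell boundaries
    #    (here: a centered average/difference standing in for the network).
    v_edge = 0.5 * (v_bar + np.roll(v_bar, -1))       # boundary i + 1/2
    dvdx_edge = (np.roll(v_bar, -1) - v_bar) / dx
    # 2) Evaluate the exact flux formula at each boundary.
    J = 0.5 * v_edge**2 - eta * dvdx_edge
    # 3) Conservative update: difference the flux across each cell.
    return -(J - np.roll(J, 1)) / dx

x = np.linspace(0, 2 * np.pi, 128, endpoint=False)
dx = x[1] - x[0]
v_bar = np.sin(x)
dvdt = time_derivative(v_bar, dx)
# The flux differences telescope, so total "mass" sum(v_bar) is conserved
# exactly, regardless of how the boundary values are reconstructed.
```

Only step 1 is learned; steps 2 and 3 enforce the exact conservation structure of the equation.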

Choice of Loss.

The loss of a neural net is the objective function minimized during training. Rather than optimizing the prediction accuracy of the spatial derivatives, we optimize the accuracy of the resulting time derivative. This allows us to incorporate physical constraints in the training procedure and to directly optimize the final predictions rather than intermediate stages. Our loss is the mean-squared error between the predicted time derivative and labeled data produced by coarse graining fully resolved simulations. Note that a low value of the training loss is a necessary but not sufficient condition for accurate and stable numerical integration over time. Many models with low training loss exhibited poor stability when numerically integrated (e.g., without the conservative constraint), particularly for equations with low dissipation. From a machine-learning perspective, this is unsurprising: Imitation-learning approaches such as ours often exhibit such issues, because the distribution of inputs produced by the model's own predictions can differ from the training data (30). Incorporating the time-integrated solution into the loss improved predictions in some cases (as in ref. 9) but did not guarantee stability, and it could cause the training procedure itself to diverge due to decreased stability in calculating the loss. Stability for learned numerical methods remains an important area of exploration for future work.
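A minimal sketch of such a loss (numpy; the function names and the resample factor of 8 are hypothetical illustration choices):

```python
import numpy as np

def coarse_grain(v_fine, factor):
    """Cell-average a fully resolved field onto a grid `factor` times coarser."""
    return v_fine.reshape(-1, factor).mean(axis=1)

def loss(dvdt_pred, dvdt_fine, factor):
    """Mean-squared error between the model's predicted time derivative on
    the coarse grid and the coarse-grained time derivative of a resolved run."""
    dvdt_label = coarse_grain(dvdt_fine, factor)
    return np.mean((dvdt_pred - dvdt_label) ** 2)

# A perfect prediction has zero loss; a biased one does not.
dvdt_fine = np.cos(np.linspace(0, 2 * np.pi, 256, endpoint=False))
perfect = coarse_grain(dvdt_fine, 8)
value_perfect = loss(perfect, dvdt_fine, 8)
value_biased = loss(perfect + 0.1, dvdt_fine, 8)
```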

Learned Coefficients.

We consider 2 different parameterizations for the learned coefficients. In the first, we learn optimized time- and space-independent coefficients. These fixed coefficients minimize the loss when averaged over the whole training set for a particular equation, without allowing the scheme to adapt the coefficients to local features of the solution. Below, we refer to these as "optimized constant coefficients." In the second parameterization, we allow the coefficients to be an arbitrary function of the neighboring field values, implemented as a fully convolutional neural network (31). We use the exact same architecture (3 layers, each with 32 filters, a kernel size of 5, and ReLU nonlinearity) for coarse graining all equations discussed in this work. Example coefficients predicted by our trained models are shown in Fig. 2 and SI Appendix. Both the optimized constant and the data-dependent coefficients differ from baseline polynomial schemes, particularly in the vicinity of the shock. The neural network solutions are particularly interesting: They do not appear to use 1-sided stencils near the shock, in contrast to traditional numerical methods such as WENO (21), which avoid placing large weights across discontinuities.
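The data-dependent parameterization can be sketched as follows (a plain-numpy stand-in with random weights, shown only to make the shapes concrete; the paper's actual model is a trained fully convolutional network, and the 6-coefficient output here assumes the 6-point stencil discussed above):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_periodic(v, kernels, biases):
    """1D convolution with periodic padding.
    v: (n, c_in); kernels: (k, c_in, c_out); biases: (c_out,)."""
    k = kernels.shape[0]
    pad = k // 2
    vp = np.concatenate([v[-pad:], v, v[:pad]])   # periodic padding
    out = np.zeros((v.shape[0], kernels.shape[2]))
    for i in range(v.shape[0]):
        out[i] = np.einsum('kc,kco->o', vp[i:i + k], kernels) + biases
    return out

def coefficient_net(v, params):
    """3 hidden conv layers (32 filters, kernel size 5, ReLU), plus a final
    linear conv layer emitting 6 stencil coefficients per grid point."""
    h = v[:, None]                                # single input channel
    for kernels, biases in params[:-1]:
        h = np.maximum(conv1d_periodic(h, kernels, biases), 0.0)   # ReLU
    kernels, biases = params[-1]
    return conv1d_periodic(h, kernels, biases)    # shape (n, 6)

def init_params():
    shapes = [(5, 1, 32), (5, 32, 32), (5, 32, 32), (5, 32, 6)]
    return [(0.1 * rng.standard_normal(s), np.zeros(s[-1])) for s in shapes]

v = np.sin(np.linspace(0, 2 * np.pi, 64, endpoint=False))
coeffs = coefficient_net(v, init_params())
```

In the full method, these raw outputs would then pass through the fixed affine accuracy constraint before being used as stencil coefficients.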
Fig. 2.

Learned finite-volume coefficients for Burgers' equation. Shown are fixed and spatiotemporally varying finite-volume coefficients for the first spatial derivative. (A) Various centered and 1-sided polynomial finite-volume coefficients, along with optimized constant coefficients trained on this dataset (16× resample factor in Fig. 3). The vertical scale, which is the same for all coefficient plots, is omitted for clarity. (B) An example temporal snapshot of a solution to Burgers' equation, along with data-dependent coefficients produced by our neural network model at each of the indicated positions on cell boundaries. The continuous solution is plotted as a dashed line, and the discrete cell-averaged representation is plotted as a piecewise-constant solid line. The optimized constant coefficients are most similar to the neural network's coefficients at the shock position. Away from the shock, the solution resembles centered polynomial coefficients. (B, Inset) Relative probability density of the neural network coefficients' "center of mass" vs. field value across our full test dataset. The center of mass is calculated by averaging the positions of the stencil elements, weighted by the absolute values of the coefficients.

Fig. 3.

Time integration results for Burgers' equation. (A) A particular realization of a solution at varying resolution, solved by the baseline first-order finite-volume method, WENO, optimized constant coefficients with Godunov flux (Opt. God.), and the neural network (NN), with the white region indicating times when the solution diverged. Both learned methods manifestly outperform the baseline method and even outperform WENO at coarse resolutions. (B) Inference predictions for the 32× neural network model on a 10-times larger spatial domain (only partially shown). The box surrounded by the dashed line shows the spatial domain used for training. (C) Mean absolute error between integrated solutions and the ground truth, averaged over space, times less than 15, and 10 forcing realizations on the 10-times larger inference domain. These metrics almost exactly match results on the smaller training domain (SI Appendix). As ground truth, we use WENO simulations on a 1× grid. Markers are omitted if some simulations diverged or if the average error is worse than fixing v = 0.

The output coefficients can also be interpreted physically. For example, the coefficients for both the first derivative (Fig. 2, Inset) and the second derivative (SI Appendix) are either right- or left-biased, opposite to the sign of the field v. This is in line with our physical intuition: Burgers' equation describes fluid flow, and the sign of v corresponds to the direction of flow. Coefficients that are biased in the direction opposite to v essentially look "upwind," a standard strategy in traditional numerical methods for solving hyperbolic PDEs (19), which helps keep the scheme from violating temporal causality. Alternatively, upwinding can be built into the model structure by construction, as we do in models which use Godunov flux.
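For reference, the Godunov flux for the inviscid Burgers flux f(v) = v²/2 has a simple closed form (a standard textbook formula for convex fluxes, sketched here in numpy):

```python
import numpy as np

def godunov_flux_burgers(v_left, v_right):
    """Godunov numerical flux for f(v) = v^2 / 2 at a cell boundary, given
    the reconstructed states on either side. Upwinding falls out
    automatically: information is taken from the side the flow comes from."""
    return np.maximum(np.maximum(v_left, 0.0) ** 2,
                      np.minimum(v_right, 0.0) ** 2) / 2.0

shock_flux = godunov_flux_burgers(2.0, 1.0)        # right-moving shock: f(v_left) = 2
left_shock_flux = godunov_flux_burgers(-1.0, -2.0) # left-moving shock: f(v_right) = 2
rarefaction_flux = godunov_flux_burgers(-1.0, 1.0) # transonic rarefaction: f(0) = 0
```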

Results

To assess the accuracy of the time integration from our coarse-grained model, we computed "exact" solutions to Burgers' equation for different realizations of the forcing f(x, t) at high enough resolution to ensure mesh convergence. These realizations of the forcing were drawn from the same distribution as those used for training but were not in the training set. Then, for the same realization of the forcing, we solved the equation at a lower, possibly underresolved, resolution using 4 different methods for calculating the flux: 1) a standard finite-volume scheme with either first-order or third-order accuracy; 2) a fifth-order upwind-biased WENO scheme with Godunov flux (21); 3) spatial derivatives estimated by optimized constant coefficients, with and without Godunov flux; and 4) spatial derivatives estimated by the space- and time-dependent coefficients, computed with a neural net. Results are shown in Fig. 3. Fig. 3A compares the integration results for a particular realization of the forcing for different values of the resample factor, that is, the ratio between the number of grid points in the fully converged solution and that in the low-resolution calculation. Our learned models, with both constant and solution-dependent coefficients, can propagate the solution in time and dramatically outperform the baseline method at low resolution. Importantly, the ringing effect around the shocks, which leads to numerical instabilities, is practically eliminated.
Since our model is trained on fully resolved simulations, a crucial requirement for our method to be of practical use is that training can be done on small systems but still produce models that perform well on larger ones. We expect this to be the case, since our models, being based on convolutional neural networks, use only local features and are translation invariant by construction. Fig. 3B illustrates the performance of our model, trained on a small domain, for predictions on a 10-times larger spatial domain. The learned model generalizes well. For example, it shows good performance when function values are all positive over a region larger than the training domain, which due to the conservation law cannot occur in the training dataset. To make this assessment quantitative, we averaged over many realizations of the forcing and calculated the mean absolute error integrated over time and space. Results on the 10-times larger inference domain are shown in Fig. 3C: The solution from the full neural network has accuracy equivalent to increasing the resolution for the baseline by a factor of about 8×. Interestingly, even the simpler constant-coefficient method significantly outperforms the baseline scheme. The constant-coefficient model with Godunov flux is particularly compelling.
This model is faster than WENO because there is no need to calculate coefficients on the fly, and it has comparable accuracy and better numerical stability at coarse resolution, as shown in Figs. 3 and 4.
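The error metric used in these comparisons can be sketched as follows (numpy; names hypothetical):

```python
import numpy as np

def mean_absolute_error(v_coarse, v_fine, factor):
    """MAE between a coarse-grid trajectory (num_times, n) and the
    cell-averaged fine-grid ground truth (num_times, n * factor),
    averaged over space and time."""
    truth = v_fine.reshape(v_fine.shape[0], -1, factor).mean(axis=2)
    return np.mean(np.abs(v_coarse - truth))

# Tiny worked example: the fine grid [0, 1, ..., 7] cell-averages (factor 4)
# to [1.5, 5.5], so a matching coarse trajectory has zero error.
v_fine = np.arange(8.0).reshape(1, 8)
err_exact = mean_absolute_error(np.array([[1.5, 5.5]]), v_fine, 4)
err_off = mean_absolute_error(np.array([[2.5, 5.5]]), v_fine, 4)
```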
Fig. 4.

Model performance across all of our test equations. Each plot shows the median time for which an integrated solution remains "valid" for each equation, defined by the absolute error on at least 80% of grid points being less than the 20th percentile of the absolute error from predicting all 0s. These thresholds were chosen so that "valid" corresponds to a relatively generous definition of an approximately correct solution. Error bars show the 95% confidence interval for the median across 100 simulations for each equation, determined by bootstrap resampling. Simulations for each equation were run out to a maximum time of 100.

These calculations demonstrate that neural networks can carry out coarse graining. Even if the mesh spacing is much larger than the shock width, the model is still able to accurately propagate dynamics over time, showing that it has learned an internal representation of the shock structure.
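The "valid time" criterion from the Fig. 4 caption can be sketched as follows (numpy; the function name and the toy arrays are hypothetical, and the percentile is taken over the whole reference trajectory as one plausible reading of the definition):

```python
import numpy as np

def valid_simulation_time(v_pred, v_true, times):
    """First time at which a solution stops being 'valid': a time step is
    valid when at least 80% of grid points have absolute error below the
    20th percentile of the error from predicting all 0s (i.e., |v_true|).
    v_pred, v_true: arrays of shape (num_times, num_points)."""
    threshold = np.percentile(np.abs(v_true), 20)
    point_ok = np.abs(v_pred - v_true) < threshold
    valid = point_ok.mean(axis=1) >= 0.8          # fraction of valid points
    invalid = np.nonzero(~valid)[0]
    return times[invalid[0]] if invalid.size else times[-1]

# Toy check: a prediction that tracks the truth until t = 5, then diverges.
times = np.arange(10.0)
v_true = np.ones((10, 50))
v_pred = np.ones((10, 50))
v_pred[5:] = 3.0
t_valid = valid_simulation_time(v_pred, v_true, times)
```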

Other Examples.

To demonstrate the robustness of this method, we repeated the procedure for 2 other canonical PDEs: the Korteweg–de Vries (KdV) equation (32), which was first derived to model solitary waves on a river bore and is known for being completely integrable and for featuring soliton solutions, and the Kuramoto–Sivashinsky (KS) equation, which models flame fronts and is a textbook example of a classically chaotic PDE (33). All details about these equations are given in SI Appendix. We repeated the training procedure outlined above for these equations, running high-resolution simulations and collecting data to train equation-specific estimators of the spatial derivatives based on a coarse grid. These equations are essentially nondissipative, so we do not include a forcing term. The solution manifold is explored by changing the initial conditions, which are taken to be superpositions of long-wavelength sinusoidal functions with random amplitudes and phases (see SI Appendix for details). To assess the accuracy of the integrated solution, for each initial condition we define the "valid simulation time" as the first time that the low-resolution integrated solution deviates from the cell-averaged high-resolution solution by more than a given threshold. We found this metric more informative than absolute error for comparisons across very different equations. Fig. 4 shows the median valid simulation time as a function of the resample factor. For all equations and resolutions, our neural network models have comparable or better performance than all other methods. The neural network is particularly advantageous at low resolutions, demonstrating its improved ability to solve coarse-grained dynamics. The optimized constant coefficients perform better than baseline methods at coarse resolution, but not always at high resolutions. Finally, at large enough resample factors, the neural network approximations also fail to reproduce the dynamics, as expected.
These results also hold on a 10-times larger spatial domain, as shown in SI Appendix, along with figures illustrating specific realizations and mean absolute errors.

Discussion and Conclusion

It has long been remarked that even simple nonlinear PDEs can generate solutions of great complexity. But even very complex, possibly chaotic, solutions are not just arbitrary functions: They are highly constrained by the equations they solve. In mathematical terms, although the solution set of a PDE is nominally infinite dimensional, the inertial manifold of solutions is much smaller and can be understood in terms of interactions between local features of the solutions to nonlinear PDEs. The dynamical rules for interactions between these features have been well studied over the past 50 years. Examples include, among many others, interactions of shocks in complex media, interactions of solitons (32), and the turbulent energy cascade (34). Machine learning offers a different approach for modeling these phenomena, by using training data to parameterize the inertial manifold itself; said differently, it learns both the features and their interactions from experience of the solutions. Here we propose a simple algorithm for achieving this, motivated by coarse graining in physical systems. It is often the case that coarse graining a PDE amounts to modifying the weights in a discretized numerical scheme. Instead, we use known solutions to learn these weights directly, generating data-driven discretizations. This effectively parameterizes the solution manifold of the PDE, allowing the equation to be solved at high accuracy with an unprecedentedly low resolution. Faced with this success, it is tempting to try to leverage the understanding the neural network has developed to gain new insights about the equation or its coarse-grained representation. Indeed, in Fig. 2 we could clearly interpret the directionality of the weights as an upwind bias, with the pseudolinear representation providing a physically sensible interpretation of the prediction.
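In this picture, a discretized spatial derivative is a weighted sum over a local stencil, and data-driven discretization replaces the fixed polynomial-interpolation weights with learned, solution-dependent ones. A minimal sketch of this pseudolinear form, with fixed coefficients standing in for a trained network's per-point output (names are illustrative):

```python
import numpy as np

def derivative_from_stencil(u, dx, coeffs):
    """Pseudolinear derivative estimate: du/dx at point i is a weighted
    sum of neighboring grid values.  In data-driven discretization a
    neural network would predict `coeffs` from the local solution; here
    they are fixed for illustration."""
    offset = len(coeffs) // 2
    padded = np.pad(u, offset, mode="wrap")     # periodic boundaries
    n = len(u)
    return sum(c * padded[k:k + n] for k, c in enumerate(coeffs)) / dx

# The standard 3-point centered difference is the special case [-1/2, 0, 1/2].
x = np.linspace(0, 2 * np.pi, 128, endpoint=False)
dx = x[1] - x[0]
du = derivative_from_stencil(np.sin(x), dx, [-0.5, 0.0, 0.5])  # ~ cos(x)
```

An upwind-biased scheme corresponds to asymmetric coefficients, which is exactly the structure that becomes interpretable in the learned weights of Fig. 2.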
However, extracting more abstract insight from the network, such as the scaling relation between the shock height and width, is a difficult challenge. This is a general problem in the field of machine learning and an area of intensive current research (35, 36). Our results are promising, but 2 challenges remain before our approach can be deployed at large scales. The first challenge is speed. We showed that optimized constant coefficients can already improve accuracy, but our best models rely on the flexibility of neural networks. Unfortunately, our neural nets use many more convolution operations than the single convolution required to implement finite differences, e.g., convolutions with a 5-point stencil between our second and third layers. We suspect that other machine-learning approaches could be dramatically faster. For example, recent work on a related problem—inferring subpixel resolution from natural images—has shown that banks of pretrained linear filters can nearly match the accuracy of neural nets with orders-of-magnitude better performance (37, 38). The basic idea is to divide input images into local patches, classify patches into classes based on fixed properties (e.g., curvature and orientation), and learn a single optimal linear filter for each class. Such computational architectures would also facilitate extracting physical insights from trained filters. A second challenge is scaling to higher-dimensional problems and more complex grids. Here we showcased the approach for regular grids in 1D, but most problems in the real world are higher dimensional, and irregular and adaptive grids are common. We do expect larger potential gains in 2D and 3D, as the computational gain in terms of the number of grid points would scale like the square or the cube of the resample factor.
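The patch-classification idea can be sketched in 1D as follows. The features, class layout, and filters here are illustrative assumptions, not the cited methods' actual design:

```python
import numpy as np

def classify_patch(patch):
    """Bucket a local patch by simple geometric features (here, the signs
    of an estimated slope and curvature), mimicking the hashing step of
    filter-bank super-resolution methods.  Features are illustrative."""
    slope = patch[-1] - patch[0]
    curvature = patch[0] - 2 * patch[len(patch) // 2] + patch[-1]
    return (slope > 0, curvature > 0)

def apply_filter_bank(u, filters, patch_size=5):
    """Slide over the grid, classify each local patch, and apply that
    class's pretrained linear filter: a single dot product per point."""
    offset = patch_size // 2
    padded = np.pad(u, offset, mode="wrap")     # periodic boundaries
    out = np.empty_like(u)
    for i in range(len(u)):
        patch = padded[i:i + patch_size]
        out[i] = np.dot(filters[classify_patch(patch)], patch)
    return out
```

Because each class's filter is fixed after training, inference reduces to a table lookup and one dot product per grid point, which is the source of the speed advantage over a multilayer network.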
Irregular grids may be more challenging, but deep learning methods that respect appropriate invariants have been developed both for arbitrary graphs (39) and for collections of points in 3D space (40). Similar to what we found here, we expect that hand-tuned heuristics for both gridding and grid coefficients could be improved upon by systematic machine learning. More broadly, data-driven discretization suggests the potential of data-driven numerical methods, combining the optimized approximations of machine learning with the generalization of physical laws.
