We present a probabilistic seismic point source inversion, taking into account 3-D heterogeneous Earth structure. Our method rests on (1) reciprocity and numerical wavefield simulations in complex media and (2) Hamiltonian Monte Carlo sampling that requires only a small amount of test models to provide reliable uncertainty information on the timing, location, and mechanism of the source. Using spectral element simulations of 3-D, viscoelastic, anisotropic wave propagation, we precompute receiver side strain tensors in time and space. This enables the fast computation of synthetic seismograms for any hypothetical source within the volume of interest, and thus a Bayesian solution of the inverse problem. To improve efficiency, we developed a variant of Hamiltonian Monte Carlo sampling. Taking advantage of easily computable derivatives, numerical examples indicate that Hamiltonian Monte Carlo can converge to the posterior probability density with orders of magnitude less samples than the derivative-free Metropolis-Hastings algorithm, which we use for benchmarking. Exact numbers depend on observational errors and the quality of the prior. We apply our method to the Japanese Islands region where we previously constrained 3-D structure of the crust and upper mantle using full-waveform inversion with a minimum period of 15 s.
We present a probabilistic seismic point source inversion, taking into account 3-D heterogeneous Earth structure. Our method rests on (1) reciprocity and numerical wavefield simulations in complex media and (2) Hamiltonian Monte Carlo sampling that requires only a small amount of test models to provide reliable uncertainty information on the timing, location, and mechanism of the source. Using spectral element simulations of 3-D, viscoelastic, anisotropic wave propagation, we precompute receiver side strain tensors in time and space. This enables the fast computation of synthetic seismograms for any hypothetical source within the volume of interest, and thus a Bayesian solution of the inverse problem. To improve efficiency, we developed a variant of Hamiltonian Monte Carlo sampling. Taking advantage of easily computable derivatives, numerical examples indicate that Hamiltonian Monte Carlo can converge to the posterior probability density with orders of magnitude less samples than the derivative-free Metropolis-Hastings algorithm, which we use for benchmarking. Exact numbers depend on observational errors and the quality of the prior. We apply our method to the Japanese Islands region where we previously constrained 3-D structure of the crust and upper mantle using full-waveform inversion with a minimum period of 15 s.
Entities:
Keywords:
Bayesian inference; Monte Carlo methods; inverse theory; moment tensor; seismic source; seismology
The characterization of effectively point‐localized wavefield sources has been a prime objective of seismology ever since the first installations of sufficiently dense seismometer networks. Knowing the location, timing, moment tensor, and possibly the source time function of earthquakes is critical for a wide range of related fields and applications. These include seismotectonics, earthquake physics, seismic hazard analysis, early warning, nuclear monitoring, and seismic tomography.
Developments and Challenges in Seismic Source Inversion
Pioneering work on the mathematical description of earthquake sources (e.g., Backus & Mulcahy, 1976a, 1976b; Burridge & Knopoff, 1964; Knopoff & Randall, 1970) was accompanied by the development of practical methods to constrain their properties (e.g., Aki & Patton, 1978; Buland & Gilbert, 1976; Dziewoński & Gilbert, 1974; Dziewoński & Woodhouse, 1981; Dziewoński et al., 1981; Gilbert, 1973; Randall & Knopoff, 1970). As a result, routinely delivered source locations and moment tensors have become an indispensable and routinely employed element of geophysical research.Despite the conceptual simplicity of the seismic source inverse problem, methods for the characterization of point‐localized sources continue to proliferate, focusing on near‐real‐time operation and automation (e.g., Bernardi et al., 2004; Cesca et al., 2010; Käufl et al., 2015; Scognamiglio et al., 2009; Vackár et al., 2017), uncertainty quantification (e.g., Mustać & Tkalčić, 2016; Silwal & Tape, 2016; Staehler & Sigloch, 2014, 2017; Valentine & Trampert, 2012; Wéber, 2006), the incorporation of 3‐D structural models (e.g., Chen et al., 2007; Fichtner & Tkalcic, 2010; Hejrani et al., 2017; Hingee et al., 2011; Hsieh et al., 2014; Liu et al., 2004; Zhu & Zhou, 2016), and the use of new types of seismological observables (e.g., Donner et al., 2016; O'Toole et al., 2012).These continued developments also reflect fundamental difficulties in seismic source inversion. They include trade‐offs between source parameters and 3‐D Earth structure, and the existence of a null space, which often contains, for instance, the trace of the moment tensor (e.g., Dufumier & Rivera, 1997). The unavoidable null space commands the use of Monte Carlo methods to sample the space of acceptable solutions and to avoid results that are biased by regularization (e.g., Mosegaard & Tarantola, 1995; Sambridge & Mosegaard, 2002).Unfortunately, Monte Carlo methods for seismic source inversion suffer particularly strongly from the curse of dimensionality (e.g., Tarantola, 2005) for two reasons: (1) Since source parameters, such as moment tensor components, may vary over several orders of magnitude, the prior in model space tends to be very wide. In other words, the space (or number) of models that are a priori plausible is very large. (2) Since the observational errors in today's seismic data are often very small, the posterior in model space is rather concentrated, meaning that few models are actually plausible when tested against high‐quality data.As a consequence, many samples that are tested for being a priori plausible may in fact be wasted because they fail to produce synthetic seismograms that explain observed seismograms to within the small measurement uncertainties. This facet of the curse of dimensionality severely reduces the efficiency of Monte Carlo methods that strongly rely on prior information, including the widely used Metropolis‐Hastings algorithm (Hastings, 1970; Metropolis et al., 1953; Tarantola, 2005).
Hamiltonian Monte Carlo
Motivated by the potentially poor performance of Monte Carlo methods for high‐dimensional problems, Hamiltonian Monte Carlo was introduced as hybrid Monte Carlo by Duane et al. (1987) in the context of lattice quantum chromodynamics. The method moved into the focus of statistical computation through the work of Neal (1996) who used it for Bayesian neural network learning. Hamiltonian Monte Carlo was further popularized by the reviews of Neal (2011) and Betancourt (2017), who introduced the method without the overhead of advanced differential geometry. Today, applications of Hamiltonian Monte Carlo can be found in a wide range of disciplines, including neural networks and machine learning (e.g., Bishop, 2006), molecular simulations (e.g., Dubbledam et al., 2016), nuclear physics (e.g., Elhatisari et al., 2015), genomics (e.g., Honkela et al., 2015), and quantum mechanics (e.g., Seah et al., 2015). However, despite its success in widely varying disciplines, Hamiltonian Monte Carlo seems not to be used for the solution of geophysical inverse problems, with few recent exceptions (e.g., Muir & Tkalčić, 2015; Sen & Biswas, 2017).
Objectives and Outline
This work has two main objectives: (1) to introduce Hamiltonian Monte Carlo to geophysical inversion using the low‐dimensional point source inversion problem as example and (2) to illustrate the efficiency of Hamiltonian Monte Carlo for seismic source inversion, taking into account 3‐D heterogeneous Earth structure. Less explicitly, we further intend to pave the way toward higher‐dimensional finite‐source inversions where the benefits of Hamiltonian Monte Carlo are expected to become more pronounced than in the relatively low‐dimensional point source problem.This manuscript is organized as follows: In section 2 we summarize the forward problem solution, which rests on precomputed, receiver side Green's functions. Section 3 introduces Hamiltonian Monte Carlo in the context of Bayesian inference, with special focus on weakly nonlinear problems. A seismic source inversion toy problem that illustrates the basic concepts is the subject of section 4. Finally, in section 5, we present a real‐data point source inversion from the Japanese Islands region.
Forward Problem Solution
To set the stage for the probabilistic inversion, we describe the forward problem solution. For a moment tensor point source at position ξ, the i component of the displacement field at position x and time t is given by the representation theorem (e.g., Aki & Richards, 2002) asIn equation (1), M
denotes the components of the symmetric moment tensor, and
is the i component of the Green's function for a point source in n direction at position ξ. Equation (1) is inconvenient because the simulation of G
for heterogeneous media from all potential sources ξ is prohibitively expensive. We therefore employ spatial reciprocity (e.g., Aki & Richards, 2002) to modify the representation of u
toThis allows us to model u
via the precomputation of a database containing the derivatives of the receiver side Green's functions,
, for all receiver locations x. Equation (2) has been used in various source studies (e.g., Chen et al., 2007; Hejrani et al., 2017; Lee et al., 2014; Zhao et al., 2006).To enable source inversion, we parametrize the moment tensor in terms of time‐independent tensor components M
, the origin time t
0, and a small number of coefficients s
that premultiply the basis functions ϕ(t) of the source time function (moment function),In the interest of generality, we leave the shape of the basis functions ϕ(t) unspecified at this point. Based on equation (3), the forward problem solution depends on 10 + N parameters, six independent components of the moment tensor M
, three source coordinates ξ = (ξ1,ξ2,ξ3), the origin time t
0, and an adjustable number N of source time function coefficients s
. While alternative parametrizations of the time‐independent tensor components have been proposed (e.g., Jost & Herrmann, 1989; Kikuchi & Kanamori, 1991; Staehler & Sigloch, 2014; Tape & Tape, 2015), we prefer to work directly with M
in the interest of simplicity and because it preserves the linear dependence of synthetic seismograms on the source mechanism. This aspect will be further discussed in section 6.2.Equations (2) and (3) define the forward problem, that is, the computation of synthetic seismograms as a function of model parameters. In the following paragraphs we describe how these model parameters and their uncertainties can be inferred efficiently using Hamiltonian Monte Carlo inversion.
Hamiltonian Monte Carlo
General Hamiltonian Monte Carlo Sampling
For notational convenience, we collect all model parameters that we wish to infer into an N
‐dimensional vector q. Thus, q comprises moment tensor components, source coordinates, origin time, and source time function coefficients. Our goal is to sample the posterior probability density π(q|d) given by Bayes' theorem as
where d is the data vector, π(d|q) is the likelihood function, π(q) is the model space prior, and k is a normalization constant (e.g., Mosegaard & Tarantola, 1995; Sambridge & Mosegaard, 2002; Tarantola, 2005). In the context of Hamiltonian Monte Carlo, a model q is interpreted as the position vector of a particle, and equation (4) is used to define its potential energy U as the negative natural logarithm of the posterior,Thus, more plausible models q with large values of the posterior correspond to low potential energies, and vice versa. The kinetic energy K of the model q is defined as an auxiliary variable through the artificially introduced N
‐dimensional momentum vector p,The N
×N
mass matrix
is a tuning parameter. While it can in principle be set to any arbitrary positive definite matrix, its choice affects the performance of the Hamiltonian Monte Carlo algorithm, as we will explain in section 3.2. Endowed with momentum, the model or particle q will move through the 2N
‐dimensional phase space according to Hamilton's equations (e.g., Landau & Lifshitz, 1976; Symon, 1971),
and with increasing artificial time τ that is not to be confused with the physical time t. Based on this notion of a model moving as a particle through phase space, the Hamiltonian Monte Carlo algorithm operates in the following sequence of steps, starting from a randomly chosen initial model q:Randomly draw the N
momenta p
from the normal distribution
.With position and momentum thus determined, propagate the model q forward in time by solving Hamilton's equations (7). After some time τ, which remains to be chosen, a new model q(τ) with a new momentum p(τ) is reached.The new model is accepted with probability
where H = K + U is the total energy or Hamiltonian of the model. If accepted, q(τ) will serve as initial model for the next trajectory. Otherwise, q will be reused as starting point.Return to (1) and repeat until sufficiently many samples have been produced.The models generated by the algorithm are samples of the posterior π(q|d) and may be used to compute marginals, means, covariances, or other quantities of interest. Tuning parameters that affect convergence are the mass matrix
and the propagation time τ along the Hamiltonian trajectory. These will be discussed below in the context of specific applications.
Weakly Nonlinear Problems
Most of the computational expense in Hamiltonian Monte Carlo is in the evaluation of ∂U/∂q
along the Hamiltonian trajectory, because this requires us to compute derivatives of the posterior and, therefore, of the forward problem. The computation cost can be reduced substantially in the case of Gaussian observational errors and weakly nonlinear problems where the forward equations can be meaningfully linearized. In fact, under the Gaussian assumption and in the context of seismic source inversion, the potential energy U defined in equation (5) is equal toThe first term in equation (9), U
, is the L
2 waveform misfit between observations
and synthetics u
(x
,t;q) at N
receiver positions x
, integrated over the time interval [0,T]. The data covariance
encodes the observational uncertainties. While
could be made time dependent and receiver dependent, we keep it constant for notational convenience. The second term in equation (9), U
, captures Gaussian prior knowledge on the model parameters q, with prior mean q
0 and prior covariance matrix C
. Using equations (5) and (9), the likelihood function and the model‐space prior may be written asExpanding the synthetics in equation (9) around q
0 gives correct to first order,With the forward problem defined in equations (2) and (3), expression (11) is exact for the moment tensor components M
and the source time function coefficients s
. Higher‐order terms only exist for the source coordinates ξ and the origin time t
0. Substituting (11) into (9) and isolating the model parameters q
yields a simplified expression for the potential energy,The quantities A
, b
, and c are defined asBeing independent of the variable q, they can be precomputed before running the Hamiltonian Monte Carlo sampling. With the help of (12), the derivative of the potential energy is reduced to the simple vector‐matrix product,
which does not require significant computational resources. The first‐order approximation (11) leads to a modified Hamiltonian Monte Carlo algorithm where an approximate trajectory is computed on the basis of equation (14), while the correct criterion (8) is still used to decide about the acceptance of new samples.In addition to yielding a computationally more efficient algorithm, equation (14) can also guide the choice of the mass matrix
that we have so far left unspecified. Indeed, substituting (14) into Hamilton's equations (7) gives a second‐order differential equation for q(τ),Using the matrix exponential and matrix square root, the solution of (15) can be symbolically written as
with some constant vectors q
1 and q
2. Equation (16) reveals that the model q(τ) oscillates through phase space along closed Hamiltonian trajectories. Depending on the product
, some components of q(τ) may oscillate very rapidly and explore phase space in a short time, while others may oscillate slowly and take a long time for phase space oscillation. However, for an efficient algorithm it is desirable that all components oscillate with roughly equal speed. This can be achieved by choosing
when A is indeed a positive definite covariance matrix that can be used to draw random momenta in Step 1 of the Hamiltonian Monte Carlo algorithm. Otherwise, choosing
to be a diagonal matrix with entries Q
=M
empirically works well in the examples presented below.In the following sections we will illustrate the application of the modified Hamiltonian Monte Carlo algorithm, first using an easily reproducible toy problem, and then with a real‐data application in the Japanese Islands region.
Numerical Details
To integrate Hamilton's equations (7), we use the leapfrog algorithm. Being symplectic, it ensures that the time‐discrete equations still maintain two properties of Hamiltonian dynamics that are essential for Hamiltonian Monte Carlo, volume preservation and reversability (Neal, 2011). Following Neal (1996), we choose the integration step size empirically to be slightly less than the minimum step size required for numerical stability. This ensures that the integration error remains moderate and that the phase space is explored efficiently with the minimum number of integration steps. The length of individual Hamiltonian trajectories, that is, τ in equation (8), is found adaptively using the No‐U‐Turn criterion (Hoffmann & Gelman, 2014). This ensures that trajectories terminate when they begin to return toward their starting point. While a mathematical proof for the efficiency of the No‐U‐Turn criterion still seems to be missing, it is commonly found to work well empirically (Betancourt, 2017).
Toy Problem Illustrations
We begin our illustrative tour with simplistic synthetic inversions of P waveforms in a homogeneous elastic full space where the forward problem can be solved analytically (e.g., Aki & Richards, 2002). These examples are intended to be educational and to explain key features of Hamiltonian Monte Carlo.
Model Space Exploration
In the first example we only consider the moment tensor components M
, keeping all other parameters fixed. The source is at (x
,y
,z
) = (0,0,0) m with M
=1 and M
=M
=M
=M
=M
=0 Nm. Three receivers are located along the coordinate axes at (x
1,y
1,z
1) = (103,0,0), (x
2,y
2,z
2) = (0,103,0), and (x
3,y
3,z
3) = (0,0,103) m. In this configuration, only the diagonal components M
, M
, and M
can be constrained. The off‐diagonals M
, M
, and M
are purely controlled by the model‐space prior π(q) from equations (9) and (10). For simplicity, we use a zero prior mean q
0=0 and a diagonal prior covariance matrix
, where I is the identity matrix. Both the prior standard deviation, σ
, and the observational errors σ
we will vary during the subsequent numerical experiments.To understand the functioning of Hamiltonian Monte Carlo, we consider the projection onto the M
‐M
plane of the first three trajectories of a representative random walk, shown in Figure 1. The walk starts at a random position, indicated by the red star. There, the model q is given a random momentum p, shown by a black arrow. Following Hamiltonian dynamics, the model moves along an elliptical trajectory, roughly in the direction of M
=1 and M
=0 Nm. At the end of the trajectory, the model receives a new random momentum, which takes it onto a different elliptical trajectory, roughly orbiting around M
=1 and M
=0 Nm. The same procedure is then repeated. A key feature of Hamiltonian Monte Carlo, seen from this illustration, is that models stay within the typical set, that is, the relevant part of the model space where the posterior π(q|d) is large. This property is essential for the efficiency of Hamiltonian Monte Carlo. It ensures that new proposals are likely to be accepted and that the model space is explored rapidly.
Figure 1
Two‐dimensional projection onto the M
‐M
plane of the first three trajectories of a Hamiltonian Monte Carlo run. The red star and the blue dot mark the start and end points, respectively. Black arrows symbolize the random momentum vectors at the beginning of the trajectories, plotted in different colors.
Two‐dimensional projection onto the M
‐M
plane of the first three trajectories of a Hamiltonian Monte Carlo run. The red star and the blue dot mark the start and end points, respectively. Black arrows symbolize the random momentum vectors at the beginning of the trajectories, plotted in different colors.The importance of a high acceptance rate is illustrated in Figure 2, where we compare Hamiltonian Monte Carlo to the Metropolis‐Hastings algorithm. The latter proposes samples randomly according to the prior π(q) (Hastings, 1970; Metropolis et al., 1953; Mosegaard & Tarantola, 1995). The prior standard deviation is set to σ
=0.5 Nm for all moment tensor components, and the observational errors σ
decrease successively. (During each individual inversion, σ
is constant, but variable from one inversion to the next.) When σ
is as large as the P wave amplitude itself (σ
=1 in Figure 2a), the posterior π(q|d) is broad because many models fit the artificial observations acceptably well. Therefore, more than 60% of the test samples proposed during the Metropolis‐Hastings algorithm are accepted, and the model space is explored efficiently. However, as the observational error σ
decreases, the percentage of accepted test models drops superexponentially. When σ
equals 5% of the maximum P wave amplitude, not uncommon in practice, only 0.15% of all proposals are accepted. The algorithm is stalled at few models because the probability of randomly drawing a test model within the very small typical set of the posterior has become negligible. Consequently, the model space is explored very inefficiently, and most of the computational resources are wasted. In contrast to Metropolis‐Hastings, the acceptance rate of Hamiltonian Monte Carlo is constant around 40%, independent of the observational errors.
Figure 2
Acceptance rate of the Metropolis‐Hastings algorithm (solid) and Hamiltonian Monte Carlo (dashed) averaged over 10 runs with 10,000 samples each. (a) Acceptance rate as a function of the observational error σ
in equation (9), normalized to the maximum amplitude of the P wave. For Metropolis‐Hastings, the acceptance rate drops superexponentially with decreasing observational errors, while the acceptance rate for Hamiltonian Monte Carlo is constant at 40%. (b) Acceptance rate as a function of the prior standard deviation σ
for the moment tensor components. For an increasingly weaker prior, that is, increasing prior standard deviation, the acceptance rate for Metropolis‐Hastings decreases rapidly, while the acceptance rate of Hamiltonian Monte Carlo is again unaffected. The behavior of the prior and posterior typical sets for decreasing errors and weaker priors is illustrated in color below.
Acceptance rate of the Metropolis‐Hastings algorithm (solid) and Hamiltonian Monte Carlo (dashed) averaged over 10 runs with 10,000 samples each. (a) Acceptance rate as a function of the observational error σ
in equation (9), normalized to the maximum amplitude of the P wave. For Metropolis‐Hastings, the acceptance rate drops superexponentially with decreasing observational errors, while the acceptance rate for Hamiltonian Monte Carlo is constant at 40%. (b) Acceptance rate as a function of the prior standard deviation σ
for the moment tensor components. For an increasingly weaker prior, that is, increasing prior standard deviation, the acceptance rate for Metropolis‐Hastings decreases rapidly, while the acceptance rate of Hamiltonian Monte Carlo is again unaffected. The behavior of the prior and posterior typical sets for decreasing errors and weaker priors is illustrated in color below.The drop in acceptance rate observed in Figure 2a is a consequence of shrinking the typical set of the posterior relative to the fixed typical set of the prior, as observational uncertainties decrease. A similar phenomenon occurs when the typical set of the prior expands relative to a fixed typical set of the posterior. This is shown in Figure 2b where the standard deviation of the prior σ
increases from 0.5 to 2.5 Nm. (Again, σ
is constant within each inversion but varies from one run to the next.) The increasingly weaker prior enlarges the typical set of the prior from which the Metropolis‐Hastings algorithm draws its proposals. Consequently, it becomes less likely to obtain a proposal from the typical set of the posterior, and the acceptance rate drops. Again, Hamiltonian Monte Carlo does not suffer from this problem, which becomes even more severe in higher dimensions as a consequence of the curse of dimensionality (e.g., Tarantola, 2005).In summary, Figure 2 illustrates that Hamiltonian Monte Carlo is well suited for scenarios that are typical for seismic source inversion: high data quality (low observational errors) and weak prior knowledge on parameters such as moment tensor components that may vary over several orders of magnitude.
Convergence
The efficient exploration of model space in Hamiltonian Monte Carlo favorably affects convergence, as illustrated in Figure 3 for the case of σ
=0.1 and σ
=2.0 Nm. Using 20 bins, the posterior marginal of M
requires several thousand samples to converge. However, the posterior mean and the posterior standard deviation practically reach their final values already after a short burn‐in phase of several tens of samples, needed to find the typical set.
Figure 3
Convergence of the posterior marginal of the moment tensor component M
with increasing number of samples. (a) Histogram of the posterior marginal with 20 bins and a number of samples ranging between 20 and 100,000. (b) Convergence of the posterior mean and the posterior standard deviation of M
. While the posterior distribution in (a) requires several thousand samples to achieve acceptable convergence, the posterior mean and the posterior standard deviation converge after a very short burn‐in phase of several tens of samples.
Convergence of the posterior marginal of the moment tensor component M
with increasing number of samples. (a) Histogram of the posterior marginal with 20 bins and a number of samples ranging between 20 and 100,000. (b) Convergence of the posterior mean and the posterior standard deviation of M
. While the posterior distribution in (a) requires several thousand samples to achieve acceptable convergence, the posterior mean and the posterior standard deviation converge after a very short burn‐in phase of several tens of samples.For the same choice of σ
and σ
, the acceptance rate of the Metropolis‐Hastings algorithm is on the order of 10−4, meaning that any consideration of convergence would not even be meaningful when the number of samples is not some orders of magnitude larger than 104. While Figure 3 only shows the marginal of M
, we note that maginals for the remaining moment tensor components behave identically.
Extension to Source Location, Origin Time, and Source Time Function
Based on the formulation developed in section 3.2, the previous example can be extended to include origin time, source location, and source time function coefficients as free parameters. As bases for the source time function, ϕ, we choose 10 nonoverlapping box functions of duration 1 s. The dimension of the model space is therefore 20.Convergence and acceptance rate of Hamiltonian Monte Carlo behave similar as in the reduced scenario shown in Figures 2 and 3. This suggests that the linear approximation for origin time and source location indeed works well. For illustration, we show a collection of 2‐D posterior marginals for different parameter pairs, as well as 1‐D posterior marginals for the source time function coefficients, in Figure 4. Notable, and intuitively plausible, features of the marginals for this simple example, such as the trade‐off between source location x
and origin time t
0, indicate that the algorithm functions as expected and is ready for real‐data applications.
Figure 4
Selection of posterior marginals for the combined inference of source location, origin time, and source time function. (a) Trade‐offs between pairs of model parameters are quantified by 2‐D posterior marginals. The synthetic input model and the posterior maximum‐likelihood model are indicated by red and blue stars, respectively. (b) Posterior marginal for the source time function coefficients. Prior and posterior maximum‐likelihood values are marked by red and blue lines, respectively.
Selection of posterior marginals for the combined inference of source location, origin time, and source time function. (a) Trade‐offs between pairs of model parameters are quantified by 2‐D posterior marginals. The synthetic input model and the posterior maximum‐likelihood model are indicated by red and blue stars, respectively. (b) Posterior marginal for the source time function coefficients. Prior and posterior maximum‐likelihood values are marked by red and blue lines, respectively.
Application
To test the Hamiltonian Monte Carlo source inversion on observed waveforms, we consider an event from near the Izu arc, south of Japan (Figure 5a). In the following paragraphs, we will describe the data, the structural model used to compute synthetic seismograms, and the inferred properties of the event.
Figure 5
Source‐receiver geometry and the structural model. (a) Distribution of broadband stations (black triangles) used in the source inversion. The a priori epicentral location of the earthquake is marked by a red star. (b) SV velocity structure at 15 and 70 km depth in the radially anisotropic model of the Japanese Islands used for the computation of synthetic seismograms. In the 15 km depth slice, gray scale approximately marks mantle velocities, whereas colors from red over white to blue approximately represent crustal velocities.
Source‐receiver geometry and the structural model. (a) Distribution of broadband stations (black triangles) used in the source inversion. The a priori epicentral location of the earthquake is marked by a red star. (b) SV velocity structure at 15 and 70 km depth in the radially anisotropic model of the Japanese Islands used for the computation of synthetic seismograms. In the 15 km depth slice, gray scale approximately marks mantle velocities, whereas colors from red over white to blue approximately represent crustal velocities.
Data and Structural Model
The event used for this demonstration occurred on 30 May 2005. According to the global centroid moment tensor (CMT) catalog (www.globalcmt.org, last accessed on 17 November 2017), its origin was at latitude 32.20∘, longitude 141.01∘, depth 50.0 km, and GMT 11:18:08.47. The estimated magnitude was Mw 5.2. Among the available stations we selected 14 with an estimated signal‐to‐noise ratio above 10 in the period range of 15–100 s. The distribution of stations and the CMT location of the event are shown in Figure 5a.The structural model is based on the full‐waveform inversion of the Japanese Islands region by Simute et al. (2016). For the construction of the model, we used the numerical wave propagation code SES3D (Fichtner et al., 2009; Gokhberg & Fichtner, 2016), which combines spectral element simulations of wave propagation (Faccioli et al., 1997; Fichtner, 2010; Komatitsch & Vilotte, 1998) with adjoint techniques (e.g., Fichtner et al., 2006; Tarantola, 1988; Tromp et al., 2005). The waveform inversion framework LASIF (Krischer et al., 2015) was used for data management and for the iterative minimization of time‐ and frequency‐dependent phase misfits between observed and synthetic seismograms (Fichtner et al., 2008). In preparation for this source inversion study, we further improved the model of Simute et al. (2016) by reducing the minimum period from 20 to 15 s, using the same full‐waveform inversion method as before. The model is radially anisotropic and captures both crustal and mantle structure. To illustrate the complexity of the model, horizontal slices through the SV velocity distribution and 15 and 70 km depth are shown in Figure 5b.
Setup of the Hamiltonian Monte Carlo Sampling
To prepare for the posterior sampling, we chose the CMT solution as prior mean, q
0, for the Taylor expansion in equation (11), and for the computation of the quantities A
, b
, and c in equation (13). Since the duration of magnitude ∼5 earthquakes is usually only a few seconds (e.g., Vallée, 2013), which is small compared to the minimum period of 15 s, we restrict the model parameter space to the moment tensor components, the source location, and the origin time. Omitting the source time function, we therefore have N
=10. Figure 6 shows the prior source model, and a comparison of observed and prior synthetic seismograms computed with the structural model presented in the previous paragraph. While only the E‐W components are shown, all three components are used in the analysis.
Figure 6
Summary of source models and waveforms. (a) Properties of the prior and posterior maximum‐likelihood source models, the latter being obtained from 100 samples. DC percentage is the double‐couple percentage as defined in Jost and Herrmann (1989). (b) E‐W components seismograms. Observations are plotted in red, synthetics for the prior maximum‐likelihood source in black dashed, and synthetics for the posterior maximum‐likelihood source from 100 samples in black solid. The measurement windows used in the Hamiltonian Monte Carlo sampling are marked in white, with boundaries of the cosine taper in light gray. Time intervals marked in darker gray have been discarded because the initial fit between observations and synthetics was insufficient. Waveforms for the N‐S and vertical components appear similar and are therefore not shown.
Summary of source models and waveforms. (a) Properties of the prior and posterior maximum‐likelihood source models, the latter being obtained from 100 samples. DC percentage is the double‐couple percentage as defined in Jost and Herrmann (1989). (b) E‐W components seismograms. Observations are plotted in red, synthetics for the prior maximum‐likelihood source in black dashed, and synthetics for the posterior maximum‐likelihood source from 100 samples in black solid. The measurement windows used in the Hamiltonian Monte Carlo sampling are marked in white, with boundaries of the cosine taper in light gray. Time intervals marked in darker gray have been discarded because the initial fit between observations and synthetics was insufficient. Waveforms for the N‐S and vertical components appear similar and are therefore not shown.To avoid nonlinearity related to cycle skipping, we restrict measurements to time intervals where observed and prior synthetic seismograms match to within a half‐cycle. This windowing may be relaxed when only inverting for the moment tensor components, which enter the forward problem purely linearly. We conservatively set the observational error to σ
=0.1μm. The prior model covariance C
is diagonal with standard deviations set to infinity, thus reflecting a state of ignorance where the influence of the prior is negligible. Mathematically,
eliminates the term
in the computation of A
(equation (13)).With the goal to assess convergence of the Hamiltonian Monte Carlo sampling, we perform two runs, the first with 1 million samples, the second with only 100 samples. The posterior maximum‐likelihood model obtained from 100 samples is visualized in Figure 6a. The corresponding synthetic seismograms are shown below in Figure 6b by solid black curves. Despite the small number of samples, the maximum‐likelihood source provides a significantly improved fit to the observations, especially at stations GJM, HID, HJO, HRO, and KMU. At stations HID and KMU the fit also improves visibly outside the actual measurement window.
Posterior Distributions
The quality of the maximum‐likelihood model is fully characterized by the posterior, π(q|d), which constitutes the complete solution to the Bayesian inference or probabilistic inverse problem. Figure 7 visualizes the posterior in the form of 1‐D marginals for the individual model parameters.
Figure 7
Summary of one‐dimensional posterior marginals. (a) Marginals of the of the individual moment tensor components in units of 1016 Nm. Black histograms are drawn from 1 million samples. The posterior mean, standard deviation, and Gaussian are shown for 1 million samples in blue and for 100 samples in red. For better comparability, the x axes are labeled with the mean and deviations from the mean, for example, −1.18 ± 8 for M
. (b) Posterior distributions of the seismic moment, M
0, and the moment tensor trace, trM. The color coding is the same as above; blue for 1 million samples and red for only 100 samples. (c) Posterior marginals for source location and timing, relative to the prior origin time. Color coding and labeling as before.
Summary of one‐dimensional posterior marginals. (a) Marginals of the of the individual moment tensor components in units of 1016 Nm. Black histograms are drawn from 1 million samples. The posterior mean, standard deviation, and Gaussian are shown for 1 million samples in blue and for 100 samples in red. For better comparability, the x axes are labeled with the mean and deviations from the mean, for example, −1.18 ± 8 for M
. (b) Posterior distributions of the seismic moment, M
0, and the moment tensor trace, trM. The color coding is the same as above; blue for 1 million samples and red for only 100 samples. (c) Posterior marginals for source location and timing, relative to the prior origin time. Color coding and labeling as before.Standard deviations of the moment tensor components vary significantly, ranging from ∼0.4·1016 Nm for M
to ∼1.5·1016 Nm for M
. The uncertainties in the components of M translate into uncertainties of derived parameters, including the seismic moment and the trace, shown in Figure 7b. In fact, the maximum‐likelihood moment tensor has a nonzero trace of ∼3·1016 Nm, but a moment tensor with vanishing trace is only within little more than a standard deviation. Uncertainties of the time and space location parameters in Figure 7c are comparatively small, and the posterior maximum‐likelihood is close to the prior mean. As a consequence of stations HJO and AOG being just slightly west of the event (Figure 5), longitude is better constrained than latitude.A remarkable aspect is the number of samples needed to obtain useful approximations of the posterior mean and standard deviations for the individual parameters. These are indicated in Figure 7 for 1 million samples in blue and for 100 samples in red. While they differ, of course, by a few percent, they are for practical purposes identical. This important result confirms the preliminary synthetic study presented in section 4.2 and Figure 3.The most significant interparameter trade‐offs are shown in Figure 8 by a collection of 2‐D posterior marginals. They illustrate, for instance, that both M
and M
may be reduced without changing the misfit significantly, which is in accord with the previous finding that the trace is rather poorly constrained. To assess convergence of a 2‐D trade‐off measure, we compute the normalized posterior covariance
where σ
denotes the posterior standard deviation of parameter q
. Following the color scheme from Figure 7, values of cov0 are shown in blue for 1 million samples and in red for 100 samples. Since the computation of a covariance involves the square of parameters, the differences are larger than for the single‐parameter standard deviation, but still mostly limited to around 10%. The only exception is the parameter pair t
0 latitude, where larger differences result from the division by a small standard deviation in equation (17).
Figure 8
Summary of the most significant interparameter trade‐offs in the form of 2‐D posterior marginal distributions. All other parameter pairs are nearly independent. Posterior maximum‐likelihood values are indicated inside the panels. Following the color coding of Figure 7, normalized covariances, cov0(q
,q
) = cov(q
,q
)/(σ
σ
), are shown in blue for 1 million samples, and in red for 100 samples.
Summary of the most significant interparameter trade‐offs in the form of 2‐D posterior marginal distributions. All other parameter pairs are nearly independent. Posterior maximum‐likelihood values are indicated inside the panels. Following the color coding of Figure 7, normalized covariances, cov0(q
,q
) = cov(q
,q
)/(σ
σ
), are shown in blue for 1 million samples, and in red for 100 samples.
Discussion
In the previous sections, we introduced the concept of Hamiltonian Monte Carlo in the context of geophysical Bayesian inference and showed applications to seismic point source inversion for regional CMTs and location. Issues that remain to be discussed include the extent to which Hamiltonian Monte Carlo can be expected to be universally efficient, the parametrization of the moment tensor, the quantification of observational and modeling uncertainties, and the role of the prior.
No Free Lunch
While Hamiltonian Monte Carlo is widely considered to be an efficient alternative to preexisting sampling methods, it is important to keep in mind that efficiency only exists within a context. As stated by a series of No‐Free‐Lunch theorems (e.g., Mosegaard, 2012; Wolpert & Macready, 1997), efficiency is not an inherent attribute of a method, but it arises from the combination of the method with the right kind of prior knowledge about a specific problem.In the case of seismic source inversion, we possess the prior knowledge that the forward problem is exactly linear in the parameters that can vary by many orders of magnitude, that is, the moment tensor components. Furthermore, we know that derivatives with respect to other parameters, including origin time and location, can be computed easily, and these derivatives enable a meaningful linearization. Without these prerequisites, Hamiltonian Monte Carlo would not be a good choice for the source inversion problem.Along similar lines, we note that Hamiltonian Monte Carlo is not the only method designed to reduce the impact of the curse of dimensionality and to increase the model space dimension that can be handled in practical applications. Others include the Neighborhood Algorithm (Sambridge, 1999a, 1999b), (parallel) tempering (e.g., Geyer & Thompson, 1995; Marinari & Parisi, 1992; Sambridge, 2014), and transdimensional sampling (e.g., Bodin & Sambridge, 2009; Green, 1995; Sambridge et al., 2006, 2013).
Parametrization
Throughout the history of seismic source inversion methods, numerous parametrizations have been proposed, especially of the moment tensor. These include the classic decomposition into isotropic, double‐couple, and compensated linear‐vector components (Kikuchi & Kanamori, 1991; Knopoff & Randall, 1970); decompositions based on the eigenvectors and eigenvalues of the moment tensor (e.g., Chapman & Leaney, 2012; Hudson et al., 1989; Riedesel & Jordan, 1989); and the uniform parametrization of the moment tensor space (e.g., Tape & Tape, 2012, 2015).All of the proposed parametrizations fully span the space of moment tensors. This turns the choice of one of them into a delicate balancing act between advantages and disadvantages that is to some extent subjective. While, for instance, the parametrizations of Chapman and Leaney (2012), Hudson et al. (1989), Riedesel and Jordan (1989), and Tape and Tape (2015) may help with the physical interpretation of the moment tensor, they also sacrifice the linearity of the forward problem. An advantage of the uniform parameterization proposed by Tape and Tape (2015) is that a larger number of parameters is naturally bounded, which could make Metropolis‐Hastings sampling more competitive.Besides subjective preferences, Hamiltonian Monte Carlo relies on the efficient computation of derivatives, which gives a clear preference to parametrizations that lead to a linear forward problem. Since Monte Carlo sampling does not require artificial regularization by suppressing isotropic or non‐double‐couple components, our choice of working directly with the moment tensor components seems natural.
Quantification of Modeling and Observational Errors
With the focus of this work being on Hamiltonian Monte Carlo as a method for source inversion, we admittedly paid less attention to other aspects of the problem that may not be less important. These include the quantification of forward modeling and observational uncertainties, as performed, for instance, by Mustać and Tkalčić (2016), Silwal and Tape (2016), Staehler and Sigloch (2014, 2017), Vackár et al. (2017), and Wéber (2006).In the interest of a clear methodological description, we limited ourselves to the assumption of uncorrelated Gaussian observational errors. Based on the analysis of preevent noise, we concluded that they are well (but still conservatively) described by a single standard deviation of σ
=0.1μm for all stations. Since the structural model used to compute synthetic seismograms is the result of a full seismic waveform inversion in the region of interest, we further assumed that forward modeling errors are negligible, relative to the observational uncertainties.Regardless of these pragmatic simplifications, we note that more complex descriptions of observational and forward modeling uncertainties can be incorporated into Hamiltonian Monte Carlo by including proper covariance matrices in the definition of the potential energy in equation (9).
Role of the Prior
An essential component of our “sloppy” Hamiltonian Monte Carlo is a Taylor expansion around the prior mean model. As a consequence, the quality of the prior mean—in terms of its proximity to the posterior mean—can affect the convergence of the algorithm for those parameters that do not enter the forward problem exactly linearly (e.g., source location and origin time). The effect on convergence is hard to quantify a priori and should, as so often, be assessed on a case‐by‐case basis. Parenthetically, we remark that the proposed method may be modified by replacing the Taylor expansion around the prior mean by a Taylor expansion around the current model.
Computational Requirements
The computational requirements of the source inversions shown in the previous sections are generally low, meaning that they are feasible on modern laptop computers. While precise numbers are too hardware dependent to be generally meaningful, we note that the CPU time for the real‐data inversion is on the order of several seconds. Wavefield storage requirements are on the order of several gigabytes, though this may be reduced drastically using modern wavefield compression techniques (e.g., Boehm et al., 2016).
Conclusions
We present a new method for the Bayesian inference of effectively point‐localized seismic sources, including their timing, location, moment tensor, and source time function. The method has two key elements: (1) Spectral element simulations combined with wavefield reciprocity allow us to account for 3‐D heterogeneous Earth structure. (2) Hamiltonian Monte Carlo sampling drastically reduces the number of required samples, especially in those cases where the data quality is high and/or prior knowledge is weak.Since Hamiltonian Monte Carlo is sill hardly used for the solution of seismological inverse problems, we provide a detailed introduction, with special emphasis on weakly nonlinear problems for which first derivatives can be computed efficiently. Seismic source inversion falls into this category, meaning that it is within the niche of applications for which Hamiltonian Monte Carlo is efficient. As part of the introduction, we show that the acceptance rate of Hamiltonian Monte Carlo is practically unaffected when data quality increases and when prior knowledge weakens. This is in strong contrast, for instance, to the widely used Metropolis‐Hastings algorithm.In a real‐data application for an event in the Japanese Islands region, we demonstrate the practical feasibility of our Hamiltonian Monte Carlo source inversion. Using a 3‐D structural model inferred from full‐waveform inversion, we obtain the full posterior for timing, location, and mechanism with as little as 100 samples.While details of the method, including the treatment of observational errors and the measurement process, may still be improved, the examples are promising. They indicate that extensions to larger‐scale finite‐source inversions with many more parameters may be possible.
Authors: Antti Honkela; Jaakko Peltonen; Hande Topa; Iryna Charapitsa; Filomena Matarese; Korbinian Grote; Hendrik G Stunnenberg; George Reid; Neil D Lawrence; Magnus Rattray Journal: Proc Natl Acad Sci U S A Date: 2015-10-05 Impact factor: 11.205
Authors: Serdar Elhatisari; Dean Lee; Gautam Rupak; Evgeny Epelbaum; Hermann Krebs; Timo A Lähde; Thomas Luu; Ulf-G Meißner Journal: Nature Date: 2015-12-03 Impact factor: 49.962