Literature DB >> 35864916

Hierarchical Exploration of Continuous Seismograms With Unsupervised Learning.

René Steinmann¹, Léonard Seydoux¹, Éric Beaucé², Michel Campillo¹.

Abstract

Continuous seismograms contain a wealth of information with a large variety of signals with different origin. Identifying these signals is a crucial step in understanding physical geological objects. We propose a strategy to identify classes of signals in continuous single-station seismograms in an unsupervised fashion. Our strategy relies on extracting meaningful waveform features based on a deep scattering network combined with an independent component analysis. Based on the extracted features, agglomerative clustering then groups these waveforms in a hierarchical fashion and reveals the process of clustering in a dendrogram. We use the dendrogram to explore the seismic data and identify different classes of signals. To test our strategy, we investigate a two-day-long seismogram collected in the vicinity of the North Anatolian Fault, Turkey. We analyze the automatically inferred clusters' occurrence rate, spectral characteristics, cluster size, and waveform and envelope characteristics. At a low level in the cluster hierarchy, we obtain three clusters related to anthropogenic and ambient seismic noise and one cluster related to earthquake activity. At a high level in the cluster hierarchy, we identify a seismic burst that includes around 200 events with similar waveforms and high-frequent signals with correlating envelopes and an anthropogenic origin. The application shows that the cluster hierarchy helps to identify particular families of signals and to extract subclusters for further analysis. This is valuable when certain types of signals, such as earthquakes, are under-represented in the data. The proposed method may also successfully discover new types of signals since it is entirely data-driven.

Entities: Chemical

Keywords: seismic data analysis; seismic waveform clustering; unsupervised learning

Year: 2022 PMID： 35864916 PMCID： PMC9285886 DOI： 10.1029/2021JB022455

Source DB: PubMed Journal: J Geophys Res Solid Earth ISSN： 2169-9313 Impact factor: 4.390

Introduction

Continuous seismograms contain a rich amount of information as a large variety of signals can be observed therein. Determining the origin of these different signals is crucial in understanding the physical geological objects. For example, faults and plate boundaries accommodate the tectonic loading by releasing energy in different fashions (Ide et al., 2007), the most known and well‐understood signals being earthquakes, radiating seismic waves visible in most seismograms. Based on their signal characteristics, seismologists developed many tools to detect earthquakes in seismograms such as the short time average to long term average STA/LTA (e.g., Allen, 1978). Only 20 years ago, a new signal with tectonic origin has been discovered and designated as a non‐volcanic tremor because of the similarities with volcanic tremors (Obara, 2002). However, non‐volcanic tremors are often of weak amplitude with poorly defined signal characteristics; their detection is a more challenging task than detecting earthquakes. Other than signals with tectonic origin seismometers also record the oceanic microseisms (see e.g., Ebeling, 2012, for a recent review), rockfalls and other mass movements (e.g., Deparis et al., 2008; Lacroix & Helmstetter, 2011), ground and air traffic (e.g., Meng & Ben‐Zion, 2018; Riahi & Gerstoft, 2015) or other kind of human‐induced sources (such as church bells in Diaz, 2020). The mixing of all these sources renders a complex seismic wavefield that makes the analysis and interpretation of seismic records difficult, especially if seismic data are the only data available. As a response to this problem, seismologists have developed many processing tools for exploring these complex seismic data. Since the 1970s seismology benefits from artificial intelligence developments, bringing machine‐learning‐based solutions for exploring seismic data and recognizing patterns (e.g., Allen, 1978). More recently an unsupervised learning strategy called clustering was utilized to explore seismic data and find families of similar signals (Holtzman et al., 2018; Jenkins et al., 2021; C. W.Johnson et al., 2020; Köhler et al., 2010; Mousavi et al., 2019; Seydoux et al., 2020; Snover et al., 2020). In contrast to supervised learning strategies, clustering does not rely on a labeled training set and human expert knowledge (Goodfellow et al., 2016). Thus, clustering seismograms can help identifying families of signals which are not yet discovered or are poorly defined such as non‐volcanic tremors. In the present paper, we introduce a new strategy to use clustering as an exploration tool for continuous seismograms. Our strategy follows the idea that seismic signals are grouped in a hierarchy of classes following a specific similarity measurement, as schematized in Figure 1. Note that this illustration aims at sketching the concept rather than being complete or accurate. We consider the similarity between classes of signals to be measured on a set of signal characteristics that can be human‐defined (such as mean frequency and signal duration) or learned with machine‐learning tools, as we propose in the present paper. In the first place, one can imagine the seismic signal classes to split into long‐term and short‐term signals based on the duration of a signal (Figure 1). In the class of long‐term signals, one could use a similarity measure based on frequency content to separate the primary from secondary microseism. We see that building a tree of classes lets us explore the data on different levels and that different signal characteristics may be relevant at each node of the tree.

Figure 1

Illustration of a possible hierarchy of seismic signals found in seismograms. The different branches represent how a signal class splits into different sub‐classes depending on a given similarity measure. Here, the different classes of signals are thought in a hierarchical way, based on arbitrary properties (e.g., duration, frequency range or signal's structure). This scheme aims at illustrating the expected behavior of an optimal clustering algorithm, but does not depict the potential issues related to clustering such as overlapping between different classes of signals or imbalance between classes. The sketch presented in Figure 1 also illustrates the problems of designing a class hierarchy by hand. The labels used in this sketch are the ones we created as seismologists based on our domain knowledge. That is problematic for those classes of signal that do not have a proper definition of signal and source properties, such as non‐volcanic tremors. Moreover, some splittings, such as between earthquakes and explosions, ask for a more complex similarity measure which is hard to design by hand. Hierarchical clustering produces precisely this kind of tree, called a dendrogram, based on the exploration of the similarity of signals present in the input data. Therefore, we propose to represent continuous seismograms as a dendrogram and utilize it to explore the content of the data and identify different types of seismic signals. We want to emphasize that clustering identifies groups of similar objects, but it does not provide any meaningful labels, such as the labels in Figure 1. In an extra step, the found clusters can be labeled by analyzing the inherent properties of the clusters. In the following section, we present the workflow to build a dendrogram from continuous single‐station data. We introduce the concept of hierarchical clustering and how we transform continuous seismograms to a meaningful input (features) for the hierarchical clustering. In Section 3, we introduce a data set to apply and test the proposed workflow. In Section 4, we show and discuss briefly the resulting dendrogram. Section 5 is about navigating through the dendrogram and interpreting the clusters at different levels.

Method

A sketch of the hierarchical clustering workflow is depicted in Figure 2. In the following lines, we start with the concept of clustering in general and hierarchical clustering in particular. Then, we explain how we transform seismograms into a meaningful input for the cluster analysis.

Figure 2

Proposed workflow for hierarchically exploring continuous seismograms. (a) Input continuous three‐component seismograms, as detailed in Section 3. (b) Deep scattering spectrogram of the seismograms, with a temporal resolution of about 20 s and a high number of dimensions, detailed in Section 2.2. (c) The feature matrix extracted from the deep scattering spectrogram with independent component analysis, following the description in Section 2.3. (d) Dendrogram calculated from a similarity measurement in the feature space, as explained in Section 2.1.

Hierarchical Clustering

In general, cluster analysis groups objects based on their similarity to each other (e.g., Xu & Wunsch, 2008). Objects in the same cluster are more similar to each other than objects in separated clusters. The similarity between objects is measured on a set of certain characteristics called features. Finding the most relevant features for this task will be discussed later. Various algorithms exist to find groups of objects in a data set. This study utilizes hierarchical clustering with a bottom up approach, namely agglomerative clustering. Hierarchical clustering relies on a similarity matrix, which defines the similarity (e.g., a specific distance in the feature space) between all objects in a data set (S. C. Johnson, 1967). With a bottom‐up approach, all objects start in a singleton cluster. The clusters start merging based on the similarity matrix until all objects unify in a single global cluster. This process is summarized in a dendrogram, revealing the hierarchical structure of the entire data set. Such a strategy fits very well the nature of seismic data as depicted in Figure 1. The agglomerative clustering outcome depends mainly on the applied metric, which drives the merging of the cluster. In our approach, we use the Ward's method (Ward Jr, 1963). Given a distance d (here considered Euclidean), the Ward's method aims at grouping objects x into clusters such that the within‐cluster variance remains minimal after merging different clusters. The within‐cluster variance σ quantifies the spread of each cluster in the feature space (for more details see Appendix A). By minimizing the overall variance, with K being the number of clusters, the Ward's method allows for clusters of variable population sizes and variances. Thus, it may highlight clusters of high density located in the vicinity of more spread, low‐density clusters. Therefore, Ward's method is suitable for the expected seismic data partition, where often ambient seismic noise outweighs signals with a tectonic origin. It is worth mentioning that hierarchical clustering especially with the Ward's method can be computationally expensive. However, algorithms have been improved over time and became more efficient. In this study we utilize the python package fastcluster, which has a time complexity of with N elements in and a memory complexity of (Müllner, 2013). More recently, the use of hyperbolic embeddings for preserving the hierarchical structure of the data seems to be a promising way to reduce even further the computational costs (Chami et al., 2020).

Finding an Appropriate Representation of Seismograms: The Deep Scattering Spectrum

In order to detect and identify classes of signals in continuous seismograms with hierarchical clustering, the seismograms have to be transformed into a meaningful input for the cluster analysis. For that purpose, we calculate features for fixed windows of the seismogram. Thus, each window will be assigned a cluster based on the features for this window. Note that this process simplifies the complexity of seismic data, since multiple types of signals can occur simultaneously. Common cluster analysis such as hierarchical clustering neglect this fact and can only assign a single cluster to an object. Besides the choice of the applied metric within hierarchical clustering, the choice of features is another important factor, which determines the outcome of the cluster analysis. Finding the most relevant features should be done according to the task at hand and can be done thanks to prior knowledge on the data or by defining proper algorithms to learn the most relevant features. We distinguish classical machine‐learning algorithms that rely on human‐defined features (Maggi et al., 2017; Malfante et al., 2018) or representation‐learning algorithms where the features are learned from the data to optimize a given task (LeCun et al., 2015; Ross et al., 2018; Rouet‐Leduc et al., 2020). While classical machine learning provides less accuracy in most cases, it provides interpretability since the features are known, which is an interesting aspect. Most algorithms that rely on representation learning are less easy to interpret since the features are more abstract, but they also provide more accurate results. In the present paper, we propose to use a hybrid approach between classical and representation learning algorithms that combines the advantages of both. A time‐frequency representation such as the spectrogram is one way to create a set of features for classifying seismic signals (Jenkins et al., 2021; C. W. Johnson et al., 2020; Snover et al., 2020). However, Andén and Mallat (2014) showed that a spectrogram generated by the Fourier transform is not ideal for classification purposes since it is not stable to time‐warping deformations, especially at short periods compared with the duration of the analyzing window. They introduce another time‐frequency representation called a deep scattering spectrum which is computed by a scattering network. This type of network implements a cascade of convolutions with wavelet filters, modulus function, and pooling operations (see Figures 2a and 2b). Deep scattering spectra are locally translation invariant and preserve transient phenomena such as attack and amplitude modulation. These characteristics are beneficial when it comes to classifying any time series data. In Andén and Mallat (2014) and Peddinti et al. (2014), the authors have successfully classified audio data based on the deep scattering spectrum. The authors of Seydoux et al. (2020) have brought that representation into seismology and showed that small precursory signals of a landslide could be detected and classified in an unsupervised fashion. Other successful deep‐learning classifiers inspired by deep scattering networks are presented in Balestriero et al. (2018) and Cosentino and Aazhang (2020). We use the strategy presented in Seydoux et al. (2020) for calculating the deep scattering spectrum. Considering the continuous input signal (where, C is the number of channels), the scattering coefficients S ( of order ℓ are obtained from the following cascade of wavelet convolutions and modulus operations (i.e., wavelet transforms): where, ⋆ stands for the temporal convolution, |⋅| represents the modulus operator and is the wavelet filter at the layer i of the scattering network, with center frequency . Here, refers to one of the center frequencies of the layer i indexed by n = 1 … N , where N is the total number of wavelets at layer i. In contrast to the Fourier transform, the center frequencies of the wavelets are placed logarithmically. In this study, we only consider a scattering network with 2 layers (as depicted in Figure 2) since, Andén and Mallat (2014) argued that more layers do not necessarily introduce new valuable information. The first layer in the network creates N 1 scalograms per channel of the seismic station. In the second layer another wavelet transform is applied to each scalogram of the first layer. Thus, the second layer contains N 1 * N 2 scalograms. A maximum pooling operation is then applied over each scalogram to retrieve the scattering spectrum. The entries of the scattering spectrum pooled from the first layer scalogram are called first‐order scattering coefficients. The entries of the scattering spectrum pooled from the second layer scalogram are called second‐order scattering coefficients, which contain important information about the attack and modulation of a signal. While, the scalograms still have the same sampling rate in time as the input data, the temporal pooling collapses the time dimension of the scalogram and produces a scattering spectrum for each input window. Note that each input channel from the seismic station is treated separately and their deep scattering spectrum are concatenated into a deep scattering spectrum vector with the size 3 * N 1 + 3 * N 1 * N 2 for a three‐component seismogram. For each input window, a deep scattering spectrum vector is created, which are then merged into the deep scattering spectrogram. The final sampling rate of the deep scattering spectrogram is defined by the size of the input window. In Seydoux et al. (2020), the authors initialize Gabor wavelets with amplitudes and derivatives on a certain sets of knots and interpolate then with Hermite cubic splines. With respect to the clustering loss, they learn the parameters on these knots governing the shape of the wavelets. In this study, we directly use the initialized Gabor wavelets with zero phase shift and do not apply any learning of the wavelets. This choice was made principally because we do not perform a fixed cluster analysis in our study, but an exploration of the data instead where a loss function is harder to define. For the interested reader we refer to Andén and Mallat (2014) and Seydoux et al. (2020).

Features Extraction From Deep Scattering Spectrogram

The deep scattering spectrum can have more than 1,000 dimensions and, thus, the conditions for clustering are not favorable (Kriegel et al., 2009). Indeed, distances in very high‐dimensional spaces give little information about the structure of the data (the so‐called curse of dimensionality; Bellman, 1966). In addition, the representation is known to be highly redundant since the wavelet filters of the first layer are often considered with a strong frequency overlap in order to provide a dense first‐order representation. Therefore, it is recommended to reduce the dimensions before clustering. In our case, we use an independent component analysis (ICA) to reduce the dimension of the representation. In the following remarks, we explain the basic concept of ICA. For the interested reader we refer to Comon (1994). ICA is introduced as a statistical tool for blind source separation and feature extraction. The generative model of the ICA can be described as: where, are the N observations of dimension F, is the mixing matrix, and are the unmixed sources (namely, the C unmixed sources obtained from ICA). The observations x are therefore a linear combination of the independent sources s, with the mixing weights gathered in A. A test of statistical independence is required to solve Equation 2 while ensuring the sources s to be independent. This concept is illustrated in Figure 2, where the unmixed sources are considered as features in our workflow (therein called feature matrix). These sources are obtained from applying the unmixing matrix, the pseudo inverse of the mixing matrix A, to the deep scattering spectrogram. Among the different strategies, we can look for a minimum of mutual information, or similarly, a maximization of the non‐Gaussianity. In our study, we apply the FastICA algorithm from the scikit‐learn Python library, which uses the negentropy as a measure of non‐Gaussianity (Hyvärinen & Oja, 2000). This analysis is similar to the principal component analysis, with the difference that the independent components are not orthogonal. In addition, there is no information about the variance explained by the different independent components, and are therefore delivered unsorted by the algorithm.

Data

We test our proposed workflow on continuous three‐component seismic data from the station DC06 of the DANA experiment in Turkey (see for instance Poyraz et al., 2015, and the map shown in Figure 3a). Originally, the experiment was conducted to investigate the crustal structure beneath the western segment of the North Anatolian Fault. We choose the data set for mainly two reasons. First of all, the data set contains both seismic and anthropogenic activity, which is a typical situation in most seismological studies. Second of all, an existing template matching catalog provides labels for the seismicity in this area. The catalog was built following the methodology in Beaucé et al. (2019).

Figure 3

Geological context and seismic data used in the present study. (a) Map of the North Anatolian fault zone showing station DC06 (black triangle), the seismic burst (red dots) including the largest event (red star) and other seismic activity (blue dots); all detected with a template matching strategy. The geological faults that ruptured after 1,900 (black lines) are adapted from Emre et al. (2011). (b) Cumulative detections of the seismic burst (in red) and other seismic activity (in blue) obtained with template matching. (c) Continuous spectrogram of the east‐component of station DC06, with a visual identification of (A) oceanic microseism, (B) a non‐stationary monochromatic noise source, and (C) daily high‐frequency activity. We choose to analyze the seismic data from the 25th to the 27th of July 2012. During the period of these two days, a high rate of localized seismicity with 148 cataloged events occurred on and around the northern strand of the North Anatolian fault (see Figures 3a and 3b). In this study, we refer to this high rate of seismicity as a seismic burst. The catalog explains the series of events with 17 templates having their hypocenters close to each other (Figure 3a, red dots). Since the seismic burst causes a repeating pattern in the seismogram with short time‐warping deformations due to slight changes of the hypocenters, it is an interesting study case for our proposed method. Station DC06 is close to the seismic burst and records the time period of interest without data gaps. Thus, we choose the three‐component seismograms of this station. The sampling rate of the data is 50 Hz. The spectrogram of the east component of station DC06 is presented in Figure 3c. The oceanic microseism is visible around 0.2 Hz, where we can observe the dispersive nature of the oceanic gravity waves. At around 1.5 Hz we can identify a nonstationary monochromatic noise source, which seems to be more active during the first day. At frequencies higher than 3 Hz we can see increased activity during daytime, most likely induced by anthropogenic seismic sources. The event with the largest magnitude of the burst is also easy to spot during the evening of the 25th in the spectrogram.

Results

Feature Space

First, we use the continuous three‐component seismograms to calculate the deep scattering spectrogram with a two‐layered scattering network (as detailed in Equation 1). The network parameters are physics‐driven and can be adjusted according to the goal. In this study, the first layer contains 24 Gabor wavelets with center frequencies between the Nyquist frequency of the seismogram (25 Hz) and 0.78 Hz with a spacing of 4 wavelets per octave. The second layer contains 14 Gabor wavelets with center frequencies between 25 and 0.19 Hz with a spacing of 2 wavelets per octave. This setup results in 24 wavelet transforms per channel in the first layer and 336 (24 * 14) wavelet transforms per channel in the second layer. Because the deep scattering spectrum is a concatenation of the first‐ and second‐order scattering coefficient of each input channel, the total number of scattering coefficients is 1,080 (dimension F in Figure 2). For the temporal pooling operation, we apply maximum pooling, since we are interested in detecting and classifying non‐stationary events such as the seismic burst. If the focus of classification is the background noise, average pooling might be the better choice (as suggested in Seydoux et al., 2020). The moving pooling window is 20.48 s large and does not overlap. Hence, the time resolution of the deep scattering spectrogram is also 20.48 s. For dimensionality reduction, we apply an independent component analysis using the FastICA algorithm from the scikit‐learn Python library. In this study, we select the appropriate number of independent components according to the reconstruction loss between the original data and the reconstructed data after compression with an ICA (detailed in Appendix B). We emphasize that we look for a trade‐off between keeping the most significant amount of information while using few independent components. From the study of the loss with increasing number of components shown in Appendix B and Figure B1 therein, we conclude that keeping 10 independent components is a good compromise and constitute our choice in the present study. A visual representation of the 10 unmixed sources building the feature space is depicted in Figure B2 in Appendix B.

Figure B1

Reconstruction loss with independent component analysis from the deep scattering spectrogram. The reconstruction loss ϵ(n) is calculated from Equation B1 as a function of the number of independent components n.

Figure B2

Time series of the 10 unmixed sources of the deep scattering spectrogram. The samples containing one or more arrivals of the earthquake from the nearby seismic burst are highlighted with blue dots.

Dendrogram

After transforming the continuous seismic data into a most relevant set of features, we can use this representation to explore the data with hierarchical clustering. By controlling the distance threshold, we can extract different numbers of clusters. The distance threshold sets the boundaries for the possible distances between points within a cluster. While a larger distance threshold allows larger and fewer clusters to form, a smaller distance threshold extracts smaller but many clusters. Note that the distance threshold is only used to extract different cluster solutions based on the similarity matrix; it is not a hyperparameter affecting the similarity matrix. In Figure 4a we selected a distance threshold of 0.47 in order to show a truncated dendrogram stopping at 16 clusters. At a distance of 0.9, we extract four main clusters labeled as A, B, C, and D. Figure 4b–4e depict random‐selected three‐component seismograms from each of the cluster. Figure 4f shows the averaged first‐order scattering coefficients of these four clusters. These first‐order scattering coefficients describe the frequency characteristics of each cluster. Figure 4g presents the normalized cumulative detection rate of each cluster, with the seismic burst detection rate indicated as a reference. The relative size of each cluster compared to the size of the entire data set is depicted in Figure 4h. In the following remarks, we will analyze each of the four main clusters from left to right.

Figure 4

Dendrogram analysis and statistical characteristics of the different clusters. (a) Dendrogram calculated in the feature space (see Section 2.1 for explanations). The dendrogram is here truncated in order to form 16 clusters. The clusters marked with a letter are considered the main clusters, and the subclusters are indicated with numbers. The numbers in the parenthesis indicate the number of samples in each cluster. (b, c, d, and e) Depict random examples of waveforms for the four main cluster A,B,C, and D, respectively. (f) Averaged first‐order scattering coefficients for main clusters A, B, C, and D (g) Normalized cumulative detections of main clusters A, B, C, and D, and of the seismic burst obtained from the multi‐station template‐matching catalog. (h) Relative size of the main clusters compared to the size of the entire data set. Cluster A contains ca. 27% of the data (Figure 4h) and is the first cluster to split from the whole data set, that is, cluster A is the furthest away from the center of the data points (Figure 4a). Compared to the other clusters, its scattering coefficients for all frequencies are relatively low except for a local maximum around 1.5 Hz (Figure 4f). Looking at the corresponding cumulative detection curve (Figure 4g), we see that this cluster is active mainly during the first day until the late afternoon, which seems to correlate with the monochromatic signal around 1.5 Hz we have already identified in the spectrogram (Figure 3c). Cluster B contains about 19% of the data samples (Figure 4h) and has relatively large scattering coefficients for frequencies above 10 Hz (Figure 4f). The corresponding cumulative detection curve indicates that this cluster accumulates less detections during the beginning of a day than with later times of a day (Figure 4g). Combining these facts leads to the hypothesis that cluster B might be related to signals with an anthropogenic origin. Cluster C is the largest cluster with more than 50% of the data points (Figure 4h). Compared to the other clusters, it also has the lowest scattering coefficients at all frequencies (Figure 4f). Looking at the cumulative detection curve (Figure 4g), we see this cluster shows an almost linear increase starting at the afternoon of the first day, exactly when cluster A becomes almost inactive. The cluster size and frequency content suggest that cluster C contains mostly ambient seismic noise data. Finally, cluster D contains about 4% of data set and is the smallest of the four clusters (Figure 4h). The corresponding first‐order scattering coefficients show a local maximum around 5 Hz (Figure 4f). Its cumulative detection curve correlates well with the detections of the seismic burst (Figure 4g), with additional detections before the seismic burst starts. All these observations indicate that cluster D is probably related to nearby seismic activity in general.

Discussion

In this section, we will discuss and interpret the dendrogram's representation and its clustering solution. While, the main focus is on identifying how the seismic burst occurs in the dendrogram, we will also discuss how the general seismicity is observed through this representation, and interpret the remaining clusters with anthropogenic activity and ambient seismic noise. To underpin the statement that the deep scattering spectrum is a superior representation for the task at hand than the Fourier‐transform spectrum, we also create and interpret a dendrogram based on the Fourier‐transform of the same data set (see Appendix D).

Identification of the Seismic Burst Within the Dendrogram

First, we identify all time segments containing onsets of the events of the seismic burst and observe which clusters those time segments belong to. The template matching catalog contains 148 detections related to this seismic burst. However, we only associate 136 samples in the feature space with the seismic burst, since one sample represents about 20 s of waveform data and, thus, can contain multiple events. Figure 5a shows that a large majority of the samples, which contain arrivals of the seismic burst, fall into cluster D (92.6%). On the other hand, only 40% of cluster D is related to the seismic burst, underpinning the statement that this cluster is related to general seismic activity. Cluster B and C share the remaining 7.4% of the burst. Compared to the large population sizes of clusters B and C, the contribution of the burst almost vanishes (0.3 and 0.1%). Cluster A contains no detections of the burst. While cluster D contains the majority of the seismic burst, the interesting aspect is to understand what the remaining 60% samples of this cluster are related to (earthquakes from the same source region, different signals, etc). To answer that question, we investigate the subclusters visible in Figure 4a obtained with a distance threshold of 0.47; in particular, we will narrow the focus on the subclusters of cluster D, namely the four subclusters D.1 to D.4.

Figure 5

Identification of the seismic burst within the main and subclusters. (a) The distribution of the seismic burst across the four main clusters. (b) The distribution of the seismic burst across the four subclusters in the main cluster D. (c) Normalized cumulative detection curves for the subclusters in the main cluster D. (d) Averaged first‐order scattering coefficients for the subclusters in the main cluster D. First, we look at the distribution of the samples containing the seismic burst across the four subclusters in main cluster D. From Figure 5a, we know that more than 92% of the burst was found in cluster D. We observe in Figure 5b that this amount splits into ca. 71.3% in cluster D.1 and ca. 21.3% in cluster D.4. The subclusters D.2 and D.3 contain no earthquakes from the seismic burst and will be discussed later. If we look at the cumulative detection curve of each subcluster in D (Figure 5c), we see that cluster D.1 and D.4 share a very similar temporal pattern. The corresponding averaged first‐order scattering coefficients (Figure 5d) explain why the burst got split into two clusters: across almost all frequencies the larger subcluster D.1 shows significantly smaller scattering coefficients than the smaller subcluster D.4. Hence, the magnitude of the events seems to be the characteristic that separates the burst into two clusters. Besides, we observe that 56% of D.1 and 97% of D.4 can be explained by the cataloged burst. This observation raises the question: what are the samples in D.1 and D.4 that cannot be related to the seismic burst recorded by the catalog? We can answer this question by looking at the waveforms representing the corresponding data points of subclusters D.1 and D.4. Figures 6a–6c show the corresponding waveforms of all 204 data points of the two subclusters D.1 and D.4. For presentation purposes we align the waveforms accordingly to their maximum correlation with a template waveform from the subcluster. For all waveforms we observe the P and S seismic phase arrivals of the earthquakes. The first 30 waveforms correspond to subcluster D.4. 29 of them are are also in the catalog (marked orange) while 1 of them is not in the catalog (marked magenta). The following 174 waveforms are from subcluster D.1. 98 of them are are also in the catalog (marked light blue) while 76 of them are not in the catalog (marked blue). The waveforms are very similar to each other on all three channels. This indicates that these new detections are coming from the same source area. Note also that the first 30 waveforms representing subcluster D.4 have a better signal‐to‐noise ratio than the following waveforms of subcluster D.1. This agrees with our assumption that the burst is split into two subclusters due to magnitude differences. The magnitude estimations of the template matching catalog confirms this assumption (see Figure 6d). While most of the events located in D.1 range between M0.5 and M1, the events located in D.4 range between M1 and M2.2.

Figure 6

(a,b,c) Waveform data from subcluster D.1 and D.4. The color code indicates the according subcluster and if the event is mentioned by the catalog. (d) Magnitude estimations of the cataloged events of the seismic burst found in subcluster D.1 and D.4. By investigating cluster D and its subclusters D.1 and D.4, we are able to identify two subclusters representing the seismic burst. While D.1 contains many events with smaller magnitudes, D.4 contains fewer events with larger magnitudes. Together the two subclusters contain 92.6% of the cataloged events and 77 new events, which have identical P and S wave arrivals as the cataloged ones. The new detections can be explained by the fact that we utilize a single station method and compare it to a catalog based on a multi station method. More details and a comparison with a single station template matching catalog based on station DC06 can be found in Appendix C. However, 7.4% of the cataloged detections can not be found in subclusters D.1 or D.4. In the following remarks, we want to analyze the misidentified 7.4% of cataloged events, which equal 10 over 136 events. First of all, we want to know where these events are located in the feature space. Therefore, we calculate the Euclidean distance between the misidentified events and the centroids of each cluster in the feature space (see Figure 7a). In magenta, we highlight the distance between the sample and its respective subcluster. In cyan, we highlight the distance between the sample and subcluster D.1 containing the low magnitude events of the burst. In gray, we highlight the distances to all other remaining clusters as a comparison. We sorted the misidentified 10 events according to the distance to the centroid of D.1. We see that for the first six events, the distance to the centroid of D.1 is smaller than to the centroid of its respective cluster. The corresponding waveform data also offer explanations for the misidentification (Figures 7b–7d). Indeed, the P and S arrivals are noisy but visible for the first five events. Thus, some events might be misclassified because samples are grouped with the Ward's method, which solves iteratively an objective function considering the Euclidean distance and the within‐cluster variance. In other words, clusters can agglomerate samples which might be closer to the centroids of other clusters if we consider the pure Euclidean distance. After the first five events, when the distance to its respective cluster becomes smaller than the distance to D.1., the P and S arrivals are not visible anymore, or other large‐amplitude events are present. Here the problem is related to assigning a single cluster to 20 s of waveform data, which can contain multiple signals.

Figure 7

Analysis of the misidentified earthquake waveforms. (a) Distances between misidentified data points containing an event from the catalog and the centroids of all clusters. The magenta points show the distance between the data point and the centroid of its own respective subcluster. The cyan points show the distance between the data point and the centroid of D.1. The gray points show the distance between the data point and the centroids of the other 14 subclusters. (b, c, d) Corresponding aligned waveform data sorted according to the distance to the centroid of D.1 (respectively channels E, N, and Z). The color coding represents the distance to the centroid of subcluster D.1. A purple color indicates a larger distance than a light blue color.

Neighboring Clusters of the Seismic Burst in the Feature Space

Having identified most of the seismic burst in two neighboring subclusters already shows that the representation of the data and the distances between the data points are meaningful. As a next step, we want to analyze the neighborhood of these two subclusters to get a better understanding of the data representation. Since D.2 and D.3 share the same cluster with D.1 and D.4, we know that they are located next to each other in the feature space. This indicates that subcluster D.2 and D.3 might contain similar signals, such as seismic activity with a different origin than the seismic burst. To verify this assumption, we can compare existing earthquake catalogs with the timestamps of the samples in the subclusters. We extend the local template matching catalog with a regional catalog limited to events within a radius of 5° around station DC06. The regional catalog is downloaded from IRIS. For calculating the seismic phase arrivals at the station, we use the TauP module of ObsPy with the velocity model of Kennett and Engdahl (1991). We consider a sample related to an event of the catalog if the 20 s window of the sample overlaps with the window between the P wave arrival and the decaying coda. The waveform data of D.2 and D.3 are presented in Figure 8. Figure 8a indicates the samples which can be explained by arrivals of a regional or local event, and Figure 8b shows the samples which can not be explained by arrivals of a regional or local event. Note that one sample in the feature space represents ca. 20 s of waveform data and each horizontal waveform displayed in Figure 8 contains multiple consecutive 20 s windows. Subcluster D.2 contains only nine samples corresponding to two seismic events indicated in blue in Figure 8a. The first event represented by eight consecutive samples at index 0 is a relatively distant M4 event. The other event represented by a single sample is a quarry blast from a local mine mentioned by the template matching catalog. At first sight, it might seem unexpected that these two events are found in the same subcluster. However, subclusters D.2 shows the largest scattering coefficients for frequencies below 5 Hz (see Figure 5d), and its centroid is the furthest away from the remaining data set as we can see from the inter‐cluster distance matrix presented in Figure A1 in Appendix A. Moreover, the within‐cluster variance σ in the top panel of Figure A1 indicates that the samples of subcluster D.2 are the most spread out compared to the other subclusters, This suggests that both events are seen as outliers in the feature space due to their high amplitudes at lower frequencies.

Figure 8

Figure A1

Inter‐cluster distances and within‐cluster variances. (a) Within‐cluster variance according to Equation A2 for all 16 subclusters. (b) Inter‐cluster distance according to Equation A1 between all 16 subclusters.

Seismic waveforms identified in subclusters D.2 and D.3. (a) waveform data of D.2 and D.3 where the phase arrivals match the merged catalog. (b) Waveform data of D.3 which do not correspond to phase arrivals from the merged catalog. Moreover, we observe that the catalog can explain 67% of all samples of D.3 (a random selection of waveforms are shown in Figure 8a). The other 33% are shown in Figure 8b, and some samples also show seismic phase arrivals (in particular, the seismograms shown at index six and nine). It is thus likely that the samples shown in Figure 8b contain uncataloged events. While subcluster D.1 and D.4 represent similar earthquakes from a similar source region, subcluster D.3 shows many kinds of signals, such as earthquakes with different magnitudes and distances to the station. We can interpret subcluster D.3 as an agglomeration of transient signals with increased energy between 1 and 5 Hz (see Figure 5d). Regional and local events also fall into this category. Thus, in the vicinity of the subclusters D.1 and D.4, related to the seismic burst, other subclusters containing seismic activity can be found.

Anthropogenic Signals With High Envelope Correlation

After identifying seismic activity in cluster D, we want to draw attention to the remaining part of the seismic data set. Seismic activity induces short‐term signals with a characteristic waveform and envelope shape. However, if we want to classify other types of signals like tremors, anthropogenic noise, or ambient noise, correlating waveforms are unlikely to be suitable for this task. One key feature of the deep scattering spectrum is the representation of the waveform's envelope in the second‐order scattering coefficients (Andén & Mallat, 2014). Consequently, we should find clusters with weakly correlating waveforms but strongly correlating envelopes. For that reason, we investigate the correlation coefficient of the waveform (CC ) and the envelope (CC ) for all subclusters. First, a template is defined by the closest sample to the centroid representing the most typical waveform of a cluster. Then, we calculate the correlation coefficient of the waveform data CC and the correlation coefficient of the smoothed envelope CC between the template and the remaining samples. The envelope is defined by the modulus of the analytic signal, which is a complex‐valued representation of the waveform disregarding the negative frequencies from the Fourier transform. A median‐filter smoothens the envelope. The averaged results are depicted in Figure 9a. We first observe that CC is more significant than CC for most subclusters. In particular, cluster B.4 shows the most significant discrepancy between CC and CC ; this subcluster is part of cluster B, which we related to high‐frequent urban noise. In Figures 9b–9d, we align the envelopes for each channel and each sample in B.4 to depict the shared characteristics. We see a very symmetric envelope that lasts around 5 s. The envelopes look very similar on all three components. Figure 9e shows a histogram of detections over the time of the day. We see that this cluster mostly appears during daytime with a clear peak around 14:00 local time. Figure 9f shows the averaged first‐order scattering coefficients for all three channels. The frequencies above 5 Hz are very pronounced and peak between 10 and 15 Hz. In summary, we see that subcluster B.4 is related to non stationary urban noise which produced similar envelopes lasting 5 s. Nearby road traffic could produce these kind of signals.

Figure 9

Interpretation of subcluster B.4. (a) Averaged correlation coefficient for the waveforms CC and for the envelopes CC for all 16 subclusters. (b,c,d) Aligned envelopes for the three channels for subcluster B.4. (e) Number of detections per hour for subcluster B.4. (f) Averaged first‐order scattering coefficients for subcluster B.4.

Long‐Lasting Signals With Low Envelope Correlation

As the last example, we want to draw attention towards clusters A and C. Both clusters show relatively low correlation coefficients for the envelopes (see Figure 9). Cluster C contains more than half of the data, and the average scattering coefficients are the lowest for all frequencies compared to the other clusters (see Figures 4f and 4g). Moreover, the subclusters of C have a relatively low distance to each other, and their within‐cluster variance is relatively low (see Figure A1 in Appendix A). This indicates that they contain similar signals. Combining these facts, we conclude that this cluster contains ambient noise without any significant activity of transient signals. Cluster A seems to correlate with the monochromatic noise source around 1.5 Hz (see Figures 3c and 4f). To prove that cluster A contains only data with increased activity around 1.5 Hz we depict the occurrence of cluster A and the Fourier amplitude of the three channels filtered between 1.4 and 1.6 Hz as a function of time in Figure 10. In general, an increased amplitude around 1.5 Hz correlates well with the appearance of cluster A. However, not all samples with an increased monochromatic activity fall into cluster A. As with the misidentified events in Figure 7, the problem is related to assigning a single cluster to 20 s of waveform data containing multiple types of signals. It is also interesting to note that subcluster A.1 and A.3 show larger correlation coefficients for the waveforms than for the envelopes (Figure 9a). This characteristic only applies to these two subclusters and is related to the dominance of the monochromatic signal.

Figure 10

Fourier amplitude of all three channels calculated over 10 min windows in the frequency range of 1.4–1.6 Hz together with the activation of the main cluster A.

Fourier amplitude of all three channels calculated over 10 min windows in the frequency range of 1.4–1.6 Hz together with the activation of the main cluster A. Cluster A and C show that the dendrogram representation based on features from the deep scattering spectrum also finds cluster of noise sources without strong correlation of the waveforms or envelopes.

Conclusion

In this study, we proposed a new way of exploring the content of continuous seismograms and identifying different types of signals present in the data. Our approach is based on hierarchical clustering, which offers many cluster solutions with the dendrogram and, thus, delivers a tool for exploring the data. The hierarchical clustering is applied to a low‐dimensional feature space extracted from the deep scattering spectrogram of the continuous seismogram. A primary advantage of the workflow compared to other machine learning algorithms for classifying continuous seismic data is the interpretability at each step and the deep scattering spectrum, which seems to be a promising representation of seismic data for classification purposes. For an application in this study, we chose a 2‐day long three‐component seismogram containing a nearby seismic burst with 148 cataloged events with similar waveforms. These labels served as a sanity check for the algorithm. First, we extracted a cluster solution with four main clusters to get a rough overview of the data. With the cluster size, the temporal detection, and averaged first‐order scattering coefficients, we delivered an interpretation of each cluster and could identify a cluster containing mostly waveforms related to earthquakes. Inside this specific cluster, we found two subclusters containing almost all cataloged events of the seismic burst. While the events of the seismic burst split into two subclusters due to magnitude differences, 77 uncataloged events with similar waveforms were found. The case of the seismic burst shows that we can identify a repeating pattern with slight variations of the waveforms and low SNR in an unbalanced data set. The few misidentified events highlight the multi‐label characteristics of seismograms. Multiple signals can arrive simultaneously and, thus, assigning a single label to a part of the seismogram does not reflect the whole truth. Integrating this issue into clustering seismograms is an interesting aspect for future work. Besides the seismic burst, we also identified signal families with anthropogenic origin and a large cluster containing ambient seismic noise. The different types of signals show that the strategy is able to group signals with correlating waveforms, envelopes or similar frequency characteristics. We want to emphasize here that hierarchical clustering and the dendrogram itself does not deliver meaningful labels for the clusters. Interpreting the different cluster solutions with certain characteristics such as the temporal detection curve is a crucial step towards understanding and revealing the content of the data. Until the point of hierarchical clustering, the proposed workflow is an unsupervised and data‐driven strategy to find groups of similar seismic signals. After that point, we use the output of that strategy to do an interpretation and assign meaningful labels to the retrieved clusters. As most machine learning algorithms, the proposed strategy relies on a few parameters to tune. The hyperparameters of the deep scattering network are mainly physics‐driven and depend on the pre‐defined task. As with Fourier spectrograms, we can control the window size and frequencies of interest. For example, low frequent first‐order wavelet filters might not be necessary for finding groups of anthropogenic signals. Maximum pooling is more interesting than average pooling if the signals of interest have a transient character such as earthquakes. After designing the deep scattering network, the number of components in the independent component analysis is an exploratory task. It is a trade‐off between keeping crucial information and producing a low‐dimensional representation to avoid the curse of dimensionality. In general, the method can be used for various tasks. It is beneficial to get a general overview of an unknown data set. If there is a particular target of interest (e.g., earthquakes, urban noise sources, tremors), we can navigate the dendrogram and focus the analysis on a specific branch. The temporal detection curves of the clusters can be easily correlated with other time series such as GPS displacement or environmental parameters to search for signal classes related to certain physical processes. A specific interesting application would be the North Anatolian Fault, where seismologists assume the presence of non‐volcanic tremors but conventional methods did only deliver null results so far (Bocchini et al., 2021; Pfohl et al., 2015). In contrast to conventional tremor detection algorithm, our approach could identify signals related to tectonic processes without assuming any signal characteristics. In the same sense, the dendrogram can reveal clusters/classes human expert knowledge could not reveal yet and expand the classes of signals we know so far. Moreover, the method can be helpful to extract particular types of noise for performing ambient noise cross‐correlation potentially enhancing the signal quality.

11 in total

1. Independent component analysis: algorithms and applications.

Authors: A Hyvärinen; E Oja
Journal: Neural Netw Date: 2000 May-Jun

2. A scaling law for slow earthquakes.

Authors: Satoshi Ide; Gregory C Beroza; David R Shelly; Takahiko Uchide
Journal: Nature Date: 2007-05-03 Impact factor: 49.962

Review 3. Deep learning.

Authors: Yann LeCun; Yoshua Bengio; Geoffrey Hinton
Journal: Nature Date: 2015-05-28 Impact factor: 49.962

4. Hierarchical clustering schemes.

Authors: S C Johnson
Journal: Psychometrika Date: 1967-09 Impact factor: 2.500

5. Nonvolcanic deep tremor associated with subduction in southwest Japan.

Authors: Kazushige Obara
Journal: Science Date: 2002-05-31 Impact factor: 47.728

6. Machine learning reveals cyclic changes in seismic source spectra in Geysers geothermal field.

Authors: Benjamin K Holtzman; Arthur Paté; John Paisley; Felix Waldhauser; Douglas Repetto
Journal: Sci Adv Date: 2018-05-23 Impact factor: 14.136

7. Probing Slow Earthquakes With Deep Learning.

Authors: Bertrand Rouet-Leduc; Claudia Hulbert; Ian W McBrearty; Paul A Johnson
Journal: Geophys Res Lett Date: 2020-02-24 Impact factor: 4.720

8. Hierarchical Exploration of Continuous Seismograms With Unsupervised Learning.

Authors: René Steinmann; Léonard Seydoux; Éric Beaucé; Michel Campillo
Journal: J Geophys Res Solid Earth Date: 2022-01-06 Impact factor: 4.390

Review 9. SciPy 1.0: fundamental algorithms for scientific computing in Python.

Authors: Pauli Virtanen; Ralf Gommers; Travis E Oliphant; Matt Haberland; Tyler Reddy; David Cournapeau; Evgeni Burovski; Pearu Peterson; Warren Weckesser; Jonathan Bright; Stéfan J van der Walt; Matthew Brett; Joshua Wilson; K Jarrod Millman; Nikolay Mayorov; Andrew R J Nelson; Eric Jones; Robert Kern; Eric Larson; C J Carey; İlhan Polat; Yu Feng; Eric W Moore; Jake VanderPlas; Denis Laxalde; Josef Perktold; Robert Cimrman; Ian Henriksen; E A Quintero; Charles R Harris; Anne M Archibald; Antônio H Ribeiro; Fabian Pedregosa; Paul van Mulbregt
Journal: Nat Methods Date: 2020-02-03 Impact factor: 28.547

10. Clustering earthquake signals and background noises in continuous seismic data with unsupervised deep learning.

Authors: Léonard Seydoux; Randall Balestriero; Piero Poli; Maarten de Hoop; Michel Campillo; Richard Baraniuk
Journal: Nat Commun Date: 2020-08-07 Impact factor: 14.919

3 in total

1. Hierarchical Exploration of Continuous Seismograms With Unsupervised Learning.

Authors: René Steinmann; Léonard Seydoux; Éric Beaucé; Michel Campillo
Journal: J Geophys Res Solid Earth Date: 2022-01-06 Impact factor: 4.390

2. A Design Framework of Exploration, Segmentation, Navigation, and Instruction (ESNI) for the Lifecycle of Intelligent Mobile Agents as a Method for Mapping an Unknown Built Environment.

Authors: Junchi Chu; Xueyun Tang; Xiwei Shen
Journal: Sensors (Basel) Date: 2022-09-01 Impact factor: 3.847

3. AI-Based Unmixing of Medium and Source Signatures From Seismograms: Ground Freezing Patterns.

Authors: René Steinmann; Léonard Seydoux; Michel Campillo
Journal: Geophys Res Lett Date: 2022-08-11 Impact factor: 5.576

3 in total