Literature DB >> 35153592

Wind field reconstruction with adaptive random Fourier features.

Jonas Kiessling^1,2, Emanuel Ström², Raúl Tempone^3,4.

Abstract

We investigate the use of spatial interpolation methods for reconstructing the horizontal near-surface wind field given a sparse set of measurements. In particular, random Fourier features is compared with a set of benchmark methods including kriging and inverse distance weighting. Random Fourier features is a linear model β ( x ) = ∑ k = 1 K β k e i ω k x approximating the velocity field, with randomly sampled frequencies ω k and amplitudes β k trained to minimize a loss function. We include a physically motivated divergence penalty | ∇ ⋅ β ( x ) | 2 , as well as a penalty on the Sobolev norm of β . We derive a bound on the generalization error and a sampling density that minimizes the bound. We then devise an adaptive Metropolis-Hastings algorithm for sampling the frequencies of the optimal distribution. In our experiments, our random Fourier features model outperforms the benchmark models.

Entities: Chemical

Keywords: Metropolis algorithm; flow field estimation; machine learning; random Fourier features; spatial interpolation; wind field reconstruction

Year: 2021 PMID： 35153592 PMCID： PMC8596000 DOI： 10.1098/rspa.2021.0236

Source DB: PubMed Journal: Proc Math Phys Eng Sci ISSN： 1364-5021 Impact factor: 2.704

Introduction

An integral part in wind farm planning and weather prediction is access to high-quality wind data [1,2]. However, meteorological stations are often heterogeneously distributed with many gaps, and interpolation techniques have to be employed in order to increase spatial resolution [3]. For example, a crucial step in building wind farms is site prospecting, in which national or state-level measurements are used to estimate the expected aggregate yearly or monthly energy output [1]. A common approach is to approximate the probabilistic distribution of the wind speed over time with some parametric model, apply spatial interpolation to the parameters, calculate the energy output as a function of wind speed and then take the expected value. The work [4] lists approximately 200 papers written between 1940 and 2008 that focus on parametric models for time series of wind speed measurements. More modern approaches that directly interpolate wind speed also exist [3]. In this work, we focus on interpolation of the near-surface north-south and east-west horizontal wind components for short, 10-min time intervals. This effectively results in a reconstruction of the entire wind vector field, hence the name wind field reconstruction. Due to high spatial variability and dependence on both local and upstream terrain features, interpolation of wind data over hourly or minute-wise intervals is considered a particularly hard problem [3]. However, there are reasons why this can be useful, such as predicting the propagation of forest fires and pollutants, modelling the movement of flying animals and insects [5], or formulating initial conditions for atmospheric simulations and numerical weather prediction models [2,6]. In such applications, spatial interpolation is used to transfer measurements from meteorological stations to a fine grid or mesh. There is relatively little research published on wind field reconstruction by the geological community [3]. Recent studies on machine learning interpolation methods like neural networks, support vector machines and random forest in other geological applications such as air temperatures [7], snow depth [8], mineral concentrations [9] and wind speed [4,10,11] have shown promising results. These types of models trade interpretability for power and flexibility and allow for efficient inclusion of covariates such as elevation, terrain slope, concavity and roughness [7]. This raises the question of whether machine learning-based interpolation works for wind field reconstruction as well. In a 2018 study by Reinhardt & Samimi [3], the established method of regression kriging was shown to reliably outperform some neural networks and support vector machines in horizontal wind field reconstruction. The authors note that the performance of machine learning methods was considerably worse for near surface winds, and argue that the models are unable to capture the complex behaviour and high variability that arises from interactions between wind and terrain. Since the 2018 study by Reinhardt & Samimi, there have been significant advancements in the related field of fluid flow reconstruction. In particular, approaches based on generalizations of principal component decomposition have been trained to interpolate sparse measurements directly to high resolution grids with incredibly high accuracy [12-14]. The difference between these methods and traditional interpolation is that the training data consist of measurement from multiple times. However, the resolution of these methods can only be as high as the training data. High-resolution data are sometimes not available, or might take a considerable time and effort to simulate. In such cases, simpler interpolation models can arguably be a valid alternative. The definition of an interpolation model varies depending on the context. The definition used in our work draws from [9], and is synonymous with regression. In this paper, we compare a novel machine learning method known as random Fourier features with a selection of popular and successful interpolation techniques. The model fits a Fourier series to the data. Instead of traditional greedy optimization methods such as stochastic gradient descent, our random Fourier features model explores the frequency domain using an adaptive Metropolis algorithm inspired by [15]. In each step, the Fourier coefficients are optimized with respect to a loss function. The work [16] proposes a technique for interpolation of incompressible flow by incorporating zero divergence as a constraint in the optimization. We propose an alternative approach where the incompressibility takes the form of a regularization term. The method presented in this report only considers data from a fixed time point and does not model the wind flow through time. The method can be extended to model the flow for instance with the principal component technique explained in [13,17]. The article is structured as follows. In §2, a mathematical formulation of the wind field reconstruction problem is formulated. The data are also presented, along with error estimates used to evaluate the models. Section 3 contains a short introduction to the interpolation models. The results are presented in §4 and finally, discussed in §5.

Problem formulation

Let denote a geographical region. For the purpose of this report, is the set of points contained within the Swedish borders. We define the horizontal wind field that maps every point in space and time to a velocity vector . In mesoscale atmospheric models, a set of physical assumptions involving quantities like the wind field, temperature and pressure are used to pose a set of differential equations that describe the time evolution of the system. Given an initial state at the time and a set of boundary conditions, it is possible to simulate and forecast the wind field using these equations. The spatial interpolation methods considered in this work do not take previous or future times into account. Thus, only a subset of the physical assumptions about the system can effectively be enforced. Some of the presented models will incorporate the assumption of divergence-free flow, which is well established in mesoscale atmospheric models [6]. It can be written as a differential equation , where is the divergence of the wind field . The above notation as well as the notation presented in §2a,b will be used extensively throughout our work.

Data

The data were obtained from the Swedish Meterological and Hydrological Institute. It contains a set of wind observations during the entirety of 2018, from weather stations scattered across Sweden as shown in figure 1. Each measurement is collected 10 m above ground. The positions of the stations are given by the elevation—the height above sea level—as well as the latitude and longitude. The measurements are 10-min averages, collected once every hour. They consist of two components: the wind speed, which is the magnitude of the horizontal component of the wind vector, and the wind direction, which is the angle from which the wind comes, measured clockwise relative to north ( is east, is south and so on). Both measurements are rounded to varying degrees of precision depending on the station [18]. The stations are not active at all times. In fact, there are occasional hours with as few as one station reporting measurements. The wind measurements are highly autocorrelated over time, as demonstrated by the autocorrelation plots of the south-north and east-west wind components in figures 2 and 3. The data were processed before training. First, the latitude–longitude pairs were transformed to Cartesian coordinates where is the eastward-measured distance and is the northward measured distance, as shown in figure 1. This was done using the SWEREF 99 TM[1] map projection. Second, the wind measurements were transformed from polar to Cartesian coordinates forming a horizontal wind vector, where corresponds to the velocity component along the -axis and corresponds to the velocity component along the -axis. Lastly, all wind measurements from the month of September were removed. The measurements exhibited a root mean square wind speed of as opposed to for the remaining months, meaning that the error metrics were disproportionally affected by September. Furthermore, the wind field exhibited low spatial variability during this period. These highly uniform, low variance fields are not the main interest of the study. The removal of the September data did not affect the method comparison significantly.

Figure 1

Figure 2

Autocorrelation of the east-west component of the wind for the available weather stations, up to a lag of 300 hours. Each red line represents the autocorrelation of one weather station. (Online version in colour.)

Figure 3

Autocorrelation of the north-south component of the wind for the available weather stations, up to a lag of 300 hours. Each red line represents the autocorrelation of one weather station. (Online version in colour.)

The SWEREF 99 TM orthographic projection of Sweden. The weather stations are marked with circles, coloured according to their elevation . The coordinate system is drawn in the bottom left. The centre of the map is at , . (Online version in colour.) Autocorrelation of the east-west component of the wind for the available weather stations, up to a lag of 300 hours. Each red line represents the autocorrelation of one weather station. (Online version in colour.) Autocorrelation of the north-south component of the wind for the available weather stations, up to a lag of 300 hours. Each red line represents the autocorrelation of one weather station. (Online version in colour.)

Spatial interpolation models

For the purpose of this report, the definition of interpolation models is restricted to approximations of functions . The range is two-dimensional since we are modelling the horizontal wind vector field, and the region is Sweden. In general, interpolation models are also used for other properties like temperature, pressure and population density, etc. Nevertheless, we define a spatial interpolation model as a map from a set of velocity measurements to a vector field . The process of evaluating a model on is called training. Usually, the training is done by minimizing a loss function. The above notation will be used throughout our report. Note that there is an important distinction between and . The function represents a model, which is trained on data and produces a vector field that approximates . Any time a symbol is indexed by the letter or variations of it such as , it symbolizes an interpolation model trained on that particular dataset. A parametrized set of interpolation models of interpolation models is called an interpolation family, and the parameters are called hyper parameters. For example, the -nearest neighbours models can be viewed as an interpolation family with hyper parameter . The process of finding the best model in an interpolation family with respect to some quality of fit is called hyper optimization.

Quality of fit

Given an interpolation model and observations at different times in a time span , the quality of fit is defined as the expected square loss with respect to the distribution of the data in space and time : We only have access to a limited set of data, both in space and time. Hence, calculating the exact value of is not possible. Furthermore, using all available data to compute is too computationally expensive. Instead, we use a Monte Carlo sampling average in time and a cross-validation scheme in space. Let be all times in for which measurements exist, that is every full hour for the entirety of 2018, excluding the entirety of September. Next, we create a set of indices by sampling randomly from the integers 1 to with replacement. Then, define a set of time samples and let be the set of observations made at the time . As stated in §2a, the weather stations are not always active, so the number of measurements at the time varies. Define a random partition of the weather stations into disjoint sets of equal size , and denote . The fixed time expectations of the quality of fit at the sampled times are approximated using cross-validation: The above formula can be summarized by the following procedure. For every split , train the model on the data and calculate the square errors on the remaining points . Then, take the mean of all squared errors. We found that fivefold cross-validation () struck a good balance between computation time and accuracy. Averaging over the samples in yields a Monte Carlo sample estimate of : also called the variance unexplained. An alternative measure is obtained from normalizing by the expected square wind speed , denoted : This measurement is referred to as the fraction of variance unexplained. If the wind field has zero mean, the number is referred to as , or the coefficient of determination. We estimate the variance of as In §4, we show that the variance unexplained approximately follows a normal distribution. Consequently, two standard deviations of constitute an approximate 95% confidence interval for . Note that the confidence bound for is not sufficient when comparing models, since the errors can be highly correlated. Given two models we instead define the difference and estimate it as . The variance of is estimated similarly to (2.5): The sample size was adjusted to obtain 95% confidence intervals within 1% of the mean square wind speed on the 2018 wind dataset, for both the variance unexplained and difference in variance unexplained . Due to high autocorrelation in the wind demonstrated in figure 2 as well as the presence of seasonality and trends, these confidence intervals do not generalize to longer time intervals than 2018. The quality of fit was used for both hyper parameter optimization and validation of the models, but with different data samples to avoid overfitting.

Models

Fourier models

In this section, we introduce two Fourier series-based models. At the end of the section, we arrive at the random Fourier features model, which is the main focus of our report. In §3b, we present some well established and high-performing interpolation models. We use these as benchmark models when assessing the interpolation quality of the random Fourier features.

Fourier series

The Fourier series-based model takes the form where is the number of terms, denotes the scalar product between and and denotes the real part of the complex-valued vector . The Fourier series generalizes to arbitrary dimensions of the input , but in this report we chose to be just the horizontal coordinates. With slight abuse of notation, we let and denote two-dimensional vectors in and respectively. Furthermore, and . In the Fourier series-based model, is held fixed, and the parameters are estimated by optimizing with respect to the expectation of a loss function : where the expectation is taken over the joint density of the data . For our application, is unknown. Therefore, the expected loss is replaced by a Monte Carlo sample estimate of the loss, called the empirical loss. Given a dataset , the empirical loss is expressed as For this work, we chose the following loss function: The expression denotes the second-order squared Sobolev-norm of [19]: where is a hyper parameter. By orthogonality of the Fourier features, the Sobolev norm can be simplified to The hyper parameter was added to allow for more flexibility in the penalty, and is equivalent to rescaling the input variable before applying the Sobolev norm. The expression is the squared -norm of the divergence of , which can be rewritten as The intended effect of the Sobolev norm is to dampen high frequencies, and the divergence penalty is supposed to simulate incompressible flow. The empirical loss using the loss function as defined in (3.4) is Here, and are hyper parameters. In order to determine a suitable choice for , assume the standard regression setting , where is zero mean and independent of . Since the spatial region is bounded, can be extended periodically over . That is, the relation is imposed on for all , whole numbers and some two-dimensional period . This means that can be written as a Fourier series. Thus, can in theory be approximated arbitrarily well by as the number of terms tends to infinity. However, since there is only a limited amount of data and contains a limited number of terms, the choice of determines how well can approximate . In the Fourier series-based model, we settle on choosing as a square grid with side length , centred at the origin in the frequency domain: As such, the Fourier series-based model is a spatial interpolation family with the hyper parameters , , and . We found struck a balance between computational cost and accuracy. The remaining hyper parameters , , and were chosen as explained in the next section. Minimizing the expression in (3.8) amounts to solving a system of linear equations with respect to the elements .

Random Fourier features

Instead of optimizing with respect to , random Fourier features aims at solving the harder problem of also optimizing with respect to the Fourier frequencies : The random Fourier features is an example of a neural network with one hidden layer and trigonometric activation function [15]. Here, are the weights connecting the inputs to the hidden layer, and are the weights connecting the nodes in the hidden layer to the output layer. What distinguishes random Fourier features is the training algorithm. The training of neural networks is traditionally done using some greedy method such as stochastic gradient descent, but in random Fourier features, the frequencies are assumed to be randomly sampled according to some distribution . The task of optimizing frequencies is thus changed to optimizing . Note that the distribution can be deterministic, so no approximation error is introduced by this. Next, the optimization of is moved inside the frequency expectation This reduces the inner minimization problem to linear regression with Fourier features. Using the quadratic loss function defined in (3.4), the coefficients can be found using a standard matrix inversion. The key approximation of random Fourier features is to assume that the elements of are independent and identically distributed according to a density . That is, . The optimal is approximated by finding an analytical minimizer to an upper bound of (3.10). We assume the same standard regression setting as presented in the previous section, where is zero mean and independent of and a periodic function. Referring to proposition A.1 found in the appendix, we can then put an upper bound on the expected loss, where and are positive constants determined by the Sobolev and divergence loss functions, respectively, and is a random variable distributed according to , representing an arbitrary element of . The proof assumes that the distribution is discrete and has bounded moments up to a degree determined by the order of the derivatives used in the regularization. Furthermore, we show that this upper bound is minimized by choosing , where are the Fourier coefficients for the Fourier series expansion of . We make two important comments about the limitations of this method. First, note that we are only minimizing the bound of the iterated expectation in (3.12). Second, the iterated expectation is also only a bound for the true loss function (3.11). Despite these limitations, random Fourier features has in some examples been shown to outperform traditional methods of optimizing (3.10) like stochastic gradient descent, possibly due to its ability to efficiently learn high-frequency details early on in the training [15]. Since the target distribution is not known a priori, this distribution has to be approximated somehow. The authors of [15] present an adaptive Metropolis algorithm for sampling from in a related setting. The main differences are that is integrable on the entirety of and has a one-dimensional range. Drawing from the work in [15], we devise an adaptive Metropolis algorithm for sampling the frequencies from . This algorithm is described in detail below (algorithm 1). The hyper parameters of the random Fourier features consist of the Fourier frequencies , the periodicity , the regularization parameters and —these are also found in the Fourier series model—as well three new parameters that appear in the Metropolis sampling algorithm: the number of steps , the standard deviation of the random walk and a parameter that adjusts the acceptance probability of each step. Similarly to the work [16], the period was chosen equal in both cardinal directions, about two times the north-south length of the region of interest: . The regularization parameter was set to in order to prevent the second derivative from dominating the Sobolev regularization term. The number of frequencies was fixed to and the number of steps was fixed to . Given a -dimensional feature space and a Gaussian target distribution, [15] show that for a fixed computational cost, the upper bound (3.12) as is approximately minimized by . Furthermore, the classical result from [20] shows that the optimal variance of the proposal kernel for a general random walk Metropolis–Hastings algorithm is . The regularization parameters and were obtained by running a grid search to minimize the variance unexplained in (2.4), while keeping , , and fixed as specified above. Since the values for and are only optimal in the limit, a second grid search was done in a neighbourhood of the limit values for and , while keeping , , and fixed. The number of steps was adjusted to ensure convergence of the Metropolis algorithm to a stationary distribution, while also keeping the computation time low. Note that the random Fourier features algorithm is not guaranteed to find the optimal frequencies. There is ongoing research in this area, and an iterative greedy method is explored in [17].

Benchmarking models

The following interpolation models were used for benchmarking: nearest neighbours, inverse distance weighting (IDW), kriging, random forest, neural networks and a Fourier series-based model introduced in the previous section. This section will serve as a short introduction to each of the methods as well as motivation as to why they are relevant. We use the same notation for the measurements as in §2c.

IDW

IDW methods is an interpolation family of methods that are evaluated at a point by taking a component-wise weighted average of the horizontal wind vector data in . Specifically, the weights for a model from the IDW family are proportional to where is a distance on . That is, The singularities at are removed by setting . The hyper parameter adjusts the amount of influence each data point has over its immediate surroundings. Letting will result in all points weighing equally everywhere, i.e. taking the arithmetic mean of the data. Letting will result in the nearest neighbour method. Usually, is chosen somewhere in between. A common shortcoming of IDW is that the interpolated values are bounded by the maximum and minimum values of the data and therefore IDW fails to predict unobserved extreme points. The main benefits are interpretability and relatively short training time. In this report, two values of were tested, namely and (i.e. nearest neighbours). Furthermore, we used the horizontal distance between points, ignoring elevation.

Kriging

Kriging is a statistical approach to spatial interpolation [21]. The true velocity is assumed to satisfy the equality where is a deterministic function and is a constant-mean, stochastic process over . Although it is possible to model correlations between the vector components of , we treat the two as independent, and run separate kriging pipelines for each component. For the remainder of this subsection, we therefore assume that the residual is in . The key idea in kriging is to endow the residuals with a specific translation-invariant structure. Namely, it is assumed that the variance of the difference between two arbitrary residuals measured at points and depends only on the distance through a function called the variogram: The process of training a kriging model consists of two steps. First, a deterministic model is used to estimate the mean from the data, resulting in some trained model . For each , the residuals are estimated as . The second training step then consists of fitting a variogram to the residuals . Evaluating the model on a specific point can then be formulated as solving a minimization problem involving the variogram. Namely, kriging seeks a linear combination such that the weights minimize the expected square error . In order to get an unbiased estimate, the minimization is often subjected to a constraint . Solving the constrained minimization problem results in a linear system of equations and where is the Lagrange multiplier, an artificial variable that is brought in to enforce the constraint (3.17). Solving this system of equation produces a vector of weights , which can be used to provide the final estimation of : A key strength of kriging is that the statistical model allows for point-wise error estimation. Furthermore, kriging can be combined with virtually any other unbiased deterministic interpolation model by interpreting it as the mean . However, when the statistical assumptions do not hold, kriging can perform poorly. Machine learning methods such as random forests have been successful in beating kriging for various spatial interpolation tasks, see for example [7,9]. The authors in [9] argue that even though kriging might be redundant in terms of accuracy, it remains a valuable tool for understanding data, exactly because of its statistical properties and interpretability. We used a version of kriging called Universal kriging, which differs from kriging in that a linear regression approximation of the mean and the residuals are fitted to the data simultaneously, resulting in a joint system of equations for all the weights. The Python package pykrige was used to implement Universal kriging with a linear variogram and a piece-wise linear mean.

Random forest

Random forests are constructed by averaging an ensemble of regression trees. A regression tree is a function that recursively splits the domain of interest into smaller domains, based on a user-specified criteria. Each domain is then assigned a constant value by minimizing some objective function on the training data [22]. In a random forest, each tree is trained on a random sample of the data and each split in the tree is chosen by randomly selecting one feature out of the input features and picking a split that minimizes the mean square error [23]. Random forests have been used successfully in spatial interpolation problems, for example to predict temperatures on and around Kilimanjaro, Tanzania [7] as well as mineral concentrations [9]. The main drawback of random forests is lack of interpretability. In this report, we used a random forest with trees with mean square loss for splitting, and unlimited tree depth. The forest was implemented in Python using the package scikit-learn. Furthermore, the random forest was trained on a polynomial feature map of the horizontal coordinates and the elevation . That is, the trained model is a composition of the random forest and the feature map , which is fixed. The feature map increases expressivity of the random forest, which can improve performance if the data are not too noisy or sparse. There are many possible ways to construct feature maps. We found that a combination of polynomial features with a total order of struck a good balance between expressivity and generalizability.

Feedforward neural network

Feedforward neural networks have been used extensively in different areas of applied mathematics. To read more about neural networks, see for example [24]. In this report, only a specific family of feedforward neural networks was considered. Namely, the networks are characterized by three fully connected hidden layers, a constant number of nodes in each layer and the ReLU activation function. The input layer consists of the three spatial coordinates and . The weights were optimized using the Adam algorithm [25], with respect to the -regularized loss. The network was implemented in TensorFlow, and hyper parameter optimization of the number of nodes and regularization parameter was done with a grid search on the variance unexplained.

Weighted linear combination of model

Given a set of interpolation models and a dataset , we can improve on the individual models by forming a weighted average where are the weights. The weights are regarded as hyper parameters. If the number of models is not too high, there is little risk of overfitting, and the hyper parameters can simply be directly fitted to minimize the quality of fit. In §4, we use this method to combine the random forest and random Fourier features models.

Results

We begin the results section by establishing a suitable choice for the number of time samples , as discussed in §2c. We are looking to satisfy two main conditions. First, needs to be approximately normally distributed. As seen from figure 4, the central limit theorem seems to hold for estimates of with sample sizes . Using normality of and means that two standard deviations of and correspond to 95% confidence intervals for and , respectively. Second, needs to be sufficiently large for the error to be reasonably small. We chose . Tables 1 and 2 show that this sample size achieves the goal of relative error with 95% confidence for both and that we formulated in §2c. The only exception is the nearest neighbours model, for which the confidence interval for the error estimate is 2%. This is still sufficient accuracy for our purpose.

Figure 4

A histogram of 5000 samples of the variance unexplained with , bootstrapped from a total of 500 samples of . The model is the Fourier series, and the black line is a fitted normal distribution.

Table 1

Quality of fit measurements and with 95% confidence intervals for a number of different models. The dimension is indicated at the top row.

interpolation model	E~ [1]	Q~ [m2 s−2]
nearest neighbours	0.628±0.022	11.145±0.386
inverse distance weighting	0.407±0.014	7.220±0.250
Universal kriging	0.388±0.013	6.887±0.235
random forest (RF)	0.386±0.013	6.862±0.238
neural network	0.381±0.013	6.762±0.225
Fourier series	0.380±0.013	6.740±0.226
random Fourier features (FF)	0.370±0.012	6.569±0.220
FF and RF average	0.357±0.012	6.333±0.212

Table 2

Difference in quality of fit and with 95% confidence intervals for the benchmark models relative to the random Fourier features model (FF). The dimension is indicated at the top row, in square brackets. A positive number means that the given model is worse than random Fourier features.

interpolation model	ΔE~ [1]	ΔQ~ [m2 s−2]
nearest neighbours	−0.258±0.010	−4.576±0.183
inverse distance weighting	−0.037±0.003	−0.651±0.049
Universal kriging	−0.018±0.002	−0.318±0.033
random forest	−0.017±0.003	−0.293±0.060
neural network	−0.011±0.003	−0.192±0.056
Fourier series	−0.010±0.001	−0.171±0.017
FF and RF average	−0.013±0.002	−0.236±0.032

A histogram of 5000 samples of the variance unexplained with , bootstrapped from a total of 500 samples of . The model is the Fourier series, and the black line is a fitted normal distribution. Quality of fit measurements and with 95% confidence intervals for a number of different models. The dimension is indicated at the top row. Difference in quality of fit and with 95% confidence intervals for the benchmark models relative to the random Fourier features model (FF). The dimension is indicated at the top row, in square brackets. A positive number means that the given model is worse than random Fourier features. The quality of fit measurements reported in table 1 were all obtained using this sample size, and the reported uncertainty corresponds to two standard deviations, estimated according to (2.5). As the table shows, the confidence bounds vary slightly depending on the model. We observe that the distribution is narrower for better performing models. The same sample size was also used for the differences between the quality of fit of the benchmarking models and the random Fourier features model listed in table 2. Figure 5 shows examples of the reconstructed wind field for the Fourier series, random Fourier features and Universal kriging interpolation models.[2] Figure 6 shows hourly means of the variance explained for different seasons throughout 2018.

Figure 5

Figure 6

Hourly variance unexplained at different seasons, for a selection of the investigated interpolation models. The remaining models were left out to prevent clutter. The measurement is aggregated over all stations and plotted on a log-scale to improve visibility. Note that September is not in this data. The unexplained variance of the zero model corresponds to the mean square wind speed. (Online version in colour.)

Reconstructions of the near-surface wind field in Sweden at 10.00, 22 January 2018. The blue streamlines show the reconstructed wind field, the green arrows show the wind velocity measurements at stations used for training the reconstruction, and the red hollow arrows are unseen test data points. From left to right, the fraction of variance unexplained for the above methods at this specific time is 0.57, 0.55 and 0.70. These values were measured on the test data points. (Online version in colour.) Hourly variance unexplained at different seasons, for a selection of the investigated interpolation models. The remaining models were left out to prevent clutter. The measurement is aggregated over all stations and plotted on a log-scale to improve visibility. Note that September is not in this data. The unexplained variance of the zero model corresponds to the mean square wind speed. (Online version in colour.) The hyper parameter grid searches for the random Fourier features model are shown in figures 7 and 8. The optimal hyper parameters for the random Fourier features model were for the Sobolev regularization constant , for the divergence penalty , for the exponent and for the step size in the proposal kernel in the adaptive Metropolis algorithm (algorithm 1). Running the Metropolis algorithm for steps struck a good balance between convergence and computation time. Figure 9 shows an example of the frequency distribution that the algorithm samples from, and figure 10 shows the magnitude of the Fourier coefficients for the Fourier series model as a comparison.

Figure 7

Fraction of variance unexplained (2.4) in the random Fourier features method, as a function of the Sobolev penalty and divergence penalty explained in (3.4), with , , . (Online version in colour.)

Figure 8

Fraction of variance unexplained (2.4) as a function of the exponent and step size in the adaptive Metropolis algorithm, with , , . (Online version in colour.)

Figure 9.

Sampling density of the random Fourier features algorithm for measurement data from 10.00, 22 January 2018. Algorithm (1) was run with , collecting the Fourier frequencies at each step into a histogram. The profile plots show marginal distributions for the latitude frequencies (top) and longitude frequencies (right). (Online version in colour.)

Figure 10.

Heat map showing the magnitudes of the Fourier terms as a function of the frequencies in the Fourier series model, for measurement data from 10.00, 22 January 2018. The support is a square grid with width 41, centred at the origin. The profile plots are analogous to figure 9. (Online version in colour.)

Fraction of variance unexplained (2.4) in the random Fourier features method, as a function of the Sobolev penalty and divergence penalty explained in (3.4), with , , . (Online version in colour.) Fraction of variance unexplained (2.4) as a function of the exponent and step size in the adaptive Metropolis algorithm, with , , . (Online version in colour.) Sampling density of the random Fourier features algorithm for measurement data from 10.00, 22 January 2018. Algorithm (1) was run with , collecting the Fourier frequencies at each step into a histogram. The profile plots show marginal distributions for the latitude frequencies (top) and longitude frequencies (right). (Online version in colour.) Heat map showing the magnitudes of the Fourier terms as a function of the frequencies in the Fourier series model, for measurement data from 10.00, 22 January 2018. The support is a square grid with width 41, centred at the origin. The profile plots are analogous to figure 9. (Online version in colour.) We omit the hyper optimization results for the benchmarking models, which were all performed using the same type of grid search methods. The weights for the linear combination of the random forest and random Fourier features were chosen to minimize loss on the training data. We obtained a weight of for the random forest, and for random Fourier features.

Discussion

Table 1 shows that the model consisting of an average between the random forest and random Fourier features model performed the best out of the tested models. Table 2, consisting of the difference between quality of fit of the random Fourier features and the remaining models, indicates that this ordering is statistically significant, since none of the confidence intervals overlap with zero. In particular, the transition from the fixed frequencies in the Fourier series-based model to the randomly sampled frequencies of the random Fourier features results in a significant improvement. The enlarged area in figure 5 shows that the reconstructed wind field is better aligned with unseen data when using random Fourier features. The numerical experiments in [15] indicate that the random Fourier features model outperforms its reference models as the amount of data increases. The improved performance is likely because the high-frequency details can be captured by exploring remote parts of the frequency domain, whereas a fixed grid of frequencies centred at the origin cannot. Figure 9 shows a noticeable mass outside the white square, compared with figure 10, in which the high frequency contributions are truncated. The success of the random forest and random Fourier features average hints that there might be more potential for improving the results using similar types of mixture models. It is also clear that the random Fourier features model on its own is competitive in comparison with the benchmarking models. A common argument for favouring statistical models such as kriging is easy access to precise error analysis, given that the prerequisite assumptions discussed in §3b(ii) hold. Whether or not this is worth trading in exchange for higher accuracy depends on the situation. The random Fourier features model has the upside of being easy to manipulate once it has been trained. It can be efficiently evaluated, differentiated and integrated for analysis of physical quantities such as divergence, vorticity and energy. In figure 6, the relative performance of the interpolation models is visualized at different times of day and times of the year. The relative performance of the models is consistent throughout. A source of difficulty in wind field reconstruction is near-surface turbulence. In general, the interpolation error is lower during night-time. For the spring and summer months, this can be explained by the observed decrease in wind speeds during night-time. For autumn and winter, however, wind speeds are consistently high. The atmosphere tends to be more stable at night-time during the winter months [26], which might explain this decrease in error. The autocorrelation plots in figures 2 and 3 demonstrate a strong autocorrelation in the wind field. Furthermore, figure 6 shows that the wind exhibits seasonal shifts in strength, as well as daily variations. This behaviour is well documented [26]. The models used in [12-14] are trained on multiple time samples whereas the spatial interpolation models used in our work only use the time aspect for hyper parameter optimization. Therefore, extending the spatial interpolation models to take time into account could improve the results. Note that since the data are only collected throughout 1 year, it is not possible to see the effect of inter-annual variability on the modelling error. More careful preprocessing of the data that account for these effects is needed to establish which model is optimal over a longer time interval. As seen in figure 7, there is little to no change in the quality of fit when setting the divergence penalty to zero. Albeit a compelling idea, it makes sense that penalizing the divergence on a two-dimensional slice of the wind field would not improve the results. The near-surface wind flow is not typically parallel with the ground due to thermally and mechanically induced turbulence in the atmospheric boundary layer [27]. If we were to transition from a two-dimensional to a three-dimensional setting by incorporating elevation as a feature in the random Fourier features, divergence penalization would likely be more useful. More advanced methods can possibly incorporate no-slip or slipping boundary conditions on the ground surface, see [16]. Lastly, we discuss some general areas of improvement that are well documented and known in the environmental sciences community. First, the results may be improved by incorporating weather stations from nearby regions such as the Baltic Sea, Finland, Norway and Denmark. The random Fourier features model is constructed to efficiently find high frequency details in the target function, which makes it suitable when increasing the resolution of the data. The inclusion of additional covariates could also improve the results. Reinhardt & Samimi [3] include geopotential, potential vorticity, relative humidity, vertical velocity and elevation, and note that these covariates improve the results. Furthermore, orographic features like terrain roughness and topographic sheltering are known to affect the wind characteristics in Sweden [26].

Conclusion

In this report, we explored the potential for wind field reconstruction with sparse data using interpolation models. We investigated whether a novel interpolation model called random Fourier features model is competitive with respect to popular statistical interpolation models such as kriging, as well as modern machine learning methods such as random forests and neural networks. Drawing from the work [15], we derived an upper bound for the mean square error of the random Fourier Frequencies model, found a density that minimized this bound and devised an adaptive Metropolis algorithm for sampling from this density. We showed that random Fourier features is competitive with respect to a time-space average of the square error and suggested some future areas of research, such as extending the model to incorporate data over multiple times and including more terrain-specific features. The key takeaway of our work is an improved spatial interpolation model for wind field reconstruction that is able to capture highly irregular near-surface wind features. Click here for additional data file.

3 in total