
Empirical coverage of model-based variance estimators for remote sensing assisted estimation of stand-level timber volume.

Johannes Breidenbach¹, Ronald E McRoberts², Rasmus Astrup¹.

Abstract

Due to the availability of good and reasonably priced auxiliary data, the use of model-based regression-synthetic estimators for small area estimation is popular in operational settings. Examples are forest management inventories, where a linking model is used in combination with airborne laser scanning data to estimate stand-level forest parameters where no or too few observations are collected within the stand. This paper focuses on different approaches to estimating the variances of those estimates. We compared a variance estimator which is based on the estimation of superpopulation parameters with variance estimators which are based on predictions of finite population values. One of the latter variance estimators considered the spatial autocorrelation of the residuals whereas the other one did not. The estimators were applied using stand-level timber volume as the variable of interest and photogrammetric image matching data as auxiliary information. Norwegian National Forest Inventory (NFI) data were used for model calibration, and independent data clustered within stands were used for validation. The empirical coverage proportion (ECP) of confidence intervals (CIs) based on the variance estimators that rely on predictions of finite population values was considerably higher than the ECP of the CI based on the variance estimator that relies on the estimation of superpopulation parameters. The ECP increased further when the spatial autocorrelation of the residuals was considered. The study also explores the link between confidence intervals based on variance estimates and the well-known confidence and prediction intervals of regression models.


Keywords: Forest inventory; Image matching; Model-based inference; Synthetic estimator; Variance estimation

Year: 2016 PMID: 28148972 PMCID: PMC5268351 DOI: 10.1016/j.rse.2015.07.026

Source DB: PubMed Journal: Remote Sens Environ ISSN: 0034-4257 Impact factor: 10.164

Introduction

The use of airborne laser scanning (ALS) data in operational forest management inventories (FMI) has a long tradition in the Nordic countries (Maltamo and Packalen, 2014, Næsset, 2014). Usually, the area-based approach (ABA) is adopted, where the study area is gridded into small cells for which height and density metrics are calculated from ALS data (Næsset, 1997, Næsset, 2014). A model linking the variable of interest, such as timber volume, to the ALS metrics is estimated using field sample plots for which both the variable of interest and the ALS data are available. The linking model is then applied to the grid cells to map timber volume. A main product of an FMI is a map of mean stand-level timber volume, where the mapped value for each stand is the mean of the timber volume predictions for the grid cells whose centers lie in the stand. Although the ABA was developed for ALS data, it is well suited for use with other remote sensing data that provide high-resolution height information. For example, photogrammetric image matching data are increasingly popular for estimating forest parameters with the ABA, because high-quality digital terrain models are increasingly available and hardware and software have improved (e.g., Bohlin et al., 2012, Breidenbach and Astrup, 2012, Vastaranta et al., 2013). In terms of survey sampling, the ABA is a form of small area estimation (SAE), since the stands are so small or so remote that few if any sample plots are located within them. From the perspective of SAE, the ALS metrics are auxiliary data, and aggregating the grid-cell predictions at stand level yields a synthetic estimate for a small area (Rao, 2003, p. 46). This estimate is termed synthetic because only model predictions are used, with no correction for model prediction errors.
While design-based estimators are generally preferred in forest inventories when enough field observations are available, because they are asymptotically unbiased, synthetic estimators are generally model-based (Chambers & Clark, 2012, p. 169). The basic difference between model-based and design-based inference is the source of randomness (Kangas, 2006). In design-based inference, randomness is introduced by sample selection and the observations are assumed to be fixed values; in model-based inference, the observations are assumed to be a random realization of a joint distribution known as the superpopulation. One consequence of these different assumptions is that probability samples are not necessary for model-based estimators. For an introduction to model-based inference and a comparison with design-based inference, see Gregoire (1998). Although studies of small area estimation using remotely sensed data are plentiful, the uncertainty of the estimates is often ignored in management applications. While the number of SAE studies in forestry that include inference is increasing (e.g., Breidenbach and Astrup, 2012, Goerndt et al., 2013, Lappi, 2001, Magnussen et al., 2014, Steinmann et al., 2013), the number of studies that provide methods for stand-level inference using synthetic estimators is small (e.g., Kangas, 1996, Mandallaz, 2013, McRoberts, 2006). In this study, we focus on synthetic estimation, which is relevant for small areas that frequently contain no or too few observations to apply other estimators. The context of the study is model-based inference, which assumes that an entire distribution of observations is possible for each population unit. In this context, prediction of an individual observation (a finite population value) is distinguished from estimation of the expected value of the distribution of observations (a superpopulation parameter) (Kangas, 2006, p. 40).
Although the prediction of an observation and the estimate of its expected value are the same for the models relevant in our context, the variance estimates may be quite different. This paper focuses on different approaches to estimating these variances. A variance estimator based on the estimation of superpopulation parameters only considers the variance resulting from the estimation of the model parameters and is therefore independent of stand size (e.g., Kangas, 1996, Mandallaz, 1991, McRoberts et al., 2014). Kangas (2006) and McRoberts (2006) described a variance estimator that is based on predictions of finite population values rather than on estimates of superpopulation parameters; the estimated variance therefore depends on stand size. Kangas (2006) described the basic form of this variance estimator in a general setting, not specifically for SAE. McRoberts (2006) extended the variance estimator to account for spatial autocorrelation and applied it to the binary variable forest/non-forest. We modify the variance estimator described by McRoberts (2006) for application to a continuous response variable, accommodating heteroskedasticity and spatial autocorrelation. The aim of this study is to compare a variance estimator based on the estimation of superpopulation parameters with variance estimators based on predictions of finite population values in the context of synthetic estimation. Furthermore, we link the variance estimators to the concepts of prediction intervals and confidence intervals well known from regression analysis. In a case study, we use Norwegian National Forest Inventory (NFI) data to estimate stand-level mean timber volume. Photogrammetric image matching data processed using the ABA serve as auxiliary information. To compare the estimators, the empirical coverage proportions (ECP) of the confidence intervals based on the different variance estimators are obtained using independent validation data.

Methods

Estimators

A linking model describes the statistical relation between the response (variable of interest), denoted y, and the auxiliary variables x, which in this case are obtained from remotely sensed data:

$$ y_i = f(\mathbf{x}_i, \boldsymbol{\beta}) + \varepsilon_i, \qquad \operatorname{Var}(\boldsymbol{\varepsilon}) = \sigma^2 \mathbf{W} \qquad (1) $$

where i indexes observations, $\mathbf{X} = (\mathbf{x}_1, \cdots, \mathbf{x}_n)'$ with $\mathbf{x}_i = (1, x_{i1}, \cdots, x_{ip})'$ is an n × (p + 1) design matrix, p is the number of auxiliary (explanatory) variables, $\boldsymbol{\beta} = (\beta_0, \beta_1, \cdots, \beta_p)'$ is a vector of model parameters to be estimated, and $\varepsilon_i$ is a residual. The residual variance is expressed as the product of σ2 and an n × n matrix $\mathbf{W}$, where σ2 is the mean square residual. In the case of homogeneous variances, $\mathbf{W}$ is an identity matrix ($w_{ii} = 1$). In the case of heteroskedasticity, the diagonal elements $w_{ii}$ contain appropriate weights that result from a variance model. In the case of autocorrelation, the off-diagonal elements $w_{ij}$ also contain appropriate weights that result from a model describing the correlation pattern among the residuals. To simplify the following estimators, we assume a linear model, $f(\mathbf{x}_i, \boldsymbol{\beta}) = \mathbf{x}_i'\boldsymbol{\beta}$. While we assume that the auxiliary variables are available wall-to-wall in the areas of interest, the response is only observed for a sample of the population. In forest inventories, the response is typically observed at n sample plots systematically distributed over the landscape, with distances between plots in the range 100–1000 m. In general, synthetic estimators are a group of estimators for small areas that are based on a population-level model, assuming that the characteristics of the large area hold for the small areas (Gonzalez, 1973, NCHS, 1968). This means that differences in the estimates for different areas are explained by differences in the auxiliary variables rather than by differences in the relationships between the response and the auxiliary variables (Särndal, Swensson, & Wretman, 1992, p. 411). Synthetic estimators are potentially biased, but the bias can be small if the linking model holds in the small area.
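As a minimal sketch of fitting a weighted version of linking model (1), the following Python function (standard library only; the name `wls_simple` is hypothetical and this is not the authors' R implementation) fits a single-predictor linear model by weighted least squares. It assumes $\mathbf{W}$ is diagonal with known weights $w_{ii}$, i.e., heteroskedasticity but no autocorrelation, and returns the coefficients, the mean square residual $\hat{\sigma}^2$, and the parameter covariance $\hat{\sigma}^2(\mathbf{X}'\mathbf{W}^{-1}\mathbf{X})^{-1}$.

```python
def wls_simple(x, y, w):
    """Weighted least squares for y_i = b0 + b1 * x_i + e_i, Var(e_i) = sigma2 * w_i.

    Assumes W is diagonal (heteroskedastic, no autocorrelation), so each
    observation is weighted by 1 / w_i. Returns (b0, b1, sigma2, cov), where
    sigma2 is the weighted mean square residual and cov is the 2 x 2 matrix
    sigma2 * (X' W^-1 X)^-1 for the parameter vector (b0, b1).
    """
    v = [1.0 / wi for wi in w]  # diagonal of W^-1
    # Elements of X' W^-1 X and X' W^-1 y for design rows (1, x_i)
    s0 = sum(v)
    s1 = sum(vi * xi for vi, xi in zip(v, x))
    s2 = sum(vi * xi * xi for vi, xi in zip(v, x))
    t0 = sum(vi * yi for vi, yi in zip(v, y))
    t1 = sum(vi * xi * yi for vi, xi, yi in zip(v, x, y))
    det = s0 * s2 - s1 * s1
    # Solve the 2 x 2 normal equations in closed form
    b0 = (s2 * t0 - s1 * t1) / det
    b1 = (s0 * t1 - s1 * t0) / det
    n_par = 2
    rss = sum(vi * (yi - b0 - b1 * xi) ** 2 for vi, xi, yi in zip(v, x, y))
    sigma2 = rss / (len(x) - n_par)
    inv = [[s2 / det, -s1 / det], [-s1 / det, s0 / det]]  # (X' W^-1 X)^-1
    cov = [[sigma2 * inv[r][c] for c in range(2)] for r in range(2)]
    return b0, b1, sigma2, cov


# Variance proportional to the predictor, as assumed later in the case study: w_i = x_i
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.0, 8.0, 11.0, 14.0, 17.0]
b0, b1, sigma2, cov = wls_simple(x, y, w=x)
print(round(b0, 6), round(b1, 6))  # the exact line y = 2 + 3x is recovered
```

With all weights equal to 1, the function reduces to ordinary least squares, which corresponds to $\mathbf{W}$ being the identity matrix.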
Suppose enough observations were available within a small area to support fitting a local linking model for that small area alone. If the estimated parameters of the local linking model were very similar to those of the linking model fitted to the large area, the bias of the synthetic estimator would be small. However, large numbers of observations within stands are usually not available in operational forest inventories, so the bias of the synthetic estimator will usually remain unknown. The regression-synthetic estimator (Rao, 2003, p. 46), one specific synthetic estimator, is the mean of the linking-model predictions for the units within a small area. If the linking model is a linear regression model, as assumed in this study, the mean of the model predictions equals the product of the means of the auxiliary variables and the regression coefficient estimates:

$$ \hat{\bar{Y}}_m = \bar{\mathbf{x}}_m'\hat{\boldsymbol{\beta}} = \frac{1}{N_m}\sum_{i=1}^{N_m} f(\mathbf{x}_i, \hat{\boldsymbol{\beta}}) = \frac{1}{N_m}\sum_{i=1}^{N_m} \hat{\mu}_i \qquad (2) $$

where i = {1, …, N_m}, N_m is the number of population elements within small area m, $\bar{\mathbf{x}}_m$ is the mean of the auxiliary vectors in small area m, and m = {1, …, M}, where M is the total number of small areas. The upper-case letter $\hat{\bar{Y}}_m$ is used for the small-area-level estimate, which is the estimated mean of the predictions $\hat{\mu}_i$ for the population elements. Very small areas can also consist of only one population element, in which case $\hat{\bar{Y}}_m = \hat{\mu}_i$. The notation $\hat{\mu}_i$ indicates that the model prediction is an estimate of the superpopulation parameter (the expected value of the linking model given the explanatory variables), not a prediction of the observation, $\hat{y}_i$. The first representation of estimator (2) is applicable to linear models (Kangas, 1996, Mandallaz, 1991); the second and third representations are more generally valid (e.g., also for nonlinear models) (McRoberts, 2006, McRoberts et al., 2013). In the ABA, the population elements are usually grid cells and the small areas are typically stands. Typically, some of the grid cells will overlap with the sample plots used to fit the linking model (1).
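The equivalence of the representations in estimator (2) for a linear model can be checked numerically. The sketch below (plain Python; the function name and toy values are hypothetical) computes the regression-synthetic estimate for one stand both as the mean of the grid-cell predictions and as the prediction at the mean auxiliary vector.

```python
def synthetic_estimate(beta, cells):
    """Regression-synthetic estimate for one small area, estimator (2).

    beta  -- (b0, b1, ..., bp) estimated linking-model coefficients
    cells -- auxiliary vectors (x_i1, ..., x_ip) of the N_m grid cells,
             without the leading 1 for the intercept
    Returns (mean of cell predictions, prediction at the mean auxiliary vector).
    """
    n_cells = len(cells)
    preds = [beta[0] + sum(b * v for b, v in zip(beta[1:], xi)) for xi in cells]
    mean_of_predictions = sum(preds) / n_cells
    x_bar = [sum(xi[j] for xi in cells) / n_cells for j in range(len(beta) - 1)]
    prediction_at_mean = beta[0] + sum(b * v for b, v in zip(beta[1:], x_bar))
    return mean_of_predictions, prediction_at_mean


# Hypothetical stand with three grid cells and one auxiliary variable
print(synthetic_estimate((10.0, 2.0), [[1.0], [2.0], [6.0]]))  # (16.0, 16.0)
```

For a nonlinear f, only the mean of the cell predictions (the second and third representations) remains valid, and the two returned values would generally differ.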
While Næsset (1997) was among the first to apply the regression-synthetic estimator in the ABA, Kangas (1996), McRoberts (2006), Mandallaz (2013), and McRoberts et al. (2013) have, among others, described the variance of the estimator in a forest inventory setting. Because estimator (2) is the mean of the predictions, its variance is the two-dimensional mean of the covariances of the predictions:

$$ \widehat{\operatorname{Var}}_p(\hat{\bar{Y}}_m) = \frac{1}{N_m^2}\sum_{i=1}^{N_m}\sum_{j=1}^{N_m} \widehat{\operatorname{Cov}}(\hat{\mu}_i, \hat{\mu}_j) = \frac{1}{N_m^2}\sum_{i=1}^{N_m}\sum_{j=1}^{N_m} \mathbf{x}_i'\hat{\boldsymbol{\Sigma}}_{\hat{\beta}}\mathbf{x}_j = \bar{\mathbf{x}}_m'\hat{\boldsymbol{\Sigma}}_{\hat{\beta}}\bar{\mathbf{x}}_m \qquad (3) $$

where the subscript p identifies the estimator as considering parameter uncertainty, i, j ∈ {1, …, N_m} index grid cells, and $\hat{\boldsymbol{\Sigma}}_{\hat{\beta}}$ is the estimated covariance matrix of the parameter estimates of the linking model (e.g., Fahrmeir, Kneib, & Lang, 2007). This variance estimator incorporates the uncertainty in the estimate of the mean that is due to the uncertainty in the estimates of the regression parameters. The general representation of the estimator is given by McRoberts (2006), and a specific representation for linear models is given by Kangas (1996) and Mandallaz (2013). In model-based inference, one can either estimate the superpopulation parameter (expected value, μ) or predict the finite population value (observation, y) for each grid cell. For the estimate of a mean within a small area, both approaches are equivalent, and estimator (2) can be written

$$ \hat{\bar{Y}}_m = \frac{1}{N_m}\sum_{i=1}^{N_m} \hat{y}_i = \frac{1}{N_m}\sum_{i=1}^{N_m} \hat{\mu}_i \qquad (4) $$

For the variance estimate, however, there is a difference, as shown below. Because variance estimator (3) depends only on the explanatory variables and the parameter covariance matrix, it is independent of stand size (McRoberts et al., 2014). Although this is correct when estimating the superpopulation parameter, this property is somewhat counter-intuitive from a design-based perspective, from which one would expect the uncertainty of an estimate to decrease as the number of grid cells within a stand increases. Furthermore, forest stands often consist of only a few grid cells, and ignoring the residual error variance may not be adequate.
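The two forms of estimator (3), the double sum over cell pairs and the quadratic form at the mean auxiliary vector, can be compared directly. In the sketch below (hypothetical function name and covariance values), repeating the same cells ten times leaves the result unchanged, which illustrates why estimator (3) does not depend on stand size.

```python
def var_p(cov, cells):
    """Estimator (3): parameter-uncertainty-only variance of the stand mean.

    cov   -- estimated covariance matrix of the parameter estimates
    cells -- full design vectors x_i = (1, x_i1, ..., x_ip) of the grid cells
    Returns (double-sum form, x_bar' cov x_bar form); equal for a linear model.
    """
    n_cells, k = len(cells), len(cov)

    def quad(a, b):  # a' cov b
        return sum(a[r] * cov[r][c] * b[c] for r in range(k) for c in range(k))

    double_sum = sum(quad(xi, xj) for xi in cells for xj in cells) / n_cells ** 2
    x_bar = [sum(xi[r] for xi in cells) / n_cells for r in range(k)]
    return double_sum, quad(x_bar, x_bar)


cov = [[4.0, -1.0], [-1.0, 0.5]]   # hypothetical parameter covariance
stand = [[1.0, 2.0], [1.0, 4.0]]   # two grid cells
print(var_p(cov, stand))           # (2.5, 2.5)
print(var_p(cov, stand * 10))      # 20 cells with the same mean: (2.5, 2.5)
```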
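An estimate of the residual variance can be added to the variance of the synthetic estimator to account for this error term (Prasad & Rao, 1990):

$$ \widehat{\operatorname{Var}}_{pr}(\hat{\bar{Y}}_m) = \widehat{\operatorname{Var}}_p(\hat{\bar{Y}}_m) + \frac{\hat{\sigma}^2}{N_m} \qquad (5) $$

where the subscript pr identifies the estimator as considering parameter and residual uncertainty. Heteroskedasticity can be incorporated by

$$ \widehat{\operatorname{Var}}_{prh}(\hat{\bar{Y}}_m) = \widehat{\operatorname{Var}}_p(\hat{\bar{Y}}_m) + \frac{1}{N_m^2}\sum_{i=1}^{N_m}\hat{\sigma}_i^2 \qquad (6) $$

where $\hat{\sigma}_i^2 = \hat{\sigma}^2 w_{ii}$ is estimated from the model residual variance using appropriate weights. The subscript prh identifies the estimator as considering parameter uncertainty and heteroskedastic residual uncertainty. In general, heteroskedasticity and autocorrelation within a small area can be accommodated by

$$ \widehat{\operatorname{Var}}_{prhs}(\hat{\bar{Y}}_m) = \widehat{\operatorname{Var}}_p(\hat{\bar{Y}}_m) + \frac{1}{N_m^2}\sum_{i=1}^{N_m}\sum_{j=1}^{N_m}\hat{r}_{ij}\hat{\sigma}_i\hat{\sigma}_j \qquad (7) $$

where $\hat{r}_{ij}$ is the estimated correlation between the residuals of grid cells i and j due to (spatial) autocorrelation, with $\hat{r}_{ii} = 1$. The subscript prhs identifies the estimator as considering parameter uncertainty and spatially correlated heteroskedastic residual uncertainty. McRoberts (2006) described this variance estimator in the context of estimating forest area and thus for the binary variable forest/non-forest. Estimator (7) is the most general form of the synthetic variance estimators described here. It reduces to estimator (6) by ignoring spatial autocorrelation ($\hat{r}_{ij} = 0$ for i ≠ j) and to estimator (5) by further assuming a homoscedastic variance structure ($\hat{\sigma}_i^2 = \hat{\sigma}^2$). Estimator (3) is obtained by ignoring the effect of residual error ($\hat{\sigma}_i^2 = 0$). Estimators for totals can be obtained by omitting the terms $1/N_m$ or $1/N_m^2$ in estimators (2), (3), (4), (5), (6), and (7).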
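The nested structure of estimators (5), (6), and (7) can be sketched in a few lines of Python. The helper names are hypothetical, and the per-cell residual variances $\hat{\sigma}_i^2$ and correlations $\hat{r}_{ij}$ are assumed to have been estimated beforehand. With equal variances and no correlation, (6) and (7) reduce to (5); perfectly correlated residuals remove all averaging of the residual error.

```python
import math


def residual_term(sigma2_cells, corr=None):
    """(1 / N^2) * sum_i sum_j r_ij * sigma_i * sigma_j with r_ii = 1.

    sigma2_cells -- residual variances sigma_i^2 of the N grid cells
    corr         -- N x N correlation matrix r_ij; None means r_ij = 0 for i != j
    """
    sig = [math.sqrt(s2) for s2 in sigma2_cells]
    n_cells = len(sig)
    total = 0.0
    for i in range(n_cells):
        for j in range(n_cells):
            if i == j:
                r = 1.0
            else:
                r = 0.0 if corr is None else corr[i][j]
            total += r * sig[i] * sig[j]
    return total / n_cells ** 2


def var_pr(v_p, sigma2, n_cells):        # estimator (5), homoscedastic
    return v_p + sigma2 / n_cells


def var_prh(v_p, sigma2_cells):          # estimator (6), heteroskedastic
    return v_p + residual_term(sigma2_cells)


def var_prhs(v_p, sigma2_cells, corr):   # estimator (7), plus autocorrelation
    return v_p + residual_term(sigma2_cells, corr)


v_p = 2.5                                # value of estimator (3) for the stand
s2 = [100.0] * 4                         # equal residual variances
full = [[1.0] * 4 for _ in range(4)]     # perfectly correlated residuals
print(var_pr(v_p, 100.0, 4), var_prh(v_p, s2), var_prhs(v_p, s2, full))  # 27.5 27.5 102.5
```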

Confidence intervals based on variance estimates vs. confidence intervals and prediction intervals known from regression analysis

A confidence interval (CI) is an interval around an estimate of a population parameter that may or may not include the true population parameter. If many CIs are constructed from independent samples from the same population, a proportion of them will include the true population parameter. If all assumptions are met, this proportion is determined by the confidence level, 100(1 − α)%, which in forest inventories is typically set to 95%, so that α = 0.05. The coverage probability is the proportion of CIs that actually contain the population parameter of interest. A CI of an estimate $\hat{\theta}$ can be constructed as

$$ CI(\hat{\theta}) = \hat{\theta} \pm t_{1-\alpha/2,\,n-1}\sqrt{\widehat{MSE}(\hat{\theta})} \qquad (8) $$

where MSE is the mean squared error and $t_{1-\alpha/2,\,n-1}$ is the upper α/2 point of Student's t distribution with n − 1 degrees of freedom (Thompson, 2002, p. 30). The MSE is the sum of the variance and the squared bias. Since the bias cannot be determined in synthetic estimation, we use the variance estimates instead of MSEs in Eq. (8). The term $\sqrt{\widehat{\operatorname{Var}}(\hat{\theta})}$ is also known as the standard error (SE), a common measure of uncertainty. Half CIs (half the width of CI(⋅), α = 0.05) are consequently tSE ≈ 2SE. The notation of the CIs follows the notation of the variance estimators; for example, $CI_{prhs}(\hat{\bar{Y}}_m)$ is a CI based on variance estimator (7). Two additional kinds of intervals are well known from regression analysis (e.g., Fahrmeir et al., 2007): (i) the CI, similar to the description above, is the interval around the regression line within which the expected value is likely to lie if the regression were repeated with other samples taken from the same population; (ii) in contrast to a CI, which focuses on inference for the superpopulation parameter, μ, a prediction interval (PI) focuses on inference for a finite population value, y. Thus, the PI is the interval around the regression line within which the observations are likely to lie if the regression were repeated with other samples taken from the same population.
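Eq. (8), with the estimated variance in place of the MSE, can be sketched as follows. Python's standard library has no Student's t quantile, so this sketch takes the critical value as an argument and, only as a large-sample assumption, falls back to the standard normal quantile (about 1.96 for α = 0.05). The function name is hypothetical.

```python
import math
from statistics import NormalDist


def confidence_interval(estimate, variance, t_crit=None, alpha=0.05):
    """CI of Eq. (8), using the estimated variance in place of the MSE.

    t_crit should be the upper alpha/2 point of Student's t distribution with
    n - 1 degrees of freedom. If it is omitted, the standard normal quantile is
    used instead, which is only a large-sample approximation.
    """
    if t_crit is None:
        t_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = t_crit * math.sqrt(variance)  # t * SE
    return estimate - half_width, estimate + half_width


print(confidence_interval(100.0, 25.0, t_crit=2.0))  # (90.0, 110.0)
```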
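The CI of a linear regression such as the linking model (1) is given by

$$ CI(\hat{\mu}(\mathbf{x}_0)) = \hat{\mu}(\mathbf{x}_0) \pm t_{1-\alpha/2,\,n-p}\,\hat{\sigma}\sqrt{\mathbf{x}_0'(\mathbf{X}'\mathbf{W}^{-1}\mathbf{X})^{-1}\mathbf{x}_0} \qquad (9) $$

where $\hat{\mu}(\mathbf{x}_0)$ is the prediction of the response given the vector of explanatory variables $\mathbf{x}_0$ at an arbitrary position for which the CI is to be estimated, p is here the number of parameters in the linking model, and $\mathbf{X}$ is the design matrix of the linking model. Recalling that the covariance matrix of the parameter estimates of the linking model can be estimated by $\hat{\boldsymbol{\Sigma}}_{\hat{\beta}} = \hat{\sigma}^2(\mathbf{X}'\mathbf{W}^{-1}\mathbf{X})^{-1}$, we see that $CI_p(\hat{\bar{Y}}_m)$ is numerically similar to $CI(\hat{\mu}(\mathbf{x}_0))$ if the mean of the auxiliary variables in stand m is used as $\mathbf{x}_0 = \bar{\mathbf{x}}_m$. The only difference between the two CIs is the degrees of freedom of the Student's t distribution, which are n − 1 for $CI_p(\hat{\bar{Y}}_m)$ and n − p for $CI(\hat{\mu}(\mathbf{x}_0))$. However, under practical conditions, with a sufficient number of observations and a reasonable number of auxiliary variables, the two CIs can be considered equivalent. The PI is given by

$$ PI(\hat{y}(\mathbf{x}_0)) = \hat{y}(\mathbf{x}_0) \pm t_{1-\alpha/2,\,n-p}\,\hat{\sigma}\sqrt{1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{W}^{-1}\mathbf{X})^{-1}\mathbf{x}_0} \qquad (10) $$

Comparing Eqs. (5) and (10), we see that $CI_{pr}(\hat{\bar{Y}}_m)$ is numerically similar to $PI(\hat{y}(\mathbf{x}_0))$ if the stand consists of only one population unit (N_m = 1) and $\mathbf{x}_0 = \mathbf{x}_i$. As for the CIs, the difference between $CI_{pr}(\hat{\bar{Y}}_m)$ and $PI(\hat{y}(\mathbf{x}_0))$ is the degrees of freedom of the Student's t distribution. For stands with N_m > 1, $CI_{pr}(\hat{\bar{Y}}_m)$ will lie between $CI(\hat{\mu}(\mathbf{x}_0))$ and $PI(\hat{y}(\mathbf{x}_0))$.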
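The ordering of the three intervals can be illustrated by comparing the variance terms under their square roots, ignoring the differing degrees of freedom. This sketch assumes a homoscedastic model and a hypothetical stand in which all N_m cells share the same auxiliary vector $\mathbf{x}_0$; the function name and numbers are illustrative only.

```python
def interval_variances(cov, sigma2, x0, n_cells):
    """Variances behind the regression CI (9), CI_pr of estimator (5), and the PI (10).

    cov     -- estimated parameter covariance sigma2 * (X' W^-1 X)^-1
    sigma2  -- estimated residual variance (homoscedastic case)
    x0      -- design vector (1, x_01, ..., x_0p), e.g. the stand mean x_bar_m
    n_cells -- number of grid cells N_m in a stand where every cell equals x0
    """
    k = len(x0)
    q = sum(x0[r] * cov[r][c] * x0[c] for r in range(k) for c in range(k))
    return {
        "ci_regression": q,                # basis of Eq. (9) and of CI_p
        "pr_stand": q + sigma2 / n_cells,  # estimator (5)
        "pi_regression": q + sigma2,       # basis of Eq. (10)
    }


cov = [[4.0, -1.0], [-1.0, 0.5]]  # hypothetical values
for n in (1, 4, 100):
    # pr_stand equals pi_regression for n = 1 and approaches ci_regression as n grows
    print(n, interval_variances(cov, 100.0, [1.0, 3.0], n))
```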

Case study

Overview

The aim of the case study was to compare the previously described variance estimators with respect to the empirical coverage proportions (ECPs) of the estimates of mean stand-level timber volume. NFI plot data were used to fit the linking model used with the estimators. The study area was located in Vestfold county, southern Norway. The validation data were independent of the calibration data; they were obtained from 64 stands, each of which included 5–7 sample plots. For validation, each stand was assumed to consist only of the sample plots within it. The validation data were mainly used to calculate the coverage proportions of the confidence intervals obtained from the different variance estimators. In operational applications, validation data may not be available.

Calibration data

Data for n = 131 sample plots in Vestfold county measured between 2009 and 2013 were used to fit the linking model (Breidenbach & Astrup, 2012). General information on the NFI can be found in Tomter, Hylen, and Nilsen (2010) and Kolshus (2014, Chapter 7.2). The NFI uses an interpenetrating panel design in which one-fifth of the sample plots are surveyed every year. The NFI uses permanent, circular, 250-m2 sample plots on which all trees with a diameter at breast height (dbh, 1.3 m) ≥ 5 cm are measured. Trees for height measurements are selected using a relascope, with an expected number of 10 trees per plot; all heights are measured if the plot contains 10 or fewer trees. The heights of the remaining trees are estimated using height curves based on the measured trees. Timber volumes of single trees are estimated using species-specific volume models with dbh and height as explanatory variables (Braastad, 1966, Brantseg, 1967, Vestjordet, 1967). The variable of interest in this study, timber volume per hectare, is the sum of the single-tree timber volume estimates per sample plot, scaled to a per-hectare basis. Uncertainties of the single-tree timber volume model predictions are ignored. Since the validation data were obtained in 2011, the timber volumes observed at the NFI sample plots were linearly interpolated and extrapolated to 2011. Without this interpolation and extrapolation, the model fitted to the NFI data would have exhibited a systematic lack of fit when compared with the validation data. Details of the interpolation and extrapolation are given in Appendix A. Table 1 gives an overview of plot-level characteristics of the response variable.
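The scaling of plot-level sums to a per-hectare basis is simple arithmetic; the sketch below (hypothetical function name and tree volumes) makes the expansion factor for the 250-m2 NFI plots explicit. Like the study, it ignores the uncertainty of the single-tree volume models.

```python
def volume_per_hectare(tree_volumes_m3, plot_area_m2=250.0):
    """Sum single-tree timber volumes on a plot and scale to m3/ha.

    For the 250 m2 NFI plots the expansion factor is 10,000 / 250 = 40.
    """
    return sum(tree_volumes_m3) * 10_000.0 / plot_area_m2


print(volume_per_hectare([0.5, 1.0, 2.5]))  # 4 m3 on the plot -> 160.0 m3/ha
```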

Table 1

Characteristics of the variable of interest (timber volume, m3/ha).

Data set	Mean	SD	Max
Calibration plots	164.75	124.71	756.32
Validation plots	193.02	141.23	947.80
Validation stands	193.53	113.82	547.54

Validation data

Stand polygons were available from a previous forest management inventory (FMI); a first step in Norwegian FMIs is stand delineation based on manual photo interpretation. Data for 382 sample plots clustered in 64 stands were used as validation data; 5–7 plots were measured in each stand. The stands were selected subject to two constraints: (i) to minimize problems with locating stand borders in the field, the stands had to be between 1 and 3 ha in size and compact according to a shape criterion based on the area A and the perimeter P of the polygon; (ii) to ensure a wide range of volumes in the validation data, the same number of stands was randomly selected from each of two strata. The strata were formed based on the volume estimated in the FMI, which was greater than 150 m3/ha in one stratum and less than or equal to 150 m3/ha in the other. A total of 34, 15, and 15 stands were selected in the municipalities of Lardal, Holmestrand, and Stokke in Vestfold county, respectively. A 20 m × 20 m grid was superimposed over the selected stands, and 5–7 randomly selected grid intersections per stand were used as plot locations. Field crews navigated to the sample plots using hand-held GPS receivers. The established plot positions were recorded with a differential GPS, with accuracies assumed to be on the order of one meter. Field work was carried out in 2011 following the NFI field protocol described in the previous section. Parts of the data were used in previous studies by Solberg, Astrup, Breidenbach, Nilsen, and Weydahl (2013) and Breidenbach and Astrup (2014). Table 1 gives an overview of plot- and stand-level characteristics of the response variable. The validation plots do not coincide with the calibration plots.

Aerial photogrammetry data

Overlapping digital aerial images with a ground sampling distance (resolution) of 20 cm were acquired in 2010 using an UltraCamX sensor. The images were photogrammetrically processed using the default NGATE setting of SocetSet 5.5.0 which resulted in a digital surface model (DSM) of 1 m resolution. The DSM was normalized to heights above ground by subtracting a digital terrain model (DTM). The best available DTM was used for normalization. For the largest part of the study area, this was the Norwegian standard DTM with a resolution of 10 m. In Lardal municipality, an ALS DTM with 1 m resolution was available. Heights above ground estimated from aerial photogrammetry data are denoted AP heights. The data were interpolated to 20 cm × 20 cm grids and delivered with true ortho photographs by the data provider. More details of the data can be found in Breidenbach and Astrup (2012). A variety of height and density metrics for the AP heights above ground (measured in dm) within the sample plots was calculated using FUSION (McGaughey, 2014). All statistical calculations were carried out using the R software for statistical computing (R Development Core Team, 2014).

Linking model

After several alternative models were compared, a simple linear linking model with an intercept term and the mean of the AP heights as the only explanatory variable was fitted to the calibration data:

$$ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \qquad (11) $$

where i indexes sample plots, y is timber volume (m3/ha), β0 and β1 are regression coefficients, and x is the mean AP height. To accommodate heteroskedasticity, we fitted a weighted model (see Eq. (1)). Because the residual variances were proportional to the explanatory variable, each observation in the regression model was weighted by the inverse of $\sigma_i^2 = \sigma^2 x_i$, where σ2 is the residual variance. Spatial autocorrelation was not considered in the model, since the sample plots have a minimum separation distance of 3 km, which is greater than the range of spatial correlation. Besides the coefficient of determination, the root mean squared error (RMSE) was used as a measure of model fit:

$$ RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2} \qquad (12) $$

where $y_i$ is the observed and $\hat{y}_i$ the predicted value. RMSE% is defined as the RMSE as a percentage of the mean observed volume at the NFI plots. The linking model had an R2 of 0.69, an RMSE of 75.0 m3/ha, and an RMSE% of 46%. Both regression coefficients, $\hat{\beta}_0$ and $\hat{\beta}_1$, were significant (p-value < 0.001). The estimated parameter covariance matrix $\hat{\boldsymbol{\Sigma}}_{\hat{\beta}}$ of the linking model was used in the variance estimators.
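A minimal sketch of Eq. (12) and RMSE% follows; it assumes the RMSE divides by n (not by the residual degrees of freedom), and the function names are hypothetical. The last line checks the reported RMSE% against the calibration-plot mean in Table 1.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, Eq. (12), assuming division by n."""
    n = len(observed)
    return math.sqrt(sum((y - yh) ** 2 for y, yh in zip(observed, predicted)) / n)


def rmse_percent(observed, predicted):
    """RMSE as a percentage of the mean observed value."""
    mean_obs = sum(observed) / len(observed)
    return 100.0 * rmse(observed, predicted) / mean_obs


# Consistency check with the reported numbers: RMSE = 75.0 m3/ha and a
# mean observed calibration volume of 164.75 m3/ha (Table 1)
print(round(100.0 * 75.0 / 164.75, 1))  # 45.5, i.e. the reported 46%
```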

Modeling spatial autocorrelation using validation data

The calibration data (NFI plots), interpolated and extrapolated to 2011, coincided well with the validation data (Fig. 1). The generalized least squares method was used to fit a model to the sample plot data of the validation stands in order to estimate the spatial autocorrelation structure of the residuals within stands:

$$ y_{mi} = \beta_0 + \beta_1 x_{mi} + \varepsilon_{mi}, \qquad \operatorname{Var}(\boldsymbol{\varepsilon}_m) = \sigma^2\mathbf{W}_m \qquad (13) $$

where m indexes stands and i indexes plots within stands. The stand-specific weighting matrix $\mathbf{W}_m$ has the diagonal elements $w_{m,ii} = x_{mi}$. The off-diagonal elements of $\mathbf{W}_m$ are given by $w_{m,ij} = r_{ij}\sqrt{x_{mi}x_{mj}}$. We assumed a stationary and isotropic Gaussian process without nugget to estimate the correlation between pairs of residuals within a stand:

$$ \hat{r}_{ij} = \exp\left\{-\left(\frac{s_{ij}}{\hat{\rho}}\right)^2\right\} \qquad (14) $$

where $\hat{\rho}$ is the estimated range and $s_{ij}$ is the Euclidean distance between two sample plots (Pinheiro & Bates, 2002, p. 232). The gls function in the R package nlme (Pinheiro, Bates, DebRoy, Sarkar, & R Core Team, 2013) was used to fit the model. The spatial autocorrelation function can also be localized, as described, for example, by Räty, Heikkinen, and Kangas (2011). However, to focus on the estimators, we chose a relatively simple stationary model.
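To make Eqs. (13) and (14) concrete, the sketch below computes the Gaussian correlation and builds the stand-specific weighting matrix $\mathbf{W}_m$ for given plot coordinates. It does not reproduce the nlme `gls` fit used in the study: the range is taken as given, the function names are hypothetical, and the variance is assumed proportional to the mean AP height, as in the linking model.

```python
import math


def gaussian_corr(s, rng):
    """Gaussian correlation without nugget, Eq. (14): exp(-(s / range)^2)."""
    return math.exp(-((s / rng) ** 2))


def weight_matrix(coords, x, rng):
    """Stand-level W_m: diagonal x_i, off-diagonal r_ij * sqrt(x_i * x_j).

    coords -- plot coordinates (m); x -- mean AP heights of the plots
    (residual variance assumed proportional to x).
    """
    n = len(x)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = math.dist(coords[i], coords[j])
            r = 1.0 if i == j else gaussian_corr(s, rng)
            w[i][j] = r * math.sqrt(x[i] * x[j])
    return w


# Residual correlation for plots 20 m apart (the grid spacing of the
# validation plots) with the estimated range of 23.0 m
print(round(gaussian_corr(20.0, 23.0), 3))  # about 0.469
```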

Fig. 1

Timber volume versus mean height derived from aerial photogrammetry (AP) for NFI and validation data.

The range of spatial autocorrelation estimated from the validation data was 23.0 m.

Application of the estimators

The validation plots were treated as grid cells in the estimators, such that a stand consists only of these grid cells. The advantage of this procedure is that observations (rather than estimates) are available at stand level. The disadvantage is that the stands are rather small because they consist of only 5–7 sample plots. Thus, in the estimators, $N_m = n_m$ and $\sum_m N_m = 382$. The ECP is defined as the proportion of stands for which the observed mean timber volume lies within the CI. All CIs were computed with α = 0.05, i.e., for a nominal coverage of 95%. The spatial autocorrelation structure described in (14) was used in variance estimator (7). The ECP of $CI_{prhs}(\hat{\bar{Y}}_m)$ was estimated with the spatial range $\hat{\rho}$ = 23.0 m estimated from the validation data. In addition, the ECP of $CI_{prhs}(\hat{\bar{Y}}_m)$ was estimated for alternative values of $\hat{\rho}$ between 0 m and 120 m to evaluate the sensitivity of the estimates to spatial autocorrelation. To compare the CIs of the different estimators for different stand sizes, stands were simulated. This procedure is not part of the validation; it merely shows the trend in the CIs with stand size. The simulated stands were square, and all grid cells were assigned the average AP mean height observed at the validation plots (10 m). The only difference between the simulated stands was their size, which ranged from one to 40 × 40 = 1600 grid cells (stand sizes of 256 m2 to 40.96 ha).
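The ECP definition and the size of the simulated stands can be expressed compactly. The sketch assumes CIs are given as (lower, upper) pairs and that each simulated grid cell covers 256 m2 (16 m × 16 m), as implied by the smallest simulated stand; all names and values are hypothetical.

```python
def empirical_coverage(observed_means, intervals):
    """ECP: share of stands whose observed mean lies inside its CI (lower, upper)."""
    hits = sum(lo <= y <= hi for y, (lo, hi) in zip(observed_means, intervals))
    return hits / len(observed_means)


def square_stand_area_ha(cells_per_side, cell_area_m2=256.0):
    """Area of a simulated square stand of cells_per_side x cells_per_side cells."""
    return cells_per_side ** 2 * cell_area_m2 / 10_000.0


# Hypothetical observed stand means and CIs for four stands
obs = [150.0, 220.0, 90.0, 310.0]
cis = [(120.0, 180.0), (200.0, 260.0), (100.0, 140.0), (250.0, 330.0)]
print(empirical_coverage(obs, cis))  # 0.75: the third stand is not covered
print(square_stand_area_ha(40))      # 40.96 ha for 40 x 40 cells
```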

Results

The average of all regression-synthetic mean stand-level timber volume estimates was 186.0 m3/ha. While the linking model resulted in reasonable estimates for most stands, mean stand-level timber volume was clearly underestimated for the stands with the largest and fifth-largest observed timber volume (Fig. 2).

Fig. 2

Observed versus estimated mean timber volume on stand level with 95% confidence intervals based on different estimators and 1:1 line (dashed diagonal line).

None of the CIs reached an ECP of 95% without adjustments (Table 2). The ECP of $CI_p(\hat{\bar{Y}}_m)$, which included only parameter uncertainty, was especially low (37.5%), whereas the ECP of $CI_{prhs}(\hat{\bar{Y}}_m)$, which included parameter uncertainty and spatially correlated heteroskedastic residual uncertainty with a range of 23.0 m, was 90.6%. ECPs of $CI_{prhs}(\hat{\bar{Y}}_m)$ were also estimated for other values of the range of spatial autocorrelation; they ranged from 87% to 98% for ranges of 0 to 120 m (Fig. 3).

Table 2

Empirical coverage proportions of the estimators.

Estimator	Average half interval (m³/ha)	Average half interval (%)	Number of stands within interval	ECP (%)
$CI(\hat{\mu}(\mathbf{x}_0))$	17.1	9.8	24	37.5
$CI_p(\hat{\bar{Y}}_m)$	17.1	9.8	24	37.5
$CI_{prh}(\hat{\bar{Y}}_m)$	68.2	39.7	56	87.5
$CI_{prhs}(\hat{\bar{Y}}_m)$, ρ = 23 m	78.2	45.7	58	90.6
$PI(\hat{y}(\mathbf{x}_0))$	159.8	93.8	64	100.0
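Because the ECP is the share of the 64 validation stands whose observed mean lies within the interval, the ECP column of Table 2 follows directly from the counts in the preceding column:

```latex
\mathrm{ECP} = \frac{\#\{\text{stands covered}\}}{64}: \quad
\frac{24}{64} = 37.5\%, \quad
\frac{56}{64} = 87.5\%, \quad
\frac{58}{64} \approx 90.6\%, \quad
\frac{64}{64} = 100\%
```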

Fig. 3

Development of the ECP of $CI_{prhs}(\hat{\bar{Y}}_m)$ as a function of the range of spatial autocorrelation (ρ).

The PI had an ECP of 100% but was more than 50% wider than $CI_{prhs}(\hat{\bar{Y}}_m)$. The widths of $CI_{pr}(\hat{\bar{Y}}_m)$, $CI_{prh}(\hat{\bar{Y}}_m)$, and $CI_{prhs}(\hat{\bar{Y}}_m)$ decrease with stand size if all other parameters are held constant. The larger the spatial autocorrelation, the slower the CI width decreases. For the stands simulated to visualize the development of the CIs for different stand sizes, $CI_{prhs}(\hat{\bar{Y}}_m)$ was very close to $CI_p(\hat{\bar{Y}}_m)$ for stands of 10 ha and larger. The dependence of the CIs obtained from the different estimators on stand size is shown in Fig. 4.
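The slower decrease of the CI width under stronger spatial autocorrelation can be reproduced with the residual part of estimator (7) alone. This sketch assumes square stands of 16 m × 16 m cells (256 m2, as for the simulated stands), a constant residual variance, and the Gaussian correlation of Eq. (14); σ2 = 100 and the ranges are hypothetical values, and the function name is illustrative.

```python
import math


def mean_residual_variance(side, sigma2, rng, cell=16.0):
    """Residual part of estimator (7) for a square stand of side x side cells.

    All cells share the same residual variance sigma2 (constant auxiliary
    value), and residuals follow the Gaussian correlation of Eq. (14) with
    range rng (rng = 0 means uncorrelated). Cell centres are `cell` m apart.
    """
    pts = [(cell * a, cell * b) for a in range(side) for b in range(side)]
    total = 0.0
    for p in pts:
        for q in pts:
            s = math.dist(p, q)
            if s == 0.0:
                r = 1.0
            elif rng == 0.0:
                r = 0.0
            else:
                r = math.exp(-((s / rng) ** 2))
            total += r * sigma2
    return total / len(pts) ** 2


for rng in (0.0, 23.0, 60.0):
    print(rng, [round(mean_residual_variance(k, 100.0, rng), 2) for k in (1, 2, 4, 8)])
```

Without correlation the term equals σ2/N_m; with a longer range it declines more slowly, so the width of $CI_{prhs}(\hat{\bar{Y}}_m)$ approaches that of $CI_p(\hat{\bar{Y}}_m)$ only for larger stands.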

Fig. 4

Development of the CIs obtained from different estimators for simulated stands with an average AP mean height but different sizes.

Ignoring the residual variability or the spatial correlation resulted in much smaller CIs than $CI_{prhs}(\hat{\bar{Y}}_m)$, but at the price of a lower ECP.

Discussion

In this study, we compared model-based variance estimators for regression-synthetic estimates for small areas. One estimator was based on the estimation of superpopulation parameters (Kangas, 1996, Mandallaz, 2013, McRoberts et al., 2014); the other estimators were based on the prediction of finite population values (McRoberts, 2006). All compared variance estimators can be obtained from the most general variance estimator ($\widehat{\operatorname{Var}}_{prhs}$, estimator (7)) by ignoring spatial autocorrelation ($\widehat{\operatorname{Var}}_{prh}$, estimator (6)) and heteroskedasticity ($\widehat{\operatorname{Var}}_{pr}$, estimator (5)). Ignoring the influence of residual variation altogether yields the estimator based on the estimation of superpopulation parameters ($\widehat{\operatorname{Var}}_p$, estimator (3)). CIs obtained from the variance estimators based on the prediction of finite population values decrease with the size of the small areas and lie between the PI and the CI of the linking model. With increasing size of the small areas, all compared variance estimators converge. How fast $CI_{prhs}(\hat{\bar{Y}}_m)$ approaches $CI_p(\hat{\bar{Y}}_m)$ depends on the range of spatial autocorrelation and the residual variance of the linking model. The estimator $\widehat{\operatorname{Var}}_p$ (Kangas, 1996, Mandallaz, 2013, McRoberts et al., 2014), whose CI is numerically almost equivalent to the CI of the linking model, only considers the uncertainty in the model parameter estimates and underestimated the variance for these very small stands (< 1 ha). It is, however, still appropriate for large stands. Considering the residual variance and the spatial autocorrelation among residuals improved the ECP compared with $CI_p(\hat{\bar{Y}}_m)$. In our case study, stands would have to be larger than 10 ha before $CI_{prhs}(\hat{\bar{Y}}_m)$ would be similar to $CI_p(\hat{\bar{Y}}_m)$. Using the PI as a measure of uncertainty would be very conservative, as it represents the uncertainty to be expected at the grid-cell level. McRoberts (2006) described both the estimator based on the estimation of superpopulation parameters and the estimator based on the prediction of finite population values in the context of estimating forest area within small areas using Landsat images.
In this study, we extended the estimator based on the prediction of finite population values to accommodate heteroskedasticity and applied it to a continuous response variable. Because McRoberts (2006) applied both estimators to very large “small” areas, the variance estimates were similar up to the third digit, and the estimator based on the prediction of finite population values was not considered further. Here we see that this estimator has the property of depending on the size of the small area. Bias resulting from model lack of fit is not considered in synthetic estimation, and stand-level observations are necessary to address the fact that population-level models, for various reasons, do not fit some stands. The variance estimators analyzed in this study do not solve this fundamental problem either. Bias is therefore also the reason why the ECP of $CI_{prhs}(\hat{\bar{Y}}_m)$ is < 95%: stands for which the linking model does not fit will necessarily reduce the ECP. Often, stand-level observations will be scarce because stands are too small and sample plot measurements too expensive. If stand-level observations are nonetheless available, a variety of methods exist to improve the estimates (e.g., Goerndt et al., 2011, Magnussen et al., 2014, Mandallaz, 1991). As the linking model plays a prominent role in the model-based estimators discussed here, its properties need to be checked carefully; this includes tests for multicollinearity if more than one explanatory variable is selected. The range and structure of the spatial autocorrelation process need to be estimated to use $\widehat{\operatorname{Var}}_{prhs}$, which needs to be considered in the survey design. NFI data will usually not be suitable for this purpose, because the distance between plots is frequently chosen in a way that prevents estimation of spatial autocorrelation. All implemented estimators are available in an R package (Breidenbach, 2015).

Conclusions

Variance estimators based on the prediction of finite population values are recommended for small areas containing few grid cell predictions (e.g., small stands). They are applicable to small areas containing many or few grid cell predictions, and the resulting variance estimates decrease with the number of grid cells. Confidence intervals (CIs) obtained from variance estimators based on the prediction of finite population values lie between the CI and the prediction interval (PI) of the linking model. For small areas containing many grid cell predictions (e.g., large stands), the CI of the linking model can be used as a simple-to-calculate indicator of uncertainty, because it is practically equivalent to all variance estimators considered in this study.
