Modelling vegetation understory cover using LiDAR metrics.

Lisa A Venier¹, Tom Swystun¹, Marc J Mazerolle², David P Kreutzweiser¹, Kerrie L Wainio-Keizer¹, Ken A McIlwrick¹, Murray E Woods³, Xianli Wang¹.

Abstract

Forest understory vegetation is an important characteristic of the forest. Predicting and mapping understory is a critical need for forest management and conservation planning, but it has proved difficult with available methods to date. LiDAR has the potential to generate remotely sensed forest understory structure data, but this potential has yet to be fully validated. Our objective was to examine the capacity of LiDAR point cloud data to predict forest understory cover. We modeled ground-based observations of understory structure in three vertical strata (0.5 m to < 1.5 m, 1.5 m to < 2.5 m, 2.5 m to < 3.5 m) as a function of a variety of LiDAR metrics using both mixed-effects and Random Forest models. We compared four understory LiDAR metrics designed to control for the spatial heterogeneity of sampling density. The four metrics were highly correlated and they all produced high values of variance explained in mixed-effects models. The top-ranked model used a voxel-based understory metric along with vertical stratum (Akaike weight = 1, explained variance = 87%, cross-validation error = 15.6%). We found evidence of occlusion of LiDAR pulses in the lowest stratum but no evidence that the occlusion influenced the predictability of understory structure. The Random Forest model results were consistent with those of the mixed-effects models, in that all four understory LiDAR metrics were identified as important, along with vertical stratum. The Random Forest model explained 74.4% of the variance, but had a lower cross-validation error of 12.9%. We conclude that the best approach to predict understory structure is using the mixed-effects model with the voxel-based understory LiDAR metric along with vertical stratum, because it yielded the highest explained variance with the fewest number of variables. 
However, results show that other understory LiDAR metrics (fractional cover, normalized cover and leaf area density) would still be effective in mixed-effects and Random Forest modelling approaches.

Year: 2019 PMID: 31774813 PMCID: PMC6881062 DOI: 10.1371/journal.pone.0220096

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

Understory vegetation is an important part of the forested ecosystem. It contributes greatly to nutrient cycling [1, 2], wildlife habitat [3-5], fire behaviour [6-8], microclimate [2] and carbon accounting [9]. Understory vegetation communities are therefore often considered a good indicator of forest ecological integrity [10, 11]. However, spatial predictions of understory cover or density have been extremely difficult to generate using traditional variables such as topography, overstory and soils [12]. Active remote-sensing technology such as LiDAR (light detection and ranging) could be used to generate estimates to address this issue. LiDAR provides an estimate of three-dimensional forest structure including estimates of canopy structure, understory vegetation and terrain. LiDAR is a survey method that measures the return time of a laser light pulse reflecting off solid objects such as the vegetation or the ground. These laser returns generate a three-dimensional representation of the forest. This capacity has conferred large advantages to forest managers, conservationists and researchers in their attempts to manage the forest efficiently and sustainably. LiDAR can generate reliable, robust estimates of many forest structure variables including canopy height and cover [13-15], as well as basal area and tree density [13, 16] and has similar potential for understory structure. Our objective in this paper is to evaluate the potential of LiDAR to generate predictions of understory cover by comparing to field measures of understory. To achieve this objective, we examine alternative LiDAR metrics that control for spatial heterogeneity of sampling density, we compare regression and machine learning statistical approaches, and we examine the value of multiple variables in our models. 
A key challenge of working with LiDAR data is the large spatial heterogeneity in sampling density that arises in the normal course of generating LiDAR point clouds. This spatial heterogeneity is due to variations in scan angle, flight height, movement of the aircraft during data collection, the degree of overlapping flight lines, and topography [17-20]. Thus, relative measures of vegetation density or cover, where the number of returns in a vertical stratum is scaled relative to some measure of sampling density, should provide better estimates of true understory vegetation cover. A variety of approaches have been used to relativize these measures, for example, dividing the number of returns in a vertical bin by the total number of returns in the column, or by the number of returns in the bin and below the bin [21]. We examine four different understory structure metrics based on different approaches to control for sampling density.
We explored two statistical approaches for modelling understory vegetation structure as a function of LiDAR data: machine learning and mixed effects regression models. Machine learning, specifically Random Forest [22], has been used to model forest inventory variables with a large suite of LiDAR derived predictors [23, 24]. Machine learning in this context strives to produce the best prediction of the forest inventory variables. However, machine learning does not produce an ecologically interpretable relationship per se, only estimates of variable importance. Machine learning makes no assumptions about the structure of the data, is ideal for predicting relationships that are non-linear, is insensitive to correlations among variables, and models interactions automatically [25]. However, machine learning is prone to bias associated with incomplete ranges of conditions being sampled [25]. As an alternative, we explored linear mixed-effects regression models. These models make assumptions of homoscedasticity and normality of errors which must be checked but can produce more parsimonious and more interpretable models than machine learning in some instances. In Random Forest models, large suites of variables are usually included to achieve the best predictive capacity. In the regression models, it is more important to limit the number of variables included to avoid overfitting and strong correlations between explanatory variables.
Occlusion has been discussed in the literature as a possible issue limiting LiDAR effectiveness for prediction of understory structure [26, 27], but more recent studies have shown that the potential occlusion may not interfere with generating predictions. Latifi et al. [23] demonstrated that artificially reducing the density of the LiDAR point cloud did not have an appreciable effect on variance explained in models predicting understory structure. In another study, prediction errors of understory vegetation cover were not related to canopy cover [28]. However, forest type in some instances can influence the predictive accuracy of models [29]. In both of our modelling approaches, we included additional variables beyond the understory LiDAR metrics that may influence the amount of occlusion of the laser pulse, namely, the amount of overstory, the forest type, and the vertical stratum. All three of these variables could reflect the amount of vegetation in the area above the vertical stratum of interest.
Our primary objective is to quantify the capacity of LiDAR to estimate understory structure. To achieve this, (1) we compare the effectiveness of four possible understory LiDAR metrics that control for sampling density for predicting understory cover, (2) we examine the influence of potentially important additional explanatory variables on the model, which will inform us about the importance of occlusion, and (3) we compare the mixed effects and Random Forest approaches for generating predictions. Our aim is to generate robust and effective predictions of understory cover that could inform forest management and conservation.

Methods

Study area

This project was conducted in the Petawawa Research Forest. Permission to conduct the study at the Petawawa Research Forest was granted by Natural Resources Canada. The research forest covers 9,945 hectares in the Great Lakes-St. Lawrence forest region (45° 58’ 46.74” N, 77° 30’ 22.11” W), Ontario, Canada. The study area is on the southern end of the Precambrian Shield, on bedrock of granites and gneisses. Forest composition features White Pine (Pinus strobus Linnaeus), Red Pine (Pinus resinosa Aiton), Red Oak (Quercus rubra Linnaeus), Yellow Birch (Betula alleghaniensis Britton), Sugar Maple (Acer saccharum Marshall), and Red Maple (Acer rubrum Linnaeus) as dominant species, often in uneven-aged forests. Presently, the Petawawa Research Forest is dominated by healthy but mature and overmature overstory (80–140 years) coupled primarily with low-quality regeneration and understories. For the purpose of the current study, we classified the forest into four types (TYPE) to explore the influence of forest type on the consistency of the relationship between understory vegetation structure measured in the field and LiDAR metrics. The four classes of forest type are Pine, Red Oak, Mixedwood without Pine, and Mixedwood with Pine. These four classes account for approximately 71% of the landbase of the research forest.

Field data collection

Within the Petawawa Research Forest, plots were selected from a 25 m-resolution rasterized LiDAR database and Forest Resource Inventory data based on aerial photo interpretation. Potential plots were selected based on a stratification by forest type, overstory density, and understory density. Initial overstory was measured as the relative number of LiDAR laser pulse returns in overstory (> 4 m), and understory density as the relative number of LiDAR laser pulse returns 4 m or lower. We divided the full range of overstory values into 10 equal bins, and the full range of understory values into 10 equal bins. For each combination of understory by overstory bin we selected five potential plots for each of four forest types, for a total of 2000 plots, 500 in each 10 by 10 matrix, with one matrix per each forest type. This is a rough stratification but helped to fill the statistical space to ensure optimal conditions for model construction. We sampled 437 plots out of the possible 2000, trying to select 1–5 plots from all cells in the matrix. We acknowledge that this stratification would not be effective if the relative number of LiDAR pulse returns was unrelated to actual understory vegetation cover. However, it was the most intuitive method to ensure that all overstory and understory conditions in our study area were represented in the sample. We collected vegetation data on 250 plots in 2015 and on an additional 187 plots in 2016. Plots were selected in the field from the list of preselected plots based on accessibility and conformity with classified forest type, understory, and overstory. At each plot centre, we used an SX Blue II GPS to generate a sub-meter accurate location through averaging a minimum number of 300 points (Geneq Inc., Montreal, Canada). Our field data collection attempted to generate a field-based point cloud to match the LiDAR based point cloud. 
We measured forest structure on ground-based plots in nine vertical strata (0–0.5 m, 0.5–1 m, 1–1.5 m, 1.5–2 m, 2–2.5 m, 2.5–3 m, 3–3.5 m, 3.5–4.0 m, > 4 m). From the centre point we created eight radial transects (12 m in length each), starting in a north direction and moving clockwise by 45 degrees for each additional transect. Along each transect, data were collected at each meter for a total of 97 sample locations in each plot, including the centre point (Fig 1). To sample the vegetation structure, observers recorded the presence or absence of vegetation within a radius of 15 cm for each of the nine vertical strata. Thus, there were 97 sampling points x 9 strata = 873 presence/absence points collected in each 12 m radius plot volume. The original vertical strata were later grouped into three strata (ST1 = 0.5–1.5 m, ST2 = 1.5–2.5 m, ST3 = 2.5–3.5 m). We excluded points below 0.5 m as they are difficult to distinguish from ground points, and points above 3.5 m as they were difficult to estimate from the ground. The total number of vegetation presences in each stratum (0–194) was recorded in the FIELD variable for subsequent analysis. This field collection represents a lower sampling density than the LiDAR data, which were acquired at 6 pulses per square meter with up to 8 returns per pulse, resulting in 2.44 returns per m³ compared with 0.43 returns per m³ for the field data. These data are not strictly comparable, since the field data represent presence and absence whereas the LiDAR returns represent only presence, but they give a general impression of relative sampling density.
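
The sampling arithmetic in this subsection can be checked with a short sketch. The Python below is illustrative only and is not the authors' code; the function name `sample_locations` and the constant names are hypothetical, and the coordinate convention (azimuth 0 = north, increasing clockwise) is an assumption taken from the transect description.

```python
import math

# Stratification of candidate plots (values from the text).
N_BINS = 10            # overstory bins and understory bins
PLOTS_PER_CELL = 5     # candidate plots per cell and forest type
N_FOREST_TYPES = 4

# Field design (values from the text).
TRANSECTS = 8              # radial transects, 45 degrees apart
POINTS_PER_TRANSECT = 12   # one sample per metre along a 12 m transect
FIELD_STRATA = 9           # vertical strata recorded in the field
SUBSTRATA_PER_GROUP = 2    # e.g. 0.5-1 m and 1-1.5 m together form ST1


def sample_locations():
    """Return (east, north) offsets in metres from the plot centre."""
    locations = [(0.0, 0.0)]  # the centre point is sampled too
    for t in range(TRANSECTS):
        azimuth = math.radians(45 * t)  # assumed: 0 = north, clockwise
        for d in range(1, POINTS_PER_TRANSECT + 1):
            locations.append((d * math.sin(azimuth), d * math.cos(azimuth)))
    return locations


candidates_per_type = N_BINS * N_BINS * PLOTS_PER_CELL
total_candidates = candidates_per_type * N_FOREST_TYPES

locations = sample_locations()
n_points = len(locations)
presence_absence_records = n_points * FIELD_STRATA
max_field_value = n_points * SUBSTRATA_PER_GROUP

print(candidates_per_type, total_candidates)
print(n_points, presence_absence_records, max_field_value)
```

The counts reproduce the figures quoted above: 500 candidate plots per forest type and 2000 in total, 97 sample locations, 873 presence/absence records per plot, and a maximum FIELD value of 194 per grouped stratum.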

Fig 1

Sampling design for field observations of vegetation structure (FIELD).

Measurements around each point on the transects and vertical strata were within a 15 cm-radius (r).

LiDAR acquisition

Airborne LiDAR data were collected over the Petawawa Research Forest from August 17–20, 2012. The Riegl 680i sensor was carried aboard a Cessna 172 aircraft flown at an average altitude of 750 m. Technical acquisition specifications are provided in Table 1. The data were collected as a full-waveform and provided as a discrete point file (LAS 1.1) for use in this project. Flight overlap was approximately fifty percent.

Table 1

Airborne LiDAR acquisition specifications.

Parameter	Value
Pulse repetition rate	150 kHz
Frequency	76.67 Hz
Scan Angle	± 20 Degrees
FOV	40 Degrees
Line spacing: Cross track	0.6 m
Line spacing: Along track	0.6 m
Line spacing between flight lines	250 m
Laser footprint min	0.38 m
Laser footprint max	0.42 m
Average point density: All Returns	~ 15 pts/m²
Average point density: Last Returns	~ 6 pts/m²

Data processing and LiDAR variables

We developed specific LiDAR understory cover metrics that are expected to capture understory vegetation density directly. We identified four metrics for our analysis. Three of these metrics are used in the literature: fractional cover (FRAC, modified from Wing et al. [28]), leaf area density (LAD, [30]), and voxel cover (VOX1m, [31]). The fourth metric considered was normalized cover (NORM), because it is an easily interpretable and easily calculated alternative. Fractional cover is calculated by summing the number of LiDAR vegetation returns for each understory vertical stratum and dividing by the sum of understory and ground returns. Leaf area density is calculated as the negative log of the number of returns in a vertical stratum divided by all returns in and below the vertical bin, and then divided by a constant. Normalized cover is calculated by dividing all vegetation returns in the understory stratum by all first returns. The voxel cover approach filters all returns by estimating presence/absence of returns in each standard voxel (in our case 1 m³) in the vertical stratum. For example, a 2 m x 5 m x 5 m vegetation stratum that contains 50 1-m³ voxels would have a voxel cover value between 0 and 50, equal to the number of voxels that contain vegetation. Sampling density is extremely heterogeneous due to factors such as flight line overlap and the pitch and yaw of the plane. The LiDAR metrics provide four alternative ways to scale the number of returns in a vertical bin by sampling density. In addition to these four specific LiDAR understory cover metrics, we calculated a suite of standard LiDAR point cloud metrics such as canopy cover and canopy height (S1 Table).
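
As a rough illustration of how three of these metrics could be computed for one plot, the following Python sketch works on a list of height-normalized returns. It is not the authors' implementation. The function name `understory_metrics`, the tuple layout, and the choice of 0.5 m to 3.5 m as the "understory" range in the FRAC denominator are assumptions made for the example; LAD is left out because its constant is not given here.

```python
import math


def understory_metrics(points, z_lo, z_hi, understory=(0.5, 3.5), voxel=1.0):
    """Compute FRAC, NORM and voxel cover for one vertical stratum.

    points: iterable of (x, y, z, is_first, is_ground) tuples, where z is
    height above ground in metres (hypothetical layout for this sketch).
    """
    n_stratum = n_understory = n_ground = n_first = 0
    occupied = set()  # voxels in the stratum holding at least one return
    for x, y, z, is_first, is_ground in points:
        if is_first:
            n_first += 1
        if is_ground:
            n_ground += 1
            continue
        if understory[0] <= z < understory[1]:
            n_understory += 1
        if z_lo <= z < z_hi:
            n_stratum += 1
            occupied.add((
                math.floor(x / voxel),
                math.floor(y / voxel),
                math.floor((z - z_lo) / voxel),
            ))
    denom = n_understory + n_ground  # assumed reading of the FRAC denominator
    frac = n_stratum / denom if denom else 0.0
    norm = n_stratum / n_first if n_first else 0.0
    return {"FRAC": frac, "NORM": norm, "VOX": len(occupied)}


# Tiny made-up point cloud: two ground returns, three returns in ST1
# (two of them in the same 1 m voxel), one in ST2 and one canopy return.
toy_points = [
    (0.2, 0.3, 0.0, True, True),
    (0.5, 0.5, 0.8, False, False),
    (0.6, 0.4, 1.2, True, False),
    (1.5, 0.5, 1.0, False, False),
    (1.5, 1.5, 2.0, True, False),
    (2.5, 2.5, 10.0, True, False),
    (2.5, 2.5, 0.0, False, True),
]
st1 = understory_metrics(toy_points, 0.5, 1.5)
print(st1)
```

Voxel cover counts occupied voxels rather than returns, so two returns falling in the same 1 m³ cell contribute only once; this is what makes the metric less sensitive to local differences in sampling density.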

Analysis

We used linear mixed effects models to determine the capacity of our four main LiDAR understory cover metrics to predict understory cover recorded in the field (FIELD) in each of the three vertical strata defined above (ST1, ST2, ST3), and to examine the influence of secondary explanatory variables [32]. These secondary explanatory variables consisted of forest TYPE (based on overstory composition), STRATUM (vertical 1 m strata, ST1-ST3), and OVERSTORY (S1 Table). The OVERSTORY variable was a measure of LiDAR vegetation cover in the vertical column above the stratum of interest, calculated by classifying canopy cover (CC) into three classes (low, medium, high). We treated the plot as a random effect to account for multiple measurements in each plot. We formulated 16 candidate models consisting of LiDAR variables, with the constraint of maintaining variance inflation factors (VIF) < 10 to avoid issues of multicollinearity (Table 2). For each of the four main LiDAR metrics, we derived four models: 1) a null model consisting only of the LiDAR metric, 2) a model with the LiDAR metric, OVERSTORY, and their interaction, 3) a model with the LiDAR metric, TYPE, and their interaction, and 4) a model with the LiDAR metric, STRATUM, and their interaction. We ranked all mixed effects models based on Akaike’s information criterion (AIC, [33, 34]) and calculated the R² values. We also computed the symmetric mean absolute percentage error (SMAPE), based on 10-fold cross-validation [35], for the top-ranked models, and calculated SMAPE values for each of the three vertical strata separately. Parameters of the mixed effects models were estimated by maximum likelihood in R with the nlme package [23, 32, 36].
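
The cross-validation error measure can be sketched as follows. The exact SMAPE variant is not stated here, so this Python example assumes one common definition, the mean of |predicted − observed| divided by the average of |observed| and |predicted|; the function names `smape` and `k_fold_indices` are hypothetical, and the fold assignment is a plain unshuffled split used only for illustration.

```python
def smape(observed, predicted):
    """Symmetric mean absolute percentage error (one common variant)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal length")
    total = 0.0
    for o, p in zip(observed, predicted):
        denom = (abs(o) + abs(p)) / 2
        # A pair of zeros is a perfect prediction and contributes no error.
        total += abs(p - o) / denom if denom else 0.0
    return total / len(observed)


def k_fold_indices(n, k=10):
    """Split indices 0..n-1 into k folds of near-equal size (no shuffling)."""
    return [list(range(i, n, k)) for i in range(k)]


folds = k_fold_indices(437, k=10)
print([len(f) for f in folds])
print(round(smape([10, 20], [12, 18]), 4))
```

In a full analysis each fold would be held out in turn, the model refitted on the remaining folds, and SMAPE computed on the held-out predictions; the mean and standard deviation over the ten folds would then be reported.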

Table 2

Mixed effects model explaining understory cover recorded in the field (FIELD): TYPE = forest type based on overstory composition, STRATUM = vertical 1 m strata, ST1-ST3, and OVERSTORY = a measure of LiDAR vegetation cover in the vertical column above the stratum of interest calculated by classifying canopy cover (CC) into three classes (low, medium, high), see S1 Table.

The plot was treated as a random effect in each model.

Model Name	Model fixed effects structure	Biological interpretation
FRAC null	FRAC	Relationship between FRAC and FIELD is constant
FRAC * STRATUM	FRAC + STRATUM + FRAC*STRATUM	Relationship between FRAC and FIELD differs among STRATUM
FRAC * OVERSTORY	FRAC + OVERSTORY + FRAC*OVERSTORY	Relationship between FRAC and FIELD differs among OVERSTORY
FRAC * TYPE	FRAC + TYPE + FRAC*TYPE	Relationship between FRAC and FIELD differs among TYPE
NORM null	NORM	Relationship between NORM and FIELD is constant
NORM * STRATUM	NORM + STRATUM + NORM*STRATUM	Relationship between NORM and FIELD differs among STRATUM
NORM * OVERSTORY	NORM + OVERSTORY + NORM*OVERSTORY	Relationship between NORM and FIELD differs among OVERSTORY
NORM * TYPE	NORM + TYPE + NORM*TYPE	Relationship between NORM and FIELD differs among TYPE
VOX1m null	VOX1m	Relationship between VOX1m and FIELD is constant
VOX1m * STRATUM	VOX1m + STRATUM + VOX1m*STRATUM	Relationship between VOX1m and FIELD differs among STRATUM
VOX1m * OVERSTORY	VOX1m + OVERSTORY + VOX1m*OVERSTORY	Relationship between VOX1m and FIELD differs among OVERSTORY
VOX1m * TYPE	VOX1m + TYPE + VOX1m*TYPE	Relationship between VOX1m and FIELD differs among TYPE
LAD null	LAD	Relationship between LAD and FIELD is constant
LAD * STRATUM	LAD + STRATUM + LAD*STRATUM	Relationship between LAD and FIELD differs among STRATUM
LAD * OVERSTORY	LAD + OVERSTORY + LAD*OVERSTORY	Relationship between LAD and FIELD differs among OVERSTORY
LAD * TYPE	LAD + TYPE + LAD*TYPE	Relationship between LAD and FIELD differs among TYPE
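
The candidate set in Table 2 is a full cross of four LiDAR metrics with four fixed-effects structures. A minimal Python sketch that enumerates it is shown below; the formula strings are only a notation for this example (they mirror the table's layout), and the names `METRICS`, `MODIFIERS` and `fixed_effects` are hypothetical.

```python
from itertools import product

METRICS = ["FRAC", "NORM", "VOX1m", "LAD"]
MODIFIERS = [None, "STRATUM", "OVERSTORY", "TYPE"]  # None = null model


def fixed_effects(metric, modifier):
    """Build the fixed-effects part of one candidate model."""
    if modifier is None:
        return metric
    return f"{metric} + {modifier} + {metric}*{modifier}"


candidates = {}
for metric, modifier in product(METRICS, MODIFIERS):
    name = f"{metric} null" if modifier is None else f"{metric} * {modifier}"
    candidates[name] = "FIELD ~ " + fixed_effects(metric, modifier)

for name, formula in candidates.items():
    print(f"{name}: {formula}")
```

Every model in the set would additionally carry the random effect of plot, which is not encoded in these strings.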

We used Random Forest with the same FIELD response variable as in the mixed-effects models described above. Because Random Forests are non-parametric and do not yield a log-likelihood, we ran a stepwise procedure with 341 LiDAR derived variables (which include overstory estimates) (S1 Table), plus the secondary variables forest TYPE (from the Forest Resource Inventory) and STRATUM. We used mean decrease in accuracy to rank variable importance [37]. At each iteration, we removed the 20% least influential variables and compared the explained variance. Models were built using the randomForest package in R [37]. We examined the importance of variables in the suite of Random Forest models. Similar to the mixed effects models above, we quantified model performance with the percent variance explained and SMAPE based on 10-fold cross-validation. Finally, we compared the prediction performance of the mixed effects and Random Forest approaches.
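
The stepwise reduction described above can be sketched generically. The Python below is an assumption-laden illustration, not the authors' R procedure: `backward_eliminate` is a hypothetical helper, the importance function is a stand-in for refitting a forest and reading off mean decrease in accuracy, and the rounding rule for "20%" is a guess, so the subset sizes it produces need not match those reported for the study.

```python
import math


def backward_eliminate(features, importance_fn, drop_frac=0.2, min_features=5):
    """Return successive feature subsets, dropping the least important each round.

    importance_fn(features) must return a dict mapping feature -> importance.
    In practice it would refit the model on the current subset each time.
    """
    current = list(features)
    subsets = [list(current)]
    while len(current) > min_features:
        scores = importance_fn(current)
        ranked = sorted(current, key=lambda f: scores[f], reverse=True)
        n_drop = max(1, math.floor(drop_frac * len(ranked)))  # assumed rounding
        n_keep = max(min_features, len(ranked) - n_drop)
        current = ranked[:n_keep]
        subsets.append(list(current))
    return subsets


# Toy stand-in: a fixed importance per feature instead of a fitted forest.
toy_importance = {f"f{i}": float(i) for i in range(10)}
subsets = backward_eliminate(
    list(toy_importance),
    lambda feats: {f: toy_importance[f] for f in feats},
)
print([len(s) for s in subsets])
print(subsets[-1])
```

Comparing the explained variance across the returned subsets is what allows a much smaller predictor set to be chosen when performance barely changes.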

Results

Relationship among LiDAR metrics

The FIELD measure of understory cover was strongly correlated with all of the four main LiDAR metrics we investigated (Fig 2A–2D). However, the FRAC and VOX1m metrics were slightly more linearly correlated than the other metrics to the FIELD measure (Fig 2A–2D). Nonetheless, the four understory vegetation metrics were all highly correlated with one another (Table 3).

Fig 2

Scatterplot of FIELD (measured density) against the LiDAR metrics, a) fractional cover (FRAC), b) normalized cover (NORM), c) leaf area density (LAD), and d) voxel cover (VOX1m), including Pearson product-moment correlation coefficients.

Table 3

Pearson product-moment correlations between pairs of understory cover LiDAR metrics included in analysis (n = 1310).

Correlation	r	Lower 95% CL	Upper 95% CL
FRAC vs NORM	0.77	0.751	0.794
FRAC vs VOX1m	0.84	0.819	0.852
FRAC vs LAD	0.77	0.744	0.789
NORM vs LAD	0.81	0.79	0.827
NORM vs VOX1m	0.92	0.911	0.927
VOX1m vs LAD	0.79	0.767	0.808
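
The method used for the confidence limits in Table 3 is not stated. A common choice is the Fisher z-transformation, sketched below in Python under that assumption; `pearson_ci` is a hypothetical helper, and because the tabulated r values are rounded the limits it returns can differ from the table in the third decimal.

```python
import math


def pearson_ci(r, n, z_crit=1.959964):
    """Approximate 95% confidence interval for a Pearson correlation.

    Uses the Fisher z-transformation: atanh(r) is roughly normal with
    standard error 1 / sqrt(n - 3).
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


lower, upper = pearson_ci(0.92, 1310)  # NORM vs VOX1m, n from the table caption
print(round(lower, 3), round(upper, 3))
```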

Mixed-effects models

The model consisting of the voxel-based cover estimate (VOX1m) with STRATUM and their interaction was the most parsimonious among all sixteen models considered and had all the support (Akaike weight = 1, Table 4, Fig 3A). This model also had the highest conditional R² (along with the FRAC + STRATUM + interaction model), although all sixteen models had high conditional R² values (0.71–0.87). For each of the four LiDAR metrics we considered, we observed the same pattern: the addition of STRATUM and the interaction to the null models resulted in consistently better model performance in terms of delta AIC and R². The addition of OVERSTORY or TYPE resulted in much less model improvement than the addition of STRATUM. The model with the most support did not include forest type or overstory, which is important since forest type was derived from forest inventory data and cannot be extracted from LiDAR point clouds.

Table 4

R2 and AIC values for sixteen candidate linear mixed-effects models.

Model	Marginal R²	Conditional R²	AIC	Delta AIC	Akaike weight
VOX1m * STRATUM	0.62	0.87	11868.87	0	1
FRAC * STRATUM	0.65	0.87	11901.00	32.13	0
LAD * STRATUM	0.56	0.82	11998.29	129.42	0
NORM * STRATUM	0.52	0.83	12099.16	230.29	0
VOX1m * OVERSTORY	0.60	0.82	12348.32	479.45	0
LAD * OVERSTORY	0.51	0.73	12384.88	516.01	0
VOX1m * TYPE	0.60	0.82	12384.88	516.01	0
VOX1m null	0.60	0.82	12385.78	516.91	0
LAD * TYPE	0.51	0.72	12396.42	527.55	0
LAD null	0.50	0.71	12407.11	538.24	0
NORM * OVERSTORY	0.53	0.75	12450.66	581.79	0
NORM * TYPE	0.51	0.75	12563.97	695.10	0
NORM null	0.49	0.75	12568.40	699.53	0
FRAC * OVERSTORY	0.58	0.77	12585.04	716.17	0
FRAC * TYPE	0.57	0.75	12613.19	744.32	0
FRAC null	0.56	0.75	12617.05	748.18	0
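
Delta AIC and Akaike weights follow directly from the AIC column. The Python sketch below uses the standard formula, weight_i = exp(−Δ_i/2) / Σ_j exp(−Δ_j/2), applied to the four best models in Table 4; `akaike_weights` is a hypothetical helper, and restricting the set to four models is a simplification for the example (the remaining twelve contribute a negligible amount to the denominator).

```python
import math


def akaike_weights(aic_values):
    """Return (delta AIC, Akaike weight) lists for a set of models."""
    best = min(aic_values)
    deltas = [a - best for a in aic_values]
    relative = [math.exp(-0.5 * d) for d in deltas]  # relative likelihoods
    total = sum(relative)
    return deltas, [r / total for r in relative]


aic = {
    "VOX1m * STRATUM": 11868.87,
    "FRAC * STRATUM": 11901.00,
    "LAD * STRATUM": 11998.29,
    "NORM * STRATUM": 12099.16,
}
deltas, weights = akaike_weights(list(aic.values()))
for name, d, w in zip(aic, deltas, weights):
    print(f"{name}: delta = {d:.2f}, weight = {w:.6f}")
```

A delta AIC of about 32 for the second-ranked model corresponds to a relative likelihood on the order of 1e-7, which is why the top model's weight rounds to 1.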

Fig 3

Predicted versus observed scatterplot.

(a) Predictions of FIELD generated from mixed-effects model consisting of VOX1m + STRATUM + interaction, (b) Predictions of FIELD generated from Random Forest model with 59 explanatory variables.

Note that in Table 4 marginal R² denotes the percent variance explained by the fixed effects, whereas the conditional R² includes both fixed effects and random effects. Delta AIC is the difference in AIC between each model and the most parsimonious model, and Akaike weight indicates the percent support of a given model.

The four LiDAR metrics had positive slopes in all of the mixed effects models (Fig 4, Table 5, for example). In our best model, the intercept of the lowest STRATUM was higher than in the upper strata (Fig 4). Although the model included the interaction between STRATUM and voxel cover, there was no evidence of different slopes of LiDAR among strata (Fig 4, Table 5). The symmetric mean absolute percentage error (SMAPE) for the top-ranked mixed effects model was 0.156, but this value varied when each stratum was investigated separately (Table 6). The SMAPE value was lowest for the lowest stratum (0.107) and greatest for the highest stratum (0.190), suggesting that occlusion did not reduce predictability. There were 437 observations for each stratum.

Fig 4

Predictions of FIELD for each of three strata based on the mixed-effects model consisting of VOX1m + STRATUM + interaction.

Dashed lines around solid lines denote 95% confidence intervals around predictions.

Table 5

Estimates of the best supported mixed-effects model consisting of VOX1m + STRATUM + interaction and a random effect of plot.

	Estimate	Lower 95% CL	Upper 95% CL
intercept	64.35	60.25	68.46
LIDAR	0.03	0.29	0.32
STRATUM.ST2	-21.94	-25.96	-17.98
STRATUM.ST3	-29.38	-33.48	-25.28
LIDAR*STRATUM.ST2	-0.016	-0.039	0.008
LIDAR*STRATUM.ST3	-0.010	-0.037	0.017

Table 6

Ten-fold cross-validation results from top linear mixed-effects model and the selected Random Forest model, based on symmetric mean absolute percentage error (SMAPE).

Note that average values of SMAPE are given for predictions of all STRATUM levels, but also for predictions specific to STRATUM levels.

Model		SMAPE mean	SMAPE sd (n = 10)
VOX1m * STRATUM	predictions of all STRATUM levels	0.156	0.014
	predictions of STRATUM 1	0.107	0.016
	predictions of STRATUM 2	0.170	0.024
	predictions of STRATUM 3	0.190	0.020
Random forest (59 predictors)		0.129	0.015

Random forest models

We examined the percent variance explained and the number of variables included to choose a final Random Forest model. The base model with all 341 LiDAR-derived variables, forest TYPE, and STRATUM explained 74.8% of the variance, but the final model with only 59 predictors had a very similar variance explained (74.4%) (Fig 3B, Table 7, S2 Table). The 10-fold cross-validation on this reduced model showed an overall mean error rate of 0.129 (Table 6).

Table 7

Random forest models: Mean squared residuals and percent variance explained.

Number of Predictors in model	Mean Squared Residuals	Percent variance Explained
341 (Base model)	484	74.8
276	485	74.7
223	485	74.8
180	484	74.7
145	476	75.2
116	486	74.7
93	481	75.0
74	492	74.3
59	490	74.4
47	513	73.3
37	508	73.5
29	531	72.4
22	553	71.2
17	528	72.5
13	558	70.9
10	580	69.8
7	569	70.4
5	632	67.1

Some variables appeared more often than others among the 18 Random Forest models considered. These variables consisted of STRATUM, GAP (the inverse of LAD), and LAD. In addition, most or all of the LiDAR understory vegetation cover metrics (VOX1m, FRAC, NORM) were represented in the top 10 variables of most of the 18 potential models (S3 Table). Crown closure (CC), an estimate of overstory, was also often among the top 10 most important variables within the models considered. Forest TYPE never occurred among the top 10 variables (S3 Table).

Discussion

In this study, our primary objective was to quantify the capacity of LiDAR to estimate understory structure so that it can be predicted across a landscape. To address this objective, first we compared the effectiveness of four possible understory LiDAR metrics (fractional cover, leaf area density, voxel cover, and normalized cover) for predicting understory cover. Each of these metrics used some measure of the number or presence of LiDAR returns in an understory vertical stratum and standardized these measures with an estimate of sampling density. All four LiDAR metrics were effective at predicting the amount of structure in an understory stratum, probably because they are all highly correlated direct measures of the density of understory vegetation. The best metric based on mixed effects modelling, however, was the voxel-based cover estimate (VOX1m) with the addition of STRATUM with a conditional R2 of 0.87. The voxel-based approach is relatively easy to calculate and provides a direct measure of the amount of understory structure. We anticipated that other variables could influence the predictions of understory. We identified three potentially important variables that might influence occlusion of understory structure: overstory, forest type and stratum. Increased overstory can reduce the ability of LiDAR to predict understory structure due to occlusion [26, 27]. For LiDAR to detect the understory structure, LiDAR pulses must reach and be reflected by understory vegetation. A greater vegetation interception above the area of interest will result in fewer pulses returning from the understory. Both forest type and stratum will also influence the amount of vegetation in the area above the area of interest and therefore potentially alter the relationship of field measured and LiDAR measured understory. Correlations between the three secondary explanatory variables (STRATUM, forest TYPE, and OVERSTORY) made it impossible to include all variables in a single model. 
Our best supported model included STRATUM, where we found that the lowest stratum (ST1, 0.5–1.5 m) had the highest intercept. This is consistent with occlusion in that we have more vegetation in ST1 than ST2 (1.5–2.5 m) and ST3 (2.5–3.5 m) for a given value of VOX1m. This is consistent with the idea that fewer laser pulses are reaching the lower stratum. The relationship between the field observed structure and VOX1m did not vary with STRATUM. Surprisingly, we found that the error in the predicted relationship was greatest in the highest STRATUM and lowest in the lowest STRATUM suggesting that there was no reduction in predictability associated with potential occlusion. These differences in prediction error suggest that the model can better predict new observations in the low stratum than the high stratum. A potential explanation for this result would be that the understory vegetation in the lower stratum is easier to estimate on the ground and therefore there is less noise in the relationship between the field and the LiDAR measures in the lower stratum. Either way, we conclude that our LiDAR sampling intensity was sufficient in our forest system to capture the understory structure regardless of the density of vegetation above the area of interest and the related potential for occlusion. There is some discrepancy in the literature on the effect of occlusion. Latifi et al. [29] found that thinning LiDAR data by artificially reducing the sampling density did not impact the effectiveness of models to predict understory. Their original data had a high point density of 30–40 points per m2 and a maximum of 11 returns. Data were thinned to two different levels but Latifi et al. [29] do not report on the final point density after thinning. Our data are at roughly 11.69 vegetation returns per m2, with about 0.55 vegetation returns per m3 in the 0.5–4 m understory stratum. 
Obviously, the effectiveness of LiDAR to capture understory structure will eventually be undermined by a sufficient reduction in sampling density, but this limit does not seem to have been reached in the Petawawa Research Forest. Gonzalez-Ferreiro et al. [38] showed that reducing pulse density from 8 pulses per m2 to 0.5 pulses per m2 did not decrease model precision in estimating stand variables. Wing et al. [28] found no trends between understory vegetation cover prediction error and canopy cover, lending support to the idea that under some natural overstory conditions and common LiDAR sampling densities, occlusion is not an issue for predicting understory with LiDAR. In contrast, Ruiz et al. [19] reported an effect of LiDAR sampling density on model R2 values but only at levels below around 5 points/m2. It is unclear how this number translates into pulses reaching the understory. The lack of influence of forest type on understory cover predictions enables predicting understory from LiDAR alone without relying on traditional forest resource inventory data. The comparisons of mixed effects and Random Forest models revealed some obvious alignment. All four of the LiDAR metrics considered (fractional cover, leaf area density, normalized cover, and voxel cover) produced models with high R2 values. All four of these variables also had very high variable importance in the Random Forest models. Voxel cover (VOX1m) was the most important variable in the selected Random Forest model. The stratum variable appeared often in the top Random Forest models and was also important in the top-ranked mixed-effects model (VOX1m * STRATUM). The Random Forest model had a high variance explained (75%), but not as high as the best mixed effects model that included the voxel-based measure of cover (87%). 
Our selected Random Forest model had 59 explanatory variables, whereas the best mixed effects model had two explanatory variables and their interaction, as well as a random effect of plot. Other variables with high importance in the Random Forest models included other direct measures of understory structure, and canopy closure (S2 Table), which is expected to influence the amount of vegetation in the understory through light availability. The prediction error was slightly lower for the random forest model than for the mixed effects model (12.9% vs 15.6%), albeit at the cost of including 59 explanatory variables compared to 8 parameters estimated in the mixed effects model. Based on our results, generating landscape-wide predictions using the mixed-effects model should be simpler and more efficient than with the Random Forest model. For these reasons (12% higher explained variance, fewer explanatory variables, and similar prediction error), we recommend the mixed effects model for predicting understory vegetation structure with LiDAR, but we acknowledge that the Random Forest model also generates robust predictions. Direct evaluations of LiDAR metrics to capture understory cover are relatively rare. Studies have shown good agreement between field and LiDAR measures of forest stand biomass [39], but biomass is likely driven primarily by tree biomass rather than understory. Asner et al. [40] explored structural transformation of rain forests due to invasive plants and used LiDAR to estimate structural changes in the understory. However, Asner et al. [40] did not report quantitative comparisons of field and LiDAR measures. Martinuzzi et al. [41] produced classification accuracies of 83% in predicting the presence of shrubs, but not their abundance. Wing et al. [28] compared understory vegetation cover and airborne LiDAR estimates with the addition of a filter for intensity values in an interior ponderosa pine forest. 
Their models had R2 values from 0.7 to 0.8 and accuracies of ± 22%. Our models achieved slightly higher R2 with slightly lower error rates without the use of the intensity filter, suggesting that the latter filter may not always be necessary to generate good estimates. As well, the intensity filter is affected by a number of factors such as elevation and the nature of the object intercepted that are difficult to normalize, so we prefer models that do not require intensity filters. Latifi et al. [29] also made a direct comparison of ground-based vs LiDAR estimates of understory cover in temperate mixed stands, and found strong relationships in the top canopy and the herbal layer with lower predictive power in the intermediate stand layers. Their shrub layer regression model had a relatively low R2 value of 37%. In a later study, Latifi et al. [23] showed an R2 of 80% for the shrub layer based on thinned LiDAR point clouds and new analytical methods. Campbell et al. [21] also compared field and LiDAR measures of understory directly in mixedwood forests and generated an R2 of 0.44 based on a relative point density similar to metrics that we used here. It is unclear why there is so much variation in the ability of LiDAR to predict understory structure, but it suggests that we should be somewhat cautious in assuming that individual LiDAR metrics are always capturing the understory structure. It is important to note that some of the error in prediction in our models is likely the result of the lag between the LiDAR acquisition (2012) and the field data acquisition (2016–2017). This lag is likely to result in the most error in the youngest stands, where changes in herb and shrub growth are likely to be greatest, but in the analysis, most stands are mature forest. With less lag between the LiDAR and ground-based measures, we would likely have seen even better predictions. 
In addition, the error associated with GPS locations can introduce error into the relationship between ground-based and LiDAR estimates, although GPS technology is constantly improving. Our GPS (SXblue) reports sub-meter accuracy under ideal conditions, but discrepancy in geoposition probably accounts for some of the error in prediction. Despite the limited work directly evaluating LiDAR measures of understory vegetation structure, many studies have explored the use of LiDAR to capture wildlife habitat structure, some of which is related to understory [42–46]. One of the most commonly reported relationships is between vegetation structural diversity or understory density and wildlife diversity [5, 47–49]. In addition, vegetation understory structure explained bird species composition in a number of studies [5, 50, 51]. Melin et al. [52] found that a LiDAR metric similar to fractional cover, used to estimate shrub density below 5 m, was a good predictor of grouse brood occurrence in Finland, consistent with expectations based on known habitat preferences of the species. However, they did not test the assumption that the LiDAR metric effectively estimates vegetation density below 5 m. All of these studies do, however, provide indirect evidence for the effectiveness of LiDAR estimates to predict understory cover or density.

Conclusions

Based on the highest variance explained, the smaller number of explanatory variables, and ease of interpretation and application, we recommend using the mixed-effects model consisting of voxel-based cover estimate, stratum, and their interaction to generate spatial estimates of understory cover. Nonetheless, all four LiDAR metrics that we considered and both analytical approaches (mixed effects models, Random Forests) produced predictions suitable for many ecological and forest planning applications. This information could improve spatially-explicit mapping of wildlife habitat, fire behaviour, or forest ecosystem dynamics. Measuring understory cover in situ is not difficult, but many forest management and conservation applications require maps or spatial estimates of attributes over large areas. LiDAR remote sensing is the most efficient approach to generating these spatial estimates of forest attributes. Our results fully support the indirect evidence provided from wildlife studies that LiDAR can predict understory vegetation structure even in the presence of a mature tree canopy. With error percentages of around 15%, these spatial predictions will carry some uncertainty, which should be factored into decision-making. With increasing sampling density associated with better LiDAR technology, we anticipate that understory cover models will become more reliable and generalizable across regions. In particular, because the models use direct measures of vegetation cover and do not depend on any ecological relationships per se, we believe that under similar sampling densities the models should be generalizable. Additional testing of this approach in different forested ecosystems would provide more confidence in the transferability of the models.
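To make the recommended model structure concrete, the following sketch shows how population-level predictions from a model with a voxel-cover slope, a stratum effect, and their interaction might be computed for a new location, with the random plot effect set to zero. All coefficient values are hypothetical placeholders and are not estimates from this study; the treatment coding with ST1 as the reference level, the clamping of predictions to 0–100%, and the way the parameters are counted are likewise assumptions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FixedEffects:
    """Fixed effects of a cover ~ VOX * STRATUM model (treatment coding)."""
    intercept: float                   # reference stratum intercept
    slope: float                       # reference stratum slope on voxel cover
    stratum_offset: Dict[str, float]   # intercept shift; reference level is 0.0
    stratum_slope: Dict[str, float]    # slope shift; reference level is 0.0


# HYPOTHETICAL placeholder coefficients, chosen only so the example runs.
DEMO = FixedEffects(
    intercept=10.0,
    slope=0.5,
    stratum_offset={"ST1": 0.0, "ST2": -4.0, "ST3": -6.0},
    stratum_slope={"ST1": 0.0, "ST2": 0.25, "ST3": 0.5},
)


def predict_cover(vox: float, stratum: str, fx: FixedEffects = DEMO) -> float:
    """Population-level prediction (random plot effect taken as zero)."""
    raw = (fx.intercept + fx.stratum_offset[stratum]
           + (fx.slope + fx.stratum_slope[stratum]) * vox)
    # Assumed design choice: keep predictions within a valid percent range.
    return min(100.0, max(0.0, raw))


def n_estimated_parameters(fx: FixedEffects) -> int:
    """One possible accounting: fixed effects plus two variance terms."""
    fixed = 2 + (len(fx.stratum_offset) - 1) + (len(fx.stratum_slope) - 1)
    variances = 2  # assumed: random plot intercept variance and residual variance
    return fixed + variances


if __name__ == "__main__":
    for stratum in ("ST1", "ST2", "ST3"):
        print(stratum, predict_cover(50.0, stratum))
    print(n_estimated_parameters(DEMO))
```

With three strata, this accounting gives six fixed effects and two variance terms, which matches the 8 estimated parameters reported in the discussion, although the text does not spell out how that count was reached.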

Definitions of all variables included in at least one of the mixed-effects or Random Forest models.

(DOCX)

Rank importance of explanatory variables in the selected Random Forest model with 59 variables.

(DOCX)

Frequency of explanatory variables among the 18 Random Forest models run with 341 to 7 variables.

(DOCX)

Data used in analyses for manuscript.

Variable definitions are found in S1. (ZIP)

30 Aug 2019

PONE-D-19-19180
Modelling vegetation understory cover using LiDAR metrics
PLOS ONE

Dear Venier,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

ACADEMIC EDITOR: I agree with both reviewers that the study was well done, of interest to the broader scientific community, and requires only minor revision before acceptance for publication.

We would appreciate receiving your revised manuscript by Oct 14 2019 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out. We look forward to receiving your revised manuscript. Kind regards, John Toland Van Stan II, Ph.D. Academic Editor PLOS ONE Journal Requirements: 1. When submitting your revision, we need you to address these additional requirements. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Yes ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: Yes ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? 
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: General Comments I enjoyed reading the manuscript and thought it well done. The comparison of models is interesting and the results are interesting and unique. I think the specific comparison of model input variables has the potential to reinforce the findings and provide some deeper ecological insight (see below). For example, which variables show up in both top ranking models? Why do you think that is in the system you examined? 
I think placing figures 3 and 5 together (one below the other) would be useful to really compare the differences… Figure 4 needs larger text and more contrast, perhaps with different line patterns. (line type in R). It is hard to read. Hypotheses need to be included in the final paragraph of your introduction. You make reference to them in the discussion. I would like to see some commentary on why you think the variables that floated to the top in both models were what they were. What do you think their ecological significance is? Or perhaps, why that understory structure shows up that way in the data. Do you think these models or indices would be relevant in other forests or ecosystems? Specific Comments 24 “… among other things” sounds very weak. I suggest changing this sentence to get a reader’s attention. Suggestion: “Forest understory vegetation is an important characteristic of the forest, but hard to measure with current remote sensing tools.”, or something along those lines. 40 If the random forest model had lower error, why did you choose the mixed effects model? Hope to return to this later. 52 remove the word “potentially”, it is redundant. 53 I would qualify this statement… Lidar itself is just data, the estimates come when models are created, which is what you are doing. Therefore, Lidar “can provide estimates” or something like that would be more apt. As you know, understory vegetation is typically obscured by the dominant canopy, which is why your models would be useful. 56-58 The most important measurement here is the time of flight of the reflected pulse, not just the pulse itself, which would give you the intensity of the return, so that piece of information in the third sentence here should come earlier. I think these sentences could be combined into a more concise description. 68-72 I don’t think you need to state these things here, as you repeat them in the remaining part of the introduction. 
If anything, these things should come at the very end of the introduction when you are tying everything together that you have presented thus far. 77 Heterogeneity is also caused by topography and the vegetation structure itself. And you should definitely have some citations here. Here’s one off the top of my head: Goodwin, N. R., Coops, N. C., & Culvenor, D. S. (2006). Assessment of forest structure with airborne LiDAR and the effects of platform altitude. Remote Sensing of Environment, 103(2), 140-152. 82 I would include other citations here… there have been numerous efforts to normalize lidar point density. For example: Ruiz, L., Hermosilla, T., Mauro, F., & Godino, M. (2014). Analysis of the influence of plot size and LiDAR density on forest structure attribute estimates. Forests, 5(5), 936-951. And take a look at this one: Jakubowski, M. K., Guo, Q., & Kelly, M. (2013). Tradeoffs between lidar pulse density and forest measurement accuracy. Remote Sensing of Environment, 130, 245-253. 92 You need a citation here concerning machine learning. 93 Also a citation here. 96 Citation here 99 And citation here. To make declarative statements about model parameters and interpretability like these, you should be referencing something. 118 What hypotheses do you have concerning variable importance, model fit, etc. that you could reference in the discussion? 144 I’d like to see something here concerning the justification for stratifying your plot locations based on the data that you are trying to predict with… it seems somewhat circular. I think it is ok, as that is really the only way you could come up with a stratification based on understory across a landscape, but you should still address this issue. 154 What was the mean/variation of the horizontal precision of your GPSed plot coordinates? Sub meter could mean 0.01m or 0.99m 163 The diagram is very helpful in understanding the plot design. 180 Could you justify the temporal discrepancy here? 
I am wondering how much the understory might have changed in the intervening 3 and 4 years between lidar acquisition and data collection. 189 I like this section. Easy to follow. 256 “appeared” is not a precise word. Perhaps something like “were slightly more linearly correlated”. The difference in correlation values is very small and I would assume not significant in a statistical sense. 272 “For each of the four…” 275 This is an interesting finding! 296 This should be referencing back to your initial hypotheses, but you didn’t present any in your final introductory paragraph. 299 The labels on this figure should be larger. 321 based on the lower error rate and looking at the scatter in Figure 5, I would say that the random forest model did a much better job in predicting understory strata. 331 I would like to see the ranking of the most important variables in your final random forest model. 345 This is from your mixed effects model. I personally think the random forest model did a better job of prediction, although as you state in the introduction, not as easy to interpret as the linear model. But when using such rich data and derivatives as possible with lidar, you might as well have a model that includes all relevant information, such as a random forest model (in my opinion, you might disagree). Perhaps here though, you could compare which variables were included in your top ranking linear models to the most important ranked variables in your final random forest model. Is there overlap? I think this would not only serve to justify the models and variables, but to compare what the models themselves say about the relevant predictor variables. 348 Your hypothesis should come before the discussion and reference them here. 389 You don’t mention random forest models up until this point. 
All of your discussion thus far comes off as there only being one model type explored… I would mention random forest earlier and like you did for mixed effects, discuss the final model and the most important variables within that model. 396 Variance explained doesn’t matter as much as the error… the random forest model had a lower rate of error and that should be stated here to counterbalance the statement about variance. 428-431 Yes! This is what I was thinking when I read the methods. I think you need to expand on this. Would your results been similar if they were collected during the same year? I think this is a really important point and shouldn’t just come at the end of a paragraph. 443 You talk a lot about the data and the models… I would like to see some hypothesis building about why certain metrics were better and how they are directly related to the structure. 443 I would also like to know, as a scientist, how you think these models, indices, metrics, etc. might work in other forests? Do you think they are particular to this ecosystem? Reviewer #2: PLOS ONE – Manuscript ID: PONE-D-19-19180 Title: Modelling vegetation understory cover using LiDAR metrics Reviewer comments: Overall comments: This is a well-written paper with logical flow that is easy to follow, has sufficient detail for replication by others, and use of terminology and acronyms is appropriate for audience. It is obvious a lot of hard work went into this project, and this paper does a great job of explaining it. I am especially impressed by the thorough lit review and excellent figures and tables, which some authors don’t put a lot of effort into. I have a few very minor comments, below. Abstract Line 27: has yet to be fully validated. Introduction Line 86 (and as it occurs thereafter): some papers cite Random Forests with capital letters. I think it is fine either way. Excellent lit review. The authors did an especially good job exploring various options for analysis. 
Methods Line 126: extra space between composition features Thorough Methods section. I like Figure 1. Analysis Line 214: insert space between variables (26) Line 229: insert space between package (18… Thorough presentation of analysis methods. Results Solid statistical analysis methods and presentation of results. There are a lot of table but I’m not sure how to suggest minimizing them as they all contain pertinent info to the study. _________________________________ Overall references look good, but I did not take a close look at each of them. Figures and Tables look great! ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: Yes: JONATHON J DONAGER Reviewer #2: No [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. 
Please note that Supporting Information files do not need this step. Submitted filename: PONE-D-19-19180_reviewer_comments.docx Click here for additional data file. 15 Oct 2019 Response to Reviewers for PONE-D-19180 Reviewer #1: General Comments I enjoyed reading the manuscript and thought it well done. The comparison of models is interesting and the results are interesting and unique. I think the specific comparison of model input variables has the potential to reinforce the findings and provide some deeper ecological insight (see below). For example, which variables show up in both top ranking models? Why do you think that is in the system you examined? Thank you for your comments. See below for specific responses. Line numbers cited are from the Track Changes Version I think placing figures 3 and 5 together (one below the other) would be useful to really compare the differences… Done, now labelled as 3a and b. Figure 4 needs larger text and more contrast, perhaps with different line patterns. (line type in R). It is hard to read. Done (added colour and enlarged text). Hypotheses need to be included in the final paragraph of your introduction. You make reference to them in the discussion. We do not use the term hypothesis or hypotheses in the paper. The paper is not structured as a test of any hypotheses. It is about quantifying capacity of LiDAR to estimate understory. We re-read the discussion and don’t see any reference to hypotheses, although there is some discussion of the possible effect of occlusion. But this is more about anticipating what we might need to account for to generate good predictions rather than any kind of ecological hypothesis. We would prefer not to structure the paper in terms of hypotheses since the primary objectives do not fit well with that structure ie can we predict understory with LiDAR, are there some measures of understory that work better than others, and which modelling approach gives the best predictions. 
We did not have any a priori expectations on these three objectives. I would like to see some commentary on why you think the variables that floated to the top in both models were what they were. What do you think their ecological significance is? Or perhaps, why that understory structure shows up that way in the data. Do you think these models or indices would be relevant in other forests or ecosystems? We included sentences in each section to indicate the significance of the variables, ie why they are likely important. It’s not generally an ecological reason. For example, it looks like STRATUM is important but not because of occlusion and maybe because of the improved ability to field sample vegetation in the lower strata. We modified the text to “These differences in prediction error suggest that the model can better predict new observations in the low stratum than the high stratum. A potential explanation for this result would be that the understory vegetation in the lower stratum is easier to estimate on the ground and therefore there is less noise in the relationship between the field and the LiDAR measures in the lower stratum.” Lines 387-391 Or for the understory LiDAR metrics… We also modified the text to “All four LiDAR metrics were effective at predicting the amount of structure in an understory stratum, probably because they are all highly correlated direct measures of the density of understory vegetation. The best metric based on mixed effects modelling, however, was the voxel-based cover estimate (VOX1m) with the addition of STRATUM with a conditional R2 of 0.87. The voxel-based approach is relatively easy to calculate and provides a direct measure of the amount of understory structure. 
“ Lines 362-368 We added the following sentence to the discussion on RF models to address the additional variables included in the model: “Other variables with high importance in the Random Forest models included other direct measures of understory structure, and canopy closure (S2 Table), which is expected to influence the amount of vegetation in the understory through light availability. The prediction error was slightly lower for the random forest model than for the mixed effects model (12.9% vs 15.6%), albeit at the cost of including 59 explanatory variables compared to 8 parameters estimated in the mixed effects model. Based on our results, generating landscape-wide predictions using the mixed-effects model should be simpler and more efficient than with the Random Forest model. For these reasons (12% higher explained variance, fewer explanatory variables, and similar prediction error), we recommend the mixed effects model for predicting understory vegetation structure with LiDAR. ” Lines 424-435 To address the generalizability of the models we added the following to the Conclusion: “With increasing sampling density associated with better LiDAR technology, we anticipate that understory cover models will become more reliable and generalizable across regions. In particular, because the models are not dependent on any ecological relationships per se, because they use direct measures of vegetation cover, we believe that under similar sampling densities the models should be generalizable. Additional testing of this approach in different forested ecosystems would provide more confidence in the transferability of the models. “ Lines 498-504 Specific Comments 24 “… among other things” sounds very weak. I suggest changing this sentence to get a reader’s attention. Suggestion: “Forest understory vegetation is an important characteristic of the forest, but hard to measure with current remote sensing tools.”, or something along those lines. 
We modified the first two sentences of the abstract to reflect this comment: see below for corrected sentences. “Forest understory vegetation is an important characteristic of the forest. Predicting and mapping understory is a critical need for forest management and conservation planning, but it has proved difficult with available methods to date.” Lines 24-27 40 If the random forest model had lower error, why did you choose the mixed effects model? Hope to return to this later. The error in the random forest was not much lower than the one for the mixed effects model, whereas the variance explained by the mixed-effects model was much higher (12%) and used 51 fewer explanatory variables. For these reasons, we choose the mixed effects model, but we also clearly stated that the RF model was good too. We modified the text to clarify this point. “Our selected Random Forest model had 59 explanatory variables, whereas the best mixed effects model had two explanatory variables and their interaction, as well as a random effect of plot. Other variables with high importance in the Random Forest models included other direct measures of understory structure, and canopy closure (S2 Table), which is expected to influence the amount of vegetation in the understory through light availability. The prediction error was slightly lower for the random forest model than for the mixed effects model (12.9% vs 15.6%), albeit at the cost of including 59 explanatory variables compared to 8 parameters estimated in the mixed effects model. Based on our results, generating landscape-wide predictions using the mixed-effects model should be simpler and more efficient than with the Random Forest model. For these reasons (12% higher explained variance, fewer explanatory variables, and similar prediction error), we recommend the mixed effects model for predicting understory vegetation structure with LiDAR. ” Lines 421-435 52 remove the word “potentially”, it is redundant. 
Done as suggested 53 I would qualify this statement… Lidar itself is just data, the estimates come when models are created, which is what you are doing. Therefore, Lidar “can provide estimates” or something like that would be more apt. As you know, understory vegetation is typically obscured by the dominant canopy, which is why your models would be useful. We reworded “Active remote-sensing technology such as LiDAR (light detection and ranging) could be used to generate estimates to address this issue.” Lines 54-56 56-58 The most important measurement here is the time of flight of the reflected pulse, not just the pulse itself, which would give you the intensity of the return, so that piece of information in the third sentence here should come earlier. I think these sentences could be combined into a more concise description. We reworded “LiDAR provides an estimate of the three-dimensional forest structure including estimates of canopy structure, understory vegetation, and terrain. LiDAR is a survey method that measures the return time of a laser light pulse reflecting off solid objects such as the vegetation or the ground. These laser returns generate a three-dimensional representation of the forest.”Lines 57-60 68-72 I don’t think you need to state these things here, as you repeat them in the remaining part of the introduction. If anything, these things should come at the very end of the introduction when you are tying everything together that you have presented thus far. Okay, we have removed these statements as suggested. 77 Heterogeneity is also caused by topography and the vegetation structure itself. And you should definitely have some citations here. Here’s one off the top of my head: Goodwin, N. R., Coops, N. C., & Culvenor, D. S. (2006). Assessment of forest structure with airborne LiDAR and the effects of platform altitude. Remote Sensing of Environment, 103(2), 140-152. We are struggling with this comment a bit. 
We read the suggested paper but did not find the specific reference to topography and veg structure influencing sampling density. We found another reference that supported the topography idea but not the vegetation structure. We added the idea of topography influencing sampling density with the new citation. We are not sure that we understand how vegetation structure would influence sampling density. Vegetation structure influences point density, but what we were interested in here were the variables that create noise in the relationship between vegetation density and point density. 82 I would include other citations here… there have been numerous efforts to normalize lidar point density. For example: Ruiz, L., Hermosilla, T., Mauro, F., & Godino, M. (2014). Analysis of the influence of plot size and LiDAR density on forest structure attribute estimates. Forests, 5(5), 936-951. And take a look at this one: Jakubowski, M. K., Guo, Q., & Kelly, M. (2013). Tradeoffs between lidar pulse density and forest measurement accuracy. Remote Sensing of Environment, 130, 245-253. We added these citations to the discussion on normalizing lidar point density 92 You need a citation here concerning machine learning. We added Cutler et al. 2007 on using Random Forest in Ecology 93 Also a citation here. We now cite Latifi et al. 2017 and Penner et al. 2013 (refs 23 and 24) 96 Citation here We now cite De’ath 2000 on machine learning 99 And citation here. To make declarative statements about model parameters and interpretability like these, you should be referencing something. We now cite De’ath 2000 118 What hypotheses do you have concerning variable importance, model fit, etc. that you could reference in the discussion? We haven’t structured the paper in terms of hypothesis testing. The main objective is to evaluate the capacity of LiDAR to generate good predictions of understory vegetation cover. 
We could add some hypotheses in the sense of ideas about how best to generate those predictions but we see it as somewhat artificial to call these the hypotheses of the paper. So for example we don’t have a hypothesis about whether machine learning or mixed effects models will be better for prediction. We don’t have any hypotheses about which of the 4 direct measures of understory are likely to be best for prediction. We would prefer to structure the paper in terms of well stated objectives rather than hypotheses. 144 I’d like to see something here concerning the justification for stratifying your plot locations based on the data that you are trying to predict with… it seems somewhat circular. I think it is ok, as that is really the only way you could come up with a stratification based on understory across a landscape, but you should still address this issue. Our intent with the stratification was to fill the model prediction space as much as possible. We knew that sites with more overstory were on average likely to have less understory because of the limited light availability. But we were interested in whether or not occlusion would play a role in prediction accuracy so we needed to try to have sites with both lots of overstory and lots of understory or vice versa to fill in the modelling space. So rather than just using a random approach that would have skewed the sample to the common conditions, we were selective in trying to represent the more uncommon conditions. We used the only data we had to improve the sampling to capture the uncommon conditions. We don’t believe this created any bias in the data but it may not have been very effective if the LiDAR data was not at all representative of the actual understory vegetation condition. We have added a few sentences to the methods to acknowledge this reasoning. 
“We acknowledge that this stratification would not be effective if the relative number of LiDAR pulse returns was unrelated to actual understory vegetation cover. However, it was the most intuitive method to ensure that all overstory and understory conditions were represented in the sample.” Lines 159-163 154 What was the mean/variation of the horizontal precision of your GPSed plot coordinates? Sub meter could mean 0.01m or 0.99m. We don’t have these data. The reference manual indicates sub meter accuracy but we did not test this. We think the important point is that error in GPS location will introduce error but even with this error we are able to make good predictions. We added a citation of the reference manual and added some text on the implications of GPS error to the discussion. “In addition, the error associated with GPS locations can introduce error into the relationship between ground-based and LiDAR estimates, although GPS technology is constantly improving. Our GPS (SXblue) reports sub meter accuracy under ideal conditions, but discrepancy in geoposition probably accounts for some of the error in prediction.” Lines 466-470 163 The diagram is very helpful in understanding the plot design. Thank you. 180 Could you justify the temporal discrepancy here? I am wondering how much the understory might have changed in the intervening 3 and 4 years between lidar acquisition and data collection. We can’t justify it. Ideally the ground plots should be sampled immediately after the LiDAR acquisition, but that was not possible for a variety of reasons. We have no way to quantify the change in the understory over the 3-4 years. Likely this is not very significant in mature forests, but might be in disturbed stands. In our case most stands were mature. We have discussed this as a likely source of error but one that, based on the results, did not undermine, to any great degree, the predictive capacity of LiDAR. 189 I like this section. Easy to follow. Thank you. 
256 “appeared” is not a precise word. Perhaps something like “were slightly more linearly correlated”. The difference in correlation values is very small and I would assume not significant in a statistical sense. Okay we changed this 272 “For each of the four…” Changes as suggested 275 This is an interesting finding! 296 This should be referencing back to your initial hypotheses, but you didn’t present any in your final introductory paragraph. We reworded this sentence to remove the term expectations, and stated explicitly the implications from the result. We would prefer not to structure the paper in terms of hypothesis testing. “The SMAPE value was lowest for the lowest strata (0.107) and greatest for the highest strata (0.190) suggesting no evidence of occlusion.”Lines 314-316 299 The labels on this figure should be larger. We enlarged the text 321 based on the lower error rate and looking at the scatter in Figure 5, I would say that the random forest model did a much better job in predicting understory strata. Both models did well. We selected the mixed effects model because of the much higher (12%) variance explained. The difference in prediction error was relatively low (<3%). Also random forest used 59 predictors vs the 2 plus interaction for the mixed effects model. We modified the text to clearly justify our reasoning but also added an acknowledgement that the RF model was also effective. “Based on our results, generating landscape-wide predictions using the mixed-effects model should be simpler and more efficient than with the Random Forest model. For these reasons (12% higher explained variance, fewer explanatory variables, and similar prediction error), we recommend the mixed effects model for predicting understory vegetation structure with LiDAR, but we acknowledge that the Random Forest model also generates robust predictions.” Lines 429-435 331 I would like to see the ranking of the most important variables in your final random forest model. 
We have added them to the supplementary material and added an S Table citation in the results to that effect. 345 This is from your mixed effects model. I personally think the random forest model did a better job of prediction, although as you state in the introduction, not as easy to interpret as the linear model. But when using such rich data and derivatives as possible with lidar, you might as well have a model that includes all relevant information, such as a random forest model (in my opinion, you might disagree). Perhaps here though, you could compare which variables were included in your top ranking linear models to the most important ranked variables in your final random forest model. Is there overlap? I think this would not only serve to justify the models and variables, but to compare what the models themselves say about the relevant predictor variables. We have added the variable importance from the selected RF model and made reference to it in the Results and Discussion. We highlight the overlap in variable selection between the two models in the Discussion paragraph on the Random Forest model. 348 Your hypothesis should come before the discussion and reference them here. We have not structured the paper around specific hypotheses because the main objective is to test the capacity of LiDAR to predict understory vegetation, which would only generate a trivial hypothesis that LiDAR can or can’t do the job. The idea that occlusion might be important could be worded as a hypothesis but we have evidence from the literature to suggest both options so again the hypothesis would be somewhat artificial. We are trying to take the focus away from somewhat arbitrary hypotheses and predictions and focus the paper as an evaluation of LiDAR as a tool. 389 You don’t mention random forest models up until this point. 
All of your discussion thus far comes off as there only being one model type explored… I would mention random forest earlier and like you did for mixed effects, discuss the final model and the most important variables within that model. We are going to resist this suggestion as it would require a very significant reworking of the discussion that we do not feel is warranted. Our primary objective is to quantify the capacity of LiDAR to estimate understory structure so that is what we focus on in the first paragraphs of the discussion. The comparison of modelling approaches is identified in the introduction as our third objective and so we maintained that structure in the discussion. If we had found that the RF model was significantly better at estimating understory we might have reorganized the introduction but because we don’t see a big advantage of the RF model over the mixed-effects model, our discussion of the mixed effects models results fully answers our first objective. It is also a better model for examining objective 2 and so we believe the discussion flows as it should and in parallel to the intro/objectives. 396 Variance explained doesn’t matter as much as the error… the random forest model had a lower rate of error and that should be stated here to counterbalance the statement about variance. We believe that the variance explained is a good measure of the capacity of the model to explain/predict the ground based measures of understory cover. As well the variance explained was 12% higher in the mixed effects model but the error was 2.7 percent lower in the RF model. We see these error estimates as being very similar. But there are other reasons including the simplicity of only needing 2 variables in the mixed effects model vs 59 in the RF model that led us to recommend the ME model. We added additional justification and an acknowledgement of the value of the RF model. 
We think we are justified in recommending the ME model but the reader is free to use the RF model and we acknowledge that it will produce good predictions. “The comparisons of mixed effects and Random Forest models revealed some obvious alignment. All four of the LiDAR metrics considered (fractional cover, leaf area density, normalized cover, and voxel cover) produced models with high R2 values. All four of these variables also had very high variable importance in the Random Forest models. Voxel cover (VOX1m) was the most important variable in the selected Random Forest Model. The stratum variable appeared often in the top Random Forest models and was also important in the top-ranked mixed-effects model (VOX1m * STRATUM). The Random Forest model had a high variance explained (75%), but not as high as the best mixed effects model that included the voxel-based measure of cover (87%). Our selected Random Forest model had 59 explanatory variables, whereas the best mixed effects model had two explanatory variables and their interaction, as well as a random effect of plot. Other variables with high importance in the Random Forest models included other direct measures of understory structure, and canopy closure (S2 Table), which is expected to influence the amount of vegetation in the understory through light availability. The prediction error was slightly lower for the random forest model than for the mixed effects model (12.9% vs 15.6%), albeit at the cost of including 59 explanatory variables compared to 8 parameters estimated in the mixed effects model. Based on our results, generating landscape-wide predictions using the mixed-effects model should be simpler and more efficient than with the Random Forest model. 
For these reasons (12% higher explained variance, fewer explanatory variables, and similar prediction error), we recommend the mixed effects model for predicting understory vegetation structure with LiDAR, but we acknowledge that the Random Forest model also generates robust predictions.” Lines 413-435 428-431 Yes! This is what I was thinking when I read the methods. I think you need to expand on this. Would your results have been similar if they were collected during the same year? I think this is a really important point and shouldn’t just come at the end of a paragraph. We have added more content to the paragraph on error. We acknowledge that there is some variation in the predictability and identify 2 sources of noise in our data that may affect other studies. “It is unclear why there is so much variation in the ability of LiDAR to predict understory structure but it suggests that we should be somewhat cautious in assuming that individual LiDAR metrics are always capturing the understory structure. It is important to note that some of the error in prediction in our models is likely the result of the lag between the LiDAR acquisition (2012) and the field data acquisition (2016-2017). This lag is likely to result in the most error in the youngest stands where changes in herb and shrub growth are likely to be greatest but in the analysis, most stands are mature forest. Likely with less lag between LiDAR and ground-based measures we would have seen even better predictions. In addition, the error associated with GPS locations can introduce error into the relationship between ground-based and LiDAR estimates, although GPS technology is constantly improving. 
Our GPS (SXblue) reports sub meter accuracy under ideal conditions, but discrepancy in geoposition probably accounts for some of the error in prediction.” Lines 458-470 443 You talk a lot about the data and the models… I would like to see some hypothesis building about why certain metrics were better and how they are directly related to the structure. We are not comfortable with a lot of hypothesis building for this paper as we had few a priori expectations about the outcome. We have provided justification throughout these responses to argue that, in the end, the models don’t provide a great deal of ecological insight but do effectively demonstrate the value of LiDAR to capture understory vegetation structure. 443 I would also like to know, as a scientist, how you think these models, indices, metrics, etc. might work in other forests? Do you think they are particular to this ecosystem? We added the following text to the conclusions to suggest that we think the models are transferable but it would be good to test them in some new forested ecosystems. “With increasing sampling density associated with better LiDAR technology, we anticipate that understory cover models will become more reliable and generalizable across regions. In particular, because the models are not dependent on any ecological relationships per se, because they use direct measures of vegetation cover, we believe that under similar sampling densities the models should be generalizable. Additional testing of this approach in different forested ecosystems would provide more confidence in the transferability of the models.” Lines 498-504 Reviewer #2 Abstract Line 27: has yet to be fully validated. Changed as requested. Introduction Line 86 (and as it occurs thereafter): some papers cite Random Forests with capital letters. I think it is fine either way. We converted all instances of “random forest” to “Random Forest”. Excellent lit review. 
The authors did an especially good job exploring various options for analysis. Thank you. Methods Line 126: extra space between composition features Corrected. Thorough Methods section. I like Figure 1. Thank you. Analysis Line 214: insert space between variables (26) Added as requested. Line 229: insert space between package (18… Added as requested. Thorough presentation of analysis methods. Thank you. Results Solid statistical analysis methods and presentation of results. There are a lot of tables, but I’m not sure how to suggest minimizing them as they all contain pertinent info to the study. We preferred to maintain the tables in their original form to convey the important information. Submitted filename: Response to Reviewers for PONE final version.docx 7 Nov 2019 Modelling vegetation understory cover using LiDAR metrics PONE-D-19-19180R1 Dear Dr. Venier, We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements. Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication. Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org. 
If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org. With kind regards and congratulations, John Toland Van Stan II, Ph.D. Academic Editor PLOS ONE Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #1: All comments have been addressed Reviewer #2: All comments have been addressed ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #2: Yes ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: Yes ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? 
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: I am very pleased with the responses to my previous comments and think the manuscript is of high quality. The manuscript satisfies all of the above criteria. I support publication of this manuscript. Reviewer #2: I am pleased with the authors' revisions and response. I have no further comments. I would move to accept the manuscript. ********** 7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. 
If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: Yes: Jonathon J Donager Reviewer #2: No 12 Nov 2019 PONE-D-19-19180R1 Modelling vegetation understory cover using LiDAR metrics Dear Dr. Venier: I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department. If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. For any other questions or concerns, please email plosone@plos.org. Thank you for submitting your work to PLOS ONE. With kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. John Toland Van Stan II Academic Editor PLOS ONE
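
The model comparison debated in the response to reviewers above rests on the symmetric mean absolute percentage error (SMAPE) from ten-fold cross-validation (12.9% for the Random Forest model vs 15.6% for the mixed-effects model). The text here does not state which SMAPE variant the authors used, so the following is only a minimal sketch of one common definition, in which each absolute error is divided by the mean of the absolute observed and predicted values. The function name `smape` and the choice to count a pair of zeros as zero error are assumptions made for illustration, not the authors' implementation.

```python
from typing import Sequence


def smape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Symmetric mean absolute percentage error, as a fraction (0 to 2).

    Assumed variant: mean of |pred - obs| / ((|obs| + |pred|) / 2).
    Pairs where both values are zero contribute zero error
    (an illustrative convention, not taken from the paper).
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) == 0:
        raise ValueError("at least one observation is required")

    total = 0.0
    for obs, pred in zip(observed, predicted):
        denom = (abs(obs) + abs(pred)) / 2.0
        if denom > 0.0:
            total += abs(pred - obs) / denom
    return total / len(observed)


if __name__ == "__main__":
    # Hypothetical cover proportions for a handful of plots, for illustration only.
    field_cover = [0.10, 0.35, 0.60, 0.80]
    model_cover = [0.12, 0.30, 0.66, 0.75]
    print(f"SMAPE = {smape(field_cover, model_cover):.3f}")
```

In a cross-validation setting, a function like this would be applied to the held-out observations of each fold and the results averaged. Because the measure is bounded and symmetric in its two arguments, it stays defined when observed cover is close to zero, which plain percentage error does not.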

Modelling vegetation understory cover using LiDAR metrics.

Introduction

Methods

Study area

Field data collection

Sampling design for field observations of vegetation structure (FIELD).

LiDAR acquisition

Data processing and LiDAR variables

Analysis

Results

Relationship among LiDAR metrics

Mixed-effects models

Predicted versus observed scatterplot.

R2 and AIC values for sixteen candidate linear mixed-effects models.

Predictions of FIELD for each of three strata based on the mixed-effects model consisting of VOX1m + STRATUM + interaction.

Ten-fold cross-validation results from top linear mixed-effects model and the selected Random Forest model, based on symmetric mean absolute percentage error (SMAPE).

Random forest models

Discussion

Conclusions

Definitions of all variables included in at least one of the mixed-effects or Random Forest models.

Rank importance of explanatory variables in the selected Random Forest model with 59 variables.

Frequency of explanatory variables among the 18 Random Forest models run with 341 to 7 variables.

Data used in analyses for manuscript.

1. Random forests for classification in ecology.

2. Invasive plants transform the three-dimensional structure of rain forests.

Review 3. Advances in animal ecology from 3D-LiDAR ecosystem mapping.

4. Using satellite and airborne LiDAR to model woodpecker habitat occupancy at the landscape scale.