Literature DB >> 31763373

A PCA-Based Framework for Determining Remotely Sensed Geological Surface Orientations and Their Statistical Quality.

Abstract

The orientations of planar rock layers are fundamental to our understanding of structural geology and stratigraphy. Remote sensing platforms including satellites, unmanned aerial vehicles, and Light Detection and Ranging scanners are increasingly used to build three-dimensional models of structural features on Earth and other planets. Remotely gathered orientation measurements are straightforward to calculate but subject to uncertainty inherited from input data, differences in viewing geometry, and the plane-fitting process, complicating geological interpretation. Here, we improve upon the present state of the art by developing a generalized means for computing and reporting errors in strike-dip measurements from remotely sensed data. We outline a general framework for representing the error space of uncertain orientations in Cartesian and spherical coordinates and develop a principal component analysis (PCA) regression method, which captures statistical errors independent of viewing geometry and input data structure. We also introduce graphical techniques to visualize the uniqueness and quality of orientation measurements and a process to increase statistical power by jointly fitting bedding planes under the assumption of parallel stratigraphy. These new techniques are validated by comparison of field-gathered orientation measurements with those derived from minimally processed satellite imagery of the San Rafael Swell, Utah, and unmanned aerial vehicle imagery from the Naukluft Mountains, Namibia. We provide software packages supporting planar fitting and visualization of error distributions. This method increases the precision and comparability of structural measurements gathered using a new generation of remote sensing techniques. ©2019. The Authors.

Entities: Chemical Disease Species

Keywords: PCA; orientation; photogrammetry; statistics; structural geology; unmanned aerial vehicle

Year: 2019 PMID： 31763373 PMCID： PMC6853252 DOI： 10.1029/2018EA000416

Source DB: PubMed Journal: Earth Space Sci ISSN： 2333-5084 Impact factor: 2.900

Introduction

The orientations of geological features such as faults, dikes, lava flows, and sedimentary beds record characteristics of deposition or emplacement, episodes of deformation, and relationships between bodies of rock. Idealized planes describing these features are common units of geological analysis. The orientations of these planes have most often been collected directly, using a field structural compass or surveying equipment. High‐resolution remote sensing techniques are increasingly viable for three‐dimensional imaging of geological features, and measurement of their orientations, at submeter scale. In field mapping by a structural geologist, directly measured orientations (e.g., outcrop measurements with a pocket transit compass) have been considered sufficiently accurate that errors are not reported. Orientations measured using remotely gathered data are a powerful new tool for geological analysis, especially when outcrops are inaccessible to direct measurement. However, poorly modeled and hard‐to‐visualize errors complicate assessment of the true orientation of a geological structure. Remote sensing data sets spatially vary in resolution and quality; measurements are additionally biased by terrain effects, sensor‐dependent noise, measurement geometry, operator error in defining relevant features, and other factors. Even carefully planned structural studies with consistent error analysis procedures cannot be easily interpreted using current visualization tools. For instance, map symbols for nominal strike and dip do not provide a means of understanding commonly unpredictable, nonlinear errors inherent in orientation measurements in varied terrain. Individual orientation measurements often only coarsely correspond to the overall structural pattern (e.g., Lewis & Aharonson, 2006; Okubo et al., 2008; Quinn & Ehlmann, 2019). Without methods to visualize orientation errors, reported bedding orientations do not fully communicate information used in study interpretations. One major motivation of this work is the orbital mapping of layered rocks on Mars, which are key indicators of Mars' geological history (e.g., Dromart et al., 2007; Lewis, Aharonson, et al., 2008; Malin & Edgett, 2000; McEwen et al., 1999; Quantin et al., 2005; Stack et al., 2015). For example, sedimentary deposits are mapped from orbit and with rovers, but detailed evaluation of their depositional mechanisms requires understanding bedding orientations, particularly bedding dip (e.g., DiBiase et al., 2013; Edgar et al., 2012; Goudge et al., 2017; Kite et al., 2013; Lewis & Aharonson, 2006, 2014; Lewis, Aharonson, et al., 2008; Okubo, 2010; Okubo et al., 2008; Quinn & Ehlmann, 2019). Detailed accounting for errors is also important in large‐scale regional studies, where measurements are derived from multiple data sets and outcrops of varying quality (e.g., Metz et al., 2010). At present, some Mars mapping studies report no error ranges for bedding orientations (e.g., DiBiase et al., 2013), while others report errors output from commercial regression packages (e.g., Okubo et al., 2008). Other studies use “dip error” (Goudge et al., 2017; Lewis & Aharonson, 2006), “pole error” (Kite et al., 2016), or bootstrap resampling statistics (Fraeman et al., 2013; Metz et al., 2010) to evaluate measurement quality. The varying approaches used to generate orientation errors, with different degrees of reporting rigor, complicate understanding of the accuracy and precision of specific strike/dip measurements and impede geologic interpretation. The methods presented in this study were developed in conjunction with structural mapping of the layered sulfates at northeast Syrtis Major, Mars. These thick‐layered deposits occupy a critical stratigraphic interval, but their relatively poor exposure complicates orientation measurement. Additionally, because small changes in dip can imply completely different depositional processes, high‐confidence angular measurements are crucial drivers of interpretation (Quinn & Ehlmann, 2019). The use of remotely sensed orientation measurements in terrestrial geology is also a key driver of this work. High‐resolution satellite imagery and digital elevation models (DEMs) now available for much of the Earth's surface (e.g., Gesch et al., 2014) support regional photogeologic mapping. The advent of field‐portable Light Detection and Ranging (LIDAR) instruments (Buckley et al., 2008) and the improving accuracy of structure‐from‐motion (SfM) photogrammetry increasingly allow remote measurement of geological surface orientations. Numerous recent studies use orientation measurements from unmanned aerial vehicle (UAV) photogrammetry (e.g., Vollgger & Cruden, 2016), often using three‐point analytical approximations (Fienen, 2005) or commercial regression packages. Terrestrial remote sensing structural studies are hindered by the same data quality and geometric uncertainties that complicate such work on Mars. For example, in a recent study of the relative quality of orientations extracted from different remote sensing data sets, Cawood et al. (2017) extract bedding planes and fold axes from LIDAR and SfM photogrammetric digital surface models of weathered beds of the Stackpole syncline. Although most cases allowed high‐accuracy comparisons with direct measurements, significant differences in fit structure were found between LIDAR, ground‐based, and UAV‐based photogrammetry. These discrepancies are related to the orientation of the outcrop and viewing geometry as well as the scale of facet construction and point fitting. Improvements on the state of the art in computing and visualizing orientation errors must flexibly respond to different types of error. Bedding orientation measurements depend not only on the internal errors of the remote sensing data set but also on the geometry of the outcrop measured (e.g., hillslope concavity and aspect) and human accuracy in tracing bedding features. Measurements from the same data set and geologic unit can have completely different error structures depending on the shape of topography, presenting a challenge for error analysis. A flexible statistical and error visualization approach for planar orientation measurements will enable quantitative comparison between planar fits with completely different error structures. By increasing the robustness of orientation determination from heterogeneous data, such a framework will extend the range of situations in which structural metrics can be reliably assessed from remote sensing imagery, enabling statistically rigorous comparison of measurements with variable source data type and resolution, outcrop exposure and quality, and viewing geometry. In this contribution, we detail a novel method for the calculation and visualization of geological orientation errors. First, we describe a generalized approach to planar orientation errors for three‐dimensional data sets (section 2) and its implementation as a principal component analysis (PCA)‐based statistical procedure for planar fitting (section 3). We test the method using terrestrial orientations recovered from satellite and UAV data (section 4) and discuss potential alternative statistical parameterizations (section 5). Finally, we describe general geometric transformations for the error space of a plane A and two complimentary open‐source software packages that support planar fitting and error visualization B:. This work and the open‐source software we release concurrently are written to be understandable, adoptable, and adaptable by students and professionals working to extract orientation measurements from digital data sets.

Background: The Structure of a Remotely Sensed Plane and Its Error Space

A geologic surface is typically extracted from remote sensing data by isolating representative points from a three‐dimensional surface or outcrop model. Commonly, a bedding trace is digitized from visible imagery and the elevations of constituent points are extracted from a coregistered gridded DEM. Related procedures include the grouping of closely spaced LIDAR points that sample the same surface (e.g., Weingarten et al., 2004) or targeted elevation measurement along a feature trace by theodolite or differential GPS. Since remote sensing data sets are typically defined in Cartesian spatial coordinates, all of these methods produce an array of three‐dimensional points in space that together represent a single feature. Planar representations of geological surfaces are typically modeled using regression statistics: the set of coordinates representing a potential plane is converted to an orientation by finding the best fitting plane through the data set using minimization (e.g., PCA, ordinary least squares [OLS], and other regression frameworks) in Cartesian coordinate space (Fahrmeir et al., 2013; Jolliffe, 2002). The idealized geologic surface that results from regression can be mathematically described as a plane, requiring three free parameters. Description of orientation alone requires only two free parameters, represented either as slopes in two directions or the orientation of a normal vector to the plane (Figure 1a).

Figure 1

Schematic representation of the structure of a plane with errors and its relation to the global Cartesian coordinate system x. (a) The plane and its associated normal vector n. In unweighted principal component analysis, n falls along the principal component axis . The three unit vectors v , oriented along , form rows of the rotation matrix V that maps x to . (b) An ‐ slice of the nominal plane and its normal vector and a bundle of planes with slightly different orientations. The resulting hyperbolic error space, its major axes h 2 and h 3, and the normal vector error ellipsoid are overlaid in blue. Regression of a best fitting plane inherently involves uncertainty, which combines with irregularities in the input data set to produce orientation errors. In Cartesian space terms, these errors can be represented as a hyperboloid of two sheets enclosing all possible planes in the data set, varying around the nominal regression line, or alternatively as a set ofnormal vectors perpendicular to the plane. Assuming a fixed length, the error space for this normal vector forms an ellipsoid containing possible vector endpoints (Figure 1b). In spherical coordinates, orientations are intuitively represented as a pair of angles (commonly, strike/dip, or dip/dip direction). This two‐angle representation supports visualization of orientation information on stereonets and related spherical plots. Previous studies of bedding orientation errors have parameterized orientation error in terms of strike and dip (Cruden & Charlesworth, 1976), and many workers have reported orientation errors in these terms (e.g., Lewis & Aharonson, 2006; Okubo et al., 2008). However, orientation errors are not necessarily aligned with the strike/dip parameters that describe the nominal plane; errors expressed in these terms rely implicitly on the small‐angle assumption. Near‐horizontal bedding (a common mode of stratigraphic exposure) has highly nonlinear angular dispersion in strike when approaching zero dip, with large covariances between the two. Errors parameterized as pole error (angular error around the nominal orientation of a plane) operate in spherical coordinates, entirely avoiding the issues of linearization. Directional statistical fitting mechanisms commonly used for geological orientations yield errors parameterized as pole error but operate entirely on data already expressed in angular terms (e.g., Bingham, 1974; Fisher et al., 1987; Kent et al., 1983; Mardia, 2014; Onstott, 1980). For data defined in Cartesian space, the structure of these pole errors must be defined with a statistical process. Regression errors defined in Cartesian space can be mapped to spherical coordinates using geometric projection onto a unit sphere: the hyperbolic errors to the nominal plane map to a spherical girdle (a bundle of great circles), and the cone of normal vector errors projects to an ellipse, a two‐axis expression of the “pole error” (Figure 2a). The angular span of this spherical girdle or ellipse can be defined by θ max, the maximum angular error to the plane, and θ min, which is orthogonal to θ max by definition. Importantly, θ max need not be oriented along strike or along dip; instead, the orientation of θ max with respect to the nominal plane is expressed using a rake angle between the strike of the plane and θ max (Figure 2b). This format generalizes pole error to allow the full expression of a Cartesian orientation error space in angular terms, with five free parameters. Errors expressed in this structure are the target of this work.

Figure 2

Schematic representation of the relationship between the nominal planar fit (black), the hyperbolic error shell and inverse ellipsoid representing the normal vector endpoint (blue), and the spherical error distribution formed by projecting the tangents to these error spaces onto the unit sphere (purple); θ min and θ max define the scale of orientation errors along two axes within the plane, and . (a) Projection of Cartesian error space to spherical coordinates, both as a planar girdle and pole error ellipse. (b) Orientation of the error space to the plane (defined by θ min and θ max) relative to the nominal plane, emphasizing the rake angle needed to report the directions of errors within the plane.

Methods

PCA for Planar Fitting

Error Treatment in OLS Versus PCA

OLS regression is the most common technique for fitting orientations of lines and planes. However, many other regression techniques, which chiefly differ in their mechanism for apportioning error along the coordinate axes of the fit, can be used to define errors to a plane. OLS regression fundamentally tests the relationship of a dependent variable with a set of independent variables. All error is assumed to belong to the dependent variable, which in spatial data is usually assigned to the vertical plane. This property inhibits the fitting of steep slopes (Figure 3). Geological planes can be steeply dipping, depending on their origin and geologic context, and it is not always reasonable to assume that errors are chiefly vertical.

Figure 3

PCA and OLS regressions of a two‐dimensional, zero‐centered point cloud rotated counterclockwise by (a) 0°, (b) 40°, and (c) 80°, emphasizing the nonlinear relationship between OLS and PCA regressions for differently dipping planes with the same measurement scatter. Unweighted PCA retains the same error structure regardless of orientation, while the apparent scale of OLS errors decreases as fitted orientations steepen. OLS is also structurally unwilling to fit near‐vertical data. The unbiased reconstruction of orientation provided by PCA is often preferable, especially when it is not rigorously known that errors are confined to the vertical plane. PCA = principal component analysis; OLS = ordinary least squares; CI = confidence interval. Consequently, we use PCA, which fits errors along all axes simultaneously, with no distinction between independent and dependent data. Errors are instead minimized orthogonal to the best fitting plane. The flexibility of this fitting mechanism is a significant advantage for fitting arbitrarily oriented planes atop data sets with different error structures; it is particularly relevant when errors are known to be nonvertical. Many such situations exist for geological orientation measurements: errors for photogrammetric data sets are generally dependent on the viewing geometry of the image pair(s) used to assemble the 3‐D model. Elevation models created from oblique UAV imagery of cliff faces (e.g., section 4.2) will have chiefly horizontal errors; multiview SfM data sets will have errors oriented along arbitrary, oblique view planes. Even elevations measured on a gridded data set have several sources of nonvertical error: (1) error in the construction of the DEM (e.g., photogrammetric image‐registration error), (2) resampling error (sub–post smoothing imparted by gridding), (3) sampling error (inexact digitization of measured features), and (4) downslope bias. Thoughoften poorly quantified, these errors still influence the output of planar fitting. PCA has been used for orientation measurement in contexts requiring flexibility in the expected orientation of features, ranging from paleomagnetism data reduction (Kirschvink, 1980) to computer vision and scene mapping (Nurunnabi et al., 2012; Weingarten et al., 2004). Much of the literature urges caution when applying PCA to estimate statistical confidence (e.g., Faber et al., 1993, 1995; Jolliffe, 2002). PCA is not usually developed or motivated with a clear probabilistic framework (Tipping & Bishop, 1999). Instead, it is commonly used for dimensionality reduction, compressing the variation of a multidimensional data set into a smaller set of explanatory variables. That process is difficult to statistically model, largely limiting PCA to algorithmic applications (e.g., image processing) and exploratory data analysis, except where explicit statistical rationale can be advanced for how many principal components to retain (Jolliffe, 2002). For fitting spatial planes, both input and output data are tied to orthonormal spatial coordinates. Thus, finding the best fitting plane involves only rotation, not dimensionality reduction, allowing us to circumvent this source of uncertainty. Evaluating the orientation and scatter along the axes of the input data, rather than discarding some of them, is statistically straightforward (section 3.3).

Orientation Examples

To exhibit the properties of the PCA algorithm applied to data sets of varying quality, we focus on four end‐member examples of digitized bedding traces (Figure 4) with a range of data set structures of types discussed in Table 1. These examples are digitized traces of sedimentary bedding measured within an area in NE Syrtis, Mars during the study described in Quinn and Ehlmann (2019). Orientations were collected atop paired orthophotos and DEMs, which have varying spatial resolution and error structure due to differences in dust cover and stereo geometry. These bedding traces have different potential error structures due to their range of hillslope aspect and curvature. Alongside a complete mathematical description, we follow each case through transformation of its error space from Cartesian to spherical coordinates (Figures 4, 5, 6); the numerical breakdown of their errors is summarized in Table 2.

Figure 4

Context maps showing traced bedding planes (red) and nominal bedding orientation measurements for four test cases in the NE Syrtis region of Mars. The backdrop is HiRISE or CTX imagery, and 10‐m contours follow the photogrammetric elevation model (derived from the data set shown) used to extract orientations. (a) A bedding exposure on a concave hillslope between two parallel raised ridges, atop a HiRISE image and elevation model. (b) A similar concave hillslope with slightly less 3‐D exposure, atop lower‐resolution CTX data. (c) A linear bedding trace on a planar, west facing hillslope. (d) A rectangular area of a dipping lava flow surface atop low‐precision CTX topography.

Table 1

Classification of Data Set Major Axes

Scenarioa	Hyperbolic axesb	Shape of variance ellipsoid	Notes
A	h₁≈ h₂ > h₃	Prolate ellipsoid	Plane well defined in two dimensions, with small error axis
B	h₁ > h₂ > h₃	Scalene ellipsoid	Quality of planar fit depends on axial dimensions and structure of data set
C	h₁ > h₂≈ h₃	Oblate ellipsoid	Defined along a line, but with no unique planar orientation
D	h₁≈vh₂≈ h₃	Spherical	Poorly constrained on all axes, no clear plane defined

Scenario lettering corresponds to Figures 4, 5, 6.

h=𝛌 in Onstott (1980), and h=𝛌+F in this work (see Table 3 for notation definition).

Figure 5

The Cartesian error space of fitted orientation measurements corresponding to example bedding traces (Figure 4). Each plane is decomposed into two views aligned with , with in‐plane variation shown on the horizontal axis and out‐of‐plane variation on the vertical. The data making up the planar measurement are shown as gray points, and hyperbolic error bounds computed by several methods are overlain. Errors are vertically exaggerated and the true scale of angular errors is given for each set of axes. The error structure of each fitted plane depends on the characteristics of the input point cloud: (a) A well‐fit plane with low errors on all axes. (b) A slightly poorer fit with minimal definition along . (c) A plane well defined along but essentially undefined along . (d) A fit poorly defined on both axes.

Figure 6

Projection of the hyperbolic errors to the plane into spherical coordinates to show angular errors. Estimates by different methods for computing h are colored as in Figure 5. (a–d) Spherical error space for each of the planes shown in Figures 4 and 5, projected onto oblique upper hemisphere, equal area stereonets. (e) Error space to the bedding pole for each of the planes in panels (a)–(d).

Table 2

Data for Orientation Examples

			Eigenvalues			Spherical summary ( °)
n	L ^a	R ^b	λ₁	λ₂	λ₃	Strike	Dip	Rake	θ _min	θ _max
Type examples (Table 1 and Figures 5 and 6 , ordered a–d)
31	479	2.0	17228	422.9	0.82	311.7	7.6	81.5	0.59	3.88
546	584	1.1	21634	2079.3	0.11	11.3	3.5	172.7	0.15	0.48
593	615	2.4	31514	10.0	0.66	174.2	13.2	60.9	0.29	16.49
172	0	18.5	2163	948.1	73.74	139.6	10.1	119.2	13.17	19.92
Joint fitting of parallel planes (Figure 8)
Well‐constrained single‐bed measurements
476	507	0.7	16825	1437.8	0.09	9.3	3.5	9.9	0.15	0.51
546	584	1.1	21634	2079.3	0.11	11.3	3.5	172.7	0.15	0.48
Joint fit
1217	—	1.3	6431	972.4	0.13	11.8	3.5	156.1	0.28	0.71
Components (ordered from north to south)
315	332	0.5	8940	88.1	0.05	339.7	3.5	167.7	0.15	1.54
189	209	1.1	3544	14.1	0.09	38.3	6.7	112.5	0.34	5.43
367	389	0.9	12008	205.8	0.14	7.3	3.4	158.3	0.22	1.69
138	146	0.6	1746	5.7	0.06	358.1	6.1	70.7	0.43	7.69
208	217	0.3	3778	33.0	0.02	9.5	3.8	59.5	0.16	1.72

L: length of bedding trace (m).

R: maximum residual to plane (m).

Classification of Data Set Major Axes Scenario lettering corresponds to Figures 4, 5, 6. h=𝛌 in Onstott (1980), and h=𝛌+F in this work (see Table 3 for notation definition).

Table 3

Summary of Notation

Symbol	Meaning		Shape
i	In subscript, represents component of 3D vector basis (1,2 or 3)
n	Number of samples in data matrix
∗	In subscript, represents all n samples in data matrix.
x, x _i	Orthonormal basis vectors defining “world” coordinates		3×1
D	Data matrix in “world” coordinates		n×3
𝛍_D	Column‐wise mean of data matrix		3×1
Principal component analysis
M	Data matrix centered on all axes	M=D−μ _D	n×3
C	Covariance matrix for M	C(n−1)=M ^T M	3×3
𝛌, λ_i	Vector of eigenvalues of M		3×1
𝚲	Diagonal matrix of eigenvalues of M	𝚲=I 𝛌	3×3
V	Rotation matrix of orthonormal eigenvectors		3×3
v _i	Unit‐length rows of V oriented along eigenvector axes		3×1
n	Normal vector to the best‐fitting plane	n=v ₃	3×1
x‾,x‾i	Orthonormal coordinate basis aligned with principal component axes, v _i		3×1
M‾	Data matrix aligned with principal component axes	M‾=MVT	n×3
Singular value decomposition
U	Left singular vectors of M		n×3
S	Diagonal matrix of the singular values of M	eigenvalues of C	3×3
s, s_i	Vector of singular values	s=𝛌(n−1)	3×1
Statistical error analysis
σM‾	Standard error of data matrix	σM‾=𝛌=sn−1	3×1
σ _𝛌	Standard error of the eigenvalues		3×1
β^	Sample regression parameters		2×1
d	Degrees of freedom of the estimator	d=2 for angular error analysis
α	Confidence level for an error surface	α=0.95 is typical
F _a,d,n−d	Fisher percent‐point test statistic
Construction of error surfaces
p	Parameters of the nominal plane in x‾	p=𝛌	3×1
e	Errors to the nominal plane in x‾	e=error(𝛌)=F _a,d,n−d σ _λ	3×1
h	Semimajor axes of hyperbolic quadric defining an error surface	h=p+e	3×1
Q	Matrix representation of a quadric surface	as defined in text	5×5
Q‾H	Tensor representation of a hyperbolic error quadric for semiaxes h		5×5
T	An affine or projective transformation matrix	as defined in text	varies
C _H	Matrix representation of conic section	as defined in text	4×4
Spherical errors
γ	Angle in [0,2π] from x‾1 within x‾1– x‾2 plane
x‾γ	2D coordinate basis orthogonal to nominal plane, defined by x‾γ,x‾3		2×1
θ _γ	Angular error for an arbitrary direction within the plane
θ _max, θ _min	Maximum and minimum angular errors at γmin=0,γmax=π2

Data for Orientation Examples L: length of bedding trace (m). R: maximum residual to plane (m). Context maps showing traced bedding planes (red) and nominal bedding orientation measurements for four test cases in the NE Syrtis region of Mars. The backdrop is HiRISE or CTX imagery, and 10‐m contours follow the photogrammetric elevation model (derived from the data set shown) used to extract orientations. (a) A bedding exposure on a concave hillslope between two parallel raised ridges, atop a HiRISE image and elevation model. (b) A similar concave hillslope with slightly less 3‐D exposure, atop lower‐resolution CTX data. (c) A linear bedding trace on a planar, west facing hillslope. (d) A rectangular area of a dipping lava flow surface atop low‐precision CTX topography.

The Nominal Plane in PCA

Notation

Matrices are uppercase and bold (M), while vectors are lowercase and bold (x). Vector components use upright characters (x1) while scalar quantities are in script (n). The subscript i defines a range of indices over the dimensions of the coordinate basis i = [1,2,3]. Thus, x = x = [x1,x2,x3]. When a vector component is given in subscript (e.g., σ ), its implicit i index is dropped. An index of all notation is contained in Table 3. Summary of Notation

Finding Principal Components

The original data matrix D is a n × 3 matrix containing three‐axis coordinates in a Cartesian coordinate system (commonly 3‐D geographical points in UTM or another local geodetic system). The n represents the number of independent observations in the data set. The centered data matrix M is centered by subtraction of the mean ( ) along each axis. The sample covariance matrix is defined as the cross‐product matrix M T M scaled by the size of the data set. PCA is formally described as an eigenvector decomposition of C, represented as and arrived at by numerical optimization. V is a rotation matrix composed of the eigenvectors, λ is a vector of eigenvalues of C, and is the diagonal matrix of eigenvalues. The eigenvalues represent the variance of M along each eigenvector row (v ) of V. In practice, singular value decomposition is used for a more numerically stable implementation of PCA. This technique is represented as where U T U = V T V = I and is a diagonal matrix containing the singular values of the data matrix M. A direct relationship between the singular values and the eigenvalue matrix, allows recovery of the eigenvalues 𝛌.

Rotation Into a Principal Component‐Aligned Frame

Geometrically, PCA corresponds to rotation of the data set into a decorrelated reference frame, eliminating cross correlations between components and defining a new coordinate basis aligned with the directions of maximum variability of the data set. This rotated orthonormal coordinate basis, is aligned with the axes of V (Figure 1). An arbitrary vector a in the global Cartesian plane can be rotated into this coordinate system using . Rotation of data into a principal component‐aligned coordinate basis significantly eases error analysis and visualization of the structure of the data set relative to its best fitting plane. The “axis‐aligned” projection of the input data set , defined as , collapses the data set onto its best fitting plane. Inverting equation (3), the sample covariance matrix C can be expressed in this coordinate system as The rotated data set varies independently along each axis of , and the magnitude of the eigenvalues 𝛌 of the PCA fit is proportional to the scale of the data set along each principal component axis. The eigenvalues are equivalent to the three‐component vector variance of the decorrelated data along each axis of : The axes and fall within the best fitting plane through the data set. The axis is along the normal to the plane; scatter along this axis represents the error in the planar fit. Thus, the third column of the aligned data matrix, , represents residuals from the nominal planar fit. Rotation of the data set into provides a useful view of the distribution of residuals and potential nonrandom structure relative to the best fitting plane: plotting versus yields a view of structure within the model plane, and versus for i = 1,2 shows residuals (Figure 5). The Cartesian error space of fitted orientation measurements corresponding to example bedding traces (Figure 4). Each plane is decomposed into two views aligned with , with in‐plane variation shown on the horizontal axis and out‐of‐plane variation on the vertical. The data making up the planar measurement are shown as gray points, and hyperbolic error bounds computed by several methods are overlain. Errors are vertically exaggerated and the true scale of angular errors is given for each set of axes. The error structure of each fitted plane depends on the characteristics of the input point cloud: (a) A well‐fit plane with low errors on all axes. (b) A slightly poorer fit with minimal definition along . (c) A plane well defined along but essentially undefined along . (d) A fit poorly defined on both axes.

Strike and Dip of the Nominal Plane

The first and second eigenvector rows of V describe the planar fit in the absence of errors. The third eigenvector row of V is orthogonal to the plane; this normal vector n = v 3 can be used with the mean of the data set 𝛍 (which the regressed plane passes through by definition), to form the plane equation where x is the set of all points that lie within the modeled plane. The nominal strike and dip in a geographic framework (strike defined relative to north) are calculated as follows:

Confidence Intervals for Planar Orientations

Errors to a planar measurements arise from statistical uncertainties on the parameters of a planar fit, and accurate modeling of errors requires the incorporation of a statistical distribution that is responsive to variation in input data quality. The PCA eigenvectors 𝛌 can be treated analogously to the OLS fit parameters to define the error space to the plane, and the resulting orientation errors can be modeled by a hyperbolic shell with axes h. Data set‐specific orientations errors are scaled by the Fisher (F) distribution to produce standardized statistical errors.

Eigenvectors as Regression Parameters

The statistical basis for PCA regression errors can be developed from the widely used OLS regression. The closed‐form equation for OLS is given by where X is a matrix of explanatory variables (n × 2 for planar fitting), y is a column vector of n dependent measurements, and is the two‐element vector of regression coefficients. Regression errors are estimated using the variance of these fit coefficients, where σ 2 is the scalar mean squared error of the residuals to the fit (Fahrmeir et al., 2013, p. 117). When rotated into , the results of PCA transformation are directly comparable to OLS, allowing an expression for error to be adapted from standard statistical techniques. Errors are oriented along and conform by definition to the OLS assumption that errors are uniaxial, that is, vertical only. Accordingly, , since a fit plane expressed in has no slope. However, is nonzero and can be used to construct errors analogous to those defined in OLS. An expression for errors can be derived by recasting the PCA procedure into OLS notation. Recalling equation (2), the covariance matrix C of a mean‐centered point cloud M = [X y] can be decomposed into subspaces: where n is the number of data points. Transformed into , this relationship reduces to . Since 𝚲 is a diagonal matrix of eigenvalues, , and and contain only nullspace. Substituting into equation (11) yields Since the PCA eigenvalues are equivalent to the variance of the rotated data matrix along each axis (equation (7)), we can further substitute . Since there are no covariances between the terms, the result reduces to a PCA regression error formulated in terms of eigenvectors.

Errors Limited by Data Variance

The statistical definition of the data set centroid has important implications for the structure of planar orientation errors, and we make a significant adjustment to this framework to account for the nature of the orientation‐fitting problem. In standard regression statistics, the best fitting plane is modeled to pass through the mean of the data set, which is assumed to be more precisely known as sample size increases, regardless of the variance of the data set. The precision of the data set mean is modeled by its variance, . This “mean‐limited” construction forms the basis for the regression errors derived in equation (15). For fitting geological planes, all data points should instead be treated as estimates of the true value of a single plane. In this formulation of regression error, a high‐quality fitted plane is defined by low variance, rather than well‐known variance. The precision of the true value of the data set is simply modeled by its variance, σ 2. This “variance‐limited” framework explicitly models departures from a single plane, rather than the strength of correlations between scattered data. The definition of the data set centroid substantially alters the error structure of the fitted plane. A mean‐limited parametrization overestimates angular certainty when sample sizes are large (Figure 7a), complicating comparisons of measurements with different sampling characteristics. In variance‐limited statistics, the variance of data itself sets a floor for errors to the plane: large departures from an idealized plane are penalized and the basic structure of angular errors is preserved regardless of data density (Figure 7b). This feature is crucial for comparing planes with different sampling characteristics. Most “off‐the‐shelf” packages for planar fitting use standard mean‐centered statistics, suggesting that measurements made using these packages may be fundamentally biased by sample size effects.

Figure 7

Exploration of centroid behavior with sample size. (a) Orientation errors for an example plane standard regression statistics with mean‐based error scaling. (b) Variance‐limited regression of same example plane, modeling all points as estimates of a single true plane. This parameterization is more resistant to artificially low errors when data are oversampled. (c) Exploration of variance with sample size for a randomly generated roughly planar point cloud with error hyperboloid axial lengths h = [100,10,5], using several methods for variance estimation. When the data set centroid is mean limited, errors trend inappropriately to 0 at large sample sizes.

The mean‐limited PCA regression errors expressed in equation (15) can be adapted to the variance‐limited framework by replacing the variance of the mean in equation (14) with the variance of the data set, which removes the scalar, yielding the simplified expression This allows equation (15) to be simplified to The orientation error modeled by PCA regression is thus equivalent to the ratio of eigenvalues.

Population Fit Parameters

The captures regression errors specific to the sample measured and is a maximum‐likelihood estimator of the error to the true population fit parameter, var β; that is, . The var β can be estimated using a statistical distribution that takes into account sample size and the degrees of freedom of the input data (e.g., Fahrmeir et al., 2013). For PCA, the eigenvalues λ that represent the data set are equivalent to the sample variance of the data set along each major axis (equation (7)), and the population variance along each axis is equivalent to . Since PCA eigenvectors are orthogonal, their eigenvalues are statistically independent (Jolliffe, 2002, p. 46) and can be straightforwardly ratioed. Extending equation (17), statistical errors to the planar estimator can be expressed as a ratio of uncertain eigenvalues: The orthogonality of PCA allows regression errors to be represented as a symmetrical hyperbolic surface (Figure 2). This geometric construct can be manipulated with vector and tensor algebra, increasing flexibility for data visualization (section 3.4). The two orthogonal slopes that make up varβ PCA are equivalent to tangents to an elliptic hyperboloid on two orthogonal axes. This error hyperboloid has semimajor axes defined by the vector expression h = p ± e, where p represents the regression parameters and represents the error to each eigenvalue. For mean‐limited statistics, , and for variance‐limited statistics, p=𝛌. Thus, orientation uncertainty in the PCA framework can be represented as a hyperbola with eigenvalue axes:

Errors to Eigenvectors

We extend our definition of e to incorporate the Fisher statistical distribution, F: F incorporates the number of samples in the data set (n) the degrees of freedom of the statistical transformation (d), and the desired level of certainty (a). Typical values are α = 0.95 and d = 2 (see section 3.3.5). The σ is the standard error of the eigenvectors, which can be defined in several ways, as summarized below. Results for the four type cases are shown in Figure 5.

Data Variance

The most basic parametrization of orientation errors uses only variance of the input data set (its eigenvalues, in PCA) to represent the error space without σ , resulting simply in h = p=𝛌. The data variance defines the basic structure of the plane, including its scaling based on out‐of‐plane residuals and the directional dependence of fit quality. However, without a statistical treatment of the accuracy of variance, this method is unresponsive to undersampling or differently scaled data sets. The data variance parameterization of orientation errors is developed in the paleomagnetism literature, where uncertain lines and planes model magnetometer response during laboratory measurements of rock remnant magnetism (e.g., Kirschvink, 1980). This literature describes the parameterization of the PCA fit as an ellipsoid (the “dual” quadric to the hyperbola enclosing the plane; see Appendix A) with different potential shapes depending on data set structure (Table 1; adapted from Onstott, 1980).

Sampling Variance

The simplest method of statistically based error scaling uses multivariate statistics based on sample size. In this framework, errors assume that the measured data is a random sampling of a population that conforms to a Gaussian distribution. The expression for variance of the eigenvectors for PCA, arises directly from the estimation of population variance in sampling statistics (Jolliffe, 2002, p. 48; Faber et al., 1993).

Noise Variance

The standard assumption of Gaussian population statistics, that the variance of the sample is primarily a function of its size, may be imperfect when applied to continuously sampled data. Data sets that include all of the available data over an interval (i.e., are not random samples of a population) are implicitly highly correlated, so sample‐size‐based statistics may be misleading. Interpolated elevation data can easily be smoothed and overfitted, increasing apparent statistical power with little to no improvement in the quality of the fit. Conversely, when the noise in the input data set is low, even small samples can show significant results. The noise variance framework for PCA errors (Faber et al., 1993, 1995; Faber & Kowalski, 1997; Malinowski, 1977) is explicitly designed for use with continuously sampled data. Instead of uniformly scaling errors along a given principal component axis , noise covariance is based on the intuition that “measurement noise” defined along higher‐dimensional axes provides a good estimate of the errors on all axes. In our case, scatter along is the “noise component” of the data and may provide a better estimate of the scatter in and than the variance along these axes. Intuitively, the structure of the data cloud within the best fitting plane represents “signal” and its structure should not be used as a measure of error. Faber et al. (1993) shows that the variance of the PCA eigenvectors can be modeled as where is the “noise variance” of the data matrix. Methods to compute the noise variance rely on the concept of “pseudorank,” the rank of the aligned data matrix in the absence of noise. Detailed treatments of the noise variance framework (Faber et al., 1995; Faber & Kowalski, 1997) discuss adjustment of the pseudorank to incorporate nonlinear bias, but this is unnecessary for our low‐dimensional case. For three‐dimensional data aligned along a plane, errors will be entirely contained in scatter on . A plane without noise will be contained in the – plane, with a pseudorank of K = 2. Malinowski (1977) describes the “real error” component where r × c is the dimensions of the data matrix M. Faber et al. (1993) slightly modifies this to based on experimental validation. For our purposes of planar fitting, K = 2, r = n, and c = 3, and these expressions collapse to (Malinowski, 1977) and (Faber et al., 1993). With sample sizes n ⋙ K, the difference between these estimators is negligible. Combining equation (22) with equation (24), we can express the noise variance of the data set as

Other Statistical Distributions

Several other treatments of errors given in the literature provide direct alternatives for scaling e with different statistical assumptions. Francq and Govaerts (2014) provides a formulation of error bars for two‐axis OLS, which can be generalized to the PCA framework, yielding error axes This formulation provides slightly more constrained errors than both sampling and noise‐based errors, due to the codependence of errors of variables defined in global Cartesian coordinates. Babamoradi et al. (2013) provides an implementation that closely tracks the “sampling variance” method with slightly different scaling for sample sizes. Weingarten et al. (2004) describes a numerical method, which applies OLS regression after PCA rotation, using the slope found by OLS in to estimate var β PCA.

Preferred Choice of σ (or e)

The effect of using different test statistics is minimal for well‐sampled data, and results asymptotically converge on the data variance at large sample sizes (Figures 5 and 6). The formulations tested show similar results, but the “noise variance” is more resistant to changes in sample density (Figures 5c and 5d). We use the noise error as the preferred scaling in software and graphical implementations of this method. Projection of the hyperbolic errors to the plane into spherical coordinates to show angular errors. Estimates by different methods for computing h are colored as in Figure 5. (a–d) Spherical error space for each of the planes shown in Figures 4 and 5, projected onto oblique upper hemisphere, equal area stereonets. (e) Error space to the bedding pole for each of the planes in panels (a)–(d). Exploration of centroid behavior with sample size. (a) Orientation errors for an example plane standard regression statistics with mean‐based error scaling. (b) Variance‐limited regression of same example plane, modeling all points as estimates of a single true plane. This parameterization is more resistant to artificially low errors when data are oversampled. (c) Exploration of variance with sample size for a randomly generated roughly planar point cloud with error hyperboloid axial lengths h = [100,10,5], using several methods for variance estimation. When the data set centroid is mean limited, errors trend inappropriately to 0 at large sample sizes.

Statistical Error Scaling

To create confidence intervals, we apply the Fisher statistical distribution and the formulation of σ from equation (25) to (20). The eigenvalues of the data set each follow the distribution. The Fisher distribution, F , which models ratios of χ 2‐distributed parameters (Babamoradi et al., 2013; Jolliffe, 2002; Francq & Govaerts, 2014), is the appropriate test statistic for orientation data, since regression parameters are composed of ratios of eigenvalues (equation (17)). At large sample sizes, For planar orientations, d = 2, since the orientation information contained in the three eigenvectors can be summarized as two ratios. The remaining parameter, α, is the confidence level at which the distribution should be queried. For typical analysis, α = 0.95, corresponding to a 95% confidence interval, should suffice. The resulting parameterization of the errors to the eigenvectors is summarized for noise errors as To construct the hyperbolic error space of the plane, we recall that h=𝛌±e (equation (19)). At any level of error, the maximum bounding surface of h occurs when the length of in‐plane axes of the hyperboloid are minimized and out‐of‐plane error is maximized. Thus, the maximum error shell used for visualization is or alternatively where controls the direction along which errors are applied to form the maximum error surface.

Displaying Orientation Error Surfaces

Projections of error bounds as 2‐D hyperbolic slices, spherical ellipses, and girdles provide useful visualizations of the error structure of the plane, relying only on the statistically derived hyperboloid with semiaxes h (Figure 2). This represents the uncertain plane (independent of the statistical assumptions used in its construction). Generalized equations for quadric surfaces that can be manipulated with transformation matrices and quaternion rotations are discussed in Appendix A; here we focus on common cases used to develop key visualizations of the error space.

Projection to Hyperbolic Errors

Two‐dimensional conic slices of the hyperbolic error space of the plane summarize data set structure, both in PCA‐aligned and global coordinates. Errors can be assessed along any axis, but slices of the error hyperboloid aligned with the major axes of the planar fit are the most intuitive. These “axis‐aligned” views of the data set, with in‐plane variation on the horizontal axis and out‐of‐plane variation on the vertical, are the ideal decomposition to assess the structure of a fitted data set and verify the quality of the input data D. Visual inspection of data set quality in PCA‐aligned coordinates (Figure 5) is an important quality check on measured orientations. Several measurements (Figures 5b and 5d) show significant out‐of‐plane variation potentially related to both DEM errors and digitizing errors. A hyperbola can by constructed for a two‐dimensional slice of the error quadric, along a coordinate basis with axis within the plane defined as a linear combination of and where γ = [0,2π] is the angle from within the plane. Accordingly, h can be defined as an axis of the 3‐D conic intermediate between h1 and h2 (and a major axis of its 2‐D projection), The 2‐D hyperbolic slice of the error quadric aligned with axis can be represented as For a slice of the plane oriented along , γ = 0, and h = [h1, h3]. For an axis‐aligned and mean‐centered conic, the hyperbolic error bounds in are given by the equivalent representations These error bars can be plotted as is (e.g., Figure 5) or shifted from to x using scaling and rotation as necessary. We discuss this more general transformation in Appendix A.

Spherical Representation of Errors

In addition to the Cartesian reference frame described above, we represent uncertain planar fits in an angular framework to allow plotting on stereonets and direct comparison to other orientation data. For any in‐plane axis h , the angular error from the nominal plane is defined by the tangent to the hyperbolic error sheet, the factor of 2 arising from combining errors for both the upper and lower sheets of the hyperboloid. Solving this for all axes h over γ = [0,2π] yields angular error magnitudes for all directions within the planar fit, and their graphical representation as a girdle defined by angular spans centered along a unit circle within the nominal plane (Figure 6). Angular error surfaces for the normal vector fall 90° from those representing the plane, forming an elliptical error space that encompasses poles to the plane. Normal vector errors can be computed by a similar process to that used to generate a hyperbolic girdle around the plane, using the inverse of the tangents. evaluated over γ = [0,2π] defines the angular dimension of an error ellipse in spherical coordinates, defined relative to . This ellipse can be rotated into global coordinates using the rotation matrix V. A more general solution is discussed in Appendix A2.

Maximum and Minimum Angular Errors

Though the statistical analysis of error is undertaken in Cartesian space, we want to report the magnitude and directionality of error in angular space. For the semiaxes h, the maximum and minimum angular errors of the planar fit are aligned with the best fitting plane of the data set (Figure 2b) and computed as Because of the nonlinearity associated with angular transformations, there is no natural correspondence between the dip direction of a best fitting plane and the direction of θ max and θ min. Thus, to form a full representation of the errors, we must also report the azimuth of the error axis within the plane. This rake angle (Table 2 and Figure 2b) is defined as the angle between the strike and θ max (which is oriented along ) and calculated as where z = [0,0,1] is a vertical unit vector.

Joint Fitting of Parallel Bedding Planes

A common problem for remote sensing of geologically relevant areas is lack of continuous exposure, and planes that are unconstrained in one dimension are common (Figures 4c and 5c). However, exposures of bedding in close spatial association often capture slightly different cuts of topography with different orientation error structures. This is statistically useful: under the assumption of parallel bedding, multiple bedding traces can be jointly fitted to increase the three‐dimensional definition of a planar data set. Error metrics computed after fitting can be used to test the validity of this assumption. In Figure 8, several bedding traces digitized on different sides of the same cuesta show substantially different error structures for what is likely the same bed geometry. In contrast, bedding traces that could be followed around the entire hillslope aspect have much more restricted error spaces. Grouping the low‐quality planar fits creates a much higher‐precision joint measurement at the intersection of the error spaces of individual beds; the accuracy of the approach is indicated by its nearly identical modeled orientation to the high‐precision single‐bed measurements (Figures 8c and 8d).

Figure 8

Joint fitting of bedding traces within a single stratigraphy for planes assumed to be parallel. (a) Map view of bedding traces showing scattered nominal dips digitized from single‐plane bedding traces on opposing hillslopes (dashed white), two well‐constrained single‐plane orientations digitized over the entire range of hillslope aspect (solid gray), and a jointly fit orientation (solid bold white). (b) Plan view of bedding traces centered and stacked atop each other for joint fitting, defining a plane in two dimensions. (c) Side view of the plane showing residuals within the digitized data set. Jagged lines are due to digitization errors. (d) Projection of errors to bedding poles on an upper‐hemisphere stereonet, showing the joint fit error range (solid red) overlapping the error spaces of high‐precision single planes (gray) at the intersection of error spaces for the constituent planes of the group (dashed red). The process of joint fitting is nearly the same as the single‐plane fitting procedure outlined in sections 3.2 and 3.3. The only difference in the joint fitting procedure relates to processing of the input data: prior to PCA regression, the data matrix D corresponding to each input point cloud (i.e., each bed in the group) is independently mean subtracted using equation (1). The resulting matrices are then stacked to form a single centered data matrix M. This combined representation contains orientation info for each bedding trace but discards information on the relative locations of the planes (Figure 8b). The orientation of the combined data matrix is regressed using PCA and error is modeled using the standard techniques presented in section 3.2. If the assumption of a shared bedding orientation is valid, this can vastly increase statistical power. This technique removes the need for certainty in the bed‐to‐bed correspondence of adjacent but discontinuous stratigraphic exposures, which is often difficult to determine. However, the method must be applied with care: it is only valid where the assumption of parallel bedding holds. For this reason, the combination of this method with views of decomposed variance and statistical error bounds is particularly powerful (e.g., Figure 8). Evaluation of misfit from the jointly fit plane can test the validity of the assumption of a shared stratigraphy: if errors are high and systematic, it is likely that the grouping does not consist of parallel beds. Joint fitting of planes can be valuable both for modeling the geometry of parallel bed sets and as an exploratory tool to evaluate whether stratigraphies conform to a parallel‐bedding assumption.

Method Demonstration and Performance

Orbital Imagery of the San Rafael Swell, Utah

The San Rafael Swell in eastern Utah, USA, is an ∼20 × 40‐km Paleocene Laramide anticline formed above a west dipping thrust fault in the subsurface that tilted the strata to nearly vertical, creating the imposing San Rafael “Reef” (Figure 9a). This structure is cored by a Jurassic stratigraphy including the distinctive, thick aeolian Navajo sandstone (Gilluly & Reeside Jr, 1928). The dramatic transect of Interstate 70 through the center of the structure, where strata are eroded away, makes the San Rafael Swell a world‐famous structural locale. At the eastern edge of the swell, eastward dips steepen from near flat to a maximum of ∼60° before shallowing outside of the reef (Figure 9). The simple fold pattern and well‐exposed stratigraphic layering provide an ideal setting to test the extraction of orientation errors from orbital or airborne data, allowing orientation recovery to be tested at a wide range of dips against data collected in situ.

Figure 9

(a) Physiographic context of the San Rafael Swell in southeast Utah, USA. (b) Cross section of the San Rafael Swell anticline (after ; Doelling et al., 2017) showing the asymmetric dips of strata across the structure. (c) Field‐measured bedding orientations (gray) from the San Rafael Desert geologic map (Doelling et al., 2017), nominal remotely sensed bedding orientations (black), and corresponding digitized bedding traces (red lines) atop a hillshade of the of a 5 m per pixel aerial photogrammetric DEM. Field‐measured and remotely sensed bedding orientations follow the same structural pattern. (d–f) Digitized bedding traces, remotely measured orientations, and field orientations atop orthorectified, coregistered Google Maps satellite data (accessed February 2018) for key areas. Remotely sensed orientations are underlain by an error ellipse with axial lengths corresponding to θ max and θ min, oriented along the maximum direction of error.

Data Sets

The map database accompanying the recently published geologic map of the San Rafael Desert (Doelling et al., 2017) provides bedding orientations from the structural map, which were measured in the field at outcrop scale using a compass clinometer. At regional scale, they outline the convex structure and N‐S axis of the swell (Figure 9c). A 5‐m ground‐sample‐distance DEM from the Utah Automated Geographic Reference Center was used as the elevation layer for digitized bedding traces. This DEM was created from autocorrelated 1‐m resolution stereo aerial imagery, using the SOCET Set software package. Elevation contours and a shaded‐relief map were generated from the DEM to inspect alignment and data fidelity. In general, the DEM is of high quality, with a few artifacts in high‐slope regions on the eastern side of steep hillsides where shadows lead to poor correlations. Locally, the data is significantly higher fidelity than the 10‐m resolution National Elevation Data Set (Gesch et al., 2014). Orthorectified, mosaicked, and georeferenced ∼25 cm per pixel satellite imagery from Google Maps was overlain on the DEM to guide feature digitization. Warping of the satellite imagery over a coarser‐resolution DEM resulted in lateral mismatch of up to 5 m between the two data sets in some steeply sloping areas. These poorly coregistered areas were avoided for feature extraction. Bedding traces were digitized atop the satellite imagery using QGIS. Outcrops were chosen to maximize the 3‐D structure of captured planes, and areas near field‐measured observations were targeted for direct comparison. Lengths of bedding traces range from 100 to 2500 m (median length 415 m). The longest traces are in low‐dipping strata in the western portion of the study area. Elevations were extracted along feature traces at 5‐m intervals to fully query the DEM. Digitized bedding traces are shown in Figure 9c. The attitude software package was used to conduct planar fitting; the resulting planes were evaluated for quality using the Orienteer application (see Appendix B). Planes were examined visually and quantitatively after PCA fits as described above. Those with large residuals (typically >10 m out‐of‐plane) were discarded, or remeasured if the blunder was due to obvious misdigitization (Figure 10). Sixty‐eight planes were retained. Since only planes with favorable exposure were measured, no grouping of beds was required to increase statistical power.

Figure 10

Axis‐aligned visualization of fit errors to illustrate filtering criteria for poor bedding traces during creation of the San Rafael Swell data set. (a) An accepted fit with relatively low out‐of‐plane scatter, defined over a significant length along both and . (b) A poor fit, with higher out‐of‐plane scatter and no definition along . This bedding trace was discarded from the data set.

Orbital and Field Data Comparison

Overall, the map pattern of remotely‐measured orientations mimics the large‐scale structural trend of steepening dips toward the eastern monocline of the swell (Figure 9c). Dip magnitudes are very close to those measured in the field. The direction and magnitude of errors are summarized as ellipses on the dip symbols. For the shallowest bedding, errors are extremely low (Figure 9d), while for the steepest measurements, errors are almost entirely in the dip direction (Figure 9e). Error magnitudes are small for low‐dipping strata and increase substantially with steeper dips. This is intuitive as the effects of DEM errors, poor registration of imagery, and digitizing errors will increase in rugged topography. Selected closely spaced in situ and remotely sensed orientation measurements paired for direct comparison (Figure 11) show that remotely sensed orientations typically closely match the in situ measurements, typically within error. Mismatch of a few degrees, especially in strike, can be explained by actual localized variation in bed orientation or slight measurement errors either in remote or in situ gathered data. One measurement, highlighted in red on Figure 11, has an unusually large mismatch with in situ data. This measurement, at the eastern margin of the swell immediately north of the I‐70 freeway, has a reported dip of 27°. We instead measure a dip, with error, of 10–20° and replicate this result with additional measurements of several closely spaced beds. The 27° dip is steeper than those immediately to the west, putting it at odds with the localized structural pattern of shallowing dips at the eastern edge of the swell. This suggests that the published dip measurement for this outcrop may be in error.

Figure 11

Upper‐hemisphere stereonet showing poles to bedding for pairs of closely spaced field‐measured (gray) and remotely measured (black) bedding orientations in the San Rafael Swell from Figure 9. Errors generally increase at steeper dips (toward the right). One literature field measurement (highlighted with a red ? and corresponding to the same symbol in Figure 9d) is steeper than all nearby remotely sensed dips and does not conform to the regional structural pattern, suggesting that it may be an error in the preparation of the geologic map. Although basic correspondence between the DEM and imagery was manually checked, no special processing or alignment was applied to the input data. A higher level of processing might increase the fidelity of the digital surface model, but this example demonstrates that reasonable planar orientations can be extracted from minimally processed, publicly available imagery data sets, especially when good exposure is available. The addition of error and its visualization to the analytical product enables much more flexibility in input data quality, as errors arising from poorly registered data or sloppy digitizing will be penalized by poor confidence metrics and readily recognized (e.g., Figure 10).

UAV Photogrammetry in the Naukluft Mountains, Namibia

The eastern face of the Naukluft mountains adjacent to Onis Farm (24.32°S, 16.23°E) contains mixed siliciclastic and carbonate strata above a regionally significant thrust fault (Rowe et al., 2012). Recent mapping and stratigraphic studies in the area identified a minimally deformed stratigraphic section of the Zebra Nappe above this basal thrust fault (Quinn & Grotzinger, 2016). UAV imagery gathered during this field study was processed using SfM photogrammetry into a coarse‐resolution digital outcrop model of a key hillside (Figure 12a). Assessing the quality of bedding orientation measurements from UAV data is of significant interest for terrestrial field geological studies (e.g., Cawood et al., 2017), and this multiview aerial data tests the functionality of our method with the off‐vertical errors and ad‐hoc photogrammetry that characterize UAV‐based surface model creation.

Figure 12

(a) UAV photograph (∼500‐m standoff) looking NW toward the cliffs at Onis Farm, Naukluft Mountains, Namibia. Digitized bedding traces (colored lines) and the locations of field‐measured orientations (colored squares) are superposed. Beds dip ∼30–45° degrees into the hillslope (away from viewer); 230 m of topographic relief is shown in the photo. (b) Digital surface model from UAV photogrammetry, viewed from slightly below the viewpoint of panel (a), with digitized bedding traces superposed. Bedding traces grouped for analysis are connected by dashed lines. Groups of bedding traces with similar properties are numbered 1–6; field‐measured orientations are lettered a–f. UAV = unmanned aerial vehicle. An 80‐m elevation range within the ∼300‐m cliff face at Onis Farm was chosen for this comparison, comprising the upper Ubisis Formation, the Tsams Formation, and the lower Lemoenputs Formation of the Zebra Nappe; field structural data was subset from a stratigraphic data set assembled for the entire cliff (Quinn & Grotzinger, 2016). Within the target elevation range, bedding orientation measurements were collected at six locations with a Brunton compass clinometer, and the GPS position and description of the measured bed were logged (Figure 12). The elevation of each measurement was determined after measurement by draping the georeferenced data atop an Advanced Land Observing Satellite global 15‐m resolution photogrammetric DEM, which was used as a regional topographic basemap. Outcrop images were acquired using a remotely piloted DJI Phantom 4 quadcopter UAV, from an altitude of ∼200 m above ground and ∼500‐ to 800‐m lateral standoff southeast of the target cliff. The aircraft was approximately level with the target stratigraphic interval (Figure 12a). UAV images were combined into a photogrammetric 3‐D model using the Agisoft Photoscan Professional v1.2 SfM software package (Figure 12b). The 3‐D model extends ∼1.5 km laterally along the cliff face and captures ∼400 m of relief on the east facing cliff; it was assembled with the “very high” quality setting and has ∼4 million constituent points and a horizontal resolution of ∼15 cm per pixel. In all dimensions, model precision varies within the scene depending on the stereo convergence geometry of individual image pairs. The stratigraphic interval studied contains two cliff faces with intervening float‐covered slopes; beds traceable in UAV imagery primarily occur on the cliffs. The traces of 14 bedding surfaces were digitized manually in Agisoft Photoscan atop oblique images registered to the 3‐D model (Figure 12). Agisoft Photoscan automatically drapes digitized bedding traces onto the surface model, creating a 3‐D point data set without an additional software package or conversion to a gridded DEM. Digitized bedding traces were exported as a dxf‐format file using the UTM Zone 33S coordinate system. The fiona Python module was used to read this data, and the attitude software package was used for planar fitting. To increase statistical power, four bedding traces were assumed parallel with nearby measurements and collapsed into two groups, yielding a final set of 12 distinct orientation measurements. An iPython notebook containing the analytical pipeline for this example is available as supplementary material to this publication.

UAV and Field Data Comparison

Field‐measured bedding orientations for the target stratigraphic interval range in strike from 225–245°, corresponding to dip azimuths of 315–335°. Dips range from 30° to 45° to the northwest (into the hillslope). Field‐measured orientations are lettered a–f, and sets of remotely sensed measurements are numbered 1–6 (Figures 12b and 13).

Figure 13

Comparison of field‐measured and UAV photogrammetric bedding orientations for the Onis cliffs, colorized by height as in Figure 12. In each panel, error spaces for individual remotely sensed measurements are shown as colored fields. Thin dotted lines show the error bounds of measurements prior to grouping. (a) Orthographic projection of bedding orientations, with the horizontal axis showing distance to the southeast, perpendicular to the strike of measured beds. Remotely measured beds are shown as residuals to their best fitting plane and overlain by hyperbolic error bounds. The recovery of dips into the hillslope by remotely sensed orientations is apparent. (b) Upper‐hemisphere oblique equal‐area stereonet showing NW‐dipping bedding girdles and pole error ellipses for remotely sensed and field measurements. Dotted lines represent the edges of error ellipses for components of grouped measurements. (c) Errors to poles of bedding, showing close correspondence with field measurements (squares) in most cases (see text). The maximum angular errors for each measurement are roughly oriented along the dip direction into the outcrop. The lowest‐elevation extracted bedding trace (1) follows a coarse sandstone bed across the nose of the hillslope. Its orientation is well constrained, with a maximum angular error of ∼5°, but significantly different from the field‐measured orientation of a siltstone bed ∼10 m stratigraphically below (a). This mismatch may result from an actual dip change due to slight folding across the lithologic boundary at the base of the cliff. The next intervals (2 and 3) contain five beds within a cliff‐forming dolomite unit; two of these measurements were grouped. The beds in 2 and 3 have error ellipses elongated in the dip direction, representing measurements well constrained on a single axis (roughly, their apparent dip in standoff imagery); their error spaces overlap that of (1), suggesting consistent bedding orientations for the entire lower cliff. Beds marked as 4 occur in a fine‐medium sandstone interval where stairstep beds are easily traced; these beds are individually well resolved and generally steeper than the beds of 2 and 3. These measurements closely correspond to field measurement b in dip but suggest an ∼5° more westward strike. Since several remotely sensed beds agree closely, this strike mismatch may be caused by a slight error in field measurement. Beds in 5 were measured in a second cliff‐forming dolomite interval, and extracted error distributions overlap the field measurement c within the same interval. This suggests that orientations were recovered within a few degrees. Field measurements d, e, and f were measured on a float‐covered slope of the Lemoenputs Formation with few traceable bedding planes. One somewhat resistant dolomite bed (6) can be traced on both sides of the hillslope but not over its nose. When grouped, these measurements outline a single plane dipping at ∼45° that corresponds closely in orientation to d, e, and f. Overall, bedding orientations extracted from the UAV data set correspond closely to field‐measured orientations, recording bedding dips 30–50° northwest (into the hillslope) and steepening with elevation. In general, strike is constrained to within a few degrees, while dips are constrained to within ∼5–15°. This error structure is consistent with the relatively stronger constraints on apparent dips along the cliff face than dips in and out of the cliff. Flights on a single side of a relatively planar outcrop entail little 3‐D structure with which to derive well‐constrained orientations. However, even with a relatively low‐resolution (15 cm per pixel) SfM photogrammetric elevation model, the crucial observation of beds steeply dipping into the outcrop is easily captured.

Potential Future Improvements to the Statistical Framework

Modeling Data With Different Error Structures

The statistical error bounds developed for unweighted PCA regression in section 3.3 are general and adaptable to a wide variety of data types. Different statistical frameworks can be substituted, and supplements to this statistical framework can be used to model errors for uncertain orientations using situation‐specific information as described below.

Adding a Noise Floor

PCA‐based regression is responsive to the scale of errors, but known errors in the input data are not automatically accounted for in the fitting process. If data input error is independently measured, a “noise floor” can be imposed that defines a minimum amount of noise expected for the input data set. This can be accomplished by conditionally replacing λ3 in equation (25) with a standard value for the minimum noise variance, to ensure that . For instance, if the accuracy of a point cloud is 1 m, as computed based on external criteria (e.g., the input stereo geometry of a gridded elevation model or measurement error for LIDAR or radar ranging), introducing a noise floor of into calculations of the noise covariance could correct for false certainty arising from possible local smoothing.

Rescaling Error Sensitivity

An advantage of the isotropic error framework of PCA is its flexibility: because coordinates are not fixed, the input data set can be rescaled along any axis. Different axial weightings can be a useful way to incorporate known errors on single‐axis parameters of the input data (e.g., photogrammetric image‐registration errors). It is often desirable to set error sensitivities separately based on informed criteria around data set‐specific error sources (Carroll & Ruppert, 1996). For instance, orbital photogrammetric DEMs might be tuned for chiefly vertical errors, while oblique SfM photogrammetry would be given higher sensitivity in the oblique view direction. While our current statistical framework treats errors along all axes equally, the software can be modified to fit different errors along each axis. The PCA framework can be limited to only vertical errors, mimicking OLS, or utilized in a variety of other weighted schemes (e.g., Francq & Govaerts, 2014; Friendly et al., 2013).

Applying Other Statistical Models

Numerical methods such as bootstrap resampling and Monte Carlo sensitivity analysis can be substituted for the asymptotic Gaussian and noise‐based statistical models described in this paper (section 3.3.4). Although these numerical methods are computationally intensive and difficult to generalize, they allow the incorporation of detailed assumptions about data set errors and can support situation‐specific, high‐quality errors in the fit parameters of the plane. Additionally, a variety of regression techniques with different assumptions, such as OLS and weighted schemes, can be substituted for PCA (section 3.1.1). No matter which statistical framework is used, the fitting and data‐visualization methods outlined in this paper can be used to represent the resulting uncertain planes.

The Link With Bingham Statistics

The Bingham statistical distribution is a generalized statistical distribution of undirected orientations defined in spherical space (Bingham, 1974). The core assumption of the Bingham framework is that for the axes of a distribution a, trace(a 2) = 1. Applying the Bingham transformation to a Cartesian set of error axes is functionally equivalent to finding the tangents to a hyperbolic error range. As such, our hyperbolic axes h can be transformed into the Bingham structural parameters κ 1 and κ 2 (Bingham, 1974; Onstott, 1980). When fully explored, the formal link between PCA regression and Bingham statistics will allow uncertain orientation measurements to be treated as probability density functions in spherical space. This will allow higher‐level statistical transforms to be applied to measurements, including combination using error‐propagation techniques and the application of statistical significance tests. Formalizing the conceptual link between Cartesian and Bingham statistics by future efforts may lead to expanded capabilities for this error‐analysis framework.

Conclusion and Recommendations

We have described a complete error‐analysis workflow for the orientation of geological planes, especially stratigraphic bedding, that improves on typical regression statistics for the assessment of geological planes. Our PCA‐based analytical approach includes a regression method, a mathematical framework for computing statistically‐based orientation errors, approaches for the 2‐D visualization and reporting of structural data with errors, and software to handle calculations and data management. As shown by the two terrestrial examples, these analytical procedures are generalized and flexible. They can be used to model the orientation of planes on map‐projected satellite and aerial imagery, as well as digital surface models built with LIDAR, UAV photogrammetry, and radar techniques. Application of the error analysis method in the San Rafael Swell successfully captures the structural pattern of this geological area. Relatively good conformance with in situ measurements was gained despite the use of off‐the‐shelf data products, reflecting the flexibility and wide applicability of this method to readily available nadir‐looking imagery and elevation data sets. Application of the method to oblique‐looking UAV data on the Naukluft plateau demonstrated the viability of PCA‐based orientation calculation in a reconnaissance study using high‐obliquity aerial imagery with relatively inexpensive equipment and commercial SfM photogrammetry software. The coupling of a robust error‐analysis framework with techniques to visualize the error space allows simple and transparent analytical workflows. Error‐minimizing data collection strategies can be easily compared, and heterogeneous data can be used with full knowledge of the errors involved. We propose a standardized five‐term method for numerical reporting of uncertain planar orientations, combining the basic strike/dip representation with terms for angular errors on two axes, and the rake of these error axes within the best fitting plane (section 2), and yielding (strike, dip, rake, min. angular error, and max. angular error) for each measurement. Additionally, we create intuitive stereonet display methods that provide a natural means to visualize uncertain planar orientations alongside traditional structural data. Overall, the results of this study suggest that errors arising from outcrop geometry are at least as important as precision of the input remote sensing data set in defining the error space of a fitted plane. Traces of geologic features can only be modeled as unique planes when they query a three‐dimensional point data set, and outcrops within the same data set can have completely different error structures. For characterization of orientations in an outcrop, we recommend that care be taken to find beds that sample a wide range of hillslope aspect or depth within an obliquely measured scene (or groups of closely spaced beds that collectively sample such a range). Furthermore, we suggest that digitizing precision is of subsidiary importance to collecting such a varied sample set: small errors in describing a fitted plane do not significantly diminish the quality of the fit relative to poor sampling of three‐dimensional outcrop geometry. Thus, measuring a large quantity of adjacent bed surfaces provides the best opportunity to remove poor measurements and group incomplete ones. We expect the methods described here to push the scale of geologic inference toward the resolution limit of 3‐D surface models, broadening the range of structural interpretations that can be made from remotely sensed imagery. This will increase the fidelity of structural measurements supported by UAVs and LIDAR scanners in terrestrial research, rover‐based cameras for in situ planetary exploration, and satellite data for regional planetary mapping. To that end, we release software to implement these methods and visualize strike and dip in Appendix B.

3 in total