Thomas Fischer1. 1. Brandenburg University of Technology Cottbus-Senftenberg, Central Analytical Laboratory, Konrad-Wachsmann-Allee 6, 03046 Cottbus, Germany.
Abstract
It was the aim of the method development to classify types of various biological soil crusts (biocrusts) using principle component analysis (PCA) on multispectral images. To address this aim, visible (RGB) and NIR images of bare sandy soil, algal and moss biocrusts were registered, per channel reflection values were determined using a calibration color chart on a pixel basis, and a PCA was performed on the unfolded RGB-NIR reflectance hypercubes (i.e. three-dimensional hypercubes were transformed into x.y × λ 2D-matrices with λ channels serving as variables for PCA). The classification approach was based on the hypothesis that biocrust types map specifically in PCA ordination plots, meaning that distinct regions in ordination plots may be assigned specifically to individual biocrust types. Reallocation of the pixels assigned to biocrust types to their respective image coordinates would then yield biocrust classification plots. •Allows manual selection of features or identification of given features in PCA ordination plots.•Fully permits the selection of relevant and omission of irrelevant, as well as identification of unknown classes.•It is not restricted to RGB-NIR multispectral data only, but may be applied to any type of multimodal imaging data.
It was the aim of the method development to classify types of various biological soil crusts (biocrusts) using principle component analysis (PCA) on multispectral images. To address this aim, visible (RGB) and NIR images of bare sandy soil, algal and moss biocrusts were registered, per channel reflection values were determined using a calibration color chart on a pixel basis, and a PCA was performed on the unfolded RGB-NIR reflectance hypercubes (i.e. three-dimensional hypercubes were transformed into x.y × λ 2D-matrices with λ channels serving as variables for PCA). The classification approach was based on the hypothesis that biocrust types map specifically in PCA ordination plots, meaning that distinct regions in ordination plots may be assigned specifically to individual biocrust types. Reallocation of the pixels assigned to biocrust types to their respective image coordinates would then yield biocrust classification plots. •Allows manual selection of features or identification of given features in PCA ordination plots.•Fully permits the selection of relevant and omission of irrelevant, as well as identification of unknown classes.•It is not restricted to RGB-NIR multispectral data only, but may be applied to any type of multimodal imaging data.
Entities:
Keywords:
Classification; Identification; Multimodal imaging; PCA-based supervised identification of biological soil crusts in multispectral images
Specifications TableAgricultural and Biological Sciences
Method details
The multispectral approach in remote sensing typically includes the estimation of spectral indices. For biocrusts, the normalized difference vegetation index (NDVI [1,2]) has been proposed; however, high NDVI values of wet biocrusts may be misinterpreted as vascular plant vegetation dynamics whereas dry biocrusts only gained negligible NDVI values [3]. To overcome this disadvantage, several more specific spectral indices have been proposed for biocrusts, like the crust index (CI, respectively [4]), the brightness index (BI [5]) or the biological soil crust index (BSCI [6]), where biocrusts are characterized by typical ranges of respective index values. All spectral indices well reflect crust activity or biomass but are not specific enough to differentiate between different crust types, like lichen, cyanobacterial, algal or moss biocrusts. Rodriguez-Caballero et al. [7] propose support vector machines (SVM) for supervised classification of biocrusts from hyperspectral remote sensing data. However, a common problem with linear spectral mixture analysis (SMA) remains when the number of spectral endmembers is greater than the number actually required to unmix an individual pixel in the scene [3]. Non-spectral unsupervised principal component analysis (PCA) classification of biocrusts has indicated that the development of the microbial community was affected at multiple scales, including biocrust successional stage, seasonal effect and the micro-geomorphology [8]. High-resolution VIS-NIR spectroscopy was employed to study the influence of wetting on cyanolichen-dominated biocrusts in a non-imaging approach [7]. There was no attempt so far in the literature to use multispectral PCA classification of biocrusts in high-resolution images.The method proposed refers to Rodarmel and Shan [9] who suggested principal component analysis for preprocessing of hyperspectral images. Based on their finding that only the first few principal component image bands contain significant information, this study aimed at elucidating the feasibility of biocrust classification by manual selection of spectral features in PCA ordination plots.The study site was a catena from the mobile part of an inland dune to dry acidic grassland dominated by Corynephorus canescens and located near Lieberose, Brandenburg, northeast Germany (51°55′49″N, 14°22′22″E). A detailed description of the sampling site is given by [10]. The samples represented the sandy substrate, an algal biocrust dominated by Zygogonium ericetorum, a moss crust dominated by Polytrichum piliferum, and a mixed biocrust composed of both Z. ericetorum and P. piliferum (Fig. 1).
Fig. 1
RGB (left) and NIR (right) images used for biocrust classification. A – Polytrichum-dominated moss biocrust, B – mixed biocrust, C – bare sandy soil, D – Zygogonium-dominated algal biocrust selected.
RGB (left) and NIR (right) images used for biocrust classification. A – Polytrichum-dominated moss biocrust, B – mixed biocrust, C – bare sandy soil, D – Zygogonium-dominated algal biocrust selected.
Method workflow
Load packages gatepoints and jpeg of the R software suite [11]. Define two functions to plot image objects (plot_jpeg) and to reallocate the result of classification to image coordinates (plot_PCA). The R code for the two functions is listed in the additional information section.Load the RGB and the respective registered NIR image files into the workspace and store the xy-dimensions in variable „res“, where imageNIR-reg.jpg and imageRGB.jpg denote the calibrated and registered NIR and RGB image files.imgRGB.stat <- readJPEG("imageRGB.jpg") # load RGB imageimgNIR.stat <- readJPEG("imageNIR-reg.jpg") # load NIR imageres = dim(imgRGB.stat)[1:2]Merge image objects to hypercube denoted as „imgMult“. The number of spectral channels (the z-dimension) of the hypercube equals sum of the spectral channels of the RGB and NIR images. Unfold hypercube to object „imgMult.1d“.imgMult <- array(c(imgRGB.stat[, , 1:3], imgNIR.stat[, , 1:3]), dim = c(res[1], res[2], (dim(imgRGB.stat)[3] + dim(imgNIR.stat)[3])))imgMult.1d <- array(imgMult, dim = c(res[1]*res[2], (dim(imgRGB.stat)[3] + dim(imgNIR.stat)[3])))Run PCA. For the given example images, the first two components explained 9615% of the total variance.img.PCA <- prcomp(imgMult.1d, scale. = T)Create empty dataframe for selection results (kk2) and PC1 vs. PC2 density plot. Each coordinate in Fig. 2 represents a single spectrum, where clusters of high pixel density denote objects with similar spectral features.
Fig. 2
PCA ordination density plot of the first and second components of unfolded hypercube.
kk2 <- data.frame(img.PCA$x[,1], img.PCA$x[,2])smoothScatter(img.PCA$x[,1], img.PCA$x[,2])PCA ordination density plot of the first and second components of unfolded hypercube.Select spectral features by outlining a polygon with the mouse pointer. The function „fhs “(freehand select) of the gatepoints package allows selection of data points by outlining polygons in two-dimensional scatterplots. In the present method, each data point represents spectrally identical pixels in a PC1 vs. PC2 density plot, where data points inside the polygon were denoted as „T “(TRUE) and all other data points as „F “(FALSE). Close selection with click on right mouse button and plot selection result using the previously defined plot_PCA function (see step 1 of this section), which generates binarized plots with selected features being represented in white color and all other in black. Fig. 3 shows the selection of the data points representing the bare sandy soil, algae and mosses.
Fig. 3
The data points outlined by yellow, dark and bright green polygons (left) represent pixels of bare sandy soil, as well as algae and mosses in the biocrusts, respectively (right).
selectedPoints <- fhs(kk2, names = F, pch = NA_integer_)plot_PCA(selectedPoints)The data points outlined by yellow, dark and bright green polygons (left) represent pixels of bare sandy soil, as well as algae and mosses in the biocrusts, respectively (right).
Hint 1
Step 6 may be repeated multiple times and allows the simultaneous presentation of multiple classes, for example in combined color images of biocrust types (use, for example, selectedPoints1, selectedPoints2 etc. as fhs output for multiple polygons, Fig. 4).
Fig. 4
Biocrust classification image. Red – bare sandy soil, green – algae, blue – mosses.
Biocrust classification image. Red – bare sandy soil, green – algae, blue – mosses.
Hint 2
Situations may arise when it is necessary to identify the area where objects with given spectral features are represented in the ordination plot. The „predict “function of R may be deployed in this case with steps 2 and 3 of the method workflow being performed for an image containing the desired objects only. Fig. 5 depicts the result of the projection of pixels representing bare sandy soil only onto the PCA space generated in step 4. The prediction results were stored in the variable “img.new”.
Fig. 5
PCA ordination density plot of unfolded hypercube of all biocrust types (left) and of the bare sandy soil only (right).
PCA ordination density plot of unfolded hypercube of all biocrust types (left) and of the bare sandy soil only (right).img.new <- predict(img.PCA, newdata = bareSand.1d)smoothScatter(img.PCA$x[,1], img.PCA$x[,2], xlim = c(-10,8), ylim = c(-15,5)) # Fig. 5 leftsmoothScatter(img.new[,1], img.new[,2], colramp = colorRampPalette(c("white", "yellow3")),xlim = c(-10,8), ylim = c(-15,5)) # Fig. 5 right
Hint 3
The binarized plots of biocrust types may be used for further image analysis, like calculation of coverages (Fig. 7) or for geostatistical analyses.
Fig. 7
Combined biocrust type image (A) with mosses (green), algae (red) and grass (blue); NDVI images of mosses (B), algae (C) and grass (D). The PCA ordination plot, data points of training spots (see Fig. 6) and the polygon for manual feature selection for each biocrust type are shown in the lower right corners. Coverages were estimated according to HINT 3. Trampled areas and color checker were disregarded for analysis. A high-resolution version of the image is available as eSlide.
Example
Provided that the normalized difference vegetation index (NDVI) correlates with net primary production [12], the proposed approach may be used to estimate the relative contribution of biocrust types to photosynthesis. Fig. 6 depicts an example spot covered with an algae, mosses and some higher vegetation (grass), as well as the NDVI image of that spot.
Fig. 6
RGB (left) and NDVI images (right) of an example spot covered with algae, mosses and grass. The training spots for projection of spectral features onto the PCA space (see HINT 2) are outlined as A (mosses), B (algae) and C (grass). A high-resolution version of the image is available as eSlide.
RGB (left) and NDVI images (right) of an example spot covered with algae, mosses and grass. The training spots for projection of spectral features onto the PCA space (see HINT 2) are outlined as A (mosses), B (algae) and C (grass). A high-resolution version of the image is available as eSlide.Combined biocrust type image (A) with mosses (green), algae (red) and grass (blue); NDVI images of mosses (B), algae (C) and grass (D). The PCA ordination plot, data points of training spots (see Fig. 6) and the polygon for manual feature selection for each biocrust type are shown in the lower right corners. Coverages were estimated according to HINT 3. Trampled areas and color checker were disregarded for analysis. A high-resolution version of the image is available as eSlide.The NDVI image was subsequently masked with binary images of mosses, algae and grass, where the pixel values of 255 were set to full transparency (Fig. 7). Then, the pixel value histogram of the resulting combined images would represent the distribution of net primary production for each biocrust type (Fig. 8).
Fig. 8
NDVI histograms of mosses, algae and grass estimated from Fig. 7B–D, respectively. Relative histogram areas represent the contribution of individual biocrust types to total photosynthesis. In the given example, mosses, algae and grass contributed with 21.4, 62.5 and 16.1%, respectively, to total photosynthesis.
NDVI histograms of mosses, algae and grass estimated from Fig. 7B–D, respectively. Relative histogram areas represent the contribution of individual biocrust types to total photosynthesis. In the given example, mosses, algae and grass contributed with 21.4, 62.5 and 16.1%, respectively, to total photosynthesis.
Subject Area:
Agricultural and Biological Sciences
More specific subject area:
Classification of biological soil crust types
Method name:
PCA-based Supervised Identification of Biological Soil Crusts in Multispectral Images
Name and reference of original method:
C. Rodarmel, J. Shan, Principal component analysis for hyperspectral image classification, Surveying and Land Information Science. 62 (2002) 115–122.
Resource availability:
example image files provided as supplementary material