Jin Li, Maggie Tran, Justy Siwabessy.
Abstract
Spatially continuous predictions of seabed hardness are important baseline environmental information for the sustainable management of Australia's marine jurisdiction. Seabed hardness is often inferred from multibeam backscatter data with unknown accuracy, and it can be inferred from underwater video footage only at a limited number of locations. In this study, we classified the seabed into four classes based on two new seabed hardness classification schemes (i.e., hard90 and hard70). We developed optimal predictive models of seabed hardness using random forest (RF), based on point data of hardness classes and spatially continuous multibeam data. Five feature selection (FS) methods, namely variable importance (VI), averaged variable importance (AVI), knowledge informed AVI (KIAVI), Boruta and regularized RF (RRF), were tested based on predictive accuracy. The effects of highly correlated, important and unimportant predictors on the accuracy of RF predictive models were examined. Finally, spatial predictions generated using the most accurate models were visually examined and analysed. This study confirmed that: 1) hard90 and hard70 are effective seabed hardness classification schemes; 2) seabed hardness of four classes can be predicted with a high degree of accuracy; 3) the typical approach used to pre-select predictive variables by excluding highly correlated variables needs to be re-examined; 4) the identification of the important and unimportant predictors provides useful guidelines for further improving predictive models; 5) FS methods select the most accurate predictive model(s) instead of the most parsimonious ones, and AVI and Boruta are recommended for future studies; and 6) RF is an effective modelling method with high predictive accuracy for multi-level categorical data and can be applied to 'small p and large n' problems in environmental sciences.
Additionally, automated computational programs for AVI need to be developed to increase its computational efficiency, and caution should be taken when applying filter FS methods in selecting predictive models.
Year: 2016 PMID: 26890307 PMCID: PMC4758710 DOI: 10.1371/journal.pone.0149089
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. a) Location of the study region in the eastern Joseph Bonaparte Gulf, northern Australian marine margin overlaid with bathymetry; b) location of the four study areas (A, B, C, and D) in the study region and seabed hardness types (hard, hard-soft, soft-hard and soft) based on hard90 overlaid with bathymetry at video transect; and c) the geomorphic features of the four study areas.
Table 1. Predictive variables and their corresponding numbers.
| No. | Predictive variable | No. | Predictive variable |
|---|---|---|---|
| 1 | easting | 11 | tpi |
| 2 | northing | 12 | bs13 |
| 3 | prock | 13 | bs21 |
| 4 | bathy | 14 | bs25 |
| 5 | bathy.moran | 15 | bs27 |
| 6 | planar.curv | 16 | bs32 |
| 7 | profile.curv | 17 | bs35 |
| 8 | relief | 18 | homogeneity |
| 9 | slope | 19 | variance |
| 10 | surface | 20 | bs.moran |
Spearman correlation coefficients (ρ) among 20 predictive variables and seabed hardness (i.e. hard total) (n = 140).
| | easting | northing | prock | bathy | bathy.moran | planar.curv | profile.curv | relief | slope | surface | tpi | bs13 | bs21 | bs25 | bs27 | bs32 | bs35 | homogeneity | variance | bs.moran |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| hard.total | -0.28 | 0.05 | 0.46 | -0.37 | 0.34 | -0.07 | -0.14 | 0.23 | 0.16 | 0.28 | -0.21 | 0.45 | 0.43 | 0.45 | 0.47 | 0.49 | 0.48 | 0.23 | -0.13 | 0.34 |
| easting | 1 | -0.85 | 0.36 | -0.53 | 0.08 | -0.01 | 0.01 | -0.3 | -0.25 | -0.29 | 0.07 | 0.4 | 0.43 | 0.4 | 0.38 | 0.36 | 0.4 | 0.31 | -0.27 | 0.04 |
| northing | -0.85 | 1 | -0.52 | 0.71 | -0.23 | 0.01 | -0.01 | 0.17 | 0.2 | 0.15 | -0.04 | -0.6 | -0.62 | -0.61 | -0.6 | -0.6 | -0.63 | -0.43 | 0.35 | -0.16 |
| prock | 0.36 | -0.52 | 1 | -0.81 | 0.48 | 0.06 | 0.03 | -0.13 | -0.17 | -0.08 | 0.03 | 0.78 | 0.79 | 0.78 | 0.77 | 0.74 | 0.78 | 0.41 | -0.3 | 0.45 |
| bathy | -0.53 | 0.71 | -0.81 | 1 | -0.26 | -0.05 | 0.03 | 0.18 | 0.22 | 0.14 | 0.02 | -0.88 | -0.89 | -0.88 | -0.88 | -0.83 | -0.87 | -0.53 | 0.42 | -0.31 |
| bathy.moran | 0.08 | -0.23 | 0.48 | -0.26 | 1 | -0.11 | 0.1 | 0.24 | 0.17 | 0.25 | 0.14 | 0.35 | 0.28 | 0.31 | 0.32 | 0.36 | 0.33 | 0.02 | 0.04 | 0.56 |
| planar.curv | -0.01 | 0.01 | 0.06 | -0.05 | -0.11 | 1 | -0.18 | -0.13 | -0.13 | -0.09 | -0.45 | 0.05 | 0.06 | 0.05 | 0.04 | 0.01 | 0.04 | 0.07 | -0.04 | -0.09 |
| profile.curv | 0.01 | -0.01 | 0.03 | 0.03 | 0.1 | -0.18 | 1 | 0.03 | -0.08 | -0.01 | 0.65 | 0 | -0.03 | -0.03 | -0.04 | -0.01 | -0.05 | -0.13 | 0.13 | 0.08 |
| relief | -0.3 | 0.17 | -0.13 | 0.18 | 0.24 | -0.13 | 0.03 | 1 | 0.88 | 0.92 | -0.06 | -0.04 | -0.06 | -0.02 | -0.01 | 0 | -0.04 | -0.33 | 0.44 | 0.19 |
| slope | -0.25 | 0.2 | -0.17 | 0.22 | 0.17 | -0.13 | -0.08 | 0.88 | 1 | 0.83 | -0.07 | -0.07 | -0.09 | -0.05 | -0.04 | -0.03 | -0.07 | -0.27 | 0.37 | 0.09 |
| surface | -0.29 | 0.15 | -0.08 | 0.14 | 0.25 | -0.09 | -0.01 | 0.92 | 0.83 | 1 | -0.08 | 0.02 | -0.02 | 0.03 | 0.05 | 0.07 | 0.02 | -0.29 | 0.43 | 0.15 |
| tpi | 0.07 | -0.04 | 0.03 | 0.02 | 0.14 | -0.45 | 0.65 | -0.06 | -0.07 | -0.08 | 1 | 0 | -0.03 | -0.03 | -0.03 | -0.01 | -0.05 | -0.13 | 0.08 | 0.16 |
| bs13 | 0.4 | -0.6 | 0.78 | -0.88 | 0.35 | 0.05 | 0 | -0.04 | -0.07 | 0.02 | 0 | 1 | 0.98 | 0.98 | 0.98 | 0.94 | 0.97 | 0.51 | -0.43 | 0.21 |
| bs21 | 0.43 | -0.62 | 0.79 | -0.89 | 0.28 | 0.06 | -0.03 | -0.06 | -0.09 | -0.02 | -0.03 | 0.98 | 1 | 0.99 | 0.98 | 0.92 | 0.96 | 0.52 | -0.44 | 0.19 |
| bs25 | 0.4 | -0.61 | 0.78 | -0.88 | 0.31 | 0.05 | -0.03 | -0.02 | -0.05 | 0.03 | -0.03 | 0.98 | 0.99 | 1 | 0.99 | 0.95 | 0.97 | 0.51 | -0.42 | 0.21 |
| bs27 | 0.38 | -0.6 | 0.77 | -0.88 | 0.32 | 0.04 | -0.04 | -0.01 | -0.04 | 0.05 | -0.03 | 0.98 | 0.98 | 0.99 | 1 | 0.96 | 0.98 | 0.51 | -0.42 | 0.22 |
| bs32 | 0.36 | -0.6 | 0.74 | -0.83 | 0.36 | 0.01 | -0.01 | 0 | -0.03 | 0.07 | -0.01 | 0.94 | 0.92 | 0.95 | 0.96 | 1 | 0.96 | 0.5 | -0.4 | 0.25 |
| bs35 | 0.4 | -0.63 | 0.78 | -0.87 | 0.33 | 0.04 | -0.05 | -0.04 | -0.07 | 0.02 | -0.05 | 0.97 | 0.96 | 0.97 | 0.98 | 0.96 | 1 | 0.54 | -0.44 | 0.22 |
| homogeneity | 0.31 | -0.43 | 0.41 | -0.53 | 0.02 | 0.07 | -0.13 | -0.33 | -0.27 | -0.29 | -0.13 | 0.51 | 0.52 | 0.51 | 0.51 | 0.5 | 0.54 | 1 | -0.85 | -0.06 |
| variance | -0.27 | 0.35 | -0.3 | 0.42 | 0.04 | -0.04 | 0.13 | 0.44 | 0.37 | 0.43 | 0.08 | -0.43 | -0.44 | -0.42 | -0.42 | -0.4 | -0.44 | -0.85 | 1 | 0.15 |
| bs.moran | 0.04 | -0.16 | 0.45 | -0.31 | 0.56 | -0.09 | 0.08 | 0.19 | 0.09 | 0.15 | 0.16 | 0.21 | 0.19 | 0.21 | 0.22 | 0.25 | 0.22 | -0.06 | 0.15 | 1 |
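The coefficients in this table are rank correlations. As a minimal sketch of how such a value can be computed, and of how strongly correlated predictor pairs might be flagged, the Python below computes Spearman's ρ as the Pearson correlation of average ranks. The function names and the 0.9 threshold are illustrative assumptions, not the authors' code or criterion.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def highly_correlated_pairs(columns, threshold=0.9):
    """List predictor pairs with |rho| >= threshold (threshold is assumed).

    columns: dict mapping a predictor name to its list of values.
    """
    flagged = []
    for a, b in combinations(columns, 2):
        rho = spearman_rho(columns[a], columns[b])
        if abs(rho) >= threshold:
            flagged.append((a, b, rho))
    return flagged
```

Because ρ depends only on ranks, any monotonic relationship between two predictors gives |ρ| = 1, whether or not it is linear.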
A brief summary of the RF modelling process for hard90 data using various FS methods and predictive variables.
1) models 1–25 based on the VI using 20 variables; 2) models 26–29 based on the AVI using 20 variables; 3) models 30–31 based on KIAVI using 20 variables; 4) models 32–43 based on the AVI using 41 variables; and 5) models 44–45 based on the Boruta and model 46 based on the RRF using 41 variables. Model.fit is the predictive accuracy (ccr) of each RF model on the training samples. The predictor corresponding to each number is listed in Table 1.
| Model | Modelling.process | Predictors | No.predictors | Model.fit |
|---|---|---|---|---|
| 1 | All 20 predictive variables | All 20 variables | 20 | 100 |
| 2 | model 1: -planar.curv | 1–5, 7–20 | 19 | 100 |
| 3 | model 2: -surface | 1–5, 7–9, 11–20 | 18 | 100 |
| 4 | model 3: -slope | 1–5, 7–8, 11–20 | 17 | 100 |
| 5 | model 4: -relief | 1–5, 7, 11–20 | 16 | 100 |
| 6 | model 5: -bathy.moran | 1–4, 7, 11–20 | 15 | 100 |
| 7 | model 6: -profile.curv | 1–4, 11–20 | 14 | 100 |
| 8 | model 7: -bs.moran | 1–4, 11–19 | 13 | 100 |
| 9 | model 8: -bathy | 1–3, 11–19 | 12 | 100 |
| 10 | model 9: -homogeneity | 1–3, 11–17, 19 | 11 | 100 |
| 11 | model 10: -variance | 1–3, 11–17 | 10 | 100 |
| 12 | model 11: -northing | 1, 3, 11–17 | 9 | 100 |
| 13 | model 12: -tpi | 1, 3, 12–17 | 8 | 100 |
| 14 | model 13: -bs13 | 1, 3, 13–17 | 7 | 100 |
| 15 | model 14: -bs21 | 1, 3, 14–17 | 6 | 100 |
| 16 | model 15: -easting | 3, 14–17 | 5 | 100 |
| 17 | model 16: -bs32 | 3, 14–15, 17 | 4 | 100 |
| 18 | model 17: -bs27 | 3, 14, 17 | 3 | 96.43 |
| 19 | model 17: -bs25 | 3, 15, 17 | 3 | 96.43 |
| 20 | model 18: -bs25 | 3, 17 | 2 | 96.43 |
| 21 | model 20: -prock | 17 | 1 | 100 |
| 22 | model 20: -bs35 | 3 | 1 | 91.43 |
| 23 | model 14: + variance | 1, 3, 13–17, 19 | 8 | 100 |
| 24 | model 23: + surface | 1, 3, 10, 13–17, 19 | 9 | 100 |
| 25 | model 24: + relief | 1, 3, 8, 10, 13–17, 19 | 10 | 100 |
| 26 | Six most important predictors | 1, 10, 16–19 | 6 | 100 |
| 27 | model 26: + prock and bs27 | 1, 3, 10, 15–19 | 8 | 100 |
| 28 | model 27: + planar.curv | 1, 3, 6, 10, 15–19 | 9 | 100 |
| 29 | model 27: -prock | 1, 10, 15–19 | 7 | 100 |
| 30 | Combine model 24 and 27 | 1, 3, 10, 13–19 | 10 | 100 |
| 31 | model 30:—bs21 | 1, 3, 10, 14–19 | 9 | 100 |
| 32 | The 13 most important predictors | 1, 10, 18, 19, bs28-bs36 | 13 | 100 |
| 33 | model 32: + bs10 | 1, 10, 18, 19, bs10, bs28-bs36 | 14 | 100 |
| 34 | model 33: + planar.curv | 1, 6, 10, 18, 19, bs10, bs28-bs36 | 15 | 100 |
| 35 | model 34: + northing | 1, 2, 6, 10, 18, 19, bs10, bs28-bs36 | 16 | 100 |
| 36 | model 35: + prock, bs.moran, bs27 | 1–3, 6, 10, 18–20, bs10, 15, bs28-bs36 | 19 | 100 |
| 37 | model 36: -bs27 | 1–3, 6, 10, 18–20, bs10, bs28-bs36 | 18 | 100 |
| 38 | model 37: -bs.moran | 1–3, 6, 10, 18, 19, bs10, bs28-bs36 | 17 | 100 |
| 39 | model 37: -prock | 1, 2, 6, 10, 19–20, bs10, bs28-bs36 | 17 | 100 |
| 40 | model 35: -planar.curv | 1, 2, 10, 18, 19, bs10, bs28-bs36 | 15 | 100 |
| 41 | model 40: -northing | 1, 10, 18–19, bs10, bs28-bs36 | 14 | 100 |
| 42 | model 41: -bs34 | 1, 2, 10, 17–19, bs10, bs28-bs33, bs36 | 14 | 100 |
| 43 | All 41 predictors | All 41 predictors | 41 | 100 |
| 44 | 30 predictors | 1–3, 7, 11, 20, bs12, bs14:bs36 | 30 | 100 |
| 45 | model 44: +bs13 | 1–3, 7, 11, 20, bs12:bs36 | 31 | 100 |
| 46 | 31 predictors | 1–3, 5, 7–11, 18–20, bs15, bs17-bs18, bs20-bs25, bs27-bs36 | 31 | 100 |
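The Predictors column refers to variables by the numbers given in the table of predictive variables. A minimal Python sketch of expanding such a code string into variable names follows; `expand_predictors` is a hypothetical helper written for illustration, and it handles only the numeric codes of the 20-variable set (entries such as `bs28-bs36` from the 41-variable set are outside its scope).

```python
# Variable names in the order of their numbers (1-20) in the predictor table.
VARIABLES = [
    "easting", "northing", "prock", "bathy", "bathy.moran",
    "planar.curv", "profile.curv", "relief", "slope", "surface",
    "tpi", "bs13", "bs21", "bs25", "bs27",
    "bs32", "bs35", "homogeneity", "variance", "bs.moran",
]


def expand_predictors(spec):
    """Expand a code string such as '1–5, 7–20' into a list of variable names.

    Accepts en-dashes or hyphens as range separators (hypothetical helper).
    """
    names = []
    for part in spec.replace("–", "-").split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(piece) for piece in part.split("-"))
        else:
            start = end = int(part)
        # Codes are 1-based and ranges are inclusive.
        names.extend(VARIABLES[start - 1:end])
    return names


model_2 = expand_predictors("1–5,7–20")   # model 1 without planar.curv
model_18 = expand_predictors("3, 14, 17")  # prock, bs25, bs35
```

Expanding the codes this way makes it easy to check a row against its No.predictors entry, since `len(model_2)` should equal the count listed for model 2.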
Fig 2. Correct classification rate (%) and kappa (mean: black line; minimum and maximum: dashed red lines) of 43 RF models with different predictor sets based on the averages over 100 iterations of 10-fold cross validation for seabed hardness based on hard90 data; and the model with the maximum mean ccr and mean kappa (circle).
a) models 1–25 based on the VI using 20 predictive variables; b) models 26–29 based on the AVI and models 30–31 based on KIAVI using 20 variables; c) models 32–43 based on the AVI using 41 variables.
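The accuracy measures in this figure are averages over repeated 10-fold cross validation. The Python sketch below illustrates only the resampling loop, with a trivial majority-class stand-in where an RF model would be fitted. `repeated_cv_ccr`, the stand-in classifier and the seed are assumptions for illustration, and the folds are not stratified, which may differ from the authors' procedure. The class counts (6, 14, 9 and 111) are the observed totals of the four hard90 classes.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, rng):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def majority_class(labels):
    """Stand-in 'model': always predicts the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]


def repeated_cv_ccr(labels, k=10, iterations=100, seed=0):
    """Return (mean, min, max) correct classification rate in percent."""
    rng = random.Random(seed)
    rates = []
    for _ in range(iterations):
        correct = 0
        for test_idx in k_fold_indices(len(labels), k, rng):
            held_out = set(test_idx)
            train = [lab for i, lab in enumerate(labels) if i not in held_out]
            prediction = majority_class(train)  # "fit" on the other k-1 folds
            correct += sum(labels[i] == prediction for i in test_idx)
        # One ccr value per iteration, pooled over all k held-out folds.
        rates.append(100.0 * correct / len(labels))
    return sum(rates) / len(rates), min(rates), max(rates)


labels = ["hard"] * 6 + ["hardsoft"] * 14 + ["softhard"] * 9 + ["soft"] * 111
mean_ccr, min_ccr, max_ccr = repeated_cv_ccr(labels)
```

With this stand-in every held-out sample is predicted as soft, so the rate equals the share of soft samples (111 of 140). That gives a no-information baseline against which the RF accuracies can be read.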
Confusion matrix between the observed and predicted values of four hardness classes, averaged over 100 iterations of 10-fold cross validation using the most accurate predictive model (i.e., model 40) for hard90.
| Predicted (rows) / Observed (columns) | Hard | Hardsoft | Softhard | Soft | Total | User's accuracy (%) |
|---|---|---|---|---|---|---|
| Hard | 4 | 1 | 0.42 | 0 | 5.42 | 73.80 |
| Hardsoft | 0 | 6.84 | 0.86 | 1.89 | 9.59 | 71.32 |
| Softhard | 0 | 0 | 5.87 | 0.13 | 6 | 97.83 |
| Soft | 2 | 6.16 | 1.85 | 108.98 | 118.99 | 91.59 |
| Total | 6 | 14 | 9 | 111 | 140 | |
| Producer's accuracy (%) | 66.67 | 48.86 | 65.22 | 98.18 | 89.78 | |
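The accuracy figures in this table follow directly from the cell counts. As a minimal sketch, the Python below recomputes user's accuracy, producer's accuracy, the correct classification rate and Cohen's kappa from the averaged matrix. The function name is an assumption, and a kappa computed from an averaged matrix need not equal the mean of the per-iteration kappa values reported elsewhere.

```python
def accuracy_measures(matrix):
    """Summarise a square confusion matrix.

    matrix[i][j] is the count predicted as class i and observed as class j.
    """
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    predicted_totals = [sum(row) for row in matrix]
    observed_totals = [sum(matrix[i][j] for i in range(size)) for j in range(size)]
    diagonal = [matrix[i][i] for i in range(size)]

    # User's accuracy: share of each predicted class that is correct.
    users = [100.0 * d / t for d, t in zip(diagonal, predicted_totals)]
    # Producer's accuracy: share of each observed class that is recovered.
    producers = [100.0 * d / t for d, t in zip(diagonal, observed_totals)]

    agreement = sum(diagonal) / total
    # Agreement expected by chance from the marginal totals.
    chance = sum(p * o for p, o in zip(predicted_totals, observed_totals)) / total ** 2
    kappa = (agreement - chance) / (1.0 - chance)
    return {"users": users, "producers": producers,
            "ccr": 100.0 * agreement, "kappa": kappa}


# Averaged hard90 matrix: rows are predicted, columns are observed
# (hard, hardsoft, softhard, soft).
HARD90 = [
    [4, 1, 0.42, 0],
    [0, 6.84, 0.86, 1.89],
    [0, 0, 5.87, 0.13],
    [2, 6.16, 1.85, 108.98],
]
result = accuracy_measures(HARD90)
```

The producer's accuracies make the class imbalance visible: the soft class is recovered almost completely, while fewer than half of the observed hardsoft samples are.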
Confusion matrix between the observed and predicted values of two hardness classes, averaged over 100 iterations of 10-fold cross validation using the most accurate predictive model (i.e., model 40) for hard90.
| Predicted (rows) / Observed (columns) | Hard | Soft | Total | User's accuracy (%) |
|---|---|---|---|---|
| Hard | 18.99 | 2.02 | 21.01 | 90.39 |
| Soft | 10.01 | 108.98 | 118.99 | 91.59 |
| Total | 29 | 111 | 140 | |
| Producer's accuracy (%) | 65.48 | 98.18 | 91.41 | |
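The two-class counts are consistent with merging the hard, hardsoft and softhard classes of the four-class hard90 matrix into a single hard class. A short Python sketch of that aggregation is shown below; `collapse` is a hypothetical helper, and the grouping is inferred from the matching sums rather than stated in this text.

```python
def collapse(matrix, groups):
    """Sum a square confusion matrix into coarser classes.

    groups lists, for each new class, the indices of the original classes
    that it absorbs; the same grouping is applied to rows and columns.
    """
    return [
        [sum(matrix[i][j] for i in row_group for j in col_group)
         for col_group in groups]
        for row_group in groups
    ]


# Averaged four-class hard90 matrix: rows are predicted, columns are observed
# (hard, hardsoft, softhard, soft).
HARD90 = [
    [4, 1, 0.42, 0],
    [0, 6.84, 0.86, 1.89],
    [0, 0, 5.87, 0.13],
    [2, 6.16, 1.85, 108.98],
]

# New class 0 = hard + hardsoft + softhard; new class 1 = soft.
two_class = collapse(HARD90, [[0, 1, 2], [3]])
```

Merging removes the confusion among the three harder classes from the error count, which is why the two-class correct classification rate is higher than the four-class one.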
A brief summary of the RF modelling process for hard70 data using various FS methods and predictive variables.
1) models 1–25 based on the AVI using 20 variables; 2) models 26–38 based on the AVI using 41 variables; 3) models 39–49 based on KIAVI using 41 variables; and 4) models 50–52 based on the Boruta with the maximal number of importance source runs of 2000, 100 and 5000, and model 53 based on the RRF using 41 variables. Model.fit is the predictive accuracy (ccr) of each RF model on the training samples. The predictor corresponding to each number is listed in Table 1.
| Model | Modelling.process | Predictors | No.predictors | Model.fit |
|---|---|---|---|---|
| 1 | All 20 predictive variables | All 20 variables | 20 | 100 |
| 2 | model 1: -relief | 1–7, 9–20 | 19 | 100 |
| 3 | model 2: -northing | 1, 3–7, 9–20 | 18 | 100 |
| 4 | model 3: -bs13 | 1, 3–7, 9–11, 13–20 | 17 | 100 |
| 5 | model 4: -bs27 | 1, 3–7, 9–11, 13–14, 16–20 | 16 | 100 |
| 6 | model 5: -slope | 1, 3, 4–7, 10–11, 13–14, 16–20 | 15 | 100 |
| 7 | model 6: -bs25 | 1, 3–7, 10–11, 13, 16–20 | 14 | 100 |
| 8 | model 7: -bs21 | 1, 3–7, 10–11, 16–20 | 13 | 100 |
| 9 | model 8: -bathy.moran | 1, 3–4, 6–7, 10–11, 16–20 | 12 | 100 |
| 10 | model 9: -surface | 1, 3–4, 6–7, 11, 16–20 | 11 | 100 |
| 11 | model 10: -bathy | 1, 3, 6–7, 11, 16–20 | 10 | 100 |
| 12 | model 11: -tpi | 1, 3, 6–7, 16–20 | 9 | 100 |
| 13 | model 12: -bs35 | 1, 3, 6–7, 16, 18–20 | 8 | 100 |
| 14 | model 13: -variance | 1, 3, 6–7, 16, 18, 20 | 7 | 100 |
| 15 | model 14: -bs.moran | 1, 3, 6–7, 16, 18 | 6 | 100 |
| 16 | model 15: -bs32 | 1, 3, 6–7, 18 | 5 | 100 |
| 17 | model 16: -easting | 3, 6–7, 18 | 4 | 100 |
| 18 | model 17: -profile.curv | 3, 6, 18 | 3 | 96.43 |
| 19 | model 18: -homogeneity | 3, 6 | 2 | 95 |
| 20 | model 19: -planar.curv | 3 | 1 | 91.43 |
| 21 | model 11: +bs21 | 1, 3, 6–7, 11, 13, 16–20 | 11 | 100 |
| 22 | model 21: +bathy.moran | 1, 3, 5–7, 11, 13, 16–20 | 12 | 100 |
| 23 | model 21: -profile.curv | 1, 3, 6, 11, 13, 16–20 | 10 | 100 |
| 24 | model 21: -bs.moran | 1, 3, 6–7, 11, 13, 16–19 | 10 | 100 |
| 25 | model 24: -variance | 1, 3, 6, 7, 11, 18, 13, 16, 17 | 9 | 100 |
| 26 | All 41 predictors | 1–11, 18–20, bs10-bs36 | 41 | 100 |
| 27 | model 26: -bs25, bs10:bs14 | 1–11, 18–20, bs15-bs36 | 35 | 100 |
| 28 | model 27: -bathy, relief, bs24, bs26, bs27 | 1–3, 5–7, 9–11, 18–20, bs15-bs23, bs28-bs36 | 30 | 100 |
| 29 | model 28: -bs20, bs29, bs30, bs36 | 1–3, 5–7, 9–11, 18–20, bs15-bs19, bs21-bs23, bs28, bs31-bs35 | 26 | 100 |
| 30 | model 29: -northing and bs28 | 1, 3, 5–7, 9–11, 18–20, bs15-bs19, bs21-bs23, bs31-bs35 | 24 | 100 |
| 31 | model 30: -slope and bs34 | 1, 3, 5–7, 10, 11, 18–20, bs15-bs19, bs21-bs23, bs31-bs33, 17 | 22 | 100 |
| 32 | model 31: -bs15, bs16, bs19, bs21, bs32, bs33 and bs35 | 1, 3, 5–7, 10, 11, 18–20, bs17, bs18, bs22, bs23, bs31 | 15 | 100 |
| 33 | model 32: -bathy.moran, surface, bs17 and bs22 | 1, 3, 6–7, 11, 18–20, bs18, bs23, bs31 | 11 | 100 |
| 34 | model 33: -homogeneity | 1, 3, 6–7, 11, 19–20, bs18, bs23, bs31 | 10 | 100 |
| 35 | model 34: -tpi | 1, 3, 6–7, 19–20, bs18, bs23, bs31 | 9 | 100 |
| 36 | model 35: -bs23 | 1, 3, 6–7, 19–20, bs18, bs31 | 8 | 100 |
| 37 | model 36: -bs18 | 1, 3, 6–7, 19–20, bs31 | 7 | 100 |
| 38 | model 37: -bs.moran | 1, 3, 6–7, 19, bs31 | 6 | 100 |
| 39 | Variables for model 24, 33, and model 40 for hard90 | 1–3, 6–7, 10–11, 13, 18–20, bs10, bs18, bs23, bs28-bs36 | 23 | 100 |
| 40 | model 39: -bs10 | 1–3, 6–7, 10–11, 13, 18–20, bs18, bs23, bs28-bs36 | 22 | 100 |
| 41 | model 40: -bs28, bs30 and bs36 | 1–3, 6–7, 10–11, 13, 18–20, bs18, bs23, bs29, bs31-17 | 19 | 100 |
| 42 | model 41: -northing | 1, 3, 6–7, 10–11, 13, 18–20, bs18, bs23, bs29, bs31-17 | 18 | 100 |
| 43 | model 42: -bs35 | 1, 3, 6–7, 10–11, 13, 18–20, bs18, bs23, bs29, bs31-bs34 | 17 | 100 |
| 44 | model 43: -bs29 and bs34 | 1, 3, 6–7, 10–11, 13, 18–20, bs18, bs23, bs31-bs33 | 15 | 100 |
| 45 | model 44: -bs21 | 1, 3, 6–7, 10–11, 18–20, bs18, bs23, bs31-bs33 | 14 | 100 |
| 46 | model 45: -bs33 | 1, 3, 6–7, 10–11, 16, 18–20, bs18, bs23, bs31 | 13 | 100 |
| 47 | model 46: -surface | 1, 3, 6–7, 11, 16, 18–20, bs18, bs23, bs31 | 12 | 100 |
| 48 | model 47: -tpi | 1, 3, 6–7, 16, 18–20, bs18, bs23, bs31 | 11 | 100 |
| 49 | model 47: -tpi and bs31 | 1, 3, 6–7, 16, 18–20, bs18, bs23 | 10 | 100 |
| 50 | 27 predictors | 1, 3, 7, 11, bs12, bs13, bs16-bs36 | 27 | 100 |
| 51 | model 50: -easting, bs12 | 2–3, 7, 11, 20, bs13, bs16-bs36 | 25 | 100 |
| 52 | model 50: +bs10, bs14 | 1, 3, 7, 11, bs10, bs12-bs14, bs16-bs36 | 29 | 100 |
| 53 | 31 predictors | 1–3, 5, 7–11, 18–20, bs15, bs17-bs18, bs20-bs25, bs27-bs36 | 31 | 100 |
Fig 3. Correct classification rate (%) and kappa (mean: black line; minimum and maximum: dashed red lines) of 49 RF models with different predictor sets based on the averages over 100 iterations of 10-fold cross validation for seabed hardness based on hard70 data; and the model with the maximum mean ccr and mean kappa (circle).
a) models 1–25 based on the AVI using 20 predictive variables; b) models 26–38 based on the AVI using 41 variables; c) models 39–49 based on KIAVI using 41 variables.
Confusion matrix between the observed and predicted values of four hardness classes, averaged over 100 iterations of 10-fold cross validation using the most accurate predictive model (i.e., model 50) for hard70.
| Predicted (rows) / Observed (columns) | Hard | Hardsoft | Softhard | Soft | Total | User's accuracy (%) |
|---|---|---|---|---|---|---|
| Hard | 4 | 1.22 | 0.01 | 2.08 | 7.31 | 54.72 |
| Hardsoft | 1.02 | 4.22 | 0 | 1.24 | 6.48 | 65.12 |
| Softhard | 0 | 0.3 | 3 | 0.04 | 3.34 | 89.82 |
| Soft | 3.98 | 5.26 | 2.99 | 110.64 | 122.87 | 90.05 |
| Total | 9 | 11 | 6 | 114 | 140 | |
| Producer's accuracy (%) | 44.44 | 38.36 | 50.00 | 97.05 | 87.04 | |
Confusion matrix between the observed and predicted values of two hardness classes, averaged over 100 iterations of 10-fold cross validation using the most accurate predictive model (i.e., model 50) for hard70.
| Predicted (rows) / Observed (columns) | Hard | Soft | Total | User's accuracy (%) |
|---|---|---|---|---|
| Hard | 13.77 | 3.36 | 17.13 | 80.39 |
| Soft | 12.23 | 110.64 | 122.87 | 90.05 |
| Total | 26 | 114 | 140 | |
| Producer's accuracy (%) | 52.96 | 97.05 | 88.86 | |
Comparison of the accuracy of the full models (i.e. model 43 for hard90 and model 26 for hard70) with the most accurate models based on various FS methods.
Differences in these comparisons were assessed with Mann-Whitney tests (n = 100 for each model).
| Data | Model | FS method | ccr (%) | ccr comparison | ccr p-value | kappa | kappa comparison | kappa p-value |
|---|---|---|---|---|---|---|---|---|
| Hard90 | 43 | 41 variables | 84.97 | | | 0.4973 | | |
| | 1 | Filter (20 variables) | 84.62 | 1 vs. 43 | 0.0000 | 0.4852 | 1 vs. 43 | 0.0000 |
| | 24 | VI & filter | 88.53 | 24 vs. 1 | 0.0000 | 0.6449 | 24 vs. 1 | 0.0000 |
| | 27 | AVI & filter | 88.51 | 27 vs. 1 | 0.0000 | 0.6305 | 27 vs. 1 | 0.0000 |
| | 30 | KIAVI | 88.11 | 30 vs. 1 | 0.0000 | 0.6278 | 30 vs. 1 | 0.0000 |
| | 40 | AVI | 89.78 | 40 vs. 43 | 0.0000 | 0.6753 | 40 vs. 43 | 0.0000 |
| | 45 | Boruta | 85.83 | 44 vs. 43 | 0.0000 | 0.5301 | 44 vs. 43 | 0.0000 |
| | 46 | GRF | 85.28 | 46 vs. 43 | 0.0010 | 0.5078 | 46 vs. 43 | 0.0013 |
| Hard70 | 26 | 41 variables | 84.71 | | | 0.4421 | | |
| | 1 | Filter | 84.09 | 1 vs. 26 | 0.0000 | 0.4116 | 1 vs. 26 | 0.0000 |
| | 24 | AVI & Filter | 86.27 | 24 vs. 1 | 0.0000 | 0.4905 | 24 vs. 1 | 0.0000 |
| | 33 | AVI | 86.52 | 33 vs. 26 | 0.0000 | 0.4976 | 33 vs. 26 | 0.0000 |
| | 47 | KIAVI | 86.36 | 47 vs. 26 | 0.0000 | 0.4927 | 47 vs. 26 | 0.0000 |
| | 50 | Boruta | 87.04 | 50 vs. 26 | 0.0000 | 0.5328 | 50 vs. 26 | 0.0000 |
| | 53 | GRF | 85.14 | 53 vs. 26 | 0.0002 | 0.4533 | 53 vs. 26 | 0.0028 |
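Each comparison in this table tests two sets of 100 cross-validation accuracies with a Mann-Whitney test. The Python below is a minimal sketch of a two-sided version using the normal approximation with a tie correction and no continuity correction. The function name and the sample values are illustrative assumptions, not the study's data, and the authors' software may use different options.

```python
from math import erfc, sqrt


def mann_whitney(x, y):
    """Return (U, p) for a two-sided Mann-Whitney test.

    U counts the pairs with x > y, with ties counted as one half.
    p uses the normal approximation with tie correction and is undefined
    (ZeroDivisionError) if every pooled value is identical.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y],
                    key=lambda item: item[0])

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        run = j - i + 1
        tie_term += run ** 3 - run
        rank_sum_x += avg_rank * sum(1 for k in range(i, j + 1)
                                     if pooled[k][1] == 0)
        i = j + 1

    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / sqrt(var_u)
    p = erfc(abs(z) / sqrt(2))  # two-sided tail area of the standard normal
    return u, p


# Made-up accuracies for two hypothetical models (not taken from the study).
model_a = [84.3, 85.0, 84.6, 85.7, 84.9, 85.2]
model_b = [88.6, 89.3, 88.1, 89.7, 88.8, 89.1]
u_stat, p_value = mann_whitney(model_a, model_b)
```

In practice a library routine such as `scipy.stats.mannwhitneyu` could be used instead. Entries of 0.0000 in the table are consistent with p-values below 0.00005 shown to four decimal places.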
Fig 4. Correct classification rate (%) (a) and kappa (b) of the most accurate models based on the averages over 100 iterations of 10-fold cross validation for hard90 and hard70 data.
Comparison of the accuracy of the most accurate models (i.e. model 40 for hard90 and model 50 for hard70) with the most accurate models based on various FS techniques, and of model 40 with model 50.
Differences in these comparisons were assessed with Mann-Whitney tests (n = 100 for each model).
| FS method | Models developed | Model | ccr | | | | | kappa | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Hard90 | | Model | 24 | 27 | 30 | 45 | 46 | 24 | 27 | 30 | 45 | 46 |
| VI & filter | 25 | 24 | | | | | | | | | | |
| AVI & filter | 4 | 27 | 0.8881 | | | | | 0.0010 | | | | |
| KIAVI | 2+25+4 | 30 | 0.0003 | 0.0006 | | | | 0.0000 | 0.4450 | | | |
| Boruta | 2 | 45 | 0.0000 | 0.0000 | 0.0000 | | | 0.0000 | 0.0000 | 0.0000 | | |
| GRF | 1 | 46 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | | 0.0000 | 0.0000 | 0.0000 | 0.0000 | |
| AVI | 11 | 40 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| Hard70 | | Model | 24 | 33 | 47 | 53 | | 24 | 33 | 47 | 53 | |
| AVI & filter | 25 | 24 | | | | | | | | | | |
| AVI | 13 | 33 | 0.1079 | | | | | 0.3001 | | | | |
| KIAVI | 11+25+13 | 47 | 0.6253 | 0.1820 | | | | 0.9357 | 0.2328 | | | |
| GRF | 1 | 53 | 0.0000 | 0.0000 | 0.0000 | | | 0.0000 | 0.0000 | 0.0000 | | |
| Boruta | 3 | 50 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | | 0.0000 | 0.0000 | 0.0000 | 0.0000 | |
| Hard90/Hard70 | | Model | 40 | | | | | 40 | | | | |
| | | 50 | 0.0000 | | | | | 0.0000 | | | | |
Confusion matrix between predictions for individual classes based on hard90 and hard70 data for all study areas and for a portion of area A (A1).
| | | Hard70: Hard | Hard70: Hard-soft | Hard70: Soft-hard | Hard70: Soft | Total | Correctly matched by hard70 (%) |
|---|---|---|---|---|---|---|---|
| All four areas (17,418,262 grid cells) | | | | | | | |
| Hard90 | Hard | 0.85 | 0.23 | 0.00 | 0.09 | 1.17 | 72.60 |
| | Hard-soft | 1.26 | 3.68 | 0.10 | 2.06 | 7.12 | 51.77 |
| | Soft-hard | 0.04 | 0.13 | 2.65 | 3.12 | 5.95 | 44.60 |
| | Soft | 0.41 | 0.06 | 0.17 | 85.12 | 85.77 | 99.25 |
| Total | | 2.57 | 4.10 | 2.93 | 90.41 | 100.00 | |
| Correctly matched by hard90 (%) | | 33.15 | 89.91 | 90.56 | 94.16 | 92.31 | |
| Area A1 (3,083,153 grid cells) | | | | | | | |
| Hard90 | Hard | 2.01 | 0.12 | 0 | 0.46 | 2.59 | 77.78 |
| | Hard-soft | 5.24 | 14.82 | 0.29 | 6.66 | 27.01 | 54.85 |
| | Soft-hard | 0 | 0.14 | 2.45 | 7.17 | 9.76 | 25.09 |
| | Soft | 1.62 | 0.01 | 0.21 | 58.8 | 60.64 | 96.96 |
| Total | | 8.87 | 15.09 | 2.95 | 73.09 | 100 | |
| Correctly matched by hard90 (%) | | 22.66 | 98.22 | 82.90 | 80.45 | 78.07 | |
Fig 5. Spatial predictions of seabed hardness for a section of area A (A1): a) hard90, b) hard70, c) hardness with two classes, and d) geomorphic features.