Diana T White, Thibaud M Antoniou, Jonathan M Martin, William Kmetz, Michael R Twiss.
Abstract
Myriophyllum spicatum, more commonly known as Eurasian watermilfoil (EWM), is one of the most invasive aquatic plants in North America, causing negative ecological and economic impacts in ecosystems where it proliferates. Many control strategies have been developed and implemented to mitigate EWM growth and spread, but the results are mixed and there is no consensus on lake-specific strategies. Here, we describe the development of a predictive model, based on a support vector technique, that predicts the success of biological pest control using Euhrychiopsis lecontei (the milfoil weevil), a milfoil specialist, to reduce EWM in lakes. The model is informed by lake characteristics (limnological and landscape) and augmentation strategies. To develop it, we performed a metadata analysis of 133 peer-reviewed publications and professional reports of milfoil weevil augmentation field experiments that contained information on lake characteristics. The model's algorithm uses a support vector machine (SVM) to learn patterns among lake characteristics, the recorded augmentation strategy, and the reported success of each study, where success is a measure of EWM change over a season and is recorded in a variety of ways (e.g., EWM biomass change, EWM percent change, EWM visual change). Overall, the model results suggest that shallower lakes, more frequent weevil augmentations, and larger weevil overwintering habitat are the most important predictors of EWM reduction success by weevil augmentation. Although milfoil weevil augmentation is a promising mitigation strategy, it may not work for all lakes. Our model is therefore a valuable tool for lake stakeholders and resource managers, who can use it to determine whether milfoil weevil augmentation, which can be very costly because of the difficulty of finding and raising milfoil weevils, will be a useful and sustainable approach to controlling EWM in their lake community.
Keywords: Eurasian watermilfoil; biocontrol; machine learning; milfoil weevils; predictive modeling; support vector machine
Year: 2022 PMID: 35397182 PMCID: PMC9539498 DOI: 10.1002/eap.2625
Source DB: PubMed Journal: Ecol Appl ISSN: 1051-0761 Impact factor: 6.105
Table of model features and targets for 13/133 studies (full table provided in White, 2022)
| Study site | Latitude | Area (ha) | Maximum depth (m) | Buffer zone (km) | Phosphorus (μg/L) | Secchi depth (m) | Treatment frequency | Augmentation (average no. weevils) | Biological success (1 = success, 0 = failure) |
|---|---|---|---|---|---|---|---|---|---|
| Big Sand Lake (Jester et al., | 46.06 | 563.2 | 19.8 | 11.3 | 19.0 | 2.9 | 1 | N/A | 1 |
| Eagle Lake (Jester et al., | 42.70 | 208.0 | 3.6 | 9.3 | 19.0 | 2.4 | 1 | N/A | 1 |
| Lower Spring Lake (Jester et al., | 42.88 | 41.6 | 3.3 | 2.2 | 44.3 | 1.3 | 1 | N/A | 1 |
| Whitewater Lake (Jester et al., | 42.76 | 256.0 | 11.6 | 5.8 | 15.0 | 1.3 | 1 | N/A | 0 |
| Nancy Lake (Jester et al., | 46.09 | 309.8 | 11.9 | 14.2 | 14.6 | 4.2 | 1 | N/A | 0 |
| Pearl Lake (Jester et al., | 44.09 | 36.8 | 15.2 | 1.3 | 12.3 | 5.8 | 1 | N/A | 0 |
| Beaver Dam Lake (Jester et al., | 45.55 | 444.8 | 32.3 | 13.9 | 9.9 | 4.0 | 1 | N/A | 0 |
| Cedar Lake (Ward & Newman, | 44.96 | 68.0 | 16.0 | 2.7 | 25.0a | 2.8, 2.8, 2.5, 2.5 | 1, 1, 2, 2 | N/A, N/A, N/A, N/A | 1, 0, 0, 0 |
| Little Bearskin (Havel et al., | 45.71 | 74.0 | 8.1 | 3.7 | 20.8, 66.4, 33.1, 20.8, 66.4, 33.1 | 1.8 | 1 | 2157, 0, 0, 2719, 0, 0 | 1, 0, 0, 1, 1 |
| Manson Lake (Havel et al., | 45.56 | 96.0 | 16.2 | 4.5 | 11.0, 12.6, 14.7, 11.0, 12.6, 14.7 | 4.8 | 1 | 3013, 0, 0, 2912, 0, 0 | 0, 0, 0, 0, 0, 1 |
Note: Metadata analysis for 13 of the 133 studies; the full table of the metadata analysis is provided in White, 2022. Shown are the model target (biological success: 0 = failure, 1 = success) and the eight most highly ranked model features: lake location (latitude), area, maximum depth, buffer zone (a measure of shoreline suitable for weevil overwintering), phosphorus (P), Secchi depth, augmentation strategy (number of weevils added), and treatment frequency.
a: Corresponds to an estimated value. Where N/A is reported, the average weevil number (5036) was used.
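The substitution described in the table footnote (replacing N/A weevil counts with an average) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `fill_missing` and the sample list are hypothetical, and only the fallback value 5036 comes from the footnote.

```python
from statistics import mean


def fill_missing(values, fallback=None):
    """Replace None entries with `fallback`, or with the mean of the known values."""
    known = [v for v in values if v is not None]
    if fallback is None:
        if not known:
            raise ValueError("no known values to average")
        fallback = mean(known)
    return [fallback if v is None else v for v in values]


# Illustrative weevil counts; None stands for an N/A entry in the table.
weevils = [2157, None, 3013, None]

# Use the average reported in the footnote (5036) for the missing entries.
filled = fill_missing(weevils, fallback=5036)
print(filled)
```

When no fallback is supplied, the function averages whatever values are present, which is one common way such an average could be obtained.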
FIGURE 1 A schematic of the model's use in future biocontrol programs. Once the model has been trained, tested, and validated, we can input feature sets, including a variety of augmentation strategies, to determine the efficacy of weevils in reducing Eurasian watermilfoil (with some probability of success).
FIGURE 2 An example of a two‐feature linear support vector machine (SVM), showing the hyperplane (a line) that best separates success from failure. Left image: C = 1; right image: C = 1000. The dashed lines mark the margin boundaries, which pass through the support vectors. The SVM with C = 1 has a larger margin of separation than the SVM with C = 1000. Data is scaled between −1 and 1.
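The role of C in the caption above can be read off the standard soft-margin SVM formulation, written here for labels y_i in {−1, +1} (the 0/1 targets would be mapped to these). The source does not print this formula, so the symbols w, b, and ξ_i are introduced here only for illustration, and the min-max scaling in the last line is an assumption about how the data were scaled to [−1, 1].

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i \\
\text{subject to}\quad & y_i\,(w^{\top}x_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\quad i = 1,\dots,n, \\[4pt]
\text{margin width}\quad & \frac{2}{\lVert w\rVert}, \\[4pt]
\text{scaling (assumed)}\quad & x' = 2\,\frac{x - x_{\min}}{x_{\max} - x_{\min}} - 1 .
\end{aligned}
```

A small C makes margin violations cheap, so the optimizer favors a small norm of w and hence a wide margin; a large C penalizes violations heavily and typically narrows the margin, which matches the comparison of C = 1 and C = 1000 in the figure.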
FIGURE 3 A schematic of the data collection and filtering process (the metadata analysis) in connection with the machine learning algorithm. Data is added to a spreadsheet and filtered (lakes with missing data are removed). Once filtered, data is recorded as either an input (a model feature, such as a lake characteristic or augmentation information) or an output (a model target of success = 1 or failure = 0). Model features and targets are then passed to the machine learning algorithm (a support vector machine (SVM) classifier), where data is randomly split into training sets (three‐quarters of the data) and testing sets (one‐quarter of the data) many times. The model is then validated on a set of known targets on which it was not trained.
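The repeated random splitting described in the caption can be sketched in plain Python. This is a hedged illustration only: the function name `repeated_splits`, the sample size of 20, the number of repeats, and the seed are all hypothetical; the source states only the three-quarters/one-quarter proportions and that the split is repeated many times.

```python
import random


def repeated_splits(n_samples, n_repeats, train_fraction=0.75, seed=0):
    """Yield (train_indices, test_indices) pairs from repeated random shuffles."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_train = round(train_fraction * n_samples)
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]


# Hypothetical example: 20 feature sets, split five times.
splits = list(repeated_splits(n_samples=20, n_repeats=5))
for train, test in splits:
    print(len(train), len(test))
```

Each split would be used to fit the classifier on the training indices and score it on the testing indices, and the scores would then be summarized across repeats.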
The f1 scores for eight highest ranked features for all five target sets
| No. of features | All augments | All augments (adjusted) | EWM stem density change | EWM biomass change | EWM percent change | Relative abundance of EWM |
|---|---|---|---|---|---|---|
| 3 | 60.09 ± 0.19 | 65.64 ± 0.26 | 44.32 ± 0.36 | 65.47 ± 0.28 | 46.39 ± 0.25 | 54.86 ± 0.28 |
| 4 | 63.22 ± 0.17 | 65.39 ± 0.27 | 48.03 ± 0.40 | 68.22 ± 0.30 | 49.91 ± 0.28 | 56.06 ± 0.30 |
| 5 | 63.88 ± 0.18 | 65.15 ± 0.27 | 49.39 ± 0.40 | 69.11 ± 0.31 | 51.23 ± 0.28 | 55.65 ± 0.30 |
| 6 | 64.74 ± 0.17 | 65.07 ± 0.27 | 49.51 ± 0.40 | 69.19 ± 0.31 | 52.54 ± 0.28 | 56.33 ± 0.30 |
| 7 | 65.18 ± 0.17 | 65.05 ± 0.27 | 49.48 ± 0.40 | 69.12 ± 0.30 | 52.61 ± 0.27 | 56.89 ± 0.32 |
| 8 | 65.15 ± 0.17 | 64.99 ± 0.27 | 49.41 ± 0.40 | 69.07 ± 0.30 | 52.69 ± 0.26 | 57.16 ± 0.31 |
Note: f1 scores for eight highest ranked features for all five target sets. The top eight features are defined by our largest target set, which includes qualitative information on Eurasian watermilfoil (EWM) increase/decrease (any increase is classified as a failure, and any decrease is classified as a success). Overall, each target set showed a large drop in f1 score between three and four features. In addition, the f1 score only started to increase after the first few features were removed (results not shown).
Data removed from the adjusted “All augments” set are records with averaged phosphorus, Secchi depth, or number of weevils, which gives a total of n = 54 feature sets.
Ranking for eight highest ranked features for all six target sets
| Features (down) versus targets of success (across) | All augments | All augments (adjusted) | EWM stem density change | EWM biomass change | EWM percent change | Relative abundance of EWM |
|---|---|---|---|---|---|---|
| Latitude | 4.1 | 5.1 | 6.3 | | 1.5 | |
| Area (ha) | 1.7 | 6.1 | | 3.9 | 1.2 | 1.3 |
| Maximum depth (m) | | 2.4 | 2.3 | 4.9 | | |
| Buffer (km) | 2.0 | | | 1.5 | 2.7 | 3.5 |
| Phosphorus (μg/L) | 2.5 | 7.1 | 1.6 | 2.1 | 2.0 | 2.1 |
| Secchi depth (m) | 1.4 | 3.2 | 3.3 | 2.9 | 1.0 | 2.7 |
| Treatment frequency | | | 4.3 | | | 4.4 |
| Average no. weevils | 3.2 | 4.1 | 5.3 | 5.9 | 3.6 | 1.7 |
Note: Model features are ranked from most important (1) to least important. The two highest ranked features in each column (that is, in each target set) are shown in boldface type. Four of the six targets tested show augmentation treatment frequency (how many times weevils are added) to be the most important predictor of weevil success. In addition, three of the six targets show maximum lake depth to be an important predictor, and two of the six show buffer zone to be one of the most important predictors. The target we select as our primary target is “All augments”, shown in the first column, for which treatment frequency and buffer zone are the two most important features for predicting weevil success.
Data removed from the adjusted “All augments” set are records with averaged phosphorus, Secchi depth, or number of weevils, which gives a total of n = 54 feature sets.
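The non-integer rankings in the table suggest that ranks were averaged over repeated runs, although the source does not state how they were computed. Under that assumption, a minimal sketch of averaging per-run ranks and picking the top two features might look like this; the function `mean_ranks`, the feature keys, and the three runs are hypothetical.

```python
from statistics import mean


def mean_ranks(rankings):
    """Average each feature's rank (1 = most important) over several runs."""
    features = rankings[0].keys()
    return {f: mean(run[f] for run in rankings) for f in features}


# Hypothetical ranks from three runs (not data from the study).
runs = [
    {"treatment_frequency": 1, "buffer": 2, "latitude": 3},
    {"treatment_frequency": 2, "buffer": 1, "latitude": 3},
    {"treatment_frequency": 1, "buffer": 3, "latitude": 2},
]

avg = mean_ranks(runs)
# Lower mean rank means more important, so sort ascending and keep two.
top_two = sorted(avg, key=avg.get)[:2]
print(avg, top_two)
```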
FIGURE 4 Box plot of feature ranking. Ranking of features 1 through 8 across (2a) all target sets and (2b) target sets excluding the “All augments” set in which missing data is averaged. In each plot, the red line corresponds to the mean, the blue box to the total variation in the data, the black lines to the standard deviation, and the red crosses to outliers. In both cases, treatment frequency, lake area, and buffer zone are ranked in the top four of eight features (the features are lake location (latitude), area, maximum depth, buffer zone (a measure of shoreline suitable for weevil overwintering), phosphorus (P), Secchi depth, augmentation strategy (number of weevils added), and treatment frequency). However, in 2a, lake depth is also ranked in the top four, whereas in 2b latitude is ranked in the top four.
Model outputs for linear SVM for all model targets
| Targets of success (across) versus model validation parameters (down) | All augments | All augments (adjusted) | EWM stem density change | EWM biomass change | EWM percent change | Relative abundance of EWM |
|---|---|---|---|---|---|---|
| Accuracy | 66.48 ± 0.10 | 67.41 ± 0.14 | 57.74 ± 0.15 | 76.11 ± 0.12 | 58.21 ± 0.14 | 64.31 ± 0.18 |
| False negative | 24.72 ± 0.10 | 28.77 ± 0.09 | | 21.58 ± 0.09 | 34.37 ± 0.12 | 27.99 ± 0.10 |
| False positive | | | 37.84 ± 0.14 | | | |
| Precision | 72.62 ± 0.20 | 86.90 ± 0.46 | 57.84 ± 0.10 | 33.97 ± 0.91 | 60.27 ± 0.59 | 26.00 ± 0.78 |
| Recall | 47.42 ± 0.18 | 42.92 ± 0.21 | 92.19 ± 0.09 | 14.15 ± 0.43 | 24.76 ± 0.27 | 13.2 ± 0.39 |
| f1 score | 65.04 ± 0.10 | 64.90 ± 0.15 | 49.16 ± 0.23 | 69.61 ± 0.17 | 52.82 ± 0.18 | 57.69 ± 0.19 |
Note: Results for the linear SVM model. For each target tested, false positives are kept to a minimum in every case except stem density (the lower of the false negative and false positive values is shown in boldface type for each target). In addition, the “All augments” target and the adjusted “All augments” target (n = 54 data sets, with records missing phosphorus, Secchi depth, or weevil numbers removed) show similar trends in model accuracy, f1 score, and recall. However, the adjusted target shows a significant increase in model precision and a significant decrease in false positives (FP), which results in a more accurate model overall.
Data removed from the adjusted “All augments” set are records with averaged phosphorus, Secchi depth, or number of weevils, which gives a total of n = 54 feature sets.
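The validation parameters in the table above all derive from the four confusion-matrix counts. The sketch below uses the standard binary definitions; the source does not state which f1 variant it reports, and the tabulated f1 values may be class-averaged, so this should not be read as a reproduction of the table. The function name and the counts are made up for illustration.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 8 true positives, 2 false positives,
# 6 false negatives, 16 true negatives.
m = classification_metrics(tp=8, fp=2, fn=6, tn=16)
print(m)
```

Because a false positive here means recommending augmentation where it fails, precision is the metric most directly tied to the cost argument made for preferring the linear model.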
Model outputs for polynomial SVM for all model targets
| Targets of success (across) versus model validation parameters (down) | All augments | All augments (adjusted) | EWM stem density change | EWM biomass change | EWM percent change | Relative abundance of EWM |
|---|---|---|---|---|---|---|
| Accuracy | 64.95 ± 0.14 | 65.60 ± 0.20 | 57.20 ± 0.14 | 69.69 ± 0.24 | 63.36 ± 0.19 | 67.40 ± 0.13 |
| False negative | 13.54 ± 0.10 | 18.24 ± 0.14 | 4.26 ± 0.03 | 12.64 ± 0.15 | 17.74 ± 0.13 | 28.48 ± 0.09 |
| False positive | | | | | | 4.12 ± 0.10 |
| Precision | 61.04 ± 0.14 | 67.60 ± 0.26 | 57.43 ± 0.09 | 42.78 ± 0.53 | 59.87 ± 0.22 | 32.39 ± 0.89 |
| Recall | 71.17 ± 0.21 | 64.30 ± 0.29 | 92.41 ± 0.06 | 50.90 ± 0.66 | 61.12 ± 0.30 | 11.81 ± 0.34 |
| f1 score | 64.79 ± 0.15 | 65.25 ± 0.21 | 48.20 ± 0.22 | 70.22 ± 0.24 | 63.25 ± 0.19 | 59.58 ± 0.17 |
Note: Results for the polynomial SVM model (polynomial of order 3). In this model, the percentage of false positives is higher than for the linear model, so we retain our linear SVM, because one of the goals of our decision‐making process is to minimize false positives (i.e., it is more costly, in terms of time and money, to suggest weevil augmentation in a lake where it is unlikely to work than it is to not prescribe weevil augmentation in a lake where it could work).
Data removed from the adjusted “All augments” set are records with averaged phosphorus, Secchi depth, or number of weevils, which gives a total of n = 54 feature sets.
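For reference, a polynomial SVM of order 3 replaces the plain dot product with a polynomial kernel. The sketch below shows one common form of that kernel; the parameter names `gamma` and `coef0` and their default values are assumptions for illustration, since the source gives only the polynomial order.

```python
def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, z> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (gamma * dot + coef0) ** degree


# Two hypothetical scaled feature vectors.
k = polynomial_kernel([1.0, 2.0], [3.0, 4.0])
print(k)
```

With degree 1 and coef0 set to 0, the kernel reduces to a scaled dot product, which corresponds to the linear SVM that the note prefers.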