Yiwei Cheng, Ved N Bhoot, Karl Kumbier, Marilou P Sison-Mangus, James B Brown, Raphael Kudela, Michelle E Newcomer.
Abstract
Increasing occurrence of harmful algal blooms across the land-water interface poses significant risks to coastal ecosystem structure and human health. Defining significant drivers and their interactive impacts on blooms allows for more effective analysis and identification of specific conditions supporting phytoplankton growth. A novel iterative Random Forests (iRF) machine-learning model was developed and applied to two example cases along the California coast to identify key stable interactions: (1) phytoplankton abundance in response to various drivers due to coastal conditions and land-sea nutrient fluxes, and (2) microbial community structure during algal blooms. In Example 1, watershed-derived nutrients were identified as the least significant interacting variable associated with Monterey Bay phytoplankton abundance. In Example 2, through iRF analysis of field-based 16S OTU bacterial community and algae datasets, we independently found stable interactions of prokaryote abundance patterns associated with phytoplankton abundance that have been previously identified in laboratory-based studies. Our study represents the first iRF application to marine algal blooms that helps to identify ocean, microbial, and terrestrial conditions that are considered dominant causal factors in bloom dynamics.
Year: 2021 PMID: 34620921 PMCID: PMC8497483 DOI: 10.1038/s41598-021-98110-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Nash–Sutcliffe Efficiencies (NSE) of the iterative random forest models when tested against training and testing data. (A) iRF NSE results for the Santa Cruz Wharf (SCW) only dataset, and (B) iRF NSE results for the SCW + inland dataset.
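The NSE reported in the figures and table can be illustrated with a short calculation. The sketch below uses the standard definition, NSE = 1 − Σ(observed − predicted)² / Σ(observed − mean of observed)², and is not taken from the authors' code; the function name and the sample values are hypothetical. An NSE of 1 indicates a perfect fit, 0 indicates a model no better than predicting the observed mean, and negative values indicate a model worse than the mean.

```python
from statistics import fmean


def nash_sutcliffe_efficiency(observed, predicted):
    """Return 1 - SSE / (sum of squared deviations of observations from their mean)."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and of equal length")
    obs_mean = fmean(observed)
    # Squared error between each observation and its prediction.
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Squared deviation of each observation from the observed mean.
    total = sum((o - obs_mean) ** 2 for o in observed)
    if total == 0:
        raise ValueError("NSE is undefined when all observations are identical")
    return 1.0 - sse / total


# Hypothetical values, for illustration only.
observed = [1.0, 2.0, 3.0]
print(nash_sutcliffe_efficiency(observed, [1.0, 2.0, 3.0]))  # perfect fit
print(nash_sutcliffe_efficiency(observed, [2.0, 2.0, 2.0]))  # predicting the mean
print(nash_sutcliffe_efficiency(observed, [3.0, 2.0, 1.0]))  # worse than the mean
```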
Figure 2. (A and B) Gini importance of explanatory variables for iterative random forest models utilizing (left) Santa Cruz Wharf (SCW) data and (right) SCW + inland data. (C and D) The 8 most stable interactions recovered by iRF with the highest NSE utilizing (left) Santa Cruz Wharf (SCW) data and (right) SCW + inland data. Black triangles represent stability values while blue dots represent precision values.
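The record does not define the stability values plotted here. As a rough illustration, the sketch below assumes the definition commonly used in the iRF literature: the fraction of bootstrap replicates in which a given interaction is recovered. The function name and the replicate data are hypothetical, and the precision values are not modeled.

```python
from collections import Counter


def stability_scores(recovered_per_bootstrap):
    """Map each interaction to the fraction of bootstrap replicates that recovered it.

    Each element of `recovered_per_bootstrap` is the collection of interactions
    (frozensets of signed feature names) recovered in one bootstrap replicate.
    """
    n_replicates = len(recovered_per_bootstrap)
    if n_replicates == 0:
        raise ValueError("at least one bootstrap replicate is required")
    counts = Counter()
    for interactions in recovered_per_bootstrap:
        # Count each interaction at most once per replicate.
        counts.update(set(interactions))
    return {interaction: count / n_replicates for interaction, count in counts.items()}


# Hypothetical replicates, for illustration only (not data from the study).
da_si = frozenset({"DA+", "Si-"})
n_si = frozenset({"N-", "Si-"})
replicates = [{da_si, n_si}, {da_si}, {da_si, n_si}, set()]
scores = stability_scores(replicates)
print(scores[da_si], scores[n_si])
```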
Figure 3. Nash–Sutcliffe Efficiencies (NSE) of the iterative random forest models when tested against (a) training and (b) testing data subsets of microbial OTUs that are part of the marine microbiome dataset collected from Santa Cruz Wharf. X-axes are the microbial strains and the y-axes are the NSE values.
Figure 4. Feature importance of explanatory variables for each bacterial OTU. X-axes are the explanatory variables. The explanatory variables are categorized and color coded: red – chemical, grey – physical, blue – biological. Abiotic (environmental) ocean measures consisted of ammonium (NH4, µM), silicic acid (Si, µM), nitrate (N, µM), phosphate (P, µM), temperature (WTMP, °C), and domoic acid (DA, mg/L). Biotic measures include Alexandrium spp. (Alx. spp., cells/L), Pseudo-nitzschia in the size range of the functional group seriata (Ps-nt. Seri., cells/L), and chlorophyll-a (Chl-a, mg/m³) as a proxy.
The 5 most stable interactions recovered by iRF with the highest NSE during prediction for each OTU.
| Octadecabacter 1 | NSE: 0.266 | Octadecabacter 2 | NSE: 0.440 | Marine group II | NSE: −0.040 |
|---|---|---|---|---|---|
| Interaction | Precision | Interaction | Precision | Interaction | Precision |
| DA+_Ps-nt Seri− | 0.787 | N−_Si− | 0.796 | Chl-a−_Ps-nt Seri+ | 0.852 |
| DA+_Si− | 0.754 | Chl-a+_Si− | 0.731 | Chl-a−_WTMP− | 0.800 |
| Ps-nt Seri−_Si− | 0.586 | Si−_WTMP+ | 0.727 | Chl-a−_DA+ | 0.770 |
| DA−_Si− | 0.523 | Chl-a+_N− | 0.634 | Chl-a−_DA− | 0.532 |
| DA+_Ps-nt Seri+ | 0.690 | N−_WTMP+ | 0.625 | Chl-a−_Ps-nt Seri− | 0.531 |
The direction of change is indicated by the + or the − sign.
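The interaction labels in the table follow a compact convention: features are joined by an underscore, and each feature ends in a sign giving its direction of change. A minimal sketch of how such labels could be split for downstream analysis follows; the function name and the +1/−1 encoding are illustrative assumptions, not part of the original study.

```python
# Accept either an ASCII hyphen or a Unicode minus sign as the "decrease" marker.
MINUS_SIGNS = {"-", "\u2212"}


def parse_signed_interaction(label):
    """Split a label such as 'DA+_Si-' into (feature, direction) pairs.

    Direction is +1 for a trailing '+' and -1 for a trailing minus sign.
    Hyphens inside a feature name (e.g. 'Ps-nt Seri') are left untouched
    because only the final character of each term is read as the sign.
    """
    parsed = []
    for term in label.split("_"):
        term = term.strip()
        if len(term) < 2:
            raise ValueError(f"malformed term in interaction label: {term!r}")
        name, sign = term[:-1], term[-1]
        if sign == "+":
            direction = 1
        elif sign in MINUS_SIGNS:
            direction = -1
        else:
            raise ValueError(f"term has no trailing sign: {term!r}")
        parsed.append((name, direction))
    return parsed


print(parse_signed_interaction("DA+_Ps-nt Seri\u2212"))
print(parse_signed_interaction("N-_WTMP+"))
```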