Ștefan-Mihai Petrea, Mioara Costache, Dragoș Cristea, Ștefan-Adrian Strungaru, Ira-Adeline Simionov, Alina Mogodan, Lacramioara Oprica, Victor Cristea.
Abstract
Metals are considered among the most hazardous substances because of their potential for accumulation, magnification, persistence, and wide distribution in water, sediments, and aquatic organisms. Demersal fish species, such as turbot (Psetta maxima maeotica), are accepted by the scientific community as suitable bioindicators of heavy metal pollution in the aquatic environment. The present study uses a machine learning approach, based on multiple linear and non-linear models, to estimate the concentrations of heavy metals in both turbot muscle and liver tissues. The multiple linear regression (MLR) models were built with the stepwise method, while the non-linear models were developed with the random forest (RF) algorithm. The models were based on data from the scientific literature covering 11 heavy metals (As, Ca, Cd, Cu, Fe, K, Mg, Mn, Na, Ni, Zn) in both muscle and liver tissues of turbot specimens. Significant MLR models were obtained for Ca, Fe, Mg, and Na in muscle tissue and for K, Cu, Zn, and Na in liver tissue. Non-linear tree-based RF prediction models with over 70% prediction accuracy were identified for As, Cd, Cu, K, Mg, and Zn in muscle tissue and for As, Ca, Cd, Mg, and Fe in liver tissue. Both the MLR models and the non-linear tree-based RF models were found suitable for predicting heavy metal concentrations in turbot muscle and liver tissues. The models can improve the knowledge base and the economic efficiency of linked studies on heavy metals in food safety and environmental pollution.
Keywords: heavy metals; machine learning; prediction models; random forest; turbot
Year: 2020 PMID: 33066472 PMCID: PMC7587397 DOI: 10.3390/molecules25204696
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Figure 1. Correlation matrices of heavy metal concentrations in turbot tissues: (a) first group samples; (b) second group samples; (c) third group samples; (d) fourth group samples.
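The figure image is not reproduced in this record. As a rough illustration of how such a correlation matrix can be computed, the sketch below uses Pearson's r, which is an assumption because this record does not say which coefficient the authors used. The helper names and the sample concentrations are invented for illustration and are not the study's data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {name_a: {name_b: r}} for every pair of named columns."""
    return {
        a: {b: pearson(xa, xb) for b, xb in columns.items()}
        for a, xa in columns.items()
    }


# Hypothetical concentrations (µg/g fresh weight) for five specimens.
# These numbers are invented for illustration and are NOT the study's data.
samples = {
    "Ca muscle": [80.0, 100.0, 150.0, 290.0, 430.0],
    "Na muscle": [850.0, 920.0, 1100.0, 1300.0, 1390.0],
    "Zn liver": [33.0, 30.5, 28.0, 26.5, 25.5],
}

matrix = correlation_matrix(samples)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Each panel of the figure would correspond to one such matrix computed on the variables available in that dataset group.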
RF model versions and regressors corresponding to the first group MLR models described in Section 2.2.1 (models 1–8b).
| RF Model | RF Regressor |
|---|---|
| RF MODEL: Ca muscle–Feature importance: 0.11 for Ca liver, 0.05 for Na liver, 0.02 for Mg liver, 0.01 for Ni liver and 0.01 for K liver; Model Accuracy: 90.27% (MAPE = 9.73%) | RandomForestRegressor(bootstrap = False, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 40, max_features = 4, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 2, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 30, n_jobs = None, oob_score = False, random_state = 256, verbose = 0, warm_start = False) |
| RF MODEL: Cu liver–Feature importance: 0.06 for Zn liver, 0.04 for Mg liver, 0.03 for Ni liver; Model Accuracy: 95.81% (MAPE = 4.19%) | RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 40, max_features = 4, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 3, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 110, n_jobs = None, oob_score = False, random_state = 58, verbose = 0, warm_start = False) |
| RF MODEL: Fe muscle–Feature importance: 0.06 for Na muscle, 0.05 for K liver, 0.04 for Mn liver, 0.04 for Mg muscle, 0.03 for Ni liver; Model Accuracy: 89.48% (MAPE = 10.52%) | RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 40, max_features = 2, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 3, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 40, n_jobs = None, oob_score = False, random_state = 80, verbose = 0, warm_start = False) |
| RF MODEL: K liver–Feature importance: 0.14 for Na liver, 0.11 for Ca liver, 0.11 for Ca muscle, 0.09 for Fe liver, 0.04 for Mg liver; Model Accuracy: 97.66% (MAPE = 2.34%) | RandomForestRegressor(bootstrap = False, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 40, max_features = 3, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 2, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 30, n_jobs = None, oob_score = False, random_state = 124, verbose = 0, warm_start = False) |
| RF MODEL: Mg muscle–Feature importance: 0.13 for Na muscle, 0.08 for Zn liver, 0.03 for Ni liver, 0.02 for Cu muscle, 0.02 for Fe muscle; Model Accuracy: 97.01% (MAPE = 2.99%) | RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 40, max_features = 4, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 3, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 60, n_jobs = None, oob_score = False, random_state = 297, verbose = 0, warm_start = False) |
| RF MODEL: Na liver–Feature importance: 0.16 for Ca liver, 0.09 for Fe muscle, 0.08 for Mg liver, 0.07 for K liver, 0.02 for Mg muscle; Model Accuracy: 98.48% (MAPE = 1.52%) | RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 40, max_features = 2, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 2, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 30, n_jobs = None, oob_score = False, random_state = 115, verbose = 0, warm_start = False) |
| RF MODEL: Na muscle–Feature importance: 0.08 for Fe muscle, 0.06 for Zn liver, 0.04 for Ca muscle, 0.03 for Mg muscle, 0.03 for Ca liver; Model Accuracy: 97.09% (MAPE = 2.91%) | RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 40, max_features = 4, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 2, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 50, n_jobs = None, oob_score = False, random_state = 273, verbose = 0, warm_start = False) |
| RF MODEL: Zn liver–Feature importance: 0.04 for Ca liver, 0.01 for Cd liver, 0.01 for Zn muscle, 0.01 for Mn muscle; Model Accuracy: 97.78% (MAPE = 2.22%) | RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 40, max_features = 4, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 2, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 40, n_jobs = None, oob_score = False, random_state = 299, verbose = 0, warm_start = False) |
Figure A1. Prediction of heavy metal concentrations in turbot liver and muscle tissues: actual vs. predicted values of the RF models corresponding to the first group MLR models described in Section 2.2.1 (models 1–8b). (a) Ca in muscle; (b) Cu in liver; (c) Fe in muscle; (d) K in liver; (e) Mg in muscle; (f) Na in liver; (g) Na in muscle; (h) Zn in liver.
The RF regressors.
| RF Model Regressor |
|---|
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 100, max_features = 4, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 1, min_samples_split = 3, min_weight_fraction_leaf = 0.0, n_estimators = 200, n_jobs = None, oob_score = False, random_state = 116, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 100, max_features = 4, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 3, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 200, n_jobs = None, oob_score = False, random_state = 278, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 100, max_features = 4, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 3, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 200, n_jobs = None, oob_score = False, random_state = 15, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 100, max_features = 2, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 1, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 100, n_jobs = None, oob_score = False, random_state = 237, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 40, max_features = 2, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 1, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 50, n_jobs = None, oob_score = False, random_state = 227, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 40, max_features = 2, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 1, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 90, n_jobs = None, oob_score = False, random_state = 214, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 100, max_features = 3, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 1, min_samples_split = 3, min_weight_fraction_leaf = 0.0, n_estimators = 100, n_jobs = None, oob_score = False, random_state = 43, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 100, max_features = 2, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 1, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 100, n_jobs = None, oob_score = False, random_state = 201, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 100, max_features = 4, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 2, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 100, n_jobs = None, oob_score = False, random_state = 93, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 100, max_features = 4, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 2, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 100, n_jobs = None, oob_score = False, random_state = 223, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 100, max_features = 3, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 2, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 100, n_jobs = None, oob_score = False, random_state = 192, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 100, max_features = 2, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 3, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 200, n_jobs = None, oob_score = False, random_state = 206, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = None, max_features = ‘auto’, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 1, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 100, n_jobs = None, oob_score = False, random_state = 173, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 40, max_features = 4, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 1, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 70, n_jobs = None, oob_score = False, random_state = 171, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 40, max_features = 3, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 2, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 100, n_jobs = None, oob_score = False, random_state = 29, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = False, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 40, max_features = 3, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 2, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 30, n_jobs = None, oob_score = False, random_state = 237, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = None, max_features = ‘auto’, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 1, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 100, n_jobs = None, oob_score = False, random_state = 227, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = None, max_features = ‘auto’, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 1, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 100, n_jobs = None, oob_score = False, random_state = 143, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = None, max_features = ‘auto’, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 1, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 100, n_jobs = None, oob_score = False, random_state = 45, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 40, max_features = 3, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 3, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 30, n_jobs = None, oob_score = False, random_state = 104, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 40, max_features = 3, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 2, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 110, n_jobs = None, oob_score = False, random_state = 223, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = None, max_features = ‘auto’, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 1, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 100, n_jobs = None, oob_score = False, random_state = 76, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 40, max_features = 2, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 3, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 60, n_jobs = None, oob_score = False, random_state = 192, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 40, max_features = 4, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 3, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 50, n_jobs = None, oob_score = False, random_state = 273, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 100, max_features = 2, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 1, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 100, n_jobs = None, oob_score = False, random_state = 93, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 100, max_features = 4, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 1, min_samples_split = 3, min_weight_fraction_leaf = 0.0, n_estimators = 90, n_jobs = None, oob_score = False, random_state = 11, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 80, max_features = 2, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 1, min_samples_split = 3, min_weight_fraction_leaf = 0.0, n_estimators = 90, n_jobs = None, oob_score = False, random_state = 74, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 50, max_features = 2, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 3, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 90, n_jobs = None, oob_score = False, random_state = 54, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = None, max_features = ‘auto’, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 1, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 100, n_jobs = None, oob_score = False, random_state = 102, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 40, max_features = 2, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 1, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 80, n_jobs = None, oob_score = False, random_state = 17, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 40, max_features = 3, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 3, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 40, n_jobs = None, oob_score = False, random_state = 284, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = False, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 40, max_features = 2, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 1, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 30, n_jobs = None, oob_score = False, random_state = 54, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = True, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 40, max_features = 2, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 1, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 100, n_jobs = None, oob_score = False, random_state = 63, verbose = 0, warm_start = False) |
| RandomForestRegressor(bootstrap = False, ccp_alpha = 0.0, criterion = ‘mse’, max_depth = 40, max_features = 3, max_leaf_nodes = None, max_samples = None, min_impurity_decrease = 0.0, min_impurity_split = None, min_samples_leaf = 1, min_samples_split = 2, min_weight_fraction_leaf = 0.0, n_estimators = 30, n_jobs = None, oob_score = False, random_state = 22, verbose = 0, warm_start = False) |
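The regressor strings above are printed in the form used by the scikit-learn release the authors worked with. To the best of my knowledge, later releases renamed `criterion='mse'` to `'squared_error'`, removed `min_impurity_split`, and removed `max_features='auto'` (which, for regressors, meant using all features). The sketch below translates a legacy parameter set under those assumptions. The helper name `modernize_rf_params` is hypothetical, and the mapping should be checked against the release notes of the installed version.

```python
def modernize_rf_params(params):
    """Translate legacy RandomForestRegressor keyword arguments.

    Assumed mapping (verify against your installed scikit-learn version):
      criterion='mse'      -> criterion='squared_error'
      min_impurity_split   -> dropped (parameter no longer exists)
      max_features='auto'  -> max_features=1.0 (all features, for regressors)
    """
    updated = dict(params)
    if updated.get("criterion") == "mse":
        updated["criterion"] = "squared_error"
    updated.pop("min_impurity_split", None)
    if updated.get("max_features") == "auto":
        updated["max_features"] = 1.0
    return updated


# Selected settings of the first regressor listed in the table above.
legacy = {
    "bootstrap": True,
    "criterion": "mse",
    "max_depth": 100,
    "max_features": 4,
    "min_impurity_split": None,
    "min_samples_leaf": 1,
    "min_samples_split": 3,
    "n_estimators": 200,
    "random_state": 116,
}

modern = modernize_rf_params(legacy)
print(modern)

# With scikit-learn installed, the model could then be rebuilt as:
#   from sklearn.ensemble import RandomForestRegressor
#   model = RandomForestRegressor(**modern)
```

Keeping `random_state` fixed, as the authors did for every model, makes a refit repeatable on the same library version, although results may still differ across scikit-learn releases.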
Figure A2. Prediction of heavy metal concentrations in turbot liver tissue: actual vs. predicted values for the first group non-linear tree-based RF prediction models described in Section 2.2.2. (a) As; (b) Cd; (c) Mn; (d) Fe; (e) Ca; (f) Mg.
Figure A3. Prediction of heavy metal concentrations in turbot muscle tissue: actual vs. predicted values for the first group non-linear tree-based RF prediction models described in Section 2.2.2. (a) K; (b) As; (c) Cd; (d) Cu; (e) Mn; (f) Zn.
Figure A4. Prediction of heavy metal concentrations in turbot muscle and liver tissues: actual vs. predicted values for the first group non-linear tree-based RF prediction models 21–30. (a) As in muscle; (b) Cd in muscle; (c) Cu in muscle; (d) K in muscle; (e) Mn in muscle; (f) Zn in muscle; (g) As in liver; (h) Cd in liver; (i) Ca in liver; (j) Fe in liver; (k) Mg in liver; (l) Mn in liver.
Figure A5. Prediction of heavy metal concentrations in turbot muscle tissue: actual vs. predicted values for the second dataset group (Section 2.2.3). (a) Zn; (b) Cd; (c) Fe; (d) Cu; (e) Ni.
Figure A6. Prediction of heavy metal concentrations in turbot muscle tissue: actual vs. predicted values for the fourth dataset group (Section 2.2.5). (a) Cd; (b) Fe; (c) Cu; (d) Zn; (e) Ni.
Feature importance of elements in all Random Forest (RF) models from the current study.
| Parameter | Weight 1 | Weight 2 | Weight 3 | Weight 4 | Weight 5 | Total | Total Per Element |
|---|---|---|---|---|---|---|---|
| Ca muscle | - | 1 | 2 | 1 | 2 | 6 | 14 |
| Ca liver | 3 | 2 | 1 | - | 2 | 8 | |
| K muscle | 2 | 2 | - | 1 | 1 | 6 | 12 |
| K liver | - | 2 | 3 | 1 | - | 6 | |
| Zn muscle | 1 | - | 1 | 1 | - | 3 | 13 |
| Zn liver | 1 | 2 | 4 | 1 | 2 | 10 | |
| Mg muscle | 2 | - | - | 2 | 1 | 5 | 10 |
| Mg liver | 1 | 1 | 1 | - | 2 | 5 | |
| Ni muscle | 2 | - | - | 1 | - | 3 | 8 |
| Ni liver | 1 | 2 | 1 | 2 | 6 | ||
| Fe muscle | 2 | 2 | 1 | - | 1 | 6 | 8 |
| Fe liver | - | 1 | - | 1 | - | 2 | |
| Na muscle | 2 | - | - | 3 | - | 5 | 8 |
| Na liver | 3 | - | - | - | - | 3 | |
| Cu muscle | - | 1 | - | 1 | 1 | 3 | 5 |
| Cu liver | - | 1 | - | - | 1 | 2 | |
| Mn muscle | - | - | - | 1 | 1 | 2 | 5 |
| Mn liver | - | - | 2 | 1 | - | 3 | |
| Cd muscle | - | - | - | 1 | 1 | 2 | 5 |
| Cd liver | 1 | 2 | - | - | - | 3 | |
| As muscle | - | - | 2 | 1 | - | 3 | 5 |
| As liver | 1 | 1 | - | - | - | 2 | |
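The Total and Total Per Element columns appear to be simple sums: Total adds the counts at Weight 1 to Weight 5 for one parameter, and Total Per Element adds the muscle and liver totals of the same element. The sketch below checks that reading on the first three elements of the table, treating "-" as zero. The function name is illustrative only.

```python
# Counts copied from the table above for Weight 1..5
# ("-" in the table is treated as 0).
rank_counts = {
    "Ca muscle": [0, 1, 2, 1, 2],
    "Ca liver": [3, 2, 1, 0, 2],
    "K muscle": [2, 2, 0, 1, 1],
    "K liver": [0, 2, 3, 1, 0],
    "Zn muscle": [1, 0, 1, 1, 0],
    "Zn liver": [1, 2, 4, 1, 2],
}


def totals_per_element(counts):
    """Sum the counts per parameter, then per element across both tissues."""
    per_parameter = {name: sum(values) for name, values in counts.items()}
    per_element = {}
    for name, total in per_parameter.items():
        element = name.split()[0]  # "Ca muscle" -> "Ca"
        per_element[element] = per_element.get(element, 0) + total
    return per_parameter, per_element


per_parameter, per_element = totals_per_element(rank_counts)
print(per_parameter)
print(per_element)  # {'Ca': 14, 'K': 12, 'Zn': 13}
```

The computed values match the table for these rows (for example, Ca muscle 6 plus Ca liver 8 gives 14).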
Figure 2. The sampling areas of the scientific studies used as the dataset for developing the analytical framework (literature sources that present only data recorded for aquaculture turbot specimens are bordered, while sources that present data for wild turbot specimens are not bordered).
Figure 3. A typical machine learning workflow (original figure).
Reasons for Random Forest use in the present study.
| No. | Characteristic | Authors |
|---|---|---|
| 1 | Predictive performance | [ |
| 2 | No overfitting | [ |
| 3 | Highly flexible | [ |
| 4 | Can capture non-linear dependencies | [ |
| 5 | Robust when noise is present | [ |
| 6 | Formalized predictor significance | [ |
| 7 | Fast | [ |
| 8 | Suitable for small datasets | [ |
| 9 | Efficient when interactions are present | [ |
| 10 | Small number of model parameters | [ |
| 11 | Stable | [ |
| 12 | Good for high dimensional data | [ |
| 13 | Applicable to various types of problems | [ |
| 14 | Straightforward to use | [ |
| 15 | Can handle highly correlated predictor variables | [ |
Figure A7. Python code excerpt implementing the random forest evaluation method.
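The code shown in the original figure is not available in this record. The RF tables report accuracy as the complement of the mean absolute percentage error (for example, 90.27% accuracy with MAPE = 9.73%), so an evaluation routine along the following lines would produce numbers in that form. This is a minimal sketch under that assumption, not the authors' code: the function names, the stand-in `ConstantModel` class, and the hold-out values are all hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def evaluate(model, features, actual):
    """Return (accuracy %, MAPE %) for any model exposing .predict()."""
    predicted = model.predict(features)
    error = mape(actual, predicted)
    return 100.0 - error, error


class ConstantModel:
    """Stand-in for a fitted regressor; always predicts the same value."""

    def __init__(self, value):
        self.value = value

    def predict(self, features):
        return [self.value for _ in features]


# Hypothetical hold-out data (invented values, not the study's measurements).
X_test = [[1.0], [2.0], [3.0], [4.0]]
y_test = [100.0, 110.0, 90.0, 100.0]

accuracy, error = evaluate(ConstantModel(100.0), X_test, y_test)
print(f"Model Accuracy: {accuracy:.2f}% (MAPE = {error:.2f}%)")
```

Because `evaluate` only relies on a `predict` method, a fitted scikit-learn `RandomForestRegressor` could be passed in place of the stand-in model. Note that MAPE is undefined when an actual value is zero, which matters for elements measured near the detection limit.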
Descriptive statistics of first group dataset.
| Variable | Unit | Mean | SE Mean | StDev | Min. | Q1 | Median | Q3 | Max. |
|---|---|---|---|---|---|---|---|---|---|
| As muscle | µg g−1 Fresh weight (F.W.) | 3.82 | 0.18 | 1.14 | 2.15 | 2.79 | 3.69 | 4.66 | 6.32 |
| Cd muscle | µg g−1 F.W. | 0.03 | 0.00 | 0.00 | 0.03 | 0.03 | 0.03 | 0.03 | 0.04 |
| Fe muscle | µg g−1 F.W. | 9.13 | 0.58 | 3.68 | 4.33 | 6.26 | 8.40 | 12.84 | 15.87 |
| Cu muscle | µg g−1 F.W. | 0.16 | 0.00 | 0.01 | 0.14 | 0.14 | 0.16 | 0.17 | 0.18 |
| Mn muscle | µg g−1 F.W. | 0.17 | 0.01 | 0.06 | 0.04 | 0.15 | 0.19 | 0.21 | 0.27 |
| Zn muscle | µg g−1 F.W. | 12.18 | 0.47 | 2.99 | 6.17 | 10.20 | 13.14 | 14.39 | 16.13 |
| Ni muscle | µg g−1 F.W. | 0.11 | 0.00 | 0.03 | 0.05 | 0.09 | 0.11 | 0.13 | 0.17 |
| Ca muscle | µg g−1 F.W. | 176.84 | 19.63 | 124.14 | 52.49 | 79.52 | 100.94 | 287.07 | 435.90 |
| Mg muscle | µg g−1 F.W. | 518.08 | 7.47 | 47.25 | 438.42 | 479.67 | 517.38 | 551.31 | 608.88 |
| Na muscle | µg g−1 F.W. | 1116.55 | 31.99 | 202.32 | 831.94 | 917.89 | 1123.59 | 1319.56 | 1394.43 |
| K muscle | µg g−1 F.W. | 6001.23 | 38.72 | 244.91 | 5640.17 | 5778.49 | 5998.94 | 6191.85 | 6453.03 |
| As liver | µg g−1 F.W. | 8.42 | 0.62 | 3.92 | 3.91 | 4.78 | 7.44 | 10.67 | 17.65 |
| Cd liver | µg g−1 F.W. | 0.10 | 0.00 | 0.02 | 0.05 | 0.08 | 0.09 | 0.12 | 0.13 |
| Fe liver | µg g−1 F.W. | 60.34 | 1.69 | 10.68 | 42.12 | 52.39 | 60.16 | 69.88 | 79.81 |
| Cu liver | µg g−1 F.W. | 3.10 | 0.07 | 0.42 | 2.51 | 2.77 | 3.00 | 3.42 | 3.89 |
| Mn liver | µg g−1 F.W. | 0.62 | 0.04 | 0.27 | 0.02 | 0.44 | 0.61 | 0.80 | 1.10 |
| Zn liver | µg g−1 F.W. | 28.63 | 0.40 | 2.50 | 25.27 | 26.45 | 28.01 | 30.74 | 33.93 |
| Ni liver | µg g−1 F.W. | 0.17 | 0.00 | 0.03 | 0.13 | 0.14 | 0.17 | 0.20 | 0.21 |
| Ca liver | µg g−1 F.W. | 85.89 | 4.56 | 28.82 | 51.64 | 58.76 | 82.24 | 115.01 | 121.73 |
| Mg liver | µg g−1 F.W. | 434.82 | 14.94 | 94.52 | 334.09 | 344.36 | 406.76 | 535.68 | 599.18 |
| Na liver | µg g−1 F.W. | 1511.55 | 23.77 | 150.32 | 1217.21 | 1419.67 | 1548.77 | 1645.93 | 1672.52 |
| K liver | µg g−1 F.W. | 4889.04 | 118.62 | 750.23 | 3281.69 | 4252.79 | 5204.22 | 5533.91 | 5580.09 |
| Turbot Weight | kg | 1.39 | 0.02 | 0.15 | 1.20 | 1.26 | 1.36 | 1.48 | 1.70 |
| Turbot Length | cm | 43.46 | 0.31 | 1.94 | 40.20 | 41.77 | 43.70 | 44.98 | 46.80 |
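The sample size of the first group is not stated in this record, but it can be estimated from the table because the standard error of the mean is the standard deviation divided by the square root of n. The sketch below applies that relation to four rows of the table. The result is an inference from rounded values, not a figure reported by the authors.

```python
# (StDev, SE Mean) pairs copied from the first group table above.
rows = {
    "Ca muscle": (124.14, 19.63),
    "Mg muscle": (47.25, 7.47),
    "Na muscle": (202.32, 31.99),
    "K liver": (750.23, 118.62),
}


def implied_sample_size(stdev, se_mean):
    """Solve SE = StDev / sqrt(n) for n."""
    return (stdev / se_mean) ** 2


implied = {
    name: implied_sample_size(stdev, se_mean)
    for name, (stdev, se_mean) in rows.items()
}
for name, n in implied.items():
    print(name, round(n, 1))
```

All four rows point to roughly 40 observations per variable. Rows with very small standard errors (printed as 0.00) cannot be used this way because of rounding.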
Descriptive statistics of second group dataset.
| Variable | Mean | SE Mean | StDev | Min. | Q1 | Median | Q3 | Max. |
|---|---|---|---|---|---|---|---|---|
| Cd muscle | 0.04 | 0.00 | 0.01 | 0.02 | 0.03 | 0.03 | 0.03 | 0.10 |
| Fe muscle | 11.57 | 1.23 | 8.23 | 4.33 | 6.56 | 9.33 | 13.62 | 39.84 |
| Cu muscle | 0.38 | 0.12 | 0.83 | 0.14 | 0.15 | 0.16 | 0.17 | 5.05 |
| Mn muscle | 1.00 | 0.55 | 3.67 | 0.04 | 0.17 | 0.20 | 0.21 | 24.22 |
| Zn muscle | 14.67 | 1.23 | 8.26 | 6.17 | 10.81 | 13.49 | 14.99 | 45.20 |
| Ni muscle | 0.50 | 0.17 | 1.14 | 0.05 | 0.09 | 0.12 | 0.14 | 4.50 |
Descriptive statistics of third group dataset.
| Variable | Mean | SE Mean | StDev | Min. | Q1 | Median | Q3 | Max. |
|---|---|---|---|---|---|---|---|---|
| Cd muscle | 0.03 | 0.00 | 0.01 | 0.01 | 0.03 | 0.03 | 0.03 | 0.10 |
| Cu muscle | 0.55 | 0.16 | 1.11 | 0.14 | 0.15 | 0.16 | 0.17 | 5.18 |
| Zn muscle | 15.31 | 1.20 | 8.40 | 6.17 | 11.30 | 13.63 | 15.43 | 45.20 |
Descriptive statistics of fourth group dataset.
| Variable | Mean | SE Mean | StDev | Min. | Q1 | Median | Q3 | Max. |
|---|---|---|---|---|---|---|---|---|
| Cd muscle | 0.03 | 0.00 | 0.01 | 0.01 | 0.03 | 0.03 | 0.03 | 0.10 |
| Fe muscle | 10.67 | 1.09 | 7.42 | 2.60 | 6.09 | 9.17 | 13.46 | 39.84 |
| Cu muscle | 0.37 | 0.12 | 0.82 | 0.14 | 0.15 | 0.16 | 0.17 | 5.05 |
| Mn muscle | 0.93 | 0.53 | 3.61 | 0.04 | 0.17 | 0.20 | 0.22 | 24.22 |
| Zn muscle | 13.91 | 1.01 | 6.86 | 6.17 | 10.41 | 13.42 | 14.98 | 45.20 |
| Ni muscle | 0.41 | 0.15 | 1.03 | 0.02 | 0.08 | 0.11 | 0.14 | 4.50 |
Descriptive statistics of fifth group dataset.
| Variable | Mean | SE Mean | StDev | Min. | Q1 | Median | Q3 | Max. |
|---|---|---|---|---|---|---|---|---|
| Pb muscle | 0.24 | 0.08 | 0.25 | 0.03 | 0.10 | 0.17 | 0.28 | 0.85 |
| Cd muscle | 0.05 | 0.01 | 0.04 | 0.01 | 0.01 | 0.03 | 0.10 | 0.11 |
| As muscle | 0.91 | 0.27 | 0.81 | 0.15 | 0.30 | 0.61 | 1.58 | 2.53 |