Kumuduni N Palansooriya1, Jie Li2, Pavani D Dissanayake1,3, Manu Suvarna2, Lanyu Li2, Xiangzhou Yuan1, Binoy Sarkar4, Daniel C W Tsang5, Jörg Rinklebe6,7, Xiaonan Wang8, Yong Sik Ok1. 1. Korea Biochar Research Center, APRU Sustainable Waste Management Program & Division of Environmental Science and Ecological Engineering, Korea University, Seoul 02841, South Korea. 2. Department of Chemical and Biomolecular Engineering, National University of Singapore, Singapore 117585, Singapore. 3. Soils and Plant Nutrition Division, Coconut Research Institute, Lunuwila 61150, Sri Lanka. 4. Lancaster Environment Centre, Lancaster University, Lancaster LA1 4YQ, United Kingdom. 5. Department of Civil and Environmental Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong, China. 6. School of Architecture and Civil Engineering, Institute of Foundation Engineering, Water and Waste Management, Laboratory of Soil and Groundwater Management, University of Wuppertal, Pauluskirchstraße 7, 42285 Wuppertal, Germany. 7. Department of Environment, Energy and Geoinformatics, Sejong University, 98 Gunja-Dong, Gwangjin-Gu, Seoul 05006, Republic of Korea. 8. Department of Chemical Engineering, Tsinghua University, Beijing 100084, China.
Abstract
Biochar application is a promising strategy for the remediation of contaminated soil, while ensuring sustainable waste management. Biochar remediation of heavy metal (HM)-contaminated soil primarily depends on the properties of the soil, biochar, and HM. The optimum conditions for HM immobilization in biochar-amended soils are site-specific and vary among studies. Therefore, a generalized approach to predict HM immobilization efficiency in biochar-amended soils is required. This study employs machine learning (ML) approaches to predict the HM immobilization efficiency of biochar in biochar-amended soils. The nitrogen content in the biochar (0.3-25.9%) and biochar application rate (0.5-10%) were the two most significant features affecting HM immobilization. Causal analysis showed that the empirical categories for HM immobilization efficiency, in the order of importance, were biochar properties > experimental conditions > soil properties > HM properties. Therefore, this study presents new insights into the effects of biochar properties and soil properties on HM immobilization. This approach can help determine the optimum conditions for enhanced HM immobilization in biochar-amended soils.
Keywords:
biochar; graphical user interface; heavy metal; machine learning models; soil remediation
Soil
pollution by heavy metals (HMs) is a significant global problem
that threatens sustainable development, particularly in developing
countries such as China and India, where 36% of the global population
resides.[1,2] Anthropogenic activities such as mining
and smelting, industrial operations, and agricultural activities accelerate
HM contamination in soils.[1] Ultimately,
these HMs enter the food chain and cause diseases such as cancer,
renal failure, cardiovascular disorders, and neurological and cognitive
impairment.[3,4] Various in situ and ex situ techniques have
been used to remediate HM-contaminated soils. Among these techniques,
in situ HM immobilization using biological waste has become well established
owing to its efficiency, economic feasibility, and ease of adaptation.[3] In situ HM immobilization is a risk-based remediation
strategy in which HM bioavailability is reduced to a level considered
safe for the intended land use.[5] Appropriate
HM immobilizing agents can facilitate environmentally sustainable
remediation because of their reduced environmental footprint.[6]

Biochar application to contaminated soil
is considered a promising
method for HM immobilization because biochar can adsorb and immobilize
HMs as its surface area (SA), microporosity, surface functional groups,
pH, and cation exchange capacity are superior to those of raw feedstock.[7,8] A variety of biochars produced from various feedstocks (e.g., sewage
sludge, manure, crop residue) under different production conditions
(e.g., slow pyrolysis, fast pyrolysis, gasification, and hydrothermal
carbonization)[9] have been used to immobilize
HMs (and metalloids) such as As, Cd, Cu, Pb, Cr, Ni, Co, and Zn in
soils.[10,11] The HM immobilization efficiency in biochar-amended
soils varies depending on the type of biochar (e.g., production conditions
and physicochemical properties), soil properties (e.g., soil pH, organic
matter content, and electrical conductivity (EC)), and HM properties
(e.g., valency and ionic radius).[7,12] Studies have
shown that immobilization efficiency is influenced by various adsorption/immobilization
mechanisms and factors such as cation exchange, electrostatic interaction,
precipitation, and complexation by surface functional groups.[13] However, the optimum conditions for enhanced
HM immobilization in soils using biochar vary considerably among studies.
Studying all of the process parameters involved in soil HM immobilization
via simultaneous experimentation is challenging. Several meta-analyses,
bibliometric analyses, and review studies have been carried out to
evaluate the efficiency of HM immobilization.[14,15] However, these methods of determining the relative contribution
of various factors to the immobilization efficiency are time-consuming
and complex. Before applying a biochar-based remediation technique,
identifying the optimum parameters for maximum HM immobilization in
certain types of soil via an empirical approach can reduce the time
and cost involved. However, to the best of our knowledge, such techniques
have not yet been developed. An empirical approach capable of predicting
HM immobilization efficiency in biochar-treated soils may address
the issues associated with the determination of the optimum experimental
conditions, biochar properties, and soil properties to maximize HM
immobilization in soils. A robust model involving all of the possible
factors can be used to highlight the relative importance of each factor,
which may enhance the understanding of the overall process and help
achieve a high HM immobilization efficiency in contaminated soils
under optimized conditions.

Machine learning (ML) can process
and learn from large, complex,
and multidimensional data to develop predictive models.[16] ML methods such as random forest (RF) and neural
networks (NN) have been used to monitor and map contaminants in soils[17] and groundwater.[18] In addition, several studies have utilized ML models to develop
risk assessment methods for groundwater pollution,[19] predict the yield and C content of biochar based on biomass
properties and pyrolysis conditions,[20] as
well as predict the sorption efficiencies of HMs (and metalloids)[16,21−23] and personal care products[24] by biochar in water and wastewater. The ML models used could predict
the nonlinear and complex relationships between dependent and independent
variables in complex systems for environmental engineering and bioremediation.[16,25] Complex biochar–soil interactions and lack of a systematic
dataset have resulted in paucity of studies on ML-based prediction
of biochar efficiency for immobilization of contaminants in soils,
particularly in relation to HM contamination.

To address this
gap, we developed three types of ML models (RF,
support vector regression (SVR), and NN) to predict HM immobilization
efficiency in biochar-amended soils. Conventional statistical methods typically
capture only simple linear or quadratic correlations between a single factor
and a target. Compared to general statistical analysis, ML methods
can simultaneously consider the maximum possible related factors and
identify complex correlations (both linear and nonlinear) with the
targets. Moreover, the targets of interest (i.e., HM immobilization)
can be accurately predicted by the ML model to conveniently evaluate
the remediation potential of biochar for a specific HM-contaminated
soil. In addition, new experimental designs achieved from the developed
ML model can guide experiments and biochar application in HM-contaminated
soil (e.g., by providing a suggested biochar addition rate). Therefore,
this study considered 20 input variables to assess their roles and
effects on HM immobilization in biochar-treated soils. These variables
were related to four key aspects: (i) biochar characteristics and
pyrolysis temperatures; (ii) experimental conditions; (iii) physicochemical
properties of biochar-treated soils; and (iv) HM properties. Prior
to developing the ML models, a thorough statistical analysis of the
data collected from the literature was used to establish correlations
between the input variables (factors) and output variables (HM immobilization)
(detailed information is available in the Supporting Information including Figures S1 and S2).
Materials and Methods
Data Collection
and Data Imputation
Research articles
addressing HM immobilization (for Cu, Zn, Pb, Cd, Fe, Ni, and Mn)
in biochar-amended soils from 2015 to 2020[11,12,26−37] were selected using the Institute for Scientific Information Web
of Science and Google Scholar databases. The HM immobilization efficiency
for the biochar-amended soil (compared to untreated soil) was directly
obtained from the experimental data or calculated as a percentage
using eq 1

$$\text{HM immobilization efficiency (\%)} = \frac{C_{B0} - C_{B}}{C_{B0}} \times 100 \qquad (1)$$

where $C_{B0}$ refers
to the HM concentration in the control treatment (without biochar
amendment), and $C_{B}$ is the HM concentration
in the biochar-amended treatment. The concentrations are expressed
in mg·kg⁻¹.

Immobilization in metal-contaminated
soil describes a decrease in the bioavailability, bioaccessibility,
phytoavailability, and leaching of the metal, as there are reduced
amounts of the metal in the exchangeable, labile, and water-soluble
fractions of the soil.

The research articles selected for this
study were expected to
include all of the data for the four empirical categories listed in
the Introduction, including 20 input features
and the output variable (Table S1). However,
only a few studies were available that included all of the data for
the considered parameters. Therefore, 15 papers[11,12,26−37] were selected, and data were obtained from tables or extracted from
figures using the WebPlotDigitizer software (https://apps.automeris.io/wpd/). Thus, 162 data points were collected and used for ML exploration.

The detailed procedure of ML exploration associated with HM immobilization
efficiency in biochar-amended soils is illustrated in Figure 1. Twenty parameters were identified
as input features, and HM immobilization was defined as the output
variable. From the total dataset, 35 and 10 data points were missing
for the biochar SA and soil EC data, respectively. To fill these data
gaps and obtain a uniform dataset, an ML model was used to predict
the missing SA values using other properties of biochar, including
pH, composition, and atomic ratio. Missing soil EC data points were
excluded due to the lack of adequate soil input variables for the
prediction of the EC. In summary, two datasets were compiled, one
for the prediction of the biochar SA and the other for the prediction
of the immobilization efficiency. The dataset for SA prediction contained
127 data points on the biochar properties as the input variables,
whereas that for the immobilization efficiency prediction contained
152 data points following the addition of the missing SA data.
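The immobilization efficiency defined earlier in this section (eq 1) can be sketched as a small helper. This is a minimal illustration rather than the authors' code; the function name and the example concentrations are assumptions made here.

```python
def immobilization_efficiency(c_control, c_biochar):
    """Percent decrease in HM concentration relative to the unamended control.

    Both concentrations must be in the same unit (e.g., mg/kg).
    """
    if c_control <= 0:
        raise ValueError("control concentration must be positive")
    return (c_control - c_biochar) / c_control * 100.0


# Hypothetical concentrations, for illustration only (not from the dataset).
print(immobilization_efficiency(12.0, 3.0))  # 75.0
```

A negative return value would indicate that the HM concentration increased after amendment, so the sign is worth keeping rather than clipping at zero.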
Figure 1
Flowchart detailing
the strategies of the machine learning framework
to determine the heavy metal immobilization efficiency in biochar-amended
soil. During the first step, data were collected from the literature
based on four empirical categories and the output variable. Model
development was preceded by data pretreatment. Then, the model was
updated using feature engineering to simplify it and improve its performance.
The final updated model was used for feature exploration to study
the effects of input features on the target output.
Model Development and Evaluation
To improve the training
process of ML models for rapid convergence, the input features were
normalized using StandardScaler in Scikit-Learn (version 1.0.2)[38] to obtain a similar scale and approximate a
normal distribution. Following normalization, the dataset was randomly
divided into two parts: 85% was used for ML model training and the
remaining 15% was used for the final model evaluation.[39,40] Previous studies have reported that SVR, RF, and NN algorithms have
exhibited satisfactory performance when used to train the models used
for the prediction of biochar properties and HM adsorption by biochar.[16,20,41,42] The hyperparameters for each algorithm were tuned to minimize the
mean-squared error in predicting the biochar SA and the immobilization
efficiency based on fivefold cross-validation. Different hyperparameters
were included in various ML algorithms during the tuning process.
In SVR, the epsilon (ε), kernel function, and penalty (α)
parameters were the tuned hyperparameters. In RF, the number of trees,
depth of each tree, and max_feature of RF were the three crucial parameters
that were tuned. Finally, in NN, the number of hidden layers and neurons
in each layer were tuned to improve the model convergence. The tuning
process used in this study for the three ML algorithms has been described
in previous studies.[16,20,41]

Three optimal ML models from the SVR, NN, and RF algorithms
were obtained after hyperparameter tuning; these models were evaluated
using the 15% test dataset. The coefficient of determination (R²) and root-mean-square error (RMSE) were utilized
to compare the prediction accuracy and quantify the prediction performance.[41] R² and RMSE values
were calculated using eqs 2 and 3, respectively

$$R^{2} = 1 - \frac{\sum_{i=1}^{N} \left(y_{i}^{p} - y_{i}^{t}\right)^{2}}{\sum_{i=1}^{N} \left(y_{i}^{t} - y^{m}\right)^{2}} \qquad (2)$$

$$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(y_{i}^{p} - y_{i}^{t}\right)^{2}} \qquad (3)$$

where $y_{i}^{p}$ is the predicted
value of the
output, $y_{i}^{t}$ is the true value of the output collected
from the literature on experimental research, $y^{m}$ is the mean value of all output values, and $N$ is the number of data samples in the training or testing datasets.
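The workflow described above (85/15 split, scaling, fivefold cross-validated tuning against the mean-squared error, then R² and RMSE on the held-out set) can be sketched with scikit-learn. The data are synthetic and the grid values are assumptions made here, not those used in the study. The sketch also fits the scaler inside a pipeline, on training folds only, which differs from the normalize-then-split order described in the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 152 samples, 20 input features.
rng = np.random.default_rng(0)
X = rng.normal(size=(152, 20))
y = 50 + 10 * X[:, 0] - 5 * X[:, 1] + rng.normal(scale=2.0, size=152)

# 85/15 split; fixing random_state keeps the test set identical across runs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, random_state=42
)

# The scaler sits inside the pipeline, so it is refit on each training fold.
pipeline = make_pipeline(StandardScaler(), RandomForestRegressor(random_state=0))

# Illustrative grid only; these values are not the ones used in the study.
param_grid = {
    "randomforestregressor__n_estimators": [50, 100],
    "randomforestregressor__max_depth": [5, None],
    "randomforestregressor__max_features": [0.5, 1.0],
}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)


def r2_and_rmse(y_true, y_pred):
    """Return (R2, RMSE) following eqs 2 and 3."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_pred - y_true) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, np.sqrt(ss_res / y_true.size)


r2, rmse = r2_and_rmse(y_test, search.predict(X_test))
print(f"best params: {search.best_params_}")
print(f"test R2 = {r2:.2f}, test RMSE = {rmse:.2f}")
```

The same pattern applies to SVR or NN estimators by swapping the final pipeline step and its parameter grid.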
ML-Based Feature Engineering and Performance Analysis with the
Updated ML Model
To simplify the ML model and improve its
performance, feature filtering was incorporated using ML-based feature
importance and correlation. The Pearson correlation coefficient (PCC)
was used to investigate the correlations between features, and hierarchical
clustering was conducted based on the Pearson rank-order correlations.
Features with high correlation were sorted into one cluster because
they contained similar information, and one of these features was
selected as a representative of the cluster for the ML model development.
ML-based feature importance can reasonably determine the representative
features of a cluster. The most important feature in a cluster was
selected as the representative feature for model improvement by combining
the feature importance and correlation. The details of PCC have been
described in a previous study.[16] Moreover,
the hierarchical clustering strategy can be summarized in three steps.[43] First, the lowest distance between elements
was determined, which involved determining the two elements that were
most similar to each other for clustering. Second, the two clustered
elements were considered a new element for future clustering. These
two steps were repeated until a final cluster was obtained. The number
of clusters was determined by selecting a distance threshold based
on the application. Feature importance was investigated based on the
developed RF model, which could interpret the roles of input features
relative to the output variable.[20]

The important and valuable features selected based on the feature
engineering results were further applied to update the ML model that
was optimized in the first round. Based on previous studies,[41,44] the hyperparameters remained the same, and only the input features
were updated. Using the same hyperparameters as the previous ML model
enabled comparison of the new and previous models to evaluate the
feasibility of the feature selection method; this also reduced time
and computational costs. The ML model was retrained using the same
85% training dataset to adapt to the new feature information by fixing
the random_state of data splitting, and the updated ML model was evaluated
with the same 15% test set after updating the features to avoid the
participation of the test data in the model training process. If the
prediction performance of the updated model improved, it implied that
our feature selection process was helpful. However, if the prediction
performance of the updated model decreased, the hyperparameters would
have to be retuned. The final updated ML model was applied to explore
the importance and impact of each feature on the target. Two types
of feature analysis methods were applied to evaluate the feature importance
and correlations to HM immobilization efficiency. One feature analysis
result was directly determined through the final updated RF model.
Both one-dimensional and two-dimensional (feature interaction) partial
dependence plots were utilized to integrate the updated RF model and
systematically express the correlation of each feature to the output
variable. Another feature analysis was achieved using the Shapley
additive explanation (SHAP) method, which is also widely used in feature
analysis.[41,45] The marginal effect of each feature on the
predicted output was determined using the ML model and the relevance
between the input features and output variables (e.g., linear, monotonic,
and even more complex relationships).[46]
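The correlation-based clustering step can be sketched with SciPy. This is a minimal illustration under stated assumptions: the data are synthetic, the distance is taken as 1 − |r|, average linkage is used, and the threshold of 0.3 is chosen for this toy example; none of these settings are confirmed by the text, which reports a threshold of one on its own distance scale.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Synthetic features (assumption); "O/C" is constructed to track "O%".
rng = np.random.default_rng(0)
n = 152
o_content = rng.normal(size=n)
features = {
    "C%": rng.normal(size=n),
    "O%": o_content,
    "O/C": o_content + 0.05 * rng.normal(size=n),
    "BC rate %": rng.normal(size=n),
}
names = list(features)
X = np.column_stack([features[name] for name in names])

# Pearson correlation matrix between features (columns).
corr = np.corrcoef(X, rowvar=False)

# Turn correlation into a distance: highly correlated features are close.
dist = 1.0 - np.abs(corr)
dist = (dist + dist.T) / 2.0
np.fill_diagonal(dist, 0.0)

# Agglomerative clustering on the condensed distance matrix.
Z = linkage(squareform(dist, checks=False), method="average")

# Cut the dendrogram at a distance threshold to obtain flat clusters.
labels = fcluster(Z, t=0.3, criterion="distance")
for name, label in zip(names, labels):
    print(f"{name}: cluster {label}")
```

Features that share a cluster label carry similar information, so only one representative per cluster needs to be kept for model development.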
Results and Discussion
Dataset Compilation and Missing Data Imputation
Following
a systematic literature review and data collection, 20 input variables
were classified into empirical categories based on the domain knowledge.
These included the pyrolysis temperature, biochar properties (pH and
SA of biochar), biochar composition (C, H, N, O, and ash contents),
atomic ratios (H/C, O/C, and [O + N]/C), operational conditions (biochar
addition rate, experimental duration, and available HM concentration),
and soil properties (soil pH and EC) (Figure 1 and Table S1).

HM immobilization was considered as the output variable. For biochar
SA, 35 of the 162 data points were missing, which would have reduced
the dataset further if SA had been retained as an input without imputation. To ensure uniformity of
the entire dataset and obtain the missing data points, three ML algorithms (RF,
SVR, and NN) were developed (Figures 2a and S3a,b and Table S2) to derive the missing SA data using the pyrolysis temperature,
biochar pH, biochar composition, and atomic ratios as inputs. Plots
of the experimental versus the predicted SA data from the three ML
models (Figures 2b
and S3c,d) showed that all models exhibited
good performance for SA prediction with a high test coefficient of
determination (R² = 0.98–0.99).
Figure 2
Results
of (a) hyperparameter tuning, (b) prediction performance,
and (c) feature importance from the random forest model in terms of
the surface area (SA) prediction. The machine learning algorithms
were established based on 131 data points including biochar properties
(pH; C, H, O, N contents; ratios of H/C, O/C, [O + N]/C; and ash content)
and biochar pyrolysis temperature.
Feature analysis was continued by utilizing the optimal RF model
to predict the missing values and obtain the feature importance; Figure 2c shows the importance
of each feature to biochar SA. The H/C atomic ratio was found to be
the most important feature for SA prediction. This is a new finding,
as no such direct relationship has previously been reported in the
literature. The second most important feature for biochar SA was biochar
pH, followed by the biochar pyrolysis temperature. Generally, the
pyrolysis temperature showed a strong correlation with the SA and
H/C atomic ratios in the biochar. A high pyrolysis temperature led
to a higher SA and lower H/C atomic ratio in biochar.[47,48] Chen et al.[49] showed that missing data
in a meta-analysis would increase the uncertainties of the results.
In contrast, in the ML analysis, missing data could be imputed to
avoid excluding records with missing values.[40]
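Model-based imputation of the kind described in this section can be sketched as follows. The data, the made-up relation between temperature and SA, and the hyperparameters are all assumptions for illustration; only the idea of training on complete records and predicting the missing SA values comes from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic biochar records (assumption): temperature, pH, and H/C ratio.
rng = np.random.default_rng(1)
n = 162
temperature = rng.uniform(300, 700, size=n)
biochar_ph = rng.uniform(6, 11, size=n)
h_to_c = 1.2 - 0.0015 * temperature + rng.normal(scale=0.02, size=n)
X = np.column_stack([temperature, biochar_ph, h_to_c])

# Made-up surface area values with 35 entries removed to mimic missing data.
surface_area = 0.5 * (temperature - 300) + rng.normal(scale=5.0, size=n)
surface_area[rng.choice(n, size=35, replace=False)] = np.nan

# Train only on records where the surface area was reported.
known = ~np.isnan(surface_area)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X[known], surface_area[known])

# Fill the gaps with model predictions; reported values stay untouched.
surface_area_filled = surface_area.copy()
surface_area_filled[~known] = model.predict(X[~known])

print(f"trained on {known.sum()} records, imputed {(~known).sum()} values")
```

In practice the imputation model should itself be validated on held-out complete records before its predictions are used as inputs to a second model.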
ML Model Development and Feature Analysis
After filling
the data gaps, three ML algorithms (SVR, NN, and RF) were used to
predict the HM immobilization efficiency of biochar-amended soils
based on 20 input features (Table S1).
The optimal hyperparameters for each model were tuned during the training
phase to minimize prediction errors based on fivefold cross-validation
(Figure S4). Figure 3 presents the actual and predicted values
of the HM immobilization efficiency for the three models. For the
RF model, the training and testing R² were
0.95 and 0.91, respectively, while the RMSE values were 7.35 and 10.54%,
respectively. For the SVR and NN models, the testing R² values were lower, at 0.88 and 0.80, respectively. These results
indicated that the RF model with optimally tuned hyperparameters was
the best-performing algorithm for predicting the HM immobilization
efficiency.
Figure 3
Predictive performance of (a) support vector regression, (b)
neural network models, (c) random forest, and (d) updated random forest
to predict heavy metal immobilization efficiency in biochar-amended
soils. In all, 152 data points were obtained for model development
after imputing the missing data for the biochar surface area. RMSE
= root-mean-square error.
Although the ML models (particularly RF) achieved satisfactory prediction
accuracies on the preliminary dataset, the less important features
and the simultaneous use of many input features could weaken the generalization
capacity of the model. Therefore, ML-based feature engineering was
conducted to filter out the less important features and simplify the
ML model for improved performance. Highly related input features were
identified using the PCC (Figure S5) and
hierarchical clustering (Figure 4a). The clustering and empirical categories were identified
based on the domain knowledge, including pyrolysis temperature, biochar
properties, operational conditions, soil properties, and heavy metal
properties. Features within the same cluster that belonged to different
empirical categories were not removed during the feature-filtering
process.
Figure 4
Input feature analysis: (a) hierarchical clustering and (b) machine
learning model-based feature importance from the random forest model.
In all, 152 data points were used for model development. The clusters
in panel (a) indicated by different colors and hierarchical levels
obtained from the hierarchical clustering algorithm were based on
the Pearson rank-order correlations from Figure S5. A distance threshold of one was selected to determine the
similarity between features. This implied that the bottom branches
(each feature had one bottom branch) from the same upper layer branch
were identified as one cluster under this distance threshold. Determining
similar features in the same cluster provided insights for postextraction
of important features to simplify the model. Note: T °C: pyrolysis
temperature; pH_BC: biochar pH; C%, H%, O%, and N%: C, H, O, and N
contents of biochar, respectively; H/C, O/C, (O + N)/C: atomic ratios
of biochar; ash %: ash content of biochar; SA: surface area of biochar;
BC rate %: biochar application rate in soil; time: experimental duration;
Avail. HM: available heavy metal content in soil; pH: soil pH; EC:
soil electrical conductivity; MW: molecular weight of the heavy metal;
electronegativity: electronegativity of the heavy metal; ionic radius:
ionic radius of the heavy metal; and valency: valency of the heavy
metal.
Figure 4a shows
that the (O + N)/C, O/C, and O contents were within one cluster with
a threshold of one; these features represented the biochar properties,
and two were eliminated to simplify the ML model. Based on the feature
importance of the RF model, the O content was more important than
the other two features (Figure 4b). Combining these results, (O + N)/C and O/C were removed
as redundant features. Furthermore, the H/C ratio and H content, which
represented biochar properties, were also identified in one cluster
(Figure 4a). Therefore,
the H/C ratio was eliminated from the features, as it was less important
than the H content in terms of feature importance. The last cluster
contained HM properties, including the mass, electronegativity, ionic
radius, and valency, which were highly correlated to each other. This
indicates that these features offered similar contributions to model
training; therefore, the most important feature (i.e., electronegativity)
to predict HM immobilization was retained for model development.
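The filtering rule described above (keep the most important feature in each cluster, but do not merge features from different empirical categories) can be sketched in plain Python. The importances, cluster labels, and category assignments below are made up for illustration and are not the values reported in the figures.

```python
from collections import defaultdict


def select_representatives(importance, cluster, category):
    """Keep the most important feature per (cluster, category) group."""
    groups = defaultdict(list)
    for name in importance:
        groups[(cluster[name], category[name])].append(name)
    kept = [max(members, key=importance.get) for members in groups.values()]
    return sorted(kept, key=importance.get, reverse=True)


# Made-up importances and cluster labels, for illustration only.
importance = {"N%": 0.30, "O%": 0.08, "O/C": 0.03, "(O+N)/C": 0.02,
              "H%": 0.05, "H/C": 0.04, "pH_BC": 0.06, "pH": 0.07}
cluster = {"N%": 0, "O%": 1, "O/C": 1, "(O+N)/C": 1,
           "H%": 2, "H/C": 2, "pH_BC": 3, "pH": 3}
category = {name: "biochar" for name in importance}
category["pH"] = "soil"  # same cluster as pH_BC, different category

selected = select_representatives(importance, cluster, category)
print(selected)
```

In this toy input, pH_BC and pH share a cluster but belong to different categories, so both are retained, mirroring the rule that cross-category features are not removed.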
ML Model Update for Final Feature Exploration
The best-performing
RF model was reconceptualized with a reduced set of 14 input features
to obtain improved generalization ability and greater computational
efficiency. The new RF model was trained using 85% of the dataset,
and the remaining 15% was used to evaluate its prediction performance. Figure 3d presents the prediction
accuracy of the updated RF model; the R² values of the training and test datasets were 0.95 and 0.92, respectively.
For the test dataset, the R² value of
the updated model was slightly higher than that of the preliminary
model, and the RMSE value was smaller than that of the preliminary
model. This slight increase in the testing prediction performance
was reasonable as the excluded features might have been redundant
in the original model, weakening its generalization ability and robustness.[44] This result implied that the feature selection
method used in this study was feasible for simplifying the ML model
and improving its robustness. Using the same optimal hyperparameters
as the previous ML model could be an efficient method to save computational
cost and time for model development. Notably, the prediction performance
in this study was higher than other reported values from previous
studies.[20,50,51] These results
imply that the feature-filtering process, in which only 14 important
features were identified, could adequately achieve a satisfactory
ML model performance.

To ensure that the prediction model is
accessible to scientists and practitioners, a graphical user interface
(GUI) web software was developed using Python (version 3.7) and the
Flask (version 1.1.2) web framework (Figure S6). To further validate the model through the GUI using new data,
we collected eight data points (Table S3) from published experimental studies. These experimental data points
were independent of the initial dataset of 162 data points. Based
on the validation results using the new data points, our developed
model provided some prediction errors, most of which were lower than
30% (Figure S7). This level of prediction
error was reasonable because our model was limited to the initial
dataset, and some values or conditions (e.g., remediation time and
soil pH) of the newly added experimental data points were outside
the range of our original dataset (Tables S1 and S3), thus resulting in a prediction error <20%, which is
an acceptable value.[20,52] Moreover, the optimal values
for input variables can be obtained through the GUI for real-world
biochar applications for HM immobilization in soils before implementation.
In particular, the GUI uses the provided information to predict the
HM immobilization performance of a specific biochar for a particular
soil type. This may be achieved when the properties of biochar (elemental
composition, SA, biochar pyrolysis temperature, and biochar pH), soil
(pH and EC), and HM (available HM concentration and metal electronegativity)
are determined using analytical instruments, and the amendment conditions
(biochar rate and amendment time) are obtained based on the experimental
design. Thus, this method can save time and cost in related research
or engineering projects investigating the HM immobilization performance
of biochar-amended soils.Based on the updated RF model, the
final feature importance relative
to the output HM immobilization efficiency was explored using both
the RF explainer and SHAP methods (Figures and S8). The
ranking of important features from the two feature analysis methods
showed similar results, particularly for the two most important features
for predicting the immobilization efficiency, which were the N content
in the biochar and biochar application rate (Figure 5a,b). The N content was positively correlated
with HM immobilization within the 0.3–25.9% range (Figure S8). The presence of N-containing functional
groups (e.g., −NH2, N–C=O, and C=N)
on the biochar surface provides active sites for HM immobilization
through strong covalent bonding, H bonding, chelation, and electrostatic
attraction.[53,54] HMs, such as Cu, may be fixed
by amino-modified biochar because of the increased −NH2 surface functional groups.[55] In
addition, a biochar with a higher N content may have better adsorption
properties than that with a lower N content. For example, N-doped
biochar exhibited altered surface chemistry with a higher SA (418.7
m2·g–1) than that of pristine biochar
(61.0 m2·g–1),[56] resulting in higher adsorption capacities of the N-doped biochar for aqueous Cu2+ and Cd2+ than those of pristine
biochar. In general, the O content and O-containing functional groups
(e.g., carboxyl, hydroxyl, and phenolic) in biochar play a vital role
in HM immobilization.[57] The presence of
more O-containing functional groups in biochar increases the immobilization
of HMs (i.e., Cd, Pb, and Cu) owing to the strong interactions between
HMs and O-containing functional groups.[57−59] Nevertheless, the N
content was more important than the O content of biochar for HM immobilization
in this study (Figure 5b). A similar finding was reported by Igalavithana et al.,[60] who found that the Pb immobilization rate showed
a higher correlation with the N content than with the O content of
biochar. A possible reason for this difference could be the preferential
adsorption of some HMs by the N-containing functional groups. Deng
et al.[59] reported that N-containing functional
groups (e.g., N–C=O) played a significant role in Pb
removal, while C- or O-containing functional groups did not show a
significant effect.
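The SHAP ranking discussed above is built on Shapley values, which split a prediction among the input features. The following is a minimal sketch of an exact Shapley computation for a single sample, not the code used in this study. The response function `toy_model`, its coefficients, and the sample and baseline values are hypothetical placeholders chosen only to illustrate the arithmetic. They do not represent the fitted RF model or the dataset.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Features outside a coalition are held at their baseline value.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                without = {
                    f: (x[f] if f in coalition else baseline[f]) for f in names
                }
                with_feature = dict(without)
                with_feature[name] = x[name]
                total += weight * (predict(with_feature) - predict(without))
        phi[name] = total
    return phi


def toy_model(s):
    # Hypothetical response with an N% x BC rate % interaction (assumption)
    return (2.0 * s["N%"] + 3.0 * s["BC rate %"]
            + 0.5 * s["N%"] * s["BC rate %"] + 0.1 * s["C%"])


x = {"N%": 6.0, "BC rate %": 5.0, "C%": 50.0}          # hypothetical sample
baseline = {"N%": 1.0, "BC rate %": 1.0, "C%": 40.0}   # hypothetical reference
phi = shapley_values(toy_model, x, baseline)

for feature, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(feature, round(value, 3))
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the property that allows SHAP values to be averaged over samples into a global feature ranking. Exact enumeration scales exponentially with the number of features, so tree-specific approximations are generally used for RF models in practice.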
Figure 5
Updates of the random forest model based on the feature-filtering
dataset with the final investigation of feature importance, where
the relative importance of empirical categories was selected from
the (a) random forest model and (b) Shapley additive explanation method,
and (c) interactions among the top four features (N%, BC rate %, C%,
and EC_soil) on the impact of HM immobilization. In all, 152 data
points were used for model development. Note: T °C: pyrolysis
temperature; pH_BC: biochar pH; C%, H%, O%, and N%: C, H, O, and N
contents of biochar, respectively; ash %: ash content of biochar;
SA: surface area of biochar; BC rate %: biochar application rate in
soil; time: experimental duration; Avail. HM: available heavy metal
content in soil; pH_soil: soil pH; EC_soil: soil electrical conductivity;
and electronegativity: electronegativity of heavy metals.
The HM immobilization efficiency increased with the biochar
application
rate within a specific range (0.5–10%) based on the utilized
dataset (Figure S8). Moreover, feature
interaction indicated that biochar application rates higher than 4%
with biochar N content higher than 5% could achieve a high HM immobilization
efficiency (Figure 5c). Field experiments and incubation studies have also confirmed
that the bioavailability of HMs may decrease with increasing biochar
application rates.[61−63] Cui et al.[63] reported
that the application of biochar at 5 and 15% (weight) to Cd-contaminated
soil in an incubation experiment reduced the concentration of bioavailable
Cd by 53.4 and 87.9%, respectively, compared with the concentration
of bioavailable Cd in the control. This might be due to the increase
in functional groups and soil pH, which caused the formation of insoluble
Cd precipitates with increasing biochar application rates.[63] Bioavailable Pb and Cd in biochar-amended soils
decreased with increasing biochar application rates (0.0, 1.0, 2.5,
5.0, and 10.0%).[64] This was attributed
to the increase in soil pH (from 6.17 to 7.17) and organic matter
content (from 10.34 to 11.48%), which promoted the formation of less
soluble Pb(OH)2 and Cd(OH)2 precipitates and
the binding of Pb/Cd to Mn and Fe oxides.[64]
The C content of biochar and the EC of soil were also important in
predicting the HM immobilization efficiency, ranking third
and fourth in terms of feature importance (Figure 5a,b). In general, the C content in biochar
increases with the pyrolysis temperature, indicating that biochar
that is produced at higher temperatures has more recalcitrant C, whereas
biochar produced at lower temperatures has more labile C.[65,66] In particular, biochar with more recalcitrant C or higher aromaticity
exhibits higher HM immobilization performance owing to its greater
surface negativity.[12,60] However, higher HM immobilization
was observed when the biochar C content was between 40 and 60%, particularly
when the biochar application rate was below 7% (Figures S8 and 5c). Soil EC showed
a strong positive correlation with HM immobilization in biochar-amended
soils in this study, which was consistent with existing reports in
the literature[27,60] (Figure S8). Furthermore, feature interaction revealed that the addition of
biochar with a carbon content <60% at a biochar application rate
>5% with soil EC higher than 0.5 could improve soil HM immobilization
(Figure 5c). The application
of biochar to soil may also increase soil EC, particularly because
of the release of exchangeable basic cations such as Ca2+, Mg2+, Na+, and K+,[11] which subsequently facilitates HM immobilization
via ion exchange, complexation, and coprecipitation.[12,29,60]
Although the pH values of biochar and soil were not ranked as
very important for HM immobilization (Figure 5b), these features are known to be crucial
for soil HM immobilization after biochar application.[11,29] Several meta-analyses have also emphasized that the pH of biochar
and soil are crucial factors that influence bioavailable HM in soil.[15,67] However, the partial dependence plot demonstrated that soil HM immobilization
was higher when the soil and biochar pH were approximately 8 and >7.5,
respectively (Figure S8). At a higher soil
pH, the deprotonated acidic functional groups on the biochar surface
preferentially bind positively charged HMs, thereby enhancing
HM immobilization.[68] Moreover, the release
of K+, Na+, OH–, PO43–, and Cl– ions at higher pH
facilitated the formation of stable compounds with HMs.[11] The formation of stable inner-sphere complexes[69] and precipitation of HMs as carbonates, phosphates,
and hydroxides[29,70] are some of the additional mechanisms
that facilitate HM immobilization in soils due to the increased pH
from biochar application. Notably, the SA of biochar and the pyrolysis
temperature did not play significant roles in HM immobilization. This
is consistent with the preliminary analysis (statistical data analysis
presented in the SI), which showed that
a high immobilization efficiency was obtained for different ranges
of the SA, and a high SA was not always required. Biochar applied
to contaminated soil forms a complex system, and other factors may
also interact with biochar SA to influence HM immobilization.
Some studies have reported that the SA of biochar increases with
increasing pyrolysis temperature and that biochar with a higher
SA adsorbs more HMs than that with a lower SA.[11,26] However, this phenomenon may change when the types of biochar, HM
species, and experimental conditions vary. Son et al.[71] reported that the Cu2+ adsorption capacity of
marine macroalgae biochar (SA = 0.39–0.49 m2·g–1) was higher than that of pinewood sawdust biochar
(SA = 364.47 m2·g–1), even though
the specific SA of the former was lower than that of the latter. Although
increasing the pyrolysis temperature develops a microporous structure
and increases the specific SA of biochar, it decreases the abundance
of functional groups on the biochar surface.[72] For example, functional groups such as carboxyl, carbonyl, hydroxyl,
and methoxyl on biochar tend to disappear as the pyrolysis temperature
is increased above 450 °C.[73] Thus, biochar
with a higher SA would have fewer functional groups, and hence, lower
HM immobilization capacity. These results supplement the findings
of this study, which demonstrate the reduced importance of SA and
pyrolysis temperature in HM immobilization using biochar. An increase
in ash content in the biochar increased HM immobilization (Figure S8). The ash in biochar consists of residual
minerals (e.g., salts) that can supply cations to the soil after dissolution.[60,74] These cations increase the soil cation exchange capacity, thereby
increasing the HM immobilization efficiency in soils.[60]
Based on the relative importance of each empirical category,
“biochar properties” was found to be the most important
category for HM immobilization, followed by the experimental conditions,
soil properties, and HM properties (Figure 5a). The contribution of biochar properties
to HM immobilization was 50% (Figure 5a). Lakshmi et al.[75] reported
similar findings, where “biochar properties” was the
leading variable (53.8% contribution) for predicting HM sorption (in
aqueous media) on biochar using an RF model. In our study, “experimental
conditions” (which includes the available HM concentration
in the control soil) was the second most important factor for HM immobilization,
contributing 28.0% to the relative importance (Figure 5a). Similarly, in the aforementioned study,[75] the initial HM concentration was the second
most important variable, contributing 30.6% to the relative importance.
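The category-level contributions described above can be obtained by summing the per-feature importances within each empirical category and normalizing to 100%. The following sketch illustrates that aggregation under stated assumptions. The feature labels follow the Figure 5 note, and the assignment of the available HM concentration to the experimental conditions follows the text above. The importance values are hypothetical placeholders, not the values obtained in this study.

```python
from collections import defaultdict

# Feature-to-category mapping (labels from the Figure 5 note; the grouping
# itself is an assumption made for this illustration)
CATEGORY_OF = {
    "T °C": "biochar properties",
    "pH_BC": "biochar properties",
    "C%": "biochar properties",
    "H%": "biochar properties",
    "O%": "biochar properties",
    "N%": "biochar properties",
    "ash %": "biochar properties",
    "SA": "biochar properties",
    "BC rate %": "experimental conditions",
    "time": "experimental conditions",
    "Avail. HM": "experimental conditions",
    "pH_soil": "soil properties",
    "EC_soil": "soil properties",
    "electronegativity": "HM properties",
}

# Hypothetical per-feature importances (placeholders, NOT results of the study)
feature_importance = {
    "N%": 0.20, "BC rate %": 0.15, "C%": 0.10, "EC_soil": 0.09,
    "Avail. HM": 0.08, "time": 0.07, "ash %": 0.06, "O%": 0.05,
    "H%": 0.04, "pH_soil": 0.04, "pH_BC": 0.04, "SA": 0.03,
    "T °C": 0.03, "electronegativity": 0.02,
}


def category_importance(importance, category_of):
    """Sum feature importances per category and express them in percent."""
    total = sum(importance.values())
    shares = defaultdict(float)
    for feature, value in importance.items():
        shares[category_of[feature]] += 100.0 * value / total
    # Sort categories from the largest to the smallest contribution
    return dict(sorted(shares.items(), key=lambda kv: kv[1], reverse=True))


shares = category_importance(feature_importance, CATEGORY_OF)
for category, percent in shares.items():
    print(f"{category}: {percent:.1f}%")
```

Because the category shares are simple sums, a category with many moderately important features can outrank a category that contains a single highly ranked feature, which should be considered when comparing category-level and feature-level rankings.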
Environmental Implications
The optimum conditions for HM
immobilization in soils containing
biochar vary widely among studies. Examining all relevant parameters
through simultaneous experimentation is challenging. To address this
issue, an ML-based empirical approach was developed in this study,
which could be used to predict HM immobilization efficiency in biochar-amended
soils based on biochar, soil, and HM properties as well as experimental
conditions. According to the findings of the newly developed RF model,
the two most important features governing soil HM immobilization were
the biochar application rate and biochar N content (N%), which were positively
correlated with HM immobilization capacity. Based on causal analysis,
the importance of the involved features was in the following order:
biochar properties > experimental conditions > soil properties > HM properties.
Chen et al.[49] performed a meta-analysis
and found that the HM bioavailability and plant uptake depend on factors
such as HM speciation, soil properties, biochar characteristics, application
rate, and plant type. Moreover, soil properties such as soil pH, organic
matter content, and texture are the key variables that determine the
concentration of HM uptake by plants.[49] Similarly, Rehman et al.[67] determined
through a meta-analysis that edaphic factors, such as soil pH and texture,
as well as plant species, affect HM adsorption and transformation in soils
amended with biochar. Several other meta-analyses[15,76−78] have also reported the factors influencing HM immobilization
in soil. However, none of these studies has identified the most important
variable or the relative importance of each variable for HM immobilization.
Compared with these conventional approaches, our ML model highlights
the relative importance of each factor for HM immobilization. This
facilitates the overall understanding of the process and realization
of maximum HM immobilization efficiency in contaminated soils under
various conditions. Furthermore, a GUI was developed to ensure that
the prediction model is accessible to both scientists and practitioners.
This online tool can predict the HM immobilization efficiency of a
given biochar for a specific soil type using available data. Thus,
the GUI can assist in obtaining optimum values of the input variables
and in achieving maximum HM immobilization in soils prior to the implementation
of a biochar-based remediation plan.
The results of this study have some limitations owing to the quality
and quantity of data collected from published papers. The data distribution
of some input features and output targets was inconsistent owing to
multiple variations in the research objectives, methodologies, and
experimental conditions. For example, immobilization efficiency was
determined based on a wide range of measures such as bioavailability,
bioaccessibility, exchangeable fraction, labile fraction, leaching,
phytoavailability, and water-soluble fraction of HMs. Moreover, various
extraction methods using diethylenetriaminepentaacetic acid, ethylenediaminetetraacetic
acid, CaCl2, and NH4NO3 were applied
to determine the available HM concentrations in soil fractions. These
constraints may cause uncertainties in some of the prediction results
and may not precisely reflect real-world scenarios. Therefore, future
research should focus on improving the ML model using a database that
includes studies with well-defined scientific objectives and similar
methodologies under uniform experimental conditions. In particular,
some biochar properties that are directly associated with HM immobilization
were missing in this study due to the lack of data in the selected
literature. For example, evaluating the effect of the surface functional
groups (e.g., −OH, −COOH, C–C, C=C, C–O,
C=O, phenolic, alcoholic, and ether) of biochar on HM immobilization
is more important than evaluating the elemental composition
of biochar. Hence, future studies should utilize the surface chemistry
data derived from X-ray photoelectron spectroscopy, X-ray diffraction
analysis, and Fourier transform infrared spectroscopy when developing
ML models for predicting the HM immobilization efficiency by biochar.