Masoud Karbasi, Mehdi Jamei, Iman Ahmadianfar, Amin Asadi.
Abstract
In the present study, two kernel-based data-intelligence paradigms, namely Gaussian Process Regression (GPR) and Kernel Extreme Learning Machine (KELM), along with Generalized Regression Neural Network (GRNN) and Response Surface Methodology (RSM) as benchmark models, were employed to estimate the discharge coefficient of elliptical side orifices in rectangular channels. A total of 588 laboratory data points covering various geometric and hydraulic conditions were used to develop the models. The discharge coefficient was considered as a function of five dimensionless hydraulic and geometric variables. The results showed that the machine learning models generally outperformed the regression-based relationships. Among the machine learning models, GPR (RMSE = 0.0081, R = 0.958, MAPE = 1.3242) and KELM (RMSE = 0.0082, R = 0.9564, MAPE = 1.3499) provided the highest accuracy. Based on the RSM model, a new practical equation was developed to predict the discharge coefficient. In addition, the sensitivity analysis of the input parameters showed that the ratio of main channel width to orifice height (B/b) has the most significant effect on the discharge coefficient. The leverage approach was applied to identify outlier data and the applicability domain of the models.
Year: 2021 PMID: 34611225 PMCID: PMC8492736 DOI: 10.1038/s41598-021-99166-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
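The abstract states that a practical equation was derived from the RSM model. RSM usually fits a second-order polynomial (intercept, linear, squared and two-way interaction terms) by least squares. The NumPy sketch below builds that full quadratic design matrix and fits it with `numpy.linalg.lstsq`; it is an illustration under that assumption, not the authors' fitted equation. The function names are hypothetical, the data are synthetic, and the term selection that would reduce the full model to the terms retained in the ANOVA table is not shown.

```python
import itertools

import numpy as np


def quadratic_design_matrix(X):
    """Columns: 1, x_i, x_i**2 and x_i*x_j (i < j) of a full second-order model."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    columns = [np.ones(n)]                                   # intercept
    columns += [X[:, i] for i in range(k)]                   # linear terms
    columns += [X[:, i] ** 2 for i in range(k)]              # pure quadratic terms
    columns += [X[:, i] * X[:, j]                            # two-way interactions
                for i, j in itertools.combinations(range(k), 2)]
    return np.column_stack(columns)


def fit_rsm(X, y):
    """Ordinary least-squares coefficients of the quadratic response surface."""
    A = quadratic_design_matrix(X)
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef


# Usage on synthetic data generated from a known quadratic surface in two inputs.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(40, 2))
true_coef = np.array([0.5, 0.1, -0.2, 0.05, 0.0, 0.03])
y_demo = quadratic_design_matrix(X_demo) @ true_coef
coef = fit_rsm(X_demo, y_demo)
```

With five dimensionless inputs, the full quadratic model has 1 + 5 + 5 + 10 = 21 columns before any term selection.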
Figure 1. Schematic view of the elliptical side orifice and its geometrical parameters.
Statistical specifications of train and test datasets.
| Data | Statistic | | | | | | |
|---|---|---|---|---|---|---|---|
| Train data | Mean | 2.665 | 1.463 | 9.047 | 0.172 | 0.484 | 0.518 |
| | Std | 1.212 | 0.208 | 2.611 | 0.039 | 0.121 | 0.026 |
| | Min | 1.25 | 1.25 | 6.25 | 0.093 | 0.219 | 0.405 |
| | Max | 5 | 1.667 | 12.5 | 0.252 | 0.777 | 0.569 |
| Test data | Mean | 2.837 | 1.446 | 8.971 | 0.168 | 0.472 | 0.517 |
| | Std | 1.246 | 0.209 | 2.576 | 0.039 | 0.108 | 0.027 |
| | Min | 1.25 | 1.25 | 6.25 | 0.093 | 0.273 | 0.426 |
| | Max | 5 | 1.667 | 12.5 | 0.256 | 0.731 | 0.562 |
Figure 2. Correlation matrix of the input (independent) variables and the output (dependent) variable.
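List of kernel functions used for the GPR model.
| Kernel function | Kernel equation |
|---|---|
| Rational quadratic | $k(x_i, x_j) = \sigma_f^2 \left(1 + \frac{r^2}{2\alpha\sigma_l^2}\right)^{-\alpha}$ |
| Exponential | $k(x_i, x_j) = \sigma_f^2 \exp\left(-\frac{r}{\sigma_l}\right)$ |
| Matern 3/2 | $k(x_i, x_j) = \sigma_f^2 \left(1 + \frac{\sqrt{3}\,r}{\sigma_l}\right) \exp\left(-\frac{\sqrt{3}\,r}{\sigma_l}\right)$ |
| Squared exponential | $k(x_i, x_j) = \sigma_f^2 \exp\left(-\frac{r^2}{2\sigma_l^2}\right)$ |
| Matern 5/2 | $k(x_i, x_j) = \sigma_f^2 \left(1 + \frac{\sqrt{5}\,r}{\sigma_l} + \frac{5r^2}{3\sigma_l^2}\right) \exp\left(-\frac{\sqrt{5}\,r}{\sigma_l}\right)$ |
| ARD squared exponential | $k(x_i, x_j) = \sigma_f^2 \exp\left[-\frac{1}{2} \sum_{m=1}^{d} \frac{(x_{im} - x_{jm})^2}{\sigma_m^2}\right]$ |
| ARD exponential | $k(x_i, x_j) = \sigma_f^2 \exp(-r)$, with $r = \sqrt{\sum_{m=1}^{d} \frac{(x_{im} - x_{jm})^2}{\sigma_m^2}}$ |
| ARD Matern 3/2 | $k(x_i, x_j) = \sigma_f^2 \left(1 + \sqrt{3}\,r\right) \exp\left(-\sqrt{3}\,r\right)$, with $r$ as in the ARD exponential kernel |
| ARD Matern 5/2 | $k(x_i, x_j) = \sigma_f^2 \left(1 + \sqrt{5}\,r + \frac{5}{3} r^2\right) \exp\left(-\sqrt{5}\,r\right)$, with $r$ as in the ARD exponential kernel |
| ARD rational quadratic | $k(x_i, x_j) = \sigma_f^2 \left(1 + \frac{1}{2\alpha} \sum_{m=1}^{d} \frac{(x_{im} - x_{jm})^2}{\sigma_m^2}\right)^{-\alpha}$ |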
Where $\sigma_f$ is the signal standard deviation, $\sigma_l$ is the characteristic length scale, $r$ is the Euclidean distance between $x_i$ and $x_j$, defined by $r = \sqrt{(x_i - x_j)^T (x_i - x_j)}$, $\alpha$ is a positive-valued scale-mixture parameter, and $\sigma_m$ is a separate length scale for each predictor $m$, $m = 1, 2, \ldots, d$. The hyperparameter values are estimated by maximizing the marginal likelihood [61,62].
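As a concrete illustration of the kernels listed above, the following NumPy sketch implements the ARD squared exponential kernel (the best-performing kernel in the comparison below) and the GPR posterior mean. The function names, the constant prior mean (taken as the mean of the training targets) and the hand-picked values of `sigma_f`, `length_scales` and `noise_std` are assumptions for illustration. The hyperparameters are not optimized here, whereas the study estimates them by maximizing the marginal likelihood.

```python
import numpy as np


def ard_squared_exponential(XA, XB, sigma_f, length_scales):
    """ARD squared exponential kernel with one length scale per predictor."""
    A = np.asarray(XA, dtype=float) / length_scales   # divide column m by sigma_m
    B = np.asarray(XB, dtype=float) / length_scales
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    # Clip tiny negative values caused by floating-point cancellation.
    return sigma_f**2 * np.exp(-0.5 * np.maximum(sq_dist, 0.0))


def gpr_predict_mean(X_train, y_train, X_new, sigma_f, length_scales, noise_std):
    """Posterior mean of a GP whose constant prior mean is the training-target mean."""
    y_train = np.asarray(y_train, dtype=float)
    y_mean = y_train.mean()
    K = ard_squared_exponential(X_train, X_train, sigma_f, length_scales)
    K += noise_std**2 * np.eye(len(y_train))
    # Solve (K + noise_std^2 I) alpha = y - mean through a Cholesky factorization.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    K_star = ard_squared_exponential(X_new, X_train, sigma_f, length_scales)
    return y_mean + K_star @ alpha


# Usage on synthetic data (hyperparameters chosen by hand, not optimized).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 1.0, size=(30, 2))
y_demo = 0.5 + 0.05 * np.sin(3.0 * X_demo[:, 0]) + 0.01 * X_demo[:, 1]
pred = gpr_predict_mean(X_demo, y_demo, X_demo[:5], sigma_f=0.05,
                        length_scales=np.array([0.3, 1.0]), noise_std=1e-3)
```

A separate length scale per input lets the model down-weight inputs that barely affect the output, which is the practical motivation for the ARD variants.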
Figure 3. KELM structure.
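In the commonly used KELM formulation, the output weights have the closed form $\beta = (I/C + \Omega)^{-1} T$, where $\Omega_{ij} = K(x_i, x_j)$ is the kernel matrix of the training inputs, $C$ is a regularization coefficient and $T$ holds the training targets; a new input $x$ is predicted as $[K(x, x_1), \ldots, K(x, x_N)]\,\beta$. The NumPy sketch below assumes a Gaussian kernel and uses hypothetical values of `C` and `gamma`; the class and function names are illustrative, and it does not reproduce the kernel or settings used in the study.

```python
import numpy as np


def rbf_kernel(XA, XB, gamma):
    """Gaussian kernel exp(-gamma * ||a - b||^2) between the rows of XA and XB."""
    XA = np.asarray(XA, dtype=float)
    XB = np.asarray(XB, dtype=float)
    sq_dist = (np.sum(XA**2, axis=1)[:, None]
               + np.sum(XB**2, axis=1)[None, :]
               - 2.0 * XA @ XB.T)
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


class KernelELM:
    """Kernel extreme learning machine with closed-form output weights."""

    def __init__(self, C=1e4, gamma=50.0):
        self.C = C          # regularization coefficient (hypothetical default)
        self.gamma = gamma  # Gaussian kernel parameter (hypothetical default)

    def fit(self, X, T):
        self.X_train = np.asarray(X, dtype=float)
        omega = rbf_kernel(self.X_train, self.X_train, self.gamma)
        n = len(self.X_train)
        # beta = (I / C + Omega)^{-1} T, solved without forming the inverse.
        self.beta = np.linalg.solve(np.eye(n) / self.C + omega,
                                    np.asarray(T, dtype=float))
        return self

    def predict(self, X):
        # f(x) = [K(x, x_1), ..., K(x, x_N)] beta
        return rbf_kernel(X, self.X_train, self.gamma) @ self.beta


# Usage on a smooth synthetic 1-D target.
X_demo = np.linspace(0.0, 1.0, 10).reshape(-1, 1)
T_demo = 0.5 + 0.05 * X_demo[:, 0]
model = KernelELM(C=1e4, gamma=50.0).fit(X_demo, T_demo)
fitted = model.predict(X_demo)
```

Unlike a standard ELM, no random hidden layer is drawn: the kernel matrix replaces the hidden-layer feature mapping, so training reduces to one regularized linear solve.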
Figure 4. Flowchart for predicting the discharge coefficient of the elliptical side orifice with different machine learning (ML) models.
Effect of GPR kernel type on model accuracy.
| Kernel | RMSE | R | MAPE | MBE | NRMSE |
|---|---|---|---|---|---|
| Exponential | 0.00875 | 0.95098 | 1.40634 | −0.00062 | 1.69007 |
| Squared exponential | 0.00845 | 0.95423 | 1.36432 | −0.00054 | 1.63369 |
| Matern 3/2 | 0.00857 | 0.95299 | 1.38097 | −0.00061 | 1.65591 |
| Matern 5/2 | 0.00853 | 0.95346 | 1.37472 | −0.00059 | 1.64759 |
| Rational quadratic | 0.00845 | 0.95423 | 1.36432 | −0.00054 | 1.63370 |
| ARD exponential | 0.00840 | 0.95474 | 1.38167 | −0.00031 | 1.62279 |
| ARD squared exponential | 0.00810 | 0.95797 | 1.32428 | −0.00029 | 1.56469 |
| ARD Matern 3/2 | 0.00821 | 0.95674 | 1.35327 | −0.00028 | 1.58723 |
| ARD Matern 5/2 | 0.00817 | 0.95723 | 1.34383 | −0.00028 | 1.57823 |
| ARD rational quadratic | 0.00818 | 0.95715 | 1.34447 | −0.00028 | 1.57975 |
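The tables report five criteria: RMSE, R, MAPE, MBE and NRMSE. Their exact formulas are not given in this excerpt, so the plain-Python sketch below uses common definitions as assumptions: MAPE and NRMSE in percent, MBE as the mean of predicted minus observed values, and NRMSE as RMSE divided by the mean observed value. That last choice appears consistent with the reported magnitudes: 100 × 0.0081 / 0.517 ≈ 1.57, close to the reported 1.5647, if 0.517 (the test-data mean in the last column of the statistics table) is taken as the mean observed Cd. The function name `evaluate` is hypothetical.

```python
import math


def evaluate(observed, predicted):
    """Return RMSE, R, MAPE (%), MBE and NRMSE (%) for paired sequences."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Pearson correlation coefficient between observed and predicted values.
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_o * var_p)
    mape = 100.0 * sum(abs(e / o) for e, o in zip(errors, observed)) / n
    mbe = sum(errors) / n              # assumed sign: predicted minus observed
    nrmse = 100.0 * rmse / mean_o      # assumed normalization: mean observed value
    return {"RMSE": rmse, "R": r, "MAPE": mape, "MBE": mbe, "NRMSE": nrmse}


# Usage with three hypothetical observed/predicted Cd pairs.
metrics = evaluate([0.50, 0.52, 0.54], [0.51, 0.52, 0.53])
```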
Figure 5. Scatter plots of observed against predicted Cd by the GPR model for the training and test data.
Figure 6. Scatter plots of observed against predicted Cd by the KELM model for the training and test data.
Figure 7. Scatter plots of observed against predicted Cd by the GRNN model for the training and test data.
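A GRNN (the general regression neural network introduced by Specht) predicts a weighted average of the training targets, with Gaussian weights $\exp(-D_i^2 / (2\sigma^2))$ based on the squared Euclidean distance $D_i^2$ to each training sample and a smoothing parameter (spread) $\sigma$. The plain-Python sketch below shows that computation; the function name and the spread values are illustrative assumptions, not the settings used in the study.

```python
import math


def grnn_predict(X_train, y_train, x, spread):
    """GRNN estimate at x: Gaussian-weighted average of the training targets."""
    # Pattern layer: squared Euclidean distance to every training sample.
    d2 = [sum((a - b) ** 2 for a, b in zip(xi, x)) for xi in X_train]
    d2_min = min(d2)
    # Gaussian activations; subtracting d2_min avoids underflow and cancels in the ratio.
    weights = [math.exp(-(d - d2_min) / (2.0 * spread**2)) for d in d2]
    # Summation layer (numerator and denominator) and output layer (their ratio).
    numerator = sum(w * yi for w, yi in zip(weights, y_train))
    return numerator / sum(weights)


# Usage with two hypothetical training samples.
X_demo = [[0.0], [1.0]]
y_demo = [0.4, 0.6]
mid_point = grnn_predict(X_demo, y_demo, [0.5], spread=0.3)
near_first = grnn_predict(X_demo, y_demo, [0.0], spread=0.01)
```

The spread is the only parameter to tune: a small spread makes the output follow the nearest training sample, while a large spread averages over many samples.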
ANOVA results for identifying the significant variables and interactions in the RSM model.
| Variables | Coefficient | Sum of squares | F-value | p-value |
|---|---|---|---|---|
| | 0.18167 | 0.06822 | 695.09123 | 0.00000 |
| | 0.13982 | 0.09677 | 986.01148 | 0.00000 |
| | 0.00772 | 0.00170 | 17.35857 | 0.00004 |
| | 3.93829 | 0.00163 | 16.65797 | 0.00005 |
| | −0.26858 | 0.03962 | 403.65941 | 0.00000 |
| | −0.02340 | 0.00205 | 20.87685 | 0.00001 |
| | −0.00743 | 0.00371 | 37.77965 | 0.00000 |
| | −0.65597 | 0.00503 | 51.25667 | 0.00000 |
| | −0.00870 | 0.00078 | 7.95540 | 0.00502 |
| | −1.35512 | 0.00152 | 15.45248 | 0.00010 |
| | 0.26254 | 0.00127 | 12.95195 | 0.00036 |
| | 0.11749 | 0.00199 | 20.24286 | 0.00001 |
| | −0.04114 | 0.00197 | 20.08285 | 0.00001 |
| | −3.38425 | 0.00159 | 16.21657 | 0.00007 |
| | 0.00118 | 0.00128 | 13.02753 | 0.00034 |
| | 0.65565 | 0.00319 | 32.46088 | 0.00000 |
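In an ANOVA table, each term's F-value is its mean square (sum of squares divided by its degrees of freedom) divided by the residual mean square. Assuming each RSM term has one degree of freedom (the table does not list degrees of freedom), dividing the sum of squares by the F-value should give roughly the same residual mean square for every row. The short check below does this for five rows copied from the ANOVA table.

```python
# (sum of squares, F-value) pairs copied from five rows of the ANOVA table.
# Assumption: each term has one degree of freedom, so F = SS / MS_residual.
rows = [
    (0.06822, 695.09123),
    (0.09677, 986.01148),
    (0.03962, 403.65941),
    (0.00503, 51.25667),
    (0.00078, 7.95540),
]

# Residual mean square implied by each row.
implied_ms = [ss / f for ss, f in rows]
for (ss, f), ms in zip(rows, implied_ms):
    print(f"SS = {ss:.5f}, F = {f:.2f}, implied residual MS = {ms:.3e}")
```

All five rows imply a residual mean square of about 9.8 × 10⁻⁵, agreeing to within about 0.2%; the small spread comes from the rounded sums of squares.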
Figure 8. Scatter plots of observed against predicted Cd by the MLRI model for the training and test data.
Performance of the regression-based equations.
| Equation | RMSE | R | MAPE | MBE | NRMSE |
|---|---|---|---|---|---|
| Equation | 0.0106 | 0.9277 | 1.6846 | 0.0003 | 2.0464 |
| Equation | 0.0107 | 0.9254 | 1.6993 | 0.0007 | 2.0731 |
| Equation | 0.0234 | 0.5713 | 3.6714 | −0.0014 | 4.5212 |
| Equation | 0.0273 | 0.5149 | 4.3799 | 0.0113 | 5.2789 |
| Equation | 0.0202 | 0.7008 | 3.1016 | −0.0011 | 3.9039 |
Figure 9. Scatter plots of the regression-based equations.
Performance of the AI models and the best regression-based equation of Vatankhah and Rafeifar [3] in predicting the discharge coefficient.
| Statistical criteria | GPR | GRNN | KELM | RSM | Vatankhah and Rafeifar [3] |
|---|---|---|---|---|---|
| R | 0.9556 | 0.9202 | 0.9530 | 0.9279 | 0.8962 |
| RMSE | 0.0077 | 0.0104 | 0.0079 | 0.0097 | 0.0116 |
| MAPE | 1.1781 | 1.5769 | 1.2128 | 1.4790 | 1.7496 |
| MBE | 0 | 0.0001 | 0 | 0 | 0.0005 |
| NRMSE | 1.4844 | 2.0043 | 1.5230 | 1.8742 | 2.2231 |
| AVE rank | 1 | 4 | 2 | 3 | 5 |
| R | 0.9580 | 0.9291 | 0.9564 | 0.9456 | 0.9277 |
| RMSE | 0.0081 | 0.0106 | 0.0082 | 0.0092 | 0.0106 |
| MAPE | 1.3243 | 1.6971 | 1.3499 | 1.4921 | 1.6846 |
| MBE | −0.0003 | −0.0004 | −0.0002 | −0.0001 | 0.0003 |
| NRMSE | 1.5647 | 2.0532 | 1.5929 | 1.7738 | 2.0464 |
| AVE rank | 1 | 4 | 2 | 3 | 5 |
Figure 10. Error distribution of the four developed AI models and the regression-based equations.
Figure 11. Cumulative frequency (%) of the absolute relative error (%) for the AI models and regression-based equations.
Figure 12. Williams plot identifying the applicability domain of the machine learning models.
Figure 13. Williams plot identifying the applicability domain of the regression-based models.
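Figure 11 summarizes accuracy as the cumulative frequency of the absolute relative error. A minimal sketch of that calculation, assuming ARE = 100 × |predicted − observed| / |observed|, with a hypothetical function name and hypothetical error thresholds, is:

```python
def cumulative_are_frequency(observed, predicted, thresholds):
    """Percentage of samples whose absolute relative error (%) is <= each threshold."""
    are = [100.0 * abs(p - o) / abs(o) for o, p in zip(observed, predicted)]
    n = len(are)
    return [100.0 * sum(1 for e in are if e <= t) / n for t in thresholds]


# Hypothetical observed and predicted Cd values and error thresholds (%).
freq = cumulative_are_frequency(
    [0.50, 0.50, 0.50, 0.50], [0.50, 0.505, 0.49, 0.53], [0.5, 1.5, 3.0, 10.0]
)
print(freq)  # [25.0, 50.0, 75.0, 100.0]
```

A model whose curve rises faster reaches a given share of samples at a smaller relative error.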
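A Williams plot places each sample's leverage on the horizontal axis and its standardized residual on the vertical axis. The NumPy sketch below computes these quantities under common conventions, which are assumptions here: leverages are the diagonal of the hat matrix built from the input matrix plus an intercept column, the warning leverage is h* = 3(p + 1)/n for p inputs and n samples, and residuals are standardized by the residual standard error with a ±3 cut-off. The study's exact conventions are not stated in this excerpt, and the function name is hypothetical.

```python
import numpy as np


def williams_plot_data(X, residuals):
    """Leverages, warning leverage h* and standardized residuals for a Williams plot."""
    X = np.asarray(X, dtype=float)
    r = np.asarray(residuals, dtype=float)
    n, p = X.shape
    # Design matrix with an intercept column (assumption): mean leverage is (p + 1) / n.
    Xd = np.column_stack([np.ones(n), X])
    # Diagonal of the hat matrix H = Xd (Xd^T Xd)^{-1} Xd^T, without forming H.
    XtX_inv = np.linalg.inv(Xd.T @ Xd)
    leverage = np.einsum("ij,jk,ik->i", Xd, XtX_inv, Xd)
    h_star = 3.0 * (p + 1) / n
    # Residuals scaled by the residual standard error (one common convention).
    s = np.sqrt(np.sum(r**2) / (n - p - 1))
    std_resid = r / s
    # Outside the applicability domain: high leverage or |standardized residual| > 3.
    outside = (leverage > h_star) | (np.abs(std_resid) > 3.0)
    return leverage, h_star, std_resid, outside


# Usage with synthetic inputs and residuals.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(50, 3))
res_demo = rng.normal(scale=0.01, size=50)
lev, h_star, std_res, outside = williams_plot_data(X_demo, res_demo)
```

Points with leverage above h* lie outside the structural domain of the training data, while points beyond ±3 standardized residuals are flagged as response outliers.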
Statistical measures for the sensitivity analysis scenarios (each "All-" column omits one input variable; "All" uses every input).
| Metrics | All- | All- | All- | All- | All- | All |
|---|---|---|---|---|---|---|
| R | 0.9052 | 0.7932 | 0.9576 | 0.9432 | 0.8968 | 0.9580 |
| RMSE | 0.0120 | 0.0174 | 0.0081 | 0.0094 | 0.0125 | 0.0081 |
| MAPE | 1.7022 | 2.4944 | 1.3336 | 1.4860 | 1.9147 | 1.3243 |
| MBE | −0.0003 | −0.0012 | −0.0003 | −0.0007 | −0.0007 | −0.0003 |
| NRMSE | 2.3179 | 3.3595 | 1.5723 | 1.8159 | 2.4168 | 1.5647 |
| AVE rank | 3 | 1 | 5 | 4 | 2 | – |
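Each "All-" scenario drops one input, so the larger the loss of accuracy, the more influential that input. One plausible way to obtain the AVE rank row is to rank the five scenarios on each criterion by degradation, average the ranks, and rank the averages. The sketch below does this for the R, RMSE, MAPE and NRMSE values copied from the table (MBE is left out to avoid ties). This procedure and the helper name `rank_positions` are assumptions, since the authors' ranking method is not described here.

```python
# Metrics of the five "All-" scenarios, in the column order of the table.
scenarios = {
    "R":     [0.9052, 0.7932, 0.9576, 0.9432, 0.8968],
    "RMSE":  [0.0120, 0.0174, 0.0081, 0.0094, 0.0125],
    "MAPE":  [1.7022, 2.4944, 1.3336, 1.4860, 1.9147],
    "NRMSE": [2.3179, 3.3595, 1.5723, 1.8159, 2.4168],
}
# Lower R means more degradation; for the error metrics, higher means more degradation.
higher_is_worse = {"R": False, "RMSE": True, "MAPE": True, "NRMSE": True}


def rank_positions(values, descending):
    """Rank 1 goes to the first value in the chosen sort order (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0] * len(values)
    for position, idx in enumerate(order, start=1):
        ranks[idx] = position
    return ranks


n_scenarios = len(scenarios["R"])
mean_rank = [0.0] * n_scenarios
for metric, values in scenarios.items():
    ranks = rank_positions(values, descending=higher_is_worse[metric])
    for i, rank in enumerate(ranks):
        mean_rank[i] += rank / len(scenarios)

# Final importance rank: 1 = the input whose removal degrades accuracy most.
importance = rank_positions(mean_rank, descending=False)
print(importance)  # [3, 1, 5, 4, 2]
```

With these four criteria the procedure returns the same ranks as the AVE rank row; the scenario ranked 1 is the one whose omission degrades accuracy most.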