Sara Muhammadullah¹, Amena Urooj¹, Muhammad Hashim Mengal², Shahzad Ali Khan³, Fereshteh Khalaj⁴
Abstract
Impulse indicator saturation (IIS) is a popular method for outlier detection in time series modeling, where it outperforms least trimmed squares (LTS), the M-estimator, and the MM-estimator. However, the use of IIS for outlier detection in cross-sectional analysis has remained unexplored. In this paper, we probe the feasibility of the IIS method for cross-sectional data, with a focus on forecasting performance and covariate selection in the presence of outliers. The IIS method relies on Autometrics to select covariates and outliers jointly, because adding one impulse indicator per observation makes the number of candidate regressors P exceed the number of observations n. Besides Autometrics, regularization techniques are well-known methods for covariate selection and forecasting in high-dimensional analysis, but their efficiency within the IIS framework has also remained unexplored. We therefore evaluate regularization techniques for out-of-sample forecasting in the presence of outliers of 6 and 4 standard deviations (SD), with orthogonal covariates. The simulation results indicate that SCAD and MCP outperform Autometrics in forecasting and covariate selection with 4 SD outliers (at 20% and 5% contamination). LASSO and AdaLASSO, however, select more covariates than SCAD and MCP and have higher RMSE. Overall, the regularization techniques achieve lower RMSE than Autometrics, whereas Autometrics attains the lowest average gauge at the cost of the lowest average potency. For the real data analysis, we use COVID-19 cross-sectional data collected from 1 July 2021 to 30 September 2021. SCAD and MCP select CRP level, gender, and other comorbidities as important predictors of hospital stay, with the lowest out-of-sample RMSE of 7.45 and 7.50, respectively.
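To make the IIS idea concrete: every observation receives its own impulse indicator (a dummy equal to 1 for that observation only), so with n observations and p covariates the candidate set has P = p + n > n columns, and the selection step must decide which covariates and which indicators (outliers) to keep. The pure-Python sketch below replaces Autometrics with a plain LASSO penalty on both covariates and indicators, solved by coordinate descent. It is an illustration under stated assumptions, not the paper's procedure: the function names `soft_threshold` and `iis_lasso`, the fixed penalty `lam=3.0`, the absence of an intercept, and the simulated data are all hypothetical.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def iis_lasso(X, y, lam, sweeps=500, tol=1e-9):
    """Coordinate-descent LASSO on the IIS-augmented design [X, I_n] (hypothetical helper).

    Minimises 0.5 * ||y - X beta - delta||^2 + lam * (sum|beta_j| + sum|delta_i|),
    where delta_i is the coefficient of the impulse indicator for observation i.
    No intercept is fitted (data are assumed centred).
    """
    n, p = len(X), len(X[0])
    beta, delta = [0.0] * p, [0.0] * n
    resid = list(y)  # current residual y - X beta - delta
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(sweeps):
        max_change = 0.0
        for j in range(p):  # covariate coefficients
            # Inner product of column j with the partial residual that excludes beta_j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            change = new - beta[j]
            for i in range(n):
                resid[i] -= X[i][j] * change
            beta[j] = new
            max_change = max(max_change, abs(change))
        for i in range(n):  # impulse indicators: unit-norm columns e_i
            new = soft_threshold(resid[i] + delta[i], lam)
            change = new - delta[i]
            resid[i] -= change
            delta[i] = new
            max_change = max(max_change, abs(change))
        if max_change < tol:
            break
    flagged = [i for i in range(n) if delta[i] != 0.0]
    return beta, delta, flagged


# Hypothetical simulated data: 40 observations, 5 covariates (2 relevant),
# and three observations shifted by 10 noise standard deviations.
rng = random.Random(2022)
n, p = 40, 5
true_beta = [2.0, -1.5, 0.0, 0.0, 0.0]
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(X[i][j] * true_beta[j] for j in range(p)) + rng.gauss(0.0, 1.0) for i in range(n)]
outliers = [3, 17, 29]
for i in outliers:
    y[i] += 10.0

beta_hat, delta_hat, flagged = iis_lasso(X, y, lam=3.0)
print("beta_hat:", [round(b, 2) for b in beta_hat])
print("flagged observations:", flagged)
```

Because each indicator column has unit norm, its update reduces to soft-thresholding that observation's partial residual at λ, so an observation is flagged only when its residual exceeds λ in absolute value. SCAD or MCP variants would replace the soft-thresholding step with those penalties' own thresholding rules (typically with standardized covariates).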
Year: 2022 PMID: 35529268 PMCID: PMC9073553 DOI: 10.1155/2022/2588534
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.809
Regularization penalties.
| Method | Penalty function |
|---|---|
| LASSO | |
| AdaLASSO | |
| SCAD | |
| MCP | |
Here p(·) denotes the penalty function and λ ≥ 0 is the tuning parameter that controls the strength of the penalty.
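The penalty formulas in the table did not survive extraction. For reference, the standard forms from the literature are given below within a penalized least-squares objective. The 1/(2n) scaling, the SCAD constant a (often set to 3.7), the MCP constant γ, the adaptive weights w_j built from an initial estimate β̃_j, and the exponent ν are notational assumptions; the paper's own parameterization may differ.

```latex
\begin{align*}
&\hat{\beta} = \arg\min_{\beta}\ \frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \sum_{j=1}^{P} p_\lambda(\beta_j) \\
&\text{LASSO:}\quad p_\lambda(\beta_j) = \lambda\lvert\beta_j\rvert \\
&\text{AdaLASSO:}\quad p_\lambda(\beta_j) = \lambda w_j\lvert\beta_j\rvert,
  \qquad w_j = \lvert\tilde{\beta}_j\rvert^{-\nu},\ \nu > 0 \\
&\text{SCAD } (a > 2):\quad p_\lambda(\beta_j) =
\begin{cases}
\lambda\lvert\beta_j\rvert, & \lvert\beta_j\rvert \le \lambda,\\[2pt]
\dfrac{2a\lambda\lvert\beta_j\rvert - \beta_j^2 - \lambda^2}{2(a-1)}, & \lambda < \lvert\beta_j\rvert \le a\lambda,\\[2pt]
\dfrac{(a+1)\lambda^2}{2}, & \lvert\beta_j\rvert > a\lambda,
\end{cases} \\
&\text{MCP } (\gamma > 1):\quad p_\lambda(\beta_j) =
\begin{cases}
\lambda\lvert\beta_j\rvert - \dfrac{\beta_j^2}{2\gamma}, & \lvert\beta_j\rvert \le \gamma\lambda,\\[2pt]
\dfrac{\gamma\lambda^2}{2}, & \lvert\beta_j\rvert > \gamma\lambda.
\end{cases}
\end{align*}
```

The pieces join continuously: at |β_j| = λ the middle SCAD expression equals λ², and at |β_j| = γλ the MCP expression equals γλ²/2. Beyond aλ (SCAD) or γλ (MCP) both penalties are constant, so large coefficients are not shrunk further, unlike under LASSO.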
Simulation results (gauge and potency) for different percentages of outliers with a magnitude of 6 SD.
| Outliers | Method | Gauge | Potency |
|---|---|---|---|
| 20% | SCAD | 0.222 | 0.367 |
| 20% | MCP | 0.222 | 0.367 |
| 20% | LASSO | 0.611 | 0.767 |
| 20% | AdaLASSO | 0.333 | 0.433 |
| 20% | Auto(0.05) | 0.011 | 0.100 |
| 20% | Auto(0.01) | 0.011 | 0.100 |
| 10% | SCAD | 0.100 | 0.500 |
| 10% | MCP | 0.140 | 0.550 |
| 10% | LASSO | 0.650 | 0.850 |
| 10% | AdaLASSO | 0.220 | 0.600 |
| 10% | Auto(0.05) | 0.010 | 0.200 |
| 10% | Auto(0.01) | 0.000 | 0.200 |
| 5% | SCAD | 0.048 | 0.600 |
| 5% | MCP | 0.048 | 0.600 |
| 5% | LASSO | 0.591 | 0.933 |
| 5% | AdaLASSO | 0.124 | 0.667 |
| 5% | Auto(0.05) | 0.000 | 0.534 |
| 5% | Auto(0.01) | 0.000 | 0.534 |
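In the Autometrics literature, gauge and potency are retention rates. Gauge is the fraction of irrelevant candidates (covariates or impulse indicators outside the data-generating process) that a method keeps; potency is the fraction of relevant candidates it keeps. A good selector has gauge near 0 and potency near 1. Assuming the tables report these rates averaged over Monte Carlo replications, they could be computed as in this sketch; the function names and the toy index sets are hypothetical.

```python
def gauge_and_potency(selected, relevant, candidates):
    """Return (gauge, potency) for one replication.

    gauge   = share of irrelevant candidates that were retained
    potency = share of relevant candidates that were retained
    """
    selected, relevant, candidates = set(selected), set(relevant), set(candidates)
    irrelevant = candidates - relevant
    gauge = len(selected & irrelevant) / len(irrelevant) if irrelevant else 0.0
    potency = len(selected & relevant) / len(relevant) if relevant else 0.0
    return gauge, potency


def average_gauge_potency(selections, relevant, candidates):
    """Average gauge and potency over Monte Carlo replications."""
    pairs = [gauge_and_potency(s, relevant, candidates) for s in selections]
    return (sum(g for g, _ in pairs) / len(pairs),
            sum(p for _, p in pairs) / len(pairs))


# Hypothetical toy example: 10 candidates, of which 0, 1, 2 are relevant.
candidates = range(10)
relevant = {0, 1, 2}
selections = [{0, 1, 5}, {0, 1, 2}]  # what a method kept in two replications
print(gauge_and_potency(selections[0], relevant, candidates))  # (1/7, 2/3)
print(average_gauge_potency(selections, relevant, candidates))  # (1/14, 5/6)
```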
Simulation results (gauge and potency) for different percentages of outliers with a magnitude of 4 SD.
| Outliers | Method | Gauge | Potency |
|---|---|---|---|
| 20% | SCAD | 0.222 | 1.000 |
| 20% | MCP | 0.144 | 1.000 |
| 20% | LASSO | 0.611 | 0.967 |
| 20% | AdaLASSO | 0.189 | 0.933 |
| 20% | Auto(0.05) | 0.000 | 0.367 |
| 20% | Auto(0.01) | 0.011 | 0.367 |
| 10% | SCAD | 0.230 | 0.600 |
| 10% | MCP | 0.150 | 0.550 |
| 10% | LASSO | 0.650 | 0.850 |
| 10% | AdaLASSO | 0.360 | 0.700 |
| 10% | Auto(0.05) | 0.000 | 0.500 |
| 10% | Auto(0.01) | 0.000 | 0.500 |
| 5% | SCAD | 0.114 | 0.667 |
| 5% | MCP | 0.095 | 0.667 |
| 5% | LASSO | 0.657 | 0.867 |
| 5% | AdaLASSO | 0.352 | 0.667 |
| 5% | Auto(0.05) | 0.000 | 0.667 |
| 5% | Auto(0.01) | 0.000 | 0.667 |
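The abstract's summary claim, that Autometrics attains the lowest average gauge at the cost of the lowest average potency, can be checked against the two simulation tables above. The sketch copies their values and takes an unweighted mean over the three outlier percentages; treating "average" as this unweighted mean is an assumption.

```python
# Gauge and potency copied from the two simulation tables
# (list order: 20%, 10%, 5% outliers).
TABLES = {
    "6 SD": {
        "SCAD":       ([0.222, 0.100, 0.048], [0.367, 0.500, 0.600]),
        "MCP":        ([0.222, 0.140, 0.048], [0.367, 0.550, 0.600]),
        "LASSO":      ([0.611, 0.650, 0.591], [0.767, 0.850, 0.933]),
        "AdaLASSO":   ([0.333, 0.220, 0.124], [0.433, 0.600, 0.667]),
        "Auto(0.05)": ([0.011, 0.010, 0.000], [0.100, 0.200, 0.534]),
        "Auto(0.01)": ([0.011, 0.000, 0.000], [0.100, 0.200, 0.534]),
    },
    "4 SD": {
        "SCAD":       ([0.222, 0.230, 0.114], [1.000, 0.600, 0.667]),
        "MCP":        ([0.144, 0.150, 0.095], [1.000, 0.550, 0.667]),
        "LASSO":      ([0.611, 0.650, 0.657], [0.967, 0.850, 0.867]),
        "AdaLASSO":   ([0.189, 0.360, 0.352], [0.933, 0.700, 0.667]),
        "Auto(0.05)": ([0.000, 0.000, 0.000], [0.367, 0.500, 0.667]),
        "Auto(0.01)": ([0.011, 0.000, 0.000], [0.367, 0.500, 0.667]),
    },
}


def mean(values):
    return sum(values) / len(values)


summary = {}
for sd, methods in TABLES.items():
    avg = {m: (mean(g), mean(p)) for m, (g, p) in methods.items()}
    # min() returns the first method in table order when values tie.
    lowest_gauge = min(avg, key=lambda m: avg[m][0])
    lowest_potency = min(avg, key=lambda m: avg[m][1])
    summary[sd] = (lowest_gauge, lowest_potency)
    for m, (g, p) in avg.items():
        print(f"{sd} {m:11} gauge={g:.3f} potency={p:.3f}")
    print(f"{sd}: lowest gauge -> {lowest_gauge}, lowest potency -> {lowest_potency}")
```

In both settings an Autometrics variant has the lowest average gauge and the lowest average potency, while LASSO has the highest average gauge.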
Figure 1. Average RMSE with less than 5% outliers.
Figure 2. Average RMSE with 10% outliers.
Figure 3. Average RMSE with 20% outliers.
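Figures 1 to 3 compare methods by average RMSE. Assuming this means the out-of-sample root mean squared error, sqrt((1/m) Σ (y_i − ŷ_i)²) over m held-out observations, averaged across simulation replications, the computation is short; the helper names and toy numbers below are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def average_rmse(replications):
    """Mean of per-replication out-of-sample RMSE; each item is (y_test, y_hat)."""
    scores = [rmse(y, y_hat) for y, y_hat in replications]
    return sum(scores) / len(scores)


# Hypothetical hold-out results from two replications.
reps = [
    ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),             # perfect forecasts: RMSE 0
    ([0.0, 0.0, 0.0, 0.0], [1.0, -1.0, 1.0, -1.0]),  # every error is 1: RMSE 1
]
print(average_rmse(reps))  # 0.5
```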
Figure 4. Correlation graph.
Figure 5. Box plot of hospital stay.
Figure 6. Box plot of the linear regression residuals.
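Figures 5 and 6 use box plots to show that hospital stay and the linear-regression residuals contain extreme values. Box-plot whiskers are commonly drawn at Tukey's fences, 1.5 × IQR beyond the quartiles. Assuming that convention (the excerpt does not state which one the figures use), the points beyond the whiskers can be listed as below. The function name `tukey_outliers`, the `inclusive` quartile method, and the residual values are illustrative assumptions. This rule is only a visual diagnostic, not the IIS procedure evaluated in the paper.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Indices of values outside the box-plot fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical regression residuals with one gross outlier at index 7.
residuals = [-1.2, 0.3, 0.5, -0.4, 0.1, 0.9, -0.7, 6.5, 0.2, -0.3]
print(tukey_outliers(residuals))  # [7]
```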
Real data analysis: covariates selected by each method, their estimated coefficients, and the number of selected outliers.
| Method | Selected outliers | Age | Gender | CRP level | Other comorbidities |
|---|---|---|---|---|---|
| SCAD | 28 | | 0.24463 | 0.00083 | 0.20533 |
| MCP | 31 | | 0.22493 | 0.0004 | 0.2585 |
| LASSO | 204 | 0.00225 | 0.55747 | 0.00282 | 1.3966 |
| Auto(0.05) | 14 | | | 0.00766 | 0.9653 |

Covariate columns give estimated coefficients; an empty cell means the method did not select that covariate.
Figure 7. Out-of-sample RMSE of the real data analysis.