Dursun Aydın1, Syed Ejaz Ahmed2, Ersin Yılmaz1.
Abstract
This paper focuses on the adaptive spline (A-spline) fitting of the semiparametric regression model to time series data with right-censored observations. Two main problems typically need to be solved in this setting: dealing with the censored data and obtaining a proper A-spline estimator for the components of the semiparametric model. The first problem is traditionally solved by the synthetic data approach based on the Kaplan-Meier estimator. Although the synthetic data technique is one of the most widely used solutions for right-censored observations, it distorts the structure of the transformed data, especially for heavily censored datasets, owing to the nature of the approach. In this paper, we introduce a modified semiparametric estimator based on the A-spline approach to overcome data irregularity with minimum information loss and to resolve the second problem described above. In addition, the semiparametric B-spline estimator is used as a benchmark method to gauge the success of the A-spline estimator. To this end, a detailed Monte Carlo simulation study and a real data example are carried out to evaluate the performance of the proposed estimator and to make a practical comparison.
Keywords: B-splines; adaptive splines; right-censored data; semiparametric regression; synthetic data transformation; time series
Year: 2021 PMID: 34945891 PMCID: PMC8699840 DOI: 10.3390/e23121586
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
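The synthetic data transformation named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used form in which each observed value is divided by a Kaplan-Meier estimate of the censoring survival function and each censored value is set to zero; the abstract does not spell out the exact variant used in the paper, so this form, the function names, and the toy data are illustrative assumptions.

```python
def km_censoring_survival(z, delta):
    """Kaplan-Meier estimate of the censoring survival 1 - G(t) at each z[i].

    z     : observed values, i.e. min(response, censoring value)
    delta : 1 if the response was observed, 0 if it was right-censored
    Assumes there are no ties among the values in z.
    """
    n = len(z)
    order = sorted(range(n), key=lambda i: z[i])
    surv = [1.0] * n
    prod = 1.0
    for rank, idx in enumerate(order, start=1):
        if delta[idx] == 0:
            # A censored point counts as an "event" for the censoring distribution.
            prod *= (n - rank) / (n - rank + 1)
        surv[idx] = prod
    return surv


def synthetic_responses(z, delta):
    """Synthetic data: delta_i * z_i / (1 - G_hat(z_i)); censored points map to 0."""
    surv = km_censoring_survival(z, delta)
    return [zi / s if d == 1 else 0.0 for zi, d, s in zip(z, delta, surv)]


# Hypothetical toy sample: the second observation is right-censored.
z = [2.0, 1.0, 3.0, 4.0]
delta = [1, 0, 1, 1]
y_syn = synthetic_responses(z, delta)
```

In this toy sample the censored point is replaced by zero and the observed points are inflated by the inverse of the estimated censoring survival. This is the kind of structural distortion the abstract attributes to the approach under heavy censoring.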
Figure 1. Some of the datasets generated using Algorithm 2, including both fully observed and censored data points, for different censoring levels and sample sizes.
Estimated regression coefficients from the AS and the B-spline (BS) with values of variance and bias.
|
| |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
|
|
|
|
| ||||||||
|
|
| AS | BS | AS | BS | AS | BS | AS | BS | AS | BS | AS | BS |
| 50 | 5 | 0.887 |
| 0.936 |
| 0.809 |
| 0.922 |
| 0.867 |
| 0.884 |
|
| 20 |
| 0.895 |
| 1.290 |
| 0.892 |
| 1.358 | 0.963 |
|
| 1.336 | |
| 40 |
| 1.172 |
| 1.641 |
| 1.108 |
| 1.657 |
| 1.145 |
| 1.674 | |
| 100 | 5 | 0.510 |
| 0.440 |
| 0.539 |
| 0.433 |
| 0.515 |
| 0.439 |
|
| 20 |
| 0.610 |
| 0.609 |
| 0.579 |
| 0.609 |
| 0.599 |
| 0.618 | |
| 40 | 0.535 |
|
| 0.689 |
| 0.622 |
| 0.689 |
| 0.610 |
| 0.692 | |
| 200 | 5 | 0.285 |
| 0.260 |
| 0.290 |
| 0.255 | 0.255 | 0.294 |
|
| 0.254 |
| 20 |
| 0.324 |
| 0.355 | 0.311 |
|
| 0.351 | 0.304 |
|
| 0.353 | |
| 40 |
| 0.333 |
| 0.352 |
| 0.337 |
| 0.356 |
| 0.336 |
| 0.363 | |
The bolded values indicate the best scores.
Figure 2. Boxplots of bias values for the AS and BS methods across all configurations. On the x-axis, b1, b2, and b3 denote the three regression coefficients; A1, A2, and A3 denote the biases obtained from the AS method for censoring levels (CLs) of 5%, 20%, and 40%. Similarly, B1, B2, and B3 denote the biases from the BS method for CLs of 5%, 20%, and 40%.
Outcomes from the fitted nonparametric components.
|
|
|
| |||||
|---|---|---|---|---|---|---|---|
|
|
| AS | BS | AS | BS | AS | BS |
| 50 | 5 | 1.085 |
| 0.048 |
| 1.135 |
|
| 20 |
| 1.498 |
| 0.075 |
| 2.061 | |
| 40 |
| 2.510 |
| 0.095 |
| 3.127 | |
| 100 | 5 | 0.961 |
|
| 0.025 | 0.824 |
|
| 20 |
| 1.217 |
| 0.041 |
| 1.779 | |
| 40 |
| 1.302 |
| 0.070 |
| 2.331 | |
| 200 | 5 | 0.891 |
| 0.009 |
| 0.670 |
|
| 20 |
| 0.959 |
| 0.021 |
| 1.871 | |
| 40 |
| 1.070 |
| 0.028 |
| 2.882 | |
The bolded values indicate the best scores.
Figure 3. Data points, real regression functions, and curves fitted by the two methods. In the plot legends, f(A) and f(B) represent the function estimates obtained from the AS and BS methods, respectively.
The values of performances from the AS and BS methods.
|
|
|
| ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
|
| AS | BS | AR(1) | AS | BS | AR(1) | AS | BS | AR(1) | |
| 50 | 5 | 0.166 |
| 0.322 | 0.419 |
| 0.999 | 3.119 | 3.510 | 4.915 |
| 20 | 0.358 |
| 0.388 |
| 0.896 | 1.052 | 4.468 | 4.920 | 5.142 | |
| 40 |
| 0.688 | 1.980 |
| 1.519 | 1.971 | 7.762 | 9.542 | 10.751 | |
| 100 | 5 |
| 0.186 | 0.303 | 0.323 |
| 0.860 | 1.001 | 0.928 | 3.614 |
| 20 |
| 0.336 | 0.365 |
| 0.750 | 0.914 | 1.870 | 1.988 | 4.147 | |
| 40 |
| 0.528 | 1.476 |
| 1.831 | 1.891 | 3.663 | 4.182 | 6.798 | |
| 200 | 5 | 0.111 |
| 0.283 | 0.264 |
| 0.717 | 0.983 |
| 1.935 |
| 20 |
| 0.332 | 0.364 |
| 0.606 | 0.847 |
| 2.497 | 3.411 | |
| 40 |
| 0.508 | 0.654 |
| 1.086 | 1.501 |
| 2.816 | 3.131 | |
The bolded values indicate the best scores.
Figure 4. Bar chart for the s of all simulation combinations.
Augmented Dickey–Fuller (ADF) test results for the stationarity of time series data and the determination of the appropriate lag.
| No. Lag | ADF Test Results | |
|---|---|---|
| 0 | −2.61 | 0.318 |
| 1 | −3.27 | 0.077 |
| 2 | | |
| 3 | −3.33 | 0.066 |
| 4 | −3.30 | 0.072 |
Bold scores are significant at the 95% confidence level.
The performances of the BS and AS methods for the estimation of both parametric and nonparametric components.
| Measurement | Bias | Variance | ||
|---|---|---|---|---|
| AS | BS | AS | BS | |
|
|
| 2.682 |
| 1.703 |
|
|
| 1.139 |
| 1.624 |
|
|
| 4.566 | 0.067 |
|
The bolded values indicate the best scores.
Scores of performance measures for the AS and BS methods obtained from the whole model estimation.
| Method | | | | | |
|---|---|---|---|---|---|
| AS | | | | | |
| BS | 1.315 | 1.166 | 1.546 | 1.212 | 1.385 |
| AR(2) | 1.856 | 4.506 | 3.702 | 2.775 | - |
The bolded values indicate the best scores.
Figure 5. Estimated curves for the seasonality obtained from the AS and BS methods.