Loucif Benahmed, Larbi Houichi.
Abstract
Hydrology-related studies often require complete datasets, yet missing data are an unavoidable reality. Imputed values can then fulfill the same role as observed ones, although they remain uncertain estimates. The aim of this study is to compare the performance of four simple imputation variants derived from principal component analysis (PCA) for imputing annual total rainfall series from stations located in northeast Algeria. The study also examines how imputation by these methods affects the quantiles of the annual rainfall data. The four variants are probabilistic PCA, expectation maximization PCA, regularized PCA, and singular value decomposition PCA. Annual rainfall data from 30 stations for the period from 1935 to 2004 (69 years) are used to generate and impute gaps at four percentages of missing values (PMV): 10, 20, 30, and 40%. Based on several well-known statistical indices, the results show that the regularized PCA and expectation maximization PCA variants outperform the other imputation methods considered in this study and yield very good to acceptable predicted quantiles. For example, the correlation coefficient is 0.97 with 10% PMV and 0.66 with 40% PMV, and the relative error between observed and predicted quantiles is 4.74% with 10% PMV and 3.82% with 40% PMV.
Keywords: Algeria; Missing data; PCA methods; Rainfall; Simple imputation
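The abstract names the PCA variants but does not describe how they fill gaps. As a hedged illustration only, the Python/NumPy sketch below shows a generic iterative (EM-style) PCA imputation loop on a years-by-stations matrix, with an optional, simplified singular-value shrinkage in the spirit of regularized PCA, plus a harness that hides a fixed PMV of entries and scores the result with a correlation coefficient and a quantile relative error. This is not the authors' code: the function names (`iterative_pca_impute`, `make_gaps`, `quantile_relative_error`), the number of components, the shrinkage formula, the quantile probability levels, and the synthetic data are all assumptions, and the probabilistic PCA and SVD variants are not reproduced here.

```python
import numpy as np


def iterative_pca_impute(X, n_components=2, regularized=False,
                         max_iter=500, tol=1e-6):
    """Fill NaN entries of X (rows = years, columns = stations) with a
    rank-`n_components` PCA reconstruction, iterated EM-style.

    Assumption: a generic iterative-SVD scheme, not the study's code.
    With regularized=True each kept singular value d_s is multiplied by
    max(0, 1 - sigma2 / d_s**2), where sigma2 is the mean of the discarded
    squared singular values (a simplified shrinkage, hypothetical here).
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start by replacing each gap with its station (column) mean.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    k = n_components
    for _ in range(max_iter):
        mu = filled.mean(axis=0)
        U, d, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        d_k = d[:k].copy()
        if regularized and k < d.size:
            sigma2 = np.mean(d[k:] ** 2)
            d_k = d_k * np.clip(1.0 - sigma2 / d_k ** 2, 0.0, None)
        recon = mu + (U[:, :k] * d_k) @ Vt[:k]
        # Observed entries are kept; only the gaps take reconstructed values.
        new = np.where(missing, recon, X)
        change = np.linalg.norm(new - filled) / np.linalg.norm(filled)
        filled = new
        if change < tol:
            break
    return filled


def make_gaps(X, pmv, rng):
    """Hide exactly round(pmv * X.size) randomly chosen entries (hypothetical helper)."""
    n_hide = int(round(pmv * X.size))
    mask = np.zeros(X.size, dtype=bool)
    mask[rng.choice(X.size, size=n_hide, replace=False)] = True
    mask = mask.reshape(X.shape)
    X_gapped = X.copy()
    X_gapped[mask] = np.nan
    return X_gapped, mask


def quantile_relative_error(observed, imputed, probs=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Mean absolute relative error (%) between per-station empirical quantiles
    of the complete and the imputed series. The probability levels are an
    assumption; the abstract does not list them."""
    q_obs = np.quantile(observed, probs, axis=0)
    q_imp = np.quantile(imputed, probs, axis=0)
    return 100.0 * np.mean(np.abs(q_imp - q_obs) / np.abs(q_obs))


# Synthetic, rainfall-like demo data (NOT the Algerian series):
# 70 years x 30 stations driven by two shared factors plus noise.
rng = np.random.default_rng(0)
n_years, n_stations = 70, 30
scores = rng.normal(size=(n_years, 2))
loadings = rng.normal(size=(n_stations, 2))
truth = 500 + 80 * scores @ loadings.T + rng.normal(scale=10, size=(n_years, n_stations))

for pmv in (0.10, 0.20, 0.30, 0.40):
    gapped, mask = make_gaps(truth, pmv, rng)
    imputed = iterative_pca_impute(gapped, n_components=2, regularized=True)
    r = np.corrcoef(truth[mask], imputed[mask])[0, 1]
    rel = quantile_relative_error(truth, imputed)
    print(f"PMV={pmv:.0%}  r={r:.3f}  quantile rel. error={rel:.2f}%")
```

Because the synthetic matrix is genuinely low-rank, the scores it prints will be much better than anything achievable on real station data; the harness only shows how PMV, correlation, and quantile error fit together, not the values reported in the study.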
Year: 2018 PMID: 30182308 DOI: 10.1007/s10661-018-6913-y
Source DB: PubMed Journal: Environ Monit Assess ISSN: 0167-6369 Impact factor: 2.513