
The effect of simple imputations based on four variants of PCA methods on the quantiles of annual rainfall data.

Loucif Benahmed, Larbi Houichi.

Abstract

Hydrological studies often require complete datasets, yet missing data are unavoidable. Imputed values can stand in for observed ones, but they are only estimates and carry uncertainty. This study compares the performance of four simple imputation variants derived from principal component analysis (PCA) for imputing annual total rainfall series from stations located in northeast Algeria. It also examines how imputation by these methods affects the quantiles of annual rainfall data. The four variants are probabilistic PCA, expectation maximization PCA, regularized PCA, and singular value decomposition PCA. Annual rainfall data from 30 stations for the period 1935 to 2004 (69 years) are used to generate and impute gaps at four percentages of missing values (PMV): 10, 20, 30, and 40%. Based on well-known statistical indices, the results show that the regularized PCA and expectation maximization PCA variants perform better than the other imputation methods considered in this study and yield very good to acceptable predicted quantiles: the correlation coefficient is 0.97 at 10% PMV and 0.66 at 40%, and the relative error between observed and predicted quantiles is 4.74% at 10% PMV and 3.82% at 40%.
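The evaluation protocol described in the abstract (mask a known share of values, impute, then compare against the observed data) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the rainfall matrix is synthetic, the imputer is a generic iterative SVD scheme in the spirit of expectation maximization PCA rather than the authors' implementation, and the quantile is the empirical 0.9 quantile per station rather than whatever quantile estimator the study used. The function names `synthetic_rainfall`, `mask_at_random`, `em_pca_impute`, and `evaluate` are hypothetical.

```python
import numpy as np


def synthetic_rainfall(rng, n_years=70, n_stations=30):
    """Hypothetical annual totals (mm): one shared regional signal plus station noise."""
    regional = rng.normal(size=(n_years, 1))
    loadings = rng.uniform(0.5, 1.5, size=(1, n_stations))
    noise = rng.normal(scale=30.0, size=(n_years, n_stations))
    return 600.0 + 120.0 * regional * loadings + noise


def mask_at_random(X, pmv, rng):
    """Return a copy of X with a fraction `pmv` of its entries set to NaN."""
    X_missing = X.astype(float).copy()
    n_missing = int(round(pmv * X.size))
    idx = rng.choice(X.size, size=n_missing, replace=False)
    X_missing.flat[idx] = np.nan
    return X_missing


def em_pca_impute(X_missing, n_components=2, n_iter=200, tol=1e-6):
    """Iterative PCA imputation: fill gaps, fit a low-rank SVD, refill, repeat."""
    missing = np.isnan(X_missing)
    # Start from column (station) means computed on the observed values only.
    X_filled = np.where(missing, np.nanmean(X_missing, axis=0), X_missing)
    for _ in range(n_iter):
        mu = X_filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(X_filled - mu, full_matrices=False)
        # Rank-limited reconstruction of the centered matrix, shifted back by mu.
        X_hat = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mu
        change = np.max(np.abs(X_hat[missing] - X_filled[missing])) if missing.any() else 0.0
        X_filled[missing] = X_hat[missing]  # observed entries are never modified
        if change < tol:
            break
    return X_filled


def evaluate(X, pmv, rng, q=0.9):
    """Correlation on masked entries and mean relative error (%) of station quantiles."""
    X_missing = mask_at_random(X, pmv, rng)
    missing = np.isnan(X_missing)
    X_imputed = em_pca_impute(X_missing)
    r = np.corrcoef(X[missing], X_imputed[missing])[0, 1]
    q_obs = np.quantile(X, q, axis=0)
    q_imp = np.quantile(X_imputed, q, axis=0)
    rel_err = 100.0 * np.mean(np.abs(q_imp - q_obs) / q_obs)
    return r, rel_err


rng = np.random.default_rng(0)
X = synthetic_rainfall(rng)
for pmv in (0.10, 0.20, 0.30, 0.40):
    r, rel_err = evaluate(X, pmv, rng)
    print(f"PMV {pmv:.0%}: r = {r:.2f}, quantile relative error = {rel_err:.2f}%")
```

The numbers printed by this sketch come from synthetic data and are not comparable to the values reported in the abstract; the sketch only shows how the two indices are computed at each PMV.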

Keywords:  Algeria; Missing data; PCA methods; Rainfall; Simple imputation


Year:  2018        PMID: 30182308     DOI: 10.1007/s10661-018-6913-y

Source DB:  PubMed          Journal:  Environ Monit Assess        ISSN: 0167-6369            Impact factor:   2.513


