Efficient estimation of Pareto model: Some modified percentile estimators.

Sajjad Haider Bhatti, Shahzad Hussain, Tanvir Ahmad, Muhammad Aslam, Muhammad Aftab, Muhammad Ali Raza.

Abstract

The article proposes three modified percentile estimators for parameter estimation of the Pareto distribution. These modifications are based on median, geometric mean and expectation of empirical cumulative distribution function of first-order statistic. The proposed modified estimators are compared with traditional percentile estimators through a Monte Carlo simulation for different parameter combinations with varying sample sizes. Performance of different estimators is assessed in terms of total mean square error and total relative deviation. It is determined that modified percentile estimator based on expectation of empirical cumulative distribution function of first-order statistic provides efficient and precise parameter estimates compared to other estimators considered. The simulation results were further confirmed using two real life examples where maximum likelihood and moment estimators were also considered.

Year:  2018        PMID: 29715280      PMCID: PMC5929531          DOI: 10.1371/journal.pone.0196456

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


1. Introduction

The Pareto distribution is widely applied in economics. It was initially introduced by Pareto [1] to represent the distribution of income among individuals. It is the most appropriate model for situations described by the 80–20 rule, that is, when 80% of the effect comes from 20% of the causes; for instance, a large portion of the wealth of a society is owned by a small percentage of its people. The Pareto model has wide application in economic studies, as it plays a vital role in the investigation of several phenomena [2]. Although it is most widely used as an income model describing the allocation of wealth among individual units [3], it is not limited to economics: it has great utility in modeling the number of casualties in earthquakes, forest fire areas, and oil and gas field sizes [4]. The applicability of the Pareto model to real-life phenomena is evident in many studies [2,5-9]. Its generalized, exponentiated, modified, Kumaraswamy and transmuted versions have also been presented with real-life applications [10-14]. The density function of the Pareto distribution is given as

$$f(x) = \frac{\alpha \beta^{\alpha}}{x^{\alpha + 1}}, \qquad x \ge \beta,\ \alpha > 0,\ \beta > 0,$$

where β is the scale and α is the shape parameter, and it is denoted by x ~ Pareto(β, α). Shapes of the Probability Density Function (PDF) and Cumulative Distribution Function (CDF) for different combinations of scale and shape parameters are shown in Figs 1 and 2, respectively.

Fig 1. PDF of the Pareto distribution.

Fig 2. CDF of the Pareto distribution.

In the literature on parameter estimation, different estimation strategies have been used for the Pareto distribution. Quandt [15] derived expressions for moment, maximum likelihood, percentile and least squares estimators. Kulldorff and Vännman [16] proposed parameter estimation of the Pareto distribution by linear functions of order statistics. Afify [17] employed distinct estimation procedures for the parameters of the Pareto distribution and showed that least squares estimators perform better in terms of root mean square error. Parameter estimation of the Pareto distribution has also been carried out using jackknife and minimum risk estimators [18]. Based on Monte Carlo simulation, Lu and Tao [19] showed that maximum likelihood and weighted least squares methods were equally efficient. The method of percentile estimation has been in use for a long time. Parameters of different probability distributions have been estimated using the percentile estimation method, which was found to be better than or as efficient as maximum likelihood and least squares techniques [20-22]. In the literature devoted to parameter estimation, different modifications of standard estimation procedures have been proposed. Modified maximum likelihood estimators and modified moment estimators have been introduced and found more efficient than traditional estimators for different probability distributions, such as the three-parameter log-normal [23-24], three-parameter Weibull [25], three-parameter Gamma [26], Rayleigh [27], two-parameter Exponential [28], and two-parameter Power Function [29]. Keeping in view the applicability and importance of the Pareto distribution in empirical studies, the method of percentile estimation, and the superiority of modified estimators for different distributions in the recent literature, the present study focuses on deriving modified percentile estimators for the Pareto distribution. The derived modifications have been compared with the traditional percentile estimators through Monte Carlo simulation and two real-life data sets.

2. Methodology

In the present work, we suggest some modifications to the percentile estimation method using the median, the geometric mean, and the expectation of the empirical cumulative distribution function of the first-order statistic of the Pareto distribution. The modified estimators were compared with the traditional percentile estimators.

2.1 Method of percentile estimation

Percentiles play an important role in descriptive statistics, and their use is recommended for parameter estimation as well [30]. The principle is to equate two values of the cumulative distribution function with the corresponding percentiles and then solve the resulting equations simultaneously for the unknown parameters. Following Marks [22], Zaka and Akhtar [29] and Sampath and Anjana [31], we have chosen P25 and P75, as this pair is relatively more accurate than other pairs of percentiles.

2.2 Percentile estimator

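Let x1, x2, …, xn be a random sample of size n from the Pareto distribution. The cumulative distribution function of a Pareto distribution with shape and scale parameters α and β, respectively, is

$$F(x) = 1 - \left(\frac{\beta}{x}\right)^{\alpha}, \qquad x \ge \beta.$$

Thus, using percentiles P75 and P25,

$$1 - \left(\frac{\beta}{P_{75}}\right)^{\alpha} = 0.75. \tag{1}$$

Similarly,

$$1 - \left(\frac{\beta}{P_{25}}\right)^{\alpha} = 0.25. \tag{2}$$

Solving Eqs (1) and (2) simultaneously for the unknown parameters, we get the percentile estimators for α and β as

$$\hat{\alpha} = \frac{\ln 3}{\ln P_{75} - \ln P_{25}}, \tag{3}$$

$$\hat{\beta} = P_{25}\,(0.75)^{1/\hat{\alpha}}. \tag{4}$$

Eqs (3) and (4) are the required percentile estimators of the Pareto distribution. For further reference, we name these estimators PE.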
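
As an illustration, the percentile estimator can be sketched in Python (the paper's own computations were done in R). The sketch assumes the closed forms obtained by inverting the Pareto CDF at the two quartiles, α̂ = ln 3 / (ln P75 − ln P25) and β̂ = P25 · 0.75^(1/α̂). The function name `percentile_estimator` and the quartile interpolation rule are assumptions made for this example, not details taken from the article.

```python
import math
import statistics


def percentile_estimator(xs):
    """PE sketch: solve F(P25) = 0.25 and F(P75) = 0.75 for (alpha, beta)."""
    # Assumption: quartiles use linear interpolation ("inclusive" rule).
    p25, _, p75 = statistics.quantiles(xs, n=4, method="inclusive")
    alpha_hat = math.log(3.0) / (math.log(p75) - math.log(p25))
    beta_hat = p25 * 0.75 ** (1.0 / alpha_hat)
    return alpha_hat, beta_hat


# Toy sample whose quartiles are exactly P25 = 2 and P75 = 4.
alpha_hat, beta_hat = percentile_estimator([1.0, 2.0, 3.0, 4.0, 5.0])
print(alpha_hat, beta_hat)
```

Because both quartile equations are satisfied by the same pair of estimates, β̂ could equally be computed as P75 · 0.25^(1/α̂); the two expressions agree.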

2.3 Modified percentile estimator (I)

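Our first modification of the percentile estimation method is based on replacing Eq (2) by the median of the Pareto distribution,

$$\tilde{x} = \beta\, 2^{1/\alpha}, \tag{5}$$

where $\tilde{x}$ is the sample median. Rewriting Eq (1) as

$$P_{75} = \beta\, 4^{1/\alpha}, \tag{6}$$

and solving Eqs (5) and (6) simultaneously, we get the first modified percentile estimator for α,

$$\hat{\alpha} = \frac{\ln 2}{\ln P_{75} - \ln \tilde{x}}. \tag{7}$$

Putting the value of α̂ from Eq (7) in Eq (6), we get the estimate of β as

$$\hat{\beta} = P_{75}\, 4^{-1/\hat{\alpha}}. \tag{8}$$

Eqs (7) and (8) provide the expressions for the first modified percentile estimators (PE-I, for further reference).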
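
A minimal Python sketch of PE-I follows, assuming the estimator pairs the sample median with P75 through the Pareto quantile relations median = β · 2^(1/α) and P75 = β · 4^(1/α). The function name `modified_pe_median` and the quartile interpolation rule are assumptions made for this example.

```python
import math
import statistics


def modified_pe_median(xs):
    """PE-I sketch: match the sample median and P75 to the Pareto quantiles."""
    # Assumption: quartiles use linear interpolation ("inclusive" rule).
    _, median, p75 = statistics.quantiles(xs, n=4, method="inclusive")
    alpha_hat = math.log(2.0) / (math.log(p75) - math.log(median))
    beta_hat = p75 * 4.0 ** (-1.0 / alpha_hat)
    return alpha_hat, beta_hat


# Toy sample with median 3 and P75 = 4.
alpha_hat, beta_hat = modified_pe_median([1.0, 2.0, 3.0, 4.0, 5.0])
print(alpha_hat, beta_hat)
```

Under these assumptions PE-I is a percentile estimator built on the pair (P50, P75) instead of (P25, P75).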

2.4 Modified percentile estimator (II)

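Our second modified percentile estimator is based on replacing Eq (2) by the Geometric Mean (GM) of the Pareto distribution,

$$\mathrm{GM} = \beta\, e^{1/\alpha}. \tag{9}$$

Rewriting Eq (1) as

$$P_{75} = \beta\, 4^{1/\alpha}, \tag{10}$$

and solving Eqs (9) and (10) simultaneously, we get the second modified percentile estimator for α as

$$\hat{\alpha} = \frac{\ln 4 - 1}{\ln P_{75} - \ln \mathrm{GM}}. \tag{11}$$

Putting the value of α̂ from Eq (11) in Eq (9), we get the estimate of β as

$$\hat{\beta} = \mathrm{GM}\, e^{-1/\hat{\alpha}}. \tag{12}$$

Eqs (11) and (12) are the expressions for the second modified percentile estimators (PE-II, for further reference) of the Pareto distribution.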
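
The same idea can be sketched for PE-II, assuming the geometric mean of a Pareto(β, α) variable is β · e^(1/α) and that the sample geometric mean is paired with P75. The function name `modified_pe_geometric_mean`, the quartile rule and the explicit guard are assumptions of this example.

```python
import math
import statistics


def modified_pe_geometric_mean(xs):
    """PE-II sketch: match the sample geometric mean and P75."""
    gm = math.exp(sum(math.log(x) for x in xs) / len(xs))
    # Assumption: quartiles use linear interpolation ("inclusive" rule).
    p75 = statistics.quantiles(xs, n=4, method="inclusive")[2]
    denom = math.log(p75) - math.log(gm)
    if denom <= 0.0:
        # The closed form needs P75 > GM to give a positive shape estimate.
        raise ValueError("P75 must exceed the sample geometric mean")
    alpha_hat = (math.log(4.0) - 1.0) / denom
    beta_hat = gm * math.exp(-1.0 / alpha_hat)
    return alpha_hat, beta_hat


alpha_hat, beta_hat = modified_pe_geometric_mean([1.0, 2.0, 3.0, 4.0, 5.0])
print(alpha_hat, beta_hat)
```

The guard is there because, in this form, the shape estimate depends on the gap between ln P75 and ln GM. When a few very large observations pull the geometric mean close to P75, the estimate becomes very sensitive, which may contribute to unstable behaviour in small samples.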

2.5 Modified percentile estimator (III)

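The third modified percentile estimator is obtained by replacing Eq (2) by the expectation of the empirical cumulative distribution function of the first-order statistic of the Pareto distribution. Following [25,26,28,29], the expectation of the empirical CDF of the first-order statistic is defined as

$$E\left[F\left(x_{(1)}\right)\right] = \frac{1}{n+1}. \tag{13}$$

So, in the case of the Pareto distribution,

$$\hat{\beta} = x_{(1)} \left(\frac{n}{n+1}\right)^{1/\hat{\alpha}}. \tag{14}$$

We have Eq (1) as

$$\hat{\beta} = P_{75}\, 4^{-1/\hat{\alpha}}. \tag{15}$$

Comparing Eqs (14) and (15),

$$\hat{\alpha} = \frac{\ln\left(\dfrac{4n}{n+1}\right)}{\ln P_{75} - \ln x_{(1)}}. \tag{16}$$

Eqs (14) and (16) give the algebraic expressions for the third modified percentile estimators (PE-III, for further reference) of the parameters of the Pareto distribution.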
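
A short Python sketch of PE-III is given here under two assumptions: E[F(x(1))] = 1/(n + 1) is matched at the sample minimum, and F(P75) = 0.75 is kept from the traditional estimator. The function name `modified_pe_first_order` and the quartile interpolation rule are assumptions of this example.

```python
import math
import statistics


def modified_pe_first_order(xs):
    """PE-III sketch: match F(x_min) = 1/(n+1) and F(P75) = 0.75."""
    n = len(xs)
    x_min = min(xs)
    # Assumption: quartiles use linear interpolation ("inclusive" rule).
    p75 = statistics.quantiles(xs, n=4, method="inclusive")[2]
    alpha_hat = math.log(4.0 * n / (n + 1)) / (math.log(p75) - math.log(x_min))
    beta_hat = x_min * (n / (n + 1)) ** (1.0 / alpha_hat)
    return alpha_hat, beta_hat


alpha_hat, beta_hat = modified_pe_first_order([1.0, 2.0, 3.0, 4.0, 5.0])
print(alpha_hat, beta_hat)
```

In this form β̂ is always slightly below the sample minimum, which respects the support constraint x ≥ β of the Pareto distribution.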

2.6 Performance indices

To compare the efficiency and accuracy of the different estimators, Total Mean Square Error (TMSE) and Total Relative Deviation (TRD) were used as performance indices. These measures are frequently used as performance criteria when different estimators (or estimation strategies) are compared through Monte Carlo simulation [28,29,32-39]. These performance indices are defined as

$$\mathrm{TMSE} = \frac{1}{\mathrm{REP}} \sum_{i=1}^{\mathrm{REP}} \left(\hat{\alpha}_i - \alpha\right)^2 + \frac{1}{\mathrm{REP}} \sum_{i=1}^{\mathrm{REP}} \left(\hat{\beta}_i - \beta\right)^2$$

and

$$\mathrm{TRD} = \left|\frac{E(\hat{\alpha}) - \alpha}{\alpha}\right| + \left|\frac{E(\hat{\beta}) - \beta}{\beta}\right|,$$

where α and β are the true parameters, REP is the number of replications, $\hat{\alpha}_i$ and $\hat{\beta}_i$ are the parameter estimates from the i-th replication, and $E(\hat{\alpha})$ and $E(\hat{\beta})$ are their averages over the replications. As the true parameters are unknown for real-life data sets, total mean square error and total relative deviation cannot be used for assessing the performance of estimators in such cases. Therefore, we have used Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Root Mean Square Error (RMSE) and Root Mean Square Percentage Error (RMSPE) as performance measures for comparison among the different estimators. These measures are computed from the sample (observed) distribution function S(x) and the expected distribution function $\hat{F}(x)$, the latter evaluated with the parameter estimates (α̂ and β̂) from any particular method.
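
The two simulation indices can be sketched in Python as follows. The sketch assumes that TMSE is the sum of the mean square errors of α̂ and β̂ over the replications, and that TRD is the sum of the absolute relative deviations of the average estimates from the true values. The function names `tmse` and `trd` are hypothetical.

```python
from statistics import fmean


def tmse(alpha_hats, beta_hats, alpha, beta):
    """Assumed TMSE: MSE of alpha estimates plus MSE of beta estimates."""
    mse_alpha = fmean([(a - alpha) ** 2 for a in alpha_hats])
    mse_beta = fmean([(b - beta) ** 2 for b in beta_hats])
    return mse_alpha + mse_beta


def trd(alpha_hats, beta_hats, alpha, beta):
    """Assumed TRD: relative deviation of the mean estimates, summed."""
    dev_alpha = abs(fmean(alpha_hats) - alpha) / alpha
    dev_beta = abs(fmean(beta_hats) - beta) / beta
    return dev_alpha + dev_beta


# Feeding the average estimates reported for PE at n = 20 (beta = 1,
# alpha = 0.5) gives a TRD of about 0.3397, in line with the tabulated value.
print(trd([0.581113], [1.177512], alpha=0.5, beta=1.0))
```

Since TRD depends only on the average estimates, it can be recomputed from the E(β̂) and E(α̂) columns of the simulation tables; TMSE cannot, because it needs the individual replications.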

3. Monte Carlo simulation

A Monte Carlo simulation study was performed to compare the proposed modified percentile estimators with traditional percentile estimation. The comparison was carried out by taking random samples of different sizes (n = 20, 50, 100, 200, 500 and 1000) with different pairs of parameter values (β, α) = (1, 0.5), (1, 1), (1, 2), (2, 1). For each combination of true parameters (β and α), the Monte Carlo simulation was performed by carrying out the following steps in the R language [40]. A sample of n uniform random numbers u was generated in the interval [0,1]. The uniform random numbers were converted into Pareto random variables by inverting the cumulative distribution function, $x = \beta (1 - u)^{-1/\alpha}$. The process in the above steps was repeated 10000 times.
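
The simulation steps above can be sketched in Python (the original study used R). This is a reduced illustration under stated assumptions: only PE and PE-III are compared, the inverse-transform relation is taken as x = β(1 − u)^(−1/α), quartiles use linear interpolation, and TMSE and TRD are computed as the sum of mean square errors and the sum of absolute relative deviations of the mean estimates. All function names are hypothetical.

```python
import math
import random
import statistics


def pareto_sample(n, beta, alpha, rng):
    """Inverse-transform sampling from F(x) = 1 - (beta / x)**alpha."""
    return [beta * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


def pe(xs):
    """Traditional percentile estimator based on P25 and P75."""
    p25, _, p75 = statistics.quantiles(xs, n=4, method="inclusive")
    a = math.log(3.0) / (math.log(p75) - math.log(p25))
    return a, p25 * 0.75 ** (1.0 / a)


def pe3(xs):
    """Modified estimator based on the sample minimum and P75."""
    n, x_min = len(xs), min(xs)
    p75 = statistics.quantiles(xs, n=4, method="inclusive")[2]
    a = math.log(4.0 * n / (n + 1)) / (math.log(p75) - math.log(x_min))
    return a, x_min * (n / (n + 1)) ** (1.0 / a)


def simulate(n, beta, alpha, reps, seed=0):
    """Repeat sampling and estimation, then summarise each estimator."""
    rng = random.Random(seed)
    estimators = {"PE": pe, "PE-III": pe3}
    draws = {name: [] for name in estimators}
    for _ in range(reps):
        xs = pareto_sample(n, beta, alpha, rng)
        for name, estimator in estimators.items():
            draws[name].append(estimator(xs))
    summary = {}
    for name, pairs in draws.items():
        a_hats = [a for a, _ in pairs]
        b_hats = [b for _, b in pairs]
        mean_a = statistics.fmean(a_hats)
        mean_b = statistics.fmean(b_hats)
        tmse = (statistics.fmean([(a - alpha) ** 2 for a in a_hats])
                + statistics.fmean([(b - beta) ** 2 for b in b_hats]))
        trd = abs(mean_a - alpha) / alpha + abs(mean_b - beta) / beta
        summary[name] = {"mean_alpha": mean_a, "mean_beta": mean_b,
                         "TMSE": tmse, "TRD": trd}
    return summary


# Fewer replications than the 10000 used in the study, to keep the run short.
results = simulate(n=50, beta=1.0, alpha=1.0, reps=2000)
for name, row in results.items():
    print(name, row)
```

Both estimators are evaluated on the same simulated samples, so differences in TMSE and TRD reflect the estimators and not sampling noise between runs.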

4. Results and discussion

Tables 1–4 present the results of Monte Carlo simulation study carried out for numerical evaluation of the estimators considered for different sample sizes and different parameter combinations.
Table 1

Comparison of PE and modified PE for β = 1, α = 0.5.

n     Method  E(β̂)      E(α̂)      TMSE      TRD
20    PE      1.177512  0.581113  0.266123  0.339737
20    PE-I    1.454896  0.646239  1.696595  0.747374
20    PE-II   1.839641  1.858232  10974.32  3.556105
20    PE-III  1.008379  0.539913  0.040392  0.088205
50    PE      1.068149  0.534724  0.068059  0.137597
50    PE-I    1.17388   0.557861  0.37154   0.289601
50    PE-II   1.353715  0.638099  19.03515  0.629914
50    PE-III  1.001518  0.519449  0.011083  0.040415
100   PE      1.034997  0.517485  0.030121  0.069967
100   PE-I    1.090907  0.528881  0.161391  0.148668
100   PE-II   1.168791  0.555963  0.50695   0.280717
100   PE-III  1.000106  0.509616  0.004604  0.019339
200   PE      1.018252  0.508912  0.014463  0.036076
200   PE-I    1.045447  0.514297  0.072335  0.074042
200   PE-II   1.089797  0.526584  0.202829  0.142964
200   PE-III  1.000175  0.504883  0.002135  0.009941
500   PE      1.008184  0.503561  0.005383  0.015306
500   PE-I    1.017841  0.505505  0.027563  0.028852
500   PE-II   1.035348  0.509962  0.070451  0.055272
500   PE-III  1.000039  0.501819  0.000816  0.003676
1000  PE      1.004332  0.502336  0.002682  0.009004
1000  PE-I    1.010203  0.503484  0.013382  0.017172
1000  PE-II   1.019159  0.505688  0.033011  0.030536
1000  PE-III  1.000024  0.501422  0.000401  0.002867
Table 4

Comparison of PE and modified PE for β = 2, α = 1.

n     Method  E(β̂)      E(α̂)      TMSE      TRD
20    PE      2.141995  1.177234  0.361573  0.248231
20    PE-I    2.270806  1.308429  1.313734  0.443832
20    PE-II   2.360906  1.636639  1427.492  0.817092
20    PE-III  2.005043  1.093012  0.12478   0.095533
50    PE      2.062342  1.07597   0.112838  0.107141
50    PE-I    2.125702  1.128606  0.408916  0.191457
50    PE-II   2.19821   1.269502  30.75832  0.368607
50    PE-III  2.000992  1.04154   0.040189  0.042036
100   PE      2.030303  1.034942  0.048574  0.050094
100   PE-I    2.06161   1.058565  0.182036  0.089371
100   PE-II   2.099942  1.112929  1.408045  0.162899
100   PE-III  2.000317  1.019028  0.017631  0.019187
200   PE      2.015067  1.016709  0.02239   0.024243
200   PE-I    2.031228  1.028304  0.08437   0.043918
200   PE-II   2.048506  1.050127  0.203557  0.07438
200   PE-III  2.000178  1.008988  0.008096  0.009077
500   PE      2.006247  1.006946  0.008637  0.010069
500   PE-I    2.012911  1.01165   0.033774  0.018106
500   PE-II   2.020555  1.019767  0.076773  0.030045
500   PE-III  2.000022  1.003743  0.003112  0.003754
1000  PE      2.003247  1.003992  0.004365  0.005616
1000  PE-I    2.006534  1.006228  0.016464  0.009495
1000  PE-II   2.011257  1.01054   0.037368  0.016169
1000  PE-III  1.999969  1.002341  0.001592  0.002356

Results from Table 1 (β = 1, α = 0.5) show that the modified percentile estimator PE-III (based on the expectation of the empirical cumulative distribution function of the first-order statistic) estimated the true parameters more accurately than the traditional percentile estimator and the other modified percentile estimators (based on the median and geometric mean). Under the total mean square error criterion, PE-III provided the most efficient parameter estimates for all sample sizes, as it has a lower total mean square error than the competing estimators. The second performance criterion, total relative deviation, leads to the same conclusion for all sample sizes: the third modified percentile estimator is the most efficient among all estimation strategies considered. It is worth noting that the traditional percentile estimator is the second-best choice after the third modified percentile estimator. Regarding the literature on modified estimators, our results agree with other studies favouring the use of modified maximum likelihood, moment and percentile estimation for different probability distributions [25-29]. To avoid repetition, we simply note that PE-III also provides more efficient and accurate parameter estimates than the other estimators considered, for all sample sizes, for the parameter combinations (β = 1, α = 1), (β = 1, α = 2) and (β = 2, α = 1) presented in Tables 2–4, respectively.
Table 2

Comparison of PE and modified PE for β = 1, α = 1.

n     Method  E(β̂)      E(α̂)      TMSE      TRD
20    PE      1.073072  1.177465  0.248254  0.250536
20    PE-I    1.13686   1.305223  0.731095  0.442083
20    PE-II   1.190234  13.56927  2016418   12.75951
20    PE-III  1.00298   1.094674  0.117658  0.097654
50    PE      1.027124  1.066142  0.069593  0.093265
50    PE-I    1.051458  1.109918  0.192625  0.161376
50    PE-II   1.083264  1.233346  70.59511  0.31661
50    PE-III  1.000803  1.036014  0.036107  0.036817
100   PE      1.014099  1.033143  0.03049   0.047242
100   PE-I    1.029883  1.056482  0.083419  0.086365
100   PE-II   1.048096  1.304341  362.9731  0.352436
100   PE-III  0.999921  1.018084  0.016678  0.018163
200   PE      1.006765  1.017074  0.014677  0.023839
200   PE-I    1.013689  1.027462  0.037971  0.04115
200   PE-II   1.022471  1.050664  0.097214  0.073136
200   PE-III  1.000147  1.00994   0.008391  0.010088
500   PE      1.002691  1.006199  0.005653  0.00889
500   PE-I    1.005496  1.010362  0.01465   0.015859
500   PE-II   1.010301  1.019627  0.034065  0.029928
500   PE-III  0.999989  1.003309  0.003229  0.00332
1000  PE      1.001036  1.002801  0.002721  0.003837
1000  PE-I    1.002577  1.00494   0.007083  0.007517
1000  PE-II   1.004016  1.008624  0.015701  0.01264
1000  PE-III  0.999991  1.001597  0.001564  0.001606

Moreover, the results in Tables 1–4 show that the modified estimator PE-II (based on the geometric mean) is the worst performer on both performance indicators, although its performance improves gradually with increasing sample size. Its poor performance in small samples may be because the geometric mean is influenced by extreme values, which are common in heavy-tailed distributions such as the Pareto.

5. Real life examples

In addition to the numerical evaluation of the proposed estimators through the simulation study, the modified percentile estimators were applied to two real-life data sets. For comparison, we have also used the maximum likelihood and moment estimators of the Pareto distribution. The Maximum Likelihood (ML) estimators of α and β are

$$\hat{\beta} = x_{(1)}, \qquad \hat{\alpha} = \frac{n}{\sum_{i=1}^{n} \ln\left(x_i / x_{(1)}\right)},$$

where $x_{(1)}$ is the smallest observation. Estimators from the Method of Moments (MM) were also computed. Example 1: The first example is taken from Clark [9]; it consists of 21 observations on the number of deaths in major earthquakes during 1900–2011, as published by the United States Geological Survey. The results from applying the proposed estimators to example 1 are presented in Table 5.
Table 5

Comparison of estimators for example 1.

Method  β̂         α̂         MAE       MAPE      RMSE      RMSPE
ML      20085     0.903376  0.038434  14.42816  0.047099  27.24039
MM      53146.93  2.443539  2.35963   2061.182  3.822201  5120.551
PE      21348.21  0.845553  0.054456  27.54816  0.065346  55.94947
PE-I    22727.27  0.879118  0.072681  42.04517  0.089363  88.84464
PE-II   13071.23  0.650822  0.085454  49.33647  0.102901  104.4611
PE-III  18933.4   0.787868  0.031231  6.586841  0.04287   8.599407

Results from Table 5 clearly indicate the superiority of PE-III over the other percentile-based estimators as well as over the maximum likelihood and moment estimators. All four performance measures have smaller values for PE-III than for the other estimators. Example 2: The second data set is taken from Beirlant et al. [41] and consists of 142 fire damage claims (in thousands of Norwegian kroner) in Norway during 1975. This data set has also been used in other studies focusing on the Pareto distribution [3,42,43]. Table 6 shows that, on three of the performance indices, the third modified percentile estimator (PE-III) is better than the traditional percentile (PE), maximum likelihood (ML), moment (MM) and other modified percentile estimators (PE-I, PE-II). However, maximum likelihood estimation performs slightly better than PE-III in terms of mean absolute error.
Table 6

Comparison of estimators for example 2.

Method  β̂         α̂         MAE       MAPE      RMSE      RMSPE
ML      500       1.19403   0.014455  6.980319  0.019478  17.45636
MM      2685.672  2.007348  10.58988  9363.318  13.59856  24469.07
PE      489.6091  1.250545  0.017239  8.067387  0.021113  14.84931
PE-I    559.4344  1.421501  0.057218  57.14853  0.080562  162.745
PE-II   604.7408  1.54486   0.10874   109.9473  0.153703  307.4798
PE-III  497.241   1.268241  0.015093  6.736636  0.018156  13.48124
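
The maximum likelihood benchmark used in the two examples can be sketched in Python as below. The sketch assumes the standard ML estimators for the two-parameter Pareto distribution, β̂ = x(1) and α̂ = n / Σ ln(xi / x(1)); the function name `pareto_mle` and the toy data are hypothetical, and the data sets from the examples are not reproduced here.

```python
import math


def pareto_mle(xs):
    """Standard Pareto ML estimates: beta is the minimum, alpha from log-ratios."""
    n = len(xs)
    beta_hat = min(xs)
    alpha_hat = n / sum(math.log(x / beta_hat) for x in xs)
    return alpha_hat, beta_hat


# Toy data only, chosen so the log-ratios are 0, 1 and 2.
alpha_hat, beta_hat = pareto_mle([1.0, math.e, math.e ** 2])
print(alpha_hat, beta_hat)
```

The sample minimum contributes ln(1) = 0 to the sum, so the shape estimate is driven entirely by how far the remaining observations lie above it.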

6. Conclusion

Three modified percentile estimators are proposed for parameter estimation of the Pareto distribution. The modifications are based on the median, the geometric mean, and the expectation of the empirical cumulative distribution function of the first-order statistic of the Pareto distribution. The newly proposed estimators are compared with the traditional percentile estimators via Monte Carlo simulation, and the modified percentile estimator based on the expectation of the empirical cumulative distribution function of the first-order statistic is found to perform better than the traditional and other modified percentile estimators in terms of total mean square error and total relative deviation. The Monte Carlo simulation results were further corroborated by applying the proposed estimators to two real-life examples. These applications show that the modified percentile estimator based on the expectation of the empirical cumulative distribution function of the first-order statistic performs better not only than the other percentile-based estimators but also than the maximum likelihood and moment estimators. Considering the results from the simulation and the real data applications, this modified percentile estimator can be recommended for estimating the parameters of the Pareto distribution.

Supporting information: MINIMAL DATA.xlsx (XLSX).

Table 3

Comparison of PE and modified PE for β = 1, α = 2.

n     Method  E(β̂)      E(α̂)      TMSE      TRD
20    PE      1.032774  2.356148  0.843406  0.210848
20    PE-I    1.052433  2.622239  2.321268  0.363552
20    PE-II   1.058361  2.805987  6327.576  0.461354
20    PE-III  1.001137  2.189173  0.450276  0.095724
50    PE      1.012796  2.136993  0.239532  0.081293
50    PE-I    1.022312  2.235901  0.54974   0.140262
50    PE-II   1.027095  2.433643  108.4377  0.243917
50    PE-III  1.000107  2.073907  0.146156  0.037061
100   PE      1.005777  2.064917  0.103872  0.038235
100   PE-I    1.009939  2.109024  0.217326  0.064451
100   PE-II   1.013769  2.243651  2.495466  0.135594
100   PE-III  1.00005   2.035985  0.067343  0.018043
200   PE      1.003326  2.033866  0.049188  0.020259
200   PE-I    1.005871  2.056703  0.098455  0.034223
200   PE-II   1.007418  2.102194  0.245855  0.058515
200   PE-III  1.000039  2.018416  0.032966  0.009247
500   PE      1.001151  2.01088   0.018004  0.006591
500   PE-I    1.00144   2.017816  0.035774  0.010347
500   PE-II   1.002177  2.034162  0.07737   0.019259
500   PE-III  0.999995  2.005349  0.012406  0.002679
1000  PE      1.000385  2.005086  0.00912   0.002928
1000  PE-I    1.000651  2.008925  0.017945  0.005113
1000  PE-II   1.001093  2.017253  0.037491  0.00972
1000  PE-III  1         2.002843  0.006299  0.001422