
Poisson XLindley Distribution for Count Data: Statistical and Reliability Properties with Estimation Techniques and Inference.

Muhammad Ahsan-Ul-Haq, Afrah Al-Bossly, Mahmoud El-Morshedy, Mohamed S Eliwa.

Abstract

In this study, a new one-parameter count distribution is proposed by combining the Poisson and XLindley distributions. Some of its statistical and reliability properties, including order statistics, hazard rate function, reversed hazard rate function, mode, factorial moments, probability generating function, moment generating function, index of dispersion, Shannon entropy, Mills ratio, mean residual life function, and associated measures, are investigated. All these properties can be expressed in explicit forms. It is found that the new probability mass function can be utilized to model positively skewed data with leptokurtic shape. Moreover, the new discrete distribution is a proper tool to model equi- and over-dispersed phenomena with an increasing hazard rate function. The distribution parameter is estimated by six different estimation approaches, and the behavior of these methods is explored using Monte Carlo simulation. Finally, two applications to real-life data are presented to illustrate the flexibility of the new model.
Copyright © 2022 Muhammad Ahsan-ul-Haq et al.

Year:  2022        PMID: 35463286      PMCID: PMC9020915          DOI: 10.1155/2022/6503670

Source DB:  PubMed          Journal:  Comput Intell Neurosci


1. Introduction

Researchers have developed a multitude of probability distributions for analyzing data sets from diverse sectors, such as health, transportation, engineering, astronomy, and agriculture. Various well-known approaches are used to introduce new probability distributions. Some of them, such as the compounding technique and the T-X family, give a very effective way to generalize a common parametric family of distributions so that it fits data sets that classical distributions do not fit sufficiently well. In some practical fields, count data are generated or observed, and to model such data, discrete probability distributions have been proposed based on different approaches such as survival discretization, Poisson mixtures, and compound models. For example, Greenwood and Yule [1] obtained the negative binomial distribution by compounding the Poisson distribution with a gamma-distributed rate parameter. Mahmoudi and Zakerzadeh [2] extended the Poisson–Lindley distribution and revealed that their generalized distribution is more flexible in evaluating count data. Zamani and Ismail [3] introduced a novel compound distribution by combining a negative binomial distribution with a one-parameter Lindley distribution that provides a better fit for count data. Rashid [4] introduced a count data model that combines the negative binomial and Kumaraswamy distributions and used it for modeling biological data sets. Some more discrete distributions are the Poisson–Ishita distribution by Hassan et al. [5]; Poisson–Ailamujia distribution by Hassan et al. [6]; Poisson Xgamma distribution by Para et al. [7]; Poisson quasi-Lindley distribution by Grine and Zeghdoudi [8]; discrete Gompertz-G family by Eliwa et al. [9]; discrete extension to three-parameter Lindley model by Eliwa et al. [10]; two-parameter exponentiated discrete Lindley distribution by El-Morshedy et al. [11]; Eliwa and El-Morshedy [12]; discrete Burr–Hatke distribution by El-Morshedy et al. [13]; discrete Weibull Marshall–Olkin family by Gillariose et al. [14]; and discrete Ramos–Louzada model by Eldeeb et al. [15].

The XLindley (XL) distribution was introduced for the analysis of lifetime data (see [16]). Let X be a random variable following the XL distribution with the probability density function:

$$f(x; \alpha) = \frac{\alpha^{2}(2 + \alpha + x)}{(1 + \alpha)^{2}}\, e^{-\alpha x}, \quad x > 0,\ \alpha > 0. \tag{1}$$

Since there is a need for more flexible models for statistical data, in this study, we propose a flexible discrete distribution obtained by compounding the Poisson and XL distributions. The proposed model is named the "Poisson-XL" distribution. Its strength lies in its capacity to describe equi- and over-dispersed data. Furthermore, it can be used as a suitable statistical tool to model positively skewed data with leptokurtic shape. One more advantage of the Poisson-XL model is that its statistical and reliability characteristics can be expressed in closed forms, which makes the model useful in regression and time-series analysis.

The study is organized as follows. Section 2 is devoted to the derivation of the Poisson-XL distribution and its shape analysis. Some statistical properties are derived in Section 3. Some reliability measures are derived in Section 4. The parameter is estimated in Section 5, and the estimators are compared by simulation in Section 6. Section 7 presents applications of the proposed distribution. Finally, we conclude the study in Section 8.

2. Synthesis of the Poisson-XL Model

If X|λ follows a Poisson(λ) distribution, where λ is itself a random variable following the XL distribution with parameter α, then the distribution obtained by marginalizing over λ is a compound of the Poisson and XL distributions, which is denoted by the PXL model.

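Theorem 1.

The probability mass function of the PXL distribution is given as follows:

$$P(X = x; \alpha) = \frac{\alpha^{2}\left(x + \alpha^{2} + 3(1 + \alpha)\right)}{(1 + \alpha)^{4 + x}}, \quad x = 0, 1, 2, \ldots,\ \alpha > 0. \tag{2}$$

Proof

The probability mass function of a compound of Poisson(λ) with XL(α) can be formulated as follows:

$$P(X = x; \alpha) = \int_{0}^{\infty} \frac{e^{-\lambda}\lambda^{x}}{x!} \cdot \frac{\alpha^{2}(2 + \alpha + \lambda)}{(1 + \alpha)^{2}}\, e^{-\alpha\lambda}\, d\lambda.$$

Then:

$$P(X = x; \alpha) = \frac{\alpha^{2}}{(1 + \alpha)^{2}\, x!} \int_{0}^{\infty} \lambda^{x}(2 + \alpha + \lambda)\, e^{-(1 + \alpha)\lambda}\, d\lambda = \frac{\alpha^{2}\left(x + \alpha^{2} + 3(1 + \alpha)\right)}{(1 + \alpha)^{4 + x}},$$

where x = 0, 1, 2, … and α > 0. Figure 1 shows the probability mass function (PMF) plots of the proposed distribution for various values of the parameter α.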
Figure 1

PMF plot of the PXL distribution.
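
As a quick numerical check of Theorem 1, the following Python sketch evaluates the PMF and compares it with a direct numerical integration of the Poisson–XL mixture. The function names `pxl_pmf` and `pxl_pmf_by_integration` are illustrative, and the XL density α²(2 + α + λ)e^{−αλ}/(1 + α)² used inside the integrand is an assumption taken from the XLindley literature.

```python
import math

def pxl_pmf(x: int, alpha: float) -> float:
    """PXL probability mass at x, evaluated on the log scale for stability."""
    log_p = (2 * math.log(alpha)
             + math.log(x + alpha**2 + 3 * (1 + alpha))
             - (4 + x) * math.log1p(alpha))
    return math.exp(log_p)

def pxl_pmf_by_integration(x: int, alpha: float,
                           upper: float = 60.0, steps: int = 60000) -> float:
    """Poisson(lam) mass averaged over an XL(alpha) density (trapezoidal rule)."""
    def integrand(lam: float) -> float:
        poisson = math.exp(-lam) * lam**x / math.factorial(x)
        xl = alpha**2 * (2 + alpha + lam) / (1 + alpha) ** 2 * math.exp(-alpha * lam)
        return poisson * xl

    h = upper / steps
    total = 0.5 * (integrand(0.0) + integrand(upper))
    total += sum(integrand(k * h) for k in range(1, steps))
    return total * h

if __name__ == "__main__":
    alpha = 1.0
    print("total mass:", sum(pxl_pmf(x, alpha) for x in range(200)))
    print("closed form vs integral at x = 2:",
          pxl_pmf(2, alpha), pxl_pmf_by_integration(2, alpha))
```

Working on the log scale keeps the evaluation stable for large x, where (1 + α)^{4+x} would otherwise become very large.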

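According to Figure 1, the PMF can be either unimodal or decreasing. Further, it can be utilized as a probability tool to model right-skewed data. The cumulative distribution function (CDF) corresponding to equation (2) can be expressed as follows:

$$F(x; \alpha) = 1 - \frac{1 + \alpha\left(x + 4 + \alpha(3 + \alpha)\right)}{(1 + \alpha)^{4 + x}},$$

where x = 0, 1, 2, … and α > 0.

Let X_{1:n}, X_{2:n}, X_{3:n}, …, X_{n:n} be the order statistics of a random sample of size n from the PXL distribution. The cumulative distribution function of the ith order statistic for an integer value of x is given as follows:

$$F_{i:n}(x; \alpha) = \sum_{m=i}^{n} \binom{n}{m} \left[F(x; \alpha)\right]^{m} \left[1 - F(x; \alpha)\right]^{n-m} = \sum_{m=i}^{n} \sum_{j=0}^{n-m} (-1)^{j} \binom{n}{m} \binom{n-m}{j} F(x; \alpha, m+j),$$

where F(x; α, m+j) = [1 − {1 + α(x + 4 + α(3 + α))}/(1 + α)^{4+x}]^{m+j} represents the CDF of the exponentiated PXL distribution with power parameter m+j. The corresponding PMF of the ith order statistic is given as follows:

$$f_{i:n}(x; \alpha) = F_{i:n}(x; \alpha) - F_{i:n}(x - 1; \alpha) = \sum_{m=i}^{n} \sum_{j=0}^{n-m} (-1)^{j} \binom{n}{m} \binom{n-m}{j} f(x; \alpha, m+j),$$

where f(x; α, m+j) = F(x; α, m+j) − F(x − 1; α, m+j) represents the PMF of the exponentiated PXL distribution with power parameter m+j. Thus, the pth moments of X_{i:n} can be written as follows:

$$E\left(X_{i:n}^{p}\right) = \sum_{x=0}^{\infty} \sum_{m=i}^{n} \sum_{j=0}^{n-m} (-1)^{j} \binom{n}{m} \binom{n-m}{j}\, x^{p} f(x; \alpha, m+j).$$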
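
The closed-form CDF and the order-statistic expansion can be checked numerically. The sketch below is a minimal illustration under the assumption that the CDF is 1 − {1 + α(x + 4 + α(3 + α))}/(1 + α)^{4+x}, as suggested by the exponentiated form quoted in this section; all function names are hypothetical.

```python
import math

def pxl_pmf(x, alpha):
    """PXL probability mass at x (log-scale evaluation)."""
    return math.exp(2 * math.log(alpha)
                    + math.log(x + alpha**2 + 3 * (1 + alpha))
                    - (4 + x) * math.log1p(alpha))

def pxl_cdf(x, alpha):
    """Closed-form CDF P(X <= x); zero below the support."""
    if x < 0:
        return 0.0
    tail = (1 + alpha * (x + 4 + alpha * (3 + alpha))) * math.exp(-(4 + x) * math.log1p(alpha))
    return 1.0 - tail

def order_stat_cdf(x, alpha, i, n):
    """CDF of the i-th order statistic as a binomial tail sum."""
    F = pxl_cdf(x, alpha)
    return sum(math.comb(n, m) * F**m * (1 - F) ** (n - m) for m in range(i, n + 1))

def order_stat_cdf_expanded(x, alpha, i, n):
    """Same CDF via the double sum over powers F^(m+j)."""
    F = pxl_cdf(x, alpha)
    return sum((-1) ** j * math.comb(n, m) * math.comb(n - m, j) * F ** (m + j)
               for m in range(i, n + 1) for j in range(n - m + 1))

if __name__ == "__main__":
    a = 0.7
    print(pxl_cdf(3, a), sum(pxl_pmf(k, a) for k in range(4)))
    print(order_stat_cdf(2, a, 3, 7), order_stat_cdf_expanded(2, a, 3, 7))
```

The alternating double sum is convenient for derivations but can lose precision for large n; the binomial tail sum is the safer form for computation.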

3. Statistical Properties

3.1. Mode

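To study the mode of the PXL model, we treat x as continuous and derive the first and second derivatives of the PMF with respect to x, where:

$$\frac{d}{dx}P(X = x; \alpha) = \frac{\alpha^{2}}{(1 + \alpha)^{4 + x}}\left[1 - \left(x + \alpha^{2} + 3(1 + \alpha)\right)\ln(1 + \alpha)\right],$$

$$\frac{d^{2}}{dx^{2}}P(X = x; \alpha) = \frac{\alpha^{2}\ln(1 + \alpha)}{(1 + \alpha)^{4 + x}}\left[\left(x + \alpha^{2} + 3(1 + \alpha)\right)\ln(1 + \alpha) - 2\right],$$

and when d/dx P(X = x; α) = 0, the solution is as follows:

$$x^{*} = \frac{1}{\ln(1 + \alpha)} - \alpha^{2} - 3(1 + \alpha).$$

For x* > 0, that is, when 1/ln(1 + α) > α² + 3(1 + α), the mode is a unique critical point at which P(X = x; α) is maximum and concave, since the second derivative at x* equals −α² ln(1 + α)/(1 + α)^{4 + x*} < 0. If x* ≤ 0, the PMF is a decreasing function of x.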
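
The relationship between the continuous stationary point and the integer mode can be illustrated numerically. The sketch below assumes the stationary point x* = 1/ln(1 + α) − α² − 3(1 + α), which follows from differentiating the PMF α²(x + α² + 3(1 + α))/(1 + α)^{4+x} in x; the function names are illustrative.

```python
import math

def pxl_pmf(x, alpha):
    """PXL probability mass at x (log-scale evaluation)."""
    return math.exp(2 * math.log(alpha)
                    + math.log(x + alpha**2 + 3 * (1 + alpha))
                    - (4 + x) * math.log1p(alpha))

def continuous_mode(alpha):
    """Stationary point of the PMF when x is treated as a continuous variable."""
    return 1.0 / math.log1p(alpha) - alpha**2 - 3 * (1 + alpha)

def discrete_mode(alpha, x_max=1000):
    """Integer maximizer of the PMF, found by direct search over 0..x_max."""
    return max(range(x_max + 1), key=lambda x: pxl_pmf(x, alpha))

if __name__ == "__main__":
    for alpha in (0.1, 0.2, 1.0):
        print(alpha, continuous_mode(alpha), discrete_mode(alpha))
```

For small α the PMF is unimodal with an interior mode close to x*, while for larger α the stationary point is negative and the PMF decreases from x = 0.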

3.2. Factorial Moments

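The rth factorial moment of the PXL distribution can be obtained as follows:

$$\mu'_{(r)} = E\left[X^{(r)}\right] = \sum_{x=0}^{\infty} x^{(r)} P(X = x; \alpha),$$

where X^{(r)} = X(X − 1)(X − 2) … (X − r + 1), and then:

$$\mu'_{(r)} = \int_{0}^{\infty} \left[\sum_{x=r}^{\infty} x^{(r)} \frac{e^{-\lambda}\lambda^{x}}{x!}\right] \frac{\alpha^{2}(2 + \alpha + \lambda)}{(1 + \alpha)^{2}}\, e^{-\alpha\lambda}\, d\lambda = \int_{0}^{\infty} \lambda^{r} \left[\sum_{x=r}^{\infty} \frac{e^{-\lambda}\lambda^{x-r}}{(x - r)!}\right] \frac{\alpha^{2}(2 + \alpha + \lambda)}{(1 + \alpha)^{2}}\, e^{-\alpha\lambda}\, d\lambda,$$

and assuming y = x − r, the inner sum equals one and we get the following:

$$\mu'_{(r)} = \frac{\alpha^{2}}{(1 + \alpha)^{2}} \int_{0}^{\infty} \lambda^{r}(2 + \alpha + \lambda)\, e^{-\alpha\lambda}\, d\lambda = \frac{r!\left(\alpha^{2} + 2\alpha + r + 1\right)}{\alpha^{r}(1 + \alpha)^{2}}.$$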
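
The factorial moments can be checked by brute-force summation. The sketch below assumes the closed form r!(α² + 2α + r + 1)/(α^r (1 + α)²), obtained from E[X^{(r)}] = E[λ^r] for a Poisson mixture with an XL mixing density; the helper names are illustrative.

```python
import math

def pxl_pmf(x, alpha):
    """PXL probability mass at x (log-scale evaluation)."""
    return math.exp(2 * math.log(alpha)
                    + math.log(x + alpha**2 + 3 * (1 + alpha))
                    - (4 + x) * math.log1p(alpha))

def falling_factorial(x, r):
    """x(x-1)...(x-r+1), with the empty product equal to 1."""
    out = 1
    for k in range(r):
        out *= x - k
    return out

def factorial_moment_sum(r, alpha, x_max=2000):
    """E[X^(r)] by direct summation over the PMF, truncated at x_max."""
    return sum(falling_factorial(x, r) * pxl_pmf(x, alpha) for x in range(x_max + 1))

def factorial_moment_closed(r, alpha):
    """Assumed closed form of the r-th factorial moment."""
    return math.factorial(r) * (alpha**2 + 2 * alpha + r + 1) / (alpha**r * (1 + alpha) ** 2)

if __name__ == "__main__":
    for r in (1, 2, 3):
        print(r, factorial_moment_sum(r, 0.5), factorial_moment_closed(r, 0.5))
```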

3.3. Probability Generating Function (PGF)

Theorem 2 .

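If X follows the PXL(α) distribution, then the PGF G(z) can be formulated as follows:

$$G(z) = \frac{\alpha^{2}\left[(2 + \alpha)(1 + \alpha - z) + 1\right]}{(1 + \alpha)^{2}(1 + \alpha - z)^{2}}, \quad |z| < 1 + \alpha.$$

The PGF can be obtained as follows:

$$G(z) = E\left(z^{X}\right) = \sum_{x=0}^{\infty} z^{x}\, \frac{\alpha^{2}\left(x + \alpha^{2} + 3(1 + \alpha)\right)}{(1 + \alpha)^{4 + x}} = \frac{\alpha^{2}}{(1 + \alpha)^{3}}\left[\frac{z}{(1 + \alpha - z)^{2}} + \frac{\alpha^{2} + 3(1 + \alpha)}{1 + \alpha - z}\right] = \frac{\alpha^{2}\left[(2 + \alpha)(1 + \alpha - z) + 1\right]}{(1 + \alpha)^{2}(1 + \alpha - z)^{2}}.$$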
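
A short numerical check of the PGF is sketched below. It assumes the closed form α²[(2 + α)(1 + α − z) + 1]/((1 + α)²(1 + α − z)²), which follows from summing the geometric-type series Σ z^x P(X = x); the function names are illustrative.

```python
import math

def pxl_pmf(x, alpha):
    """PXL probability mass at x (log-scale evaluation)."""
    return math.exp(2 * math.log(alpha)
                    + math.log(x + alpha**2 + 3 * (1 + alpha))
                    - (4 + x) * math.log1p(alpha))

def pxl_pgf(z, alpha):
    """Assumed closed-form PGF, valid for |z| < 1 + alpha."""
    d = 1 + alpha - z
    return alpha**2 * ((2 + alpha) * d + 1) / ((1 + alpha) ** 2 * d**2)

def pxl_pgf_sum(z, alpha, x_max=400):
    """PGF by direct summation of z^x P(X = x), truncated at x_max."""
    return sum(z**x * pxl_pmf(x, alpha) for x in range(x_max + 1))

if __name__ == "__main__":
    print(pxl_pgf(0.5, 0.7), pxl_pgf_sum(0.5, 0.7))
    h = 1e-5  # central difference: G'(1) should equal the mean
    print((pxl_pgf(1 + h, 1.0) - pxl_pgf(1 - h, 1.0)) / (2 * h))
```

Evaluating G at z = 0 recovers P(X = 0), and the derivative at z = 1 recovers the mean, which gives two independent sanity checks on the formula.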

3.4. Moment Generating Function (MGF)

Theorem 3 .

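If X follows the PXL(α) distribution, then the MGF can be expressed as follows:

$$M_X(t) = \frac{\alpha^{2}\left[(2 + \alpha)(1 + \alpha - e^{t}) + 1\right]}{(1 + \alpha)^{2}(1 + \alpha - e^{t})^{2}}, \quad t < \ln(1 + \alpha).$$

The moments around the origin can be obtained as follows:

$$\mu'_r = E\left(X^{r}\right) = \left.\frac{d^{r}}{dt^{r}} M_X(t)\right|_{t=0}.$$

Then, the first four ordinary moments of X are as follows:

$$\mu'_1 = \frac{\alpha^{2} + 2\alpha + 2}{\alpha(1 + \alpha)^{2}}, \qquad \mu'_2 = \frac{\alpha^{3} + 4\alpha^{2} + 6\alpha + 6}{\alpha^{2}(1 + \alpha)^{2}},$$

$$\mu'_3 = \frac{\alpha^{4} + 8\alpha^{3} + 20\alpha^{2} + 30\alpha + 24}{\alpha^{3}(1 + \alpha)^{2}}, \qquad \mu'_4 = \frac{\alpha^{5} + 16\alpha^{4} + 66\alpha^{3} + 138\alpha^{2} + 192\alpha + 120}{\alpha^{4}(1 + \alpha)^{2}},$$

whereas the moments around the mean of X follow from the ordinary moments:

$$\mu_2 = \mu'_2 - \mu_1'^{2} = \frac{\alpha^{5} + 5\alpha^{4} + 11\alpha^{3} + 14\alpha^{2} + 10\alpha + 2}{\alpha^{2}(1 + \alpha)^{4}}, \quad \mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2\mu_1'^{3}, \quad \mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2\mu_1'^{2} - 3\mu_1'^{4}.$$

Based on these moments, the index of dispersion (DI) can be expressed as follows:

$$\mathrm{DI} = \frac{\mathrm{Var}(X)}{E(X)} = \frac{\alpha^{5} + 5\alpha^{4} + 11\alpha^{3} + 14\alpha^{2} + 10\alpha + 2}{\alpha(1 + \alpha)^{2}\left(\alpha^{2} + 2\alpha + 2\right)}.$$

Further, the skewness and kurtosis can be derived in closed forms, where:

$$\text{skewness} = \frac{\mu_3}{\mu_2^{3/2}}, \qquad \text{kurtosis} = \frac{\mu_4}{\mu_2^{2}}.$$

The summary measures (mean, variance, moments, and DI) are presented in Table 1. Because the DI exceeds 1 for every α > 0 and approaches 1 as α grows, the PXL model can be used to model equi- and over-dispersed data.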
Table 1

Moments and DI of the PXL distribution.

α      Mean      Variance   DI        E(X^2)    E(X^3)    E(X^4)
0.1    18.265    215.25     11.785    548.84    22486.0   1162376.9
0.2    8.4722    56.138     6.6261    127.92    2679.3    71386.3
0.5    2.8889    9.6543     3.3419    18.000    160.22    1847.3
0.7    1.9229    5.1317     2.6687    8.8292    58.293    502.48
1.0    1.2500    2.6875     2.1500    4.2500    20.750    133.25
1.5    0.7733    1.3486     1.7439    1.9467    6.9244    32.548
2.0    0.5556    0.8580     1.5444    1.1667    3.3889    13.000
2.5    0.4327    0.6177     1.4277    0.8049    2.0274    6.7216
3.0    0.3542    0.4787     1.3517    0.6042    1.3681    4.0579
3.5    0.2998    0.3893     1.2985    0.4792    0.9987    2.7111
4.0    0.2600    0.3274     1.2592    0.3950    0.7700    1.9438
4.5    0.2296    0.2822     1.2291    0.3349    0.6178    1.4671
5.0    0.2056    0.2477     1.2053    0.2900    0.5109    1.1513
7.0    0.1451    0.1661     1.1450    0.1872    0.2897    0.5602
10     0.1008    0.1110     1.1008    0.1212    0.1680    0.2825
50     0.0200    0.0204     1.0200    0.0208    0.0225    0.0259
100    0.0100    0.0101     1.0100    0.0102    0.0106    0.0114
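
The entries of Table 1 can be reproduced with a few lines of Python. The sketch below is an illustration, not the authors' code: it assumes the factorial moments r!(α² + 2α + r + 1)/(α^r (1 + α)²) and converts them to raw moments with the standard Stirling-number relations; `pxl_raw_moments` and `pxl_summary` are hypothetical names.

```python
import math

def pxl_raw_moments(alpha):
    """First four raw moments of the PXL model from its factorial moments."""
    f1, f2, f3, f4 = (
        math.factorial(r) * (alpha**2 + 2 * alpha + r + 1) / (alpha**r * (1 + alpha) ** 2)
        for r in range(1, 5)
    )
    m1 = f1
    m2 = f2 + f1
    m3 = f3 + 3 * f2 + f1
    m4 = f4 + 6 * f3 + 7 * f2 + f1
    return m1, m2, m3, m4

def pxl_summary(alpha):
    """Mean, variance, dispersion index, skewness, and kurtosis."""
    m1, m2, m3, m4 = pxl_raw_moments(alpha)
    var = m2 - m1**2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1**3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1**2 - 3 * m1**4
    return {
        "mean": m1,
        "variance": var,
        "DI": var / m1,
        "skewness": mu3 / var**1.5,
        "kurtosis": mu4 / var**2,
    }

if __name__ == "__main__":
    for alpha in (0.5, 1.0, 2.0):
        print(alpha, pxl_summary(alpha))
```

For α = 1.0, this gives a mean of 1.25, a variance of 2.6875, and a DI of 2.15, in agreement with the corresponding row of Table 1.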
The plots of coefficient of skewness and kurtosis are shown in Figure 2. The skewness and kurtosis monotonically increase for higher values of α. Moreover, the PXL model can be used as a probability tool for modeling positively skewed data with leptokurtic shape.
Figure 2

Skewness and kurtosis of the PXL distribution.

3.5. Shannon Entropy

Entropy measures the disorder, randomness, or uncertainty associated with a system or a random variable. The term and the concept are used in diverse fields, from classical thermodynamics, where it was first recognized, to the microscopic description of nature in statistical physics, and to the principles of information theory. The Shannon entropy of the random variable X can be expressed as follows:

$$H(X) = -\sum_{x=0}^{\infty} P(X = x; \alpha) \ln P(X = x; \alpha),$$

and then:

$$H(X) = \left(4 + E(X)\right)\ln(1 + \alpha) - 2\ln\alpha - E\left[\ln\left(X + \alpha^{2} + 3(1 + \alpha)\right)\right],$$

where the last expectation is expressed through a derivative of the Lerch transcendent. For more details on the HurwitzLerchPhi^(0,1,0) function (a derivative of the "Lerch transcendent"), see https://mathworld.wolfram.com/LerchTranscendent.html. Some entropy values of the PXL distribution in terms of the parameter α are presented in Table 2. The Shannon entropy decreases monotonically and approaches zero as α increases.
Table 2

Shannon entropy of PXL distribution.

α      H(x)       α      H(x)       α      H(x)
0.1    3.88282    3.0    0.77819    6.5    0.45858
0.2    3.17056    3.5    0.70201    7.0    0.43522
0.5    2.21336    4.0    0.64144    7.5    0.41444
1.0    1.54538    4.5    0.59193    8.0    0.39582
1.5    1.21462    5.0    0.55056    8.5    0.37901
2.0    1.01384    5.5    0.51539    100    0.05611
2.5    0.87756    6.0    0.48506    500    0.01443
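
The values in Table 2 can be approximated without the Lerch transcendent by summing −P ln P directly. The sketch below is a minimal illustration using natural logarithms; the truncation rule (stop once the accumulated probability is within `tail` of one) is an assumption made here for convenience.

```python
import math

def pxl_log_pmf(x, alpha):
    """Natural log of the PXL probability mass at x."""
    return (2 * math.log(alpha)
            + math.log(x + alpha**2 + 3 * (1 + alpha))
            - (4 + x) * math.log1p(alpha))

def pxl_entropy(alpha, tail=1e-12, x_max=100_000):
    """Shannon entropy (in nats) by direct summation of -p * ln(p)."""
    total, entropy, x = 0.0, 0.0, 0
    while 1.0 - total > tail and x <= x_max:
        log_p = pxl_log_pmf(x, alpha)
        p = math.exp(log_p)
        entropy -= p * log_p
        total += p
        x += 1
    return entropy

if __name__ == "__main__":
    for alpha in (0.5, 1.0, 2.0, 5.0):
        print(alpha, round(pxl_entropy(alpha), 5))
```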

4. Reliability Characteristics of the PXL Distribution

4.1. Reversed (Hazard) Rate Function and Mills Ratio

The survival function (SF) corresponding to the CDF of the PXL distribution can be expressed as follows:

$$S(x; \alpha) = 1 - F(x; \alpha) = \frac{1 + \alpha\left(x + 4 + \alpha(3 + \alpha)\right)}{(1 + \alpha)^{4 + x}}, \quad x = 0, 1, 2, \ldots$$

The hazard rate function (HRF) of the random variable X can be defined as h(x) = P(X = x; α)/S(x − 1; α). Then, the HRF of the PXL distribution can be formulated as follows:

$$h(x; \alpha) = \frac{\alpha^{2}\left(x + \alpha^{2} + 3(1 + \alpha)\right)}{(1 + \alpha)\left[1 + \alpha\left(x + \alpha^{2} + 3(1 + \alpha)\right)\right]}.$$

It is easy to see that the limiting behavior of the HRF at the upper limit is lim_{x→∞} h(x; α) = α/(1 + α) < α. As a result, the parameter α may be regarded as a strict upper bound on the HRF, which is a key feature of lifetime probability distributions. Few discrete distributions contain parameters that can be readily interpreted in terms of failure rate functions. The geometric distribution is an exception, although in this instance the HRF is constant. In Proposition 1, we prove that the PXL distribution always has an increasing failure rate.

Proposition 1 .

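The HRF of the PXL distribution is increasing. According to Glaser (1980) and from the PMF of the PXL distribution, treating x as continuous:

$$\eta(x) = -\frac{P'(X = x; \alpha)}{P(X = x; \alpha)} = \ln(1 + \alpha) - \frac{1}{x + \alpha^{2} + 3(1 + \alpha)},$$

and it follows that:

$$\eta'(x) = \frac{1}{\left(x + \alpha^{2} + 3(1 + \alpha)\right)^{2}} > 0$$

∀ x ≥ 0, α > 0, implying that h(x) is increasing. Figure 3 illustrates some HRF plots of the PXL model for various values of the model parameter.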
Figure 3

HRF plots of the PXL distribution.
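
The monotonicity of the HRF can also be checked numerically. The sketch below uses the definition h(x) = P(X = x; α)/S(x − 1; α) stated in this section, with the survival function S(x) = {1 + α(x + 4 + α(3 + α))}/(1 + α)^{4+x} as an assumed closed form; under that definition the values increase toward α/(1 + α).

```python
import math

def pxl_pmf(x, alpha):
    """PXL probability mass at x (log-scale evaluation)."""
    return math.exp(2 * math.log(alpha)
                    + math.log(x + alpha**2 + 3 * (1 + alpha))
                    - (4 + x) * math.log1p(alpha))

def pxl_sf(x, alpha):
    """Survival function S(x) = P(X > x); equals 1 below the support."""
    if x < 0:
        return 1.0
    return (1 + alpha * (x + 4 + alpha * (3 + alpha))) * math.exp(-(4 + x) * math.log1p(alpha))

def pxl_hrf(x, alpha):
    """Hazard rate h(x) = P(X = x) / S(x - 1)."""
    return pxl_pmf(x, alpha) / pxl_sf(x - 1, alpha)

if __name__ == "__main__":
    alpha = 1.0
    for x in (0, 1, 5, 20, 100):
        print(x, pxl_hrf(x, alpha))
    print("limit under this definition:", alpha / (1 + alpha))
```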

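The reversed hazard rate of the PXL distribution, r(x; α) = P(X = x; α)/F(x; α), is as follows:

$$r(x; \alpha) = \frac{\alpha^{2}\left(x + \alpha^{2} + 3(1 + \alpha)\right)}{(1 + \alpha)^{4 + x} - 1 - \alpha\left(x + 4 + \alpha(3 + \alpha)\right)},$$

whereas the second rate of failure, h*(x; α) = log[S(x − 1; α)/S(x; α)], and the Mills ratio, S(x; α)/P(X = x; α), can be expressed, respectively, as follows:

$$h^{*}(x; \alpha) = \log\left[\frac{(1 + \alpha)\left(1 + \alpha\left(x + 3 + \alpha(3 + \alpha)\right)\right)}{1 + \alpha\left(x + 4 + \alpha(3 + \alpha)\right)}\right], \qquad \text{Mills ratio} = \frac{1 + \alpha\left(x + 4 + \alpha(3 + \alpha)\right)}{\alpha^{2}\left(x + \alpha^{2} + 3(1 + \alpha)\right)}.$$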

4.2. Mean Residual Life

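For the random variable X, the mean residual life (MRL), or mean remaining lifetime, is the expected value of X − i, given that the item has survived to time i. Thus, the MRL concept can be used effectively in stochastic ageing and dependence for reliability. The unconditional mean of the distribution, E(X), is a special case of the MRL for i = 0. For a discrete random variable, the MRL function is defined as follows:

$$\varepsilon(i) = E(X - i \mid X \geq i) = \frac{1}{1 - F(i - 1; \alpha)} \sum_{j=i+1}^{w} \left[1 - F(j - 1; \alpha)\right], \quad i \in \mathbb{N}_0,$$

where ℕ0 = {0, 1, 2, …, w} and 0 < w < ∞. Let X be a PXL random variable, whose support is unbounded, so that the sum runs to infinity; then, the MRL is defined as follows:

$$\varepsilon(i) = \frac{(1 + \alpha)^{3 + i}}{1 + \alpha\left(i + 3 + \alpha(3 + \alpha)\right)} \sum_{j=i+1}^{\infty} \frac{1 + \alpha\left(j + 3 + \alpha(3 + \alpha)\right)}{(1 + \alpha)^{3 + j}},$$

and after simple algebra steps, we get the MRL in an explicit form as follows:

$$\varepsilon(i) = \frac{2 + \alpha\left(i + 4 + \alpha(3 + \alpha)\right)}{\alpha\left[1 + \alpha\left(i + 3 + \alpha(3 + \alpha)\right)\right]}.$$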
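
The MRL can be evaluated either from its defining sum or from a closed form. The sketch below assumes the survival function S(x) = {1 + α(x + 4 + α(3 + α))}/(1 + α)^{4+x} and the resulting expression ε(i) = {2 + α(i + 4 + α(3 + α))}/(α[1 + α(i + 3 + α(3 + α))]); both are reconstructions, and the function names are illustrative.

```python
import math

def pxl_sf(x, alpha):
    """Survival function S(x) = P(X > x); equals 1 below the support."""
    if x < 0:
        return 1.0
    return (1 + alpha * (x + 4 + alpha * (3 + alpha))) * math.exp(-(4 + x) * math.log1p(alpha))

def mrl_by_sum(i, alpha, x_max=5000):
    """E(X - i | X >= i) as sum_{j > i} S(j - 1) / S(i - 1), truncated at x_max."""
    return sum(pxl_sf(j - 1, alpha) for j in range(i + 1, x_max + 1)) / pxl_sf(i - 1, alpha)

def mrl_closed(i, alpha):
    """Assumed closed form of the mean residual life."""
    q = alpha * (3 + alpha)
    return (2 + alpha * (i + 4 + q)) / (alpha * (1 + alpha * (i + 3 + q)))

if __name__ == "__main__":
    for i in (0, 1, 5):
        print(i, mrl_by_sum(i, 1.0), mrl_closed(i, 1.0))
```

At i = 0 the MRL reduces to the mean, so `mrl_closed(0, 1.0)` returns 1.25, the mean reported for α = 1.0 in Table 1.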

5. Various Estimation Techniques

This section covers parameter estimation for the PXL distribution using different estimation methods. The considered methods are maximum likelihood, moments, Anderson–Darling, Cramér–von Mises, ordinary least squares, and weighted least squares.

5.1. Maximum-Likelihood Estimation (MLE)

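Suppose x = (x1, x2, x3, …, xn) is a random sample of size n from the PXL distribution. Then, the log-likelihood (L) function is given as follows:

$$L(\alpha) = 2n\ln\alpha - \left(4n + \sum_{i=1}^{n} x_i\right)\ln(1 + \alpha) + \sum_{i=1}^{n} \ln\left(x_i + \alpha^{2} + 3(1 + \alpha)\right).$$

Differentiating with respect to α, we get the following:

$$\frac{\partial L}{\partial \alpha} = \frac{2n}{\alpha} - \frac{4n + \sum_{i=1}^{n} x_i}{1 + \alpha} + \sum_{i=1}^{n} \frac{2\alpha + 3}{x_i + \alpha^{2} + 3(1 + \alpha)}. \tag{15}$$

Since setting equation (15) to zero does not yield a closed-form solution, a numerical procedure should be used to obtain the maximum-likelihood estimator.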
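
One simple way to solve the likelihood equation numerically is bisection on the score function. The sketch below is an illustration rather than the authors' procedure: it assumes the log-likelihood 2n ln α − (4n + Σx) ln(1 + α) + Σ ln(x + α² + 3(1 + α)), uses the observed frequencies of data set I from Table 7, and assumes the score changes sign once on the search interval.

```python
import math

# Observed frequencies of data set I (Table 7): count value -> frequency.
DATA_I = {0: 43, 1: 35, 2: 17, 3: 11, 4: 5, 5: 4, 6: 1, 7: 2, 8: 2}

def log_likelihood(alpha, freq):
    """PXL log-likelihood for grouped count data."""
    n = sum(freq.values())
    total = sum(x * f for x, f in freq.items())
    c = alpha**2 + 3 * (1 + alpha)
    return (2 * n * math.log(alpha) - (4 * n + total) * math.log1p(alpha)
            + sum(f * math.log(x + c) for x, f in freq.items()))

def score(alpha, freq):
    """Derivative of the log-likelihood with respect to alpha."""
    n = sum(freq.values())
    total = sum(x * f for x, f in freq.items())
    c = alpha**2 + 3 * (1 + alpha)
    return (2 * n / alpha - (4 * n + total) / (1 + alpha)
            + sum(f * (2 * alpha + 3) / (x + c) for x, f in freq.items()))

def mle(freq, lo=1e-6, hi=50.0, iterations=200):
    """Bisection on the score; assumes one sign change on (lo, hi)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if score(mid, freq) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    alpha_hat = mle(DATA_I)
    print("alpha_hat:", alpha_hat)
    print("-L:", -log_likelihood(alpha_hat, DATA_I))
```

For data set I, this should land close to the MLE of 0.8661 and the negative log-likelihood of 200.63 reported in Table 7.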

5.2. Method of Moment Estimation (MOME)

In the MOME approach, the population mean is equated to the sample mean. The estimate of α is therefore the solution of the following nonlinear equation:

$$\bar{x} = \frac{\alpha^{2} + 2\alpha + 2}{\alpha(1 + \alpha)^{2}},$$

where x̄ is the sample mean.
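
Because the population mean is strictly decreasing in α, the moment equation can be solved by bisection. The sketch below assumes the mean (α² + 2α + 2)/(α(1 + α)²); the function names and the search interval are illustrative choices.

```python
def pxl_mean(alpha):
    """Assumed population mean of the PXL distribution."""
    return (alpha**2 + 2 * alpha + 2) / (alpha * (1 + alpha) ** 2)

def mome(sample_mean, lo=1e-8, hi=1e4, iterations=200):
    """Solve pxl_mean(alpha) = sample_mean by bisection (mean decreases in alpha)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if pxl_mean(mid) > sample_mean:
            lo = mid   # mean too large: alpha must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    print(mome(178 / 120))  # data set I: 178 borers over 120 hills
    print(mome(0.54))       # data set II: reported sample mean
```

With the sample means of the two data sets analyzed later in the paper, the results should be close to the MOME values 0.86747 and 2.05081 listed in Tables 9 and 10.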

5.3. Anderson–Darling Estimation (ADE)

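The ADE is based on the difference between the empirical and fitted CDFs. The ADE of α follows by minimizing:

$$A(\alpha) = -n - \frac{1}{n}\sum_{i=1}^{n} (2i - 1)\left[\ln F\left(x_{(i)}; \alpha\right) + \ln\left(1 - F\left(x_{(n+1-i)}; \alpha\right)\right)\right]$$

with respect to α, where x_(1) ≤ x_(2) ≤ … ≤ x_(n) are the ordered sample values.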

5.4. Cramér–von Mises Estimation (CVME)

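The CVME is based on the difference between the empirical and fitted CDFs. The CVME of α follows by minimizing:

$$C(\alpha) = \frac{1}{12n} + \sum_{i=1}^{n} \left[F\left(x_{(i)}; \alpha\right) - \frac{2i - 1}{2n}\right]^{2}$$

with respect to α.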

5.5. Ordinary Least-Squares and Weighted Least-Squares Estimation

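Let X_(i) be the ith order statistic in a sample of size n. We adopt lower case for sample values. It is well known that:

$$E\left[F\left(X_{(i)}\right)\right] = \frac{i}{n + 1}, \qquad \mathrm{Var}\left[F\left(X_{(i)}\right)\right] = \frac{i(n - i + 1)}{(n + 1)^{2}(n + 2)}.$$

Thus, the least-squares estimate (LSE) of α, say α̂_LSE, can be derived by minimizing:

$$\sum_{i=1}^{n} \left[F\left(x_{(i)}; \alpha\right) - \frac{i}{n + 1}\right]^{2}$$

with respect to α. The weighted least-squares estimate (WLSE) of α, say α̂_WLSE, can be determined by minimizing:

$$\sum_{i=1}^{n} \frac{(n + 1)^{2}(n + 2)}{i(n - i + 1)} \left[F\left(x_{(i)}; \alpha\right) - \frac{i}{n + 1}\right]^{2}$$

with respect to α.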
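
The four CDF-based criteria (Anderson–Darling, Cramér–von Mises, ordinary and weighted least squares) can be minimized with any one-dimensional optimizer. The sketch below uses the standard textbook forms of these objectives, which are assumed here because the original expressions are not reproduced in the text, together with a plain grid search; the data are the data set I frequencies from Table 7, and tied observations are handled simply by keeping them in sorted order.

```python
import math

DATA_I = {0: 43, 1: 35, 2: 17, 3: 11, 4: 5, 5: 4, 6: 1, 7: 2, 8: 2}
SAMPLE = sorted(x for x, f in DATA_I.items() for _ in range(f))

def pxl_cdf(x, alpha):
    """Assumed closed-form PXL CDF."""
    tail = (1 + alpha * (x + 4 + alpha * (3 + alpha))) * math.exp(-(4 + x) * math.log1p(alpha))
    return 1.0 - tail

def cvm_objective(alpha, xs):
    n = len(xs)
    return 1 / (12 * n) + sum((pxl_cdf(x, alpha) - (2 * i - 1) / (2 * n)) ** 2
                              for i, x in enumerate(xs, start=1))

def ols_objective(alpha, xs):
    n = len(xs)
    return sum((pxl_cdf(x, alpha) - i / (n + 1)) ** 2 for i, x in enumerate(xs, start=1))

def wls_objective(alpha, xs):
    n = len(xs)
    return sum((n + 1) ** 2 * (n + 2) / (i * (n - i + 1))
               * (pxl_cdf(x, alpha) - i / (n + 1)) ** 2
               for i, x in enumerate(xs, start=1))

def ad_objective(alpha, xs):
    n = len(xs)
    s = sum((2 * i - 1) * (math.log(pxl_cdf(xs[i - 1], alpha))
                           + math.log(1 - pxl_cdf(xs[n - i], alpha)))
            for i in range(1, n + 1))
    return -n - s / n

def grid_minimize(objective, xs):
    """Coarse grid search over alpha in [0.05, 5.00] with step 0.01."""
    grid = [k / 100 for k in range(5, 501)]
    return min(grid, key=lambda a: objective(a, xs))

if __name__ == "__main__":
    for name, obj in (("ADE", ad_objective), ("CVME", cvm_objective),
                      ("OLSE", ols_objective), ("WLSE", wls_objective)):
        print(name, grid_minimize(obj, SAMPLE))
```

A grid search is crude but avoids assumptions about the shape of each objective; a local optimizer started from the grid minimum would refine the estimate. The exact values can differ from those in Table 9 depending on how ties and the objective definitions are handled.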

6. Simulation

To assess the accuracy of the six estimators described previously, we conducted a comprehensive simulation study. We generated samples of size n = 25, 50, 100, 200, and 500 from the PXL distribution for α = 0.3, 0.5, 1.0, and 1.5, and then calculated the average values (AVEs) of the MLE, MOME, LSE (labeled OLSE in the tables), WLSE, CVME, and ADE, together with their average absolute biases (AABs), mean relative errors (MREs), and mean square errors (MSEs). We ran the simulation 5000 times for each setting and derived these metrics by comparing the estimates with the true value of α for all estimation methods. The findings in Tables 3–6 were obtained using the conjugate gradient (CG) method of the optim function in R. The findings show that as the sample size n increased, the AVEs became closer to the real values of α. Furthermore, when n increases, the AABs, MREs, and MSEs for all estimators decreased.
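
The structure of such a Monte Carlo study can be sketched in a few lines. The example below is a simplified illustration and not the authors' R code: it draws PXL samples by inverse-CDF sampling, applies only the moment estimator, and summarizes the replications with commonly used definitions of bias, MRE, and MSE, which are assumptions here because the original formulas are not reproduced in the text.

```python
import math
import random
import statistics

def pxl_pmf(x, alpha):
    """PXL probability mass at x (log-scale evaluation)."""
    return math.exp(2 * math.log(alpha)
                    + math.log(x + alpha**2 + 3 * (1 + alpha))
                    - (4 + x) * math.log1p(alpha))

def pxl_mean(alpha):
    return (alpha**2 + 2 * alpha + 2) / (alpha * (1 + alpha) ** 2)

def mome(sample_mean, lo=1e-8, hi=1e4, iterations=200):
    """Moment estimator: solve pxl_mean(alpha) = sample_mean by bisection."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if pxl_mean(mid) > sample_mean:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def pxl_sample(n, alpha, rng):
    """Inverse-CDF sampling: accumulate PMF terms until they exceed a uniform draw."""
    out = []
    for _ in range(n):
        u, x, cum = rng.random(), 0, pxl_pmf(0, alpha)
        while u > cum and x < 10_000:
            x += 1
            cum += pxl_pmf(x, alpha)
        out.append(x)
    return out

def simulate_mome(alpha, n, reps, seed=0):
    """Replicate the estimator and summarize (assumed metric definitions)."""
    rng = random.Random(seed)
    est = [mome(statistics.fmean(pxl_sample(n, alpha, rng))) for _ in range(reps)]
    ave = statistics.fmean(est)
    return {
        "AVE": ave,
        "abs_bias_of_average": abs(ave - alpha),
        "MRE": statistics.fmean(abs(e - alpha) / alpha for e in est),
        "MSE": statistics.fmean((e - alpha) ** 2 for e in est),
    }

if __name__ == "__main__":
    print(simulate_mome(alpha=0.5, n=100, reps=300, seed=1))
```

The number of replications is kept small here so that the sketch runs quickly; the study described in this section uses 5000 replications and all six estimators.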
Table 3

Simulation results of PXL distribution for α=0.3.

Metric   n      MLE      MOME     ADE      CVME     OLSE     WLSE
AVEs     25     0.3342   0.3341   0.2907   0.2913   0.2912   0.2863
AVEs     50     0.3296   0.3307   0.2886   0.2887   0.2887   0.2819
AVEs     100    0.3279   0.3279   0.2876   0.2884   0.2881   0.2782
AVEs     200    0.3264   0.3260   0.2874   0.2876   0.2876   0.2750
AVEs     500    0.3257   0.3261   0.2875   0.2873   0.2872   0.2711

AABs     25     0.0342   0.0341   0.0093   0.0087   0.0088   0.0137
AABs     50     0.0296   0.0307   0.0114   0.0113   0.0113   0.0181
AABs     100    0.0279   0.0279   0.0124   0.0116   0.0119   0.0218
AABs     200    0.0264   0.0260   0.0126   0.0124   0.0124   0.0250
AABs     500    0.0257   0.0261   0.0125   0.0127   0.0128   0.0289

MREs     25     0.1626   0.0487   0.1168   0.1215   0.1234   0.1135
MREs     50     0.1252   0.0383   0.0874   0.0868   0.0879   0.0892
MREs     100    0.1041   0.0312   0.0657   0.0670   0.0663   0.0795
MREs     200    0.0912   0.0269   0.0531   0.0536   0.0533   0.0840
MREs     500    0.0859   0.0262   0.0445   0.0453   0.0456   0.0963

MSEs     25     0.0042   0.0041   0.0019   0.0021   0.0021   0.0017
MSEs     50     0.0023   0.0024   0.0010   0.0010   0.0011   0.0011
MSEs     100    0.0015   0.0015   0.0006   0.0006   0.0006   0.0008
MSEs     200    0.0010   0.0010   0.0004   0.0004   0.0004   0.0008
MSEs     500    0.0008   0.0008   0.0002   0.0003   0.0003   0.0009
Table 4

Simulation results of PXL distribution for α=0.5.

Metric   n      MLE      MOME     ADE      CVME     OLSE     WLSE
AVEs     25     0.6016   0.6006   0.4629   0.4614   0.4613   0.4387
AVEs     50     0.5892   0.5914   0.4618   0.4588   0.4587   0.4282
AVEs     100    0.5848   0.5860   0.4594   0.4577   0.4567   0.4164
AVEs     200    0.5818   0.5834   0.4585   0.4564   0.4563   0.4065
AVEs     500    0.5800   0.5811   0.4583   0.4566   0.4562   0.3950

AABs     25     0.1016   0.1006   0.0371   0.0386   0.0387   0.0613
AABs     50     0.0892   0.0914   0.0382   0.0412   0.0413   0.0718
AABs     100    0.0848   0.0860   0.0406   0.0423   0.0433   0.0836
AABs     200    0.0818   0.0834   0.0415   0.0436   0.0437   0.0935
AABs     500    0.0800   0.0811   0.0417   0.0434   0.0438   0.1050

MREs     25     0.2295   0.1153   0.1239   0.1244   0.1267   0.1379
MREs     50     0.1883   0.0965   0.0995   0.1035   0.1029   0.1454
MREs     100    0.1720   0.0874   0.0885   0.0914   0.0932   0.1672
MREs     200    0.1639   0.0835   0.0843   0.0886   0.0885   0.1870
MREs     500    0.1601   0.0811   0.0835   0.0868   0.0876   0.2099

MSEs     25     0.0234   0.0229   0.0056   0.0056   0.0058   0.0066
MSEs     50     0.0136   0.0142   0.0035   0.0038   0.0038   0.0063
MSEs     100    0.0100   0.0103   0.0027   0.0028   0.0029   0.0075
MSEs     200    0.0080   0.0083   0.0022   0.0024   0.0024   0.0090
MSEs     500    0.0069   0.0071   0.0019   0.0021   0.0021   0.0111
Table 5

Simulation results of PXL distribution for α=1.0.

Metric   n      MLE      MOME     ADE      CVME     OLSE     WLSE
AVEs     25     1.5469   1.5542   0.7898   0.7750   0.7717   0.6919
AVEs     50     1.4910   1.4875   0.7875   0.7703   0.7708   0.6658
AVEs     100    1.4592   1.4636   0.7857   0.7704   0.7685   0.6403
AVEs     200    1.4475   1.4504   0.7865   0.7692   0.7683   0.6180
AVEs     500    1.4408   1.4449   0.7851   0.7687   0.7681   0.5924

AABs     25     0.5469   0.5542   0.2102   0.2250   0.2283   0.3081
AABs     50     0.4910   0.4875   0.2125   0.2297   0.2292   0.3342
AABs     100    0.4592   0.4636   0.2143   0.2296   0.2315   0.3597
AABs     200    0.4475   0.4504   0.2135   0.2308   0.2317   0.3820
AABs     500    0.4408   0.4449   0.2149   0.2313   0.2319   0.4076

MREs     25     0.5529   0.5598   0.2107   0.2253   0.2285   0.3082
MREs     50     0.4919   0.4880   0.2125   0.2297   0.2292   0.3342
MREs     100    0.4592   0.4636   0.2143   0.2296   0.2315   0.3597
MREs     200    0.4475   0.4504   0.2135   0.2308   0.2317   0.3820
MREs     500    0.4408   0.4449   0.2149   0.2313   0.2319   0.4076

MSEs     25     0.4942   0.5189   0.0517   0.0574   0.0588   0.0994
MSEs     50     0.3217   0.3169   0.0490   0.0561   0.0560   0.1137
MSEs     100    0.2441   0.2499   0.0479   0.0543   0.0552   0.1303
MSEs     200    0.2161   0.2190   0.0466   0.0541   0.0545   0.1463
MSEs     500    0.2003   0.2043   0.0466   0.0538   0.0541   0.1663
Table 6

Simulation results of PXL distribution for α=1.5.

Metric   n      MLE      MOME     ADE      CVME     OLSE     WLSE
AVEs     25     3.3389   3.3799   0.9800   0.9544   0.9549   0.8498
AVEs     50     3.0594   3.0595   0.9821   0.9535   0.9550   0.8133
AVEs     100    2.9521   2.9474   0.9806   0.9534   0.9533   0.7817
AVEs     200    2.8950   2.8800   0.9808   0.9522   0.9523   0.7540
AVEs     500    2.8558   2.8612   0.9808   0.9518   0.9520   0.7209

AABs     25     1.8389   1.8799   0.5200   0.5456   0.5451   0.6502
AABs     50     1.5594   1.5595   0.5179   0.5465   0.5450   0.6867
AABs     100    1.4521   1.4474   0.5194   0.5466   0.5467   0.7183
AABs     200    1.3950   1.3800   0.5192   0.5478   0.5477   0.7460
AABs     500    1.3558   1.3612   0.5192   0.5482   0.5480   0.7791

MREs     25     1.2271   1.8812   0.3466   0.3637   0.3634   0.4335
MREs     50     1.0396   1.5595   0.3452   0.3643   0.3633   0.4578
MREs     100    0.9681   1.4474   0.3463   0.3644   0.3645   0.4789
MREs     200    0.9300   1.3800   0.3461   0.3652   0.3651   0.4973
MREs     500    0.9039   1.3612   0.3461   0.3655   0.3653   0.5194

MSEs     25     6.6135   7.2913   0.2765   0.3036   0.3032   0.4279
MSEs     50     3.2428   3.2877   0.2713   0.3017   0.3001   0.4743
MSEs     100    2.4086   2.4046   0.2714   0.3003   0.3005   0.5173
MSEs     200    2.0792   2.0315   0.2704   0.3008   0.3007   0.5572
MSEs     500    1.8857   1.9028   0.2698   0.3008   0.3006   0.6073

7. Applications

In this section, the flexibility of the PXL distribution is illustrated using two distinct real data sets. The first data set consists of biological experiment data on the European corn borer [17] and is shown in Table 7. The investigator counted the number of borers per hill of corn in an experiment conducted randomly on 8 hills in 15 replications. The mean, variance, and index of dispersion of X are 1.4833, 3.193, and 2.1526, respectively. Since the data are over-dispersed, the PXL distribution is a suitable candidate.
Table 7

Goodness of fit for data set I.

                                Expected frequency
X                    Observed   PXL      DB       Poi      DPr      DR       DBH      DITL     PA
0                    43         47.1     32.7     27.2     64.5     15.9     68.1     52.2     39.6
1                    35         29.2     39.6     40.4     20.1     36.2     22.0     30.4     33.7
2                    17         17.8     24.3     30.0     9.7      34.6     10.5     14.1     21.5
3                    11         10.7     12.5     14.8     5.6      21.0     6.0      7.5      12.2
4                    5          6.3      6.0      5.5      3.7      8.9      3.8      4.4      6.5
5                    4          3.7      2.7      1.6      2.6      2.7      2.5      2.8      3.3
6                    1          2.2      1.2      0.4      1.9      0.6      1.7      1.9      1.7
7                    2          1.3      0.5      0.1      1.5      0.1      1.3      1.3      0.8
8                    2          1.7      0.4      0.0      10.4     0.0      4.2      5.4      0.7
Total                120        120      120      120      120      120      120      120      120
α (MLE)                         0.8661   0.2767   1.4833   1.1112   1.8743   0.8655   1.9840   0.6740
S.E.                            0.0822   0.1598   0.1111   0.1027   0.0874   0.0385   0.1832   0.0667

χ²                              1.8227   9.6428   21.898   36.243   60.179   25.142   6.9342   2.7097
Degrees of freedom              4        4        3        4        3        3        4        4
p value                         0.7683   0.0468   <0.01    <0.01    <0.01    <0.01    <0.01    0.1394
−L                              200.63   204.68   219.19   220.62   235.23   214.05   205.15   201.22
AIC                             403.26   411.35   440.38   443.24   472.45   430.10   412.30   404.44
BIC                             406.04   414.14   443.16   446.02   475.24   432.89   415.09   407.23
The second data set shows the number of mammalian cytogenetic dosimetry lesions produced by streptogramin (NSC-45383) exposure in rabbit lymphoblasts at −70 μg/kg [18]. The second data set is shown in Table 8. The mean, variance, and index of dispersion of X are 0.54, 0.8312, and 1.5392, respectively.
Table 8

Fitted PXL distribution and other competitor distributions to second data set.

                                Expected frequency
X                    Observed   PXL      DB       Poi      DPr      DR       DBH      DITL     PA
0                    200        194.6    174.9    174.8    216.9    124.8    209.6    196.9    186.0
1                    57         68.5     94.6     94.4     43.9     140.3    54.1     69.3     79.1
2                    30         24.0     24.0     25.5     16.2     32.5     19.9     19.9     25.2
3                    7          8.4      5.2      4.6      7.8      2.3      8.5      7.2      7.1
≥4                   6          4.5      1.3      0.7      15.2     0.1      7.9      6.8      2.5
Total                300        300      300      300      300      300      300      300      300
α (MLE)                         2.0512   1.2332   0.5400   1.8518   0.9643   0.6030   3.7134   1.8519
S.E.                            0.1763   0.0551   0.0424   0.1124   0.0298   0.0376   0.2243   0.1639

χ²                              3.5508   26.547   30.431   22.663   96.650   6.4361   7.4422   9.3468
Degrees of freedom              2        2        2        3        1        3        3        2
p value                         0.1694   <0.01    <0.01    <0.01    <0.01    0.0922   0.0591   <0.01
−L                              299.31   311.74   314.23   312.94   371.12   301.70   302.76   302.41
AIC                             600.63   625.48   630.45   627.88   744.23   605.41   607.53   606.83
BIC                             604.33   629.18   634.16   631.59   747.94   609.11   611.23   610.53
Some competitive models, namely the discrete Bilal (DB) by Altun et al. [19]; discrete Pareto (DPr) by Krishna and Pundir [20]; discrete Rayleigh (DR) by Roy [21]; discrete Burr–Hatke (DBH) by El-Morshedy et al. (2020); discrete inverted Topp–Leone (DITL) by Eldeeb et al. [22]; Poisson–Ailamujia (PA) by Hassan et al. [6]; and Poisson (Poi) distributions, are used herein for comparison. To select the best model for data sets I and II, we use the Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the negative log-likelihood (−L) as indicators of the relative quality of the statistical models for a given data set; smaller values indicate a better fit. Moreover, the chi-square (χ²) test is used with its corresponding p value, where the estimated probabilities under the null hypothesis are as follows:

$$\hat{p}_x = P(X = x; \hat{\alpha}), \quad x = 0, 1, 2, \ldots$$

The estimated expected frequencies are obtained as n p̂_x, where n is the sample size. The results of the chi-square test are reported in Tables 7 and 8. For the PXL model, the p values exceed 0.05 for both data sets. Thus, we cannot reject the null hypothesis at the 5% level of significance, and the PXL distribution is a good fit for these data sets.

For data set I, the PXL and PA models work quite well, but the PXL is the best, and Figure 4 supports the empirical results listed in Table 7. For data set II, the PXL, DBH, and DITL models work quite well, but the PXL is the best, and Figure 5 supports the empirical results reported in Table 8. Since one of the major aims of this study is to obtain the best estimators for data sets I and II, several estimation techniques have been applied for this purpose. Tables 9 and 10 report the estimates for data sets I and II based on the various estimation techniques.
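
The goodness-of-fit computation can be illustrated for data set I. The sketch below takes the MLE α̂ = 0.8661 and the observed frequencies from Table 7; pooling the counts x ≥ 5 into one cell is an assumption inferred from the reported 4 degrees of freedom (6 cells, minus 1, minus 1 estimated parameter), and the p value uses the closed-form chi-square survival function for 4 degrees of freedom.

```python
import math

OBSERVED_I = [43, 35, 17, 11, 5, 4, 1, 2, 2]  # frequencies for x = 0..8 (Table 7)
ALPHA_HAT = 0.8661                             # MLE reported in Table 7

def pxl_pmf(x, alpha):
    """PXL probability mass at x (log-scale evaluation)."""
    return math.exp(2 * math.log(alpha)
                    + math.log(x + alpha**2 + 3 * (1 + alpha))
                    - (4 + x) * math.log1p(alpha))

def group_tail(counts, k):
    """Keep the first k-1 cells and pool the rest into one '>= k-1' cell."""
    return counts[:k - 1] + [sum(counts[k - 1:])]

def expected_frequencies(alpha, n, k):
    """n * P(X = x) for x = 0..k-2, with the last cell holding P(X >= k-1)."""
    probs = [pxl_pmf(x, alpha) for x in range(k - 1)]
    probs.append(1.0 - sum(probs))
    return [n * p for p in probs]

def chi_square(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf_df4(x):
    """Upper tail probability of a chi-square variable with 4 degrees of freedom."""
    return math.exp(-x / 2) * (1 + x / 2)

if __name__ == "__main__":
    observed = group_tail(OBSERVED_I, 6)
    expected = expected_frequencies(ALPHA_HAT, sum(OBSERVED_I), 6)
    stat = chi_square(observed, expected)
    print([round(e, 1) for e in expected])
    print("chi-square:", stat, "p value:", chi2_sf_df4(stat))
```

Under this grouping, the statistic and p value come out close to the values 1.8227 and 0.7683 reported for the PXL model in Table 7; small differences are expected from rounding of α̂.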
Figure 4

Fitted PMFs of all selected models for the first data set.

Figure 5

Fitted PMFs of all selected models for the second data set.

Table 9

Estimation and goodness of fit for data set I.

Method   α         χ²         p value
ADE      0.60116   14.49629   0.02456
CVME     0.59562   15.04585   0.01990
OLSE     0.59542   15.04585   0.01990
WLSE     0.50906   28.80345   <0.001
MOME     0.86747   1.839527   0.76524
Table 10

Estimation and goodness of fit for data set II.

Method   α         χ²         p value
ADE      0.90250   96.36338   <0.001
CVME     0.87576   104.0459   <0.001
OLSE     0.87573   104.0550   <0.001
WLSE     0.67663   189.2819   <0.001
MOME     2.05081   3.558362   0.16666
For both data sets, the MLE and MOME approaches work quite well, whereas the other estimation methods yield p values below 0.05. The MLE gives the smallest χ² value for data set I and for data set II, so it is the best method for these data.

8. Conclusion

In this study, a new one-parameter Poisson–XLindley (PXL) distribution has been proposed for modeling count data. Some distributional properties are derived and studied in detail. It was found that the properties of the PXL can be expressed in closed forms, which makes it a proper probability tool for building regression and time-series models for different types of data sets in various fields. The new probability mass function can be utilized to model positively skewed data with leptokurtic shape. Moreover, the PXL model can be used to model equi- and over-dispersed phenomena with an increasing hazard rate function. Different estimation approaches have been used to estimate the model parameter. The behavior of these methods has been explored using Monte Carlo simulation. Finally, two applications to real-life data have been discussed to illustrate the flexibility of the new discrete model.

1.  Types of chromosome structural change induced by the irradiation of Tradescantia microspores.

Authors:  D G CATCHESIDE; D E LEA; J M THODAY
Journal:  J Genet       Date:  1946-01       Impact factor: 1.166
