
Lomax exponential distribution with an application to real-life data.

Abstract

In this paper, a new modification of the Lomax distribution, named the Lomax exponential (LE) distribution, is considered. The proposed distribution is flexible in modeling lifetime data whose hazard rate both increases and decreases (non-monotonic shapes). We derive explicit expressions for the incomplete moments, the quantile function, the density functions of the order statistics, and other properties. The Renyi entropy of the proposed distribution is also obtained. Moreover, the parameters are estimated by the usual maximum likelihood method, and the information matrix is determined. The potential of the proposed distribution is illustrated using two real data sets. To judge the performance of the model, the goodness-of-fit measures AIC, CAIC, BIC, and HQIC are used. From the results, it is concluded that the proposed model performs better than the Lomax, Weibull Lomax, and exponential Lomax distributions.


Year: 2019 PMID: 31826022 PMCID: PMC6905582 DOI: 10.1371/journal.pone.0225827

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

In probability theory, it has become common practice in recent years to modify existing probability distributions in order to make them more flexible. These modifications rely on different methods, such as adding parameters, transforming the original distribution, or mixing two distributions. Motivated by these methods, Ghitany and Al-Awadhi [1] proposed a compound form of the Lomax distribution with the exponential distribution. Cordeiro et al. [2] modified the gamma-G family of distributions. Zografos et al. [3] employed the cumulative distribution function (Cdf) of the Lomax distribution as a baseline distribution. Lemonte et al. [4] and Lai et al. [5] used the idea of combining two distributions. Lemonte et al. [6] applied the McDonald-G family of distributions with a Lomax baseline. Ibrahim et al. [7] modified the Lomax distribution by raising its Cdf to a positive real power. Ashour and Eltehiwy [8], Merovci and Puka [9] and Khan et al. [10] used the well-known transmutation technique to generate new probability distributions.

In this paper, we propose a modification of the Lomax distribution. A positive random variable Y has the Lomax distribution with parameters a and b if its Cdf takes the form

$$F(y) = 1 - \left(1 + \frac{y}{b}\right)^{-a}, \qquad y > 0, \tag{1}$$

where a > 0 and b > 0 are the shape and scale parameters, respectively. The probability density function corresponding to (1) is

$$f(y) = \frac{a}{b}\left(1 + \frac{y}{b}\right)^{-(a+1)}, \qquad y > 0. \tag{2}$$

This distribution has been modified by many researchers. For example, Cordeiro et al. [2] explored the gamma-Lomax distribution and discussed its applications to real data sets. Lemonte et al. [6] presented an extended Lomax distribution. Ibrahim et al. [7] proposed a new three-parameter distribution, the exponentiated Lomax distribution. Ashour and Eltehiwy [8] introduced another modification, the transmuted exponentiated Lomax distribution. Tahir et al. [11] discussed the Weibull Lomax distribution with applications to real data.

The Lomax distribution is a heavily skewed distribution that plays a vital role in modeling lifetime data arising in business, computer science, the medical and biological sciences, engineering, economics, income and wealth inequality, Internet traffic, and reliability. The Lomax (or Pareto II) distribution has been applied to data on income and wealth [12, 13], the distribution of computer files on servers [14], and reliability and life testing [15], among others. It is an alternative to the exponential distribution when the data are heavy-tailed [16]. Ahsanullah [17] applied the Lomax distribution to record values. El-Bassiouny et al. [18] investigated the exponential Lomax distribution, and Afify et al. [19] defined the transmuted Weibull-Lomax distribution with real-world applications. For other probability distributions and their applications in different fields, see [20-31] and [32-36], respectively.

In reliability theory, life-testing experiments often produce data with non-monotonic hazard rate shapes, and many existing distributions fail to fit such data adequately. The main goal of this paper is to provide a new, more flexible probability model that represents such data sets adequately and has tractable statistical properties. The proposed model, obtained through a transformation of the Lomax distribution, is referred to as the Lomax exponential distribution.
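As a numerical companion to the Lomax definition (1) and (2), the following Python sketch evaluates the Lomax Cdf, pdf, survival and hazard functions. It is an illustration only, not code from the paper, and the function names are hypothetical. It also makes one motivating point concrete: the Lomax hazard f(y)/(1 − F(y)) simplifies to a/(b + y), which is strictly decreasing in y, so the baseline Lomax model cannot reproduce non-monotonic hazard shapes.

```python
import math


def lomax_cdf(y: float, a: float, b: float) -> float:
    """Lomax Cdf (1): F(y) = 1 - (1 + y/b)^(-a) for y > 0 (shape a, scale b)."""
    if y <= 0.0:
        return 0.0
    return 1.0 - (1.0 + y / b) ** (-a)


def lomax_pdf(y: float, a: float, b: float) -> float:
    """Lomax pdf (2): f(y) = (a/b) (1 + y/b)^(-(a+1)) for y >= 0."""
    if y < 0.0:
        return 0.0
    return (a / b) * (1.0 + y / b) ** (-(a + 1.0))


def lomax_survival(y: float, a: float, b: float) -> float:
    """Survival function S(y) = 1 - F(y) = (1 + y/b)^(-a)."""
    return (1.0 + max(y, 0.0) / b) ** (-a)


def lomax_hazard(y: float, a: float, b: float) -> float:
    """Hazard h(y) = f(y) / S(y); for the Lomax this equals a / (b + y)."""
    return lomax_pdf(y, a, b) / lomax_survival(y, a, b)


if __name__ == "__main__":
    a, b = 2.0, 3.0
    for y in (0.0, 1.0, 5.0, 20.0):
        # The hazard column shrinks as y grows: the Lomax hazard is monotone decreasing.
        print(f"y={y:5.1f}  F={lomax_cdf(y, a, b):.4f}  h={lomax_hazard(y, a, b):.4f}")
```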
In the following sections, we derive statistical properties of the proposed model, including the hazard rate function, survival function, quantile function, moments, order statistics, maximum likelihood estimates of the parameters, Renyi entropy, and asymptotic confidence bounds. We then illustrate the model with applications to two real data sets and with a simulation study.

Lomax exponential (LE) distribution

Let a random variable Y have the Lomax exponential distribution with parameters a and b, where a and b are the shape and scale parameters, respectively. The cumulative distribution function of the Lomax exponential distribution is given in Eq (3), and the corresponding probability density function in Eq (4). The hazard rate function and the survival function are, respectively,

$$h(y) = \frac{f(y)}{1 - F(y)}, \tag{5}$$

$$S(y) = 1 - F(y), \tag{6}$$

where F and f are the LE Cdf (3) and pdf (4). Fig 1 shows the probability density function and cumulative distribution function for different parameter values.

Fig 1

Shapes of the Pdf and Cdf of Lomax exponential distribution.

The behavior of the hazard rate function

Theorem 1. The behavior of the hazard rate function h(y) of the Lomax exponential (a,b) distribution is studied through its derivative. Differentiating the hazard rate function in Eq (5) gives (7), which simplifies to (8). The critical points of h(y) are the roots of h′(y) = 0. If b > 1, then h′(y) = 0 implies that h(y) has a maximum at a point y*, whose expression involves W(z), the Lambert W function. The function h(y) is increasing for y < y*, where h′(y) > 0, and decreasing for y > y*, where h′(y) < 0. Fig 2 illustrates that, for different values of the parameter, the Lomax exponential distribution can model both monotonic and non-monotonic hazard rate shapes.

Fig 2

Shapes of the hazard rate function with different values of b when a = 1.
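Theorem 1 locates the maximum of the hazard rate analytically through the Lambert W function. To check such a result, or when a closed form is inconvenient, the mode of a unimodal hazard can also be found numerically. The sketch below is a minimal illustration, not the paper's method: it applies golden-section search to h(y) = f(y)/(1 − F(y)). Because the LE Cdf (3) and pdf (4) are not reproduced here, it uses as a stand-in the log-logistic distribution with scale 1 and shape 2, whose hazard 2y/(1 + y²) is known to peak at y = 1. The helper names and the search interval [0, 10] are assumptions.

```python
import math
from typing import Callable


def hazard_from(pdf: Callable[[float], float],
                cdf: Callable[[float], float]) -> Callable[[float], float]:
    """Build h(y) = f(y) / (1 - F(y)) from a pdf and a Cdf."""
    return lambda y: pdf(y) / (1.0 - cdf(y))


def argmax_unimodal(g: Callable[[float], float], lo: float, hi: float,
                    tol: float = 1e-10) -> float:
    """Golden-section search for the maximiser of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio, about 0.618
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if g(c) > g(d):
            hi = d  # for a unimodal g the maximum cannot lie in (d, hi]
        else:
            lo = c  # ... nor in [lo, c)
    return 0.5 * (lo + hi)


# Stand-in distribution (NOT the LE): log-logistic with scale 1 and shape 2.
def loglogistic_cdf(y: float) -> float:
    return y * y / (1.0 + y * y)


def loglogistic_pdf(y: float) -> float:
    return 2.0 * y / (1.0 + y * y) ** 2


h = hazard_from(loglogistic_pdf, loglogistic_cdf)
y_star = argmax_unimodal(h, 0.0, 10.0)
print(f"hazard mode ~ {y_star:.6f}, h(y*) ~ {h(y_star):.6f}")  # both close to 1
```

For the LE, the two stand-in functions would be replaced by (3) and (4); the numerical mode can then be compared with the Lambert W expression for y*.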

Quantile function and median

The quantile function Q(u) of the LE(a,b) distribution is the real solution y = Q(u) of the equation

$$F(y) = u, \tag{9}$$

where u ~ Uniform(0,1) and F is the Cdf (3). Solving (9) for y gives the quantile function (10), in which W(·) is the product log (Lambert W) function. The median is obtained by setting u = 0.5 in Eq (10), which gives (11).
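Eq (10) expresses the LE quantile through the Lambert W function. A generic way to check such a formula, or to invert a Cdf that has no convenient closed-form inverse, is to solve F(y) = u numerically. The sketch below does this by bisection and then uses it for inverse-transform sampling. It assumes the Cdf is continuous and increasing on (0, ∞); since the LE Cdf (3) is not reproduced here, the usage example plugs in the Lomax Cdf (1) as a stand-in, whose quantile b[(1 − u)^(−1/a) − 1] is known in closed form. Function names are hypothetical.

```python
import math
import random
from typing import Callable, List


def quantile_bisect(cdf: Callable[[float], float], u: float,
                    lo: float = 0.0, hi: float = 1.0, tol: float = 1e-12) -> float:
    """Solve F(y) = u for y > 0 by bisection (assumes F continuous and increasing)."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    while cdf(hi) < u:  # expand the bracket until F(hi) >= u
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sample_inverse_transform(cdf: Callable[[float], float], n: int,
                             rng: random.Random) -> List[float]:
    """Draw n values as Q(U) with U ~ Uniform(0, 1)."""
    out: List[float] = []
    while len(out) < n:
        u = rng.random()
        if u > 0.0:  # rng.random() can return exactly 0.0
            out.append(quantile_bisect(cdf, u))
    return out


# Usage with the Lomax Cdf (1) as a stand-in for the LE Cdf (3).
a, b = 2.0, 3.0
lomax_cdf = lambda y: 1.0 - (1.0 + y / b) ** (-a)
median = quantile_bisect(lomax_cdf, 0.5)
print(median, b * (2.0 ** (1.0 / a) - 1.0))  # numerical vs closed-form median
```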

rth moment

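Theorem 2. If Y has a Lomax exponential distribution with parameters a and b, then the rth moment of Y about the origin, say μ′_r, is

$$\mu'_r = E(Y^r) = \int_{0}^{\infty} y^{r} f(y)\,dy,$$

where f is the density (4). Simplifying this expression leads to the integral (12), and evaluating the integral in (12) gives the expression for μ′_r. The skewness and kurtosis can then be obtained from the relations

$$\text{Skewness} = \frac{E(Y^3) - 3E(Y)E(Y^2) + 2E^3(Y)}{[\mathrm{var}(Y)]^{3/2}}, \qquad \text{Kurtosis} = \frac{E(Y^4) - 4E(Y)E(Y^3) + 6E^2(Y)E(Y^2) - 3E^4(Y)}{[\mathrm{var}(Y)]^{2}},$$

where var(Y) = E(Y²) − E²(Y).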
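The moment expression and the skewness and kurtosis relations can be checked by numerical integration. The sketch below is an illustration, not the paper's derivation: it maps [0, ∞) to [0, 1) with y = t/(1 − t), applies the composite Simpson rule, and converts the first four raw moments into variance, skewness and kurtosis. It assumes the transformed integrand decays to 0 as t → 1, which holds for light-tailed densities; for heavy-tailed laws such as the Lomax, E(Y^r) exists only for r < a, and the rule may be inaccurate near that limit. The usage example uses the standard exponential density as a stand-in because its values are known exactly: mean 1, variance 1, skewness 2 and kurtosis 9. Function names are hypothetical.

```python
import math
from typing import Callable, Dict


def raw_moment(pdf: Callable[[float], float], r: int, n_intervals: int = 4000) -> float:
    """E(Y^r) = integral over (0, inf) of y^r f(y) dy, by Simpson's rule after y = t/(1 - t)."""
    if n_intervals % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    h = 1.0 / n_intervals

    def g(t: float) -> float:
        if t >= 1.0:
            return 0.0  # assumes the transformed integrand vanishes as y -> infinity
        y = t / (1.0 - t)
        return y ** r * pdf(y) / (1.0 - t) ** 2  # dy/dt = 1/(1 - t)^2

    total = g(0.0) + g(1.0)
    for k in range(1, n_intervals):
        total += (4.0 if k % 2 else 2.0) * g(k * h)
    return total * h / 3.0


def shape_summary(pdf: Callable[[float], float]) -> Dict[str, float]:
    """Mean, variance, skewness and kurtosis from the first four raw moments."""
    m1, m2, m3, m4 = (raw_moment(pdf, r) for r in (1, 2, 3, 4))
    var = m2 - m1 ** 2                                        # var(Y) = E(Y^2) - E^2(Y)
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3                      # third central moment
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4   # fourth central moment
    return {"mean": m1, "var": var,
            "skewness": mu3 / var ** 1.5, "kurtosis": mu4 / var ** 2}


# Stand-in density: standard exponential, f(y) = exp(-y).
print(shape_summary(lambda y: math.exp(-y)))
```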

Order statistics

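Let Y_(1) ≤ Y_(2) ≤ … ≤ Y_(n) be the order statistics of a random sample of size n from LE(a,b). The probability density function (pdf) of the ith order statistic is

$$f_{i:n}(y) = \frac{n!}{(i-1)!\,(n-i)!}\, f(y)\,[F(y)]^{i-1}\,[1 - F(y)]^{n-i}. \tag{16}$$

The pdfs of the first and nth order statistics of the LE distribution are obtained by substituting (3) and (4) into (16):

$$f_{1:n}(y) = n\, f(y)\,[1 - F(y)]^{n-1}, \tag{17}$$

$$f_{n:n}(y) = n\, f(y)\,[F(y)]^{n-1}. \tag{18}$$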
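Eqs (16) to (18) hold for any continuous distribution, so they can be coded once for an arbitrary pdf and Cdf. The sketch below (hypothetical helper names, with the Lomax (1) and (2) standing in for the LE) evaluates (16). A convenient check uses a known property of the Lomax: the minimum of n independent Lomax(a, b) variables is again Lomax with shape na and scale b, because its survival function is [(1 + y/b)^(−a)]^n.

```python
import math
from typing import Callable


def order_stat_pdf(y: float, i: int, n: int,
                   pdf: Callable[[float], float], cdf: Callable[[float], float]) -> float:
    """Pdf (16) of the i-th order statistic of an iid sample of size n."""
    if not 1 <= i <= n:
        raise ValueError("need 1 <= i <= n")
    F = cdf(y)
    coef = n * math.comb(n - 1, i - 1)  # equals n! / ((i-1)! (n-i)!)
    return coef * pdf(y) * F ** (i - 1) * (1.0 - F) ** (n - i)


def lomax_pdf(y: float, a: float, b: float) -> float:
    return (a / b) * (1.0 + y / b) ** (-(a + 1.0))


def lomax_cdf(y: float, a: float, b: float) -> float:
    return 1.0 - (1.0 + y / b) ** (-a)


a, b, n, y = 2.0, 3.0, 5, 1.5
pdf = lambda t: lomax_pdf(t, a, b)
cdf = lambda t: lomax_cdf(t, a, b)
f_min = order_stat_pdf(y, 1, n, pdf, cdf)   # Eq (17)
f_max = order_stat_pdf(y, n, n, pdf, cdf)   # Eq (18)
print(f_min, lomax_pdf(y, n * a, b))        # the two values should agree
print(f_max, n * pdf(y) * cdf(y) ** (n - 1))
```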

Parameter estimation

In this section, the method of maximum likelihood is used to estimate the unknown parameters of LE(a,b) from complete data. Let Y_1, Y_2, …, Y_n be a random sample from LE(a,b) with observed values y_1, y_2, …, y_n. The likelihood function is

$$L(a,b) = \prod_{i=1}^{n} f(y_i; a, b). \tag{19}$$

Substituting (4) into (19) gives (20), and taking the natural logarithm of (20) gives the log-likelihood function (21), ℓ(a,b) = log L(a,b) = Σ log f(y_i; a, b). Computing the first partial derivatives of (21) and setting them equal to zero gives Eqs (22) to (25). These equations have no closed-form solution, so they must be solved numerically with an iterative procedure, such as the Newton–Raphson or bisection method, to obtain approximate maximum likelihood estimates (MLEs) of the parameters.
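As a minimal sketch of the numerical step, the code below maximizes a two-parameter log-likelihood, using the Lomax (2) as a stand-in because the LE density (4) is not reproduced here. For the Lomax, the equation ∂ℓ/∂a = 0 gives â(b) = n / Σ log(1 + y_i/b) explicitly, so the problem reduces to maximizing the profile log-likelihood ℓ(â(b), b) over b alone. Note that this swaps the Newton–Raphson iteration for a derivative-free golden-section search on log b; the search bracket, seed and function names are assumptions.

```python
import math
import random
from typing import List, Tuple


def lomax_loglik(a: float, b: float, ys: List[float]) -> float:
    """Lomax log-likelihood: n log a - n log b - (a + 1) * sum(log(1 + y/b))."""
    n = len(ys)
    return n * math.log(a) - n * math.log(b) - (a + 1.0) * sum(math.log1p(y / b) for y in ys)


def a_hat_given_b(b: float, ys: List[float]) -> float:
    """Root of d(loglik)/da = n/a - sum(log(1 + y/b)) = 0 for fixed b."""
    return len(ys) / sum(math.log1p(y / b) for y in ys)


def lomax_mle(ys: List[float], log_b_lo: float = -5.0, log_b_hi: float = 5.0,
              tol: float = 1e-8) -> Tuple[float, float]:
    """Maximize the profile log-likelihood over log b by golden-section search."""
    def profile(log_b: float) -> float:
        b = math.exp(log_b)
        return lomax_loglik(a_hat_given_b(b, ys), b, ys)

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = log_b_lo, log_b_hi
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if profile(c) > profile(d):
            hi = d
        else:
            lo = c
    b_hat = math.exp(0.5 * (lo + hi))
    return a_hat_given_b(b_hat, ys), b_hat


# Usage: simulate Lomax(a=3, b=2) data by inversion, then refit.
rng = random.Random(2019)
ys = [2.0 * ((1.0 - rng.random()) ** (-1.0 / 3.0) - 1.0) for _ in range(3000)]
a_hat, b_hat = lomax_mle(ys)
print(a_hat, b_hat)
```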

Asymptotic confidence bounds

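Since the MLEs of the unknown parameters a and b are not available in closed form, the exact distribution of the MLEs cannot be derived. We therefore derive asymptotic confidence bounds for the unknown parameters of LE(a,b) based on the asymptotic distribution of the MLEs. For the information matrix, we compute the second partial derivatives of the log-likelihood (21), that is, the partial derivatives of Eqs (22) to (25). The observed information matrix is

$$I(a,b) = -\begin{pmatrix} \dfrac{\partial^2 \ell}{\partial a^2} & \dfrac{\partial^2 \ell}{\partial a\,\partial b} \\[2mm] \dfrac{\partial^2 \ell}{\partial b\,\partial a} & \dfrac{\partial^2 \ell}{\partial b^2} \end{pmatrix},$$

and the variance-covariance matrix is approximated by V = I^{-1}(a,b). To estimate V, we replace the parameters by their MLEs:

$$\hat{V} = I^{-1}(\hat a, \hat b) = \begin{pmatrix} \widehat{\mathrm{var}}(\hat a) & \widehat{\mathrm{cov}}(\hat a, \hat b) \\ \widehat{\mathrm{cov}}(\hat a, \hat b) & \widehat{\mathrm{var}}(\hat b) \end{pmatrix}.$$

Using this variance-covariance matrix, approximate (1 − β)100% confidence intervals for the parameters a and b are

$$\hat a \pm z_{\beta/2}\sqrt{\widehat{\mathrm{var}}(\hat a)}, \qquad \hat b \pm z_{\beta/2}\sqrt{\widehat{\mathrm{var}}(\hat b)},$$

where z_{β/2} is the upper (β/2)th percentile of the standard normal distribution.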
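The interval construction above can be sketched numerically: approximate the Hessian of the log-likelihood by central finite differences, negate it to get the observed information, invert the 2 × 2 matrix, and form estimate ± z_{β/2}·SE. This is a generic illustration with hypothetical function names; the Lomax log-likelihood stands in for the LE log-likelihood (21), and the relative step 1e-4 is an assumption. The usage lines evaluate everything at an arbitrary point (a, b) = (2, 3) only so the finite-difference result can be compared with the Lomax's analytic second derivatives, ∂²ℓ/∂a² = −n/a², ∂²ℓ/∂a∂b = Σ y_i/(b(b + y_i)) and ∂²ℓ/∂b² = n/b² − (a + 1) Σ y_i(2b + y_i)/(b²(b + y_i)²); in practice the MLEs are passed in.

```python
import math
from statistics import NormalDist
from typing import Callable, List, Tuple

LogLik = Callable[[float, float], float]


def lomax_loglik(a: float, b: float, ys: List[float]) -> float:
    """Stand-in log-likelihood (Lomax); replace with the LE log-likelihood (21)."""
    n = len(ys)
    return n * math.log(a) - n * math.log(b) - (a + 1.0) * sum(math.log1p(y / b) for y in ys)


def observed_information_2d(loglik: LogLik, a: float, b: float,
                            rel_step: float = 1e-4) -> List[List[float]]:
    """Minus the Hessian of loglik at (a, b), by central finite differences."""
    ha, hb = rel_step * max(1.0, abs(a)), rel_step * max(1.0, abs(b))
    l0 = loglik(a, b)
    l_aa = (loglik(a + ha, b) - 2.0 * l0 + loglik(a - ha, b)) / ha ** 2
    l_bb = (loglik(a, b + hb) - 2.0 * l0 + loglik(a, b - hb)) / hb ** 2
    l_ab = (loglik(a + ha, b + hb) - loglik(a + ha, b - hb)
            - loglik(a - ha, b + hb) + loglik(a - ha, b - hb)) / (4.0 * ha * hb)
    return [[-l_aa, -l_ab], [-l_ab, -l_bb]]


def wald_intervals(loglik: LogLik, a_hat: float, b_hat: float,
                   beta: float = 0.05) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Approximate (1 - beta)100% intervals a_hat +/- z*SE(a_hat), b_hat +/- z*SE(b_hat)."""
    (i11, i12), (i21, i22) = observed_information_2d(loglik, a_hat, b_hat)
    det = i11 * i22 - i12 * i21
    if det <= 0.0 or i11 <= 0.0:
        raise ValueError("observed information is not positive definite at this point")
    var_a, var_b = i22 / det, i11 / det  # diagonal of V = I^{-1} for a 2 x 2 matrix
    z = NormalDist().inv_cdf(1.0 - beta / 2.0)  # upper beta/2 point of N(0, 1)
    se_a, se_b = math.sqrt(var_a), math.sqrt(var_b)
    return (a_hat - z * se_a, a_hat + z * se_a), (b_hat - z * se_b, b_hat + z * se_b)


# Illustrative data and an arbitrary evaluation point; in practice pass the MLEs.
ys = [0.3, 1.2, 2.5, 0.7, 4.1, 0.9, 1.8, 6.0]
loglik = lambda a, b: lomax_loglik(a, b, ys)
print(observed_information_2d(loglik, 2.0, 3.0))
print(wald_intervals(loglik, 2.0, 3.0))
```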

Renyi entropy

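Theorem 3. If a random variable Y has the LE(a,b) distribution, then its Renyi entropy is defined by

$$R(\delta) = \frac{1}{1-\delta}\log\left(\int_{0}^{\infty} f^{\delta}(y)\,dy\right), \qquad \delta > 0,\ \delta \neq 1, \tag{30}$$

where f is the LE density (4). Substituting the resulting expression for f^δ(y) into (30) and evaluating the integral gives the final expression for the Renyi entropy of LE(a,b).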
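The Renyi entropy (30) can be evaluated numerically for any density, which gives a way to check a closed-form result. The minimal sketch below (hypothetical names, not the paper's code) uses the substitution y = t/(1 − t) with Simpson's rule and assumes f(y)^δ decays fast enough as y → ∞. As a stand-in with a known answer, an exponential density with rate λ has R(δ) = −log λ − log δ/(1 − δ).

```python
import math
from typing import Callable


def renyi_entropy(pdf: Callable[[float], float], delta: float, n_intervals: int = 4000) -> float:
    """R(delta) = log(integral of f(y)^delta over (0, inf)) / (1 - delta), delta > 0, delta != 1."""
    if delta <= 0.0 or delta == 1.0:
        raise ValueError("delta must be positive and different from 1")
    if n_intervals % 2:
        raise ValueError("Simpson's rule needs an even number of intervals")
    h = 1.0 / n_intervals

    def g(t: float) -> float:
        if t >= 1.0:
            return 0.0  # assumes f(y)^delta vanishes fast enough as y -> infinity
        y = t / (1.0 - t)
        return pdf(y) ** delta / (1.0 - t) ** 2  # dy/dt = 1/(1 - t)^2

    total = g(0.0) + g(1.0)
    for k in range(1, n_intervals):
        total += (4.0 if k % 2 else 2.0) * g(k * h)
    return math.log(total * h / 3.0) / (1.0 - delta)


# Stand-in density: exponential with rate lam (NOT the LE density).
lam, delta = 2.0, 3.0
numeric = renyi_entropy(lambda y: lam * math.exp(-lam * y), delta)
closed_form = -math.log(lam) - math.log(delta) / (1.0 - delta)
print(numeric, closed_form)
```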

Applications

In this section, we apply the LE distribution to two real data sets to illustrate its usefulness and compare its goodness of fit with other variants of the Lomax distribution, namely the Weibull Lomax (WL) [11], exponential Lomax (EL) [18], and Lomax (L) [37] distributions. The comparison uses the Kolmogorov–Smirnov (K–S) statistic, the Akaike information criterion (AIC), the consistent Akaike information criterion (CAIC), the Bayesian information criterion (BIC), and the Hannan–Quinn information criterion (HQIC). The information criteria are given by

$$\mathrm{AIC} = -2\ell(\hat\theta) + 2p, \qquad \mathrm{CAIC} = -2\ell(\hat\theta) + \frac{2pn}{n-p-1},$$

$$\mathrm{BIC} = -2\ell(\hat\theta) + p\log n, \qquad \mathrm{HQIC} = -2\ell(\hat\theta) + 2p\log(\log n),$$

where ℓ(θ̂) = log L(θ̂; y) is the maximized log-likelihood for the given random sample y, θ̂ is the maximum likelihood estimator, p is the number of parameters in the model, and n is the sample size.
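The four criteria are simple functions of the maximized log-likelihood ℓ(θ̂), the number of parameters p and the sample size n; smaller values indicate a better trade-off between fit and complexity. The short sketch below (function name hypothetical) computes them. As a consistency check, the LE row of Table 2 implies ℓ(θ̂) = (2p − AIC)/2 = −119.89795 with p = 2; with n = 39, the number of values listed for data set 1, these formulas reproduce the tabulated CAIC, BIC and HQIC to within rounding.

```python
import math
from typing import Dict


def information_criteria(loglik: float, p: int, n: int) -> Dict[str, float]:
    """AIC, CAIC, BIC and HQIC from the maximized log-likelihood (smaller is better)."""
    dev = -2.0 * loglik
    return {
        "AIC": dev + 2.0 * p,
        "CAIC": dev + 2.0 * p * n / (n - p - 1),
        "BIC": dev + p * math.log(n),
        "HQIC": dev + 2.0 * p * math.log(math.log(n)),
    }


# Check against the LE row of Table 2: loglik recovered from AIC = 243.7959 with p = 2.
loglik_le = (2 * 2 - 243.7959) / 2.0
print(information_criteria(loglik_le, p=2, n=39))
```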

Data set 1: Losses due to wind catastrophes

The first data set represents the losses due to wind catastrophes recorded in 1977, as used by Hogg and Klugman [38]. The data set consists of 40 observations that were recorded to the nearest $1,000,000 and includes only losses of $2,000,000 or more. The values (in millions of dollars) are as follows: 2,2,2,2,2,2,2,2,2,2,2,2,3,3,3,3,4,4,4,5,5,5,5,6,6,6,6,8,8,9,15,17,22,23,24,25,27,32,43.

Data set 2: Breaking stress of carbon fibers

The second real data set represents the breaking stress of carbon fibers and is taken from [18]. The data points are as follows: 3.70,2.74,2.73,2.50,3.60,3.11,3.27,2.87,1.47,3.11,4.42,2.41,3.19,3.22,1.69,3.28,3.09,1.87,3.15,4.90,3.75,2.43,2.95,2.97,3.39,2.96,2.53,2.67,2.93,3.22,3.39,2.81,4.20,3.33,2.55,3.31,3.31,2.85,2.56,3.56,3.15,2.35,2.55,2.59,2.38,2.81,2.77,2.17,2.83,1.92,1.41,3.68,2.97,1.36,0.98,2.76,4.91,3.68,1.84,1.59,3.19,1.57,0.81,5.56,1.73,1.59,2.00,1.22,1.12,1.71,2.17,1.17,5.08,2.48,1.18,3.51,2.17,1.69,1.25,4.38,1.84,0.39,3.68,2.48,0.85,1.61,2.79,4.70,2.03,1.80,1.57,1.08,2.03,1.61,2.12,1.89,2.88,2.82,2.05,3.65.

Table 1 reports the maximum likelihood estimates and Table 2 the goodness-of-fit measures AIC, CAIC, BIC, and HQIC of the fitted models for the wind catastrophes data. Table 3 reports the maximum likelihood estimates and Table 4 the goodness-of-fit measures AIC, CAIC, BIC, and HQIC for the breaking stress of carbon fibers data. In general, the model with the smallest values of these criteria is considered the best among the candidates. Tables 2 and 4 show that the LE model has the smallest value of every criterion for both data sets, and therefore provides a better fit than the Lomax, Weibull Lomax, and exponential Lomax distributions.

Table 1

Maximum likelihood estimates for data set 1.

Model	a	b	c	d
LE(a,b)	0.1104961	5.6340882	_	_
WL(a,b,c,d)	2.8345778	1.9742578	1.0284592	0.2073842
L(a,b)	2.259102	13.107217	_	_
EL(a,b,c)	0.975232421	0.062429585	0.008794612	_

Table 2

Goodness of fit Criteria: AIC, CAIC, BIC, HQIC for Data set 1.

Model	AIC	CAIC	BIC	HQIC
LE(a,b)	243.7959	244.1293	247.1231	244.9897
WL(a,b,c,d)	249.5339	250.7104	256.1881	251.9214
L(a,b)	252.6833	253.0166	256.0104	253.877
EL(a,b,c)	255.0222	255.7079	260.0129	256.8128

Table 3

Maximum likelihood estimates for data set 2.

Model	a	b	c	d
LE(a,b)	1.125319	34.175778	_	_
WL(a,b,c,d)	0.01968493	1.39764915	1.79715292	2.91573568
L(a,b)	9.44236	23.14359	_	_
EL(a,b,c)	4.1176675	1.5270909	0.0114609	_

Table 4

Goodness of fit Criteria: AIC, CAIC, BIC, HQIC for data set 2.

Model	AIC	CAIC	BIC	HQIC
LE(a,b)	263.2525	263.3988	268.1378	265.2175
WL(a,b,c,d)	265.3037	265.8037	275.0743	269.2338
L(a,b)	341.1852	341.3315	346.0705	343.1502
EL(a,b,c)	264.2428	264.5391	271.5708	267.1903

Fig 3 shows the theoretical and empirical probability density function (Pdf) and cumulative distribution function (Cdf) of the Lomax exponential distribution for data set 1, and Fig 4 provides the corresponding Q-Q and P-P plots. Fig 5 shows the theoretical and empirical Pdf and Cdf for data set 2, and Fig 6 provides the corresponding Q-Q and P-P plots. The plots indicate that the LE distribution fits both data sets well.

Fig 3

Theoretical and empirical Pdf and Cdf of LE for data set 1.

Fig 4

Theoretical and empirical Pdf and Cdf with Q-Q plot and P-P plot for LE for data set 1.

Fig 5

Theoretical and empirical Pdf and Cdf of LE for data set 2.

Fig 6

Theoretical and empirical Pdf and Cdf with Q-Q plot and P-P plot for LE for data set 2.

Simulations

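The quantile function (10), evaluated at u ~ Uniform(0,1), can be used to draw random samples from the LE(a,b) distribution. The experiment is repeated 100 times with sample sizes n = 30, 60, and 90 for different values of the parameters. The average bias and mean square error (MSE) are given in Table 5. The results show that both the bias and the MSE decrease as the sample size increases. For a parameter θ with estimate θ̂_j in the jth of N = 100 replications, the bias and MSE are

$$\mathrm{Bias}(\hat\theta) = \frac{1}{N}\sum_{j=1}^{N}\left(\hat\theta_j - \theta\right), \qquad \mathrm{MSE}(\hat\theta) = \frac{1}{N}\sum_{j=1}^{N}\left(\hat\theta_j - \theta\right)^2.$$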

Table 5

Mean bias and MSE of LE(a,b) distribution.

a	b	n	MSE(a)	MSE(b)	Bias(a)	Bias(b)
2	0.1	30	43.37569	0.2849909	3.519714	0.2674605
		60	2.797546	0.01698649	0.8652669	0.0698271
		90	2.547909	0.0147992	0.7191287	0.05946671
2	1.5	30	7.948621	7.12233	0.8635859	0.8316998
		60	1.906238	1.895496	0.2121933	0.328816
		90	0.06762601	0.1207555	0.02700033	0.2979983
2	2	30	16.30946	31.8214	1.998589	2.647095
		60	6.757092	10.25353	1.384559	1.652988
		90	0.8164857	1.4583	0.1488166	0.332031
0.01	0.21	30	1.136051e-05	3.127487	0.002299238	1.507933
		60	8.759477e-07	0.5738629	0.0009260684	0.7550524
		90	2.464595e-07	0.01949125	0.0004921424	0.1385804
0.01	0.30	30	1.223512e-05	4.177988	0.003459386	2.036716
		60	4.526665e-07	0.2298188	0.0006659029	0.4761458
		90	2.010531e-07	0.03170673	0.0004390346	0.1755684
0.01	0.29	30	1.285701e-06	3.607665	0.0007714635	1.78106
		60	3.925089e-07	0.08345893	0.0004569719	0.2779985
		90	2.014616e-07	0.02836001	0.0004394908	0.1657713
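The simulation design can be sketched in a few lines: draw N samples of size n, estimate the parameter in each, and average (θ̂_j − θ) and (θ̂_j − θ)². To keep the example short and self-contained, the sketch below replaces the LE model and its numerical MLE with a stand-in whose MLE has a closed form, the exponential rate estimate λ̂ = 1/ȳ; the function name, seed and number of replications are assumptions. The same loop applies to LE(a, b) once an LE sampler and optimizer are plugged in.

```python
import random
from typing import Tuple


def mc_bias_mse(rate: float, n: int, reps: int, rng: random.Random) -> Tuple[float, float]:
    """Monte Carlo bias and MSE of the exponential-rate MLE n / sum(y)."""
    estimates = []
    for _ in range(reps):
        ys = [rng.expovariate(rate) for _ in range(n)]
        estimates.append(n / sum(ys))  # MLE of the rate = 1 / sample mean
    bias = sum(e - rate for e in estimates) / reps
    mse = sum((e - rate) ** 2 for e in estimates) / reps
    return bias, mse


rng = random.Random(2019)
for n in (30, 60, 90):
    bias, mse = mc_bias_mse(2.0, n, reps=2000, rng=rng)
    print(f"n={n:3d}  bias={bias:.4f}  MSE={mse:.4f}")
```

As in Table 5, the MSE should shrink as n grows; here the MLE is also known to be biased upward by about λ/(n − 1).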

Total time on the test (TTT)

The total time on test (TTT) plot helps identify an appropriate model for a given data set according to the shape of its failure rate. If the TTT plot is a straight (diagonal) line, the data have a constant failure rate. The failure rate is increasing if the plot is concave and decreasing if it is convex. For a bathtub-shaped failure rate, the plot is first convex and then concave; for an inverted bathtub-shaped failure rate, it is first concave and then convex. The scaled TTT plot consists of the points (r/n, G(r/n)), r = 1, …, n, where

$$G\left(\frac{r}{n}\right) = \frac{\sum_{i=1}^{r} y_{(i)} + (n - r)\, y_{(r)}}{\sum_{i=1}^{n} y_{(i)}},$$

and y_(i) are the order statistics of the sample. The TTT plots for the two data sets (losses due to wind catastrophes and breaking stress of carbon fibers) are given in Fig 7. The plots suggest that the proposed distribution is useful for data with both monotonic and non-monotonic hazard rate shapes.

Fig 7

TTT plot for wind catastrophes and carbon fibers.
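The scaled TTT transform above is straightforward to compute. The sketch below (hypothetical function name; plotting is left out) returns the points (r/n, G(r/n)) and applies them to the wind catastrophe values listed for data set 1. Plotting the pairs against the diagonal gives a TTT plot of the kind shown in Fig 7.

```python
from typing import List, Sequence, Tuple


def scaled_ttt(data: Sequence[float]) -> List[Tuple[float, float]]:
    """Points (r/n, G(r/n)) of the scaled total time on test (TTT) plot."""
    ys = sorted(data)
    n = len(ys)
    total = sum(ys)
    points = []
    cum = 0.0
    for r, y in enumerate(ys, start=1):
        cum += y  # sum of the r smallest observations
        points.append((r / n, (cum + (n - r) * y) / total))
    return points


# Wind catastrophe losses (millions of dollars) as listed for data set 1.
wind = [2] * 12 + [3] * 4 + [4] * 3 + [5] * 4 + [6] * 4 + [8, 8, 9, 15, 17, 22, 23, 24, 25, 27, 32, 43]
for u, g in scaled_ttt(wind)[::5]:
    print(f"{u:.3f}  {g:.3f}")
```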

Conclusion

In this paper, we presented a new two-parameter modification of the Lomax distribution, called the Lomax exponential (LE) distribution. Statistical properties of the LE distribution were obtained, including the moments, Renyi entropy, hazard function, survival function, median, mode, and order statistics. The parameters of the model were estimated by the maximum likelihood method, and asymptotic confidence intervals for the parameters, based on the MLEs, were constructed. The behavior of the hazard rate function was investigated, and it is concluded that the Lomax exponential distribution can model data sets with both monotonic and non-monotonic hazard rate shapes. The paper also presented applications of the LE distribution to two real data sets. The results reveal that the proposed distribution is flexible for lifetime data and provides a better fit to these data sets than competing models, including the Lomax, Weibull Lomax, and exponential Lomax distributions. In future work, the parameters of the proposed model may be estimated using a Bayesian approach.

Losses due to wind catastrophes.

[38] (DOCX)

Breaking stress of carbon fibers.

[18] (DOCX)
