
Introducing Kolmogorov-Smirnov Tests under Uncertainty: An Application to Radioactive Data.

Abstract

The Kolmogorov-Smirnov (K-S) tests, which assume that all observations in the sample are determined, have been widely applied in data analysis. The existing one-sample and two-sample K-S tests cannot be applied when the data contain neutrosophic observations measured from a complex system or under uncertainty. In this paper, we propose a generalization of the existing K-S tests under neutrosophic statistics. The proposed tests are called neutrosophic Kolmogorov-Smirnov (NK-S) tests. We present the necessary measures and procedures to perform the proposed tests, together with an example and the advantages of the proposed NK-S tests.


Year: 2019 PMID: 31956845 PMCID: PMC6964527 DOI: 10.1021/acsomega.9b03940

Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343

Introduction

Statistical methods are used in all fields for data analysis, estimation, and forecasting. Data obtained from a system always follow some statistical distribution, which is unknown in advance. It is usually assumed that the data follow the normal distribution; in practice, however, the data at hand need not be normally distributed. Therefore, statisticians have designed several tests of hypotheses about the distribution of the data under investigation; see ref (1). As mentioned by Massey,[1] “Attempts have been made to find test statistics whose sampling distribution does not depend upon either the explicit form of or the value of certain parameters in, the distribution of the population. Such tests have been called non-parametric or distribution-free tests.” The Kolmogorov–Smirnov (K–S) test is one such non-parametric test: it uses the cumulative distribution function to decide whether the data follow a specified distribution. The K–S test has been found to be efficient for goodness-of-fit purposes, and many authors have worked on it; see, for example, refs (1−8). The K–S test under classical statistics is applied when all observations in the data are determined, precise, and certain. In real situations, however, the data may not be expressible as exact values; they may be recorded as intervals or may otherwise be imprecise. For example, ecology data, soil data, ocean data, and censored data may be fuzzy rather than exact. Therefore, several authors developed the K–S test for the analysis of fuzzy data; see, for example, refs (9−16). Fuzzy logic is a special case of neutrosophic logic, which adds a measure of indeterminacy to fuzzy logic; see ref (17). More applications of neutrosophic logic can be found in refs (18−23). Neutrosophic statistics was developed by Smarandache[24] using neutrosophic logic.
Neutrosophic statistics is an extension of classical statistics that accounts for the measure of indeterminacy, and it is applied when the observations in the data are neutrosophic numbers. Chen et al.[25,26] discussed the advantages of methods based on neutrosophic numbers. Previous work[27,28] introduced several basic concepts of neutrosophic statistics, and more recent work[29] proposed the neutrosophic ANOVA test. The existing K–S tests under classical statistics and under the fuzzy approach cannot be applied when a measure of indeterminacy is needed. Our review of the literature on classical statistics and the fuzzy approach found no work on the K–S test under neutrosophic statistics. In this paper, we propose neutrosophic Kolmogorov–Smirnov (NK–S) tests for a single sample and for two samples. The proposed NK–S tests are expected to analyze imprecise, vague, and uncertain data more effectively than the existing K–S test under classical statistics.

Results

For a radioactive source model, engineers want to test the assumption that the count rate per second follows a neutrosophic Poisson distribution with neutrosophic mean λ_N ∈ [5,7] counts per second. To test this assumption, they collected a large amount of count data. The neutrosophic Poisson distribution from ref (18) is given by

$$P(X_N = x) = \frac{e^{-\lambda_N}\,\lambda_N^{x}}{x!}, \quad \lambda_N \in [5,7],$$

and the neutrosophic sample distribution function is

$$S_{nN}(x_{nN}) = \frac{Cu(X_{iN})}{n_N}, \quad n_N \in [78,92],$$

where Cu(X_iN) denotes the neutrosophic cumulative values. The neutrosophic count data and neutrosophic statistics are shown in Table 1. Suppose the level of significance for this test is 0.01; the critical neutrosophic value D_0.01,14 is taken from ref (32). In indeterminacy form, the statistic can be written as D_N = 0.1594 + 0.4032I_N; I_N ∈ [0, 0.6046]. From Table 1, the neutrosophic statistic is D_N ∈ [0.4032, 0.1594]. Note that the lower value of the indeterminacy interval denotes the determined part. Comparing D_N with D_0.01,14, we note that the determinate part of the data follows the Poisson distribution, whereas the indeterminate part does not.

Table 1

Necessary Computations for the NK–S Test

no.	X_iN	Cu(X_iN)	S_nN(x_nN)	F_0N(x_n)	D_N
1	[1,4]	[1,4]	[0.0128,0.0435]	[0.0404,0.1730]	[0.0276,0.1295]
2	[1,4]	[2,8]	[0.0256,0.0870]	[0.0404,0.1730]	[0.0148,0.0860]
3	[3,5]	[5,13]	[0.0641,0.1413]	[0.2650,0.3007]	[0.2009,0.1594]
4	[3,5]	[8,18]	[0.1026,0.1957]	[0.2650,0.3007]	[0.1625,0.1051]
5	[4,5]	[12,23]	[0.1538,0.2500]	[0.4405,0.3007]	[0.2866,0.0507]
6	[5,6]	[17,29]	[0.2179,0.3152]	[0.6160,0.4497]	[0.3980,0.1345]
7	[5,6]	[22,35]	[0.2821,0.3804]	[0.6160,0.4497]	[0.3339,0.0693]
8	[6,6]	[28,41]	[0.3590,0.4457]	[0.7622,0.4497]	[0.4032,0.0041]
9	[6,6]	[34,47]	[0.4359,0.5109]	[0.7622,0.4497]	[0.3263,0.0612]
10	[6,7]	[40,54]	[0.5128,0.5870]	[0.7622,0.5987]	[0.2494,0.0118]
11	[8,8]	[48,62]	[0.6154,0.6739]	[0.9319,0.7291]	[0.3165,0.0552]
12	[8,9]	[56,71]	[0.7179,0.7717]	[0.9319,0.8305]	[0.2141,0.0588]
13	[10,9]	[66,80]	[0.8462,0.8696]	[0.9863,0.8305]	[0.1402,0.0391]
14	[12,12]	[78,92]	[1.0000,1.0000]	[0.9980,0.9730]	[0.0020,0.0270]
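The entries of Table 1 can be reproduced with a short Python sketch. It assumes, as the tabulated values indicate, that S_nN is the cumulative value Cu(X_iN) divided by n_N ∈ [78, 92], and that F_0N is the classical Poisson CDF evaluated at the lower observation with λ = 5 and at the upper observation with λ = 7. The function names are hypothetical helpers, not part of the original method description.

```python
import math

# Interval observations X_iN = (lower, upper), copied from Table 1.
X = [(1, 4), (1, 4), (3, 5), (3, 5), (4, 5), (5, 6), (5, 6),
     (6, 6), (6, 6), (6, 7), (8, 8), (8, 9), (10, 9), (12, 12)]
LAM = (5.0, 7.0)  # neutrosophic mean lambda_N in [5, 7], paired with (lower, upper)


def poisson_cdf(x: int, lam: float) -> float:
    """P(X <= x) for a classical Poisson(lam) variable."""
    return sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(x + 1))


def table1_rows(obs, lam):
    """Return n_N and, per row, (Cu, S_nN, F_0N, D_N) as (lower, upper) pairs."""
    n = tuple(sum(o[j] for o in obs) for j in (0, 1))   # n_N = (78, 92)
    cu = [0, 0]
    rows = []
    for o in obs:
        cu = [cu[j] + o[j] for j in (0, 1)]                  # cumulative values Cu(X_iN)
        s = tuple(cu[j] / n[j] for j in (0, 1))              # S_nN = Cu / n_N
        f0 = tuple(poisson_cdf(o[j], lam[j]) for j in (0, 1))  # F_0N at each bound
        d = tuple(abs(s[j] - f0[j]) for j in (0, 1))         # |S_nN - F_0N|
        rows.append((tuple(cu), s, f0, d))
    return n, rows


n, rows = table1_rows(X, LAM)
# Column-wise maxima of D_N give the neutrosophic statistic.
d_max = tuple(max(r[3][j] for r in rows) for j in (0, 1))
print(n, [round(v, 4) for v in d_max])  # (78, 92) [0.4032, 0.1594]
```

The lower column peaks at row 8 and the upper column at row 3, which matches D_N ∈ [0.4032, 0.1594] as reported above.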

Discussion

In this section, we compare the performance of the proposed NK–S test with that of the K–S test under classical statistics. According to refs (25) and (26), when the data contain neutrosophic numbers, a method that provides results in an indeterminacy interval is more adequate and effective than a method that provides results in determined form. To compare the proposed NK–S test with the existing K–S test, we use the data given in Table 1. Note that the data in Table 1 reduce to the determined part under classical statistics if no uncertain observations are recorded. For example, for observation 1 in Table 1, the first value, 1, represents the indeterminate part of the indeterminacy interval, and the second value represents the determinate part. From Table 1, we note that the proposed test provides results in an indeterminacy interval rather than as determined values. In indeterminacy form, the statistic can be written as D_N = 0.1594 + 0.4032I_N; I_N ∈ [0, 0.6046], so the proposed test provides a measure of indeterminacy. At the 0.01 level of significance, the probability of accepting the null hypothesis when it is true is 0.99, the probability of rejecting the null hypothesis when it is true is 0.01, and the probability of indeterminacy is 0.60. In the statistic D_N ∈ [0.4032, 0.1594], the value D_L = 0.1594 is the determined part under classical statistics, and D_U = 0.4032 is the indeterminate part under uncertainty. Comparing the two tests, we note that D_L < 0.1845, so the existing K–S test indicates that the sample comes from the Poisson distribution. Under uncertainty, however, the indeterminate part shows that the sample does not come from the Poisson distribution. From this comparison, we conclude that under uncertainty the statistic D_N can take values from 0.1594 to 0.4032.
Hence, the proposed NK–S test is consistent with the theory of refs (25) and (26).
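The indeterminacy form and the part-by-part comparison above can be checked with a few lines of Python. The sketch assumes that I_U is obtained from (D_U − D_L)/D_U, so that D_L + D_U·I_U = D_U; this reproduces the quoted 0.6046 to three decimals, but the formula is our reading rather than one stated in the text. The critical value 0.1845 is the one used in the comparison above, and both helper names are hypothetical.

```python
def indeterminacy_form(d_det: float, d_ind: float):
    """Write an interval statistic as A + B*I_N with I_N in [0, I_U].

    A is the determined part and B the indeterminate part; I_U is chosen
    (an assumption) so that A + B*I_U reaches the other end of the interval, B.
    """
    i_u = (d_ind - d_det) / d_ind
    return d_det, d_ind, i_u


def compare_parts(d_det: float, d_ind: float, critical: float):
    """Hypothetical helper: compare each part with the critical value separately.

    A part is consistent with H0 when it does not exceed the critical value.
    """
    return {
        "determinate part consistent with H0": d_det <= critical,
        "indeterminate part consistent with H0": d_ind <= critical,
    }


A, B, I_U = indeterminacy_form(0.1594, 0.4032)
print(round(I_U, 3))  # 0.605
print(compare_parts(A, B, 0.1845))
```

With these numbers the determinate part is consistent with the Poisson hypothesis and the indeterminate part is not, which is the split described in the comparison.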

Concluding Remarks

In this paper, we presented a modification of the Kolmogorov–Smirnov (K–S) test under neutrosophic statistics. We proposed the neutrosophic Kolmogorov–Smirnov (NK–S) tests, which generalize the K–S tests. The proposed NK–S test, based on the neutrosophic statistical interval method, is more adequate, informative, and effective when the data contain neutrosophic numbers. The proposed test provides results in an indeterminacy interval, which is desirable under uncertainty or when the data are measured from complex systems. We presented an example and found that the proposed test performs better than the existing K–S test. We recommend the proposed NK–S tests for data analysis in the biomedical sciences, big data analysis, engineering, and statistics. Further properties of the proposed NK–S tests, studied through simulated data, and the development of software for performing them can be considered in future research.

Computational Methods

Assume that X_N = a + bI_N; I_N ∈ [I_L, I_U] is a neutrosophic number (NN), where a is the determinate part and bI_N is the indeterminate part of the NN. Let X_N = X_L + X_U I_N; X_N ∈ [X_L, X_U] be a random variable based on the NN, where X_L and X_U are the lower and upper values of the indeterminacy interval. Note that the NN and X_N ∈ [X_L, X_U] reduce to a number and a variable under classical statistics if I_L = 0 or X_L = X_U, respectively. The neutrosophic variable X_N ∈ [X_L, X_U] represents the NNs in a sample selected from a population having imprecise, uncertain, and indeterminate values or parameters. More details about neutrosophic statistics can be found in ref (17). The main aim is to propose K–S tests under neutrosophic statistics to determine the specific distribution of the data in the presence of neutrosophy.
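A neutrosophic number X_N = a + bI_N can be represented in code as the triple (a, b, [I_L, I_U]) together with the interval it spans. The sketch below is a minimal illustration using a hypothetical `NeutrosophicNumber` class; the example decomposition λ_N = 5 + 2I_N with I_N ∈ [0, 1] is our own assumption, chosen only because it spans the interval [5, 7] used for the Poisson mean in the Results section.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NeutrosophicNumber:
    """X_N = a + b*I_N with I_N in [i_l, i_u] (hypothetical helper class)."""
    a: float    # determinate part
    b: float    # coefficient of the indeterminacy I_N
    i_l: float  # lower bound of I_N
    i_u: float  # upper bound of I_N

    def value(self, i: float) -> float:
        """Evaluate a + b*i for a particular indeterminacy level i."""
        if not self.i_l <= i <= self.i_u:
            raise ValueError("indeterminacy level outside [i_l, i_u]")
        return self.a + self.b * i

    def interval(self) -> Tuple[float, float]:
        """Return the range [X_L, X_U] covered as I_N varies over [i_l, i_u]."""
        ends = (self.value(self.i_l), self.value(self.i_u))
        return (min(ends), max(ends))


lam = NeutrosophicNumber(a=5.0, b=2.0, i_l=0.0, i_u=1.0)
print(lam.interval())  # (5.0, 7.0)
```

At I_N = I_L = 0 the number collapses to its determinate part a, which is the classical special case mentioned above.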

Neutrosophic Kolmogorov–Smirnov Tests

The Kolmogorov–Smirnov (K–S) test was originally derived by Kolmogorov[30] and Smirnov[31] and has been used for nonparametric hypothesis testing. In classical statistics, the K–S test is commonly used to test whether the sample under study comes from a specific distribution; in other words, it is applied to decide whether the observed distribution differs significantly from the specified population distribution.[32] The existing K–S test assumes that all observations and parameters in the observed sample and in the population are determined and precise. Data from complex systems, such as the ocean, the human brain, and power grids, or data collected under uncertainty may not consist entirely of determined observations. In these situations, the K–S test under classical statistics cannot be applied to test whether the data belong to a specific distribution. We therefore modify the existing K–S test using neutrosophic statistics. The proposed neutrosophic Kolmogorov–Smirnov (NK–S) test is a generalization of the existing K–S test of Kolmogorov[30] and Smirnov.[31] The proposed NK–S test is applicable under the following assumptions: (i) the data consist of uncertain, imprecise, and indeterminate values; and (ii) for the two-sample test, the two neutrosophic samples are mutually independent. The K–S test can be applied regardless of the form of the hypothesized cumulative distribution function: Woodruff et al.[33] used it for the Weibull distribution, and Papadopolous and Qiao[34] and Frey[35] presented the K–S test for the Poisson distribution. Suppose that X_1N, X_2N, ..., X_nN is a neutrosophic random sample from a neutrosophic population having a neutrosophic cumulative distribution function, say F_0N(x).
Following ref (1), the null hypothesis that the neutrosophic sample came from the specified neutrosophic distribution is rejected if the observed neutrosophic sample distribution function is not close to the specified neutrosophic distribution function. Suppose now that F_0N(x) ∈ [F_0L(x), F_0U(x)] and S_nN(x) ∈ [S_nL(x), S_nU(x)] are the neutrosophic population cumulative distribution function and the observed neutrosophic sample distribution function, respectively. Then the neutrosophic maximum difference statistic based on F_0N(x) and S_nN(x) is given by

$$D_N = \max_{x}\left|F_{0N}(x) - S_{nN}(x)\right|, \quad D_N \in [D_L, D_U]$$

The proposed test in the indeterminacy interval can be written as

$$D_N = A + B I_N; \quad I_N \in [I_L, I_U]$$

where A and B are the determined and indeterminate parts of the test. The proposed test reduces to the tests in refs (30) and (31) if no indeterminacy is found in the data, that is, when D_L = D_U. The neutrosophic null hypothesis that the sample came from the specified neutrosophic population is accepted if D_N ∈ [D_L, D_U] does not exceed D_α, where D_α ∈ [D_αL, D_αU] is a neutrosophic critical value that can be selected from ref (32); it is rejected if D_N > D_α.
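The one-sample statistic above can be sketched by applying the classical K–S statistic separately to the lower and upper bounds of the neutrosophic sample. This sketch rests on stated assumptions: each bound is a list of real-valued observations, S_nN is the usual empirical CDF (which differs from the cumulative-count construction used in Table 1), and the hypothesized F_0N is continuous, given as one CDF per bound. The function names are hypothetical.

```python
from typing import Callable, Sequence, Tuple

CDF = Callable[[float], float]


def ks_statistic(sample: Sequence[float], cdf: CDF) -> float:
    """Classical one-sample K-S statistic sup_x |S_n(x) - F_0(x)| for continuous F_0."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF S_n jumps from (i-1)/n to i/n at x: check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def nks_statistic(lower: Sequence[float], upper: Sequence[float],
                  cdf_lower: CDF, cdf_upper: CDF) -> Tuple[float, float]:
    """Apply the classical statistic to each bound; return the ordered pair [D_L, D_U]."""
    d1 = ks_statistic(lower, cdf_lower)
    d2 = ks_statistic(upper, cdf_upper)
    return (min(d1, d2), max(d1, d2))


def uniform01(x: float) -> float:
    """CDF of the uniform distribution on [0, 1], used only for illustration."""
    return min(max(x, 0.0), 1.0)


print(nks_statistic([0.1, 0.4, 0.7], [0.2, 0.5, 0.9], uniform01, uniform01))
```

For discrete hypotheses such as the Poisson case, the left limits of F_0 would also be needed, so this version should be read as a continuous-case illustration only.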

NK–S Test for Comparing Two Populations

Kolmogorov[30] and Smirnov[31] also extended the K–S test to compare two populations. Like the one-sample K–S test, this test assumes that the observations and parameters of both populations are determined and precise. In this section, we present the NK–S test for comparing two neutrosophic populations. Let X_1N, X_2N, ..., X_n1N and Y_1N, Y_2N, ..., Y_n2N be two independent neutrosophic samples of sizes n_1N ∈ [n_1L, n_1U] and n_2N ∈ [n_2L, n_2U] drawn from the two populations, respectively. Let S_n1N(x) ∈ [S_n1L(x), S_n1U(x)] and S_n2N(x) ∈ [S_n2L(x), S_n2U(x)] be the neutrosophic sample cumulative distribution functions. Then the neutrosophic maximum difference statistic based on S_n1N(x) and S_n2N(x) is given by

$$D_N = \max_{x}\left|S_{n_1N}(x) - S_{n_2N}(x)\right|, \quad D_N \in [D_L, D_U]$$

The proposed two-population test in indeterminacy form can be written as

$$D_N = C + E I_N; \quad I_N \in [I_L, I_U]$$

where C and E are the determined and indeterminate parts of the test. The proposed test reduces to the tests in refs (30) and (31) if no indeterminacy is found in the data, that is, when S_n1L(x) = S_n1U(x) and S_n2L(x) = S_n2U(x). The neutrosophic null hypothesis that the two samples came from the same neutrosophic population is accepted if D_N ∈ [D_L, D_U] does not exceed D_α, where D_α ∈ [D_αL, D_αU] is a neutrosophic critical value; otherwise it is rejected.
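The two-sample statistic can be sketched in the same bound-by-bound way. The sketch assumes that each neutrosophic sample is supplied as a lower-bound list and an upper-bound list of real numbers and that S_n1N and S_n2N are the usual empirical CDFs of those lists; the helper names and the toy data are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence, Tuple


def ecdf(sorted_xs: Sequence[float], t: float) -> float:
    """Empirical CDF of an already sorted sample evaluated at t."""
    return bisect_right(sorted_xs, t) / len(sorted_xs)


def ks_two_sample(x: Sequence[float], y: Sequence[float]) -> float:
    """Classical two-sample statistic max_t |S_x(t) - S_y(t)|.

    Both empirical CDFs are right-continuous step functions that only jump at
    pooled sample points, so checking those points is enough.
    """
    xs, ys = sorted(x), sorted(y)
    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in xs + ys)


def nks_two_sample(x_lower, x_upper, y_lower, y_upper) -> Tuple[float, float]:
    """Apply the classical statistic to the lower and upper samples; return [D_L, D_U]."""
    d1 = ks_two_sample(x_lower, y_lower)
    d2 = ks_two_sample(x_upper, y_upper)
    return (min(d1, d2), max(d1, d2))


print(nks_two_sample([1, 2, 3], [2, 3, 4], [2, 3, 4], [2, 3, 4]))
```

Here the upper-bound samples coincide, so one end of the interval is 0, while the shifted lower-bound samples give 1/3; the resulting interval is then compared with D_α as described above.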
