
Neutrosophic statistical test for counts in climatology.

Abstract

The existing F-test for two counts from the Poisson distribution under classical statistics can be applied only when the counts in the data are exact, that is, not intervals. The existing test cannot be applied when the count data are indeterminate, in interval form, or uncertain. In this paper, the F-test for two counts from the Poisson distribution under neutrosophic statistics is designed. The test is presented for two counts recorded over the same period of time or over different periods. Data on the daily and monthly numbers of weather records broken in the U.S, obtained from the weather department, are selected for the application of the proposed test. The application and comparison studies show the efficiency of the proposed test. The proposed test was found to be informative, flexible, and appropriate for use in an uncertain environment.


Year: 2021 PMID: 34497328 PMCID: PMC8426340 DOI: 10.1038/s41598-021-97344-x

Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996

Introduction

The Poisson distribution has a single parameter, which is also the mean of the distribution. It is applied to events that occur in a specified period of time, such as the number of defective items produced in a day or the number of weather records broken. In such situations, it is of interest to investigate the significance of the difference between two counts that come from the Poisson distribution. The F-test for two counts tests the null hypothesis that there is no statistical difference between the two counts against the alternative hypothesis that there is a statistical difference between them. Usually, the F-test for two counts is applied under the assumption that the counts follow the Poisson distribution and are recorded over the same period of time. Kanji[1] discussed the F-test for two counts under classical statistics. Krishnamoorthy and Thomson[2] worked on a test for comparing two means of the Poisson distribution. Hilbe[3] applied the test for count data in education. Puig and Weiß[4] presented a goodness-of-fit test with a real application. More applications of such tests can be found in[5-10]. Statistical methods and tests have been widely applied to test the normality of wind speed data and to estimate wind speed. Several authors introduced various statistical models for wind speed data. References[11-20] used various statistical techniques in the area of meteorology. The existing tests under classical statistics are applied when the count data are determinate. Viertl[21] stated that “statistical data are frequently not precise numbers but more or less non-precise, also called fuzzy. Measurements of continuous variables are always fuzzy to a certain degree”. The statistical tests designed under fuzzy logic are applied when uncertainty is found in the data. References[22-29] introduced statistical tests using fuzzy logic.
According to Smarandache[30], fuzzy logic is not as efficient as neutrosophic logic in terms of the measure of indeterminacy. Smarandache[31] showed the efficiency of neutrosophic logic over interval-based analysis and fuzzy logic. References[32-36] presented several applications of neutrosophic logic. Smarandache[37] introduced an extension of classical statistics known as neutrosophic statistics. Neutrosophic statistics can be applied when uncertainty is found in the data. References[38,39] introduced methods to deal with neutrosophic data. Statistical tests under neutrosophic statistics were introduced in references[40-42]. The F-test under classical statistics is applied under the assumption that all observations in the data are determinate and precise. Therefore, the existing F-test for two counts from the Poisson distribution can be applied only when the counts are determinate. In real life, the counts are not always determinate. In this situation, the existing F-test for two counts may mislead decision-makers. In addition, applying the existing F-test to data with uncertain observations gives no information about the measure of indeterminacy. A review of the literature shows that no F-test for neutrosophic count data is available. In this paper, an F-test for two counts with uncertainty is introduced. The operational procedure and the statistic of the proposed test are presented. The proposed test is applied to weather records from two different periods. It is expected that the proposed test will be more efficient and informative than the existing test under classical statistics.

Methods

The existing F-test for two counts from the Poisson distribution under classical statistics can be applied only when all counts in the data are determinate, clear, and exact; see[1]. When the count data are intervals, the existing F-test cannot be used to test the significance of the difference between two counted results. In this situation, the F-test for two counts under neutrosophic statistics can be applied. This section presents the methodology of the proposed F-test under indeterminacy. The main objective of the proposed test is to investigate the difference between two counted results having minimum and maximum counts in the data. The proposed test is applicable under the assumptions that the counts come from the Poisson distribution (rare events) and that both samples of count data are recorded under uniform conditions.
Let $N_{1N} = N_{1L} + N_{1U} I_{N_{1N}}$; $I_{N_{1N}} \in [I_{N_{1L}}, I_{N_{1U}}]$ and $N_{2N} = N_{2L} + N_{2U} I_{N_{2N}}$; $I_{N_{2N}} \in [I_{N_{2L}}, I_{N_{2U}}]$ be the neutrosophic forms of the count data from the first and second populations, respectively. Here $N_{1L}$ and $N_{2L}$ are the determinate parts and $N_{1U} I_{N_{1N}}$ and $N_{2U} I_{N_{2N}}$ are the indeterminate parts of the neutrosophic forms, while $I_{N_{1N}} \in [I_{N_{1L}}, I_{N_{1U}}]$ and $I_{N_{2N}} \in [I_{N_{2L}}, I_{N_{2U}}]$ are the measures of indeterminacy associated with the counts in the first and second populations, respectively. Information about neutrosophic numbers can be found in[43-49]. Let $\mu_{1N}$ and $\mu_{2N}$ be the means of the first and second populations, respectively. To test the null hypothesis $H_0: \mu_{1N} = \mu_{2N}$, the F-test statistic $F_{N1} \in [F_{L1}, F_{U1}]$ based on the neutrosophic counts $N_{1N}$ and $N_{2N}$ is defined as
$$F_{N1} = \frac{N_{1N}}{N_{2N} + 1}; \quad F_{N1} \in [F_{L1}, F_{U1}] \qquad (1)$$
The statistic $F_{N1}$ follows the neutrosophic F-distribution with $(2(N_{2N} + 1), 2N_{1N})$ degrees of freedom; see Aslam[40]. Note that the statistic given in Eq. (1) applies when the two counts are recorded over the same period of time ($t_1 = t_2$). The neutrosophic form of the proposed statistic can be expressed as
$$F_{N1} = F_{L1} + F_{U1} I_{N1}; \quad I_{N1} \in [I_{L1}, I_{U1}] \qquad (2)$$
The proposed statistic $F_{N1} \in [F_{L1}, F_{U1}]$ is a generalization of the existing F-test for two counts. The statistic $F_{L1}$ represents the existing F-test for two counts, $F_{U1} I_{N1}$ is the indeterminate part, and $I_{N1} \in [I_{L1}, I_{U1}]$ is the measure of uncertainty associated with $F_{N1}$. The proposed statistic reduces to the existing statistic when $I_{N1} = 0$.
When the counts are recorded over different periods of time $t_1$ and $t_2$, the counting rates $N_{1N}/t_1$ and $N_{2N}/t_2$ are obtained. For this situation, the proposed statistic is defined as
$$F_{N2} = \frac{\frac{1}{t_1}\left(N_{1N} + 0.5\right)}{\frac{1}{t_2}\left(N_{2N} + 0.5\right)}; \quad F_{N2} \in [F_{L2}, F_{U2}] \qquad (3)$$
The neutrosophic form of this statistic can be expressed as
$$F_{N2} = F_{L2} + F_{U2} I_{N2}; \quad I_{N2} \in [I_{L2}, I_{U2}] \qquad (4)$$
The proposed statistic $F_{N2} \in [F_{L2}, F_{U2}]$ is again a generalization of the existing F-test for two counts. The statistic $F_{L2}$ represents the existing F-test for two counts, $F_{U2} I_{N2}$ is the indeterminate part, and $I_{N2} \in [I_{L2}, I_{U2}]$ is the measure of uncertainty associated with $F_{N2}$. The proposed statistic reduces to the existing statistic when $I_{N2} = 0$.
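As an illustration of the two statistics described in this section, the following Python sketch evaluates them endpoint by endpoint on interval counts. It assumes the classical forms of the test discussed by Kanji[1], F = N1/(N2 + 1) for equal periods and F = [(N1 + 0.5)/t1]/[(N2 + 0.5)/t2] for different periods; the first form reproduces the values reported in Table 1. The function names and the tuple representation of an interval are hypothetical choices made for this sketch, not part of the original method.

```python
from typing import Tuple

# Hypothetical representation: a neutrosophic count as a (lower, upper) pair.
Interval = Tuple[float, float]


def f_same_period(n1: Interval, n2: Interval) -> Interval:
    """Assumed statistic F = N1 / (N2 + 1), applied to each endpoint in turn."""
    return (n1[0] / (n2[0] + 1), n1[1] / (n2[1] + 1))


def f_different_periods(n1: Interval, n2: Interval, t1: float, t2: float) -> Interval:
    """Assumed statistic F = [(N1 + 0.5) / t1] / [(N2 + 0.5) / t2], per endpoint."""
    if t1 <= 0 or t2 <= 0:
        raise ValueError("observation periods must be positive")
    lower = ((n1[0] + 0.5) / t1) / ((n2[0] + 0.5) / t2)
    upper = ((n1[1] + 0.5) / t1) / ((n2[1] + 0.5) / t2)
    return (lower, upper)


if __name__ == "__main__":
    # First row of Table 1 (U.S daily records, last 30 days).
    f_l, f_u = f_same_period((115, 152), (597, 1483))
    print(f"F_N1 in [{f_l:.4f}, {f_u:.4f}]")
```

The endpoints are paired as they are listed in the tables (first with first, second with second), which is how the tabulated values appear to have been obtained; this is not general interval arithmetic.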

Application

We now discuss the application of the proposed F-test to count data recorded from a subset of stations in the Global Historical Climatology Network. The weather data were retrieved from https://www.ncdc.noaa.gov/cdo-web/datatools/records on January 07, 2021. The U.S daily records broken are shown in Table 1 and the U.S monthly records broken are shown in Table 2. Tables 1 and 2 show that the record counts are intervals rather than exact values. The existing F-test for two counts under classical statistics cannot be applied to such data, and the proposed test is an alternative. Tables 1 and 2 also show that the two counts are recorded over the same period of time ($t_1 = t_2$); therefore, the statistic $F_{N1} \in [F_{L1}, F_{U1}]$ is suitable. The values of $F_{N1}$ are also shown in Tables 1 and 2. The proposed test is implemented in the following steps.

Table 1

U.S daily records.

Period	Low min, low max	High min, high max	$F_{N1} \in [F_{L1}, F_{U1}]$
Last 30 days	[115, 152]	[597, 1483]	[0.1923, 0.1024]
Last 365 days	[15679, 19150]	[37143, 33514]	[0.4221, 0.5713]

Table 2

U.S monthly records.

Period	Low min, low max	High min, high max	$F_{N1} \in [F_{L1}, F_{U1}]$
Last 30 days	[0, 1]	[14, 25]	[0, 0.0384]
Last 365 days	[586, 985]	[2384, 2363]	[0.2457, 0.4166]

Step 1. State $H_0: \mu_{1N} = \mu_{2N}$ vs. $H_1: \mu_{1N} \ne \mu_{2N}$.
Step 2. Set the level of significance at $\alpha$ = 5% and select the critical value from the F-table at $\alpha$ = 5%, which is 1.
Step 3. For the U.S daily records in the last 30 days, compute $F_{N1} = [115/(597 + 1), 152/(1483 + 1)] = [0.1923, 0.1024]$. The other values of $F_{N1}$ in Table 1 and Table 2 are computed similarly.
Step 4. Accept $H_0$ for the U.S daily and monthly records, as the values of $F_{N1}$ for both datasets are smaller than 1.
From the study, it is concluded that there is no statistical difference between the two counts of U.S daily records and U.S monthly records.
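The four steps can be traced in a short Python sketch. It assumes the statistic F = N1/(N2 + 1) evaluated endpoint by endpoint, uses the critical value 1 quoted in Step 2, and accepts the null hypothesis only when both endpoints fall below that value. The decision rule for intervals and all names in the code are assumptions made for illustration.

```python
CRITICAL_VALUE = 1.0  # critical value quoted in Step 2 of the text

# (low counts, high counts) copied from Tables 1 and 2.
RECORDS = {
    "daily, last 30 days": ((115, 152), (597, 1483)),
    "daily, last 365 days": ((15679, 19150), (37143, 33514)),
    "monthly, last 30 days": ((0, 1), (14, 25)),
    "monthly, last 365 days": ((586, 985), (2384, 2363)),
}


def f_statistic(n1, n2):
    """Assumed form F = N1 / (N2 + 1), computed for each endpoint."""
    return tuple(a / (b + 1) for a, b in zip(n1, n2))


def decide(f_interval, critical=CRITICAL_VALUE):
    """Illustrative rule: accept H0 only if every endpoint is below the critical value."""
    return "accept H0" if max(f_interval) < critical else "reject H0"


RESULTS = {name: f_statistic(n1, n2) for name, (n1, n2) in RECORDS.items()}

for name, (f_l, f_u) in RESULTS.items():
    print(f"{name}: F_N1 in [{f_l:.4f}, {f_u:.4f}] -> {decide((f_l, f_u))}")
```

The printed values are rounded to four decimals, so they can differ from the tabulated entries in the last digit (for example, 1/26 prints as 0.0385 while Table 2 reports 0.0384).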

Comparative study

The proposed F-test for two counts reduces to the F-test for two counts under classical statistics when the counts are determinate, that is, not intervals, and no indeterminacy is recorded in the counts. The proposed test is compared with the existing F-test for two counts in terms of the chance of uncertainty. The neutrosophic analyses of $F_{N1}$ for both data sets, along with the measures of indeterminacy, are shown in Table 3. Each neutrosophic form consists of the statistic of the existing test and an indeterminate part. Note that the subscripts of the indeterminacy symbols indicate the corresponding number of days. For example, for the daily records in the last 30 days, the neutrosophic form is $F_{N1} = 0.1923 - 0.1024 I_{N130D}$; $I_{N130D} \in [0, 0.87]$, where the value 0.1923 is the statistic of the existing test, obtained when $I_{N130D}$ = 0, $0.1024 I_{N130D}$ is the indeterminate part, and the measure of uncertainty associated with $F_{N1}$ is 0.87. This means that, for the proposed test, the value of $F_{N1}$ can be expected to lie between 0.1923 and 0.1024. The analysis shows that, under uncertainty, the proposed test gives the value of the statistic as a range rather than an exact value. Therefore, the proposed test is effective and flexible under uncertainty. The other neutrosophic forms given in Table 3 can be interpreted similarly. Based on this information, the proposed test can be interpreted as follows: for $\alpha$ = 5%, the chance of accepting $H_0$ is 0.95, the chance of committing a type-I error (the probability of rejecting $H_0$ when it is true) is 0.05, and the chance of uncertainty about the acceptance of $H_0$ is 0.87. For the real example, the chance of indeterminacy is clearly high; therefore, decision-makers should be careful in deciding whether to accept $H_0$. The proposed test under neutrosophic statistics is also a generalization of interval-based analysis. Interval analysis uses intervals instead of crisp numbers in order to approximate/capture the data inside the intervals.
On the other hand, neutrosophic statistical analysis uses set analysis (any type of set, not only intervals) in order to approximate/capture the data inside intervals. The results obtained from the proposed test can also be compared with the results obtained from interval data analysis. For the daily records in the last 30 days, the value of $F_{N1}$ from the interval analysis is 0.1923 to 0.1024. In addition, the neutrosophic form $F_{N1} = 0.1923 - 0.1024 I_{N130D}$ shows that $F_{N1}$ is better structured, since 0.1923 is known to be the determinate part and $0.1024 I_{N130D}$ is the fluctuating part around it. The comparison shows that interval-based analysis provides the results as an interval without any information about the measure of indeterminacy. From the study, it can be concluded that the proposed F-test is more efficient than the existing F-test under classical statistics and interval-based analysis in terms of information and flexibility. Therefore, under indeterminacy, it is recommended to apply the proposed test to the daily and monthly records data.

Table 3

Neutrosophic analysis of two datasets.

Period	Neutrosophic form	Measure of Indeterminacy
U.S Daily Records
Last 30 days	$F_{N1} = 0.1923 - 0.1024 I_{N130D}$	$I_{N130D} \in [0, 0.87]$
Last 365 days	$F_{N1} = 0.4221 + 0.5713 I_{N1365D}$	$I_{N1365D} \in [0, 0.2611]$
U.S Monthly Records
Last 30 days	$F_{N1} = 0 + 0.0384 I_{N130M}$	$I_{N130M} \in [0, 1]$
Last 365 days	$F_{N1} = 0.2457 + 0.4166 I_{N1365M}$	$I_{N1365M} \in [0, 0.4102]$
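The entries of Table 3 appear to follow from the endpoints of $F_{N1}$ through $I_U = |F_U - F_L| / F_U$, with the sign of the indeterminate part chosen so that the form returns $F_U$ at $I = I_U$. This relation is inferred from the tabulated values rather than stated in the text, so the Python sketch below should be read as an assumption; the function name is hypothetical.

```python
def neutrosophic_form(f_l, f_u):
    """Return (determinate part, signed coefficient of I, upper bound of I).

    Assumes I_U = |F_U - F_L| / F_U, a relation inferred from Table 3.
    """
    if f_u == 0:
        return f_l, 0.0, 0.0  # no indeterminate part can be expressed
    i_u = abs(f_u - f_l) / f_u
    coefficient = f_u if f_u >= f_l else -f_u
    return f_l, coefficient, i_u


# Endpoints of F_N1 recomputed from the counts in Tables 1 and 2.
ENDPOINTS = {
    "daily 30D": (115 / 598, 152 / 1484),
    "daily 365D": (15679 / 37144, 19150 / 33515),
    "monthly 30M": (0 / 15, 1 / 26),
    "monthly 365M": (586 / 2385, 985 / 2364),
}

for label, (f_l, f_u) in ENDPOINTS.items():
    det, coef, i_u = neutrosophic_form(f_l, f_u)
    print(f"{label}: F_N1 = {det:.4f} {coef:+.4f} I, I in [0, {i_u:.4f}]")
```

Because the tabulated values appear to be truncated, the recomputed bounds agree with Table 3 only to about two or three decimal places (for example, roughly 0.8775 against the reported 0.87 for the daily records in the last 30 days).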

Concluding remarks

In this paper, the F-test for two counts from the Poisson distribution under neutrosophic statistics was designed. Tests were presented for two counts recorded over the same period of time and over different periods. The procedure for testing whether two such counts are equal was discussed. The application of the proposed test was illustrated using weather records data. The application showed that the proposed test is flexible and informative under uncertainty. In addition, the proposed test gives its results as indeterminate intervals. Based on the study, it is recommended to apply the proposed test when the counts are recorded in an indeterminate environment. Extending the proposed test to double sampling and applying it to big data can be considered as future research.
