
Multivariate Analysis under Indeterminacy: An Application to Chemical Content Data.

Abstract

The Hotelling T-squared statistic is widely used to test differences in means for multivariate data. The existing statistic under classical statistics applies when all observations in the multivariate data are determined, precise, and exact. In practice, observations are not always determined and precise, because measurements are taken in complex situations and under uncertainty. In this paper, we introduce the Hotelling T-squared statistic under neutrosophic statistics (NS), which generalizes classical statistics and applies under uncertainty. We discuss the application and advantage of the neutrosophic Hotelling T-squared statistic with the aid of data. From the comparison, we conclude that the proposed statistic is more adequate and effective under uncertainty.


Year: 2020 PMID: 32733737 PMCID: PMC7369667 DOI: 10.1155/2020/1406028

Source DB: PubMed Journal: J Anal Methods Chem ISSN: 2090-8873 Impact factor: 2.193

1. Introduction

In classical statistics (CS), univariate analysis is the technique for analyzing single-variable data, whereas multivariate analysis is widely used to analyze data with more than one variable. Among the multivariate techniques under CS, the Hotelling T-squared statistic has been widely applied in a variety of fields (see, for example, [1, 2]) to test whether the means of more than one population are equal. This statistic is the extension of the t-test, which is applied for testing the mean of a single population. Brereton [3] used the Hotelling T-squared statistic to detect outliers in chemical data. In [4], Varmuza and Filzmoser worked on multivariate analysis for chemometric data. Hervé et al. [5] applied the multivariate technique to biological data. Kitagaki et al. [6] used the Hotelling T-squared statistic in chemical and electrochemical oscillator problems. For more details about the applications of the Hotelling T-squared statistic, the reader may consult [3, 7] and [8]. The Hotelling T-squared statistic derived under CS can be applied only when all observations in the multivariate data are determined, precise, and certain. In practice, the data under study are not always precise and may be linguistic. For example, the temperature of a certain city may be recorded as high, low, or medium, or the measurement of a variable in a complex system may fall in an interval rather than take a determined value. In such situations, the Hotelling T-squared statistic under CS cannot be used to analyze the data. When observations are uncertain or fuzzy, the fuzzy Hotelling T-squared statistic can be applied for testing the means of multivariate populations. Taleb et al. [9] applied the fuzzy Hotelling T-squared statistic to design a control chart. D'Urso [10] provided a review of fuzzy multivariate analysis. Bakdi and Kouadri [11] presented a new adaptive principal component analysis technique to detect faults in a complex system. In [12], Ammiche et al. introduced principal component analysis for the Tennessee Eastman process using a fuzzy approach. More applications can be found in [13-15].

Recently, neutrosophic logic, which is the extension of fuzzy logic, has attracted many researchers because of its applications in a variety of fields. Neutrosophic logic considers the measure of indeterminacy, which fuzzy logic does not consider (see [16]). Neutrosophic statistics (NS), which is based on neutrosophic numbers, is the generalization of CS (see [17, 18]). NS has been applied widely to rock-measuring problems (see, for example, [19, 20]). Applications of NS to product inspection can be seen in [21, 22], applications in the area of process control in [23, 24], and an application in medicine in [25]. For more information on neutrosophic theory, the reader may refer to [26, 27]. Aslam and Smarandache [17, 18] made some suggestions for extending several concepts of CS to NS. From our exploration of the literature, and to the best of our knowledge, there is no work on the development of the Hotelling T-squared statistic under NS. In this paper, we introduce the Hotelling T-squared statistic under NS, which is the generalization of classical statistics and applies under uncertainty. We discuss the application and advantage of the neutrosophic Hotelling T-squared statistic with the aid of data. We expect that the proposed neutrosophic Hotelling T-squared statistic will perform better than the existing Hotelling T-squared statistic under uncertainty.

2. Preliminaries

Let $x_{Njk} \in [x_{Ljk}, x_{Ujk}]$ be a neutrosophic random variable, which represents the particular neutrosophic observation of the kth variable that is recorded for the jth item. Here $x_{Njk}$ is expressed as an indeterminacy interval with the smaller value $x_{Ljk}$ and the larger value $x_{Ujk}$. The neutrosophic form of $x_{Njk}$, with determinate part $x_{Ljk}$ and indeterminate part $x_{Ujk} I_N$; $I_N \in [I_L, I_U]$, can be written as follows:

$$x_{Njk} = x_{Ljk} + x_{Ujk} I_N; \quad I_N \in [I_L, I_U].$$

The neutrosophic random variable reduces to the variable under classical statistics if no indeterminacy is recorded in the data.

The neutrosophic data matrix having $n_N \in [n_L, n_U]$ neutrosophic observations of $p_N \in [p_L, p_U]$ neutrosophic variables is given as follows:

$$\mathbf{X}_N = \begin{bmatrix} x_{N11} & x_{N12} & \cdots & x_{N1p} \\ x_{N21} & x_{N22} & \cdots & x_{N2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{Nn1} & x_{Nn2} & \cdots & x_{Nnp} \end{bmatrix}; \quad \mathbf{X}_N \in [\mathbf{X}_L, \mathbf{X}_U].$$

The neutrosophic form of $\mathbf{X}_N \in [\mathbf{X}_L, \mathbf{X}_U]$ can be written as $\mathbf{X}_N = \mathbf{X}_L + \mathbf{X}_U I_N$; $I_N \in [I_L, I_U]$. Here $\mathbf{X}_N$ is the generalization of the data matrix under classical statistics and reduces to it when $I_N = 0$.

The neutrosophic sample mean of the kth variable, from $n_N$ measurements on $p_N$ neutrosophic variables, is computed as follows:

$$\bar{x}_{Nk} = \frac{1}{n_N} \sum_{j=1}^{n_N} x_{Njk}; \quad \bar{x}_{Nk} \in [\bar{x}_{Lk}, \bar{x}_{Uk}], \quad k = 1, 2, \ldots, p_N.$$

The neutrosophic form of $\bar{x}_{Nk}$ can be written as $\bar{x}_{Nk} = \bar{x}_{Lk} + \bar{x}_{Uk} I_N$; $I_N \in [I_L, I_U]$. Here $\bar{x}_{Nk}$ is the generalization of the sample mean under classical statistics and reduces to it when $I_N = 0$.

The neutrosophic sample variance of the kth variable, $s_{Nk}^2 \in [s_{Lk}^2, s_{Uk}^2]$, is computed from the squared deviations $(x_{Njk} - \bar{x}_{Nk})^2$ of the $n_N$ measurements. Its neutrosophic form can be written as $s_{Nk}^2 = s_{Lk}^2 + s_{Uk}^2 I_N$; $I_N \in [I_L, I_U]$. Here $s_{Nk}^2$ is the generalization of the sample variance under classical statistics and reduces to it when $I_N = 0$.

The neutrosophic sample covariance between the ith and kth neutrosophic variables, $S_{Nik} \in [S_{Lik}, S_{Uik}]$, is computed from the cross products $(x_{Nji} - \bar{x}_{Ni})(x_{Njk} - \bar{x}_{Nk})$. Its neutrosophic form can be written as $S_{Nik} = S_{Lik} + S_{Uik} I_N$; $I_N \in [I_L, I_U]$. Here $S_{Nik}$ is the generalization of the sample covariance under classical statistics and reduces to it when there are no indeterminate observations.

Finally, the neutrosophic sample correlation between the ith and kth variables is given by

$$r_{Nik} = \frac{S_{Nik}}{\sqrt{s_{Ni}^2}\,\sqrt{s_{Nk}^2}}; \quad r_{Nik} \in [r_{Lik}, r_{Uik}].$$

The neutrosophic form of $r_{Nik}$ can be written as $r_{Nik} = r_{Lik} + r_{Uik} I_N$; $I_N \in [I_L, I_U]$. Here $r_{Nik}$ is the generalization of the sample correlation under classical statistics and reduces to it when there are no indeterminate observations.

The neutrosophic descriptive statistics for $n_N$ measurements on $p_N$ variables can be arranged in arrays: a neutrosophic sample mean vector, a neutrosophic sample variance-covariance matrix, and a neutrosophic sample correlation matrix.
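The descriptive statistics above can be sketched numerically. This is a minimal illustration, not the authors' procedure: it assumes each statistic is evaluated once on the lower bounds and once on the upper bounds, and it assumes the divisor n - 1 for the covariance. The helper names (`bound`, `covariance`, `correlation`, `interval_stat`) are hypothetical, and the data are the first four rows of X₁ and X₂ from Table 1.

```python
from statistics import mean

# First four rows of Table 1 for X1 (sweat) and X2 (sodium), as (lower, upper).
x1 = [(3.7, 3.7), (5.7, 5.8), (3.8, 3.8), (3.2, 3.3)]
x2 = [(48.5, 48.7), (65.1, 65.1), (47.2, 47.3), (53.2, 53.3)]


def bound(series, side):
    """Extract the lower (side=0) or upper (side=1) bounds of interval data."""
    return [obs[side] for obs in series]


def covariance(a, b):
    """Classical sample covariance; the divisor n - 1 is an assumed convention."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (len(a) - 1)


def correlation(a, b):
    """Classical sample correlation built from the covariance above."""
    return covariance(a, b) / (covariance(a, a) * covariance(b, b)) ** 0.5


def interval_stat(stat, *series):
    """Evaluate a classical statistic on the lower bounds, then on the upper bounds."""
    return tuple(stat(*(bound(s, side) for s in series)) for side in (0, 1))


mean_x1 = interval_stat(mean, x1)            # pair (lower-bound mean, upper-bound mean)
cov_12 = interval_stat(covariance, x1, x2)   # covariance of X1 and X2 per bound
r_12 = interval_stat(correlation, x1, x2)    # correlation of X1 and X2 per bound
print(mean_x1, cov_12, r_12)
```

When every observation is exact (lower equals upper), both evaluations coincide and the classical statistic is recovered. For statistics that are not monotone in the data, such as the covariance, the two evaluations need not come out in increasing order.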

3. Neutrosophic Hotelling T-squared Statistic

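In this section, we discuss the proposed neutrosophic Hotelling $T^2$ statistic. In classical statistics, Student's t-test is applied for testing the mean in the univariate case. As mentioned by [28], rejecting the null hypothesis that the mean equals $\mu_0$ when $|t|$ is large is the same as rejecting it when its square is large:

$$t_N^2 = n_N (\bar{x}_N - \mu_0)(s_N^2)^{-1}(\bar{x}_N - \mu_0); \quad t_N^2 \in [t_L^2, t_U^2].$$

The neutrosophic form of $t_N^2 \in [t_L^2, t_U^2]$ can be written as

$$t_N^2 = t_L^2 + t_U^2 I_N; \quad I_N \in [I_L, I_U].$$

Here $t_N^2$ is the generalization of the squared t statistic under classical statistics and reduces to it when there are no indeterminate observations. For given values of $\bar{x}_N$ and $s_N^2$, the null hypothesis will be rejected if

$$n_N (\bar{x}_N - \mu_0)(s_N^2)^{-1}(\bar{x}_N - \mu_0) > t_{n_N - 1}^2(\alpha/2),$$

where $\alpha$ is the level of significance and $t_{n_N - 1}(\alpha/2)$ is the upper $100(\alpha/2)$th percentile of the neutrosophic t-distribution with $n_N - 1$ neutrosophic degrees of freedom.

The generalization of these two expressions to the multivariate case under the neutrosophic statistical interval method (NSIM) is given by

$$T_N^2 = n_N (\bar{\mathbf{x}}_N - \boldsymbol{\mu}_0)' \mathbf{S}_N^{-1} (\bar{\mathbf{x}}_N - \boldsymbol{\mu}_0); \quad T_N^2 \in [T_L^2, T_U^2],$$

where $\bar{\mathbf{x}}_N$ is the neutrosophic sample mean vector, $\mathbf{S}_N = \frac{1}{n_N - 1} \sum_{j=1}^{n_N} (\mathbf{x}_{Nj} - \bar{\mathbf{x}}_N)(\mathbf{x}_{Nj} - \bar{\mathbf{x}}_N)'$ is the neutrosophic sample covariance matrix, and $\boldsymbol{\mu}_0$ is the hypothesized mean vector. The neutrosophic form of $T_N^2 \in [T_L^2, T_U^2]$ can be written as

$$T_N^2 = T_L^2 + T_U^2 I_N; \quad I_N \in [I_L, I_U].$$

The statistic $T_N^2$ is called the neutrosophic Hotelling $T^2$ statistic. It is distributed as a multiple of the neutrosophic F-distribution with neutrosophic degrees of freedom (ndf) $p_N$ and $n_N - p_N$:

$$T_N^2 \sim \frac{(n_N - 1) p_N}{n_N - p_N} F_{p_N,\, n_N - p_N}.$$

The neutrosophic Hotelling $T^2$ statistic can be used for testing the null hypothesis $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ against the alternative hypothesis $H_1: \boldsymbol{\mu} \neq \boldsymbol{\mu}_0$. The hypothesis $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ will be rejected if

$$T_N^2 > \frac{(n_N - 1) p_N}{n_N - p_N} F_{p_N,\, n_N - p_N}(\alpha).$$

Statistical software provides the p value used in deciding whether to accept or reject the null hypothesis. According to [18], “a neutrosophic p value is defined in the same way as in classical statistics: the smallest level of significance at which a null hypothesis H0 can be rejected.” Note that the neutrosophic p value is not an exact or determined value as in the case of classical statistics. Smarandache [18] discussed criteria to accept or reject the null hypothesis using the neutrosophic p value.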
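As a rough worked check of the rejection rule, assume the usual classical scaling of the F quantile by $(n-1)p/(n-p)$. That scaling is an assumption here, taken from the classical Hotelling test. With the sample size $n = 20$ and $p = 3$ variables used in Section 4, and the critical value 8.17 reported there, the arithmetic is:

```latex
\frac{(n-1)\,p}{n-p} = \frac{19 \times 3}{17} = \frac{57}{17} \approx 3.353,
\qquad
F_{3,17}(0.10) \approx \frac{8.17}{3.353} \approx 2.44 .
```

So the reported critical value is consistent with an upper 10% F quantile of about 2.44 on 3 and 17 degrees of freedom.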

4. Application

Now, we discuss the application of the proposed neutrosophic Hotelling $T^2$ statistic using data selected from the healthcare department. The data are collected from 20 healthy women, and three variables are measured: sweat rate, sodium content, and potassium content. The observations of the variables under investigation are obtained from a measurement process, so not all observations in the data are expected to be precise and exact. Therefore, the data cannot be analyzed using CS. Similar data for classical statistics are given by [28]. The data, which contain some neutrosophic observations, are shown in Table 1. We want to test whether the means of the three variables for the healthy women equal the hypothesized population means. The null and alternative hypotheses and the remaining steps of the test are stated after Table 1.

Table 1

The neutrosophic sweat data.

Individual	X₁ (sweat level)	X₂ (sodium)	X₃ (potassium)
1	[3.7, 3.7]	[48.5, 48.7]	[9.3, 9.3]
2	[5.7, 5.8]	[65.1, 65.1]	[8.0, 8.1]
3	[3.8, 3.8]	[47.2, 47.3]	[10.9, 10.9]
4	[3.2, 3.3]	[53.2, 53.3]	[12.0, 12.0]
5	[3.1, 3.1]	[55.5, 55.5]	[9.7, 9.8]
6	[4.6, 4.8]	[36.1, 36.2]	[7.9, 7.9]
7	[2.4, 2.4]	[24.8, 24.8]	[14.0, 14.0]
8	[7.2, 7.3]	[33.1, 33.2]	[7.6, 7.7]
9	[6.7, 6.7]	[47.4, 47.4]	[8.5, 8.6]
10	[5.4, 5.5]	[54.1, 54.2]	[11.3, 11.3]
11	[3.9, 3.9]	[36.9, 36.9]	[12.7, 12.7]
12	[4.5, 4.6]	[58.8, 58.9]	[12.3, 12.4]
13	[3.5, 3.5]	[27.8, 27.9]	[9.8, 9.8]
14	[4.5, 4.5]	[40.2, 40.2]	[8.4, 8.5]
15	[1.5, 1.7]	[13.5, 13.5]	[10.1, 10.2]
16	[8.5, 8.5]	[56.4, 56.5]	[7.1, 7.1]
17	[4.5, 4.7]	[71.6, 71.9]	[8.2, 8.2]
18	[6.5, 6.5]	[52.8, 52.8]	[10.9, 10.9]
19	[4.1, 4.2]	[44.1, 44.2]	[11.2, 11.3]
20	[5.5, 5.5]	[40.9, 40.9]	[9.4, 9.5]

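Step 1: $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$ vs. $H_1: \boldsymbol{\mu} \neq \boldsymbol{\mu}_0$.

Step 2: the neutrosophic sample mean vector and the neutrosophic sample covariance matrix are calculated from the data given in Table 1.

Step 3: let $\alpha = [0.10, 0.10]$ be the level of significance.

Step 4: the neutrosophic Hotelling $T^2$ statistic is $T_N^2 \in [9.7387, 11.4176]$.

Step 5: the critical region, from the rejection rule of Section 3, is $T_N^2 > T_{00} = [8.17, 8.17]$.

Step 6: as $T_N^2 = [9.7387, 11.4176] > T_{00} = [8.17, 8.17]$, we reject $H_0$.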
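The steps above can be sketched in code. This is a hedged illustration under stated assumptions, not the authors' computation. The hypothesized mean vector `MU_0 = (4, 50, 10)` is not given in the text and is assumed here because it is the vector commonly used with the classical version of these data. The statistic is evaluated once on the lower bounds and once on the upper bounds of Table 1, which is also an assumption. The critical value 8.17 is taken from Step 6. The function names `solve` and `hotelling_t2` are hypothetical.

```python
# Table 1 as (lower, upper) intervals for X1 (sweat), X2 (sodium), X3 (potassium).
TABLE_1 = [
    ((3.7, 3.7), (48.5, 48.7), (9.3, 9.3)),
    ((5.7, 5.8), (65.1, 65.1), (8.0, 8.1)),
    ((3.8, 3.8), (47.2, 47.3), (10.9, 10.9)),
    ((3.2, 3.3), (53.2, 53.3), (12.0, 12.0)),
    ((3.1, 3.1), (55.5, 55.5), (9.7, 9.8)),
    ((4.6, 4.8), (36.1, 36.2), (7.9, 7.9)),
    ((2.4, 2.4), (24.8, 24.8), (14.0, 14.0)),
    ((7.2, 7.3), (33.1, 33.2), (7.6, 7.7)),
    ((6.7, 6.7), (47.4, 47.4), (8.5, 8.6)),
    ((5.4, 5.5), (54.1, 54.2), (11.3, 11.3)),
    ((3.9, 3.9), (36.9, 36.9), (12.7, 12.7)),
    ((4.5, 4.6), (58.8, 58.9), (12.3, 12.4)),
    ((3.5, 3.5), (27.8, 27.9), (9.8, 9.8)),
    ((4.5, 4.5), (40.2, 40.2), (8.4, 8.5)),
    ((1.5, 1.7), (13.5, 13.5), (10.1, 10.2)),
    ((8.5, 8.5), (56.4, 56.5), (7.1, 7.1)),
    ((4.5, 4.7), (71.6, 71.9), (8.2, 8.2)),
    ((6.5, 6.5), (52.8, 52.8), (10.9, 10.9)),
    ((4.1, 4.2), (44.1, 44.2), (11.2, 11.3)),
    ((5.5, 5.5), (40.9, 40.9), (9.4, 9.5)),
]
MU_0 = (4.0, 50.0, 10.0)  # ASSUMPTION: hypothesized means, not stated in the text
CRITICAL = 8.17           # T00 as reported in Step 6


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def hotelling_t2(rows, mu0):
    """Classical T^2 = n (xbar - mu0)' S^{-1} (xbar - mu0), with S using divisor n - 1."""
    n, p = len(rows), len(mu0)
    xbar = [sum(r[k] for r in rows) / n for k in range(p)]
    s = [[sum((r[i] - xbar[i]) * (r[k] - xbar[k]) for r in rows) / (n - 1)
          for k in range(p)] for i in range(p)]
    d = [xbar[k] - mu0[k] for k in range(p)]
    w = solve(s, d)  # w = S^{-1} d, avoiding an explicit matrix inverse
    return n * sum(di * wi for di, wi in zip(d, w))


lower_rows = [[v[0] for v in row] for row in TABLE_1]
upper_rows = [[v[1] for v in row] for row in TABLE_1]
t2_lower = hotelling_t2(lower_rows, MU_0)
t2_upper = hotelling_t2(upper_rows, MU_0)
print("T2 on lower bounds:", round(t2_lower, 4), "reject H0:", t2_lower > CRITICAL)
print("T2 on upper bounds:", round(t2_upper, 4), "reject H0:", t2_upper > CRITICAL)
```

Under the assumed `MU_0`, the lower-bound run should land close to the reported 9.7387, since the lower bounds coincide with the classical data. Whether the upper-bound run reproduces 11.4176 depends on how the authors combined the bounds, which the text does not specify.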

5. Comparisons

In Section 4, we presented the testing procedure for the proposed neutrosophic Hotelling $T^2$ statistic. The proposed statistic is the generalization of the statistic under CS: the proposed testing procedure reduces to the testing procedure under CS when all observations of the sweat data are precise. From the neutrosophic sweat data, we note that the proposed testing procedure provides the analysis values in an indeterminacy interval rather than as determined values. The neutrosophic form of the proposed Hotelling statistic is $T_N^2 = 9.7387 + 11.4176 I_N$; $I_N \in [0, 0.1470]$. That is, the proposed statistic has the indeterminacy interval from 9.7387 to 11.4176, so under uncertainty one can expect values of $T_N^2$ from 9.7387 to 11.4176. The first value, 9.7387, is the determined part, and $11.4176 I_N$ is the indeterminate part. When no imprecise observations are recorded in the sweat data, the value of $T^2$ is 9.7387, which is the value under CS. In other words, when the level of significance is 5%, the probabilities that the null hypothesis is accepted, rejected, and indeterminate are 0.95, 0.05, and 0.1470, respectively. Comparing the proposed test with the test under CS, we note that the existing test cannot say anything about the probability of indeterminacy. As mentioned in [19, 20], a method that provides its values in an indeterminacy interval under uncertainty is considered the most effective and adequate method. The comparison of the proposed testing procedure with the existing procedure under CS is therefore consistent with the theory in [19, 20].
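The link between the interval [9.7387, 11.4176] and the indeterminacy bound 0.1470 can be checked with a few lines of arithmetic. This sketch assumes that the upper indeterminacy is computed as (upper - lower) / upper, a rule inferred from the reported numbers rather than stated in the text. The function name `neutrosophic_form` is hypothetical.

```python
def neutrosophic_form(lower, upper):
    """Express [lower, upper] as lower + upper * I with I in [0, i_max].

    ASSUMPTION: i_max = (upper - lower) / upper, inferred from the reported values.
    """
    i_max = (upper - lower) / upper
    return lower, upper, i_max


determinate, coefficient, i_max = neutrosophic_form(9.7387, 11.4176)

# Evaluating the form at the two ends of the indeterminacy range recovers the interval.
low_end = determinate + coefficient * 0.0
high_end = determinate + coefficient * i_max
print(round(i_max, 4), low_end, round(high_end, 4))
```

Setting I to zero returns the determinate part alone, which matches the statement that the statistic reduces to its classical value when no indeterminacy is present.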

6. Concluding Remarks

In this paper, we introduced the Hotelling T-squared statistic under neutrosophic statistics (NS), which is the generalization of classical statistics and applies under uncertainty. We discussed the application and advantage of the neutrosophic Hotelling T-squared statistic with the aid of data. The proposed neutrosophic Hotelling T-squared statistic is expressed as an indeterminacy interval and is hence more flexible and informative than the Hotelling T-squared statistic under classical statistics. Based on the comparison, we recommend using the proposed neutrosophic Hotelling T-squared statistic for the analysis of data under uncertainty. Further properties of the proposed statistic, and its sensitivity to uncertainty and measurement errors, can be studied in future work.
