
Does More Data Mean Higher Efficiency? An Experience from Pre- and Post-treatment Study with Missing Data.

Hongyue Wang¹, Jing Peng¹, Julia Z Zheng², Bokai Wang¹, J X Tu³, Changyong Feng¹˒⁴.

Abstract

In this paper we compare two moment-based methods that have been widely used to test the hypothesis of no treatment effect in pre- and post-treatment studies with data missing completely at random. Our theoretical derivation and simulation results show that the method based on all available data is not necessarily more efficient than the method that uses only complete data pairs. We propose an optimal linear combination of these two methods, which turns out to be at least as powerful as either of them in all cases.


Keywords: asymptotical relative efficiency; likelihood ratio test; paired t-test

Year: 2016 PMID: 28638197 PMCID: PMC5434275 DOI: 10.11919/j.issn.1002-0829.216058

Source DB: PubMed Journal: Shanghai Arch Psychiatry ISSN: 1002-0829

1. Introduction

It is well known in data analysis that more data usually offer more information for statistical inference. For example, suppose we want to estimate the average body weight of 2-year-old boys in New York City. For this purpose, we can randomly select 100 boys from the target population, obtain their individual body weights, and calculate the average body weight and its standard error. If possible, we can also randomly select 10,000 boys and do the same calculations. Usually, the average weights in the two cases are very similar. However, because the standard error of a sample mean is inversely proportional to the square root of the sample size, the standard error in the latter case is only about $\sqrt{100/10000} = 10\%$ of that in the former.

Student's t-test is one of the most popular statistical tools for comparing mean values of continuously distributed data. Let $X_1, X_2, \ldots, X_n$ be a random sample from a population of interest with mean $\mu$ and variance $\sigma^2$. The sample mean and sample variance are

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2,$$

which are unbiased estimators of $\mu$ and $\sigma^2$, respectively. A widely used method to test the hypothesis $H_0: \mu = \mu_0$ is the test statistic defined by

$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}. \tag{1}$$

If the data follow a normal distribution, then under the null hypothesis $H_0$ the test statistic $T$ in (1) has a t-distribution with $n-1$ degrees of freedom. If the data are not normally distributed, the exact distribution of $T$ is usually not available. However, as long as the sample size $n$ is large enough, we can use the standard normal distribution to approximate the distribution of $T$, which is a direct result of the central limit theorem in probability theory. The test statistic in (1) is also called the one-sample t-test.

Now consider the case of two independent samples. Suppose $X_{i1}, X_{i2}, \ldots, X_{in_i}$, $i = 1, 2$, are data from two independent populations with means $\mu_i$ and variances $\sigma_i^2$, $i = 1, 2$. Let $\bar{X}_i$ and $S_i^2$ be the sample mean and sample variance of the $i$th sample. The widely used test statistic for testing the hypothesis $H_0: \mu_1 = \mu_2$ is

$$T = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}. \tag{2}$$

If the data are normally distributed in both samples and $\sigma_1^2 = \sigma_2^2$, then under $H_0$ the pooled-variance version of (2), in which $S_1^2$ and $S_2^2$ are both replaced by $S_p^2 = [(n_1-1)S_1^2 + (n_2-1)S_2^2]/(n_1+n_2-2)$, has a t-distribution with $n_1 + n_2 - 2$ degrees of freedom; the two versions coincide when $n_1 = n_2$.
If $\sigma_1^2 \neq \sigma_2^2$, the distribution of $T$ in (2) is not so straightforward. This is the well-known Behrens-Fisher problem[13] in statistics and is beyond the scope of this paper. However, if both $n_1$ and $n_2$ are large enough, we can still use the standard normal distribution to approximate the distribution of $T$. The test statistic in (2) is also called the two-sample t-test[13]. In a two-sample t-test, the groups usually have different sample sizes.

Consider another scenario. Suppose we have a set of randomly selected, matched-pair observations $(X_{i1}, X_{i2})$, $i = 1, \ldots, n$, from a study population. This kind of data is very typical in pre- and post-treatment studies. For example, in a hypertension study, $X_{i1}$ and $X_{i2}$ are the blood pressures of patient $i$ before and after the treatment. This differs from the two independent samples considered above: for matched-pair data, $X_{i1}$ and $X_{i2}$ are correlated because they are two measurements on the same individual. Suppose that in the study population the mean blood pressures before and after the treatment are $\mu_1$ and $\mu_2$, respectively. The treatment effect can be measured by $\mu_1 - \mu_2$. Let $Y_i = X_{i1} - X_{i2}$ be the difference between the measurements before and after treatment, and let $\bar{Y}$ and $S_Y^2$ be the sample mean and sample variance of $Y_i$, $i = 1, \ldots, n$. The test statistic widely used to test the hypothesis $H_0: \mu_1 = \mu_2$ is

$$T = \frac{\bar{Y}}{S_Y/\sqrt{n}}. \tag{3}$$

If $(X_{i1}, X_{i2})$ has a bivariate normal distribution, the test statistic in (3) has a t-distribution with $n - 1$ degrees of freedom under $H_0$; it is called the paired t-test. Note that the two-sample t-test and the paired t-test can be written in the same form

$$T = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\widehat{\mathrm{Var}}(\bar{X}_1 - \bar{X}_2)}}, \tag{4}$$

where $\widehat{\mathrm{Var}}(\bar{X}_1 - \bar{X}_2)$ is a consistent estimator[13] of the variance of $\bar{X}_1 - \bar{X}_2$. The formula for $\widehat{\mathrm{Var}}(\bar{X}_1 - \bar{X}_2)$ can be obtained from the authors upon request. From (3) we know that the paired t-test is exactly the one-sample t-test constructed from the pre- and post-treatment differences. In this paper we focus on matched-pair data. In the construction of test statistic (3) we assume that the pre- and post-treatment data are available for each individual.
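The statistics in (1) to (4) can be computed directly from their definitions. The sketch below uses only Python's standard library; the function names and the toy blood-pressure values are hypothetical, and `two_sample_t` uses the unpooled denominator of (2). It also shows that the paired t-test is the one-sample t-test on the differences and agrees with form (4), and it checks the body-weight arithmetic: with a common standard deviation, the standard error for n = 10,000 is sqrt(100/10000) = 0.1 times that for n = 100.

```python
import math
from statistics import mean, variance  # variance() uses the n - 1 denominator


def one_sample_t(x, mu0):
    """Test statistic (1): (X_bar - mu0) / (S / sqrt(n))."""
    return (mean(x) - mu0) / math.sqrt(variance(x) / len(x))


def two_sample_t(x1, x2):
    """Test statistic (2), with the unpooled variance estimate."""
    se = math.sqrt(variance(x1) / len(x1) + variance(x2) / len(x2))
    return (mean(x1) - mean(x2)) / se


def paired_t(x1, x2):
    """Test statistic (3): one-sample t-test (mu0 = 0) on Y_i = X_i1 - X_i2."""
    if len(x1) != len(x2):
        raise ValueError("paired data need one post value per pre value")
    return one_sample_t([a - b for a, b in zip(x1, x2)], 0.0)


def paired_t_form4(x1, x2):
    """The same paired statistic written in form (4)."""
    y = [a - b for a, b in zip(x1, x2)]
    var_hat = variance(y) / len(y)  # estimates Var(X1_bar - X2_bar)
    return (mean(x1) - mean(x2)) / math.sqrt(var_hat)


def se_ratio(n_small, n_large):
    """The SE of a mean scales as 1/sqrt(n); ratio of SEs for two sample sizes."""
    return math.sqrt(n_small / n_large)


if __name__ == "__main__":
    before = [150.0, 142.0, 160.0, 155.0]  # hypothetical blood pressures
    after = [140.0, 139.0, 151.0, 150.0]
    print(se_ratio(100, 10_000))
    print(paired_t(before, after), paired_t_form4(before, after))
    print(two_sample_t(before, after))  # ignores the pairing, for contrast
```

On these toy values the paired statistic is about 4.09, while `two_sample_t`, which treats the pre- and post-treatment readings as independent samples and so ignores their positive correlation, gives only about 1.35.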
The power of the test increases with the sample size. However, missing data are very common in pre- and post-treatment studies. Usually the pre-treatment measurement is available for each individual, but the post-treatment measurement may be missing for some individuals. This poses challenges for data analysis. For example, to test the hypothesis of no treatment effect, we may construct test statistics with the same structure as (4). However, to estimate the mean of the pre-treatment measurements, should we use all individuals, or only the individuals with complete pairs? What is the relative efficiency of the test statistics based on these two different estimators? The paper is organized as follows. Section 2 introduces two widely used moment-based test statistics and calculates their relative efficiency. In Section 3 we construct a test that is at least as powerful as the tests in Section 2 and is equivalent to the likelihood ratio test when the data are from a bivariate normal distribution. In Section 4, we conduct simulation studies to compare the powers of these tests. Our conclusions and further discussion are reported in Section 5.

2. Two moment-based tests and their relative efficiency

Suppose the full data are $(X_{i1}, X_{i2})$, $i = 1, \ldots, n$, where $X_{i1}$ and $X_{i2}$ are the pre- and post-treatment measurements, respectively. The pre-treatment measurement is observed for every individual. However, for some individuals, the post-treatment measurement is not observed. For individual $i$, we define an indicator $R_i$, with $R_i = 1$ if $X_{i2}$ is observed and $R_i = 0$ otherwise. Hence the number of complete pairs (i.e. pairs in which both the pre- and post-treatment measurements are observed) is $m = \sum_{i=1}^{n} R_i$. Data can be missing in very complicated patterns in biomedical research, especially in longitudinal follow-up studies; see Rubin, and Little and Rubin, for more theoretical discussion of missing-data patterns. In this manuscript, we consider a very simple missing pattern in which the post-treatment measurement is assumed to be missing completely at random (MCAR), which means that the probability that $X_{i2}$ is missing depends on neither $X_{i1}$ nor $X_{i2}$. This is a strong assumption. For example, suppose $X_{i1}$ and $X_{i2}$ are the blood pressures before and after the treatment. If a patient skips the post-treatment measurement because he accidentally forgets the appointment, the MCAR assumption is satisfied. However, if the patient thinks his blood pressure at the first appointment is in the normal range and does not want to spend time on the second measurement, the MCAR assumption is violated, because the missingness depends on the first measurement. We assume the mean and variance of $X_{ij}$ are $\mu_j$ and $\sigma_j^2$, $j = 1, 2$. Since $X_{i1}$ and $X_{i2}$ come from the same individual, they are usually correlated; let $\rho$ denote their correlation coefficient. Given the data, these parameters need to be estimated in order to make appropriate statistical inference. With MCAR data, $\sqrt{n}$-consistent estimators can easily be obtained for all these parameters. For pre- and post-treatment data, we are interested in the treatment effect, which can be measured by $\mu_1 - \mu_2$. Statistically, the hypothesis of no treatment effect is $H_0: \mu_1 = \mu_2$.
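Why MCAR matters for the pre-treatment mean can be seen in a small simulation. The sketch below is a hypothetical illustration, not part of the paper: pre-treatment values are standard normal (so $\mu_1 = 0$), under MCAR each $X_{i2}$ is observed with probability 0.7, and under the alternative rule a subject returns for the second visit only if $X_{i1} > 0$ (a stand-in for "first reading not in the normal range"). Only the pattern of observation matters here, so $X_{i2}$ itself is not generated.

```python
import random
from statistics import fmean


def complete_case_pre_mean(n, mechanism, seed=2016):
    """Average X_i1 over subjects whose post-treatment value X_i2 is observed.

    mechanism="mcar": X_i2 observed with probability 0.7, independently of data.
    mechanism="x1":   X_i2 observed only when X_i1 > 0 (hypothetical rule: a
                      subject whose first reading is 'normal' skips visit 2).
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)  # pre-treatment value; true mean mu_1 = 0
        if mechanism == "mcar":
            observed = rng.random() < 0.7
        elif mechanism == "x1":
            observed = x1 > 0.0
        else:
            raise ValueError("mechanism must be 'mcar' or 'x1'")
        if observed:
            kept.append(x1)
    return fmean(kept)


if __name__ == "__main__":
    print(complete_case_pre_mean(20_000, "mcar"))  # close to 0
    print(complete_case_pre_mean(20_000, "x1"))    # close to sqrt(2/pi), about 0.80
```

Under MCAR the complete cases are a random subsample, so their pre-treatment mean still estimates $\mu_1$; under the second rule it estimates $E[X_{i1} \mid X_{i1} > 0] = \sqrt{2/\pi} \approx 0.80$ instead.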

2.1 Test statistic based on all available data

From formula (4) we know that the test statistic depends on the estimates of the pre- and post-treatment means and on the estimator of the variance of the sample mean difference. In this section, the sample mean of the pre-treatment measurements (denoted by $\hat{\mu}_{1A}$) is calculated from all individuals, and the sample mean of the post-treatment measurements (denoted by $\hat{\mu}_2$) is calculated only from the observed post-treatment measurements, i.e.

$$\hat{\mu}_{1A} = \frac{1}{n}\sum_{i=1}^{n} X_{i1}, \qquad \hat{\mu}_2 = \frac{1}{m}\sum_{i=1}^{n} R_i X_{i2}.$$

The t-test based on all available data is

$$T_A = \frac{\hat{\mu}_{1A} - \hat{\mu}_2}{\sqrt{\widehat{\mathrm{Var}}(\hat{\mu}_{1A} - \hat{\mu}_2)}}.$$

The exact distribution of $T_A$ is difficult to calculate. However, under MCAR, when the sample size $n$ is large enough, the normal distribution can be used to approximate the distribution of $T_A$.
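The paper does not print its variance estimator for $T_A$. A minimal sketch, assuming MCAR and conditioning on the observed $m$, uses the plug-in $\widehat{\mathrm{Var}}(\hat{\mu}_{1A} - \hat{\mu}_2) = s_1^2/n + s_2^2/m - 2 s_{12}/n$, where $s_1^2$ comes from all $n$ pre-treatment values, $s_2^2$ from the observed post-treatment values, and $s_{12}$ is the sample covariance over complete pairs. This formula is our own derivation, not quoted from the authors; missing values are represented as `None`, and the function names are hypothetical.

```python
import math
from statistics import mean, variance


def _cov(a, b):
    """Sample covariance with the n - 1 denominator."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def t_all_available(x1, x2):
    """T_A: pre-treatment mean over all n subjects, post-treatment mean over
    the m subjects whose X_i2 is observed (x2[i] is None when missing).

    Assumed plug-in variance (derived under MCAR, not quoted from the paper):
        Var(mu1A_hat - mu2_hat) ~= s1^2/n + s2^2/m - 2*s12/n
    """
    n = len(x1)
    c1 = [a for a, b in zip(x1, x2) if b is not None]
    c2 = [b for b in x2 if b is not None]
    m = len(c2)
    s1_sq = variance(x1)  # all n pre-treatment values
    s2_sq = variance(c2)  # m observed post-treatment values
    s12 = _cov(c1, c2)    # complete pairs only
    var_hat = s1_sq / n + s2_sq / m - 2.0 * s12 / n
    if var_hat <= 0:
        raise ValueError("non-positive variance estimate; sample too small?")
    return (mean(x1) - mean(c2)) / math.sqrt(var_hat)


if __name__ == "__main__":
    pre = [150.0, 142.0, 160.0, 155.0]  # hypothetical values
    post = [140.0, 139.0, None, 150.0]  # third subject missed visit 2
    print(t_all_available(pre, post))
```

When nothing is missing, $m = n$ and the plug-in reduces to $(s_1^2 + s_2^2 - 2 s_{12})/n = S_Y^2/n$, so this $T_A$ coincides with the paired t-test (3).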

2.2 Test statistic based on complete pairs

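In this method, the sample mean of the pre-treatment measurements (denoted by $\hat{\mu}_{1C}$) is based only on individuals with complete pairs, i.e.

$$\hat{\mu}_{1C} = \frac{1}{m}\sum_{i=1}^{n} R_i X_{i1}.$$

The resulting t-test,

$$T_C = \frac{\hat{\mu}_{1C} - \hat{\mu}_2}{\sqrt{\widehat{\mathrm{Var}}(\hat{\mu}_{1C} - \hat{\mu}_2)}},$$

is exactly the paired t-test applied to the $m$ complete pairs. Similarly, under MCAR, the distribution of $T_C$ can be approximated by the normal distribution for large sample sizes.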
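Because $\hat{\mu}_{1C} - \hat{\mu}_2$ is simply the mean of the complete-pair differences, $T_C$ is the one-sample t statistic of those differences. A minimal sketch, with missing post-treatment values represented as `None` (a hypothetical convention) and illustrative data:

```python
import math
from statistics import mean, variance


def t_complete_pairs(x1, x2):
    """T_C: paired t-test on the m complete pairs (x2[i] is None if missing).

    mu1C_hat - mu2_hat equals the mean of the complete-pair differences, so
    T_C is the one-sample t statistic (mu0 = 0) of those differences.
    """
    diffs = [a - b for a, b in zip(x1, x2) if b is not None]
    m = len(diffs)
    if m < 2:
        raise ValueError("need at least two complete pairs")
    return mean(diffs) / math.sqrt(variance(diffs) / m)


if __name__ == "__main__":
    pre = [150.0, 142.0, 160.0, 155.0]  # hypothetical values
    post = [140.0, 139.0, None, 150.0]  # third subject missed visit 2
    print(t_complete_pairs(pre, post))
```

The third subject's pre-treatment value is simply discarded here, which is exactly the information that $T_A$ keeps.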

2.3 Asymptotic relative efficiency

The relative efficiency of two tests is used to compare their powers. Let $r_\sigma = \sigma_1/\sigma_2$ be the ratio of the standard deviations of the pre- and post-treatment measurements, and let $\pi$ denote the probability that the post-treatment measurement is observed. It can be proved that the asymptotic relative efficiency of $T_C$ with respect to $T_A$ is

$$\mathrm{ARE} = \frac{\pi r_\sigma^2 - 2\pi\rho r_\sigma + 1}{r_\sigma^2 - 2\rho r_\sigma + 1}. \tag{5}$$

Here ARE > (or <) 1 means that $T_C$ is more (or less) powerful than $T_A$ in detecting a pre- and post-treatment difference if one exists. From formula (5) we can see that the relative efficiency depends on the proportion of missing data ($1 - \pi$), the ratio of the standard deviations of the pre- and post-treatment measurements, and their correlation. More specifically, we have the following conclusions about ARE. Formula (5) shows that ARE is always greater than $\pi$; this is very intuitive, as $\pi$ is the proportion of patients without missing data. If $r_\sigma \geq 2$, $T_A$ is at least as powerful as $T_C$. If $r_\sigma/2 < \rho \leq 1$, $T_C$ is more powerful than $T_A$. If $-1 \leq \rho < r_\sigma/2$, $T_A$ is more powerful than $T_C$. If $r_\sigma = 1$, i.e. $\sigma_1 = \sigma_2$, then $\mathrm{ARE} = [1 + \pi(1 - 2\rho)]/[2(1 - \rho)] \to \infty$ as $\rho \to 1$. This means that for highly (positively) correlated data, $T_C$ is much more powerful than $T_A$. It is interesting to see that $T_A$ is not always more powerful than $T_C$, as one might have expected since the former test is based on more data than the latter. When $\sigma_1 < 2\sigma_2$, $T_C$ is actually more efficient than $T_A$ if $r_\sigma/2 < \rho \leq 1$. In addition, in the special case $\sigma_1 = \sigma_2$, $T_C$ can be much more efficient than $T_A$ if the pre- and post-treatment measurements are highly correlated.
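The conclusions about ARE can be checked numerically. The sketch below evaluates the ratio of the asymptotic variances of $\hat{\mu}_{1A} - \hat{\mu}_2$ and $\hat{\mu}_{1C} - \hat{\mu}_2$ under MCAR, $(\pi r_\sigma^2 - 2\pi\rho r_\sigma + 1)/(r_\sigma^2 - 2\rho r_\sigma + 1)$; this expression is our own derivation and is assumed to be what (5) states. Function names are hypothetical, and $\pi = 0.7$ matches the roughly 30% missingness used in Section 4.

```python
def are_c_vs_a(r_sigma, rho, pi_obs):
    """Asymptotic relative efficiency of T_C with respect to T_A.

    r_sigma = sigma_1 / sigma_2, rho = correlation of (X_i1, X_i2),
    pi_obs = probability that X_i2 is observed. ARE > 1 favours T_C.
    """
    num = pi_obs * r_sigma**2 - 2.0 * pi_obs * rho * r_sigma + 1.0
    den = r_sigma**2 - 2.0 * rho * r_sigma + 1.0
    return num / den


def more_powerful(r_sigma, rho, pi_obs):
    """Name of the asymptotically more powerful test ('equal' at ARE = 1)."""
    are = are_c_vs_a(r_sigma, rho, pi_obs)
    if abs(are - 1.0) < 1e-12:
        return "equal"
    return "T_C" if are > 1.0 else "T_A"


if __name__ == "__main__":
    for r_sigma, rho in [(3.0, 0.6), (0.5, 0.6), (1.0, 0.9), (2.0, 0.9)]:
        print(r_sigma, rho, round(are_c_vs_a(r_sigma, rho, 0.7), 3),
              more_powerful(r_sigma, rho, 0.7))
```

For example, $r_\sigma = 3$ and $\rho = 0.6$ give ARE = 4.78/6.4 ≈ 0.75 (so $T_A$ wins, since $\rho < r_\sigma/2$), while $r_\sigma = 1$ and $\rho = 0.9$ give ARE = 2.2.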

3. An optimal combination of moment-based tests

Section 2 shows that although $T_A$ and $T_C$ are identical when no data are missing, neither of them is uniformly more powerful than the other when data are missing completely at random. A natural idea is to find an intermediate point between these two tests that may be at least as powerful as both of them. More precisely, consider the family $\mathcal{F} = \{T(\lambda)\}$ of tests (7) obtained by linearly combining the two methods with weight $\lambda$. Each element of $\mathcal{F}$ is a valid test, and $T_A$ and $T_C$ are two special elements of this family.

Theorem 1. Among all tests defined in (7), $T(\lambda_o)$ is the most powerful one, where the optimal weight $\lambda_o$ is estimated from the first two moments of the data. The proof of this theorem is beyond the scope of this paper, but it is available from the authors upon request.

Remark: It is well known that if the data are from a bivariate normal distribution, the likelihood ratio test (LRT) is asymptotically the most efficient test. We can prove that $T(\lambda_o)$ in Theorem 1 is equivalent to the likelihood ratio test for bivariate normal data. Moreover, $T(\lambda_o)$ depends only on the first two moments of the data, is easy to use, and is at least as powerful as the two widely used tests $T_A$ and $T_C$. The same idea of combination has been used in other areas of statistics. For example, Oakes and Feng constructed an optimal linear combination of the stratified and unstratified log-rank tests for paired survival data.
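The explicit form of family (7) and of $\lambda_o$ is not given in this text. One natural construction, assumed here and not necessarily the authors' parameterization, combines the two estimators of $\mu_1 - \mu_2$: $D(\lambda) = \lambda D_A + (1 - \lambda) D_C$ with $D_A = \hat{\mu}_{1A} - \hat{\mu}_2$ and $D_C = \hat{\mu}_{1C} - \hat{\mu}_2$, so that $\lambda = 1$ gives $T_A$ and $\lambda = 0$ gives $T_C$. Under MCAR, conditional on $m$, our own derivation gives the variances below; minimizing $\mathrm{Var}\,D(\lambda)$ yields $\lambda^* = 1 - \rho\sigma_2/\sigma_1$, which turns $D(\lambda^*)$ into $\hat{\mu}_{1A}$ minus the regression-adjusted estimate $\hat{\mu}_2 + (\rho\sigma_2/\sigma_1)(\hat{\mu}_{1A} - \hat{\mu}_{1C})$, the form taken by the maximum likelihood estimate of $\mu_2$ under bivariate normality. That is consistent with the Remark above, but the code is a sketch using population moments, not the authors' formula.

```python
def variances(n, m, s1, s2, rho):
    """Var(D_A), Var(D_C) and Cov(D_A, D_C) under MCAR, given m of n observed,
    where D_A = mu1A_hat - mu2_hat and D_C = mu1C_hat - mu2_hat."""
    s12 = rho * s1 * s2
    v_a = s1**2 / n + s2**2 / m - 2.0 * s12 / n
    v_c = (s1**2 + s2**2 - 2.0 * s12) / m
    c_ac = s1**2 / n - s12 / n - s12 / m + s2**2 / m
    return v_a, v_c, c_ac


def combined_variance(lam, v_a, v_c, c_ac):
    """Var of D(lam) = lam * D_A + (1 - lam) * D_C (lam=1: T_A, lam=0: T_C)."""
    return lam**2 * v_a + (1.0 - lam) ** 2 * v_c + 2.0 * lam * (1.0 - lam) * c_ac


def optimal_lambda(v_a, v_c, c_ac):
    """Minimiser of combined_variance, from setting its derivative to zero."""
    return (v_c - c_ac) / (v_a + v_c - 2.0 * c_ac)


if __name__ == "__main__":
    n, m, s1, s2, rho = 200, 140, 3.0, 1.0, 0.6
    v_a, v_c, c_ac = variances(n, m, s1, s2, rho)
    lam = optimal_lambda(v_a, v_c, c_ac)
    print(lam, 1.0 - rho * s2 / s1)  # should agree up to rounding
    print(v_a, v_c, combined_variance(lam, v_a, v_c, c_ac))
```

With $n = 200$, $m = 140$, $\sigma_1 = 3$, $\sigma_2 = 1$ and $\rho = 0.6$, this gives $\lambda^* = 0.8$ and a combined variance of about 0.0334, below both $\mathrm{Var}(D_A) \approx 0.0341$ and $\mathrm{Var}(D_C) \approx 0.0457$.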

4. Simulation results

In this section we compare the empirical powers of $T_A$, $T_C$ and $T(\lambda_o)$ for different sample sizes and different parameters of the data distribution. The significance level was set at 5% in all cases, and about 30% of the post-treatment data were missing. Each test statistic $T$ was first standardized so that its (asymptotic) variance equals 1. The empirical power was obtained from 10,000 Monte Carlo replications as the proportion of replications in which $|T| > 1.96$.
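The power calculation described above amounts to counting rejections over Monte Carlo replications. A minimal sketch with hypothetical function names, using a one-sample t statistic (already standardized to unit asymptotic variance) as a stand-in for the tests compared in this section; under the null the rejection rate should be close to the nominal 5%.

```python
import math
import random


def t_stat(x, mu0=0.0):
    """One-sample t statistic; plain loops keep repeated calls fast."""
    n = len(x)
    xbar = sum(x) / n
    s2 = sum((v - xbar) ** 2 for v in x) / (n - 1)
    return (xbar - mu0) / math.sqrt(s2 / n)


def normal_sampler(mu, n=100):
    """Return a simulate(rng) callable drawing n independent N(mu, 1) values."""
    return lambda rng: [rng.gauss(mu, 1.0) for _ in range(n)]


def empirical_power(simulate, statistic, reps=10_000, crit=1.96, seed=1):
    """Share of Monte Carlo replications with |T| > crit (two-sided 5% level)."""
    rng = random.Random(seed)
    rejections = sum(abs(statistic(simulate(rng))) > crit for _ in range(reps))
    return rejections / reps


if __name__ == "__main__":
    print(empirical_power(normal_sampler(0.0), t_stat, reps=2_000))  # size, near 0.05
    print(empirical_power(normal_sampler(0.3), t_stat, reps=2_000))  # power
```

Fixing the seed makes runs reproducible; with 10,000 replications the Monte Carlo standard error of an estimated power near 0.5 is at most about 0.005.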

Case 1. Bivariate normal data

In this case, the matched pairs $(X_{i1}, X_{i2})$ are generated from a bivariate normal distribution. We report the powers of $T_A$, $T_C$, $T(\lambda_o)$ and the LRT. The results are in Table 1.
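Case 1 can be mimicked with a short simulation. This is a sketch under stated assumptions rather than the authors' code: each $X_{i2}$ is observed independently with probability 0.7 (matching "about 30%" missing), $T_A$ uses the MCAR plug-in variance $s_1^2/n + s_2^2/m - 2s_{12}/n$ (our derivation), and only 2,000 replications are used to keep it quick. For $n = 200$, $\mu_1 = 0$, $\mu_2 = 0.5$, $\sigma_1 = 3$, $\sigma_2 = 1$ and $\rho = 0.6$, the estimates should land near the Table 1 entries 0.77 ($T_A$) and 0.65 ($T_C$), up to Monte Carlo error.

```python
import math
import random


def simulate_pairs(rng, n, mu1, mu2, s1, s2, rho, pi_obs=0.7):
    """n bivariate normal pairs; X_i2 is kept with probability pi_obs (MCAR),
    otherwise stored as None."""
    x1, x2 = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1.append(mu1 + s1 * z1)
        post = mu2 + s2 * (rho * z1 + math.sqrt(1.0 - rho**2) * z2)  # corr = rho
        x2.append(post if rng.random() < pi_obs else None)
    return x1, x2


def _mean(v):
    return sum(v) / len(v)


def _cov(a, b):
    ma, mb = _mean(a), _mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def t_a(x1, x2):
    """All-available-data test with an assumed MCAR plug-in variance."""
    n = len(x1)
    c1 = [a for a, b in zip(x1, x2) if b is not None]
    c2 = [b for b in x2 if b is not None]
    m = len(c2)
    var_hat = _cov(x1, x1) / n + _cov(c2, c2) / m - 2.0 * _cov(c1, c2) / n
    return (_mean(x1) - _mean(c2)) / math.sqrt(var_hat)


def t_c(x1, x2):
    """Paired t-test on the complete pairs."""
    d = [a - b for a, b in zip(x1, x2) if b is not None]
    return _mean(d) / math.sqrt(_cov(d, d) / len(d))


def power(stat, reps, seed, **params):
    """Empirical power: share of replications with |T| > 1.96."""
    rng = random.Random(seed)
    hits = sum(abs(stat(*simulate_pairs(rng, **params))) > 1.96 for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    setting = dict(n=200, mu1=0.0, mu2=0.5, s1=3.0, s2=1.0, rho=0.6)
    print("T_A", power(t_a, 2_000, 11, **setting))
    print("T_C", power(t_c, 2_000, 11, **setting))
```

Using the same seed for both tests evaluates them on identical simulated data sets, which makes the comparison between $T_A$ and $T_C$ less noisy.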

Table 1.

Comparison of powers of test statistics (bivariate normal data)

n	Parameters					Powers of test statistics
n	μ1	μ2	σ1	σ2	ρ	TA	TC	T (λo)	LRT
50	0.0	0.5	3.0	1.0	0.6	0.27	0.21	0.27	0.27
	0.0	0.5	1.0	2.0	0.3	0.32	0.42	0.42	0.42
	0.0	0.5	1.0	2.0	0.6	0.39	0.44	0.44	0.44
	0.0	0.5	2.0	1.0	0.6	0.52	0.44	0.54	0.54
	0.0	0.25	1.0	1.0	0.9	0.60	0.91	0.91	0.91
	0.0	0.25	0.6	1.0	0.3	0.31	0.31	0.32	0.32
	0.0	0.25	0.6	1.0	0.6	0.40	0.45	0.45	0.45
	0.0	0.25	1.0	1.0	0.8	0.49	0.65	0.66	0.66
100	0.0	0.5	3.0	1.0	0.6	0.48	0.38	0.49	0.49
	0.0	0.5	1.0	2.0	0.3	0.57	0.57	0.58	0.58
	0.0	0.5	1.0	2.0	0.6	0.67	0.74	0.74	0.74
	0.0	0.5	2.0	1.0	0.6	0.82	0.74	0.84	0.84
	0.0	0.25	1.0	1.0	0.9	0.88	1.00	1.00	1.00
	0.0	0.25	0.6	1.0	0.3	0.55	0.55	0.56	0.56
	0.0	0.25	0.6	1.0	0.6	0.67	0.74	0.74	0.74
	0.0	0.25	1.0	1.0	0.8	0.78	0.91	0.92	0.92
200	0.0	0.5	3.0	1.0	0.6	0.77	0.65	0.78	0.78
	0.0	0.5	1.0	2.0	0.3	0.85	0.85	0.87	0.87
	0.0	0.5	1.0	2.0	0.6	0.93	0.95	0.95	0.95
	0.0	0.5	2.0	1.0	0.6	0.98	0.95	0.99	0.99
	0.0	0.25	1.0	1.0	0.9	0.99	1.00	1.00	1.00
	0.0	0.25	0.6	1.0	0.3	0.84	0.84	0.85	0.85
	0.0	0.25	0.6	1.0	0.6	0.93	0.96	0.96	0.96
	0.0	0.25	1.0	1.0	0.8	0.97	1.00	1.00	1.00

As expected, for given parameters of the data distribution, the power of each test increases with the sample size. Neither $T_A$ nor $T_C$ is always more powerful than the other. For example, with sample size $n = 200$, when $\mu_1 = 0$, $\mu_2 = 0.5$, $\sigma_1 = 3.0$, $\sigma_2 = 1.0$ and $\rho = 0.6$, the powers of $T_A$ and $T_C$ are 0.77 and 0.65, respectively. However, when $\mu_1 = 0$, $\mu_2 = 0.5$, $\sigma_1 = 1.0$, $\sigma_2 = 2.0$ and $\rho = 0.6$, their powers are 0.93 and 0.95, respectively. In every scenario, $T(\lambda_o)$ is at least as powerful as both $T_A$ and $T_C$, even when the sample size is relatively small (e.g. $n = 50$), and it always has the same power as the likelihood ratio test.
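The n = 200 values quoted above can also be anticipated without simulation. Assuming MCAR with $m = \pi n$ and $\pi = 0.7$, one can derive $\mathrm{Var}(\hat{\mu}_{1A} - \hat{\mu}_2) = \sigma_1^2/n + \sigma_2^2/m - 2\rho\sigma_1\sigma_2/n$ and $\mathrm{Var}(\hat{\mu}_{1C} - \hat{\mu}_2) = (\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2)/m$, and, for a combination $\lambda(\hat{\mu}_{1A} - \hat{\mu}_2) + (1 - \lambda)(\hat{\mu}_{1C} - \hat{\mu}_2)$ (one possible reading of family (7)), a smallest achievable variance of $\sigma_2^2(1 - \rho^2)/m + (\sigma_1 - \rho\sigma_2)^2/n$. These are our derivations, not formulas printed in the paper. Plugging them into the normal approximation $\Phi(k - 1.96) + \Phi(-k - 1.96)$, with $k = |\mu_1 - \mu_2|/\mathrm{SD}$, gives values within about 0.01 of the corresponding Table 1 entries.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def approx_power(delta, var_d, z=1.96):
    """Normal-approximation power of a two-sided 5% Wald test."""
    k = abs(delta) / var_d**0.5
    return PHI(k - z) + PHI(-k - z)


def approx_powers(n, delta, s1, s2, rho, pi_obs=0.7):
    """Approximate powers of T_A, T_C and the best estimator combination.

    Variances are our MCAR derivations with m = pi_obs * n observed
    post-treatment values; none of these formulas is quoted from the paper.
    """
    m = pi_obs * n
    s12 = rho * s1 * s2
    v_a = s1**2 / n + s2**2 / m - 2.0 * s12 / n
    v_c = (s1**2 + s2**2 - 2.0 * s12) / m
    v_opt = s2**2 * (1.0 - rho**2) / m + (s1 - rho * s2) ** 2 / n
    return tuple(approx_power(delta, v) for v in (v_a, v_c, v_opt))


if __name__ == "__main__":
    # The two n = 200 settings quoted in the text (|mu1 - mu2| = 0.5, rho = 0.6)
    for s1, s2 in [(3.0, 1.0), (1.0, 2.0)]:
        print(s1, s2, [round(p, 2) for p in approx_powers(200, 0.5, s1, s2, 0.6)])
```

The approximation ignores small-sample effects, so it is most reliable for the larger sample sizes in Table 1.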

Case 2. Mixed normal-exponential data

Each pair $(X_{i1}, X_{i2})$ is generated from three independent random variables, two of which have normal distributions and one of which has an exponential distribution. In this case, the data do not have a bivariate normal distribution. However, as long as the sample size is large enough, we can still use the t-test to compare the pre- and post-treatment mean values. Table 2 reports the empirical powers of $T_A$, $T_C$ and $T(\lambda_o)$. It shows that neither $T_A$ nor $T_C$ is more powerful than the other in all situations, whereas $T(\lambda_o)$ is always at least as powerful as both of them.
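The text does not give the Case 2 generating formula, so the sketch below uses a hypothetical construction that fits the verbal description: each visit gets its own independent normal term, and both visits share one centred exponential term, which induces positive correlation and right skew. All names and parameter values are illustrative, and the construction is not claimed to reproduce the authors' simulation or the σ and ρ values in Table 2.

```python
import math
import random


def mixed_pair(rng, mu1, mu2, sd1, sd2, exp_mean):
    """Hypothetical (X_i1, X_i2): independent normal noise for each visit plus
    one shared, centred exponential term (not bivariate normal)."""
    e = rng.expovariate(1.0 / exp_mean) - exp_mean  # mean 0, variance exp_mean**2
    return mu1 + rng.gauss(0.0, sd1) + e, mu2 + rng.gauss(0.0, sd2) + e


def simulate(n, seed, mu1, mu2, sd1, sd2, exp_mean):
    """n independent pairs, returned as two lists (pre, post)."""
    rng = random.Random(seed)
    pairs = [mixed_pair(rng, mu1, mu2, sd1, sd2, exp_mean) for _ in range(n)]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def implied_correlation(sd1, sd2, exp_mean):
    """Correlation of the pair implied by the construction above."""
    v = exp_mean**2
    return v / math.sqrt((sd1**2 + v) * (sd2**2 + v))


def sample_correlation(xs, ys):
    """Pearson correlation computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    xs, ys = simulate(40_000, 7, 0.0, 1.0, 1.0, 1.5, 1.0)
    print(sample_correlation(xs, ys), implied_correlation(1.0, 1.5, 1.0))
```

Because the shared term has variance $\theta^2$ for an exponential with mean $\theta$, the implied correlation is $\theta^2/\sqrt{(\sigma_{a}^2 + \theta^2)(\sigma_{b}^2 + \theta^2)}$, where $\sigma_a$ and $\sigma_b$ are the SDs of the two normal terms (about 0.39 for the values above).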

Table 2.

Comparison of powers of test statistics (mixed normal-exponential data)

n	Parameters					Powers of test statistics
n	μ1	μ2	σ1	σ2	ρ	TA	TC	T (λo)
50	0.0	1.5	6.25	4.0	0.6	0.47	0.42	0.49
	0.0	1.5	4.0	5.0	0.6	0.52	0.58	0.59
	0.0	1.5	4.0	5.0	0.3	0.39	0.38	0.40
	0.0	1.5	5.0	4.0	0.6	0.59	0.57	0.63
	0.0	1.0	4.0	4.0	0.9	0.61	0.91	0.91
	0.0	1.0	2.5	4.0	0.6	0.40	0.46	0.46
	0.0	1.0	2.5	4.0	0.3	0.31	0.31	0.32
	0.0	1.0	4.0	4.0	0.8	0.50	0.64	0.66
100	0.0	1.5	6.25	4.0	0.6	0.76	0.71	0.79
	0.0	1.5	4.0	5.0	0.6	0.81	0.86	0.86
	0.0	1.5	4.0	5.0	0.3	0.67	0.65	0.68
	0.0	1.5	5.0	4.0	0.6	0.87	0.86	0.90
	0.0	1.0	4.0	4.0	0.9	0.88	1.00	1.00
	0.0	1.0	2.5	4.0	0.6	0.68	0.74	0.74
	0.0	1.0	2.5	4.0	0.3	0.55	0.55	0.56
	0.0	1.0	4.0	4.0	0.8	0.79	0.91	0.92
200	0.0	1.5	6.25	4.0	0.6	0.97	0.95	0.97
	0.0	1.5	4.0	5.0	0.6	0.98	0.99	0.99
	0.0	1.5	4.0	5.0	0.3	0.92	0.91	0.93
	0.0	1.5	5.0	4.0	0.6	0.99	0.99	1.00
	0.0	1.0	4.0	4.0	0.9	0.99	1.00	1.00
	0.0	1.0	2.5	4.0	0.6	0.93	0.96	0.96
	0.0	1.0	2.5	4.0	0.3	0.84	0.83	0.84
	0.0	1.0	4.0	4.0	0.8	0.97	1.00	1.00

5. Conclusion

In pre- and post-treatment studies, if the data are missing completely at random, we can construct test statistics using either all available data or only the complete pairs. These two methods use only the first two moments of the data and are very easy to implement. However, neither of them is uniformly better than the other. Their relative efficiency depends on the proportion of missing data, the ratio of the standard deviations, and the correlation between the two measurements on the same individual. In this paper, we propose a data-based combination that is at least as powerful as both methods. In fact, it is equivalent to the likelihood ratio test, and hence asymptotically the most efficient test, when the data have a bivariate normal distribution. Missing data are a typical problem in pre- and post-treatment studies, and the missing pattern may be very complicated. MCAR is an over-simplified assumption. A more realistic but still mathematically tractable missing pattern is missing at random (MAR). Generalizing our method to MAR data is in progress.

References

1. Oakes D, Feng C. Combining stratified and unstratified log-rank tests in paired survival data. Stat Med. 2010-07-20.
