
Improved methods for estimating fraction of missing information in multiple imputation.

Abstract

Multiple imputation (MI) has become the most popular approach for handling missing data. Closely associated with MI, the fraction of missing information (FMI) is an important parameter for diagnosing the impact of missing data. Currently γm, the sample value of FMI estimated from MI with a limited number of imputations m, is used as the estimate of γ0, the population value of FMI. This FMI estimation method, however, has never been adequately justified or evaluated. In this paper, we quantitatively demonstrate that E(γm) decreases as m increases, so that E(γm) > γ0 for any finite m. As a result, γm inevitably overestimates γ0. Three improved FMI estimation methods are proposed. The major conclusions were substantiated by the results of MI trials using data from the 2012 Physician Workflow Mail Survey of the National Ambulatory Medical Care Survey, USA.


Keywords: Applied Mathematics; Mathematics & Statistics; Mathematics for Biology & Medicine; National Ambulatory Medical Care Survey; Science; fraction of missing information; missing data; multiple imputation; number of imputations

Year: 2018 PMID: 30899916 PMCID: PMC6423960 DOI: 10.1080/25742558.2018.1551504

Source DB: PubMed Journal: Cogent Math Stat

Introduction

Multiple imputation (MI) has become the most popular approach to accounting for missing data (Carpenter & Kenward, 2013, Dohoo, 2015, Rezvan, Lee, & Simpson, 2015, Rubin, 1987, Van Buuren, 2012). Closely associated with MI, the fraction of missing information (FMI) is an important parameter for diagnosing the effects of data missingness (Rubin, 1987). FMI can be interpreted as the fraction of information about Q that is missing due to non-response, where Q is the quantity of interest (Rubin, 1987). As MI becomes increasingly important, so does FMI. The best-known use of FMI is to define the relative efficiency (RE) of MI as RE = (1 + γ0/m)^(−1/2), where γ0 is the population value of FMI and m is the number of imputations (Rubin, 1987). Based on this RE, Rubin concluded that m ≤ 5 would be sufficient for MI (Rubin, 1987). Little et al. and Wagner suggested that FMI be used as an alternative tool for measuring missing data or the response rate (Little et al., 2016, Wagner, 2010). Siddique, Harel, Crespi, and Hedeker (2014) used FMI to verify the missing data mechanisms.

The most common practice of FMI estimation is to use γ̂0 = γm, where γ̂0 is the estimated value of γ0 and γm is the FMI obtained from MI of a given m (e.g. Khare, Little, Rubin, & Schafer, 1993, Lewis et al., 2014, Schafer, 2001, Schenker et al., 2006). However, the accuracy of this method has not been adequately evaluated. This paper aims to quantify possible biases of γ̂0 = γm and to improve the FMI estimation methodology where necessary and possible.

Established by Rubin in 1987, the current FMI paradigm is defined by Equations (1)–(11). In these equations, the subscripts m and ∞ stand for a finite and an infinite m, the subscript 0 for the population value, and the subscript i for the ith imputation, while the overbar denotes the parameter’s mean. B, U, and T are the between-imputation, within-imputation, and total variances; r is the fractional variance increase due to data missingness; and v is the degrees of freedom.

Equation (11) cannot be used to calculate γ0 in practice because B0, U0, and T0 are usually unknown. No researchers have provided an equation that explicitly links γm and γ0. The justification for using γ̂0 = γm is not available from Equations (1)–(11). Assume γ̂0 = γm. For γ̂0 = γm to be valid, E(γm) = γ0 must be true. For E(γm) = γ0 to be true, E(γm) must be independent of m. To understand E(γm), let the same MI of a given m be repeated j times. Denote the γm from each MI repeat as γm,1, γm,2, γm,3, …, γm,j. By definition, the expected value is the sum of all possible values each multiplied by the probability of its occurrence (Hogg, McKean, & Craig, 2013). Therefore, E(γm) can be defined as:

$$E(\gamma_m) = \lim_{j \to \infty} \frac{1}{j} \sum_{k=1}^{j} \gamma_{m,k} \qquad (12)$$

Equation (12) shows that E(γm) can be understood as the ultimate mean of γm when j becomes infinite. If E(γm) is independent of m, we should have E(γ2) = E(γ3) = … = E(γm) = γ0, and the use of γ̂0 = γm would be justified. If E(γm) depends on m, we should have E(γ2) ≠ E(γ3) ≠ … ≠ E(γm) ≠ γ0. The use of γ̂0 = γm may not be justified if the difference between E(γm) and γ0 is intolerably large. Rubin indicated that the mean of γm can be regarded as γ0 (Rubin, 1987, p. 143), implying an assumption that E(γm) is independent of m. Harel briefly mentioned that γm “tends to decrease as m increases” without providing any details (Harel, 2007). Although Harel’s statement favours E(γm) ≠ γ0, it cannot be a basis for rejecting γ̂0 = γm, because using γ̂0 = γm might still be acceptable if the decrease of γm with the increase of m is statistically negligible. To date, the justification for using γ̂0 = γm is still missing.

For FMI estimation, Harel’s 2007 paper (Harel, 2007) is important in that it pointed out that, unlike γm, which “tends to decrease as m increases,” the quantity Bm/(Um + Bm) “does not tend to decrease as m increases” (Harel, 2007). Harel used γ̂0 = Bm/(Um + Bm) to estimate FMI in his research (Harel, 2007). The goal of Harel’s paper was not to find a better FMI estimation method per se, and his discussion of this estimator was brief. Most researchers have not used Harel’s method for FMI estimation, probably because they have treated it as research-specific rather than as a method that could be used universally for FMI estimation.

In this study, we examined the relationships between m, γm, γ∞, and γ0, quantified the decrease of γm with the increase of m, quantified the biases of γ̂0 = γm, and proposed improved methods for FMI estimation. Only the univariate FMI definition is examined in this paper, even though a multivariate FMI definition may exist. The major conclusions were substantiated by the MI trials using the data of the 2012 Physician Workflow Mail Survey (PWS) of the National Ambulatory Medical Care Survey (NAMCS). This paper focuses on the MI approach only, even though it may be possible to estimate FMI via a non-MI approach (Savalei & Rhemtulla, 2012, Zheng & Lo, 2008).
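The quantities named in this section (B, U, T, r, v, and the sample FMI) can be computed from the m completed-data results. The displayed equations are not reproduced in this copy, so the sketch below assumes the standard Rubin (1987) combining rules; the function name `rubin_fmi` and its inputs are illustrative and not taken from the paper.

```python
from statistics import mean, variance


def rubin_fmi(q_hats, u_hats):
    """Combine m completed-data estimates and their variances.

    q_hats: point estimates of Q, one per imputation.
    u_hats: within-imputation variances, one per imputation.
    Assumes the standard Rubin (1987) combining rules.
    """
    m = len(q_hats)
    if m < 2 or len(u_hats) != m:
        raise ValueError("need m >= 2 estimates with matching variances")
    q_bar = mean(q_hats)                  # pooled point estimate
    b = variance(q_hats)                  # between-imputation variance (divisor m - 1)
    u = mean(u_hats)                      # within-imputation variance
    if b == 0:
        raise ValueError("between-imputation variance is zero; r and v are undefined")
    t = u + (1 + 1 / m) * b               # total variance
    r = (1 + 1 / m) * b / u               # fractional variance increase
    v = (m - 1) * (1 + 1 / r) ** 2        # degrees of freedom
    gamma_m = (r + 2 / (v + 3)) / (r + 1)  # sample FMI for this m
    return {"q_bar": q_bar, "B": b, "U": u, "T": t, "r": r, "v": v, "gamma": gamma_m}


# Illustrative numbers only (not survey data): three imputations.
result = rubin_fmi([1.0, 2.0, 3.0], [4.0, 4.0, 4.0])
print(result)
```

With these made-up inputs B/U = 0.25, which corresponds to B/(U + B) = 0.2, while the sample FMI at m = 3 comes out noticeably larger.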

The relationships between m, γm, γ∞, and γ0

The condition for γ∞ = γ0

To use γm for estimating γ0, one must assume γ∞ = γ0. Most researchers simply treat γ∞ and γ0 as synonyms (e.g. He et al., 2016), but they are not. Imagine that a population of imputations (POI) is generated by repeating the imputation of the same model on the same data an infinite number of times. An MI is simply a sample of the POI with sample size m. The sample value and the population value of FMI for the POI are γm and γ∞, respectively. The population for γ0, however, is not the POI but the population of the sampling units of the survey. A γ∞ is inseparably linked to an MI, but γ0 can be independent of MI: γ0 may be estimated by MI as well as by other methods such as maximum likelihood (Savalei & Rhemtulla, 2012, Zheng & Lo, 2008). Using Equations (5)–(7), one can prove that the condition for γ∞ = γ0 is B∞/U∞ = B0/U0. When we use MI to estimate γ0, we have to assume γ∞ = γ0, which is probably why γ∞ and γ0 are often treated as synonyms in MI analyses. In this paper, we assume γ∞ = γ0 because we use MI to estimate γ0.

Bm/Um is independent of m

Equation (7), which defines γm, does not have m as an explicit factor. Combining Equations (5), (6), and (7) gives an expanded definition of γm with m as one of the independent factors affecting γm:

$$\gamma_m = \frac{r_m + 2/(v_m + 3)}{r_m + 1}, \qquad r_m = \left(1 + \frac{1}{m}\right)\frac{B_m}{U_m}, \qquad v_m = (m - 1)\left(1 + \frac{1}{r_m}\right)^{2} \qquad (13)$$

Equation (13) shows that γm is a function of three factors, i.e. γm = F(m, Bm, Um). In Equation (13), Bm and Um always appear together as Bm/Um. Letting cm = Bm/Um, γm becomes a function of two factors, i.e. γm = F(m, cm). Whether cm is independent of m is important in understanding the m–γm relationship. If cm depends on m, the direct effects of m on γm would be confounded by the indirect effects of m on γm via m’s effects on cm, which in turn could be due to m’s effect on Bm, Um, or both. If cm is independent of m, the m–γm relationship is greatly simplified.

To establish that cm is independent of m, we need to show that E(Bm/Um) = B0/U0. Equation (2) indicates that the relationship between m and Bm is that between the sample size n and the sample variance (s²), so that E(Bm) is independent of m, i.e. E(Bm) = B0 (Serfling, 1980). Equation (3) indicates that the relationship between m and Um is that between the sample size n and the sample mean, so that E(Um) is independent of m, i.e. E(Um) = U0 (Hogg et al., 2013). Jensen’s inequality (Hogg et al., 2013) determines that E(1/Um) ≥ 1/E(Um). Therefore, treating Bm and 1/Um as uncorrelated, E(Bm/Um) = E(Bm)E(1/Um) ≥ E(Bm)/E(Um) = B0/U0, or E(Bm/Um) ≥ B0/U0. Our simulation studies show that the maximum difference between E(Bm)/E(Um) and E(Bm/Um) is less than 0.1%, which is negligible in virtually any statistical work. We can therefore safely regard E(Bm/Um) = B0/U0 as a fact for the purpose of studying the m–γm relationship. The independence of cm from m is thus established, and the subscript m can be removed from cm. As a result, we can indeed let cm = Bm/Um be a constant c in Equation (13) and make γm a function of the single factor m, i.e. γm = F(m).

The γm = F(m,γ0) equation

When m goes to infinity, γm becomes γ0. Our goal is to establish the mathematical relationship between γm and γ0 at a finite m, which is currently missing from the published literature. In the discussion above, we showed that it is mathematically legitimate to let cm be a constant c in studying the m–γm relationship because cm is independent of m. What is the best value to choose for c to obtain the most truthful m–γm relationship? The answer is c = E(Bm/Um) = B0/U0. If and only if c = E(Bm/Um) = B0/U0 does the m–γm relationship determined by Equation (13) reflect the true m–γm relationship. From Equations (10) and (11) we can obtain B0/U0 = γ0/(1 − γ0). Replacing Bm/Um in Equation (13) with γ0/(1 − γ0), we obtain an equation that directly links γm to γ0:

$$\gamma_m = \frac{r + 2/(v + 3)}{r + 1}, \qquad r = \left(1 + \frac{1}{m}\right)\frac{\gamma_0}{1 - \gamma_0}, \qquad v = (m - 1)\left(1 + \frac{1}{r}\right)^{2} \qquad (14)$$

The establishment of Equation (14) is a significant step forward in understanding the relationship between m, γm, and γ0 because it links the three factors in the same equation for any m, finite or infinite. For a given analysis of a given dataset, γ0 is a constant. When we repeat the same MI of a given m for j times, the γm value from each repeat of the MI does not change when γm is determined by Equation (14). In other words, we have γm,1 = γm,2 = γm,3 = … = γm,j = E(γm) (see Equation (12)), so the γm value obtained from Equation (14) is E(γm). For different data and analyses, γ0 is a variable. Equation (14) shows that E(γm) is a function of two factors, m and γ0, i.e. E(γm) = F(m, γ0).
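As a minimal sketch of the mapping E(γm) = F(m, γ0) described by Equation (14), the function below substitutes γ0/(1 − γ0) for B/U in the sample FMI formula. It assumes the standard Rubin (1987) forms for r and v; the name `expected_fmi` is illustrative.

```python
def expected_fmi(m, gamma0):
    """E(gamma_m) for m imputations and population FMI gamma0.

    Assumes r = (1 + 1/m) * B0/U0 with B0/U0 = gamma0 / (1 - gamma0),
    v = (m - 1) * (1 + 1/r)**2, and gamma = (r + 2/(v + 3)) / (r + 1).
    """
    if m < 2:
        raise ValueError("m must be at least 2")
    if not 0 < gamma0 < 1:
        raise ValueError("gamma0 must lie strictly between 0 and 1")
    c = gamma0 / (1 - gamma0)          # B0/U0 implied by gamma0
    r = (1 + 1 / m) * c                # fractional variance increase at this m
    v = (m - 1) * (1 + 1 / r) ** 2     # degrees of freedom at this m
    return (r + 2 / (v + 3)) / (r + 1)


for m in (2, 5, 10, 100):
    print(m, f"{expected_fmi(m, 0.2):.3f}", f"{expected_fmi(m, 0.01):.5f}")
```

Under these assumptions the printed values can be compared directly with the E(γm) columns of Table 1.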

The decrease of E(γm) with the increase of m

E(γm) > γ0 for any finite m

It is well known that E(x̄), the expected value of the sample mean, is independent of the sample size n and equals μ, which provides the theoretical basis for using x̄ as the estimate of μ (Hogg et al., 2013). The use of γ̂0 = γm implies the analogous assumption that E(γm) = γ0. Using Equation (14), the m–E(γm) relationship curve can be constructed for any given γ0. Figure 1 presents the m–E(γm) relationship curves for γ0 = 0.15 and 0.2. Based on Figure 1, for the first time in MI research, we can explicitly state this important fact: E(γm) decreases with the increase of m. This decrease can be interpreted as follows: for a given dataset with a given MI model, the ultimate mean of γm, which is the mean of an infinite number of individual γm values obtained from repeating the MI of the given m an infinite number of times, is always greater than γ0. What is called “the ultimate mean” here is E(γm) (Hogg et al., 2013). Because E(γm) decreases with the increase of m and reaches γ0 only when m goes to infinity, E(γm) > γ0 for any finite m (Figure 1). The fact that E(γm) > γ0 is further illustrated in Table 1 for a wider range of m values and more γ0 values.
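The shape of the m–E(γm) curve can be checked numerically. The sketch below tabulates the curve for γ0 = 0.15 and 0.2 and verifies that it decreases with m while staying above γ0. It assumes the standard Rubin (1987) forms for r and v; the helper names are illustrative.

```python
def expected_fmi(m, gamma0):
    """E(gamma_m) under the assumed Rubin (1987) forms for r and v."""
    c = gamma0 / (1 - gamma0)          # B0/U0 implied by gamma0
    r = (1 + 1 / m) * c
    v = (m - 1) * (1 + 1 / r) ** 2
    return (r + 2 / (v + 3)) / (r + 1)


def fmi_curve(gamma0, m_values):
    """Return (m, E(gamma_m)) pairs for the requested m values."""
    return [(m, expected_fmi(m, gamma0)) for m in m_values]


for gamma0 in (0.15, 0.2):
    curve = fmi_curve(gamma0, range(2, 201))
    values = [g for _, g in curve]
    strictly_decreasing = all(a > b for a, b in zip(values, values[1:]))
    always_above = all(g > gamma0 for g in values)
    print(f"gamma0={gamma0}: decreasing={strictly_decreasing}, "
          f"above gamma0={always_above}, "
          f"E(gamma_2)={values[0]:.4f}, E(gamma_200)={values[-1]:.4f}")
```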

Figure 1.

The m–E(γm) relationship curve at γ0 = 0.2 and 0.15 as determined by Equation (14).

Table 1.

Changes of E(γm), Dγ, and Rγ with the increase of m at different γ0 levels, where Dγ = 100(E(γm) − γ0)/γ0 and Rγ = 100(γm − γm+1)/γm+1

m	E(γ_m)		R_γ (%)		D_γ (%)
	γ₀ = 0.2	γ₀ = 0.01	γ₀ = 0.2	γ₀ = 0.01	γ₀ = 0.2	γ₀ = 0.01
2	0.361	0.01536	23.33	14.12	80.59	53.64
5	0.250	0.01205	3.872	2.957	25.23	20.47
10	0.224	0.01102	1.0240	0.8502	11.84	10.16
20	0.212	0.01051	0.2664	0.2306	5.750	5.062
40	0.206	0.01025	0.0682	0.0602	2.837	2.528
60	0.204	0.01017	0.0306	0.0272	1.883	1.684
100	0.202	0.01010	0.0111	0.0099	1.126	1.010
200	0.201	0.01005	0.0028	0.0025	0.561	0.505

The bias of the current FMI estimation method

The fact that E(γm) > γ0 dictates that the current FMI estimation method must be biased. One achievement of this paper is that we quantified this bias. We use Dγ, the percentage difference between E(γm) and γ0, as the parameter to measure the bias, i.e.:

$$D_\gamma = 100\,\frac{E(\gamma_m) - \gamma_0}{\gamma_0} \qquad (15)$$

Table 1 presents the Dγ values at different γ0 and m values as determined by Equation (14). At a given m, Dγ differs at different γ0 values (Table 1). For m = 2, Dγ is 80.59% and 53.64% for γ0 = 0.2 and 0.01, respectively (Table 1). When γ0 increases from 0.001 to 0.6, Dγ first increases, reaches a peak, and then decreases (Table 1 and Figure 2). The value of γ0 at which Dγ reaches the peak differs with m (data not shown). For m = 5, the maximum Dγ value of 25.31% occurs at γ0 = 0.23, and the minimum Dγ value of 16.53% occurs at γ0 = 0.6 (Figure 2(a)). In other words, one could overestimate FMI by about 25% at m = 5 if the current method is used. A bias of this magnitude cannot and should not be ignored. Development of a better FMI estimation method is indeed necessary.
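The bias measure Dγ can be evaluated directly from the m and γ0 values. The sketch below scans γ0 on a 0.01 grid at m = 5 to locate the peak bias; it assumes the standard Rubin (1987) forms for r and v, and the grid spacing and function names are illustrative choices.

```python
def expected_fmi(m, gamma0):
    """E(gamma_m) under the assumed Rubin (1987) forms for r and v."""
    c = gamma0 / (1 - gamma0)
    r = (1 + 1 / m) * c
    v = (m - 1) * (1 + 1 / r) ** 2
    return (r + 2 / (v + 3)) / (r + 1)


def bias_pct(m, gamma0):
    """D_gamma: percentage by which E(gamma_m) exceeds gamma0."""
    return 100 * (expected_fmi(m, gamma0) - gamma0) / gamma0


# Scan gamma0 = 0.01, 0.02, ..., 0.60 at m = 5 (grid spacing is an assumption).
grid = [i / 100 for i in range(1, 61)]
peak_gamma0 = max(grid, key=lambda g: bias_pct(5, g))
print(f"peak bias at m=5: {bias_pct(5, peak_gamma0):.2f}% at gamma0={peak_gamma0}")
print(f"bias at m=5, gamma0=0.6: {bias_pct(5, 0.6):.2f}%")
print(f"bias at m=2, gamma0=0.2: {bias_pct(2, 0.2):.2f}%")
```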

Figure 2.

Effects of γ0 levels on Dγ as defined by Equation (15) and Rγ as defined by Equation (16): a. Dγ at m = 5; b. Rγ at m = 2.

The γm decrease rate: smaller at larger m

We use Rγ, the percentage rate of the γm decrease per unit m, to measure the rate of the γm decrease:

$$R_\gamma = 100\,\frac{\gamma_m - \gamma_{m+1}}{\gamma_{m+1}} \qquad (16)$$

Rγ is affected by both m and γ0 (Table 1 and Figure 2(b)). At m = 5, Rγ is 3.87% and 2.96% for γ0 = 0.2 and 0.01, respectively (Table 1). Figure 2(b) shows that Rγ increases initially, reaches a peak, and then decreases as γ0 increases from 0.001 to 0.6. For m = 2, the maximum Rγ = 23.91% occurs at γ0 = 0.15 (Figure 2(b)). For m = 5, the maximum Rγ = 3.88% occurs at γ0 = 0.21 (data not shown). The gradual reduction of Rγ with increasing m makes it possible to choose a sufficient m, at which the m-driven γm reduction becomes negligibly small.
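The decrease rate can be sketched as the relative drop in E(γm) when one more imputation is added. The form used below, 100(γm − γm+1)/γm+1, is a reconstruction that reproduces the rate columns of Table 1; it also assumes the standard Rubin (1987) forms for r and v, and the function names are illustrative.

```python
def expected_fmi(m, gamma0):
    """E(gamma_m) under the assumed Rubin (1987) forms for r and v."""
    c = gamma0 / (1 - gamma0)
    r = (1 + 1 / m) * c
    v = (m - 1) * (1 + 1 / r) ** 2
    return (r + 2 / (v + 3)) / (r + 1)


def decrease_rate_pct(m, gamma0):
    """Percentage drop in E(gamma) from m to m + 1 imputations,
    relative to the value at m + 1 (assumed form of the rate)."""
    g_m = expected_fmi(m, gamma0)
    g_next = expected_fmi(m + 1, gamma0)
    return 100 * (g_m - g_next) / g_next


for m in (2, 5, 10, 20, 100):
    print(m, f"{decrease_rate_pct(m, 0.2):.4f}", f"{decrease_rate_pct(m, 0.01):.4f}")
```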

Improved methods for γ0 estimation

Regarding γ̂0 = γm as the control, any method that gives a more accurate FMI estimate than this control is considered an improved method. Three improved methods are proposed below.

Improved method 1: γ̂0 = γm≥100

The control method is to use γ̂0 = γm regardless of the size of m. The first improved method is to choose a sufficiently large m when using γ̂0 = γm. Data in Figure 1 show that E(γm) approaches γ0 as m gets larger. Therefore, γm estimates γ0 with adequate accuracy when m is sufficiently large. Various criteria have been used to determine the sufficient m (Bodner, 2008, Graham, Olchowski, & Gilreath, 2007, Hershberger & Fisher, 2003, Pan, Wei, Shimizu, & Jamoom, 2014, Royston, 2004, Rubin, 1987). An adequately accurate estimation of γ0 using γ̂0 = γm offers another criterion for determining a sufficient m. As measured by Rγ, the gain in bias reduction from increasing m by one unit becomes smaller at a greater m. Using Equation (14), we can show that the bias of the default method as measured by Dγ is about 1% or less for any reasonable γ0 value when m is greater than 100. We arbitrarily choose a bias of about 1% as an acceptable level and recommend m ≥ 100 as being sufficient for an adequately accurate estimation of γ0 using γ̂0 = γm. This method can be expressed as γ̂0 = γm≥100.
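The choice of a sufficient m for a given bias tolerance can be sketched as a simple search. The code below returns the smallest m at which the bias Dγ drops to a chosen tolerance; it assumes the standard Rubin (1987) forms for r and v, and the function name `min_m_for_bias` and the default tolerance are illustrative.

```python
def expected_fmi(m, gamma0):
    """E(gamma_m) under the assumed Rubin (1987) forms for r and v."""
    c = gamma0 / (1 - gamma0)
    r = (1 + 1 / m) * c
    v = (m - 1) * (1 + 1 / r) ** 2
    return (r + 2 / (v + 3)) / (r + 1)


def bias_pct(m, gamma0):
    """D_gamma: percentage by which E(gamma_m) exceeds gamma0."""
    return 100 * (expected_fmi(m, gamma0) - gamma0) / gamma0


def min_m_for_bias(gamma0, tol_pct=1.0, m_max=100_000):
    """Smallest m in [2, m_max] with bias <= tol_pct, or None if not reached."""
    for m in range(2, m_max + 1):
        if bias_pct(m, gamma0) <= tol_pct:
            return m
    return None


for gamma0 in (0.01, 0.2):
    print(gamma0, f"bias at m=100: {bias_pct(100, gamma0):.3f}%",
          "smallest m for 1% bias:", min_m_for_bias(gamma0, 1.0))
```

A strict 1% cut-off lands slightly above m = 100 for these γ0 values, which is consistent with the "about 1%" wording used for the m ≥ 100 recommendation.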

Improved method 2: γ̂0 = γm(m/(1 + m))

Calculating γm for different m and γ0 combinations using Equation (14), one finds that the following approximation holds well for m ≥ 10:

$$\gamma_m \approx \left(1 + \frac{1}{m}\right)\gamma_0 \qquad (17)$$

From Equation (17), we obtain the following method of estimating γ0 from γm:

$$\hat{\gamma}_0 = \gamma_m\,\frac{m}{1 + m} \qquad (18)$$

For those who may be interested, this method may be derived by solving Equation (14) for γ0 using a Taylor series approximation. An advantage of this method is that it yields a more accurate FMI estimate from the m and γm information available in an earlier publication that used a small m and γm to estimate FMI.
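The rescaling γ̂0 = γm(m/(1 + m)) needs only a reported γm and its m. The sketch below applies it to two control values from Table 3; the function name `rescale_published_fmi` is illustrative.

```python
def rescale_published_fmi(gamma_m, m):
    """Improved method 2: shrink a reported gamma_m by the factor m / (m + 1)."""
    if m < 1:
        raise ValueError("m must be a positive number of imputations")
    return gamma_m * m / (m + 1)


# Control values gamma_m taken from Table 3 (SIZE20, MI-2, Anal-2).
for m, gamma_m in ((3, 0.00379), (5, 0.00318)):
    print(m, gamma_m, f"{rescale_published_fmi(gamma_m, m):.5f}")
```

The rescaled values can be compared with the last column of Table 3.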

Improved method 3: γ̂0 = cm/(1 + cm), where cm = Bm/Um

In Section 2.2, we showed that E(Bm/Um) = B0/U0. In other words, Bm/Um is an unbiased estimator of B0/U0. As a result, Equation (19) below is a better γ0 estimator than γ̂0 = γm:

$$\hat{\gamma}_0 = \frac{c_m}{1 + c_m} \qquad (19)$$

where cm = Bm/Um. Harel used this method to estimate γ0 for his study on two-stage MI (Harel, 2007). However, the justification for this method discussed here was not available in Harel’s paper or any other published literature (Harel, 2007).
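To illustrate how the ratio-based estimate differs from the sample FMI computed from the same Bm and Um, the sketch below evaluates both on made-up numbers. The sample FMI assumes the standard Rubin (1987) forms for r and v; the function names and input values are illustrative, not survey results.

```python
def sample_fmi(b_m, u_m, m):
    """Control estimate gamma_m from B_m, U_m, and m (assumed Rubin forms)."""
    r = (1 + 1 / m) * b_m / u_m
    v = (m - 1) * (1 + 1 / r) ** 2
    return (r + 2 / (v + 3)) / (r + 1)


def ratio_fmi(b_m, u_m):
    """Improved method 3: c_m / (1 + c_m) with c_m = B_m / U_m."""
    c_m = b_m / u_m
    return c_m / (1 + c_m)


# Made-up variances: B_m = 1, U_m = 4, so B/(U + B) = 0.2.
b_m, u_m = 1.0, 4.0
for m in (3, 5, 20, 100):
    print(m, f"control={sample_fmi(b_m, u_m, m):.4f}", f"ratio={ratio_fmi(b_m, u_m):.4f}")
```

For fixed Bm and Um the ratio estimate does not depend on m, whereas the control estimate shrinks toward it as m grows.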

Results from MI trials of PWS12

Methods

PWS was a supplemental survey of NAMCS, which collects data about the provision and use of ambulatory medical care services in the United States (Lau, McCaig, & Hing, 2016). The 2012 PWS data (PWS12), which had 2,567 responding physicians in the sample, were used for the MI trials. PWS data can be accessed via the NCHS Research Data Center (RDC) program (https://www.cdc.gov/rdc/index.htm). MI was conducted on three variables representing the physician’s practice size at different scales, namely SIZE100, SIZE20, and SIZE5. The three variables had the same missing data percentage of 29% due to item non-response.

The hot-deck imputation method (Andridge & Little, 2010) was used. The RDC-released PWS12 data, which had 3.6% missing values for SIZE after some of the missing values in PWS12 were replaced by the corresponding non-missing values for the same physician from the 2011 PWS data, were used as the hot-deck donor. Two MI models, denoted MI-1 and MI-2, were used. MI-1 did not use any covariate in the imputation, and the non-missing replacement value for each missing value was randomly chosen from the entire donor dataset. MI-2 used PRIMEMM as the covariate in the imputation, and the non-missing replacement value for each missing value was randomly chosen from the cell with the same PRIMEMM value in the donor dataset. PRIMEMM was the physician’s primary employment type, coded into nine categories for this research. The MIs had m = 3, 5, 10, 20, 30, 40, 60, 80, and 100, with each MI repeated 30 times. Excluding m, there were 12 treatment combinations (3 imputed variables × 2 imputation models × 2 analytic models).

The hot-deck imputation method used in this study was similar to that used by the survey for creating the RDC-released PWS12 data. According to Rubin (1987, equation 4.3.8), the hot-deck bias can be expressed as E(B) = B(n1/n), where n is the number of units in the full sample and n1 is the number of units with observed values. Since the n1/n ratio is independent of m, the percentage fraction of the hot-deck bias is a fixed value as long as the n1/n ratio is fixed. Therefore, the m–γm relationship obtained from the hot-deck-based MI trials should still be valid. One should nevertheless be aware of the potential hot-deck bias when interpreting the results of this study.

The quantities of interest (Q) were the means of SIZE100, SIZE20, and SIZE5. Two analytic models, denoted Anal-1 and Anal-2, were used. In Anal-1, Ui, the within-imputation variance of the ith complete dataset generated by the MI, was the total variance of SIZE100, SIZE20, or SIZE5 in the ith dataset. In Anal-2, Ui was the variance of the ith dataset after the variance due to the effect of PRIMEMM was removed. Analyses were based on unweighted data. Results obtained in this study were for research purposes only.

Barnard and Rubin (1999) suggested that, for making statistical inferences in MI-involved analyses, the adjusted degrees of freedom (DFa) proposed in their paper should be used instead of the degrees of freedom (v) defined by Equation (6) where the complete-data degrees of freedom is not sufficiently large. However, in the γm definition, i.e. Equation (7), v does not function as the degrees of freedom per se but merely as a mathematical value in the estimation of γm. We have found that replacing v in Equation (7) with DFa results in an erroneous estimation of γm. Therefore, we used v instead of DFa when using Equation (7) for the γm estimation in this study.
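The two imputation models can be sketched as random donor draws, either from the whole donor pool (as in MI-1) or from the donor cell that shares the covariate value (as in MI-2). This is a simplified illustration rather than the survey's production procedure: the function names, the use of `None` for missing values, and the fallback to the whole pool when a cell has no donors are all assumptions.

```python
import random


def hot_deck_impute(values, donors, cells=None, donor_cells=None, rng=None):
    """Fill None entries in `values` with randomly drawn donor values.

    Without cells: draw from all non-missing donors (MI-1 style).
    With cells: draw from donors in the same cell (MI-2 style).
    """
    rng = rng or random.Random()
    pool_all = [d for d in donors if d is not None]
    pools = {}
    if cells is not None:
        for d, c in zip(donors, donor_cells):
            if d is not None:
                pools.setdefault(c, []).append(d)
    completed = []
    for i, x in enumerate(values):
        if x is not None:
            completed.append(x)                      # observed value is kept
        elif cells is None:
            completed.append(rng.choice(pool_all))   # whole-pool draw
        else:
            # Assumption: fall back to the whole pool if the cell has no donors.
            completed.append(rng.choice(pools.get(cells[i]) or pool_all))
    return completed


def multiply_impute(values, donors, m, seed=0, **kwargs):
    """Generate m completed datasets with one shared, seeded random stream."""
    rng = random.Random(seed)
    return [hot_deck_impute(values, donors, rng=rng, **kwargs) for _ in range(m)]


# Toy data (not survey data): two missing values, two covariate cells.
values = [1, None, 3, None]
donors = [10, 20, 30, 40]
cells = ["a", "a", "b", "b"]
for dataset in multiply_impute(values, donors, m=3, seed=1,
                               cells=cells, donor_cells=cells):
    print(dataset)
```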

The γm decrease with the increase of m in the MI trials

Would the γm decrease due to the increase of m (see Section 2.1 and 3.1) be big enough to stand out from sampling errors and other noise in real-world MI analyses? The answer is yes, as demonstrated by the data in Figure 3, which shows the effects of m on γm in SIZE100, SIZE20, and SIZE5 for the two MI models under Anal-2. In spite of the γm variations due to sampling errors, shown by the error bars in the graphs, the dominant trend was clear: γm decreased significantly as m increased from 3 to 100. The γm values at m = 3–40 were significantly greater than γ100 in most cases (Figure 3). These results suggest that the γm decrease with the increase of m is not negligible in FMI estimation in real-world data analyses.

Figure 3.

Effects of m on γm at δ = 29% for analytic model = Anal-2: a. MI model = MI-1; b. MI model = MI-2.

Variation of γm, Bm, and Um

In establishing the MI framework, Rubin (1987) assumed that U ≈ U0, which is more likely to be true if the variance of Um is negligible. The authors did not find any information on the magnitude of the Um variance in the published literature. A detailed study on the Bm variance was reported by Pan et al. (2014), who found that the variance of Bm was substantial when m < 30. The variations in Bm and Um inevitably lead to γm variation. As a result, when using γ̂0 = γm at an insufficient m, the inaccuracy of γ̂0 comes not only from E(γm) > γ0 but also from the variation of γm. The possible bias from sampling-error-driven γm variation has not been given adequate attention. The coefficients of variation (CV) of Bm, Um, and γm are presented in Table 2. Both the imputation models and the analytic models affected the variations of γm, Bm, and Um (Table 2). The CV of Um was much smaller, usually 1–10% of that of Bm. The CVs of Bm and γm were very similar, with the CV of γm typically equal to or slightly smaller than that of Bm. In general, the greater the m, the smaller the variations of γm, Bm, and Um (Table 2). These results are in agreement with Harel’s conclusion (Harel, 2007) that it is necessary to choose a sufficient m for MI to control the variations of γm. Due to the significant effects of the MI model and the analytic model on the variations of γm, Bm, and Um (Table 2), it may not be possible to propose a single m that fits all situations for controlling the variance of γm, Bm, and Um.
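The coefficient of variation across repeated MI runs can be sketched as follows. The text does not state whether the sample or the population standard deviation was used, so the sample standard deviation (divisor n − 1) is an assumption here, and the replicate values are made up for illustration.

```python
from statistics import mean, stdev


def cv_percent(replicates):
    """Coefficient of variation in percent: 100 * SD / mean.

    Assumption: SD is the sample standard deviation (divisor n - 1).
    """
    if len(replicates) < 2:
        raise ValueError("need at least two replicates")
    return 100 * stdev(replicates) / mean(replicates)


# Made-up replicate values of B_m, U_m, and gamma_m from repeated MI runs.
b_reps = [0.9, 1.1, 1.0, 1.2, 0.8]
u_reps = [4.00, 4.01, 3.99, 4.00, 4.02]
g_reps = [0.26, 0.31, 0.29, 0.33, 0.24]
for name, reps in (("B_m", b_reps), ("U_m", u_reps), ("gamma_m", g_reps)):
    print(name, f"{cv_percent(reps):.2f}%")
```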

Table 2.

Coefficients of variation (%) of Bm, Um, and γm for SIZE100

m	MI-1, Anal-1			MI-2, Anal-2
	B_m	U_m	γ_m	B_m	U_m	γ_m
3	20.24	0.185	20.19	1.52	0.0553	1.50
5	11.12	0.179	11.10	1.00	0.0200	1.01
10	7.75	0.135	7.75	0.88	0.0353	0.91
20	5.91	0.098	5.89	0.49	0.0282	0.48
30	6.24	0.077	6.26	0.24	0.0150	0.24
40	3.60	0.048	3.59	0.61	0.0180	0.59
60	2.74	0.058	2.73	0.39	0.0091	0.39
80	2.48	0.041	2.47	0.17	0.0062	0.17
100	2.45	0.037	2.45	0.15	0.0081	0.16

An advantage of using γ̂0 = γm≥100 is that this method not only reduces the E(γm) > γ0 bias but also reduces the γm variation because of the large m. The other two improved methods can effectively reduce or even eliminate the E(γm) > γ0 bias even when m is small. However, the inaccuracy due to γm variation may be a concern for any FMI estimation method unless a sufficient m is chosen. Data in Figure 3 suggest that m ≥ 20 may be necessary to reduce the γm variation to an acceptable level for using γ̂0 = γm(m/(1 + m)) and γ̂0 = cm/(1 + cm).

Comparison of different FMI estimation methods

Table 3 presents data for comparing the performance of the three improved γ0 estimation methods described in Section 4 with that of the default method in an example of real-world data analysis. The treatment combination of the MI trials was {SIZE20, MI-2, Anal-2}. The control values were the γm values at m = 3, 5, etc., which would be the FMI estimates if the default method were used. The γ100 value was used as the γ̂0 for the improved method γ̂0 = γm≥100. The best γ̂0 was calculated by Equation (19) using the ratio of B̄100 to Ū100 as the estimate of B0/U0, where B̄100 and Ū100 were the means of the 30 replicates of B100 and U100.

Table 3.

Comparison of different γ0 estimation methods for SIZE20 with imputation model = MI-2 and analytic model = Anal-2 in the PWS12 MI trials. The best γ̂0 was calculated by Equation (19) using the ratio of B̄100 to Ū100 as the estimate of B0/U0, where B̄100 and Ū100 were the means of the 30 replicates of B100 and U100, respectively

m	SIZE20, MI-2, Anal-2
	Control	Improved
	γ̂0 = γm	γ̂0 = γm≥100	γ̂0 = cm/(1 + cm)	γ̂0 = γm(m/(1 + m))
3	0.00379		0.00283	0.00284
5	0.00318		0.00265	0.00265
10	0.00286		0.00260	0.00260
20	0.00293		0.00279	0.00279
30	0.00278		0.00269	0.00269
40	0.00273		0.00267	0.00267
60	0.00277		0.00273	0.00273
80	0.00272		0.00269	0.00269
100	0.00270	0.00270	0.00267	0.00267
(∞)	Best γ̂0: 0.00267

For m ≤ 80, all three improved methods performed better than the control method (Table 3). These results suggest that the three improved methods proposed in this paper can be used to replace the control method in real-world data analyses. In general, we recommend using γ̂0 = cm/(1 + cm), for it essentially eliminates the E(γm) > γ0 bias at all levels of m. The two other methods may nevertheless come in handy under certain circumstances. For example, if an earlier publication used m = 5 without providing the Bm and Um values, one can simply use γ̂0 = γm(m/(1 + m)) to convert the biased γ0 estimate of that paper into a more accurate one.

Conclusions

In most published research, γ∞ and γ0 are treated as synonyms. However, the two are different: γ0 is independent of MI, whereas γ∞ is a parameter of MI. γ∞ equals γ0 only if B∞/U∞ = B0/U0. To use MI for FMI estimation, one has to assume γ∞ = γ0, which is also the assumption made here.

γm decreases with the increase of m. We quantified the m–γm relationship. The magnitude and the rate of the γm decrease vary with m and γ0. At m = 2, γ2 is greater than γ0 by 50–81% depending on the γ0 level. At m = 5, the m value recommended as sufficient by some (e.g. Rubin, 1987), γ5 is greater than γ0 by about 17–25% when γ0 ranges from 0.001 to 0.6. The decrease of γm with the increase of m determines that E(γm) > γ0 for any finite m. The results from the MI trials suggest that the magnitude of the γm decrease with increased m is not negligible in real-world data analyses, in spite of the noise from sampling errors and other sources.

E(Bm) and E(Um) are independent of m. Therefore, the decrease of γm with the increase of m is not due to an indirect effect of m on E(Bm) and E(Um). As a result, it is not necessary to use the Bm and Um from the same MI for the best FMI estimation. Instead, one should use the best estimates of B0 and U0 available, which leads to the development of Equation (14), linking γm to γ0 directly.

The variation in γm can be substantial. The CV of γm was essentially identical to that of Bm, and the CV of Um was 1–10% of that of γm or Bm. The variation of γm becomes smaller as m gets larger. The inaccuracy of FMI estimation due to γm variation should be a concern when m is small, regardless of which method is used.

The current method, γ̂0 = γm, may result in a substantial FMI overestimation when m is not sufficiently large. Three improved methods are proposed for estimating γ0 from MI of a finite m: (1) γ̂0 = γm≥100, (2) γ̂0 = γm(m/(1 + m)), and (3) γ̂0 = cm/(1 + cm), where cm = Bm/Um. In our MI trials, all three improved methods gave more accurate γ0 estimates than γ̂0 = γm where m ≤ 80. When m is sufficiently large, say, m ≥ 100, all three methods should give a statistically sound estimate of γ0. When m is not sufficiently large, say, m < 100, the third method should be one’s best option for γ0 estimation. The second method has its value where Bm and Um are not available and the only values available for γ0 estimation are m and γm.
