Margarita Moreno-Betancur, Katherine J Lee, Finbarr P Leacy, Ian R White, Julie A Simpson, John B Carlin.
Abstract
With incomplete data, the "missing at random" (MAR) assumption is widely understood to enable unbiased estimation with appropriate methods. While the need to assess the plausibility of MAR and to perform sensitivity analyses considering "missing not at random" (MNAR) scenarios has been emphasized, the practical difficulty of these tasks is rarely acknowledged. With multivariable missingness, what MAR means is difficult to grasp, and in many MNAR scenarios unbiased estimation is possible using methods commonly associated with MAR. Directed acyclic graphs (DAGs) have been proposed as an alternative framework for specifying practically accessible assumptions beyond the MAR-MNAR dichotomy. However, there is currently no general algorithm for deciding how to handle the missing data given a specific DAG. Here we construct "canonical" DAGs capturing typical missingness mechanisms in epidemiologic studies with incomplete data on exposure, outcome, and confounding factors. For each DAG, we determine whether common target parameters are "recoverable," meaning that they can be expressed as functions of the available data distribution and thus estimated consistently, or whether sensitivity analyses are necessary. We investigate the performance of available-case and multiple-imputation procedures. Using data from waves 1-3 of the Longitudinal Study of Australian Children (2004-2008), we illustrate how our findings can guide the treatment of missing data in point-exposure studies.
Year: 2018 PMID: 30124749 PMCID: PMC6269242 DOI: 10.1093/aje/kwy173
Source DB: PubMed Journal: Am J Epidemiol ISSN: 0002-9262 Impact factor: 4.897
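The abstract's remark that unbiased estimation can be possible in MNAR scenarios, and that recoverability differs between target parameters, can be illustrated with a small simulation. The sketch below is not the authors' simulation study: the data-generating model, the coefficient values, and the missingness probabilities are all assumptions chosen for illustration. The exposure is made missing with a probability that depends on its own value (an MNAR mechanism), and the available-case proportion exposed is compared with the complete-case regression coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000  # large sample so that bias is visible above sampling noise

# Hypothetical complete data: confounder z, binary exposure x, continuous outcome y.
z = rng.normal(size=n)
x = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.5 * z)))
TRUE_EFFECT = 0.5  # assumed regression-adjusted effect of x on y
y = 1.0 + TRUE_EFFECT * x + 0.3 * z + rng.normal(size=n)

# Assumed MNAR mechanism: the exposure is missing more often when it equals 1.
p_missing = np.where(x == 1, 0.5, 0.1)
x_observed = rng.random(n) > p_missing


def ols_coefficients(outcome, exposure, confounder):
    """Least-squares fit of outcome on (intercept, exposure, confounder)."""
    design = np.column_stack([np.ones_like(confounder), exposure, confounder])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef


# Target 1: proportion exposed (full data vs. records with observed exposure).
prop_full = x.mean()
prop_available = x[x_observed].mean()

# Target 2: regression-adjusted coefficient of x (full data vs. complete cases).
beta_full = ols_coefficients(y, x, z)[1]
beta_complete = ols_coefficients(y[x_observed], x[x_observed], z[x_observed])[1]

print(f"proportion exposed: full={prop_full:.3f}, available-case={prop_available:.3f}")
print(f"adjusted coefficient: full={beta_full:.3f}, complete-case={beta_complete:.3f}")
```

Under these assumptions the available-case proportion exposed is biased downward, while the complete-case regression coefficient stays close to the assumed effect. This is because missingness here depends only on variables that the outcome model conditions on.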
Figure 1. Canonical complete-data directed acyclic graph (c-DAG) for a general point-exposure study. For illustration, we provide under each node heading the variables involved in an example study of maternal mental illness and child behavior that used data from waves 1–3 of the Longitudinal Study of Australian Children (2004–2008). SDQ, Strengths and Difficulties Questionnaire.
Figure 2. Canonical missingness directed acyclic graphs (m-DAGs) for a general point-exposure study. These 10 m-DAGs were identified as providing the most general forms of all essentially distinct extensions of the m-DAG shown in panel A (referred to as “m-DAG A”) in terms of recoverability. To illustrate how each m-DAG extends m-DAG A, the additional arrows are indicated with a heavier line. In the text and tables, we refer to each m-DAG according to its figure locant (m-DAG A, m-DAG B, etc.).
Table 1. Recoverability Results for the Missingness Directed Acyclic Graphs (m-DAGs) in Figure 2, Stating Whether Each Distribution and Parameter Was Found to Be Recoverable in Each m-DAG<sup>a</sup>
| Missingness DAG | Joint Distribution: Entire Distribution | Marginal Distribution of Exposure: Entire Distribution | Marginal Distribution of Exposure: Expectation (e.g., Proportion Exposed) | Marginal Distribution of Outcome: Entire Distribution | Marginal Distribution of Outcome: Expectation (e.g., Mean) | Conditional Distribution: Entire Distribution | Conditional Distribution: Expectation (If Yes, Also Holds for the Regression Coefficient) |
|---|---|---|---|---|---|---|---|
| A | Yes<sup>b</sup> | Yes<sup>b</sup> | Yes | Yes<sup>b</sup> | Yes | Yes<sup>b</sup> | Yes |
| B | Yes<sup>c</sup> | Yes<sup>d</sup> | Yes | Yes<sup>d</sup> | Yes | Yes<sup>b</sup> | Yes |
| C | Yes<sup>c</sup> | Yes<sup>d</sup> | Yes | Yes<sup>d</sup> | Yes | Yes<sup>d</sup> | Yes |
| D | No<sup>e</sup> | No<sup>e</sup> | No | Yes<sup>b</sup> | Yes | Yes<sup>b</sup> | Yes |
| E | No<sup>e</sup> | No<sup>e</sup> | No | Unable to establish | Conjecture no unless | Yes<sup>b</sup> | Yes |
| F | No<sup>e</sup> | No<sup>e</sup> | No | Yes<sup>b</sup> | Yes | Unable to establish | Conjecture no<sup>b</sup> |
| G | No<sup>e</sup> | Yes<sup>d</sup> | Yes | No<sup>e</sup> | No | No<sup>f</sup> | No |
| H | No<sup>e</sup> | Unable to establish | Conjecture no unless | No<sup>e</sup> | No | No<sup>f</sup> | No |
| I | No<sup>e</sup> | No<sup>e</sup> | No | Unable to establish | Conjecture no unless | Unable to establish | Conjecture no<sup>b</sup> |
| J | No<sup>e</sup> | No<sup>e</sup> | No | No<sup>e</sup> | No | No<sup>f</sup> | No |
Abbreviation: DAG, directed acyclic graph.
<sup>a</sup> Expressions in terms of available data are provided in Table 2 in case of recoverability.
<sup>b</sup> Proof is provided in Web Appendix 2.
<sup>c</sup> Result obtained by corollary 1 in the paper by Mohan and Pearl (15).
<sup>d</sup> By recoverability of joint distribution (possibly of a reduced graph).
<sup>e</sup> By theorem 3 in the paper by Mohan and Pearl (15).
<sup>f</sup> By corollary 2 in the paper by Mohan and Pearl (15).
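The recoverability table above works as a lookup from an assumed m-DAG and a target distribution to a handling decision. A minimal sketch of that lookup follows, transcribing the "entire distribution" columns. The dictionary keys (`joint`, `exposure_marginal`, `outcome_marginal`, `conditional`) and the function `handling_advice` are hypothetical names introduced here. Reading the first marginal as the exposure and the second as the outcome is an interpretation of the column headings, not something the excerpt states in symbols.

```python
YES, NO, UNKNOWN = "yes", "no", "not established"

# One entry per canonical m-DAG; values follow the table row by row.
RECOVERABILITY = {
    "A": {"joint": YES, "exposure_marginal": YES, "outcome_marginal": YES, "conditional": YES},
    "B": {"joint": YES, "exposure_marginal": YES, "outcome_marginal": YES, "conditional": YES},
    "C": {"joint": YES, "exposure_marginal": YES, "outcome_marginal": YES, "conditional": YES},
    "D": {"joint": NO, "exposure_marginal": NO, "outcome_marginal": YES, "conditional": YES},
    "E": {"joint": NO, "exposure_marginal": NO, "outcome_marginal": UNKNOWN, "conditional": YES},
    "F": {"joint": NO, "exposure_marginal": NO, "outcome_marginal": YES, "conditional": UNKNOWN},
    "G": {"joint": NO, "exposure_marginal": YES, "outcome_marginal": NO, "conditional": NO},
    "H": {"joint": NO, "exposure_marginal": UNKNOWN, "outcome_marginal": NO, "conditional": NO},
    "I": {"joint": NO, "exposure_marginal": NO, "outcome_marginal": UNKNOWN, "conditional": UNKNOWN},
    "J": {"joint": NO, "exposure_marginal": NO, "outcome_marginal": NO, "conditional": NO},
}


def handling_advice(m_dag: str, target: str) -> str:
    """Translate a table entry into a short, hedged handling suggestion."""
    status = RECOVERABILITY[m_dag][target]
    if status == YES:
        return "recoverable: consistent estimation from the available data is possible"
    if status == NO:
        return "not recoverable: sensitivity analyses are needed"
    return "recoverability not established: sensitivity analyses are advisable"


for dag in ("E", "J"):
    for target in ("exposure_marginal", "outcome_marginal", "conditional"):
        print(f"m-DAG {dag}, {target}: {handling_advice(dag, target)}")
```

The wording of the returned suggestions paraphrases the abstract's distinction between recoverable parameters and those requiring sensitivity analyses. How to act on a "not established" entry is a judgment call that the table itself does not settle.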
Table 2. Recoverability Results for the Missingness Directed Acyclic Graphs in Figure 2, Providing for Each Recoverable Distribution Its Mathematical Expression in Terms of Available Data<sup>a,b,c</sup>
| Missingness DAG | Joint Distribution | Marginal Distribution of Exposure | Marginal Distribution of Outcome | Conditional Distribution |
|---|---|---|---|---|
| A | | | | |
| B | No simple expression | No simple expression | | |
| C | No simple expression | No simple expression | No simple expression | |
| D | | | | |
| E | | | | |
| F | | | | |
| G | | No simple expression | | |
| H, I, J | | | | |
Abbreviation: DAG, directed acyclic graph.
<sup>a</sup> Proofs are provided in Web Appendix 2.
<sup>b</sup> A blank space is left where the distribution is not recoverable or it has not been established as documented in Table 1.
<sup>c</sup> = (M, M, M), 0 = (0, 0, 0).
Figure 3. Assessment of the existence of an arrow from each incomplete variable to each missingness indicator in the example from the Longitudinal Study of Australian Children (2004–2008), drawing from evidence in the literature (39, 42–45). SDQ, Strengths and Difficulties Questionnaire.
Table 3. Estimates of 3 Target Parameters Using 2 Approaches to Handle Missing Data in the Example Study of Maternal Mental Illness and Child Behavior, Longitudinal Study of Australian Children (Waves 1–3), 2004–2008
| Parameter | Estimate (SE) | 95% CI | Is Estimate Reliable<sup>a</sup> if We Adopt m-DAG E? | Is Estimate Reliable<sup>a</sup> if We Adopt m-DAG J? |
|---|---|---|---|---|
| Proportion of mentally ill mothers at wave 1 | | | | |
| Available-case analysis | 0.21 (0.01) | 0.20, 0.22 | No | No |
| MICE | 0.21 (0.01) | 0.20, 0.23 | No | No |
| Mean SDQ score<sup>b</sup> of children at wave 3 | | | | |
| Available-case analysis | 7.48 (0.09) | 7.31, 7.65 | No | No |
| MICE | 7.74 (0.09) | 7.57, 7.90 | No | No |
| Regression-adjusted difference in mean SDQ score<sup>c</sup> | | | | |
| Available-case analysis | 0.59 (0.20) | 0.20, 0.98 | Yes | No |
| MICE | 0.64 (0.21) | 0.23, 1.06 | Yes | No |
Abbreviations: CI, confidence interval; m-DAG, missingness directed acyclic graph; MICE, multiple imputation by chained equations; SDQ, Strengths and Difficulties Questionnaire; SE, standard error.
<sup>a</sup> This indicates whether the estimate can be considered reliable according to which m-DAG from Figure 2 is adopted, based on the recoverability of the parameter in that m-DAG.
<sup>b</sup> Range, 0–40. A higher score indicates increased behavioral difficulties.
<sup>c</sup> Comparing mentally ill mothers with non–mentally ill mothers.
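As a quick plausibility check on the estimates table above, the reported 95% confidence intervals can be compared with a normal-approximation (Wald) interval, estimate ± 1.96 × SE. This excerpt does not state how the intervals were computed, so the Wald formula is an assumption. Small discrepancies are expected because the published estimates and standard errors are rounded, and because multiple-imputation intervals are often based on a t reference distribution rather than the normal. The helper name `wald_interval` is hypothetical.

```python
# (label, estimate, SE, reported lower limit, reported upper limit), copied from the table.
ROWS = [
    ("Proportion exposed, available-case", 0.21, 0.01, 0.20, 0.22),
    ("Proportion exposed, MICE", 0.21, 0.01, 0.20, 0.23),
    ("Mean SDQ score, available-case", 7.48, 0.09, 7.31, 7.65),
    ("Mean SDQ score, MICE", 7.74, 0.09, 7.57, 7.90),
    ("Adjusted difference, available-case", 0.59, 0.20, 0.20, 0.98),
    ("Adjusted difference, MICE", 0.64, 0.21, 0.23, 1.06),
]

Z_975 = 1.96  # approximate 97.5th percentile of the standard normal distribution


def wald_interval(estimate, se, z=Z_975):
    """Return the (lower, upper) limits of estimate +/- z * se."""
    half_width = z * se
    return estimate - half_width, estimate + half_width


for label, est, se, lower, upper in ROWS:
    calc_lower, calc_upper = wald_interval(est, se)
    print(
        f"{label}: computed ({calc_lower:.2f}, {calc_upper:.2f}), "
        f"reported ({lower:.2f}, {upper:.2f})"
    )
```

For the available-case adjusted difference, the arithmetic is 0.59 ± 1.96 × 0.20 = 0.59 ± 0.392, which reproduces the reported limits of 0.20 and 0.98 after rounding.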