Tim P Morris, Ian R White, James R Carpenter, Simon J Stanworth, Patrick Royston
Abstract
Multivariable fractional polynomial (MFP) models are commonly used in medical research. The datasets in which MFP models are applied often contain covariates with missing values. To handle the missing values, we describe methods for combining multiple imputation with MFP modelling, considering in turn three issues: first, how to impute so that the imputation model does not favour certain fractional polynomial (FP) models over others; second, how to estimate the FP exponents in multiply imputed data; and third, how to choose between models of differing complexity. Two imputation methods are outlined for different settings. For model selection, methods based on Wald-type statistics and weighted likelihood-ratio tests are proposed and evaluated in simulation studies. The Wald-based method is very slightly better at estimating FP exponents. Type I error rates are very similar for both methods, although slightly less well controlled than analysis of complete records; however, there is potential for substantial gains in power over the analysis of complete records. We illustrate the two methods in a dataset from five trauma registries for which a prognostic model has previously been published, contrasting the selected models with that obtained by analysing the complete records only.
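Wald-type statistics in multiply imputed data usually start from Rubin's rules, which pool an estimate and its variance across the M imputed datasets. The sketch below shows those standard rules for a single coefficient; it is background for the abstract, not necessarily the exact statistic the authors compute for FP exponents, and the function name `rubins_rules` is hypothetical.

```python
import math
from statistics import mean, variance

def rubins_rules(estimates, variances):
    """Pool one coefficient across M imputed datasets using Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: their squared standard errors, one per imputed dataset
    Returns (pooled estimate, total variance, Wald statistic, Rubin's df).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)              # pooled point estimate
    w_bar = mean(variances)              # mean within-imputation variance
    b = variance(estimates)              # between-imputation variance (divisor m - 1)
    t = w_bar + (1 + 1 / m) * b          # total variance
    wald = q_bar ** 2 / t                # Wald statistic for H0: coefficient = 0
    if b == 0:
        df = math.inf                    # no between-imputation variability
    else:
        r = (1 + 1 / m) * b / w_bar      # relative increase in variance due to missingness
        df = (m - 1) * (1 + 1 / r) ** 2  # Rubin's (1987) degrees of freedom
    return q_bar, t, wald, df
```

For example, `rubins_rules([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])` inflates the within-imputation variance of 0.04 by the between-imputation spread before forming the Wald statistic.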
Keywords: fractional polynomials; missing data; multiple imputation; multivariable fractional polynomials
Year: 2015 PMID: 26095614 PMCID: PMC4871237 DOI: 10.1002/sim.6553
Source DB: PubMed Journal: Stat Med ISSN: 0277-6715 Impact factor: 2.373
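Figure 1. Example FP2 functions of the form $\beta_1 x^{(p_1)} + \beta_2 x^{(p_2)}$, where $x^{(0)}$ denotes $\log x$ and a repeated power $p_1 = p_2 = p$ gives $\beta_1 x^{(p)} + \beta_2 x^{(p)} \log x$. The numbers in parentheses are the values of $(p_1, p_2)$ used to plot each curve.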
Figure 2. Simulation results: estimation of FP exponents according to method (10 000 replicates). CD‐ll is the log‐likelihood in complete data; CR‐ll is the log‐likelihood in complete records; MI‐Wald is the Wald statistic in multiple imputation (MI) data; MI‐ll is the log‐likelihood in MI data.
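The log-likelihood criteria in Figure 2 estimate an FP exponent by fitting the model once per candidate power and keeping the best fit. The sketch below does this for FP1 in a single complete dataset, assuming a Gaussian linear model for simplicity (the trauma outcome is binary, so a logistic model would replace it there) and the conventional power set {−2, −1, −0.5, 0, 0.5, 1, 2, 3}. It does not reproduce how the paper combines log-likelihoods or Wald statistics across imputations; all names are hypothetical.

```python
import math

# Conventional FP power set; assumed here rather than taken from the paper.
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)

def fp_transform(x, p):
    """FP transform: x**p, with power 0 meaning log(x). Requires x > 0."""
    return math.log(x) if p == 0 else x ** p

def gaussian_loglik(zs, ys):
    """Maximised log-likelihood of y = a + b*z + error, error ~ N(0, s^2)."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    szz = sum((z - mz) ** 2 for z in zs)
    szy = sum((z - mz) * (y - my) for z, y in zip(zs, ys))
    b = szy / szz                        # least-squares slope
    a = my - b * mz                      # least-squares intercept
    rss = sum((y - a - b * z) ** 2 for z, y in zip(zs, ys))
    return -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)

def best_fp1_power(xs, ys, powers=FP_POWERS):
    """Return the FP1 power with the largest log-likelihood, and that log-likelihood."""
    scores = {p: gaussian_loglik([fp_transform(x, p) for x in xs], ys) for p in powers}
    p_hat = max(scores, key=scores.get)
    return p_hat, scores[p_hat]
```

Because every FP1 model has the same number of parameters, picking the largest log-likelihood is equivalent to picking the smallest residual sum of squares.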
Figure 3. Type I error (left) and power (right) of the FP1 versus linear test of nominal size 0.1 on $x_1$, with $x_1$ and $x_2$ missing at random.
Figure A.1. Type I error (left) and power (right) of the FP1 versus null test of nominal size 0.1 on $x_1$, with $x_1$ missing at random.
Figure A.2. Type I error (left) and power (right) of the FP1 versus linear test of nominal size 0.1 on $x_1$, with $x_1$ missing at random.
Figure A.3. Type I error (left) and power (right) of the FP1 versus null test of nominal size 0.1 on $x_1$, with $x_2$ missing at random.
Figure A.4. Type I error (left) and power (right) of the FP1 versus linear test of nominal size 0.1 on $x_1$, with $x_2$ missing at random.
Figure A.5. Type I error (left) and power (right) of the FP1 versus null test of nominal size 0.1 on $x_1$, with $x_1$ and $x_2$ both incomplete (missing at random).
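Type I error and power in these figures are empirical rejection rates: the proportion of simulated datasets in which a test of nominal size 0.1 rejects, under the null and under an alternative respectively. The sketch below shows that calculation with its Monte Carlo standard error, using a simple z-test of a mean (known unit variance) as a stand-in for the FP tests; the test, sample sizes and function name are assumptions for illustration only.

```python
import math
import random
from statistics import NormalDist

def rejection_rate(n_reps, n_obs, true_mean, alpha=0.1, seed=1):
    """Monte Carlo rejection rate of a two-sided z-test of H0: mean = 0.

    With true_mean = 0 this estimates the Type I error; otherwise it estimates power.
    Returns (rate, Monte Carlo standard error of the rate).
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    rejections = 0
    for _ in range(n_reps):
        xbar = sum(rng.gauss(true_mean, 1.0) for _ in range(n_obs)) / n_obs
        z = xbar * math.sqrt(n_obs)              # standardised mean (sd = 1 assumed)
        if abs(z) > crit:
            rejections += 1
    rate = rejections / n_reps
    mcse = math.sqrt(rate * (1 - rate) / n_reps)
    return rate, mcse
```

With 2000 replicates, a true Type I error of 0.1 is estimated with a Monte Carlo standard error of about 0.007, which indicates how far an estimated rate can drift from the nominal level by chance alone.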
Summary of variables in the trauma dataset relevant to this work, n = 5,693.
| Variable | Frequency missing (%) | Mean (SD) in observed data | Frequency (%) in observed data |
|---|---|---|---|
| Massive transfusion (outcome) | 0 (0) | | 518 (9) |
| Age (years) | 0 (0) | 40 (20) | |
| Sex: male | 0 (0) | | 4161 (73) |
| Injury type: penetrating | 23 (0.4) | | 580 (10) |
| Time to emergency dept. (minutes) | 2396 (42) | 65 (40) | |
| Systolic blood pressure (mm Hg) | 425 (7) | 126 (29) | |
| Base deficit (m | 868 (16) | 3.4 (5.1) | |
| Prothrombin time (seconds) | 1648 (29) | 17 (8) | |
SD, standard deviation.
Models selected in the trauma data.
| Variable | Complete records | Stack | ΔWald |
|---|---|---|---|
| Age (years) | −2 | 0.5, 1 | 1, 1 |
| Sex | — | 1 | 1 |
| Injury type | 1 | 1 | 1 |
| Time to emergency dept. (minutes) | 1 | 1 | 1 |
| Systolic blood pressure (mm Hg) | 1 | 1 | −2, 0.5 |
| Base deficit (m | 1 | −1 | −0.5 |
| Prothrombin time (seconds) | −0.5, −0.5 | −0.5, −0.5 | −0.5, −0.5 |
The numbers give the exponents selected for each variable in the final model.
For binary variables, an exponent of 1 indicates inclusion in the final model.
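Each exponent list in the table defines the FP basis terms for that variable, following the convention that power 0 means $\log x$ and a repeated power adds a $\log x$ factor. The sketch below builds those terms, so that, for example, the prothrombin-time entry (−0.5, −0.5) gives $x^{-0.5}$ and $x^{-0.5}\log x$, while (−2, 0.5) gives $x^{-2}$ and $x^{0.5}$. The function names are hypothetical, and it assumes powers are listed in ascending order and that $x$ is already positive (MFP software often shifts or rescales $x$ first, which is not shown).

```python
import math

def fp_term(x, p):
    """x**p, with power 0 meaning log(x). Requires x > 0."""
    if x <= 0:
        raise ValueError("FP transforms require x > 0")
    return math.log(x) if p == 0 else x ** p

def fp_basis(x, powers):
    """Basis terms for an FP model with the given (ascending) exponents.

    Each time an exponent repeats, the term gains one more log(x) factor,
    so (p, p) gives x^(p) and x^(p) * log(x).
    """
    terms = []
    prev, log_mult = None, 0
    for p in powers:
        log_mult = log_mult + 1 if p == prev else 0
        terms.append(fp_term(x, p) * math.log(x) ** log_mult)
        prev = p
    return terms
```

A fitted FP function is then the coefficient-weighted sum of these terms, which is what Figure 4 plots for age and base deficit.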
Two notional individuals' covariate values used for Figure 4.
| Individual | A | B |
|---|---|---|
| Age (years) | | |
| Sex | Female | Male |
| Injury type | Blunt | Blunt |
| Time to emergency dept. (minutes) | 63 | 73 |
| Systolic blood pressure (mm Hg) | 91 | 130 |
| Base deficit (m | | |
| Prothrombin time (seconds) | 16.8 | 14.4 |
Values of age are fixed when base deficit is varied in Figure 4 and vice versa.
Figure 4. Fitted functions for the two continuous variables (age and base deficit) for which the model-selection methods return different exponents, obtained by fixing the values of the other covariates.
Possible strategies for imputation and model building with pros, cons and recommendations in light of results.
| Stage | Possible approach | Pros | Cons | Practical advice |
|---|---|---|---|---|
| Imputation | JAV | Unbiased for linear models with data MCAR. | Biased in all other settings. | Avoid |
| | PMM | Ease of implementation; some ability to model nonlinear associations. | Performance degrades under strong MAR mechanisms. | Possibly useful for exploratory analysis. |
| | SMC FCS | Good approach if the analysis model is known, for example when validating a prognostic model. | Unclear how best to proceed when the analysis model is to be developed from multiply imputed data. | Consider using |
| | Draw FP1 exponents via ABB | Good approach if the highest dimension of FP considered is FP1. | Does not extend beyond FP1. Predictive mean matching can be incorporated for further flexibility but comes with the aforementioned caution. With complete binary or continuous covariates, the search over the parameter space of | Consider using |
| Estimation of | Log‐likelihood in complete records | Reflects how MFP models are built in complete data. May be adequate with a small fraction of incomplete records and could be followed by SMC FCS to impute for the selected model. | Does not use incomplete records, leading to bias in estimates of | Avoid unless there are few incomplete records |
| | Log‐likelihood in MI data | Reflects how MFP models are built in complete data but uses MI data. | Small bias in | Consider using |
| | Wald statistics | Typically used in MI data, where likelihoods do not have the same meaning. | Very small bias in | Consider using |
| Selection of | Likelihood‐ratio tests on complete records | Type I error rate well controlled. May be adequate with a small fraction of incomplete records and could be followed by SMC FCS to impute for the selected model. | Estimates of | Avoid unless there are few incomplete records |
| | Weighted likelihood‐ratio tests on stacked MI data | Standard approach to building MFP models in complete data. Greater power and less bias than complete records. | Approximation for the fraction of missing information may be wrong. Type I error rate less well controlled than analysis of complete records. | Consider |
| | Wald and ΔWald tests on MI data | The standard approach to testing in multiply imputed data. Better power and lower bias than complete records. | No theoretical basis for ΔWald. Type I error rate less well controlled than analysis of complete records. | Consider |
| | Meng and Rubin | Does not require access to the full covariance matrix. | Computational complexity and extremely low power. | Avoid |
| | Robins and Wang | Provides consistent variance estimation even when the imputation and analysis models are incompatible. | Impractical. Requires a different approach to imputation. Implementation is extremely complex for all but the simplest settings and is infeasible for MFP. | Avoid |
ABB, approximate Bayesian bootstrap; JAV, just another variable; PMM, predictive mean matching; SMC FCS, substantive model compatible fully conditional specification; FP, fractional polynomial; MCAR, missing completely at random; MAR, missing at random; MFP, multivariable FP; MI, multiple imputation.
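The table lists weighted likelihood-ratio tests on stacked MI data and notes that they rely on an approximation to the fraction of missing information. A minimal sketch follows, assuming the M imputed datasets are stacked into one long dataset and each record gets the weight (1 − f)/M, where f is an approximate fraction of missing information. This weighting rule, the per-record log-likelihood inputs and the function names are assumptions for illustration, not a statement of the authors' exact implementation.

```python
def stacked_weights(n_records, m, frac_missing):
    """Per-record weights for M stacked imputations, using the assumed rule (1 - f) / M."""
    w = (1.0 - frac_missing) / m
    return [w] * (n_records * m)

def weighted_lr_statistic(ll_null, ll_alt, weights):
    """2 * (weighted log-likelihood of the larger model - that of the nested model).

    ll_null, ll_alt: per-record log-likelihood contributions in the stacked data,
    from the nested and the larger model respectively, fitted with these weights.
    """
    if not (len(ll_null) == len(ll_alt) == len(weights)):
        raise ValueError("contributions and weights must align record by record")
    ll0 = sum(w * l for w, l in zip(weights, ll_null))
    ll1 = sum(w * l for w, l in zip(weights, ll_alt))
    return 2.0 * (ll1 - ll0)
```

A constant weight does not move the maximum-likelihood estimates, so both models can be fitted to the stacked data as usual; the weight only rescales the statistic, dividing by M to undo the M-fold replication and multiplying by 1 − f to discount for missing information. Variable-specific weights would replace the constant list.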