George O Agogo, Hilko van der Voet, Pieter van't Veer, Pietro Ferrari, Max Leenders, David C Muller, Emilio Sánchez-Cantalejo, Christina Bamia, Tonje Braaten, Sven Knüppel, Ingegerd Johansson, Fred A van Eeuwijk, Hendriek Boshuizen.
Abstract
In epidemiologic studies, measurement error in dietary variables often attenuates the association between dietary intake and disease occurrence. To adjust for the attenuation caused by error in dietary intake, regression calibration is commonly used. Applying regression calibration requires unbiased reference measurements. Short-term reference measurements for foods that are not consumed daily contain excess zeroes that pose challenges in the calibration model. We adapted a two-part regression calibration model, initially developed for multiple replicates of reference measurements per individual, to a single-replicate setting. We showed how to handle excess zero reference measurements with a two-step modeling approach, how to explore heteroscedasticity in the consumed amount with a variance-mean graph, how to explore nonlinearity with the generalized additive modeling (GAM) and empirical logit approaches, and how to select covariates in the calibration model. The performance of the two-part calibration model was compared with that of its one-part counterpart. We used vegetable intake and mortality data from the European Prospective Investigation on Cancer and Nutrition (EPIC) study. In EPIC, reference measurements were taken with 24-hour recalls. For each of the three vegetable subgroups assessed separately, correcting for error with an appropriately specified two-part calibration model resulted in about a threefold increase in the strength of association with all-cause mortality, as measured by the log hazard ratio. We further found that the standard way of including covariates in the calibration model can lead to overfitting the two-part calibration model. Moreover, the extent of adjusting for error is influenced by the number and forms of covariates in the calibration model. For episodically consumed foods, we advise researchers to pay special attention to response distribution, nonlinearity, and covariate inclusion in specifying the calibration model.
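The two-part idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the logistic form for the consumption probability, the log link for the consumed amount, the log(1 + DQ) transformation, the function name, and all coefficient values are hypothetical choices made for the example.

```python
import math


def two_part_calibrated_intake(dq, prob_coef, amount_coef):
    """Expected reference intake given DQ: P(R > 0 | Q) * E[R | R > 0, Q]."""
    qt = math.log1p(dq)  # assumed transformation of the DQ: log(1 + DQ)

    # Part I: consumption probability from a logistic model (assumed form).
    a0, a1 = prob_coef
    p_consume = 1.0 / (1.0 + math.exp(-(a0 + a1 * qt)))

    # Part II: mean consumed amount on consumption days, log link (assumed).
    b0, b1 = amount_coef
    mean_amount = math.exp(b0 + b1 * qt)

    # The calibrated intake combines both parts multiplicatively.
    return p_consume * mean_amount


# Hypothetical coefficients, for illustration only.
print(two_part_calibrated_intake(50.0, (-1.0, 0.3), (3.5, 0.2)))
```

A one-part calibration would instead model the reference measurement directly, zeros included, which is where the excess zeroes become a problem.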
Year: 2014 PMID: 25402487 PMCID: PMC4234679 DOI: 10.1371/journal.pone.0113160
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Country-specific summary measures for the percentage of zero intake measurements reported on 24-HDR (% R = 0, non-consumers) and Pearson Correlation (ρ) for intake as measured by 24-HDR and DQ for leafy vegetables, fruiting vegetables and root vegetables.
| | | Leafy vegetables | | Fruiting vegetables | | Root vegetables | |
| Participating Countries | N | % R = 0 | ρ | % R = 0 | ρ | % R = 0 | ρ |
| France | 4735 | 42.8 | 0.17 | 44.4 | 0.10 | 71.6 | 0.06 |
| Italy | 3961 | 59.3 | 0.16 | 37.6 | 0.15 | 79.6 | 0.11 |
| Spain | 3220 | 48.9 | 0.34 | 31.7 | 0.22 | 76.1 | 0.12 |
| UK | 1313 | 68.2 | 0.16 | 40.8 | 0.19 | 59.3 | 0.23 |
| Netherlands | 4545 | 70.5 | 0.10 | 48.7 | 0.21 | 82.0 | 0.14 |
| Greece | 2930 | 67.9 | 0.10 | 29.5 | 0.13 | 83.2 | 0.03 (ns) |
| Germany | 4418 | 75.9 | 0.15 | 41.6 | 0.17 | 79.2 | 0.22 |
| Sweden | 6132 (a) | 70.5 | 0.19 | 34.9 | 0.24 | 67.2 | 0.17 |
| Denmark | 3918 | 77.4 | 0.09 | 41.0 | 0.21 | 61.8 | 0.40 |
| Norway | (b) | | | | | 58.5 | 0.12 |
EPIC Study, 1999–2000.
(a) N is 3132 instead of 6132 for leafy vegetables in Sweden because data from Umeå were excluded from the analysis based on the inclusion criteria in EPIC.
(b) N refers to data for root vegetables only because data for Norway were excluded for the leafy vegetable and fruiting vegetable subgroups.
ns means the correlation is not statistically significant at α = 0.05; all other correlation coefficients are highly significant with P < 0.0001.
Figure 1. Boxplots of the distribution of intake of vegetable subgroups.
The country-specific boxplots show the distribution of the consumed amount among those who reported consumption on the 24-HDR for the leafy vegetable (LV), fruiting vegetable (FV) and root vegetable (RV) subgroups in the EPIC study, 1992–2000.
Figure 2. The variance-mean relation for leafy vegetable intake.
The graph shows a least squares regression line fitted to the scatterplot of the logarithm of the center-specific standard deviation versus the logarithm of the center-specific mean of the consumed amount of leafy vegetables for those who reported consumption on the 24-HDR in the EPIC Study, 1992–2000. The approximately linear relation suggests a variance that increases with the mean.
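The variance-mean check described in this caption can be sketched as follows. The function name, the dictionary layout (center label mapped to a list of 24-HDR amounts), and the toy data are assumptions made for illustration; only the idea of regressing log standard deviation on log mean among consumers comes from the caption.

```python
import math
from statistics import mean, stdev


def log_sd_vs_log_mean(amounts_by_center):
    """Least squares fit of log(SD) on log(mean) across centers.

    Only positive amounts (reported consumption) are used.
    Returns (slope, intercept).
    """
    xs, ys = [], []
    for amounts in amounts_by_center.values():
        consumed = [a for a in amounts if a > 0]
        if len(consumed) < 2:
            continue  # a standard deviation needs at least two values
        xs.append(math.log(mean(consumed)))
        ys.append(math.log(stdev(consumed)))

    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Toy data (hypothetical): the SD is proportional to the mean in every center.
toy = {"A": [0, 10, 20, 30], "B": [0, 0, 20, 40, 60], "C": [40, 80, 120]}
slope, intercept = log_sd_vs_log_mean(toy)
print(slope, intercept)
```

A slope near 1 on this scale means the standard deviation grows roughly in proportion to the mean, which is the pattern a gamma-type response distribution would produce; a slope near 0 would point to constant variance.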
Figure 3. The empirical logit graph for leafy vegetable intake.
The graph shows loess curves fitted to the scatterplots of 1) the empirical logit (dotted line) and 2) the mean of the predicted logit from a logistic model with log-transformed DQ (thick line) against the DQ category-specific means for leafy vegetable intake in the EPIC Study, 1992–2000. The similarity of the two logit curves suggests that a log-transformed DQ is appropriate for the consumption probability part of the two-part calibration model.
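The empirical logit points behind such a graph can be computed as sketched below. The equal-sized DQ categories, the 0.5 continuity correction, and the function name are assumptions for the example; the source does not state how the categories or the logit were defined.

```python
import math


def empirical_logit_by_category(dq, consumed, n_categories):
    """Return (mean DQ, empirical logit) for each DQ category.

    dq:        questionnaire intakes, one per person
    consumed:  1 if the person reported consumption on the recall, else 0
    The logit uses the common 0.5 correction so it stays finite when a
    category contains only consumers or only non-consumers.
    """
    pairs = sorted(zip(dq, consumed))
    n = len(pairs)
    points = []
    for k in range(n_categories):
        chunk = pairs[k * n // n_categories:(k + 1) * n // n_categories]
        if not chunk:
            continue
        events = sum(1 for _, c in chunk if c)
        total = len(chunk)
        logit = math.log((events + 0.5) / (total - events + 0.5))
        mean_dq = sum(q for q, _ in chunk) / total
        points.append((mean_dq, logit))
    return points


# Toy data (hypothetical): consumption becomes more likely as DQ increases.
print(empirical_logit_by_category([1, 2, 3, 4], [0, 0, 1, 1], 2))
```

Plotting these points against both the raw and a transformed DQ, and checking which scale gives the straighter trend, is one way to choose the functional form for the consumption probability part.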
Significant covariates (marked ×) in the reduced two-part calibration models, after a backward elimination on each part of the standard two-part regression calibration model with transformed DQ and with other covariates selected using the standard way of variable inclusion.
| | Leafy vegetables | | Fruiting vegetables | | Root vegetables | |
| Covariates | Part I | Part II | Part I | Part II | Part I | Part II |
| Qt | × | × | × | × | × | × |
| BMI | × | × | × | × | × | |
| Smoking status | × | × | × | × | × | |
| Physical activity | × | × | × | × | | |
| Lifetime alcohol | × | × | | | | |
| Education | × | × | × | × | × | |
| Age | × | × | × | × | × | × |
| Age² | × | | | | | |
| Total energy | × | × | × | | | |
| Weight | × | × | × | | | |
| Center | × | × | × | × | × | × |
| Season | × | × | × | × | × | |
| Sex | × | × | × | × | | |
| | | | | | | |
| Qt | × | × | | | | |
| Qt | × | × | × | × | | |
| Qt | × | × | | | | |
| Qt | × | × | | | | |
| Qt | × | × | × | × | × | × |
EPIC Study, 1992–2000.
Qt is a transformed DQ; Part I refers to the consumption probability part of the two-part calibration model; Part II refers to the consumed amount part of the two-part calibration model;
* refers to an interaction term.
The area under the curve (AUC) from ROC curve for consumption probability (Part I), and root mean square error (RMSE) and mean bias for the consumed amount (Part II) of the standard and the reduced forms of two-part regression calibration models with transformed DQ.
| Vegetable Subgroups | Models | Part I | Part II | |
| | | AUC | RMSE (a) | Mean Bias (b) |
| Leafy | Standard | 0.6846 | 66.841 | 0.0223 |
| | Reduced | 0.6843 | 64.578 | 0.0019 |
| Fruiting | Standard | 0.6305 | 118.823 | 0.0446 |
| | Reduced | 0.6304 | 110.415 | −0.0334 |
| Root | Standard | 0.6413 | 68.626 | 0.0895 |
| | Reduced | 0.6408 | 66.524 | 0.0883 |
; b
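The three summary measures in this table can be computed as in the following sketch. The function names and toy inputs are hypothetical, and the sign convention for the mean bias (predicted minus observed) is an assumption, since the table footnotes that defined RMSE and mean bias did not survive.

```python
import math


def auc(scores, labels):
    """Area under the ROC curve as the probability that a randomly chosen
    consumer has a higher predicted probability than a non-consumer
    (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def rmse(predicted, observed):
    """Root mean square error of predicted versus observed amounts."""
    n = len(predicted)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def mean_bias(predicted, observed):
    """Average of predicted minus observed (assumed sign convention)."""
    return sum(p - o for p, o in zip(predicted, observed)) / len(predicted)


# Toy values (hypothetical), not EPIC data.
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(mean_bias([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

The AUC applies to Part I because that part predicts a binary outcome (consumed or not), while RMSE and mean bias apply to Part II because that part predicts a continuous amount.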
Figure 4. Linearity assessment in the Cox proportional hazards model for leafy vegetables.
The graph shows a smoothed curve fitted to the scatterplot of the log hazard ratio estimates of leafy vegetable intake on all-cause mortality in each DQ category versus the DQ category-specific median intake. The approximately linear downward trend suggests a possible linear relation and a beneficial effect of vegetable intake on the risk of all-cause mortality.
Log hazard ratio estimate (standard error) per 100 g usual intake of each of the three vegetable subgroups, calibrated with each of the three forms of regression calibration models in their reduced and standard forms.
| Vegetable Subgroups | Calibration methods | Reduced form | | Standard form | |
| | | Log hazard ratio (s.e. (a); s.e. (b)) | s.e. ratio (c) | Log hazard ratio (s.e. (a); s.e. (b)) | s.e. ratio (c) |
| Leafy | Naïve method | −0.144 (0.027) | | −0.144 (0.027) | |
| | One-part linear calibration | −0.480 (0.090; 0.112) | 1.24 | −0.409 (0.083; 0.127) | 1.53 |
| | Two-part (untransformed DQ) | −0.395 (0.092; 0.183) | 1.99 | −0.174 (0.089; 0.278) | 3.11 |
| | Two-part (transformed DQ) | −0.509 (0.090; 0.292) | 3.24 | −0.461 (0.047; 0.160) | 3.41 |
| Fruiting | Naïve method | −0.094 (0.014) | | −0.094 (0.014) | |
| | One-part linear calibration | −0.125 (0.031; 0.034) | 1.11 | −0.123 (0.031; 0.034) | 1.11 |
| | Two-part (untransformed DQ) | −0.161 (0.030; 0.034) | 1.14 | −0.109 (0.030; 0.073) | 2.42 |
| | Two-part (transformed DQ) | −0.255 (0.037; 0.108) | 2.92 | −0.228 (0.035; 0.131) | 3.74 |
| Root | Naïve method | −0.160 (0.026) | | −0.160 (0.026) | |
| | One-part linear calibration | −0.342 (0.060; 0.082) | 1.36 | −0.305 (0.054; 0.077) | 1.43 |
| | Two-part (untransformed DQ) | −0.203 (0.088; 0.219) | 2.49 | −0.107 (0.060; 0.167) | 2.78 |
| | Two-part (transformed DQ) | −0.479 (0.070; 0.214) | 3.06 | −0.265 (0.056; 0.181) | 3.23 |
s.e. (a) is the standard error (×10⁻²) of the estimate that does not account for the uncertainty in the calibration; s.e. (b) is the standard error (×10⁻²) that accounts for the uncertainty in the calibration; s.e. ratio (c) is the ratio of s.e. (b) to s.e. (a).
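The arithmetic behind the s.e. ratio, and behind the "about threefold" strengthening reported in the abstract, can be checked with a short sketch. The function names are hypothetical; the inputs are the leafy vegetable values from the reduced-form columns of the table above. Because the published ratios were presumably computed from unrounded standard errors, recomputing them from the rounded table entries can differ in the last digit for some rows.

```python
def se_ratio(se_without_calibration, se_with_calibration):
    """Ratio of the standard error that accounts for calibration
    uncertainty to the one that does not."""
    return se_with_calibration / se_without_calibration


def strengthening_factor(naive_log_hr, calibrated_log_hr):
    """How many times larger the calibrated log hazard ratio is than the
    naive one."""
    return calibrated_log_hr / naive_log_hr


# Leafy vegetables, reduced form, two-part model with transformed DQ.
print(round(se_ratio(0.090, 0.292), 2))                # 3.24
print(round(strengthening_factor(-0.144, -0.509), 2))  # 3.53
```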