Literature DB >> 29398298

New approach for the identification of implausible values and outliers in longitudinal childhood anthropometric data.

Joy Shi1, Jill Korsiak1, Daniel E Roth2.   

Abstract

PURPOSE: We aimed to demonstrate the use of jackknife residuals to take advantage of the longitudinal nature of available growth data in assessing potential biologically implausible values and outliers.
METHODS: Artificial errors were induced in 5% of length, weight, and head circumference measurements, measured on 1211 participants from the Maternal Vitamin D for Infant Growth (MDIG) trial from birth to 24 months of age. Each child's sex- and age-standardized z-score or raw measurements were regressed as a function of age in child-specific models. Each error responsible for a biologically implausible decrease between a consecutive pair of measurements was identified based on the higher of the two absolute values of jackknife residuals in each pair. In further analyses, outliers were identified as those values beyond fixed cutoffs of the jackknife residuals (e.g., greater than +5 or less than -5 in primary analyses). Kappa, sensitivity, and specificity were calculated over 1000 simulations to assess the ability of the jackknife residual method to detect induced errors and to compare these methods with the use of conditional growth percentiles and conventional cross-sectional methods.
RESULTS: Among the induced errors that resulted in a biologically implausible decrease in measurement between two consecutive values, the jackknife residual method identified the correct value in 84.3%-91.5% of these instances when applied to the sex- and age-standardized z-scores, with kappa values ranging from 0.685 to 0.795. Sensitivity and specificity of the jackknife method were higher than those of the conditional growth percentile method, but specificity was lower than for conventional cross-sectional methods.
CONCLUSIONS: Using jackknife residuals provides a simple method to identify biologically implausible values and outliers in longitudinal child growth data sets in which each child contributes at least 4 serial measurements. Crown
Copyright © 2018. Published by Elsevier Inc. All rights reserved.

Entities:  

Keywords:  Biologically implausible values; Jackknife residuals; Longitudinal growth data; Outliers

Mesh:

Year:  2018        PMID: 29398298      PMCID: PMC5840491          DOI: 10.1016/j.annepidem.2018.01.007

Source DB:  PubMed          Journal:  Ann Epidemiol        ISSN: 1047-2797            Impact factor:   3.797


Introduction

Child growth in stature and body dimensions is a continuous and dynamic process that is optimally studied through longitudinal follow-up. As such, many epidemiologic studies are designed to collect repeated anthropometric measures of each child across successive time points to describe growth patterns, identify predictors, and assess associations with later health outcomes [1], [2], [3]. Standardized procedures for the measurement and collection of anthropometry data have been established to maximize data quality [4], [5], and advanced analytical strategies are employed to accommodate the longitudinal nature of the data collected in such studies [6]; similarly, rigorous quality control procedures during data cleaning are warranted to identify outliers and implausible values. For cross-sectional studies in which each child only contributes one set of measurements, identification of outliers and implausible values is limited to the use of fixed cutoffs that were derived from a reference population, such as those established for the WHO Child Growth Standards [7] or from the observed distributional properties of the study population. However, with serial anthropometric measurements collected in the same individual, the growth trajectory of each individual provides another basis upon which to assess the biological plausibility of any given measurement for that individual. For example, a decrease in length or height between two successive time points is biologically implausible and thus indicates that at least one of the values was incorrectly measured or recorded. However, without additional information, it is often challenging to determine which of the two values induced the implausible trajectory. Many longitudinal studies use only conventional cross-sectional approaches to identify outliers; when the longitudinal nature of the data is taken into consideration, the methods used to identify these implausible values are often not described, or exclusions are made on a case-by-case basis [8]. Yang et al. [9] recently described the application of conditional growth percentiles for systematically identifying outliers and implausible values in longitudinal childhood anthropometric data. In their example, a hierarchical model of serial weight measurements as a function of age was constructed to estimate an individual's weight percentile at time t, while conditioning on the individual's weight percentile at time t−1. This approach implies that the plausibility of a given measurement is solely based on the preceding measurement (and therefore cannot be applied to an individual's first measurement), yet the expected value is contingent on the overall growth trajectory of the study population. We propose an alternative longitudinal approach to assess for outliers and implausible values using growth trajectories that are fit to each individual's anthropometric data, in which the identification of outliers and implausible values does not depend on the distribution of measurements in the whole study population. In this study, we aimed to use a longitudinal growth data set with artificially induced errors to demonstrate how jackknife residuals of linear models of z-scores or raw growth data as a function of age can be used to identify biologically implausible values and outliers. We further aimed to compare the sensitivity and specificity of jackknife residuals to alternative methods of data cleaning, including conditional growth percentiles and conventional cross-sectional approaches.

Methods

Data source

Anthropometric data collected from infants enrolled in the Maternal Vitamin D for Infant Growth (MDIG) trial were used. Data collection including anthropometry is ongoing; therefore, data available up to January 26, 2017 were used for this study. The MDIG trial methods have been previously described [10]. In brief, the MDIG trial is a randomized placebo-controlled dose-ranging trial of vitamin D supplementation during pregnancy and lactation in Dhaka, Bangladesh. Length, weight, and head circumference measurements are scheduled in tri-monthly intervals from birth to 24 months of age, plus an additional measurement taken at 2, 4, 6, or 8 weeks of age chosen through random assignment, with some variability between infants in actual timing of measurement collection. Age- and sex-standardized z-scores for length, weight, and head circumference measurements were generated using a combination of growth references: the Intergrowth-21st Newborn Size standards; the Intergrowth-21st International Postnatal Growth Standards for Preterm Infants; and the World Health Organization (WHO) Child Growth Standards (Supplementary Material Methods). Data cleaning of any natural occurring errors in the data set was not conducted to preclude biasing the results in favor of one method over another.

Simulation of implausible values and outliers

Simulated outliers and implausible values were randomly generated in the existing data set by deliberate introduction of data errors. In primary analyses, we randomly selected 5% of all encounters for error induction, in which values were randomly shifted upward or downward in relation to the original observed values. The magnitude of the errors (i.e. differences between original and shifted values) followed a normal distribution of mean = 0 with a standard deviation based off of the derived standard deviation of the raw anthropometric measurements in the WHO Multicentre Growth Reference Study that is the basis for the WHO child growth standards [7]. As such, the standard deviation of the errors varied by type of measurement, sex, and age of the infant. In sensitivity analyses, we used error rates of 10% and 15%, and standard deviations of 2- and 3-times the age- and sex-specific standard deviation for the corresponding anthropometric measure. A Monte Carlo approach was used, in which each scenario for the varying error rates and error generation methods was simulated 1000 times.

Identifying biologically implausible values

Jackknife residuals were applied to identify the incorrect measurement in instances where errors introduced into the anthropometric data set resulted in a biologically implausible decrease in a child's length, weight, or head circumference from one time point to the next. Jackknife (or externally studentized) residuals, r(, are generated from regression residuals, e, that are scaled by a function of the mean squared error with the ith observation deleted, MSE, and the leverage, h: Jackknife residuals are expected to follow a t distribution with (n–k–2) degrees of freedom, where n is the number of observations and k is the number of parameters in the fitted model, thereby giving the distribution a mean of 0 and standard deviation slightly greater than 1. Given that the k observation is an outlier, the jackknife residuals of other observations will shrink toward zero due to an overestimation of MSE, whereas the jackknife residual of the k observation will not. As such, jackknife residuals respond more strongly to the presence of a single outlier than does the standardized residual [11]. Any decrease in raw length or head circumference measurements was considered to be biologically implausible, whereas a decrease of greater than 15% in the raw measurements for weight was considered biologically implausible. This analysis addressed instances in which an error in the data set could be clearly identified, but the exact time point at which the error occurred was not as easily discerned. Therefore, these analyses were limited to children for whom there was at least one implausible decrease induced in anthropometric measures. All individuals with a biologically implausible decrease between any two measurement time points were first identified based on the criteria listed previously. Separately for each child, linear regression was used to fit a straight line through the individual's sex- and age-standardized z-score of the corresponding anthropometry measurement as a function of age:where “i” denotes the ith individual and “j” denotes the jth time point. For raw measurements, each individual's measurements were regressed on the square root of age (t½) to model a curvilinear relationship in which growth rates vary with age [12]: Each measurement of a given individual is assessed for adequate fit to the modeled trajectory for that individual using jackknife residuals (Supplementary Material Methods). We compared the absolute values of the jackknife residuals for the two adjacent time points that spanned the interval across which each biologically implausible decrease occurred. For each pair of values, the value with the largest absolute jackknife residual was labeled as the incorrect value, irrespective of the absolute magnitude of the jackknife residual. Kappa statistics were computed to assess the performance of these approaches in identifying the correct biologically implausible value (i.e., agreement between the classification by the jackknife residual method vs. the true classification of values as induced errors or unmodified measurements). Analyses were restricted to infants for whom four or more measurement time points were available, since the residual method cannot be applied to individuals with fewer than (k+2) measurements available, where k represents the number of parameters in the model (Supplementary Material Methods). In addition, biologically implausible decreases in the original MDIG data set were excluded from this analysis since the time point at which the error occurred is unknown.

Comparison of sensitivity and specificity across methods for identifying outliers

To assess the sensitivity and specificity of the jackknife residual method for detecting induced errors, analyses were no longer restricted to instances in which there were implausible decreases in size between subsequent time points, although many of the identified errors were expected to overlap with the biologically implausible values identified in the previous analysis. The modeling strategies were identical to those described previously. All measurements with a jackknife residual below −5 or above +5 were considered outliers. Sensitivity analyses were conducted in which cutoffs of ±3 and ±7 were used instead. The conditional growth percentile method outlined by Yang et al. was also applied. A random-effects model of the raw anthropometric measurement as a function of age was constructed for the whole study population, using a restricted cubic spline with five knots, where knot locations were based on Harrell's recommendations [13]. Conditional percentiles for each measurement were estimated, and measurements which were below −4 SD or above +4 SD were considered outliers, as implemented by Yang et al. [9]. As per the demonstrations provided by Yang et al., conditional growth percentiles were calculated only for raw measurements and not for their corresponding z-scores because one of its strengths is that the method can be applied even when external standards are not available. Finally, two traditional cross-sectional approaches to identify outliers and biologically implausible values were applied: (1) using the cutoffs for biologically implausible values that were derived from the WHO Child Growth Standards (<−6 SD or >6 SD for LAZ, <−6 SD or >5 SD for WAZ, and <−5 SD or >5 SD for HCAZ) [7] and (2) using cutoffs of 4 SD below or above the observed population average at the given time point. For each method, the sensitivity and specificity of the approach was assessed, whereby knowledge of which errors were artificially introduced into the data set was considered the “gold standard”. Sensitivities and specificities were calculated for the whole data set, regardless of the limitations of each method with respect to the detection of errors under certain conditions (e.g., residual method cannot be applied to individuals with fewer than 4 measurements; conditional growth percentile method cannot be applied to an individual's first measurement) and regardless of whether a given measurement was a suspected outlier in the original data set. As such, the sensitivity and specificity of any of these methods were not expected to be 100%. Sensitivity analyses also calculated sensitivity and specificity of these approaches after stratifying by time point or by number of measurement points per individual.

Results

A total of 8868 length measurements, 8883 weight measurements and 8888 head circumference measurements were available from 1211 infants in the MDIG trial (Table 1). Because data collection is still ongoing, there were fewer measurements available at later time points.
Table 1

Summary of anthropometric measurements available from the Maternal Vitamin D for Infant Growth (MDIG) trial∗

MeasureLengthWeightHead circumference
Number of measurements, by age
 Birth (0–48 h)828835835
 Birth (>48 h)252251252
 2 to 8 wk109510991100
 3 mo112511321132
 6 mo113111321133
 9 mo112611261126
 12 mo107210711071
 15 mo880880880
 18 mo610609610
 21 mo443442443
 24 mo306306306
Total number of measurements886888838888
Number of measurements per infant
 Mean ± SD7.3 ± 2.07.3 ± 2.07.3 ± 2.0
 Median (range)7 (1, 11)7 (1, 11)7 (1, 11)
Number of infants with
 ≥1 measurement121112111211
 ≥2 measurements, n (%)1196 (98.8)1196 (98.8)1196 (98.8)
 ≥4 measurements, n (%)1165 (96.2)1166 (96.3)1166 (96.3)
 ≥6 measurements, n (%)1005 (83.0)1004 (82.9)1006 (83.1)
 ≥8 measurements, n (%)557 (46.0)557 (46.0)557 (46.0)

Based on data available up to January 26, 2017.

Because of variability in the timing of measurements, these ages represent the scheduled visit time and the actual age of infants at their visit range from the midpoints of adjacent categories (e.g., timing of 6 month measurements range from 4.5 to 7.5 months of age).

Summary of anthropometric measurements available from the Maternal Vitamin D for Infant Growth (MDIG) trial∗ Based on data available up to January 26, 2017. Because of variability in the timing of measurements, these ages represent the scheduled visit time and the actual age of infants at their visit range from the midpoints of adjacent categories (e.g., timing of 6 month measurements range from 4.5 to 7.5 months of age). After inducing errors at a 5% rate, applying the jackknife residual method to either sex- and age-standardized z-scores or raw measurements performed comparably for both length and weight; the induced errors within each pair were correctly identified in an average of 88%–92% of pairs and kappa statistics ranged from 0.760 to 0.795 (Table 2). For head circumference, applying the jackknife residual method to the sex- and age-standardized z-scores performed better than when applied to the raw measurements (Table 2).
Table 2

Comparison of using jackknife residuals from linear versus nonlinear models of z-scores or raw growth data, respectively, as a function of age to identify biologically implausible decreases in length, weight, and head circumference measurements over 1000 simulations with an induced error rate of 5%

ModelNumber of pairs of adjacent values with a biologically implausible decrease, mean ± SDPercent of pairs in which the error was correctly identified (%), mean ± SDKappa statistic, mean ± SD
Length
 Model 162.5 ± 7.988.2 ± 4.00.760 ± 0.081
 Model 2§62.5 ± 7.989.6 ± 3.90.788 ± 0.080
Weight
 Model 126.0 ± 5.391.5 ± 5.20.795 ± 0.127
 Model 2§26.0 ± 5.391.2 ± 5.40.789 ± 0.129
Head circumference
 Model 1123.3 ± 10.884.3 ± 3.10.685 ± 0.062
 Model 2§123.3 ± 10.873.2 ± 3.90.462 ± 0.079

Any decrease in raw length or head circumference measurements were considered to be biologically implausible, whereas a decrease of greater than 15% in the raw measurements for weight were considered biologically implausible.

Agreement between the jackknife residual method and truth in the classification of induced plausible values.

Linear equation of sex- and age-standardized z-score as a function of age (Z = β + βt + ε).

Raw anthropometric measurement as a function of square root age (Y = β + βt + ε).

Comparison of using jackknife residuals from linear versus nonlinear models of z-scores or raw growth data, respectively, as a function of age to identify biologically implausible decreases in length, weight, and head circumference measurements over 1000 simulations with an induced error rate of 5% Any decrease in raw length or head circumference measurements were considered to be biologically implausible, whereas a decrease of greater than 15% in the raw measurements for weight were considered biologically implausible. Agreement between the jackknife residual method and truth in the classification of induced plausible values. Linear equation of sex- and age-standardized z-score as a function of age (Z = β + βt + ε). Raw anthropometric measurement as a function of square root age (Y = β + βt + ε). When using the jackknife residuals method to identify any induced error, sensitivities ranged from 10.7% to 14.1% and specificities ranged from 97.4% to 97.6% when applied to sex- and age-standardized z-scores for length, weight, and head circumference (Table 3). Sensitivity estimates were lower when the jackknife residual method was used for raw length, weight, or head circumference measurements, although specificities were similar to the models based on z-scores (Table 3). Alternative methods to identify induced errors in length, weight, and head circumference measurements had much lower sensitivities (Table 3). The conditional growth percentile method had specificities that were slightly lower than the jackknife residual approach, whereas the conventional cross-sectional methods were very insensitive (<1%) but had nearly perfect specificities (>99%) (Table 3).
Table 3

Comparison of alternative methods to identify induced errors in length, weight, and head circumference measurements over 1000 simulations with an induced error rate of 5%

MeasureJackknife residuals (model 1) with >5 or < −5 cutoffJackknife residuals (model 2) with >5 or < −5 cutoffConditional growth percentile with >4 or < −4 cutoffRecommended cutoffs from the WHO child growth standards§>4 or < −4 SD from population average
Length
 Sensitivity (%), mean ± SD11.9 ± 1.510.2 ± 1.40.2 ± 0.20.1 ± 0.10.4 ± 0.3
 Specificity (%), mean ± SD97.4 ± 0.197.4 ± 0.186.2 ± 0.1100.0 ± 0.099.9 ± 0.0
Weight
 Sensitivity (%), mean ± SD14.1 ± 1.69.7 ± 1.40.1 ± 0.20.9 ± 0.50.6 ± 0.3
 Specificity (%), mean ± SD97.4 ± 0.198.0 ± 0.186.3 ± 0.199.9 ± 0.099.9 ± 0.0
Head circumference
 Sensitivity (%), mean ± SD10.7 ± 1.44.1 ± 0.90.2 ± 0.20.4 ± 0.30.5 ± 0.3
 Specificity (%), mean ± SD97.6 ± 0.198.1 ± 0.186.3 ± 0.199.8 ± 0.099.8 ± 0.0

Linear equation of sex- and age-standardized z-score as a function of age (Z = β + βt + ε).

Raw anthropometric measurement as a function of square root of age (Y = β + βt+ ε).

Based on a random effects restricted cubic spline (with 5 knots) model.

For LAZ, <−6 SD or >6 SD; for WAZ, <−6 SD or >5 SD; and for HCAZ, <−5 SD or >5 SD [7].

Comparison of alternative methods to identify induced errors in length, weight, and head circumference measurements over 1000 simulations with an induced error rate of 5% Linear equation of sex- and age-standardized z-score as a function of age (Z = β + βt + ε). Raw anthropometric measurement as a function of square root of age (Y = β + βt+ ε). Based on a random effects restricted cubic spline (with 5 knots) model. For LAZ, <−6 SD or >6 SD; for WAZ, <−6 SD or >5 SD; and for HCAZ, <−5 SD or >5 SD [7]. As expected, sensitivity decreased and specificity increased with increasing absolute values of cutoffs used for the jackknife residual method (Fig. 1, Supplementary Material Table S1). When stratified by the number of measurements available per infant, both the residual method and the conditional growth percentile method had higher sensitivities and lower specificities among participants for whom there were fewer numbers of measurement encounters (Supplementary Material Table S2). When stratified by timing of the measurement, higher sensitivity and lower specificity were also generally observed when the residual method was applied to the first measurement taken for a given individual compared with midtrajectory visits or last visits, although the pattern of differences in sensitivities was not evident for raw measurements (Supplementary Material Table S3). Substantial differences in sensitivity and specificity were not observed between midtrajectory visits and last visits when the conditional growth percentile method was applied. Increasing the overall induced error rate to 10% or 15% resulted in a decrease in sensitivity for the residual method, but specificity was largely unchanged (Supplementary Material Table S4). In contrast, doubling or tripling the width of the distribution of the magnitudes of errors resulted in increased sensitivity, but specificity remained fairly constant (Supplementary Material Table S4). For the alternative methods, changes in the error rate had smaller effects on sensitivities and specificities (Supplementary Material Table S4).
Fig. 1

Sensitivity and specificity of the jackknife residual method for detection of outliers in child (A) raw length, (B) length-for-age z-score, (C) raw weight, (D) weight-for-age z-score, (E) raw head circumference, and (F) head circumference-for-age z-score data using cutoffs from ±3 to ± 8.

Table S1

Comparison of various cutoffs to identify induced errors in length, weight, and head circumference using the jackknife residual method over 1000 simulations with an induced error rate of 5%

MeasureJackknife residuals: Model 1
Jackknife residuals: Model 2
>3 or < −3 residual cutoff>5 or < −5 residual cutoff>7 or < −7 residual cutoff>3 or < −3 residual cutoff>5 or < −5 residual cutoff>7 or < −7 residual cutoff
Length
 Sensitivity (%), mean ± SD27.8 ± 2.011.9 ± 1.55.6 ± 1.125.8 ± 2.110.2 ± 1.44.5 ± 1.0
 Specificity (%), mean ± SD94.4 ± 0.197.4 ± 0.198.2 ± 0.194.1 ± 0.197.4 ± 0.198.2 ± 0.1
Weight
 Sensitivity (%), mean ± SD29.9 ± 2.114.1 ± 1.67.4 ± 1.224.2 ± 1.99.7 ± 1.44.4 ± 1.0
 Specificity (%), mean ± SD94.1 ± 0.197.4 ± 0.198.3 ± 0.195.7 ± 0.198.0 ± 0.198.6 ± 0.0
Head circumference
 Sensitivity (%), mean ± SD25.6 ± 1.910.7 ± 1.45.1 ± 1.014.6 ± 1.64.1 ± 0.91.6 ± 0.6
 Specificity (%), mean ± SD94.2 ± 0.197.6 ± 0.198.4 ± 0.195.5 ± 0.198.1 ± 0.198.6 ± 0.0

Linear equation of sex- and age-standardized z-score as a function of age (Z = β + βt + ε).

Raw anthropometric measurement as a function of square root of age (Y = β + βt + ε).

Table S2

Comparison of various methods to identify induced errors in length, weight, and head circumference measurements over 1000 simulations with an induced error rate of 5%, stratified by the number of measurements available

Number of observations per infantJackknife residuals (model 1) with >5 or < −5 cutoff
Jackknife residuals (model 2) with >5 or < −5 cutoff
Conditional growth percentile with >4 SD or < −4 SD cutoff
Sensitivity (%), mean ± SDSpecificity (%), mean ± SDSensitivity (%), mean ± SDSpecificity (%), mean ± SDSensitivity (%), mean ± SDSpecificity (%), mean ± SD
Length
 2–3 measurements per infant1.6 ± 8.754.2 ± 1.4
 4−5 measurements per infant19.0 ± 6.292.7 ± 0.417.1 ± 6.293.1 ± 0.40.1 ± 0.679.1 ± 0.3
 6−7 measurements per infant13.3 ± 2.798.5 ± 0.110.7 ± 2.598.1 ± 0.10.1 ± 0.384.6 ± 0.2
 ≥8 measurements per infant10.2 ± 1.899.1 ± 0.19.0 ± 1.799.2 ± 0.10.2 ± 0.388.9 ± 0.1
 Overall11.9 ± 1.597.4 ± 0.110.2 ± 1.497.4 ± 0.10.2 ± 0.286.2 ± 0.1
Weight
 2−3 measurements per infant0.1 ± 1.756.4 ± 1.4
 4−5 measurements per infant21.0 ± 6.394.5 ± 0.415.7 ± 6.195.1 ± 0.40.1 ± 0.579.3 ± 0.3
 6−7 measurements per infant15.6 ± 3.097.9 ± 0.110.3 ± 2.598.7 ± 0.10.1 ± 0.384.7 ± 0.2
 ≥8 measurements per infant12.4 ± 2.199.2 ± 0.18.6 ± 1.899.8 ± 0.00.1 ± 0.289.0 ± 0.1
 Overall14.1 ± 1.697.4 ± 0.19.7 ± 1.498.0 ± 0.10.1 ± 0.286.3 ± 0.1
Head circumference
 2−3 measurements per infant0.2 ± 2.857.0 ± 1.4
 4−5 measurements per infant20.0 ± 6.595.1 ± 0.414.2 ± 5.394.9 ± 0.40.2 ± 0.779.3 ± 0.3
 6−7 measurements per infant12.4 ± 2.798.4 ± 0.15.2 ± 1.899.2 ± 0.10.3 ± 0.484.6 ± 0.2
 ≥8 measurements per infant8.4 ± 1.799.1 ± 0.12.0 ± 0.999.6 ± 0.10.3 ± 0.388.9 ± 0.1
 Overall10.7 ± 1.497.6 ± 0.14.1 ± 0.998.1 ± 0.10.3 ± 0.286.3 ± 0.1

Linear equation of sex- and age-standardized z-score as a function of age (Z = β + βt + ε).

Raw anthropometric measurement as a function of square root of age (Y = β + βt + ε).

Based on a random effects restricted cubic spline (with 5 knots) model.

Table S3

Comparison of various methods to identify induced errors in length, weight, and head circumference measurements among infants with at least 2 measurements over 1000 simulations with an induced error rate of 5%, stratified by visit at which error was induced

Timing of visitJackknife residuals (model 1) with >5 or < −5 cutoff
Jackknife residuals (model 2) with >5 or < −5 cutoff
Conditional growth percentile with >4 SD or < −4 SD cutoff
Sensitivity (%), mean ± SDSpecificity (%), mean ± SDSensitivity (%), mean ± SDSpecificity (%), mean ± SDSensitivity (%), mean ± SDSpecificity (%), mean ± SD
Length
 First visit17.6 ± 4.891.7 ± 0.411.2 ± 4.190.9 ± 0.4
 Middle visit11.5 ± 1.898.9 ± 0.19.8 ± 1.799.0 ± 0.10.2 ± 0.399.8 ± 0.0
 Last visit8.1 ± 3.496.4 ± 0.211.7 ± 4.296.5 ± 0.20.2 ± 0.599.8 ± 0.0
 Overall11.9 ± 1.597.4 ± 0.110.2 ± 1.497.4 ± 0.10.2 ± 0.286.2 ± 0.1
Weight
 First visit21.9 ± 5.590.0 ± 0.45.9 ± 3.093.5 ± 0.3
 Middle visit13.3 ± 1.999.2 ± 0.19.5 ± 1.699.4 ± 0.00.2 ± 0.299.9 ± 0.0
 Last visit11.1 ± 4.196.7 ± 0.214.8 ± 4.696.6 ± 0.20.0 ± 0.2100.0 ± 0.0
 Overall14.1 ± 1.697.4 ± 0.19.7 ± 1.498.0 ± 0.10.1 ± 0.286.3 ± 0.1
Head circumference
 First visit17.5 ± 5.191.6 ± 0.46.2 ± 3.294.6 ± 0.3
 Middle visit10.2 ± 1.699.0 ± 0.13.0 ± 0.999.5 ± 0.00.4 ± 0.399.8 ± 0.0
 Last visit6.3 ± 3.197.2 ± 0.28.1 ± 3.695.5 ± 0.30.1 ± 0.499.9 ± 0.0
 Overall10.7 ± 1.497.6 ± 0.14.1 ± 0.998.1 ± 0.10.3 ± 0.286.3 ± 0.1

Linear equation of sex- and age-standardized z-score as a function of age (Z = β + βt + ε).

Raw anthropometric measurement as a function of square root of age (Y = β + βt + ε).

Based on a random effects restricted cubic spline (with 5 knots) model.

Table S4

Comparison of various methods to identify induced errors in length, weight, and head circumference measurements over 1000 simulations with induced error rates of 5%, 10%, and 15%

Error rateJackknife residuals (model 1) with >5 or < −5 cutoff
Jackknife residuals (model 2) with >5 or < −5 cutoff
Conditional growth percentile with >4 SD or < −4 SD cutoff
Recommended cutoffs from the WHO child growth standards
>4 or < −4 SD from population average
Sensitivity (%), mean ± SDSpecificity (%), mean ± SDSensitivity (%), mean ± SDSpecificity (%), mean ± SDSensitivity (%), mean ± SDSpecificity (%), mean ± SDSensitivity (%), mean ± SDSpecificity (%), mean ± SDSensitivity (%), mean ± SDSpecificity (%), mean ± SD
Length
 5% error rate11.9 ± 1.597.4 ± 0.110.2 ± 1.497.4 ± 0.10.2 ± 0.286.2 ± 0.10.1 ± 0.1100.0 ± 0.00.4 ± 0.399.9 ± 0.0
 10% error rate2.3 ± 0.796.7 ± 0.12.2 ± 0.796.8 ± 0.10.1 ± 0.286.2 ± 0.10.1 ± 0.1100.0 ± 0.00.2 ± 0.299.9 ± 0.0
 15% error rate2.4 ± 0.796.7 ± 0.12.3 ± 0.796.8 ± 0.10.1 ± 0.186.2 ± 0.10.1 ± 0.199.9 ± 0.00.1 ± 0.299.9 ± 0.0
Weight
 5% error rate14.1 ± 1.697.4 ± 0.19.7 ± 1.498.0 ± 0.10.1 ± 0.286.3 ± 0.10.2 ± 0.299.9 ± 0.00.6 ± 0.399.9 ± 0.0
 10% error rate2.5 ± 0.896.6 ± 0.11.6 ± 0.697.4 ± 0.10.1 ± 0.186.3 ± 0.10.1 ± 0.199.9 ± 0.00.1 ± 0.299.8 ± 0.0
 15% error rate2.6 ± 0.896.5 ± 0.11.8 ± 0.797.2 ± 0.10.1 ± 0.186.3 ± 0.10.1 ± 0.199.9 ± 0.00.1 ± 0.299.9 ± 0.0
Head circumference
 5% error rate10.7 ± 1.497.6 ± 0.14.1 ± 0.998.1 ± 0.10.3 ± 0.286.3 ± 0.10.4 ± 0.399.8 ± 0.00.5 ± 0.399.8 ± 0.0
 10% error rate2.1 ± 0.797.0 ± 0.11.2 ± 0.597.9 ± 0.10.1 ± 0.286.3 ± 0.10.2 ± 0.299.8 ± 0.00.2 ± 0.299.8 ± 0.0
 15% error rate2.2 ± 0.796.9 ± 0.11.3 ± 0.597.8 ± 0.10.1 ± 0.186.3 ± 0.10.2 ± 0.299.8 ± 0.00.2 ± 0.299.8 ± 0.0

Linear equation of sex- and age-standardized z-score as a function of age (Z = β + βt + ε).

Raw anthropometric measurement as a function of square root of age (Y = β + βt + ε).

Based on a random effects restricted cubic spline (with 5 knots) model.

Sensitivity and specificity of the jackknife residual method for detection of outliers in child (A) raw length, (B) length-for-age z-score, (C) raw weight, (D) weight-for-age z-score, (E) raw head circumference, and (F) head circumference-for-age z-score data using cutoffs from ±3 to ± 8. As a case study, the jackknife residual method was applied to the original MDIG data set without induced errors. For length, 21 pairs of measurements with biologically implausible decreases were identified from the data set. An example of an individual with such a pair of measurements is presented in Figure 2, where applying the jackknife residual method to LAZ and raw measurements were consistent in identifying the incorrect measurement time point, as represented by the red marker.
Fig. 2

Example of a participant for whom an error was identified within a pair of values in which there was a biologically implausible decrease in length between two adjacent encounters (shown in hollow circles). The error was similarly identified when the jackknife residual method was applied to (A) length-for-age z-scores (LAZ) or (B) raw length measurements. Each measurement is labeled with its corresponding jackknife residual values.

Example of a participant for whom an error was identified within a pair of values in which there was a biologically implausible decrease in length between two adjacent encounters (shown in hollow circles). The error was similarly identified when the jackknife residual method was applied to (A) length-for-age z-scores (LAZ) or (B) raw length measurements. Each measurement is labeled with its corresponding jackknife residual values. To reduce the probability of labeling true measurements as errors/outliers by this process, a cutoff of ±5 was used in general, with a more extreme cutoff of ±6 applied to individuals with only 4 to 5 measurements, or if it was an individual's first measurement. As such, of the 8868 available length measurements, the residual method identified 133 (1.50%) outliers when applied to LAZ, and 139 (1.57%) outliers when applied to the raw measurements, with an overlap of 43 (0.48%) measurements that were identified by both methods. A total of 85 (0.96%) measurements could not be evaluated using this method, as these individuals had fewer than 4 separate measurements taken. An example of an outlier identified by both methods is presented in Figure 3.
Fig. 3

Example of a participant for whom an outlier (shown in hollow circles) was identified when the jackknife residual method is applied to (A) length-for-age z-score (LAZ) or (B) raw length measurement. Each measurement is labeled with its corresponding jackknife residual values.

Example of a participant for whom an outlier (shown in hollow circles) was identified when the jackknife residual method is applied to (A) length-for-age z-score (LAZ) or (B) raw length measurement. Each measurement is labeled with its corresponding jackknife residual values.

Discussion

We demonstrated the use of a novel, simple, and objective approach to identify outliers and biologically implausible values in longitudinal growth data. Although the regression models chosen may provide a crude fit to the data, relative to more sophisticated modeling strategies, this was done intentionally to preclude overfitting the data and biasing the residuals toward the null. The simplicity of the models chosen to assess the jackknife residuals of given measurements will allow this method to be easily applied, especially for large data sets in which manual inspection may not be feasible. In assessing the application of the jackknife residual method to identify the incorrect measurement within pairs of adjacent values with a biologically implausible decrease, incorrectly labeling an unmodified value as an error or failing to correctly identify an induced error was largely due to instances in which errors were introduced in both measurements within the pair or if the participant had many errors introduced at other measurement time points. As such, the use of the jackknife residual method to identify the incorrect measurement in instances of a biologically implausible decrease is predicated on the assumption that one measurement is correct whereas the other is not—this method is unable to identify instances in which errors occurred at both time points. In addition, increased number and magnitude of errors per individual will distort growth trajectories fit to each individual's measurements and thereby reduce the utility of this approach. Errors in head circumference measurements were more difficult to identify, likely due to the smaller absolute change in head circumference that occurs during development, relative to length and weight. Errors which were large enough to cause a biologically implausible decrease in measurements only caused minor changes in the overall head circumference trajectory, making it difficult for the jackknife residual method, as well as other methods, to discern between correct and incorrect measurements. The accuracy of the jackknife residual method when applied to age- and sex-standardized z-scores versus raw measurements was comparable, except for head circumference, where using a ±5 cutoff resulted in much higher sensitivity and slightly lower specificity when using HCAZ rather than the raw measurements. Although observed sensitivities may appear low, this was expected since the magnitude of many induced errors were quite small and were very unlikely to be detected using any available method. In addition, since precleaning of the data set was not conducted, naturally occurring errors could be identified as errors and would therefore reduce estimated specificities. However, this would affect not just the jackknife residual method but all methods that were assessed, although likely to different extents. For example, conventional cross-sectional methods have overall lower sensitivity, and therefore are less likely to detect naturally occurring errors. As such, specificity of these methods is less likely to be reduced by naturally occurring errors than for methods which have higher sensitivity such as the jackknife residual method. The effect on estimates of sensitivity are more difficult to predict, but given that the true prevalence of errors in the data set is greater than how much was induced, our estimates of sensitivity may also be an underestimate of the true sensitivity because our sensitivity analyses have shown that the sensitivity of these methods decreased with increasing error rate (Table S4). We also showed that the timing of the measurement as well as the number of measurements available for a given individual has implications for the sensitivity and specificity of the jackknife residual method, but can be accounted for by combining a variety of different cutoffs when flagging potential outliers. Both sensitivity and specificity were lower when applying the conditional growth percentile method to identify the induced errors compared with the jackknife residual method. The substantially reduced specificity can be attributed to the calculation of specificity in the whole data set, rather than in the subset of measurements to which the method can be applied. For example, the jackknife residual method could not be applied to measurements of individuals who have fewer than four total measurements, which comprised approximately 1% of all measurements in the data set, effectively reducing the observed overall sensitivity of this method by 0.5% and specificity by 0.95% (assuming errors were induced in 5% of those observations, as expected since they were randomly generated). In contrast, the conditional growth percentile method could not be applied to the first measurement time point of each individual, which comprised approximately 13.6% of all measurements in the data set, thus affecting the observed overall sensitivity and specificity of the method to a much greater extent and thereby highlighting the most consequential limitation of this method. The reduced sensitivity of the conditional growth percentile method may be attributed to the use of the 4 SD threshold as recommended by Yang et al [9] to prioritize specificity. Similarly, careful consideration of the sensitivity-specificity trade-off is required for the jackknife residual method, and further inquiry and investigation into outliers flagged by this method should be conducted. Unsurprisingly, conventional cross-sectional methods performed quite poorly in discriminating between unmodified measurements and induced errors. Although they had near-perfect specificity, a very limited number of the induced errors were detected by these methods, resulting in extremely poor sensitivity. Although lowered cutoffs were assessed to try to increase the sensitivity of these methods (data not shown), substantial increases in sensitivity were not observed until cutoffs were lowered to values that would have eliminated measurements which were likely to be plausible. Although we demonstrated that jackknife residuals are a practical approach to identify both biologically implausible values and outliers, assumptions were made regarding the functional forms used to reflect the shape of the growth trajectory. Although we used linear equations of z-scores as a function of age or raw anthropometric measurement as a function of square root of age, the general shape of the growth trajectory can be described using other functional forms. Our models were selected on the basis of being generalized forms of the expected growth trajectories of infants from birth to 2 years of age. For example, the linear equation of z-scores as a function of age represents the expected trajectory of a child maintaining a z-score at 0, whereas the square root equation reflects the rapid but decelerating rate of change in raw size measurements that occurs in the postnatal period. Different functional forms may be needed to account for alternate patterns of growth that may be expected with other types or timing of measurements. However, caution is warranted against overfitting the data, as not only will this increase the number of measurements needed per individual, where (k + 2) measurements are needed for a model with k parameters, residuals are biased closer toward the null when the data are overfit. In addition, our simulations assumed an arbitrary error rate and distribution for the magnitude of the errors, which may be impossible to characterize in real longitudinal growth data. However, our sensitivity analyses indicated that although increasing the error rate or magnitude of these errors has implications for the sensitivity of the jackknife residual method, its specificity remains relatively constant. While no single method will be able to identify all errors in a longitudinal growth data set, a combination of approaches, such as applying the jackknife residual method using various regression equations as well as the conditional growth percentile method, may provide the best sensitivity without erroneously identifying real measurements as errors. A multitude of factors, including type of measurement, number of measurements available per individual, and the timing between measurements, should be considered in deciding on a data cleaning strategy. Ultimately, the acceptable balance of sensitivity and specificity—thereby determining the parameters used to implement the method (e.g., cutoff values, use of variable cutoff values for different time points)—is determined by individual investigators based on the study design and overall sample size. Flagged values should undergo manual review and adjudication before being excluded from further analyses; therefore, the choice of cutoffs may be determined by available resources to undertake a manual review process. In conclusion, the use of jackknife residuals provides a simple and flexible method to identify biologically implausible values and outliers in longitudinal growth data in studies in which most children have at least 4 serial measurements. The detection and correction (or exclusion, if necessary) of measurement errors can increase precision in analyses to identify determinants of growth trajectories or the effects of child growth on later health outcomes.
Table S5

Comparison of various methods to identify induced errors in length, weight, and head circumference measurements over 1000 simulations in which magnitude of induced errors have standard deviations of 1, 2, and 3

Jackknife residuals (model 1) with >5 or < −5 cutoff
Jackknife residuals (model 2) with >5 or < −5 cutoff
Conditional growth percentile with >4 SD or < −4 SD cutoff
Recommended cutoffs from the WHO child growth standards
>4 or < −4 SD from population average
Sensitivity (%), mean ± SDSpecificity (%), mean ± SDSensitivity (%), mean ± SDSpecificity (%), mean ± SDSensitivity (%), mean ± SDSpecificity (%), mean ± SDSensitivity (%), mean ± SDSpecificity (%), mean ± SDSensitivity (%), mean ± SDSpecificity (%), mean ± SD
Length
 SD 111.9 ± 1.597.4 ± 0.110.2 ± 1.497.4 ± 0.10.2 ± 0.286.2 ± 0.10.1 ± 0.1100.0 ± 0.00.4 ± 0.399.9 ± 0.0
 SD 229.3 ± 2.297.5 ± 0.126.9 ± 2.197.5 ± 0.12.8 ± 0.786.3 ± 0.11.4 ± 0.6100.0 ± 0.03.5 ± 0.899.9 ± 0.0
 SD 341.0 ± 2.397.5 ± 0.138.9 ± 2.397.5 ± 0.19.1 ± 1.286.3 ± 0.16.6 ± 1.2100.0 ± 0.010.0 ± 1.299.9 ± 0.0
Weight
 SD 114.1 ± 1.697.4 ± 0.19.7 ± 1.498.0 ± 0.10.1 ± 0.286.3 ± 0.10.2 ± 0.299.9 ± 0.00.6 ± 0.399.9 ± 0.0
 SD 230.7 ± 2.297.5 ± 0.125.2 ± 2.198.1 ± 0.13.1 ± 0.886.3 ± 0.13.8 ± 0.999.9 ± 0.05.0 ± 0.999.9 ± 0.0
 SD 341.9 ± 2.397.6 ± 0.136.5 ± 2.298.1 ± 0.19.7 ± 1.186.3 ± 0.111.2 ± 1.599.9 ± 0.012.5 ± 1.299.9 ± 0.0
Head circumference
 SD 110.7 ± 1.497.6 ± 0.14.1 ± 0.998.1 ± 0.10.3 ± 0.286.3 ± 0.10.4 ± 0.399.8 ± 0.00.5 ± 0.399.8 ± 0.0
 SD 226.9 ± 2.197.7 ± 0.114.3 ± 1.698.1 ± 0.14.2 ± 0.986.3 ± 0.14.1 ± 1.099.8 ± 0.04.3 ± 0.999.8 ± 0.0
 SD 338.7 ± 2.397.7 ± 0.125.4 ± 2.198.2 ± 0.110.7 ± 1.286.3 ± 0.112.9 ± 1.699.8 ± 0.011.3 ± 1.299.8 ± 0.0

Linear equation of sex- and age-standardized z-score as a function of age (Z = β + βt + ε).

Raw anthropometric measurement as a function of square root of age (Y = β + βt + ε).

Based on a random effects restricted cubic spline (with 5 knots) model.

  11 in total

1.  Measurement and standardization protocols for anthropometry used in the construction of a new international growth reference.

Authors:  Mercedes de Onis; Adelheid W Onyango; Jan Van den Broeck; Wm Cameron Chumlea; Reynaldo Martorell
Journal:  Food Nutr Bull       Date:  2004-03       Impact factor: 2.069

2.  A critical evaluation of statistical approaches to examining the role of growth trajectories in the developmental origins of health and disease.

Authors:  Yu-Kang Tu; Kate Tilling; Jonathan A C Sterne; Mark S Gilthorpe
Journal:  Int J Epidemiol       Date:  2013-09-14       Impact factor: 7.196

3.  WHO Child Growth Standards based on length/height, weight and age.

Authors: 
Journal:  Acta Paediatr Suppl       Date:  2006-04

4.  Comparing Methods for Identifying Biologically Implausible Values in Height, Weight, and Body Mass Index Among Youth.

Authors:  Hannah G Lawman; Cynthia L Ogden; Sandra Hassink; Giridhar Mallya; Stephanie Vander Veur; Gary D Foster
Journal:  Am J Epidemiol       Date:  2015-07-15       Impact factor: 4.897

5.  Trajectories of growth among children who have coronary events as adults.

Authors:  David J P Barker; Clive Osmond; Tom J Forsén; Eero Kajantie; Johan G Eriksson
Journal:  N Engl J Med       Date:  2005-10-27       Impact factor: 91.245

6.  Identifying outliers and implausible values in growth trajectory data.

Authors:  Seungmi Yang; Jennifer A Hutcheon
Journal:  Ann Epidemiol       Date:  2015-10-19       Impact factor: 3.797

7.  Factors predicting ante- and postnatal growth.

Authors:  Peter C Hindmarsh; Michael P P Geary; Charles H Rodeck; John C P Kingdom; Tim J Cole
Journal:  Pediatr Res       Date:  2008-01       Impact factor: 3.756

8.  Anthropometric standardisation and quality control protocols for the construction of new, international, fetal and newborn growth standards: the INTERGROWTH-21st Project.

Authors:  L Cheikh Ismail; H E Knight; E O Ohuma; L Hoch; W C Chumlea
Journal:  BJOG       Date:  2013-07-11       Impact factor: 6.531

9.  Modelling childhood growth using fractional polynomials and linear splines.

Authors:  Kate Tilling; Corrie Macdonald-Wallis; Debbie A Lawlor; Rachael A Hughes; Laura D Howe
Journal:  Ann Nutr Metab       Date:  2014-11-18       Impact factor: 3.374

10.  Critical windows for nutritional interventions against stunting.

Authors:  Andrew M Prentice; Kate A Ward; Gail R Goldberg; Landing M Jarjou; Sophie E Moore; Anthony J Fulford; Ann Prentice
Journal:  Am J Clin Nutr       Date:  2013-04-03       Impact factor: 7.045

View more
  12 in total

1.  Measurement error and misclassification in electronic medical records: methods to mitigate bias.

Authors:  Jessica C Young; Mitchell M Conover; Michele Jonsson Funk
Journal:  Curr Epidemiol Rep       Date:  2018-09-10

2.  Do Early Infant Feeding Practices and Modifiable Household Behaviors Contribute to Age-Specific Interindividual Variations in Infant Linear Growth? Evidence from a Birth Cohort in Dhaka, Bangladesh.

Authors:  Sarah L Silverberg; Huma Qamar; Farhana K Keya; Shaila S Shanta; M Munirul Islam; Tahmeed Ahmed; Joy Shi; Davidson H Hamer; Stanley Zlotkin; Abdullah Al Mahmud; Daniel E Roth
Journal:  Curr Dev Nutr       Date:  2021-04-30

3.  Prenatal dietary diversity may influence underweight in infants in a Ugandan birth-cohort.

Authors:  Isabel Madzorera; Shibani Ghosh; Molin Wang; Wafaie Fawzi; Sheila Isanaka; Ellen Hertzmark; Grace Namirembe; Bernard Bashaasha; Edgar Agaba; Florence Turyashemererwa; Patrick Webb; Christopher Duggan
Journal:  Matern Child Nutr       Date:  2021-02-17       Impact factor: 3.092

4.  Comparison of methods for interpolating gestational weight gain between clinical visits in twin and singleton pregnancies.

Authors:  Michelle C Dimitris; Jennifer A Hutcheon; Robert W Platt; Katherine P Himes; Lisa M Bodnar; Jay S Kaufman
Journal:  Ann Epidemiol       Date:  2021-04-22       Impact factor: 6.996

5.  Linear Growth Spurts are Preceded by Higher Weight Gain Velocity and Followed by Weight Slowdowns Among Rural Children in Burkina Faso: A Longitudinal Study.

Authors:  Ilana R Cliffer; Nandita Perumal; William A Masters; Elena N Naumova; Laetitia Nikiema Ouedraogo; Franck Garanet; Beatrice L Rogers
Journal:  J Nutr       Date:  2022-08-09       Impact factor: 4.687

6.  Optimisation of children z-score calculation based on new statistical techniques.

Authors:  Antonio Martinez-Millana; Jessie M Hulst; Mieke Boon; Peter Witters; Carlos Fernandez-Llatas; Ines Asseiceira; Joaquin Calvo-Lerma; Ignacio Basagoiti; Vicente Traver; Kris De Boeck; Carmen Ribes-Koninckx
Journal:  PLoS One       Date:  2018-12-20       Impact factor: 3.240

7.  Vitamin D Supplementation in Pregnancy and Lactation and Infant Growth.

Authors:  Daniel E Roth; Shaun K Morris; Stanley Zlotkin; Alison D Gernand; Tahmeed Ahmed; Shaila S Shanta; Eszter Papp; Jill Korsiak; Joy Shi; M Munirul Islam; Ishrat Jahan; Farhana K Keya; Andrew R Willan; Rosanna Weksberg; Minhazul Mohsin; Qazi S Rahman; Prakesh S Shah; Kellie E Murphy; Jennifer Stimec; Lisa G Pell; Huma Qamar; Abdullah Al Mahmud
Journal:  N Engl J Med       Date:  2018-08-09       Impact factor: 91.245

8.  Is it time to stop sweeping data cleaning under the carpet? A novel algorithm for outlier management in growth data.

Authors:  Charlotte S C Woolley; Ian G Handel; B Mark Bronsvoort; Jeffrey J Schoenebeck; Dylan N Clements
Journal:  PLoS One       Date:  2020-01-24       Impact factor: 3.240

9.  Agreement between self-reported pre-pregnancy weight and measured first-trimester weight in Brazilian women.

Authors:  Thaís Rangel Bousquet Carrilho; Kathleen M Rasmussen; Dayana Rodrigues Farias; Nathalia Cristina Freitas Costa; Mônica Araújo Batalha; Michael E Reichenheim; Eric O Ohuma; Jennifer A Hutcheon; Gilberto Kac
Journal:  BMC Pregnancy Childbirth       Date:  2020-11-26       Impact factor: 3.007

10.  Brazilian Maternal and Child Nutrition Consortium: establishment, data harmonization and basic characteristics.

Authors:  Thaís Rangel Bousquet Carrilho; Dayana Rodrigues Farias; Mônica Araújo Batalha; Nathalia Cristina Freitas Costa; Kathleen M Rasmussen; Michael E Reichenheim; Eric O Ohuma; Jennifer A Hutcheon; Gilberto Kac
Journal:  Sci Rep       Date:  2020-09-10       Impact factor: 4.379

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.