Evie van der Spoel, Jungyeon Choi, Ferdinand Roelfsema, Saskia le Cessie, Diana van Heemst, Olaf M Dekkers.
Abstract
Measurement errors commonly occur in 24-h hormonal data and may affect the outcomes of such studies. Measurement errors often appear as outliers in such data sets; however, no well-established method is available for their automatic detection. In this study, we aimed to compare performances of different methods for outlier detection in hormonal serial data. Hormones (glucose, insulin, thyroid-stimulating hormone, cortisol, and growth hormone) were measured in blood sampled every 10 min for 24 h in 38 participants of the Leiden Longevity Study. Four methods for detecting outliers were compared: (1) eyeballing, (2) Tukey's fences, (3) stepwise approach, and (4) the expectation-maximization (EM) algorithm. Eyeballing detects outliers based on experts' knowledge, and the stepwise approach incorporates physiological knowledge with a statistical algorithm. Tukey's fences and the EM algorithm are data-driven methods, using interquartile range and a mathematical algorithm to identify the underlying distribution, respectively. The performance of the methods was evaluated based on the number of outliers detected and the change in statistical outcomes after removing detected outliers. Eyeballing resulted in the lowest number of outliers detected (1.0% of all data points), followed by Tukey's fences (2.3%), the stepwise approach (2.7%), and the EM algorithm (11.0%). In all methods, the mean hormone levels did not change materially after removing outliers. However, their minima were affected by outlier removal. Although removing outliers affected the correlation between glucose and insulin on the individual level, when averaged over all participants, none of the 4 methods influenced the correlation. Based on our results, the EM algorithm is not recommended given the high number of outliers detected, even where data points are physiologically plausible. 
Since Tukey's fences is not suitable for all types of data and eyeballing is time-consuming, we recommend the stepwise approach for outlier detection, which combines physiological knowledge and an automated process.
Keywords: automatic outlier detection; hormones; measurement error; outlier; time series
Year: 2019 PMID: 31187683 PMCID: PMC6637814 DOI: 10.1177/0748730419850917
Source DB: PubMed Journal: J Biol Rhythms ISSN: 0748-7304 Impact factor: 3.182
Figure 1. (a) Eyeballing detects outliers without fitting smooth curves. By visual inspection, individual experts detect outliers, taking into account that some hormones were measured in the same sample. Afterward, a consensus meeting is held, and the experts discuss all data points with conflicting detection results. (b) Tukey’s fences starts by fitting a moving average curve to per-person per-hormone data and taking the residuals of all data points. Then the interquartile range (IQR = Q3 − Q1) of the residuals is calculated. Data points lying outside the range between Q1 − 3IQR and Q3 + 3IQR are detected as outliers. (c) The stepwise approach fits the moving average curve to per-person per-hormone data, and standardized residuals of all data points are calculated (step 1). Data points lying outside the range between −3 and 4 standard deviations are detected as outliers (step 2). Then, the residuals of the 5 hormones measured at the same time point are summed. When the sum of the residuals is smaller than −8, the data points are detected as outliers (step 3). Afterward, steps 1 and 3 are repeated (step 4). (d) The expectation-maximization (EM) algorithm first fits a smoothing curve to per-person per-hormone data, and the residuals are calculated. Then, all the residuals of a hormone from all 38 participants are entered into the EM algorithm. The algorithm then identifies 2 distinguishable distributions and yields the probability that each data point is an outlier.
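The residual-based rules in panels (b) and (c) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the moving-average window width, the reflected padding at the series edges, and all function names are assumptions, since the caption does not specify them. It also assumes that the residuals summed in step 3 are the standardized ones, and it omits the repetition in step 4.

```python
import numpy as np

def moving_average(y, window=5):
    """Centered moving average; `window` (an assumed value) must be odd."""
    y = np.asarray(y, dtype=float)
    half = window // 2
    padded = np.pad(y, half, mode="reflect")  # assumed edge handling
    return np.convolve(padded, np.ones(window) / window, mode="valid")

def tukey_outliers(y, window=5, k=3.0):
    """Panel (b): flag residuals outside [Q1 - k*IQR, Q3 + k*IQR]."""
    resid = np.asarray(y, dtype=float) - moving_average(y, window)
    q1, q3 = np.percentile(resid, [25, 75])
    iqr = q3 - q1
    return (resid < q1 - k * iqr) | (resid > q3 + k * iqr)

def standardized_residuals(y, window=5):
    """Panel (c), step 1: residuals scaled to zero mean and unit SD."""
    resid = np.asarray(y, dtype=float) - moving_average(y, window)
    return (resid - resid.mean()) / resid.std(ddof=1)

def stepwise_outliers(hormones, window=5, lower=-3.0, upper=4.0, sum_cut=-8.0):
    """Panel (c), steps 2 and 3 for one participant.

    `hormones` has shape (n_hormones, n_time_points).
    """
    z = np.array([standardized_residuals(h, window) for h in hormones])
    within_hormone = (z < lower) | (z > upper)   # step 2
    within_sample = z.sum(axis=0) < sum_cut      # step 3, one flag per time point
    return within_hormone | within_sample[None, :]
```

Because a moving average is pulled toward an extreme value, the neighbours of a spike receive residuals of the opposite sign, so a fence-based rule may flag them as well.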
Percentage of time points with at least 1 detected outlier among the hormones measured and the percentage of total data points detected as outliers among the same set of hormones.[a]
| Method | Time Points Detected to Contain an Outlier (%) | Total Data Points Detected to Be Outliers (%) |
|---|---|---|
| Eyeballing | 1.7 (2.1) | 1.0 (1.4) |
| Tukey’s fences | 9.3 (5.6) | 2.3 (1.4) |
| Stepwise approach | 5.1 (1.5) | 2.7 (1.5) |
| EM algorithm[b] | 40.3 (7.7) | 11.0 (2.8) |

[a] Mean and standard deviation in the 38 participants are given.
[b] For the EM algorithm results, the measurement points at which the probability to be an outlier was >0.9 were counted.
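To make the 0.9 probability cutoff concrete, the sketch below fits a two-component mixture to pooled residuals with a hand-written EM loop and returns, for each residual, the posterior probability of belonging to the wider component. The source only states that the algorithm identifies 2 distinguishable distributions; the Gaussian form of both components, the starting values, and the function names are assumptions made for illustration.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Normal density, broadcast over the two components."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def em_outlier_probability(resid, n_iter=200, tol=1e-8):
    """Posterior probability that each residual comes from the wide component."""
    x = np.asarray(resid, dtype=float)[:, None]        # shape (n, 1)
    s = x.std()
    mu = np.full(2, np.median(x))                      # assumed starting values
    sigma = np.array([0.5 * s, 2.0 * s])
    weight = np.array([0.9, 0.1])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each residual
        dens = weight * gaussian_pdf(x, mu, sigma)     # shape (n, 2)
        total = dens.sum(axis=1, keepdims=True)
        resp = dens / total
        ll = np.log(total).sum()
        if ll - prev_ll < tol:                         # log-likelihood has converged
            break
        prev_ll = ll
        # M-step: update mixing weights, means, and standard deviations
        nk = resp.sum(axis=0)
        weight = nk / len(x)
        mu = (resp * x).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x - mu) ** 2).sum(axis=0) / nk)
    return resp[:, np.argmax(sigma)]                   # wide component = outliers
```

A data point would then be counted as an outlier when `em_outlier_probability(resid) > 0.9`, matching the cutoff in footnote b.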
Figure 2. Mean number of data points detected per hormone per method across all participants.
Figure 3. Venn diagrams visualizing the number of measurement errors detected by each method (eyeballing, stepwise approach, and Tukey’s fences) and their overlap counted in total time points (a) and in all data points (b). The overlap with the expectation-maximization algorithm is not presented here for the reasons mentioned in the Results section.
Figure 4. (a) The results of outlier detection by eyeballing in glucose, insulin, thyroid-stimulating hormone (TSH), cortisol, and growth hormone of participant 19. Hollow data points indicate detected outliers. (b) The results of outlier detection by Tukey’s fences. Hollow data points indicate detected outliers. (c) The results of outlier detection by the stepwise approach. Hollow data points indicate detected outliers. (d) The results of outlier detection by the expectation-maximization algorithm. Hollow data points indicate that the probability of the data point being an outlier is higher than 0.9.
Mean, median, minimum, and maximum values for glucose, insulin, thyroid-stimulating hormone (TSH), cortisol, and growth hormone (GH) in 24 h, before (raw data) and after outlier removal (eyeballing, Tukey’s fences, stepwise approach, and the expectation-maximization (EM) algorithm).
| Method | Glucose (mmol/L), Mean | Glucose (mmol/L), Median | Glucose (mmol/L), Min | Glucose (mmol/L), Max | Insulin (mU/L), Mean | Insulin (mU/L), Median | Insulin (mU/L), Min | Insulin (mU/L), Max | TSH (mU/L), Mean | TSH (mU/L), Median | TSH (mU/L), Min | TSH (mU/L), Max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Raw data | 5.09 (0.36) | 4.80 (0.39) | 2.76 (0.70) | 9.51 (1.52) | 19.90 (10.11) | 9.66 (5.51) | 2.76 (2.41) | 91.61 (54.41) | 2.02 (1.05) | 1.92 (1.01) | 1.01 (0.61) | 3.57 (1.89) |
| Eyeballing | 5.11 (0.36) | 4.81 (0.39) | 3.16 (0.53) | 9.48 (1.47) | 19.96 (10.14) | 9.66 (5.52) | 2.80 (2.46) | 91.61 (54.41) | 2.03 (1.05) | 1.93 (1.02) | 1.21 (0.67) | 3.57 (1.89) |
| Tukey’s fences | 5.07 (0.37) | 4.80 (0.39) | 3.04 (0.62) | 9.21 (1.42) | 19.96 (10.16) | 9.69 (5.52) | 3.39 (2.39) | 91.34 (54.69) | 2.02 (1.04) | 1.93 (1.02) | 1.19 (0.63) | 3.49 (1.82) |
| Stepwise approach | 5.12 (0.36) | 4.80 (0.39) | 3.29 (0.41) | 9.40 (1.48) | 20.47 (10.43) | 10.21 (5.86) | 3.54 (2.31) | 91.03 (54.26) | 2.02 (1.05) | 1.92 (1.01) | 1.21 (0.64) | 3.50 (1.81) |
| EM algorithm[b] | 5.00 (0.37) | 4.77 (0.40) | 3.14 (0.58) | 9.08 (1.47) | 20.05 (10.35) | 10.16 (5.97) | 3.74 (2.44) | 87.65 (49.73) | 1.98 (1.01) | 1.90 (1.00) | 1.24 (0.73) | 3.33 (1.64) |

| Method | Cortisol (µmol/L), Mean | Cortisol (µmol/L), Median | Cortisol (µmol/L), Min | Cortisol (µmol/L), Max | GH (mU/L), Mean | GH (mU/L), Median | GH (mU/L), Min | GH (mU/L), Max |
|---|---|---|---|---|---|---|---|---|
| Raw data | 0.21 (0.05) | 0.18 (0.05) | 0.05 (0.03) | 0.57 (0.09) | 2.49 (1.51) | 0.95 (0.94) | 0.16 (0.22) | 20.63 (10.31) |
| Eyeballing | 0.21 (0.05) | 0.18 (0.05) | 0.05 (0.03) | 0.57 (0.09) | 2.48 (1.58) | 0.95 (0.94) | 0.16 (0.22) | 20.63 (10.31) |
| Tukey’s fences | 0.20 (0.05) | 0.18 (0.05) | 0.05 (0.03) | 0.55 (0.09) | 2.47 (1.55) | 0.95 (0.94) | 0.17 (0.22) | 20.27 (10.67) |
| Stepwise approach | 0.21 (0.05) | 0.18 (0.05) | 0.05 (0.03) | 0.56 (0.09) | 2.51 (1.54) | 0.96 (0.95) | 0.17 (0.22) | 20.27 (10.59) |
| EM algorithm[b] | 0.18 (0.04) | 0.16 (0.05) | 0.05 (0.03) | 0.50 (0.08) | 2.24 (1.48) | 0.94 (1.02) | 0.18 (0.22) | 18.90 (11.13) |

[a] Mean and standard deviation in the 38 participants are given.
[b] For the EM algorithm results, weighted mean and standard deviation are used.
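Because the EM algorithm yields a probability rather than a yes/no decision, summary statistics can down-weight suspicious points instead of deleting them. The source does not state which weights were used; the sketch below assumes a weight of 1 minus the outlier probability for each data point, and the function name is hypothetical.

```python
import numpy as np

def weighted_mean_sd(values, p_outlier):
    """Weighted mean and SD with assumed weights w = 1 - P(outlier)."""
    x = np.asarray(values, dtype=float)
    w = 1.0 - np.asarray(p_outlier, dtype=float)
    mean = np.average(x, weights=w)
    sd = np.sqrt(np.average((x - mean) ** 2, weights=w))
    return mean, sd

# A point that is certainly an outlier (p = 1) contributes nothing.
mean, sd = weighted_mean_sd([1.0, 2.0, 3.0, 100.0], [0.0, 0.0, 0.0, 1.0])
```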
Cross-correlations between glucose and insulin.
| Method | Correlation at Lag Time 0 | Maximum Cross-correlation | Lag Time at Maximum Cross-correlation (min) |
|---|---|---|---|
| Raw data | 0.74 (0.12) | 0.74 (0.12) | −4.7 (7.3) |
| Eyeballing | 0.74 (0.11) | 0.75 (0.12) | −5.3 (7.6) |
| Tukey’s fences | 0.73 (0.14) | 0.74 (0.14) | −6.3 (8.2) |
| Stepwise approach | 0.74 (0.12) | 0.75 (0.12) | −5.0 (8.0) |
| EM algorithm[b] | 0.71 (0.12) | 0.73 (0.17) | −9.5 (9.8) |

[a] Mean and standard deviation across 38 participants.
[b] For the expectation-maximization (EM) algorithm results, weighted correlation is used.
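The three quantities in this table can be computed per participant roughly as follows. This is an unweighted sketch under stated assumptions: series sampled every 10 min, a search range of ±6 samples, and a sign convention in which a positive lag means the second series follows the first. The source does not state its lag range or sign convention, and the function names are hypothetical.

```python
import numpy as np

def cross_correlation(x, y, max_lag=6, step_min=10):
    """Pearson correlation of x[t] with y[t + k] for k in [-max_lag, max_lag]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    lags = np.arange(-max_lag, max_lag + 1)
    r = np.empty(len(lags))
    for i, k in enumerate(lags):
        if k >= 0:
            a, b = x[: n - k], y[k:]      # y shifted forward by k samples
        else:
            a, b = x[-k:], y[: n + k]     # y shifted backward by |k| samples
        r[i] = np.corrcoef(a, b)[0, 1]
    return lags * step_min, r             # lags converted to minutes

def summarize_cross_correlation(x, y, max_lag=6, step_min=10):
    """Correlation at lag 0, maximum correlation, and the lag (min) at the maximum."""
    lag_min, r = cross_correlation(x, y, max_lag, step_min)
    best = np.argmax(r)
    return {
        "r_lag0": r[lag_min == 0][0],
        "r_max": r[best],
        "lag_at_max_min": lag_min[best],
    }
```

Averaging these per-participant values over all participants would give means and standard deviations of the kind reported in the table.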
Figure 5. Change in correlation at lag time 0 (%) after removal of measurement errors detected by the 4 methods: eyeballing, Tukey’s fences, stepwise approach, and the expectation-maximization algorithm. Each bar represents an individual participant.
Characteristics of 4 outlier detection methods.
| | Eyeballing | Tukey’s Fences | Stepwise Approach | The Expectation-maximization Algorithm |
|---|---|---|---|---|
| Underlying assumptions | • Researchers’ expert knowledge is reliable | • Hormones follow a smooth trajectory over 24 hours | • Hormones follow a smooth trajectory over 24 hours | • Two distinguishable distributions (outliers/non-outliers) |
| Efficiency and generalizability of the method | • Relatively time-consuming process | • Although it needs several adjustments for different types of time series (e.g., parameters for smoothing curves), the process can be easily applied to different settings | • Although it needs several adjustments for different types of time series (e.g., parameters for smoothing curves), the process can be easily applied to different settings | • Although it needs several adjustments for different types of time series (e.g., parameters for smoothing curves), the process can be easily applied to different settings |
| Strengths and limitations | • Explicit knowledge and clear physiological reasoning behind the detection process | • The method is highly affected by smoothing techniques and the type of data, especially when the hormone levels are mostly constant over time | • Measurement error both within a hormone and within a sample can be detected | • Yields a probability |