Tobias Hepp, Jakob Zierk, Manfred Rauh, Markus Metzler, Andreas Mayr.
Abstract
BACKGROUND: Medical decision making based on quantitative test results depends on reliable reference intervals, which represent the range of physiological test results in a healthy population. Current methods for estimating reference limits focus either on directly modelling the age-dependent dynamics of different analytes in a prospective setting, or on extracting independent distributions from contaminated data sources, e.g. data with latent heterogeneity due to unlabeled pathologic cases. In this article, we propose a new method to estimate indirect reference limits with non-linear dependencies on covariates from contaminated datasets by combining the framework of mixture models and distributional regression.
Keywords: Distributional regression; Finite mixture models; Latent class regression; Reference limits
Year: 2020 PMID: 33187469 PMCID: PMC7666475 DOI: 10.1186/s12859-020-03853-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
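The core idea in the abstract, separating a healthy distribution from unlabeled pathologic cases with a mixture model, can be illustrated with a minimal sketch. This is not the authors' latent class distributional regression: it drops the covariate dependence entirely and fits a plain two-component Gaussian mixture by EM. The simulated values (means 50 and 75, standard deviation 5, 20% contamination), the function name `fit_two_gaussians`, and the rule that the larger component is the healthy one are all assumptions made for illustration.

```python
import numpy as np
from statistics import NormalDist


def fit_two_gaussians(y, n_iter=500, tol=1e-8):
    """Fit a two-component Gaussian mixture by EM (no covariates)."""
    y = np.asarray(y, dtype=float)
    # Starting values: means at the 10% and 90% sample quantiles.
    mu = np.quantile(y, [0.1, 0.9])
    sd = np.full(2, y.std() / 2)
    w = np.array([0.5, 0.5])
    prev_loglik = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability of each component per observation.
        z = (y[:, None] - mu) / sd
        dens = w * np.exp(-0.5 * z**2) / (sd * np.sqrt(2 * np.pi))
        total = dens.sum(axis=1)
        resp = dens / total[:, None]
        # M-step: weighted updates of weights, means and standard deviations.
        nk = resp.sum(axis=0)
        w = nk / y.size
        mu = (resp * y[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (y[:, None] - mu) ** 2).sum(axis=0) / nk)
        # Log-likelihood under the parameters used in this E-step.
        loglik = np.log(total).sum()
        if loglik - prev_loglik < tol:
            break
        prev_loglik = loglik
    return w, mu, sd


# Hypothetical contaminated sample: 80% healthy, 20% pathologic.
rng = np.random.default_rng(1)
healthy = rng.normal(50.0, 5.0, size=1600)
pathologic = rng.normal(75.0, 5.0, size=400)
y = np.concatenate([healthy, pathologic])

w, mu, sd = fit_two_gaussians(y)
k = int(np.argmax(w))  # assumption: the majority component is the healthy one
component = NormalDist(float(mu[k]), float(sd[k]))
lower, upper = component.inv_cdf(0.025), component.inv_cdf(0.975)

# Naive limits: empirical quantiles of the contaminated sample.
naive_lower, naive_upper = np.quantile(y, [0.025, 0.975])

print(f"mixture-based limits: {lower:.1f} to {upper:.1f}")
print(f"naive limits:         {naive_lower:.1f} to {naive_upper:.1f}")
```

In this toy setting the true healthy limits are roughly 40.2 and 59.8; the naive upper limit is pulled far upward by the pathologic cases, which is the kind of bias the NAIVE column in the simulation tables quantifies.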
Fig. 1 Simulation example. Exemplary data with n = 500 cases, gap parameter c = 20 and . Figures in the bottom row show the true mixture and component densities for
Number of non-converged runs (out of 1000) per setting for latent class distributional regression (LCDR)
| c | n = 500: 0.6 | n = 500: 0.7 | n = 500: 0.8 | n = 1000: 0.6 | n = 1000: 0.7 | n = 1000: 0.8 | Sum |
|---|---|---|---|---|---|---|---|
| 5 | 79 | 87 | 125 | 24 | 58 | 86 | 459 |
| 10 | 51 | 63 | 105 | 14 | 26 | 60 | 319 |
| 15 | 23 | 34 | 96 | 3 | 7 | 30 | 193 |
| 20 | 4 | 17 | 55 | 0 | 1 | 16 | 93 |
| Sum | 157 | 201 | 381 | 41 | 92 | 192 | 1064 |
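The marginal totals of the non-convergence table can be re-derived from its cells, which also gives the overall failure rate. The sketch below assumes that each of the 24 settings was run 1000 times, as the table caption states; the variable names are illustrative.

```python
import numpy as np

# Rows: c = 5, 10, 15, 20; columns: the six settings in table order.
non_converged = np.array([
    [79, 87, 125, 24, 58, 86],
    [51, 63, 105, 14, 26, 60],
    [23, 34, 96, 3, 7, 30],
    [4, 17, 55, 0, 1, 16],
])
runs_per_setting = 1000  # from the table caption

row_totals = non_converged.sum(axis=1)
col_totals = non_converged.sum(axis=0)
overall = int(non_converged.sum())
overall_rate = overall / (non_converged.size * runs_per_setting)

print(row_totals.tolist())  # [459, 319, 193, 93]
print(col_totals.tolist())  # [157, 201, 381, 41, 92, 192]
print(overall)              # 1064
print(f"{overall_rate:.2%}")
```

The row totals fall steadily as c increases, so non-convergence is concentrated in the settings with the smallest gap parameter.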
Simulation results for n = 500
| Measure | Proportion | c | LCDR | GOLD | NAIVE |
|---|---|---|---|---|---|
| Integrated error | 0.6 | 20 | −0.185 (2.837) | −0.160 (0.942) | 26.60 (1.223) |
| Integrated error | 0.6 | 15 | −0.318 (3.365) | −0.172 (0.936) | 19.33 (1.160) |
| Integrated error | 0.6 | 10 | −0.924 (3.753) | −0.169 (0.945) | 12.81 (1.087) |
| Integrated error | 0.6 | 5 | −1.997 (3.897) | −0.179 (0.936) | 7.410 (0.980) |
| Integrated error | 0.7 | 20 | −0.496 (2.528) | −0.110 (0.863) | 21.86 (1.225) |
| Integrated error | 0.7 | 15 | −0.711 (2.885) | −0.106 (0.859) | 15.66 (1.146) |
| Integrated error | 0.7 | 10 | −1.596 (3.297) | −0.090 (0.858) | 10.19 (1.058) |
| Integrated error | 0.7 | 5 | −2.895 (3.810) | −0.103 (0.858) | 5.760 (0.958) |
| Integrated error | 0.8 | 20 | −0.563 (2.168) | −0.082 (0.821) | 16.18 (1.205) |
| Integrated error | 0.8 | 15 | −0.991 (2.459) | −0.089 (0.824) | 11.40 (1.129) |
| Integrated error | 0.8 | 10 | −2.112 (2.943) | −0.064 (0.825) | 7.243 (1.030) |
| Integrated error | 0.8 | 5 | −3.599 (3.645) | −0.062 (0.825) | 3.967 (0.921) |
| Integrated squared error | 0.6 | 20 | 17.05 (17.81) | 3.792 (2.770) | 747.43 (74.07) |
| Integrated squared error | 0.6 | 15 | 22.15 (26.54) | 3.804 (2.788) | 397.42 (50.93) |
| Integrated squared error | 0.6 | 10 | 28.59 (39.72) | 3.843 (2.799) | 177.19 (31.74) |
| Integrated squared error | 0.6 | 5 | 42.32 (57.01) | 3.788 (2.759) | 61.99 (16.77) |
| Integrated squared error | 0.7 | 20 | 13.06 (11.57) | 3.253 (2.325) | 509.10 (61.96) |
| Integrated squared error | 0.7 | 15 | 16.31 (17.00) | 3.250 (2.330) | 264.13 (41.33) |
| Integrated squared error | 0.7 | 10 | 23.82 (29.69) | 3.258 (2.352) | 114.47 (24.89) |
| Integrated squared error | 0.7 | 5 | 41.57 (56.00) | 3.238 (2.310) | 39.07 (13.05) |
| Integrated squared error | 0.8 | 20 | 9.919 (8.52) | 2.869 (2.154) | 284.09 (45.34) |
| Integrated squared error | 0.8 | 15 | 13.16 (12.49) | 2.883 (2.183) | 143.68 (29.81) |
| Integrated squared error | 0.8 | 10 | 21.86 (26.45) | 2.883 (2.168) | 60.38 (17.29) |
| Integrated squared error | 0.8 | 5 | 45.80 (58.75) | 2.880 (2.176) | 20.39 (8.814) |
Reported are the means and standard deviations of the integrated error and the integrated squared error over all simulation runs for latent class distributional regression (LCDR), “gold standard” (GOLD) and “naive” fit (NAIVE)
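The two error measures in the table can be illustrated with a short sketch. The exact definitions are not given in this excerpt, so the code assumes that the integrated error is the integral, over the covariate range, of the difference between the estimated and the true quantile curve, and that the integrated squared error integrates the squared difference. The curves, the grid, and the function name `integrated_errors` are hypothetical.

```python
import numpy as np


def integrated_errors(x, q_hat, q_true):
    """Trapezoidal approximation of the integrated (squared) error.

    Assumed definitions: IE integrates q_hat - q_true over x,
    ISE integrates (q_hat - q_true) ** 2 over x.
    """
    diff = np.asarray(q_hat, dtype=float) - np.asarray(q_true, dtype=float)
    dx = np.diff(x)
    ie = np.sum(dx * (diff[1:] + diff[:-1]) / 2)
    ise = np.sum(dx * (diff[1:] ** 2 + diff[:-1] ** 2) / 2)
    return ie, ise


x = np.linspace(0.0, 10.0, 101)
q_true = 50.0 + 2.0 * np.sin(x)  # hypothetical true quantile curve

# Estimate with a constant upward bias of 0.5.
ie_bias, ise_bias = integrated_errors(x, q_true + 0.5, q_true)

# Estimate that oscillates around the truth: deviations cancel in the
# integrated error but not in the integrated squared error.
q_wiggle = q_true + np.sin(2 * np.pi * x / 10.0)
ie_wiggle, ise_wiggle = integrated_errors(x, q_wiggle, q_true)

print(ie_bias, ise_bias)
print(ie_wiggle, ise_wiggle)
```

Under these assumed definitions, the integrated error captures systematic over- or underestimation, while the integrated squared error also penalises deviations that cancel out. That is one way to read the table: a method can show a mean integrated error near zero and still have a clearly larger integrated squared error than the gold standard.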
Simulation results for n = 1000
| Measure | Proportion | c | LCDR | GOLD | NAIVE |
|---|---|---|---|---|---|
| Integrated error | 0.6 | 20 | −0.200 (2.110) | −0.059 (0.652) | 26.66 (0.872) |
| Integrated error | 0.6 | 15 | −0.218 (2.601) | −0.060 (0.651) | 19.38 (0.828) |
| Integrated error | 0.6 | 10 | −0.309 (2.755) | −0.056 (0.653) | 12.87 (0.773) |
| Integrated error | 0.6 | 5 | −0.987 (3.057) | −0.057 (0.656) | 7.501 (0.706) |
| Integrated error | 0.7 | 20 | −0.436 (1.876) | −0.056 (0.594) | 21.92 (0.871) |
| Integrated error | 0.7 | 15 | −0.482 (2.183) | −0.057 (0.594) | 15.72 (0.825) |
| Integrated error | 0.7 | 10 | −0.680 (2.314) | −0.061 (0.594) | 10.23 (0.764) |
| Integrated error | 0.7 | 5 | −1.621 (2.53) | −0.063 (0.594) | 5.808 (0.689) |
| Integrated error | 0.8 | 20 | −0.394 (1.637) | −0.061 (0.565) | 16.17 (0.845) |
| Integrated error | 0.8 | 15 | −0.540 (1.908) | −0.054 (0.568) | 11.42 (0.793) |
| Integrated error | 0.8 | 10 | −1.010 (2.022) | −0.057 (0.570) | 7.263 (0.733) |
| Integrated error | 0.8 | 5 | −2.199 (2.560) | −0.050 (0.570) | 4.009 (0.660) |
| Integrated squared error | 0.6 | 20 | 8.326 (8.142) | 1.910 (1.378) | 746.50 (52.97) |
| Integrated squared error | 0.6 | 15 | 11.69 (14.89) | 1.911 (1.380) | 396.38 (36.42) |
| Integrated squared error | 0.6 | 10 | 13.47 (17.26) | 1.913 (1.383) | 176.55 (22.58) |
| Integrated squared error | 0.6 | 5 | 19.92 (28.98) | 1.914 (1.384) | 61.46 (12.09) |
| Integrated squared error | 0.7 | 20 | 6.764 (5.920) | 1.692 (1.185) | 506.36 (43.79) |
| Integrated squared error | 0.7 | 15 | 8.532 (7.778) | 1.694 (1.187) | 262.09 (29.60) |
| Integrated squared error | 0.7 | 10 | 10.02 (10.58) | 1.692 (1.186) | 112.57 (17.90) |
| Integrated squared error | 0.7 | 5 | 17.01 (25.73) | 1.685 (1.181) | 37.64 (9.275) |
| Integrated squared error | 0.8 | 20 | 5.150 (4.574) | 1.495 (1.007) | 279.49 (32.28) |
| Integrated squared error | 0.8 | 15 | 6.634 (5.876) | 1.494 (1.009) | 140.29 (21.19) |
| Integrated squared error | 0.8 | 10 | 8.354 (9.089) | 1.494 (1.003) | 58.09 (12.47) |
| Integrated squared error | 0.8 | 5 | 19.34 (27.67) | 1.500 (1.010) | 18.94 (6.266) |
Reported are the means and standard deviations of the integrated error and integrated squared error over all simulation runs for latent class distributional regression (LCDR), “gold standard” (GOLD) and “naive” fit (NAIVE)
Fig. 2 Estimated 95%-quantiles with n = 500. Shown are the first 100 estimated quantiles from four different settings, which differ in the proportion of observations sampled from the corresponding component and in the amount of overlap c, as described in the simulation section
Fig. 3 Estimated 95%-quantiles with n = 1000. Shown are the first 100 estimated quantiles from four different settings, which differ in the proportion of observations sampled from the corresponding component and in the amount of overlap c, as described in the simulation section
Fig. 4 Reference intervals for hemoglobin concentration. Shaded areas represent the estimated distribution of healthy hemoglobin concentration for boys (left) and girls (right), enclosed by the 2.5% and 97.5% quantiles, which are drawn as colored solid lines. Dashed lines result from fitting separate models for boys and girls. Black solid lines show solutions from an alternative approach estimated by splitting the population into multiple subgroups