Silvia Rizzi, Mikael Thinggaard, Gerda Engholm, Niels Christensen, Tom Børge Johannesen, James W Vaupel, Rune Lindahl-Jacobsen.
Abstract
BACKGROUND: Histograms are a common tool for estimating densities non-parametrically. They are widely used in the health sciences to summarize data in a compact format. Examples are age-specific distributions of death or disease onset grouped in 5-year age classes, with an open-ended age group at the highest ages. When histogram intervals are too coarse, information is lost, and comparing histograms with different boundaries is difficult. In these cases it is useful to estimate detailed distributions from the grouped data.
Keywords: Aggregated count data; Smoothing; Ungrouping methods
Year: 2016 PMID: 27216531 PMCID: PMC4877978 DOI: 10.1186/s12874-016-0157-8
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Simulation study scheme
| | Scenario 1 | Scenario 2 | Scenario 3 | Scenario 4 |
|---|---|---|---|---|
| Distribution | Weibull | Weibull | Weibull | Weibull |
| Sample size | | | | |
| Age groups | 5-year | 5-year | 5-year with 85+ | 5-year with 85+ |
Each scenario was run with 500 simulation repetitions.
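One repetition of such a scenario can be sketched as follows. This is a minimal illustration, not the authors' simulation code: the Weibull shape and scale, the seed, and the function name `simulate_grouped_counts` are assumptions introduced here, because the table does not state the parameters that were used. The sample size of 1000 is taken from the figure captions.

```python
import random


def simulate_grouped_counts(n, shape, scale, width=5, open_from=None, seed=None):
    """Draw n Weibull-distributed ages and count them in `width`-year age groups.

    Returns a dict mapping the lower bound of each age group to its count.
    If `open_from` is given (for example 85), every age at or above it is
    placed in a single open-ended group. `shape` and `scale` are hypothetical
    values chosen by the caller; the source does not report them.
    """
    if open_from is not None and open_from % width != 0:
        raise ValueError("open_from must be a multiple of width")

    rng = random.Random(seed)
    # random.weibullvariate takes (scale, shape), in that order.
    ages = [rng.weibullvariate(scale, shape) for _ in range(n)]

    if open_from is None:
        # Closed groups only: the last group is the highest one that is occupied.
        top = int(max(ages) // width) * width
    else:
        top = open_from

    counts = {lower: 0 for lower in range(0, top + 1, width)}
    for age in ages:
        lower = int(age // width) * width
        counts[min(lower, top)] += 1  # ages past `top` go to the open-ended group
    return counts


# One repetition with an open-ended 85+ group (illustrative parameters).
grouped = simulate_grouped_counts(1000, shape=8.0, scale=80.0, open_from=85, seed=1)
```

Repeating the call with 500 different seeds would mimic the repetition count given above. The grouped counts are what an ungrouping method would receive as input.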
Fig. 1 Weibull true density (gray line with overplotted points) and models' fitted densities from 5-year age groups and n=1000 (black lines)
Fig. 3 Weibull true density (gray line with overplotted points) and models' fitted densities from 5-year age groups with open-ended age interval 85+ and n=1000 (black lines)
Fig. 2 Measures of performance for Weibull density from 5-year age groups. Integrated absolute error (IAE), integrated squared error (ISE), Kullback-Leibler distance (KL)
Fig. 4 Measures of performance for Weibull density from 5-year age groups with open-ended age interval 85+. Integrated absolute error (IAE), integrated squared error (ISE), Kullback-Leibler distance (KL)
Fig. 5 Age at death for all cancers in Denmark for 2010. Empirical data (gray line with overplotted points), grouped counts (histogram) and models' estimates from 5-year age groups with open-ended age interval 85+ (black smooth lines)
Fig. 6 Age at onset of testis cancer in Denmark for 1980, 1990, 2000 and 2010 combined. Empirical data (gray line with overplotted points), grouped counts (histogram) and models' estimates from 5-year age groups (black smooth lines)
Fig. 7 Measures of performance for the empirical data analyzed: overall cancer deaths from 5-year age groups (black squares); overall cancer deaths from 5-year age groups with open-ended age interval 85+ (dark gray circles); testis cancer incidence from 5-year age groups (light gray triangles). Integrated absolute error (IAE), integrated squared error (ISE), Kullback-Leibler distance (KL)
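The three performance measures named in the captions can be approximated on a grid of ages. The sketch below assumes the usual textbook definitions, IAE as the integral of |f̂ − f|, ISE as the integral of (f̂ − f)², and KL as the integral of f·log(f/f̂), with f the true (or empirical) density and f̂ the fitted one. The exact discretization used in the paper is not given here, so the function name `performance_measures` and the Riemann-sum approximation are assumptions.

```python
import math


def performance_measures(true_density, fitted_density, step=1.0):
    """Approximate IAE, ISE and KL between two densities on a common grid.

    Both inputs are sequences of density values evaluated at the same
    equally spaced ages; `step` is the grid spacing (1.0 for single years).
    The integrals are approximated by simple Riemann sums.
    """
    if len(true_density) != len(fitted_density):
        raise ValueError("densities must be evaluated on the same grid")

    pairs = list(zip(true_density, fitted_density))
    iae = step * sum(abs(g - f) for f, g in pairs)
    ise = step * sum((g - f) ** 2 for f, g in pairs)

    kl = 0.0
    for f, g in pairs:
        if f > 0:
            if g <= 0:
                # The fitted density puts no mass where the true one does.
                kl = math.inf
                break
            kl += step * f * math.log(f / g)

    return {"IAE": iae, "ISE": ise, "KL": kl}


# Toy example on a two-point grid (not data from the study).
toy = performance_measures([0.5, 0.5], [0.25, 0.75])
```

All three measures are zero when the fitted density equals the true one, and lower values indicate a closer fit.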
Selected ungrouping methods for comparison
| Method | Abbreviation | Program for estimation |
|---|---|---|
| Bootstrap kernel density estimator | bootkde | bda R package, bde function, bootkde method |
| Piecewise cubic Hermite interpolating polynomial | hermite spline | signal R package, interp1 function, pchip method |
| Spline interpolation with Hyman filter | hyman spline | demography R package, cm.spline function |
| Iterated conditional expectation kernel density estimator | ickde | ICE R package, ickde function |
| Penalized composite link model | pclm | R code in Rizzi et al. (2015) |