Warangkana Ruckthongsook, Chetan Tiwari, Joseph R Oppong, Prathiba Natesan.
Abstract
BACKGROUND: Maps of disease rates produced without careful consideration of the underlying population distribution may be unreliable due to the well-known small numbers problem. Smoothing methods such as Kernel Density Estimation (KDE) are employed to control the population basis of spatial support used to calculate each disease rate. The degree of smoothing is controlled by a user-defined parameter (bandwidth or threshold) which influences the resolution of the disease map and the reliability of the computed rates. Methods for automatically selecting a smoothing parameter such as normal scale, plug-in, and smoothed cross validation bandwidth selectors have been proposed for use with non-spatial data, but their relative utilities remain unknown. This study assesses the relative performance of these methods in terms of resolution and reliability for disease mapping.
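To make the role of the threshold concrete, here is a minimal sketch of population-threshold smoothing. It is not the authors' implementation: it assumes a uniform kernel (every pooled unit counts equally, with no distance weighting), and the function name `adaptive_threshold_rates` and the toy data are hypothetical. For each areal unit, neighbours are pooled in order of distance until the pooled population reaches the threshold, and the rate is computed on the pooled counts.

```python
import numpy as np

def adaptive_threshold_rates(xy, cases, population, threshold, per=100_000):
    """Rate for each unit, pooled over nearest neighbours until the
    cumulative population reaches `threshold` (hypothetical helper)."""
    xy = np.asarray(xy, dtype=float)
    cases = np.asarray(cases, dtype=float)
    population = np.asarray(population, dtype=float)
    rates = np.empty(len(xy))
    for i in range(len(xy)):
        # Order all units by distance from unit i (unit i itself comes first).
        dist = np.hypot(xy[:, 0] - xy[i, 0], xy[:, 1] - xy[i, 1])
        order = np.argsort(dist, kind="stable")
        cum_pop = np.cumsum(population[order])
        # Smallest neighbourhood whose pooled population meets the threshold;
        # if the threshold is never reached, pool every unit.
        k = min(int(np.searchsorted(cum_pop, threshold, side="left")), len(xy) - 1)
        members = order[: k + 1]
        rates[i] = per * cases[members].sum() / population[members].sum()
    return rates

# Toy data (not from the study): four units along a line.
xy = [(0, 0), (1, 0), (2, 0), (10, 0)]
cases = [1, 0, 2, 5]
population = [50, 60, 200, 500]

raw = adaptive_threshold_rates(xy, cases, population, threshold=1)
smoothed = adaptive_threshold_rates(xy, cases, population, threshold=100)
```

With a threshold of 1 each unit keeps its own raw rate, including the unstable values from the two small units. Raising the threshold to 100 pools those two units, which trades spatial resolution for a larger population base.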
Keywords: Bandwidth selection; Disease mapping; Kernel density estimation; Monte Carlo simulation; Threshold
Year: 2018 PMID: 29739415 PMCID: PMC5938815 DOI: 10.1186/s12942-018-0129-9
Source DB: PubMed Journal: Int J Health Geogr ISSN: 1476-072X Impact factor: 3.918
Fig. 1 Geographic distribution of age-specific heart disease mortality rates for males aged 65 years and older, 2009–2013. Maps were created using the adaptive bandwidth kernel density estimation method with various bandwidths (h): a h = 50; b h = 100; c h = 500; d h = 1000. (Note: the data were obtained from CDC NCHS [16])
Fig. 2 Flow chart of the methodology showing the steps used in this study
Age-adjusted and age-specific heart disease death rates for males in Texas by age group, 2009–2013 [16], and population distribution from the 2010 U.S. Census Bureau [27]
| Age | Population | Range of aggregated population at the ZCTA level | Rate (per 100,000) |
|---|---|---|---|
| 35–44 | 1,722,904 | [1, 7925] | 33.87 |
| 45–54 | 1,702,639 | [1, 7407] | 115.15 |
| 55–64 | 1,256,976 | [1, 4948] | 297.36 |
| 65+ | 1,135,517 | [1, 4792] | 1245.93 |
| Total (35+) | 5,818,036 | [1, 22,555] | 351.15 |
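As an arithmetic check on the table (not a reproduction of how the rates were derived), the sketch below pools the four age groups. The populations sum to the Total row, and the population-weighted average of the age-specific rates comes to about 351.14 per 100,000, which agrees with the reported 351.15 up to rounding of the inputs.

```python
# Populations and age-specific rates (per 100,000) copied from the table above.
population = {"35-44": 1_722_904, "45-54": 1_702_639, "55-64": 1_256_976, "65+": 1_135_517}
rate = {"35-44": 33.87, "45-54": 115.15, "55-64": 297.36, "65+": 1245.93}

total_population = sum(population.values())

# Deaths implied by each age-specific rate: population * rate / 100,000.
implied_deaths = {group: population[group] * rate[group] / 100_000 for group in population}

# Pooled (population-weighted) rate per 100,000 for ages 35 and older.
pooled_rate = 100_000 * sum(implied_deaths.values()) / total_population

print(total_population)
print(round(pooled_rate, 2))
```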
Fig. 3 The running RMSE between the simulated baseline rates and the true value as a function of the number of replicates (L) for all age groups
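A running RMSE of this kind can be sketched as follows. The sampling model is an assumption made for illustration: statewide case counts are drawn from a Poisson distribution whose mean is implied by the true rate, which this excerpt does not confirm as the study's simulation design. The helper name `running_rmse` is hypothetical, and the population and rate are the 35–44 values from the table above.

```python
import numpy as np

def running_rmse(estimates, true_value):
    """RMSE of the first L estimates against true_value, for L = 1..len(estimates)."""
    sq_err = (np.asarray(estimates, dtype=float) - true_value) ** 2
    return np.sqrt(np.cumsum(sq_err) / np.arange(1, len(sq_err) + 1))

rng = np.random.default_rng(0)
population, true_rate = 1_722_904, 33.87  # ages 35-44, rate per 100,000

# Assumption: case counts follow a Poisson distribution around the expected count.
expected_cases = population * true_rate / 100_000
simulated_cases = rng.poisson(expected_cases, size=100)
simulated_rates = 100_000 * simulated_cases / population

# curve[L - 1] is the RMSE over the first L replicates.
curve = running_rmse(simulated_rates, true_rate)
```

Plotting `curve` against L shows how quickly the error stabilises, which is one way to judge whether the number of replicates is sufficient.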
Summaries of characteristics of simulated baseline rate distribution
| Age group | Mean | SD | Coverage rate (%) | Over-estimated (%) | Under-estimated (%) |
|---|---|---|---|---|---|
| 35–44 | 33.92 | 1.40 | 17 | 50.6 | 49.4 |
| 45–54 | 115.17 | 2.52 | 11 | 49.4 | 50.6 |
| 55–64 | 297.60 | 4.49 | 20 | 56.2 | 43.8 |
| 65+ | 1245.93 | 10.21 | 16 | 47.6 | 52.4 |
| 35+ | 351.12 | 2.27 | 14 | 52.3 | 47.7 |
Descriptive results and calculated thresholds stratified by age group
| Age groups | Total population | Range | No. of ZCTAs | Calculated threshold: […] | Calculated threshold: […] | Calculated threshold: […] | Calculated threshold: Median | % ZCTAs with specified minimum population: ≤ 100 | % ZCTAs with specified minimum population: ≤ 300 |
|---|---|---|---|---|---|---|---|---|---|
| 35–44 | 1,722,904 | [1, 7925] | 1911 | 53 | 56 | 280 | 327 | 32 | 48 |
| 45–54 | 1,702,639 | [1, 7407] | 1910 | 57 | 55 | 255 | 399 | 28 | 45 |
| 55–64 | 1,256,976 | [1, 4948] | 1906 | 44 | 41 | 177 | 342 | 30 | 48 |
| 65+ | 1,135,517 | [1, 4792] | 1902 | 41 | 40 | 156 | 330 | 28 | 48 |
| Total (35+) | 5,818,036 | [1, 22,555] | 1920 | 200 | 189 | 837 | 1411 | 14 | 25 |
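The descriptive columns of a table like this can be derived from a vector of ZCTA populations. The sketch below uses made-up populations, not the study data, and it assumes that the "Median" threshold is the median ZCTA population, which this excerpt does not state explicitly.

```python
import numpy as np

# Hypothetical ZCTA populations for one age group (not the study data).
zcta_population = np.array([1, 40, 85, 120, 260, 310, 900, 4792])

# Assumed definition: the median threshold is the median ZCTA population.
median_population = float(np.median(zcta_population))

# Percentage of ZCTAs whose population is at or below each cutoff.
share_le_100 = 100 * float(np.mean(zcta_population <= 100))
share_le_300 = 100 * float(np.mean(zcta_population <= 300))

print(median_population, share_le_100, share_le_300)
```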
Fig. 4 Density curves overlaid on population distribution (age > 35; ZCTA level). Column A describes the gamma distribution. Column B describes threshold choices
Characteristics of the population density curve estimates from various thresholds stratified by age groups
| Desirable characteristics | Density curve characteristics | Ages 35–44 | Ages 45–54 | Ages 55–64 | Ages 65+ | Ages 35+ |
|---|---|---|---|---|---|---|
| Most | Density curve is smoother, and fluctuations in the tail cease to exist | […]; […]; 500 | […]; […]; 500 | […]; […]; 500 | […]; […]; 500 | 500; […]; 1000 |
| | Density curve closely matches the actual gamma distribution and contains fluctuations at the tail | 100 | 100 | 100 | 100 | […] |
| | The highest density estimate of the density curve is greater than that of the actual gamma distribution, and the density curve contains high fluctuations at the tail | 50; […]; […] | 50; […]; […] | 50; […]; […] | 50; […]; […] | 50; 100 |
| | Density curve is smoother, and it is difficult to distinguish between the mode and tail | 1000 | 1000 | 1000 | 1000 | […] |
| | Density curve is flat, and the mode and tail cannot be distinguished | 5000; 10,000 | 5000; 10,000 | 5000; 10,000 | 5000; 10,000 | 5000; 10,000 |
Fig. 5 The distribution of estimated state rates of each threshold from 100 repetitions: a Aged 35 to 44 years; b Aged 45 to 54 years; c Aged 55 to 64 years; d Aged 65 years and older; e Aged 35 years and older
Fig. 6 Geographic distribution of age-specific heart disease mortality rates for males aged 35 years and older. Maps were created using the adaptive KDE method with simulated cases as numerators, population data as denominator, and threshold choices (h) derived from the bandwidth selector methods and arbitrary choices