Junhee Han, Li Zhu, Martin Kulldorff, Scott Hostovich, David G Stinchcomb, Zaria Tatalovich, Denise Riedel Lewis, Eric J Feuer.
Abstract
BACKGROUND: Spatial and space-time scan statistics are widely used in disease surveillance to identify geographical areas of elevated disease risk and for the early detection of disease outbreaks. With a scan statistic, a scanning window of variable location and size moves across the map to evaluate thousands of overlapping windows as potential clusters, adjusting for multiple testing. Almost always, the method finds many very similar overlapping clusters, and it is not useful to report all of them. This paper proposes using the Gini coefficient to help select which of the many overlapping clusters to report.
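To make the scanning-window idea concrete, here is a minimal Python sketch assuming Kulldorff's discrete Poisson model, in which expected counts are proportional to population. The regions are placed on a line (a hypothetical simplification: the windows become intervals rather than circles on a map), each window grows around a centre region until it would exceed a maximum fraction of the total population (the analogue of the maximum spatial window size), and every window is scored by its log likelihood ratio. The names `poisson_llr` and `scan_windows` are illustrative and are not SaTScan's API; the sketch also omits the Monte Carlo replications used to adjust for multiple testing.

```python
import math
from typing import List, Optional, Tuple


def poisson_llr(c: float, e: float, total: float) -> float:
    """Poisson log likelihood ratio for a window with c observed and
    e expected cases; total is the total case count (= total expected)."""
    if c <= e:
        return 0.0  # only windows with an excess of cases are scored
    llr = c * math.log(c / e)
    if c < total:  # contribution of the area outside the window
        llr += (total - c) * math.log((total - c) / (total - e))
    return llr


def scan_windows(
    xs: List[float], cases: List[int], pops: List[int], max_frac: float
) -> Tuple[float, Optional[List[int]]]:
    """Return (best LLR, region indices) over all windows centred on a region
    whose population is at most max_frac of the total population."""
    total_c, total_p = sum(cases), sum(pops)
    best: Tuple[float, Optional[List[int]]] = (0.0, None)
    for xi in xs:
        # grow the window around this centre, nearest regions first
        order = sorted(range(len(xs)), key=lambda j: abs(xs[j] - xi))
        c = p = 0
        window: List[int] = []
        for j in order:
            if (p + pops[j]) / total_p > max_frac:
                break  # window would exceed the maximum spatial window size
            c += cases[j]
            p += pops[j]
            window.append(j)
            llr = poisson_llr(c, total_c * p / total_p, total_c)
            if llr > best[0]:
                best = (llr, sorted(window))
    return best


# Usage: a hypothetical map of five equally populated regions
llr, window = scan_windows([0, 1, 2, 3, 4], [30, 28, 5, 5, 5], [100] * 5, 0.5)
print(round(llr, 2), window)
```

Lowering `max_frac` restricts the search to smaller windows, which is the choice the maximum spatial window size controls.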
Keywords: Cancer mortality; Cluster detection; Cluster reporting size; Disease surveillance; Gini coefficient; Log likelihood ratio; SaTScan; Scan statistic; Spatial statistics
Year: 2016 PMID: 27488416 PMCID: PMC4971627 DOI: 10.1186/s12942-016-0056-6
Source DB: PubMed Journal: Int J Health Geogr ISSN: 1476-072X Impact factor: 3.918
Selection of maximum spatial window size in 81 recent publications
| Maximum spatial window size | <5 % | 10 % | 15 % | 25 % | 30 % | 50 % | Multiple maxima | Distance based | Not specified | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| Number of publications | 5 | 3 | 1 | 5 | 1 | 22 | 8 | 13 | 23 | 81 |
| Percentage of publications | 6 | 4 | 1 | 6 | 1 | 27 | 10 | 16 | 28 | 100 |
Using Google Scholar, we searched for publications from 2015 containing both words “SaTScan” and “cancer”; the search yielded 156 results. Restricting the search to scientific papers published in English in peer-reviewed journals, we found a total of 81 papers using the SaTScan™ software (www.satscan.org). This table summarizes the maximum spatial window size (MSWS) used in these 81 papers. Eight papers (10 %) erroneously (see the reason in the “Maximum spatial window size of reported clusters” section) used multiple MSWS values, with two to four choices per paper and 50 % always among them.
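The percentage row is each count divided by the 81 papers and rounded to a whole percent. A quick Python check, with the counts copied from the table, shows that the rounded percentages sum to 99, so the 100 in the Total column reflects the unrounded total rather than the sum of the rounded cells.

```python
# Publication counts per maximum spatial window size, copied from the table
counts = {
    "<5 %": 5, "10 %": 3, "15 %": 1, "25 %": 5, "30 %": 1, "50 %": 22,
    "Multiple maxima": 8, "Distance based": 13, "Not specified": 23,
}
total = sum(counts.values())  # 81 publications
# Share of publications per category, rounded to a whole percent as in the table
percents = {size: round(100 * n / total) for size, n in counts.items()}
print(total, percents, sum(percents.values()))
```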
Fig. 1 Sizes of clusters by the maximum spatial window size (U.S. Female Lung Cancer Mortality, 2006)
Fig. 2 Spatial clusters and their relative risks at various maximum spatial window sizes (U.S. Female Lung Cancer Mortality, 2006; clusters are shown in colour, with relative risks labelled on the clusters)
Fig. 3 Illustration of the Lorenz curve and Gini coefficient for a cluster model with three clusters
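A minimal Python sketch of a Lorenz curve and Gini coefficient for a set of reported, non-overlapping clusters. It rests on assumptions (our reading, not a quotation of the paper's formula): each cluster and the remainder of the study region form one segment of the curve; segments are ordered by decreasing case rate; population share stands in for share of expected cases, which is equivalent when expected counts are proportional to population; and the Gini coefficient is twice the area between the curve and the diagonal. The function names are hypothetical.

```python
from typing import List, Tuple


def lorenz_points(
    clusters: List[Tuple[float, float]], total_cases: float, total_pop: float
) -> List[Tuple[float, float]]:
    """Cumulative (population share, case share) points of the Lorenz curve.

    clusters holds (cases, population) for each reported, non-overlapping
    cluster; the rest of the study region is added as one final segment."""
    rest = (
        total_cases - sum(c for c, _ in clusters),
        total_pop - sum(p for _, p in clusters),
    )
    segments = [s for s in clusters + [rest] if s[1] > 0]
    # highest case rate first, so the curve bows above the diagonal
    segments.sort(key=lambda s: s[0] / s[1], reverse=True)
    points = [(0.0, 0.0)]
    for cases, pop in segments:
        x, y = points[-1]
        points.append((x + pop / total_pop, y + cases / total_cases))
    return points


def gini_for_clusters(
    clusters: List[Tuple[float, float]], total_cases: float, total_pop: float
) -> float:
    """Gini = 2 x (area under the Lorenz curve) - 1, using the trapezoid rule."""
    pts = lorenz_points(clusters, total_cases, total_pop)
    area = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return 2 * area - 1


# Hypothetical example: three clusters in a region with 1,000 cases and 100,000 people
clusters = [(60, 3000), (40, 2500), (30, 2000)]
print(round(gini_for_clusters(clusters, 1000, 100_000), 3))
```

Reporting no clusters gives a straight diagonal and a Gini of 0; clusters with strongly elevated rates pull the curve upward and increase it.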
Cancer sites in the actual U.S. cancer mortality data (data years ending in 2006)
| Cancer site | Total number of deaths | Years of data aggregation |
|---|---|---|
| Lung, male | 88,791 | 2006 |
| Lung, female | 69,037 | 2006 |
| Breast | 40,600 | 2006 |
| Prostate | 28,256 | 2006 |
| Ovary | 14,781 | 2002–2006 |
| Bladder, male | 9,368 | 2000–2006 |
| Bladder, female | 4,049 | 2000–2006 |
| Cervical | 3,953 | 2000–2006 |
Fig. 4 Simulated cluster configurations with one large cluster and three smaller clusters: (a) urban center with small rural clusters; (b) rural center with small urban clusters
Optimal maximum reported cluster size (MRCS, in percent of population) chosen by the Gini coefficient for actual cancer mortality data and simulated cancer mortality data
| Cancer site | Year(s) | Actual data^a | Simulated data |
|---|---|---|---|
| Male lung | 2006 | 50 (10) | 10 |
| Female lung | 2006 | 15 (10) | 15 |
| Breast | 2006 | 30 (25) | 30 |
| Prostate | 2006 | 10 (2) | 10 |
| Ovary | 2002–2006 | 25 (30) | 25 |
| Male bladder | 2000–2006 | 5 (10) | 5 |
| Female bladder | 2000–2006 | 50 (15) | 15 |
| Cervical | 2000–2006 | 5 (10) | 5 |
^a Numbers in parentheses are the second-best MRCS chosen by the Gini coefficient.
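The optimal MRCS in this table is the candidate size with the highest Gini coefficient, and the value in parentheses is the runner-up. A minimal Python sketch of that ranking step, assuming the Gini coefficient has already been computed for each candidate MRCS; the tie-break toward the smaller MRCS is our own assumption, and the Gini values below are made-up placeholders, not results from the paper.

```python
from typing import Dict, List


def rank_mrcs(gini_by_mrcs: Dict[float, float], top: int = 2) -> List[float]:
    """Return candidate MRCS values (percent of population), highest Gini first.

    Ties are broken in favour of the smaller MRCS (an assumption)."""
    return sorted(gini_by_mrcs, key=lambda m: (-gini_by_mrcs[m], m))[:top]


# Placeholder Gini values for illustration only
example = {1: 0.05, 2: 0.07, 5: 0.11, 10: 0.14, 15: 0.16, 25: 0.12, 50: 0.09}
best, second = rank_mrcs(example)
print(f"optimal MRCS: {best} %, second best: {second} %")
```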
Fig. 5 Values of Gini and CLIC at various maximum spatial window sizes for U.S. Female Lung Cancer Mortality, 2006
Optimal MRCS identified by the Gini coefficient in the Northeastern USA benchmark data
| Cluster location | Counties in cluster | Population in cluster (%) | Optimal MRCS (%), 600 cases | Optimal MRCS (%), 6000 cases |
|---|---|---|---|---|
| Rural | 1 | 0.01 | 1 | 1 |
| Rural | 4 | 0.5 | 1 | 1 |
| Rural | 8 | 0.7 | 1 | 2 |
| Rural | 16 | 1.2 | 1 | 2 |
| Urban | 1 | 2.7 | 3 | 3 |
| Urban | 4 | 3.6 | 10 | 10 |
| Urban | 8 | 10.0 | 20 | 25 |
| Urban | 16 | 25.8 | 40 | 30 |
| Mixed | 1 | 2.4 | 6 | 3 |
| Mixed | 4 | 2.8 | 5 | 5 |
| Mixed | 8 | 3.8 | 6 | 6 |
| Mixed | 16 | 5.7 | 6 | 6 |
Comparison of the Gini coefficient and the hierarchical cluster reporting criteria using the simulated cluster configurations in the Northeastern USA
Configuration (A): urban centre with small rural clusters; configuration (B): rural centre with small urban clusters.
| | (A) One large cluster | (A) Three small clusters | (B) One large cluster | (B) Three small clusters |
|---|---|---|---|---|
| # counties | 57 | 11 | 40 | 6 |
| Total population in clusters (%) | 4,344,150 (14.7) | 681,984 (2.3) | 2,736,674 (9.3) | 780,451 (2.6) |
| Relative risk | 1.18 | 1.46 | 1.23 | 1.43 |
| Gini: 1 cluster reported | 90 % | 0 % | 76 % | 15 % |
| Gini: 2 clusters reported | 1 % | 12 % | 13 % | 28 % |
| Gini: 3 clusters reported | 0 % | 85 % | 4 % | 56 % |
| Gini: 4+ clusters reported | 9 % | 2 % | 7 % | 1 % |
| Hierarchical: 1 cluster reported | 100 % | 2 % | 100 % | 64 % |
| Hierarchical: 2 clusters reported | 0 % | 52 % | 0 % | 25 % |
| Hierarchical: 3 clusters reported | 0 % | 46 % | 0 % | 11 % |
| Hierarchical: 4+ clusters reported | 0 % | 0 % | 0 % | 0 % |
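For comparison, the hierarchical reporting criterion can be sketched as a greedy selection: report the window with the highest log likelihood ratio, then repeatedly report the next-best window that shares no region with any cluster already reported. This Python sketch assumes that reading of the hierarchical criterion (as we understand it, it matches SaTScan's "no geographical overlap" option for secondary clusters). It ignores statistical significance, and the candidate windows and county labels are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Window = Tuple[float, FrozenSet[str]]  # (log likelihood ratio, region IDs)


def hierarchical_report(candidates: List[Window]) -> List[Window]:
    """Keep windows in decreasing LLR order, skipping any window that
    overlaps a cluster already reported."""
    reported: List[Window] = []
    covered: Set[str] = set()
    for llr, regions in sorted(candidates, key=lambda w: w[0], reverse=True):
        if covered.isdisjoint(regions):
            reported.append((llr, regions))
            covered |= regions
    return reported


# Hypothetical candidate windows over counties A-F
candidates = [
    (25.0, frozenset({"A", "B", "C"})),
    (24.1, frozenset({"A", "B"})),  # overlaps the first window
    (9.3, frozenset({"E"})),
    (8.7, frozenset({"E", "F"})),  # overlaps the window {E}
    (4.2, frozenset({"D"})),
]
for llr, regions in hierarchical_report(candidates):
    print(llr, sorted(regions))
```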