Abstract
The spatial scan statistic is an important tool for spatial cluster detection. There have been numerous studies on scanning window shapes. However, little research has been done on the maximum scanning window size or maximum reported cluster size (MRCS). Recently, Han et al. proposed using the Gini coefficient to optimize the maximum reported cluster size. However, the method has been developed and evaluated only for the Poisson model. We adapt the Gini coefficient to the spatial scan statistic for ordinal data to determine the optimal maximum reported cluster size. Through a simulation study and an application to a real data example, we evaluate the performance of the proposed approach. With some sophisticated modification, the Gini coefficient can be effectively employed for the ordinal model. The Gini coefficient most often picked optimal maximum reported cluster sizes that were the same as or smaller than the true cluster sizes, with very high accuracy. It seems that we can obtain a more refined collection of clusters by using the Gini coefficient. The Gini coefficient developed specifically for the ordinal model can be useful for optimizing the maximum reported cluster size for ordinal data and helpful for properly and informatively discovering cluster patterns.
Year: 2017 PMID: 28753674 PMCID: PMC5533428 DOI: 10.1371/journal.pone.0182234
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
A hypothetical example of cluster detection analysis results for three different maximum reported cluster sizes (MRCS) and the corresponding values of the Gini coefficient (see Fig 1).
| MRCS | Cluster | # Obs | # Obs in each category | LLR | Gini |
|---|---|---|---|---|---|
| 40% | 1 | 400 | (10, 50, 160, 180) | 194.33 | 0.124 |
| 30% | 1 | 300 | (5, 35, 80, 180) | 180.16 | 0.176 |
| | 2 | 200 | (10, 30, 70, 90) | 55.09 | |
| 20% | 1 | 200 | (5, 5, 40, 150) | 173.21 | 0.163 |
| | 2 | 100 | (5, 10, 30, 55) | 35.13 | |
| | 3 | 100 | (10, 10, 30, 50) | 24.28 | |
# Obs in each category, number of observations in each category; LLR, log-likelihood ratio.
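Once each candidate MRCS has a single Gini value, the choice in the hypothetical example reduces to an argmax. The sketch below assumes that the rule proposed by Han et al. for the Poisson model (take the MRCS whose reported clusters give the largest Gini coefficient) carries over unchanged to the ordinal model; `pick_optimal_mrcs` is a hypothetical helper, and the values are those of the hypothetical example above.

```python
from typing import Dict

# Gini coefficients from the hypothetical example, keyed by MRCS (% of total).
GINI_BY_MRCS: Dict[int, float] = {40: 0.124, 30: 0.176, 20: 0.163}


def pick_optimal_mrcs(gini_by_mrcs: Dict[int, float]) -> int:
    """Return the MRCS whose set of reported clusters has the largest Gini coefficient.

    Assumption: the max-Gini selection rule of the Poisson-model method applies as-is.
    """
    return max(gini_by_mrcs, key=gini_by_mrcs.get)


print(f"Optimal MRCS: {pick_optimal_mrcs(GINI_BY_MRCS)}%")  # Optimal MRCS: 30%
```

Here the 30% setting wins because its two clusters (Gini 0.176) separate the data more sharply than either the single large 40% cluster (0.124) or the three smaller 20% clusters (0.163).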
Fig 1. Lorenz curves for the ordinal model constructed from the hypothetical example of cluster detection analysis results for three different MRCS (see Table 1).
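The ordinal-model Lorenz curves in Fig 1 rely on a modification that this record does not spell out, so the sketch below shows only a generic Lorenz-curve Gini coefficient for count data, as an assumption about the general idea rather than the paper's ordinal formula. Each reported cluster and the remainder of the study region form one unit with a population and a case count; units are ordered by their case-to-population ratio; and the Gini coefficient is one minus twice the area under the resulting curve. The function `lorenz_gini` and the example numbers are hypothetical.

```python
from typing import Sequence, Tuple


def lorenz_gini(units: Sequence[Tuple[float, float]]) -> float:
    """Gini coefficient of a Lorenz curve built from (population, cases) pairs.

    Units are sorted by case-to-population ratio (ascending); cumulative shares of
    population (x) and cases (y) trace the Lorenz curve, and
    Gini = 1 - 2 * (area under the curve), with the area from the trapezoidal rule.
    """
    ordered = sorted(units, key=lambda u: u[1] / u[0])
    total_pop = sum(p for p, _ in ordered)
    total_cases = sum(c for _, c in ordered)
    gini = 1.0
    x_prev = y_prev = 0.0
    for pop, cases in ordered:
        x = x_prev + pop / total_pop
        y = y_prev + cases / total_cases
        gini -= (x - x_prev) * (y + y_prev)  # subtract twice this trapezoid's area
        x_prev, y_prev = x, y
    return gini


# Hypothetical: one reported cluster (100 people, 30 cases) plus the rest of the
# region (900 people, 70 cases).
print(round(lorenz_gini([(100, 30), (900, 70)]), 3))  # prints 0.2
```

When every unit has the same ratio, the curve lies on the diagonal and the coefficient is 0; the more concentrated the cases are in the reported clusters, the larger it becomes.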
Fig 2. Three simulated cluster models.
(a) model 1: a single circular cluster, (b) model 2: a single elliptic cluster, and (c) model 3: two clusters slightly apart from each other.
Simulation results of cluster model 1 (10% and 40% of the total cases in the true cluster).
Only maximum reported cluster sizes chosen by the Gini coefficient at least once are shown. The size chosen most often as the optimal maximum size in each setting is shown in bold.

| Maximum reported cluster size (%) | 2 | 3 | 4 | 5 | 6 | 8 | 10 | 12 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50 | Default |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Scenario A |
| 200 cases |
| # of OMRCS | 0 | 3 | 0 | 36 | 40 | 109 | **729** | 5 | 36 | 24 | 15 | 2 | 1 | 0 | 0 | 0 | |
| Sensitivity | - | 0.67 | - | 0.65 | 0.64 | 0.77 | 1.00 | 0.80 | 0.99 | 0.99 | 1.00 | 1.00 | 1.00 | - | - | - | 0.96 |
| PPV | - | 1.00 | - | 1.00 | 0.98 | 0.98 | 1.00 | 0.65 | 0.75 | 0.58 | 0.50 | 0.43 | 0.38 | - | - | - | 0.97 |
| 800 cases |
| # of OMRCS | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 134 | 109 | 189 | 3 | **492** | 55 | 18 | |
| Sensitivity | - | - | - | - | - | - | - | - | - | 0.98 | 0.98 | 1.00 | 0.89 | 1.00 | 1.00 | 1.00 | 0.99 |
| PPV | - | - | - | - | - | - | - | - | - | 0.99 | 0.99 | 0.99 | 0.62 | 1.00 | 0.71 | 0.52 | 0.96 |
| Scenario B |
| 200 cases |
| # of OMRCS | 26 | **120** | 17 | 16 | 15 | 31 | 2 | 19 | 20 | 23 | 16 | 9 | 11 | 8 | 11 | 12 | |
| Sensitivity | 0.17 | 0.33 | 0.10 | 0.23 | 0.33 | 0.65 | 0.00 | 0.61 | 0.70 | 0.74 | 0.92 | 1.00 | 0.78 | 0.92 | 0.88 | 0.89 | 0.72 |
| PPV | 0.50 | 0.94 | 0.29 | 0.66 | 1.00 | 0.95 | 0.00 | 0.61 | 0.57 | 0.49 | 0.46 | 0.42 | 0.28 | 0.28 | 0.25 | 0.22 | 0.56 |
| 800 cases |
| # of OMRCS | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 5 | 0 | 0 | 4 | 41 | 85 | **442** | 191 | 231 | |
| Sensitivity | - | - | - | - | - | - | 0.33 | 0.33 | - | - | 0.33 | 0.67 | 0.67 | 0.99 | 1.00 | 1.00 | 0.94 |
| PPV | - | - | - | - | - | - | 1.00 | 1.00 | - | - | 1.00 | 1.00 | 0.86 | 0.99 | 0.68 | 0.51 | 0.82 |
| Scenario C |
| 200 cases |
| # of OMRCS | 0 | 9 | 3 | 11 | 14 | 180 | **598** | 13 | 60 | 60 | 28 | 11 | 5 | 3 | 2 | 3 | |
| Sensitivity | - | 0.41 | 0.33 | 0.52 | 0.36 | 0.67 | 1.00 | 0.69 | 0.94 | 0.97 | 1.00 | 1.00 | 1.00 | 1.00 | 0.83 | 1.00 | 0.90 |
| PPV | - | 1.00 | 0.67 | 1.00 | 0.96 | 0.99 | 1.00 | 0.63 | 0.73 | 0.59 | 0.49 | 0.42 | 0.35 | 0.30 | 0.22 | 0.24 | 0.92 |
| 800 cases |
| # of OMRCS | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12 | 10 | 94 | 11 | **717** | 111 | 45 | |
| Sensitivity | - | - | - | - | - | - | - | - | - | 0.67 | 0.70 | 0.97 | 0.79 | 1.00 | 1.00 | 1.00 | 0.99 |
| PPV | - | - | - | - | - | - | - | - | - | 1.00 | 0.87 | 0.98 | 0.89 | 1.00 | 0.71 | 0.52 | 0.94 |
| Scenario D |
| 200 cases |
| # of OMRCS | 0 | 23 | 3 | 14 | 15 | 214 | **450** | 26 | 78 | 71 | 45 | 24 | 13 | 8 | 8 | 8 | |
| Sensitivity | - | 0.33 | 0.33 | 0.43 | 0.38 | 0.67 | 0.99 | 0.67 | 0.93 | 0.95 | 0.99 | 0.99 | 1.00 | 1.00 | 0.96 | 0.96 | 0.81 |
| PPV | - | 1.00 | 0.67 | 1.00 | 0.97 | 0.99 | 1.00 | 0.66 | 0.71 | 0.59 | 0.48 | 0.42 | 0.35 | 0.31 | 0.27 | 0.24 | 0.83 |
| 800 cases |
| # of OMRCS | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 8 | 55 | 16 | **658** | 153 | 103 | |
| Sensitivity | - | - | - | - | - | - | - | - | - | 0.67 | 0.67 | 0.92 | 0.71 | 1.00 | 1.00 | 1.00 | 0.99 |
| PPV | - | - | - | - | - | - | - | - | - | 1.00 | 0.92 | 0.98 | 0.96 | 1.00 | 0.70 | 0.53 | 0.92 |
| Scenario E |
| 200 cases |
| # of OMRCS | 0 | 1 | 0 | 29 | 25 | 130 | **684** | 9 | 47 | 40 | 18 | 5 | 5 | 6 | 1 | 0 | |
| Sensitivity | - | 0.33 | - | 0.59 | 0.57 | 0.71 | 1.00 | 0.67 | 0.95 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | - | 0.92 |
| PPV | - | 1.00 | - | 0.95 | 1.00 | 1.00 | 1.00 | 0.67 | 0.74 | 0.60 | 0.49 | 0.43 | 0.37 | 0.31 | 0.27 | - | 0.94 |
| 800 cases |
| # of OMRCS | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 36 | 19 | 37 | 4 | **767** | 115 | 22 | |
| Sensitivity | - | - | - | - | - | - | - | - | - | 1.00 | 0.97 | 0.97 | 0.83 | 1.00 | 1.00 | 1.00 | 1.00 |
| PPV | - | - | - | - | - | - | - | - | - | 1.00 | 0.97 | 1.00 | 0.75 | 1.00 | 0.71 | 0.52 | 0.95 |
# of OMRCS, number of times each size was chosen as the optimal maximum reported cluster size (OMRCS) by the Gini coefficient among 1000 random data sets; PPV, positive predictive value.
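The most frequently chosen size, and the abstract's claim that the Gini coefficient usually picks a size equal to or smaller than the true cluster size, can both be read off the "# of OMRCS" rows. The sketch below does this for Scenario A. It assumes, following the table title, that the 200-case and 800-case settings correspond to true clusters holding 10% and 40% of all cases; `modal_size` and `share_at_or_below` are hypothetical helpers.

```python
# "# of OMRCS" rows for Scenario A of cluster model 1, copied from the table above.
SIZES = [2, 3, 4, 5, 6, 8, 10, 12, 15, 20, 25, 30, 35, 40, 45, 50]  # MRCS columns (%)
SCENARIO_A = {
    200: [0, 3, 0, 36, 40, 109, 729, 5, 36, 24, 15, 2, 1, 0, 0, 0],
    800: [0, 0, 0, 0, 0, 0, 0, 0, 0, 134, 109, 189, 3, 492, 55, 18],
}
# Assumption: 200 cases -> true cluster with 10% of all cases; 800 cases -> 40%.
TRUE_SIZE = {200: 10, 800: 40}


def modal_size(counts, sizes=SIZES):
    """MRCS chosen most often by the Gini coefficient across the simulated data sets."""
    return sizes[counts.index(max(counts))]


def share_at_or_below(counts, true_size, sizes=SIZES):
    """Fraction of data sets whose chosen MRCS does not exceed the true cluster size."""
    hits = sum(c for c, s in zip(counts, sizes) if s <= true_size)
    return hits / sum(counts)


for n_cases, counts in SCENARIO_A.items():
    print(f"{n_cases} cases: modal MRCS {modal_size(counts)}%, "
          f"share <= true size {share_at_or_below(counts, TRUE_SIZE[n_cases]):.3f}")
```

For Scenario A this gives modal sizes of 10% and 40%, with 917 and 927 of the 1000 data sets, respectively, choosing a size no larger than the assumed true size. Note that not every row sums to 1000 (Scenario B with 200 cases sums to 356), so the denominator is taken from the row itself.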
Simulation results of cluster model 2 (20%, 30%, and 40% of the total cases in the true cluster of irregular shape).
Only maximum reported cluster sizes chosen by the Gini coefficient at least once are shown. The size chosen most often as the optimal maximum size in each setting is shown in bold.

| Maximum reported cluster size (%) | 5 | 6 | 8 | 10 | 12 | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50 | Default |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Circular shape |
| 2000 cases |
| # of OMRCS | 28 | **554** | 291 | 10 | 9 | 98 | 0 | 0 | 10 | 0 | 0 | 0 | 0 | |
| Sensitivity | 1.00 | 1.00 | 1.00 | 0.98 | 1.00 | 1.00 | - | - | 1.00 | - | - | - | - | 0.99 |
| PPV | 1.00 | 1.00 | 1.00 | 0.77 | 0.78 | 1.00 | - | - | 0.71 | - | - | - | - | 0.74 |
| 3000 cases |
| # of OMRCS | | | **556** | 27 | 317 | 8 | 92 | 0 | 0 | 0 | 0 | 0 | 0 | |
| Sensitivity | | | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | - | - | - | - | - | - | 0.99 |
| PPV | | | 1.00 | 1.00 | 1.00 | 0.50 | 1.00 | - | - | - | - | - | - | 0.72 |
| 4000 cases |
| # of OMRCS | 0 | 0 | 0 | 29 | **536** | 231 | 101 | 57 | 36 | 0 | 0 | 0 | 0 | |
| Sensitivity | - | - | - | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | - | - | - | - | 1.00 |
| PPV | - | - | - | 1.00 | 1.00 | 1.00 | 0.99 | 0.99 | 1.00 | - | - | - | - | 0.71 |
| Elliptic shape |
| 2000 cases |
| # of OMRCS | 37 | **672** | 104 | 35 | 5 | 102 | 40 | 5 | 0 | 0 | 0 | 0 | 0 | |
| Sensitivity | 0.99 | 0.99 | 1.00 | 0.98 | 0.92 | 0.99 | 1.00 | 1.00 | - | - | - | - | - | 0.97 |
| PPV | 1.00 | 1.00 | 0.92 | 0.85 | 1.00 | 0.87 | 1.00 | 0.83 | - | - | - | - | - | 0.99 |
| 3000 cases |
| # of OMRCS | 0 | 0 | **663** | 32 | 105 | 14 | 123 | 11 | 46 | 6 | 0 | 0 | 0 | |
| Sensitivity | - | - | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 | 1.00 | - | - | - | 0.99 |
| PPV | - | - | 1.00 | 1.00 | 0.86 | 0.78 | 0.96 | 0.78 | 1.00 | 0.83 | - | - | - | 0.99 |
| 4000 cases |
| # of OMRCS | 0 | 0 | 0 | 33 | **557** | 217 | 16 | 73 | 32 | 0 | 69 | 3 | 0 | |
| Sensitivity | - | - | - | 1.00 | 1.00 | 1.00 | 0.99 | 1.00 | 1.00 | - | 1.00 | 1.00 | - | 1.00 |
| PPV | - | - | - | 1.00 | 1.00 | 0.98 | 0.85 | 0.96 | 0.81 | - | 1.00 | 0.83 | - | 0.97 |
# of OMRCS, number of times each size was chosen as the optimal maximum reported cluster size (OMRCS) by the Gini coefficient among 1000 random data sets; PPV, positive predictive value.
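The simulation tables report sensitivity and PPV but this record does not restate how they are computed. A common region-based version is sketched below as an assumption. The paper may instead weight regions by cases or population and average over the 1000 data sets. The function `sensitivity_ppv` and the region IDs are hypothetical.

```python
from typing import Set, Tuple


def sensitivity_ppv(detected: Set[int], true: Set[int]) -> Tuple[float, float]:
    """Region-level sensitivity and positive predictive value of one detected cluster.

    sensitivity = |detected & true| / |true|      (share of the true cluster recovered)
    PPV         = |detected & true| / |detected|  (share of the detected cluster that is real)
    """
    if not true or not detected:
        raise ValueError("both region sets must be non-empty")
    overlap = len(detected & true)
    return overlap / len(true), overlap / len(detected)


# Hypothetical example: the true cluster covers regions 0-9.
true_cluster = set(range(10))
print(sensitivity_ppv(set(range(20)), true_cluster))  # (1.0, 0.5): cluster reported too large
print(sensitivity_ppv(set(range(8)), true_cluster))   # (0.8, 1.0): cluster reported too small
```

The first case mirrors the pattern in the tables, where large maximum sizes keep sensitivity near 1.00 while PPV drops, because the reported cluster absorbs regions outside the true cluster.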
Simulation results of cluster model 3 (15% and 20% of the total cases in each of two clusters slightly apart from each other).
Only maximum reported cluster sizes chosen by the Gini coefficient at least once are shown. The size chosen most often as the optimal maximum size in each setting is shown in bold.

| Maximum reported cluster size (%) | 15 | 20 | 25 | 30 | 35 | 40 | 45 | 50 | Default |
|---|---|---|---|---|---|---|---|---|---|
| Circular shape |
| # of OMRCS | **942** | 57 | 0 | 0 | 0 | 1 | 0 | 0 | |
| Sensitivity | 1.00 | 1.00 | - | - | - | 1.00 | - | - | 1.00 |
| PPV | 1.00 | 1.00 | - | - | - | 0.38 | - | - | 0.49 |
| Elliptic shape |
| # of OMRCS | **940** | 49 | 0 | 0 | 0 | 11 | 0 | 0 | |
| Sensitivity | 1.00 | 1.00 | - | - | - | 1.00 | - | - | 1.00 |
| PPV | 1.00 | 1.00 | - | - | - | 0.68 | - | - | 0.66 |
# of OMRCS, number of times each size was chosen as the optimal maximum reported cluster size (OMRCS) by the Gini coefficient among 1000 random data sets; PPV, positive predictive value.
Fig 3. Clusters with high rates of the higher birth order category in Seoul, Korea, identified using (a) the Gini coefficient (MRCS of 12%) and (b) the default setting (MRCS of 50%).
Cluster detection analysis results for birth order data in Seoul, Korea, using the elliptic window shape (see Fig 3).
| MRCS | Cluster | # Districts | # Obs in each category | LLR | p-value |
|---|---|---|---|---|---|
| 12% (chosen by Gini) | 1 | 3 | (4894, 3348, 720) | 19.23 | 0.001 |
| | 2 | 3 | (5352, 3642, 737) | 14.54 | 0.001 |
| | 3 | 4 | (5554, 3287, 794) | 10.52 | 0.013 |
| | 4 | 2 | (5565, 3719, 729) | 8.95 | 0.024 |
| 50% | 1 | 9 | (15806, 10493, 2192) | 40.00 | 0.001 |
| | 2 | 3 | (5352, 3642, 737) | 14.54 | 0.001 |
# Districts, number of districts; # Obs in each category, number of observations in each category; LLR, log-likelihood ratio.
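To make the category counts in the birth order table easier to compare, the sketch below converts each cluster's counts into the proportion falling in the last category. It assumes that the last entry of each tuple is the highest birth order category, which this record does not state explicitly; `highest_category_share` is a hypothetical helper. This is descriptive only: whether a cluster is significant comes from the ordinal LLR, not from this share alone.

```python
# "# Obs in each category" from the birth order table, keyed by (MRCS, cluster).
# Assumption: the last entry of each tuple is the highest birth order category.
CLUSTERS = {
    ("12%", 1): (4894, 3348, 720),
    ("12%", 2): (5352, 3642, 737),
    ("12%", 3): (5554, 3287, 794),
    ("12%", 4): (5565, 3719, 729),
    ("50%", 1): (15806, 10493, 2192),
    ("50%", 2): (5352, 3642, 737),
}


def highest_category_share(counts):
    """Proportion of a cluster's observations that fall in the last (highest) category."""
    return counts[-1] / sum(counts)


for (mrcs, cluster), counts in CLUSTERS.items():
    print(f"MRCS {mrcs}, cluster {cluster}: n = {sum(counts)}, "
          f"highest-category share = {highest_category_share(counts):.3f}")
```

Under this assumption, the single nine-district cluster found with the default 50% setting pools 28,491 observations with a highest-category share of about 0.077, while the four smaller clusters found with the Gini-chosen 12% setting range from about 0.073 to 0.082; cluster 2 is identical under both settings.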