Yue Ma¹, Fei Yin¹, Tao Zhang¹, Xiaohua Andrew Zhou², Xiaosong Li¹.
Abstract
Spatial scan statistics are widely used in various fields. Their performance is influenced by parameters such as the maximum spatial cluster size, and it can be improved by selecting these parameters with the help of performance measures. Current performance measures depend on the presence of known clusters and are therefore inapplicable to data sets without known clusters. In this work, we propose a novel overall performance measure, the maximum clustering set-proportion (MCS-P), which is based on the likelihood of the union of the detected clusters and the applied data set. In a simulation study, MCS-P was compared with existing performance measures for selecting the maximum spatial cluster size. Results of the other performance measures, such as sensitivity and misclassification, suggest that the spatial scan statistic achieves accurate results in most scenarios when the maximum spatial cluster size is selected using MCS-P. Because the proposed strategy does not require previously known clusters, selecting the optimal maximum cluster size with MCS-P can improve the performance of the scan statistic in applications without identified clusters.
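To make the role of the maximum spatial cluster size concrete, here is a minimal sketch of a Poisson circular scan in the style of Kulldorff's spatial scan statistic. It assumes planar region centroids with Euclidean distances, scores only high-rate zones with the Poisson log-likelihood ratio, and omits the Monte Carlo significance test. The function names and toy data are hypothetical, and this is not the authors' implementation. The `max_pop_fraction` argument stands in for the maximum spatial cluster size, assumed here to be expressed as a fraction of the total population (as in SaTScan's percentage-of-population setting).

```python
import math
from typing import List, Sequence, Tuple


def poisson_llr(c: float, e: float, total_cases: float) -> float:
    """Poisson log-likelihood ratio of one zone (high-rate clusters only).

    c: observed cases in the zone, e: cases expected in the zone under H0,
    total_cases: total number of cases C in the study region.
    """
    if c <= e:
        return 0.0  # zones without an excess of cases are not scored
    out_c, out_e = total_cases - c, total_cases - e
    llr = c * math.log(c / e)
    if out_c > 0:  # the outside term vanishes when every case lies in the zone
        llr += out_c * math.log(out_c / out_e)
    return llr


def circular_scan(coords: Sequence[Tuple[float, float]],
                  cases: Sequence[float],
                  pops: Sequence[float],
                  max_pop_fraction: float) -> Tuple[List[int], float]:
    """Return the most likely cluster (region indices) and its LLR.

    Circles are grown around each region centroid by adding nearest neighbours
    until the zone population would exceed max_pop_fraction of the total.
    """
    total_cases, total_pop = sum(cases), sum(pops)
    best_zone: List[int] = []
    best_llr = 0.0
    for xi, yi in coords:
        # All regions, ordered by squared distance from the current centre.
        order = sorted(range(len(coords)),
                       key=lambda j: (coords[j][0] - xi) ** 2 + (coords[j][1] - yi) ** 2)
        zone: List[int] = []
        c = p = 0.0
        for j in order:
            if p + pops[j] > max_pop_fraction * total_pop:
                break  # the maximum spatial cluster size caps the circle here
            zone.append(j)
            c += cases[j]
            p += pops[j]
            e = total_cases * p / total_pop  # expected cases under H0
            llr = poisson_llr(c, e, total_cases)
            if llr > best_llr:
                best_zone, best_llr = list(zone), llr
    return best_zone, best_llr


# Toy data (hypothetical): five equally populated regions on a line.
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
cases = [10, 8, 2, 1, 1]
pops = [100, 100, 100, 100, 100]
print(circular_scan(coords, cases, pops, 0.2))  # single-region zones only
print(circular_scan(coords, cases, pops, 0.5))  # zones of up to two regions
```

With the toy data, a 20% limit confines the scan to single regions and returns region 0, whereas a 50% limit lets the circle absorb region 1 and returns the larger zone {0, 1}. Choosing this limit well is the problem that MCS-P is intended to address.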
Year: 2016; PMID: 26820646; PMCID: PMC4731069; DOI: 10.1371/journal.pone.0147918
Source DB: PubMed; Journal: PLoS One; ISSN: 1932-6203; Impact factor: 3.240
Simulated cluster models.
| Cluster size | Quantity | Rural, 600 cases | Mixed, 600 cases | Urban, 600 cases | Two areas, 600 cases | Three areas, 600 cases | Rural, 6000 cases | Mixed, 6000 cases | Urban, 6000 cases | Two areas, 6000 cases | Three areas, 6000 cases |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | E(c/H₁) | 10 | 39 | 42 | 52 | 91 | 13 | 208 | 226 | 239 | 447 |
| | E(c/H₀) | 0.05 | 14.43 | 15.97 | 16.02 | 30.45 | 0.5 | 144.3 | 159.7 | 160.2 | 304.5 |
| | RR | 192.89 | 2.85 | 2.73 | 3.24 | 2.99 | 23.73 | 1.45 | 1.43 | 1.51 | 1.51 |
| | Population | 2675 | 710196 | 786178 | 788853 | 1499049 | 2675 | 710196 | 786178 | 788853 | 1499049 |
| 2 | E(c/H₁) | 12 | 42 | 50 | 62 | 104 | 23 | 231 | 293 | 316 | 547 |
| | E(c/H₀) | 0.46 | 16.41 | 21.78 | 22.24 | 38.65 | 4.6 | 164.1 | 217.8 | 222.4 | 386.5 |
| | RR | 27.03 | 2.70 | 2.43 | 2.79 | 2.68 | 4.96 | 1.42 | 1.36 | 1.44 | 1.45 |
| | Population | 22911 | 817050 | 1072181 | 1095092 | 1912142 | 22911 | 817050 | 1072181 | 1095092 | 1912142 |
| 4 | E(c/H₁) | 18 | 51 | 100 | 118 | 169 | 59 | 302 | 716 | 775 | 1077 |
| | E(c/H₀) | 2.69 | 22.52 | 59.99 | 62.68 | 85.2 | 26.9 | 225.2 | 599.9 | 626.8 | 852 |
| | RR | 7.05 | 2.40 | 1.81 | 1.88 | 1.98 | 2.21 | 1.36 | 1.22 | 1.27 | 1.32 |
| | Population | 132343 | 1108440 | 2953077 | 3085420 | 4193860 | 132343 | 1108440 | 2953077 | 3085420 | 4193860 |
| 8 | E(c/H₁) | 22 | 58 | 150 | 172 | 230 | 80 | 358 | 1162 | 1242 | 1600 |
| | E(c/H₀) | 4.16 | 27.47 | 101.96 | 106.12 | 133.59 | 41.6 | 275.7 | 1019.6 | 1061.2 | 1336.9 |
| | RR | 5.35 | 2.24 | 1.63 | 1.62 | 1.72 | 1.92 | 1.32 | 1.17 | 1.21 | 1.27 |
| | Population | 204829 | 1352284 | 5018909 | 5223738 | 6576022 | 204829 | 1352284 | 5018909 | 5223738 | 6576022 |
| 16 | E(c/H₁) | 28 | 67 | 209 | 237 | 304 | 121 | 434 | 1713 | 1834 | 2268 |
| | E(c/H₀) | 7.32 | 34.22 | 154.94 | 162.26 | 196.48 | 73.2 | 342.2 | 1549.4 | 1622.6 | 1964.8 |
| | RR | 3.9 | 2.1 | 1.53 | 1.46 | 1.55 | 1.66 | 1.29 | 1.15 | 1.19 | 1.25 |
| | Population | 360275 | 1684327 | 7627173 | 7987448 | 9671775 | 360275 | 1684327 | 7627173 | 7987448 | 9671775 |
Note: E(c/H₁) and E(c/H₀) are the expected numbers of cases under the alternative and null hypotheses, respectively, and RR is the relative risk.
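The quantities in the note can be computed directly. This is a minimal sketch, assuming that cases expected under H₀ are proportional to population and that RR follows the usual Poisson scan definition (risk inside the cluster divided by risk outside it, as reported by SaTScan); the paper's exact computation is not given in this excerpt, and the function names are hypothetical.

```python
def expected_under_null(total_cases: float, zone_pop: float, total_pop: float) -> float:
    """E(c/H0): cases expected in a zone when risk is uniform across the region."""
    return total_cases * zone_pop / total_pop


def relative_risk(c: float, e: float, total_cases: float) -> float:
    """RR = (c / e) / ((C - c) / (C - e)): risk inside the zone over risk outside."""
    return (c / e) / ((total_cases - c) / (total_cases - e))


# Size-1 mixed-area cluster with 600 cases: E(c/H1) = 39, E(c/H0) = 14.43.
print(round(relative_risk(39, 14.43, 600), 2))  # 2.82
```

The rounded table entries give about 2.82 against the listed 2.85, a gap plausibly due to rounding of E(c/H₁) and E(c/H₀). Because E(c/H₀) is proportional to population, it is additive over regions, which is why each "Two areas" column equals the rural plus urban columns (for example, 0.05 + 15.97 = 16.02 for cluster size 1 with 600 cases).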
Agreements of MCS-P with the other performance measures in different scenarios.
| Cluster size | Measure | Rural, 600 cases | Rural, 6000 cases | Mixed, 600 cases | Mixed, 6000 cases | Urban, 600 cases | Urban, 6000 cases | Rural and urban, 600 cases | Rural and urban, 6000 cases | Rural, mixed and urban, 600 cases | Rural, mixed and urban, 6000 cases |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Sensitivity | 0.9624 | 0.9678 | 0.9824 | 0.9658 | 0.9802 | 0.9810 | | | | |
| | PPV | 1.0000 | 0.9900 | 0.6728 | 0.7824 | 0.8828 | 0.8734 | | | | |
| | Misclassification | 1.0000 | 0.9836 | 0.8288 | 0.9034 | 0.9112 | 0.9386 | | | | |
| 2 | Sensitivity | 0.9558 | 0.9494 | 0.9470 | 0.9566 | 0.9862 | 0.9856 | 0.6150 | 0.7908 | 0.8626 | |
| | PPV | 0.9688 | 0.9900 | 0.7034 | 0.6758 | 0.9036 | 0.8668 | 0.6572 | 0.4590 | 0.4658 | |
| | Misclassification | 0.9664 | 0.9518 | 0.8620 | 0.8542 | 0.9066 | 0.8730 | 0.6652 | 0.7552 | 0.8092 | |
| 4 | Sensitivity | 0.9928 | 0.9704 | 0.9322 | 0.9244 | 0.9346 | 0.9082 | 0.7326 | 0.7850 | 0.8418 | 0.8626 |
| | PPV | 0.9706 | 0.9606 | 0.4798 | 0.5008 | 0.7498 | 0.7742 | 0.6762 | 0.7354 | 0.5296 | 0.5186 |
| | Misclassification | 0.9612 | 0.9328 | 0.8172 | 0.8316 | 0.8054 | 0.8182 | 0.7208 | 0.7756 | 0.8250 | 0.8338 |
| 8 | Sensitivity | 0.9928 | 0.9432 | 0.9298 | 0.9488 | 0.9418 | 0.9342 | 0.8192 | 0.7738 | 0.8622 | 0.8564 |
| | PPV | 0.9706 | 0.7952 | 0.3886 | 0.4826 | 0.6186 | 0.6016 | 0.5674 | 0.5958 | 0.6138 | 0.4810 |
| | Misclassification | 0.9612 | 0.8656 | 0.8130 | 0.8670 | 0.7584 | 0.7408 | 0.7580 | 0.6970 | 0.8190 | 0.8112 |
| 16 | Sensitivity | 0.9738 | 0.9306 | 0.8946 | 0.9540 | 0.9466 | 0.9472 | 0.8238 | 0.8796 | 0.8402 | 0.8950 |
| | PPV | 0.6316 | 0.7556 | 0.4442 | 0.4056 | 0.5532 | 0.5828 | 0.5114 | 0.5082 | 0.5834 | 0.5304 |
| | Misclassification | 0.9428 | 0.9096 | 0.8046 | 0.8594 | 0.7966 | 0.8318 | 0.7854 | 0.7960 | 0.8086 | 0.8958 |
Note: Scenarios in which MCS-P shows low agreement with the other performance measures are underlined.
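The agreement table compares MCS-P with sensitivity, PPV, and misclassification, which this excerpt does not define. The sketch below therefore uses population-weighted definitions that are common when a scan statistic is evaluated against a known true cluster: sensitivity is the share of the true cluster's population that is detected, PPV is the share of the detected population that lies in the true cluster, and misclassification is the share of the total population placed on the wrong side. Treat these definitions, and the function and region names, as assumptions rather than the paper's exact formulas.

```python
from typing import Dict, Iterable, Set


def performance(detected: Set[str], true: Set[str], pops: Dict[str, float]) -> Dict[str, float]:
    """Population-weighted sensitivity, PPV and misclassification of a detection."""
    def pop(regions: Iterable[str]) -> float:
        return sum(pops[r] for r in regions)

    overlap = pop(detected & true)  # population correctly flagged as clustered
    return {
        "sensitivity": overlap / pop(true) if true else float("nan"),
        "ppv": overlap / pop(detected) if detected else float("nan"),
        # Regions in exactly one of the two sets are misclassified.
        "misclassification": pop(detected ^ true) / pop(pops),
    }


pops = {"a": 100, "b": 200, "c": 300, "d": 400}  # hypothetical regions
print(performance(detected={"b", "c"}, true={"a", "b"}, pops=pops))
```

Here the detected zone overlaps the true cluster only in region "b", so sensitivity is 200/300, PPV is 200/500, and the misclassified population ("a" and "c") is 400 of 1000.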
Average performance measures for different maximum spatial cluster sizes in scenario 6000-three-8 (6000 total simulated cases, clusters in three areas, cluster size 8).
| Maximum spatial cluster size | MCS-P | Sensitivity | PPV | Misclassification |
|---|---|---|---|---|
| 1 | 0.2039 | 0.0308 | 0.9157 | 0.2166 |
| 2 | 0.2074 | 0.0319 | 0.8905 | 0.2168 |
| 3 | 0.2448 | 0.0904 | 0.9251 | 0.2037 |
| 4 | 0.2705 | 0.1405 | | 0.1929 |
| 5 | 0.2771 | 0.1742 | 0.9258 | 0.1860 |
| 6 | 0.2770 | 0.1838 | | 0.1850 |
| 7 | 0.2796 | 0.1894 | 0.9215 | 0.1845 |
| 8 | 0.2957 | 0.2475 | 0.9356 | 0.1713 |
| 9 | 0.3034 | 0.2943 | 0.9363 | 0.1620 |
| 10 | 0.3127 | 0.3598 | | 0.1470 |
| 11 | 0.3141 | 0.3741 | | 0.1439 |
| 12 | 0.3174 | 0.3868 | | 0.1414 |
| 13 | 0.3174 | 0.3881 | 0.9399 | 0.1420 |
| 14 | 0.3178 | 0.4345 | 0.9401 | 0.1322 |
| 15 | 0.3256 | 0.5335 | | 0.1108 |
| 16 | | 0.5632 | | 0.1039 |
| 17 | | 0.6113 | | |
| 18 | | 0.6141 | | |
| 19 | | 0.6232 | | |
| 20 | | 0.6298 | | |
| 21 | | 0.6297 | | |
| 22 | | 0.6307 | 0.9345 | |
| 23 | | | 0.9300 | |
| 24 | | | 0.9193 | |
| 25 | | | 0.9254 | |
| 26 | | | 0.9177 | |
| 27 | | | 0.9177 | |
| 28 | | | 0.9172 | |
| 29 | | | 0.9215 | |
| 30 | | | 0.9205 | |
| 31 | | | 0.9201 | |
| 33–50 | | | 0.9193 | |
Note: Values within 0.01 (1%) of the optimal values are underlined. Boldface values are the optimal results for each performance measure.
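The highlighting rule in the note can be expressed as a small helper. This sketch assumes that larger is better for MCS-P, sensitivity, and PPV, while smaller is better for misclassification, and it treats "close" as an absolute difference below 0.01, as the note states. The helper name and the example values are hypothetical.

```python
from typing import Dict, List, Tuple


def mark_optimal(values: Dict[int, float], maximize: bool = True,
                 tol: float = 0.01) -> Tuple[List[int], List[int]]:
    """Split maximum cluster sizes into optimal (bold) and near-optimal (underlined)."""
    best = max(values.values()) if maximize else min(values.values())
    optimal = sorted(size for size, v in values.items() if v == best)
    near = sorted(size for size, v in values.items()
                  if v != best and abs(v - best) < tol)
    return optimal, near


# Hypothetical misclassification rates, where lower is better.
misclassification = {5: 0.150, 6: 0.145, 7: 0.152, 8: 0.170}
print(mark_optimal(misclassification, maximize=False))  # ([6], [5, 7])
```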
Fig 1. Average MCS-P and other measures in 6000-three-8.
Average performance measures for different maximum spatial cluster sizes in 600-two-1.
| Maximum spatial cluster size | MCS-P | PPV | Sensitivity | Misclassification |
|---|---|---|---|---|
| 1 | 0.0034 | 0.9516 | 0.0268 | |
| 2 | 0.0034 | 0.9710 | 0.0267 | |
| 3 | 0.4160 | 0.8704 | ||
| 4–7 | 0.4131 | 0.9700 | ||
| 8–50 | 0.4121 | 0.9634 |
Note: Values within 0.01 (1%) of the optimal values are underlined. Boldface values are the optimal results for each performance measure.
Average performance measures for different maximum spatial cluster sizes in 6000-two-16.
| Maximum spatial cluster size | MCS-P | PPV | Sensitivity | Misclassification |
|---|---|---|---|---|
| 1 | 0.1851 | | 0.0287 | 0.2630 |
| 2 | 0.2061 | 0.9190 | 0.0381 | 0.2613 |
| 3 | 0.2107 | 0.9191 | 0.0418 | 0.2603 |
| 4 | 0.2127 | 0.9157 | 0.0489 | 0.2587 |
| 5 | 0.2189 | 0.9298 | 0.0644 | 0.2546 |
| 6 | 0.2202 | 0.9307 | 0.0730 | 0.2523 |
| 7 | 0.2217 | 0.9327 | 0.0804 | 0.2503 |
| 8 | 0.2264 | 0.9349 | 0.1042 | 0.2442 |
| 9 | 0.2269 | 0.9384 | 0.1252 | 0.2385 |
| 10 | 0.2245 | 0.9399 | 0.1507 | 0.2321 |
| 11 | 0.2259 | 0.9317 | 0.1667 | 0.2288 |
| 12 | 0.2288 | 0.9381 | 0.1932 | 0.2213 |
| 13 | 0.2296 | 0.9372 | 0.2246 | 0.2132 |
| 14 | 0.2320 | 0.9320 | 0.2496 | 0.2075 |
| 15 | 0.2362 | 0.9354 | 0.2969 | 0.1952 |
| 16 | 0.2403 | 0.9401 | 0.3506 | 0.1804 |
| 17 | 0.2427 | 0.9451 | 0.3832 | 0.1713 |
| 18 | 0.2475 | 0.9464 | 0.4132 | 0.1635 |
| 19 | 0.2514 | 0.9475 | 0.4764 | 0.1468 |
| 20 | 0.2541 | 0.9472 | 0.4889 | 0.1436 |
| 21 | 0.2570 | 0.9516 | 0.5425 | 0.1290 |
| 22 | 0.2618 | 0.9532 | 0.5704 | 0.1212 |
| 23 | 0.2640 | | 0.6367 | 0.1027 |
| 24 | | | 0.6720 | |
| 25 | | | 0.7006 | |
| 26 | | | 0.7068 | |
| 27–28 | | | 0.7150 | |
| 29 | | 0.9457 | 0.7320 | |
| 30 | | 0.9425 | 0.7354 | |
| 31 | | 0.9363 | 0.7363 | |
| 32 | | 0.9348 | 0.7442 | |
| 33 | | 0.9321 | 0.7538 | |
| 34 | | 0.9311 | 0.7538 | |
| 35 | | 0.9312 | 0.7543 | |
| 36 | | 0.9276 | 0.7516 | |
| 37 | | 0.9277 | 0.7529 | |
| 38 | | 0.9249 | 0.7529 | |
| 39–40 | | 0.9254 | | |
| 41–50 | | 0.9238 | | |
Note: Values within 0.01 (1%) of the optimal values are underlined. Boldface values are the optimal results for each performance measure.
Fig 2. Average MCS-P and other measures in 6000-two-16.
The figure shows two stages in the relationship between average MCS-P and the other performance measures. The vertical line marks the cut-off point, that is, the first maximum spatial cluster size at which MCS-P comes close to its optimal value.
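The cut-off marked by the vertical line can be located programmatically: take the smallest maximum spatial cluster size whose MCS-P lies within 0.01 of the best MCS-P. This sketch assumes that larger MCS-P is better, which matches the rising MCS-P columns in the tables, and borrows the 0.01 closeness threshold from the table notes. The function name and the example values are hypothetical.

```python
from typing import Dict


def mcsp_cutoff(mcsp: Dict[int, float], tol: float = 0.01) -> int:
    """Smallest maximum cluster size whose MCS-P is within tol of the best MCS-P."""
    best = max(mcsp.values())
    return min(size for size, value in mcsp.items() if best - value < tol)


# Hypothetical average MCS-P by maximum spatial cluster size.
mcsp = {1: 0.10, 2: 0.15, 3: 0.195, 4: 0.20, 5: 0.199}
print(mcsp_cutoff(mcsp))  # 3
```

Returning the smallest near-optimal size, rather than the size with the single highest MCS-P, is a design choice in this sketch: it keeps the selected scanning window as small as the MCS-P curve allows.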
Average performance measures for different maximum spatial cluster sizes in 600-rural-1.
| Maximum spatial cluster size | MCS-P | PPV | Sensitivity | Misclassification |
|---|---|---|---|---|
| 1–2 | ||||
| 3 | ||||
| 4 | 0.960753 | |||
| 5 | 0.7432 | 0.940789 | ||
| 6 | 0.7404 | 0.9308 | ||
| 7–50 | 0.7447 | 0.940779 |
Note: Values within 0.01 (1%) of the optimal values are underlined. Boldface values are the optimal results for each performance measure.
Fig 3. Incidence of measles in Henan in May 2009, and clusters detected with (a) the default maximum spatial cluster size and (b) the maximum spatial cluster size selected using MCS-P.
The administrative codes of the 42 counties in clusters are labeled, and the RR of each county is shown as a bar.
Counties that differ between the clusters detected using maximum spatial cluster sizes of 2% (Z2) and 50% (Z50).
| County (administrative code) | Cases | Population | RR | Cluster in Z2 | Cluster in Z50 |
|---|---|---|---|---|---|
| 411326 | 2 | 665822 | 0.199677035 | n/a | 2 |
| 410326 | 2 | 406337 | 0.328122839 | n/a | 2 |
| 411524 | 4 | 570738 | 0.467054897 | n/a | 1 |
| 411729 | 8 | 955639 | 0.557153276 | n/a | 1 |
| 411727 | 9 | 799313 | 0.751226184 | n/a | 1 |
| 411323 | 5 | 420871 | 0.793592249 | n/a | 2 |
| 411724 | 9 | 709589 | 0.847050538 | n/a | 1 |
| 411381 | 19 | 1336473 | 0.949869751 | n/a | 2 |
| 411328 | 17 | 1193603 | 0.95170867 | n/a | 2 |
| 411303 | 12 | 839940 | 0.954863968 | n/a | 2 |
| 411329 | 12 | 642526 | 1.250955777 | n/a | 2 |
| 411526 | 14 | 692849 | 1.354691194 | n/a | 1 |
| 411322 | 20 | 913187 | 1.471270547 | n/a | 2 |
| 411527 | 13 | 580652 | 1.501736143 | n/a | 1 |
| 410327 | 20 | 677480 | 1.988301511 | n/a | 2 |
| 410423 | 26 | 859244 | 2.043008744 | n/a | 2 |
| 411321 | 23 | 586748 | 2.648641855 | n/a | 2 |
Counties in boldface are mentioned as examples.