Marianthi Markatou, Yang Chen.
Abstract
One natural way to measure model adequacy is by using statistical distances as loss functions. A related fundamental question is how to construct loss functions that are scientifically and statistically meaningful. In this paper, we investigate non-quadratic distances and their role in assessing the adequacy of a model and/or ability to perform model selection. We first present the definition of a statistical distance and its associated properties. Three popular distances, total variation, the mixture index of fit and the Kullback-Leibler distance, are studied in detail, with the aim of understanding their properties and potential interpretations that can offer insight into their performance as measures of model misspecification. A small simulation study exemplifies the performance of these measures and their application to different scientific fields is briefly discussed.
Keywords: Kullback-Leibler distance; divergence measure; mixture index of fit; model assessment; non-quadratic distance; statistical distance; total variation
Year: 2018 PMID: 33265554 PMCID: PMC7512982 DOI: 10.3390/e20060464
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
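For orientation, the three distances named in the abstract are commonly written as below. This is a sketch of the standard textbook forms; the symbols ($F$, $G$ with densities $f$, $g$, a model family $\mathcal{M}$) and the direction of the Kullback-Leibler distance are assumptions, since this record does not reproduce the paper's own notation.

```latex
% Total variation between distributions F and G with densities f and g
V(F, G) = \sup_{A} \lvert F(A) - G(A) \rvert = \tfrac{1}{2} \int \lvert f(x) - g(x) \rvert \, dx

% Kullback-Leibler distance from g to f (not symmetric in its arguments)
\mathrm{KL}(f, g) = \int f(x) \log \frac{f(x)}{g(x)} \, dx

% Mixture index of fit: smallest fraction of F that lies outside the model family
\pi^{*}(F) = \inf \{ \pi \in [0, 1] : F = (1 - \pi) M + \pi E, \; M \in \mathcal{M}, \; E \text{ an arbitrary distribution} \}
```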
Computer packages for calculating total variation, mixture index of fit, and Kullback-Leibler distances.
| Information | Total Variation | Kullback-Leibler | Mixture Index of Fit |
|---|---|---|---|
| R package | distrEx | bioDist | pistar |
| R function | TotalVarDist | KLD.matrix | pistar.uv |
| Dimension | Univariate | Univariate | Univariate |
| Website | | | |
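Outside of the R packages listed above, the total variation and Kullback-Leibler distances between two univariate densities can be approximated by numerical integration on a grid. The following is a minimal Python sketch under stated assumptions: the model is taken to be N(0, 1), the contaminated distribution is a hypothetical two-component normal mixture with contaminating mean `mu`, and the function names are illustrative. It does not reproduce the estimators or settings used in the paper's simulation study.

```python
import numpy as np


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution evaluated on the array x."""
    z = (x - mean) / sd
    return np.exp(-0.5 * z * z) / (sd * np.sqrt(2.0 * np.pi))


def total_variation(f, g, dx):
    """Half the L1 distance between two densities tabulated on a common grid."""
    return 0.5 * np.sum(np.abs(f - g)) * dx


def kullback_leibler(f, g, dx):
    """Riemann-sum approximation of the integral of f * log(f / g)."""
    mask = f > 0  # points where f is zero contribute nothing
    return np.sum(f[mask] * np.log(f[mask] / g[mask])) * dx


def contamination_distances(eps, mu=10.0):
    """TV and KL between a contaminated normal and the N(0, 1) model.

    Assumption: the contaminated density is (1 - eps) N(0, 1) + eps N(mu, 1).
    """
    x = np.linspace(-15.0, mu + 15.0, 400001)
    dx = x[1] - x[0]
    model = normal_pdf(x)
    truth = (1.0 - eps) * model + eps * normal_pdf(x, mean=mu)
    return total_variation(truth, model, dx), kullback_leibler(truth, model, dx)


for eps in (0.01, 0.1, 0.5):
    tv, kl = contamination_distances(eps)
    print(f"eps={eps:.2f}  TV={tv:.4f}  KL={kl:.4f}")
```

With a well-separated contaminating component, the total variation is close to the contamination fraction itself, whereas the Kullback-Leibler distance is unbounded and grows much faster.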
Means and standard deviations (SD) of the total variation (TV) and Kullback-Leibler (KLD) distances. Data are generated from the model with . The sample size n is 200, 1000, 5000. The number of Monte Carlo replications is 500.
| Contaminating Model | Percentage of Contamination | Summary | TV (n = 200) | KLD (n = 200) | TV (n = 1000) | KLD (n = 1000) | TV (n = 5000) | KLD (n = 5000) |
|---|---|---|---|---|---|---|---|---|
| | 0.01 | Mean | 0.144 | 0.224 | 0.065 | 0.048 | 0.029 | 0.008 |
| | | SD | 0.017 | 0.244 | 0.007 | 0.051 | 0.004 | 0.009 |
| | 0.05 | Mean | 0.146 | 0.255 | 0.069 | 0.065 | 0.034 | 0.017 |
| | | SD | 0.017 | 0.267 | 0.009 | 0.059 | 0.004 | 0.015 |
| | 0.1 | Mean | 0.149 | 0.323 | 0.076 | 0.088 | 0.047 | 0.026 |
| | | SD | 0.017 | 0.343 | 0.009 | 0.073 | 0.005 | 0.018 |
| | 0.2 | Mean | 0.162 | 0.482 | 0.097 | 0.147 | 0.081 | 0.059 |
| | | SD | 0.020 | 0.462 | 0.011 | 0.123 | 0.006 | 0.030 |
| | 0.3 | Mean | 0.181 | 0.616 | 0.128 | 0.215 | 0.117 | 0.102 |
| | | SD | 0.022 | 0.528 | 0.013 | 0.150 | 0.007 | 0.044 |
| | 0.4 | Mean | 0.201 | 0.733 | 0.162 | 0.293 | 0.155 | 0.153 |
| | | SD | 0.024 | 0.616 | 0.014 | 0.176 | 0.007 | 0.058 |
| | 0.5 | Mean | 0.232 | 0.937 | 0.198 | 0.392 | 0.192 | 0.207 |
| | | SD | 0.026 | 0.735 | 0.014 | 0.203 | 0.007 | 0.067 |
| | 0.01 | Mean | 0.149 | 0.577 | 0.070 | 0.338 | 0.034 | 0.231 |
| | | SD | 0.017 | 0.373 | 0.008 | 0.131 | 0.004 | 0.063 |
| | 0.05 | Mean | 0.167 | 1.416 | 0.092 | 1.041 | 0.060 | 0.838 |
| | | SD | 0.020 | 0.499 | 0.009 | 0.248 | 0.004 | 0.138 |
| | 0.1 | Mean | 0.196 | 2.392 | 0.126 | 2.002 | 0.103 | 1.731 |
| | | SD | 0.020 | 0.609 | 0.010 | 0.335 | 0.004 | 0.219 |
| | 0.2 | Mean | 0.259 | 4.841 | 0.210 | 4.404 | 0.199 | 3.947 |
| | | SD | 0.023 | 0.941 | 0.012 | 0.512 | 0.006 | 0.383 |
| | 0.3 | Mean | 0.336 | 7.924 | 0.302 | 7.305 | 0.297 | 6.652 |
| | | SD | 0.028 | 1.182 | 0.014 | 0.730 | 0.007 | 0.569 |
| | 0.4 | Mean | 0.419 | 11.317 | 0.398 | 10.655 | 0.396 | 9.843 |
| | | SD | 0.031 | 1.388 | 0.016 | 0.863 | 0.006 | 0.792 |
| | 0.5 | Mean | 0.506 | 15.045 | 0.495 | 14.443 | 0.494 | 13.573 |
| | | SD | 0.035 | 1.768 | 0.016 | 1.027 | 0.007 | 0.999 |
| | 0.01 | Mean | 0.149 | 0.352 | 0.070 | 0.129 | 0.034 | 0.082 |
| | | SD | 0.017 | 0.275 | 0.008 | 0.071 | 0.004 | 0.024 |
| | 0.05 | Mean | 0.169 | 0.862 | 0.094 | 0.713 | 0.061 | 0.705 |
| | | SD | 0.018 | 0.408 | 0.009 | 0.178 | 0.004 | 0.093 |
| | 0.1 | Mean | 0.197 | 1.898 | 0.128 | 1.850 | 0.105 | 1.854 |
| | | SD | 0.020 | 0.593 | 0.010 | 0.261 | 0.004 | 0.132 |
| | 0.2 | Mean | 0.259 | 4.685 | 0.211 | 4.640 | 0.202 | 4.638 |
| | | SD | 0.026 | 0.968 | 0.013 | 0.423 | 0.006 | 0.253 |
| | 0.3 | Mean | 0.340 | 8.393 | 0.305 | 8.055 | 0.300 | 7.909 |
| | | SD | 0.029 | 1.391 | 0.014 | 0.631 | 0.007 | 0.388 |
| | 0.4 | Mean | 0.420 | 12.209 | 0.402 | 11.846 | 0.401 | 11.653 |
| | | SD | 0.031 | 1.433 | 0.014 | 0.657 | 0.007 | 0.448 |
| | 0.5 | Mean | 0.515 | 16.544 | 0.503 | 16.041 | 0.501 | 15.841 |
| | | SD | 0.032 | 1.499 | 0.016 | 0.730 | 0.007 | 0.432 |
Means and standard deviations (SD) of the total variation (TV) and Kullback-Leibler (KLD) distances. Data are generated from the model with . The sample size n is 200, 1000, 5000. The number of Monte Carlo replications is 500.
| Contaminating Model | Percentage of Contamination | Summary | TV (n = 200) | KLD (n = 200) | TV (n = 1000) | KLD (n = 1000) | TV (n = 5000) | KLD (n = 5000) |
|---|---|---|---|---|---|---|---|---|
| | 0.01 | Mean | 0.145 | 0.263 | 0.066 | 0.068 | 0.030 | 0.021 |
| | | SD | 0.017 | 0.250 | 0.008 | 0.058 | 0.003 | 0.014 |
| | 0.05 | Mean | 0.147 | 0.497 | 0.069 | 0.204 | 0.034 | 0.079 |
| | | SD | 0.017 | 0.391 | 0.008 | 0.130 | 0.004 | 0.036 |
| | 0.1 | Mean | 0.154 | 0.778 | 0.076 | 0.368 | 0.044 | 0.181 |
| | | SD | 0.018 | 0.527 | 0.008 | 0.168 | 0.004 | 0.062 |
| | 0.2 | Mean | 0.166 | 1.275 | 0.094 | 0.712 | 0.071 | 0.426 |
| | | SD | 0.020 | 0.639 | 0.010 | 0.255 | 0.005 | 0.108 |
| | 0.3 | Mean | 0.182 | 1.797 | 0.118 | 1.067 | 0.101 | 0.671 |
| | | SD | 0.021 | 0.738 | 0.012 | 0.324 | 0.006 | 0.158 |
| | 0.4 | Mean | 0.201 | 2.320 | 0.144 | 1.407 | 0.133 | 0.924 |
| | | SD | 0.021 | 0.875 | 0.012 | 0.403 | 0.006 | 0.198 |
| | 0.5 | Mean | 0.220 | 2.766 | 0.173 | 1.755 | 0.164 | 1.164 |
| | | SD | 0.025 | 0.932 | 0.013 | 0.450 | 0.006 | 0.219 |
| | 0.01 | Mean | 0.146 | 0.369 | 0.067 | 0.122 | 0.031 | 0.046 |
| | | SD | 0.018 | 0.348 | 0.007 | 0.089 | 0.003 | 0.022 |
| | 0.05 | Mean | 0.154 | 0.839 | 0.074 | 0.490 | 0.040 | 0.321 |
| | | SD | 0.017 | 0.477 | 0.008 | 0.187 | 0.004 | 0.081 |
| | 0.1 | Mean | 0.164 | 1.414 | 0.087 | 0.945 | 0.058 | 0.661 |
| | | SD | 0.018 | 0.602 | 0.009 | 0.256 | 0.005 | 0.120 |
| | 0.2 | Mean | 0.189 | 2.529 | 0.120 | 1.748 | 0.101 | 1.300 |
| | | SD | 0.021 | 0.801 | 0.011 | 0.366 | 0.005 | 0.188 |
| | 0.3 | Mean | 0.216 | 3.529 | 0.161 | 2.526 | 0.149 | 1.954 |
| | | SD | 0.023 | 0.957 | 0.012 | 0.466 | 0.006 | 0.276 |
| | 0.4 | Mean | 0.252 | 4.608 | 0.205 | 3.444 | 0.196 | 2.660 |
| | | SD | 0.026 | 1.071 | 0.014 | 0.549 | 0.006 | 0.339 |
| | 0.5 | Mean | 0.286 | 5.630 | 0.250 | 4.289 | 0.244 | 3.423 |
| | | SD | 0.026 | 1.123 | 0.014 | 0.657 | 0.007 | 0.406 |
| | 0.01 | Mean | 0.146 | 0.429 | 0.067 | 0.166 | 0.031 | 0.078 |
| | | SD | 0.016 | 0.374 | 0.007 | 0.100 | 0.003 | 0.032 |
| | 0.05 | Mean | 0.156 | 1.073 | 0.078 | 0.716 | 0.044 | 0.511 |
| | | SD | 0.017 | 0.514 | 0.008 | 0.203 | 0.004 | 0.088 |
| | 0.1 | Mean | 0.169 | 1.774 | 0.094 | 1.281 | 0.066 | 0.981 |
| | | SD | 0.019 | 0.606 | 0.008 | 0.277 | 0.005 | 0.142 |
| | 0.2 | Mean | 0.200 | 3.160 | 0.137 | 2.383 | 0.120 | 1.927 |
| | | SD | 0.021 | 0.800 | 0.011 | 0.408 | 0.005 | 0.218 |
| | 0.3 | Mean | 0.239 | 4.471 | 0.187 | 3.532 | 0.177 | 2.937 |
| | | SD | 0.025 | 1.045 | 0.013 | 0.485 | 0.006 | 0.278 |
| | 0.4 | Mean | 0.280 | 5.812 | 0.242 | 4.822 | 0.235 | 4.044 |
| | | SD | 0.026 | 1.125 | 0.014 | 0.589 | 0.007 | 0.355 |
| | 0.5 | Mean | 0.331 | 7.537 | 0.298 | 6.145 | 0.293 | 5.218 |
| | | SD | 0.029 | 1.274 | 0.015 | 0.693 | 0.007 | 0.433 |
Means and standard deviations (SD) for the mixture index of fit. Data are generated from an asymmetric contamination model of the form , with sample sizes, n, of 1000, 5000. The number of Monte Carlo replications is 500.
| Percentage of Contamination | Summary | | | | | | |
|---|---|---|---|---|---|---|---|
| 0.1 | Mean | 0.180 | 0.160 | 0.223 | 0.213 | 0.837 | 0.934 |
| | SD | 0.045 | 0.044 | 0.041 | 0.040 | 0.279 | 0.198 |
| 0.2 | Mean | 0.184 | 0.172 | 0.288 | 0.287 | 0.433 | 0.521 |
| | SD | 0.044 | 0.042 | 0.036 | 0.036 | 0.144 | 0.240 |
| 0.3 | Mean | 0.189 | 0.179 | 0.344 | 0.346 | 0.314 | 0.317 |
| | SD | 0.047 | 0.039 | 0.028 | 0.024 | 0.016 | 0.012 |
| 0.4 | Mean | 0.194 | 0.186 | 0.436 | 0.436 | 0.410 | 0.413 |
| | SD | 0.044 | 0.034 | 0.026 | 0.021 | 0.017 | 0.011 |
| 0.5 | Mean | 0.194 | 0.185 | 0.529 | 0.533 | 0.511 | 0.512 |
| | SD | 0.047 | 0.035 | 0.024 | 0.020 | 0.017 | 0.010 |
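To make the mixture index of fit concrete, the sketch below computes it for a discrete distribution against a single, fixed model distribution. This is a simplifying assumption made for illustration: the index in general takes an infimum over a whole model family, and estimating it from continuous data (as in the table above) requires more machinery. The function name `mixture_index_of_fit` is hypothetical and is not the `pistar` package interface.

```python
import numpy as np


def mixture_index_of_fit(p, m):
    """Smallest pi such that p = (1 - pi) * m + pi * e for some distribution e.

    p and m are probability vectors on the same finite support. The residual
    e = (p - (1 - pi) * m) / pi must be non-negative everywhere, which holds
    exactly when (1 - pi) <= min over {m > 0} of p / m.
    """
    p = np.asarray(p, dtype=float)
    m = np.asarray(m, dtype=float)
    support = m > 0  # cells with m = 0 impose no constraint
    largest_model_weight = min(1.0, np.min(p[support] / m[support]))
    return 1.0 - largest_model_weight


# Hypothetical example: a fair-coin model contaminated by 20% point mass on cell 0.
model = np.array([0.5, 0.5])
contaminant = np.array([1.0, 0.0])
observed = 0.8 * model + 0.2 * contaminant
print(mixture_index_of_fit(observed, model))
```

In this example the recovered index equals the contamination fraction because the contaminant puts no mass on one of the model's cells; when the contaminant overlaps the model on every cell, part of it can be absorbed into the model component and the index is smaller than the contamination fraction.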