Carolina Plescia, Lorenzo De Sio.
Abstract
Ecological inference refers to the study of individuals using aggregate data, and it is used in an impressive number of studies; it is well known, however, that the study of individuals using group data suffers from the ecological fallacy problem (Robinson in Am Sociol Rev 15:351-357, 1950). This paper evaluates the accuracy of two recent methods, those of Rosen et al. (Stat Neerl 55:134-156, 2001) and of Greiner and Quinn (J R Stat Soc Ser A (Statistics in Society) 172:67-81, 2009), and of the long-standing method of Goodman (Am Sociol Rev 18:663-664, 1953; Am J Sociol 64:610-625, 1959), all of which are designed to estimate all cells of R × C tables simultaneously by employing exclusively aggregate data. To conduct these tests we leverage extensive electoral data for which the true quantities of interest are known. In particular, we examine the extent to which the confidence intervals provided by the three methods contain the true values. The paper also provides guidelines regarding the appropriate contexts for employing these models.
Keywords: Aggregate data; Ecological inference; R × C contingency tables; Split-ticket voting
Year: 2017 PMID: 29568130 PMCID: PMC5847155 DOI: 10.1007/s11135-017-0481-z
Source DB: PubMed Journal: Qual Quant ISSN: 0033-5177
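As a rough illustration of the simplest of the three methods named in the abstract, Goodman's ecological regression can be sketched as a least-squares fit without intercept: the column shares observed in each polling station are regressed on the row shares, and the fitted coefficients are read as the cells of the R × C table. The function name `goodman_regression`, the toy data, and the small pure-Python solver below are assumptions made for illustration, not the authors' implementation. The sketch returns point estimates only and does not compute the confidence intervals that the paper evaluates.

```python
def goodman_regression(X, Y):
    """Least-squares sketch of Goodman's ecological regression.

    X: n x R list of row shares per polling station (each row sums to 1).
    Y: n x C list of column shares per polling station.
    Returns an R x C list B solving the normal equations (X'X) B = X'Y.
    """
    n, R, C = len(X), len(X[0]), len(Y[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(R)]
           for a in range(R)]
    xty = [[sum(X[i][a] * Y[i][c] for i in range(n)) for c in range(C)]
           for a in range(R)]
    # Gauss-Jordan elimination with partial pivoting on [X'X | X'Y].
    aug = [xtx[a] + xty[a] for a in range(R)]
    for col in range(R):
        pivot = max(range(col, R), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: row shares do not vary enough")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(R):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[R:] for row in aug]


# Hypothetical toy data: 2 x 2 transition table and four polling stations.
B_true = [[0.8, 0.2],
          [0.3, 0.7]]
X = [[0.2, 0.8], [0.5, 0.5], [0.9, 0.1], [0.6, 0.4]]
# Column shares generated without noise, so the fit recovers B_true.
Y = [[sum(x[r] * B_true[r][c] for r in range(2)) for c in range(2)] for x in X]
B_hat = goodman_regression(X, Y)
```

With real, noisy data the estimates are not constrained to lie between 0 and 1, which is a known limitation of this approach.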
Summary of country, between-districts and within-district variation
| Country | Year | No. of districts | No. of polling stations (range) | No. of parties (range) | No. of candidates (range) | Within-district average party variance (SDs) (range) |
|---|---|---|---|---|---|---|
| New Zealand | 2002 | 69 | 25–645 | 14 (7–8)a | 6–11 (3–7)a | 10.33 (9.99)–542.81 (306.92) |
| New Zealand | 2005 | 69 | 24–691 | 19 (7–8)a | 3–14 (3–8)a | 13.55 (13.20)–599.14 (299.91) |
| New Zealand | 2008 | 70 | 27–681 | 19 (7–8)a | 2–14 (3–7)a | 11.45 (10.34)–417.39 (265.88) |
| Scotland | 2007 | 73 | 22–103 | 16–25 (5–8)a | 5–8 (5–8)a | 15.48 (14.21)–226.57 (927.52) |
a Numbers in parentheses represent the number of rows or columns in the reduced forms
Percentage observations inside 95% confidence intervals and RMSE, summary
| | EI-MD full: % CI | EI-MD full: RMSE | EI-MD reduced: % CI | EI-MD reduced: RMSE | EI-ML reduced: % CI | EI-ML reduced: RMSE | Goodman full: % CI | Goodman full: RMSE | Goodman reduced: % CI | Goodman reduced: RMSE |
|---|---|---|---|---|---|---|---|---|---|---|
| NZ 2002 | 39.8 | 0.157 | 40.9 | 0.157 | 29.9 | 0.238 | 28.8 | 0.227 | 25.9 | 0.226 |
| NZ 2005 | 29.7 | 0.168 | 29.7 | 0.159 | 31.5 | 0.277 | 27.3 | 0.331 | 21.0 | 0.289 |
| NZ 2008 | 32.3 | 0.147 | 35.5 | 0.137 | 26.4 | 0.263 | 29.8 | 0.285 | 23.6 | 0.286 |
| STD 2007 | 34.5 | 0.143 | 49.1 | 0.131 | 11.7 | 0.163 | 23.4 | 0.222 | 26.1 | 0.243 |
| Large | 30.9 | 0.126 | 40.6 | 0.121 | 17.2 | 0.171 | 17.9 | 0.187 | 13.5 | 0.195 |
| Small | 44.7 | 0.162 | 42.6 | 0.157 | 44.1 | 0.317 | 31.7 | 0.337 | 24.6 | 0.305 |
| R < 1 | 32.8 | 0.156 | 28.9 | 0.167 | 31.8 | 0.289 | 17.7 | 0.313 | 11.5 | 0.277 |
| 1 < R < 2 | 42.6 | 0.184 | 41.5 | 0.139 | 25.3 | 0.226 | 12.8 | 0.232 | 19.0 | 0.249 |
| R > 2 | 52.9 | 0.082 | 46.9 | 0.132 | 18.1 | 0.180 | 25.9 | 0.229 | 26.6 | 0.133 |
Recall that larger values of RMSE indicate less precise estimates. NZ stands for New Zealand. STD stands for Scotland. For the definition of large and small parties refer to footnote 3. R refers to the ratio calculated as the number of polling stations divided by the number of coefficients to be estimated. For the results of the EI-ML full refer to footnote 6.
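The two accuracy measures reported in the table, and the ratio R defined in the note, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the toy numbers are hypothetical, and the number of coefficients to be estimated is assumed to equal rows × columns, which the source does not spell out.

```python
import math


def coverage_and_rmse(true_vals, estimates, lowers, uppers):
    """Return (% of true values inside their interval, RMSE of the estimates)."""
    n = len(true_vals)
    inside = sum(1 for t, lo, hi in zip(true_vals, lowers, uppers) if lo <= t <= hi)
    pct_inside = 100.0 * inside / n
    rmse = math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, true_vals)) / n)
    return pct_inside, rmse


def ratio_r(n_polling_stations, n_rows, n_cols):
    """Polling stations per coefficient, ASSUMING coefficients = rows * columns."""
    return n_polling_stations / (n_rows * n_cols)


# Hypothetical cell values for one district: true shares, point estimates,
# and the lower and upper bounds of the 95% intervals.
true_vals = [0.10, 0.50, 0.90, 0.30]
estimates = [0.20, 0.50, 0.70, 0.30]
lowers = [0.15, 0.40, 0.60, 0.25]
uppers = [0.25, 0.60, 0.95, 0.35]

pct_inside, rmse = coverage_and_rmse(true_vals, estimates, lowers, uppers)
r = ratio_r(100, 5, 5)
```

A well-calibrated 95% interval would contain the true value in roughly 95% of cases, which is the benchmark against which the percentages in the table can be read.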
Predictors of reliable confidence intervals (logit regression)
| | EI-MD full: Model 1 | EI-MD full: Model 2 | EI-MD reduced: Model 3 | EI-MD reduced: Model 4 | EI-ML reduced: Model 5 | EI-ML reduced: Model 6 | Goodman full: Model 7 | Goodman full: Model 8 | Goodman reduced: Model 9 | Goodman reduced: Model 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| No. of columns | −0.000 | | −0.016 | | 0.055 | | −0.062 | | −0.048 | |
| No. of rows | −0.020 | | −0.276* | | −0.448*** | | −0.042 | | −0.324* | |
| No. of polling stations | 0.003* | | 0.004* | | 0.001 | | 0.010* | | 0.012* | |
| Variance | −0.617 | −1.124*** | −3.704* | −5.309** | −0.138 | −2.662 | −0.487 | −0.991 | −1.157 | −1.932 |
| Ratio | | 0.180** | | 0.104*** | | 0.046 | | 1.297** | | 0.384* |
| Constant | −0.454 | −0.641** | 1.639* | −0.140 | −4.494*** | −1.254*** | −1.654** | −0.606* | −2.875*** | −1.087** |
| N | 1302 | 1302 | 1302 | 1302 | 1177 | 1177 | 1260 | 1260 | 1260 | 1260 |
| Nagelkerke R² | 0.14 | 0.09 | 0.39 | 0.24 | 0.26 | 0.04 | 0.54 | 0.45 | 0.40 | 0.32 |
| AIC | 1671.448 | 1671.952 | 1701.950 | 1712.235 | 1333.634 | 1347.466 | 1382.122 | 1386.215 | 1216.894 | 1218.872 |
| LL | −830.724 | −832.976 | −845.975 | −853.117 | −661.817 | −670.733 | −686.061 | −690.107 | −603.447 | −606.436 |
Standard errors, clustered by district, in parentheses: * p < 0.05; ** p < 0.01; *** p < 0.001. Adding fixed effects by country leads to a slight increase in the predictive power of the models but does not change the substantive conclusions.
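One plausible reading of the two specifications estimated for each method is written out below. It assumes that the dependent variable $y_j$ equals 1 when the true value of cell $j$ falls inside its 95% confidence interval, that the odd-numbered models use the table dimensions and the number of polling stations as separate predictors, and that the even-numbered models replace them with the ratio R. The symbols $y_j$, $\beta$ and $\gamma$ are notation introduced here for illustration and do not appear in the source.

```latex
\begin{aligned}
\text{Odd-numbered models:}\quad
\Pr(y_j = 1) &= \operatorname{logit}^{-1}\!\bigl(\beta_0
  + \beta_1\,\text{Columns}_j
  + \beta_2\,\text{Rows}_j
  + \beta_3\,\text{Stations}_j
  + \beta_4\,\text{Variance}_j\bigr) \\
\text{Even-numbered models:}\quad
\Pr(y_j = 1) &= \operatorname{logit}^{-1}\!\bigl(\gamma_0
  + \gamma_1\,\text{Variance}_j
  + \gamma_2\,\text{Ratio}_j\bigr),
\qquad \operatorname{logit}^{-1}(z) = \frac{1}{1 + e^{-z}}
\end{aligned}
```

Under this reading, a positive coefficient on Ratio means that more polling stations per coefficient to be estimated are associated with a higher probability that the interval contains the true value.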