
Polling India via regression and post-stratification of non-probability online samples.

Roberto Cerina, Raymond Duch.

Abstract

Recent technological advances have facilitated the collection of large-scale administrative data and the online surveying of the Indian population. Building on these, we propose a strategy for more robust, frequent and transparent projections of the Indian vote during the campaign. We execute a modified multilevel regression and post-stratification (MrP) model of Indian vote preferences that introduces innovations in each of its three core components: the stratification frame, the training data, and the learner. For the post-stratification frame we propose a novel Data Integration approach that allows the simultaneous estimation of counts from multiple complementary sources, such as census tables and auxiliary surveys. For the training data we assemble panels of respondents from two unorthodox online populations: Amazon Mechanical Turk workers and Facebook users. As a modeling tool, we replace the Bayesian multilevel regression learner with Random Forests. Our 2019 pre-election forecasts for the two largest Lok Sabha coalitions were very close to actual outcomes: we predicted 41.8% for the NDA, against an observed vote share of 45.0%, and 30.8% for the UPA, against an observed vote share of just under 31.3%. Our uniform-swing seat projection outperforms other pollsters: we had the lowest absolute error of 89 seats (along with a poll from 'Jan Ki Baat') and the lowest error on the NDA-UPA lead (a mere 8 seats), and we are the only pollster that can capture real-time preference shifts due to salient campaign events.
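The post-stratification step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the learner here is a plain per-cell mean standing in for the Random Forest the paper uses, and the cell labels, respondent data, and frame counts are all invented for illustration. Each demographic cell's estimated support is weighted by that cell's population count in the stratification frame, which corrects for a sample whose composition differs from the population.

```python
from collections import defaultdict


def cell_means(responses):
    """Estimate support per cell as the share of respondents backing the party.

    Stand-in learner (assumption): the paper fits a Random Forest instead.
    `responses` is an iterable of (cell, supports) pairs.
    """
    totals = defaultdict(int)
    backers = defaultdict(int)
    for cell, supports in responses:
        totals[cell] += 1
        backers[cell] += 1 if supports else 0
    return {cell: backers[cell] / totals[cell] for cell in totals}


def poststratify(cell_estimates, frame_counts):
    """Weight each cell's estimate by its population count in the frame."""
    # Cells with no estimate are skipped in this sketch; a fitted model
    # would normally predict for every cell of the frame.
    covered = {c: n for c, n in frame_counts.items() if c in cell_estimates}
    population = sum(covered.values())
    if population == 0:
        raise ValueError("no frame cell has an estimate")
    return sum(cell_estimates[c] * n for c, n in covered.items()) / population


# Hypothetical toy sample: (cell, respondent supports the party?)
responses = [
    (("urban", "18-34"), True),
    (("urban", "18-34"), False),
    (("urban", "35+"), True),
    (("rural", "18-34"), True),
    (("rural", "35+"), False),
    (("rural", "35+"), False),
    (("rural", "35+"), False),
    (("rural", "35+"), True),
]

# Hypothetical stratification frame: population count per cell.
frame_counts = {
    ("urban", "18-34"): 200,
    ("urban", "35+"): 100,
    ("rural", "18-34"): 300,
    ("rural", "35+"): 400,
}

raw_share = sum(1 for _, s in responses if s) / len(responses)
adjusted_share = poststratify(cell_means(responses), frame_counts)
print(f"raw sample share: {raw_share:.3f}")
print(f"post-stratified share: {adjusted_share:.3f}")
```

In this toy example the raw sample share and the post-stratified share differ because one cell is over-represented among respondents relative to its weight in the frame.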

Year:  2021        PMID: 34843519      PMCID: PMC8629219          DOI: 10.1371/journal.pone.0260092

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


