Predictive Modeling With Big Data: Is Bigger Really Better?

Enric Junqué de Fortuny, David Martens, Foster Provost.   

Abstract

With the increasingly widespread collection and processing of "big data," there is natural interest in using these data assets to improve decision making. One of the best understood ways to use data to improve decision making is via predictive analytics. An important, open question is: to what extent do larger data actually lead to better predictive models? In this article we empirically demonstrate that when predictive models are built from sparse, fine-grained data (such as data on low-level human behavior), we continue to see marginal increases in predictive performance even to very large scale. The empirical results are based on data drawn from nine different predictive modeling applications, from book reviews to banking transactions. This study provides a clear illustration that larger data indeed can be more valuable assets for predictive analytics. This implies that institutions with larger data assets, plus the skill to take advantage of them, potentially can obtain substantial competitive advantage over institutions without such access or skill. Moreover, the results suggest that it is worthwhile for companies with access to such fine-grained data, in the context of a key predictive task, to gather both more data instances and more possible data features. As an additional contribution, we introduce an implementation of the multivariate Bernoulli Naïve Bayes algorithm that can scale to massive, sparse data.
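
The abstract does not describe how the authors' implementation works. As an illustration only, the sketch below shows one common way to make multivariate Bernoulli Naive Bayes cheap on sparse binary data: precompute, per class, the log-probability of an all-zero instance, then correct it only for the features that are active. The function names (`train_sparse_bnb`, `log_scores`, `predict`), the set-of-indices input format, and the Laplace smoothing parameter `alpha` are assumptions made for this example, not details taken from the article.

```python
import math
from collections import defaultdict


def train_sparse_bnb(rows, labels, n_features, alpha=1.0):
    """Fit a Bernoulli Naive Bayes model on sparse binary data.

    rows: list of sets, each holding the indices of the active (=1) features.
    labels: list of class labels, one per row.
    alpha: Laplace smoothing constant (an assumption of this sketch).
    """
    class_count = defaultdict(int)
    feat_count = defaultdict(lambda: defaultdict(int))
    for active, y in zip(rows, labels):
        class_count[y] += 1
        for j in active:
            feat_count[y][j] += 1

    n = len(labels)
    model = {}
    for c, n_c in class_count.items():
        denom = n_c + 2.0 * alpha
        # Features never seen in class c share one smoothed probability.
        p_unseen = alpha / denom
        # base = log prior + log-probability that every feature is absent.
        base = math.log(n_c / n)
        base += (n_features - len(feat_count[c])) * math.log(1.0 - p_unseen)
        delta = {}
        for j, k in feat_count[c].items():
            p = (k + alpha) / denom
            base += math.log(1.0 - p)
            # Correction applied when feature j is present instead of absent.
            delta[j] = math.log(p) - math.log(1.0 - p)
        default_delta = math.log(p_unseen) - math.log(1.0 - p_unseen)
        model[c] = (base, delta, default_delta)
    return model


def log_scores(model, active):
    """Joint log-score per class; cost grows with len(active), not n_features."""
    scores = {}
    for c, (base, delta, default_delta) in model.items():
        scores[c] = base + sum(delta.get(j, default_delta) for j in active)
    return scores


def predict(model, active):
    """Return the class with the highest joint log-score."""
    scores = log_scores(model, active)
    return max(scores, key=scores.get)


if __name__ == "__main__":
    rows = [{0, 1}, {0}, {2, 3}, {3}]
    labels = [1, 1, 0, 0]
    model = train_sparse_bnb(rows, labels, n_features=5)
    print(predict(model, {0}), predict(model, {3}))
```

Because scoring touches only the active features of an instance, prediction time in this sketch depends on how many features are set rather than on the total feature count, which is the property that matters for very wide, sparse behavioral data.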

Year: 2013; PMID: 27447254; DOI: 10.1089/big.2013.0037

Source DB: PubMed; Journal: Big Data; ISSN: 2167-6461; Impact factor: 2.128


