
A methodology for preprocessing structured big data in the behavioral sciences.

Paul A Brown, Ricardo A Anderson.

Abstract

The characteristics of big data, including high volume, increased variety, and velocity, pose special challenges for data analysis. Because these characteristics generally preclude manual data inspection and processing, researchers must often rely on computational methodologies to handle this type of data, techniques that may be unfamiliar to nonspecialists, including behavioral scientists. However, existing data analytics methodologies within the field of computer science, developed to handle the generic tasks of data collection, preprocessing, and analysis, can be appropriated for use in other disciplines. These methodologies involve a sequential pipeline of quality checks to prepare data sets for analysis and application. Building upon these methodologies, this paper describes the Big Data Quality & Statistical Assurance (BDQSA) model, intended for researchers in the behavioral sciences. It involves a series of data preprocessing tasks covering data understanding as well as data screening, cleaning, and transformation. These are followed by a statistical quality phase, which includes extraction of the relevant data subset, type conversions, ensuring sample representativeness when appropriate, and assessing statistical assumptions. The resulting model thereby provides methodological guidance for the preprocessing of behavioral science big data, aimed at ensuring acceptable data quality before analysis is undertaken. Sample R code snippets demonstrating the application of this model are provided throughout the paper.
© 2022. The Psychonomic Society, Inc.
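
The phases named in the abstract (screening, cleaning with type conversion, transformation, and subset extraction) can be illustrated with a minimal sketch. This is not the authors' code: the paper's own snippets are in R and are not reproduced in this record, so the example below uses Python's standard library, and every field name, threshold, and record in it is a hypothetical stand-in. Representativeness checks and assumption testing are left out, since they depend on the population and the planned analysis.

```python
from statistics import mean, stdev

# Hypothetical raw records: every field arrives as a string, as from a CSV export.
RAW = [
    {"id": "1", "age": "23", "country": "US", "score": "3.5"},
    {"id": "2", "age": "", "country": "US", "score": "4.0"},     # missing age
    {"id": "3", "age": "31", "country": "JM", "score": "2.5"},
    {"id": "3", "age": "31", "country": "JM", "score": "2.5"},   # duplicate row
    {"id": "4", "age": "240", "country": "US", "score": "4.5"},  # implausible age
    {"id": "5", "age": "45", "country": "JM", "score": "3.0"},
]


def screen(rows):
    """Screening: count missing values per field and repeated ids."""
    missing = {k: sum(1 for r in rows if r[k] == "") for k in rows[0]}
    ids = [r["id"] for r in rows]
    return {"missing": missing, "duplicates": len(ids) - len(set(ids))}


def clean(rows, min_age=18, max_age=100):
    """Cleaning and type conversion: drop duplicates, incomplete rows,
    and ages outside an assumed plausible range, then cast the fields."""
    seen, out = set(), []
    for r in rows:
        if r["id"] in seen or "" in r.values():
            continue
        age = int(r["age"])
        if not (min_age <= age <= max_age):
            continue
        seen.add(r["id"])
        out.append({"id": int(r["id"]), "age": age,
                    "country": r["country"], "score": float(r["score"])})
    return out


def standardize(rows, field):
    """Transformation: add a z-score column for one numeric field."""
    values = [r[field] for r in rows]
    m, s = mean(values), stdev(values)
    return [{**r, field + "_z": (r[field] - m) / s} for r in rows]


def extract(rows, country):
    """Statistical quality phase: keep only the subset relevant to the analysis."""
    return [r for r in rows if r["country"] == country]


report = screen(RAW)                      # inspect problems before changing anything
cleaned = clean(RAW)                      # typed, de-duplicated, plausible rows
prepared = standardize(cleaned, "score")  # adds "score_z"
subset = extract(prepared, "JM")          # rows for one hypothetical subgroup
```

Screening runs before cleaning so that the extent of missing and duplicated data is recorded before any rows are dropped; at real big-data scale the same steps would typically be run with a data-frame library rather than plain lists.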

Keywords:  Behavioral science research; Behavioral sciences; Big data; Data preprocessing; Personality big data

Year:  2022        PMID: 35768746     DOI: 10.3758/s13428-022-01895-4

Source DB:  PubMed          Journal:  Behav Res Methods        ISSN: 1554-351X


