A primer on theory-driven web scraping: Automatic extraction of big data from the Internet for use in psychological research.

Richard N Landers¹, Robert C Brusso², Katelyn J Cavanaugh¹, Andrew B Collmus¹.

Abstract

The term big data encompasses a wide range of approaches of collecting and analyzing data in ways that were not possible before the era of modern personal computing. One approach to big data of great potential to psychologists is web scraping, which involves the automated collection of information from webpages. Although web scraping can create massive big datasets with tens of thousands of variables, it can also be used to create modestly sized, more manageable datasets with tens of variables but hundreds of thousands of cases, well within the skillset of most psychologists to analyze, in a matter of hours. In this article, we demystify web scraping methods as currently used to examine research questions of interest to psychologists. First, we introduce an approach called theory-driven web scraping in which the choice to use web-based big data must follow substantive theory. Second, we introduce data source theories, a term used to describe the assumptions a researcher must make about a prospective big data source in order to meaningfully scrape data from it. Critically, researchers must derive specific hypotheses to be tested based upon their data source theory, and if these hypotheses are not empirically supported, plans to use that data source should be changed or eliminated. Third, we provide a case study and sample code in Python demonstrating how web scraping can be conducted to collect big data along with links to a web tutorial designed for psychologists. Fourth, we describe a 4-step process to be followed in web scraping projects. Fifth and finally, we discuss legal, practical and ethical concerns faced when conducting web scraping projects. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
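As a rough illustration of the kind of extraction the abstract describes (turning webpage markup into a dataset of cases and variables), the following is a minimal sketch using only Python's standard library. It is not the authors' sample code: the HTML fragment, the `post`/`author`/`body` class names, and the `PostParser` and `scrape_posts` names are hypothetical, and a real project would fetch pages over HTTP and respect the legal and ethical constraints the article discusses.

```python
from html.parser import HTMLParser

# Hypothetical page fragment standing in for a downloaded webpage.
SAMPLE_HTML = """
<div class="post"><span class="author">user_a</span><p class="body">First post text</p></div>
<div class="post"><span class="author">user_b</span><p class="body">Second post text</p></div>
"""


class PostParser(HTMLParser):
    """Collect one record (case) per <div class="post"> element."""

    def __init__(self):
        super().__init__()
        self.records = []   # one dict per case
        self._field = None  # variable currently being filled, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "div" and cls == "post":
            # A new post starts a new case with empty variables.
            self.records.append({"author": "", "body": ""})
        elif cls in ("author", "body") and self.records:
            self._field = cls

    def handle_endtag(self, tag):
        # Leaving a <span> or <p> ends the current variable.
        if tag in ("span", "p"):
            self._field = None

    def handle_data(self, data):
        if self._field and self.records:
            self.records[-1][self._field] += data.strip()


def scrape_posts(html):
    """Return a list of {'author', 'body'} dicts parsed from html."""
    parser = PostParser()
    parser.feed(html)
    parser.close()
    return parser.records


records = scrape_posts(SAMPLE_HTML)
for record in records:
    print(record["author"], "|", record["body"])
```

Each dictionary is one case and each key is one variable, so the resulting list maps directly onto the rows and columns of a conventional dataset.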

Year: 2016 PMID: 27213980 DOI: 10.1037/met0000081

Source DB: PubMed Journal: Psychol Methods ISSN: 1082-989X

Cited

7 in total

1. A computational reward learning account of social media engagement.

Authors: Björn Lindström; Martin Bellander; David T Schultner; Allen Chang; Philippe N Tobler; David M Amodio
Journal: Nat Commun Date: 2021-02-26 Impact factor: 14.919

2. Temporal trends in incidence of time-loss injuries in four male professional North American sports over 13 seasons.

Authors: Garrett S Bullock; Elizabeth Murray; Jake Vaughan; Stefan Kluzek
Journal: Sci Rep Date: 2021-04-15 Impact factor: 4.379

3. Analysis of mental and physical disorders associated with COVID-19 in online health forums: a natural language processing study.

Authors: Rashmi Patel; Fabrizio Smeraldi; Maryam Abdollahyan; Jessica Irving; Conrad Bessant
Journal: BMJ Open Date: 2021-11-05 Impact factor: 2.692

4. Visualization analysis of junior school students' pubertal timing and social adaptability using data mining approaches.

Authors: Youzhong Ma; Ruiling Zhang; Yongxin Zhang
Journal: Heliyon Date: 2022-08-28

5. Big data hurdles in precision medicine and precision public health.

Authors: Mattia Prosperi; Jae S Min; Jiang Bian; François Modave
Journal: BMC Med Inform Decis Mak Date: 2018-12-29 Impact factor: 2.796

6. Characterization of Rookie Season Injury and Illness and Career Longevity Among National Basketball Association Players.

Authors: Chelsea L Martin; Amelia J H Arundale; Stefan Kluzek; Tyler Ferguson; Gary S Collins; Garrett S Bullock
Journal: JAMA Netw Open Date: 2021-10-01

7. Temporal Trends and Severity in Injury and Illness Incidence in the National Basketball Association Over 11 Seasons.

Authors: Garrett S Bullock; Tyler Ferguson; Jake Vaughan; Desiree Gillespie; Gary Collins; Stefan Kluzek
Journal: Orthop J Sports Med Date: 2021-06-14
