Alina Ristea, Mohammad Al Boni, Bernd Resch, Matthew S Gerber, Michael Leitner.
Abstract
Sporting events attract high volumes of people, which in turn leads to increased use of social media. In addition, research shows that sporting events may trigger violent behavior that can lead to crime. This study analyses the spatial relationships between crime occurrences, demographic, socio-economic and environmental variables, together with geo-located Twitter messages and their 'violent' subsets. The analysis compares basketball and hockey game days and non-game days. Moreover, this research aims to analyze crime prediction models using historical crime data as a basis and then introducing tweets and additional variables in their role as covariates of crime. First, this study investigates the spatial distribution of and correlation between crime and tweets during the same temporal periods. Feature selection models are applied in order to identify the best explanatory variables. Then, we apply a localized kernel density estimation model for crime prediction during basketball and hockey games, and on non-game days. Findings from this study show that Twitter data, and a subset of violent tweets, are useful in building prediction models for the seven investigated crime types for home and away sporting events, and non-game days, with different levels of improvement.
Keywords: Crime prediction; local kernel density estimation; violent tweets
Year: 2020 PMID: 32939153 PMCID: PMC7455052 DOI: 10.1080/13658816.2020.1719495
Source DB: PubMed Journal: Int J Geogr Inf Sci ISSN: 1365-8816 Impact factor: 4.186
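The kernel density estimation step named in the abstract can be illustrated with a minimal sketch. It is a plain (not localized) Gaussian KDE over a handful of made-up points; the function name `gaussian_kde_grid`, the toy coordinates, and the bandwidth of 0.5 are all assumptions for illustration, and the localized variant used in the study is not described in this record.

```python
import math


def gaussian_kde_grid(points, grid_cells, bandwidth):
    """Return one Gaussian kernel density estimate per grid cell centre.

    points     -- (x, y) locations of historical crimes (hypothetical data)
    grid_cells -- (x, y) centres of the cells to score
    bandwidth  -- kernel bandwidth, in the same units as the coordinates
    """
    if not points:
        return [0.0 for _ in grid_cells]
    # Normalising constant of a 2-D isotropic Gaussian, averaged over points.
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    densities = []
    for gx, gy in grid_cells:
        total = 0.0
        for px, py in points:
            squared_distance = (gx - px) ** 2 + (gy - py) ** 2
            total += math.exp(-squared_distance / (2.0 * bandwidth ** 2))
        densities.append(norm * total)
    return densities


# Toy data: three past incidents clustered near (0, 0) and one at (5, 5).
history = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
cells = [(0.0, 0.0), (2.5, 2.5), (5.0, 5.0)]
density = gaussian_kde_grid(history, cells, bandwidth=0.5)
```

In this toy setup the cell at the cluster scores highest, the cell at the isolated incident scores lower, and the empty cell in between scores close to zero, which is the ranking a density-based predictor would hand to an evaluation step.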
Reported crime records in Chicago, Illinois, US, from the Chicago Police Department's CLEAR system (counts for the five bins used in this study). *Relative total includes only the seven crime types considered in this study; the real value includes 30 crime types.
| Crime Type | Frequency |
|---|---|
| Assault | 6,446 (8.32%) |
| Battery | 18,875 (24.35%) |
| Criminal damage | 11,010 (14.20%) |
| Motor vehicle theft | 4,837 (6.24%) |
| Other offense | 6,978 (9.00%) |
| Robbery | 4,046 (5.22%) |
| Theft | 25,313 (32.66%) |
| Relative Total* | 77,505 (100%) |
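The percentages in the table are each crime type's share of the relative total. A short sketch that recomputes them from the counts above (the variable names are illustrative only):

```python
# Counts copied from the table of reported crime records.
counts = {
    "Assault": 6446,
    "Battery": 18875,
    "Criminal damage": 11010,
    "Motor vehicle theft": 4837,
    "Other offense": 6978,
    "Robbery": 4046,
    "Theft": 25313,
}

# Relative total: the sum over the seven crime types only.
relative_total = sum(counts.values())

# Share of each crime type, in percent, rounded to two decimals.
shares = {crime: round(100 * n / relative_total, 2) for crime, n in counts.items()}
```

Because each share is rounded independently, the rounded percentages need not sum to exactly 100.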
Day selection for the five bins.
| | Bulls home | Bulls away | Blackhawks home | Blackhawks away | Control days |
|---|---|---|---|---|---|
| Mon-Thu | 20 | 19 | 17 | 20 | 19 |
| Fri-Sun | 10 | 11 | 13 | 10 | 11 |
Figure 1. Feature selection using random forest.
Figure 2. Data used in this case study.
Summary of predictor variables used in this analysis.
| Variable group | Variables |
|---|---|
| Crime history | Historical crime data before prediction day for the five bins |
| Demographic | Population at crime risk (different for the five bins), residential population; population white, population black or African American, population Asian, population 62 years and over, foreign-born (%), 25 years and over high school or General Educational Development, total 25 years and over, 25 years and over less than high school, 25 years and over some college, foreign-born, household with individuals under 18 years, population 18 years and over total, households by type: non-family, households by type: husband-wife family, Bachelor’s or higher studies (%), 25 years and over bachelor’s degree or higher, Hispanic or Latino (of any race), average household size of occupied housing units by tenure: owner-occupied, average household size of occupied housing units by tenure: renter-occupied, median age by sex for both sexes |
| Socio-economic | Vacant housing units, homeowner vacancy rate (%), unemployed, households below poverty (%), below the poverty level (%), rental vacancy rate (%), occupied housing units, hardship index, income per capita, the price per person for Airbnb, Airbnb locations |
| Environmental | Density: restaurants, bars, bus stops, buildings, bike racks, transportation routes. Distance: stadium |
| Dynamic | Geo-located Twitter data for the five bins: density and distance. Violent tweets for the five bins: density and distance |
Figure 3. Prediction models design.
Figure 4. Example of surveillance plots for other offense for two different prediction days, home games of the Chicago Bulls.
Figure 5. Density distribution of crimes around the venue. Gray squares mark areas with similar crime densities, and brown squares mark areas with higher crime density during game days; the red dotted circle is the 1 km buffer around the venue.
Figure 6. Density distribution of geo-located tweets around the venue; the red dotted circle is the 1 km buffer around the venue.
Figure 7. Density distribution of violent tweets around the venue; the red dotted circle is the 1 km buffer around the venue.
Figure 8. Moran’s I index for aggregated and disaggregated crime types, tweets, and violent tweets; the units of analysis are the city of Chicago and a 1 km buffer around the United Center.
Figure 9. The sliding window prediction approach.
Figure 10. AUC and AUC improvement for the seven crime types. *Real AUC values are presented only for assault in order to save space.
Figure 11. AUC and AUC improvement for the five bins. *Real AUC values are presented only for home games of the Chicago Bulls in order to save space.
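The surveillance plots and AUC values named in the captions can be illustrated with a small sketch. It assumes the common definition of a surveillance plot, which this record does not spell out: cells are ranked from highest to lowest predicted threat, the x-axis is the fraction of area surveilled, the y-axis is the fraction of observed crimes captured, and AUC is the area under that curve. The function name `surveillance_auc` and the toy values are hypothetical.

```python
def surveillance_auc(predicted, observed):
    """Area under a surveillance curve (assumed definition, see lead-in).

    predicted -- predicted threat score per equally sized cell
    observed  -- number of crimes actually observed per cell
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    total_crimes = sum(observed)
    if not predicted or total_crimes == 0:
        raise ValueError("need at least one cell and one observed crime")

    # Rank cells from highest to lowest predicted threat.
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)

    n_cells = len(order)
    auc = 0.0
    captured_prev = 0.0
    cumulative = 0
    for i in order:
        cumulative += observed[i]
        captured = cumulative / total_crimes
        # Trapezoid between consecutive points; each cell adds 1/n_cells of area.
        auc += (captured_prev + captured) / 2.0 / n_cells
        captured_prev = captured
    return auc


# Toy example with four cells (hypothetical values).
observed_crimes = [3, 0, 1, 0]
good_auc = surveillance_auc([0.9, 0.1, 0.5, 0.3], observed_crimes)
poor_auc = surveillance_auc([0.1, 0.9, 0.5, 0.3], observed_crimes)
```

Under this definition, a model that ranks the high-crime cell first scores well above 0.5, while one that ranks it last scores well below, so an "AUC improvement" is the gain from adding covariates such as tweets to a baseline model.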