Yiyi Chen, Harald Sack, Mehwish Alam.
Abstract
Among the channels for expressing opinions online, such as blogs and forums, social media (for example, Twitter) has become one of the most widely used. With increasing interest in the topic of migration in Europe, it is important to process and analyze these opinions. To this end, this study measures public attitudes toward migration in terms of sentiments and hate speech from a large number of tweets crawled on the decisive topic of migration. It introduces a knowledge base (KB) of anonymized, annotated, migration-related tweets termed MigrationsKB (MGKB). Tweets from 2013 to July 2021 in the European countries that host immigrants are collected, pre-processed, and filtered using advanced topic modeling techniques. BERT-based entity linking and sentiment analysis, complemented by attention-based hate speech detection, are performed to annotate the curated tweets. Moreover, external databases are used to identify the potential social and economic factors causing negative public attitudes toward migration. The analysis aligns with the hypothesis that countries with more migrants have fewer negative and hateful tweets. To further promote research in the interdisciplinary fields of social sciences and computer science, the outcomes are integrated into MGKB, which significantly extends the existing ontology to consider public attitudes toward migration and economic indicators. This study further discusses the use cases and exploitation of MGKB. Finally, MGKB is made publicly available, fully supporting the FAIR principles.
Keywords: Hate speech detection; Immigration attitudes; Knowledge base; Public attitudes; Social media analysis
Year: 2022 PMID: 36105922 PMCID: PMC9463678 DOI: 10.1007/s13278-022-00915-7
Source DB: PubMed Journal: Soc Netw Anal Min
Fig. 8 The trend of negative sentiments and hate speech against immigrants/refugees in (left) all the identified destination countries and (right) the UK from 2013 to July 2021 (green: youth unemployment rate (%), orange: total unemployment rate (%), blue: real GDP growth rate (%), red: negative tweets (%), purple: hate speech tweets (%)). (Color figure online)
Fig. 5 Temporal distribution of the sentiments of the public toward migrations. (Left) shows the sentiments of the people toward migrations in the UK, and (right) shows the sentiments for all 11 destination countries in Europe.
Fig. 1 Pipeline for constructing MGKB
Statistics of crawled and preprocessed tweets
| Unique | Germany | Spain | Poland | France | Sweden | UK | Austria | Hungary | Switzerland | Netherlands | Italy | SUM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Crawled | 26,892 | 21,392 | 6187 | 29,049 | 7556 | 265,448 | 6394 | 3355 | 12,062 | 16,095 | 30,023 | 424,453 |
| Pre-processed | 25,498 | 20,240 | 5764 | 26,514 | 7263 | 248,580 | 6027 | 3226 | 11,658 | 15,346 | 27,223 | 397,423 |
Results of ETM with different numbers of topics
| # Topics | 25 | 50 | 75 | 100 |
|---|---|---|---|---|
| Val PPL | 3329 | 3015 | 2920 | |
| Best epoch | 185 | 172 | 176 | 178 |
| Topic coherence | 0.0744 | 0.0506 | 0.02 | |
| Topic diversity | 0.9288 | 0.9056 | 0.7832 | |
| Topic quality | 0.0460 | 0.0157 | | |
| Classified Nr. of topics | 25 | 50 | 75 | 87 |
The bold numbers represent the best results for the metrics
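
The topic quality row can be cross-checked against the coherence and diversity rows. The sketch below assumes the table follows the usual ETM convention in which topic quality is the product of topic coherence and topic diversity; the source does not spell this out, and the function name `topic_quality` is purely illustrative. It only uses coherence/diversity pairs that sit in the same column of the table.

```python
def topic_quality(coherence: float, diversity: float) -> float:
    """Topic quality as coherence times diversity (assumed ETM convention)."""
    return coherence * diversity


# (coherence, diversity) pairs taken from matching columns of the table.
pairs = [(0.0506, 0.9056), (0.02, 0.7832)]

# Round to four decimals, the precision used in the table.
qualities = [round(topic_quality(c, d), 4) for c, d in pairs]
print(qualities)
```

The second product matches the listed 0.0157, and the first lands within rounding distance of the listed 0.0460, presumably because the table's coherence and diversity values are themselves rounded.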
Fig. 2 Filter tweets based on the MGPS score, and assign topic to tweet based on the maximal topic probability score. Green refers to migration-related topics, and orange refers to other topics
Example of topics, words belonging to the topics, an example Tweet, and its MGPS
| Topic | Top words | Preprocessed tweet | MGPS |
|---|---|---|---|
| 1* | Refugee, seeker, kill, alien, enter | Treatment refugee violate human rights dehumanize refugee endanger european value security argue group psychologist open letter | 0.7195 |
| 2* | Great, call, immigration, question, town | Peddle lie interwoven thread brexit regional leave voter low exposure immigration easy scare foreigner queue town come assimilate quickly | 1.1062 |
| 3* | Work, refugee, covid, border, woman | Yeah let corrupt nhs education system fine cause deport load hard work immigrant | 0.8585 |
| 4 | People, take, uk, health, hope | Illegal immigrant get day uk free home cash health education maternity british national take fool katiehopkins | 0.9598 |
| 5 | Stop, find, austria, future, country | Proven liar self promote cheat allow uncounted unchecked immigration country cause current crisis | 0.4782 |
The topics with * are chosen as migration-related
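
The filtering and assignment step described for Fig. 2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the MGPS value is treated as a precomputed input because its formula is not given here, the threshold `MGPS_THRESHOLD`, the "keep if at or above the threshold" direction, and the per-topic probabilities are all hypothetical, and only the MGPS values themselves come from the table.

```python
from typing import Dict, List


def assign_topic(topic_probs: List[float]) -> int:
    """Return the 1-based index of the topic with the maximal probability."""
    return max(range(len(topic_probs)), key=topic_probs.__getitem__) + 1


def filter_and_assign(tweets: List[Dict], threshold: float) -> List[Dict]:
    """Keep tweets whose MGPS reaches the threshold and attach their topic."""
    kept = []
    for tweet in tweets:
        if tweet["mgps"] >= threshold:
            kept.append({**tweet, "topic": assign_topic(tweet["topic_probs"])})
    return kept


MGPS_THRESHOLD = 0.5  # hypothetical cut-off, not taken from the source

# MGPS values from the table; topic probabilities are made up for illustration.
tweets = [
    {"id": "t1", "mgps": 0.7195, "topic_probs": [0.6, 0.1, 0.1, 0.1, 0.1]},
    {"id": "t5", "mgps": 0.4782, "topic_probs": [0.1, 0.1, 0.1, 0.1, 0.6]},
    {"id": "t3", "mgps": 0.8585, "topic_probs": [0.2, 0.1, 0.5, 0.1, 0.1]},
]

kept = filter_and_assign(tweets, MGPS_THRESHOLD)
print([(t["id"], t["topic"]) for t in kept])
```

Keeping the filter and the assignment separate mirrors the caption: the MGPS decides whether a tweet stays, and the maximal topic probability decides which topic it is filed under.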
Fig. 3 Distribution of all the crawled tweets based on geographic location
Fig. 4 Distribution of tweets based on geographic location after filtering using ETM
Statistics (a) and results (b) of contextual embedding models on SemEval2017 test dataset for sentiment analysis
| | Train | Validation | Test | All | |
|---|---|---|---|---|---|
| (a) | |||||
| Negative | 6291 | 752 | 766 | 7809 | |
| Neutral | 17,981 | 2256 | 2287 | 22,524 | |
| Positive | 15,833 | 2006 | 1960 | 19,799 | |
| Negative | 7316 | 923 | 939 | 9178 | |
| Neutral | 2475 | 316 | 308 | 3099 | |
| Positive | 1921 | 225 | 217 | 2363 | |
| Negative | 13,608 | 1736 | 1643 | 16,987 | |
| Neutral | 20,516 | 2535 | 2572 | 25,623 | |
| Positive | 17,693 | 2207 | 2262 | 22,162 | |
The statistics (a) of the HateXplain Dataset and the results (b) of different hate speech detection models (Bold values show the best results)
| Dataset | Normal | Offensive | Hateful | |
|---|---|---|---|---|
| (a) | ||||
| Train | 6251 | 4384 | 4748 | |
| Val | 781 | 548 | 593 | |
| Test | 782 | 548 | 594 | |
Fig. 6 Temporal distribution of tweets after hate speech detection. (Left) shows the distribution of tweets from the UK, while (right) is for all 11 European countries
The query and result for retrieving a list of top 10 entity labels containing “refugee” and its frequency of detected entity mentions
| (a) SPARQL query Q4 | |
|---|---|
| SELECT ?EntityLabel (COUNT(?EntityLabel) AS | |
| ?NumOfEntityMentions) WHERE{ | |
| ?tweet schema:mentions ?entity. | |
| ?entity a nee:Entity; | |
| nee:hasMatchedURI ?uri. | |
| ?uri a rdfs:Resource; | |
| rdfs:label ?EntityLabel. | |
| FILTER( REGEX(?EntityLabel, "refugee", "i") || | |
| LCASE(STR(?EntityLabel))="refugee"). | |
| }GROUP BY ?EntityLabel | |
| ORDER BY DESC(?NumOfEntityMentions) LIMIT 10 |
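
For readers less familiar with SPARQL, the logic of Q4 can be mirrored in plain Python: keep the entity labels that match "refugee" case-insensitively, count how often each occurs, and return the ten most frequent. The function name `top_entity_labels` and the sample labels are hypothetical stand-ins for the `rdfs:label` values bound in the query; this is a sketch of the query's logic, not a client for the MGKB endpoint.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def top_entity_labels(
    labels: Iterable[str], pattern: str = "refugee", limit: int = 10
) -> List[Tuple[str, int]]:
    """Count labels matching the pattern case-insensitively, most frequent first."""
    rx = re.compile(pattern, re.IGNORECASE)  # same role as REGEX(..., "i")
    counts = Counter(label for label in labels if rx.search(label))
    return counts.most_common(limit)  # ORDER BY DESC(count) LIMIT n


# One entry per detected entity mention (illustrative data only).
mentions = [
    "Refugee", "refugee camp", "Refugee", "border",
    "Refugee", "refugee camp", "asylum seeker",
]

top = top_entity_labels(mentions)
print(top)
```

Because the regular expression is already case-insensitive, the second disjunct of the query's `FILTER` (the `LCASE` equality) does not select any additional labels.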
Fig. 7 MGKB Ontology
Correlation matrix of factors used in multivariate analysis of potential determinants (***, **, *)
| | Share negative | Share positive | Share offensive | Share hateful | Total unemployment | Long-term unemployment | Youth unemployment | Disposable income | Real GDP growth rate | Immigration flow per 100k population | First-time asylum applications per 100k population | Immigrant stock per 100k population |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Share positive | − 0.75*** | 1 | ||||||||||
| Share offensive | 0.75*** | − 0.67*** | 1 | |||||||||
| Share hateful | 0.77*** | − 0.52*** | 0.68*** | 1 | ||||||||
| Total unemployment | − 0.1 | 0.19* | − 0.03 | − 0.05 | 1 | |||||||
| Long-term unemployment | − 0.17 | 0.23** | − 0.05 | − 0.06 | 0.95*** | 1 | ||||||
| Youth unemployment | − 0.02 | 0.11 | 0.05 | 0.04 | 0.94*** | 0.92*** | 1 | |||||
| Disposable income | 0.3** | − 0.11 | 0.1 | 0.26** | − 0.27** | − 0.32** | − 0.28** | 1 | ||||
| Real GDP growth rate | − 0.42*** | 0.14 | − 0.27*** | − 0.2* | − 0.11 | − 0.06 | − 0.14 | − 0.4*** | 1 | |||
| Immigration flow per 100k population | − 0.08 | 0.06 | − 0.25** | − 0.24** | − 0.33*** | − 0.43*** | − 0.48*** | 0.5*** | − 0.01 | 1 | ||
| First-time asylum applications per 100k population | − 0.08 | − 0.02 | − 0.07 | − 0.16 | − 0.15 | − 0.2* | − 0.16 | 0.01 | 0.13 | 0.31*** | 1 | |
| Immigrant stock per 100k population | − 0.19 | 0.26** | − 0.32*** | − 0.34*** | − 0.06 | − 0.07 | − 0.18 | 0.7*** | − 0.44*** | 0.71*** | 0.07 | 1 |
Pooled linear regression model with explanatory model (***, **, *)
| | (1) Share of posts with negative sentiment | (2) Share of offensive posts | (3) Share of hateful posts | (4) Share of posts with negative sentiment | (5) Share of offensive posts | (6) Share of hateful posts |
|---|---|---|---|---|---|---|
| Yearly first-time asylum seekers per 100k population | − 1.355e−05 (1.67e−05) | − 8.085e−07 (1.66e−06) | − 1.133e−05* (6.51e−06) | |||
| Total migrant stock per 100k population | − 2.032e−06** (9.18e−07) | − 8.085e−07*** (1.66e−06) | − 1.573e−06*** (3.12e−07) | − 2.759e−06* (1.43e−06) | − 2.027e−07* (1.1e−07) | − 1.425e−06*** (4.53e−07) |
| Total unemployment rate (15–74) | − 0.0025 (0.001) | − 0.0002 (0.000) | − 0.0010* (0.001) | − 0.0021 (0.002) | − 0.0002 (0.000) | − 0.0010 (0.001) |
| Real GDP growth rate | − 0.0018 (0.005) | 7.467e−05 (0.001) | − 0.0019 (0.002) | − 0.0029 (0.005) | 0.0001 (0.001) | − 0.0020 (0.002) |
| Yearly immigration flow per 100k population | 1.228e−05 (2.13e−05) | − 1.521e−06 (1.76e−06) | − 3.946e−06 (6.96e−06) | |||
| Constant | 0.1176*** (0.027) | 0.0143*** (0.003) | 0.0689*** (0.011) | 0.1082*** (0.030) | 0.0150*** (0.003) | 0.0692*** (0.012) |
| Observations | 66 | 66 | 66 | 66 | 66 | 66 |
| | 0.072 | 0.119 | 0.163 | 0.070 | 0.122 | 0.145 |
Pooled linear regression model with subgroup analysis (***, **, *)
| Outcome variable | (1) Share of posts with negative sentiment | (2) Share of offensive posts | (3) Share of hateful posts |
|---|---|---|---|
| Subsample | Below median total/non-EU immigrant stock | Below median total/non-EU immigrant stock | Below median total/non-EU immigrant stock |
| Yearly first-time asylum seekers per 100k population | 3.826e−06 (1.57e−05) | 5.023e−07 (1.96e−06) | − 6.028e−06 (7.45e−06) |
| Total migrant stock per 100k population | − 1.796e−06 (2.1e−06) | − 4.547e−07** (1.94e−07) | − 2.985e−06*** (5.93e−07) |
| Total unemployment rate (15–74) | 0.0087 (0.007) | 6.661e−05 (0.001) | 0.0027 (0.004) |
| Real GDP growth rate | 0.0046 (0.008) | − 0.0003 (0.001) | − 0.0032 (0.003) |
| Yearly immigration flow per 100k population | 4.112e−07** (1.54e−07) | 1.36e−08 (1.93e−08) | − 7.911e−08 (6.13e−08) |
| Constant | − 0.0415 (0.058) | 0.0119* (0.006) | 0.0641** (0.027) |
| Observations | 30 | 30 | 30 |
| | 0.277 | 0.182 | 0.391 |
Linear fixed effects model accounting for time-invariant country differences (***, **, *)
| Outcome variable | (1) Share of posts with negative sentiment | (2) Share of offensive posts | (3) Share of hateful posts | (4) Share of posts with negative sentiment | (5) Share of offensive posts | (6) Share of hateful posts |
|---|---|---|---|---|---|---|
| Yearly first-time asylum seekers per 100k population | − 3.59e−05*** (8.037e−06) | − 1.914e−06* (1.108e−06) | − 2.023e−05** (7.947e−06) | 2.256e−06 (8.169e−06) | 1.638e−06 (1.276e−06) | 2.256e−06 (8.169e−06) |
| Total migrant stock per 100k population | − 1.187e−06*** (3.182e−07) | − 2.299e−07*** (6.624e−08) | − 1.255e−06*** (1.535e−07) | − 4.747e−07 (3.403e−06) | 2.392e−07 (5.7e−07) | − 4.747e−07 (3.403e−06) |
| Total unemployment rate (15–74) | − 0.0031*** (0.0007) | − 0.0002 (0.0002) | − 0.0014*** (0.0003) | − 0.0015 | 0.0009 | − 0.0015 |
| Real GDP growth rate | 0.0056 (0.0051) | 0.0005 (0.0005) | 0.0006 (0.0024) | 0.0055 (0.0071) | 7.489e−05 (0.0014) | 0.0055 (0.0071) |
| Country-specific linear time trends | No | No | No | Yes | Yes | Yes |
| Observations | 66 | 66 | 66 | 66 | 66 | 66 |
| | − 0.1531 | − 0.1165 | − 0.1442 | − 0.0424 | 0.0318 | 0.0380 |
| Number of country fixed effects | 11 | 11 | 11 | 11 | 11 | 11 |
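
The idea behind a fixed effects model that absorbs time-invariant country differences can be illustrated with the within transformation: each variable is demeaned per country before the slope is estimated, so any constant country-level offset drops out. The sketch below is a deliberately simplified single-regressor version with made-up data; the function name `within_slope` and the sample panel are hypothetical, and the models in the table use several regressors and real country-year observations.

```python
from collections import defaultdict
from statistics import fmean
from typing import Iterable, Tuple


def within_slope(panel: Iterable[Tuple[str, float, float]]) -> float:
    """Fixed-effects (within) slope of y on x for rows of (country, x, y)."""
    by_country = defaultdict(list)
    for country, x, y in panel:
        by_country[country].append((x, y))

    num = 0.0
    den = 0.0
    for obs in by_country.values():
        mean_x = fmean(x for x, _ in obs)
        mean_y = fmean(y for _, y in obs)
        # Demeaning removes the country-specific intercept.
        for x, y in obs:
            num += (x - mean_x) * (y - mean_y)
            den += (x - mean_x) ** 2
    return num / den


# Two hypothetical countries with different intercepts but the same slope of 2.
panel = [
    ("A", 1.0, 3.0), ("A", 2.0, 5.0), ("A", 3.0, 7.0),
    ("B", 1.0, 7.0), ("B", 2.0, 9.0), ("B", 3.0, 11.0),
]

slope = within_slope(panel)
print(slope)
```

In this toy panel the two countries differ only in their baseline level, and the within estimator recovers the common slope regardless of that difference.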
The query and result for extracting the potential driving factors and the number of tweets in different sentiments and offensive/hateful speeches from the UK over the years 2013–2020
| (a) SPARQL query Q1 | ||||||||
|---|---|---|---|---|---|---|---|---|
| SELECT ?Year ?RealGDPGrowthRate ?TotalUnemploymentRate ?DisposableIncome | ||||||||
| (SUM(IF (?atti=mgkb:offensive, 1, 0)) AS ?Offensive) | ||||||||
| (SUM(IF (?atti=wna:hate, 1, 0)) AS ?Hateful) | ||||||||
| (SUM(IF (?atti=wna:negative-emotion, 1, 0)) AS ?Negative) | ||||||||
| (SUM(IF (?atti=wna:positive-emotion, 1, 0)) AS ?Positive) | ||||||||
| (COUNT(?tweet) AS ?TotalTweets) | ||||||||
| WHERE { | ||||||||
| ?tweet fibo_fnd_rel_rel:isCharacterizedBy ?gdpr. | ||||||||
| ?gdpr a fibo_ind_ei_ei:GrossDomesticProduct; | ||||||||
| schema:addressCountry "GB"; dc:date ?Year; | ||||||||
| fibo_ind_ei_ei:hasIndicatorValue ?RealGDPGrowthRate. | ||||||||
| ?tweet fibo_fnd_rel_rel:isCharacterizedBy ?unemploy. | ||||||||
| ?unemploy a mgkb:TotalUnemploymentRate; | ||||||||
| schema:addressCountry "GB"; dc:date ?Year; | ||||||||
| fibo_ind_ei_ei:hasIndicatorValue ?TotalUnemploymentRate. | ||||||||
| ?tweet fibo_fnd_rel_rel:isCharacterizedBy ?income. | ||||||||
| ?income a mgkb:DisposableIncome; | ||||||||
| fibo_ind_acc_cat:hasMonetaryAmount ?monetaryamount; | ||||||||
| dc:date ?Year; schema:addressCountry "GB". | ||||||||
| ?monetaryamount fibo_ind_acc_cat:hasAmount ?DisposableIncome. | ||||||||
| ?tweet onyx:hasEmotionSet ?y. | ||||||||
| ?y a onyx:EmotionSet; onyx:hasEmotion ?z. | ||||||||
| ?z a onyx:Emotion; onyx:hasEmotionCategory ?atti. | ||||||||
| }GROUP BY ?Year ?RealGDPGrowthRate ?TotalUnemploymentRate ?DisposableIncome | ||||||||
| ORDER BY DESC(?Year) |
Correlation matrix of factors used in multivariate analysis of potential determinants in the UK regarding public attitudes toward migrations based on (b) (***, **, *)
| | Share negative | Share positive | Share offensive | Share hateful | Total unemployment rate | Disposable income | Real GDP growth rate | Immigration flow per 100k | Asylum requests per 100k | Migrant stock per 100k |
|---|---|---|---|---|---|---|---|---|---|---|
| Share positive | − 0.7* | 1 | ||||||||
| Share offensive | 0.86*** | − 0.81** | 1 | |||||||
| Share hateful | 0.92*** | − 0.56 | 0.78** | 1 | ||||||
| Total unemployment rate | 0.3 | − 0.06 | 0.31 | 0.63* | 1 | |||||
| Disposable income | − 0.28 | 0.26 | − 0.34 | − 0.58 | − 0.94*** | 1 | |||||
| Real GDP growth rate | − 0.26 | 0.21 | − 0.32 | 0.0 | 0.39 | − 0.29 | 1 | |||
| Immigration flow per 100k | 0.29 | − 0.45 | 0.15 | − 0.04 | − 0.61 | 0.55 | − 0.0 | 1 | ||
| Asylum requests per 100k | 0.1 | − 0.27 | − 0.03 | − 0.18 | − 0.74* | 0.72* | − 0.51 | 0.54 | 1 | |
| Migrant stock per 100k | − 0.54 | 0.69 | − 0.46 | − 0.72 | − 0.98*** | 0.92*** | − 0.95*** | − 0.1 | 0.36 | 1 |
The query and result for extracting the number of hateful, negative, positive tweets and the total number of tweets regarding the topics which contain keywords “border” and “covid” in the UK by temporal distribution
| (a) SPARQL query Q2 | ||||
|---|---|---|---|---|
| SELECT ?Year | ||||
| (SUM(IF (?atti=wna:hate, 1, 0)) AS ?Hateful) | ||||
| (SUM(IF (?atti=wna:negative-emotion, 1, 0)) AS ?Negative) | ||||
| (SUM(IF (?atti=wna:positive-emotion, 1, 0)) AS ?Positive) | ||||
| (COUNT(?tweet) AS ?Total) WHERE { | ||||
| ?tweet dc:subject ?topic; | ||||
| onyx:hasEmotionSet ?y ; | ||||
| schema:location ?location; | ||||
| dcterms:created ?createdDate. | ||||
| ?location schema:addressCountry "GB". | ||||
| BIND( SUBSTR(?createdDate,1,4 ) AS ?Year). | ||||
| ?y a onyx:EmotionSet; | ||||
| onyx:hasEmotion ?z. | ||||
| ?z a onyx:Emotion; | ||||
| onyx:hasEmotionCategory ?atti. | ||||
| ?topic a sioc_t:Category ; | ||||
| rdfs:label ?topicLabel. | ||||
| FILTER( REGEX(?topicLabel, "border", "i") | ||||
| || LCASE(STR(?topicLabel))="refugee"). | ||||
| }GROUP BY ?Year ORDER BY DESC(?Year) |
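
The aggregation pattern shared by Q1, Q2, and Q5 (extract the year with `SUBSTR(?createdDate,1,4)`, then use `SUM(IF(...))` to pivot emotion categories into columns) can be sketched in plain Python. The function name `yearly_counts` and the sample rows are hypothetical; each row stands for one solution of the query's `WHERE` clause, that is, one (creation date, emotion category) binding for a tweet.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Emotion category -> output column, as in the SUM(IF(...)) expressions.
COLUMNS = {
    "wna:hate": "Hateful",
    "wna:negative-emotion": "Negative",
    "wna:positive-emotion": "Positive",
}


def yearly_counts(rows: Iterable[Tuple[str, str]]) -> List[Tuple[str, Dict[str, int]]]:
    """Pivot (created_date, category) rows into per-year counts, newest first."""
    table = defaultdict(lambda: {"Hateful": 0, "Negative": 0, "Positive": 0, "Total": 0})
    for created_date, category in rows:
        year = created_date[:4]  # SPARQL SUBSTR is 1-based: SUBSTR(?d, 1, 4)
        counts = table[year]
        counts["Total"] += 1  # COUNT(?tweet) counts solution rows
        column = COLUMNS.get(category)
        if column is not None:
            counts[column] += 1
    return sorted(table.items(), key=lambda item: item[0], reverse=True)


# Illustrative bindings only; the first two rows belong to the same tweet.
rows = [
    ("2020-03-01T10:00:00", "wna:negative-emotion"),
    ("2020-03-01T10:00:00", "wna:hate"),
    ("2020-07-15T08:30:00", "wna:positive-emotion"),
    ("2019-11-02T21:45:00", "wna:negative-emotion"),
]

result = yearly_counts(rows)
print(result)
```

One consequence worth keeping in mind: because `COUNT(?tweet)` counts solution rows rather than distinct tweets, a tweet carrying more than one emotion category would contribute more than once to the total, as the first two sample rows do here.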
The query and the first 15 results for extracting top occurring entities concerning “refugees” linked to Wikidata in the tweets and their sentiments and offensive/hate speeches
| (a) SPARQL query Q3 | |||
|---|---|---|---|
| SELECT ?EmotionCategory ?EntityMention ?Entities (COUNT(?tweet) AS ?NumOfTweets) WHERE{ | |||
| ?tweet schema:mentions ?entity. | |||
| ?entity a nee:Entity; nee:hasMatchedURI ?Entities. | |||
| ?Entities a rdfs:Resource; rdfs:label ?EntityMention. | |||
| FILTER( REGEX(?EntityMention, "refuge", "i") || LCASE(STR(?EntityMention))="refuge"). | |||
| ?tweet onyx:hasEmotionSet ?y. | |||
| ?y a onyx:EmotionSet; onyx:hasEmotion ?z. | |||
| ?z a onyx:Emotion; onyx:hasEmotionCategory ?EmotionCategory. | |||
| } GROUP BY ?EmotionCategory ?EntityMention ?Entities ORDER BY DESC(?NumOfTweets) LIMIT 15 |
The query and result for identifying the negative and positive sentiments and hate speech of the public, and the total number of tweets over time, by searching for entities defining “Refugees”
| (a) SPARQL query Q5 | ||||
|---|---|---|---|---|
| SELECT ?Year (SUM(IF (?atti=wna:hate, 1, 0)) AS ?Hateful) | ||||
| (SUM(IF (?atti=wna:negative-emotion, 1, 0)) AS ?Negative) | ||||
| (SUM(IF (?atti=wna:positive-emotion, 1, 0)) AS ?Positive) | ||||
| (count(?tweet) as ?TotalTweets) WHERE { | ||||
| ?tweet schema:mentions ?entity; | ||||
| dcterms:created ?createdDate. | ||||
| ?entity a nee:Entity; | ||||
| nee:hasMatchedURI ?uri. | ||||
| ?uri a rdfs:Resource; | ||||
| rdfs:label ?x. | ||||
| BIND( SUBSTR(?createdDate,1,4 ) AS ?Year). | ||||
| FILTER( REGEX(?x, "refugee", "i") | ||||
| || LCASE(STR(?x))="refugee"). | ||||
| ?tweet onyx:hasEmotionSet ?y. | ||||
| ?y a onyx:EmotionSet; | ||||
| onyx:hasEmotion ?z. | ||||
| ?z a onyx:Emotion; | ||||
| onyx:hasEmotionCategory ?atti. | ||||
| }GROUP BY ?Year ORDER BY DESC(?Year) |