Lina Zhou¹, Jie Tao², Dongsong Zhang¹.
Abstract
Fake news is being generated in different languages, yet existing studies are dominated by English news. Analyses of fake news content have focused on lexical and stylometric features and have given little attention to semantic features. The few studies involving semantic features have either used them as classifier inputs without interpreting them or treated them in isolation. This research investigates both the thematic and emotional characteristics of fake news at different levels and compares them across languages for the first time. It extends a state-of-the-art topic modeling technique to extract news topics and introduces a divergence measure to assess the importance of thematic characteristics for identifying fake news. We further examine associations between the thematic and emotional characteristics of fake news. The empirical findings have implications for developing both general and language-specific countermeasures against fake news.
Keywords: COVID-19; Cross-lingual analysis; Emotion; Fake news; Theme; Topic modeling
Year: 2022 PMID: 36185776 PMCID: PMC9510725 DOI: 10.1007/s10796-022-10329-7
Source DB: PubMed Journal: Inf Syst Front ISSN: 1387-3326 Impact factor: 5.261
Fig. 1 Transformer-based topic modeling (TM2)
Performance of different embedding and classification models
| Embedding model | Diversity | Average F1 (proposed) | Average F1 (baseline) |
|---|---|---|---|
| (a) English news | | | |
| all-MiniLM-L12-v2 | .8641 | | |
| all-distilroberta-v1 | .7767 | .8864 | .8214 |
| all-mpnet-base-v2 | .7653 | .8812 | .8199 |
| (b) Chinese news | | | |
| paraphrase-multilingual-MiniLM-L12-v2 | .7625 | | |
| distiluse-base-multilingual-cased-v1 | .7321 | .8801 | .7832 |
| distiluse-base-multilingual-cased-v2 | .7415 | .8695 | .7346 |
Values in bold denote the results of the best-performing models.
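The Diversity column is not defined in this entry. A common definition of topic diversity, assumed here, is the fraction of unique words among the top-k words of all topics. The minimal Python sketch below computes that quantity; the function name `topic_diversity`, the default `top_k=10` (chosen to match the top-10 term lists elsewhere in this entry), and the toy topics are hypothetical, not taken from the paper.

```python
from typing import Sequence


def topic_diversity(topics: Sequence[Sequence[str]], top_k: int = 10) -> float:
    """Fraction of unique words among the top-k words of all topics.

    Hypothetical helper using an assumed, commonly used definition.
    Returns 1.0 when no word is shared between topics and equals
    1 / len(topics) when every topic has the same top-k words.
    """
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        raise ValueError("need at least one topic with at least one word")
    return len(set(top_words)) / len(top_words)


# Hypothetical toy topics with three top terms each (not from the paper).
toy_topics = [
    ["vaccine", "dose", "mrna"],
    ["vaccine", "mask", "lockdown"],
    ["gates", "foundation", "depopulation"],
]
print(round(topic_diversity(toy_topics, top_k=3), 4))  # prints 0.8889
```

Here 8 of the 9 top words are unique because "vaccine" appears in two topics, so the score falls just below 1.0.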
Fig. 2 Elbow method plot for determining an optimal number of topics
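The caption does not say which score was plotted or how the elbow was located. As a sketch under those unknowns, one common heuristic, similar in spirit to the Kneedle method, min-max scales both axes and picks the k whose point lies farthest from the straight line joining the first and last points of the curve. The function `elbow_point` and the toy scores are hypothetical.

```python
import math
from typing import List, Sequence


def _min_max(values: Sequence[float]) -> List[float]:
    """Rescale values linearly to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must not all be equal")
    return [(v - lo) / (hi - lo) for v in values]


def elbow_point(ks: Sequence[int], scores: Sequence[float]) -> int:
    """Hypothetical helper: return the k farthest from the chord joining
    the curve's end points, after scaling both axes to [0, 1].

    `ks` must be sorted in ascending order; `scores` holds whatever metric
    was observed at each k (assumed here, since the caption does not say).
    """
    if len(ks) != len(scores) or len(ks) < 3:
        raise ValueError("need at least three (k, score) pairs")
    if list(ks) != sorted(ks):
        raise ValueError("ks must be sorted in ascending order")
    xs, ys = _min_max(ks), _min_max(scores)
    x1, y1, x2, y2 = xs[0], ys[0], xs[-1], ys[-1]
    chord_len = math.hypot(x2 - x1, y2 - y1)
    # Perpendicular distance from every scaled point to the chord.
    distances = [
        abs((y2 - y1) * x - (x2 - x1) * y + x2 * y1 - y2 * x1) / chord_len
        for x, y in zip(xs, ys)
    ]
    return ks[distances.index(max(distances))]


# Hypothetical scores that drop quickly and then flatten out.
ks = [5, 10, 15, 20, 25, 30, 35]
scores = [100, 60, 40, 32, 28, 26, 25]
print(elbow_point(ks, scores))  # prints 15
```

Scaling both axes first keeps the choice from depending on the units of k and of the score.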
Descriptive statistics (mean [std]) and t-test results of the extracted topics
| ID | Topic | Fake news | Real news | t | p |
|---|---|---|---|---|---|
| (a) English | |||||
| E1 | getting vaccinated | 0.043 [0.140] | 0.005 [0.022] | 31.852 | < .001*** |
| E2 | pandemic | 0.006 [0.031] | 0.022 [0.104] | –16.120 | < .001*** |
| E3 | Vaccinequity | 0.015 [0.023] | 0.039 [0.126] | –21.220 | < .001*** |
| E4 | face mask | 0.010 [0.050] | 0.018 [0.092] | –8.808 | < .001*** |
| E5 | depopulation | 0.016 [0.083] | 0.003 [0.003] | 19.082 | < .001*** |
| E6 | | 0.015 [0.043] | 0.036 [0.121] | –18.508 | < .001*** |
| E7 | unvaccinated people | 0.038 [0.106] | 0.011 [0.019] | 29.878 | < .001*** |
| E8 | | 0.016 [0.027] | 0.048 [0.147] | –23.524 | < .001*** |
| E9 | virus origin | 0.025 [0.106] | 0.007 [0.022] | 18.820 | < .001*** |
| E10 | pandemic in Italy | 0.028 [0.104] | 0.009 [0.019] | 20.111 | < .001*** |
| E11 | U.S. President | 0.021 [0.087] | 0.010 [0.024] | 14.279 | < .001*** |
| E12 | UK lockdown | 0.004 [0.033] | 0.012 [0.087] | –9.244 | < .001*** |
| E13 | India fighting COVID-19 | 0.009 [0.010] | 0.024 [0.107] | –15.314 | < .001*** |
| E14 | COVID-19 vaccine | 0.017 [0.050] | 0.025 [0.092] | –7.975 | < .001*** |
| E15 | flu vaccine | 0.010 [0.045] | 0.016 [0.075] | –7.674 | < .001*** |
| E16 | | 0.010 [0.028] | 0.024 [0.098] | –15.621 | < .001*** |
| E17 | cases in Nigeria | 0.005 [0.004] | 0.018 [0.102] | –14.349 | < .001*** |
| E18 | Covidview report | 0.011 [0.042] | 0.020 [0.081] | –10.565 | < .001*** |
| E19 | back to school | 0.009 [0.054] | 0.005 [0.024] | 7.948 | < .001*** |
| E20 | chicken contamination | 0.008 [0.054] | 0.004 [0.018] | 8.313 | < .001*** |
| E21 | India lockdown | 0.017 [0.081] | 0.006 [0.015] | 15.216 | < .001*** |
| E22 | Bill Gates | 0.009 [0.063] | 0.001 [0.003] | 14.158 | < .001*** |
| E23 | case updates | 0.006 [0.029] | 0.014 [0.093] | –9.531 | < .001*** |
| E24 | | 0.005 [0.031] | 0.007 [0.048] | –3.999 | < .001*** |
| E25 | travel | 0.010 [0.045] | 0.009 [0.056] | 1.534 | 0.125 |
| (b) Chinese | |||||
| C1 | | 0.033 [0.065] | 0.058 [0.171] | –6.615 | < .001*** |
| C2 | Shenshan hospital | 0.070 [0.175] | 0.068 [0.160] | 0.350 | 0.727 |
| C3 | prevention & control | 0.017 [0.054] | 0.025 [0.095] | –3.299 | 0.001** |
| C4 | | 0.025 [0.030] | 0.057 [0.159] | –11.105 | < .001*** |
| C5 | viral infection | 0.039 [0.114] | 0.020 [0.102] | 4.117 | < .001*** |
| C6 | protective measures | 0.048 [0.106] | 0.030 [0.123] | 4.067 | < .001*** |
| C7 | pandemic in U.S. | 0.049 [0.161] | 0.019 [0.068] | 4.930 | < .001*** |
| C8 | fighting COVID-19 | 0.017 [0.035] | 0.028 [0.105] | –4.850 | < .001*** |
| C9 | COVID-19 vaccine | 0.026 [0.067] | 0.022 [0.102] | 1.325 | 0.186 |
| C10 | pandemic in Japan | 0.011 [0.008] | 0.030 [0.126] | –8.869 | < .001*** |
| C11 | British Prime Minister | 0.016 [0.016] | 0.029 [0.123] | –6.373 | < .001*** |
| C12 | preventative measures | 0.055 [0.147] | 0.007 [0.038] | 8.658 | < .001*** |
| C13 | wastewater surveillance | 0.019 [0.097] | 0.030 [0.135] | –2.538 | 0.011 |
| C14 | | 0.006 [0.006] | 0.014 [0.072] | –6.463 | < .001*** |
| C15 | pandemic in Spain | 0.002 [0.002] | 0.020 [0.118] | –8.768 | < .001*** |
***: p < .001; **: p < .01
Topics in bold denote overlapping topics between news in the two languages.
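The t statistics in these tables are positive when fake news has the larger mean. The captions do not say whether pooled-variance (Student) or unequal-variance (Welch) tests were used; the sketch below assumes Welch's test, which suits groups whose standard deviations differ as much as several rows above do. It returns only the statistic and the Welch-Satterthwaite degrees of freedom; `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same test and also reports a p-value. The function name `welch_t` and the sample values are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: Welch's t statistic for mean(a) - mean(b) and
    its approximate (Welch-Satterthwaite) degrees of freedom."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two observations")
    # Squared standard errors of the two group means (sample variance / n).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    if se2_a + se2_b == 0:
        raise ValueError("both groups are constant")
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Made-up topic loadings for four fake and four real news items.
fake = [1.0, 2.0, 3.0, 4.0]
real = [1.0, 1.0, 1.0, 2.0]
t, df = welch_t(fake, real)
print(f"t = {t:.3f}, df = {df:.2f}")  # prints t = 1.806, df = 3.88
```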
Sample topics and their top-10 terms
| ID | Top-10 terms | Topic |
|---|---|---|
| (a) English news | ||
| E22 | gates, bill, bill gates, depopulation, vaccine, billgatesbioterrorist, his, he, foundation, gates foundation | Bill Gates |
| E14 | vaccine, vaccines, covid, covid vaccine, more, covid vaccines, mrna, dose, vaccinated, get | COVID-19 vaccine |
| E9 | China, Chinese, coronavirus, Wuhan, in Wuhan, in China, the coronavirus, virus, video, bats | Virus origin |
| (b) Chinese news | ||
| C2 | 神山 (Shenshan), 医院 (hospital), 一线 (frontline), 神山 医院 (Shenshan hospital), 武汉 (Wuhan), 医疗 (medical), 英雄 (hero), 医疗队 (medical team), 军舰 (naval ship), 湖北 (Hubei) | Shenshan hospital |
| C6 | 口罩 (face mask), 消毒剂 (sanitizer), 消毒 (sanitizing), 洗手 (handwashing), 佩戴 (wear), 清洗 (wash), 医用 口罩 (medical mask), 接触 (contact), 防护 (protection), 酒精 (alcohol) | Protective measures |
| C9 | 疫苗 (vaccine), 病毒 (virus), 研究 (research), 抗体 (antibody), 蛋白 (protein), 突变 (mutation), 研发 (R&D), 临床试验 (clinical trial), 灭活疫苗 (inactivated vaccine), 检测 (testing) | COVID-19 vaccine |
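Top-n term lists like these are usually produced by weighting terms per topic rather than per document. Assuming a BERTopic-style class-based TF-IDF (c-TF-IDF), which this entry does not confirm as the basis of TM2, the weight of term t in topic c is t's relative frequency within c multiplied by log(1 + A / f_t), where A is the average number of tokens per topic and f_t is the frequency of t across all topics. The pure-Python sketch below ranks terms by that weight; `ctfidf_top_terms` and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def ctfidf_top_terms(
    docs_by_topic: Dict[str, Sequence[Sequence[str]]], top_n: int = 10
) -> Dict[str, List[str]]:
    """Hypothetical helper: rank each topic's terms by an assumed
    BERTopic-style c-TF-IDF weight.

    docs_by_topic maps a topic ID to the tokenized documents assigned to it.
    """
    if not docs_by_topic:
        raise ValueError("need at least one topic")
    # Merge each topic's documents into one bag of words ("class document").
    counts = {
        topic: Counter(tok for doc in docs for tok in doc)
        for topic, docs in docs_by_topic.items()
    }
    # f_t: how often each term occurs across all topics.
    term_freq: Counter = Counter()
    for c in counts.values():
        term_freq.update(c)
    # A: average number of tokens per topic.
    avg_tokens = sum(sum(c.values()) for c in counts.values()) / len(counts)

    top_terms: Dict[str, List[str]] = {}
    for topic, c in counts.items():
        n_tokens = sum(c.values())
        if n_tokens == 0:
            top_terms[topic] = []
            continue
        weights = {
            term: (cnt / n_tokens) * math.log(1 + avg_tokens / term_freq[term])
            for term, cnt in c.items()
        }
        # Highest weight first; ties broken alphabetically for stable output.
        top_terms[topic] = sorted(weights, key=lambda t: (-weights[t], t))[:top_n]
    return top_terms


toy = {
    "T1": [["vaccine", "dose"], ["vaccine", "mrna"]],
    "T2": [["gates", "vaccine"], ["gates", "foundation"]],
}
print(ctfidf_top_terms(toy, top_n=2))
# prints {'T1': ['vaccine', 'dose'], 'T2': ['gates', 'foundation']}
```

Because "vaccine" occurs in both toy topics, its weight in T2 drops below that of "foundation", which is how shared terms are pushed down each topic's list.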
Fig. 3 KLD (Kullback-Leibler divergence) scores of topic pairs
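The entry does not state which distributions each topic pair compares. For reference, the Kullback-Leibler divergence between discrete distributions P and Q is D_KL(P || Q) = Σ_i P(i) log(P(i) / Q(i)); it is zero when P = Q and is not symmetric in its arguments. The sketch below adds a small `eps` to every entry and renormalizes so that empty bins do not produce log(0), an assumed smoothing choice; `scipy.stats.entropy(p, q)` computes the unsmoothed value. The function name and inputs are hypothetical.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-10) -> float:
    """Hypothetical helper: D_KL(P || Q) in nats for discrete distributions.

    Both inputs are shifted by eps (assumed smoothing) and renormalized so
    that zero entries do not cause log(0) or division by zero.
    """
    if len(p) != len(q) or not p:
        raise ValueError("p and q must be non-empty and of equal length")
    if min(p) < 0 or min(q) < 0:
        raise ValueError("probabilities must be non-negative")
    p_s = [x + eps for x in p]
    q_s = [x + eps for x in q]
    z_p, z_q = sum(p_s), sum(q_s)
    return sum(
        (a / z_p) * math.log((a / z_p) / (b / z_q)) for a, b in zip(p_s, q_s)
    )


# Hypothetical distributions over the same two bins.
p = [0.5, 0.5]
q = [0.9, 0.1]
d = kl_divergence(p, q)
print(round(d, 4), round(math.log(d), 4))  # prints 0.5108 -0.6717
```

The second printed value is the natural log of the divergence, the kind of transform a log(KLD) distribution plot would use.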
Descriptive statistics (mean [std]) and t-test results of news-level topic characteristics
| Language | Variable | Fake news | Real news | t | p |
|---|---|---|---|---|---|
| English | topic uncertainty | 1.377 [0.898] | 1.276 [0.887] | 9.112 | < .001*** |
| | topic concentration | 0.195 [0.262] | 0.252 [0.316] | –15.763 | < .001*** |
| Chinese | topic uncertainty | 1.335 [0.859] | 1.069 [0.759] | 7.693 | < .001*** |
| | topic concentration | 0.273 [0.264] | 0.370 [0.325] | –8.557 | < .001*** |
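Neither news-level measure is defined in this excerpt. One plausible operationalization, assumed here only for illustration, treats a document's topic loadings as a distribution and takes its Shannon entropy as topic uncertainty and its Herfindahl index (the sum of squared shares) as topic concentration. Under this reading, a higher uncertainty tends to go with a lower concentration, the opposite-sign pattern the table shows for fake news. All function names and inputs are hypothetical.

```python
import math
from typing import List, Sequence


def _shares(loadings: Sequence[float]) -> List[float]:
    """Normalize non-negative topic loadings so they sum to 1."""
    if not loadings or min(loadings) < 0:
        raise ValueError("loadings must be non-empty and non-negative")
    total = sum(loadings)
    if total == 0:
        raise ValueError("document has no topic loading")
    return [x / total for x in loadings]


def topic_uncertainty(loadings: Sequence[float]) -> float:
    """Hypothetical: Shannon entropy (nats) of the document's topic shares."""
    return sum(-s * math.log(s) for s in _shares(loadings) if s > 0)


def topic_concentration(loadings: Sequence[float]) -> float:
    """Hypothetical: Herfindahl index, the sum of squared topic shares."""
    return sum(s * s for s in _shares(loadings))


# Hypothetical documents: one spread evenly over four topics, one on a single topic.
spread = [0.25, 0.25, 0.25, 0.25]
focused = [0.0, 0.0, 1.0, 0.0]
print(round(topic_uncertainty(spread), 4), topic_concentration(spread))  # prints 1.3863 0.25
print(topic_uncertainty(focused), topic_concentration(focused))  # prints 0.0 1.0
```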
Descriptive statistics (mean [std]) of emotion features
| Variable | English: fake news | English: real news | Chinese: fake news | Chinese: real news |
|---|---|---|---|---|
| overall emotion | 3.540 [4.548] | 2.924 [3.481] | 6.707 [7.141] | 4.703 [4.052] |
| emotional polarity | 0.039 [0.217] | 0.127 [0.198] | 0.265 [0.348] | 0.366 [0.239] |
| negative emotion | 1.744 [3.293] | 1.158 [2.321] | 2.736 [4.222] | 2.013 [2.550] |
| anger | 0.701 [2.283] | 0.156 [0.796] | 0.484 [1.832] | 0.222 [0.918] |
| sad | 0.242 [1.159] | 0.302 [1.067] | 0.250 [1.446] | 0.135 [0.722] |
| anxiety | 0.310 [1.271] | 0.361 [1.201] | 0.402 [1.893] | 0.288 [0.902] |
Results of t-tests on emotion features between fake and real news
| Variable | English: t | English: p | Chinese: t | Chinese: p |
|---|---|---|---|---|
| overall emotion | 12.335 | < .001*** | 7.255 | < .001*** |
| emotional polarity | –34.391 | < .001*** | –8.670 | < .001*** |
| negative emotion | 16.737 | < .001*** | 4.412 | < .001*** |
| anger | 26.276 | < .001*** | 3.719 | < .001*** |
| sad | –4.350 | < .001*** | 2.064 | 0.039* |
| anxiety | –3.316 | 0.001** | 1.573 | 0.116 |
***: p < .001; **: p < .01; *: p < .05
Fig. 4 Correlation coefficients between topic loadings and emotional polarity
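The caption does not name the correlation coefficient. Assuming Pearson's r, computed separately for each topic across documents, the sketch below correlates each column of a document-by-topic loading matrix with the documents' emotional polarity scores. The function names and data are hypothetical; on Python 3.10 and later, `statistics.correlation` returns the same Pearson coefficient for a pair of samples.

```python
import math
from statistics import mean
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def topic_polarity_correlations(
    loadings: Sequence[Sequence[float]], polarity: Sequence[float]
) -> List[float]:
    """Hypothetical helper: correlate each topic column with polarity."""
    if not loadings:
        raise ValueError("need at least one document")
    n_topics = len(loadings[0])
    return [pearson_r([doc[j] for doc in loadings], polarity) for j in range(n_topics)]


# Hypothetical document-by-topic loadings (2 topics) and polarity scores.
loadings = [[0.1, 0.5], [0.2, 0.1], [0.3, 0.4], [0.4, 0.2]]
polarity = [0.4, 0.3, 0.2, 0.1]
print([round(r, 3) for r in topic_polarity_correlations(loadings, polarity)])  # prints [-1.0, 0.424]
```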
Fig. 5 Log(KLD) distribution plot