Changhao Tang, Kun Ma, Benkuan Cui, Ke Ji, Ajith Abraham.
Abstract
The spread of COVID-19 has had a serious impact on both the work and the daily lives of people. With the decrease in physical social contact and rising anxiety about the pandemic, social media has become the primary channel through which people access information related to COVID-19. Social media is rife with rumors and fake news, which cause great damage to society. Existing Chinese datasets related to the epidemic are scarce, imbalanced, and noisy, which hampers the detection of fake news. In addition, classification accuracy suffers because edge characteristics are easily lost in long text data. In this paper, a long text feature extraction network with data augmentation (LTFE) is proposed, which improves the learning performance of the classifier by optimizing the data feature structure. In the encoding stage, Twice-Masked Language Modeling for Fine-tuning (TMLM-F) and Data Alignment that Preserves Edge Characteristics (DA-PEC) are proposed to extract the classification features of the Chinese dataset. Between the TMLM-F and DA-PEC processes, attention is used to capture the dependencies between words and generate the corresponding vector representations. The experimental results show that this method is effective for detecting Chinese fake news related to the pandemic.
Keywords: COVID-19; Data augmentation; Fake news; Long text; Social media
Year: 2022 PMID: 35400845 PMCID: PMC8979485 DOI: 10.1007/s10489-022-03185-0
Source DB: PubMed Journal: Appl Intell (Dordr) ISSN: 0924-669X Impact factor: 5.086
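The abstract notes that edge characteristics are easily lost when long texts are aligned to a fixed length, but it does not say how DA-PEC avoids this. As a minimal sketch of the general idea only, the snippet below pads short sequences and truncates long ones from the middle, so that both the beginning and the end of the text survive. The function name `align_keep_edges`, the even head/tail split, and the `[PAD]` token are assumptions made for illustration, not the authors' method.

```python
def align_keep_edges(tokens, max_len, head_len=None, pad="[PAD]"):
    """Pad or truncate `tokens` to exactly `max_len` items.

    Hypothetical illustration: long inputs lose their middle rather
    than their tail, so both edges of the text are preserved.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")

    # Short input: right-pad up to the target length.
    if len(tokens) <= max_len:
        return list(tokens) + [pad] * (max_len - len(tokens))

    # Long input: keep `head_len` items from the start, the rest from the end.
    if head_len is None:
        head_len = max_len // 2  # assumed even split between head and tail
    head_len = max(0, min(head_len, max_len))
    tail_len = max_len - head_len

    head = list(tokens[:head_len])
    tail = list(tokens[len(tokens) - tail_len:])
    return head + tail


if __name__ == "__main__":
    long_text = list("abcdefghij")
    print(align_keep_edges(long_text, 4))   # keeps both ends of the sequence
    print(align_keep_edges(list("ab"), 4))  # short input is padded
```

Plain tail truncation would instead drop everything after the first `max_len` tokens, which is one way the ending of a long news item can be lost.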
Fig. 1 General architecture of fake news detection
Fig. 2 Text feature extraction
Fig. 3 Twice-Masked Language Modeling for Fine-tuning (TMLM-F)
Fig. 4 COVID-19 data length distribution
Fig. 5 BAAI data length distribution
The results of expanded CHECKED dataset
| Model | TMLM-F | DA-PEC | P (True) | P (Fake) | P (Macro) | R (True) | R (Fake) | R (Macro) | F1 (True) | F1 (Fake) | F1 (Macro) | ACC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TextCNN | - | no | 0.9275 | 0.8758 | 0.9017 | 0.9529 | 0.8171 | 0.8850 | 0.9400 | 0.8454 | 0.8927 | 0.9136 |
| TextCNN | - | yes | 0.8829 | 0.9106 | 0.8967 | 0.9727 | 0.6829 | 0.8278 | 0.9256 | 0.7805 | 0.8531 | 0.8889 |
| TextRNN | - | no | 0.8228 | 0.8602 | 0.8415 | 0.9677 | 0.4878 | 0.7278 | 0.8894 | 0.6226 | 0.7560 | 0.8289 |
| TextRNN | - | yes | 0.8298 | 0.8660 | 0.8479 | 0.9677 | 0.5122 | 0.7400 | 0.8935 | 0.6437 | 0.7686 | 0.8360 |
| RCNN | - | no | 0.8197 | 0.8667 | 0.8432 | 0.9702 | 0.4756 | 0.7229 | 0.8886 | 0.6142 | 0.7514 | 0.8272 |
| RCNN | - | yes | 0.8319 | 0.8763 | 0.8541 | 0.9702 | 0.5183 | 0.7443 | 0.8958 | 0.6513 | 0.7736 | 0.8395 |
| DPCNN | - | no | 0.9100 | 0.8690 | 0.8895 | 0.9529 | 0.7683 | 0.8606 | 0.9309 | 0.8155 | 0.8732 | 0.8995 |
| DPCNN | - | yes | 0.9277 | 0.8816 | 0.9046 | 0.9553 | 0.8171 | 0.8862 | 0.9413 | 0.8481 | 0.8947 | 0.9153 |
| BERT+CNN | no | no | 0.9400 | 0.9267 | 0.9334 | 0.9727 | 0.8476 | 0.9101 | 0.9561 | 0.8854 | 0.9207 | 0.9365 |
| BERT+CNN | no | yes | 0.9897 | 0.8939 | 0.9418 | 0.9529 | 0.9756 | 0.9642 | 0.9709 | 0.9709 | 0.9519 | 0.9594 |
| LTFE+CNN | yes | yes | 0.9872 | 0.9086 | | 0.9603 | 0.9695 | | 0.9736 | 0.9381 | | |
| BERT+RNN | no | no | 0.8902 | 0.8923 | 0.8912 | 0.9653 | 0.7073 | 0.8363 | 0.9262 | 0.7891 | 0.8577 | 0.8907 |
| BERT+RNN | no | yes | 0.9736 | 0.8191 | 0.8964 | 0.9156 | 0.9390 | 0.9273 | 0.9437 | 0.8750 | 0.9094 | 0.9224 |
| LTFE+RNN | yes | yes | 0.9769 | 0.8708 | | 0.9429 | 0.9451 | | 0.9596 | 0.9064 | | |
| BERT+RCNN | no | no | 0.8284 | 0.8737 | 0.8510 | 0.9702 | 0.5061 | 0.7382 | 0.8937 | 0.6409 | 0.7673 | 0.8360 |
| BERT+RCNN | no | yes | 0.8957 | 0.9365 | | 0.9801 | 0.7195 | 0.8498 | 0.9360 | 0.8138 | 0.8749 | 0.9048 |
| LTFE+RCNN | yes | yes | 0.9262 | 0.9048 | 0.9155 | 0.9653 | 0.8110 | | 0.9453 | 0.8553 | | |
| BERT+DPCNN | no | no | 0.9317 | 0.8662 | 0.8990 | 0.9479 | 0.8293 | 0.8886 | 0.9397 | 0.8474 | 0.8935 | 0.9136 |
| BERT+DPCNN | no | yes | 0.9353 | 0.9133 | 0.9243 | 0.9677 | 0.8354 | 0.9016 | 0.9512 | 0.8726 | 0.9119 | 0.9295 |
| LTFE+DPCNN | yes | yes | 0.9722 | 0.8947 | | 0.9553 | 0.9329 | | 0.9637 | 0.9134 | | |
Bold entries are the optimal results of three different experiments
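The columns of the table above are related by standard definitions: per-class F1 is the harmonic mean of precision and recall, and each Macro value is the unweighted mean of the True and Fake values. The short sketch below recomputes these for the TextCNN row without DA-PEC, using the four per-class precision and recall values reported there. The helper names `f1_score` and `macro_avg` are illustrative, and the comparison uses a small tolerance because the reported figures are rounded to four decimals.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def macro_avg(true_value, fake_value):
    """Unweighted mean over the two classes (True news, Fake news)."""
    return (true_value + fake_value) / 2


# Per-class values reported for TextCNN without DA-PEC (CHECKED dataset).
p_true, p_fake = 0.9275, 0.8758
r_true, r_fake = 0.9529, 0.8171

f1_true = f1_score(p_true, r_true)
f1_fake = f1_score(p_fake, r_fake)

recomputed = {
    "P (Macro)": macro_avg(p_true, p_fake),
    "R (Macro)": macro_avg(r_true, r_fake),
    "F1 (True)": f1_true,
    "F1 (Fake)": f1_fake,
    "F1 (Macro)": macro_avg(f1_true, f1_fake),
}

# Values reported in the same table row, for comparison.
reported = {
    "P (Macro)": 0.9017,
    "R (Macro)": 0.8850,
    "F1 (True)": 0.9400,
    "F1 (Fake)": 0.8454,
    "F1 (Macro)": 0.8927,
}

for name, value in recomputed.items():
    diff = abs(value - reported[name])
    print(f"{name}: recomputed={value:.4f} reported={reported[name]:.4f} diff={diff:.4f}")
```

Accuracy cannot be recovered this way, since it depends on the class counts, which the table does not report.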
The results of BAAI dataset
| Model | TMLM-F | DA-PEC | P (True) | P (Fake) | P (Macro) | R (True) | R (Fake) | R (Macro) | F1 (True) | F1 (Fake) | F1 (Macro) | ACC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| TextCNN | - | no | 0.8466 | 0.9400 | 0.8933 | 0.9447 | 0.8349 | 0.8898 | 0.8929 | 0.8844 | 0.8886 | 0.8888 |
| TextCNN | - | yes | 0.8644 | 0.9442 | 0.9043 | 0.9475 | 0.8568 | 0.9021 | 0.9041 | 0.8984 | 0.9012 | 0.9013 |
| TextRNN | - | no | 0.8306 | 0.9278 | 0.8792 | 0.9341 | 0.8164 | 0.8752 | 0.8793 | 0.8685 | 0.8739 | 0.8741 |
| TextRNN | - | yes | 0.8623 | 0.9125 | 0.8874 | 0.9145 | 0.8592 | 0.8869 | 0.8876 | 0.8850 | 0.8863 | 0.8863 |
| RCNN | - | no | 0.8638 | 0.8886 | 0.8762 | 0.8874 | 0.8651 | 0.8763 | 0.8754 | 0.8767 | 0.8761 | 0.8761 |
| RCNN | - | yes | 0.8634 | 0.9080 | 0.8857 | 0.9095 | 0.8613 | 0.8854 | 0.8859 | 0.8841 | 0.8850 | 0.8850 |
| DPCNN | - | no | 0.8919 | 0.9218 | 0.9068 | 0.9215 | 0.8923 | 0.9069 | 0.9064 | 0.9068 | 0.9066 | 0.9066 |
| DPCNN | - | yes | 0.8753 | 0.9621 | 0.9187 | 0.9645 | 0.8675 | 0.9160 | 0.9177 | 0.9124 | 0.9151 | 0.9151 |
| BERT+CNN | no | no | 0.9496 | 0.9730 | 0.9613 | 0.9726 | 0.9502 | 0.9614 | 0.9609 | 0.9614 | 0.9612 | 0.9612 |
| BERT+CNN | no | yes | 0.9566 | 0.9674 | 0.9620 | 0.9665 | 0.9577 | 0.9621 | 0.9615 | 0.9625 | 0.9620 | 0.9620 |
| LTFE+CNN | yes | yes | 0.9560 | 0.9772 | | 0.9768 | 0.9567 | | 0.9663 | 0.9668 | | |
| BERT+RNN | no | no | 0.9600 | 0.9597 | 0.9598 | 0.9587 | 0.9609 | 0.9598 | 0.9594 | 0.9603 | 0.9598 | 0.9598 |
| BERT+RNN | no | yes | 0.9677 | 0.9627 | | 0.9616 | 0.9686 | | 0.9647 | 0.9657 | | |
| LTFE+RNN | yes | yes | 0.9594 | 0.9691 | 0.9643 | 0.9687 | 0.9599 | 0.9643 | 0.9640 | 0.9645 | 0.9643 | 0.9643 |
| BERT+RCNN | no | no | 0.9586 | 0.9659 | 0.9623 | 0.9648 | 0.9599 | 0.9623 | 0.9617 | 0.9629 | 0.9623 | 0.9623 |
| BERT+RCNN | no | yes | 0.9528 | 0.9707 | 0.9617 | 0.9701 | 0.9537 | 0.9619 | 0.9614 | 0.9621 | 0.9617 | 0.9617 |
| LTFE+RCNN | yes | yes | 0.9647 | 0.9632 | | 0.9617 | 0.9661 | | 0.9632 | 0.9646 | | |
| BERT+DPCNN | no | no | 0.9726 | 0.9454 | 0.9590 | 0.9416 | 0.9744 | 0.9580 | 0.9569 | 0.9597 | 0.9583 | 0.9583 |
| BERT+DPCNN | no | yes | 0.9556 | 0.9631 | 0.9594 | 0.9620 | 0.9569 | 0.9595 | 0.9588 | 0.9600 | 0.9594 | 0.9594 |
| LTFE+DPCNN | yes | yes | 0.9619 | 0.9683 | | 0.9673 | 0.9631 | | 0.9646 | 0.9657 | | |
Bold entries are the optimal results of three different experiments