Olga Bernikova, Oleg Granichin, Dan Lemberg, Oleg Redkin, Zeev Volkovich.
Abstract
A new method is proposed for recognizing meaningful changes in the social state from transformations of the linguistic content of Arabic newspapers, where the detected alterations of the linguistic material serve as an indicator. The approach operates in an "online" fashion and uses pre-trained vector representations of Arabic words. After a pre-processing stage, the words in each issue's text are replaced by vectors obtained through a word embedding methodology. The approach characterizes a consistent linguistic template by the similarity of the embedded vectors, so that a change in the distributions of the issue-based samples indicates a difference in the underlying newspaper template. The concept is implemented as a two-step procedure. In the first step, the similarity distribution of the current issue is compared with the union of the distributions of several of its predecessors; repeated under-sampling combined with a two-sample test stabilizes the sampling and returns a collection of p-values. In the second step, the entropy of these collections is calculated sequentially, and the change points of the resulting time series indicate changes in the newspaper content. Numerical experiments on consecutive issues of several Arabic newspapers published during the Arab Spring period demonstrate the high reliability of the method.
Keywords: anomaly detection; publishing model modeling; word embedding
Year: 2020 PMID: 33286215 PMCID: PMC7516919 DOI: 10.3390/e22040441
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
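The two-step procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the two-sample test, the number of under-sampling repetitions, the window of predecessor issues, or the entropy estimator, so the sketch assumes a Kolmogorov-Smirnov test with an asymptotic p-value, 50 repetitions of size 100, a window of 3 issues, and a normalized Shannon entropy over a 10-bin histogram of p-values. Each issue is assumed to be already reduced to a list of similarity values between embedded word vectors; all function names are hypothetical.

```python
import math
import random


def ks_two_sample_pvalue(a, b):
    """Asymptotic two-sample Kolmogorov-Smirnov p-value (assumed test)."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    # Walk both sorted samples and track the largest gap between the ECDFs.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.05:  # the series below is numerically 1 in this range
        return 1.0
    # Kolmogorov series: Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 lam^2)
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * total))


def pvalues_for_issue(current, history, n_repeats=50, sample_size=100, rng=None):
    """Step 1: repeatedly under-sample the current issue and the union of
    its predecessors, and collect the two-sample test p-values."""
    rng = rng or random.Random(0)
    pooled = [s for issue in history for s in issue]
    k = min(sample_size, len(current), len(pooled))
    return [ks_two_sample_pvalue(rng.sample(current, k), rng.sample(pooled, k))
            for _ in range(n_repeats)]


def pvalue_entropy(pvalues, n_bins=10):
    """Step 2: Shannon entropy of the p-value histogram, scaled to [0, 1]."""
    counts = [0] * n_bins
    for p in pvalues:
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    total = len(pvalues)
    h = -sum(c / total * math.log(c / total) for c in counts if c)
    return h / math.log(n_bins)


def entropy_series(issues, window=3, seed=0):
    """Entropy time series over issues, each compared with `window` predecessors."""
    rng = random.Random(seed)
    return [pvalue_entropy(pvalues_for_issue(issues[t], issues[t - window:t], rng=rng))
            for t in range(window, len(issues))]


if __name__ == "__main__":
    # Synthetic similarity samples: five stable issues, then one shifted issue.
    demo_rng = random.Random(42)
    issues = [[demo_rng.gauss(0.5, 0.1) for _ in range(300)] for _ in range(5)]
    issues.append([demo_rng.gauss(0.8, 0.1) for _ in range(300)])
    print(entropy_series(issues))
```

In this sketch, an issue whose similarity distribution departs strongly from its predecessors yields p-values concentrated near zero and therefore a low entropy, while a stable issue yields more widely spread p-values. Detecting change points in the resulting series is a separate step that the abstract does not specify, so it is left out here.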
Figure 1. The overall entropy graph of the “Al-Ahraam” newspaper in the first time frame.
Figure 2. Examples of the similarity distributions.
Change points detected for the “Al-Ahraam” newspaper in the first time frame.

| No. | Date | Value |
| --- | --- | --- |
| 1 | October 27, 2010 | 0.608 |
| 2 | November 17, 2010 | 0.593 |
| 3 | January 11, 2011 | 0.665 |
| 4 | January 13, 2011 | 0.637 |
| 5 | January 26, 2011 | 0.644 |
| 6 | February 23, 2011 | 0.66 |
| 7 | March 11, 2011 | 0.654 |
| 8 | April 3, 2011 | 0.637 |
| 9 | April 24, 2011 | 0.658 |
| 10 | May 2, 2011 | 0.652 |
| 11 | May 18, 2011 | 0.512 |
| 12 | June 21, 2011 | 0.661 |
| 13 | June 28, 2011 | 0.66 |
| 14 | June 30, 2011 | 0.657 |
| 15 | August 7, 2011 | 0.664 |
| 16 | August 21, 2011 | 0.665 |
| 17 | October 2, 2011 | 0.667 |
| 18 | October 14, 2011 | 0.667 |
| 19 | October 23, 2011 | 0.653 |
| 20 | November 24, 2011 | 0.654 |
| 21 | November 29, 2011 | 0.655 |
| 22 | December 6, 2011 | 0.641 |

Figure 3. The overall entropy graph of the “Al-Ahraam” newspaper in the second time frame.
Change points detected for the “Al-Ahraam” newspaper in the second time frame.

| No. | Date | Value |
| --- | --- | --- |
| 1 | January 11, 2014 | 0.559 |
| 2 | January 12, 2014 | 0.103 |
| 3 | March 16, 2014 | 0.609 |
| 4 | April 11, 2014 | 0.629 |
| 5 | May 5, 2014 | 0.646 |
| 6 | May 11, 2014 | 0.639 |
| 7 | May 20, 2014 | 0.561 |

Figure 4. The overall entropy graph of the “Akhbaar Al-Khaleej” newspaper.
Change points detected for the “Akhbaar Al-Khaleej” newspaper.

| No. | Date | Value |
| --- | --- | --- |
| 1 | December 5, 2010 | 0.31 |
| 2 | January 8, 2011 | 0.623 |
| 3 | January 11, 2011 | 0.627 |
| 4 | January 18, 2011 | 0.62 |
| 5 | January 21, 2011 | 0.601 |
| 6 | February 17, 2011 | 0.537 |
| 7 | February 23, 2011 | 0.509 |
| 8 | March 25, 2011 | 0.591 |

Figure 5. The overall entropy graph of the “Al-Ghad” newspaper.
Change points detected for the “Al-Ghad” newspaper.

| No. | Date | Value |
| --- | --- | --- |
| 1 | November 17, 2010 | 0.254 |
| 2 | December 8, 2010 | 0.58 |
| 3 | January 11, 2011 | 0.645 |
| 4 | January 13, 2011 | 0.568 |
| 5 | January 26, 2011 | 0.595 |
| 6 | February 23, 2011 | 0.632 |
| 7 | March 11, 2011 | 0.619 |
| 8 | April 3, 2011 | 0.569 |