Meysam Alizadeh, Jacob N. Shapiro, Cody Buntain, Joshua A. Tucker.
Abstract
We study how easy it is to distinguish influence operations from organic social media activity by assessing the performance of a platform-agnostic machine learning approach. Our method uses public activity to detect content that is part of coordinated influence operations based on human-interpretable features derived solely from content. We test this method on publicly available Twitter data on Chinese, Russian, and Venezuelan troll activity targeting the United States, as well as the Reddit dataset of Russian influence efforts. To assess how well content-based features distinguish these influence operations from random samples of general and political American users, we train and test classifiers on a monthly basis for each campaign across five prediction tasks. Content-based features perform well across period, country, platform, and prediction task. Industrialized production of influence campaign content leaves a distinctive signal in user-generated content that allows tracking of campaigns from month to month and across different accounts.
Year: 2020 PMID: 32832674 PMCID: PMC7439640 DOI: 10.1126/sciadv.abb5824
Source DB: PubMed Journal: Sci Adv ISSN: 2375-2548 Impact factor: 14.136
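The monthly detection pipeline the abstract describes — train a classifier on one period's troll and control posts, then score held-out posts with macro-averaged F1 — can be sketched roughly as follows. Everything here (the features, sample sizes, and the random-forest learner) is an illustrative assumption, not the paper's released code or data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

def make_month(n_troll=200, n_control=200):
    """Synthetic stand-in for one month of content-based feature vectors
    (e.g., hashtag/URL/mention counts); trolls get a shifted distribution."""
    X_troll = rng.normal(loc=1.0, size=(n_troll, 3))
    X_control = rng.normal(loc=0.0, size=(n_control, 3))
    X = np.vstack([X_troll, X_control])
    y = np.array([1] * n_troll + [0] * n_control)  # 1 = troll, 0 = control
    return X, y

# One classifier per month, evaluated with macro-averaged F1 on held-out data.
scores = []
for month in range(6):  # six hypothetical months
    X_train, y_train = make_month()
    X_test, y_test = make_month()
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
    scores.append(f1_score(y_test, clf.predict(X_test), average="macro"))

print(f"macro-F1 mean={np.mean(scores):.2f}, sd={np.std(scores):.2f}")
```

The mean and standard deviation over months are the quantities reported in the table below.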
Mean and SD of monthly macro-averaged F1 scores.*

| Campaign | Task 1 | Task 2 | Task 3† | Task 4 | Task 5 |
| --- | --- | --- | --- | --- | --- |
| China (Twitter) | 0.89 (0.08) | 0.93 (0.04) | 0.89 (0.12) | NA§ | NA║ |
| Russia (Twitter) | 0.85 (0.13) | 0.81 (0.07) | 0.81 (0.13) | 0.75 (0.11) | 0.60‡ (0.03) |
| Russia (Reddit) | 0.82 (0.07) | 0.82 (0.09) | 0.74 (0.15) | NA§ | 0.37 (0.03) |
| Venezuela (Twitter) | 0.99 (0.03) | 0.99 (0.002) | 0.92 (0.15) | 0.49 (0.07) | NA║ |

SDs are in parentheses.
*Training data are all tweets from a 50% random sample of troll users combined with independent random samples from each of our two control groups. Test data use all tweets by the other 50% of troll users and a stratified random sample of 50% of tweets by nontroll users.
†Because this test includes the same troll accounts in both train and test sets, we exclude features related to account creation date.
‡We calculate mean and SD in F1 over months in which there are at least 1000 troll tweets or 500 troll Reddit posts in the test month.
§Not applicable. There was only one official data release for the Chinese campaign on Twitter and the Russian campaign on Reddit as of 1 December 2019.
║Not applicable. Cross-platform data are only available for the Russian campaign.
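Footnote * describes splitting at the account level rather than the post level, so no troll user contributes posts to both train and test sets. A minimal sketch of that kind of leakage-free split, with wholly synthetic user and tweet IDs:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical tweet table: each tweet is tagged with its author's user ID.
user_ids = np.arange(100)                      # 100 troll accounts
tweet_users = rng.choice(user_ids, size=2000)  # author of each of 2000 tweets

# Split at the *account* level so no user appears in both sets,
# mirroring the 50/50 troll-user split in the table footnote.
shuffled = rng.permutation(user_ids)
train_users = set(shuffled[:50])
train_mask = np.isin(tweet_users, list(train_users))

train_tweets = tweet_users[train_mask]
test_tweets = tweet_users[~train_mask]
assert set(train_tweets).isdisjoint(set(test_tweets))  # no account leakage
```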
Fig. 1 Ordinary least squares (OLS) regression coefficients for variables explaining the predictive performance of classifiers across campaigns and tests.
Points represent estimated coefficients, and bars represent the 95% confidence interval of the estimate. F1 scores for Chinese and Venezuelan operations are approximately 5.7 and 9.5% greater than on Russian activity, controlling for temporal trends. Most of that effect, however, is due to their distinct communication features. Once those are accounted for, Chinese activity is no more predictable than Russian, and Venezuelan is actually less predictable. In other words, Venezuelan operations were the easiest to detect because of their unusual way of using retweets (RT), replies, hashtags, mentions, and URLs. Conditioning on timing and other factors, F1 for task 4 is 31% lower than for task 1, making it the hardest prediction task. A 1% increase in the number of positive cases in the test set predicts a 7% increase in precision, and a 1% increase in the number in the training set predicts a 5% increase in recall. Task 5 is excluded because of a noncomparability issue.
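Fig. 1 reports OLS point estimates with 95% confidence intervals. As a reminder of how such intervals are computed, here is a small self-contained sketch on synthetic data (the design matrix, coefficients, and noise level are invented, not the paper's regression):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy regression: explain a monthly F1-like outcome with an intercept,
# a campaign dummy, and one continuous covariate (all hypothetical).
n = 120
X = np.column_stack([np.ones(n), rng.integers(0, 2, n), rng.normal(size=n)])
beta_true = np.array([0.8, 0.06, 0.02])
y = X @ beta_true + rng.normal(scale=0.05, size=n)

# OLS fit plus classical standard errors and 95% confidence intervals.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
dof = n - X.shape[1]
sigma2 = resid @ resid / dof                       # residual variance
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
ci_low, ci_high = beta - 1.96 * se, beta + 1.96 * se
```

In the figure, each point is a `beta` entry and each bar spans `ci_low` to `ci_high` for that coefficient.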
Monthly mean of macro-averaged F1 scores for detection of Russian troll tweets, with varying predictor sets.
User-timing features were removed for task 3. Task 5 is excluded because it is based on a reduced set of features and therefore not comparable to other tasks.
| Model number | (1) | (2) | (3) | (4) | (5) |
| --- | --- | --- | --- | --- | --- |
| Experiments | | | | | |
| Within-month train/test | 0.76 | 0.81 | 0.82 | 0.85 | 0.84 |
| Train on … | 0.74 | 0.82 | 0.82 | NA | 0.85 |
| Train on … | 0.66 | 0.75 | 0.75 | 0.81 | 0.82 |
| Within-month … | 0.66 | 0.70 | 0.70 | 0.74 | 0.75 |
Fig. 2 Feature importance trends of a set of selected predictors.
Analyzing changes in feature importance over time can reveal tactical changes in the Russian IRA influence operation on Twitter. The top row shows features whose importance was higher before the November 2016 U.S. elections. The second row focuses on features whose importance was relatively higher in the election month of November 2016. The last row illustrates importance trends for features that were most important at other times.
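The month-by-month feature-importance tracking described above can be mimicked on toy data: fit one classifier per month and record its importance vector. The feature names and the drifting signal below are hypothetical, chosen only so that importance visibly shifts between months.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
feature_names = ["hashtag_count", "url_count", "mention_count"]  # illustrative

importance_by_month = []
for month in range(4):
    # In this toy setup, which feature separates trolls from controls
    # rotates over time, so its importance rises and falls month to month.
    informative = month % 3
    X = rng.normal(size=(400, 3))
    y = (X[:, informative] > 0).astype(int)
    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    importance_by_month.append(clf.feature_importances_)

importance_by_month = np.array(importance_by_month)  # shape (months, features)
```

Plotting each column of `importance_by_month` against month gives trend lines of the kind shown in the figure.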
Fig. 3 Characterizing retweet networks of Venezuelan and Russian influence operations.
Each node represents a user, and there is an edge between two nodes if one of them retweeted the other. Node label size represents PageRank score. Color encodes a different concept in each panel. (A) Venezuelan trolls. Node color and edge size represent the number of retweets. Venezuelan trolls were mostly interested in tweets from or about Trump; a considerable portion of their campaign was a single-issue, Trump-related campaign. Structurally, we see a few central accounts, each surrounded by many peripheral accounts, which is the simplest form of running a campaign on Twitter. (B) Russian IRA trolls. Node color represents communities derived by applying the Louvain algorithm. Edges are colored by source. Russian operations were quite diverse in their topics and audiences of interest, targeting right-leaning (green), left-leaning (purple), and African-American left-leaning (green) individuals, as well as hashtag gamers (blue). Structurally, we see target-specific clusters reflecting a division of labor, with frequent communication between them.
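The network measures named in the caption — PageRank for node-label size and Louvain communities for node color — can be computed with networkx. The accounts and edge weights in this toy retweet graph are invented for illustration; an edge u → v means u retweeted v, weighted by retweet count.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities

# Hypothetical hub-and-spoke retweet structure, loosely echoing panel (A):
# a few central accounts, each retweeted by several peripheral accounts.
edges = [
    ("troll_a", "hub_1", 5), ("troll_b", "hub_1", 3), ("troll_c", "hub_1", 4),
    ("troll_d", "hub_2", 2), ("troll_a", "hub_2", 1),
]
G = nx.DiGraph()
G.add_weighted_edges_from(edges)

# PageRank sizes the node labels; heavily retweeted hubs rank highest.
pr = nx.pagerank(G, weight="weight")

# Louvain community detection (as in panel B), run on the undirected graph.
communities = louvain_communities(G.to_undirected(), seed=0)
```

With this structure, `hub_1` gets the largest PageRank, and Louvain recovers the clusters around each hub.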
Summary of Twitter and Reddit data for troll and control accounts.
| Platform | Account type | Group | Accounts | Posts |
| --- | --- | --- | --- | --- |
| Twitter | Troll | China | 2,660 | 1,940,180 |
| Twitter | Troll | Russia | 3,722 | 3,738,750 |
| Twitter | Troll | Venezuela | 594 | 1,488,142 |
| Twitter | Control | U.S. political | 5,000 | 22,977,929 |
| Twitter | Control | U.S. random | 5,000 | 20,935,038 |
| Reddit | Troll | Russia | 944 | 14,471 |
| Reddit | Control | Political | 107,052 | 713,236 |
| Reddit | Control | Top 30 | 784,711 | 5,475,687 |