Kai-Cheng Yang1, Emilio Ferrara2, Filippo Menczer1.
Abstract
Social bots have become an important component of online social media. Deceptive bots, in particular, can manipulate online discussions of important issues ranging from elections to public health, threatening the constructive exchange of information. Their ubiquity makes them an interesting research subject and requires researchers to properly handle them when conducting studies using social media data. Therefore, it is important for researchers to gain access to bot detection tools that are reliable and easy to use. This paper aims to provide an introductory tutorial on Botometer, a public tool for bot detection on Twitter, for readers who are new to this topic and may not be familiar with programming and machine learning. We explain how Botometer works, describe the different ways users can access it, and present a case study as a demonstration. Readers can use the case study code as a template for their own research. We also discuss recommended practices for using Botometer.
Keywords: Bot detection; Botometer; Social bots; Twitter
Year: 2022 PMID: 36035522 PMCID: PMC9391657 DOI: 10.1007/s42001-022-00177-5
Source DB: PubMed Journal: J Comput Soc Sci ISSN: 2432-2725
Fig. 1 The timeline of Botometer versions
Annotated datasets of human and bot accounts used to train Botometer
| Dataset | Bots | Humans | Annotation method | References |
|---|---|---|---|---|
| varol-icwsm | 733 | 1495 | Human annotation | [ |
| cresci-17 | 7049 | 2764 | Various methods | [ |
| pronbots | 17882 | 0 | Spam bots | [ |
| celebrity | 0 | 5918 | Celebrity accounts | [ |
| vendor-purchased | 1087 | 0 | Fake followers | [ |
| botometer-feedback | 139 | 380 | Human annotation | [ |
| political-bots | 62 | 0 | Human annotation | [ |
| gilani-17 | 1090 | 1413 | Human annotation | [ |
| cresci-rtbust | 353 | 340 | Human annotation | [ |
| cresci-stock | 7102 | 6174 | Signs of coordination | [ |
| botwiki | 698 | 0 | Self-declared | [ |
| midterm-2018 | 0 | 7459 | Human annotation | [ |
| astroturf | 505 | 0 | Human annotation | [ |
| kaiser | 875 | 499 | Politicians + bots | [ |
Comparison of Botometer-V4 and BotometerLite APIs
| Model | Botometer-V4 | BotometerLite |
|---|---|---|
| Endpoint | Check account | Check account in bulk |
| Query payload | User object, 200 most recent tweets, mentions | List of user objects and timestamps |
| Response | Raw bot scores, sub-scores, CAP scores, basic account information, etc. | BotometerLite scores |
| Daily number of accounts allowed | 43,200 | |
| Corresponding botometer-python method(s) | check_account | check_accounts_from_tweets, check_accounts_from_user_ids, check_accounts_from_screen_names |
*The values represent the upper bounds based on Twitter’s API rate limit when using a single app key. The actual numbers also depend on other factors, such as internet speed.
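As a rough planning aid, the daily upper bound in the table can be converted into an average query spacing and a best-case job duration. The sketch below uses only the Botometer-V4 figure stated above (43,200 accounts per day with a single app key). The function names are illustrative and are not part of botometer-python. The BotometerLite limit is not modeled because its value does not appear in the table.

```python
import math

# Upper bound for Botometer-V4 taken from the comparison table (single app key).
V4_DAILY_LIMIT = 43_200
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_per_account(daily_limit: int = V4_DAILY_LIMIT) -> float:
    """Average spacing between queries implied by a daily cap."""
    if daily_limit <= 0:
        raise ValueError("daily_limit must be positive")
    return SECONDS_PER_DAY / daily_limit


def days_needed(n_accounts: int, daily_limit: int = V4_DAILY_LIMIT) -> int:
    """Best-case number of whole days needed to check n_accounts.

    This is a lower bound: the table's footnote notes that real throughput
    also depends on factors such as internet speed.
    """
    if n_accounts < 0:
        raise ValueError("n_accounts must be non-negative")
    if daily_limit <= 0:
        raise ValueError("daily_limit must be positive")
    return math.ceil(n_accounts / daily_limit)


if __name__ == "__main__":
    print(f"Average spacing: {seconds_per_account():.1f} s per account")
    print(f"Days for 100,000 accounts: {days_needed(100_000)}")
```

Because the table value is an upper bound, the result of `days_needed` should be read as an optimistic estimate when sizing a data collection.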
Common resources for using Botometer
| Resource name | Resource | Note |
|---|---|---|
| Botometer website | Web interface of Botometer: useful for checking a small number of accounts | |
| Botometer Pro API | API of Botometer: useful for checking accounts in bulk programmatically | |
| Botometer-python package | Python package to access Botometer Pro API | |
| Botometer case study | Case study using Botometer with source code | |
| Bot repository | Annotated training datasets for Botometer | |
Numbers of tweets and unique accounts mentioning different cashtags in raw data and analytical sample
| Cashtag | Tweets (raw data) | Unique accounts (raw data) | Tweets (analytical sample) | Unique accounts (analytical sample) |
|---|---|---|---|---|
| $SHIB | 2000 | 1241 | 1819 | 1111 |
| $FLOKI | 2000 | 937 | 1893 | 860 |
| $AAPL | 2000 | 1107 | 1864 | 1006 |
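To make the difference between the raw data and the analytical sample easier to read, the counts in the table can be turned into retention fractions. The sketch below copies the numbers from the table. The dictionary layout and the `retention` helper are illustrative choices, and the code makes no assumption about which filtering steps produced the analytical sample.

```python
# Counts copied from the table: (raw tweets, raw accounts,
# analytical-sample tweets, analytical-sample accounts).
COUNTS = {
    "$SHIB": (2000, 1241, 1819, 1111),
    "$FLOKI": (2000, 937, 1893, 860),
    "$AAPL": (2000, 1107, 1864, 1006),
}


def retention(counts):
    """Return, per cashtag, the fraction of tweets and accounts retained."""
    result = {}
    for cashtag, (raw_t, raw_a, kept_t, kept_a) in counts.items():
        result[cashtag] = {
            "tweets": kept_t / raw_t,
            "accounts": kept_a / raw_a,
        }
    return result


if __name__ == "__main__":
    for cashtag, shares in retention(COUNTS).items():
        print(
            f"{cashtag}: {shares['tweets']:.1%} of tweets, "
            f"{shares['accounts']:.1%} of accounts retained"
        )
```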
Fig. 2 Percentage of accounts using each language in the three datasets combined
Fig. 3 a Bot score distributions for tweets mentioning different cashtags. b Percentage of tweets posted by likely bots using 0.5 as a threshold. c Box plots of the bot scores for tweets mentioning different cashtags. The white lines indicate the median values; the white dots indicate the mean values. d Similar to b but using a bot score threshold of 0.7. Statistical tests are performed for pairs of results in b–d. Significance level is represented by the stars: ***, **, *, NS
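The quantities summarized in panels b–d can be sketched in a few lines: the share of tweets whose author's bot score reaches a threshold (0.5 or 0.7), and the median and mean of the scores. The scores below are made-up values for illustration only, not data from the case study. Treating a score equal to the threshold as "likely bot" is an assumption, and the helper name `percent_likely_bots` is hypothetical. The pairwise statistical tests mentioned in the caption are not reproduced here.

```python
from statistics import mean, median

# Hypothetical per-tweet bot scores on a 0-1 scale (illustrative only).
EXAMPLE_SCORES = [0.1, 0.2, 0.55, 0.8, 0.95, 0.4, 0.65, 0.3]


def percent_likely_bots(scores, threshold):
    """Percentage of scores at or above the threshold.

    Assumption: scores equal to the threshold count as likely bots.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    flagged = sum(1 for score in scores if score >= threshold)
    return 100.0 * flagged / len(scores)


if __name__ == "__main__":
    for threshold in (0.5, 0.7):
        share = percent_likely_bots(EXAMPLE_SCORES, threshold)
        print(f"threshold {threshold}: {share:.1f}% likely bots")
    print(f"median score: {median(EXAMPLE_SCORES):.3f}")
    print(f"mean score: {mean(EXAMPLE_SCORES):.3f}")
```

Reporting results at more than one threshold, as the figure does, shows whether a conclusion depends on the particular cutoff chosen.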
Fig. 4 Screenshot of a bot-like account replying to a tweet containing the keyword “NFT” with a message promoting cryptocurrencies. This account posted the same message in reply to a large number of tweets
Fig. 5 Time series of bot scores of an account from September 2020 to November 2021. The queries were not made regularly, so the time intervals between consecutive data points vary