Emily Chen, Ashok Deb, Emilio Ferrara.
Abstract
Credible evidence-based political discourse is a critical pillar of democracy and is at the core of guaranteeing free and fair elections. The study of online chatter is paramount, especially in the wake of important voting events like the recent November 3, 2020 U.S. Presidential election and the inauguration on January 20, 2021. Limited access to social media data is often the primary obstacle to studying and understanding online political discourse. To mitigate this impediment and empower the Computational Social Science research community, we are publicly releasing a massive-scale, longitudinal dataset of U.S. politics- and election-related tweets. This multilingual dataset encompasses over 1.2 billion tweets and tracks all salient U.S. political trends, actors, and events from 2019 to the time of this writing. It predates and spans the entire period of the Republican and Democratic primaries, with real-time tracking of all presidential contenders on both sides of the aisle. The dataset also focuses on presidential and vice-presidential candidates, the presidential elections and the transition from the Trump administration to the Biden administration. Our dataset release is curated, documented, and will continue to track relevant events. We hope that the academic community, computational journalists, and research practitioners alike will all take advantage of our dataset to study relevant scientific and social issues, including problems like misinformation, information manipulation, conspiracies, and the distortion of online political discourse that has been prevalent in the context of recent election events in the United States. Our dataset is available at: https://github.com/echen102/us-pres-elections-2020.
Keywords: Presidential election; Social media analysis; Twitter
Year: 2021 PMID: 33824934 PMCID: PMC8017518 DOI: 10.1007/s42001-021-00117-9
Source DB: PubMed Journal: J Comput Soc Sci ISSN: 2432-2725
A sample of the mentions and accounts that we actively tracked (v1.12 — January 25, 2021)
| Mentions | Started tracking | Stopped | Restarted |
|---|---|---|---|
| @realDonaldTrump | 5/20/19 | – | – |
| @GovBillWeld | 5/20/19 | – | – |
| @MarkSanford | 5/20/19 | 11/14/19 | 9/25/20 |
| @WalshFreedom | 5/20/19 | – | – |
| @MichaelBennet | 5/20/19 | – | – |
| @JoeBiden | 5/20/19 | – | – |
| @CoryBooker | 5/20/19 | 1/13/20 | 9/25/20 |
| @GovernorBullock | 5/20/19 | 12/2/19 | 9/25/20 |
| @PeteButtigieg | 5/20/19 | – | – |
| @JulianCastro | 5/20/19 | 1/2/20 | 9/25/20 |
| @BilldeBlasio | 5/20/19 | 11/14/19 | 9/25/20 |
| @JohnDelaney | 5/20/19 | – | – |
| @TulsiGabbard | 5/20/19 | – | – |
| @gillbrandny | 5/20/19 | 11/14/19 | 6/20/20 |
| @KamalaHarris | 5/20/19 | 12/3/19 | 6/20/20 |
| @SenKamalaHarris | 5/20/19 | 12/3/19 | 6/20/20 |
| @Hickenlooper | 5/20/19 | 11/14/19 | 9/25/20 |
| @JayInslee | 5/20/19 | 11/14/19 | 9/25/20 |
| @amyklobuchar | 5/20/19 | – | – |
| @SenAmyKlobuchar | 5/20/19 | 3/3/20 | 6/20/20 |
| @WayneMessam | 5/20/19 | 12/2/19 | 9/25/20 |
| @sethmoulton | 5/20/19 | 11/14/19 | 9/25/20 |
| @BetoORourke | 5/20/19 | 11/14/19 | 9/25/20 |
| @TimRyan | 5/20/19 | 11/14/19 | 9/25/20 |
| @BernieSanders | 5/20/19 | – | – |
| @ericswalwell | 5/20/19 | 11/14/19 | 9/25/20 |
| @ewarren | 5/20/19 | – | – |
| @SenWarren | 6/20/20 | – | – |
| @marwilliamson | 5/20/19 | – | – |
| @AndrewYang | 5/20/19 | – | – |
| @JoeSestak | 5/20/19 | 12/2/19 | 9/25/20 |
| @MikeGravel | 5/20/19 | 8/6/19 | 9/25/20 |
| @TomSteyer | 5/20/19 | – | – |
| @DevalPatrick | 5/20/19 | – | – |
| @MikeBloomberg | 5/20/19 | – | – |
| @staceyabrams | 6/20/20 | – | – |
| @SenDuckworth | 6/20/20 | – | – |
| @TammyforIL | 6/20/20 | – | – |
| @KeishaBottoms | 6/20/20 | – | – |
| @RepValDemings | 6/20/20 | – | – |
| @val_demings | 6/20/20 | – | – |
| @AmbassadorRice | 6/20/20 | – | – |
| @GovMLG | 6/20/20 | – | – |
| @Michelle4NM | 6/20/20 | – | – |
| @SenatorBaldwin | 6/20/20 | – | – |
| @tammybaldwin | 6/20/20 | – | – |
| @KarenBassTweets | 6/20/20 | – | – |
| @RepKarenBass | 6/20/20 | – | – |
| @Maggie_Hassan | 6/20/20 | – | – |
| @SenatorHassan | 6/20/20 | – | – |
| @GovRaimondo | 6/20/20 | – | – |
| @GinaRaimondo | 6/20/20 | – | – |
| @GovWhitmer | 6/20/20 | – | – |
| @gretchenwhitmer | 6/20/20 | – | – |
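The start, stop, and restart dates above define the windows during which each account was tracked. The following is a minimal sketch of how one might check whether a handle was being tracked on a given day. It assumes that a row without a stop date stays active from its start date, that tracking is off on the stop date itself, and that a restarted row stays active from its restart date onward. The `TRACKING` dictionary and the `was_tracked` function are illustrative names, not part of the released dataset.

```python
from datetime import date

# Three rows copied from the table above: (started, stopped, restarted).
# None stands for the "–" placeholder.
TRACKING = {
    "@JoeBiden": (date(2019, 5, 20), None, None),
    "@CoryBooker": (date(2019, 5, 20), date(2020, 1, 13), date(2020, 9, 25)),
    "@SenWarren": (date(2020, 6, 20), None, None),
}


def was_tracked(handle: str, day: date) -> bool:
    """Return True if `handle` was tracked on `day` (assumed interval semantics)."""
    started, stopped, restarted = TRACKING[handle]
    if day < started:
        return False
    if stopped is None or day < stopped:
        return True
    # From the stop date on, tracking resumes only at the restart date.
    return restarted is not None and day >= restarted
```

Under these assumptions, `was_tracked("@CoryBooker", date(2020, 3, 1))` falls in the gap between the stop and restart dates, so tweets mentioning that handle alone may be missing from the collection for that period.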
A sample of keywords that we actively tracked in our Twitter collection (v1.12 — January 25, 2021)
| Keywords | Tracked since |
|---|---|
| ballot | 6/20/20 |
| mailin | 6/20/20 |
| mail-in | 6/20/20 |
| mail in | 6/20/20 |
| donaldtrump | 9/12/20 |
| donaldjtrump | 9/12/20 |
| donald j trump | 9/12/20 |
| donald trump | 9/12/20 |
| don trump | 9/12/20 |
| joe biden | 9/12/20 |
| joebiden | 9/12/20 |
| biden | 9/12/20 |
| mike pence | 9/12/20 |
| michael pence | 9/12/20 |
| mikepence | 9/12/20 |
| michaelpence | 9/12/20 |
| kamala harris | 9/12/20 |
| kamala | 9/12/20 |
| kamalaharris | 9/12/20 |
| trump | 9/13/20 |
| PresidentTrump | 9/13/20 |
| MAGA | 9/13/20 |
| trump2020 | 9/13/20 |
| Sleepy Joe | 9/13/20 |
| Sleepyjoe | 9/13/20 |
| HidenBiden | 9/13/20 |
| CreepyJoeBiden | 9/13/20 |
| NeverBiden | 9/13/20 |
| BidenUkraineScandal | 9/13/20 |
| DumpTrump | 9/13/20 |
| NeverTrump | 9/13/20 |
| VoteRed | 9/13/20 |
| VoteBlue | 9/13/20 |
| RussiaHoax | 9/13/20 |
Top 40 hashtags (v1.12 — January 25, 2021)
| Conservative/Trump campaign | Liberal/Biden campaign | Conspiracy | Other |
|---|---|---|---|
| trump2020 | bidenharris2020 | wwg1wga | coronavirus |
| trump | joebiden | stopthesteal | vote |
| kag | biden2020 | qanon | election2020 |
| americafirst | demconvention | dobbs | breaking |
| kag2020 | demdebate | trumpvirus | |
| maga2020 | democrats | foxnews | |
| trump2020landslide | yanggang | fakenews | |
| usa | | | |
| china | | | |
| georgia | | | |
| resist | | | |
| covid | | | |
| gapol | | | |
| fightback | | | |
| walkaway | | | |
| blacklivesmatter | | | |
| debates2020 | | | |
| wethepeople | | | |
Top 10 language breakdown for release v1.12. Languages were automatically tagged by Twitter and returned in a tweet’s metadata
| Language | ISO | # Tweets | Percentage |
|---|---|---|---|
| English | en | 1,111,698,635 | 88.36% |
| Undefined | und | 95,452,866 | 7.59% |
| Spanish | es | 17,387,937 | 1.38% |
| French | fr | 5,703,955 | 0.45% |
| Portuguese | pt | 5,224,164 | 0.42% |
| Japanese | ja | 3,368,223 | 0.27% |
| German | de | 1,743,004 | 0.14% |
| Turkish | tr | 1,700,836 | 0.14% |
| Indonesian | in | 1,680,790 | 0.13% |
| Italian | it | 1,585,394 | 0.13% |
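Because the language tag arrives in each tweet's metadata, a breakdown like the one above can be tallied directly from the tweet objects. The following is a minimal sketch, assuming each tweet is a dictionary with a `lang` key as in Twitter's v1.1 tweet objects. The `language_breakdown` function and the sample tweets are hypothetical.

```python
from collections import Counter


def language_breakdown(tweets):
    """Return (lang, count, percentage) rows sorted by descending count."""
    # Tweets with no tag are grouped under "und" (undefined); this is an assumption.
    counts = Counter(t.get("lang") or "und" for t in tweets)
    total = sum(counts.values())
    return [(lang, n, round(100 * n / total, 2)) for lang, n in counts.most_common()]


# Hypothetical tweets carrying only the field used here.
sample = [{"lang": "en"}, {"lang": "en"}, {"lang": "es"}, {"lang": "und"}]
rows = language_breakdown(sample)
```

Each row pairs the ISO code with its tweet count and its share of all tweets, mirroring the last three columns of the table.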
This table lists each of the 2020 US Presidential candidates’ names, party affiliation and campaign suspension date.
| Candidate name | Party affiliation | Campaign suspended |
|---|---|---|
| Joseph R. Biden Jr. | Democrat | Democratic Nominee |
| Donald J. Trump | Republican | Republican Nominee |
| Bernie Sanders | Democrat | 4/8/20 |
| William F. Weld | Republican | 3/18/20 |
| Tulsi Gabbard | Democrat | 3/19/20 |
| Elizabeth Warren | Democrat | 3/05/20 |
| Michael R. Bloomberg | Democrat | 3/04/20 |
| Amy Klobuchar | Democrat | 3/02/20 |
| Pete Buttigieg | Democrat | 3/01/20 |
| Deval Patrick | Democrat | 2/12/20 |
| Andrew Yang | Democrat | 2/11/20 |
| Michael Bennet | Democrat | 2/11/20 |
| Joe Walsh | Republican | 2/07/20 |
| John Delaney | Democrat | 1/31/20 |
| Cory Booker | Democrat | 1/13/20 |
| Marianne Williamson | Democrat | 1/10/20 |
| Julián Castro | Democrat | 1/02/20 |
| Kamala Harris | Democrat | 12/03/19 |
| Steve Bullock | Democrat | 12/02/19 |
| Joe Sestak | Democrat | 12/01/19 |
| Wayne Messam | Democrat | 11/20/19 |
| Mark Sanford | Republican | 11/12/19 |
| Beto O’Rourke | Democrat | 11/01/19 |
| Tim Ryan | Democrat | 10/24/19 |
| Bill de Blasio | Democrat | 9/20/19 |
| Kirsten Gillibrand | Democrat | 8/28/19 |
| Seth Moulton | Democrat | 8/23/19 |
| Jay Inslee | Democrat | 8/21/19 |
| John Hickenlooper | Democrat | 8/15/19 |
| Eric Swalwell | Democrat | 7/08/19 |
| Richard Ojeda | Democrat | 1/25/19 |
https://www.nytimes.com/interactive/2019/us/politics/2020-presidential-candidates.html
Keywords for each Democratic candidate who had not suspended their campaign by March 2020, and for Republican candidate Trump. We used these keywords to identify whether a candidate was mentioned in a tweet. A single tweet counts towards every candidate it mentions.
| Candidate name | Keywords |
|---|---|
| Donald J. Trump | @realDonaldTrump, realDonaldTrump, Donald Trump, DonaldTrump, Trump |
| Joseph R. Biden Jr. | @JoeBiden, JoeBiden, Joe Biden, Biden |
| Bernie Sanders | @BernieSanders, BernieSanders, Bernie Sanders, Sanders |
| Tulsi Gabbard | @TulsiGabbard, TulsiGabbard, Tulsi Gabbard, Gabbard |
| Elizabeth Warren | @ewarren, @senwarren, ewarren, senwarren, ElizabethWarren, Elizabeth Warren, Warren |
| Michael R. Bloomberg | @MikeBloomberg, MikeBloomberg, Mike Bloomberg, MichaelBloomberg, Michael Bloomberg, Bloomberg |
| Amy Klobuchar | @amyklobuchar, @senamyklobuchar, amyklobuchar, senamyklobuchar, amy klobuchar, klobuchar |
| Pete Buttigieg | @PeteButtigieg, PeteButtigieg, Pete Buttigieg, Buttigieg |
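The keyword lists above can be applied to tweet text to decide which candidates a tweet mentions. The following is a minimal sketch that uses case-insensitive substring matching. That matching rule is an assumption, since the source does not state the exact rule, and `CANDIDATE_KEYWORDS` and `candidates_mentioned` are illustrative names.

```python
# Keyword lists for three candidates, copied from the table above.
CANDIDATE_KEYWORDS = {
    "Donald J. Trump": ["@realDonaldTrump", "realDonaldTrump", "Donald Trump", "DonaldTrump", "Trump"],
    "Joseph R. Biden Jr.": ["@JoeBiden", "JoeBiden", "Joe Biden", "Biden"],
    "Bernie Sanders": ["@BernieSanders", "BernieSanders", "Bernie Sanders", "Sanders"],
}


def candidates_mentioned(text: str) -> set:
    """Return every candidate with at least one keyword in `text`.

    Matching is a case-insensitive substring test (an assumption).
    """
    lowered = text.lower()
    return {
        name
        for name, keywords in CANDIDATE_KEYWORDS.items()
        if any(keyword.lower() in lowered for keyword in keywords)
    }


both = candidates_mentioned("Biden and Sanders debate tonight")
```

Returning a set reflects the note that one tweet can count towards several candidates. Substring matching on a short keyword such as "Warren" or "Sanders" would also catch unrelated people with the same surname, so a stricter rule may be preferable in practice.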
Fig. 1 The above figure shows a time series analysis of tweets that mention keywords related to a Democratic nominee’s campaign from January 2020 through May 8, 2020. Sanders announced the suspension of his presidential campaign on April 8, 2020, so we capture all discourse through a month after Biden was declared the presumptive Democratic Presidential nominee. We measure the percentage of total tweets collected on a particular day that mention the candidate on a rolling 7-day average. The keywords we use for each candidate can be found in Table 6 and descriptions of the noted dates in the table below the time series. We also include the raw volume of all tweets collected on a particular day on a rolling 7-day average above the time series.
Fig. 2 The above figure shows a time series analysis of tweets that mention keywords related to either Trump or Biden from December 2020 through January 2021. We measure the percentage of total tweets collected on a particular day that mention the candidate on a rolling 7-day average. The keywords we use for each candidate can be found in Table 6 and descriptions of the noted dates in the table below the time series. We also include the raw volume of all tweets collected on a particular day on a rolling 7-day average above the time series.
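The smoothing used in the two time series can be sketched as follows. This is one possible reading of the captions, taking the mean of the daily mention percentages over the most recent seven days. The captions do not say whether the mean is taken over daily percentages or over pooled counts, so that choice, the shorter window on early days, and the name `rolling_mention_share` are all assumptions.

```python
from collections import deque


def rolling_mention_share(daily_counts, window=7):
    """Yield (day, rolling mean of the daily mention percentage).

    `daily_counts` is an iterable of (day, mentions, total_tweets) in date order.
    The mean covers up to `window` most recent days, so the first days use a
    shorter window (an assumption about edge handling).
    """
    recent = deque(maxlen=window)
    for day, mentions, total in daily_counts:
        share = 100.0 * mentions / total if total else 0.0
        recent.append(share)
        yield day, sum(recent) / len(recent)


# Hypothetical counts: (day, tweets mentioning the candidate, all tweets collected).
counts = [("2020-03-01", 10, 100), ("2020-03-02", 30, 100), ("2020-03-03", 20, 100)]
smoothed = list(rolling_mention_share(counts, window=2))
```

The example uses a window of 2 only to keep the sample short; the figures use a window of 7.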
Fig. 3 We remove all tweets without an identifiable state, and visualize the intra-state tweet engagement activity within the United States in our dataset. For each retweet or quoted tweet, we visualize the geographic flow from the original poster’s state to the retweeter’s state. The line color corresponds to the location of the original poster.
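The flow counts behind such a map can be sketched as a tally of state pairs. The following is a minimal sketch, assuming each retweet or quoted tweet has already been reduced to a pair of state codes, with `None` where no state could be identified. The source does not describe how states were inferred, and `state_flows` and the sample records are hypothetical.

```python
from collections import Counter


def state_flows(engagements):
    """Count (original poster's state, engaging user's state) pairs.

    `engagements` is an iterable of (origin_state, retweeter_state) tuples, one
    per retweet or quoted tweet; a missing state is represented as None.
    """
    flows = Counter()
    for origin, destination in engagements:
        # Drop any engagement where either side lacks an identifiable state.
        if origin is None or destination is None:
            continue
        flows[(origin, destination)] += 1
    return flows


# Hypothetical records for illustration only.
records = [("CA", "NY"), ("CA", "NY"), ("TX", None), ("NY", "CA")]
flows = state_flows(records)
```

Keying the counter on ordered pairs keeps the direction of each flow, which is what allows a line to be colored by the original poster's state.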