Ari Z Klein, Graciela Gonzalez-Hernandez.
Abstract
Despite the prevalence in the United States of miscarriage [1], stillbirth [2], and infant mortality associated with preterm birth and low birthweight [3], their causes remain largely unknown [4], [5], [6]. To advance the use of social media data as a complementary resource for epidemiology of adverse pregnancy outcomes, we present a data set of 6487 tweets that mention miscarriage, stillbirth, preterm birth or premature labor, low birthweight, neonatal intensive care, or fetal/infant loss in general. These tweets are a subset of 22,912 tweets retrieved by applying hand-written regular expressions to a database containing more than 400 million public tweets posted by more than 100,000 women who have announced their pregnancy on Twitter [7]. Two professional annotators labeled the 6487 tweets in a binary fashion, distinguishing those potentially reporting that the user has personally experienced the outcome ("outcome" tweets) from those that merely mention the outcome ("non-outcome" tweets). Inter-annotator agreement was κ = 0.90 (Cohen's kappa). The tweets annotated as "outcome" include 1318 women reporting miscarriage, 94 stillbirth, 591 preterm birth or premature labor, 171 low birthweight, 453 neonatal intensive care, and 356 fetal/infant loss in general. These "outcome" tweets can be used to explore patient experiences and perceptions of adverse pregnancy outcomes, and can direct researchers to the users' broader timelines (tweets posted by a user over time) for observational studies. Our past work demonstrates the analysis of timelines for selecting a study population [8] and conducting a case-control study [9] of users reporting that their child has a birth defect. For larger-scale studies, the full annotated corpus can be used to train supervised machine learning algorithms to automatically identify additional users reporting adverse pregnancy outcomes on Twitter.
We used the annotated corpus to train feature-engineered and deep learning-based classifiers presented in "A natural language processing pipeline to advance the use of Twitter data for digital epidemiology of adverse pregnancy outcomes" [10].
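The inter-annotator agreement reported in the abstract is Cohen's kappa, which corrects the observed agreement between two annotators for the agreement expected by chance. The following is a minimal sketch of that calculation for binary labels. The function name `cohens_kappa` and the toy label lists are illustrative assumptions and are not taken from the data set or from the authors' code.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels (1 = "outcome", 0 = "non-outcome"); not real annotations.
annotator_1 = [1, 1, 0, 0, 1, 0, 0, 1, 0, 0]
annotator_2 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
kappa = cohens_kappa(annotator_1, annotator_2)
```

In this toy case the annotators agree on 9 of 10 items and the chance agreement is 0.5, so kappa is 0.8.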
Keywords: Data mining; Epidemiology; Machine learning; Natural language processing; Pregnancy; Social media
Year: 2020 PMID: 32944604 PMCID: PMC7481818 DOI: 10.1016/j.dib.2020.106249
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Samples and frequency of “outcome” (+) and “non-outcome” (-) tweets matching 11 query patterns. For ethical considerations, the tweets are slightly modified and usernames and URLs are redacted. The bold text indicates the string matched by the regular expression.
| Pattern | Outcome | Sample tweet | Class | Frequency |
| 1 | Miscarriage | | + | 1318 |
| 1 | Miscarriage | A few months ago | – | 2047 |
| 2 | Fetal/infant loss | 3 years ago today | + | 315 |
| 2 | Fetal/infant loss | | – | 94 |
| 3 | Stillbirth | Stress can be vary dangerous! | + | 83 |
| 3 | Stillbirth | | – | 436 |
| 4 | Preterm birth/labor | @[username] | + | 248 |
| 4 | Preterm birth/labor | | – | 411 |
| 5 | Preterm birth/labor | Really? | + | 175 |
| 5 | Preterm birth/labor | | – | 303 |
| 6 | Neonatal intensive care | @[username] | + | 453 |
| 6 | Neonatal intensive care | People are so judgmental. | – | 138 |
| 7 | Preterm birth/labor | My baby had her owns plans and decided to | + | 174 |
| 7 | Preterm birth/labor | I had a dream that my baby was | – | 139 |
| 8 | Low birthweight | On December 13th we | + | 166 |
| 8 | Low birthweight | last ultrasound before she's | – | 79 |
| 9 | Fetal/infant loss | Thinking about | + | 41 |
| 9 | Fetal/infant loss | | – | 14 |
| 10 | Low birthweight | It turns out I was losing my amnio fluid but I didnt want to waste anyones time. My son was | + | 5 |
| 10 | Low birthweight | My friends little boy was | – | 12 |
| 11 | Stillbirth | @[username] | + | 11 |
| 11 | Stillbirth | | – | 1 |
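The per-pattern frequencies in the table above can be aggregated by outcome to compare them with the totals in the abstract. The sketch below does this with the counts copied from the table; the names `ROWS` and `totals_by_outcome` are illustrative assumptions.

```python
# (pattern number, outcome, "outcome" (+) count, "non-outcome" (-) count),
# copied from the table of query patterns.
ROWS = [
    (1, "Miscarriage", 1318, 2047),
    (2, "Fetal/infant loss", 315, 94),
    (3, "Stillbirth", 83, 436),
    (4, "Preterm birth/labor", 248, 411),
    (5, "Preterm birth/labor", 175, 303),
    (6, "Neonatal intensive care", 453, 138),
    (7, "Preterm birth/labor", 174, 139),
    (8, "Low birthweight", 166, 79),
    (9, "Fetal/infant loss", 41, 14),
    (10, "Low birthweight", 5, 12),
    (11, "Stillbirth", 11, 1),
]


def totals_by_outcome(rows):
    """Sum the (+) and (-) match counts over all patterns of each outcome."""
    totals = {}
    for _pattern, outcome, positive, negative in rows:
        pos, neg = totals.get(outcome, (0, 0))
        totals[outcome] = (pos + positive, neg + negative)
    return totals


totals = totals_by_outcome(ROWS)
# Total number of pattern matches across both classes.
all_matches = sum(pos + neg for pos, neg in totals.values())
```

The summed "outcome" counts agree with the abstract for miscarriage (1318), stillbirth (94), low birthweight (171), neonatal intensive care (453), and fetal/infant loss (356). For preterm birth/labor the three patterns sum to 597 rather than 591, and the matches overall sum to 6663 rather than 6487. This would be consistent with some tweets matching more than one pattern, although that explanation is an assumption and is not stated in the source.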
| Subject | Health informatics |
| Specific subject area | Social media mining for studying adverse pregnancy outcomes |
| Type of data | Text |
| How data were acquired | The raw data were acquired from a database of public tweets |
| Data format | Raw, analyzed |
| Parameters for data collection | Tweets were collected if they mention miscarriage, stillbirth, preterm birth/premature labor, low birthweight, or neonatal intensive care. |
| Description of data collection | Handcrafted regular expressions retrieved 22,912 tweets that mention adverse pregnancy outcomes from a database containing public tweets posted by women who have announced their pregnancy on Twitter |
| Data source location | Various |
| Data accessibility | With the article |
| Related research article | A.Z. Klein, H. Cai, D. Weissenbacher, L.D. Levine, G. Gonzalez-Hernandez, A natural language processing pipeline to advance the use of Twitter data for digital epidemiology of adverse pregnancy outcomes, Journal of Biomedical Informatics: X, Available online 8 August 2020, 100076. |
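The "Description of data collection" entry above says that handcrafted regular expressions were used to retrieve candidate tweets. The sketch below shows what such retrieval might look like with Python's `re` module. The three patterns, the `PATTERNS` dictionary, and the `match_outcomes` function are hypothetical and for illustration only; they are not the authors' 11 query patterns, which are not reproduced in this record.

```python
import re

# Hypothetical patterns for illustration; NOT the authors' actual expressions.
PATTERNS = {
    "miscarriage": re.compile(r"\bmiscarr(?:y|ied|iage|iages)\b", re.IGNORECASE),
    "stillbirth": re.compile(r"\bstill[\s-]?(?:birth|born)\b", re.IGNORECASE),
    "neonatal intensive care": re.compile(r"\bnicu\b", re.IGNORECASE),
}


def match_outcomes(tweet):
    """Return (outcome, matched string) pairs for each pattern found in the tweet."""
    hits = []
    for outcome, pattern in PATTERNS.items():
        match = pattern.search(tweet)
        if match:
            hits.append((outcome, match.group(0)))
    return hits


# Made-up example text, not a tweet from the data set.
hits = match_outcomes("Our son spent two weeks in the NICU after he was born.")
```

A match only shows that a tweet mentions an outcome. As the abstract describes, deciding whether the user personally experienced it still requires manual annotation or a trained classifier.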