Ari Z. Klein, Abeed Sarker, Davy Weissenbacher, Graciela Gonzalez-Hernandez.
Abstract
Social media has recently been used to identify and study a small cohort of Twitter users whose pregnancies with birth defect outcomes (the leading cause of infant mortality) could be observed via their publicly available tweets. In this study, we exploit social media on a larger scale by developing natural language processing (NLP) methods to automatically detect, among thousands of users, a cohort of mothers reporting that their child has a birth defect. We used 22,999 annotated tweets to train and evaluate supervised machine learning algorithms (feature-engineered and deep learning-based classifiers) that automatically distinguish tweets referring to the user's pregnancy outcome from tweets that merely mention birth defects. Because 90% of the tweets merely mention birth defects, we experimented with under-sampling and over-sampling approaches to address this class imbalance. An SVM classifier achieved the best performance for the two positive classes: an F1-score of 0.65 for the "defect" class and 0.51 for the "possible defect" class. We deployed the classifier on 20,457 unlabeled tweets that mention birth defects, which helped identify 542 additional users for potential inclusion in our cohort. Contributions of this study include (1) NLP methods for automatically detecting tweets by users reporting their birth defect outcomes, (2) findings that an SVM classifier can outperform a deep neural network-based classifier for highly imbalanced social media data, (3) evidence that automatic classification can be used to identify additional users for potential inclusion in our cohort, and (4) a publicly available corpus for training and evaluating supervised machine learning algorithms.
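The feature ablation results below suggest that word n-grams carry most of the SVM classifier's signal. As a rough illustration only (not the authors' code; the tokenization and the 1–3-gram range are assumptions), the following sketch extracts word n-gram count features from a tweet:

```python
from collections import Counter

def word_ngrams(text, n_max=3):
    """Extract word 1..n_max-gram count features from a tweet.

    Simplified sketch: lowercases and splits on whitespace; the paper's
    actual preprocessing pipeline is not reproduced here.
    """
    tokens = text.lower().split()
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return dict(feats)

# Toy input; a real pipeline would vectorize these counts for the SVM.
feats = word_ngrams("born with a cleft palate")
```

Each tweet then becomes a sparse vector of these counts, which is the kind of representation a linear SVM handles well even with many rare features.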
Keywords: Data mining; Epidemiology
Year: 2019 PMID: 31583284 PMCID: PMC6773753 DOI: 10.1038/s41746-019-0170-5
Source DB: PubMed Journal: NPJ Digit Med ISSN: 2398-6352
Fig. 1 F1-scores (F) for “defect” (+), “possible defect” (?), and “non-defect” (−) tweet classes for three classifiers trained on the original, imbalanced data set
F1-scores (F) for “defect” (+), “possible defect” (?), and “non-defect” (−) tweet classes for three classifiers trained on the original, under-sampled, and over-sampled data sets
| Classifier | Training set | F (+) | F (?) | F (−) |
|---|---|---|---|---|
| NB | Original, imbalanced training set (14,716) | 0.54 | 0.44 | 0.94 |
| NB | Under-sampling based on similar majority class tweets in original training set (5551)a | 0.46 | 0.38 | 0.92 |
| NB | Under-sampling based on similar false-negative majority class tweets (8015)b | 0.44 | 0.40 | 0.92 |
| NB | Random under-sampling control set (5551)c | 0.50 | 0.43 | 0.93 |
| NB | Random under-sampling control set (8015)c | 0.51 | 0.44 | 0.93 |
| NB | Over-sampling instances of minority classes with replacement (40,675)d | 0.49 | 0.40 | 0.93 |
| NB | SMOTE on original training set (39,148)e | 0.36 | 0.30 | 0.95 |
| SVM | Original, imbalanced training set (14,716) | 0.62 | 0.52 | 0.96 |
| SVM | Under-sampling based on similar majority class tweets in original training set (5551)a | 0.62 | 0.43 | 0.96 |
| SVM | Under-sampling based on similar false-negative majority class tweets (8015)b | 0.58 | 0.51 | 0.95 |
| SVM | Random under-sampling control set (5551)c | 0.62 | 0.49 | 0.96 |
| SVM | Random under-sampling control set (8015)c | 0.62 | 0.50 | 0.96 |
| SVM | Over-sampling instances of minority classes with replacement (40,675)d | 0.62 | 0.46 | 0.95 |
| SVM | SMOTE on original training set (39,148)e | 0.62 | 0.51 | 0.96 |
| LSTM | Original, imbalanced training set (14,716) | 0.60 | 0.35 | 0.96 |
| LSTM | Under-sampling based on similar majority class tweets in original training set (5551)a | 0.55 | 0.33 | 0.91 |
| LSTM | Under-sampling based on similar false-negative majority class tweets (8015)b | 0.48 | 0.36 | 0.90 |
| LSTM | Random under-sampling control set (5551)c | 0.54 | 0.37 | 0.92 |
| LSTM | Random under-sampling control set (8015)c | 0.59 | 0.45 | 0.95 |
| LSTM | Over-sampling instances of minority classes with replacement (40,675)d | 0.55 | 0.45 | 0.95 |
aMethod (1) described in the “Methods” section
bMethod (2) described in the “Methods” section
cMethod (3) described in the “Methods” section
dMethod (4) described in the “Methods” section
eMethod (5) described in the “Methods” section
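The under- and over-sampled training-set sizes above come from rebalancing a roughly 9:1 majority/minority split. As a generic illustration of methods (3) and (4), which are described only by name here (this sketch is an assumption, not the authors' implementation), random under-sampling drops majority-class tweets while random over-sampling duplicates minority-class tweets with replacement:

```python
import random

def random_undersample(majority, target_size, seed=0):
    """Randomly drop majority-class examples down to a target size."""
    rng = random.Random(seed)
    return rng.sample(majority, target_size)

def random_oversample(minority, target_size, seed=0):
    """Randomly duplicate minority-class examples (with replacement)
    until the class reaches the target size."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return minority + extra

# Toy 9:1 imbalance mimicking the paper's class distribution.
majority = [("tweet %d" % i, "-") for i in range(9000)]
minority = [("tweet %d" % i, "+") for i in range(1000)]

under = random_undersample(majority, len(minority))
over = random_oversample(minority, len(majority))
```

SMOTE (method (5)) differs in that it synthesizes new minority-class feature vectors by interpolating between nearest neighbors rather than duplicating existing examples.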
Feature ablation for a support vector machine (SVM) classifier: precision (P), recall (R), and F1-scores (F) for “defect” (+), “possible defect” (?), and “non-defect” (−) tweet classes
| Features | P (+) | R (+) | F (+) | P (?) | R (?) | F (?) | P (−) | R (−) | F (−) |
|---|---|---|---|---|---|---|---|---|---|
| All | 0.62 | 0.68 | 0.65 | 0.58 | 0.45 | 0.51 | 0.96 | 0.96 | 0.96 |
| W/O word n-grams | 0.22 | 0.55 | 0.32 | 0.43 | 0.22 | 0.29 | 0.94 | 0.89 | 0.92 |
| W/O word clusters | 0.67 | 0.58 | 0.62 | 0.58 | 0.46 | 0.52 | 0.95 | 0.97 | 0.96 |
| W/O word/character lengths | 0.62 | 0.68 | 0.65 | 0.57 | 0.45 | 0.51 | 0.96 | 0.96 | 0.96 |
| Word n-grams | 0.67 | 0.58 | 0.62 | 0.58 | 0.46 | 0.52 | 0.95 | 0.97 | 0.96 |
| Word clusters | 0.20 | 0.58 | 0.30 | 0.43 | 0.22 | 0.29 | 0.95 | 0.87 | 0.91 |
Fig. 2 Precision-recall curve for a support vector machine (SVM) classifier for the “defect” tweet class
Fig. 3 Precision-recall curve for a support vector machine (SVM) classifier for the “possible defect” tweet class
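Precision-recall curves like those in Figs. 2 and 3 are traced by sweeping the classifier's decision threshold over its confidence scores for one positive class. A minimal stdlib sketch (the scores and labels below are invented for illustration):

```python
def pr_curve(scores, labels):
    """Compute (precision, recall) points for one positive class by
    sweeping a decision threshold down through the confidence scores."""
    ranked = sorted(zip(scores, labels), reverse=True)
    total_pos = sum(labels)
    tp = fp = 0
    points = []
    for _score, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((tp / (tp + fp), tp / total_pos))
    return points

# Toy example: 4 tweets scored for the "defect" class (1 = defect).
points = pr_curve([0.9, 0.8, 0.4, 0.2], [1, 0, 1, 0])
```

Each point corresponds to classifying everything at or above one threshold as positive; plotting the points gives the curve.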
Confusion matrix for a support vector machine (SVM) classifier for three classes of tweets: “defect” (+), “possible defect” (?), and “non-defect” (−)
| Actual \ Predicted | + | ? | − |
|---|---|---|---|
| + | 163 | 12 | 64 |
| ? | 18 | 109 | 113 |
| − | 81 | 68 | 3974 |
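The per-class precision, recall, and F1-scores reported in the feature ablation table ("All" row) can be recovered directly from this confusion matrix, since each column sum gives the number of predicted instances and each row sum the number of actual instances for a class. A stdlib sketch:

```python
def per_class_metrics(matrix, classes):
    """Derive per-class precision, recall, and F1 from a confusion
    matrix whose rows are actual classes and columns are predictions."""
    metrics = {}
    for i, cls in enumerate(classes):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum
        actual = sum(matrix[i])                    # row sum
        p = tp / predicted
        r = tp / actual
        metrics[cls] = (p, r, 2 * p * r / (p + r))
    return metrics

# Confusion matrix from the table above (rows: actual +, ?, −).
cm = [[163, 12, 64],
      [18, 109, 113],
      [81, 68, 3974]]
m = per_class_metrics(cm, ["+", "?", "-"])
# m["+"] rounds to P = 0.62, R = 0.68, F = 0.65, matching the "All" row.
```

The same computation for the "?" class yields P = 0.58, R = 0.45, F = 0.51, confirming that low recall on "possible defect" tweets drives that class's lower F1-score.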
Samples of false-negative “defect” (+) and “possible defect” (?) tweets, misclassified as “non-defect” (−) by a support vector machine (SVM) classifier trained on the original, imbalanced data set, with word n-grams, word clusters, and word/character lengths as features
| # | Tweet | Actual | Predicted |
|---|---|---|---|
| 1 | [name] was diagnosed with craniosynostosis and had surgery to repair it. | ? | – |
| 2 | He has a cleft palate…..I don’t think he’s special needs. | ? | – |
| 3 | I’m just in love w. this little boy. Hydrocephalus, split cerebellum, 2 vessel cord, dilated kidney, & survived | ? | – |
| 4 | Born @ 25 weeks, hole in heart, no anal area, narrow airway, no vocal chords, floppy voicebox, missing vertebrae. Healed! | ? | – |
| 5 | They couldn’t get a good picture of | ? | – |
| 6 | #DUPCoalition say the decision I was forced to make for my unborn daughter was wrong Trisomy 13, alobar holoprosencephaly,cystic hygroma | + | – |
| 7 | Her face | + | – |
| 8 | Raising a daughter with down syndrome makes me dream of a more inclusive society | + | – |
| 9 | My #clubfoot #cutie | + | – |
| 10 | We were given no options other than termination “We have no resources for you” | + | – |
Fig. 4 Possible sources of error for false-negative (FN) “defect” (+) and “possible defect” (?) tweets
Examples and per-class frequencies of birth defect terms that distinguish the “defect” (+), “possible defect” (?), and “non-defect” (−) tweet classes in the raw annotated data
| Birth defect term | + | ? | − |
|---|---|---|---|
| CHD | 115 | 79 | 914 |
| Club foot | 65 | 35 | 207 |
| Down syndrome | 407 | 467 | 11,157 |
| Dwarfism | 18 | 18 | 381 |
| Gastroschisis | 27 | 23 | 100 |
| Hydrocephalus | 22 | 39 | 256 |
| Microcephaly | 54 | 11 | 597 |
| Trisomy 18 | 82 | 38 | 232 |
Fig. 5 Architecture of the LSTM classifier