Koustav Rudra, Ashish Sharma, Niloy Ganguly, Muhammad Imran.
Abstract
During a new disease outbreak, frustration and uncertainty among affected and vulnerable populations increase. Affected communities look for known symptoms, prevention measures, and treatment strategies. Health organizations, on the other hand, seek situational updates to assess the severity of the outbreak, known affected cases, and other details. The recent emergence of social media platforms such as Twitter provides a convenient and fast way to disseminate information to, and consume information from, a wide audience. Research studies have shown the potential of this online information to address the information needs of concerned authorities during outbreaks, epidemics, and pandemics. In this work, we target three types of end users: (i) the vulnerable population, i.e., people who are not yet affected and are looking for prevention-related information; (ii) the affected population, i.e., people who are affected and are looking for treatment-related information; and (iii) health organizations, such as the WHO, that are interested in gaining situational awareness to make timely decisions. We use Twitter data from two recent outbreaks (Ebola and MERS) to build an automatic classification approach that categorizes tweets into different disease-related categories. Moreover, the classified messages are used to generate different kinds of summaries useful for affected and vulnerable communities as well as health organizations. Results obtained from extensive experimentation show the effectiveness of the proposed approach. © Springer Science+Business Media, LLC, part of Springer Nature 2018.
Keywords: Classification; Epidemic; Health crisis; Summarization; Twitter
Year: 2018 PMID: 32214879 PMCID: PMC7087635 DOI: 10.1007/s10796-018-9844-9
Source DB: PubMed Journal: Inf Syst Front ISSN: 1387-3326 Impact factor: 6.191
Fig. 1 Our proposed framework for classification-summarization of tweets posted during an epidemic
Examples of various types of disease tweets (which contribute information about the epidemic) and non-disease tweets
| Type | Event | Tweet text |
|---|---|---|
| **Disease tweets** | | |
| Symptom | Ebola | Early #ebola symptoms include fever headache body aches cough stomach pain vomiting and diarrhea |
| Symptom | MERS | Middle east respiratory syndrome symptoms include cough fever can lead to pneumonia & kidney failure |
| Prevention | Ebola | Ebola is a deadly disease prevent it today drink / bath with salty warm water |
| Prevention | MERS | #mers prevention tip 3/5—avoid touching your eyes nose and mouth with unwashed hands |
| Disease transmission | Ebola | Airborne cdc now confirms concerns of airborne transmission of ebola |
| Disease transmission | MERS | World health a camel reasons corona virus transmission |
| Treatment | Ebola | Dozens flock to new liberia ebola treatment center new liberia ebola treatment center receives more than 100 |
| Treatment | MERS | cn-old drugs tested to fight new disease mers |
| Death report | Ebola | The largest #ebola outbreak on record has killed 4000 + |
| Death report | MERS | Saudia Arabia reports 102 deaths from mers disease |
| **Non-disease tweets** | | |
| Not relevant | Ebola | lies then he came to attack nigeria with ebola disease what is govt doing about that too |
| Not relevant | MERS | good question unfortunately i have not the answer but something to investigate fomites #mers |
Number of tweets present in different classes
| Event | Symptom | Prevention | Transmission | Treatment | Death report | Non-disease |
|---|---|---|---|---|---|---|
| Ebola | 52 | 69 | 65 | 59 | 51 | 56 |
| MERS | 105 | 70 | 77 | 74 | 68 | 84 |
Lexical features used to classify tweets across different classes
| Feature | Explanation |
|---|---|
| Presence of signs/symptoms | We check whether a concept related to symptoms is present in the tweet; such concepts are expected to be more frequent in symptom-related tweets. The semantic types that indicate such terms are Sign or Symptom (‘sosy’) and Physiologic Function (‘phsf’). |
| Presence of preventive procedures | Concepts related to preventive procedures (‘topp’) are mostly present in prevention-category tweets. |
| Presence of anatomy | Preventive procedures sometimes involve taking care of certain parts of the body. This feature identifies terms related to a body system, substance, junction, body part, organ, or organ component. Concepts such as ‘bdsu’, ‘blor’, and ‘bpoc’ are present in tweets describing anatomical structures. |
| Presence of preventive terms | Terms such as ‘preventive’ and ‘prevention’ indicate tweets containing information about preventive mechanisms. |
| Presence of transmission terms | Terms such as ‘transmission’ and ‘spread’ are mostly present in tweets about disease transmission. |
| Presence of treatment terms | Terms such as ‘treating’ and ‘treatment’ are mostly present in treatment-related tweets. |
| Presence of death terms | Tweets about deaths contain terms such as ‘die’, ‘kill’, and ‘death’. |
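The feature table can be illustrated with a minimal Python sketch. It assumes a hypothetical `CONCEPT_TYPES` lookup standing in for a real concept tagger that maps words to the semantic-type abbreviations listed above (‘sosy’, ‘phsf’, ‘topp’, ‘bdsu’, ‘blor’, ‘bpoc’); the word lists in `TERM_LISTS` are illustrative, not the authors' lists.

```python
import re

# Hypothetical stand-in for a concept tagger: maps a surface word to the
# semantic-type abbreviations used in the feature table.
CONCEPT_TYPES = {
    "fever": {"sosy"}, "cough": {"sosy"}, "vomiting": {"sosy"},
    "vaccination": {"topp"}, "hands": {"bpoc"}, "blood": {"bdsu"},
}

# Illustrative keyword lists for the term-presence features (assumed).
TERM_LISTS = {
    "preventive_terms": {"preventive", "prevention", "prevent"},
    "transmission_terms": {"transmission", "spread", "transmitted"},
    "treatment_terms": {"treating", "treatment", "treat"},
    "death_terms": {"die", "dies", "died", "kill", "killed", "death", "deaths"},
}


def lexical_features(tweet):
    """Return a dict of binary lexical features for one tweet."""
    tokens = re.findall(r"[a-z]+", tweet.lower())
    # Collect every semantic type triggered by any token.
    types = set()
    for tok in tokens:
        types |= CONCEPT_TYPES.get(tok, set())
    feats = {
        "sign_symptom": int(bool(types & {"sosy", "phsf"})),
        "preventive_procedure": int("topp" in types),
        "anatomy": int(bool(types & {"bdsu", "blor", "bpoc"})),
    }
    # Keyword-presence features: 1 if any listed word occurs in the tweet.
    for name, words in TERM_LISTS.items():
        feats[name] = int(any(tok in words for tok in tokens))
    return feats


print(lexical_features("#mers prevention tip: avoid touching your eyes with unwashed hands"))
```

Each tweet becomes a small binary vector that a standard classifier can consume; a real pipeline would replace the dictionary lookup with an actual concept-recognition tool.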
Classification accuracies of tweets using (i) bag-of-words features (BOW) and (ii) the proposed features (PRO). Diagonal entries are for in-domain classification; off-diagonal entries are for cross-domain classification. Values in brackets are standard deviations of the in-domain accuracies
| Train set | Test: Ebola (BOW) | Test: Ebola (PRO) | Test: MERS (BOW) | Test: MERS (PRO) |
|---|---|---|---|---|
| Ebola | | | 65.69% | |
| MERS | 66.19% | | | |
In-domain classification results are shown in italics. For each train-test pair, the accuracy of the better-performing system is boldfaced
Recall (F-score) of tweet classification using (i) bag-of-words features (BOW) and (ii) the proposed features (PRO)
| Train set | Test: Ebola (BOW) | Test: Ebola (PRO) | Test: MERS (BOW) | Test: MERS (PRO) |
|---|---|---|---|---|
| Ebola | | | 0.65(0.66) | 0.76(0.76) |
| MERS | 0.66(0.65) | 0.75(0.75) | | |
In-domain classification results are shown in italics. For each train-test pair, the result of the better-performing system is boldfaced
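Recall and F-score values of this kind can be computed from gold and predicted class labels. Whether the authors macro-averaged over classes is not stated here, so macro averaging is an assumption, and `macro_recall_f1` is a hypothetical helper.

```python
from collections import Counter


def macro_recall_f1(gold, pred):
    """Macro-averaged recall and F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[g] += 1  # true class g was missed
    recalls, f1s = [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        recalls.append(rec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(recalls) / len(labels), sum(f1s) / len(labels)


gold = ["Prevention", "Prevention", "Symptom", "Symptom"]
pred = ["Prevention", "Symptom", "Symptom", "Symptom"]
print(macro_recall_f1(gold, pred))  # recall 0.75, F1 about 0.733
```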
Examples of misclassified tweets
| Tweet | True class | Predicted class |
|---|---|---|
| Worried about the #mers #virus here are 10 ways to boost your body’s immune system to fight disease #health | Prevention | Not relevant |
| The truth is that #coronavirus #mers can transmit between humans we think not as well as flu but protect yourself anyway wash hands 24/7 | Prevention | Disease transmission |
| From on mers-cov wash your hands cover your coughs and sneezes and stay home if you are sick | Prevention | Symptom |
| Learn more about #mers the virus that causes it how it spreads symptoms prevention tips & what cdc is doing | Symptom | Prevention |
| Wash your hands folks and keep your areas clean mers-middle east respiratory syndrome 1/3 of the people who get this dies | Prevention | Death reports |
| #mers is not as contagious as the flu says #infectiousdisease expert via | Disease transmission | Not relevant |
Sample tweets posted during the outbreak mentioning symptoms in positive and negative contexts
| Context | Tweet |
|---|---|
| Positive | #Ebola symptoms: fever, headache, muscle aches, weakness, no appetite, stomach pain, vomiting, diarrhea & bleeding |
| Positive | RT @NTANewsNow: Ebola symptoms starts as malaria or cold then vomiting, weakness, Joint & Muscle Ache, Stomach pain and Lack of Appetite |
| Negative | #Ebola symptoms are different than upper respiratory tract pathogens, no cough, nasal congestion Dr. Wilson |
| Negative | I’ve been informed that coughing is not a symptom of Ebola |
Sample tweets posted during the outbreak containing information about transmission mediums in positive and negative contexts
| Context | Tweet |
|---|---|
| Positive | @USER @USER @USER I’ve also read that Ebola can spread thru airborne transmission [url] |
| Positive | #Ebola virus could be transmitted via infectious aerosol particles |
| Negative | Idiots & liars! @USER WH briefing: “Ebola is not like the flu. #Ebola is |
| Negative | RT @USER: CDc: You must have personal contact to contract #Ebola. It is |
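The positive/negative contrast in these samples can be approximated with a crude NegEx-style check: a mention counts as negated if a negation cue appears within a few tokens on either side of it. This is a swapped-in illustration, not the authors' method; the cue list, the 3-token window, and prefix matching of terms (so "cough" matches "coughing") are all assumptions.

```python
import re

# Illustrative negation cues (assumed; NegEx-style systems use curated trigger lists).
NEGATION_CUES = {"no", "not", "never", "without", "isn't", "aren't", "doesn't", "don't"}


def mention_polarity(tweet, term, window=3):
    """Return 'negative' if any mention of `term` has a negation cue within
    `window` tokens on either side, 'positive' if it is mentioned without
    one, and None if the term does not occur."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    term = term.lower()
    found = False
    for i, tok in enumerate(tokens):
        if tok.startswith(term):
            found = True
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            if any(t in NEGATION_CUES for t in context):
                return "negative"
    return "positive" if found else None


print(mention_polarity("I've been informed that coughing is not a symptom of Ebola", "cough"))
```

The fixed window is a trade-off: in the first symptom tweet, "no appetite" would mark "appetite" as negated even though lack of appetite is itself a symptom, which is the kind of error such a heuristic makes.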
Notations used in the summarization technique
| Notation | Meaning |
|---|---|
| | Desired summary length (number of words) |
| | Number of tweets considered for summarization (in the time window specified by the user) |
| | Number of distinct content words included in the tweets |
| | Index for tweets |
| | Index for preventive terms |
| | Indicator variable for a tweet (1 if the tweet is included in the summary, 0 otherwise) |
| | Indicator variable for a preventive term |
| | Number of words present in a tweet |
| Score(…) | cf score of a preventive term |
| | Set of tweets where a content word is present |
| | Set of preventive terms present in a tweet |
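One way these notations could fit together is an integer linear program of the COWTS form: maximize the number of selected tweets plus the total Score of the selected preventive terms, subject to (a) the total word count of selected tweets not exceeding the desired summary length, (b) a term being selected only if some selected tweet contains it, and (c) every preventive term of a selected tweet being selected. This objective is an assumption about how the notation is used, not a formula given in this section. Because scores are non-negative, an optimal solution selects exactly the terms covered by the chosen tweets, so tiny inputs can be solved by exhaustive search, as in this hypothetical sketch; a real system would hand the program to an ILP solver.

```python
from itertools import combinations


def ilp_summary(tweets, scores, max_words):
    """Exhaustively solve the assumed ILP for small inputs.

    tweets    -- list of (text, set_of_preventive_terms)
    scores    -- dict mapping term -> non-negative score (hypothetical weights)
    max_words -- desired summary length in words
    """
    lengths = [len(text.split()) for text, _ in tweets]
    best_value, best_subset = 0.0, ()
    n = len(tweets)
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            # Length constraint: total words of selected tweets <= max_words.
            if sum(lengths[i] for i in subset) > max_words:
                continue
            # With non-negative scores the optimal term indicators are exactly
            # the terms covered by the selected tweets.
            covered = set().union(*(tweets[i][1] for i in subset))
            value = len(subset) + sum(scores.get(t, 0.0) for t in covered)
            if value > best_value:
                best_value, best_subset = value, subset
    return best_value, [tweets[i][0] for i in best_subset]


tweets = [
    ("wash your hands often", {"wash", "hands"}),
    ("avoid touching eyes nose mouth", {"avoid", "eyes"}),
    ("wash hands and avoid eyes", {"wash", "hands", "avoid", "eyes"}),
]
scores = {"wash": 2.0, "hands": 1.0, "avoid": 1.5, "eyes": 0.5}
print(ilp_summary(tweets, scores, max_words=5))
```

Exhaustive search is exponential in the number of tweets, which is why it is only suitable for checking a formulation on toy data.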
Sample tweets posted during the outbreak containing information about people who were killed or died
| Tweet |
|---|
| As of Oct. 15th 2014 CDC numbers for #Ebola are 8997 total cases, 5006 laboratory-confirmed cases, and 4493 deaths in total |
| RT @USER: New WHO numbers on #Ebola outbreak in 3 West African countries: 1440 ill including 826 deaths. (As of 7/30) |
| #Ebola has infected almost 10,000 people this year, mostly in Sierra Leone, Guinea and Liberia, killing about 4900 |
| RT @USER: #Ebola: As of 4 Aug 2014, countries have reported 1711 cases (1070 conf, 436 probable, 205 susp), incl 932 deaths |
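The death counts in these tweets follow a few recurring surface patterns ("N deaths", "killing about N", "has killed N"). A minimal regex sketch of extracting them; the patterns are assumptions tuned to the samples shown here, not the authors' extraction rules.

```python
import re

# Assumed patterns: "<number> deaths" and "killed/killing [about|almost|more than] <number>".
DEATH_PATTERNS = [
    re.compile(r"(\d[\d,]*)\s*\+?\s*deaths", re.IGNORECASE),
    re.compile(r"(?:killed|killing)\s+(?:about\s+|almost\s+|more than\s+)?(\d[\d,]*)",
               re.IGNORECASE),
]


def extract_death_count(tweet):
    """Return the first death count matched by the patterns, or None."""
    for pattern in DEATH_PATTERNS:
        m = pattern.search(tweet)
        if m:
            # Drop thousands separators (and any trailing comma) before parsing.
            return int(m.group(1).replace(",", ""))
    return None


print(extract_death_count("#Ebola has infected almost 10,000 people this year, "
                          "mostly in Sierra Leone, Guinea and Liberia, killing about 4900"))
```

Tweets that report several figures (total, confirmed, probable, suspected, deaths) are why the patterns anchor on the death keyword instead of taking the first number in the text.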
Sample tweets posted during the outbreak containing recovery information
| Tweet |
|---|
| Fujifilm Drug Eyed As Possible Treatment For Ebola Virus |
| @USER Guarded optimism - use of #HIV antiviral to treat #ebola. |
| FDA-approved genital warts drug could treat #MERS |
| RT @USER: DNA vaccine demonstrates potential to prevent and treat deadly MERS coronavirus: Inovio Pharmaceuticals |
Precision and recall of our symptom identification method
| Disease | Precision | Recall |
|---|---|---|
| Ebola | 0.80 | 0.65 |
| MERS | 0.80 | 0.60 |
Precision and recall of our transmission mediums detection method
| Disease | #Mediums | Precision | Recall |
|---|---|---|---|
| Ebola | 10 | 0.70 | 0.53 |
| Ebola | 20 | 0.65 | 0.92 |
| MERS | 10 | 0.50 | 0.42 |
| MERS | 20 | 0.40 | 0.67 |
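The precision and recall in the two tables above compare detected items (symptoms or transmission mediums) against a gold list. Assuming the #Mediums column is the number of top-ranked mediums kept (an interpretation, not stated here), a hypothetical helper computing both values at a cutoff k could look like this; the medium names below are toy data.

```python
def precision_recall_at_k(ranked, gold, k):
    """Precision and recall of the top-k ranked items against a gold set."""
    top = ranked[:k]
    hits = sum(1 for item in top if item in gold)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


ranked = ["air", "blood", "sweat", "water", "saliva"]  # toy detector output
gold = {"blood", "sweat", "saliva", "semen"}           # toy gold mediums
print(precision_recall_at_k(ranked, gold, 2))  # (0.5, 0.25)
print(precision_recall_at_k(ranked, gold, 5))  # (0.6, 0.75)
```

Raising k typically trades precision for recall, which is the pattern visible when moving from 10 to 20 mediums.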
Comparison of ROUGE-1 recall and F-scores for MEDSUM (the proposed methodology) and the baseline method COWTS for the prevention class (Twitter-specific tags, emoticons, hashtags, mentions, and URLs removed; standard ROUGE stemming (-m) and stopword-removal (-s) options used)
| Event | MEDSUM Recall | MEDSUM F-score | COWTS Recall | COWTS F-score |
|---|---|---|---|---|
| Ebola | | | 0.4575 | 0.5109 |
| MERS | | | 0.4761 | 0.4811 |
For each evaluation metric, the result of the better-performing system is boldfaced
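ROUGE-1 recall and F-score count clipped unigram overlap between a system summary and a reference summary. The sketch below mirrors the caption's preprocessing (hashtags, mentions, and URLs removed; stopwords dropped) but uses a tiny illustrative stopword list and omits the Porter stemming that the official ROUGE `-m` option applies, so its scores will not match the toolkit exactly.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumed; ROUGE ships its own list for -s).
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are", "your"}


def preprocess(text):
    """Lowercase, drop URLs/hashtags/mentions, keep alphanumeric non-stopword tokens."""
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text.lower())
    return [t for t in re.findall(r"[a-z0-9]+", text) if t not in STOPWORDS]


def rouge1(system, reference):
    """Return (recall, F-score) of unigram overlap, F as the harmonic mean."""
    sys_counts = Counter(preprocess(system))
    ref_counts = Counter(preprocess(reference))
    overlap = sum((sys_counts & ref_counts).values())  # clipped counts
    recall = overlap / sum(ref_counts.values()) if ref_counts else 0.0
    precision = overlap / sum(sys_counts.values()) if sys_counts else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, f


print(rouge1("wash your hands and avoid touching eyes #mers",
             "wash hands often and avoid touching your eyes nose mouth"))
```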
Comparison of ROUGE-1 recall and F-scores for MEDSUM (the proposed methodology) and the baseline method COWTS for death reports (Twitter-specific tags, emoticons, hashtags, mentions, and URLs removed; standard ROUGE stemming (-m) and stopword-removal (-s) options used)
| Event | MEDSUM Recall | MEDSUM F-score | COWTS Recall | COWTS F-score |
|---|---|---|---|---|
| Ebola | | | 0.4961 | 0.4942 |
| MERS | | | 0.3448 | 0.3322 |
For each evaluation metric, the result of the better-performing system is boldfaced
Comparison of ROUGE-1 recall and F-scores for MEDSUM (the proposed methodology) and the baseline method COWTS for the treatment class (Twitter-specific tags, emoticons, hashtags, mentions, and URLs removed; standard ROUGE stemming (-m) and stopword-removal (-s) options used)
| Event | MEDSUM Recall | MEDSUM F-score | COWTS Recall | COWTS F-score |
|---|---|---|---|---|
| Ebola | | | 0.3858 | 0.3525 |
| MERS | | | 0.4642 | 0.4244 |
For each evaluation metric, the result of the better-performing system is boldfaced
Percentage of class-specific terms covered and missed in the symptom, prevention, and treatment classes for Ebola and MERS
| Event | Symptom covered (%) | Symptom missed (%) | Prevention covered (%) | Prevention missed (%) | Treatment covered (%) | Treatment missed (%) |
|---|---|---|---|---|---|---|
| Ebola | 82.35 | 17.65 | 71.43 | 28.57 | 65 | 35 |
| MERS | 94.44 | 5.56 | 91.67 | 8.33 | 78.57 | 21.43 |
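The covered and missed percentages are complementary: each pair sums to 100. A minimal sketch of the computation, assuming single-word class-specific terms and a hypothetical gold term list (the summary text below is toy data):

```python
import re


def term_coverage(summary, class_terms):
    """Percentage of class-specific terms present in / absent from a summary."""
    terms = {t.lower() for t in class_terms}
    if not terms:
        raise ValueError("class_terms must be non-empty")
    words = set(re.findall(r"[a-z]+", summary.lower()))
    pct_covered = 100.0 * len(terms & words) / len(terms)
    return round(pct_covered, 2), round(100.0 - pct_covered, 2)


summary = "Fever, headache and vomiting reported; watch for diarrhea"
gold_terms = ["fever", "headache", "vomiting", "diarrhea", "bleeding", "rash"]
print(term_coverage(summary, gold_terms))  # (66.67, 33.33)
```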
Fig. 2 Variation of running time with the number of tweets