Aldo Hernandez-Suarez, Gabriel Sanchez-Perez, Karina Toscano-Medina, Hector Perez-Meana, Jose Portillo-Portillo, Victor Sanchez, Luis Javier García Villalba.
Abstract
In recent years, Online Social Networks (OSNs) have received a great deal of attention for their potential use in the spatial and temporal modeling of events, owing to the information that can be extracted from these platforms. Within this context, one of the most promising applications is the monitoring of natural disasters. Vital information posted by OSN users can contribute to relief efforts during and after a catastrophe. Although it is possible to retrieve data from OSNs using embedded geographic information provided by GPS, this feature is disabled by default in most cases. An alternative solution is to geoparse specific locations using language models based on Named Entity Recognition (NER) techniques. In this work, a sensor that uses Twitter is proposed to monitor natural disasters. The approach senses data by detecting toponyms (named places written within the text) in tweets with event-related information, e.g., a collapsed building on a specific avenue or the location at which a person was last seen. Tokenized tweets are first transformed into word embeddings: a rich linguistic and contextual vector representation of textual corpora. Pre-labeled word embeddings are then used to train a Recurrent Neural Network variant, known as a Bidirectional Long Short-Term Memory (biLSTM) network, that handles sequential data by analyzing the context on both sides of a word (past and future entries). Moreover, a Conditional Random Field (CRF) output layer, which scores the transitions from one NER tag to the next so that the most likely tag sequence is selected, is used to increase the classification accuracy. The resulting labeled words are joined to coherently form a toponym, which is geocoded and scored by a Kernel Density Estimation function. At the end of the process, the scored data are presented graphically to depict areas in which the majority of tweets reporting topics related to a natural disaster are concentrated.
A case study on Mexico's 2017 earthquake is presented, and the data extracted during and after the event are reported.
Keywords: CRF; LSTM; data mining; geocoding; geoparsing; twitter; word2vec
Year: 2019 PMID: 30979067 PMCID: PMC6484392 DOI: 10.3390/s19071746
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. An earthquake survivor uses the WhatsApp messaging system to describe their situation inside a collapsed building. The messages, translated to English, are: "My love. The roof fell. We are trapped. My love I love you. I love you so much. We are on the 4th floor. Near the emergency staircase. There are 4 of us. My love are you ok?" As a result of these messages, rescue teams were able to save the individuals trapped in the rubble [6].
Figure 2. A tweet providing the location (spatial information) of a collapsed building, along with a timestamp (temporal information), one day after the 2017 earthquake in Mexico City. The message, translated to English, is: "Mexico. Preliminary damage report #Earthquake in #CdMx Zapata and Peten and Division del Norte collapsed building…" It is worth noting that some users mention places using hashtags; in this example, the hashtag #CdMx refers to Mexico City.
Related work that contributes to natural disaster sensing using data extracted from Twitter and other OSNs.
| Title | Description |
|---|---|
| Earthquake Shakes Twitter Users: Real-time Event Detection by Social Sensors. | To detect a target event, this work classifies tweets on the basis of features such as keywords and the number of words given a context. Then, the methodology estimates a probabilistic spatiotemporal model to find the center and the trajectory of the target event. To this end, each Twitter user is assumed to be a sensor. Then, Kalman particle filtering is applied for location estimation with ubiquitous/pervasive computing. The authors claim that a 96% probability of correctly detecting an earthquake can be achieved by monitoring textual features [ |
| Public health implications of social media use during natural disasters, environmental disasters, and other environmental concerns. | This work analyzes how social media can be used to disseminate information, predict data, and provide early warnings within the context of environmental awareness and health promotion. The work also analyzes how social media can be used as an indicator of public participation in environmental issues. The authors found evidence supporting social media as a useful surveillance tool during natural disasters, environmental disasters, and other environmental concerns. The work shows that public health officials can use social media to gain insight into public opinions and perceptions. Moreover, the work shows that social media allows public health workers and emergency responders to act more quickly and efficiently during crises [ |
| Real-Time Crisis Mapping of Natural Disasters Using Social Media. | In this work, the authors propose a social media crisis mapping platform for natural disasters that uses statistical analysis with geoparsed real-time tweet data streams matched to locations from gazetteers, street maps, and volunteered geographic information. Geoparsing results are benchmarked against existing published work and evaluated across multilingual datasets. Two case studies are presented to compare five-day tweet crisis maps compiled from verified satellite and aerial imagery sources for official post-event impact assessment by the US National Geospatial Agency [ |
| Tweedr: Mining Twitter to Inform Disaster Response. | In this paper, the authors introduce Tweedr, a Twitter-mining tool that extracts actionable information for disaster relief workers during natural disasters. The Tweedr pipeline consists of three main parts: classification, clustering, and extraction. In the classification phase, they use classification methods, namely, Latent Dirichlet Allocation (LDA), Support Vector Machines (SVM), and Logistic Regression, to identify tweets reporting damage or casualties. In the clustering phase, they use filters to merge tweets that are similar. Finally, in the extraction phase, they extract tokens and phrases that report specific information about different classes of infrastructure damage, the types of damage, and casualties [ |
| A Linguistically-driven Approach to Cross-event Damage Assessment of Natural Disasters from Social Media Messages. | In this work, the authors focus on the analysis of Italian social media messages for disaster management. Their aim is to detect those messages conveying critical information for the damage assessment task. The main novelty of this study is the focus on out-of-domain and cross-event damage detection and the investigation of the most relevant tweet-derived features for these tasks. They conducted different experiments by resorting to a wide set of linguistic features to qualify the lexical and grammatical structure of a text, as well as ad-hoc features specifically extracted for this task [ |
| Combining Machine Learning Topic Models and Spatio-temporal Analysis of Social Media data for Disaster Footprint and Damage Assessment. | The authors propose a crisis mapping system by analyzing the textual content of disaster reports from a twofold perspective. A damage detection component employs an SVM classifier to detect mentions of damage among emergency reports. A novel geoparsing technique is proposed and used to perform message geolocation. They report a case study to show how the information extracted through damage detection and message geolocation can be combined to produce accurate crisis maps. The crisis maps detect both highly and lightly damaged areas, thus opening up the possibility to prioritize rescue efforts where they are most needed [ |
| From Social Sensor Data to Collective Human Behaviour Patterns: Analysing and Visualising Spatio-temporal Dynamics in Urban Environments. | This paper presents an approach to analyzing social media posts to assess the footprint of and the damage caused by natural disasters by combining machine learning techniques (LDA) for semantic information extraction with spatial and temporal analysis (local spatial autocorrelation) for hotspot detection. The results demonstrate that earthquake footprints can be reliably and accurately identified. The results also show that a number of relevant semantic topics can be automatically identified without a priori knowledge, revealing clearly differing temporal and spatial signatures. Furthermore, a damage map that indicates where significant losses have occurred is also presented [ |
| The Performance of Publicness in Social Media: tracing patterns in tweets after a disaster | The authors propose a computer-assisted discourse analysis—specifically, a corpus-linguistic-informed analysis of half a million tweets—in order to describe four main public discursive moves that were prevalent during the earthquake in Aotearoa, New Zealand, in 2011. The final results describe how people employ their social media communication at critical, reflexive moments, such as in the aftermath of disaster [ |
| Spatio-Temporal Distribution of Negative Emotions in New York City After a Natural Disaster as Seen in Social Media | In this paper, the authors propose a sentiment analysis technique termed |
Disadvantages of algorithms employed for NER classification
| Algorithm | Disadvantages for NER Tasks |
|---|---|
| Decision Trees (DT) and Random Forests (RF) | In [ |
| Naive Bayes (NB) | The authors of [ |
| Support Vector Machines (SVM) | SVM-based applications have been widely used for NER tasks [ |
| Single Conditional Random Fields (CRF) | CRF is one of the top-ranked discriminative algorithms used for NER, as studied in ref. [
Figure 3. Proposed Twitter-based social sensor for natural disasters.
Entity tags used for classification.
| Named Entity Tag Type | Description |
|---|---|
| LOC | Location representation, e.g., a street, avenue, region, or country |
| ORG | Reference to an organization, institution, or establishment |
| PER | Reference to a person or a group of people |
| O | Any other token (not one of the entity types above) |
Named entity tags used in the training set.
| CoNLL-2002 Tag | Generic Tag |
|---|---|
| I-LOC, B-LOC | LOC |
| I-ORG, B-ORG | ORG |
| I-PER, B-PER | PER |
| O, I-MISC, B-MISC | O |
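The mapping in the table above can be sketched as a simple lookup. This is a minimal illustration rather than the authors' code; the names `CONLL_TO_GENERIC` and `to_generic` are hypothetical.

```python
# Lookup table mirroring the CoNLL-2002 -> generic tag mapping shown above.
# The dictionary and function names are illustrative, not from the paper.
CONLL_TO_GENERIC = {
    "B-LOC": "LOC", "I-LOC": "LOC",
    "B-ORG": "ORG", "I-ORG": "ORG",
    "B-PER": "PER", "I-PER": "PER",
    "B-MISC": "O", "I-MISC": "O",
    "O": "O",
}


def to_generic(tags):
    """Collapse a sequence of CoNLL-2002 tags into generic tags."""
    return [CONLL_TO_GENERIC[tag] for tag in tags]


example = ["O", "B-LOC", "I-LOC", "B-MISC", "B-ORG"]
print(to_generic(example))
```

Dropping the B-/I- prefixes loses the boundary between two adjacent entities of the same type, which is acceptable only if consecutive same-type tokens are later treated as one entity.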
Examples of tweets with their corresponding named entity tags.
| Tweet in Spanish | English Translation |
|---|---|
| | help people trapped in a building located in Alvaro Obregon |
| | a collapsed department building on tlalpan avenue |
| | Taxqueña’s Soriana has fallen down |
Figure 4. A biLSTM network for NER tasks. English translation: "Taxqueña’s Soriana has fallen down."
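To illustrate how a CRF output layer can change the tags that per-token scores alone would suggest, the sketch below decodes a tag sequence with the Viterbi algorithm, the standard decoder for linear-chain CRFs. The source does not state which decoding procedure was used, so this is an assumption. The emission scores stand in for biLSTM outputs and the transition scores for learned CRF weights; all numbers are made up for illustration.

```python
def viterbi(emissions, transitions, tags):
    """Return the highest-scoring tag sequence.

    emissions:   list of dicts, one per token, mapping tag -> score
                 (hypothetical stand-in for biLSTM outputs)
    transitions: dict mapping (previous_tag, tag) -> score
                 (hypothetical stand-in for learned CRF weights)
    """
    # best[t][tag] = best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    back = []  # back[t - 1][tag] = previous tag on that best path
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for tag in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[(p, tag)])
            scores[tag] = best[-1][prev] + transitions[(prev, tag)] + emissions[t][tag]
            pointers[tag] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


tags = ["LOC", "O"]
emissions = [
    {"LOC": 0.2, "O": 1.0},
    {"LOC": 1.5, "O": 0.1},
    {"LOC": 0.6, "O": 0.7},
]
transitions = {
    ("LOC", "LOC"): 0.5, ("LOC", "O"): -0.2,
    ("O", "LOC"): 0.0, ("O", "O"): 0.3,
}
print(viterbi(emissions, transitions, tags))
```

With these numbers, picking the best tag per token independently would label the third token O, whereas the transition score for LOC followed by LOC tips the decoded sequence to O, LOC, LOC.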
Figure 5. Toponym geocoding.
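Before a toponym can be geocoded, the individually labeled words have to be joined into one string. The following is a minimal sketch of that joining step, assuming consecutive LOC-tagged tokens belong to the same place name. The function name `join_toponyms` is hypothetical, the tokens come from the English translation of one example tweet, and the tags are illustrative. The geocoding call itself is omitted because the source does not name the service used.

```python
def join_toponyms(tokens, tags, target="LOC"):
    """Group consecutive tokens tagged `target` into toponym strings."""
    toponyms, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == target:
            current.append(token)
        elif current:
            # A non-target tag closes the toponym being built.
            toponyms.append(" ".join(current))
            current = []
    if current:  # flush a toponym that runs to the end of the tweet
        toponyms.append(" ".join(current))
    return toponyms


tokens = ["collapsed", "building", "on", "tlalpan", "avenue"]
tags = ["O", "O", "O", "LOC", "LOC"]
print(join_toponyms(tokens, tags))
```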
Figure 6. The first report occurs at 1:46 p.m., almost half an hour after the earthquake. The localized entity corresponds to the street Av. Álvaro Obregón, number 286, with geographic coordinates 19.4162205, −99.1705947. The other classified entities are similar and ordered temporally until the last report at 4:22 p.m. on the third observation day. (a) Users first report that a person is trapped in a collapsed building; (b) a day later, users continue reporting that a person is in the rubble, and information is already disseminated in a retweet; (c) on the third day, the victim is reported as rescued.
Recent works used to compare the proposed sensor.
| Titles | Natural Disaster | Dataset | Algorithms Employed | Algorithm with Overall Best Performance Metric Reported | Year |
|---|---|---|---|---|---|
| Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of Crisis-related [ | Napa California Earthquake, USA | Publicly available on | NB, SVM, and RF with Word Embeddings | NB with 82% accuracy | 2016 |
| A linguistically-driven approach to cross-event damage assessment of natural disasters from social media messages [ | L’Aquila and Emilia earthquakes from 2009 to 2014, Italy | Publicly Available on | SVM + Word Embeddings, SVM and NLP + POS tags | SVM + Word Embeddings with 88% F1-score | 2018 |
Results comparison.
| Dataset | Classifier | Named Entity Tag | Precision | Recall | F-1 Score |
|---|---|---|---|---|---|
| 19 September 2017 Mexico Earthquake | biLSTM-CRF | LOC | 0.83 | 0.76 | 0.80 |
| 19 September 2017 Mexico Earthquake | biLSTM-CRF | ORG | 0.83 | 0.86 | 0.85 |
| 2009–2014 L’Aquila and Emilia earthquakes, Italy | biLSTM-CRF | LOC | 0.84 | 0.84 | 0.84 |
| 2009–2014 L’Aquila and Emilia earthquakes, Italy | biLSTM-CRF | ORG | 0.79 | 0.69 | 0.74 |
| 2014 Napa California Earthquake, USA | biLSTM-CRF | LOC | 0.93 | 0.90 | 0.92 |
| 2014 Napa California Earthquake, USA | biLSTM-CRF | ORG | 0.88 | 0.87 | 0.87 |
| Average | biLSTM-CRF | | 0.85 | 0.82 | 0.84 |
| 19 September 2017 Mexico Earthquake | RF | LOC | 0.89 | 0.19 | 0.31 |
| 19 September 2017 Mexico Earthquake | RF | ORG | 0.89 | 0.18 | 0.30 |
| 2009–2014 L’Aquila and Emilia earthquakes, Italy | RF | LOC | 0.74 | 0.60 | 0.66 |
| 2009–2014 L’Aquila and Emilia earthquakes, Italy | RF | ORG | 0.76 | 0.29 | 0.42 |
| 2014 Napa California Earthquake, USA | RF | LOC | 0.60 | 0.25 | 0.35 |
| 2014 Napa California Earthquake, USA | RF | ORG | 0.75 | 0.26 | 0.39 |
| Average | RF | | 0.77 | 0.30 | 0.40 |
| 19 September 2017 Mexico Earthquake | SVM | LOC | 0.76 | 0.48 | 0.59 |
| 19 September 2017 Mexico Earthquake | SVM | ORG | 0.73 | 0.78 | 0.64 |
| 2009–2014 L’Aquila and Emilia earthquakes, Italy | SVM | LOC | 0.75 | 0.57 | 0.65 |
| 2009–2014 L’Aquila and Emilia earthquakes, Italy | SVM | ORG | 0.82 | 0.25 | 0.38 |
| 2014 Napa California Earthquake, USA | SVM | LOC | 0.63 | 0.44 | 0.52 |
| 2014 Napa California Earthquake, USA | SVM | ORG | 0.82 | 0.24 | 0.37 |
| Average | SVM | | 0.75 | 0.45 | 0.53 |
| 19 September 2017 Mexico Earthquake | NB | LOC | 0.88 | 0.19 | 0.31 |
| 19 September 2017 Mexico Earthquake | NB | ORG | 0.86 | 0.18 | 0.30 |
| 2009–2014 L’Aquila and Emilia earthquakes, Italy | NB | LOC | 0.79 | 0.46 | 0.58 |
| 2009–2014 L’Aquila and Emilia earthquakes, Italy | NB | ORG | 0.84 | 0.24 | 0.37 |
| 2014 Napa California Earthquake, USA | NB | LOC | 0.51 | 0.57 | 0.54 |
| 2014 Napa California Earthquake, USA | NB | ORG | 0.78 | 0.26 | 0.39 |
| Average | NB | | 0.70 | 0.47 | 0.42 |
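The F-1 score in the table is the harmonic mean of precision and recall. The sketch below recomputes it for the biLSTM-CRF rows as a sanity check. Because the tabulated precision and recall are already rounded to two decimals, the recomputed value may differ from the reported one in the last digit. The function name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# biLSTM-CRF rows from the table: (precision, recall, reported F-1)
rows = [
    (0.83, 0.76, 0.80), (0.83, 0.86, 0.85),
    (0.84, 0.84, 0.84), (0.79, 0.69, 0.74),
    (0.93, 0.90, 0.92), (0.88, 0.87, 0.87),
]
for p, r, reported in rows:
    print(f"P={p:.2f} R={r:.2f} F1={f1_score(p, r):.2f} (reported {reported:.2f})")

# Unweighted mean of the precision column, for comparison with the
# summary row that follows the biLSTM-CRF results (0.85, 0.82, 0.84).
mean_precision = sum(p for p, _, _ in rows) / len(rows)
print(f"mean precision = {mean_precision:.2f}")
```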
Figure 7. Hotspot maps obtained by applying KDE to the spatial information extracted from data collected over a 3-day window. (a) The hotspot map of the estimated spatial locations related to damages and collapses and official reports. (b) The hotspot map of estimated spatial locations related to official and collaborative shelters and official reports. (c) The hotspot map of estimated spatial locations related to missing persons (there are no official reports of missing persons).
Geocoded addresses and coordinates found by the sensor and officially declared as disaster areas.
| No. | Geocoded Address | Geocoded Coordinates | Tweets | Retweets |
|---|---|---|---|---|
| 1 | Rancho Tamboreo & Calz de las Brujas, Nueva Oriental Coapa, 14300 Ciudad de México, CDMX | 19.2965695, −99.1328497 | 135 | 368 |
| 2 | Calz. de Tlalpan 20, Conjunto Urbano Tlalpan, 04400 Ciudad de México, CDMX | 19.3385929, −99.1446581 | 126 | 331 |
| 3 | Av. Álvaro Obregón 286 Hipódromo 06100 Ciudad de México, CDMX | 19.4162255, −99.170594 | 112 | 250 |
| 4 | Amsterdam 25, Hipódromo, 06100 Ciudad de México, CDMX | 19.4158929, −99.1701461 | 109 | 204 |
| 5 | Calle Torreón & Viad. Miguel Alemán, Piedad Narvarte, 06760 Ciudad de México, CDMX | 19.4025116, −99.1634792 | 104 | 237 |
| 6 | Edimburgo & Escocia, Col del Valle Centro, 03100 Ciudad de México, CDMX | 19.3875319, −99.1656197 | 103 | 228 |
| 7 | Amsterdam & Calle Laredo, Hipódromo, 06100 Ciudad de México, CDMX | 19.4129041, −99.1730674 | 97 | 143 |
| 8 | Av. Álvaro Obregón 284, Hipódromo, 06100 Ciudad de México, CDMX | 19.4162562, −99.1704433 | 96 | 127 |
| 9 | Coahuila 286 Hipódromo, 06700 Ciudad de México, CDMX | 19.410391, −99.1685889 | 94 | 164 |
| 10 | Simón Bolívar 190, Obrera, 06800 Ciudad de México, CDMX | 19.4221723, −99.1422295 | 95 | 131 |
| 11 | Petén & Gral. Emiliano Zapata, Sta Cruz Atoyac, 03320 Ciudad de México, CDMX | 19.3665055, −99.1591011 | 92 | 199 |
| 12 | Puebla 282 Roma Nte. 06700 Ciudad de México, CDMX | 19.4211364, −99.1714281 | 92 | 216 |
| 13 | Calle Salamanca 107, Roma Nte., 06700 Ciudad de México, CDMX | 19.4172303, −99.1714257 | 91 | 139 |
| 14 | Balsas 18 sineo, Miravalle 03580 Ciudad de México, CDMX | 19.3605422, −99.1424208 | 88 | 215 |
| 15 | Escocia & Calle Gabriel Mancera, Col del Valle Centro, 03100 Ciudad de México, CDMX | 19.3876749, −99.1661223 | 87 | 220 |
| 16 | Calz. de Tlalpan 2050, Campestre Churubusco, 04200 Ciudad de México, CDMX | 19.3429739, −99.1434801 | 74 | 155 |
| 17 | Calle Querétaro & Medellín, Roma Nte. 06700 Ciudad de México, CDMX | 19.413905, −99.1672667 | 73 | 211 |
| 18 | Av Sonora 149, Hipódromo, 06100 Ciudad de México, CDMX | 19.4145946, −99.1714381 | 70 | 237 |
| 19 | Calle Concepción Beistegui & Calle Yacatas, Narvarte Poniente 03020 Ciudad de México, CDMX | 19.3873507, −99.1582722 | 69 | 178 |
| 20 | Galicia Niños Héroes, Ciudad de México, CDMX | 19.3886011, −99.1482661 | 69 | 111 |
| 21 | Calle Enrique Rebsamen & La Morena Narvarte Poniente, 03020 Ciudad de México, CDMX | 19.3985479, −99.1609147 | 61 | 97 |
| 22 | Rancho Vista Hermosa & Rancho de Los Arcos, Parque Alameda del Sur 04929 Ciudad de México, CDMX | 19.3069132, −99.124864 | 54 | 128 |
| 23 | Bretaña & Irolo, Zacahuitzco, 03550 Ciudad de México, CDMX | 19.3731238, −99.1398383 | 50 | 133 |
| 24 | Gral. Emiliano Zapata 51 Portales Nte, Ciudad de México, CDMX | 19.3642598, −99.1446719 | 47 | 131 |
| 25 | Saratoga 714, Portales Sur, 03303 Ciudad de México, CDMX | 19.3649279, −99.1540524 | 43 | 94 |
| 26 | Sierravista & Calle Riobamba Lindavista Nte. 07300 Ciudad de México, CDMX | 19.4940873, −99.1265294 | 41 | 117 |
| 27 | Calle Salvador Díaz Mirón, Sta María la Ribera, Ciudad de México, CDMX | 19.4492376, −99.1620973 | 40 | 85 |
| 28 | Av. las Trancas 40 Narciso Mendoza 14390 U. Hab. Narciso Mendoza Super 6 Coapa, CDMX | 19.292755, −99.125329 | 38 | 72 |
| 29 | Calz. de la Viga 1756, Héroes de Churubusco, 09090 Ciudad de México, CDMX | 19.3612758, −99.1240497 | 37 | 101 |
| 30 | Avenida Santa Ana 300, Ex-Ejido de San Francisco Culhuacan, 04470 Ciudad de México, CDMX | 19.3296075, −99.1272789 | 37 | 99 |
| 31 | Coquimbo 07300 Ciudad de México, CDMX | 19.4899307, −99.1281605 | 31 | 91 |
| 32 | Calle Puente 222, San Bartolo el Chico, 14380 Ciudad de México, CDMX | 19.2833487, −99.1373406 | 24 | 79 |
| 33 | Paseo Galias 47 Lomas Estrella 2da Secc, 09890, Ciudad de México, CDMX | 19.3205935, −99.0995659 | 23 | 56 |
| 34 | Vicente Guerrero 40, San Gregorio Atlapulco, 16600, Ciudad de México, CDMX | 19.2522187, −99.0614642 | 16 | 67 |
| 35 | Av. México, San Gregorio Atlapulco, 16600 Ciudad de México, CDMX | 19.2531664, −99.0513852 | 15 | 74 |
| 36 | Xochimilco-tulyehualco 191, Xochimilco, 16500, Ciudad de México, CDMX | 19.2468579, −99.0835714 | 13 | 46 |
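As a rough illustration of how geocoded coordinates such as those above can be scored with Kernel Density Estimation, the sketch below evaluates a weighted 2-D Gaussian KDE at a given point. Several choices here are assumptions rather than details from the source: the Gaussian kernel, the bandwidth of 0.01 degrees, weighting each location by its tweet count, and treating latitude and longitude as planar coordinates, which is only reasonable over a small area. The function name `kde_score` is hypothetical.

```python
import math


def kde_score(point, samples, weights, bandwidth):
    """Weighted 2-D Gaussian kernel density estimate at `point`."""
    total_weight = sum(weights)
    density = 0.0
    for (lat, lon), weight in zip(samples, weights):
        sq_dist = (point[0] - lat) ** 2 + (point[1] - lon) ** 2
        density += weight * math.exp(-sq_dist / (2 * bandwidth ** 2))
    # Normalise by the total weight and the Gaussian kernel constant.
    return density / (total_weight * 2 * math.pi * bandwidth ** 2)


# Coordinates and tweet counts of rows 1-4 in the table above.
samples = [
    (19.2965695, -99.1328497),
    (19.3385929, -99.1446581),
    (19.4162255, -99.170594),
    (19.4158929, -99.1701461),
]
tweets = [135, 126, 112, 109]
BANDWIDTH = 0.01  # degrees; an assumed value, not taken from the paper

for location in samples:
    print(location, round(kde_score(location, samples, tweets, BANDWIDTH), 1))
```

Rows 3 and 4 lie within a few hundredths of a kilometre of each other, so their kernels overlap and the estimated density there exceeds that of row 1, even though row 1 has the largest individual tweet count.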