Liang Zhao, Feng Chen, Jing Dai, Ting Hua, Chang-Tien Lu, Naren Ramakrishnan.
Abstract
Twitter has become a popular data source as a surrogate for monitoring and detecting events. Targeted domains such as crime, election, and social unrest require the creation of algorithms capable of detecting events pertinent to these domains. Due to the unstructured language, short-length messages, dynamics, and heterogeneity typical of Twitter data streams, it is technically difficult and labor-intensive to develop and maintain supervised learning systems. We present a novel unsupervised approach for detecting spatial events in targeted domains and illustrate this approach using one specific domain, viz. civil unrest modeling. Given a targeted domain, we propose a dynamic query expansion algorithm to iteratively expand domain-related terms, and generate a tweet homogeneous graph. An anomaly identification method is utilized to detect spatial events over this graph by jointly maximizing local modularity and spatial scan statistics. Extensive experiments conducted in 10 Latin American countries demonstrate the effectiveness of the proposed approach.
Year: 2014 PMID: 25350136 PMCID: PMC4211687 DOI: 10.1371/journal.pone.0110206
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. A map of civil unrest event hotspots on September 27th, 2012 pertaining to labor reform and other issues.
Flags denote the ground-truth events reported by authorities. Circles denote the events detected by our method.
Figure 2. Flowchart of the proposed method.
The algorithm of Dynamic Query Expansion.
| Algorithm 1: Dynamic Query Expansion. |
| Initialize |
| Set Φ via |
| Set |
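The steps of Algorithm 1 are not recoverable from the listing above, so the following is only a minimal sketch of the general idea stated in the abstract: starting from seed terms, tweets and terms are re-weighted in turn until the selected term set stops changing. The function `expand_query`, its parameters `top_k`, `beta`, and `max_iter`, and the specific weighting formulas are all assumptions made for illustration; `beta` only loosely mirrors the "trade-off β for updating tweet node weights" named in the sensitivity analysis, and the authors' update rules may differ.

```python
import math
from collections import defaultdict


def expand_query(tweets, seed_terms, top_k=5, beta=0.5, max_iter=20):
    """Hypothetical iterative query expansion (not the paper's Algorithm 1)."""
    docs = [set(t.lower().split()) for t in tweets]
    n_docs = len(docs)

    # Inverse document frequency down-weights terms that occur in most tweets.
    df = defaultdict(int)
    for doc in docs:
        for term in doc:
            df[term] += 1
    idf = {term: math.log(1 + n_docs / count) for term, count in df.items()}

    query = set(seed_terms)
    tweet_w = [0.0] * n_docs
    for _ in range(max_iter):
        # Tweet weight: share of the tweet's terms that are in the current
        # query, blended with the previous weight through beta (assumption).
        tweet_w = [
            beta * w + (1 - beta) * (len(doc & query) / len(doc) if doc else 0.0)
            for w, doc in zip(tweet_w, docs)
        ]
        # Term weight: idf-scaled sum of the weights of tweets containing it.
        term_w = defaultdict(float)
        for w, doc in zip(tweet_w, docs):
            for term in doc:
                term_w[term] += w * idf[term]
        # Keep the top_k terms; ties are broken alphabetically.
        ranked = sorted(term_w, key=lambda t: (-term_w[t], t))
        new_query = set(ranked[:top_k])
        if new_query == query:  # the term set is stable, stop iterating
            break
        query = new_query
    return query


tweets = [
    "protest march downtown today",
    "march against fraud protest",
    "fraud protest students march",
    "lunch pizza today",
    "pizza party tonight",
]
expanded = expand_query(tweets, ["protest"], top_k=3)
print(sorted(expanded))
```

In this toy corpus the single seed term pulls in the terms that co-occur with it in the unrest-related tweets, while the unrelated tweets keep zero weight and contribute nothing.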
The algorithm of Local Modularity Spatial Scan.
| Algorithm 2: Local Modularity Spatial Scan. |
| Initialize |
| Add |
| Check overlapping among subgraphs and update Ω |
| Randomization testing on subgraphs and update Ω |
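The listing above keeps only the final steps, including randomization testing of the candidate subgraphs. As a rough illustration of that last step, the sketch below scores candidate regions with the standard Kulldorff Poisson scan statistic and estimates a p-value by Monte Carlo randomization. This is a swapped-in, generic technique: it does not include the local modularity term, it takes the candidate regions as given rather than growing subgraphs, and the names `poisson_scan_llr` and `scan_regions` and the parameters `n_sim` and `seed` are hypothetical.

```python
import math
import random


def poisson_scan_llr(c_in, e_in, c_total):
    """Kulldorff Poisson log-likelihood ratio for one region.

    c_in: observed count inside, e_in: expected count inside (must be > 0),
    c_total: total observed count. Returns 0.0 unless the region is elevated.
    """
    if c_in <= e_in:
        return 0.0
    llr = c_in * math.log(c_in / e_in)
    c_out, e_out = c_total - c_in, c_total - e_in
    if c_out > 0:
        llr += c_out * math.log(c_out / e_out)
    return llr


def scan_regions(counts, baselines, regions, n_sim=199, seed=0):
    """Return (best_region, llr, p_value) over the given candidate regions."""
    rng = random.Random(seed)
    total = sum(counts)
    base_total = sum(baselines)
    expected = [total * b / base_total for b in baselines]

    def best(cs):
        # Score every candidate region and keep the highest-scoring one.
        scored = [
            (poisson_scan_llr(sum(cs[i] for i in r),
                              sum(expected[i] for i in r), total), r)
            for r in regions
        ]
        return max(scored, key=lambda s: s[0])

    observed_llr, region = best(counts)

    # Null hypothesis: events fall on locations in proportion to baselines.
    exceed = 0
    locations = range(len(counts))
    for _ in range(n_sim):
        sim = [0] * len(counts)
        for i in rng.choices(locations, weights=baselines, k=total):
            sim[i] += 1
        if best(sim)[0] >= observed_llr:
            exceed += 1
    p_value = (exceed + 1) / (n_sim + 1)
    return region, observed_llr, p_value


counts = [2, 3, 40, 35, 2, 3]          # toy event counts per location
baselines = [1, 1, 1, 1, 1, 1]         # toy baseline activity per location
regions = [(0, 1), (2, 3), (4, 5), (1, 2), (3, 4)]
region, llr, p_value = scan_regions(counts, baselines, regions)
print(region, round(llr, 2), p_value)
```

A region would be retained only if its p-value falls below a chosen significance level; the threshold itself is a design choice that this sketch leaves to the caller.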
Dataset and Label Source.
| Country | #Tweets (million) | News source | #Events |
| Argentina | 52 | Clarín; La Nación; Infobae | 365 |
| Brazil | 57 | O Globo; O Estado de São Paulo; Jornal do Brasil | 451 |
| Chile | 28 | La Tercera; Las Últimas Noticias; El Mercurio | 252 |
| Colombia | 41 | El Espectador; El Tiempo; El Colombiano | 298 |
| Ecuador | 13 | El Universo; El Comercio; Hoy | 275 |
| El Salvador | 7 | El Diario de Hoy; La Prensa Gráfica; El Mundo | 180 |
| Mexico | 51 | La Jornada; Reforma; Milenio | 1217 |
| Paraguay | 8 | ABC Color; Última Hora; La Nación | 563 |
| Uruguay | 3 | El País; El Observador | 124 |
| Venezuela | 45 | El Universal; El Nacional; Últimas Noticias | 678 |
In addition to the top 3 domestic news outlets, the following news outlets are included: The New York Times; The Guardian; The Wall Street Journal; The Washington Post; The International Herald Tribune; The Times of London; Infolatam.
Methods and Efficiencies.
| Methods | Targeted Domain | Supervised | Running Time |
| Earthquake Detection | Yes | Yes | 15.2 hours |
| Topic Modeling | No | No | 9.7 hours |
| Graph Partition | No | No | 18.9 hours |
| ST Burst | No | No | 30.1 hours |
| TEDAS | Yes | Yes | 20.9 hours |
| QE | No | No | 23.2 hours |
| SVM + LMSS | Yes | Yes | 22.0 hours |
| DQE + SS | Yes | No | 16.3 hours |
| Our proposed (DQE + LMSS) | Yes | No | 18.2 hours |
| EDSS | No | No | 19.8 hours |
Comparison between Expanded Query from DQE and GSR Description of Events.
| Detect-Date | Expanded Query Extracted by DQE | GSR Description of Real Events | Occur-Date |
| 1-Jul | #YoSoy132, #Granmarcha132, patrull, Companer, PRI, movement | “Youth movement #YoSoy132 staged a sit-in outside the local board of Federal Electoral Institute.” | 1-Jul |
| 3-Jul | #Epnnuncaseramipresidente, fraud, #YoSoy132, movimient, progress, contig, march | “The student movement #YoSoy132 protested against fraud in the elections.” | 3-Jul |
| 7-Jul | #Megamarcha, #Exigimosdemocracia, Eugenio, Derbez, eleccion, @YoSoy132Media | “Protesters unite to call for mega march.” “YoSoy132 go and concentrate on the Esplanade of Heroes.” | 7-Jul |
| 8-Jul | #Megamarcha, #Megamarch, Eugenio, Derbez, against, election | “Protesters unite to call for mega march against virtual presidential election.” | |
| 13-Jul | imposicion, #Megamarcha, 15hrs, principal, march, #AMLO | “A march was in protest of the imposition of the PRI candidate.” | 14-Jul |
| 14-Jul | #Megamarcha, #Megamarch, 14juli, zocal, angel, march | “Virtual #Megamarch against the winner of the presidential election, Enrique Peña Nieto, left the Angel de Independencia to el Zocalo of Mexico City.” | 14-Jul |
| 19-Jul | #Sosmexico, #Sosmexic, fraud, elector, march, protest | “Protesting for alleged fraud in the election of July 1” | 19-Jul |
| 22-Jul | #Megamarcha, #YoSoy132, @epigmenioibarra, Zocal, march, imposicion | “A mega march against the alleged imposition of the PRI.” “YoSoy132 march arrives at El Zocalo and goes to the Monument to the Revolution” | 22-Jul |
| 27-Jul | #Ocupatelevisa, #YoSoy132, televisa, chapultepec, installation, march | “Students symbolically take over facilities of Hidalgo Radio and TV, and fence outside Televisa Chapultepec in Mexico City” | 27-Jul |
Performance Comparison with Baseline Components (Precision, Recall, F-measure).
| Dataset | DQE + LMSS | DQE + SS | QE + LMSS | SVM + LMSS |
| Brazil | | 0.84, | 0.44, 0.14, 0.21 | 0.39, 0.24, 0.30 |
| Colombia | | 0.58, 0.73, 0.65 | 0.31, 0.16, 0.21 | 0.63, 0.64, 0.63 |
| Uruguay | 0.66, | 0.76, 0.26, 0.39 | | 0.45, 0.27, 0.34 |
| El Salvador | | 0.63, 0.09, 0.16 | 0.55, 0.37, 0.44 | 0.61, 0.19, 0.29 |
| Mexico | | 0.73, 0.37, 0.49 | 0.56, 0.09, 0.16 | 0.56, 0.18, 0.27 |
| Chile | | 0.58, | 0.28, 0.28, 0.28 | 0.78, 0.29, 0.42 |
| Paraguay | | 0.96, 0.17, 0.29 | 0.88, | 0.57, 0.11, 0.19 |
| Argentina | 0.78, 0.61, 0.69 | 0.69, | 0.67, 0.54, 0.60 | |
| Venezuela | | 0.57, 0.31, 0.40 | 0.56, 0.26, 0.36 | 0.65, 0.12, 0.20 |
| Ecuador | | 0.72, 0.44, 0.55 | 0.54, | 0.62, 0.71, 0.66 |
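Each cell in the table above is a (precision, recall, F-measure) triplet. Assuming the F-measure is the balanced harmonic mean of precision and recall (F1), which the source does not state explicitly, the third value can be recomputed from the first two. The sketch below does this for three complete triplets copied from the Colombia, Mexico, and Ecuador rows; the helper name `f_measure` is illustrative. Because the reported precision and recall are already rounded to two decimals, a recomputed value can differ from the reported one in the last digit.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1); 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure), copied from complete table cells.
rows = {
    "Colombia": (0.58, 0.73, 0.65),
    "Mexico": (0.73, 0.37, 0.49),
    "Ecuador": (0.62, 0.71, 0.66),
}
for name, (p, r, reported) in rows.items():
    print(f"{name}: computed {f_measure(p, r):.2f}, reported {reported:.2f}")
```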
Performance Comparison with Existing Event Detection Methods (Precision, Recall, F-measure).
| Dataset | DQE + LMSS | Graph Partition | Earthquake | Topic Modeling | TEDAS | ST Burst | EDSS |
| Brazil | | 0.55, 0.34, 0.42 | 0.65, 0.19, 0.30 | 0.46, 0.09, 0.15 | 0.39, 0.20, 0.27 | 0.80, | 0.86, 0.28, 0.42 |
| Colombia | 0.81, | 0.68, 0.29, 0.41 | 0.55, 0.49, 0.52 | 0.26, 0.39, 0.31 | 0.66, 0.41, 0.50 | | 0.57, 0.52, 0.54 |
| Uruguay | 0.66, | 0.28, 0.23, 0.25 | 0.86, 0.11, 0.20 | 0.22, 0.06, 0.09 | | 0.11, 0.06, 0.08 | 0.66, 0.13, 0.22 |
| El Salvador | | 0.35, 0.07, 0.10 | 0.32, 0.06, 0.10 | 0.40, 0.05, 0.09 | 0.71, 0.36, 0.48 | 0.30, 0.12, 0.17 | 0.52, 0.15, 0.23 |
| Mexico | | 0.72, 0.23, 0.35 | 0.51, 0.19, 0.28 | 0.34, 0.08, 0.12 | 0.56, 0.20, 0.29 | 0.76, 0.43, 0.55 | 0.69, 0.27, 0.39 |
| Chile | 0.80, | 0.83, 0.39, 0.53 | 0.46, 0.19, 0.27 | 0.42, 0.48, 0.45 | | 0.67, | 0.35, 0.43, 0.39 |
| Paraguay | | 0.76, 0.19, 0.30 | 0.40, 0.10, 0.16 | 0.86, 0.07, 0.13 | 0.88, | 0.34, 0.12, 0.18 | 0.83, 0.16, 0.27 |
| Argentina | 0.78, 0.61, | | 0.63, 0.57, 0.60 | 0.38, 0.42, 0.40 | 0.51, | 0.63, 0.73, 0.67 | 0.73, 0.55, 0.63 |
| Venezuela | | 0.46, 0.21, 0.29 | 0.87, 0.22, 0.35 | 0.47, 0.37, 0.41 | 0.79, 0.28, 0.42 | 0.82, 0.33, 0.47 | 0.86, |
| Ecuador | | 0.30, 0.22, 0.25 | 0.78, | 0.67, 0.04, 0.08 | 0.55, 0.92, | 0.29, 0.26, 0.27 | 0.64, 0.28, 0.39 |
Figure 3. Sensitivity analysis of parameters.
(a) Sensitivity analysis of “number of seed query terms” (b) Sensitivity analysis of “trade-off β for updating tweet node weights” (c) Sensitivity analysis of “trade-off between local modularity and spatial scan statistics”.
Figure 4. Sensitivity analysis of the longest distance r between any two neighboring locations.
Figure 5. Event detection case studies.