Qasim Khan, Edda Kalbus, Nazar Zaki, Mohamed Mostafa Mohamed.
Abstract
Floods are among the most devastating types of disasters in terms of loss of human life and social and financial damage. Authoritative data from flood gauges are scarce in arid regions because the dry climate causes these measuring devices to malfunction. Social media, where a wealth of information is available online, could therefore be a useful data source in such cases. This study investigates the reliability and quality of flood-related data collected from social media, particularly for an arid region where the use of flow gauges is limited. Social media data (text, images and videos) related to a flood event were analyzed using a machine learning approach. To this end, digital data (758 images and 1413 video frames) were converted into numeric values through the ResNet50 model using the VGG-16 architecture. The numeric data from images, videos and text were then classified using different machine learning algorithms. The receiver operating characteristic (ROC) curve and the area under the curve (AUC) were used to evaluate and compare the performance of the developed machine learning algorithms. This novel approach to studying the quality of social media data could be a reliable alternative in the absence of real-time flow gauge data. A flash flood that occurred in the United Arab Emirates (UAE) from March 7-11, 2016 was selected as the focus of this study. Random forest showed the highest accuracy, 80.18%, among the five classifiers for images and videos. Precipitation/rainfall data were used to validate the social media data and showed a significant relationship between rainfall and the number of posts. The validity and accuracy of the machine learning models were assessed using the area under the curve, precision-recall curve, root mean square error, and kappa statistics. The data quality of YouTube videos was found to have the highest accuracy, followed by Facebook, Flickr, Twitter, and Instagram. These results showed that social media data could be used when gauge data are unavailable.
Year: 2022 PMID: 35468157 PMCID: PMC9037947 DOI: 10.1371/journal.pone.0267079
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Study area in the United Arab Emirates showing the highest rainfall event (data from a global circulation model).
Collected data from different social media platforms.
| Platform | No. of posts | No. of images | No. of videos | #tags/Keywords (search dates: 7 to 11 March 2016) |
|---|---|---|---|---|
| | 261 | 88 | 16 | #UAE AND #Rain OR #Flood OR #Storm OR #Weather |
| | | | | #AbuDhabi AND #Rain OR #Flood OR #Storm OR #Weather |
| | | | | #Dubai AND #Rain OR #Flood OR #Storm OR #Weather |
| | 318 | 560 | 112 | #AlAin AND #Rain OR #Flood OR #Storm OR #Weather |
| | | | | #Sharjah AND #Rain OR #Flood OR #Storm OR #Weather |
| | | | | #RAK AND #Rain OR #Flood OR #Storm OR #Weather |
| | 112 | 66 | 44 | #RainUAE OR #StormUAE OR #UAEWeather |
| | | | | #RainAbuDhabi OR #AbuDhabiFlood OR #AbuDhabiWeather |
| | | | | #RainDubai OR #DubaiFlood OR #DubaiWeather |
| | | | | #RainAlAin OR #AlAinFlood OR #AlAinWeather |
| | | | | #RainSharjah OR #SharjahFlood OR #SharjahWeather |
| | | | | #RainRAK OR #RAKFlood OR #RAKWeather |
| | 25 | 21 | 0 | UAE AND Rain OR Flood OR Storm OR Weather |
| | | | | AbuDhabi AND Rain OR Flood OR Storm OR Weather |
| | | | | Dubai AND Rain OR Flood OR Storm OR Weather |
| | 78 | 0 | 78 | AlAin AND Rain OR Flood OR Storm OR Weather |
| | | | | Sharjah AND Rain OR Flood OR Storm OR Weather |
| | | | | RAK AND Rain OR Flood OR Storm OR Weather |
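The searches listed in the table pair a place name with a set of weather terms. A minimal sketch of how such query strings could be generated follows. The function name `build_queries` is hypothetical, and the parentheses around the OR group are an assumption about the intended grouping (the table writes the queries without any). No platform API is called here.

```python
from typing import List

# Place names and weather terms as they appear in the table.
PLACES = ["UAE", "AbuDhabi", "Dubai", "AlAin", "Sharjah", "RAK"]
TERMS = ["Rain", "Flood", "Storm", "Weather"]


def build_queries(places: List[str], terms: List[str], hashtag: bool = True) -> List[str]:
    """Return one boolean query per place: PLACE AND (TERM OR TERM ...)."""
    prefix = "#" if hashtag else ""
    # Assumption: the OR terms are meant to be grouped before the AND is applied.
    or_group = " OR ".join(f"{prefix}{term}" for term in terms)
    return [f"{prefix}{place} AND ({or_group})" for place in places]


queries = build_queries(PLACES, TERMS)
for query in queries:
    print(query)
```

Passing `hashtag=False` yields the plain-keyword variants shown in the lower rows of the table.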
Data classified into classes.
| | Not Relevant | Rain | Low Flood | High Flood |
|---|---|---|---|---|
| | 155 | 397 | 117 | 89 |
| | 0 | 103 | 37 | 14 |
| | 122 | 448 | 91 | 44 |
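The class counts above are strongly imbalanced, which matters when reading accuracy figures. The sketch below computes each row's total and the share of its largest class, using only the counts from the table. The row labels are not available in this record, so rows are referred to by position, and `summarize` is a hypothetical helper name. The first row sums to 758, the number of images reported in the abstract.

```python
CLASSES = ["Not Relevant", "Rain", "Low Flood", "High Flood"]

# Counts copied from the table, one list per row (row labels are missing in the source).
ROWS = [
    [155, 397, 117, 89],
    [0, 103, 37, 14],
    [122, 448, 91, 44],
]


def summarize(counts):
    """Return (total, majority class name, majority class share) for one row."""
    total = sum(counts)
    top = max(range(len(counts)), key=counts.__getitem__)
    return total, CLASSES[top], counts[top] / total


summaries = [summarize(row) for row in ROWS]
for total, cls, share in summaries:
    print(f"total={total}, majority class={cls}, share={share:.1%}")
```

A classifier that always predicts the majority class would reach that share as its accuracy, so the share serves as a simple baseline for each row.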
Fig 2. Sample images from the final dataset, classified into four classes: (a) not relevant, (b) rain, (c) low flood and (d) high flood.
Fig 3. VGG-16 architecture for converting images into flattened features.
Fig 4. Sample data from 2,171 rows showing the binary coded matrix of text messages and the features extracted from images and frames.
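One common way to obtain a binary coded matrix from text messages is a presence/absence bag-of-words encoding. The sketch below illustrates that idea only. The example posts, the tokenization rule, and the names `tokenize` and `binary_matrix` are assumptions for illustration and are not taken from the study.

```python
import re
from typing import List, Optional, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def binary_matrix(posts: List[str], vocabulary: Optional[List[str]] = None) -> Tuple[List[str], List[List[int]]]:
    """Encode each post as a 0/1 row: 1 if the vocabulary word occurs in the post."""
    if vocabulary is None:
        # Build a sorted vocabulary from all tokens seen in the posts.
        vocabulary = sorted({tok for post in posts for tok in tokenize(post)})
    rows = []
    for post in posts:
        present = set(tokenize(post))
        rows.append([1 if word in present else 0 for word in vocabulary])
    return vocabulary, rows


# Hypothetical posts, not from the study's dataset.
posts = ["Heavy rain in Dubai", "Flood on the road, heavy traffic"]
vocab, matrix = binary_matrix(posts)
print(vocab)
for row in matrix:
    print(row)
```

Each row of such a matrix can then be concatenated with the numeric image or frame features before classification.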
Fig 5. Methodology of the case study from data collection to output.
Model accuracy from different functions in Weka.
| Classifier | Images & videos: accuracy | Images & videos: time (s) | Text, images & videos: accuracy | Text, images & videos: time (s) | Text: accuracy | Text: time (s) |
|---|---|---|---|---|---|---|
| Random Forest | 80.18% | 3.78 | 64.30% | 1.23 | 61.28% | 0.19 |
| k-nearest Neighbours | 76.22% | 0.004 | 52.94% | 0.02 | 60.99% | 0.001 |
| Naïve Bayes | 37.83% | 0.44 | 41.38% | 0.14 | 60.28% | 0.01 |
| Support Vector Machine | 69.82% | 5.66 | 59.03% | 0.98 | 62.84% | 0.08 |
| C4.5 (J48) | 72.07% | 11.69 | 53.55% | 1.94 | 63.12% | 0.02 |
Fig 6. Time series of rainfall depths (a) with frequency of total posts per day, (b) with frequency of images and videos per day.
Different classifier results for model accuracy, Kappa statistics, RMSE, F-measure, Area under Curve (AUC) and Precision Recall Curve (PRC).
| Classifier | Data | Accuracy | Kappa statistic | RMSE | F-measure | AUC | PRC |
|---|---|---|---|---|---|---|---|
| Random Forest | Images & videos | 80.18% | 0.63 | 0.27 | 0.79 | 0.94 | 0.88 |
| Random Forest | Images, videos & text | 64.30% | 0.39 | 0.35 | 0.61 | 0.82 | 0.7 |
| Random Forest | Text | 61.28% | 0.08 | 0.37 | 0.54 | 0.65 | 0.58 |
| k-nearest Neighbours | Images & videos | 76.22% | 0.59 | 0.34 | 0.76 | 0.79 | 0.68 |
| k-nearest Neighbours | Images, videos & text | 52.94% | 0.27 | 0.48 | 0.53 | 0.63 | 0.46 |
| k-nearest Neighbours | Text | 60.99% | 0.08 | 0.39 | 0.54 | 0.63 | 0.56 |
| Naïve Bayes | Images & videos | 37.83% | 0.18 | 0.56 | 0.4 | 0.68 | 0.52 |
| Naïve Bayes | Images, videos & text | 41.38% | 0.22 | 0.54 | 0.43 | 0.66 | 0.47 |
| Naïve Bayes | Text | 60.28% | 0.04 | 0.37 | 0.52 | 0.65 | 0.57 |
| Support Vector Machine | Images & videos | 69.82% | 0.42 | 0.36 | 0.67 | 0.74 | 0.59 |
| Support Vector Machine | Images, videos & text | 59.03% | 0.35 | 0.39 | 0.58 | 0.7 | 0.5 |
| Support Vector Machine | Text | 62.84% | 0.01 | 0.39 | 0.77 | 0.5 | 0.45 |
| C4.5 (J48) | Images & videos | 72.07% | 0.53 | 0.36 | 0.72 | 0.78 | 0.65 |
| C4.5 (J48) | Images, videos & text | 53.55% | 0.23 | 0.47 | 0.53 | 0.64 | 0.45 |
| C4.5 (J48) | Text | 63.12% | 0.006 | 0.37 | 0.77 | 0.49 | 0.46 |
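In the table above, the text-only rows reach roughly 60% accuracy while their kappa statistics stay at or below 0.08. Cohen's kappa corrects accuracy for chance agreement: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected from the class marginals. The sketch below is a plain-Python illustration; the two confusion matrices are hypothetical and are not results from the study.

```python
def cohens_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: actual, columns: predicted).

    Undefined (division by zero) when the expected agreement p_e equals 1.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    p_o = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement from the product of the marginals.
    p_e = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical case 1: every sample is predicted as the majority class (60% accuracy).
always_majority = [[60, 0], [40, 0]]
# Hypothetical case 2: balanced classes with 90% accuracy.
balanced_good = [[45, 5], [5, 45]]

kappa_majority = cohens_kappa(always_majority)
kappa_good = cohens_kappa(balanced_good)
print(kappa_majority, kappa_good)
```

The first case shows how a classifier can score 60% accuracy with a kappa of zero, which is why a near-zero kappa alongside moderate accuracy suggests predictions dominated by the majority class.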
Fig 7. Area under the curve (AUC) for the three data formats using random forest.
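For a binary (one class versus the rest) problem, the AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition. The labels and scores are hypothetical, and how the per-class values were aggregated in the study is not stated in this record.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = class of interest) and classifier scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
auc = roc_auc(labels, scores)
print(auc)
```

An AUC of 0.5 corresponds to random ranking, which gives context for values near 0.5 in the text-only rows of the metrics tables.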
Random forest classifier accuracy for data quality of different social media platforms.
| Platform | Data | Accuracy | Kappa statistic | RMSE | F-measure | AUC | PRC |
|---|---|---|---|---|---|---|---|
| Twitter | Images & videos | 72.92% | 0.52 | 0.32 | 0.71 | 0.87 | 0.79 |
| Twitter | Images, videos & text | 57.50% | 0.3 | 0.37 | 0.6 | 0.75 | 0.62 |
| Twitter | Text | 41.25% | 0.04 | 0.42 | 0.74 | 0.52 | 0.38 |
| Facebook | Images & videos | 80.46% | 0.65 | 0.27 | 0.79 | 0.94 | 0.88 |
| Facebook | Images, videos & text | 66.67% | 0.39 | 0.34 | 0.62 | 0.82 | 0.72 |
| Facebook | Text | 48.60% | 0.01 | 0.42 | 0.37 | 0.46 | 0.35 |
| Instagram | Images & videos | 47.47% | 0.11 | 0.38 | 0.36 | 0.69 | 0.54 |
| Instagram | Images, videos & text | 49.50% | 0.12 | 0.38 | 0.53 | 0.72 | 0.57 |
| Instagram | Text | 45.50% | 0.02 | 0.42 | 0.37 | 0.45 | 0.35 |
| YouTube | Videos | 83.61% | 0.65 | 0.25 | 0.82 | 0.96 | 0.93 |
| YouTube | Videos & text | 40.91% | 0.13 | 0.42 | 0.37 | 0.59 | 0.42 |
| YouTube | Text | 43.18% | 0.21 | 0.43 | 0.42 | 0.59 | 0.41 |
| Flickr | Images | 74% | 0.13 | 0.29 | 0.7 | 0.68 | 0.76 |
| Flickr | Images & text | 71.43% | 0.16 | 0.33 | 0.5 | 0.69 | 0.73 |
| Flickr | Text | 57.15% | 0.17 | 0.42 | 0.52 | 0.21 | 0.48 |