| Literature DB >> 36074292 |
Thushara Sudheish Kumbalaparambi¹, Ratish Menon², Vishnu P Radhakrishnan³, Vinod P Nair⁴.
Abstract
Social media platforms are among the prominent new-age channels the public uses to spread awareness of, or draw attention to, an issue or concern. This study demonstrates how public responses on Twitter can be used for qualitative monitoring of air pollution in an urban area. Tweets discussing air quality in Delhi, India, were extracted during 2019–2020 using a machine learning technique based on a self-attention network. These tweets were cleaned, sorted, and classified into three classes, viz. poor air quality, good air quality, and noise or neutral tweets. The study used a multilayer classification model with an embedding layer as the first layer and a bidirectional long short-term memory (BiLSTM) layer as the second. A method was then devised for estimating PM2.5 concentration from the tweets using 'spaCy' similarity analysis of the classified tweets together with data from the Continuous Ambient Air Quality Monitoring Stations (CAAQMS) in Delhi for the study period. The accuracy of this estimation was high (80–99%) for extreme air quality conditions (extremely good or severe) and lower during moderate variations in air quality. The applicability of this methodology depends on perceivable changes in air quality, Twitter engagement, and environmental consciousness among the public.
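The abstract does not spell out how the spaCy similarity scores are turned into a PM2.5 value. One plausible reading, sketched here purely as an assumption, is a similarity-weighted nearest-neighbour estimate: each reference tweet carries the CAAQMS PM2.5 reading for its day, and a new tweet receives the similarity-weighted mean of its `k` most similar references. Cosine similarity of averaged word vectors stands in for spaCy's `Doc.similarity` (which by default compares averaged token vectors). The names `doc_vector`, `cosine`, `estimate_pm25`, the parameter `k`, and the toy vectors are hypothetical and are not the authors' code.

```python
import math


def doc_vector(text, word_vectors):
    """Mean of the vectors of in-vocabulary tokens (similar to spaCy's default Doc.vector)."""
    # Whitespace tokenization is a simplification; a real pipeline would use a tokenizer
    vectors = [word_vectors[tok] for tok in text.lower().split() if tok in word_vectors]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def estimate_pm25(tweet, reference, word_vectors, k=3):
    """Similarity-weighted mean PM2.5 of the k reference tweets most similar to `tweet`.

    reference: (tweet_text, pm25) pairs, pm25 being the CAAQMS reading for that tweet's day.
    Returns None when the tweet has no known words or no positively similar reference.
    """
    query = doc_vector(tweet, word_vectors)
    if query is None:
        return None
    scored = []
    for text, pm25 in reference:
        vec = doc_vector(text, word_vectors)
        if vec is not None:
            scored.append((cosine(query, vec), pm25))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    top = [(sim, pm25) for sim, pm25 in scored[:k] if sim > 0]
    if not top:
        return None
    return sum(sim * pm25 for sim, pm25 in top) / sum(sim for sim, _ in top)
```

In practice the word vectors would come from a pretrained model (for example a spaCy pipeline that ships with vectors) rather than a hand-written dictionary.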
Keywords: Air pollution; BiLSTM; Deep learning; Delhi; PM2.5; spaCy
Year: 2022 PMID: 36074292 PMCID: PMC9453714 DOI: 10.1007/s11356-022-22836-w
Source DB: PubMed Journal: Environ Sci Pollut Res Int ISSN: 0944-1344 Impact factor: 5.190
Comparison of reference studies on air quality prediction
| Reference | Monitored air quality parameter | Data source / social media platform used for prediction | Country or city under study | Prediction model/technique; reported performance |
|---|---|---|---|---|
| Jiang et al. ( | AQI | Sina Weibo (Chinese microblogging platform similar to Twitter) | Beijing | Gradient tree boosting (GTB); 59% |
| Gurajala and Matthews ( | PM2.5 | | Paris, Delhi, London | Air quality not predicted |
| Jiang et al. ( | AQI | | California, Idaho, Illinois, Indiana, Ohio | Natural language processing (NLP); 6.9–17.7% improvement over the baseline method when social media data were included |
| Xu et al. ( | PM2.5 | Historical meteorological data, road network data, administrative boundary vector data, point-of-interest (POI) data | Beijing, Tianjin, Hebei | Temporal-spatial regression tree model and grid prediction model; 90% |
| Chang et al. ( | PM2.5 | Local and neighbouring station data, chimney pollution data, pollution data from abroad | Taiwan | Aggregate LSTM; outperformed GTB, LSTM, and SVR |
| Zhang et al. ( | PM2.5 | PM2.5 data (hourly, daily, and restructured multi-hour) | Beijing | LSTM, BiLSTM, EMD-BiLSTM; more than 95% |
SVR, support vector regression; EMD, empirical mode decomposition
Fig. 1 Framework of the study
Fig. 2 Study area and air pollution monitoring stations in Delhi, India (Source: CPCB)
Keyword combinations used to extract tweets during the study period
| Keyword combinations | | |
|---|---|---|
| Air Quality + Delhi | AirPollution + Delhi | DelhiChokes |
| AirQuality + Delhi | AirPollution + Delhi | Choke + Delhi |
| AirQualityDelhi | AirPollutionDelhi | Clean air + Delhi |
| DelhiAirQuality | DelhiAirPollution | DelhiEmergency |
| Delhi + Smog | DelhiSmog | Delhi + RightToBreathe |
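The table lists the search terms but not how they were applied. A minimal local post-filter, assuming that "+" means every term must appear and that hashtags and spaced phrases should match alike (so "Air Quality" and "#AirQuality" compare equal), could look like this. `KEYWORD_COMBINATIONS`, `normalize`, and `matches_keywords` are hypothetical names, and the sketch does not call any Twitter API.

```python
import re

# Keyword combinations from the table; "+" is read as "all terms must appear" (assumption)
KEYWORD_COMBINATIONS = [
    ["Air Quality", "Delhi"], ["AirQuality", "Delhi"], ["AirQualityDelhi"], ["DelhiAirQuality"],
    ["AirPollution", "Delhi"], ["AirPollutionDelhi"], ["DelhiAirPollution"],
    ["Delhi", "Smog"], ["DelhiSmog"], ["DelhiChokes"], ["Choke", "Delhi"],
    ["Clean air", "Delhi"], ["DelhiEmergency"], ["Delhi", "RightToBreathe"],
]


def normalize(text: str) -> str:
    """Lowercase and drop spaces, '#' and punctuation so phrases and hashtags compare equal."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def matches_keywords(tweet: str, combinations=KEYWORD_COMBINATIONS) -> bool:
    """True if every term of at least one combination occurs in the normalized tweet."""
    text = normalize(tweet)
    return any(all(normalize(term) in text for term in combo) for combo in combinations)
```

Substring matching on normalized text is deliberately loose (for example "Choke" also matches "chokes"), which suits recall-oriented collection but can admit false positives across word boundaries.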
Examples of classified tweets
| S. no. | Tweets | Classification |
|---|---|---|
| 1. | New Delhi: Over 1.2 million people died in India due to air pollution in 2017, said a global report on air pollution | Class 0 |
| 2. | #DelhiAirEmergency #DelhiPollution #DelhiBachao #DelhiAirQuality #DelhiNCRPollution | Class 0 |
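| 3. | Kab tak zindagi katoga bd or cigar mein, kuch din to gujaro delhi, in ncr (romanized Hindi, roughly: "How long will you spend your life on bidis and cigars? Spend a few days in Delhi and the NCR.") | Class 0 |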
| 4. | Just landed in #Delhi, the air here is just unbreathable | Class 1 |
| 5. | Amazing Air Quality today in Delhi! Enjoy the blue sky and clean air while it lasts. | Class 2 |
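The abstract states that the tweets were cleaned before classification but does not list the steps. The sketch below shows one common cleaning pass (drop URLs and @mentions, keep hashtag words without the `#`, lowercase, strip punctuation) together with a label map. Mapping class 0 to noise/neutral, class 1 to poor, and class 2 to good air quality is inferred from examples 4 and 5 above and is an assumption; `clean_tweet` and `CLASS_LABELS` are hypothetical names, not the authors' preprocessing.

```python
import re

# Inferred label meanings (assumption, based on the example tweets)
CLASS_LABELS = {0: "noise/neutral", 1: "poor air quality", 2: "good air quality"}

_URL = re.compile(r"https?://\S+")
_MENTION = re.compile(r"@\w+")
_NON_ALNUM = re.compile(r"[^a-z0-9\s]")


def clean_tweet(text: str) -> str:
    """Normalize a raw tweet before tokenization (one common recipe)."""
    text = _URL.sub(" ", text)                 # drop links
    text = _MENTION.sub(" ", text)             # drop @user handles
    text = text.replace("#", " ")              # keep hashtag words, drop the symbol
    text = _NON_ALNUM.sub(" ", text.lower())   # lowercase; strip punctuation, emoji, non-ASCII
    return " ".join(text.split())              # collapse repeated whitespace
```

Note that stripping non-ASCII characters would also remove tweets written in Devanagari script; romanized Hindi, as in example 3, survives.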
Network architecture (deep learning model)
| Layer | Configuration |
|---|---|
| Embedding | Weight matrix from GloVe embeddings |
| BiLSTM | 100 units |
| SeqSelfAttention | |
| Dense | 50 units, ReLU activation |
| Dense | 50 units, ReLU activation |
| Flatten | |
| Dense (output) | 3 units (class 0, class 1, class 2), softmax activation |
| Training | 5 epochs, batch size 128, Adam optimizer |
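The layer stack in the table can be sketched in TensorFlow/Keras as follows. This is a sketch under stated assumptions, not the authors' code: Keras's built-in dot-product `Attention` layer applied to the BiLSTM output (query = value) is swapped in for `SeqSelfAttention`, which is a third-party layer; the GloVe-initialized embedding is assumed to be frozen; the BiLSTM is assumed to have 100 units per direction and to return full sequences (needed for the later `Flatten`); and `build_tweet_classifier`, `max_len`, and integer class labels are hypothetical choices.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras


def build_tweet_classifier(embedding_matrix: np.ndarray, max_len: int) -> keras.Model:
    """Embedding -> BiLSTM -> self-attention -> Dense x2 -> Flatten -> 3-way softmax."""
    vocab_size, embed_dim = embedding_matrix.shape
    tokens = keras.layers.Input(shape=(max_len,), dtype="int32")  # padded token ids
    x = keras.layers.Embedding(
        vocab_size,
        embed_dim,
        embeddings_initializer=keras.initializers.Constant(embedding_matrix),  # GloVe weights
        trainable=False,  # assumption: pretrained vectors kept frozen
    )(tokens)
    # 100 units per direction (assumption); keep every time step for attention and Flatten
    x = keras.layers.Bidirectional(keras.layers.LSTM(100, return_sequences=True))(x)
    # Stand-in for SeqSelfAttention: dot-product attention with query = value = x
    x = keras.layers.Attention()([x, x])
    x = keras.layers.Dense(50, activation="relu")(x)  # applied to every time step
    x = keras.layers.Dense(50, activation="relu")(x)
    x = keras.layers.Flatten()(x)                     # max_len * 50 features
    outputs = keras.layers.Dense(3, activation="softmax")(x)  # classes 0, 1, 2
    model = keras.Model(tokens, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model
```

Training with the tabulated settings would then be `model.fit(x_train, y_train, epochs=5, batch_size=128)`, where `x_train` holds padded token-id sequences of length `max_len` and `y_train` holds integer labels 0, 1, and 2.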
Fig. 3 Time series of PM2.5 mass concentration with (a) Class I tweet count and (b) Class II tweet count
Fig. 4 Statistical variations of tweet volume with PM2.5 concentration range for (a) Class I and (b) Class II tweets
Fig. 5 Word clouds for (a) pre-monsoon, (b) southwest monsoon, (c) post-monsoon, and (d) winter seasons
Fig. 6 Variations in the accuracy of PM2.5 estimation from tweet content using the described methodology
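Fig. 4 summarizes tweet volume per PM2.5 concentration range, but the ranges themselves are not given in this record. The sketch below groups daily tweet counts by the Indian National Air Quality Index categories for 24-hour PM2.5 (Good up to 30, Satisfactory 31–60, Moderate 61–90, Poor 91–120, Very poor 121–250, Severe above 250 µg/m³) and reports the per-range statistics a box plot would draw on. Using these breakpoints, and the names `pm25_category` and `tweet_counts_by_range`, are assumptions.

```python
from collections import defaultdict
from statistics import median

# Upper bounds (µg/m³, 24-h mean) of the NAQI PM2.5 categories; above 250 is "Severe"
PM25_BREAKPOINTS = [
    (30, "Good"),
    (60, "Satisfactory"),
    (90, "Moderate"),
    (120, "Poor"),
    (250, "Very poor"),
]


def pm25_category(pm25: float) -> str:
    """Map a daily mean PM2.5 concentration to its NAQI category name."""
    if pm25 < 0:
        raise ValueError("PM2.5 concentration cannot be negative")
    for upper, name in PM25_BREAKPOINTS:
        if pm25 <= upper:
            return name
    return "Severe"


def tweet_counts_by_range(daily):
    """Group (pm25, tweet_count) pairs, one per day, by PM2.5 category and summarize."""
    groups = defaultdict(list)
    for pm25, count in daily:
        groups[pm25_category(pm25)].append(count)
    return {
        name: {"days": len(counts), "median": median(counts), "min": min(counts), "max": max(counts)}
        for name, counts in groups.items()
    }
```

Running this separately on Class I and Class II daily counts would give the two panels' worth of summaries; categories with no observed days are simply absent from the result.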