Gabriela Nathania Harywanto, Juan Sebastian Veron, Derwin Suhartono.
Abstract
Coral reefs are vital ecosystems that underpin much of the life on this earth, but they are now under threat. Coral bleaching is happening at a serious rate, and the ultimate goal of conservation efforts on this issue is behaviour change. One of the most important parts of any conservation effort is monitoring. However, monitoring the success of a coral bleaching campaign in terms of behaviour change requires extensive data collection, so traditional methods are not effective: they require resources that may not be available. The goal of this study is to build fast, large-scale automation for analysing the stage of behaviour change. Social media data is a promising alternative because social media usage, including on Twitter, is increasing every year. Therefore, an automatic classification model was designed to identify the stages of behaviour change on Twitter based on the Five Doors Theory, which defines five stages of behaviour change: Desirability, Enabling Context, Can Do, Buzz, and Invitation. The data were fetched from a trusted repository, Mendeley Data, under the title "An Annotated Dataset for Identifying Behaviour Change Based on Five Doors Theory Under Coral Bleaching Phenomenon on Twitter": 1,222 tweets with keywords related to coral bleaching, annotated according to the behaviour change stages. Two designs are proposed: embedding extraction, which utilizes the output of each encoder layer in BERTweet, and stacking ensemble, which combines several BERTweet models trained with different hyperparameters through a logistic regression meta-model. The best accuracy of 0.7796, with an F1 score of 0.7945, was obtained in the stacking ensemble scenario. The resulting classification model identifies each behaviour change stage well, even though the class distribution of the dataset is unbalanced.
The proposed designs outperform all baseline models and the standalone BERTweet. In conclusion, the automatic classification model makes monitoring the stages of behaviour change effective and efficient, so that the success of coral bleaching campaigns can be monitored and achieved.
Keywords: BERTweet model; Behaviour change; Embedding extraction; Ensemble technique; Five Doors Theory; Tweet classification
Year: 2022 PMID: 35669348 PMCID: PMC9153220 DOI: 10.1186/s40537-022-00615-1
Source DB: PubMed Journal: J Big Data ISSN: 2196-1115
Fig. 1 The interrelationships and reciprocity between humans and coral reefs
Tweet examples according to the Five Doors Theory
| Stage | Tweet example |
|---|---|
| Desirability | Our buildings need 40% of all energy consumed in Switzerland! |
| Enabling Context | I am considering walking or using public transport at least once a week |
| Can Do | If you are not using it, turn it off! |
| Buzz | I’m so proud when I remember to save energy and I know however small it’s helping |
| Invitation | Take 15 min out to think about what you do now and what you could do in the future. Read up on the subject and decide what our legacy will be |
Fig. 2 General experiment flow
Fig. 3 Distribution of each class in the dataset
Distribution of each class in the training and testing sets

| Set | Desirability | Enabling Context | Can Do | Buzz | Invitation | Total |
|---|---|---|---|---|---|---|
| Training | 298 | 343 | 43 | 246 | 47 | 977 |
| Testing | 75 | 86 | 10 | 62 | 12 | 245 |
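A quick plain-Python check confirms that the roughly 80/20 split above preserves the class proportions of the unbalanced dataset (the counts are taken directly from the table):

```python
# Class counts from the table: (training, testing) per stage.
counts = {
    "Desirability":     (298, 75),
    "Enabling Context": (343, 86),
    "Can Do":           (43, 10),
    "Buzz":             (246, 62),
    "Invitation":       (47, 12),
}

train_total = sum(tr for tr, _ in counts.values())  # 977
test_total = sum(te for _, te in counts.values())   # 245

for stage, (tr, te) in counts.items():
    # Share of each class in the training vs. testing set.
    print(f"{stage:16s} train {tr / train_total:.3f}  test {te / test_total:.3f}")
```

The per-class shares in the two sets agree to within about one percentage point, i.e. the split is effectively stratified.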
Fig. 4 Preprocessing path
Fig. 5 Deep Learning tokenization procedures
BERTweet Embedding Extraction scenario descriptions

| Scenario | Extracted Embedding |
|---|---|
| EE#1 | Last layer |
| EE#2 | All 12 layers |
| EE#3 | Last 4 layers |
| EE#4 | Last 2 layers |
| EE#5 | First 2 + Last 2 |
| EE#6 | First + Last |
| EE#7 | Last 2 + Mid 2 |
| EE#8 | Last + Mid |
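The layer-selection schemes above can be sketched with NumPy, using random tensors in place of real BERTweet hidden states (BERTweet-base has 12 encoder layers with hidden size 768, exposed in Hugging Face Transformers via `output_hidden_states=True`; the `combine` helper and the sum-then-mean-pool strategy here are illustrative assumptions, not the paper's exact extraction):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, hidden = 128, 768

# Stand-in for the 12 encoder-layer outputs of BERTweet-base
# (e.g. outputs.hidden_states[1:] in Transformers).
layers = [rng.standard_normal((seq_len, hidden)) for _ in range(12)]

def combine(selected):
    """Sum the selected layer outputs, then mean-pool over tokens
    to obtain a single sentence vector."""
    summed = np.sum([layers[i] for i in selected], axis=0)
    return summed.mean(axis=0)  # shape: (hidden,)

all_12     = combine(range(12))        # EE#2-style: all 12 layers
last_layer = combine([11])             # last layer only
last_4     = combine([8, 9, 10, 11])   # last 4 layers
first_last = combine([0, 11])          # first + last layer
print(all_12.shape)  # (768,)
```

Each scenario thus differs only in which encoder layers contribute to the sentence embedding that is fed to the downstream classifier.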
Fig. 6 BERTweet Embedding Extraction configuration
The model configuration of the four individual models with batch size 4 and 9 epochs
| Model | Learning rate (lr) | Epsilon (eps) |
|---|---|---|
| modelSE#1 | 1e−5 | 1e−8 |
| modelSE#2 | 1e−5 | 1e−12 |
| modelSE#3 | 2e−5 | 1e−8 |
| modelSE#4 | 2e−5 | 1e−12 |
SE: stacking ensemble
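The stacking idea — several base classifiers whose class-probability outputs feed a logistic-regression meta-learner — can be sketched with scikit-learn on synthetic data (a minimal sketch: the plain classifiers below merely stand in for the four fine-tuned BERTweet models, which differ only in hyperparameters much as modelSE#1–#4 differ in lr and eps):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic 5-class data standing in for tweet features + stage labels.
X, y = make_classification(n_samples=600, n_features=32, n_informative=16,
                           n_classes=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Base models with different hyperparameters (stand-ins for modelSE#1..#4).
bases = [LogisticRegression(C=c, max_iter=2000).fit(X_tr, y_tr)
         for c in (0.01, 0.1, 1.0, 10.0)]

# Meta-features: concatenated class probabilities from every base model.
meta_tr = np.hstack([m.predict_proba(X_tr) for m in bases])
meta_te = np.hstack([m.predict_proba(X_te) for m in bases])

# Logistic-regression meta-learner combines the base predictions.
meta = LogisticRegression(max_iter=2000).fit(meta_tr, y_tr)
acc = meta.score(meta_te, y_te)
print(f"stacked accuracy: {acc:.3f}")
```

Note that for brevity this sketch fits the meta-learner on training-set probabilities; proper stacked generalization would use out-of-fold base predictions to avoid leakage.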
Fig. 7 BERTweet Stacking Ensemble configuration
Baseline model results
| Model | Accuracy | F1 score |
|---|---|---|
| BERT | 0.6393 | 0.5692 |
| SVM | 0.6612 | 0.5230 |
| Logistic regression | 0.6122 | 0.5318 |
| K-nearest neighbors | 0.5755 | 0.4154 |
| Random forest | 0.5959 | 0.3742 |
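For the classical baselines (SVM, logistic regression, k-NN, random forest), a typical setup pairs a bag-of-words representation with the classifier; a minimal scikit-learn sketch on toy tweets (the examples and the TF-IDF choice are illustrative assumptions, not taken from the annotated dataset):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy tweets with stage labels (0 = Desirability, 1 = Enabling Context);
# purely illustrative placeholders for the 1,222 annotated tweets.
tweets = [
    "Our reefs need protection from warming seas",
    "I am considering joining a beach cleanup this weekend",
    "Reefs everywhere are losing their color",
    "Thinking about switching to reef-safe sunscreen",
]
labels = [0, 1, 0, 1]

# Sparse text features + a classical classifier, as in the baseline rows.
baseline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
baseline.fit(tweets, labels)
print(baseline.predict(["Corals are bleaching fast"]))
```

Swapping `LogisticRegression` for `LinearSVC`, `KNeighborsClassifier`, or `RandomForestClassifier` reproduces the other baseline rows.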
BERTweet embedding extraction (EE) result with lr = 1e-4 and eps = 1e-4
| Scenario | Description | Maximum accuracy | Maximum F1 score |
|---|---|---|---|
| EE#2 | All 12 Layers | 0.7633 | 0.7406 |
| EE#4 | Last 2 Layers | 0.7510 | 0.7484 |
| EE#5 | First 2 + Last 2 | 0.7510 | 0.7445 |
| EE#6 | First + Last | 0.7673 | 0.7388 |
| EE#7 | Last 2 + Mid 2 | 0.7510 | 0.7589 |
| EE#8 | Last + Mid | 0.7673 | 0.7214 |
Scenario EE#1 is bolded because it has the highest accuracy while scenario EE#3 is bolded because it has the highest F1 score
BERTweet stacking ensemble (SE) result
| Scenario | Description | Accuracy | F1 score |
|---|---|---|---|
| SE#1 | modelSE#1 | 0.7429 | 0.6786 |
| SE#2 | modelSE#2 | 0.7592 | 0.7399 |
| SE#4 | modelSE#4 | 0.7429 | 0.7436 |
| SE#5 | combination1 (modelSE#1 and modelSE#2) | 0.7755 | 0.7677 |
| SE#6 | combination2 (modelSE#1 and modelSE#3) | 0.7551 | 0.7530 |
| SE#7 | combination3 (modelSE#1 and modelSE#4) | 0.7551 | 0.7298 |
| SE#8 | combination4 (modelSE#2 and modelSE#3) | 0.7388 | 0.7236 |
| SE#10 | combination6 (modelSE#3 and modelSE#4) | 0.7592 | 0.7754 |
Scenario SE#3 is bolded because it has the best performance among the standalone models, scenario SE#9 because it has the best performance among the combinations of two models, and SE#11 because it has the best performance with the combination of four models
Example of a Buzz tweet predicted as another class. Label 0 stands for Desirability, 1 for Enabling Context, and 3 for Buzz

| Tweet | Actual | modelSE#1 | modelSE#2 | modelSE#3 | modelSE#4 |
|---|---|---|---|---|---|
| Think these corals are bleached? Think again! If this coral were bleached, we would see the entire colony slowly lose its color in a process called “paling.” Those white tips you see are actually new growth! | 3 | 1 | 1 | 1 | 0 |
Results for BERTweet embedding extraction scenario EE#3
| Class | Precision | Recall | F1 score |
|---|---|---|---|
| Desirability | 0.7349 | 0.8133 | 0.7722 |
| Enabling Context | 0.7073 | 0.6744 | 0.6905 |
| Can Do | 0.6667 | 0.8000 | 0.7273 |
| Buzz | 0.8947 | 0.8226 | 0.8571 |
| Invitation | 0.9091 | 0.8333 | 0.8696 |
| Average | 0.7826 | 0.7887 | 0.7833 |
Results for BERTweet stacking ensemble scenario SE#11
| Class | Precision | Recall | F1 score |
|---|---|---|---|
| Desirability | 0.8261 | 0.7600 | 0.7917 |
| Enabling Context | 0.7000 | 0.7326 | 0.7159 |
| Can Do | 0.6429 | 0.9000 | 0.7500 |
| Buzz | 0.8525 | 0.8387 | 0.8455 |
| Invitation | 0.9091 | 0.8333 | 0.8696 |
| Average | 0.7861 | 0.8129 | 0.7945 |
Confusion matrix for BERTweet embedding extraction scenario EE#3

| Actual \ Predicted | Desirability | Enabling Context | Can Do | Buzz | Invitation |
|---|---|---|---|---|---|
| Desirability | 61 | 13 | 1 | 0 | 0 |
| Enabling Context | 20 | 58 | 2 | 5 | 1 |
| Can Do | 0 | 2 | 8 | 0 | 0 |
| Buzz | 2 | 9 | 0 | 51 | 0 |
| Invitation | 0 | 0 | 1 | 1 | 10 |
Confusion matrix for BERTweet stacking ensemble scenario SE#11

| Actual \ Predicted | Desirability | Enabling Context | Can Do | Buzz | Invitation |
|---|---|---|---|---|---|
| Desirability | 57 | 17 | 1 | 0 | 0 |
| Enabling Context | 11 | 63 | 3 | 8 | 1 |
| Can Do | 0 | 1 | 9 | 0 | 0 |
| Buzz | 1 | 9 | 0 | 52 | 0 |
| Invitation | 0 | 0 | 1 | 1 | 10 |
Fig. 8 Performance comparison between baseline models and the two best scenarios (EE#3 and SE#11)