
Understanding information behavior of South Korean Twitter users who express suicidality on Twitter.

Donghun Kim¹, Woojin Jung¹, Seojin Nam¹, Hongjin Jeon²,³, Jihyun Baek², Yongjun Zhu⁴.

Abstract

Objective: Although a few studies have examined how suicidal users behave on Twitter, they investigated only partial aspects such as tweeting frequency and tweet length. Therefore, we aim to understand the information behavior of suicidal users in South Korea more broadly.
Methods: To achieve this goal, we annotated 20,000 tweets and identified 1097 tweets with the expression of suicidality (i.e. suicidal tweets) and 229 suicidal users (i.e. experimental group). Using the data, a user profile analysis, comparative analysis with control group, and tweets/hashtags analysis were performed.
Results: Our results show that many suicidal users used suicide-related keywords in their user IDs, usernames, descriptions, and pinned tweets. We also found that, compared to the control group, the experimental group showed different patterns of information behavior. The experimental group used Twitter less frequently and, on average, wrote longer texts than the control group. A clear seasonal pattern was also identified in the experimental group's tweeting behavior. Frequently used keywords/hashtags were extracted from tweets written by the experimental group for the purpose of understanding their concerns and detecting more suicidal tweets.
Conclusions: We believe that our study will help in the understanding of suicidal users' information behavior on social media and lay the basis for more accurate actions for suicide prevention and early intervention on social media.


Keywords: Data analysis; disease; health communication; informatics; media; public health; public health informatics; social media

Year: 2022 PMID: 35340901 PMCID: PMC8943454 DOI: 10.1177/20552076221086339

Source DB: PubMed Journal: Digit Health ISSN: 2055-2076

Introduction

Suicide worldwide

Suicide is one of the most critical causes of death worldwide. The World Health Organization reported that about 800,000 people kill themselves every year. Many countries have been increasing their efforts to prevent suicide and reduce suicide risks. In the United States of America (USA), the government is operating the National Suicide Prevention Lifeline campaign (NSPL), which is a 24-hour suicide prevention hotline. The government of Australia also operated a national suicide prevention campaign called R U OK? in cooperation with private organizations. South Korea, the country with the highest suicide rate in the Organization for Economic Co-operation and Development (OECD), initiated a nationwide suicide prevention plan in 2004. The plan includes the establishment of the National Mental Health Welfare Center and the enactment of a suicide prevention ordinance. The country also established the Korea Suicide Prevention Center (KSPC) in 2003 with the purpose of preventing suicide and building a collaborative network with other institutes that work in the field of suicide prevention. Despite these governmental efforts, suicide remains a life-threatening social problem, suggesting that a different paradigm for coping with suicide is needed.

Suicide and social media

In recent years, several studies on suicide analyzed posts on social media platforms such as Twitter,[5,6] Facebook, and Instagram to identify suicidal posts and detect warning signs of suicide for early intervention. They have identified that suicidal individuals use social media for suicide-related communication, including the expression of their suicidality and posting of suicide notes or related contents.[5,9,10] Unlike traditional communication methods such as letters, notes, and diaries, which are mostly revealed after individuals’ suicide attempts, social media posts are open to the public and can therefore be found before a suicide attempt.[9,10] Thus, early identification of suicide-related posts can be helpful in preventing people from killing themselves. Although some studies[11-14] have examined how suicidal users behave on Twitter, they investigated only partial aspects of this behavior such as tweeting frequency and tweet length.

Identifying suicidal posts on social media

Keywords play a significant role in the identification of suicidal posts. A post may be relevant to suicide if it includes suicide-related keywords. Thus, the initial filtering of suicidal posts depends on these keywords. Keywords are also used in analyses of suicidal posts in combination with methods for natural language processing such as text classification and sentiment analysis. To identify suicidal posts, many studies used existing keyword dictionaries such as HowNet, LIWC, and CLIWC. Although they achieved meaningful results, their findings were limited because they used general-purpose keywords intended for psycholinguistic analysis rather than for suicide specifically. As a result, they identified many posts but with low precision. Overall, the weak association between their keywords and suicide limited the performance of their methods. To overcome this problem, several studies[11,18,19] used manually curated suicide-related keywords to identify suicidal posts.

Analyzing suicidal posts on social media

Several studies examined suicidal posts on social media.[7,10,19,20] On social media, because of the element of anonymity, people freely reveal their suicidal thoughts and plans. A few studies examined the linguistic characteristics of posts expressing suicidality on social media.[7,9,19] Gunn and Lester (2012) found that, closer to suicide, there was a decrease in the use of the pronoun “I” and in references to the self. They also found that there was an increase in positive emotion terms and a decrease in negative emotion terms. In addition, as the postings got closer to the time of death, they increased in length and in the number of words used per sentence. Ahuja (2014) studied the case of a male patient in his late 20s, who used Facebook and e-mail to announce a suicide attempt. It was found that the person with suicidal thoughts made significantly more use of self-focused words and employed more “me, too” expressions. In a recent study, it was found that, compared to general posts, suicidal posts had more words and general adverbs, greater expressions of anger, increased concentration on the present, decreased references to the cognitive processes of cause and differentiation, and fewer quotation marks.

Information behavior of suicidal users on social media

Although a few studies have examined the behavior of users expressing suicidality on the Internet,[21-26] few of them focused on social media.[11,13,27,28] Chloe and colleagues (2018) examined 467 young adults’ social media behavior and found that vaguebooking, which refers to posts that contain little actual and clear information but are written in such a way as to solicit attention and concern from others, could be a warning sign of suicide. They reported that vaguebooking is more useful for predicting suicide ideation than other elements such as time spent using social media and the importance of social media in one's life. Wang and colleagues (2019) explored the behavior of people with suicidal ideation on Weibo and found that more suicidal tweets were written in the evening than in the daytime and that females posted more suicidal tweets than males. They also found that non-suicidal users in suicide communities (networks of users retweeting or replying to tweets from actual suicide) can be affected by suicidality through interacting with suicidal users. They noted that these findings are in line with previous work suggesting that interpersonal communication might lead to suicide contagion. Luo (2020) collected suicidal posts from Twitter, identified their main topics, and explored their temporal patterns. The results showed that tweets related to School, Work, Finance, and Emotion were posted more on Mondays and Tuesdays while tweets related to Low mood, Fashion (such as belts or Gucci), Appearance, and Caring for were posted more on weekends due to the stress of returning to work or school after weekends. Ramirez and colleagues (2020) investigated the difference in posting patterns between suicidal users and non-suicidal users and reported that suicidal users wrote shorter and fewer tweets than non-suicidal users. 
Although the abovementioned studies reported several meaningful findings, the understanding of the behavior of suicidal users on social media remains limited. Therefore, this study aims to discover detailed characteristics and patterns of users who express suicidality on social media.

Research questions

To our knowledge, there has been no attempt to investigate South Korean users’ suicidal behavior on social media. In this study, we aim to understand the information behavior of South Korean Twitter users who express suicidality on Twitter. Specifically, we have the following research questions:
RQ1: What are the characteristics of suicidal users’ Twitter profiles?
RQ2: How does suicidal users’ behavior on Twitter differ from that of non-suicidal users?
RQ3: What keywords and hashtags can be used to detect suicidality on Twitter?

Data and methods

Data

Figure 1 shows our pipeline of data collection, preprocessing, and annotation. The initial dataset was collected and preprocessed to filter irrelevant tweets. Data annotation was performed to identify suicidal tweets and users, followed by the second data collection process for detailed data.

Figure 1.

The pipeline of data collection, data preprocessing, and data annotation.

In the following, we describe data collection, data preprocessing, and data annotation in detail.

Data collection: In this study, data collection was performed twice. In the first step, we collected tweets including the 45 suicide-related keywords (e.g. suicide, mass suicide, suicide methods, suicidal thoughts, and way to die easily) proposed by the KSPC. Several psychiatrists developed the list of 45 keywords through extensive discussions and the KSPC is using the keywords to screen suicide-related information on the Internet. We used Twitter Scraper 1.4.0 to collect tweets that were created between 1 January 2019 and 31 December 2019 and included one or more of those 45 keywords. In total, we collected 457,947 tweets. User profiles such as user_id, user_name, description, joined_date, number_of_likes, and number_of_favorites were collected using the Python library selenium 3.141.0. In the second step, after identifying the experimental and control groups, we collected the historical tweets of the two groups of users using GetOldTweets3 0.0.11. We collected tweets written by the 182 users in the experimental group and the 185 users in the control group between 1 July 2019 and 31 December 2019. In total, we collected 74,048 tweets written by the experimental group and 561,352 tweets written by the control group.

Data preprocessing: Because the collected tweets included news articles, campaigns, and entertainment content that are not related to the personal expression of suicidality, we used multiple approaches to filter them out. First, we checked the usernames to filter out those related to bots (i.e. tweet generating computer programs), universities, campaigns, news, and entertainment. Because these users create tweets for suicide prevention, promotion, or other purposes not related to the personal expression of suicidality, we removed all tweets written by these users. Next, we removed tweets that include hyperlinks since these hyperlinks often link to news articles or posts generated by others. As a final step, we checked hashtags and removed tweets that have hashtags related to bots (#**bot, #**auto), entertainment (#BTS, #V, #Taehyung), and marketing (#**sale, #saleabout**). Table 1 shows the filtering process and the resulting number of tweets.
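The three filtering steps described above can be sketched as follows. This is a minimal illustration, not the authors' script: the marker lists, the regular expressions, and the `user_name`/`text` field names are assumptions, since the exact patterns used in the study are not published.

```python
import re

# Hypothetical patterns for illustration only; the study's real lists are not given.
NON_PERSONAL_NAME_MARKERS = ("bot", "univ", "campaign", "news")
BLOCKED_HASHTAG_PATTERNS = (r"#\w*bot\b", r"#\w*auto\b", r"#bts\b", r"#\w*sale\b")
URL_PATTERN = re.compile(r"https?://\S+")


def is_non_personal_account(user_name):
    """Step 1: flag usernames that look like bots, universities, campaigns, or news."""
    name = user_name.lower()
    return any(marker in name for marker in NON_PERSONAL_NAME_MARKERS)


def has_hyperlink(text):
    """Step 2: flag tweets that contain a hyperlink."""
    return URL_PATTERN.search(text) is not None


def has_blocked_hashtag(text):
    """Step 3: flag tweets with bot, entertainment, or marketing hashtags."""
    lowered = text.lower()
    return any(re.search(pattern, lowered) for pattern in BLOCKED_HASHTAG_PATTERNS)


def filter_tweets(tweets):
    """Apply the three filters in order and count how many tweets each step removes."""
    steps = [
        ("non-personal accounts", lambda t: is_non_personal_account(t["user_name"])),
        ("hyperlinks", lambda t: has_hyperlink(t["text"])),
        ("blocked hashtags", lambda t: has_blocked_hashtag(t["text"])),
    ]
    remaining = list(tweets)
    removed_per_step = {}
    for label, should_remove in steps:
        kept = [t for t in remaining if not should_remove(t)]
        removed_per_step[label] = len(remaining) - len(kept)
        remaining = kept
    return remaining, removed_per_step
```

Applying the filters sequentially, rather than all at once, is what makes per-step counts like those in Table 1 possible: each step only sees the tweets that survived the previous one.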

Table 1.

Data collection and preprocessing.

No.	Step	Deleted	Results	Details
1	Initial set of tweets		457,947	
2	Removal of tweets written by non-suicidal users	30,210	427,737	Bots: 27,201 Universities: 1373 Entertainment: 607 Campaigns: 519 News: 510
3	Removal of tweets with hyperlinks	39,015	388,722	Tweets with hyperlinks
4	Removal of tweets with non-suicidal hashtags	24,112	364,610	Entertainment: 20,640 Marketing: 2493 Bots: 582 Campaigns: 123 Others: 274

Data annotation: After the preprocessing, 20,000 tweets were randomly sampled from the preprocessed data using a sampling method from the Python library Pandas (i.e. pandas.DataFrame.sample()). The sampled tweets were annotated by three graduate students to identify whether a tweet is about the personal expression of suicidality. Before annotating, the three graduate students consulted two psychiatrists to develop a general annotation guideline and were trained by an expert in mental health and suicide research. After the consultation and training, each student annotated the same randomly selected 1000 tweets. Once the inter-rater reliability had reached a Fleiss’ kappa value of 0.703 (p < 0.001), which is considered substantial agreement, each of the graduate students annotated the 20,000 tweets. The three annotators were in complete agreement for 19,078 tweets (95.4%). For the remaining 922 tweets (4.6%), where one annotator's label differed from those of the other two, we went through a process of consensus with the intervention of the psychiatric expert. As a result, 1097 tweets were annotated as suicidal tweets while the remaining 18,903 tweets were annotated as non-suicidal tweets.

Experimental and control groups: Three annotators manually reviewed historical tweets written by the users of the 1097 tweets to identify users who had frequently written suicidal tweets. A total of 229 users were identified as the experimental group, i.e. users who had frequently written suicidal tweets. When we were checking the historical tweets of the experimental group, we masked their metadata and reviewed only the body text of tweets to avoid any bias that may be caused by reading user IDs, usernames, or descriptions. The control group, which consists of users who had written a few suicide-related tweets among many ordinary tweets, was identified by reviewing the remaining 18,903 tweets. We stopped the process when the size of the control group was comparable to that of the experimental group.
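The agreement check in the annotation step can be illustrated with a small pure-Python computation of Fleiss' kappa. This is a sketch under assumptions: the function names and the list-of-labels input layout are hypothetical, and the paper does not state which software it used for the statistic.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list with one label per rater.

    Every item must have the same number of raters. The result is undefined
    (division by zero) if all raters use a single category for every item.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    category_totals = Counter()
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Share of rater pairs that agree on this item.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        agreement_sum += agreeing_pairs / (n_raters * (n_raters - 1))
    observed = agreement_sum / n_items
    total_ratings = n_items * n_raters
    expected = sum((n / total_ratings) ** 2 for n in category_totals.values())
    return (observed - expected) / (1 - expected)


def needs_adjudication(labels):
    """True when the annotators disagree, so the item goes to expert consensus."""
    return len(set(labels)) > 1
```

With three annotators and binary labels, any disagreement is a two-to-one split, which matches the 922 tweets that the paper resolved through consensus with the psychiatric expert.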

Methods

In this study, we aim to identify and analyze tweets with the expression of suicidality (i.e. suicidal tweets) and the suicidal users (i.e. the experimental group). Specifically, we have three goals. First, we investigate the profiles of the experimental group on Twitter and identify their characteristics. Second, we compare the experimental group with the control group to understand how they differ in terms of social media usage patterns. Lastly, we analyze the contents and hashtags of suicidal tweets for the purpose of collecting keywords that can be used for the identification of suicidal posts.

User profile analysis of the experimental group: We analyze the experimental group's profiles that include the following metadata: user_id, user_name, description, and pinned tweet to understand the characteristics of the experimental group. We assume users use the abovementioned metadata to express themselves on social media and some users may use keywords or expressions that are closely related to suicide. Description is a short text used to describe a user, while pinned tweets are the tweets that are pinned to the top of all the tweets by the creators for easy access. By analyzing the metadata, we aim to understand notable characteristics of the experimental group and how they express themselves on Twitter. Three annotators manually reviewed the text of each profile and developed a coding schema. The process of reviewing, coding, and refining the schema continued until a satisfactory coding schema was developed. After building the coding schema, we classified each user's profile using this schema. For example, if someone used “I have stopped self-harm. | There are a lot of self-downing tweets. | Block me if you don't want to see it.” in the description, we coded it as self-harm and self-downing.

Comparison between the experimental and control groups: We compare the experimental group with the control group to understand how the experimental group behaves differently on Twitter from the control group. To understand differences in social media usage patterns between the two groups, we compare the two groups by analyzing tweet length, tweet frequency, seasonal pattern, and any time-related tweeting patterns using a custom script implemented using Python 3.7.3.

Suicidal tweets and hashtag analysis: By analyzing suicidal tweets and hashtags, we aim to identify keywords and hashtags that are frequently used by the experimental group. Identified keywords and hashtags help us better understand topics and interests the experimental group talks about, and they can serve as valuable seeds that can be used to identify and retrieve suicidal tweets for later studies. To achieve this, we investigate major keywords frequently used in suicidal tweets. Keyword extraction is largely dependent on how we tokenize long texts. Because of social media users’ extensive use of slang, intentional misspelling of words, and neologisms, traditional dictionary and rule-based tokenizers may perform poorly. To address this problem, we used SentencePiece, which can train subword models such as byte pair encoding (BPE) directly from the raw dataset. This allows us to make a purely end-to-end and language-independent system. When using SentencePiece, the number of tokens/keywords (i.e. vocabulary size) must be set before tokenization. The preset vocabulary size significantly affects the quality of keyword extraction, and one of our goals was to identify the optimal number of keywords. We performed multiple experiments by setting the vocabulary size from 2000 to 16,000 and manually checking the top 500 frequent keywords extracted using different settings of vocabulary size. We also extracted major hashtags frequently used in suicidal tweets.
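To make the role of vocabulary size concrete, the block below is a toy byte pair encoding (BPE) learner in pure Python. It is not SentencePiece and not the authors' pipeline; it is a simplified stand-in (whitespace-split words, no special symbols) with hypothetical function names, meant only to show how a larger merge budget produces longer subword tokens.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the joined symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe_merges(texts, num_merges):
    """Learn up to `num_merges` merges, always joining the most frequent adjacent pair."""
    # Each word starts as a tuple of single characters.
    word_counts = Counter(tuple(word) for text in texts for word in text.split())
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, count in word_counts.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += count
        if not pair_counts:
            break  # every word is already a single symbol
        # Ties are broken by the pair itself so the result is deterministic.
        best_pair = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best_pair)
        merged_counts = Counter()
        for symbols, count in word_counts.items():
            merged_counts[merge_pair(symbols, best_pair)] += count
        word_counts = merged_counts
    return merges


def tokenize(text, merges):
    """Split text into subword tokens by replaying the learned merges in order."""
    tokens = []
    for word in text.split():
        symbols = tuple(word)
        for pair in merges:
            symbols = merge_pair(symbols, pair)
        tokens.extend(symbols)
    return tokens
```

In this toy setting the final vocabulary is roughly the set of base characters plus one new symbol per merge, so a small budget leaves words fragmented while a large one lets whole frequent expressions become single tokens. This is the trade-off behind the sweep from 2000 to 16,000 described above.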

Results

User profile analysis of the experimental group

Among the randomly selected 20,000 tweets, 1097 tweets written by 669 users were identified as suicidal tweets. We manually reviewed historical tweets of all 669 users and identified 229 users who belong to the experimental group.

Analysis of user ID and username: As shown in Figure 2, many suicidal users’ user IDs and/or usernames include expressions that are related to negative expression/emotion (e.g. **sadman, **issosad), gloom (e.g. dark_night**, dark_moon**, **dawn, rainy**), depression (e.g. **_depress, depression**, **_dep), suicide (e.g. **w_to_killme, kill_myse**), self-harm (e.g. selfhar**), help seeking (e.g. help_so**, help_me**), death (e.g. death_d**, _death**), and self-disgust (e.g. useless**). In total, 22.7% of the experimental group use suicide-related expressions in their user IDs and/or usernames. This finding, added to the existing knowledge of suicidality detection on social media, may help identify suicidal users more accurately. In addition to the reported patterns, several users expressed their emotions and feelings using emojis such as rain, moon, and fog in their usernames. Twelve users used persons’ names (e.g. **yoonHee, minji**) as user IDs and/or usernames.

Figure 2.

Categories of suicidal users’ user IDs and usernames.

Analysis of user description: As shown in Figure 3, many suicidal users mentioned depression, self-harm, age, mental disorder, gender, suicide, sexual orientation, self-disgust, help seeking, negative expression/emotion, date of suicide attempt, and treatment/counseling. Specifically, some suicidal users described their sexual orientation such as being autochorissexual, bisexual, demigender, lesbian, pansexual, and paraphilias. In addition, many users talked about specific mental disorders such as anxiety disorder, attention-deficit hyperactivity disorder (ADHD), borderline personality disorder, bulimia nervosa, burnout syndrome, dysthymia, eating disorder, gender identity disorder, homesickness, intermittent explosive disorder (IED), manic depressive disorder, panic disorder, paranoia, smile mask syndrome, sociophobia, and split personality. Some suicidal users mentioned their experiences of school/domestic violence and the date of their suicide attempt. Among suicidal users who revealed their gender or age, 25 out of 26 were female and 33 out of 45 were youths.

Figure 3.

Categories of suicidal users’ descriptions.

Analysis of pinned tweets: A pinned tweet is a tweet fixed at the top of one's timeline. A pinned tweet often delivers a message that the creator thinks is important and wants to share with others. In pinned tweets, suicidal users made mentions of self-harm, help seeking, age, depression, gender, mental disorder, negative expression/emotion, drinking/smoking, gloomy picture/video, treatment/counseling, self-disgust, suicide, sexual orientation, and self-soothing. These results are shown in Figure 4. Compared to their user IDs, usernames, and descriptions, suicidal users additionally mentioned drinking, smoking, and self-soothing in their pinned tweets.

Figure 4.

Categories of pinned tweets posted by suicidal users.

Furthermore, because pinned tweets can contain pictures or videos, some suicidal users added gloomy pictures or videos, selfie images, and photos that imply self-harm such as those with knives. A suicide note was also found in a user’s pinned tweet.

Comparison between the experimental and the control groups

We compared tweets written by 182 users in the experimental group and 185 users in the control group from 1 July 2019 to 31 December 2019 to understand how the experimental group behaves differently on Twitter from the control group. Although we identified 229 suicidal user accounts in the annotation step, some accounts were not accessible after the initial data collection. In the end, we were able to collect tweets from 182 users. In total, 74,048 tweets were written by the experimental group and 561,352 tweets were written by the control group.

Comparison of tweet length: Figure 5 shows the two groups’ differences in tweet length, measured as the number of characters in a tweet. Overall, the experimental group (mean: 43.06, SD: 20.86) wrote longer tweets than the control group (mean: 32.67, SD: 17.76). The bell-shaped curve of the control group is narrower than that of the experimental group and has a longer tail. The boxplot also shows an evident difference, which is statistically significant in the Student's t-test (p = 4.470e-07, i.e. p < 0.0001).
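The tweet-length comparison relies on a Student's t-test. The block below is a minimal sketch of the pooled-variance t statistic in pure Python. The function name is hypothetical, and whether the study tested individual tweets or per-user averages is not stated, so the choice of samples is left to the caller.

```python
import math
from statistics import mean, variance


def student_t_statistic(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for a two-sample Student's t-test.

    Uses the pooled (equal-variance) form; `variance` is the sample variance.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    degrees_of_freedom = n_a + n_b - 2
    pooled_variance = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / degrees_of_freedom
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / standard_error
    return t, degrees_of_freedom
```

Tweet lengths would be obtained as `len(text)` for each tweet. Converting t to a p-value needs the t distribution, which the standard library does not provide, so that step is omitted here.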

Figure 5.

Difference in tweet length between the experimental and control groups.

Comparison of tweet frequency: Figure 6 shows the experimental and control groups’ difference in tweet frequency, measured by the average tweet count per month. Overall, the experimental group tweeted much less frequently than the control group. Most users in the experimental group wrote fewer than 40 tweets on average in a month. Among them, 94 users wrote fewer than ten tweets on average in a month. More diverse patterns are shown among the control group. There were 64 heavy users who wrote more than 500 tweets, 77 medium users who wrote fewer than 500 but more than 100 tweets, and 44 light users who wrote fewer than 100 tweets in a month.
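The frequency comparison can be sketched as a per-user monthly average followed by bucketing with the thresholds reported above. The function names are hypothetical, and dividing by a fixed six-month window (July to December 2019) is an assumption about how the average was taken.

```python
from collections import Counter


def average_monthly_tweets(tweet_user_ids, months_observed=6):
    """Map each user to their average tweets per month.

    `tweet_user_ids` holds one user ID per collected tweet.
    """
    totals = Counter(tweet_user_ids)
    return {user: total / months_observed for user, total in totals.items()}


def usage_bucket(avg_per_month):
    """Classify a user as heavy (>500), medium (>100), or light.

    The text does not cover averages of exactly 100 or 500; this sketch
    places those boundary values in the lower bucket.
    """
    if avg_per_month > 500:
        return "heavy"
    if avg_per_month > 100:
        return "medium"
    return "light"
```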

Figure 6.

Difference in tweet frequency between the experimental and control groups, measured by the users’ average tweet count per month.

Comparison of seasonal pattern in tweeting behavior: As shown in Figure 7, there is a notable trend in the experimental group. The number of tweets written by the experimental group increased from the first week of July to the fourth week of October and then decreased afterward. In other words, the number of tweets written by the experimental group peaked in the fall season with a notable increasing trend from summer to fall and a decreasing trend from fall to winter. We identified a seasonal pattern in tweeting behavior that was similar to the pattern reported in previous works about actual suicide.[36,37] On the other hand, no such seasonal pattern was observed in the control group.

Figure 7.

Difference in number of tweets by week between the experimental and control groups.

Comparison of tweet creation time: We compared the two groups by tweet creation time. As shown in Figure 8, there is no difference between the two groups in terms of when they post tweets. The two groups show a similar pattern of tweeting, writing more tweets at night than in the daytime. This finding is consistent with a previous study.

Figure 8.

Tweet creation time in the experimental and control groups.

Suicidal tweets and hashtags analysis

In this part, we analyzed the suicidal tweets and hashtags to identify keywords and hashtags that are frequently used by suicidal users.

Identification of suicide-related keywords: Table 2 shows, for each vocabulary size, the number of suicide-related keywords among the top 500 keywords, as manually annotated by the authors. We were able to extract the largest number of suicide-related keywords with the vocabulary sizes of 6000 and 8000. After an additional check using the top 1000 keyword lists for the 6000 and 8000 settings, we found that 6000 is the optimal vocabulary size that generates the largest number of suicide-related keywords.

Table 2.

Number of suicide-related keywords in the top 500 frequent keywords generated using different vocabulary sizes.

Vocabulary size	2K	4K	6K	8K	10K	12K	14K	16K
Number of suicide-related keywords	20	63	94	94	91	56	74	68

Table 3 shows the main suicide-related keywords extracted with the vocabulary size setting of 6000. Among them, some are related to methods of suicide/self-harm (e.g. self-poisoning, strangled myself), tools for suicide/self-harm (e.g. cutter knife, drug), and slang words (e.g. self-harmer).

Table 3.

Main suicide-related keywords extracted with the vocabulary size setting of 6K.

Category	Tokens
Suicide	_want to die, _suicide attempt, _feel suicide impulse, _want to commit suicide, _even attempt suicide, _have to die, _suicidal thought, _how to suicide, _suicide plan, _don’t want to live, _strangled myself, _want to cut, _like cutter knife
Self-harm	_self-harmer, _want to self-harm, _self-harming, _arm warmer, _self-harm scar, _do self-harm terribly, _self-poisoning/harm by drugs

Identification of suicide-related hashtags: To extract frequently used hashtags, we collected all the hashtags included in the 74,048 tweets written by the 182 suicidal users. After arranging them according to frequency, we manually extracted the suicide-related hashtags. The top 30 suicide-related hashtags are shown in Table 4. Results showed that suicide-related hashtags are mainly related to self-harm (#self-harm, #self-harm_picture) and depression (#depress, #depression). There are some hashtags related to mental disorder (#panic_disorder, #anxiety_disorder, #bipolar_disorder), sexual orientation (#gay, #introduction_queer), and signals used by lesbians to identify each other (#a_three_o'clock_ball, #ATB).
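The frequency ranking step for hashtags can be sketched as follows. This is an illustration with hypothetical names, not the authors' script; it only covers counting, while selecting the suicide-related hashtags from the ranked list was done manually in the study.

```python
import re
from collections import Counter

# Assumed pattern: '#' followed by word characters (letters, digits, underscore).
# In Python 3 this also matches Korean characters. Hyphens are not matched, so
# the translated tags shown in Table 4 (e.g. with "self-harm") would be cut short.
HASHTAG_PATTERN = re.compile(r"#\w+")


def count_hashtags(tweet_texts):
    """Count every hashtag occurrence across the given tweet texts."""
    counts = Counter()
    for text in tweet_texts:
        counts.update(HASHTAG_PATTERN.findall(text))
    return counts
```

Calling `count_hashtags(texts).most_common(30)` gives a ranked list that a reviewer could then screen by hand.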

Table 4.

Top 30 suicide-related hashtags.

Rank	hashtag	Count	Rank	hashtag	Count
1	#self-harm	92	16	#dissociative_disorders	15
2	#introduction_depressioner_account	63	17	#self-harm_picture	13
3	#introduction_self-harm_account	44	18	#a_three_o'clock_ball	13
4	#introduction_ self-harmer	41	19	#obsessive-compulsive_disorder	11
5	#self-harm_account	30	20	#ATB (a_three_o'clock_ball)	9
6	#depressed	29	21	#introduction_queer	9
7	#introduction_depressioner	28	22	#self-harmer	8
8	#suicide	25	23	#gay	8
9	#depression_account	25	24	#introductionDepressedAccount	7
10	#IQ (introduction_queer)	23	25	#rt_reason_for_living_and_dying	7
11	#bipolar_disorder	19	26	#express_death_without_ words	6
12	#introduction_mental_patienter	18	27	#mental_disorder	6
13	#depression	17	28	#self_introduction_of_depressed_account	5
14	#panic_disorder	16	29	#alpram	5
15	#anxiety_disorder	15	30	#seroquel	5


Discussion

In this study, we have several meaningful findings regarding the information behavior of suicidal users on Twitter.

First, through user profile analysis, we found that many suicidal users used suicide-related keywords in their user IDs, usernames, descriptions, and pinned tweets. Our results showed that the user profile has been used as a tool to represent suicidal users’ status of mental disorders, suicidal thoughts, sexual orientation, help seeking, treatment/counseling, drinking, smoking, etc. Therefore, while previous studies focused mainly on tweet texts and overlooked the importance of metadata,[38,39] we found that metadata such as user profiles is a valuable source for understanding suicidal users’ information behavior. Furthermore, we found that several suicidal users uploaded pictures related to suicide/self-harm (e.g. blood, knives) in pinned tweets. In other words, suicidal people were actively communicating on social media about suicidal thoughts and attempts that they did not tell others about in their daily life. Thus, through social media, it would be possible to find and help those at risk of suicide who are difficult to find in daily life. We expect that with detailed metadata analysis and state-of-the-art image detection and classification techniques, detection of possible suicidal users may become less challenging. Similar to findings of previous studies,[13,29,40] we also found that most of the suicidal users who revealed their gender or age were young females. This finding suggests that young female users were more willing to reveal their personal thoughts or emotions related to suicide than male and adult users.

Second, through the comparative analysis, we found that the experimental group (i.e. suicidal users) wrote longer but fewer tweets than the control group (i.e. non-suicidal users). It is possible that the experimental group tweets when they feel the need to say something and does not use Twitter routinely. However, the result that the experimental group wrote longer tweets than the control group is inconsistent with that of a previous work. The inconsistency may be due to differences in language systems, countries, and culture. A clear seasonal pattern was identified in the experimental group's tweeting behavior. The number of tweets written by the experimental group peaked in the fall. Although there is no study investigating seasonal patterns in tweeting behavior, this finding is in line with those of several previous studies about seasonal patterns in actual suicide.[36,37] Some previous studies reported that the peak season of suicide is spring[41-43] while several other studies reported that the peak season is fall.[36,37] Researchers reported that peak seasons vary depending on the suicide method, age, nationality, and gender.[42,44]

Third, through suicidal tweets and hashtag analysis, we found additional keywords and hashtags that were frequently used by the experimental group. These suicide-related keywords and hashtags can be utilized in further identifying suicidal tweets and users. Among them, some hashtags that are not easily discoverable were used specifically by certain groups of people. With a larger-scale study, we would expect to find more meaningful keywords and hashtags with hidden meanings that might be used confidentially by suicidal users. In addition, when we have no sufficient prior knowledge on suicidality, reviewing hashtags is a straightforward and effective way to discover keywords that can be easily overlooked. For example, using hashtag analysis, we found “Seroquel” and “Alpram”: drugs with side effects such as depression or suicidal thoughts. This information is especially useful in suicide prevention.

Conclusions

In this study, we investigated suicidal users’ information behavior using the following analyses: user profile analysis, comparative analysis between the experimental and control groups, and tweets/hashtags analysis. In the user profile analysis, we found that suicidal users made use of their user IDs, usernames, descriptions, and pinned tweets to express their thoughts and/or interests. Suicidal users used keywords related to negative expression/emotion, depression, and suicide in their user IDs and/or usernames. In their descriptions, they specifically mentioned their age, gender, mental disorder, sexual orientation, and date of suicide attempt. In pinned tweets, users also mentioned drinking, smoking, and self-soothing, and uploaded gloomy pictures/videos, selfie images, and photos that imply self-harm. In the comparative analysis, we found that suicidal users wrote longer but fewer tweets than non-suicidal users. A clear seasonal pattern was identified in suicidal users’ tweeting behavior, with the number of tweets peaking in the fall. However, we only analyzed the seasonal pattern between July and December due to the difficulty of data collection. Therefore, additional data collection is required to understand the seasonal pattern over the whole year. Furthermore, although we found a seasonal pattern in the tweeting behavior of suicidal users, our analysis did not explain its correlation with the suicide rate. Thus, further research that considers additional factors is needed to understand the seasonal pattern of suicidal users on social media. In the suicidal tweets and hashtag analysis, we found additional suicide-related keywords and hashtags that are related to methods of suicide/self-harm, tools for suicide/self-harm, sexual orientation, and signals used by certain online communities. The study makes the following contributions.
First, we manually annotated 20,000 tweets and constructed a valuable dataset for studying suicidality on Twitter. To our knowledge, it is the first annotated dataset in South Korea that can be utilized by future studies. Second, we identified suicidal users and the characteristics of their information behavior on Twitter by thoroughly reviewing tweet metadata. These characteristics can be used to identify potentially suicidal people on a large scale, starting with the analysis of public tweets. Through this, we confirmed the possibility of identifying potentially suicidal groups through information behavior on social media. Based on the dataset and findings of this study, we believe that more effective actions for early intervention and suicide prevention might be possible. Third, we captured the information behavior of suicidal people (i.e. user profile, tweet usage patterns, keywords, and hashtags) that cannot be identified with electronic health records (EHRs). Based on these characteristics of information behavior on social media, we believe that more accurate actions for suicide prevention and early intervention on social media could be conducted.

The limitations of our study are as follows. First, although our results showed a high accuracy in suicide prediction with extensive manual processes of annotation and coding, this method is not practical for large-scale analysis of tweet data. Therefore, the development of automated or semiautomated methods based on the findings of this study is required. Second, although the approach of the study is useful for identifying suicidal users on social media, it may have a limited impact on early intervention due to the anonymous nature of social media. Third, we found several differences in information behavior between suicidal and non-suicidal users, but it was difficult to find clinical evidence that explains our findings. Further studies utilizing linked clinical and social media data could address this gap.
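
As one possible starting point for the semiautomated methods called for in the first limitation, the sketch below flags records whose profile fields or tweet text contain terms from a curated lexicon, so that a human annotator can review them. It is an assumption-laden illustration, not a method used in this study: the field names (`user_id`, `username`, `description`, `text`), the function names, and the two-term lexicon (taken from the drug names mentioned in the hashtag analysis) are all hypothetical.

```python
import re


def build_matcher(lexicon):
    """Compile one case-insensitive pattern from a curated list of terms."""
    if not lexicon:
        raise ValueError("lexicon must contain at least one term")
    # Longer terms first, so they win over shorter terms they contain.
    escaped = sorted((re.escape(term) for term in lexicon), key=len, reverse=True)
    return re.compile("|".join(escaped), re.IGNORECASE)


def flag_for_review(records, matcher,
                    fields=("user_id", "username", "description", "text")):
    """Return (record, matched_terms) pairs for records with any lexicon hit.

    Flagged records are candidates for manual annotation, not predictions.
    """
    flagged = []
    for rec in records:
        hits = set()
        for field in fields:
            value = rec.get(field) or ""  # tolerate missing or None fields
            hits.update(m.group(0).lower() for m in matcher.finditer(value))
        if hits:
            flagged.append((rec, sorted(hits)))
    return flagged


# Hypothetical lexicon and made-up records for illustration only.
matcher = build_matcher(["seroquel", "alpram"])
records = [
    {"user_id": "a1", "description": "example profile", "text": "took #Seroquel today"},
    {"user_id": "b2", "description": None, "text": "nothing relevant here"},
]
flagged = flag_for_review(records, matcher)
print([(rec["user_id"], terms) for rec, terms in flagged])
```

Keeping a human reviewer in the loop reflects the manual annotation on which the study relied; the matcher only narrows down what needs to be read.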

28 in total

1. Seasonal variation in specific methods of suicide: a national register study of 20,234 Finnish people.

Authors: Pirkko Räsänen; Helinä Hakko; Jari Jokelainen; Jari Tiihonen
Journal: J Affect Disord Date: 2002-09 Impact factor: 4.839

2. Evidence for lack of change in seasonality of suicide from Timiş County, Romania.

Authors: Martin Voracek; Mona Vintila; Maryanne L Fisher; Paul S F Yip
Journal: Percept Mot Skills Date: 2002-06

3. Internet suicide searches and the incidence of suicide in young people in Japan.

Authors: Akihito Hagihara; Shogo Miyazaki; Takeru Abe
Journal: Eur Arch Psychiatry Clin Neurosci Date: 2011-04-21 Impact factor: 5.270

4. A Linguistic Analysis of Suicide-Related Twitter Posts.

Authors: Bridianne O'Dea; Mark E Larsen; Philip J Batterham; Alison L Calear; Helen Christensen
Journal: Crisis Date: 2017-02-23

5. Suicide on Facebook.

Authors: Amir Kumar Ahuja; Krystine Biesaga; Donna M Sudak; John Draper; Ashley Womble
Journal: J Psychiatr Pract Date: 2014-03 Impact factor: 1.325

6. Sex-specific time patterns of suicidal acts on the German railway system. An analysis of 4003 cases.

Authors: N Erazo; J Baumert; K-H Ladwig
Journal: J Affect Disord Date: 2004-11-15 Impact factor: 4.839

7. Exploring Behavior of People with Suicidal Ideation in a Chinese Online Suicidal Community.

Authors: Zheng Wang; Guang Yu; Xianyun Tian
Journal: Int J Environ Res Public Health Date: 2018-12-26 Impact factor: 3.390

8. Mediation Effect of Suicide-Related Social Media Use Behaviors on the Association Between Suicidal Ideation and Suicide Attempt: Cross-Sectional Questionnaire Study.

Authors: Xingyun Liu; Jiasheng Huang; Nancy Xiaonan Yu; Qing Li; Tingshao Zhu
Journal: J Med Internet Res Date: 2020-04-28 Impact factor: 5.428

9. Analysing the connectivity and communication of suicidal users on Twitter.

Authors: Gualtiero B Colombo; Pete Burnap; Andrei Hodorog; Jonathan Scourfield
Journal: Comput Commun Date: 2016-01-01 Impact factor: 3.167

10. Detection of Suicidal Ideation on Social Media: Multimodal, Relational, and Behavioral Analysis.

Authors: Diana Ramírez-Cifuentes; Ana Freire; Ricardo Baeza-Yates; Joaquim Puntí; Pilar Medina-Bravo; Diego Alejandro Velazquez; Josep Maria Gonfaus; Jordi Gonzàlez
Journal: J Med Internet Res Date: 2020-07-07 Impact factor: 5.428