
A Theory-based Deep-Learning Approach to Detecting Disinformation in Financial Social Media.

Wingyan Chung¹, Yinqiang Zhang², Jia Pan².

Abstract

The spreading of disinformation in social media threatens cybersecurity and undermines market efficiency. Detecting disinformation is challenging due to large volumes of social media content and a rapidly changing environment. This research developed and validated a theory-based, novel deep-learning approach (called TRNN) to disinformation detection. Grounded in social and psychological theories, TRNN uses deep-learning and data-centric augmentation to enhance disinformation detection in financial social media. Temporal and contextual information is encoded as specific knowledge about human-validated disinformation, which was identified from our unique collection of 745,139 financial social media messages about four U.S. high-tech company stocks and their fine-grained trading data. TRNN uses multiple series of long short-term memory (LSTM) recurrent neurons to learn dynamic and hidden patterns to support disinformation detection. Our experimental findings show that TRNN significantly outperformed widely-used machine learning techniques in terms of precision, recall, F-score and accuracy, achieving consistently better classification performance in disinformation detection. A case study of Apple Inc.'s stock price movement demonstrates the potential usability of TRNN for secure knowledge management. The research contributes by developing a novel approach and model, producing new information systems artifacts and a dataset, and providing empirical findings on detecting online disinformation.

© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022, Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


Keywords: Cybersecurity; Deep learning; Design science; Disinformation detection; Financial market; Machine learning; Secure knowledge management; Sequence prediction; Social media; Temporal recurrent neural network

Year: 2022 PMID: 36118953 PMCID: PMC9465158 DOI: 10.1007/s10796-022-10327-9

Source DB: PubMed Journal: Inf Syst Front ISSN: 1387-3326 Impact factor: 5.261

Introduction

Concern about disinformation in social media is rising. Results of 150 interviews of industry practitioners, subject-matter experts, and government officials across nine countries show that disinformation campaigns on social media will likely increase in the future (Cohen et al., 2021). The spread of disinformation about COVID-19 vaccination (Bond, 2021), financial market (Commission, U.S.E, 2015), and political election (Abrams, 2019) (among others) threatens cybersecurity and seriously undermines social confidence (Chung, 2016). However, detecting disinformation in social media can be challenging. The large volumes of social media content and rapidly changing market indicators (e.g., stock prices, sales) make it difficult to accurately identify disinformation, defined as “false information that is purposedly spread to deceive people” (Lazer et al., 2018). In recent years, artificial intelligence (AI) technologies (machine learning and deep learning in particular) have been used to detect fake news and misinformation (e.g., Reis et al., 2019; Ducci et al., 2020). Despite their capability to automatically learn from data, a lack of data sense-making and data-centric augmentation is prevalent among various high-stake AI applications (Sambasivan et al., 2021). Social and psychological theories can be used to explain human behavior when faced with disinformation. But their use to enhance the application of AI to detect disinformation is not widely available. This research seeks to answer several questions: (1) How can machine learning techniques and social and psychological theories be used to support detection of online disinformation? (2) How can a theory-based, data-driven AI approach be developed to detect disinformation in social media? (3) Compared with widely-used machine-learning techniques, how does the approach perform in detecting disinformation in financial social media? 
To answer these questions, we developed and validated a novel, theory-based deep learning approach, called temporal recurrent neural network (TRNN), to address the challenges in disinformation detection from social media. TRNN advances the classification capability of traditional AI techniques and was developed based on social and psychological theories and data-centric augmentation. To evaluate the usability and effectiveness of TRNN, we built a unique collection of 745,139 social media messages and fine-grained stock price data (of 277 contiguous trading days) of four U.S. high-tech companies. Our experiments empirically compared TRNN with widely-used machine learning techniques. We provide a case study to demonstrate the potential application to secure knowledge management. The results indicate a high generalizability of TRNN to data types and domains beyond finance, and have strong implications for design science research and information systems practice.

Literature Review

Disinformation is used increasingly to manipulate human perception (Cybenko & Cybenko, 2018) and has raised concern in academia, government, and industry (Cohen et al., 2021; Chung, 2016; Del Vicario et al., 2016; Vosoughi et al., 2018). Social and psychological theories can be used to explain the proliferation of disinformation. Various techniques and methods have been developed to address the issues of disinformation. This review identifies strengths, weaknesses, and research gaps in the literature. Tables 1 and 2 summarize, respectively, methods of disinformation detection and components of systems used in disinformation detection research.

Table 1

A summary of methods for disinformation detection

Category	Sub-Category	Description	Strength	Weakness	Work
Manual Pattern Matching	Rule based methods	Rule-based methods utilize user white lists or keyword blacklists and manually crafted rules to detect disinformation	Rule based methods are able to easily incorporate expert’s domain knowledge into the blacklists to guide the disinformation detection	Trivial blacklist keywords matching tend to be error-prone when the context contains negation, sarcasm, etc.	(Lee et al., 2018; Ribeiro et al., 2018)
Network Based Methods	Semantic network methods	Semantic based methods capture the structure of the knowledge network and use it to infer the truthfulness of given information	Network based methods are promising in accuracy of statements of the form “A is B” and it also reveals the topology dynamics of the social connections	If the entity to be checked is not in the existing database, the disinformation detection can not be done	(Ciampaglia et al., 2015; Del Vicario et al., 2016; Ruchansky et al., 2017; Shi & Weninger, 2016)
Network Based Methods	Diffusion based methods	Diffusion based methods look for critical network links or nodes to control the spread of disinformation	Once suspicious accounts and initial spread of disinformation are identified, the epidemic of devastating consequences can be avoided	Extremely computationally heavy in the context of billions of nodes and links	(Kuhlman et al., 2013; Nguyen et al., 2019; Pham et al., 2019; Vosoughi et al., 2018)
Machine Learning (ML) Methods	Traditional machine learning methods	Traditional ML methods use a collection of labeled instances to train a classifier such as support vector machine (SVM), Decision Tree and logistic regression to learn the disinformation patterns	With different variations and kernel tricks, traditional machine learning methods are flexible in different situations to handle the disinformation features	Traditional ML methods might not model the complex social dynamics exhibited in social media.	(Delort et al., 2011; Feng et al., 2012; Shu et al., 2017; Reis et al., 2019; Giasemidis et al., 2018; Langley et al., 2021)
Machine Learning (ML) Methods	Deep Learning (DL) Methods	DL methods use multi-layered massive computational units to learn disinformation features with back-propagation algorithm. Representative techniques include RNN, LSTM, and CNN.	Deep learning methods are powerful in modeling complex non-linear social dynamics	Difficulty in model interpretation and explanation. Need large amount of labeled training data	(Zhang et al., 2019; Zhang et al., 2015; Volkova et al., 2017; Kumar et al., 2021)

Table 2

Components of systems used in disinformation detection research

Category	Article	System Input	Feature	Technique	Results
Textual feature	(Giasemidis et al., 2018)	Twitter messages	n-grams, part-of-speech	Semi-supervised learning algorithm	Improved speed with less labeled data for stance classification
Textual feature	(Wang et al., 2018)	Weibo messages	Event features	Adversarial neural network	Improved accuracy on fake news detection
Textual feature	(Vosoughi et al., 2017)	Twitter messages	Linguistic and network features	Hidden Markov model	Improved accuracy on unverified rumours
Textual feature	(Liu et al., 2018)	Twitter and Weibo messages	Linguistic features and temporal feature	Neural network method	Improved performance in detecting disinformation
Network feature	(Nguyen et al., 2019)	Twitter, Pokec, DBLP nodes and edges	Network node activation feature	Linear threshold model	Reduced complexity in stopping cyber-epidemics
Network feature	(Tong et al., 2017)	Wiki, YouTube, Epinions nodes and edges	Neighbour influence	Randomized Algorithm	Reduced complexity in rumor blocking
Network feature	(Yan et al., 2019)	Wikipedia, Slashdot, Google+ nodes and edges	Node disseminating influence	Link deletion algorithm	Improved approximation of minimizing rumour spread
Network feature	(Zhang et al., 2016)	Twitter, Epinion, Slashdot nodes and edges	Network propagation feature	Network monitor placement algorithm	Reduced # of monitors to place in social network in detecting online misinformation

Theoretical Background on Disinformation

Theories can provide clues to explain human behavior when faced with disinformation (Kandhway & Kuri, 2017; Stage, 2013). Social, economic, and psychological theories postulate that humans tend to rely on heuristics and incentives when making judgments. One common heuristic is to follow the crowd in the social environment. Social Contagion Theory postulates that humans behave based on the information available to them (e.g., rational thought, experience) (Le Bon, 1895; Wheeler, 1966). In an online environment, information is often contradictory due to a lack of agreement, forcing individuals to look for additional cues. Emergent Norm Theory further posits that new norms emerge when group leaders and members agree on a new normative status or purpose for the group (Turner & Killian, 1957). These norms and cues become heuristics for judging the reliability of online information. A second heuristic is to rely on neighbors, experts, or famous social actors. Theories based on social positions of humans describe persons as nodes in a social network and their relationships as links. Each node (or each link) can be characterized by attributes common to all the nodes (or all links). Homophily Theory states that network nodes may behave similarly if they share similar nodal attributes (McPherson et al., 2001). Social Impact Theory postulates that the influence of a node in a social network is a multiplicative function of the strength, immediacy, and count of all nodes in the network (Latané, 1981; Sedikides & Jackson, 1990). Social Interaction Theory states that people make decisions based on their social neighbors’ decisions (Becker, 1974). A third heuristic is to decide based on the expected rewards. Social Exchange Theory states that people engage in social interaction with an expectation that it will bring them some rewards, such as respect, approval, or recognition (Emerson, 1976).
These heuristics are often exploited by malicious actors who may manipulate the online environment and messages to fabricate some consensus, to distort opinions or messages of famous persons, or to present lucrative payback (e.g., from stock investments). In addition, malicious actors manipulate incentives in a financial market to earn illegal profits. A cornerstone of finance, the Capital Asset Pricing Model (CAPM) (Sharpe, 1964) relates market systematic risk to investors’ expected return on an asset (e.g., stock portfolio), and can be used to explain investors’ expectation of seeking abnormal returns that compensate for risk and the time value of money. Several differences exist between online communication and offline (in-person) communication that may facilitate the creation and spread of disinformation: the quality of human interaction, and the speed and geographic spread of messages (Li et al., 2017; Quan-Haase, 2016). As a result, online communication changes the human perception of time fundamentally. Elements of time that are relevant to the spread of information throughout mass groups include recency and primacy effects (Hovland, 1957; Miller & Campbell, 1959). It has been found that people are more likely to share information when they are exposed to that information recently (recency) and when it is important to them (primacy) (Gino et al., 2009; Ngai et al., 2015). Therefore, malicious actors may use temporal information strategically in online messages to spread disinformation. Techniques that exploit temporal information in online content and context may therefore help to detect disinformation. The following subsections review AI techniques for detecting malicious content.

Artificial Intelligence Techniques

Prior research uses rule-based methods that rely on user white lists, keyword blacklists, and hand-crafted rules to detect disinformation (Lee et al., 2018; Owda et al., 2017). Word usage, part-of-speech tags, syntax, and bag-of-word approaches are used to learn the patterns of disinformation messages (Feng et al., 2012; Markowitz & Hancock, 2016). Since rule-based methods rely primarily on n-grams or syntactical analysis, contextual meaning in the word sequence may not be captured (Conroy et al., 2015). Creating and maintaining hand-crafted rules is time-consuming and lacks generalizability.

Network-based Methods

Aside from the content of online messages, network-based methods capture the behaviors and structure of online communities to help detect disinformation. A semantic network represents semantic relations between entities in a network (Sowa, 1987). By aggregating the existing information in the network (e.g., profiles, labeled users, and confirmed statements), new messages can be fact-checked quickly with high accuracy (Dave, 2013). By analyzing the network topology, scores can be assigned to entities based on their relevance and distance to classify disinformation (e.g., Ciampaglia et al., 2015; Ruchansky et al., 2017). Diffusion-based methods study the propagation pattern of disinformation in social networks. A random walk algorithm has been used to remove the most effective links in an online social network to prevent the spread of disinformation (Nguyen et al., 2019). A study using differential diffusion found that false news diffused significantly faster, deeper and more broadly than legitimate news (Vosoughi et al., 2018). Edge removal was used in Kuhlman et al. (2013) as a heuristic to limit the disinformation diffusion.

Machine Learning Methods

Machine learning (ML) is the set of theoretical and practical approaches for designing machines that learn autonomously from data without explicitly being programmed (Mitchell, 1997). A subcategory of ML, supervised ML techniques use labeled data instances, each consisting of a feature vector X and an output label y, to infer a mathematical function that maps from X (e.g., sales transactions) to y (e.g., fraud or non-fraud). These techniques have been used to detect fake news (Cybenko & Cybenko, 2018). Traditional ML techniques include such diverse methods as k-nearest neighbor classifier, support vector machine (SVM), random forests (RF), and XGBoost (Reis et al., 2019). For instance, a Naive Bayes classifier was used to detect information that violates community guidelines (Delort et al., 2011). SVM was used with syntactic features to distinguish deception from benign information (Feng et al., 2012) and was shown to outperform logistic regression, decision tree, and neural networks in classifying textual news headlines into true or fake news (Langley et al., 2021). “Event adversarial neural networks,” a supervised ML technique that uses massive interconnected computational units, was used to perform multi-modal fake news detection (Wang et al., 2018). Convolutional neural networks (CNNs) achieve good performance in general sentence classification tasks such as sentiment analysis (e.g., online reviews) (Zhang et al., 2015, 2019). Sequence models, such as Markov models and Kalman filters, deal with sequential data but are ill-equipped to learn long-range dependencies (Alzaidy et al., 2019).

Deep Learning

Deep learning – the use of multi-layered, interconnected computational units to infer non-linear functions (as found in CNN, RNN, and LSTM) – has dramatically advanced different application domains, most notably computer vision and speech recognition (LeCun et al., 2015). Deep learning (DL) models have been developed and applied to detecting different forms of false information, such as rumors, fake news, and misinformation. For example, a tree-structured classifier, known as “cascade-LSTM,” was developed to learn from retweet behavior and to predict the veracity of 2,156 Twitter cascades that contain misinformation, giving a 2.8% improvement over the best baseline classifier (Ducci et al., 2020). To detect algorithmically modified images and videos (or “deepfakes”), a decentralized blockchain framework uses multiple LSTM networks to support tracing and tracking of a digital content’s historical provenance (Chan et al., 2020). In a study of classifying hate speeches on Twitter, a CNN-based model and a pre-trained VGG-16 network were used to process text (encoded with Glove embedding vectors) and image data respectively (Langley et al., 2021). Another application called “F-NAD” uses an ensemble technique of recurrent neural networks (LSTM and GRU) to classify the origins of news articles into either fake or real sources (Barua et al., 2019). Multiple neural network approaches (CNN, LSTM, bidirectional LSTM) were compared in detecting fake news (collected from Twitter and PolitiFact), finding that CNN plus bidirectional LSTM ensembled network with attention mechanism achieved the highest accuracy of 88.78% (Kumar et al., 2020). Another study comparing CNN, RNN, and LSTM and a tree-structured RNN produced similar findings, showing superior performance of bi-directional LSTM model over other methods (Bahad et al., 2019). 
The aforementioned studies indicate superior performance of bi-directional LSTM among other DL techniques due to its ability to address vanishing gradient and long-term memory problems. Other than using variations of RNN, the attention mechanism has gained much traction due to its high performance in language translation (Vaswani et al., 2017). However, due to a model-centric design, attention-based DL algorithms are limited by the quality and quantity of available data, and their adoption is limited by the level of trust afforded by human users (Gennatas et al., 2020). Disinformation detection presents additional challenges due to the intentional deception found in communities seeking profits (e.g., financial investment).

Feature Representation Learning

Disinformation detection requires understanding both the content and context of the information being used to deceive recipients. Prior research considered temporal characteristics such as content freshness and the period of time to classify rumors into different categories (Knapp, 1944). Temporal features play a role during breaking news events. In early stages of news release, people tend to support unverified rumor but as time goes on, a shift occurs to debunk false rumors (Zubiaga et al., 2016). Burstiness and linguistic, temporal, and structural features of rumor propagation were studied, finding that the popularity of rumors fluctuates over time in different platforms of social media (Kwon & Cha, 2014; Kwon et al., 2013). In a feature stability analysis, structural and temporal features were found to distinguish rumors from non-rumors over a long-term window, whereas user and linguistic features performed well in the early stages of rumor propagation (Kwon et al., 2017). Prior work has also used textual features in disinformation detection. By using typical text phrases to express skepticism about factual claims, a study found that rumor clusters can be detected at about a third of the top 50 clusters in Twitter (Zhao et al., 2015). Bigram-based textual features were used to identify rumors in microblogs (Qazvinian et al., 2011). LSTM (Hochreiter & Schmidhuber, 1997; Mikolov et al., 2010; Yu et al., 2019) has been used to represent features using a word encoder, a sentence encoder and a headline-body encoder in detecting fake news in 2016 US election (Singhania et al., 2017). CNN and RNN were applied to detecting fake news in the event of Sydney siege, Ottawa shooting, Germanwings crash, etc. (Ajao et al., 2018). While prior studies examined different aspects of rumors and fake news, disinformation features that are highly interrelated (e.g., market prices, media content, and temporal features) such as those in financial social media are not studied widely. 

Summary of Research Gaps

The literature review has identified a diverse set of theories, methods, techniques, and features used in detecting fake news, misinformation, and deceptive information. Table 1 provides a summary of different categories of methods. Table 2 summarizes various system input, features, and techniques. Manual pattern matching can yield accurate and intuitive results, but does not scale up to rapid growth of online data. Network-based methods can reveal online community structure and user behavior, but do not help to identify whether the behavior constitutes disinformation. Deep learning techniques, and RNN and LSTM in particular, have shown promise in detecting rumors and fake news (Chan et al., 2020; Ducci et al., 2020). However, current studies on recurrent neural networks are mainly empirical explorations and lack explicit knowledge (p. 1261, Yu et al., 2019), thus requiring a richer and clearer representation of the complex features that may appear in disinformation. Although various features were studied in previous works (Singhania et al., 2017; Wang et al., 2018), prior research does not consider temporal dependencies of textual features in social media and does not incorporate features whose values interrelate highly with the economy (e.g., financial market). These requirements call for new DL approaches and representation that simultaneously address the temporal and contextual needs of disinformation detection. Another research gap is inadequate application-domain expertise among artificial intelligence (AI) practitioners to support data sense-making. Results of interviews with 53 AI practitioners in high-stake domains show that a lack of domain expertise is experienced by 43.5% of the practitioners and can cause negative downstream data issues in deployment (Sambasivan et al., 2021). A lack of well-defined ground truth and high-quality data can hamper the training and application of DL models to detect disinformation. 
Unfortunately, existing research focuses primarily on model development and does not use data-centric augmentation (Gennatas et al., 2020) to enhance accurate detection of disinformation.

Temporal Recurrent Neural Network

This section describes a novel theory-based, data-driven temporal recurrent neural network (TRNN) approach that is developed based on a design science paradigm (Ericsson & Simon, 1993) to address the aforementioned gaps. Grounded in social and psychological theories, TRNN dramatically expands the power of traditional recurrent neural networks (RNNs) by incorporating contextual and temporal information from social media, financial stock prices, general market trends, and the complex interactions among these factors. A unique representation of temporal and contextual information in disinformation allows TRNN to encode specific knowledge for the detection. The design artifacts include the TRNN approach and model, an instantiation of the model with application to disinformation detection in financial social media, and the related disinformation dataset.

Design Rationale

The design rationale of TRNN is three-fold: (1) to advance the architecture of traditional approaches by using theories and specific temporal and contextual information, (2) to enrich and clarify the knowledge representation of complex features found in disinformation, and (3) to enhance detection performance by using data-centric augmentation. Addressing the needs for modeling complex disinformation in social media, TRNN captures temporal and contextual information by using the posting times of and neighboring words appearing in messages, and considers various behavioral attributes as postulated by social, economic, and psychological theories (Becker, 1974; Hovland, 1957; Li et al., 2017; Miller & Campbell, 1959; Quan-Haase, 2016; Turner & Killian, 1957) reviewed in Section 2.1. Different from prior work that uses time windows of stock trading information (e.g., Islam et al., 2018), TRNN learns from social media messages organized into contiguous, time-based scenarios, each spanning five minutes and containing all messages posted during that time span. Only scenarios in which an abnormal return of the interested financial stocks is observed are considered in disinformation detection, because malicious hackers often launch cyber attacks amidst market turbulence to gain illegal profits (e.g., Commission, U.S.E, 2015). Based on social exchange theory (Emerson, 1976), TRNN considers abnormal returns in identifying disinformation because people are lured by these returns to engage in a social media environment. To support the learning of dynamic information from scenarios that span a long time, TRNN concatenates multiple RNNs to enable deep learning of long-range textual and temporal dependencies for disinformation detection. 
Different from prior research that relies primarily on generic word embeddings and trading data (e.g., Seth & Chaudhary, 2020), TRNN considers enriched data from social media, financial market, temporal information, and multiple independent human annotations to increase prediction accuracy. Contextual information is obtained from time-based scenarios and can uniquely model changes in social media discussion and their impact on financial stock prices. The use of positive pointwise mutual information (PPMI, to be explained below) helps to extract semantic content from noisy messages (Jurafsky & Martin, 2020). We implemented the TRNN model in a proof-of-concept system, whose three modules and architecture are shown in Fig. 1 as explained below.
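The scenario construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the `(timestamp, text)` message layout, and the sample messages are all hypothetical, and the set of scenarios with abnormal returns is assumed to be supplied by a separate step.

```python
from collections import defaultdict
from datetime import datetime

SCENARIO_MINUTES = 5  # scenario length stated in the text


def scenario_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its five-minute scenario."""
    floored_minute = ts.minute - ts.minute % SCENARIO_MINUTES
    return ts.replace(minute=floored_minute, second=0, microsecond=0)


def group_into_scenarios(messages):
    """Map each scenario start time to the messages posted within it.

    `messages` is an iterable of (timestamp, text) pairs (hypothetical layout).
    """
    scenarios = defaultdict(list)
    for ts, text in messages:
        scenarios[scenario_start(ts)].append(text)
    return dict(sorted(scenarios.items()))


def keep_abnormal(scenarios, abnormal_starts):
    """Keep only scenarios whose start time is flagged as an abnormal return."""
    return {start: msgs for start, msgs in scenarios.items()
            if start in abnormal_starts}


# Made-up messages for illustration only.
messages = [
    (datetime(2020, 1, 2, 9, 31, 10), "first message"),
    (datetime(2020, 1, 2, 9, 34, 59), "second message"),
    (datetime(2020, 1, 2, 9, 35, 0), "third message"),
]
scenarios = group_into_scenarios(messages)
flagged = keep_abnormal(scenarios, {datetime(2020, 1, 2, 9, 35)})
```

Flooring timestamps to a fixed grid keeps the scenarios contiguous and non-overlapping, which matches the description of time-based scenarios that each contain all messages posted during their span.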

Fig. 1

Architecture of the temporal recurrent neural network approach

Module 1 - Feature Representation with PPMI

Module 1 converts raw textual data into numerical features that encode content and contextual information. To capture contextual information from voluminous text, TRNN uses a context of 11 words (five words before and five words after a target word, plus the target word) to identify neighboring words and to compute the positive pointwise mutual information (PPMI) of a target word. The size of the context was determined by empirical testing that balances the extent of context considered around a word in a social media post against the specific information presented in the word. Based on social contagion theory, TRNN uses the context of social media text to identify potential disinformation because humans behave based on the information they receive (Wheeler, 1966; Le Bon, 1895). TRNN also uses emergent norm theory (Turner & Killian, 1957) to characterize humans’ behavior of following opinion leaders in a social media environment. PPMI is a measure of the contextual information of a word with reference to the collection of words used in the corpus. Each word is represented as a vector of numeric values that reflect the word usage in relation to other words. Shown in Eq. 3, pointwise mutual information (PMI) is a measure of the likelihood of co-occurrence of words i (target word) and j (contextual word), compared with what would be expected if they were independent (Jurafsky & Martin, 2016). The likelihood that word i occurs in the context of word j and the likelihood that word j occurs in the context of word i are computed, as shown in Eq. 2, from the frequency of co-occurrence of words i and j in the same context. PPMI, an improved version of PMI, ranges from zero to infinity and replaces negative PMI values (which carry no semantic meaning) by zeros.
PPMI overcomes some limitations arising from the use of word frequency alone that may ignore word context (e.g., “Apple” occurring near “computer” carries a different meaning than “Apple” occurring near “juice”). Our human-annotated data collection (as described below) allows PPMI to encode more specific knowledge about disinformation than general word embeddings (e.g., GloVe) or traditional information retrieval methods (e.g., tf-idf) (Kowsari et al., 2019; Salton & McGill, 1983).
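The PPMI computation can be illustrated with a short sketch. It assumes the standard textbook formulation, PMI(i, j) = log2(p_ij / (p_i · p_j)) with PPMI = max(PMI, 0), and a symmetric window of five words on each side of the target; the paper's own Eqs. 1 to 3 are not reproduced in this copy, so the correspondence is an assumption. Function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter

HALF_WINDOW = 5  # five words before and five after the target (11-word context)


def cooccurrence_counts(sentences, half_window=HALF_WINDOW):
    """Count how often each (target, context) word pair shares a window."""
    counts = Counter()
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo = max(0, i - half_window)
            hi = min(len(tokens), i + half_window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(target, tokens[j])] += 1
    return counts


def ppmi(counts):
    """Return {(target, context): PPMI} from co-occurrence counts."""
    total = sum(counts.values())
    row = Counter()  # marginal counts of target words
    col = Counter()  # marginal counts of context words
    for (t, c), f in counts.items():
        row[t] += f
        col[c] += f
    scores = {}
    for (t, c), f in counts.items():
        pmi = math.log2((f / total) / ((row[t] / total) * (col[c] / total)))
        scores[(t, c)] = max(pmi, 0.0)  # negative PMI is replaced by zero
    return scores


# Toy corpus (made up): "a" usually appears with "b", "c" usually with "d".
corpus = [["a", "b"]] * 3 + [["d", "c"]] * 3 + [["a", "c"]]
scores = ppmi(cooccurrence_counts(corpus))
# ("a", "b") co-occur more often than chance, so their PPMI is positive;
# ("a", "c") co-occur less often than chance, so their PPMI is clipped to 0.0.
```

Each word's feature vector is then its row of PPMI values over all context words. A sparse dictionary is used here for brevity; a real vocabulary would likely need a sparse matrix.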

Module 2 - RNN Using PPMI Textual Features

Module 2 transforms the output of Module 1 into sequences of activation values computed by recurrent neural network (RNN) cells. Figure 1 shows that the PPMI values of textual features are fed into multiple RNNs, each representing a time segment containing all the tweets in its time period. Two same-length segments of trading times are considered: (1) from 9:30am to 12:45pm (morning) and (2) from 12:45pm to 4:00pm (afternoon). Each RNN node represents one time segment. A single long short-term memory (LSTM) node is used in the RNN cells.

The formulas that are used in Module 2 to compute the output values are given in Eqs. 4–9, in which x_t is the input feature vector of PPMI values; f_t is the forget gate that controls the extent to which textual/temporal information is not stored in the RNN cells; o_t is the output gate that controls the extent to which the output values of Module 2 are fed into Module 3 and into the activation function for final classification; h_t is the hidden state vector that holds the previous textual/temporal information presented to the neural network during model training; and c_t is the cell state vector that carries relevant information, like a highway, along the textual/temporal sequences. W, U and b are the weight matrices and bias vectors for disinformation detection that are learned in the training process. σ and tanh are the sigmoid and hyperbolic tangent activation functions (Han & Moraga, 1995) that map the output values to probabilities.

Compared with the standard feedforward neural network, LSTM can memorize and learn the feedback information in both text and time sequences. LSTM overcomes the “vanishing gradient problem” of traditional RNNs by applying multiplicative gates that enable information to pass through the internal states of the memory cells. To train TRNN (which uses multiple LSTM units as shown in Fig. 2), the weights are updated according to the gradient of an error function in every training iteration (cross entropy is used because it yields a maximum-likelihood estimate in the disinformation/benign binary classification). The vanishing gradient problem occurs when the gradient’s value is so small that the weights either change too slowly or do not change at all in subsequent training iterations, preventing the neural network from learning correctly from long-range textual features. The input gates, output gates, and forget gates are designed to keep selected values in the states unmodified to achieve memorization and correct prediction in long sequences (Hochreiter & Schmidhuber, 1997).
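The gating behaviour described above can be illustrated with the standard LSTM update (Hochreiter & Schmidhuber, 1997). The sketch below is a minimal single-unit cell in plain Python. The gate equations are the textbook form and are assumed, not confirmed, to correspond to Eqs. 4–9; the function names, weights, and inputs are made-up illustrative values rather than trained TRNN parameters.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lstm_step(x_t, h_prev, c_prev, params):
    """One time step of a single-unit LSTM cell.

    params maps each of 'f', 'i', 'o', 'g' to a tuple (W, U, b), where W is a
    list of input weights, U is the recurrent weight and b is the bias.
    """
    def pre_activation(name):
        W, U, b = params[name]
        return dot(W, x_t) + U * h_prev + b

    f_t = sigmoid(pre_activation("f"))    # forget gate: how much old cell state to keep
    i_t = sigmoid(pre_activation("i"))    # input gate: how much new information to store
    o_t = sigmoid(pre_activation("o"))    # output gate: how much cell state to expose
    g_t = math.tanh(pre_activation("g"))  # candidate cell value
    c_t = f_t * c_prev + i_t * g_t        # multiplicative gates update the cell state
    h_t = o_t * math.tanh(c_t)            # hidden state passed to the next step
    return h_t, c_t


# Hypothetical weights and a two-step sequence of 3-dimensional PPMI-like inputs.
params = {name: ([0.5, -0.3, 0.8], 0.1, 0.0) for name in ("f", "i", "o", "g")}
sequence = [[0.2, 0.0, 1.1], [0.0, 0.7, 0.4]]
h, c = 0.0, 0.0
for x in sequence:
    h, c = lstm_step(x, h, c, params)
```

Because the cell state is updated additively through the forget and input gates, a forget gate near 1 lets earlier information persist across many steps, which is the property that mitigates vanishing gradients.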

Fig. 2

A long short-term memory unit

Module 3 - Recurrent Neural Network Using Temporal Features

Module 3 takes the output from Module 2 to produce a single time-series RNN that incorporates the temporal information of the scenarios. The formulas given in Eqs. 4, 5, 6, 7, 8, 9 are also used in the RNN of Module 3, with different input sizes than those of Module 2 (the RNN takes as input textual feature vectors ). Psychological theories have shown that people are more likely to share information when they are exposed to that information recently (recency) or when it is important to them (primacy) (Gino et al., 2009; Ngai et al., 2015). Therefore, TRNN models each input feature vector by including its time segment (morning or afternoon) to incorporate temporal information. A sigmoid function is used in the final layer to produce a probability to indicate the likelihood that a given scenario contains disinformation.
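As a small illustration of the temporal feature, the sketch below assigns a message timestamp to the morning or afternoon trading segment defined in Module 2. The function name, the handling of the 12:45pm boundary (assigned here to the afternoon), and the treatment of out-of-hours timestamps are assumptions made for illustration.

```python
from datetime import datetime, time

MARKET_OPEN = time(9, 30)
MIDDAY = time(12, 45)
MARKET_CLOSE = time(16, 0)


def time_segment(timestamp):
    """Return 'morning', 'afternoon', or None for timestamps outside trading hours."""
    t = timestamp.time()
    if MARKET_OPEN <= t < MIDDAY:
        return "morning"
    if MIDDAY <= t <= MARKET_CLOSE:
        return "afternoon"
    return None


# Hypothetical message timestamps grouped by segment.
timestamps = [
    datetime(2018, 2, 2, 9, 45),
    datetime(2018, 2, 2, 12, 45),
    datetime(2018, 2, 2, 15, 59),
]
segments = [time_segment(ts) for ts in timestamps]
```

Both segments span 195 minutes, which is what makes them the same length.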

Novelty of TRNN

The novelty of the TRNN approach includes its theoretical foundation, comprehensiveness in modeling disinformation, and innovative data-centric augmentation of DL techniques. First, social and psychological theories were used to explain human behavior in spreading and detecting disinformation. Disinformation in financial social media reflects malicious behavior of illegal profit-seeking by exploiting people’s motivation to gain from abnormal market movements and to follow the “crowd” in an uncertain social environment. As explained in Sections 3 and 4.2, TRNN uses Capital Asset Pricing Model (Sharpe, 1964) (as a data augmentation method) to characterize abnormal price movements of stock portfolios, Social Contagion Theory to model human behavior based on the information they receive (Wheeler, 1966; Le Bon, 1895), Emergent Norm Theory (Turner & Killian, 1957) to represent investors’ behavior to follow the “crowd,” and Social Exchange Theory (Emerson, 1976) to represent malicious actors’ behavior of seeking illegal profit by posting disinformation social media messages. These theories are unique in TRNN and have not been used in prior DL techniques and in disinformation detection. Second, the TRNN approach is designed specifically to capture temporal and contextual information from messages that may contain disinformation, thus enriching knowledge representation of the complex features. To our knowledge, TRNN is the first approach that models human temporal perception of information by considering the importance of recent occurrences and past memory according to psychological theories (Gino et al., 2009; Ngai et al., 2015). The approach also advances IS research by designing and validating new information technology artifacts (Hevner et al., 2004) for detecting disinformation in financial social media. 
Third, the TRNN approach advances traditional DL methods (such as RNN and LSTM) by integrating multiple layers of RNN and LSTM cells and combining financial information (modeled by CAPM (Sharpe, 1964) and abnormal price movements), textual information (using PPMI), and interactions of market signals and timed trading patterns in the prediction of disinformation. The integration enables dynamic modeling of disinformation in financial social media that is beyond the predictive capabilities of traditional RNN techniques.

Experimental Design

To understand the usability and performance of the TRNN approach to disinformation detection, we conducted a series of experiments to compare the TRNN approach with different machine learning techniques. The U.S. financial market is chosen as the domain of the experiments because malicious hackers often spread disinformation to create abnormal price movements and to gain illegal profits. The study used a unique dataset that was built in this research to capture disinformation in social media. The following sections describe the data collection, augmentation, and experimental hypotheses.

Data Collection

The research developed an automated system to collect social media messages and stock prices of four Dow Jones Industrial Average (DJIA) technology companies: Apple, Microsoft, Intel, and Cisco (Chung & Sura, 2019). These companies were selected based on their important roles of providing technology products (e.g., smart phones), services (e.g., office productivity software), infrastructure (e.g., integrated circuits), and network hardware (e.g., routers) to the global economy. The system consists of multiple components to transform raw data into feature values (see Fig. 3): The crawler is a general-purpose collection agent that crawls publicly accessible web resources. The scraper extracts relevant data from HTML and JSON files and can be configured for different data content. The featurizer transforms raw text and financial data into input values for TRNN, using formulas as explained in Sections 3.2 and 4.2.1. The scheduler is the starting point of the system and regulates various day-to-day operations as mentioned above. The learner supports experimentation with different algorithms and approaches for machine learning. Using the collected data, a research test bed was built for use in the experimental evaluation.

Fig. 3

An automated system to collect and transform data to support the TRNN approach

The stock prices and social media messages were recorded once every five minutes on each U.S. trading day during 11-July-2017 – 15-August-2018 (277 trading days). The social media messages were collected from Twitter and StockTwits (a social media platform for sharing ideas between investors, traders, and entrepreneurs) by using their public APIs. The total number of messages is 745,139, of which 560,062 messages are from Twitter and 185,077 messages are from StockTwits. The raw data include (for each message) full text, timestamp, weekday, daytime (morning/afternoon), publication source, number of “likes,” target respondent(s), sentiment (positive/negative), and author’s total number of “likes.” Selected sample messages collected from StockTwits are shown in Fig. 4.

Fig. 4

A sample of social media discussions about stock price movements

We collected stock market data from three sources to enable computation of abnormal returns of stocks (to be explained in Section 4.2.1): (1) Google Finance provides real-time stock prices and S&P 500 index values that were used to indicate market performance. (2) The U.S. Treasury provides the latest risk-free rates of return for different maturities. (3) Yahoo Finance provides historical monthly closing stock prices. Due to their popularity among investors who also use financial social media, Google Finance and Yahoo Finance were chosen instead of other professional services (e.g., CRSP database). On each trading day, our system automatically collected data continuously (over 5-minute intervals) from the start (9:30 am) to the close (4:00 pm) of the stock market (Jeong, 1999).

Data Augmentation

We performed data augmentation on our collected data to (1) transform the social media messages into contiguous message sets (called scenarios) and derive labels from abnormal rates of return of the stock prices, and (2) produce human-validated labels of disinformation for each scenario.

Data Transformation

The raw data were transformed into scenarios and labels that indicate abnormal stock price movements. Each scenario is a concatenation of all social media messages posted during a 5-minute time frame. The total number of scenarios extracted from the raw data is 10,455 (37.74 scenarios per day). A label is assigned automatically to each scenario to indicate the abnormal return (up, down, or none) of the market-weighted price of the selected companies’ stocks. The abnormal rate of return was calculated using Eq. 10, according to the Capital Asset Pricing Model (Sharpe, 1964), where R_i is Stock i’s actual rate of return; R_m is the market return based on the S&P 500 index normalized to the time span of the scenario; and β_i is Stock i’s price volatility relative to the overall market, computed as the covariance between the rate of return of Stock i and the market rate of return (R_m) divided by the variance of the market rate of return. In Eq. 10, the stock portfolio consists of the four selected stocks. The portfolio’s rate of return (R_p) is computed as the weighted sum of the component stocks’ rates of return (in which the “weight” of a stock is the ratio of the stock’s market capitalization to the total market capitalization of all the portfolio’s stocks). Consequently, a scenario is labeled as one of the following: “normal,” “abnormal up,” or “abnormal down.”

Based on the calculation, 10,170 scenarios (97.27%) have normal price movements, whereas 137 scenarios (1.31%) have abnormal upward movements (abnormal rate of return = 0.5% or above) and 148 scenarios (1.42%) have abnormal downward movements (abnormal rate of return = −0.5% or below). Only scenarios that are labeled “abnormal up” or “abnormal down” were used to study whether disinformation exists. On average, each abnormal up scenario has 62.58 messages and each abnormal down scenario has 70.42 messages (overall average = 66.25 messages per scenario).
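The labeling procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ implementation: it uses the textbook CAPM form (expected return = risk-free rate + beta × (market return − risk-free rate)), which is assumed rather than confirmed to match Eq. 10; the symmetric −0.5% downward threshold is likewise assumed; and all function names and numbers are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Sample covariance of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def beta(stock_returns, market_returns):
    """Covariance of stock and market returns divided by the market variance."""
    return covariance(stock_returns, market_returns) / covariance(market_returns, market_returns)


def cap_weighted(values, market_caps):
    """Market-capitalization-weighted sum, e.g. of component stocks' returns."""
    total = sum(market_caps)
    return sum(v * cap / total for v, cap in zip(values, market_caps))


def abnormal_return(actual, beta_value, market_return, risk_free):
    """Actual return minus the CAPM expected return (assumed form)."""
    expected = risk_free + beta_value * (market_return - risk_free)
    return actual - expected


def label_scenario(ar, threshold=0.005):
    """Map an abnormal return to a scenario label using a +/-0.5% threshold."""
    if ar >= threshold:
        return "abnormal up"
    if ar <= -threshold:
        return "abnormal down"
    return "normal"


# Hypothetical 5-minute returns for two stocks with market caps in a 3:1 ratio.
portfolio_return = cap_weighted([0.01, 0.03], [3.0, 1.0])
ar = abnormal_return(portfolio_return, beta_value=1.0, market_return=0.002, risk_free=0.0)
label = label_scenario(ar)
```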
While disinformation may appear in both “normal” and “abnormal” scenarios, this research focuses only on disinformation found in scenarios with abnormal price movements, for three reasons. First, malicious hackers often leverage abnormal stock price movements to gain illicit profits. Therefore, focusing only on abnormal scenarios helps to filter out input data (e.g., normal scenarios) that may be less likely to contain disinformation. Second, abnormal scenarios caused by disinformation are often investigated by financial security regulators and by intelligence specialists to devise strategies to combat cyber attacks. The practical value of detecting disinformation in abnormal scenarios is thus far higher than in normal scenarios. Third, disinformation detection from abnormal scenarios (which are a minority among all scenarios) is not found in the literature. Related studies (such as this one) should provide new findings to support future research developments.

To develop the research test bed for use in disinformation detection, we manually built two message sets labeled as “benign” and “disinformation” respectively by drawing messages randomly from the “abnormal” scenarios as explained above. This manual process consists of two steps. First, we randomly drew messages from the “abnormal up” and “abnormal down” scenarios and extracted from each message these contextual and temporal feature values: timestamp, weekday, source, message’s count of “likes,” count of messages in the scenario, author’s count of “likes,” and message sentiment score (calculated by using the tool described in Hutto & Gilbert (2014)). Second, we used the aforementioned feature values and the message content to assign an initial label to indicate whether the message is potentially disinformation or not (i.e., benign); the initial label was later validated by independent human annotators as explained in Section 4.2.2.
The two-step process resulted in a balanced dataset consisting of 2,000 messages categorized into four groups (each having 500 messages): (1) abnormal upward / benign, (2) abnormal upward / disinformation, (3) abnormal downward / benign, (4) abnormal downward / disinformation (see Table 3). An even distribution among the four groups ensures the same probability of selecting among the four types of messages in data validation.

Table 3

Categorization (and count) of Messages in Abnormal Stock Price Movements

Categorization	Abnormal upward movement	Abnormal downward movement
Benign messages	Benign messages in abnormal upward scenarios (500 messages)	Benign messages in abnormal downward scenarios (500 messages)
Disinformation	Disinformation in abnormal upward scenarios (500 messages)	Disinformation in abnormal downward scenarios (500 messages)

Data Validation

The research test bed was validated by five human annotators who independently evaluated the labeling of disinformation in the messages. The validation required each annotator to indicate, for each of 50 messages randomly sampled from the 2,000 messages (see Table 3), whether they agreed with the initial labeling (i.e., benign or disinformation). Each sampled message was displayed to the annotator together with the contextual and temporal feature values as explained above and its initial label (half of the 50 messages were labeled initially as “disinformation” and the other half as “benign” using the two-step process explained above). All annotators possess academic degrees from U.S. universities – four annotators have master’s degrees and one has a bachelor’s degree. Each annotator was given a survey with a background introduction and a tutorial along with training examples to guide them in performing the task. The use of 5 annotators is aligned with research guidelines stating that generally 3-5 annotators are sufficient to validate the dataset labels to produce reliable results (Burmania et al., 2015). The research was certified by the Institutional Review Board of the investigators’ university to comply with all regulations for protecting human subjects and data privacy.

Each annotator classified the same set of 50 messages, which contains 25 messages with an initial label of “disinformation” and 25 with an initial label of “benign message.” Out of the 50 messages, 48 messages (24 disinformation and 24 benign messages) were classified (by 3 or more annotators) as having the same label as their initial labels. Based on the results of the annotation, we also calculated three reliability measures of internal consistency among the five annotators: Cronbach’s alpha (Cronbach, 1951), Cronbach’s Alpha Based on Standardized Items, and Guttman’s Lambda 6 (Guttman, 1945). The formulas are given in Eqs. 11–13. The notation and meaning are shown in Table 4. The results of these three measures are given in Table 5, showing that over 90% of the responses have at least four (out of 5) agreements among the independent human annotators.

Table 4

Notation and its Meaning in the Eqs. 11 – 13

Notation	Meaning
$\alpha$	Cronbach’s alpha $\in$ [0,1] that indicates inter-rater reliability
N	Total number of responses from the annotators
$\hat{c}$	Average of all covariances between pairs of responses
$\hat{v}$	Average variance of each response
$e_{j}^2$	Variance of the errors of estimate of response j on the rest of the responses
$s^2$	Variance in each response that can be accounted for by the linear regression of all of the other responses
$\hat{r}$	Mean of correlation coefficients
$\lambda_6$	Guttman’s Lambda-6 estimate of reliability

Table 5

Reliability test results

Measure	Value
Cronbach’s alpha, $\alpha$	0.9094
Cronbach’s standardized alpha, $\alpha_{standardized}$	0.9074
Guttman’s Lambda-6, $\lambda_6$	0.9059
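For readers who wish to reproduce this kind of reliability check, the sketch below computes Cronbach’s alpha and its standardized variant from a ratings matrix, using the commonly cited covariance and correlation forms that match the notation in Table 4 (N, $\hat{c}$, $\hat{v}$, $\hat{r}$). These forms are assumed, not confirmed, to correspond to Eqs. 11 and 12; Guttman’s Lambda-6 is omitted because it requires a regression per annotator. The function names and toy ratings are hypothetical and are not the study’s annotation data.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Sample covariance of two equal-length series."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def cronbach_alphas(ratings):
    """Return (alpha, standardized alpha).

    ratings: one row per rated message, one column per annotator.
    Each annotator's column must not be constant (non-zero variance).
    """
    columns = list(zip(*ratings))
    n = len(columns)
    variances = [covariance(col, col) for col in columns]
    pairs = [(a, b) for a in range(n) for b in range(a + 1, n)]
    covs = [covariance(columns[a], columns[b]) for a, b in pairs]
    cors = [cov / math.sqrt(variances[a] * variances[b])
            for cov, (a, b) in zip(covs, pairs)]
    v_bar, c_bar, r_bar = mean(variances), mean(covs), mean(cors)
    alpha = n * c_bar / (v_bar + (n - 1) * c_bar)
    alpha_standardized = n * r_bar / (1 + (n - 1) * r_bar)
    return alpha, alpha_standardized


# Hypothetical binary judgments (1 = agrees the message is disinformation)
# from three annotators on five messages.
toy_ratings = [
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
]
alpha, alpha_standardized = cronbach_alphas(toy_ratings)
```

With these toy ratings both measures equal 0.75; values closer to 1 indicate stronger internal consistency among annotators.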

In the sample messages shown in Fig. 4, the highlighted message (listed on the second line) was identified as disinformation, according to verification by the independent human annotators described above. In that message, the author twitSocial speculated that the stock price of Apple Inc. would go up dramatically and suggested a timely purchase of the stock. The annotators based their independent decisions on the definition of disinformation (Lazer et al., 2018), which was provided to them at the beginning of the annotation process. Our validation shows that all three measures of reliability of the annotation exceed 0.90 (see Table 5).

Experimental Benchmarks and Hypothesis Testing

We selected four benchmark techniques to compare against the TRNN approach: artificial neural network (ANN) (Rumelhart et al., 1994), recurrent neural network (RNN), long short-term memory RNN (LSTM) (Yu et al., 2019), and convolutional RNN (CRNN) (Wang et al., 2019). ANN and RNN were chosen because they are among the most popular machine learning techniques for text classification. LSTM is among the major deep-learning techniques that overcome the problems of large input gaps and long-term dependencies (Yu et al., 2019). CRNN uses convolutional and pooling layers to extract multiple sets of features that are then used as input to the LSTM neural network. CRNN has been shown to outperform other state-of-the-art text classification methods, such as fastText, developed by Meta Platforms Inc. (formerly known as Facebook) (Joulin et al., 2016), and HAN (Yang et al., 2016), across different datasets (Wang et al., 2019). The benchmark techniques use as their input the textual features extracted from our experimental dataset, whereas TRNN uses as input both textual and temporal features from the same dataset. ANN and RNN use two hidden layers, each having 128 nodes chosen empirically based on the size of the input and output vectors. Both LSTM and CRNN use a batch size of 64 for training and testing, and their architecture and hyperparameters are listed in Table 6.

Table 6

Hyperparameters Used in LSTM and CRNN

Technique	Hyperparameter	Description	Value
LSTM (Yu et al., 2019)	Dense Units	Fully-connected sequential layers, each having a specified number of computational nodes	[32, 32, 1]
	Activation function	Functions to transform input values to output values	[‘relu’, ‘relu’, ‘sigmoid’]
CRNN (Wang et al., 2019)	CNN Channels	Number of channels used in CNN layers	[8, 16, 32]
	CNN Kernel	Same kernel size used in all CNN layers	[3, 3, 3]
	Pooling Layer	Input window size for max-pooling in CNN layers	[[2, 2], [2, 2], [1, 1]]
	LSTM Units	Number of units in bidirectional LSTM layers	[64, 64]
	LSTM Dropout rate	Fraction of the units to drop for linear transformation	0.5

We developed three hypotheses to evaluate the performance of TRNN against the four benchmark techniques. First, TRNN is hypothesized to outperform the benchmarks in the accuracy of detecting disinformation due to its novel theory-based development, which should enable a deeper understanding of human social and psychological features. Second, TRNN is hypothesized to outperform benchmark techniques in classifying disinformation messages in upward and downward scenarios due to its theory-based prediction of human behavior when faced with either type of scenario. Third, the relatively better performance of TRNN is hypothesized to be statistically significant due to its theory-based representation and data-driven detection of disinformation. We used pairwise two-sample t-tests to test the hypotheses. We used these performance measures to evaluate the hypotheses: precision, recall, F-score and accuracy, whose formulas are given in Eqs. 14–17. In our experiments, we randomly sampled from our research test bed 1,600 messages for training and 400 messages for testing. Furthermore, we randomly sampled 320 messages from the training set for validation. The random sampling was conducted without replacement.
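The evaluation setup can be sketched in a few lines. The code below assumes the standard confusion-matrix definitions of precision, recall, F-score and accuracy (assumed, not confirmed, to match Eqs. 14–17) and shows one way to draw the 1,600/400 split and the 320-message validation sample without replacement. The random seed, the function name, and the confusion counts are hypothetical; whether the validation messages were also held out of training is not stated, so the sketch leaves them inside the training indices.

```python
import random


def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F-score and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f_score": f_score, "accuracy": accuracy}


# Split 2,000 message indices into 1,600 training and 400 testing indices,
# then sample 320 validation indices from the training set (no replacement).
rng = random.Random(0)  # arbitrary seed, for repeatability of the sketch only
indices = list(range(2000))
rng.shuffle(indices)
test_ids, train_ids = indices[:400], indices[400:]
validation_ids = rng.sample(train_ids, 320)

# Hypothetical confusion counts for a 400-message test set.
metrics = classification_metrics(tp=160, fp=40, fn=40, tn=160)
```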

Experimental Findings and Case Study

This section describes the experimental findings and provides a case to illustrate a potential real-world application of the approach to disinformation detection.

Accuracy of Disinformation Detection

As shown in Fig. 5, TRNN achieved the highest overall accuracy in disinformation detection. LSTM achieved the second-highest accuracy, followed by RNN, ANN, and CRNN in third, fourth, and fifth places respectively. Because TRNN specifically learns from temporal information of disinformation messages in addition to textual features, the TRNN model was able to accurately capture the dynamic signals unique to disinformation. The LSTM and RNN models were also able to capture long-term dependencies in textual features that relate to temporal information, but their capabilities are lower than that of TRNN because these dependencies are not the most relevant temporal information for the detection. Similarly, ANN models the information by encoding the relationship in network weights, but does not possess as high a capability as TRNN does due to its lack of theoretical representation and of temporal modeling. Surprisingly, CRNN achieved the lowest accuracy – we believe its use of various convolutional operations may produce too much noise from the data, and its lack of temporal information also contributed to the lower accuracy.

Fig. 5

Accuracies achieved by different techniques on disinformation detection

Detecting Disinformation in Upward and Downward Scenarios

According to the literature, little is known about the dynamics of spreading disinformation in social media (Zubiaga et al., 2016; Kwon & Cha, 2014). This lack of understanding presents a significant risk to the economy because financial social media are vulnerable to malicious use. Therefore, we were interested in finding whether the use of TRNN enables better detection of disinformation in abnormal upward and downward price movement scenarios than benchmark techniques. Table 7 shows the results comparing TRNN to the four benchmarks in upward and downward scenarios. The results show that TRNN consistently outperformed the benchmarks in terms of precision and F-score in both upward and downward scenarios; TRNN also outperformed all benchmarks in terms of recall in downward scenarios, and obtained the second-highest recall among all models in upward scenarios (in which CRNN obtained the highest recall). Several observations can be made from the results. First, the differences between the performance of TRNN and the performance of benchmarks are significantly larger than those between RNN and ANN. This is because the use of deep learning and temporal features (explicitly modeled in TRNN) supports a more timely and precise identification of disinformation, which is often spread by malicious hackers to create a “cognitive hack” that may work only for a short time frame (e.g., approximately 20 minutes in the hack reported in Lauricella et al. (2013)). Second, the gaps between precision and recall in upward scenarios are generally larger than those in downward scenarios across all deep-learning models (TRNN, CRNN, LSTM). In addition, the recall scores in upward scenarios are higher than those in downward scenarios across all models. This indicates that the effect of disinformation messages in upward scenarios is more readily identified by the models than in downward scenarios, because malicious hackers tend to profit from abnormal gains.
Third, in upward scenarios, TRNN has a relatively narrower gap between recall and precision than RNN, ANN, and CRNN have. This can be attributed to the combined effect of the two aforementioned reasons, i.e., the temporal effect modeled by TRNN and TRNN’s relatively stronger ability to identify a “cognitive hack” in upward scenarios, in which hackers may use pump-and-dump schemes (Xu & Livshits, 2018) that carry strong upward price movement signals (e.g., terms such as “BUY”). Fourth, in downward scenarios, TRNN is the only model having all three measures above 80%, whereas both RNN and ANN have all these measures below 80%, and both LSTM and CRNN have precision and F-score below 80%. This is attributed to the effectiveness of using temporal features in TRNN to identify disinformation when prices go down dramatically (e.g., speculative short-selling).

Table 7

Performance of disinformation detection in downward and upward scenarios

Scenario	Model	Precision	Recall	F-score
Downward	TRNN	0.8036	0.8223	0.8113
	CRNN	0.7290	0.8211	0.7717
	LSTM	0.7527	0.8013	0.7752
	RNN	0.7549	0.7426	0.7465
	ANN	0.7428	0.7578	0.7485
Upward	TRNN	0.7789	0.8481	0.8108
	CRNN	0.7178	0.8573	0.7808
	LSTM	0.7697	0.8287	0.7975
	RNN	0.7270	0.8024	0.7620
	ANN	0.7202	0.8116	0.7623

The bold numbers indicate the best performance achieved by a model among all models’ performances in downward or upward scenarios

Note: the values are averaged from 30 random samples

Statistical Validity

This section reports the results of statistical tests that compared TRNN with each of the four benchmark techniques. We created 30 random samples with different proportions of training (80%) and validation (20%) in the dataset to evaluate the models’ performance. Table 8 shows the hypotheses and their testing results. All p-values are well below 0.05, indicating that the F-scores of TRNN are significantly higher than those of the benchmark techniques. In addition, the Shapiro-Wilk tests for normality show that the F-scores achieved by all models do not differ significantly from a normal distribution, thus confirming the validity of the two-sample t-tests. Therefore, we conclude that TRNN achieved a significantly higher performance in detecting disinformation from financial social media than all benchmark techniques did. Among the results, the p-value of the test comparing the F-scores of TRNN and LSTM is the highest (though still well below 0.05), indicating that LSTM is the closest contender to TRNN among all the benchmarks. The results show that TRNN’s design, which simultaneously processes temporal and textual features, can significantly increase the performance of disinformation detection relative to benchmarks that use textual features only. Since temporal and textual features are widely available in a variety of social media, we believe TRNN will work well not only in the financial domain but also in other areas such as political campaigns or sports events.

Table 8

Pairwise Two-Sample t-Test of Models Using F-score

Hypothesis	p-value	Significant?	Conclusion
F_score(TRNN) > F_score(RNN)	6.0974e-08	Yes	Hypothesis confirmed
F_score(TRNN) > F_score(ANN)	2.2614e-08	Yes	Hypothesis confirmed
F_score(TRNN) > F_score(LSTM)	0.0001319	Yes	Hypothesis confirmed
F_score(TRNN) > F_score(CRNN)	8.5647e-07	Yes	Hypothesis confirmed
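As an illustration of the comparison behind Table 8, the sketch below computes a pooled-variance two-sample t statistic for two sets of F-scores. The equal-variance (Student) form is an assumption, since the variant of the t-test is not specified, and the F-scores are made-up values rather than the 30 samples used in the study. A p-value would be obtained from the t distribution with the returned degrees of freedom, which is not computed here.

```python
import math
import statistics


def two_sample_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for a pooled-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / standard_error, df


# Hypothetical F-scores from repeated random samples of two models.
f_scores_model_a = [0.81, 0.82, 0.80, 0.83]
f_scores_model_b = [0.77, 0.78, 0.76, 0.79]
t_stat, df = two_sample_t(f_scores_model_a, f_scores_model_b)
```

A large positive statistic supports the one-sided hypothesis that the first model's mean F-score is higher than the second's.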

Implication and Explanation

The results provide several implications for detecting disinformation with application to financial social media. First, the superior performance of TRNN across different measures indicates the importance of representing the social, economic, and psychological factors at play in the composition and spread of disinformation. The level of complexity in detecting disinformation can be adequately captured by TRNN, while other machine learning / DL models (RNN, ANN, LSTM and CRNN) may either overfit the data or contain too much bias. Therefore, TRNN is shown to be suitable for detecting disinformation in financial social media, thus demonstrating the promise of AI-driven secure knowledge management. The results provide new insights to examine social, economic, and psychological theories to detect malicious online behavior. Second, TRNN supports rational explanation of disinformation detection by using financial data, market movements, and textual information, thus overcoming the interpretability problems due to the “black-box” nature commonly found in other DL techniques (Savage, 2022). A case of disinformation detection related to the stock of Apple Inc. is provided in Section 5.5. TRNN’s use of social contagion theory, emergent norm theory, social exchange theory, and psychological theories on the timeliness of human decisions provides theoretical guidance for its detection of disinformation from social media. Third, compared with other approaches using ML/DL methods to detect disinformation (e.g., CRNN, ANN) (Wang et al., 2019; Bahad et al., 2019), TRNN provides several advantages: (1) TRNN represents dynamic market signals by considering the temporal and contextual information in each financial social media scenario. (2) TRNN is developed based on social, economic, and psychological theories and can be used to explain its predictions from a theoretical perspective. By contrast, prior work in detecting disinformation does not examine these theories; other ML applications are also not grounded in these theories. (3) TRNN uses data-centric augmentation in its representation of complex features found in disinformation, thus advancing traditional DL techniques that focus primarily on model building and architectural sophistication. These advantages support the generalizability of TRNN to domains other than finance. As the use of social media is prevalent across different domains, the highly encouraging results obtained from our experiments demonstrate a strong potential of TRNN to contribute to any domain involving textual social media, human decision making, and valuable assets.

A Case on Apple Inc.’s Stock Price Movement

To understand the potential application of TRNN to detecting disinformation in financial social media, we conducted an analysis of the tweets associated with abnormal price movements. We presented an empirical observation of how disinformation correlates with the stock price movement. Figure 6 shows a sample tweet identified from the dataset.

Fig. 6

A tweet about abnormal stock price movement

Similar to the tweet posted by the Scottish trader (Patrick, 2015), this aggressive tweet was posted at 2018-02-02 09:24:39 by the well-known financial agency Phil’s Stock World. This tweet claimed “a stock failure day” by stating that the U.S. federal government and President Donald Trump failed to boost the markets. Before this tweet, Apple’s stock price had been steady at around $166 since 2018-01-29, for 4 consecutive days. Without any influence of the company’s earnings release or major announcement, Apple’s stock price still went down from $166.34 to $164.9 immediately (in less than 9 minutes after the tweet), and then to a closing price of $160.5 that day, resulting in a 3.64% drop. While other factors might have contributed to the change, the dramatic price drop following the tweet signaled the powerful and potentially malicious impact brought by the tweet.
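The percentage can be checked from the prices quoted above. The short calculation below is an illustrative sketch and not part of the original analysis; the helper name `pct_change` is hypothetical. It suggests that the reported 3.64% corresponds to dividing the $5.84 decline by the closing price, whereas measuring the same decline against the pre-tweet price gives about 3.51%.

```python
def pct_change(start: float, end: float, base: float) -> float:
    """Percentage change from start to end, relative to the chosen base."""
    return (end - start) / base * 100


pre_tweet = 166.34  # price just before the tweet, as quoted in the text
close = 160.50      # closing price that day, as quoted in the text

# Decline measured against the pre-tweet price (the usual convention).
drop_vs_pre = pct_change(pre_tweet, close, base=pre_tweet)

# Decline measured against the closing price (matches the 3.64% in the text).
drop_vs_close = pct_change(pre_tweet, close, base=close)

print(round(drop_vs_pre, 2), round(drop_vs_close, 2))
```

Either way, the decline over the day is roughly 3.5% to 3.6%, so the choice of base does not change the qualitative observation.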

Conclusion

Disinformation in social media poses significant threats to cybersecurity and efficient market operations (Cohen et al., 2021). However, the large volume of social media content and the rapidly changing environment make it challenging for AI techniques to accurately detect disinformation in social media. This research developed and validated a temporal recurrent neural network (TRNN) approach to address this need. Grounded in social and psychological theories, TRNN incorporates contextual and temporal information from human-annotated social media data and from fine-grained financial market data that are synchronized with the social media data. Findings from our experiments on detecting disinformation in financial social media about four U.S. high-tech companies show that TRNN significantly outperformed the benchmark techniques in both accuracy and classification performance. A case study of social media messages and Apple Inc.’s stock price movement demonstrates a strong potential to apply TRNN to disinformation detection.

Research Contributions

This research makes several contributions. First, this research is the first attempt to develop a theory-based, deep learning (DL) approach that combines contextual, textual, financial, and temporal information in disinformation detection. Grounded in social and psychological theories, the TRNN approach and model have advanced the understanding of disinformation and of ways to detect disinformation from social media. As managers and decision makers face rapidly growing challenges from online disinformation, the approach and model provide useful tools and techniques for secure knowledge management. Second, this research provides the first data-centric augmentation to existing DL methods for disinformation detection in financial social media. While existing DL methods focus primarily on depth and sophistication of neural network architecture, our findings enrich the understanding of human behavioral data used in training and applying DL methods. Third, the research contributes new information systems (IS) artifacts and reusable datasets for disinformation detection research in financial social media. While prior research uses standardized datasets for testing DL models, our research breaks new ground by producing a unique disinformation dataset annotated by multiple independent human raters (with empirically confirmed reliability) and novel IS artifacts in the form of a DL method and its instantiation for detecting disinformation in financial social media. These artifacts and the dataset can benefit researchers, practitioners, and people interested in design science research and in related fields (Hevner et al., 2004; Peffers et al., 2007). Fourth, this research contributes to building a generalizable tool for classifying complex instances that involve textual content, temporal information, user activities, and financial assets. With suitable domain adaptation, the tool can be generalized to other applications, such as market prediction based on cryptocurrency movement, rumor detection in online forums, strategic planning in marketing campaigns, and intelligence filtering in adversarial settings (e.g., launching of competing products), among others.

Limitations and Future Directions

There are several limitations in this research. First, the use of only Twitter, StockTwits, and U.S. financial market data in our datasets limits the sources of social media and financial data used in the experiments. Using additional platforms and data sources (e.g., datasets other than DJIA component stocks) will provide new opportunities for deeper understanding of disinformation detection across different markets and online platforms. Second, the design of TRNN assumes daily collections and five-minute scenarios of social media messages to represent contextual information. This design may limit the study of non-linear message propagation in social media networks. The lengths of these collections and scenarios can be dynamically adjusted to extend TRNN to incorporate more contextual information, such as message propagation patterns and network topology. Third, the parameters and structure of TRNN were selected based on our empirical testing, which was limited by our available resources. To address this limitation, extensive optimization and testing can be used to improve model performance and computational efficiency.
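The second limitation concerns how messages are grouped into five-minute scenarios within each day. The following is a minimal sketch of such windowing, assuming fixed windows aligned to midnight. The paper's actual preprocessing is not shown in this section, so the function names, the alignment rule, and the sample message texts are illustrative assumptions; only the first timestamp is taken from the case study.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)  # assumed scenario length; adjustable


def window_start(ts: datetime, window: timedelta = WINDOW) -> datetime:
    """Floor a timestamp to the start of its window, counted from midnight."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = ts - midnight
    return midnight + (elapsed // window) * window


def group_into_scenarios(messages, window: timedelta = WINDOW):
    """Map each window start to the list of message texts posted in it."""
    scenarios = defaultdict(list)
    for ts, text in messages:
        scenarios[window_start(ts, window)].append(text)
    return dict(scenarios)


# Hypothetical messages; the texts are placeholders.
messages = [
    (datetime(2018, 2, 2, 9, 24, 39), "message A"),
    (datetime(2018, 2, 2, 9, 21, 5), "message B"),
    (datetime(2018, 2, 2, 9, 25, 0), "message C"),
]

scenarios = group_into_scenarios(messages)
for start, texts in sorted(scenarios.items()):
    print(start.isoformat(), texts)
```

Passing a different `window` value (for example, `timedelta(minutes=15)`) regroups the same messages into longer scenarios, which is one simple way the scenario length could be varied.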
