
Prediction of generalized anxiety levels during the Covid-19 pandemic: A machine learning-based modeling approach.

Faisal Mashel Albagmi¹, Aisha Alansari², Deema Saad Al Shawan³, Heba Yaagoub AlNujaidi³, Sunday O Olatunji².

Abstract

The rapid spread of the Covid-19 outbreak led many countries to enforce precautionary measures such as complete lockdowns. These lifestyle-altering measures caused a significant increase in anxiety levels globally. For that reason, decision-makers are in dire need of methods to prevent potential public mental health crises. Machine learning has shown its effectiveness in the early prediction of several diseases. Therefore, this study aims to classify two-class and three-class anxiety problems early by utilizing a dataset collected during the Covid-19 pandemic in Saudi Arabia. The data were collected from 3017 participants from all regions of the Kingdom via an online survey containing questions to identify factors influencing anxiety levels, followed by questions from the GAD-7, a screening tool for Generalized Anxiety Disorder. The prediction models were built using the Support Vector Machine classifier, chosen for its robust outcomes on medical data, and the J48 Decision Tree, chosen for its interpretability and comprehensibility. The experiments yielded promising results for the early classification of two-class and three-class anxiety problems. The Support Vector Machine classifier outperformed the J48 Decision Tree, attaining a classification accuracy of 100%, precision of 1.0, recall of 1.0, and f-measure of 1.0 using 10 features.


Keywords: Anxiety; COVID-19; Machine learning; Pandemic; Saudi Arabia

Year: 2022 PMID: 35071730 PMCID: PMC8766246 DOI: 10.1016/j.imu.2022.100854

Source DB: PubMed Journal: Inform Med Unlocked ISSN: 2352-9148

Introduction

One of the shared global experiences of the COVID-19 pandemic is the experience of “lockdown,” although the strictness of such restrictions varied from country to country and changed over time. The consequences of lockdowns for mental health have been substantial [1,2,3]. Lockdown conditions lead to social isolation and confinement, which can affect the population's mental health. Furthermore, this crisis can have a broader impact on education, work, and everyday life, with implications for mental health services. The Lancet Psychiatry has highlighted the need for mental health services during lockdowns, especially for the most vulnerable groups such as students, medical professionals, and women [4]. As precautionary measures affect large portions of the population, mental health problems are expected to increase globally [1,5,6]. According to the Anxiety and Depression Association of America, the most common mental health problem in the United States is Generalized Anxiety Disorder (GAD). Approximately one-third of the American population suffers from GAD, but fewer than half of them have access to mental healthcare [7]. Despite extensive research, the magnitude and the underlying factors of GAD during lockdown are unknown. However, there is evidence that early identification and access to mental health treatment may help mitigate the impact of mental health problems, especially GAD. Using technology such as telepsychiatry for frontline medical workers and vulnerable populations during lockdowns could alleviate the mental health effects of the COVID-19 pandemic on the affected population [2]. Research could also guide mental health services such as telepsychiatry to target the most vulnerable population groups, especially during pandemic lockdowns [3].
For that reason, machine learning can be a powerful tool that enables decision-makers to customize mental health services according to the predicted needs of these subpopulations. Early preparation for potential mental health needs is crucial to prevent a mental health crisis. Thompson and his colleagues investigated the delay in seeking treatment for anxiety and mood disorders. They found that people delay seeking help for around 8.2 years. Moreover, they reported two main indicators associated with this delay, slower problem recognition and younger age at onset, as older people take a longer time to contact initial treatment. This delay could be effectively prevented by early prediction of anxiety using machine learning models [8]. Several studies aimed to assess the psychological impact of the pandemic on the population of Saudi Arabia, which enforced a complete lockdown in March 2020 [9]. However, most of these studies lack modeling of the collective effects of GAD on the population in a pandemic. This study addresses this gap by using supervised machine learning algorithms, an explainable artificial intelligence approach, to capture the joint multivariate distribution underlying extensive survey data collected across Saudi Arabia during a lockdown. The choice of machine learning algorithms favored models that have proved successful in various medical applications, namely, Support Vector Machine (SVM) and J48 Decision Tree (DT). SVM is a well-established algorithm for both classification and regression with extensive successful applications in several fields, including medical and biomedical applications, while the decision tree is well known for its clarity and ease of understanding even for non-computer professionals, which, coupled with its excellent performance in various applications, makes it appealing to medical and public health professionals.
Empirical results showed that SVM outperformed the J48 Decision Tree using the ten highest correlated features and the optimized hyperparameters, achieving 100% accuracy in binary anxiety classification [10], with precision of 1.0, recall of 1.0, and f-measure of 1.0. Although the J48 decision tree achieved lower performance, with a highest accuracy of 95%, it offers better explainability, helping non-computer professionals understand how the developed models work. The potential of the proposed machine learning models in mitigating the late effects of anxiety cannot be overemphasized.

Review of related literature

The public mental health crisis during COVID-19 has been studied by several researchers worldwide. According to a study conducted in China in 2020, one-third of the participants reported moderate-to-severe anxiety, and more than half of the participants had a moderate-to-severe psychological impact [11]. Several studies have assessed the psychological impact of the pandemic on the Saudi population. For instance, Albagmi and his colleagues assessed the prevalence of anxiety and associated factors during the lockdown period at the peak of the outbreak in Saudi Arabia. A total of 3,017 respondents from all five main regions of Saudi Arabia completed the survey. The results indicated that 19.6% of the respondents had a moderate to severe level of anxiety during the COVID-19 pandemic [9]. The factors associated with a higher level of anxiety included being female, being a student, being single or divorced, and living with a family member who is vulnerable to COVID-19. Similarly, another study conducted in Saudi Arabia measured the impact of the pandemic on the psychological disposition of 2081 Saudi residents and citizens. According to the results, 7.3% of the respondents had anxiety. Additionally, the researchers concluded that the individuals more likely to develop depression during the pandemic included non-Saudis, divorcees, the elderly, and university students. The factors that correlated with a higher level of anxiety included “Saudi individuals, married people, the unemployed and those with a high income” [12]. Moreover, another study investigated anxiety levels among students in Saudi Arabia during the COVID-19 pandemic. The study revealed that 35% of students experienced moderate to severe anxiety. Female and fourth-year students were more anxious than their counterparts [13]. In recent years, there has been increasing interest in using machine learning models to predict anxiety disorders.
These prediction models are appealing to decision-makers because they can anticipate the potential outcomes of different courses of action. These tools are useful for assessing the potential impact of public mental health crises and understanding the dynamics of their associated factors. Pintelas and colleagues [14] conducted a systematic review of machine learning prediction methods for anxiety disorders. They concluded that the accuracy of these studies depends on the type of prediction method and on how the data were acquired, whether as clinical data or through self-report or screening tools. Across the 16 studies examined, they found that the most frequently used methods for predicting post-traumatic stress disorder (PTSD) and Seasonal affective disorder (SAD) were hybrid methods and Support Vector Machine (SVM), respectively. Also, artificial neural networks (ANNs) and ensemble methods achieved the highest prediction scores. Boeke and his colleagues used neuroimaging measurements to predict trait anxiety, using k-fold cross-validation in a discovery sample of 531 participants (307 women). They concluded that they found no evidence of a generalizable anxiety biomarker across the different methods used [15]. Other studies have predicted GAD among women using data acquired from a self-screening survey. Husain et al. [16] found that the random forest approach showed high prediction accuracy (0.9). This was also investigated by Jothi et al. in 2021, who used the Shapley value for feature selection to predict GAD among women in Malaysia. Elhai et al. collected cross-sectional data from 908 adults from Eastern China. The questionnaire was distributed between 24 February and 15 March 2020, when strict social distancing measures were in place [17]. The authors adopted several instruments to measure Generalized Anxiety Disorder and other mental illnesses. These tools include the GAD-7, the Depression Anxiety Stress Scale-21 (DASS-21), and the Ruminative Responses Scale.
Additionally, the participants were asked about the magnitude of their exposure to pandemic-related news. Furthermore, the researchers utilized multiple machine learning algorithms to identify vulnerability factors for COVID-19-influenced anxiety and the perceived threat of death. The study identified several predictors of anxiety severity, such as stress, rumination, the threat of death from COVID-19, age, negative consequences of illness, news exposure to coronavirus, and the participant's sex. Thompson and his colleagues investigated the delay in seeking treatment for anxiety and mood disorders. They found that people delay seeking help for around 8.2 years. Moreover, they reported two main indicators associated with this delay, slower problem recognition and younger age at onset, as older people take a longer time to contact initial treatment. This delay could be effectively prevented by early prediction of anxiety using machine learning models [8]. Additional studies, methods, results and limitations are described in Table 1.

Table 1

Overview of the reviewed sources arranged by date of publication.

Author(s) Citation	Title of article or chapter	Objective	Method	Findings	Limitations
[17]	Modeling anxiety and fear of COVID-19 using machine learning in a sample of Chinese adults: associations with psychopathology, sociodemographic, and exposure variables	To examine vulnerability factors associated with increased anxiety and fear.	The researchers used the R caret package for machine learning, with packages for the specific algorithms glmnet (lasso, ridge, and elastic net regression), rf (random forest), xgbTree (extreme gradient boosted regression), and svmRadial (support vector machine with a radial basis function kernel).	Stress and rumination were the most relevant variables in modeling COVID-19-related anxiety intensity, according to shrinkage machine learning methods. The most powerful predictor of perceived COVID-19 death threat was health anxiety.	Data came from one geographical area (China). Only self-report measures of psychopathology were included.
[18]	Predictive modeling of depression and anxiety using electronic health records and a novel machine learning approach with artificial intelligence	To identify important predictors for GAD and MDD risk using artificial intelligence	A novel machine learning process was used to re-analyze data from an observational study to tackle the problem of predicting MDD and GAD. The pipeline is an algorithmically diverse collection of machine learning approaches, including deep learning.	Being comfortable with living conditions and having public health insurance were the two most important factors in predicting MDD. Up-to-date vaccinations and marijuana usage were the two most powerful predictors of GAD. The findings show that machine learning algorithms for detecting GAD and MDD based on EHR data have a moderate predictive performance.	The original screening for MDD and GAD outcomes may not have identified all cases in the community. The research originates from French college students, who are likely to have different baselines than other psychiatric populations.
[19]	Predicting generalized anxiety disorder among women using Shapley value	To predict GAD among women using Shapley value	On the mental health data set, the Shapley value was used as the feature selection method for the data mining classifiers.	Feature selection improved the results of the prediction models (Naïve Bayes, Random Forest and J48).	Small sample size (180 participants)
[15]	Toward Robust Anxiety Biomarkers: A Machine Learning Approach in a Large-Scale Sample.	To predict trait anxiety from neuroimaging measurements in humans.	They compared a suite of neuroimaging-based machine learning models using Python to predict anxiety within a discovery sample (n = 531, 307 women) via k-fold cross-validation. The final model (a stacked model incorporating region-to-region functional connectivity, amygdala seed-to-voxel connectivity, and volumetric and cortical thickness data) was then evaluated in a held-out, unseen test sample (n = 348, 209 women).	The stacked model was able to predict anxiety within the discovery sample but failed to generalize to the holdout sample.	The researchers studied a limited set of brain phenotypes and applied a circumscribed set of approaches. They didn't analyze a clinical sample. The imaging sequences used lack the spatial and temporal precision of current approaches.
[20]	Assessment of Anxiety, Depression and Stress using Machine Learning Models	To predict anxiety, depression, and stress using 8 algorithms.	Using data from the online DASS42 tool, eight machine learning algorithms were used to predict the occurrence of psychological issues such as anxiety, depression, and stress.	The prediction accuracy obtained by utilizing the hybrid algorithm was higher than that obtained by using single methods, although the radial basis function network, which falls within the category of neural networks, yielded the highest accuracy.	NA
[6]	Learning the Mental Health Impact of COVID-19 in the United States with Explainable Artificial Intelligence	To focus on learning a ranked list of factors that could indicate a predisposition to a mental disorder during the COVID-19 pandemic.	They surveyed 17,764 adults in the United States. Using Bayesian network inference, they identified key factors affecting mental health during the COVID-19 pandemic.	Using the Bayesian network model, they discovered that patients with a chronic mental disease were more susceptible to mental problems during the COVID-19 pandemic.	The data analyzed is limited to one geographical area (United States)
[21]	Screening of anxiety and depression among seafarers using machine learning technology	To compare performance of different machine learning algorithms for screening of anxiety and depression among the seafarers.	After obtaining the required approval and ethical clearance, 470 sailors were interviewed at the Haldia Dock Complex in India. Five machine learning classifiers, i.e., CatBoost, Logistic Regression, Naïve Bayes, Random Forest, and Support Vector Machine, were evaluated using the Python programming language.	They found that CatBoost appeared to be the best one for predicting anxiety and depression, with accuracy and precision of 82.6% and 84.1%, respectively.	The study emphasized the application of machine learning technology in the field of automated screening for mental health illness.
[22]	Detecting anxiety on Reddit	To detect anxiety-related posts from Reddit using various linguistic features.	Study anxiety disorders through personal narratives collected through the popular social media website, Reddit	Apply N-gram language modeling, vector embeddings, topic analysis, and emotional norms to generate features that accurately classify posts related to binary levels of anxiety.	They achieve an accuracy of 91% with vector space word embeddings, and an accuracy of 98% when combined with lexicon-based features.

There are limited studies on the pandemic's impact on public mental health in Saudi Arabia and its associated factors. For that reason, this paper uses carefully selected machine learning algorithms, namely SVM and the J48 decision tree, to predict anxiety using real-life data collected in Saudi Arabia during the Covid-19 lockdown. In addition, feature selection was systematically carried out and identified the 10 best features, which achieved the highest accuracy out of the 20 available features.

Description of the proposed techniques

The following subsections exhibit a brief description of the machine learning algorithms utilized in the proposed project for anxiety classification.

Support Vector Machine (SVM)

Support Vector Machine (SVM) is a supervised machine learning algorithm introduced by Cortes, Vapnik, and Boser in the 1990s [23]. Many researchers have adopted SVM for its ability to operate with linear and non-linear data and its support for diverse kernel functions. SVM's main advantage, however, is its ability to overcome the curse of dimensionality and to operate successfully with little data by utilizing a generalization control technique [23]. SVM can be employed in both classification and regression problems. However, it is mainly adopted for binary classification [24], where it inspects the training instances and determines a hyperplane that separates the two classes. To obtain an optimal hyperplane, the distance between the support vectors and the hyperplane must be maximized [25], as shown in Fig. 1.

Fig. 1

Maximum hyperplane distance.

Equation (1) gives the mathematical formula of the hyperplane, where w denotes the weight vector, x is an input from the set of labeled training pairs, and b is the bias. Minimizing the norm of the weight vector is essential for finding the optimal hyperplane and obtaining generalization control. Equation (2) gives the formula for finding the optimum hyperplane by employing Lagrangian duality theory, which gives the function more degrees of freedom [26].
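Equations (1) and (2) are not reproduced in this text. As a hedged reference, the standard textbook (hard-margin) forms that match the description are sketched below; they may differ in notation from the authors' own equations. The symbols y_i (class label of the i-th training pair, +1 or -1), α_i (Lagrange multipliers), and n (number of training pairs) are assumptions introduced here for illustration.

```latex
% Eq. (1), standard form: the separating hyperplane
\mathbf{w} \cdot \mathbf{x} + b = 0

% Eq. (2), standard form: the Lagrangian whose saddle point gives the
% maximum-margin hyperplane (minimize over w and b, maximize over alpha)
L(\mathbf{w}, b, \boldsymbol{\alpha})
  = \frac{1}{2} \lVert \mathbf{w} \rVert^{2}
  - \sum_{i=1}^{n} \alpha_i \left[ y_i \left( \mathbf{w} \cdot \mathbf{x}_i + b \right) - 1 \right],
  \qquad \alpha_i \ge 0
```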

J48 decision tree

J48 is Weka's implementation of C4.5, a supervised machine learning algorithm developed by Ross Quinlan [27]. It is a decision tree algorithm that extends the ID3 algorithm [28]. A J48 tree is composed of three main components: interior nodes, which denote the attributes; branches, which give the possible values a node can take; and leaves, which determine the final classification [29]. Fig. 2 shows the basic structure of a J48 decision tree [29].

Fig. 2

Decision tree scheme.

J48 utilizes an enhanced tree pruning procedure to reduce the misclassification error that a noisy training dataset can cause. It also uses the divide-and-conquer approach to partition the data into smaller subsets recursively [30]. As in other decision tree algorithms, the gain is calculated at each step to decide the best attribute for each upcoming node [29]. To calculate the gain, entropy is first computed to evaluate the degree of uncertainty, as shown in Equation (3). In Equation (3), S is the set of samples, c is the number of classes, and P_i denotes the probability of class i in the sample set. The entropy is zero when all samples belong to one class and is maximal when the classes are equally represented [29]. Equation (4) shows how the information gain is calculated; the attribute with the largest information gain is selected [29]. In Equation (4), S_v contains the instances that have the value v in feature A, whereas V(A) contains the values of feature A.
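Since Equations (3) and (4) are not reproduced in this text, the following minimal Python sketch illustrates the entropy and information gain computations as described above, using the standard definitions (entropy as the negative sum of P_i log2 P_i over classes, gain as the entropy of S minus the size-weighted entropy of each subset S_v). The function names and the toy data are hypothetical, not taken from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, labels, feature_index):
    """Entropy of the labels minus the weighted entropy after splitting on one feature."""
    total = len(labels)
    subsets = {}
    # Group the labels by the value v that each row takes for the chosen feature.
    for row, label in zip(rows, labels):
        subsets.setdefault(row[feature_index], []).append(label)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
rows = [(0, 1), (0, 0), (1, 1), (1, 0)]
labels = ["non-anxious", "non-anxious", "anxious", "anxious"]
gain_first = information_gain(rows, labels, 0)
gain_second = information_gain(rows, labels, 1)
```

A tree builder would pick feature 0 here because it has the larger gain. C4.5 implementations typically normalize this quantity into a gain ratio; the sketch follows the plain information gain described in the text.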

Empirical studies

Description of dataset

A cross-sectional study was conducted to assess generalized anxiety disorder (GAD) levels during the COVID-19 pandemic in Saudi Arabia. The data collection took place during the full lockdown from May 11 to May 26, 2020. The researchers adopted the GAD-7, which has been proven to be a valid and efficient tool for screening for GAD. The survey consisted of questions to identify demographic information and potential factors associated with anxiety levels, followed by the seven questions of the GAD-7 tool. The survey was developed as an online QuestionPro questionnaire. It was initially distributed through Sharek Health, an organization that aids in data collection in all Saudi Arabian regions, followed by a snowball sampling strategy to increase the number of participants. The survey was shared via different social media platforms, including Twitter and WhatsApp. The sample included 3017 participants who completed the survey questionnaire with no missing data, as participants were required to complete all the questions. The questions selected from the survey for this study are listed in Table 2; the remaining questions were omitted as irrelevant to this study. One-third (33%) of the participants were male, and more than half were aged 20–39 years (n = 1689, 56%) and married (n = 989, 63.7%).

Table 2

Survey questions.

Variable	Label
Q3	Nationality
Q18	Gender
Q19	Age
Q20	Marital status
Q21	How many people are in the house? (Includes house workers and drivers)
Q22	Are you or any of your household members at increased risk of contracting the coronavirus? (This includes anyone over the age of 60 or pregnant or having comorbidities)
Q24A1	Have you tested positive for COVID-19?
Q24A2	Have you been suspected of carrying the coronavirus?
Q24A3	Has any member of your family been diagnosed with coronavirus?
Q25	Qualification
Q26	Occupation
Q28	What is the method followed by your employer, or academic institution during the pandemic? (Online or in person)
Q30	Feeling nervous, anxious, or on edge
Q31	Not being able to stop or control worrying
Q32	Worrying too much about different things
Q33	Trouble relaxing
Q34	Being so restless that it's hard to sit still
Q35	Becoming easily annoyed or irritable
Q36	Feeling afraid as if something awful might happen
Q37	How difficult have these problems made it to do work, take care of things at home, or get along with other people
Georegion	Geographical region
Anxiety (Two category)	Anxiety two categories (Anxious and non-anxious)
Anxiety (Three category)	Anxiety score three categories (Mild-Moderate-Severe)
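Items Q30 to Q36 are the seven GAD-7 questions, each scored 0 to 3 (consistent with the ranges in Table 3). The sketch below shows the conventional GAD-7 scoring: a total of 0 to 21 with the usual cut points of 5, 10, and 15. How the study mapped totals onto its two-category and three-category targets is not stated in this section, so the function names and the binary threshold of 10 are assumptions for illustration only.

```python
def gad7_total(item_scores):
    """Sum the seven GAD-7 item scores (each 0-3) into a total of 0-21."""
    if len(item_scores) != 7:
        raise ValueError("GAD-7 has exactly seven items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each GAD-7 item is scored 0, 1, 2, or 3")
    return sum(item_scores)


def gad7_severity(total):
    """Conventional GAD-7 severity bands (cut points 5, 10, 15)."""
    if total >= 15:
        return "severe"
    if total >= 10:
        return "moderate"
    if total >= 5:
        return "mild"
    return "minimal"


def is_anxious(total, threshold=10):
    """Binary label; the threshold of 10 is an assumption, not the study's stated rule."""
    return total >= threshold


# Hypothetical respondent answering Q30..Q36.
example_total = gad7_total([1, 1, 1, 1, 1, 1, 1])
example_band = gad7_severity(example_total)
```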


Statistical analysis

Descriptive statistics were used to analyze the dataset attributes. Before the analysis, three attributes were removed: sector, whether the participant was learning/working online or in person (Q28), and anxiety score. The mean, median, standard deviation, maximum, and minimum of the numerical attributes were calculated and are reported in Table 3. Furthermore, the correlation coefficient between each attribute and the target class was computed, and the values were ranked in descending order, as shown in Tables 4 and 5.
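The ranking step can be sketched in plain Python as follows. This is an illustration only: it assumes Pearson's correlation and ranking by absolute value, and the column values are made up rather than taken from the survey.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(columns, target):
    """Return (attribute, |r|) pairs sorted from most to least correlated."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical values for three attributes and a binary target.
columns = {
    "Q31": [0, 1, 2, 3, 3, 0],
    "Q19": [1, 3, 2, 4, 2, 5],
    "Q3": [1, 1, 1, 1, 1, 1],
}
target = [0, 0, 1, 1, 1, 0]
ranking = rank_by_correlation(columns, target)
```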

Table 3

Statistical analysis of the dataset.

Attributes	Mean	Median	Standard Deviation	Max.	Min.
Q3	1.063	1	0.242	2	1
Q18	1.560	2	0.496	2	1
Q19	3.307	3	1.300	6	1
Q20	1.731	2	0.560	4	1
Q21	6.733	7	3.026	30	0
Q22	1.651	2	0.477	2	1
Q24A1	0.002	0	0.048	1	0
Q24A2	0.006	0	0.075	1	0
Q24A3	0.010	0	0.099	1	0
Q25	3.731	4	0.954	5	1
Q26	2.730	2	1.487	6	1
Q30	1.056	1	1.046	3	0
Q31	0.638	0	0.910	3	0
Q32	0.930	1	0.982	3	0
Q33	0.700	0	0.941	3	0
Q34	0.754	0	0.976	3	0
Q35	0.768	0	0.966	3	0
Q36	0.627	0	0.900	3	0
Q37	1.696	2	0.712	4	1
Georegion	1.022	1	0.989	4	0

Table 4

Correlation between each Attribute and the First Experiment Target Attribute.

Attributes	Target Attribute	Correlation coefficient
Q31	Anxiety Two category	0.69032
Q32	Anxiety Two category	0.68472
Q30	Anxiety Two category	0.68466
Q33	Anxiety Two category	0.67673
Q36	Anxiety Two category	0.65965
Q35	Anxiety Two category	0.58508
Q34	Anxiety Two category	0.54546
Q37	Anxiety Two category	0.48791
Q19	Anxiety Two category	0.14877
Q22	Anxiety Two category	0.11936
Q26	Anxiety Two category	0.09987
Q20	Anxiety Two category	0.08589
Q18	Anxiety Two category	0.06622
Q24A2	Anxiety Two category	0.05201
Georegion	Anxiety Two category	0.05052
Q3	Anxiety Two category	0.02726
Q24A3	Anxiety Two category	0.02619
Q21	Anxiety Two category	0.01486
Q25	Anxiety Two category	0.01195
Q24A1	Anxiety Two category	0.00648

Table 5

Correlation between each Attribute and the Second Experiment Target Attribute.

Attributes	Target Attribute	Correlation coefficient
Q31	Anxiety Three category	0.64316
Q30	Anxiety Three category	0.63942
Q32	Anxiety Three category	0.63888
Q33	Anxiety Three category	0.63119
Q36	Anxiety Three category	0.61451
Q35	Anxiety Three category	0.54564
Q34	Anxiety Three category	0.50835
Q37	Anxiety Three category	0.45479
Q19	Anxiety Three category	0.13954
Q22	Anxiety Three category	0.11146
Q26	Anxiety Three category	0.09348
Q20	Anxiety Three category	0.08045
Q18	Anxiety Three category	0.06233
Q24A2	Anxiety Three category	0.04852
Georegion	Anxiety Three category	0.04767
Q3	Anxiety Three category	0.02526
Q24A3	Anxiety Three category	0.02427
Q21	Anxiety Three category	0.01428
Q25	Anxiety Three category	0.01328
Q24A1	Anxiety Three category	0.00807


Experimental setup

The experiments were carried out using Weka, an open-source software package that provides machine learning algorithms, to build the anxiety prediction models. The dataset was used to classify two-class and three-class anxiety problems. The attributes “sector” and “Q28” were excluded from both experiments since they contain missing values that can negatively affect the classification accuracy. The attribute “Anxiety score 1” was also removed from both experiments, as it can directly determine a patient's anxiety class on its own. Additionally, the target class of the first experiment was excluded from the second and vice versa. Afterward, the nominal features were converted to numerical ones using Excel. Two supervised machine learning algorithms were employed to build the models in both experiments: SVM and the J48 decision tree. Hyperparameter tuning was performed to optimize the classifiers. Weka's correlation ranking filter, ‘CorrelationAttributeEval’, was used to obtain the feature subset that attains the highest average accuracy in both the two-class and three-class experiments. Then, 10-fold cross-validation was used to partition the dataset and evaluate the accuracy. Furthermore, to determine the best models for classifying anxiety, confusion matrices were constructed to compare the accuracy, recall, precision, and F-measure of the proposed models.
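To make the 10-fold partitioning concrete, here is a minimal sketch in plain Python. It is not the Weka procedure used by the authors: the function name, the fixed seed, and the plain (unstratified) shuffle are assumptions for illustration.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 3017 is the number of participants reported in the study.
splits = list(k_fold_indices(3017, k=10))
```

Each sample appears in exactly one test fold, so every instance is used once for evaluation and nine times for training.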

Performance measure

Four primary performance measures were utilized in this study: classification accuracy, F-measure, precision, and recall. Equation (5) shows how the classification accuracy, the proportion of correctly classified instances, is calculated. Equations (6) and (7) show how precision and recall are computed: precision is the proportion of instances predicted as positive that are truly positive, and recall is the proportion of actual positive instances that are predicted as positive. Equation (8) shows how the F-measure, which combines precision and recall to estimate the performance on each class, is calculated [31]. In these equations, TP denotes true positive, TN denotes true negative, FP denotes false positive, and FN denotes false negative.
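Equations (5) to (8) are not reproduced in this text. The sketch below computes the four measures from confusion-matrix counts using their standard definitions; the function name and the example counts are hypothetical and are not results from the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F-measure from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        # F-measure is the harmonic mean of precision and recall.
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Made-up counts for illustration.
metrics = classification_metrics(tp=40, tn=50, fp=10, fn=0)
```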

Optimization strategy

Developing an optimum model for medical applications is essential to avoid further complications. Therefore, parameter tuning is a crucial step that must be performed carefully. For the SVM classifiers of the two-class and three-class anxiety problems, only the kernel function and cost (C) were tuned, as the other hyperparameters did not improve the accuracy. The cost was first fixed to its default value in Weka, which is 1, while the kernel functions (Poly Kernel, Normalized Poly Kernel, PUK, and RBF Kernel) were tried individually. The kernel function that achieved the highest accuracy was then tried over a cost range from 1 to 10 to find the optimum accuracy. For the J48 decision tree, only the confidence factor hyperparameter was varied, within a range from 0.15 to 0.95.
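The two-stage SVM search described above can be sketched as follows. The `evaluate` callable stands in for training and cross-validating a classifier and is hypothetical, as are `toy_evaluate` and its scores, which are invented for illustration and are not the study's results. Choosing the smallest cost among tied best scores is also an assumption.

```python
def tune_svm(evaluate, kernels, costs, default_cost=1):
    """Pick the best kernel at the default cost, then sweep the cost for that kernel."""
    # Stage 1: compare kernels with the cost held at its default value.
    best_kernel = max(kernels, key=lambda kernel: evaluate(kernel, default_cost))
    # Stage 2: sweep the cost for the winning kernel only.
    scores = {cost: evaluate(best_kernel, cost) for cost in costs}
    best_score = max(scores.values())
    # Among tied costs, keep the smallest one (assumption).
    best_cost = min(cost for cost, score in scores.items() if score == best_score)
    return best_kernel, best_cost, best_score


def toy_evaluate(kernel, cost):
    """Stand-in scorer returning an invented accuracy in whole percent."""
    base = {
        "Poly Kernel": 98,
        "Normalized Poly Kernel": 95,
        "PUK": 96,
        "RBF Kernel": 90,
    }[kernel]
    return min(100, base + 2 * (cost - 1))


kernels = ["Poly Kernel", "Normalized Poly Kernel", "PUK", "RBF Kernel"]
result = tune_svm(toy_evaluate, kernels, range(1, 11))
```

Tuning one hyperparameter at a time is cheaper than a full grid search, at the cost of possibly missing a kernel that only wins at a non-default cost.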

Two-class anxiety classification

Figs. 3 and 4 show the results of varying SVM's kernel function and cost to find the optimum binary-class classification accuracy; the poly kernel with costs 2 to 10 achieved the best result, and the cost value 2 was chosen as the optimal hyperparameter.

Fig. 3

Tuning Kernel function.

Fig. 4

Tuning the cost.

Table 6 shows the best hyperparameter combinations of SVM that achieved an accuracy of 100% when applied to the whole dataset for binary-class classification.

Table 6

Optimum hyperparameters for the proposed SVM model.

Parameters	Optimal value chosen
Kernel	Poly Kernel
C	2
Epsilon	1.0E-12

Fig. 5 shows the results of adjusting J48's confidence factor with different values, where the confidence factor 0.45 achieved the best outcome for classifying the two-class problem.

Fig. 5

Tuning the confidence factor.

Table 7 shows the best hyperparameter combinations of the J48 decision tree that achieved an accuracy of 95.79% when applied to the whole dataset for classifying the two-class problem.

Table 7

Optimum hyperparameters for the proposed J48 model.

Parameters	Optimal value chosen
Confidence Factor	0.45
MinNumObj	2


Three-class anxiety classification

Figs. 6 and 7 show the results of varying SVM's kernel function and cost to find the optimum three-class classification accuracy; the poly kernel with costs 4 to 10 achieved the best result, and the cost value 4 was chosen as the optimal hyperparameter.

Fig. 6

Optimizing Kernel functions.

Fig. 7

Optimizing the cost.

Table 8 shows the best hyperparameter combinations of SVM that achieved an accuracy of 100% when applied to the whole dataset for classifying the three-class problem.

Table 8

Optimum hyperparameters for the proposed SVM model.

Parameters	Optimal value chosen
Kernel	Poly Kernel
C	4
Epsilon	1.0E-12

For the J48 decision tree, Fig. 8 shows that a confidence factor of 0.15 achieved the best result when classifying the three-class problem.

Fig. 8

Tuning the confidence factor.

Table 9 shows the best hyperparameter combination of the J48 decision tree, which achieved an accuracy of 92.81% when applied to the whole dataset for classifying the three-class problem.

Table 9

Optimum hyperparameters for the proposed J48 model.

Parameters	Optimal value chosen
Confidence Factor	0.15
MinNumObj	2


Results and discussion

Parameter tuning raised SVM's classification accuracy to 100% in both the two-class and three-class classification problems. It also raised the accuracy of J48 to 95.79% for the two-class problem and 92.81% for the three-class problem. Feature selection then improved the J48 classifier's accuracy further by using the feature subset that offered the best performance. For SVM, which had already reached an accuracy of 100% after parameter tuning, feature selection was applied to obtain the same accuracy with fewer features. This simplifies anxiety classification for medical teams, as they will need to collect fewer attributes from patients. This section discusses the results of classifying the binary-class and three-class anxiety problems after feature selection, using the performance measures listed previously. The results were evaluated using 10-fold cross-validation.

Feature selection

The ‘CorrelationAttributeEval’ tool in Weka was applied to the whole dataset to rank the attributes in descending order of their correlation with the target attribute, as shown previously in Tables 2 and 3. A recursive feature elimination procedure then halved the feature set in each iteration until a single feature remained: the V/2 most highly correlated features were carried forward, whereas the V/2 least correlated features were discarded. As shown in Table 10, the highest average accuracy achieved for the two-class problem is 97.98%, where 10 features were used. The top 10 features are Q31, Q32, Q30, Q33, Q36, Q35, Q34, Q37, Q19, and Q22 in descending order.

Table 10

Average accuracy of different feature subsets of the two-class classification experiment.

Number of features	Accuracy of SVM	Accuracy of J48	Average accuracy of each set of features
Using 20 Features	100%	95.79%	97.90%
Using 10 Features	100%	95.96%	97.98%
Using 5 Features	95.76%	95.00%	95.38%
Using 3 Features	92.97%	93.27%	93.12%
Using 2 Features	91.95%	91.51%	91.73%
Using 1 Feature	90.19%	90.19%	90.19%
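The subset sizes in Table 10 (20, 10, 5, 3, 2, 1) follow from repeatedly keeping the top half of the ranked features. A minimal sketch of that schedule is given below; rounding odd halves up is an assumption inferred from the reported sizes (5 to 3, 3 to 2), and the function name `halving_subsets` and the feature names `f1` to `f20` are hypothetical placeholders.

```python
import math
from typing import List


def halving_subsets(ranked: List[str]) -> List[List[str]]:
    """Return nested subsets, keeping the top half (rounded up) each round."""
    subsets = [list(ranked)]
    while len(subsets[-1]) > 1:
        keep = math.ceil(len(subsets[-1]) / 2)
        # The list is ranked by correlation, so the first `keep` entries survive.
        subsets.append(subsets[-1][:keep])
    return subsets


# Placeholder names; only the ranking order matters for the schedule.
ranked_features = [f"f{i}" for i in range(1, 21)]
sizes = [len(s) for s in halving_subsets(ranked_features)]
print(sizes)  # [20, 10, 5, 3, 2, 1]
```

Each subset would then be passed to both classifiers, and the subset with the highest average accuracy retained.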

As shown in Table 11, the highest average accuracy achieved for the three-class problem is 96.75%, where 10 features were used. The top 10 features are Q31, Q30, Q32, Q33, Q36, Q35, Q34, Q37, Q19, and Q22 in descending order. It is concluded that the best feature subset is the same for both classification problems, with a slight difference in the order.

Table 11

Average accuracy of different feature subsets of the three-class classification experiment.

Number of features	Accuracy of SVM	Accuracy of J48	The average accuracy of each set of features
Using 20 Features	100%	92.81%	96.40%
Using 10 Features	100%	93.50%	96.75%
Using 5 Features	93.14%	91.48%	92.31%
Using 3 Features	89.63%	89.96%	89.79%
Using 2 Features	87.11%	88.66%	87.89%
Using 1 Feature	85.18%	86.77%	85.98%


Results of the two-class anxiety classification

Table 12 compares the performance of the classifiers SVM and J48 after parameter tuning and feature selection for classifying the two-class problem. The classifiers’ performance is evaluated according to the performance measures mentioned earlier. As shown in Table 12, SVM achieved the most reliable performance with a classification accuracy of 100%, whereas J48 achieved 95.96%.

Table 12

Results of classifiers after optimization and feature selection of the two-class classification experiment.

Performance Measure	SVM	J48
Accuracy (%)	100	95.96
Precision	1	0.974
Recall	1	0.975
f-measure	1	0.975

Table 13

SVM Confusion matrix after Optimization and Feature Selection of the Two-class Classification Experiment.

		Predicted
		Anxiety	Non-Anxiety
Actual	Anxiety	2425 (TP)	0 (FN)
Actual	Non-Anxiety	0 (FP)	592 (TN)

Tables 13 and 14 present the confusion matrices of the classifiers SVM and J48. Since the experiment is based on medical diagnosis, the number of false negatives (FN) is the most significant evaluator, as undiagnosed anxiety may cause insomnia (sleep disorder) and mental trouble. SVM produced 0 false negatives, whereas J48 produced 60. Therefore, it is concluded that SVM is more powerful for classifying the two-class anxiety problem than the J48 Decision Tree.
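As a check on how the reported measures relate to the confusion matrices, the sketch below recomputes them from the counts in Tables 13 and 14, treating "Anxiety" as the positive class as those tables do. The function name `binary_metrics` and the key `fn_rate` (false negatives divided by actual positives) are illustrative choices, not names from the study.

```python
def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute standard measures from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
        "fn_rate": fn / (tp + fn),  # share of actual positives that were missed
    }


# Counts taken from Table 13 (SVM) and Table 14 (J48).
svm = binary_metrics(tp=2425, fn=0, fp=0, tn=592)
j48 = binary_metrics(tp=2365, fn=60, fp=62, tn=530)

for name, m in (("SVM", svm), ("J48", j48)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

For J48 this gives an accuracy of 95.96%, precision 0.974, recall 0.975, and f-measure 0.975, matching Table 12, and the 60 false negatives correspond to roughly 2.5% of the actual anxiety cases.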

Results of the three-class anxiety classification

Table 15 compares the performance of the classifiers SVM and J48 after parameter tuning and feature selection for classifying the three-class problem. The classifiers’ performance is evaluated according to the performance measures mentioned earlier. As shown in Table 15, SVM achieved the most reliable performance with a classification accuracy of 100%, whereas J48 achieved an overall accuracy of 93.50%.

Table 15

Results of classifiers after optimization and feature selection of the three-class classification experiment.

Performance Measure	SVM	J48
Accuracy (%)	100	93.50
Precision	1	0.933
Recall	1	0.935
f-measure	1	0.934

Table 14

J48 confusion matrix after optimization and feature selection of the two-class classification experiment.

		Predicted
		Anxiety	Non-Anxiety
Actual	Anxiety	2365 (TP)	60 (FN)
Actual	Non-Anxiety	62 (FP)	530 (TN)

Table 16

SVM Confusion matrix after Optimization and Feature Selection of the Three-class Classification Experiment.

		Predicted
		Mild	Moderate	Severe
Actual	Mild	2425	0	0
	Moderate	0	247	0
	Severe	0	0	345

Tables 16 and 17 present the confusion matrices of the classifiers SVM and J48 for classifying the three-class anxiety problem. Unlike binary classification, where the TP, TN, FP, and FN values can be read directly from the tables, in multi-class classification they must be calculated per class for easier interpretation. As shown in Table 16, SVM made no false predictions. Table 18 presents the TP, TN, FP, and FN counts for each class for the J48 classifier. As shown, J48 produced 196 false negatives in total across the three classes. Therefore, it is also concluded that SVM is more powerful for classifying three-class anxiety problems than J48.

Table 17

J48 confusion matrix after optimization and feature selection of the three-class classification experiment.

		Predicted
		Mild	Moderate	Severe
Actual	Mild	2375	0	76
	Moderate	0	211	34
	Severe	50	36	235

Table 18

J48 TP, FP, FN, and TN rates of the Three-class Classification Experiment.

		Class
		Mild	Moderate	Severe
Rate	TP	2375	211	235
	FP	50	36	110
	FN	76	34	86
	TN	516	2736	2586
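The per-class values in Table 18 can be derived from the confusion matrix in Table 17 with a one-vs-rest calculation. The sketch below is a minimal illustration that reads Table 17 as printed, with rows as actual classes and columns as predicted classes; the function name `one_vs_rest_counts` is a hypothetical choice.

```python
from typing import Dict, List


def one_vs_rest_counts(
    matrix: List[List[int]], labels: List[str]
) -> Dict[str, Dict[str, int]]:
    """Derive TP, FP, FN, TN per class (rows = actual, columns = predicted)."""
    total = sum(sum(row) for row in matrix)
    counts = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                 # rest of the actual-class row
        fp = sum(row[i] for row in matrix) - tp  # rest of the predicted column
        counts[label] = {"TP": tp, "FP": fp, "FN": fn, "TN": total - tp - fp - fn}
    return counts


labels = ["Mild", "Moderate", "Severe"]
# J48 three-class confusion matrix from Table 17.
j48_matrix = [
    [2375, 0, 76],
    [0, 211, 34],
    [50, 36, 235],
]
counts = one_vs_rest_counts(j48_matrix, labels)
total_fn = sum(c["FN"] for c in counts.values())
for label in labels:
    print(label, counts[label])
print("Total FN:", total_fn)
```

The output reproduces every entry of Table 18, and the per-class false negatives (76, 34, and 86) sum to the 196 reported for J48.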


Comparing the achieved result for classifying two-class and three-class anxiety problems

From the experiment results stated above, it is concluded that SVM correctly classified all cases in both the two-class and three-class problems. In contrast, J48 achieved an accuracy of 95.96% for two-class classification and 93.50% for three-class classification, which indicates that J48 performed better in the two-class classification experiment. The reason J48 performed better in the two-class experiment is the slight difference in the correlation coefficients between the attributes and the two-class target compared to the three-class target, as listed in Tables 3 and 4. Table 19 compares the accuracies of the classifiers in the two-class and three-class experiments.

Table 19

Comparing the accuracies of classifiers in 2-class and 3-class Experiments.

Classifier	Anxiety Two-class	Anxiety Three-class
SVM	100%	100%
J48	95.96%	93.50%


Further discussions

This paper aims to predict anxiety using machine learning techniques to study the pandemic's impact on Saudi Arabia's society. According to the previously discussed tables and figures, SVM outperformed the J48 Decision Tree, attaining accuracy, precision, recall, and f-measure of 100%, 1.0, 1.0, and 1.0, respectively, in classifying both two-class and three-class problems. For further evaluation, Receiver Operating Characteristic (ROC) curves were constructed, and the Area Under the ROC curve (AUROC) was used to summarize performance across the confusion matrices produced at each threshold. Figs. 9 and 10 show the ROC curves of SVM in classifying the two-class and three-class problems. From these figures, it is concluded that SVM provided a perfect prediction, reaching an AUROC value of 1.0 in classifying both the two-class and three-class anxiety problems.

Fig. 9

SVM ROC curve for classifying two-class problem: (a) Class zero (b) Class one.

Fig. 10

SVM ROC curve for classifying three-class problem: (a) Class zero (b) Class one (c) Class two.

Figs. 11 and 12 show the ROC curves of the J48 Decision Tree in classifying the two-class and three-class problems. From these figures, it is concluded that J48 performed better on the two-class problem than on the three-class problem, reaching an AUROC value of 0.9397 against an average AUROC of 0.9170. These outcomes are consistent with the expectation that increasing the number of output classes increases the complexity of the model, making good results harder to obtain. Hence, it is usually better to have fewer classes in the output variable to achieve better results.

Fig. 11

J48 ROC curve for classifying two-class problem: (a) Class zero (b) Class one.

Fig. 12

J48 ROC curve for classifying three-class problem: (a) Class zero (b) Class one (c) Class two.


Conclusion and recommendation

A Saudi Arabian dataset was utilized for the first time in this study to build a prediction model that classifies anxiety into two and three categories during the COVID-19 pandemic. The authors utilized two classifiers, namely, the Support Vector Machine (SVM) and the J48 Decision Tree, due to their reliable outcomes in medical-related data. The optimal hyperparameters were obtained, and the effect of feature selection was examined to build the model with a reduced feature subset. The empirical results showed that SVM outperformed the J48 Decision Tree, with 100% accuracy against 95.96% for the two-class problem and 93.50% for the three-class problem, when predicting anxiety for earlier diagnosis and timely intervention using ten features. Therefore, the researchers recommend that decision-makers in Saudi Arabia adopt the prediction model produced by this study to strategically plan the distribution of both preventative and curative mental health care services.

Declaration of competing interest

The research was funded by Imam Abdulrahman Bin Faisal University, the grant number is Covid19-2020-024-CAMS.
