
Internet addiction disorder detection of Chinese college students using several personality questionnaire data and support vector machine.

Zonglin Di¹, Xiaoliang Gong¹, Jingyu Shi², Hosameldin O A Ahmed³, Asoke K Nandi³,¹.

Abstract

The unprecedented development of the Internet has brought with it the challenge of Internet Addiction (IA), which is hard to diagnose and cure according to state-of-the-art research. In this study, we explored the feasibility of machine learning methods to detect IA. We acquired a dataset consisting of 2397 Chinese college students from the University (Age: 19.17 ± 0.70, Male: 64.17%) who completed the Brief Self Control Scale (BSCS), the 11th version of the Barratt Impulsiveness Scale (BIS-11), the Chinese Big Five Personality Inventory (CBF-PI) and the Chen Internet Addiction Scale (CIAS), where CBF-PI includes five sub-features (Openness, Extraversion, Conscientiousness, Agreeableness, and Neuroticism) and BIS-11 includes three sub-features (Attention, Motor and Non-planning). We applied Student's t-test to the dataset for feature selection and Support Vector Machines (SVMs), including C-SVM and ν-SVM, with grid search for classification and parameter optimization. This work illustrates that SVM is a reliable method for the assessment of IA and for questionnaire data analysis. The best IA detection accuracy, 96.32%, was obtained by C-SVM on the 6-feature dataset without normalization. Finally, BIS-11, BSCS, Motor, Neuroticism, Non-planning, and Conscientiousness are shown to be promising features for the detection of IA.


Keywords: Feature selection; IA detection; Internet addiction (IA); Personality questionnaire; Support vector machine

Year: 2019 PMID: 31508477 PMCID: PMC6726843 DOI: 10.1016/j.abrep.2019.100200

Source DB: PubMed Journal: Addict Behav Rep ISSN: 2352-8532

Introduction

Background

In recent years, the development of the Internet has brought many benefits to our society. However, it has also caused the problem of Internet Addiction Disorder (IAD), which is also known as Pathological Internet Use (PIU). Since IAD was first put forward by Ivan Goldberg in 1995 (Abbott, Cramer, & Sherrets, 1995; Young, 1998b), it has become a social-psychological problem, and many researchers have been working on this topic (Dongyun, Ni'na, & Yao, 2018; Griffiths, 2018; Wiederhold, 2018). Although IAD was not officially added to the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) in 2013, Internet Gaming Disorder (IGD) was included in Section III, illustrating the importance of this area for further study (Petry & O'Brien, 2013; Cho et al., 2014; Hahn, Reuter, Spinath, & Montag, 2017; American Psychiatric Association, 2013; Spada, 2014). IAD is a compulsive-impulsive spectrum disorder which includes five specific types of addiction: cyber-sexual addiction, cyber-relationship addiction, net compulsions, information overload and computer addiction (Young, 1998a, 1998b; Nordegren, 2002). IAD can lead to marriage breakdown, job loss, financial problems, academic failure and even death (Whang, Lee, & Chang, 2003; Young, 2004). Many studies indicate that IAD is a multi-dimensional construct that depends on many factors, such as mental health, age, peer influence, social support, family relationships, parental mental health, emotion dysregulation, alexithymia personality and so on (Mo, Chan, Chan, & Lau, 2018; Xiuqin et al., 2010). Among these factors, certain personality traits such as self-control, impulsivity and the Big Five personality dimensions (Openness, Extraversion, Conscientiousness, Agreeableness and Neuroticism) are regarded as closely associated with IAD (Ismail & Zawahreh, 2017; Lam, Peng, Mai, & Jing, 2009; Musetti et al., 2016; Treuer, Fábián, & Füredi, 2001; Zhou, Li, Li, Wang, & Zhao, 2017).
Although IAD can be found in any age group and every occupation, young people are more vulnerable to it, and once addicted to the Internet they tend to reach a deeper level of addiction. In the digital age, the Internet has improved proficiency in certain courses. However, more and more reports point to the problem of addictive Internet use. Globally, it is estimated that 4–12% of adolescents may demonstrate IAD, although the definition of IAD varies a lot (Petry & O'Brien, 2013; Yau, Crowley, Mayes, & Potenza, 2012). IAD was reported in 15.3% of university freshmen in Taiwan and 20.3% of adolescents in South Korea (Ha et al., 2006; Lin, Ko, & Wu, 2011). In China, the rates ranged from 2.4%–5.5% in Hunan Province to 6.4% in Shanxi Province (Mei, Yau, Chai, Guo, & Potenza, 2016). Among adolescents, college students are more likely to become trapped by the Internet because of academic pressure, unlimited Internet access and newly experienced freedom from parental control (Young, 2004).

The previous work of IAD detection

IAD was first put forward over twenty years ago, and many researchers have studied either the factors behind this disorder or how to understand it. Ko et al. (2006), using t-tests and logistic regression, found that adolescents with high novelty seeking, high harm avoidance and low reward dependence are more likely to become addicted to the Internet. Kayiş et al. (2016) investigated the relationship between the Big Five personality traits and Internet addiction through a meta-analysis that covered 12 studies and calculated 13 effect sizes; they found that openness to new experiences, conscientiousness, extraversion and agreeableness were negatively related to IAD whereas neuroticism was positively related to IAD. Resilience was found to be protective against IAD (Robertson, Yan, & Rapoza, 2018). Fumero, Marrero, Voltes, and Peñate (2018) showed that personal factors have a greater impact on IAD. Machine learning has been widely used in bioinformatics, brain-machine interfaces, medical diagnosis and other medical areas (Baldi & Brunak, 2001; Choi et al., 2018; Müller et al., 2008; Zhang et al., 2018; Kononenko, 2001; Giger, 2018; Guo & Nandi, 2006). However, there are few studies on how to detect IAD using machine learning methods. Wang, Zhang, and Zhang (2008) proposed a Fuzzy Neural Network (FNN) method to forecast the pattern of network addiction, where the number of layers is 3 and the features used are online hours, frequency, the reason for using the Internet, determination, socialization, Internet skills, attitude to the Internet and whether the user surfs the Internet all night. The source of the dataset used in that work is not mentioned, and the number of subjects is 10.
Gong et al. (2016) used clustering methods, including K-Means (Hartigan & Wong, 1979; Wagstaff, Cardie, Rogers, Schrödl, et al., 2001), Hierarchical Clustering (Johnson, 1967; Navarro, Frenk, & White, 1997) and Fuzzy C-Means Clustering (Bezdek, Ehrlich, & Full, 1984), together with personality data to predict Internet Gaming Disorder (IGD) among Chinese college freshmen. The dataset comes from 580 freshmen from the University, and the features used include Self-Control, Attention, Motor, Non-planning, Extraversion, Agreeableness, Conscientiousness, Neuroticism and Openness. Based on the same dataset, Di, Gong, Shi, Ahmed, and Nandi (2017) used Support Vector Machines (SVMs) and personality questionnaire data to detect IAD among Chinese college students. The authors compared the performance of C-SVM and ν-SVM and examined the influence of sex and age on IAD. Their work indicated that IAD can be detected with personality questionnaires and SVMs. In IAD detection, the Chen Internet Addiction Scale (CIAS) is one of the standard questionnaires used as the criterion for IAD. It has been frequently used in many related areas, such as resting-state fMRI studies, task-related fMRI studies and studies of the correlation between IAD and other factors. Its internal reliability is in the range from 0.79 to 0.93, which shows its effectiveness in statistical research (Ko et al., 2009; Chen et al., 2015; Dong, Lin, & Potenza, 2015; Liu et al., 2014; Chen, Weng, Su, Wu, & Yang, 2003; Mak et al., 2014; Yen, Ko, Yen, Wu, & Yang, 2007; Mo et al., 2018; Chern & Huang, 2018; Lei, Li, Chiu, & Lu, 2018; Lau, Wu, Gross, Cheng, & Lau, 2017; Chang, Chiu, Lee, Chen, & Miao, 2014). So far, many works on IAD (Dieris-Hirche et al., 2017; Geng, Han, Gao, Jou, & Huang, 2018; Lam, 2015; Mahapatra & Sharma, 2018; Robertson et al., 2018; Romano, Truzoli, Osborne, & Reed, 2014) investigate the effect of only a single factor or feature, which limits how completely this multi-dimensional disorder can be studied.
One of the distinctive and important aspects of the current study is the use of several questionnaires and multiple factors simultaneously.

The current study

To test the reliability of machine learning methods in IA detection, this work used a larger dataset, compared the performance of several SVM methods and FNN, and used grid search to optimize the parameters. Unlike other works, data from several related questionnaires were collected in order to find the most distinguishing features in the following steps. These questionnaires are the Brief Self Control Scale (BSCS), the 11th version of the Barratt Impulsiveness Scale (BIS-11), the Chinese Big Five Personality Inventory (CBF-PI) and CIAS. The details of these questionnaires are given in the next section. After data acquisition, Student's t-test was used for feature selection to obtain several datasets, each with the same set of students but different features. Then the performance on these datasets of Support Vector Machines (SVMs) (Cortes & Vapnik, 1995), with grid search for parameter selection, was compared, and 10-fold cross validation was used to avoid the overfitting problem. Furthermore, the runtime of the proposed method and others was compared. Finally, a model for IA detection using C-SVM with an accuracy of about 96.3% was found.

Materials and methods

Questionnaires

The 11th version of Barratt Impulsiveness Scale (BIS-11)

BIS-11 (Barratt, 1959; Patton, Stanford, et al., 1995) is a 30-item questionnaire that evaluates impulsivity by summing three sub-scale values: Attention, Motor and Non-planning impulsivity. It has been widely used around the world to measure impulsivity for fifty years. Each item is scored on a frequency scale (1 for Never; 4 for Almost Always/Always). The internal consistency coefficient of the BIS-11 total score is from 0.79 to 0.83 (Mayhew & Powell, 2014), Cronbach's α is 0.794 in Chinese children (Li, 2006), and its validity has also been confirmed among Chinese respondents (Cao, Su, Liu, & Gao, 2007; Yang, 2007; Yao et al., 2007).

Chinese Big Five Personality Inventory (CBF-PI)

CBF-PI (McCrae & John, 1992; Poropat, 2009; Zhou, Niu, & Zou, 2000) is a restricted version of the Big Five Inventory (BFI) that evaluates Openness, Conscientiousness, Extraversion, Agreeableness and Neuroticism in one's personality, and it works well for Chinese college students (Costa & McCrae, 1992; Thompson, 2008). It consists of 44 items, and the respondent rates each item on a 5-point scale where 1 stands for strongly disagree and 5 for strongly agree. A higher score indicates a higher level in the specific sub-dimension. The internal consistency coefficients are from 0.78 to 0.85 (Wang, Dai, & Yao, 2011; Wang, Jackson, Zhang, & Su, 2012) and Cronbach's α is in the range from 0.721 to 0.777 among Chinese respondents (Carciofo, Yang, Song, Du, & Zhang, 2016). It is reported that the Big Five Inventory can help BIS-11 evaluate one's impulsivity and give a more specific evaluation result (Whiteside & Lynam, 2001).

Brief Self Control Scale (BSCS)

BSCS (Tangney, Baumeister, & Boone, 2004) is a questionnaire that measures dispositional self-regulatory behaviors using 13 items rated on a 5-point scale, ranging from 1 (Not at all like me) to 5 (Very much like me). The internal consistency coefficient is from 0.73 to 0.84 (Mathews, Youman, Stuewig, & Tangney, 2007). Cronbach's α of the Self Control Scale (SCS) is 0.89 (Mei et al., 2016) and the SCS in Chinese fit better (χ2/df = 3.96, GFI = 0.94, TLI = 0.81, RMSEA = 0.06) (Qu & Zou, 2009). BSCS has a very good reputation in self-control assessment and has been widely used in many studies, especially in the domains of school/work, eating/weight, interpersonal functioning, and wellbeing/adjustment (Malouf et al., 2014).

Chen Internet Addiction Scale (CIAS)

CIAS (Chen et al., 2003) is a 26-item self-report measure for evaluating Internet addiction, in which each item is scored from 1 (Does not match my experience at all) to 4 (Definitely matches my experience). The total score is in the range from 26 to 104. Compared with the Internet Addiction Test, CIAS is better suited to Chinese students. Cronbach's α of the CIAS for the sample is 0.94 (Chang et al., 2014). A higher CIAS score indicates greater severity of addiction to Internet activity. As for the cut-off points of CIAS, the screening cut-off point is 57 while the diagnostic cut-off point is 63 (Ko, Yen, Chen, Chen, & Yen, 2005). To make the study accurate and general, we used the intersection of the two criteria. Those with a score higher than 63 are considered very likely to suffer from IAD while those with a score lower than 57 are not, which is also consistent with the suggested threshold score of 63/64 that provides good diagnostic accuracy for Internet addiction among adolescents (Chang et al., 2014; Ko et al., 2005).

Participants

A total of 3123 undergraduate college students participated in the study, and ethics approval for data acquisition was obtained from the Human Research Ethics Committee of the University. Among these 3123 participants, 2397 students gave valid questionnaire answers. Results from participants who did not finish all questionnaires or who gave the same answer to every item of a questionnaire were considered invalid. The age of the participants ranges from 16.91 to 25 years. The details of all participants and of the valid ones are shown in Table 1, which shows that the valid dataset and the whole dataset are generally the same in age and gender. For Sex in Table 1, male was labelled as 1 and female as −1. Sex in Table 1 and Table 2 is processed in the same way and has the same meaning, namely to illustrate the sex ratio, which is convenient for feature selection and calculation in our experiments. Because all the values of Sex are integers, the standard deviation (std.) has no meaning for Sex and we omitted it in Table 1 and Table 2.

Table 1

The details about the participants and the valid ones.

Name	Participants	The valid
Male	2004 (64.17%)	1686 (60.98%)
Female	1119 (35.83%)	1079 (39.02%)
Total	3123 (100.00%)	2765 (100.00%)
Age	19.17 ± 0.70	19.17 ± 0.68
Gender	0.21	0.22

Table 2

Mean value and standard deviation of every feature between two groups used in the experiments.

Name	Normal	IA
Gender	0.24	0.17
Age	19.19 ± 0.67	19.13 ± 0.73
Attention	37.98 ± 9.44	35.79 ± 4.69
Motor	20.16 ± 5.81	25.41 ± 5.94
Non-planning	22.39 ± 6.81	27.70 ± 5.55
Openness	51.66 ± 9.97	48.83 ± 9.19
Conscientiousness	50.92 ± 9.64	43.84 ± 9.44
Extroversion	50.71 ± 10.28	47.09 ± 9.69
Agreeableness	51.43 ± 10.15	47.49 ± 10.40
Neuroticism	48.82 ± 9.44	56.90 ± 10.58
BIS-11 total score	80.53 ± 18.07	88.89 ± 16.17
CBF-PI total score	253.55 ± 49.49	244.12 ± 49.30
Self-Control Scale	22.18 ± 6.81	27.70 ± 5.54
CIAS score	39.74 ± 10.44	76.42 ± 11.05


Methods

In this study, we explored IA detection using machine learning methods. First, we selected the valid responses from the collected questionnaires. After that, we applied Student's t-test for feature selection and two linear normalizations, which gave us several datasets. Then, we compared the performance of C-SVM and ν-SVM on the different datasets with and without grid search. FNN was chosen as the comparison method for SVM. Finally, we analyzed the relationship among the grid search parameters, accuracy and computation time. The whole process is illustrated in Fig. 1.

Fig. 1

The Framework of the present study.

Data pre-processing

After data acquisition, we summed the scores for each participant separately for the 4 questionnaires. For BIS-11 and CBF-PI, we calculated the score of each sub-feature (Attention, Motor, Non-planning, Openness, Conscientiousness, Extroversion, Agreeableness and Neuroticism) as well as the total score of each of the 4 questionnaires. Then the dataset was divided into two groups according to the CIAS scores. Those whose score is less than 57 are regarded as the low IA group (1598 participants) while those with a score over 63 belong to the high IA group (799 participants). The mean value and standard deviation are shown in Table 2.
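The grouping rule above can be sketched in a few lines of Python. This is only an illustration: the function name `label_by_cias` and the sample scores are hypothetical, and dropping scores that fall between the two cut-offs is an assumption based on the description of the two groups, not a documented step.

```python
from typing import Optional

SCREENING_CUTOFF = 57   # CIAS screening cut-off quoted in the text
DIAGNOSTIC_CUTOFF = 63  # CIAS diagnostic cut-off quoted in the text


def label_by_cias(score: int) -> Optional[int]:
    """Return 0 for the low IA group, 1 for the high IA group, None otherwise."""
    if score < SCREENING_CUTOFF:
        return 0
    if score > DIAGNOSTIC_CUTOFF:
        return 1
    return None  # scores from 57 to 63 lie between the two cut-offs (assumed excluded)


scores = [39, 57, 60, 63, 64, 76]  # made-up CIAS totals, for illustration only
kept = [(s, label_by_cias(s)) for s in scores if label_by_cias(s) is not None]
print(kept)  # [(39, 0), (64, 1), (76, 1)]
```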

Normalization

After the data pre-processing, two linear normalizations were applied to the dataset so that attributes with large numeric values would not dominate those with small values and so that the calculation cost would decrease. They mapped the range of every feature to [−1, 1] and [0, 1] respectively. Thus, we obtained three kinds of datasets with different value ranges: the original datasets, the datasets scaled to [0, 1] and the datasets scaled to [−1, 1].
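A linear normalization of this kind is commonly written as min-max scaling. The sketch below assumes that form; the function name `min_max_scale` and the three sample values are hypothetical and not taken from the study.

```python
def min_max_scale(values, lo=0.0, hi=1.0):
    """Linearly map a list of numbers onto the interval [lo, hi]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # A constant feature carries no information; map it to the lower bound.
        return [lo for _ in values]
    span = v_max - v_min
    return [lo + (hi - lo) * (v - v_min) / span for v in values]


feature = [20.0, 25.0, 30.0]          # made-up scores for one feature
print(min_max_scale(feature))         # [0.0, 0.5, 1.0]
print(min_max_scale(feature, -1, 1))  # [-1.0, 0.0, 1.0]
```

Each feature column would be scaled independently, so that every feature ends up in the same target range.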

Student's t-test for feature selection

We applied Student's t-test for feature selection. The results of Student's t-test are h, p and C. If h is 1, the null hypothesis is rejected at the chosen significance level; if h is 0, it is not rejected. p is the probability, under the null hypothesis, of observing a test statistic at least as extreme as the one obtained, so a small p casts doubt on the validity of the null hypothesis. C contains the lower and upper boundaries of the 95% confidence interval.
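To make the roles of h and p concrete, the following sketch computes a pooled-variance two-sample t statistic using only the Python standard library. It is an illustration under stated assumptions, not the authors' implementation: the p-value uses a normal approximation to the t distribution, which is only reasonable for large samples, the confidence interval C is omitted, and the function name and sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance


def two_sample_t(a, b, alpha=0.05):
    """Return (h, p, t) for a pooled-variance two-sample t-test.

    h is 1 when the null hypothesis of equal means is rejected at level alpha.
    The p-value uses a normal approximation (an assumption of this sketch).
    """
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(a) - mean(b)) / std_err
    p = 2 * (1 - NormalDist().cdf(abs(t)))  # two-sided p-value
    h = 1 if p < alpha else 0
    return h, p, t


normal_group = [20, 21, 19, 22, 20, 18, 21, 20]  # made-up feature scores
ia_group = [25, 26, 24, 27, 25, 23, 26, 25]      # made-up feature scores
h, p, t = two_sample_t(normal_group, ia_group)
print(h, t < 0)  # h is 1 here: the group means differ clearly
```

A feature with h equal to 1 and a very small p separates the two groups well, which is the basis for ranking features by p-value.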

Grid search

For an SVM using the radial basis function (RBF) kernel introduced in the following part, there are two parameters to choose (C and g), and the best values are not known in advance. Grid search is a method in which various (C, g) pairs in a certain range are tried and the pair with the best cross-validation accuracy is selected. C and g are both in the range [2⁻⁸, 2⁸]. The smaller the steps of C (C-step) and g (g-step) are, the more accurate the results we can get. The original rule of C and g in this range is: [equation missing]. To make the result as accurate as possible, we set the updating rule as: [equation missing], where C-step and g-step are both 1 in these two rules.
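The search procedure can be sketched as follows. The stepping rule here, in which the exponent of 2 advances by the step size, is an assumption borrowed from common SVM practice and is not necessarily the rule used in the study. The names `exponent_grid`, `grid_search` and `toy_score` are hypothetical; in the study, the score of a (C, g) pair would be its cross-validation accuracy.

```python
import math


def exponent_grid(low_exp=-8, high_exp=8, step=1):
    """Powers of two from 2**low_exp to 2**high_exp, exponent advancing by step."""
    return [2.0 ** e for e in range(low_exp, high_exp + 1, step)]


def grid_search(score_fn, c_step=1, g_step=1):
    """Try every (C, g) pair on the grid and return (best_score, C, g)."""
    best = None
    for c in exponent_grid(step=c_step):
        for g in exponent_grid(step=g_step):
            score = score_fn(c, g)
            if best is None or score > best[0]:
                best = (score, c, g)
    return best


def toy_score(c, g):
    """Stand-in for cross-validation accuracy, peaking at C = 4 and g = 0.25."""
    return -abs(math.log2(c) - 2) - abs(math.log2(g) + 2)


best_score, best_c, best_g = grid_search(toy_score)
print(best_c, best_g)  # 4.0 0.25
```

Larger steps shrink the grid and the run time but may skip over the best pair, which is the trade-off examined later in the results.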

Support vector machine

Support Vector Machine (SVM) is widely used in binary classification because its original idea is to find a hyperplane that separates the two classes with the maximum margin (Mu & Nandi, 2007) (see Fig. 2). The main advantage of SVM is its ability to solve non-linear classification problems using a kernel function. The kernel function we used in this research is the RBF function, which nonlinearly maps samples into a higher dimensional space and also keeps the number of hyper-parameters small (Hsu, Chang, Lin, et al., 2003). The kernel function we used is:

$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right)$$

where σ is the width of the RBF function, adjusted by the user.
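As a small illustration of how an RBF kernel behaves, the sketch below assumes the common Gaussian form K(x, y) = exp(−‖x − y‖² / (2σ²)). Under that assumption, a parameter g that multiplies the squared distance would correspond to 1 / (2σ²). The function name `rbf_kernel` and the sample points are hypothetical.

```python
import math


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF similarity between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


p = [0.0, 0.0]
q = [1.0, 1.0]
print(rbf_kernel(p, p))             # 1.0 for identical points
print(rbf_kernel(p, q, sigma=1.0))  # exp(-1), about 0.368
print(rbf_kernel(p, q, sigma=4.0))  # closer to 1: a wider kernel decays more slowly
```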

Fig. 2

The illustration of SVM in 2-feature dataset. The filled points on the dashed line indicate the support vectors.

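In practice, most sample spaces cannot be separated completely, so a slack variable ξ is added to adjust the classifier when there are non-separable cases. To control the degree of adjustment, a new parameter C is defined as a matter of experience, often from the range (0, +∞), and the error function to be minimized is, in its standard form:

$$\min_{w, b, \xi} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i\left(w^{\top}\phi(x_i) + b\right) \ge 1 - \xi_i, \; \xi_i \ge 0,$$

which is the so-called C-SVM. Here w and b define the hyperplane, φ is the mapping induced by the kernel, y_i ∈ {−1, 1} is the label of sample x_i and n is the number of samples. Like C-SVM, ν-SVM can control the number of support vectors by adding two parameters ν and ρ. The error function is, in its standard form:

$$\min_{w, b, \xi, \rho} \; \frac{1}{2}\lVert w \rVert^2 - \nu\rho + \frac{1}{n}\sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i\left(w^{\top}\phi(x_i) + b\right) \ge \rho - \xi_i, \; \xi_i \ge 0, \; \rho \ge 0.$$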

Cross validation

In every iteration, the dataset is split randomly, so it is reasonable to get different classifiers and results. However, to avoid over-fitting, we are more interested in stable performance and generalization over the whole dataset than in performance on the training set. Among the evaluation methods, k-fold cross validation is the most common in statistics and machine learning (Kohavi et al., 1995; Seni & Elder, 2010). In theory, k can be any integer bigger than 1 and no bigger than the number of samples n in the dataset. However, it is not useful to set k too big or too small. For example, if k is 2, the dataset is just split in half and the classifier is likely to be trained too poorly to learn the characteristics of the dataset and give correct predictions. Meanwhile, if k is n, k-fold cross validation is exactly leave-one-out cross-validation (Stone, 1974). In this experiment, we used 10-fold cross validation.
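The splitting step of k-fold cross validation can be sketched with the Python standard library alone. The helper name `k_fold_indices` and the fixed seed are hypothetical choices for this illustration; training and scoring a classifier on each split is left out.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation over n samples."""
    if not 2 <= k <= n:
        raise ValueError("k must be at least 2 and at most n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)      # random split, reproducible via seed
    folds = [indices[i::k] for i in range(k)]  # k folds of nearly equal size
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


splits = list(k_fold_indices(25, k=10))
print(len(splits))                           # 10
print(sorted(len(t) for _, t in splits))     # [2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
```

Every sample appears in exactly one test fold, so the averaged accuracy reflects performance on data the classifier did not see during training.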

FNN

FNN combines the strengths of neural networks and fuzzy logic while overcoming their respective weaknesses, such as the difficulty of explaining how decisions are reached and of automatically acquiring the rules used to make decisions. It has long proved effective in many areas, such as pattern recognition, regression and density estimation (Fuller, 1998; Kruse, 2008).

Experiment platform

In this study, the experimental platform is MATLAB R2017 on a PC with an Intel(R) Xeon(R) CPU E5-2665 (2.4 GHz) and 64 GB of RAM, running the Microsoft Windows 10 operating system.

Result

Feature selection

To find the most distinctive features among the 13 features, we applied Student's t-test to the dataset. In Table 3, Age and Sex had the largest p-values and their h values are 0, which implies that these two features do not distinguish the two groups well. Excluding Age and Sex gave an 11-feature dataset in which every feature has h equal to 1; within it, the p-values of Extraversion, CBF-PI and Openness were bigger than 1e-20. By comparing the p-values of the features in the 11-feature dataset, we obtained 5 datasets with different features, selected according to h and p-value, for detection. They were the following:

Table 3

Student's t-test results for feature selection, sorted by increasing p-value.

Feature	h	p	C_i
BIS	1	1.34e-130	[7.71, 8.99]
BSCS	1	8.74e-127	[4.16, 4.86]
Motor	1	1.02e-99	[4.78, 5.71]
Neuroticism	1	9.04e-84	[−8.76, −7.19]
Non-planning	1	2.66e-82	[4.71, 5.74]
Conscientiousness	1	8.32e-67	[−7.66, −6.13]
Attention	1	7.33e-23	[1.70, 2.53]
Agreeableness	1	4.59e-21	[−4.76, −3.13]
Extraversion	1	1.57e-17	[2.72, 4.33]
CBF-PI	1	1.81e-16	[7.00, 11.34]
Openness	1	3.08e-12	[−3.56, −2.00]
Sex	0	0.14	[−0.11, −0.00]
Age	0	0.50	[−0.01, 0.07]

6 Features: BIS, BSCS, Motor, Neuroticism, Non-planning and Conscientiousness
7 Features: 6 Features and Attention
8 Features: 7 Features and Agreeableness
11 Features: 8 Features, Extraversion, CBF-PI and Openness
13 Features: 11 Features, Sex and Age

IA detection performance

In this study, we made a systematic comparison of IA detection performance between SVM methods and the FNN method presented in (Wang et al., 2008). We also compared the performance of C-SVM and ν-SVM with and without grid search. For SVMs, 10-fold cross validation was applied to avoid over-fitting. All the experiments were repeated 1000 times to increase the reliability.

IA detection performance of SVM (without grid search) and FNN

Table 4 shows the detection performance of C-SVM, ν-SVM and FNN without grid search on the different datasets. The parameters of the SVMs without grid search were set to their default values, where C and g were both 1. In the 13-feature and 8-feature datasets, the best performances are from FNN, with accuracies of 69.73% and 75.58% respectively. It can be observed that feature selection helped improve the performance of all three methods. C-SVM had the largest increment (27.7%), followed by FNN (20.6%). ν-SVM had the smallest, which was 20.5%.

Table 4

Detection results of C-SVM, ν-SVM and FNN without grid search on every dataset.

Dataset	C-SVM (%)	ν-SVM (%)	FNN (%)
13F₀	51.00 ± 1.49	67.51 ± 1.53	69.73
13F₁	51.46 ± 0.83	54.47 ± 1.32	57.41
13F₂	51.95 ± 0.66	65.71 ± 1.91	57.39
11F₀	52.66 ± 1.14	56.93 ± 0.73	54.98
11F₁	51.30 ± 1.08	57.42 ± 1.30	57.35
11F₂	51.59 ± 0.30	62.74 ± 2.81	57.73
8F₀	72.51 ± 0.07	60.65 ± 0.79	75.58
8F₁	77.52 ± 0.33	75.00 ± 0.72	74.91
8F₂	77.23 ± 0.37	75.00 ± 0.71	74.67
7F₀	74.89 ± 0.32	73.79 ± 0.68	74.45
7F₁	77.60 ± 1.28	76.04 ± 0.28	73.89
7F₂	78.72 ± 1.46	77.43 ± 1.23	72.47
6F₀	76.05 ± 0.74	75.03 ± 0.73	71.74
6F₁	75.15 ± 0.78	75.76 ± 0.71	72.47
6F₂	75.08 ± 0.73	75.79 ± 0.72	72.39

F0 refers to the dataset without normalization. F1 refers to the dataset with the normalization in [−1, 1]. F2 refers to the dataset with the normalization in [0, 1].

Bold indicates the best detection accuracy for each classifier.

Two types of normalization were also examined in this work. Across all five datasets, the two types of normalization did not differ much from each other, and compared with the datasets without normalization, normalization made little contribution to performance improvement.

IA detection performance of SVM with grid search

The SVM algorithms proved more flexible on our datasets, so the next step of this study focused on optimizing the SVM parameters. Grid search is a method to optimize the parameters of the RBF function in SVM to improve detection performance. The comparison results are shown in Table 5. On the 13-feature and 11-feature datasets, we found that grid search did not improve the detection performance. On the 8-feature datasets, C-SVM performed best, and its best result came from the dataset normalized to [0, 1] with an accuracy of 78.90%, an increase of about 3% due to grid search. On the 7-feature datasets, C-SVM also performed best, and its best result came from the dataset normalized to [−1, 1], with an accuracy of 84.78%. On the 6-feature datasets, C-SVM performed best on the dataset without normalization, which was also the best result in this study, with an accuracy of 96.32% and a standard deviation of 0.18%. It can be observed that grid search improved the performance substantially. Meanwhile, normalization still made little contribution to performance improvement, and Student's t-test succeeded in selecting the features and improving the performance substantially. As for the increment, that of C-SVM (45.32%) was larger than that of ν-SVM (24.03%).

Table 5

Detection results of C-SVM and ν-SVM with grid search on every dataset.

Dataset	C-SVM (%)	ν-SVM (%)	The best in Table 4 (%)
13F₀⁎	51.00 ± 1.48	67.77 ± 2.01	69.73c
13F₁	51.46 ± 0.83	56.91 ± 1.14	57.41c
13F₂	51.95 ± 0.66	65.71 ± 2.78	65.71 ± 1.91b
11F₀	51.16 ± 1.14	57.40 ± 1.65	56.93 ± 0.73b
11F₁	51.30 ± 1.10	57.42 ± 1.32	57.42 ± 1.30b
11F₂	51.59 ± 0.30	65.16 ± 2.81	62.74 ± 2.81b
8F₀	78.27 ± 0.41	73.70 ± 0.29	75.58c
8F₁	78.74 ± 0.35	77.34 ± 0.38	77.52 ± 0.33a
8F₂	78.90 ± 0.19	77.06 ± 0.53	77.23 ± 0.37a
7F₀	72.28 ± 0.34	75.08 ± 0.29	74.89 ± 0.32a
7F₁	84.78 ± 1.31	80.94 ± 0.47	77.60 ± 1.28a
7F₂	84.58 ± 1.47	78.33 ± 1.11	78.72 ± 1.46a
6F₀	96.32 ± 0.18	78.32 ± 0.36	76.05 ± 0.74a
6F₁	76.05 ± 0.74	78.54 ± 0.44	75.76 ± 0.71b
6F₂	76.02 ± 0.59	78.74 ± 0.43	75.79 ± 0.72b

Bold indicates the best detection accuracy for each classifier.

⁎ The features are the same as those in Table 3.

a The best result in Table 4 is from C-SVM.

b The best result in Table 4 is from ν-SVM.

c The best result in Table 4 is from FNN.


Accuracy, parameters and computation time in grid search of 6-feature dataset without normalization

Grid search is a time-consuming way to find the best parameters. Among the five datasets, the 6-feature datasets achieved the best IA detection performance with the fewest features. The following step of this study focused on finding the parameter values that save the most computation cost. The smaller C-step and g-step are, the more time is consumed. All the experiments were repeated 100 times to increase the reliability. The relationship between the (C-step, g-step) pair and accuracy on the 6-feature dataset without normalization can be found in Fig. 3 and Fig. 4. There are 256 lines in each plot, and each line with a different color stands for a different g-step in Fig. 3 and a different C-step in Fig. 4, both of which range from 1 to 256.

Fig. 3

The relationship between g-step and accuracy when g-step is fixed for C-SVM in the 6-feature dataset without normalization. Each line with a different color stands for a different g-step from 1 to 256. The x-axis is the g from 1 to 256 while the y-axis is the accuracy. Point A, whose C-step is 35 and g-step is 12, has the best performance. Points B (g-step is 16) and C (g-step is 17) are the turning points, which shows that g-step is the key parameter.

Fig. 4

The relationship between C-step and accuracy when C-step is fixed for C-SVM in the 6-feature dataset without normalization. Each line with a different color stands for a different C-step from 1 to 256. The x-axis is the C-step from 1 to 256 while the y-axis is the accuracy. Point A, whose C-step is 35 and g-step is 12, has the best performance and corresponds to Point A in Fig. 3.

In Fig. 3, point A shows the overall best performance, with an accuracy of 96.32%, where C-step is 35 and g-step is 12. All lines overlap from point B (C-step ∈ [1, 256], g-step = 16) to point C (C-step ∈ [1, 256], g-step = 17). In Fig. 4, point A corresponds to the overall best performance in Fig. 3. The accuracy did not vary much with C-step, and the lines separated into two groups according to g-step. Accordingly, g-step rather than C-step was shown to be the key parameter in this experiment. Based on these results, the next step was to ignore C-step and study the relationship among g-step, calculation time and accuracy, in order to inspect the effect of the key parameter g-step on calculation time and accuracy. This relationship can be found in Fig. 5. Point E is the "compromise" point, whose average accuracy is 96.06% and g-step is 37, with an average time of 33.94 s. Point D is the best point, with an average accuracy of 96.32% and a g-step of 1, and it took about 228.7 s, which is reasonable because a smaller step gives a larger chance of finding the best result in grid search. Thus, an appropriate parameter pair (C-step ∈ [1, 256], g-step = 37) was found to balance the classification performance and calculation time of our IA detection task.
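The trade-off between accuracy and calculation time can be expressed as a simple selection rule: among the settings whose accuracy is within a tolerance of the best, take the fastest. The sketch below applies that rule to the two points reported in the text. The rule itself, the function name `pick_compromise` and the 0.3 percentage point tolerance are assumptions made for illustration and are not stated in the study.

```python
def pick_compromise(results, tolerance=0.3):
    """Pick the fastest setting whose accuracy is within tolerance of the best.

    results: list of (g_step, accuracy_percent, seconds) tuples.
    """
    best_acc = max(acc for _, acc, _ in results)
    close_enough = [r for r in results if best_acc - r[1] <= tolerance]
    return min(close_enough, key=lambda r: r[2])


reported = [
    (1, 96.32, 228.7),   # Point D in the text
    (37, 96.06, 33.94),  # Point E in the text
]
print(pick_compromise(reported))                 # (37, 96.06, 33.94)
print(pick_compromise(reported, tolerance=0.1))  # (1, 96.32, 228.7)
```

With a looser tolerance the rule favors the faster setting, and with a tighter one it falls back to the most accurate setting.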

Fig. 5

The relationship among g-step, time and accuracy. Point E's accuracy is 96.06% whose g-step is 37 and calculating time is 33.94 s. Point D's accuracy is 96.08% whose g-step is 1 and calculating time is 228.7 s. The x-axis is the g-step value. The y-axis is the mean value of time for each g-step. The z-axis is the mean value of accuracy for each g-step.

Discussion

In previous works, researchers used questionnaires and statistical methods to find the relationship between IAD and the features they used (Kuss, Griffiths, & Binder, 2013; Orsal, Orsal, Unsal, & Ozalp, 2013). Statistical methods can only give the degree of correlation, which cannot be used directly to predict whether a patient has IA or not. This work applied machine learning to more questionnaires and a relatively larger dataset (Di et al., 2017; Gong et al., 2016; Wang et al., 2008), and our results show its efficiency in this kind of task, which provides a new view for researchers in this area.

In works using multiple questionnaires (Kim et al., 2006; Tsai et al., 2009; Xin et al., 2018), some questions may be duplicated or some features may be correlated, so it is necessary to find the relationship among these features. This work supports previous findings that features found to be correlated with IA promote or prevent IA to some extent (Fumero et al., 2018; Robertson et al., 2018). It also serves as a supplement for finding the relationship among features in tasks using several questionnaires, as our results demonstrated. As for data collection and experiment design, it is important to do some data pre-processing, including de-duplication and de-noising, and it is better to use a larger dataset covering a wide age range. Cross validation is also necessary in the experiment, as it improves the generality and performance of the classifier, especially when the dataset is relatively small (Kohavi et al., 1995; Seni & Elder, 2010). Parameter selection can provide a suitable parameter pair to balance computation cost and performance, which makes engineering and usage at a really large scale possible.

A questionnaire is a kind of structured data consisting of numeric values for the features it measures. The results of this work demonstrated the efficiency of machine learning methods in dealing with questionnaire data on IA. For future study, researchers can refer to our work to build an IA prediction system. Although the dataset we used is relatively large, more participants of different ages and more questionnaires would make our work more general. SVM is a classical method and often serves as a baseline among classification methods. However, the development of deep learning has greatly improved classifier performance, which makes it another choice for IAD detection (Goodfellow, Bengio, & Courville, 2016). In practical terms, this work provides a new way to detect IA among teenagers in advance, simply and quickly. People can use the simple online or offline questionnaires mentioned above to detect IA more efficiently and precisely, allowing them to diagnose and cure IA among Chinese college students in time.
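As a minimal illustration of the cross validation mentioned above, the sketch below splits sample indices into k folds using only the standard library. The number of folds (5), the random seed and the function name `k_fold_indices` are assumptions for illustration and do not describe the exact protocol of this study; 2397 is the number of participants in our dataset.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Indices are shuffled once, then dealt round-robin into k folds so
    that fold sizes differ by at most one.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

# Hypothetical setting: 5 folds over 2397 samples.
splits = list(k_fold_indices(2397, 5))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_number}: {len(train)} train, {len(test)} test")
```

Each sample appears in exactly one test fold, so every participant is used for both training and evaluation across the k rounds.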

Conclusion

In this study, we carried out systematic experiments to detect IA using SVMs and FNN. The results demonstrated the feasibility of using machine learning methods to detect IA. SVM methods were found to be more flexible for our questionnaire datasets. With grid search, a suitable parameter pair for C-SVM was obtained (C-step ∈ [1, 256], g-step = 37, t = 33.94 s, accuracy = 96.06% on the 6-feature dataset without normalization), and g-step, as the key parameter, was more important to the accuracy than C-step in this experiment. More interestingly, BIS-11, BSCS, Motor, Neuroticism, Non-planning and Conscientiousness were shown to be a better feature combination for IA detection. This indicates that researchers may devote more effort to studying the relationship between these 6 features and IA. Predicting IA risk based on these features may be a future research interest.

47 in total

1. Internet addiction associated with features of impulse control disorder: is it a real psychiatric disorder?

Authors: T Treuer; Z Fábián; J Füredi
Journal: J Affect Disord Date: 2001-10 Impact factor: 4.839

Review 2. Machine learning for medical diagnosis: history, state of the art and perspective.

Authors: I Kononenko
Journal: Artif Intell Med Date: 2001-08 Impact factor: 5.326

3. Internet over-users' psychological profiles: a behavior sampling analysis on internet addiction.

Authors: Leo Sang-Min Whang; Sujin Lee; Geunyoung Chang
Journal: Cyberpsychol Behav Date: 2003-04

4. High self-control predicts good adjustment, less pathology, better grades, and interpersonal success.

Authors: June P Tangney; Roy F Baumeister; Angie Luzio Boone
Journal: J Pers Date: 2004-04

5. Gender differences and related factors affecting online gaming addiction among Taiwanese adolescents.

Authors: Chih-Hung Ko; Ju-Yu Yen; Cheng-Chung Chen; Sue-Huei Chen; Cheng-Fang Yen
Journal: J Nerv Ment Dis Date: 2005-04 Impact factor: 2.254

Review 6. An introduction to the five-factor model and its applications.

Authors: R R McCrae; O P John
Journal: J Pers Date: 1992-06

7. Screening for Internet addiction: an empirical study on cut-off points for the Chen Internet Addiction Scale.

Authors: Chih-Hung Ko; Ju-Yu Yen; Cheng-Fang Yen; Cheng-Chung Chen; Chia-Nan Yen; Sue-Huei Chen
Journal: Kaohsiung J Med Sci Date: 2005-12 Impact factor: 2.744

8. Tridimensional personality of adolescents with internet addiction and substance use experience.

Authors: Chih-Hung Ko; Ju-Yu Yen; Cheng-Chung Chen; Sue-Huei Chen; Kuanyi Wu; Cheng-Fang Yen
Journal: Can J Psychiatry Date: 2006-12 Impact factor: 4.356

9. Internet addiction in Korean adolescents and its relation to depression and suicidal ideation: a questionnaire survey.

Authors: Kyunghee Kim; Eunjung Ryu; Mi-Young Chon; Eun-Ja Yeun; So-Young Choi; Jeong-Seok Seo; Bum-Woo Nam
Journal: Int J Nurs Stud Date: 2006-02 Impact factor: 5.837

10. Psychiatric comorbidity assessed in Korean children and adolescents who screen positive for Internet addiction.

Authors: Jee Hyun Ha; Hee Jeong Yoo; In Hee Cho; Bumsu Chin; Dongkeun Shin; Ji Hyeon Kim
Journal: J Clin Psychiatry Date: 2006-05 Impact factor: 4.384
