Artificial intelligence-based classification of schizophrenia: A high density electroencephalographic and support vector machine study.

Sai Krishna Tikka1, Bikesh Kumar Singh2, S Haque Nizamie3, Shobit Garg4, Sunandan Mandal5, Kavita Thakur5, Lokesh Kumar Singh1.   

Abstract

BACKGROUND: Interview-based schizophrenia (SCZ) diagnostic methods are not completely valid. Moreover, SCZ, the disease entity, is very heterogeneous. Supervised machine learning (sML), an application of artificial intelligence, holds tremendous promise in solving these issues. AIMS: To assess the sML-based discriminating validity of resting-state electroencephalographic (EEG) quantitative features in classifying SCZ from healthy controls, and positive symptom (PS) from negative symptom (NS) subgroups, using a high-density recording.
SETTINGS AND DESIGN: Data were collected at a tertiary care mental-health institute using a cross-sectional study design and analyzed at a premier engineering institute.
MATERIALS AND METHODS: Data of 38 SCZ patients and 20 healthy controls were retrieved. The positive-negative subgroup classification was done using Positive and Negative Syndrome Scale operational criteria. EEG was recorded using 256-channel high-density equipment. Eight a priori regions of interest were selected. Six-level wavelet decomposition and the kernel Support Vector Machine (SVM) method were used for feature extraction and data classification, respectively. STATISTICAL ANALYSIS: The Mann-Whitney test was used for comparison of machine-learning features. Accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve were measured as discriminatory indices of the classifications.
RESULTS: Accuracies of classifying SCZ from healthy controls and PS from NS SCZ were 78.95% and 89.29%, respectively. While beta and gamma frequency-related features most accurately classified SCZ from healthy controls, delta and theta frequency-related features most accurately classified positive from negative SCZ. Inferior frontal gyrus features contributed most accurately to both classifications.
CONCLUSIONS: SVM-based classification and sub-classification of SCZ using EEG data is optimal and might help in improving the "validity" and reducing the "heterogeneity" in the diagnosis of SCZ. These results might only be generalized to acute and moderately ill male SCZ patients.
Copyright: © 2020 Indian Journal of Psychiatry.

Keywords:  Feature-extraction; machine-learning; negative symptoms; positive symptoms; validity

Year:  2020        PMID: 32773870      PMCID: PMC7368447          DOI: 10.4103/psychiatry.IndianJPsychiatry_91_20

Source DB:  PubMed          Journal:  Indian J Psychiatry        ISSN: 0019-5545            Impact factor:   1.759


INTRODUCTION

Schizophrenia (SCZ) is clinically diagnosed based on a set of criteria that are elicited by one-to-one interviews. As diagnostic interviews are conducted by a broad set of mental health professionals who vary in terms of qualification, experience, and time at their disposal to conduct the interview, they are bound to lack “reliability.” Structured interview schedules and chart-based diagnoses with structured assessments have been put forth to overcome this problem.[1] Although regarded to possess “utility,” even such methods are far from being termed “valid”; in the true sense, diagnostic entities defined by clinical criteria are deemed “valid” only when they can be established as truly discrete entities.[2] Moreover, SCZ is a very heterogeneous entity, and in fact, it has been referred to in the plural, i.e., “schizophrenias.”[3,4] The complexity of causal mechanisms and diversity of clinical presentations are considerably greater for SCZ vis-à-vis other psychiatric and medical disorders, at large. Until “personalized psychiatry” becomes a genuine reality, “heterogeneity” continues to pose a notable challenge to the validity of the diagnosis. Categorical/cluster subgrouping of SCZ into more homogeneous entities has been the answer so far.
Of the various sub-classifications proposed thus far, Andreasen and Olsen's[5] division of SCZ into “positive” and “negative” clusters has been the most consistently used model in clinical research.[6] More importantly, this long-trusted division has been shown to be the most reliable in identifying the best treatment for a given individual.[7] With a critical focus on the validity of clinical interview-based diagnosis and growing emphasis on enabling “personalized” or “precision” treatment, the need to identify biomarkers for diagnosing SCZ has risen tremendously.[8] Electroencephalographic (EEG) activity, particularly resting-state activity, has been proposed as a potential biomarker for diagnosing SCZ.[9,10,11,12] Besides the cost-effectiveness and unparalleled temporal resolution inherent to EEG, studying resting-state activity enhances the ease and reproducibility of the procedure. In fact, some studies have shown resting-state EEG to differentiate positive and negative symptom (NS) subgroups of SCZ.[13,14,15] However, by and large, the ability of resting-state EEG to distinguish subgroups is weak when other study results are considered.[10] This lack of consistency in the association between conventional EEG markers, such as spectral power and coherence, and SCZ psychopathology has paved the way for more advanced methods such as machine learning (ML). ML, an application of artificial intelligence (AI), enables machines to learn automatically and improve with progressive experience without being programmed explicitly.[16] Broadly, ML paradigms are classified into (1) supervised learning, (2) unsupervised learning, and (3) reinforcement learning. Reinforcement and unsupervised learning paradigms are used in robotics and big data visualization, respectively. Supervised ML (sML) is used for classification and regression, and hence becomes pertinent to the context of classifying SCZ.
sML-based classification has shown “exuberant” promise in solving the problem of heterogeneity in SCZ.[17,18] sML-based approaches for discriminating EEG patterns of SCZ and healthy individuals have gained much attention in recent years, and several studies have investigated ML-based diagnosis of SCZ using EEG data.[19,20,21,22] However, classifying positive and negative subgroups of SCZ using sML-based tools has not been explored. Against this background, we aimed to assess the sML-based discriminating validity of resting-state EEG quantitative features in classifying SCZ from healthy controls, and positive from negative subgroups, using a high-density recording. We also intended to identify the specific frequencies and regions that classify the groups most accurately.

MATERIALS AND METHODS

Study design

We used a cross-sectional study design with a comparative group.

Participants

The data were retrieved from a project titled “transcranial magnetic stimulation in modulating neurodevelopmental factors in SCZ” (EEG was one of the secondary outcome variables) that was approved by the Institute Ethics Committee of a tertiary care mental-health institute in Eastern India; this project was registered in the Clinical Trials Registry-India (2014-12-005280) prior to recruitment of subjects. Written informed consent was obtained from all the participants (and their legally qualified representatives in the case of patients) before enrolling them in the study. Data of a total of 38 patients and 20 healthy controls were retrieved. Patients were recruited by purposive sampling from various inpatient wards of the institute. Right-handed male patients in the age group of 18–50 years, having a diagnosis of SCZ as per the ICD-10 DCR[23] and on a stable dose of antipsychotic medications, i.e., no change in the dosage for at least the last 5 days, were included. These patients were recruited within a week of admission to the hospital for the first episode or for an acute exacerbation or relapse; all patients were deemed “symptomatic.” Patients having a history of neurological illness, significant head injury, comorbid substance dependence (excluding nicotine and caffeine), other psychiatric disorders, significant medical disorders (such as uncontrolled diabetes, hypertension, and tuberculosis), disruptive behavior (suicidal or homicidal) that warranted immediate intervention, or a history of electroconvulsive therapy within the previous 6 months were excluded. The healthy control (HC) group included right-handed, age-matched subjects recruited from among the hospital staff and the community living in the vicinity of the institute. The HC group did not include subjects with any psychiatric or neurological illness, substance dependence (excluding nicotine and caffeine), or significant medical disorders.
We defined the patient sample based on the following criteria for categorizing positive and negative sub-groups according to Andreasen et al.:[24] Criterion a: scores ≥4 on P1, P3, and G9 of the Positive and Negative Syndrome Scale (PANSS);[25] Criterion b: scores ≥4 on N1, N4, and N6 of the PANSS. Those fulfilling “criterion a” and “criterion b” exclusively were designated as the positive symptom (PS) and negative symptom (NS) groups, respectively. Patients who did not exclusively meet either criterion (i.e., fulfilled both criteria [n = 8] or neither [n = 2]) were not included in the sub-analysis. Consequently, 18 patients were assigned to the PS group and 10 patients to the NS group.
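The subgrouping rule above can be written as a small decision function. This is only an illustrative sketch: the function name and the dictionary layout are hypothetical, and it assumes that a criterion is met when all three of its items score ≥4, which the text does not state explicitly.

```python
# Illustrative sketch of the PS/NS operational criteria described above.
# Assumption: a criterion is met only if ALL three listed PANSS items score >= 4.
POSITIVE_ITEMS = ("P1", "P3", "G9")  # criterion a
NEGATIVE_ITEMS = ("N1", "N4", "N6")  # criterion b
THRESHOLD = 4


def assign_subgroup(scores):
    """Return 'PS', 'NS', or None (patient excluded from the sub-analysis)."""
    meets_a = all(scores[item] >= THRESHOLD for item in POSITIVE_ITEMS)
    meets_b = all(scores[item] >= THRESHOLD for item in NEGATIVE_ITEMS)
    if meets_a and not meets_b:
        return "PS"
    if meets_b and not meets_a:
        return "NS"
    return None  # fulfilled both criteria or neither


# Hypothetical patient with prominent positive symptoms only
example = {"P1": 5, "P3": 4, "G9": 4, "N1": 2, "N4": 3, "N6": 1}
print(assign_subgroup(example))
```

With this rule, the 38 patients split into 18 PS, 10 NS, and 10 excluded (8 meeting both criteria and 2 meeting neither), as reported in the text.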

Tools

Clinical assessments

Relevant sociodemographic and clinical data were collected from all the participants. Handedness was assessed using the Sidedness Bias Schedule-Hindi version.[26] The baseline severity of psychopathology in patients was evaluated by administering PANSS.[25] Healthy controls were screened with the General Health Questionnaire-12;[27] only those with scores <3 were included.

Electroencephalographic recording

All participants underwent an EEG recording. The recording was carried out between 0900 and 1200 h at the Institute's Centre for Cognitive Neurosciences. Participants were advised to avoid tea, coffee, and nicotine for at least 1 h before the recording. In a light- and sound-attenuated room, 10 min of resting-state EEG was recorded for each participant while sitting, eyes closed, on a reclining chair. EEG was acquired on the Geodesic EEG System 400 (Electrical Geodesics, Inc., Eugene, Oregon, USA) with 256-channel Geodesic Sensor Nets; electrodes were placed according to the international 10–10 system of electrode placement [Figure 1]. Eye movement potentials were monitored using the right and left electrooculogram channels. Electrode impedance was kept <50 kΩ. EEG was filtered (time constant: 0.1 s; high-frequency filter: 120 Hz), digitized (sampling rate: 256 Hz), and artifact-rejected (performed on a moving average of 80 ms; bad channels [5 or more]: >200 µV; eye movements: >55 µV for >20% of each 6-s epoch; eye blinks: >140 µV for >20% of each epoch) using Net Station 5.1 software (Electrical Geodesics, Inc., Eugene, Oregon, USA). The EEG signals were further preprocessed using EEGLAB software for baseline correction and common average referencing.[28]
Figure 1

Showing channel placement of the 256 sensor net according to 10-10 international system

Analysis

Feature extraction

First, 60-s epochs of artifact-free EEG data were visually selected from each recording after carefully excluding segments with eye movement, blink, electromyogram, movement, electrode, and perspiration artifacts or drowsiness changes. EEG signals corresponding to eight regions of interest (ROIs), namely the right and left inferior frontal gyrus (IFG), dorsolateral prefrontal cortex (DLPFC), inferior parietal lobule (IPL), and superior temporal gyrus (STG) [Figure 1], were averaged. Channels corresponding to these regions were selected as per estimated anatomical cortical projections for the international 10–10 system of electrode placement.[29] Six-level wavelet decomposition with the Daubechies-9 mother wavelet was carried out to extract five EEG bands, namely delta (0–4 Hz), theta (4–8 Hz), alpha (8–12 Hz), beta (16–32 Hz), and gamma (32–64 Hz), as shown in Supplementary Figure 1. Wavelet decomposition results in approximation (cA) and detail (cD) coefficients corresponding to the low-frequency and high-frequency components, respectively, of the EEG signals. As shown in Supplementary Figure 1, the gamma and beta bands are obtained after the 3rd and 4th level decomposition, respectively, from the detail coefficients cD. Further, the alpha band is obtained after the first-level decomposition of the detail coefficients cD5. Finally, the delta and theta bands are obtained at the 6th level of wavelet decomposition. After extracting the five EEG bands, the corresponding time-domain signals were obtained using wavelet reconstruction. Twelve statistical features, namely mean, kurtosis, skewness, entropy, variance, standard deviation, minimum value, maximum value, range, crest factor, form factor, and power, were then extracted from each band, resulting in 60 features from each EEG signal. This step was repeated for all eight selected ROIs, resulting in 480 features.
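As a rough illustration of the last step, the twelve statistical features of one reconstructed band signal could be computed as below. The exact definitions are not given in the text, so several are assumptions: population (not sample) moments, Shannon entropy of a 16-bin amplitude histogram, power as the mean squared amplitude, crest factor as peak/RMS, and form factor as RMS divided by the signed mean (a guess prompted by the negative form-factor values in Table 2). The function name `band_features` is hypothetical, and the wavelet decomposition itself is not shown.

```python
import math
from collections import Counter


def band_features(x, n_bins=16):
    """Twelve summary statistics of one band-limited signal (assumed definitions)."""
    n = len(x)
    mean = sum(x) / n
    variance = sum((v - mean) ** 2 for v in x) / n  # population variance
    sd = math.sqrt(variance)
    skewness = sum((v - mean) ** 3 for v in x) / (n * sd ** 3)
    kurtosis = sum((v - mean) ** 4 for v in x) / (n * sd ** 4)
    lo, hi = min(x), max(x)
    power = sum(v * v for v in x) / n  # mean squared amplitude
    rms = math.sqrt(power)
    # Shannon entropy (bits) of the amplitude histogram
    width = (hi - lo) / n_bins or 1.0
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in x)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {
        "mean": mean, "kurtosis": kurtosis, "skewness": skewness,
        "entropy": entropy, "variance": variance, "sd": sd,
        "minimum": lo, "maximum": hi, "range": hi - lo,
        "crest_factor": max(abs(lo), abs(hi)) / rms,
        "form_factor": rms / mean if mean else math.nan,
        "power": power,
    }


features = band_features([1.0, 2.0, 3.0, 4.0])
print(len(features), "features per band")
print(len(features) * 5 * 8, "features per participant (5 bands x 8 ROIs)")
```

Because the band signals have a mean close to zero, variance and power are nearly identical under these definitions, which is in line with the near-equal variance and power values listed in Table 2.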

Statistical significance analysis

The non-parametric Mann–Whitney U-test was used to assess the statistical significance of the 480 extracted features. The test was conducted using the Statistical Package for Social Sciences (SPSS) Version 21.0 (IBM Corp., Armonk, NY, USA) at a 95% confidence level. All features with P < 0.05 were considered statistically significant. Only the significant features were used for analysis in the next step.
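The feature-screening step can be sketched as follows. This is not the SPSS procedure used in the study: it is a minimal standard-library version of the two-sided Mann–Whitney U-test that uses the normal approximation without tie or continuity correction, so P values may differ slightly from those of statistical packages, especially for small groups. The function name is hypothetical.

```python
import math


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U-test (normal approximation, no tie correction)."""
    combined = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Extend j over a run of tied values and give them the average rank
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(a), len(b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u1 = rank_sum_a - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P value
    return u, p


# Toy example: keep a feature only if the groups differ at P < 0.05
u, p = mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(u, p < 0.05)
```

Applied to every one of the 480 features, such a filter keeps only those with P < 0.05 as classifier inputs.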

Classification using machine learning

Figure 2 shows the scheme for the use of the sML method in classifying the EEG data sets.
Figure 2

General machine learning pipeline for classification of positive and negative samples. In experiment 1, schizophrenia was considered a positive group, and healthy was considered as a negative group. In experiment 2, positive symptom was considered positive and negative symptom was considered a negative group

The dataset was divided into two groups: training and testing. The training group was used to develop the ML model, while the testing group was used to evaluate its performance. During the training phase, the classifier was supplied with both the EEG signals and the corresponding class/group (ground truth). In the testing phase, the model was supplied only with the EEG signals and predicted their category. The data division strategies used were the “hold-out” and “k-fold” cross-validation methods. In the hold-out method, the dataset was divided randomly into two groups, wherein 67% of the samples were used for training and 33% for testing the Support Vector Machine (SVM) classifier model. In the k-fold cross-validation protocol, the whole dataset was split into “k” groups, each consisting of an approximately equal number of samples. Out of the “k” groups, “k-1” groups were used for training, while the remaining group was used for testing the SVM classifier model. The process was repeated “k” times, and the average performance over the “k” rounds was calculated. A value of k = 10 was used in this study for both experiments. Two classification experiments were performed in this study: (1) SCZ versus healthy control (HC) group and (2) PS versus NS group. The whole dataset, i.e., n = 58 (38 SCZ + 20 healthy) in experiment-1 and n = 28 (18 PS + 10 NS) in experiment-2, was divided into two sets, namely the training and testing groups. For training, one of the most popular sML methods, the SVM, was used. In an SVM, the machine constructs numerous hyperplanes from the training data set that separate the two groups.
The hyperplane that maintains the maximum margin between the samples of one group and those of the other is chosen as the optimal hyperplane for classification.[30] In an SVM classifier, the kernel function maps the input to a suitable feature space. Non-linear kernels are used when the decision boundary between the two groups to be classified is nonlinear. Several SVMs with different kernel functions were used for classification in the present study, namely linear SVM, quadratic SVM, cubic SVM, fine Gaussian SVM, medium Gaussian SVM, and coarse Gaussian SVM. Supplementary Table 1 shows the parameters used for the different SVM classifiers.
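The two data-division protocols described above can be sketched with index lists. This is an assumption-laden illustration, not the authors' implementation: the function names, the fixed random seed, and the absence of class stratification are all choices made here for simplicity.

```python
import random


def hold_out_split(n_samples, train_fraction=0.67, seed=0):
    """Random 67%/33% split of sample indices into training and testing sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(train_fraction * n_samples)
    return indices[:n_train], indices[n_train:]


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; each fold serves as the test set once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k folds of near-equal size
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


train, test = hold_out_split(58)  # experiment-1: 38 SCZ + 20 HC
print(len(train), len(test))
for train, test in k_fold_splits(28):  # experiment-2: 18 PS + 10 NS
    print(len(train), len(test))
```

For n = 58 this hold-out split leaves 19 test samples, and for n = 28 it leaves 9, which is consistent with the granularity of the hold-out percentages reported in Table 3 (for example, 78.95% = 15/19 and 88.89% = 8/9).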
Supplementary Table 1

Different types of support vector machine classifier and the parameters used in this study

| Classification method | Kernel type | Description |
|---|---|---|
| Linear SVM | Linear kernel | |
| Quadratic SVM | Polynomial kernel | |
| Cubic SVM | Polynomial kernel | |
| Fine Gaussian SVM | Gaussian radial basis function | |
| Medium Gaussian SVM | Gaussian radial basis function | |
| Coarse Gaussian SVM | Gaussian radial basis function | |

f is the kernel function, xi and xj are the input feature vectors and σ is kernel parameter. SVM – Support vector machine
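For orientation, textbook forms of the three kernel families named in Supplementary Table 1 are sketched below. These are assumptions rather than the expressions used in the study: the polynomial offset of 1, the 2σ² scaling of the Gaussian kernel, and the example σ values are common conventions, and the actual σ values behind the fine, medium, and coarse Gaussian variants are not stated in the text.

```python
import math


def linear_kernel(xi, xj):
    """f(xi, xj) = xi . xj"""
    return sum(a * b for a, b in zip(xi, xj))


def polynomial_kernel(xi, xj, degree):
    """f(xi, xj) = (1 + xi . xj)^degree; degree 2 = quadratic, 3 = cubic."""
    return (1.0 + linear_kernel(xi, xj)) ** degree


def gaussian_kernel(xi, xj, sigma):
    """f(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2))"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


xi, xj = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(xi, xj))
print(polynomial_kernel(xi, xj, degree=2), polynomial_kernel(xi, xj, degree=3))
# A smaller sigma gives a narrower ("finer") kernel; values here are illustrative
print(gaussian_kernel(xi, xj, sigma=0.5), gaussian_kernel(xi, xj, sigma=4.0))
```

A small σ makes the Gaussian kernel respond only to very close neighbours, which tends to produce a more intricate decision boundary, whereas a large σ smooths the boundary.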

Performance evaluation

The testing performance of the SVM classifier model was evaluated using four measures: accuracy (ACC), sensitivity (SEN), specificity (SPE), and area under the receiver operating characteristic curve (AUC) [Supplementary Table 2].
Supplementary Table 2

Different performance measures used for evaluation of support vector machine classifier

| Measure | Description | Mathematical expression |
|---|---|---|
| Accuracy | Percentage of correctly classified samples | (tp + tn) / (tp + tn + fp + fn) × 100 |
| Sensitivity | Percentage of correctly classified samples belonging to schizophrenia (experiment 1)/positive symptoms (experiment 2) | tp / (tp + fn) × 100 |
| Specificity | Percentage of correctly classified samples belonging to the healthy group (experiment 1)/negative symptoms (experiment 2) | tn / (tn + fp) × 100 |
| Area under receiver operating characteristic curve | A common measure of sensitivity and specificity | |

tp – True positives (number of correctly classified positive samples), tn – True negatives (number of correctly classified negative samples), fp – False positives (number of negative samples wrongly classified as positive), fn – False negatives (number of positive samples wrongly classified as negative). In experiment 1 – Schizophrenia was considered as positive group and healthy was considered as negative group. In experiment 2 – Positive symptom was considered as positive group and negative symptom was considered as negative group. SVM – Support vector machine; AUC – Area under receiver operating curve
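The measures can be computed from the four confusion-matrix counts as sketched below. Two points are assumptions rather than statements from the text: the AUC is taken here as the mean of sensitivity and specificity, because every AUC value reported in Table 3 equals that mean, and the example counts (tp = 12, tn = 3, fp = 3, fn = 1) are inferred from the reported hold-out percentages for the quadratic SVM in experiment 1, not taken from the article.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and AUC (all in %) from confusion counts."""
    accuracy = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    # Assumption: AUC of a single operating point = mean of SEN and SPE
    auc = (sensitivity + specificity) / 2.0
    return {"ACC": accuracy, "SEN": sensitivity, "SPE": specificity, "AUC": auc}


# Inferred counts for a 19-sample test set (13 SCZ, 6 HC)
metrics = classification_metrics(tp=12, tn=3, fp=3, fn=1)
print({name: round(value, 2) for name, value in metrics.items()})
```

Reading sensitivity and specificity together matters here: with 38 SCZ and 20 HC samples, a classifier that labels everyone as SCZ still reaches about 65% accuracy while its specificity is 0%.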

This article has been written, and reports the findings of the study, according to the available guidelines for reporting ML studies.[31]

RESULTS

Sample characteristics

Table 1 shows the comparison of sociodemographic and clinical variables between the groups: SCZ versus HC and PS versus NS. On the SCZ versus HC comparison, both groups were comparable on all variables except employment and habitat. A significantly higher number of SCZ patients than healthy controls were unemployed and from a rural habitat. As the PANSS scores were the basis for the PS-NS subdivision, the PS versus NS comparison showed significantly higher positive syndrome scores in the PS group and significantly higher negative syndrome scores in the NS group. These sub-groups were comparable on the other clinical variables.
Table 1

Comparison of sociodemographic and clinical variables across the groups

| Variables | SCZ (n=38) Mean±SD/n (%) | HC (n=20) Mean±SD/n (%) | t/χ², df=1, 57 | PS (n=18) Mean±SD/n (%) | NS (n=10) Mean±SD/n (%) | t/χ², df=1, 27 |
|---|---|---|---|---|---|---|
| Age (years) | 31.56±7.05 | 29.95±3.78 | 0.93 | 32.50±8.71 | 29.30±4.71 | 1.07 |
| Marital status | | | | | | |
| - Unmarried | 15 (39.5) | 8 (40) | 0.01 | 8 (44.4) | 3 (30) | 0.56 |
| - Married | 23 (60.5) | 12 (60) | | 10 (55.6) | 7 (70) | |
| Religion | | | | | | |
| - Hindu | 28 (73.7) | 18 (90) | 2.17 | 12 (66.7) | 8 (80) | 0.58 |
| - Non-Hindu | 10 (26.3) | 2 (10) | | 6 (33.3) | 2 (20) | |
| Education (f) | | | | | | |
| - Illiterate/primary | 13 (34.2) | 7 (35) | 4.45 | 3 (16.7) | 4 (40) | 5.72 |
| - Secondary | 7 (18.4) | 0 (0) | | 7 (38.9) | 0 (0) | |
| - Graduate | 18 (47.4) | 13 (65) | | 8 (44.4) | 6 (60) | |
| Employment | | | | | | |
| - Unemployed | 30 (78.9) | 6 (30) | 13.33*** | 5 (27.8) | 2 (20) | 0.20 |
| - Employed | 8 (21.1) | 14 (70) | | 13 (72.2) | 8 (80) | |
| Socioeconomic status (f) | | | | | | |
| - Lower | 16 (42.1) | 3 (15) | 4.40 | 5 (27.8) | 5 (50) | 1.74 |
| - Middle | 21 (55.3) | 16 (80) | | 12 (66.7) | 5 (50) | |
| - Higher | 1 (2.6) | 1 (5) | | 1 (5.6) | 0 (0) | |
| Habitat (f) | | | | | | |
| - Rural | 23 (60.5) | 5 (25) | 8.20* | 9 (50) | 7 (70) | 2.14 |
| - Semi urban | 3 (7.9) | 6 (30) | | 3 (16.7) | 0 (0) | |
| - Urban | 12 (31.6) | 9 (45) | | 6 (33.3) | 3 (30) | |
| Duration of illness (months) | 47.18±26.04 | | | 43.22±27.40 | 55.20±31.44 | 1.05 |
| Chlorpromazine equivalents | 425.66±163.65 | | | 438.33±127.67 | 418.92±198.90 | 0.19 |
| PANSS | | | | | | |
| - Positive | 21.84±5.23 | | | 25.83±3.36 | 16.90±2.51 | 7.31*** |
| - Negative | 20.94±8.11 | | | 15.56±3.28 | 30.70±5.54 | 9.14*** |
| - General psychopathology | 29.24±8.36 | | | 27.61±5.91 | 33.50±9.74 | 2.00 |
| - Total | 72.03±13.31 | | | 69.00±8.32 | 81.10±12.06 | 3.14** |

*P<0.05; **P<0.01; ***P<0.001; (f) Fisher exact test used. SCZ – Schizophrenia patients; HC – Healthy controls; PS – Schizophrenia patients with predominantly positive symptoms; NS – Schizophrenia patients with predominantly negative symptoms; PANSS – Positive and negative syndrome scale; SD – Standard deviation

Significant machine-learning-based features

Schizophrenia versus healthy controls (experiment-1)

Comparison of all the extracted features (12 features × 5 frequencies × 8 regions = 480) between the patient and healthy control groups found 36 features to be significantly different [Table 2A], mostly belonging to high-frequency activity (beta and gamma) in all ROIs.
Table 2

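Features found to be statistically significant between the groups

A. Experiment-1: SCZ (n=38) versus HC (n=20)

| Region | Frequency | Feature | SCZ (Mean±SEM) | HC (Mean±SEM) | P |
|---|---|---|---|---|---|
| Left IFG | Theta | Entropy | 2.7352±0.07582 | 2.4468±0.06875 | 0.003 |
| Left IFG | Theta | Variance | 92.1948±57.55483 | 121.8455±81.11564 | 0.033 |
| Left IFG | Theta | SD | 5.7851±1.75544 | 6.0636±1.43535 | 0.038 |
| Left IFG | Theta | Power | 92.1886±57.55100 | 121.8374±81.11023 | 0.033 |
| Left IFG | Gamma | Skewness | 0.1112±0.06864 | −0.0382±0.02244 | 0.003 |
| Right IFG | Delta | Skewness | −0.3342±0.22691 | 0.2836±0.08275 | 0.016 |
| Right IFG | Beta | Skewness | 0.1463±0.10157 | −0.0264±0.02857 | 0.039 |
| Right IFG | Gamma | Skewness | 0.1203±0.07139 | −0.0278±0.03549 | 0.014 |
| Left DLPFC | Delta | Skewness | −0.6782±0.20258 | −0.0545±0.09706 | 0.008 |
| Left DLPFC | Delta | Crest factor | 3.0215±0.13749 | 3.6768±0.17393 | 0.023 |
| Left DLPFC | Gamma | Skewness | 0.1165±0.06885 | −0.0054±0.01480 | 0.015 |
| Right DLPFC | Delta | Skewness | −0.3277±0.26277 | 0.3257±0.10590 | 0.024 |
| Right DLPFC | Beta | Skewness | 0.1520±0.10229 | 0.0255±0.02659 | 0.010 |
| Right DLPFC | Gamma | Skewness | 0.1208±0.07167 | −0.0170±0.00951 | 0.002 |
| Left IPL | Theta | Form factor | 1955.7576±1398.23182 | −552.9008±365.59427 | 0.012 |
| Left IPL | Gamma | Mean | −0.0001±0.00009 | 0.0000±0.00004 | 0.036 |
| Right IPL | Alpha | Variance | 32.3363±14.53552 | 32.4062±25.70312 | 0.031 |
| Right IPL | Alpha | SD | 4.2728±0.85405 | 3.1815±0.77514 | 0.041 |
| Right IPL | Alpha | Power | 32.3342±14.53455 | 32.4041±25.70141 | 0.031 |
| Right IPL | Beta | Variance | 44.6849±16.87896 | 33.8945±20.78437 | 0.036 |
| Right IPL | Beta | SD | 5.3190±0.91786 | 3.8982±0.70642 | 0.039 |
| Right IPL | Beta | Minimum | −58.7084±21.69012 | −25.9060±9.70250 | 0.010 |
| Right IPL | Beta | Maximum | 61.7134±23.12278 | 26.2544±10.16121 | 0.005 |
| Right IPL | Beta | Range | 120.4218±44.76786 | 52.1604±19.86322 | 0.010 |
| Right IPL | Beta | Power | 44.6819±16.87784 | 33.8922±20.78298 | 0.036 |
| Right IPL | Gamma | Variance | 43.6071±29.07713 | 15.9474±12.94389 | 0.008 |
| Right IPL | Gamma | SD | 3.9643±1.20929 | 2.1990±0.54751 | 0.011 |
| Right IPL | Gamma | Maximum | 61.7252±27.53687 | 19.6251±9.35317 | 0.036 |
| Right IPL | Gamma | Range | 124.1787±55.65012 | 38.7074±18.20058 | 0.048 |
| Right IPL | Gamma | Power | 43.6041±29.07519 | 15.9463±12.94303 | 0.008 |
| Left STG | Theta | Form factor | 734.5629±1152.24726 | −339.0403±789.73024 | 0.048 |
| Left STG | Gamma | Variance | 374244.9815±374230.11109 | 15.8634±12.85715 | 0.033 |
| Left STG | Gamma | SD | 64.7792±61.99634 | 2.1937±0.54428 | 0.039 |
| Left STG | Gamma | Power | 374220.0319±374205.16242 | 15.8624±12.85629 | 0.033 |
| Right STG | Beta | Maximum | 61.5651±23.25161 | 27.9247±10.31370 | 0.041 |
| Right STG | Beta | Range | 119.5335±45.01290 | 55.2043±20.07156 | 0.048 |

B. Experiment-2: PS (n=18) versus NS (n=10)

| Region | Frequency | Feature | PS (Mean±SEM) | NS (Mean±SEM) | P |
|---|---|---|---|---|---|
| Left IFG | Delta | Mean | −0.0038±0.00208 | 0.0011±0.00064 | 0.021 |
| Left IFG | Delta | Crest factor | 4.1242±0.29585 | 2.9817±0.18331 | 0.014 |
| Left IFG | Theta | Mean | 0.0031±0.00157 | −0.0012±0.00055 | 0.027 |
| Left IFG | Theta | Form factor | 953.9041±1738.82570 | −1961.1192±816.33388 | 0.002 |
| Right IFG | Delta | Mean | −0.0016±0.00131 | 0.0020±0.00090 | 0.024 |
| Right IFG | Delta | Variance | 36089.0665±9247.95745 | 2906.2327±1010.58574 | 0.035 |
| Right IFG | Delta | SD | 134.2456±23.89547 | 43.8882±8.09026 | 0.049 |
| Right IFG | Delta | Minimum | −463.3971±82.96620 | −139.9365±23.01746 | 0.027 |
| Right IFG | Delta | Range | 1090.9231±209.12290 | 287.8238±56.07730 | 0.039 |
| Right IFG | Delta | Power | 36086.6608±9247.34097 | 2906.0389±1010.51837 | 0.035 |
| Right IFG | Theta | Mean | 0.0003±0.00115 | −0.0021±0.00080 | 0.031 |
| Left DLPFC | Delta | Form factor | −11239.8840±12182.92907 | 8501.4417±3611.42483 | 0.007 |
| Left DLPFC | Theta | Form factor | 581.4022±241.65339 | −1426.4185±361.01432 | <0.001 |
| Left DLPFC | Beta | Kurtosis | 16.1049±10.66669 | 40.2326±35.88180 | 0.017 |
| Right DLPFC | Delta | Mean | −0.0051±0.00199 | 0.0022±0.00140 | 0.007 |
| Right DLPFC | Delta | Entropy | 1.1264±0.07445 | 1.2432±0.04452 | 0.049 |
| Right DLPFC | Theta | Mean | 0.0029±0.00132 | −0.0021±0.00118 | 0.010 |
| Right DLPFC | Alpha | Mean | 0.0031±0.00108 | −0.0002±0.00028 | 0.035 |
| Right DLPFC | Beta | Mean | −0.0009±0.00028 | 0.0000±0.00019 | 0.027 |
| Left IPL | Delta | Mean | 0.0143±0.00337 | −0.0002±0.00137 | 0.007 |
| Left IPL | Theta | Mean | −0.0085±0.00195 | 0.0009±0.00122 | 0.004 |
| Left IPL | Theta | Kurtosis | 16.1318±4.67997 | 4.4192±0.34235 | 0.027 |
| Left IPL | Theta | Crest factor | 5.8362±0.50224 | 4.2454±0.14854 | 0.017 |
| Left IPL | Alpha | Skewness | −0.4935±0.18339 | −0.0008±0.00808 | 0.006 |
| Right IPL | Alpha | Form factor | 957.9047±361.44474 | −1626.4591±1132.25393 | 0.019 |
| Right IPL | Beta | Kurtosis | 5.8536±1.82523 | 4.8403±0.32138 | 0.049 |
| Right IPL | Beta | Crest factor | 4.9846±0.38732 | 5.7063±0.31853 | 0.027 |
| Left STG | Alpha | Form factor | −4635.7358±4093.85566 | 1340.2171±666.57497 | 0.017 |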

SEM – Standard error of mean; IFG – Inferior frontal gyrus; DLPFC – Dorsolateral prefrontal cortex; IPL – Inferior parietal lobule; STG – Superior temporal gyrus; SCZ – Schizophrenia patients; HC – Healthy controls; PS – Schizophrenia patients with predominant positive symptoms; NS – Schizophrenia patients with predominant negative symptoms

Positive symptoms versus negative symptoms (experiment-2)

On comparing SCZ patients with predominantly positive symptoms and those with predominantly negative symptoms, 28 of the extracted features, mostly belonging to low-frequency activity (delta, theta, and alpha) in the bilateral IFG, DLPFC, and IPL, were found to be significantly different [Table 2B].

Machine-learning-based classification

Across the data division protocols [Table 3A], the “hold-out” method using quadratic SVM classified the originally grouped SCZ patients and healthy controls most accurately (accuracy 78.95%; sensitivity 92.31%; specificity 50%; AUC 71.15%). Further, when significant features specific to individual ROIs were input separately to the model, gamma activity-related features in the right IFG were the most accurate (accuracy 78.95%; sensitivity 84.62%; specificity 66.67%; AUC 75.64%), followed by those in the left DLPFC (accuracy 73.68%; sensitivity 91.67%; specificity 42.86%; AUC 67.26%) and right STG (accuracy 73.68%; sensitivity 100%; specificity 16.67%; AUC 58.33%).
Table 3

Classification results using significant features

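A. Experiment-1: SCZ (n=38) versus HC (n=20)

| 10-fold ACC (%) | 10-fold SEN (%) | 10-fold SPE (%) | 10-fold AUC (%) | SVM model | Hold-out ACC (%) | Hold-out SEN (%) | Hold-out SPE (%) | Hold-out AUC (%) |
|---|---|---|---|---|---|---|---|---|
| 72.41 | 86.84 | 45.00 | 65.92 | Linear SVM | 73.68 | 92.31 | 33.33 | 62.82 |
| 70.69 | 78.95 | 55.00 | 66.97 | Quadratic SVM | 78.95†† | 92.31 | 50.00 | 71.15 |
| 63.79 | 68.42 | 55.00 | 61.71 | Cubic SVM | 68.42 | 76.92 | 50.00 | 63.46 |
| 70.69 | 97.37 | 20.00 | 58.68 | Fine Gaussian SVM | 68.42 | 100.00 | 0.00 | 50.00 |
| 63.79 | 92.11 | 10.00 | 51.05 | Medium Gaussian SVM | 73.68 | 100.00 | 16.67 | 58.33 |
| 65.52 | 97.37 | 5.00 | 51.18 | Coarse Gaussian SVM | 68.42 | 100.00 | 0.00 | 50.00 |

B. Experiment-2: PS (n=18) versus NS (n=10)

| 10-fold ACC (%) | 10-fold SEN (%) | 10-fold SPE (%) | 10-fold AUC (%) | SVM model | Hold-out ACC (%) | Hold-out SEN (%) | Hold-out SPE (%) | Hold-out AUC (%) |
|---|---|---|---|---|---|---|---|---|
| 85.71 | 88.89 | 80.00 | 84.44 | Linear SVM | 88.89 | 100.00 | 75.00 | 87.50 |
| 82.14 | 83.33 | 80.00 | 81.67 | Quadratic SVM | 88.89 | 100.00 | 75.00 | 87.50 |
| 82.14 | 83.33 | 80.00 | 81.67 | Cubic SVM | 88.89 | 100.00 | 75.00 | 87.50 |
| 64.29 | 100.00 | 0.00 | 50.00 | Fine Gaussian SVM | 55.56 | 100.00 | 0.00 | 50.00 |
| 89.29†† | 100.00 | 70.00 | 85.00 | Medium Gaussian SVM | 77.78 | 100.00 | 50.00 | 75.00 |
| 64.29 | 100.00 | 0.00 | 50.00 | Coarse Gaussian SVM | 55.56 | 100.00 | 0.00 | 50.00 |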

†† Model showing highest accuracy. SVM – Support vector machine; ACC – Accuracy; SEN – Sensitivity; SPE – Specificity; AUC – Area under receiver operating curve; SCZ –Schizophrenia patients; HC – Healthy controls; PS – Schizophrenia patients with predominant positive symptoms; NS – Schizophrenia patients with predominant negative symptoms

For the classification of PS and NS groups [Table 3B], the 10-fold method using medium Gaussian SVM classified the conventionally grouped patients most accurately (accuracy 89.29%; sensitivity 100%; specificity 70%; AUC 85%). Subsequent region-specific analysis revealed that delta activity-related features in the left IFG were the most accurate (accuracy 89.29%; sensitivity 88.89%; specificity 90%; AUC 89.44%), followed by those in the right IFG (accuracy 82.14%; sensitivity 83.33%; specificity 80%; AUC 81.67%).

DISCUSSION

AI-based interpretation of brain activity allows for limitless interesting applications for man-machine interactions. The recently developed neuroprosthetic, the exoskeleton, which uses brain signals to enable a tetraplegic patient to move,[32] is an extreme advancement that highlights its importance to neuroscience. As introduced, more essential for psychiatry is a valid and objective classification of “disease” entities. The emphasis has been on gathering brain signals of SCZ patients and translating them into tools that can identify “disorder” from “normal” or classify sub-groups accurately without referring to any subjective diagnostic criteria. Although much more work needs to be done from the genetic, physiological, and clinico-social standpoints in “validating” the diagnosis of SCZ, the application of AI- or ML-based models to various clinical parameters represents a small yet significant futuristic step towards that goal. This study, a first from India applying sML-based methods to high-density EEG data, demonstrates that this technique can distinguish SCZ probands from healthy controls and also classify SCZ patients with positive symptoms from those with negative symptoms, with optimum accuracies. The sML tool used in our study, i.e., the kernel SVM method, which avoids over-fitting and hence has better generalization capabilities, has been used in earlier SCZ classification studies as well.[19,21,33,34,35] The accuracy of 78.95%, although comparable, is lower than those reported by these studies, where the rates range from 81% to 91%.
Some of these studies have additionally used top-up models, such as adaptive boosting,[33,34] or add-on features such as complexity[33] or source-level features in addition to sensor-level ones.[19] However, some studies have used very few features.[35] Apart from SVM, other ML methods used in previous studies for SCZ classification are the Stockwell transform[20] and deep learning models.[36] Ours is the first study that attempted to sub-classify SCZ probands using EEG and sML methods. Moreover, we found that the selected features could classify PS and NS subgroups fairly accurately (89.29%). Indeed, they could classify the subgroups with 100% sensitivity. Such perfect measures have been linked to the Stockwell transform ML tool.[20] This particular finding suggests that SVM models might be better used for sub-classification of SCZ, which consequently helps in reducing the heterogeneity within SCZ. More importantly, this is the first-ever study to use data recorded using 256 channels. This allowed us to select ROIs, which has so far been possible only in neuroimaging studies using MRI data.[37,38] Our results showed that right IFG > left DLPFC > right STG features classified SCZ from healthy controls the best. Using sequential ROI selection on structural MRI data, Chin et al.[38] identified 7 ROIs as the optimal discriminatory subset, of which only superior-, middle- and frontal gyrus and STG are the cortical regions that can be anatomically projected from EEG data. The ROIs found to be the best classifiers in our study correspond precisely to these regions. Further, we show that left IFG > right IFG were the regions with the most accurate features for classifying PS and NS sub-groups. Another interesting finding from our results is the way low-frequency and high-frequency EEG activities characterize themselves in the two experiments.
On the one hand, predominantly high-frequency-related features drive the classification of SCZ versus healthy controls; on the other, predominantly low-frequency-related features drive the classification of PS versus NS subgroups. This finding is important to the application of ML-based tools to functional connectivity measures.[39] While low-frequency activity implies long-range connectivity, high-frequency activity underpins local or short-range connectivity.[40] Hence, our results might suggest specifically investigating local connectivity measures for classifying SCZ from healthy controls and long-range connectivity measures for classifying SCZ sub-groups.
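
The discriminatory indices quoted in this discussion (accuracy, sensitivity, specificity) all derive from the same confusion-matrix counts. The following is a minimal sketch of that arithmetic, not the study's analysis code. The class `ConfusionCounts`, the function `discriminatory_indices`, the choice of positive class, and above all the counts are hypothetical and are NOT taken from the study's data; they only illustrate how perfect sensitivity can coexist with a lower specificity when the two groups are unequal in size.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from a binary classification (hypothetical container)."""
    tp: int  # positive-class cases correctly labelled positive
    fn: int  # positive-class cases missed
    tn: int  # negative-class cases correctly labelled negative
    fp: int  # negative-class cases wrongly labelled positive


def discriminatory_indices(c: ConfusionCounts) -> dict:
    """Compute accuracy, sensitivity and specificity from the counts."""
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
    }


# Hypothetical, imbalanced example (NOT the study's data):
# 18 cases in the positive class, 10 in the negative class.
example = ConfusionCounts(tp=18, fn=0, tn=6, fp=4)
indices = discriminatory_indices(example)

for name, value in indices.items():
    print(f"{name}: {value:.2%}")
```

In this made-up example every positive-class case is detected, so sensitivity is 100%, yet four of the ten negative-class cases are misclassified, so specificity is only 60%. Because the smaller class carries less weight in the overall accuracy, a high accuracy figure alone does not guarantee balanced performance across both classes.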

Strengths and limitations

The obvious strength is the use of high-density EEG data, which allowed features to be selected for predefined ROIs. The use of resting-state data makes the study easier to reproduce. The use of an array of methods for feature selection and classification is a highlight of the study. Although fairly adequate for the classification of SCZ from healthy controls, the sample size used for sub-classification is limited. Furthermore, the unequal distribution of samples in the second experiment might have contributed to the low and inconsistent specificity scores. The current study used an existing dataset and hence had to apply the operational criteria for sub-classifying patients retrospectively. Fresh recruitment of SCZ sub-groups is suggested for future studies. Another important limitation of the study is the confound of multiple comparisons. We did not apply a correction for multiple comparisons in the “statistical significance analysis” because, even after assuming a high (i.e., 50%) false discovery rate in the Benjamini–Hochberg correction method, which is recommended for analyses with a very high number of comparisons (such as 480 in our case), none of the variables in experiment-1 and only one variable in experiment-2 that were statistically significant at P < 0.05 survived the correction.[41] Hence, the possibility that false-positively significant extracted features were included in the ML classification cannot be ruled out. Finally, the results of the study might only be generalized to acutely symptomatic and moderately ill male SCZ patients.
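
To make the multiple-comparisons limitation concrete, the sketch below implements the standard Benjamini–Hochberg step-up procedure in plain Python. It is an illustration under stated assumptions, not the analysis actually run in the study. The function name `benjamini_hochberg` and all P values are hypothetical; only the number of comparisons (480) and the 50% false discovery rate come from the text.

```python
def benjamini_hochberg(p_values, q):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the corresponding hypothesis
    is rejected while controlling the false discovery rate at level q.
    """
    m = len(p_values)
    # Indices of the P values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject the k_max smallest P values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


M = 480   # number of comparisons, as stated in the text
Q = 0.5   # 50% false discovery rate, as stated in the text

# Hypothetical P values (NOT the study's data): ten features that are
# nominally significant at P < 0.05, and 470 that are not.
nominal = [0.010 + 0.004 * i for i in range(10)]   # 0.010 .. 0.046
rest = [0.06 + 0.002 * i for i in range(M - 10)]   # 0.060 .. 0.998
p_none = nominal + rest

# Same set, but with one very small hypothetical P value.
p_one = [0.0005] + p_none[1:]

print(sum(p < 0.05 for p in p_none))          # nominally significant
print(sum(benjamini_hochberg(p_none, Q)))     # survivors, first set
print(sum(benjamini_hochberg(p_one, Q)))      # survivors, second set
```

With 480 comparisons and q = 0.5, the smallest P value must fall below 0.5/480 (about 0.001) to be rejected on its own, and a P value near 0.04 survives only if at least 39 P values are that small or smaller. In the first hypothetical set none of the ten nominally significant features survive; in the second, only the single very small P value does.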

CONCLUSIONS

We conclude that SVM-based classification and sub-classification of SCZ using EEG data is optimal and might help in improving the “validity” and reducing the “heterogeneity” in the diagnosis of SCZ. High- and low-frequency-related features accurately classify SCZ from healthy controls and PS from NS SCZ, respectively. Moreover, the ROI contributing the most accurate features is the IFG for both classificatory instances. Caution needs to be exercised while generalizing these results, as they may be limited to acutely symptomatic and moderately ill male SCZ patients.

Financial support and sponsorship

Data used in this study were retrieved from the DBT-INCRE fellowship project funded by the Department of Biotechnology (DBT), Ministry of Science and Technology, Government of India. The fellowship was awarded to SKT. All India Institute of Medical Sciences, Raipur and National Institute of Technology, Raipur have signed a Memorandum of Understanding for Research Collaboration.

Conflicts of interest

There are no conflicts of interest.

[Figure: Scheme for extraction of electroencephalographic bands using wavelet decomposition and feature extraction]
  32 in total

Review 1.  Revisiting the diagnosis of schizophrenia: where have we been and where are we going?

Authors:  William R Keller; Bernard A Fischer; William T Carpenter
Journal:  CNS Neurosci Ther       Date:  2010-12-27       Impact factor: 5.243

2.  Classification of schizophrenia using Genetic Algorithm-Support Vector Machine (GA-SVM).

Authors:  Ming-Hsien Hiesh; Yan-Yu Lam Andy; Chia-Ping Shen; Wei Chen; Feng-Shen Lin; Hsiao-Ya Sung; Jeng-Wei Lin; Ming-Jang Chiu; Feipei Lai
Journal:  Conf Proc IEEE Eng Med Biol Soc       Date:  2013

3.  Alterations in resting-state gamma activity in patients with schizophrenia: a high-density EEG study.

Authors:  Máté Baradits; Brigitta Kakuszi; Sára Bálint; Máté Fullajtár; László Mód; István Bitter; Pál Czobor
Journal:  Eur Arch Psychiatry Clin Neurosci       Date:  2018-03-22       Impact factor: 5.270

4.  Resting State Dense Array Gamma Oscillatory Activity as a Response Marker for Cerebellar-Repetitive Transcranial Magnetic Stimulation (rTMS) in Schizophrenia.

Authors:  Sai Krishna Tikka; Shobit Garg; Vinod Kumar Sinha; S Haque Nizamie; Nishant Goyal
Journal:  J ECT       Date:  2015-12       Impact factor: 3.635

Review 5.  High vs low frequency neural oscillations in schizophrenia.

Authors:  Lauren V Moran; L Elliot Hong
Journal:  Schizophr Bull       Date:  2011-06-07       Impact factor: 9.306

6.  Negative v positive schizophrenia. Definition and validation.

Authors:  N C Andreasen; S Olsen
Journal:  Arch Gen Psychiatry       Date:  1982-07

7.  Entropy and complexity measures for EEG signal classification of schizophrenic and control participants.

Authors:  Malihe Sabeti; Serajeddin Katebi; Reza Boostani
Journal:  Artif Intell Med       Date:  2009-04-29       Impact factor: 5.326

8.  Discriminative analysis of schizophrenia using support vector machine and recursive feature elimination on structural MRI images.

Authors:  Xiaobing Lu; Yongzhe Yang; Fengchun Wu; Minjian Gao; Yong Xu; Yue Zhang; Yongcheng Yao; Xin Du; Chengwei Li; Lei Wu; Xiaomei Zhong; Yanling Zhou; Ni Fan; Yingjun Zheng; Dongsheng Xiong; Hongjun Peng; Javier Escudero; Biao Huang; Xiaobo Li; Yuping Ning; Kai Wu
Journal:  Medicine (Baltimore)       Date:  2016-07       Impact factor: 1.889

9.  Recognition of Schizophrenia with Regularized Support Vector Machine and Sequential Region of Interest Selection using Structural Magnetic Resonance Imaging.

Authors:  Rowena Chin; Alex Xiaobin You; Fanwen Meng; Juan Zhou; Kang Sim
Journal:  Sci Rep       Date:  2018-09-14       Impact factor: 4.379

Review 10.  EEG Frequency Bands in Psychiatric Disorders: A Review of Resting State Studies.

Authors:  Jennifer J Newson; Tara C Thiagarajan
Journal:  Front Hum Neurosci       Date:  2019-01-09       Impact factor: 3.169

