Zifan Jiang, Mark Luskus, Salman Seyedi, Emily L Griner, Ali Bahrami Rad, Gari D Clifford, Mina Boazak, Robert O Cotes.
Abstract
BACKGROUND: Schizophrenia is a severe psychiatric disorder that causes significant social and functional impairment. Currently, the diagnosis of schizophrenia is based on information gleaned from the patient's self-report, what the clinician observes directly, and what the clinician gathers from collateral informants, but these elements are prone to subjectivity. Utilizing computer vision to measure facial expressions is a promising approach to adding more objectivity in the evaluation and diagnosis of schizophrenia.
Year: 2022 PMID: 35395049 PMCID: PMC8992987 DOI: 10.1371/journal.pone.0266828
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. PRISMA flow diagram.
Overview of the participants, objective types, descriptions and findings.
| Article | Year | Subject | Type | Description | Findings |
|---|---|---|---|---|---|
| [ | 2007 | 11 SZ, 10 NC | Descriptive | Developed a computational framework to quantify intended emotional expression differences between patients with schizophrenia and healthy controls matched for age, ethnicity, and gender. | Significant difference in average abilities to express emotions, especially in the case of anger. The average abilities to express emotions correlated significantly with clinical severity of flat affect. |
| [ | 2007 | 12 SZ, 12 NC | Descriptive | Provided a framework to quantify the facial expression abnormality of patients with schizophrenia in posed and evoked emotions by combining 2D and 3D facial features, and compared the results with those from human raters. | Human raters could only correctly identify a low percentage (mostly 40% to 70%, except for happiness) of intended emotions for both controls and patients, with different accuracies for controls and schizophrenia patients. A significant group difference in evoked disgust was found. |
| [ | 2007 | 12 SZ, 12 NC | Descriptive | Captured facial expressions of individuals and quantified their expression flatness by estimating the overlap between different facial expression clusters in the learned embedding. | The patient group had a much larger facial expression overlap than the control group, demonstrating that flat affect is an important symptom in diagnosing schizophrenia. |
| [ | 2008 | 1 SZ, 1 NC | Descriptive | Created an automated computerized scoring system as an alternative to FACS for systematic analysis of facial expressions of healthy controls, schizophrenia patients, and patients with Asperger’s syndrome. | The healthy control expressed intended emotions better than the patients with Asperger’s syndrome and schizophrenia (especially for fear). The control showed more neutral expression than the two patients. |
| [ | 2010 | 27 SZ, unreported number of NC | Descriptive | The authors aimed to determine whether automated video-based quantification of body movement could be a reliable indicator of nonverbal behavior in schizophrenia patients, and whether body movement is valid as a measure of expressiveness. | Automated MEA-based detection of body and head movement and movement speed was found to be highly reliable, with clear indications of its validity. MEA provides an objective assessment of body movement. |
| [ | 2011 | 4 SZ, 4 NC | Descriptive | Developed an automated FACS based on advanced computer science technology and derived quantitative measures of flat and inappropriate facial affect automatically from temporal AU profiles. | NA |
| [ | 2013 | 20 SZ, 100 NC | Descriptive and predictive | Determined whether schizophrenia patients display fewer speaking gestures and listener nods, and whether patients’ increased symptom severity and poorer social cognition are associated with reduced gestures and nods. Additionally, the authors aimed to determine whether patients’ partners compensate for patients’ reduced nonverbal behavior by gesturing more when speaking and nodding more when listening. | Patients with schizophrenia exhibit reduced rates of gesture making compared to healthy controls. Increased levels of negative symptoms are associated with poorer rapport with patients. |
| [ | 2014 | 28 SZ, 26 NC | Descriptive and predictive | The authors worked to develop novel measures of facial expressivity using information theory. In particular, they developed measures of ambiguity and distinctiveness in facial expressivity, and hoped that these measures could be used to analyze large data sets of dynamic expressions. | Results indicated that ambiguity and distinctiveness of expression were both associated with a diagnosis of schizophrenia. The method developed is more repeatable and objective than observer-based rating scales. Predictions were highest for measures of overall facial expression, with an F-score of 12. |
| [ | 2015 | 34 SZ, 33 NC | Descriptive and predictive | This study aimed to pair data-driven analysis of facial expression with descriptive methods using machine learning tools and other technology. | Results from this study are in agreement with previous studies, which demonstrate that schizophrenia symptoms result in changes to AUs when compared to healthy controls. |
| [ | 2016 | 34 SZ, 33 NC | Descriptive and predictive | The authors aimed to create ‘prototype’ facial expression clusters in order to study a wider range of facial features than traditional AU and FACS computation allows for. | The authors’ findings were consistent with prior studies, which showed that schizophrenia patients overall have lower levels of facial expressivity. |
| [ | 2016 | 34 SZ, 33 NC | Descriptive | The authors aimed to compute discriminative features of AU activity for the purpose of measuring the following qualities, which represent symptomology used in the diagnosis of schizophrenia: flat affect, incongruent affect, and inappropriate affect. | In contrast with previous studies, the authors found that patients with schizophrenia exhibited reduced amounts of expression in positive emotional responses. Their findings also suggest that the magnitude of changes in facial expression may correlate to symptom severity. |
| [ | 2016 | 18 SZ | Descriptive and predictive | The overarching goal was to create novel methods for examining clinical behavior by identifying behavioral indicators relevant to various symptoms. Application to psychiatric populations could provide a needed method to collect objective behavioral data. The authors worked to identify behavioral indicators relevant to certain psychosis symptoms as measured by clinical scales, and to determine which structured interview questions correlate with facial findings suggestive of specific psychotic symptoms. | Negative and positive symptoms are best elicited via different questions; for example, positive symptoms were elicited via questions regarding the patient’s energy, and negative symptoms via questions regarding self-confidence. AU5 and AU6 are activated more frequently in patients with depression. AU12 is negatively correlated with the PANSS Negative summative scale. The overall conclusion was that AUs can be used to detect psychotic symptoms as measured on the PANSS, BPRS, and MADRS. There is value in evaluating facial expressions at the question level. |
| [ | 2017 | 1 SZ, 1 NC | Descriptive | Compared facial expressions of a patient with schizophrenia and a healthy control, utilizing marker-based technology that recognizes facial features. | Facial expressivity intensity was higher in the healthy control, and analysis of facial expressions using marker-based technology displayed high fidelity. |
| [ | 2018 | 91 SZ | Descriptive and predictive | Proposed SchiNet, a novel neural network architecture trained on large-scale FACS datasets that estimates the presence and intensity of action units. SchiNet was then used to predict expression-related symptoms from two commonly used assessment interviews: the Positive and Negative Syndrome Scale (PANSS) and the Clinical Assessment Interview for Negative Symptoms (CAINS). | Significant correlations were found between symptoms and the frequency of occurrence of automatically detected facial expressions. The scores of several symptoms in the PANSS and CAINS interviews can be estimated with an MAE of less than 1 level. Automatic estimation of symptom severity needs further improvement to reach human-level performance. |
| [ | 2019 | 25 SZ | Descriptive and predictive | Developed a proof-of-concept for the potential of using the machine learning FAR system as a clinician-supporting tool, in an attempt to improve the consistency and reliability of the mental status examination. | There was a lack of inter-rater reliability among five senior adult psychiatrists working in the same mental health center. Automatic facial analysis may be able to predict the label provided by psychiatrists. |
| [ | 2019 | 74 SZ | Predictive | Incorporated temporal information into SchiNet using stacked GRUs to directly address the problem of Treatment Outcome Estimation (TOE) in schizophrenia; more specifically, the method aims to determine whether specific symptoms have improved by jointly analysing two videos of the same patient, one before and one after treatment. | The proposed method can determine the TOE of CAINS expression symptoms and PANSS negative symptoms with an accuracy of about 0.7 (0.64–0.71) and an F1 score of around 0.4 (0.33–0.46). The determination is more accurate with the proposed, specifically designed TOE method than when applying symptom severity estimation to before- and after-treatment videos independently. |
| [ | 2021 | 18 SZ, 9 NC | Descriptive and predictive | Developed remote, smartphone-based assessments to capture objective measurements of head movement, which were then used as features to predict both PANSS subscale scores and individual items in each of those subscales, with age and gender as confounding variables. | Head movements acquired remotely through smartphones were able to classify schizophrenia diagnosis and quantify symptom severity in patients with schizophrenia. |
SZ = schizophrenia patient, NC = normal control.
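The motion-energy analysis (MEA) in the 2010 study above quantifies body movement as frame-to-frame grayscale change inside regions of interest, normalized by ROI size. As a minimal sketch of that idea (the function names, the `(y0, y1, x0, x1)` ROI format, and the threshold are our assumptions, not the published MEA software):

```python
import numpy as np

def motion_energy(frames, roi):
    """Mean absolute grayscale change between consecutive frames inside a
    region of interest (y0, y1, x0, x1), normalized by ROI area.
    Hypothetical re-implementation of the motion-energy idea."""
    y0, y1, x0, x1 = roi
    crops = [np.asarray(f, dtype=float)[y0:y1, x0:x1] for f in frames]
    area = (y1 - y0) * (x1 - x0)
    return np.array([np.abs(b - a).sum() / area
                     for a, b in zip(crops, crops[1:])])

def percent_moving(energy, threshold):
    """Percentage of frame pairs with detectable movement
    (energy above a chosen threshold)."""
    return 100.0 * (np.asarray(energy) > threshold).mean()
```

From such per-frame energies one can derive the subject-level quantities the study reports, e.g. the percentage of time with detectable movement in an ROI.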
Overview of participant interviews.
| Article | Passive/Evoked | Interview Structure |
|---|---|---|
| [ | Evoked | Subjects were asked to make facial expressions of happiness, sadness, anger, and fear. |
| [ | Both | Participants were asked to express happiness, anger, fear, sadness, and disgust at mild, moderate, and peak levels. In the evoked session, participants were guided through vignettes, provided by the participants themselves, that described a situation in their life pertaining to each emotion. |
| [ | Passive | Researchers conducted role-play tests (RPT), which were used to measure social competence in schizophrenia. All RPTs were video recorded. Each test consisted of 14 social scenes that represented three response domains. |
| [ | Evoked | Researchers had participants express the following emotions: happiness, sadness, anger, fear, and disgust. |
| [ | Both | Patients were recorded expressing sadness, anger, happiness, fear, and disgust. Each emotion recording lasted approximately 2 minutes. Additionally, patients were recorded while being read self-recorded vignettes about times in their lives when they experienced these emotions. |
| [ | Passive | The interview was semi-structured and involved a single question of “Tell me about yourself” followed by three emotionally evocative questions that were not described. |
| [ | Unknown | Participants underwent a short, structured interview that was not described by the authors. |
| [ | Passive | Participants were interviewed in a style consistent with a routine clinical encounter for a patient under inpatient treatment for schizophrenia. The interview was semi-structured, consisted of 13 questions, and was approximately 10 minutes in length. |
| [ | Passive | Recordings from a previous [ |
| [ | Passive | Participants underwent a semi-structured 10-minute interview that consisted of the following ten questions: (1) Can you please present yourself and tell me a bit about yourself? (2) How do you feel? (3) Can you tell me about the events that led to your current hospitalization? (4) Can you tell me some things about your family? (5) Can you tell me of something sad that has recently happened to you? (6) Can you tell me of something pleasing that has recently happened to you? (7) Is there anything else you want to add? (8) What do you think about the recent situation in the country? (9) What are your future plans? (10) How did you feel about talking with me in front of the camera? |
| [ | Evoked | Open-ended questions such as “What have you been doing for the past few hours?” and “What are your plans for the rest of the day?” were asked to elicit a free verbal response. |
Overview of data processing and statistical analyses.
| Article | Frame-level Features | Subject-level Features | Statistical Tests | Performance Metrics | Validation |
|---|---|---|---|---|---|
| Studies with 2D or 3D image data | |||||
| [ | SVM output of the intended expression normalized with outputs from other SVMs. | Average normalized output. | Paired t-test | PCC | NA |
| [ | 2D features: the area of facial regions, the distance between some fiducial points; 3D Curvature Features and 3D Gabor moment invariants for six facial regions. | Lower dimensional embedding of the frame level features was learned with the ISOMAP manifold learning algorithm. | Paired t-test | NA | NA |
| Studies with video data | |||||
| [ | Geometric features similar to [ | Lower dimensional embedding of the frame level features was learned with the ISOMAP manifold learning algorithm. A “Flatness Index” was defined as the minimal pair-wise overlap between one expression to other expressions in the ISOMAP embedding. | Paired t-test | NA | NA |
| [ | Motion energy: the amount of grayscale change from one frame to the next in the ROIs, normalized by ROI size. | Percentage of time with detectable movement in ROIs and the speed of body movement. | Paired t-tests; ANOVA | PCC; Cronbach’s alpha | NA |
| [ | Confidence and presence of the 15 AUs. | Frequency (percentage of frames present) of single AUs and AU combinations; Flatness measure: frequency of neutral frames (no AU present); Inappropriateness measure: frequency of “disqualifying” AUs defined in [ | NA (method paper) | NA | NA |
| [ | Confidence and presence of the 15 AUs. | Same as [ | Two-way ANOVA | PCC; Cohen’s d | None |
| [ | Intensities and presence of the 20 AUs. | Mean and standard deviations of intensities of each AU during answers to specific questions. | NA | PCC | LOSO |
| [ | Normalized intensities of ten AUs and smile. | The Fisher vector representation of the distribution of intensities over time, from unsupervised learned Gaussian Mixture Model. | NA | SCC, PCC, MAE, RMSE | LOSO |
| [ | Intensities of seven emotions: neutral, anger, disgust, fear, happiness, sadness, and surprise; Mean grayscale of the face. | Mean intensity of the emotions; Number of transitions between emotions; Standard deviation of mean grayscale. | NA | ACC | LOSO |
| [ | Normalized intensities of ten AUs and smile. | Two stacked GRUs were used to extract clip-level (15s segments of the videos) and patient-level representations. | NA | F1, ACC | LOSO |
| [ | Head location of each subject relative to the camera. | Average head movement. | t-test | None | |
| Studies with IR videos | |||||
| [ | Identities of listener and speaker; Head and hand locations of each subject. | Head and hand movement rate; Percentage of time spent in speaking, nodding/gesture as listener of the patients, patients’ partners and controls. | t-test | QICC, SE | None |
| [ | 3D locations of the facial markers. | Average value of distances traveled by markers during shifts from a neutral position. | NA | NA | NA |
| Studies with depth camera videos | |||||
| [ | Output of the five SVMs trained for classifying five expressions (happiness, sadness, anger, fear, and neutral). | Outputs of the SVMs were modeled as the observed variable in an HMM, where the hidden variable indicates emotions. Four features were used: the average posterior probabilities of the intended and neutral emotions, and the occurrence frequencies of the appropriate and neutral expressions. | NA (method paper) | NA | NA |
| [ | Activity level of each AU. | Activation Ratio: Fraction of segment during which the AU was activated; Activation Level: Mean intensity of AU activation; Activation Length: Number of frames that the AU activation lasted; Change Ratio: fraction of the period of AU activation when there was a change in activity level; Fast Change Ratio: fraction of fast changes in activation level. | One-way ANOVA, t-test | AUC, PCC | LOSO |
| [ | Activity level of each AU. | Richness: how many prototype expressions appeared; Typicality: how similar they were to the prototype. Distribution: which expressions were more prevalent. | Bonferroni correction, one-way ANOVA, t-test | PCC, AUC | LOSO |
| [ | Activity level of each AU. | Flatness Measures: the sum of the variance in facial activity for similarly/differently rated photos; Congruity Measures: the ratio between the sum of the variance within similarly rated photos and the sum of total variance; Inappropriateness measure: the sum of the squared difference between the average facial activity of all controls and each subject’s individual facial activities. | t-test | PCC, Cohen’s d | LOSO |
ACC = accuracy, Adj. R2 = adjusted R square, ANOVA = analysis of variance, AU = action unit, GRU = gated recurrent units, HMM = hidden Markov model, ISOMAP = isometric mapping, LOSO = leave-one-(subject)-out, MAE = mean absolute error, PCC = Pearson correlation coefficient, QICC = corrected quasi-likelihood under independence model criterion, RMSE = root mean square error, ROI = region of interest, SCC = Spearman rank-order correlation coefficient, SE = standard error. “NA” in the Statistical Tests column indicates that no statistical test was used or was clearly reported. “NA” in the validation column indicates that no classification or regression was conducted in the study, hence the validation was not needed.
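To illustrate the AU-based subject-level features in the table above (Activation Ratio, Activation Level, Activation Length), here is a minimal sketch computing them from a single AU's frame-level intensity series. The function name and the activation threshold are our assumptions for illustration, not the reviewed authors' code:

```python
import numpy as np

def au_summary(intensity, active_thresh=0.5):
    """Subject-level summaries of one AU's frame-level intensity series,
    in the spirit of the activation features described above."""
    intensity = np.asarray(intensity, dtype=float)
    active = intensity > active_thresh
    # Activation Ratio: fraction of frames during which the AU is active
    activation_ratio = active.mean()
    # Activation Level: mean intensity over the active frames
    activation_level = intensity[active].mean() if active.any() else 0.0
    # Activation Length: mean run length of consecutive active frames
    runs, run = [], 0
    for a in active:
        if a:
            run += 1
        elif run:
            runs.append(run)
            run = 0
    if run:
        runs.append(run)
    activation_length = float(np.mean(runs)) if runs else 0.0
    return {"activation_ratio": activation_ratio,
            "activation_level": activation_level,
            "activation_length": activation_length}
```

The same per-AU summaries could then be concatenated across AUs to form the subject-level feature vectors used for prediction.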
Fig 2. Visualization of the data pipelines.
Different combinations of the methods in each section were adopted in different studies. Pre-processing and recognition methods used in commercial software were not included due to the lack of clarity about which algorithms they use. The face used in the illustration was an average face generated from http://faceresearch.org/demos/average, which is available open access (CC-BY-4.0) [50]. The icons used in the figure are available open access (CC-BY) from the NounProject.com. 2D/3D: two-/three-dimensional, ANOVA: analysis of variance, AUROC: area under the receiver operating characteristic, CNN: convolutional neural network, IR: infrared, ISOMAP: isometric mapping, KNN: k-nearest neighbor, LDA: linear discriminant analysis, ML: machine learning, RNN: recurrent neural network, ROI: region of interest, SVM: support vector machine.
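Most of the predictive studies reviewed here validate their models with leave-one-subject-out (LOSO) cross-validation, so that no subject's frames appear in both the training and test folds. A minimal sketch of generating such splits (a hypothetical helper, not taken from any of the reviewed papers):

```python
import numpy as np

def loso_splits(subject_ids):
    """Leave-one-subject-out splits: for each unique subject, yield
    (train_idx, test_idx) index arrays with all of that subject's
    samples held out for testing."""
    subject_ids = np.asarray(subject_ids)
    for s in np.unique(subject_ids):
        test = np.where(subject_ids == s)[0]
        train = np.where(subject_ids != s)[0]
        yield train, test
```

Splitting by subject rather than by frame matters because adjacent frames from the same interview are highly correlated; frame-level splits would leak identity information and inflate reported accuracy.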