Literature DB >> 36078600

Use of Laughter for the Detection of Parkinson's Disease: Feasibility Study for Clinical Decision Support Systems, Based on Speech Recognition and Automatic Classification Techniques.

Miguel Terriza1,2, Jorge Navarro3, Irene Retuerta4, Nuria Alfageme1,2, Ruben San-Segundo5, George Kontaxakis6, Elena Garcia-Martin7,8, Pedro C Marijuan4, Fivos Panetsos1,2.   

Abstract

Parkinson's disease (PD) is an incurable neurodegenerative disorder which affects over 10 million people worldwide. Early detection and correct evaluation of the disease are critical for appropriate medication and for slowing the advance of the symptoms. It is therefore essential to develop clinical decision support systems that contribute to an early, efficient, and reliable diagnosis of this illness. In this paper we present a feasibility study for a clinical decision support system for the diagnosis of PD based on the acoustic characteristics of laughter. Our decision support system is based on laugh analysis with speech recognition methods and automatic classification techniques. We evaluated different cepstral coefficients to identify laugh characteristics of healthy and ill subjects, combined with machine learning classification models. The decision support system reached an 83% accuracy rate with an AUC value of 0.86 for PD–healthy laugh classification in a database of 20,000 samples randomly generated from a pool of 120 laughs from healthy and PD subjects. Laughter could thus be employed for the efficient and reliable detection of PD; such a detection system can be achieved using speech recognition and automatic classification techniques; and a clinical decision support system can be built on them. Significance: PD clinical decision support systems for the early detection of the disease will help to improve the efficiency of available and upcoming therapeutic treatments which, in turn, would improve the life conditions of the affected people and decrease costs and efforts in public and private healthcare systems.

Keywords:  PD; Parkinson's disease; artificial intelligence; automatic classification techniques; biomarker; clinical decision support systems; laugh; machine learning

Year:  2022        PMID: 36078600      PMCID: PMC9518165          DOI: 10.3390/ijerph191710884

Source DB:  PubMed          Journal:  Int J Environ Res Public Health        ISSN: 1660-4601            Impact factor:   4.614


1. Introduction

Parkinson’s disease (PD) is a neurodegenerative disorder whose main pathological characteristic is the degeneration of the dopamine-producing cells of the substantia nigra (SN). The drop in the level of dopamine causes the onset of the typical motor symptoms (Figure 1) [1,2]. PD is characterized by a wide range of clinical features which include both motor and non-motor symptoms [3]. Regarding motor symptoms, PD patients express bradykinesia/akinesia, rigidity, postural instability, and rest tremor. Akinesia is the difficulty of initiating a movement; it causes a decrease in voluntary movements, and it is often associated with bradykinesia, a slowdown of the speed of movements. PD is the most common neurodegenerative disease after Alzheimer’s, with over 10 million cases worldwide and high associated social and economic burdens, which have reached $52 billion in the USA and €14 billion in the EU. The incidence rate in males is twice as high as in females [4].
Figure 1

Simplified representation of how Parkinson’s disease affects speech and laughter. Speech/laughter decision-making cortical areas activate the motor commands–execution circuit (arrow 1a) as well as the basal ganglia–thalamus circuit (arrow 1b), which modulates the activity of these commands (arrow 3). Motor commands–execution areas send their output (arrow 4) to the motor nuclei which control muscles that generate speech/laughter sounds (arrow 5). In green, excitatory neuronal activity; in red, inhibitory neuronal activity; in grey, activity of dopaminergic neurons. Intense color indicates high neuronal activity; light color indicates low neuronal activity. In healthy subjects (left scheme), SNc-produced dopamine excites striatum neurons that inhibit SNr-GP inhibitory neurons. Low inhibitory input to the thalamus (arrow 2) is the ideal condition for the correct modulation of the motor commands (arrow 3), as well as the coordination of the motor nuclei (arrow 4) and of the corresponding muscles (arrow 5). In Parkinson’s disease (right scheme), the reduced SNc dopamine slows down striatum neurons, increasing SNr-GP inhibitory output (arrow 2). The inhibited thalamus fails in the modulation of cortical nuclei (arrow 3), losing the coordination of the motor nuclei (arrow 4) and provoking motor disorders (arrow 5). SNc, substantia nigra compacta; SNr, substantia nigra reticulata; GP, globus pallidus.

Clinical decision support systems for the evaluation of neural PD damage are based on biomarkers such as motor, functional, and behavioral alterations of the patient [4,5]. However, PD motor symptoms are not limited to upper and lower limb movements; they also affect mouth articulation and the coordination of the laryngeal muscles [6]. Indeed, throughout the course of the disease, 90% of patients develop “hypokinetic dysarthria”, a disorder characterized by volume and pitch variation in the voice, inconstant speech rate, imprecise articulation of consonants, presence of breath noise, and lack of coordination or even paralysis of speech mechanisms, which in turn affect phonation, articulation, and prosody [7]. Thanks to powerful signal processing technology, very fine speech alterations have been identified in PD patients: articulation abnormalities [8,9], phonation variations, reduction of fundamental frequency variability, etc. [10,11]. However, speech alterations by themselves cannot be used as PD biomarkers, since several studies have reported their ineffectiveness for the detection of the disease [12,13]. Performance can improve by using more complex features to parametrize the speech signals, combined with machine learning techniques similar to those used in speaker recognition problems [11,14,15]. However, none of these clinical decision support systems is oriented to the accurate detection of the disease. Laughter carries a significant amount of information [16], has long been considered a depression biomarker, and has been postulated as a candidate for the detection of other neurological disorders [17]. Furthermore, laughter is differentially affected by the various neurological disorders [18,19], which could make it useful in the discrimination of common syndromes (e.g., PD dementia) [20]. Given the primitive nature of laughter, we hypothesize that laughter-based systems could be more effective than speech-based ones for the accurate detection of PD.
Since laughter is a more primitive and less elaborate sound expression than speech, we expect it to reveal subtle changes that are normally masked by the complexity of speech signals. Indeed, we know from anatomical and physiological data that, for sound expression, speech and laughter share the same laryngeal, respiratory, abdominal, and maxillofacial muscles and joints [21], and that laughter is a primitive sound expression, less complex and less subject to voluntary control than speech [21,22]. Therefore, PD-originated motor dysfunctions will cause laughter alterations similar to those of speech. On the other hand, laughter has proved to be a valid biomarker for decision support systems in the diagnosis and evaluation of diseases involving motor syndromes, such as depression [23]. Some speech recognition techniques have been used in PD patient identification with over 80% success rate [24,25,26]. Based on these premises, we hypothesize that PD-originated laughter alterations can be detected by means of speech recognition techniques. In the present paper, we provide evidence for the feasibility of clinical decision support systems for the accurate diagnosis of Parkinson’s disease based on the acoustic characteristics of laughter, analyzed with speech recognition methods and categorized with automatic classification techniques. Following the scheme of Figure 2, laughs are preprocessed and a database of laugh signals is created. Each laugh is framed (divided into small, partially overlapping windows) and power spectra are obtained by means of a Fourier transform. Then, each laugh is associated with a set of coefficients, real numbers representing specific changes in the frequencies of the laugh, obtained by passing the signal through a set of simple filters. Part of the laugh dataset (laughs now represented by their corresponding coefficients) is used to train an automatic classification system to classify laughs as PD or non-PD.
The performance of the automatic classification system is tested using the rest of the laughs in the dataset, that is, laughs not employed in the phase of training.
Figure 2

Temporal representation of one of the signals used in the study, followed by the steps of the analysis pipeline. DFT, digital Fourier transform. “Filter Banks” include Mel, Human Factor, and Bark filters.

2. Materials and Methods

2.1. Laughter Recordings and Preprocessing

Individual laughs (N = 120), 60 from healthy subjects and 60 from PD patients (each group equally divided between sexes), were extracted using Audacity [27] from recording sessions in which subjects were watching humorous videos. The original audio was recorded at 44.1 kHz with 16-bit resolution and downsampled to 16 kHz. All subjects gave informed consent to participate in this study, which was conducted in accordance with the guidelines established by the Ethics Committee of the Miguel Servet Hospital and based on the principles of the Declaration of Helsinki. The experimental protocol was approved by the local Ethics Committee (CEICA: Ethics Committee of Clinical Research of Aragon, Spain). Laughs were obtained from a clinical trial performed by the Aragon Institute of Health Science (IACS), Zaragoza, Spain; the Ethics Committee of Aragon revised and approved the clinical protocol of the study. The diagnosis of PD was based on standard clinical and neuroimaging criteria [28], with disease severity assessed using the Hoehn and Yahr scale [29]. Disease duration and treatment were recorded. Mean disease duration in the patient group at the beginning of the study was 13.56 years (SD = 6.22), and the mean Hoehn and Yahr stage was 2.68 (SD = 0.69). These are patients with early to moderate disease duration and severity.
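The 44.1 kHz → 16 kHz downsampling step can be sketched as follows. This is a minimal illustration in Python rather than the authors' processing chain, and assumes SciPy's polyphase resampler; the test tone is a stand-in for a recorded laugh.

```python
import numpy as np
from scipy.signal import resample_poly

fs_in, fs_out = 44100, 16000
t = np.arange(fs_in) / fs_in                  # 1 s of audio at 44.1 kHz
x = 0.5 * np.sin(2 * np.pi * 440 * t)         # stand-in signal for a recorded laugh

# Polyphase resampling 44.1 kHz -> 16 kHz (ratio 160/441), with built-in anti-aliasing
y = resample_poly(x, up=160, down=441)
print(len(y))                                 # 16000 samples for 1 s of audio
```

`resample_poly` applies an anti-aliasing low-pass filter before decimation, which matters here because frequencies above 8 kHz in the original recording would otherwise fold into the analysis band.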

2.2. Laughter Characterization Using Speech Recognition Techniques

Each laugh was characterized by means of a vector of cepstral coefficients, i.e., mathematical identifiers containing information about signal changes in different spectrum bands [30]. Cepstral coefficients are widely used in speech recognition problems [31]. The main advantage of audio characterization by cepstral coefficients is that the signal can be separated into two components, one corresponding to the source (vocal cavities, glottis, mandible, etc.) and the other to the speaker, without any a priori knowledge about the source [32]. Before cepstral coefficient analysis, signals are passed through non-linearly scaled filters that mimic human pitch perception.

2.3. Cepstral Coefficients

Mel frequency cepstral coefficients (MFCCs) are one of the most frequent representations of a sound in speech recognition. They are based on a linear cosine transform of a log power spectrum on a nonlinear Mel frequency scale, which resembles the psychoacoustic behavior of the human ear. MFCCs are obtained by means of a bank of triangular band-pass filters which convert the linear power spectrum onto a logarithmic scale, the Mel scale [33]. To build our decision support system we evaluated the performance of the classical MFCCs as well as two very common variations, human factor cepstral coefficients (HFCCs) and Bark frequency cepstral coefficients (BFCCs) [33,34]. All three types have been employed in speech recognition-based PD decision support systems [25,33]. HFCCs are extracted using a Mel scale filter bank whose bandwidth varies according to the expression of the equivalent rectangular bandwidth (ERB). BFCCs employ a combined frequency representation of the acoustic signal, linear below 500 Hz and logarithmic above; furthermore, unlike MFCCs, BFCCs employ a greater bandwidth for the higher frequencies. All coefficients were extracted from the laugh signals of both healthy subjects and PD patients, using the generic extraction method and banks of 26 filters. The number of filters used normally varies between 20 and 40, with 24 and 26 being the most common [34].
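The three frequency warpings differ mainly in how they map physical frequency to a perceptual scale and how bandwidth grows with center frequency. A minimal numerical comparison is sketched below; the Mel and Bark conversions are the standard textbook formulas, while the ERB expression is the Moore–Glasberg polynomial commonly used for HFCCs (treat the exact constants as an assumption of this sketch, not a quotation of the paper's Equations (3)–(5)).

```python
import numpy as np

def hz_to_mel(f):
    # Mel scale: logarithmic compression resembling human pitch perception
    return 2595.0 * np.log10(1.0 + f / 700.0)

def hz_to_bark(f):
    # Bark scale: roughly linear below ~500 Hz, logarithmic above
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def erb(f):
    # Equivalent rectangular bandwidth (Moore-Glasberg polynomial), in Hz
    return 6.23e-6 * f ** 2 + 93.39e-3 * f + 28.52

for f in (100.0, 500.0, 1000.0, 4000.0):
    print(f, hz_to_mel(f), hz_to_bark(f), erb(f))
```

Note that `hz_to_mel(1000)` is close to 1000 mel by construction, and that `erb` grows with frequency, which is why HFCC filters are narrower than MFCC filters at low frequencies (cf. Table 1).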

2.4. Laughter Processing

The calculation of the different cepstral coefficients was carried out in seven steps, implemented in Matlab R2019a [35].

Step 1, pre-emphasis. The objective of this step is to compensate for the filtering effects exerted by the glottis and the vocal tract on the signal by enhancing the value of the higher frequencies. For this, a high-pass FIR filter (1) is applied to the original signal, where H(z) is the amplitude difference between the output and the input of the filter, expressed in terms of the Z-transform. The pre-emphasis filter corresponds to a first-order high-pass filter; at higher k-values the attenuation of the low frequencies is greater. Here we used k-values between 0.95 and 0.98 to attenuate DC offset, electrical noise, etc. For C1 = k filter modifications, the cut-off frequency (in this case 1840 Hz) is maintained, while the maximum attenuation of the lower frequencies increases with k.

Step 2, framing. To process an acoustic signal that is continuously changing with time, the original signal is divided into very short segments within which its characteristics can be assumed to be static. Further, we employ window overlapping to avoid large variations between the segments to be analyzed, this overlap being smaller than the size of the selected windows. In a preliminary analysis we showed that laugh signals can be considered invariant in intervals shorter than 30 ms. For our study we used 25 ms-long windows with 10 ms inter-window overlap.

Step 3, windowing. To reduce edge effects during the DFT (distortions at the edges of the signal generated by the convolution of finite-length signals), we applied a Hanning window, which reduces side lobe amplitude.

Step 4, power spectrum. After framing and windowing, the power spectrum of each window is calculated using Equation (2).

Step 5, filter banks. We used a different filter bank for each type of cepstral coefficient. In the case of Mel scale filters, the power spectrum is transformed onto a non-linear (Mel) scale by multiplying it with the Mel scale filter bank; this transformation is given by Equation (3). In the case of human-factor filters, the power spectrum is transformed to the Mel scale as above but the relationship between the bandwidth and the central frequency of each filter is corrected through the expression of the equivalent rectangular bandwidth (ERB), given by Equation (4) as a function of the central frequency. In the case of Bark scale filters, the power spectrum is transformed into the Bark spectrum by passing the DFT through a series of filters corresponding to the Bark scale; the change of scale is given by Equation (5).

Step 6, cepstral coefficients. The cepstral coefficients are calculated by computing the DCT of the log-spectrum of the signal obtained after passing through the corresponding filter bank, as given by Equation (6), with s(m) being the power spectrum of the signal after passing through the filter, “m” the m-th filter (m = 0 to M), and “n” the n-th coefficient (n = 0 to N). For speech recognition, 12 to 20 coefficients are used, with 13 being the most common, since more coefficients provide redundant information and add complexity to the system. With this procedure we obtain 13 cepstral coefficients for each of the T frames into which each laugh is divided, with T being a high number that depends on the duration of the recording.

Step 7, statistics. To characterize each laugh, we calculate the mean (μ), standard deviation (STD), skewness, and kurtosis of the coefficients over all frames; following what we did with the cepstral coefficients, the same statistics are computed for the delta (Δ) and delta-delta (ΔΔ) of the coefficients.

In preliminary tests of the classification stage, the highest score was obtained by the RF fed with MFCC (AR = 83%), followed by the SVM fed with HFCC (AR = 83%). The kNN algorithm performed worse (76% AR with MFCC) and was therefore excluded from further consideration for implementation in our decision support system, although it behaved in a very stable way, with AR values over 66% (over 70% with MFCC) for k = 1 to 5.

Figure 3

(a) Representation of 6 filters corresponding to each bank, with lower M corresponding to filters with lower central frequency. At lower frequencies, the Bark and HFCC filters have a lower bandwidth; this bandwidth increases with the filter’s central frequency. The center frequencies of the filters correspond to those of Table 1. (b) Relation between bandwidth and central frequency of the filters on a logarithmic scale. Points correspond to filters M = 1:26.
Table 1

Central frequencies corresponding to each of the 26 filters for the three scales employed in this study: Mel, Human Factor, and Bark.

Filter Nr | Mel (MFCC) | Human Factor (HFCC) | Bark (BFCC)
1  |   62.50 |   31.25 |   62.50
2  |  156.25 |  125.00 |  156.25
3  |  218.75 |  187.50 |  218.75
4  |  312.50 |  281.25 |  312.50
5  |  406.25 |  375.00 |  375.00
6  |  531.25 |  468.75 |  468.75
7  |  656.25 |  593.75 |  562.50
8  |  781.25 |  718.75 |  656.25
9  |  937.50 |  843.75 |  750.00
10 | 1093.75 | 1000.00 |  875.00
11 | 1250.00 | 1156.25 | 1000.00
12 | 1437.50 | 1343.75 | 1156.25
13 | 1656.25 | 1531.25 | 1281.25
14 | 1875.00 | 1781.25 | 1468.75
15 | 2125.00 | 2000.00 | 1656.25
16 | 2406.25 | 2281.25 | 1843.75
17 | 2718.75 | 2562.50 | 2093.75
18 | 3062.50 | 2875.00 | 2343.75
19 | 3437.50 | 3250.00 | 2656.25
20 | 3812.50 | 3625.00 | 3000.00
21 | 4281.25 | 4031.25 | 3406.25
22 | 4750.00 | 4500.00 | 3875.00
23 | 5281.25 | 5031.25 | 4406.25
24 | 5875.00 | 5537.50 | 5093.75
25 | 6531.25 | 6187.50 | 5937.50
26 | 7218.75 | 6875.00 | 6906.25
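The seven-step procedure of Section 2.4 can be sketched end-to-end in compact form. This is a simplified Python re-implementation for illustration, not the authors' Matlab code; the filter-bank edge handling, the pre-emphasis constant k = 0.97, and the white-noise stand-in signal are assumptions of the sketch.

```python
import numpy as np

def preemphasis(x, k=0.97):
    # Step 1: H(z) = 1 - k z^-1, a first-order high-pass (k in 0.95-0.98)
    return np.append(x[0], x[1:] - k * x[:-1])

def frame(x, fs, win_ms=25, overlap_ms=10):
    # Step 2: 25 ms windows with 10 ms inter-window overlap
    win = fs * win_ms // 1000
    step = win - fs * overlap_ms // 1000
    n = 1 + (len(x) - win) // step
    return np.stack([x[i * step:i * step + win] for i in range(n)])

def mel_filterbank(n_filters, n_bins, fs):
    # Step 5 (Mel variant): triangular filters equally spaced on the Mel scale
    mel = lambda f: 2595 * np.log10(1 + f / 700.0)
    inv = lambda m: 700 * (10 ** (m / 2595.0) - 1)
    edges = inv(np.linspace(0, mel(fs / 2), n_filters + 2))
    bins = np.floor(edges / (fs / 2) * (n_bins - 1)).astype(int)
    fb = np.zeros((n_filters, n_bins))
    for m in range(n_filters):
        l, c, r = bins[m], bins[m + 1], bins[m + 2]
        fb[m, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fb[m, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    return fb

def cepstral_coefficients(x, fs, n_filters=26, n_coef=13):
    frames = frame(preemphasis(x), fs)
    frames = frames * np.hanning(frames.shape[1])            # Step 3: Hanning window
    power = np.abs(np.fft.rfft(frames)) ** 2                 # Step 4: power spectrum
    s = power @ mel_filterbank(n_filters, power.shape[1], fs).T
    logs = np.log(s + 1e-10)                                 # log filter-bank energies
    m = np.arange(n_filters) + 0.5
    dct = np.cos(np.pi * np.outer(np.arange(n_coef), m) / n_filters)
    return logs @ dct.T                                      # Step 6: DCT, 13 coefficients

fs = 16000
x = np.random.default_rng(0).normal(size=fs)                 # 1 s stand-in laugh signal
coefs = cepstral_coefficients(x, fs)                         # one row of 13 coefficients per frame
mu, std = coefs.mean(axis=0), coefs.std(axis=0)              # Step 7: per-laugh statistics
```

For a 1 s signal at 16 kHz with 25 ms windows and a 15 ms hop, this yields 66 frames of 13 coefficients each, which are then collapsed into the per-laugh moment statistics used as classifier inputs.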
For the identification of PD laughs for our decision support system, we tested the performance of three supervised learning-based classification techniques: (1) a Random Forest (RF) model, based on the generation of T random decision trees; (2) a support vector machine (SVM); and (3) a k-nearest neighbors (kNN) algorithm. The performance of the three types of cepstral coefficients in the machine learning models was evaluated according to the accuracy rate (AR) and validated through the Matthews correlation coefficient (MCC). The overall performance of the laugh identification-and-classification procedure is expressed by the AR, the percentage of correct predictions, given by Equation (9):

AR = (TP + TN) / (TP + TN + FP + FN) × 100    (9)

We validated the AR results through the Matthews correlation coefficient (MCC), a well-established performance measure in machine learning. A good overall performance (a high AR) is a necessary but not sufficient condition for the development of a clinically useful decision support system. One fundamental requirement is to minimize the percentage of false negative predictions (ill persons classified as healthy), thus reducing the number of PD patients who would go undetected and, consequently, would not receive early medical care. For this reason, in addition to AR, we evaluated the sensitivity of the system, i.e., its capacity to classify true PD patients as having PD, by means of the receiver operating characteristic (ROC) curve, which relates the true positive rate (TPR, PD subjects correctly classified as PD patients) to the false positive rate (FPR, healthy subjects erroneously classified as PD) at various threshold settings. One of the most important metrics of the ROC curve is the area under the curve (AUC), which measures the degree of separability between the two classes (healthy and PD).
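These metrics follow directly from the confusion matrix. The sketch below computes AR, sensitivity, specificity, and MCC from the standard definitions; feeding it the MFCC row of Table 6, with the TP/FP/TN/FN entries read as per-class rates (an assumption of this sketch), recovers the reported AR of 83% and an MCC close to the 0.66 cited in the Discussion.

```python
import math

def metrics(tp, fp, tn, fn):
    ar = (tp + tn) / (tp + fp + tn + fn)        # accuracy rate, Equation (9), as a fraction
    sens = tp / (tp + fn)                       # sensitivity (true positive rate)
    spec = tn / (tn + fp)                       # specificity (true negative rate)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))  # Matthews correlation coefficient
    return ar, sens, spec, mcc

# Table 6, MFCC row (values treated as per-class rates, not raw counts)
ar, sens, spec, mcc = metrics(tp=0.84, fp=0.16, tn=0.82, fn=0.18)
print(round(ar, 2), round(mcc, 2))              # recovers AR = 0.83 and MCC = 0.66
```

MCC is preferred as a validation metric because, unlike AR, it only approaches 1 when both classes are predicted well, which protects against optimistic readings on any class-imbalanced split.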

3. Results

A good clinical decision support system should not commit errors in the identification of true PD cases or, at least, should minimize the number of such errors (high sensitivity, TPR). On the other hand, if we must choose between a low rate of false positives and a low rate of false negatives (healthy subjects classified as PD and PD subjects classified as healthy, respectively), for a clinically useful support system the second choice is mandatory. Following these criteria, we chose to build the clinical decision support system by coupling the cepstral coefficients with an RF classification procedure. Finally, we evaluated the performance of the clinical decision support system on a dataset of 20,000 laughs of both sexes, randomly generated from healthy and PD subject laughs. None of these laughs was employed in the cepstral coefficient selection nor in the training or testing of the decision support systems. Random laughs of each type were generated with the same μ and STD as the corresponding real laughs, by means of the “mvnrnd” function of Matlab, which generates multivariate normal random numbers. This function, called as R = mvnrnd(μ,σ,N), returns an N × D matrix R, where N represents the population size and D the number of extracted features, with rows drawn from the multivariate normal distribution with mean vector μ (a 1 × D vector) and covariance matrix σ (a D × D symmetric matrix). For the generation of the laughs, μ and σ are obtained from the original post-processed laughs, that is, from the statistical values of their coefficients. Alternatively, a sufficiently numerous second data set of real laughs could be used. Results are presented in Table 2, Table 3 and Table 4.
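The surrogate-laugh generation can be mimicked with NumPy's analogue of Matlab's mvnrnd. In this sketch the feature matrix is random stand-in data (not the study's laughs), with 60 "laughs" and D = 13 features assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical feature matrix: 60 real laughs x D = 13 coefficient statistics (stand-in data)
real = rng.normal(size=(60, 13))
mu = real.mean(axis=0)               # 1 x D mean vector
sigma = np.cov(real, rowvar=False)   # D x D symmetric covariance matrix

# NumPy analogue of Matlab's mvnrnd(mu, sigma, N): N x D matrix of surrogate laughs
surrogate = rng.multivariate_normal(mu, sigma, size=10000)
```

Because the surrogates are drawn from the fitted mean and covariance rather than copied from the pool, they preserve the feature correlations of the real laughs while remaining distinct samples for evaluation.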
Table 2

Evaluation of the RF model with MFCC, HFCC and BFCC filters, by individually employing the first four moments of their distributions (mean-μ, standard deviation-STD, skewness-skew and kurtosis-kurt), their Δ and their ΔΔ. 10-fold cross-validation with 20,000 laughs (18,000 training and 2,000 test laughs per fold). AR, accuracy rate; TP, true positive; FP, false positive; TN, true negative; FN, false negative; Sens, sensitivity; Spec, specificity. Best performance per column is highlighted in bold.

Results by employing μ, STD, skewness and kurtosis of the coefficients

Inputs | AR (%) | TP | FP | TN | FN | Sens | Spec | AUC
μ(MFCC)    | 72 | 0.72 | 0.28 | 0.68 | 0.32 | 0.69 | 0.71 | 0.722
STD(MFCC)  | 68 | 0.67 | 0.33 | 0.69 | 0.31 | 0.68 | 0.67 | 0.695
skew(MFCC) | 59 | 0.58 | 0.42 | 0.61 | 0.39 | 0.60 | 0.59 | 0.615
kurt(MFCC) | 60 | 0.62 | 0.38 | 0.59 | 0.41 | 0.60 | 0.61 | 0.625
μ(HFCC)    | 72 | 0.72 | 0.28 | 0.69 | 0.32 | 0.70 | 0.71 | 0.725
STD(HFCC)  | 70 | 0.70 | 0.30 | 0.69 | 0.31 | 0.70 | 0.70 | 0.721
skew(HFCC) | 65 | 0.65 | 0.35 | 0.65 | 0.35 | 0.65 | 0.65 | 0.670
kurt(HFCC) | 70 | 0.71 | 0.29 | 0.68 | 0.32 | 0.69 | 0.70 | 0.715
μ(BFCC)    | 73 | 0.72 | 0.28 | 0.70 | 0.30 | 0.71 | 0.71 | 0.733
STD(BFCC)  | 70 | 0.70 | 0.30 | 0.69 | 0.31 | 0.69 | 0.69 | 0.712
skew(BFCC) | 57 | 0.57 | 0.43 | 0.58 | 0.42 | 0.57 | 0.57 | 0.599
kurt(BFCC) | 63 | 0.65 | 0.35 | 0.62 | 0.39 | 0.63 | 0.63 | 0.654

Results by employing μ, STD, skewness and kurtosis of the delta (Δ) of the coefficients

Inputs | AR (%) | TP | FP | TN | FN | Sens | Spec | AUC
μ(Δ(MFCC))    | 67 | 0.69 | 0.31 | 0.65 | 0.35 | 0.66 | 0.68 | 0.692
STD(Δ(MFCC))  | 74 | 0.70 | 0.30 | 0.65 | 0.35 | 0.66 | 0.68 | 0.694
skew(Δ(MFCC)) | 64 | 0.65 | 0.35 | 0.64 | 0.36 | 0.64 | 0.65 | 0.665
kurt(Δ(MFCC)) | 62 | 0.64 | 0.36 | 0.60 | 0.40 | 0.62 | 0.63 | 0.645
μ(Δ(HFCC))    | 69 | 0.70 | 0.30 | 0.68 | 0.32 | 0.69 | 0.69 | 0.712
STD(Δ(HFCC))  | 70 | 0.68 | 0.32 | 0.67 | 0.33 | 0.67 | 0.68 | 0.695
skew(Δ(HFCC)) | 70 | 0.70 | 0.30 | 0.71 | 0.29 | 0.70 | 0.70 | 0.720
kurt(Δ(HFCC)) | 64 | 0.67 | 0.33 | 0.61 | 0.39 | 0.63 | 0.65 | 0.664
μ(Δ(BFCC))    | 63 | 0.65 | 0.35 | 0.62 | 0.38 | 0.63 | 0.64 | 0.657
STD(Δ(BFCC))  | 71 | 0.68 | 0.32 | 0.69 | 0.31 | 0.69 | 0.69 | 0.710
skew(Δ(BFCC)) | 68 | 0.69 | 0.31 | 0.67 | 0.33 | 0.68 | 0.68 | 0.701
kurt(Δ(BFCC)) | 63 | 0.66 | 0.35 | 0.60 | 0.40 | 0.62 | 0.64 | 0.656

Results by employing μ, STD, skewness and kurtosis of the delta-delta (ΔΔ) of the coefficients

Inputs | AR (%) | TP | FP | TN | FN | Sens | Spec | AUC
μ(ΔΔ(MFCC))    | 69 | 0.72 | 0.28 | 0.65 | 0.35 | 0.68 | 0.70 | 0.712
STD(ΔΔ(MFCC))  | 71 | 0.78 | 0.22 | 0.75 | 0.25 | 0.71 | 0.72 | 0.735
skew(ΔΔ(MFCC)) | 61 | 0.80 | 0.20 | 0.77 | 0.23 | 0.61 | 0.61 | 0.634
kurt(ΔΔ(MFCC)) | 66 | 0.79 | 0.21 | 0.77 | 0.23 | 0.66 | 0.66 | 0.685
μ(ΔΔ(HFCC))    | 69 | 0.71 | 0.29 | 0.66 | 0.34 | 0.68 | 0.70 | 0.713
STD(ΔΔ(HFCC))  | 71 | 0.73 | 0.27 | 0.69 | 0.31 | 0.70 | 0.72 | 0.734
skew(ΔΔ(HFCC)) | 66 | 0.65 | 0.35 | 0.66 | 0.34 | 0.66 | 0.65 | 0.675
kurt(ΔΔ(HFCC)) | 61 | 0.63 | 0.37 | 0.60 | 0.40 | 0.61 | 0.62 | 0.635
μ(ΔΔ(BFCC))    | 63 | 0.65 | 0.35 | 0.62 | 0.38 | 0.63 | 0.64 | 0.655
STD(ΔΔ(BFCC))  | 73 | 0.74 | 0.26 | 0.73 | 0.27 | 0.73 | 0.73 | 0.754
skew(ΔΔ(BFCC)) | 70 | 0.70 | 0.30 | 0.69 | 0.31 | 0.69 | 0.70 | 0.715
kurt(ΔΔ(BFCC)) | 60 | 0.58 | 0.42 | 0.62 | 0.38 | 0.60 | 0.60 | 0.626
Table 3

Evaluation of the RF model with MFCC, HFCC and BFCC filters, by incrementally employing the first four moments of their distributions (mean-μ, standard deviation-STD, skewness-skew and kurtosis-kurt), their Δ and their ΔΔ. 10-fold cross-validation with 20,000 laughs (18,000 training and 2,000 test laughs per fold). AR, accuracy rate; TP, true positive; FP, false positive; TN, true negative; FN, false negative; Sens, sensitivity; Spec, specificity. Best performance per column is highlighted in bold.

Results by employing μ, STD, skewness and kurtosis of the coefficients

Inputs | AR (%) | TP | FP | TN | FN | Sens | Spec | AUC
μ(MFCC)                  | 72 | 0.72 | 0.28 | 0.68 | 0.32 | 0.69 | 0.71 | 0.71
μ+STD(MFCC)              | 74 | 0.75 | 0.25 | 0.73 | 0.27 | 0.73 | 0.74 | 0.75
μ+STD+skew(MFCC)         | 75 | 0.76 | 0.24 | 0.74 | 0.26 | 0.74 | 0.75 | 0.76
μ+STD+skew+kurt(MFCC)    | 76 | 0.77 | 0.23 | 0.76 | 0.24 | 0.76 | 0.77 | 0.78
μ(HFCC)                  | 72 | 0.72 | 0.28 | 0.69 | 0.31 | 0.70 | 0.71 | 0.72
μ+STD(HFCC)              | 74 | 0.74 | 0.26 | 0.73 | 0.27 | 0.74 | 0.74 | 0.76
μ+STD+skew(HFCC)         | 76 | 0.77 | 0.23 | 0.75 | 0.25 | 0.76 | 0.76 | 0.78
μ+STD+skew+kurt(HFCC)    | 77 | 0.79 | 0.21 | 0.76 | 0.24 | 0.77 | 0.78 | 0.80
μ(BFCC)                  | 73 | 0.72 | 0.28 | 0.70 | 0.30 | 0.71 | 0.71 | 0.73
μ+STD(BFCC)              | 74 | 0.75 | 0.25 | 0.73 | 0.27 | 0.73 | 0.74 | 0.76
μ+STD+skew(BFCC)         | 75 | 0.76 | 0.24 | 0.74 | 0.26 | 0.75 | 0.75 | 0.77
μ+STD+skew+kurt(BFCC)    | 76 | 0.77 | 0.23 | 0.75 | 0.25 | 0.76 | 0.76 | 0.79

Results by employing μ, STD, skewness and kurtosis of the delta (Δ) of the coefficients

Inputs | AR (%) | TP | FP | TN | FN | Sens | Spec | AUC
μ(Δ(MFCC))               | 67 | 0.69 | 0.31 | 0.65 | 0.35 | 0.66 | 0.68 | 0.69
μ+STD(Δ(MFCC))           | 72 | 0.73 | 0.27 | 0.72 | 0.28 | 0.72 | 0.73 | 0.75
μ+STD+skew(Δ(MFCC))      | 73 | 0.75 | 0.25 | 0.72 | 0.28 | 0.73 | 0.74 | 0.76
μ+STD+skew+kurt(Δ(MFCC)) | 75 | 0.76 | 0.24 | 0.75 | 0.25 | 0.75 | 0.76 | 0.78
μ(Δ(HFCC))               | 69 | 0.70 | 0.30 | 0.68 | 0.32 | 0.69 | 0.69 | 0.71
μ+STD(Δ(HFCC))           | 72 | 0.71 | 0.29 | 0.72 | 0.28 | 0.72 | 0.71 | 0.74
μ+STD+skew(Δ(HFCC))      | 73 | 0.73 | 0.27 | 0.73 | 0.27 | 0.73 | 0.73 | 0.75
μ+STD+skew+kurt(Δ(HFCC)) | 76 | 0.76 | 0.24 | 0.76 | 0.24 | 0.76 | 0.76 | 0.78
μ(Δ(BFCC))               | 63 | 0.65 | 0.35 | 0.62 | 0.38 | 0.63 | 0.64 | 0.66
μ+STD(Δ(BFCC))           | 67 | 0.67 | 0.33 | 0.68 | 0.32 | 0.67 | 0.67 | 0.69
μ+STD+skew(Δ(BFCC))      | 69 | 0.69 | 0.31 | 0.70 | 0.30 | 0.69 | 0.69 | 0.71
μ+STD+skew+kurt(Δ(BFCC)) | 71 | 0.72 | 0.28 | 0.72 | 0.28 | 0.72 | 0.72 | 0.74

Results by employing μ, STD, skewness and kurtosis of the delta-delta (ΔΔ) of the coefficients

Inputs | AR (%) | TP | FP | TN | FN | Sens | Spec | AUC
μ(ΔΔ(MFCC))               | 69 | 0.72 | 0.28 | 0.65 | 0.35 | 0.68 | 0.70 | 0.71
μ+STD(ΔΔ(MFCC))           | 76 | 0.78 | 0.22 | 0.75 | 0.25 | 0.76 | 0.77 | 0.79
μ+STD+skew(ΔΔ(MFCC))      | 78 | 0.79 | 0.21 | 0.77 | 0.23 | 0.78 | 0.79 | 0.81
μ+STD+skew+kurt(ΔΔ(MFCC)) | 78 | 0.80 | 0.20 | 0.77 | 0.23 | 0.78 | 0.79 | 0.81
μ(ΔΔ(HFCC))               | 69 | 0.71 | 0.29 | 0.66 | 0.34 | 0.68 | 0.70 | 0.71
μ+STD(ΔΔ(HFCC))           | 75 | 0.76 | 0.24 | 0.73 | 0.27 | 0.74 | 0.75 | 0.77
μ+STD+skew(ΔΔ(HFCC))      | 75 | 0.77 | 0.24 | 0.74 | 0.26 | 0.75 | 0.76 | 0.78
μ+STD+skew+kurt(ΔΔ(HFCC)) | 76 | 0.77 | 0.23 | 0.75 | 0.25 | 0.75 | 0.77 | 0.78
μ(ΔΔ(BFCC))               | 63 | 0.65 | 0.35 | 0.62 | 0.38 | 0.63 | 0.64 | 0.66
μ+STD(ΔΔ(BFCC))           | 72 | 0.73 | 0.27 | 0.72 | 0.28 | 0.72 | 0.72 | 0.74
μ+STD+skew(ΔΔ(BFCC))      | 73 | 0.73 | 0.27 | 0.73 | 0.27 | 0.73 | 0.73 | 0.75
μ+STD+skew+kurt(ΔΔ(BFCC)) | 74 | 0.75 | 0.26 | 0.74 | 0.26 | 0.74 | 0.74 | 0.76
Table 4

Evaluation of the RF model with MFCC, HFCC and BFCC filters, by incrementally employing the first four moments of their distributions (mean-μ, standard deviation-STD, skewness-skew and kurtosis-kurt), together with their Δ and their ΔΔ. 10-fold cross-validation with 20,000 laughs (18,000 training and 2,000 test laughs per fold). AR, accuracy rate; TP, true positive; FP, false positive; TN, true negative; FN, false negative; Sens, sensitivity; Spec, specificity. Best performance per column is highlighted in bold.

Inputs | AR (%) | TP | FP | TN | FN | Sens | Spec | AUC
μ(MFCC+Δ(MFCC)+ΔΔ(MFCC))               | 74 | 0.77 | 0.23 | 0.71 | 0.29 | 0.73 | 0.76 | 0.75
μ+STD(MFCC+Δ(MFCC)+ΔΔ(MFCC))           | 82 | 0.83 | 0.17 | 0.82 | 0.18 | 0.82 | 0.83 | 0.84
μ+STD+skew(MFCC+Δ(MFCC)+ΔΔ(MFCC))      | 83 | 0.84 | 0.16 | 0.82 | 0.18 | 0.82 | 0.84 | 0.85
μ+STD+skew+kurt(MFCC+Δ(MFCC)+ΔΔ(MFCC)) | 83 | 0.84 | 0.16 | 0.82 | 0.18 | 0.83 | 0.84 | 0.86
μ(HFCC+Δ(HFCC)+ΔΔ(HFCC))               | 75 | 0.77 | 0.23 | 0.73 | 0.27 | 0.74 | 0.76 | 0.76
μ+STD(HFCC+Δ(HFCC)+ΔΔ(HFCC))           | 81 | 0.82 | 0.18 | 0.81 | 0.19 | 0.81 | 0.82 | 0.83
μ+STD+skew(HFCC+Δ(HFCC)+ΔΔ(HFCC))      | 82 | 0.83 | 0.17 | 0.82 | 0.18 | 0.82 | 0.82 | 0.84
μ+STD+skew+kurt(HFCC+Δ(HFCC)+ΔΔ(HFCC)) | 82 | 0.83 | 0.17 | 0.82 | 0.18 | 0.82 | 0.83 | 0.85
μ(BFCC+Δ(BFCC)+ΔΔ(BFCC))               | 72 | 0.74 | 0.26 | 0.70 | 0.30 | 0.74 | 0.71 | 0.76
μ+STD(BFCC+Δ(BFCC)+ΔΔ(BFCC))           | 80 | 0.80 | 0.20 | 0.80 | 0.20 | 0.80 | 0.80 | 0.82
μ+STD+skew(BFCC+Δ(BFCC)+ΔΔ(BFCC))      | 81 | 0.81 | 0.19 | 0.81 | 0.19 | 0.81 | 0.81 | 0.84
μ+STD+skew+kurt(BFCC+Δ(BFCC)+ΔΔ(BFCC)) | 82 | 0.82 | 0.18 | 0.81 | 0.19 | 0.82 | 0.81 | 0.85
Both RF- and SVM-based clinical decision support systems reached 81–83% AR with the three filters (Table 5 and Table 6), with AUC values of 0.85–0.86, suggesting that cepstral coefficients are generally good for classification regardless of the employed algorithm (RF or SVM). This is especially important because one can gain much interpretability by using, for example, a linear SVM (by examining the weights of the classifier) without incurring a greater rate of false negatives.
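The interpretability point can be illustrated with a minimal linear SVM trained by Pegasos-style sub-gradient descent on synthetic stand-in features (not the study's data, and a different solver than the one used in the paper): after training, the magnitude of each weight indicates how strongly the corresponding feature drives the decision.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    # Pegasos-style stochastic sub-gradient descent on the hinge loss; y in {-1, +1}.
    # The bias is folded in as a constant feature column.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    rng = np.random.default_rng(seed)
    w, t = np.zeros(Xb.shape[1]), 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)
            w *= 1.0 - eta * lam              # regularization shrink
            if y[i] * (Xb[i] @ w) < 1:        # hinge-loss violation
                w += eta * y[i] * Xb[i]
    return w[:-1], w[-1]

rng = np.random.default_rng(1)
healthy = rng.normal(0.0, 1.0, size=(200, 8))     # stand-in feature vectors
parkinson = rng.normal(0.8, 1.0, size=(200, 8))   # shifted cluster, stand-in for PD laughs
X = np.vstack([healthy, parkinson])
y = np.array([-1] * 200 + [+1] * 200)
w, b = train_linear_svm(X, y)
acc = float(np.mean(np.sign(X @ w + b) == y))
# np.abs(w) ranks the input features by their influence on the decision boundary
```

Because the decision function is w·x + b, sorting the features by |w| gives a direct, model-internal ranking of which cepstral statistics separate the classes, something an RF can only approximate through importance scores.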
Table 5

Results of the variation of the kernel in the SVM model with MFCC, HFCC and BFCC filters, by employing the first four moments of their distributions (mean-μ, standard deviation-STD, skewness-skew and kurtosis-kurt). 10-fold cross-validation with 20,000 laughs (18,000 training and 2,000 test laughs per fold). AR, accuracy rate; TP, true positive; FP, false positive; TN, true negative; FN, false negative; Sens, sensitivity; Spec, specificity. Best performance per column is highlighted in bold.

Results of Mel filters: μ + STD + skew + kurt (MFCC + Δ(MFCC) + ΔΔ(MFCC))

Kernel variation | AR (%) | TP | FP | TN | FN | Sens | Spec | AUC
Linear         | 74 | 0.74 | 0.26 | 0.73 | 0.27 | 0.73 | 0.74 | 0.76
Polynomial     | 73 | 0.75 | 0.25 | 0.72 | 0.28 | 0.73 | 0.74 | 0.76
Radial Basis   | 65 | 0.86 | 0.14 | 0.45 | 0.55 | 0.61 | 0.76 | 0.72
ν-Linear       | 81 | 0.81 | 0.19 | 0.81 | 0.19 | 0.81 | 0.81 | 0.85
ν-Polynomial   | 82 | 0.82 | 0.18 | 0.83 | 0.17 | 0.82 | 0.82 | 0.86
ν-Radial Basis | 73 | 0.85 | 0.15 | 0.60 | 0.40 | 0.68 | 0.80 | 0.79

Results of Human Factor filters: μ + STD + skew + kurt (HFCC + Δ(HFCC) + ΔΔ(HFCC))

Kernel variation | AR (%) | TP | FP | TN | FN | Sens | Spec | AUC
Linear         | 74 | 0.74 | 0.26 | 0.73 | 0.27 | 0.74 | 0.74 | 0.78
Polynomial     | 74 | 0.75 | 0.25 | 0.73 | 0.27 | 0.73 | 0.74 | 0.78
Radial Basis   | 66 | 0.86 | 0.14 | 0.45 | 0.55 | 0.61 | 0.76 | 0.73
ν-Linear       | 81 | 0.81 | 0.19 | 0.81 | 0.19 | 0.81 | 0.81 | 0.85
ν-Polynomial   | 83 | 0.83 | 0.17 | 0.83 | 0.17 | 0.83 | 0.83 | 0.86
ν-Radial Basis | 73 | 0.85 | 0.15 | 0.61 | 0.39 | 0.69 | 0.81 | 0.79

Results of Bark filters: μ + STD + skew + kurt (BFCC + Δ(BFCC) + ΔΔ(BFCC))

Kernel variation | AR (%) | TP | FP | TN | FN | Sens | Spec | AUC
Linear         | 71 | 0.71 | 0.29 | 0.72 | 0.28 | 0.72 | 0.71 | 0.76
Polynomial     | 72 | 0.72 | 0.28 | 0.72 | 0.28 | 0.72 | 0.72 | 0.76
Radial Basis   | 63 | 0.85 | 0.15 | 0.41 | 0.59 | 0.59 | 0.73 | 0.69
ν-Linear       | 80 | 0.80 | 0.20 | 0.80 | 0.20 | 0.80 | 0.80 | 0.85
ν-Polynomial   | 82 | 0.82 | 0.18 | 0.82 | 0.18 | 0.82 | 0.82 | 0.86
ν-Radial Basis | 66 | 0.85 | 0.15 | 0.47 | 0.53 | 0.62 | 0.76 | 0.72
Table 6

Summary of the results of the RF model with MFCC, HFCC and BFCC filters, by employing the first four moments of their distributions (mean-μ, standard deviation-STD, skewness-skew and kurtosis-kurt), Δ and ΔΔ. 10-fold cross-validation with 20,000 laughs (18,000 training and 2,000 test laughs per fold). AR, accuracy rate; TP, true positive; FP, false positive; TN, true negative; FN, false negative; Sens, sensitivity; Spec, specificity. Note that the three rows correspond to the 4th, 8th and 12th row of Table 4.

Filters | AR (%) | TP | FP | TN | FN | Sens | Spec | AUC
MFCC | 83 | 0.84 | 0.16 | 0.82 | 0.18 | 0.83 | 0.84 | 0.86
HFCC | 82 | 0.83 | 0.17 | 0.82 | 0.18 | 0.82 | 0.83 | 0.85
BFCC | 81 | 0.82 | 0.18 | 0.81 | 0.19 | 0.82 | 0.81 | 0.85
To determine to what extent our classification is affected by the laughs’ pitch characteristics (power spectra), we employed pitch both instead of and in addition to the cepstral coefficients as input to our classification system. In both cases, pitch information was not determinant for the correct classification of the laughs, as the AR was very low when pitch statistics (mean, standard deviation, etc.) were employed as input attributes (AR < 50%).

4. Discussion

In the present paper we provided evidence for the feasibility of a clinical decision support system for the detection of Parkinson’s disease which employs laughter as a biomarker of the illness. Such a decision support system would be composed of two sub-systems: one for laugh identification and one for laugh classification. For the first, we tested the suitability of 13 cepstral coefficients, together with their delta and delta-delta components, employing three different filter banks (Mel, Human Factor, and Bark), each composed of 26 filters. For the second, we tested three automatic classification techniques (kNN, RF and SVM), each of them tested once for each of the three coefficient types. We proved that classical speech recognition techniques like cepstral coefficients can be used to identify and label laugh signals, and that such coefficients can be used by automatic classification techniques to decide whether a laugh belongs to a PD or non-PD subject. All of them reached very good AR scores, the highest (83%) obtained by the clinical decision support system based on the RF classification model using the Mel cepstral coefficients. This model was used for the final test due to its lower computational cost compared to the SVM; as mentioned in the Results section, the SVM performed similarly. High AR scores were also obtained using the Bark and Human Factor cepstral coefficients in the final test, proving the consistency of our approach. The Matthews correlation coefficient (MCC), an independent measure of the accuracy of the classification, corroborates the best AR performance of the RF and SVM models, assigning them scores of 0.66 and 0.64 out of 1.0, respectively. A limitation of the study is that testing has not been performed on a data set of real laughs.
The similar and high AR values obtained by the RF when combined with the Human Factor or Bark frequency cepstral coefficients prove the consistency of the approach and suggest that the models are comparable. The metrics displayed in Tables 2-5 indicate, on the one hand, that individual moments do not carry enough information for a correct classification of the subjects and, on the other, that classification performance consistently improves when these moments are considered in an incremental manner. In the SVM results obtained with the three kernels, we observe that the linear and polynomial kernels achieve similar ARs, higher than that of the radial one, which suggests that the clusters are not formed by partially intermingled clouds and can easily be separated by simple planes. The contribution of pitch to the correct classification of the laughs was also tested. Laughter presents a high fundamental-frequency variation [35]; this variability is present in all groups and sexes, making fundamental frequency unsuitable as a feature for laugh-based PD classification (AR < 50% when pitch statistics were employed as the sole input attributes). Moreover, pitch does not add relevant information to classification performance, since the classification systems do not improve their AR when it is included. This is possibly due to a very low contribution of the vibrational components to the characterization of laughter signals, contrary to what occurs in speech signals. Power spectra represent the vibrational components of the signal, which, in our case, are generated by the vocal apparatus during sound production.
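The kernel comparison discussed above can be reproduced in outline as follows; the Gaussian clusters are hypothetical stand-ins for the laugh feature clouds, chosen only to show how cross-validated ARs for the three kernels would be compared.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
# Two well-separated Gaussian clusters: the linearly separable situation
# suggested by the linear/polynomial kernels matching or beating the RBF one.
X = np.vstack([rng.normal(-1.0, 1.0, (300, 10)),
               rng.normal(1.0, 1.0, (300, 10))])
y = np.concatenate([np.zeros(300), np.ones(300)])

# Mean 5-fold cross-validated accuracy (AR) per kernel.
scores = {k: cross_val_score(SVC(kernel=k), X, y, cv=5).mean()
          for k in ("linear", "poly", "rbf")}
```

When the linear kernel performs on par with the nonlinear ones, as in this study, the simpler decision boundary is the natural choice.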
In terms of neural circuits, these results could indicate that laugh analysis primarily detects the degeneration of specific motor nuclei and the reduction of the precise control they exercise over the muscles through the laryngeal reflexogenic control systems [45,46], rather than the degeneration of higher brain areas, such as the basal ganglia, thalamus or cortex, and the global control each of them exercises over the next (Figure 1), which would also entail significant deterioration of the vibrational components. Other, not mutually exclusive, interpretations are possible, for example the PD-independent influence of sex on pitch. Our results are consistent with automatic Parkinson's disease detection systems using speech analysis with MFCC, which have obtained AR values higher than 80% [25]. The laugh-based clinical decision support system we propose could be useful for early detection of the disease, when motor symptoms are not yet detectable by neurologists; early detection of neurodegenerative diseases could facilitate treatments that slow down the evolution of the illness. From a computational point of view, we highlight that, a priori, the decision support system does not display significant AR differences depending on the selection of the filter bank. This is relevant information for future studies in laughter-based PD detection, since the development of MFCC algorithms is widespread and numerous libraries implementing them can easily be found. Open-source libraries are available, such as Librosa for Python or openSMILE, where the Mel filter bank is applied by default. Alternatively, Matlab's Audio Toolbox provides an MFCC extraction function, at an approximate cost of less than 700 € for an annual license.
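As a self-contained sketch of the MFCC extraction such libraries implement (26 Mel filters and 13 coefficients as in this study; the frame length, hop size and Hamming window are assumptions made for the example), one could write:

```python
import numpy as np
from scipy.fftpack import dct

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular Mel filter bank of shape (n_filters, n_fft // 2 + 1)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(0.0, hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    return fb

def mfcc(signal, sr, n_filters=26, n_coeffs=13, n_fft=512, hop=256):
    """13 MFCCs per frame from a 26-filter Mel bank, as in the study."""
    frames = np.lib.stride_tricks.sliding_window_view(signal, n_fft)[::hop]
    spectrum = np.abs(np.fft.rfft(frames * np.hamming(n_fft), axis=1)) ** 2
    energies = spectrum @ mel_filterbank(n_filters, n_fft, sr).T
    return dct(np.log(energies + 1e-10), type=2, axis=1,
               norm="ortho")[:, :n_coeffs]

sr = 16000
signal = np.sin(2 * np.pi * 300 * np.arange(sr) / sr)  # 1 s test tone
coeffs = mfcc(signal, sr)  # one 13-coefficient row per frame
```

In practice one would call `librosa.feature.mfcc` or Matlab's `mfcc` directly; the point of the sketch is that only the filter-bank construction changes between the Mel, Bark and Human Factor variants.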
However, the study of the coefficients themselves should be expanded, by evaluating the number of filters employed as well as the number of coefficients, to achieve a compromise between optimal results and computational cost. Furthermore, the adjustment and evaluation of SVM hyper-parameters would be of interest for future studies, to further understand the input features. Neural networks and deep learning techniques could possibly help to build the decision support system for clinical use. Future studies should also consider the variability of the humoristic videos and the psychological conditions of the subjects, the possibly high variability of laughter production, and even the fact that some subjects may not feel comfortable during the recording. Combining speech and laugh analysis to improve PD detection performance could facilitate the implementation of a system for the telematic detection of PD. It would also be of interest to evaluate the progression of the disease by estimating the UPDRS (Unified Parkinson's Disease Rating Scale) score of PD patients through speech and laughter, allowing a more continuous evaluation of the disease and a consequent reduction of health costs. Smartphone apps could allow people to perform the test in privacy, thus improving the above-mentioned aspects.

5. Conclusions

Our paper provides evidence that (1) laughter can be used as a biomarker for PD detection, (2) laughter-based support systems are feasible, and (3) laughter-based support systems perform at least as well as speech-based ones, thus giving PD specialists the possibility of performing a prospective study of laughter recordings from people who eventually develop PD. As demonstrated in our experiments, the feature extraction methods (cepstral coefficients) and machine learning algorithms derived from the speech-processing field can provide promising results for PD detection from laughs. The main contributions of our study are having proven the feasibility of laughter as a possible biomarker for detecting Parkinson's disease and having applied speech analysis techniques to much more primitive signals such as laughter.
References (24 in total; first 10 shown)

1. F J Jiménez-Jiménez; J Gamboa; A Nieto; J Guerrero; M Orti-Pareja; J A Molina; E García-Albea; I Cobeta. Acoustic voice analysis in untreated patients with Parkinson's disease. Parkinsonism Relat Disord, 1997-04.

2. Frank Schneider; Ute Habel; Jens Volkmann; Sabine Regel; Jürgen Kornischka; Volker Sturm; Hans-Joachim Freund. Deep brain stimulation of the subthalamic nucleus enhances emotional processing in Parkinson disease. Arch Gen Psychiatry, 2003-03.

3. J A Hanley; B J McNeil. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 1982-04.

4. Kris Tjaden. Speech and Swallowing in Parkinson's Disease. Top Geriatr Rehabil, 2008.

5. Jan Rusz; Roman Cmejla; Tereza Tykalova; Hana Ruzickova; Jiri Klempir; Veronika Majerova; Jana Picmausova; Jan Roth; Evzen Ruzicka. Imprecise vowel articulation as a potential early marker of Parkinson's disease: effect of speaking task. J Acoust Soc Am, 2013-09.

6. Young-Im Bang; Kyunghoon Min; Young H Sohn; Sung-Rae Cho. Acoustic characteristics of vowel sounds in patients with Parkinson disease. NeuroRehabilitation, 2013.

7. Stephen Jannetts; Anja Lowit. Cepstral analysis of hypokinetic and ataxic voices: correlations with perceptual and other acoustic measures. J Voice, 2014-05-16.

8. Rajeshree Joshi; Jeffrey M Bronstein; A Keener; Jaclyn Alcazar; Diane D Yang; Maya Joshi; Neal Hermanowicz. PKG Movement Recording System Use Shows Promise in Routine Clinical Care of Patients With Parkinson's Disease. Front Neurol, 2019-10-01.

9. Linda Giampietri; Elisabetta Belli; Maria Francesca Beatino; Sara Giannoni; Giovanni Palermo; Nicole Campese; Gloria Tognoni; Gabriele Siciliano; Roberto Ceravolo; Ciro De Luca; Filippo Baldacci. Fluid Biomarkers in Alzheimer's Disease and Other Neurodegenerative Disorders: Toward Integrative Diagnostic Frameworks and Tailored Treatments (review). Diagnostics (Basel), 2022-03-24.

10. Jean Mary Zarate. The neural control of singing. Front Hum Neurosci, 2013-06-03.
