Andrea Guidi, Sergio Salvi, Manuel Ottaviano, Claudio Gentili, Gilles Bertschy, Danilo de Rossi, Enzo Pasquale Scilingo, Nicola Vanello.
Abstract
Bipolar disorder is one of the most common mood disorders and is characterized by large, disabling mood swings. Several projects focus on developing decision support systems that monitor and advise patients as well as clinicians. Voice monitoring and speech signal analysis can be exploited to reach this goal. In this study, an Android application was designed to analyze running speech on a smartphone. The application records audio samples and estimates the speech fundamental frequency, F0, and its changes. F0-related features are estimated locally on the smartphone, which offers advantages over remote processing in terms of privacy protection and reduced upload costs. The raw features can be sent to a central server for further processing. The quality of the audio recordings, the reliability of the algorithm and the performance of the overall system were evaluated in terms of voiced segment detection and feature estimation. The results demonstrate that the mean F0 of each voiced segment can be reliably estimated, thus describing prosodic features across the speech sample. In contrast, features related to F0 variability within each voiced segment performed poorly. A case study performed on a bipolar patient is presented.
Keywords: bipolar disorders; fundamental frequency; pitch strength; smartphone application; voice monitoring system; voice segmentation
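To make the two feature families named in the abstract concrete (mean F0 per voiced segment, and F0 variability within a segment), the following is a minimal sketch. It assumes a per-frame F0 track and per-frame voicing flags are already available; the function name, the input format and the use of the standard deviation as the variability measure are illustrative assumptions, not the application's actual implementation.

```python
from statistics import mean, pstdev


def segment_f0_features(f0_track, voiced_flags):
    """Group consecutive voiced frames into segments and summarise each one.

    f0_track: per-frame F0 estimates in Hz (hypothetical input format).
    voiced_flags: per-frame booleans, True where the frame is voiced.
    Returns one dict per voiced segment with its frame count, mean F0 and
    within-segment F0 standard deviation (an assumed variability measure).
    """
    if len(f0_track) != len(voiced_flags):
        raise ValueError("f0_track and voiced_flags must have the same length")

    segments, current = [], []
    for f0, voiced in zip(f0_track, voiced_flags):
        if voiced:
            current.append(f0)
        elif current:
            # An unvoiced frame closes the segment collected so far.
            segments.append(current)
            current = []
    if current:
        segments.append(current)

    return [
        {"n_frames": len(seg), "mean_f0": mean(seg), "std_f0": pstdev(seg)}
        for seg in segments
    ]


# Made-up example track: two voiced segments separated by unvoiced frames.
f0 = [0.0, 120.0, 124.0, 122.0, 0.0, 0.0, 200.0, 210.0]
voiced = [False, True, True, True, False, False, True, True]
features = segment_f0_features(f0, voiced)
```

Keeping the per-segment summaries as small records mirrors the idea of computing features locally and uploading only those, rather than the audio itself.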
Year: 2015 PMID: 26561811 PMCID: PMC4701269 DOI: 10.3390/s151128070
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Screenshots of the developed application: main menu (left) and an example of the mood agenda (right).
GTal, microphones comparison: segment lengths (median ± MAD), specificity (Spec) and sensitivity (Sens) estimated from the different concurrent audio samples. Spec and Sens are defined with respect to voiced segments as identified on the high quality microphone (HQmic).
| Subj. | Segment length, HQmic (median ± MAD) | Segment length, compared microphone (median ± MAD) | Spec | Sens |
|---|---|---|---|---|
| GTal: HQmic | | | | |
| 1 | 136 ± 56 | 160 ± 80 | 0.89 | 0.81 |
| 2 | 128 ± 64 | 168 ± 88 | 0.85 | 0.83 |
| GTal: HQmic | | | | |
| 1 | 136 ± 56 | 152 ± 72 | 0.89 | 0.77 |
| 2 | 128 ± 64 | 128 ± 64 | 0.90 | 0.81 |
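As an illustration of how specificity and sensitivity can be defined with respect to a reference segmentation, here is a minimal sketch. It assumes that both recordings have been reduced to time-aligned, per-frame voiced/unvoiced decisions; the frame-level formulation and the function name are assumptions, since the record does not state how the comparison was carried out.

```python
def voicing_spec_sens(reference, candidate):
    """Frame-level specificity and sensitivity of a voicing decision.

    reference: per-frame booleans from the reference recording (True = voiced).
    candidate: per-frame booleans from the recording under test.
    Returns (specificity, sensitivity); a value is NaN when undefined.
    """
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate must have the same length")

    pairs = list(zip(reference, candidate))
    tp = sum(1 for r, c in pairs if r and c)          # voiced in both
    tn = sum(1 for r, c in pairs if not r and not c)  # unvoiced in both
    fp = sum(1 for r, c in pairs if not r and c)      # voiced only in candidate
    fn = sum(1 for r, c in pairs if r and not c)      # voiced only in reference

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return specificity, sensitivity


# Made-up example: 4 voiced and 6 unvoiced reference frames.
reference = [True] * 4 + [False] * 6
candidate = [True, True, True, False, False, False, False, False, False, True]
spec, sens = voicing_spec_sens(reference, candidate)
```

With this definition, a low sensitivity means that frames voiced in the reference are missed by the tested recording, while a high specificity means that few unvoiced reference frames are wrongly marked as voiced.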
GTal, microphones comparison: correlation coefficients between features extracted from the audio acquired with the HQmic and with the compared microphone, computed on overlapping portions of the segments. The corresponding p-values are shown in brackets.
| Comparison | Subj. | Correlation coefficient [p-value] |
|---|---|---|
| HQmic | 1 | 1.00 [< …] |
| | | 0.96 [< …] |
| | | 0.81 [2.3 …] |
| HQmic | 1 | 0.95 [1.90 …] |
| | | 0.90 [1.68 …] |
| | | 0.90 [1.69 …] |
| HQmic | 2 | 1.00 [< …] |
| | | 0.92 [< …] |
| | | 0.76 [1.72 …] |
| HQmic | 2 | 1.00 [< …] |
| | | 0.99 [< …] |
| | | 0.91 [< …] |
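The following sketch shows one way a correlation coefficient between the same feature measured on two recordings could be computed. It assumes a Pearson coefficient over paired per-segment values; the record does not say which coefficient was used, and the p-value calculation is deliberately left out. The function name and the sample values are illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two paired sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Made-up mean-F0 values (Hz) for the same four segments on two devices.
f0_reference = [110.0, 125.0, 140.0, 155.0]
f0_candidate = [112.0, 124.0, 143.0, 153.0]
r = pearson_r(f0_reference, f0_candidate)
```

Pairing values segment by segment only makes sense where the two segmentations overlap, which is why the comparison is restricted to overlapping portions.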
Figure 2. Specificity and sensitivity trends of voiced segmentation in Subject 1 (left) and Subject 2 (right) for the comparison GTal vs. ANDal.
HQmic, algorithms comparison: segment lengths (median ± MAD), specificity (Spec) and sensitivity (Sens) as estimated on high quality data using both algorithms.
| Subj. | Segment length, GTal (median ± MAD) | Segment length, ANDal (median ± MAD) | Spec | Sens |
|---|---|---|---|---|
| 1 | 136 ± 56 | 90 ± 60 | 0.90 | 0.68 |
| 2 | 128 ± 64 | 40 ± 30 | 0.95 | 0.62 |
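The segment lengths above are summarised as median ± MAD. A minimal sketch of that summary follows, assuming MAD stands for the median absolute deviation from the median (the record does not expand the abbreviation); the function name and the sample lengths are made up for illustration.

```python
from statistics import median


def median_mad(values):
    """Return (median, median absolute deviation) of a sequence of numbers."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    med = median(values)
    # MAD: median of the absolute deviations from the median.
    mad = median([abs(v - med) for v in values])
    return med, mad


# Made-up segment lengths; units are left unspecified, as in the tables.
segment_lengths = [80, 120, 136, 150, 200]
med, mad = median_mad(segment_lengths)
```

Both statistics are robust to a few very long or very short segments, which makes them a reasonable choice for skewed duration data.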
HQmic, algorithms comparison: correlation coefficients between the feature sets extracted from HQmic audio using ANDal and GTal. The corresponding p-values are shown in brackets.
| Comparison | Subj. | Correlation coefficient [p-value] |
|---|---|---|
| GTal vs. ANDal | 1 | 0.98 [< …] |
| | | 0.92 [< …] |
| | | 0.86 [1.52 …] |
| GTal vs. ANDal | 2 | 0.93 [< …] |
| | | 0.81 [< …] |
| | | 0.71 [3.28 …] |
Figure 3. Specificity and sensitivity trends of voiced segmentation in Subject 1 (left) and Subject 2 (right) for the comparison HQmic + GTal vs. + ANDal.
Overall evaluation of the system: segment lengths (median ± MAD), specificity (Spec) and sensitivity (Sens) estimated using the two different systems.
| Subj. | Segment length, HQmic + GTal (median ± MAD) | Segment length, compared system (median ± MAD) | Spec | Sens |
|---|---|---|---|---|
| HQmic + GTal | | | | |
| 1 | 136 ± 56 | 85 ± 35 | 0.96 | 0.56 |
| 2 | 128 ± 64 | 40 ± 25 | 0.93 | 0.46 |
| HQmic + GTal | | | | |
| 1 | 136 ± 54 | 70 ± 30 | 0.96 | 0.47 |
| 2 | 128 ± 64 | 40 ± 20 | 0.97 | 0.40 |
Overall evaluation of the system. Correlation coefficients between features extracted from audio acquired with the two different systems: HQmic + GTal vs. + ANDal and HQmic + GTal vs. + ANDal. The corresponding p-values are shown in brackets.
| Comparison | Subj. | Correlation coefficient [p-value] |
|---|---|---|
| HQmic + GTal | 1 | 1.00 [< …] |
| | | 0.92 [8.25 …] |
| | | 0.84 [1.63 …] |
| HQmic + GTal | 1 | 0.99 [< …] |
| | | 0.94 [< …] |
| | | 0.78 [6.48 …] |
| HQmic + GTal | 2 | 0.89 [< …] |
| | | 0.77 [< …] |
| | | 0.79 [< …] |
| HQmic + GTal | 2 | 0.91 [< …] |
| | | 0.78 [< …] |
| | | 0.74 [< …] |
Comparison between different smartphone models. Correlation coefficients between features extracted from audio acquired with the two smartphone models. The corresponding p-values were always smaller than .
| Subj. | Algorithm | Correlation coefficient |
|---|---|---|
| 1 | ANDal | 1.00 |
| | | 0.97 |
| | | 0.90 |
| 2 | ANDal | 1.00 |
| | | 0.82 |
| | | 0.90 |
Figure 4. Trend observed in the reported patient (p-value = 0.0392).