Khan Richard Baykaner, Mark Huckvale, Iya Whiteley, Svetlana Andreeva, Oleg Ryumin.
Abstract
Automatic systems for estimating operator fatigue have application in safety-critical environments. A system which could estimate level of fatigue from speech would have application in domains where operators engage in regular verbal communication as part of their duties. Previous studies on the prediction of fatigue from speech have been limited because of their reliance on subjective ratings and because they lack comparison to other methods for assessing fatigue. In this paper, we present an analysis of voice recordings and psychophysiological test scores collected from seven aerospace personnel during a training task in which they remained awake for 60 h. We show that voice features and test scores are affected by both the total time spent awake and the time position within each subject's circadian cycle. However, we show that time spent awake and time-of-day information are poor predictors of the test results, while voice features can give good predictions of the psychophysiological test scores and sleep latency. Mean absolute errors of prediction are possible within about 17.5% for sleep latency and 5-12% for test scores. We discuss the implications for the use of voice as a means to monitor the effects of fatigue on cognitive performance in practical applications.
Keywords: bioinformatics; computational paralinguistics; fatigue; speech
Year: 2015 PMID: 26380259 PMCID: PMC4548483 DOI: 10.3389/fbioe.2015.00124
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
Psychophysiological tests selected for analysis.
| S No. | Type | Measures | Description |
|---|---|---|---|
| 1 | Simple RT | Mean reaction time in milliseconds averaged over 57–60 trials | A monitor displays occasional flashes of light. Subjects must respond to every third flash by pressing a button on a handset as quickly as possible |
| 2 | Planned RT | Mean timing error in milliseconds averaged over 50 trials | A monitor displays a colored bar growing in an arc within the border of a circle. After some time, a line appears ahead of the forward end of the bar, and the subject must press a key to stop the bar as close to the line as possible |
| 3 | Memory (pictures) | Count of pictures missed and incorrect selections | A monitor displays 16 pictures in a 4 × 4 grid for 20 s. The pictures disappear and are immediately replaced by 64 pictures containing the original 16. The subject has 60 s to identify the original set of pictures |
| 4 | Memory (numbers) | Count of numbers missed and incorrect selections | A monitor displays 12 numbers from the range 1–100 displayed in a 3 × 4 grid for 20 s. The numbers disappear for 15 s, and then a 5 × 6 grid of numbers appears containing the original 12 numbers. The subject has 60 s to identify the original set of numbers |
| 5 | Cognition | Total time taken to complete task (in seconds) | A monitor displays a 7 × 7 grid containing randomly positioned red and black numbers (for example, 1–25 in red and 1–24 in black). The subject performs three tasks as fast as possible: (i) click the black numbers in ascending order; (ii) click the red numbers in descending order; (iii) alternately click red and black numbers, with the red numbers descending and the black numbers ascending |
Figure 1. Psychophysiological test scores plotted against the clock time in the isolation experiment. The gray lines indicate the subject scores and the blue line is the mean score across subjects.
Mixed-effects linear regression model describing the relationship between psychophysiological test (PPT) scores, sleep latency, and phase.
| Test | Sleep latency | p(Sleep latency) | Phase | p(Phase) |
|---|---|---|---|---|
| Simple RT | 0.653 | 0.107 | −3.10 | 0.774 |
| Planned RT | 0.403 | 0.00347* | 1.22 | 0.734 |
| Memory | 0.0286 | 0.156 | 0.785 | 0.149 |
| Cognition | −0.0256 | 0.939 | 25.8 | 0.00471* |
The * marker indicates significance at .
Cross-validation model training for sleep latency and phase from speech features. Values are average performance over 100 repetitions of 10-fold cross-validation.
| Test | Model | R | MAE | RAE |
|---|---|---|---|---|
| Sleep latency | Null model | 0.00 (0.00) | 993.90 (200.82)/min | 100.00 (0.00)% |
| | MLR | 0.71 (0.18) | 726.53 (190.27)/min | 73.77 (19.85)% |
| | SVR ( | 0.70 (0.19) | 756.36 (219.35)/min | 77.95 (26.84)% |
| | SVR ( | | | |
| Phase | Null model | 0.00 (0.00) | 352.85 (75.21)/min | 100.00 (0.00)% |
| | MLR | 0.43 (0.35) | 379.58 (106.43)/min | 112.86 (46.52)% |
| | SVR ( | | 369.63 (101.17)/min | 109.72 (44.03)% |
| | SVR ( | 0.30 (0.39) | | |
Average metrics are displayed with SDs shown in parentheses. Bold shows best performing systems.
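The three column metrics can be illustrated with a minimal sketch. It assumes the conventional definitions: R as the Pearson correlation between observed and predicted values, MAE as the mean absolute error, and RAE as the model's MAE expressed as a percentage of the MAE of a constant (null) predictor. The function names and example values are hypothetical and are not taken from the paper, and the paper's exact computation (for example, per-fold averaging) may differ.

```python
from math import sqrt
from statistics import mean


def pearson_r(obs, pred):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mo, mp = mean(obs), mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    if var_o == 0 or var_p == 0:
        # Assumed convention: a constant predictor (null model) gets R = 0.
        return 0.0
    return cov / sqrt(var_o * var_p)


def mae(obs, pred):
    """Mean absolute error between observations and predictions."""
    return mean(abs(o - p) for o, p in zip(obs, pred))


def rae_percent(obs, pred, baseline):
    """Model MAE as a percentage of the MAE of a constant baseline prediction."""
    null_pred = [baseline] * len(obs)
    return 100.0 * mae(obs, pred) / mae(obs, null_pred)


# Illustrative values only (not data from the study).
obs = [10, 20, 30, 40]
pred = [12, 18, 33, 37]
baseline = mean(obs)  # null model: always predict the mean
```

With these example values, `mae(obs, pred)` is 2.5 and `rae_percent(obs, pred, baseline)` is 25.0, while the null predictor itself scores an RAE of 100% by construction. An RAE above 100%, as in some Phase rows, means the model's error is larger than that of the constant predictor.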
Figure 2. Scatter plots showing the relationships between speech-based model predictions of phase (top) and sleep latency (bottom).
Performance results for models constructed to predict psychophysiological test scores from time only or from speech features using SVM and MLR approaches. Values are 10-fold cross-validation performance.
| Test | Model | R | MAE | RAE |
|---|---|---|---|---|
| Simple RT | Null model | 0.00 (0.00) | 0.76 (0.21) | 100.00 (0.00)% |
| | Time MLR | 0.14 (0.39) | 0.75 (0.21) | 99.91 (13.21)% |
| | Speech MLR | | | |
| | Speech SVR ( | | | |
| | Speech SVR ( | 0.34 (0.38) | 0.71 (0.21) | 96.10 (23.36)% |
| Planned RT | Null model | 0.00 (0.00) | 0.79 (0.21) | 100.00 (0.00)% |
| | Time MLR | 0.32 (0.35) | 0.75 (0.20) | 96.12 (16.54)% |
| | Speech MLR | | | |
| | Speech SVR ( | | | |
| | Speech SVR ( | 0.51 (0.32) | 0.66 (0.18) | 87.49 (24.52)% |
| Memory | Null model | 0.00 (0.00) | 0.81 (0.19) | 100.00 (0.00)% |
| | Time MLR | 0.08 (0.38) | 0.84 (0.19) | 103.30 (9.17)% |
| | Speech MLR | | | |
| | Speech SVR ( | | | |
| | Speech SVR ( | 0.36 (0.35) | 0.76 (0.18) | 96.31 (25.23)% |
| Cognition | Null model | 0.00 (0.00) | 0.77 (0.22) | 100.00 (0.00)% |
| | Time MLR | 0.16 (0.39) | 0.78 (0.21) | 102.26 (12.45)% |
| | Speech MLR | | | |
| | Speech SVR ( | | | |
| | Speech SVR ( | 0.39 (0.34) | 0.72 (0.21) | 97.12 (27.42)% |
Average metrics are displayed with SDs shown in parentheses. Bold shows best performing systems.
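The evaluation scheme behind these tables, repeated k-fold cross-validation of a regression model against a null model, can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: it uses a single predictor with ordinary least squares in place of the multi-feature MLR and SVR models, it assigns samples to folds at random without regard to subject identity, and the data are synthetic. All function names are hypothetical.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx


def kfold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def repeated_cv_mae(xs, ys, k=10, repeats=100, seed=0):
    """Return (mean MAE of the linear model, mean MAE of the null model) over all folds."""
    rng = random.Random(seed)
    model_mae, null_mae = [], []
    for _ in range(repeats):
        for test in kfold_indices(len(xs), k, rng):
            held_out = set(test)
            train = [i for i in range(len(xs)) if i not in held_out]
            slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
            # Null model: always predict the mean of the training targets.
            train_mean = mean(ys[i] for i in train)
            model_mae.append(mean(abs(ys[i] - (slope * xs[i] + intercept)) for i in test))
            null_mae.append(mean(abs(ys[i] - train_mean) for i in test))
    return mean(model_mae), mean(null_mae)


# Synthetic data (not from the paper): y depends linearly on x plus Gaussian noise.
data_rng = random.Random(1)
xs = [float(i) for i in range(40)]
ys = [2.0 * x + data_rng.gauss(0.0, 3.0) for x in xs]
model_err, null_err = repeated_cv_mae(xs, ys, k=10, repeats=5)
```

Fitting both the model and the null predictor on the training folds only keeps the comparison fair, since neither sees the held-out targets. With repeated measurements from a small number of subjects, folds that respect subject boundaries may be a more conservative choice than the random split assumed here.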
Figure 3. Scatter plots showing the relationship between “time only” model predictions and observations for the psychophysiological tests. The solid line is the line y = x, which shows all possible perfect predictions.
Figure 4. Scatter plots showing the relationship between speech MLR model predictions and observations for the psychophysiological tests. The solid line is the line y = x, which shows all possible perfect predictions.