María Florencia Assaneo, Marcos A Trevisan, Gabriel B Mindlin.
Abstract
Current models of human vocal production that capture peripheral dynamics in speech require high-dimensional measurements of neural activity, which are mapped into equally complex motor gestures. In this work we present a motor description for vowels as points in a discrete low-dimensional space. We monitor the dynamics of 3 points at the oral cavity using Hall-effect transducers and magnets, describing the resulting signals during normal utterances in terms of active/inactive patterns that allow a robust vowel classification in an abstract binary space. We use simple matrix algebra to link this representation to the anatomy of the vocal tract and to recent reports of highly tuned neuronal activations for vowel production, suggesting a plausible global strategy for vowel codification and motor production.
Year: 2013 PMID: 24244681 PMCID: PMC3828404 DOI: 10.1371/journal.pone.0080373
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1: From transducer signals to binary vowel space.
Upper right panel: sketch of Hall-effect transducers and magnets in the oral cavity. Transducers are marked with squares and magnets with circles. Lips (red): the transducer was attached to the center of the lower lip and the magnet was glued to the dental plastic replica, between the central incisors. Jaw (green): magnet and transducer were glued to the dental replicas, in the space between the canine and the first premolar of the upper and lower teeth, respectively. Tongue (blue): a cylindrical magnet was attached 1.5 cm from the tip of the tongue; the corresponding Hall-effect transducer was glued to the dental plastic replica at the hard palate, 1 cm above the upper teeth (sagittal plane). The transducer wire was glued to the plastic replica and routed away to allow free mouth movements. Lower left: spectrogram of the set of 5 vowels as pronounced by one of the subjects during a recording session (with frequency values for the first 2 formants) and the corresponding transducer signals for the lips, jaw and tongue. A binary code for each vowel is defined by labeling the signal of each articulator as active (1) or inactive (0) according to whether it reaches a predefined threshold (colored areas correspond to active motor coordinates). Lower right panel: resulting vowel cube in the binary space. The vertices of the cube span an abstract space of size 8, where we explicitly locate the 5 vowels used in this work.
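The active/inactive labeling described above can be sketched as a simple thresholding rule that maps the three articulator signals to a 3-bit code. This is an illustrative sketch only: the signal values, threshold values, and function name below are hypothetical, not taken from the paper.

```python
# Hypothetical sketch of the Figure 1 binarization: each articulator's
# signal is labeled active (1) if it reaches its threshold, else inactive (0).
# All numeric values below are illustrative placeholders.

def vowel_code(signals, thresholds):
    """Map transducer signals (lips, jaw, tongue) to a 3-bit vowel code.

    signals    -- dict: articulator name -> peak signal during the utterance
    thresholds -- dict: articulator name -> activation threshold
    Returns a tuple of 0/1 flags in the order (lips, jaw, tongue).
    """
    order = ("lips", "jaw", "tongue")
    return tuple(int(signals[a] >= thresholds[a]) for a in order)

# Example utterance: lips and tongue cross their thresholds, jaw does not.
code = vowel_code({"lips": 0.9, "jaw": 0.2, "tongue": 0.7},
                  {"lips": 0.5, "jaw": 0.5, "tongue": 0.5})
print(code)  # (1, 0, 1)
```

Each utterance thus lands on one of the 2³ = 8 vertices of the vowel cube.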
Figure 2: Decoding performances.
Left: representative transducer signals during the utterance of the 5 Spanish vowels for the 3 participants. Upper right: decoding performance of each transducer across vowels using the training set for participant 1 (gray background). We show the hit rates for the lips (red), jaw (green) and tongue (blue) as a function of the threshold value; all curves show high decoding accuracy over a range of threshold values. Lower right: confusion matrices across subjects for the test set. In (a), thresholds are set at the optimum values for each participant; in (b), a single set of thresholds is used for all subjects.
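The evaluation in Figure 2 can be sketched as two small routines: a hit-rate curve obtained by sweeping a threshold over one articulator's peak values, and a confusion matrix comparing true versus decoded vowels. The data, function names, and labels below are synthetic placeholders, not the paper's measurements.

```python
# Hedged sketch of the Figure 2 evaluation; all data below are synthetic.

def hit_rate(peaks, labels, threshold):
    """Fraction of utterances whose thresholded state matches its 0/1 label."""
    hits = sum(int(p >= threshold) == y for p, y in zip(peaks, labels))
    return hits / len(labels)

def confusion_matrix(true_vowels, decoded_vowels, vowels):
    """Count matrix with rows = true vowel, columns = decoded vowel."""
    idx = {v: i for i, v in enumerate(vowels)}
    m = [[0] * len(vowels) for _ in vowels]
    for t, d in zip(true_vowels, decoded_vowels):
        m[idx[t]][idx[d]] += 1
    return m

# Synthetic peak values and active/inactive labels for one articulator.
peaks = [0.9, 0.1, 0.8, 0.2, 0.7]
labels = [1, 0, 1, 0, 1]
print(hit_rate(peaks, labels, 0.5))  # 1.0

vowels = ["a", "e", "i", "o", "u"]
print(confusion_matrix(["a", "e", "a"], ["a", "i", "a"], vowels))
```

Sweeping `threshold` over a grid and plotting `hit_rate` reproduces the kind of curves shown in the upper right panel; a flat high plateau indicates robustness to the exact threshold choice.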
Table 1: Connecting the vowel formants to anatomical coefficients.
| Vowel | ⟨F1⟩ (kHz) | ⟨F2⟩ (kHz) | q1   | q2 |
|-------|------------|------------|------|----|
| /a/   | 0.80       | 1.26       | 4    | 1  |
| /e/   | 0.57       | 2.06       | −1.5 | 3  |
| /i/   | 0.28       | 2.50       | −5.5 | 2  |
| /o/   | 0.53       | 0.99       | 2    | −2 |
| /u/   | 0.31       | 0.77       | −2   | −3 |
We use the mapping described by Story et al. [6] to find the coefficients (q1, q2) corresponding to the average first two formants F1 and F2 (kHz) of the Spanish vowels pronounced by our participants. This allowed the construction of a simple affine map connecting each Spanish vowel in the discrete motor space to its corresponding vocal tract configuration.
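An affine map of this kind, (q1, q2) = A·x + b with x a vowel's 3-bit motor code, can be fitted by least squares. The (q1, q2) targets below are the table values, but the binary codes assigned to each vowel are hypothetical placeholders (the paper defines its codes graphically in Figure 1), so the fitted A and b here are purely illustrative.

```python
# Hedged sketch of fitting the affine map (q1, q2) = A @ x + b by least
# squares. The binary codes are NOT the paper's; only (q1, q2) come from
# the table of anatomical coefficients.
import numpy as np

codes = {            # hypothetical (lips, jaw, tongue) codes
    "a": (0, 1, 0),
    "e": (0, 1, 1),
    "i": (0, 0, 1),
    "o": (1, 1, 0),
    "u": (1, 0, 0),
}
q = {"a": (4, 1), "e": (-1.5, 3), "i": (-5.5, 2),
     "o": (2, -2), "u": (-2, -3)}      # (q1, q2) from the table

# Augment each code with a constant 1 so the offset b is fitted jointly:
# [x1 x2 x3 1] @ [A^T; b^T] = [q1 q2], solved in the least-squares sense.
X = np.array([list(codes[v]) + [1] for v in "aeiou"], dtype=float)
Q = np.array([q[v] for v in "aeiou"], dtype=float)
M, *_ = np.linalg.lstsq(X, Q, rcond=None)
A, b = M[:3].T, M[3]
print("A =", A)
print("b =", b)
```

With 5 vowels and 6 free parameters per coordinate pair, the system is underdetermined in general; the least-squares solution picks the minimum-norm fit.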