Literature DB >> 33733161

Generative Adversarial Phonology: Modeling Unsupervised Phonetic and Phonological Learning With Neural Networks.

Gašper Beguš

Abstract

Training deep neural networks on well-understood dependencies in speech data can provide new insights into how they learn internal representations. This paper argues that acquisition of speech can be modeled as a dependency between random space and generated speech data in the Generative Adversarial Network architecture and proposes a methodology to uncover the network's internal representations that correspond to phonetic and phonological properties. The Generative Adversarial architecture is uniquely appropriate for modeling phonetic and phonological learning because the network is trained on unannotated raw acoustic data and learning is unsupervised without any language-specific assumptions or pre-assumed levels of abstraction. A Generative Adversarial Network was trained on an allophonic distribution in English, in which voiceless stops surface as aspirated word-initially before stressed vowels, except if preceded by a sibilant [s]. The network successfully learns the allophonic alternation: the network's generated speech signal contains the conditional distribution of aspiration duration. The paper proposes a technique for establishing the network's internal representations that identifies latent variables that correspond to, for example, presence of [s] and its spectral properties. By manipulating these variables, we actively control the presence of [s] and its frication amplitude in the generated outputs. This suggests that the network learns to use latent variables as an approximation of phonetic and phonological representations. Crucially, we observe that the dependencies learned in training extend beyond the training interval, which allows for additional exploration of learning representations. 
The paper also discusses how the network's architecture and innovative outputs resemble and differ from linguistic behavior in language acquisition, speech disorders, and speech errors, and how well-understood dependencies in speech data can help us interpret how neural networks learn their representations.
Copyright © 2020 Beguš.
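The latent-variable manipulation the abstract describes can be sketched in a few lines. Everything below is illustrative, not the paper's actual model: the "generator" is a stand-in linear map rather than a trained GAN, and the latent dimension index, training interval, and output size are assumptions chosen for the sketch. The point is only the technique — fix a latent vector, sweep one dimension (including values outside the training interval) and regenerate, observing how the output changes.

```python
import numpy as np

# Toy stand-in for a trained GAN generator: maps a latent vector z to a
# fake 1-second "waveform" (16,384 samples). A real experiment would use
# the trained network instead of this random linear map.
rng = np.random.default_rng(0)
LATENT_DIM = 100
W = rng.normal(size=(LATENT_DIM, 16384))  # assumed toy generator weights

def generate(z):
    """Map a latent vector to a synthetic waveform."""
    return np.tanh(z @ W / np.sqrt(LATENT_DIM))

def sweep_dimension(z, dim, values):
    """Regenerate outputs while forcing one latent dimension to each value,
    mimicking how a variable tied to a phonetic property (e.g. presence
    of [s]) could be actively controlled."""
    outputs = []
    for v in values:
        z_mod = z.copy()
        z_mod[dim] = v
        outputs.append(generate(z_mod))
    return outputs

z = rng.uniform(-1, 1, size=LATENT_DIM)  # sampled from the training interval
# The value 4.0 extrapolates beyond the training interval, echoing the
# paper's observation that learned dependencies extend past it.
outs = sweep_dimension(z, dim=0, values=[-1.0, 0.0, 1.0, 4.0])
print(len(outs), outs[0].shape)
```

Sweeping a single dimension while holding the rest of the latent vector fixed isolates that variable's effect on the generated signal, which is how correspondences between latent variables and phonetic properties can be probed.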

Keywords:  allophonic distribution; deep neural network interpretability; generative adversarial networks; language acquisition; speech; voice onset time

Year:  2020        PMID: 33733161      PMCID: PMC7861218          DOI: 10.3389/frai.2020.00044

Source DB:  PubMed          Journal:  Front Artif Intell        ISSN: 2624-8212


