David A Moses (1), Nima Mesgarani, Matthew K Leonard, Edward F Chang.
(1) Department of Neurological Surgery, UC San Francisco, CA, USA. Center for Integrative Neuroscience, UC San Francisco, CA, USA. Graduate Program in Bioengineering, UC Berkeley-UC San Francisco, CA, USA.
Abstract
OBJECTIVE: The superior temporal gyrus (STG) and neighboring brain regions play a key role in human language processing. Previous studies have attempted to reconstruct speech information from brain activity in the STG, but few of them incorporate the probabilistic framework and engineering methodology used in modern speech recognition systems. In this work, we describe the initial efforts toward the design of a neural speech recognition (NSR) system that performs continuous phoneme recognition on English stimuli with arbitrary vocabulary sizes using the high gamma band power of local field potentials in the STG and neighboring cortical areas obtained via electrocorticography.

APPROACH: The system implements a Viterbi decoder that incorporates phoneme likelihood estimates from a linear discriminant analysis model and transition probabilities from an n-gram phonemic language model. Grid searches were used in an attempt to determine optimal parameterizations of the feature vectors and Viterbi decoder.

MAIN RESULTS: The performance of the system was significantly improved by using spatiotemporal representations of the neural activity (as opposed to purely spatial representations) and by including language modeling and Viterbi decoding in the NSR system.

SIGNIFICANCE: These results emphasize the importance of modeling the temporal dynamics of neural responses when analyzing their variations with respect to varying stimuli and demonstrate that speech recognition techniques can be successfully leveraged when decoding speech from neural signals. Guided by the results detailed in this work, further development of the NSR system could have applications in the fields of automatic speech recognition and neural prosthetics.
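The decoding step described under APPROACH can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a bigram (n = 2) language model, per-frame phoneme log-likelihoods supplied as input (for example from a linear discriminant analysis classifier, which is not shown), and a hypothetical `lm_weight` parameter that scales the language model scores. The abstract does not state the n-gram order or how the two score sources are balanced, so the function name, argument names, and toy numbers below are all illustrative.

```python
import math


def viterbi_decode(log_lik, log_trans, log_init, lm_weight=1.0):
    """Return the most probable phoneme index sequence for a series of frames.

    log_lik[t][s]   : log-likelihood of frame t under phoneme s (assumed given)
    log_trans[p][s] : bigram log-probability of phoneme s following phoneme p
    log_init[s]     : log-probability of starting in phoneme s
    lm_weight       : hypothetical scaling applied to language model scores
    """
    if not log_lik:
        return []
    n_states = len(log_init)
    # Scores of the best path ending in each phoneme at the first frame.
    score = [lm_weight * log_init[s] + log_lik[0][s] for s in range(n_states)]
    backptr = []
    for t in range(1, len(log_lik)):
        new_score, ptr = [], []
        for s in range(n_states):
            # Best predecessor for phoneme s at frame t.
            best_prev = max(
                range(n_states),
                key=lambda p: score[p] + lm_weight * log_trans[p][s],
            )
            ptr.append(best_prev)
            new_score.append(
                score[best_prev]
                + lm_weight * log_trans[best_prev][s]
                + log_lik[t][s]
            )
        score = new_score
        backptr.append(ptr)
    # Trace back from the best final phoneme.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Toy example with two phoneme classes and three frames (made-up numbers).
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
log_lik = [[math.log(0.9), math.log(0.1)],
           [math.log(0.4), math.log(0.6)],
           [math.log(0.9), math.log(0.1)]]

with_lm = viterbi_decode(log_lik, log_trans, log_init, lm_weight=1.0)
without_lm = viterbi_decode(log_lik, log_trans, log_init, lm_weight=0.0)
```

In this toy case, setting `lm_weight` to zero reduces the decoder to frame-by-frame classification, while a nonzero weight lets the transition probabilities smooth over a weakly supported middle frame. This is the kind of effect the MAIN RESULTS paragraph attributes to adding language modeling and Viterbi decoding.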