OBJECTIVE: Direct synthesis of speech from neural signals could provide a fast and natural means of communication for people with neurological diseases. Invasively measured brain activity (electrocorticography; ECoG) supplies the temporal and spatial resolution needed to decode fast and complex processes such as speech production. A number of impressive advances in speech decoding from neural signals have been achieved in recent years, but the complex dynamics are still not fully understood. In particular, it is unlikely that simple linear models can capture the relation between neural activity and continuous spoken speech.

APPROACH: Here we show that deep neural networks can be used to map ECoG from speech production areas onto an intermediate representation of speech (logMel spectrogram). The proposed method uses a densely connected convolutional neural network topology, which is well suited to the small amount of data available from each participant.

MAIN RESULTS: In a study with six participants, we achieved correlations of up to r = 0.69 between the reconstructed and original logMel spectrograms. We converted our predictions back into an audible waveform by applying a WaveNet vocoder. The vocoder was conditioned on logMel features and drew on a much larger, pre-existing data corpus to provide the most natural acoustic output.

SIGNIFICANCE: To the best of our knowledge, this is the first time that high-quality speech has been reconstructed from neural recordings during speech production using deep neural networks.
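
The correlation reported under MAIN RESULTS can be illustrated with a minimal sketch. The abstract does not state how r was aggregated, so this example assumes that a Pearson correlation is computed for each mel bin across time frames and then averaged over bins. The function names `pearson_r` and `mean_bin_correlation` and the frames-by-bins list layout are hypothetical and chosen only for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def mean_bin_correlation(original, reconstructed):
    """Average per-bin Pearson r between two spectrograms.

    Both arguments are lists of frames; each frame is a list of logMel
    values, one per mel bin (assumed layout: frames x bins).
    """
    if len(original) != len(reconstructed):
        raise ValueError("spectrograms must have the same number of frames")
    n_bins = len(original[0])
    correlations = []
    for b in range(n_bins):
        # Collect the time course of bin b in both spectrograms.
        orig_track = [frame[b] for frame in original]
        recon_track = [frame[b] for frame in reconstructed]
        correlations.append(pearson_r(orig_track, recon_track))
    return sum(correlations) / n_bins
```

Averaging over bins yields a single score per participant. Other aggregation choices, such as correlating the flattened spectrograms, would give different numbers, so this sketch should not be read as the evaluation procedure used in the study.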