Deep multiple instance learning for foreground speech localization in ambient audio from wearable devices.

Rajat Hebbar, Pavlos Papadopoulos, Ramon Reyes, Alexander F Danvers, Angelina J Polsinelli, Suzanne A Moseley, David A Sbarra, Matthias R Mehl, Shrikanth Narayanan.

Abstract

In recent years, machine learning techniques have produced state-of-the-art results in several audio-related tasks. The success of these approaches is largely due to access to large open-source datasets and increased computational resources. However, these methods often fail to generalize to real-life tasks because of domain mismatch. One such task is foreground speech detection from wearable audio devices. Interfering factors such as dynamically varying environmental conditions, including background speakers and TV or radio audio, make foreground speech detection challenging. Moreover, obtaining precise moment-to-moment annotations of audio streams for analysis and model training is time-consuming and costly. In this work, we use multiple instance learning (MIL) to facilitate development of such models using annotations available at a lower time resolution (coarse labels). We show how MIL can be applied to localize foreground speech in coarsely labeled audio and report both bag-level and instance-level results. We also study different pooling methods and how they can be adapted to densely distributed events, as observed in our application. Finally, we show improvements from using speech activity detection embeddings as features for foreground detection.
© The Author(s) 2021.
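
The relationship between instance-level and bag-level predictions described in the abstract can be illustrated with a minimal sketch. It assumes that a model has already produced one foreground-speech probability per short audio segment (an instance) within a coarsely labeled clip (a bag). The abstract does not say which pooling functions the authors studied; the three below (max, mean, and linear softmax) are common choices in the MIL literature and are used here only as assumptions. All function names and the example probabilities are hypothetical.

```python
from typing import Callable, Dict, List


def max_pool(probs: List[float]) -> float:
    """Bag score is the single most confident instance."""
    return max(probs)


def mean_pool(probs: List[float]) -> float:
    """Bag score is the average over all instances."""
    return sum(probs) / len(probs)


def linear_softmax_pool(probs: List[float]) -> float:
    """Self-weighted average: each probability is weighted by itself."""
    total = sum(probs)
    if total == 0.0:
        return 0.0
    return sum(p * p for p in probs) / total


# Hypothetical registry of pooling functions, keyed by name.
POOLERS: Dict[str, Callable[[List[float]], float]] = {
    "max": max_pool,
    "mean": mean_pool,
    "linear_softmax": linear_softmax_pool,
}


def bag_prediction(instance_probs: List[float], pooling: str = "max") -> float:
    """Aggregate instance probabilities into one bag-level probability."""
    if not instance_probs:
        raise ValueError("a bag must contain at least one instance")
    return POOLERS[pooling](instance_probs)


def localize(instance_probs: List[float], threshold: float = 0.5) -> List[int]:
    """Return indices of instances predicted as foreground speech."""
    return [i for i, p in enumerate(instance_probs) if p >= threshold]


# Illustrative (made-up) instance probabilities for two clips.
sparse_bag = [0.1, 0.1, 0.9, 0.1]  # one short foreground event
dense_bag = [0.8, 0.9, 0.7, 0.9]   # foreground speech throughout

if __name__ == "__main__":
    for name in POOLERS:
        print(name, bag_prediction(sparse_bag, name), bag_prediction(dense_bag, name))
    print("foreground instances:", localize(sparse_bag), localize(dense_bag))
```

In this sketch, max pooling flags a bag as positive on the strength of a single instance, while mean pooling responds to how much of the bag is positive. That difference is one plausible reason pooling choice matters when positive events are densely distributed within a bag.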

Keywords:  Foreground speech detection; Multiple instance learning; Weakly labeled audio; Wearable audio

Year:  2021        PMID: 33584835      PMCID: PMC7858549          DOI: 10.1186/s13636-020-00194-0

Source DB:  PubMed          Journal:  EURASIP J Audio Speech Music Process        ISSN: 1687-4714


