Christoph Janott1, Maximilian Schmitt2, Yue Zhang3, Kun Qian4, Vedhas Pandit2, Zixing Zhang3, Clemens Heiser5, Winfried Hohenhorst6, Michael Herzog7, Werner Hemmert8, Björn Schuller9. 1. Institute for Medical Engineering, Technische Universität München, Boltzmannstr. 11, 85748, Garching, Germany. Electronic address: c.janott@gmx.net. 2. Chair of Complex & Intelligent Systems, Universität Passau, Innstr. 43, 94032, Passau, Germany; ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Eichleitnerstr. 30, 86159, Augsburg, Germany. 3. GLAM - Group on Language, Audio & Music, Imperial College London, 180 Queens Gate, Huxley Bldg, London, SW7 2AZ, UK. 4. Machine Intelligence & Signal Processing group, MMK, Technische Universität München, Arcisstr. 21, 80333, Munich, Germany. 5. Department of Otorhinolaryngology/Head and Neck Surgery, Klinikum rechts der Isar, Technische Universität München, Ismaningerstr. 22, 81675, Munich, Germany. 6. Clinic for ENT Medicine, Head and Neck Surgery, Alfried Krupp Krankenhaus, Alfried-Krupp-Str. 21, 45131, Essen, Germany. 7. Clinic for ENT Medicine, Head and Neck Surgery, Carl-Thiem-Klinikum, Thiemstr. 111, 03048, Cottbus, Germany. 8. Institute for Medical Engineering, Technische Universität München, Boltzmannstr. 11, 85748, Garching, Germany. 9. Chair of Complex & Intelligent Systems, Universität Passau, Innstr. 43, 94032, Passau, Germany; ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Eichleitnerstr. 30, 86159, Augsburg, Germany; GLAM - Group on Language, Audio & Music, Imperial College London, 180 Queens Gate, Huxley Bldg, London, SW7 2AZ, UK.
Abstract
OBJECTIVE: Snoring can be excited in different locations within the upper airways during sleep. It was hypothesised that the excitation locations are correlated with distinct acoustic characteristics of the snoring noise. To verify this hypothesis, a database of snore sounds was developed, labelled with the location of sound excitation. METHODS: Video and audio recordings taken during drug-induced sleep endoscopy (DISE) examinations at three medical centres were semi-automatically screened for snore events, which were subsequently classified by ENT experts into four classes based on the VOTE classification. The resulting dataset, containing 828 snore events from 219 subjects, was split into Train, Development, and Test sets. An SVM classifier was trained using low-level descriptors (LLDs) related to energy, spectral features, mel-frequency cepstral coefficients (MFCC), formants, voicing, harmonic-to-noise ratio (HNR), spectral harmonicity, pitch, and microprosodic features. RESULTS: An unweighted average recall (UAR) of 55.8% was achieved using the full set of LLDs including formants. The best-performing subset was the MFCC-related set of LLDs. A strong difference in performance was observed between the permutations of the train, development, and test partitions, which may be caused by the relatively low number of subjects in the smaller classes of the strongly unbalanced dataset. CONCLUSION: A database of snoring sounds is presented, classified according to sound excitation location based on objective criteria and verifiable video material. With this database, it could be demonstrated that machine classifiers can distinguish different excitation locations of snoring sounds in the upper airway based on acoustic parameters.
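The unweighted average recall (UAR) reported above is the mean of the per-class recalls, so every class counts equally regardless of its sample count; this makes it the appropriate metric for the strongly unbalanced four-class VOTE dataset. A minimal sketch of the computation (the labels shown are hypothetical examples, not data from the study):

```python
from collections import defaultdict

def unweighted_average_recall(y_true, y_pred):
    """UAR: mean of per-class recalls. Each class contributes equally,
    regardless of how many samples it has - important for an unbalanced
    dataset, where plain accuracy would be dominated by the majority class."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)

# Hypothetical labels for the four VOTE classes (V, O, T, E):
y_true = ["V", "V", "V", "O", "T", "E"]
y_pred = ["V", "V", "O", "O", "T", "V"]
# Per-class recalls: V = 2/3, O = 1, T = 1, E = 0; UAR = mean of these
print(round(unweighted_average_recall(y_true, y_pred), 4))
```

With a strongly skewed class distribution such as this dataset's, a classifier that always predicts the majority class would score a high plain accuracy but a UAR of only 1/number-of-classes, which is why UAR is the standard metric in this line of work.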
Authors: C Janott; M Schmitt; C Heiser; W Hohenhorst; M Herzog; M Carrasco Llatas; W Hemmert; B Schuller. Journal: HNO. Date: 2019-09. Impact factor: 1.284.