Yossi Adi, Joseph Keshet, Matthew Goldrick.
Abstract
Vowel durations are most often utilized in studies addressing specific issues in phonetics. Thus far, such work has been hampered by a reliance on subjective, labor-intensive manual annotation. Our goal is to build an algorithm for accurate automatic measurement of vowel duration, where the input to the algorithm is a speech segment containing one vowel preceded and followed by consonants (CVC). Our algorithm is based on a deep neural network trained at the frame level on manually annotated data from a phonetic study. Specifically, we try two deep-network architectures, a convolutional neural network (CNN) and a deep belief network (DBN), and compare their accuracy to that of an HMM-based forced aligner. Results suggest that the CNN is better than the DBN, and that the CNN and the HMM-based forced aligner are comparable in their results, but neither yielded the same predictions as models fit to manually annotated data.
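The abstract does not say how frame-level network outputs are turned into a single vowel duration, so the following is only a minimal sketch of one plausible post-processing step, not the authors' method. It assumes a hypothetical list of per-frame vowel probabilities and a hypothetical 10 ms frame shift, and picks the single contiguous run of frames with the largest summed log-odds (a maximum-subarray search), which guarantees exactly one vowel interval per CVC segment.

```python
import math
from typing import Sequence, Tuple


def vowel_interval(vowel_probs: Sequence[float],
                   frame_shift_ms: float = 10.0) -> Tuple[int, int, float]:
    """Return (onset_frame, offset_frame_exclusive, duration_ms).

    vowel_probs: hypothetical per-frame probabilities that a frame is a vowel.
    frame_shift_ms: assumed spacing between frames (not stated in the source).
    """
    eps = 1e-6  # keeps the log-odds finite for probabilities of 0 or 1
    best_sum = -math.inf
    best_start, best_end = 0, 0
    cur_sum, cur_start = 0.0, 0

    for i, p in enumerate(vowel_probs):
        p = min(max(p, eps), 1.0 - eps)
        score = math.log(p / (1.0 - p))  # positive when "vowel" is more likely
        if cur_sum <= 0.0:
            # A non-positive running sum cannot help; start a new run here.
            cur_sum, cur_start = score, i
        else:
            cur_sum += score
        if cur_sum > best_sum:
            best_sum = cur_sum
            best_start, best_end = cur_start, i + 1

    duration_ms = (best_end - best_start) * frame_shift_ms
    return best_start, best_end, duration_ms


probs = [0.1, 0.2, 0.9, 0.8, 0.4, 0.9, 0.95, 0.1, 0.05]
print(vowel_interval(probs))  # (2, 7, 50.0)
```

Using summed log-odds rather than a hard 0.5 threshold lets one uncertain frame inside the vowel (the 0.4 above) be absorbed into the interval instead of splitting it in two.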
Keywords: convolution neural networks; deep belief networks; forced alignment; hidden Markov models; vowel duration measurement
Year: 2015 PMID: 29034132 PMCID: PMC5636193 DOI: 10.1109/MLSP.2015.7324331
Source DB: PubMed Journal: IEEE Int Workshop Mach Learn Signal Process