Junzhe Zhu, Mark Hasegawa-Johnson, Nancy McElwain.
Abstract
We design a framework for studying prelinguistic child voice from 3 to 24 months based on state-of-the-art diarization algorithms. Our system consists of a time-invariant feature extractor, a context-dependent embedding generator, and a classifier. We study the effect of swapping out different components of the system, as well as changing the loss function, to find the best-performing configuration. We also present a multiple-instance learning technique that allows us to pre-train our parameters on larger datasets with coarser segment boundary labels. Our best system achieved a diarization error rate (DER) of 43.8% on the test dataset, compared to 55.4% DER achieved by the LENA software. We also found that using a convolutional feature extractor instead of log-mel features significantly improves the performance of neural diarization.
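For readers unfamiliar with the metric, the sketch below shows one simplified way to compute a frame-level diarization error rate: missed speech, false alarms, and speaker confusion are summed and divided by the number of reference speech frames. This is an illustration only, not the authors' scoring procedure. It assumes one label per frame, no forgiveness collar, and hypothesis labels already mapped to reference labels; the names `frame_der`, `SILENCE`, and the labels `chi` and `fem` are hypothetical.

```python
from typing import Sequence

SILENCE = "sil"  # hypothetical label for non-speech frames


def frame_der(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Simplified frame-level DER: (missed + false alarm + confusion) / speech."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same length")

    missed = false_alarm = confusion = speech = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref != SILENCE:
            speech += 1              # frame counts toward reference speech time
            if hyp == SILENCE:
                missed += 1          # speech labeled as silence
            elif hyp != ref:
                confusion += 1       # speech attributed to the wrong speaker
        elif hyp != SILENCE:
            false_alarm += 1         # silence labeled as speech

    if speech == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / speech


# Toy example with hypothetical labels: "chi" (child), "fem" (adult female).
reference = ["chi", "chi", "sil", "fem", "fem", "sil", "chi", "chi"]
hypothesis = ["chi", "sil", "sil", "fem", "chi", "chi", "chi", "chi"]
toy_der = frame_der(reference, hypothesis)

# Relative DER reduction implied by the two figures reported in the abstract.
relative_reduction = (55.4 - 43.8) / 55.4
```

In the toy example, one missed frame, one confused frame, and one false-alarm frame over six reference speech frames give a DER of 0.5. The reported figures, 43.8% against 55.4%, correspond to an absolute difference of 11.6 points, or a relative reduction of roughly 21%.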
Keywords: Child Speech; Language Development; Multiple Instance Learning; Speaker Diarization; Transfer Learning; Voice Activity Detection
Year: 2021 PMID: 35291257 PMCID: PMC8919348 DOI: 10.1109/icassp39728.2021.9413538
Source DB: PubMed Journal: Proc IEEE Int Conf Acoust Speech Signal Process ISSN: 1520-6149