Structured Sparse Spectral Transforms and Structural Measures for Voice Conversion.

Yunxin Zhao¹, Mili Kuruvilla-Dugdale², Minguang Song¹.

Abstract

We investigate a structured sparse spectral transform method for voice conversion (VC) to perform frequency warping and spectral shaping simultaneously on high-dimensional (D) STRAIGHT spectra. Learning a large transform matrix for high-D data often results in an overfit matrix with low sparsity, which leads to muffled speech in VC. We address this problem by using the frequency-warping characteristic of a source-target speaker pair to define a region of support (ROS) in a transform matrix, and further optimize the matrix by nonnegative matrix factorization (NMF) to obtain a structured sparse transform. We also investigate structural measures of spectral and temporal covariance and variance at different scales for assessing VC speech quality. Our experiments on the ARCTIC dataset of 12 speaker pairs show that embedding the ROS in spectral transforms offers flexibility in tradeoffs between spectral distortion and structure preservation, and the structural measures provide quantitatively reasonable results on converted speech. Our subjective listening tests show that the proposed VC method achieves a mean opinion score of "very good" relative to natural speech, and in comparison with three other VC methods, it is the most preferred one in naturalness and in voice similarity to target speakers.
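
The abstract does not spell out how the ROS is built or how the NMF step is formulated, so the following is only a rough sketch of the general idea under stated assumptions. It is not the authors' implementation. It assumes the ROS is a fixed-width band around a given frequency-warping path, and it fits a nonnegative transform matrix with standard multiplicative updates for a squared-error objective. The names `band_ros`, `masked_nonneg_transform`, `warp`, and `half_width` are hypothetical and chosen only for illustration.

```python
import numpy as np

def band_ros(dim, warp, half_width):
    """Hypothetical ROS: entry (i, j) is allowed when source bin j lies
    within half_width bins of warp[i], the source bin that the warping
    path associates with target bin i."""
    cols = np.arange(dim)
    return np.abs(cols[None, :] - np.asarray(warp)[:, None]) <= half_width

def masked_nonneg_transform(X, Y, mask, n_iter=200, eps=1e-12, seed=0):
    """Fit a nonnegative W, zero outside mask, so that W @ X approximates Y.

    X, Y: nonnegative spectra, shape (dim, n_frames), time-aligned.
    Uses multiplicative updates for the squared Frobenius error; entries
    initialised to zero stay zero, so the ROS structure is preserved.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((Y.shape[0], X.shape[0])) * mask
    YXt = Y @ X.T  # fixed numerator term
    XXt = X @ X.T  # fixed Gram matrix of the source frames
    for _ in range(n_iter):
        W *= YXt / (W @ XXt + eps)
    return W

# Toy usage with random nonnegative "spectra" (assumed data, not ARCTIC).
rng = np.random.default_rng(1)
dim = 16
warp = np.clip(np.round(0.9 * np.arange(dim)), 0, dim - 1).astype(int)
mask = band_ros(dim, warp, half_width=2)
X = rng.random((dim, 200))
Y = (rng.random((dim, dim)) * mask) @ X
W = masked_nonneg_transform(X, Y, mask)
```

Because the updates are multiplicative, the mask acts as a hard structural constraint: widening `half_width` gives the transform more freedom for spectral shaping, while narrowing it keeps the transform closer to pure frequency warping.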

Keywords: NMF; Voice conversion; frequency warping; objective measures; structured sparse spectral transform

Year: 2018 PMID: 31984214 PMCID: PMC6980218 DOI: 10.1109/TASLP.2018.2860682

Source DB: PubMed Journal: IEEE/ACM Trans Audio Speech Lang Process

Cited

1 in total

1. Voice Conversion for Persons with Amyotrophic Lateral Sclerosis.

Authors: Yunxin Zhao; Mili Kuruvilla-Dugdale; Minguang Song
Journal: IEEE J Biomed Health Inform Date: 2019-12-25 Impact factor: 5.772
