Cong Zhang1, Kathleen Jepson1, Georg Lohfink2, Amalia Arvaniti1.
Abstract
Face-to-face speech data collection has been next to impossible globally as a result of the COVID-19 restrictions. To address this problem, simultaneous recordings of three repetitions of the cardinal vowels were made using a Zoom H6 Handy Recorder with an external microphone (henceforth, H6) and compared with two alternatives accessible to potential participants at home: the Zoom meeting application (henceforth, Zoom) and two lossless mobile phone applications (Awesome Voice Recorder, and Recorder; henceforth, Phone). F0 was tracked accurately by all of the devices; however, for formant analysis (F1, F2, F3), Phone performed better than Zoom, i.e., more similarly to H6, although the data extraction method (VoiceSauce, Praat) also resulted in differences. In addition, Zoom recordings exhibited unexpected drops in intensity. The results suggest that lossless format phone recordings present a viable option for at least some phonetic studies.
Year: 2021 PMID: 34241427 PMCID: PMC8269758 DOI: 10.1121/10.0005132
Source DB: PubMed Journal: J Acoust Soc Am ISSN: 0001-4966 Impact factor: 1.840
Recording equipment and application information for each participant.
| Participant identification | Mobile phone | Application | Zoom | Recorder | Microphone |
|---|---|---|---|---|---|
| PF1 | Samsung Note10 | AVR | MS Surface Pro 6 | Zoom H6 | Sennheiser HSP2 |
| PF2 | Samsung Galaxy S10e | AVR | Dell Precision 5520 | Zoom H6 | Rode NT3 cardioid mic (on stand) |
| PF3 | Samsung Note10 | AVR | MS Surface Pro 6 | Zoom H6 | Sennheiser HSP2 |
| PF4 | Google Pixel 3a | AVR | Lenovo Thinkpad T495 | Zoom H6 | Sennheiser HSP2 |
| PM1 | Samsung Note9 | AVR | MS Surface Pro 6 | Zoom H6 | Sennheiser HSP2 |
| PM2 | Apple iPhone 5s | AVR | Lenovo Thinkpad T495 | Zoom H6 | Sennheiser HSP2 |
| PM3 | bq AQUARIS E4.5 Ubuntu Edition | Recorder | Lenovo Thinkpad T495 | Zoom H6 | Sennheiser ME64 and K6P |
Awesome Voice Recorder (Newkline, 2020).
Recorder (DawnDIY, 2016) is an application available on Linux phones; see Sec. II D for details.
Results from the final statistical models for F0 (intercept, H6); formula, F0 ∼ device + (1 | speaker) + (1 | vowel).
| Extraction method | Term | Estimate | Standard error (SE) | Degrees of freedom (df) | t | Pr(>\|t\|) |
|---|---|---|---|---|---|---|
| VoiceSauce | (Intercept) | 169.94 | 18.39 | 7.13 | 9.24 | <0.001 |
| | devicePhone | 0.6 | 1.38 | 490.01 | 0.43 | 0.664 |
| | deviceZoom | −0.04 | 1.38 | 490.01 | −0.03 | 0.979 |
| Praat | (Intercept) | 170.35 | 19.23 | 7.14 | 8.86 | <0.001 |
| | devicePhone | 0.18 | 1.13 | 490 | 0.16 | 0.872 |
| | deviceZoom | 1.14 | 1.13 | 490 | 1.01 | 0.315 |
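
Because H6 is the intercept (reference level) of the model, each device coefficient is an offset from the H6 mean. The following minimal Python sketch, which assumes standard treatment coding and uses only the estimates reported in the F0 table, adds each offset to the intercept to obtain the model-implied mean F0 per device. The names `F0_COEFS` and `device_means` are illustrative and do not come from the original analysis.

```python
# Fixed-effect estimates (Hz) copied from the F0 table; H6 is the reference level.
F0_COEFS = {
    "VoiceSauce": {"(Intercept)": 169.94, "devicePhone": 0.6, "deviceZoom": -0.04},
    "Praat": {"(Intercept)": 170.35, "devicePhone": 0.18, "deviceZoom": 1.14},
}


def device_means(coefs):
    """Return the model-implied mean per device, assuming treatment coding."""
    base = coefs["(Intercept)"]
    return {
        "H6": round(base, 2),
        "Phone": round(base + coefs["devicePhone"], 2),
        "Zoom": round(base + coefs["deviceZoom"], 2),
    }


for method, coefs in F0_COEFS.items():
    print(method, device_means(coefs))
```

Under this reading, the implied device means for F0 differ from the H6 mean by roughly 1 Hz or less for both extraction methods.
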
Results from the final statistical models for F3 (intercept, H6); formula, F3 ∼ device + (1 | speaker) + (1 | vowel). Rows in bold indicate significant statistical difference from the H6 baseline (p < 0.05).
| Extraction method | Term | Estimate | SE | df | t | Pr(>\|t\|) |
|---|---|---|---|---|---|---|
| VoiceSauce | (Intercept) | 2894.6 | 82.03 | 14.5 | 35.29 | <0.001 |
| | devicePhone | 14.56 | 26.32 | 489.97 | 0.55 | 0.581 |
| | deviceZoom | −28.51 | 26.32 | 489.97 | −1.08 | 0.279 |
| Praat | (Intercept) | 2870.76 | 102.62 | 14.1 | 27.98 | <0.001 |
| | devicePhone | 46.41 | 34.4 | 489.97 | 1.35 | 0.178 |
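
In tables of this kind, the t statistic is the estimate divided by its standard error. As a rough consistency check, the sketch below recomputes t for the rows present in the F3 table. It assumes the t column is a plain estimate/SE ratio; because the published estimates and SEs are rounded, the recomputed values can differ from the reported ones in the last digit. The names `F3_ROWS`, `t_value`, and `max_discrepancy` are illustrative.

```python
# (estimate, SE, reported t) copied from the rows present in the F3 table.
F3_ROWS = {
    ("VoiceSauce", "(Intercept)"): (2894.6, 82.03, 35.29),
    ("VoiceSauce", "devicePhone"): (14.56, 26.32, 0.55),
    ("VoiceSauce", "deviceZoom"): (-28.51, 26.32, -1.08),
    ("Praat", "(Intercept)"): (2870.76, 102.62, 27.98),
    ("Praat", "devicePhone"): (46.41, 34.4, 1.35),
}


def t_value(estimate, se):
    """t statistic as the ratio of a coefficient to its standard error."""
    return estimate / se


def max_discrepancy(rows):
    """Largest absolute gap between recomputed and reported t values."""
    return max(abs(t_value(est, se) - t) for est, se, t in rows.values())


for (method, term), (est, se, reported) in F3_ROWS.items():
    print(f"{method:10s} {term:12s} t = {t_value(est, se):6.2f} (reported {reported})")
```
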
FIG. 1. Boxplots of the differences in frequency between H6 and Phone and between H6 and Zoom for F0 (a), F1 (b), F2 (c), and F3 (d) for VoiceSauce-extracted data (left) and Praat-extracted data (right). The middle line represents the median, the lower and upper edges of the box represent the first and third quartiles, and the whiskers indicate the range, which is up to 1.5 times the inter-quartile range away from the median.
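
The quantities summarised in such boxplots can be sketched as follows: take the per-token difference between the H6 measurement and the simultaneous measurement from another device, then compute the median and quartiles of those differences. This is a minimal illustration only. The input values are hypothetical (not data from the study), the sign convention (H6 minus the other device) is an assumption, and quartile definitions vary between software packages; Python's "inclusive" method is used here.

```python
from statistics import median, quantiles


def paired_differences(h6_values, other_values):
    """Per-token difference (H6 minus the other device), in Hz."""
    if len(h6_values) != len(other_values):
        raise ValueError("simultaneous recordings must have matching tokens")
    return [h6 - other for h6, other in zip(h6_values, other_values)]


def box_summary(differences):
    """Median and quartiles of the differences (the box of a boxplot)."""
    q1, _, q3 = quantiles(differences, n=4, method="inclusive")
    return {"q1": q1, "median": median(differences), "q3": q3}


# Hypothetical F0 values (Hz) for four simultaneously recorded tokens.
h6_f0 = [210.0, 182.5, 175.0, 198.0]
phone_f0 = [209.0, 183.5, 174.0, 196.0]

print(box_summary(paired_differences(h6_f0, phone_f0)))
```

A box centred near zero with a narrow inter-quartile range would indicate that the device tracks H6 closely for that measure.
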
Results from the final statistical models for F1 (intercept, H6); formula, F1 ∼ device + (1 | speaker) + (1 | vowel). Rows in bold indicate significant statistical difference from the H6 baseline (p < 0.05).
| Extraction method | Term | Estimate | SE | df | t | Pr(>\|t\|) |
|---|---|---|---|---|---|---|
| VoiceSauce | (Intercept) | 518.63 | 60.67 | 10.58 | 8.55 | <0.001 |
| | devicePhone | −7.68 | 11.42 | 489.99 | −0.67 | 0.502 |
| Praat | (Intercept) | 545.88 | 60.1 | 10.94 | 9.08 | <0.001 |
| | devicePhone | −2.92 | 8.12 | 489.99 | −0.36 | 0.719 |
Results from the final statistical models for F2 (intercept, H6); formula, F2 ∼ device + (1 | speaker) + (1 | vowel). Rows in bold indicate significant statistical difference from the H6 baseline (p < 0.05).
| Extraction method | Term | Estimate | SE | df | t | Pr(>\|t\|) |
|---|---|---|---|---|---|---|
| VoiceSauce | (Intercept) | 1470.34 | 209.71 | 8.89 | 7.01 | <0.001 |
| Praat | (Intercept) | 1453.67 | 210.74 | 8.64 | 6.9 | <0.001 |
| | devicePhone | 12.93 | 28.5 | 489.99 | 0.45 | 0.65 |
FIG. 2. One repetition of vowel [o] from PF3's Zoom recording. The spectrogram with the intensity curve at the top and waveform below is shown.
FIG. 3. The vowel spaces by device for each speaker are shown for the VoiceSauce-extracted data (top) and Praat-extracted data (bottom).