
General Northern English. Exploring Regional Variation in the North of England With Machine Learning.

Patrycja Strycharczuk, Manuel López-Ibáñez, Georgina Brown, Adrian Leemann.

Abstract

In this paper, we present a novel computational approach to the analysis of accent variation. The case study is dialect leveling in the North of England, manifested as a reduction of accent variation across the North and the emergence of General Northern English (GNE), a pan-regional standard accent associated with middle-class speakers. We investigated this instance of dialect leveling using random forest classification, with audio data from a crowd-sourced corpus of 105 urban, mostly highly educated speakers from five northern UK cities: Leeds, Liverpool, Manchester, Newcastle upon Tyne, and Sheffield. We trained random forest models to identify individual northern cities from a sample of other northern accents, based on measurements of the first two formants across full vowel systems. We tested the models using unseen data. We relied on undersampling, bagging (bootstrap aggregation) and leave-one-out cross-validation to address some challenges associated with the data set, such as unbalanced data and relatively small sample size. The accuracy of classification provides us with a measure of relative similarity between different pairs of cities, while calculating conditional feature importance allows us to identify which input features (which vowels and which formants) have the largest influence on the prediction. We find a considerable degree of leveling, especially between Manchester, Leeds and Sheffield, although some differences persist. The features that contribute to these differences most systematically are typically not the ones discussed in previous dialect descriptions. We propose that the most systematic regional features are also not salient, and as such, they serve as sociolinguistic regional indicators. We supplement the random forest results with a more traditional variationist description of by-city vowel systems, and we use both sources of evidence to inform a description of the vowels of General Northern English.
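The combination of undersampling, bagging and leave-one-out cross-validation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: to stay dependency-free it swaps in a simple nearest-centroid base learner where a random forest would use decision trees, and all function names, class sizes and formant values below are hypothetical, synthetic stand-ins rather than data from the paper.

```python
import random
from statistics import mean


def undersample(rows, rng):
    """Balance classes by randomly dropping rows from the larger classes."""
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append((features, label))
    n = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n))
    return balanced


def fit_centroids(rows):
    """Base learner (stand-in for a tree): one mean feature vector per class."""
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append(features)
    return {label: [mean(col) for col in zip(*vecs)]
            for label, vecs in by_label.items()}


def predict_centroid(centroids, features):
    """Return the label whose centroid is closest to the feature vector."""
    def sqdist(center):
        return sum((a - b) ** 2 for a, b in zip(center, features))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def bagged_predict(train, features, rng, n_models=25):
    """Bagging: majority vote over learners fit on balanced bootstrap samples."""
    votes = {}
    for _ in range(n_models):
        balanced = undersample(train, rng)
        boot = [rng.choice(balanced) for _ in balanced]  # sample with replacement
        label = predict_centroid(fit_centroids(boot), features)
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def leave_one_out_accuracy(rows, seed=0):
    """Hold out each row once, train on the rest, and score the prediction."""
    rng = random.Random(seed)
    hits = 0
    for i, (features, label) in enumerate(rows):
        train = rows[:i] + rows[i + 1:]
        hits += bagged_predict(train, features, rng) == label
    return hits / len(rows)


def make_synthetic_rows(rng):
    """Synthetic (F1, F2) values in Hz for one hypothetical vowel; unbalanced."""
    rows = []
    for _ in range(8):   # minority class: the city to be identified
        rows.append(([rng.gauss(500, 30), rng.gauss(1500, 30)], "target"))
    for _ in range(24):  # majority class: all other cities pooled
        rows.append(([rng.gauss(600, 30), rng.gauss(1300, 30)], "other"))
    return rows


rows = make_synthetic_rows(random.Random(42))
accuracy = leave_one_out_accuracy(rows)
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

In this sketch, lower accuracy for a given city would suggest that it is harder to tell apart from the others, which is the sense in which the abstract treats classification accuracy as a measure of similarity. Conditional feature importance is not covered here.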
Copyright © 2020 Strycharczuk, López-Ibáñez, Brown and Leemann.


Keywords:  Northern English; accent features; dialect leveling; feature selection; random forests; vowels

Year:  2020        PMID: 33733165      PMCID: PMC7861339          DOI: 10.3389/frai.2020.00048

Source DB:  PubMed          Journal:  Front Artif Intell        ISSN: 2624-8212


