The vocal tract as a time machine: inferences about past speech and language from the anatomy of the speech organs.

Dan Dediu¹, Scott R Moisik², W A Baetsen³, Abel Marinus Bosman⁴,⁵, Andrea L Waters-Rist⁶.

Abstract

While speech and language do not fossilize, they still leave traces that can be extracted and interpreted. Here, we suggest that the shape of the hard structures of the vocal tract may also allow inferences about the speech of long-gone humans. These build on recent experimental and modelling studies, showing that there is extensive variation between individuals in the precise shape of the vocal tract, and that this variation affects speech and language. In particular, we show that detailed anatomical information concerning two components of the vocal tract (the lower jaw and the hard palate) can be extracted and digitized from the osteological remains of three historical populations from The Netherlands, and can be used to conduct three-dimensional biomechanical simulations of vowel production. We could recover the signatures of inter-individual variation between these vowels, in acoustics and articulation. While 'proof-of-concept', this study suggests that older and less well-preserved remains could be used to draw inferences about historic and prehistoric languages. Moreover, it forces us to clarify the meaning and use of the uniformitarian principle in linguistics, and to consider the wider context of language use, including the anatomy, physiology and cognition of the speakers. This article is part of the theme issue 'Reconstructing prehistoric languages'.

Keywords: language change; osteology; phonetics; vocal tract

Year: 2021 PMID: 33745306 PMCID: PMC8059537 DOI: 10.1098/rstb.2020.0192

Source DB: PubMed Journal: Philos Trans R Soc Lond B Biol Sci ISSN: 0962-8436 Impact factor: 6.237

Introduction

Obviously, the dead cannot speak: we cannot listen to Shakespeare reading A Midsummer Night's Dream, we cannot elicit verb conjugations from Cicero, and we cannot even get Tutankhamun to say ‘aaah’ … The furthest we could get, which is incontestably a tour de force, was to CT-scan the mummified body of an Egyptian scribe and priest, Nesyamun, from about 3000 years ago, print a three-dimensional reconstruction of his vocal tract, and produce a creepy sounding [æ:::] [1]. Beyond that, there are the various attempts to simulate Neanderthal vowels based on debatable reconstructions and assumptions, which have led to endless debates about their capacity (or lack thereof) to articulate the ‘full modern’ vowel space, with a particular focus on [u] [2-6]. While we are far from a general consensus, language and speech go back at least to the origins of modern humans a few hundred thousand years ago [7,8] but, most probably (given recent evidence), at least to the last common ancestor of modern humans and our closest evolutionary relatives, the Neanderthals and the Denisovans, about half a million years ago [9,10]. But our limits concerning the speech and language of long-gone people run deeper than this: there seems to be a widespread assumption in linguistics that, on the one hand, there are precious few traces left by speech and language (writing goes back not more than a few thousand years) and, on the other, living (and attested) languages are very poor at retaining information about their earlier stages. Taken together, these seem to impose a ‘time horizon’ beyond which we cannot really know much [11], a time horizon that is usually placed at most 10 000 years ago, and rooted in the breakdown of the ‘standard’ historical linguistic comparative method of information recovery and inference ([12-15]; see also [16]).
This breakdown results in the reluctance to connect established language families into larger (and, presumably, deeper) constructs such as ‘Nostratic’ [17,18], ‘Eurasiatic’ [19] or ‘Altaic’/’Transeurasian’ [20,21]. While this reluctance is clearly justified [11,17,22] by the daunting methodological and data availability issues, which make it very hard to distinguish biases and a priori stances from actual inferences from the data, it also makes it very hard to study these issues. Nevertheless, there are intriguing hints that language might not be as bad at retaining information after all, especially when combined with external sources of evidence, such as (ancient) genetics [23-25] and archaeology [26], but accessing it requires the development and application of new methods, cross-disciplinary collaborations and, most importantly, the willingness to accept that false positives will inevitably be generated, but that the scientific process will weed (most of) them out. In this context, it is interesting to note that quantitative methods borrowed from evolutionary biology (especially Bayesian phylogenetics) have not only helped refine the internal structure of established language families (such as Indo-European, Austronesian, Uralic or Pama-Nyungan), but also, especially combined with external evidence, suggested ideas about their origins and spread [27-30], pushing the boundaries to 5000–8000 years ago. The same class of methods, however, has been used to explore even ‘deeper’ connections between languages, producing exciting but highly controversial results. 
Some of these results include: the apparent support for something akin to Eurasiatic (about 15 000 years ago) from phylogenetic methods applied to cognacy judgments [31] and to the alignments of actual Automated Similarity Judgment Program (ASJP; [32]) transcriptions [33]; the proposed Bayesian phylogenetic evidence for Transeurasian [20]; the very indirect finding that the stability of structural features of language might conserve continent-wide deep signals of shared ancestry and/or contact, suggesting connections between the language families of the Americas and north-eastern Eurasia going back approximately 15 000 years [34]; and the bold claim that phonological systems might retain a signal of the modern human expansion from Africa some tens of thousands of years ago [35,36]. However, besides it being currently unclear how reliable these findings are [22,37,38], they push our knowledge back by only a sliver of the half million years or so of human speech and language [9,10], and remain, in general, fairly abstract and ‘high level’. Can we do better? Here we suggest that we might be able to infer fairly concrete information about past phonetics and phonologies, going back as far as the fossil record of the Neanderthals and other ‘archaic humans’ (half a million years ago, or even more). To do so, we will use the links between aspects of the vocal tract anatomy and the articulation and acoustics of speech, which are uncovered by recent investigations into the patterns of inter-individual variation in the anatomy of the speech organs.
Besides allowing us to make informed guesses about Neanderthals lacking labiodentals and the persistence of clicks in sub-equatorial Africa, this approach also questions the indiscriminate application of a strong uniformitarian principle to speech and language, arguing instead for a much more nuanced inferential framework that takes into account the wider context of language, which includes, among others, the physical environment and human biology [39,40].

Variation everywhere

Due to space constraints and the specific focus of this paper, we will only briefly summarize here points discussed at length in other publications (e.g. [39-41]). Recent advances in several scientific fields (including medicine, human genetics, anthropology and psychology), the availability of large computer-readable databases, the democratization of computing power, data analysis and statistics, and wider changes in how society at large, and science in particular, sees variation, have allowed a renewed interest in understanding how people vary, how this inter-individual variation is patterned, and how it relates to universal characteristics. What we realize is that, on a massive foundation of sharedness, individuals vary in subtle ways at all levels of study: from the molecular [42,43], to the anatomical and physiological [44], and to the psychological and cognitive [45]. This variation is mostly small and quantitative, and results in a wide range of ‘normality’ that grades into the ‘pathological’—here we are interested in this normal range of variation, which is not distributed at random between individuals, but is intricately patterned. Thus, on the one hand, any given individual belongs to multiple (overlapping or nested) groups, and groups differ in myriad continuous, statistical and multivariate ways (as opposed to ‘crisp’, deterministic, categorical differences driven by one or a few characteristics, as usually claimed by racist ideologies). This patterning is rooted in our complex but relatively recent evolutionary and demographic history [46-48]: while we are much more uniform than other species (for example, there is less genetic diversity in the 8 billion people spread across the world than in a few hundred highly geographically circumscribed chimpanzees; [49]), we do vary, with most of this variation distributed between individuals (approx. 80%), and not between groups either within (approx. 10%) or between continents (approx. 10%) [46,50]. 
Yet, this variation is informative enough (especially when aggregated among many genetic loci and characteristics) to recover individual origins, geographic patterns of human dispersals and migrations, and past demographic events [48,51]. This variation decreases with distance from Africa, is distributed as continuous and overlapping gradients across many variables, with very few (if any) sharp boundaries—often referred to as ‘clinal variation’ [46,47,52]. Clinal variation underscores the unity of humankind, and a proper understanding of its patterns and processes represents one of the most powerful scientific arguments against racism, sexism and other forms of discrimination [48,53-56]. Language in general, and the vocal tract in particular, are far from being exceptions, despite decades of focusing on (absolute) universals and denying, or, at best, dismissing and trivializing variation as mere irrelevant ‘noise’ to be removed from the analysis and ignored from theorizing. Of course, this was not a blanket stance, with aspects of variation being actively studied in, for example, dialectology [57], sociolinguistics [58] and phonetics [59-64], but it did result in the marginalization of such inquiries and limited the appreciation of how widespread variation is, and of how powerful an explanatory factor it may be [39,41]. However, focusing on phonetics and phonology alone, the last two decades have witnessed the accumulation of data concerning the type and patterns of inter-individual variation in the production and perception of speech, as well as a heightened interest in the theoretical implications for phonology, sound change and typological diversity [41,65]. 
Of particular interest here is variation in the anatomy of the vocal tract structures, especially in those structures that (i) have a higher chance of preservation in the osteological and fossil record, or (ii) whose particularities can be inferred from ancient DNA, and their effects on phonetics and phonology. While our understanding of the genetic and developmental underpinnings of the vocal tract is still in its infancy, being primarily based on pathologies with a genetic component (see, for example, the information available in OMIM, https://omim.org/), there are several candidate genes and mechanisms known, especially concerning the teeth [66,67] and the hard palate [68,69], but also the skull and the face in general (see, for example, FaceBase; https://www.facebase.org/). These sources of information are rich enough to even allow some inferences about the evolution of the position of the larynx and the structure of the face from methylation patterns in ancient DNA extracted from archaic human fossils [70]. However, we are far from being able to reliably reconstruct the details of normal variation in vocal tract anatomy from the (epi)genetic variants ascertained from the remains of a given individual, which means that, at least for now, we should focus instead on the osteological and fossil record, which has the advantage that the relevant anatomical details are sometimes preserved well enough, but is affected by the twin disadvantages of very small (and geographically and historically skewed) sample sizes, and the lack of soft tissue preservation (with a few notable exceptions involving special taphonomic conditions such as anaerobic bogs, permafrost or ice; [71,72]). It is also important to recognize the potential influence of environmental factors on the physiology and anatomy of the vocal tract. 
The desiccation of the vocal folds, and its subsequent phonatory consequences, in environments with low air humidity [73,74] is an example of the first type, while the sexually dimorphic and geographically patterned variation in the nasal cavity [75-77], which seems partly explained by the need to warm and humidify the in-breathed air in cold and dry climates [76,78-80], with possible consequences for nasalization, is an example of the second type. Before discussing the type of vocal tract data that can be recovered from this record, and the kind of inferences for speech and language that can be made, we briefly review a few studies linking variation in details of vocal tract anatomy to phonetics and phonology using a multitude of methodologies, including experimental designs with living people, computational modelling and phylogenetic inferences.

From details of the vocal tract to phonetic and phonological diversity

To begin, there are well-known claims that the position of the larynx within the throat can be inferred from traces in the archaeological record, most notably the shape of the base of the cranium and characteristics of the hyoid bone, and that this positioning might allow us to say something about the capacity of ancient humans to produce (or not) the full modern vowel space [2,3,5,6,9]. However, despite claims to the contrary [81,82], it is far from clear how reliable such reconstructions are (even when adding inferences from ancient epigenomes; [70]), but, more importantly, it is not obvious what articulatory, acoustic and linguistic effects different positions of the larynx would have. The original claims [6] that Neanderthals had a much higher larynx than modern humans and that this precluded them from producing the full spectrum of vowels found in the currently spoken languages, did not age well. Newer models of speech articulation seem to suggest that the vowel space produced with a higher larynx is not terribly limited thanks to active compensation by the other articulators [2-4], that the rest position of the larynx is not very relevant given its wide dynamic range [83,84], and that the Neanderthal hyoid bone was, in fact, anatomically and biomechanically extremely similar to the modern human one [85]. Finally, despite the importance of the peripheral vowels, and the fact that actual speech productions are spread across the potential (acoustic and articulatory) vowel space relatively independently of the described phonological system of the language, few modern languages use the full extent of this potential space to convey phonological distinctions. As we will detail below, the hard structures of the vocal tract have by far the highest chances of preservation in the osteological and fossil record, suggesting that we should focus on the hard palate, the jaw and the teeth. 
There is tremendous inter-individual morphological variation in all these structures, and some aspects of this variation are patterned, in a continuous, statistical and multivariate manner, also between groups [39,41]. We will briefly review a few studies, using a variety of methods and data, showing how normal variation in these structures affects speech. There are experimental indications that the midsagittal shape of the hard palate—between ‘domed’ and ‘flat’—influences token-to-token variability, with speakers with ‘flatter’ palates showing less articulatory variability [61,86]. Computer models combining a realistic geometric model of the vocal tract (VocalTractLab 2.1; https://www.vocaltractlab.de/; [87]) controlled by a neural network, which repeatedly learns and transmits vowels across multiple generations [88], show that details of the midsagittal shape of the hard palate, as quantified using Bézier curves [89], do affect the articulation and acoustic properties of vowels. Importantly, the acoustic effects survive the active compensation by the free articulators (the tongue, jaw and lips) and, despite being very weak in any particular generation, are amplified by repeated learning across generations, to the level of the acoustic variation observed within actual languages [88]. Zooming in on a specific substructure of the hard palate, the alveolar ridge (a shelf-like structure behind the upper incisors), biomechanical models using ArtiSynth [90] show that its shape and size affect the effort required for the articulation of click consonants as well as their acoustic properties [91]. Specifically, the alveolar ridge shape that is statistically most often found among the native speakers of ‘click languages’ in southern Africa (i.e. a ‘small’ or ‘absent’ alveolar ridge) seems to reduce the effort needed and may simultaneously enhance the acoustics of clicks. 
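To make the idea of quantifying a midsagittal palate profile with Bézier curves more concrete, the following is a minimal sketch, not the parametrization used in [89]. It evaluates a single cubic Bézier curve for two hypothetical profiles, one ‘domed’ and one ‘flat’, and compares their peak heights. The control points, the arbitrary units and the names `cubic_bezier`, `domed` and `flat` are all assumptions made for illustration.

```python
import numpy as np

def cubic_bezier(p0, p1, p2, p3, n=51):
    """Evaluate a cubic Bezier curve at n evenly spaced parameter values in [0, 1]."""
    t = np.linspace(0.0, 1.0, n)[:, None]
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    # Bernstein form of the cubic Bezier curve
    return ((1 - t) ** 3 * p0
            + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2
            + t ** 3 * p3)

# Hypothetical profiles: x runs front to back, y is palate height (arbitrary units).
front, back = (0.0, 0.0), (1.0, 0.0)
domed = cubic_bezier(front, (0.2, 0.9), (0.8, 0.9), back)  # high inner control points
flat = cubic_bezier(front, (0.2, 0.3), (0.8, 0.3), back)   # low inner control points

# Peak height of each sampled profile, a crude one-number summary of 'doming'.
domed_height = domed[:, 1].max()
flat_height = flat[:, 1].max()
```

In this toy setting only the two inner control points change, so a handful of numbers is enough to move continuously between ‘flat’ and ‘domed’ shapes, which is what makes such a description convenient as input to a simulation.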
There is also experimental evidence from a large, ethnically diverse sample, using structural, sustained articulation and real-time magnetic resonance imaging (MRI), complemented with intra-oral optical three-dimensional scans and acoustic recordings [41], showing that the strategy used to articulate the North American English ‘r’ is influenced by the anatomy of the anterior vocal tract, including the hard palate, the alveolar ridge and the lower jaw. While practically indistinguishable from an acoustic point of view, these different articulations nevertheless influence the articulation and acoustics of other neighbouring sounds [92,93], thus potentially influencing sound change. The last study to be mentioned here [94] combines biomechanical modelling, large-scale cross-linguistic statistical analyses, detailed case studies, and Bayesian phylogenetic analyses of the Indo-European family. It shows that the type of bite (‘overjet’/‘overbite’ versus ‘edge-to-edge’) most common in a population influences the probability that the language(s) spoken by that population will have labiodental sounds (such as ‘f’ and ‘v’) in their sound system. Importantly for us here, the type of bite is strongly influenced by post-developmental factors, especially the diet, with hunter-gatherer populations predominantly showing an ‘edge-to-edge’ bite, while those practicing agriculture predominantly having ‘overjet’/‘overbite’ [94]. This influence is mediated by at least two processes having to do with the mechanical properties of food: tougher foods promote tooth erosion and movement [95-97], but also affect the growth of the lower jaw [98].
Taken together (see [39,41] for more comprehensive reviews and discussions), these studies suggest that (a) there is extensive normal inter-individual and, in some cases, even between-group variation in the hard structures of the vocal tract (e.g. hard palate, teeth, lower jaw); (b) this variation is mostly continuous, statistical and gradient in nature, but it (c) sometimes does influence the articulation and/or the acoustics of speech sounds, producing (d) (usually very) weak effects at the individual level (so-called ‘biases’) that nevertheless can be (e) amplified by the repeated use and transmission of language to result in unmistakable differences between dialects and languages in their phonetics and phonology. While this causal chain is very complex, influenced by many other factors (e.g. linguistic, cultural, historical, demographic, environmental) and composed of individually subtle links, it does seem to work, at least in some cases. This raises the hope that we may be able to infer something about past languages by combining such findings with information about vocal tract structures preserved in the osteological and fossil record, to which we now turn.

Aspects of the vocal tract that can be recovered from the osteological and fossil record

In what concerns the osteological and fossil traces of speech and language, a lot of attention has been given to the hyoid bone, the ear structures housed in the temporal bone, the openings in bones through which nerves can pass (i.e. foramina, canals), and to endocasts (fossilized brain traces). To date, we have fossilized hyoids from australopithecines [99], Homo heidelbergensis and Neanderthals ([85,100,101]; the finding originally identified as a Homo erectus hyoid [102] has been reinterpreted as a fragment of a vertebra [103]), and it seems that, while the Australopithecus hyoid is clearly different from the modern human one, the Neanderthal and pre-Neanderthal ones are very similar to our own [9,104]. The shape of the ear structures can be reconstructed using computed tomography (CT) scans, and ear ossicles are sometimes directly preserved [105-108], allowing inferences about the audition of long-gone humans: from these, it seems that, functionally, the hearing of Neanderthals was very similar to that of modern humans and clearly different from that of chimpanzees [9,10]. Unfortunately, brain endocasts [109,110] and the size of the hypoglossal canal [111,112] seem to currently offer rather limited and unclear evidence concerning speech and language. However, the evidence briefly reviewed above seems to suggest that we might want to also focus on other components of the vocal tract, particularly on those with a bony component, increasing their chances of surviving the taphonomic processes. Evidence suggests that while the hard palate is a rather fragile structure, it is often quite complete when the cranium is not too fragmented, while the lower jaw survives rather well [113,114]. In fact, in order to test the feasibility of extracting information about the hard palate and the lower jaw from osteological remains, in 2015–2016 we conducted an exploratory study of well-understood collections of historic human skeletons (coming from different cemeteries). 
The overarching goals were to (a) digitize the structures of interest in order to (b) compare the historical samples to each other and with a modern sample, so that we can (c) ascertain the feasibility of the methods, (d) draw conclusions about temporal changes in vocal tract anatomy and, possibly, (e) their effects on speech and language. The project was a collaboration between D Dediu and SR Moisik (then at the Max Planck Institute for Psycholinguistics in Nijmegen, The Netherlands) and AL Waters-Rist, WA Baetsen and AM Bosman (then at the Laboratory for Human Osteoarchaeology, Faculty of Archaeology, Leiden University, also in The Netherlands), as part of the larger G[ɜ]bils (Genetic biases in language and speech) project, funded by The Dutch Research Council (NWO). Following a careful assessment, considering data availability, quality, sample size and meta-information, we decided to use one contemporary and three historical samples, all collected in The Netherlands (figure 1): Alkmaar (1484–1574 CE), Klaaskinderkerke (thirteenth to sixteenth centuries CE), Middenbeemster (1829–1866 CE), and part of the contemporary ArtiVarK sample (2014–2015).

Figure 1

The map of The Netherlands showing the locations of the four samples (yellow triangles) as well as the location of Amsterdam (red diamond) for orientation. ArtiVarK is shown as located in Nijmegen. Map generated using QGIS, version 3.10.3-A Coruña. Alkmaar (Paardenmarkt) is the oldest sample; it was part of a monastic cemetery in present-day Alkmaar in Noord Holland, in use between 1484 and 1574 CE. Klaaskinderkerke is a cemetery belonging to a verdronken dorp (lit. ‘drowned village’, or ‘sunken village’) on the island of Schouwen-Duiveland in the province of Zeeland, the remains there dating between the thirteenth and sixteenth centuries CE. Middenbeemster is a Protestant church cemetery in Noord Holland, located on a polder (or reclaimed lake), the Beemster, the first in The Netherlands to be dried by building dykes and pumping out the water using windmills; the remains are firmly dated between 1829 and 1866 CE. For more information about these historical samples, see previous work [115-117]. Finally, the ArtiVarK sample was collected between 2014 and 2015 in Nijmegen, and contains, of relevance here (see [41] for details), structural MRI and intra-oral three-dimensional optical scans of approximately 90 participants; for this study we used a subset of 34 (16 male; 18 female) contemporary Dutch individuals coming from across the whole country. See electronic supplementary material, table S1 for the list of included individuals. For the hard palate (see [115] for details), we included 22 individuals from the Klaaskinderkerke sample (6 male, 11 probable male, and 5 probable female; 2 young adults, 6 young-middle adults, 10 middle and 4 middle-old adults—see below for details about the age categories), and 38 individuals from the Middenbeemster sample (17 male, 2 probable male, 3 probable female and 16 female; 22 young adults and 16 middle adults). 
For the lower jaw (see [116] for details), we included 37 individuals from the Alkmaar sample (6 male, 8 probable male, 5 female and 18 probable female; 20 young adults and 17 middle adults), and 51 individuals from the Middenbeemster sample (18 male, 10 probable male, 11 female and 12 probable female; 31 young adults and 20 middle adults). Note that the estimation of sex and age-at-death from osteological remains is methodologically complex and often difficult to perform due to damage resulting from taphonomy, and its results must be taken as broad probability estimates [118-120]. Therefore, we decided to split our individuals (all adults) into two broad groups: ‘young adults’ (roughly 18–35 years old at death) and ‘middle adults’ (roughly 36–60 years old at death); note that the age ranges for the Klaaskinderkerke sample carry a greater uncertainty than the other collections, because of a general lack of postcranial remains. Likewise, we assigned sex with certainty (male or female) whenever possible, but we also classified some individuals as ‘probably’ of one sex or the other. Note that, for the Klaaskinderkerke sample, a second, independent assignment of sex agrees with our own for all specimens except one (not used in the simulations conducted here). The data acquisition and processing protocol was similar for the two structures: following a general osteological analysis, meta-data registration, and the application of various inclusion and exclusion criteria (good or excellent preservation, completeness, adult age, the absence of severe pathologies and abnormalities, and the absence of periodontal disease and ante-mortem tooth loss), the selected individuals were assessed for sex and their age-at-death was estimated. Next, the relevant structures were digitized using a NextEngine™ 3D Desktop Scanner Ultra HD 2020i (NextEngine, Inc., Santa Monica, CA), which is widely used in physical anthropology, palaeoanthropology and other fields.
This device allows the acquisition of high-quality, high-resolution three-dimensional models of solid objects using multiple laser beams (for details, see [115,116]). This resulted in a set of three-dimensional meshes (post-processed in ScanStudio™ and MeshLab; [121]), one per individual and structure, on which landmarks (fixed, anatomically clearly defined features that are homologous between individuals) and semilandmarks (or ‘sliding landmarks’, i.e. a set of variable or ‘mobile’ points used to discretize a curve) were placed using Landmark Editor [122]: there are 27 landmarks for the lower jaw in the historical samples, 6 in the contemporaneous ArtiVarK sample (a subset of the 27, due to differing methodologies), and 44 for the hard palate (see electronic supplementary material, tables S2–S5 for a full list including the sliding semilandmarks; [115], tables 8 and 9, and figure 31 and [117], tables 2 and 3 for details; and electronic supplementary material, figure S8 for their placement). These allowed us to quantitatively compare the samples and individuals using classic and geometric morphometric [123] methods which allow the principled separation of variation in shape from variation in size. In a nutshell (see [115] for full details), for the shape of the hard palate we found that our samples showed a large overlap: the variation within both samples is greater than that between them. This is to be expected, since the populations were not significantly separated temporally, geographically or linguistically (Middle Dutch versus modern Dutch; see [115,124-127]). The small amount of variation we found is best characterized by subtle differences in the height and shape (‘U’-shaped versus ‘V’-shaped) of the maxillary dental arch, as well as the flexion of the basicranial angle, the relative width of the nasal aperture and the degree of orthognathy. 
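The separation of size from shape mentioned above can be illustrated with a minimal sketch of ordinary Procrustes superimposition on two landmark configurations. This is not the pipeline used in [115-117]: it ignores sliding semilandmarks and compares only two specimens, and the landmark coordinates and function names (`centroid_size`, `to_shape`, `procrustes_distance`) are hypothetical. Size is summarized as centroid size, and shape is what remains after removing position, size and orientation.

```python
import numpy as np

def centroid_size(landmarks):
    """Square root of the summed squared distances of the landmarks to their centroid."""
    centered = landmarks - landmarks.mean(axis=0)
    return float(np.sqrt((centered ** 2).sum()))

def to_shape(landmarks):
    """Remove position and size: centre on the centroid, scale to unit centroid size."""
    centered = landmarks - landmarks.mean(axis=0)
    return centered / centroid_size(landmarks)

def procrustes_distance(a, b):
    """Residual shape difference after optimally rotating b onto a (no reflection)."""
    x, y = to_shape(a), to_shape(b)
    u, _, vt = np.linalg.svd(y.T @ x)
    # Flip the last singular direction if needed so the result is a proper rotation.
    u[:, -1] *= np.sign(np.linalg.det(u @ vt))
    rotation = u @ vt
    return float(np.linalg.norm(y @ rotation - x))

# Hypothetical 3D landmarks (arbitrary units), one row per landmark.
base = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 1, 1]], dtype=float)

# Same shape, but 2.5 times larger, rotated about the z axis and shifted.
angle = 0.7
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
moved = 2.5 * base @ rot_z.T + np.array([10.0, -4.0, 3.0])

# A genuinely different shape: the configuration stretched along z only.
stretched = base * np.array([1.0, 1.0, 2.0])

size_ratio = centroid_size(moved) / centroid_size(base)
shape_gap = procrustes_distance(base, moved)
```

Here `size_ratio` recovers the scaling factor while `shape_gap` is zero up to rounding, whereas `procrustes_distance(base, stretched)` is clearly non-zero: size and shape are reported separately, which is the property the morphometric comparison relies on.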
The only significant difference in average shape was found between the male group from Klaaskinderkerke and the female group from Middenbeemster. Size differences were found to be statistically significant between the two samples, with Klaaskinderkerke being slightly larger than Middenbeemster. The causes of these differences probably stem from developments during the respective periods when these people lived, ranging from climatic (temperatures during the Little Ice Age) to sociocultural (Industrial Revolution). For the lower jaw (see [116,117] for details), we found that differences between the two historic samples (Alkmaar and Middenbeemster) were dominated by size, with the male individuals from the older site (Alkmaar) having the largest mandibles on average, and the female individuals from Middenbeemster having the smallest mandibles. Moreover, the magnitude of sexual dimorphism seems to differ between the two sites, with a lower amount of sexual dimorphism present in the Alkmaar sample. The results are possibly linked to a softening of the diet that occurred between these time periods, although confounding factors such as sampling bias, life history, and shared population history could not be accurately accounted for. There are several methods of palaeodietary reconstruction: while the faunal and floral remains at an archaeological site indicate the available foods, the analysis of various aspects of human bones and teeth indicates what people actually ate. The stable isotope ratios in bones and teeth paint a broad picture of the types of plants (C3 versus C4 photosynthetic pathway) and animals (herbivores versus carnivores, marine versus terrestrial) that were consumed [128], while dental calculus often contains masticated food debris, plant microfossils, protein biomolecules, and plant and animal DNA [129,130].
Such research has been successfully applied to both recent and ancient remains, including Neanderthals and early modern humans [131,132], and may provide information about dietary variables that affect vocal tract anatomy. The macro- and microscopic analysis of dental wear can be useful for inferring how ‘hard’ or ‘soft’ the diet was [133,134]. Several of these methods have been applied to the three historical samples used here [135-137], but more research and better data integration are needed before we can study whether (and how) dietary differences might have affected vocal tract anatomy. With the currently available data, it seems there were no major differences in dietary ‘softness’ among the archaeological populations or between them and the modern ArtiVarK sample, suggesting that all groups had diets requiring broadly similar masticatory forces (but we cannot rule out that this is an artifact of the poor landmark coverage in the modern sample), consistent with the historical peasant staples of wheat or rye bread, dairy products (cheese, butter and milk) and root vegetables, with smaller amounts of fish and even smaller amounts of meat. While still preliminary, these findings do show that (i) we can recover and quantitatively analyse data pertaining to the vocal tract from relatively old human remains, and (ii) there are quantifiable differences between individuals, locations and historical periods that may be relevant for speech.

Getting the past to ‘speak’: simulating the articulation of vowels using medieval hard palates

Here we push this research programme an inch further, by making a subset of medieval hard palate samples ‘speak’. More precisely (figure 2; full details about the methods and results are available in the electronic supplementary materials ‘Modelling the biomechanics and acoustics of reconstructed vocal tracts: methods and full results', figures S1–S7 and Videos), we selected four individuals (two males, one probable male, one probable female; one young-middle adult, two middle adults, one middle-old adult) for whom very complete skeletal geometry is available, and we used a biomechanical model of the vocal tract (ArtiSynth; [90]) to reconstruct, as accurately as possible, the way these individuals would have articulated the six vowels [i] (as in the North American English ‘heat’), [e] (as in ‘hate’), [æ] (as in ‘hat’), [ɑ] (as in ‘hot’), [o] (as in ‘hotel’) and [u] (as in ‘hoot’; figure 2; electronic supplementary material, figure S2 and Videos). We found differences in the acoustics of the vowels between these individuals (electronic supplementary material, figures S4–S7), some of them rather dramatic (in one case, the vowel [e] of one individual is acoustically close to the [i] of the others). While our models are rather simplistic and, crucially, do not implement articulatory compensation or acoustic-auditory-based targeting, the differences we found are real in the sense that they would exist in the speech output of these individuals if they could maintain exactly the same articulatory structure and posture (given their individual hard palate and dentition); they are components of our organic voice quality [138,139]. Thus, the individuals must overcome these potential acoustic differences in their speech, through articulatory compensation, in order to achieve the intended auditory vowel targets reasonably well, or risk being misunderstood.
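
To make concrete what it means for one individual's [e] to be acoustically close to the [i] of the others, the sketch below compares vowel tokens by their first two formants. It is only a minimal illustration: the formant values are hypothetical and are not the simulated values from the supplementary materials, the names `reference`, `individual_e`, `formant_distance` and `nearest_vowel` are invented here, and plain Euclidean distance in Hz is an assumed, crude stand-in for perceptual similarity.

```python
import math

# Hypothetical (F1, F2) values in Hz, invented for illustration only;
# they are NOT the simulated values reported for the four individuals.
reference = {"i": (280.0, 2250.0), "e": (400.0, 2000.0)}
individual_e = (300.0, 2200.0)  # one speaker's [e], produced with a different palate


def formant_distance(a, b):
    """Euclidean distance between two (F1, F2) points, in Hz (assumed metric)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def nearest_vowel(token, references):
    """Label of the reference vowel closest to the token in F1-F2 space."""
    return min(references, key=lambda v: formant_distance(token, references[v]))


closest = nearest_vowel(individual_e, reference)
print(f"this [e] token lies closest to reference [{closest}]")
```

With these invented numbers, the token intended as [e] falls nearer to the reference [i], which is the kind of overlap a listener would have to resolve and a speaker would have to compensate for.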

Figure 2

Midsagittal views (from the left) of the four reconstructed vocal tracts (identified as A–D; see the electronic supplementary materials for details). The airways are shown in blue, the associated skull samples are in beige and the large black capital letters identify the samples. Sample access granted by the Stichting Cultureel Erfgoed Zeeland. Digitized image dissemination for educational purposes only. Crania models created by author W.A.B.; for access to the actual three-dimensional models of the crania, contact author W.A.B. (Online version in colour.)

Our own previous work [88], using a different type of model of the vocal tract that does allow for articulatory compensation of differences in the midsagittal shape of the hard palate through the use of the free articulators (mainly the tongue, the lips and the lower jaw), shows that even if this compensation is highly effective, it nevertheless fails to completely erase the acoustic ‘signature’ of inter-individual anatomical variation. This ‘attenuated’ acoustic signature is very small but present and, perhaps surprisingly, is sometimes amplified by the repeated use and transmission of language in populations composed of individuals with a similar anatomy [39,88,91]. All in all, we hope to have shown that (i) information about vocal tract structures can be successfully extracted from the remains of long-gone people, and (ii) it can be used in qualitative, quantitative and modelling investigations into (iii) the patterning of inter-individual (and, possibly, inter-group) variation with (iv) consequences for speech and language.

Discussion and conclusion

While the aforementioned data have only ‘scratched the surface’, we do hope to have shown that there is huge potential in such an approach, which uses traces of vocal tract anatomy from past people, combined with results from computer models and experiments in contemporary individuals, to make informed inferences about long-gone languages. What we have tried to show here concretely is that there is a wealth of information about specific ‘hard’ components of the vocal tract (the lower jaw and the hard palate) in the osteological record, that this information can be extracted and quantified in a rigorous manner, and that it can be used not only to compare individuals and groups across time, but also to simulate how these components would have affected the production of vowels. Naturally, these first steps can be extended in time, space and linguistic coverage. Temporally, we have focused here on relatively recent historical populations from well-understood contexts (medieval and post-medieval northwestern Europe) for pragmatic (reliable contextual information, good preservation, access to digitization technology, osteological expertise) and theoretical reasons (controversies about sexing using the lower jaw, changes in food and nutrition, sound changes in the history of Dutch). But our experience suggests that this can be extended back in time for as far as there are well-preserved components of the vocal tract in the fossil record, potentially preceding the emergence of anatomically modern humans a few hundred thousand years ago [140].
Geographically, there is nothing special about northwestern Europe besides the fact that it is historically rather well understood and archaeologically intensively studied, but as more work is conducted in other regions, as more osteological and fossil data become available, either physically (in museums and collections) or digitally (as three-dimensional optical or CT scans), and as the various non-scientific hurdles related to accessing these data diminish, this type of investigation can be used to shed light on regional or larger-scale developments. Finally, we are guilty of producing yet another study of vowel production here (although we do provide the formants F1 to F5 and the spectra up to 5000 Hz), but this can be extended to other speech sounds and vocal tract structures as well. Just to cite a few possibilities, one could investigate the alveolar ridge and its effects on click consonants [91,141], hard palate shape and ‘r’ [41], dentition/bite and labiodentals [94], or larynx position and, indeed, vowels [4]. Putting these extensions together, we could suggest studies that would, for example, look at the osteological and fossil record of sub-Saharan Africa preceding the relatively recent Bantu expansion [142,143], focusing on the alveolar ridge and aiming to understand the time-depth and geographical extent of ‘click languages’. More precisely, while currently there are but a few languages that integrate click consonants in their phonological inventories, mostly in southern and eastern Africa (and some Bantu languages that have borrowed clicks), there are intriguing suggestions that they are but remnants of a once widespread use of phonemic clicks [144-146]. If the alveolar ridge shape and size indeed bias the articulation and acoustics of clicks [91,141], then we might be able to infer whether past populations were biased in such a way as to favour phonological clicks or not.
(Incidentally, we could apply the same inferences to the existing pygmy groups: while their pre-Bantu languages are lost, they may very well have used phonemic clicks as well.) Another example might even transgress the origins of modern humans and try to infer features of Neanderthal speech: if the negative effects of the edge-to-edge bite on labiodentals inferred for recent hunter-gatherers [94] also hold deeper in time, we can infer that modern humans from before 12 000 years ago, archaic humans, Neanderthals (and even H. erectus) probably did not use ‘f’ and ‘v’ that much [147]. Likewise, we might use the shape of the Neanderthal hard palate to infer something about the vowels that their languages might have used, or even how they might have articulated their ‘r’s. But we also need to know more about the type and patterning of variation in vocal tract anatomy, physiology and control in present-day humans, and its effects on speech and language, before we can generate informed hypotheses about the past. The work we presented here, concerning the hard palate [41,88], the alveolar ridge [91], and the bite [94], is (we hope) just the beginning of a vast research programme. Other directions could concern the curvature of the cervical spine and pitch [148,149], or the anatomy of the nasal cavity and the anatomy and physiology of the velum and vowel nasalization. While we do not know how widespread this variation is and how important its effects on speech and language are, the available evidence suggests that this is a worthwhile direction of future research. In the end, the breadth of these questions is only limited by our imagination, the fossil and osteological data, and the kind of relationships between vocal tract anatomy, sound change and linguistic diversity that future work will establish.
Such investigations assume that the processes, forces and mechanisms that we observe today (in particular, the effects of vocal tract anatomy on speech and language) were also at work in the past in pretty much the same way, which is known as the uniformitarian principle. However, it is currently unclear how this principle should be applied in practice, because the farther back in time we go the more different things (gradually) become, until we cross into the long evolutionary history preceding the emergence of language as we know it [9,150]. Even closer to the present, if the link between food, bite and labiodentals [94] holds, then strict uniformitarianism breaks down at the dawn of agriculture, as we cannot expect languages older than 12 000 years to have the same distribution of labiodentals as present-day languages, but the uniformitarian principle would still largely apply, as the same articulatory constraints and affordances worked then as they do today. (For more nuanced discussions and further developments, see [16], as well as [151-153].)
