Hannah Cornish, Rick Dale, Simon Kirby, Morten H Christiansen.
Abstract
Human language is composed of sequences of reusable elements. The origins of the sequential structure of language are a hotly debated topic in evolutionary linguistics. In this paper, we show that sets of sequences with language-like statistical properties can emerge from a process of cultural evolution under pressure from chunk-based memory constraints. We employ a novel experimental task that is non-linguistic and non-communicative in nature, in which participants are trained on and later asked to recall a set of sequences one by one. Recalled sequences from one participant become training data for the next participant. In this way, we simulate cultural evolution in the laboratory. Our results show a cumulative increase in structure, and by comparing this structure to data from existing linguistic corpora, we demonstrate a close parallel between the sets of sequences that emerge in our experiment and those seen in natural language.
Year: 2017 PMID: 28118370 PMCID: PMC5261806 DOI: 10.1371/journal.pone.0168532
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
The initial string sets for the first participant in each of 8 chains.
| Chain | String set |
|---|---|
| 1 | CMC, SFL, PCS, LFF, FSM, MSMF, CLMP, PPSL, FLCM, SCPC, CSPLL, LFPSS, PFMLM, MLCFP, SPMCF |
| 2 | VSB, SGT, GTV, BVT, TBZ, VBSS, GZTB, STGS, TZBT, ZVTG, BZTSV, VBGSZ, GVVZG, SSGBB, ZGZVZ |
| 3 | SLW, LXS, CWC, WSX, XKK, LSWK, CCCX, KXKL, SXLC, WKXL, KSKCW, SWCLX, WLSCS, LWXSC, XWLKW |
| 4 | JNB, FJQ, QFP, PPN, NJF, JPFQ, QBNF, FQBP, BFFB, NJBN, JPQNP, BQPBB, PFJNQ, NQNBJ, FPJQJ |
| 5 | XLJ, NXQ, LQP, PNN, JPL, QJNX, PQLQ, XPJL, LNQN, NJXJ, JNPXP, LXJQJ, PLXNQ, QQLPN, XLJPX |
| 6 | PCH, NVP, VNC, HPV, TCN, NPTN, TVTP, HCNT, CTHV, PHHC, NHTCT, TVHPH, HVPCV, CPNNC, VCVNP |
| 7 | RLB, VBF, LFR, GGV, BRG, RBGL, LFBV, VLGG, GFLL, FGLB, GBVRF, BLVFF, LVRRB, RVFBR, FVGRV |
| 8 | SRS, ZPR, MRL, RZM, LMZ, RRZR, LPMP, PLRM, ZSMM, SLSP, PZPSS, MLZRL, RPMPZ, SZLLZ, LSMSP |
Fig 1. Increase in learnability and distributional structure across generations of learners.
Global error decreases over time (top-left). Participants become better at reproducing the string sets (top-right). String sets do not diminish in length over time (bottom-left). Structure increases over generations, as indicated by the mean Associative Chunk Strength (ACS) of the string sets (bottom-right). In all cases, the graphs plot means across all eight chains, with error bars reflecting the standard error of the mean.
Fig 2. Generations 0 (left) and 10 (right) of chain 8.
These network diagrams link strings that share at least one bigram sequence. Although the string sets start out containing relatively few edges (links), by the end of the chain the strings have become quite densely connected to one another.
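The linking rule described in this caption (an edge between any two strings that share at least one bigram) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the helper names `bigrams` and `shared_bigram_edges` are hypothetical, and the example data is the last row of the table of initial string sets, assumed here to be chain 8 at generation 0.

```python
from itertools import combinations


def bigrams(s):
    """Return the set of adjacent letter pairs in a string."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def shared_bigram_edges(strings):
    """Return index pairs (i, j) of strings that share at least one bigram."""
    grams = [bigrams(s) for s in strings]
    return [
        (i, j)
        for i, j in combinations(range(len(strings)), 2)
        if grams[i] & grams[j]
    ]


# Assumption: the last table row is the generation-0 string set of chain 8.
initial_set = [
    "SRS", "ZPR", "MRL", "RZM", "LMZ", "RRZR", "LPMP", "PLRM",
    "ZSMM", "SLSP", "PZPSS", "MLZRL", "RPMPZ", "SZLLZ", "LSMSP",
]

edges = shared_bigram_edges(initial_set)
print(f"{len(edges)} edges among {len(initial_set)} strings")
```

Running the same function on a later generation's string set and comparing edge counts would give a rough numerical counterpart to the visual densification shown in the figure.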
Fig 3. From top-left to bottom-right: the emergence of interconnected structure among strings through shared bigrams.
The relationship between string size and shared bigrams in the experimental string sets closely resembles that found in natural language part-of-speech (POS) ordering from CHILDES (bottom-right panel). Blue circles are items from the original data; red dots reflect string-internal shuffled items. Lines are linear fits with corresponding color designations.
Fig 4. Network connectivity analyses of three different types of sequences: word usage frequencies treated as sequences (left panel), digit sequences gleaned from passwords (center panel), and human-generated sequences of random digits (right panel).
Blue circles are items from the original data; red dots reflect sequence-internal shuffled items. Lines are linear fits with corresponding color designations. Only the sequential generation of random digits reveals the same pattern as observed in Fig 3 for late-generation and CHILDES networks.
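The shuffled control mentioned in these captions permutes the elements inside each sequence, which keeps each sequence's length and element counts but destroys its internal order. A minimal sketch of that idea follows; the function names, the number of shuffles, and the use of total edge count as the summary statistic are all assumptions for illustration, since the captions do not specify the exact quantities plotted.

```python
import random
from itertools import combinations


def bigrams(s):
    """Return the set of adjacent element pairs in a sequence string."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def shuffle_within(strings, rng):
    """Permute the elements inside each string, keeping length and counts."""
    shuffled = []
    for s in strings:
        chars = list(s)
        rng.shuffle(chars)
        shuffled.append("".join(chars))
    return shuffled


def edge_count(strings):
    """Count pairs of strings that share at least one bigram."""
    grams = [bigrams(s) for s in strings]
    return sum(1 for a, b in combinations(grams, 2) if a & b)


def shuffled_baseline(strings, n_shuffles=1000, seed=0):
    """Mean edge count over repeated string-internal shuffles (assumed statistic)."""
    rng = random.Random(seed)
    counts = [edge_count(shuffle_within(strings, rng)) for _ in range(n_shuffles)]
    return sum(counts) / len(counts)
```

Comparing `edge_count(original)` with `shuffled_baseline(original)` indicates whether the observed connectivity depends on element order rather than on element frequencies alone.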
Fig 5. Examples of string sets found in the experiment.
The initial string set for chain 3 at generation zero (left panel) lacks structure, with many singletons. Connections are present when strings share bigrams. The right panel shows the same string set from chain 3 after transformation by the participant chain over ten generations. Using an automated Kamada-Kawai force-directed method, strings can now be grouped together based on structural similarities. The width of the edges in the network reflects string-edit distance, a measure of structural similarity. In general, we find that similarity among clusters increases and takes on some apparent systematic structuring.
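String-edit distance can be computed in several ways, and the caption does not say which variant was used. The sketch below assumes standard Levenshtein distance (insertions, deletions, and substitutions each cost 1); the `similarity` normalization is a hypothetical choice for turning a distance into an edge weight.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (assumed variant)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Hypothetical edge weight: 1 for identical strings, 0 for maximally different."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest
```

With this normalization, identical strings receive weight 1.0 and strings with no alignable letters receive weight 0.0, so thicker edges would correspond to more similar strings.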