Jin Cong¹, Haitao Liu²,³.
Abstract
The models of linguistic networks and their analytical tools constitute a potential methodology for investigating the formation of structural patterns in actual language use. Research with this methodology has only just begun, and it may shed light on the emergent nature of linguistic structure. This study attempts to use linguistic networks to investigate the formation of modern Chinese two-character words (structural units based on the chunking of their component characters) in the actual use of modern Chinese, which manifests itself as continuous streams of Chinese characters. Network models were constructed from authentic Chinese language data, with Chinese characters as nodes, their co-occurrence relations as directed links, and co-occurrence frequencies as link weights. Quantitative analysis of these network models shows that a Chinese two-character word can highlight itself as a two-node island, i.e., a cohesive sub-network in which the two component characters co-occur with each other more frequently than they co-occur with other characters. This highlighting mechanism may play a vital role in the formation and acquisition of two-character words in actual language use. It may also shed light on the emergence of other structural phenomena that are based on the chunking of specific linguistic units.
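The highlighting mechanism summarized above can be illustrated with a small Python sketch. It is a minimal illustration under stated assumptions, not the authors' procedure: links are counted only between immediately adjacent characters, the text is assumed to be pre-split into segments at punctuation, and a pair (a, b) is treated as a two-node island when its link a→b is strictly heavier than every other link touching a or b (the reverse link b→a included), loosely following the "line island" notion from network analysis. The function names `build_network` and `two_node_islands` are hypothetical.

```python
from collections import Counter


def build_network(segments):
    """Directed, weighted co-occurrence network (assumption: adjacency only).

    Link (a, b) counts how often character a is immediately followed by
    character b within the same segment.
    """
    weights = Counter()
    for seg in segments:
        for a, b in zip(seg, seg[1:]):
            weights[(a, b)] += 1
    return weights


def two_node_islands(weights):
    """Return ordered pairs (a, b) whose link a->b is strictly heavier than
    every other link, in either direction, that touches a or b."""
    # Index links by node so each candidate only scans its own neighbourhood.
    incident = {}
    for a, b in weights:
        incident.setdefault(a, set()).add((a, b))
        incident.setdefault(b, set()).add((a, b))

    islands = []
    for (a, b), w in weights.items():
        if a == b:
            continue  # a self-loop involves one node, so it is not a two-node island
        others = (incident[a] | incident[b]) - {(a, b)}
        if all(weights[link] < w for link in others):
            islands.append((a, b))
    return islands


# Toy input (hypothetical): three short character streams.
segments = ["语言学习", "学习语言", "语言"]
print(two_node_islands(build_network(segments)))  # [('语', '言'), ('学', '习')]
```

In the toy input, 语→言 (weight 3) and 学→习 (weight 2) each outweigh every neighbouring link (weight 1), so the two words 语言 and 学习 stand out as islands, while the cross-word links 言→学 and 习→语 do not.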
Year: 2021 PMID: 34762693 PMCID: PMC8584675 DOI: 10.1371/journal.pone.0259818
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. A network model of 语言 (yǔyán, ‘language’) with its immediate context in the text of Example 1.
Fig 2. A linguistic network based on the text of Example 1.
Basic information on the two-node islands (TNIs) extracted from the three networks.
| Network | CTCWs | Word-like chunks | Non-word chunks | Total |
|---|---|---|---|---|
| LCMC_A | 269 (97.46%) | 6 (2.17%) | 1 (0.36%) | 276 |
| LCMC_J | 267 (96.74%) | 8 (2.90%) | 1 (0.36%) | 276 |
| LWC | 363 (97.58%) | 9 (2.42%) | 0 (0%) | 372 |
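Each percentage in the table is a category's count divided by the row total. The following minimal check reproduces the figures from the counts listed above; the helper name `shares` is hypothetical.

```python
# Counts copied from the table: (CTCWs, word-like chunks, non-word chunks).
rows = {
    "LCMC_A": (269, 6, 1),
    "LCMC_J": (267, 8, 1),
    "LWC": (363, 9, 0),
}


def shares(counts):
    """Return the row total and each count as a percentage of it (2 decimals)."""
    total = sum(counts)
    return total, [f"{100 * c / total:.2f}%" for c in counts]


for name, counts in rows.items():
    total, pct = shares(counts)
    print(name, total, pct)
```

For LCMC_A this prints a total of 276 with shares 97.46%, 2.17% and 0.36%, matching the table.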
Fig 3. Rank-frequency distributions of CTCWs forming TNIs in the three networks.
Basic information on TNIs extracted from three sub-networks of Network LCMC_J.
| Network | CTCWs | Word-like chunks | Non-word chunks | Total | CTCWs not extracted from Network LCMC_J |
|---|---|---|---|---|---|
| LCMC_J_1 | 17 (89.47%) | 2 (10.53%) | 0 (0%) | 19 | 10 |
| LCMC_J_2 | 37 (92.50%) | 3 (7.50%) | 0 (0%) | 40 | 28 |
| LCMC_J_3 | 65 (94.20%) | 4 (5.80%) | 0 (0%) | 69 | 41 |