Text mixing shapes the anatomy of rank-frequency distributions.

Jake Ryland Williams, James P Bagrow, Christopher M Danforth, Peter Sheridan Dodds.

Abstract

Natural languages are full of rules and exceptions. One of the most famous quantitative rules is Zipf's law, which states that the frequency of occurrence of a word is approximately inversely proportional to its rank. Though this "law" of ranks has been found to hold across disparate texts and forms of data, analyses of increasingly large corpora since the late 1990s have revealed the existence of two scaling regimes. These regimes have thus far been explained by a hypothesis suggesting a separability of languages into core and noncore lexica. Here we present and defend an alternative hypothesis that the two scaling regimes result from the act of aggregating texts. We observe that text mixing leads to an effective decay of word introduction, which we show provides accurate predictions of the location and severity of breaks in scaling. Upon examining large corpora from 10 languages in the Project Gutenberg eBooks collection, we find emphatic empirical support for the universality of our claim.
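As an illustration of the rank-frequency relationship described in the abstract, the following is a minimal sketch, not the authors' method. It builds a rank-frequency list from tokens and estimates the log-log slope by ordinary least squares; under Zipf's law this slope is close to -1. The function names (`rank_frequency`, `loglog_slope`) and the synthetic token sample are assumptions made for the example, and a simple least-squares fit is used here only for brevity.

```python
from collections import Counter
import math


def rank_frequency(tokens):
    """Return word frequencies sorted in descending order (rank 1 first)."""
    counts = Counter(tokens)
    return sorted(counts.values(), reverse=True)


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical synthetic sample: word i occurs about 1000 / i times,
# i.e. frequency inversely proportional to rank.
tokens = []
for i in range(1, 51):
    tokens.extend([f"w{i}"] * round(1000 / i))

freqs = rank_frequency(tokens)
slope = loglog_slope(freqs)
print(f"estimated log-log slope: {slope:.3f}")
```

A break in scaling of the kind the abstract discusses would appear as two distinct slopes over different rank ranges, which a single global fit like this one would not capture.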

Year:  2015        PMID: 26066216     DOI: 10.1103/PhysRevE.91.052811

Source DB:  PubMed          Journal:  Phys Rev E Stat Nonlin Soft Matter Phys        ISSN: 1539-3755


  8 in total

1.  Vaporous Marketing: Uncovering Pervasive Electronic Cigarette Advertisements on Twitter.

Authors:  Eric M Clark; Chris A Jones; Jake Ryland Williams; Allison N Kurti; Mitchell Craig Norotsky; Christopher M Danforth; Peter Sheridan Dodds
Journal:  PLoS One       Date:  2016-07-13       Impact factor: 3.240

2.  On the physical origin of linguistic laws and lognormality in speech.

Authors:  Iván G Torre; Bartolo Luque; Lucas Lacasa; Christopher T Kello; Antoni Hernández-Fernández
Journal:  R Soc Open Sci       Date:  2019-08-21       Impact factor: 2.963

3.  A Standardized Project Gutenberg Corpus for Statistical Analysis of Natural Language and Quantitative Linguistics.

Authors:  Martin Gerlach; Francesc Font-Clos
Journal:  Entropy (Basel)       Date:  2020-01-20       Impact factor: 2.524

4.  The Brevity Law as a Scaling Law, and a Possible Origin of Zipf's Law for Word Frequencies.

Authors:  Álvaro Corral; Isabel Serra
Journal:  Entropy (Basel)       Date:  2020-02-17       Impact factor: 2.524

5.  Zipf's laws of meaning in Catalan.

Authors:  Neus Català; Jaume Baixeries; Ramon Ferrer-I-Cancho; Lluís Padró; Antoni Hernández-Fernández
Journal:  PLoS One       Date:  2021-12-16       Impact factor: 3.240

6.  Lognormals, power laws and double power laws in the distribution of frequencies of harmonic codewords from classical music.

Authors:  Marc Serra-Peralta; Joan Serrà; Álvaro Corral
Journal:  Sci Rep       Date:  2022-02-16       Impact factor: 4.379

7.  Zipf's law holds for phrases, not words.

Authors:  Jake Ryland Williams; Paul R Lessard; Suma Desu; Eric M Clark; James P Bagrow; Christopher M Danforth; Peter Sheridan Dodds
Journal:  Sci Rep       Date:  2015-08-11       Impact factor: 4.379

8.  Large-Scale Analysis of Zipf's Law in English Texts.

Authors:  Isabel Moreno-Sánchez; Francesc Font-Clos; Álvaro Corral
Journal:  PLoS One       Date:  2016-01-22       Impact factor: 3.240