Abstract
Lexicostatistics has been applied in linguistics to inform phylogenetic relations among languages. Two important parameters in this approach have not been well studied: the conventional size of the vocabulary list used to collect potentially true cognates, and the minimum number of matching instances required to confirm a recurrent sound correspondence. Here, we derive two statistical principles from stochastic theorems to quantify these parameters. These principles validate the practice of using the Swadesh 100- and 200-word lists to indicate the degree of relatedness between languages, and enable a frequency-based, dynamic threshold for detecting recurrent sound correspondences. Using statistical tests, we further evaluate the generality of the Swadesh 100-word list compared with the Swadesh 200-word list and with other 100-word lists sampled randomly from the Swadesh 200-word list. Together, these results provide mathematical support for applying lexicostatistics in historical and comparative linguistics.
Keywords: Ansari-Bradley test; Bernoulli process; Spearman's rho; Swadesh lists; binomial distribution; cognates
Year: 2016 PMID: 28018261 PMCID: PMC5149542 DOI: 10.3389/fpsyg.2016.01916
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
| 157.866 | 253.456 | 350.498 | 476.568 | 569.158 | 770.815 | |
| 40.670 | 66.526 | 93.788 | 130.833 | 159.288 | 225.260 | |
| 159.122 | 256.709 | 356.750 | 488.202 | 585.829 | 801.713 | |
| 40.753 | 66.748 | 94.230 | 131.694 | 160.567 | 227.826 |
Gray cells indicate the calculated conventional sizes in the eight conditions of our estimation.
Figure 1. Detection of sound correspondences in assembled words from languages A and B. x_i (i = 1 to n) is the i-th concept in the vocabulary list for collecting potential cognates, and n is the size of the vocabulary list. u_i^A is the word form from A that is semantically equivalent to x_i, and u_i^B is the word form from B that is semantically equivalent to x_i. p(y|x = x_i) is the probability that some segments in u_i^A and u_i^B show a correspondence. The detection process can be conceived as a Bernoulli process: whether each exemplar shows a correspondence follows a 0–1 (Bernoulli) distribution, and the number of times a particular correspondence occurs across all exemplars follows a binomial distribution.
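The Bernoulli-process view in this caption can be checked numerically. The sketch below simulates n independent yes/no trials (one per concept) and compares how often a correspondence turns up k times with the binomial probability. The values n = 100 and p = 0.02 and the helper names are illustrative assumptions, not figures from the paper.

```python
import random
from collections import Counter
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def simulate_counts(n, p, n_runs, seed=0):
    """Relative frequency of each hit count over n_runs simulated word lists."""
    rng = random.Random(seed)
    tally = Counter()
    for _run in range(n_runs):
        # One Bernoulli trial per concept: does the correspondence show up?
        hits = sum(1 for _trial in range(n) if rng.random() < p)
        tally[hits] += 1
    return {k: c / n_runs for k, c in tally.items()}


n, p = 100, 0.02  # hypothetical list size and per-concept chance probability
empirical = simulate_counts(n, p, n_runs=5000)
for k in range(5):
    print(k, round(empirical.get(k, 0.0), 3), round(binomial_pmf(k, n, p), 3))
```

The two printed columns should track each other closely, which is what the binomial model predicts for repeated independent trials.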
Figure 2. Conventional sizes of the vocabulary list under fixed error rate ε [0.05 (A) and 0.1 (B)] and significance level α [0.05 (solid lines) and 0.1 (dashed lines)], for total vocabulary sizes N from 500 to 15000. Shaded areas mark the range where N is between 4000 and 5000. Dotted lines mark the range of the conventional size n (rounded up to the nearest integer) calculated using Equation (8).
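Equation (8) is not reproduced in this excerpt, so the following is only a sketch of one common finite-population sample-size form, n = n0 / (1 + n0 / N) with n0 = z² p(1 − p) / ε². The function names, the worst-case choice p = 0.5, and the two-sided reading of α are assumptions. With z rounded to 1.96, ε = 0.1 and N = 4000 this form gives about 93.79, which agrees with the 93.788 entry in the table above, but that agreement should not be read as confirmation of the paper's derivation.

```python
from math import ceil
from statistics import NormalDist


def two_sided_z(alpha):
    """Critical value z such that P(|Z| > z) = alpha for a standard normal Z."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def conventional_size(N, eps, z, p=0.5):
    """Sample size for error rate eps from a finite vocabulary of size N.

    Assumed form: n0 / (1 + n0 / N), with n0 = z^2 * p * (1 - p) / eps^2.
    """
    n0 = z * z * p * (1.0 - p) / (eps * eps)
    return n0 / (1.0 + n0 / N)


for N in (500, 4000, 5000, 15000):
    size = conventional_size(N, eps=0.1, z=two_sided_z(0.05))
    print(N, round(size, 3), ceil(size))
```

Because the correction term n0 / N shrinks as N grows, the size flattens out for large vocabularies, which is the behaviour the caption describes for N between 4000 and 5000.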
Figure 3. Conventional size.
Word-initial consonant correspondences (CCs) between English (left) and Latin (right) (extracted from Ringe, .
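| No. | Concept | CC | No. | Concept | CC | No. | Concept | CC |
|---|---|---|---|---|---|---|---|---|
| 1 | all(pl.) | ∅-∅ | 49 | leaf | l-f | 99 | you(sg.) | y-t |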
| 2 | ashes | ∅-k | 50 | lie | l-y | 100 | yellow | y-f |
| 3 | bark[of tree] | b-k | 51 | liver | l-y | 101 | and | ∅-∅ |
| | | | 52 | long | l-l | 102 | animal | ∅-∅ |
| 4 | belly | b-w | 53 | louse | l-p | 103 | at | ∅-∅ |
| 5 | big | b-m | 54 | man | m-w | 104 | back[nn] | b-t |
| 6 | bird | b-∅ | 55 | many | m-m | 105 | bad | b-m |
| 7 | bite | b-m | 56 | moon | m-l | 106 | because | b-k |
| 8 | black | b-∅ | 57 | mountain | m-m | 107 | blow[vb, wind] | b-f |
| 9 | blood | b-s | 58 | mouth | m-∅ | | | |
| 10 | bone | b-∅ | 59 | name | n-n | 108 | breathe | b-s |
| 11 | breast(s) | b-m | 60 | neck | n-k | 109 | child | c-p |
| 12 | burn[intr] | b-∅ | 61 | new | n-n | 110 | count | k-n |
| 13 | claw | k-∅ | 62 | night | n-n | 111 | cut | k-s |
| 14 | cloud | k-n | 63 | nose | n-n | 112 | day | d-d |
| 15 | cold | k-f | 64 | not | n-n | 113 | dig | d-f |
| 16 | come | k-w | 65 | one | w-∅ | 114 | dirty | d-s |
| 17 | die | d-m | 66 | path | p-s | 115 | dull | d-h |
| 18 | dog | d-k | 67 | rain[nn] | r-p | 116 | dust | d-p |
| 19 | drink | d-b | 68 | red | r-r | 117 | fall | f-k |
| 20 | dry | d-s | 69 | root | r-r | 118 | far | f-p |
| 21 | ear | ∅-∅ | 70 | round | r-r | 119 | father | f-p |
| 22 | earth | ∅-t | 71 | sand | s-h | 120 | few | f-p |
| 23 | eat | ∅-∅ | 72 | say | s-d | 121 | fight | f-p |
| 24 | egg | ∅-∅ | 73 | see | s-w | 122 | five | f-k |
| 25 | eye | ∅-∅ | 74 | seed | s-s | 123 | flow | f-f |
| 26 | fat[nn] | f-∅ | 75 | sit | s-s | 124 | flower | f-f |
| 27 | feather | f-p | 76 | skin | s-k | 125 | fog | f-n |
| 28 | fire | f-∅ | 77 | sleep | s-d | 126 | four | f-k |
| 29 | fish | f-p | 78 | small | s-p | 127 | freeze | f-g |
| 30 | flesh | f-k | 79 | smoke | s-f | 128 | fruit | f-p |
| 31 | fly[vb] | f-w | 80 | stand | s-s | 129 | grass | g-g |
| 32 | foot | f-p | 81 | star | s-s | 130 | guts | g-∅ |
| 33 | full | f-p | 82 | stone | s-l | 131 | he | h-∅ |
| 34 | give | g-d | 83 | sun | s-s | 132 | heavy | h-g |
| 35 | good | g-b | 84 | swim | s-n | 133 | here | h-h |
| 36 | green | g-w | 85 | tail | s-k | 134 | hit | h-f |
| 37 | hair[of head] | h-k | 86 | that(nt.) | | 135 | hold | h-t |
| | | | 87 | this(nt.) | | 136 | hunt[vb] | h-w |
| 38 | hand | h-m | 88 | tongue | t-l | 137 | husband | h-m |
| 39 | head | h-k | 89 | tooth | t-d | 138 | ice | ∅-g |
| 40 | hear | h-∅ | 90 | tree | t-∅ | 139 | if | ∅-s |
| 41 | heart | h-k | 91 | two | t-d | 140 | in | ∅-∅ |
| 42 | horn | h-k | 92 | walk | w-∅ | 141 | knife | n-k |
| 43 | hot | h-k | 93 | water | w-∅ | 142 | lake | l-l |
| 44 | human[nn] | h-h | 94 | we | w-n | 143 | laugh | l-r |
| 45 | I | ∅-∅ | 95 | what | w-k | 144 | left[-hand] | l-s |
| 46 | kill | k-∅ | 96 | white | w-∅ | 145 | mother | m-m |
| 47 | knee | n-g | 97 | who | h-k | 146 | narrow | n-∅ |
| 48 | know | n-s | 98 | woman | w-m | 147 | near | n-p |
| 148 | now | n-n | 165 | sky | s-k | 183 | think | θ-k |
| 149 | old | ∅-w | 166 | smell[tr] | s-∅ | 184 | three | θ-t |
| 150 | other | ∅-∅ | 167 | smooth | s-l | 185 | throw | θ-y |
| 151 | play | p-l | 168 | snake | s-∅ | 186 | tie | t-l |
| 152 | pull | p-t | 169 | snow | s-n | 187 | true | t-w |
| 153 | push | p-t | 170 | some(pl.) | s-∅ | 188 | vomit | v-w |
| 154 | right[-hand] | r-d | 171 | spit | s-s | 189 | wash | w-l |
| 155 | river | r-f | 173 | squeeze | s-p | 191 | wide | w-l |
| 156 | rotten | r-p | 174 | stab | s-f | 192 | wife | w-∅ |
| 157 | rub | r-f | 175 | stick[nn] | s-b | 193 | wind[nn] | w-w |
| 158 | salt | s-s | 176 | straight | s-r | 194 | wing | w-∅ |
| 159 | scratch | s-s | 177 | suck | s-s | 195 | wipe | w-t |
| 160 | sea | s-m | 178 | swell | s-t | 196 | with | w-k |
| 161 | sew | s-s | 179 | there | | 197 | woods | w-s |
| 162 | sharp | s-∅ | 180 | they | | 198 | worm | w-w |
| 163 | short | s-b | 181 | thick | θ-k | 199 | you(pl.) | y-w |
| 164 | sing | s-k | 182 | thin | θ-t | 200 | year | y-∅ |
The first 100 concepts are from the Swadesh 100-word list. “∅” denotes zero consonant.
Potential and recurrent word-initial consonant correspondences (CC) in the assembled words from English (left) and Latin (right) following the Swadesh 100- and 200-word lists (extracted from Ringe, .
| 0.01 | 100 | 62 | 6 | ∅-∅, f-p, h-k, l-y, n-n, r-r |
| 0.01 | 200 | 108 | 7 | ∅-∅, f-p, m-m, l-y, n-n, r-r, s-s |
| 0.05 | 100 | 62 | 10 | ∅-∅, b-m, f-p, h-k, l-y, n-n, r-r, s-s, t-d, y-t |
| 0.05 | 200 | 108 | 15 | ∅-∅, b-m, f-p, h-k, l-y, m-m, n-n, p-t, r-r, s-s, ˘s-b, |
“∅” denotes zero consonant.
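Tallying correspondences and keeping those that recur can be sketched as below. The rows are a small subset copied from the English–Latin table above. The fixed cut-off of 3 is a hypothetical stand-in, since the paper's threshold depends on segment frequencies.

```python
from collections import Counter

# (concept, word-initial correspondence) pairs copied from the English-Latin table.
rows = [
    ("fat", "f-∅"), ("feather", "f-p"), ("fire", "f-∅"), ("fish", "f-p"),
    ("flesh", "f-k"), ("fly", "f-w"), ("foot", "f-p"), ("full", "f-p"),
    ("name", "n-n"), ("neck", "n-k"), ("new", "n-n"), ("night", "n-n"),
    ("nose", "n-n"), ("not", "n-n"),
]

HYPOTHETICAL_THRESHOLD = 3  # illustrative only; not the paper's dynamic threshold

# Count how many concepts exhibit each potential correspondence.
counts = Counter(cc for _concept, cc in rows)

# Keep the correspondences that occur at least the threshold number of times.
recurrent = sorted(cc for cc, c in counts.items() if c >= HYPOTHETICAL_THRESHOLD)

for cc, c in counts.most_common():
    print(cc, c)
print("recurrent:", recurrent)
```

On this subset only f-p and n-n pass the cut-off, and both also appear among the recurrent correspondences listed in the table.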
Figure 4. Proportions of decision 1 in 10000 Ansari-Bradley tests for different sizes of sub-lists of the Swadesh 200-word list. The dotted line indicates the significance level 0.05.
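The repeated-testing procedure behind this figure can be sketched with a permutation version of the Ansari-Bradley dispersion test. The excerpt does not say which per-concept quantities were compared, so the numeric values below are random placeholders, and comparing a sub-list with the full list it was drawn from is a simplification. All function names are hypothetical.

```python
import random


def ab_scores(pooled):
    """Ansari-Bradley scores min(rank, N + 1 - rank), averaged over ties."""
    N = len(pooled)
    order = sorted(range(N), key=lambda i: pooled[i])
    scores = [0.0] * N
    i = 0
    while i < N:
        j = i
        while j < N and pooled[order[j]] == pooled[order[i]]:
            j += 1
        avg = sum(min(r, N + 1 - r) for r in range(i + 1, j + 1)) / (j - i)
        for k in range(i, j):
            scores[order[k]] = avg
        i = j
    return scores


def ab_permutation_pvalue(x, y, n_perm=1000, rng=None):
    """Two-sided permutation p-value for a difference in dispersion."""
    if rng is None:
        rng = random.Random(0)
    scores = ab_scores(list(x) + list(y))
    m = len(x)
    observed = sum(scores[:m])
    expected = m * sum(scores) / len(scores)
    extreme = 0
    for _ in range(n_perm):
        # Relabelling the pooled sample amounts to drawing m scores at random.
        permuted = sum(rng.sample(scores, m))
        if abs(permuted - expected) >= abs(observed - expected) - 1e-12:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


def rejection_rate(full, size, n_tests=50, alpha=0.05, seed=1):
    """Share of random sub-lists whose dispersion test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_tests):
        sub = rng.sample(full, size)
        if ab_permutation_pvalue(sub, full, n_perm=200, rng=rng) < alpha:
            rejections += 1
    return rejections / n_tests


placeholder_rng = random.Random(42)
full_list = [placeholder_rng.gauss(0.0, 1.0) for _ in range(200)]  # placeholder data
print(rejection_rate(full_list, size=100))
```

Where SciPy is available, `scipy.stats.ansari` offers the test directly, and the pure-Python version here only keeps the sketch dependency-free.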
Figure 5. Thresholds of recurrent sound correspondences under different combined occurring frequencies of the involved sound segments (0.0001–0.05, in steps of 0.0001). The two regions marked by dotted lines, [0.0002, 0.0015) and [0.0015, 0.0043), denote the ranges of combined occurring frequencies where the threshold can be set to two and three, respectively.
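A frequency-dependent threshold of this kind can be sketched as the smallest count k whose binomial tail probability P(X ≥ k) falls below a significance level. The decision rule, the list size n = 100 and α = 0.01 are assumptions made for illustration, because the caption does not state them. Under these assumptions the sketch returns 2 at a frequency of 0.001 and 3 at 0.003, values that lie inside the two ranges quoted in the caption.

```python
from math import comb


def tail_prob(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k))


def recurrence_threshold(n, p, alpha):
    """Smallest k such that k or more chance matches has probability < alpha."""
    k = 0
    while tail_prob(k, n, p) >= alpha:
        k += 1
    return k


def threshold_ranges(n, alpha, freqs):
    """Group consecutive frequencies that share the same threshold."""
    ranges = []
    for f in freqs:
        k = recurrence_threshold(n, f, alpha)
        if ranges and ranges[-1][0] == k:
            ranges[-1][2] = f  # extend the current range
        else:
            ranges.append([k, f, f])  # start a new range: [threshold, first, last]
    return ranges


freqs = [i / 10000 for i in range(1, 101)]  # 0.0001 to 0.01 in steps of 0.0001
for k, first, last in threshold_ranges(100, 0.01, freqs):
    print(k, first, last)
```

Rarer segment pairs need fewer repeated matches to count as non-accidental, so the threshold rises in steps as the combined frequency grows.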
Word-initial consonant correspondences (CCs) between English (left) and French (right) (extracted from Ringe, .
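| No. | Concept | CC | No. | Concept | CC | No. | Concept | CC |
|---|---|---|---|---|---|---|---|---|
| 1 | all(pl.) | ∅-t | 34 | good | g-b | 67 | path | p-s |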
| 2 | ashes | ∅-s | 35 | grease | g-g | 68 | rain[nn] | r-p |
| 3 | bark[of tree] | b-∅ | 36 | green | g-v | 69 | red | r-r |
| 4 | belly | b-v | 37 | hair[of head] | h-s | 70 | root | r-r |
| 5 | big | b-g | 38 | hand | h-m | 71 | round | r-r |
| 6 | bird | b-∅ | 39 | head | h-t | 72 | sand | s-s |
| 7 | bite | b-m | 40 | hear | h-∅ | 73 | say | s-d |
| 8 | black | b-n | 41 | heart | h-k | 74 | see | s-v |
| 9 | blood | b-s | 42 | horn | h-k | 75 | seed | s-g |
| 10 | bone | b-∅ | 43 | hot | h-s | 76 | sit | s-∅ |
| 11 | breast(s) | b-s | 44 | human[nn] | h-p | 77 | skin | s-p |
| 12 | burn[intr] | b-b | 45 | I | ∅-m | 78 | sleep | s-d |
| 13 | claw | k-g | 46 | kill | k-m | 79 | small | s-p |
| 14 | cloud | k-n | 47 | knee | n-z | 80 | smoke | s-f |
| 15 | cold | k-f | 48 | know | n-s | 81 | stand | s-d |
| 16 | come | k-v | 49 | leaf | l-f | 82 | star | s-∅ |
| 17 | die | d-m | 50 | lie | l-∅ | 83 | stone | s-p |
| 18 | dog | d-s | 51 | liver | l-f | 84 | sun | s-s |
| 19 | drink | d-b | 52 | long | l-l | 85 | swim | s-n |
| 20 | dry | d-s | 53 | louse | l-p | 86 | tail | s-k |
| 21 | ear | ∅-∅ | 54 | man | m-∅ | 87 | that(nt.) | |
| 22 | earth | ∅-t | 55 | many | m-b | 88 | this(nt.) | |
| 23 | eat | ∅-m | 56 | meat | m-v | 89 | tongue | t-l |
| 24 | egg | ∅-∅ | 57 | moon | m-l | 90 | tooth | t-d |
| 25 | eye | ∅-∅ | 58 | mountain | m-m | 91 | tree | t-∅ |
| 26 | feather | f-p | 59 | mouth | m-b | 92 | two | t-d |
| 27 | fire | f-f | 60 | name | n-n | 93 | water | w-∅ |
| 28 | fish | f-p | 61 | neck | n-k | 94 | we | w-n |
| 29 | fly[vb] | f-v | 62 | new | n-n | 95 | what | w-k |
| 30 | foot | f-p | 63 | night | n-n | 96 | white | w-b |
| 31 | full | f-p | 64 | nose | n-n | 97 | who | h-k |
| 32 | give | g-d | 65 | not | n-n | 98 | woman | w-f |
| 33 | go | g-∅ | 66 | one | w-∅ | 99 | you(sg.) | y-t |
| 100 | yellow | y-z |
“∅” denotes zero consonant.