Aris R Terzopoulos, Lynne G Duncan, Mark A J Wilson, Georgia Z Niolaki, Jackie Masterson.
Abstract
In this article, we introduce HelexKids, an online written-word database for Greek-speaking children in primary education (Grades 1 to 6). The database is organized on a grade-by-grade basis, and on a cumulative basis by combining Grade 1 with Grades 2 to 6. It provides values for Zipf, frequency per million, dispersion, estimated word frequency per million, standard word frequency, contextual diversity, orthographic Levenshtein distance, and lemma frequency. These values are derived from 116 textbooks used in primary education in Greece and Cyprus, producing a total of 68,692 different word types. HelexKids was developed to assist researchers in studying language development, educators in selecting age-appropriate items for teaching, as well as writers and authors of educational books for Greek/Cypriot children. The database is open access and can be searched online at www.helexkids.org.
Keywords: Children; Contextual diversity; Frequency; Greek language; Word database
Year: 2017 PMID: 26822666 PMCID: PMC5352803 DOI: 10.3758/s13428-015-0698-5
Source DB: PubMed Journal: Behav Res Methods ISSN: 1554-351X
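Two of the measures named in the abstract, frequency per million and Zipf, can be illustrated with a minimal sketch. It assumes the commonly used definitions (raw count scaled to a corpus of one million tokens, and Zipf as log10 of frequency per million plus 3). The function names and the example numbers are hypothetical, and HelexKids may apply smoothing or a different normalization, so this sketch is not expected to reproduce the tabulated values exactly.

```python
import math


def frequency_per_million(count: int, corpus_tokens: int) -> float:
    """Scale a raw word count to occurrences per one million tokens."""
    return count * 1_000_000 / corpus_tokens


def zipf_value(count: int, corpus_tokens: int) -> float:
    """Zipf scale under the common definition: log10(frequency per million) + 3.

    Assumption: no smoothing is applied; the database itself may differ.
    """
    return math.log10(frequency_per_million(count, corpus_tokens)) + 3


# Hypothetical example: a word seen 150 times in a 1,500,000-token corpus.
fpm = frequency_per_million(150, 1_500_000)
zipf = zipf_value(150, 1_500_000)
print(fpm, zipf)
```

Under these assumptions, a word occurring 100 times per million tokens has a Zipf value of 5, and each additional Zipf unit corresponds to a tenfold increase in frequency.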
Numbers of different textbooks in each grade, tabulated by school subject
| Subject | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | All Grades* |
|---|---|---|---|---|---|---|---|
| Greek | 5 | 6 | 6 | 7 | 8 | 8 | 34 |
| Mathematics | 6 | 6 | 5 | 5 | 5 | 5 | 32 |
| Environmental studies | 2 | 2 | 2 | 2 | 0 | 0 | 8 |
| Science | 0 | 0 | 0 | 0 | 2 | 2 | 4 |
| History | 0 | 0 | 2 | 2 | 2 | 2 | 8 |
| Geography | 0 | 0 | 0 | 0 | 2 | 2 | 4 |
| Religious education | 0 | 0 | 1 | 1 | 1 | 1 | 4 |
| Music education | 2 | 2 | 2 | 2 | 2 | 2 | 10 |
| Art | 2 | 2 | 2 | 2 | 2 | 2 | 6 |
| Theatre | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
| Physical education | 1 | 1 | 1 | 1 | 1 | 1 | 3 |
| Citizenship | 0 | 0 | 0 | 0 | 1 | 1 | 2 |
| TOTAL | 18 | 19 | 21 | 22 | 27 | 27 | 116 |
* Books used in more than one grade contribute only once to the total number.
Numbers of tokens for each grade by subject
| Subject | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 |
|---|---|---|---|---|---|---|
| Art | 4,881 | 4,881 | 12,465 | 13,676 | 13,104 | 13,104 |
| Citizenship | | | | | 9,766 | 18,451 |
| Environmental studies | 7,093 | 11,303 | 20,560 | 46,062 | | |
| Geography | | | | | 33,317 | 29,399 |
| Greek | 38,092 | 48,972 | 79,391 | 151,170 | 206,695 | 208,906 |
| History | | | 51,146 | 37,266 | 38,931 | 51,759 |
| Mathematics | 12,920 | 22,744 | 17,684 | 23,470 | 27,324 | 76,162 |
| Music | 2,317 | 2,732 | 10,726 | 10,725 | 18,550 | 26,927 |
| Religious education | | | 22,629 | 43,008 | 32,557 | 23,192 |
| Science | | | | | 40,916 | 42,940 |
| Theatre | | | | | 16,926 | 16,926 |
| TOTAL | 70,352 | 95,681 | 226,932 | 337,708 | 462,682 | 532,362 |
Fig. 1 Distribution of textbooks per subject
Fig. 2 Distribution of textbooks per grade
Number of different word types, number of words occurring five or more times per grade, and number and percentage of hapax words, tabulated by grade
| Grade | Word Types | Hapax Words | % of Hapax Words | Words Occurring 5 or More Times |
|---|---|---|---|---|
| Grade 1 | 9,155 | 4,533 | 49.5 | 1,889 |
| Grade 2 | 11,714 | 5,791 | 49.5 | 2,453 |
| Grade 3 | 21,193 | 9,373 | 44.4 | 5,172 |
| Grade 4 | 33,762 | 16,267 | 48.2 | 7,511 |
| Grade 5 | 44,851 | 21,396 | 47.7 | 9,797 |
| Grade 6 | 48,080 | 22,641 | 47.1 | 10,881 |
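The columns of the table above (word types, hapax words, percentage of hapax words, and words occurring five or more times) can all be derived from a list of tokens. The following is a minimal sketch under the assumption that the text has already been tokenized. The function name `type_statistics` and the toy token list are hypothetical and are not taken from the HelexKids processing pipeline.

```python
from collections import Counter


def type_statistics(tokens, min_count=5):
    """Summarise a token list: types, hapax words, % hapax, frequent words."""
    counts = Counter(tokens)
    n_types = len(counts)
    # A hapax word is a word type that occurs exactly once.
    n_hapax = sum(1 for c in counts.values() if c == 1)
    n_frequent = sum(1 for c in counts.values() if c >= min_count)
    pct_hapax = round(100 * n_hapax / n_types, 1) if n_types else 0.0
    return {
        "tokens": len(tokens),
        "word_types": n_types,
        "hapax_words": n_hapax,
        "pct_hapax": pct_hapax,
        "words_min_count": n_frequent,
    }


# Toy corpus (hypothetical): 15 tokens, 5 types, 2 of them occurring once.
toy_tokens = ["και"] * 6 + ["το"] * 5 + ["σπίτι"] * 2 + ["γάτα", "ήλιος"]
stats = type_statistics(toy_tokens)
print(stats)
```

The percentage is taken relative to the number of word types rather than tokens, which is consistent with the Grade 1 row (4,533 hapax words out of 9,155 types, or 49.5%).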
Number of tokens, of different word types, number of words occurring five or more times, and number and percentage of hapax words, tabulated per grade combination
| Grade | Tokens | Word Types | Hapax Words | % of Hapax Words | Words Occurring 5 or More Times |
|---|---|---|---|---|---|
| Grades 1–2 | 165,864 | 13,531 | 4,353 | 32.2 | 3,798 |
| Grades 1–3 | 391,731 | 26,338 | 9,400 | 35.7 | 8,661 |
| Grades 1–4 | 729,363 | 41,648 | 14,301 | 34.3 | 13,557 |
| Grades 1–5 | 1,191,971 | 59,402 | 18,464 | 31.1 | 19,640 |
| All Grades | 1,355,265 | 68,692 | 27,733 | 40.4 | 20,392 |
Mean, mode, minimum, maximum, and percentile values (P10, P25, P50, P75, and P90) for each grade and for all grades combined, for the frequency counts, Zipf, D, U, SFI, and CD
| Measure | Statistic | G1 | G2 | G3 | G4 | G5 | G6 | G1–G6 |
|---|---|---|---|---|---|---|---|---|
| Frequency | Mean | 7.68 | 8.15 | 10.68 | 10 | 10.30 | 11.07 | 19.67 |
| | Mode | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | Minimum | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | Maximum | 2,912 | 3,462 | 8,604 | 11,220 | 14,967 | 17,481 | 46,576 |
| | P10 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | P25 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| | P50 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| | P75 | 4 | 4 | 4 | 4 | 4 | 4 | 6 |
| | P90 | 10 | 10 | 12 | 11 | 11 | 11 | 19 |
| Zipf | Mean | 4.25 | 4.15 | 3.90 | 3.78 | 3.71 | 3.69 | 3.62 |
| | Mode | 4.18 | 4.06 | 3.73 | 4 | 4 | 3.47 | 3.25 |
| | Minimum | 4.18 | 4.06 | 3.73 | 4 | 4 | 3.47 | 3.25 |
| | Maximum | 6.47 | 6.54 | 6.93 | 7 | 7 | 7.24 | 7.67 |
| | P10 | 4.18 | 4.06 | 3.73 | 3.60 | 3.51 | 3.47 | 3.25 |
| | P25 | 4.18 | 4.06 | 3.73 | 3.60 | 3.51 | 3.47 | 3.25 |
| | P50 | 4.21 | 4.10 | 3.81 | 3.70 | 3.62 | 3.59 | 3.45 |
| | P75 | 4.26 | 4.16 | 3.93 | 3.84 | 3.79 | 3.77 | 3.83 |
| | P90 | 4.39 | 4.31 | 4.22 | 4.15 | 4.12 | 4.11 | 4.30 |
| D | Mean | 0.14 | 0.15 | 0.15 | 0.14 | 0.14 | 0.14 | 0.16 |
| | Mode | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Minimum | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Maximum | 0.95 | 0.97 | 0.95 | 0.96 | 1 | 0.95 | 0.95 |
| | P10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | P25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | P50 | 0 | 0 | 0 | 0 | 0 | 0 | 0.12 |
| | P75 | 0.29 | 0.30 | 0.26 | 0.25 | 0.23 | 0.23 | 0.27 |
| | P90 | 0.48 | 0.51 | 0.49 | 0.48 | 0.46 | 0.46 | 0.47 |
| U | Mean | 71.02 | 57.54 | 31.89 | 20.04 | 15.16 | 14.72 | 10.28 |
| | Mode | 3.92 | 2.12 | 0.59 | 0.43 | 0 | 0.17 | 0.03 |
| | Minimum | 0.47 | 0.27 | 0.10 | 0.44 | 0 | 0.01 | 0 |
| | Maximum | 38,357 | 33,649 | 34,968 | 30,666 | 30,060 | 30,699 | 31,745 |
| | P10 | 0.87 | 0.92 | 0.29 | 0.23 | 0.11 | 0.08 | 0.01 |
| | P25 | 1.86 | 1.26 | 0.59 | 0.38 | 0.16 | 0.17 | 0.02 |
| | P50 | 3.92 | 2.12 | 1.18 | 0.51 | 0.30 | 0.25 | 0.24 |
| | P75 | 18.52 | 13.18 | 6.27 | 3.73 | 2.53 | 2.25 | 1.16 |
| | P90 | 67.86 | 56.19 | 25.92 | 15.14 | 10.47 | 9.67 | 6.02 |
| SFI | Mean | 48.12 | 46.70 | 42.95 | 40.66 | 38.44 | 37.76 | 32.51 |
| | Mode | 45.94 | 43.26 | 37.72 | 36.35 | 34 | 32.40 | 24.15 |
| | Minimum | 36.70 | 34.34 | 29.97 | 26.46 | 23 | 20.73 | 10.87 |
| | Maximum | 85.84 | 85.27 | 85.44 | 84.87 | 85 | 84.87 | 85.02 |
| | P10 | 39.40 | 39.65 | 34.67 | 33.55 | 30.60 | 29.05 | 20.45 |
| | P25 | 42.69 | 40.99 | 37.72 | 35.76 | 32.16 | 32.32 | 23.34 |
| | P50 | 45.94 | 43.26 | 40.73 | 37.12 | 34.83 | 33.90 | 33.85 |
| | P75 | 52.68 | 51.20 | 47.98 | 45.72 | 44.03 | 43.51 | 40.66 |
| | P90 | 58.32 | 57.50 | 54.14 | 51.80 | 50.20 | 49.86 | 47.80 |
| CD | Mean | 0.18 | 0.19 | 0.14 | 0.13 | 0.11 | 0.11 | 0.05 |
| | Mode | 0.10 | 0.10 | 0.07 | 0.06 | 0 | 0.05 | 0.01 |
| | Minimum | 0.10 | 0.10 | 0.07 | 0 | 0 | 0.05 | 0 |
| | Maximum | 1 | 1 | 1 | 1 | 1 | 1 | 0.99 |
| | P10 | 0.10 | 0.10 | 0.07 | 0.06 | 0.05 | 0.05 | 0.01 |
| | P25 | 0.10 | 0.10 | 0.07 | 0.06 | 0.05 | 0.05 | 0.01 |
| | P50 | 0.10 | 0.10 | 0.07 | 0.06 | 0.05 | 0.05 | 0.03 |
| | P75 | 0.20 | 0.20 | 0.13 | 0.13 | 0.10 | 0.10 | 0.05 |
| | P90 | 0.40 | 0.40 | 0.33 | 0.25 | 0.24 | 0.24 | 0.12 |
Textbooks used in more than one grade
| Books | Grades 1 and 2 | Grades 3 and 4 | Grades 5 and 6 | Grades 4, 5, and 6 |
|---|---|---|---|---|
| Grammar | | | ✓ | |
| Dictionary | | | | ✓ |
| Anthology of short stories and poems | ✓ | ✓ | ✓ | |
| Artistic expression | ✓ | ✓ | ✓ | |
| Music | | ✓ | | |
| Theatre | | | ✓ | |
| Physical education | ✓ | ✓ | ✓ | |