Shabd: A psycholinguistic database for Hindi.

Ark Verma¹, Vivek Sikarwar², Himanshu Yadav³, Ranjith Jaganathan², Pawan Kumar⁴.

Abstract

We present Shabd, a psycholinguistic database in Hindi. It is based on a corpus of 1.4 billion words from electronic newspapers and news websites. Word frequencies and part of speech information have been derived and are made available in a cleaned list of 34 thousand hand-selected words, and a list of 96 thousand words observed with a frequency of more than 100 times in the corpus. Next to the Shabd database, we also make a list with all 2.3 million word types available and a list with the 2.5 million most frequent word pairs (word bigrams). The quality of the word frequency measure was tested in two lexical decision tasks. We observed that the Shabd word frequencies outperform existing frequencies based on smaller corpora of newspapers but not the Worldlex word frequencies based on an analysis of blogs. We also observed that word frequency accounts for as much variance as contextual diversity (operationalized as the number of documents in which the words were observed). The Shabd database is freely available for research.
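The three corpus measures named in the abstract (word frequency, contextual diversity as the number of documents containing a word, and word-bigram counts) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `corpus_counts`, the whitespace tokenization, the toy documents, and the threshold value are all assumptions made for illustration only.

```python
from collections import Counter


def corpus_counts(documents):
    """Count three measures over an iterable of tokenized documents.

    Returns (word_freq, contextual_diversity, bigram_freq):
    - word_freq: total occurrences of each word across all documents
    - contextual_diversity: number of documents in which each word occurs
    - bigram_freq: occurrences of each adjacent word pair within a document
    """
    word_freq = Counter()
    contextual_diversity = Counter()
    bigram_freq = Counter()
    for tokens in documents:
        word_freq.update(tokens)
        # A word counts once per document, however often it repeats there.
        contextual_diversity.update(set(tokens))
        # Adjacent pairs; pairs never span a document boundary.
        bigram_freq.update(zip(tokens, tokens[1:]))
    return word_freq, contextual_diversity, bigram_freq


# Toy, made-up documents; whitespace splitting is an assumption and
# would be too crude for a real Hindi corpus.
docs = [
    "यह एक शब्द है".split(),
    "यह शब्द यह शब्द".split(),
    "एक और वाक्य".split(),
]
wf, cd, bg = corpus_counts(docs)

# Hypothetical threshold, analogous in spirit to keeping only words
# observed more than 100 times in the full corpus.
min_count = 2
frequent = {word for word, n in wf.items() if n > min_count}
```

In this toy corpus the word "यह" occurs three times but in only two documents, which shows how word frequency and contextual diversity can diverge for the same word.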

Keywords: Akshara; Contextual diversity; Corpus; Devanagari; Hindi; Lexical decision; Visual word recognition; Word frequency

Year: 2021 PMID: 34357542 DOI: 10.3758/s13428-021-01625-2

Source DB: PubMed Journal: Behav Res Methods ISSN: 1554-351X

References

22 in total

1. Aralex: a lexical database for Modern Standard Arabic.

Authors: Sami Boudelaa; William D Marslen-Wilson
Journal: Behav Res Methods Date: 2010-05

2. The impact of word prevalence on lexical decision times: Evidence from the Dutch Lexicon Project 2.

Authors: Marc Brysbaert; Michaël Stevens; Paweł Mandera; Emmanuel Keuleers
Journal: J Exp Psychol Hum Percept Perform Date: 2015-10-26 Impact factor: 3.332

3. Contextual diversity, not word frequency, determines word-naming and lexical decision times.

Authors: James S Adelman; Gordon D A Brown; José F Quesada
Journal: Psychol Sci Date: 2006-09

4. Moving beyond Kucera and Francis: a critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English.

Authors: Marc Brysbaert; Boris New
Journal: Behav Res Methods Date: 2009-11

5. An amorphous model for morphological processing in visual comprehension based on naive discriminative learning.

6. Worldlex: Twitter and blog word frequencies for 66 languages.

7. EsPal: one-stop shopping for Spanish word properties.

8. SUBTLEX-CH: Chinese word and character frequencies based on film subtitles.

9. Subtitle-based word frequencies as the best estimate of reading behavior: the case of Greek.

10. What is semantic diversity and why does it facilitate visual word recognition?