Shabd: A psycholinguistic database for Hindi.

Ark Verma, Vivek Sikarwar, Himanshu Yadav, Ranjith Jaganathan, Pawan Kumar.

Abstract

We present Shabd, a psycholinguistic database for Hindi. It is based on a corpus of 1.4 billion words from electronic newspapers and news websites. Word frequencies and part-of-speech information have been derived and are made available in a cleaned list of 34 thousand hand-selected words and in a list of 96 thousand words observed more than 100 times in the corpus. In addition to the Shabd database, we also make available a list of all 2.3 million word types and a list of the 2.5 million most frequent word pairs (word bigrams). The quality of the word frequency measure was tested in two lexical decision tasks. We observed that the Shabd word frequencies outperform existing frequencies based on smaller newspaper corpora, but not the Worldlex word frequencies based on an analysis of blogs. We also observed that word frequency accounts for as much variance as contextual diversity (operationalized as the number of documents in which a word was observed). The Shabd database is freely available for research.
© 2021. The Psychonomic Society, Inc.
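
The three corpus measures named in the abstract (word frequency, contextual diversity as a document count, and word-bigram frequency) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the whitespace tokenization, the function names `corpus_counts` and `frequent_words`, and the three romanized toy documents are all assumptions made for illustration, and real Shabd data would be in Devanagari script.

```python
from collections import Counter

# Toy corpus (invented for illustration): each string stands in for one
# news document. Romanized Hindi is used here only to keep the example simple.
documents = [
    "yah ek kitaab hai",
    "yah ek ghar hai",
    "kitaab ghar mein hai aur kitaab nayi hai",
]


def corpus_counts(docs):
    """Return (word frequency, contextual diversity, bigram frequency)."""
    word_freq = Counter()    # total occurrences of each word type
    doc_freq = Counter()     # number of documents containing each word type
    bigram_freq = Counter()  # occurrences of each adjacent word pair
    for doc in docs:
        tokens = doc.split()  # assumption: whitespace tokenization
        word_freq.update(tokens)
        doc_freq.update(set(tokens))  # a document counts once per word type
        bigram_freq.update(zip(tokens, tokens[1:]))
    return word_freq, doc_freq, bigram_freq


def frequent_words(word_freq, min_count):
    """Keep only word types observed more than min_count times."""
    return {word: n for word, n in word_freq.items() if n > min_count}


word_freq, doc_freq, bigram_freq = corpus_counts(documents)
```

In this toy corpus "kitaab" occurs three times but in only two documents, which is the kind of case where word frequency and contextual diversity diverge. `frequent_words(word_freq, 100)` would mirror the "more than 100 times" cutoff the abstract describes for the 96-thousand-word list.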

Keywords:  Akshara; Contextual diversity; Corpus; Devanagari; Hindi; Lexical decision; Visual word recognition; Word frequency

Year:  2021        PMID: 34357542     DOI: 10.3758/s13428-021-01625-2

Source DB:  PubMed          Journal:  Behav Res Methods        ISSN: 1554-351X


  22 in total

1.  Aralex: a lexical database for Modern Standard Arabic.

Authors:  Sami Boudelaa; William D Marslen-Wilson
Journal:  Behav Res Methods       Date:  2010-05

2.  The impact of word prevalence on lexical decision times: Evidence from the Dutch Lexicon Project 2.

Authors:  Marc Brysbaert; Michaël Stevens; Paweł Mandera; Emmanuel Keuleers
Journal:  J Exp Psychol Hum Percept Perform       Date:  2015-10-26       Impact factor: 3.332

3.  Contextual diversity, not word frequency, determines word-naming and lexical decision times.

Authors:  James S Adelman; Gordon D A Brown; José F Quesada
Journal:  Psychol Sci       Date:  2006-09

4.  Moving beyond Kucera and Francis: a critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English.

Authors:  Marc Brysbaert; Boris New
Journal:  Behav Res Methods       Date:  2009-11

5.  An amorphous model for morphological processing in visual comprehension based on naive discriminative learning.

Authors:  R Harald Baayen; Petar Milin; Dusica Filipović Đurđević; Peter Hendrix; Marco Marelli
Journal:  Psychol Rev       Date:  2011-07       Impact factor: 8.934

6.  Worldlex: Twitter and blog word frequencies for 66 languages.

Authors:  Manuel Gimenes; Boris New
Journal:  Behav Res Methods       Date:  2016-09

7.  EsPal: one-stop shopping for Spanish word properties.

Authors:  Andrew Duchon; Manuel Perea; Nuria Sebastián-Gallés; Antonia Martí; Manuel Carreiras
Journal:  Behav Res Methods       Date:  2013-12

8.  SUBTLEX-CH: Chinese word and character frequencies based on film subtitles.

Authors:  Qing Cai; Marc Brysbaert
Journal:  PLoS One       Date:  2010-06-02       Impact factor: 3.240

9.  Subtitle-based word frequencies as the best estimate of reading behavior: the case of Greek.

Authors:  Maria Dimitropoulou; Jon Andoni Duñabeitia; Alberto Avilés; José Corral; Manuel Carreiras
Journal:  Front Psychol       Date:  2010-12-21

10.  What is semantic diversity and why does it facilitate visual word recognition?

Authors:  Benedetta Cevoli; Chris Watkins; Kathleen Rastle
Journal:  Behav Res Methods       Date:  2021-02
