A large-scaled corpus for assessing text readability.

Scott Crossley, Aron Heintz, Joon Suh Choi, Jordan Batchelor, Mehrnoush Karimi, Agnes Malatinszky

Abstract

This paper introduces the CommonLit Ease of Readability (CLEAR) corpus, which provides unique readability scores for ~5000 text excerpts along with information about each excerpt's year of publication, genre, and other metadata. The CLEAR corpus will provide researchers interested in discourse processing and reading with a resource from which to develop and test readability metrics and to model text readability. The CLEAR corpus includes a number of improvements over previous readability corpora, including its size, the breadth of the excerpts available (which cover over 250 years of writing in two different genres), and a unique readability criterion provided for each text based on teachers' ratings of text difficulty for student readers. This paper discusses the development of the corpus and presents reliability metrics for the human ratings of readability.

Keywords: Corpus linguistics; Natural language processing; Readability; Readability formulas

Year: 2022 PMID: 35297016 DOI: 10.3758/s13428-022-01802-x

Source DB: PubMed Journal: Behav Res Methods ISSN: 1554-351X
