A large-scaled corpus for assessing text readability.

Scott Crossley, Aron Heintz, Joon Suh Choi, Jordan Batchelor, Mehrnoush Karimi, Agnes Malatinszky.

Abstract

This paper introduces the CommonLit Ease of Readability (CLEAR) corpus, which provides unique readability scores for ~5000 text excerpts along with each excerpt's year of publication, genre, and other metadata. The CLEAR corpus will provide researchers interested in discourse processing and reading with a resource from which to develop and test readability metrics and to model text readability. The CLEAR corpus improves on previous readability corpora in several respects: its size; the breadth of the excerpts available, which cover over 250 years of writing in two different genres; and the unique readability criterion provided for each text, based on teachers' ratings of text difficulty for student readers. This paper discusses the development of the corpus and presents reliability metrics for the human ratings of readability.
© 2022. The Author(s).

Keywords:  Corpus linguistics; Natural language processing; Readability; Readability formulas

Year:  2022        PMID: 35297016     DOI: 10.3758/s13428-022-01802-x

Source DB:  PubMed          Journal:  Behav Res Methods        ISSN: 1554-351X


