
Using Lexical Chains to Identify Text Difficulty: A Corpus Statistics and Classification Study.

Partha Mukherjee, Gondy Leroy, David Kauchak.   

Abstract

Our goal is data-driven discovery of features for text simplification. In this paper, we investigate three types of lexical chains: exact, synonymous, and semantic. A lexical chain links semantically related words in a document. We examine their potential in two studies: a document-level corpus statistics study (914 texts) to estimate their overall capacity to differentiate between easy and difficult text, and a sentence classification task (11 000 sentences) to determine the usefulness of the features at the sentence level for simplification. For the corpus statistics study, we tested five document-level features for each chain type: total number of chains, average chain length, average chain span, number of crossing chains, and number of chains longer than half the document length. We found significant differences between easy and difficult text for average chain length and average number of crossing chains. For the sentence classification study, we compared the lexical chain features to standard bag-of-words features on a range of classifiers: logistic regression, naïve Bayes, decision trees, linear and RBF kernel SVMs, and random forests. The lexical chain features performed significantly better than the bag-of-words baseline across all classifiers, with the best classifier achieving an accuracy of ∼90% (compared to 78% for bag-of-words). Overall, we find that several lexical chain features provide specific information useful for identifying difficult sentences, beyond what is available from standard lexical features.
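To make the five document-level features concrete, here is a minimal sketch for exact chains only (repetitions of the same word). It is an illustration under stated assumptions, not the authors' implementation: the stopword list, the minimum of two occurrences per chain, measuring span in sentences, treating two chains as "crossing" when their sentence intervals overlap, and calling a chain "long" when its span exceeds half the number of sentences are all our assumptions. Synonymous and semantic chains would additionally need a thesaurus or lexical database and are not shown. The names `exact_chains` and `chain_features` are hypothetical.

```python
from collections import defaultdict

# Hypothetical minimal stopword list (assumption); a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "to", "in", "is", "are",
             "it", "with", "for", "on"}


def exact_chains(sentences):
    """Group repeated content words into exact-match chains.

    sentences: list of token lists. Returns {word: [sentence indices]} for every
    non-stopword occurring at least twice (assumed minimum chain size).
    """
    occurrences = defaultdict(list)
    for i, tokens in enumerate(sentences):
        for tok in tokens:
            word = tok.lower().strip(".,;:!?")
            if word.isalpha() and word not in STOPWORDS:
                occurrences[word].append(i)  # indices are appended in sentence order
    return {w: idx for w, idx in occurrences.items() if len(idx) >= 2}


def chain_features(sentences):
    """Compute five document-level chain features (definitions are assumptions)."""
    chains = exact_chains(sentences)
    n_sent = len(sentences)
    n = len(chains)
    if n == 0:
        return {"num_chains": 0, "avg_length": 0.0, "avg_span": 0.0,
                "num_crossing": 0, "num_long": 0}
    # Interval of sentences covered by each chain: (first, last) occurrence.
    intervals = [(idx[0], idx[-1]) for idx in chains.values()]
    # Chain length = number of word occurrences in the chain.
    avg_length = sum(len(idx) for idx in chains.values()) / n
    # Chain span = number of sentences from first to last occurrence, inclusive.
    spans = [end - start + 1 for start, end in intervals]
    avg_span = sum(spans) / n
    # A chain counts as crossing if its interval overlaps any other chain's interval.
    num_crossing = 0
    for a, (s1, e1) in enumerate(intervals):
        if any(s1 <= e2 and s2 <= e1
               for b, (s2, e2) in enumerate(intervals) if b != a):
            num_crossing += 1
    # Long chain = span longer than half the document (in sentences).
    num_long = sum(1 for span in spans if span > n_sent / 2)
    return {"num_chains": n, "avg_length": avg_length, "avg_span": avg_span,
            "num_crossing": num_crossing, "num_long": num_long}


if __name__ == "__main__":
    doc = [
        "The heart pumps blood".split(),
        "Blood carries oxygen".split(),
        "The heart needs oxygen too".split(),
        "Exercise helps".split(),
    ]
    # Chains: heart, blood, oxygen -> num_chains=3, avg_length=2.0, num_long=1
    print(chain_features(doc))
```

Per-document vectors like this one could then be compared between easy and difficult texts with a standard significance test; for the sentence-level task, the same idea would have to be reduced to features describing how chains pass through each sentence, which this sketch does not attempt.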


Year:  2018        PMID: 30530380      PMCID: PMC6551329          DOI: 10.1109/JBHI.2018.2885465

Source DB:  PubMed          Journal:  IEEE J Biomed Health Inform        ISSN: 2168-2194            Impact factor:   5.772


