Comparing methods for single paragraph similarity analysis.

Benjamin Stone, Simon Dennis, Peter J Kwantes.

Abstract

The focus of this paper is two-fold. First, similarities generated from six semantic models were compared to human ratings of paragraph similarity on two datasets: 23 World Entertainment News Network paragraphs and 50 ABC newswire paragraphs. Contrary to findings on smaller textual units such as word associations (Griffiths, Tenenbaum, & Steyvers, 2007), our results suggest that when single paragraphs are compared, simple nonreductive models (word overlap and vector space) can provide better similarity estimates than more complex models (LSA, Topic Model, SpNMF, and CSM). Second, various methods of corpus creation were explored to facilitate the semantic models' similarity estimates. Removing numeric and single characters and truncating document length both improved performance. Automated construction of smaller Wikipedia-based corpora proved to be very effective, even improving upon the performance of corpora that had been chosen for the domain. Model performance was further improved by augmenting corpora with dataset paragraphs.
Copyright © 2010 Cognitive Science Society, Inc.
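
The two nonreductive models named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas, so everything here is an assumption made for illustration: word overlap is taken to be Jaccard overlap of word types, the vector space model is taken to be cosine similarity over raw term-frequency vectors, and the preprocessing only approximates "removing numeric and single characters" and "truncating document length". The function names and the `max_tokens` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text, max_tokens=None):
    """Lowercase, split on non-alphanumerics, and drop purely numeric and
    single-character tokens (an assumed reading of the preprocessing step).
    Optionally truncate to the first max_tokens tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    kept = [t for t in tokens if len(t) > 1 and not t.isdigit()]
    return kept if max_tokens is None else kept[:max_tokens]


def word_overlap(a, b):
    """Jaccard overlap between the sets of word types in two paragraphs.
    Assumption: the paper's word overlap measure may be defined differently."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a, b):
    """Cosine between raw term-frequency vectors, with no dimensionality
    reduction and no term weighting (both are assumptions)."""
    vec_a, vec_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Calling `word_overlap(p1, p2)` or `cosine_similarity(p1, p2)` on two paragraph strings returns a score between 0 and 1. Scores of this kind could then be correlated with human similarity ratings, which is the type of comparison the abstract describes.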

Keywords:  Corpus construction; Corpus preprocessing; Paragraph similarity; Semantic models; Wikipedia corpora

Year:  2010        PMID: 25164176     DOI: 10.1111/j.1756-8765.2010.01108.x

Source DB:  PubMed          Journal:  Top Cogn Sci        ISSN: 1756-8757


  4 in total

1.  Using experiential optimization to build lexical representations. (Review)

Authors:  Brendan T Johns; Michael N Jones; D J K Mewhort
Journal:  Psychon Bull Rev       Date:  2019-02

2.  Framing COVID-19: How we conceptualize and discuss the pandemic on Twitter.

Authors:  Philipp Wicke; Marianna M Bolognesi
Journal:  PLoS One       Date:  2020-09-30       Impact factor: 3.240

3.  Perceptions of Life Support and Advance Care Planning During the COVID-19 Pandemic: A Global Study of Twitter Users.

Authors:  Vishal R Patel; Sofia Gereta; Christopher J Blanton; Alexander L Chu; Akash P Patel; Michael Mackert; David Zientek; Nico Nortjé; Anjum Khurshid; Christopher Moriates; Gregory Wallingford
Journal:  Chest       Date:  2022-01-22       Impact factor: 10.262

4.  The Hidden Pandemic of Family Violence During COVID-19: Unsupervised Learning of Tweets.

Authors:  Jia Xue; Junxiang Chen; Chen Chen; Ran Hu; Tingshao Zhu
Journal:  J Med Internet Res       Date:  2020-11-06       Impact factor: 5.428
