SuperCAT: The (New and Improved) Corpus Analysis Toolkit.

K Bretonnel Cohen, William A Baumgartner, Irina Temnikova.

Abstract

This paper reports SuperCAT, a corpus analysis toolkit. It is a radical extension of SubCAT, the Sublanguage Corpus Analysis Toolkit, from sublanguage analysis to corpus analysis in general. The idea behind SuperCAT is that representative corpora have no tendency towards closure; that is, they tend towards infinity. In contrast, non-representative corpora have a tendency towards closure (roughly, finiteness). SuperCAT focuses on general techniques for the quantitative description of the characteristics of any corpus (or other language sample), particularly the characteristics of its lexical distributions. Additionally, SuperCAT features a complete re-engineering of the previous SubCAT architecture.
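
As a rough illustration of what "closure" means for a lexical distribution, the sketch below counts how many distinct word types have been seen as more tokens are read. It is not taken from SuperCAT or SubCAT: the function names `type_growth_curve` and `late_growth_rate`, the lowercasing step, and the "second half of the curve" heuristic are all assumptions made for this example. A sample that tends towards closure should show a growth rate near zero late in the curve, while an open-ended sample keeps adding new types.

```python
from typing import Iterable, List, Tuple


def type_growth_curve(tokens: Iterable[str], step: int = 1) -> List[Tuple[int, int]]:
    """Return (tokens_seen, distinct_types_seen) pairs sampled every `step` tokens.

    Hypothetical helper for illustration; not part of SuperCAT.
    """
    if step < 1:
        raise ValueError("step must be a positive integer")
    seen = set()
    curve = []
    count = 0
    for count, token in enumerate(tokens, start=1):
        seen.add(token.lower())  # assumption: types are case-insensitive
        if count % step == 0:
            curve.append((count, len(seen)))
    # Record the final point if the sample length is not a multiple of `step`.
    if count and count % step != 0:
        curve.append((count, len(seen)))
    return curve


def late_growth_rate(curve: List[Tuple[int, int]]) -> float:
    """New types per token over the second half of the curve.

    Assumed heuristic: values near 0 suggest a tendency towards closure,
    values near 1 suggest that almost every new token is a new type.
    """
    if len(curve) < 2:
        return 0.0
    mid_tokens, mid_types = curve[len(curve) // 2]
    end_tokens, end_types = curve[-1]
    if end_tokens == mid_tokens:
        return 0.0
    return (end_types - mid_types) / (end_tokens - mid_tokens)
```

For example, calling `late_growth_rate(type_growth_curve(tokens, step=1000))` on a tokenized corpus gives a single number that can be compared across samples. A real analysis would also need a tokenizer and some control for sample size, both of which are left out here.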

Keywords: corpus; representativeness; sublanguage; toolkit

Year: 2016    PMID: 29568820    PMCID: PMC5860820

Source DB: PubMed    Journal: LREC Int Conf Lang Resour Eval


