Gauging Similarity with n-Grams: Language-Independent Categorization of Text.

M Damashek.   

Abstract

A language-independent means of gauging topical similarity in unrestricted text is described. The method combines information derived from n-grams (consecutive sequences of n characters) with a simple vector-space technique that makes sorting, categorization, and retrieval feasible in a large multilingual collection of documents. No prior information about document content or language is required. Context, as it applies to document similarity, can be accommodated by a well-defined procedure. When an existing document is used as an exemplar, the completeness and accuracy with which topically related documents are retrieved is comparable to that of the best existing systems. The results of a formal evaluation are discussed, and examples are given using documents in English and Japanese.
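The general idea of comparing documents through character n-grams in a vector space can be illustrated with a minimal sketch. The abstract does not give the weighting, normalization, or context-handling procedure, so everything specific here is an assumption made for illustration: the function names `ngram_profile`, `cosine_similarity`, and `rank_by_similarity` are hypothetical, the default n = 5 is arbitrary, and plain cosine similarity on raw n-gram counts stands in for whatever similarity measure the paper actually uses.

```python
from collections import Counter
from math import sqrt


def ngram_profile(text: str, n: int = 5) -> Counter:
    """Count overlapping character n-grams (assumed preprocessing:
    lowercase and collapse runs of whitespace to single spaces)."""
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty profile is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(exemplar: str, documents: list[str], n: int = 5):
    """Score each document against an exemplar, most similar first."""
    target = ngram_profile(exemplar, n)
    scored = [(cosine_similarity(target, ngram_profile(doc, n)), doc)
              for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Because the profile is built from raw characters rather than words, the same code runs unchanged on text in any language or script, which is the property the abstract emphasizes. Calling `rank_by_similarity(exemplar, documents)` returns (score, document) pairs ordered from most to least similar to the exemplar.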

Year:  1995        PMID: 17813910     DOI: 10.1126/science.267.5199.843

Source DB:  PubMed          Journal:  Science        ISSN: 0036-8075            Impact factor:   47.728


  8 in total

1.  Identification of related gene/protein names based on an HMM of name variations.

Authors:  L Yeganova; L Smith; W J Wilbur
Journal:  Comput Biol Chem       Date:  2004-04       Impact factor: 2.877

2.  Citizens at the forefront of the constitutional debate: Voluntary citizen participation determinants and emergent content in Chile.

Authors:  María Paz Raveau; Juan Pablo Couyoumdjian; Claudio Fuentes-Bravo; Carlos Rodriguez-Sickert; Cristian Candia
Journal:  PLoS One       Date:  2022-06-06       Impact factor: 3.752

3.  Human efficiency for classifying natural versus random text.

Authors:  Peter Neri; Alicia Liu; Dennis M Levi
Journal:  Vision Res       Date:  2010-01-15       Impact factor: 1.886

4.  A Study of the Morpho-Semantic Relationship in Medline.

Authors:  W John Wilbur; Larry Smith
Journal:  Open Inf Syst J       Date:  2013-11-21

5.  A probabilistic molecular fingerprint for big data settings.

Authors:  Daniel Probst; Jean-Louis Reymond
Journal:  J Cheminform       Date:  2018-12-18       Impact factor: 5.514

6.  Is T Cell Negative Selection a Learning Algorithm?

Authors:  Inge M N Wortel; Can Keşmir; Rob J de Boer; Judith N Mandl; Johannes Textor
Journal:  Cells       Date:  2020-03-11       Impact factor: 6.600

7.  Detecting Depression Signs on Social Media: A Systematic Literature Review.

Authors:  Rafael Salas-Zárate; Giner Alor-Hernández; María Del Pilar Salas-Zárate; Mario Andrés Paredes-Valverde; Maritza Bustos-López; José Luis Sánchez-Cervantes
Journal:  Healthcare (Basel)       Date:  2022-02-01

8.  n-Gram characterization of genomic islands in bacterial genomes.

Authors:  Gordana M Pavlović-Lazetić; Nenad S Mitić; Milos V Beljanski
Journal:  Comput Methods Programs Biomed       Date:  2008-12-19       Impact factor: 5.428
