
Level statistics of words: finding keywords in literary texts and symbolic sequences.

P Carpena, P Bernaola-Galván, M Hackenberg, A V Coronado, J L Oliver.

Abstract

Using a generalization of the level statistics analysis of quantum disordered systems, we present an approach that automatically extracts keywords from literary texts. Our approach takes into account not only the frequencies of the words in the text but also their spatial distribution along it, and is based on the fact that relevant words are significantly clustered (i.e., they self-attract each other), while irrelevant words are distributed randomly in the text. Since no reference corpus is needed, our approach is especially suitable for single documents for which no a priori information is available. In addition, we show that our method also works on generic symbolic sequences (continuous texts without spaces), which suggests its general applicability.
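The abstract does not give the exact statistic, so what follows is only a minimal sketch under our own assumptions, not the authors' implementation. It illustrates the core idea of scoring a word by the spatial distribution of its occurrences rather than by frequency alone. For each word, take the gaps between successive occurrences and compute sigma = std(gaps) / mean(gaps). If the n occurrences were scattered at random over N positions (probability p = n/N per position), the gaps would follow a geometric distribution, whose std/mean ratio is sqrt(1 - p). Dividing by that baseline gives a value near 1 for randomly placed words and clearly above 1 for clustered ("self-attracting") ones. The function names (`word_positions`, `substring_positions`, `normalized_sigma`, `rank_by_clustering`) and the `min_count` cutoff are hypothetical. Any frequency-dependent correction or significance test a full method would need is left out.

```python
import math
from statistics import mean, pstdev


def word_positions(tokens):
    """Map each distinct token to the sorted list of indices where it occurs."""
    positions = {}
    for i, tok in enumerate(tokens):
        positions.setdefault(tok, []).append(i)
    return positions


def substring_positions(sequence, word):
    """Start indices of (possibly overlapping) occurrences of `word` in a
    continuous symbolic sequence with no spaces."""
    hits = []
    start = sequence.find(word)
    while start != -1:
        hits.append(start)
        start = sequence.find(word, start + 1)
    return hits


def normalized_sigma(positions, total_length):
    """Gap spread of one word relative to a random (geometric) baseline.

    Returns roughly 1 for randomly scattered occurrences, > 1 for clustered
    ones, or None when there are too few occurrences to measure a spread.
    """
    n = len(positions)
    p = n / total_length
    if n < 3 or p >= 1:
        return None  # need at least two gaps and a non-degenerate baseline
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    sigma = pstdev(gaps) / mean(gaps)
    # std/mean of a geometric gap distribution with parameter p is sqrt(1 - p)
    return sigma / math.sqrt(1 - p)


def rank_by_clustering(tokens, min_count=3):
    """Rank tokens by normalized gap spread, most clustered first."""
    total = len(tokens)
    scores = {}
    for tok, pos in word_positions(tokens).items():
        if len(pos) >= min_count:
            score = normalized_sigma(pos, total)
            if score is not None:
                scores[tok] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy text: "e" is spread evenly, "k" appears in two separate bursts.
    toks = [f"w{i}" for i in range(100)]
    for i in range(0, 100, 10):
        toks[i] = "e"
    for i in (21, 22, 23, 71, 72, 73):
        toks[i] = "k"
    print(rank_by_clustering(toks))

    # The same scoring applied to a continuous symbolic sequence.
    seq = "abab" + "c" * 40 + "abab" + "c" * 40
    print(normalized_sigma(substring_positions(seq, "ab"), len(seq)))
```

Tokenized text and an unsegmented string differ only in how occurrence positions are collected. Once positions are known, the same gap statistic applies, which mirrors the abstract's point that the approach carries over to symbolic sequences.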


Year:  2009        PMID: 19392005     DOI: 10.1103/PhysRevE.79.035102

Source DB:  PubMed          Journal:  Phys Rev E Stat Nonlin Soft Matter Phys        ISSN: 1539-3755


  11 in total

1.  Arrangement of 3D structural motifs in ribosomal RNA.

Authors:  Karen Sargsyan; Carmay Lim
Journal:  Nucleic Acids Res       Date:  2010-02-16       Impact factor: 16.971

2.  Segmentation of time series with long-range fractal correlations.

Authors:  P Bernaola-Galván; J L Oliver; M Hackenberg; A V Coronado; P Ch Ivanov; P Carpena
Journal:  Eur Phys J B       Date:  2012-06-01       Impact factor: 1.500

3.  Keywords and Co-Occurrence Patterns in the Voynich Manuscript: An Information-Theoretic Analysis.

Authors:  Marcelo A Montemurro; Damián H Zanette
Journal:  PLoS One       Date:  2013-06-21       Impact factor: 3.240

4.  Zipf's law leads to Heaps' law: analyzing their relation in finite-size systems.

Authors:  Linyuan Lü; Zi-Ke Zhang; Tao Zhou
Journal:  PLoS One       Date:  2010-12-02       Impact factor: 3.240

5.  WordCluster: detecting clusters of DNA words and genomic elements.

Authors:  Michael Hackenberg; Pedro Carpena; Pedro Bernaola-Galván; Guillermo Barturen; Angel M Alganza; José L Oliver
Journal:  Algorithms Mol Biol       Date:  2011-01-24       Impact factor: 1.405

6.  The Fractal Patterns of Words in a Text: A Method for Automatic Keyword Extraction.

Authors:  Elham Najafi; Amir H Darooneh
Journal:  PLoS One       Date:  2015-06-19       Impact factor: 3.240

7.  A Complex Network Approach to Stylometry.

Authors:  Diego Raphael Amancio
Journal:  PLoS One       Date:  2015-08-27       Impact factor: 3.240

8.  An improved alignment-free model for DNA sequence similarity metric.

Authors:  Junpeng Bao; Ruiyu Yuan; Zhe Bao
Journal:  BMC Bioinformatics       Date:  2014-09-28       Impact factor: 3.169

9.  Model of the Dynamic Construction Process of Texts and Scaling Laws of Words Organization in Language Systems.

Authors:  Shan Li; Ruokuang Lin; Chunhua Bian; Qianli D Y Ma; Plamen Ch Ivanov
Journal:  PLoS One       Date:  2016-12-22       Impact factor: 3.240

10.  Extracting DNA words based on the sequence features: non-uniform distribution and integrity.

Authors:  Zhi Li; Hongyan Cao; Yuehua Cui; Yanbo Zhang
Journal:  Theor Biol Med Model       Date:  2016-01-25       Impact factor: 2.432

