Level statistics of words: finding keywords in literary texts and symbolic sequences.

P Carpena, P Bernaola-Galván, M Hackenberg, A V Coronado, J L Oliver.

Abstract

Using a generalization of the level statistics analysis of quantum disordered systems, we present an approach that automatically extracts keywords from literary texts. The approach takes into account not only the frequencies of the words in the text but also their spatial distribution along it, and is based on the fact that relevant words are significantly clustered (i.e., they self-attract), while irrelevant words are distributed randomly. Since no reference corpus is needed, the approach is especially suitable for single documents for which no a priori information is available. We also show that the method works on generic symbolic sequences (continuous texts without spaces), which suggests that it is broadly applicable.
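
The abstract does not spell out the statistic, so the following is only a minimal sketch of the clustering idea under stated assumptions. Each word's occurrences are treated like the "levels" of a spectrum, and the spread of the spacings between successive occurrences is compared with what random placement would give. The score used here is sigma = std(spacings) / mean(spacings), divided by sqrt(1 - p) with p = n / N. For n occurrences scattered at random over N positions the spacings are roughly geometric, so this ratio sits near 1; clustered words score above 1. That normalization, the `min_count` cutoff and the helper names (`positions_by_word`, `substring_positions`, `normalized_sigma`, `rank_keywords`) are illustrative assumptions, not the authors' published code, and the paper's own significance measure may differ.

```python
import math
from collections import defaultdict


def positions_by_word(tokens):
    """Map each distinct token to the sorted list of indices where it occurs."""
    pos = defaultdict(list)
    for i, tok in enumerate(tokens):
        pos[tok].append(i)
    return dict(pos)


def substring_positions(text, pattern):
    """Start indices of (possibly overlapping) occurrences of pattern in a
    continuous symbol string without spaces (hypothetical helper for the
    symbolic-sequence case)."""
    out = []
    i = text.find(pattern)
    while i != -1:
        out.append(i)
        i = text.find(pattern, i + 1)
    return out


def normalized_sigma(positions, total_length):
    """Spacing-based clustering score (assumed normalization).

    sigma = std(spacings) / mean(spacings). Random placement with
    p = n / total_length gives geometric spacings, for which
    sigma = sqrt(1 - p); dividing by that puts a random word near 1.
    Clustered words score above 1, evenly spread words below 1.
    """
    n = len(positions)
    if n < 3 or n >= total_length:
        return None  # too few spacings, or the symbol fills the whole sequence
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    sigma = math.sqrt(var) / mean
    p = n / total_length
    return sigma / math.sqrt(1 - p)


def rank_keywords(tokens, min_count=3):
    """Rank words occurring at least min_count times, most clustered first."""
    n_total = len(tokens)
    scores = {}
    for word, pos in positions_by_word(tokens).items():
        if len(pos) >= min_count:
            s = normalized_sigma(pos, n_total)
            if s is not None:
                scores[word] = s
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Toy text: "key" occurs in two tight bursts, "the" every 10 tokens,
    # and every other token is a unique filler word.
    tokens = [f"w{i}" for i in range(100)]
    for i in (0, 1, 2, 97, 98, 99):
        tokens[i] = "key"
    for i in range(5, 100, 10):
        tokens[i] = "the"
    for word, score in rank_keywords(tokens):
        print(word, round(score, 2))
```

In the toy text the bursty word "key" scores well above 1, while the evenly spaced "the" scores 0, even though "the" is more frequent. This is the kind of separation that frequency alone cannot provide. For continuous symbolic sequences, `substring_positions` can supply the occurrence list for a candidate substring, with the sequence length as `total_length`.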

Year: 2009; PMID: 19392005; DOI: 10.1103/PhysRevE.79.035102

Source DB: PubMed; Journal: Phys Rev E Stat Nonlin Soft Matter Phys; ISSN: 1539-3755

Cited by (11 in total):

1. Arrangement of 3D structural motifs in ribosomal RNA.

2. Segmentation of time series with long-range fractal correlations.

3. Keywords and Co-Occurrence Patterns in the Voynich Manuscript: An Information-Theoretic Analysis.

4. Zipf's law leads to Heaps' law: analyzing their relation in finite-size systems.

5. WordCluster: detecting clusters of DNA words and genomic elements.

6. The Fractal Patterns of Words in a Text: A Method for Automatic Keyword Extraction.

7. A Complex Network Approach to Stylometry.

8. An improved alignment-free model for DNA sequence similarity metric.

9. Model of the Dynamic Construction Process of Texts and Scaling Laws of Words Organization in Language Systems.

10. Extracting DNA words based on the sequence features: non-uniform distribution and integrity.