Igor A Bessmertny, Xiaoxi Huang, Aleksei V Platonov, Chuqiao Yu, Julia A Koroleva.
Abstract
Search engines find documents that contain patterns from a query. This approach works for alphabetic languages such as English, but Chinese is highly dependent on context. A significant problem in Chinese text processing is the absence of spaces between words, so the text must be segmented into words before any other processing. Algorithms for Chinese text segmentation should consider context; that is, the segmentation of a word depends on the surrounding ideograms. Because existing segmentation algorithms are imperfect, we consider an approach that builds the context from all possible n-grams surrounding the query words. This paper proposes a quantum-inspired approach to ranking Chinese text documents by their relevance to the query. In particular, the approach uses Bell's test, which measures the quantum entanglement of two words within the context. The contexts of words are built using the hyperspace analogue to language (HAL) algorithm. Experiments conducted in three domains demonstrated that the proposed approach provides acceptable results.
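To make the HAL step concrete, the following is a minimal sketch of a HAL-style co-occurrence table, not the authors' implementation. It assumes the commonly described HAL weighting (a sliding window in which a neighbour at distance `d` contributes `window - d + 1`) and, purely for illustration, treats each Chinese character as a token so that no word segmentation is needed. The function name `hal_matrix`, the window size, and the sample string are hypothetical.

```python
from collections import defaultdict


def hal_matrix(tokens, window=5):
    """Build a HAL-style co-occurrence table (illustrative sketch).

    hal[target][context] accumulates a weight for every context token
    that appears up to `window` positions BEFORE the target token.
    Closer neighbours weigh more: weight = window - distance + 1.
    """
    hal = defaultdict(lambda: defaultdict(int))
    for i, target in enumerate(tokens):
        for distance in range(1, window + 1):
            j = i - distance
            if j < 0:
                break  # ran off the start of the text
            hal[target][tokens[j]] += window - distance + 1
    return hal


# Assumption: single characters serve as tokens for this toy example.
sample = list("火山岩石火山")
hal = hal_matrix(sample, window=2)
for target, contexts in hal.items():
    print(target, dict(contexts))
```

Each row of the table can then serve as the context vector of a token; how the paper combines these vectors in Bell's test is not specified in this excerpt.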
Keywords: content analysis and indexing; text analysis; text mining
Year: 2020 PMID: 33286049 PMCID: PMC7516728 DOI: 10.3390/e22030275
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. Graphic interpretation of Bell’s test.
Term occurrence in the documents of the domain “Geology.”
| No. | Document | 火山 (volcano) | 岩石 (rocks) |
|---|---|---|---|
| 1 | Introduction | 7 | 51 |
| 2 | Minerals | 2 | 26 |
| 3 | Igneous rock | 238 | 122 |
| 4 | Sedimentary rock | 38 | 193 |
| 5 | Metamorphic rock | 4 | 148 |
| 6 | Mineral deposit | 44 | 48 |
| 7 | Structural movement and structural changes | 43 | 44 |
| 8 | Earthquake | 18 | 31 |
Figure 2. Bell’s test for the domain “Geology.”
Figure 3. TF-IDF analysis for the domain “Geology.”
Term occurrence in the documents of the domain “History of science.”
| No. | Document | 实验 (experiment) | 科学 (science) |
|---|---|---|---|
| 1 | Introduction | 6 | 51 |
| 2 | Orient. and Europe Middle ages | 19 | 73 |
| 3 | 16th and 17th Cent. Scientific revolution | 19 | 46 |
| 4 | Galileo and mechanics | 30 | 27 |
| 5 | Descartes’s math and philosophy | 35 | 110 |
| 6 | Medicine | 19 | 56 |
| 7 | 17th Cent. science | 58 | 189 |
| 8 | 18th Cent. mechanical sciences | 6 | 44 |
Figure 4. Bell’s test for the domain “History of Science.”
Figure 5. TF-IDF analysis for the domain “History of Science.”
Figure 6. Bell’s test for the domain “Psychology.”
Figure 7. TF-IDF analysis for the domain “Psychology.”
Computed discounted cumulative gain for each domain.
| Domain Name | TF-IDF DCG | Bell’s Test DCG |
|---|---|---|
| Geology | 1.08 | 1.70 |
| History of science | 1.24 | 2.13 |
| Psychology | 1.67 | 2.38 |
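For readers unfamiliar with the metric in the table above, here is a minimal sketch of discounted cumulative gain in its common form, DCG = Σ rel_i / log2(i + 1) over ranks i starting at 1. This excerpt does not state which DCG variant or which relevance grades the authors used, so the function `dcg` and the relevance list below are assumptions for illustration only and do not reproduce the reported values.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain for a ranked list of relevance grades.

    `relevances[0]` is the grade of the top-ranked document. Each grade
    is discounted by log2(rank + 1), with ranks starting at 1.
    """
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


# Hypothetical grades: relevant, not relevant, relevant.
print(dcg([1, 0, 1]))
```

A higher DCG means that relevant documents appear nearer the top of the ranking, which is the sense in which the Bell's test column exceeds the TF-IDF column in all three domains.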