Jingqiao Wu, Xiaoyue Feng, Renchu Guan, Yanchun Liang.
Abstract
Machine learning models can automatically discover biomedical research trends and promote the dissemination of information and knowledge. Text feature representation is a critical and challenging task in natural language processing. Most text feature representation methods are based on word representation, and a good representation captures both semantic and structural information. In this paper, two fusion algorithms are proposed, namely the Tr-W2v and Ti-W2v algorithms. They build on the classical text feature representation model and additionally take the importance of words into account. The results show that both fusion text representation models are more effective than the classical text representation model, and that the Tr-W2v algorithm gives the best results. Furthermore, based on the Tr-W2v algorithm, trend analyses of cancer research are conducted, including correlation analysis, keyword trend analysis, and improved keyword trend analysis. The discovery of research trends and the evolution of hotspots for cancers can help doctors and biological researchers collect information and provide guidance for further research.
Keywords: feature fusion; feature representation; text mining; trend analysis
Year: 2021 PMID: 33809188 PMCID: PMC8001649 DOI: 10.3390/e23030338
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. Framework of our work.
Figure 2. Flowchart of similarity trend analysis.
Figure 3. Flowchart of keyword trend analysis.
Figure 4. Flowchart of improved keyword trend analysis.
Number of research papers for the five cancers.
| Cancer | 2014 | 2015 | 2016 | 2017 | 2018 |
|---|---|---|---|---|---|
| Lung | 9322 | 9966 | 9446 | 9508 | 10,149 |
| Breast | 12,328 | 12,825 | 12,600 | 12,286 | 12,743 |
| Gastric | 3747 | 3572 | 3637 | 3414 | 3561 |
| Colorectal | 8950 | 9174 | 8778 | 8617 | 8868 |
| Liver | 6651 | 6871 | 6517 | 6431 | 6555 |
Figure 5. Two-dimensional clustering visualization results based on five word-representation algorithms.
Figure 6. Correlation of the top five high-risk cancers. (a) Correlation between lung cancer and the other four cancers, (b) correlation between breast cancer and the other four cancers, (c) correlation between gastric cancer and the other four cancers, (d) correlation between colorectal cancer and the other four cancers, and (e) correlation between liver cancer and the other four cancers.
Figure 7. Hotspots of lung cancer obtained by the keyword trend analysis model (a) and the improved keyword trend analysis model (b).
Figure 8. Research trends of genes, proteins, and invertase factors related to lung cancer research in the last five years.
Figure 9. Research trends of therapeutic drugs and methods related to lung cancer research in the last five years.
Figure 10. Research trends of other related hotspots in the last five years.