I Iliopoulos, A J Enright, C A Ouzounis.
Abstract
We present an algorithm for large-scale document clustering of biological text, obtained from Medline abstracts. The algorithm is based on statistical treatment of terms, stemming, the idea of a 'go-list', unsupervised machine learning and graph layout optimization. The method is flexible and robust, controlled by a small number of parameter values. Experiments show that the resulting document clusters are meaningful as assessed by cluster-specific terms. Despite the statistical nature of the approach, with minimal semantic analysis, the terms provide a shallow description of the document corpus and support concept discovery.
Year: 2001 PMID: 11262957 DOI: 10.1142/9789814447362_0038
Source DB: PubMed Journal: Pac Symp Biocomput ISSN: 2335-6928
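
The abstract names the ingredients of the pipeline (stemming, a go-list of admitted terms, statistical term weighting, unsupervised grouping) but gives no implementation details. The following is a minimal illustrative sketch under stated assumptions, not the authors' method: the suffix-stripping `stem`, the TF-IDF weighting, the cosine `threshold`, and the connected-components grouping are all stand-ins chosen for brevity, every function name is hypothetical, and the graph layout optimization step is not covered.

```python
import math
from collections import Counter


def stem(word):
    # Crude suffix stripping; a stand-in for a real stemmer (assumption).
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text, go_list):
    # Keep only stemmed terms that are on the go-list (the inverse of a stop-list).
    stems = (stem(w) for w in text.lower().split())
    return [s for s in stems if s in go_list]


def tfidf_vectors(docs, go_list):
    # Weight each retained term by term frequency times a smoothed inverse
    # document frequency (assumed weighting, not taken from the paper).
    counts = [Counter(tokenize(d, go_list)) for d in docs]
    df = Counter(term for c in counts for term in c)
    n = len(docs)
    return [{t: tf * (math.log(n / df[t]) + 1.0) for t, tf in c.items()}
            for c in counts]


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(docs, go_list, threshold=0.5):
    # Link documents whose similarity reaches the threshold, then return the
    # connected components of that graph as clusters of document indices.
    vecs = tfidf_vectors(docs, go_list)
    parent = list(range(len(docs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vecs[i], vecs[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy corpus and go-list, invented purely for illustration.
docs = [
    "kinase binding proteins",
    "protein kinases",
    "genome sequencing",
    "sequenced genomes",
]
go_list = {"kinase", "protein", "genome", "sequenc"}
clusters = cluster(docs, go_list, threshold=0.5)
```

On this toy corpus the two kinase documents share both go-list stems, as do the two genome documents, so each pair ends up in its own cluster. The pairwise comparison is quadratic in the number of documents, so a large-scale setting such as the one the abstract describes would need a more scalable similarity search.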