Choonghyun Han, Sooyoung Yoo, Jinwook Choi.
Abstract
OBJECTIVES: Measurement of similarities between documents is typically influenced by the sparseness of the term-document matrix employed. Latent semantic indexing (LSI) may improve the results of this type of analysis.
Keywords: Cluster Analysis; Documentation; Information Storage and Retrieval
Year: 2011 PMID: 21818454 PMCID: PMC3092990 DOI: 10.4258/hir.2011.17.1.24
Source DB: PubMed Journal: Healthc Inform Res ISSN: 2093-3681
Figure 1. Diagram showing a conceptual description of singular value decomposition.
Number of terms and proportions of zero-filled cells
ED: editorials, CF: clinical documents with full columns, CS: clinical documents with selected columns.
Cosine values calculated with term frequency (TF) and inverse document frequency (IDF) weighting increased after latent semantic indexing (LSI)
SD: standard deviation, ED: editorials, CF: clinical documents with full columns, CS: clinical documents with selected columns.
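The comparison summarized in this table (cosine similarity on TF-IDF weights, before and after LSI) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy count matrix, the function names (`tfidf`, `lsi_doc_vectors`, `cosine_matrix`), the logarithmic IDF formula, and the choice of k = 2 latent dimensions are all assumptions made for illustration.

```python
import numpy as np

def tfidf(counts):
    """Weight a term-by-document count matrix by TF * IDF (assumed log IDF)."""
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[1]
    df = np.count_nonzero(counts, axis=1)        # documents containing each term
    idf = np.log(n_docs / np.maximum(df, 1))     # guard against unused terms
    return counts * idf[:, None]

def lsi_doc_vectors(matrix, k):
    """Represent each document in a k-dimensional latent space via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (s[:k, None] * vt[:k]).T              # one row per document

def cosine_matrix(rows):
    """Pairwise cosine similarity between the rows of a matrix."""
    norms = np.linalg.norm(rows, axis=1, keepdims=True)
    unit = rows / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T

# Hypothetical toy data: rows are terms, columns are documents.
counts = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 1, 0],
    [0, 0, 0, 3],
    [0, 0, 1, 2],
])

sparsity = np.mean(counts == 0)                  # proportion of zero-filled cells
weighted = tfidf(counts)
before = cosine_matrix(weighted.T)               # cosine on raw TF-IDF vectors
after = cosine_matrix(lsi_doc_vectors(weighted, k=2))  # cosine after LSI
```

Keeping all singular values reproduces the original cosine values exactly, so any change in similarity comes from discarding the smaller singular values.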
Number of co-occurring terms. Terms from editorials (ED), clinical documents with full columns (CF), and clinical documents with selected columns (CS)
TF: term frequency, SD: standard deviation, Min: minimum, Max: maximum.
Figure 2. Distributions of the number of unique shared terms in editorials and clinical (Cli.) documents.
Pearson's correlation between number of co-occurring terms and document similarity
ED: editorials, CF: clinical documents with full columns, CS: clinical documents with selected columns.
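As a rough sketch of how a correlation of this kind could be computed, the snippet below counts the unique terms shared by each document pair and correlates those counts with the pairwise cosine similarity. The toy matrix and the helper names (`shared_term_counts`, `pairwise_cosine`, `pearson_r`) are hypothetical, and raw term counts are used here for simplicity rather than the weighting used in the study.

```python
from itertools import combinations

import numpy as np

def shared_term_counts(counts):
    """Count unique terms that co-occur in each pair of documents (columns)."""
    present = np.asarray(counts) > 0
    pairs = list(combinations(range(present.shape[1]), 2))
    shared = np.array([int(np.sum(present[:, i] & present[:, j])) for i, j in pairs])
    return pairs, shared

def pairwise_cosine(counts, pairs):
    """Cosine similarity of document column vectors for the given pairs."""
    m = np.asarray(counts, dtype=float)
    norms = np.linalg.norm(m, axis=0)
    return np.array([m[:, i] @ m[:, j] / (norms[i] * norms[j]) for i, j in pairs])

def pearson_r(x, y):
    """Pearson's product-moment correlation coefficient."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))

# Hypothetical toy data: rows are terms, columns are documents.
counts = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 1, 0],
    [0, 0, 0, 3],
    [0, 0, 1, 2],
])

pairs, shared = shared_term_counts(counts)
similarity = pairwise_cosine(counts, pairs)
r = pearson_r(shared, similarity)
```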