Thomas Goetz, Claus-Wilhelm von der Lieth.
Abstract
Manually extracting useful information from the ever-growing volume of literature is becoming increasingly laborious, so automated text analysis tools are becoming essential to assist in this task. PubFinder (www.glycosciences.de/tools/PubFinder) is a publicly available web tool designed to improve the retrieval rate of scientific abstracts relevant to a specific scientific topic. The user only needs to select a representative set of abstracts that are central to the topic. No special knowledge of the query syntax is necessary. From the selected abstracts, a list of discriminating words is calculated automatically and is then used to score all defined PubMed abstracts by their probability of belonging to the topic. The result is a hit-list of references in descending order of their likelihood score. The algorithms and procedures implemented in PubFinder ease the perpetual task, faced by every scientist, of staying up to date with current publications on a specific subject in biomedicine.
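The abstract describes the procedure only in outline: derive discriminating words from a training set, then score and rank candidate abstracts. The following minimal Python sketch illustrates that general idea under stated assumptions. It is not PubFinder's actual algorithm. The log-ratio weight, the `floor` smoothing value, the averaging of weights per abstract, and all function names are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def relative_frequencies(texts):
    """Relative frequency of every word across a list of texts."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def word_weights(training_texts, background_texts, floor=1e-6):
    """Weight each training-set word by the log ratio of its training
    frequency to its background frequency (an assumed measure)."""
    train = relative_frequencies(training_texts)
    background = relative_frequencies(background_texts)
    return {
        word: math.log(freq / max(background.get(word, 0.0), floor))
        for word, freq in train.items()
    }


def rank_abstracts(candidates, weights):
    """Return (identifier, score) pairs in descending order of score.
    The score is the mean weight of the words in the abstract."""
    scored = []
    for identifier, text in candidates.items():
        tokens = tokenize(text)
        score = sum(weights.get(tok, 0.0) for tok in tokens) / max(len(tokens), 1)
        scored.append((identifier, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data, invented for illustration only.
training = ["text mining of medline abstracts", "mining literature databases"]
background = ["protein structure of the enzyme", "the cell growth of tumour"]
candidates = {"A": "mining medline abstracts", "B": "the protein of the cell"}

ranked = rank_abstracts(candidates, word_weights(training, background))
```

In this toy run, candidate `A` shares several words with the training set and therefore ranks ahead of candidate `B`. A real system would need a large background dictionary and a statistically grounded score.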
Year: 2005 PMID: 15980583 PMCID: PMC1160190 DOI: 10.1093/nar/gki429
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
The 20 words that most discriminate abstracts of the example training set (‘literature mining’, see Figure 1) from other non-related abstracts
| Discriminating word | Word frequency in training set | Frequency in dictionary | ln |
|---|---|---|---|
| Abstracts | 7.8e−03 | 7.7e−06 | −331 |
| Medline | 6.6e−03 | 2.9e−05 | −214 |
| Articles | 6.5e−03 | 4.1e−05 | −192 |
| Text | 5.0e−03 | 1.8e−05 | −167 |
| Information | 1.0e−02 | 4.0e−04 | −163 |
| Databases | 4.6e−03 | 3.1e−05 | −132 |
| Database | 5.1e−03 | 7.3e−05 | −121 |
| Mining | 2.9e−03 | 5.9e−06 | −109 |
| Precision | 3.9e−03 | 2.9e−05 | −109 |
| Recall | 3.3e−03 | 2.2e−05 | −97 |
| Abstract | 2.6e−03 | 6.1e−06 | −96 |
| Literature | 5.5e−03 | 1.9e−04 | −96 |
| Names | 2.4e−03 | 5.7e−06 | −85 |
| Extraction | 3.7e−03 | 7.0e−05 | −81 |
| Biomedical | 2.5e−03 | 1.1e−05 | −79 |
| Data | 8.4e−03 | 1.3e−03 | −65 |
| Fields | 2.8e−03 | 4.6e−05 | −62 |
| Automatically | 1.9e−03 | 8.6e−06 | −62 |
| Title | 1.7e−03 | 3.5e−06 | −62 |
| Mesh | 2.1e−03 | 1.3e−05 | −61 |
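As a rough way to read the two frequency columns, the sketch below computes the natural logarithm of the ratio between training-set frequency and dictionary frequency for a few rows of the table. This log-enrichment measure is an assumption introduced here for illustration. It does not reproduce the `ln` column, whose exact definition is not given in this record, and the function name `log_enrichment` is hypothetical.

```python
import math

# (frequency in training set, frequency in dictionary), copied from the table
table_rows = {
    "Abstracts": (7.8e-03, 7.7e-06),
    "Medline": (6.6e-03, 2.9e-05),
    "Mining": (2.9e-03, 5.9e-06),
    "Data": (8.4e-03, 1.3e-03),
}


def log_enrichment(train_freq, dict_freq):
    """Natural log of how much more frequent a word is in the training set
    than in the dictionary (assumed measure, not the table's ln column)."""
    return math.log(train_freq / dict_freq)


enrichment = {word: log_enrichment(*freqs) for word, freqs in table_rows.items()}

# Print the words from most to least enriched.
for word, value in sorted(enrichment.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:10s} {value:5.2f}")
```

On these rows, "Abstracts" is roughly a thousand times more frequent in the training set than in the dictionary (log ratio about 6.9), whereas "Data" is only about 6.5 times more frequent (log ratio about 1.9). This is consistent with "Data" appearing lower in the table, despite its high raw frequency in the training set.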
Figure 1. Setting up a new scan with PubFinder's ‘New Scan’ dialogue. After providing a short description and selecting the starting year of the query, the user enters a set of PubMed IDs that represent the scientific topic to be scanned for. In this example we chose a set of 28 PubMed abstracts that deal with literature mining.