Matthias Frisch, Bernward Klocke, Manuela Haltmeier, Kornelie Frech.
Abstract
LitInspector is a literature search tool providing gene and signal transduction pathway mining within NCBI's PubMed database. The automatic gene recognition and color coding increase the readability of abstracts and significantly speed up literature research. A main challenge in gene recognition is the resolution of homonyms and the rejection of identical abbreviations used in a 'non-gene' context. LitInspector uses automatically generated and manually refined filtering lists for this purpose. The quality of the LitInspector results was assessed with a published dataset of 181 PubMed sentences. LitInspector achieved a precision of 96.8%, a recall of 86.6% and an F-measure of 91.4%. To further demonstrate the homonym resolution qualities, LitInspector was compared to three other literature search tools using some challenging examples. The homonym MIZ-1 (gene IDs 7709 and 9063) was correctly resolved in 87% of the abstracts by LitInspector, whereas the other tools achieved recognition rates between 35% and 67%. The LitInspector signal transduction pathway mining is based on a manually curated database of pathway names (e.g. wingless type), pathway components (e.g. WNT1, FZD1), and general pathway keywords (e.g. signaling cascade). The performance was checked for 10 randomly selected genes. Eighty-two per cent of the 38 predicted pathway associations were correct. LitInspector is freely available at http://www.litinspector.org/.
Year: 2009 PMID: 19417065 PMCID: PMC2703902 DOI: 10.1093/nar/gkp303
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Screenshot of the LitInspector user interface (A) and a results page (B). (A) The user interface allows input of up to three gene synonyms (e.g. WT1, PAX2) or gene identifiers (e.g. 7490) and free text (e.g. binding site). The default organism is H. sapiens; other organisms can be chosen from a pop-up menu. The search can be filtered for the occurrence of tissue, disease and pathway keywords in the abstract, or for tissue and disease annotations provided with the MeSH terms. (B) Each result page starts with the query info, which displays the user input, the number of citations found and, in case of a gene search, a link to the signal transduction pathway associations. The results are color-coded sentences in which genes (green), transcription factors (purple), and keywords (tissue, cyan; disease, pink; pathway, yellow) are highlighted. Genes and transcription factors are hyperlinked to NCBI's Entrez Gene for further information. The output is ordered by PubMed identifier (latest publications first), and each identifier links directly to NCBI's PubMed. The user can select individual abstracts and retrieve them in batch.
The gene synonym ‘MBP’—a homonym
| Gene synonym/abbreviation | Gene name (Gene ID)/meaning | Number of citations |
|---|---|---|
| MBP | Myelin basic protein (4155) | 3000 |
| MBP | Major basic protein (5553) | 300 |
| MBP | Mannose binding protein (4153) | 100 |
| MBP | Mean blood pressure | 500 |
| MBP | Monobutyl phthalate | 40 |
| MBP | Megabase pairs | 10 |
MBP is used for three different genes and, in addition, in a ‘non-gene’ context as abbreviation with several different meanings.
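The MBP example can be made concrete with a minimal sketch of context-based homonym resolution. This is not LitInspector's actual method or its filtering lists: the cue words in `MBP_SENSES` and the function `resolve_mbp` are hypothetical and chosen only for illustration. The sketch assumes that a sentence can be assigned to one meaning when it shares cue words with exactly one of them, and is left unresolved otherwise.

```python
import re

# Hypothetical context cues for each meaning of "MBP". These are illustrative
# assumptions and are NOT LitInspector's actual filtering lists.
MBP_SENSES = {
    "Myelin basic protein (4155)": {"myelin", "oligodendrocyte", "demyelination", "sclerosis"},
    "Major basic protein (5553)": {"eosinophil", "granule", "asthma"},
    "Mannose binding protein (4153)": {"mannose", "lectin", "complement"},
    "Mean blood pressure (non-gene)": {"blood", "pressure", "hypertension", "mmhg"},
    "Monobutyl phthalate (non-gene)": {"phthalate", "plasticizer", "metabolite"},
    "Megabase pairs (non-gene)": {"genome", "chromosome", "megabase"},
}


def resolve_mbp(sentence):
    """Return the single best-matching meaning of MBP, or None if unresolved."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    if "mbp" not in words:
        return None  # the abbreviation does not occur in this sentence
    # Score each meaning by the number of its cue words found in the sentence.
    scores = {sense: len(words & cues) for sense, cues in MBP_SENSES.items()}
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return None  # no cue at all, or a tie between meanings
    return winners[0]
```

Returning `None` for ties and cue-free sentences keeps ambiguous cases out of the resolved set, which mirrors the idea of reporting unresolved homonyms separately rather than guessing.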
Figure 2. Screenshot of the LitInspector signal transduction pathway mining output for the gene ACP1 (acid phosphatase 1). (A) List of the signaling pathways found, sorted by the number of references. The user has full access to the color-coded PubMed sentences from which the pathway associations have been derived. (B) In the case of ACP1, most references were found for TCR (T-cell receptor) signaling (eight references).
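A ranking of pathways by number of references, as in the output described here, could be sketched as follows. This is only an illustration under stated assumptions, not LitInspector's implementation: the dictionary `PATHWAY_TERMS`, the list `GENERAL_KEYWORDS`, and the function `pathway_associations` are hypothetical, and the sketch assumes that an association is counted when a gene symbol, a pathway name or component, and a general pathway keyword occur in the same sentence. The example sentences and reference numbers are invented toy data, not PubMed records.

```python
import re
from collections import defaultdict

# Hypothetical mini-dictionary; the real curated database is not reproduced here.
PATHWAY_TERMS = {
    "TCR signaling": ["t-cell receptor", "tcr"],
    "WNT signaling": ["wingless type", "wnt"],
}
GENERAL_KEYWORDS = ["signaling", "signalling", "pathway", "cascade"]


def _mentions(term, text):
    """True if `term` occurs in `text` as a whole word or phrase (case-insensitive)."""
    return re.search(r"\b" + re.escape(term) + r"\b", text, flags=re.IGNORECASE) is not None


def pathway_associations(gene, sentences):
    """Count distinct references per pathway for `gene`.

    `sentences` is an iterable of (reference_id, sentence_text) pairs.
    Returns (pathway, reference_count) tuples, most-referenced first.
    """
    refs = defaultdict(set)
    for ref_id, text in sentences:
        if not _mentions(gene, text):
            continue
        if not any(_mentions(keyword, text) for keyword in GENERAL_KEYWORDS):
            continue
        for pathway, terms in PATHWAY_TERMS.items():
            if any(_mentions(term, text) for term in terms):
                refs[pathway].add(ref_id)
    return sorted(((p, len(ids)) for p, ids in refs.items()),
                  key=lambda item: (-item[1], item[0]))


# Invented toy sentences with placeholder reference numbers.
toy_sentences = [
    (1, "ACP1 regulates TCR signaling in T cells."),
    (2, "ACP1 modulates the T-cell receptor signaling cascade."),
    (3, "ACP1 affects the WNT pathway."),
    (4, "ACP1 is an acid phosphatase."),
]
ranking = pathway_associations("ACP1", toy_sentences)
```

Counting distinct reference identifiers rather than sentences means that several matching sentences from one paper contribute a single reference to the ranking.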
Evaluation of LitInspector using a dataset of 181 PubMed sentences compared to two other programs
| Program name | Precision (%) | Recall (%) | F-measure (%) |
|---|---|---|---|
| LitInspector | 96.8 | 86.6 | 91.4 |
| PolySearch | 90.1 | 85.3 | 87.6 |
| iHOP | 87.1 | 81.8 | 84.4 |
Source: Hoffmann and Valencia (4); Cheng et al. (6). The numbers for iHOP (4) and PolySearch (6) are taken from the corresponding publications.
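The relationship between the three values in each row can be checked with a short calculation. The source does not state which F-measure formula was used; the sketch below assumes the balanced F-measure (the harmonic mean of precision and recall), with which the reported values are consistent. The function name `f_measure` is chosen here for illustration.

```python
def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall in percent, as reported in the table above.
reported = {
    "LitInspector": (96.8, 86.6),
    "PolySearch": (90.1, 85.3),
    "iHOP": (87.1, 81.8),
}

# Round to one decimal place, matching the precision of the reported values.
f_scores = {name: round(f_measure(p, r), 1) for name, (p, r) in reported.items()}

for name, score in f_scores.items():
    print(f"{name}: F-measure = {score}%")
```

For LitInspector this gives 2 × 96.8 × 86.6 / (96.8 + 86.6) ≈ 91.4, in agreement with the value stated in the abstract.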
Evaluation of the LitInspector homonym resolution compared to three other data mining tools by means of the homonym MIZ-1
| Text mining software | Gene ID 7709: number of matches | Gene ID 7709: correctly resolved | Gene ID 7709: false positives | Gene ID 9063: number of matches | Gene ID 9063: correctly resolved | Gene ID 9063: false positives | Unresolved homonyms | Correctly resolved homonyms (percent) |
|---|---|---|---|---|---|---|---|---|
| LitInspector | 53 | 44 | 2 | 36 | 27 | 2 | 7 | 87% |
| EBIMed | 11 | 10 | 1 | 17 | 9 | 8 | 0 | 67% |
| PubGene^a | 45 | (9) | (1) | 34 | (2) | (8) | (0) | (55%) |
| iHOP | 36 | 14 | 0 | 31 | 12 | 6 | 35 | 39% |
Gene ID 7709: MIZ-1, Myc-interacting zinc finger-1. Gene ID 9063: MIZ-1, Msx-interacting-zinc finger-1.
The programs were used with default parameters. All numbers refer to abstracts, i.e. sentences with the same identified gene from one paper were counted only once for this gene. The evaluation was performed in July 2008.
^a PubGene provides only 10 example papers for each search; therefore, an evaluation of the complete results is not possible. This evaluation was performed using the 20 (2 × 10) abstracts available.
Signal transduction pathways of ACP1 (gene ID 52) found by LitInspector pathway mining
For each pathway, a reference sentence from PubMed is shown (pathway keywords are highlighted in yellow, genes in green).