Martin Krallinger, Alfonso Valencia.
Abstract
Text-mining in molecular biology -- defined as the automatic extraction of information about genes, proteins and their functional relationships from text documents -- has emerged as a hybrid discipline on the edges of the fields of information science, bioinformatics and computational linguistics. A range of text-mining applications have been developed recently that will improve access to knowledge for biologists and database annotators.
Year: 2005 PMID: 15998455 PMCID: PMC1175978 DOI: 10.1186/gb-2005-6-7-224
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Table 1. Biomedical text-mining resources, servers and programs
| Name | Description | URL | Published reference or URL* |
| --- | --- | --- | --- |
| Abbreviation Server | Biomedical abbreviation server | | [35] |
| AbGene | Protein name tagger | | [29] |
| ABNER | Protein/Gene/DNA/RNA/cell tagger | | [31] |
| AliasServer | Protein alias handler | | [37] |
| ARGH | Biomedical acronym resolver | | [88] |
| ARROWSMITH | Extended MEDLINE search tool | | [84] |
| BioMail | PubMed updating and alerting service | | [12] |
| BioRAT | Biology information extraction tool | | [81] |
| BITOLA | Literature-based biomedical discovery system | | [86] |
| Chilibot | Relationship extraction | | [57] |
| CrossRef Search | Full content search engine | | [8] |
| GAPSCORE | Protein name tagger | | [23] |
| Geisha | Text-mining tool to assist microarray analysis | | [67] |
| GeneScene | Information extraction for regulatory pathways | | [59] |
| GOAnnotator | Annotation extraction from literature | | [51] |
| Google Scholar | Scholar literature search engine | | [6] |
| iHOP | Information on hyperlinked proteins | | [40] |
| iProLINK | Protein annotation and tagging | | [55] |
| KAT | Annotate proteins from scientific references | | [52] |
| KeX | Protein name tagger | | [33] |
| KinasePathway database | Tool for extraction of protein, gene and compound interactions from text | | [46] |
| MedBlast | Document retrieval for sequences | | [63] |
| MedMiner | Extraction of sentences relevant to genes | | [69] |
| microGENIE | Text-mining for microarrays | | [76] |
| My NCBI | PubMed updating and alerting service | | [11] |
| NDPG | Scores the literature-based coherence of gene clusters | None | [66] |
| NLProt | Protein name tagger | | [25] |
| NPG search engine | Nature Publishing Group search engine | | [9] |
| PreBIND | Classifier of protein interaction documents | | [44] |
| PubCrawler | PubMed updating and alerting service | | [13] |
| PubGene | Text-mining tool for microarrays | | [72] |
| PubMatrix | Multiplex literature mining tool | | [74] |
| PubMed Entrez | Biomedical citation retrieval system | | [3] |
| Relationship Extractor | Biomedical relationship extractor | | [90] |
| SAWTED | Text-enhanced remote homolog detector | | [61] |
| Scopus | Scientific literature database and search | | [93] |
| Textpresso | | | [48] |
| XplorMed | Explores bibliographic MEDLINE searches | | [91] |
| Yapex | Protein name tagger | | [27] |
An overview of some of the currently available text-mining, information-extraction, information-retrieval and selective dissemination of information services. *References to articles describing each tool are given; where no article has been published, the reference is to the URL.
Figure 1. An overview of biological natural language processing (BioNLP) and text-mining applications for biology. The major topics are represented by the inner circle of seven approaches, and the corresponding applications are given in the outer layers of boxes. Most of the tools are available online or for download. Some applications could be classified into multiple topics; they are shown here associated with one of their most significant topics. For instance, most of the text-mining applications (that is, the applications that are not simply for article retrieval) have integrated modules for named entity recognition (NER), and selective dissemination of information (SDI) services often use automated Boolean queries for article retrieval. References and URLs for each application, where available, are given in Table 1.
Figure 2. Basic steps in the use of the iHOP text-mining tool [40], illustrated with screenshots [42]. For a given query (for example, the protein symbols (a) Wnt-1 or (b) LEF-1), all the sentences mentioning the name are retrieved from PubMed. These sentences also contain mentions of other proteins, which are highlighted and which might show associations with the query protein (see the magnified area in (b)). Functional terms (such as 'target' and 'complexes') and interaction verbs (such as 'activated' and 'stabilizes') are in bold. (c) By clicking on the 'Gene model' link in the left panel in (a,b), interaction networks of proteins that co-occur in sentences with the query proteins can be displayed.
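The sentence-level co-occurrence idea described in the Figure 2 caption can be illustrated with a minimal Python sketch. This is not iHOP's implementation: the function names, the fixed name lexicon, the naive sentence splitter and the placeholder symbols (GeneA, GeneB, GeneC) are all assumptions made for illustration, and the example text is invented rather than taken from PubMed.

```python
import re
from collections import Counter
from itertools import combinations


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_mentions(sentence, lexicon):
    # Dictionary lookup (assumption): case-insensitive match of each known name,
    # rejecting matches embedded in a longer token such as 'GeneA2' or 'GeneA-1'.
    found = set()
    for name in lexicon:
        pattern = r"(?<![\w-])" + re.escape(name) + r"(?![\w-])"
        if re.search(pattern, sentence, re.IGNORECASE):
            found.add(name)
    return found


def cooccurrence_edges(text, lexicon):
    # Count, for every pair of names, the number of sentences mentioning both.
    edges = Counter()
    for sentence in split_sentences(text):
        mentions = sorted(find_mentions(sentence, lexicon))
        for a, b in combinations(mentions, 2):
            edges[(a, b)] += 1
    return edges


def neighbours(edges, query):
    # Names that share at least one sentence with the query, with sentence counts.
    result = Counter()
    for (a, b), n in edges.items():
        if a == query:
            result[b] += n
        elif b == query:
            result[a] += n
    return result


# Hypothetical lexicon and invented text, for illustration only.
LEXICON = ["GeneA", "GeneB", "GeneC"]
TEXT = (
    "GeneA activates GeneB. "
    "GeneB forms complexes with GeneC and GeneA. "
    "GeneC is discussed alone."
)

edges = cooccurrence_edges(TEXT, LEXICON)
print(neighbours(edges, "GeneA").most_common())
```

Co-occurrence in a sentence only suggests a possible association; as the caption notes, the highlighted proteins "might show associations" with the query, so each edge in such a network still needs to be checked against the supporting sentences.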