Philippe Thomas, Johannes Starlinger, Alexander Vowinkel, Sebastian Arzt, Ulf Leser.
Abstract
Research results are primarily published in scientific literature and curation efforts cannot keep up with the rapid growth of published literature. The plethora of knowledge remains hidden in large text repositories like MEDLINE. Consequently, life scientists have to spend a great amount of time searching for specific information. The enormous ambiguity among most names of biomedical objects such as genes, chemicals and diseases often produces too large and unspecific search results. We present GeneView, a semantic search engine for biomedical knowledge. GeneView is built upon a comprehensively annotated version of PubMed abstracts and openly available PubMed Central full texts. This semi-structured representation of biomedical texts enables a number of features extending classical search engines. For instance, users may search for entities using unique database identifiers or they may rank documents by the number of specific mentions they contain. Annotation is performed by a multitude of state-of-the-art text-mining tools for recognizing mentions from 10 entity classes and for identifying protein-protein interactions. GeneView currently contains annotations for >194 million entities from 10 classes for ∼21 million citations with 271,000 full text bodies. GeneView can be searched at http://bc3.informatik.hu-berlin.de/.
Year: 2012 PMID: 22693219 PMCID: PMC3394277 DOI: 10.1093/nar/gks563
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Query result for articles mentioning genes UBE2I (GeneID 7329) and BRCA1 (GeneID 672).
Figure 2. Visualization of a selected article (PMID 21344391). Additional information such as full gene name and links to external databases can be provided for a selected entity.
Overview of all entities in the GeneView repository
| Entity type | Articles | Entities | Unique | Normalized |
|---|---|---|---|---|
| Chemical | 9 851 347 | 73 354 240 | 59 232 | ChemIDPlus |
| Species | 8 815 334 | 40 992 161 | 110 880 | NCBI Taxonomy |
| Drug | 6 023 081 | 44 595 216 | 3 052 | DrugBank/PharmGKB |
| Gene | 2 855 898 | 32 861 120 | 81 229 | Entrez Gene |
| Enzyme | 561 152 | 825 889 | 2 519 | KEGG |
| Disease | 272 240 | 679 364 | 9 681 | MeSH |
| SNP | 171 597 | 914 543 | 18 942 | dbSNP |
| Cell-type | 36 851 | 82 285 | 585 | MeSH |
| Tissue | 8 164 | 9 488 | 132 | MeSH |
| Histone Mod. | 5 938 | 62 370 | 316 | Brno nomenclature |
Articles: number of citations with at least one entity found; entities: total number of recognized mentions; unique: number of distinct entities; normalized: database whose identifiers the mentions are normalized to.
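To make the three count columns concrete, here is a minimal sketch of how they could be derived from a flat list of annotations. The `(pmid, entity_type, entity_id)` tuple layout and the function name `summarize_mentions` are assumptions chosen for illustration; they do not describe GeneView's actual storage schema or code.

```python
from collections import defaultdict


def summarize_mentions(mentions):
    """Compute per-type counts from (pmid, entity_type, entity_id) tuples.

    The tuple layout is a hypothetical one; each tuple stands for one
    recognized, normalized mention in one citation.
    """
    articles = defaultdict(set)   # entity type -> citations with >= 1 mention
    entities = defaultdict(int)   # entity type -> total number of mentions
    unique = defaultdict(set)     # entity type -> distinct normalized identifiers
    for pmid, entity_type, entity_id in mentions:
        articles[entity_type].add(pmid)
        entities[entity_type] += 1
        unique[entity_type].add(entity_id)
    return {
        entity_type: {
            "articles": len(articles[entity_type]),
            "entities": entities[entity_type],
            "unique": len(unique[entity_type]),
        }
        for entity_type in entities
    }


# Toy data: two citations, gene identifiers taken from the Figure 1 caption.
example = [
    (1, "Gene", "672"),
    (1, "Gene", "672"),
    (1, "Gene", "7329"),
    (2, "Gene", "672"),
    (2, "SNP", "rs1"),
]
summary = summarize_mentions(example)
```

In this toy input the gene type appears in two citations with four mentions of two distinct identifiers, which mirrors the difference between the Articles, Entities and Unique columns.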
Number of co-occurring concepts contained in GeneView
| Entity 1 | Entity 2 | Co-occurrence |
|---|---|---|
| Gene | Chemical | 48 278 038 |
| Gene | Drug | 20 099 049 |
| Gene | SNP | 1 203 334 |
| Gene | Histone modification | 162 108 |
| SNP | Chemical | 3 270 485 |
| SNP | Drug | 1 214 063 |
| SNP | Histone modification | 5 267 |
Multiple mentions of the same entity are only counted once.
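One possible reading of this counting rule is sketched below: within each citation, every distinct pair of identifiers from the two entity types is counted once, regardless of how often either entity is mentioned. That per-citation reading, the `(pmid, entity_type, entity_id)` tuple layout and the function name `count_cooccurrences` are all assumptions made for illustration, not a description of how the published numbers were computed.

```python
from collections import defaultdict


def count_cooccurrences(mentions, type_a, type_b):
    """Count distinct (entity of type_a, entity of type_b) pairs per citation.

    Assumes type_a != type_b and that each mention is a hypothetical
    (pmid, entity_type, entity_id) tuple.
    """
    ids_a = defaultdict(set)  # pmid -> distinct identifiers of type_a
    ids_b = defaultdict(set)  # pmid -> distinct identifiers of type_b
    for pmid, entity_type, entity_id in mentions:
        if entity_type == type_a:
            ids_a[pmid].add(entity_id)
        elif entity_type == type_b:
            ids_b[pmid].add(entity_id)
    # Sets collapse repeated mentions, so each pair counts once per citation.
    shared = ids_a.keys() & ids_b.keys()
    return sum(len(ids_a[pmid]) * len(ids_b[pmid]) for pmid in shared)


# Toy data: only citation 1 contains both a gene and a SNP.
example = [
    (1, "Gene", "672"),
    (1, "Gene", "672"),
    (1, "Gene", "7329"),
    (1, "SNP", "rs1"),
    (1, "SNP", "rs1"),
    (2, "Gene", "672"),
    (3, "SNP", "rs2"),
]
gene_snp = count_cooccurrences(example, "Gene", "SNP")
```

Here citation 1 contributes two pairs (two distinct genes, one distinct SNP) even though it holds five mentions in total, while citations 2 and 3 contribute nothing because each contains only one of the two types.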
Figure 3. Number of citations per year in which the genes BRCA1 or BRCA2 occur, over the last 20 years. Similarly, we show the progression of articles in which SNPs are co-mentioned with BRCA1 or BRCA2.