Peng-Hsuan Li, Ting-Fu Chen, Jheng-Ying Yu, Shang-Hung Shih, Chan-Hung Su, Yin-Hung Lin, Huai-Kuang Tsai, Hsueh-Fen Juan, Chien-Yu Chen, Jia-Hsin Huang.
Abstract
With the proliferation of genomic sequence data for biomedical research, the exploration of human genetic information by domain experts requires a comprehensive interrogation of large numbers of scientific publications in PubMed. However, a query in PubMed essentially provides search results sorted only by the date of publication. A search engine for retrieving and interpreting complex relations between biomedical concepts in scientific publications remains lacking. Here, we present pubmedKB, a web server designed to extract and visualize semantic relationships between four biomedical entity types: variants, genes, diseases, and chemicals. pubmedKB uses state-of-the-art natural language processing techniques to extract semantic relations from a large number of PubMed abstracts. Currently, pubmedKB contains over 2 million semantic relations between biomedical entity pairs, extracted from over 33 million PubMed abstracts. pubmedKB has a user-friendly interface with an interactive semantic graph, enabling the user to easily query entities and explore entity relations. Supporting sentences with highlighted snippets allow users to navigate the publications easily. Combined with a new explorative approach to literature mining and an interactive interface for researchers, pubmedKB thus enables rapid, intelligent searching of the large biomedical literature to provide useful knowledge and insights. pubmedKB is available at https://www.pubmedkb.cc/.
Year: 2022 PMID: 35536289 PMCID: PMC9252824 DOI: 10.1093/nar/gkac310
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 19.160
Figure 1. An overview of the pubmedKB processing workflow for text mining (A–E). The PubMed abstracts are retrieved from the NCBI FTP server. The journal abstracts are parsed individually to extract relationships between biomedical entities, which are stored in the Neo4j database.
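As a rough illustration of what storing extracted relationships between entities can look like, the sketch below keeps relations in a plain in-memory Python structure keyed by unordered entity pair. It is not the pubmedKB or Neo4j schema, which the caption does not describe. The `Relation` fields, the `add_relation` and `evidence_for` helpers, the relation label, the PMID strings and the sentences are all hypothetical placeholders; only the entity names `rs4961` and `ADD1` appear in the source.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record for one extracted relation; the field names are
# assumptions made for this sketch, not the pubmedKB schema.
@dataclass(frozen=True)
class Relation:
    head: str      # first entity, e.g. a variant
    tail: str      # second entity, e.g. a gene
    label: str     # placeholder relation label
    pmid: str      # placeholder identifier of the supporting abstract
    sentence: str  # placeholder supporting sentence


# Relations are grouped by unordered entity pair, so (a, b) and (b, a)
# end up in the same bucket.
store: dict[frozenset, list[Relation]] = defaultdict(list)


def add_relation(rel: Relation) -> None:
    """File a relation under its unordered entity pair."""
    store[frozenset((rel.head, rel.tail))].append(rel)


def evidence_for(a: str, b: str) -> list[Relation]:
    """Return all stored relations between two entities, in insertion order."""
    return list(store.get(frozenset((a, b)), []))


# Placeholder data: two relations for the same pair, given in opposite order.
add_relation(Relation("rs4961", "ADD1", "placeholder-label", "PMID-0001", "placeholder sentence 1"))
add_relation(Relation("ADD1", "rs4961", "placeholder-label", "PMID-0002", "placeholder sentence 2"))

print(len(store))                                       # number of distinct pairs
print([r.pmid for r in evidence_for("ADD1", "rs4961")])  # evidence for the pair
```

Keying on a `frozenset` is one simple way to treat pairs as unordered; a graph database would typically express the same idea with nodes and undirected or bidirectionally queried edges.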
Distinct entities and distinct entity pairs (unordered) in the pubmedKB relational database
| Entity type | Entities (n) | Pairs with variants (n) | Pairs with diseases (n) | Pairs with genes (n) | Pairs with chemicals (n) |
|---|---|---|---|---|---|
| Variants | 125 745 | 3 558 | 315 452 | 6 673 | 1 926 |
| Diseases | 71 787 | 315 452 | 429 723 | 40 070 | 148 156 |
| Genes | 45 315 | 6 673 | 40 070 | 33 470 | 37 074 |
| Chemicals | 180 075 | 1 926 | 148 156 | 37 074 | 360 931 |
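Because the pairs are unordered, each count between two different entity types appears twice in the table (for example, variant–disease pairs are listed in both the Variants and the Diseases rows). The short sketch below copies the table values, checks that symmetry, and sums each type combination once. The totals assume that every entity belongs to exactly one of the four types, so that the type combinations do not overlap; the variable and function names are illustrative only.

```python
TYPES = ["Variants", "Diseases", "Genes", "Chemicals"]

# Distinct entities per type, copied from the table.
ENTITIES = {
    "Variants": 125_745,
    "Diseases": 71_787,
    "Genes": 45_315,
    "Chemicals": 180_075,
}

# Distinct entity pairs per type combination, copied row by row from the
# table; columns follow the order in TYPES.
PAIRS = {
    "Variants":  [3_558, 315_452, 6_673, 1_926],
    "Diseases":  [315_452, 429_723, 40_070, 148_156],
    "Genes":     [6_673, 40_070, 33_470, 37_074],
    "Chemicals": [1_926, 148_156, 37_074, 360_931],
}


def is_symmetric() -> bool:
    """Check that the count for (a, b) equals the count for (b, a)."""
    return all(
        PAIRS[a][j] == PAIRS[b][i]
        for i, a in enumerate(TYPES)
        for j, b in enumerate(TYPES)
    )


def total_unordered_pairs() -> int:
    """Sum the diagonal and upper triangle so each combination counts once."""
    return sum(
        PAIRS[a][j]
        for i, a in enumerate(TYPES)
        for j in range(i, len(TYPES))
    )


print(is_symmetric())             # True
print(sum(ENTITIES.values()))     # 422922 distinct entities
print(total_unordered_pairs())    # 1377033 distinct unordered pairs
```

Under that assumption the table covers 422 922 distinct entities and 1 377 033 distinct entity pairs.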
Figure 2. pubmedKB user interface. (A) Search bar for entering queries; (B) a semantic graph; (C) node selection; (D) relationship filters; (E) number of supporting publications containing relational evidence; (F) a download function to retrieve all results as a .csv file; (G) filter options with publication information; (H) article information, including the PMID link to take the user directly to PubMed; (I) supporting-evidence sentences with the relevant entities highlighted in colour.
Figure 3. Interactive snapshots of pubmedKB filters. (A) If ‘rs4961’ is queried, many related entities are shown. (B) Node selection is used to select one or more node entities. (C) If the ‘ADD1’ entity is clicked, the relevant relationship details and corresponding publications are shown. (D) The relation filters at the top of the graph panel provide edge filters for selecting node entities. (E) If ‘hypertension’ is selected by clicking the node, the relationship details and corresponding publications are shown.
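The two kinds of filter described in this caption, selecting nodes and filtering by relation, can be thought of as simple predicates over the edges around the queried entity. The sketch below is a toy model of that idea, not the pubmedKB implementation. Only the entity names come from the caption; the edge list, the relation labels `label-1` and `label-2`, and the functions `edges_for` and `filter_edges` are hypothetical.

```python
# Hypothetical toy edges: (entity, entity, relation label). The labels and
# the presence of each edge are placeholders, not data from pubmedKB.
EDGES = [
    ("rs4961", "ADD1", "label-1"),
    ("rs4961", "hypertension", "label-2"),
    ("ADD1", "hypertension", "label-1"),
]


def edges_for(query):
    """Return every edge that touches the queried entity."""
    return [edge for edge in EDGES if query in edge[:2]]


def filter_edges(edges, query, nodes=None, labels=None):
    """Keep edges whose other endpoint and label pass the optional filters.

    nodes:  set of neighbour names to keep (node selection), or None for all.
    labels: set of relation labels to keep (relation filter), or None for all.
    """
    kept = []
    for a, b, label in edges:
        other = b if a == query else a
        if nodes is not None and other not in nodes:
            continue
        if labels is not None and label not in labels:
            continue
        kept.append((a, b, label))
    return kept


around_query = edges_for("rs4961")
print(around_query)
print(filter_edges(around_query, "rs4961", nodes={"ADD1"}))
print(filter_edges(around_query, "rs4961", labels={"label-2"}))
```

Passing both `nodes` and `labels` applies the two filters together, which mirrors combining node selection with the relation filters in the graph panel.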