Yuko Makita, Norio Kobayashi, Yuko Yoshida, Koji Doi, Yoshiki Mochizuki, Koro Nishikata, Akihiro Matsushima, Satoshi Takahashi, Manabu Ishii, Terue Takatsuki, Rinki Bhatia, Zolzaya Khadbaatar, Hajime Watabe, Hiroshi Masuya, Tetsuro Toyoda.
Abstract
Positional MEDLINE (PosMed; http://biolod.org/PosMed) is a powerful Semantic Web Association Study engine that ranks biomedical resources such as genes, metabolites, diseases and drugs, based on the statistical significance of associations between user-specified phenotypic keywords and resources connected directly or inferentially through a Semantic Web of biological databases such as MEDLINE, OMIM, pathways, co-expressions, molecular interactions and ontology terms. Since 2005, PosMed has been used for in silico positional cloning studies to infer candidate disease-responsible genes within chromosomal intervals. PosMed has been redesigned as a workbench to discover possible functional interpretations for the numerous genetic variants found by exome sequencing of human disease samples. We also show that the association search engine enhances the value of mouse bioresources: most knockout mouse resources have no phenotypic annotation, but they can be associated inferentially with phenotypes via genes and biomedical documents. For this purpose, we established text-mining rules for the biomedical documents through careful human curation and created a large number of correct links between genes and documents. As of May 2013, PosMed associates any phenotypic keyword with mouse resources using 20 public databases and four original data sets.
Year: 2013 PMID: 23761449 PMCID: PMC3692089 DOI: 10.1093/nar/gkt474
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Concept of SWAS, which calculates the statistical significance of associations between any sets of resources connected through a web of semantic links (solid arrows), whereas GWAS associates only alleles with phenotypes (dashed arrows).
Figure 2. Example inferential search result followed by direct search results for retrieving a mouse bioresource associated with the keyword ‘diabetes’. PosMed shows the path connecting from a user’s keyword to the resource, a resource description and linked biological documents (B). To download all candidate mouse strains, click ‘check all’ at the top of ‘Hit resources’ and download them as a text file (C).
Figure 3. A partial example snapshot of ‘expert mode’. The upper path (1.) shows a direct search with MEDLINE, mammalian phenotype ontology, mouse bioresources and OMIM documents. The lower path (2.) shows an example inferential path via a gene. Users can select the scoring method for each document from ‘strong’, ‘weak’ or ‘none’ in the menu. The ‘strong’ scoring method uses a Boolean function: the P-value becomes 0 when the document has at least one keyword. The ‘weak’ method computes the P-value using Fisher’s exact test. If a user selects ‘none’, the biological document is not used (6,7). In this mode, users can confirm all PosMed search paths for biological documents.
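The three scoring options described in the Figure 3 caption can be illustrated with a minimal Python sketch. It is not the PosMed implementation. The function names, the layout of the contingency table (keyword hits among the documents linked to a resource versus the whole document set), the choice of a one-sided test and the value returned by ‘strong’ when no keyword is found are all assumptions made for illustration.

```python
from math import comb


def fisher_upper_tail(k, n, K, N):
    """One-sided Fisher's exact test: P(X >= k) under the hypergeometric model.

    Assumed (hypothetical) table layout:
      N = documents in the whole set
      K = documents in the set that contain the keyword
      n = documents linked to the resource being scored
      k = linked documents that contain the keyword
    """
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def score_document_set(method, k, n, K, N):
    """Return a P-value for one document set, or None if the set is not used."""
    if method == "none":
        return None  # the document set is ignored
    if method == "strong":
        # Boolean rule from the caption: P = 0 when at least one keyword hit exists.
        # Returning 1.0 otherwise is an assumption, not stated in the source.
        return 0.0 if k >= 1 else 1.0
    if method == "weak":
        return fisher_upper_tail(k, n, K, N)
    raise ValueError(f"unknown scoring method: {method!r}")


# Toy numbers only: 10 documents, 3 contain the keyword, 4 are linked, 2 overlap.
p_weak = score_document_set("weak", k=2, n=4, K=3, N=10)
p_strong = score_document_set("strong", k=2, n=4, K=3, N=10)
```

With these toy counts the ‘weak’ score is the hypergeometric upper tail (70/210), while ‘strong’ collapses to 0 as soon as a single keyword hit exists, which is why ‘strong’ behaves as a hard filter rather than a graded score.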
Figure 4. File upload function and display of users’ descriptions. Users can upload an Excel file containing gene IDs and their own descriptions. PosMed ranks the genes listed in the file by the statistical relevance between the user’s keyword and each gene, and displays the ranked genes together with the uploaded descriptions.
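The upload-and-rank behaviour described in the Figure 4 caption can be sketched as follows. This is only an illustration under stated assumptions: the spreadsheet is taken to be exported as two-column CSV (gene ID, description), the P-values are assumed to have been computed elsewhere, and the function name, gene IDs and numbers are hypothetical placeholders rather than PosMed output.

```python
import csv
import io


def rank_uploaded_genes(upload_text, p_values):
    """Join uploaded (gene ID, description) rows with P-values and sort ascending.

    upload_text: CSV text with one "gene_id,description" row per gene (assumed format).
    p_values:    dict mapping gene ID -> P-value for the user's keyword (assumed input).
    Genes without a P-value are dropped from the ranking.
    """
    ranked = []
    for row in csv.reader(io.StringIO(upload_text)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        gene_id, description = row[0].strip(), row[1].strip()
        if gene_id in p_values:
            ranked.append((p_values[gene_id], gene_id, description))
    ranked.sort()  # smallest P-value (strongest association) first
    return [(gene_id, p, description) for p, gene_id, description in ranked]


# Placeholder data, not real genes or real scores.
upload = "GENE_A,note a\nGENE_B,note b\nGENE_C,note c\n"
scores = {"GENE_A": 0.2, "GENE_B": 0.001}
ranking = rank_uploaded_genes(upload, scores)
```

Keeping the user's description attached to each gene through the sort mirrors what the caption describes: the ranked list is shown together with the uploaded annotations.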
Updated biological documents for PosMed 2013
| Document set | No. of documents | Data contents | References |
|---|---|---|---|
| Mouse bioresource | 19 280 | Mouse strain information registered at IMSR. | ( |
| | 5115 | Mouse strain information from the RIKEN BioResource Center. | ( |
| Human gene | 37 287 | Gene annotation of HGNC | ( |
| Mouse gene | 85 726 | Gene annotation of MGI | ( |
| Rat gene | 36 634 | Gene annotation of RGD | ( |
| Arabidopsis gene | 32 041 | Gene annotation of TAIR | ( |
| Rice gene | 29 389 | Gene annotation of RAP-DB | ( |
| Disease | 20 054 | Online Mendelian Inheritance in Man | ( |
| | 2037 | Our original, manually collected data | ( |
| | 12 131 | ICD-10, International Statistical Classification of Diseases and Related Health Problems | ( |
| Metabolite | 49 983 | A comprehensive species-metabolite relationship database (KNApSAcK) | ( |
| MEDLINE | 9 378 134 | MEDLINE titles, abstracts and MeSH terms | ( |
| Pathway information | 3809 | Pathway information from REACTOME | ( |
| Protein–protein interaction | 73 645 | Protein–protein interactions in human and mouse from IntAct and | ( |
| Gene ontology | 12 787 | Gene ontology data | ( |
| Human disease ontology | 2282 | Human disease ontology data | ( |
| Mammalian phenotype ontology | 7440 | Mammalian phenotype ontology data | ( |
All data sources and links to the original DBs are described at http://omicspace.riken.jp/Data/.