Yuko Yoshida, Yuko Makita, Naohiko Heida, Satomi Asano, Akihiro Matsushima, Manabu Ishii, Yoshiki Mochizuki, Hiroshi Masuya, Shigeharu Wakana, Norio Kobayashi, Tetsuro Toyoda.
Abstract
PosMed (http://omicspace.riken.jp/) prioritizes candidate genes for positional cloning by employing our original database search engine, GRASE. GRASE uses an inferential process similar to that of an artificial neural network made up of documental neurons (or 'documentrons'), each representing one document contained in databases such as MEDLINE and OMIM. Given a user-specified query, PosMed first performs a full-text search of each documentron in the first layer of artificial neurons. It then calculates the statistical significance of the connections between the hit documentrons and the second-layer artificial neurons, each of which represents a gene. When one or more chromosomal intervals are specified, PosMed explores the second-layer and third-layer artificial neurons representing genes within those intervals by evaluating the combined significance of the connections from the hit documentrons to the genes. PosMed is, therefore, a powerful tool that immediately ranks candidate genes by linking phenotypic keywords to genes. These links represent not only gene-gene interactions but also other biological interactions (e.g. metabolite-gene, mutant mouse-gene, drug-gene, disease-gene and protein-protein interactions) and ortholog data. By utilizing orthologous connections, PosMed facilitates the ranking of human genes based on evidence found in other model species such as mouse. Currently, PosMed, an artificial superbrain that has learned a vast amount of biological knowledge ranging from genomes to phenomes (or 'omic space'), supports the prioritization of positional candidate genes in human, mouse, rat and Arabidopsis thaliana.
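The abstract describes ranking genes by the statistical significance of the links between keyword-hit documents and gene neurons, restricted to a chromosomal interval. The sketch below illustrates that idea under a stated assumption: it uses a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test) as the significance measure, although the abstract does not name the test GRASE uses. The function names `hypergeom_sf` and `rank_genes` and all gene names, document IDs and positions are hypothetical.

```python
from math import comb
from typing import Dict, List, Set, Tuple

def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N documents, K linked to the gene, n hits)."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def rank_genes(
    hit_docs: Set[int],
    gene_docs: Dict[str, Set[int]],
    gene_pos: Dict[str, Tuple[str, int]],
    total_docs: int,
    interval: Tuple[str, int, int],
) -> List[Tuple[str, float]]:
    """Rank genes inside `interval` by how unlikely their overlap with the hits is."""
    chrom, start, end = interval
    n = len(hit_docs)
    scored = []
    for gene, docs in gene_docs.items():
        c, pos = gene_pos[gene]
        if c != chrom or not (start <= pos <= end):
            continue  # gene lies outside the user's interval
        k = len(hit_docs & docs)  # hit documents connected to this gene
        if k == 0:
            continue  # no keyword evidence at all
        p = hypergeom_sf(k, total_docs, len(docs), n)
        scored.append((p, gene))
    scored.sort()  # smallest p-value (strongest connection) first
    return [(gene, p) for p, gene in scored]

# Hypothetical toy corpus: 1000 documents, 10 of which match the keyword.
TOTAL_DOCS = 1000
INTERVAL = ("1", 90_000_000, 140_000_000)
hit_docs = set(range(10))
gene_docs = {
    "geneA": {0, 1, 2, 3, 50},        # 4 of its 5 documents are hits
    "geneB": {0, *range(100, 120)},   # 1 of its 21 documents is a hit
    "geneC": {0, 1, 2},               # strong evidence, but on another chromosome
    "geneD": {500, 501},              # no hit documents
}
gene_pos = {
    "geneA": ("1", 95_000_000),
    "geneB": ("1", 130_000_000),
    "geneC": ("2", 10_000_000),
    "geneD": ("1", 100_000_000),
}

for gene, p in rank_genes(hit_docs, gene_docs, gene_pos, TOTAL_DOCS, INTERVAL):
    print(f"{gene}\t{p:.3g}")
```

A gene with many hit documents relative to its total document count (geneA) outranks one whose single hit could easily arise by chance (geneB). Genes outside the interval (geneC) are never reported, however strong their evidence.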
Year: 2009 PMID: 19468046 PMCID: PMC2703941 DOI: 10.1093/nar/gkp384
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Neural network model for the PosMed gene search algorithm. As an example, the user's keyword ‘diabetes’ can be found in several documents, including MEDLINE records (Input). These documents are mapped to genes, with the mappings supported by manual curation (Concept). Using biological knowledge (e.g. protein–protein interaction, co-expression and co-citation of document sets), PosMed can also suggest genes whose associated documents do not contain the user's keyword ‘diabetes’ (Association). PosMed then returns the candidate genes that are located within the user's specified genomic interval (Output). Finally, the user's keyword is highlighted in the documents (Display).
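The Concept, Association and Output stages of Figure 1 can be pictured as a one-hop propagation over an interaction graph followed by an interval filter. This is a hypothetical simplification, not PosMed's scoring. Here an associated gene simply inherits the p-value of its strongest directly supported partner and is ranked after directly supported genes. The caption does not say how associated genes are actually scored. `Candidate`, `infer_candidates` and all gene names, p-values and positions are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

@dataclass
class Candidate:
    gene: str
    stage: str          # "Concept" (direct document support) or "Association"
    via: Optional[str]  # directly supported partner for "Association", else None
    p_value: float

def infer_candidates(
    concept_p: Dict[str, float],
    interactions: Dict[str, Set[str]],
    gene_pos: Dict[str, Tuple[str, int]],
    interval: Tuple[str, int, int],
) -> List[Candidate]:
    chrom, start, end = interval
    best: Dict[str, Candidate] = {}

    # Concept: genes whose own documents matched the keyword.
    for gene, p in concept_p.items():
        best[gene] = Candidate(gene, "Concept", None, p)

    # Association: partners without direct evidence inherit the p-value of
    # their strongest directly supported partner (hypothetical rule).
    for gene, p in concept_p.items():
        for partner in interactions.get(gene, set()):
            if partner in concept_p:
                continue  # already has direct evidence
            current = best.get(partner)
            if current is None or p < current.p_value:
                best[partner] = Candidate(partner, "Association", gene, p)

    # Output: keep only genes located inside the requested interval.
    def inside(gene: str) -> bool:
        if gene not in gene_pos:
            return False
        c, pos = gene_pos[gene]
        return c == chrom and start <= pos <= end

    hits = [cand for cand in best.values() if inside(cand.gene)]
    # Directly supported genes first, then by p-value, then by name for ties.
    hits.sort(key=lambda cand: (cand.stage != "Concept", cand.p_value, cand.gene))
    return hits

# Hypothetical data; list both directions for an undirected interaction.
concept_p = {"geneA": 1e-6, "geneB": 1e-3}
interactions = {"geneA": {"geneC"}, "geneB": {"geneC", "geneD"}}
gene_pos = {
    "geneA": ("16", 23_000_000),  # outside the interval
    "geneB": ("1", 100_000_000),
    "geneC": ("1", 120_000_000),
    "geneD": ("2", 5_000_000),    # outside the interval
}
INTERVAL = ("1", 90_000_000, 140_000_000)

for cand in infer_candidates(concept_p, interactions, gene_pos, INTERVAL):
    print(cand)
```

In this toy run, geneC has no keyword documents of its own. It is still reported (Association) because it interacts with geneA, even though geneA itself lies outside the interval. This mirrors how Figure 1 describes PosMed suggesting genes without the keyword in their documents.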
Figure 2. Output display on PosMed Search. Example search result for mouse genes with the keyword ‘diabetes OR insulin’ between 90 and 140 Mb on Chr 1 of the NCBIm 36 genome. Users enter their queries at the top of the output display (A). To select a genomic interval visually, PosMed works with the Flash-based genome browser OmicBrowse (12). The tab labeled ‘All Hits’ (B) shows a list of selectable document sets to be included in the search. By default, PosMed sets the parameter ‘Associate the keyword with entities co-cited within the same sentences’. If fewer than 20 candidate genes are found, PosMed automatically changes this parameter to ‘Associate the keyword with entities co-cited within the same document’ to show more candidates (B). The ranked PosMed search results are shown in (C). Users can download up to 300 candidate genes and their annotations from (D).
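The default-and-fallback behaviour in the Figure 2 caption amounts to a simple rule. Search at sentence-level co-citation first, widen to document-level co-citation if fewer than 20 candidates come back, and cap downloads at 300 genes. The thresholds below come from the caption. `search_with_fallback`, `downloadable` and the stand-in `toy_search` are hypothetical names used only to illustrate the rule.

```python
from typing import Callable, List, Tuple

MIN_CANDIDATES = 20  # fallback threshold given in the Figure 2 caption
MAX_DOWNLOAD = 300   # download limit given in the Figure 2 caption

def search_with_fallback(search: Callable[[str], List[str]]) -> Tuple[str, List[str]]:
    """Run a co-citation search, widening sentence -> document scope if too few hits."""
    scope = "sentence"      # default: co-cited within the same sentence
    hits = search(scope)
    if len(hits) < MIN_CANDIDATES:
        scope = "document"  # fallback: co-cited within the same document
        hits = search(scope)
    return scope, hits

def downloadable(hits: List[str]) -> List[str]:
    """Truncate a ranked candidate list to the downloadable maximum."""
    return hits[:MAX_DOWNLOAD]

def toy_search(scope: str) -> List[str]:
    # Hypothetical stand-in for the real search: few sentence-level hits,
    # many document-level hits.
    n = 5 if scope == "sentence" else 400
    return [f"gene{i}" for i in range(n)]

scope, hits = search_with_fallback(toy_search)
print(scope, len(hits), len(downloadable(hits)))  # prints: document 400 300
```

Widening the co-citation window trades precision for recall. Sentence-level co-citation is a tighter signal of a real keyword-gene relationship, so it makes sense as the default, with the looser document-level setting used only when the strict one returns too little.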
Figure 3. Detail page showing the supporting documents of an inference-type search. Adipor1-related genes are listed in (A). The supporting documents for Adipor1 and Adipoq are ranked in (B).
Document sets implemented in PosMed
| Document set | Display name on PosMed | No. of documents | Reference |
|---|---|---|---|
| MEDLINE | MEDLINE | 17 132 801 | ( |
| BRMM | mouse mutant | 12 911 | Original data (a) |
| OMIM | OMIM | 19 891 | ( |
| HsPPI (b) | HsPPI | 35 731 | ( |
| AtPID | AtPID | 44 082 | ( |
| ATTED-II | At co-expression | 24 418 | ( |
| REACTOME | REACTOME | 10 761 | ( |
| MouseGeneRecord | mouse gene record | 58 768 | ( |
| RatGeneRecord | rat gene record | 36 634 | ( |
| HumanGeneRecord | human gene record | 31 459 | ( |
| ArabidopsisGeneRecord | arabidopsis gene record | 32 041 | ( |
| MetaboliteRecord | metabolite record | 18 045 | ( |
| DrugRecord | drug record | 1015 | Original data (a) |
| DiseaseRecord | disease record | 1911 | Original data (a) |
| RIKENResearcherRecord | researcher record | 6852 | Original data (a) |
| Total | | 17 467 320 | |
(a) Our original data were compiled from several data sources. The main sources are listed at http://omicspace.riken.jp/acknwldgmnt.htm
(b) HsPPI data are derived from the Genome Network Platform (http://genomenetwork.nig.ac.jp/public/sys/gnppub/).
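As a quick consistency check on the Total row, the per-set document counts can be re-added. The short Python sketch below only sums the figures copied from the table and shows how heavily MEDLINE dominates the indexed corpus.

```python
# Document counts copied from the document-set table.
doc_counts = {
    "MEDLINE": 17_132_801,
    "BRMM": 12_911,
    "OMIM": 19_891,
    "HsPPI": 35_731,
    "AtPID": 44_082,
    "ATTED-II": 24_418,
    "REACTOME": 10_761,
    "MouseGeneRecord": 58_768,
    "RatGeneRecord": 36_634,
    "HumanGeneRecord": 31_459,
    "ArabidopsisGeneRecord": 32_041,
    "MetaboliteRecord": 18_045,
    "DrugRecord": 1_015,
    "DiseaseRecord": 1_911,
    "RIKENResearcherRecord": 6_852,
}

total = sum(doc_counts.values())
print(f"{total:,}".replace(",", " "))          # prints: 17 467 320
print(f"{doc_counts['MEDLINE'] / total:.1%}")  # prints: 98.1%
```

The sum matches the reported total. Roughly 98% of the documentrons are MEDLINE records, while the curated sets are comparatively small.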