Lan Huang, Ye Wang, Yan Wang, Tian Bai.
Abstract
The number of gene-related databases has grown substantially alongside bioinformatics research on genes. These databases hold a wide range of gene functions, pathways, interactions, and so forth, while much biomedical knowledge about human diseases remains stored as text across the literature. Researchers have developed many methods to extract structured biomedical knowledge. Some study and improve text mining algorithms for efficiency, so as to cover as many data sources as possible, while others build open-source databases that accept individual submissions, so as to achieve accuracy. This paper combines both efforts with biomedical ontologies to build an interaction network of multiple biomedical ontologies, which guarantees both its robustness and its wide coverage of biomedical publications. On this network, we implement an algorithm that discovers paths between concept pairs and reveals potential relations.
Year: 2016 PMID: 27478829 PMCID: PMC4961833 DOI: 10.1155/2016/3594517
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1: Extended gene-disease network.
Gene relation databases.
| Database name | Relation type | URL |
|---|---|---|
| dbSNP | ID conversion | |
| DrugBank | ID conversion | |
| Ensembl | ID conversion | |
| KEGG | Pathway | |
| GenBank | Gene-protein | |
| NCIPID | Pathway | |
| Reactome | Pathway | |
| STRING | Interaction | |
| UniGene | ID conversion | |
| GO | ID conversion | |
Disease name thesauri.
| Thesaurus name | URL |
|---|---|
| Disease ontology | |
| MeSH | |
| UMLS | |
| SNOMED-CT | |
| ICD10 | |
Genetic association database (part).
| Disease | Dis_class | Gene |
|---|---|---|
| Leukemia | CANCER | HLA-A |
| Alzheimer's disease | NEUROLOGICAL | HFE |
| Thalassemia | HEMATOLOGICAL | HBA1 |
| Emphysema | CARDIOVASCULAR | GSTT1 |
| PAH metabolites, urinary | METABOLIC | GSTT1 |
Percentage of matched disease names.
| Method | Direct match | Half ambiguous | With x-ref | Ambiguous |
|---|---|---|---|---|
| Percentage | 10% | 45% | 65% | 96% |
Examples of four methods.
| Method | Disease name from GAD | Disease name from thesaurus |
|---|---|---|
| Direct match | Leukemia | Leukemia |
| Half ambiguous | Diabetes, Type 2 | Type 2 Diabetes Mellitus |
| With x-ref | Sleep Disorders | Sleep Disorder |
| Ambiguous | Testicular Neoplasms | Testicular Disease |
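The four matching methods are not defined in this record, so the following is only a minimal sketch of how such tiers could be applied in order, from strictest to loosest. The tier rules (exact match, token-subset match, lookup in a cross-reference table, any shared token), the function names, and the `XREFS` table are all assumptions made for illustration, not the authors' implementation.

```python
import re

def tokens(name):
    """Lowercase a disease name and split it into a set of alphanumeric tokens."""
    return frozenset(re.findall(r"[a-z0-9]+", name.lower()))

def match_tier(gad_name, thesaurus_name, xrefs=None):
    """Return the first (strictest) tier linking the two names, or None."""
    gad_norm = gad_name.strip().lower()
    thes_norm = thesaurus_name.strip().lower()
    gad, thes = tokens(gad_name), tokens(thesaurus_name)
    if gad_norm == thes_norm:
        return "direct match"
    # Assumed rule: every word of one name appears in the other.
    if gad and thes and (gad <= thes or thes <= gad):
        return "half ambiguous"
    # Assumed rule: a cross-reference table maps one name to the other.
    if xrefs and xrefs.get(gad_norm) == thes_norm:
        return "with x-ref"
    # Assumed rule: the names share at least one word.
    if gad & thes:
        return "ambiguous"
    return None

# Hypothetical cross-reference table, used only for this illustration.
XREFS = {"sleep disorders": "sleep disorder"}

examples = [
    ("Leukemia", "Leukemia"),
    ("Diabetes, Type 2", "Type 2 Diabetes Mellitus"),
    ("Sleep Disorders", "Sleep Disorder"),
    ("Testicular Neoplasms", "Testicular Disease"),
]
for gad_name, thesaurus_name in examples:
    print(gad_name, "->", thesaurus_name, ":", match_tier(gad_name, thesaurus_name, XREFS))
```

Under these assumed rules, the four example pairs from the table fall into the four tiers in the order listed. Looser tiers link more names at the cost of precision, which is consistent with the rising percentages in the preceding table.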
Nodes and edges of the network.
| Edge | Number of edges | Number of nodes |
|---|---|---|
| Gene-gene | 207051075 | 18998 |
| Disease-disease | 6932 | 6588 |
| Gene-disease | 121309 | 25586 |
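A network with gene-gene, disease-disease, and gene-disease edges can be searched for paths between a concept pair under a chosen length limit. The sketch below is one plausible way to do this with a depth-first enumeration of simple paths; it is not the authors' algorithm. The node labels (`gene:G1`, `disease:D1`, and so on) are toy placeholders, and `build_graph`, `find_paths`, and `max_edges` are names introduced here for illustration.

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def find_paths(graph, start, goal, max_edges):
    """Yield every simple path from start to goal with at most max_edges edges."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            yield path
            continue
        if len(path) - 1 >= max_edges:
            continue  # path is already at the length limit
        for nxt in sorted(graph.get(node, ())):
            if nxt not in path:  # keep paths simple (no repeated nodes)
                stack.append((nxt, path + [nxt]))

# Toy edges only; these are placeholders, not relations from the paper.
toy_edges = [
    ("gene:G1", "gene:G2"),        # gene-gene
    ("gene:G2", "disease:D1"),     # gene-disease
    ("gene:G1", "disease:D1"),     # gene-disease
    ("disease:D1", "disease:D2"),  # disease-disease
]
graph = build_graph(toy_edges)
for path in find_paths(graph, "gene:G1", "disease:D2", max_edges=3):
    print(" -> ".join(path))
```

Enumerating all simple paths grows quickly with the length limit, so on a network of the size reported above a small `max_edges` would be needed in practice.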
Figure 2: Workflow of algorithm.
Figure 3: New paths for GAD pairs in the expanded gene-disease network [34–36].
Distribution of test results.
| Results | In GAD | Only in expanded network | All |
|---|---|---|---|
| Percentage of total | 3.2% | 82.2% | 85.4% |
| Count | 5120 | 130918 | 159259 |
Alternative paths comparison of pairs in GAD.
| Results | Without alternative paths | With alternative paths |
|---|---|---|
| Percentage | 71.5% | 28.4% |
| Count | 3663 | 1457 |
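One way to read "with alternative paths" is that a gene-disease pair already linked directly also has at least one other route through the network. The sketch below checks this with a breadth-first search that ignores the direct edge. This reading, the function name `has_alternative_path`, the `max_edges` limit, and the toy graph are assumptions for illustration only.

```python
from collections import deque

def has_alternative_path(graph, gene, disease, max_edges):
    """True if gene reaches disease within max_edges without the direct edge."""
    queue = deque([(gene, 0)])
    seen = {gene}
    while queue:
        node, dist = queue.popleft()
        if dist == max_edges:
            continue  # cannot extend this route any further
        for nxt in graph.get(node, ()):
            if node == gene and nxt == disease:
                continue  # skip the known direct gene-disease edge
            if nxt == disease:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return False

# Toy undirected graph with placeholder labels, not data from the paper.
toy_graph = {
    "G1": {"D1", "G2"},
    "G2": {"G1", "D1"},
    "D1": {"G1", "G2"},
    "G3": {"D2"},
    "D2": {"G3"},
}
known_pairs = [("G1", "D1"), ("G3", "D2")]
with_alt = sum(has_alternative_path(toy_graph, g, d, max_edges=3) for g, d in known_pairs)
print(f"{with_alt} of {len(known_pairs)} known pairs have an alternative path")
```

In the toy graph, `G1` and `D1` are also connected through `G2`, while `G3` and `D2` are linked only by their direct edge.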
Results of alpha thalassemia and beta thalassemia in GAD.
| Disease | Related genes |
|---|---|
| Alpha thalassemia | Hb, HP, UGT1A1, and ABO |
| Beta thalassemia | HBG2, PROCR, NOS1, NOS2, HBG1, F2, F5, COL1A1, HAMP, VDR, TNF, SERPINE1, AHSP, ITGB3, APOE, APOB, LARGE, HBBP1, HLA-C, GSTT1, GSTM1, FGB, F13A1, ESR1, ACE, and COL1A1 |
| Alpha and beta thalassemia | MYB, MTHFR, HBA@, HBS1L, HBB, HBA2, HBA1, G6PD, HBB, BCL11A, UGT1A1, and HFE |
Examples of gene-disease pairs found in GAD.
| Gene | Disease |
|---|---|
| THBS1 | Prostate cancer |
| TH | Mental disorder |
| TH | Mood disorder |
| TH | Borderline personality disorder |
| MB | Breast cancer |
| TNF | Leptospirosis |
| PC | Colorectal cancer |
Comparisons of Literome, DisGeNet, and expanded gene-disease network.
| | Literome | DisGeNet | Expanded gene-disease network |
|---|---|---|---|
| Sources | PubMed | GAD, CTD, and 12 more | GAD, PubMed, and 14 more databases |
| PubMed articles linkage | Yes | No | Partial |
| Gene-disease interactions | Yes | Yes | Yes |
| Interaction path | Partial | No | Yes |
| Path length | 3 | 1 | Can be assigned |
| Demonstration | PMID and marked texts | List of triplets | Disease names and gene names in path |
| Database update | Yes | Yes | Yes |
| Supported medical terms retrieval | Genic interactions, genotype-phenotype | Disease and gene | Gene and disease interactions |