Takashi Gojobori1,2, Kazuho Ikeo2, Yukie Katayama3, Takeshi Kawabata4, Akira R Kinjo4, Kengo Kinoshita5,6, Yeondae Kwon3, Ohsuke Migita7,8, Hisashi Mizutani2, Masafumi Muraoka2, Koji Nagata3, Satoshi Omori5, Hideaki Sugawara2, Daichi Yamada9, Kei Yura10,11.
Abstract
Life science research now relies heavily on a wide range of databases covering genome sequences, transcription, protein three-dimensional (3D) structures, protein-protein interactions, phenotypes and so forth. The knowledge accumulated by omics research is so vast that a computer-aided search of the data is now a prerequisite for starting a new study. In addition, a combined search across these databases can yield new ideas and new hypotheses that can be tested by wet-lab experiments. By virtually integrating the related databases on the Internet, we have built a new web application that helps life science researchers retrieve expert knowledge stored in the databases and build new hypotheses about their research targets. This web application, named VaProS, emphasizes the interconnection between the functional information of genome sequences and protein 3D structures, such as the structural effect of a gene mutation. In this manuscript, we present the concept of VaProS, the databases and tools that can be accessed without any knowledge of database locations and data formats, and the power of its search, exemplified in a quest for the molecular mechanisms of lysosomal storage disease. VaProS can be freely accessed at http://p4d-info.nig.ac.jp/vapros/ .
Keywords: Big data analysis; Database integration; Lysosomal storage disease; Protein 3D structure
Year: 2016 PMID: 28012137 PMCID: PMC5274651 DOI: 10.1007/s10969-016-9211-3
Source DB: PubMed Journal: J Struct Funct Genomics ISSN: 1345-711X
Components of VaProS

| DB/tool name | Data resource | Search tool | Data/function used in VaProS | Method of access | Original location | Reference |
|---|---|---|---|---|---|---|
| EntrezGene | ✓ | | Nomenclature, reference and other biological information of genes | Copy and link | | |
| UniProtKB | ✓ | | Amino acid sequences with biological annotation such as ontology and classification | Copy and link | | |
| BioGRID | ✓ | | Genetic and protein interactions curated from the biomedical literature | Copy and link | | |
| ChEMBL | ✓ | | Drug-like small molecules with their interacting proteins | Copy and link | | |
| DrugBank | ✓ | | Drug molecules combined with drug target information | Copy and link | | |
| IntAct | ✓ | | Molecular interactions obtained from the literature and direct submissions | Copy and link | | |
| PID (NDEx) | ✓ | | Biological interaction data of proteins | Copy and link | | |
| Reactome | ✓ | | Biological pathway data | Copy and link | | |
| OMIM | ✓ | | Mendelian disease-related phenotypes and their causative genes | Link | | |
| hGtoP | ✓ | ✓ | 3D structural and comparative genomics annotations of humans, mice and rats | Link | | |
| Natural Ligand Database | ✓ | ✓ | 3D models of proteins and their natural ligands registered in the KEGG reaction database | Link | | |
| COXPRESdb | ✓ | ✓ | Gene expression relationships based on RNA-seq and microarray data | Link | | |
| Mutation@A Glance | ✓ | ✓ | Genetic variants on proteins, including disease-causing mutations observed in humans | Link | | |
| 3D Interaction | ✓ | ✓ | Models of protein 3D structures and of structures in complex with other molecules | Link | | |
| Autophagy DB | ✓ | ✓ | List of genes and proteins involved in autophagy | Built-in | | |
| GNP expression | ✓ | ✓ | Genes clustered by expression pattern, showing co-regulation and anti-regulation | Built-in | – | |
| Molecular Interactions | | ✓ | Graphic tool for interaction networks of proteins, compounds and phenotypes | Built-in | – | – |
| TagCloud | | ✓ | Graphic tool to display the frequency of words in the titles of papers registered in UniProt | Built-in | – | – |
| Pathway DB | | ✓ | Finder of related pathways in the databases in use | Built-in | – | – |
| Phenotype | | ✓ | Finder of Mendelian diseases related to the query protein/gene | Built-in | – | – |
| Cis-finder | | ✓ | Finder of the | Built-in | – | – |
| S-VAR | | ✓ | Evaluator of the impact of missense mutations in a protein | Link | – | |
| Genome explorer | | ✓ | Annotator of genes with transcription start sites and other biological functions | Built-in | – | |
| NOREN | | ✓ | ID connector from UniProt AC to all the other IDs of the databases in use | Built-in | – | |

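The NOREN row describes an ID connector that links a UniProt entry to the identifiers of the other databases. The sketch below is a minimal, hypothetical illustration of that hub-and-spoke idea, not NOREN's actual implementation: every identifier is indexed to a UniProt entry name (the hub), so any ID can be translated into any other namespace with two dictionary lookups. The namespace labels `gene`, `uniprot` and `pdb` and the helper names `build_index` and `convert` are assumptions; the identifier values are copied from the lysosomal storage disease table in this record.

```python
from collections import defaultdict

# Hypothetical cross-reference records: (gene symbol, UniProt entry name, PDB ID).
# Values are copied from the lysosomal storage disease table; a real connector
# would load its mappings from the source databases instead.
RECORDS = [
    ("HEXA", "HEXA_HUMAN", "2GJX"),
    ("HEXB", "HEXB_HUMAN", "5BRO"),
    ("GM2A", "SAP3_HUMAN", "1PUB"),
    ("GLB1", "BGAL_HUMAN", "3WF2"),
]


def build_index(records):
    """Index every identifier to its UniProt entry (the hub) and back."""
    to_hub = {}                  # (namespace, identifier) -> UniProt entry name
    from_hub = defaultdict(dict)  # UniProt entry name -> {namespace: identifier}
    for gene, entry, pdb in records:
        for namespace, ident in (("gene", gene), ("uniprot", entry), ("pdb", pdb)):
            to_hub[(namespace, ident)] = entry
            from_hub[entry][namespace] = ident
    return to_hub, from_hub


def convert(ident, src, dst, to_hub, from_hub):
    """Translate ident from namespace src to namespace dst via the hub, or None."""
    entry = to_hub.get((src, ident))
    if entry is None:
        return None
    return from_hub[entry].get(dst)


to_hub, from_hub = build_index(RECORDS)
print(convert("HEXA", "gene", "pdb", to_hub, from_hub))  # 2GJX
print(convert("3WF2", "pdb", "gene", to_hub, from_hub))  # GLB1
```

Because each identifier points to a single hub, the sketch assumes one-to-one mappings; a PDB entry shared by several proteins (for example 4UG4, listed for both IDS and GNS in the lysosomal storage disease table) would need a set of hubs per identifier.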
Fig. 1 The top page of VaProS, located at http://p4d-info.nig.ac.jp/vapros
Fig. 2 The initial search result by VaProS. The query word is “HEXA”, the causative gene of Tay-Sachs disease
Fig. 3 The detailed search result obtained by pressing the “Details (Go)” button in Fig. 2. The protein–protein interactions and frequently used terms in the literature related to HEXA are displayed
Lysosomal storage diseases

| Disease | Type | Gene | UniProt ID | PDB | PDB identity* |
|---|---|---|---|---|---|
| Mucopolysaccharidosis | IH (Hurler syndrome) | – | – | – | – |
| | IH-S (Hurler-Scheie syndrome) | IDUA | IDUA_HUMAN | 3W81 | 100% |
| | IS (Hurler, Hurler/Scheie, Scheie syndrome) | – | – | – | – |
| | II (Hunter syndrome) | IDS | IDS_HUMAN | 4UG4 | 36% |
| | III-A (Sanfilippo syndrome) | SGSH | SPHM_HUMAN | 4MIV | 100% |
| | III-B | NAGLU | ANAG_HUMAN | 4XWH | 100% |
| | III-C | HGSNAT | HGNAT_HUMAN | – | – |
| | III-D | GNS | GNS_HUMAN | 4UG4 | 30% |
| | IV-A (Morquio syndrome) | GALNS | GALNS_HUMAN | 4FDI | 100% |
| | IV-B | GLB1 | BGAL_HUMAN | 3WF2 | 100% |
| | VI (Maroteaux-Lamy syndrome) | ARSB | ARSB_HUMAN | 1FSU | 100% |
| | VII (Sly syndrome) | GUSB | BGLR_HUMAN | 1BHG | 100% |
| | IX (Hyaluronidase deficiency) | HYAL1 | HYAL1_HUMAN | 2PE4 | 99% |
| Niemann-Pick disease | A | SMPD1 | ASM_HUMAN | 5FC5 | 35% |
| | B | – | – | – | – |
| | C1 | NPC1 | NPC1_HUMAN | 3JD8 | 100% |
| | C2 | NPC2 | NPC2_HUMAN | 2HKA | 80% |
| GM1 gangliosidosis | I | GLB1 | BGAL_HUMAN | 3WF2 | 100% |
| | II | GLB1 | BGAL_HUMAN | 3WF2 | 100% |
| | III | GLB1 | BGAL_HUMAN | 3WF2 | 100% |
| GM2 gangliosidosis | Tay-Sachs disease | HEXA | HEXA_HUMAN | 2GJX | 99% |
| | Sandhoff’s disease | HEXB | HEXB_HUMAN | 5BRO | 98% |
| | AB variant | GM2A | SAP3_HUMAN | 1PUB | 100% |
| Sulfatide lipidosis | Metachromatic leukodystrophy | ARSA | ARSA_HUMAN | 1N2L | 100% |
| | | ARSA | ARSA_HUMAN | 1N2L | 100% |
| | Multiple sulfatase deficiency | ARSB | ARSB_HUMAN | 1FSU | 100% |
| | | SUMF1 | SUMF1_HUMAN | 1Y1H | 100% |
| Saposin deficiency | Prosaposin deficiency | – | – | 4V2O | 100% (fragments) |
| | Krabbe disease, atypical | – | – | 3BQQ | |
| | Saposin B deficiency | PSAP | SAP_HUMAN | 2DOB | |
| | Gaucher disease, atypical | – | – | 1SN6 | |
| Glycogenosis | II (Pompe disease) | GAA | LYAG_HUMAN | 2QLY | 47% |
| Gaucher disease | Gaucher disease | GBA | GLCM_HUMAN | 2WKL | 100% |
| Fabry disease | Fabry disease | GLA | AGAL_HUMAN | 3LXB | 99% |
| Ceramidosis | Farber’s disease | ASAH1 | ASAH1_HUMAN | – | – |
| Krabbe disease | Krabbe disease | GALC | GALC_HUMAN | 4UFH | 84% |
| Cholesterol ester storage disease | Cholesterol ester storage disease | LIPA | LICH_HUMAN | 1K8Q | 60% |
| Wolman disease | Wolman disease | | | | |
| Glycoprotein disorder | Alpha-fucosidosis | FUCA1 | FUCO_HUMAN | 2ZXA | 39% |
| | Alpha-mannosidosis | MAN2B1 | MA2B1_HUMAN | 1O7D | 83% |
| | Beta-mannosidosis | MANBA | MANBA_HUMAN | 2VR4 | 31% |
| | Aspartylglycosaminuria | AGA | ASPG_HUMAN | 1APZ | 99% |
| | Galactosialidosis | CTSA | PPGB_HUMAN | 1IVY | 99% |
| | Mucolipidosis I | NEU1 | NEUR1_HUMAN | 1EUS | 37% |
| | Mucolipidosis II | – | – | – | – |
| | Mucolipidosis III | GNPTAB | GNPTA_HUMAN | 2N6D | 99% (fragment) |
| | Schindler’s disease | NAGA | NAGAB_HUMAN | 4DO4 | 99% |
| Membrane metabolism disorder | Cystinosis | CTNS | CTNS_HUMAN | – | – |
| | Sialic acid storage disease (Salla disease) | SLC17A5 | S17A5_HUMAN | – | – |
| | Cathepsin K deficiency disease (pycnodysostosis) | CTSK | CATK_HUMAN | 7PCK | 100% |
| | Cobalamin F disease (cblF) | LMBRD1 | LMBD1_HUMAN | – | – |
| | Danon disease | LAMP2 | LAMP2_HUMAN | 2MOM | 100% (fragment) |
| Neuronal ceroid lipofuscinosis | Neuronal ceroid lipofuscinosis-1 | PPT1 | PPT1_HUMAN | 3GRO | 100% |
| | Neuronal ceroid lipofuscinosis-2 | TPP1 | TPP1_HUMAN | 3EDY | 100% |
| | Neuronal ceroid lipofuscinosis-3 | CLN3 | CLN3_HUMAN | – | – |
| | Neuronal ceroid lipofuscinosis-4A | CLN6 | CLN6_HUMAN | – | – |
| | Neuronal ceroid lipofuscinosis-4B | DNAJC5 | DNJC5_HUMAN | 2CTW | 100% (fragment) |
| | Neuronal ceroid lipofuscinosis-5 | CLN5 | CLN5_HUMAN | – | – |
| | Neuronal ceroid lipofuscinosis-6 | CLN6 | CLN6_HUMAN | – | – |
| | Neuronal ceroid lipofuscinosis-7 | MFSD8 | MFSD8_HUMAN | – | – |
| | Neuronal ceroid lipofuscinosis-8 | CLN8 | CLN8_HUMAN | – | – |
| | Neuronal ceroid lipofuscinosis-10 | CTSD | CATD_HUMAN | 2PSG | 49% |
| | Neuronal ceroid lipofuscinosis-11 | GRN | GRN_HUMAN | 2JYE | 100% (fragment) |
| | Neuronal ceroid lipofuscinosis-12 | ATP13A2 | AT132_HUMAN | 3WGV | 27% |
| | Neuronal ceroid lipofuscinosis-13 | CTSF | CATF_HUMAN | 1M6D | 99% |
| | Neuronal ceroid lipofuscinosis-14 | KCTD7 | KCTD7_HUMAN | 4UES | 50% (fragment) |
| Congenital disorder of glycosylation | IA | PMM2 | PMM2_HUMAN | 2AMY | 100% |

*Amino acid sequence identity between the UniProt and PDB entries
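The footnote defines the PDB identity column as the amino acid sequence identity between the UniProt and PDB entries, but does not state how the sequences were aligned or how identity was normalized. A minimal sketch, assuming a Needleman-Wunsch global alignment with a linear gap penalty (match +1, mismatch -1, gap -1, all assumed values) and identity counted over aligned, gap-free columns, so that a structure covering only a fragment of the protein can still reach 100%. The function names are hypothetical.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the two aligned strings, with '-' marking gaps.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(a, b):
    """Identical residues as a percentage of aligned (gap-free) columns."""
    top, bottom = global_align(a, b)
    aligned = [(x, y) for x, y in zip(top, bottom) if x != "-" and y != "-"]
    if not aligned:
        return 0.0
    identical = sum(1 for x, y in aligned if x == y)
    return round(100.0 * identical / len(aligned), 1)


print(percent_identity("ACDEFGHIK", "ACDEFGHIK"))  # 100.0
print(percent_identity("ACDEFGHIK", "DEFGH"))      # 100.0 (fragment)
print(percent_identity("ACDE", "ACDF"))            # 75.0
```

Normalizing by the full UniProt length instead would report fragments well below 100%; the choice of denominator changes the numbers substantially, which is why it is stated here as an assumption.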
Fig. 4 Initial search result by VaProS. The search by “gangliosidosis” initially results in a table of candidates
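Fig. 4 shows a keyword query returning a table of candidate genes. As a hypothetical sketch only (the search logic inside VaProS is not described in this record), a case-insensitive substring match over a few rows copied from the lysosomal storage disease table reproduces the idea: the query "gangliosidosis" collects the genes of both GM1 and GM2 gangliosidosis, deduplicated in first-seen order. The name `search_candidates` is an assumption.

```python
# (disease, type, gene, UniProt entry name) rows copied from the
# lysosomal storage disease table.
ROWS = [
    ("GM1 gangliosidosis", "I", "GLB1", "BGAL_HUMAN"),
    ("GM1 gangliosidosis", "II", "GLB1", "BGAL_HUMAN"),
    ("GM1 gangliosidosis", "III", "GLB1", "BGAL_HUMAN"),
    ("GM2 gangliosidosis", "Tay-Sachs disease", "HEXA", "HEXA_HUMAN"),
    ("GM2 gangliosidosis", "Sandhoff's disease", "HEXB", "HEXB_HUMAN"),
    ("GM2 gangliosidosis", "AB variant", "GM2A", "SAP3_HUMAN"),
    ("Fabry disease", "Fabry disease", "GLA", "AGAL_HUMAN"),
    ("Gaucher disease", "Gaucher disease", "GBA", "GLCM_HUMAN"),
]


def search_candidates(keyword, rows):
    """Return (gene, entry) pairs whose disease or type contains keyword.

    Matching is a case-insensitive substring test; duplicate genes are dropped
    while keeping the order in which they are first seen.
    """
    needle = keyword.lower()
    candidates = {}  # dicts preserve insertion order
    for disease, subtype, gene, entry in rows:
        if needle in disease.lower() or needle in subtype.lower():
            candidates.setdefault(gene, entry)
    return list(candidates.items())


print(search_candidates("gangliosidosis", ROWS))
# [('GLB1', 'BGAL_HUMAN'), ('HEXA', 'HEXA_HUMAN'), ('HEXB', 'HEXB_HUMAN'), ('GM2A', 'SAP3_HUMAN')]
```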
Fig. 5 “Molecular Interactions” after selecting HEXA and HEXB in the initial search result (Fig. 4). A big node represents a protein, a small node represents a ligand and an edge represents a protein–protein/ligand interaction. A node in red is associated with a disease (selected in the top-right window)
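The Molecular Interactions view in Fig. 5 is a graph with two node kinds (proteins drawn large, ligands drawn small), interaction edges, and disease-associated nodes drawn in red. The sketch below is a hypothetical data structure for such a view, not VaProS code; the class and method names are assumptions, and the nodes and edges are toy data for illustration (the disease labels follow the lysosomal storage disease table).

```python
from dataclasses import dataclass, field


@dataclass
class InteractionNetwork:
    """Undirected interaction graph with typed, disease-annotated nodes."""

    kinds: dict = field(default_factory=dict)      # node -> "protein" or "ligand"
    neighbors: dict = field(default_factory=dict)  # node -> set of adjacent nodes
    diseases: dict = field(default_factory=dict)   # node -> set of disease names

    def add_node(self, name, kind, diseases=()):
        self.kinds[name] = kind
        self.neighbors.setdefault(name, set())
        self.diseases.setdefault(name, set()).update(diseases)

    def add_edge(self, u, v):
        # Both endpoints must already exist; edges are stored symmetrically.
        self.neighbors[u].add(v)
        self.neighbors[v].add(u)

    def node_size(self, name):
        # Proteins are drawn as big nodes, ligands as small ones.
        return "big" if self.kinds[name] == "protein" else "small"

    def highlighted(self, disease):
        """Nodes to colour red for the disease selected by the user."""
        return sorted(n for n, ds in self.diseases.items() if disease in ds)


# Toy network for illustration only.
net = InteractionNetwork()
net.add_node("HEXA", "protein", ["Tay-Sachs disease"])
net.add_node("HEXB", "protein", ["Sandhoff's disease"])
net.add_node("GM2A", "protein", ["GM2 gangliosidosis, AB variant"])
net.add_node("GM2", "ligand")
for u, v in [("HEXA", "HEXB"), ("HEXA", "GM2A"), ("HEXA", "GM2")]:
    net.add_edge(u, v)

print(net.highlighted("Tay-Sachs disease"))  # ['HEXA']
print(net.node_size("GM2"))                  # small
```

Keeping node kinds and disease annotations in separate maps lets the same graph be recoloured for a different selected disease without rebuilding it.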
Fig. 6 Artistic representation of the frequency of words in the titles of the manuscripts stored in the entry of HEXA_HUMAN in UniProtKB. The visualization was realized by d3-cloud (https://github.com/jasondavies/d3-cloud)
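Fig. 6 visualizes how often words occur in the titles of papers cited in a UniProtKB entry. The counting step behind such a word cloud can be sketched as below. The tokenizer, the stop-word list and the linear font-size scaling are assumptions (the caption only states that the layout was drawn with d3-cloud), and the titles are made up for illustration.

```python
import re
from collections import Counter

# Assumed stop-word list; the list used by TagCloud is not given in the source.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "with", "by", "to", "from"}
TOKEN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")  # keeps hyphenated words together


def word_frequencies(titles, top=20):
    """Count non-stop-words across titles and return the most common ones."""
    counts = Counter()
    for title in titles:
        for word in TOKEN.findall(title.lower()):
            if word not in STOP_WORDS and len(word) > 1:
                counts[word] += 1
    return counts.most_common(top)


def font_sizes(freqs, min_px=12, max_px=48):
    """Scale counts linearly into a font-size range for a word-cloud layout."""
    if not freqs:
        return {}
    lo = min(count for _, count in freqs)
    hi = max(count for _, count in freqs)
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {w: min_px + (c - lo) * (max_px - min_px) // span for w, c in freqs}


# Made-up titles for illustration only.
TITLES = [
    "Structure of human beta-hexosaminidase A",
    "Mutations in the HEXA gene in Tay-Sachs disease",
    "Tay-Sachs disease carrier screening",
]
freqs = word_frequencies(TITLES)
print(freqs[:2])
print(font_sizes(freqs)["tay-sachs"])  # 48
```

The resulting word-to-size mapping is the kind of input a layout library such as d3-cloud consumes; placement itself is left to that library.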
Fig. 7 Variations of GM1-gangliosidosis mapped onto the protein 3D structure of GLB1. Variations in type I (a) and variations in type III (b). The structure of human β-galactosidase (PDB ID: 3WF2) is used for the mapping
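Mapping variations onto a structure, as in Fig. 7, requires translating residue numbers of the UniProt sequence into residue numbers of the PDB chain, which may be offset or may skip unresolved regions. A minimal sketch, assuming the correspondence is already known as a list of gap-free aligned segments; the function name, the segment boundaries and the variant positions below are hypothetical and are not taken from 3WF2.

```python
from bisect import bisect_right


def map_to_structure(positions, segments):
    """Map UniProt residue numbers to PDB residue numbers.

    segments: (uniprot_start, uniprot_end, pdb_start) tuples, inclusive,
    sorted by uniprot_start, each describing a gap-free aligned stretch.
    Positions outside every segment (e.g. unresolved residues) map to None.
    """
    starts = [seg[0] for seg in segments]
    mapped = {}
    for pos in positions:
        k = bisect_right(starts, pos) - 1  # last segment starting at or before pos
        if k >= 0 and pos <= segments[k][1]:
            u_start, _, p_start = segments[k]
            mapped[pos] = p_start + (pos - u_start)
        else:
            mapped[pos] = None
    return mapped


# Hypothetical segments: residues 101-110 unresolved, numbering shifted by +4 afterwards.
SEGMENTS = [(1, 100, 1), (111, 200, 115)]
print(map_to_structure([5, 105, 150, 250], SEGMENTS))
# {5: 5, 105: None, 150: 154, 250: None}
```

Variants that map to None fall in regions absent from the model and cannot be displayed on the structure.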
Summary of the mutated sites of GLB1 on protein 3D structure for GM1-gangliosidosis

| | GM1 type I | GM1 type II | GM1 type III | All residues |
|---|---|---|---|---|
| Number of residues | 45 | 17 | 15 | 677 |
| Buried residueᵃ (%) | 95.5 | 88.2 | 86.7 | 56.8 |
| Exposed residueᵇ (%) | 4.5 | 11.8 | 13.3 | 43.2 |

ᵃResidue with relative solvent accessibility less than 20%.
ᵇResidue with relative solvent accessibility of 20% or more.
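The footnotes classify a residue as buried when its relative solvent accessibility (RSA) is below 20% and as exposed otherwise. A minimal sketch of that bookkeeping, assuming per-residue RSA values (as fractions) have already been computed elsewhere, for example from a solvent-accessible surface area program and a table of reference maximum accessibilities, neither of which is specified in this record. The function name is hypothetical, and the example input is made up so that 15 of 17 residues are buried, which gives the same percentages as the GM1 type II column.

```python
def summarize_accessibility(rsa_values, threshold=0.20):
    """Percent of buried (RSA < threshold) and exposed (RSA >= threshold) residues.

    rsa_values: relative solvent accessibility per residue, as fractions (0.0-1.0).
    """
    n = len(rsa_values)
    if n == 0:
        raise ValueError("no residues given")
    buried = sum(1 for rsa in rsa_values if rsa < threshold)
    exposed = n - buried
    return {
        "residues": n,
        "buried_pct": round(100.0 * buried / n, 1),
        "exposed_pct": round(100.0 * exposed / n, 1),
    }


# Hypothetical RSA values: 15 buried residues and 2 exposed ones (0.20 counts as exposed).
example = [0.05] * 15 + [0.35, 0.20]
print(summarize_accessibility(example))
# {'residues': 17, 'buried_pct': 88.2, 'exposed_pct': 11.8}
```

Comparing these percentages for mutated sites against the "All residues" column is what shows disease mutations concentrating in the buried core.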