Shuya Ikeda, Hiromasa Ono, Tazro Ohta, Hirokazu Chiba, Yuki Naito, Yuki Moriya, Shuichi Kawashima, Yasunori Yamamoto, Shinobu Okamoto, Susumu Goto, Toshiaki Katayama.
Abstract
MOTIVATION: Understanding life cannot be accomplished without making full use of biological data, which are scattered across databases of diverse categories in life sciences. To connect such data seamlessly, identifier (ID) conversion plays a key role. However, existing ID conversion services have disadvantages, such as covering only a limited range of biological categories of databases, not keeping up with the updates of the original databases, and outputs being hard to interpret in the context of biological relations, especially when converting IDs in multiple steps.
Year: 2022 PMID: 35801937 PMCID: PMC9438948 DOI: 10.1093/bioinformatics/btac491
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.931
Databases and datasets currently available on TogoID
| Database | Dataset ID | Dataset full name | Category |
|---|---|---|---|
| Affymetrix probeset | affy_probeset | Affymetrix probeset | Probe |
| BioProject | bioproject | BioProject | Project |
| BioSample | biosample | BioSample | Sample |
| Consensus CDS | ccds | Consensus CDS | Gene |
| ChEBI | chebi | ChEBI compound | Compound |
| ChEMBL | chembl_compound | ChEMBL compound | Compound |
| ChEMBL | chembl_target | ChEMBL target | Protein |
| ClinVar | clinvar | ClinVar variant | Variant |
| dbSNP | dbsnp | dbSNP | Variant |
| Disease ontology | doid | Disease ontology | Disease |
| DrugBank | drugbank | DrugBank | Compound |
| Enzyme nomenclature | ec | Enzyme nomenclature | Function |
| Ensembl | ensembl_gene | Ensembl gene | Gene |
| Ensembl | ensembl_protein | Ensembl protein | Protein |
| Ensembl | ensembl_transcript | Ensembl transcript | Transcript |
| GlyTouCan | glytoucan | GlyTouCan | Glycan |
| Gene ontology | go | Gene ontology | Function |
| HGNC | hgnc | HGNC | Gene |
| HGNC | hgnc_symbol | HGNC gene symbol | Gene, Synonym |
| HMDB | hmdb | HMDB | Compound |
| HomoloGene | homologene | HomoloGene | Ortholog |
| Human Phenotype Ontology | hp | Human Phenotype Ontology | Disease |
| InChIKey | inchi_key | InChIKey | Compound |
| GenBank/ENA/DDBJ | insdc | GenBank/ENA/DDBJ | Gene |
| IntAct | intact | IntAct | Interaction |
| InterPro | interpro | InterPro | Domain |
| LRG | lrg | LRG | Gene |
| MBGD | mbgd_gene | MBGD gene | Gene |
| MBGD | mbgd_organism | MBGD organism | Organism |
| MedDRA | meddra | MedDRA | Disease |
| MedGen | medgen | MedGen | Disease |
| MeSH | mesh | MeSH | Disease |
| MGI | mgi | MGI | Gene |
| miRBase | mirbase | miRBase | Transcript |
| MONDO | mondo | MONDO | Disease |
| NANDO | nando | NANDO | Disease |
| NCBI gene | ncbigene | NCBI gene | Gene |
| OMA | oma_group | OMA group | Ortholog |
| OMA | oma_protein | OMA protein | Protein |
| OMIM | omim_gene | OMIM gene | Gene |
| OMIM | omim_phenotype | OMIM phenotype | Disease |
| Orphanet | orphanet | Orphanet | Disease |
| PDB | pdb | PDB | Structure |
| Pfam | pfam | Pfam | Domain |
| PubChem | pubchem_compound | PubChem compound | Compound |
| PubChem | pubchem_substance | PubChem substance | Compound |
| PubMed | pubmed | PubMed | Literature |
| Reactome | reactome_pathway | Reactome pathway | Pathway |
| Reactome | reactome_reaction | Reactome reaction | Reaction |
| RefSeq | refseq_genomic | RefSeq genomic | Gene |
| RefSeq | refseq_protein | RefSeq protein | Protein |
| RefSeq | refseq_rna | RefSeq RNA | Transcript |
| RGD | rgd | RGD | Gene |
| Rhea | rhea | Rhea | Reaction |
| SRA | sra_accession | SRA accession | Submission |
| SRA | sra_analysis | SRA analysis | Analysis |
| SRA | sra_experiment | SRA experiment | Experiment |
| SRA | sra_project | SRA project | Project |
| SRA | sra_run | SRA run | SequenceRun |
| SRA | sra_sample | SRA sample | Sample |
| Taxonomy | taxonomy | Taxonomy | Organism |
| TogoVar variant | togovar | TogoVar variant | Variant |
| UniProt | uniprot | UniProt | Protein |
| UniProt | uniprot_mnemonic | UniProt mnemonic | Protein, Synonym |
| WikiPathways | wikipathways | WikiPathways | Pathway |
Note: The categories are classes defined in the TogoID ontology.
Fig. 1. Two methods to find a proper route for converting the input IDs. (A) Screenshot of the ‘EXPLORE’ mode. Clicking the ‘Ensembl gene’ button in the examples fills in four gene IDs of the Yamanaka factors (Takahashi and Yamanaka, 2006), and candidate datasets are then suggested. Selecting the circle next to a dataset shows the available conversion paths. Hovering the cursor over a dataset box shows its menu. Clicking the table icon on the left opens the ‘Results’ window, where users can preview the conversion results and download them in the designated output format (see Fig. 2B). The download icon in the middle is a shortcut for directly downloading the converted ID list. The information icon (i) on the right shows the details of the dataset and its linked datasets. (B) In the ‘NAVIGATE’ mode, users can specify the conversion target from the pull-down menu of the datasets supported in TogoID. The candidate conversion routes are then listed.
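The route search described in this caption can be pictured as a path search over a graph whose nodes are datasets and whose edges are direct links between them. The following is a minimal sketch under that assumption, not TogoID's actual implementation: the `LINKS` graph, the `find_routes` helper and the `max_steps` limit are all hypothetical, and the edges are illustrative rather than the real set of links.

```python
from collections import deque

# Hypothetical direct links between dataset IDs. These edges are illustrative
# only; the real link graph is defined by TogoID and is not reproduced here.
LINKS = {
    "ensembl_gene": ["ncbigene", "uniprot"],
    "ncbigene": ["ensembl_gene", "uniprot"],
    "uniprot": ["ncbigene", "pdb", "omim_phenotype"],
    "omim_phenotype": ["uniprot"],
    "pdb": ["uniprot"],
}


def find_routes(source, target, max_steps=3):
    """Return all cycle-free routes from source to target, shortest first."""
    routes = []
    queue = deque([[source]])  # breadth-first search over partial routes
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            routes.append(path)
            continue
        if len(path) - 1 >= max_steps:
            continue  # route already uses the maximum number of conversions
        for nxt in LINKS.get(path[-1], []):
            if nxt not in path:  # never revisit a dataset within one route
                queue.append(path + [nxt])
    return routes


routes = find_routes("ensembl_gene", "pdb")
for route in routes:
    print(" -> ".join(route))
```

Because the search is breadth-first, shorter routes are listed before longer ones, which mirrors the idea of presenting several candidate routes and letting the user pick one.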
Fig. 2. Browsing available datasets and conversion results. (A) In the ‘DATASETS’ tab, users can browse the list of datasets. Each dataset has a description imported from the Integbio Database Catalog, a list of directly linked datasets, and a list of example IDs that illustrate the acceptable ID formats; clicking an example ID tests the conversion. (B) In the ‘Results’ window, the ‘Route’ displays the number of corresponding IDs along with the biological meanings. In the ‘Report’ section, users can specify the format of the results, which can be retrieved with the ‘Action’ buttons. The IDs in the ‘Preview’ table are linked to the original DB entries. In this example, users can see that OMIM phenotype ID: 601317 (Deafness) is related to UniProt ID: Q13402 (Unconventional myosin-7a) and that this protein corresponds to NCBI Gene ID: 4647 (MYO7A).
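The multi-step conversion in panel (B) can be sketched as a chain of pairwise lookups that keeps every intermediate ID, so each output row shows how the final ID was reached. This is a minimal illustration, assuming in-memory pair tables: `PAIRS` and `convert` are hypothetical names, and the only data used are the three IDs quoted in the caption.

```python
# Toy pairwise mappings built from the Fig. 2 example only. A real pair table
# would hold many more rows; the structure here is an assumption.
PAIRS = {
    ("omim_phenotype", "uniprot"): {"601317": ["Q13402"]},
    ("uniprot", "ncbigene"): {"Q13402": ["4647"]},
}


def convert(ids, route):
    """Follow the route step by step, keeping every intermediate ID per row."""
    rows = [(i,) for i in ids]
    for src, dst in zip(route, route[1:]):
        table = PAIRS[(src, dst)]
        # Extend each row with every ID linked to its last element; rows
        # without a match are dropped at this step.
        rows = [row + (hit,) for row in rows for hit in table.get(row[-1], [])]
    return rows


rows = convert(["601317"], ["omim_phenotype", "uniprot", "ncbigene"])
for row in rows:
    print("\t".join(row))
```

Keeping the intermediate column (here the UniProt ID) is what makes the result interpretable: the phenotype is linked to the gene through a specific protein rather than by an opaque direct mapping.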