Shweta S Chavan, John D Shaughnessy, Ricky D Edmondson.
Abstract
Many primary biological databases are dedicated to providing annotation for a specific type of biological molecule, such as a clone, transcript, gene or protein, but often with limited cross-references. Enhanced mapping between these databases is therefore required to facilitate the correlation of independent experimental datasets. For example, molecular biology experiments conducted on samples (DNA, mRNA or protein) often yield more than one type of 'omics' dataset for analysis (e.g. a sample can have both a genomics and a proteomics expression dataset available). To map the two datasets, the identifier type from one dataset must be linked to that of the other, which prevents the loss of critical information in downstream analysis. This identifier mapping can be performed using identifier converter software relevant to the query and target identifier databases. This review presents the publicly available web-based biological database identifier converters and compares their usage, input and output formats, and the available query and target database identifier types.
Year: 2011 PMID: 22155608 PMCID: PMC3525252 DOI: 10.1186/1479-7364-5-6-703
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639
Links to the web interfaces of various Id converters
| Mapping services | Link |
|---|---|
| Gene/Clone ID converter | |
| ID mapping by UniProt | |
| MatchMiner | |
| DAVID gene ID conversion tool | |
| g:Convert | |
| CRONOS | |
| bioDBnet:db2db | |
Comparison of various Id converters
| Features of mapping services | Gene/Clone ID converter | ID mapping by UniProt | MatchMiner | DAVID gene ID conversion tool | g:Convert | CRONOS | bioDBnet:db2db |
|---|---|---|---|---|---|---|---|
| Interface | Web-based GUI form | Web-based GUI form | Web-based GUI form, command line | Web-based GUI form | Web-based GUI form | Web-based GUI form | Web-based GUI form |
| Output format | HTML, text, spreadsheet | HTML, text | HTML, text, spreadsheet | HTML, text, spreadsheet | HTML, text, spreadsheet, minimal (no header) | HTML, email for batch mode | Text, spreadsheet |
| Organisms | Human, mouse, rat | Human, mouse, rat and many other species | Human, mouse | Human and about 90,000 other species | Human, mouse, rat and 31 other Ensembl-supported genome species | Human, mouse, rat, cow, dog and fruit fly | A specified list could not be found |
| Input/output clone or transcript | Clone Ids, Affymetrix Ids, GenBank Accession (Additional output: EMBL)a | GenBank, EMBL, DDBJ | Affymetrix Ids, GenBank Accession, EST, IMAGE Clone Id, FISH-mapped BAC Clone Id | Affymetrix Id, Agilent Id, Illumina Id, GenBank Accession, Gene symbol, GenPept Accession, NCBI GI, RefSeq RNA/Genomic accession | Affymetrix, Agilent, CCDS Ids, Ensembl transcript, Illumina, RefSeq DNA/Genomic | Ensembl/FlyBase Transcript ID, EMBL, Affymetrix, Agilent, CCDS | Affymetrix, Agilent, GenBank, RefSeq Genomic, RefSeq Nucleotide |
| Input/output gene | HUGO gene names, Entrez gene Ids, Ensembl gene Ids, UniGene cluster Ids, RefSeq RNAs (Additional output: CCDS)a | Entrez Gene, HGNC, Ensembl, UniGene, TIGR (JCVI) | Gene Symbol HUGO/Alias, Name, UniGene Cluster Id, Entrez Gene Id, RefSeq RNA | Entrez gene Id, Ensembl gene/transcript Id, RefSeq mRNA accession, UniGene Id | Ensembl Gene, Entrez Gene, RefSeq mRNA, UniGene | Gene Name, Ensembl/FlyBase Gene ID, GI, GeneID, HGNC, RefSeq mRNA | Entrez Gene ID, Ensembl Gene ID, UniGene |
| Input/output protein | RefSeq peptides, SwissProt names (Additional output: IPI, PDB)a | UniProtKB, RefSeq, GenPept, IPI, PDB | RefSeq protein | PIR accession, PIR Id, PIR NREF Id, RefSeq Protein accession, UniProt Id/accession, UniRef Id | Ensembl Protein, IPI, PDB, RefSeq Protein | Protein Name, UniProt, Ensembl/FlyBase Protein ID, IPI, PIR | UniProt Accession, Ensembl Protein ID, GenPept, RefSeq Protein, UniProt |
| Input/output other information | (Additional output: PubMed, GO, KEGG, Reactome, Chromosomal locations from Ensembl, UCSC Genome Browser, OMIM)a | SGD, GeneRif, NCBI Taxon, and others | Cytogenetic location: UCSC (Additional output: PubMed, GO, KEGG, Reactome, Chromosomal locations from Ensembl, UCSC Genome Browser, OMIM)a | "Not sure" type also accepted, and many other secondary database identifiers also supported | UCSC, PubMed, GO and many other secondary databases | dbSNP, UniSTS, MGI, orfnames, MIM, MORBID, CDD | GO, InterPro, Biocarta, KEGG, dbSNP, H-Invitational (H-Inv), HomoloGene, MGC, MIM, UniSTS, Taxon, and other secondary databases |
a All input Id types are also potential output Id types (e.g. as in ID mapping by UniProt). In some cases, however, a converter offers additional output Id types that it does not accept as input. Such output Id types are given in parentheses and marked 'additional output' (e.g. as in Gene/Clone ID converter). GUI, graphical user interface.
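
The linking step described in the abstract, in which the identifier type of one dataset is mapped onto that of another so that no records are silently lost, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method of any converter listed above: the mapping table is hand-written here (in practice it would be a text or spreadsheet export from one of the converters), the expression values are invented, and the names `build_index` and `link_datasets` are hypothetical.

```python
from collections import defaultdict

# Illustrative mapping table (Entrez Gene ID -> UniProt accession), standing in
# for a text export from an identifier converter. Hand-written for this sketch.
mapping_rows = [
    ("7157", "P04637"),  # TP53
    ("672", "P38398"),   # BRCA1
    ("1956", "P00533"),  # EGFR
]

# Invented expression values, each keyed by its dataset's own identifier type.
# "9999999" is a placeholder ID with no entry in the mapping table.
transcript_levels = {"7157": 8.2, "672": 5.1, "9999999": 3.3}
protein_levels = {"P04637": 1.7, "P00533": 0.9}


def build_index(rows):
    """Index each query ID to all of its target IDs (mappings may be one-to-many)."""
    index = defaultdict(list)
    for query_id, target_id in rows:
        index[query_id].append(target_id)
    return index


def link_datasets(query_data, target_data, index):
    """Pair values from two datasets through the mapping index.

    Returns (linked, unmapped). Each linked row is
    (query_id, target_id, query_value, target_value). unmapped lists the query
    IDs that have no mapping or no measurement in the target dataset, so that
    dropped records are reported rather than lost silently.
    """
    linked, unmapped = [], []
    for query_id, query_value in query_data.items():
        hits = [t for t in index.get(query_id, []) if t in target_data]
        if not hits:
            unmapped.append(query_id)
            continue
        for target_id in hits:
            linked.append((query_id, target_id, query_value, target_data[target_id]))
    return linked, unmapped


index = build_index(mapping_rows)
linked, unmapped = link_datasets(transcript_levels, protein_levels, index)
print(linked)
print(unmapped)
```

Returning the unmapped identifiers alongside the linked rows makes it easy to see how much of the query dataset a given converter failed to cover, which is one practical way to compare converters on the same input.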