Kimberly J Bussey, David Kane, Margot Sunshine, Sudar Narasimhan, Satoshi Nishizuka, William C Reinhold, Barry Zeeberg, Weinstein Ajay, John N Weinstein.
Abstract
MatchMiner is a freely available program package for batch navigation among gene and gene product identifier types commonly encountered in microarray studies and other forms of 'omic' research. The user inputs a list of gene identifiers and then either uses the Merge function to find the overlap with a second list of identifiers of the same or a different type, or uses the LookUp function to find corresponding identifiers.
Year: 2003 PMID: 12702208 PMCID: PMC154578 DOI: 10.1186/gb-2003-4-4-r27
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Information flow in MatchMiner. Input identifier lists are first translated into unique internal gene indices to form a translation table. The translation table is then either converted into another set of identifiers using the LookUp function or compared with another such table using the Merge function to generate a report showing the intersection of two separate identifier lists. The resulting output can be displayed as HTML or saved as text for import into other programs.
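The flow in Figure 1 can be illustrated with a minimal sketch. It assumes a toy in-memory mapping in place of the MatchMiner database; the names `INDEX_BY_ID`, `IDS_BY_INDEX`, `translate`, `lookup` and `merge`, the gene index values, and the sample identifiers are all hypothetical and are not taken from MatchMiner itself.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data: identifier -> set of internal gene indices.
# The index values (1, 2) are made up for illustration.
INDEX_BY_ID: Dict[str, Set[int]] = {
    "TP53": {1},
    "P53": {1},          # alias that resolves to the same gene index
    "NM_000546": {1},
    "BRCA1": {2},
    "NM_007294": {2},
}

# Hypothetical reverse map: gene index -> identifiers grouped by type.
IDS_BY_INDEX: Dict[int, Dict[str, List[str]]] = {
    1: {"symbol": ["TP53"], "genbank": ["NM_000546"]},
    2: {"symbol": ["BRCA1"], "genbank": ["NM_007294"]},
}


def translate(identifiers: List[str]) -> Dict[str, Set[int]]:
    """Build a translation table: input identifier -> gene indices."""
    return {ident: INDEX_BY_ID.get(ident, set()) for ident in identifiers}


def lookup(table: Dict[str, Set[int]], output_type: str) -> Dict[str, List[str]]:
    """LookUp: convert each input identifier into identifiers of another type."""
    result: Dict[str, List[str]] = {}
    for ident, indices in table.items():
        found: List[str] = []
        for idx in sorted(indices):
            found.extend(IDS_BY_INDEX[idx].get(output_type, []))
        result[ident] = found  # empty list when nothing matched
    return result


def merge(table_a: Dict[str, Set[int]],
          table_b: Dict[str, Set[int]]) -> List[Tuple[str, str]]:
    """Merge: pair identifiers from two lists that share a gene index."""
    pairs: List[Tuple[str, str]] = []
    for ident_a, idx_a in table_a.items():
        for ident_b, idx_b in table_b.items():
            if idx_a & idx_b:  # non-empty intersection of gene indices
                pairs.append((ident_a, ident_b))
    return pairs
```

In this sketch, `lookup(translate(["P53", "BRCA1"]), "genbank")` plays the role of LookUp, and `merge(translate(["P53"]), translate(["NM_000546"]))` plays the role of Merge. Both operate only on gene indices, which is why lists of different identifier types can be compared.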
Figure 2. Database relational table schema for MatchMiner. (a) Logical database representation. Data are incorporated from the UCSC Human Genome Build, LocusLink, UniGene, OMIM, and the Affymetrix annotation sets for HU95 and HU133 chips. Each candidate gene is assigned a gene index in the GeneIdx table. These gene indexes are used as keys for all of the MatchMiner operations. The number of many-to-many relationships in the model illustrates the complexity of the data. (b) Physical representation of the database. The implementation currently includes 14 tables with about 12 million rows.
MatchMiner LookUp search options
| Identifier type | Option | Input algorithm | Output algorithm |
| Name (gene symbol, alias or descriptive name) | HUGO then Alias | Starts with Official. If not found, proceeds through all other sources. | Returns the HUGO name. If no name is flagged as HUGO, returns all aliases. |
| | ALL (HUGO and Alias) | Searches all data sources for all matches to the symbol and flags those that are HUGO. | Returns all gene symbols and flags the HUGO symbol. |
| | Official | Searches all data sources for match as the HUGO name. | Returns the HUGO name. If not found, nothing is returned. |
| | Long | Searches all data sources for descriptive name. | Returns all descriptive names. |
| GenBank accession number | ALL | Searches all data sources starting with UCSC known genes, then LocusLink, UniGene and UCSC ESTs until match found. | Returns accession number from UCSC known genes. If not found, proceeds through UniGene then UCSC EST. |
| | Data-source specific | Looks up input in a specific data source. | Returns accession numbers found in a particular data source. |
| IMAGE clone | UniGene | Only data source with IMAGE clone IDs. | Returns all IMAGE clone IDs associated with the UniGene |
| Cytogenetic location | ALL | Searches all gene indexes for matching chromosome band. | Returns chromosome band from UCSC sequence to band translation. If not found, proceeds through all other sources with multiple bands listed separately. |
| | UCSC | | Returns chromosome band from UCSC sequence to band translation. |
| Database id | UniGene | Searches gene index for matching UniGene id. | Returns UniGene id. |
| | Affymetrix | Searches gene index for matching Affymetrix probe set identifier. | Returns Affymetrix probe set identifier. |
| Sequence location number (bp) | Transcription Start | Not implemented | Returns transcription start from UCSC Known Genes. If not found, proceeds to UCSC EST. |
| FISH clone | UCSC | Searches UCSC FISH clones for match to gene index based on sequence position overlap with UCSC known genes. | Returns FISH clone id. |
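The three output options for gene names in the table above differ only in how they treat the HUGO flag. The following is a minimal sketch of that selection logic, assuming each candidate symbol arrives as a `(symbol, is_hugo)` pair; the function name `select_names` and this data layout are hypothetical and do not describe MatchMiner's internals.

```python
from typing import List, Tuple

# Hypothetical representation: (gene symbol, flagged as HUGO?)
Symbol = Tuple[str, bool]


def select_names(symbols: List[Symbol], mode: str) -> List[Symbol]:
    """Apply one of the name output options to a list of candidate symbols."""
    hugo = [s for s in symbols if s[1]]
    if mode == "Official":
        # HUGO name only; nothing is returned if none is flagged.
        return hugo
    if mode == "HUGO then Alias":
        # HUGO name if one exists, otherwise fall back to all aliases.
        return hugo if hugo else list(symbols)
    if mode == "ALL":
        # Every symbol, with the HUGO flag carried along in the pair.
        return list(symbols)
    raise ValueError(f"unknown mode: {mode}")
```

For a gene with no HUGO-flagged symbol, "Official" yields an empty list while "HUGO then Alias" yields every alias, which is the practical difference between the two options.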
Figure 3. Associating FISH-mapped BACs with genes. Schematic view of FISH-mapped BACs from 1p36.33 near the PITSLRE kinase genes (UCSC Genome Browser, June 2002 freeze). Note that a single BAC can encompass one or more genes. In MatchMiner, the FISH-mapped BAC table from UCSC is imported, and chromosomal positions are read from the table for comparison with the transcriptional start positions of UCSC 'Known Genes'. If a transcriptional start is contained within the bounds of a BAC, that BAC is associated with the corresponding gene index. Thus, a BAC containing several genes will be associated with each of those genes.
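The containment rule described in the Figure 3 caption can be sketched as a simple interval test. This is an illustration under stated assumptions: the `Bac` and `Gene` record types, the `associate_bacs` function, and the treatment of BAC bounds as inclusive are all hypothetical choices, and any coordinates passed in would be made-up examples rather than UCSC data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Bac:
    clone_id: str
    chrom: str
    start: int  # assumed inclusive
    end: int    # assumed inclusive


@dataclass(frozen=True)
class Gene:
    gene_index: int
    chrom: str
    tx_start: int  # transcription start position


def associate_bacs(bacs: List[Bac], genes: List[Gene]) -> Dict[str, List[int]]:
    """Map each BAC clone id to the gene indices whose transcription
    start falls within that BAC on the same chromosome."""
    result: Dict[str, List[int]] = {bac.clone_id: [] for bac in bacs}
    for bac in bacs:
        for gene in genes:
            same_chrom = gene.chrom == bac.chrom
            if same_chrom and bac.start <= gene.tx_start <= bac.end:
                result[bac.clone_id].append(gene.gene_index)
    return result
```

A BAC spanning several transcription starts ends up with several gene indices, and a gene whose start lies in two overlapping BACs is listed under both. The nested loop is quadratic, which is acceptable for a sketch; a sorted-interval or indexed approach would be the natural choice for genome-scale tables.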
Comparison of the capabilities of gene identifier translation tools
| Program | Implementation | Search types | Batch | Translation path traceable in interactive (single-gene) mode? | Translation path traceable in batch (gene-list) mode? | Multiple input associations flagged? | Output in form suitable for automated processing? |
| MatchMiner | Command line, Web application | LookUp, Merge | Yes | Yes | Yes | Yes | Yes |
| SOURCE | Web application | LookUp | Yes | Yes | No | Yes, if "Show all Cluster Ids if Multiple Clusters" option selected | Yes |
| GeneLynx | Web application | LookUp | Yes | Yes | No | Yes | No |
ChainOfResponsibility hierarchies for data sources in MatchMiner
| Identifier type | Hierarchy of source reliability |
| Cytogenetic location | UCSC Known Genes, LocusLink, UniGene, UCSC EST, OMIM |
| GenBank accession number | UCSC Known Genes, LocusLink, UniGene, UCSC EST, OMIM |
| HUGO gene symbol | LocusLink, OMIM |
| IMAGE clone id | UniGene |
| Long gene name | UCSC Known Genes, LocusLink, UniGene, UCSC EST, OMIM |
| Affymetrix probe id | Affymetrix |
| UniGene cluster id | UniGene |
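The hierarchies in the table above describe an ordered fallback: each data source is asked in turn, and the first one that can answer wins. Below is a minimal sketch of that chain-of-responsibility idea, assuming each source is a plain dictionary keyed by gene index. The `HIERARCHY` entries are copied from the table, while the `resolve` function, the dictionary layout, and any sample values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Source order per identifier type, taken from the table above
# (two rows shown; the others follow the same pattern).
HIERARCHY: Dict[str, List[str]] = {
    "Cytogenetic location": [
        "UCSC Known Genes", "LocusLink", "UniGene", "UCSC EST", "OMIM",
    ],
    "HUGO gene symbol": ["LocusLink", "OMIM"],
}

# Hypothetical layout: source name -> {gene index -> value}
Sources = Dict[str, Dict[int, str]]


def resolve(id_type: str, gene_index: int,
            sources: Sources) -> Optional[Tuple[str, str]]:
    """Walk the hierarchy for id_type and return (source name, value)
    from the first source that has an entry, or None if none does."""
    for name in HIERARCHY[id_type]:
        value = sources.get(name, {}).get(gene_index)
        if value is not None:
            return name, value
    return None
```

Returning the source name together with the value keeps the translation path visible, so a caller can tell which data source supplied each answer.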