Damian Smedley, Syed Haider, Benoit Ballester, Richard Holland, Darin London, Gudmundur Thorisson, Arek Kasprzyk.
Abstract
BACKGROUND: Biologists need to perform complex queries, often across a variety of databases. Typically, each data resource provides its own advanced query interface, which the biologist must learn before querying it. Frequently, more than one data source is required, and for high-throughput analysis, cutting and pasting results between websites is very time consuming. Therefore, many groups rely on local bioinformatics support to process queries through the resources' programmatic interfaces, if they exist. This is not an efficient solution in terms of cost and time. Instead, it would be better if the biologist only had to learn one generic interface. BioMart provides such a solution.
Year: 2009 PMID: 19144180 PMCID: PMC2649164 DOI: 10.1186/1471-2164-10-22
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Description of all publicly accessible BioMarts to date

| BioMart | Description | Host |
| --- | --- | --- |
| Ensembl Genes | Automated annotation of over 40 eukaryotic genomes | EMBL-EBI, UK |
| Ensembl Homology | Ensembl Compara orthologues and paralogues | EMBL-EBI, UK |
| Ensembl Variation | Ensembl Variation data from dbSNP and other sources | EMBL-EBI, UK |
| Ensembl Genomic Features | Ensembl Markers, clones and contigs data | EMBL-EBI, UK |
| Vega | Manually curated human, mouse and zebrafish genes | EMBL-EBI, UK |
| HTGT | High throughput gene targeting/trapping to produce mouse knock-outs | Sanger, UK |
| Gramene | Comparative Grass Genomics | CSHL, USA |
| Reactome | Curated database of biological pathways | CSHL, USA |
| Wormbase | | CSHL, USA |
| Dictybase | Dictyostelium discoideum genome database | Northwestern University, USA |
| RGD | Rat model organism database | Medical College of Wisconsin, USA |
| PRIDE | Proteomic data repository | EMBL-EBI, UK |
| EURATMart | Rat tissue expression compendium | EMBL-EBI, UK |
| MSD | Protein structures | EMBL-EBI, UK |
| Uniprot | Protein sequence and function repository | EMBL-EBI, UK |
| Pancreatic Expression Database | Pancreatic cancer expression database | Barts & The London School of Medicine, UK |
| PepSeeker | Peptide mass spectrometer data for proteomics | University of Manchester, UK |
| ArrayExpress | Microarray data repository | EMBL-EBI, UK |
| GermOnLine | Cross species knowledgebase of genes relevant for sexual reproduction | Biozentrum/SIB, Switzerland |
| DroSpeGe | Annotation of 12 Drosophila genomes | Indiana University, USA |
| HapMap | Catalogue of common human variations in a range of populations | CSHL, USA |
| VectorBase | Invertebrate vectors of human pathogens | University of Notre Dame, USA |
| Paramecium DB | Paramecium tetraurelia model organism database | CNRS, France |
| Eurexpress | Mouse | MRC Edinburgh, UK |
| Europhenome | Mouse phenotype data from high throughput standardized screens | MRC Harwell, UK |
Figure 1. BioMart query showing that the St18 gene is the only mouse gene in the first 10 Mb of chromosome 1 that is annotated as "targeting complete" by the International Mouse Mutagenesis Consortium. This involves: (A) selecting the Ensembl Mus musculus genes dataset, (B) setting the filters, (C) setting the attributes, (D) viewing the results, and (E) adding in the Gene targeting dataset to obtain just the gene that has reached the "targeting complete" status.
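Steps (A) to (C) of this query can also be written as a BioMart XML query document, the form accepted by BioMart's programmatic interfaces. The following is a minimal sketch rather than the authors' implementation: the helper `build_query` is hypothetical, and the dataset name (`mmusculus_gene_ensembl`), filter names (`chromosome_name`, `start`, `end`) and attribute names (`ensembl_gene_id`, `external_gene_name`) are assumptions that should be checked against the target mart's configuration. Step (E), linking the Gene targeting dataset, is not shown.

```python
import xml.etree.ElementTree as ET


def build_query(dataset, filters, attributes):
    """Return a BioMart-style XML query string for a single dataset.

    Hypothetical helper: it only assembles the XML and does not contact
    any server.
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    # (A) select the dataset
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # (B) set the filters that restrict which rows are returned
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": str(value)})
    # (C) set the attributes, i.e. the columns to output
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Assumed names: mouse genes in the first 10 Mb of chromosome 1.
query_xml = build_query(
    dataset="mmusculus_gene_ensembl",
    filters={"chromosome_name": "1", "start": 1, "end": 10_000_000},
    attributes=["ensembl_gene_id", "external_gene_name"],
)
print(query_xml)
```

The resulting string would typically be submitted as the `query` parameter of a mart service request; the service URL depends on the BioMart installation being queried.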
Figure 2. (A) Sequence output options and (B) FASTA output for all the genes found to be up-regulated in a microarray experiment using the Affymetrix HG-U95Av2 probeset. Here, 1000 bp upstream of the first exon have been chosen, along with the Ensembl Gene ID and the chromosomal position of the gene for the FASTA header.
Figure 3. Candidate gene identification using BioMart. (A) The arrhythmogenic right ventricular dysplasia (ARVD) gene was mapped to 14q24. BioMart identifies 172 genes in this region, which may be narrowed down to 67 with expression in the heart. (B) This may be further refined to two candidate genes, ZFP36L1 and TGFB3, by looking for genes involved in organ morphogenesis according to GO, as this condition is known to result in widespread structural abnormalities. The latter gene is now known to be the one involved in this disorder. (C) BioMart may also be used to extract SNPs for the identified genes, including their location in the gene (upstream, downstream, intronic or coding) and, for coding SNPs, whether they result in an amino acid substitution.
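The narrowing in panels (A) and (B) amounts to applying successive filters to one gene list: region first, then tissue expression, then GO annotation. The sketch below illustrates that logic on toy records only. The gene names (`GENE_A` to `GENE_D`), their annotations and the `narrow` helper are hypothetical placeholders, not data from the paper or from any BioMart.

```python
# Toy records: all names and annotations are illustrative placeholders.
genes = [
    {"name": "GENE_A", "band": "14q24", "tissues": {"heart", "liver"}, "go": {"transport"}},
    {"name": "GENE_B", "band": "14q24", "tissues": {"brain"}, "go": {"organ morphogenesis"}},
    {"name": "GENE_C", "band": "14q24", "tissues": {"heart"}, "go": {"organ morphogenesis"}},
    {"name": "GENE_D", "band": "14q11", "tissues": {"heart"}, "go": {"organ morphogenesis"}},
]


def narrow(records, band, tissue, go_term):
    """Apply the three filters in turn and return each intermediate list."""
    in_region = [g for g in records if g["band"] == band]          # region filter
    expressed = [g for g in in_region if tissue in g["tissues"]]   # expression filter
    candidates = [g for g in expressed if go_term in g["go"]]      # GO term filter
    return in_region, expressed, candidates


in_region, expressed, candidates = narrow(genes, "14q24", "heart", "organ morphogenesis")
print(len(in_region), len(expressed), [g["name"] for g in candidates])
```

Keeping the intermediate lists makes it easy to report how many genes survive each step, as the figure does for its region and heart-expression filters.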
Output from the listMarts command of the biomaRt library
Output from the listDatasets command of the biomaRt library
Output from the getBM command of the biomaRt library