Rhoda J Kinsella, Andreas Kähäri, Syed Haider, Jorge Zamora, Glenn Proctor, Giulietta Spudich, Jeff Almeida-King, Daniel Staines, Paul Derwent, Arnaud Kerhornou, Paul Kersey, Paul Flicek.
Abstract
For a number of years the BioMart data warehousing system has proven to be a valuable resource for scientists seeking a fast and versatile means of accessing the growing volume of genomic data provided by the Ensembl project. The launch of the Ensembl Genomes project in 2009 complemented the Ensembl project by utilizing the same visualization, interactive and programming tools to provide users with a means for accessing genome data from a further five domains: protists, bacteria, metazoa, plants and fungi. The Ensembl and Ensembl Genomes BioMarts provide a point of access to the high-quality gene annotation, variation data, functional and regulatory annotation and evolutionary relationships from genomes spanning the taxonomic space. This article aims to give a comprehensive overview of the Ensembl and Ensembl Genomes BioMarts as well as some useful examples and a description of current data content and future objectives. Database URLs: http://www.ensembl.org/biomart/martview/; http://metazoa.ensembl.org/biomart/martview/; http://plants.ensembl.org/biomart/martview/; http://protists.ensembl.org/biomart/martview/; http://fungi.ensembl.org/biomart/martview/; http://bacteria.ensembl.org/biomart/martview/.
Year: 2011 PMID: 21785142 PMCID: PMC3170168 DOI: 10.1093/database/bar030
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Summary of data available at the Ensembl BioMart as of Ensembl release 61
| Data set | Description of data content |
|---|---|
| Ensembl Genes 61 | Genes from 52 species with annotated external references, protein domains, multi species comparison (orthologs, possible orthologs and paralogs), variation (germline and somatic), regulation (probe set mapping for microarray platforms), gene ontology, expression (GNF/Atlas) and transcript splicing event data |
| Ensembl Variation 61 | Variation data for 18 species including human somatic mutation data from COSMIC ( |
| Ensembl Regulation 61 | Regulation data for human, mouse and |
| Vega 41 | Manually curated genes for human, mouse and zebrafish by the HAVANA group at WTSI and displayed in the VEGA database ( |
| Reactome | Manually curated and peer-reviewed pathways from the BioMart ( |
| PRIDE (EBI UK) | Proteomics data from the PRIDE PRoteomics IDEntifications ( |
Summary of data available at the Ensembl Genomes BioMarts as of Ensembl Genomes release 8
| Data set | Description of data content |
|---|---|
| Ensembl Bacteria 8 | 249 genomes across 10 different clades (Gene database) |
| Ensembl Protists 8 | 11 species including |
| Ensembl Fungi 8 | 13 species, including eight Aspergillus species, |
| Ensembl Metazoa 8 | 30 species, including 12 Drosophila, five Caenorhabditis, |
| Ensembl Plants 8 | 10 species, including |
Summary of sources of help and documentation at Ensembl
| Information resource | URL or Email address |
|---|---|
| Ensembl frequently asked questions | |
| BioMart frequently asked questions | |
| Tutorials | |
| YouTube videos | |
| Ensembl news containing information about updates to mart databases | |
| Ensembl Blog | |
| Ensembl archives containing archived BioMart databases | |
| Ensembl helpdesk mailing list | helpdesk@ensembl.org |
| Ensembl Genomes helpdesk mailing list | helpdesk@ensemblgenomes.org |
| Ensembl Genomes portal website containing project information | |
Figure 1. There are 777 Ensembl protein-coding genes that encode the GPCR domain (InterPro ID IPR000276) and that are detectable with the Affy HuGene 1_0 st v1 array 25.
Figure 2. The esv263 structural variation from DGVa occurs between 16 265 092 and 16 446 378 bp on chromosome 12.
Figure 3. There are 100 single nucleotide polymorphisms in the human somatic variation data set associated with tumors in the eye. The list of Ensembl gene IDs containing these variations can be downloaded for further study, or one can click an entry in the Ensembl Gene ID column of the interface, which links to the main Ensembl website.
Figure 4. Five dbSNP rs IDs were used to filter the human variation data set, and the Ensembl gene IDs containing these five variations were selected as attributes. The query was then linked to a second data set, the human gene data set from the Ensembl Genes database, where the HGNC ID and symbol were selected as attributes to retrieve the corresponding gene names from HGNC: FAN1, MTMR10 and EEF1DP3.
Figure 5. The genes in the filtered region were lacA, lacY and lacZ, and the results show that there are no orthologs for the lacZ gene in the E. coli DH10B strain.
Figure 6. The Ensembl gene IDs for the three APL1 genes are retrieved first and then used to filter the A. gambiae data set. Fifty variations were retrieved that lie within the three genes of the APL1 locus.
Figure 7. The ability to retrieve sequence information for genes of interest is a powerful feature of the BioMart tool. Here a user can download the coding sequence for all genes on chromosome 22, together with additional information about each gene, and export the results in a convenient format.

| Data set | Filters | Attributes |
|---|---|---|
| Ensembl Genes 61: | Gene type: protein_coding | Ensembl Gene ID |
| | Limit to genes with these family or domain IDs: IPR000276 | Associated Gene Name |
| | | Affy HuGene 1_0 st v1 |
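
A query like the one above can also be written as a BioMart XML document rather than built in the web interface. The following is a minimal sketch of how such a document might be assembled in Python. The internal names used here (`hsapiens_gene_ensembl`, `biotype`, `interpro`, `ensembl_gene_id`, `external_gene_id`, `affy_hugene_1_0_st_v1`) are assumptions, not taken from the article; the real internal names depend on the mart release and should be looked up in the mart configuration. The helper `build_biomart_query` is hypothetical and only builds the XML text; it does not contact any server.

```python
import xml.etree.ElementTree as ET

XML_PROLOG = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'


def build_biomart_query(dataset, filters, attributes):
    """Build a BioMart XML query string for a single data set.

    dataset    -- internal data set name (assumed, release dependent)
    filters    -- dict mapping internal filter names to values
    attributes -- list of internal attribute names to return
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return XML_PROLOG + ET.tostring(query, encoding="unicode")


# Protein-coding genes with the InterPro domain IPR000276.
# All internal names below are illustrative assumptions.
figure1_xml = build_biomart_query(
    dataset="hsapiens_gene_ensembl",
    filters={"biotype": "protein_coding", "interpro": "IPR000276"},
    attributes=["ensembl_gene_id", "external_gene_id", "affy_hugene_1_0_st_v1"],
)
print(figure1_xml)
```

The resulting string would typically be submitted as the `query` parameter of a mart service endpoint; that step is left out here so the sketch stays independent of any particular server.
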

| Data set | Filters | Attributes |
|---|---|---|
| Ensembl Variation 61: | Limit to variants with these IDs: esv263 | Chromosome Name |
| | | Sequence region start (bp) |
| | | Sequence region end (bp) |
| | | Structural Variation Name |
| | | Structural Variation Description |
| | | Source Name |

| Data set | Filters | Attributes |
|---|---|---|
| Ensembl Variation 61: | Phenotype: COSMIC: tumor_site:eye | Variation ID |
| | | Chromosome name |
| | | Position on Chromosome (bp) |
| | | Allele |
| | | Phenotype description |
| | | Associated gene |
| | | Ensembl Gene ID |

| Data set | Filters | Attributes |
|---|---|---|
| Ensembl Variation 61: | Limit to variants with these IDs dbSNP rs IDs: rs348, rs362, rs364, rs565, rs645 | Variation ID |
| | | Chromosome name |
| | | Position on chromosome (bp) |
| | | Ensembl Gene ID |
| Ensembl Genes 61: | | HGNC ID |
| | | HGNC symbol |
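
The linked query above, in which a variation data set is joined to a gene data set, can be sketched as a single BioMart XML document containing two `Dataset` elements. This is an illustration under stated assumptions: the internal names (`hsapiens_snp`, `snp_filter`, `refsnp_id`, `chr_name`, `chrom_start`, `ensembl_gene_stable_id`, `hsapiens_gene_ensembl`, `hgnc_id`, `hgnc_symbol`) are guesses that vary between releases, and `build_linked_query` is a hypothetical helper that only produces the XML text.

```python
import xml.etree.ElementTree as ET

XML_PROLOG = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'


def build_linked_query(datasets):
    """Build one BioMart XML query covering several linked data sets.

    datasets -- list of dicts with keys "name", optional "filters"
                (dict) and optional "attributes" (list).
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    for spec in datasets:
        ds = ET.SubElement(query, "Dataset",
                           {"name": spec["name"], "interface": "default"})
        for name, value in spec.get("filters", {}).items():
            ET.SubElement(ds, "Filter", {"name": name, "value": value})
        for name in spec.get("attributes", []):
            ET.SubElement(ds, "Attribute", {"name": name})
    return XML_PROLOG + ET.tostring(query, encoding="unicode")


# The five dbSNP rs IDs come from the query specification above;
# every internal data set, filter and attribute name is an assumption.
rs_ids = ["rs348", "rs362", "rs364", "rs565", "rs645"]
linked_xml = build_linked_query([
    {
        "name": "hsapiens_snp",
        "filters": {"snp_filter": ",".join(rs_ids)},
        "attributes": ["refsnp_id", "chr_name", "chrom_start",
                       "ensembl_gene_stable_id"],
    },
    {
        "name": "hsapiens_gene_ensembl",
        "attributes": ["hgnc_id", "hgnc_symbol"],
    },
])
print(linked_xml)
```

Keeping each data set as its own dictionary mirrors the way the interface treats the first and second data set separately, with the filter applied only to the variation data set.
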

| Data set | Filters | Attributes |
|---|---|---|
| Ensembl Bacteria Bacterial Mart (Release 8): | Gene start (bp): 360473 | Ensembl Gene ID |
| | Gene end (bp): 365601 | Ensembl Transcript ID |
| | | Associated Gene Name |
| | | Escherichia coli DH10B Ensembl Gene ID |
| | | Escherichia coli DH10B Chromosome Start (bp) |
| | | Escherichia coli DH10B Chromosome End (bp) |
| | | Escherichia coli O157:H7 EC4115 Ensembl Gene ID |
| | | Escherichia coli O157:H7 EC4115 Chromosome Start (bp) |
| | | Escherichia coli O157:H7 EC4115 Chromosome End (bp) |

| Data set | Filters | Attributes |
|---|---|---|
| Ensembl Metazoa Metazoa Variation Mart (release 8): | Ensembl Gene IDs: AGAP007035 AGAP007036 AGAP007033 | Variation ID |
| | | Chromosome name |
| | | Position on Chromosome (bp) |
| | | Allele |
| | | dbSNP rsID |
| | | Strain Name |
| | | Strain Genotype |
| | | Ensembl Gene ID |
| | | Biotype |

| Data set | Filters | Attributes |
|---|---|---|
| Ensembl Genes 61: | Chromosome 22 | Coding sequence |
| | | Ensembl Gene ID |
| | | Associated Gene Name |
| | | Associated Gene DB |
| | | Gene Start (bp) |
| | | Gene End (bp) |
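
Sequence results such as the coding sequences requested above are usually exported as FASTA. The sketch below shows one way such a file might be read back into records, assuming that each header line carries the selected attributes separated by `|` in the order given by the caller. That header layout is an assumption about the export format, `parse_fasta_export` is a hypothetical helper, and the two records in `sample` are made-up placeholders, not real Ensembl data.

```python
def parse_fasta_export(text, fields):
    """Parse FASTA text whose headers are '|'-separated attribute values.

    text   -- FASTA content as a single string
    fields -- attribute names, in the order they appear in each header
    Returns a list of dicts, one per record, each with a 'sequence' key.
    """
    records = []
    header, chunks = None, []

    def flush():
        # Store the record collected so far, if any.
        if header is not None:
            record = dict(zip(fields, header.split("|")))
            record["sequence"] = "".join(chunks)
            records.append(record)

    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    flush()
    return records


# Placeholder input: identifiers, coordinates and sequences are invented
# purely to exercise the parser.
sample = """>GENE_A|nameA|100|200
ATGAAA
TAG
>GENE_B|nameB|300|400
ATGCCCTGA
"""

fields = ["ensembl_gene_id", "gene_name", "gene_start", "gene_end"]
for rec in parse_fasta_export(sample, fields):
    print(rec["ensembl_gene_id"], rec["gene_name"], len(rec["sequence"]))
```

Sequences that wrap over several lines are joined into one string, so downstream code can work with a single sequence per gene regardless of the line width used in the export.
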