P Flicek, B L Aken, K Beal, B Ballester, M Caccamo, Y Chen, L Clarke, G Coates, F Cunningham, T Cutts, T Down, S C Dyer, T Eyre, S Fitzgerald, J Fernandez-Banet, S Gräf, S Haider, M Hammond, R Holland, K L Howe, K Howe, N Johnson, A Jenkinson, A Kähäri, D Keefe, F Kokocinski, E Kulesha, D Lawson, I Longden, K Megy, P Meidl, B Overduin, A Parker, B Pritchard, A Prlic, S Rice, D Rios, M Schuster, I Sealy, G Slater, D Smedley, G Spudich, S Trevanion, A J Vilella, J Vogel, S White, M Wood, E Birney, T Cox, V Curwen, R Durbin, X M Fernandez-Suarez, J Herrero, T J P Hubbard, A Kasprzyk, G Proctor, J Smith, A Ureta-Vidal, S Searle.
Abstract
The Ensembl project (http://www.ensembl.org) is a comprehensive genome information system featuring an integrated set of genome annotation, databases and other information for chordate and selected model organism and disease vector genomes. As of release 47 (October 2007), Ensembl fully supports 35 species, with preliminary support for six additional species. New species in the past year include platypus and horse. Major additions and improvements to Ensembl since our previous report include extensive support for functional genomics data in the form of a specialized functional genomics database, genome-wide maps of protein-DNA interactions and the Ensembl regulatory build; support for customization of the Ensembl web interface through the addition of user accounts and user groups; and increased support for genome resequencing. We have also introduced new comparative genomics-based data mining options and report on the continued development of our software infrastructure.
Year: 2007 PMID: 18000006 PMCID: PMC2238821 DOI: 10.1093/nar/gkm988
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The Ensembl regulatory build and GERP conservation track. A 90 kb region of human chromosome 10 showing Ensembl regulatory features in blue, green and grey on the bottom track and the GERP conservation track at the top. Note the overlap of gene-associated regulatory features with the start regions of both Ensembl transcripts and EST transcripts, suggesting a complex transcriptional environment. The conservation track is a composite track that displays the constrained elements by default and both constrained elements and the GERP scores when expanded.
Figure 2. ChIP-chip display. Histone 3 lysine trimethylation data from mouse embryonic fibroblast cells (29) on mouse chromosome 17 in the region of the Q3UNB7 (ENSMUSG00000073442) and A630033E08Rik (ENSMUSG00000059142) genes. The Histone modifications display is a composite track that combines raw enrichment values and peak identifications. Displays that encompass large regions of the genome include only the identified peak regions.
Figure 3. SequenceAlignView. A full screenshot of a region on mouse chromosome 8 displaying available resequencing data from the 129S1/SvImJ, 129X1/SvJ and A/J laboratory mouse strains (the 129S1/SvImJ strain is marked as having no data in the region). The top panel of the page offers numerous display options, which allow the user to choose any region of the genomes and to highlight Ensembl annotations, locations of known SNPs and other information. The resequencing alignment in the bottom panel identifies exons in red and SNPs in yellow. Links to individual variations are provided to the right of the resequencing alignment.
Figure 4. DAS Visualizations. A 19 Mb region of human chromosome 11 showing identical data displayed with (from top to bottom) the colour gradient, histogram and tiling array ‘wiggle’ formats. The colour gradient format transitions from yellow (low values) to blue (high values). The histogram display format supports merged data in bins across the genome; the display value can be set to either the average of the bin (shown here) or the maximum value in the bin to achieve greater data contrast. In the histogram format, the lowest value in the data set becomes the baseline. The tiling array format allows for the display of both positive and negative values; where data points overlap, the maximum data point is displayed. All three display formats support in-line data normalization. The ideal format will depend on the data to be displayed. These example data are P-values from a genome-wide association study (30).
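The histogram binning described in the Figure 4 caption can be illustrated with a minimal Python sketch. It is not Ensembl's implementation: the function names `bin_values` and `rebase`, the fixed-width bins and the `(position, value)` input format are all assumptions made for illustration. Taking the baseline from the binned display values is likewise one reading of "the lowest value in the data set becomes the baseline".

```python
from statistics import mean


def bin_values(points, bin_size, mode="average"):
    """Merge (position, value) points into fixed-width bins (hypothetical helper).

    mode="average" displays the mean of each bin; mode="max" displays the
    largest value in the bin, which gives greater contrast.
    Returns a dict mapping bin start position -> display value.
    """
    if mode not in ("average", "max"):
        raise ValueError("mode must be 'average' or 'max'")
    bins = {}
    for pos, val in points:
        start = (pos // bin_size) * bin_size  # left edge of the bin holding pos
        bins.setdefault(start, []).append(val)
    summarise = mean if mode == "average" else max
    return {start: summarise(vals) for start, vals in sorted(bins.items())}


def rebase(binned):
    """Shift display values so the lowest one sits on the baseline (height 0).

    Assumption: the baseline is taken from the binned values, not the raw data.
    """
    baseline = min(binned.values())
    return {start: value - baseline for start, value in binned.items()}


# Illustrative data only: four made-up points merged into 100 bp bins.
points = [(10, 1.0), (20, 3.0), (110, 2.0), (150, 6.0)]
averaged = bin_values(points, 100, mode="average")
peaks = bin_values(points, 100, mode="max")
heights = rebase(averaged)
```

With the same input, `mode="max"` keeps the strongest signal in each bin, whereas `mode="average"` smooths it, which matches the trade-off in contrast that the caption describes.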