E Birney, D Andrews, M Caccamo, Y Chen, L Clarke, G Coates, T Cox, F Cunningham, V Curwen, T Cutts, T Down, R Durbin, X M Fernandez-Suarez, P Flicek, S Gräf, M Hammond, J Herrero, K Howe, V Iyer, K Jekosch, A Kähäri, A Kasprzyk, D Keefe, F Kokocinski, E Kulesha, D London, I Longden, C Melsopp, P Meidl, B Overduin, A Parker, G Proctor, A Prlic, M Rae, D Rios, S Redmond, M Schuster, I Sealy, S Searle, J Severin, G Slater, D Smedley, J Smith, A Stabenau, J Stalker, S Trevanion, A Ureta-Vidal, J Vogel, S White, C Woodwark, T J P Hubbard.
Abstract
The Ensembl (http://www.ensembl.org/) project provides a comprehensive and integrated source of annotation of large genome sequences. Over the last year the number of genomes available from the Ensembl site has increased by 4 to 19, with the addition of the mammalian genomes of Rhesus macaque and Opossum, the chordate genome of Ciona intestinalis and the import and integration of the yeast genome. The year has also seen extensive improvements to both data analysis and presentation, with the introduction of a redesigned website, the addition of RNA gene and regulatory annotation and substantial improvements to the integration of human genome variation data.
Year: 2006 PMID: 16381931 PMCID: PMC1347495 DOI: 10.1093/nar/gkj133
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Number of genes of different classes for selected species
| Species | Protein-coding genes | miRNAs | Other ncRNAs |
|---|---|---|---|
| Human | 22 218 | 222 | 3353 |
| Mouse | 25 613 | 221 | 1353 |
| Rat | 21 952 | 208 | 1728 |
| Dog | 18 201 | 209 | 2059 |
Variation in the numbers of protein-coding genes reflects differences in cDNA resources and genome assembly quality. For example, mouse cDNA resources contain a significant amount of unscreened repeat contamination. ncRNA numbers vary widely between species, whereas miRNA numbers are fairly constant (see text).
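The contrast between miRNA and other ncRNA counts can be made concrete with a short calculation. The following is only an illustrative sketch: the counts are copied from the table above, while the max/min ratio is an assumed, simple measure of spread chosen for this example, not a statistic used by the authors.

```python
# Counts copied from the table above: species -> (miRNAs, other ncRNAs).
counts = {
    "Human": (222, 3353),
    "Mouse": (221, 1353),
    "Rat": (208, 1728),
    "Dog": (209, 2059),
}


def max_min_ratio(values):
    """Return the largest value divided by the smallest (1.0 means no spread)."""
    values = list(values)
    return max(values) / min(values)


# Spread of each gene class across the four species.
mirna_ratio = max_min_ratio(mirna for mirna, _ in counts.values())
ncrna_ratio = max_min_ratio(ncrna for _, ncrna in counts.values())

print(f"miRNA max/min ratio: {mirna_ratio:.2f}")
print(f"other ncRNA max/min ratio: {ncrna_ratio:.2f}")
```

On these four species the miRNA counts differ by about 7% between the largest and smallest value (ratio 1.07), whereas the other ncRNA counts differ by a factor of roughly 2.5 (ratio 2.48).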
Figure 1. The progressive improvement in the quality of human and mouse gene builds by comparison to curated protein and mRNA reference sequences is shown. The column legends indicate the species, reference dataset and assembly release number. UniSw indicates the Swiss-Prot (curated) part of UniProt. RefSeq indicates the curated part of RefSeq (i.e. excluding XP entries). Identical trends are seen in all four comparisons of human and mouse against UniSw and RefSeq. The four colours indicate the quality of the match to the reference dataset: blue indicates an exact match; maroon indicates matched ends with some internal mismatch/indel; yellow indicates an incomplete match and green indicates reference sequences that are missing from the gene build. There are multiple reasons for this improvement, including improvements in assembly quality, cDNA resources and algorithmic improvements to the gene build.
Figure 2. A screenshot of the new alignslice view that is enabled by the multiple genome alignment. The top panel shows the human, rat and mouse genomes around the BRCA2 locus. The lower panel shows the base-pair alignment at the end of an exon (highlighted in the top panel by the central red box on human). In the base-pair view, exonic bases are blue and intronic bases are pink, with darker shades indicating conservation. Exon boundaries are highlighted with a red inverted L and SNPs are shown in red.
Figure 3. The integration between Ensembl and the DAS protein 3D structure viewer SPICE is shown. The proteinview page of Ensembl shows the beta-globin gene HBB on chromosome 11. One of the non-synonymous SNPs is the sickle cell mutation at residue 7 (glutamic acid to valine). The PDB_spice DAS track shows a link to the PDB entry 1A3N chain B. In the SPICE window, which was opened by clicking on this track, the four-chain structure of haemoglobin is shown on the left. The DAS annotations for the selected chain (B) are shown on the right. The uniprot_exon SNP DAS source is selected and the six SNPs are highlighted in the sequence of the chain (bottom right) and shown in the structure (dark green side chains with yellow highlights). Holding the mouse over residues in the structure panel shows the position of residue 7. Ensembl exposes its precalculated alignments between UniProt and Ensembl gene annotation as DAS sources (uniprot_exon).