
Ensembl's 10th year.

Paul Flicek, Bronwen L Aken, Benoit Ballester, Kathryn Beal, Eugene Bragin, Simon Brent, Yuan Chen, Peter Clapham, Guy Coates, Susan Fairley, Stephen Fitzgerald, Julio Fernandez-Banet, Leo Gordon, Stefan Gräf, Syed Haider, Martin Hammond, Kerstin Howe, Andrew Jenkinson, Nathan Johnson, Andreas Kähäri, Damian Keefe, Stephen Keenan, Rhoda Kinsella, Felix Kokocinski, Gautier Koscielny, Eugene Kulesha, Daniel Lawson, Ian Longden, Tim Massingham, William McLaren, Karine Megy, Bert Overduin, Bethan Pritchard, Daniel Rios, Magali Ruffier, Michael Schuster, Guy Slater, Damian Smedley, Giulietta Spudich, Y Amy Tang, Stephen Trevanion, Albert Vilella, Jan Vogel, Simon White, Steven P Wilder, Amonida Zadissa, Ewan Birney, Fiona Cunningham, Ian Dunham, Richard Durbin, Xosé M Fernández-Suarez, Javier Herrero, Tim J P Hubbard, Anne Parker, Glenn Proctor, James Smith, Stephen M J Searle.

Abstract

Ensembl (http://www.ensembl.org) integrates genomic information for a comprehensive set of chordate genomes with a particular focus on resources for human, mouse, rat, zebrafish and other high-value sequenced genomes. We provide complete gene annotations for all supported species in addition to specific resources that target genome variation, function and evolution. Ensembl data is accessible in a variety of formats including via our genome browser, API and BioMart. This year marks the tenth anniversary of Ensembl and in that time the project has grown with advances in genome technology. As of release 56 (September 2009), Ensembl supports 51 species including marmoset, pig, zebra finch, lizard, gorilla and wallaby, which were added in the past year. Major additions and improvements to Ensembl since our previous report include the incorporation of the human GRCh37 assembly, enhanced visualisation and data-mining options for the Ensembl regulatory features and continued development of our software infrastructure.

Year:  2009        PMID: 19906699      PMCID: PMC2808936          DOI: 10.1093/nar/gkp972

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

On 27 January 2000, the Ensembl project announced the completion of ‘Ensembl Milestone 1’, the first complete release of the project’s data and web interface. The release included gene predictions and repeat annotations across all human DNA sequence available at that time (both finished and draft) and provided supporting evidence for the gene predictions including protein homology matches. All data was accessible through the Ensembl web site and via FTP download. In the email that announced Milestone 1, we noted that work on Ensembl was still in progress and promised significant improvements in the coming months. Ten years later, Ensembl provides a much larger and more complete genomic information resource in support of dozens of genomes including gene sets, multi-species alignments, annotations of orthologous and paralogous genes, and extensive variation and regulatory information. All data is available through a variety of visual and programmatic interfaces including the Ensembl Genome Browser, the Perl API and BioMart. We also provide a complete copy of all data and code to be used freely by the community. Ensembl works closely with a number of other fundamental bioinformatics projects that provide resources to the wider research community to ensure data consistency and increase resource interconnectedness. Some of these projects include the Genome Browser at UCSC (1), the databases and resources of the NCBI (2), the Rat Genome Database (RGD) (3) and VEGA (4). In this article, we provide a general overview of the data available within Ensembl and highlight some of the features and developments that have been introduced since our last report (5). Ensembl is comprehensively updated approximately five times each year and details of the new and updated data in each release are always provided on the Ensembl news pages linked from http://www.ensembl.org. 
Additionally, we provide more immediate information on the Ensembl blog at http://ensembl.blogspot.com/ as well as through the low volume Ensembl announce mailing list, which is open to all. To subscribe to the list, send an email to majordomo@ebi.ac.uk with the text ‘subscribe ensembl-announce’ as the message body.

RESULTS

Over the past year, we introduced seven new species including the anole lizard (Anolis carolinensis), the first reptile in Ensembl. Other species included the two-toed sloth (Choloepus hoffmanni), white-tufted-ear marmoset (Callithrix jacchus), the pig (Sus scrofa), the Tammar wallaby (Macropus eugenii), the zebra finch (Taeniopygia guttata) and the Western lowland gorilla (Gorilla gorilla). Of these, the anole lizard, zebra finch, marmoset and pig were high coverage genome assemblies based on ∼4–6 times coverage from Sanger-style sequencing reads and gorilla was the first example of an assembly that combined traditional Sanger-style sequencing at low coverage with high-throughput short-read sequencing at high coverage. Ensembl now fully supports a total of 24 high-coverage chordate genomes and 23 low-coverage chordate genomes. The lamprey (Petromyzon marinus), another high-coverage chordate genome, is currently provided with preliminary support only. An additional three non-chordate species (Saccharomyces cerevisiae, Caenorhabditis elegans and Drosophila melanogaster) are included to facilitate comparative analysis. As of release 56 (September 2009), we transferred support of two mosquito species from Ensembl to our sister project Ensembl Genomes (http://www.ensemblgenomes.org). Both aedes (Aedes aegypti) and anopheles (Anopheles gambiae) will continue to be available through Ensembl Metazoa (http://metazoa.ensembl.org).

Gene annotation

In addition to the newly supported species listed above, for each of which we released a comprehensive gene set, we released new gene sets for a number of other species. In general new gene sets are released in conjunction with each new genome assembly. Our largest single effort over the past year has been in support of the GRCh37 human assembly, which was released by the Genome Reference Consortium in early 2009 (http://www.genomereference.org). This new build includes a long list of genomic regions that have been assessed for accuracy and updated where necessary. To support projects such as ENCODE and the 1000 Genomes, we will continue to provide complete resources for the NCBI36 human assembly in the form of an enhanced Ensembl archive site including BLAT/BLAST sequence search and other features not present in standard archive sites. The site, http://ncbi36.ensembl.org, will remain active until at least Summer 2010 when, depending on usage, we intend to provide support for the NCBI36 assembly only in the form of a typical Ensembl archive site. With the new GRCh37 assembly, a larger fraction of Ensembl genes correspond to RefSeq (6) and UniProt (7) entries suggesting continuing convergence of all of these resources [Figure 1 and compare with Figure 1 in Birney et al. (8)]. The improved convergence level is the result of at least three components: First, the genome assembly has improved. Second, the Ensembl gene build strategy has improved including the development of a combined Ensembl/Havana merged gene set (5), which increased the number of protein-coding transcripts. Third, the other resources (i.e. RefSeq and UniProtKB) have themselves independently improved their quality and internal consistency.
Figure 1.

The convergence of the Ensembl gene set and the UniProt and RefSeq resources shown over time. Three versions of Ensembl (release 44 in April 2007, release 47 in October 2007 and release 55 in July 2009) are each compared to the data available from Swiss-Prot/UniProtKB, NCBI RefSeq Proteins and NCBI RefSeq mRNAs. The colours on the bars represent the fraction of the Ensembl entries that perfectly match the entries in the other resources (blue); have matching edges and an internal mismatch or indel (red); have a substantial, but incomplete match (green); or are missing (purple).

Ensembl, in partnership with NCBI, UCSC and the Havana project, continues to play an active role in the CCDS consortium (9). As of Ensembl release 56 (September 2009), 19 851 Ensembl translations match human CCDS consensus coding region structures exactly, and 17 679 Ensembl translations match mouse CCDS structures exactly. In last year’s report, we described in detail the creation of the extensively supported human and mouse gene sets through the merging of the Ensembl and Havana gene sets. These efforts continue in the context of the GENCODE project and have culminated in the Ensembl release 56 gene set becoming the GENCODE gene set (release 3c). Beyond human and mouse, we released a new gene set in support of the Zv8 assembly of the zebrafish genome, which incorporates many of the new methods applied in the human and mouse builds. For the rat genome, we released a completely updated gene set using the previous assembly to incorporate the significant additional supporting information that had become available since our previous gene set was created. A number of other species, including horse and cow, received relatively minor updates. Ensembl also incorporated the data formerly held in the Alternative Splicing and Transcript Diversity (ASTD) database as part of the planned decommissioning of this database and consolidation of genomic annotation data (10).

Functional genomics and regulatory information

We have continued development of the Ensembl Regulatory Build that has been briefly described in our previous reports (5,11). Over the past year we released two updates to the set of human Ensembl regulatory features and continued our focus on CD4+ T-cells by incorporating additional histone modification data from Wang et al. (12). We also released the first version of the mouse regulatory build focused on embryonic stem (ES) cells based in part on data from Mikkelsen et al. (13). Additionally, Ensembl regulatory features and their supporting data such as sites of DNase I hypersensitivity and selected histone modifications are now available via BioMart to facilitate efficient data mining of the regulatory features. Finally, in Ensembl release 56 (September 2009), we launched a dedicated visualisation of the regulatory features in the form of a Regulation Tab at the top of the page (Figure 2). The view currently provides information about the supporting features that are used to automatically assign a preliminary regulatory function to genome regions. Our regulatory feature view will be an important area of focus over the next 12 months as we incorporate data being produced by the ENCODE project.
Figure 2.

Ensembl regulatory feature: ENSR00000131372, a promoter-associated feature located on human chromosome 6 shown with the anchoring DNase I hypersensitivity sites and supporting histone modification data.


Ensembl software and code base

As mentioned above, the Ensembl code base is being reused within the Ensembl Genomes project, which seeks to extend the Ensembl infrastructure across the taxonomic space. A number of updates to the core Ensembl infrastructure were necessary to support specific needs of Ensembl Genomes that had not been previously required by Ensembl. For example, the Ensembl core databases now support multiple species within a single core database as well as provide preliminary support for alternative transcription initiation. Support for operons is planned for the near future. The Registry component of the Ensembl API, which allows users to automatically configure database connections and other behaviours of the API, was redesigned to support connections to multiple database servers in different physical locations.
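The idea of a registry that resolves a species and database type to one of several servers can be illustrated with a minimal Python sketch. This is not the Ensembl Perl API: the class names `ToyRegistry` and `DbLocation`, the `species_id` field and the `example.org` host names are all hypothetical, chosen only to show how one lookup table can span servers in different locations and databases that hold more than one species.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DbLocation:
    """Where one species' data lives (all fields are illustrative)."""
    host: str
    port: int
    dbname: str
    species_id: int = 1  # assumption: distinguishes species sharing one database


class ToyRegistry:
    """Hypothetical registry mapping (species, group) to a database location."""

    def __init__(self):
        self._locations = {}

    def register(self, species, group, location):
        # Keys are case-insensitive so "Homo_sapiens" and "homo_sapiens" agree.
        self._locations[(species.lower(), group.lower())] = location

    def locate(self, species, group):
        try:
            return self._locations[(species.lower(), group.lower())]
        except KeyError:
            raise LookupError(f"no {group} database registered for {species}") from None


registry = ToyRegistry()
# Two database types for the same species on servers in different places.
registry.register("homo_sapiens", "core",
                  DbLocation("db-eu.example.org", 3306, "toy_human_core"))
registry.register("homo_sapiens", "variation",
                  DbLocation("db-us.example.org", 3306, "toy_human_variation"))
# Two species stored in a single multi-species database.
registry.register("species_a", "core",
                  DbLocation("db-eu.example.org", 3306, "toy_collection_core", species_id=1))
registry.register("species_b", "core",
                  DbLocation("db-eu.example.org", 3306, "toy_collection_core", species_id=2))
```

A caller asks only for a species and a group, for example `registry.locate("Homo_sapiens", "variation")`, and never needs to know which server answers.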

Improved data mining and analysis resources

Ensembl calculates and provides a number of key results that are useful for data integration and analyses. One of the most widely used and important examples is the identification of external references (x-refs), which was completely refactored this year. Ensembl’s x-refs associate external database identifiers with Ensembl gene and transcript IDs and serve to enable data connections between Ensembl and biological databases such as UniProt, EMBL and RefSeq. Several x-ref assignment methods are used as described here. Direct x-refs are those where a straightforward mapping between the Ensembl ID and the external ID already exists, such as when the assignment is done by the external resource. Primary x-refs are assigned by sequence matching using Exonerate (14) between the Ensembl DNA or peptide sequences and those in the external resources. Dependent x-refs are inferred from primary x-refs where the source database references other identifiers. Finally, a class of defined priority x-refs allows for prioritisation of sources that may provide several references for the same external identifier. We have redesigned the Ensembl Ontology database and API to make access to ontology data more consistent and straightforward. For example, Gene Ontology (GO) terms (15) and their relationships to each other are now stored in a more generic and hierarchical manner; this allows more flexible querying and the ability to perform transitive closures on GO terms, which was not possible before. GO slims (http://www.geneontology.org/GO.slims.shtml) are now also supported. The Ensembl BioMart provides access to most of our data resources in a way that facilitates the creation of complex database queries in a relatively simple manner (16). In addition to its availability through the Ensembl web site, the Ensembl BioMart is also available from the main BioMart Portal (17).
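A transitive closure over ontology terms can be sketched as follows. This is a generic Python illustration, not the Ensembl Ontology API or its database schema: the function names and the toy term identifiers are hypothetical, and the input is assumed to be an acyclic graph given as a mapping from each term to its direct parents.

```python
def transitive_closure(parents):
    """Map every term to the set of all its ancestors.

    `parents` maps a term to the set of its direct parents and is
    assumed to describe a directed acyclic graph (a cycle would
    recurse without end).
    """
    closure = {}

    def ancestors(term):
        if term in closure:          # already computed, reuse it
            return closure[term]
        result = set()
        for parent in parents.get(term, ()):
            result.add(parent)
            result |= ancestors(parent)
        closure[term] = result
        return result

    for term in list(parents):
        ancestors(term)
    return closure


def genes_annotated_to(term, gene_annotations, closure):
    """Genes annotated to `term` directly or through any descendant term."""
    return sorted(
        gene
        for gene, terms in gene_annotations.items()
        if any(t == term or term in closure.get(t, set()) for t in terms)
    )


# Toy ontology: one leaf term with two parents that share a root.
TOY_PARENTS = {
    "term:leaf": {"term:mid_a", "term:mid_b"},
    "term:mid_a": {"term:root"},
    "term:mid_b": {"term:root"},
}
CLOSURE = transitive_closure(TOY_PARENTS)
```

With the closure precomputed, a query such as "all genes annotated to this term or any more specific term" becomes a set-membership test instead of a repeated walk up the hierarchy.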

Variation

Ensembl’s variation data resources continue to be dominated by data imported from dbSNP (2). Over the past year we integrated data from the 1000 Genomes Project that was incorporated into dbSNP 130 and created initial SNP sets for orangutan and zebra finch in conjunction with the release of the gene sets for these species. In the variation web display, all SNPs are now provided with phylogenetic context if they map to a region included in one of Ensembl’s multiple alignments. The phylogenetic context includes ancestral sequence reconstructions from Ortheus (18), allowing users to look at the ancestral alleles in an evolutionary context. By integrating data from the NHGRI curated catalogue of SNP-trait associations (19) in addition to data provided by the European Genome-phenome Archive (EGA), we have assigned annotations to over 1100 SNPs found to be associated with nearly 200 phenotypes and provided links to the published evidence. These annotations can be found on the corresponding variation page and are available through the general Ensembl search interface by searching with the phenotype name.

Web

In our last report we extensively described the fourth major design of the Ensembl web site, which was formally launched as a part of Ensembl Release 51 (November 2008) (5). With nearly 12 months of experience of the new site, we have made a number of comparatively minor changes aimed at continual performance increases and reimplementation of some displays not included in the initial release of the new web code. Performance improvements included the implementation of nginx and memcached to improve server responsiveness. Visualisations reintroduced over the course of the year included multi-species comparison and alignment views that incorporate extensive contextual annotations such as genes, repeats and other features from each of the aligned species. We have also completed major changes to the Ensembl drawing code, which now allows tracks to be configured via entries in the relevant database instead of in separate static files. Finally, we have improved the ability for users to find the specific tracks that they want to display by incorporating a search box into the AJAX control panel that provides centralised page configuration. In parallel with the new web design, we have implemented more comprehensive monitoring of Ensembl’s performance at numerous locations around the world. We have deployed a fully functioning Ensembl mirror site to a physical location in California. This site is available at http://uswest.ensembl.org and provides an up-to-date mirror site fully monitored and maintained by Ensembl. Our tests show that users in North America and the Pacific Rim will experience faster response times from the US mirror compared to our main site in the United Kingdom and we will automatically offer users from these areas the ability to use our US mirror site as their default Ensembl. Other public Ensembl mirror sites are maintained by the user community with support from the project. 
Those users who take advantage of our user accounts to share settings and save sessions across multiple computers may find that the main site continues to provide faster performance due to the necessity of maintaining a single database with settings and sessions. To address the continual growth of the size of biological databases, we have begun testing full Ensembl installations in commercial cloud computing environments. Ensembl is also currently provided as one of the free Public Data Sets on Amazon Web Services that can be integrated into any cloud based application on AWS.

Comparative genomics resources

As the number of species increases within Ensembl, our comparative genomics resources become more valuable as information sources for highly-used genomes such as human, mouse and rat. They also serve as a way to connect all aspects of the project. One of the biggest challenges this year has been the update of the pairwise and multiple alignments to support the release of the GRCh37 human assembly. We also updated the comprehensive 31-way multi-species alignment (MSA) to include all of the low coverage mammalian genome sequences and now provide BED files for human and mouse constrained elements as determined by alignments of placental mammals. The recently published Enredo-Pecan-Ortheus (EPO) pipeline is at the heart of Ensembl’s MSA computations (18,20,21). Ensembl GeneTrees are the result of a comprehensive analysis to predict phylogeny in vertebrates and have recently been described in detail (22). The latest improvements include the use of the meta-aligner M-Coffee (23) and incorporation of information about exon boundaries into the alignments. We now restrict our calculation of pairwise dN/dS values such that they are only calculated for high-coverage species pairs, as we found the results to be more accurate. The current GeneTree pipeline is more robust to large gene clusters, which must be built into separate trees for computational reasons. We now annotate genes in separate trees that come from the same large cluster as distant within-species paralogues. We also annotate gene-split events (which may be real or an artefact of an assembly problem) by analysing the protein multiple alignments: when two proteins of the same species do not overlap in the alignment, we label them the result of a gene-split event. Visually, we added clade-specific colours to the GeneTree view to help with the interpretation of the trees. It is also possible to hide or collapse genes from pre-defined clades or from the low-coverage genomes.
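The gene-split rule stated above (two proteins of the same species that occupy no common column of the protein multiple alignment) can be sketched in Python. This is a minimal illustration under stated assumptions, not the GeneTree pipeline code: the function names, the `(species, protein_id, aligned_sequence)` tuple layout, the `-` gap character and the toy sequences are all hypothetical, and the real pipeline may apply further criteria.

```python
from itertools import combinations


def aligned_columns(aligned_seq, gap="-"):
    """Indices of alignment columns where the sequence has a residue."""
    return {i for i, ch in enumerate(aligned_seq) if ch != gap}


def is_gene_split_candidate(seq_a, seq_b, gap="-"):
    """True when two aligned sequences share no residue-bearing column."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return aligned_columns(seq_a, gap).isdisjoint(aligned_columns(seq_b, gap))


def gene_split_pairs(alignment, gap="-"):
    """Pairs of same-species proteins that do not overlap in the alignment.

    `alignment` is a list of (species, protein_id, aligned_sequence) tuples.
    """
    pairs = []
    for (sp1, id1, s1), (sp2, id2, s2) in combinations(alignment, 2):
        if sp1 == sp2 and is_gene_split_candidate(s1, s2, gap):
            pairs.append((id1, id2))
    return pairs


# Toy alignment: P1 and P2 cover opposite halves of the mouse protein P3,
# while P4 overlaps both of them.
TOY_ALIGNMENT = [
    ("human", "P1", "MKTA----"),
    ("human", "P2", "----LLVQ"),
    ("mouse", "P3", "MKTALLVQ"),
    ("human", "P4", "--TALL--"),
]
SPLITS = gene_split_pairs(TOY_ALIGNMENT)
```

In the toy alignment only P1 and P2 are flagged: they belong to the same species and their residues fall in disjoint columns, the pattern expected when one gene has been annotated as two fragments.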

Outreach and user support

Ensembl has an extensive commitment to user support, outreach and training. Provided courses include browser-focused workshops introducing Ensembl to users who have never visited the site before; in-depth meetings attended by developers who are building bioinformatics applications based on the Ensembl code base; and courses for clinical users interested in leveraging the Ensembl resources to help understand connections between genotype and phenotype. We also participate in regular training courses such as EBI Road Shows and Wellcome Trust Open Door Workshops that incorporate information from many of the resources developed and hosted on the Wellcome Trust Genome Campus. We aim to provide on-site training for as many of our users across the world as possible and have recently conducted training events in Europe, North and South America, Asia, Africa and the Middle East. We invite users interested in scheduling training to contact the Ensembl helpdesk at helpdesk@ensembl.org. For those users unable to attend a workshop in person, we are developing an extensive video library of tutorials. Our current selection is available through the Ensembl YouTube channel at http://www.youtube.com/user/EnsemblHelpdesk.

Future directions

In last year’s report, we described some of the ways that we are adapting Ensembl to the data generated by the current generation of high-throughput sequencing machines (5). We continued this theme in this report with the annotation of the first genome assembly created from combined traditional long read and next-generation short read technologies. Next year we expect to release gene sets on genome assemblies created entirely with next generation sequencing data. For a number of species, we also plan to create gene sets that incorporate short read transcriptomic data, which have shown considerable potential to increase the accuracy of our gene annotations in initial experiments using RNA-seq data from a number of zebrafish tissues. A significant focus in the next year will be the display and annotation of variation data. Through our participation in the Locus Reference Genomic (LRG) consortium (http://www.lrg-sequence.org), we plan to incorporate summary data from Locus Specific Databases (LSDBs) at the level recommended by the community (24,25). We are also developing and testing new variation displays as part of the 1000 Genomes Project, which runs a browser based on the Ensembl code at http://browser.1000genomes.org.

FUNDING

The Ensembl project receives primary funding from the Wellcome Trust. Additional funding is provided by the European Union, BBSRC, NHGRI, NIH-NIAID and EMBL. Funding for open access charge: Wellcome Trust. Conflict of interest statement. None declared.
  25 in total

1.  Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors:  M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal:  Nat Genet       Date:  2000-05       Impact factor: 38.330

2.  Enredo and Pecan: genome-wide mammalian consistency-based multiple alignment with paralogs.

Authors:  Benedict Paten; Javier Herrero; Kathryn Beal; Stephen Fitzgerald; Ewan Birney
Journal:  Genome Res       Date:  2008-10-10       Impact factor: 9.043

3.  Combinatorial patterns of histone acetylations and methylations in the human genome.

Authors:  Zhibin Wang; Chongzhi Zang; Jeffrey A Rosenfeld; Dustin E Schones; Artem Barski; Suresh Cuddapah; Kairong Cui; Tae-Young Roh; Weiqun Peng; Michael Q Zhang; Keji Zhao
Journal:  Nat Genet       Date:  2008-06-15       Impact factor: 38.330

4.  Planning the human variome project: the Spain report.

Authors:  Jim Kaput; Richard G H Cotton; Lauren Hardman; Michael Watson; Aida I Al Aqeel; Jumana Y Al-Aama; Fahd Al-Mulla; Santos Alonso; Stefan Aretz; Arleen D Auerbach; Bharati Bapat; Inge T Bernstein; Jong Bhak; Stacey L Bleoo; Helmut Blöcker; Steven E Brenner; John Burn; Mariona Bustamante; Rita Calzone; Anne Cambon-Thomsen; Michele Cargill; Paola Carrera; Lawrence Cavedon; Yoon Shin Cho; Yeun-Jun Chung; Mireille Claustres; Garry Cutting; Raymond Dalgleish; Johan T den Dunnen; Carlos Díaz; Steven Dobrowolski; M Rosário N dos Santos; Rosemary Ekong; Simon B Flanagan; Paul Flicek; Yoichi Furukawa; Maurizio Genuardi; Ho Ghang; Maria V Golubenko; Marc S Greenblatt; Ada Hamosh; John M Hancock; Ross Hardison; Terence M Harrison; Robert Hoffmann; Rania Horaitis; Heather J Howard; Carol Isaacson Barash; Neskuts Izagirre; Jongsun Jung; Toshio Kojima; Sandrine Laradi; Yeon-Su Lee; Jong-Young Lee; Vera L Gil-da-Silva-Lopes; Finlay A Macrae; Donna Maglott; Makia J Marafie; Steven G E Marsh; Yoichi Matsubara; Ludwine M Messiaen; Gabriela Möslein; Mihai G Netea; Melissa L Norton; Peter J Oefner; William S Oetting; James C O'Leary; Ana Maria Oller de Ramirez; Mark H Paalman; Jillian Parboosingh; George P Patrinos; Giuditta Perozzi; Ian R Phillips; Sue Povey; Suyash Prasad; Ming Qi; David J Quin; Rajkumar S Ramesar; C Sue Richards; Judith Savige; Dagmar G Scheible; Rodney J Scott; Daniela Seminara; Elizabeth A Shephard; Rolf H Sijmons; Timothy D Smith; María-Jesús Sobrido; Toshihiro Tanaka; Sean V Tavtigian; Graham R Taylor; Jon Teague; Thoralf Töpel; Mollie Ullman-Cullere; Joji Utsunomiya; Henk J van Kranen; Mauno Vihinen; Elizabeth Webb; Thomas K Weber; Meredith Yeager; Young I Yeom; Seon-Hee Yim; Hyang-Sook Yoo
Journal:  Hum Mutat       Date:  2009-04       Impact factor: 4.878

5.  Sharing data between LSDBs and central repositories.

Authors:  Johan T den Dunnen; Rolf H Sijmons; Paal S Andersen; Mauno Vihinen; Jacques S Beckmann; Sandro Rossetti; C Conover Talbot; Ross C Hardison; Sue Povey; Richard G H Cotton
Journal:  Hum Mutat       Date:  2009-04       Impact factor: 4.878

6.  Genome-wide maps of chromatin state in pluripotent and lineage-committed cells.

Authors:  Tarjei S Mikkelsen; Manching Ku; David B Jaffe; Biju Issac; Erez Lieberman; Georgia Giannoukos; Pablo Alvarez; William Brockman; Tae-Kyung Kim; Richard P Koche; William Lee; Eric Mendenhall; Aisling O'Donovan; Aviva Presser; Carsten Russ; Xiaohui Xie; Alexander Meissner; Marius Wernig; Rudolf Jaenisch; Chad Nusbaum; Eric S Lander; Bradley E Bernstein
Journal:  Nature       Date:  2007-07-01       Impact factor: 49.962

7.  Ensembl 2006.

Authors:  E Birney; D Andrews; M Caccamo; Y Chen; L Clarke; G Coates; T Cox; F Cunningham; V Curwen; T Cutts; T Down; R Durbin; X M Fernandez-Suarez; P Flicek; S Gräf; M Hammond; J Herrero; K Howe; V Iyer; K Jekosch; A Kähäri; A Kasprzyk; D Keefe; F Kokocinski; E Kulesha; D London; I Longden; C Melsopp; P Meidl; B Overduin; A Parker; G Proctor; A Prlic; M Rae; D Rios; S Redmond; M Schuster; I Sealy; S Searle; J Severin; G Slater; D Smedley; J Smith; A Stabenau; J Stalker; S Trevanion; A Ureta-Vidal; J Vogel; S White; C Woodwark; T J P Hubbard
Journal:  Nucleic Acids Res       Date:  2006-01-01       Impact factor: 16.971

8.  BioMart--biological queries made easy.

Authors:  Damian Smedley; Syed Haider; Benoit Ballester; Richard Holland; Darin London; Gudmundur Thorisson; Arek Kasprzyk
Journal:  BMC Genomics       Date:  2009-01-14       Impact factor: 3.969

9.  Ensembl 2009.

Authors:  T J P Hubbard; B L Aken; S Ayling; B Ballester; K Beal; E Bragin; S Brent; Y Chen; P Clapham; L Clarke; G Coates; S Fairley; S Fitzgerald; J Fernandez-Banet; L Gordon; S Graf; S Haider; M Hammond; R Holland; K Howe; A Jenkinson; N Johnson; A Kahari; D Keefe; S Keenan; R Kinsella; F Kokocinski; E Kulesha; D Lawson; I Longden; K Megy; P Meidl; B Overduin; A Parker; B Pritchard; D Rios; M Schuster; G Slater; D Smedley; W Spooner; G Spudich; S Trevanion; A Vilella; J Vogel; S White; S Wilder; A Zadissa; E Birney; F Cunningham; V Curwen; R Durbin; X M Fernandez-Suarez; J Herrero; A Kasprzyk; G Proctor; J Smith; S Searle; P Flicek
Journal:  Nucleic Acids Res       Date:  2008-11-25       Impact factor: 16.971

10.  The Universal Protein Resource (UniProt) 2009.

Authors: 
Journal:  Nucleic Acids Res       Date:  2008-10-04       Impact factor: 16.971
