
The UCSC Genome Browser Database: 2008 update.

D Karolchik, R M Kuhn, R Baertsch, G P Barber, H Clawson, M Diekhans, B Giardine, R A Harte, A S Hinrichs, F Hsu, K M Kober, W Miller, J S Pedersen, A Pohl, B J Raney, B Rhead, K R Rosenbloom, K E Smith, M Stanke, A Thakkapallayil, H Trumbower, T Wang, A S Zweig, D Haussler, W J Kent.

Abstract

The University of California, Santa Cruz, Genome Browser Database (GBD) provides integrated sequence and annotation data for a large collection of vertebrate and model organism genomes. Seventeen new assemblies have been added to the database in the past year, for a total coverage of 19 vertebrate and 21 invertebrate species as of September 2007. For each assembly, the GBD contains a collection of annotation data aligned to the genomic sequence. Highlights of this year's additions include a 28-species human-based vertebrate conservation annotation, an enhanced UCSC Genes set, and more human variation, MGC, and ENCODE data. The database is optimized for fast interactive performance with a set of web-based tools that may be used to view, manipulate, filter and download the annotation data. New toolset features include the Genome Graphs tool for displaying genome-wide data sets, session saving and sharing, better custom track management, expanded Genome Browser configuration options and a Genome Browser wiki site. The downloadable GBD data, the companion Genome Browser toolset and links to documentation and related information can be found at: http://genome.ucsc.edu/.


Year: 2007 PMID: 18086701 PMCID: PMC2238835 DOI: 10.1093/nar/gkm966

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Fundamental to expanding our knowledge of how the human body works in health and in disease is the capability to access and share data produced through experimentation and computational analysis. The University of California, Santa Cruz (UCSC) Genome Browser Database (GBD) (http://genome.ucsc.edu) (1) provides a common repository for genomic annotation data—including comparative genomics, genes and gene predictions; mRNA and EST alignments; and expression, regulation, variation and assembly data—and robust, flexible tools for viewing, comparing, distributing and analyzing the information. Produced and maintained by the Genome Bioinformatics Group at the UCSC Center for Biomolecular Science and Engineering, the GBD focuses primarily on vertebrate and model organism genomes, with an emphasis on comparative genomics analysis. As of September 2007 the GBD contains data for 11 mammalian species including human, mouse, rat, chimpanzee, rhesus macaque, horse, cow, cat, dog, opossum and platypus; 8 other vertebrates: chicken, lizard (Anolis carolinensis), frog (Xenopus tropicalis), zebrafish, fugu, tetraodon, medaka and stickleback; and 21 invertebrates including 11 flies, honeybee, Anopheles mosquito, five worms, one yeast (Saccharomyces cerevisiae) and two deuterostomes—purple sea urchin and sea squirt. For many of the organisms, more than one assembly is provided, and several older archived assemblies may be found at: http://genome-archive.cse.ucsc.edu/. The GBD stores a collection of annotation data for each assembly, which can be viewed graphically in the UCSC Genome Browser (2) as a series of ‘tracks’ aligned to the genomic sequence and grouped according to shared characteristics, for example gene predictions, gene expression and variation data. 
In most instances, each annotation track is represented by a position-oriented table based on genomic sequence coordinates, and may be supplemented by additional non-positional tables that supply related information or link the primary table to other tables in the database. The data are stored in a variety of formats described at: http://genome.ucsc.edu/FAQ/FAQformat. Minimally, the GBD provides assembly data, comparative genomics annotations, and mRNA, EST and RefSeq (3) gene alignments (when available) from GenBank (4) for each assembly. When available, links are provided to the complementary annotations in two other major genome browsers, Ensembl (5) and NCBI's MapViewer (6). A large set of additional annotations is available for widely studied genomes such as the human and mouse. Assemblies that lack sufficient native RefSeq data alignments and are of sufficient evolutionary distance from the human genome may also include a human proteins annotation that maps human exons using tBLASTn. The organizations and individuals who contributed to the sequencing, assembly, and annotation of featured organisms are acknowledged at: http://genome.ucsc.edu/goldenPath/credits.html; detailed information about the individual annotation tracks may be found in the Genome Browser by clicking the vertical gray or blue bars to the left of the displayed tracks. UCSC updates the genome assemblies and annotations in the GBD as new releases become available, with priority given to primate and model organism assemblies and annotations that we feel are of widespread interest to GBD users, based on input from our Scientific Advisory Board and feedback received through our mailing lists and user surveys. (The results from a users’ survey conducted in May 2007 may be reviewed at: http://genome.ucsc.edu/goldenPath/help/GBsurvey507.html.) RefSeq and mRNA data from GenBank are updated daily; EST data are updated weekly. 
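As a minimal sketch of what "position-oriented" means in practice, the Python below parses one line of BED, one of the formats covered on the FAQ page mentioned above. It assumes the usual BED convention of a 0-based start and an exclusive end; the `Feature` name, the helper functions and the example line are made up for illustration and are not part of the GBD itself.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """One row of a position-oriented annotation table (illustrative name)."""
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive (BED convention)
    name: str


def parse_bed_line(line: str) -> Feature:
    """Parse the first four whitespace-separated fields of a BED line."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chrom, start and end")
    name = fields[3] if len(fields) > 3 else ""
    return Feature(fields[0], int(fields[1]), int(fields[2]), name)


def overlaps(feature: Feature, chrom: str, start: int, end: int) -> bool:
    """True if the feature overlaps the half-open window [start, end)."""
    return feature.chrom == chrom and feature.start < end and start < feature.end


feature = parse_bed_line("chrX 1000 2000 exampleItem")  # made-up data
print(feature, "length:", feature.end - feature.start)
```

Because the end coordinate is exclusive, the feature length is simply `end - start`, and two adjacent features that share a boundary do not overlap.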
In addition to the Genome Browser, several other graphical tools for exploring the data are available from the GBD website, including the Table Browser (7), which provides access for downloading and manipulating the GBD tables as text or tracks; the BLAT sequence-mapping tool (8); the In Silico PCR tool that searches a sequence database with a pair of PCR primers; the Gene Sorter (9) for exploring expression, homology and other gene relationships; the VisiGene in situ image browser; the Proteome Browser (10) for viewing related protein information; and the new Genome Graphs tool for uploading and viewing genome-wide data sets. This toolset is accompanied by a comprehensive set of online documentation and FAQs listed at http://genome.ucsc.edu/FAQ/. Online and hands-on training materials are available via the Training link (http://genome.ucsc.edu/training) on the GBD home page. The GBD data, tools and source may be downloaded from http://hgdownload.cse.ucsc.edu/downloads.html. Instructions for setting up a local server to mirror all or part of the GBD data can be found at http://genome.ucsc.edu/admin/mirror.html.

DATA ACQUISITION AND METHODS

New data

During the year ending September 2007 UCSC added eight new organisms to the GBD: lizard (A. carolinensis), horse, platypus, medaka, stickleback and three worms (Caenorhabditis brenneri, C. remanei and Pristionchus pacificus). Nine existing organisms were updated with new assemblies: mouse, cow, cat, fugu, zebrafish, Drosophila melanogaster, two worms (C. elegans, C. briggsae) and sea urchin. As updates are added, older assemblies remain accessible either on the primary website or through the GBD archives.

UCSC Genes—the next generation of Known Genes

In April 2007 UCSC released UCSC Genes (W.J. Kent, manuscript in preparation), an improved version of the existing Known Genes annotation (11), on the March 2006 (Build 36, hg18) human assembly. This annotation, which includes putative non-coding genes as well as protein-coding genes and 99.9% of RefSeq genes, is a moderately conservative prediction set based on data from RefSeq, GenBank and UniProt (12). Each entry requires the support of one GenBank RNA sequence and at least one additional line of evidence, with the exception of RefSeq RNAs, which require no additional evidence. Although some of the transcripts labeled as ‘non-coding’ in the set may actually code for protein, typically the evidence for the associated protein is weak. Compared to RefSeq, this gene set generally has about 10% more protein-coding genes, approximately five times as many putative non-coding genes, and about twice as many splice variants. As part of the migration to the UCSC Genes annotation, we now use our own UCSC Genes accession numbers as the primary key into the underlying knownGene table, rather than the GenBank mRNA accessions used in previous Known Genes annotations. The base accession numbers remain stable across iterations of the data set, although the suffixes may change to reflect version updates. A companion annotation to UCSC Genes, the Alt Events track, shows various types of alternative splicing, alternative promoter and other events that result in more than a single transcript from the same gene.
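The stable-base, changing-suffix scheme described above can be sketched as follows. The snippet assumes an accession written as `base.version` with a numeric suffix; the exact accession syntax is not given in the text, and the identifiers used here are invented for illustration.

```python
from typing import Tuple


def split_accession(accession: str) -> Tuple[str, int]:
    """Split 'base.version' into (base, version); the syntax is an assumption."""
    base, sep, suffix = accession.rpartition(".")
    if not sep or not base or not suffix.isdigit():
        raise ValueError(f"expected 'base.version', got {accession!r}")
    return base, int(suffix)


def same_entry(a: str, b: str) -> bool:
    """Two accessions refer to the same entry if their base parts match."""
    return split_accession(a)[0] == split_accession(b)[0]


# Invented accessions: same base, different version suffix.
print(split_accession("uc001aaa.2"))
print(same_entry("uc001aaa.1", "uc001aaa.2"))
```

Keying cross-release comparisons on the base part alone is what lets a local analysis follow an entry across iterations of the data set even when the suffix changes.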

28-Species conservation

UCSC released a new Conservation (13) annotation track on the March 2006 (Build 36, hg18) human genome in June 2007. This track displays multiz (14) multiple alignments of 27 vertebrate species to the human genome, along with measurements of evolutionary conservation across all 28 species and a separate measurement of conservation across the placental mammal subset of species (18 organisms). Included in the track are 5 new high-quality assemblies (horse, platypus, lizard, stickleback and medaka); 6 new low-coverage mammalian genomes (bushbaby, tree shrew, guinea pig, hedgehog, common shrew and cat); 6 updated assemblies (chimp, cow, chicken, frog, fugu and zebrafish); and 10 assemblies included in the previous version of the Conservation track (rhesus, mouse, rat, rabbit, dog, armadillo, elephant, tenrec, opossum and tetraodon). In addition to the expanded species list, the new Conservation track has been enhanced in two ways: the pairwise alignments for each species undergo additional filtering to reduce paralogous alignments, and the multiple alignment downloads now include information about the quality of the aligning species sequence. A similar Conservation annotation of at least 30 species is scheduled for release on the July 2007 (Build 37, mm9) mouse assembly in the last quarter of 2007.
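The species tally in the paragraph above can be checked with a few lines of Python. The group names are labels chosen for this sketch; the species lists are copied from the text.

```python
# Species groups as listed in the text; variable names are illustrative.
new_high_quality = ["horse", "platypus", "lizard", "stickleback", "medaka"]
new_low_coverage = ["bushbaby", "tree shrew", "guinea pig", "hedgehog",
                    "common shrew", "cat"]
updated = ["chimp", "cow", "chicken", "frog", "fugu", "zebrafish"]
carried_over = ["rhesus", "mouse", "rat", "rabbit", "dog", "armadillo",
                "elephant", "tenrec", "opossum", "tetraodon"]

aligned_to_human = new_high_quality + new_low_coverage + updated + carried_over

# No species is counted twice, and 5 + 6 + 6 + 10 = 27 species are aligned.
assert len(set(aligned_to_human)) == len(aligned_to_human) == 27

total_species = len(aligned_to_human) + 1  # plus the human reference itself
print(total_species)  # 28
```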

Variation and disease

Within the Variation and Repeats annotation group, UCSC has added several new data sets. The simple nucleotide polymorphism (SNP) data from dbSNP (15) Build 126, already available on the human and mouse assemblies, have been added to the chimp, rat and dog assemblies. Updates to SNP Build 128 will be incorporated pending data release from dbSNP. SNP annotations may be filtered by several attributes, including average heterozygosity and weight, location type, class, validation, function and molecule type. The alignments of the SNP's flanking sequences to the genome are displayed on the details page for each SNP; in addition, the hg18 SNP details pages include the chimp and rhesus macaque orthologous alleles. We have also added a HapMap SNPs annotation (16) to the hg17 and hg18 assemblies containing data for 4 million SNPs (dbSNP 125) from four populations, together with the display of orthologous alleles from chimp and rhesus macaque and several options for filtering the data display. The Structural Variation annotation has been expanded to include structural variation data (17), deletions detected by several techniques (18–20) and numerous copy number polymorphism data sets (21–25). The SNP Arrays track displays SNPs available for genotyping with several different microarrays. The Exapted Repeats annotation displays conserved non-exonic elements that have been deposited by characterized mobile elements (26).
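To illustrate the kind of attribute filtering described above, the following sketch filters a handful of SNP records offline. The `Snp` record, its field names, the class labels and the reading of "weight" as lower-is-better are all assumptions made for this example; they are not the browser's actual table schema, and the records are invented.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Snp:
    name: str        # invented identifiers in the demo below
    avg_het: float   # average heterozygosity
    weight: int      # assumed: lower values mean a more reliable mapping
    snp_class: str   # assumed labels such as "single" or "insertion"


def filter_snps(snps: Iterable[Snp],
                min_avg_het: float = 0.0,
                max_weight: Optional[int] = None,
                classes: Optional[Set[str]] = None) -> List[Snp]:
    """Keep SNPs that pass every filter that is switched on."""
    kept = []
    for snp in snps:
        if snp.avg_het < min_avg_het:
            continue
        if max_weight is not None and snp.weight > max_weight:
            continue
        if classes is not None and snp.snp_class not in classes:
            continue
        kept.append(snp)
    return kept


demo = [
    Snp("snpA", 0.45, 1, "single"),
    Snp("snpB", 0.05, 1, "single"),
    Snp("snpC", 0.30, 3, "insertion"),
]
print([s.name for s in filter_snps(demo, min_avg_het=0.1, max_weight=1)])
```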

Mapping, gene prediction, regulation and expression data

In addition to updating selected existing data sets, we have introduced several new annotations to various assemblies. High-confidence gene annotations from the Consensus Coding DNA Sequence (CCDS) project (http://www.ncbi.nlm.nih.gov/CCDS/) have been added to more human assemblies and to the mouse. The Affymetrix Transcriptome Phase 3 data set (human) (27) displays transcriptome data from tiling Affymetrix GeneChips. The ORegAnno (Open Regulatory Annotation) track (several species) shows literature-curated regulatory regions, transcription factor binding sites and regulatory polymorphisms from the ORegAnno database (28). The ACEScan annotation (human) identifies predicted alternative human-mouse conserved exons from ACEScan (29). The CGAP SAGE track (human, mouse) displays genomic mappings for human LongSAGE tags from the Cancer Genome Anatomy Project (CGAP) (30), using the Serial Analysis of Gene Expression (SAGE) quantitative technique (31). The MGI QTL track (mouse) shows approximate positions of quantitative trait loci based on reported peak LOD scores from the Jackson Laboratory Mouse Genome Informatics group. The zebrafish genome now provides expression data using the Affymetrix Zebrafish GeneChip Genome Array (32).

Mammalian Gene Collection

The track details pages for features in the Mammalian Gene Collection (MGC) (33) Genes annotation track (available for several species) now include extensive information about the MGC clones, including links for ordering the clones. A new annotation on the hg18 human assembly, ORFeome Clones, shows alignments of human clones from the ORFeome Collaboration (http://www.orfeomecollaboration.org/) (34), a project that aims to be an unrestricted source of fully sequence-validated full-ORF human cDNA clones, with the goal of providing at least one fully sequenced full-ORF clone for each human gene. This annotation is automatically updated daily as new clones become available.

ENCODE

The Genome Browser serves as the data repository for the ENCODE (Encyclopedia of DNA Elements) project (35). The set of human genome annotations available on the UCSC ENCODE portal (http://genome.ucsc.edu/ENCODE/) (36), contributed by members of the ENCODE Consortium, has increased by 40% in the past 12 months, from 130 tracks and 950 tables on 2 assemblies in September 2006 to 199 tracks and 1583 tables on 3 assemblies in September 2007. The ENCODE data sets are now available on the March 2006 (Build 36, hg18) as well as the May 2004 (Build 35, hg17) human assembly.

VisiGene and Gene Sorter data sets

We have updated the VisiGene image sets from the Jackson Lab Mouse Genome Informatics Database and the Allen Brain Atlas, and have added links from the mouse Known Genes and UCSC Genes details pages to X. laevis images. The VisiGene probe-processing utilities have been updated to interact with the new UCSC Genes set. The hg18 Gene Sorter has been expanded to include the Wanker (37), Vidal (38) and Human Protein Reference Database (39) data sets showing the neighborhood of protein interactions surrounding selected genes. The neighborhoods are computed from a genome-wide protein–protein interaction network that connects genes if the proteins they encode have been detected to physically interact in high-throughput experiments. The xxBlastTab tables, which display gene ortholog data for human, mouse and rat in the Gene Sorter and UCSC Genes details pages, have now been filtered for synteny.
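The idea of a protein-interaction neighborhood can be sketched with a small undirected graph. This is not the Gene Sorter's implementation: the gene names are placeholders, and the `radius` parameter (how many interaction steps count as "neighborhood") is an assumption, since the text does not say how far the neighborhood extends.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_network(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Connect two genes if their proteins were reported to interact."""
    network: Dict[str, Set[str]] = defaultdict(set)
    for a, b in pairs:
        if a == b:
            continue  # ignore self-interactions in this sketch
        network[a].add(b)
        network[b].add(a)
    return network


def neighborhood(network: Dict[str, Set[str]], gene: str, radius: int = 1) -> Set[str]:
    """Genes reachable within `radius` interaction steps, excluding `gene`."""
    seen = {gene}
    frontier = {gene}
    for _ in range(radius):
        frontier = {n for g in frontier for n in network.get(g, ())} - seen
        seen |= frontier
    return seen - {gene}


# Placeholder interaction pairs, not real data.
net = build_network([("geneA", "geneB"), ("geneB", "geneC"), ("geneC", "geneD")])
print(sorted(neighborhood(net, "geneA")))
print(sorted(neighborhood(net, "geneA", radius=2)))
```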

NEW FEATURES

The GBD and the Genome Browser toolset are dynamic resources that continually evolve to accommodate new genome assemblies, data types and research requirements. In the past year we have expanded and enhanced many of our Genome Browser tools to improve data browsing and manipulation capabilities for our users and collaborators.

Viewing genome-wide data with Genome Graphs

Genome Graphs (http://genome.ucsc.edu/cgi-bin/hgGenome), a new tool accessible via the ‘Genome Graphs’ link on the GBD home page, can be used to display genome-wide data sets, for example, the results of genome-wide SNP association studies, linkage studies and homozygosity mapping. Using the Genome Graphs tool, it is possible to upload or import several sets of genome-wide data and display them simultaneously (Figure 1), then accomplish such tasks as restricting the display to only those regions that exceed a set significance threshold, displaying genes via the Gene Sorter that exist in areas where the data meet a given significance threshold, displaying an area of interest in the Genome Browser and calculating the correlation coefficient (R) among the data sets. Both public and personal data sets may be loaded into the tool, and the display can be configured to suit individual needs.
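Two of the operations mentioned above, thresholding and correlation, are easy to reproduce offline. The sketch below assumes that R is the Pearson correlation coefficient, which the text does not state explicitly, and uses invented data points of the form (chromosome, position, value).

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[str, int, float]  # (chromosome, position, value)


def above_threshold(points: Sequence[Point], threshold: float) -> List[Point]:
    """Keep only the points whose value meets the significance threshold."""
    return [p for p in points if p[2] >= threshold]


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long value lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


# Invented values sampled at the same three positions in two data sets.
study_a = [("chr1", 100, 1.0), ("chr1", 200, 5.5), ("chr2", 50, 7.2)]
study_b = [("chr1", 100, 0.8), ("chr1", 200, 4.9), ("chr2", 50, 8.0)]

print(above_threshold(study_a, 5.0))
print(round(pearson_r([p[2] for p in study_a], [p[2] for p in study_b]), 3))
```

Correlating two data sets this way only makes sense when both are sampled at the same positions, as in the example.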

Figure 1.

Genome Graphs for the human May 2004 (Build 35, hg17) assembly loaded with data published by the Wellcome Trust Case Control Consortium from a genome-wide association study of seven common diseases (40).

Genomewiki

We have launched a wiki site for sharing information about the UCSC Genome Browser and its data. The wiki—at: http://genomewiki.ucsc.edu—provides an informal forum for our browser users, mirror sites and staff to discuss topics of interest in the genome biology field and exchange usage tips, scripts, programs and notes about mirroring the Genome Browser and working with the Genome Browser source. As with most wiki sites, general users are welcome to edit and add pages after logging in.

Saving and sharing Genome Browser sessions

Users can now save their favorite Genome Browser sessions for reuse and sharing by using a new session management feature, accessible via the ‘Session’ link (http://genome.ucsc.edu/cgi-bin/hgSession) on the GBD home page and the blue navigation bar at the top of many of the tool web pages. Log-in access to the session features is controlled through the Genome Browser wiki site. Once logged in, the user can save the current Genome Browser session, including the exact position and track combination on display, share the session with another user or keep it private and load one's own saved sessions as well as those shared by others. Saved sessions persist for 1 year after the last access, unless deleted. Custom tracks within sessions persist for at least 48 h after the last time they are viewed.

Managing custom tracks

Custom annotation tracks, a popular Genome Browser feature for several years, allow users to load, display and manipulate personal data in the Genome Browser and Table Browser. The new custom tracks manager (http://genome.ucsc.edu/cgi-bin/hgCustom) makes the use of custom tracks much easier. The management interface can be accessed through the ‘add/manage custom tracks’ button on the Genome Browser gateway (http://genome.ucsc.edu/cgi-bin/hgGateway) or tracks (http://genome.ucsc.edu/cgi-bin/hgTracks) page. In addition to the data upload options supported in previous versions, users can now load and display multiple custom tracks simultaneously, add to, delete and modify the uploaded custom track set, load and manage tracks from multiple assemblies, and upload description pages for custom tracks. The lifespan of a custom track on the UCSC server has been increased from 8 to 48 h after last access, and we have converted the underlying custom track architecture from a file-based system to a database system to improve the performance.
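A custom track is, at its simplest, a block of text: a `track` header line followed by data lines. The sketch below assembles such a block in BED form and computes when it would lapse under the 48 h rule stated above. The helper names and the sample features are invented, and the header shown uses only the `name` and `description` attributes; consult the custom track documentation for the full set of options.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple

CUSTOM_TRACK_LIFESPAN = timedelta(hours=48)  # lifespan after last access, per the text


def make_custom_track(name: str, description: str,
                      features: Iterable[Tuple[str, int, int, str]]) -> str:
    """Build custom track text: one header line, then one BED line per feature."""
    header = f'track name="{name}" description="{description}"'
    rows = [f"{chrom}\t{start}\t{end}\t{label}"
            for chrom, start, end, label in features]
    return "\n".join([header, *rows]) + "\n"


def expires_at(last_access: datetime) -> datetime:
    """Earliest time the track may be removed if it is not accessed again."""
    return last_access + CUSTOM_TRACK_LIFESPAN


# Invented features for illustration.
text = make_custom_track("demo", "Demo track",
                         [("chrX", 1000, 2000, "itemA"),
                          ("chrX", 3000, 3500, "itemB")])
print(text)
print(expires_at(datetime(2007, 9, 1, 12, 0)))
```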

New display options

We have added several new user-configurable display options to the Genome Browser that expand navigation within gene, mRNA and EST-based tracks and allow increased manipulation of the tracks image and track control groups. These options are controlled on the browser configuration page, which is accessed through the ‘configure’ button on the Genome Browser gateway or tracks page. When the ‘Next/previous item navigation’ configuration option is toggled on, gray double-headed arrows display in the Genome Browser tracks image on both sides of the track labels of gene, mRNA and EST tracks (or any standard tracks based on BED, PSL or genePred format). The image window may be shifted to display the next track feature towards the 5' or 3' end of the chromosome by clicking the corresponding left or right arrow. Similarly, the ‘Next/previous exon navigation’ configuration option displays white double-headed arrows on both the 5' and 3' ends of each track item that has exons positioned beyond the edges of the current image. Clicking on one of the arrows shifts the image window to the next exon located towards that end of the feature. Another new configuration option, ‘Enable track reordering’, allows the user to change the display order of the track groups as well as the order of annotation tracks within the groups, and to move tracks between track groups. This is particularly useful for customizing the browser display to individual research needs or for creating images for publication. Track groups on the Genome Browser tracks page may now be quickly collapsed and expanded by clicking the + and − icons on the left side of the group label. We have expanded the coloring and display options for mRNA tracks to include several ways to highlight gaps in alignments of query sequences (usually transcripts) to the genome, which frequently indicate a problem with the query sequence or with the genome assembly. 
Tracks may be colored by genomic, mRNA or nonsynonymous mRNA codons, mRNA bases, or different mRNA bases. Tracks may then be configured, through options on the description page, to display double horizontal lines at locations where both the genome and query sequence have an insertion and to display vertical lines of different colors to distinguish poly(A) tail insertions and insertions at the beginning, end or middle of the query (Figure 2). The new coloring scheme makes it easier to visually scan a region with hundreds of alignments and pick out regions of interest (Figure 3). The new display options are explained in detail on the mRNA track description pages.
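The next/previous item navigation described earlier in this section amounts to finding the nearest feature that lies entirely beyond one edge of the current window. The sketch below shows one plausible way to do that; how the browser itself selects the item is not described in the text, so the selection rule here is an assumption, and the coordinates are invented.

```python
from typing import List, Optional, Tuple

Item = Tuple[int, int]  # (start, end) of a track feature on one chromosome


def next_item(items: List[Item], window_end: int) -> Optional[Item]:
    """Nearest feature that starts at or beyond the right edge of the window."""
    candidates = [item for item in items if item[0] >= window_end]
    return min(candidates, key=lambda item: item[0]) if candidates else None


def previous_item(items: List[Item], window_start: int) -> Optional[Item]:
    """Nearest feature that ends at or before the left edge of the window."""
    candidates = [item for item in items if item[1] <= window_start]
    return max(candidates, key=lambda item: item[1]) if candidates else None


# Invented features; the current window is [220, 250).
track = [(100, 200), (300, 400), (500, 600)]
print(previous_item(track, 220))
print(next_item(track, 250))
```

A linear scan keeps the example short; for large tracks, a list sorted by start position combined with binary search would be the more usual choice.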

Figure 2.

A zoomed-in view of the human and non-human mRNA tracks in the chrX:151 572 101–151 572 240 region on the human March 2006 (Build 36, hg18) genome assembly. In both mRNA tracks, the mRNA coloring options are configured to show nonsynonymous codon differences between the mRNA alignments and the genomic feature at the top of the figure. Red indicates codons that differ from the human genomic sequence. The double horizontal lines in the ‘Non-Human mRNAs’ track highlight areas in which both the mRNA and the genome sequence have an insertion or stretch of non-matching sequence. Note the blue vertical line at the beginning of the bottom Rattus alignment, indicating an insertion at the beginning of the query sequence, also the orange vertical lines with partial peptide insertions in the Rattus alignments.

Figure 3.

A zoomed-out view of human mRNA alignments in the chrX:153 229 581–153 304 741 region using ‘squish’ display mode and configured to show nonsynonymous codon differences between the human mRNAs and the genomic sequence. This view is useful for quickly scanning for mRNAs that are free of nonsynonymous regions (i.e. are all-black in color) and have a valid poly(A) tail (green vertical bar).

To visually simplify the display of the large number of similar annotation tracks present in the ENCODE track groups, collections of related tracks are now represented by a single ‘super-track’ control that provides a descriptive overview of the group and lets the user control display characteristics of the entire track set on one page. For example, the Yale ChIP-chip super-track control provides information and access for seven related Yale ChIP-chip annotations.
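The regions quoted in the Figure 2 and Figure 3 captions are printed with thin-space digit grouping and an en dash, which must be removed before the position can be used in a browser link. The sketch below normalizes such a string and builds a link to the tracks page using the `db` and `position` URL parameters; the helper names are invented for this example.

```python
import re
from urllib.parse import urlencode

HGTRACKS_URL = "http://genome.ucsc.edu/cgi-bin/hgTracks"
POSITION_RE = re.compile(r"^(\w+):(\d+)-(\d+)$")


def normalize_position(text: str) -> str:
    """Turn 'chrX:151 572 101<en dash>151 572 240' into 'chrX:151572101-151572240'."""
    compact = re.sub(r"[\s,]", "", text)  # drop spaces and thousands separators
    compact = compact.replace("\u2013", "-").replace("\u2014", "-")  # dashes to hyphen
    match = POSITION_RE.match(compact)
    if match is None:
        raise ValueError(f"unrecognized position: {text!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError("start must not exceed end")
    return f"{chrom}:{start}-{end}"


def browser_url(db: str, position: str) -> str:
    """Link to the tracks page for an assembly (db) and a genomic position."""
    query = urlencode({"db": db, "position": normalize_position(position)})
    return f"{HGTRACKS_URL}?{query}"


# Region from the Figure 2 caption, on the hg18 assembly named there.
print(browser_url("hg18", "chrX:151 572 101\u2013151 572 240"))
```

`urlencode` percent-encodes the colon in the position; the server decodes it, so the link resolves to the same region.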

FUTURE DIRECTIONS

In the upcoming year, UCSC will continue to extend the GBD to include more species and assembly updates—focusing on primates, model organisms and species of critical importance to evolutionary studies—and more annotation data. We may also provide browser access to several of the low-coverage (2×) assemblies currently included in our Conservation tracks. Following the release of UCSC Genes data for the latest mouse assembly in Fall 2007, updates to both the human and the mouse UCSC Genes annotation will be offered ∼3–4 times per year. We will continue to expand our collection of human variation, disease-related, expression and genome-wide association data, and plan to explore the incorporation of federated data into the browser. Among the enhancements to our display and data-mining tools, we plan to facilitate the use of custom tracks in the Table Browser, extend display features such as the next/previous item navigation to a larger range of tracks, and add a user-annotated wiki track.

REFERENCES (10 of 40 shown)

1. dbSNP: the NCBI database of genetic variation.

Authors: S T Sherry; M H Ward; M Kholodov; J Baker; L Phan; E M Smigielski; K Sirotkin
Journal: Nucleic Acids Res Date: 2001-01-01 Impact factor: 16.971

2. BLAT--the BLAST-like alignment tool.

Authors: W James Kent
Journal: Genome Res Date: 2002-04 Impact factor: 9.043

3. The human genome browser at UCSC.

Authors: W James Kent; Charles W Sugnet; Terrence S Furey; Krishna M Roskin; Tom H Pringle; Alan M Zahler; David Haussler
Journal: Genome Res Date: 2002-06 Impact factor: 9.043

4. The UCSC Table Browser data retrieval tool.

Authors: Donna Karolchik; Angela S Hinrichs; Terrence S Furey; Krishna M Roskin; Charles W Sugnet; David Haussler; W James Kent
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

5. Detection of large-scale variation in the human genome.

Authors: A John Iafrate; Lars Feuk; Miguel N Rivera; Marc L Listewnik; Patricia K Donahoe; Ying Qi; Stephen W Scherer; Charles Lee
Journal: Nat Genet Date: 2004-08-01 Impact factor: 38.330

6. Aligning multiple genomic sequences with the threaded blockset aligner.

Authors: Mathieu Blanchette; W James Kent; Cathy Riemer; Laura Elnitski; Arian F A Smit; Krishna M Roskin; Robert Baertsch; Kate Rosenbloom; Hiram Clawson; Eric D Green; David Haussler; Webb Miller
Journal: Genome Res Date: 2004-04 Impact factor: 9.043

7. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.

Authors: Ewan Birney; John A Stamatoyannopoulos; Anindya Dutta; Roderic Guigó; Thomas R Gingeras; Elliott H Margulies; Zhiping Weng; Michael Snyder; Emmanouil T Dermitzakis; Robert E Thurman; Michael S Kuehn; Christopher M Taylor; Shane Neph; Christoph M Koch; Saurabh Asthana; Ankit Malhotra; Ivan Adzhubei; Jason A Greenbaum; Robert M Andrews; Paul Flicek; Patrick J Boyle; Hua Cao; Nigel P Carter; Gayle K Clelland; Sean Davis; Nathan Day; Pawandeep Dhami; Shane C Dillon; Michael O Dorschner; Heike Fiegler; Paul G Giresi; Jeff Goldy; Michael Hawrylycz; Andrew Haydock; Richard Humbert; Keith D James; Brett E Johnson; Ericka M Johnson; Tristan T Frum; Elizabeth R Rosenzweig; Neerja Karnani; Kirsten Lee; Gregory C Lefebvre; Patrick A Navas; Fidencio Neri; Stephen C J Parker; Peter J Sabo; Richard Sandstrom; Anthony Shafer; David Vetrie; Molly Weaver; Sarah Wilcox; Man Yu; Francis S Collins; Job Dekker; Jason D Lieb; Thomas D Tullius; Gregory E Crawford; Shamil Sunyaev; William S Noble; Ian Dunham; France Denoeud; Alexandre Reymond; Philipp Kapranov; Joel Rozowsky; Deyou Zheng; Robert Castelo; Adam Frankish; Jennifer Harrow; Srinka Ghosh; Albin Sandelin; Ivo L Hofacker; Robert Baertsch; Damian Keefe; Sujit Dike; Jill Cheng; Heather A Hirsch; Edward A Sekinger; Julien Lagarde; Josep F Abril; Atif Shahab; Christoph Flamm; Claudia Fried; Jörg Hackermüller; Jana Hertel; Manja Lindemeyer; Kristin Missal; Andrea Tanzer; Stefan Washietl; Jan Korbel; Olof Emanuelsson; Jakob S Pedersen; Nancy Holroyd; Ruth Taylor; David Swarbreck; Nicholas Matthews; Mark C Dickson; Daryl J Thomas; Matthew T Weirauch; James Gilbert; Jorg Drenkow; Ian Bell; XiaoDong Zhao; K G Srinivasan; Wing-Kin Sung; Hong Sain Ooi; Kuo Ping Chiu; Sylvain Foissac; Tyler Alioto; Michael Brent; Lior Pachter; Michael L Tress; Alfonso Valencia; Siew Woh Choo; Chiou Yu Choo; Catherine Ucla; Caroline Manzano; Carine Wyss; Evelyn Cheung; Taane G Clark; James B Brown; Madhavan Ganesh; Sandeep Patel; Hari Tammana; 
Jacqueline Chrast; Charlotte N Henrichsen; Chikatoshi Kai; Jun Kawai; Ugrappa Nagalakshmi; Jiaqian Wu; Zheng Lian; Jin Lian; Peter Newburger; Xueqing Zhang; Peter Bickel; John S Mattick; Piero Carninci; Yoshihide Hayashizaki; Sherman Weissman; Tim Hubbard; Richard M Myers; Jane Rogers; Peter F Stadler; Todd M Lowe; Chia-Lin Wei; Yijun Ruan; Kevin Struhl; Mark Gerstein; Stylianos E Antonarakis; Yutao Fu; Eric D Green; Ulaş Karaöz; Adam Siepel; James Taylor; Laura A Liefer; Kris A Wetterstrand; Peter J Good; Elise A Feingold; Mark S Guyer; Gregory M Cooper; George Asimenos; Colin N Dewey; Minmei Hou; Sergey Nikolaev; Juan I Montoya-Burgos; Ari Löytynoja; Simon Whelan; Fabio Pardi; Tim Massingham; Haiyan Huang; Nancy R Zhang; Ian Holmes; James C Mullikin; Abel Ureta-Vidal; Benedict Paten; Michael Seringhaus; Deanna Church; Kate Rosenbloom; W James Kent; Eric A Stone; Serafim Batzoglou; Nick Goldman; Ross C Hardison; David Haussler; Webb Miller; Arend Sidow; Nathan D Trinklein; Zhengdong D Zhang; Leah Barrera; Rhona Stuart; David C King; Adam Ameur; Stefan Enroth; Mark C Bieda; Jonghwan Kim; Akshay A Bhinge; Nan Jiang; Jun Liu; Fei Yao; Vinsensius B Vega; Charlie W H Lee; Patrick Ng; Atif Shahab; Annie Yang; Zarmik Moqtaderi; Zhou Zhu; Xiaoqin Xu; Sharon Squazzo; Matthew J Oberley; David Inman; Michael A Singer; Todd A Richmond; Kyle J Munn; Alvaro Rada-Iglesias; Ola Wallerman; Jan Komorowski; Joanna C Fowler; Phillippe Couttet; Alexander W Bruce; Oliver M Dovey; Peter D Ellis; Cordelia F Langford; David A Nix; Ghia Euskirchen; Stephen Hartman; Alexander E Urban; Peter Kraus; Sara Van Calcar; Nate Heintzman; Tae Hoon Kim; Kun Wang; Chunxu Qu; Gary Hon; Rosa Luna; Christopher K Glass; M Geoff Rosenfeld; Shelley Force Aldred; Sara J Cooper; Anason Halees; Jane M Lin; Hennady P Shulha; Xiaoling Zhang; Mousheng Xu; Jaafar N S Haidar; Yong Yu; Yijun Ruan; Vishwanath R Iyer; Roland D Green; Claes Wadelius; Peggy J Farnham; Bing Ren; Rachel A Harte; Angie S Hinrichs; Heather 
Trumbower; Hiram Clawson; Jennifer Hillman-Jackson; Ann S Zweig; Kayla Smith; Archana Thakkapallayil; Galt Barber; Robert M Kuhn; Donna Karolchik; Lluis Armengol; Christine P Bird; Paul I W de Bakker; Andrew D Kern; Nuria Lopez-Bigas; Joel D Martin; Barbara E Stranger; Abigail Woodroffe; Eugene Davydov; Antigone Dimas; Eduardo Eyras; Ingileif B Hallgrímsdóttir; Julian Huppert; Michael C Zody; Gonçalo R Abecasis; Xavier Estivill; Gerard G Bouffard; Xiaobin Guan; Nancy F Hansen; Jacquelyn R Idol; Valerie V B Maduro; Baishali Maskeri; Jennifer C McDowell; Morgan Park; Pamela J Thomas; Alice C Young; Robert W Blakesley; Donna M Muzny; Erica Sodergren; David A Wheeler; Kim C Worley; Huaiyang Jiang; George M Weinstock; Richard A Gibbs; Tina Graves; Robert Fulton; Elaine R Mardis; Richard K Wilson; Michele Clamp; James Cuff; Sante Gnerre; David B Jaffe; Jean L Chang; Kerstin Lindblad-Toh; Eric S Lander; Maxim Koriabine; Mikhail Nefedov; Kazutoyo Osoegawa; Yuko Yoshinaga; Baoli Zhu; Pieter J de Jong
Journal: Nature Date: 2007-06-14 Impact factor: 49.962

8. The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC).

Authors: Daniela S Gerhard; Lukas Wagner; Elise A Feingold; Carolyn M Shenmen; Lynette H Grouse; Greg Schuler; Steven L Klein; Susan Old; Rebekah Rasooly; Peter Good; Mark Guyer; Allison M Peck; Jeffery G Derge; David Lipman; Francis S Collins; Wonhee Jang; Steven Sherry; Mike Feolo; Leonie Misquitta; Eduardo Lee; Kirill Rotmistrovsky; Susan F Greenhut; Carl F Schaefer; Kenneth Buetow; Tom I Bonner; David Haussler; Jim Kent; Mark Kiekhaus; Terry Furey; Michael Brent; Christa Prange; Kirsten Schreiber; Nicole Shapiro; Narayan K Bhat; Ralph F Hopkins; Florence Hsie; Tom Driscoll; M Bento Soares; Tom L Casavant; Todd E Scheetz; Michael J Brown-stein; Ted B Usdin; Shiraki Toshiyuki; Piero Carninci; Yulan Piao; Dawood B Dudekula; Minoru S H Ko; Koichi Kawakami; Yutaka Suzuki; Sumio Sugano; C E Gruber; M R Smith; Blake Simmons; Troy Moore; Richard Waterman; Stephen L Johnson; Yijun Ruan; Chia Lin Wei; S Mathavan; Preethi H Gunaratne; Jiaqian Wu; Angela M Garcia; Stephen W Hulyk; Edwin Fuh; Ye Yuan; Anna Sneed; Carla Kowis; Anne Hodgson; Donna M Muzny; John McPherson; Richard A Gibbs; Jessica Fahey; Erin Helton; Mark Ketteman; Anuradha Madan; Stephanie Rodrigues; Amy Sanchez; Michelle Whiting; Anup Madari; Alice C Young; Keith D Wetherby; Steven J Granite; Peggy N Kwong; Charles P Brinkley; Russell L Pearson; Gerard G Bouffard; Robert W Blakesly; Eric D Green; Mark C Dickson; Alex C Rodriguez; Jane Grimwood; Jeremy Schmutz; Richard M Myers; Yaron S N Butterfield; Malachi Griffith; Obi L Griffith; Martin I Krzywinski; Nancy Liao; Ryan Morin; Ryan Morrin; Diana Palmquist; Anca S Petrescu; Ursula Skalska; Duane E Smailus; Jeff M Stott; Angelique Schnerch; Jacqueline E Schein; Steven J M Jones; Robert A Holt; Agnes Baross; Marco A Marra; Sandra Clifton; Kathryn A Makowski; Stephanie Bosak; Joel Malek
Journal: Genome Res Date: 2004-10 Impact factor: 9.043

9. Large-scale copy number polymorphism in the human genome.

Authors: Jonathan Sebat; B Lakshmi; Jennifer Troge; Joan Alexander; Janet Young; Pär Lundin; Susanne Månér; Hillary Massa; Megan Walker; Maoyen Chi; Nicholas Navin; Robert Lucito; John Healy; James Hicks; Kenny Ye; Andrew Reiner; T Conrad Gilliam; Barbara Trask; Nick Patterson; Anders Zetterberg; Michael Wigler
Journal: Science Date: 2004-07-23 Impact factor: 47.728

10. A genome annotation-driven approach to cloning the human ORFeome.

Authors: John E Collins; Charmain L Wright; Carol A Edwards; Matthew P Davis; James A Grinham; Charlotte G Cole; Melanie E Goward; Begoña Aguado; Meera Mallya; Younes Mokrab; Elizabeth J Huckle; David M Beare; Ian Dunham
Journal: Genome Biol Date: 2004-09-30 Impact factor: 13.583
