The Vertebrate Genome Annotation (Vega) database.

J L Ashurst, C-K Chen, J G R Gilbert, K Jekosch, S Keenan, P Meidl, S M Searle, J Stalker, R Storey, S Trevanion, L Wilming, T Hubbard.

Abstract

The Vertebrate Genome Annotation (Vega) database (http://vega.sanger.ac.uk) has been designed to be a community resource for browsing manual annotation of finished sequences from a variety of vertebrate genomes. Its core database is based on an Ensembl-style schema, extended to incorporate curation-specific metadata. In collaboration with the genome sequencing centres, Vega attempts to present consistent high-quality annotation of the published human chromosome sequences. It is also possible to view various finished regions from other vertebrates, including mouse and zebrafish. Vega displays only manually annotated gene structures built using transcriptional evidence, which can be examined in the browser. Attempts have been made to standardize the annotation procedure across each vertebrate genome, which should aid comparative analysis of orthologues across the different finished regions.

Year: 2005 PMID: 15608237 PMCID: PMC540089 DOI: 10.1093/nar/gki135

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

In 1999 the DNA sequence of chromosome 22, the first human chromosome to be fully sequenced, was published (1). It provided a snapshot of the complexity of genes within a chromosomal landscape and set the standard for manual annotation, which the rest of the community was to follow. Yet as sequencing methods improved and researchers wanted to analyse unfinished, as well as finished, sequence data, new automated annotation methods were established and genome browsers such as Ensembl (2) and the UCSC Genome Browser (3) provided automatic genome annotation for the draft human genome assembly finished in 2001 (4). After the announcement of the finishing of the human genome in 2003, attention turned to producing a gold standard manually curated view of the human gene set. The Vertebrate Genome Annotation (Vega) database is specifically dedicated to the browsing and maintenance of manually annotated data. Initially designed to view the manual annotation produced by the Havana group at the Sanger Institute (http://www.sanger.ac.uk/HGP/havana/), the project has expanded to include the manual annotation from the major centres (including RIKEN, the Joint Genome Institute, Genoscope and Washington University Genome Sequencing Center) involved in the sequencing and annotation of the human genome. Currently, it contains the annotation for 10 human chromosomes (6, 7, 9, 10, 13, 14, 20, 22, X and Y), but as the public consortium aims to complete the publication of its analysis by the end of 2004, it is planned that Vega will contain the complete manual annotation of the human genome by the beginning of 2005. Manual annotation is currently more accurate at identifying splice variants, pseudogenes, polyadenylation [poly(A)] features, non-coding genes and complex gene arrangements and clusters than automated methods. At the time of writing, the Vega human database contains over 15 000 gene loci and approximately 29 500 transcripts. 
In addition, Vega contains manual annotation of other vertebrate species and it is possible to view small chromosomal regions, e.g. mouse Del36H (5) and non-contiguous finished clone annotation of zebrafish. Figure 1 represents an overview of the processes and software involved in producing the data shown in Vega.

Figure 1

The VEGA annotation pipeline. The pipeline shown here is for human. The automated analysis for other species has slight differences. The searches are run on our computer farm and stored in an Ensembl MySQL database using the Ensembl analysis pipeline system (20). Nearly all searches and prediction algorithms are run on repeat masked sequence, the exception being CpG island prediction [see cpgreport in the EMBOSS (21) application suite]. RepeatMasker (http://ftp.genome.washington.edu/RM/RepeatMasker.html) is used to mask interspersed repeats, followed by TRF (22) to mask tandem repeats. Nucleotide sequence databases are searched with wuBLASTN (http://blast.wustl.edu), and significant hits are re-aligned to the unmasked genomic sequence using est2genome (23). The Uniprot protein database (http://www.uniprot.org) is searched with wuBLASTX, and the accession numbers of significant hits are looked up in the Pfam database (24). The hidden Markov models for Pfam protein domains are aligned against the genomic sequence using Genewise (25) to provide annotation of protein domains (Halfwise in the figure). We also run a number of ab initio prediction algorithms: genscan (26) and fgenesh (27) for genes, tRNAscan (28) to find tRNA genes and Eponine TSS (29), which predicts transcription start sites. The annotators use the Otterlace interface to create and edit genes, which are stored in the Otter database (13). Where predicted transcript structures from Ensembl are available these can be viewed from within the Otterlace interface and may be used as starting templates for gene curation. Annotation in the Otter database is submitted to the EMBL/GenBank/DDBJ nucleotide database. The database for the VEGA website is periodically created by a publishing process that involves the copying and reformatting of data from the Otter genes and automated pipeline databases.
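
The masking step in the legend above (interspersed repeats first, then tandem repeats, with searches run on the masked sequence) can be illustrated with a minimal Python sketch. It does not parse real RepeatMasker or TRF output: the function name `mask_intervals`, the toy sequence and the repeat coordinates are all hypothetical, and intervals are assumed to be 1-based and inclusive.

```python
def mask_intervals(sequence, intervals, mask_char="N"):
    """Return a copy of `sequence` with each 1-based inclusive interval masked."""
    bases = list(sequence)
    for start, end in intervals:
        if start < 1 or end > len(bases) or start > end:
            raise ValueError(f"bad interval: {start}-{end}")
        # Overwrite the repeat with mask characters, preserving sequence length.
        bases[start - 1:end] = mask_char * (end - start + 1)
    return "".join(bases)


# Hypothetical toy data, not real pipeline output.
genomic = "ACGTACGT" + "T" * 8 + "ACG" + "AT" * 5 + "GC"
interspersed = [(9, 16)]  # stands in for a RepeatMasker hit
tandem = [(20, 29)]       # stands in for a TRF hit

# Mask interspersed repeats first, then tandem repeats, as in the legend.
masked = mask_intervals(mask_intervals(genomic, interspersed), tandem)
print(masked)
```

Because masking preserves length, coordinates of hits found on the masked sequence remain valid on the unmasked sequence, which is what allows significant hits to be re-aligned to the unmasked genomic sequence afterwards.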

GENE CLASSIFICATION AND STANDARDIZATION OF ANNOTATION

Since different research groups are performing high-quality manual annotation of different chromosomes, it has been essential to standardize a set of definitions to describe the annotation of different gene features. A common factor is that all annotated gene structures must be supported by transcriptional evidence, either from cDNA, expressed sequence tag (EST) or protein sequences. The following are the gene indices used in human chromosome 20 annotation (6) and adopted by the Vega database as standard:

Known genes: identical to human cDNA or protein sequences identified by LocusLink ID in the LocusLink database (http://www.ncbi.nlm.nih.gov/LocusLink/).
Novel genes: have an open reading frame (ORF) and are identical or homologous to known cDNAs (vertebrates) and/or proteins (all species).
Novel transcripts: similar to novel genes but no ORF can be unambiguously assigned.
Putative genes: homologous to spliced ESTs (vertebrates) but devoid of significant ORF/CDS.
Pseudogenes: sequences homologous to proteins (over ≥50% of the subject length) with a disrupted CDS and for which an active gene can generally be found at another locus.

These definitions have also been used in the recent annotation of chromosome 14 together with an additional classification ‘predicted genes’. Genoscope used this new classification to describe a gene based on ab initio predictions for which at least one exon is covered by biological or similarity data (unspliced ESTs, mouse or Tetraodon genomes or expression data from Rosetta) (7). These predicted genes as well as putative genes provide targets for experimental validation (8). Immunoglobulin segments and pseudogenes found on chromosomes 22 (1) and 14 (7) have also been given unique tags. These classifications have been extended across all the species in Vega with the only exception being that the specific model organism databases, e.g. the Mouse Genome Database (MGD) (http://www.informatics.jax.org/) (9) and the Zebrafish Information Network (ZFIN) nomenclature database (http://zfin.org/zf_info/nomen.html) (10), are used as the point of reference for known genes in place of LocusLink (11).

Using correct gene nomenclature is an important method for maintaining consistency in an annotation database, especially when comparing haplotypes or syntenic regions. The annotation staff involved in the Vega project, therefore, interact closely with the nomenclature committees from the Human Genome Organisation (HUGO, HGNC) (12), ZFIN and MGD. If an approved symbol is not available for a gene locus, an interim internal identifier is used, which is usually in the format clonename.number, e.g. RP11-694B14.5. The locus and its associated transcripts and exons are also attributed stable, versioned database IDs (e.g. OTTHUMG00000017411, OTTHUMT00000046000), generated and tracked within the Otter database (see Figure 2). Whenever a gene locus is edited the version number will increase and the date of the change will be saved, allowing the user to find out when the annotation was last updated.

Otter is an extended Ensembl database with an associated client/server system that is able to support interactive updating of annotation (13). The annotation stored in the Otter backend for the Vega database is either curated directly using Otterlace (a Perl/TK curation interface wrapped around Acedb) or via Otter XML uploads, such as from external groups. Multiple versions of any genome assembly can be stored in the system, with tools to migrate annotation to the latest assembly. Although the finished sequence is highly accurate (better than 1 base error in 10 000) over megabase regions, assemblies are frequently revised as chromosomal regions are finished, particularly in regions of genome duplication and frequently in conjunction with feedback from manual annotation. For reference sequences we can also expect assemblies to be revised as re-sequencing reveals a more common haplotype.
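
The identifier conventions described in this section (interim clonename.number names, stable versioned IDs, and a version increment plus date stamp on every edit) can be sketched in Python as follows. This is an illustration only, not the Otter implementation: the regular expressions are inferred solely from the example identifiers quoted in the text, and the `Locus` class, its fields and the dates used are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import date

# Assumed layout: "OTT" + 3-letter species code + feature letter + 11 digits.
# Inferred only from OTTHUMG00000017411 and OTTHUMT00000046000 in the text.
STABLE_ID = re.compile(r"^OTT(?P<species>[A-Z]{3})(?P<feature>[GTE])(?P<number>\d{11})$")

# Interim identifiers take the form clonename.number, e.g. RP11-694B14.5.
INTERIM_ID = re.compile(r"^(?P<clone>.+)\.(?P<number>\d+)$")


def parse_stable_id(text):
    """Split a stable ID into (species code, feature letter, number)."""
    match = STABLE_ID.match(text)
    if match is None:
        raise ValueError(f"not a recognised stable ID: {text!r}")
    return match["species"], match["feature"], int(match["number"])


def parse_interim_id(text):
    """Split an interim identifier into (clone name, locus number)."""
    match = INTERIM_ID.match(text)
    if match is None:
        raise ValueError(f"not a clonename.number identifier: {text!r}")
    return match["clone"], int(match["number"])


@dataclass
class Locus:
    """Hypothetical record tracking a locus version and last-modified date."""
    stable_id: str
    modified: date
    version: int = 1

    def record_edit(self, when):
        # Every edit bumps the version and stores the date of the change.
        self.version += 1
        self.modified = when


locus = Locus("OTTHUMG00000017411", modified=date(2004, 1, 1))  # dates are made up
locus.record_edit(date(2004, 6, 1))
print(parse_stable_id(locus.stable_id), locus.version, locus.modified)
print(parse_interim_id("RP11-694B14.5"))
```

Keeping the version and date on the locus record itself is what lets a user see at a glance when the annotation was last updated.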

Figure 2

Curated Locus Report giving information about the PAX2 locus on chromosome 10.

In an attempt to define a common standard for manual annotation across the human genome, collaborators involved in submitting annotation data to Vega have held a series of human annotation workshops (HAWK) (http://www.sanger.ac.uk/HGP/havana/hawk.shtml; see http://www.sanger.ac.uk/HGP/havana/docs/guidelines.pdf). Currently, there are many different transcript structures available in different browsers for various loci. With the aim of producing a single gold standard gene set, NCBI, UCSC, Ensembl and the Sanger Institute have started a collaboration to analyse the human gene sets produced by RefSeq, Ensembl and Vega and to define a non-redundant set of protein coding transcripts (HCDSs) that all collaborators can agree on.

ADDITIONAL FEATURES IN Vega

Unlike most of the browsers currently available containing automated gene builds, the manually annotated data shown in Vega does not have a particular emphasis on displaying only coding transcripts. Of transcripts annotated within Vega, ∼50% have no ORF associated with them. There are many possible properties of these transcripts, such as their being non-coding RNAs, transcripts involved in nonsense mediated mRNA decay (possibly regulating coding genes) (14) or partial transcripts where the ORF has not yet been experimentally determined. Each transcript constructed has spliced evidence associated with it, which can be viewed in Vega, so the user can assess the validity of each transcript. In addition to coding genes, ∼30% of gene structures are pseudogenes. These have been subdivided into unprocessed and processed categories in the recently finished chromosomes, so the user can identify whether the pseudogene has arisen from a duplication event or retrotransposition. Polyadenylation sites and signals, identified manually by examining 3′ EST and cDNA data, are visible within the ContigView webpage of Vega (see Figure 3). The features are not associated with a particular transcript as it is difficult using 3′ EST data to associate a poly(A) feature with a particular alternative variant when they share the same 3′-untranslated region. Single nucleotide polymorphisms (SNPs) can also be viewed in ContigView and are mapped from the Glovar database (http://www.glovar.org/Homo_sapiens/) onto the clones within Vega. Glovar contains all the human dbSNP data in addition to SNPs derived from comparing public human reads from the trace repository (http://trace.ensembl.org/) with the current genome build. The functional classification of the SNPs (coding, untranslated region, Intronic, Other) is derived from mapping onto the Vega annotation. Currently SNPs are available only for the human chromosomes but they will eventually be available for all genomic sequences within Vega.
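
As an illustration of how a functional class (coding, untranslated region, intronic, other) might be derived by mapping a SNP position onto a transcript annotation, consider the Python sketch below. It is not the Glovar or Vega implementation. The `Transcript` class, the coordinates, and the rule that exonic SNPs in transcripts without an ORF fall into "other" are all assumptions made for the example; coordinates are 1-based, inclusive and on the forward strand.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Transcript:
    """Hypothetical transcript model: sorted exons plus an optional CDS range."""
    exons: Tuple[Tuple[int, int], ...]
    cds_start: Optional[int] = None  # None when no ORF has been assigned
    cds_end: Optional[int] = None


def classify_snp(position, transcript):
    """Return the functional class of a SNP relative to one transcript."""
    span_start = transcript.exons[0][0]
    span_end = transcript.exons[-1][1]
    if not span_start <= position <= span_end:
        return "other"  # outside the transcript altogether
    if not any(start <= position <= end for start, end in transcript.exons):
        return "intronic"
    if transcript.cds_start is None or transcript.cds_end is None:
        return "other"  # assumption: exonic SNP in a transcript with no ORF
    if transcript.cds_start <= position <= transcript.cds_end:
        return "coding"
    return "untranslated region"


# Made-up coordinates for illustration.
coding_tx = Transcript(exons=((100, 200), (300, 400), (500, 600)),
                       cds_start=150, cds_end=550)
for pos in (120, 180, 250, 580, 700):
    print(pos, classify_snp(pos, coding_tx))
```

A real mapping would also need to handle strand, multiple overlapping transcripts per locus and changes of assembly coordinates, none of which are attempted here.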

Figure 3

ContigView webpage from human chromosome 6 Vega displaying poly(A) signals/sites and SNPs associated with SLC29A1 and HSPCB loci.

In addition to displaying the latest working assembly for each chromosome from individual public human sequencing consortia, Vega contains the annotated haplotype sequence available for the major histocompatibility complex (MHC) region on chromosome 6 (15) (see Figure 4). The two common HLA haplotypes, PG and CX, are strongly associated with auto-immune diseases including type 1 diabetes and multiple sclerosis. The two haplotypes differ in their complement component C4 genes and MHC class II HLA-DRB genes, and this comparison can be easily made within ContigView. In a future Vega release we are planning to use the Ensembl Compara genome comparison framework to support MultiContigView pages, so that haplotypes can be examined in parallel, and to facilitate browsing the four further human MHC haplotype regions that will be available at the end of the year.

Figure 4

Different chromosomes and regions annotated from the three different vertebrates currently available in Vega.

The mouse annotation browser within Vega, besides displaying finished regions from the reference mouse strain (C57BL/6), also displays three regions from the non-obese diabetic (Nod) strain (16) (see Figure 4). Since several insulin dependent diabetes (IDD) susceptibility loci have been mapped onto mouse chromosomes 1 and 3, comparison of the genomic sequence and genes between the two strains could be used to highlight functionally important SNPs (17). The zebrafish genome sequence will be finished and manually annotated solely by the Sanger Institute. Vega will be the main site for browsing the annotated data and at present there are 1164 loci, mostly from individual BAC and PAC clones. The genome is currently displayed in chromosomes/linkage groups 1–25 (see Figure 4). In addition, there are two ‘artificial’ chromosomes: U, containing all the clones that could not yet be mapped onto the chromosomes, and AB, containing clones from the AB strain. Clones that have not yet been annotated are displayed with all their features derived from automated computational analysis (repeat masking, ab initio gene predictions, BLAST searches, etc.) but are shaded in grey to avoid confusion with the annotated ones.

ACCESSING AND QUERYING DATA

The Vega browser, which is based on the Ensembl web code and infrastructure, provides a number of standard entry points such as sequence similarity (BLAST and SSAHA search) and keyword search. Data can be downloaded using ExportView, which can dump data in a variety of formats including FASTA, Gene Feature Format (GFF) and as flat files. Annotation can also be accessed directly via distributed annotation server (DAS) data sources. At present, we do not directly provide data mining via BioMart (http://www.ebi.ac.uk/biomart/) since Vega is designed to be updated weekly, which currently makes rebuilding BioMart impractical. It is possible to use EnsMart (18), available at Ensembl (http://www.ensembl.org/Multi/martview), to query a recent version of the gene structures from the Vega human database (which are displayed on the Ensembl website in ContigView). However, the Vega annotation shown in Ensembl would have been mapped from the latest chromosome assembly, upon which the annotation was curated and which is displayed in Vega, onto the current international genome assembly, which inevitably lags behind. If the assembly in Ensembl differs from that in Vega, only the annotation that can be cleanly transferred is present. For the informatician, a more comprehensive search of the Vega data can be performed using the Ensembl API (http://www.ensembl.org/Docs).
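
For readers working with a GFF dump rather than the Ensembl API, the following Python sketch shows one way to read such a file and count features by type. It assumes the conventional nine tab-separated GFF columns (sequence name, source, feature, start, end, score, strand, frame, attributes). The exact content of an ExportView dump is not specified here, so the sample lines, the `vega` source label and the attribute strings are invented for illustration.

```python
from collections import Counter
from typing import NamedTuple


class GffRecord(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int
    end: int
    score: str
    strand: str
    frame: str
    attributes: str


def parse_gff(lines):
    """Yield one GffRecord per data line, skipping blanks and '#' comments."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 columns, got {len(cols)}: {line!r}")
        yield GffRecord(cols[0], cols[1], cols[2],
                        int(cols[3]), int(cols[4]), *cols[5:])


# Invented sample records, not real Vega output.
sample = [
    "##gff-version 2",
    "\t".join(["chr20", "vega", "exon", "1000", "1200", ".", "+", ".", 'gene_id "example_gene"']),
    "\t".join(["chr20", "vega", "exon", "1500", "1700", ".", "+", ".", 'gene_id "example_gene"']),
    "\t".join(["chr20", "vega", "CDS", "1100", "1200", ".", "+", "0", 'gene_id "example_gene"']),
]

records = list(parse_gff(sample))
counts = Counter(record.feature for record in records)
print(counts)
```

In practice the same loop could read from an open file handle instead of a list, since `parse_gff` only needs an iterable of lines.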

FEEDBACK AND SUBMITTING DATA

Vega is a community annotation database and feedback from researchers is essential to produce a gold standard annotation of the genomes available. Therefore, a webform is provided on the website (http://vega.sanger.ac.uk/helpdesk/index.html) to enable the user to contact the Vega team directly and improve the annotation if additional evidence is available. Since the browser is not restricted to annotation of whole genomes, we encourage users to contact vega@sanger.ac.uk to submit manual annotation of vertebrate finished regions they have sequenced, provided it has been peer reviewed and/or meets the HAWK standard for annotation.

FUTURE PLANS

We aim to have a fully manually annotated human genome available by the beginning of 2005. With community support, we hope to maintain the annotation and update on a weekly basis. We will also display the latest manual annotation of the regions as part of the ENCODE project (http://www.genome.gov/10005107) (19). In addition, annotated sequences from the mouse and zebrafish genomes, finished at the Sanger Institute or by the public sequencing centres, will be released on a chromosome basis in Vega. In collaboration with the MHC consortium we are also planning to release additional human MHC haplotypes, as well as MHC regions from dog, cat, pig and rat. Using the comparative analysis pipeline designed by the Ensembl team we are looking into producing comparative views for these data to enable the user to browse easily among different species.

29 in total

1. The DNA sequence of human chromosome 22.

Authors: I Dunham; N Shimizu; B A Roe; S Chissoe; A R Hunt; J E Collins; R Bruskiewich; D M Beare; M Clamp; L J Smink; R Ainscough; J P Almeida; A Babbage; C Bagguley; J Bailey; K Barlow; K N Bates; O Beasley; C P Bird; S Blakey; A M Bridgeman; D Buck; J Burgess; W D Burrill; K P O'Brien
Journal: Nature Date: 1999-12-02 Impact factor: 49.962

2. EMBOSS: the European Molecular Biology Open Software Suite.

Authors: P Rice; I Longden; A Bleasby
Journal: Trends Genet Date: 2000-06 Impact factor: 11.639

3. Initial sequencing and analysis of the human genome.

Authors: E S Lander; L M Linton; B Birren; C Nusbaum; M C Zody; J Baldwin; K Devon; K Dewar; M Doyle; W FitzHugh; R Funke; D Gage; K Harris; A Heaford; J Howland; L Kann; J Lehoczky; R LeVine; P McEwan; K McKernan; J Meldrim; J P Mesirov; C Miranda; W Morris; J Naylor; C Raymond; M Rosetti; R Santos; A Sheridan; C Sougnez; Y Stange-Thomann; N Stojanovic; A Subramanian; D Wyman; J Rogers; J Sulston; R Ainscough; S Beck; D Bentley; J Burton; C Clee; N Carter; A Coulson; R Deadman; P Deloukas; A Dunham; I Dunham; R Durbin; L French; D Grafham; S Gregory; T Hubbard; S Humphray; A Hunt; M Jones; C Lloyd; A McMurray; L Matthews; S Mercer; S Milne; J C Mullikin; A Mungall; R Plumb; M Ross; R Shownkeen; S Sims; R H Waterston; R K Wilson; L W Hillier; J D McPherson; M A Marra; E R Mardis; L A Fulton; A T Chinwalla; K H Pepin; W R Gish; S L Chissoe; M C Wendl; K D Delehaunty; T L Miner; A Delehaunty; J B Kramer; L L Cook; R S Fulton; D L Johnson; P J Minx; S W Clifton; T Hawkins; E Branscomb; P Predki; P Richardson; S Wenning; T Slezak; N Doggett; J F Cheng; A Olsen; S Lucas; C Elkin; E Uberbacher; M Frazier; R A Gibbs; D M Muzny; S E Scherer; J B Bouck; E J Sodergren; K C Worley; C M Rives; J H Gorrell; M L Metzker; S L Naylor; R S Kucherlapati; D L Nelson; G M Weinstock; Y Sakaki; A Fujiyama; M Hattori; T Yada; A Toyoda; T Itoh; C Kawagoe; H Watanabe; Y Totoki; T Taylor; J Weissenbach; R Heilig; W Saurin; F Artiguenave; P Brottier; T Bruls; E Pelletier; C Robert; P Wincker; D R Smith; L Doucette-Stamm; M Rubenfield; K Weinstock; H M Lee; J Dubois; A Rosenthal; M Platzer; G Nyakatura; S Taudien; A Rump; H Yang; J Yu; J Wang; G Huang; J Gu; L Hood; L Rowen; A Madan; S Qin; R W Davis; N A Federspiel; A P Abola; M J Proctor; R M Myers; J Schmutz; M Dickson; J Grimwood; D R Cox; M V Olson; R Kaul; C Raymond; N Shimizu; K Kawasaki; S Minoshima; G A Evans; M Athanasiou; R Schultz; B A Roe; F Chen; H Pan; J Ramser; H Lehrach; R Reinhardt; W R McCombie; M de la Bastide; N Dedhia; H 
Blöcker; K Hornischer; G Nordsiek; R Agarwala; L Aravind; J A Bailey; A Bateman; S Batzoglou; E Birney; P Bork; D G Brown; C B Burge; L Cerutti; H C Chen; D Church; M Clamp; R R Copley; T Doerks; S R Eddy; E E Eichler; T S Furey; J Galagan; J G Gilbert; C Harmon; Y Hayashizaki; D Haussler; H Hermjakob; K Hokamp; W Jang; L S Johnson; T A Jones; S Kasif; A Kaspryzk; S Kennedy; W J Kent; P Kitts; E V Koonin; I Korf; D Kulp; D Lancet; T M Lowe; A McLysaght; T Mikkelsen; J V Moran; N Mulder; V J Pollara; C P Ponting; G Schuler; J Schultz; G Slater; A F Smit; E Stupka; J Szustakowki; D Thierry-Mieg; J Thierry-Mieg; L Wagner; J Wallis; R Wheeler; A Williams; Y I Wolf; K H Wolfe; S P Yang; R F Yeh; F Collins; M S Guyer; J Peterson; A Felsenfeld; K A Wetterstrand; A Patrinos; M J Morgan; P de Jong; J J Catanese; K Osoegawa; H Shizuya; S Choi; Y J Chen; J Szustakowki
Journal: Nature Date: 2001-02-15 Impact factor: 49.962

4. The human genome browser at UCSC.

Authors: W James Kent; Charles W Sugnet; Terrence S Furey; Krishna M Roskin; Tom H Pringle; Alan M Zahler; David Haussler
Journal: Genome Res Date: 2002-06 Impact factor: 9.043

5. Tandem repeats finder: a program to analyze DNA sequences.

Authors: G Benson
Journal: Nucleic Acids Res Date: 1999-01-15 Impact factor: 16.971

6. The DNA sequence and comparative analysis of human chromosome 20.

Authors: P Deloukas; L H Matthews; J Ashurst; J Burton; J G Gilbert; M Jones; G Stavrides; J P Almeida; A K Babbage; C L Bagguley; J Bailey; K F Barlow; K N Bates; L M Beard; D M Beare; O P Beasley; C P Bird; S E Blakey; A M Bridgeman; A J Brown; D Buck; W Burrill; A P Butler; C Carder; N P Carter; J C Chapman; M Clamp; G Clark; L N Clark; S Y Clark; C M Clee; S Clegg; V E Cobley; R E Collier; R Connor; N R Corby; A Coulson; G J Coville; R Deadman; P Dhami; M Dunn; A G Ellington; J A Frankland; A Fraser; L French; P Garner; D V Grafham; C Griffiths; M N Griffiths; R Gwilliam; R E Hall; S Hammond; J L Harley; P D Heath; S Ho; J L Holden; P J Howden; E Huckle; A R Hunt; S E Hunt; K Jekosch; C M Johnson; D Johnson; M P Kay; A M Kimberley; A King; A Knights; G K Laird; S Lawlor; M H Lehvaslaiho; M Leversha; C Lloyd; D M Lloyd; J D Lovell; V L Marsh; S L Martin; L J McConnachie; K McLay; A A McMurray; S Milne; D Mistry; M J Moore; J C Mullikin; T Nickerson; K Oliver; A Parker; R Patel; T A Pearce; A I Peck; B J Phillimore; S R Prathalingam; R W Plumb; H Ramsay; C M Rice; M T Ross; C E Scott; H K Sehra; R Shownkeen; S Sims; C D Skuce; M L Smith; C Soderlund; C A Steward; J E Sulston; M Swann; N Sycamore; R Taylor; L Tee; D W Thomas; A Thorpe; A Tracey; A C Tromans; M Vaudin; M Wall; J M Wallis; S L Whitehead; P Whittaker; D L Willey; L Williams; S A Williams; L Wilming; P W Wray; T Hubbard; R M Durbin; D R Bentley; S Beck; J Rogers
Journal: Nature Date: 2001 Dec 20-27 Impact factor: 49.962

7. NOD Idd5 locus controls insulitis and diabetes and overlaps the orthologous CTLA4/IDDM12 and NRAMP1 loci in humans.

Authors: N J Hill; P A Lyons; N Armitage; J A Todd; L S Wicker; L B Peterson
Journal: Diabetes Date: 2000-10 Impact factor: 9.461

8. Computational detection and location of transcription start sites in mammalian genomic DNA.

Authors: Thomas A Down; Tim J P Hubbard
Journal: Genome Res Date: 2002-03 Impact factor: 9.043

9. The Zebrafish Information Network (ZFIN): the zebrafish model organism database.

Authors: Judy Sprague; Dave Clements; Tom Conlin; Pat Edwards; Ken Frazer; Kevin Schaper; Erik Segerdell; Peiran Song; Brock Sprunger; Monte Westerfield
Journal: Nucleic Acids Res Date: 2003-01-01 Impact factor: 16.971

10. MGD: the Mouse Genome Database.

Authors: Judith A Blake; Joel E Richardson; Carol J Bult; Jim A Kadin; Janan T Eppig
Journal: Nucleic Acids Res Date: 2003-01-01 Impact factor: 16.971
