Jennifer L Harrow, Charles A Steward, Adam Frankish, James G Gilbert, Jose M Gonzalez, Jane E Loveland, Jonathan Mudge, Dan Sheppard, Mark Thomas, Stephen Trevanion, Laurens G Wilming.
Abstract
The Vertebrate Genome Annotation (VEGA) database (http://vega.sanger.ac.uk), initially designed as a community resource for browsing manual annotation of the human genome project, now contains five reference genomes (human, mouse, zebrafish, pig and rat). Its introduction pages have been redesigned to enable the user to easily navigate between whole genomes and smaller multi-species haplotypic regions of interest such as the major histocompatibility complex. The VEGA browser is unique in that annotation is updated via the Human And Vertebrate Analysis aNd Annotation (HAVANA) update track every 2 weeks, allowing single gene updates to be made publicly available to the research community quickly. The user can now access different haplotypic subregions more easily, such as those from the non-obese diabetic mouse, and display them in a more intuitive way using the comparative tools. We also highlight how the user can browse manually annotated updated patches from the Genome Reference Consortium (GRC).
Year: 2013 PMID: 24316575 PMCID: PMC3964964 DOI: 10.1093/nar/gkt1241
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Redesigned VEGA home page and species landing pages. (A) New home page with complete genomes (1) separated from partial regions (2), and a new panel with alternative entry points to special data sets available in multiple genomes (3, 4). (B) New species landing page; human shown here. Easy access to statistics and examples (1), special data sets (2, 3) and updated annotation (4).
Biotypes available in VEGA, with a brief description of each
| Biotype | Description |
|---|---|
| Protein coding | |
| Polymorphic (protein coding) | At least one variant has a valid ORF and at least one coding variant contains a polymorphism (see ‘NOVEL GENE TRACKS IN VEGA: LOF AND KO’ section). |
| Protein coding (in progress) | ‘Zebrafish only’. Genome assembly issue causes loss of ORF; to be re-annotated on correct assembly. |
| lncRNA | Long non-coding RNA: lacks protein-coding potential and is >200 bp long. |
| Non-coding | Known from publications to be non-coding. |
| 3-Prime overlapping | Transcriptional start site and/or published experimental data support independent transcription from the 3′ UTR of a coding gene. |
| Antisense | At least one variant overlaps a protein-coding locus on the opposite strand, or antisense regulation of a coding gene has been published. |
| lincRNA | Long intergenic ncRNA: does not overlap (sense or antisense) a coding gene. |
| Sense intronic | In an intron of a coding gene; no exonic overlap. |
| Sense overlapping | Contains a coding gene in an intron; no exonic overlap. |
| Pseudogene | ORF disrupted by frameshifts and/or premature stop codons. |
| Processed | Lacks introns and arose from retrotransposition of parent gene mRNA. |
| Unprocessed | Can contain introns and is produced by genomic duplication. |
| Transcribed | Locus-specific transcripts indicate transcription. These can be classified into ‘Processed’, ‘Unprocessed’ and ‘Unitary’. |
| Translated | Locus-specific protein mass spectrometry data suggest translation. These can be classified into ‘Processed’ and ‘Unprocessed’. We maintain the connection with the pseudogene biotype until the experimental community validates it as a coding gene. |
| Polymorphic (pseudogene) | Pseudogene owing to a SNP/DIP, but orthologous gene translated in other individuals/haplotypes/strains. |
| Unitary | Species-specific unprocessed pseudogene without a parent gene, which has an active orthologue in another species. |
| IG | Immunoglobulin pseudogene. |
| IG Gene | Immunoglobulin gene. |
| TR Gene | T-cell receptor gene. |
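
The positional lncRNA categories in the table above (lincRNA, antisense, sense intronic, sense overlapping) can be read as simple overlap rules. The following is a minimal sketch of those rules only, not the procedure used by the HAVANA annotators, who assign biotypes manually. The `Locus` class, the function names, the use of summed exon length for the >200 bp threshold, and the order in which the rules are tested are all assumptions made for illustration. Biotypes that depend on published or experimental evidence (‘Non-coding’, ‘3-Prime overlapping’) are deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) coordinates on one chromosome


@dataclass
class Locus:
    """Hypothetical minimal gene model used only for this illustration."""
    start: int
    end: int
    strand: int          # +1 or -1
    exons: List[Span]

    def transcript_length(self) -> int:
        # Assumption: the ">200 bp" threshold is applied to summed exon length.
        return sum(e - s + 1 for s, e in self.exons)


def spans_overlap(a: Span, b: Span) -> bool:
    """True if two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def classify_lncrna(lnc: Locus, coding_genes: List[Locus]) -> str:
    """Apply the positional rules from the biotype table to a non-coding locus."""
    if lnc.transcript_length() <= 200:
        return "not lncRNA"

    hits = [g for g in coding_genes
            if spans_overlap((lnc.start, lnc.end), (g.start, g.end))]
    if not hits:
        return "lincRNA"  # no overlap with a coding gene on either strand

    # Assumption: an opposite-strand overlap takes precedence over sense rules.
    if any(g.strand != lnc.strand for g in hits):
        return "antisense"

    for g in hits:
        exonic = any(spans_overlap(a, b) for a in lnc.exons for b in g.exons)
        if exonic:
            continue  # the table defines no sense category with exonic overlap
        if g.start <= lnc.start and lnc.end <= g.end:
            return "sense intronic"      # lies inside the coding gene
        if lnc.start <= g.start and g.end <= lnc.end:
            return "sense overlapping"   # coding gene lies inside the lncRNA
    return "unclassified"


# Made-up coordinates for a single coding gene on the forward strand.
gene = Locus(1000, 5000, +1, [(1000, 1500), (4000, 5000)])
print(classify_lncrna(Locus(2000, 3000, +1, [(2000, 2300), (2700, 3000)]), [gene]))
```

The containment test for ‘sense intronic’ is a simplification: it checks only that the locus lies within the coding gene and has no exonic overlap, not that it sits within a single intron.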
Figure 2. Viewing patches in VEGA. Searching VEGA for the human SLC37A4 gene yields two results (panel 2): one hit on the reference genome (top) and one on a patch (bottom). The top of the ‘Location’ page (panel 1) for the reference gene shows the location of patches on the chromosome, with the region shown in detail boxed in red. The detail view panel shows the location of the patch as two green lines in the ‘Assembly exceptions’ track (highlighted with an orange box left) and light green shading between them (highlighted with an orange box middle). The ‘Gene’ page (panel 3) for the reference gene shows the remark ‘reference genome error’ under the ‘Annotation Attributes’ section of the ‘Gene summary’ (highlighted with an orange box). Panels 4 and 5 show the difference in annotation of the same gene on the patch and reference, respectively. Note the lack of any CDS annotation on the reference gene.
Figure 3. Viewing different haplotypes in VEGA. (A) Searching for gene Vav3 in mouse produces two results: one in the C57BL/6J reference mouse and one in the DIL NOD mouse (1). Selecting the location view allows the user to view the ‘Region in detail’. Variation data are available under ‘Configure this page’ (2) and by subsequently selecting ‘Strain alignment’. Insertions and deletions between the two strains can be observed in the strain alignment track, with insertions relative to the reference shown as green blocks and deletions relative to the reference shown as red blocks, as detailed in the ‘Alignment Differences’ legend (3). (B) VEGA can present alignment data either graphically via the ‘Alignments (image)’ and ‘Region comparison’ sections, or as text via the ‘Alignments (text)’ section. (C) To view variations at a nucleotide level between the two strains, click on the ‘Region comparison’ panel in the left-hand menu and then choose the ‘Select regions’ menu to add the appropriate region. The sequence for the DIL NOD mouse (bottom) can be aligned and visualized against the C57BL/6J reference strain (top). This particular view shows the most 3′ intron of Vav3. An insertion relative to the reference is shown in the middle of the display by a green block. The C57BL/6J mouse clearly has an extra C nucleotide with respect to the DIL NOD mouse. Regions of identity or similarity between the two strains are shaded in green.
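
The insertion/deletion convention described for the strain alignment track can be illustrated with a small sketch. It assumes a pairwise alignment supplied as two equal-length gapped strings, which is not necessarily how VEGA stores or computes its alignments. The function name, the `-` gap character and the toy sequences are hypothetical; the only detail taken from the caption is the colour convention (insertions relative to the reference in green, deletions in red).

```python
from typing import List, Tuple

# Colour convention from the 'Alignment Differences' legend in the caption.
BLOCK_COLOUR = {"insertion": "green", "deletion": "red"}


def alignment_differences(ref_aln: str, alt_aln: str,
                          gap: str = "-") -> List[Tuple[str, int, int]]:
    """Return (kind, ref_position, length) for each indel block.

    ref_position is 0-based and counts ungapped reference bases: for a
    deletion it is the first deleted reference base, for an insertion it is
    the number of reference bases preceding the inserted sequence.
    """
    if len(ref_aln) != len(alt_aln):
        raise ValueError("aligned sequences must have equal length")

    events: List[List] = []
    current = None   # the indel block being extended, if any
    ref_pos = 0      # reference bases consumed so far

    for r, a in zip(ref_aln, alt_aln):
        if r == gap and a != gap:
            kind = "insertion"   # extra sequence in the other strain
        elif a == gap and r != gap:
            kind = "deletion"    # reference sequence missing in the other strain
        else:
            kind = None          # aligned column (match, mismatch or double gap)

        if kind is None:
            current = None
        elif current is not None and current[0] == kind:
            current[2] += 1      # extend the open block
        else:
            current = [kind, ref_pos, 1]
            events.append(current)

        if r != gap:
            ref_pos += 1

    return [(k, p, n) for k, p, n in events]


# Toy sequences, not real Vav3 data.
for kind, pos, length in alignment_differences("ACGT--ACGTA", "ACGTTTAC--A"):
    print(kind, pos, length, BLOCK_COLOUR[kind])
```

Coordinates are reported on the ungapped reference so that each block could be drawn against the reference strain, as in the track described in panel (A).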