Oleksiy Kohany, Andrew J Gentles, Lukasz Hankus, Jerzy Jurka.
Abstract
BACKGROUND: Repbase is a reference database of eukaryotic repetitive DNA, which includes prototypic sequences of repeats and basic information described in annotations. Updating and maintaining the database requires specialized tools, which we have created and made available for use with Repbase, and which may be useful as a template for other curated databases.
Year: 2006 PMID: 17064419 PMCID: PMC1634758 DOI: 10.1186/1471-2105-7-474
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Data entry pages in RepbaseSubmitter.
| Select | Initialization page |
| Summary | Specification of entry Accession, Keywords, Definition, Comments |
| Sequence | Entry of sequence, calculation of DNA content and lengths |
| Organism | Source organism/taxonomy; classification based on current Repbase structure |
| Protein | Specification of coding regions: prediction of ORFs, annotation on DNA sequence, comments describing protein features/functions |
| References | Relevant references to primary literature or databases (Repbase or external such as Genbank, EMBL) |
| Release | Repbase release, relevant database accessions; consensus references |
| Submission | Display of final version prior to submission, perform final checks, submit to relational database for review |
The seven main forms presented by RepbaseSubmitter are listed with their titles and a summary of the information that can be entered on each.
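The Sequence page is described as calculating DNA content and lengths. As a rough illustration of that kind of calculation, and not the RepbaseSubmitter implementation (which is written in Java and is not shown in this record), a minimal Python sketch follows. The function name `sequence_summary` and the returned field names are hypothetical.

```python
from collections import Counter


def sequence_summary(seq: str) -> dict:
    """Return the length and base composition of a DNA sequence.

    Hypothetical helper for illustration only; whitespace is ignored
    and the sequence is treated case-insensitively.
    """
    cleaned = "".join(seq.split()).upper()
    counts = Counter(cleaned)
    length = len(cleaned)
    acgt = {base: counts.get(base, 0) for base in "ACGT"}
    return {
        "length": length,
        "counts": acgt,
        # Anything that is not A, C, G or T (for example N) is counted as "other".
        "other": length - sum(acgt.values()),
        # Fraction of G and C over the full length; 0.0 for an empty sequence.
        "gc_fraction": (acgt["G"] + acgt["C"]) / length if length else 0.0,
    }


if __name__ == "__main__":
    print(sequence_summary("acgt\nGGCC"))
```

Reporting non-ACGT characters separately is a design choice of this sketch, so that ambiguous bases are visible rather than silently dropped.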
Current Repbase schema for transposable element classification.
| Mariner, hAT, MuDR, EnSpm, piggyBac, P, Merlin, Harbinger, Transib, Novosib, Mirage, Helitron, Polinton, Rehavkus | |
| Gypsy, Copia, DIRS, BEL | |
| ERV1, ERV2, ERV3 | |
| LINE1 (L1), RTE-1, CRE, CR1 (LINE3), I, Jockey, NeSL, R2, R4, Rex1, RandI, Penelope | |
| Satellites (SAT, MSAT) |
Repbase currently recognizes over 40 superfamilies of transposable/repetitive elements. The major classes and superfamilies are listed here. The underlying relational database structure of Repbase allows the classification scheme to be extended and modified easily, following currently accepted conventions.
Figure 1. Protein annotation entry form of RepbaseSubmitter. The protein prediction sub-window is also shown, illustrating how ORFs can be predicted and merged into a predicted protein for annotation on the nucleotide sequence. The bottom of the main window shows access buttons for each entry page of the program. RepbaseSubmitter is written in Java and can run on any system with an installed Java Virtual Machine of version 1.5 or above.
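The caption above mentions ORF prediction without describing the method. The following is a generic, simplified sketch of one common approach (scanning the three forward reading frames for an ATG followed by an in-frame stop codon); it is an assumption for illustration, not the algorithm used by RepbaseSubmitter. The name `find_orfs` and the `min_codons` parameter are hypothetical, and the reverse strand and the merging of ORFs are deliberately left out.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(seq: str, min_codons: int = 2) -> list:
    """Return (start, end) half-open coordinates of forward-strand ORFs.

    An ORF here runs from the first ATG in a frame to the next in-frame
    stop codon (stop included). min_codons counts codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close it and keep scanning this frame
    return sorted(orfs)


if __name__ == "__main__":
    dna = "CCATGAAATTTTAGCC"
    for start, end in find_orfs(dna):
        print(start, end, dna[start:end])
```

Coordinates are 0-based and half-open so that `seq[start:end]` returns the ORF including its stop codon.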
Figure 2. Example of a repeat map and its graphical representation. The Name columns contain locus names of submitted query sequences (first column) and library sequences (fourth column). Repbase names are hyperlinked to their sequences in web-based Censor. From/To give the beginning and end positions of each reported fragment on its corresponding sequence. Dir indicates the orientation of the repeat fragment ('d' for direct, 'c' for complementary). Sim is the similarity between the two aligned fragments, calculated as described in the text. Pos is roughly the ratio of positive matches (bases that produce positive scores in the alignment matrix) to alignment length. This ratio is calculated in the same way as similarity (see main text), with positive_count in place of match_count. This information is particularly useful for estimating the quality of protein alignments. Score is the alignment score obtained from BLAST.
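To make the relationship between Sim and Pos concrete, the sketch below counts matches and positive-scoring pairs over a gapped pairwise alignment and divides each by the alignment length. This is only the rough ratio stated in the caption: the exact similarity formula is given in the main text, which is not reproduced here, so the plain alignment-length denominator is an assumption. The function `alignment_ratios`, the toy scoring function `toy_score` and its set of conservative pairs are all hypothetical; a real analysis would use the scoring matrix of the alignment program.

```python
# Toy set of conservative substitutions (hypothetical, for illustration only).
CONSERVATIVE_PAIRS = {frozenset("IL"), frozenset("KR")}


def toy_score(a: str, b: str) -> int:
    """Return a positive score for identical or conservative pairs, else negative."""
    if a == b or frozenset((a, b)) in CONSERVATIVE_PAIRS:
        return 1
    return -1


def alignment_ratios(query: str, subject: str, score=toy_score) -> tuple:
    """Return (match_ratio, positive_ratio) for two aligned, gapped strings.

    Both ratios use the full alignment length as the denominator, which is
    an assumption of this sketch. Gap columns ('-') count toward the length
    but never toward matches or positives.
    """
    if len(query) != len(subject) or not query:
        raise ValueError("aligned strings must be non-empty and of equal length")
    match_count = 0
    positive_count = 0
    for a, b in zip(query, subject):
        if a == "-" or b == "-":
            continue
        if a == b:
            match_count += 1
        if score(a, b) > 0:
            positive_count += 1
    length = len(query)
    return match_count / length, positive_count / length


if __name__ == "__main__":
    print(alignment_ratios("MKIL-A", "MRLLGA"))
```

Because every identical pair also scores positively under this toy function, the positive ratio is never lower than the match ratio, which is why Pos is the more informative column for protein alignments.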