Abstract
Work on a large number of biological problems benefits tremendously from having an easy way to access the annotation of DNA sequence features, such as intron/exon structure, the contents of promoter regions and the location of other genes in upstream and downstream regions. For example, taking the placement of introns within a gene into account can help in a phylogenetic analysis of homologous genes. Designing experiments for investigating UTR regions using PCR or DNA microarrays requires knowledge of known elements in UTR regions and the positions and strandedness of other genes nearby on the chromosome. A wealth of such information is already known and documented in databases such as GenBank and the NCBI Human Genome builds. However, it usually requires significant bioinformatics skills and intimate knowledge of the data format to access this information. Presented here is a highly flexible and easy-to-use tool for extracting feature annotation from GenBank entries. The tool is also useful for extracting datasets corresponding to a particular feature (e.g. promoters). Most importantly, the output data format is highly consistent, easy to handle for the user and easy to parse computationally. The FeatureExtract web server is freely available for both academic and commercial use at http://www.cbs.dtu.dk/services/FeatureExtract/.
Year: 2005 PMID: 15980537 PMCID: PMC1160149 DOI: 10.1093/nar/gki388
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Example output—overall file structure
| Line number | Name | Sequence | Annotation | Comment |
|---|---|---|---|---|
| 1 | ||||
| 2 | ||||
| 3 | ||||
| 4 | ||||
| 5 | ||||
| 6 | ||||
| 7 | ||||
| 8 | ||||
| 9 | ||||
| 10 | ||||
| 11 | ||||
Each line represents an individual feature and contains four tab-separated fields (Name, Sequence, Annotation and Comment). In this example the features extracted are protein coding genes (CDS) from the following GenBank entries: AB001981, X01831, J00923, J00043, J00044, X01086, X07053, AF098919. For readability, each field has been truncated after 20 characters.
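Because every line holds one feature in four tab-separated fields, such a file can be read with a few lines of code. The following Python sketch is an illustration only, not part of FeatureExtract: the names `FeatureRecord` and `parse_feature_lines` are hypothetical, and the sample lines are made-up placeholders rather than real server output. It assumes that the file has no header line and that fields do not themselves contain tab characters.

```python
from typing import Iterable, Iterator, NamedTuple


class FeatureRecord(NamedTuple):
    """One extracted feature; field order follows the description above."""
    name: str
    sequence: str
    annotation: str
    comment: str


def parse_feature_lines(lines: Iterable[str]) -> Iterator[FeatureRecord]:
    """Yield one FeatureRecord per non-empty line of tab-separated text."""
    for line_number, line in enumerate(lines, start=1):
        # Strip only the line terminator so that an empty trailing field
        # (e.g. an empty comment) is preserved.
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 4:
            raise ValueError(
                f"line {line_number}: expected 4 tab-separated fields, "
                f"got {len(fields)}"
            )
        yield FeatureRecord(*fields)


# Made-up placeholder lines (NOT real FeatureExtract output).
sample = [
    "placeholder_1\tATGAAATAA\tplaceholder-annotation\tplaceholder comment\n",
    "placeholder_2\tATGCCCTGA\tplaceholder-annotation\t\n",
]
records = list(parse_feature_lines(sample))
```

In practice the `sample` list would be replaced by an open file handle, for example `parse_feature_lines(open(path))`, where `path` points to a saved output file. Raising an error on lines without exactly four fields is a design choice of this sketch; it makes truncated or malformed files visible early instead of silently skipping them.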
Example output—field details
Detailed example of data extracted from the GenBank entry AB001981 (first CDS).