Barry Moore, Guozhen Fan, Karen Eilbeck.
Abstract
The advent of cheaper, faster sequencing technologies has pushed the task of sequence annotation from the exclusive domain of large-scale multi-national sequencing projects to that of research laboratories and small consortia. The bioinformatics burden placed on these laboratories, some with very little programming experience, can be daunting. Fortunately, there exist software libraries and pipelines designed with these groups in mind, to ease the transition from an assembled genome to an annotated and accessible genome resource. We have developed the Sequence Ontology Bioinformatics Analysis (SOBA) tool to provide a simple statistical and graphical summary of an annotated genome. We envisage its use during annotation jamborees and genome comparison, and by developers for rapid feedback during annotation software development and testing. SOBA also provides annotation consistency feedback to ensure correct use of terminology within annotations, and guides users to add new terms to the Sequence Ontology when required. SOBA is available at http://www.sequenceontology.org/cgi-bin/soba.cgi.
Year: 2010 PMID: 20494974 PMCID: PMC2896117 DOI: 10.1093/nar/gkq426
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The input and output of the SOBA tool. (A) Small portion of a GFF3 file, including the column headings. (B–F) Screenshots of the output of SOBA. (B) The primary counts for each feature type per data source. (C) The simple statistics for the lengths of each feature including the mean, median and footprint of the feature on the genome. (D) A high-level view of all of the SO terms used in the genome annotation and the transitive is_a relations back to the root node. A large format version of this panel is available at http://sequenceontology.org/resources/images/Figure1D.gif. (E) The distribution of intron density of protein coding genes (number of coding introns/length of polypeptide sequence). (F) An example of a sequence feature length distribution showing the distribution of lengths of annotated exons.
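The kind of summary shown in panels B and C can be illustrated with a minimal Python sketch. This is not SOBA's implementation: the function names `summarize_gff3` and `footprint` are hypothetical, the input is assumed to be nine-column tab-delimited GFF3 with 1-based inclusive coordinates, and "footprint" is assumed here to mean the number of bases covered by a feature type once overlapping features are merged.

```python
from collections import Counter, defaultdict
from statistics import mean, median


def footprint(intervals):
    """Bases covered by (seqid, start, end) intervals, counting overlaps once.

    Assumption: this merged-coverage definition is an interpretation of
    'footprint', not a definition taken from the SOBA record above.
    """
    total = 0
    cur_seq = cur_start = cur_end = None
    for seqid, start, end in sorted(intervals):
        if seqid != cur_seq or start > cur_end:
            # Close the previous merged block before starting a new one.
            if cur_seq is not None:
                total += cur_end - cur_start + 1
            cur_seq, cur_start, cur_end = seqid, start, end
        else:
            cur_end = max(cur_end, end)
    if cur_seq is not None:
        total += cur_end - cur_start + 1
    return total


def summarize_gff3(lines):
    """Return (counts per (source, type), length statistics per type)."""
    counts = Counter()
    by_type = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed rows in this sketch
        seqid, source, ftype = cols[0], cols[1], cols[2]
        start, end = int(cols[3]), int(cols[4])
        counts[(source, ftype)] += 1
        by_type[ftype].append((seqid, start, end))

    stats = {}
    for ftype, intervals in by_type.items():
        # GFF3 coordinates are 1-based and inclusive, hence the +1.
        lengths = [end - start + 1 for _, start, end in intervals]
        stats[ftype] = {
            "count": len(lengths),
            "mean": mean(lengths),
            "median": median(lengths),
            "footprint": footprint(intervals),
        }
    return counts, stats
```

Passing an open file handle, for example `summarize_gff3(open("annotation.gff3"))` with a hypothetical file name, yields the per-source counts and the per-type length statistics in one pass.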