Chun Liang, Gang Wang, Lin Liu, Guoli Ji, Yuansheng Liu, Jinqiao Chen, Jason S Webb, Greg Reese, Jeffrey F D Dean.
Abstract
Expressed sequence tags (ESTs) remain a dominant approach for characterizing the protein-encoding portions of various genomes. Due to inherent deficiencies, they also present serious challenges for data quality control. Before GenBank submission, EST sequences are typically screened and trimmed of vector and adapter/linker sequences, as well as polyA/T tails. Removal of these sequences presents an obstacle for data validation of error-prone ESTs and impedes data mining of certain functional motifs, whose detection relies on accurate annotation of positional information for polyA tails added posttranscriptionally. As raw DNA sequence information is made increasingly available from public repositories, such as NCBI Trace Archive, new tools will be necessary to reanalyze and mine this data for new information. WebTraceMiner (www.conifergdb.org/software/wtm) was designed as a public sequence processing service for raw EST traces, with a focus on detection and mining of sequence features that help characterize 3' and 5' termini of cDNA inserts, including vector fragments, adapter/linker sequences, insert-flanking restriction endonuclease recognition sites and polyA or polyT tails. WebTraceMiner complements other public EST resources and should prove to be a unique tool to facilitate data validation and mining of error-prone ESTs (e.g. discovery of new functional motifs).
Year: 2007 PMID: 17488839 PMCID: PMC1933163 DOI: 10.1093/nar/gkm299
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Example of a canonical model for cDNA library construction. In some cases, a second adapter/linker sequence (Adapter2) might be applied between the polyA/T tails and Restriction site2.
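The 3′ end of the canonical model in Figure 1 (polyA tail, then an optional Adapter2, then Restriction site2) can be illustrated with a short pattern search. The following is a minimal sketch only, not WebTraceMiner's actual algorithm: the function name, the XhoI site `CTCGAG`, the adapter `GGCC` and the 10-base minimum tail length are all assumptions chosen for illustration, and real traces would need tolerance for sequencing errors that an exact regular expression does not provide.

```python
import re

# Hypothetical library parameters: illustrative assumptions only,
# not values taken from WebTraceMiner or the paper.
RESTRICTION_SITE2 = "CTCGAG"  # XhoI recognition site, assumed for this example
ADAPTER2 = "GGCC"             # made-up optional adapter/linker sequence
MIN_TAIL = 10                 # assumed minimum length of a polyA run


def find_three_prime_terminus(read, site=RESTRICTION_SITE2,
                              adapter=ADAPTER2, min_tail=MIN_TAIL):
    """Find polyA tail -> optional Adapter2 -> Restriction site2 in a read.

    Returns a dict of 0-based, half-open (start, end) spans, or None when
    the read does not contain this arrangement. Matching is exact, so
    base-calling errors inside any feature will cause a miss.
    """
    pattern = re.compile(
        r"(?P<polyA>A{%d,})(?P<adapter2>%s)?(?P<site2>%s)"
        % (min_tail, re.escape(adapter), re.escape(site))
    )
    match = pattern.search(read.upper())
    if match is None:
        return None
    return {
        # The cDNA insert ends where the polyA tail begins.
        "insert_end": match.start("polyA"),
        "polyA": match.span("polyA"),
        # Adapter2 is optional in the model, so it may be absent.
        "adapter2": match.span("adapter2") if match.group("adapter2") else None,
        "site2": match.span("site2"),
    }


if __name__ == "__main__":
    # Toy read: 14-base insert, 12-base polyA tail, XhoI site, vector bases.
    toy_read = "GATTACAGATTACG" + "A" * 12 + "CTCGAG" + "TTGG"
    print(find_three_prime_terminus(toy_read))
```

Keeping the positions of each feature, rather than trimming them away, is what allows the polyA tail location to be used later for validation or motif mining. A read sequenced from the opposite strand would show a polyT run instead; one simple option, again an assumption, is to reverse-complement such reads before applying the same search.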
Figure 2. WebTraceMiner Web Interfaces. Panel (A): The configuration interface allows users to save customized information for how to process EST traces. For general users, the only required information is entered into the ‘Adapter and Restriction Enzyme Site’ section to define a specific canonical model for directional cDNA library construction, such as that illustrated in Figure 1. Panel (B): After processing, resultant data are available as four views: ‘Tabulated View’, ‘XML View’, ‘Fasta View’ and ‘Color-coded View’. With satisfactory processing, users can proceed with optional database integration by clicking the database icon. Panel (C): Tabulated results are available from ‘Tabulated View’. The results can be sorted by clicking relevant column heads. They can also be searched and filtered using the expandable ‘Sequence Filter’ on the top. By clicking a sequence name link, a new panel, such as that shown in Panel (D), will pop up automatically. Panel (D): An EST sequence comprising a complete cDNA insert with in silico authenticated 3′ and 5′ termini. Both the SVG graph and the color-coded sequence read can be redrawn by applying different parameters available in the ‘Sequence Color View Control Panel’.