Andreas Massouras, Frederik Decouttere, Korneel Hens, Bart Deplancke.
Abstract
High-throughput sequencing (HTS) is revolutionizing our ability to obtain cheap, fast and reliable sequence information. Many experimental approaches are expected to benefit from incorporating HTS into their pipelines. Consequently, software tools that facilitate this incorporation should be of great interest. In this context, we developed WebPrInSeS, a web server tool allowing automated full-length clone sequence identification and verification using HTS data. WebPrInSeS encompasses two separate software applications. The first, WebPrInSeS-C, performs automated sequence verification of user-defined open reading frame (ORF) clone libraries. The second, WebPrInSeS-E, identifies positive hits in cDNA- or ORF-based library screening experiments such as yeast one- or two-hybrid assays. Both tools perform de novo assembly using HTS data from any of the three major sequencing platforms. Thus, WebPrInSeS provides a highly integrated, cost-effective and efficient way to sequence-verify or identify clones of interest. WebPrInSeS is available at http://webprinses.epfl.ch/ and is open to all users.
Year: 2010 PMID: 20501601 PMCID: PMC2896179 DOI: 10.1093/nar/gkq431
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. WebPrInSeS-C. Screenshot of the web interface after successfully launching WebPrInSeS-C. To start assembly, WebPrInSeS-C requires a reads file (reads.fastq in the example), a fasta file containing the flanking sequences (flanking.fa in the example) and a fasta file containing the DNA reference sequences (reference.fa in the example). It outputs a tab-separated file summarizing the processing performed by Maq and PrInSeS-C (PrInSeS-C decision report), a fasta file with the assembled sequences (PrInSeS-C fasta), an html file that visualizes sequences of interest as assembled by PrInSeS-C and aligned back to the reference (PrInSeS-C html), and a file containing the results of the Maq alignment and the subsequent data processing (PrInSeS-C Maq html).
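The three input files named in Figure 1 use the standard FASTQ and FASTA text formats. As a minimal sketch (not WebPrInSeS code, which the caption does not show), they could be read in Python as follows. The function names `read_fasta` and `read_fastq` are hypothetical, and the FASTQ parser assumes the common unwrapped four-line record layout.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA lines (e.g. reference.fa, flanking.fa).

    Hypothetical helper: the record name is the first whitespace-delimited
    token after '>', and wrapped sequence lines are concatenated.
    """
    name, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        elif name is None:
            raise ValueError("sequence data before first FASTA header")
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) triples from FASTQ lines (e.g. reads.fastq).

    Assumes the four-line layout: '@name', sequence, '+', quality string.
    """
    it = (raw.rstrip("\r\n") for raw in lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq, plus, qual = next(it, None), next(it, None), next(it, None)
        if qual is None:
            raise ValueError(f"truncated FASTQ record: {header!r}")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        yield (header[1:].split() or [""])[0], seq, qual
```

Both functions accept any iterable of lines, so an open file handle works directly; for example, `dict(read_fasta(open("reference.fa")))` would map each reference name to its sequence. Generators keep memory use flat, which matters for HTS read files.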
Figure 2. The automated clone validation pipeline. (a) Diagram outlining the workflow. (b) The heuristic algorithm of the decision-making tool. Numbers on the branches indicate the number of clones from a sequenced Drosophila ORF collection that fall in each category (see 'A working example' section).
Figure 3. Visualization of the PrInSeS-C Maq html file and the PrInSeS-C html file. (Left) Example of a clone (CG7046-PA) that was referred for manual curation; in this case, it concerned an alternatively spliced transcript. The boxed position marks a 15-nt insertion that WebPrInSeS-C detected through assembly. (Right) Example of an accepted clone (NC2beta-PA). (a) Visualization of the PrInSeS-C Maq html file for both examples. Yellow boxes mark a drop in read depth; cyan boxes mark recovery in read depth. Red shading marks regions of low read depth, graded from zero-fold (dark red) to 25-fold (white) coverage. Dark blue boxes highlight non-synonymous mutations; green boxes highlight synonymous mutations. Functional protein domains are marked by a line above the corresponding coding sequence, labeled with the domain ID. (b) Visualization of the PrInSeS-C html file for both sequences. Red boxes indicate SNPs or indels. The case number indicates a specific assembly scenario as described in the main text.
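Figure 3 distinguishes synonymous from non-synonymous mutations and shades low-coverage regions from dark red to white. The Python sketch below illustrates both ideas under stated assumptions: substitutions are classified with the standard genetic code, dark red is taken to be RGB (139, 0, 0), and the colour ramp between 0-fold and 25-fold coverage is assumed to be linear. The names `classify_substitution` and `depth_to_rgb` are hypothetical; this is not the WebPrInSeS implementation.

```python
from typing import Tuple

# Standard genetic code (NCBI translation table 1), enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_substitution(orf: str, pos: int, alt: str) -> str:
    """Classify a single-base substitution at 0-based `pos` of an in-frame ORF.

    Returns 'synonymous' if the codon still encodes the same amino acid
    (or stop), otherwise 'non-synonymous'.
    """
    orf, alt = orf.upper(), alt.upper()
    if orf[pos] == alt:
        raise ValueError("alternative base equals reference base")
    start = pos - pos % 3  # first base of the affected codon
    ref_codon = orf[start:start + 3]
    if len(ref_codon) != 3:
        raise ValueError("position falls in an incomplete trailing codon")
    offset = pos - start
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "non-synonymous"


def depth_to_rgb(depth: float, max_depth: float = 25.0,
                 low: Tuple[int, int, int] = (139, 0, 0)) -> Tuple[int, int, int]:
    """Map read depth to a colour from `low` (depth 0) to white (depth >= max_depth).

    Assumption: linear interpolation; the caption only gives the endpoints.
    """
    frac = min(max(depth / max_depth, 0.0), 1.0)
    return tuple(round(c + (255 - c) * frac) for c in low)
```

For instance, changing the last base of an AAA (Lys) codon to G gives AAG, which still encodes lysine, so the change counts as synonymous; changing the first base to G gives GAA (Glu), a non-synonymous change.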