Hamza Khan, Hamid Mohamadi, Benjamin P Vandervalk, Rene L Warren, Justin Chu, Inanc Birol.
Abstract
Motivation: Sequencing studies on non-model organisms often interrogate both genomes and transcriptomes with massive amounts of short sequences. Such studies require de novo analysis tools and techniques when the species and its close relatives lack high-quality reference resources. For certain applications, such as de novo annotation, information on putative exons and alternative splicing may be desirable.
Year: 2018 PMID: 29300846 PMCID: PMC5946899 DOI: 10.1093/bioinformatics/btx839
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. ChopStitch workflow: After constructing the genomic Bloom filter, ChopStitch interrogates transcript sequences to find putative exons. It then finds exons with overlapping edges and constructs a splice graph in DOT format. Graphviz ccomps is used to find sub-graphs. ChopStitch also detects putative exons shorter than the k-mer size, as illustrated in the figure, where the stretch of absent k-mers is longer than k-1. The 3-sided arrows show the scrutiny process towards the beginning and end of the absent k-mer stretch.
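The k-mer scan described in the Fig. 1 caption can be illustrated with a minimal Python sketch. It is not ChopStitch's implementation: a plain Python set stands in for the genomic Bloom filter (so there are no false positives), all function names and the labels `junction` and `candidate_short_exon` are hypothetical, and the "scrutiny process" at the ends of an absent stretch is not modelled. The only idea taken from the caption is that a transcript k-mer spanning an exon-exon junction is absent from the genome, and that an absent stretch longer than k-1 may hide an exon shorter than k.

```python
def genome_kmer_set(genome, k):
    """Collect all genomic k-mers in a set (stand-in for the Bloom filter)."""
    return {genome[i:i + k] for i in range(len(genome) - k + 1)}


def kmer_presence(transcript, genome_kmers, k):
    """Return one boolean per transcript k-mer: True if it occurs in the genome set."""
    return [transcript[i:i + k] in genome_kmers
            for i in range(len(transcript) - k + 1)]


def absent_runs(presence):
    """Return (start, length) for every maximal run of absent k-mers."""
    runs = []
    start = None
    for i, present in enumerate(presence):
        if not present and start is None:
            start = i                      # a run of absent k-mers begins
        elif present and start is not None:
            runs.append((start, i - start))
            start = None                   # the run ended at i - 1
    if start is not None:                  # run reaches the end of the transcript
        runs.append((start, len(presence) - start))
    return runs


def classify_run(length, k):
    """Hypothetical labels: exactly k-1 absent k-mers fits a plain junction;
    more than k-1 is only flagged for closer inspection."""
    if length == k - 1:
        return "junction"
    if length > k - 1:
        return "candidate_short_exon"
    return "other"


# Toy example: two exons separated by an intron in the genome.
k = 4
genome = "A" * 8 + "G" * 8 + "C" * 8
transcript = "A" * 8 + "C" * 8
presence = kmer_presence(transcript, genome_kmer_set(genome, k), k)
for start, length in absent_runs(presence):
    print(start, length, classify_run(length, k))   # prints: 5 3 junction
```

In this toy case the junction lies at transcript position `start + k - 1`, that is, position 8. With a real Bloom filter, false positives can shorten or split an absent run, which is one reason a sketch like this would need further checks before calling an exon boundary.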
Fig. 2. Constructing the splice graph DOT file: In the figure, T and E denote the transcript and exon IDs belonging to different protein-coding transcript isoforms. k-mers are shown as colored lines towards the edge of the putative exon. Putative exons that share the same k-mer spectra are shown in the same color and are represented as a single node in the DOT file, with their headers concatenated with the string ‘_OR_’. (Color version of this figure is available at Bioinformatics online.)
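The node-merging rule in the Fig. 2 caption can be sketched as follows. This is an illustrative assumption-laden example, not ChopStitch's code: identical exon sequence is used as a stand-in for "same k-mer spectra", the function name `build_splice_graph_dot`, the input layout, the graph name, and the choice of a directed graph are all hypothetical. Only the `_OR_` header concatenation and the DOT output format come from the caption.

```python
def build_splice_graph_dot(transcripts):
    """Build a DOT string from {transcript_id: [(exon_header, exon_sequence), ...]}.

    Exons with identical sequence (assumed proxy for identical k-mer spectra)
    collapse into one node whose name joins their headers with '_OR_'.
    """
    # Group exon headers by sequence, keeping first-seen order.
    headers_by_seq = {}
    for exons in transcripts.values():
        for header, seq in exons:
            headers = headers_by_seq.setdefault(seq, [])
            if header not in headers:
                headers.append(header)

    node_name = {seq: "_OR_".join(headers)
                 for seq, headers in headers_by_seq.items()}

    # Connect consecutive exons within each transcript, skipping duplicate edges.
    edges = []
    for exons in transcripts.values():
        for (_, left), (_, right) in zip(exons, exons[1:]):
            edge = (node_name[left], node_name[right])
            if edge not in edges:
                edges.append(edge)

    lines = ["digraph splicegraph {"]
    lines += [f'    "{name}";' for name in node_name.values()]
    lines += [f'    "{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)


# Two isoforms: T2 skips the middle exon of T1.
toy_transcripts = {
    "T1": [("T1_E1", "AAAA"), ("T1_E2", "CCCC"), ("T1_E3", "GGGG")],
    "T2": [("T2_E1", "AAAA"), ("T2_E2", "GGGG")],
}
dot_text = build_splice_graph_dot(toy_transcripts)
print(dot_text)
```

Writing `dot_text` to a file yields input that Graphviz tools such as `ccomps` can read to split the graph into connected components.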
Dataset specification
| Organism | Library strategy | Accession ID | Read length | # Reads |
|---|---|---|---|---|
| | WGSS | ERR309932 | 250 bp | 228 489 950 000 |
| | WGSS | DRR008444 | 110 bp | 68 621 900 |
| | RNA-Seq | ERR356374 | 50 bp | 43 166 926 |
| | RNA-Seq | SRR2537190 | 50 bp | 23 983 224 |
Fig. 3. Performance of ChopStitch compared to state-of-the-art exon prediction software tools (non-redundant analysis). (A) For each tool, the number of BLAST hits returned against non-redundant Ensembl exons at a query coverage and sequence identity of 95% and a length difference of 5 base pairs between the query and the subject, (B) precision comparison, (C) recall comparison, (D) comparison of memory consumption in MB between tools, (E) comparison of run time in minutes between tools. (ChopStitch.PP denotes the results obtained after running the post-processing step in ChopStitch. ChopStitch.All denotes the results obtained after running ChopStitch with the –allexons option. LEMONS-Merged shows results from the LEMONS merged.xls file. LEMONS-SDB shows LEMONS results when using the same-species reference database.)
Fig. 4. ChopStitch results for C. elegans at different pbf and sbf FPR values. Performance deteriorates with a small increase in compared to a much larger increase in
Evaluation results for ChopStitch splice graphs
| Organism | Evaluation approach | Precision |
|---|---|---|
| | | 0.97 |
| | | 0.95 |
| | Reference-based | 0.94 |
| | Reference-based | 0.96 |
| | Annotation-based | 0.90 |
| | Annotation-based | 0.98 |