Ancha Baranova, Jonathan Bode, Ganiraju Manyam, Maria Emelianenko.
Abstract
BACKGROUND: The "off-target" silencing effect hinders the development of siRNA-based therapeutic and research applications. Existing solutions for finding possible locations of siRNA seats within a large database of genes are either too slow, miss a portion of the targets, or are simply not designed to handle a very large number of queries. We propose a new approach that reduces the computational time compared with existing techniques.
Year: 2011 PMID: 21619643 PMCID: PMC3117723 DOI: 10.1186/1756-0500-4-168
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Figure 1. Illustration of a duplicate string concept. This example explains how one finds duplicate strings in a transcriptome consisting of 3 genes containing duplicate strings of length N = 5.
Figure 2. Algorithm 1 flowchart. Algorithm 1 performs suffix tree-based calculation of "siRNA seats" of length N with threshold n.
Figure 3. Generation of the storage of all substrings. This example illustrates the steps of Algorithm 1 for the input consisting of 3 genes AGAGAGGC, TCAATCCC and AATAAATC. All of the corresponding n-strings are identified with the number of occurrences stored in the leaf. The list of unique n-strings is provided, and the "siRNA seats" resulting from this computation are specified.
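The counting step described in the Figure 3 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses a nested-dictionary trie in place of the paper's suffix tree, the function names `build_trie` and `collect` are hypothetical, and the value n = 3 is an assumption chosen only to keep the example small. It also assumes that repeats within a single gene count toward the number of occurrences. The derivation of "siRNA seats" from the unique n-strings is not reproduced here.

```python
# Genes taken from the Figure 3 caption.
GENES = ["AGAGAGGC", "TCAATCCC", "AATAAATC"]


def build_trie(genes, n):
    """Insert every n-string of every gene into a trie.

    Inner nodes are dicts keyed by base; the last level maps the final
    base of each n-string to its occurrence count (the "leaf" value).
    """
    root = {}
    for gene in genes:
        for i in range(len(gene) - n + 1):
            node = root
            # Walk or create the path for the first n - 1 bases.
            for base in gene[i:i + n - 1]:
                node = node.setdefault(base, {})
            # Increment the counter stored at the leaf.
            last = gene[i + n - 1]
            node[last] = node.get(last, 0) + 1
    return root


def collect(node, n, prefix=""):
    """Yield (n-string, count) pairs by walking the trie depth-first."""
    for base, child in sorted(node.items()):
        if len(prefix) + 1 == n:
            yield prefix + base, child
        else:
            yield from collect(child, n, prefix + base)


n = 3  # illustrative value, not taken from the paper
counts = dict(collect(build_trie(GENES, n), n))
unique = [s for s, c in counts.items() if c == 1]
print(unique)
```

Strings with a count of 1 are the unique n-strings; everything else is a duplicate in the sense of Figure 1. A dictionary-based trie allocates only the branches that actually occur, which is the same sparsity effect the memory table below quantifies.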
The size and fill-in of the tree needed to store the initial dataset and the memory consumption
| N | Branches in Full Tree | Branches Used | % Branches Used | Full Tree Memory | Actual Tree Memory | Total Memory |
|---|---|---|---|---|---|---|
| | 349525 | 349519 | 99.998% | 8 MB | 8 MB | 708 MB |
| | 1398101 | 1395271 | 99.798% | 32 MB | 31.94 MB | 731.94 MB |
| | 5592405 | 5366925 | 95.968% | 128 MB | 122.84 MB | 822.84 MB |
| | 22369621 | 17849905 | 79.795% | 512 MB | 408.55 MB | 1.08 GB |
| | 89478485 | 45717780 | 51.094% | 2 GB | 1.02 GB | 1.70 GB |
| | 357913941 | 88264307 | 24.661% | 8 GB | 1.97 GB | 2.65 GB |
| | 1431655765 | 138965433 | 9.707% | 32 GB | 3.11 GB | 3.79 GB |
| | 5726623061 | 192967338 | 3.370% | 128 GB | 4.31 GB | 5.00 GB |
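The figures in the table above can be checked with a short calculation. The sketch below rests on two observations rather than on anything the record states: the "Branches in Full Tree" values coincide with the node count of a complete 4-ary tree, 1 + 4 + ... + 4^d, for depths d = 9 to 16, and the memory columns are consistent with roughly 24 bytes per branch. Whether these depths correspond to the table's N column is an assumption, and the names `full_tree_branches`, `tree_memory_mb` and `BYTES_PER_BRANCH` are hypothetical.

```python
# Assumption: inferred from the table (8 MB for 349525 branches).
BYTES_PER_BRANCH = 24


def full_tree_branches(depth):
    """Node count of a complete 4-ary tree with levels 0..depth."""
    # Closed form of the geometric sum 1 + 4 + ... + 4**depth.
    return (4 ** (depth + 1) - 1) // 3


def tree_memory_mb(branches, bytes_per_branch=BYTES_PER_BRANCH):
    """Memory in MB (2**20 bytes) for a given number of branches."""
    return branches * bytes_per_branch / 2 ** 20


for depth in range(9, 17):
    full = full_tree_branches(depth)
    print(depth, full, round(tree_memory_mb(full), 2))
```

Applying `tree_memory_mb` to the "Branches Used" column reproduces the "Actual Tree Memory" values to two decimals for the rows checked, and the "Total Memory" column then appears to add a fixed overhead of about 700 MB.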
Figure 4. Algorithm efficiency comparison. Time taken to retrieve all unique strings of length n for the new algorithm (crosses) and the previously suggested CRM algorithm (circles) on a logarithmic scale.