Won-Hyoung Chung, Seong-Bae Park.
Abstract
BACKGROUND: Oligonucleotide design is known to be a time-consuming task in bioinformatics. To accelerate the design process and make it more efficient, one widely used approach is to prescreen unreliable regions using a hashing (or seeding) algorithm. Because seeding algorithms were originally proposed to increase the sensitivity of local alignment, specificity must be considered alongside sensitivity in the oligonucleotide design problem. However, no measure has yet been proposed for evaluating how adequate and efficient seeds are for oligo design. Here, we propose novel measures for evaluating seeding algorithms based on discriminability and efficiency.
Year: 2009 PMID: 19958494 PMCID: PMC2788383 DOI: 10.1186/1471-2164-10-S3-S3
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Illustration of the effect of seeding on oligo design. Oligos are selected from the target sequences using a seed. T1, T2 and T3 are the target sequences. P1 and P2 are the oligos matched to oligo P0, while S1, S2 and S3 are the hashes matched to S0 by a seed.
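The difference between a continuous seed and a spaced seed can be illustrated with a minimal sketch. It is not the authors' implementation: the seed patterns (`1111`, `11011`), the toy sequences, and the helper names `seed_key`, `build_index` and `seed_hits` are all assumptions made for illustration. A seed is written as a string of `1` (position must match) and `0` (position is ignored), and its weight is the number of `1` positions.

```python
from collections import defaultdict


def seed_key(window, seed):
    """Hash key for a window: keep only the bases at the seed's '1' positions."""
    return "".join(base for base, flag in zip(window, seed) if flag == "1")


def build_index(targets, seed):
    """Map each seed key to the (target name, position) pairs where it occurs."""
    index = defaultdict(list)
    span = len(seed)
    for name, seq in targets.items():
        for pos in range(len(seq) - span + 1):
            index[seed_key(seq[pos:pos + span], seed)].append((name, pos))
    return index


def seed_hits(oligo, index, seed):
    """Collect every indexed position whose key matches some window of the oligo."""
    span = len(seed)
    hits = set()
    for pos in range(len(oligo) - span + 1):
        hits.update(index.get(seed_key(oligo[pos:pos + span], seed), []))
    return hits


# Toy data (hypothetical): T2 differs from the oligo at one middle base.
targets = {"T1": "TTACGTATT", "T2": "GGACCTAGG"}
oligo = "ACGTA"

continuous = "1111"   # weight 4, span 4
spaced = "11011"      # weight 4, span 5, middle position ignored

cont_hits = seed_hits(oligo, build_index(targets, continuous), continuous)
spaced_hits = seed_hits(oligo, build_index(targets, spaced), spaced)

print(sorted(cont_hits))    # hits found with the continuous seed
print(sorted(spaced_hits))  # hits found with the spaced seed
```

In this toy case both seeds have the same weight, but only the spaced seed reports the near-match in T2, because the mismatching base falls on an ignored position. Whether such extra hits help or hurt in oligo design is exactly the trade-off the proposed measures are meant to quantify.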
Figure 2. The discriminability of the five seeding algorithms.
Figure 3. The efficiency of the five seeding algorithms.
Figure 4. The efficient discriminability of the five seeding algorithms.
Evaluation results for pmoA data set
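| Weight | Efficient Discriminability (Cont) | Efficient Discriminability (Spaced) | Efficient Discriminability (Trans) | Discriminability (Cont) | Discriminability (Spaced) | Discriminability (Trans) | Efficiency (Cont) | Efficiency (Spaced) | Efficiency (Trans) |
|---|---|---|---|---|---|---|---|---|---|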
| 7 | 0.09071 | 0.1027 | 0.1025 | 0.5341 | 0.5826 | 0.5848 | 0.06188 | 0.07246 | 0.07246 |
| 8 | 0.1067 | 0.1184 | 0.1167 | 0.6011 | 0.6443 | 0.6382 | 0.07627 | 0.08734 | 0.08568 |
| 9 | 0.122 | 0.1318 | 0.1295 | 0.659 | 0.6806 | 0.6728 | 0.09095 | 0.0999 | 0.09755 |
| 10 | 0.1335 | 0.1437 | 0.1439 | 0.6949 | 0.7161 | 0.7189 | 0.1023 | 0.112 | 0.1124 |
| 11 | 0.1447 | 0.1517 | 0.1532 | 0.7245 | 0.7317 | 0.7378 | 0.1135 | 0.1196 | 0.1214 |
| 12 | 0.1538 | 0.1611 | 0.1561 | 0.7447 | 0.756 | 0.738 | 0.1245 | 0.1295 | 0.1244 |
| 13 | 0.1638 | 0.1788 | 0.1752 | 0.7657 | 0.7997 | 0.7893 | 0.135 | 0.1503 | 0.146 |
| 14 | 0.174 | 0.1845 | 0.1875 | 0.7839 | 0.8129 | 0.8186 | 0.146 | 0.1591 | 0.1606 |
| 15 | 0.1597 | 0.2016 | 0.2016 | 0.7323 | 0.8374 | 0.8343 | 0.1496 | 0.1797 | 0.1791 |
| 16 | 0.1633 | 0.2043 | 0.2045 | 0.7356 | 0.8383 | 0.8392 | 0.1584 | 0.1887 | 0.1879 |
| 17 | 0.1679 | 0.2187 | 0.2161 | 0.7412 | 0.8697 | 0.8605 | 0.1676 | 0.2046 | 0.1998 |
| 18 | 0.1561 | 0.2259 | 0.229 | 0.6971 | 0.878 | 0.8857 | 0.1713 | 0.2144 | 0.2125 |
| 19 | 0.1562 | 0.2323 | 0.2269 | 0.6895 | 0.8697 | 0.8546 | 0.1794 | 0.2285 | 0.221 |
| 20 | 0.1622 | 0.2134 | 0.2148 | 0.6977 | 0.796 | 0.8044 | 0.1892 | 0.2349 | 0.2308 |
| 21 | 0.1575 | 0.2249 | 0.2223 | 0.6741 | 0.8119 | 0.8099 | 0.1955 | 0.2494 | 0.2444 |
| 22 | 0.1411 | 0.2085 | 0.208 | 0.6153 | 0.7535 | 0.7527 | 0.1976 | 0.2514 | 0.2486 |
| 23 | 0.1414 | 0.1998 | 0.2004 | 0.6087 | 0.7056 | 0.7085 | 0.2071 | 0.259 | 0.2616 |
| 24 | 0.1421 | 0.2209 | 0.2168 | 0.6028 | 0.7285 | 0.7119 | 0.2163 | 0.2855 | 0.2936 |
| 25 | 0.1318 | 0.2313 | 0.2216 | 0.5627 | 0.7386 | 0.7069 | 0.2188 | 0.3029 | 0.2995 |
Cont indicates the continuous seed type, Spaced indicates the spaced seed type, and Trans indicates the transition-constrained seed type.
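As a reading aid, the weight at which each seed type reaches its highest efficient discriminability in the pmoA table can be picked out with a short script. The values are copied from the table above; the assumption that a larger value is better, and the helper name `best_weight`, are not taken from the source.

```python
# Efficient-discriminability columns copied from the pmoA table (weights 7 to 25).
weights = list(range(7, 26))
efficient_discriminability = {
    "Cont": [0.09071, 0.1067, 0.122, 0.1335, 0.1447, 0.1538, 0.1638, 0.174,
             0.1597, 0.1633, 0.1679, 0.1561, 0.1562, 0.1622, 0.1575, 0.1411,
             0.1414, 0.1421, 0.1318],
    "Spaced": [0.1027, 0.1184, 0.1318, 0.1437, 0.1517, 0.1611, 0.1788, 0.1845,
               0.2016, 0.2043, 0.2187, 0.2259, 0.2323, 0.2134, 0.2249, 0.2085,
               0.1998, 0.2209, 0.2313],
    "Trans": [0.1025, 0.1167, 0.1295, 0.1439, 0.1532, 0.1561, 0.1752, 0.1875,
              0.2016, 0.2045, 0.2161, 0.229, 0.2269, 0.2148, 0.2223, 0.208,
              0.2004, 0.2168, 0.2216],
}


def best_weight(values):
    """Return (weight, value) for the largest value in one column.

    Assumes that a larger efficient discriminability is better.
    """
    value, weight = max(zip(values, weights))
    return weight, value


best = {seed: best_weight(col) for seed, col in efficient_discriminability.items()}
for seed, (weight, value) in best.items():
    print(f"{seed}: weight {weight}, efficient discriminability {value}")
```

For this data set the continuous seed peaks at weight 14 (0.174), the spaced seed at weight 19 (0.2323), and the transition-constrained seed at weight 18 (0.229).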
Evaluation results for nirS data set
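| Weight | Efficient Discriminability (Cont) | Efficient Discriminability (Spaced) | Efficient Discriminability (Trans) | Discriminability (Cont) | Discriminability (Spaced) | Discriminability (Trans) | Efficiency (Cont) | Efficiency (Spaced) | Efficiency (Trans) |
|---|---|---|---|---|---|---|---|---|---|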
| 7 | 0.0493 | 0.05717 | 0.05845 | 0.2952 | 0.3239 | 0.3327 | 0.02892 | 0.03411 | 0.03505 |
| 8 | 0.07998 | 0.08818 | 0.08992 | 0.4637 | 0.4877 | 0.499 | 0.05206 | 0.05835 | 0.05991 |
| 9 | 0.1073 | 0.1186 | 0.1191 | 0.6056 | 0.6374 | 0.6399 | 0.07727 | 0.08782 | 0.08781 |
| 10 | 0.1263 | 0.1425 | 0.1415 | 0.6991 | 0.7474 | 0.7443 | 0.09885 | 0.1155 | 0.1147 |
| 11 | 0.1406 | 0.1506 | 0.1528 | 0.7632 | 0.7766 | 0.7884 | 0.1175 | 0.1275 | 0.1315 |
| 12 | 0.1425 | 0.1558 | 0.1538 | 0.7793 | 0.8008 | 0.7941 | 0.1329 | 0.1379 | 0.1364 |
| 13 | 0.1438 | 0.1657 | 0.1657 | 0.7866 | 0.8397 | 0.8396 | 0.1449 | 0.1607 | 0.1597 |
| 14 | 0.1429 | 0.1697 | 0.1629 | 0.7833 | 0.8517 | 0.8278 | 0.1549 | 0.1712 | 0.1691 |
| 15 | 0.1401 | 0.1627 | 0.1659 | 0.7702 | 0.8193 | 0.8306 | 0.163 | 0.1807 | 0.182 |
| 16 | 0.138 | 0.1608 | 0.1637 | 0.7581 | 0.8132 | 0.8231 | 0.1687 | 0.185 | 0.186 |
| 17 | 0.138 | 0.1631 | 0.1647 | 0.7533 | 0.8148 | 0.8216 | 0.1734 | 0.1902 | 0.1913 |
| 18 | 0.1315 | 0.1622 | 0.1643 | 0.7224 | 0.806 | 0.8131 | 0.1754 | 0.193 | 0.1932 |
| 19 | 0.1299 | 0.1634 | 0.1639 | 0.711 | 0.7965 | 0.7985 | 0.178 | 0.1991 | 0.1987 |
| 20 | 0.1293 | 0.1513 | 0.1513 | 0.7037 | 0.7414 | 0.7419 | 0.1808 | 0.2003 | 0.2006 |
| 21 | 0.129 | 0.1536 | 0.1578 | 0.6972 | 0.7428 | 0.7569 | 0.1833 | 0.2041 | 0.2048 |
| 22 | 0.1284 | 0.1487 | 0.1491 | 0.6894 | 0.7169 | 0.719 | 0.185 | 0.2054 | 0.2057 |
| 23 | 0.1295 | 0.1504 | 0.151 | 0.6883 | 0.7014 | 0.7033 | 0.1873 | 0.2133 | 0.2136 |
| 24 | 0.1274 | 0.1538 | 0.1591 | 0.6747 | 0.6959 | 0.7036 | 0.1883 | 0.2193 | 0.2253 |
| 25 | 0.128 | 0.1496 | 0.1533 | 0.6714 | 0.6716 | 0.6727 | 0.1902 | 0.2224 | 0.2268 |
Cont indicates the continuous seed type, Spaced indicates the spaced seed type, and Trans indicates the transition-constrained seed type.