Ameera M Almasoud, Hend S Al-Khalifa, Abdulmalik S Al-Salman.
Abstract
In biology, researchers compare genes or gene products using semantic similarity measures (SSMs). Continuous growth and diversity in data characteristics constitute what is called big data, which current biological SSMs cannot handle; these measures therefore need a way to manage data volume. We used parallel and distributed processing: the data are split into multiple partitions and an SSM is applied to each partition, which helps address the scalability and computational problems of big data. Our solution involves three steps: splitting the gene ontology (GO), data clustering, and semantic similarity calculation. To test this method, split GO and data clustering algorithms were defined and assessed for performance in the first two steps. Three of the best SSMs in biology [Resnik, Shortest Semantic Differentiation Distance (SSDD), and SORA] were enhanced with threaded parallel processing, which is used in the third step. Our results demonstrate that introducing threads reduced the time needed to calculate semantic similarity between gene pairs and improved the performance of all three SSMs. Average time was reduced by 24.51% for Resnik, 22.93% for SSDD, and 33.68% for SORA. Total time was reduced by 8.88% for Resnik, 23.14% for SSDD, and 39.27% for SORA. Using these threaded measures in the distributed system, with the split GO and data clustering algorithms dividing input data by similarity, reduced the average time more than dividing the input data equally did. The time reduction increased with the number of splits: 24.1%, 39.2%, and 66.6% for Threaded SSDD and 33.0%, 78.2%, and 93.1% for Threaded SORA with 2, 3, and 4 slaves, respectively, and 92.04% for Threaded Resnik with four slaves.
Year: 2019 PMID: 30809545 PMCID: PMC6369486 DOI: 10.1155/2019/6750296
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
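
The third step described in the abstract, scoring many gene pairs with a threaded SSM, can be illustrated with a minimal Python sketch. It assumes a toy ontology and made-up term probabilities (`PARENTS`, `PROB`, and all term names are hypothetical), and it uses the standard definition of Resnik similarity as the information content of the most informative common ancestor. It is not the authors' implementation, whose language and threading model are not given here.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Toy ontology (hypothetical): term -> set of direct parent terms.
PARENTS = {
    "root": set(),
    "A": {"root"},
    "B": {"root"},
    "A1": {"A"},
    "A2": {"A"},
    "AB": {"A", "B"},
}

# Hypothetical annotation probabilities p(t); the root has p = 1.
PROB = {"root": 1.0, "A": 0.5, "B": 0.4, "A1": 0.2, "A2": 0.1, "AB": 0.05}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC(t) = -log p(t); rarer terms carry more information."""
    return -math.log(PROB[term])


def resnik(pair):
    """Resnik similarity: IC of the most informative common ancestor."""
    t1, t2 = pair
    common = ancestors(t1) & ancestors(t2)
    return max(information_content(t) for t in common)


def threaded_similarity(pairs, workers=4):
    """Score every pair on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(resnik, pairs))
```

For example, `threaded_similarity([("A1", "A2"), ("A1", "B")])` scores both pairs on the pool and returns the results in input order. In CPython the global interpreter lock limits the speedup threads can give to CPU-bound work, so the sketch shows the structure of the approach rather than its timing behaviour.
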
Figure 1. Section of GO graph showing biological process (BP), molecular function (MF), and cellular components (CC) and some of their descendants [16].
Figure 2. Example of splitting gene ontology (GO) into four splits.
Figure 3. Flowchart of the GO splitting algorithm.
Figure 4. Flowchart of the data clustering algorithm.
Figure 5. Flowchart of testing the performance of enhanced SSMs.
Figure 6. Flowchart of assessing the performance of enhanced SSMs in the distributed system.
Average time and IPs obtained using original and Threaded Resnik.
| Sample Size | Original Resnik Average Time (ns) | Threaded Resnik Average Time (ns) | Improvement Percentage (IP) |
|---|---|---|---|
| 10 | 56515 | 47490.44 | -15.97 |
| 100 | 26184.95 | 22534.24 | -13.94 |
| 1000 | 27907.82 | 21201.57 | -24.03 |
| 10000 | 16287.99 | 11133.88 | -31.64 |
| 100000 | 11844.27 | 7563.32 | -36.14 |
| 1000000 | 8273.15 | 6179.37 | -25.31 |
| Average | 24502.20 | 19350.47 | -24.51 |
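
The Improvement Percentage (IP) column is consistent with the relative change of the threaded time against the original time, so negative values mean the threaded version was faster. The sketch below recomputes the column from the average times in the table above; the function name `improvement_percentage` is ours, and the formula is inferred from the tabulated numbers rather than quoted from the source.

```python
def improvement_percentage(original_ns, threaded_ns):
    """Relative change of the threaded time against the original, in percent."""
    return (threaded_ns - original_ns) / original_ns * 100.0


# (sample size, original average ns, threaded average ns) from the table above
ROWS = [
    (10, 56515, 47490.44),
    (100, 26184.95, 22534.24),
    (1000, 27907.82, 21201.57),
    (10000, 16287.99, 11133.88),
    (100000, 11844.27, 7563.32),
    (1000000, 8273.15, 6179.37),
]

ips = [improvement_percentage(orig, thr) for _, orig, thr in ROWS]
mean_ip = sum(ips) / len(ips)

for (size, _, _), ip in zip(ROWS, ips):
    print(f"{size:>8}: {ip:.2f}%")
print(f" Average: {mean_ip:.2f}%")
```

Rounded to two decimals, the per-row values match the IP column, and their mean matches the -24.51 reported in the Average row.
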
Total time and IPs obtained using original and Threaded Resnik.
| Sample Size | Original Resnik Total Time (ns) | Threaded Resnik Total Time (ns) | Improvement Percentage (IP) |
|---|---|---|---|
| 10 | 2560906085 | 1776206977 | -30.64 |
| 100 | 5350898201 | 6506353894 | 21.59 |
| 1000 | 5224382582 | 2409982084 | -53.87 |
| 10000 | 2997898214 | 2691416975 | -10.22 |
| 100000 | 9417548254 | 10159782237 | 7.88 |
| 1000000 | 46988654302 | 52629878875 | 12.00 |
| Average | 12090047940 | 12695603507 | -8.88 |
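
The Average row reports an IP of -8.88 even though the mean threaded total time is larger than the mean original total time. The numbers are consistent with the Average IP being the mean of the per-row IPs rather than the IP of the averaged times; this reading is an inference from the table, not something the source states. A short check, using only values from the table above:

```python
from statistics import mean

# Per-row improvement percentages from the table above (sample sizes 10 to 1,000,000).
ROW_IPS = [-30.64, 21.59, -53.87, -10.22, 7.88, 12.00]

# "Average" row of the same table: mean total times in nanoseconds.
AVG_ORIGINAL_NS = 12090047940
AVG_THREADED_NS = 12695603507

# Mean of the per-row percentages (what the Average row appears to report).
mean_of_row_ips = mean(ROW_IPS)

# Percentage change computed from the two averaged times instead.
ip_of_mean_times = (AVG_THREADED_NS - AVG_ORIGINAL_NS) / AVG_ORIGINAL_NS * 100.0

print(f"mean of row IPs:  {mean_of_row_ips:.2f}%")
print(f"IP of mean times: {ip_of_mean_times:.2f}%")
```

The two summaries differ in sign because the largest samples dominate the averaged times, while every row counts equally in the mean of percentages.
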
Average time and IPs obtained using original and Threaded SSDD.
| Sample Size | Original SSDD Average Time (ns) | Threaded SSDD Average Time (ns) | Improvement Percentage (IP) |
|---|---|---|---|
| 10 | 2.92E+08 | 8.65E+07 | -70.38 |
| 100 | 1.32E+08 | 1.15E+08 | -12.70 |
| 1000 | 9.32E+07 | 8.99E+07 | -3.54 |
| 10000 | 4.62E+07 | 3.51E+07 | -23.92 |
| 100000 | 4.48E+07 | 4.30E+07 | -3.99 |
| 1000000 | 2.83E+07 | 2.18E+07 | -23.01 |
| Average | 106040305 | 65214046.13 | -22.93 |
Total time and IPs obtained using original and Threaded SSDD.
| Sample Size | Original SSDD Total Time (ns) | Threaded SSDD Total Time (ns) | Improvement Percentage (IP) |
|---|---|---|---|
| 10 | 3124423720 | 1063764634 | -65.95 |
| 100 | 14597347470 | 11920695608 | -18.34 |
| 1000 | 93710848961 | 90410186100 | -3.52 |
| 10000 | 4.63634E+11 | 3.5258E+11 | -23.95 |
| 100000 | 4.48611E+12 | 4.30634E+12 | -4.01 |
| 1000000 | 2.83292E+13 | 2.17894E+13 | -23.09 |
| Average | 5.56506E+12 | 4.42528E+12 | -23.14 |
Average time and IPs obtained using original and Threaded SORA.
| Sample Size | Original SORA Average Time (ns) | Threaded SORA Average Time (ns) | Improvement Percentage (IP) |
|---|---|---|---|
| 10 | 4.14E+07 | 2.08E+07 | -49.76 |
| 100 | 1.23E+08 | 7.71E+07 | -37.39 |
| 1000 | 1.11E+08 | 7.23E+07 | -34.75 |
| 10000 | 3.51E+09 | 3.06E+09 | -12.81 |
| 100000 | X | X | X |
| 1000000 | X | X | X |
| Average | 9.47E+08 | 8.08E+08 | -33.68 |
X indicates that, due to limited memory, the system required many hours to find the similarity of some of the pairs.
Total time and IPs obtained using original and Threaded SORA.
| Sample Size | Original SORA Total Time (ns) | Threaded SORA Total Time (ns) | Improvement Percentage (IP) |
|---|---|---|---|
| 10 | 1708984548 | 448755395 | -73.74 |
| 100 | 12714096406 | 7988678157 | -37.17 |
| 1000 | 1.11216E+11 | 74128196605 | -33.35 |
| 10000 | 3.51063E+13 | 3.06089E+13 | -12.81 |
| 100000 | X | X | X |
| 1000000 | X | X | X |
| Average | 8.80799E+12 | 7.67286E+12 | -39.27 |
X indicates that, due to limited memory, the system required many hours to find the similarity of some of the pairs.
Total time obtained using a distributed system (2, 3, and 4 slaves) with Enhanced Resnik and input data divided equally.
| Sample Size | Original Resnik Total Time (ns) | Threaded Resnik Total Time, 2 Slaves (ns) | Threaded Resnik Total Time, 3 Slaves (ns) | Threaded Resnik Total Time, 4 Slaves (ns) | Change vs. Original, 2 Slaves (%) | Change vs. Original, 3 Slaves (%) | Change vs. Original, 4 Slaves (%) |
|---|---|---|---|---|---|---|---|
| 10 | 2560906085 | 379861713 | 329100621 | 207469547 | -85.17 | -87.15 | -91.90 |
| 100 | 5350898201 | 234863609 | 223428959 | 403408364 | -95.61 | -95.82 | -92.46 |
| 1000 | 5224382582 | 546062707 | 348324714 | 418142340 | -89.55 | -93.33 | -92.00 |
| 10000 | 2997898214 | 1315383851 | 507249110 | 408547630 | -56.12 | -83.08 | -86.37 |
| 100000 | 9417548254 | 3374684138 | 3254669438 | 3745141194 | -64.17 | -65.44 | -60.23 |
| 1000000 | 46988654302 | 22114717802 | 19225589479 | 13719464210 | -52.94 | -59.08 | -70.80 |
| Average | 12090047940 | 4660928970 | 3981393720 | 3150362214 | -73.93 | -80.65 | -82.29 |
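
"Input data divided equally" can be pictured as handing each slave a chunk of gene pairs of nearly the same size. The source does not say how the equal division was implemented, so the contiguous chunking below and the name `split_equally` are assumptions made for illustration only.

```python
def split_equally(pairs, n_workers):
    """Split pairs into n_workers contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(pairs), n_workers)
    chunks = []
    start = 0
    for i in range(n_workers):
        # The first `extra` workers take one additional pair.
        size = base + (1 if i < extra else 0)
        chunks.append(pairs[start:start + size])
        start += size
    return chunks


# Hypothetical gene pairs; the gene names are placeholders.
example_pairs = [(f"gene{i}", f"gene{i + 1}") for i in range(10)]
for n in (2, 3, 4):
    print(n, [len(chunk) for chunk in split_equally(example_pairs, n)])
```

Each chunk would then be scored independently on its own slave, so the wall-clock time is bounded by the slowest chunk.
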
Average time obtained using a distributed system (2, 3, and 4 slaves) with Enhanced Resnik and input data divided equally.
| Sample Size | Original Resnik Average Time (ns) | Threaded Resnik Average Time, 2 Slaves (ns) | Threaded Resnik Average Time, 3 Slaves (ns) | Threaded Resnik Average Time, 4 Slaves (ns) | Change vs. Original, 2 Slaves (%) | Change vs. Original, 3 Slaves (%) | Change vs. Original, 4 Slaves (%) |
|---|---|---|---|---|---|---|---|
| 10 | 56515 | 3.80E+08 | 1.48E+05 | 1.07E+10 | 672287.86 | 161.88 | 18932926.63 |
| 100 | 26184.94949 | 7.14E+04 | 2.90E+05 | 6.70E+08 | 172.68 | 1007.51 | 2558621.76 |
| 1000 | 27907.82082 | 3.36E+04 | 2.65E+04 | 6.46E+07 | 20.40 | -5.04 | 231376.33 |
| 10000 | 16287.9895 | 1.92E+04 | 1.61E+04 | 6.45E+06 | 17.88 | -1.15 | 39499.73 |
| 100000 | 11844.26883 | 7.20E+03 | 1.15E+04 | 6.56E+05 | -39.21 | -2.91 | 5438.54 |
| 1000000 | 8273.153824 | 4.31E+03 | 7.10E+03 | 7.15E+04 | -47.90 | -14.18 | 764.24 |
| Average | 24502.19708 | 6.34E+07 | 8.32E+04 | 1.91E+09 | 112068.62 | 191.02 | 3628104.54 |
Total time obtained using a distributed system (2, 3, and 4 slaves) with Enhanced SSDD and input data divided equally.
| Number of Gene Pairs | Original SSDD Total Time (ns) | Threaded SSDD Total Time, 2 Slaves (ns) | Threaded SSDD Total Time, 3 Slaves (ns) | Threaded SSDD Total Time, 4 Slaves (ns) | Change vs. Original, 2 Slaves (%) | Change vs. Original, 3 Slaves (%) | Change vs. Original, 4 Slaves (%) |
|---|---|---|---|---|---|---|---|
| 10 | 3124423720 | 576766602 | 1100200519 | 668881808 | -81.54 | -64.79 | -78.59 |
| 100 | 14597347470 | 3979998527 | 5141720453 | 2863699495 | -72.73 | -64.78 | -80.38 |
| 1000 | 93710848961 | 28954815547 | 19152967373 | 15747791657 | -69.10 | -79.56 | -83.20 |
| 10000 | 4.63634E+11 | 3.31738E+11 | 1.9269E+11 | 1.21739E+11 | -28.45 | -58.44 | -73.74 |
| 100000 | 4.48611E+12 | 1.53352E+12 | 1.39815E+12 | 1.50908E+12 | -65.82 | -68.83 | -66.36 |
| 1000000 | 2.83292E+13 | 1.65612E+13 | 1.25735E+13 | 2.0722E+13 | -41.54 | -55.62 | -26.85 |
| Average | 5.56506E+12 | 3.07667E+12 | 2.36495E+12 | 3.72868E+12 | -59.86 | -65.34 | -68.19 |
Average time obtained using a distributed system (2, 3, and 4 slaves) with Enhanced SSDD and input data divided equally.
| Number of Gene Pairs | Original SSDD Average Time (ns) | Threaded SSDD Average Time, 2 Slaves (ns) | Threaded SSDD Average Time, 3 Slaves (ns) | Threaded SSDD Average Time, 4 Slaves (ns) | Change vs. Original, 2 Slaves (%) | Change vs. Original, 3 Slaves (%) | Change vs. Original, 4 Slaves (%) |
|---|---|---|---|---|---|---|---|
| 10 | 2.92E+08 | 1.55E+08 | 5.54E+11 | 1.12E+12 | -46.94 | 189561.09 | 383330.36 |
| 100 | 1.32E+08 | 8.38E+07 | 4.01E+10 | 7.00E+10 | -36.39 | 30339.33 | 53035.99 |
| 1000 | 9.32E+07 | 6.49E+07 | 3.95E+09 | 6.74E+09 | -30.34 | 4139.99 | 7134.82 |
| 10000 | 4.62E+07 | 1.17E+08 | 4.38E+08 | 6.67E+08 | 153.30 | 848.23 | 1344.0 |
| 100000 | 4.48E+07 | 3.88E+07 | 8.03E+07 | 6.25E+07 | -13.34 | 79.35 | 39.60 |
| 1000000 | 2.83E+07 | 4.77E+07 | 3.33E+07 | 6.68E+07 | 68.67 | 17.75 | 136.21 |
| Average | 1.06E+08 | 8.45E+07 | 9.98E+10 | 2.00E+11 | 1.58E+01 | 3.75E+04 | 7.42E+04 |
Total time obtained using a distributed system (2, 3, and 4 slaves) with Enhanced SORA and input data divided equally.
| Number of Gene Pairs | Original SORA Total Time (ns) | Threaded SORA Total Time, 2 Slaves (ns) | Threaded SORA Total Time, 3 Slaves (ns) | Threaded SORA Total Time, 4 Slaves (ns) | Change vs. Original, 2 Slaves (%) | Change vs. Original, 3 Slaves (%) | Change vs. Original, 4 Slaves (%) |
|---|---|---|---|---|---|---|---|
| 10 | 1708984548 | 603416420 | 426734074 | 1296029007 | -64.69 | -75.03 | -24.16 |
| 100 | 12714096406 | 4591177183 | 3915812000 | 2555994931 | -63.89 | -69.20 | -79.90 |
| 1000 | 1.11216E+11 | 37052551034 | 23049069956 | 20918129803 | -66.68 | -79.28 | -81.19 |
| 10000 | 3.51063E+13 | 1.23639E+13 | 1.2036E+13 | 7.32991E+12 | -64.78 | -65.72 | -79.12 |
| 100000 | X | X | X | X | X | X | X |
| 1000000 | X | X | X | X | X | X | X |
| Average | 8.80799E+12 | 3.10154E+12 | 3.01584E+12 | 1.83867E+12 | -65.01 | -72.31 | -66.09 |
X indicates that, due to limited memory, the system required many hours to find the similarity of some of the pairs.
Enhanced SORA average time in the distributed system (2, 3 and 4 slaves) with input data divided equally.
| Number of Gene Pairs | Original SORA Average Time (ns) | Threaded SORA Average Time, 2 Slaves (ns) | Threaded SORA Average Time, 3 Slaves (ns) | Threaded SORA Average Time, 4 Slaves (ns) | Change vs. Original, 2 Slaves (%) | Change vs. Original, 3 Slaves (%) | Change vs. Original, 4 Slaves (%) |
|---|---|---|---|---|---|---|---|
| 10 | 4.14E+07 | 6.03E+08 | 5.27E+12 | 1.06E+13 | 4.82E+08 | 2.13E+11 | 1.84E+11 |
| 100 | 1.23E+08 | 7.06E+07 | 3.20E+11 | 6.63E+11 | 4.42E+07 | 1.63E+11 | 4.60E+10 |
| 1000 | 1.11E+08 | 6.15E+07 | 3.09E+10 | 6.40E+10 | 6.61E+07 | 6.75E+09 | 6.01E+09 |
| 10000 | 3.51E+09 | 1.83E+09 | 6.37E+09 | 3.56E+09 | 3.31E+08 | 6.69E+08 | 3.46E+08 |
| 100000 | X | X | X | X | X | X | X |
| 1000000 | X | X | X | X | X | X | X |
| Average | 9.47E+08 | 6.41E+08 | 1.41E+12 | 2.83E+12 | 2.31E+08 | 9.59E+10 | 5.91E+10 |
X indicates that, due to limited memory, the system required many hours to find the similarity of some of the pairs.
Average time obtained using a distributed system (2, 3, and 4 slaves) with Enhanced Resnik and input data divided by their similarity.
| Number of Gene Pairs | Original Resnik Average Time (ns) | Threaded Resnik Average Time, 2 Slaves (ns) | Threaded Resnik Average Time, 3 Slaves (ns) | Threaded Resnik Average Time, 4 Slaves (ns) | Change vs. Original, 2 Slaves (%) | Change vs. Original, 3 Slaves (%) | Change vs. Original, 4 Slaves (%) |
|---|---|---|---|---|---|---|---|
| 10 | 56515 | 1.58E+05 | 3.65E+08 | 4.59E+08 | 179.57 | 645746.24 | 812073.76 |
| 100 | 26184.94949 | 6.90E+04 | 3.48E+07 | 3.52E+07 | 163.51 | 132800.77 | 134328.37 |
| 1000 | 27907.82082 | 3.70E+04 | 3.39E+06 | 3.41E+06 | 32.58 | 12047.13 | 12118.80 |
| 10000 | 16287.9895 | 2.03E+05 | 3.38E+05 | 3.47E+05 | 1146.32 | 1975.15 | 2030.40 |
| 100000 | 11844.26883 | 2.58E+04 | 3.26E+04 | 1.01E+04 | 117.83 | 175.24 | -14.73 |
| 1000000 | 8273.153824 | 2.23E+04 | 2.36E+04 | 1.86E+04 | 169.55 | 185.26 | 124.82 |
| Average | 24502.19708 | 85850 | 67264033.33 | 82997616.67 | 301.56 | 132154.96 | 160110.24 |
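
"Input data divided by their similarity" refers to routing gene pairs to slaves according to the GO splits and data clusters rather than by position in the input. The exact algorithms are only shown as flowcharts (Figures 3 and 4), so the sketch below is a loose illustration under strong assumptions: each split is modelled as a plain set of term identifiers, a pair goes to the split that contains both of its terms, and everything else lands in a `"mixed"` bucket. The names `SPLITS`, `split_of`, and `group_pairs_by_split`, and all term identifiers, are hypothetical.

```python
from collections import defaultdict

# Hypothetical GO splits: split name -> term identifiers it contains.
SPLITS = {
    "split_1": {"T1", "T2", "T3"},
    "split_2": {"T4", "T5"},
}


def split_of(term):
    """Return the name of the split containing the term, or None if unknown."""
    for name, terms in SPLITS.items():
        if term in terms:
            return name
    return None


def group_pairs_by_split(pairs):
    """Group pairs so that each group needs only one split of the ontology."""
    groups = defaultdict(list)
    for a, b in pairs:
        split_a, split_b = split_of(a), split_of(b)
        # Pairs spanning two splits (or using unknown terms) go to "mixed".
        key = split_a if split_a is not None and split_a == split_b else "mixed"
        groups[key].append((a, b))
    return dict(groups)


example = [("T1", "T2"), ("T4", "T5"), ("T1", "T4"), ("T3", "T9")]
print(group_pairs_by_split(example))
```

Under this reading, a slave that receives one group only has to load the matching part of the ontology, which is one plausible reason a similarity-based division could behave differently from an equal division.
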
Total time obtained using a distributed system (2, 3, and 4 slaves) with Enhanced Resnik and input data divided by their similarity.
| Number of Gene Pairs | Original Resnik Total Time (ns) | Threaded Resnik Total Time, 2 Slaves (ns) | Threaded Resnik Total Time, 3 Slaves (ns) | Threaded Resnik Total Time, 4 Slaves (ns) | Change vs. Original, 2 Slaves (%) | Change vs. Original, 3 Slaves (%) | Change vs. Original, 4 Slaves (%) |
|---|---|---|---|---|---|---|---|
| 10 | 2560906085 | 335066219 | 211190589 | 3208745299 | -86.92 | -91.75 | 25.30 |
| 100 | 5350898201 | 854315104 | 794283629 | 711803925 | -84.03 | -85.16 | -86.70 |
| 1000 | 5224382582 | 8390205103 | 7467893056 | 8870113085 | 60.60 | 42.94 | 69.78 |
| 10000 | 2997898214 | 93562482698 | 80426435081 | 68009514911 | 3020.94 | 2582.76 | 2168.57 |
| 100000 | 9417548254 | 5.81006E+11 | 6.31117E+11 | 6.47021E+11 | 6069.40 | 6601.50 | 6770.38 |
| 1000000 | 46988654302 | 6.55475E+12 | 6.35691E+12 | 6.12695E+12 | 13849.64 | 13428.62 | 12939.22 |
| Average | 12090047940 | 1.20648E+12 | 1.17949E+12 | 1.14246E+12 | 3804.94 | 3746.49 | 3647.76 |
Average time obtained using a distributed system (2, 3, and 4 slaves) with Enhanced SSDD and input data divided by their similarity.
| Number of Gene Pairs | Original SSDD Average Time (ns) | Threaded SSDD Average Time, 2 Slaves (ns) | Threaded SSDD Average Time, 3 Slaves (ns) | Threaded SSDD Average Time, 4 Slaves (ns) | Change vs. Original, 2 Slaves (%) | Change vs. Original, 3 Slaves (%) | Change vs. Original, 4 Slaves (%) |
|---|---|---|---|---|---|---|---|
| 10 | 2.92E+08 | 1.24E+08 | 8.63E+10 | 1.17E+11 | -57.55 | 29444.68 | 39954.78 |
| 100 | 1.32E+08 | 5.72E+07 | 1.04E+10 | 1.06E+10 | -56.58 | 7794.49 | 7946.31 |
| 1000 | 9.32E+07 | 4.55E+07 | 1.01E+09 | 1.08E+09 | -51.16 | 984.15 | 1059.29 |
| 10000 | 4.62E+07 | 4.47E+07 | 9.67E+07 | 1.82E+08 | -3.23 | 109.35 | 294.01 |
| 100000 | 4.48E+07 | 2.47E+07 | 6.66E+07 | 4.38E+07 | -44.83 | 48.75 | -2.17 |
| 1000000 | 2.83E+07 | 6.45E+07 | 6.42E+07 | 4.12E+07 | 128.07 | 127.01 | 45.68 |
| Average | 1.06E+08 | 6.01E+07 | 1.63E+10 | 2.15E+10 | -1.42E+01 | 6.42E+03 | 8.22E+03 |
Total time obtained using a distributed system (2, 3, and 4 slaves) with Enhanced SSDD and input data divided by their similarity.
| Number of Gene Pairs | Original SSDD Total Time (ns) | Threaded SSDD Total Time, 2 Slaves (ns) | Threaded SSDD Total Time, 3 Slaves (ns) | Threaded SSDD Total Time, 4 Slaves (ns) | Change vs. Original, 2 Slaves (%) | Change vs. Original, 3 Slaves (%) | Change vs. Original, 4 Slaves (%) |
|---|---|---|---|---|---|---|---|
| 10 | 3124423720 | 794430504 | 295236711 | 2460790326 | -74.57 | -90.55 | -21.24 |
| 100 | 14597347470 | 5126679035 | 2413422396 | 3134281448 | -64.88 | -83.47 | -78.53 |
| 1000 | 93710848961 | 39111930797 | 36944258807 | 29763348812 | -58.26 | -60.58 | -68.24 |
| 10000 | 4.63634E+11 | 4.13966E+11 | 3.05233E+11 | 3.13192E+11 | -10.71 | -34.17 | -32.45 |
| 100000 | 4.48611E+12 | 2.48536E+12 | 1.92535E+12 | 2.06661E+12 | -44.60 | -57.08 | -53.93 |
| 1000000 | 2.83292E+13 | 2.16457E+13 | 1.99122E+13 | 1.84843E+13 | -23.59 | -29.71 | -34.75 |
| Average | 5.56506E+12 | 4.09835E+12 | 3.69707E+12 | 3.48325E+12 | -46.10 | -59.26 | -48.19 |
Total time obtained using a distributed system (2, 3, and 4 slaves) with Enhanced SORA and input data divided by their similarity.
| Number of Gene Pairs | Original SORA Total Time (ns) | Threaded SORA Total Time, 2 Slaves (ns) | Threaded SORA Total Time, 3 Slaves (ns) | Threaded SORA Total Time, 4 Slaves (ns) | Change vs. Original, 2 Slaves (%) | Change vs. Original, 3 Slaves (%) | Change vs. Original, 4 Slaves (%) |
|---|---|---|---|---|---|---|---|
| 10 | 1708984548 | 482298616 | 359402373 | 399416852 | -71.78 | -78.97 | -76.63 |
| 100 | 12714096406 | 2840539191 | 2788018116 | 2459561951 | -77.66 | -78.07 | -80.65 |
| 1000 | 1.11216E+11 | 28551368161 | 23437947918 | 18670965654 | -74.33 | -78.93 | -83.21 |
| 10000 | 3.51063E+13 | 1.22567E+12 | 1.03491E+12 | 1.24612E+12 | -96.51 | -97.05 | -96.45 |
| 100000 | X | X | X | X | X | X | X |
| 1000000 | X | X | X | X | X | X | X |
| Average | 8.80799E+12 | 3.14385E+11 | 2.65373E+11 | 3.16912E+11 | -80.07 | -83.25 | -84.24 |
X indicates that, due to limited memory, the system required many hours to find the similarity of some of the pairs.
Average time obtained using a distributed system (2, 3, and 4 slaves) with Enhanced SORA and input data divided by their similarity.
| Number of Gene Pairs | Original SORA Average Time (ns) | Threaded SORA Average Time, 2 Slaves (ns) | Threaded SORA Average Time, 3 Slaves (ns) | Threaded SORA Average Time, 4 Slaves (ns) | Change vs. Original Average Time, 2 Slaves (%) | Change vs. Original Average Time, 3 Slaves (%) | Change vs. Original Average Time, 4 Slaves (%) |
|---|---|---|---|---|---|---|---|
| 10 | 4.14E+07 | 4.82E+08 | 2.13E+11 | 1.84E+11 | 1065.48 | 514936.11 | 444813.82 |
| 100 | 1.23E+08 | 4.42E+07 | 1.63E+11 | 4.60E+10 | -64.12 | 132207.19 | 37238.23 |
| 1000 | 1.11E+08 | 6.61E+07 | 6.75E+09 | 6.01E+09 | -40.35 | 5991.10 | 5323.33 |
| 10000 | 3.51E+09 | 3.31E+08 | 6.69E+08 | 3.46E+08 | -90.57 | -80.94 | -90.14 |
| 100000 | X | X | X | X | X | X | X |
| 1000000 | X | X | X | X | X | X | X |
| Average | 9.47E+08 | 2.31E+08 | 9.59E+10 | 5.91E+10 | 1065.48 | 1.63E+05 | 1.22E+05 |
X indicates that, due to limited memory, the system required many hours to find the similarity of some of the pairs.
Average time reduction obtained using Threaded Resnik with a distributed system and input data divided by their similarity versus input data divided equally.
| Number of Gene Pairs | IP, 2 Slaves (%) | IP, 3 Slaves (%) | IP, 4 Slaves (%) |
|---|---|---|---|
| 10 | -99.96 | 4.05 | -95.71 |
| 100 | -3.36 | -52.76 | -94.75 |
| 1000 | 10.12 | 200 | -94.72 |
| 10000 | 957.29 | 1999.38 | -94.62 |
| 100000 | 258.33 | 183.48 | -98.46 |
| 1000000 | 417.40 | 232.39 | -73.99 |
| Average | 256.64 | 45588.22 | -92.04 |
Total time reduction obtained using Threaded Resnik with a distributed system and input data divided by their similarity versus input data divided equally.
| Number of Gene Pairs | IP, 2 Slaves (%) | IP, 3 Slaves (%) | IP, 4 Slaves (%) |
|---|---|---|---|
| 10 | -11.79 | -28.59 | 1446.61 |
| 100 | 263.75 | 342.67 | 76.45 |
| 1000 | 1436.49 | 2081.14 | 2021.31 |
| 10000 | 7012.94 | 15755.41 | 16546.66 |
| 100000 | 17116.60 | 19291.13 | 17176.28 |
| 1000000 | 29539.75 | 32964.86 | 44558.83 |
| Average | 9226.29 | 11712.50 | 13637.69 |
Average time reduction obtained using Threaded SSDD in a distributed system with input data divided by their similarity versus input data divided equally.
| Number of Gene Pairs | IP, 2 Slaves (%) | IP, 3 Slaves (%) | IP, 4 Slaves (%) |
|---|---|---|---|
| 10 | -20 | -84.42 | -89.55 |
| 100 | -31.74 | -74.06 | -84.86 |
| 1000 | -29.89 | -74.43 | -83.98 |
| 10000 | -61.79 | -77.92 | -72.71 |
| 100000 | -36.34 | -17.06 | -29.92 |
| 1000000 | 35.22 | 92.79 | -38.32 |
| Average | -24.1 | -39.2 | -66.6 |
Total time reduction obtained using Threaded SSDD in a distributed system with input data divided by their similarity versus input data divided equally.
| Number of Gene Pairs | IP, 2 Slaves (%) | IP, 3 Slaves (%) | IP, 4 Slaves (%) |
|---|---|---|---|
| 10 | 37.74 | -73.17 | 267.90 |
| 100 | 28.81 | -53.06 | 9.45 |
| 1000 | 35.08 | 92.89 | 89.00 |
| 10000 | 24.79 | 58.41 | 157.26 |
| 100000 | 62.07 | 37.71 | 36.95 |
| 1000000 | 30.70 | 58.37 | -10.80 |
| Average | 36.53 | 20.20 | 91.63 |
Average time reduction obtained using Threaded SORA in a distributed system with input data divided by their similarity versus input data divided equally.
| Number of Gene Pairs | IP, 2 Slaves (%) | IP, 3 Slaves (%) | IP, 4 Slaves (%) |
|---|---|---|---|
| 10 | -20.07 | -95.96 | -98.26 |
| 100 | -37.39 | -49.06 | -93.06 |
| 1000 | 7.48 | -78.16 | -90.61 |
| 10000 | -81.91 | -89.50 | -90.28 |
| 100000 | X | X | X |
| 1000000 | X | X | X |
| Average | -33.0 | -78.2 | -93.1 |
X indicates that, due to limited memory, the system required many hours to find the similarity of some of the pairs.
Total time reduction obtained using Threaded SORA in a distributed system with input data divided by their similarity versus input data divided equally.
| Number of Gene Pairs | IP, 2 Slaves (%) | IP, 3 Slaves (%) | IP, 4 Slaves (%) |
|---|---|---|---|
| 10 | -20.07 | -15.78 | -69.18 |
| 100 | -38.13 | -28.80 | -3.77 |
| 1000 | -22.94 | 1.69 | -10.74 |
| 10000 | -90.09 | -91.41 | -83.00 |
| 100000 | X | X | X |
| 1000000 | X | X | X |
| Average | -42.81 | -33.57 | -41.67 |
X indicates that, due to limited memory, the system required many hours to find the similarity of some of the pairs.