Dongming Tang, Mingwen Wang, Weifan Zheng, Hongjun Wang.
Abstract
To discover relationships and associations rapidly in large-scale datasets, we propose a cross-platform tool for the rapid computation of the maximal information coefficient based on parallel computing methods. Through parallel processing, the tool can analyze large-scale biological datasets with a markedly reduced computing time. The experimental results show that the proposed tool is notably fast and can perform an all-pairs analysis of a large biological dataset on an ordinary computer. The source code and guidelines can be downloaded from https://github.com/HelloWorldCN/RapidMic.
Keywords: algorithms; computational biology; gene expression; software; statistical analysis
Year: 2014 PMID: 24526831 PMCID: PMC3921152 DOI: 10.4137/EBO.S13121
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Compute the maximal information coefficient.
    compute_mic(…)
    {
        …
        /* x vs. y */
        for (…)
            …
        /* y … */
        for (…)
            …
        return getResult();
    }

    FixOnePartition(…)
    {
        …
        EquipartitionYAxis(…
        …
        return ApproxOptimizeXAxis(…
    }
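The body of the `EquipartitionYAxis` step named above did not survive in this record. As a rough illustration of what an equipartition does, the following Python sketch assigns points to `n_bins` groups of near-equal size by rank. The function name `equipartition` and its signature are hypothetical, tied values are not kept together, and this is a simplified sketch rather than RapidMic's implementation.

```python
def equipartition(values, n_bins):
    """Return a bin index per value so that bins hold near-equal counts.

    Assumption: bins are assigned purely by rank; tied values may be
    split across bins, which a full MIC implementation would avoid.
    """
    n = len(values)
    # Indices of the points sorted by their value.
    order = sorted(range(n), key=lambda idx: values[idx])
    bins = [0] * n
    for rank, idx in enumerate(order):
        # Map rank 0..n-1 onto bin 0..n_bins-1 in equal-sized runs.
        bins[idx] = rank * n_bins // n
    return bins


y = [5.0, 1.0, 3.0, 2.0, 4.0, 6.0]
rows = equipartition(y, 3)
print(rows)  # [2, 0, 1, 0, 1, 2]
```

Each of the three rows receives two of the six points, which is the property the equipartition step is meant to provide.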
Comparison of the computing time (seconds) for case 1.
| N | MINE.JAR | MINERVA | MINEMAT | MINEC++ | RAPIDMIC |
|---|---|---|---|---|---|
| 10000 | 5.651 | 43.575 | 65.452 | 52.584 | 1.420 |
| 20000 | 13.689 | 375.707 | 376.086 | 334.551 | 5.456 |
| 50000 | 81.256 | 4172.569 | 4219.81 | 3616.941 | 27.05 |
| 100000 | 604.114 | 26442.04 | 25847.2 | 22448.676 | 101.608 |
Notes: n, number of values for the variable. The computing time excludes the time required to read the data file and write the result file. Minerva enables parallelization using a multicore package.
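To make the case 1 comparison easier to read, the timings can be turned into speedup ratios. The short Python sketch below does this for RapidMic relative to MINE.jar, using only the values from the table above; the names `case1` and `speedup` are illustrative.

```python
# Timings in seconds copied from the case 1 table: n -> (MINE.jar, RapidMic).
case1 = {
    10000: (5.651, 1.420),
    20000: (13.689, 5.456),
    50000: (81.256, 27.05),
    100000: (604.114, 101.608),
}


def speedup(baseline_seconds, tool_seconds):
    """Ratio of baseline time to tool time; above 1 means the tool is faster."""
    return baseline_seconds / tool_seconds


speedups = {n: speedup(jar, rapid) for n, (jar, rapid) in case1.items()}
for n, ratio in speedups.items():
    print(f"n={n}: RapidMic is about {ratio:.2f}x faster than MINE.jar")
```

On these figures the ratio lies between roughly 2.5 and 6, with the largest gain at n = 100000.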
Comparison of the computing time (seconds) for case 2.
| DATA | N | M | MINE.JAR | MINERVA | MINEMAT | MINEC++ | RAPIDMIC |
|---|---|---|---|---|---|---|---|
| Spellman | 4381 | 23 | – | 4304.86 | 2933.59 | 2267.198 | 1060.649 |
| MLB2008 | 132 | 337 | 1476.938 | 430.489 | 1170.25 | 1195.025 | 350.142 |
Notes: n, number of variables. m, number of features. The computing time excludes the time required to read the data file and write the result file. Minerva enables parallelization using a multicore package.
Comparison of the computing time (seconds) for case 3.
| DATA | N | M | MINE.JAR | MINERVA | MINEMAT | MINEC++ | RAPIDMIC |
|---|---|---|---|---|---|---|---|
| RNA | 20422 | 16 | 2.440 | 2.905 | 1.275 | 0.831 | 0.378 |
| Spellman | 4381 | 23 | 0.41 | 1.954 | 1.283 | 1.108 | 0.219 |
Notes: n, number of variables. m, number of features. The computing time excludes the time required to read the data file and write the result file. Minerva enables parallelization using a multicore package.
Comparison of the computing time (seconds) for case 4.
| DATA | N | M | I | MINE.JAR | MINERVA | MINEMAT | MINEC++ | RAPIDMIC |
|---|---|---|---|---|---|---|---|---|
| RNA | 20422 | 16 | 3000 | – | 3164.7 | 3580.5 | 3060.2 | 1285.3 |
| Spellman | 4381 | 23 | 2190 | – | 3280.6 | 1297.8 | 1118.6 | 443.3 |
Notes: n, number of variables. m, number of features. i, the first i variables compared with each of the remaining variables. The computing time excludes the time required to read the data file and write the result file. Minerva enables parallelization using a multicore package.
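The notes distinguish an all-pairs analysis from one in which the first i variables are compared with each of the remaining variables. The Python sketch below shows one way such pair lists could be enumerated and then scored independently by a pool of workers. All function names are hypothetical, `placeholder_score` is a stand-in and not a MIC computation, and a thread pool is used only to keep the sketch self-contained; CPU-bound work in CPython would normally need processes or native code to run faster.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def all_pairs(n):
    """Every unordered pair (a, b) with a < b among n variables."""
    return list(combinations(range(n), 2))


def first_i_vs_rest(n, i):
    """Each of the first i variables paired with each of the remaining n - i."""
    return [(a, b) for a in range(i) for b in range(i, n)]


def placeholder_score(x, y):
    """Stand-in for a MIC routine (assumption): mean absolute difference."""
    return sum(abs(p - q) for p, q in zip(x, y)) / len(x)


def score_pairs(data, pairs, workers=4):
    """Score every pair independently; results are keyed by the pair."""
    def score(pair):
        a, b = pair
        return placeholder_score(data[a], data[b])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as `pairs`.
        return dict(zip(pairs, pool.map(score, pairs)))


data = [[0.0, 1.0], [1.0, 2.0], [3.0, 5.0], [0.0, 0.0]]
result = score_pairs(data, first_i_vs_rest(len(data), 1))
print(result)  # {(0, 1): 1.0, (0, 2): 3.5, (0, 3): 0.5}
```

Because no pair depends on another, the list can be split among workers in any order, which is what makes this kind of analysis straightforward to parallelize.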
Analysis of the consistency between our parallel method RapidMic and the serial method MINE.jar.
| Range | Identical | (0, 0.00001] | (0.00001, 0.00002] |
|---|---|---|---|
| Number | 2193 | 644 | 1544 |
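A consistency table of this kind can be produced by binning the differences between two sets of results. The Python sketch below assumes the ranges refer to the absolute difference between the values reported by the two tools for the same pair, which this record does not state explicitly. The function name `bin_differences` and the extra `larger` bucket are illustrative additions.

```python
def bin_differences(serial, parallel):
    """Count result pairs by absolute difference, using the table's ranges."""
    counts = {
        "identical": 0,
        "(0, 0.00001]": 0,
        "(0.00001, 0.00002]": 0,
        "larger": 0,  # not in the table; catches anything above 0.00002
    }
    for s, p in zip(serial, parallel):
        diff = abs(s - p)
        if diff == 0:
            counts["identical"] += 1
        elif diff <= 0.00001:
            counts["(0, 0.00001]"] += 1
        elif diff <= 0.00002:
            counts["(0.00001, 0.00002]"] += 1
        else:
            counts["larger"] += 1
    return counts


# Made-up values for illustration only, not results from the paper.
serial_scores = [0.5, 0.25, 0.75]
parallel_scores = [0.5, 0.250004, 0.750015]
summary = bin_differences(serial_scores, parallel_scores)
print(summary)
```

With these made-up inputs, one pair falls into each of the three ranges shown in the table and none into the `larger` bucket.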