P M Booma, S Prabhakaran, R Dhanalakshmi.
Abstract
Microarray gene expression datasets have attracted considerable attention among molecular biologists, statisticians, and computer scientists. Data mining techniques that extract hidden and useful information from datasets fail to identify the most significant biological associations between genes. A heuristic search for standard biological processes measures only the gene expression level, threshold, and response time. Heuristic search identifies and mines the best biological solution, but it does not address the association process efficiently. To monitor higher rates of expression levels between genes, an improved Pearson's correlation proximity-based hierarchical clustering (PCPHC) model is proposed, in which the biological association between genes is measured simultaneously using the improved Pearson's correlation as the proximity measure. Additionally, the Seed Augment algorithm applies average linkage methods to rows and columns in order to expand a seed PCPHC model into a maximal global PCPHC (GL-PCPHC) model and to identify associations between the clusters. Moreover, GL-PCPHC applies a pattern-growing method to mine the PCPHC patterns. Compared with existing gene expression analysis methods, the PCPHC model achieves better performance. Experimental evaluations of the GL-PCPHC model are conducted on standard benchmark gene expression datasets extracted from the UCI repository and the GenBank database, in terms of execution time, size of pattern, significance level, biological association efficiency, and pattern quality.
Year: 2014 PMID: 25136661 PMCID: PMC4083291 DOI: 10.1155/2014/357873
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
Figure 1: Framework of improved Pearson's correlation proximity-based hierarchical clustering.
Algorithm 1: Hierarchical clustering algorithm to measure similar gene expression patterns.
Algorithm 2: Seed Augment algorithm.
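The algorithm listings themselves are not reproduced on this page. As a rough illustration of the general idea named in the abstract (hierarchical clustering with a Pearson-correlation proximity measure and average linkage), the following is a minimal sketch in plain Python. It uses the standard Pearson coefficient and textbook average-linkage agglomeration, not the authors' improved measure or the Seed Augment algorithm. The function names, the `1 - r` distance, the handling of flat profiles, and the toy expression profiles are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Standard Pearson correlation between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a flat profile as uncorrelated
    return cov / sqrt(var_x * var_y)


def average_linkage_clusters(profiles, k):
    """Merge genes bottom-up until k clusters remain.

    Distance between two genes is 1 - r. Distance between two clusters
    is the mean of all pairwise gene distances (average linkage).
    """
    names = list(profiles)
    dist = {
        (a, b): 1.0 - pearson(profiles[a], profiles[b])
        for a in names for b in names if a != b
    }
    clusters = [[name] for name in names]
    while len(clusters) > max(k, 1):
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(dist[p] for p in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical expression profiles (one list of condition values per gene).
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.1, 4.0, 2.1],
}
result = average_linkage_clusters(toy, 2)
print(result)
```

With these toy profiles, the two rising genes end up in one cluster and the two falling genes in the other. A correlation-based distance groups genes by the shape of their expression profile rather than by absolute level, which is why `geneA` and `geneB` are merged despite their different magnitudes.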
Tabulation of execution time (using GenBank database).
| Size of pattern (KB) | Existing MDP method: execution time (ms) | SC3: execution time (ms) | PCPHC model: execution time (ms) |
|---|---|---|---|
| 155 | 370 | 300 | 260 |
| 203 | 650 | 580 | 470 |
| 350 | 1040 | 950 | 780 |
| 489 | 1170 | 1000 | 900 |
| 550 | 1430 | 1200 | 1010 |
| 614 | 1635 | 1420 | 1150 |
| 757 | 1765 | 1350 | 1205 |
Figure 2: Measure of execution time (using GenBank database).
Tabulation of biological association efficiency (using yeast gene expression dataset).
| Number of features | Existing MDP method: biological association efficiency (%) | SC3: biological association efficiency (%) | PCPHC model: biological association efficiency (%) |
|---|---|---|---|
| 10 | 75 | 77 | 82 |
| 20 | 76 | 78 | 85 |
| 30 | 78 | 80 | 82 |
| 40 | 79 | 80 | 86 |
| 50 | 78 | 81 | 87 |
| 60 | 79 | 83 | 92 |
| 70 | 80 | 85 | 95 |
Figure 3: Measure of biological association efficiency (using yeast gene expression dataset).
Figure 4: Measure of pattern quality level.
Tabulation of pattern quality level.
| Number of features | Existing MDP method: pattern quality level (score points) | SC3: pattern quality level (score points) | PCPHC model: pattern quality level (score points) |
|---|---|---|---|
| 2 | 42 | 43 | 46 |
| 4 | 60 | 61 | 64 |
| 6 | 36 | 37 | 39 |
| 8 | 37 | 39 | 42 |
| 10 | 30 | 32 | 34 |
| 12 | 34 | 35 | 37 |
| 14 | 40 | 41 | 42 |
Figure 5: Technique versus accuracy rate (using GenBank database).
Technique versus accuracy rate (using GenBank database).
| Technique | Accuracy rate (%) |
|---|---|
| Existing MDP method | 72 |
| SC3 | 78 |
| PCPHC model | 85 |
Tabulation of gene expression level (using yeast gene expression dataset).
| Number of genes | Existing MDP method: gene expression level (%) | SC3: gene expression level (%) | PCPHC model: gene expression level (%) |
|---|---|---|---|
| 25 | 76 | 81 | 91 |
| 50 | 79 | 82 | 91 |
| 75 | 80 | 83 | 92 |
| 100 | 81 | 84 | 92 |
| 125 | 82 | 84 | 92 |
| 150 | 84 | 87 | 93 |
| 175 | 85 | 89 | 93 |
Figure 6: Measure of gene expression level (using yeast gene expression dataset).