Prabakaran Subramani, Rajendra Sahu, Shekhar Verma.
Abstract
BACKGROUND: Feature selection is one way to overcome the 'curse of dimensionality' in complex research problems such as disease classification using microarrays. Statistical methods dominate this domain, but most of them do not suit a wide range of datasets. Transform-oriented signal processing techniques remain largely unexplored here, although fields such as image and video processing use them extensively. Wavelets, one such technique, have the potential to be used for feature selection. The aim of this paper is to assess the capability of the Haar wavelet power spectrum for clustering and gene selection from expression data in the context of disease classification, and to propose a method based on it.
Year: 2006 PMID: 17022808 PMCID: PMC1618414 DOI: 10.1186/1471-2105-7-432
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. A two-level DWT for N data points. The number of data points is halved after every filtering and downsampling operation. The wavelet transform is applied recursively to the output of the low-pass filter (h[n]) (the approximation coefficients), while the output coefficients of each high-pass filtering operation (g[n]) (the detail coefficients) are kept at each stage. The wavelet transform of the data at any decomposition level i consists of the approximation coefficients of the ith level only and all detail coefficients up to the ith level.
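The decomposition described in the Figure 1 caption can be sketched for the Haar case as follows. This is a minimal illustration, not the authors' implementation: the function names `haar_step` and `haar_dwt`, the sample values, the orthonormal 1/sqrt(2) scaling, and the sign convention of the detail coefficients are all assumptions made for the example.

```python
import numpy as np

def haar_step(x):
    """One filtering + downsampling stage: returns (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    if x.size % 2 != 0:
        raise ValueError("length must be even")
    pairs = x.reshape(-1, 2)
    # Low-pass h[n]: scaled pairwise sums (approximation coefficients).
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
    # High-pass g[n]: scaled pairwise differences (detail coefficients).
    # The sign convention (first minus second) is an assumption.
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)
    return approx, detail

def haar_dwt(x, levels):
    """Recurse on the approximation only, keeping every level's details.

    Returns (approximation at the last level, [detail_1, ..., detail_levels]).
    """
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details

# Hypothetical data: N = 8 values, decomposed to two levels.
signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
a2, (d1, d2) = haar_dwt(signal, levels=2)
print(len(d1), len(d2), len(a2))  # coefficient counts: N/2, N/4, N/4
```

With this scaling the transform is orthonormal, so the sum of squared coefficients equals the sum of squared input values, which is the property a power spectrum relies on.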
Differentially expressed genes selected for classifying EWS diagnostic category of SRBCT data (RPV – Relative percentage variation).

| Rank | Gene number | Image ID | RPV (%) |
|---|---|---|---|
| 1 | 1319 | 866702 | 99.52 |
| 2 | 1645 | 52076 | 97.92 |
| 3 | 1954 | 814260 | 97.91 |
| 4 | 1200 | 838856 | 96.45 |
| 5 | 696 | 753587 | 95.63 |
| 6 | 1140 | 824922 | 92.71 |
| 7 | 1070 | 1475730 | 91.06 |
| 8 | 851 | 563673 | 89.27 |
| 9 | 404 | 1422723 | 88.28 |
| 10 | 1831 | 208718 | 87.64 |
| 16 | 1980 | 841641 | 83.46 |
| 19 | 373 | 291756 | 81.31 |
| 20 | 1626 | 811000 | 81.22 |
Top-ranked genes selected using the relative percentage variation of gene expression profiles between BL and all other categories of the SRBCT dataset.

| Rank | Gene number | Image ID | RPV (%) |
|---|---|---|---|
| 1 | 1916 | 80109 | 98.61 |
| 2 | 836 | 241412 | 98.24 |
| 3 | 783 | 767183 | 98.04 |
| 4 | 846 | 183337 | 98.02 |
| 5 | 1735 | 200814 | 97.81 |
| 6 | 1387 | 740604 | 97.40 |
| 7 | 335 | 1469292 | 96.35 |
| 8 | 1884 | 609663 | 96.16 |
| 9 | 1725 | 813630 | 95.69 |
| 10 | 1295 | 344134 | 95.48 |
| 14 | 2230 | 417226 | 94.45 |
| 17 | 1915 | 840942 | 94.22 |
| 19 | 1158 | 814526 | 93.24 |
| 25 | 85 | 700792 | 91.70 |
Features selected using the relative percentage variation of gene expression profiles between NB and all other categories of the SRBCT dataset.

| Rank | Gene number | Image ID | RPV (%) |
|---|---|---|---|
| 1 | 1764 | 44563 | 96.29 |
| 2 | 742 | 812105 | 95.93 |
| 3 | 236 | 878280 | 95.38 |
| 4 | 255 | 325182 | 89.34 |
| 5 | 2202 | 110503 | 88.23 |
| 6 | 417 | 395708 | 85.49 |
| 7 | 909 | 785933 | 84.32 |
| 8 | 1601 | 629896 | 82 |
| 9 | 2199 | 135688 | 81.02 |
| 10 | 695 | 376516 | 80.50 |
| 18 | 2144 | 308231 | 69.75 |
| 25 | 2050 | 295985 | 60.40 |
Features selected both by earlier methods and by the relative percentage variation of gene expression profiles of the Golub dataset, within the top 20 ranks.

| Rank | Gene number | RPV (%) |
|---|---|---|
| 1 | 5599 | 99.99 |
| 5 | 1882 | 99.95 |
| 11 | 5376 | 99.91 |
| 12 | 6218 | 99.89 |
| 17 | 2288 | 99.81 |
| 19 | 2043 | 99.76 |
| 20 | 6200 | 99.75 |
Features selected by the original work that appear within the top 25 ranks when using the relative percentage variation of gene expression profiles between BRCA1 and the others.

| Rank | Gene number | RPV (%) |
|---|---|---|
| 4 | 955 | 91.99 |
| 8 | 1288 | 90.30 |
| 15 | 585 | 88.22 |
| 16 | 2248 | 88.12 |
| 23 | 10 | 86.66 |
| 24 | 1620 | 86.41 |
| 25 | 2734 | 85.48 |
Figure 2. Haar wavelet power spectrum of gene 1 of the SRBCT data in different diagnostic categories. The power spectrum of gene 1 clearly differs across diagnostic categories. Raw gene expression data of gene 1 are used to calculate the wavelet power spectrum.
Figure 3. Haar wavelet power spectrum of gene 2 in the EWS category data and in the data containing all other categories of the SRBCT data. It shows that gene 2 is not dominant in the EWS diagnostic category relative to the group of all other diagnostic categories. Raw gene expression data of gene 2 are used to calculate the wavelet power spectrum.
Figure 4. Haar wavelet power spectrum of gene 1319 in the EWS category data and in the data containing all the remaining categories of the SRBCT data. Unlike gene 2, gene 1319 is dominant in EWS relative to all other diagnostic categories. Raw gene expression data of gene 1319 are used to calculate the wavelet power spectrum.
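The per-gene spectra compared in Figures 2 to 4 can be illustrated with a short sketch. The exact power spectrum formula is not given in this excerpt, so the code assumes one common definition: the squared Haar coefficients are summed within each decomposition level, with no further normalization. The function name `haar_power_spectrum`, the sample values, and the power-of-two length requirement are likewise assumptions made for the example.

```python
import numpy as np

def haar_power_spectrum(x):
    """Energy per level of a full orthonormal Haar decomposition.

    Returns [approximation power, coarsest detail power, ..., finest detail power].
    Assumed definition: sum of squared coefficients within each level.
    """
    approx = np.asarray(x, dtype=float)
    n = approx.size
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two (pad or truncate first)")
    detail_power = []
    while approx.size > 1:
        pairs = approx.reshape(-1, 2)
        detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2.0)
        approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2.0)
        detail_power.append(float(np.sum(detail ** 2)))  # finest level first
    # Reverse so the output runs from coarsest to finest detail level.
    return np.array([float(approx[0] ** 2)] + detail_power[::-1])

# Hypothetical raw expression values of one gene in two sample groups.
in_category = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
all_others = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

spectrum_in = haar_power_spectrum(in_category)
spectrum_out = haar_power_spectrum(all_others)
print(spectrum_in)
print(spectrum_out)
```

A flat profile puts all of its power in the approximation term, while a varying profile spreads power across the detail levels. Real sample groups rarely have a power-of-two size, so some padding or truncation policy would be needed before applying a sketch like this.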