Jack Y Yang, Guo-Zheng Li, Hao-Hua Meng, Mary Qu Yang, Youping Deng.
Abstract
BACKGROUND: The high dimensionality of gene expression microarray data sets degrades the generalization performance of classifiers, so feature selection, which keeps relevant features and discards irrelevant and redundant ones, is widely used in bioinformatics. Multi-task learning is a novel technique that can improve the prediction accuracy of tumor classification by using information contained in the discarded redundant features, but which features should be discarded, used as input, or used as output remains an open issue.
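The three-way choice the abstract describes (discard a feature, feed it to the network as input, or predict it as an extra output) can be pictured as assigning each gene a role. The sketch below is only an illustration under stated assumptions: it uses NumPy, the role codes and the `split_by_role` helper are hypothetical names introduced here, and the toy matrix is made up. It is not the authors' implementation.

```python
import numpy as np

# Hypothetical role codes (an assumption for illustration, not from the paper).
DISCARDED, INPUT, OUTPUT = 0, 1, 2

def split_by_role(X, roles):
    """Split an expression matrix (samples x genes) by per-gene role.

    Returns (X_in, Y_aux): the columns used as network inputs, and the
    columns used as extra prediction targets alongside the class label.
    Columns marked DISCARDED appear in neither.
    """
    roles = np.asarray(roles)
    if roles.shape != (X.shape[1],):
        raise ValueError("need exactly one role per feature column")
    return X[:, roles == INPUT], X[:, roles == OUTPUT]

# Toy data: 4 samples, 6 genes (values are arbitrary).
X = np.arange(24, dtype=float).reshape(4, 6)
roles = [INPUT, OUTPUT, DISCARDED, INPUT, INPUT, OUTPUT]

X_in, Y_aux = split_by_role(X, roles)
print(X_in.shape, Y_aux.shape)  # (4, 3) (4, 2)
```

In a multi-task setup of this kind, a network would be trained on `X_in` to predict both the tumor class and `Y_aux`; a search procedure such as a genetic algorithm could then vary `roles`.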
Year: 2008 PMID: 18366616 PMCID: PMC2386068 DOI: 10.1186/1471-2164-9-S1-S3
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Performance of multi-task learning algorithms for ANNs with […]. Both graphs show balanced accuracy (BACC) scores. Top: Results grouped by data set. Bottom: Results grouped by multi-task learning algorithm.
Figure 2. Performance of multi-task learning algorithms for ANNs with […]. Both graphs show balanced accuracy (BACC) scores. Top: Results grouped by data set. Bottom: Results grouped by multi-task learning algorithm.
Mean and standard deviation (in parentheses) of BACC scores (%), calculated over 50 hold out runs.
| DATASET | ALL | GA-FS | H-MTL | GA-MTL | GA-MTL-IR | e-GA-MTL |
| Breast | 53.2(9.3) | 56.1(8.6) | 59.8(8.5) | 72.0(8.3) | 69.0(8.4) | 72.4(8.4) |
| Colon | 50.0(8.8) | 46.7(8.5) | 58.1(7.9) | 78.6(8.4) | 85.7(8.1) | 78.6(7.8) |
| Leukemia | 59.1(7.8) | 59.7(8.2) | 65.2(7.9) | 79.6(7.9) | 76.4(7.7) | 84.5(7.6) |
| Ovarian | 57.8(6.8) | 69.1(6.8) | 75.6(3.8) | 78.9(7.2) | 79.9(7.1) | 82.4(7.1) |
| Average | 54.8(8.2) | 58.1(8.0) | 64.8(7.8) | 77.1(7.9) | 77.4(7.8) | 79.5(7.7) |
| Breast | 54.2(9.3) | 59.3(8.9) | 69.2(9.2) | 74.9(8.8) | 72.3(9.3) | 76.2(8.7) |
| Colon | 57.0(8.9) | 64.4(8.6) | 67.4(9.0) | 82.2(8.6) | 82.7(8.6) | 82.5(8.8) |
| Leukemia | 68.4(8.1) | 76.2(7.7) | 76.6(8.0) | 76.8(7.5) | 76.4(7.1) | 91.4(7.2) |
| Ovarian | 63.0(6.3) | 68.5(6.6) | 78.8(5.9) | 80.3(6.0) | 83.5(6.2) | 83.6(6.3) |
| Average | 60.4(8.2) | 67.6(8.0) | 73.2(8.0) | 79.4(8.0) | 79.5(7.8) | 83.3(7.7) |
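For readers comparing the score tables, the following sketch shows how the four reported metrics relate to confusion-matrix counts. It assumes the standard definition of BACC as the mean of sensitivity and specificity, which appears consistent with the values reported here (for example, the first Breast/ALL rows give sensitivity 53.3 and specificity 53.0, which average to about the reported BACC of 53.2). The function name `confusion_metrics` and the counts are hypothetical.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, precision, BACC) in percent.

    Assumes every denominator is nonzero, i.e. the test set contains both
    classes and at least one sample is predicted positive.
    """
    sensitivity = 100.0 * tp / (tp + fn)   # true positive rate
    specificity = 100.0 * tn / (tn + fp)   # true negative rate
    precision = 100.0 * tp / (tp + fp)     # positive predictive value
    bacc = (sensitivity + specificity) / 2.0
    return sensitivity, specificity, precision, bacc

# Hypothetical confusion counts, not taken from the paper.
sens, spec, prec, bacc = confusion_metrics(tp=8, fn=2, tn=6, fp=4)
print(f"sensitivity={sens:.1f} specificity={spec:.1f} "
      f"precision={prec:.1f} BACC={bacc:.1f}")
# sensitivity=80.0 specificity=60.0 precision=66.7 BACC=70.0
```

Unlike plain accuracy, BACC weights both classes equally, which matters for data sets with unequal class ratios.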
Mean and standard deviation (in parentheses) of correction scores (%), calculated over 50 hold out runs.
| DATASET | ALL | GA-FS | H-MTL | GA-MTL | GA-MTL-IR | e-GA-MTL |
| Breast | 53.1(9.5) | 56.3(8.4) | 59.3(8.8) | 71.9(8.9) | 68.8(8.1) | 71.9(8.6) |
| Colon | 57.1(7.8) | 61.9(7.3) | 66.6(7.1) | 80.9(6.3) | 85.7(6.4) | 80.9(6.4) |
| Leukemia | 57.7(9.3) | 53.8(9.0) | 61.5(9.5) | 76.9(8.4) | 76.9(9.1) | 80.7(8.9) |
| Ovarian | 57.1(6.4) | 67.9(7.4) | 75.0(5.4) | 78.6(5.9) | 79.8(5.1) | 82.1(5.8) |
| Average | 56.3(8.3) | 60.0(8.0) | 65.6(7.7) | 77.1(7.4) | 77.8(7.2) | 78.9(7.4) |
| Breast | 64.3(8.5) | 67.3(8.7) | 65.7(8.3) | 75.4(8.3) | 71.8(7.5) | 75.8(8.0) |
| Colon | 62.0(7.5) | 71.2(7.2) | 75.2(6.5) | 85.1(6.7) | 82.9(6.1) | 83.9(6.3) |
| Leukemia | 65.3(8.8) | 69.2(8.4) | 76.3(8.8) | 84.6(8.9) | 83.1(8.1) | 85.6(8.3) |
| Ovarian | 61.9(7.5) | 65.4(7.7) | 78.6(6.4) | 80.9(6.8) | 82.1(6.4) | 83.3(6.8) |
| Average | 63.4(8.1) | 68.3(8.0) | 74.0(7.5) | 81.5(7.7) | 80.0(7.0) | 82.2(7.4) |
Mean and standard deviation (in parentheses) of sensitivity scores (%), calculated over 50 hold out runs.
| DATASET | ALL | GA-FS | H-MTL | GA-MTL | GA-MTL-IR | e-GA-MTL |
| Breast | 53.3(8.7) | 53.3(8.5) | 66.7(8.0) | 73.3(7.8) | 73.3(7.5) | 80.0(7.6) |
| Colon | 28.6(7.4) | 28.6(7.5) | 42.8(7.4) | 71.4(7.0) | 85.7(7.1) | 71.4(7.1) |
| Leukemia | 62.5(8.1) | 75.0(8.0) | 75.0(7.7) | 87.5(7.9) | 75.0(7.8) | 85.7(7.7) |
| Ovarian | 60.0(7.5) | 73.3(7.4) | 76.7(7.4) | 80.0(6.7) | 80.0(6.9) | 83.3(7.2) |
| Average | 51.1(7.9) | 57.6(7.9) | 65.3(7.6) | 78.1(7.4) | 78.5(7.3) | 80.1(7.4) |
| Breast | 60.2(9.8) | 63.2(9.4) | 73.3(10.4) | 80.5(9.5) | 80.6(9.0) | 85.6(9.0) |
| Colon | 42.7(7.8) | 43.4(7.5) | 51.2(7.4) | 72.1(7.5) | 72.3(7.1) | 72.3(6.8) |
| Leukemia | 71.3(7.6) | 75.0(7.5) | 77.5(7.9) | 88.7(8.1) | 74.3(7.9) | 91.2(8.3) |
| Ovarian | 64.7(7.5) | 71.4(7.3) | 80.5(7.4) | 81.2(7.0) | 80.0(6.9) | 86.7(7.1) |
| Average | 59.7(8.2) | 63.3(7.9) | 70.6(8.3) | 80.6(8.0) | 76.8(7.7) | 84.0(7.8) |
Mean and standard deviation (in parentheses) of specificity scores (%), calculated over 50 hold out runs.
| DATASET | ALL | GA-FS | H-MTL | GA-MTL | GA-MTL-IR | e-GA-MTL |
| Breast | 53.0(9.9) | 58.8(8.6) | 52.9(9.0) | 70.7(8.7) | 64.7(9.2) | 64.8(9.2) |
| Colon | 71.3(10.2) | 64.7(9.5) | 73.4(8.4) | 85.7(9.8) | 85.7(9.1) | 85.7(8.4) |
| Leukemia | 55.6(7.5) | 44.4(8.3) | 55.4(8.1) | 71.7(7.8) | 77.8(7.5) | 83.3(7.4) |
| Ovarian | 55.6(6.1) | 64.9(6.2) | 74.4(6.1) | 77.7(7.6) | 79.7(7.3) | 81.5(7.0) |
| Average | 58.5(8.4) | 58.6(8.1) | 64.3(8.0) | 76.1(8.4) | 76.3(8.2) | 78.8(8.0) |
| Breast | 48.1(8.8) | 55.4(8.4) | 65.1(8.0) | 69.2(8.1) | 63.9(9.6) | 66.8(8.4) |
| Colon | 71.3(10.0) | 85.3(9.7) | 83.5(10.5) | 92.2(9.7) | 93.1(10.0) | 92.6(10.8) |
| Leukemia | 65.4(8.5) | 77.4(7.8) | 75.6(8.1) | 64.8(6.8) | 78.4(6.2) | 91.5(6.0) |
| Ovarian | 61.3(5.1) | 65.6(5.9) | 77.0(4.3) | 79.4(5.0) | 87.0(5.4) | 80.4(5.5) |
| Average | 61.0(8.1) | 71.8(8.1) | 75.8(7.6) | 78.1(7.9) | 82.2(7.8) | 82.6(7.6) |
Mean and standard deviation (in parentheses) of precision scores (%), calculated over 50 hold out runs.
| DATASET | ALL | GA-FS | H-MTL | GA-MTL | GA-MTL-IR | e-GA-MTL |
| Breast | 50.0(8.8) | 53.3(8.4) | 55.5(8.4) | 68.8(8.1) | 64.7(7.9) | 66.7(8.2) |
| Colon | 33.3(8.7) | 25.0(8.1) | 42.9(7.4) | 71.4(7.8) | 75.0(7.7) | 71.4(7.4) |
| Leukemia | 38.5(7.7) | 37.5(7.9) | 42.8(7.5) | 58.3(7.6) | 60.0(7.4) | 66.7(7.1) |
| Ovarian | 42.9(6.4) | 53.7(6.6) | 62.6(6.1) | 66.6(6.8) | 68.6(7.1) | 71.4(6.5) |
| Average | 41.2(7.9) | 42.4(7.8) | 51.0(7.4) | 66.3(7.6) | 67.1(7.5) | 69.1(7.3) |
| Breast | 42.6(9.2) | 60.0(9.0) | 65.4(9.7) | 82.4(8.4) | 82.1(9.2) | 82.6(9.1) |
| Colon | 51.2(7.5) | 56.2(7.4) | 64.7(7.3) | 70.6(8.1) | 66.7(8.2) | 68.4(7.3) |
| Leukemia | 46.1(7.8) | 63.1(7.4) | 63.6(7.6) | 57.0(7.4) | 60.0(6.4) | 67.1(6.8) |
| Ovarian | 47.6(5.4) | 53.6(5.7) | 66.4(4.3) | 68.4(5.1) | 78.2(5.0) | 72.1(5.5) |
| Average | 46.9(7.5) | 58.2(7.4) | 65.0(7.2) | 69.6(7.3) | 71.8(7.2) | 72.6(7.2) |
Mean and standard deviation (in parentheses) of the number of features, calculated over 50 hold out runs, where the base learners are ANNs with M = 2 units in the hidden layer.
| Method | Feature set | Breast | Colon | Leukemia | Ovarian | Average |
| GA-FS | input | 15564.4(2.1) | 897.3(4.3) | 3245.3(2.5) | 10037.4(3.4) | 7436.1(3.1) |
| GA-FS | discarded | 8916.6(1.5) | 1103.6(3.2) | 3883.6(2.7) | 5116.6(3.8) | 4755.1(2.8) |
| H-MTL | input | 12547.4(6.5) | 1014.0(5.4) | 4007.8(2.5) | 8924.7(2.5) | 6623.5(4.2) |
| H-MTL | output | 4182.5(6.9) | 338.0(3.2) | 1335.9(3.4) | 2974.9(3.4) | 2207.8(4.2) |
| H-MTL | discarded | 7751.1(5.3) | 648.2(4.3) | 1785.3(2.6) | 3254.6(2.5) | 3359.8(3.7) |
| GA-MTL | input | 15624.5(2.7) | 993.3(3.3) | 3324.7(2.0) | 10154.2(4.4) | 7524.2(3.1) |
| GA-MTL | output | 8856.5(2.9) | 1007.6(3.5) | 3804.3(3.7) | 4999.8(2.8) | 4667.1(3.2) |
| GA-MTL-IR | input | 12656.3(3.6) | 877.4(4.5) | 4231.6(2.9) | 7895.4(3.5) | 6415.2(3.6) |
| GA-MTL-IR | output | 4073.6(4.3) | 474.4(4.6) | 1112.1(2.7) | 4004.0(3.1) | 2416.0(3.7) |
| GA-MTL-IR | discarded | 7751.1(5.3) | 648.2(4.3) | 1785.3(2.6) | 3254.6(2.5) | 3359.8(3.7) |
| e-GA-MTL | input | 12743.3(4.1) | 884.7(5.2) | 4296.4(2.9) | 10235.2(3.6) | 7954.4(4.0) |
| e-GA-MTL | output | 4097.4(4.3) | 486.4(4.4) | 1175.6(2.1) | 2354.4(4.5) | 2028.5(3.8) |
| e-GA-MTL | discarded | 7765.2(5.4) | 660.0(5.1) | 1796.2(2.1) | 2449.2(3.6) | 3167.7(4.0) |
Mean and standard deviation (in parentheses) of the number of features, calculated over 50 hold out runs, where the base learners are ANNs with M = 10 units in the hidden layer.
| Method | Feature set | Breast | Colon | Leukemia | Ovarian | Average |
| GA-FS | input | 15042.4(3.5) | 917.3(4.5) | 3456.3(2.5) | 9837.4(4.3) | 7313.4(3.7) |
| GA-FS | discarded | 9438.6(1.5) | 1082.6(2.8) | 3672.6(2.9) | 5316.6(3.7) | 4877.6(2.7) |
| H-MTL | input | 12620.2(5.6) | 1042.3(4.5) | 4082.1(3.6) | 8847.3(2.1) | 6648.0(4.0) |
| H-MTL | output | 4206.7(5.9) | 347.5(2.3) | 1360.7(3.1) | 2949.1(4.3) | 2216.0(3.9) |
| H-MTL | discarded | 7654.1(3.5) | 610.2(2.7) | 1686.3(5.2) | 3357.6(2.1) | 3327.1(3.4) |
| GA-MTL | input | 15153.3(2.5) | 1041.1(4.1) | 3435.4(4.3) | 10034.5(3.3) | 7416.1(3.6) |
| GA-MTL | output | 9327.7(2.7) | 959.0(2.5) | 3693.6(3.8) | 5119.5(3.4) | 4775.0(3.1) |
| GA-MTL-IR | input | 12541.5(3.6) | 842.2(4.5) | 4325.6(2.8) | 7984.2(2.1) | 6423.4(3.3) |
| GA-MTL-IR | output | 4285.4(4.3) | 547.6(4.6) | 1117.1(2.0) | 3812.2(3.8) | 2440.6(3.7) |
| GA-MTL-IR | discarded | 7654.1(3.5) | 610.2(2.7) | 1686.3(5.2) | 3357.6(2.1) | 3327.1(3.4) |
| e-GA-MTL | input | 12700.3(4.1) | 854.7(4.1) | 4147.1(2.9) | 10453.2(3.5) | 7038.8(3.7) |
| e-GA-MTL | output | 4154.4(4.5) | 486.4(4.4) | 1272.7(2.7) | 2454.4(3.5) | 2092.0(3.8) |
| e-GA-MTL | discarded | 7645.2(5.3) | 660.0(5.7) | 1846.2(2.4) | 2489.2(3.7) | 3160.2(4.3) |
Figure 3. Heuristic multi-task learning (H-MTL)
Figure 4. Genetic algorithm based feature selection (GA-FS)
Figure 5. Genetic algorithm based multi-task learning (GA-MTL)
Figure 6. GA-MTL with irrelevant features removed (GA-MTL-IR)
Figure 7. Enhanced version of GA-MTL (e-GA-MTL)
Microarray data sets used for comparison
| Data Sets | Samples | Class Ratio | Features |
| Breast Cancer | 97 | 46/51 | 24,481 |
| Colon | 62 | 22/40 | 2,000 |
| Leukemia | 72 | 25/47 | 7,129 |
| Ovarian | 253 | 91/162 | 15,154 |
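The result tables report means and standard deviations over 50 hold-out runs. As a rough sketch of what such a protocol can look like, the code below repeats a stratified hold-out split on labels shaped like the Colon data set (62 samples, class ratio 22/40) and summarizes a score in the same `mean(std)` style. Several details are assumptions rather than facts from the source: the test fraction of 1/3, the seed, the use of the population standard deviation (NumPy's default), the hypothetical `stratified_holdout` helper, and the placeholder majority-class predictor that stands in for a trained model.

```python
import numpy as np

def stratified_holdout(labels, test_fraction, rng):
    """Return (train_idx, test_idx), sampling each class separately so the
    test split keeps roughly the same class proportions as the full set."""
    labels = np.asarray(labels)
    test_parts = []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_test = int(round(test_fraction * idx.size))
        test_parts.append(idx[:n_test])
    test_idx = np.sort(np.concatenate(test_parts))
    train_idx = np.setdiff1d(np.arange(labels.size), test_idx)
    return train_idx, test_idx

# Labels shaped like the Colon data set: 22 samples of one class, 40 of the other.
labels = np.array([1] * 22 + [0] * 40)
rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

scores = []
for _ in range(50):  # 50 hold-out runs, as in the table captions
    train_idx, test_idx = stratified_holdout(labels, test_fraction=1 / 3, rng=rng)
    # Placeholder "classifier": always predicts the majority class of the
    # training split. A real study would train and evaluate a model here.
    majority = np.bincount(labels[train_idx]).argmax()
    scores.append(100.0 * np.mean(labels[test_idx] == majority))

# Summarize in the tables' "mean(std)" format.
print(f"{np.mean(scores):.1f}({np.std(scores):.1f})")
```

Because the placeholder ignores the features, its score only reflects the class ratio; it serves as a baseline against which the reported accuracies can be read.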