Marcilio C P de Souto, Pablo A Jaskowiak, Ivan G Costa.
Abstract
BACKGROUND: Several missing value imputation methods for gene expression data have been proposed in the literature. In the past few years, researchers have been putting a great deal of effort into presenting systematic evaluations of the different imputation algorithms. Initially, most algorithms were assessed with an emphasis on the accuracy of the imputation, using metrics such as the root mean squared error. However, it has become clear that the success of the estimation of the expression value should be evaluated in more practical terms as well. One can consider, for example, the ability of the method to preserve the significant genes in the dataset, or its discriminative/predictive power for classification/clustering purposes. RESULTS AND ...
Year: 2015 PMID: 25888091 PMCID: PMC4350881 DOI: 10.1186/s12859-015-0494-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Cancer datasets with missing values
| Dataset | Tissue | Classes | Samples per class | Samples | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| alizadeh-2000-v1 | Blood | 2 | 21, 21 | 42 | 4022 | 3.25 | 49.30 | 3678 | 2.15 | 44.56 |
| alizadeh-2000-v2 | Blood | 3 | 42, 9, 11 | 62 | 4022 | 4.59 | 66.93 | 3369 | 2.75 | 60.52 |
| alizadeh-2000-v3 | Blood | 4 | 21, 21, 9, 11 | 62 | 4022 | 4.59 | 66.93 | 3369 | 2.75 | 60.52 |
| bredel-2005 | Brain | 3 | 31, 14, 5 | 50 | 41472 | 7.57 | 43.06 | 19200 | 3.25 | 30.56 |
| chen-2002 | Liver | 2 | 104, 75 | 179 | 24192 | 6.04 | 88.46 | 22336 | 2.18 | 85.46 |
| garber-2001 | Lung | 4 | 17, 40, 4, 5 | 66 | 24192 | 3.87 | 67.81 | 36663 | 2.23 | 65.14 |
| lapointe-2004-v1 | Prostate | 3 | 11, 39, 19 | 69 | 42640 | 4.56 | 73.57 | 35265 | 2.10 | 69.26 |
| lapointe-2004-v2 | Prostate | 4 | 11, 39, 19, 41 | 110 | 42640 | 4.93 | 67.16 | 36663 | 2.23 | 60.29 |
| liang-2005 | Brain | 3 | 28, 6, 3 | 37 | 42640 | 4.56 | 73.57 | 22923 | 0.82 | 23.16 |
| risinger-2003 | Endometrium | 4 | 13, 3, 19, 7 | 42 | 24192 | 7.97 | 74.33 | 8366 | 0.76 | 20.76 |
| tomlins-2006 | Prostate | 5 | 27, 20, 32, 13, 12 | 104 | 8872 | 4.46 | 89.34 | 9936 | 3.27 | 80.94 |
| tomlins-2006-v2 | Prostate | 4 | 27, 20, 32, 13 | 92 | 20001 | 4.04 | 84.23 | 10048 | 3.34 | 79.72 |
Statistics after non-supervised filtering
| Dataset | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| alizadeh-2000-v1 | 960 | 945 | 962 | 932 | 932 | 1.96 | 1.91 | 1.97 | 1.83 | 1.83 |
| alizadeh-2000-v2 | 1075 | 1050 | 1081 | 1030 | 1030 | 2.71 | 2.63 | 2.72 | 2.59 | 2.59 |
| alizadeh-2000-v3 | 1075 | 1050 | 1081 | 1030 | 1030 | 2.71 | 2.63 | 2.72 | 2.59 | 2.59 |
| bredel-2005 | 3819 | 3833 | 3825 | 3850 | 3852 | 0.81 | 0.82 | 0.81 | 0.84 | 0.84 |
| chen-2002 | 2240 | 2246 | 2238 | 2329 | 2340 | 2.25 | 2.24 | 2.23 | 2.31 | 2.32 |
| garber-2001 | 2563 | 2540 | 2578 | 2584 | 2603 | 1.94 | 1.92 | 1.95 | 1.95 | 1.95 |
| lapointe-2004-v1 | 4161 | 4159 | 4170 | 4196 | 4292 | 1.94 | 1.92 | 1.95 | 1.95 | 1.95 |
| lapointe-2004-v2 | 3846 | 3811 | 3833 | 3838 | 3930 | 2.50 | 2.50 | 2.50 | 2.53 | 2.58 |
| liang-2005 | 2531 | 2528 | 2529 | 2519 | 2521 | 2.32 | 2.29 | 2.31 | 2.33 | 2.37 |
| risinger-2003 | 942 | 2074 | 2078 | 2073 | 2073 | 0.84 | 0.83 | 0.84 | 0.81 | 0.81 |
| tomlins-2006 | 2027 | 2020 | 2039 | 2018 | 2018 | 2.41 | 2.40 | 2.43 | 2.40 | 2.40 |
| tomlins-2006-v2 | 2118 | 2118 | 2124 | 2103 | 2103 | 2.37 | 2.34 | 2.37 | 2.34 | 2.34 |
Classification error (%) for different imputation methods (columns) and classification methods (rows)
| Classifier | Dataset | | | | | |
|---|---|---|---|---|---|---|
| SVM | alizadeh-2000-v1 | 9.52 | 9.52 | 9.52 | 9.52 | 11.90 |
| | alizadeh-2000-v2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | alizadeh-2000-v3 | 6.45 | 6.45 | 6.45 | 6.45 | 6.45 |
| | bredel-2005 | 16.00 | 16.00 | 16.00 | 16.00 | 16.00 |
| | chen-2002 | 2.23 | 2.23 | 2.23 | 2.23 | 1.68 |
| | garber-2001 | 16.67 | 16.67 | 18.18 | 18.18 | 19.70 |
| | lapointe-2004-v1 | 1.82 | 1.82 | 1.82 | 1.82 | 1.82 |
| | lapointe-2004-v2 | 17.39 | 18.84 | 15.94 | 17.39 | 20.29 |
| | liang-2005 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | risinger-2003 | 21.43 | 19.05 | 19.05 | 19.05 | 19.05 |
| | tomlins-2006 | 6.73 | 7.69 | 6.73 | 5.77 | 5.77 |
| | tomlins-2006-v2 | 6.52 | 6.52 | 6.52 | 6.52 | 6.52 |
| KNN | alizadeh-2000-v1 | 33.33 | 33.33 | 33.33 | 30.95 | 30.95 |
| | alizadeh-2000-v2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | alizadeh-2000-v3 | 19.35 | 19.35 | 17.74 | 17.74 | 17.74 |
| | bredel-2005 | 20.00 | 20.00 | 20.00 | 20.00 | 20.00 |
| | chen-2002 | 11.73 | 11.17 | 12.29 | 12.29 | 12.29 |
| | garber-2001 | 16.67 | 16.67 | 16.67 | 16.67 | 18.18 |
| | lapointe-2004-v1 | 13.64 | 14.55 | 14.55 | 14.55 | 14.55 |
| | lapointe-2004-v2 | 33.33 | 36.23 | 33.33 | 33.33 | 34.78 |
| | liang-2005 | 2.70 | 2.70 | 2.70 | 2.70 | 2.70 |
| | risinger-2003 | 23.81 | 23.81 | 23.81 | 19.05 | 23.81 |
| | tomlins-2006 | 20.19 | 20.19 | 20.19 | 20.19 | 20.19 |
| | tomlins-2006-v2 | 21.74 | 21.74 | 21.74 | 21.74 | 21.74 |
| NB | alizadeh-2000-v1 | 7.14 | 7.14 | 7.14 | 7.14 | 7.14 |
| | alizadeh-2000-v2 | 1.61 | 1.61 | 1.61 | 1.61 | 1.61 |
| | alizadeh-2000-v3 | 8.06 | 8.06 | 6.45 | 6.45 | 6.45 |
| | bredel-2005 | 14.00 | 14.00 | 14.00 | 14.00 | 14.00 |
| | chen-2002 | 13.41 | 12.85 | 13.41 | 12.85 | 13.41 |
| | garber-2001 | 22.73 | 24.24 | 22.73 | 22.73 | 22.73 |
| | lapointe-2004-v1 | 23.64 | 23.64 | 23.64 | 21.82 | 22.73 |
| | lapointe-2004-v2 | 31.88 | 31.88 | 33.33 | 33.33 | 33.33 |
| | liang-2005 | 18.92 | 18.92 | 16.22 | 16.22 | 18.92 |
| | risinger-2003 | 23.81 | 23.81 | 23.81 | 26.19 | 23.81 |
| | tomlins-2006 | 15.38 | 15.38 | 14.42 | 14.42 | 14.42 |
| | tomlins-2006-v2 | 17.39 | 17.39 | 17.39 | 17.39 | 17.39 |
| DT | alizadeh-2000-v1 | 28.57 | 30.95 | 11.90 | 23.81 | 23.81 |
| | alizadeh-2000-v2 | 8.06 | 8.06 | 14.52 | 14.52 | 14.52 |
| | alizadeh-2000-v3 | 25.81 | 27.42 | 25.81 | 20.97 | 20.97 |
| | bredel-2005 | 40.00 | 38.00 | 44.00 | 44.00 | 44.00 |
| | chen-2002 | 6.15 | 7.82 | 7.26 | 6.70 | 6.70 |
| | garber-2001 | 28.18 | 21.21 | 21.21 | 22.73 | 19.70 |
| | lapointe-2004-v1 | 23.64 | 26.36 | 23.64 | 22.73 | 23.64 |
| | lapointe-2004-v2 | 28.99 | 30.43 | 36.23 | 36.23 | 33.33 |
| | liang-2005 | 8.11 | 8.11 | 8.11 | 8.11 | 8.11 |
| | risinger-2003 | 45.24 | 45.24 | 54.76 | 50.00 | 54.76 |
| | tomlins-2006 | 40.38 | 40.38 | 34.62 | 36.54 | 34.62 |
| | tomlins-2006-v2 | 40.22 | 40.22 | 38.04 | 35.87 | 34.78 |
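
The percentages in this table line up with whole-sample counts. Assuming each error is the share of misclassified samples over the full dataset (an inference from the numbers, not something this record states), 9.52 on alizadeh-2000-v1 (42 samples) would correspond to 4 misclassified samples and 11.90 to 5. A minimal Python sketch of that arithmetic, using a hypothetical helper named `error_rate`:

```python
def error_rate(misclassified: int, n_samples: int) -> float:
    """Percentage of misclassified samples, rounded to two decimals."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return round(100.0 * misclassified / n_samples, 2)


# Sample counts come from the dataset table; the misclassification counts
# are inferred from the reported percentages, not given by the source.
print(error_rate(4, 42))   # alizadeh-2000-v1, 4 of 42 samples
print(error_rate(5, 42))   # alizadeh-2000-v1, 5 of 42 samples
print(error_rate(7, 104))  # tomlins-2006, 7 of 104 samples
```

Read this way, a change from one imputation method to another usually reflects only one or two samples changing class, which helps when judging how large the differences between columns really are.
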
Corrected Rand index for different imputation methods (columns) and clustering methods (rows)
| Clustering method | Dataset | | | | | |
|---|---|---|---|---|---|---|
| | alizadeh-2000-v1 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| | alizadeh-2000-v2 | 0.89 | 0.89 | 0.84 | 0.89 | 0.84 |
| | alizadeh-2000-v3 | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 |
| | bredel-2005 | 0.41 | 0.41 | 0.41 | 0.41 | 0.41 |
| | chen-2002 | 0.3 | 0.3 | 0.3 | 0.3 | 0.3 |
| | garber-2001 | 0.55 | 0.51 | 0.54 | 0.49 | 0.54 |
| | lapointe-2004-v1 | 0.42 | 0.47 | 0.47 | 0.44 | 0.47 |
| | lapointe-2004-v2 | 0.17 | 0.17 | 0.15 | 0.17 | 0.15 |
| | liang-2005 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| | risinger-2003 | 0.45 | 0.45 | 0.45 | 0.47 | 0.45 |
| | tomlins-2006 | 0.39 | 0.39 | 0.4 | 0.39 | 0.4 |
| | tomlins-2006-v2 | 0.51 | 0.51 | 0.51 | 0.51 | 0.51 |
| HC-CL | alizadeh-2000-v1 | 0.04 | 0.04 | 0.13 | 0.13 | 0.13 |
| | alizadeh-2000-v2 | 0.54 | 0.54 | 0.52 | 0.40 | 0.52 |
| | alizadeh-2000-v3 | 0.38 | 0.38 | 0.39 | 0.47 | 0.39 |
| | bredel-2005 | -0.03 | 0.07 | 0.03 | 0.07 | 0.03 |
| | chen-2002 | -0.01 | -0.01 | -0.01 | -0.01 | -0.01 |
| | garber-2001 | 0.55 | 0.55 | 0.55 | 0.55 | 0.55 |
| | lapointe-2004-v1 | -0.01 | -0.01 | -0.01 | -0.01 | -0.01 |
| | lapointe-2004-v2 | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 |
| | liang-2005 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 |
| | risinger-2003 | 0.09 | 0.09 | 0.10 | 0.09 | 0.10 |
| | tomlins-2006 | 0.46 | 0.46 | 0.39 | 0.43 | 0.39 |
| | tomlins-2006-v2 | 0.39 | 0.39 | 0.39 | 0.39 | 0.39 |
| HC-AL | alizadeh-2000-v1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| | alizadeh-2000-v2 | 0.79 | 0.79 | 0.79 | 0.79 | 0.79 |
| | alizadeh-2000-v3 | 0.40 | 0.40 | 0.44 | 0.44 | 0.44 |
| | bredel-2005 | -0.07 | -0.08 | -0.07 | -0.05 | -0.07 |
| | chen-2002 | -0.01 | -0.01 | -0.01 | -0.01 | -0.01 |
| | garber-2001 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 |
| | lapointe-2004-v1 | -0.01 | -0.01 | -0.01 | -0.01 | -0.01 |
| | lapointe-2004-v2 | 0.04 | 0.04 | 0.04 | 0.04 | 0.04 |
| | liang-2005 | 0.12 | 0.12 | 0.12 | 0.12 | 0.12 |
| | risinger-2003 | 0.14 | 0.14 | 0.12 | 0.14 | 0.12 |
| | tomlins-2006 | 0.44 | 0.44 | 0.41 | 0.41 | 0.41 |
| | tomlins-2006-v2 | 0.56 | 0.56 | 0.56 | 0.56 | 0.56 |
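
The corrected Rand index compares a clustering with a reference partition and adjusts the plain Rand index for chance agreement. A value of 1 means identical partitions, and values near or below 0 (as for bredel-2005 under HC-AL) mean agreement no better than random. The following is a minimal pure-Python sketch of the Hubert and Arabie formula. The function name and the toy label vectors are illustrative assumptions, not data or code from the study.

```python
from collections import Counter
from math import comb


def corrected_rand_index(labels_true, labels_pred):
    """Chance-corrected (adjusted) Rand index between two partitions."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same samples")
    n = len(labels_true)
    if n < 2:
        raise ValueError("at least two samples are required")

    # Contingency table cells and its row/column totals.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of sample pairs that share a cell, a class, or a cluster.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    maximum = (sum_rows + sum_cols) / 2          # largest attainable agreement
    if maximum == expected:
        # Degenerate case, e.g. both partitions put everything in one group.
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


# Hypothetical example: six samples, two known classes, two clusters found.
known_classes = ["a", "a", "a", "b", "b", "b"]
found_clusters = [0, 0, 1, 1, 1, 1]
print(round(corrected_rand_index(known_classes, found_clusters), 2))
```

Cluster labels are arbitrary, so the index depends only on which samples are grouped together, not on the label values themselves.
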
Summary of the Friedman p-values for the classification and clustering methods
| Method | p-value |
|---|---|
| DT | 0.81 |
| KNN | 0.88 |
| NB | 0.82 |
| SVM | 0.99 |
| HCA-CL | 0.95 |
| HCA-AL | 0.89 |
| | 0.99 |
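
A Friedman test ranks the imputation methods within each dataset and asks whether their average ranks differ more than chance would allow. A large p-value, as in the table above, gives no evidence of a difference. The sketch below is a minimal pure-Python version with the usual correction for ties, applied to the SVM rows of the classification-error table. This record does not say how the authors ran the test or handled ties, so the printed p-value need not match the reported 0.99. The function names are hypothetical, and the closed-form p-value used here is only valid for an odd number of methods (an even number of degrees of freedom).

```python
import math


def average_ranks(values):
    """Return 1-based ranks (ties share their mean rank) and the tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        tie_sizes.append(end - start + 1)
        start = end + 1
    return ranks, tie_sizes


def friedman_test(rows):
    """Friedman chi-square statistic and p-value (rows: datasets, columns: methods)."""
    n, k = len(rows), len(rows[0])
    if k < 3 or (k - 1) % 2:
        raise ValueError("this sketch needs an odd number of methods (at least 3)")
    rank_sums = [0.0] * k
    tie_term = 0
    for row in rows:
        ranks, tie_sizes = average_ranks(row)
        rank_sums = [s + r for s, r in zip(rank_sums, ranks)]
        tie_term += sum(t ** 3 - t for t in tie_sizes)
    correction = 1 - tie_term / (n * k * (k * k - 1))
    if correction == 0:
        # Every dataset is fully tied: the methods are indistinguishable.
        return 0.0, 1.0
    raw = 12 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3 * n * (k + 1)
    stat = max(raw / correction, 0.0)
    # Chi-square survival function, closed form for even degrees of freedom.
    half = stat / 2
    p = math.exp(-half) * sum(half ** i / math.factorial(i) for i in range((k - 1) // 2))
    return stat, min(p, 1.0)


# SVM classification errors, one row per dataset, one column per imputation method.
svm_errors = [
    [9.52, 9.52, 9.52, 9.52, 11.90],
    [0.00, 0.00, 0.00, 0.00, 0.00],
    [6.45, 6.45, 6.45, 6.45, 6.45],
    [16.00, 16.00, 16.00, 16.00, 16.00],
    [2.23, 2.23, 2.23, 2.23, 1.68],
    [16.67, 16.67, 18.18, 18.18, 19.70],
    [1.82, 1.82, 1.82, 1.82, 1.82],
    [17.39, 18.84, 15.94, 17.39, 20.29],
    [0.00, 0.00, 0.00, 0.00, 0.00],
    [21.43, 19.05, 19.05, 19.05, 19.05],
    [6.73, 7.69, 6.73, 5.77, 5.77],
    [6.52, 6.52, 6.52, 6.52, 6.52],
]
statistic, p_value = friedman_test(svm_errors)
print(f"chi-square = {statistic:.3f}, p = {p_value:.3f}")
```

Because many datasets give identical errors for several methods, the tie correction matters here: without it the statistic would be understated.
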