Wei-Sheng Wu, Meng-Jhun Jhou.
Abstract
BACKGROUND: Missing value imputation is important for microarray data analyses because missing values significantly degrade the performance of downstream analyses. Although many microarray missing value imputation algorithms have been developed, an objective and comprehensive performance comparison framework is still lacking. To address this, we previously proposed a framework that can comprehensively compare the performance of existing algorithms and evaluate the performance of a new algorithm. However, constructing the framework is not an easy task for interested researchers. To save researchers' time and effort, we present MVIAeval (Missing Value Imputation Algorithm evaluator), an easy-to-use web tool that implements our performance comparison framework.
Keywords: Algorithm; Microarray data; Missing value imputation; Performance comparison; Performance index; Web tool
Year: 2017 PMID: 28086746 PMCID: PMC5237319 DOI: 10.1186/s12859-016-1429-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The 20 benchmark microarray datasets of different types and different species
| GEO | Size | Type | Organism | Title |
|---|---|---|---|---|
| GDS3323 | 45101x6 | Non-time series | Mus musculus | Na+/H+ exchanger 3 deficiency effect on the colon |
| GDS3215 | 12625x6 | Non-time series | | 13-cis retinoic acid effect on SEB-1 sebocyte cell line |
| GDS3485 | 45011x6 | Non-time series | | Zinc transporter SLC39A13 deficiency effect on chondrocytes |
| GDS3476 | 45011x6 | Non-time series | | NF-E2-related factor 2 Nrf2 activation effect on the liver |
| GDS3197 | 45101x6 | Non-time series | | Transcriptional coactivator PGC-1beta hypomorphic mutation effect on the liver |
| GDS3149 | 45101x6 | Non-time series | | Suppressor of cytokine signaling 3 deficiency effect on the regenerating liver |
| GDS2107 | 15923x6 | Non-time series | | Long-term ethanol consumption effect on pancreas |
| GDS3464 | 15617x6 | Non-time series | | SPT5 mutant embryos |
| GDS3426 | 23015x6 | Non-time series | | Staphylococcus epidermidis SarZ mutant |
| GDS3421 | 10208x6 | Non-time series | | Frag1 cells response to ionic and non-ionic hyperosmotic stress |
| GDS3360 | 22575x8 | Time series | | Chlamydia pneumoniae infection effect on HL epithelial cells: time course |
| GDS2863 | 31099x6 | Time series | Rattus | Tienilic acid effect on the liver: time course |
| GDS5057 | 34760x8 | Time series | | Mepenzolate bromide effect on lung: time course |
| GDS5055 | 45307x10 | Time series | | Histone demethylase KDM1A deficiency effect on 3T3-L1 preadipocytes: time course |
| GDS3428 | 22283x9 | Time series | | Immature dendritic cell response to butanol fraction of Echinacea purpurea: time course |
| GDS4484 | 45101x8 | Time series | | Cerebellar neuronal cell response to thyroid hormone: time course |
| GDS3785 | 17589x8 | Time series | Homo sapiens | Osteoarthritic chondrocytes and healthy mesenchymal stem cell during chondrogenic differentiation: time course |
| GDS3930 | 8799x9 | Time series | | Bone morphogenic protein effect on cultured sympathetic neurons: time course |
| GDS4321 | 10208x8 | Time series | | Escherichia coli O157:H7 response to cinnamaldehyde: time course |
| GDS3032 | 22277x8 | Time series | Homo sapiens | Quercetin effect on intestinal cell differentiation in vitro: time course |
The 12 existing algorithms implemented in MVIAeval
| Algorithm | Year of Publication | Category | Reference |
|---|---|---|---|
| SVD | 2001 | Global | |
| BPCA | 2003 | Global | |
| KNN | 2001 | Local | |
| SKNN | 2004 | Local | |
| IKNN | 2007 | Local | |
| LS | 2004 | Local | |
| LLS | 2005 | Local | |
| ILLS | 2006 | Local | |
| SLLS | 2008 | Local | |
| Shrinkage LLS | 2013 | Local | |
| Shrinkage SLLS | 2013 | Local | |
| Shrinkage ILLS | 2013 | Local | |
Fig. 1. Three performance indices implemented in MVIAeval. MVIAeval implements three performance indices: (a) 1/NRMSE, (b) CPP and (c) BLCI. An example shows how the scores of these three performance indices are calculated.
Fig. 2. The simulation procedure for evaluating the performance of an algorithm. The simulation procedure for evaluating an imputation algorithm (e.g. KNN) on a given complete benchmark microarray data matrix with a performance index (e.g. CPP) is divided into four steps.
Fig. 3. The flowchart of MVIAeval. The flowchart shows how MVIAeval conducts a comprehensive performance comparison for a new algorithm.
Fig. 4. The input and five settings of MVIAeval. Users need to (a) upload the R code of their new algorithm, (b) select the test datasets among the 20 benchmark microarray (time series or non-time series) datasets, (c) select the compared algorithms among the 12 existing algorithms, and (d) select the performance indices from the three existing ones, the comprehensive performance score from the two possible choices, and the number of simulation runs.
Fig. 5. The output of MVIAeval. For demonstration purposes, we upload the R code of a sample algorithm as the user's new algorithm and select two benchmark datasets (GDS3215 and GDS3785), the 12 existing algorithms, the three performance indices, the overall ranking score as the comprehensive performance score, and 25 simulation runs. (a) The webpage of the comprehensive performance comparison results shows that the user's algorithm (denoted as USER) ranks sixth overall among the 13 compared algorithms. (b) By clicking "details" in the row of BLCI for the benchmark dataset GDS3785, users can see the performance comparison results using only the BLCI score for that dataset. The user's algorithm ranks fifth among the 13 compared algorithms in this scenario. The details of the BLCI score for each algorithm can also be found.
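The simulation procedure summarized in the Fig. 2 caption is not spelled out step by step in this record. The following is a minimal Python sketch of one plausible reading: hide some entries of a complete matrix, impute them, score the result, and repeat over several runs. The function names, the 10% default missing rate, the toy row-mean imputer (a stand-in for KNN or any other algorithm), and the NRMSE definition (root mean squared error divided by the standard deviation of the true values, as commonly used in the imputation literature) are all assumptions for illustration and are not taken from MVIAeval itself.

```python
import math
import random


def mask_entries(matrix, missing_rate, rng):
    """Hide a random fraction of cells (set to None); return the copy and the hidden cells."""
    cells = [(i, j) for i, row in enumerate(matrix) for j in range(len(row))]
    n_missing = max(1, round(missing_rate * len(cells)))
    hidden = rng.sample(cells, n_missing)
    masked = [list(row) for row in matrix]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def row_mean_impute(masked):
    """Toy imputer (hypothetical stand-in for KNN etc.): fill each gap with its row mean."""
    filled = []
    for row in masked:
        observed = [v for v in row if v is not None]
        fallback = sum(observed) / len(observed) if observed else 0.0
        filled.append([fallback if v is None else v for v in row])
    return filled


def inverse_nrmse(truth, imputed, hidden):
    """1/NRMSE over the hidden cells, assuming NRMSE = sqrt(MSE / variance of true values)."""
    true_vals = [truth[i][j] for i, j in hidden]
    imp_vals = [imputed[i][j] for i, j in hidden]
    mse = sum((t - p) ** 2 for t, p in zip(true_vals, imp_vals)) / len(true_vals)
    mean_true = sum(true_vals) / len(true_vals)
    var_true = sum((t - mean_true) ** 2 for t in true_vals) / len(true_vals)
    if mse == 0:
        return math.inf  # perfect imputation
    return math.sqrt(var_true / mse)


def evaluate(matrix, imputer, missing_rate=0.1, runs=25, seed=0):
    """Average 1/NRMSE of `imputer` over repeated random maskings of a complete matrix."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        masked, hidden = mask_entries(matrix, missing_rate, rng)
        scores.append(inverse_nrmse(matrix, imputer(masked), hidden))
    return sum(scores) / len(scores)
```

Passing a different imputer function to `evaluate` compares algorithms under identical maskings as long as the same seed is used. Inverting NRMSE makes higher scores better.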
MVIAeval can provide the performance comparison results in many scenarios
| Performance | Benchmark datasets | Ranking of USER using ORS | Ranking of USER using ONS |
|---|---|---|---|
| 1/NRMSE | Five time series | 5 | 6 |
| 1/NRMSE | Five non-time series | 6 | 6 |
| CPP | Five time series | 7 | 9 |
| CPP | Five non-time series | 11 | 8 |
| BLCI | Five time series | 3 | 4 |
| BLCI | Five non-time series | 7 | 7 |
| 1/NRMSE + CPP + BLCI | Five time series | 6 | 7 |
| 1/NRMSE + CPP + BLCI | Five non-time series | 6 | 6 |
The table shows the rankings of the user's algorithm (denoted as USER) against the existing algorithms for different types of datasets (time series or non-time series), different performance indices (1/NRMSE, CPP or BLCI), and different overall performance scores (overall ranking score (ORS) or overall normalized score (ONS)). More details can be found at http://cosbi.ee.ncku.edu.tw/MVIAeval/A_Case_Study
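This record does not define ORS or ONS. As a rough illustration of how per-scenario results could be combined into one ranking-based score, the sketch below sums each algorithm's rank over several (performance index, dataset) scenarios. The rank-sum rule, the function names, the assumption that higher scores are better for every index, and the numbers in the usage example are all hypothetical and should not be read as MVIAeval's actual formula or results.

```python
def rank_descending(scores):
    """Rank algorithms by score (1 = best, higher score is better); ties share the best rank."""
    ordered = sorted(scores.values(), reverse=True)
    return {name: ordered.index(value) + 1 for name, value in scores.items()}


def overall_rank_sum(score_tables):
    """Sum each algorithm's rank over all scenarios; a lower total means better overall."""
    totals = {}
    for scores in score_tables:
        for name, rank in rank_descending(scores).items():
            totals[name] = totals.get(name, 0) + rank
    return totals


# Made-up scores for two scenarios (illustrative numbers only).
scenarios = [
    {"USER": 2.0, "KNN": 1.5, "BPCA": 3.0},
    {"USER": 0.9, "KNN": 0.8, "BPCA": 0.7},
]
totals = overall_rank_sum(scenarios)
best = min(totals, key=totals.get)
```

A rank-based aggregate ignores how large the score gaps are, whereas a normalized-score aggregate would preserve them, which may explain why the ORS and ONS columns in the table can disagree.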