
Two-pass imputation algorithm for missing value estimation in gene expression time series.

Elena Tsiporkova, Veselka Boeva.

Abstract

Gene expression microarray experiments frequently generate datasets with multiple missing values. However, most analysis, mining, and classification methods for gene expression data require a complete matrix of gene array values. The accurate estimation of missing values in such datasets has therefore been recognized as an important issue, and several imputation algorithms have already been proposed to the biological community. Most of these approaches, however, are not particularly suitable for time series expression profiles. In view of this, we propose a novel imputation algorithm that is specially suited to estimating missing values in gene expression time series data. The algorithm uses the Dynamic Time Warping (DTW) distance to measure the similarity between time expression profiles and then selects, for each gene expression profile with missing values, a dedicated set of candidate profiles for estimation. Three DTW-based imputation (DTWimpute) algorithms have been considered: position-wise, neighborhood-wise, and two-pass imputation. These were initially prototyped in Perl, and their accuracy was evaluated on yeast expression time series data under several different parameter settings. The experiments showed that the two-pass algorithm consistently outperforms the neighborhood-wise and position-wise algorithms, in particular for datasets with a higher level of missing entries. The two-pass DTWimpute algorithm was further benchmarked against the weighted K-Nearest Neighbors algorithm, which is widely used in the biological community, and proved superior to it. Motivated by these findings, which clearly indicate the added value of DTW techniques for missing value estimation in time series data, we have built an optimized C++ implementation of the two-pass DTWimpute algorithm. The software also offers a choice among three different initial rough imputation methods.
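The abstract describes the method only at a high level. As a rough illustration, the Python sketch below computes a classic DTW distance and plugs it into a two-pass scheme: a rough first-pass fill, then a re-estimate of each missing entry from the k DTW-nearest profiles that are observed at that time point, weighted by inverse distance. This is not the authors' DTWimpute code (which, per the abstract, was written in Perl and C++). The function names, the row-mean rough imputation, the absolute-difference cost, and the inverse-distance weighting are all hypothetical choices made for illustration.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping distance with an absolute-difference cost."""
    n, m = len(a), len(b)
    # dp[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j]
    dp = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]


def row_mean_fill(profile):
    """Hypothetical first (rough) pass: replace None entries by the row mean."""
    observed = [v for v in profile if v is not None]
    mean = sum(observed) / len(observed) if observed else 0.0
    return [mean if v is None else v for v in profile]


def two_pass_impute(data, k=2):
    """Hypothetical two-pass sketch: rough fill, then DTW-weighted neighbour estimate."""
    rough = [row_mean_fill(p) for p in data]
    result = []
    for g, profile in enumerate(data):
        filled = list(profile)
        for t, v in enumerate(profile):
            if v is not None:
                continue
            # Candidates: other genes observed at time point t, ranked by the
            # DTW distance between their roughly filled profiles and gene g's.
            cands = sorted(
                (dtw_distance(rough[g], rough[h]), data[h][t])
                for h in range(len(data))
                if h != g and data[h][t] is not None
            )[:k]
            if not cands:
                filled[t] = rough[g][t]  # fall back to the rough estimate
                continue
            weights = [1.0 / (d + 1e-9) for d, _ in cands]
            filled[t] = sum(w * x for w, (_, x) in zip(weights, cands)) / sum(weights)
        result.append(filled)
    return result
```

For example, with `data = [[1.0, 2.0, None, 4.0], [1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 3.1, 4.1], [4.0, 3.0, 2.0, 1.0]]`, the missing value in the first profile is estimated from the two rising profiles (a weighted value between 3.0 and 3.1), while the falling profile is excluded because its DTW distance is much larger. Warping lets profiles with shifted or stretched timing still count as close, which is the motivation the abstract gives for using DTW rather than a point-wise distance.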

Year:  2007        PMID: 17933008     DOI: 10.1142/s0219720007003053

Source DB:  PubMed          Journal:  J Bioinform Comput Biol        ISSN: 0219-7200            Impact factor:   1.122


  6 in total

1.  Subject-specific Estimation of Missing Cortical Thickness Maps in Developing Infant Brains.

Authors:  Yu Meng; Gang Li; Yaozong Gao; John H Gilmore; Weili Lin; Dinggang Shen
Journal:  Med Comput Vis (2015)       Date:  2016-07-30

2.  Learning-based subject-specific estimation of dynamic maps of cortical morphology at missing time points in longitudinal infant studies.

Authors:  Yu Meng; Gang Li; Yaozong Gao; Weili Lin; Dinggang Shen
Journal:  Hum Brain Mapp       Date:  2016-11       Impact factor: 5.038

3.  Clusters of temporal discordances reveal distinct embryonic patterning mechanisms in Drosophila and anopheles.

Authors:  Dmitri Papatsenko; Michael Levine; Yury Goltsev
Journal:  PLoS Biol       Date:  2011-01-25       Impact factor: 8.029

4.  Time warping of evolutionary distant temporal gene expression data based on noise suppression.

Authors:  Yury Goltsev; Dmitri Papatsenko
Journal:  BMC Bioinformatics       Date:  2009-10-26       Impact factor: 3.169

5.  Comparative analysis of missing value imputation methods to improve clustering and interpretation of microarray experiments.

Authors:  Magalie Celton; Alain Malpertuy; Gaëlle Lelandais; Alexandre G de Brevern
Journal:  BMC Genomics       Date:  2010-01-07       Impact factor: 3.969

6.  A formal concept analysis approach to consensus clustering of multi-experiment expression data.

Authors:  Anna Hristoskova; Veselka Boeva; Elena Tsiporkova
Journal:  BMC Bioinformatics       Date:  2014-05-19       Impact factor: 3.169
