Model-Based Clustering with Measurement or Estimation Errors.

Wanli Zhang, Yanming Di.

Abstract

Model-based clustering with finite mixture models has become a widely used clustering method. One of the recent implementations is MCLUST. When the objects to be clustered are summary statistics, such as regression coefficient estimates, they are naturally associated with estimation errors, whose covariance matrices can often be calculated exactly or approximated using asymptotic theory. This article proposes an extension to Gaussian finite mixture modeling, called MCLUST-ME, that properly accounts for the estimation errors. More specifically, we assume that the distribution of each observation consists of an underlying true component distribution and an independent measurement error distribution. Under this assumption, each unique value of the estimation error covariance corresponds to its own classification boundary, which consequently results in a different grouping from MCLUST. Through simulation and application to an RNA-Seq data set, we discovered that, under certain circumstances, explicitly modeling estimation errors improves clustering performance or provides new insights into the data compared with simply ignoring the errors; the degree of improvement depends on factors such as the distribution of the error covariance matrices.
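
The modeling assumption in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the measurement error is Gaussian with a known covariance per observation, so that an observation from component k is distributed as N(mu_k, Sigma_k + S_i), where S_i is that observation's error covariance. The function names and the two-component parameters below are hypothetical and chosen only for illustration.

```python
import numpy as np

def gaussian_logpdf(x, mean, cov):
    """Log density of a multivariate normal N(mean, cov) at x."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def membership_probs(x, err_cov, weights, means, covs):
    """Posterior component probabilities for one observation.

    Assumption: component k contributes N(means[k], covs[k] + err_cov),
    i.e. true component covariance plus independent Gaussian error.
    """
    log_terms = np.array([
        np.log(w) + gaussian_logpdf(x, m, c + err_cov)
        for w, m, c in zip(weights, means, covs)
    ])
    log_terms -= log_terms.max()  # stabilize before exponentiating
    probs = np.exp(log_terms)
    return probs / probs.sum()

# Hypothetical two-component mixture in two dimensions.
weights = np.array([0.5, 0.5])
means = [np.array([0.0, 0.0]), np.array([3.0, 0.0])]
covs = [np.eye(2) * 0.25, np.eye(2) * 2.0]

# The same observed value, paired with a small and a large error covariance.
x = np.array([1.3, 0.0])
p_small = membership_probs(x, np.eye(2) * 0.01, weights, means, covs)
p_large = membership_probs(x, np.eye(2) * 4.0, weights, means, covs)

print("small error:", p_small, "-> component", p_small.argmax())
print("large error:", p_large, "-> component", p_large.argmax())
```

With these illustrative parameters, the same observed value is assigned to different components depending on its error covariance, which is the sense in which each error covariance defines its own classification boundary.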

Keywords:  RNA-seq; classification boundary; clustering analysis; expectation-maximization algorithm; Gaussian finite mixture model; gene expression; uncertainty

Year:  2020        PMID: 32050700      PMCID: PMC7074130          DOI: 10.3390/genes11020185

Source DB:  PubMed          Journal:  Genes (Basel)        ISSN: 2073-4425            Impact factor:   4.096


  4 in total

1.  PATTERN CLUSTERING BY MULTIVARIATE MIXTURE ANALYSIS.

Authors:  J H Wolfe
Journal:  Multivariate Behav Res       Date:  1970-04-01       Impact factor: 5.923

2.  Single-gene negative binomial regression models for RNA-Seq data with higher-order asymptotic inference.

Authors:  Yanming Di
Journal:  Stat Interface       Date:  2015       Impact factor: 0.582

3.  mclust 5: Clustering, Classification and Density Estimation Using Gaussian Finite Mixture Models.

Authors:  Luca Scrucca; Michael Fop; T Brendan Murphy; Adrian E Raftery
Journal:  R J       Date:  2016-08       Impact factor: 3.984

4.  An approach for clustering gene expression data with error information.

Authors:  Brian Tjaden
Journal:  BMC Bioinformatics       Date:  2006-01-12       Impact factor: 3.169

  1 in total

1.  Statistics in the Genomic Era.

Authors:  Hui Jiang; Kevin He
Journal:  Genes (Basel)       Date:  2020-04-18       Impact factor: 4.096

