Andrea Rau, Cathy Maugis-Rabusseau, Marie-Laure Martin-Magniette, Gilles Celeux
INRA, UMR1313 Génétique animale et biologie intégrative, Jouy-en-Josas, France; AgroParisTech, UMR1313 Génétique animale et biologie intégrative, Paris 05, France; Institut de Mathématiques de Toulouse, INSA de Toulouse, Université de Toulouse, Toulouse, France; UMR AgroParisTech/INRA MIA 518, Paris, France; INRA, UMR 1165 URGV, Saclay Plant Sciences, Evry, France; UEVE, UMR URGV, Saclay Plant Sciences, Evry, France; CNRS, ERL 8196, URGV, Saclay Plant Sciences, Evry, France; and Inria Saclay - Île-de-France, Orsay, France.
Abstract
MOTIVATION: In recent years, gene expression studies have increasingly made use of high-throughput sequencing technology. In turn, research concerning the appropriate statistical methods for the analysis of digital gene expression (DGE) has flourished, primarily in the context of normalization and differential analysis. RESULTS: In this work, we focus on the question of clustering DGE profiles as a means to discover groups of co-expressed genes. We propose a Poisson mixture model using a rigorous framework for parameter estimation as well as the choice of the appropriate number of clusters. We illustrate co-expression analyses using our approach on two real RNA-seq datasets. A set of simulation studies also compares the performance of the proposed model with that of several related approaches developed to cluster RNA-seq or serial analysis of gene expression data. AVAILABILITY AND IMPLEMENTATION: The proposed method is implemented in the open-source R package HTSCluster, available on CRAN. CONTACT: andrea.rau@jouy.inra.fr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
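For readers unfamiliar with the general idea of clustering count profiles with a Poisson mixture, the following is a minimal illustrative sketch in Python. It is not the HTSCluster implementation and does not reproduce the parameterization or model selection procedure of the paper. It assumes a plain mixture of independent Poisson distributions with one rate per cluster and sample, fitted by a standard EM algorithm, and it uses BIC as a generic stand-in criterion for comparing numbers of clusters. All function names (`poisson_logpmf`, `fit_poisson_mixture`, `bic`) and the farthest-point initialization are hypothetical choices made for this example.

```python
import math


def poisson_logpmf(y, rate):
    """Log-probability of count y under a Poisson distribution with mean rate."""
    return y * math.log(rate) - rate - math.lgamma(y + 1)


def _e_step(counts, weights, rates):
    """Return per-gene cluster responsibilities and the log-likelihood."""
    resp, loglik = [], 0.0
    for y in counts:
        logp = [
            math.log(w) + sum(poisson_logpmf(yj, rj) for yj, rj in zip(y, r))
            for w, r in zip(weights, rates)
        ]
        top = max(logp)  # log-sum-exp trick for numerical stability
        log_norm = top + math.log(sum(math.exp(lp - top) for lp in logp))
        loglik += log_norm
        resp.append([math.exp(lp - log_norm) for lp in logp])
    return resp, loglik


def fit_poisson_mixture(counts, n_clusters, n_iter=50):
    """EM for a mixture of independent Poissons (illustrative assumption).

    counts: list of genes, each a list of non-negative integer counts.
    """
    n, d = len(counts), len(counts[0])

    # Assumed initialization: deterministic farthest-point choice of genes.
    centers = [0]
    for _ in range(1, n_clusters):
        def dist_to_centers(i):
            return min(
                sum((a - b) ** 2 for a, b in zip(counts[i], counts[c]))
                for c in centers
            )
        centers.append(max(range(n), key=dist_to_centers))

    # Add 0.5 so that no initial rate is zero (log(0) is undefined).
    rates = [[counts[c][j] + 0.5 for j in range(d)] for c in centers]
    weights = [1.0 / n_clusters] * n_clusters

    for _ in range(n_iter):
        resp, _unused = _e_step(counts, weights, rates)
        # M-step: weighted proportions and weighted mean counts per cluster.
        for k in range(n_clusters):
            nk = max(sum(r[k] for r in resp), 1e-12)  # guard empty clusters
            weights[k] = nk / n
            rates[k] = [
                max(sum(r[k] * y[j] for r, y in zip(resp, counts)) / nk, 1e-12)
                for j in range(d)
            ]

    resp, loglik = _e_step(counts, weights, rates)
    labels = [max(range(n_clusters), key=lambda k: r[k]) for r in resp]
    return {"weights": weights, "rates": rates, "labels": labels,
            "loglik": loglik}


def bic(fit, n_genes, n_samples):
    """BIC (lower is better); a generic criterion, assumed for illustration."""
    k = len(fit["weights"])
    n_params = (k - 1) + k * n_samples
    return -2.0 * fit["loglik"] + n_params * math.log(n_genes)


if __name__ == "__main__":
    toy = [[5, 50, 5], [6, 48, 4], [50, 5, 50], [48, 6, 52]]
    for k in (1, 2):
        fit = fit_poisson_mixture(toy, k)
        print(k, fit["labels"], round(bic(fit, len(toy), len(toy[0])), 1))
```

In this sketch each gene is assigned to the cluster with the highest posterior probability, and candidate numbers of clusters are compared by fitting each one separately and inspecting the criterion. In practice, RNA-seq counts also call for library-size normalization and gene-level scaling, which this simplified example omits.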