PCAN: Probabilistic correlation analysis of two non-normal data sets.

Roger S Zoh, Bani Mallick, Ivan Ivanov, Veera Baladandayuthapani, Ganiraju Manyam, Robert S Chapkin, Johanna W Lampe, Raymond J Carroll.

Abstract

Most cancer research now involves one or more assays profiling various biological molecules, e.g., messenger RNA and microRNA, in samples collected from the same individuals. The main interest with these genomic data sets lies in the identification of a subset of features that are active in explaining the dependence between platforms. To quantify the strength of the dependency between two variables, correlation is often preferred. However, expression data obtained from next-generation sequencing platforms are integer-valued, with very low counts for some important features. In this case, the sample Pearson correlation is not a valid estimate of the true correlation matrix, because the sample correlation estimate between two features/variables with low counts will often be close to zero, even when the natural parameters of the Poisson distribution are, in actuality, highly correlated. We propose a model-based approach to correlation estimation between two non-normal data sets, via a method we call Probabilistic Correlations ANalysis, or PCAN. PCAN takes into consideration the distributional assumption about both data sets and suggests that correlations estimated at the model natural parameter level are more appropriate than correlations estimated directly on the observed data. We demonstrate through a simulation study that PCAN outperforms other standard approaches in estimating the true correlation between the natural parameters. We then apply PCAN to the joint analysis of a microRNA (miRNA) and a messenger RNA (mRNA) expression data set from a squamous cell lung cancer study, finding a large number of negatively correlated pairs when compared to the standard approaches.
© 2016, The International Biometric Society.
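
The attenuation the abstract describes can be illustrated with a small simulation. The sketch below is not PCAN and is not taken from the paper. It assumes a Poisson log-normal setup chosen only for illustration: two log-rates (the Poisson natural parameters) are drawn from a bivariate normal with correlation 0.9 and a low mean, and counts are then drawn from Poisson distributions with those rates. The function name, parameter names, and default values are all hypothetical.

```python
import numpy as np


def simulate_low_count_correlation(n_samples=5000, rho=0.9,
                                   mean_log_rate=-2.0, sd_log_rate=0.5,
                                   seed=0):
    """Compare Pearson correlation of natural parameters vs. observed counts.

    Illustrative assumption (not the paper's model specification):
    theta ~ bivariate normal with correlation rho, counts ~ Poisson(exp(theta)).
    """
    rng = np.random.default_rng(seed)

    # Covariance of the two log-rates (natural parameters).
    cov = sd_log_rate ** 2 * np.array([[1.0, rho], [rho, 1.0]])
    theta = rng.multivariate_normal([mean_log_rate, mean_log_rate], cov,
                                    size=n_samples)

    # Low-mean Poisson counts: most observations are 0 or 1.
    counts = rng.poisson(np.exp(theta))

    corr_theta = np.corrcoef(theta[:, 0], theta[:, 1])[0, 1]
    corr_counts = np.corrcoef(counts[:, 0], counts[:, 1])[0, 1]
    return corr_theta, corr_counts


if __name__ == "__main__":
    corr_theta, corr_counts = simulate_low_count_correlation()
    print(f"correlation of natural parameters: {corr_theta:.3f}")
    print(f"Pearson correlation of raw counts: {corr_counts:.3f}")
```

Under these assumed settings the Poisson sampling noise dominates the variance of each count, so the Pearson correlation of the raw counts falls far below the correlation of the underlying log-rates. Raising `mean_log_rate` should narrow the gap, consistent with the abstract's point that the problem is specific to low counts.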

Keywords:  Canonical correlation analysis; Correlation; Generalized linear models; Poisson regression; RNA-sequencing

Year:  2016        PMID: 27037601      PMCID: PMC5045754          DOI: 10.1111/biom.12516

Source DB:  PubMed          Journal:  Biometrics        ISSN: 0006-341X            Impact factor:   2.571

