Nicholas A Furlotte, Hyun Min Kang, Chun Ye, Eleazar Eskin.
Abstract
MOTIVATION: The analysis of gene coexpression is at the core of many types of genetic analysis. The coexpression between two genes can be calculated by using a traditional Pearson's correlation coefficient. However, unobserved confounding effects may cause inflation of the Pearson's correlation so that uncorrelated genes appear correlated. Many general methods have been suggested, which aim to remove the effects of confounding from gene expression data. However, the residual confounding which is not accounted for by these generic correction procedures has the potential to induce correlation between genes. Therefore, a method that specifically aims to calculate gene coexpression between gene expression arrays, while accounting for confounding effects, is desirable.
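The inflation described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' method: the data are synthetic, the single hidden per-array factor and its loadings are made up for illustration, and the helper names (`pearson_coexpression`, `simulate_confounded`, `mean_abs_offdiag`) are hypothetical.

```python
import numpy as np

def pearson_coexpression(expr):
    """Pairwise Pearson correlation between genes (rows) across arrays (columns)."""
    return np.corrcoef(expr)

def simulate_confounded(n_genes=50, n_arrays=200, effect=1.0, seed=0):
    """Return (clean, confounded) expression matrices of shape (genes, arrays).

    Assumption for illustration: genes are truly independent, and one
    unobserved per-array factor is added to every gene with its own loading.
    """
    rng = np.random.default_rng(seed)
    clean = rng.standard_normal((n_genes, n_arrays))        # independent genes
    confounder = rng.standard_normal(n_arrays)               # hidden array-level effect
    loadings = rng.uniform(0.5, 1.5, size=(n_genes, 1)) * effect
    return clean, clean + loadings * confounder

def mean_abs_offdiag(corr):
    """Average absolute correlation over all distinct gene pairs."""
    mask = ~np.eye(corr.shape[0], dtype=bool)
    return float(np.abs(corr[mask]).mean())

clean, confounded = simulate_confounded()
clean_level = mean_abs_offdiag(pearson_coexpression(clean))
confounded_level = mean_abs_offdiag(pearson_coexpression(confounded))
print(f"mean |r| without confounder: {clean_level:.3f}")
print(f"mean |r| with confounder:    {confounded_level:.3f}")
```

In this toy setting the genes share no true signal, so any increase in the average absolute correlation comes entirely from the shared hidden factor.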
Year: 2011 PMID: 21685083 PMCID: PMC3117390 DOI: 10.1093/bioinformatics/btr221
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. The distributions of coexpression ranks for a set of 732 probe pairs, for which both probes in a pair target the same gene. The coexpression values for each probe pair are ranked with respect to all other pairwise coexpression values. Smaller ranks indicate higher coexpression. We expect that probes targeting the same gene should be highly coexpressed and therefore should have very low rank. The MMC method consistently ranks these coexpressions lower when compared to the other two methods.
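The ranking used in Fig. 1 can be sketched as follows. The caption does not say whether signed or absolute coexpression values are ranked, nor how ties are handled; this sketch assumes signed values with rank 1 for the largest value, and the function name `coexpression_ranks` and the toy matrix are hypothetical.

```python
import numpy as np

def coexpression_ranks(corr, pairs):
    """Rank selected gene/probe pairs among all distinct pairwise values.

    Rank 1 is the highest coexpression. A pair's rank is one plus the
    number of pairs with a strictly larger value (assumed tie rule).
    """
    values = corr[np.triu_indices(corr.shape[0], k=1)]   # each pair once
    sorted_vals = np.sort(values)                         # ascending
    ranks = []
    for i, j in pairs:
        n_greater = values.size - np.searchsorted(sorted_vals, corr[i, j], side="right")
        ranks.append(int(n_greater) + 1)
    return ranks

# Toy symmetric coexpression matrix for four probes (made-up values).
corr = np.array([
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.4],
    [0.1, 0.3, 1.0, 0.8],
    [0.2, 0.4, 0.8, 1.0],
])
ranks = coexpression_ranks(corr, [(0, 1), (2, 3), (0, 2)])
print(ranks)  # [1, 2, 6]
```

Under this convention, a method that recovers same-gene probe pairs well would place them near rank 1.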
Fig. 2. Comparison of the concordance between two yeast datasets for both methods. Concordance between two sets of coexpressions is compared by looking at the proportion of coexpressions in common for the top ranking coexpressions. The x-axis represents the number of top ranked coexpressions considered, while the y-axis represents the proportion of those coexpressions that are common between the new and old datasets.
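The concordance measure in Fig. 2 can be read as a top-k overlap. The following is a minimal sketch under that reading, assuming both datasets index the same genes in the same order and that pairs are ranked by signed coexpression value; `top_k_pairs`, `concordance`, and the toy matrices are hypothetical.

```python
import numpy as np

def top_k_pairs(corr, k):
    """Return the k gene pairs (i, j), i < j, with the largest coexpression."""
    iu, ju = np.triu_indices(corr.shape[0], k=1)
    order = np.argsort(-corr[iu, ju], kind="stable")[:k]
    return {(int(iu[t]), int(ju[t])) for t in order}

def concordance(corr_a, corr_b, k):
    """Proportion of the top-k pairs shared by the two datasets."""
    return len(top_k_pairs(corr_a, k) & top_k_pairs(corr_b, k)) / k

# Made-up coexpression matrices standing in for the "old" and "new" datasets.
corr_old = np.array([
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.4],
    [0.1, 0.3, 1.0, 0.8],
    [0.2, 0.4, 0.8, 1.0],
])
corr_new = np.array([
    [1.0, 0.7, 0.6, 0.1],
    [0.7, 1.0, 0.2, 0.3],
    [0.6, 0.2, 1.0, 0.5],
    [0.1, 0.3, 0.5, 1.0],
])
print(concordance(corr_old, corr_new, k=2))  # 0.5
```

Evaluating `concordance` over a range of k values would give one curve per method, with k on the x-axis and the shared proportion on the y-axis.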
Fig. 3. Distribution of gene-module P-values for Pearson, SVA and MMC. We used a set of 233 known functional modules consisting of sets of genes of size 2 to 20. For each of these modules, a P-value representing the biological significance is calculated. This figure plots the distributions of these P-values. Since the P-values were calculated for gene sets known to be functionally related, we expect that there should be an inflation of significant P-values. It can be seen that the MMC method produces a larger number of significant P-values when compared to both the traditional Pearson and SVA-corrected coexpressions.