Lijing Xu, Cheng Cheng, E Olusegun George, Ramin Homayouni.
Abstract
BACKGROUND: Gene expression data are noisy due to technical and biological variability. Consequently, analysis of gene expression data is complex: different statistical methods produce distinct sets of genes, and the choice of expression p-value (EPv) threshold is somewhat arbitrary. In this study, we aimed to develop novel literature-based approaches to integrate functional information into the analysis of gene expression data.
Year: 2012 PMID: 23282414 PMCID: PMC3535704 DOI: 10.1186/1471-2164-13-S8-S23
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Overview of the LBFS algorithm. A statistical test was applied to identify differentially expressed genes (DEGs) from the samples with original labels (OL) and with permuted labels (PL). Subsets of 50 genes were randomly selected 1000 times from each pool of DEGs, and a literature p-value (LPv) was calculated for each 50-gene subset. Fisher's exact test was used to determine whether the proportion of subsets with LPv < 0.05 (called the LCI) in the OL group differed significantly from that in the PL group.
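The LBFS procedure in Figure 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the literature p-value is passed in as a function because the source does not describe how LPv is computed, and the `toy_lpv` scorer, the gene names and the pools below are hypothetical. Fisher's exact test is written out with the standard library (two-sided, summing all tables with the observed margins that are no more probable than the observed one) so the example needs no third-party packages.

```python
import math
import random


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, col1)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Hypergeometric probability of each feasible top-left cell value
    probs = {x: math.comb(row1, x) * math.comb(n - row1, col1 - x) / denom
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Small relative tolerance keeps tables tied with the observed one
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7)))


def literature_cohesion_index(deg_pool, lpv, subset_size=50, n_draws=1000,
                              lpv_threshold=0.05, seed=0):
    """Count random DEG subsets whose LPv is below the threshold.

    Returns (hits, n_draws); the LCI is hits / n_draws.
    """
    rng = random.Random(seed)
    pool = list(deg_pool)
    hits = sum(lpv(rng.sample(pool, subset_size)) < lpv_threshold
               for _ in range(n_draws))
    return hits, n_draws


def lbfs(ol_degs, pl_degs, lpv, **kwargs):
    """Compare the LCI of original-label (OL) and permuted-label (PL) DEG pools."""
    ol_hits, n_ol = literature_cohesion_index(ol_degs, lpv, **kwargs)
    pl_hits, n_pl = literature_cohesion_index(pl_degs, lpv, **kwargs)
    p = fisher_exact_two_sided(ol_hits, n_ol - ol_hits, pl_hits, n_pl - pl_hits)
    return ol_hits / n_ol, pl_hits / n_pl, p


# Hypothetical LPv stand-in: "significant" (0.01) when at least 60% of the
# subset belongs to a made-up group of functionally related genes.
related = {f"g{i}" for i in range(65)}


def toy_lpv(genes):
    share = sum(g in related for g in genes) / len(genes)
    return 0.01 if share >= 0.6 else 0.5


ol_pool = [f"g{i}" for i in range(100)]       # 65 of 100 genes related
pl_pool = [f"g{i}" for i in range(45, 145)]   # 20 of 100 genes related
ol_lci, pl_lci, p = lbfs(ol_pool, pl_pool, toy_lpv)
print(f"LCI(OL)={ol_lci:.3f}  LCI(PL)={pl_lci:.3f}  LBFS p={p:.3g}")
```

Reusing the same seed for both pools is a design convenience only; any seeding scheme works as long as each pool is sampled independently of its literature scores.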
Figure 2. Relationship between EPv and LCI. The fraction of gene sets with LPv < 0.05 (y-axis) was plotted at various expression p-value (EPv) thresholds (x-axis) for three different datasets. Inset shows a magnified view for EPv < 0.10.
Literature-based functional significance (LBFS) of gene sets generated by four statistical tests for three different microarray experiments.
| Gene list | LCI (PGC-1beta) | LCI (IL2) | LCI (ET1) | LBFS (PGC-1beta) | LBFS (IL2) | LBFS (ET1) |
|---|---|---|---|---|---|---|
| Welch t-Test | 0.34 | 0.34 | 0.17 | 7.08E-06 | 0.0004 | 0.45 |
| Mann-Whitney | 0.2 | 0.2 | 0.13 | 0.118 | 0.0075 | 1 |
| Student t-Test | 0.38 | 0.38 | 0.1 | 1.24E-07 | 0.071 | 1 |
| Empirical Bayes | 0.4 | 0.19 | 0.05 | 1.36E-08 | 0.11 | 1 |
For comparison, the Literature Cohesion Index (LCI), which is used to calculate LBFS, is displayed for each experiment.
Figure 3. Relationship between EPv and LCI at various LPv thresholds. The LCI at LPv thresholds ranging from 0.01 to 0.1 (y-axis) was plotted against EPv thresholds (x-axis) for the PGC-1beta dataset. Inset shows a magnified view for EPv < 0.10. The curves have similar shapes across LPv thresholds.
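The curves in Figures 2 and 3 amount to recomputing the LCI over a grid of EPv cutoffs (which set the DEG pool) and LPv cutoffs (which decide whether a subset counts as cohesive). The sketch below assumes the 50-gene, 1000-draw sampling described for Figure 1; the `toy_lpv` scorer, the gene names and the EPv values are hypothetical placeholders, since the source does not specify how LPv is computed.

```python
import random


def lci(pool, lpv, lpv_threshold=0.05, subset_size=50, n_draws=1000, seed=0):
    """Fraction of random subsets of `pool` whose LPv is below `lpv_threshold`.

    Returns None when the pool is too small to draw a single subset.
    """
    pool = list(pool)
    if len(pool) < subset_size:
        return None
    rng = random.Random(seed)
    hits = sum(lpv(rng.sample(pool, subset_size)) < lpv_threshold
               for _ in range(n_draws))
    return hits / n_draws


def lci_curve(epv, lpv, epv_thresholds, lpv_thresholds=(0.05,), **kwargs):
    """LCI for every (LPv threshold, EPv threshold) pair.

    `epv` maps gene -> expression p-value; genes with EPv below the cutoff
    form the DEG pool at that cutoff.
    """
    curve = {}
    for t_lpv in lpv_thresholds:
        for t_epv in epv_thresholds:
            pool = [g for g, p in epv.items() if p < t_epv]
            curve[(t_lpv, t_epv)] = lci(pool, lpv, t_lpv, **kwargs)
    return curve


# Hypothetical data: 1000 genes; the 200 with the smallest EPv belong to a
# made-up functionally related group.
epv = {f"g{i}": (i + 1) / 1000 for i in range(1000)}
related = {f"g{i}" for i in range(200)}


def toy_lpv(genes):
    share = sum(g in related for g in genes) / len(genes)
    return 0.02 if share >= 0.6 else 0.5


curve = lci_curve(epv, toy_lpv, epv_thresholds=[0.01, 0.1, 0.5, 1.0],
                  lpv_thresholds=[0.01, 0.05, 0.1])
for key, value in sorted(curve.items()):
    print(key, value)
```

Using the same seed at every LPv threshold draws identical subsets, so the curves for different LPv cutoffs differ only in how each subset's score is thresholded.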
Number of significant genes identified by Student's t-test after correction for multiple hypothesis testing
| Dataset | # of tests | # of genes with p < 0.05 | Storey pFDR q < 0.1 | BH FDR < 0.1 | Bonferroni FWER < 0.1 | Westfall-Young permutation |
|---|---|---|---|---|---|---|
| IL2 | 20558 | 5001 | 5955 | 3827 | 32 | 95 |
| PGC-1beta | 17633 | 2618 | 1 | 1 | 1 | 1 |
| ET1 | 20477 | 1559 | 0 | 0 | 0 | 0 |
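For readers comparing the correction columns above, the sketch below shows the textbook forms of Bonferroni, Benjamini-Hochberg (BH) and Storey q-values. It is not necessarily the software or parameter settings the authors used: the Storey part uses the simple fixed-λ estimate of π0 (λ = 0.5 is an assumption), whereas published implementations typically estimate π0 over a range of λ values. The Westfall-Young column is not covered because it requires permuting the raw expression data rather than adjusting p-values alone.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values (controls the FWER)."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def bh_adjust(pvals, pi0=1.0):
    """Benjamini-Hochberg step-up adjusted p-values (controls the FDR).

    With pi0 < 1 the same recursion yields Storey-style q-values.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pi0 * pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def storey_pi0(pvals, lam=0.5):
    """Fixed-lambda estimate of the proportion of true null hypotheses."""
    m = len(pvals)
    return min(1.0, sum(p > lam for p in pvals) / (m * (1 - lam)))


def storey_qvalues(pvals, lam=0.5):
    return bh_adjust(pvals, pi0=storey_pi0(pvals, lam))


def count_significant(adjusted, alpha=0.1):
    """Number of tests whose adjusted value falls below alpha."""
    return sum(a < alpha for a in adjusted)


pvals = [0.01, 0.04, 0.03, 0.005]
print(count_significant(bonferroni(pvals)), count_significant(bh_adjust(pvals)))
```

Because q-values scale the BH values by π0 ≤ 1, a Storey cutoff of q < 0.1 can admit more genes than BH at FDR < 0.1, which is consistent with the IL2 row above.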