Rongheng Lin, Shuangshuang Dai, Richard D Irwin, Alexandra N Heinloth, Gary A Boorman, Leping Li.
Abstract
BACKGROUND: Recently, microarray data analyses using functional pathway information, e.g., gene set enrichment analysis (GSEA) and significance analysis of function and expression (SAFE), have gained recognition as a way to identify biological pathways/processes associated with a phenotypic endpoint. In these analyses, a local statistic is used to assess the association between the expression level of a gene and the value of a phenotypic endpoint. These gene-specific local statistics are then combined to evaluate association for pre-selected sets of genes. Commonly used local statistics include t-statistics for binary phenotypes and correlation coefficients that assume a linear or monotone relationship between a continuous phenotype and gene expression level. Methods applicable to continuous non-monotone relationships are needed. Furthermore, for multiple experimental categories, methods that combine multiple GSEA/SAFE analyses are needed.
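The two-step scheme described in the abstract (a local statistic per gene, then a combination over a gene set) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: the toy data, the gene names `geneA` and `geneB`, and the use of the mean absolute Pearson correlation as the set-level summary are all hypothetical, and GSEA and SAFE use their own global statistics and permutation-based significance assessment.

```python
import math

def pearson(x, y):
    """Pearson correlation, used here as the gene-level (local) statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def set_score(expression, phenotype, gene_set):
    """Combine local statistics over a gene set.

    Assumption: the set-level summary is the mean absolute correlation,
    chosen only for illustration.
    """
    local = {g: pearson(expression[g], phenotype) for g in gene_set}
    score = sum(abs(r) for r in local.values()) / len(local)
    return score, local

# Hypothetical toy data: a continuous phenotype and two genes.
phenotype = [1.0, 2.0, 3.0, 4.0, 5.0]
expression = {
    "geneA": [2.0, 4.0, 6.0, 8.0, 10.0],  # linear in the phenotype
    "geneB": [4.0, 1.0, 0.0, 1.0, 4.0],   # U-shaped: (phenotype - 3)^2
}

score, local = set_score(expression, phenotype, ["geneA", "geneB"])
print(local, score)
```

In this toy example the U-shaped gene is fully determined by the phenotype, yet its correlation is zero. That is the kind of non-monotone relationship a correlation-based local statistic cannot detect.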
Year: 2008 PMID: 19014579 PMCID: PMC2636811 DOI: 10.1186/1471-2105-9-481
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
NCT compendium microarray data. In time rows, the numbers from 0 to 3 indicate vehicle, low, medium and high doses.
| Compound | 1.2.Di | 1.4.Di | Brom | Diqu | Gala | Mono | N-Nitr | Thio |
| 6 Hrs | 0–3 | 0–3 | 0–3 | 0–4 | 0–3 | 0–3 | 0–3 | 0–3 |
| 24 Hrs | 0–3 | 0–3 | 0–3 | 0–4 | 0–3 | 0–3 | 0–3 | 0–3 |
| 48 Hrs | 0–3* | 0–3 | 0–3 | 0–4 | 0–3 | 0–2† | 0–3 | 0–3 |
| Replicates | 4 | 4 | 4 | 6 | 4 | 4 | 4 | 4 |
| Array totals | 34 | 36 | 36 | 72 | 36 | 32 | 36 | 36 |
Diquat dibromide has 4 levels. Array numbers do not include vehicle treated animals, which were used as baseline in cDNA microarray. *: Two animals in high dose group died before 48 hours. †: All four animals in high dose group died before 48 hours.
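The array totals in the table are consistent with dose levels × time points × replicates, minus the animals that died before 48 hours. The check below is a sketch that assumes one array per treated animal per time point, with vehicle animals excluded as stated above. The dictionary layout and the `lost` field are illustrative names, not part of the original data set.

```python
# Non-vehicle dose levels, replicates per group, and animals lost before
# 48 h, as read from the table and its footnotes.
design = {
    "1.2.Di": {"doses": 3, "replicates": 4, "lost": 2},
    "1.4.Di": {"doses": 3, "replicates": 4, "lost": 0},
    "Brom":   {"doses": 3, "replicates": 4, "lost": 0},
    "Diqu":   {"doses": 4, "replicates": 6, "lost": 0},
    "Gala":   {"doses": 3, "replicates": 4, "lost": 0},
    "Mono":   {"doses": 3, "replicates": 4, "lost": 4},
    "N-Nitr": {"doses": 3, "replicates": 4, "lost": 0},
    "Thio":   {"doses": 3, "replicates": 4, "lost": 0},
}

TIME_POINTS = 3  # 6, 24 and 48 hours

# Array totals as reported in the table.
reported = {
    "1.2.Di": 34, "1.4.Di": 36, "Brom": 36, "Diqu": 72,
    "Gala": 36, "Mono": 32, "N-Nitr": 36, "Thio": 36,
}

def array_total(d):
    """Assumption: one array per treated animal per time point."""
    return d["doses"] * TIME_POINTS * d["replicates"] - d["lost"]

computed = {name: array_total(d) for name, d in design.items()}
for name in design:
    print(name, computed[name], reported[name])
```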
Figure 1. Examples of non-monotone relationships in liver between the expression levels of two genes and ALT. Each color represents one compound, and the symbols (○, Δ, +) indicate the low, medium and high doses, respectively. For the compound with four dose levels (diquat), the symbol "×" indicates the highest dose. Time information is not indicated. The smoothing line is fitted to all data using natural cubic splines.
Figure 2. (a) Genes' percentiles. Gray dots show (R², r²), and the solid line is a smoothing (lowess) line. The dashed line is y = x. (b) Observed R² − r² vs. randomly generated R² − r². The dashed line is y = x.
Figure 3. Rich association types between genes in a gene set. The x-axis is the expression level of the gene G6pc, which has the largest standard deviation in the gene set. The y-axis is the expression level of all genes in the set. Natural cubic splines with 4 inner knots at quartiles are fitted using G6pc as the predictor variable. Data for monocrotaline in liver are used.
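The figure captions above refer to natural cubic spline fits and to a comparison of two quantities written R2 and r2. The sketch below shows one way such a comparison could be computed; it is not the authors' code. It assumes that R2 is the coefficient of determination of a natural cubic spline regression and r2 is the squared Pearson correlation, neither of which is defined in this section. The simulated data, the function names, and the knot placement (boundary knots at the data range plus inner knots at the three quartiles) are also assumptions.

```python
import numpy as np

def natural_spline_basis(x, knots):
    """Natural cubic spline basis (truncated power form), linear beyond the
    boundary knots. `knots` includes both boundary knots."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    K = len(knots)

    def d(k):
        num = np.maximum(x - knots[k], 0) ** 3 - np.maximum(x - knots[-1], 0) ** 3
        return num / (knots[-1] - knots[k])

    cols = [np.ones_like(x), x]
    cols += [d(k) - d(K - 2) for k in range(K - 2)]
    return np.column_stack(cols)

def spline_r2(x, y, knots):
    """Coefficient of determination of a least-squares natural spline fit."""
    B = natural_spline_basis(x, knots)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    resid = y - B @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def pearson_r2(x, y):
    """Squared Pearson correlation (linear association only)."""
    return np.corrcoef(x, y)[0, 1] ** 2

# Simulated U-shaped relationship (hypothetical data, not from the study).
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 80)
y = x ** 2 + rng.normal(scale=0.2, size=x.size)

# Assumed knot placement: data range plus the three quartiles.
knots = np.quantile(x, [0.0, 0.25, 0.5, 0.75, 1.0])

R2 = spline_r2(x, y, knots)
r2 = pearson_r2(x, y)
print(f"spline R2 = {R2:.3f}, linear r2 = {r2:.3f}")
```

Because the spline basis contains the intercept and the linear term, the spline R2 can never fall below the linear r2 on the same data. A large gap between the two flags a relationship that is strong but not linear.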
The number of identified sets and overlaps of two runs.
| | | Random seed 1 | Random seed 2 | Overlap |
| | | 38 | 37 | 36 |
| | | 2 | 5 | 2 |
| | | 0 | 0 | 0 |
| | | 74 | 70 | 68 |
| | | 28 | 30 | 27 |
| | | 5 | 5 | 5 |