Charles C Kim, Stanley Falkow.
Abstract
BACKGROUND: Genes that are determined to be significantly differentially regulated in microarray analyses often appear to have functional commonalities, such as being components of the same biochemical pathway. This results in certain words being under- or overrepresented in the list of genes. Distinguishing between biologically meaningful trends and artifacts of annotation and analysis procedures is of the utmost importance, as only true biological trends are of interest for further experimentation. A number of sophisticated methods for identification of significant lexical trends are currently available, but these methods are generally too cumbersome for practical use by most microarray users.
Year: 2003 PMID: 12697067 PMCID: PMC153504 DOI: 10.1186/1471-2105-4-12
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Variation in cumulative Poisson probabilities in response to number of random samples and SGL matches
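| Random samples | 8 matches (original SGL) | Synthetic SGL | Synthetic SGL | Synthetic SGL | Synthetic SGL | Synthetic SGL |
| --- | --- | --- | --- | --- | --- | --- |
| 10 | 0.993 ± 0.014 | 0.981 ± 0.045 | 0.957 ± 0.057 | 0.898 ± 0.126 | 0.799 ± 0.167 | 0.626 ± 0.240 |
| 100 | 0.996 ± 0.003 | 0.987 ± 0.008 | 0.963 ± 0.017 | 0.914 ± 0.033 | 0.812 ± 0.059 | 0.643 ± 0.072 |
| 1000 | 0.996 ± 0.001 | 0.987 ± 0.003 | 0.964 ± 0.005 | 0.913 ± 0.011 | 0.810 ± 0.017 | 0.647 ± 0.021 |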
Poisson analysis was performed on the described dataset 100 times using 10, 100, or 1000 random samples. The mean of the cumulative probabilities is reported with two standard deviations. The original SGL contained 8 matches (first column); the additional columns were generated using a synthetic SGL modified to contain the specified number of matches.
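The sampling-based Poisson analysis described above can be sketched roughly as follows. This is a minimal illustration, not the LACK implementation. It assumes that the Poisson mean is estimated as the average number of category matches across random gene lists of the same size as the SGL, and that the reported cumulative probability is P(X ≤ k). The function names `poisson_cdf` and `estimate_lambda`, the integer element IDs, and the choice of the first 34 IDs as the category are all hypothetical. The sizes (4290 elements, 256 genes in the SGL, 34 category members) are taken from the Figure 1 caption.

```python
import random
from math import exp, factorial


def poisson_cdf(k, lam):
    """Return P(X <= k) for X ~ Poisson(lam)."""
    return sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))


def estimate_lambda(dataset, category, sgl_size, n_samples, rng):
    """Mean number of category matches over random lists of SGL size."""
    total = 0
    for _ in range(n_samples):
        sample = rng.sample(dataset, sgl_size)  # draw without replacement
        total += sum(1 for element in sample if element in category)
    return total / n_samples


rng = random.Random(0)            # fixed seed so the run is repeatable
dataset = list(range(4290))       # hypothetical element IDs
category = set(range(34))         # hypothetical: 34 IDs stand in for the category

for n_samples in (10, 100, 1000):
    lam = estimate_lambda(dataset, category, 256, n_samples, rng)
    print(n_samples, round(lam, 3), round(poisson_cdf(8, lam), 4))
```

With few random samples the estimated mean fluctuates from run to run, which is consistent with the wider spread reported in the table for 10 samples than for 1000. The sketch treats each category member as exactly one dataset element, so it is not expected to reproduce the tabulated values.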
Figure 1. Binomial distribution of SPI-2 genes in a dataset. The total filtered dataset consisted of 4290 unique elements. An SGL of 256 genes was generated using SAM and analyzed for 34 members of SPI-2. The arrow indicates the number of matches in the SGL, with P(x > 8) = 0.004. The binomial analysis required 5 seconds; Poisson analysis of the same datasets required 7 seconds. A 21,450-element dataset created by replicating the 4290-element dataset 5 times required 8 seconds for binomial analysis. The files used for this analysis are available at the LACK website or as supplementary data.
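A binomial tail probability of the kind shown in the figure can be computed directly. This is a minimal sketch, assuming that each of the 34 SPI-2 members corresponds to exactly one of the 4290 dataset elements and that each SGL entry is an independent draw with match probability 34/4290. The function names are hypothetical, and whether the observed count is included in the tail is a convention choice; here it is included, P(X ≥ k). Because lexical matching in the actual analysis may hit a different number of elements, this sketch is not expected to reproduce the reported probability exactly.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Return P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(binomial_pmf(i, n, p) for i in range(k))


dataset_size = 4290   # unique elements in the filtered dataset
category_size = 34    # SPI-2 members (assumed one element each)
sgl_size = 256        # genes in the SGL
matches = 8           # SPI-2 matches observed in the SGL

p = category_size / dataset_size   # per-draw match probability
expected = sgl_size * p            # expected matches by chance
tail = binomial_upper_tail(matches, sgl_size, p)

print(f"expected matches by chance: {expected:.2f}")
print(f"P(X >= {matches}) = {tail:.4g}")
```

Under these assumptions roughly two matches are expected by chance, so eight observed matches lie far in the upper tail of the distribution.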