Clark D Jeffries, William O Ward, Diana O Perkins, Fred A Wright.
Abstract
BACKGROUND: Improvements in high-throughput technology and its increasing use have led to the generation of many highly complex datasets that often address similar biological questions. Combining information from these studies can increase the reliability and generalizability of results and also yield new insights that guide future research.
Year: 2009 PMID: 20021653 PMCID: PMC2813853 DOI: 10.1186/1471-2105-10-431
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. BLANKET applied to comparison of two illustrative lists of 500 descriptors. The first ten from Experiment A appear in scrambled order within the first ten of Experiment B. BLANKET suggests that four combinations of shortlists are sufficiently coincidental to meet a p-value of .05. Note that three of the four selected shortlist pairs (p, q) have unequal numbers of selected descriptors.
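The idea of scoring every shortlist pair (p, q) can be illustrated with a minimal sketch. It assumes that the score for a pair is a one-sided Fisher's exact test on the overlap of the top p descriptors of one list and the top q of the other, computed as a hypergeometric tail probability. The captions do not define RFET, so this is an illustration of the general approach, not BLANKET's actual score or its multiple-comparison correction. The function names and the scrambled example ranking are hypothetical.

```python
from math import comb


def overlap_tail_probability(n, p, q, k):
    """Chance of an overlap of at least k between a fixed shortlist of p
    descriptors and a random shortlist of q descriptors, both drawn from a
    universe of n (one-sided Fisher's exact test / hypergeometric tail)."""
    total = comb(n, q)
    upper = min(p, q)
    favorable = sum(comb(p, i) * comb(n - p, q - i) for i in range(k, upper + 1))
    return favorable / total


def overlap_surface(rank_a, rank_b, max_len):
    """Score every shortlist pair (p, q) with 1 <= p, q <= max_len.
    rank_a and rank_b are rankings of the same n descriptors."""
    n = len(rank_a)
    surface = {}
    for p in range(1, max_len + 1):
        top_a = set(rank_a[:p])
        for q in range(1, max_len + 1):
            k = len(top_a & set(rank_b[:q]))
            surface[(p, q)] = overlap_tail_probability(n, p, q, k)
    return surface


# Hypothetical data in the spirit of Figure 1: the first ten descriptors of
# Experiment A appear, scrambled, as the first ten of Experiment B.
rank_a = list(range(1, 501))
rank_b = [3, 1, 7, 10, 2, 9, 5, 4, 8, 6] + list(range(11, 501))
surface = overlap_surface(rank_a, rank_b, max_len=10)
best_pair = min(surface, key=surface.get)
```

In this toy case the smallest score falls at the pair (10, 10), where the two shortlists coincide exactly. A real analysis would still need a corrected threshold, because many (p, q) pairs are examined at once.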
Figure 2. BLANKET applied to a second set of illustrative lists of n = 500 descriptors. Descriptors in Experiment A are ranked in canonical order 1, 2, ..., 500. To the same ranking, weighted noise is added to arrive at an Experiment B ranking of 7, 26, 17, 32, 21, 34, 12, 46, 49, 14, 57, 54, 67, 19, 61, 28, 1, 15, 82,... BLANKET finds no shortlists of length at most 10 that meet the criterion for significance (p-value ≤ .05, corresponding to RFET < .00105). However, BLANKET finds three shortlist pairs, as shown, of length < 20 that do meet the same criterion (RFET < .00800). Thus BLANKET, not knowing the effects of noise, would recommend these descriptors to the researcher for further investigation. Note the characteristic sharp decline in RFET values near the chosen shortlists. This example is relevant to the case of one experiment performed with great accuracy and the other with substantial noise. Note that each selected shortlist pair (p, q) has unequal numbers of descriptors selected from the two experiments (p ≠ q).
Figure 3. BLANKET applied to ranked list data of 190 descriptors from He et al. (2005). BLANKET suggests that three combinations of shortlists are sufficiently coincidental to meet a p-value of .05. Note that each selected shortlist pair (p, q) has unequal numbers of descriptors selected from the two experiments (p ≠ q).
BLANKET multiple comparison-corrected significance threshold values for p-value 0.05.
| n | p, q ≤ 20 | p, q ≤ 10 |
|---|---|---|
| 100 | 0.00424 | 0.00192 |
| 200 | 0.00549 | 0.00150 |
| 300 | 0.00666 | 0.00119 |
| 400 | 0.00657 | 0.00104 |
| 500 | 0.00800 | 0.00105 |
Figure 4. BLANKET applied to comparison of 289 genes within lung cancer microarray studies of Stearman and Bhattacharjee. One pair of shortlists, with 14 descriptors from the first and eight from the second, yields an RFET score of .0044; this is less than .0066, the level that ensures significance at a p-value of .05 for any shortlists with twenty or fewer members from a universe of 300 members.
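The comparison made in the Figure 4 caption amounts to a lookup in the threshold table followed by a single inequality. The sketch below encodes the tabulated values and, as an assumption drawn from the way the captions treat 289 and 489 descriptors as universes of 300 and 500, picks the nearest tabulated n. The names `THRESHOLDS`, `nearest_row`, and `meets_threshold` are illustrative and not part of BLANKET.

```python
# Corrected RFET thresholds for p-value 0.05, copied from the table:
# universe size n -> {maximum shortlist length: threshold}.
THRESHOLDS = {
    100: {20: 0.00424, 10: 0.00192},
    200: {20: 0.00549, 10: 0.00150},
    300: {20: 0.00666, 10: 0.00119},
    400: {20: 0.00657, 10: 0.00104},
    500: {20: 0.00800, 10: 0.00105},
}


def nearest_row(n):
    """Return the tabulated universe size closest to n (an assumption;
    the source does not state a rounding rule)."""
    return min(THRESHOLDS, key=lambda size: abs(size - n))


def meets_threshold(rfet, n, max_len):
    """True if an RFET score is below the tabulated threshold for a
    universe of about n descriptors and shortlists of at most max_len."""
    return rfet < THRESHOLDS[nearest_row(n)][max_len]


# Figure 4: 289 genes, shortlists of 14 and 8 descriptors, RFET = .0044.
figure4_significant = meets_threshold(0.0044, n=289, max_len=20)
```

With these inputs the check passes under the p, q ≤ 20 column, matching the caption. The same score would not pass the stricter p, q ≤ 10 column, which does not apply here because one shortlist has 14 members.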
Figure 5. BLANKET applied to comparison of lung cancer microarray studies of Beer and Stearman. The method fails to find a shortlist pair with an RFET score < .00800, the level that would be sufficient for shortlists with up to 20 members to have statistical significance in a universe of 489 descriptors. This surface is more organized than random BLANKETs, since there is a sharp decrease from 1 to low values, but it is less organized than those in Figures 3 and 4.