Abstract
BACKGROUND: In binary high-throughput screening projects where the goal is the identification of low-frequency events, beyond the obvious issue of efficiency, false positives and false negatives are a major concern. Pooling constitutes a natural solution: it reduces the number of tests, while providing critical duplication of the individual experiments, thereby correcting for experimental noise. The main difficulty consists in designing the pools in a manner that is both efficient and robust: few pools should be necessary to correct the errors and identify the positives, yet the experiment should not be too vulnerable to biological shakiness. For example, some information should still be obtained even if there are slightly more positives or errors than expected. This is known as the group testing problem, or pooling problem.
Year: 2006 PMID: 16423300 PMCID: PMC1409803 DOI: 10.1186/1471-2105-7-28
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Guaranteed error correction and detection properties of STD. An experimenter, expecting up to t positives and E errors, chooses a satisfactory prime number q and builds the set of pools STD(n; q; t·Γ+2·E+1), as specified in corollary 2. Recall that n is the total number of variables and Γ is the compression power, i.e. the smallest γ such that q^(γ+1) ≥ n. This figure summarizes the behavior of these pools when the actual number of errors exceeds E, and distinguishes between the two types of errors: false positives and false negatives. In the dark blue region, all errors are detected and corrected. In the intermediate blue rectangles, correction is not guaranteed but detection is: in an unfavorable conformation of positives and errors, correction of all errors may fail, but this failure cannot go unnoticed, and the user can therefore plan additional experiments. In the cyan square, detection is usually also guaranteed, except if E is very small (E < 2·Γ-1): in this case, the line y = 3·E+1-x splits the square in two, and detection is only guaranteed in the bottom left portion, where the total number of errors is at most 3·E+1. Finally, in the outer pale cyan zone, no guarantee is provided.
Choosing the optimal value for the number of pools per layer, q
| q | Γ | k | v = k·q | gain = n/v |
| --- | --- | --- | --- | --- |
| ≤ 13 | ≥ 3 | ≥ 16 | k > q+1, can't use these values | |
| 17 | 3 | 16 | 272 | 36.8 |
| 19 | 3 | 16 | 304 | 32.9 |
| 23 | 2 | 11 | 253 | 39.5 |
| 29 | 2 | 11 | 319 | 31.3 |
| ... | 2 | 11 | ... | ... |
| 97 | 2 | 11 | 1067 | 9.4 |
| 101 | 1 | 6 | 606 | 16.5 |
This table shows the gains obtained with various q values, when the total number of variables to be tested is n = 10000 and the number of expected positives is t = 5, in a noiseless experiment (E = 0). Γ is the compression power (the smallest γ such that q^(γ+1) ≥ n, roughly the logarithm of n in base q; see Preliminaries in Results(1) section), k is the number of layers, v is the number of pools (i.e. k·q), and the gain is defined as n/v. By construction, STD requires k ≤ q+1; and to guarantee the identification of t positives while correcting E errors, section 3.3 showed that we must choose k = t·Γ+2·E+1; in this example, k = 5Γ+1. Often, the smallest usable q (i.e., satisfying k ≤ q+1), q_min, yields the highest gain, but this is not always the case. In this example, q_min = 17, but q = 23 (smallest q such that Γ = 2) yields the highest gain: 39.5.
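As a rough check of the arithmetic in this caption, the following sketch recomputes Γ, k, v and the gain from the relations stated above (Γ is the smallest γ with q^(γ+1) ≥ n, k = t·Γ+2·E+1, v = k·q, gain = n/v, and q is usable only if k ≤ q+1). The function names `compression_power` and `std_parameters` are illustrative and do not come from the paper, and the caller is assumed to pass a prime q.

```python
def compression_power(n: int, q: int) -> int:
    """Smallest gamma >= 0 such that q**(gamma + 1) >= n (integer arithmetic only)."""
    gamma, power = 0, q
    while power < n:
        power *= q
        gamma += 1
    return gamma


def std_parameters(n: int, q: int, t: int, E: int):
    """Return (Gamma, k, v, gain) for a prime q, or None when k > q + 1.

    Assumption: q is prime; this sketch does not check primality.
    """
    Gamma = compression_power(n, q)
    k = t * Gamma + 2 * E + 1      # number of layers needed for t positives, E errors
    if k > q + 1:                  # STD requires k <= q + 1
        return None
    v = k * q                      # total number of pools
    return Gamma, k, v, n / v      # gain = n / v


if __name__ == "__main__":
    # Setting of the table: n = 10000, t = 5, E = 0.
    for q in (13, 17, 19, 23, 29, 97, 101):
        print(q, std_parameters(10000, q, t=5, E=0))
```

With these inputs, q = 13 is rejected because k = 16 exceeds q+1 = 14, and q = 23 gives the smallest v among the values listed.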
Gains obtained when the identification of 3 positives and the correction of 2 errors is guaranteed (t = 3, E = 2)
| n | q_opt | pool size | k | v | gain |
| --- | --- | --- | --- | --- | --- |
| 100 | 11 | 9 | 8 | 88 | 1.1 |
| 1000 | 11 | 91 | 11 | 121 | 8.3 |
| 10^4 | 13 | 769 | 14 | 182 | 55 |
| 10^5 | 19 | 5263 | 14 | 266 | 376 |
| 10^6 | 19 | 52631 | 17 | 323 | 3096 |
For each value of n (total number of variables), the optimal q value q_opt has been calculated, as well as the associated pool size, the number of layers k, the total number of pools v, and the gain.
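The k, v and gain columns of this table can be rechecked from the relations k = t·Γ+2·E+1, v = k·q and gain = n/v, with t = 3 and E = 2. The sketch below is only illustrative: the helper `table_row` is a hypothetical name, it takes the listed q as given rather than searching for q_opt, and it assumes that each layer splits the n variables into q pools of near-equal size, so that a pool holds n/q variables rounded down or up.

```python
def compression_power(n: int, q: int) -> int:
    """Smallest gamma >= 0 such that q**(gamma + 1) >= n."""
    gamma, power = 0, q
    while power < n:
        power *= q
        gamma += 1
    return gamma


def table_row(n: int, q: int, t: int = 3, E: int = 2):
    """Return (k, v, gain, (min_pool_size, max_pool_size)) for a given prime q.

    Assumption: every layer partitions the n variables into q pools, so
    pool sizes are floor(n/q) or ceil(n/q). Primality of q is not checked.
    """
    k = t * compression_power(n, q) + 2 * E + 1
    if k > q + 1:
        raise ValueError(f"q = {q} is too small: k = {k} exceeds q + 1")
    v = k * q
    pool_sizes = (n // q, -(-n // q))   # floor and ceiling of n / q
    return k, v, n / v, pool_sizes


if __name__ == "__main__":
    # (n, q) pairs copied from the table above.
    for n, q in [(100, 11), (1000, 11), (10**4, 13), (10**5, 19), (10**6, 19)]:
        print(n, q, table_row(n, q))
```

The pool sizes listed in the table fall within the floor and ceiling bounds returned for each row.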