Bart Pieterse, Elisabeth J Quirijns, Frank H J Schuren, Mariët J van der Werf.
Abstract
BACKGROUND: Clone-based microarrays, on which each spot represents a random genomic fragment, are a good alternative to open reading frame-based microarrays, especially for microorganisms for which the complete genome sequence is not available. Since the generation of a genomic DNA library is a random process, it cannot be known in advance which genes are represented. Nevertheless, the genome coverage of such an array, which depends on variables such as the insert size and the number of clones in the library, can be predicted mathematically. When the classical formulas are applied, which give the probability that a certain sequence is represented in a DNA library at the nucleotide level, very large numbers of clones appear to be necessary to obtain adequate coverage of the genome.
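The classical nucleotide-level calculation mentioned in the abstract is commonly written as the Clarke and Carbon formula, N = ln(1 − P) / ln(1 − f), where f is the insert size divided by the genome size and P is the desired probability that a given sequence is in the library. The sketch below assumes this is the formula meant; the function names and the example values (a 4 Mbp genome, 1500 bp inserts, P = 0.99) are illustrative choices and are not taken from the paper.

```python
import math


def clones_needed(genome_size_bp: float, insert_size_bp: float, probability: float) -> int:
    """Clarke-Carbon estimate of the library size.

    N = ln(1 - P) / ln(1 - f), with f = insert size / genome size.
    """
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    f = insert_size_bp / genome_size_bp
    if not 0.0 < f < 1.0:
        raise ValueError("insert size must be smaller than the genome size")
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - f))


def representation_probability(genome_size_bp: float, insert_size_bp: float, n_clones: int) -> float:
    """Inverse relation: probability that a given sequence is in a library of n_clones."""
    f = insert_size_bp / genome_size_bp
    return 1.0 - (1.0 - f) ** n_clones


if __name__ == "__main__":
    # Illustrative values only (assumed, not from the paper).
    n = clones_needed(4_000_000, 1500, 0.99)
    print(n, representation_probability(4_000_000, 1500, n))
```

With inserts that are small relative to the genome, f is tiny, so N grows roughly as −ln(1 − P) / f. This is why the nucleotide-level criterion calls for very large libraries.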
Year: 2005 PMID: 16191193 PMCID: PMC1262695 DOI: 10.1186/1471-2105-6-238
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Table 1. Overview of prokaryotes from several genera with their ratio of genes per transcription unit. Microorganisms that were used for model development (M) or validation (V) of the MIC- and the GSI-equation are marked in the list.
| Organism | Genes per transcription unit | M/V |
| | 1.6 | |
| | 1.6 | |
| | 1.6 | M |
| | 1.4 | |
| | 1.6 | |
| | 1.4 | |
| | 1.5 | |
| | 1.5 | V |
| | 1.5 | |
| | 1.7 | |
| | 1.7 | V |
| | 1.5 | |
| | 1.5 | V |
| | 1.8 | M |
| | 1.5 | |
| | 1.5 | |
| | 1.6 | M |
| | 1.6 | |
| | 1.6 | |
| | 1.6 | M |
| | 1.6 | |
| | 2.3 | M |
| | 2.7 | M |
| | 1.4 | V |
| | 1.5 | |
| | 1.5 | |
| | 1.5 | |
| | 1.5 | |
| | 1.6 | M |
| | 1.6 | |
| | 1.6 | |
| | 1.8 | M |
| | 1.8 | |
| | 1.6 | |
| | 1.6 | |
| | 2.0 | |
| | 1.5 | M |
| | 1.8 | |
| | 1.8 | |
| | 1.6 | M |
| | 1.8 | |
| | 2.1 | M |
| | 3.1 | V |
| | 1.6 | |
| | 2.1 | |
| | 1.7 | M |
| | 1.5 | |
| | 1.4 | |
| | 1.9 | |
| | 1.3 | V |
| | 2.0 | V |
| | 1.6 | |
| | 1.6 | |
| | 1.8 | |
| | 1.9 | |
| | 1.5 | |
| | 1.8 | M |
| | 1.6 | |
| Nostoc sp. PCC 7120 | 1.2 | |
| | 1.6 | |
| | 1.5 | V |
| | 2.1 | |
| | 3.0 | V |
| | 1.8 | M |
| | 2.0 | M |
| | 2.1 | |
| | 1.5 | |
| | 1.3 | V |
| | 1.3 | |
| | 2.1 | |
| | 2.0 | |
| | 1.6 | |
| | 1.7 | |
Figure 1. Schematic representation of the criteria that were applied to determine whether gene-specific information is generated by a specific insert. The upper line represents a genome fragment in which the block arrows represent genes. Arrows with a gray filling belong to the same transcription unit. The thinner lines represent possible locations of the inserts. The dashed lines represent inserts for which no gene-specific information can be generated, since they contain genomic material that possibly belongs to another transcription unit.
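The criteria illustrated in Figure 1 can be approximated in code. The following is a simplified sketch under explicit assumptions, not the authors' procedure: an insert is rejected as soon as it touches genes from more than one transcription unit, intergenic sequence is ignored, and a gene counts only if the insert overlaps it by at least a minimal number of base pairs (100 bp is used as the default, matching the value quoted for Figure 4). The `Gene` class, the function names, and the coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gene:
    name: str
    start: int  # first base of the gene (inclusive), in bp
    end: int    # one past the last base (exclusive), in bp
    unit: str   # transcription unit label (hypothetical identifier)


def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Length in bp of the intersection of two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def gene_specific_targets(insert_start: int, insert_end: int,
                          genes: List[Gene], min_overlap: int = 100) -> List[str]:
    """Names of genes for which the insert is treated as gene-specific.

    Assumption: touching genes of two or more transcription units
    disqualifies the insert entirely; intergenic regions are not modelled.
    """
    touched = [g for g in genes
               if overlap(insert_start, insert_end, g.start, g.end) > 0]
    if len({g.unit for g in touched}) != 1:
        return []
    return [g.name for g in touched
            if overlap(insert_start, insert_end, g.start, g.end) >= min_overlap]


if __name__ == "__main__":
    # Made-up coordinates: A and B share a transcription unit, C does not.
    genes = [Gene("A", 0, 1000, "u1"), Gene("B", 1100, 2000, "u1"),
             Gene("C", 2200, 3000, "u2")]
    print(gene_specific_targets(500, 1500, genes))   # insert within one unit
    print(gene_specific_targets(1800, 2500, genes))  # insert spanning two units
```

A stricter reading of the caption would also reject inserts that extend into sequence flanking a transcription unit. That would require unit boundaries, which this sketch does not model.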
Table 2. Overview of the variables that were used for the model datasets on which the MIC- and the GSI-equation are based. Multiple combinations of the listed values were applied.
| | 500; 1500; 2500; 3500; 4500; 5500; 6500; 7500; 8500; 9500 |
| | 100; 300; 500; 700; 900; 1100; 1300; 1500; 2100; 2700; 3000 |
| | 0.5; 1.5; 2.5; 3.5; 4.5; 5.5; 6.5; 7.5; 8.5; 9.5 |
| | 50; 100; 150; 200; 250; 300; 350; 400; 450 |
| | 10; 20; 30; 40; 50; 60; 70; 80; 90 |
Table 3. Values of the parameters in the MIC- and the GSI-equation.
| 4.85E-01 | 0.544 | 0 | |
| 2.54E-03 | * | * | |
| -1.51E-05 | -4.26E-08 | -3.05E-07 | |
| 1.27E-04 | 6.13E-05 | 1.46E-05 | |
| -5.22E-09 | 0 | -1.96E-09 | |
| -1.22E-01 | -7.84E-02 | -1.06E-02 | |
| 3.42E-03 | 3.31E-03 | 3.23E-04 | |
| 3.95E-04 | -5.36E-04 | 2.08E-04 | |
| -9.57E-08 | 9.73E-08 | -4.62E-08 | |
| -9.85E-06 | 1.69E-08 | 3.42E-08 | |
| -4.61E-07 | * | * | |
| 3.25E-04 | 2.55E-06 | 4.12E-06 | |
| -1.69E-08 | -2.22E-08 | 5.47E-09 | |
| 2.01E-05 | 2.42E-05 | -6.04E-06 | |
| 2.26E-06 | -1.76E-06 | 1.30E-06 | |
| 2.60E-11 | * | * |
*: This parameter is not present in the GSI-equation.
Table 4. Overview of the variables, and the values applied for them, in the datasets that were used to validate the MIC- and the GSI-equation. All possible combinations of the listed values were tested.
| N | 1000; 4000; 7000; 10000 |
| | 100; 500; 1000; 1500; 2000 |
| | 1; 3; 5; 7 |
| | 25; 50; 75 |
| | 100; 200 |
Figure 2. Histograms of the residuals from the validation of the MIC-equation (A) and the GSI-equation (B).
Table 5. Reliability of the MIC- and the GSI-equation, expressed as the fraction of predictions that differ by less than 0.01, 0.05 or 0.10 from the real values, for the validation sets defined in Table 4.
| Abs (Δ predicted vs. real) | Fraction for MIC-equation | Fraction for GSI-equation |
| < 0.01 | 0.19 | 0.24 |
| < 0.05 | 0.58 | 0.73 |
| < 0.10 | 0.87 | 0.95 |
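The reliability measure reported here, the fraction of predictions whose absolute deviation from the real value stays below a threshold, can be computed as in the minimal sketch below. The function name and the sample numbers are illustrative assumptions; the numbers are made up and are not data from the paper.

```python
from typing import Dict, Sequence


def fraction_within(predicted: Sequence[float], real: Sequence[float],
                    thresholds: Sequence[float] = (0.01, 0.05, 0.10)) -> Dict[float, float]:
    """For each threshold t, return the share of cases with |predicted - real| < t."""
    if len(predicted) != len(real) or len(predicted) == 0:
        raise ValueError("predicted and real must be non-empty and of equal length")
    residuals = [abs(p - r) for p, r in zip(predicted, real)]
    return {t: sum(d < t for d in residuals) / len(residuals) for t in thresholds}


if __name__ == "__main__":
    # Made-up predicted and real fractions of represented genes.
    predicted = [0.50, 0.62, 0.80, 0.95]
    real = [0.505, 0.60, 0.72, 0.80]
    print(fraction_within(predicted, real))
```

Because the thresholds are nested, the reported fractions can only stay equal or increase from one threshold to the next larger one, which matches the pattern in the table.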
Table 6. Effect of incorrect estimates of R on the fraction of predictions that differ by less than 0.01, 0.05 or 0.10 from the real values, for the validation set defined in Table 4.
| Applied value for R | Abs (Δ predicted vs. real) | Fraction |
| | < 0.01 | 0.24 |
| | < 0.05 | 0.73 |
| | < 0.10 | 0.95 |
| | < 0.01 | 0.17 |
| | < 0.05 | 0.65 |
| | < 0.10 | 0.95 |
| | < 0.01 | 0.24 |
| | < 0.05 | 0.75 |
| | < 0.10 | 0.94 |
| | < 0.01 | 0.12 |
| | < 0.05 | 0.55 |
| | < 0.10 | 0.90 |
| | < 0.01 | 0.23 |
| | < 0.05 | 0.69 |
| | < 0.10 | 0.91 |
| | < 0.01 | 0.10 |
| | < 0.05 | 0.38 |
| | < 0.10 | 0.81 |
| | < 0.01 | 0.21 |
| | < 0.05 | 0.59 |
| | < 0.10 | 0.88 |
Figure 3. Contour plots of the predicted fractions of represented genes with a minimal insert coverage of 25% (A), 50% (B), or 75% (C) as a function of the number of clones (N) and the insert size (IS) for a prokaryote with a genome size of 4 Mbp. Each contour line is labelled with the predicted fraction it represents.
Figure 4. Contour plot of the predicted fraction of represented genes for which gene-specific information could be generated, as a function of the number of clones (N) and the insert size (IS), for a prokaryote with a genome size of 4 Mbp, an average number of genes per transcription unit (R) of 1.8, and a minimal overlap between the insert and the gene of 100 bp. Each contour line is labelled with the predicted fraction it represents.