Abstract
BACKGROUND: Current commercial high-density oligonucleotide microarrays can hold millions of probe spots on a single microscopic glass slide and are ideal for studying the transcriptome of microbial genomes using a tiling probe design. This paper describes a comprehensive computational pipeline implemented specifically for designing tiling probe sets to study microbial transcriptome profiles.
Year: 2010 PMID: 20144223 PMCID: PMC2836303 DOI: 10.1186/1471-2105-11-82
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Overview of the tiling array probe design pipeline.
Figure 2. Schematic of probe evaluation within selection windows. A) Collection of all qualified probe sequences within each window. All probe properties are collected for every probe sequence that starts within the same selection window. Probes closer than the minimal distance to already selected probes are excluded, and windows that already contain a selected probe because of repeated sequences are omitted. B) Selection of the highest quality probe within the window. Screening for the best probe within each selection window proceeds in order of cross-hybridization level, 1 through 4. Probe sequences of the same level are evaluated on the following criteria, in order: deviance from median Tm; probe self-annealing; BLAST percentage identity and identity stretch; and probe length and position.
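The ranking in Figure 2B can be read as a lexicographic sort key. The following is a minimal sketch of that reading, not the pipeline's actual code. The `Probe` fields, the function names, and the direction of each preference (a lower self-annealing score, a lower BLAST identity, a shorter identity stretch, a longer probe, and an earlier start are all treated as better) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Probe:
    """Hypothetical record of the properties collected for one candidate probe."""
    start: int             # start coordinate on the genome
    length: int            # probe length in nucleotides
    xhyb_level: int        # cross-hybridization level, 1 (best) to 4
    tm: float              # melting temperature
    self_anneal: float     # self-annealing score; lower = less self-annealing (assumed)
    blast_identity: float  # percent identity of the best off-target BLAST hit
    blast_stretch: int     # longest identical stretch in that hit


def rank_key(p: Probe, median_tm: float) -> tuple:
    """Lexicographic key: a smaller tuple means a better probe."""
    return (
        p.xhyb_level,             # level 1 is screened before level 2, and so on
        abs(p.tm - median_tm),    # deviance from median Tm
        p.self_anneal,            # probe self-annealing
        p.blast_identity,         # BLAST percentage identity
        p.blast_stretch,          # BLAST identity stretch
        -p.length,                # assumed: longer probes preferred
        p.start,                  # assumed: earlier position preferred
    )


def best_probe_in_window(
    candidates: Sequence[Probe],
    median_tm: float,
    selected_starts: Sequence[int],
    min_distance: int,
) -> Optional[Probe]:
    """Return the best qualified probe in one selection window, or None."""
    # Exclude candidates closer than min_distance to an already selected probe.
    qualified = [
        p for p in candidates
        if all(abs(p.start - s) >= min_distance for s in selected_starts)
    ]
    if not qualified:
        return None
    return min(qualified, key=lambda p: rank_key(p, median_tm))
```

Because the key is a tuple, a later criterion only matters when all earlier ones tie, which matches the ordered screening the caption describes.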
Figure 3. Schematic of the gap-filling. A) Collection of all qualified probe sequences within each gap. This involves removing probe sequences at the gap ends and other repeated probe sequences that lie within the minimal distance of already selected probes. B) Collection of the highest quality probe within each gap. In the first probe screening step, the single best probe from each gap is collected using the same ranking criteria as in Figure 2B. C) Selection of the best probe among all gaps. The best probes, one per gap, are ordered by descending gap size and then by the following criteria: BLAST percentage identity and stretch; deviance from median Tm; probe self-annealing; and probe length and position. The highest quality probe from the largest gap is selected. Every additional probe selected generates two new gaps, which are inserted in order among the existing gaps. The gap-filling is repeated until all probe spots on the target microarray are used.
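The loop in Figure 3C maps naturally onto a priority queue keyed by gap size. The sketch below illustrates that idea under simplifying assumptions and is not the pipeline's implementation: probes are treated as single coordinates, ties between equally sized gaps are broken by start coordinate rather than by the BLAST, Tm, and self-annealing criteria, and `midpoint_picker` is a hypothetical stand-in for the per-gap probe screening of Figure 3B.

```python
import heapq
from typing import Callable, List, Optional, Sequence, Tuple

Gap = Tuple[int, int]  # (start, end) coordinates of an uncovered region


def midpoint_picker(gap: Gap, min_size: int = 20) -> Optional[int]:
    """Hypothetical stand-in for per-gap screening: pick the midpoint,
    or nothing if the gap is too small to hold another probe."""
    start, end = gap
    if end - start < min_size:
        return None
    return (start + end) // 2


def fill_gaps(
    gaps: Sequence[Gap],
    best_probe: Callable[[Gap], Optional[int]],
    free_spots: int,
) -> List[int]:
    """Greedily place probes in the largest remaining gap until the
    free spots are used up or no gap can take another probe."""
    heap: List[Tuple[int, int, int, int]] = []

    def push(gap: Gap) -> None:
        start, end = gap
        if end <= start:
            return
        pos = best_probe(gap)
        if pos is not None:
            # heapq is a min-heap, so the size is negated to pop the largest gap first.
            heapq.heappush(heap, (-(end - start), start, end, pos))

    for gap in gaps:
        push(gap)

    chosen: List[int] = []
    while heap and len(chosen) < free_spots:
        _, start, end, pos = heapq.heappop(heap)
        chosen.append(pos)
        # Each selected probe splits its gap into two new gaps.
        push((start, pos))
        push((pos, end))
    return chosen


placed = fill_gaps([(0, 100), (200, 240)], midpoint_picker, free_spots=3)
```

With a heap, re-inserting the two new gaps costs logarithmic time per probe, which avoids re-sorting the full gap list after every selection.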
Summary statistics of designed probe sets for several microbial genomes of different sizes
| Species | | | | | | |
|---|---|---|---|---|---|---|
| Large, 13 Mbp | | Medium, 2.34 Mbp | | Small, 0.16 Mbp | | |
| 1620 K (12*135 K) | | 385 K | | 72 K (4*72 K) | | |
| 16 | | 12 | | 5 | | |
| 4 | | 3 | | 2 | | |
| 1620000 | | 385000 | | 72000 | | |
| 818857 | | 204991 | | 36069 | | |
| 818947 | | 205024 | | 35931 | | |
| 4 | 123253 | 3 | 42959 | 2 | 24802 | |
| 5 | 66143 | 4 | 22608 | 3 | 11222 | |
| 6 | 55731 | 5 | 21595 | 4 | 20978 | |
| 7 | 54928 | 6 | 20832 | 5 | 9846 | |
| 8 | 62274 | 7 | 21503 | 6 | 5152 | |
| 9 | 59254 | 8 | 25340 | |||
| 10 | 63218 | 9 | 25927 | |||
| 11 | 79015 | 10 | 28351 | |||
| 12 | 74532 | 11 | 39237 | |||
| 13 | 78092 | 12 | 24979 | |||
| 14 | 92854 | 13 | 21136 | |||
| 15 | 111273 | 14 | 19371 | |||
| 16 | 78459 | 15 | 17735 | |||
| 17 | 75716 | 16 | 16567 | |||
| 18 | 61562 | 17 | 17007 | |||
| 19 | 57317 | 18 | 16602 | |||
| 20 | 59414 | 19 | 12493 | |||
| 21 | 50445 | 20 | 10402 | |||
| 22 | 47169 | 21 | 5371 | |||
| 23 | 49842 | |||||
| 24 | 42842 | |||||
| 25 | 41557 | |||||
| 26 | 41786 | |||||
| 27 | 32553 | |||||
| 28 | 28835 | |||||
| 29 | 27753 | |||||
| 30 | 21987 | |||||
Pipeline run time for different probe design sets
| Genome size | ~0.16 Mbp | ~1.22 Mbp | ~2.34 Mbp | ~4.6 Mbp | ~13 Mbp |
|---|---|---|---|---|---|
| Characteristic | Small | Multiple | Oral pathogen | Large | |
| Probe number (NG format) | 72 K (4*72 K) | 192.5 K (385 K/2) | 385 K | 770 K (2*385 K) | 1620 K (12*135 K) |
| Probe size | 50 | 50 | 50 | 50 | 50 |
| Min. probe size | 45 | 45 | 45 | 45 | 48 |
| Step 1 | <1 min | <2 min | 2 min | 6 min | 16 min |
| Step 2 | <1 min | 8 min | 16 min | 31 min | 1 h 3 min |
| Step 3 (blastn, -W 7)* | 12 min | 1 h 43 min | 2 h 57 min | 6 h 37 min | 76 h |
| Step 4 (hybrid-ss-min)* | 6 min | 43 min | 69 min | 2 h 25 min | 8 h |
| Step 5,6 | <1 min | 7 min | 6 min | 11 min | 31 min |
*These steps run in parallel, so only the more time-consuming of the two is included in the total run time. Word size 7 is used for blastn.
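The footnote implies a simple rule for the total run time. The sketch below assumes the total is the sum of the sequential steps plus the longer of the two parallel steps (Step 3 and Step 4). That formula is an interpretation of the footnote, and the function names are illustrative. The inputs are the step times listed above for the ~2.34 Mbp and ~13 Mbp designs, converted to minutes.

```python
def total_minutes(step1: int, step2: int, step3: int, step4: int, step56: int) -> int:
    """Total run time in minutes, counting only the slower of the
    two parallel steps (Step 3 and Step 4)."""
    return step1 + step2 + max(step3, step4) + step56


def fmt(minutes: int) -> str:
    """Format minutes in the same 'H h M min' style as the table."""
    hours, mins = divmod(minutes, 60)
    return f"{hours} h {mins} min" if hours else f"{mins} min"


# ~2.34 Mbp design: 2 min, 16 min, 2 h 57 min, 69 min, 6 min
medium = total_minutes(2, 16, 2 * 60 + 57, 69, 6)

# ~13 Mbp design: 16 min, 1 h 3 min, 76 h, 8 h, 31 min
large = total_minutes(16, 63, 76 * 60, 8 * 60, 31)

print(fmt(medium))  # 3 h 21 min
print(fmt(large))   # 77 h 50 min
```

In both cases the blastn step dominates, so under this reading the total is only slightly longer than Step 3 alone.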