Stefan Gräf, Fiona G G Nielsen, Stefan Kurtz, Martijn A Huynen, Ewan Birney, Henk Stunnenberg, Paul Flicek.
Abstract
MOTIVATION: Recent advances in microarray technologies have made it feasible to interrogate whole genomes with tiling arrays, and this technique is rapidly becoming one of the most important high-throughput functional genomics assays. For large mammalian genomes, analyzing oligonucleotide tiling array data is complicated by the presence of non-unique sequences on the array, which increases the overall noise in the data and may lead to false positive results due to cross-hybridization. The ability to create custom microarrays using maskless array synthesis has led us to consider ways to optimize array design characteristics for improving data quality and analysis. We have identified a number of design parameters to be optimized, including uniqueness of the probe sequences within the whole genome, melting temperature and self-hybridization potential.
Year: 2007 PMID: 17646297 PMCID: PMC5892713 DOI: 10.1093/bioinformatics/btm200
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Design strategy. (A) The genomic sequence is subdivided into unit-sized windows. Within each window, all minimum unique substrings with length ≤ K are determined. These are the basis for the uniqueness scoring used to design optimized probes. (B) Uniqueness scoring function (exemplified by Mus musculus, chr17:3028401-3028500). The sequences shown represent all the minimum unique prefixes for a unit window. In each window of seed length h, the uniqueness score is calculated by counting the number of minimum unique substrings. For the windows shown, the uniqueness scores are 7, 9, 7, 4, 1, 0. The minimum unique substrings that add to the score are indicated by stars.
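The scoring idea in this caption can be illustrated with a toy sketch. It is not the authors' implementation: the function names `min_unique_prefix_lengths` and `uniqueness_score` are hypothetical, only the forward strand is considered, and the sketch assumes that a minimum unique substring adds to the score only when it lies entirely inside the window. A brute-force substring count like this is usable only on very short sequences.

```python
from collections import Counter


def min_unique_prefix_lengths(genome, max_len):
    """For each start position, return the length of the shortest substring
    (at most max_len long) that occurs exactly once in `genome`, or None.
    Brute force, forward strand only: an illustration, not a genome-scale method."""
    n = len(genome)
    # Count every substring of length 1..max_len at every start position.
    counts = Counter(
        genome[i:i + k]
        for k in range(1, max_len + 1)
        for i in range(n - k + 1)
    )
    lengths = []
    for i in range(n):
        found = None
        for k in range(1, min(max_len, n - i) + 1):
            if counts[genome[i:i + k]] == 1:
                found = k  # shortest unique substring starting at i
                break
        lengths.append(found)
    return lengths


def uniqueness_score(mup_lengths, start, h):
    """Count minimum unique substrings that start in [start, start + h)
    and end inside that same window (an assumption of this sketch)."""
    end = min(start + h, len(mup_lengths))
    return sum(
        1
        for i in range(start, end)
        if mup_lengths[i] is not None and i + mup_lengths[i] <= end
    )


if __name__ == "__main__":
    toy = "AAAB"
    mup = min_unique_prefix_lengths(toy, max_len=3)
    print(mup)
    print(uniqueness_score(mup, start=0, h=4))
```

In this toy sequence every position has a minimum unique substring, so a window covering the whole sequence scores one point per position, while shorter windows score less because some substrings run past the window end.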
Fig. 2. The distribution of U with respect to the number of genome-wide hybridization-quality BLAT alignments for a large set of 50mer probes. In the box-and-whiskers plot, the bold line represents the median value of U, and the outline of the box represents the first and third quartiles of the U distribution. Whiskers represent the largest and smallest values of U within 1.5 × IQR (interquartile range).
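The box-and-whiskers convention described in this caption can be computed as in the following sketch. The helper name `box_stats` and the sample values are hypothetical, and the quartile interpolation (`method="inclusive"`) is an assumption, since the caption does not say which quartile definition the plot used.

```python
import statistics


def box_stats(values):
    """Median, quartiles and whisker ends for one group of U values.
    Whiskers are the most extreme data points within 1.5 x IQR of the box."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


if __name__ == "__main__":
    # Made-up scores for probes sharing one alignment count.
    print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

Applying such a function separately to the probes in each alignment-count group would give one box per group, which is the layout the caption describes.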
Fig. 3. Probe selection algorithm.
Fig. 4. Density plots of optimized characteristics for our high-coverage and high-uniqueness tiling array designs and comparison to commercial whole-genome tiling arrays. (A) The full design uniqueness score per base, Tm distribution and the uniqueness score per base for the disjoint subsets represented by the non-repetitive and repetitive portions of the mouse genome for our high-coverage U>0 design containing 19 343 498 probes in the entire design, of which 10 565 728 probes are in regions not identified as repetitive and 8 777 770 probes are in repetitive regions; (B) The full design uniqueness score per base, Tm distribution and the uniqueness score per base for the disjoint subsets represented by the non-repetitive and repetitive portions of the mouse genome for our high-uniqueness U>15 design containing 15 658 735 probes in the entire design, of which 10 213 493 probes are in regions not identified as repetitive and 5 445 242 probes are in repetitive regions; (C) The full design uniqueness score per base and the Tm distribution for the NimbleGen 50mers in 100 bp windows whole-genome design containing 14 579 139 probes designed to the non-repetitive portion of the genome; and (D) The full design uniqueness score per base and the Tm distribution for the Affymetrix 25mers in 35 bp windows whole-genome design containing 38 346 501 probes designed to the non-repetitive portion of the genome. See Table 1 for additional design information.
Table 1. Coverage of the mouse genome, expressed as a percentage of the length of the genome assembly, for the base pair, window and region measures for various tiling array designs
| Design parameters | Base pair | Window | Region |
|---|---|---|---|
| | 35.1 | 73.2 | 78.1 |
| | 28.4 | 59.2 | 60.7 |
| NimbleGen | 27.6 | 55.1 | 52.7 |
| Affymetrix | 36.5 | 50.8 | 50.5 |
NimbleGen: design number C4527-SET-01.
Affymetrix: GeneChip© Mouse Tiling 2.0R Array Set.