C H Cannon, C S Kua, E K Lobenhofer, P Hurban.
Abstract
Comparative genomics, using the model organism approach, has provided powerful insights into the structure and evolution of whole genomes. Unfortunately, only a small fraction of Earth's biodiversity will have its genome sequenced in the foreseeable future. Most wild organisms have radically different life histories and evolutionary genomics than current model systems. A novel technique is needed to expand comparative genomics to a wider range of organisms. Here, we describe a novel approach using an anonymous DNA microarray platform that gathers genomic samples of sequence variation from any organism. Oligonucleotide probe sequences placed on a custom 44 K array were 25 bp long and designed using a simple set of criteria to maximize their complexity and dispersion in sequence probability space. Using whole genomic samples from three known genomes (mouse, rat and human) and one unknown (Gonystylus bancanus), we demonstrate and validate its power, reliability, transitivity and sensitivity. Using two separate statistical analyses, a large number of genomic 'indicator' probes were discovered. The construction of a genomic signature database based upon this technique would allow virtual comparisons, and simple queries could generate optimal subsets of markers to be used in large-scale assays, using simple downstream techniques. Biologists from a wide range of fields, studying almost any organism, could efficiently perform genomic comparisons, at potentially any phylogenetic level, after performing a small number of standardized DNA microarray hybridizations. Possibilities for refining and expanding the approach are discussed.
Year: 2006 PMID: 17000641 PMCID: PMC1636412 DOI: 10.1093/nar/gkl478
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Affinity of four genomes to the SHyP array. (A) Experimental design for genomic comparisons. Each arrow indicates a direct hybridization experiment in which the indicated genome was labeled with Cy3 and compared to the other genome, labeled with Cy5. The rat genome was used in a self-self hybridization. Color-coding of genomes is consistent (B-E,G). Log[Average] hybridization intensity for each genomic comparison: (B) mouse-rat; (C) mouse-human; (D) rat-human; (E) rat–rat and (G) raminSk5-raminSu5. Only ‘indicator’ probes for at least one genome are shown, including ‘rodent’ as purple (see Materials and Methods). (F) The results from the direct comparison among the three known genomes are shown in the overlapping circles. The central number indicates the probes with apparent hybridization to all four genomes. The number of probes unique to each genome is shown in the outer arcs. The interstices show the number of probes common to both genomes. The small white circle in the rat circle represents the false positives observed in the rat–rat hybridization. (H) The ‘population-indicator’ probes are color-coded for the ramin species, while non-significant probes are shown in gray.
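The partition drawn in panel (F) can be sketched with plain set operations. The snippet below is a minimal illustration, not the authors' analysis: the function name `venn_counts` and the probe IDs are hypothetical, and it assumes each genome has already been reduced to the set of probes with detectable hybridization.

```python
def venn_counts(hits):
    """Count probes in each exclusive region of a Venn diagram.

    hits maps a genome name to the set of probe IDs that hybridized to it.
    Returns a dict keyed by the sorted tuple of genomes sharing a probe.
    """
    names = sorted(hits)
    counts = {}
    for probe in set().union(*hits.values()):
        # The region is defined by exactly which genomes contain this probe.
        region = tuple(name for name in names if probe in hits[name])
        counts[region] = counts.get(region, 0) + 1
    return counts


# Hypothetical probe IDs, for illustration only.
toy_hits = {
    "mouse": {1, 2, 3, 4},
    "rat": {2, 3, 5},
    "human": {3, 4, 6, 7},
}
regions = venn_counts(toy_hits)
```

Single-genome keys correspond to the outer arcs, two-genome keys to the interstices, and the key naming every genome to the central number.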
Effect of significance level on informative probes
| P-value | Measure | | | | | | |
|---|---|---|---|---|---|---|---|
| 0.01 | N | 3965 | 2243 | 897 | 877 | 105/303 | 914 |
| 0.01 | Id | 9.32 (1.5) | 9.59 (1.5) | 9.20 (1.9) | 9.73 (1.5) | 9.41 (1.6) / 9.3 (1.8) | 9.49 (1.5) |
| 0.01 | GC% | 52 | 52 | 53 | 52 | 54/53 | 52 |
| 1 × 10⁻⁵ | N | 2808 | 1303 | 516 | 734 | 43/138 | 117 |
| 1 × 10⁻⁵ | Id | 9.33 (1.5) | 9.66 (1.5) | 9.19 (1.9) | 9.81 (1.5) | 9.43 (1.6) / 9.16 (1.8) | 9.49 (1.5) |
| 1 × 10⁻⁵ | GC% | 52 | 52 | 53 | 52 | 53/52 | 52 |
| 1 × 10⁻⁸ | N | 1291 | 857 | 360 | 550 | 26/81 | 32 |
| 1 × 10⁻⁸ | Id | 9.35 (1.5) | 9.70 (1.5) | 9.20 (1.9) | 9.88 (1.5) | 9.45 (1.5) / 9.17 (1.8) | 9.40 (1.4) |
| 1 × 10⁻⁸ | GC% | 52 | 52 | 53 | 52 | 53/52 | 53 |
| 1 × 10⁻¹¹ | N | 870 | 616 | 266 | 383 | 14/53 | 6 |
| 1 × 10⁻¹¹ | Id | 9.37 (1.5) | 9.71 (1.5) | 9.20 (2.0) | 9.93 (1.5) | 9.66 (1.6) / 9.29 (1.8) | 9.40 (1.5) |
| 1 × 10⁻¹¹ | GC% | 52 | 52 | 53 | 52 | 53/52 | 53 |
P-values are shown at four levels, each three orders of magnitude more stringent than the last. These values are taken directly from Agilent's image analysis software. Values are shown at each level of significance for every genomic compartment: each individual genome, each ramin population, and the false positives, which were identified in the rat:rat hybridization. For each P-value, the table gives the number (N) of indicator probes in each sample, their average maximum sequence identity (Id) in bp (standard deviation in parentheses) and the GC content (GC%) of each set of informative probes.
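As a rough illustration of how the N and GC% rows could be tabulated, the sketch below filters a probe list at each P-value cut-off and averages GC content over the probes that remain. The probe sequences, P-values and the helper names `gc_percent` and `summarize_indicators` are hypothetical; the real P-values came from the image analysis software, and the Id measure is omitted because it requires sequence searches against each genome.

```python
from statistics import mean


def gc_percent(seq):
    """Percentage of G and C bases in a probe sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def summarize_indicators(probes, p_cutoff):
    """Summarize probes whose P-value is at or below p_cutoff.

    probes is a list of (sequence, p_value) pairs.
    """
    kept = [seq for seq, p in probes if p <= p_cutoff]
    if not kept:
        return {"N": 0, "GC%": None}
    return {"N": len(kept), "GC%": mean(gc_percent(s) for s in kept)}


# Hypothetical 25 bp probes with made-up P-values, for illustration only.
toy_probes = [
    ("GC" * 6 + "G" + "AT" * 6, 1e-12),                      # 52% GC
    ("AT" * 6 + "A" + "GC" * 6, 1e-6),                       # 48% GC
    ("G" * 5 + "C" * 5 + "A" * 5 + "T" * 5 + "G" * 5, 5e-3),  # 60% GC
]

# Same four significance levels as the table.
summary = {cutoff: summarize_indicators(toy_probes, cutoff)
           for cutoff in (1e-2, 1e-5, 1e-8, 1e-11)}
```

As in the table, N can only shrink as the cut-off becomes more stringent, while the mean GC% may move in either direction.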
Correlation among hybridizations for each genome
The values along the diagonal indicate the results produced from different comparisons for the same genome. The first value represents all probes, the second value includes low variance probes. The comparisons between genomes in the upper right half of the table indicate correlation values for low variance probes; all probes are shown in the lower left half.
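The diagonal entries compare the same genome across different hybridizations, once for all probes and once for low variance probes. A minimal sketch of that comparison follows, using Pearson correlation as an assumed measure (the record does not state which coefficient was used). The intensities, the variance threshold `max_var` and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def low_variance_mask(x, y, max_var):
    """Flag probes whose two measurements vary by at most max_var."""
    return [pvariance([a, b]) <= max_var for a, b in zip(x, y)]


# Hypothetical log-intensities for mouse in two different comparisons.
mouse_vs_rat = [2.0, 3.0, 4.0, 5.0, 6.0]
mouse_vs_human = [2.1, 2.9, 4.1, 7.0, 3.0]

mask = low_variance_mask(mouse_vs_rat, mouse_vs_human, max_var=0.05)
r_all = pearson(mouse_vs_rat, mouse_vs_human)
r_low = pearson(
    [a for a, keep in zip(mouse_vs_rat, mask) if keep],
    [b for b, keep in zip(mouse_vs_human, mask) if keep],
)
```

In this toy data, dropping the two inconsistent probes raises the correlation, which is the pattern the paired diagonal values are meant to expose.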
Figure 2. Affinity of four genomes with low variance SHyP probes. Genomes are color-coded as in Figure 1: mouse (red), rat (blue), human (green) and ramin (orange). Each color-coded column represents a specific class of indicator probes: all genomes (white, n = 224), mammal (light gray, n = 682), rodent (dark gray, n = 729), mouse (pink, n = 644), rat (light blue, n = 491), human (light green, n = 233) and ramin (light orange, n = 76). Each class of indicator probes is ordered from low to high average Log[intensity] across all genomes. The solid line indicates the cut-off for presence/absence of signal, as determined by our statistical analysis. Obviously homoplasious probes are indicated in the white space between the rat and human graphs. Homoplasious probes found in both human and ramin are beige.
Distribution of low variance probes across genomes and groups
‘Private’ probes indicate the number of sequences with significantly stronger hybridization intensity for that group, while ‘detectable’ probes were present but at a low level in other groups.
Figure 3. The distribution of some SHyP probes in known genomes. (A) The relationship between chromosome length and the number of BLAST hits observed using 3190 probe sequences, all with strong hybridization intensities. Each point represents a single chromosome, and the number of hits with ≥18 complementary base pairs is shown. (B) Human chromosome maps showing the accumulation of probes from end to end for each chromosome. Vertical breaks indicate an absence of probes in that segment while horizontal lines illustrate probe-rich regions.