Victor Guryev, Bart M G Smits, Jose van de Belt, Mark Verheul, Norbert Hubner, Edwin Cuppen.
Abstract
Genetic variation in genomes is organized in haplotype blocks, and species-specific block structure is defined by differential contribution of population history effects in combination with mutation and recombination events. Haplotype maps characterize the common patterns of linkage disequilibrium in populations and have important applications in the design and interpretation of genetic experiments. Although evolutionary processes are known to drive the selection of individual polymorphisms, their effect on haplotype block structure dynamics has not been shown. Here, we present a high-resolution haplotype map for a 5-megabase genomic region in the rat and compare it with the orthologous human and mouse segments. Although the size and fine structure of haplotype blocks are species dependent, there is a significant interspecies overlap in structure and a tendency for blocks to encompass complete genes. Extending these findings to the complete human genome using haplotype map phase I data reveals that linkage disequilibrium values are significantly higher for equally spaced positions in genic regions, including promoters, as compared to intergenic regions, indicating that a selective mechanism exists to maintain combinations of alleles within potentially interacting coding and regulatory regions. Although this characteristic may complicate the identification of causal polymorphisms underlying phenotypic traits, conservation of haplotype structure may be employed for the identification and characterization of functionally important genomic regions.
Year: 2006 PMID: 16895449 PMCID: PMC1523234 DOI: 10.1371/journal.pgen.0020121
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Data Used in Haplotype Block Analysis
Figure 1. Patterns of LD for Orthologous Genomic Segments of Approximately 5 Mb in Rat, Human, and Mouse
LD plots for orthologous genomic segments in rat (A), human (B), and mouse (C) are shown. For each panel, the following information is shown: LD plot (top), haplotype blocks in SNP coordinates (middle), and physical map and haplotype blocks in physical coordinates (bottom). The haplotype map has a gradient representation for |D′| values that assists visual comparison of haplotype structure. Haplotype blocks were built with stringent criteria, sometimes resulting in splitting of visually recognized blocks. Three characteristic haplotype blocks that are conserved cross-species have been color-coded and are discussed in the text. Similar plots for a second mouse set, two other human populations, and the combined human set are available as Figures S1 and S2.
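The |D′| values shaded in the LD plots can be illustrated with a minimal sketch, assuming phased haplotypes for two biallelic SNPs are available as pairs of 0/1 alleles. The function name `d_prime` and the input layout are hypothetical and chosen for illustration only; this is the standard Lewontin normalization, not necessarily the exact procedure or software used to build the figure.

```python
def d_prime(haplotypes):
    """Return |D'| for two biallelic SNPs.

    `haplotypes` is a list of (a, b) tuples, one per phased chromosome,
    where a and b are 0/1 alleles at the first and second SNP.
    (Hypothetical input layout, assumed for this sketch.)
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n    # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n    # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    # Raw disequilibrium coefficient D.
    d = p_ab - p_a * p_b

    # Largest |D| attainable given the allele frequencies and the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    if d_max == 0:
        raise ValueError("|D'| is undefined when either SNP is monomorphic")
    return abs(d) / d_max


# Only three of the four possible haplotypes occur, so |D'| is 1.
print(d_prime([(1, 1), (1, 1), (1, 0), (0, 0)]))
```

Under this definition, |D′| reaches 1 whenever at most three of the four possible two-SNP haplotypes are observed, which is one reason complete LD is common among closely spaced SNPs.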
Characteristics and Correlation of Haplotype Structures between Datasets
Figure 2. Comparison of the Haplotype Block Densities between Syntenic Regions of Rat, Mouse, and Human (Same Genome Segments as Shown in Figure 1)
The scatter plots show log10 of the number of haplotype blocks per 100-kb bin in rat (horizontal axis) against log10 of the number of haplotype blocks in the syntenic region of the mouse (A) and human (B) genome (vertical axis). Data points for gene-containing and intergenic genomic bins are shown as closed and open blocks, respectively. The observed correlations of haplotype block densities are significant in linear space (r = +0.5530, p < 0.0001 [A]; r = +0.4563, p = 0.0005 [B]) as well as in log-transformed space (r = +0.6795, p < 0.0001 [A]; r = +0.3209, p = 0.0180 [B]).
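The comparison of block densities in linear and log-transformed space can be sketched as follows, assuming per-bin block counts for two species are already aligned by syntenic 100-kb bin. The function names and the handling of empty bins (dropped before the log10 transform) are assumptions made for this illustration; the caption does not state how bins without blocks were treated, and p-values are not computed here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def block_density_correlations(counts_a, counts_b):
    """Return (r_linear, r_log) for per-bin haplotype block counts.

    counts_a[i] and counts_b[i] are block counts in the same syntenic bin
    for two species (hypothetical, pre-aligned input).
    """
    r_linear = pearson_r(counts_a, counts_b)

    # Assumption: bins with zero blocks in either species are excluded
    # from the log-space comparison because log10(0) is undefined.
    nonzero = [(a, b) for a, b in zip(counts_a, counts_b) if a > 0 and b > 0]
    log_a = [math.log10(a) for a, _ in nonzero]
    log_b = [math.log10(b) for _, b in nonzero]
    r_log = pearson_r(log_a, log_b)
    return r_linear, r_log
```

Reporting both coefficients guards against a few very dense bins dominating the linear correlation.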
Figure 3. Analysis of LD Decay for Functionally Different Segments of the Human Genome
(A) The graph shows average values of |D′| and their confidence limits (± standard deviation) as a function of the physical distance between SNPs for the following categories: (1) both SNPs reside in the same gene (blue line), (2) the SNPs reside in two different genes (green line), (3) both SNPs reside in the same intergenic region (red line), (4) one SNP resides in the gene and the other in the 30-kb upstream region of the same gene (purple line), and (5) one SNP resides in the gene and the other in the 30-kb downstream region of the same gene (gray line).
(B) Frequency distribution spectrum of |D′| values for SNP pairs at 100-kb distance. High |D′| values (>0.8) are overrepresented for equally spaced SNPs in a gene and its flanking regions as compared to intergenic regions.
(C) Frequency distribution of high LD values (|D′| > 0.5) for SNP pairs at 450-kb distance. Higher LD values are observed between a gene and its upstream region.
(D) Frequency distribution of high LD values (|D′| > 0.5) for SNP pairs at 650-kb distance. Higher LD values are observed between a gene and its upstream region.
SNP pairs with |D′| = 1 are placed in a separate bin in panels (B–D) because there is a considerable frequency bias toward this value. Similar graphs plotted for separate human populations are available as Figures S3, S4, and S5.
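The binning described for panels (B–D) can be sketched as a histogram that counts complete LD separately from the equal-width bins covering the rest of the range. The function name, the default of ten bins, and the input list are hypothetical; the bin width used in the published figure is not given in the caption.

```python
def dprime_histogram(values, n_bins=10):
    """Histogram of |D'| values with |D'| = 1 kept in its own bin.

    Returns (counts, n_complete), where counts[i] covers the interval
    [i / n_bins, (i + 1) / n_bins) and n_complete counts values equal to 1.
    """
    counts = [0] * n_bins
    n_complete = 0
    for v in values:
        if v >= 1.0:
            # Complete LD is tallied separately so that it does not
            # inflate the top regular bin.
            n_complete += 1
        else:
            # min() guards against floating-point rounding at the upper edge.
            counts[min(int(v * n_bins), n_bins - 1)] += 1
    return counts, n_complete


counts, n_complete = dprime_histogram([0.05, 0.55, 0.85, 0.95, 1.0, 1.0])
print(counts, n_complete)
```

Dividing each count by the total number of SNP pairs in a category would turn these tallies into the frequency distributions compared between genic and intergenic pairs.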