Albert Pallejà, Heiko Horn, Sabrina Eliasson, Lars Juhl Jensen.
Abstract
Genome-wide association studies (GWAS) have identified thousands of single nucleotide polymorphisms (SNPs) associated with the risk of hundreds of diseases. However, there is currently no database that enables non-specialists to answer the following simple questions: which SNPs associated with diseases are in linkage disequilibrium (LD) with a gene of interest? Which chromosomal regions have been associated with a given disease, and which are the potentially causal genes in each region? To answer these questions, we use data from the HapMap Project to partition each chromosome into so-called LD blocks, so that SNPs in LD with each other are preferentially in the same block, whereas SNPs not in LD are in different blocks. By projecting SNPs and genes onto LD blocks, the DistiLD database aims to increase usage of existing GWAS results by making it easy to query and visualize disease-associated SNPs and genes in their chromosomal context. The database is available at http://distild.jensenlab.org/.
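The projection step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that the LD blocks of one chromosome are contiguous and described by a sorted list of start coordinates, and that a gene overlapping several blocks is assigned to all of them; the function names and the coordinates are hypothetical and are not taken from the DistiLD implementation.

```python
from bisect import bisect_right

def block_of(position, block_starts):
    """Index of the block containing a position (block_starts must be sorted)."""
    return bisect_right(block_starts, position) - 1

def blocks_of_gene(start, end, block_starts):
    """Indices of all blocks overlapped by a gene spanning start..end (assumption)."""
    first = block_of(start, block_starts)
    last = block_of(end, block_starts)
    return list(range(first, last + 1))

# Hypothetical block start coordinates on one chromosome.
block_starts = [0, 100, 250]

snp_block = block_of(120, block_starts)             # SNP at position 120
gene_blocks = blocks_of_gene(90, 260, block_starts)  # gene spanning 90..260
print(snp_block, gene_blocks)
```

With SNPs and genes mapped to the same block indices, finding the genes that share a block with a disease-associated SNP reduces to a lookup on the block index.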
Year: 2011 PMID: 22058129 PMCID: PMC3245128 DOI: 10.1093/nar/gkr899
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Dividing chromosomes into LD blocks. The figure shows the results for a region of chromosome 19. (A) We first segment the chromosome into three classes based on the average D′ within a ±60 kb window: high-LD (black diamonds, D′ ≥ 0.6), moderate-LD (green squares, 0.4 ≤ D′ < 0.6) and low-LD segments (blue triangles, D′ < 0.4). The heatmap below the graph shows the D′ values between pairs of SNPs. (B) We subsequently determine the boundaries of LD blocks within moderate- and low-LD segments based on where the average D′ within a ±5 kb window drops to 0.5 or lower.
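The two steps in the caption can be sketched as follows. This is a simplified illustration, not the authors' code: it assumes that the window averages of D′ have already been computed for each position, and it interprets "drops to 0.5 or lower" as the first position at which the small-window average falls to the cutoff after having been above it. All names and example values are hypothetical.

```python
def classify_segment(avg_dprime, high=0.6, low=0.4):
    """Label a position by its large-window average D' (step A)."""
    if avg_dprime >= high:
        return "high"
    if avg_dprime >= low:
        return "moderate"
    return "low"

def find_boundaries(positions, large_avg, small_avg, cutoff=0.5):
    """Report candidate block boundaries (step B, under the stated assumption).

    A boundary is placed where the small-window average D' drops to the
    cutoff or lower, but only inside moderate- or low-LD segments.
    """
    boundaries = []
    previous_above = True
    for pos, big, small in zip(positions, large_avg, small_avg):
        in_candidate_segment = classify_segment(big) != "high"
        below = small <= cutoff
        if in_candidate_segment and below and previous_above:
            boundaries.append(pos)
        previous_above = not below
    return boundaries

# Hypothetical positions (kb) and precomputed window averages of D'.
positions = [10, 20, 30, 40, 50]
large_avg = [0.8, 0.55, 0.45, 0.3, 0.7]
small_avg = [0.9, 0.7, 0.5, 0.4, 0.9]

print([classify_segment(v) for v in large_avg])
print(find_boundaries(positions, large_avg, small_avg))
```

Restricting the boundary search to moderate- and low-LD segments means that a brief dip of the small-window average inside a high-LD segment does not split a block.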
Robustness analysis of the algorithm
| Large window (kb) | Small window (kb) | Average D′ thresholds | Number of LD blocks | Average size of LD blocks (kb) |
|---|---|---|---|---|
| ±50 | ±5 | ±0.10 | 37 856 | 80 |
| ±60 | ±4 | ±0.10 | 35 097 | 86 |
| ±60 | ±5 | ±0.05 | 38 532 | 79 |
| ±60 | ±5 | ±0.20 | 35 752 | 85 |
| ±60 | ±6 | ±0.10 | 41 332 | 73 |
| ±70 | ±5 | ±0.10 | 38 296 | 79 |
The table shows the number of LD blocks and their average size after running our algorithm with different window sizes and average D′ thresholds symmetric around D′ = 0.5. The thresholds are obtained by adding the quantity in the column "Average D′ thresholds" to 0.5 and by subtracting it from 0.5. The average size of the LD blocks changed by <8% when the window sizes and the average D′ thresholds were varied. The windows and thresholds finally selected for running the algorithm and the results obtained are in bold.
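As a small worked example of how the table can be read, the sketch below derives the lower and upper D′ thresholds from each offset and multiplies the number of blocks by the average block size. The helper names are hypothetical, and the calculation assumes that the average block size is given in kb. Under that assumption the product stays close to 3.0 Gb in every listed row, as expected if the blocks tile the same chromosomes in each run.

```python
# Rows copied from the robustness table:
# (large window kb, small window kb, threshold offset, number of blocks, average block size)
ROWS = [
    (50, 5, 0.10, 37856, 80),
    (60, 4, 0.10, 35097, 86),
    (60, 5, 0.05, 38532, 79),
    (60, 5, 0.20, 35752, 85),
    (60, 6, 0.10, 41332, 73),
    (70, 5, 0.10, 38296, 79),
]

def thresholds(offset, centre=0.5):
    """Return the (low, high) average-D' cutoffs symmetric around the centre."""
    return round(centre - offset, 2), round(centre + offset, 2)

def covered_length_gb(n_blocks, avg_size_kb):
    """Total length covered by the blocks, assuming sizes in kb (1 Gb = 1e6 kb)."""
    return n_blocks * avg_size_kb / 1e6

for large, small, offset, n_blocks, avg_size in ROWS:
    low, high = thresholds(offset)
    total = covered_length_gb(n_blocks, avg_size)
    print(f"±{large} kb / ±{small} kb: D' cutoffs {low}-{high}, about {total:.2f} Gb covered")
```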
Figure 2. The DistiLD web interface. The figure shows different steps when querying the database with the three genes IKZF1, ARID5B and CEBPE. (A) An intermediate page is shown where the user selects a disease or GWAS of interest. (B) The result page shows LD blocks containing SNPs associated with the selected disease or GWAS. If the query is a list of SNPs or genes, they will be highlighted in red. (C) A popup with further details on SNPs can be obtained by clicking on them. (D) Similarly, selecting a gene yields an information popup provided by the Reflect web resource (16).