Mark A Levenstien, Robert J Klein.
Abstract
BACKGROUND: With the advent of cost-effective genotyping technologies, genome-wide association studies allow researchers to examine hundreds of thousands of single nucleotide polymorphisms (SNPs) for association with human disease. Recently, many researchers applying this strategy have detected strong associations between disease and SNP markers that are either not in linkage disequilibrium with any nonsynonymous SNP or located far from any annotated gene. In such cases, no well-established standard practice exists for selecting SNPs for follow-up studies. We aim to identify and prioritize groups of SNPs that are more likely to affect phenotypes in order to facilitate efficient SNP selection for follow-up studies.
Year: 2011 PMID: 21247465 PMCID: PMC3033802 DOI: 10.1186/1471-2105-12-26
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Representative classes considered in this study.
| class name | number of SNPs | size of region (kb) | SNP frequency (SNPs/kb) |
|---|---|---|---|
| coding | 132,562 | 34,215 | 3.87 |
| promoter | 104,439 | 24,844 | 4.20 |
| splice site | 41,932 | 11,621 | 3.61 |
| constrained elements | 152,158 | 71,977 | 2.11 |
| regulatory features core | 260,919 | 64,296 | 4.06 |
| regulatory features extended | 558,932 | 136,134 | 4.11 |
| cisRED | 9,491 | 2,815 | 3.37 |
| miRanda | 3,975 | 884 | 4.50 |
| non-coding RNA genes | 1,832 | 490 | 3.74 |
| ancestral repeats | 44,243 | 13,011 | 3.40 |
| genome | 11,307,522 | 3,022,647 | 3.74 |
SNP counts shown are for all SNPs in Ensembl.
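The SNP frequency column can be checked directly from the other two columns, since it is the SNP count divided by the region size in kb. The snippet below is a small sketch of that arithmetic for three rows copied from the table; the helper name `snp_density` and the comparison against the genome-wide value are illustrative additions, not part of the source.

```python
# Rows copied from the table above: (class name, SNP count, region size in kb).
classes = [
    ("coding", 132_562, 34_215),
    ("constrained elements", 152_158, 71_977),
    ("genome", 11_307_522, 3_022_647),
]


def snp_density(n_snps, region_kb):
    """SNPs per kilobase for one class (hypothetical helper name)."""
    return n_snps / region_kb


genome_density = snp_density(11_307_522, 3_022_647)

for name, n_snps, region_kb in classes:
    density = snp_density(n_snps, region_kb)
    # A ratio below 1 means the class carries fewer SNPs per kb than the genome.
    ratio = density / genome_density
    print(f"{name}: {density:.2f} SNPs/kb ({ratio:.2f}x genome)")
```

On these numbers, constrained elements stand out as the only listed class with a density well below the genome-wide value.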
Figure 1. Venn diagrams displaying the relative number of genome-wide SNPs in several classes. SNP markers are from Ensembl. A) Comparison of SNPs in the gene-centered annotations of promoter, coding, or splice control regions. B) Comparison between coding SNPs and evolutionarily constrained SNPs. C) Comparison between constrained elements and regulatory features. D) Comparison between constrained elements and genic regions.
Classes under negative selection compared to the genome.
| rank | class name | p-value | q-value |
|---|---|---|---|
| 1 | coding | <1 × 10⁻⁸ | 1.5 × 10⁻³ |
| 2 | nonsynonymous | <1 × 10⁻⁸ | 3.1 × 10⁻³ |
| 3 | constrained elements | <1 × 10⁻⁸ | 4.7 × 10⁻³ |
| 4 | constrained elements minus coding | <1 × 10⁻⁸ | 6.3 × 10⁻³ |
| 5 | constrained elements minus genes | <1 × 10⁻⁸ | 7.8 × 10⁻³ |
| 6 | constrained elements 1 kb from genes | <1 × 10⁻⁸ | 9.4 × 10⁻³ |
| 7 | regulatory features extended | <1 × 10⁻⁸ | 1.1 × 10⁻² |
| 8 | H3K36me3 | <1 × 10⁻⁸ | 1.3 × 10⁻² |
| 9 | H3K79me3 | <1 × 10⁻⁸ | 1.4 × 10⁻² |
| 10 | constrained elements 100 kb from genes | 1.0 × 10⁻⁸ | 1.6 × 10⁻² |
| 11 | splice site | 8.0 × 10⁻⁴ | 1.7 × 10⁻² |
| 12 | DnaseI | 4.5 × 10⁻³ | 1.9 × 10⁻² |
| 13 | H3K4me3 | 5.1 × 10⁻³ | 2.0 × 10⁻² |
| 14 | H3K4me2 | 8.6 × 10⁻³ | 2.2 × 10⁻² |
| 15 | PolII | 1.1 × 10⁻² | 2.3 × 10⁻² |
| 16 | miRanda | 1.5 × 10⁻² | 2.5 × 10⁻² |
| 17 | cisRED | 2.4 × 10⁻² | 2.7 × 10⁻² |
Classes with a statistically significant excess of low derived alleles when compared to the genome as a whole are shown. In order to adjust for the multiplicity of testing, we apply an FDR correction with α = 0.05. Only resequenced Perlegen SNP markers are included in this analysis to minimize ascertainment bias. For our comparisons, we rely on allele frequencies present in the AFR Perlegen population.
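The FDR correction mentioned above can be illustrated with the Benjamini-Hochberg step-up procedure. This is a minimal sketch under stated assumptions: the source says only that an FDR correction with α = 0.05 was applied, and names neither the procedure nor the total number of tests. The last column of the table is numerically close to rank × α / m with m = 32, so that value is used here as an inferred example. The helper names `bh_thresholds` and `bh_count_significant`, and the p-values in the usage lines, are hypothetical.

```python
def bh_thresholds(m, alpha=0.05):
    """Critical value (i / m) * alpha for each rank i = 1..m."""
    return [i * alpha / m for i in range(1, m + 1)]


def bh_count_significant(pvalues, alpha=0.05):
    """Number of smallest p-values declared significant by the step-up rule.

    Assumes `pvalues` holds every test in the family, so m = len(pvalues).
    """
    ranked = sorted(pvalues)
    m = len(ranked)
    k = 0
    for i, p in enumerate(ranked, start=1):
        if p <= i * alpha / m:
            k = i  # step-up: keep the largest rank whose p-value passes
    return k


# Thresholds for the first three ranks, assuming m = 32 tests (inferred).
print([f"{t:.2e}" for t in bh_thresholds(32)[:3]])

# Made-up p-values, used only to exercise the step-up rule.
print(bh_count_significant([0.001, 0.02, 0.03, 0.5]))
```

The step-up rule keeps every rank up to the largest one that passes, so a p-value can be declared significant even if it exceeds its own threshold, provided a higher-ranked p-value passes.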
Classes under negative selection compared to ancestral repeats.
| rank | class name | p-value | q-value |
|---|---|---|---|
| 1 | | 5.0 × 10⁻⁸ | 1.1 × 10⁻³ |
| 2 | ***constrained elements*** | 3.6 × 10⁻⁵ | 2.3 × 10⁻³ |
| 3 | | 1.4 × 10⁻⁴ | 3.4 × 10⁻³ |
| 4 | | 1.2 × 10⁻³ | 4.5 × 10⁻³ |
| 5 | ***constrained elements minus genes*** | 3.0 × 10⁻³ | 5.7 × 10⁻³ |
| 6 | ***constrained elements 1 kb from genes*** | 3.2 × 10⁻³ | 6.8 × 10⁻³ |
| 7 | H3K79me3 | 8.0 × 10⁻³ | 7.9 × 10⁻³ |
| 8 | constrained elements 100 kb from genes | 1.1 × 10⁻² | 9.0 × 10⁻³ |
| 9 | miRanda | 3.4 × 10⁻² | 1.0 × 10⁻² |
| 10 | H3K36me3 | 4.0 × 10⁻² | 1.1 × 10⁻² |
| 11 | PolII | 6.3 × 10⁻² | 1.3 × 10⁻² |
| 12 | H3K4me2 | 7.1 × 10⁻² | 1.4 × 10⁻² |
| 13 | cisRED | 1.0 × 10⁻¹ | 1.5 × 10⁻² |
Classes with an excess of low derived alleles when compared to the ancestral repeats are shown. Bolded, italicized classes are statistically significant when we apply an FDR correction with α = 0.05. Only resequenced Perlegen SNP markers are included in this analysis to minimize ascertainment bias. For our comparisons, we rely on allele frequencies present in the AFR Perlegen population.
P-values and FDR adjusted p-values for the analyses involving regulatory attributes H3K79me3 and H3K36me3.
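| | H3K79me3 vs. ancestral repeats | H3K79me3 vs. genome | H3K36me3 vs. ancestral repeats | H3K36me3 vs. genome |
|---|---|---|---|---|
| p-value (rank) | 8.00 × 10⁻³ (1) | 1.00 × 10⁻⁸ (1) | 4.00 × 10⁻² (1) | 1.00 × 10⁻⁸ (1) |
| FDR adjusted (rank) | 5.03 × 10⁻² (1) | 4.40 × 10⁻⁷ (1) | 1.76 × 10⁻¹ (1) | 4.40 × 10⁻⁷ (1) |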
Ranks for each "real" p-value are relative to a set of "generated" p-values produced by performing tests of evidence of selection on sets of SNPs in random genomic regions comparable, in size and proximity to annotated genes, to the regulatory feature under consideration.
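The ranking described in this note can be sketched as a simple count: the "real" p-value is placed among the "generated" p-values, and rank 1 means no random region set gave a smaller p-value. The sketch below assumes that strict inequality is used for ties and uses made-up generated p-values; the source does not give the sampling procedure, the number of random sets, or the tie rule. `empirical_rank` is a hypothetical helper name.

```python
def empirical_rank(real_p, generated_ps):
    """1-based rank of `real_p` among itself and the generated p-values.

    Rank 1 means that no generated p-value is strictly smaller.
    """
    return 1 + sum(1 for p in generated_ps if p < real_p)


# Made-up p-values standing in for tests on random, size-matched regions.
generated = [0.40, 0.05, 0.90, 0.20]

# Real p-value taken from the table (H3K79me3 vs. ancestral repeats).
print(empirical_rank(8.0e-3, generated))
```

A rank of 1 out of N + 1 values gives an empirical p-value no smaller than 1 / (N + 1), so the resolution of this check depends on how many random region sets are generated.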
Figure 2. Histograms illustrating the distribution of the derived allele frequencies for different SNP classes. Data for all three Perlegen populations (AFR, EUR, and CHN) are presented for the genome as a whole (green), ancestral repeats (red), and various SNP classes of functional significance (blue). Only resequenced Perlegen SNP markers are included in this analysis to minimize ascertainment bias. A) Evolutionarily constrained elements. B) Constrained elements excluding genic regions. C) Constrained elements excluding regions closer than 100 kb to an annotated gene. D) H3K79me3 regions.
Comparison of the distribution of derived allele frequencies (DAF) for SNPs within several classes.
| class | Perlegen population | class DAF median | ancestral repeats DAF median | p-value vs. ancestral repeats | genome DAF median | p-value vs. genome |
|---|---|---|---|---|---|---|
| constrained elements | AFR | 0.174 | 0.196 | 3.6 × 10⁻⁵ | 0.205 | <1 × 10⁻⁸ |
| | EUR | 0.188 | 0.208 | 3.4 × 10⁻⁴ | 0.229 | <1 × 10⁻⁸ |
| | CHN | 0.174 | 0.208 | 8.7 × 10⁻³ | 0.217 | <1 × 10⁻⁸ |
| constrained elements 1 kb from genes | AFR | 0.174 | 0.196 | 3.2 × 10⁻³ | 0.200 | <1 × 10⁻⁸ |
| | EUR | 0.190 | 0.208 | 5.4 × 10⁻³ | 0.229 | <1 × 10⁻⁸ |
| | CHN | 0.188 | 0.208 | 6.0 × 10⁻² | 0.217 | <1 × 10⁻⁸ |
| constrained elements 100 kb from genes | AFR | 0.174 | 0.196 | 1.1 × 10⁻² | 0.200 | 1.0 × 10⁻⁸ |
| | EUR | 0.205 | 0.208 | 9.6 × 10⁻³ | 0.229 | 1.1 × 10⁻⁶ |
| | CHN | 0.188 | 0.208 | 8.3 × 10⁻² | 0.217 | 8.5 × 10⁻⁷ |
| constrained elements outside of genes | AFR | 0.174 | 0.196 | 3.0 × 10⁻³ | 0.196 | <1 × 10⁻⁸ |
| | EUR | 0.188 | 0.208 | 5.3 × 10⁻³ | 0.229 | <1 × 10⁻⁸ |
| | CHN | 0.188 | 0.208 | 6.1 × 10⁻² | 0.217 | <1 × 10⁻⁸ |
| H3K79me3 | AFR | 0.174 | 0.196 | 8.0 × 10⁻³ | 0.200 | <1 × 10⁻⁸ |
| | EUR | 0.208 | 0.208 | 1.6 × 10⁻² | 0.229 | 1.7 × 10⁻⁷ |
| | CHN | 0.188 | 0.202 | 1.2 × 10⁻¹ | 0.217 | 5.0 × 10⁻⁸ |
Classes presented are 1) constrained elements, 2) constrained elements at least 1 kb from the closest gene, 3) constrained elements at least 100 kb from the closest gene, 4) constrained elements outside of genes, and 5) H3K79me3 regulatory attributes, each compared with the genome as a whole and with ancestral repeats. We perform a Mann-Whitney U-test to compare the DAF distribution for SNPs in constrained elements, constrained elements outside of genes, and H3K79me3 regulatory attributes with that of the genome and that of ancestral repeats. We list the resulting p-values as well as the median DAF for each class.
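The Mann-Whitney U-test named above can be sketched with the standard library alone. The version below is an illustration, not the authors' implementation: it assumes a one-sided alternative (class DAFs tend to be lower than the reference DAFs), a normal approximation with tie and continuity corrections, and made-up DAF values. The source does not state whether the test was one- or two-sided. `mann_whitney_less` is a hypothetical helper name.

```python
import math


def _average_ranks(values):
    """1-based ranks with ties given their average rank, plus tie-group sizes."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def mann_whitney_less(x, y):
    """One-sided test that x tends to be smaller than y. Returns (U, p)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, ties = _average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # pairs with x > y, ties count 0.5
    tie_term = sum(t**3 - t for t in ties) / (n * (n - 1))
    var = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var == 0:  # every value identical: no evidence either way
        return u, 1.0
    z = (u - n1 * n2 / 2 + 0.5) / math.sqrt(var)  # continuity correction
    return u, 0.5 * math.erfc(-z / math.sqrt(2))  # normal CDF at z


# Made-up derived allele frequencies, for illustration only.
class_daf = [0.05, 0.08, 0.10, 0.17, 0.21, 0.30]
reference_daf = [0.12, 0.19, 0.22, 0.35, 0.41, 0.60]
print(mann_whitney_less(class_daf, reference_daf))
```

For samples as large as the SNP classes in the table, the normal approximation is the usual choice. With only a handful of values, as in this toy input, an exact test would be more appropriate.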