Jennifer L Asimit, Yun Joo Yoo, Daryl Waggott, Lei Sun, Shelley B Bull.
Abstract
Due to the high dimensionality of single-nucleotide polymorphism (SNP) data, region-based methods are an attractive approach to the identification of genetic variation associated with a certain phenotype. A common approach to defining regions is to identify the most significant SNPs from a single-SNP association analysis, and then use a gene database to obtain a list of genes proximal to the identified SNPs. Alternatively, regions may be defined statistically, via a scan statistic. After categorizing SNPs as significant or not (based on the single-SNP association p-values), a scan statistic is useful to identify regions that contain more significant SNPs than expected by chance. Important features of this method are that regions are defined statistically, so that there is no dependence on a gene database, and both gene and inter-gene regions can be detected. In the analysis of blood-lipid phenotypes from the Framingham Heart Study (FHS), we compared statistically defined regions with those formed from the top single-SNP tests. Although we missed a number of single SNPs, we also identified many additional regions not found as SNP-database regions and avoided issues related to region definition. In addition, analyses of candidate genes for high-density lipoprotein, low-density lipoprotein, and triglyceride levels suggested that associations detected with region-based statistics are also found using the scan statistic approach.
Year: 2009 PMID: 20017993 PMCID: PMC2795900 DOI: 10.1186/1753-6561-3-s7-s127
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
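The scan statistic idea described in the abstract (flag each SNP as significant or not, then look for stretches with more flagged SNPs than chance would give) can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed window width, the threshold `alpha`, the function names, and the use of a simple binomial tail probability as the regional score are all assumptions made for illustration.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def scan_windows(pvalues, alpha=1e-2, width=5):
    """Slide a fixed-width window over SNPs ordered by position.

    Each SNP is flagged as significant when its single-SNP p-value is
    below `alpha` (hypothetical threshold). For every window, the score
    is the binomial probability of seeing at least the observed number
    of flagged SNPs, using the overall flagged proportion as the
    per-SNP rate (an illustrative null model, assumed here).
    """
    flags = [p < alpha for p in pvalues]
    rate = sum(flags) / len(flags)
    results = []
    for start in range(len(flags) - width + 1):
        count = sum(flags[start:start + width])
        results.append((start, count, binomial_tail(count, width, rate)))
    return results


def best_window(pvalues, alpha=1e-2, width=5):
    """Return (start index, flagged count, score) of the lowest-scoring window."""
    return min(scan_windows(pvalues, alpha, width), key=lambda r: r[2])


# Toy data: 24 SNPs with a cluster of four small p-values in the middle.
toy = [0.5] * 10 + [0.001] * 4 + [0.5] * 10
start, count, score = best_window(toy, alpha=0.01, width=4)
```

Because the windows are defined only by SNP order and significance flags, nothing in this sketch depends on gene annotation, which mirrors the point made in the abstract that both gene and inter-gene regions can be detected.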
Comparison of scan statistic regions with single-SNP tests for HDL having p-values < 10⁻⁴
| Single-SNP test | Scan statistic region: non-gene | Scan statistic region: gene | SNPs missed by scan statistic | Total no. SNPs |
|---|---|---|---|---|
| Inter-gene SNP | 29 | 18 | 172 | 219 |
| Within-gene SNP | 0 | 35 | 146 | 181 |
| Total no. SNPs | 29 | 53 | 318 | 400 |
Comparison of scan statistic regions with SNP-database regions defined from single-SNP tests for HDL having p-values < 10⁻⁴
| Scan-statistic region | SNP-database region: inter-gene | SNP-database region: within-gene | Regions detected only by scan statistic | Total no. regions |
|---|---|---|---|---|
| Non-gene scan statistic | 33 (8)ᵃ | 0 | 72 (12) | 105 (20) |
| Gene scan statistic | 10 (7) | 38 (17) | 87 (20) | 135 (44) |
| Total | 43 (15) | 38 (17) | 159 (32) | 240 (64) |
ᵃ Numbers in parentheses are counts for tests with genome-wide empirical p-values < 0.05.
Region-based tests of candidate genes for lipid phenotypes
| Lipid gene | Chr. | No. SNPs (No. PCs) | Gene-based: Global LR test | Gene-based: Schaid test | Gene-based: James min | Scan statistic: No. SNPs | Scan statistic: Region p-value | Scan statistic: GW rank | Scan statistic: Empirical GW p-valueᵇ |
|---|---|---|---|---|---|---|---|---|---|
| | 16 | 7 (3) | | | | 22 | 4.72 × 10⁻¹⁷ | 2 | |
| | 8 | 5 (3) | | | | 12 | 1.06 × 10⁻⁸ | 6 | |
| | 9 | 52 (14) | 0.15 | 1.12 × 10⁻³ | | 16 | 2.51 × 10⁻⁸ | 10 | |
| | 16 | 2 (2) | 0.36 | 0.15 | 0.45 | 22 | 4.72 × 10⁻¹⁷ | 2 | |
| | 10 | 47 (10) | 4.27 × 10⁻⁴ | 0.02 | | 6 | 6.15 × 10⁻⁴ | 197 | 0.31 |
| | 18 | 1 (1) | 0.29 | 0.29 | 0.29 | 39 | 7.81 × 10⁻²⁶ | 1 | |
| | 18 | 5 (2) | 0.67 | 0.42 | 0.61 | 39 | 7.81 × 10⁻²⁶ | 1 | |
| | 1 | 1 (1) | | | | 3 | 4.20 × 10⁻⁶ | 218ᶜ | |
| | 19 | 5 (2) | | | | 15 | 1.82 × 10⁻⁸ | 14 | |
| | 2 | 10 (4) | | | | 17 | 9.40 × 10⁻¹⁰ | 7 | |
| | 5 | 5 (2) | 5.52 × 10⁻⁴ | 1.38 × 10⁻³ | | NAᵈ | NA | NA | NA |
| | 19 | 1 (1) | 0.09 | 0.09 | 0.09 | 18 | 6.09 × 10⁻¹¹ | 3 | |
| | 7 | 3 (2) | | | | 7 | 4.64 × 10⁻¹⁰ | 106ᶜ | |
| | 8 | 5 (3) | | | | 24 | 1.27 × 10⁻¹⁶ | 3 | |
| | 2 | 4 (2) | | | | 6 | 5.51 × 10⁻⁶ | 40 | |
ᵃ For tests in regression analysis of principal components (PCs). p-Values < 2 × 10⁻⁴ are in bold.
ᵇ The empirical p-value is the number of permutation regions with p-values smaller than the observed regional p-value, divided by 10,000n, where n is 240 for HDL, 243 for LDL, or 204 for TG. p-Values < 0.05 are in bold.
ᶜ Rank from the scan statistic analysis using unpruned genotype data.
ᵈ NA indicates that the regional p-value was greater than the threshold 10⁻³.
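The empirical p-value defined in footnote b is a simple ratio, and a short sketch may make the arithmetic concrete. The function and argument names below are hypothetical, and the sketch assumes that the regional p-values from all permutations have been pooled into one list, which is one reading of the 10,000n denominator rather than a confirmed detail of the authors' procedure.

```python
def empirical_gw_pvalue(observed_p, permutation_region_pvalues,
                        n_regions, n_permutations=10_000):
    """Genome-wide empirical p-value as described in footnote b.

    Counts the permutation regions whose regional p-value is strictly
    smaller than the observed regional p-value, then divides by
    n_permutations * n_regions (10,000n in the footnote). Pooling all
    permutation regions into one list is an assumption of this sketch.
    """
    smaller = sum(1 for p in permutation_region_pvalues if p < observed_p)
    return smaller / (n_permutations * n_regions)


# Toy example with n = 240 (the value given for HDL) and made-up
# permutation p-values; only three of them fall below the observed value.
toy_permutation_pvalues = [1e-9, 5e-9, 2e-8, 3e-4, 0.02, 0.4]
result = empirical_gw_pvalue(1e-7, toy_permutation_pvalues, n_regions=240)
```

In this toy call, three permutation regions beat the observed regional p-value, so the result is 3 divided by 2,400,000.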