Robert C Culverhouse, Anthony L Hinrichs, Brian K Suarez.
Abstract
The unrelated individuals sample from Genetic Analysis Workshop 17 consists of a small number of subjects from eight population samples and genetic data composed mostly of rare variants. We compare two simple approaches to collapsing rare variants within genes for their utility in identifying genes that affect phenotype. We also compare results from stratified analyses to those from a pooled analysis that uses ethnicity as a covariate. We found that the two collapsing approaches were similarly effective in identifying genes that contain causative variants in these data. However, including population as a covariate was not an effective substitute for analyzing the subpopulations separately when only one subpopulation contained a rare variant linked to the phenotype.
Year: 2011 PMID: 22373399 PMCID: PMC3287824 DOI: 10.1186/1753-6561-5-S9-S101
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
Q1 single replicate results for combined populations (200 replicates, N = 697 each)
| Gene | Chromosome | Count (quantitative): median (p) | Count (quantitative): min, max | Indicator (dichotomous): median (p) | Indicator (dichotomous): min, max |
|---|---|---|---|---|---|
| | 13 | 3.1 × 10^-19 | 1.1 × 10^-29, 5.0 × 10^-10 | 9.7 × 10^-19 | 9.2 × 10^-30, 1.2 × 10^-10 |
| | 12 | 7.2 × 10^-10 | 5.0 × 10^-19, 1.8 × 10^-4 | 7.2 × 10^-10 | 5.0 × 10^-19, 1.8 × 10^-4 |
| | 12 | 8.1 × 10^-9 | 9.4 × 10^-16, 2.0 × 10^-2 | 3.0 × 10^-8 | 4.8 × 10^-15, 2.3 × 10^-2 |
| | 12 | 1.4 × 10^-8 | 8.5 × 10^-18, 1.7 × 10^-3 | 1.4 × 10^-9 | 8.5 × 10^-18, 1.7 × 10^-3 |
Linear regression results from using two collapsing approaches for rare variants in a gene. Analyses used population as a covariate. The median, minimum (min), and maximum (max) p-values are listed for each gene that passed our threshold of 10^-6 in at least 50% of the replicates.
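The two collapsing codings named in the table can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes each subject's rare variants in a gene are given as minor-allele counts (0, 1, or 2) per site, and it assumes the Count variable is the number of rare-variant sites at which the subject carries a minor allele (counting allele copies instead would be a different, equally plausible reading). The function names and the example genotypes are hypothetical.

```python
def collapse_count(genotypes):
    """Count (quantitative) coding.

    Assumption: the number of rare-variant sites in the gene at which
    the subject carries at least one minor allele.
    """
    return sum(1 for g in genotypes if g > 0)


def collapse_indicator(genotypes):
    """Indicator (dichotomous) coding: 1 if the subject carries any
    rare variant in the gene, otherwise 0."""
    return 1 if any(g > 0 for g in genotypes) else 0


# Hypothetical minor-allele counts at four rare-variant sites in one gene.
subjects = {
    "s1": [0, 0, 0, 0],  # carries no rare variant
    "s2": [1, 0, 0, 0],  # carries one rare variant
    "s3": [1, 0, 2, 0],  # carries rare variants at two sites
}

# Map each subject to (Count, Indicator).
coded = {
    sid: (collapse_count(g), collapse_indicator(g))
    for sid, g in subjects.items()
}
```

The two codings differ only for subjects who carry more than one rare variant in the gene (such as `s3` here). Because such subjects are uncommon when variants are rare, the two codings tend to give similar regression results.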
Figure 1. Comparison of Q1 results from two gene-wise collapsing methods for rare variants. Each data point represents the results of meta-analysis of a single gene using the first 50 data replicates. The horizontal position is −log(p) from the analysis based on the Indicator variable; the vertical position is based on the Count variable. The correlation between the two values is 0.92. If we eliminate the top four values, the correlation is still 0.86.
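The comparison in the figure pairs two −log(p) values per gene and reports their correlation. The sketch below shows one way such a number could be computed, under two assumptions the caption does not confirm: that the logarithm is base 10 and that the correlation is Pearson's. The p-values are made up for illustration and are not taken from the study.

```python
import math


def neg_log10(p):
    """Transform a p-value to -log10(p) (base 10 is an assumption)."""
    return -math.log10(p)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-gene p-values from the two codings (illustration only).
p_indicator = [1e-12, 3e-5, 0.02, 0.4, 0.7]
p_count = [5e-13, 1e-4, 0.05, 0.3, 0.9]

x = [neg_log10(p) for p in p_indicator]  # horizontal axis: Indicator
y = [neg_log10(p) for p in p_count]      # vertical axis: Count
r = pearson(x, y)
```

A few extreme points can dominate a correlation on this scale. Recomputing it after dropping the largest values, as the caption does with the top four, is a simple check against that.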
Q1 subpopulation results (20 meta-analyses, 10 replicates each)
| Subpopulation | N | Gene | Chromosome | Median (p) | Min, max |
|---|---|---|---|---|---|
| Europe^a | 1,560 | | 13 | 1.0 × 10^-45 | 1.1 × 10^-52, 4.6 × 10^-37 |
| | | | 4 | 1.9 × 10^-41 | 8.7 × 10^-58, 1.0 × 10^-32 |
| Asia^b | 3,210 | | | | |
| Yoruba^c | 1,120 | FLT1 | 13 | 9.5 × 10^-17 | 9.3 × 10^-25, 4.6 × 10^-13 |
| Luhya^d | 1,080 | | 6 | 4.2 × 10^-6 | 1.7 × 10^-14, 1.7 × 10^-2 |
Linear regression results using the Count coding for rare variants in a gene. The median, minimum (min), and maximum (max) p-values are listed for the top causative genes that passed our threshold of 10^-6 in at least 50% of the replicates, plus the top result from the Luhya sample, which does not quite meet the threshold.
a There were 78 lower ranked false positives (p < 10^-6). The highest ranked false positive had p < 10^-40.
b There were no genes that passed the significance threshold of 10^-6. Top gene had median p = 0.13.
c There were 23 lower ranked false positives (i.e., p < 10^-6). The highest ranked had p > 10^-14.
d No genes passed the significance threshold of 10^-6 in at least 50% of the replicates. The second ranked gene had p > 10^-3.
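The reporting rule used in the table captions (a gene is listed when its p-value falls below 10^-6 in at least 50% of the replicates, with the median, minimum, and maximum then reported) can be written as a small helper. This is only a sketch of that rule: the function name, the dictionary keys, and the example p-values are hypothetical, and it says nothing about how the per-replicate or meta-analysis p-values themselves were computed.

```python
from statistics import median

THRESHOLD = 1e-6  # significance threshold stated in the captions


def summarize(p_values, threshold=THRESHOLD, min_fraction=0.5):
    """Summarize one gene's p-values across replicates.

    Returns the median, min, and max p-value, plus a flag that is True
    when p < threshold in at least `min_fraction` of the replicates.
    """
    passing = sum(1 for p in p_values if p < threshold)
    return {
        "median": median(p_values),
        "min": min(p_values),
        "max": max(p_values),
        "reported": passing / len(p_values) >= min_fraction,
    }


# Hypothetical p-values for one gene across four replicates.
example = summarize([1e-9, 1e-7, 1e-3, 1e-8])

# A gene that passes in only one of three replicates is not reported.
weak = summarize([0.1, 0.2, 1e-7])
```

Under this rule a gene can be listed even when its maximum p-value is far above the threshold, which is why several rows in the tables show a maximum near 10^-2 or 10^-3.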