Jeffrey A Longmate, Garrett P Larson, Theodore G Krontiris, Steve S Sommer.
Abstract
We describe three statistical results that we have found to be useful in case-control genetic association testing. All three involve combining the discovery of novel genetic variants, usually by sequencing, with genotyping methods that recognize previously discovered variants. We first consider expanding the list of known variants by concentrating variant-discovery in cases. Although the naive inclusion of cases-only sequencing data would create a bias, we show that some sequencing data may be retained, even if controls are not sequenced. Furthermore, for alleles of intermediate frequency, cases-only sequencing with bias-correction entails little if any loss of power, compared to dividing the same sequencing effort among cases and controls. Secondly, we investigate more strongly focused variant discovery to obtain a greater enrichment for disease-related variants. We show how case status, family history, and marker sharing enrich the discovery set by increments that are multiplicative with penetrance, enabling the preferential discovery of high-penetrance variants. A third result applies when sequencing is the primary means of counting alleles in both cases and controls, but a supplementary pooled genotyping sample is used to identify the variants that are very rare. We show that this raises no validity issues, and we evaluate a less expensive and more adaptive approach to judging rarity, based on group-specific variants. We demonstrate the important and unusual caveat that this method requires equal sample sizes for validity. These three results can be used to more efficiently detect the association of rare genetic variants with disease.
Year: 2010 PMID: 21187953 PMCID: PMC3004857 DOI: 10.1371/journal.pone.0014318
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
ATM exon 24 allele counts from 66 independent breast cancer cases and 126 unrelated controls.
| | | Exon 24 Alleles | | Individuals | |
| Disease Status | Genotyping Method | C | G | variant/n | Percent |
| Case | Sequencing | 12 | 2 | 2/7 (1/6) | 29 (17) |
| Case | | 111 | 7 | 7/59 | 12 |
| Control | | 248 | 4 | 4/126 | 3 |
The table summarizes data from Larson et al. [13]. The seven sequenced cases shared an intronic marker at ATM, as well as a rare HRAS allele, with their affected sibling. Omitting the first occurrence of the variant among sequenced cases (in parentheses) permits comparing a pooled detection rate of 8/65 in cases to 4/126 in controls (by Fisher's exact test, one-sided).
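The pooled comparison described in this note can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the one-sided Fisher's exact test is the upper hypergeometric tail for the number of variant carriers among cases, and the function name `fisher_one_sided` is our own.

```python
import math

def fisher_one_sided(case_hits, n_cases, control_hits, n_controls):
    """Upper-tail hypergeometric p-value: probability that at least
    `case_hits` of all variant carriers fall among the cases."""
    total = n_cases + n_controls
    hits = case_hits + control_hits
    upper = min(hits, n_cases)
    tail = sum(
        math.comb(hits, k) * math.comb(total - hits, n_cases - k)
        for k in range(case_hits, upper + 1)
    )
    return tail / math.comb(total, n_cases)

# Individual-level counts from the table above.
seq_hits, seq_n = 2, 7          # sequenced cases
geno_hits, geno_n = 7, 59       # genotyped cases
control_hits, control_n = 4, 126

# Naive pooling keeps the occurrence that led to the variant's discovery.
naive_hits, naive_n = seq_hits + geno_hits, seq_n + geno_n   # 9/66

# Correction: drop the first (discovery) occurrence among sequenced cases.
corrected_hits, corrected_n = naive_hits - 1, naive_n - 1    # 8/65

p_naive = fisher_one_sided(naive_hits, naive_n, control_hits, control_n)
p_corrected = fisher_one_sided(corrected_hits, corrected_n, control_hits, control_n)
print(f"naive p = {p_naive:.4f}, corrected p = {p_corrected:.4f}")
```

Removing the discovery occurrence can only make the one-sided p-value larger, which is the direction of the bias the correction is meant to remove.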
Test size and power using detection in subsets.
| Scenario | | | | Power (nominal 0.05 level) | | | | Detected (mid 50%) | |
| RR | Rare | Freq | Seq | Naive | Corrected | Balanced | Complete | Cases only | Balanced |
| 1 | 20 | .2 | 50 | .08 | .05 | .04 | .04 | (7, 9) | (7, 9) |
| 1 | 40 | .2 | 50 | .15 | .04 | .03 | .04 | (7, 10) | (7, 10) |
| 1 | 40 | .1 | 50 | .12 | .03 | .03 | .04 | (3, 6) | (3, 6) |
| 1 | 40 | .05 | 100 | .15 | .01 | .02 | .03 | (3, 6) | (3, 6) |
| 2.5 | 40 | .2 | 50 | NA | .92 | .89 | 1.00 | (14, 17) | (11, 14) |
| 2.5 | 40 | .1 | 50 | NA | .60 | .58 | 1.00 | (8, 11) | (6, 9) |
| 2.5 | 40 | .1 | 100 | NA | .80 | .85 | 1.00 | (15, 19) | (11, 15) |
| 5 | 100 | .05 | 100 | NA | .83 | .89 | 1.00 | (16, 21) | (10, 14) |
| 5 | 100 | .05 | | NA | .07 | .63 | .92 | (16, 21) | (10, 14) |
In the last line, the numbers of cases and controls are reduced to 100, so sequencing exhausts the cases.
For each line except the last, 500 cases and 500 controls are generated in 5,000 simulated samples to estimate test size or power for a nominal 0.05-level test comparing the collective frequency of rare alleles. In each scenario, the baseline disease rate is 1%, so a relative risk (RR) of 2.5 implies a penetrance of 2.5%. Rare is the number of unknown rare alleles in the population, all assumed to have the same frequency and penetrance. Freq is the total frequency of all rare alleles (e.g. 20 rare alleles with a combined frequency of 0.2 imply a frequency of 0.01 each). We make the simplifying assumption that rare alleles are mutually exclusive. Seq is the total number sequenced, either concentrated in cases or equally divided (balanced) among cases and controls. All four power columns are based on p-values from Fisher's exact test. The first three count the number of cases and controls with any of the rare alleles detected among the individuals that are sequenced. In the Naive and Corrected columns, all sequences are from cases, but the number of detected distinct rare alleles is subtracted from the case count in the ‘Corrected’ column. Balanced indicates that the individuals sequenced for allele detection were equally divided between cases and controls. Complete denotes the test based on sequencing all cases and all controls, a much larger sequencing effort. The parenthetic numbers indicate the 25th and 75th percentiles of the number of rare alleles detected in the cases-only and balanced detection strategies.
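The naive and corrected cases-only tests can be sketched as a small simulation. This is an illustration under stated assumptions, not the authors' simulation code: each individual is assumed to carry at most one rare allele, with carrier probability equal to Freq in controls and reweighted by RR in cases; the test is taken to be one-sided; the correction subtracts the number of distinct detected alleles from both the case carrier count and the case total, mirroring the 8/65 adjustment in the ATM example; and all function and parameter names are hypothetical.

```python
import math
import random

def fisher_one_sided(case_hits, n_cases, control_hits, n_controls):
    """Upper-tail hypergeometric p-value for an excess of carriers in cases."""
    total = n_cases + n_controls
    hits = case_hits + control_hits
    tail = sum(
        math.comb(hits, k) * math.comb(total - hits, n_cases - k)
        for k in range(case_hits, min(hits, n_cases) + 1)
    )
    return tail / math.comb(total, n_cases)

def draw_alleles(n, n_rare, carrier_prob, rng):
    """Assumption: each person carries one rare allele (0..n_rare-1) or None."""
    return [
        rng.randrange(n_rare) if rng.random() < carrier_prob else None
        for _ in range(n)
    ]

def simulate(n_sims=100, n_each=500, n_rare=20, freq=0.2,
             n_seq=50, rr=1.0, alpha=0.05, seed=1):
    """Return (naive, corrected) rejection rates for cases-only discovery."""
    rng = random.Random(seed)
    # Carrier probability among cases, by Bayes' rule with relative risk rr.
    case_prob = freq * rr / (freq * rr + (1.0 - freq))
    naive_rej = corrected_rej = 0
    for _ in range(n_sims):
        cases = draw_alleles(n_each, n_rare, case_prob, rng)
        controls = draw_alleles(n_each, n_rare, freq, rng)
        # Variant discovery: distinct alleles seen in the sequenced cases.
        detected = {a for a in cases[:n_seq] if a is not None}
        d = len(detected)
        case_hits = sum(a in detected for a in cases)
        control_hits = sum(a in detected for a in controls)
        if fisher_one_sided(case_hits, n_each, control_hits, n_each) < alpha:
            naive_rej += 1
        # Correction: remove one discovery occurrence per detected allele.
        if fisher_one_sided(case_hits - d, n_each - d, control_hits, n_each) < alpha:
            corrected_rej += 1
    return naive_rej / n_sims, corrected_rej / n_sims

naive_rate, corrected_rate = simulate()
print(f"null rejection rates: naive={naive_rate:.3f}, corrected={corrected_rate:.3f}")
```

With the default `rr=1.0` the rates estimate test size; passing a larger `rr` estimates power instead. The run uses far fewer replicates than the 5,000 in the table, so its estimates are noisy.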
Detection Probabilities for High-Risk Variants.
| Selection | | | | |
| One risk locus | | | | |
| All Individuals | 0.010 | 0.010 | 0.010 | 0.010 |
| Cases | 0.015 | 0.020 | 0.029 | 0.048 |
| w/affected sibs | 0.019 | 0.029 | 0.057 | 0.130 |
| sharing 1 or 2 IBD | 0.020 | 0.032 | 0.066 | 0.155 |
| sharing 2 IBD | 0.022 | 0.039 | 0.083 | 0.201 |
| With nuisance locus | | | | |
| All Individuals | 0.010 | 0.010 | 0.010 | 0.010 |
| Cases | 0.014 | 0.019 | 0.028 | 0.045 |
| w/affected sibs | 0.017 | 0.026 | 0.049 | 0.109 |
| sharing 1 or 2 IBD | 0.018 | 0.029 | 0.056 | 0.130 |
| sharing 2 IBD | 0.020 | 0.034 | 0.070 | 0.169 |
In each scenario there is a sporadic disease rate of 1%, and the high-risk allele of interest elevates the disease risk by a factor that varies across columns. The rows represent increasingly restrictive sampling rules, and the tabulated value is the probability that the high-risk allele is present (one or two copies) in a sampled proband. In the upper half of the table, risk depends on only one locus. In the lower half, there is also a nuisance locus whose allele has a 5 percent frequency, i.e. 10 times as common as the allele of interest, and whose effect is additive with that of the allele of interest.
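The enrichment from sampling cases rather than the general population follows from Bayes' rule, and a minimal sketch is given here for the single-locus setting only. It rests on assumptions that the caption does not state outright: carriers of one or two copies share the same elevated risk, genotypes are in Hardy-Weinberg proportions, and the allele frequency is 0.005 (inferred from the nuisance allele at 5 percent being 10 times as common). The risk factors in the loop are illustrative values, not the table's column headers, and the function names are hypothetical. The family-based rows are not covered.

```python
def carrier_prob(allele_freq):
    """Probability of carrying one or two copies, assuming Hardy-Weinberg."""
    return 1.0 - (1.0 - allele_freq) ** 2

def carrier_prob_given_case(allele_freq, risk_factor):
    """P(carrier | case) by Bayes' rule.

    The sporadic disease rate multiplies both carrier and non-carrier
    risks, so it cancels and does not appear in the formula.
    """
    c = carrier_prob(allele_freq)
    return c * risk_factor / (c * risk_factor + (1.0 - c))

ALLELE_FREQ = 0.005  # assumption: one tenth of the 5% nuisance allele frequency

for factor in (1.0, 2.0, 4.0):  # illustrative risk factors only
    prob = carrier_prob_given_case(ALLELE_FREQ, factor)
    print(f"risk factor {factor:.1f}: P(carrier | case) = {prob:.4f}")
```

A risk factor of 1 returns the population carrier probability, matching the flat "All Individuals" row, and the probability rises with the risk factor as in the "Cases" row.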