Iuliana Ionita-Laza, Joseph D Buxbaum, Nan M Laird, Christoph Lange.
Abstract
Rapid advances in sequencing technologies set the stage for the large-scale medical sequencing efforts to be performed in the near future, with the goal of assessing the importance of rare variants in complex diseases. The discovery of new disease susceptibility genes requires powerful statistical methods for rare variant analysis. The low frequency and the expected large number of such variants pose great difficulties for the analysis of these data. We propose here a robust and powerful testing strategy to study the role rare variants may play in affecting susceptibility to complex traits. The strategy is based on assessing whether rare variants in a genetic region collectively occur at significantly higher frequencies in cases compared with controls (or vice versa). A main feature of the proposed methodology is that, although it is an overall test assessing a possibly large number of rare variants simultaneously, the disease variants can be both protective and risk variants, with moderate decreases in statistical power when both types of variants are present. Using simulations, we show that this approach can be powerful under complex and general disease models, as well as in larger genetic regions where the proportion of disease susceptibility variants may be small. Comparisons with previously published tests on simulated data show that the proposed approach can have better power than the existing methods. An application to a recently published study on Type-1 Diabetes finds rare variants in gene IFIH1 to be protective against Type-1 Diabetes.
Year: 2011 PMID: 21304886 PMCID: PMC3033379 DOI: 10.1371/journal.pgen.1001289
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Data summary.
| k/k′ | 1 | 2 | 3 | 4 | 5 | |
| 0 | ||||||
| 1 | ||||||
| 2 | ||||||
| 3 | ||||||
| 4 | ||||||
Variants are classified according to the number of times they appear in controls (k) and cases (k′). Only variants with a higher observed count in cases than in controls (i.e., more likely to be risk variants) are considered.
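The classification behind this table can be sketched in a few lines. The function name `classify_variants` and the allele counts below are hypothetical and chosen only for illustration; the sketch tallies variants by their counts in controls and cases and keeps only those seen more often in cases. It does not reproduce the paper's test statistic.

```python
from collections import Counter

def classify_variants(control_counts, case_counts):
    """Tally variants by (k, k') = (copies in controls, copies in cases).

    Only variants observed more often in cases than in controls (k' > k)
    are kept, mirroring the table's restriction to likely risk variants.
    """
    table = Counter()
    for k, k_prime in zip(control_counts, case_counts):
        if k_prime > k:
            table[(k, k_prime)] += 1
    return table

# Hypothetical minor-allele counts for six variants in one region.
control_counts = [0, 0, 1, 2, 0, 3]
case_counts = [1, 2, 3, 1, 1, 3]

table = classify_variants(control_counts, case_counts)
for (k, k_prime), n in sorted(table.items()):
    print(f"k={k}, k'={k_prime}: {n} variant(s)")
```

Variants with equal counts, or with more copies in controls, drop out of the tally; swapping the two arguments would give the corresponding table for candidate protective variants.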
Type-1 Error for the three approaches: collapsing (C), weighted-sum (WS), and replication-based (R).
| Sim. Model | Sample Size | C | WS | R | |
Results for two genetic simulation models are shown: variants under a neutral evolution model (1), and variants under a weakly-purifying selection model (2). The sample size is the total number of individuals sequenced, with equal numbers of cases and controls. Nominal levels: , , and .
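As a rough illustration of how a Type-1 error estimate of this kind is typically obtained, the sketch below computes the fraction of null-simulation p-values that fall at or below a nominal level. The function name and the p-values are hypothetical; the source does not give the simulation details, so this is an assumption about the general procedure rather than the authors' code.

```python
def empirical_type1_error(null_p_values, alpha):
    """Fraction of p-values from null simulations at or below alpha."""
    if not null_p_values:
        raise ValueError("need at least one p-value")
    rejections = sum(1 for p in null_p_values if p <= alpha)
    return rejections / len(null_p_values)

# Hypothetical p-values from 20 data sets simulated with no disease variants.
null_p_values = [
    0.62, 0.18, 0.91, 0.04, 0.33, 0.75, 0.27, 0.58, 0.49, 0.83,
    0.11, 0.66, 0.39, 0.95, 0.21, 0.71, 0.44, 0.08, 0.54, 0.87,
]

rate = empirical_type1_error(null_p_values, alpha=0.05)
print(f"empirical Type-1 error at 0.05: {rate:.3f}")
```

A well-calibrated test should give an estimate close to the nominal level once the number of null replicates is large.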
Power Estimates () for the three approaches: collapsing (C) versus weighted-sum (WS) versus replication-based (R).
| PAR = 0.03 | PAR = 0.05 | ||||||||
| Sim. Model | Sample Size | Disease Model | #DSVs | C | WS | R | C | WS | R |
| Eq PAR | |||||||||
| Eq PAR | |||||||||
| Uneq PAR | |||||||||
| Uneq PAR | |||||||||
| Eq PAR | |||||||||
| Eq PAR | |||||||||
| Uneq PAR | |||||||||
| Uneq PAR | |||||||||
Two genetic simulation models are assumed: neutral variants (1), and mildly deleterious variants (2). Varying numbers of disease susceptibility variants (DSVs) are assumed, which can contribute equally or unequally to the total population attributable risk (PAR). The sample size is the total number of individuals sequenced, with equal numbers of cases and controls. All tests are two-sided, i.e., testing for the presence of risk or protective variants in the region of interest.
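To make the notion of a two-sided test concrete, here is a minimal sketch that detects an excess of rare-variant carriers in either cases or controls by permuting case/control labels. It uses a simple carrier-count difference as the statistic, which is an assumption made for illustration and is not the replication-based statistic evaluated in the table. The function names and the toy data are hypothetical.

```python
import random

def carrier_excess(carriers, is_case):
    """Rare-variant carriers among cases minus carriers among controls."""
    in_cases = sum(1 for c, y in zip(carriers, is_case) if c and y)
    in_controls = sum(1 for c, y in zip(carriers, is_case) if c and not y)
    return in_cases - in_controls

def two_sided_permutation_p(carriers, is_case, n_perm=1000, seed=0):
    """Permutation p-value for an excess of carriers in either group."""
    rng = random.Random(seed)
    # The absolute value makes the test sensitive to both risk and
    # protective variants.
    observed = abs(carrier_excess(carriers, is_case))
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any carrier/status association
        if abs(carrier_excess(carriers, labels)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: 10 cases then 10 controls; 8 carriers, all cases.
is_case = [True] * 10 + [False] * 10
carriers = [1] * 8 + [0] * 12

p_risk = two_sided_permutation_p(carriers, is_case)
# Flipping the labels turns the same signal into a protective one.
p_protective = two_sided_permutation_p(carriers, [not y for y in is_case])
print(p_risk, p_protective)
```

Because the statistic is an absolute difference, the same enrichment yields a small p-value whether the carriers cluster in cases or in controls. This simple statistic also shows why mixtures are harder: risk and protective variants in the same region pull the difference in opposite directions.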
Power Estimates () for two-sided tests, testing for the presence of risk or protective variants, when there is a mixture of risk and protective variants in the region of interest.
| PAR = 0.03 | PAR = 0.05 | |||||||
| Sim. Model | #Risk | #Protective | C | WS | R | C | WS | R |
In addition to risk variants in the region, there are between protective variants as well. The simulation model corresponds to one of two scenarios: neutral variants (1), and mildly deleterious variants (2). The total sample size is cases and controls. Collapsing (C) vs. weighted-sum (WS) vs. replication-based (R).
Type-1 diabetes results.
| Gene | #SNVs | C | WS | R |
Two-sided P-values for the top two genes. An upper frequency threshold of was used for the variants considered for testing.