Xinyuan Zhang, Anna O Basile, Sarah A Pendergrass, Marylyn D Ritchie.
Abstract
BACKGROUND: The development of sequencing techniques and statistical methods provides great opportunities for identifying the impact of rare genetic variation on complex traits. However, little is known about how sample size, case numbers, and the balance of cases vs. controls affect burden and dispersion based rare variant association methods. For example, Phenome-Wide Association Studies may have a wide range of case and control sample sizes across hundreds of diagnoses and traits, and when statistical methods are applied to rare variants, it is important to understand the strengths and limitations of the analyses.
Keywords: Power analysis; Rare variant association analysis; Sample size imbalance; Simulation study
Year: 2019 PMID: 30669967 PMCID: PMC6343276 DOI: 10.1186/s12859-018-2591-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1. Type I error simulation results with a MAF upper bound (UB) of 0.01. For visualization and comparison purposes, blue and red horizontal lines indicate type I error at 0.05 and 0.1, respectively. (a) Type I error for an equal number of cases and controls at differing sample sizes. Note that the y-axis only goes to a type I error rate of 0.1. (b) Type I error rate for different unbalanced case and control numbers, arranged by case to control ratio. The axis is labeled with the number of cases, then the number of controls, for each simulation. The percentage of cases to controls is listed below the number of cases and controls. (c and d) Results ordered by the number of cases, with 10,000 controls (c) and 30,000 controls (d).
Fig. 2. Power simulation results with a MAF cutoff of 0.01 for evaluated variation. (a) Results when cases and controls are equal in number. (b) Impact of unbalanced cases and controls on power, ranked by the case/control ratio. The percent case to control ratio is listed below the x-axis. (c and d) Power with unbalanced cases and controls, ordered by case number, with 10,000 controls (c) and 30,000 controls (d).
Fig. 3. Power comparison of three models with differing contributions from protective and risk rare genetic variation. The results are shown for variants contributing low, moderate, or high impact on outcome risk or protection. Methods describe the range of odds ratios corresponding to the different categories. (a) Total sample size of 4000 for balanced cases and controls with MAF UB 0.05. (b) Total sample size of 4000 for balanced cases and controls with MAF UB 0.01. (c) 200 cases and 10,000 controls with MAF UB 0.05. (d) 200 cases and 10,000 controls with MAF UB 0.01.
Detailed Parameters for Mixture Odds Ratio Design
| Randomly Selected 10 Disease Loci | | |
|---|---|---|
| Signal Level | OR > 1 range (Risk) | OR < 1 range (Protective) |
| Low | 2.73, 3.15, 3.58 | 0.43, 0.37, 0.32, 0.28, 0.25 |
| Moderate | 5.25, 6.5, 7.75 | 0.25, 0.19, 0.15, 0.13, 0.11 |
| High | 11.5, 14, 16.4 | 0.11, 0.087, 0.07, 0.06, 0.053 |
Note: The numbers in bold represent the boundaries when selecting the odds ratios
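
The listed protective odds ratios appear to sit close to the reciprocals of the listed risk odds ratios within each signal level. The short check below is only a sketch of that arithmetic on the values shown in the table; it is an observation about the numbers as extracted here, not a statement of how the authors selected them. The names `risk`, `protective`, and `nearest_gap` are illustrative.

```python
# Odds ratios as they appear in the table above (risk rows show three values here).
risk = {
    "Low": [2.73, 3.15, 3.58],
    "Moderate": [5.25, 6.5, 7.75],
    "High": [11.5, 14, 16.4],
}
protective = {
    "Low": [0.43, 0.37, 0.32, 0.28, 0.25],
    "Moderate": [0.25, 0.19, 0.15, 0.13, 0.11],
    "High": [0.11, 0.087, 0.07, 0.06, 0.053],
}


def nearest_gap(value, candidates):
    """Smallest absolute difference between value and any candidate."""
    return min(abs(value - c) for c in candidates)


# For each risk OR, compare 1/OR with the closest protective OR at the same level.
gaps = {
    level: [nearest_gap(1.0 / odds_ratio, protective[level]) for odds_ratio in ors]
    for level, ors in risk.items()
}
max_gap = max(g for level_gaps in gaps.values() for g in level_gaps)

for level, level_gaps in gaps.items():
    print(level, [round(g, 4) for g in level_gaps])
```

A small `max_gap` is consistent with the protective values being rounded reciprocals of the risk values, so that risk and protective effects are symmetric on the log odds scale.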
Simulation Design
| Balanced Cases and Controls | |
|---|---|
| Total Sample Size: 20, 50, 100, 200, 400, 1000, 2000, 4000, 6000, 10,000, 14,000, 20,000 | |
| Unbalanced Cases and Controls | |
| Number of controls 10,000 | Number of controls 30,000 |
| Number of cases | Number of cases |
Other Parameter Settings
| Parameter | Setting |
|---|---|
| Number of Simulations | 1000 runs × 30 replicates for each sample size scenario |
| Upper Threshold for MAF | 0.01 and 0.05 |
| Variant Weighting | Madsen and Browning [ |
| Disease Prevalence | 5% |
| Number of Disease Loci | 10 |
| Odds Ratio (OR) | All disease loci with OR 2.5; half of disease loci with risk effect, the other half with protective effect |
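
The variant weighting row refers to Madsen and Browning. As a minimal sketch, assuming the commonly cited weighted-sum form of that scheme, each variant's weight is sqrt(n * q * (1 - q)), where q = (m_u + 1) / (2 * n_u + 2) is estimated from the minor allele count m_u among the n_u controls and n is the total number of genotyped individuals. Each individual's score is then the sum of minor-allele counts divided by those weights. The function names and the toy genotypes are hypothetical, missing genotypes are assumed absent, and nothing here is taken from the software used in the study.

```python
import math


def madsen_browning_weights(genotypes, is_case):
    """Per-variant weights from control allele frequencies.

    genotypes: one list per individual of minor-allele counts (0, 1, or 2).
    is_case:   one boolean per individual (True = case, False = control).
    Assumes no missing genotypes (an illustrative simplification).
    """
    n_total = len(genotypes)
    controls = [g for g, case in zip(genotypes, is_case) if not case]
    n_controls = len(controls)
    weights = []
    for i in range(len(genotypes[0])):
        m_u = sum(g[i] for g in controls)      # minor alleles seen in controls
        q = (m_u + 1) / (2 * n_controls + 2)   # smoothed control allele frequency
        weights.append(math.sqrt(n_total * q * (1 - q)))
    return weights


def genetic_scores(genotypes, weights):
    """Weighted sum of minor-allele counts for each individual."""
    return [sum(count / w for count, w in zip(g, weights)) for g in genotypes]


# Toy data: 2 cases, 2 controls, 2 rare variants (hypothetical values).
genotypes = [[0, 1], [1, 0], [0, 0], [0, 0]]
is_case = [True, True, False, False]

weights = madsen_browning_weights(genotypes, is_case)
scores = genetic_scores(genotypes, weights)
print(weights, scores)
```

Because the frequency is estimated in controls only, variants that are rarer in controls receive smaller weights and therefore contribute more to the score when carried.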