Amadou Gaye, Thomas W Y Burton, Paul R Burton.
Abstract
MOTIVATION: Very large studies are required to provide sufficiently big sample sizes for adequately powered association analyses. This can be an expensive undertaking and it is important that an accurate sample size is identified. For more realistic sample size calculation and power analysis, the impact of unmeasured aetiological determinants and the quality of measurement of both outcome and explanatory variables should be taken into account. Conventional methods to analyse power use closed-form solutions that are not flexible enough to cater for all of these elements easily. They often result in a potentially substantial overestimation of the actual power.
Year: 2015 PMID: 25908791 PMCID: PMC4528636 DOI: 10.1093/bioinformatics/btv219
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Overview of the models/scenarios that could be investigated with the initial ESPRESSO script (GA, GB, EB, EB × GA and EB × GB) versus those that are enabled under the new R libraries (all other nine models)
| | Additive genetic variant (GA) | Binary genetic variant (GB) | Quantitative environmental exposure (EQ) | Binary environmental exposure (EB) |
|---|---|---|---|---|
| Additive genetic variant (GA) | GA × GA | | | |
| Binary genetic variant (GB) | GB × GA | GB × GB | | |
| Quantitative environmental exposure (EQ) | EQ × GA | EQ × GB | EQ × EQ | |
| Binary environmental exposure (EB) | EB × GA | EB × GB | EB × EQ | EB × EB |
The main-effect scenarios that can be investigated are listed in the first column, whilst the interaction scenarios are in the inner cells of the table.
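The model count in the table can be checked with a short enumeration. The sketch below is illustrative only: the function name `enumerate_models` and the set `INITIAL_SCRIPT_MODELS` are hypothetical helpers, not part of the ESPRESSO libraries. It builds the four main effects and the lower triangle of pairwise interactions, then separates the five models the caption attributes to the initial script from the remaining nine.

```python
# Determinant codes exactly as they appear in the table.
DETERMINANTS = ["GA", "GB", "EQ", "EB"]

# Models the caption attributes to the initial ESPRESSO script.
INITIAL_SCRIPT_MODELS = {"GA", "GB", "EB", "EB × GA", "EB × GB"}


def enumerate_models(determinants):
    """Return (main_effects, interactions) for the given determinant codes.

    Interactions follow the table layout: each row determinant is paired
    with every column determinant up to and including itself.
    """
    main_effects = list(determinants)
    interactions = [
        f"{row} × {col}"
        for i, row in enumerate(determinants)
        for col in determinants[: i + 1]
    ]
    return main_effects, interactions


main_effects, interactions = enumerate_models(DETERMINANTS)
all_models = main_effects + interactions
new_models = [m for m in all_models if m not in INITIAL_SCRIPT_MODELS]

print(len(all_models), "models in total")
print(len(new_models), "models beyond the initial script:", new_models)
```

Four main effects plus ten interactions give fourteen models, which agrees with the caption's split of five initial models and nine others.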
Fig. 1. Flowchart that shows the main steps in an ESPRESSO process
Fig. 2. Graphical view of the GLM analysis in ESPRESSO. In each simulation run, a dataset of observed values is generated and analysed, and the beta coefficient, standard error and z-statistic are stored
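The loop described in the Fig. 2 caption can be illustrated with a minimal sketch. This is not the ESPRESSO implementation, which is written in R and also accounts for unmeasured determinants and measurement error. It assumes a quantitative outcome, a single binary exposure, an identity-link model fitted by ordinary least squares, and empirical power taken as the proportion of runs whose z-statistic exceeds a two-sided threshold. The function names and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist, mean


def fit_simple_regression(x, y):
    """Least-squares fit of y on x; returns (beta, standard error, z)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    intercept = y_bar - beta * x_bar
    rss = sum((yi - intercept - beta * xi) ** 2 for xi, yi in zip(x, y))
    se = (rss / (n - 2) / sxx) ** 0.5
    return beta, se, beta / se


def simulate_power(n_subjects, effect_size, exposure_prevalence,
                   p_threshold=0.01, n_runs=200, residual_sd=1.0, seed=0):
    """Simulate n_runs datasets and return (stored results, empirical power)."""
    rng = random.Random(seed)
    # Two-sided critical value for the chosen p-value threshold.
    z_critical = NormalDist().inv_cdf(1 - p_threshold / 2)
    stored = []
    for _ in range(n_runs):
        # Generate one dataset of observed values.
        exposure = [1 if rng.random() < exposure_prevalence else 0
                    for _ in range(n_subjects)]
        outcome = [effect_size * e + rng.gauss(0, residual_sd)
                   for e in exposure]
        # Analyse it and store beta, standard error and z-statistic.
        stored.append(fit_simple_regression(exposure, outcome))
    power = sum(abs(z) > z_critical for _, _, z in stored) / n_runs
    return stored, power


stored, power = simulate_power(n_subjects=500, effect_size=0.3,
                               exposure_prevalence=0.2)
print(f"empirical power over {len(stored)} runs: {power:.2f}")
```

Storing the per-run estimates rather than only the significance flag keeps the beta coefficients and standard errors available for later summaries, which matches what the caption describes.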
Configuration of the CPT cohort: sample size at the time of this analysis and target sample size
| Name | Age-range at recruitment | Target sample size | Sample size by the time of this analysis |
|---|---|---|---|
| Atlantic cohort | 40–69 years | 30 000 | 15 000–25 000 |
| British Columbia cohort | 40–69 years | 40 000 | 15 000–25 000 |
| CARTaGENE (Quebec) | 35–69 years | 20 000 | 20 000 |
| Ontario health survey (Ontario) | 35–69 years | 150 000 | 35 000–70 000 |
| The tomorrow project (Alberta) | 40–69 years | 50 000 | 25 000–40 000 |
| CPT project overall | Predominantly 35–69 years | 250 000 | 110 000–180 000 |
The six scenarios explored to construct each power profile
| Scenario | Minor allele frequency | Prevalence of ‘at risk’ environmental exposure | Mathematical model |
|---|---|---|---|
| 1. Common determinants | 0.30 | 0.50 | Main effects only |
| 2. Moderately common determinants | 0.10 | 0.20 | Main effects only |
| 3. Uncommon determinants | 0.05 | 0.10 | Main effects only |
| 4. Common determinants | 0.30 | 0.50 | Main effects + interaction |
| 5. Moderately common determinants | 0.10 | 0.20 | Main effects + interaction |
| 6. Uncommon determinants | 0.05 | 0.10 | Main effects + interaction |
Minimal detectable effect sizes for SBP with 110 000 participants
| | Genetic main effect | Environment main effect |
|---|---|---|
| | 10⁻⁷ (GWAS) | 0.01 |
| Moderately common determinants | 1.0433 | 0.8123 |
The classification of the determinants as moderately common refers to the MAF of the genetic determinant (0.1) and the prevalence of the environmental exposure (0.2), respectively, as reported in Table 3.