Frank Dudbridge, Arief Gusnanto, Bobby P C Koeleman.
Abstract
Recent developments in the statistical analysis of genome-wide studies are reviewed. Genome-wide analyses are becoming increasingly common in areas such as scans for disease-associated markers and gene expression profiling. The data generated by these studies present new problems for statistical analysis, owing to the large number of hypothesis tests, comparatively small sample size and modest number of true gene effects. In this review, strategies are described for optimising the genotyping cost by discarding unpromising genes at an earlier stage, saving resources for the genes that show a trend of association. In addition, there is a review of new methods of analysis that combine evidence across genes to increase sensitivity to multiple true associations in the presence of many non-associated genes. Some methods achieve this by including only the most significant results, whereas others model the overall distribution of results as a mixture of distributions from true and null effects. Because genes are correlated even when having no effect, permutation testing is often necessary to estimate the overall significance, but this can be very time-consuming. Efficiency can be improved by fitting a parametric distribution to permutation replicates, which can be re-used in subsequent analyses. Methods are also available to generate random draws from the permutation distribution. The review also includes discussion of new error measures that give a more reasonable interpretation of genome-wide studies, together with improved sensitivity. The false discovery rate allows a controlled proportion of positive results to be false, while detecting more true positives; and the local false discovery rate and false-positive report probability give clarity on whether or not a statistically significant test represents a real discovery.
Year: 2006 PMID: 16595075 PMCID: PMC3500180 DOI: 10.1186/1479-7364-2-5-310
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639
Table 1. Determinants of power in genome-wide association and expression studies.
| | Genetic association study | Gene expression study |
|---|---|---|
| Number of genes tested | High | High |
| Number | Few; 1 < | Moderate; k >> 1 |
| Sample size | Large; thousands | Small; tens |
| Gene effect size | Low; odds ratio < 2 | High; log fold-change > 2 |
Table 2. Comparison of different error rates and analysis methods.
| Method | Error control for whole study | Error control for single test | Appropriate for association study | Appropriate for expression study |
|---|---|---|---|---|
| Family-wise error, strong | Yes (1) | Yes (1) | No | No |
| Family-wise error, weak | Yes (1) | No | Yes | Yes |
| Minimum | Yes (1) | Yes (1) | Somewhat | No |
| Truncated | Yes (1) | No | Yes | Possibly |
| Random gene effects model | Yes (1) | Yes (2) | Possibly | Yes |
| False discovery rate | Yes (3) | No | No | Yes |
| Q-value | Yes (3) | Some (3) | No | Yes |
| Local false discovery rate | Yes (2) | Yes (2) | Yes | Yes |
| False-positive report probability | Yes (3) | Some (3) | Yes | Yes |
'Error control' indicates whether a method provides some measure of error: (1) type-I error; (2) posterior probability of association; (3) expected proportion of false discoveries in a series of tests. 'Appropriate for' indicates whether, in the view of the authors, a method is suitable for genome-wide association or expression studies, based on the factors in Table 1.
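As an illustration of error measure (3), the expected proportion of false discoveries in a series of tests, the following is a minimal Python sketch of the Benjamini-Hochberg step-up procedure. The record above does not name a specific procedure, so the choice of Benjamini-Hochberg is an assumption made here for illustration; the function name `benjamini_hochberg`, the level `q = 0.05` and the ten p-values are likewise hypothetical and are not data from the review.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the sorted indices of tests rejected by the BH step-up rule at level q.

    Illustrative sketch only: assumes independent (or positively dependent)
    tests, which may not hold for correlated genes.
    """
    m = len(p_values)
    # Indices of the tests ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject every test whose rank is at most k_max.
    return sorted(order[:k_max])


# Hypothetical p-values, invented purely for illustration.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

# False discovery rate control at q = 0.05.
rejected = benjamini_hochberg(p, q=0.05)

# Bonferroni correction (family-wise error control) for comparison.
bonferroni = [i for i, pv in enumerate(p) if pv <= 0.05 / len(p)]

print("BH rejections:", rejected)
print("Bonferroni rejections:", bonferroni)
```

On these made-up values the false discovery rate rule rejects two tests where the Bonferroni threshold rejects one, which is consistent with the abstract's remark that controlling the false discovery rate can detect more true positives than a family-wise criterion.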