Masahiro Kanai¹, Toshihiro Tanaka¹ ², Yukinori Okada¹ ³ ⁴.
Abstract
To assess the statistical significance of associations between variants and traits, genome-wide association studies (GWAS) should employ an appropriate threshold that accounts for the massive multiple-testing burden of the study. Although most studies in the current literature set the genome-wide significance threshold at P=5.0 × 10−8, the adequacy of this value for individual populations has not been fully investigated. To empirically estimate thresholds for different ancestral populations, we conducted GWAS simulations using the 1000 Genomes Phase 3 data set for Africans (AFR), Europeans (EUR), Admixed Americans (AMR), East Asians (EAS) and South Asians (SAS). The estimated empirical genome-wide significance thresholds were Psig=3.24 × 10−8 (AFR), 9.26 × 10−8 (EUR), 1.83 × 10−7 (AMR), 1.61 × 10−7 (EAS) and 9.46 × 10−8 (SAS). We additionally conducted trans-ethnic meta-analyses across all populations (ALL) and across all populations except AFR (ΔAFR), which yielded Psig=3.25 × 10−8 (ALL) and 4.20 × 10−8 (ΔAFR). Our results indicate that the current threshold (P=5.0 × 10−8) is overly stringent for all ancestral populations except Africans; however, a more stringent threshold should be employed when conducting a meta-analysis, regardless of whether African samples are included.
Year: 2016 PMID: 27305981 PMCID: PMC5090169 DOI: 10.1038/jhg.2016.72
Source DB: PubMed Journal: J Hum Genet ISSN: 1434-5161 Impact factor: 3.172
Overview of the 1000 Genomes Phase 3 (version 5) samples
| Ancestry | Population | Code | Male | Female | Total | No. of variants |
|---|---|---|---|---|---|---|
| AFR | African Caribbeans in Barbados | ACB | 47 | 49 | 96 | |
| | Americans of African Ancestry in SW USA | ASW | 26 | 35 | 61 | |
| | Esan in Nigeria | ESN | 53 | 46 | 99 | |
| | Gambian in Western Divisions in the Gambia | GWD | 55 | 58 | 113 | |
| | Luhya in Webuye, Kenya | LWK | 44 | 55 | 99 | |
| | Mende in Sierra Leone | MSL | 42 | 43 | 85 | |
| | Yoruba in Ibadan, Nigeria | YRI | 52 | 56 | 108 | |
| | Subtotal | | 319 | 342 | 661 | 21 048 933 |
| EUR | Utah Residents (CEPH) with Northern and Western European Ancestry | CEU | 49 | 50 | 99 | |
| | Finnish in Finland | FIN | 38 | 61 | 99 | |
| | British in England and Scotland | GBR | 46 | 45 | 91 | |
| | Iberian Population in Spain | IBS | 54 | 53 | 107 | |
| | Toscani in Italia | TSI | 53 | 54 | 107 | |
| | Subtotal | | 240 | 263 | 503 | 11 980 247 |
| AMR | Colombians from Medellin, Colombia | CLM | 43 | 51 | 94 | |
| | Mexican Ancestry from Los Angeles, USA | MXL | 32 | 32 | 64 | |
| | Peruvians from Lima, Peru | PEL | 41 | 44 | 85 | |
| | Puerto Ricans from Puerto Rico | PUR | 54 | 50 | 104 | |
| | Subtotal | | 170 | 177 | 347 | 14 261 439 |
| EAS | Chinese Dai in Xishuangbanna, China | CDX | 44 | 49 | 93 | |
| | Han Chinese in Beijing, China | CHB | 46 | 57 | 103 | |
| | Southern Han Chinese | CHS | 52 | 53 | 105 | |
| | Japanese in Tokyo, Japan | JPT | 56 | 48 | 104 | |
| | Kinh in Ho Chi Minh City, Vietnam | KHV | 46 | 53 | 99 | |
| | Subtotal | | 244 | 260 | 504 | 10 201 713 |
| SAS | Bengali from Bangladesh | BEB | 42 | 44 | 86 | |
| | Gujarati Indian from Houston, Texas | GIH | 56 | 47 | 103 | |
| | Indian Telugu from the UK | ITU | 59 | 43 | 102 | |
| | Punjabi from Lahore, Pakistan | PJL | 48 | 48 | 96 | |
| | Sri Lankan Tamil from the UK | STU | 55 | 47 | 102 | |
| | Subtotal | | 260 | 229 | 489 | 12 641 702 |
| Total | | | 1233 | 1271 | 2504 | 28 993 742 |
Abbreviations: AFR, African; AMR, Admixed American; EAS, East Asian; EUR, European; MAF, minor allele frequency; SAS, South Asian.
MAF was calculated within each ancestral population.
Figure 1. The −log10 Pmin distributions for the five ancestral populations and the meta-analyses. We conducted GWAS simulations using the 1000 Genomes Phase 3 data set and recorded the minimum P-value across all variants (Pmin) in each simulation. Each panel represents one population or meta-analysis. In each panel, one vertical bar marks the 95th percentile of −log10 Pmin (that is, the estimated empirical genome-wide significance threshold −log10 Psig), and the dotted vertical bar marks the common genome-wide significance threshold of 5.0 × 10−8. AFR, African; AMR, Admixed American; EAS, East Asian; EUR, European; SAS, South Asian; ALL, meta-analysis across all ancestral populations; ΔAFR, meta-analysis including all ancestral populations except AFR (that is, EUR, AMR, EAS and SAS).
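The simulation procedure described in the Figure 1 caption (simulate GWAS with no true association, record Pmin per simulation, take the 95th percentile of −log10 Pmin) can be sketched as follows. This is a minimal illustration under assumptions that are not from the paper: the phenotype is pure Gaussian noise, each variant is tested with a correlation t-test whose P-value uses a normal approximation, and toy simulated genotypes stand in for the 1000 Genomes data. The function name `empirical_threshold` is hypothetical.

```python
import math

import numpy as np


def empirical_threshold(genotypes, n_sims=2000, alpha=0.05, seed=0):
    """Hypothetical helper: estimate an empirical threshold P_sig.

    Simulates null phenotypes (unrelated to genotype), records the minimum
    P-value across all variants in each simulation (P_min), and returns the
    alpha-quantile of P_min.
    """
    rng = np.random.default_rng(seed)
    n, _ = genotypes.shape
    # Standardize each variant column to mean 0, SD 1.
    g = (genotypes - genotypes.mean(axis=0)) / genotypes.std(axis=0)
    # One column of standardized null phenotypes per simulation.
    y = rng.standard_normal((n, n_sims))
    y = (y - y.mean(axis=0)) / y.std(axis=0)
    # Pearson correlation of every variant with every simulated phenotype.
    r = g.T @ y / n  # shape (n_variants, n_sims)
    # Correlation t statistic for each variant/simulation pair.
    t = r * np.sqrt((n - 2) / (1.0 - r**2))
    # The smallest P-value in each simulation belongs to the largest |t|.
    max_abs_t = np.abs(t).max(axis=0)
    # Two-sided P-value via a normal approximation to the t distribution.
    p_min = np.array([math.erfc(z / math.sqrt(2.0)) for z in max_abs_t])
    # alpha-quantile of P_min == (1 - alpha)-quantile of -log10 P_min.
    return float(np.quantile(p_min, alpha))


# Toy genotypes: 500 samples x 100 independent variants (allele freq 0.3).
rng = np.random.default_rng(1)
indep = rng.binomial(2, 0.3, size=(500, 100)).astype(float)
# Same 100 columns, but only 50 distinct variants (each in perfect LD with a copy).
dup = np.repeat(indep[:, :50], 2, axis=1)

p_indep = empirical_threshold(indep)
p_dup = empirical_threshold(dup)
print(f"independent: P_sig ~ {p_indep:.2e} (Bonferroni 0.05/100 = 5.0e-04)")
print(f"duplicated:  P_sig ~ {p_dup:.2e} (Bonferroni 0.05/50 = 1.0e-03)")
```

With 100 independent variants the estimate lands near 0.05/100, whereas duplicating variants leaves the nominal count at 100 but halves the number of effectively independent tests and roughly doubles Psig. This illustrates how correlation among variants makes an empirical threshold less stringent than a count-based one.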
Estimated genome-wide significance thresholds for ancestral populations and meta-analyses
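| Population / meta-analysis | Psig (−log10 Psig) | 95% CI of Psig (−log10 Psig) | No. of variants | Effective no. of independent variants | Ratio (effective / total) |
|---|---|---|---|---|---|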
| AFR | 3.24 × 10−8 (7.49) | 3.11 × 10−8–3.36 × 10−8 (7.47–7.51) | 21 048 933 | 1 545 429 | 0.073 |
| EUR | 9.26 × 10−8 (7.03) | 9.01 × 10−8–9.51 × 10−8 (7.02–7.05) | 11 980 247 | 540 128 | 0.045 |
| AMR | 1.83 × 10−7 (6.74) | 1.79 × 10−7–1.87 × 10−7 (6.73–6.75) | 14 261 439 | 273 444 | 0.019 |
| EAS | 1.61 × 10−7 (6.79) | 1.57 × 10−7–1.64 × 10−7 (6.78–6.80) | 10 201 713 | 311 275 | 0.031 |
| SAS | 9.46 × 10−8 (7.02) | 9.20 × 10−8–9.69 × 10−8 (7.01–7.04) | 12 641 702 | 528 484 | 0.042 |
| ALL | 3.25 × 10−8 (7.49) | 3.16 × 10−8–3.33 × 10−8 (7.48–7.50) | 28 993 742 | 1 539 237 | 0.053 |
| ΔAFR | 4.20 × 10−8 (7.38) | 4.08 × 10−8–4.33 × 10−8 (7.37–7.39) | 19 862 732 | 1 189 822 | 0.060 |
Abbreviations: AFR, African; ALL, meta-analysis across all ancestral populations; AMR, Admixed American; CI, confidence interval; EAS, East Asian; EUR, European; MAF, minor allele frequency; SAS, South Asian; ΔAFR, meta-analysis including all ancestral populations except for AFR (that is, EUR, AMR, EAS and SAS).
Psig was taken as the 5th percentile of Pmin, calculated from the 95th percentile of −log10 Pmin.
MAF was calculated within each ancestral population.
The effective number of independent variants was calculated by dividing the significance level α=0.05 by Psig.
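As a worked check of the relationship in the footnote above, M_eff = α / Psig, the sketch below recomputes the effective number of independent variants and its ratio to the total variant count from the point estimates in the table, and flags whether each empirical threshold is stricter or looser than 5.0 × 10−8. The same formula shows that 5.0 × 10−8 corresponds to a Bonferroni correction over one million independent tests (0.05 / 10^6). The names `TABLE` and `effective_tests` are hypothetical. Because the tabulated Psig values are rounded, the recomputed M_eff differs slightly from the table's column (for AFR, about 1 543 210 rather than 1 545 429).

```python
import math

ALPHA = 0.05
CONVENTIONAL = 5.0e-8  # the common genome-wide significance threshold

# Point estimates of P_sig and total variant counts, copied from the table.
TABLE = {
    "AFR": (3.24e-8, 21_048_933),
    "EUR": (9.26e-8, 11_980_247),
    "AMR": (1.83e-7, 14_261_439),
    "EAS": (1.61e-7, 10_201_713),
    "SAS": (9.46e-8, 12_641_702),
    "ALL": (3.25e-8, 28_993_742),
    "ΔAFR": (4.20e-8, 19_862_732),
}


def effective_tests(p_sig, alpha=ALPHA):
    """Effective number of independent tests implied by a threshold: alpha / P_sig."""
    return alpha / p_sig


for pop, (p_sig, n_variants) in TABLE.items():
    m_eff = effective_tests(p_sig)
    # A smaller P_sig than 5e-8 means the empirical threshold is stricter.
    verdict = "stricter than" if p_sig < CONVENTIONAL else "looser than"
    print(f"{pop:5s} -log10 P_sig = {-math.log10(p_sig):.2f}  "
          f"M_eff ~ {m_eff:,.0f}  ratio = {m_eff / n_variants:.3f}  "
          f"({verdict} 5e-8)")
```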
Estimated effective number of independent variants in the AMR subpopulations by LD pruning
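| Population | No. of variants | Effective no. of independent variants | Ratio (effective / total) | PLD (−log10 PLD) |
|---|---|---|---|---|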
| AMR | 14 261 439 | 2 129 877 | 0.149 | 2.35 × 10−8 (7.63) |
| CLM | 7 512 590 | 1 343 116 | 0.179 | 3.72 × 10−8 (7.43) |
| MXL | 7 218 484 | 985 773 | 0.137 | 5.07 × 10−8 (7.29) |
| PEL | 6 570 123 | 873 604 | 0.133 | 5.72 × 10−8 (7.24) |
| PUR | 7 735 691 | 1 542 788 | 0.199 | 3.24 × 10−8 (7.49) |
Abbreviations: AMR, Admixed American; CLM, Colombians from Medellin, Colombia; LD, linkage disequilibrium; MAF, minor allele frequency; MXL, Mexican Ancestry from Los Angeles, USA; PEL, Peruvians from Lima, Peru; PUR, Puerto Ricans from Puerto Rico.
MAF was calculated within each population.
The effective number of independent variants was estimated by LD-based pruning (sliding window size: 40 kb; window step size: 4 kb; r2<0.5).
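The pruning described in the footnote above (40 kb window, 4 kb step, r2 < 0.5) can be sketched as a greedy sliding-window pass, after which PLD = 0.05 / (number of retained variants). This matches the table, for example 0.05 / 2 129 877 ≈ 2.35 × 10−8 for AMR. This is a simplified illustration, not the authors' tool: `r_squared`, `ld_prune` and `ld_threshold` are hypothetical names, the sketch always drops the later variant of a correlated pair (real pruning tools may choose differently, for example by allele frequency), and its naive loops are only suitable for toy inputs.

```python
import numpy as np


def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    r = np.corrcoef(x, y)[0, 1]
    return r * r


def ld_prune(positions, genotypes, window_bp=40_000, step_bp=4_000, r2_max=0.5):
    """Hypothetical greedy sliding-window LD pruning (simplified sketch).

    positions: sorted base-pair positions of m variants on one chromosome.
    genotypes: (n_samples, m) array of allele dosages (0/1/2).
    Returns the indices of the variants that survive pruning.
    """
    m = len(positions)
    keep = np.ones(m, dtype=bool)
    start = positions[0]
    while start <= positions[-1]:
        # Still-kept variants inside the window [start, start + window_bp).
        window = [i for i in range(m)
                  if keep[i] and start <= positions[i] < start + window_bp]
        for a, i in enumerate(window):
            if not keep[i]:
                continue
            for j in window[a + 1:]:
                # Simplification: drop the later variant of a correlated pair.
                if keep[j] and r_squared(genotypes[:, i], genotypes[:, j]) > r2_max:
                    keep[j] = False
        start += step_bp
    return np.flatnonzero(keep)


def ld_threshold(n_independent, alpha=0.05):
    """LD-based threshold P_LD = alpha / number of pruned-in variants."""
    return alpha / n_independent


# Toy example: variant 1 copies variant 0 nearby; variant 3 copies it 100 kb away.
g0 = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0], dtype=float)
g2 = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 0], dtype=float)
geno = np.column_stack([g0, g0, g2, g0])
positions = [0, 1_000, 2_000, 100_000]
kept = ld_prune(positions, geno)
print(kept)  # [0 2 3]
print(ld_threshold(len(kept)))
```

Variant 3 survives even though it is identical to variant 0, because the two never share a 40 kb window; this locality is what makes window-based pruning tractable at genome scale.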
Figure 2. The relationship between −log10 PLD and −log10 Psig. We calculated the LD-based genome-wide significance threshold PLD from the effective number of independent variants, which was estimated by LD pruning with a maximum r2 threshold of 0.5. Whereas −log10 Psig was approximately positively correlated with −log10 PLD for AFR, EUR, EAS and SAS (blue), AMR (red) is an outlier. The error bars represent the 95% CI for −log10 Psig. The dotted lines represent the common genome-wide significance threshold of P=5.0 × 10−8. AFR, African; AMR, Admixed American; EAS, East Asian; EUR, European; SAS, South Asian.