Olga A Vsevolozhskaya¹, Min Shi², Fengjiao Hu², Dmitri V Zaykin².
Abstract
Historically, the majority of statistical association methods have been designed assuming availability of SNP-level information. However, modern genetic and sequencing data present new challenges to the access and sharing of genotype-phenotype datasets, including the cost of data management and the difficulty of consolidating records across research groups. These issues make methods based on SNP-level summary statistics particularly appealing. The most common way of combining such statistics is a sum of SNP-level squared scores, possibly weighted, as in burden tests for rare variants. The overall significance of the resulting statistic is evaluated using its distribution under the null hypothesis. Here, we demonstrate that this basic approach can be substantially improved by decorrelating the scores prior to their addition, resulting in remarkable power gains in the situations most commonly encountered in practice, namely, under heterogeneity of effect sizes and diversity in pairwise LD. In these situations, the power of the traditional test, based on the sum of squared scores, quickly reaches a ceiling as the number of variants increases. Thus, the traditional approach does not benefit from information potentially contained in additional SNPs, while our decorrelation by orthogonal transformation (DOT) method yields a steady gain in power. We present theoretical and computational analyses of both approaches and reveal the causes behind the sometimes dramatic differences in their respective powers. We showcase DOT by analyzing breast cancer and cleft lip data, in which our method strengthened the levels of previously reported associations and implied the possibility of multiple new alleles that jointly confer disease risk.
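The two ways of combining SNP-level scores contrasted above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes a vector `z` of SNP-level z-scores with a known LD (correlation) matrix `ld`, and all function names are hypothetical. TQ adds the raw squared scores, so under the null it follows a weighted sum of 1-df chi-squares whose weights are the LD eigenvalues. Here that distribution is replaced by a moment-matched scaled chi-square (the Satterthwaite approximation, our choice for brevity). The decorrelated statistic adds the squared scores after removing LD, which gives z′Σ⁻¹z and an exact χ² null with L degrees of freedom.

```python
import numpy as np
from scipy import stats


def tq_statistic(z):
    """Traditional quadratic (TQ) statistic: the plain sum of squared scores."""
    z = np.asarray(z, dtype=float)
    return float(z @ z)


def tq_pvalue_satterthwaite(z, ld):
    """Approximate TQ P-value.

    Under H0, z ~ N(0, ld), so TQ is distributed as sum_i lambda_i * chi2_1,
    where lambda_i are the eigenvalues of the LD matrix. That mixture is
    replaced by c * chi2_nu with the same mean and variance (Satterthwaite
    approximation; an assumption, not necessarily the paper's method).
    """
    lam = np.linalg.eigvalsh(ld)
    c = np.sum(lam**2) / np.sum(lam)
    nu = np.sum(lam) ** 2 / np.sum(lam**2)
    return float(stats.chi2.sf(tq_statistic(z) / c, df=nu))


def dot_statistic(z, ld):
    """Sum of squared decorrelated scores, which equals z' ld^{-1} z."""
    z = np.asarray(z, dtype=float)
    return float(z @ np.linalg.solve(ld, z))


def dot_pvalue(z, ld):
    """Under H0 the decorrelated scores are i.i.d. N(0, 1), so the sum of
    their squares is chi-square with len(z) degrees of freedom."""
    return float(stats.chi2.sf(dot_statistic(z, ld), df=len(z)))


# Hypothetical example: five SNPs in equicorrelated LD (rho = 0.7).
ld = 0.3 * np.eye(5) + 0.7 * np.ones((5, 5))
z = np.array([2.0, 1.5, 0.3, -0.5, 1.0])
print("TQ :", tq_statistic(z), tq_pvalue_satterthwaite(z, ld))
print("DOT:", dot_statistic(z, ld), dot_pvalue(z, ld))
```

With an identity LD matrix, the two statistics coincide and the approximation becomes exact. The approaches differ only when LD is present.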
Year: 2020; PMID: 32287273; PMCID: PMC7182280; DOI: 10.1371/journal.pcbi.1007819
Source DB: PubMed; Journal: PLoS Comput Biol; ISSN: 1553-734X; Impact factor: 4.475
Power comparison of TQ, DOT, and ACAT, assuming very similar effect sizes in magnitude and an equicorrelated LD structure with ρ = 0.7.
| Number of SNPs | TQ Empiric. | TQ Theor. | TQ Approx. | DOT Empiric. | DOT Theor. | ACAT | |
|---|---|---|---|---|---|---|---|
| 500 | 0.802 | 0.802 | 0.802 | 0.090 | 0.090 | 0.832 | 0.02 |
| 300 | 0.801 | 0.801 | 0.801 | 0.101 | 0.100 | 0.830 | 0.03 |
| 200 | 0.801 | 0.801 | 0.801 | 0.112 | 0.112 | 0.829 | 0.04 |
| 100 | 0.799 | 0.800 | 0.800 | 0.144 | 0.145 | 0.826 | 0.08 |
| 50 | 0.798 | 0.799 | 0.799 | 0.196 | 0.197 | 0.821 | 0.16 |
| 30 | 0.795 | 0.796 | 0.796 | 0.253 | 0.252 | 0.814 | 0.26 |
| 20 | 0.794 | 0.793 | 0.794 | 0.307 | 0.306 | 0.809 | 0.39 |
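One way to obtain a theoretical DOT power of the kind tabulated above is as follows. If z ~ N(μ, Σ), the decorrelated statistic z′Σ⁻¹z is noncentral χ² with L degrees of freedom and noncentrality μ′Σ⁻¹μ. The sketch assumes this Gaussian model with known Σ. The mean-shift vector and parameter values are hypothetical and are not the settings behind the table. For identical shifts m under equicorrelation, the noncentrality is m²L/(1 − ρ + Lρ), which is bounded by m²/ρ. Adding SNPs therefore adds degrees of freedom without adding much signal, which is consistent with the direction of the DOT columns above.

```python
import numpy as np
from scipy import stats


def equicorrelated_ld(n_snps, rho):
    """LD matrix with 1 on the diagonal and rho everywhere else."""
    return (1.0 - rho) * np.eye(n_snps) + rho * np.ones((n_snps, n_snps))


def dot_noncentrality(mu, ld):
    """Noncentrality mu' ld^{-1} mu of z' ld^{-1} z when z ~ N(mu, ld)."""
    mu = np.asarray(mu, dtype=float)
    return float(mu @ np.linalg.solve(ld, mu))


def dot_theoretical_power(mu, ld, alpha=0.05):
    """P(noncentral chi2_L exceeds the central chi2_L upper-alpha cutoff)."""
    df = len(mu)
    crit = stats.chi2.isf(alpha, df)
    return float(stats.ncx2.sf(crit, df, dot_noncentrality(mu, ld)))


# Hypothetical setting: the same mean shift m on every SNP, rho = 0.7.
m, rho = 2.0, 0.7
for n_snps in (20, 50, 100, 300):
    ld = equicorrelated_ld(n_snps, rho)
    mu = np.full(n_snps, m)
    print(n_snps, round(dot_noncentrality(mu, ld), 3),
          round(dot_theoretical_power(mu, ld), 3))
```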
Power comparison of TQ, DOT, and ACAT, assuming very similar effect sizes but heterogeneous LD structure.
| Number of SNPs | TQ Empiric. | TQ Theor. | TQ Approx. | DOT Empiric. | DOT Theor. | ACAT | |
|---|---|---|---|---|---|---|---|
| 500 | 0.729 | 0.730 | 0.726 | 0.973 | 0.973 | 0.793 | 0.251 |
| 300 | 0.731 | 0.730 | 0.726 | 0.883 | 0.883 | 0.791 | 0.256 |
| 200 | 0.731 | 0.730 | 0.726 | 0.810 | 0.811 | 0.789 | 0.281 |
| 100 | 0.730 | 0.731 | 0.726 | 0.599 | 0.599 | 0.786 | 0.295 |
| 50 | 0.732 | 0.733 | 0.728 | 0.577 | 0.576 | 0.782 | 0.418 |
| 30 | 0.736 | 0.735 | 0.729 | 0.504 | 0.502 | 0.778 | 0.488 |
| 20 | 0.737 | 0.737 | 0.731 | 0.541 | 0.540 | 0.776 | 0.661 |
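ACAT, the third method in these comparisons, is not defined in this section. Below is a minimal sketch of a Cauchy combination of SNP-level P-values. Two details are assumptions: the equal weights (the weights used for the tables are not stated here) and the 1e-15 threshold below which the tangent is replaced by its small-P approximation 1/(pπ). Because the combined statistic is dominated by the smallest P-values, it responds to the data differently from either sum-of-squares statistic.

```python
import numpy as np


def acat(pvalues, weights=None):
    """Cauchy combination (ACAT) of P-values in (0, 1).

    T = sum_i w_i * tan((0.5 - p_i) * pi) with weights summing to 1. Under H0,
    T is approximately standard Cauchy, so the combined P-value is
    P(Cauchy > T) = 0.5 - arctan(T) / pi = arctan2(1, T) / pi.
    """
    p = np.asarray(pvalues, dtype=float)
    w = np.full(p.size, 1.0) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    # For tiny p, 0.5 - p rounds to 0.5 in floating point, so use the
    # leading-order approximation tan((0.5 - p) * pi) ~ 1 / (p * pi).
    small = p < 1e-15
    safe_p = np.where(small, p, 1.0)
    terms = np.where(small, 1.0 / (safe_p * np.pi), np.tan((0.5 - p) * np.pi))
    t = float(np.sum(w * terms))
    # arctan2(1, t) / pi is an accurate form of 0.5 - arctan(t) / pi.
    return float(np.arctan2(1.0, t) / np.pi)


print(acat([0.001, 0.5, 0.9]))
```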
Power comparison of TQ, DOT, and ACAT, assuming heterogeneity in effect sizes but equicorrelated LD.
| Number of SNPs | TQ Empiric. | TQ Theor. | TQ P-approx. | DOT Empiric. | DOT Theor. | ACAT | |
|---|---|---|---|---|---|---|---|
| 500 | 0.525 | 0.525 | 0.526 | 1.000 | 1.000 | 0.626 | 0.479 |
| 300 | 0.526 | 0.525 | 0.526 | 1.000 | 0.999 | 0.624 | 0.486 |
| 200 | 0.526 | 0.525 | 0.524 | 0.993 | 0.993 | 0.622 | 0.494 |
| 100 | 0.525 | 0.524 | 0.524 | 0.919 | 0.920 | 0.616 | 0.518 |
| 50 | 0.522 | 0.523 | 0.522 | 0.762 | 0.762 | 0.607 | 0.566 |
| 30 | 0.521 | 0.521 | 0.521 | 0.648 | 0.648 | 0.599 | 0.630 |
| 20 | 0.519 | 0.519 | 0.520 | 0.578 | 0.579 | 0.592 | 0.709 |
Power comparison of TQ, DOT, and ACAT with effect sizes randomly sampled between −0.15 and 0.15 and heterogeneous LD.
| Number of SNPs | TQ Empiric. | TQ Theor. | TQ P-approx. | DOT Empiric. | DOT Theor. | ACAT | |
|---|---|---|---|---|---|---|---|
| 500 | 0.0500 | 0.0503 | 0.0508 | 0.9226 | 0.9222 | 0.0564 | 0.2118 |
| 300 | 0.0506 | 0.0503 | 0.0509 | 0.7688 | 0.7689 | 0.0570 | 0.2107 |
| 200 | 0.0504 | 0.0503 | 0.0508 | 0.5970 | 0.5967 | 0.0570 | 0.2025 |
| 100 | 0.0504 | 0.0503 | 0.0509 | 0.3040 | 0.3038 | 0.0568 | 0.1655 |
| 50 | 0.0502 | 0.0503 | 0.0508 | 0.3074 | 0.3070 | 0.0555 | 0.2397 |
| 30 | 0.0505 | 0.0503 | 0.0507 | 0.1485 | 0.1487 | 0.0562 | 0.1527 |
| 20 | 0.0501 | 0.0503 | 0.0508 | 0.1191 | 0.1189 | 0.0557 | 0.1399 |
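Empirical power columns like those above can be approximated by Monte Carlo. The sketch is illustrative only and rests on several assumptions. Heterogeneous LD is generated from a hypothetical random factor model (`random_ld`). The mean shifts of the z-scores are drawn uniformly as a stand-in for heterogeneous effect sizes, and their range is arbitrary. The TQ critical value comes from simulated null draws rather than an analytic distribution.

```python
import numpy as np
from scipy import stats


def random_ld(n_snps, n_factors, rng):
    """Hypothetical heterogeneous LD: a random factor-model correlation matrix."""
    a = rng.normal(size=(n_snps, n_factors))
    s = a @ a.T + np.eye(n_snps)
    d = np.sqrt(np.diag(s))
    return s / np.outer(d, d)


def empirical_power(mu, ld, alpha=0.05, n_sim=20_000, rng=None):
    """Monte Carlo power of TQ and DOT when z ~ N(mu, ld)."""
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    n_snps = mu.size
    chol = np.linalg.cholesky(ld)
    ld_inv = np.linalg.inv(ld)
    # Rows of e @ chol.T have covariance chol @ chol.T = ld.
    z_null = rng.standard_normal((n_sim, n_snps)) @ chol.T
    z_alt = mu + rng.standard_normal((n_sim, n_snps)) @ chol.T
    # TQ cutoff from the simulated null; DOT uses the exact chi2_L cutoff.
    tq_crit = np.quantile(np.sum(z_null**2, axis=1), 1.0 - alpha)
    tq_power = np.mean(np.sum(z_alt**2, axis=1) > tq_crit)
    dot_alt = np.sum((z_alt @ ld_inv) * z_alt, axis=1)  # z' ld^{-1} z per row
    dot_power = np.mean(dot_alt > stats.chi2.isf(alpha, n_snps))
    return float(tq_power), float(dot_power)


rng = np.random.default_rng(2020)
n_snps = 50
ld = random_ld(n_snps, 3, rng)
mu = rng.uniform(-0.5, 0.5, size=n_snps)  # hypothetical mean shifts
print(empirical_power(mu, ld, rng=rng))
```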
Power comparison of TQ, DOT, and ACAT using realistic LD patterns from the 1000 Genomes Project.
| Setting | TQ Theor. | TQ Approx. | TQ Regr. | TQ MVN | DOT Theor. | DOT Regr. | DOT MVN | ACAT |
|---|---|---|---|---|---|---|---|---|
| Setting 5 | 0.34 | 0.34 | 0.34 | 0.34 | 0.60 | 0.60 | 0.60 | 0.40 |
| Setting 6 | 0.42 | 0.42 | 0.42 | 0.43 | 0.77 | 0.77 | 0.77 | 0.43 |
| Setting 7 | 0.24 | 0.24 | 0.24 | 0.24 | 0.76 | 0.76 | 0.76 | 0.18 |
Type-I error rates (α = 10⁻³) using a reference panel to estimate LD.
Population LD patterns are modeled using data from the 1000 Genomes Project.
| Sample size | TQ | DOT | ACAT |
|---|---|---|---|
|  | 1 × 10⁻³ | 3 × 10⁻³ | 1 × 10⁻³ |
|  | 1 × 10⁻³ | 3 × 10⁻³ | 1 × 10⁻³ |
|  | 1 × 10⁻³ | 2 × 10⁻³ | 1 × 10⁻³ |
|  | 1 × 10⁻³ | 1 × 10⁻⁴ | 1 × 10⁻³ |
Type-I error rates (α = 10⁻⁷) using a reference panel to estimate LD.
Population LD patterns are modeled using data from the 1000 Genomes Project.
| Sample size | TQ | DOT | ACAT |
|---|---|---|---|
|  | 2 × 10⁻⁷ | 3 × 10⁻⁴ | 1 × 10⁻⁷ |
|  | 2 × 10⁻⁷ | 2 × 10⁻⁴ | 1 × 10⁻⁷ |
|  | 2 × 10⁻⁷ | 2 × 10⁻⁴ | 1 × 10⁻⁷ |
|  | 2 × 10⁻⁷ | 1 × 10⁻⁴ | 1 × 10⁻⁷ |
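The type-I error tables concern LD estimated from a reference panel rather than known exactly. The sketch below illustrates one mechanism that can push DOT above the nominal level. Null scores are generated with the true LD, but the statistic is decorrelated with a sample correlation matrix from a small panel, and the inverse of that estimate tends to overstate the true inverse. The panel is hypothetical continuous multivariate-normal data standing in for genotypes. The panel size, number of SNPs, and LD structure are arbitrary choices, not the settings behind the tables.

```python
import numpy as np
from scipy import stats


def estimate_ld(reference_panel):
    """Pearson correlation between SNP columns of an (individuals x SNPs) matrix."""
    return np.corrcoef(reference_panel, rowvar=False)


def dot_type1_error(ld_true, ld_used, alpha, n_sim, rng):
    """Rejection rate of DOT when null scores follow ld_true but the test
    decorrelates with ld_used (e.g. an estimate from a reference panel)."""
    n_snps = ld_true.shape[0]
    chol = np.linalg.cholesky(ld_true)
    z = rng.standard_normal((n_sim, n_snps)) @ chol.T  # null scores
    stat = np.sum(z * np.linalg.solve(ld_used, z.T).T, axis=1)  # z' ld_used^{-1} z
    return float(np.mean(stat > stats.chi2.isf(alpha, n_snps)))


rng = np.random.default_rng(7)
n_snps, n_ref, alpha = 20, 100, 1e-3
ld_true = 0.5 * np.eye(n_snps) + 0.5 * np.ones((n_snps, n_snps))
# Hypothetical reference panel: multivariate-normal draws stand in for genotypes.
panel = rng.multivariate_normal(np.zeros(n_snps), ld_true, size=n_ref)
ld_hat = estimate_ld(panel)
rate_true = dot_type1_error(ld_true, ld_true, alpha, 100_000, rng)
rate_hat = dot_type1_error(ld_true, ld_hat, alpha, 100_000, rng)
print(rate_true, rate_hat)
```

With the true LD, the rejection rate should stay close to α. With the panel estimate, it is typically higher, and the inflation grows as the panel shrinks relative to the number of SNPs.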
Fig 1. Overview of the DOT method applied to breast cancer data.
We compute a gene-level score by first decorrelating the SNP-level scores (obtained from SNP P-values) using the order-invariant matrix H and then summing the resulting independent chi-squared statistics. We use our DOT method to obtain a gene-level P-value. In the breast cancer application, we chose an anchor SNP (a SNP previously reported as a risk variant, highlighted by the vertical dashed line) and then combined the SNPs in the same LD block as the anchor SNP using DOT. SNP-level P-values highlighted in red are those of SNPs in moderate to high LD with the anchor SNP.
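The caption refers to an order-invariant matrix H without defining it here. The sketch below assumes that H is the symmetric inverse square root Σ^(−1/2) of the LD matrix; this is our assumption for illustration. With that choice, reordering the SNPs only reorders the decorrelated scores. A Cholesky-based decorrelation, shown for contrast, changes the individual scores. Both give the same sum of squares, so the order dependence affects per-SNP contributions rather than the gene-level statistic.

```python
import numpy as np


def inverse_sqrt(ld):
    """Symmetric inverse square root ld^{-1/2} via the eigendecomposition."""
    vals, vecs = np.linalg.eigh(ld)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def decorrelate_symmetric(z, ld):
    return inverse_sqrt(ld) @ z


def decorrelate_cholesky(z, ld):
    # ld = C C^T with C lower-triangular; C^{-1} z has identity covariance.
    return np.linalg.solve(np.linalg.cholesky(ld), z)


ld = np.array([[1.0, 0.6, 0.2],
               [0.6, 1.0, 0.4],
               [0.2, 0.4, 1.0]])
z = np.array([2.1, 1.7, -0.4])
perm = np.array([2, 0, 1])        # the same SNPs listed in another order
ld_perm = ld[np.ix_(perm, perm)]

x_sym = decorrelate_symmetric(z, ld)
x_chol = decorrelate_cholesky(z, ld)
# Symmetric H: reordering SNPs just reorders the decorrelated scores.
print(np.allclose(decorrelate_symmetric(z[perm], ld_perm), x_sym[perm]))   # True
# Cholesky: the individual decorrelated scores depend on the SNP order.
print(np.allclose(decorrelate_cholesky(z[perm], ld_perm), x_chol[perm]))   # False
# Both give the same sum of squares, z' ld^{-1} z.
print(np.sum(x_sym**2), np.sum(x_chol**2), z @ np.linalg.solve(ld, z))
```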
Breast cancer candidate gene association P-values.
| Gene | TQ | DOT | ACAT | min( |
|---|---|---|---|---|
|  | 0.0005 | 0.0004 | 0.001 | 0.001 |
|  | 0.20 | 0.0001 | 0.19 | 0.96 |
|  | 0.01 | 0.003 | 0.01 | 0.07 |
|  | 0.56 | 0.009 | 0.76 | 1 |
Breast cancer SNPs identified by DOT in the analysis of GWAS data.
| Gene | Number of SNPs in analysis ( | rs number | Reference |
|---|---|---|---|
|  | 13 | rs4784220 | This SNP was previously reported in the literature to be associated with breast cancer [ |
|  |  | rs8046979 | This SNP was also linked to breast cancer [ |
|  |  |  | A new association with susceptibility to breast cancer. |
|  | 36 | rs2347867 | This SNP was previously reported to be involved in breast cancer risk [ |
|  |  | rs985191 | This SNP was previously reported to be associated with endocrine therapy efficacy in breast cancer [ |
|  |  |  | A new association with susceptibility to breast cancer. This SNP was previously linked to the effectiveness of androgen deprivation therapy among prostate cancer patients [ |
|  |  |  | A new association with susceptibility to breast cancer. |
|  |  |  | A new association with susceptibility to breast cancer. |
|  |  |  | A new association with susceptibility to breast cancer. |
|  |  |  | A new association with susceptibility to breast cancer. |
|  | 18 | rs1219648 | This SNP was previously reported to be associated with premenopausal breast cancer [ |
|  |  | rs2860197 | This SNP was previously suggested to be associated with breast cancer [ |
|  |  | rs2981582 | This SNP was previously reported in the literature to be associated with breast cancer [ |
|  |  | rs3135730 | This SNP was previously suggested to be involved in an interaction between oral contraceptive use and breast cancer [ |
|  |  |  | A new association with susceptibility to breast cancer. |
|  | 30 | rs999737 | This SNP was previously reported in the literature to be associated with breast cancer [ |
|  |  | rs8016149 | This SNP was previously suggested to be associated with breast cancer [ |
|  |  | rs1023529 | This SNP has been patented as one of the susceptibility variants for breast cancer [ |
|  |  | rs2189517 | This SNP was shown to be associated with breast cancer in a Chinese population [ |
|  |  |  | A new association with susceptibility to breast cancer. |
Cleft lip candidate gene association P-values.
| Gene | TQ | DOT | ACAT | min( |
|---|---|---|---|---|
|  | 8.9 × 10⁻⁸ | 1.3 × 10⁻¹³ | 7.2 × 10⁻¹¹ | 7.2 × 10⁻¹¹ |
| chr. 8q24/rs987525 [ | 1.0 × 10⁻⁹ | 8.7 × 10⁻²² | 4.7 × 10⁻¹⁵ | 3.2 × 10⁻¹⁵ |
|  | 4.7 × 10⁻⁹ | 1.8 × 10⁻¹⁹ | 2.1 × 10⁻¹⁴ | 2.1 × 10⁻¹⁴ |
|  | 1.5 × 10⁻⁸ | 2.9 × 10⁻⁸ | 2.4 × 10⁻¹¹ | 3.6 × 10⁻¹¹ |
Cleft SNPs identified by DOT in the analysis of GWAS data.
| Gene | Number of SNPs in analysis ( | rs number | Reference |
|---|---|---|---|
|  | 30 | rs4847196 | This SNP was previously studied in connection with cleft lip [ |
|  |  | rs563429 | This SNP was also previously considered in association with cleft lip [ |
|  |  | rs2275035 | This SNP was recently identified as associated with orofacial clefting [ |
|  |  |  | A new association with susceptibility to cleft lip. This SNP was previously suggested to be linked to esophageal cancer [ |
| chr. 8q24 | 29 | rs987525 | One of the top results was the anchor SNP [ |
|  |  | rs882083 | This SNP was previously suggested to be associated with cleft lip [ |
|  |  | rs1157136 | This SNP was previously suggested to be associated with cleft lip in a Brazilian population [ |
|  |  | rs12548036 | This SNP was previously studied in connection with susceptibility to cleft lip in a Japanese population [ |
|  |  | rs1530300 | This SNP was previously suggested to be associated with cleft lip in a Brazilian population [ |
|  |  |  | A new association with susceptibility to cleft lip. |
|  | 6 | rs10863790 | One of the top contributions was the anchor SNP [ |
|  |  | rs861020 | This SNP was previously reported to be associated with cleft lip [ |
|  |  | rs2236906 | This SNP was considered to be associated with cleft lip in a Kenyan African cohort [ |
|  |  | rs2073485 | This SNP was reported to be associated with cleft lip in Western China [ |
|  | 14 | rs11696257 | This SNP was previously reported to be associated with cleft lip [ |
|  |  | rs6102085 | This SNP was previously reported to be associated with cleft lip in a Han Chinese population [ |
|  |  | rs6065259 | This SNP was previously reported to be associated with cleft lip in a population in Heilongjiang Province, northern China [ |
|  |  | rs6102074 | This SNP was previously reported to be associated with cleft lip in a Han Chinese population [ |
Type-I error rates (α = 10⁻⁴) using a reference panel to estimate LD.
Population LD patterns are modeled using data from the 1000 Genomes Project.
| Sample size | TQ | DOT | ACAT |
|---|---|---|---|
|  | 9 × 10⁻⁵ | 5 × 10⁻⁴ | 1 × 10⁻⁴ |
|  | 9 × 10⁻⁵ | 4 × 10⁻⁴ | 1 × 10⁻⁴ |
|  | 1 × 10⁻⁴ | 1 × 10⁻⁴ | 1 × 10⁻⁴ |
|  | 1 × 10⁻⁴ | 1 × 10⁻⁴ | 1 × 10⁻⁴ |