Zhonghe Shao, Ting Wang, Jiahao Qiao, Yuchen Zhang, Shuiping Huang, Ping Zeng.
Abstract
BACKGROUND: Multilocus analysis on a set of single nucleotide polymorphisms (SNPs) pre-assigned within a gene constitutes a valuable complement to single-marker analysis by aggregating data on complex traits in a biologically meaningful way. However, despite the existence of a wide variety of SNP-set methods, few comprehensive comparison studies have been performed to evaluate the effectiveness of these methods.
Keywords: Common and rare variant association study; Expression quantitative trait loci; Genome-wide association study; Integrative analysis; Multilocus method; P value combination method; SNP-set analysis; Summary statistics
Year: 2022 PMID: 36042399 PMCID: PMC9429742 DOI: 10.1186/s12859-022-04897-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
An overview of 22 SNP-set methods and their corresponding modeling characteristics
| No | Year | Method | Input | Calculate | References | |||||
|---|---|---|---|---|---|---|---|---|---|---|
| R | Other | Analytical | Simulation | |||||||
| 1 | 1960 | MLR | √ | √ | √ | √ | [ | |||
| 2 | 2008 | FLM | √ | √ | √ | √ | [ | |||
| 3 | 2004 | HC | √ | √ | √ | [ | ||||
| 4 | 2017 | GHC | √ | √ | √ | √ | [ | |||
| 5 | 2019 | BJ | √ | √ | √ | [ | ||||
| 6 | 2019 | GBJ | √ | √ | √ | √ | [ | |||
| 7 | 2020 | DOT | √ | √ | √ | √ | [ | |||
| 8 | 2017 | BT | √ | √ | √ | √ | [ | |||
| 9 | 2013 | SKATO | √ | √ | √ | √ | [ | |||
| 10 | 2018 | SKAT | √ | √ | √ | √ | [ | |||
| 11 | 1986 | Simes | √ | √ | [ | |||||
| 12 | 1992 | FCP | √ | √ | [ | |||||
| 13 | 2002 | TPM | √ | √ | [ | |||||
| 14 | 2003 | RTP | √ | √ | [ | |||||
| 15 | 2007 | minP | √ | √ | √ | [ | ||||
| 16 | 2019 | ART | √ | √ | [ | |||||
| 17 | 2019 | ART-A | √ | √ | √ | √ | [ | |||
| 18 | 2007 | GM | √ | √ | [ | |||||
| 19 | 2008 | SimpleM | √ | √ | √ | [ | ||||
| 20 | 2011 | GATES | √ | √ | √ | [ | ||||
| 21 | 2019 | HMP | √ | √ | √ | [ | ||||
| 22 | 2020 | ACAT | √ | √ | √ | [ | ||||
P denotes a vector of P values, Z denotes a vector of Z scores, W is a vector of weights, R denotes the SNP-by-SNP correlation matrix, τ indicates a fixed value that P is less than in TPM, with the default being 0.2; k is the number of P values to be combined in RTP, ARTP, ART, ART-A, with the default value being 2/M, where M is the number of SNPs for a given gene; a is a shape parameter in GM, with the default being 0.0383; N is the sample size
MLR Multiple linear regression, FLM Functional multiple linear regression model, HC Higher criticism test, GHC Generalized higher criticism test, BJ Berk–Jones test, GBJ Generalized Berk–Jones test, DOT Decorrelation by orthogonal transformation, BT Burden test, SKATO Optimal sequence kernel association test, SKAT Sequence kernel association test, Simes Simes’s test, FCP Fisher combined probability, TPM Truncated product method, RTP Rank truncated product, ART Augmented rank truncation, ART-A Adaptive augmented rank truncation, GM Gamma method, GATES Gene-based association test that uses extended Simes procedure, HMP The harmonic mean P value test, ACAT Aggregated Cauchy association test
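As a rough illustration of how P value combination methods of the kind listed above operate on a vector P of SNP-level P values, the sketch below implements the commonly cited textbook forms of Simes's test, the Fisher combined probability (FCP) and the aggregated Cauchy association test (ACAT). It is a minimal sketch under stated assumptions, not the implementation used in the study: the function names are hypothetical, all P values are assumed to lie strictly between 0 and 1, FCP is computed under the assumption of independent P values, and ACAT defaults to equal weights.

```python
import math

def simes(pvals):
    """Simes's test: minimum over ranks i of M * p_(i) / i, capped at 1."""
    m = len(pvals)
    ordered = sorted(pvals)
    return min(1.0, min(m * p / i for i, p in enumerate(ordered, start=1)))

def fisher_combined(pvals):
    """FCP: -2 * sum(log p) follows a chi-square with 2M df if P values are independent."""
    m = len(pvals)
    half_stat = -sum(math.log(p) for p in pvals)  # half of the Fisher statistic
    # Closed-form chi-square survival function for an even number (2M) of df:
    # exp(-x/2) * sum_{k=0}^{M-1} (x/2)^k / k!
    term, total = 1.0, 1.0
    for k in range(1, m):
        term *= half_stat / k
        total += term
    return math.exp(-half_stat) * total

def acat(pvals, weights=None):
    """ACAT: weighted mean of Cauchy-transformed P values, mapped back to a P value."""
    if weights is None:
        weights = [1.0] * len(pvals)  # assumption: equal weights
    wsum = sum(weights)
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvals)) / wsum
    return 0.5 - math.atan(t) / math.pi

if __name__ == "__main__":
    example = [0.01, 0.04, 0.03]  # made-up P values for illustration only
    print(simes(example), fisher_combined(example), acat(example))
```

Each function returns the input unchanged when given a single P value, which is a convenient sanity check. Because the FCP sketch assumes independence, it ignores the SNP-by-SNP correlation matrix R.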
Fig. 1 Statistical analysis framework for the theoretical and application comparison of SNP-set based association methods with summary statistics
Ratio between the empirical type I error and the given significance level estimated over 10^5 simulations under common variants
| Method | α = 0.05 | α = 0.01 | α = 0.001 | Average | Inflated | Well-controlled | Conservative |
|---|---|---|---|---|---|---|---|
| MLR | 0.00 | 0.00 | 0.00 | 0.00 | | | √ |
| FLM | 0.00 | 0.00 | 0.00 | 0.00 | | | √ |
| HC | 1.33 | 1.82 | 2.33 | 1.83 | √ | | |
| GHC | 1.26 | 1.65 | 1.94 | 1.62 | √ | | |
| BJ | 1.29 | 1.64 | 1.97 | 1.63 | √ | | |
| GBJ | 0.85 | 1.32 | 1.71 | 1.29 | √ | | |
| DOT | 0.00 | 0.00 | 0.00 | 0.00 | | | √ |
| BT | 1.04 | 1.07 | 1.10 | 1.07 | | √ | |
| SKATO | 1.08 | 1.18 | 1.11 | 1.12 | | √ | |
| SKAT | 1.02 | 1.08 | 1.08 | 1.06 | | √ | |
| Simes | 0.82 | 0.82 | 0.82 | 0.82 | | √ | |
| FCP | 5.29 | 21.88 | 174.81 | 67.33 | √ | | |
| TPM | 2.45 | 10.39 | 86.81 | 33.22 | √ | | |
| RTP | 3.76 | 14.71 | 110.07 | 42.85 | √ | | |
| minP | 0.88 | 0.82 | 0.77 | 0.82 | | √ | |
| ART | 4.15 | 16.51 | 126.91 | 49.19 | √ | | |
| ART-A | 1.17 | 3.05 | 12.97 | 5.73 | √ | | |
| GM | 2.01 | 7.43 | 52.03 | 20.49 | √ | | |
| SimpleM | 0.39 | 0.41 | 0.41 | 0.40 | | | √ |
| GATES | 1.47 | 1.53 | 1.51 | 1.50 | √ | | |
| HMP | 0.87 | 1.01 | 1.06 | 0.98 | | √ | |
| ACAT | 1.04 | 1.08 | 1.07 | 1.06 | | √ | |
A SNP-set method was classified as inflated, well-controlled or conservative according to the average ratio between the empirical type I error and the given significance level over 10^5 simulations. Inflated: ratio > 1.2; well-controlled: 0.8 ≤ ratio ≤ 1.2; conservative: ratio < 0.8
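The classification rule in the footnote above can be sketched in a few lines. The function names are hypothetical, and the ratio helper assumes that the empirical type I error is simply the fraction of null simulations whose P value falls below α; this is an illustrative sketch, not the code used in the study.

```python
def type1_error_ratio(null_pvalues, alpha):
    """Empirical type I error (fraction of null P values below alpha) divided by alpha."""
    rejected = sum(1 for p in null_pvalues if p < alpha)
    return (rejected / len(null_pvalues)) / alpha

def classify_type1_control(avg_ratio, lower=0.8, upper=1.2):
    """Apply the thresholds stated in the table footnote to an average ratio."""
    if avg_ratio > upper:
        return "inflated"
    if avg_ratio < lower:
        return "conservative"
    return "well-controlled"

if __name__ == "__main__":
    # Average ratios taken from the table above.
    for method, avg in [("HC", 1.83), ("Simes", 0.82), ("SimpleM", 0.40)]:
        print(method, classify_type1_control(avg))
```

Applied to the Average column, this rule marks BT, SKATO, SKAT, Simes, minP, HMP and ACAT as well-controlled, which matches the seven methods carried forward to the power comparisons.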
Fig. 2 Estimated power for the seven SNP-set methods under the sparse case with a significance level α of 10^-5. PVE = 0.3%, 0.5% or 1% is shown on the right side, the proportion of causal SNPs (prop) = 0.05, 0.20 or 0.50 on the top, and the total number of analyzed SNPs = 50, 200 or 500 on the x-axis. Power was estimated across 10^3 replications
Estimated powers of the seven methods under sparse case where PVE = 0.3%, and prop = 5%, 20% or 50% of SNPs were randomly selected to be causal with the same direction of effect sizes
| Prop | BT | SKATO | SKAT | Simes | minP | HMP | ACAT |
|---|---|---|---|---|---|---|---|
| 0.05 | 0.350 | 0.065 | 0.059 | 0.044 | 0.037 | 0.054 | 0.054 |
| 0.20 | 0.379 | 0.058 | 0.062 | 0.039 | 0.035 | 0.051 | 0.051 |
| 0.50 | 0.363 | 0.066 | 0.058 | 0.038 | 0.038 | 0.047 | 0.047 |
Fig. 3 Estimated power for the seven SNP-set methods in the rare variant association study under the sparse case with a significance level α of 10^-5. PVE = 0.3%, 0.5% or 1% is shown on the right side, the proportion of causal SNPs = 0.05, 0.20 or 0.50 on the top, and the total number of analyzed SNPs = 50, 200 or 500 on the x-axis. Power was estimated across 10^3 replications
Fig. 4 (A) Estimated power for SNP-set methods under the polygenic TWAS framework with no horizontal pleiotropy. (B) Estimated power for SNP-set methods under the polygenic TWAS framework with horizontal pleiotropy. θ = 0.1 or 0.2 is shown on the right side, −log10(α) = 3, 4 or 5 on the top, and the total number of analyzed SNPs = 50, 200 or 500 on the x-axis. Power was estimated across 10^3 replications
Summary performance of these SNP-set based association methods in the power evaluation of simulation studies and in real-data applications to distinct fields
| | Common variants: unweighted | Common variants: TWAS with eQTL weights, no horizontal pleiotropy | Common variants: TWAS with eQTL weights, horizontal pleiotropy | Rare variants |
|---|---|---|---|---|
| Simulation | HMP, ACAT | BT | HMP, ACAT | SKAT, SKATO |
| Application | HMP, ACAT | HMP, SKATO | HMP, SKATO | SKAT, SKATO |
The methods listed in the table were selected in terms of their power in the simulation studies or based on the number of identified genes in the real-data applications
Total computation time (seconds) of 10^3 simulations under the sparse case with PVE = 0.5%, in which only 20% of the simulated SNPs were selected to have substantial impacts on the phenotype
| Number of SNPs | BT | SKATO | SKAT | Simes | minP | HMP | ACAT |
|---|---|---|---|---|---|---|---|
| 50 | 4.10 | 483.83 | 7.16 | 3.98 | 4.48 | 4.12 | 3.83 |
| 200 | 58.59 | 1234.77 | 70.84 | 58.35 | 60.55 | 58.62 | 52.71 |
| 500 | 297.73 | 1561.44 | 524.48 | 296.46 | 312.76 | 295.70 | 279.46 |
Fig. 5 UpSet plot illustrating the number of identified genes shared across distinct SNP-set methods for six psychiatric disorders (A), four plasma lipid traits (B) and nine immune-related diseases (C)
Identified genes associated with six psychiatric disorders, four plasma lipid traits and nine immune-related diseases under various real-data applications
| Phenotype | BT | SKATO | SKAT | Simes | minP | HMP | ACAT | Total |
|---|---|---|---|---|---|---|---|---|
| ADHD | 6 | 25 | 25 | 24 | 25 | 26 | 36 | |
| ASD | 1 | 3 | 2 | 2 | 3 | 3 | 5 | |
| BIP | 7 | 65 | 74 | 57 | 59 | 80 | 116 | |
| CU | 0 | 10 | 12 | 10 | 11 | 16 | ||
| MDD | 2 | 9 | 1 | 3 | 5 | 5 | 13 | |
| SCZ | 11 | 282 | 299 | 295 | 299 | 298 | 402 | |
| HDL | 22 | 221 | 222 | 193 | 192 | 215 | 282 | |
| LDL | 65 | 147 | 146 | 147 | 150 | 144 | 198 | |
| TG | 78 | 203 | 168 | 168 | 168 | 168 | 209 | |
| TC | 146 | 218 | 198 | 198 | 197 | 198 | 252 | |
| IBD | 22 | 146 | 94 | 249 | 292 | |||
| UC | 13 | 175 | 129 | 219 | 271 | |||
| CD | 22 | 222 | 144 | 272 | 357 | |||
| SLE | 101 | 180 | 104 | 266 | 315 | |||
| PBC | 102 | 65 | 122 | 121 | 240 | |||
| PSC | 92 | 210 | 138 | 221 | 298 | |||
| RA | 46 | 157 | 139 | 141 | 217 | |||
| MS | 106 | 137 | 183 | 201 | 306 | |||
| OST | 10 | 0 | 5 | 5 | 49 | |||
The maximum number of associated genes is highlighted in bold for each disease. Simes and minP, which cannot incorporate eQTL weights, were excluded from the TWAS analysis of the nine immune-related diseases
ADHD Attention-deficit/hyperactivity disorder, ASD Autism spectrum disorder, BIP Bipolar disorder, CU Cannabis use, MDD Major depressive disorder, SCZ Schizophrenia, HDL High-density-lipoprotein cholesterol, LDL Low-density-lipoprotein cholesterol, TG Triglycerides, TC Total cholesterol, IBD Inflammatory bowel disease, UC Ulcerative colitis, CD Crohn’s disease, SLE Systemic lupus erythematosus, PBC Primary biliary cirrhosis, PSC Primary sclerosing cholangitis, RA Rheumatoid arthritis, MS Multiple sclerosis, OST Osteoarthritis