Matthew Sutton, Pierre-Emmanuel Sugier, Therese Truong, Benoit Liquet.
Abstract
BACKGROUND: Genome-wide association studies (GWAS) have identified genetic variants associated with multiple complex diseases. This phenomenon, known as pleiotropy, can be leveraged to integrate multiple data sources in a joint analysis. Integrating additional information, such as gene pathway knowledge, can often improve statistical efficiency and biological interpretation. In this article, we propose statistical methods that incorporate both gene pathway and pleiotropy knowledge to increase statistical power and identify important risk variants affecting multiple traits.
Keywords: Genetic epidemiology; High dimensional data; Lasso penalization; Oncology; Pathway analysis; Pleiotropy; Sparse methods; Variable selection
Year: 2022 PMID: 34996381 PMCID: PMC8742466 DOI: 10.1186/s12874-021-01491-8
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
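The keywords point to lasso-type penalisation that exploits group structure (genes or pathways) and pleiotropy across studies. As a hedged illustration only, the sketch below computes a generic sparse-group penalty on a coefficient matrix of shape (p, K), one column per study: a row-wise L2 term that pushes each variable to be selected jointly across the K studies, plus a group-level L2 term over each gene or pathway weighted by the square root of its size. The exact penalty form, the weights and the names `lam` and `alpha` are assumptions; the authors' SMT, GMT and SGMT penalties may differ.

```python
import numpy as np


def sparse_group_penalty(B, groups, lam, alpha):
    """Generic sparse-group penalty (an assumed form, not the paper's exact one).

    B      : array of shape (p, K); row j holds variable j's effects in K studies
    groups : length-p labels giving each variable's group (gene or pathway)
    lam    : hypothetical overall penalty strength
    alpha  : hypothetical mixing weight between the two terms, in [0, 1]
    """
    B = np.asarray(B, dtype=float)
    groups = np.asarray(groups)
    # Variable-level term: L2 norm of each row across the K studies, so a
    # variable tends to enter or leave all studies together (pleiotropy).
    variable_term = np.sum(np.linalg.norm(B, axis=1))
    # Group-level term: Frobenius norm of each group's block over all studies,
    # weighted by the square root of the number of coefficients in the block.
    group_term = 0.0
    for g in np.unique(groups):
        block = B[groups == g]
        group_term += np.sqrt(block.size) * np.linalg.norm(block)
    return lam * (alpha * variable_term + (1 - alpha) * group_term)


B = np.array([[3.0, 4.0], [0.0, 0.0]])
print(sparse_group_penalty(B, groups=[0, 0], lam=1.0, alpha=0.5))  # 7.5
```

Setting `alpha` to 1 keeps only the per-variable term, and setting it to 0 keeps only the group term, which is one way to read the contrast between variable-level and group-level methods in the tables below.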
Fig. 1 Each row in the figure corresponds to a simulated scenario. Colours correspond to groups, and the number active in a group is the number of non-zero variables p in a group of 20 variables per study (so 40 variables over K = 2 studies). The numbers of non-zero vs zero groups are 1/4, 2/8, 4/16 and 8/32.
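One possible reading of this simulation layout is sketched below. Only the group size of 20 variables per study and K = 2 come from the caption; placing the active groups first, drawing the active variables at random, and giving every active variable the same hypothetical effect of 1.0 in all studies (so that it is pleiotropic) are illustrative assumptions, not the paper's settings.

```python
import numpy as np


def simulate_coefficients(n_groups, n_active_groups, n_active_per_group,
                          group_size=20, K=2, effect=1.0, seed=0):
    """Build a hypothetical (n_groups * group_size, K) coefficient matrix.

    Each group holds `group_size` variables per study, so one group spans
    group_size * K coefficients (40 when K = 2). Assumption: the first
    `n_active_groups` groups are active, and inside each of them
    `n_active_per_group` randomly chosen variables get a non-zero effect
    in every study.
    """
    rng = np.random.default_rng(seed)
    B = np.zeros((n_groups * group_size, K))
    for g in range(n_active_groups):
        start = g * group_size
        rows = start + rng.choice(group_size, size=n_active_per_group, replace=False)
        B[rows, :] = effect  # same non-zero effect in all K studies
    groups = np.repeat(np.arange(n_groups), group_size)
    return B, groups


B, groups = simulate_coefficients(n_groups=5, n_active_groups=1, n_active_per_group=4)
print(B.shape)  # (100, 2)
```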
Non-overlapping pathways chosen for the study
| Pathway | Description | #Gene | #SNP |
|---|---|---|---|
| F _ | Obesity and obesity-related phenotypes | 48 | 857 |
| F _ | DNA repair | 88 | 610 |
| F _ | Circadian Rhythm | 23 | 559 |
| F _ | Xenobiotics metabolism | 68 | 531 |
| F _ | Precocious or delayed puberty | 16 | 329 |
| F _ | Cell cycle | 19 | 249 |
| F _ | Nicotinate and nicotinamide metabolism | 23 | 229 |
| F _ | Inflammatory response | 26 | 182 |
| F _ | Other glycan degradation | 15 | 111 |
| F _ | Folate metabolism | 5 | 50 |
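Because these pathways do not overlap, each SNP belongs to exactly one pathway, so the pathway structure can be stored as a single group label per SNP. The sketch below builds that label vector from the #SNP column, assuming (hypothetically) that SNPs are ordered pathway by pathway; the 3707 total is simply the sum of that column.

```python
import numpy as np

# #SNP column of the pathway table, in table order.
snps_per_pathway = {
    "Obesity and obesity-related phenotypes": 857,
    "DNA repair": 610,
    "Circadian Rhythm": 559,
    "Xenobiotics metabolism": 531,
    "Precocious or delayed puberty": 329,
    "Cell cycle": 249,
    "Nicotinate and nicotinamide metabolism": 229,
    "Inflammatory response": 182,
    "Other glycan degradation": 111,
    "Folate metabolism": 50,
}


def pathway_group_index(counts):
    """Return one group label per SNP, assuming SNPs are stored pathway by pathway.

    Non-overlapping pathways form a partition, so each SNP gets exactly one label.
    """
    sizes = np.fromiter(counts.values(), dtype=int)
    return np.repeat(np.arange(len(sizes)), sizes)


groups = pathway_group_index(snps_per_pathway)
print(groups.size)  # 3707
```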
Variable selection performance averaged across 100 simulated datasets, with standard deviations in brackets
| Scenario | Method | Individual MCC | Individual TPR | Individual TNR | Group MCC | Group TPR | Group TNR |
|---|---|---|---|---|---|---|---|
| Sim 1 | SMT | 0.29 (0.36) | 0.22 (0.27) | 1.00 (0.00) | 0.41 (0.49) | 0.41 (0.49) | 1.00 (0.00) |
| | GMT | 0.10 (0.14) | 0.36 (0.48) | 0.91 (0.11) | 0.35 (0.49) | 0.36 (0.48) | 0.99 (0.05) |
| | SGMT | 0.47 (0.39) | 0.41 (0.37) | 1.00 (0.00) | 0.60 (0.49) | 0.63 (0.49) | 0.97 (0.09) |
| | ASSET | 0.21 (0.34) | 0.16 (0.26) | 1.00 (0.00) | 0.28 (0.45) | 0.28 (0.45) | 1.00 (0.00) |
| | GPA | 0.03 (0.12) | 0.09 (0.18) | 0.93 (0.17) | 0.04 (0.18) | 0.21 (0.41) | 0.83 (0.37) |
| | SGST | 0.09 (0.21) | 0.05 (0.12) | 1.00 (0.00) | 0.17 (0.38) | 0.17 (0.38) | 1.00 (0.03) |
| Sim 2 | SMT | 0.55 (0.15) | 0.34 (0.16) | 1.00 (0.00) | 0.88 (0.20) | 0.84 (0.25) | 0.99 (0.03) |
| | GMT | 0.34 (0.08) | 0.80 (0.27) | 0.83 (0.06) | 0.85 (0.21) | 0.80 (0.27) | 1.00 (0.02) |
| | SGMT | 0.72 (0.14) | 0.56 (0.20) | 1.00 (0.00) | 0.95 (0.11) | 0.95 (0.15) | 0.99 (0.04) |
| | ASSET | 0.46 (0.19) | 0.26 (0.15) | 1.00 (0.00) | 0.74 (0.30) | 0.70 (0.33) | 0.99 (0.04) |
| | GPA | 0.22 (0.20) | 0.11 (0.12) | 0.99 (0.07) | 0.43 (0.39) | 0.39 (0.37) | 0.98 (0.14) |
| | SGST | 0.01 (0.05) | 0.00 (0.02) | 1.00 (0.00) | 0.03 (0.13) | 0.02 (0.10) | 1.00 (0.00) |
| Sim 3 | SMT | 0.46 (0.08) | 0.24 (0.08) | 1.00 (0.00) | 0.91 (0.12) | 0.89 (0.16) | 0.99 (0.03) |
| | GMT | 0.57 (0.03) | 0.98 (0.06) | 0.84 (0.01) | 0.99 (0.05) | 0.98 (0.06) | 1.00 (0.01) |
| | SGMT | 0.73 (0.09) | 0.59 (0.13) | 1.00 (0.00) | 0.97 (0.07) | 0.99 (0.04) | 0.99 (0.03) |
| | ASSET | 0.33 (0.11) | 0.14 (0.08) | 1.00 (0.00) | 0.77 (0.20) | 0.71 (0.26) | 0.99 (0.04) |
| | GPA | 0.23 (0.09) | 0.07 (0.04) | 1.00 (0.00) | 0.61 (0.23) | 0.49 (0.26) | 1.00 (0.01) |
| | SGST | 0.00 (0.00) | 0.00 (0.00) | 1.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) | 1.00 (0.00) |
| Sim 4 | SMT | 0.21 (0.05) | 0.06 (0.02) | 1.00 (0.00) | 0.70 (0.15) | 0.61 (0.17) | 0.99 (0.02) |
| | GMT | 0.86 (0.02) | 1.00 (0.03) | 0.94 (0.00) | 1.00 (0.02) | 1.00 (0.03) | 1.00 (0.00) |
| | SGMT | 0.56 (0.04) | 0.38 (0.04) | 1.00 (0.00) | 0.99 (0.02) | 1.00 (0.02) | 1.00 (0.01) |
| | ASSET | 0.13 (0.08) | 0.03 (0.03) | 1.00 (0.00) | 0.46 (0.25) | 0.34 (0.24) | 0.99 (0.02) |
| | GPA | 0.11 (0.05) | 0.02 (0.01) | 1.00 (0.00) | 0.43 (0.21) | 0.28 (0.18) | 1.00 (0.01) |
| | SGST | 0.00 (0.00) | 0.00 (0.00) | 1.00 (0.00) | 0.00 (0.00) | 0.00 (0.00) | 1.00 (0.00) |
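The Individual columns measure recovery of variable-level (pleiotropic) effects; the Group columns measure recovery of group-level effects. MCC: Matthews correlation coefficient; TPR: true positive rate; TNR: true negative rate.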
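The MCC, TPR and TNR columns can be computed from the true and the selected supports. This is a minimal sketch: treating a variable as truly active when any of its K study-specific coefficients is non-zero, counting a group as selected when any of its variables is selected, and returning 0 when a denominator vanishes are assumptions, not details stated in the source.

```python
import numpy as np


def selection_metrics(true_support, selected):
    """Return (MCC, TPR, TNR) for boolean arrays of true and selected non-zeros."""
    t = np.asarray(true_support, dtype=bool)
    s = np.asarray(selected, dtype=bool)
    tp = np.sum(t & s)
    tn = np.sum(~t & ~s)
    fp = np.sum(~t & s)
    fn = np.sum(t & ~s)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    # Convert to float before multiplying to avoid integer overflow.
    denom = np.sqrt(float(tp + fp) * float(tp + fn) * float(tn + fp) * float(tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0  # assumed convention
    return float(mcc), float(tpr), float(tnr)


def group_support(support, groups):
    """Assumption: a group counts as selected if any of its variables is selected."""
    support = np.asarray(support, dtype=bool)
    groups = np.asarray(groups)
    return np.array([support[groups == g].any() for g in np.unique(groups)])


# For a (p, K) matrix B, a variable-level support could be np.any(B != 0, axis=1).
print(selection_metrics([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))  # (0.25, 0.5, 0.75)
```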
Average L1 and L2 reconstruction error of each method over 100 simulated datasets, with standard deviations in brackets
| Method | L1 Sim 1 | L1 Sim 2 | L1 Sim 3 | L1 Sim 4 | L2 Sim 1 | L2 Sim 2 | L2 Sim 3 | L2 Sim 4 |
|---|---|---|---|---|---|---|---|---|
| SMT | 2.76 (0.94) | 9.62 (1.16) | 41.95 (2.33) | 193.14 (2.64) | 1.20 (0.27) | 2.22 (0.29) | 4.95 (0.29) | 11.80 (0.18) |
| GMT | 3.67 (0.64) | 13.60 (0.58) | 45.68 (1.35) | 174.58 (3.04) | 1.52 (0.11) | 2.67 (0.18) | 5.01 (0.21) | 10.69 (0.21) |
| SGMT | 3.66 (2.05) | 10.05 (1.57) | 39.63 (2.30) | 176.60 (3.64) | 1.21 (0.31) | 2.09 (0.25) | 4.52 (0.27) | 10.85 (0.24) |
| ASSET | 2.88 (0.55) | 10.25 (1.48) | 47.36 (1.99) | 202.60 (1.87) | 1.47 (0.23) | 2.79 (0.26) | 6.04 (0.19) | 12.69 (0.10) |
| GPA | 6.34 (7.45) | 12.33 (3.67) | 49.04 (1.27) | 203.32 (1.13) | 1.87 (0.64) | 3.06 (0.19) | 6.21 (0.11) | 12.73 (0.06) |
For the penalised methods, the estimated coefficients are those obtained with tuning parameters chosen by cross-validation. For ASSET and GPA, the estimated coefficients are set from the summary statistics of the active variables, where a variable is considered active if its FDR-corrected p-value is below 0.05.
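The two error columns and the ASSET/GPA estimates described above can be sketched as follows. Assumptions, not stated in the source: L2 is the Euclidean norm of the error (not its square), the FDR correction is the Benjamini-Hochberg procedure, and inactive variables get a coefficient of zero. The helper names are hypothetical.

```python
import numpy as np


def reconstruction_errors(B_hat, B_true):
    """L1 norm and (assumed unsquared) L2 norm of the coefficient error."""
    diff = np.asarray(B_hat, dtype=float) - np.asarray(B_true, dtype=float)
    return float(np.abs(diff).sum()), float(np.sqrt((diff ** 2).sum()))


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed FDR correction)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def active_estimates(beta, pvalues, level=0.05):
    """Keep a summary-statistic estimate only where the adjusted p-value < level."""
    beta = np.asarray(beta, dtype=float)
    return np.where(bh_adjust(pvalues) < level, beta, 0.0)


p = [0.01, 0.04, 0.03, 0.20]
beta_hat = active_estimates([0.3, -0.2, 0.1, 0.05], p)  # only the first stays active
print(reconstruction_errors(beta_hat, [0.5, -0.2, 0.0, 0.0]))
```

If the statsmodels package is available, its `multipletests(pvals, method="fdr_bh")` gives the same Benjamini-Hochberg adjustment.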
Pleiotropic SNPs selected by the different approaches. For each method, we report whether the SNP effect was found in the same direction in the two studies (+), in opposite directions (-), or was not selected (ns).
| SNP | Chr | Pos (kbp) | EA | BA | DE: BC | DE: TC | SGMT: Gene | SGMT: Pathway |
|---|---|---|---|---|---|---|---|---|
| rs1342862 * | 1 | 72,657 | G | A | − | − | NEGR1 | F _ |
| rs17483835 * | 1 | 183,297 | A | G | − | − | | F _ |
| rs17332991 * | 5 | 60,179 | A | C | − | − | ERCC8 | F _ |
| rs6151640 * | 5 | 79,967 | G | C | − | − | | F _ |
| rs249634 * | 5 | 80,164 | G | A | + | − | | F _ |
| rs4978820 * | 9 | 110,057 | A | G | − | − | | F _ |
| rs4255624 | 12 | 24,960 | G | A | − | − | | F _ |
| rs878156 * | 14 | 20,824 | G | A | − | + | PARP2 | F _ |
| rs1482057 * | 15 | 61,064 | A | C | − | + | RORA ** | F _ |
| rs12150110 | 17 | 11,962 | A | G | + | + | | F _ |
| rs3087592 | 22 | 41,079 | A | G | + | − | | F _ |
Chr: chromosome; EA: effect allele; BA: baseline allele; DE: direction of effects; BC: breast cancer; TC: thyroid cancer; * SNP selected by SMT; ** Gene selected by GMT
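The +/-/ns coding in the caption can be derived from the two study-specific estimates (breast cancer and thyroid cancer). A minimal sketch, assuming that a SNP counts as "not selected" whenever either estimate is zero (within a hypothetical tolerance `tol`):

```python
def direction_of_effect(beta_bc, beta_tc, tol=0.0):
    """Return '+' if both effects share a sign, '-' if they differ, 'ns' otherwise.

    Assumption: the SNP is treated as not selected when either
    study-specific estimate is zero (|beta| <= tol).
    """
    if abs(beta_bc) <= tol or abs(beta_tc) <= tol:
        return "ns"
    return "+" if (beta_bc > 0) == (beta_tc > 0) else "-"


print(direction_of_effect(0.12, -0.08))  # -
```

Under this coding, a row with a positive BC effect and a negative TC effect, such as rs249634, would be reported as "-".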
Fig. 2 The first 15 SNPs selected in the bootstrap analysis with genes as the group structure, ordered by frequency of appearance. The names of the corresponding genes are shown. The 8 signals selected in the analysis of the real datasets are shown in green.
Fig. 3 The first 15 SNPs selected in the bootstrap analysis with pathways as the group structure, ordered by frequency of appearance. The names of the corresponding pathways are shown. The 13 signals selected in the analysis of the real datasets are shown in green.
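The ranking in Figs. 2 and 3 can be reproduced in outline by counting how often each SNP is selected across bootstrap replicates. This is a sketch: the input format (one collection of selected SNP IDs per replicate), the function name and the placeholder IDs are hypothetical, and ties are broken by first appearance.

```python
from collections import Counter


def top_selected(bootstrap_selections, k=15):
    """Return the k most frequently selected SNPs as (snp, frequency) pairs.

    Frequency is the fraction of bootstrap replicates in which the SNP was
    selected; set() guards against a SNP being listed twice in one replicate.
    """
    n = len(bootstrap_selections)
    counts = Counter(snp for selected in bootstrap_selections for snp in set(selected))
    return [(snp, c / n) for snp, c in counts.most_common(k)]


reps = [["snpA", "snpB"], ["snpA"], ["snpA", "snpC"], ["snpB"]]
print(top_selected(reps, k=2))  # [('snpA', 0.75), ('snpB', 0.5)]
```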