Anne-Sophie Stelzer, Livia Maccioni, Aslihan Gerhold-Ay, Karin E Smedby, Martin Schumacher, Alexandra Nieters, Harald Binder.
Abstract
BACKGROUND: Increasingly, molecular measurements from multiple studies are pooled to identify risk scores, with only partial overlap of measurements available from different studies. Univariate analyses of such markers have routinely been performed in such settings using meta-analysis techniques in genome-wide association studies for identifying genetic risk scores. In contrast, multivariable techniques such as regularized regression, which might potentially be more powerful, are hampered by only partial overlap of available markers even when the pooling of individual level data is feasible for analysis. This cannot easily be addressed at a preprocessing level, as quality criteria in the different studies may result in differential availability of markers, even after imputation.
Keywords: Consortium; Multivariable model; Partial overlap; Regularized regression; Single nucleotide polymorphism
Year: 2019 PMID: 31324155 PMCID: PMC6642584 DOI: 10.1186/s12881-019-0849-0
Source DB: PubMed Journal: BMC Med Genet ISSN: 1471-2350 Impact factor: 2.103
Fig. 1 Scenarios appearing in the analysis of consortial data based on two studies after imputation. a. Illustration of SNP data for all individuals in a study. Every row contains all SNP data for one individual, and each column represents the data for one SNP across all individuals. b. A perfect world: Both studies cover the same SNPs for all individuals (full). c. Reality: Differential coverage of SNPs in both studies. The SNPs in study B are a proper subset of the SNPs in study A. An ideal analysis can use all applicable information (indicated by red for reduced). d. Reality: Differential coverage of SNPs in both studies as in Fig. 1c. In a complete case analysis, all information from study B is dropped (indicated by part for partial)
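The scenarios in Fig. 1 can be illustrated with a minimal sketch. The toy genotype matrices, SNP names, and the `pool` helper below are hypothetical and only mimic the situation in which the SNPs of study B are a subset of those of study A; they are not the study data or the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: study A covers 5 SNPs, study B only the first 3.
snps_a = ["snp1", "snp2", "snp3", "snp4", "snp5"]
snps_b = ["snp1", "snp2", "snp3"]
geno_a = rng.integers(0, 3, size=(6, len(snps_a))).astype(float)  # 6 individuals
geno_b = rng.integers(0, 3, size=(4, len(snps_b))).astype(float)  # 4 individuals


def pool(studies, all_snps):
    """Stack studies row-wise; SNPs a study does not cover become NaN."""
    blocks = []
    for snps, geno in studies:
        block = np.full((geno.shape[0], len(all_snps)), np.nan)
        for j, name in enumerate(snps):
            block[:, all_snps.index(name)] = geno[:, j]
        blocks.append(block)
    return np.vstack(blocks)


pooled = pool([(snps_a, geno_a), (snps_b, geno_b)], snps_a)

# Complete case analysis (Fig. 1d): keep only individuals with every SNP
# observed, which discards every individual from study B.
complete_rows = ~np.isnan(pooled).any(axis=1)
complete_case = pooled[complete_rows]

# Observed values per SNP that an ideal analysis (Fig. 1c) could draw on.
usable_per_snp = (~np.isnan(pooled)).sum(axis=0)
```

In this toy setting the complete case data contain only the study A individuals, whereas the per-SNP counts show that the shared SNPs are observed in both studies.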
Top 10 SNPs according to IFs for the full data analysis resembling the “truth” (IF_full) in decreasing order
| SNP | Present in both studies | IF | IF | IF |
|---|---|---|---|---|
| rs7039441 | ✓ | 0.68 |  | 0.05 |
| rs1323398 | ✓ | 0.55 |  | 0.02 |
| rs3793482 | ✘ | 0.44 |  | 0.02 |
| rs1048251 | ✓ | 0.38 |  | 0.09 |
| rs10965030 | ✘ | 0.28 |  | 0.07 |
| rs10491695 | ✘ | 0.25 |  | 0.26 |
| rs3750417 | ✘ | 0.22 |  | 0.06 |
| rs7846927 | ✓ | 0.21 |  | 0.05 |
| rs6477107 | ✓ | 0.19 |  | 0.02 |
| rs12684584 | ✘ | 0.19 |  | 0.34 |
✓ SNP present in both studies
✘ SNP present in the large study (study A) but not in the small study (study B)
We additionally report the respective IFs from the reduced (IF_red) and partial analysis (IF_part). Numbers are marked in bold if the IF of the (reduced or partial) analysis is smaller than that of the full analysis (IF_full)
Overlap of top 100 selected SNPs by the lasso and synthesis regression
| SNP | Rank (lasso) | Rank (boosting) | IF (synthesis regression) |
|---|---|---|---|
| rs894243 | 12 | 5 | 0.14 |
| rs80159021 | 21 | 1 | 0.00 |
| rs7041984 | 25 | 9 | 0.00 |
| rs7039441 | 32 | 40 | 0.68 |
| rs7020755 | 60 | 4 | 0.00 |
| rs6475560 | 71 | 30 | 0.00 |
The SNPs are ordered by increasing position in the selection sequence obtained when applying the lasso with different values of λ (lasso rank). The boosting rank gives each SNP’s rank according to the inclusion frequencies returned by boosting. IF shows the inclusion frequencies when applying synthesis regression to the original study A data including missing values
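As a rough illustration of what an inclusion frequency (IF) measures, the following sketch counts how often each marker is selected across random subsamples. It deliberately swaps in a simple stand-in selector (the `top_k` markers with the largest absolute marginal correlation) instead of the lasso, boosting, or synthesis regression used in the source; the function name, its parameters, and the simulated data are all assumptions made for this example.

```python
import numpy as np


def inclusion_frequencies(X, y, n_resamples=100, top_k=2, seed=0):
    """Fraction of subsamples in which each column of X is selected.

    Selection rule (a stand-in, not the source's method): the top_k
    columns with the largest absolute correlation with y.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_resamples):
        idx = rng.choice(n, size=n // 2, replace=False)  # subsample half
        Xc = X[idx] - X[idx].mean(axis=0)
        yc = y[idx] - y[idx].mean()
        denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
        denom[denom == 0] = np.inf  # constant columns get a score of zero
        score = np.abs(Xc.T @ yc) / denom
        selected = np.argsort(score)[-top_k:]
        counts[selected] += 1
    return counts / n_resamples


# Simulated genotypes (0/1/2) with two truly associated markers.
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=(200, 10)).astype(float)
y = 1.0 * X[:, 0] + 0.8 * X[:, 3] + rng.normal(scale=0.5, size=200)
freq = inclusion_frequencies(X, y)
```

Markers that are selected in most subsamples receive an IF close to 1, while markers selected only occasionally receive an IF close to 0, which is how the IF columns in the tables can be read.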
Top 10 SNPs according to IFs for the combined data analysis (IF) in decreasing order
| SNP | Present in both studies | IF | IF | IF |
|---|---|---|---|---|
| rs2274095 | ✘ | 0.52 | - | 0.42 |
| rs722628 | ✓ | 0.48 |  | 0.55 |
| rs7022345 | ✓ | 0.44 |  | 0.02 |
| rs1323398 | ✓ | 0.41 |  | 0.13 |
| rs2792232 | ✓ | 0.39 |  | 0.20 |
| rs1886261 | ✘ | 0.35 | - | 0.20 |
| rs10974947 | ✓ | 0.34 |  | 0.06 |
| rs4742308 | ✓ | 0.34 |  | 0.31 |
| rs4742247 | ✓ | 0.30 |  | 0.90 |
| rs7018851 | ✓ | 0.29 |  | 0.63 |
✓ SNP present in both studies
✘ SNP present in study A but not in study B
We additionally report the respective IFs from the analysis of study A (IF) and study B (IF). Numbers are marked in bold if the IF of the analysis (of study A or study B) is smaller than that of the combined analysis (IF
Fig. 2 This illustration shows how combining information from studies A and B changes the inclusion frequency (IF) compared with the IFs in each single study