André Lacour, Vitalia Schüller, Dmitriy Drichel, Christine Herold, Frank Jessen, Markus Leber, Wolfgang Maier, Markus M Noethen, Alfredo Ramirez, Tatsiana Vaitsiakhovich, Tim Becker.
Abstract
BACKGROUND: A frequently encountered problem in association studies is population stratification. In this work, we propose a novel framework for population matching in genome-wide and sequencing association studies. We employ pairwise and groupwise optimal case-control matchings and present an agglomerative hierarchical clustering, both based on a genetic similarity score matrix. To ensure that the matches produced by the matching algorithm correctly capture the population structure, we propose and discuss two stratum validation methods. We also introduce an extension of the Cochran-Armitage trend test that explicitly takes the population structure into account.
Year: 2015 PMID: 25880419 PMCID: PMC4367953 DOI: 10.1186/s12859-015-0521-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Genotypic contingency table for risk/reference allele /
| | | | | |
|---|---|---|---|---|
| Controls | | | | |
| Cases | | | | |
| Sum | | | | |
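As an illustration of how a genotypic contingency table of this shape feeds into a trend test, the following is a minimal sketch of the standard (unmodified) Cochran-Armitage trend test. It is not the authors' MCAT extension. The function name, the additive weights (0, 1, 2) and the ordering of genotype counts by number of risk alleles are assumptions made for this example.

```python
import math


def cochran_armitage_trend(controls, cases, weights=(0, 1, 2)):
    """Standard Cochran-Armitage trend test on a 2 x 3 genotype table.

    controls, cases: genotype counts ordered by number of risk alleles
    (0, 1, 2); this ordering and the additive weights are assumptions.
    Returns (chi2, p), with chi2 referred to a chi-square distribution
    with one degree of freedom.
    """
    n = [s + r for s, r in zip(controls, cases)]      # column sums
    R, S = sum(cases), sum(controls)                  # row sums
    N = R + S                                         # grand total
    tr = sum(t * r for t, r in zip(weights, cases))   # sum of t_i * r_i
    tn = sum(t * c for t, c in zip(weights, n))       # sum of t_i * n_i
    ttn = sum(t * t * c for t, c in zip(weights, n))  # sum of t_i^2 * n_i
    T = N * tr - R * tn                               # trend statistic
    denom = R * S * (N * ttn - tn * tn)               # N * Var(T)
    if denom == 0:
        return 0.0, 1.0                               # degenerate table
    chi2 = N * T * T / denom
    p = math.erfc(math.sqrt(chi2 / 2.0))              # 1-df chi-square tail
    return chi2, p


chi2, p = cochran_armitage_trend(controls=(30, 20, 10), cases=(10, 20, 30))
```

For the toy counts above the statistic evaluates to 20; identical genotype distributions in cases and controls give a statistic of 0 and a P-value of 1.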
Distribution of the 1,845 individuals (967 controls, 878 cases) from 14 distinct ancestries
| | | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| n_controls | 85 | 96 | 66 | 30 | 88 | 36 | 130 | 66 | 45 | 31 | 92 | 120 | 9 | 73 |
| n_cases | 42 | 48 | 66 | 60 | 44 | 73 | 65 | 133 | 88 | 61 | 46 | 60 | 18 | 74 |
H0-simulation: inflation factor and false-positive rates
| Method | Test | Structure | Validation | N | λ (mean) | λ (SE) | f (mean) | f (SE) |
|---|---|---|---|---|---|---|---|---|
| CAT | AT | – | – | 1845 | 1.990 | 0.013 | 0.167 | 0.001 |
| CAT | RSU | All | – | 1845 | 1.900 | 0.013 | 0.163 | 0.002 |
| CAT | RSU | Pairs | Cluster | 1322 (661p) | 0.853 | 0.008 | 0.044 | 0.001 |
| CAT | RSU | Pairs | Vicinity | 1254 (627p) | 0.846 | 0.007 | 0.044 | 0.001 |
| CAT | RSU | Groups | Cluster | 1845 (661g) | 0.921 | 0.006 | 0.047 | 0.001 |
| CAT | RSU | Groups | Vicinity | 1845 (627g) | 0.921 | 0.009 | 0.046 | 0.001 |
| CAT | RSU | Clusters | – | 1845 (14c) | 0.918 | 0.006 | 0.046 | 0.001 |
| MCAT(2) | RSU | Pairs | Cluster | 1322 (661p) | 0.832 | 0.008 | 0.044 | 0.001 |
| MCAT(2) | RSU | Pairs | Vicinity | 1254 (627p) | 0.828 | 0.010 | 0.043 | 0.001 |
| MCAT(2) | RSU | Groups | Cluster | 1845 (661g) | 1.005 | 0.011 | 0.050 | 0.001 |
| MCAT(2) | RSU | Groups | Vicinity | 1845 (627g) | 1.001 | 0.011 | 0.050 | 0.001 |
| MCAT(2) | RSU | Clusters | – | 1845 (14c) | 1.004 | 0.009 | 0.050 | 0.001 |
| MCAT(1) | AT | Pairs | Cluster | 1322 (661p) | 1.007 | 0.009 | 0.050 | 0.001 |
| MCAT(1) | AT | Pairs | Vicinity | 1254 (627p) | 1.007 | 0.011 | 0.050 | 0.001 |
| MCAT(1) | AT | Groups | Cluster | 1845 (627g) | 1.007 | 0.012 | 0.051 | 0.001 |
| MCAT(1) | AT | Groups | Vicinity | 1845 (627g) | 1.008 | 0.010 | 0.050 | 0.001 |
| MCAT(1) | AT | Clusters | – | 1845 (14c) | 1.000 | 0.007 | 0.050 | 0.001 |
| LRmds | LRT | 7 PCs | – | 1845 | 1.201 | 0.019 | 0.078 | 0.001 |
| LRmds | LRT | 14 PCs | – | 1845 | 1.006 | 0.009 | 0.051 | 0.001 |
| LRmds | LRT | 28 PCs | – | 1845 | 1.017 | 0.009 | 0.052 | 0.001 |
Values are means and standard errors of the inflation factor λ and the false-positive rate f over ten iterations with 1845 individuals and ∼44,900 SNPs. The nominal error level is α=0.05. Abbreviations in the second column: AT, asymptotic test; RSU, resampling simulation within units (99,999 cycles); LRT, likelihood ratio test. Column N gives the number of individuals included and, in brackets, the number of pairs (p), groups (g) or clusters (c).
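The two summary quantities in this table can be computed along the following lines. This is a sketch under assumptions: the source does not state which estimator of λ was used, so the common genomic-control definition (median observed 1-df chi-square statistic divided by the theoretical median) is assumed here, and the function names are hypothetical.

```python
from statistics import NormalDist, median

# Theoretical median of a chi-square distribution with 1 df (about 0.455):
# P(Z^2 <= m) = 0.5 is equivalent to Phi(sqrt(m)) = 0.75.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def inflation_factor(chi2_values):
    """Assumed estimator: median observed statistic over theoretical median."""
    return median(chi2_values) / CHI2_1DF_MEDIAN


def false_positive_rate(p_values, alpha=0.05):
    """Fraction of null SNPs whose P-value falls at or below alpha."""
    return sum(p <= alpha for p in p_values) / len(p_values)
```

Under a well-calibrated test on null SNPs, `inflation_factor` should be close to 1 and `false_positive_rate` close to the nominal level α.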
Figure 1. ROC curve. Power simulation with 11,010 SNPs and 1845 individuals. P-values are corrected via genomic control using the corresponding inflation factor from the simulation under H0. The abscissas are on a logarithmic scale. The upper plot compares MCAT(2) with the principal components approach. The lower plot shows the asymptotic test MCAT(1). The subscripts in the legend denote the structures employed: clusters (c), groups (pairs) with cluster validation gc (pc), and groups (pairs) with vicinity validation gv (pv).
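The genomic-control correction mentioned in the caption can be sketched as follows, assuming each P-value stems from a 1-df chi-square statistic. The function name is hypothetical. Dividing by λ even when λ < 1 is also an assumption: it follows the caption's wording, whereas conventional genomic control often leaves P-values unchanged in that case.

```python
import math
from statistics import NormalDist


def gc_correct(p, lam):
    """Genomic-control correction of a P-value from a 1-df chi-square test.

    The P-value is mapped back to its chi-square statistic, the statistic
    is divided by the inflation factor lam, and the tail probability is
    recomputed.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if lam <= 0.0:
        raise ValueError("lam must be positive")
    z = NormalDist().inv_cdf(p / 2.0)   # lower-tail quantile, z <= 0
    chi2 = z * z                        # implied 1-df statistic
    return math.erfc(math.sqrt(chi2 / lam / 2.0))
```

With λ = 1 the P-value is returned unchanged up to rounding; λ > 1 makes it larger and λ < 1 makes it smaller.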
Power simulation: power vs. selected nominal levels for all strategies
| Method | Test | Structure | Validation | | | |
|---|---|---|---|---|---|---|
| CAT | AT | – | – | 0.665 | 0.450 | 0.271 |
| CAT | RSU | Pairs | Cluster | 0.881 | 0.782 | 0.662 |
| CAT | RSU | Pairs | Vicinity | 0.846 | 0.710 | 0.558 |
| CAT | RSU | Groups | Cluster | 0.809 | 0.705 | 0.594 |
| CAT | RSU | Groups | Vicinity | 0.833 | 0.729 | 0.607 |
| CAT | RSU | Clusters | – | 0.808 | 0.704 | 0.597 |
| MCAT(2) | RSU | Pairs | Cluster | 0.884 | 0.783 | 0.658 |
| MCAT(2) | RSU | Pairs | Vicinity | 0.823 | 0.705 | 0.549 |
| MCAT(2) | RSU | Groups | Cluster | 0.907 | 0.808 | 0.685 |
| MCAT(2) | RSU | Groups | Vicinity | 0.872 | 0.745 | 0.591 |
| MCAT(2) | RSU | Clusters | – | 0.894 | 0.782 | 0.641 |
| MCAT(1) | AT | Pairs | Cluster | 0.862 | 0.733 | 0.579 |
| MCAT(1) | AT | Pairs | Vicinity | 0.813 | 0.641 | 0.454 |
| MCAT(1) | AT | Groups | Cluster | 0.903 | 0.797 | 0.671 |
| MCAT(1) | AT | Groups | Vicinity | 0.878 | 0.751 | 0.603 |
| MCAT(1) | AT | Clusters | – | 0.898 | 0.786 | 0.650 |
| LRmds | LRT | 7 PCs | – | 0.871 | 0.741 | 0.588 |
| LRmds | LRT | 14 PCs | – | 0.908 | 0.805 | 0.678 |
| LRmds | LRT | 28 PCs | – | 0.901 | 0.798 | 0.664 |
11,010 SNPs, 1845 individuals, nominal error level α and power 1−β. 10,000 independent SNPs were used to obtain structure information. The abbreviations in the second column are: AT asymptotic test, RSU resampling simulation within units (99,995 cycles), LRT likelihood ratio test. Column N shows the number of individuals included and in brackets the number of pairs p, groups g and clusters c.
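The caption names "resampling simulation within units" without giving details. One plausible reading is a permutation test in which case-control labels are shuffled only inside each unit (pair, group or cluster), so that the population structure is preserved under the null. The sketch below follows that reading. The function names, the toy statistic and the (hits + 1) / (cycles + 1) P-value estimate are assumptions, not the authors' implementation.

```python
import random


def permute_within_units(labels, units, rng):
    """Shuffle case/control labels separately inside each unit."""
    by_unit = {}
    for i, u in enumerate(units):
        by_unit.setdefault(u, []).append(i)
    permuted = list(labels)
    for idx in by_unit.values():
        shuffled = [labels[i] for i in idx]
        rng.shuffle(shuffled)
        for i, lab in zip(idx, shuffled):
            permuted[i] = lab
    return permuted


def resampling_p_value(genotypes, labels, units, statistic, cycles=999, seed=0):
    """Empirical P-value from label permutations restricted to units."""
    rng = random.Random(seed)
    observed = statistic(genotypes, labels)
    hits = sum(
        statistic(genotypes, permute_within_units(labels, units, rng)) >= observed
        for _ in range(cycles)
    )
    return (hits + 1) / (cycles + 1)


def allele_count_difference(genotypes, labels):
    """Toy statistic: |risk-allele count in cases - count in controls|."""
    cases = sum(g for g, lab in zip(genotypes, labels) if lab == 1)
    controls = sum(g for g, lab in zip(genotypes, labels) if lab == 0)
    return abs(cases - controls)
```

With this estimator the smallest attainable P-value is 1 / (cycles + 1), so the number of cycles bounds the resolution of the test.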
Comparison of top ranking associations between different stratification methods
| SNP | Position | | | | | |
|---|---|---|---|---|---|---|
| rs13320534 | 3:46171700 | | | | | |
| rs936939 | 3:45986623 | | | | | 1.55e-5 |
| rs9967637 | 19:57250898 | | 6.50e-6 | 1.24e-5 | | 4.96e-5 |
| rs17650960 | 15:27999442 | | 7.93e-5 | 9.67e-5 | 5.33e-5 | 3.81e-5 |
| rs10902222 | 11:810882 | | 2.76e-5 | 1.14e-5 | 1.04e-5 | 4.16e-5 |
| rs1992102 | 3:21280562 | 2.80e-6 | | | | 2.38e-2 |
| rs2962492 | 5:39568609 | 7.54e-6 | | | | |
| rs4673251 | 2:204114244 | 1.70e-5 | 6.90e-6 | 1.13e-5 | 2.03e-5 | 8.57e-2 |
| rs16844699 | 3:103879674 | 4.00e-5 | 6.04e-5 | 6.04e-5 | 3.50e-4 | |
| kgp9470129 | 3:141298124 | 5.30e-5 | 3.70e-5 | 2.67e-5 | 2.20e-5 | |
| rs8073498 | 17:7569698 | 1.32e-4 | 1.66e-3 | 1.08e-3 | 6.90e-4 | |
| rs3094078 | 6:30224970 | 3.16e-3 | | | 3.05e-2 | 1.28e-2 |
The indices of the P-values refer to the type of test: CAT, CAT test without any stratification method; LR-mds13/LR-pca07, logistic regression with 13 MDS/7 PCA covariates; MLMA, mixed linear model association; MCAT-gv, our modified CAT test with group units and vicinity validation.