Yongwen Zhuang, Brooke N Wolford, Kisung Nam, Wenjian Bi, Wei Zhou, Cristen J Willer, Bhramar Mukherjee, Seunggeun Lee.
Abstract
BACKGROUND: In genome-wide association analyses of population-based biobanks, most diseases have low prevalence, which limits detection power. One way to address this is to use family disease history, yet existing methods cannot control the type I error inflation induced by increased phenotypic correlation among closely related samples or by unbalanced phenotypic distributions.
Year: 2022 PMID: 35876838 PMCID: PMC9477535 DOI: 10.1093/bioinformatics/btac459
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.931
Fig. 1. Analytical framework of TAPE. In Step 1, the latent disease risk of each individual is estimated from observed phenotypes and family disease history, using a weighted proportion of the individual's affected close relatives. In Step 2, a null linear mixed model (LMM) is fitted with covariates and two random effects, whose covariance structures are the sparse kinship matrix and the dense genetic relationship matrix (GRM). In Step 3, a score test is performed for each genetic variant, with P-values obtained using an empirical saddlepoint approximation.
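To make Step 1 concrete, the following is a minimal sketch of a kinship-weighted proportion of affected relatives. The exact weighting used by TAPE is not given in this record, so the function name `latent_risk`, the scaling parameter `rho` and the rule that an affected individual is assigned a risk of 1 are all illustrative assumptions rather than the authors' definition.

```python
from typing import List, Tuple


def latent_risk(own_status: int,
                relatives: List[Tuple[int, float]],
                rho: float = 0.5) -> float:
    """Hypothetical latent-risk score for one individual.

    own_status: 1 if the individual is a case, 0 if a control.
    relatives:  (disease_status, kinship_weight) for each close relative.
    rho:        assumed scaling factor for family-history information.
    """
    # Assumption: an observed case keeps the maximum risk value.
    if own_status == 1:
        return 1.0
    total_weight = sum(weight for _, weight in relatives)
    # No informative relatives: the control keeps a risk of 0.
    if total_weight == 0:
        return 0.0
    # Kinship-weighted proportion of affected relatives.
    weighted_prop = sum(status * weight for status, weight in relatives) / total_weight
    return rho * weighted_prop


# A control with one affected and one unaffected first-degree relative.
risk = latent_risk(0, [(1, 0.5), (0, 0.5)])
```

Under these assumptions a control with a family history receives a value strictly between 0 and 1, which is then the phenotype passed to the null model in Step 2.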
Empirical type I error rates for TAPE-WP, TAPE-LTFH, LT-FH and SAIGE, estimated using independent SNPs and a sample size of 10 000
| Case:control | MAF | TAPE-WP | TAPE-LTFH | LT-FH | SAIGE |
|---|---|---|---|---|---|
| 2500 pairs of siblings and 5000 independent individuals | | | | | |
| 1:99 | 0.001 | 4.977e−08 | 1.019e−07 | 5.928e−06 | 4.418e−08 |
| 5:95 | 0.001 | 5.115e−08 | 8.275e−08 | 1.252e−06 | 4.368e−08 |
| 10:90 | 0.001 | 5.476e−08 | 7.452e−08 | 5.489e−07 | 4.641e−08 |
| 1:99 | 0.01 | 5.455e−08 | 1.069e−07 | 1.409e−07 | 3.963e−08 |
| 5:95 | 0.01 | 5.143e−08 | 1.158e−07 | 1.940e−07 | 4.341e−08 |
| 10:90 | 0.01 | 5.459e−08 | 9.086e−08 | 1.141e−07 | 4.980e−08 |
| 1:99 | 0.10 | 5.007e−08 | 1.275e−07 | 1.500e−07 | 3.964e−08 |
| 5:95 | 0.10 | 5.213e−08 | 1.639e−07 | 1.238e−07 | 4.355e−08 |
| 10:90 | 0.10 | 6.416e−08 | 7.782e−08 | 7.232e−08 | 4.650e−08 |
| 625 8-member families and 5000 independent individuals | | | | | |
| 1:99 | 0.001 | 3.329e−08 | 9.028e−08 | 4.446e−06 | 3.832e−08 |
| 5:95 | 0.001 | 3.051e−08 | 6.563e−08 | 8.171e−07 | 4.245e−08 |
| 10:90 | 0.001 | 2.967e−08 | 5.145e−08 | 3.751e−07 | 4.721e−08 |
| 1:99 | 0.01 | 3.742e−08 | 9.792e−08 | 4.818e−07 | 4.547e−08 |
| 5:95 | 0.01 | 3.156e−08 | 7.906e−08 | 1.463e−07 | 4.311e−08 |
| 10:90 | 0.01 | 2.978e−08 | 6.215e−08 | 8.811e−08 | 4.324e−08 |
| 1:99 | 0.10 | 3.113e−08 | 7.730e−08 | 1.000e−07 | 3.895e−08 |
| 5:95 | 0.10 | 3.050e−08 | 7.983e−08 | 6.025e−08 | 4.232e−08 |
| 10:90 | 0.10 | 3.163e−08 | 6.372e−08 | 5.857e−08 | 4.546e−08 |
Note: Two types of population structure were considered: (i) a sample of 2500 pairs of siblings and 5000 independent individuals; and (ii) a sample of 625 8-member families and 5000 independent individuals.
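An empirical type I error rate such as those tabulated above is typically the fraction of null (non-causal) variants whose P-value falls below the chosen significance level. The sketch below shows that calculation; the function name is hypothetical, and the significance level is left as a parameter because it is not stated in this record.

```python
from typing import Sequence


def empirical_type1_error(null_p_values: Sequence[float], alpha: float) -> float:
    """Fraction of null-variant P-values below alpha."""
    if len(null_p_values) == 0:
        raise ValueError("at least one P-value is required")
    # Each rejection of a true null hypothesis counts as a false positive.
    false_positives = sum(1 for p in null_p_values if p < alpha)
    return false_positives / len(null_p_values)


# Toy illustration with four null P-values and alpha = 0.05.
rate = empirical_type1_error([0.01, 0.2, 0.5, 0.03], alpha=0.05)
```

A well-calibrated method yields a rate close to `alpha`; values well above it indicate inflation.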
Fig. 2. Average values of causal variants with N = 10 000 (5000 independent individuals and 2500 pairs of siblings), comparing TAPE-WP, TAPE-LTFH, LT-FH and SAIGE. For each dataset, 100 000 independent variants were simulated and 1% of the variants were selected as causal, with four different effect sizes. A total of 100 datasets were generated to calculate the averages. All variants had an MAF of 0.1.
Summary of 10 traits in UKB
| Trait | Phecode | Case:control | Parental prevalence |
|---|---|---|---|
| Parkinson’s disease | 332 | 1:360 | 0.0186 |
| Dementias | 290.1 | 1:406 | 0.0609 |
| Lung cancer | 165.1 | 1:181 | 0.0604 |
| Depression | 296.2 | 1:33 | 0.0462 |
| Type II diabetes | 250.2 | 1:20 | 0.0845 |
| Hypertension | 401 | 1:4 | 0.2388 |
| Chronic bronchitis | 496.2 | 1:136 | 0.0785 |
| Colorectal cancer | 153 | 1:87 | 0.0499 |
| Ischemic heart disease | 411 | 1:11 | 0.2373 |
| Cerebral ischemia | 433.3 | 1:138 | 0.1348 |
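The case:control ratios in the table can be converted to sample case fractions for direct comparison with the parental prevalence column. The helper below is a small illustrative sketch; its name is an assumption, not part of any tool mentioned here.

```python
def case_fraction(ratio: str) -> float:
    """Convert a 'cases:controls' ratio string into the fraction of cases."""
    cases, controls = (int(part) for part in ratio.split(":"))
    return cases / (cases + controls)


# Parkinson's disease is listed with a case:control ratio of 1:360.
parkinson_fraction = case_fraction("1:360")   # 1 / 361
# Hypertension is listed with a case:control ratio of 1:4.
hypertension_fraction = case_fraction("1:4")  # 1 / 5
```

For Parkinson's disease this gives roughly 0.0028 among participants, compared with a parental prevalence of 0.0186 in the table.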
Fig. 3. Manhattan plots for the UKB association test results from SAIGE (first row), TAPE-LTFH (second row) and TAPE-WP (third row) among white British participants (N = 408 898). Left: type II diabetes (Phecode 250.2); right: Parkinson’s disease (Phecode 332). Significant clumped variants were identified using a window width of 5 Mb and a linkage disequilibrium threshold of 0.1.
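Clumping with a distance window and a linkage disequilibrium threshold can be sketched as a greedy procedure: take the most significant remaining variant as an index variant, then absorb nearby significant variants in LD with it. The code below is a simplified illustration, not the tool used for the figure. It assumes pairwise r² values are supplied in a dictionary, that the window applies on each side of the index variant, and that the significance threshold is the conventional 5e-8; none of these details is stated in the caption.

```python
from typing import Dict, FrozenSet, List, Tuple

Variant = Tuple[str, int, int, float]  # (id, chromosome, position in bp, P-value)


def clump(variants: List[Variant],
          r2: Dict[FrozenSet[str], float],
          p_threshold: float = 5e-8,      # assumed genome-wide threshold
          window_bp: int = 5_000_000,     # 5 Mb window, assumed per side
          r2_threshold: float = 0.1) -> List[str]:
    """Return the ids of index variants after greedy clumping."""
    # Keep significant variants, most significant first.
    significant = sorted((v for v in variants if v[3] < p_threshold),
                         key=lambda v: v[3])
    assigned = set()
    index_variants = []
    for vid, chrom, pos, _ in significant:
        if vid in assigned:
            continue  # already absorbed into an earlier clump
        index_variants.append(vid)
        assigned.add(vid)
        for oid, ochrom, opos, _ in significant:
            if oid in assigned or ochrom != chrom:
                continue
            in_window = abs(opos - pos) <= window_bp
            # Pairs missing from the r2 table are treated as unlinked.
            in_ld = r2.get(frozenset((vid, oid)), 0.0) > r2_threshold
            if in_window and in_ld:
                assigned.add(oid)
    return index_variants


toy_variants = [("a", 1, 100, 1e-10), ("b", 1, 200, 1e-9),
                ("c", 1, 10_000_000, 1e-9), ("d", 2, 100, 0.5)]
toy_r2 = {frozenset(("a", "b")): 0.8}
index_ids = clump(toy_variants, toy_r2)
```

In the toy data, variant `b` is absorbed by `a` (close and in LD), `c` lies outside the window and forms its own clump, and `d` is not significant.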
Fig. 4. Q–Q plots for the UKB association test results from SAIGE, LT-FH, TAPE-LTFH and TAPE-WP among white British participants (N = 408 898), categorized by MAF. Top: type II diabetes (Phecode 250.2); bottom: Parkinson’s disease (Phecode 332).