Renaud Tissier, Roula Tsonaka, Simon P Mooijaart, Eline Slagboom, Jeanine J Houwing-Duistermaat.
Abstract
The case-control design is often used to test associations between the case-control status and genetic variants. In addition to this primary phenotype, a number of additional traits, known as secondary phenotypes, are routinely recorded, and associations between genetic factors and these secondary traits are typically studied too. Analysing secondary phenotypes in case-control studies may lead to biased genetic effect estimates, especially when the marker tested is associated with the primary phenotype and when the primary and secondary phenotypes tested are correlated. Several methods have been proposed in the literature to overcome this problem, but they are limited to case-control studies and are not directly applicable to more complex designs, such as multiple-cases family studies. A proper secondary phenotype analysis in this setting is complicated by the within-family correlations on top of the biased sampling design. We propose a novel approach that accommodates the ascertainment process while explicitly modelling the familial relationships. Our approach pairs existing methods for mixed-effects models with the retrospective likelihood framework and uses a multivariate probit model to capture the association between the mixed-type primary and secondary phenotypes. To examine the efficiency and bias of the estimates, we performed simulations under several scenarios for the association between the primary phenotype, secondary phenotype and genetic markers. We illustrate the method by analysing the association between triglyceride levels and glucose (secondary phenotypes) and genetic markers from the Leiden Longevity Study, a multiple-cases family study that investigates longevity.
Keywords: ascertainment; family data; genetic association and heritability; mixed models; multivariate probit model
Year: 2017 PMID: 28303589 PMCID: PMC5485037 DOI: 10.1002/sim.7281
Source DB: PubMed Journal: Stat Med ISSN: 0277-6715 Impact factor: 2.373
Figure 1. Directed acyclic graph representing the case where bias is expected when estimating the association between the genetic marker and the secondary phenotype. Arrows represent associations between nodes of the graph. A secondary phenotype analysis investigates whether there is an association between the genetic factor and the secondary phenotype.
Figure 2. Example of a family pedigree from the Leiden Longevity Study. Squares and circles represent men and women, respectively; crossed symbols represent deceased individuals. Long-lived individuals, on whom the ascertainment is based, are shown in black; the cases of the study (offspring of long-lived siblings) are shown in grey; and the controls are shown in white.
Figure 3. Estimates and 95% confidence intervals for the single-nucleotide polymorphism (SNP) effect on the secondary phenotype for the retrospective likelihood approach and the naive method. Results are obtained from 500 simulated datasets of 400 sibships for two ascertainment schedules. The top and bottom panels correspond to a rare and a common primary phenotype, with prevalences of around 1% and 5%, respectively. Results for small (α₁ = 0.1) and large (α₁ = 0.5) effect sizes of the SNP on the primary phenotype are shown in black and red, respectively. The horizontal line corresponds to the true SNP effect on the secondary phenotype.
Heritability results of the simulation studies for a SNP and a polygenic score: estimates with standard deviations and root mean square error (in brackets) for the heritability of the secondary phenotype for a common disease (prevalence ≈ 5%), when sibships with at least one or at least two cases are sampled and for two values of α₁, that is, the SNP or polygenic score effect on the primary phenotype.
| Ascertainment / α₁ | SNP model, retrospective | SNP model, naive | Polygenic score model, retrospective | Polygenic score model, naive |
|---|---|---|---|---|
| 1. At least two cases | | | | |
| 0.10 | 0.48 (0.07) (0.22) | 0.13 (0.07) (0.37) | 0.50 (0.03) (0.13) | 0.14 (0.03) (0.36) |
| 0.50 | 0.48 (0.07) (0.22) | 0.14 (0.07) (0.36) | 0.52 (0.03) (0.12) | 0.15 (0.03) (0.34) |
| 2. At least one case | | | | |
| 0.10 | 0.50 (0.08) (0.17) | 0.25 (0.08) (0.25) | 0.48 (0.04) (0.12) | 0.25 (0.03) (0.24) |
| 0.50 | 0.50 (0.08) (0.17) | 0.27 (0.08) (0.24) | 0.50 (0.04) (0.10) | 0.26 (0.04) (0.23) |
The heritability value is 50% under the generating model. Datasets consist of 400 sibships of size 5. Results are based on 500 replicates. SNP, single‐nucleotide polymorphism.
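The sampling design behind these simulations can be sketched in a few lines of Python. This is a minimal illustration, not the generating model used in the study: the shared family effects, the liability `threshold`, the minor allele frequency `maf`, the correlation `rho`, and the independent draws of sibling genotypes are all simplifying assumptions made here, and the function names are hypothetical.

```python
import random

def simulate_sibship(rng, size=5, maf=0.3, alpha1=0.5, beta1=0.2,
                     rho=0.5, threshold=2.0):
    """Return (genotype, case_status, secondary) tuples for one sibship.

    Assumed toy model: a shared family effect plus an individual residual
    for each phenotype, both correlated across phenotypes with correlation rho.
    """
    # Shared family effects (standard deviation 0.5, i.e. variance 0.25).
    fam1 = rng.gauss(0.0, 0.5)
    fam2 = rho * fam1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 0.5)
    sibs = []
    for _ in range(size):
        # Genotype = number of minor alleles (0, 1 or 2); drawn independently
        # per sibling, which ignores Mendelian transmission (assumption).
        g = sum(rng.random() < maf for _ in range(2))
        e1 = rng.gauss(0.0, 1.0)
        e2 = rho * e1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        liability = alpha1 * g + fam1 + e1      # latent primary phenotype
        secondary = beta1 * g + fam2 + e2       # continuous secondary phenotype
        sibs.append((g, int(liability > threshold), secondary))
    return sibs

def sample_ascertained(n_families, min_cases, seed=1, **kwargs):
    """Keep drawing sibships until n_families have at least min_cases cases."""
    rng = random.Random(seed)
    kept = []
    while len(kept) < n_families:
        sibs = simulate_sibship(rng, **kwargs)
        if sum(case for _, case, _ in sibs) >= min_cases:
            kept.append(sibs)
    return kept

def naive_slope(families):
    """Least-squares slope of the secondary phenotype on genotype,
    ignoring both the ascertainment and the family structure."""
    pairs = [(g, y) for sibs in families for g, _, y in sibs]
    mean_g = sum(g for g, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in pairs)
    sxx = sum((g - mean_g) ** 2 for g, _ in pairs)
    return sxy / sxx

if __name__ == "__main__":
    for k in (1, 2):
        fams = sample_ascertained(400, min_cases=k, seed=2017)
        print(f"at least {k} case(s): naive slope = {naive_slope(fams):.3f}")
```

Comparing `naive_slope` on ascertained sibships with the value of `beta1` used to generate the data gives a rough feel for how conditioning on case status can distort a secondary phenotype analysis; the size of the distortion depends entirely on the assumed parameters.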
Robustness of the retrospective likelihood methods to violation of the probit model assumption for the primary phenotype: estimates of the effect size of the single-nucleotide polymorphism on the secondary phenotype (β₁) and heritability of the secondary phenotype are given for a common disease (prevalence ≈ 5%), for the two ascertainment mechanisms and two values of α₁.
| Ascertainment / α₁ | β₁ | Heritability |
|---|---|---|
| 0. True value | 0.200 | 0.500 |
| 1. At least two cases | | |
| 0.100 | 0.199 (0.104) (0.104) (0.948) | 0.509 (0.017) (0.110) |
| 0.500 | 0.197 (0.106) (0.110) (0.945) | 0.516 (0.014) (0.108) |
| 2. At least one case | | |
| 0.100 | 0.200 (0.104) (0.107) (0.961) | 0.510 (0.012) (0.096) |
| 0.500 | 0.199 (0.107) (0.111) (0.960) | 0.513 (0.010) (0.087) |
In brackets are standard deviations, root mean square error and coverage probability (for the effect size only). Datasets consist of 400 sibships of size 5. Results are based on 500 replicates.
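The summary measures reported in brackets can be computed from a set of replicate estimates as in the sketch below. It assumes Wald-type 95% intervals (estimate ± 1.96 × standard error) for the coverage probability; the source does not state how its intervals were built, and the function name `summarise_replicates` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def summarise_replicates(estimates, std_errors, true_value, z=1.96):
    """Mean, empirical SD, RMSE and interval coverage over simulation replicates.

    Coverage counts how often the assumed Wald interval
    estimate +/- z * std_error contains the true value.
    """
    squared_errors = [(est - true_value) ** 2 for est in estimates]
    covered = [abs(est - true_value) <= z * se
               for est, se in zip(estimates, std_errors)]
    return {
        "mean": mean(estimates),
        "sd": stdev(estimates),              # sample SD across replicates
        "rmse": sqrt(mean(squared_errors)),  # combines bias and variability
        "coverage": sum(covered) / len(covered),
    }

if __name__ == "__main__":
    # Toy numbers for illustration only, not results from the study.
    print(summarise_replicates([0.1, 0.2, 0.3], [0.05, 0.05, 0.05], 0.2))
```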
Empirical type I error rates for testing for association between a genetic marker and a secondary phenotype using the likelihood ratio test for four scenarios.
| Nominal level | Retrospective likelihood | Naive method |
|---|---|---|
| At least two cases | | |
| α₁ = 0.1 | | |
| 0.05 | 0.0509 | 0.0580 |
| 0.01 | 0.0118 | 0.0152 |
| 0.001 | 0.0017 | 0.0025 |
| α₁ = 0.5 | | |
| 0.05 | 0.0505 | 0.0878 |
| 0.01 | 0.0113 | 0.0222 |
| 0.001 | 0.0013 | 0.0043 |
| At least one case | | |
| α₁ = 0.1 | | |
| 0.05 | 0.0524 | 0.0514 |
| 0.01 | 0.0102 | 0.0098 |
| 0.001 | 0.0018 | 0.0014 |
| α₁ = 0.5 | | |
| 0.05 | 0.0522 | 0.0558 |
| 0.01 | 0.0098 | 0.0097 |
| 0.001 | 0.0009 | 0.0016 |
Sibships with at least one or with at least two cases are considered. Two values for the association between the single-nucleotide polymorphism and the primary phenotype, namely, α₁ = 0.1 and α₁ = 0.5, are used. Datasets consist of 400 sibships of size 5. Results are based on 10 000 replicates.
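An empirical type I error rate is the fraction of null replicates whose test rejects at the nominal level. The sketch below computes that fraction from a list of p-values and adds the binomial Monte Carlo standard error as a yardstick for how far an empirical rate may drift from the nominal level by chance. Both function names are hypothetical, and the standard error formula is a standard binomial approximation rather than something taken from the source.

```python
from math import sqrt

def empirical_type1(p_values, level):
    """Fraction of null replicates rejected at the given nominal level."""
    rejections = sum(p < level for p in p_values)
    return rejections / len(p_values)

def monte_carlo_se(level, n_replicates):
    """Binomial standard error of an empirical rejection rate when the
    true rejection probability equals the nominal level."""
    return sqrt(level * (1.0 - level) / n_replicates)

if __name__ == "__main__":
    # Toy p-values for illustration only.
    print(empirical_type1([0.01, 0.20, 0.04, 0.50], level=0.05))
    print(monte_carlo_se(0.05, 10_000))
```

With 10 000 replicates and a nominal level of 0.05, `monte_carlo_se(0.05, 10_000)` is about 0.0022, so an empirical rate of 0.0509 lies well within two Monte Carlo standard errors of the nominal level, whereas 0.0878 lies far outside that range.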
Figure 4. Estimates and 95% confidence intervals for the polygenic score effect on the secondary phenotype for the retrospective likelihood approach and the naive method. Results are obtained from 500 simulated datasets of 400 sibships for two ascertainment schedules. The top and bottom panels correspond to a rare and a common primary phenotype, with prevalences of around 1% and 5%, respectively. Results for small (α₁ = 0.1) and large (α₁ = 0.5) effect sizes of the polygenic score on the primary phenotype are shown in black and red, respectively. The horizontal line corresponds to the true polygenic score effect on the secondary phenotype.