Marianne A Jonker, Johannes A Rijken, Frederik J Hes, Hein Putter, Erik F Hensen.
Abstract
Accurate assessment of the age-dependent disease risk conferred by germline variants in disease susceptibility genes is often hampered by the way the data are collected. Cohort-based data sets frequently contain an overrepresentation of patients (i.e. carriers of the gene variant of interest affected with the associated disease) and an underrepresentation of disease-free carriers. To overcome this problem, penetrance estimates can be based on family-based study designs, through the evaluation of index patients and their family members. This approach facilitates the identification of asymptomatic germline variant carriers. By adjusting for the way these family data are ascertained, an estimate of the penetrance of the pathogenic gene variant can be obtained. However, the family structure is often incomplete or missing. This complicates the estimation of the penetrance, because full adjustment of the likelihood is not possible. We present a conditional likelihood for the estimation of the penetrance of pathogenic gene variants, based on a cohort of multiple families comprising index patients, disease-free and affected non-index carriers, but with missing information on pedigree structure. The proposed estimator corrects for the ascertainment in a robust way and is shown to be more accurate than the frequently used Kaplan-Meier estimator of the penetrance function.
Keywords: Age-at-onset; SDHB; conditional maximum likelihood method; missing data
Year: 2018 PMID: 30073909 PMCID: PMC6745609 DOI: 10.1177/0962280218791338
Source DB: PubMed Journal: Stat Methods Med Res ISSN: 0962-2802 Impact factor: 3.021
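For orientation, the Kaplan-Meier estimator that the abstract uses as its comparator can be sketched in a few lines. This is a minimal illustration of the standard product-limit formula applied to ages at onset or censoring. It is not the conditional likelihood proposed in the paper and it makes no ascertainment adjustment. The function name and the toy data are hypothetical.

```python
from collections import Counter


def kaplan_meier_penetrance(ages, affected):
    """Return [(age, F_hat(age))] at each distinct onset age.

    ages     : age at onset (affected) or age at censoring (disease-free)
    affected : True if the carrier developed the disease at that age
    F_hat is 1 minus the Kaplan-Meier survival estimate.
    """
    onsets = Counter(a for a, d in zip(ages, affected) if d)
    survival = 1.0
    curve = []
    for t in sorted(onsets):
        # Carriers still under observation and disease-free just before age t
        at_risk = sum(1 for a in ages if a >= t)
        survival *= 1.0 - onsets[t] / at_risk
        curve.append((t, 1.0 - survival))
    return curve


# Hypothetical toy data, not taken from the study
ages = [35, 42, 50, 50, 61, 70, 74, 80]
affected = [True, False, True, False, True, False, True, False]
curve = kaplan_meier_penetrance(ages, affected)
for age, f_hat in curve:
    print(age, round(f_hat, 3))
```

Whether index patients are included in `ages` changes the result considerably, which is the ascertainment problem the abstract describes.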
Figure 1. Boxplots for the results of the simulation studies with F equal to the Weibull distribution with shape and scale parameters equal to 2.5 and 90, at ages 50 (left) and 70 (right).
Median of the estimates of F at ages 50 and 70 for the estimators in settings A, B, C and D and for the naive estimator (the Kaplan–Meier estimator as described before), in the three different studies.
| | Study 1 | Study 1 | Study 2 | Study 2 | Study 3 | Study 3 |
|---|---|---|---|---|---|---|
| Age | 50 | 70 | 50 | 70 | 50 | 70 |
| True value | 0.205 | 0.413 | 0.205 | 0.413 | 0.205 | 0.413 |
| A | 0.206 | 0.414 | 0.206 | 0.413 | 0.206 | 0.413 |
| B | 0.209 | 0.421 | 0.209 | 0.420 | 0.214 | 0.429 |
| C | 0.196 | 0.403 | 0.199 | 0.405 | 0.202 | 0.409 |
| D | 0.199 | 0.408 | 0.203 | 0.411 | 0.209 | 0.423 |
| Naive (Kaplan–Meier) | 0.124 | 0.260 | 0.130 | 0.272 | 0.142 | 0.294 |
Note: The first row yields the true values.
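The true values in the first row can be checked against the Weibull distribution named in the Figure 1 caption (shape 2.5, scale 90). The sketch below assumes the usual parameterisation F(t) = 1 - exp(-(t/scale)^shape), which the text does not state explicitly.

```python
import math


def weibull_cdf(age, shape, scale):
    """Weibull distribution function F(age) = 1 - exp(-(age / scale) ** shape)."""
    return 1.0 - math.exp(-((age / scale) ** shape))


# Parameters from the Figure 1 caption; the parameterisation is an assumption
f50 = weibull_cdf(50, shape=2.5, scale=90)
f70 = weibull_cdf(70, shape=2.5, scale=90)
print(f50, f70)
```

Under this assumption the values come out at roughly 0.2055 and 0.4135, which agrees with the tabulated 0.205 and 0.413 up to rounding in the third decimal.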
Figure 3. Left: Kaplan–Meier estimates with (upper step function) and without (lower step function) index patients, and the maximum likelihood estimate (continuous dashed line). Right: Maximum likelihood estimate (continuous line). The dashed lines indicate the range of possible bias if 20% of the individuals are missing. Only a bias towards overestimation in the dataset is evaluated.
Median of the estimates of F at ages 50 and 70 for the estimators in settings A, B, C and D and for the naive estimator (the Kaplan–Meier estimator as described before), in the three different studies.
| | Study 1 | Study 1 | Study 2 | Study 2 | Study 3 | Study 3 |
|---|---|---|---|---|---|---|
| Age | 50 | 70 | 50 | 70 | 50 | 70 |
| True value | 0.154 | 0.208 | 0.205 | 0.413 | 0.0527 | 0.152 |
| A | 0.154 | 0.208 | 0.206 | 0.413 | 0.0527 | 0.152 |
| B | 0.154 | 0.209 | 0.226 | 0.448 | 0.0527 | 0.153 |
| C | 0.153 | 0.207 | 0.204 | 0.413 | 0.0512 | 0.150 |
| D | 0.154 | 0.208 | 0.224 | 0.448 | 0.0515 | 0.150 |
| Naive (Kaplan–Meier) | 0.0936 | 0.128 | 0.169 | 0.347 | 0.0281 | 0.0850 |
Note: The first row yields the true values.
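One way to read the table above is as relative deviations of each median estimate from the true value in the first row. The sketch below copies the tabulated numbers. Treating the final unlabeled row as the naive Kaplan–Meier estimator is an assumption based on the caption, and the variable names are illustrative.

```python
# Columns: Study 1 (age 50, 70), Study 2 (age 50, 70), Study 3 (age 50, 70)
true_values = [0.154, 0.208, 0.205, 0.413, 0.0527, 0.152]
medians = {
    "A": [0.154, 0.208, 0.206, 0.413, 0.0527, 0.152],
    "B": [0.154, 0.209, 0.226, 0.448, 0.0527, 0.153],
    "C": [0.153, 0.207, 0.204, 0.413, 0.0512, 0.150],
    "D": [0.154, 0.208, 0.224, 0.448, 0.0515, 0.150],
    # Assumed to be the naive estimator (last row of the table)
    "naive": [0.0936, 0.128, 0.169, 0.347, 0.0281, 0.0850],
}


def relative_deviation(estimate, truth):
    """Signed deviation of an estimate from the truth, as a fraction of the truth."""
    return (estimate - truth) / truth


rel = {
    name: [relative_deviation(e, t) for e, t in zip(values, true_values)]
    for name, values in medians.items()
}
for name, devs in rel.items():
    print(name, [f"{100 * d:+.1f}%" for d in devs])
```

A negative value indicates that the median estimate lies below the true penetrance at that age.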
Figure 2. Left: Maximum likelihood estimate for F (black) and confidence interval (dashed). Right: Empirical distribution function for G based on the censoring times of the relatives. Straight line: distribution function of the uniform [20,80] distribution.
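The comparison in the right panel of Figure 2, an empirical distribution function set against the uniform [20,80] distribution function, can be sketched as follows. The censoring ages used here are hypothetical placeholders, not the study data, and the helper names are illustrative.

```python
import bisect


def uniform_cdf(x, low=20.0, high=80.0):
    """Distribution function of the uniform [low, high] distribution."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def make_ecdf(sample):
    """Return the empirical distribution function of the sample."""
    ordered = sorted(sample)
    n = len(ordered)

    def g_hat(x):
        # Fraction of observations less than or equal to x
        return bisect.bisect_right(ordered, x) / n

    return g_hat


# Hypothetical censoring ages of relatives
censoring_ages = [25, 33, 41, 48, 56, 62, 71, 78]
g_hat = make_ecdf(censoring_ages)

# Largest gap between the two curves on a grid of ages
max_gap = max(abs(g_hat(a) - uniform_cdf(a)) for a in range(20, 81))
print(g_hat(50), uniform_cdf(50), max_gap)
```

A small gap across the age range would suggest that the uniform [20,80] distribution is a reasonable description of the censoring ages.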