Miriam Kesselmeier, Carine Legrand, Barbara Peil, Maria Kabisch, Christine Fischer, Ute Hamann, Justo Lorenzo Bermejo.
Abstract
Logistic regression is usually applied to investigate the association between inherited genetic variants and a binary disease phenotype. A limitation of standard methods used to estimate the parameters of logistic regression models is their strong dependence on a few observations deviating from the majority of the data. We used data from the Genetic Analysis Workshop 18 to explore the possible benefit of robust logistic regression to estimate the genetic risk of hypertension. The comparison between standard and robust methods relied on the influence of departing hypertension profiles (outliers) on the estimated odds ratios, areas under the receiver operating characteristic curves, and clinical net benefit. Our results confirmed that single outliers may substantially affect the estimated genotype relative risks. The ranking of variants by probability values was different in standard and in robust logistic regression. For cutoff probabilities between 0.2 and 0.6, the clinical net benefit estimated by leave-one-out cross-validation in the investigated sample was slightly larger under robust regression, but the overall area under the receiver operating characteristic curve was larger for standard logistic regression. The potential advantage of robust statistics in the context of genetic association studies should be investigated in future analyses based on real and simulated data.
Year: 2014 PMID: 25519338 PMCID: PMC4143696 DOI: 10.1186/1753-6561-8-S1-S65
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
Figure 1. Cook's distances from the age-only standard logistic regression model. The 4 most prominent outliers are indicated by their observation number.
Estimated odds ratios per year of age
| Excluded | HTN | Age | Standard: OR-Age (95% CI) | Standard: % Change | Robust: OR-Age (95% CI) | Robust: % Change |
|---|---|---|---|---|---|---|
| None | | | 1.085 (1.050, 1.121) | ref. | 1.084 (1.048, 1.122) | ref. |
| 62 | 0 | 90.23 | 1.095 (1.057, 1.133) | +11.2% | 1.091 (1.052, 1.131) | +7.8% |
| 58 | 0 | 87.66 | 1.094 (1.056, 1.132) | +10.0% | 1.091 (1.052, 1.131) | +7.9% |
| 60 | 1 | 38.44 | 1.091 (1.054, 1.128) | +6.5% | 1.089 (1.051, 1.128) | +5.1% |
| 24 | 0 | 80.27 | 1.091 (1.054, 1.128) | +6.6% | 1.091 (1.052, 1.131) | +7.6% |
Odds ratios (ORs) were estimated based on standard and robust logistic regression models for the complete set of individuals and after exclusion of the 4 most remarkable outliers.
HTN: Hypertension.
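The table does not state how the "% Change" column is defined. One reading that comes close to the printed values is the relative change in the log odds ratio (the regression coefficient for age) after excluding an observation. The sketch below illustrates that reading only; the function name `pct_change_log_or` is hypothetical, and because the published ORs are rounded to three decimals the result can only approximate the printed percentages.

```python
import math


def pct_change_log_or(or_ref: float, or_new: float) -> float:
    """Relative change (in %) of log(OR) against a reference OR.

    Assumption: "% Change" refers to the log odds ratio, not the OR itself.
    """
    return 100.0 * (math.log(or_new) / math.log(or_ref) - 1.0)


# Standard logistic regression: all individuals vs. observation 62 excluded.
change = pct_change_log_or(1.085, 1.095)
print(f"{change:.1f}%")  # close to the +11.2% printed in the table
```
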
Overall odds of hypertension per age interval
| Age interval (hypertensive:non-hypertensive) | <39.0 (1:22) | [39.0, 46.0) (2:20) | [46.0, 56.2) (9:23) | ≥56.2 (31:22) |
|---|---|---|---|---|
| Odds of hypertension | 0.05 | 0.10 | 0.39 | 1.41 |
Age intervals were defined by the age quartiles in controls.
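As a quick arithmetic check, the odds in each age interval can be recomputed from the counts given in parentheses, reading `a:b` as hypertensive to non-hypertensive individuals (an interpretation, although it reproduces the printed odds). The variable names below are illustrative.

```python
# Counts per age interval, read as (hypertensive, non-hypertensive).
age_interval_counts = {
    "<39.0": (1, 22),
    "[39.0, 46.0)": (2, 20),
    "[46.0, 56.2)": (9, 23),
    ">=56.2": (31, 22),
}


def odds(cases: int, controls: int) -> float:
    """Odds of hypertension: affected divided by unaffected individuals."""
    return cases / controls


odds_by_interval = {
    interval: round(odds(cases, controls), 2)
    for interval, (cases, controls) in age_interval_counts.items()
}
print(odds_by_interval)

# The counts add up to 43 hypertensive and 87 non-hypertensive individuals.
total_cases = sum(cases for cases, _ in age_interval_counts.values())
total_controls = sum(controls for _, controls in age_interval_counts.values())
```

The totals of 43 and 87 agree with the concordance counts at cutoffs 0.0 and 1.0 in the last table.
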
Figure 2. Quantile-quantile plots from the age-genotype standard and robust logistic regression models. The 2 selected SNPs are indicated by their reference SNP ID number.
Area under the receiver operating characteristic curve (AUC)
| Excluded | Standard: AUC-Age | % Change | Standard: AUC-Age + SNP | % Change | Robust: AUC-Age | % Change | Robust: AUC-Age + SNP | % Change |
|---|---|---|---|---|---|---|---|---|
| None | 0.811 | (ref.) | 0.852 | (ref.) | 0.811 | (ref.) | 0.843 | (ref.) |
| 62 | 0.820 | +1.1% | 0.861 | +1.1% | 0.820 | +1.1% | 0.852 | +1.0% |
| 58 | 0.820 | +1.1% | 0.861 | +1.1% | 0.820 | +1.1% | 0.853 | +1.2% |
| 60 | 0.825 | +1.7% | 0.859 | +0.9% | 0.825 | +1.7% | 0.851 | +0.9% |
| 24 | 0.819 | +1.0% | 0.859 | +0.9% | 0.819 | +1.0% | 0.844 | +0.0% |
AUCs were calculated for the complete set of individuals and after exclusion of the 4 most remarkable outliers. The relative contributions of the variables age and SNP (rs3934103 and rs11918360, respectively) are also shown.
Concordance, sensitivity, specificity, clinical net benefit, and overall AUCs.
| Probability cutoff | Standard: Concordance, n (%) | Standard: Sensitivity | Standard: Specificity | Standard: Clinical net benefit | Robust: Concordance, n (%) | Robust: Sensitivity | Robust: Specificity | Robust: Clinical net benefit |
|---|---|---|---|---|---|---|---|---|
| 0.0 | 43 (33.1) | 1.00 | 0.00 | 0.33 | 43 (33.1) | 1.00 | 0.00 | 0.33 |
| 0.1 | 79 (60.8) | 0.95 | 0.44 | 0.27 | 82 (63.1) | 0.88 | 0.51 | 0.26 |
| 0.2 | 90 (69.2) | 0.86 | 0.61 | 0.22 | 97 (74.6) | 0.86 | 0.69 | 0.23 |
| 0.3 | 98 (75.4) | 0.81 | 0.72 | 0.19 | 99 (76.2) | 0.81 | 0.74 | 0.19 |
| 0.4 | 98 (75.4) | 0.70 | 0.78 | 0.13 | 102 (78.5) | 0.72 | 0.82 | 0.16 |
| 0.5 | 101 (77.7) | 0.60 | 0.86 | 0.11 | 107 (82.3) | 0.67 | 0.90 | 0.15 |
| 0.6 | 97 (74.6) | 0.40 | 0.92 | 0.05 | 102 (78.5) | 0.51 | 0.92 | 0.09 |
| 0.7 | 99 (76.2) | 0.35 | 0.97 | 0.06 | 100 (76.9) | 0.42 | 0.94 | 0.05 |
| 0.8 | 93 (71.5) | 0.19 | 0.98 | 0.00 | 97 (74.6) | 0.30 | 0.97 | 0.01 |
| 0.9 | 91 (70.0) | 0.12 | 0.99 | −0.03 | 93 (71.5) | 0.19 | 0.98 | −0.08 |
| 1.0 | 87 (66.9) | 0.00 | 1.00 | - | 87 (66.9) | 0.00 | 1.00 | - |
| Overall AUC | 0.835 | | | | 0.830 | | | |
These characteristics were estimated by leave-one-out cross-validation of the age-genotype models under standard and robust logistic regression.
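The clinical net benefit column is consistent with the usual decision-curve definition, net benefit = TP/n − (FP/n) · p_t/(1 − p_t), where p_t is the probability cutoff. The source does not spell out the formula, so the sketch below is an assumption that happens to reproduce several table entries. The true-positive and true-negative counts are back-calculated from the rounded sensitivities, specificities, and concordance counts, and are likewise not taken from the source.

```python
def net_benefit(tp: int, fp: int, n: int, cutoff: float) -> float:
    """Decision-curve net benefit at a probability cutoff in [0, 1).

    Assumed definition: TP/n - (FP/n) * cutoff / (1 - cutoff).
    """
    if not 0.0 <= cutoff < 1.0:
        # At cutoff 1.0 the weight cutoff / (1 - cutoff) is undefined,
        # which matches the "-" entries in the table.
        raise ValueError("net benefit is undefined for cutoff >= 1")
    return tp / n - (fp / n) * cutoff / (1.0 - cutoff)


N_CASES, N_CONTROLS = 43, 87  # hypertensive and non-hypertensive individuals
N_TOTAL = N_CASES + N_CONTROLS

# Standard logistic regression, cutoff 0.1: sensitivity 0.95 and specificity
# 0.44 correspond to 41 true positives and 38 true negatives (79 concordant).
tp, tn = 41, 38  # back-calculated, an assumption
fp = N_CONTROLS - tn
print(round(net_benefit(tp, fp, N_TOTAL, 0.1), 2))  # 0.27
```

At cutoff 0.0 every individual is classified as hypertensive, so the net benefit reduces to the prevalence, 43/130 ≈ 0.33, as in the first table row.
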