Louis Lello, Timothy G Raben, Stephen D H Hsu.
Abstract
We test 26 polygenic predictors using tens of thousands of genetic siblings from the UK Biobank (UKB), for whom we have SNP genotypes, health status, and phenotype information in late adulthood. Siblings have typically experienced similar environments during childhood, and exhibit negligible population stratification relative to each other. Therefore, the ability to predict differences in disease risk or complex trait values between siblings is a strong test of genomic prediction in humans. We compare validation results obtained using non-sibling subjects to those obtained among siblings and find that typically most of the predictive power persists in between-sibling designs. In the case of disease risk we test the extent to which a higher polygenic risk score (PRS) identifies the affected sibling, and also compute Relative Risk Reduction as a function of risk score threshold. For quantitative traits we examine between-sibling differences in trait values as a function of predicted differences, and compare to performance in non-sibling pairs. Example results: Given one sibling with normal-range PRS (< 84th percentile, < +1 SD) and one sibling with high PRS (top few percentiles, i.e. > +2 SD), the predictors identify the affected sibling about 70-90% of the time across a variety of disease conditions, including Breast Cancer, Heart Attack, Diabetes, etc. Across all single-case sibling pairs, the higher-PRS sibling is the case 55-65% of the time. For quantitative traits such as height, the predictor correctly identifies the taller sibling roughly 80 percent of the time when the (male) height difference is 2 inches or more.
Year: 2020 PMID: 32764582 PMCID: PMC7411027 DOI: 10.1038/s41598-020-69927-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The left and right panels show case and control distributions in PRS for the entire cohort of sibling pairs and the Affected Sibling Pair (ASP) cohort respectively. Phenotype is Hypertension. This plot was made using pyplot v3.2.1 under license https://matplotlib.org/3.2.1/users/license.html.
Polygenic predictors tested on sibling pairs.
| Condition | N pairs (single case) | Sibling case higher PRS fraction | Random case higher PRS fraction |
|---|---|---|---|
| Asthma | 3,948 | 0.618 (0.004) | 0.633 (0.005) |
| Atrial fibrillation | 332 | 0.620 (0.033) | 0.636 (0.013) |
| Basal cell carcinoma | 431 | 0.599 (0.014) | 0.616 (0.009) |
| Breast cancer (LASSO-L1) | 583 | 0.585 (0.020) | 0.586 (0.014) |
| Breast cancer (Khera) | 583 | 0.557 (–) | 0.601 (–) |
| Coronary artery disease (LASSO-L1) | 1,072 | 0.556 (0.012) | 0.579 (0.017) |
| Coronary artery disease (Khera) | 1,073 | 0.596 (–) | 0.614 (–) |
| Gallstones | 700 | 0.592 (0.006) | 0.622 (0.014) |
| Glaucoma | 440 | 0.593 (0.013) | 0.602 (0.015) |
| Gout | 631 | 0.627 (0.007) | 0.661 (0.003) |
| Heart attack | 900 | 0.593 (0.012) | 0.603 (0.006) |
| High cholesterol | 4,291 | 0.596 (0.005) | 0.632 (0.002) |
| Hypothyroidism | 2,031 | 0.658 (0.003) | 0.699 (0.005) |
| Hypertension | 6,931 | 0.627 (0.002) | 0.645 (0.001) |
| Malignant melanoma | 360 | 0.547 (0.040) | 0.592 (0.013) |
| Prostate cancer | 106 | 0.642 (0.015) | 0.650 (0.034) |
| Testicular cancer | 24 | | |
| Type 1 diabetes | 290 | 0.646 (0.019) | 0.669 (0.006) |
| Type 2 diabetes | 1,594 | 0.595 (0.005) | 0.620 (0.001) |
The first column is the number of sibling pairs with one affected and one unaffected sibling. The second column is the average and standard deviation (over five predictors) of the fraction of pairs in which the case has the higher PRS. The third column gives the same quantity for non-sibling (random) pairs. Quantities in bold have uncertainties in the central value larger than 10% due to low statistics.
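The pair-level statistic reported in this table can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `case_higher_fraction`, the `(case_prs, control_prs)` tuple layout, and the toy scores are all hypothetical.

```python
def case_higher_fraction(pairs):
    """Fraction of single-case pairs in which the case has the higher PRS.

    `pairs` is an iterable of (case_prs, control_prs) tuples.
    Ties are counted as misses (an assumption made for this sketch).
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one single-case pair")
    hits = sum(1 for case_prs, control_prs in pairs if case_prs > control_prs)
    return hits / len(pairs)


# Toy z-scored PRS values, invented for illustration only.
toy_pairs = [(1.2, 0.3), (-0.4, 0.1), (0.8, 0.7), (2.1, -0.5)]
fraction = case_higher_fraction(toy_pairs)
print(fraction)
```

The same function applies to sibling pairs and to randomly matched case/control pairs; only the way the pairs are constructed differs.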
Predictors tested on sibling pairs with a single case, where one sibling is high risk (PRS above a z-score threshold corresponding to the 93rd, 97.5th, or 99th percentile) and the other is normal risk (PRS < +1 SD, or < 85th percentile).
| Condition | N pairs | Fraction, high risk > 93rd pct | N pairs | Fraction, high risk > 97.5th pct | N pairs | Fraction, high risk > 99th pct |
|---|---|---|---|---|---|---|
| Asthma | 402 | 0.758 (0.021) | 110 | 0.782 (0.039) | 17 | 0.882 (0.078) |
| Atrial fibrillation | 37 | 0.757 (0.071) | 20 | | 9 | 1.0 (–) |
| Basal cell carcinoma | 54 | 0.648 (0.065) | 23 | 0.826 (0.079) | 6 | |
| Breast cancer (LASSO-L1) | 45 | | 11 | | 2 | |
| Breast cancer (Khera) | 52 | | 12 | | 3 | |
| Coronary artery disease (LASSO-L1) | 131 | 0.613 (0.043) | 48 | | 10 | |
| Coronary artery disease (Khera) | 109 | 0.706 (0.044) | 38 | 0.816 (0.063) | 5 | |
| Gallstones | 212 | 0.720 (0.031) | 158 | 0.697 (0.037) | 85 | 0.686 (0.050) |
| Glaucoma | 30 | 0.720 (0.082) | 9 | | 1 | – |
| Gout | 70 | 0.743 (0.052) | 37 | 0.784 (0.068) | 16 | 0.875 (0.083) |
| Heart attack | 68 | 0.685 (0.056) | 16 | | 4 | – |
| High cholesterol | 441 | 0.660 (0.023) | 130 | 0.662 (0.042) | 28 | 0.786 (0.078) |
| Hypothyroidism | 282 | 0.780 (0.025) | 109 | 0.890 (0.030) | 32 | 0.906 (0.052) |
| Hypertension | 757 | 0.726 (0.016) | 229 | 0.777 (0.027) | 53 | 0.811 (0.054) |
| Malignant melanoma | 30 | | 10 | | 2 | 1.0 (–) |
| Prostate cancer | 8 | | 0 | – | 0 | – |
| Testicular cancer | 0 | – | 0 | – | 0 | – |
| Type 1 diabetes | 41 | 0.805 (0.062) | 28 | 0.893 (0.058) | 17 | |
| Type 2 diabetes | 137 | 0.772 (0.036) | 37 | 0.816 (0.064) | 8 | |
Each N column gives the number of pairs used at that threshold, and the column to its right gives the fraction of those pairs in which the high-risk sibling is the case. 1 SD binomial errors are given in parentheses. Quantities in bold have uncertainties in the central value larger than 10% due to low statistics.
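As a sketch of how such an entry and its error could be computed, the Python below selects pairs with one high-risk and one normal-risk member and attaches the usual 1 SD binomial error, sqrt(p(1 - p)/N). The function names, the `(case_z, control_z)` layout, and the toy data are assumptions for illustration. Applied to the asthma entry above (p = 0.758, N = 402), the formula gives about 0.021, which matches the tabulated error.

```python
import math


def binomial_se(p, n):
    """1 SD binomial error on an observed fraction p out of n pairs."""
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1.0 - p) / n)


def high_risk_is_case(pairs, z_high, z_normal=1.0):
    """Among single-case pairs with one member above z_high and the other
    below z_normal, return (fraction where the high-risk member is the case,
    number of pairs selected). `pairs` holds (case_z, control_z) tuples."""
    selected = 0
    hits = 0
    for case_z, control_z in pairs:
        hi, lo = max(case_z, control_z), min(case_z, control_z)
        if hi > z_high and lo < z_normal:
            selected += 1
            if case_z > control_z:
                hits += 1
    if selected == 0:
        return None, 0
    return hits / selected, selected


# Toy PRS z-scores, invented for illustration only.
toy_pairs = [(2.3, 0.1), (0.2, 2.1), (1.7, 1.4), (2.6, -0.3)]
frac, n_used = high_risk_is_case(toy_pairs, z_high=2.0)
asthma_se = binomial_se(0.758, 402)
print(frac, n_used, round(asthma_se, 3))
```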
Predictors tested on non-sibling (random) pairs with a single case, where one individual is high risk (PRS above a z-score threshold corresponding to the 93rd, 97.5th, or 99th percentile) and the other is normal risk (PRS < +1 SD, or < 85th percentile).
| Condition | N pairs | Fraction, high risk > 93rd pct | N pairs | Fraction, high risk > 97.5th pct | N pairs | Fraction, high risk > 99th pct |
|---|---|---|---|---|---|---|
| Asthma | 777 | 0.749 (0.016) | 289 | 0.794 (0.026) | 72 | 0.889 (0.050) |
| Atrial fibrillation | 76 | 0.734 (0.057) | 44 | | 24 | |
| Basal cell carcinoma | 65 | 0.708 (0.067) | 34 | | 9 | |
| Breast cancer (LASSO-L1) | 132 | 0.638 (0.050) | 50 | | 10 | |
| Breast cancer (Khera) | 143 | 0.678 (0.044) | 60 | | 23 | |
| Coronary artery disease (LASSO-L1) | 117 | 0.634 (0.054) | 40 | | 7 | |
| Coronary artery disease (Khera) | 187 | 0.711 (0.037) | 78 | 0.719 (0.060) | 22 | 0.773 (0.124) |
| Gallstones | 210 | 0.695 (0.035) | 149 | 0.718 (0.041) | 65 | 0.723 (0.066) |
| Glaucoma | 42 | | 16 | | 3 | |
| Gout | 115 | 0.852 (0.041) | 69 | 0.870 (0.054) | 40 | 0.850 (0.078) |
| Heart attack | 121 | 0.645 (0.048) | 45 | | 14 | |
| High cholesterol | 881 | 0.712 (0.016) | 340 | 0.738 (0.026) | 108 | 0.769 (0.048) |
| Hypothyroidism | 505 | 0.844 (0.018) | 240 | 0.879 (0.025) | 79 | 0.911 (0.044) |
| Hypertension | 1,883 | 0.755 (0.010) | 727 | 0.812 (0.016) | 222 | 0.820 (0.029) |
| Malignant melanoma | 64 | | 29 | | 17 | |
| Prostate cancer | 37 | | 12 | | 9 | |
| Testicular cancer | – | – | – | – | – | – |
| Type 1 diabetes | 86 | 0.849 (0.049) | 56 | 0.911 (0.056) | 35 | 0.914 (0.076) |
| Type 2 diabetes | 230 | 0.764 (0.030) | 75 | 0.853 (0.052) | 23 | |
Each N column gives the number of pairs used at that threshold, and the column to its right gives the fraction of those pairs in which the high-risk individual is the case. 1 SD binomial errors are given in parentheses. Quantities in bold have uncertainties in the central value larger than 10% due to low statistics.
Figure 2. Predictors tested on random (non-sibling) pairs and affected sibling pairs with a single case. One individual is high risk (with z-score given on the horizontal axis) and the other is normal risk (PRS < +1 SD). The error estimates are explained in the text. This plot was made using pyplot v3.2.1 under license https://matplotlib.org/3.2.1/users/license.html.
Figure 3. Exclusion of individuals above (left panel) and below (right panel) a z-score threshold (horizontal axis) with resulting group prevalence shown on the vertical axis. The left panel shows risk reduction in a low PRS population, the right panel shows risk enhancement in a high PRS population. Top figures are results in the general population, bottom figures are the Affected Sibling Pair (ASP) population (i.e., variation of risk with PRS among individuals with an affected sib). Phenotype is Type 2 Diabetes. This plot was made using pyplot v3.2.1 under license https://matplotlib.org/3.2.1/users/license.html.
Figure 4. Exclusion of individuals above (left panel) and below (right panel) a z-score threshold (horizontal axis) with resulting group prevalence shown on the vertical axis. The left panel shows risk reduction in a low PRS population, the right panel shows risk enhancement in a high PRS population. Top figures are results in the general population, bottom figures are the Affected Sibling Pair (ASP) population (i.e., variation of risk with PRS among individuals with an affected sib). Phenotype is Breast Cancer. This plot was made using pyplot v3.2.1 under license https://matplotlib.org/3.2.1/users/license.html.
Figure 5. Exclusion of individuals above (left panel) and below (right panel) a z-score threshold (horizontal axis) with resulting group prevalence shown on the vertical axis. The left panel shows risk reduction in a low PRS population, the right panel shows risk enhancement in a high PRS population. Top figures are results in the general population, bottom figures are the Affected Sibling Pair (ASP) population (i.e., variation of risk with PRS among individuals with an affected sib). Phenotype is Hypertension. This plot was made using pyplot v3.2.1 under license https://matplotlib.org/3.2.1/users/license.html.
Figure 6. Exclusion of individuals above (left panel) and below (right panel) a z-score threshold (horizontal axis) with resulting group prevalence shown on the vertical axis. The left panel shows risk reduction in a low PRS population, the right panel shows risk enhancement in a high PRS population. Top figures are results in the general population, bottom figures are the Affected Sibling Pair (ASP) population (i.e., variation of risk with PRS among individuals with an affected sib). Phenotype is Heart Attack. This plot was made using pyplot v3.2.1 under license https://matplotlib.org/3.2.1/users/license.html.
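The exclusion curves described in these captions can be sketched as follows: for a given z-score threshold, compute the prevalence among individuals kept below the threshold and among those at or above it. This is a minimal illustration with hypothetical names and toy data. The relative risk reduction is computed here with the standard definition, 1 minus the ratio of low-PRS prevalence to baseline prevalence, which is an assumption about the quantity plotted.

```python
def prevalence(records):
    """Fraction of cases in a list of (prs_z, is_case) records, or None if empty."""
    records = list(records)
    if not records:
        return None
    return sum(1 for _, is_case in records if is_case) / len(records)


def exclusion_point(records, z_threshold):
    """One point of the exclusion curves at a given z-score threshold."""
    records = list(records)
    baseline = prevalence(records)
    low = prevalence([r for r in records if r[0] < z_threshold])    # high-PRS excluded
    high = prevalence([r for r in records if r[0] >= z_threshold])  # low-PRS excluded
    rrr = None
    if baseline and low is not None:
        rrr = 1.0 - low / baseline  # assumed definition of relative risk reduction
    return {"baseline": baseline, "low_prs": low, "high_prs": high, "rrr": rrr}


# Toy cohort of (PRS z-score, case status), invented for illustration only.
toy_cohort = [(-1.0, False), (-0.5, False), (0.0, True),
              (0.5, False), (1.5, True), (2.5, True)]
point = exclusion_point(toy_cohort, z_threshold=1.0)
print(point)
```

Sweeping `z_threshold` over a grid and plotting `low_prs` and `high_prs` against it would give curves of the kind shown in the left and right panels.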
Polygenic predictors tested on sibling pairs.
| Condition | N cases/ctrls (all siblings) | AUC-All | N cases/ctrls (ASPs) | AUC-ASP |
|---|---|---|---|---|
| Asthma | 4,519/35,511 | 0.630 (0.002) | 944/3,877 | 0.628 (0.003) |
| Atrial fibrillation | 327/39,703 | 0.624 (0.004) | 16/330 | 0.577 (0.019) |
| Basal cell carcinoma | 415/39,615 | 0.626 (0.007) | 16/428 | 0.528 (0.024) |
| Breast cancer (LASSO) | 963/22,204 | 0.585 (0.016) | 52/583 | 0.567 (0.015) |
| Breast cancer (Khera) | 963/22,242 | 0.608 (–) | 52/583 | 0.573 (–) |
| Coronary artery disease (LASSO) | 1,058/38,972 | 0.582 (0.017) | 70/1,069 | 0.570 (0.019) |
| Coronary artery disease (Khera) | 1,059/39,049 | 0.621 (–) | 70/1,070 | 0.617 (–) |
| Gallstones | 690/39,340 | 0.638 (0.003) | 40/699 | 0.586 (0.015) |
| Glaucoma | 422/39,608 | 0.592 (0.012) | 26/439 | 0.602 (0.030) |
| Gout | 601/39,429 | 0.660 (0.004) | 29/631 | 0.653 (0.010) |
| Heart attack | 889/39,141 | 0.602 (0.006) | 60/898 | 0.618 (0.025) |
| High cholesterol | 5,240/34,790 | 0.632 (0.002) | 1,351/4,203 | 0.622 (0.002) |
| Hypertension | 10,524/29,506 | 0.648 (0.001) | 4,296/6,719 | 0.635 (0.001) |
| Hypothyroidism | 2,152/37,878 | 0.709 (0.002) | 319/1,997 | 0.685 (0.007) |
| Malignant melanoma | 334/39,696 | 0.585 (0.007) | 2/359 | – |
| Prostate cancer | 262/16,601 | 0.644 (0.014) | 20/106 | 0.654 (0.030) |
| Testicular cancer | 57/16,806 | 0.631 (0.012) | 0/24 | – |
| Type 1 diabetes | 277/39,753 | 0.676 (0.003) | 12/290 | 0.643 (0.018) |
| Type 2 diabetes | 1,692/38,338 | 0.617 (0.005) | 235/1,576 | 0.599 (0.014) |
The first pair of columns gives the number of cases/controls and the AUC for the entire sibling cohort (a proxy for the general population). The second pair of columns gives the number of cases/controls and the AUC for the subset of the cohort in which all pairs have at least one affected sibling (ASP). Quantities in parentheses are standard deviations among five predictors.
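For readers who want to reproduce an AUC of this kind, one standard way to compute it is as the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half. The sketch below uses that rank-based definition; the function name and the toy scores are hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def auc(case_scores, control_scores):
    """AUC as P(case score > control score), counting ties as 1/2."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy PRS values, invented for illustration only.
toy_cases = [0.9, 0.4, 1.5]
toy_controls = [0.1, 0.4, -0.3, 1.0]
toy_auc = auc(toy_cases, toy_controls)
print(toy_auc)
```

Under this definition the AUC is the same kind of quantity as the fraction of random case/control pairs in which the case has the higher PRS.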
Polygenic predictors tested on sibling pairs and non-sibling (random) pairs.
| Trait | N pairs | Correlation (sibling pairs) | Correlation (non-sibling pairs) |
|---|---|---|---|
| BMI | 21,556 | 0.271 (0.003) | 0.345 (0.005) |
| Educational attainment | 21,352 | 0.089 (0.001) | 0.256 (0.003) |
| Body fat percentage | 20,990 | 0.245 (0.002) | 0.319 (0.001) |
| Fluid intelligence | 4,968 | 0.165 (0.004) | 0.264 (0.005) |
| Heel bone density | 10,133 | 0.345 (0.001) | 0.415 (0.002) |
| Standing height | 21,418 | 0.545 (0.001) | 0.614 (0.001) |
| Platelet count | 20,534 | 0.393 (0.002) | 0.490 (0.002) |
The first column is the number of pairs. The second and third columns give the correlation between the difference in predicted phenotype and the difference in actual phenotype for sibling pairs and non-sibling pairs, respectively.
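The correlation reported here is taken over pairs, between the within-pair difference in polygenic score and the within-pair difference in phenotype. A minimal sketch, assuming each pair is stored as `((pgs_a, pheno_a), (pgs_b, pheno_b))`; the function names and toy values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def pair_difference_correlation(pairs):
    """Correlate within-pair PGS differences with within-pair phenotype differences."""
    d_pgs = [a[0] - b[0] for a, b in pairs]
    d_pheno = [a[1] - b[1] for a, b in pairs]
    return pearson(d_pgs, d_pheno)


# Toy (PGS, phenotype) pairs, invented for illustration only.
toy_pairs = [((1, 2), (0, 0)), ((0, 0), (1, 1)),
             ((1, 1), (0, 0)), ((0, 0), (1, 2))]
r = pair_difference_correlation(toy_pairs)
print(r)
```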
Figure 7. Difference in phenotype (vertical axis) and difference in polygenic score (horizontal axis) for pairs of individuals. Red dots are sibling pairs and blue dots are random (non-sibling) pairs. This plot was made using pyplot v3.2.1 under license https://matplotlib.org/3.2.1/users/license.html.
Rank ordering by polygenic score.
| Trait | N pairs | Fraction called (sibling pairs) | Fraction called (random pairs) |
|---|---|---|---|
| BMI | 21,556 | 0.588 (0.001) | 0.614 (0.003) |
| Educational attainment | 21,352 | 0.528 (0.001) | 0.591 (0.001) |
| Body fat percentage | 20,990 | 0.583 (0.002) | 0.606 (0.001) |
| Fluid intelligence | 4,968 | 0.558 (0.003) | 0.592 (0.006) |
| Heel bone density | 10,133 | 0.627 (0.002) | 0.657 (0.003) |
| Standing height | 21,418 | 0.684 (0.001) | 0.718 (0.001) |
| Platelet count | 20,534 | 0.646 (0.001) | 0.679 (0.001) |
The first column gives the number of sibling pairs; the second and third columns give the fraction called correctly (the higher-PGS individual has the greater phenotype value) in sibling and non-sibling pairs, respectively. Quantities in parentheses are standard deviations.
Figure 8. Probability of PGS correctly identifying the individual with larger phenotype value (vertical axis). Horizontal axis shows absolute difference in phenotypes. The blue line is for sibling pairs, the orange line is for randomized (non-sibling) pairs. This plot was made using pyplot v3.2.1 under license https://matplotlib.org/3.2.1/users/license.html.
Predictors tested on sibling pairs where the phenotype difference is larger than one of three thresholds, expressed in standard deviations (the adjusted phenotypes are described in the Supplementary Appendix).
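| Trait | N pairs (threshold 1) | Fraction called | N pairs (threshold 2) | Fraction called | N pairs (threshold 3) | Fraction called |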
|---|---|---|---|---|---|---|
| Body mass index | 13,376 | 0.623 (0.004) | 7,387 | 0.658 (0.006) | 3,815 | 0.694 (0.007) |
| Educational attainment | 11,532 | 0.545 (0.005) | 8,020 | 0.548 (0.006) | 5,402 | 0.554 (0.007) |
| Body fat percentage | 13,836 | 0.613 (0.004) | 8,075 | 0.642 (0.005) | 4,234 | 0.670 (0.007) |
| Fluid intelligence | 3,288 | 0.570 (0.009) | 1,875 | 0.585 (0.011) | 952 | 0.605 (0.016) |
| Heel bone density | 6,365 | 0.674 (0.006) | 3,435 | 0.716 (0.008) | 1,724 | 0.734 (0.011) |
| Standing height | 12,689 | 0.762 (0.004) | 6,184 | 0.835 (0.005) | 2,488 | 0.890 (0.006) |
| Platelet count | 13,039 | 0.702 (0.004) | 7,253 | 0.745 (0.005) | 3,591 | 0.769 (0.007) |
Each N column is the number of pairs above that threshold, and the column to its right is the fraction of those pairs where higher PGS corresponds to greater phenotype value. The standard deviations among males for height, BMI, body fat percentage, years of education, fluid intelligence, heel bone density, and platelet count are respectively 6.76 cm, 4.23 kg/m², 5.80%, 5.19 years, 2.16 points, 0.15 g/cm² and 55.81 × 10⁹ cells/L. The standard deviations among females for height, BMI, body fat percentage, years of education, fluid intelligence, heel bone density and platelet count are respectively 6.12 cm, 5.13 kg/m², 6.85%, 5.03 years, 2.02 points, 0.12 g/cm² and 59.96 × 10⁹ cells/L.
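The "fraction called" statistic, with an optional minimum phenotype difference, can be sketched as below. With `min_abs_diff=0` it corresponds to plain rank ordering by polygenic score; a positive value restricts the calculation to pairs whose phenotypes differ by more than that amount. The function name, the pair layout, and the 0.5 threshold in the toy call are illustrative assumptions, and phenotypes are assumed to be expressed in standard-deviation units.

```python
def fraction_called(pairs, min_abs_diff=0.0):
    """Fraction of pairs in which the higher-PGS member has the larger phenotype.

    `pairs` holds ((pgs_a, pheno_a), (pgs_b, pheno_b)) tuples. Only pairs with
    |pheno_a - pheno_b| > min_abs_diff are used. Returns (fraction, n_used).
    """
    used = 0
    correct = 0
    for (pgs_a, pheno_a), (pgs_b, pheno_b) in pairs:
        d_pheno = pheno_a - pheno_b
        if abs(d_pheno) <= min_abs_diff:
            continue  # phenotype difference too small (or zero)
        used += 1
        if (pgs_a - pgs_b) * d_pheno > 0:
            correct += 1  # PGS ordering agrees with phenotype ordering
    if used == 0:
        return None, 0
    return correct / used, used


# Toy (PGS, phenotype) pairs, invented for illustration only.
toy_pairs = [((0.5, 1.2), (0.1, 0.2)), ((-0.3, 0.4), (0.2, 0.1)),
             ((1.0, -0.5), (0.0, 1.5)), ((0.7, 0.3), (0.2, -0.9))]
all_frac, all_n = fraction_called(toy_pairs)
big_frac, big_n = fraction_called(toy_pairs, min_abs_diff=0.5)
print(all_frac, all_n, big_frac, big_n)
```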
Predictors tested on non-sibling pairs where the phenotype difference is larger than one of three thresholds, expressed in standard deviations (the adjusted phenotypes are described in the Supplementary Appendix).
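| Trait | N pairs (threshold 1) | Fraction called | N pairs (threshold 2) | Fraction called | N pairs (threshold 3) | Fraction called |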
|---|---|---|---|---|---|---|
| Body mass index | 14,816 | 0.652 (0.004) | 9,185 | 0.686 (0.005) | 5,399 | 0.723 (0.006) |
| Educational attainment | 13,973 | 0.618 (0.004) | 10,521 | 0.630 (0.005) | 7,866 | 0.646 (0.005) |
| Body fat percentage | 14,892 | 0.637 (0.004) | 9,771 | 0.668 (0.005) | 5,794 | 0.702 (0.006) |
| Fluid intelligence | 3,495 | 0.611 (0.008) | 2,337 | 0.625 (0.010) | 1,394 | 0.648 (0.013) |
| Heel bone density | 6,985 | 0.707 (0.005) | 4,410 | 0.758 (0.006) | 2,505 | 0.790 (0.008) |
| Standing height | 15,365 | 0.777 (0.003) | 9,994 | 0.837 (0.004) | 5,828 | 0.884 (0.004) |
| Platelet count | 14,443 | 0.724 (0.004) | 9,131 | 0.774 (0.004) | 5,251 | 0.805 (0.005) |
Each N column is the number of pairs above that threshold, and the column to its right is the fraction of those pairs where higher PGS corresponds to greater phenotype value. The standard deviations among males for height, BMI, body fat percentage, years of education, fluid intelligence, heel bone density, and platelet count are respectively 6.76 cm, 4.23 kg/m², 5.80%, 5.19 years, 2.16 points, 0.15 g/cm² and 55.81 × 10⁹ cells/L. The standard deviations among females for height, BMI, body fat percentage, years of education, fluid intelligence, heel bone density and platelet count are respectively 6.12 cm, 5.13 kg/m², 6.85%, 5.03 years, 2.02 points, 0.12 g/cm² and 59.96 × 10⁹ cells/L.