Caroline F Wright, Ben West, Marcus Tuke, Samuel E Jones, Kashyap Patel, Thomas W Laver, Robin N Beaumont, Jessica Tyrrell, Andrew R Wood, Timothy M Frayling, Andrew T Hattersley, Michael N Weedon.
Abstract
More than 100,000 genetic variants are classified as disease causing in public databases. However, the true penetrance of many of these rare alleles is uncertain and might be over-estimated by clinical ascertainment. Here, we use data from 379,768 UK Biobank (UKB) participants of European ancestry to assess the pathogenicity and penetrance of putatively clinically important rare variants. Although rare variants are harder to genotype accurately than common variants, we were able to classify as high quality 1,244 of 4,585 (27%) putatively clinically relevant rare (MAF < 1%) variants genotyped on the UKB microarray. We defined as "clinically relevant" variants that were classified as either pathogenic or likely pathogenic in ClinVar or are in genes known to cause two specific monogenic diseases: maturity-onset diabetes of the young (MODY) and severe developmental disorders (DDs). We assessed the penetrance and pathogenicity of these high-quality variants by testing their association with 401 clinically relevant traits. Twenty-seven of the variants were associated with a UKB trait, and we were able to refine the penetrance estimate for some of the variants. For example, the HNF4A c.340C>T (p.Arg114Trp) (GenBank: NM_175914.4) variant associated with diabetes is <10% penetrant by the time an individual is 40 years old. We also observed associations with relevant traits for heterozygous carriers of some rare recessive conditions, e.g., heterozygous carriers of the ERCC4 c.2395C>T (p.Arg799Trp) variant that causes xeroderma pigmentosum were more susceptible to sunburn. Finally, we refute the previous disease association of RNF135 in developmental disorders. In conclusion, this study shows that very large population-based studies will help refine our understanding of the pathogenicity of rare genetic variants.
Keywords: SNP-chip; biobank; genetic; genotyping; pathogenicity; penetrance; rare variant; variant
Year: 2019 PMID: 30665703 PMCID: PMC6369448 DOI: 10.1016/j.ajhg.2018.12.015
Source DB: PubMed Journal: Am J Hum Genet ISSN: 0002-9297 Impact factor: 11.025
Evaluated Variants
| MAF (%) | FP | TP | Unclear | Total |
| 0–0.0005 | 511 | 0 | 11 | 522 |
| 0.0005–0.001 | 607 | 8 | 59 | 674 |
| 0.001–0.005 | 1,598 | 218 | 210 | 2,026 |
| 0.005–0.01 | 138 | 204 | 73 | 415 |
| 0.01–0.05 | 66 | 456 | 48 | 570 |
| 0.05–0.1 | 2 | 129 | 5 | 136 |
| 0.1–0.5 | 6 | 189 | 7 | 202 |
| 0.5–1 | 0 | 40 | 0 | 40 |
| Total | 2,928 | 1,244 | 413 | 4,585 |
Number of variants manually evaluated for analytical validity in different MAF bins; quality scores are grouped into false positives (FP, score = 1 or 2), unclear scores (score = 3), and true positives (TP, score = 4 or 5).
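The relationship between allele frequency and genotyping quality in this table can be made explicit by computing the share of true positives in each bin. The following is a minimal sketch, not the authors' analysis code. It assumes that the second count column holds the high-quality (TP) variants, which matches the 1,244 of 4,585 quoted in the abstract, and that the bins are expressed as percentages. The names `ROWS`, `tp_fraction`, and `summarize` are illustrative.

```python
# Counts copied from the table: (MAF bin, FP, TP, unclear).
# Assumption: bins are percentages, consistent with the MAF < 1% cutoff.
ROWS = [
    ("0-0.0005", 511, 0, 11),
    ("0.0005-0.001", 607, 8, 59),
    ("0.001-0.005", 1598, 218, 210),
    ("0.005-0.01", 138, 204, 73),
    ("0.01-0.05", 66, 456, 48),
    ("0.05-0.1", 2, 129, 5),
    ("0.1-0.5", 6, 189, 7),
    ("0.5-1", 0, 40, 0),
]


def tp_fraction(fp, tp, unclear):
    """Share of manually scored variants in a bin that were rated high quality."""
    total = fp + tp + unclear
    return tp / total if total else float("nan")


def summarize(rows):
    """Return per-bin TP fractions plus the overall TP and total counts."""
    per_bin = {label: tp_fraction(fp, tp, unclear) for label, fp, tp, unclear in rows}
    total_tp = sum(tp for _, _, tp, _ in rows)
    total_all = sum(fp + tp + unclear for _, fp, tp, unclear in rows)
    return per_bin, total_tp, total_all


per_bin, total_tp, total_all = summarize(ROWS)
for label, frac in per_bin.items():
    print(f"{label:>14}  TP fraction = {frac:.2f}")
print(f"overall: {total_tp}/{total_all} = {total_tp / total_all:.0%}")
```

Under these assumptions the TP fraction rises from zero in the rarest bin to one in the most common bin, which is the pattern the abstract describes when it notes that rare variants are harder to genotype accurately.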
Figure 1. Correlation between Minor Allele Frequency and Analytical Validity Quality Score
(A and B) Density plot (A) and boxplot (B) of manual quality scores (from 1–5, see Figure S1) of genotype data in UKB versus minor allele frequency (MAF) for 4,585 putatively clinically important variants, where MAF < 1%, Hardy–Weinberg equilibrium (HWE) > 0.05, and missingness < 0.01.
(C) Histogram of the number of variants at each quality score versus presence or absence of the variant in gnomAD (exome data) or the 1000 Genomes Project.
Red = score 1; gold = score 2; green = score 3; blue = score 4; purple = score 5.
(D) Estimation of the false-positive rate (FPR) versus MAF for variants assayed with the UKB genotyping arrays, calculated by the grouping of quality scores into low (score = 1 or 2) and high (score = 4 or 5) and use of the rocreg command in Stata for fitting a ROC curve.
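The inclusion thresholds quoted in the caption (MAF < 1%, HWE > 0.05, missingness < 0.01) can be illustrated with a small filter over genotype counts. This is a hedged sketch only: it assumes the HWE threshold refers to a p value, and it uses a one-degree-of-freedom chi-square test as a simple stand-in, since the source does not say which HWE test was applied (exact tests are often preferred for rare variants). The function names and the example counts are hypothetical.

```python
import math


def hwe_chi2_pvalue(n_ref_hom, n_het, n_alt_hom):
    """Chi-square (1 df) test of Hardy-Weinberg proportions from genotype counts."""
    n = n_ref_hom + n_het + n_alt_hom
    if n == 0:
        raise ValueError("no called genotypes")
    p = (2 * n_ref_hom + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ref_hom, n_het, n_alt_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_filters(n_ref_hom, n_het, n_alt_hom, n_missing):
    """Apply the caption's thresholds: MAF < 1%, HWE p > 0.05, missingness < 0.01."""
    n_called = n_ref_hom + n_het + n_alt_hom
    if n_called == 0:
        return False
    alt_freq = (2 * n_alt_hom + n_het) / (2 * n_called)
    maf = min(alt_freq, 1.0 - alt_freq)
    missingness = n_missing / (n_called + n_missing)
    hwe_p = hwe_chi2_pvalue(n_ref_hom, n_het, n_alt_hom)
    return maf < 0.01 and hwe_p > 0.05 and missingness < 0.01


# Hypothetical genotype counts, not taken from UKB.
print(passes_filters(99800, 200, 0, 100))   # rare, near HWE, low missingness
print(passes_filters(99800, 0, 200, 100))   # rare, but no heterozygotes at all
print(passes_filters(99800, 200, 0, 5000))  # too many missing calls
```

A variant with only homozygous alternate calls and no heterozygotes fails the HWE check, which is one way a genotype-clustering error can show up in array data.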
Pathogenic Variants
| Variant ID | Position (ref>alt) | HGVS | MAF (%) | UKB trait | Effect [CI] | p value | ClinVar disease (inheritance) |
| dbSNP: rs141090143 | chr16: 89220556 C>T | GenBank: NM_174917:c.C1672T:p.R558W | 0.632 | ease of sunburn (number of episodes) | 0.31 [0.20, 0.42] | 4 × 10⁻¹⁰ | combined malonic and methylmalonic aciduria (AR) |
| dbSNP: rs137852591 | chrX: 66941751 C>G | GenBank: NM_000044:c.C2395G:p.Q799E | 0.129 | skeletal mass (SD) | −0.16 [−0.21, −0.11] | 1 × 10⁻¹⁰ | partial androgen insensitivity syndrome (XLR) |
| | | | | height (cm) | −0.85 [−1.27, −0.43] | 1 × 10⁻⁸ | |
| dbSNP: rs1800053 | chrX: 66931295 C>A | GenBank: NM_000044:c.C1937A:p.A646D | 0.269 | balding pattern (males only) | −0.13 [−0.17, −0.08] | 1 × 10⁻⁸ | partial androgen insensitivity syndrome (XLR) |
| dbSNP: rs121913049 | chr16: 14041848 C>T | GenBank: NM_005236:c.C2395T:p.R799W | 0.060 | ease of sunburn (number of episodes) | 0.98 [0.64, 1.33] | 2 × 10⁻⁸ | xeroderma pigmentosum (AR) |
| dbSNP: rs150597413 | chr1: 152277622 G>T | GenBank: NM_002016:c.C9740A:p.S3247X | 0.369 | eczema | 1.66 [1.40, 1.98] | 9 × 10⁻⁸ | ichthyosis vulgaris (AD) |
| dbSNP: rs138726443 | chr1: 152280023 G>A | GenBank: NM_002016:c.C7339T:p.R2447X | 0.446 | eczema | 1.96 [1.69, 2.27] | 5 × 10⁻¹⁶ | ichthyosis vulgaris (AD) |
| dbSNP: rs104894006 | chr7: 44189591 G>A | GenBank: NM_000162:c.C556T:p.R186X | 0.001 | maturity-onset diabetes of the young | 68 [14, 325] | 2 × 10⁻⁸ | diabetes mellitus (AD) |
| Affx-52141620 | chr11: 5248004 G>A | GenBank: NM_000518:c.C118T:p.Q40X | 0.005 | mean corpuscular volume (SD) | −2.92 [−3.26, −2.57] | 6 × 10⁻⁶³ | beta-thalassemia (AR)∗ |
| | | | | red blood cell distribution width (SD) | 1.87 [1.53, 2.21] | 5 × 10⁻²⁷ | |
| dbSNP: rs138213197 | chr17: 46805705 C>T | GenBank: NM_006361:c.G251A:p.G84E | 0.160 | prostate cancer | 4.09 [3.24, 5.17] | 1 × 10⁻²³ | prostate cancer susceptibility (AD) |
| | | | | father with prostate cancer | 1.75 [1.47, 2.09] | 4 × 10⁻⁹ | |
| dbSNP: rs137853336 | chr20: 43042354 C>T | GenBank: NM_175914:c.340C>T:R114W | 0.015 | diabetes | 2.9 [1.7, 5] | 3 × 10⁻⁴ | maturity-onset diabetes of the young (AD) |
| Affx-80274027 | chr5: 137902404 CT>- | GenBank: NM_004134:c.882_883del:p.T294fs | 0.017 | mean corpuscular volume (SD) | −0.49 [−0.67, −0.32] | 2 × 10⁻⁸ | even-plus syndrome (AR) |
| | | | | red blood cell distribution width (SD) | 1.17 [0.99, 1.34] | 9 × 10⁻⁴⁰ | |
| Affx-80299186 | chr19: 12995833 ->C | GenBank: NM_006563:c.954dupG:p.R319fs | 0.017 | mean corpuscular volume (SD) | −1.27 [−1.45, −1.1] | 9 × 10⁻⁴⁸ | blood group Lutheran inhibitor (AD) |
| | | | | red blood cell distribution width (SD) | 1.48 [1.3, 1.65] | 2 × 10⁻⁶³ | |
| dbSNP: rs34637584 | chr12: 40734202 G>A | GenBank: NM_198578:c.G6055A:p.G2019S | 0.032 | parent with Parkinson disease | 4.76 [3.25, 6.96] | 1 × 10⁻¹¹ | Parkinson disease (AD) |
| Affx-86888962 | chr14: 23887458 C>T | GenBank: NM_000257:c.C4130T:p.T1377M | 0.117 | pulse rate (beats per minute) | −4.75 [−5.47, −4.01] | 4 × 10⁻⁴¹ | primary familial hypertrophic cardiomyopathy (AD) |
| dbSNP: rs80358259 | chr18: 21116700 A>G | GenBank: NM_000271:c.T3182C:p.I1061T | 0.075 | mean corpuscular volume (SD) | −0.24 [−0.32, −0.15] | 2 × 10⁻⁸ | Niemann-Pick disease (AR) |
| dbSNP: rs28934272 | chr15: 28230247 C>T | GenBank: NM_000275:c.G1327A:p.V443I | 0.834 | ease of sunburn (number of episodes) | 0.49 [0.40, 0.58] | 1 × 10⁻⁴⁷ | oculocutaneous albinism (AR) |
| dbSNP: rs121918170 | chr15: 28228529 T>C | GenBank: NM_000275:c.A1465G:p.N489D | 0.094 | ease of sunburn (number of episodes) | 0.91 [0.64, 1.18] | 1 × 10⁻¹⁴ | oculocutaneous albinism (AR) |
| dbSNP: rs180177132 | chr16: 23632683 C>T | GenBank: NM_024675:c.G3113A:p.W1038X | 0.033 | breast cancer | 4.55 [3.05, 6.79] | 2 × 10⁻¹⁰ | familial breast cancer (AD) |
| | | | | mother with breast cancer | 2.62 [1.92, 3.59] | 5 × 10⁻⁸ | |
| dbSNP: rs139315125 | chr1: 7869960 A>G | GenBank: NM_001289862:c.A1250G:p.H417R | 0.438 | morning person | 1.37 [1.27, 1.47] | 2 × 10⁻¹⁶ | advanced sleep phase syndrome (AD) |
| dbSNP: rs150812083 | chr1: 7869953 C>G | GenBank: NM_001289862:c.C1243G:p.P415A | 0.458 | morning person | 1.35 [1.25, 1.46] | 7 × 10⁻¹⁵ | |
| dbSNP: rs121918221 | chr20: 18496339 G>A | GenBank: NM_006363:c.G325A:p.E109K | 0.027 | red blood cell distribution width (SD) | 0.39 [0.25, 0.52] | 3 × 10⁻⁸ | congenital dyserythropoietic anemia (AR) |
| dbSNP: rs121434346 | chr5: 1212453 G>A | GenBank: NM_001003841:c.G517A:p.D173N | 0.442 | red blood cell distribution width (SD) | −0.15 [−0.18, −0.11] | 2 × 10⁻¹⁶ | neutral 1 amino acid transport defect (AR) |
| dbSNP: rs144292455 | chr4: 104577415 C>T | GenBank: NM_001059:c.G824A:p.W275X | 0.054 | reproductive age at menarche (yr) | 0.66 [0.45, 0.87] | 2 × 10⁻¹⁰ | hypogonadotropic hypogonadism (AR) |
| dbSNP: rs137853120 | chr22: 37469593 C>T | GenBank: NM_153609:c.G1561A:p.D521N | 0.019 | mean corpuscular volume (SD) | −0.67 [−0.83, −0.51] | 3 × 10⁻¹⁶ | microcytic anemia (AR) |
| | | | | red blood cell distribution width (SD) | 0.69 [0.53, 0.85] | 5 × 10⁻¹⁷ | |
| dbSNP: rs121908866 | chr14: 81610039 G>A | GenBank: NM_000369:c.G1637A:p.W546X | 0.041 | hypothyroid | 3.34 [2.47, 4.51] | 7 × 10⁻¹² | hypothyroidism (AD, AR) |
| | | | | autoimmune disease | 2.31 [1.76, 3.04] | 4 × 10⁻⁸ | |
| dbSNP: rs34557412 | chr17: 16852187 A>G | GenBank: NM_012452:c.T310C:p.C104R | 0.703 | mean corpuscular volume (SD) | −0.09 [−0.12, −0.07] | 4 × 10⁻¹¹ | common variable immunodeficiency (AD, AR) |
Reduced penetrance, variable expressivity, and carrier phenotypes for rare (MAF < 1%) ClinVar pathogenic variants with genome-wide significant associations in UKB. Abbreviations are as follows: UKB = UK Biobank, HGVS = Human Genome Variation Society, MAF = minor allele frequency, SD = standard deviation, cm = centimeters, yr = years, CI = confidence interval, AD = autosomal dominant, AR = autosomal recessive, XLR = X-linked recessive.
Figure 2. Comparison of Penetrance Estimate for HNF4A p.Arg114Trp in UK Biobank versus Previously Published Estimates from MODY Cohort Studies
A Kaplan-Meier plot of the proportion of individuals who are diabetes free at various ages for 379,768 individuals from UK Biobank (red line), 122 UK Biobank individuals who are heterozygous for HNF4A p.Arg114Trp (green line), 26 MODY referral probands (blue line), and 24 family members of the probands (yellow line) from Laver et al.
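For readers unfamiliar with how an age-dependent penetrance figure is read off a Kaplan-Meier curve, the calculation can be sketched as follows. This is an illustrative implementation of the standard product-limit estimator, not the code used in the study. The ages and diagnosis flags are hypothetical toy values for ten imaginary carriers, and the function names `kaplan_meier` and `penetrance_by` are assumptions made for this example.

```python
from collections import Counter


def kaplan_meier(ages, events):
    """Kaplan-Meier estimate of the diabetes-free proportion.

    ages   -- age at diagnosis (event) or age at last follow-up (censored)
    events -- True if diabetes was diagnosed at that age, False if censored
    Returns a list of (age, proportion still diabetes free) at each event age.
    """
    diagnosed = Counter(a for a, e in zip(ages, events) if e)
    leaving = Counter(ages)  # everyone leaves the risk set at their recorded age
    at_risk = len(ages)
    surviving = 1.0
    curve = []
    for age in sorted(leaving):
        d = diagnosed.get(age, 0)
        if d:
            surviving *= 1.0 - d / at_risk
            curve.append((age, surviving))
        at_risk -= leaving[age]
    return curve


def penetrance_by(curve, age):
    """Cumulative proportion diagnosed by the given age (1 - KM survival)."""
    surviving = 1.0
    for t, s in curve:
        if t <= age:
            surviving = s
    return 1.0 - surviving


# Hypothetical toy data: ten carriers, three diagnosed, seven censored.
ages = [35, 42, 45, 50, 55, 58, 60, 62, 65, 70]
events = [True, False, False, True, False, False, True, False, False, False]

curve = kaplan_meier(ages, events)
for age, s in curve:
    print(f"age {age}: diabetes-free proportion = {s:.3f}")
print(f"penetrance by age 40 = {penetrance_by(curve, 40):.2f}")
```

Censored individuals still count in the risk set until their last recorded age, which is why a population cohort with many undiagnosed carriers can yield a much lower penetrance estimate than a cohort of referred probands.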
Benign Variants
| RSID | Position (ref>alt) | HGVS | MAF | Trait | Effect [CI] | p value | Disease |
| Affx-80270894 | chr2: 228148945 G>GAGTAAAGGGCC | GenBank: NM_000091:c.2766_2776del:p.G922fs | 0.01813 | education years | −0.007 [−0.17, 0.156] | 0.93 | DD (Alport syndrome, autosomal dominant) |
| | | | | fluid intelligence | −0.085 [−0.3, 0.133] | 0.45 | |
| | | | | BMI | 0.096 [−0.07, 0.259] | 0.25 | |
| | | | | height | −0.097 [−0.26, 0.066] | 0.24 | |
| | | | | albumin creatinine ratio | 0.941 [0.44, 2.015] | 0.88 | |
| | | | | hearing left | −0.13 [−0.41, 0.144] | 0.35 | |
| | | | | hearing right | 0.076 [−0.2, 0.35] | 0.59 | |
| dbSNP: rs200910410 | chr20: 57428858 T>C | GenBank: NM_080425:c.C538T:p.Q180X | 0.03063 | education years | −0.053 [−0.18, 0.072] | 0.41 | DD (Albright hereditary osteodystrophy) |
| | | | | fluid intelligence | 0.062 [−0.12, 0.239] | 0.49 | |
| | | | | BMI | 0.048 [−0.08, 0.174] | 0.46 | |
| | | | | height | −0.105 [−0.23, 0.021] | 0.10 | |
| Affx-89024826 | chr6: 26156672 T>TC | GenBank: NM_005321:c.55delC:p.P19fs | 0.02438 | education years | 0.098 [−0.05, 0.24] | 0.18 | DD (childhood overgrowth) |
| | | | | fluid intelligence | 0.1 [−0.1, 0.302] | 0.33 | |
| | | | | BMI | −0.014 [−0.16, 0.129] | 0.85 | |
| | | | | height | 0.059 [−0.08, 0.202] | 0.42 | |
| dbSNP: rs121918161 | chr17: 29324307 T>C | GenBank: NM_032322:c.C727T:p.Q243X | 0.00215 | education years | 0.341 [−0.11, 0.79] | 0.14 | DD (macrocephaly, macrosomia, facial dysmorphism syndrome) |
| | | | | fluid intelligence | 0.16 [−0.41, 0.726] | 0.58 | |
| | | | | BMI | 0.177 [−0.27, 0.626] | 0.44 | |
| | | | | height | 0.082 [−0.37, 0.532] | 0.72 | |
| Affx-80285705 | chr17: 29325809 G>GC | | 0.05265 | education years | 0.02 [−0.08, 0.115] | 0.68 | |
| | | | | fluid intelligence | −0.097 [−0.23, 0.032] | 0.14 | |
| | | | | BMI | 0.017 [−0.08, 0.113] | 0.72 | |
| | | | | height | 0.03 [−0.07, 0.125] | 0.54 | |
| dbSNP: rs202123354 | chr18: 3452067 A>G | GenBank: NM_170695:c.G90A:p.W30X | 0.01241 | education years | 0.023 [−0.23, 0.278] | 0.86 | DD (holoprosencephaly) |
| | | | | fluid intelligence | 0.169 [−0.2, 0.539] | 0.37 | |
| | | | | BMI | 0.134 [−0.12, 0.389] | 0.30 | |
| | | | | height | 0.053 [−0.2, 0.308] | 0.68 | |
Classification of likely pathogenic variants in maturity-onset diabetes of the young (MODY) and developmental disorders (DD) from UKB. Abbreviations are as follows: UKB = UK Biobank, RSID = Reference SNP cluster ID, HGVS = Human Genome Variation Society, MAF = minor allele frequency, CI = confidence interval, BMI = body mass index, DD = developmental disorder.