Hye In Kim, Bin Ye, Jeffrey Staples, Anthony Marcketta, Chuan Gao, Alan R Shuldiner, Cristopher V Van Hout.
Abstract
Parent-of-origin (PoO) effects refer to the differential phenotypic impacts of genetic variants dependent on their parental inheritance due to imprinting. While PoO effects can influence complex traits, they may be poorly captured by models that do not differentiate the parental origin of the variant. The aim of this study was to conduct a genome-wide screen for PoO effects on a broad range of clinical traits derived from electronic health records (EHR) in the DiscovEHR study enriched with familial relationships. Using pairwise kinship estimates from genetic data and demographic data, we identified 22,051 offspring among 134,049 individuals in the DiscovEHR study. PoO of ~9 million variants was assigned in the offspring by comparing offspring and parental genotypes and haplotypes. We then performed genome-wide PoO association analyses across 154 quantitative and 611 binary traits extracted from EHR. Of the 732 significant PoO associations identified (p < 5 × 10⁻⁸), we attempted to replicate 274 PoO associations in the UK Biobank study with 5,015 offspring and replicated 9 PoO associations (p < 0.05). In summary, our study implements a bioinformatic and statistical approach to examine PoO effects genome-wide in a large population study enriched with familial relationships and systematically characterizes PoO effects on hundreds of clinical traits derived from EHR. Our results suggest that, while the statistical power to detect PoO effects remains modest, accurately modeling PoO effects has the potential to find new associations that may have been missed by the standard additive model, further enhancing the mechanistic understanding of genetic influence on complex traits.
Keywords: GWAS; electronic health record; familial relationship; imprinting; parent-of-origin effect
Year: 2021 PMID: 35047837 PMCID: PMC8756508 DOI: 10.1016/j.xhgg.2021.100039
Source DB: PubMed Journal: HGG Adv ISSN: 2666-2477
Figure 1. Identification of parent-offspring relationships and PoO assignment in DiscovEHR study
(A) Genome-wide identity-by-descent (IBD) was estimated from genetic data between every pair of individuals in the DiscovEHR study. Pairs with genome-wide probability of sharing one allele IBD (IBD1) > 0.8 (in red) were inferred to be in parent-offspring relationships.
(B) In each parent-offspring relationship, offspring, father, and mother were inferred based on age and sex information. The number of offspring with one parent or both parents in the study is indicated in the corresponding area of the Venn diagram.
(C) Parent-of-origin (PoO) of variants was assigned among offspring with at least one parent available. For each heterozygous genotype, PoO of the minor allele was assigned using two methods. When at least one available parental genotype is homozygous, PoO was determined based on Mendelian segregation (left). When the available parental genotype(s) is/are heterozygous, PoO was estimated by comparing the haplotypes around the variant between offspring and each available parent (right). See text for detailed methods.
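The Mendelian-segregation rule in panel (C) can be sketched in a few lines of Python. This is a minimal illustration, not the study's pipeline: the function name `assign_poo_mendelian` and the coding of genotypes as minor-allele counts (0, 1, 2) are assumptions made here, and the haplotype-based step used when the available parents are heterozygous is not implemented.

```python
from typing import Optional


def assign_poo_mendelian(
    offspring: int,
    father: Optional[int] = None,
    mother: Optional[int] = None,
) -> Optional[str]:
    """Assign parent of origin of the minor allele by Mendelian segregation.

    Genotypes are minor-allele counts (0, 1, 2); None means the parent is
    not available. Returns "paternal", "maternal", or None when the rule
    does not resolve the origin (hypothetical coding, for illustration).
    """
    if offspring != 1:
        return None  # only heterozygous offspring genotypes need a PoO call

    from_father = None
    if father in (0, 2):
        # A homozygous-minor father must have transmitted the minor allele;
        # a homozygous-major father cannot have, so it came from the mother.
        from_father = "paternal" if father == 2 else "maternal"

    from_mother = None
    if mother in (0, 2):
        from_mother = "maternal" if mother == 2 else "paternal"

    if from_father and from_mother and from_father != from_mother:
        return None  # Mendelian inconsistency between the two parents

    # None here means every available parent is heterozygous, so a
    # haplotype comparison would be needed instead.
    return from_father or from_mother
```

For example, `assign_poo_mendelian(1, father=2)` yields `"paternal"`, while `assign_poo_mendelian(1, father=1)` yields `None` because a heterozygous father alone does not determine which allele was transmitted.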
PoO specificity among the additive associations identified within imprinted regions
| Imprinted region | Variant | Nearest gene | Variant effect | Trait | MAF | Num.all | Num.offspring | Beta.add | pval.add | Beta.pat | pval.pat | Beta.mat | pval.mat | Beta.diff | pval.diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | 7:50234619:T:C |  | intergenic | % monocytes | 0.055 | 102,797 | 17,544 | −0.09 | 9.6 × 10⁻²⁷ | −0.13 | 1.9 × 10⁻⁵ | −0.04 | 0.15 | 0.10 | 0.028 |
|  | 7:50541672:T:C |  | intronic intronic | % monocytes | 0.034 | 102,797 | 17,544 | 0.07 | 4.9 × 10⁻¹⁰ | 0.15 | 7.9 × 10⁻⁵ | 0.04 | 0.32 | −0.12 | 0.021 |
|  | 7:50359449:C:T |  | intronic | % monocytes | 0.023 | 102,797 | 17,544 | −0.12 | 3.2 × 10⁻¹⁹ | −0.19 | 7.9 × 10⁻⁵ | −0.08 | 0.084 | 0.14 | 0.029 |
|  | 6:160270606:T:G |  | intronic | EGFR | 0.133 | 116,211 | 19,467 | 0.03 | 2.4 × 10⁻⁹ | 0.06 | 2.3 × 10⁻⁵ | 0.01 | 0.46 | −0.05 | 0.021 |
|  | 6:160743692:C:T |  | intronic | cholesterol | 0.031 | 93,077 | 14,619 | 0.07 | 1.1 × 10⁻⁷ | 0.07 | 0.14 | 0.20 | 1.5 × 10⁻⁵ | 0.13 | 0.052 |
|  | 7:130738173:T:C |  | upstream | HDL-C | 0.481 | 93,383 | 14,642 | 0.04 | 1.2 × 10⁻²³ | 0.03 | 0.064 | 0.13 | 2.1 × 10⁻¹⁶ | 0.10 | 3.1 × 10⁻⁶ |
|  |  |  | upstream | TC/HDL-C ratio | 0.481 | 93,381 | 14,629 | −0.04 | 1.4 × 10⁻¹⁷ | −0.003 | 0.84 | −0.11 | 2.2 × 10⁻¹⁰ | −0.10 | 8.1 × 10⁻⁶ |
|  |  |  | upstream | triglyceride | 0.481 | 93,365 | 14,625 | −0.04 | 5.1 × 10⁻¹⁷ | −0.01 | 0.56 | −0.10 | 1.4 × 10⁻⁹ | −0.09 | 1.3 × 10⁻⁴ |
|  | 11:2875083:G:A |  | intergenic | bilirubin | 0.087 | 108,535 | 17,913 | 0.05 | 7.7 × 10⁻¹³ | 0.05 | 0.054 | 0.10 | 3.5 × 10⁻⁵ | 0.05 | 0.17 |
|  | 11:3017489:T:A |  | intronic | bilirubin | 0.200 | 108,535 | 17,913 | 0.03 | 8.2 × 10⁻⁹ | 0.03 | 0.073 | 0.07 | 5.3 × 10⁻⁵ | 0.03 | 0.16 |
|  | 14:100704203:T:C |  | upstream | platelets | 0.329 | 115,308 | 19,552 | −0.04 | 3.2 × 10⁻²⁰ | 0.02 | 0.12 | −0.10 | 2.3 × 10⁻¹¹ | −0.12 | 7.8 × 10⁻¹⁰ |
|  | 20:58872268:G:A |  | intronic | TSH | 0.012 | 101,160 | 17,401 | 0.10 | 1.7 × 10⁻⁷ | −0.04 | 0.54 | 0.40 | 2.4 × 10⁻⁹ | 0.44 | 3.0 × 10⁻⁶ |
Among the 667 associations within the known imprinted regions (p < 3.6 × 10⁻⁷) identified under the additive model across 167 quantitative traits, 12 were PoO specific (p < 7.5 × 10⁻⁵). Variant is denoted as chromosome:position:reference allele:alternate allele on GRCh38 genome build. Num.all, number of all individuals with given traits; Num.offspring, number of offspring with given traits; Beta.add and pval.add, beta coefficient and p values under additive model; Beta.pat and pval.pat, beta coefficient and p values under paternal model; Beta.mat and pval.mat, beta coefficient and p values under maternal model; Beta.diff and pval.diff, beta coefficient (modeled on maternal allele compared to the paternal allele) and p values under differential model.
Indicates significant PoO-specific associations.
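The four models named in the column headers can be illustrated with a small sketch. The coding below is an assumption for illustration, not the study's stated implementation: each offspring carries hypothetical indicators `pat` and `mat` (1 if the minor allele was inherited from the father or mother, else 0), the additive predictor is `pat + mat`, and the differential predictor is `mat - pat`, which follows the caption's description of Beta.diff as the maternal allele compared to the paternal allele. Covariates, relatedness adjustment, and significance testing are omitted, and the toy data are invented.

```python
def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def poo_codings(pat, mat):
    """Build one predictor per model from parental-allele indicators.

    Hypothetical coding: pat/mat are 0/1 flags for a minor allele
    inherited from the father/mother.
    """
    return {
        "additive": [p + m for p, m in zip(pat, mat)],
        "paternal": list(pat),
        "maternal": list(mat),
        "differential": [m - p for p, m in zip(pat, mat)],
    }


# Toy data: a purely maternal effect of 1.0 and no noise.
pat = [0, 1, 0, 1] * 2
mat = [0, 0, 1, 1] * 2
trait = [1.0 * m for m in mat]

betas = {name: ols_slope(x, trait) for name, x in poo_codings(pat, mat).items()}
```

On this balanced toy data the maternal slope is 1.0, the paternal slope is 0.0, and the additive and differential slopes are both 0.5, which shows how an effect carried by one parental allele is diluted when the additive model averages over both.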
PoO-specific associations for quantitative traits identified in the DiscovEHR study (p < 5 × 10⁻⁸) that are replicated in the UK Biobank study (p < 0.05)
| Variant | Nearest gene | Variant effect | Trait | Study | MAF | Num.all | Num.offspring | Beta.add | pval.add | Beta.pat | pval.pat | Beta.mat | pval.mat | Beta.diff | pval.diff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 22:17116671:G:A |  | downstream 3′ UTR | % monocytes | DiscovEHR | 0.004 | 102,797 | 17,544 | −0.42 | 1.6 × 10⁻³⁹ | −0.67∗ | 1.1 × 10⁻⁸∗ | −0.17 | 0.16 | 0.48 | 4.8 × 10⁻³ |
|  |  |  |  | UK Biobank | 0.007 | 448,877 | 4,846 | −0.45 | 2.2 × 10⁻³⁰⁷ | −0.41∗ | 4.7 × 10⁻³∗ | −0.62 | 0.011 | −0.22 | 0.45 |
| 2:233925128:A:G |  | upstream | total bilirubin | DiscovEHR | 0.003 | 108,535 | 17,913 | 0.35 | 4.9 × 10⁻²⁹ | 0.60∗ | 2.8 × 10⁻⁸∗ | 0.11 | 0.30 | −0.45 | 5.9 × 10⁻³ |
|  |  |  |  | UK Biobank | 0.005 | 440,287 | 4,768 | 0.28 | 9.4 × 10⁻¹⁴⁶ | 0.39∗ | 0.012∗ | 0.11 | 0.62 | −0.28 | 0.30 |
| 1:161478451:C:G |  | intergenic | protein | DiscovEHR | 0.197 | 108,144 | 17,857 | −0.04 | 1.1 × 10⁻¹⁷ | −0.10∗ | 2.9 × 10⁻⁸∗ | −0.03 | 0.096 | 0.07 | 4.3 × 10⁻³ |
|  |  |  |  | UK Biobank | 0.172 | 405,404 | 4,354 | −0.04 | 5.0 × 10⁻⁵⁸ | −0.10∗ | 3.5 × 10⁻³∗ | −0.06 | 0.20 | 0.04 | 0.50 |
| 17:46812337:C:T |  | intronic | red blood cells | DiscovEHR | 0.439 | 115,524 | 19,569 | 0.004 | 0.23 | 0.08∗ | 3.2 × 10⁻⁹∗ | 0.02 | 0.25 | −0.06 | 2.2 × 10⁻⁴ |
|  |  |  |  | UK Biobank | 0.444 | 449,656 | 4,855 | 0.02 | 9.6 × 10⁻²³ | 0.07∗ | 3.4 × 10⁻³∗ | 0.04 | 0.075 | −0.03 | 0.43 |
| 1:115200874:T:G |  | intergenic |  | DiscovEHR | 0.476 | 115,524 | 19,569 | 0.01 | 0.019 | 0.02 | 0.13 | 0.09 | 3.0 × 10⁻¹¹ | 0.07 | 1.7 × 10⁻⁵ |
|  |  |  |  | UK Biobank | 0.473 | 449,656 | 4,855 | 0.001 | 0.67 | −0.0002 | 0.99 | 0.05 | 0.019 | 0.05 | 0.094 |
| 6:170235280:G:C |  | intergenic |  | DiscovEHR | 0.482 | 115,524 | 19,569 | 0.001 | 0.76 | 0.02 | 0.16 | 0.08 | 3.1 × 10⁻⁸ | 0.06 | 5.0 × 10⁻⁴ |
|  |  |  |  | UK Biobank | 0.489 | 449,656 | 4,855 | 0.003 | 0.04 | 0.04 | 0.081 | 0.05 | 0.028 | 0.01 | 0.73 |
| 20:55475672:A:G |  | intergenic |  | DiscovEHR | 0.384 | 115,524 | 19,569 | 0.003 | 0.36 | 0.01 | 0.31 | 0.08 | 4.8 × 10⁻⁸ | 0.06 | 7.4 × 10⁻⁴ |
|  |  |  |  | UK Biobank | 0.376 | 449,656 | 4,855 | 0.002 | 0.19 | 0.06 | 0.014 | 0.06 | 9.1 × 10⁻³ | 0.002 | 0.94 |
| 18:66392331:A:T |  | intergenic | HDL cholesterol | DiscovEHR | 0.004 | 93,383 | 14,642 | −0.04 | 0.25 | 0.15 | 0.21 | 0.72 | 1.7 × 10⁻⁸ | 0.49 | 9.1 × 10⁻³ |
|  |  |  |  | UK Biobank | 0.003 | 405,671 | 4,361 | −0.01 | 0.65 | 0.12 | 0.40 | 0.46 | 0.047 | 0.33 | 0.22 |
| 14:100704203:T:C |  | intergenic | platelets | DiscovEHR | 0.329 | 115,308 | 19,552 | −0.04 | 3.2 × 10⁻²⁰ | 0.02 | 0.12 | −0.10 | 2.3 × 10⁻¹¹ | −0.12 | 7.8 × 10⁻¹⁰ |
|  |  |  |  | UK Biobank | 0.335 | 449,652 | 4,855 | −0.04 | 1.9 × 10⁻⁹⁵ | −0.001 | 0.96 | −0.10 | 4.5 × 10⁻⁴ | −0.10 | 9.5 × 10⁻³ |
Nine of the PoO-specific associations (p < 5 × 10⁻⁸) identified from the genome-wide screen for 154 traits in the DiscovEHR study were replicated in the UK Biobank study (p < 0.05). Variant is denoted as chromosome:position:reference allele:alternate allele on GRCh38 genome build. Num.all, number of all individuals with given traits; Num.offspring, number of offspring with given traits; Beta.add and pval.add, beta coefficient and p values under additive model; Beta.pat and pval.pat, beta coefficient and p values under paternal model; Beta.mat and pval.mat, beta coefficient and p values under maternal model; Beta.diff and pval.diff, beta coefficient (modeled on maternal allele compared to the paternal allele) and p values under differential model.
∗Indicates significant PoO associations.
Figure 2. Simulation of power to detect PoO effects under different statistical models
Power to detect PoO effects under additive, parental, and differential models was simulated across ranges of minor allele frequencies (MAFs) and effect sizes assuming diverse patterns of PoO effects that could result from imprinting: (A) uniparental, (B) polar dominance, and (C) bipolar dominance effect. Bar plots at the top are illustrative examples of the various patterns of PoO effects that can result from imprinting. The horizontal axis displays a range of simulated effect sizes. The left vertical axes display the % power to detect the effect across a range of MAFs ordered on the right vertical axis. See text for detailed methods.
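A power comparison of this kind can be sketched with a small Monte Carlo simulation. The version below is a simplified assumption-laden illustration, not the study's simulation: it covers only a uniparental (maternal-only) effect, uses unrelated simulated offspring with unit-variance noise, tests the regression slope with a normal approximation, and uses a nominal alpha of 0.05 rather than a genome-wide threshold. The function names, sample size, and number of replicates are hypothetical choices.

```python
import math
import random
from statistics import NormalDist


def slope_z(x, y):
    """Z statistic (slope / standard error) from a simple linear regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        return 0.0  # monomorphic predictor: nothing to test
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope / math.sqrt(rss / (n - 2) / sxx)


def simulate_power(maf, beta_mat, n=2000, reps=100, alpha=0.05, seed=1):
    """Estimate power of additive vs maternal models for a maternal-only effect."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = {"additive": 0, "maternal": 0}
    for _ in range(reps):
        # Each parent transmits the minor allele with probability maf.
        pat = [int(rng.random() < maf) for _ in range(n)]
        mat = [int(rng.random() < maf) for _ in range(n)]
        # Only the maternally inherited allele affects the trait.
        y = [beta_mat * m + rng.gauss(0.0, 1.0) for m in mat]
        additive = [p + m for p, m in zip(pat, mat)]
        hits["additive"] += abs(slope_z(additive, y)) > z_crit
        hits["maternal"] += abs(slope_z(mat, y)) > z_crit
    return {model: count / reps for model, count in hits.items()}
```

Calling `simulate_power(maf=0.3, beta_mat=0.15)` returns the fraction of replicates in which each model reaches significance. Under a maternal-only effect the maternal model is expected to have the higher power, because the additive predictor also counts paternally inherited alleles that carry no effect.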