Jonathan D Mosley¹, Sara L Van Driest², Emma K Larkin¹, Peter E Weeke¹, John S Witte³, Quinn S Wells¹, Jason H Karnes¹, Yan Guo⁴, Lisa Bastarache⁵, Lana M Olson⁶, Catherine A McCarty⁷, Jennifer A Pacheco⁸, Gail P Jarvik⁹, David S Carrell¹⁰, Eric B Larson¹⁰, David R Crosslin⁹, Iftikhar J Kullo¹¹, Gerard Tromp¹², Helena Kuivaniemi¹², David J Carey¹², Marylyn D Ritchie¹³, Josh C Denny¹⁴, Dan M Roden¹.
Abstract
A single mutation can alter cellular and global homeostatic mechanisms and give rise to multiple clinical diseases. We hypothesized that these disease mechanisms could be identified using low minor allele frequency (MAF < 0.1) non-synonymous SNPs (nsSNPs) associated with "mechanistic phenotypes", which are collections of related diagnoses. We studied two mechanistic phenotypes: (1) thrombosis, evaluated in a population of 1,655 African Americans; and (2) four groupings of cancer diagnoses, evaluated in 3,009 white European Americans. We tested associations between nsSNPs represented on GWAS platforms and mechanistic phenotypes ascertained from electronic medical records (EMRs), and sought enrichment in functional ontologies across the top-ranked associations. We used a two-step analytic approach. In the first step, nsSNPs were sorted by the strength of their association with a phenotype, with associations tested using two reverse genetics models and the standard additive and recessive models. In the second step, we applied a hypothesis-free ontological enrichment analysis to the sorted nsSNPs to identify functional mechanisms underlying the diagnoses that make up the mechanistic phenotypes. The thrombosis phenotype was associated only with ontologies related to blood coagulation (Fisher's p = 0.0001, FDR p = 0.03), driven by the F5, P2RY12 and F2RL2 genes. For the cancer phenotypes, the reverse genetics models were enriched in DNA repair functions (p = 2×10⁻⁵, FDR p = 0.03) (POLG/FANCI, SLX4/FANCP, XRCC1, BRCA1, FANCA, CHD1L). The additive model showed enrichment related to chromatid segregation (p = 4×10⁻⁶, FDR p = 0.005) (KIF25, PINX1). We replicated nsSNP associations for POLG/FANCI, BRCA1, FANCA and CHD1L in independent data sets. Mechanism-oriented phenotyping using collections of EMR-derived diagnoses can elucidate fundamental disease mechanisms.
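The first step of the two-step approach sorts results by strength of association. The sketch below shows that step; it is not the authors' code. It assumes each nsSNP already has an association p-value from one of the models. As a hypothetical choice, each gene is represented by its smallest nsSNP p-value before ranking. All identifiers and p-values are made up.

```python
from typing import Dict, List, Tuple

# (snp_id, gene, association p-value); all values used below are hypothetical.
SnpResult = Tuple[str, str, float]


def rank_genes(snp_results: List[SnpResult]) -> List[Tuple[str, float]]:
    """Collapse nsSNP p-values to one value per gene, then sort by strength of association.

    Keeping the minimum p-value per gene is an illustrative assumption.
    """
    best: Dict[str, float] = {}
    for _snp_id, gene, p in snp_results:
        if gene not in best or p < best[gene]:
            best[gene] = p
    # Smallest p-value (strongest association) first; the gene name breaks ties.
    return sorted(best.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    results = [
        ("snp1", "GENE_A", 0.004),
        ("snp2", "GENE_A", 0.020),
        ("snp3", "GENE_B", 0.009),
        ("snp4", "GENE_C", 0.450),
    ]
    for gene, p in rank_genes(results):
        print(gene, p)
```

The ranked list is the input to the ontology enrichment step.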
Year: 2013 PMID: 24349080 PMCID: PMC3861317 DOI: 10.1371/journal.pone.0081503
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Overview of the nsSNP association approaches.
Panel (a) describes key features of the SNP association approaches used. Panel (b) shows, for a single hypothetical SNP, how the assignment of affection status for minor allele homozygotes (HZMAs) varies by approach. The table lists the cancer codes present among the HZMAs, the number of HZMAs with each cancer code, and the Fisher's p-value comparing the proportion of HZMAs with that cancer to the proportion among common allele homozygotes. For this example, all of the listed cancers are assumed to be constituents of the mechanistic phenotype. For the standard genetic models, all subjects with any of the cancers are classified as cases. In contrast, the two reverse genetics approaches analyze only the subsets of these subjects whose cancers meet pre-specified criteria, as designated by the brackets.
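The following is one way to express the affection-status rules in panel (b), under stated assumptions. RG1 keeps constituent diagnoses with more than two affected HZMAs. RG2 also requires Fisher's exact p < 0.1. Both thresholds come from the Figure 2 caption. The test is assumed to be two-sided, and the function names, diagnosis codes and counts are all hypothetical.

```python
from math import comb
from typing import Dict, List, Optional


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum every table that is no more probable than the observed one.
    total = sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def select_constituent_codes(
    hzma_counts: Dict[str, int],
    n_hzma: int,
    common_counts: Dict[str, int],
    n_common: int,
    min_affected: int = 3,
    p_max: Optional[float] = None,
) -> List[str]:
    """Return the diagnosis codes a reverse genetics model would analyze for one SNP.

    min_affected=3 encodes ">2 affected HZMAs" (RG1); passing p_max=0.1 adds the RG2 filter.
    """
    selected = []
    for code, k in hzma_counts.items():
        if k < min_affected:
            continue
        if p_max is not None:
            m = common_counts.get(code, 0)
            p = fisher_exact_two_sided(k, n_hzma - k, m, n_common - m)
            if p >= p_max:
                continue
        selected.append(code)
    return sorted(selected)


if __name__ == "__main__":
    # Hypothetical SNP: 13 HZMAs and 1,000 common allele homozygotes.
    hzma = {"C1": 4, "C2": 1, "C3": 3}
    common = {"C1": 50, "C2": 30, "C3": 120}
    print("RG1:", select_constituent_codes(hzma, 13, common, 1000))
    print("RG2:", select_constituent_codes(hzma, 13, common, 1000, p_max=0.1))
```

A standard model would instead count a subject as a case for any of C1 to C3. Under the reverse genetics models, only subjects with a selected code are treated as cases.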
Population characteristics.
| Characteristic | Thrombosis study | Cancer study |
|---|---|---|
| Total subjects (n) | 1655 | 3009 |
| No. males (%) | 596 (36.0) | 1699 (56.5) |
| No. females (%) | 1059 (64.0) | 1310 (43.5) |
| Mean (SD) age | 52.3 (17.7) | 51.4 (18.7) |
| Thrombosis diagnoses: n (%) | | |
| All thrombosis phenotypes | 454 (27.4) | |
| Long-term anticoagulation | 179 (10.8) | |
| Stroke | 165 (10.0) | |
| Acute myocardial infarction | 149 (9.0) | |
| Venous thrombosis | 116 (7.0) | |
| Thrombotic pulmonary disease | 47 (2.8) | |
| Other disorders (1) | 44 (2.7) | |
| Arterial thrombosis | 36 (2.2) | |
| Spontaneous abortion | 31 (1.9) | |
| Cancer diagnoses: n (%) | | |
| All cancers | | 1276 (42.4) |
| Non-hematological, primary (CA) | | 1076 (35.8) |
| Secondary/metastases (MET) | | 362 (12.0) |
| Hematological (HEM) | | 371 (12.3) |
| Skin (SKN) | | 109 (3.6) |
(1) Includes: Defibrination syndrome (n = 16), Primary hypercoagulable state (n = 13), Budd-Chiari syndrome (n = 3), Thrombophlebitis migrans (n = 1), other congenital deficiencies (n = 2), congenital factor IX disorder (n = 1), other coagulation defects (n = 6), congenital factor VIII disorder (n = 3).
Figure 2. ROC analyses for simulation studies.
Analyses are based on 10,000 random samples of 13 phenotyped subjects drawn from the thrombosis data set. ROC curves show the sensitivity and specificity of association p-values when one to five subjects were assigned a constituent disease, compared with association p-values when no additional cases were assigned. Panels (a) and (b) show ROC curves based on association p-values for the recessive and reverse genetics (>2 affected subjects per constituent phenotype) models, respectively, when five subjects were assigned a random constituent disease. Each line corresponds to the number of additional subjects assigned a disease. Panels (c) and (d) show the same models, respectively, for subjects assigned a disease already present among the 13 subjects in the random sample. Panel (e) summarizes AUC values from the ROC curves for the recessive model, the reverse genetics model with >2 affected subjects (RG1), and the reverse genetics model with >2 affected subjects and p < 0.1 (RG2), under the four simulation conditions tested. The number on the x-axis is the number of additional subjects assigned an affection status in each simulation scenario.
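The AUC values in panel (e) can be computed directly from simulated p-values using the Mann–Whitney (rank-sum) identity. P-values from samples with added cases are the positives, and p-values from samples with no added cases are the negatives. This sketch assumes that a smaller p-value counts as stronger evidence of an added case. The function name and inputs are hypothetical and do not come from the authors' simulation code.

```python
from typing import Sequence


def auc_from_pvalues(p_with_cases: Sequence[float], p_null: Sequence[float]) -> float:
    """AUC = P(p_case < p_null) + 0.5 * P(tie), computed from midranks in O(n log n)."""
    # Score = -p so that a smaller p-value ranks as a stronger positive call.
    scored = [(-p, 1) for p in p_with_cases] + [(-p, 0) for p in p_null]
    scored.sort(key=lambda item: item[0])
    rank_sum_pos = 0.0
    i, n = 0, len(scored)
    while i < n:
        j = i
        # Extend j over a block of tied scores.
        while j + 1 < n and scored[j + 1][0] == scored[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # 1-based average rank shared by the tied block
        rank_sum_pos += midrank * sum(label for _, label in scored[i : j + 1])
        i = j + 1
    n_pos, n_neg = len(p_with_cases), len(p_null)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    # Hypothetical p-values: with added cases vs. with no added cases.
    print(auc_from_pvalues([0.01, 0.05, 0.05, 0.3], [0.05, 0.2, 0.3, 0.9]))
```

The rank formulation avoids comparing every pair explicitly, which matters when there are 10,000 simulated samples per scenario.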
Enriched ontologies for genes associated with the thrombosis phenotype in African Americans (AAs).
| Model/GO code | GO term | P-value threshold (1) | Genes below threshold (2) | Genes with ontology (3) | Fisher's exact p-value | FDR p-value | Genes |
|---|---|---|---|---|---|---|---|
| | | | | | | | |
| GO:0007596 | blood coagulation | 0.012 | 8 | 3 | 0.0001 | 0.03 | |
| GO:0007599 | hemostasis | 0.012 | 8 | 3 | 0.0001 | 0.03 | |
| GO:0042060 | wound healing | 0.012 | 8 | 3 | 0.0001 | 0.03 | |
| GO:0050817 | coagulation | 0.012 | 8 | 3 | 0.0001 | 0.03 | |
| GO:0050878 | regulation of body fluid levels | 0.012 | 8 | 3 | 0.0001 | 0.03 | |
| GO:0009611 | wound response | 0.03 | 14 | 4 | 0.001 | 0.4 | |
| GO:0019932 | second-messenger-mediated signaling | 0.03 | 14 | 3 | 0.002 | 0.4 | |
| GO:0007243 | protein kinase cascade | 0.011 | 7 | 2 | 0.005 | 0.57 | |
| GO:0007242 | intracellular signaling cascade | 0.03 | 14 | 4 | 0.008 | 0.69 | |
| GO:0007166 | cell surface receptor signaling pathway | 0.03 | 14 | 5 | 0.009 | 0.82 | |
| | | | | | | | |
| GO:0007596 | blood coagulation | 0.011 | 5 | 2 | 0.001 | 0.6 | |
| GO:0007599 | hemostasis | 0.011 | 5 | 2 | 0.002 | 0.6 | |
| GO:0042060 | wound healing | 0.011 | 5 | 2 | 0.002 | 0.6 | |
| GO:0050817 | coagulation | 0.011 | 5 | 2 | 0.002 | 0.6 | |
| GO:0050878 | regulation of body fluid levels | 0.011 | 5 | 2 | 0.002 | 0.6 | |
| GO:0003001 | generation of a signal involved in cell-cell signaling | 0.03 | 11 | 2 | 0.006 | 0.78 | |
| | | | | | | | |
| GO:0007596 | blood coagulation | 0.033 | 13 | 3 | 0.0005 | 0.16 | |
| GO:0007599 | hemostasis | 0.033 | 13 | 3 | 0.0005 | 0.16 | |
| GO:0042060 | wound healing | 0.033 | 13 | 3 | 0.0005 | 0.16 | |
| GO:0050817 | coagulation | 0.033 | 13 | 3 | 0.0005 | 0.16 | |
| GO:0050878 | regulation of body fluid levels | 0.033 | 13 | 3 | 0.0005 | 0.16 | |
| GO:0009611 | wound response | 0.033 | 14 | 4 | 0.001 | 0.36 | |
| GO:0007243 | protein kinase cascade | 0.006 | 5 | 2 | 0.003 | 0.6 | |
All ontologies with a Fisher's exact p<0.01 are shown.
(1) The association p-value cut-off that gave the strongest enrichment.
(2) Number of genes with p-values below the p-value cut-off.
(3) Number of genes with the ontology.
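Footnote (1) refers to the cut-off that gave the strongest enrichment, which implies a sweep over association p-value thresholds. The sketch below shows such a sweep under stated assumptions. Enrichment is tested with a one-sided (over-representation) Fisher's exact test, written as the equivalent hypergeometric upper tail. The background is every ranked gene. The gene names, annotations and thresholds are hypothetical. This section states neither the authors' test sidedness nor their background set.

```python
from math import comb
from typing import Dict, Iterable, Optional, Set, Tuple


def hypergeom_upper_tail(k: int, n_selected: int, n_annotated: int, n_total: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(n_total, n_annotated, n_selected)."""
    denom = comb(n_total, n_selected)
    upper = min(n_selected, n_annotated)
    hits = sum(
        comb(n_annotated, x) * comb(n_total - n_annotated, n_selected - x)
        for x in range(k, upper + 1)
    )
    return hits / denom


def best_threshold(
    gene_p: Dict[str, float], annotated: Set[str], thresholds: Iterable[float]
) -> Optional[Tuple[float, float, int, int]]:
    """Return (cut-off, enrichment p, genes below cut-off, annotated genes below cut-off)."""
    n_total = len(gene_p)
    n_annotated = len(annotated.intersection(gene_p))  # annotated genes in the background
    best = None
    for t in sorted(thresholds):
        selected = {g for g, p in gene_p.items() if p <= t}
        if not selected:
            continue
        k = len(selected & annotated)
        p_enrich = hypergeom_upper_tail(k, len(selected), n_annotated, n_total)
        if best is None or p_enrich < best[1]:
            best = (t, p_enrich, len(selected), k)
    return best


if __name__ == "__main__":
    ps = [0.001, 0.002, 0.003, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
    genes = {f"G{i}": p for i, p in enumerate(ps)}
    print(best_threshold(genes, {"G0", "G1", "G2"}, [0.005, 0.5]))
```

Because the cut-off is chosen to minimize the enrichment p-value, that minimum is optimistic. A multiple-testing correction, such as the FDR values reported in the table, is needed when interpreting it.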
Enriched ontologies for genes associated with the cancer phenotypes in whites.
| Model/GO code | GO term | P-value threshold (1) | Genes below threshold (2) | Genes with ontology (3) | Fisher's exact p-value | FDR p-value | Genes |
|---|---|---|---|---|---|---|---|
| | | | | | | | |
| GO:0006281 | DNA repair | 0.0098 | 27 | 6 | 0.00002 | 0.03 | |
| GO:0033554 | stress response | 0.0098 | 27 | 7 | 0.00003 | 0.03 | |
| GO:0006974 | DNA damage response | 0.0098 | 27 | 6 | 0.00003 | 0.03 | |
| | | | | | | | |
| GO:0006281 | DNA repair | 0.0098 | 27 | 6 | 0.00002 | 0.045 | |
| GO:0006974 | DNA damage response | 0.0098 | 27 | 6 | 0.00003 | 0.045 | |
| | | | | | | | |
| GO:0000070 | mitotic sister chromatid segregation | 0.0003 | 2 | 2 | 0.000004 | 0.004 | |
| GO:0007059 | chromosome segregation | 0.0003 | 2 | 2 | 0.00004 | 0.03 | |
| GO:0000279 | M phase | 0.001 | 4 | 3 | 0.00005 | 0.03 | |
| GO:0022403 | cell cycle phase | 0.001 | 4 | 3 | 0.00008 | 0.03 | |
| GO:0000087 | M phase of mitotic cell cycle | 0.0003 | 2 | 2 | 0.00013 | 0.04 | |
| GO:0000280 | nuclear division | 0.0003 | 2 | 2 | 0.00013 | 0.04 | |
| GO:0007067 | mitosis | 0.0003 | 2 | 2 | 0.00013 | 0.04 | |
| GO:0048285 | organelle fission | 0.0003 | 2 | 2 | 0.00016 | 0.04 | |
| GO:0022402 | cell cycle process | 0.0010 | 4 | 3 | 0.00017 | 0.04 | |
|
All ontologies with an FDR p<0.05 are shown.
(1) The association p-value cut-off that gave the strongest enrichment.
(2) Number of genes with p-values below the p-value cut-off.
(3) Number of genes with the ontology.
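This section reports FDR p-values in both enrichment tables but does not state the procedure used. One standard choice is the Benjamini–Hochberg step-up adjustment, sketched below. It may not be the authors' method: with a threshold sweep, a permutation-based FDR would also be plausible. The function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone in rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical enrichment p-values for four ontologies.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```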
Figure 3. Double-stranded DNA repair pathway.
Genes identified in the analyses are shown in red. When DNA is damaged, damage sensors promote the recruitment and assembly of a repair complex, composed of Fanconi Anemia (FA) genes, BRCA1 and other proteins, at the site of damage.
Association results for enriched DNA repair genes in whites.
| Phenotype | Reverse genetics P-value | Reverse genetics P-value | Recessive model OR | Recessive model 95% CI | Recessive model P | Additive model OR | Additive model 95% CI | Additive model P |
|---|---|---|---|---|---|---|---|---|
| | | | | | | | | |
| ALL | 0.13 | 0.091 | 2.6 | (0.8–8.7) | 0.12 | 1.1 | (0.9–1.4) | 0.31 |
| CA | 1 | 1 | 0.8 | (0.3–2.8) | 0.79 | 1.1 | (0.9–1.4) | 0.36 |
| **HEM** | 0.009 | 0.008 | 4.9 | (1.5–15.8) | 0.008 | 1.1 | (0.8–1.6) | 0.38 |
| MET | 1 | 1 | n/a | n/a | 1.00 | 1.1 | (0.8–1.5) | 0.46 |
| SKN | 1 | 1 | 2.3 | (0.3–18.4) | 0.42 | 0.7 | (0.4–1.4) | 0.30 |
| | | | | | | | | |
| ALL | 0.31 | 0.17 | 0.9 | (0.4–2.2) | 0.85 | 1.1 | (0.9–1.3) | 0.50 |
| CA | 0.26 | 0.14 | 1.0 | (0.4–2.4) | 0.97 | 1.0 | (0.8–1.2) | 0.95 |
| HEM | 0.68 | 1 | 0.6 | (0.1–2.7) | 0.54 | 1.2 | (0.9–1.6) | 0.20 |
| **MET** | 0.01 | 0.006 | 3.3 | (1.3–8.2) | 0.01 | 1.2 | (0.9–1.6) | 0.17 |
| SKNSMB | 1 | 1 | 1.3 | (0.2–10.1) | 0.78 | 1.0 | (0.6–1.6) | 0.98 |
| | | | | | | | | |
| ALL | 0.19 | 0.036 | 1.6 | (0.7–3.6) | 0.22 | 1.1 | (0.9–1.3) | 0.19 |
| CA | 0.38 | 0.19 | 1.3 | (0.6–3.0) | 0.46 | 1.2 | (1.0–1.4) | 0.11 |
| HEM | 0.35 | 0.30 | 1.4 | (0.5–4.1) | 0.57 | 0.9 | (0.7–1.2) | 0.57 |
| MET | 0.22 | 0.11 | 1.4 | (0.5–4.2) | 0.54 | 1.0 | (0.8–1.4) | 0.85 |
| **SKN** | 8.00E-04 | 8.00E-04 | 6.7 | (2.5–18.3) | 1.8E-04 | 1.9 | (1.3–2.7) | 0.001 |
| | | | | | | | | |
| ALL | 0.20 | 0.06 | 1.1 | (0.5–2.4) | 0.90 | 1.0 | (0.9–1.3) | 0.65 |
| CA | 0.05 | 0.03 | 1.4 | (0.6–3.2) | 0.42 | 1.1 | (0.9–1.3) | 0.50 |
| HEM | 0.36 | 0.21 | 1.4 | (0.5–4.3) | 0.52 | 1.0 | (0.7–1.3) | 0.75 |
| **MET** | 0.003 | 0.002 | 4.0 | (1.6–9.6) | 0.002 | 1.4 | (1.0–1.8) | 0.02 |
| SKNSMB | 1 | 1 | 1.3 | (0.2–10.0) | 0.78 | 1.1 | (0.7–1.8) | 0.70 |
| | | | | | | | | |
| ALL | 0.048 | 0.03 | 3.3 | (1.0–10.9) | 0.047 | 1.1 | (0.9–1.4) | 0.42 |
| **CA** | 0.01 | 0.005 | 4.4 | (1.3–14.4) | 0.01 | 1.1 | (0.9–1.4) | 0.43 |
| HEM | 1 | 1 | n/a | n/a | 1.00 | 1.0 | (0.7–1.4) | 0.86 |
| MET | 1 | 1 | 1.5 | (0.3–7.1) | 0.60 | 1.1 | (0.8–1.5) | 0.64 |
| SKNSMB | 1 | 1 | n/a | n/a | 1.00 | 0.8 | (0.4–1.5) | 0.41 |
| | | | | | | | | |
| ALL | 0.36 | 0.20 | 2.8 | (1.0–8.3) | 0.06 | 1.0 | (0.8–1.3) | 0.75 |
| CA | 0.27 | 0.16 | 2.1 | (0.8–5.9) | 0.15 | 1.0 | (0.8–1.3) | 0.82 |
| HEM | 0.35 | 0.30 | 1.8 | (0.5–6.5) | 0.38 | 1.0 | (0.7–1.4) | 0.97 |
| MET | 0.33 | 1 | 1.9 | (0.5–6.9) | 0.33 | 1.0 | (0.7–1.4) | 0.87 |
| **SKN** | 0.01 | 0.01 | 7.3 | (2.0–26.5) | 0.002 | 1.2 | (0.7–2.0) | 0.56 |

Bold phenotype labels denote the mechanistic phenotype driving the DNA repair enrichment. ‘n/a’ indicates there were no affected minor allele homozygotes for the phenotype.
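This section does not state how the recessive-model odds ratios and 95% CIs were estimated; logistic regression is mentioned only for the replication table. The sketch below is a generic illustration, not the authors' method. It computes a 2x2 odds ratio with a Woolf (log-scale) confidence interval. It also shows why the interval cannot be formed when a cell is empty, as in the 'n/a' rows. The function name and counts are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Optional, Tuple


def odds_ratio_woolf(
    a: int, b: int, c: int, d: int, level: float = 0.95
) -> Optional[Tuple[float, float, float]]:
    """Odds ratio with a Woolf confidence interval for a 2x2 table.

    a: affected minor allele homozygotes,   b: unaffected minor allele homozygotes,
    c: affected comparison homozygotes,     d: unaffected comparison homozygotes.
    Returns None if any cell is zero, because the log-OR standard error is then undefined.
    """
    if min(a, b, c, d) == 0:
        return None
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


if __name__ == "__main__":
    print(odds_ratio_woolf(10, 20, 5, 40))  # hypothetical counts
    print(odds_ratio_woolf(0, 13, 5, 40))   # no affected HZMAs -> None
```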
Association of DNA repair genes in independent data sets.
| Phenotype | HZ logistic regression OR (1) | HZ logistic regression 95% CI | HZ logistic regression p-value | Additive model OR | Additive model 95% CI | Additive model P |
|---|---|---|---|---|---|---|
| | | | | | | |
| | | | | | | |
| ALL | 1.7 | (0.7–4.0) | 0.30 | 1.2 | (0.9–1.4) | 0.15 |
| CA | 1.5 | (0.6–3.7) | 0.42 | 1.1 | (0.9–1.4) | 0.32 |
| HEM | 2.0 | (0.1–15.2) | 0.84 | 1.2 | (0.7–2.3) | 0.51 |
| MET | 1.1 | (0.02–7.1) | 1.00 | 0.8 | (0.5–1.5) | 0.58 |
| **SKN** | 4.1 | (0.9–13.3) | 0.06 | 1.7 | (1.2–2.5) | 0.006 |
| | | | | | | |
| ALL | 1.4 | (0.92–2.0) | 0.13 | 1.0 | (0.9–1.1) | 0.82 |
| CA | 1.4 | (0.9–2.1) | 0.10 | 1.0 | (0.9–1.1) | 0.88 |
| HEM | 1.4 | (0.6–3.5) | 0.41 | 1.0 | (0.8–1.2) | 0.84 |
| **MET** | 2.4 | (1.3–4.4) | 0.01 | 1.2 | (1.0–1.4) | 0.06 |
| SKN | 1.2 | (0.4–3.8) | 0.79 | 1.1 | (0.9–1.4) | 0.42 |
| | | | | | | |
| | | | | | | |
| ALL | 8.1 | (1.1–360.6) | 0.04 | 1.0 | (0.8–1.2) | 0.96 |
| CA | 4.5 | (0.9–44.4) | 0.08 | 0.9 | (0.8–1.1) | 0.45 |
| **HEM** | 0.7 | (0.1–5.5) | 1.00 | 1.1 | (0.8–1.5) | 0.42 |
| MET | 1.2 | (0.1–6.5) | 1.00 | 1.0 | (0.8–1.3) | 0.85 |
| SKN | 1.3 | (0.2–6.1) | 0.94 | 1.3 | (0.9–2.0) | 0.16 |
| | | | | | | |
| ALL | 2.3 | (0.9–6.4) | 0.09 | 0.9 | (0.8–1.2) | 0.62 |
| CA | 2.0 | (0.8–5.0) | 0.17 | 0.9 | (0.7–1.2) | 0.49 |
| HEM | 0.8 | (0.2–2.9) | 1.00 | 0.7 | (0.5–1.0) | 0.05 |
| **MET** | 4.1 | (1.4–10.8) | 0.01 | 1.2 | (0.9–1.6) | 0.20 |
| SKN | 1.7 | (0.6–4.5) | 0.37 | 1.5 | (1.0–2.2) | 0.06 |
| | | | | | | |
| ALL | 1.3 | (0.6–2.9) | 0.59 | 1.0 | (0.8–1.1) | 0.74 |
| CA | 1.3 | (0.6–2.8) | 0.67 | 1.0 | (0.8–1.2) | 0.94 |
| HEM | 1.7 | (0.5–4.3) | 0.40 | 1.0 | (0.8–1.3) | 0.85 |
| MET | 0.5 | (0.1–1.7) | 0.36 | 0.9 | (0.7–1.1) | 0.43 |
| **SKN** | 1.4 | (0.2–5.8) | 0.92 | 0.8 | (0.6–1.2) | 0.35 |
| | | | | | | |
| ALL | 0.9 | (0.4–2.1) | 0.90 | 0.9 | (0.8–1.0) | 0.15 |
| CA | 0.6 | (0.2–1.5) | 0.35 | 0.9 | (0.8–1.1) | 0.35 |
| HEM | 3.5 | (1.3–8.7) | 0.01 | 0.9 | (0.7–1.2) | 0.42 |
| **MET** | 0.8 | (0.2–2.4) | 0.89 | 1.0 | (0.8–1.3) | 0.83 |
| SKN | n/a | n/a | n/a | 0.8 | (0.5–1.2) | 0.21 |
| | | | | | | |
| ALL | 2.6 | (0.5–16.1) | 0.27 | 1.0 | (0.8–1.2) | 0.88 |
| **CA** | 1.3 | (0.3–5.9) | 0.90 | 0.9 | (0.8–1.1) | 0.44 |
| HEM | 0.9 | (0.02–7.5) | 1.00 | 1.0 | (0.7–1.3) | 0.99 |
| MET | 1.5 | (0.2–8.2) | 0.86 | 0.9 | (0.7–1.2) | 0.59 |
| SKN | n/a | n/a | n/a | 0.6 | (0.3–1.0) | 0.07 |
(1) This model compared minor allele homozygotes to matched common allele homozygotes.
(2) Genotype data for this nsSNP were available for only one eMERGE site (n = 3,092 subjects).