Daniel Greene, Sylvia Richardson, Ernest Turro.
Abstract
Rare genetic disorders, which can now be studied systematically with affordable genome sequencing, are often caused by high-penetrance rare variants. Such disorders are often heterogeneous and characterized by abnormalities spanning multiple organ systems ascertained with variable clinical precision. Existing methods for identifying genes with variants responsible for rare diseases summarize phenotypes with unstructured binary or quantitative variables. The Human Phenotype Ontology (HPO) allows composite phenotypes to be represented systematically but association methods accounting for the ontological relationship between HPO terms do not exist. We present a Bayesian method to model the association between an HPO-coded patient phenotype and genotype. Our method estimates the probability of an association together with an HPO-coded phenotype characteristic of the disease. We thus formalize a clinical approach to phenotyping that is lacking in standard regression techniques for rare disease research. We demonstrate the power of our method by uncovering a number of true associations in a large collection of genome-sequenced and HPO-coded cases with rare diseases.
Year: 2016 PMID: 26924528 PMCID: PMC4827100 DOI: 10.1016/j.ajhg.2016.01.008
Source DB: PubMed Journal: Am J Hum Genet ISSN: 0002-9297 Impact factor: 11.025
Figure 1. Example HPO Coding of a Subject with Wiskott-Aldrich Syndrome
The nodes in blue imply the presence of the more general ancestral phenotypes depicted as gray nodes. No blue node has a directed path to any other, which means that the blue nodes comprise a minimal set of HPO terms. The graph has been simplified by removing nodes that link together only two other nodes.
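The minimal-set property described in this caption can be illustrated with a short sketch. It assumes the ontology is available as a mapping from each term to its direct parents; the function names and the toy hierarchy below are hypothetical and are not taken from the paper or from the real HPO files.

```python
from typing import Dict, Iterable, Set


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return all strict ancestors of `term` in the ontology DAG."""
    seen: Set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def minimal_set(terms: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Drop every term that is already implied by a more specific term."""
    term_set = set(terms)
    implied: Set[str] = set()
    for term in term_set:
        implied |= ancestors(term, parents)
    return term_set - implied


# Toy hierarchy with made-up labels (an assumption for illustration only).
toy_parents = {
    "Thrombocytopenia": {"Abnormal platelet count"},
    "Abnormal platelet count": {"Abnormality of blood"},
    "Eczema": {"Abnormality of the skin"},
    "Abnormality of blood": {"Phenotypic abnormality"},
    "Abnormality of the skin": {"Phenotypic abnormality"},
}

coded = {"Thrombocytopenia", "Abnormal platelet count", "Eczema"}
print(minimal_set(coded, toy_parents))
```

In this toy case "Abnormal platelet count" is removed because "Thrombocytopenia" already implies it, so no remaining term has a directed path to another.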
Figure 2. Results of Inference on Simulated Data
Phenotype data were simulated using three levels of expressivity r of the disease terms. The plots within each panel correspond to different frequencies of the rare genotype. In each plot, the red dots mark the estimated posterior mean of γ for 64 datasets simulated under γ = 1, and the gray dots show an equivalent set of estimates for datasets simulated under γ = 0 (i.e., whereby phenotypes for subjects having y = 1 are sampled from the same distribution as for subjects having y = 0). The position of points on the x axis within a plot is arbitrary.
Studies from which Genetic and Phenotypic Data Were Obtained
| Bleeding and Platelet Disorders (BPD) | detailed patient-specific HPO terms | 709 | 74 |
| Primary ImmunoDeficiency (PID) | Abnormality of the immune system (HP:0002715) | 201 | 131 |
| Pulmonary Arterial Hypertension (PAH) | Pulmonary hypertension (HP:0002092) | 422 | 9 |
| Specialist Pathology Evaluating Exomes in Diagnostics (SPEED) | Retinal dystrophy (HP:0000556) | 384 | 241 |
| | Abnormality of the nervous system (HP:0000707) | 215 | 689 |
| | Abnormality of the nervous system and Retinal dystrophy (HP:0000707, HP:0000556) | 7 | |
| | Phenotypic abnormality (HP:0000118) | 107 | |
Note that the SPEED project has a branch dealing with retinal dystrophy and another branch dealing with abnormalities of the nervous system and that 7 individuals are included in both branches. In addition, 107 subjects could not be assigned to a specific sub-project at the time of writing due to lack of information and we assigned them a single abstract HPO term “Phenotypic abnormality” (HP:0000118).
Figure 3. Results for ACTN1
The panels show results obtained by applying the SimReg method to phenotype data for all subjects and genotype data for ACTN1. There were 43 individuals in our dataset coded with the rare genotype for this gene, of whom 22 were coded with “Thrombocytopenia” and “Increased mean platelet volume.” The graph shows the estimated probabilities of inclusion of individual terms in ϕ (only the seven terms with the highest probabilities of inclusion and their ancestors are shown). The acronym “BBFT” refers to “Abnormality of blood and blood-forming tissues.” The heat map shows the estimated probabilities of pairs of terms co-occurring in ϕ, for pairs composed from the ten most frequently included individual terms.
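As a rough illustration of how quantities like those in the graph and heat map could be summarized, the sketch below assumes that posterior samples of ϕ are available as sets of terms and estimates inclusion and co-occurrence probabilities as sample frequencies. The sample representation, function names, and toy data are assumptions for illustration; they are not the SimReg implementation or its output.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple


def inclusion_probabilities(samples: Sequence[FrozenSet[str]]) -> Dict[str, float]:
    """Fraction of sampled phenotypes that contain each term."""
    if not samples:
        raise ValueError("at least one sample is required")
    counts = Counter(term for sample in samples for term in sample)
    return {term: count / len(samples) for term, count in counts.items()}


def cooccurrence_probabilities(
    samples: Sequence[FrozenSet[str]], terms: Sequence[str]
) -> Dict[Tuple[str, str], float]:
    """Fraction of sampled phenotypes that contain both terms of each pair."""
    if not samples:
        raise ValueError("at least one sample is required")
    counts: Counter = Counter()
    for sample in samples:
        present = sorted(t for t in terms if t in sample)
        counts.update(combinations(present, 2))
    return {pair: count / len(samples) for pair, count in counts.items()}


# Made-up samples of phi (hypothetical term labels, not real results).
toy_samples: List[FrozenSet[str]] = [
    frozenset({"term_a", "term_b"}),
    frozenset({"term_a"}),
    frozenset({"term_a", "term_b", "term_c"}),
    frozenset({"term_c"}),
]

marginal = inclusion_probabilities(toy_samples)
# Keep the most frequently included terms before forming pairs.
top_terms = sorted(marginal, key=marginal.get, reverse=True)[:10]
pairwise = cooccurrence_probabilities(toy_samples, top_terms)
print(marginal)
print(pairwise)
```

Restricting the pairs to the most frequently included terms mirrors the caption's choice of the ten most frequently included individual terms and keeps the pairwise table small.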
Figure 4. Results for DIAPH1 and RASGRP2
Estimated posterior probabilities of individual terms being included in the characteristic phenotype ϕ using phenotype data for all subjects and variant data for DIAPH1 encoded under a high-impact dominant model and RASGRP2 encoded under a recessive model. The ten terms with the highest marginal posterior probability are shown. The estimated posterior probability that γ = 1 is equal to 0.872 and 0.750 for DIAPH1 and RASGRP2, respectively.
Known Genes for which the Estimated Posterior Mean of γ Exceeded 0.25 and the Inferred Phenotype Was Compatible with the Known Disorder
| Mode of inheritance | Known disorder | Probability of association | Top three HPO terms (probability of inclusion) |
| dominant | bleeding and platelet disorder | 1.00 | increased mean platelet volume (0.79), thrombocytopenia (0.56), platelet count (0.44) |
| dominant | pulmonary arterial hypertension | 1.00 | pulmonary hypertension (0.34), elevated pulmonary artery pressure (0.31), pulmonary artery (0.11) |
| recessive | retinal dystrophy | 0.99 | retinal dystrophy (0.22), retina (0.22), fundus (0.16) |
| recessive | retinal dystrophy | 0.99 | retina (0.23), retinal dystrophy (0.2), fundus (0.17) |
| recessive | retinal dystrophy | 0.97 | retinal dystrophy (0.21), retina (0.18), fundus (0.18) |
| high-impact dominant | bleeding and platelet disorder | 0.95 | reduced factor XI activity (0.89), intrinsic pathway (0.11), platelet aggregation (0.07) |
| recessive | bleeding and platelet disorder | 0.75 | platelet aggregation (0.67), collagen-induced platelet aggregation (0.2), platelet function (0.1) |
| high-impact dominant | retinal dystrophy | 0.70 | retinal dystrophy (0.2), retina (0.17), fundus (0.14) |
| high-impact dominant | bleeding and platelet disorder | 0.68 | extrinsic pathway (0.5), reduced factor VII activity (0.46), white hair (0.1) |
| high-impact dominant | retinal dystrophy | 0.42 | retina (0.2), retinal dystrophy (0.17), posterior segment of the eye (0.16) |
We display the mode of inheritance under which the association was found, the known disorder, the probability of association, and the top three HPO terms (shown in abbreviated form) in the inferred phenotypes. The marginal posterior probability of inclusion in the characteristic phenotype is shown in brackets next to each term. When an association was found under multiple modes of inheritance, only the true mode is shown. Note that the inferred phenotypes are influenced by prior phenotypic information in the form of OMIM and MGI annotations.
Figure 5. Overall Results
Distributions of the estimated posterior means of γ obtained by applying the SimReg method to each gene under three different modes of inheritance. The tails are truncated at the most extreme values. The dashes indicate values greater than 0.25. The known genes for the BRIDGE project disorders having a posterior mean of γ greater than 0.25 and a compatible inferred phenotype are labeled and colored in red. An asterisk indicates that a posterior mean of γ greater than 0.25 was estimated only with the use of a prior on ϕ that was informed by the literature of human and murine heritable disorders.