Tudor Groza, Sebastian Köhler, Dawid Moldenhauer, Nicole Vasilevsky, Gareth Baynam, Tomasz Zemojtel, Lynn Marie Schriml, Warren Alden Kibbe, Paul N Schofield, Tim Beck, Drashtti Vasant, Anthony J Brookes, Andreas Zankl, Nicole L Washington, Christopher J Mungall, Suzanna E Lewis, Melissa A Haendel, Helen Parkinson, Peter N Robinson.
Abstract
The Human Phenotype Ontology (HPO) is widely used in the rare disease community for differential diagnostics, phenotype-driven analysis of next-generation sequence-variation data, and translational research, but a comparable resource has not been available for common disease. Here, we have developed a concept-recognition procedure that analyzes the frequencies of HPO disease annotations as identified in over five million PubMed abstracts by employing an iterative procedure to optimize precision and recall of the identified terms. We derived disease models for 3,145 common human diseases comprising a total of 132,006 HPO annotations. The HPO now comprises over 250,000 phenotypic annotations for over 10,000 rare and common diseases and can be used for examining the phenotypic overlap among common diseases that share risk alleles, as well as between Mendelian diseases and common diseases linked by genomic location. The annotations, as well as the HPO itself, are freely available.
Year: 2015 PMID: 26119816 PMCID: PMC4572507 DOI: 10.1016/j.ajhg.2015.05.020
Source DB: PubMed Journal: Am J Hum Genet ISSN: 0002-9297 Impact factor: 11.025
Figure 1. Algorithm 1
Summary of the algorithm used to identify the set of HPO terms annotated to each disease. See Material and Methods for explanations.
Figure 2. Overview of CR and Bioinformatic Analysis
The analysis was performed in several major steps. (1) Bio-LarK was used to analyze the PubMed-MEDLINE 2014 corpus, which resulted in a total of 5,136,645 abstracts annotated with MeSH terms and phenotypic features. (2) For each of 3,145 resulting diseases, the frequency and specificity of HPO terms found in the abstracts were used for inferring phenotypic annotations. (3) These annotations were used for producing disease models for each of the diseases. (4) Medical validation of the annotations was performed on the basis of disease, phenotype, and SNP annotations in GWAS Central for phenotype sharing in common disease. (5) Validation with OMIM, Orphanet, and DO was used for assessing phenotype sharing between rare and common diseases linked to the same locus.
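Step (2) combines how often an HPO term appears in a disease's abstracts with how specific the term is to that disease. The sketch below illustrates one plausible way to do this, using a TF-IDF-style weighting. The weighting, the function name `score_annotations`, and the toy labels (`D1`, `T1`, and so on) are assumptions for illustration only; the authors' actual procedure is described in their Material and Methods and may differ.

```python
import math
from collections import Counter, defaultdict


def score_annotations(abstracts):
    """Score (disease, term) pairs from per-abstract annotations.

    abstracts: iterable of (disease, terms) pairs, one pair per abstract,
    where terms is the collection of HPO terms recognized in that abstract.
    The frequency * specificity product is an assumed TF-IDF-style weighting,
    not necessarily the one used in the paper.
    """
    n_abstracts = Counter()            # abstracts seen per disease
    term_counts = defaultdict(Counter)  # abstracts per disease mentioning a term
    for disease, terms in abstracts:
        n_abstracts[disease] += 1
        for term in set(terms):
            term_counts[disease][term] += 1

    # Number of distinct diseases in which each term occurs at least once.
    diseases_with_term = Counter()
    for counts in term_counts.values():
        diseases_with_term.update(counts.keys())

    n_diseases = len(n_abstracts)
    scores = {}
    for disease, counts in term_counts.items():
        for term, count in counts.items():
            frequency = count / n_abstracts[disease]
            specificity = math.log(n_diseases / diseases_with_term[term])
            scores[(disease, term)] = frequency * specificity
    return scores


# Toy input with placeholder disease and term labels (not real data).
toy_abstracts = [
    ("D1", {"T1", "T2"}),
    ("D1", {"T1"}),
    ("D2", {"T2"}),
    ("D3", {"T3"}),
]
toy_scores = score_annotations(toy_abstracts)
```

In this toy input, `T1` occurs in every `D1` abstract and in no other disease, so it scores higher for `D1` than `T2`, which is shared with `D2`. A cutoff on such a score could then decide which terms enter the disease model.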
Figure 3. Phenotypic Network of Common Disease
A total of 1,678 common diseases could be mapped to at least one of 13 top-level DO categories (Figures S5 and S6). 1,148 of these diseases displayed a connection to another disease with a phenotypic similarity score of at least 2.0. They are shown as a node in the graph and are colored according to membership in the upper-level disease categories. The thickness of the connections between the nodes reflects the degree of phenotypic similarity.
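The thresholding described in this caption can be sketched as a simple edge filter. The code assumes that pairwise similarity scores have already been computed by some other means; the function name `build_network` and the toy labels are hypothetical, and only the cutoff of 2.0 comes from the caption.

```python
from collections import defaultdict


def build_network(similarities, threshold=2.0):
    """Build an undirected weighted graph from pairwise similarity scores.

    similarities: dict mapping (disease_a, disease_b) to a similarity score.
    Only pairs scoring at least `threshold` become edges, so a disease with
    no qualifying partner does not appear in the result at all.
    """
    graph = defaultdict(dict)
    for (a, b), score in similarities.items():
        if a != b and score >= threshold:
            graph[a][b] = score  # the weight could drive edge thickness
            graph[b][a] = score
    return dict(graph)


# Toy scores with placeholder disease labels (not real data).
toy_similarities = {
    ("A", "B"): 2.5,
    ("A", "C"): 1.0,
    ("B", "C"): 2.0,
    ("D", "E"): 0.5,
}
toy_network = build_network(toy_similarities)
```

Here `D` and `E` drop out because their only connection falls below the cutoff, which mirrors how 1,148 of the 1,678 mapped diseases remain in the figure.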
Figure 4. Phenotype-SNP Network
For constructing this network, individual HPO terms were connected to SNPs if the SNP was significantly associated with a disease characterized by the HPO term in question. For instance, the SNP rs5029939 is significantly associated with both Sjögren syndrome and systemic lupus erythematosus. The diseases also share a number of phenotypic features, including “antinuclear antibody positivity” (HP: 0003493) and “xerostomia” (HP: 0000217). A small and particularly dense subset of the network was manually chosen. The network is centered on ten HPO terms representing clinical features that are common in autoimmune diseases.
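The linking rule in this caption (an HPO term is connected to a SNP when the SNP is significantly associated with a disease characterized by that term) can be sketched as a join over two mappings. The function name `phenotype_snp_edges` and the dictionary layout are assumptions for illustration. The example reuses only the SNP, the two diseases, and the two HPO terms named in the caption, and it assumes the significance filtering has already been applied to the SNP-disease associations.

```python
def phenotype_snp_edges(snp_to_diseases, disease_to_terms):
    """Return the set of (hpo_term, snp) edges of a phenotype-SNP network.

    snp_to_diseases: dict mapping a SNP accession to the diseases it is
        significantly associated with (filtering assumed done upstream).
    disease_to_terms: dict mapping a disease to its annotated HPO terms.
    """
    edges = set()
    for snp, diseases in snp_to_diseases.items():
        for disease in diseases:
            for term in disease_to_terms.get(disease, ()):
                edges.add((term, snp))
    return edges


# Example restricted to the entities mentioned in the caption.
snp_to_diseases = {
    "rs5029939": ["Sjögren syndrome", "systemic lupus erythematosus"],
}
disease_to_terms = {
    "Sjögren syndrome": ["HP:0003493", "HP:0000217"],
    "systemic lupus erythematosus": ["HP:0003493", "HP:0000217"],
}
edges = phenotype_snp_edges(snp_to_diseases, disease_to_terms)
```

Because the edges are stored in a set, a term shared by both diseases is linked to the SNP only once.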
Phenotypic Overlap between Rare and Complex Disorders
| SNP: common disease | Phenotypic feature (HPO term) | |
| --- | --- | --- |
| rs840016: rheumatoid arthritis | edema (HP: 0000969), | |
| rs2268361: polycystic ovary syndrome | abnormality of the ovary (HP: 0000137), | |
| rs13081389: type 2 diabetes mellitus | hyperglycemia (HP: 0003074), | |
| rs295: metabolic syndrome X | hypercholesterolemia (HP: 0003124), | |
| rs34778348: Parkinson disease | rigidity (HP: 0002063), | |
| rs7164883: atrial fibrillation | arrhythmia (HP: 0011675), | |
| rs12149070: COPD | respiratory tract infection (HP: 0011947), |
GWAS hits localized in the vicinity of Mendelian-disease-associated genes could be associated with common diseases that have phenotypic overlaps with the corresponding Mendelian diseases. Seven examples in which common and rare diseases were linked to neighboring loci and showed substantial phenotypic overlap were manually chosen. The protein-coding gene associated with the rare disease, as well as the accession number of the polymorphism located in non-coding sequence near the gene, is shown. The following abbreviation is used: COPD, chronic obstructive pulmonary disease.
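The two checks implied by this legend, genomic proximity and phenotypic overlap, can be sketched as follows. Everything here is an illustrative assumption: the `Gene` type, the function names, the 100 kb default window, and the coordinates are hypothetical and are not taken from the paper.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    chrom: str
    start: int  # gene start coordinate
    end: int    # gene end coordinate


def is_near(snp_chrom, snp_pos, gene, window=100_000):
    """True if the SNP lies within `window` bases of the gene.

    The default window size is an arbitrary illustrative choice.
    """
    if snp_chrom != gene.chrom:
        return False
    return gene.start - window <= snp_pos <= gene.end + window


def shared_terms(common_terms, rare_terms):
    """HPO terms annotated to both the common and the rare disease."""
    return set(common_terms) & set(rare_terms)


# Hypothetical coordinates and placeholder term labels (not real data).
toy_gene = Gene("1", 1_000_000, 1_050_000)
toy_near = is_near("1", 950_000, toy_gene)
toy_overlap = shared_terms({"T1", "T2"}, {"T2", "T3"})
```

A pair of diseases would be a candidate for a table like this one when `is_near` holds for the GWAS hit and the Mendelian gene and `shared_terms` returns a non-empty set.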