Sarah M Alghamdi, Paul N Schofield, Robert Hoehndorf.
Abstract
Computing phenotypic similarity helps identify new disease genes and diagnose rare diseases. Genotype-phenotype data from orthologous genes in model organisms can compensate for lack of human data and increase genome coverage. In the past decade, cross-species phenotype comparisons have proven valuable, and several ontologies have been developed for this purpose. The relative contribution of different model organisms to the computational identification of disease-associated genes has not been fully explored. We used phenotype ontologies to semantically relate phenotypes resulting from loss-of-function mutations in model organisms to disease-associated phenotypes in humans. Semantic machine learning methods were used to measure the contribution of different model organisms to the identification of known human gene-disease associations. We found that mouse genotype-phenotype data provided the most important dataset in the identification of human disease genes by semantic similarity and machine learning over phenotype ontologies. Other model organisms' data did not improve identification over that obtained using the mouse alone, and therefore did not contribute significantly to this task. Our work has implications for the development of integrated phenotype ontologies, as well as for the use of model organism phenotypes in human genetic variant interpretation. This article has an associated First Person interview with the first author of the paper.
Keywords: Disease gene discovery; Machine learning; Model organism; Ontology; Phenotype; Semantic similarity
Year: 2022 PMID: 35758016 PMCID: PMC9366895 DOI: 10.1242/dmm.049441
Source DB: PubMed Journal: Dis Model Mech ISSN: 1754-8403 Impact factor: 5.732
Fig. 1. Illustration of the approaches that we used to calculate phenotypic similarity. Resnik's similarity uses the taxonomy of the ontology. OPA2Vec generates vector representations by using the axioms of the ontologies propagated over the subsumption hierarchy along with the natural language information available in the ontology. DL2Vec and OWL2Vec generate a graph from the ontologies' axioms and then perform random walks to generate vector representations for genes and diseases; one difference between them is that the graphs are directed in OWL2Vec and undirected in DL2Vec.
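The caption of Fig. 1 notes that Resnik's similarity uses the taxonomy of the ontology. As a minimal sketch of that idea (not the authors' implementation), the Python below computes Resnik similarity between two classes as the information content of their most informative common ancestor. The taxonomy, class names, gene names and annotations are hypothetical toy data invented for illustration, and the aggregation from class-level scores to gene-disease scores is deliberately left out.

```python
import math

# Hypothetical toy taxonomy (assumption): class -> list of direct superclasses.
PARENTS = {
    "abnormal heart size": ["abnormal heart morphology"],
    "abnormal heart valve": ["abnormal heart morphology"],
    "abnormal heart morphology": ["abnormal phenotype"],
    "abnormal tail length": ["abnormal phenotype"],
    "abnormal phenotype": [],
}

# Hypothetical annotations (assumption): entity -> directly annotated classes.
ANNOTATIONS = {
    "gene_A": {"abnormal heart size"},
    "gene_B": {"abnormal heart valve"},
    "gene_C": {"abnormal tail length"},
    "gene_D": {"abnormal heart morphology"},
}


def ancestors(cls):
    """Return cls together with all of its superclasses."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def information_content():
    """IC(c) = log(N / n_c), i.e. -log p(c), where n_c counts the entities
    annotated with c or any of its subclasses and N is the number of entities."""
    counts = {cls: 0 for cls in PARENTS}
    for classes in ANNOTATIONS.values():
        closure = set()
        for cls in classes:
            closure |= ancestors(cls)  # propagate annotations up the taxonomy
        for cls in closure:
            counts[cls] += 1
    total = len(ANNOTATIONS)
    return {cls: math.log(total / n) for cls, n in counts.items() if n > 0}


IC = information_content()


def resnik(class_1, class_2):
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(class_1) & ancestors(class_2)
    return max((IC.get(cls, 0.0) for cls in common), default=0.0)


if __name__ == "__main__":
    print(resnik("abnormal heart size", "abnormal heart valve"))
    print(resnik("abnormal heart size", "abnormal tail length"))
```

In this toy data the two heart phenotypes share a specific ancestor and therefore score higher than a heart phenotype paired with a tail phenotype, whose only common ancestor is the root class with zero information content.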
Comparison of the performance of predicting gene–disease associations evaluated for diseases associated with genes that have orthologs with at least one phenotype annotation in mouse, fish, fly and yeast (255 genes)
Predicting gene–disease associations
Comparison of the performance of the Pheno-e and uPheno ontologies to predict gene–disease associations using mouse, fly and yeast on the human evaluations dataset
Predicting gene–disease associations using supervised methods and our proposed naïve classifier
Fig. 2. Human genes with model organism orthologs. The pairwise intersection of model organisms with phenotypes is illustrated in subgraphs. Each of these subgraphs represents 18,508 human genes in total.
Pheno-e summary of direct and indirect inferred subclasses and superclass axioms between different organism phenotype classes
uPheno summary of direct and indirect inferred shared ancestor generic class count between different organism phenotype classes
Fig. 3. Example of inferred hierarchy relating classes from different organism phenotypes. The classes are colour coded according to the source from which they were obtained.