Damian Smedley, Anika Oellrich, Sebastian Köhler, Barbara Ruef, Monte Westerfield, Peter Robinson, Suzanna Lewis, Christopher Mungall.
Abstract
The ultimate goal of studying model organisms is to translate what is learned into useful knowledge about normal human biology and disease to facilitate treatment and early screening for diseases. Recent advances in genomic technologies allow for rapid generation of models with a range of targeted genotypes as well as their characterization by high-throughput phenotyping. As an abundance of phenotype data become available, only systematic analysis will allow valid conclusions to be drawn from these data and transferred to human diseases. Owing to the volume of data, automated methods are preferable, allowing for a reliable analysis of the data and providing evidence about possible gene-disease associations. Here, we propose Phenotype comparisons for DIsease Genes and Models (PhenoDigm), as an automated method to provide evidence about gene-disease associations by analysing phenotype information. PhenoDigm integrates data from a variety of model organisms and, at the same time, uses several intermediate scoring methods to identify only strongly data-supported gene candidates for human genetic diseases. We show results of an automated evaluation as well as selected manually assessed examples that support the validity of PhenoDigm. Furthermore, we provide guidance on how to browse the data with PhenoDigm's web interface and illustrate its usefulness in supporting research. Database URL: http://www.sanger.ac.uk/resources/databases/phenodigm
Year: 2013 PMID: 23660285 PMCID: PMC3649640 DOI: 10.1093/database/bat025
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Illustrates the models and numbers of annotations for each of the imported data resources
| Resource | No. of models (a) | Ontology (b) | Unique concepts (c) | Average | Maximum | Minimum |
|---|---|---|---|---|---|---|
| Sanger-MGP | 725 | MP | 351 | 3.6 | 65 | 1 |
| MGD | 27 251 | MP | 7219 | 5.6 | 105 | 1 |
| ZFIN | 1613 | ZP | 6766 | 12.2 | 142 | 1 |
| OMIM | 4757 | HPO | 5967 | 11.2 | 120 | 1 |
(a) Number of models/diseases in this resource; (b) ontology used for annotations; (c) number of uniquely used ontology terms (concepts). Average, maximum and minimum refer to the number of annotations assigned to one entity.
Figure 1. Determining the phenotype similarity of two entities, e.g. a mouse model and a disease, is a three-step process in our method. The first step is the alignment of ontology concepts based on OWLSim and the assignment of scores to individual pairs of ontology concepts, as illustrated in the top panel of this figure. In the second step, the best-scoring match for each of the annotated ontology concepts is identified, and the overall phenotype similarity is expressed as either the maximum or the mean of these best scores. In the third step, we scale these two measures relative to their maximum possible values and calculate a single combined percentage score.
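The second and third steps of the caption can be illustrated with a minimal Python sketch. It is not the PhenoDigm implementation: the pairwise scores are taken as given (the OWLSim alignment of step one is not reproduced), best matches are taken in one direction only, and the two scaled measures are combined by a plain average, which is an assumption because the caption does not state the formula. The names `best_matches`, `combined_percentage`, `max_possible` and `mean_possible` are hypothetical.

```python
from statistics import mean


def best_matches(pair_scores):
    """Step 2a: for each annotated concept (one row), keep its best-scoring match."""
    return [max(row) for row in pair_scores]


def combined_percentage(pair_scores, max_possible, mean_possible):
    """Steps 2b and 3: summarise the best matches, scale them, combine them.

    max_possible and mean_possible are the largest values the two measures
    could reach; here they are supplied by the caller (hypothetical inputs).
    Combining by a simple average is an assumption made for illustration.
    """
    best = best_matches(pair_scores)
    max_score = max(best)    # overall similarity as the maximum
    mean_score = mean(best)  # overall similarity as the mean
    scaled_max = max_score / max_possible
    scaled_mean = mean_score / mean_possible
    return 100.0 * (scaled_max + scaled_mean) / 2.0


# Rows: concepts annotated to one entity; columns: concepts of the other.
# The numbers are made up for the example.
scores = [
    [2.0, 0.5],
    [1.0, 3.0],
]
result = combined_percentage(scores, max_possible=4.0, mean_possible=4.0)
print(result)
```

With these made-up numbers the best matches are 2.0 and 3.0, so the maximum is 3.0 and the mean is 2.5; scaled against 4.0 they become 0.75 and 0.625, and the combined score is 68.75%.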
Figure 2. ROC analysis of PhenoDigm’s phenotype prioritization method applied to MGD’s curated mouse model–disease associations (top) and OMIM MorbidMap known gene–disease associations (bottom). The success of PhenoDigm applied to disease and MGD phenotypes is shown in the top panel for the combined score used in PhenoDigm as well as the maxIC, avgIC, maxSimJ and avgSimJ measures used for the original MouseFinder implementation. The bottom panel shows the recall of known gene–disease associations when comparing OMIM phenotypes with MGD, ZFIN, Sanger-MGP or Europhenome model organism phenotypes.
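As a reminder of how a ROC curve is obtained from prioritization scores, the following Python sketch ranks candidates by score and sweeps the threshold from high to low. It is a generic illustration, not the evaluation code behind the figure: the function names, scores and labels are invented, and tied scores are not treated specially.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points.

    scores: similarity scores, higher means a stronger candidate.
    labels: 1 if the association is known (e.g. curated), otherwise 0.
    Assumes at least one positive and one negative label.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    positives = sum(labels)
    negatives = len(labels) - positives
    true_pos = false_pos = 0
    points = [(0.0, 0.0)]
    for _, is_known in ranked:
        if is_known:
            true_pos += 1
        else:
            false_pos += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points ordered by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Invented example: four candidates, two of them known associations.
example_scores = [90.0, 80.0, 70.0, 60.0]
example_labels = [1, 0, 1, 0]
curve = roc_points(example_scores, example_labels)
auc = area_under_curve(curve)
print(curve, auc)
```

An area of 1.0 would mean every known association is ranked above every other candidate; 0.5 corresponds to a random ordering.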
Figure 3. To efficiently browse the obtained prioritisation results, a web interface was developed. As illustrated here, the interface allows the user to browse by and search for diseases and obtain all prioritised models sorted according to species and genes. Genes can then be expanded to models and even to the level of phenotype descriptions to show on what basis the match occurred.