Anika Oellrich, Damian Smedley.
Abstract
Despite great biological and computational efforts to determine the genetic causes underlying human heritable diseases, approximately half (3500) of these diseases are still without an identified genetic cause. Model organism studies allow the targeted modification of the genome and can help with the identification of genetic causes for human diseases. Targeted modifications have led to a vast amount of model organism data. However, these data are scattered across different databases, preventing an integrated view and missing out on contextual information. Only once we are able to combine all the existing resources will we be able to fully understand the causes underlying a disease and how species differ. Here, we present an integrated data resource combining tissue expression with phenotypes in mouse lines and bringing us one step closer to consequence chains from a molecular level to a resulting phenotype. Mutations in genes often manifest in phenotypes in the same tissue in which the gene is expressed. However, in other cases, a systems level approach is required to understand how perturbations to gene-networks connecting multiple tissues lead to a phenotype. Automated evaluation of the predicted tissue-phenotype associations reveals that 72-76% of the phenotypes are associated with disruption of genes expressed in the affected tissue. However, 55-64% of the individual phenotype-tissue associations show spatially separated gene expression and phenotype manifestation. For example, we see a correlation between 'total body fat' abnormalities and genes expressed in the 'brain', which fits recent discoveries linking genes expressed in the hypothalamus to obesity. Finally, we demonstrate that the use of our predicted tissue-phenotype associations can improve the detection of a known disease-gene association when combined with a disease gene candidate prediction tool. For example, JAK2, the known gene associated with Familial Erythrocytosis 1, rises from the seventh best candidate to the top hit when the associated tissues are taken into consideration.
Database URL: http://www.sanger.ac.uk/resources/databases/phenodigm/phenotype/list
Year: 2014 PMID: 24634472 PMCID: PMC3982582 DOI: 10.1093/database/bau017
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Illustration of the overall workflow of the study. After downloading and formatting all required data, the expression profiles are merged into one data set. The merged data set is then used to calculate the associations between tissues and phenotypes, which are then evaluated. After evaluation, the significant associations are loaded into, and provided via, the PhenoDigm web interface.
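The association step in this workflow is attributed, in the evaluation table, partly to a hypergeometric distribution. As a rough illustration only, the sketch below computes a one-sided hypergeometric tail probability for the overlap between genes expressed in a tissue and genes annotated with a phenotype. The function name, parameter names, and all counts are hypothetical; the exact parametrisation and any multiple-testing correction used in the study are not given in this record.

```python
from math import comb


def hypergeom_tail(k, population, successes, draws):
    """Return P(X >= k) for a hypergeometric variable X.

    population: all genes considered (assumed background set)
    successes:  genes expressed in the tissue
    draws:      genes annotated with the phenotype
    k:          genes that are both expressed and annotated
    """
    upper = min(successes, draws)
    if k > upper:
        return 0.0
    # Count the ways to obtain an overlap of i genes, for every i >= k.
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(max(k, 0), upper + 1)
    )
    return favourable / comb(population, draws)


# Toy numbers, not taken from the study: 20 genes in total, 5 expressed in
# the tissue, 4 annotated with the phenotype, 3 of them in both sets.
p_value = hypergeom_tail(k=3, population=20, successes=5, draws=4)
print(f"{p_value:.4f}")
```

A small tail probability suggests that the overlap is larger than expected by chance, which is the sense in which a tissue-phenotype association would be called significant under this kind of test.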
Results of the automated evaluation against tissues contained in entity-quality (EQ) statements of Mammalian Phenotype (MP) concepts
| Type of comparison | Hypergeometric distribution: Phenotype | Hypergeometric distribution: Association | Association rule: Phenotype | Association rule: Association |
|---|---|---|---|---|
| Expected - exact | 184 (76%) | 59 | 20 (71%) | 2 |
| Expected - psp | | 88 | | 6 |
| Expected - ldsp | | 66 | | 13 |
| No tissue matches | 58 (24%) | 373 | 8 (29%) | 26 |
| No EQ | 879 | 2412 | 177 | 225 |
| Total | 1121 | 2998 | 205 | 272 |
Results are grouped by type of comparison, at the level of either phenotypes or individual phenotype-tissue associations. At the association level, expected tissues are further divided by whether the predicted tissue is the same as the tissue used in the EQ statement (exact), the predicted tissue is a subclass or part_of the tissue in the EQ statement (psp), or the tissue in the EQ statement is a subclass or part_of the predicted tissue (ldsp). Note that even where the EQ and predicted tissues do not match, the association may still be correct. No EQ means that no EQ statement referring to a tissue related to any of the 21 tissues applied was available for evaluation.
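The three match categories described in this note can be sketched as a small classification routine. The tissue hierarchy below is a made-up toy example, and following subclass and part_of links transitively is an assumption of this sketch; the record does not state which ontology or traversal rules were used.

```python
# Toy hierarchy: tissue -> direct parents via subclass or part_of links.
# These entries are illustrative assumptions, not the study's ontology.
PARENTS = {
    "hypothalamus": {"brain"},
    "brain": {"nervous system"},
    "cardiac muscle": {"heart"},
}


def ancestors(tissue):
    """Collect every tissue reachable by following parent links upwards."""
    seen = set()
    stack = [tissue]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def match_type(predicted, eq_tissue):
    """Classify a predicted tissue against the tissue in an EQ statement."""
    if predicted == eq_tissue:
        return "exact"
    if eq_tissue in ancestors(predicted):
        return "psp"   # predicted tissue is a subclass or part_of the EQ tissue
    if predicted in ancestors(eq_tissue):
        return "ldsp"  # EQ tissue is a subclass or part_of the predicted tissue
    return "no match"


for predicted, eq_tissue in [
    ("brain", "brain"),
    ("hypothalamus", "brain"),
    ("brain", "hypothalamus"),
    ("heart", "brain"),
]:
    print(predicted, "vs", eq_tissue, "->", match_type(predicted, eq_tissue))
```

Under this scheme a "no match" result only says that the two tissues are unrelated in the hierarchy; as the note points out, such an association may still be biologically correct.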
Figure 2. Extension of the PhenoDigm web interface, showing how the data can be browsed using the newly added pages. From a list of phenotypes, those of interest can be selected, leading to a page that shows P-values for each of the investigated tissues. A hyphen in a P-value field indicates that no significant association between the phenotype and the tissue could be identified with the data set. For significant associations, the genes supporting the association between tissue and phenotype are provided.