
Phexpo: a package for bidirectional enrichment analysis of phenotypes and chemicals.

Christopher Hawthorne, David A Simpson, Barry Devereux, Guillermo López-Campos.

Abstract

Phenotypes are the result of the complex interplay between environmental and genetic factors. To better understand the interactions between chemical compounds and human phenotypes, and to further exposome research, we have developed "phexpo," a tool to perform and explore bidirectional chemical and phenotype interactions using enrichment analyses. Phexpo uses gene annotations from 2 curated public repositories, the Comparative Toxicogenomics Database and the Human Phenotype Ontology. We have applied phexpo in 3 case studies linking: (1) individual chemicals (a drug, warfarin, and an industrial chemical, chloroform) with phenotypes, (2) an individual phenotype (left ventricular dysfunction) with chemicals, and (3) multiple phenotypes (covering polycystic ovary syndrome) with chemicals. The results of these analyses demonstrated successful identification of relevant chemicals or phenotypes supported by bibliographic references. The phexpo R package (https://github.com/GHLCLab/phexpo) provides a new bidirectional analysis approach covering relationships from chemicals to phenotypes and from phenotypes to chemicals.
© The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association.


Keywords:  biological ontologies; chemicals and drugs; computational biology/methods; exposome; phenotype

Year:  2020        PMID: 32734156      PMCID: PMC7382647          DOI: 10.1093/jamiaopen/ooaa023

Source DB:  PubMed          Journal:  JAMIA Open        ISSN: 2574-2531


LAY SUMMARY

Chemicals are major contributors to the “exposome,” the whole set of exposures experienced by an individual that shapes their phenotype. Although the effects of exposure to some chemicals can be tested directly, this is often not feasible. We conjectured that it might be possible to associate chemicals with phenotypes through the common genes to which they have been independently linked. In this manuscript we present “phexpo,” a novel tool that analyses the overlap between gene lists linked with chemicals or phenotypes. This approach enabled the detection of both known and novel associations between chemicals and phenotypes. Case studies using a chemical, a phenotype, or a combination of phenotypes as a query element demonstrate the application of phexpo. The relationships identified between these entities were supported by evidence from the literature. Phexpo facilitates the establishment and discovery of novel relationships between chemicals and phenotypes and vice versa, and therefore provides a valuable new tool for the study of the exposome.

BACKGROUND AND SIGNIFICANCE

Phenotypes, as described in the Human Phenotype Ontology (HPO), are the result of the complex interplay between environmental and genetic factors. Recognition of the importance of environmental factors, coupled with an increasing ability to determine individual exposures, led to the concept of the “exposome” as the whole set of exposures of an individual since conception. The exposome soon gained traction in research as a complement to the genome and has become an important element in the development of new precision medicine applications. During the last century, chemistry revolutionized many industries and human activities; as a result, the environment now contains an unprecedented amount and variety of chemicals, such as plasticizers used in bottles, flame retardants in clothes, new drugs, and pesticides used to improve crop yields. This increase in anthropogenic chemicals has fostered advances in toxicology and monitoring, and the resulting plethora of studies and data is making the “chemical component” one of the best understood and studied elements of the exposome. However, despite some advances, our understanding of the biological effects of most chemical compounds is incomplete, and for many there remains significant controversy regarding safety levels and effects on human health. Biomedical informatics and translational bioinformatics provide analytical tools and data repositories to support exposome and toxicological research. Some of these repositories, such as the Toxin and Toxin Target Database, Exposome Explorer, or the Comparative Toxicogenomics Database (CTD), integrate different data sources and combine chemical and biological information such as biomarkers or target genes. This integration has enabled the development of analytical approaches that use these contents and the relationships between chemicals and genes to uncover potential links between chemicals and biological pathways or diseases.
Although these approaches provide valuable insights, Gene Ontology terms and pathways mostly focus on biological outcomes at the cellular or molecular level, whereas analyses of diseases involve sets of phenotypes. There is therefore an unmet need for a method that relates chemical compounds to the different phenotypes described in the HPO. The HPO has formalized the phenotype space by providing descriptions of clinical abnormalities and annotations to both rare and common diseases, and it is increasingly used by different actors for data exchange and identification of disease etiology. To facilitate a bidirectional analysis of the relationships between chemicals and phenotypes using gene annotations, we have developed phexpo (phenotype–exposome), a methodology to perform bidirectional enrichment analysis of chemicals and phenotypes, bundled inside an R package. Phexpo incorporates chemical–gene data from CTD and phenotype–gene data from HPO. We refer to phenotypes as HPO terms. Using a chemical- or phenotype-derived gene list built from genes present in both data sources, phexpo returns enriched phenotypes or chemicals, respectively. This constitutes a novel methodology that combines gene information from CTD and HPO to generate potential associations between chemical exposures and phenotypes.
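The integration described above starts from two annotation tables that have to be turned into gene lists over a shared gene universe. The Python sketch below illustrates one way to do this under stated assumptions: the toy rows, the entity and gene names (chem_A, pheno_X, G1, ...), and the helpers build_gene_sets() and restrict_to_universe() are hypothetical and do not reflect phexpo's R code or the real CTD and HPO file layouts. It also assumes the universe is simply the set of genes annotated in both sources.

```python
from collections import defaultdict


def build_gene_sets(rows, organism=None):
    """Group (entity, gene[, organism]) rows into {entity: set of genes}.

    If `organism` is given, rows whose third field differs are skipped,
    mimicking a human-only filter on the chemical-gene table.
    """
    gene_sets = defaultdict(set)
    for row in rows:
        entity, gene = row[0], row[1]
        if organism is not None and row[2] != organism:
            continue
        gene_sets[entity].add(gene)
    return dict(gene_sets)


def restrict_to_universe(gene_sets, universe):
    """Keep only genes inside the shared universe; drop entities left empty."""
    restricted = {name: genes & universe for name, genes in gene_sets.items()}
    return {name: genes for name, genes in restricted.items() if genes}


# Hypothetical toy annotations (not real CTD or HPO content).
chem_rows = [
    ("chem_A", "G1", "Homo sapiens"),
    ("chem_A", "G2", "Homo sapiens"),
    ("chem_A", "G9", "Rattus norvegicus"),  # non-human row, filtered out
    ("chem_B", "G3", "Homo sapiens"),
    ("chem_B", "G7", "Homo sapiens"),       # gene with no phenotype annotation
]
pheno_rows = [
    ("pheno_X", "G1"),
    ("pheno_X", "G4"),                      # gene with no chemical annotation
    ("pheno_Y", "G3"),
]

chem_sets = build_gene_sets(chem_rows, organism="Homo sapiens")
pheno_sets = build_gene_sets(pheno_rows)

# Assumed universe: genes annotated in both the chemical and phenotype data.
universe = set().union(*chem_sets.values()) & set().union(*pheno_sets.values())
chem_sets = restrict_to_universe(chem_sets, universe)
pheno_sets = restrict_to_universe(pheno_sets, universe)

print(sorted(universe))
```

In this toy case the universe shrinks to G1 and G3, so each chemical and each phenotype is left with a single gene; genes seen in only one source never enter the comparison.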

METHODS

Chemical–gene relationship and chemical vocabulary datasets were downloaded from CTD (February 5, 2019 update), and phenotype–gene datasets from HPO (ontology version: February 12, 2019). As our focus is on human phenotypes, CTD datasets were preprocessed to retain only human gene identifiers and to generate the relevant gene lists linking chemicals and human genes. A more detailed explanation of the preprocessing is included in the Supplementary Material. To identify relationships between chemicals and phenotypes, phexpo takes the gene lists derived from the chemical and phenotype annotations and compares them using Fisher’s exact test against a background universe of genes generated from these annotations. To correct for multiple testing, phexpo reports Bonferroni-corrected P-values and false discovery rate (FDR) values computed with the Benjamini–Hochberg procedure. Phexpo is built around 4 analytical functions (Figure 1) and 1 visualization function:
perfFishTestChemSingle(): This function takes a chemical name provided by the user as input and generates a gene list from the chemical–gene dataset. It uses only the genes that are in the intersection of the chemical–gene and phenotype–gene datasets. It calculates the gene counts against the phenotype–gene dataset and uses them in R's built-in Fisher’s exact test. The function returns a table with all the associated phenotypes.

perfFishTestHPOSingle(): This function takes an HPO term as input and carries out a calculation analogous to perfFishTestChemSingle(), but uses the phenotype–gene dataset to create the gene list. It calculates gene counts against the chemical–gene dataset to run Fisher’s exact test and returns a table with all the associated chemicals.

perfFishTestHPOMultiple(): This function takes a list of different phenotypes. It aggregates all the genes annotated to the given phenotypes into a single gene list for the comparison and then performs the same analysis as perfFishTestHPOSingle() to return a table with all the associated chemicals.

perfFishTestChemMultiple(): This function takes a list of different chemicals. It aggregates all the genes annotated to the given chemicals into a single gene list for the comparison and then performs the same analysis as perfFishTestChemSingle() to return a table with all the associated phenotypes.

visEnrich(): This function provides a Shiny interface for visualizing the results generated by any of the other 4 analytical functions. It generates a graphical user interface that presents the results both in tabular format and as a graphical display, enabling the user to filter the results using different criteria.

The results of the 4 analytical functions are presented in tabular format and include the raw and corrected P-values, the different gene set sizes, and their overlaps.

Figure 1.

Diagrammatic representation of phexpo’s processes. (A) Chemicals and phenotypes can be connected via genes. Phexpo’s analytical functions return a table of associated results. (B) Further breakdown of phexpo functions. If a user inserts a chemical into the perfFishTestChem functions, enriched phenotypes are returned. Conversely, if a user inserts a phenotype into the perfFishTestHPO functions, enriched chemicals are returned.
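The gene counts that each analytical function feeds into Fisher’s exact test can be illustrated with a small Python sketch. It assumes a one-sided (enrichment) test, computed as the hypergeometric upper tail, which is equivalent to Fisher’s exact test with the alternative set to "greater"; the text above does not state which alternative phexpo passes to R's fisher.test(), whose default is two-sided. The function names fisher_greater() and enrich() and the toy gene sets are hypothetical. Because the query and the annotations are plain gene sets, the same routine covers both directions: chemical genes tested against phenotype gene sets (the direction of perfFishTestChemSingle()) or phenotype genes tested against chemical gene sets (the direction of perfFishTestHPOSingle()).

```python
from math import comb


def fisher_greater(overlap, query_size, term_size, universe_size):
    """One-sided Fisher's exact test (enrichment) as a hypergeometric tail.

    Returns P(X >= overlap), where X is the number of term genes obtained
    when `query_size` genes are drawn at random, without replacement, from a
    universe of `universe_size` genes of which `term_size` are annotated
    to the term.
    """
    upper = min(query_size, term_size)
    hits = sum(
        comb(term_size, i) * comb(universe_size - term_size, query_size - i)
        for i in range(overlap, upper + 1)
    )
    return hits / comb(universe_size, query_size)


def enrich(query_genes, annotations, universe):
    """Test one query gene list against every annotated term.

    Returns (term, overlap, query size, term size, p-value) tuples sorted by
    ascending p-value. Swapping which annotation set supplies the query and
    which supplies `annotations` gives the reverse direction.
    """
    query = query_genes & universe
    rows = []
    for term, term_genes in annotations.items():
        term_genes = term_genes & universe
        overlap = len(query & term_genes)
        p = fisher_greater(overlap, len(query), len(term_genes), len(universe))
        rows.append((term, overlap, len(query), len(term_genes), p))
    return sorted(rows, key=lambda row: row[4])


# Hypothetical toy data: a 10-gene universe and two annotated terms.
universe = {f"G{i}" for i in range(10)}
annotations = {
    "term_A": {"G0", "G1", "G2", "G3"},
    "term_B": {"G7", "G8"},
}
query = {"G0", "G1", "G2"}

for row in enrich(query, annotations, universe):
    print(row)
```

Here term_A contains all 3 query genes among its 4 annotated genes in a 10-gene universe, giving P = C(4,3)·C(6,0)/C(10,3) = 4/120 ≈ 0.033, while term_B, with no overlap, gets P = 1. Each returned tuple carries the overlap and gene set sizes alongside the P-value, in the spirit of the result tables described above.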

RESULTS

To facilitate the bidirectional integration of chemicals and phenotypes, we developed a new approach built into an R package, “phexpo,” that exploits gene annotations extracted from 2 curated high-quality resources, CTD (for chemical–gene annotations) and HPO (for phenotype–gene annotations). To demonstrate the capabilities of phexpo, we present 3 case studies using its different functions. To evaluate the output, we manually validated a subset of the results and provide bibliographic evidence supporting the associations (additional details are in the Supplementary Material).

Case study I—single chemical to phenotype enrichment

To validate the analysis of single chemicals in phexpo, we chose 2 diverse but well-studied compounds with predictable results: a drug (warfarin) and an industrial chemical (chloroform) (Figure 2). For this analysis, we used the function perfFishTestChemSingle(). As expected, the enriched phenotypes for warfarin, including “deep vein thrombosis” and “abnormality of prothrombin,” match its anticoagulant function. For chloroform, we identified liver phenotypes consistent with its known hepatotoxicity. Additional analyses can be found in the Supplementary Material.
Figure 2.

Case study I results using the Shiny interface. The bar chart interface enables filtering using different criteria. (A) HPO terms identified for warfarin, filtered by Bonferroni correction. (B) HPO terms identified for chloroform, filtered by FDR. Full results tables are available in the Supplementary Material. Abbreviations: FDR: false discovery rate; HPO: Human Phenotype Ontology.


Case study II—single phenotype to chemical enrichment

For the single phenotype case study, we used the HPO term “left ventricular dysfunction,” which is described as “inability of the left ventricle to perform its normal physiologic function. Failure is either due to an inability to contract the left ventricle or the inability to relax completely and fill with blood during diastole.” For this analysis, we used the function perfFishTestHPOSingle(); the top 10 results are presented in Table 1.
Table 1.

Top 10 chemicals identified for the HPO term “left ventricular dysfunction”

Chemical name | P-value | Bonferroni | FDR
Halofuginone | 1.11E−11 | 1.31E−08 | 1.31E−08
Nitrofen | 6.15E−11 | 7.22E−08 | 3.61E−08
1-Trifluoromethoxyphenyl-3-(1-propionylpiperidine-4-yl) urea | 1.04E−08 | 1.22E−05 | 3.54E−06
Streptozocin | 1.21E−08 | 1.42E−05 | 3.54E−06
Bleomycin | 2.03E−08 | 2.39E−05 | 4.07E−06
Fenofibrate | 2.08E−08 | 2.44E−05 | 4.07E−06
Phenylephrine | 3.33E−08 | 3.91E−05 | 5.59E−06
Palm Oil | 6.59E−08 | 7.74E−05 | 9.48E−06
Doxorubicin | 9.25E−08 | 0.000109 | 9.48E−06
Dietary fats | 9.65E−08 | 0.000113 | 9.48E−06

Arranged by ascending P-value.

Abbreviations: HPO: Human Phenotype Ontology; FDR: false discovery rate.

From these results we highlight the following potential relationships. Halofuginone has been found to have a protective effect against cardiac stress. 1-Trifluoromethoxyphenyl-3-(1-propionylpiperidine-4-yl) urea is a soluble epoxide hydrolase inhibitor, and soluble epoxide hydrolase inhibitors have been suggested as a potential strategy against heart disease. In animal models, streptozocin is used to induce diabetes mellitus, which is linked to diastolic heart dysfunction. The antibiotic bleomycin has known cardiotoxic effects when used in chemotherapy. Short-term fenofibrate treatment has been shown to reduce some of the effects of chronic left ventricular volume overload in rat models. Phenylephrine causes hypertrophy when administered to neonatal rat cardiomyocytes. Tocotrienol-rich fractions extracted from palm oil were associated with beneficial effects on heart function. Doxorubicin is both a drug used in cancer treatment and a cardiotoxic agent involved in causing heart failure.
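Tables 1 and 2 report each raw P-value alongside a Bonferroni-corrected value and a Benjamini–Hochberg FDR value. The Python sketch below shows the standard definitions of these two adjustments, the same ones R provides through p.adjust() with the methods "bonferroni" and "BH". It is not phexpo's own code, the helper names bonferroni() and benjamini_hochberg() are hypothetical, and the toy P-values are made up for illustration.

```python
def bonferroni(pvalues):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up), in the input order.

    For the i-th smallest p-value, q = min over j >= i of p_(j) * m / j,
    capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])
    adjusted = [0.0] * m
    running_min = 1.0  # also enforces the cap at 1
    # Walk from the largest p-value down, keeping a cumulative minimum.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


pvalues = [0.01, 0.04, 0.03, 0.5]  # made-up raw P-values for 4 tests
print(bonferroni(pvalues))
print(benjamini_hochberg(pvalues))
```

In the toy example the P-values 0.03 and 0.04 receive the same adjusted value (0.04 × 4/3 ≈ 0.053) because of the cumulative minimum taken from the largest P-value downward; this step-up behaviour is one way repeated values can arise in an FDR column, as in the last three rows of Table 1. Since each BH value is at most p × m divided by its rank, it never exceeds the corresponding Bonferroni value; in Table 1 the two coincide for the top-ranked chemical.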

Case study III—multiple phenotypes to chemical enrichment

An important feature of our methodology is that it allows multiple phenotypes to be combined in a single enrichment analysis. This is important for diseases, complex conditions, or syndromes that comprise a variety of phenotypes, which can be stacked together in our analysis. In this case study, we use the example of polycystic ovary syndrome (PCOS). Although the wide range of phenotypes displayed by women with this endocrine disorder has hindered elucidation of its causes, exposomic and inherited genetic factors are likely to play a part. We compiled a list of phenotypes that characterize PCOS, including “oligomenorrhea,” “enlarged polycystic ovaries,” “amenorrhea,” “hirsutism,” “increased body weight,” and “acne,” and performed an enrichment analysis to test whether it would return chemicals with a known relationship to PCOS. The top 10 results are shown in Table 2.
Table 2.

Top 10 chemical results identified for PCOS phenotypes

Chemical name | P-value | Bonferroni | FDR
Tetrachlorodibenzodioxin | 2.38E−39 | 1.22E−35 | 1.22E−35
Bisphenol A | 2.70E−32 | 1.39E−28 | 6.94E−29
Ammonium chloride | 1.21E−25 | 6.23E−22 | 2.08E−22
Valproic acid | 1.50E−19 | 7.71E−16 | 1.79E−16
Ethylnitrosourea | 1.74E−19 | 8.93E−16 | 1.79E−16
Colforsin | 4.17E−19 | 2.15E−15 | 3.58E−16
Vehicle emissions | 6.76E−19 | 3.48E−15 | 4.96E−16
Diethylhexyl phthalate | 2.55E−18 | 1.31E−14 | 1.64E−15
Ethinyl estradiol | 3.52E−18 | 1.81E−14 | 2.01E−15
Dexamethasone | 3.20E−17 | 1.65E−13 | 1.65E−14

Arranged by ascending P-value.

Abbreviations: FDR: false discovery rate; PCOS: polycystic ovary syndrome.

Of the multiple highly enriched chemicals returned, many have documented links with PCOS. Tetrachlorodibenzodioxin is an endocrine-disrupting chemical, and although its influence on PCOS has not been specifically assessed, it has been highlighted as a suspect for consideration. Individuals with PCOS were found to have increased levels of bisphenol A. Valproic acid treatment has a known association with heightened PCOS occurrence in epileptic patients. Colforsin (forskolin) can act in a similar way to luteinizing hormone on PCOS theca cells. Diethylhexyl phthalate has been shown to have ovarian effects in rats. Ethinyl estradiol is used to treat the acne and hirsutism phenotypes of PCOS. Dexamethasone has been used to increase testosterone production in theca cells to mimic PCOS patients with hyperandrogenism.

DISCUSSION

In this work, we have presented a new approach to establish bidirectional relationships between phenotypes and chemical exposures. This methodology has been implemented in phexpo, a multiplatform R package that is freely available (https://github.com/GHLCLab/phexpo). In contrast to other existing applications, our approach allows searches with multiple chemicals or phenotypes simultaneously. This enables users to search with combinations of individual phenotypes rather than with diseases, which might be too broad or may share phenotypes (or symptoms) that could lead to overlaps. We have tested phexpo and its different functions using a variety of chemicals and phenotypes and have been able to validate the results using bibliographic references. The methodology can therefore be used to discover potential novel relationships that open new avenues of research and direct further experimental validation and exploration. Although we demonstrate the ability of the approach to generate interesting and validated results, we acknowledge that there are limitations. The results are confined to the annotations present, and although we selected high-quality, well-known resources, we currently use only 1 chemical database (CTD) and 1 phenotype ontology (HPO). The associations established between chemicals and phenotypes lack “directionality”: a chemical may either induce or protect from a certain phenotype, and indeed both cases were found in the case study lists. In addition, although aggregating phenotypes or chemicals is a powerful tool, the additive approach takes the union of all the annotated genes and does not account for whether, across 2 different phenotypes or chemicals, those genes might be affected in different directions (eg, induced by 1 chemical and repressed by another). Finally, the dosage and timing of exposures, which might be relevant for the development of some phenotypes, are not considered.
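The additive aggregation performed by perfFishTestHPOMultiple() and perfFishTestChemMultiple(), and the direction limitation it carries, can be illustrated with a short Python sketch. The signed annotations (+1 for induced, -1 for repressed), the entity and gene names, and the helpers aggregate_union() and conflicting_genes() are all hypothetical. The union step keeps only gene identities, so any direction information is discarded; the second helper merely shows what a check could flag if direction labels were available.

```python
def aggregate_union(gene_sets, names):
    """Pool the genes annotated to several entities into one query list.

    Only gene identities are kept; anything else attached to a gene
    (such as a direction label) is discarded.
    """
    pooled = set()
    for name in names:
        pooled |= set(gene_sets[name])  # works for sets and for dict keys
    return pooled


def conflicting_genes(signed_sets, names):
    """Genes given opposite directions (+1 vs -1) by different entities."""
    first_sign = {}
    conflicts = set()
    for name in names:
        for gene, sign in signed_sets[name].items():
            if gene in first_sign and first_sign[gene] != sign:
                conflicts.add(gene)
            first_sign.setdefault(gene, sign)
    return conflicts


# Hypothetical signed annotations: +1 = induced, -1 = repressed.
signed = {
    "chem_A": {"G1": +1, "G2": +1},
    "chem_B": {"G1": -1, "G3": -1},
}
selection = ["chem_A", "chem_B"]

print(sorted(aggregate_union(signed, selection)))
print(conflicting_genes(signed, selection))
```

Here G1 enters the pooled list once even though chem_A induces it and chem_B represses it, which is exactly the information a union-based query list cannot represent.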
In conclusion, we have introduced a novel methodology bundled inside an R package called phexpo that links chemical compounds and phenotype terms through enrichment analyses based on their gene annotations. We have described 3 case studies validated through the literature which present phexpo’s functionalities and its capabilities to identify phenotypes related to a chemical and vice versa. Phexpo’s bidirectional approach to study the potential relationships between chemical compounds and human phenotypes provides insights for human health and exposome research. This tool will be a valuable asset to further exposome research by revealing potential novel phenotype–chemical associations.

FUNDING

CH has been supported by a Northern Ireland Department for the Economy (DfE) postgraduate studentship award.

AUTHOR CONTRIBUTIONS

CH constructed the package, contributed to the idea and to the design of the experiments, ran the analyses, validated the results, and wrote the manuscript. GL-C created the idea, designed the experiments, validated the results, and wrote the manuscript. DAS contributed to the idea and critical manuscript revision. BD contributed to the idea and critical manuscript revision.

SUPPLEMENTARY MATERIAL

Supplementary material is available at Journal of the American Medical Informatics Association online.