
Novel application of normalized pointwise mutual information (NPMI) to mine biomedical literature for gene sets associated with disease: use case in breast carcinogenesis.

Sean M Watford, Rachel G Grashow, Vanessa Y De La Rosa, Ruthann A Rudel, Katie Paul Friedman, Matthew T Martin.

Abstract

Advances in technology within biomedical sciences have led to an inundation of data across many fields, raising new challenges in how best to integrate and analyze these resources. For example, rapid chemical screening programs like the US Environmental Protection Agency's ToxCast and the collaborative effort, Tox21, have produced massive amounts of information on putative chemical mechanisms where assay targets are identified as genes; however, systematically linking these hypothesized mechanisms with in vivo toxicity endpoints like disease outcomes remains problematic. Herein we present a novel use of normalized pointwise mutual information (NPMI) to mine biomedical literature for gene associations with biological concepts as represented by Medical Subject Headings (MeSH terms) in PubMed. Resources that tag genes to articles were integrated, then cross-species orthologs were identified using UniRef50 clusters. MeSH term frequency was normalized to reflect the MeSH tree structure, and then the resulting GeneID-MeSH associations were ranked using NPMI. The resulting network, called Entity MeSH Co-occurrence Network (EMCON), is a scalable resource for the identification and ranking of genes for a given topic of interest. The utility of EMCON was evaluated with the use case of breast carcinogenesis. Topics relevant to breast carcinogenesis were used to query EMCON and retrieve genes important to each topic. A breast cancer gene set was compiled through expert literature review (ELR) to assess performance of the search results. We found that the results from EMCON ranked the breast cancer genes from ELR higher than randomly selected genes with a recall of 0.98. Precision of the top five genes for selected topics was calculated as 0.87. 
This work demonstrates that EMCON can be used to link in vitro results to possible biological outcomes, thus aiding in generation of testable hypotheses for furthering understanding of biological function and the contribution of chemical exposures to disease.
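The ranking step above scores each GeneID-MeSH pair by NPMI. Below is a minimal sketch, assuming each PubMed article is reduced to a set of gene IDs and a set of MeSH terms. Probabilities are estimated from article counts, with p(x) = n_x/N, p(y) = n_y/N and p(x,y) = n_xy/N. From these, PMI(x;y) = log[p(x,y) / (p(x) p(y))] and NPMI(x;y) = PMI(x;y) / (-log p(x,y)), which lies in [-1, 1]. The function names, the toy gene IDs ("g1", "g2", "g3") and the convention of scoring never-co-occurring pairs as -1 are illustrative assumptions, not EMCON's code. The sketch also omits EMCON's MeSH tree-based frequency normalization and its UniRef50 ortholog mapping. A simple precision-at-k check, in the spirit of the top-five precision reported above, is included for evaluating a ranked list against a curated gene set.

```python
import math
from collections import Counter


def npmi(n_xy, n_x, n_y, n_total):
    """Normalized PMI estimated from document (article) counts.

    Assumed convention: pairs that never co-occur score -1.
    """
    if n_xy == 0:
        return -1.0
    p_xy = n_xy / n_total
    p_x = n_x / n_total
    p_y = n_y / n_total
    if p_xy == 1.0:
        # x and y appear together in every article; -log p(x,y) would be 0.
        return 1.0
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


def rank_genes_for_term(articles, term):
    """Rank genes by NPMI with one MeSH term.

    articles: list of (gene_id_set, mesh_term_set) tuples, one per article
    (a hypothetical input layout chosen for this sketch).
    """
    n_total = len(articles)
    gene_counts = Counter()   # articles mentioning each gene
    co_counts = Counter()     # articles mentioning the gene AND the term
    term_count = 0            # articles tagged with the term
    for genes, terms in articles:
        gene_counts.update(genes)
        if term in terms:
            term_count += 1
            co_counts.update(genes)
    scores = {
        g: npmi(co_counts[g], gene_counts[g], term_count, n_total)
        for g in gene_counts
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def precision_at_k(ranked, relevant, k=5):
    """Fraction of the top-k ranked genes found in a reference gene set."""
    top = [gene for gene, _ in ranked[:k]]
    if not top:
        return 0.0
    return sum(gene in relevant for gene in top) / len(top)


# Toy corpus of four "articles" (hypothetical data, not from the study).
TOY_ARTICLES = [
    ({"g1", "g2"}, {"Breast Neoplasms", "Mice"}),
    ({"g1"}, {"Breast Neoplasms"}),
    ({"g2", "g3"}, {"Mice"}),
    ({"g3"}, {"Liver Neoplasms"}),
]

if __name__ == "__main__":
    ranked = rank_genes_for_term(TOY_ARTICLES, "Breast Neoplasms")
    for gene, score in ranked:
        print(gene, round(score, 3))
    print("precision@2:", precision_at_k(ranked, {"g1"}, k=2))
```

On the toy corpus, g1 always co-occurs with "Breast Neoplasms" and scores 1.0. g2 co-occurs exactly as often as independence predicts and scores 0.0. g3 never co-occurs and scores -1.0. Dividing PMI by -log p(x,y) bounds the score, which makes rankings comparable across MeSH terms of very different frequency.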


Keywords:  Biomedical Literature; Breast Carcinogenesis; Chemical Exposures; Data Integration; Genes; Literature Mining

Year:  2018        PMID: 32274464      PMCID: PMC7144681          DOI: 10.1016/j.comtox.2018.06.003

Source DB:  PubMed          Journal:  Comput Toxicol        ISSN: 2468-1113


