
Large-Scale Validation of Hypothesis Generation Systems via Candidate Ranking.

Justin Sybrandt, Michael Shtutman, Ilya Safro.

Abstract

The first step of many research projects is to define and rank a short list of candidates for study. Given the rapid pace of modern scientific progress, some researchers turn to automated hypothesis generation (HG) systems to aid this process. These systems can identify implicit or overlooked connections within a large scientific corpus, and while their importance grows alongside the pace of science, they lack thorough validation. Without any standard numerical evaluation method, many validate general-purpose HG systems by rediscovering a handful of historical findings, and some wishing to be more thorough may run laboratory experiments based on automatic suggestions. These methods are expensive, time-consuming, and cannot scale. Thus, we present a numerical evaluation framework for validating HG systems that leverages thousands of validation hypotheses. This method evaluates an HG system by its ability to rank hypotheses by plausibility, a process reminiscent of human candidate selection. Because HG systems, specifically those that produce topic models, do not produce a ranking criterion, we additionally present novel metrics to quantify the plausibility of hypotheses given topic model system output. Finally, we demonstrate that our proposed validation method aligns with real-world research goals by deploying it within MOLIERE, our recent topic-driven HG system, to automatically generate a set of candidate genes related to HIV-associated neurodegenerative disease (HAND). By performing laboratory experiments based on this candidate set, we discover a new connection between HAND and Dead Box RNA Helicase 3 (DDX3). Reproducibility: code, validation data, and results can be found at sybrandt.com/2018/validation.
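
The abstract describes judging an HG system by how well it ranks validation hypotheses by plausibility, but it does not name the statistic used. As a minimal sketch, assuming each validation hypothesis carries a plausibility score and a true/false label, one common way to summarize ranking quality is the area under the ROC curve, computed here as the fraction of (true, false) pairs in which the true hypothesis scores higher. The function name `ranking_auc` and the scores below are hypothetical illustrations, not part of MOLIERE or the authors' released code.

```python
from typing import Sequence


def ranking_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Fraction of (true, false) hypothesis pairs ranked correctly.

    Ties count as half a correct pair. This equals the area under the
    ROC curve for the ranking induced by `scores`.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one true and one false hypothesis")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical plausibility scores for six validation hypotheses.
scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
labels = [True, True, True, False, False, False]
auc = ranking_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

A value near 1.0 would mean true hypotheses are consistently ranked above false ones, while 0.5 corresponds to a random ordering. The pairwise loop is quadratic in the number of hypotheses; for thousands of validation hypotheses a sort-based computation would be the more practical choice.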


Keywords:  Applied Data Science; Hypothesis Generation; Literature Based Discovery; Scientific Text Mining

Year:  2019        PMID: 35789222      PMCID: PMC9248026          DOI: 10.1109/bigdata.2018.8622637

Source DB:  PubMed          Journal:  Proc IEEE Int Conf Big Data


