Ross Kleiman, Finn Kuusisto, Ian Ross, Peggy L Peissig, Ron Stewart, C David Page, Jeremy Weiss.
Abstract
Epidemiological studies that identify biological markers of disease state are valuable, but they can be time-consuming and expensive, and they require extensive intuition and expertise. Furthermore, not all hypothesized markers will be borne out in a study, suggesting that higher-quality initial hypotheses are crucial. In this work, we propose a high-throughput pipeline to produce a ranked list of high-quality hypothesized marker laboratory tests for diagnoses. Our pipeline generates a large number of candidate lab-diagnosis hypotheses derived from machine learning models, filters and ranks them according to their potential novelty using text mining, and corroborates the final hypotheses with logistic regression analysis. We test our approach on a large electronic health record dataset and the PubMed corpus, and find several promising candidate hypotheses.
Year: 2019 PMID: 31259012 PMCID: PMC6568080
Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc