Anthony Rios (1,2), Ramakanth Kavuluru (2,3), Zhiyong Lu (1)
1. National Library of Medicine (NLM), National Center for Biotechnology Information (NCBI), National Institutes of Health (NIH), Bethesda, MD, USA.
2. Department of Computer Science, University of Kentucky, Lexington, KY, USA.
3. Division of Biomedical Informatics, Department of Internal Medicine, Lexington, KY, USA.
Abstract
Motivation: Creating large datasets for biomedical relation classification can be prohibitively expensive. While some datasets have been curated to extract protein-protein and drug-drug interactions (PPIs and DDIs) from text, we are also interested in other interactions, including gene-disease and chemical-protein connections. In addition, many biomedical researchers have begun to explore ternary relationships. Even when annotated data are available, many datasets used for relation classification are inherently biased. For example, issues such as sample selection bias typically prevent models from generalizing in the wild. To address the problem of cross-corpora generalization, we present a novel adversarial learning algorithm for unsupervised domain adaptation tasks where no labeled data are available in the target domain. Instead, our method takes advantage of unlabeled data to improve biased classifiers by learning domain-invariant features via an adversarial process. Finally, our method is built upon recent advances in neural network (NN) methods.

Results: We experiment by extracting PPIs and DDIs from text. In our experiments, we show that domain-invariant features can be learned in NNs such that classifiers trained for one interaction type (protein-protein) can be repurposed for others (drug-drug). We also show that our method can adapt to different source and target pairs of PPI datasets. Compared to prior convolutional and recurrent NN-based relation classification methods without domain adaptation, we achieve improvements as high as 30% in F1-score. Likewise, we show improvements over state-of-the-art adversarial methods.

Availability and implementation: Experimental code is available at https://github.com/bionlproc/adversarial-relation-classification.

Supplementary information: Supplementary data are available at Bioinformatics online.