Yuan Luo (1), Aliyah R Sohani (2), Ephraim P Hochberg (3), Peter Szolovits (1).
1. Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA.
2. Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Cambridge, Massachusetts, USA.
3. Center for Lymphoma, Massachusetts General Hospital, Cambridge, Massachusetts, USA; Department of Medicine, Harvard Medical School, Cambridge, Massachusetts, USA.
Abstract
OBJECTIVE: Pathology reports are rich in narrative statements that encode a complex web of relations among medical concepts. Doctors routinely use these relations to reason about diagnoses, but extracting them into prespecified forms for computational disease modeling often requires hand-crafted rules or supervised learning. We aim to automatically capture relations from narrative text without supervision.
METHODS: We design a novel framework that translates sentences into graph representations, automatically mines sentence subgraphs, reduces redundancy in the mined subgraphs, and automatically generates subgraph features for subsequent classification tasks. To ensure meaningful interpretations over the sentence graphs, we use the Unified Medical Language System Metathesaurus to map token subsequences to concepts, and in turn to sentence graph nodes. We test our system with multiple lymphoma classification tasks that together mimic the differential diagnosis by a pathologist. To this end, we prevent our classifiers from looking at explicit mentions or synonyms of lymphomas in the text.
RESULTS AND CONCLUSIONS: We compare our system with three baseline classifiers using standard n-grams, full MetaMap concepts, and filtered MetaMap concepts. Our system achieves high F-measures on multiple binary classifications of lymphoma (Burkitt lymphoma, 0.8; diffuse large B-cell lymphoma, 0.909; follicular lymphoma, 0.84; Hodgkin lymphoma, 0.912). Significance tests show that our system outperforms all three baselines. Moreover, feature analysis identifies subgraph features that contribute to improved performance; these features agree with the state-of-the-art knowledge about lymphoma classification. We also highlight how these unsupervised relation features may provide meaningful insights into lymphoma classification.
Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
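The pipeline described under METHODS (sentence graphs, subgraph mining, redundancy reduction, feature generation) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the edge-triple representation, the restriction to subgraphs of one or two edges, and the "drop a subgraph when a larger one has the same support" rule are all assumptions made for illustration, and the concept labels are invented toy data rather than output of a UMLS mapping.

```python
from collections import Counter
from itertools import combinations

# Assumption: a sentence graph is a set of labeled edges
# (head_concept, relation, dependent_concept). Labels below are toy data.

def mine_frequent_subgraphs(graphs, min_support):
    """Count connected subgraphs of 1 or 2 edges; keep those found in >= min_support graphs."""
    support = Counter()
    for g in graphs:
        candidates = {frozenset([e]) for e in g}
        for e1, e2 in combinations(sorted(g), 2):
            # Two edges form a connected subgraph if they share a node.
            if {e1[0], e1[2]} & {e2[0], e2[2]}:
                candidates.add(frozenset([e1, e2]))
        support.update(candidates)  # each subgraph counted once per graph
    return {sg: c for sg, c in support.items() if c >= min_support}

def drop_redundant(support):
    """Hypothetical redundancy rule: drop a subgraph if a strictly larger
    mined subgraph occurs in exactly as many graphs."""
    return {
        sg: c
        for sg, c in support.items()
        if not any(sg < other and c == support[other] for other in support)
    }

def featurize(graph, subgraphs):
    """Binary feature vector: 1 if the subgraph is contained in the sentence graph."""
    return [1 if sg <= graph else 0 for sg in subgraphs]

graphs = [
    {("cells", "express", "CD20"), ("cells", "have", "large nuclei")},
    {("cells", "express", "CD20"), ("cells", "have", "large nuclei")},
    {("cells", "express", "CD20"), ("cells", "lack", "CD10")},
]

frequent = mine_frequent_subgraphs(graphs, min_support=2)
kept = drop_redundant(frequent)
# Fix a stable feature order: smaller subgraphs first, then by edge labels.
features = sorted(kept, key=lambda sg: (len(sg), sorted(sg)))
vectors = [featurize(g, features) for g in graphs]
```

In this toy run, the single edge `("cells", "have", "large nuclei")` is dropped because the two-edge subgraph containing it has the same support, so two features remain. The resulting vectors could then be passed to any standard classifier; the abstract does not say which one the authors use.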