A M Cohen¹, W R Hersh, C Dubay, K Spackman. ¹Department of Medical Informatics and Clinical Epidemiology, School of Medicine, Oregon Health & Science University, 3181 S.W. Sam Jackson Park Road, Portland, Oregon 97239-3098, USA. cohenaa@ohsu.edu
Abstract
BACKGROUND: Text-mining can assist biomedical researchers in reducing information overload by extracting useful knowledge from large collections of text. We developed a novel text-mining method based on analyzing the network structure created by symbol co-occurrences as a way to extend the capabilities of knowledge extraction. The method was applied to the task of automatic gene and protein name synonym extraction. RESULTS: Performance was measured on a test set consisting of about 50,000 abstracts from one year of MEDLINE. Synonyms retrieved from curated genomics databases were used as a gold standard. The system obtained a maximum F-score of 22.21% (23.18% precision and 21.36% recall), with high efficiency in the use of seed pairs. CONCLUSION: The method performs comparably with other studied methods, does not rely on sophisticated named-entity recognition, and requires little initial seed knowledge.
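The abstract does not say how the co-occurrence network is scored. What follows is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that symbols have already been extracted from each abstract (the hypothetical `docs` lists of strings) and links any two symbols that appear in the same abstract. It then treats two symbols as candidate synonyms when they never co-occur directly but share many neighbors. The Jaccard overlap of their neighbor sets is an assumed stand-in for the paper's network analysis. The sketch does not model the seed pairs the method relies on. It also computes precision, recall, and F-score against a gold set of synonym pairs, the measures reported above. All function names, the threshold, and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(docs):
    """Link every pair of distinct symbols that appear in the same document."""
    neighbors = defaultdict(set)
    for symbols in docs:
        for a, b in combinations(sorted(set(symbols)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)


def neighbor_overlap(neighbors, a, b):
    """Jaccard similarity of two symbols' neighbor sets (each excluding the other)."""
    na = neighbors[a] - {b}
    nb = neighbors[b] - {a}
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def candidate_synonyms(neighbors, threshold):
    """Hypothetical scoring: pairs that never co-occur but share enough neighbors."""
    found = set()
    for a, b in combinations(sorted(neighbors), 2):
        if b in neighbors[a]:
            continue  # assumption: directly co-occurring symbols are not synonyms
        if neighbor_overlap(neighbors, a, b) >= threshold:
            found.add(frozenset((a, b)))
    return found


def precision_recall_f(predicted, gold):
    """Set-based precision, recall, and balanced F-score over unordered pairs."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    # Toy symbol lists, one per hypothetical abstract.
    docs = [["p53", "mdm2"], ["tp53", "mdm2"], ["p53", "bax"], ["tp53", "bax"]]
    gold = {frozenset(("p53", "tp53"))}
    pred = candidate_synonyms(build_cooccurrence(docs), threshold=0.5)
    p, r, f = precision_recall_f(pred, gold)
    print(p, r, round(f, 3))  # 0.5 1.0 0.667
```

On this toy input, the sketch recovers the true pair (p53, tp53). It also returns the spurious pair (bax, mdm2), which shares neighbors for the same structural reason. This shows how neighborhood overlap alone can admit false positives that lower precision.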