MOTIVATION: Named entity recognition (NER) is a fundamental task in biomedical text mining. A number of NER solutions have been proposed in recent years, taking advantage of available annotated corpora, terminological resources and machine-learning techniques. Currently, the best-performing solutions combine the outputs of selected annotation solutions and are measured against a single corpus. However, little effort has been spent on systematically analyzing methods that harmonize annotation results and on measuring them against a combination of Gold Standard Corpora (GSCs).

RESULTS: We present Totum, a machine-learning solution that harmonizes gene/protein annotations provided by heterogeneous NER solutions. It has been optimized and measured against a combination of manually curated GSCs. Our experiments show that this approach improves the F-measure of state-of-the-art solutions by up to 10% (achieving ≈70%) in exact alignment and 22% (achieving ≈82%) in nested alignment. We demonstrate that our solution delivers reliable annotation results across the GSCs and that it is an important contribution towards a homogeneous annotation of MEDLINE abstracts.

AVAILABILITY AND IMPLEMENTATION: Totum is implemented in Java and its resources are available at http://bioinformatics.ua.pt/totum
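The difference between the exact and nested alignment figures reported above can be illustrated with a minimal Java sketch. It is not taken from Totum: the class name `AlignmentEval`, the `Span` type and the matching rules are assumptions made for illustration. In particular, "nested" is assumed here to mean that one span fully contains the other, and annotations are assumed to be character-offset spans with an exclusive end.

```java
import java.util.List;

// Hypothetical illustration; not part of Totum.
public class AlignmentEval {

    /** Character-offset span of an annotation; end is exclusive (assumed convention). */
    public static final class Span {
        public final int start;
        public final int end;

        public Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Exact alignment: both boundaries must coincide. */
    public static boolean exactMatch(Span a, Span b) {
        return a.start == b.start && a.end == b.end;
    }

    /** Nested alignment (assumed definition): one span fully contains the other. */
    public static boolean nestedMatch(Span a, Span b) {
        boolean aInsideB = a.start >= b.start && a.end <= b.end;
        boolean bInsideA = b.start >= a.start && b.end <= a.end;
        return aInsideB || bInsideA;
    }

    private static boolean matches(Span a, Span b, boolean nested) {
        return nested ? nestedMatch(a, b) : exactMatch(a, b);
    }

    /** F-measure (harmonic mean of precision and recall) of predictions against gold spans. */
    public static double fMeasure(List<Span> gold, List<Span> predicted, boolean nested) {
        // Predictions that align with at least one gold span count towards precision.
        long matchedPredicted = predicted.stream()
                .filter(p -> gold.stream().anyMatch(g -> matches(p, g, nested)))
                .count();
        // Gold spans that align with at least one prediction count towards recall.
        long matchedGold = gold.stream()
                .filter(g -> predicted.stream().anyMatch(p -> matches(p, g, nested)))
                .count();
        double precision = predicted.isEmpty() ? 0.0 : (double) matchedPredicted / predicted.size();
        double recall = gold.isEmpty() ? 0.0 : (double) matchedGold / gold.size();
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
    }

    public static void main(String[] args) {
        // Made-up spans, used only to show how the two criteria diverge.
        List<Span> gold = List.of(new Span(0, 5), new Span(10, 20));
        List<Span> predicted = List.of(new Span(0, 5), new Span(12, 18), new Span(30, 35));
        System.out.println("exact  F-measure: " + fMeasure(gold, predicted, false));
        System.out.println("nested F-measure: " + fMeasure(gold, predicted, true));
    }
}
```

With these made-up spans, the prediction `(12, 18)` lies inside the gold span `(10, 20)`, so it counts as a match only under the nested criterion. The F-measure is therefore approximately 0.4 under exact alignment and approximately 0.8 under nested alignment, which is why nested scores are generally at least as high as exact ones on the same data.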