Nigam H Shah1, Nipun Bhatia, Clement Jonquet, Daniel Rubin, Annie P Chiang, Mark A Musen.
Abstract
The National Center for Biomedical Ontology (NCBO) is developing a system for automated, ontology-based access to online biomedical resources (Shah NH, et al.: Ontology-driven indexing of public datasets for translational bioinformatics. BMC Bioinformatics 2009, 10(Suppl 2):S1). The system's indexing workflow processes the text metadata of diverse resources, such as datasets from GEO and ArrayExpress, to annotate and index them with concepts from appropriate ontologies. This indexing requires a concept-recognition tool to identify ontology concepts in each resource's textual metadata. In this paper, we present a comparison of two concept recognizers: NLM's MetaMap and the University of Michigan's Mgrep. We use a number of data sources and dictionaries to evaluate the concept recognizers in terms of precision, recall, speed of execution, scalability and customizability. Our evaluations demonstrate that Mgrep has a clear edge over MetaMap for large-scale, service-oriented applications. Based on our analysis, we also suggest areas of potential improvement for Mgrep. We have subsequently used Mgrep to build the Open Biomedical Annotator service. The Annotator service has access to a large dictionary of biomedical terms derived from the Unified Medical Language System (UMLS) and NCBO ontologies. The Annotator also leverages the hierarchical structure of the ontologies and their mappings to expand annotations. The Annotator service is available to the community as a REST Web service for creating ontology-based annotations of their data.
Year: 2009 PMID: 19761568 PMCID: PMC2745685 DOI: 10.1186/1471-2105-10-S9-S14
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Concept recognition. The figure shows the working of a generic concept recognizer, which maps the text 'deficient' to the concept of 'Deficiency' in a hierarchical dictionary of concepts.
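
The mapping described in this caption can be illustrated with a minimal sketch of dictionary-based concept recognition. This is not the algorithm used by Mgrep or MetaMap; it is a simple greedy longest-match lookup, and the dictionary entries, the `DICTIONARY` name and the `recognize_concepts` function are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy dictionary: each term (preferred name or synonym) maps to a
# concept. The entries are illustrative, not taken from UMLS or NCBO ontologies.
DICTIONARY = {
    "deficiency": "Deficiency",
    "deficient": "Deficiency",
    "iron deficiency": "Iron deficiency",
    "anemia": "Anemia",
}

# Longest term length in tokens, used to bound the phrase window.
MAX_TERM_TOKENS = max(len(term.split()) for term in DICTIONARY)


def recognize_concepts(text):
    """Return (start_token, end_token, concept) tuples via greedy longest match."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first so multi-word terms win.
        for n in range(min(MAX_TERM_TOKENS, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in DICTIONARY:
                matches.append((i, i + n, DICTIONARY[phrase]))
                i += n
                break
        else:
            # No dictionary term starts at this token; move on.
            i += 1
    return matches


print(recognize_concepts("The patient is deficient in iron."))
```

Listing 'deficient' as a synonym is one way to reach the concept 'Deficiency'; a real recognizer might instead rely on stemming or lexical variant generation.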
Size and number of elements of data sources
| Resource | Number of elements | Size |
|---|---|---|
| ClinicalTrials.gov | 50303 | 99 MB |
| Gold Miner (Subset) | 2085 | 0.5 MB |
| Gene Expression Omnibus | 2085 | 0.7 MB |
| PubMed (Subset) | 2827 | 3.7 MB |
The size and number of concepts in each of the dictionaries SNOMED-CT, Diseases, FMA and Biological Processes (from GO)
| Dictionary | Size | Number of concepts |
|---|---|---|
| SNOMED-CT | 48 MB | 1,139,586 |
| Diseases | 38 MB | 764,420 |
| FMA (Body Parts) | 4.8 MB | 93,335 |
| Biological Processes | 1.18 MB | 31,294 |
Total number of concepts recognized by Mgrep and MetaMap across all resources using the biological process and diseases dictionaries
| Clinical Trials | 10 | 106 | 409 | 710 |
| Gold Miner | 12 | 80 | 753 | 1283 |
| GEO | 136 | 188 | 337 | 704 |
| MedLine subset | 26 | 48 | 22 | 209 |
MG = Mgrep; MM = MetaMap
Total number of concepts recognized by Mgrep and MetaMap across all resources using the Foundational Model of Anatomy and SNOMED-CT as dictionaries
| Clinical Trials | 243 | 380 | 1548 | 1730 |
| Gold Miner | 671 | 1097 | 3747 | 3400 |
| GEO | 272 | 818 | 2228 | 2372 |
| MedLine subset | 57 | 132 | 1320 | 1088 |
MG = Mgrep; MM = MetaMap
Precision of Mgrep and MetaMap using Biological Processes as the dictionary
| Clinical Trials | 0.6 | 0.63 |
| Gold Miner | 0.58 | 0.33 |
| GEO | 0.93 | 0.73 |
| MedLine | 0.77 | 0.76 |
Precision of Mgrep and MetaMap using the 'diseases' dictionary
| Clinical Trials | 0.87 | 0.71 |
| Gold Miner | 0.73 | 0.548 |
| GEO | 0.88 | 0.755 |
| MedLine | 0.23 | 0.091 |
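
Precision values like those in the two tables above follow the standard definition: the fraction of recognized concepts that are judged correct. The sketch below shows that arithmetic only. How annotations were sampled and judged is not described in this record, and the counts in the example are hypothetical rather than taken from the study.

```python
def precision(true_positives, false_positives):
    """Fraction of recognized concepts that were judged correct."""
    total = true_positives + false_positives
    if total == 0:
        raise ValueError("no annotations were evaluated")
    return true_positives / total


# Hypothetical counts: 87 of 100 evaluated annotations judged correct.
print(precision(87, 13))
```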
Figure 2. Annotator Web service workflow. First, direct annotations are created from raw text by syntactic concept recognition against a dictionary that uses terms (concept names and synonyms) from both UMLS and NCBO ontologies. Second, different components expand the first set of annotations using ontology semantics (e.g., subsumption relationships and mappings between ontologies).
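
The second step of this workflow, expansion along subsumption relationships, can be sketched as a traversal of an is_a hierarchy. This is a minimal illustration under stated assumptions, not the Annotator's implementation: the `PARENTS` map, the `expand_annotations` function and the `max_level` parameter are hypothetical, and expansion through mappings between ontologies is left out.

```python
# Hypothetical child -> parents map standing in for an ontology's is_a hierarchy.
PARENTS = {
    "Iron deficiency anemia": ["Anemia"],
    "Anemia": ["Hematologic disease"],
    "Hematologic disease": ["Disease"],
}


def expand_annotations(direct_concepts, parents=PARENTS, max_level=None):
    """Return {concept: level}; level 0 is a direct annotation, level k a k-th ancestor."""
    expanded = {concept: 0 for concept in direct_concepts}
    frontier = list(direct_concepts)
    level = 0
    # Walk upward one level at a time until no new ancestors appear
    # or the optional level limit is reached.
    while frontier and (max_level is None or level < max_level):
        level += 1
        next_frontier = []
        for concept in frontier:
            for parent in parents.get(concept, []):
                if parent not in expanded:
                    expanded[parent] = level
                    next_frontier.append(parent)
        frontier = next_frontier
    return expanded


print(expand_annotations(["Iron deficiency anemia"], max_level=2))
```

Recording the level with each expanded annotation lets a consumer weight or filter ancestors by their distance from the directly recognized concept.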
Figure 3. User interface for accessing the Annotator Web service. The UI helps users determine the best parameters for programmatic service calls by letting them select which ontologies to use, which semantic types to restrict results to, and whether to use the semantic expansion components.