Yang Jin, Ryan T McDonald, Kevin Lerman, Mark A Mandel, Steven Carroll, Mark Y Liberman, Fernando C Pereira, Raymond S Winters, Peter S White.
Abstract
BACKGROUND: The rapid proliferation of biomedical text makes it increasingly difficult for researchers to identify, synthesize, and utilize developed knowledge in their fields of interest. Automated information extraction procedures can assist in the acquisition and management of this knowledge. Previous efforts in biomedical text mining have focused primarily upon named entity recognition of well-defined molecular objects such as genes, but less work has been performed to identify disease-related objects and concepts. Furthermore, promise has been tempered by an inability to efficiently scale approaches in ways that minimize manual efforts and still perform with high accuracy. Here, we have applied a machine-learning approach previously successful for identifying molecular entities to a disease concept to determine if the underlying probabilistic model effectively generalizes to unrelated concepts with minimal manual intervention for model retraining.
Year: 2006 PMID: 17090325 PMCID: PMC1657036 DOI: 10.1186/1471-2105-7-492
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Top 25 MTag identified mentions and their corresponding PubMed keyword and MEDLINE exact string matching search results.
| carcinoma | True Positive | 861214 | 466958 | 891996 |
| breast neoplasms | True Positive | 129096 | 133592 | 137445 |
| adenocarcinoma | True Positive | 166302 | 208117 | 183654 |
| lung neoplasms | True Positive | 104176 | 110378 | 111869 |
| pulmonary | False Positive | | | |
| breast cancer | True Positive | 91446 | 147286 | 128381 |
| lymphoma | True Positive | 182764 | 158674 | 226407 |
| liver neoplasms | True Positive | 69513 | 84529 | 84712 |
| fibroblasts | False Positive | | | |
| skin neoplasms | True Positive | 62282 | 66072 | 66105 |
| neoplastic | False Positive | | | |
| neoplasm metastasis | False Positive | | | |
| brain neoplasms | True Positive | 58729 | 84636 | 63586 |
| stomach neoplasms | True Positive | 50019 | 52566 | 55208 |
| prostatic neoplasms | True Positive | 48042 | 49110 | 50312 |
| leukemia | True Positive | 163011 | 190798 | 368980 |
| colonic neoplasms | True Positive | 41327 | 47402 | 42841 |
| cervical neoplasms | True Positive | 40998 | 41424 | 41717 |
| sarcoma | True Positive | 142665 | 110920 | 242654 |
| bone neoplasms | True Positive | 33568 | 73429 | 35091 |
| melanoma | True Positive | 79519 | 61134 | 126681 |
| pancreatic neoplasms | True Positive | 31598 | 33775 | 33291 |
| extramural | False Positive | | | |
| lung cancer | True Positive | 53601 | 118679 | 66071 |
| abdominal | False Positive | | | |
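As a rough illustration of what an exact string matching count over abstracts might look like, the sketch below counts how many texts contain a given mention. It is not the procedure used to produce the table: the function name `count_exact_matches`, the case-insensitive whole-phrase matching rule, and the three sample abstracts are all assumptions made for the example.

```python
import re


def count_exact_matches(mention, abstracts):
    """Count abstracts that contain `mention` as a whole phrase.

    Assumption: matching is case-insensitive and bounded by word
    boundaries; the source does not specify the matching rule.
    """
    pattern = re.compile(r"\b" + re.escape(mention) + r"\b", re.IGNORECASE)
    return sum(1 for text in abstracts if pattern.search(text))


# Hypothetical sample texts, not taken from MEDLINE.
abstracts = [
    "Breast cancer risk was elevated in carriers.",
    "We studied lung cancer and breast cancer cohorts.",
    "Fibroblasts were cultured for 48 hours.",
]

mentions = ["breast cancer", "lung cancer", "melanoma"]
counts = {m: count_exact_matches(m, abstracts) for m in mentions}
print(counts)
```

Counting documents rather than individual occurrences mirrors how a search result count is usually reported, but that choice is also an assumption here.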
Figure 1. Example of the HTML output of MTag for an annotated abstract [31]. Malignancy type mentions identified by MTag are shown in bold, italicized, and blue text.
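The styling described in the caption can be sketched as a small rendering step that wraps each identified mention in bold, italic, blue markup. This is only a minimal illustration, not MTag's actual output code: the function `render_mentions`, the `(start, end)` character-offset span format, the exact HTML tags, and the sample sentence are all assumptions.

```python
from html import escape


def render_mentions(text, spans):
    """Return HTML with each (start, end) span in bold, italic, blue.

    Assumption: spans are character offsets and do not overlap.
    """
    parts = []
    cursor = 0
    for start, end in sorted(spans):
        # Escape the plain text between the previous span and this one.
        parts.append(escape(text[cursor:start]))
        parts.append(
            '<b><i><span style="color:blue">'
            + escape(text[start:end])
            + "</span></i></b>"
        )
        cursor = end
    # Append whatever follows the last span.
    parts.append(escape(text[cursor:]))
    return "".join(parts)


# Hypothetical sentence with one mention located by simple lookup.
text = "Patients with melanoma were enrolled."
start = text.index("melanoma")
html_out = render_mentions(text, [(start, start + len("melanoma"))])
print(html_out)
```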