Vincent Yip, Mutlu Mete, Umit Topaloglu, Sinan Kockara.
Abstract
A large amount of valuable information is available in plain-text clinical reports, and new techniques and technologies are being applied to extract it. One of the leading systems in the cancer community is the Cancer Text Information Extraction System (caTIES), which was developed with caBIG-compliant data structures. caTIES embeds two key components for extracting data: MMTx and GATE. In this paper, an n-gram based framework is shown to be capable of discovering concepts from text reports. MetaMap is used to map medical terms to the National Cancer Institute (NCI) Metathesaurus and the Unified Medical Language System (UMLS) Metathesaurus to verify that they are legitimate medical data. The final concepts from our framework and from caTIES are weighted using our scoring model. The scores show that, on average, our framework scores higher than caTIES on 848 (36.9%) of the reports, while 1388 (60.5%) of the reports have similar performance on both systems.
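To make the n-gram idea concrete, the following is a minimal sketch of how candidate phrases might be enumerated from a report before being checked against a vocabulary. It is not the authors' implementation: the function names, the `max_n` and `min_count` parameters, and the sample report text are all hypothetical, and the MetaMap lookup step is deliberately left out.

```python
import re
from collections import Counter


def word_ngrams(text, max_n=3):
    """Yield every contiguous word sequence of length 1..max_n (hypothetical helper)."""
    # Lowercase and keep alphanumeric tokens only; punctuation is discarded,
    # so n-grams may span sentence boundaries in this simplified version.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def candidate_phrases(text, max_n=3, min_count=2):
    """Return n-grams that occur at least min_count times (assumed filter)."""
    counts = Counter(word_ngrams(text, max_n))
    return {phrase: c for phrase, c in counts.items() if c >= min_count}


# Invented example text, not taken from the study's reports.
report = "Invasive ductal carcinoma. Margins negative for invasive ductal carcinoma."
candidates = candidate_phrases(report)
print(candidates)
```

In a pipeline like the one the abstract describes, each surviving candidate would then be passed to a terminology mapper to decide whether it is a legitimate medical concept.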
Year: 2010 PMID: 21347147 PMCID: PMC3041542
Source DB: PubMed Journal: Summit Transl Bioinform ISSN: 2153-6430
Score comparisons with different parameter specifications (sorted by MNOG)
| - | - | caTIES | 1185 | 2 |
| 3 | 3 | n-gram | | |
| - | - | caTIES | 1262 | 7 |
| 3 | 5 | n-gram | | |
| - | - | caTIES | 1679 | 28 |
| 3 | 10 | n-gram | | |
| - | - | caTIES | 1150 | 1 |
| 4 | 3 | n-gram | | |
| - | - | caTIES | 1328 | 58 |
| 4 | 5 | n-gram | | |
| - | - | caTIES | 1680 | 55 |
| 4 | 10 | n-gram | | |
| - | - | caTIES | 1346 | 2 |
| 5 | 3 | n-gram | | |
| - | - | caTIES | 1276 | 182 |
| 5 | 5 | n-gram | | |
| - | - | caTIES | 1588 | 195 |
| 5 | 10 | n-gram | | |
| | | caTIES | 1388 | 59 |
| | | n-gram | | |
Φ is the score difference for each report between two sample systems.
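As an illustration of how per-report score differences could be tallied into "higher", "similar", and "lower" groups, here is a small sketch. The sign convention (Φ as the n-gram score minus the caTIES score), the `tolerance` used to decide what counts as "similar", and the sample values are assumptions made for this example; the source does not specify them.

```python
def tally_phi(phi_values, tolerance=0.0):
    """Count reports by the sign of phi, treating |phi| <= tolerance as similar.

    Assumes phi = (n-gram score) - (caTIES score); this direction is an
    assumption for illustration, not stated in the source.
    """
    result = {"ngram_higher": 0, "similar": 0, "caties_higher": 0}
    for phi in phi_values:
        if abs(phi) <= tolerance:
            result["similar"] += 1
        elif phi > 0:
            result["ngram_higher"] += 1
        else:
            result["caties_higher"] += 1
    return result


# Invented score differences for four reports.
sample_phi = [0.5, 0.0, -0.2, 0.05]
summary = tally_phi(sample_phi, tolerance=0.1)
print(summary)
```

Dividing each count by the total number of reports gives percentages of the kind quoted in the abstract.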