Yiqing Zhao, Nooshin J Fesharaki, Hongfang Liu, Jake Luo.
Abstract
BACKGROUND: The use of knowledge models facilitates information retrieval and knowledge base development, and therefore supports new knowledge discovery that ultimately enables decision support applications. Most existing work has employed machine learning techniques to construct a knowledge base, but these approaches often suffer from low precision in extracting entities and relationships. In this paper, we describe a data-driven sublanguage pattern mining method that can be used to create a knowledge model. We combine natural language processing (NLP) and semantic network analysis in our model generation pipeline.
Keywords: Big data analysis; Information extraction; Knowledge modeling; Medical imaging; Natural language processing; Semantic network; Sublanguage analysis; Text mining
Year: 2018 PMID: 29980203 PMCID: PMC6035419 DOI: 10.1186/s12911-018-0645-3
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 System pipeline: (1) Corpus development (using Jsoup), (2) Syntactic processing (using Stanford Parser), (3) Semantic processing (using UMLS Annotator), (4) Knowledge model generation
Fig. 2 Co-occurrence network of the top 40 semantic types (subgraph). Edge thickness indicates weight (the number of co-occurrence incidences): a thicker edge means more co-occurrences in the relation. Node size indicates connectivity (the number of other nodes connected to it). The network graph represents the complexity of the co-occurrence pattern of semantic types in imaging notes
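The two quantities described in the Fig. 2 caption, edge weight and node connectivity, can be illustrated with a minimal sketch. It assumes that co-occurrence is counted per sentence and that each unordered pair of semantic types is counted once per sentence; the source does not state either detail. The function name `build_cooccurrence` and the toy input are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(sentences):
    """Count co-occurring semantic types.

    `sentences` is an iterable of lists of semantic-type labels, one list
    per sentence (an assumed unit of co-occurrence, not confirmed by the source).
    Returns (edge_weight, connectivity):
      edge_weight[(a, b)]  -> number of sentences in which a and b co-occur
      connectivity[node]   -> number of distinct nodes linked to `node`
    """
    edge_weight = Counter()
    neighbors = defaultdict(set)
    for types in sentences:
        # Sort the de-duplicated labels so each pair has one canonical order.
        for a, b in combinations(sorted(set(types)), 2):
            edge_weight[(a, b)] += 1
            neighbors[a].add(b)
            neighbors[b].add(a)
    connectivity = {node: len(adj) for node, adj in neighbors.items()}
    return edge_weight, connectivity


# Toy input for illustration only (not data from the corpus).
toy = [
    ["Finding", "Body Part", "Spatial Concept"],
    ["Finding", "Body Part"],
    ["Diagnostic Procedure", "Body Part"],
]
weights, connectivity = build_cooccurrence(toy)
print(weights[("Body Part", "Finding")])  # 2
print(connectivity["Body Part"])          # 3
```

In a plot of such a network, `weights` would drive edge thickness and `connectivity` would drive node size.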
Regrouping of UMLS semantic types to form 14 semantic categories (four conceptually important semantic types are marked with “*”)
| New Semantic Category | Included UMLS Semantic Types | Type Counts |
|---|---|---|
| Abnormality | Anatomical Abnormality, Acquired Abnormality, Congenital Abnormality | 3 |
| Body Part | Body Part Organ or Organ Component, Body Substance, Body System*, Tissue, Cell, Gene or Genome, Receptor* | 7 |
| Classification | Classification | 1 |
| Functional Concept | Functional Concept | 1 |
| Location | Spatial Concept, Body Location or Region, Body Space or Junction | 3 |
| Medical Activity | Diagnostic Procedure, Therapeutic or Preventive Procedure, Laboratory Procedure, Health Care Activity, Research Activity, Activity | 6 |
| Medical Device and Object | Medical Device, Manufactured Object | 2 |
| Observation | Finding, Sign or Symptom, Injury or Poisoning, Laboratory or Test Result, Phenomenon or Process | 5 |
| Pathology | Disease or Syndrome, Neoplastic Process, Mental or Behavioral Dysfunction, Cell or Molecular Dysfunction, Pathologic Function | 5 |
| Physiology | Cell Function, Organ or Tissue Function, Organism Function*, Physiologic Function | 4 |
| Qualitative Concept | Qualitative Concept | 1 |
| Quantitative Concept | Quantitative Concept | 1 |
| Substance | Pharmacologic Substance, Substance, Biologically Active Substance, Biomedical or Dental Material* | 4 |
| Temporal Concept | Temporal Concept | 1 |
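The regrouping in the table above amounts to a many-to-one lookup from UMLS semantic type to semantic category. A minimal sketch of that lookup follows; the dictionary contents are copied from the table, while the names `CATEGORY_TO_TYPES`, `TYPE_TO_CATEGORY`, and `regroup` are hypothetical. The first Body Part entry uses the standard UMLS spelling "Body Part, Organ, or Organ Component", and the "*" markers are omitted.

```python
# Category -> UMLS semantic types, transcribed from the regrouping table.
CATEGORY_TO_TYPES = {
    "Abnormality": ["Anatomical Abnormality", "Acquired Abnormality",
                    "Congenital Abnormality"],
    "Body Part": ["Body Part, Organ, or Organ Component", "Body Substance",
                  "Body System", "Tissue", "Cell", "Gene or Genome", "Receptor"],
    "Classification": ["Classification"],
    "Functional Concept": ["Functional Concept"],
    "Location": ["Spatial Concept", "Body Location or Region",
                 "Body Space or Junction"],
    "Medical Activity": ["Diagnostic Procedure",
                         "Therapeutic or Preventive Procedure",
                         "Laboratory Procedure", "Health Care Activity",
                         "Research Activity", "Activity"],
    "Medical Device and Object": ["Medical Device", "Manufactured Object"],
    "Observation": ["Finding", "Sign or Symptom", "Injury or Poisoning",
                    "Laboratory or Test Result", "Phenomenon or Process"],
    "Pathology": ["Disease or Syndrome", "Neoplastic Process",
                  "Mental or Behavioral Dysfunction",
                  "Cell or Molecular Dysfunction", "Pathologic Function"],
    "Physiology": ["Cell Function", "Organ or Tissue Function",
                   "Organism Function", "Physiologic Function"],
    "Qualitative Concept": ["Qualitative Concept"],
    "Quantitative Concept": ["Quantitative Concept"],
    "Substance": ["Pharmacologic Substance", "Substance",
                  "Biologically Active Substance",
                  "Biomedical or Dental Material"],
    "Temporal Concept": ["Temporal Concept"],
}

# Invert the table so each semantic type points to its single category.
TYPE_TO_CATEGORY = {
    sem_type: category
    for category, sem_types in CATEGORY_TO_TYPES.items()
    for sem_type in sem_types
}


def regroup(semantic_type):
    """Return the semantic category for a UMLS semantic type, or None if unmapped."""
    return TYPE_TO_CATEGORY.get(semantic_type)


print(regroup("Neoplastic Process"))  # Pathology
print(len(TYPE_TO_CATEGORY))          # 44
```

The total of 44 mapped types matches the sum of the Type Counts column.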
Ten most frequent co-occurring “Subject/Object” relationships identified from the corpus of 23,410 image reports
| Co-occurrence Pair | Example | Count |
|---|---|---|
| Location:Body Part | frontal view:of(Situated_at):vertebral body; lower outer quadrant:of(Modifies):right breast | 19,625 |
| Observation:Body Part | erythema:of(Occurs_in):left breast; mass lesion:in(Occurs_in):left breast | 15,219 |
| Pathology:Body Part | B-cell lymphoma:of(Occurs_in):breast; fibroadenoma:with(Modifies):tissue | 14,904 |
| Medical Activity:Body Part | ultrasound:in(Acts_on):left breast; CT scans:of(Acts_on):skull | 13,479 |
| Observation:Pathology | x-ray findings:as(Indicative_of):pleural effusions; all features:of(Indicative_of):fibroadenoma | 13,439 |
| Functional Concept:Pathology | outcome:of(Describes):breast cancer; case:of(Related_to):previous DCIS | 12,394 |
| Pathology:Pathology | Haemangiomas:are(Be):benign vascular tumors; complications:include(Has):vessel thrombosis | 12,119 |
| Medical Activity:Pathology | drainage:confirmed(Shows):breast abscess; mastectomy:for(Acts_on):breast malignancy | 11,924 |
| Medical Activity:Observation | Chest x-ray:performed for(Deals_with):chest pain; biopsy:of(Acts_on):small lesion | 11,890 |
| Observation:Observation | breast lump:with(Shows):occasional pain; features:of(Shows):benign lesion | 11,882 |
Fig. 3 Summary of different semantic types (top 22, among 289,782 NP and ADJP). The majority (80.32%) of the radiology case corpus is covered by the top 22 (16.3%) UMLS semantic types
Fig. 4 Knowledge model. The dotted lines show significant relationships in the co-occurrence network. The dotted box represents core semantic categories that are intrinsically closely related and are significant in the knowledge model
Fig. 5 Knowledge model example of two sentences: “Serial IVU films showing widely separated pubic bones with absent symphysis” and “Complex L-transposition of the great arteries with cardiac pacemaker”
Evaluation of semantic annotation performance
| Semantic Categories | True Positive (TP) | True Negative (TN) | False Positive (FP) | False Negative (FN) | Precision | Recall | F-Score |
|---|---|---|---|---|---|---|---|
| Abnormality | 16 | 1660 | 4 | 12 | 80.0% | 57.1% | 0.6667 |
| Body Part | 238 | 1438 | 38 | 26 | 86.2% | 90.2% | 0.8815 |
| Classification | 12 | 1664 | 0 | 6 | 100.0% | 66.7% | 0.8000 |
| Functional Concept | 90 | 1586 | 14 | 12 | 86.5% | 88.2% | 0.8738 |
| Location | 230 | 1446 | 54 | 58 | 81.0% | 79.9% | 0.8042 |
| Medical Activity | 22 | 1654 | 14 | 26 | 61.1% | 45.8% | 0.5238 |
| Medical Device and Object | 8 | 1668 | 14 | 0 | 36.4% | 100.0% | 0.5333 |
| Observation | 168 | 1508 | 16 | 62 | 91.3% | 73.0% | 0.8116 |
| Pathology | 202 | 1474 | 4 | 50 | 98.1% | 80.2% | 0.8821 |
| Physiology | 16 | 1660 | 4 | 4 | 80.0% | 80.0% | 0.8000 |
| Qualitative Concept | 172 | 1504 | 22 | 60 | 88.7% | 74.1% | 0.8075 |
| Quantitative Concept | 78 | 1598 | 4 | 4 | 95.1% | 95.1% | 0.9512 |
| Substance | 24 | 1652 | 12 | 24 | 66.7% | 50.0% | 0.5714 |
| Temporal Concept | 56 | 1620 | 2 | 0 | 96.6% | 100.0% | 0.9825 |
| Overall | 1332 | 22,132 | 202 | 344 | 86.8% | 79.5% | 0.8299 |
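The Precision, Recall, and F-Score columns follow from the TP, FP, and FN counts through the standard definitions. The sketch below recomputes the Overall row as a check. The function name `precision_recall_f1` is hypothetical, and the F-Score is assumed to be the balanced F1, which is consistent with the tabulated values.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and balanced F-score from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts from the "Overall" row of the evaluation table.
p, r, f = precision_recall_f1(tp=1332, fp=202, fn=344)
print(f"{p:.1%} {r:.1%} {f:.4f}")  # 86.8% 79.5% 0.8299
```

The same function reproduces the per-category rows, for example `precision_recall_f1(16, 4, 12)` for Abnormality.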