Fengbo Zheng, Rashmie Abeysinghe, Nicholas Sioutos, Lori Whiteman, Lyubov Remennik, Licong Cui.
Abstract
BACKGROUND: The National Cancer Institute (NCI) Thesaurus provides reference terminology for NCI and other systems. Previously, we proposed a hybrid prototype that uses lexical features and role definitions of concepts in non-lattice subgraphs to identify missing IS-A relations in the NCI Thesaurus. However, that work included no evaluation by domain experts. In this paper, we enhance the hybrid approach with a novel lexical feature: the roots of noun chunks within concept names. We also perform a formal evaluation of the enhanced approach.
Keywords: Lexical feature; NCI Thesaurus; Quality assurance; Role definition
Year: 2020 PMID: 33319703 PMCID: PMC7737275 DOI: 10.1186/s12911-020-01289-6
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
The role definition of concept “Sarcoma” (C9118) in NCI Thesaurus
| Role | Value |
|---|---|
| IS-A | Connective and soft tissue neoplasm |
| IS-A | Malignant neoplasm |
| Disease_Has_Abnormal_Cell | Malignant cell |
| Disease_Has_Abnormal_Cell | Neoplastic cell |
| Disease_Excludes_Normal_Cell_Origin | Epithelial cell |
| Disease_Excludes_Normal_Tissue_Origin | Epithelial tissue |
| Disease_Has_Associated_Anatomic_Site | Connective and soft tissue |
| Disease_Has_Normal_Tissue_Origin | Connective and soft tissue |
| Disease_Excludes_Finding | Benign cellular infiltrate |
| Disease_Excludes_Finding | Indolent clinical course |
| Disease_Excludes_Finding | Intermediate filaments present |
| Disease_Excludes_Finding | Intracytoplasmic eosinophilic inclusion |
| Disease_Has_Finding | Malignant cellular infiltrate |
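
A role definition like the one tabulated above is a multiset of (role, value) pairs, and a role such as `Disease_Excludes_Finding` can carry several values. The following minimal sketch groups the pairs by role so that the IS-A parents and the remaining roles can be inspected separately. The names `SARCOMA_ROLES` and `group_roles` are illustrative assumptions, not part of the NCI Thesaurus or of the authors' tooling.

```python
from collections import defaultdict

# (role, value) pairs transcribed from the table for "Sarcoma" (C9118).
SARCOMA_ROLES = [
    ("IS-A", "Connective and soft tissue neoplasm"),
    ("IS-A", "Malignant neoplasm"),
    ("Disease_Has_Abnormal_Cell", "Malignant cell"),
    ("Disease_Has_Abnormal_Cell", "Neoplastic cell"),
    ("Disease_Excludes_Normal_Cell_Origin", "Epithelial cell"),
    ("Disease_Excludes_Normal_Tissue_Origin", "Epithelial tissue"),
    ("Disease_Has_Associated_Anatomic_Site", "Connective and soft tissue"),
    ("Disease_Has_Normal_Tissue_Origin", "Connective and soft tissue"),
    ("Disease_Excludes_Finding", "Benign cellular infiltrate"),
    ("Disease_Excludes_Finding", "Indolent clinical course"),
    ("Disease_Excludes_Finding", "Intermediate filaments present"),
    ("Disease_Excludes_Finding", "Intracytoplasmic eosinophilic inclusion"),
    ("Disease_Has_Finding", "Malignant cellular infiltrate"),
]


def group_roles(pairs):
    """Group (role, value) pairs into a role -> set-of-values mapping."""
    grouped = defaultdict(set)
    for role, value in pairs:
        grouped[role].add(value)
    return dict(grouped)


sarcoma = group_roles(SARCOMA_ROLES)

# Direct parents are the values of the IS-A role.
parents = sarcoma["IS-A"]

# Every other role describes the concept itself rather than its position
# in the hierarchy.
non_hierarchical = {role: vals for role, vals in sarcoma.items() if role != "IS-A"}
```

Using sets per role keeps later comparisons between two concepts (for example, checking which role values they share) to plain set operations.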
Fig. 1 An example of a non-lattice subgraph in the 19.08d version of the NCI Thesaurus. Concepts are connected by IS-A relations. The red dotted line shows a potentially missing IS-A relation between the concepts “Cutaneous Pseudolymphoma” and “Non-Neoplastic Skin Disorder” identified by our method
Fig. 2 Semantic models of the concepts “Cutaneous Pseudolymphoma (C62776)” and “Non-Neoplastic Skin Disorder (C27555)” that are contained in the non-lattice subgraph shown in Fig. 1
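
The captions refer to non-lattice subgraphs without defining them. As a rough illustration, the sketch below assumes the definition commonly used in the terminology-auditing literature: a pair of concepts is non-lattice when it has more than one maximal common descendant in the IS-A hierarchy. That definition, the toy hierarchy, and all function names here are assumptions made for illustration; they are not taken from this record and are not the authors' implementation.

```python
# Toy IS-A hierarchy with hypothetical concept names:
# each concept maps to the set of its direct parents.
PARENTS = {
    "A": set(),
    "B": set(),
    "C": {"A", "B"},
    "D": {"A", "B"},
    "E": {"C"},
}


def ancestors(concept, parents):
    """Return all strict ancestors of a concept via IS-A edges."""
    seen = set()
    stack = list(parents[concept])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def descendants(concept, parents):
    """Return all strict descendants of a concept."""
    return {c for c in parents if concept in ancestors(c, parents)}


def maximal_common_descendants(a, b, parents):
    """Common descendants of a and b that have no ancestor among them."""
    common = descendants(a, parents) & descendants(b, parents)
    return {c for c in common if not (ancestors(c, parents) & common)}


def is_non_lattice_pair(a, b, parents):
    """Assumed definition: more than one maximal common descendant."""
    return len(maximal_common_descendants(a, b, parents)) > 1


# "A" and "B" share the descendants C, D and E; C and D are both maximal,
# so the pair (A, B) is non-lattice under the assumed definition.
pair_is_non_lattice = is_non_lattice_pair("A", "B", PARENTS)
```

The quadratic `descendants` helper is fine for a toy graph; a hierarchy the size of the NCI Thesaurus would need precomputed closures or a dedicated algorithm.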
The number of potentially missing IS-A relations identified for sub-hierarchies
| Sub-hierarchy | # of Non-lattice subgraphs | # of suggested missing IS-A relations |
|---|---|---|
| Disease, disorder or finding | 8075 | 34 |
| Experimental organism diagnosis | 257 | 18 |
| Drug, food, chemical or biomedical material | 922 | 1 |
| Molecular abnormality | 143 | 1 |
| Activity | 109 | 1 |
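
For a quick consistency check on the table above, the short sketch below totals both columns and computes how many suggestions each sub-hierarchy yields per 100 non-lattice subgraphs. The per-100 rate is an illustrative derived figure, not a metric reported in this record, and the variable names are assumptions.

```python
# Rows transcribed from the table:
# (sub-hierarchy, non-lattice subgraphs, suggested missing IS-A relations)
ROWS = [
    ("Disease, disorder or finding", 8075, 34),
    ("Experimental organism diagnosis", 257, 18),
    ("Drug, food, chemical or biomedical material", 922, 1),
    ("Molecular abnormality", 143, 1),
    ("Activity", 109, 1),
]

total_subgraphs = sum(subgraphs for _, subgraphs, _ in ROWS)        # 9506
total_suggestions = sum(suggested for _, _, suggested in ROWS)      # 55

# Suggestions per 100 non-lattice subgraphs, by sub-hierarchy.
rate_per_100 = {
    name: 100 * suggested / subgraphs for name, subgraphs, suggested in ROWS
}

for name, rate in rate_per_100.items():
    print(f"{name}: {rate:.2f} suggestions per 100 subgraphs")
```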
Ten examples of valid missing IS-A relations confirmed by EVS experts
| Subconcept | Superconcept |
|---|---|
| Glycine encephalopathy | Congenital nervous system disorder |
| Tumor infiltrating lymphocytes-N2-transduced | Therapeutic tumor infiltrating lymphocytes |
| Stage 0 anal cancer AJCC v8 | Anal precancerous condition |
| Cutaneous pseudolymphoma | Non-neoplastic skin disorder |
| Congenital vena cava abnormality | Congenital cardiovascular abnormality |
| Mouse cardiac fibrosarcoma | Mouse cardiac sarcoma |
| Fibrosarcoma of the mouse intestinal tract | Mouse malignant mesenchymal neoplasm |
| Carcinoma of the mouse larynx | Mouse carcinoma |
| Eyelid xanthoma | Non-neoplastic eyelid disorder |
| Autoimmune lymphoproliferative syndrome-associated lymphoma | Immunodeficiency-related malignant neoplasm |