Himanshu Sharma1, Chengsheng Mao2, Yizhen Zhang2, Haleh Vatani1, Liang Yao2, Yizhen Zhong2, Luke Rasmussen2, Guoqian Jiang3, Jyotishman Pathak4, Yuan Luo5.
Abstract
BACKGROUND: This paper presents a portable phenotyping system capable of integrating both rule-based and statistical machine learning approaches.
Year: 2019 PMID: 30943974 PMCID: PMC6448187 DOI: 10.1186/s12911-019-0786-z
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 A snippet of the patient input file
Fig. 2 A snippet of a MetaMap output record
Note_NLP table data elements
| Column name | Description |
|---|---|
| note_nlp_id | A unique identifier for each term extracted from a note. A randomly generated auto-incremented number. |
| note_id | A foreign key to the Note table: the note_id of the note from which the term was extracted. |
| section_concept_id | A representation of the note section to which the extracted concept belongs. |
| snippet | A small window of text surrounding the extracted concept. |
| offset | The offset of the extracted term, as provided by MetaMap in its output file. |
| lexical_variant | The actual phrase text that MetaMap generates. |
| note_nlp_concept_id | The concepts or CUIs. |
| nlp_system | NLP tool. |
| nlp_date_time | Date and Time of creation/running |
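To make the Note_NLP columns above concrete, the following is a minimal sketch of one row as a Python dataclass. The column names come from the table; the Python types, the `make_row` helper, the `window` parameter, and the sample CUI value are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class NoteNlpRow:
    """One extracted term, mirroring the Note_NLP columns listed above."""
    note_nlp_id: int           # unique identifier for the extracted term
    note_id: int               # foreign key to the Note table
    section_concept_id: int    # section the concept belongs to
    snippet: str               # small window of text around the concept
    offset: str                # offset reported by the NLP tool (type assumed)
    lexical_variant: str       # phrase text produced by the NLP tool
    note_nlp_concept_id: str   # concept identifier (CUI)
    nlp_system: str            # name of the NLP tool
    nlp_date_time: datetime    # when the tool was run


def make_row(next_id, note_id, text, start, length, cui, window=20):
    """Hypothetical helper: build a row from a character span in a note."""
    lo = max(0, start - window)
    hi = min(len(text), start + length + window)
    return NoteNlpRow(
        note_nlp_id=next_id,
        note_id=note_id,
        section_concept_id=0,  # placeholder: section detection is out of scope
        snippet=text[lo:hi],
        offset=str(start),
        lexical_variant=text[start:start + length],
        note_nlp_concept_id=cui,
        nlp_system="MetaMap",
        nlp_date_time=datetime.now(),
    )


note = "Patient denies chest pain on exertion."
start = note.index("chest pain")
row = make_row(1, 42, note, start, len("chest pain"), cui="C0000001", window=5)
```

The CUI `C0000001` is a placeholder, not a real mapping for the phrase.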
Fig. 3 The word position recording process in our work
Fig. 4 A sample of the matched regex tables. (a) the table for words related to ‘Questionable’; (b) the table for words related to ‘Absent’; (c) the table for words related to ‘Present’
The parameter grids for grid search
| Classifier | Parameter grid |
|---|---|
| LR | 'C': [0.01, 0.1, 1, 10, 100] |
| SVM | 'C': [0.01, 0.1, 1, 10, 100], |
| DT | 'criterion': ['gini', 'entropy'] |
| RF | 'n_estimators': [5, 10, 30, 50, 80, 100], |
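As a rough illustration of how such grids drive a grid search, the sketch below enumerates every candidate setting and keeps the best-scoring one. It uses only the parameters that are legible in the table above (the SVM and RF entries appear to be cut off, so any further parameters are omitted). The `expand_grid` and `grid_search` helpers and the toy scoring function are hypothetical; in practice a library routine such as scikit-learn's `GridSearchCV` with cross-validation would typically fill this role.

```python
from itertools import product

# Grids as listed in the table; truncated entries are left incomplete.
PARAM_GRIDS = {
    "LR": {"C": [0.01, 0.1, 1, 10, 100]},
    "SVM": {"C": [0.01, 0.1, 1, 10, 100]},
    "DT": {"criterion": ["gini", "entropy"]},
    "RF": {"n_estimators": [5, 10, 30, 50, 80, 100]},
}


def expand_grid(grid):
    """Return every combination of parameter values as a list of dicts."""
    names = sorted(grid)
    combos = product(*(grid[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


def grid_search(grid, score_fn):
    """Evaluate score_fn on each candidate and return the best (params, score)."""
    best_params, best_score = None, float("-inf")
    for params in expand_grid(grid):
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy scoring function standing in for a cross-validated F-score.
def toy_score(params):
    return -abs(params["C"] - 1)


best_params, best_score = grid_search(PARAM_GRIDS["LR"], toy_score)
```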
The classification results on all CUIs corresponding to the original records
| | P-Micro | P-Macro | R-Micro | R-Macro | F-Micro | F-Macro |
|---|---|---|---|---|---|---|
| Intuitive | | | | | | |
| LR | 0.8719 | 0.5792 | 0.8719 | 0.5509 | 0.8719 | 0.5618 |
| SVM | 0.8727 | 0.5776 | 0.8727 | 0.5537 | 0.8727 | 0.5632 |
| DT | | | | | | |
| RF | 0.8524 | 0.5626 | 0.8524 | 0.5349 | 0.8524 | 0.5454 |
| Textual | | | | | | |
| LR | 0.8846 | 0.4379 | 0.8846 | 0.4195 | 0.8846 | 0.4268 |
| SVM | 0.8886 | 0.4384 | 0.8886 | 0.4243 | 0.8886 | 0.4300 |
| DT | | | | | | |
| RF | 0.8621 | 0.4220 | 0.8621 | 0.4044 | 0.8621 | 0.4112 |
For each task, the best results are bolded
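In the results tables, the micro-averaged precision, recall, and F-score coincide within each row, while the macro-averaged values are much lower. The sketch below shows one common way to compute both kinds of average and why this pattern arises when every record receives exactly one label. The function name, the toy labels, and the choice to define macro-F as the mean of per-class F-scores are assumptions; the paper's exact evaluation script is not reproduced here.

```python
from collections import Counter


def micro_macro_prf(y_true, y_pred):
    """Return ((P, R, F) micro, (P, R, F) macro) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def prf(t, p, n):
        precision = t / (t + p) if t + p else 0.0
        recall = t / (t + n) if t + n else 0.0
        denom = precision + recall
        f_score = 2 * precision * recall / denom if denom else 0.0
        return precision, recall, f_score

    # Micro: pool the counts over all classes, then compute once.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute per class, then take the unweighted mean.
    per_class = [prf(tp[c], fp[c], fn[c]) for c in labels]
    macro = tuple(sum(m[i] for m in per_class) / len(labels) for i in range(3))
    return micro, macro


y_true = ["Present", "Present", "Absent", "Absent", "Questionable"]
y_pred = ["Present", "Absent", "Absent", "Absent", "Absent"]
micro, macro = micro_macro_prf(y_true, y_pred)
```

Because each misclassified record adds exactly one false positive and one false negative, the pooled counts give identical micro precision, recall, and F-score, all equal to accuracy. A rare class that is never predicted correctly (here `Questionable`) contributes zeros to the macro average, which pulls the macro figures down.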
The classification results without family history related CUIs
| | P-Micro | P-Macro | R-Micro | R-Macro | F-Micro | F-Macro |
|---|---|---|---|---|---|---|
| Intuitive | | | | | | |
| LR | 0.8716 | 0.5794 | 0.8716 | 0.5503 | 0.8716 | 0.5615 |
| SVM | 0.8735 | 0.5780 | 0.8735 | 0.5546 | 0.8735 | 0.5640 |
| DT | | | | | | |
| RF | 0.8627 | 0.5685 | 0.8627 | 0.5462 | 0.8627 | 0.5551 |
| Textual | | | | | | |
| LR | 0.8836 | 0.4372 | 0.8836 | 0.4189 | 0.8836 | 0.4262 |
| SVM | 0.8895 | 0.4391 | 0.8895 | 0.4248 | 0.8895 | 0.4306 |
| DT | | | | | | |
| RF | 0.8618 | 0.4210 | 0.8618 | 0.4049 | 0.8618 | 0.4112 |
For each task, the best results are bolded
Fifteen semantic types selected for clinical feature representations [21]
| TUI | Semantic group | Semantic type description |
|---|---|---|
| T017 | Anatomy | Anatomical Structure |
| T022 | Anatomy | Body System |
| T023 | Anatomy | Body Part, Organ, or Organ Component |
| T033 | Disorders | Finding |
| T034 | Phenomena | Laboratory or Test Result |
| T047 | Disorders | Disease or Syndrome |
| T048 | Disorders | Mental or Behavioral Dysfunction |
| T049 | Disorders | Cell or Molecular Dysfunction |
| T059 | Procedures | Laboratory Procedure |
| T060 | Procedures | Diagnostic Procedure |
| T061 | Procedures | Therapeutic or Preventive Procedure |
| T121 | Chemicals & Drugs | Pharmacologic Substance |
| T122 | Chemicals & Drugs | Biomedical or Dental Material |
| T123 | Chemicals & Drugs | Biologically Active Substance |
| T184 | Disorders | Sign or Symptom |
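One way to apply this selection is to keep only the extracted concepts whose semantic types intersect the fifteen listed above. The sketch below assumes each concept arrives as a `(cui, semantic_types)` pair; that input shape, the function name, and the placeholder CUIs are illustrative assumptions rather than the paper's implementation.

```python
# The fifteen semantic type identifiers from the table above.
SELECTED_SEMANTIC_TYPES = {
    "T017", "T022", "T023",          # Anatomy
    "T033", "T047", "T048", "T049",  # Disorders
    "T184",                          # Disorders (Sign or Symptom)
    "T034",                          # Phenomena
    "T059", "T060", "T061",          # Procedures
    "T121", "T122", "T123",          # Chemicals & Drugs
}


def filter_concepts(concepts):
    """Keep CUIs that carry at least one of the selected semantic types."""
    return [
        cui
        for cui, semantic_types in concepts
        if SELECTED_SEMANTIC_TYPES.intersection(semantic_types)
    ]


# Placeholder CUIs; the type assignments are made up for the example.
extracted = [
    ("C0000001", ["T184"]),          # kept: Sign or Symptom
    ("C0000002", ["T170"]),          # dropped: type not in the selection
    ("C0000003", ["T170", "T047"]),  # kept: one of its types is selected
]
kept = filter_concepts(extracted)
```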
The classification results without family history on 15 types of selected CUIs
| | P-Micro | P-Macro | R-Micro | R-Macro | F-Micro | F-Macro |
|---|---|---|---|---|---|---|
| Intuitive | | | | | | |
| LR | 0.9024 | 0.6040 | 0.9024 | 0.5763 | 0.9024 | 0.5874 |
| SVM | 0.9077 | 0.6055 | 0.9077 | 0.5831 | 0.9077 | 0.5924 |
| DT | | | | | | |
| RF | 0.8784 | 0.5849 | 0.8784 | 0.5559 | 0.8784 | 0.5671 |
| Textual | | | | | | |
| LR | 0.9145 | 0.4560 | 0.9145 | 0.4410 | 0.9145 | 0.4472 |
| SVM | 0.9227 | | 0.9227 | 0.4532 | 0.9227 | 0.4607 |
| DT | | 0.4878 | | | | |
| RF | 0.8830 | 0.4353 | 0.8830 | 0.4195 | 0.8830 | 0.4258 |
For each task, the best results are bolded
The best rule-based classification results reported in [20]
| | P-Micro | P-Macro | R-Micro | R-Macro | F-Micro | F-Macro |
|---|---|---|---|---|---|---|
| Intuitive | 0.9590 | 0.7485 | 0.9590 | 0.6571 | 0.9590 | 0.6745 |
| Textual | 0.9756 | 0.8318 | 0.9756 | 0.7776 | 0.9756 | 0.8000 |
The classification results for major classes on all CUIs corresponding to the original records
| | P-Micro | P-Macro | R-Micro | R-Macro | F-Micro | F-Macro |
|---|---|---|---|---|---|---|
| Intuitive | | | | | | |
| LR | 0.8709 | | 0.8709 | 0.5733 | 0.8709 | 0.5960 |
| SVM | 0.8724 | | 0.8724 | 0.5770 | 0.8724 | 0.5981 |
| DT | | | | | | |
| RF | 0.8466 | 0.6226 | 0.8466 | 0.5559 | 0.8466 | 0.5765 |
| Textual | | | | | | |
| LR | 0.8882 | | 0.8882 | | 0.8882 | |
| SVM | 0.8930 | | 0.8930 | | 0.8930 | |
| DT | | | | | | |
| RF | 0.8882 | | 0.8882 | | 0.8882 | |
For each task, the best results are bolded. The underlined results would rank among the top 10 results reported in [20]
The classification results for major classes without family history related CUIs
| | P-Micro | P-Macro | R-Micro | R-Macro | F-Micro | F-Macro |
|---|---|---|---|---|---|---|
| Intuitive | | | | | | |
| LR | 0.8723 | | 0.8723 | 0.5741 | 0.8723 | 0.5970 |
| SVM | 0.8732 | | 0.8732 | 0.5780 | 0.8732 | 0.5989 |
| DT | | | | | | |
| RF | 0.8559 | | 0.8559 | 0.5623 | 0.8559 | 0.5838 |
| Textual | | | | | | |
| LR | 0.8886 | | 0.8886 | | 0.8886 | |
| SVM | 0.8938 | | 0.8938 | | 0.8938 | |
| DT | | | | | | |
| RF | 0.8640 | | 0.8640 | | 0.8640 | |
For each task, the best results are bolded. The underlined results would rank among the top 10 results reported in [20]
The classification results for major classes without family history on the 15 types of selected CUIs
| | P-Micro | P-Macro | R-Micro | R-Macro | F-Micro | F-Macro |
|---|---|---|---|---|---|---|
| Intuitive | | | | | | |
| LR | 0.9001 | | 0.9001 | 0.5979 | 0.9001 | 0.6206 |
| SVM | 0.9074 | | 0.9074 | 0.6065 | 0.9074 | 0.6274 |
| DT | | | | | | |
| RF | 0.8690 | | 0.8690 | 0.5740 | 0.8690 | 0.5952 |
| Textual | | | | | | |
| LR | 0.9188 | | 0.9188 | | 0.9188 | |
| SVM | 0.9273 | | 0.9273 | | 0.9273 | |
| DT | | | | | | |
| RF | 0.8864 | | 0.8864 | | 0.8864 | |
For each task, the best results are bolded. The underlined results would rank among the top 10 results reported in [20]