Tingting Zhang, Yaqiang Wang, Xiaofeng Wang, Yafei Yang, Ying Ye.
Abstract
BACKGROUND: In this study, we focus on building a fine-grained entity annotation corpus of traditional Chinese medicine (TCM) clinical records, together with the corresponding annotation guideline. Our aim is to provide a basis for the future construction of fine-grained corpora of TCM clinical records.
Keywords: Corpus construction; Fine-grained annotation; Guideline development; Named entity recognition; TCM clinical records
Year: 2020 PMID: 32252745 PMCID: PMC7132896 DOI: 10.1186/s12911-020-1079-2
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Studies on the construction of Chinese clinical text corpora in the last five years
| Year | Author | Scale and target | Entities | Fine-grained | TCM clinical texts |
|---|---|---|---|---|---|
| 2014 | Xu et al. | 336 Chinese discharge summaries of 71,355 words | Medication, anatomy, medical problems, treatments, and tests | N | N |
| 2014 | Lei et al. | 400 admission notes and 400 discharge summaries | Clinical problems, procedures, laboratory tests, and medications | N | N |
| 2014 | Wang et al. | 11,613 clinical records | Symptoms | N | Y |
| 2014 | Wang et al. | 115 EMRs | 115 documents on tumor-related information from the notes of hepatic carcinoma operations | N | N |
| 2014 | Gao et al. | 42 health records of stroke | Body structures and clinical description | N | Y |
| 2015 | Li et al. | 700 initial diagnosis records; congestive heart failure data of 253 cases | TCM herbs and symptoms | N | Y |
| 2015 | Xu et al. | 24,817 anonymized Chinese EMRs | Symptoms, clinical tests, diseases, drugs, body parts, and procedure categories | N | Y |
| 2016 | Zhang et al. | 2000 notes (1000 admission notes and 1000 discharge summaries) | Diseases and syndromes, symptoms and signs, treatments and drugs, and laboratory tests | N | N |
| 2016 | Wan et al. | More than 100,000 TCM article abstracts | Herbs, syndromes, diseases, and formulas | N | Y |
| 2016 | Liu et al. | 1778 clinical notes of 281 hospitalized patients | Temporal expression and normalization in Chinese clinical notes (type, value, and modifier) | N | N |
| 2017 | Ruan et al. | 1000 EMRs | Symptoms, departments, diseases, medicines, and examinations | N | Y |
| 2017 | He et al. | 500 discharge summaries and 492 progress notes | Diseases, symptoms, and treatments | N | N |
| 2018 | Zhang et al. | 400 documents | Symptoms, tests, diagnoses, treatments, and body parts | N | N |
| 2018 | Miao et al. | 540 reports | Breast Imaging Reporting and Data System | N | N |
| 2018 | Bao et al. | 600 documents | History of present illness, personal history, and family history | N | N |
| 2019 | Wang et al. | 1596 annotated instances (10,024 sentences) | Diseases, symptoms, exams, treatments, and body parts | N | N |
| 2019 | Gao et al. | 255 authentic admission records | Medical discovery, body parts, temporal words, diseases, medications, treatments, inspections, laboratory tests, and measurements | N | N |
| 2019 | Cai et al. | 1000 admission records | Anatomical parts, symptom descriptions, independent symptoms, drugs, and operations | N | N |
| 2019 | Xiong et al. | 1000 admission notes and 800 discharge summaries | Body parts, diseases, symptoms, tests, and treatments | Y | N |
Fig. 1 First and second levels of concepts in TCMLS. The figure was generated by Microsoft Visio 2013.
Definition and examples of the 13 entities used in this study
| Entity type | Definition | Examples (entities are in bold font) |
|---|---|---|
| Ordinary body part | This entity enables us to locate the exact positions of symptoms, medical tests, or diseases. | |
| Tongue body | The musculature and vascular tissue of the tongue, also called the tongue substance. It is annotated only when followed by a specific description of the tongue's physical manifestation. | |
| Tongue coating | A layer of moss-like material covering the tongue, also called tongue fur. It is annotated only when followed by a description of the tongue coating manifestation. | 舌红, |
| Pulse | The radial artery at the wrist, which includes three sections: cun, guan, and chi. | 舌红, 苔黄, |
| Acupoint | A point where a needle is inserted and manipulated in acupuncture therapy. | |
| Meridian and collateral | A system of conduits through which qi and blood circulate, connecting the bowels, viscera, extremities, superficial organs, and tissues, and making the body an organic whole. Also known as channels and networks, or simply as meridians or channels. | 左大腿 |
| Zang organ | An internal organ in which essence and qi are formed and stored. These organs are the heart, liver, spleen, lungs, and kidneys, collectively called the five viscera. | 一直服调 |
| Fu organ | An internal organ in which food is received, transported, and digested. These organs are the gallbladder, stomach, large intestine, small intestine, urinary bladder, and triple energizers (note g), collectively called the six bowels. | |
| Both tongue body and coating | Words referring to both the tongue body and the tongue coating. | |
| Tongue body manifestation | Specific description of the tongue body manifestation, including tongue color, shape, and sublingual vein. | 舌 |
| Tongue coating manifestation | Specific description of the tongue coating manifestation, including color, thickness, and texture. | 舌红, 苔 |
| Pulse condition | Specific description of arterial pulsation in TCM when the pulse is felt during examination. | 舌红, 苔黄, 脉 |
| Direction and position | Description of direction and position, which enables us to know the specific location of a body part. | |
g In TCM, the Fu organ known as the "triple energizers" is a collective term for the three portions of the body cavity through which visceral qi is transformed. It is also widely known as the "triple burners" and comprises the upper, middle, and lower energizers. It is also called the "solitary hollow organ" because the "triple energizers" have no paired relationship with the viscera.
Fig. 2 Screenshot of the entity annotation tool
Fig. 3 Examples of fine-grained annotation
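Fine-grained annotations of this kind are usually stored as character spans and converted to per-character tags before training a named entity recognition model. The sketch below is a minimal illustration under assumptions not taken from this study: character-level BIO tagging, end-exclusive offsets, and hypothetical label identifiers (`DirectionPosition`, `OrdinaryBodyPart`, and so on) standing in for the 13 entity types. The sample sentence is also illustrative, not drawn from the corpus.

```python
def spans_to_bio(text, spans):
    """Convert (start, end, label) character spans into per-character BIO tags.

    `end` is exclusive. Overlapping spans are rejected because a flat BIO
    sequence cannot represent nested entities.
    """
    tags = ["O"] * len(text)
    for start, end, label in sorted(spans):
        if not 0 <= start < end <= len(text):
            raise ValueError(f"span ({start}, {end}) is out of range")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"span ({start}, {end}) overlaps another span")
        tags[start] = "B-" + label
        for i in range(start + 1, end):
            tags[i] = "I-" + label
    return tags


# Hypothetical record fragment; label names are illustrative only.
text = "左大腿痛，舌红，苔黄，脉细"
spans = [
    (0, 1, "DirectionPosition"),            # 左 (left)
    (1, 3, "OrdinaryBodyPart"),             # 大腿 (thigh)
    (5, 6, "TongueBody"),                   # 舌 (tongue)
    (6, 7, "TongueBodyManifestation"),      # 红 (red)
    (8, 9, "TongueCoating"),                # 苔 (coating)
    (9, 10, "TongueCoatingManifestation"),  # 黄 (yellow)
    (11, 12, "Pulse"),                      # 脉 (pulse)
    (12, 13, "PulseCondition"),             # 细 (thready)
]

for char, tag in zip(text, spans_to_bio(text, spans)):
    print(char, tag)
```

Characters outside any entity (here the symptom word 痛 and the punctuation) receive `O`. Because a flat BIO sequence cannot represent nested entities, the sketch rejects overlapping spans.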
Fig. 4 Workflow of entity selection, guideline development and corpus construction, and IAA measurement. The figure was generated by Microsoft Visio 2013.
Fig. 5 Pairwise IAA (κ) of the three annotators (W, Y, Z) for the first four rounds of annotation and final corpus construction. The figure was generated by Microsoft Word 2016.
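Pairwise inter-annotator agreement for three annotators means computing one κ value for each of the pairs (W, Y), (W, Z), and (Y, Z). The sketch below assumes Cohen's kappa computed over aligned per-item labels (for example, one label per character). The exact agreement unit and formula used in the study are not stated here, so treat this as an illustration, not the authors' procedure. The annotator label sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same sequence of items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items that received the same label.
    p_o = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one and the same label for every item.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def pairwise_kappa(annotations):
    """Kappa for every pair of annotators, keyed by (name1, name2)."""
    return {
        (a, b): cohen_kappa(annotations[a], annotations[b])
        for a, b in combinations(sorted(annotations), 2)
    }


# Hypothetical per-character labels for 舌红苔黄脉细 from three annotators.
annotations = {
    "W": ["TB", "TBM", "TC", "TCM", "P", "PC"],
    "Y": ["TB", "TBM", "TC", "TCM", "P", "PC"],
    "Z": ["TB", "TBM", "TC", "O", "P", "PC"],
}
for pair, kappa in pairwise_kappa(annotations).items():
    print(pair, round(kappa, 3))
```

For real data, `sklearn.metrics.cohen_kappa_score` computes the same statistic. The hand-written version is shown here so that the observed-agreement and chance-agreement terms are explicit.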
Numbers of entities and annotations
| Entity type | Total entity count | Total annotation count | Percentage within the entity classification (entity/annotation) |
|---|---|---|---|
| Ordinary body part | 462 | 21,093 | 75.3%/56.3% |
| Pulse | 22 | 6148 | 3.6%/16.4% |
| Tongue coating | 10 | 4978 | 1.6%/13.3% |
| Tongue body | 7 | 3789 | 1.1%/10.1% |
| Acupoint | 87 | 469 | 14.2%/1.3% |
| Zang organ | 5 | 139 | 0.8%/0.4% |
| Meridian and collateral | 16 | 34 | 2.6%/0.1% |
| Fu organ | 2 | 3 | 0.3%/0.008% |
| Both tongue body and coating | 2 | 793 | 0.3%/2.1% |
| Tongue coating manifestation | 102 | 10,911 | 38.9%/72.7% |
| Tongue body manifestation | 160 | 4088 | 61.1%/27.2% |
| Pulse condition | 90 | 9573 | 100%/100% |
| Direction and position | 139 | 5781 | 100%/100% |
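The percentage column expresses each entity type's share of the entity and annotation totals within its classification. The sketch below recomputes these shares from the counts in the table. Splitting the rows into a body-part group and a tongue-manifestation group is an assumption, inferred from the fact that the percentages sum to roughly 100% within each group. The dictionary names are hypothetical.

```python
# Entity and annotation counts transcribed from the table above.
# The grouping of rows into classifications is an assumption inferred
# from the percentages, which sum to about 100% within each group.
BODY_PART_GROUP = {
    "Ordinary body part": (462, 21093),
    "Pulse": (22, 6148),
    "Tongue coating": (10, 4978),
    "Tongue body": (7, 3789),
    "Acupoint": (87, 469),
    "Zang organ": (5, 139),
    "Meridian and collateral": (16, 34),
    "Fu organ": (2, 3),
    "Both tongue body and coating": (2, 793),
}
TONGUE_MANIFESTATION_GROUP = {
    "Tongue coating manifestation": (102, 10911),
    "Tongue body manifestation": (160, 4088),
}


def within_group_shares(group):
    """Return {type: (entity %, annotation %)} relative to the group totals."""
    total_entities = sum(e for e, _ in group.values())
    total_annotations = sum(a for _, a in group.values())
    return {
        name: (100.0 * e / total_entities, 100.0 * a / total_annotations)
        for name, (e, a) in group.items()
    }


for group in (BODY_PART_GROUP, TONGUE_MANIFESTATION_GROUP):
    for name, (ent_pct, ann_pct) in within_group_shares(group).items():
        print(f"{name}: {ent_pct:.1f}%/{ann_pct:.1f}%")
```

The body-part group totals 613 entities and 37,446 annotations. Rounded to one decimal place, a few recomputed values differ from the table in the last digit (for example, 462/613 gives 75.4%), possibly because the table truncates rather than rounds.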
Examples of the top-10 entities for each entity class
| Entity class | Total count | Entity examples (top 10) and number of occurrences |
|---|---|---|
| Ordinary body part | 21,091 | 口 (mouth; 2252), 头 (head; 1853), 腹 (abdomen; 1689), 胃 (stomach; 1267), 喉 (larynx; 962), 腰 (waist; 893), 肢 (limbs; 686), 背 (back; 585), 身 (body; 583), 手 (hand; 578) |
| Pulse | 6148 | 脉 (pulse; 6091), 尺脉 (chi pulse; 11), 肾脉 (kidney pulse; 10), 关 (guan; 6), 寸 (cun; 6), 尺 (chi; 4), 关脉 (guan pulse; 4), 肝 (liver; 2), 沉取 (taking deeply; 1), 脉沉取 (taking the deep pulse; 1) |
| Tongue coating | 4978 | 苔 (coating; 4765), 舌苔 (tongue coating; 188), 舌 (tongue; 16) |
| Tongue body | 3789 | 舌 (tongue; 3695), 舌质 (tongue body; 87), 苔 (tongue coating; 3), 舌苔 (tongue coating; 1), 舌头 (tongue; 1), 质 (tongue body; 1) |
| Acupoints | 469 | 风池 (GB20; 66), 太阳穴 (EX-HN5; 51), 肩井 (GB21; 40), 大椎 (GV14; 30), 环跳 (GB30; 27), 肩髃 (LI15; 14), 少海 (HT3; 12), 委中 (BL40; 11), 承扶 (BL36; 11), 天宗 (SI11; 10) |
| Zang organ | 139 | 心 (heart; 125), 肺 (lung; 5), 肾 (kidney; 4), 脾 (spleen; 3) |
| Meridians and collaterals | 34 | 膀胱经 (bladder meridian, BL; 8), 胃经 (stomach meridian, ST; 6), 大肠经 (large intestine meridian, LI; 4), 肝经 (liver meridian, LR; 2), 足太阳 (bladder meridian, BL; 2), 心经 (heart meridian, HT; 1), 肺经 (lung meridian, LU; 1), 手阳明经 (large intestine meridian, LI; 1), 足少阳 (gallbladder meridian, GB; 1), 小肠经 (small intestine meridian, SI; 1) |
| Fu organ | 3 | 胆 (gallbladder; 2), 胃 (stomach; 1) |
| Both tongue body and coating | 793 | 舌 (tongue; 793) |
| Tongue coating manifestation | 10,911 | 薄 (thin; 3612), 黄 (yellow; 1907), 腻 (slimy; 1725), 干 (dry; 791), 白 (white; 738), 略黄 (slightly yellow; 570), 少 (less; 365), 厚 (thick; 254), 润 (moist; 233), 滑 (slippery; 150) |
| Tongue body manifestation | 4088 | 红 (red; 893), 淡 (pale; 564), 略红 (slightly red; 467), 暗 (dark; 216), 略暗 (slightly dark; 216), 红暗 (red and dark; 195), 齿印 (teeth-marked; 144), 暗红 (dark and red; 127), 淡暗 (pale and dark; 126), 略淡 (slightly pale; 122) |
| Pulse condition | 9573 | 细 (thready; 3493), 弦 (string-like; 1364), 弱 (faint; 841), 沉 (sunken; 651), 滑 (slippery; 616), 数 (rapid; 534), 软 (soft; 473), 平 (normal; 420), 略弦 (slightly string-like; 180), 略数 (slightly rapid; 123) |
| Direction and position | 5781 | 左 (left; 1262), 右 (right; 1110), 下 (lower; 736), 上 (upper; 282), 心 (center; 273), 中 (middle; 199), 尖 (tip; 193), 前 (front; 141), 外 (outside; 136), 外侧 (outward; 128) |
Fig. 6 Examples of some relationships between the top-10 syndromes and the top-10 pulse conditions. The solid lines show some of the many possible relations between syndromes and pulse conditions; the absence of a line does not mean that there is no relation between them. The figure was generated by Microsoft PowerPoint 2016.
Fig. 7 Examples of some relationships between the top-10 syndromes and the top-10 tongue body and coating manifestations. The solid lines show some of the many possible relations between them; the absence of a line does not mean that there is no relation between them. The figure was generated by Microsoft PowerPoint 2016.
Examples of top-10 annotated acupoints in corresponding meridians
| Meridians | Annotated acupoints and number of occurrences | Examples |
|---|---|---|
| Lung meridian (LU) | 鱼际 (LU10; 3), 云门 (LU2; 3) | “右 |
| Large intestine meridian (LI) | 肩髃 (LI15; 14), 曲池 (LI11; 9), 合谷 (LI4; 8), 臂臑 (LI14; 7), 手三里 (LI10; 5), 巨骨 (LI16; 4), 肘髎 (LI12; 1) | “左 |
| Stomach meridian (ST) | 解溪 (ST41; 5), 髀关 (ST31; 4), 气冲 (ST30; 2), 梁丘 (ST34; 2), 下关 (ST7; 2), 内庭 (ST44; 1), 足三里 (ST36; 1), 丰隆 (ST40; 1), 人迎 (ST9; 1) | “左膝, |
| Spleen meridian (SP) | 血海 (SP10; 1), 三阴交 (SP6; 1), 大横 (SP15; 1), 腹结 (SP14; 1) | “下肢麻痹, 左, |
| Heart meridian (HT) | 少海 (HT3; 12) | “左锁骨头痛, 右 |
| Small intestine meridian (SI) | 天宗 (SI11; 10), 秉风 (SI12; 5), 曲垣 (SI13; 2), 肩贞 (SI9; 1), 秉风穴 (SI12; 1), 天容 (SI17; 1) | “右 |
| Bladder meridian (BL) | 委中 (BL40; 11), 承扶 (BL36; 11), 白环俞 (BL30; 10), 大肠俞 (BL; 9), 承山 (BL57; 7), 秩边 (BL54; 3), 昆仑 (BL60; 2), 通天 (BL7; 2), 申脉 (BL62; 1) | “右 |
| Kidney meridian (KI) | 涌泉 (KI1; 2), 太溪 (KI3; 2), 然谷 (KI2; 1) | “脚底热感, |
| Pericardium meridian (PC) | 大陵 (PC7; 1) | “痛点, 左阳池, 左少海, 右 |
| Triple energizer meridian (TE) | 阳池 (TE4; 8), 耳门 (TE21; 2), 肩髎 (TE14; 2) | 手关节痛, |
| Gallbladder meridian (GB) | 风池 (GB20; 66), 肩井 (GB21; 40), 环跳 (GB30; 27), 居髎 (GB29; 10), 阳陵泉 (GB34; 5), 侠溪 (GB40; 1), 维道 (GB28; 1) | “现痛点, |
| Liver meridian (LR) | 急脉 (LR12; 2), 太冲 (LR3; 2) | “便秘, 右 |
| Extra point (EX) | 太阳穴 (EX-HN5; 51), 太阳 (EX-HN5; 6), 外膝眼 (EX-LE5; 4), 夹脊 (EX-B2; 3), 膝眼 (EX-B6; 2), 鹤顶 (EX-LE2; 1), 颈百劳 (EX-UX8; 1), 腰眼 (EX-B6; 1) | “头痛, |
| Governor vessel (GV) | 大椎 (GV14; 30), 腰阳关 (GV3; 8), 长强 (GV1; 1), 风府 (GV16; 1), 前顶 (GV21; 1) | “项背强痛, 右 |
| Conception vessel (CV) | 中脘 (CV12; 1), 曲骨 (CV2; 1) | “ |
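The table above groups annotated acupoints by the meridian encoded in their alphanumeric codes. These codes consist of a two-letter channel prefix, an optional region suffix for extra points (for example, EX-HN5), and a point number. The sketch below performs that grouping under the assumption that every code follows this pattern. The function name and the sample rows (a subset copied from the table) are illustrative only, not part of the study's tooling.

```python
import re
from collections import defaultdict

# Assumed code shape: two-letter channel prefix, optional region suffix
# for extra points (e.g. "EX-HN5"), then the point number.
CODE_PATTERN = re.compile(r"^([A-Z]{2})(?:-[A-Z]{1,2})?(\d+)$")


def group_by_channel(acupoints):
    """Group (name, code, count) triples by channel prefix, most frequent first."""
    groups = defaultdict(list)
    for name, code, count in acupoints:
        match = CODE_PATTERN.match(code)
        if match is None:
            raise ValueError(f"unrecognised acupoint code: {code!r}")
        groups[match.group(1)].append((name, code, count))
    for points in groups.values():
        points.sort(key=lambda point: point[2], reverse=True)
    return dict(groups)


# A few rows copied from the table above.
sample = [
    ("风池", "GB20", 66), ("肩井", "GB21", 40), ("太阳穴", "EX-HN5", 51),
    ("大椎", "GV14", 30), ("曲池", "LI11", 9), ("肩髃", "LI15", 14),
]
for channel, points in sorted(group_by_channel(sample).items()):
    print(channel, [f"{n} ({c}; {k})" for n, c, k in points])
```

A code without a point number, such as the "BL" entry listed for 大肠俞 in the table, is rejected rather than silently assigned to a group.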