Tao Zheng, Yimei Gao, Fei Wang, Chenhao Fan, Xingzhi Fu, Mei Li, Ya Zhang, Shaodian Zhang, Handong Ma.
Abstract
BACKGROUND: Imaging examinations, such as ultrasonography, magnetic resonance imaging and computed tomography scans, play key roles in healthcare settings. To assess and improve the quality of imaging diagnosis, we need to manually find and compare pre-existing imaging and pathology reports that cover overlapping body sites in electronic medical records (EMRs). Retrieving those reports is time-consuming. In this paper, we propose a convolutional neural network (CNN) based method that makes better use of the semantic information contained in report texts to accelerate the retrieval process.
Keywords: Convolutional neural network; LIME; Natural language processing; Text similarity
Year: 2019 PMID: 31391038 PMCID: PMC6686478 DOI: 10.1186/s12911-019-0880-2
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
A report-pair in this study
| Language | Imaging report content | Pathologic report content |
|---|---|---|
| English | The solid hypoechoic area of the subcutaneous tissues of maxillofacial region is 14.6 mm × 10.4 mm and covered with a capsule. The boundary is clear and the shape is regular. | The specimen for pathological examination contains one mass. The size of mass is 1.2 × 1 × 1 cm, the color is gray red and the capsule is complete. (Parotid gland) favor a diagnosis of pleomorphic adenoma. The lesion contains abundant cells without a clear limit out of the surrounding tissue. |
| Chinese | 颌面部所指处皮下见实质性低回声区14.6 mm × 10.4 mm, 边界清, 有包膜, 形态规则。 | 肿块一枚, 大小1.2*1*1 cm, 灰红色, 包膜完整。(腮腺)多形性腺瘤, 细胞丰富, 与周围组织分界不清。 |
Fig. 1 Workflow of detecting text semantic similarity
Fig. 2 CNN-based neural network for text similarity detection
Performance comparison of different models, including precision, recall, F1-score and AUC (all values are mean ± std)
| Model | Macro average: Precision | Macro average: Recall | Macro average: F1-score | Positive class: Precision | Positive class: Recall | Positive class: F1-score | Negative class: Precision | Negative class: Recall | Negative class: F1-score | AUC |
|---|---|---|---|---|---|---|---|---|---|---|
| Zero-r | 0.73 ± 0.0 | 0.85 ± 0.0 | 0.78 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.0 ± 0.0 | 0.85 ± 0.0 | 1.0 ± 0.0 | 0.92 ± 0.0 | 0.0 ± 0.0 |
| Keyword Mapping | 0.827 ± 0.006 | 0.842 ± 0.005 | 0.833 ± 0.005 | 0.464 ± 0.023 | 0.358 ± 0.018 | 0.404 ± 0.018 | 0.891 ± 0.004 | 0.927 ± 0.004 | 0.909 ± 0.003 | 0.840 ± 0.004 |
| LSA | 0.892 ± 0.005 | 0.862 ± 0.007 | 0.873 ± 0.006 | 0.512 ± 0.022 | 0.758 ± 0.019 | 0.611 ± 0.019 | 0.956 ± 0.004 | 0.879 ± 0.008 | 0.916 ± 0.005 | 0.894 ± 0.006 |
| LDA | 0.872 ± 0.006 | 0.852 ± 0.006 | 0.860 ± 0.006 | 0.514 ± 0.021 | 0.669 ± 0.023 | 0.581 ± 0.019 | 0.936 ± 0.005 | 0.884 ± 0.005 | 0.910 ± 0.004 | 0.879 ± 0.004 |
| Doc2Vec | 0.882 ± 0.007 | 0.862 ± 0.007 | 0.869 ± 0.007 | 0.514 ± 0.019 | 0.682 ± 0.023 | 0.586 ± 0.018 | 0.943 ± 0.005 | 0.892 ± 0.006 | 0.917 ± 0.004 | 0.871 ± 0.005 |
| NER-based | 0.835 ± 0.006 | 0.849 ± 0.005 | 0.842 ± 0.006 | 0.473 ± 0.022 | 0.501 ± 0.020 | 0.482 ± 0.020 | 0.904 ± 0.006 | 0.923 ± 0.005 | 0.912 ± 0.005 | 0.853 ± 0.004 |
| Siamese LSTM | 0.920 ± 0.006 | 0.891 ± 0.005 | 0.904 ± 0.006 | 0.582 ± 0.020 | 0.843 ± 0.021 | 0.698 ± 0.020 | 0.964 ± 0.006 | 0.901 ± 0.007 | 0.932 ± 0.006 | 0.916 ± 0.006 |
| CNN + random vector | 0.916 ± 0.005 | 0.931 ± 0.006 | 0.923 ± 0.005 | 0.631 ± 0.022 | 0.833 ± 0.019 | 0.712 ± 0.019 | 0.972 ± 0.007 | 0.917 ± 0.005 | 0.941 ± 0.005 | 0.942 ± 0.003 |
| CNN + pretrain vector | 0.912 ± 0.006 | 0.927 ± 0.006 | 0.920 ± 0.006 | 0.637 ± 0.021 | 0.811 ± 0.019 | 0.701 ± 0.020 | 0.965 ± 0.006 | 0.920 ± 0.005 | 0.937 ± 0.006 | 0.936 ± 0.004 |
| CNN + concept vector | 0.931 ± 0.006 | 0.938 ± 0.007 | 0.935 ± 0.006 | 0.682 ± 0.023 | 0.771 ± 0.020 | 0.734 ± 0.021 | 0.969 ± 0.004 | 0.938 ± 0.008 | 0.954 ± 0.007 | 0.951 ± 0.003 |
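The F1-scores in the table can be related to the precision and recall columns through the standard harmonic-mean definition. The sketch below is only an illustration of that definition, not the authors' evaluation code; the function name `f1_score` is chosen here for the example, and the inputs are the positive-class values of the Keyword Mapping row copied from the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Positive-class precision and recall of the Keyword Mapping row.
keyword_mapping_f1 = f1_score(0.464, 0.358)
print(round(keyword_mapping_f1, 3))  # 0.404, matching the table

# Zero-r never predicts the positive class, so its positive-class F1 is 0.
print(f1_score(0.0, 0.0))  # 0.0
```

Because the table reports means over repeated runs, the F1 computed from the mean precision and mean recall need not equal the reported mean F1 exactly; small gaps in some rows are consistent with that.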
Fig. 3 ROC curve of different models
The original text of selected samples
| Sample pair No. | Imaging report content (Chinese) | Imaging report content (English) | Pathologic report content (Chinese) | Pathologic report content (English) |
|---|---|---|---|---|
| 1 | 宫内见1个胎儿, 胎位头位, 胎方位LOP。双顶径81, 枕额径101, 腹前后径92, 腹左右径83, 股骨长60, 肱骨长52。胎心胎动见, 胎心133次/分, 胎心律齐。胎盘位于后壁, 厚度35, 分级II, 胎盘下缘距宫颈内口> 54。羊水指数31 + 31 + 36 + 39。胎儿脐血流指数:PI = 0.93, RI = 0.63, S/D = 2.71 单胎头位。胎儿迟发畸形的检查受多因素影响, 超声无法检出所有胎儿异常。此检查仅限于胎儿生长监测。 | One fetus can be observed in the uterus. The position of the fetus is cephalic position, the orientation is LOP, the biparietal diameter is 81, the occipitofrontal diameter is 101, the anteroposterior trunk diameter is 92, the transverse trunk diameter is 83, the femur length is 60, the humeral length is 52. Fetal heart rate and fetal movement can be observed. The fetal heart rate is 133 beats per minute and the heart rhythm is regular. The placenta is located in the posterior wall. The thickness of the placenta is 35, grade II. The distance between the placental margin and the internal cervical os is > 54. The amniotic fluid index is 31 + 31 + 36 + 39. Fetal umbilical artery flow index: PI = 0.93, RI = 0.63, S/D = 2.71. Singleton and cephalic presentation. The examination of fetal delayed malformation is affected by many factors, and ultrasound cannot detect all fetal abnormalities. This examination is limited to fetal growth monitoring. | 胎盘组织重600g, 大小21*17*3 cm, 胎膜完整, 切面灰红色, 母面小叶完整, 子面光滑, 相连脐带长35cm, 直径1.2 cm, 血管三根。(胎盘)孕晚期胎盘一个, 绒毛发育良好, 脐带及胎膜未见明显异常。 | The weight of placental tissue is 600 g, the size is 21 × 17 × 3 cm, the fetal membrane is intact, the cut surface is gray-red, the lobules of maternal surface are intact, and the daughter surface is smooth. The length of the umbilical cord is 35 cm, the diameter is 1.2 cm, and three blood vessels can be observed. (Placenta) favor a diagnosis of previa of late pregnancy, the villi are well-developed, and no obvious lesion is observed in umbilical cord and fetal membrane. |
| 2 | 甲状腺大小正常, 包膜清晰完整, 内部回声分布均匀, CDFI:腺体内部血流信号未见明显异常。甲状腺右叶内可见数个低回声区, 大者大小23.5*13.2 mm, 形态规则, 边界清晰, 内部回声不均匀。 | The size of thyroid gland is normal, the capsule is clear and intact, and the echogenicity is homogeneous. CDFI: There is no obvious abnormality of blood flow signal in the gland. There are several hypoechoic areas in the right lobe of the thyroid. The size of the lesion is 23.5 × 13.2 mm, the shape is regular, the boundary is clear, and the echogenicity is inhomogeneous. | 甲状腺组织, 大小4.5*2.5*1.5 cm, 切面见结节两枚, 直径1-2 cm, 灰红色, 质软。(甲状腺右叶)结节性甲状腺肿伴滤泡性腺瘤形成。 | The specimen for pathological examination contains one thyroid tissue. The size of the tissue is 4.5 × 2.5 × 1.5 cm. Two thyroid nodules can be observed from the cut surface. The diameter of the nodules is 1 to 2 cm, the color is grey red, the texture is soft. (The right lobe of the thyroid) favor a diagnosis of nodular goiter combined with follicular adenoma. |
Sample-level feature importance of sample pair 1 and 2 for both imaging and pathologic report provided by LIME algorithm
| Sample pair No. | Imaging report: word (Chinese) | Imaging report: word (English) | Imaging report: feature importance of word | Pathologic report: word (Chinese) | Pathologic report: word (English) | Pathologic report: feature importance of word |
|---|---|---|---|---|---|---|
| 1 (Prediction probability = 0.77) | 胎膜 | Fetal membranes | 0.15 | 胎儿 | Fetus | 0.12 |
| | 脐带 | Umbilical cord | 0.14 | 胎心 | Fetal heart | 0.03 |
| | 胎盘 | Placenta | 0.06 | 羊水 | Amniotic fluid | 0.03 |
| | 毛发 | Hair | 0.04 | 头位 | Head position | 0.02 |
| | 小叶 | Lobule | 0.02 | 股骨 | Femur | 0.01 |
| | 面灰 | Face ash | 0.01 | 单胎 | Single fetus | 0.01 |
| 2 (Prediction probability = 0.83) | 甲状腺 | Thyroid | 0.19 | 甲状腺 | Thyroid | 0.16 |
| | 结节 | Tubercle | 0.15 | 腺体 | Glandular body | 0.14 |
| | 滤泡 | Follicular | 0.08 | 右叶 | Right lobe | 0.07 |
| | 右叶 | Right lobe | 0.03 | 包膜 | Envelope | 0.03 |
| | 腺瘤 | Adenoma | 0.01 | 回声 | Echoes | 0.01 |
| | 切面 | Section | 0.01 | 血流 | Blood flow | 0.01 |
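To give a feel for what a per-word importance score means, the sketch below scores each word by how much the predicted probability drops when that word is removed. This is a deliberately simplified leave-one-word-out stand-in, not the LIME algorithm itself (LIME fits a local linear surrogate model on many randomly perturbed samples), and it is not the authors' code. The names `occlusion_importance` and `toy_predict`, and the toy scoring rule, are hypothetical and exist only for this illustration.

```python
from typing import Callable, List, Tuple


def occlusion_importance(
    words: List[str],
    predict: Callable[[List[str]], float],
) -> List[Tuple[str, float]]:
    """Score each word by the drop in prediction when it is left out."""
    base = predict(words)
    scores = []
    for i, word in enumerate(words):
        reduced = words[:i] + words[i + 1:]  # same text without word i
        scores.append((word, base - predict(reduced)))
    return scores


# Hypothetical stand-in for a trained similarity model: each body-site
# term adds 0.25 to the "similar" probability, capped at 1.0.
BODY_SITE_TERMS = {"thyroid", "right lobe"}


def toy_predict(words: List[str]) -> float:
    hits = sum(1 for w in words if w in BODY_SITE_TERMS)
    return min(1.0, 0.25 * hits)


for word, score in occlusion_importance(
    ["thyroid", "right lobe", "blood flow"], toy_predict
):
    print(word, score)
```

With a real model, `predict` would wrap the classifier's probability for the "similar" class, and words whose removal changes that probability the most would rank highest.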
Sample pairs from error analysis
| Sample pair No. | Imaging report content (Chinese) | Imaging report content (English) | Pathologic report content (Chinese) | Pathologic report content (English) | True label | Predict label |
|---|---|---|---|---|---|---|
| 3 | 于左肾下极腹侧可见多个囊性为主的混合性回声, 相互融合, 较大之一约17.1 × 17.0 mm(局部凸向肾外), 靠近肾盏之一大小约14.2 × 14.6 mm, 形态欠规则, 表面光整, 境界欠清, 囊内无回声透声尚可, 分布欠均, 可见分隔样回声, 间隔及囊壁未见明显增粗, 囊内及囊壁可见点状、带状强回声, 团块后方回声无明显改变, CFI示未见明显血流信号。 | Multiple cystic mixed echoes can be observed in the ventral side of the inferior pole of left kidney, which fuse with each other. The largest one is about 17.1 × 17.0 mm (which protrudes out locally from the kidney), and the one near the renal pelvis is about 14.2 × 14.6 mm. The shape of the cysts is irregular, the surface is smooth, and the boundary is not clear. There are no echoes in the cysts, the sound transmission is normal, but the echogenicity is inhomogeneous, and septations can be observed. There is no obvious thickening for both septations and walls of the cysts. Punctate and banded strong echoes can be observed inside the cysts and on the wall of the cysts. There is no obvious lesion behind the cysts, and CFI showed no obvious blood flow signal. | 肿物两枚, 直径1cm, 暗黄色, 质中。另见肾上腺组织, 大小2.5*1.5*1.5 cm, 暗红色, 质中。(左肾上腺)倾向皮质结节状增生。 | The specimen for pathological examination contains two masses and one adrenal tissue. The diameter of masses is 1 cm, the color is dark yellow, and the texture is medium level. The size of adrenal tissue is 2.5 × 1.5 × 1.5 cm, the color is dark red, and the texture is medium level. (Left adrenal gland) favor a diagnosis of nodular adrenal cortical hyperplasia. | True | False |
| 4 | 左侧腋下见数个淋巴结样回声区, 大者11mm*5 mm, 边界清, 有包膜, 形态规则, 内部结构清晰, 未见明显血流信号。左侧腋下可见多个淋巴结。 | There are several lymphoid echoes under the left armpit, the largest one is 11 mm × 5 mm, the boundary is clear, the capsule is regular, the internal structure is clear. There is no obvious blood flow signal. Multiple lymph nodes can be seen in the left armpit. | 脂肪组织, 大小3.5*3*1 cm, 找见淋巴结两枚。(右腋下淋巴结)淋巴结(0/1)未见癌转移。免疫组化:(右腋下淋巴结)淋巴结(0/1)未见癌转移。 | The specimen for pathological examination contains one fat tissue. The size of fat tissue is 3.5 × 3 × 1 cm, and two lymph nodes can be seen in the tissue. (The lymph node of right armpit) lymph node (0/1) shows no metastasis. Immunohistochemical staining method: (the lymph node of right armpit) lymph node (0/1) shows no metastasis. | False | True |