Nan Li1, Zhihao Yang2, Ling Luo1, Lei Wang3, Yin Zhang4, Hongfei Lin1, Jian Wang1.
Abstract
BACKGROUND: Hepatocellular carcinoma is one of the most common malignant neoplasms in adults and has high mortality. Mining relevant medical knowledge from rapidly growing text data and integrating it with existing biomedical resources will support research on hepatocellular carcinoma. To this end, we constructed a knowledge graph for Hepatocellular Carcinoma (KGHC).
Keywords: Hepatocellular carcinoma; Information extraction; Knowledge graph
Year: 2020 PMID: 32646496 PMCID: PMC7346328 DOI: 10.1186/s12911-020-1112-5
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 The processing flow of constructing the KGHC
The Attributes of Knowledge Graph
| Attribute No. | Attribute Name | Remarks |
|---|---|---|
| 1 | PMID | The PubMed abstract ID from which the entity is extracted |
| 2 | Text Name | The mention of the entity in the sentence |
| 3 | Entity ID | The ID of the entity |
| 4 | Entity Type | The type of the entity |
| 5 | Entity_Start_Index | The first character position (in the sentence) of the text denoting the entity |
| 6 | Entity_End_Index | The last character position (in the sentence) of the text denoting the entity |
| 7 | Source | The source of the triple (e.g., UpToDate) |
| 8 | Sentence | The sentence containing the triple |
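The eight attributes above can be read as one record per entity mention. The following is a minimal sketch of such a record, not the authors' implementation: the class name `EntityRecord`, the method `span_matches`, the 0-based inclusive offsets, and every field value in the usage lines (including the placeholder PMID and entity ID) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityRecord:
    """One entity mention, mirroring the eight attributes in the table."""
    pmid: str
    text_name: str
    entity_id: str
    entity_type: str
    entity_start_index: int  # assumed 0-based position of the first character
    entity_end_index: int    # assumed 0-based position of the last character
    source: str
    sentence: str

    def span_matches(self) -> bool:
        """Check that the offsets recover the mention from the sentence."""
        # The end index is inclusive, so the slice must extend one past it.
        span = self.sentence[self.entity_start_index:self.entity_end_index + 1]
        return span == self.text_name


# Illustrative values only; none of them come from the KGHC itself.
sentence = "Sorafenib is used to treat hepatocellular carcinoma."
mention = "hepatocellular carcinoma"
start = sentence.index(mention)

record = EntityRecord(
    pmid="00000000",             # placeholder, not a real PubMed ID
    text_name=mention,
    entity_id="EXAMPLE:0001",    # placeholder identifier
    entity_type="Disease",
    entity_start_index=start,
    entity_end_index=start + len(mention) - 1,
    source="literature",
    sentence=sentence,
)
print(record.span_matches())
```

A check like `span_matches` is a cheap way to catch off-by-one errors when the start and end offsets are stored separately from the mention text.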
Fig. 2 The architecture of Att-BiLSTM-CRF
Results of the Att-BiLSTM-CRF model on the CHEMDNER dataset of BioCreative IV
| Method | Precision(%) | Recall(%) | F-score(%) |
|---|---|---|---|
| tmChem | 89.09 | 85.75 | 87.39 |
| Lu et al. | 88.73 | 87.41 | 88.06 |
| RNNA-CRF | 91.14 | 88.27 | 89.68 |
| BiLSTM-CRF | 91.31 | 87.73 | 89.48 |
| Att-BiLSTM-CRF | 91.65 | 90.04 | 90.84 |
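The F-score column can be cross-checked against the precision and recall columns. The sketch below assumes the reported F-score is the balanced F1 (the harmonic mean of precision and recall), which the source does not state explicitly; the function name `f_score` is illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F1: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score), copied from the table, in percent.
reported = {
    "tmChem": (89.09, 85.75, 87.39),
    "Lu et al.": (88.73, 87.41, 88.06),
    "RNNA-CRF": (91.14, 88.27, 89.68),
    "BiLSTM-CRF": (91.31, 87.73, 89.48),
    "Att-BiLSTM-CRF": (91.65, 90.04, 90.84),
}

for method, (p, r, f) in reported.items():
    print(f"{method}: recomputed {f_score(p, r):.2f}, reported {f:.2f}")
```

Because the inputs are already rounded to two decimals, a recomputed value may differ from the reported one in the last digit.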
Fig. 3 An example of BioIE output
Fig. 4 A partial display of KGHC
Fig. 5 Parts of the network between hepatocellular carcinoma and Hepatitis A
Fig. 6 Data distribution in different categories of the knowledge graph
Fig. 7 Direct relation and indirect relation
Fig. 8 The input corpus of the knowledge graph
Disagreement analysis
| Cause of error | Percentage | Percentage based on text genre | |||
|---|---|---|---|---|---|
| literature | UpToDate | SemMedDB | |||
| Entity Recognition | 6.91%(13) | 23.08%(3) | 30.77%(4) | 7.69%(1) | 38.46%(5) |
| Entity Disambiguation | 7.44%(14) | 28.57%(4) | 7.14%(1) | 7.14%(1) | 57.14%(8) |
| Nonexistent Relation | 16.48%(31) | 16.13%(5) | 0(0) | 6.45%(2) | 77.42%(24) |
| Inaccurate Relation | 47.34%(89) | 16.85%(15) | 2.25%(2) | 2.25%(2) | 78.65%(70) |
| Passive Relation | 10.63%(20) | 20%(4) | 5%(1) | 0(0) | 75%(15) |
| Negation Relation | 11.17%(21) | 19.05%(4) | 0(0) | 0(0) | 80.95%(17) |
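The overall percentages in the second column are consistent with each error count divided by the total number of disagreements. The sketch below recomputes them from the counts in parentheses. Truncating to two decimals (rather than rounding) is an inference from the reported values, not something the source states, and the helper name `truncated_percent` is illustrative.

```python
# Error counts copied from the parentheses in the "Percentage" column.
counts = {
    "Entity Recognition": 13,
    "Entity Disambiguation": 14,
    "Nonexistent Relation": 31,
    "Inaccurate Relation": 89,
    "Passive Relation": 20,
    "Negation Relation": 21,
}


def truncated_percent(count: int, total: int) -> float:
    """Share of total as a percentage, truncated (not rounded) to two decimals."""
    # Integer arithmetic avoids floating-point surprises before truncation.
    return (count * 10000 // total) / 100


total = sum(counts.values())
shares = {cause: truncated_percent(n, total) for cause, n in counts.items()}

print(f"total disagreements: {total}")
for cause, share in shares.items():
    print(f"{cause}: {share:.2f}% ({counts[cause]})")
```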