Haofen Wang, Huifang Du, Guilin Qi, Huajun Chen, Wei Hu, Zhuo Chen.
Abstract
BACKGROUND: As COVID-19 continues to spread, the volume of information about the worldwide pandemic is growing rapidly, so organizing this information has become both necessary and valuable. A knowledge graph (KG), a key branch of artificial intelligence, helps to structure, reason over, and understand such data.
Keywords: COVID-19; artificial intelligence; data set; knowledge extraction; knowledge fusion; knowledge graph; linked data; natural language processing; schema modeling; semantic search
Year: 2022 PMID: 35476822 PMCID: PMC9109781 DOI: 10.2196/37215
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. Overview of OpenKG-COVID19. KG: knowledge graph; NCBI: National Center for Biotechnology Information; Q&A: question and answer.
Classifications of schema design and knowledge extraction of COVID-19 knowledge graphs.
| Knowledge graph | Schema design: manual | Schema design: site data | Schema design: automatic mining | Knowledge extraction: structured | Knowledge extraction: semistructured | Knowledge extraction: plain text |
| Concept | | | ✓ | | ✓ | ✓ |
| Encyclopedia | | ✓ | ✓ | | ✓ | ✓ |
| Medical | ✓ | | | | | ✓ |
| Health | ✓ | ✓ | | | ✓ | |
| Research | | ✓ | | ✓ | | ✓ |
| Prevention | ✓ | | | ✓ | ✓ | ✓ |
| Goods | | ✓ | | | ✓ | |
| Event | ✓ | | | | | ✓ |
| Character | | ✓ | | | ✓ | |
| Epidemiology | ✓ | | | ✓ | | |
Figure 2. Schema diagram of the epidemiology knowledge graph.
Figure 3. Construction process of the encyclopedia knowledge graph (KG).
Figure 4. Extraction of various types of semistructured data.
Figure 5. Workflow of entity alignment. KG: knowledge graph.
Detailed statistics and quality of each subgraph.
| Knowledge graph | Facts, n | Concepts, n | Entities, n | Properties, n | Evaluation, n | Correct, n | Precision (%), mean (SD) |
| Encyclopedia | 261,154 | 50 | 54,318 | 60 | 5000 | 4778 | 95 |
| Medical | 2857 | 54 | 1035 | 92 | 652 | 620 | 94 |
| Research | 2,281,797 | 31 | 221,131 | 64 | 8556 | 8555 | 99 |
| Event | 27,388 | 4 | 2291 | 21 | 200 | 198 | 96 |
| Character | 1902 | 21 | 1057 | 40 | 570 | 570 | 99 |
| Prevention | 28,651 | 113 | 34,859 | 24 | 646 | 630 | 97 |
| Goods | 3738 | 165 | 132 | 57 | 365 | 359 | 97 |
| Health | 51,575 | 592 | 7110 | 104 | 487 | 483 | 98 |
| Epidemiology | 8336 | 55 | 2163 | 47 | 200 | 200 | 98 |
| Concept | 19,391 | 1487 | 4784 | 2 | 100 | 96 | 92 |
Results of schema matching.a
| Knowledge graph | Encyclopedia | Prevention | Concept | Health | Research | Medical | Epidemiology | Event | Goods | Character |
| Encyclopedia | —b | 0 | 0 | 9 | 4 | 6 | 1 | 0 | 0 | 0 |
| Prevention | 0 | — | 0 | 0 | 0 | 2 | 0 | 16 | 17 | 1 |
| Concept | 37 | 7 | — | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Health | 9 | 0 | 41 | — | 3 | 13 | 3 | 1 | 1 | 0 |
| Research | 2 | 0 | 6 | 2 | — | 3 | 0 | 1 | 0 | 0 |
| Medical | 4 | 2 | 25 | 4 | 4 | — | 6 | 3 | 0 | 1 |
| Epidemiology | 4 | 0 | 18 | 3 | 0 | 3 | — | 0 | 0 | 4 |
| Event | 0 | 5 | 1 | 0 | 0 | 0 | 1 | — | 16 | 0 |
| Goods | 0 | 1 | 18 | 6 | 0 | 0 | 0 | 2 | — | 0 |
| Character | 0 | 2 | 11 | 0 | 5 | 5 | 3 | 1 | 0 | — |
a: The numbers below the diagonal are class matches, and the numbers above the diagonal are property matches.
b: Not applicable.
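The reading convention in footnote a can be made concrete with a small lookup. The Python sketch below is illustrative only: the `GRAPHS` and `MATCHES` layout and the helper names `class_matches` and `property_matches` are assumptions introduced here, not part of OpenKG-COVID19. The counts are copied from the table, with `None` standing in for the diagonal.

```python
# Row and column order follows the schema-matching table.
GRAPHS = ["Encyclopedia", "Prevention", "Concept", "Health", "Research",
          "Medical", "Epidemiology", "Event", "Goods", "Character"]

# MATCHES[row][col]; None marks the diagonal ("not applicable").
MATCHES = [
    [None, 0, 0, 9, 4, 6, 1, 0, 0, 0],
    [0, None, 0, 0, 0, 2, 0, 16, 17, 1],
    [37, 7, None, 0, 0, 0, 0, 0, 0, 0],
    [9, 0, 41, None, 3, 13, 3, 1, 1, 0],
    [2, 0, 6, 2, None, 3, 0, 1, 0, 0],
    [4, 2, 25, 4, 4, None, 6, 3, 0, 1],
    [4, 0, 18, 3, 0, 3, None, 0, 0, 4],
    [0, 5, 1, 0, 0, 0, 1, None, 16, 0],
    [0, 1, 18, 6, 0, 0, 0, 2, None, 0],
    [0, 2, 11, 0, 5, 5, 3, 1, 0, None],
]


def _indices(graph_a, graph_b):
    """Return the table indices of two distinct knowledge graphs."""
    i, j = GRAPHS.index(graph_a), GRAPHS.index(graph_b)
    if i == j:
        raise ValueError("a graph is not matched against itself")
    return i, j


def class_matches(graph_a, graph_b):
    """Class matches sit below the diagonal (row index > column index)."""
    i, j = _indices(graph_a, graph_b)
    return MATCHES[max(i, j)][min(i, j)]


def property_matches(graph_a, graph_b):
    """Property matches sit above the diagonal (row index < column index)."""
    i, j = _indices(graph_a, graph_b)
    return MATCHES[min(i, j)][max(i, j)]
```

For the Health and Concept graphs, `class_matches("Health", "Concept")` reads the Health row and Concept column (41), while `property_matches("Health", "Concept")` reads the mirrored cell (0). Argument order does not matter, because each pair of graphs has exactly one cell on each side of the diagonal.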
Performance of schema matching.
| Knowledge graph | Class precision (%) | Class recall (%) | Class F1-score (%) | Property precision (%) | Property recall (%) | Property F1-score (%) |
| Encyclopedia | 75.0 | 85.7 | 80.0 | 85.0 | 85.0 | 85.0 |
| Prevention | 76.5 | 100.0 | 86.7 | 94.4 | 100.0 | 97.1 |
| Concept | 72.0 | 90.1 | 80.0 | —a | — | — |
| Health | 55.4 | 94.7 | 69.9 | 76.7 | 88.5 | 82.1 |
| Research | 78.9 | 100.0 | 88.2 | 54.5 | 100.0 | 70.6 |
| Medical | 87.2 | 91.1 | 89.1 | 79.4 | 90.0 | 84.4 |
| Epidemiology | 78.1 | 100.0 | 87.7 | 78.6 | 100.0 | 88.0 |
| Event | 100.0 | 100.0 | 100.0 | 91.9 | 100.0 | 95.8 |
| Goods | 70.4 | 76.0 | 73.1 | 97.1 | 100.0 | 98.5 |
| Character | 100.0 | 100.0 | 100.0 | 83.3 | 83.3 | 83.3 |
| Overall | 74.6 | 91.5 | 82.2 | 85.6 | 95.0 | 90.0 |
a: Not applicable.
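The page does not state how the F1-score column was computed. Assuming the standard definition (the harmonic mean of precision and recall), the short Python sketch below recomputes it for a few class-matching rows copied from the table. The names `f1_score`, `CLASS_ROWS`, and `max_deviation` are illustrative choices made here. Because the tabulated precision and recall are already rounded, a recomputed value may differ from the reported one in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (percent in, percent out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1-score) for class matching, from the table.
CLASS_ROWS = {
    "Encyclopedia": (75.0, 85.7, 80.0),
    "Health": (55.4, 94.7, 69.9),
    "Medical": (87.2, 91.1, 89.1),
    "Overall": (74.6, 91.5, 82.2),
}


def max_deviation(rows):
    """Largest absolute gap between recomputed and reported F1-scores."""
    return max(abs(f1_score(p, r) - reported) for p, r, reported in rows.values())


if __name__ == "__main__":
    for name, (p, r, reported) in CLASS_ROWS.items():
        print(f"{name}: recomputed {f1_score(p, r):.1f}, reported {reported}")
```

The Health row shows why the harmonic mean matters here: a high recall (94.7) cannot compensate for a low precision (55.4), so the F1-score stays well below their arithmetic average.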
Primary predicates used in the OpenKG-COVID19 schemata.
| Predicate | Description |
| rdfs:label | Local name statement of all URLs |
| rdfs:subClassOf | The hypernym-hyponym relationship between two classes |
| rdfs:domain | Domain class of a property |
| rdfs:range | Range class or literal data types of a property, which can be multivalued |
| owl:sameAs | Synonym relationship between two resources |
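To show how these predicates fit together, here is a minimal Python sketch that stores a schema fragment as plain (subject, predicate, object) tuples, using the standard spellings `rdfs:label` and `owl:sameAs`. Every `ex:` name is hypothetical and invented for illustration; none is taken from the actual OpenKG-COVID19 schemata, and the helpers `objects` and `superclasses` are likewise assumptions.

```python
# Hypothetical schema fragment; all ex: identifiers are made up for this sketch.
TRIPLES = [
    ("ex:Virus", "rdfs:label", "Virus"),
    ("ex:Coronavirus", "rdfs:label", "Coronavirus"),
    ("ex:Coronavirus", "rdfs:subClassOf", "ex:Virus"),    # hyponym -> hypernym
    ("ex:Virus", "rdfs:subClassOf", "ex:Pathogen"),
    ("ex:causes", "rdfs:domain", "ex:Pathogen"),          # subject class of the property
    ("ex:causes", "rdfs:range", "ex:Disease"),            # object class of the property
    ("ex:SARS-CoV-2", "owl:sameAs", "ex:2019-nCoV"),      # two names, one resource
]


def objects(subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def superclasses(cls):
    """Collect all classes reachable from cls by following rdfs:subClassOf."""
    seen = []
    stack = objects(cls, "rdfs:subClassOf")
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(objects(parent, "rdfs:subClassOf"))
    return seen
```

In this toy fragment, `superclasses("ex:Coronavirus")` walks the class hierarchy upward through `ex:Virus` to `ex:Pathogen`, which is the kind of lookup a schema needs in order to decide whether a property's `rdfs:domain` applies to an entity of a more specific class.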
Figure 6. Data set search interface (left) and entity search interface (right).
Figure 7. Knowledge review on the web (left) and WeChat mini app (right) of OpenBase.