Yueping Sun, Li Hou, Lu Qin, Yan Liu, Jiao Li, Qing Qian.
Abstract
BACKGROUND: High-throughput screening is desirable for robustly identifying synergistic drug combinations. Machine learning based tools that automatically identify such relations in published papers would be of great help. To support chemical-disease semantic relation extraction, especially for chronic diseases, a chronic disease specific corpus for combination therapy discovery in Chinese (RCorp) was manually annotated.
Keywords: Chemical-disease relations; Chronic diseases; Combination therapy; Corpus annotation; Relation extraction
Year: 2019 PMID: 31801523 PMCID: PMC6894109 DOI: 10.1186/s12911-019-0936-3
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
The topic distributions of RCorp
| Disease Topic | Proportion (%) |
|---|---|
| diabetes mellitus | 16.2 |
| leukemia | 12.7 |
| asthma | 10.9 |
| hypertension | 11.5 |
| chronic cardiopulmonary disease | 10.0 |
| myocardial infarction | 8.26 |
| cerebral infarction | 4.72 |
| hepatitis | 3.83 |
| thyroiditis | 2.36 |
| eye disease | 2.36 |
| intestinal tuberculosis | 1.47 |
| others | 22.2 |
Fig. 1 Annotation example shown in the annotation tool CBSAS
Fig. 2 An example of annotation results of the combination therapy version
The overall corpus statistics
| | Training | Test | Total |
|---|---|---|---|
| Articles | 271 | 68 | 339 |
| Chemical Mention | 1870 | 497 | 2367 |
| Chemical ID | 455 | 135 | 526 |
| Disease Mention | 1711 | 402 | 2113 |
| Disease ID | 309 | 95 | 354 |
| Symptom Mention | 184 | 53 | 237 |
| Symptom ID | 72 | 32 | 84 |
| CID | 135 | 29 | 164 |
| CIS | 120 | 43 | 163 |
| CTD | 636 | 166 | 802 |
Fig. 3 Distribution of chemical mentions and disease mentions in the corpus. a Chemical mentions distribution. b Disease mentions distribution. The blue circle is the number of unique concepts from the training set, the yellow circle is the number of unique concepts from the test set, and the light brown circle is the overlap of concepts from both sets
Fig. 4 Distribution of CTD mentions in the corpus. The blue circle is the number of unique relations from the training set, the yellow circle is the number of unique relations from the test set, and the light brown circle is the overlap of relations from both sets
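The overlap regions described for Fig. 3 can be approximated from the corpus statistics table by inclusion-exclusion. The sketch below assumes that the Total column for the ID rows counts the union of unique concept IDs across the training and test sets; the figure itself is not reproduced here, so the derived numbers are not checked against it. The function name `split_overlap` is illustrative only.

```python
# Unique-ID counts copied from the corpus statistics table: (training, test, total).
UNIQUE_IDS = {
    "Chemical ID": (455, 135, 526),
    "Disease ID": (309, 95, 354),
    "Symptom ID": (72, 32, 84),
}


def split_overlap(train: int, test: int, total: int) -> dict:
    """Inclusion-exclusion, ASSUMING `total` is the size of the union of both sets."""
    shared = train + test - total
    return {
        "shared": shared,                  # concepts seen in both sets
        "train_only": train - shared,      # concepts seen only in training
        "test_only": test - shared,        # concepts seen only in test
        "unseen_test_fraction": (test - shared) / test,
    }


overlaps = {name: split_overlap(*counts) for name, counts in UNIQUE_IDS.items()}

for name, result in overlaps.items():
    print(name, result)
```

The same calculation does not apply to the CID, CIS, and CTD rows: their training and test counts add up exactly to the total (for example 636 + 166 = 802 for CTD), so those rows appear to count relation annotations rather than unique relations.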
Inter-annotator agreement F scores of the corpus
| Object | F | Relaxed F |
|---|---|---|
| Chemical | 0.883 | 0.944 |
| Disease | 0.791 | 0.859 |
| Symptom | 0.627 | 0.765 |
| CID | 0.382 | 0.628 |
| CIS | 0.456 | 0.783 |
| CTD | 0.479 | 0.788 |
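To illustrate how a strict and a relaxed F score between two annotators can differ, here is a minimal sketch. This excerpt does not define the matching criteria, so the code assumes that strict matching requires identical boundaries and label, and that relaxed matching only requires overlapping boundaries with the same label. The `Span` type, the function names, and the example offsets are hypothetical.

```python
from typing import Callable, List, NamedTuple


class Span(NamedTuple):
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    label: str   # e.g. "Chemical", "Disease", "Symptom"


def exact(a: Span, b: Span) -> bool:
    """Strict criterion (assumed): same boundaries and same label."""
    return a == b


def overlapping(a: Span, b: Span) -> bool:
    """Relaxed criterion (assumed): same label and intersecting boundaries."""
    return a.label == b.label and a.start < b.end and b.start < a.end


def f_score(ann_a: List[Span], ann_b: List[Span],
            match: Callable[[Span, Span], bool]) -> float:
    """Treat annotator A as the reference and annotator B as the response."""
    if not ann_a or not ann_b:
        return 0.0
    precision = sum(any(match(b, a) for a in ann_a) for b in ann_b) / len(ann_b)
    recall = sum(any(match(a, b) for b in ann_b) for a in ann_a) / len(ann_a)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations of one abstract by two annotators.
annotator_a = [Span(0, 4, "Chemical"), Span(10, 18, "Disease"), Span(25, 30, "Symptom")]
annotator_b = [Span(0, 4, "Chemical"), Span(12, 18, "Disease")]

strict_f = f_score(annotator_a, annotator_b, exact)
relaxed_f = f_score(annotator_a, annotator_b, overlapping)
print(f"strict F = {strict_f:.3f}, relaxed F = {relaxed_f:.3f}")
```

In this toy example the disease mention differs only in its left boundary, so it counts as a disagreement under the strict criterion and as an agreement under the relaxed one. Boundary disagreements of this kind are one plausible reason why the relaxed F scores in the table are consistently higher than the strict ones.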
A comparison of works on the corpus building of CDRs
| Corpus or author name | Language | Dictionary | Sources | Scale | Text boundary | Annotation results |
|---|---|---|---|---|---|---|
| Roberts | en | UMLS | clinical text | 150 | sentence | Condition, intervention, drug, locus and their interaction relations |
| i2b2/VA | en | – | clinical text | 871 | sentence | Relation types that hold between medical problems, tests, and treatments |
| EU-ADR | en | MeSH/UMLS++ | abstracts | 300 | sentence | Drugs, disorders, targets and their inter-relationships |
| Rosario | en | MeSH | abstracts | 3495 sentences | sentence | Relationships between treatment and disease |
| IxaMed-GS | spa | SNOMED CT | clinical text | 75 docs/5410 sentences | document | Relationships between entities indicating adverse drug reaction events |
| BioCreative CDR | en | MeSH | abstracts | 1500 | document | Relationships between chemicals and diseases (CID) |
| RCorp | cn | CMeSH | abstracts | 339 | document | Relationships between chemicals and diseases (CTD) |