Qian Zhu, Chunxu Qu, Ruizheng Liu, Gunjan Vatas, Andrew Clough, Ðắc-Trung Nguyễn, Eric Sid, Ewy Mathé, Yanji Xu.
Abstract
Rare diseases (RDs) are, by definition, of low prevalence, which poses a major challenge because less data is available to support preclinical and clinical studies. Our understanding of RDs has improved greatly, largely owing to advanced big data analytic approaches in genetics/genomics. Consequently, a large volume of RD-related publications has accumulated in recent years, offering an opportunity to use these publications to access the full spectrum of scientific research and to support further investigation in RDs. In this study, we systematically analyzed, semantically annotated, and scientifically categorized RD-related PubMed articles, and integrated those semantic annotations into a knowledge graph (KG), which is hosted in Neo4j based on a predefined data model. Having demonstrated the KG's scientific contribution to RD research through case studies performed by exploring it, we propose, as a next step, to extend the current effort by including more RD-related publications and other types of resources.
Keywords: PubMed; knowledge graph; natural language processing; rare disease (RD); scientific annotations
Year: 2022 PMID: 36034595 PMCID: PMC9403737 DOI: 10.3389/frai.2022.932665
Source DB: PubMed Journal: Front Artif Intell ISSN: 2624-8212
Figure 1. An example of exploring Adrenomyodystrophy-associated literature. (A) A graph containing semantic annotations generated from the original paper for Adrenomyodystrophy. (B) An expanded graph from A via the disease node “adrenal insufficiency.” (C) An expanded graph from A via the author node “Ropers.” The attached Cypher Query 0 was applied to generate A; B and C can be generated by clicking the corresponding nodes circled in A.
Figure 2. The study workflow.
Figure 3. Data model and an example of information (data properties) about one article node.
PubTator annotation results.
| Annotation type | Count |
|---|---|
| Disease | 1,051,402 |
| Gene | 242,436 |
| Chemical | 177,226 |
| Mutation | 100,879 |
| Species | 57,985 |
| Cell line | 4,148 |
| Genus | 255 |
| Strain | 56 |
Results of OMIM category.
| OMIM category | Count |
|---|---|
| Clinical features | 20,406 |
| Molecular genetics | 11,959 |
| See also | 4,732 |
| Mapping | 4,065 |
| Inheritance | 3,312 |
| Description | 3,114 |
| Animal model | 1,775 |
| Pathogenesis | 1,640 |
| Clinical management | 1,392 |
| Cytogenetics | 1,303 |
| Population genetics | 1,240 |
| Diagnosis | 1,190 |
| History | 1,119 |
| Genotype phenotype correlations | 737 |
| Biochemical features | 717 |
| Other features | 525 |
| Nomenclature | 359 |
| Heterogeneity | 246 |
Statistical results for the KG.
| Node type | Count |
|---|---|
| Author | 3,154,451 |
| FullTextUrl | 2,264,836 |
| Substance | 47,506 |
| PubtatorAnnotation | 1,634,387 |
| Article | 1,362,819 |
| Keyword | 556,496 |
| JournalVolume | 537,679 |
| MeshTerm | 46,862 |
| OMIMRef | 18,455 |
| Journal | 13,485 |
| Disease | 6,061 |
| MeshQualifier | 165 |
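
As a quick consistency check on the figures above, the eight PubTator annotation counts appear to sum to the PubtatorAnnotation node count reported for the KG. The short Python sketch below reproduces that arithmetic and derives the share of each annotation type. The variable names are illustrative and are not taken from the paper.

```python
# Counts copied from the "PubTator annotation results" table.
pubtator_counts = {
    "Disease": 1_051_402,
    "Gene": 242_436,
    "Chemical": 177_226,
    "Mutation": 100_879,
    "Species": 57_985,
    "Cell line": 4_148,
    "Genus": 255,
    "Strain": 56,
}

# PubtatorAnnotation node count from the "Statistical results for the KG" table.
kg_pubtator_nodes = 1_634_387

total = sum(pubtator_counts.values())

# Share of each annotation type as a percentage of all annotations.
shares = {name: 100 * n / total for name, n in pubtator_counts.items()}

if __name__ == "__main__":
    print(f"total annotations: {total:,}")
    print(f"matches KG node count: {total == kg_pubtator_nodes}")
    for name, pct in shares.items():
        print(f"{name:<10} {pct:6.2f}%")
```

Disease mentions account for roughly two thirds of all annotations, which is consistent with a corpus selected for rare-disease relevance.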
Thirteen types of “Ehlers-Danlos syndrome”.
| GARD ID | Disease name |
|---|---|
| GARD:0002081 | Hypermobile Ehlers-Danlos syndrome |
| GARD:0002082 | Vascular Ehlers-Danlos syndrome |
| GARD:0002083 | Kyphoscoliotic Ehlers-Danlos syndrome |
| GARD:0002084 | Arthrochalasia Ehlers-Danlos syndrome |
| GARD:0002088 | Classical Ehlers-Danlos syndrome |
| GARD:0002089 | Dermatosparaxis Ehlers-Danlos syndrome |
| GARD:0006322 | Ehlers-Danlos syndromes |
| GARD:0008486 | Musculocontractural Ehlers-Danlos syndrome |
| GARD:0008507 | Classical-like Ehlers-Danlos syndrome |
| GARD:0008508 | Ehlers-Danlos syndrome, dysfibronectinemic type |
| GARD:0009991 | Spondylodysplastic Ehlers-Danlos syndrome |
| GARD:0012474 | Periodontal Ehlers-Danlos syndrome |
| GARD:0012613 | Cardiac-Valvular Ehlers-Danlos syndrome |
Figure 4. Overview of EDS research collected in PubMed between 1900 and 2021. (A) Research on EDS has increased over the years since its first publication in 1958. (B) Number of articles on each type of EDS, with the bar for the parent EDS (GARD:0006322) in red. (C) Heatmap showing multiple types of EDS (columns) discussed in one article (rows), with dark red indicating that the disease is mentioned in the article.
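
The heatmap in panel C is, in effect, a binary article-by-disease matrix. A minimal Python sketch of how such a matrix could be assembled follows. The article identifiers and their disease mentions are made-up placeholders rather than data from the KG, and the function name `build_mention_matrix` is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def build_mention_matrix(
    mentions: Dict[str, Set[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (article_ids, disease_ids, matrix), where matrix[i][j] is 1
    if article i mentions disease j and 0 otherwise."""
    articles = sorted(mentions)
    diseases = sorted({d for ds in mentions.values() for d in ds})
    matrix = [[1 if d in mentions[a] else 0 for d in diseases] for a in articles]
    return articles, diseases, matrix


# Placeholder input: hypothetical article IDs mapped to GARD IDs of EDS types.
example_mentions = {
    "article-A": {"GARD:0002081", "GARD:0002082"},
    "article-B": {"GARD:0002088"},
    "article-C": {"GARD:0002081", "GARD:0002088", "GARD:0008507"},
}

articles, diseases, matrix = build_mention_matrix(example_mentions)

# Articles that discuss more than one EDS type.
multi_type = [a for a, row in zip(articles, matrix) if sum(row) > 1]
```

With real data, the `mentions` mapping would be filled from the graph's article-to-disease links, and the resulting matrix passed to a plotting library to draw the heatmap.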
Figure 5. Discovery of alternative uses of dextromethorphan for other rare diseases. (A) Dextromethorphan was widely studied for glycine encephalopathy (GE, GARD:0007219). (B) Dextromethorphan has potential use in seven rare neurological diseases, including Rett syndrome. (C) Five RDs, including Ectopia pupillae and Pustular psoriasis, have been studied as side effects of dextromethorphan.