Sara Althubaiti, Şenay Kafkas, Marwa Abdelhakim, Robert Hoehndorf.
Abstract
BACKGROUND: Ontologies are widely used across biology and biomedicine for the annotation of databases. Ontology development is often a manual, time-consuming, and expensive process. Automatic or semi-automatic identification of classes that can be added to an ontology can make ontology development more efficient.
Keywords: Disease ontology; Embeddings; Neural network
Year: 2020 PMID: 31931870 PMCID: PMC6958746 DOI: 10.1186/s13326-019-0218-0
Source DB: PubMed Journal: J Biomed Semantics
Fig. 1 Label-based workflow. The workflow describes how words (in red) are classified as disease or “other”
Fig. 2 Annotation-based workflow. In this workflow, we first normalize the mentions of disease classes in the corpus and then apply Word2Vec to generate embeddings for classes, not merely words
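The normalization step in the annotation-based workflow can be illustrated with a minimal sketch. It assumes a small hand-written lexicon and simple whole-word matching; the function name `normalize_mentions` and the `LEXICON` dictionary are hypothetical, and the record does not describe how the authors actually performed normalization. The class identifiers are the ones quoted in the tables of this record.

```python
import re

# Illustrative lexicon (an assumption): lowercase disease labels mapped to
# the Disease Ontology identifiers quoted elsewhere in this record.
LEXICON = {
    "multiple myeloma": "DOID:9538",
    "rheumatic fever": "DOID:1586",
    "syphilis": "DOID:4166",
}


def normalize_mentions(sentence, lexicon=LEXICON):
    """Replace each known disease label with its class ID and return tokens."""
    text = sentence.lower()
    # Try longer labels first so multi-word labels win over shorter overlaps.
    for label in sorted(lexicon, key=len, reverse=True):
        pattern = r"\b" + re.escape(label) + r"\b"
        text = re.sub(pattern, lexicon[label], text)
    # Keep class IDs as single tokens; everything else is split into words.
    return re.findall(r"DOID:\d+|[a-z0-9][a-z0-9\-]*", text)


tokens = normalize_mentions("Patients with multiple myeloma or syphilis were included.")
print(tokens)
```

Token lists produced this way could then be passed to any Word2Vec implementation, so that each class identifier receives one embedding regardless of which label or synonym appeared in the text.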
Fig. 3 t-SNE visualizations of the embeddings for (a) the binary classification task, (b) classifying infectious diseases, (c) classifying anatomical diseases, and (d) classifying the combination of infectious and anatomical diseases
F-score and AUC for our four experiments using different hidden layer sizes
| Classification | Hidden layer sizes | 10 | 50 | 100 | 200 | ||||
|---|---|---|---|---|---|---|---|---|---|
| Number of classes | F-score | AUC | F-score | AUC | F-score | AUC | F-score | AUC | |
| Diseases | 2 | 94.65% | 95.31% | 94.83% | 95.97% | 94.49% | 95.99% | ||
| Infectious disease | 5 | 95.65% | 95.01% | 96.01% | 95.74% | 95.22% | 95.68% | ||
| Anatomical disease | 13 | 69.18% | 77.22% | 70.15% | 80.24% | 70.20% | 76.98% | ||
| Infectious + anatomical diseases | 17 | 71.07% | 84.75% | 84.03% | 72.61% | 72.67% | 83.66% | ||
The values in bold represent the highest AUC and F-score within each experiment
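For readers unfamiliar with the two metrics reported in the table, the following is a minimal sketch of how an F-score and an AUC can be computed for a binary task. It is not the authors' evaluation code: the function names are hypothetical, and the record does not say how scores were averaged in the multi-class experiments, so only the binary case is shown.

```python
def f_score(y_true, y_pred, positive=1):
    """F1 score: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores, positive=1):
    """Area under the ROC curve via pairwise ranking of positives vs. negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Each positive/negative pair counts 1 if ranked correctly, 0.5 on a tie.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]
print(f_score(labels, [1, 0, 0, 1]))
print(auc(labels, [0.9, 0.4, 0.5, 0.1]))
```

The F-score depends on thresholded predictions, whereas the AUC depends only on how the classifier ranks examples, which is why the two can disagree about which hidden layer size performs best.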
Fig. 4 ROC curves for each experiment (Diseases, Infectious disease, Anatomical disease and a combination of Infectious disease + Anatomical disease)
Manually analyzed disease terms predicted as disease
| Term | Manual analysis result | Explanation for the suggested diseases |
|---|---|---|
| FACTO | other | - |
| leucoencephalopathy | other | - |
| | Disease | A disease refers to a condition with repetitive mucosal ulcers [ |
| Desmoid | other | - |
| metapneumovirus | other | - |
| | Disease | A rare condition with abnormal flaccidity of both the trachea and the bronchi which results in possibility of narrowing or collapse of the airway [ |
| | Disease | A rare condition characterized by transient lesions in the central part of the splenium of the corpus callosum (SCC), followed by complete reversibility on follow-up magnetic resonance imaging (MRI) after a variable period. It coincides with different diseases [ |
| mal-absorption | other | - |
| acroparesthesias | other | - |
| limb-shaking | other | - |
| | Disease | A rare disease that has an Orphanet ID: ORPHA:251912. It is one of the pineal parenchymal tumors and is considered the least aggressive one [ |
| hypomineralisation | other | - |
| | Disease | It is a severe form of human gnathostomiasis (DOID:11379) that involves the nervous system and can lead to disease and death [ |
| Metastasis | other | - |
| | Disease | A type of cancer that begins in plasma cells that produce antibodies. It could be one of the synonyms of multiple myeloma DOID:9538 [ |
| | Disease | An OMIM disease, OMIM:254900 [ |
| arthralgia | other | - |
| | Disease | Fibrodentinoma is a benign odontogenic tumor that occurs in children and young adults. The disease name usually is represented as “Ameloblastic Fibrodentinoma” [ |
| infantile-ataxia | other | - |
| knowlesi | other | - |
The terms in bold are those classified as disease terms by our method (in the Diseases classification experiment) and validated as correct by a clinician.
Sample of manually analyzed disease terms predicted as infectious disease
| Disease terms | Ontology class assigned by ANN | Manual analysis result | Suggested additional classification | DOID | Explanation |
|---|---|---|---|---|---|
| Pelizaeus-Merzbacher disease | Viral infectious disease | Non-infectious (inherited disorder) | - | - | - |
| | Viral infectious disease | Viral infectious disease | herpes simplex | DOID:8566 | The disease is caused by Human herpesvirus 8, a Herpesviridae infection. |
| | Bacterial infectious disease | Bacterial infectious disease (usually starts viral and progresses to either bacterial or fungal) | - | - | It is an infection in the maxillary sinuses that can have different etiologies, one of which is bacterial [ |
| keratosis follicularis | Bacterial infectious disease | Non-infectious (genetic disease) | - | - | - |
| chronic rheumatic pericarditis | Viral infectious disease | The condition is triggered by autoimmune reaction to infection, mainly group A streptococci. | - | - | - |
| | Viral infectious disease | In most cases the nerve is damaged by diabetes or surgery, however, a viral infection might be a cause | - | - | A condition in which the stomach suffers from paresis that affects the food movement to the small intestine [ |
| osmotic diarrhea | Bacterial infectious disease | symptom | - | - | - |
| familial cold autoinflammatory syndrome | Viral infectious disease | Non-infectious (inherited disease) | - | - | - |
| | Fungal infectious disease | Etiology is controversial, most commonly fungal or bacterial. | - | - | Ambiguous. |
| Binder syndrome | Viral infectious disease | Congenital disease | - | - | - |
| hypohidrosis | Bacterial infectious disease | Multi-causal | - | - | - |
| Sjogren’s syndrome | Viral infectious disease | autoimmune disease | - | - | - |
| | Fungal infectious disease | Etiology is controversial; however, it is considered a variant of oral lesion associated with candida infection [ | - | - | Ambiguous. |
| Goodpasture syndrome | Viral infectious disease | autoimmune disease | - | - | - |
| | Bacterial infectious disease | Bacterial infectious disease | syphilis | DOID:4166 | Considering the same concept of etiology, both diseases are caused by bacterial infection (Treponema pallidum). |
| acute diarrhea | Viral infectious disease | symptom | - | - | - |
| WHIM syndrome | Bacterial infectious disease | Congenital disease | - | - | - |
| erythrasma | Fungal infectious disease | Bacterial infectious disease | - | - | - |
| chronic wasting disease | Parasitic infectious disease | Neurodegenerative disorder | - | - | - |
| | Bacterial infectious disease | Bacterial infectious disease | rheumatic fever | DOID:1586 | The disease is caused by Group A bacteria of the genus Streptococcus, the same causative agent as for rheumatic fever. |
The terms in bold are those classified as infectious disease terms by our method (in the Infectious disease classification experiment) and validated as correct by a clinician.
Sample of manually analyzed disease terms classified as affecting particular anatomical systems
| Disease terms | Ontology class | Ontology class assigned by ANN | Manual analysis result | Suggested additional classification | DOID | Explanation |
|---|---|---|---|---|---|---|
| Timothy syndrome | genetic disease | cardiovascular system disease | Cannot specify (affect multiple parts) | - | - | - |
| Familial periodic paralysis | disease of metabolism | cardiovascular system disease | musculoskeletal system disease | - | - | - |
| | disease of metabolism | endocrine system disease | endocrine system disease | pituitary gland disease | DOID:53 | The pituitary gland is the endocrine gland responsible for secreting prolactin. |
| Angiokeratoma circumscriptum | disease of cellular proliferation | gastrointestinal system disease | cardiovascular system disease | - | - | - |
| | syndrome | gastrointestinal system disease | gastrointestinal system disease | peptic ulcer disease | DOID:750 | It is a disease that affects either the pancreas, the duodenum, or both. Both organs are parts of the GIT system. The disease pathology is mainly excessive gastrin secretion with subsequent peptic ulcers. |
| | genetic disease | gastrointestinal system disease | gastrointestinal system disease | liver disease | DOID:409 | It is a genetic disorder that affects primarily the liver. |
| | disease of metabolism | hematopoietic system disease | hematopoietic system disease | kernicterus due to isoimmunization | DOID:12043 | Bilirubin disorder could be a result of blood pathology, same as for the mentioned classification DOID:12043. |
| | genetic disease | hematopoietic system disease | hematopoietic system disease | hemoglobinopathy | DOID:2860 | The disease is mainly a hemoglobin disorder with hematological phenotypes. |
| Kabuki syndrome | syndrome | immune system disease | Not anatomical - multisystems | - | - | - |
| Amyloidosis | disease of metabolism | immune system disease | Not anatomical - multisystems | - | - | - |
| Fatty liver disease | disease of metabolism | musculoskeletal system disease | gastrointestinal system disease | - | - | - |
| Renal-hepatic-pancreatic dysplasia | physical disorder | musculoskeletal system disease | Cannot specify (affect multiple parts) | - | - | - |
| | physical disorder | musculoskeletal system disease | musculoskeletal system disease | bone development disease/Synostosis | DOID:0080006/ DOID:11971 | There is already an entity in the DO for synostosis under bone development disease. |
| | genetic disease | musculoskeletal system disease | musculoskeletal system disease | bone remodeling disease | DOID:0080005 | We could suggest an additional classification based on the main affected system. Our suggested classification is musculoskeletal, since the disease mainly affects mineralization of the bone, with phenotypes similar to those of rickets (DOID:10609). |
| | disease of mental health | nervous system disease | nervous system disease | * | * | * |
| | disease of metabolism | nervous system disease | nervous system disease | neurodegeneration with brain iron accumulation | DOID:0110734 | The disease’s main pathophysiology is either the absence or dysfunction of ceruloplasmin with subsequent iron accumulation in various organs, mainly the brain. |
| Glomangiomatosis | disease of cellular proliferation | nervous system disease | cardiovascular system disease | - | - | - |
| | disease of metabolism | nervous system disease | nervous system disease | nervous system disease; since it covers many subclasses to which we can map many aspects of this disease | DOID:863 | The disease’s phenotypes reflect neurological affection of multiple parts in the nervous system. |
| | disease of cellular proliferation | reproductive system disease | reproductive system disease | Female reproductive organ cancer | DOID:120 | The term refers to the group of malignant neoplasms that consist of abnormal proliferation of trophoblastic tissues similar to choriocarcinoma DOID:3596 and gestational trophoblastic neoplasia DOID:3590. |
| | physical disorder | reproductive system disease | reproductive system disease | testicular disease | DOID:2519 | The term refers to undescended testicle. |
*Narcolepsy is classified as a sleep disorder, which is correct; however, that class is itself a subclass of mental disorders. Several neurological disorders show a strong association with sleep disorders, including neurodegenerative disorders such as tauopathies, which include Alzheimer’s disease (DOID:10652) [51]; synucleinopathies, which include Parkinsonism (DOID:14330) [52]; and genetic neurodegenerative disorders such as Machado-Joseph disease (DOID:1440) [53] or Huntington’s disease (DOID:12858) [54]. We therefore suggest a new classification in which sleep disorders may also be a subclass of nervous system diseases (neurodegenerative disorder) [55].
The terms in bold are those classified as anatomical disease terms by our method (in the Anatomical disease classification experiment) and validated as correct by a clinician.