María Taboada, Diego Martínez, Belén Pilo, Adriano Jiménez-Escrig, Peter N Robinson, María J Sobrido.
Abstract
BACKGROUND: Semantic Web technology can considerably catalyze translational genetics and genomics research in medicine, where the interchange of information between basic research and clinical levels becomes crucial. This exchange involves mapping abstract phenotype descriptions from research resources, such as knowledge databases and catalogs, to unstructured datasets produced through experimental methods and clinical practice. This is especially true for the construction of mutation databases. This paper presents a way of harmonizing abstract phenotype descriptions with patient data from clinical practice, and querying this dataset about relationships between phenotypes and genetic variants, at different levels of abstraction.
Year: 2012 PMID: 22849591 PMCID: PMC3444309 DOI: 10.1186/1472-6947-12-78
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Figure 1. Steps required to answer questions on phenotype-genotype relationships. The figure includes an example showing the genetic variants that have been identified in patients with childhood-onset chronic diarrhea (i.e., p.Q230X, p.R395C, p.R405W and p.T343R).
Figure 2. Example of hierarchical (‘is-a’) relationships between Abnormalities of the Central Nervous System and more specific disorders, such as Epilepsy or Ataxia.
Figure 3. Layers of the approach proposed in this work.
Percentage of match of CTX terminology extracted from different sources
| Source (method) | | Precision | Recall | F measure |
| Patient Data (ExactMatch) | 65 | 93 % | 67 % | 78 % |
| Patient Data (ExactMatch + MetaMap) | 65 | 94 % | 94 % | 94 % |
| OMIM (NormalizeString) | 23 | 100 % | 100 % | 100 % |
| GeneReviews (MetaMap) | 83 | 88 % | 79 % | 83 % |
The F measure is 2PR/(P + R), the harmonic mean of precision (P) and recall (R), which is a standard measure of the quality of information retrieval from documents.
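The F measure column can be recomputed from the reported precision and recall. The following is a minimal sketch for checking the arithmetic only; the function name `f_measure` and the dictionary layout are illustrative assumptions and not part of the original work.

```python
def f_measure(precision: float, recall: float) -> float:
    """Return 2PR / (P + R) for precision and recall given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the table above (percent converted to fraction).
reported = {
    "Patient Data (ExactMatch)": (0.93, 0.67),
    "Patient Data (ExactMatch + MetaMap)": (0.94, 0.94),
    "OMIM (NormalizeString)": (1.00, 1.00),
    "GeneReviews (MetaMap)": (0.88, 0.79),
}

f_scores = {source: f_measure(p, r) for source, (p, r) in reported.items()}

for source, f in f_scores.items():
    print(f"{source}: F = {f:.0%}")
```

Rounded to whole percentages, the recomputed values should agree with the last column of the table.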
Coverage of relevant CTX terminology in Snomed CT, HPO and PATO
| Snomed CT | 56 (93.3 %) | 6 (100 %) | 11 (92 %) | 11 (73.3 %) |
| HPO | 50 (83.3 %) | ---- | ---- | ---- |
| PATO | ---- | ---- | ---- | 5 (33.3 %) |
| Total | 60 (100 %) | 6 (100 %) | 12 (100 %) | 15 (100 %) |
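Each coverage percentage above is the terminology's count divided by the column total. The sketch below reproduces that arithmetic; because the column headers are not given in the table, columns are referred to by position only, and all names (`totals`, `counts`, `coverage`) are illustrative assumptions.

```python
# Counts copied from the coverage table; None marks cells shown as dashes.
# Column headers are not given in the table, so columns are indexed by position.
totals = [60, 6, 12, 15]
counts = {
    "Snomed CT": [56, 6, 11, 11],
    "HPO": [50, None, None, None],
    "PATO": [None, None, None, 5],
}


def coverage(row, column_totals):
    """Percentage of each column total covered by one terminology (None if not reported)."""
    return [None if c is None else 100.0 * c / t for c, t in zip(row, column_totals)]


coverage_pct = {name: coverage(row, totals) for name, row in counts.items()}

for name, pcts in coverage_pct.items():
    cells = ["----" if p is None else f"{p:.1f} %" for p in pcts]
    print(name, *cells, sep=" | ")
```

Note that 11 of 12 is 91.7 % to one decimal place, which the table reports rounded to 92 %.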
Descriptive statistics on the origin of CTX ontology concepts
| Phenotype | 153 | 17 | 6 |
| Anatomical Structure | 0 | 42 | 0 |
| Diagnostic Study | 0 | 30 | 0 |
| Qualifier Value | 0 | 31 | 0 |
| Total | 153 (55 %) | 120 (43 %) | 6 (2 %) |
Four examples of queries to the patient data
| Question | SQWRL query | Result |
| What are the | Patient(?p) ^ hasNervousSystemDisorder(?p, ?x) ^ Epilepsy(?x) ^ hasPresence(?x, ?y) ^ Yes(?y) ^ hasNervousSystemDisorder(?p, ?d) ^ Dementia(?d) ^ hasPresence(?d, ?y) ^ hasGeneMutation(?p, ?g) ^ GeneticMutation(?g) ^ mutation(?g, ?m) → sqwrl:columnNames("GeneMutation") ^ sqwrl:selectDistinct(?m) | c.844+1G>T, p.N403K, p.R395C, p.R405W, p.T339M, p.T343R |
| What are the Abnormalities of the Central Nervous System that have been associated with | Patient(?p1) ^ hasNervousSystemDisorder(?p1, ?z) ^ hasPresence(?z, ?y) ^ Yes(?y) ^ AbnormalityoftheCerebellum(?z) ^ hasGeneMutation(?p1, ?g) ˚ sqwrl:makeSet(?s1, ?z) ^ Patient(?p2) ^ hasOtherManifestations(?p2, ?x) ^ hasPresence(?x, ?y) ^ AbnormalityoftheCerebellum(?x) ^ hasGeneMutation(?p2, ?g) ^ sqwrl:makeSet(?s2, ?x) ^ GeneMutation(?g) ^ mutation(?g, ?m) ^ swrlb:equal(?m, "p.R395C") ˚ sqwrl:append(?s3, ?s1, ?s2) ^ sqwrl:element(?e, ?s3) → sqwrl:select(?e) | PresenceofAtaxia, PresenceofChiariTypeI |
| How often has | Patient(?p1) ^ hasNervousSystemDisorder(?p1, ?x) ^ Ataxia(?x) ^ hasPresence(?x, ?y) ^ Yes(?y) ^ hasGeneMutation(?p1, ?g) ^ GeneMutation(?g) ^ mutation(?g, ?m) ^ swrlb:equal(?m, "p.R395C") ˚ sqwrl:makeSet(?s1, ?p1) ˚ sqwrl:size(?size1, ?s1) ^ Patient(?p2) ^ hasGeneMutation(?p2, ?g) ^ sqwrl:makeSet(?s2, ?p2) ^ sqwrl:size(?size2, ?s2) ^ swrlb:multiply(?mu, ?size1, 100.0) ^ swrlb:divide(?d, ?mu, ?size2) → sqwrl:select(?d) | 57 % |
| What is the average number of years from the onset of diarrhea to the first neurological symptom in patients with the genetic variant p.R395C? | Patient(?p1) ^ hasGeneMutation(?p1, ?g) ^ GeneticMutation(?g) ^ mutation(?g, "p.R395C") ^ hasDiarrheaAge(?p1, ?d) ^ AgeatFirstSymptom(?d) ^ age(?d, ?da) ^ hasNeurologicalSymptomsOnsetAge(?p1, ?a) ^ AgeatFirstSymptom(?a) ^ age(?a, ?ca) ^ swrlb:subtract(?di, ?ca, ?da) → sqwrl:columnNames("Average age from diarrhea to neurological symptoms onset") ^ sqwrl:avg(?di) | 7 years |
The first query asks for the genetic variants associated with a specific combination of observable features; the second asks for the phenotype traits associated with a specific genetic variant; and the third and fourth query the frequency and the elapsed time, respectively, associated with the presence of a specific genetic variant and trait.
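The idea of querying at different levels of abstraction can be illustrated outside SQWRL with a small Python sketch. This is not the authors' implementation: the hierarchy is loosely modeled on the Figure 2 caption, the patient records are made up for illustration, and all names (`IS_A`, `PATIENTS`, `is_a`, `variants_with`, `frequency`) are hypothetical.

```python
# Hypothetical is-a hierarchy (child -> parent), loosely modeled on Figure 2.
IS_A = {
    "Epilepsy": "Abnormality of the Central Nervous System",
    "Ataxia": "Abnormality of the Central Nervous System",
}

# Made-up patient records for illustration only; not data from the study.
PATIENTS = [
    {"id": "P1", "variant": "p.R395C", "phenotypes": {"Ataxia"}},
    {"id": "P2", "variant": "p.R395C", "phenotypes": {"Epilepsy"}},
    {"id": "P3", "variant": "p.R395C", "phenotypes": set()},
    {"id": "P4", "variant": "p.T343R", "phenotypes": {"Ataxia"}},
]


def is_a(term, ancestor):
    """True if term equals ancestor or reaches it by following is-a links."""
    while term is not None:
        if term == ancestor:
            return True
        term = IS_A.get(term)
    return False


def has_phenotype(patient, phenotype):
    """True if the patient shows the phenotype or any of its subtypes."""
    return any(is_a(t, phenotype) for t in patient["phenotypes"])


def variants_with(phenotype):
    """Variants seen in patients showing the phenotype or any of its subtypes."""
    return sorted({p["variant"] for p in PATIENTS if has_phenotype(p, phenotype)})


def frequency(phenotype, variant):
    """Percentage of carriers of the variant who show the phenotype or a subtype."""
    carriers = [p for p in PATIENTS if p["variant"] == variant]
    if not carriers:
        return 0.0
    hits = [p for p in carriers if has_phenotype(p, phenotype)]
    return 100.0 * len(hits) / len(carriers)


print(variants_with("Ataxia"))
print(frequency("Ataxia", "p.R395C"))
print(frequency("Abnormality of the Central Nervous System", "p.R395C"))
```

Asking about the general term "Abnormality of the Central Nervous System" also counts patients recorded with the more specific Epilepsy or Ataxia, which is the effect of following the is-a links.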