Pierre Monnin1, Joël Legrand2, Graziella Husson2, Patrice Ringot2, Andon Tchechmedjiev3, Clément Jonquet3,4, Amedeo Napoli2, Adrien Coulet2,4.
Abstract
BACKGROUND: Pharmacogenomics (PGx) studies how genomic variations impact variations in drug response phenotypes. Knowledge in pharmacogenomics is typically composed of units that take the form of ternary relationships gene variant - drug - adverse event. Such a relationship states that an adverse event may occur in patients who have the specified gene variant and are exposed to the specified drug. State-of-the-art knowledge in PGx is mainly available in reference databases such as PharmGKB and reported in the scientific biomedical literature. However, PGx knowledge can also be discovered from clinical data, such as Electronic Health Records (EHRs), and in this case may either correspond to new knowledge or confirm state-of-the-art knowledge that lacks a "clinical counterpart" or validation. For this reason, there is a need for automatic comparison of knowledge units from distinct sources.
Keywords: Knowledge comparison; Knowledge engineering; Linked open data; Ontology; Pharmacogenomics; Semantic web
Year: 2019 PMID: 30999867 PMCID: PMC6471679 DOI: 10.1186/s12859-019-2693-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Example of a sentence (PMID=18370849), manually annotated with four entities and one relation
Statistics of named entities and relations manually annotated in our 307-sentence corpus
| Named entity type | First-layer | Second-layer | Relation type | Count |
|---|---|---|---|---|
| Gene | 452 | 20 | isAssociatedWith | 582 |
| GenomicVariation | 74 | 166 | isEquivalentTo | 77 |
| Drug | 459 | 36 | | |
| Phenotype | 262 | 251 | | |
| Total | 1720 (both layers) | | Total | 659 |
Entities annotated in multiple sentences are counted multiple times. Second-layer entities are entities whose offset includes the annotation of a first-layer entity
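The layering described in the note above can be illustrated with a minimal Python sketch. It assumes entities are stored as character-offset spans. The `Entity` class, the `is_second_layer` helper, and the example offsets are hypothetical and are not taken from the annotated corpus. As a simplification, the check treats any strictly contained annotation as a first-layer entity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    label: str  # entity type, e.g. "Gene" or "Phenotype"
    start: int  # inclusive character offset (assumed convention)
    end: int    # exclusive character offset (assumed convention)


def is_second_layer(candidate, entities):
    """Return True if the candidate span strictly contains another entity's span."""
    for other in entities:
        same_span = (other.start, other.end) == (candidate.start, candidate.end)
        contained = candidate.start <= other.start and other.end <= candidate.end
        if contained and not same_span:
            return True
    return False


# Hypothetical sentence fragment: "TPMT deficiency"
gene = Entity("Gene", 0, 4)             # "TPMT"
phenotype = Entity("Phenotype", 0, 15)  # "TPMT deficiency"
annotations = [gene, phenotype]
```

Here `is_second_layer(phenotype, annotations)` holds because the phenotype span contains the gene span, while the gene itself stays in the first layer.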
Named Entity Recognition (NER) performance in terms of precision (P), recall (R) and f1-score (F1)
| Entity type | P | R | F1 | std |
|---|---|---|---|---|
| Drug | 0.92 | 0.87 | 0.89 | 0.03 |
| Gene | 0.97 | 0.91 | 0.94 | 0.03 |
| Phenotype | 0.84 | 0.66 | 0.74 | 0.09 |
| Genomic variation | 0.81 | 0.69 | 0.74 | 0.08 |
| All entities | 0.86 | 0.80 | 0.83 | 0.05 |
Results of second-layer entities take into account the prediction error of the first-layer entities. Std stands for F1 standard deviation
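For reference, the F1-score is the harmonic mean of precision and recall. The sketch below is a plain Python helper (the name `f1_score` is chosen here for illustration) and not the evaluation code behind the table. Because a standard deviation is reported, the tabulated values were presumably averaged over several runs, so recomputing F1 from the rounded P and R columns may differ slightly from the printed figure.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recomputing from the rounded Drug and Gene rows of the table above
drug_f1 = round(f1_score(0.92, 0.87), 2)  # 0.89
gene_f1 = round(f1_score(0.97, 0.91), 2)  # 0.94
```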
Relation extraction performance in terms of precision (P), recall (R) and f1-score (F1)
| Relation type | P | R | F1 | std |
|---|---|---|---|---|
| isAssociatedWith | 0.61 | 0.35 | 0.44 | 0.08 |
| isEquivalentTo | 0.73 | 0.78 | 0.75 | 0.14 |
| All relations | 0.67 | 0.56 | 0.61 | 0.08 |
Std stands for F1 standard deviation. Results take into account the prediction error for the entities
Reference databases and ontologies used to normalize the entities extracted from text
| Order | Drug | Gene | GenomicVariation | Phenotype |
|---|---|---|---|---|
| 1st | MeSH | NCBI Gene | dbSNP | MeSH |
| 2nd | ChEBI | PGxLOD | PGxLOD | MEDDRA |
| 3rd | ATC | | | PGxLOD |
| 4th | PGxLOD | | | |
PGxLOD means that a local URI is created
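The ordered lookup in the table above can be read as a fallback chain: try each reference source in turn and create a local URI only when none matches. The following Python sketch illustrates that reading under stated assumptions. The in-memory `lookups` dictionaries, the `normalize` function, and the `LOCAL_PREFIX` placeholder are all hypothetical. Only the source ordering and the TPMT URI come from this record.

```python
from urllib.parse import quote

# Source order per entity type, as listed in the table above
NORMALIZATION_ORDER = {
    "Drug": ["MeSH", "ChEBI", "ATC"],
    "Gene": ["NCBI Gene"],
    "GenomicVariation": ["dbSNP"],
    "Phenotype": ["MeSH", "MEDDRA"],
}

# Placeholder namespace for locally created URIs (assumption, not the real PGxLOD prefix)
LOCAL_PREFIX = "https://example.org/pgxlod/"


def normalize(mention, entity_type, lookups):
    """Return (source, uri) from the first source that knows the mention, else a local URI."""
    key = mention.lower()
    for source in NORMALIZATION_ORDER[entity_type]:
        uri = lookups.get(source, {}).get(key)
        if uri is not None:
            return source, uri
    return "PGxLOD", LOCAL_PREFIX + quote(key)


# Toy lookup table; a real system would query the reference databases instead
lookups = {"NCBI Gene": {"tpmt": "http://bio2rdf.org/ncbigene:7172"}}
known = normalize("TPMT", "Gene", lookups)
unknown = normalize("unknown drug", "Drug", lookups)
```

In this toy setup, `known` resolves through NCBI Gene, while `unknown` falls through every drug source and receives a local URI.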
Fig. 2 Main concepts and relations of PGxO. The central concept of the ontology is PharmacogenomicRelationship
Main statistics of PGxLOD v2
| PGxO concept | Number of instances |
|---|---|
| Drug | 51,459 |
| GeneticFactor | 386,801 |
| Gene | 172,881 |
| GenomicVariation | 213,910 |
| Haplotype | 33 |
| Variant | 204,875 |
| Phenotype | 88,247 |
| Disease | 47,573 |
| PharmacodynamicPhenotype | 63 |
| PharmacokineticPhenotype | 44 |
| PharmacogenomicRelationship | 68,431 |
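| PharmacogenomicRelationship from PharmGKB | 2701 |
| PharmacogenomicRelationship from the literature | 65,720 |
| PharmacogenomicRelationship from EHR studies | 10 |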
Statistics of the instantiation of PGxO with data from PGxLOD v1
| Source | Genes | Variants | Drugs | Diseases | Phenotypes |
|---|---|---|---|---|---|
| ClinVar | 21,487 | 103,219 | 0 | 0 | 6837 |
| DisGeNET | 85,893 | 49,279 | 0 | 38,727 | 6092 |
| DrugBank | 4300 | 0 | 7740 | 0 | 0 |
| MediSpan | 0 | 0 | 5820 | 2481 | 0 |
| SIDER | 0 | 0 | 25,479 | 6291 | 0 |
| UniProt | 25,456 | 0 | 0 | 0 | 0 |
| Total | 137,136 | 152,498 | 39,039 | 47,499 | 12,929 |
Numbers of PGx relationships extracted from PharmGKB v2018-03-05
| | Caused phenotype: Toxicity/ADR | Caused phenotype: Efficacy | Level of evidence 1A | Level of evidence 1B | Level of evidence 2A | Level of evidence 2B | Level of evidence 3 | Level of evidence 4 | All |
|---|---|---|---|---|---|---|---|---|---|
| # PGx relationships | 1268 | 1531 | 44 | 11 | 71 | 97 | 2270 | 208 | 2701 |
Some PGx relationships can cause both Toxicity/ADR and Efficacy
Fig. 3 A PGx relationship extracted from PharmGKB on October 30th, 2018 and represented with PGxO. For readability purposes, in some cases labels are used instead of URIs. Only one drug and one variant are represented, whereas this relationship involves more components. The clinical annotation is available at https://www.pharmgkb.org/gene/PA356/clinicalAnnotation/1184648909
Numbers of unique entities recognized in the test corpus and successfully mapped with reference databases or ontologies
| Database / Ontology | Drug | Gene | GenomicVariation | Phenotype |
|---|---|---|---|---|
| MeSH | 1600 | n/a | n/a | 1625 |
| ChEBI | 285 | n/a | n/a | n/a |
| ATC | 78 | n/a | n/a | n/a |
| NCBI Gene | n/a | 4907 | n/a | n/a |
| dbSNP | n/a | n/a | 803 | n/a |
| MEDDRA | n/a | n/a | n/a | 0 |
| PGxLOD | 6449 | 5905 | 7937 | 22,335 |
| Total | 8412 | 10,812 | 8740 | 23,960 |
Reference databases and ontologies are listed in Table 4
Fig. 4 A PGx relationship extracted from the literature on September 13th, 2018 and represented with PGxO. For readability purposes, in some cases labels are used instead of URIs. For example, the TPMT gene is identified with the URI http://bio2rdf.org/ncbigene:7172. The abstract is available at https://www.ncbi.nlm.nih.gov/pubmed/23029095/
Fig. 5 A PGx relationship discovered from EHRs [48] and represented with PGxO. The initial association discovered from EHRs holds between a drug response and the TPMT activity, i.e. a phenotype. The latter is considered a proxy for the genotype of the TPMT gene, as stated by the CPIC guidelines. For readability purposes, in some cases labels are used instead of URIs
Fig. 6 Example of an RDF graph on which a reconciliation rule identifies that two PGx relationships are identical. The owl:sameAs links result from the application of the rule
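The idea of a reconciliation rule can be sketched in Python with plain sets instead of an RDF store. The exact rule conditions are not given in this record, so the sketch assumes the simplest possible one: two relationships are linked with owl:sameAs when their sets of components (drugs, genetic factors, phenotypes) are identical. The function name `same_as_links` and all identifiers except `ncbigene:7172` are hypothetical.

```python
from itertools import combinations


def same_as_links(relationships):
    """Link relationships whose non-empty component sets are identical.

    relationships maps a relationship identifier to a set of component identifiers.
    Links are emitted in both directions, since owl:sameAs is symmetric.
    """
    links = []
    for id_a, id_b in combinations(sorted(relationships), 2):
        comps_a, comps_b = relationships[id_a], relationships[id_b]
        if comps_a and comps_a == comps_b:
            links.append((id_a, "owl:sameAs", id_b))
            links.append((id_b, "owl:sameAs", id_a))
    return links


# Hypothetical relationships; identifiers are illustrative only
relationships = {
    "lit:r1": {"ncbigene:7172", "drug:azathioprine", "pheno:toxicity"},
    "lit:r2": {"ncbigene:7172", "drug:azathioprine", "pheno:toxicity"},
    "pgkb:r3": {"ncbigene:7172", "drug:azathioprine"},
}
links = same_as_links(relationships)
```

With these inputs only `lit:r1` and `lit:r2` are linked. `pgkb:r3` has fewer components and would need a different kind of rule, for example one producing a hierarchical mapping.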
Number of owl:sameAs links between PGx relationships from each source
| | EHRs | Literature | PharmGKB |
|---|---|---|---|
| EHRs | 0 | 0 | 0 |
| Literature | 0 | 109,078 | 0 |
| PharmGKB | 0 | 0 | 132 |
Number of skos:broadMatch links between PGx relationships from each source
| | EHRs | Literature | PharmGKB |
|---|---|---|---|
| EHRs | 0 | 14 | 0 |
| Literature | 0 | 133,966 | 0 |
| PharmGKB | 0 | 865 | 98 |
Rows represent origins of the links and columns represent destinations