Ying Yu1, Yunjin Wang1, Zhaojie Xia2, Xiangyu Zhang3, Kailiang Jin3, Jingcheng Yang1, Luyao Ren1, Zheng Zhou1, Dong Yu1, Tao Qing1, Chengdong Zhang1, Li Jin4,5, Yuanting Zheng1,5,6, Li Guo2,7, Leming Shi1,5,6.
Abstract
One important aspect of precision medicine aims to deliver the right medicine to the right patient at the right dose at the right time based on the unique 'omics' features of each individual patient, thus maximizing drug efficacy and minimizing adverse drug reactions. However, fragmentation and heterogeneity of available data make it challenging to readily obtain first-hand information regarding particular diseases, drugs, genes and variants of interest. Therefore, we developed the Precision Medicine Knowledgebase (PreMedKB) by seamlessly integrating the four fundamental components of precision medicine: diseases, genes, variants and drugs. PreMedKB allows users to search for comprehensive information within each of the four components, for the relationships between any two or more components, and, importantly, for the interpretation of the clinical meaning of a patient's genetic variants. PreMedKB is an efficient and user-friendly tool to assist researchers, clinicians or patients in interpreting a patient's genetic profile in terms of discovering potential pathogenic variants, recommending therapeutic regimens, designing panels for genetic testing kits, and matching patients for clinical trials. PreMedKB is freely accessible and available at http://www.fudan-pgx.org/premedkb/index.html#/home.
Year: 2019 PMID: 30407536 PMCID: PMC6324052 DOI: 10.1093/nar/gky1042
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. PreMedKB architecture with three layers: meta databases, domain knowledgebases, and application modules. The meta database layer consists of databases on diseases, genes, variants and drugs with their respective metadata such as names and synonyms. The domain knowledgebases hold the relationships between two or more of the four components. Through RESTful APIs, the application layer, consisting of user-friendly applications, connects with the meta database layer and the domain knowledgebase layer.
Data sources and summary of integrated data
| Meta database | Data sources |
|---|---|
| Disease meta data | Disease Ontology, ICD-10, MalaCards, OMIM, Orphanet |
| Gene meta data | HGNC, NCBI.Gene, OMIM, UniProtKB, GTEx, TCGA |
| Variant meta data | CIViC, COSMIC, ClinVar, dbSNP, HGMD, ExAC |
| Drug meta data | Drugs@FDA, DrugBank, PubChem, STITCH, ClassyFire, DailyMed, TTD |

| Domain knowledgebase | Data sources |
|---|---|
| Semantic relation data source | CIViC, COSMIC, ClinVar, FDA Pharmacogenomic Biomarkers, HGMD, My Cancer Genome, NCCN, PharmGKB, TTD |
| Clinical trial support | ClinicalTrials |
| Literature support | MEDLINE, PubTator |
| Terminology | UMLS |
Abbreviations, full names and URLs of the databases listed above:
CIViC, Clinical Interpretations of Variants in Cancer, https://civicdb.org/;
ClassyFire, chemical classification, http://classyfire.wishartlab.com/;
ClinicalTrials, https://clinicaltrials.gov/;
ClinVar, Clinical Variation database, https://www.ncbi.nlm.nih.gov/clinvar/;
COSMIC, Catalogue Of Somatic Mutations In Cancer, https://cancer.sanger.ac.uk/cosmic;
dbSNP, single nucleotide polymorphism database, https://www.ncbi.nlm.nih.gov/snp;
DailyMed, https://www.dailymed.nlm.nih.gov/;
Disease Ontology, http://disease-ontology.org/;
DrugBank, https://www.drugbank.ca/;
Drugs@FDA, https://www.accessdata.fda.gov/scripts/cder/daf/;
ExAC, Exome Aggregation Consortium, http://exac.broadinstitute.org/;
FDA Pharmacogenomic Biomarkers, https://www.fda.gov/Drugs/ScienceResearch/ucm572698.htm;
GTEx, Genotype-Tissue Expression, https://gtexportal.org/;
HGMD, Human Gene Mutation Database, http://www.hgmd.cf.ac.uk/ac/index.php;
HGNC, HUGO Gene Nomenclature Committee, https://www.genenames.org/;
ICD-10, the 10th revision of the International Classification of Diseases and Related Health Problems (ICD), http://apps.who.int/classifications/icd10/browse/2016/en;
MalaCards, database of human maladies, https://www.malacards.org/;
MEDLINE, https://www.nlm.nih.gov;
My Cancer Genome, http://www.mycancergenome.org;
NCBI.Gene, https://www.ncbi.nlm.nih.gov/gene/;
NCCN, the National Comprehensive Cancer Network, https://www.nccn.org/;
OMIM, Online Mendelian Inheritance in Man, https://omim.org/;
Orphanet, rare diseases database, https://www.orpha.net/;
PharmGKB, the Pharmacogenomics Knowledgebase, https://www.pharmgkb.org/;
PubChem, https://pubchem.ncbi.nlm.nih.gov/;
PubTator, a web-based system for assisting biocuration, https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator/;
STITCH, search tool for interactions of chemicals, http://stitch.embl.de/;
TCGA, The Cancer Genome Atlas, https://cancergenome.nih.gov/;
TTD, Therapeutic Target Database, http://bidd.nus.edu.sg/group/cjttd/;
UMLS, Unified Medical Language System, http://www.medical-language-international.com/;
UniProtKB, the Universal Protein Resource (UniProt) knowledgebase, https://www.uniprot.org/
Figure 2. The schema of PreMedKB and its main features. PreMedKB provides a resource for integrating information on diseases, genes, variants, drugs, and the relationships between any two or more of these four components. PreMedKB allows users to search by disease(s), gene(s), variant(s), drug(s) or combinations of these categories. It returns a comprehensive overview of the relationships between the four components, together with supporting evidence and viewing facilities that help users understand those relationships.
Figure 3. PreMedKB user interface. (A) Search builder of PreMedKB. PreMedKB provides a search builder for complex search requirements. (B) The result is a knowledge graph displaying the diseases, genes, variants, and drugs related to the input query. A word cloud showing hot keywords and a table of semantic relations are also provided. (C) The interface supports sorting and filtering the resulting associations by rating, type of relation, and specific diseases/genes/variants/drugs. (D) Each node and edge can be clicked to view detailed information. For edges, the source databases, related clinical trials and literature supporting the relationship are shown; for nodes, the metadata are displayed. In addition to showing the metadata in the ordinary way, we apply dynamic charts and an interactive bodymap to visualize the mutation landscapes, expression profiles, gene locations and the 3D structure of the drug molecule.
Comparison between PreMedKB and other precision medicine databases
| Databases | Knowledge field | Term normalization | Search methods | Structured data | Knowledge presentation | Docking with NGS pipelines | Other information | Programmatic use | Data download |
|---|---|---|---|---|---|---|---|---|---|
| | Target therapy in cancer | No | Controlled words and free text | No | Tables and texts | Need format change | Meta data on gene and variant | Via API | Yes |
| | Drug data and related target information | Yes | Free text | No | Tables and texts | Without specific variant sites | Literature, clinical trials, meta data on gene and drug | Via API | Yes |
| | Variants for human inherited disease | Yes | Database retrieval | No | Tables and texts | Direct | Literature | Via MySQL | License required |
| | Targeted therapies, immune therapies and other in cancer | No | Controlled words | No | Tables and texts | Need format change | Literature, clinical trials | No | No |
| | Target therapy in cancer | Genes are normalized | Free text | Yes | Tables and dynamic graphs | Need format change | Literature, clinical sequencing cohort | Via API | Yes |
| | Target therapy, pharmacogenomics and drug repurposing in cancer | No | Controlled words | Yes | Tables and texts | Without specific variant sites | Cross links to NCBI.Gene, PubChem, PubMed and ClinicalTrials.gov | Via API | Yes |
| | Pharmacogenomics | No | Controlled words and free text | No | Tables and texts | Need format change | Literature | No | License required |
| | Target therapies, immunotherapies, chemo therapies, pharmacogenomics and pathogenic sites in cancer and other diseases | Yes | Free text and search builder | Yes | Tables, text and dynamic network / graphs | Direct | Literature, clinical trials, expression profiles and genomic landscapes | No | No |
Figure 4. Examples of learning molecular traits of diseases using single or combined search queries. (A) In a single-query search, ‘lung cancer’ is used as the search query to find all directly linked nodes. By default, only the top 20 relationships are displayed. (B) In a complex search, the two queries ‘7-55259515-T-G’[variant] AND ‘lung cancer’[disease] are used. The result shows five drugs and their relationships with the two nodes in the search builder as a semantic network.
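The combined query in Figure 4B can be read as a list of quoted terms, each tagged with a bracketed category and joined by AND. The following is a minimal Python sketch of how such a string might be split on the client side. It is not PreMedKB's own parser: the names `Term`, `Variant`, `parse_query` and `parse_variant` are hypothetical, and the chromosome-position-reference-alternate reading of the variant identifier is an assumption inferred from the example ‘7-55259515-T-G’.

```python
import re
from typing import List, NamedTuple


class Term(NamedTuple):
    text: str   # the quoted search term
    field: str  # the bracketed category, e.g. "variant" or "disease"


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


# Assumed term syntax: a quoted string followed by a category in brackets.
# Both straight and curly quotes are accepted.
TERM_RE = re.compile(r"['‘’]([^'‘’]+)['‘’]\[(\w+)\]")

# Assumed variant syntax: chromosome-position-reference-alternate.
VARIANT_RE = re.compile(r"(\d{1,2}|X|Y|MT)-(\d+)-([ACGT]+)-([ACGT]+)")


def parse_query(query: str) -> List[Term]:
    """Split a query on ' AND ' and parse each quoted, tagged term."""
    terms = []
    for part in query.split(" AND "):
        match = TERM_RE.fullmatch(part.strip())
        if match is None:
            raise ValueError(f"unrecognised term: {part!r}")
        terms.append(Term(match.group(1), match.group(2)))
    return terms


def parse_variant(text: str) -> Variant:
    """Parse an identifier such as '7-55259515-T-G' into its four parts."""
    match = VARIANT_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognised variant: {text!r}")
    chrom, pos, ref, alt = match.groups()
    return Variant(chrom, int(pos), ref, alt)


if __name__ == "__main__":
    terms = parse_query("'7-55259515-T-G'[variant] AND 'lung cancer'[disease]")
    for term in terms:
        print(term)
    print(parse_variant(terms[0].text))
```

Only the AND operator that appears in the caption is handled; any other operators the search builder may offer are outside this sketch.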
Figure 5. An example of using PreMedKB to identify genomic risk factors in a pancreatic cancer patient. (A) After data analysis, two candidate pathogenic variants are identified and used as input to PreMedKB. (B) A variant of SPINK1 (5-157828020-A-G) is identified as a cause of hereditary pancreatitis. (C) The traverse level is set to 2, which extends the search by using the initial result as input. (D) A more comprehensive network is obtained. After filtering, SPINK1 is shown to have a direct connection with pancreatic cancer and hereditary pancreatitis. (E) Detailed information helps in understanding the functions and expression profile of the SPINK1 gene.
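The traverse level in Figure 5C can be understood as a depth limit on a breadth-first walk of the relationship graph: level 1 returns the direct neighbours of the query node, and level 2 also returns the neighbours of those results. The Python sketch below illustrates this reading on a toy graph. It is an assumption about the behaviour, not PreMedKB's implementation; the function names are hypothetical, and the edges are only those named in the Figure 5 caption rather than data exported from the knowledgebase.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

# Toy undirected graph containing only relationships named in the caption.
EDGES = [
    ("5-157828020-A-G", "SPINK1"),
    ("5-157828020-A-G", "hereditary pancreatitis"),
    ("SPINK1", "hereditary pancreatitis"),
    ("SPINK1", "pancreatic cancer"),
]


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency: Dict[str, Set[str]] = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def traverse(adjacency: Dict[str, Set[str]], start: str, level: int) -> Dict[str, int]:
    """Breadth-first search from start, visiting nodes at most `level` hops away.

    Returns a map from each reached node to its hop distance from start.
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == level:
            continue  # do not expand beyond the requested level
        for neighbour in sorted(adjacency.get(node, ())):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return depth


if __name__ == "__main__":
    graph = build_adjacency(EDGES)
    print(traverse(graph, "5-157828020-A-G", 1))
    print(traverse(graph, "5-157828020-A-G", 2))
```

In this toy graph, pancreatic cancer is reached from the variant only at level 2, through the SPINK1 node.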