Johannes Starlinger, Steffen Pallarz, Jurica Ševa, Damian Rieke, Christine Sers, Ulrich Keilholz, Ulf Leser.
Abstract
BACKGROUND: The decreasing cost of obtaining high-quality calls of genomic variants and the increasing availability of clinically relevant data on such variants are important drivers for personalized oncology. To allow rational genome-based decisions in diagnosis and treatment, clinicians need intuitive access to up-to-date and comprehensive variant information, encompassing, for instance, prevalence in populations and diseases, functional impact at the molecular level, associations with druggable targets, or results from clinical trials. In practice, collecting such comprehensive information on genomic variants is difficult because the underlying data are dispersed over a multitude of distributed, heterogeneous, sometimes conflicting, and quickly evolving data sources. To work efficiently, clinicians require powerful Variant Information Systems (VIS) that automatically collect and aggregate the available evidence from such data sources without suppressing existing uncertainty.
Keywords: Data model; Genomic variant data integration; Molecular cancer therapy; Variant information system
Year: 2018 PMID: 30463544 PMCID: PMC6249891 DOI: 10.1186/s12911-018-0665-z
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 A Variant Information System (VIS) integrates public data sources and makes their joint information available for use both within in-house systems for patient knowledge management and directly to domain expert users. (Clipart source: openclipart.org; public domain)
Overview of databases integrated in our current implementation of the data model indicating which type of data is provided (’+’) by each database
| COSMIC | ClinVar | CIViC | Ensembl | 1000 genomes | OncoKB | DrugBank | ClinicalTrials.gov | DGIdb | Cancer gene census | reactome | ncbi | DoCM | ExAC | canSAR | TCGA | GeneView | KEGG | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
| + | + | + | + | + | + | ∼ | + | ||||||||||
|
| + | + | + | + | + | + | ∼ | ∼ | + | + | + | + | + | + | + | + | ||
|
| + | + | + | + | + | + | + | + | ∼ | + | ||||||||
|
| + | + | + | + | + | + | + | + | ∼ | + | ||||||||
|
| + | + | + | + | ∼ | + | ||||||||||||
|
| ∼ | + | + | + | + | ∼ | + | |||||||||||
|
| ∼ | + | + | + | ∼ | + | + | + | ||||||||||
|
| + | + | + | + | + | ∼ | + | ∼ | ∼ | + | ||||||||
|
| + | + | + | + | + | + | + | ∼ | ∼ | + | ||||||||
|
| + | + | + | + | + | + | + | + | + | + | + | |||||||
|
| + | + | + | + | + | + | + | + | + | |||||||||
|
| + | + | + | + | ∼ | + | + | ∼ | + | + | ||||||||
|
| ∼ | + | + | + | ∼ | ∼ | + | + | + | + | + | |||||||
|
| + | ∼ | + | |||||||||||||||
|
| + | + | ∼ | ∼ | ∼ | + | + | |||||||||||
|
| + | + | ∼ | ∼ | + | |||||||||||||
|
| + | + | + | ∼ | + | |||||||||||||
|
| + | ∼ | ||||||||||||||||
|
| ∼ | ∼ | + | + | + | |||||||||||||
|
| + | + | ∼ | + | ||||||||||||||
|
| + | + | ||||||||||||||||
|
| + | |||||||||||||||||
|
| + | + | + | + | + | + | + | + | + | + | + | + | + | + | ||||
|
| + | + | + | + |
In some cases the information is provided only in part (’∼’), meaning either that the database only references other sources which hold the needed information or that the information is provided only for a subset of entities
Fig. 2 The relational class model to represent minimum variant level data (MVLD) and possible extensions; colors correspond to Ritter et al. [7]: brown: somatic interpretive data; purple: allele interpretive data; blue: allele descriptive data; white: background data extending MVLD. Cardinalities of relationships are indicated as follows: (A)1–n(B): one instance of (A) is associated with an arbitrary number of instances of (B); (A)0..1–n(B): no or one instance of (A) is associated with an arbitrary number of instances of (B)
Overview of data types and value ranges for data elements covered by the core data model for minimum variant level data
| Class | Attribute | Value range | Example |
|---|---|---|---|
| Allele descriptive | | | |
| Gene | Gene ID | Internal ID | G0002V5Z |
| | Gene name | HGNC gene symbols | KRAS |
| | Chromosome | 1..22, X, Y | 12 |
| | Entrez gene ID | Entrez gene IDs | 3845 |
| | Ensembl gene ID | Ensembl gene IDs | ENSG00000133703 |
| | RefSeq gene ID | RefSeq gene IDs | NG_007524 |
| Gene transcript | Gene ID | Internal ID | G0002V5Z |
| | Gene transcript ID | Internal ID | T0006OOW |
| | RefSeq transcript ID | RefSeq transcript IDs | NM_033360 |
| | RefSeq protein ID | RefSeq protein IDs | NP_203524 |
| | Ensembl transcript ID | Ensembl transcript IDs | ENST00000256078 |
| | UniProt ID | UniProt IDs | P01116 |
| Gene position | Gene ID | Internal ID | G0002V5Z |
| | Genome version | Genome build IDs | GRCh37.p13 |
| | DNA position | Genomic coordinate | 12p12.1 |
| Gene pathway | Gene ID | Internal ID | G0002V5Z |
| | Pathway ID | Internal ID | P003V724 |
| Gene pathway | Pathway ID | Internal ID | P003V724 |
| | Common name | | Activation of RAS in B cells |
| | KEGG ID | KEGG IDs | map04014 |
| | Reactome ID | Reactome IDs | R-HSA-1169092 |
| | PathwayCommons ID | PathwayCommons IDs | R-HSA-1169092 |
| Allele interpretive | | | |
| Variant | Variant ID | Internal ID | V0000LBB |
| | Variant type | “Single nucleotide variant (SNV)”, “multinucleotide variant (MNV)”, “insertion (INS)”, “deletion (DEL)” | SNV |
| Variant position | Variant ID | Internal ID | V0000LBB |
| | Genome version | Genome build IDs | GRCh37.p13 |
| | DNA sub. & position | HGVS genomic coordinate | NC_000012.11:g.25398284C>G |
| Gene variant | Gene ID | Internal ID | G0002V5Z |
| | Variant ID | Internal ID | V0000LBB |
| | Variant consequence | “Non-sense”, “missense”, “silent”, “frame shift”, “in-frame”, “3UTR”, “5UTR”, “splice”, “splice-region”, “intronic”, “upstream”, “downstream” | missense |
| Gene variant transcript | Gene ID | Internal ID | G0002V5Z |
| | Variant ID | Internal ID | V0000LBB |
| | Gene transcript ID | Internal ID | T0006OOW |
| | Protein sub. & position | HGVS formatted variants | NM_033360.3(KRAS):c.35G>C (p.Gly12Ala) |
| | Protein domain | Descriptive name of protein domain | Small GTP-binding protein domain |
| | Variant consequence | “Expression”, “amplification”, “deletion”, “fusion”, “loss of function”, “missense” | missense |
| | Risk score | FATHMM, SIFT, PolyPhen | 0.98468, 0, 0.97 |
| Somatic interpretive | | | |
| Cancer type | Cancer type ID | Internal ID | C000WQFL |
| | Cancer type name | NCI thesaurus \| Oncotree IDs | Colorectal cancer |
| | UMLS ID | UMLS concept IDs | C1527249 |
| | HPO ID | HPO concept IDs | HP:0003003 |
| Cancer variant | Cancer variant ID | Internal ID | CV00XBQW |
| | Variant ID | Internal ID | V0000LBB |
| | Cancer type ID | Internal ID | C000WQFL |
| | Biomarker class | “Diagnostic”, “prognostic”, “predictive”, “predisposing”, “pharmacogenomic” | predictive |
| | Clinical relevance level | “Tier 1”, “Tier 2”, “Tier 3” | Tier 2 |
| Cancer variant sample | Cancer variant ID | Internal ID | CV00XBQW |
| | Sample ID | Internal ID | SXBQW0A7 |
| | Somatic classification | “Confirmed somatic”, “confirmed germline”, “unknown” | somatic |
| | Allele frequency | Allele frequency in global population | 0.00001647 |
| Sample specimen | Sample ID | Internal ID | SXBQW0A7 |
| | Tumor purity | Ratio | 0.763 |
| | TNM status | TNM values | T2N1M1 |
| | Primary / relapse | Primary \|\| relapse | primary |
| Cancer variant drug | Cancer variant ID | Internal ID | CV00XBQW |
| | Drug ID | Internal ID | D00000Z9 |
| Cancer variant drug effect | Cancer variant ID | Internal ID | CV00XBQW |
| | Drug ID | Internal ID | D00000Z9 |
| | Effect | “Resistant”, “responsive”, “non-responsive”, “sensitive”, “reduced sensitivity”, “other” | Resistance or non-response |
| | Level of evidence | see Table | C |
| | Sublevel of evidence | see Table | 3A |
| Drug | Drug ID | Internal ID | D00000Z9 |
| | Substance name | FDA approved \| DrugBank substance names | Panitumumab |
| | DrugBank ID | DrugBank IDs | DB01269 |
| | PharmGKB ID | PharmGKB IDs | PA162373091 |
| | FDA ID | FDA IDs | 125147 |
| Drug mechanism | Drug ID | Internal ID | D00000Z9 |
| | Molecular mechanism | Description | Binds to the epidermal growth factor receptor (EGFR) on both normal and tumor cells[...] |
Example data for evidence recording is given in Additional file 3
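As a rough illustration of how a few classes of the core data model could be represented in application code, the following minimal sketch uses Python dataclasses. The class and field names are hypothetical renderings of the attribute names in the table, and the validation rules are assumptions derived from the listed value ranges; this is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Controlled vocabularies taken from the "Value range" column of the table.
VARIANT_TYPES = {"SNV", "MNV", "INS", "DEL"}
BIOMARKER_CLASSES = {
    "diagnostic", "prognostic", "predictive", "predisposing", "pharmacogenomic",
}


@dataclass(frozen=True)
class Gene:
    gene_id: str                          # internal ID
    name: str                             # HGNC gene symbol
    chromosome: str                       # "1".."22", "X", "Y"
    entrez_gene_id: Optional[str] = None
    ensembl_gene_id: Optional[str] = None
    refseq_gene_id: Optional[str] = None


@dataclass(frozen=True)
class Variant:
    variant_id: str                       # internal ID
    variant_type: str                     # one of VARIANT_TYPES

    def __post_init__(self) -> None:
        if self.variant_type not in VARIANT_TYPES:
            raise ValueError(f"unknown variant type: {self.variant_type!r}")


@dataclass(frozen=True)
class CancerVariant:
    cancer_variant_id: str                # internal ID
    variant_id: str                       # reference to Variant
    cancer_type_id: str                   # reference to a cancer type
    biomarker_class: str                  # one of BIOMARKER_CLASSES
    clinical_relevance_level: str         # e.g. "Tier 2"

    def __post_init__(self) -> None:
        if self.biomarker_class not in BIOMARKER_CLASSES:
            raise ValueError(f"unknown biomarker class: {self.biomarker_class!r}")


# Example values are the ones shown in the "Example" column of the table.
kras = Gene("G0002V5Z", "KRAS", "12", "3845", "ENSG00000133703", "NG_007524")
variant = Variant("V0000LBB", "SNV")
cancer_variant = CancerVariant(
    "CV00XBQW", variant.variant_id, "C000WQFL", "predictive", "Tier 2"
)
```

Linking records through internal IDs rather than nested objects mirrors the relational layout of the table, where each class carries the IDs of the classes it refers to.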
Identifiers included in data model for cross-source entity identification
| Entity type | Primary ID source | Further ID sources |
|---|---|---|
| Gene | Entrez gene | Ensembl, RefSeq |
| Transcripts | RefSeq | Ensembl, UniProt |
| Disease names | Disease ontology | UMLS, human phenotype ontology (HPO) |
| Drugs | DrugBank | PharmGKB, FDA |
| Pathways | KEGG | Gene ontology, PathwayCommons, Reactome |
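The identifier table suggests a lookup from any external identifier to one internal entity. A minimal sketch of such a lookup follows; the `IdentifierIndex` class and its methods are hypothetical names introduced for illustration, and the conflict check is an assumption about how clashing mappings might be surfaced rather than silently overwritten.

```python
from typing import Dict, Optional, Tuple


class IdentifierIndex:
    """Maps (source, external ID) pairs to one internal ID per entity."""

    def __init__(self) -> None:
        self._internal_by_external: Dict[Tuple[str, str], str] = {}

    def register(self, internal_id: str, source: str, external_id: str) -> None:
        key = (source.lower(), external_id)
        existing = self._internal_by_external.get(key)
        if existing is not None and existing != internal_id:
            # Two sources disagree about which entity this ID denotes.
            raise ValueError(
                f"{source}:{external_id} already mapped to {existing}, not {internal_id}"
            )
        self._internal_by_external[key] = internal_id

    def resolve(self, source: str, external_id: str) -> Optional[str]:
        return self._internal_by_external.get((source.lower(), external_id))


# Gene identifiers for KRAS as listed in the core data model table.
genes = IdentifierIndex()
genes.register("G0002V5Z", "Entrez", "3845")
genes.register("G0002V5Z", "Ensembl", "ENSG00000133703")
genes.register("G0002V5Z", "RefSeq", "NG_007524")
```

Any of the three external identifiers then resolves to the same internal gene ID, for example `genes.resolve("ensembl", "ENSG00000133703")`.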
Fig. 3 Overview of data integration: source databases are processed by extract/transform/load (ETL) scripts which generate source-specific table spaces within the local database. From these, the relevant elements are semantically mapped to and loaded into the core data model
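The two-stage flow described in the caption (raw load into a source-specific table, then semantic mapping into the core model) might look roughly like the sketch below. It uses an in-memory SQLite database; the table names `src_example_variant` and `core_variant`, their columns, the label mapping, and the integer primary key are all assumptions made for illustration and do not reflect the authors' schema.

```python
import sqlite3

# Hypothetical mapping from a source's free-text type labels to core codes.
TYPE_LABEL_TO_CODE = {
    "single nucleotide variant": "SNV",
    "insertion": "INS",
    "deletion": "DEL",
}


def run_etl(conn: sqlite3.Connection, source_rows) -> int:
    """Load raw rows into a source table, map them into the core table.

    Returns the number of source rows skipped because their type label
    could not be mapped.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS src_example_variant (hgvs_g TEXT, type_label TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS core_variant ("
        "variant_id INTEGER PRIMARY KEY, hgvs_g TEXT UNIQUE, variant_type TEXT)"
    )
    # Stage 1: keep the source data unchanged in its own table space.
    conn.executemany("INSERT INTO src_example_variant VALUES (?, ?)", source_rows)

    # Stage 2: map to the core vocabulary; UNIQUE + OR IGNORE deduplicates.
    skipped = 0
    rows = conn.execute("SELECT hgvs_g, type_label FROM src_example_variant").fetchall()
    for hgvs_g, type_label in rows:
        code = TYPE_LABEL_TO_CODE.get(type_label.lower())
        if code is None:
            skipped += 1
            continue
        conn.execute(
            "INSERT OR IGNORE INTO core_variant (hgvs_g, variant_type) VALUES (?, ?)",
            (hgvs_g, code),
        )
    conn.commit()
    return skipped


conn = sqlite3.connect(":memory:")
skipped = run_etl(conn, [
    ("NC_000012.11:g.25398284C>G", "single nucleotide variant"),
    ("NC_000012.11:g.25398284C>G", "Single nucleotide variant"),
    ("NC_000012.11:g.25398284C>G", "unrecognised label"),
])
core_rows = conn.execute("SELECT hgvs_g, variant_type FROM core_variant").fetchall()
```

Keeping the untouched source rows alongside the mapped core rows means that a row which fails mapping is still available for later inspection instead of being lost.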
Ambiguities using HGVS nomenclature arising from overlapping genes and different sources, by the example of variant rs121913529
| Ensembl HGVS | dbSNP HGVS | Associated gene |
|---|---|---|
| NC_000012.11:g.25398284C>G | | KRAS |
| ENST00000256078.4:c.35G>C | NM_033360.3:c.35G>C | KRAS-004 |
| ENSP00000256078.4:p.Gly12Ala | NP_203524.1:p.Gly12Ala | KRAS-004 |
| ENST00000311936.3:c.35G>C | NM_004985.4:c.35G>C | KRAS-001 |
| ENSP00000308495.3:p.Gly12Ala | NP_004976.2:p.Gly12Ala | KRAS-001 |
| ENST00000556131.1:c.35G>C | | KRAS-002 |
| ENSP00000451856.1:p.Gly12Ala | | KRAS-002 |
| ENST00000557334.1:c.35G>C | | KRAS-003 |
| ENSP00000452512.1:p.Gly12Ala | | KRAS-003 |
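To make the ambiguity concrete, the sketch below splits simple HGVS descriptions into reference sequence, coordinate type, and change, and then groups the reference sequences that carry the same change. The regular expression is a deliberately simplified assumption that covers only descriptions of the form shown in the table; it is not a full HGVS parser, and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Simplified pattern: versioned accession, coordinate type (g/c/p), change.
HGVS_PATTERN = re.compile(
    r"(?P<ref>[A-Z]+_?\d+\.\d+):(?P<kind>[gcp])\.(?P<change>\S+)"
)


def parse_hgvs(text: str):
    """Return (reference, kind, change); stray whitespace is removed first."""
    compact = re.sub(r"\s+", "", text)
    match = HGVS_PATTERN.fullmatch(compact)
    if match is None:
        raise ValueError(f"not a simple HGVS description: {text!r}")
    return match.group("ref"), match.group("kind"), match.group("change")


def group_by_change(descriptions):
    """Group reference sequences by (kind, change)."""
    groups = defaultdict(list)
    for description in descriptions:
        ref, kind, change = parse_hgvs(description)
        groups[(kind, change)].append(ref)
    return dict(groups)


# A subset of the descriptions listed for rs121913529.
descriptions = [
    "NC_000012.11:g.25398284C>G",
    "ENST00000256078.4:c.35G >C",
    "NM_033360.3:c.35G >C",
    "ENST00000311936.3:c.35G >C",
    "NM_004985.4:c.35G >C",
    "ENSP00000256078.4:p.Gly12Ala",
    "NP_203524.1:p.Gly12Ala",
]
groups = group_by_change(descriptions)
```

Here `groups[("c", "35G>C")]` collects four different transcript accessions, which illustrates why a coding-level description alone does not identify a single transcript or source.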
Assessment of the effect of different variants as provided by the SIFT and PolyPhen algorithms showing disagreement: While for variants one and two only one program calculates a score resembling the (true) clinical findings, variant three is corroborated by both programs and agrees with the clinical findings
| rsID | SIFT | PolyPhen | Clinical evidence |
|---|---|---|---|
| rs104894359 | 0 | 0.361 | Pathogenic |
| rs121913529 | 0 | 1 | Pathogenic |
| rs1137282 | 0.85 | 0.012 | Benign |
Legend: SIFT: 0 (deleterious) to 1 (tolerated); PolyPhen: 0 (benign) to 1 (probably damaging)
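A comparison of this kind can be sketched in code by turning each score into a harmful/not-harmful call and checking it against the clinical label. The cutoffs below (SIFT below 0.05 counted as deleterious, PolyPhen above 0.446 counted as damaging) are commonly quoted defaults and are an assumption here; the source states only the score ranges, so other cutoffs could change the outcome.

```python
# Assumed cutoffs; the source gives only the score ranges, not thresholds.
SIFT_DELETERIOUS_BELOW = 0.05
POLYPHEN_DAMAGING_ABOVE = 0.446


def tools_agreeing_with_clinic(sift: float, polyphen: float, clinical: str):
    """Return the names of the tools whose call matches the clinical label."""
    clinically_harmful = clinical.lower() == "pathogenic"
    calls = {
        "SIFT": sift < SIFT_DELETERIOUS_BELOW,
        "PolyPhen": polyphen > POLYPHEN_DAMAGING_ABOVE,
    }
    return [tool for tool, harmful in calls.items() if harmful == clinically_harmful]


# Rows copied from the table: rsID, SIFT, PolyPhen, clinical evidence.
TABLE = [
    ("rs104894359", 0.0, 0.361, "Pathogenic"),
    ("rs121913529", 0.0, 1.0, "Pathogenic"),
    ("rs1137282", 0.85, 0.012, "Benign"),
]
agreement = {
    rsid: tools_agreeing_with_clinic(sift, polyphen, clinical)
    for rsid, sift, polyphen, clinical in TABLE
}
```

With these assumed cutoffs, the two tools disagree on rs104894359, where only SIFT matches the clinical label.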
Evidence levels as defined by different sources
| Evidence level | CIViC | [ | Evidence level 2 | Oncokb.org | Pct.mdanderson.org (PMID: 25863335) | Andre et al. (PMID: 25344359) | Proposed |
|---|---|---|---|---|---|---|---|
| A / Tier 1 | Validated association - proven/consensus association in human medicine | Alteration has matching FDA approved or NCCN recommended therapy | 1A | FDA-approved biomarker and drug in this indication | Drug is FDA-approved for the same tumor type harboring a specific biomarker | Molecular alteration validated in several robust early phase trials or at least one phase III randomized trial. Alteration validated in the disease under consideration, targeted therapies have been shown to be ineffective in patients who are lacking the genomic alteration | Drug is FDA-approved for the same tumor type harboring a specific biomarker |
| | | | 1B | | An adequately-powered, prospective study with biomarker selection/stratification, or a meta-analysis/overview demonstrates a biomarker predicts tumor response to a drug or that the drug is clinically effective in a biomarker-selected cohort in the same tumor type. | Molecular alteration validated in several robust early phase trials or at least one phase III randomized trial. No evidence that the therapy does not work in the absence of the molecular alteration | An adequately-powered, prospective study with biomarker selection/stratification, or a meta-analysis/overview demonstrates a biomarker predicts tumor response to a drug or that the drug is clinically effective in a biomarker-selected cohort in the same tumor type. |
| | | | 1C | | | Molecular alteration validated in several robust early phase trials or at least one phase III randomized trial. Level I molecular alteration, but not in the disease under consideration | |
| B | Clinical evidence - clinical trial or other primary patient data supports association | | 2A | Standard-of-care biomarker and drug in this indication but not FDA-approved | Large-scale retrospective study demonstrates a biomarker is associated with tumor response to the drug in the same tumor type. This could be a prospective trial where biomarker study is the secondary objective, or an adequately powered retrospective cohort study or a case-control study | Efficacy of targeting molecular alteration suggested in single and underpowered phase I/II trials. Alteration validated in the disease under consideration, targeted therapies have been shown to be ineffective in patients who are lacking the genomic alteration | Large-scale retrospective study demonstrates a biomarker is associated with tumor response to the drug in the same tumor type. This could be a prospective trial where biomarker study is the secondary objective, or an adequately powered retrospective cohort study or a case-control study |
| | | | 2B | FDA-approved biomarker and drug in another indication, but not FDA or NCCN compendium-listed for this indication | Clinical data that the biomarker predicts tumor response to drug in a different tumor type | Efficacy of targeting molecular alteration suggested in single and underpowered phase I/II trials. No evidence that the therapy does not work in the absence of the molecular alteration | Clinical data (analogue 1A-2A) that the biomarker predicts tumor response to drug in a different tumor type |
| | | | 2C | | | Efficacy of targeting molecular alteration suggested in single and underpowered phase I/II trials. Level I molecular alteration, but not in the disease under consideration or anecdotal evidence of response to targeting molecular alteration in single patient case reports | Single unusual responder (or case study) shows a biomarker is associated with response to drug, supported by scientific rationales |
| C / Tier 2 | Case study - individual case reports from clinical journals | Alteration has matching therapy based on evidence from clinical trials, case reports, or exceptional responders | 3A | Clinical evidence links biomarker to drug response in this indication but neither biomarker nor drug is FDA-approved or NCCN compendium-listed | Single unusual responder (or case studies) shows a biomarker is associated with response to drug, supported by scientific rationales | Target suggested by preclinical studies. Preclinical studies include human samples, cell lines and animal models | Preclinical data (in vitro or in vivo models and functional genomics) demonstrates that a biomarker predicts response of cells to drug treatment in the same tumor type |
| D / Tier 3 | Preclinical evidence - in vivo or in vitro models support association | Alteration predicts for response or resistance to therapy based on evidence from pre-clinical data (in vitro or in vivo models) | 3B | Clinical evidence links biomarker to drug response in another indication but neither biomarker nor drug is FDA-approved or NCCN compendium-listed | Preclinical data (in vitro or in vivo models and functional genomics) demonstrates that a biomarker predicts response of cells to drug treatment | Target suggested by preclinical studies. Preclinical studies that lack either cell lines or animal models | Preclinical data (in vitro or in vivo models and functional genomics) demonstrates that a biomarker predicts response of cells to drug treatment in a different tumor type |
| E / Tier 4 | Inferential association - indirect evidence | Alteration is a putative oncogenic driver based on functional activation of a pathway | 4A | Preclinical evidence associates this biomarker to drug response, where the biomarker and drug are NOT FDA-approved or NCCN compendium-listed | | Target predicted but lack of clinical or preclinical data. Genomic alteration is a known cancer-related gene | Inferential association between biomarker and treatment response. |
| | | | 4B | | | Target predicted but lack of clinical or preclinical data. Genomic alteration is not known as cancer-related gene | |
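One way to use such a harmonized scale is to order evidence items by strength. The sketch below assumes that the rows of the table run from strongest (1A) to weakest (4B) and that a letter level can be mapped to the sublevel printed in the same row; both are readings of the table layout rather than rules stated in the source, and the function names are hypothetical.

```python
# Sublevels in the order in which they appear in the table rows.
SUBLEVEL_ORDER = ["1A", "1B", "1C", "2A", "2B", "2C", "3A", "3B", "4A", "4B"]

# Letter levels mapped to the sublevel printed in the same table row.
LETTER_TO_SUBLEVEL = {"A": "1A", "B": "2A", "C": "3A", "D": "3B", "E": "4A"}


def rank(level: str) -> int:
    """Return 0 for the strongest level; raises ValueError if unknown."""
    sublevel = LETTER_TO_SUBLEVEL.get(level, level)
    return SUBLEVEL_ORDER.index(sublevel)


def strongest(levels):
    """Pick the strongest level from a mix of letters and sublevels."""
    return min(levels, key=rank)


best = strongest(["C", "2B", "4A"])
```

Unknown labels raise an error instead of receiving a default rank, so a level from a source that has not been mapped cannot silently influence the ordering.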