Seo Jeong Shin¹, Seng Chan You², Yu Rang Park³, Jin Roh⁴, Jang-Hee Kim⁴, Seokjin Haam⁵, Christian G Reich⁶, Clair Blacketer⁷, Dae-Soon Son⁸, Seungbin Oh⁹, Rae Woong Park¹˒².
Abstract
BACKGROUND: Clinical sequencing data should be shared to achieve the scale and diversity needed to provide strong evidence for improving patient care. A distributed research network allows researchers to share this evidence, rather than patient-level data, across centers, thereby avoiding privacy issues. However, the Observational Medical Outcomes Partnership (OMOP) common data model (CDM) used in distributed research networks has low coverage of sequencing data and does not reflect the latest trends in precision medicine.
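The distributed-network principle described above can be illustrated with a minimal sketch. It assumes each site holds hypothetical patient-level rows of the form (person_id, gene_symbol). Each site computes distinct-patient counts per gene locally, and a coordinating center pools only those aggregates. The function names `local_gene_counts` and `pool_counts` are illustrative and belong neither to OMOP nor to any published tool.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical patient-level rows kept inside one site: (person_id, gene_symbol).
Record = Tuple[int, str]


def local_gene_counts(records: Iterable[Record]) -> Counter:
    """Runs inside a site: count distinct patients per gene.

    Only the returned aggregate leaves the site; person_id values do not.
    """
    unique_pairs = set(records)  # one entry per (patient, gene) pair
    return Counter(gene for _person_id, gene in unique_pairs)


def pool_counts(site_results: Iterable[Counter]) -> Counter:
    """Runs at a coordinating center: sum the aggregates shared by each site."""
    pooled: Counter = Counter()
    for result in site_results:
        pooled.update(result)
    return pooled


if __name__ == "__main__":
    site_a: List[Record] = [(1, "EGFR"), (1, "EGFR"), (2, "KRAS")]
    site_b: List[Record] = [(10, "EGFR"), (11, "TP53")]
    pooled = pool_counts([local_gene_counts(site_a), local_gene_counts(site_b)])
    print(pooled["EGFR"], pooled["KRAS"], pooled["TP53"])  # 2 1 1
```

The sketch assumes that no patient appears at more than one site. Deduplicating patients across centers would require extra machinery that is out of scope here.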
Keywords: data visualization; databases, genetic; high-throughput nucleotide sequencing; multicenter study; patient privacy
Year: 2019 PMID: 30912749 PMCID: PMC6454347 DOI: 10.2196/13249
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Figure 1. Schematic diagram of the relationships among the tables that make up the genomic common data model. Tables in red (“Genomic_Test,” “Target_Gene,” “Variant_Occurrence,” and “Variant_Annotation”) store genomic sequencing data and processes. Tables in blue (“Person,” “Condition_Occurrence,” “Procedure_Occurrence,” “Specimen,” and “Care_Site”) already exist in the Observational Medical Outcomes Partnership common data model and store clinical data directly linked to the “Variant_Occurrence” and “Genomic_Test” tables. ID: identification; HGVS: Human Genome Variation Society; HGNC: Human Genome Organisation Gene Nomenclature Committee.
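The relationships in Figure 1 can be made concrete with a minimal sketch that uses Python's built-in sqlite3 module. Only the table names come from the figure. Every column shown (for example hgnc_symbol, hgvs_c, annotation_field) is a hypothetical simplification and not the published genomic CDM specification. The clinical tables are reduced to a bare person table.

```python
import sqlite3

# Simplified, hypothetical DDL. Table names follow Figure 1; the columns are
# illustrative assumptions, not the published genomic CDM definitions.
GENOMIC_CDM_DDL = """
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY
);
CREATE TABLE genomic_test (
    genomic_test_id   INTEGER PRIMARY KEY,
    genomic_test_name TEXT NOT NULL            -- e.g. a targeted panel name
);
CREATE TABLE target_gene (
    target_gene_id  INTEGER PRIMARY KEY,
    genomic_test_id INTEGER NOT NULL REFERENCES genomic_test (genomic_test_id),
    hgnc_symbol     TEXT NOT NULL              -- gene covered by the test
);
CREATE TABLE variant_occurrence (
    variant_occurrence_id INTEGER PRIMARY KEY,
    person_id             INTEGER NOT NULL REFERENCES person (person_id),
    genomic_test_id       INTEGER NOT NULL REFERENCES genomic_test (genomic_test_id),
    hgnc_symbol           TEXT NOT NULL,       -- gene carrying the variant
    hgvs_c                TEXT                 -- variant in HGVS notation
);
CREATE TABLE variant_annotation (
    variant_annotation_id INTEGER PRIMARY KEY,
    variant_occurrence_id INTEGER NOT NULL
        REFERENCES variant_occurrence (variant_occurrence_id),
    annotation_field      TEXT NOT NULL,       -- e.g. 'clinical_significance'
    value_as_string       TEXT
);
"""


def create_genomic_cdm(conn: sqlite3.Connection) -> None:
    """Create the simplified tables; SQLite enforces foreign keys per connection."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(GENOMIC_CDM_DDL)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_genomic_cdm(conn)
    conn.execute("INSERT INTO person VALUES (1)")
    conn.execute("INSERT INTO genomic_test VALUES (1, 'example panel')")
    conn.execute("INSERT INTO variant_occurrence VALUES (1, 1, 1, 'EGFR', 'c.2573T>G')")
    conn.execute(
        "INSERT INTO variant_annotation VALUES (1, 1, 'clinical_significance', 'pathogenic')"
    )
    row = conn.execute(
        """SELECT v.hgnc_symbol, v.hgvs_c, a.value_as_string
           FROM variant_occurrence v
           JOIN variant_annotation a USING (variant_occurrence_id)"""
    ).fetchone()
    print(row)  # ('EGFR', 'c.2573T>G', 'pathogenic')
```

In this sketch, annotations sit in a key/value table that hangs off each variant occurrence. This layout is one plausible way to allow new annotation sources without schema changes. It is an assumption, not a claim about the published model.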
Characteristics of the data used to build and validate the genomic common data model.
| Variable | AUSOMᵃ (N=114), n (%) | TCGAᵇ (N=1060), n (%) |
|---|---|---|
| Age (years) | | |
| ≤49 | 7 (6.1) | 44 (4.2) |
| 50-59 | 26 (22.8) | 163 (15.4) |
| 60-69 | 41 (36.0) | 310 (29.2) |
| 70-79 | 35 (30.7) | 317 (29.9) |
| ≥80 | 5 (4.4) | 56 (5.2) |
| Unknown | 0 (0.0) | 170 (16.0) |
| Sex | | |
| Male | 64 (56.1) | 628 (59.0) |
| Female | 50 (43.9) | 429 (41.0) |
| Unknown | 0 (0.0) | 3 (0.2) |
| Diagnosis | | |
| Lung adenocarcinoma | 92 (80.7) | 603 (56.9) |
| Lung squamous cell carcinoma | 22 (19.3) | 457 (43.1) |
| Stage | | |
| Stage I | 78 (68.4) | 526 (49.6) |
| Stage II | 16 (14.0) | 286 (27.0) |
| Stage III | 18 (15.8) | 184 (17.4) |
| Stage IV | 0 (0.0) | 36 (3.4) |
| Unknown | 2 (1.8) | 28 (2.6) |

ᵃAUSOM: Ajou University School of Medicine.
ᵇTCGA: The Cancer Genome Atlas.
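Each cell in the table above is a count n followed by its percentage of the column total N. The following Python sketch reproduces that arithmetic with a hypothetical helper, `n_percent`. It also checks that each variable's categories, including Unknown, add up to N. Applied to the AUSOM stage counts, it yields the values reported in the table.

```python
from typing import Dict


def n_percent(counts: Dict[str, int], total: int) -> Dict[str, str]:
    """Format category counts as 'n (%)' with one decimal place.

    Raises ValueError if the categories (including 'Unknown') do not sum to
    the stated total, a quick consistency check for descriptive tables.
    """
    observed = sum(counts.values())
    if observed != total:
        raise ValueError(f"categories sum to {observed}, expected {total}")
    return {label: f"{n} ({100 * n / total:.1f})" for label, n in counts.items()}


if __name__ == "__main__":
    # AUSOM stage counts taken from the table (N=114).
    ausom_stage = {"Stage I": 78, "Stage II": 16, "Stage III": 18,
                   "Stage IV": 0, "Unknown": 2}
    for label, cell in n_percent(ausom_stage, 114).items():
        print(f"{label}: {cell}")  # e.g. "Stage I: 78 (68.4)"
```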
Figure 2. Data visualization tool for clinical sequencing data holders who have converted their genomic data into the genomic CDM. Users can (a) connect their genomic CDM database; view analysis plots of (b) the overall profile, (c) mutation types, and (d) variant pathogenicity; and (e) search for the proportion of patients by gene name and variant information. CDM: common data model.
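The search in panel (e) reduces to a distinct-patient count divided by a cohort size. The sketch below assumes a toy two-table layout with hypothetical columns hgnc_symbol and hgvs_p. It takes every row of the person table as the denominator. The actual tool may define its cohort differently, for example as sequenced patients only, and `patient_proportion` is an illustrative name.

```python
import sqlite3
from typing import Optional


def patient_proportion(conn: sqlite3.Connection, gene: str,
                       hgvs_p: Optional[str] = None) -> float:
    """Fraction of persons with at least one variant in `gene`, optionally
    restricted to one protein change. Denominator: all rows in `person`."""
    sql = ("SELECT COUNT(DISTINCT person_id) FROM variant_occurrence "
           "WHERE hgnc_symbol = ?")
    params = [gene]
    if hgvs_p is not None:
        sql += " AND hgvs_p = ?"
        params.append(hgvs_p)
    (carriers,) = conn.execute(sql, params).fetchone()
    (total,) = conn.execute("SELECT COUNT(*) FROM person").fetchone()
    return carriers / total if total else 0.0


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE person (person_id INTEGER PRIMARY KEY);
        CREATE TABLE variant_occurrence (
            person_id INTEGER, hgnc_symbol TEXT, hgvs_p TEXT);
        INSERT INTO person VALUES (1), (2), (3), (4);
        INSERT INTO variant_occurrence VALUES
            (1, 'EGFR', 'p.L858R'), (2, 'EGFR', 'p.L858R'),
            (2, 'EGFR', 'p.T790M'), (3, 'KRAS', 'p.G12C');
    """)
    print(patient_proportion(conn, "EGFR"))             # 0.5
    print(patient_proportion(conn, "EGFR", "p.T790M"))  # 0.25
```

Counting DISTINCT person_id keeps patient 2, who carries two EGFR variants, from being counted twice.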
Figure 3. Waterfall plots describing the variant profiles of the top 10 genes in the (a) Ajou University School of Medicine and (b) The Cancer Genome Atlas databases. Each row represents a gene, with rows ordered by variant frequency and colors indicating variant types. Each column represents a patient, with only one sample per patient. The bar graph on the left shows the frequency of variants in each gene. Clinical groups such as age, sex, and condition are shown in the bottom box. LUAD: lung adenocarcinoma; LUSC: lung squamous cell carcinoma.
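The row and column ordering of a waterfall plot like Figure 3 can be sketched in two steps. First, rank genes by how many patients carry a variant, which is one reading of "frequency". Second, sort patients by which of the ranked genes they carry. This is a common oncoprint-style heuristic offered as an assumption, not necessarily the method used by the authors' tool. The (person_id, gene, variant_type) input format and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical input rows: (person_id, gene_symbol, variant_type).
Variant = Tuple[str, str, str]


def top_genes(variants: List[Variant], n: int = 10) -> List[Tuple[str, int]]:
    """Rank genes by the number of distinct patients carrying any variant."""
    carriers: Dict[str, Set[str]] = defaultdict(set)
    for person, gene, _variant_type in variants:
        carriers[gene].add(person)
    # Most frequently mutated first; ties broken alphabetically for stability.
    ranked = sorted(carriers.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(gene, len(people)) for gene, people in ranked[:n]]


def order_patients(variants: List[Variant], genes: List[str]) -> List[str]:
    """Order columns so patients mutated in higher-ranked genes come first."""
    mutated: Dict[str, Set[str]] = defaultdict(set)
    for person, gene, _variant_type in variants:
        mutated[person].add(gene)

    def sort_key(person: str) -> Tuple[Tuple[int, ...], str]:
        # 0 = mutated, 1 = not mutated, read gene by gene in rank order,
        # so ascending order puts carriers of the top gene first.
        pattern = tuple(0 if g in mutated[person] else 1 for g in genes)
        return pattern, person

    return sorted(mutated, key=sort_key)


if __name__ == "__main__":
    rows = [("P1", "TP53", "Missense"), ("P1", "EGFR", "Missense"),
            ("P2", "TP53", "Nonsense"), ("P3", "EGFR", "In_Frame_Del"),
            ("P3", "KRAS", "Missense"), ("P4", "TP53", "Splice_Site")]
    ranked = top_genes(rows)
    print(ranked)  # [('TP53', 3), ('EGFR', 2), ('KRAS', 1)]
    print(order_patients(rows, [g for g, _ in ranked]))  # ['P1', 'P2', 'P4', 'P3']
```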
Figure 4. Frequencies of actionable mutations detected by sequencing, compared between the AUSOM and TCGA databases. Frequencies are shown (a, e, i) at the level of five selected genes and for actionable mutations in (b, f, j) EGFR, (c, g, k) KRAS, and (d, h, l) other genes such as PIK3CA, BRAF, and NRAS. Frequencies are also shown by patient group: (a-d) total, (e-h) lung adenocarcinoma, and (i-l) lung squamous cell carcinoma. AUSOM: Ajou University School of Medicine; TCGA: The Cancer Genome Atlas; LUAD: lung adenocarcinoma; LUSC: lung squamous cell carcinoma.
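Each panel in Figure 4 is a proportion computed over a patient subset: all patients, LUAD only, or LUSC only. A minimal Python sketch follows. It assumes hypothetical inputs: a person-to-histology map and (person_id, gene, protein_change) rows. It also uses an illustrative three-entry actionable list that is not the curated set used in the study.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Illustrative subset only; NOT the actionable-mutation list used in the study.
ACTIONABLE: FrozenSet[Tuple[str, str]] = frozenset({
    ("EGFR", "p.L858R"),
    ("EGFR", "p.T790M"),
    ("KRAS", "p.G12C"),
})


def actionable_frequency(histology: Dict[str, str],
                         variants: List[Tuple[str, str, str]],
                         gene: str,
                         group: Optional[str] = None) -> float:
    """Share of patients (optionally within one histology group) carrying at
    least one listed actionable mutation in `gene`."""
    cohort = {p for p, h in histology.items() if group is None or h == group}
    if not cohort:
        return 0.0
    carriers = {
        person for person, g, change in variants
        if person in cohort and g == gene and (g, change) in ACTIONABLE
    }
    return len(carriers) / len(cohort)


if __name__ == "__main__":
    histology = {"P1": "LUAD", "P2": "LUAD", "P3": "LUSC", "P4": "LUSC"}
    rows = [("P1", "EGFR", "p.L858R"), ("P2", "KRAS", "p.G12C"),
            ("P3", "KRAS", "p.G12D"), ("P4", "EGFR", "p.T790M")]
    for group in (None, "LUAD", "LUSC"):  # total, then each histology group
        print(group or "total", actionable_frequency(histology, rows, "KRAS", group))
```

Here KRAS p.G12D in patient P3 is deliberately left off the illustrative list. This shows that only listed alterations count toward the actionable frequency.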