Hyun Ae Jung¹, Oksoon Jeong², Dong Kyung Chang²,³, Sehhoon Park¹, Jong-Mu Sun¹, Se-Hoon Lee¹, Jin Seok Ahn¹, Myung-Ju Ahn¹, Keunchil Park¹.
Abstract
BACKGROUND: The American Society of Clinical Oncology recently launched the minimal Common Oncology Data Elements (mCODE) project to facilitate cancer data interoperability. However, clinical data are often not recorded in an organized way, and converting them into a structured format can be time-consuming. A clinical data warehouse (CDW) is a database that consolidates data from different clinical sources. However, the data extracted from a CDW include not only structured data but also natural language text generated during clinical practice. Applying these data to a clinical study is therefore challenging, because they are unstructured and not formatted in a way that allows essential content to be found. This study determined how best to organize a large volume of clinical data to evaluate the clinical features and outcomes of upper aerodigestive tract cancers, including cancers of the head and neck, esophagus, lung, and thymus, as well as mesothelioma.
Keywords: Medical big data; automatically updated; cancer; cohort; outcomes
Year: 2021 PMID: 34858777 PMCID: PMC8577969 DOI: 10.21037/tlcr-21-531
Source DB: PubMed Journal: Transl Lung Cancer Res ISSN: 2218-6751
Figure 1. Real-time automatically updated data warehouse in health care (ROOT).
Detailed key elements in the six main areas depicting the cancer journeys of patients
| Area | Key elements |
|---|---|
| Patients | Age*; sex*; performance status*; smoking history*; family history#; co-morbidity (HTN, DM, hepatitis, Tbc, cardiovascular disease, cerebral disease, thyroid disorder)# |
| Disease | Histology*; TNM stage (c, yp, p)*; location of primary tumor#; metastatic site (brain*, bone&, lung&, liver&, pleura&, leptomeningeal seeding&) |
| Genomics* | |
| Labs# | Tumor marker; CBC; chemistry; electrolyte; LD; CRP |
| Treatment* | Surgery (aim: curative); radiotherapy (aim: curative); stereotactic radiosurgery; chemotherapy (adjuvant/neoadjuvant/definitive/palliative/salvage treatment); clinical trial |
| Outcomes | RFS*; PFS*; TTNT*; OS*; RR*; side effects (clinical symptoms, labs, radiation pneumonitis, drug-induced pneumonitis)& |
*: level I; &: level II; #: level III. HTN, hypertension; DM, diabetes; Tbc, tuberculosis; EGFR, epidermal growth factor receptor; ALK, anaplastic lymphoma kinase; IHC, immunohistochemistry; FISH, fluorescent in situ hybridization; PD-L1, programmed death-ligand 1; NGS, next-generation sequencing; VATS, video-assisted thoracoscopy; RFS, relapse-free survival; PFS, progression-free survival; TTNT, time to next treatment; OS, overall survival; RR, response rate.
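The outcome elements above (OS, PFS) are time-to-event measures, so a warehouse that stores event dates can derive them automatically. The following is a minimal sketch, not ROOT's implementation: it assumes the time origin is the date of first treatment, that a month is 365.25/12 days, and that patients without an event are censored at their last follow-up date. All function names are hypothetical.

```python
from datetime import date
from typing import Optional, Tuple

# Hypothetical sketch: OS and PFS derived from event dates.
# Assumptions (not from the source): time origin = date of first treatment,
# one month = 365.25 / 12 days, and patients without an event are censored
# at their last follow-up date.
DAYS_PER_MONTH = 365.25 / 12


def months_between(start: date, end: date) -> float:
    """Elapsed time in months between two dates."""
    if end < start:
        raise ValueError("end date precedes start date")
    return (end - start).days / DAYS_PER_MONTH


def overall_survival(
    first_treatment: date, death: Optional[date], last_follow_up: date
) -> Tuple[float, bool]:
    """Return (months, event_observed); the event is death from any cause."""
    if death is not None:
        return months_between(first_treatment, death), True
    return months_between(first_treatment, last_follow_up), False


def progression_free_survival(
    first_treatment: date,
    progression: Optional[date],
    death: Optional[date],
    last_follow_up: date,
) -> Tuple[float, bool]:
    """Return (months, event_observed); the event is progression or death, whichever comes first."""
    events = [d for d in (progression, death) if d is not None]
    if events:
        return months_between(first_treatment, min(events)), True
    return months_between(first_treatment, last_follow_up), False


if __name__ == "__main__":
    start = date(2020, 1, 1)
    print(overall_survival(start, None, date(2021, 1, 1)))
    print(progression_free_survival(start, date(2020, 7, 1), None, date(2021, 1, 1)))
```

Returning an (time, event) pair rather than a bare duration keeps censored and observed cases distinguishable, which survival analyses such as Kaplan-Meier estimation require.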
Figure 2. Data flow from various references and sources to ROOT.
Figure 3. Data quality management process in ROOT.
Characteristics of data for each key element.
| Clearly defined dataᵃ | Structured dataᵇ | Unstructured dataᶜ |
|---|---|---|
| Age | ECOG (EMR: FB) | ECOG (EMR: text) |
| Sex | Smoking (EMR: FB) | Smoking (EMR: text) |
| Dates (birthday, date of first diagnosis, date of first treatment, surgery, death) | Family history (EMR: FB) | Family history (EMR: text) |
| Co-morbidity (diagnostic codes) | Co-morbidity (EMR: FB) | Co-morbidity (EMR: text or medication) |
| Blood test | | |
| Pathology result | Pathology result (out-of-hospital) | |
| Mutation test (EGFR/ALK/PD-L1 in-house setting) | Mutation test (NGS/clinical trial, other hospital) | |
| TNM staging (EMR: FB) | TNM staging (image, EMR: text) | |
| Type of surgery/radiation dosage, fraction, location/chemotherapy regimen | Image result (brain metastasis, LMS, metastasis site) | |
a, data that can be used as-is once automatically extracted from the EMR; b, results that are clear but need clarification and modification; c, data synthesized from results extracted by various methods, such as natural language processing. ECOG, Eastern Cooperative Oncology Group (performance status); EMR, electronic medical record; FB, fill in the blank; LMS, leptomeningeal seeding.
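The table pairs several elements recorded in fill-in-the-blank (FB) fields with their free-text counterparts, for example ECOG (EMR: FB) versus ECOG (EMR: text). The sketch below shows one way such a fallback might work: prefer the FB value and, only when it is missing, search the free-text note. The regular expression is a deliberately simple stand-in for the natural language processing mentioned in footnote c, not the authors' method; the pattern and function names are hypothetical, and real notes would need a validated extraction pipeline.

```python
import re
from typing import Optional

# Hypothetical pattern (a stand-in for real NLP): matches forms such as
# "ECOG 1", "ECOG PS: 2", "ecog=0" or "PS 3" and captures a score from 0 to 4.
ECOG_PATTERN = re.compile(r"\b(?:ECOG(?:\s*PS)?|PS)\s*[:=]?\s*([0-4])\b", re.IGNORECASE)


def ecog_from_text(note: str) -> Optional[int]:
    """Return the first ECOG score (0-4) mentioned in a free-text note, or None."""
    match = ECOG_PATTERN.search(note)
    return int(match.group(1)) if match else None


def resolve_ecog(fb_value: Optional[int], note: str) -> Optional[int]:
    """Prefer the fill-in-the-blank (FB) field; fall back to the free-text note."""
    # Compare against None explicitly: ECOG 0 (fully active) is a valid value.
    if fb_value is not None:
        return fb_value
    return ecog_from_text(note)


if __name__ == "__main__":
    print(resolve_ecog(None, "Pt stable. ECOG PS: 1, no pain."))
    print(resolve_ecog(0, "ECOG 2 at last visit"))
```

Checking `is not None` rather than truthiness matters here, because a recorded ECOG score of 0 would otherwise be discarded in favor of the free text.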
Figure 4. Actual screen of ROOT in the CDW program.