Najia Ahmadi, Yuan Peng, Markus Wolfien, Michéle Zoch, Martin Sedlmayr.
Abstract
The current generation of sequencing technologies has led to significant advances in identifying novel disease-associated mutations and has generated large amounts of data in a high-throughput manner. Such data, in conjunction with routine clinical data, have proven highly useful for deriving population-level and patient-level predictions, especially in the field of cancer precision medicine. However, harmonizing data across multiple national and international clinical sites, an essential step for assessing patient-associated events and outcomes, is currently not adequately addressed. The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) is an internationally established research data repository introduced by the Observational Health Data Science and Informatics (OHDSI) community to overcome this issue. To address the needs of cancer research, the genomic vocabulary extension was introduced in 2020 to support the standardization of subsequent data analysis. In this review, we evaluate the current potential of the OMOP CDM for cancer prediction and how comprehensively its genomic vocabulary extension can serve the current needs of AI-based predictions. For this, we systematically screened the literature for articles that use the OMOP CDM in predictive analyses in cancer and investigated the underlying predictive models and tools. Interestingly, we found 248 articles, most of which use the OMOP CDM to harmonize their data, but only 5 apply predictive algorithms to OMOP-based data and fulfill our criteria. These studies present multicentric investigations in which the OMOP CDM played an essential role in discovering and optimizing machine learning (ML)-based models.
Ultimately, the use of the OMOP CDM leads to standardized data-driven studies across multiple clinical sites and provides a more solid basis for utilizing, e.g., ML models that can be reused and combined for early prediction, diagnosis, and improvement of personalized cancer care and biomarker discovery.
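The harmonization step described in the abstract can be sketched in a few lines. The following minimal Python example is an illustration under stated assumptions, not OHDSI tooling: it maps site-specific source codes (ICD-9-CM at one hypothetical site, ICD-10-CM at another) onto a shared standard concept and emits rows shaped like a subset of the OMOP CDM `condition_occurrence` table. The mapping dictionary `SOURCE_TO_STANDARD`, the function `harmonize`, and the concept ID `1000001` are hypothetical placeholders; in a real deployment the mapping comes from the OHDSI standardized vocabularies (the `concept` and `concept_relationship` tables, via "Maps to" relationships).

```python
from dataclasses import dataclass
from datetime import date

# HYPOTHETICAL mapping from (vocabulary, source code) to a standard concept ID.
# Real mappings come from the OHDSI vocabularies ("Maps to" relationships);
# 1000001 is a placeholder, not an actual OMOP concept ID.
SOURCE_TO_STANDARD = {
    ("ICD9CM", "153.9"): 1000001,   # colon cancer as coded at hypothetical site A
    ("ICD10CM", "C18.9"): 1000001,  # colon cancer as coded at hypothetical site B
}


@dataclass
class ConditionOccurrence:
    """A subset of the OMOP CDM condition_occurrence columns."""
    person_id: int
    condition_concept_id: int      # standard concept shared across sites
    condition_start_date: date
    condition_source_value: str    # original code as recorded at the site


def harmonize(person_id: int, vocabulary: str, code: str, start: date) -> ConditionOccurrence:
    """Hypothetical helper: map one source-coded diagnosis to a CDM-shaped row."""
    # Unmapped codes get concept ID 0, the OMOP convention for "No matching concept".
    concept_id = SOURCE_TO_STANDARD.get((vocabulary, code), 0)
    return ConditionOccurrence(person_id, concept_id, start, code)


site_a = harmonize(1, "ICD9CM", "153.9", date(2014, 3, 2))
site_b = harmonize(2, "ICD10CM", "C18.9", date(2019, 7, 15))
```

Because both sites' colon cancer codes resolve to the same standard concept, a downstream query or predictive model can treat the two records identically, while `condition_source_value` keeps the original code for traceability.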
Keywords: EHR; OHDSI; OMOP CDM; PLP; machine learning; prediction
Year: 2022 PMID: 36233137 PMCID: PMC9569469 DOI: 10.3390/ijms231911834
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 6.208
Figure 1. PRISMA flow chart showing the paper selection process.
An overview of the dataset size and features used in the articles, vocabularies used to transform them into OMOP CDM format, and predictive models used to analyze the data.
| Article | Dataset Size | Features | Vocabularies | Predictive Models |
|---|---|---|---|---|
| Felmeister et al. 2017 [ | 1000 | patient, condition, observation, drug exposure and demographics (gender, race, date of birth, etc.) | ICD-9-CM, ICD-10-CM, SNOMED-CT | LR, LDA, KNN, CART, NB, and SVM |
| Meystre et al. 2019 [ | 229 | patient identifier, gender, date of birth, height, weight, diagnostic code, procedure code, and clinical notes | LOINC, SNOMED-CT | NLP and SVM |
| Seneviratne et al. 2018 [ | 5861 | conditions, procedures, medications, observations, and laboratory values | ICD-9 and ICD-10 | LASSO, RF, GBM, and XGB |
| Tsopra et al. 2021 [ | - | - | ICD-10, LOINC, and SNOMED-CT | - |
| Lee et al. 2021 [ | 207,794 | age group, medical history: general (e.g., dementia), cardiovascular disease (e.g., atrial fibrillation), and neoplasms (e.g., malignant neoplasm of anorectum) | - | Cox regression |
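Most models in the table above are trained on person-level covariates derived from OMOP tables. As a rough, hypothetical illustration of that pattern (and explicitly not the API of the OHDSI PatientLevelPrediction package), the Python sketch below turns each person's set of observed concept IDs into a binary covariate row and fits an L2-penalized (ridge) logistic regression by plain gradient descent. The concept IDs 101, 202, and 303, the toy cohort `PERSONS`, and all function names are invented for illustration; other models from the table, such as LASSO or gradient boosting, would replace only the fitting step.

```python
import math

# HYPOTHETICAL toy cohort: each person's set of (placeholder) standard concept IDs
# observed before the index date, paired with a binary outcome label.
PERSONS = [
    ({101, 202}, 1),
    ({101}, 1),
    ({202, 303}, 0),
    ({303}, 0),
]


def build_covariates(persons):
    """Turn concept sets into a binary covariate matrix, one column per concept."""
    vocab = sorted({c for concepts, _ in persons for c in concepts})
    X = [[1.0 if c in concepts else 0.0 for c in vocab] for concepts, _ in persons]
    y = [float(label) for _, label in persons]
    return vocab, X, y


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000, l2=0.01):
    """Batch gradient descent on the mean log-loss plus an L2 penalty on the weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The intercept b is left unpenalized, as is conventional.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted outcome risk for one covariate row."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


vocab, X, y = build_covariates(PERSONS)
w, b = fit_logistic(X, y)
risks = [predict_proba(w, b, x) for x in X]
```

In this toy cohort, concept 101 appears only in outcome-positive persons and receives a positive weight, concept 303 appears only in outcome-negative persons and receives a negative weight, and concept 202, which occurs once in each group, stays near zero.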
Inclusion and exclusion criteria for the title and abstract screening and full-text screening.
| Screening Round | Inclusion | Exclusion |
|---|---|---|
| Title and abstract screening | The article is primary research in a peer-reviewed journal or conference. | The article is of any other type, for instance, study protocols, commentaries and editorials, tutorials, project reports, medical case studies, and master's and doctoral theses. |
| | The article is written in English. | The article is written in a language other than English. |
| | The title or abstract mentions analysis of cancer data. | The title or abstract does not mention analysis of cancer data. |
| | The title or abstract mentions OMOP or OHDSI. | The title or abstract does not mention OMOP or OHDSI. |
| Full-text screening | The article allows open access to the full text. | The article does not allow open access to the full text. |
| | The article defines a predictive approach for cancer medicine. | The article defines a predictive approach, but for a domain other than cancer medicine. |
| | The predictive approach in the article uses the OMOP CDM as the data model. | The predictive approach in the article does not use the OMOP CDM as the data model. |
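The two screening rounds in the table above amount to a pair of conjunctive filters applied in sequence: an article advances only if it satisfies every inclusion criterion of a round. The Python sketch below encodes them as predicates over a hypothetical `Article` record. The field names, the keyword matching on "cancer", "omop", and "ohdsi", and the sample records are assumptions for illustration only; keyword matching is a crude stand-in for actually reading a title and abstract.

```python
from dataclasses import dataclass


@dataclass
class Article:
    # HYPOTHETICAL screening record; field names are illustrative assumptions.
    title_abstract: str
    language: str
    is_primary_peer_reviewed: bool
    open_access: bool = False
    predicts_in_cancer: bool = False  # defines a predictive approach for cancer medicine
    uses_omop_cdm: bool = False       # the predictive approach uses the OMOP CDM


def passes_title_abstract(a: Article) -> bool:
    """Round 1: every title/abstract inclusion criterion must hold."""
    text = a.title_abstract.lower()
    return (
        a.is_primary_peer_reviewed
        and a.language == "English"
        and "cancer" in text                      # crude proxy for "mentions analysis of cancer data"
        and ("omop" in text or "ohdsi" in text)
    )


def passes_full_text(a: Article) -> bool:
    """Round 2: open access, a cancer prediction approach, and the OMOP CDM as data model."""
    return a.open_access and a.predicts_in_cancer and a.uses_omop_cdm


def screen(articles):
    """Apply the rounds in order; round 2 only sees articles that passed round 1."""
    round1 = [a for a in articles if passes_title_abstract(a)]
    round2 = [a for a in round1 if passes_full_text(a)]
    return round1, round2


# Invented example records, not articles from the review.
articles = [
    Article("Predicting cancer outcomes on OMOP CDM data", "English", True,
            open_access=True, predicts_in_cancer=True, uses_omop_cdm=True),
    Article("Harmonizing a cancer registry to OMOP", "English", True,
            open_access=True, uses_omop_cdm=True),
    Article("An OHDSI tutorial for cancer researchers", "English", False),
    Article("Heart failure prediction with OMOP", "English", True,
            open_access=True, uses_omop_cdm=True),
]
round1, round2 = screen(articles)
```

Here the registry-harmonization record passes the title and abstract round but is dropped at full-text screening because it defines no predictive approach, mirroring how most of the screened articles used the OMOP CDM only for harmonization.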