Davide Cirillo, Iker Núñez-Carpintero, Alfonso Valencia.
Abstract
From genome-scale experimental studies to imaging data, behavioral footprints, and longitudinal healthcare records, the convergence of big data in cancer research and advances in Artificial Intelligence (AI) is paving the way toward a systems view of cancer. Nevertheless, this biomedical area is largely characterized by the co-existence of big data and small data resources, highlighting the need for a deeper investigation of the crosstalk between different levels of data granularity, including varied sample sizes, labels, data types, and other data descriptors. This review introduces the current challenges, limitations, and solutions of AI in the heterogeneous landscape of data granularity in cancer research. Such a variety of cancer molecular and clinical data calls for advancing the interoperability among AI approaches, with particular emphasis on the synergy between discriminative and generative models, which we discuss in this work with several examples of techniques and applications.
Keywords: artificial intelligence; cancer research; data granularity; machine learning
Year: 2021; PMID: 33533192; PMCID: PMC8024732; DOI: 10.1002/1878-0261.12920
Source DB: PubMed; Journal: Mol Oncol; ISSN: 1574-7891; Impact factor: 6.603
Fig. 1. The interplay between data generated with different levels of granularity and the multiplicity of AI approaches in cancer research.
Fig. 2. Demographic features of the individuals represented in TCGA and CEDCD projects. (A) Average number of individuals per cancer type in TCGA disaggregated by sex; (B) average number of individuals per cancer type in TCGA disaggregated by race and sex; (C) average number of individuals per cancer type in CEDCD cohort studies disaggregated by sex; and (D) average number of individuals per CEDCD cohort study disaggregated by race and sex.
Fig. 3. Synergy of AI solutions for cancer research in the data continuum. Depending on label availability in large and small datasets (e.g., over‐ and under‐represented cancer subgroups), several learning approaches (supervised, semi‐supervised, unsupervised, transfer learning) can be applied to create both generative and discriminative models. While discriminative models can be used to identify smaller subsets from the totality of big data (represented as a small dashed rectangle in the upper left corner), generative models can be used for data augmentation by producing large volumes of synthetic instances (represented as a large dashed rectangle in the upper right corner).
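The complementarity described in Fig. 3 can be illustrated with a toy sketch. Nothing below is taken from the review: the two-feature Gaussian data, the group sizes, and the helper names (`fit_gaussian`, `sample_synthetic`, `train_logistic`, `predict`) are assumptions chosen for illustration. An independent-Gaussian model stands in for the generative side, and a logistic regression stands in for the discriminative side; real cancer datasets would call for far richer models of both kinds.

```python
import math
import random
import statistics


def fit_gaussian(rows):
    """Generative step: estimate a mean and standard deviation per feature."""
    columns = list(zip(*rows))
    return [(statistics.mean(c), statistics.stdev(c)) for c in columns]


def sample_synthetic(params, n, rng):
    """Draw n synthetic instances from the fitted independent Gaussians."""
    return [[rng.gauss(mu, sd) for mu, sd in params] for _ in range(n)]


def train_logistic(rows, labels, epochs=50, lr=0.1):
    """Discriminative step: logistic regression fitted by stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            z = bias + sum(w * v for w, v in zip(weights, x))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
            err = 1.0 / (1.0 + math.exp(-z)) - y
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(model, x):
    """Return 1 if x is assigned to the (augmented) small subgroup, else 0."""
    weights, bias = model
    return 1 if bias + sum(w * v for w, v in zip(weights, x)) > 0 else 0


# Hypothetical toy data: a large group and an under-represented subgroup.
rng = random.Random(0)
large = [[rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)] for _ in range(200)]
small = [[rng.gauss(3.0, 0.5), rng.gauss(3.0, 0.5)] for _ in range(8)]

# Augment the small subgroup with synthetic instances until the classes balance.
synthetic = sample_synthetic(fit_gaussian(small), len(large) - len(small), rng)

rows = large + small + synthetic
labels = [0] * len(large) + [1] * (len(small) + len(synthetic))
model = train_logistic(rows, labels)
```

Once trained, `predict(model, x)` flags which instances resemble the small subgroup. This mirrors the caption's two roles: the generative model enlarges the under-represented group, and the discriminative model picks that subset out of the whole.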