Mario Zanfardino, Monica Franzese, Katia Pane, Carlo Cavaliere, Serena Monti, Giuseppina Esposito, Marco Salvatore, Marco Aiello
Abstract
Genomic and radiomic data integration, namely radiogenomics, can provide meaningful knowledge in cancer diagnosis, prognosis and treatment. Although several data structures based on multi-layer architectures have been proposed to combine multi-omic biological information, none of them has been designed and assessed to include radiomic data as well. To meet this need, we propose to use MultiAssayExperiment (MAE), an R package that provides data structures and methods for manipulating and integrating multi-assay experiments, as a suitable tool to manage radiogenomic experiment data. To this aim, we first examine the role of radiogenomics in cancer phenotype definition and the current state of radiogenomic data integration in public repositories. We then discuss the challenges and limitations of including radiomics in MAE, design an extended framework and show its application on a case study from the TCGA-TCIA archives. Radiomic and genomic data from 91 patients were successfully integrated into a single MAE object, demonstrating the suitability of the MAE data structure as a container of radiogenomic data.
Keywords: Cancer; MultiAssayExperiment; Radiogenomics; Radiomics; TCGA; TCIA
Year: 2019 PMID: 31590671 PMCID: PMC6778975 DOI: 10.1186/s12967-019-2073-2
Source DB: PubMed Journal: J Transl Med ISSN: 1479-5876 Impact factor: 5.531
Fig. 1 Radiomics workflow. Radiomic features can be calculated from one or more imaging modalities, e.g. computed tomography (CT), magnetic resonance (MR) and positron emission tomography (PET), for each acquired time point. Regions of interest (ROIs) are first segmented on the acquired multi-parametric images, e.g. a T2-weighted MR image, a contrast-enhanced T1-weighted MR image and an FDG PET image, shown from left to right in the figure for a breast lesion. Radiomic features are then estimated for each segmented ROI, each patient in the study and each acquired image, yielding hundreds of features that can be categorized as shape, first-order, second-order and higher-order features
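As a rough illustration of the last step of the workflow, the sketch below computes a handful of first-order (intensity-histogram) features over the pixels inside a segmented ROI. It is a toy in plain Python, not the feature definitions of any particular radiomics package: the function name `first_order_features` is hypothetical, the ROI is a 2D binary mask rather than a 3D volume, and steps such as resampling and grey-level discretization are deliberately omitted.

```python
import math
from typing import Dict, List


def first_order_features(image: List[List[float]], mask: List[List[int]]) -> Dict[str, float]:
    """Compute a few simplified first-order features inside a binary ROI mask.

    Illustrative definitions only; dedicated radiomics software adds
    resampling, discretization and other preprocessing steps.
    """
    # Collect the intensities of the pixels that belong to the ROI (mask != 0)
    values = [v for img_row, mask_row in zip(image, mask)
              for v, m in zip(img_row, mask_row) if m]
    if not values:
        raise ValueError("ROI mask is empty")
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n  # population variance
    std = math.sqrt(variance)
    # Population skewness; set to 0 for a perfectly homogeneous ROI
    skewness = (sum((v - mean) ** 3 for v in values) / n) / std ** 3 if std > 0 else 0.0
    energy = sum(v * v for v in values)  # sum of squared intensities
    return {"mean": mean, "variance": variance, "skewness": skewness,
            "energy": energy, "voxel_count": float(n)}


# Usage: a 3 x 3 toy image with a cross-shaped ROI
roi_image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
roi_mask = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
features = first_order_features(roi_image, roi_mask)
```

Each ROI, patient and image in the study would produce one such feature vector; in the MAE framework these vectors become the rows (features) and columns (patients) of a radiomic assay.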
Multiple cancer data type visualization and/or integration resources
| Name | Description | Data type | Software type/Programming language | Key task | Operating system | Latest update |
|---|---|---|---|---|---|---|
| Caleydo StratomeX | Tool for exploring relationships among multiple datasets | Multi-omic | Application/Java | Data visualization | Windows, Unix/Linux, Mac OS | 2018 |
| CAS-viewer | CAS-viewer (visualization of Cancer Alternative Splicing) is a dynamic interface providing integrated knowledge of alternative mRNA splicing patterns along with multi-omic cancer data from 33 TCGA cancer types | DNA methylation, miRNAs and SNPs | Web application/- | Data visualization and basic analysis | Windows, Unix/Linux, Mac OS | 2018 |
| cBio Cancer Genomic Portal | The cBioPortal for Cancer Genomics provides visualization, analysis and download of large-scale cancer genomics datasets from The Cancer Genome Atlas as well as many carefully curated published datasets | Transcriptomic, DNA methylation, CNVs, SNPs and clinical data | Web application/Python, Java, Perl, R, MATLAB | Data visualization and basic analysis | Windows, Unix/Linux, Mac OS | 2018 |
| Genboree Workbench | Genboree is a web-based platform for multi-omic research and data analysis using the latest bioinformatics tools | Transcriptomic and epigenomic data | Web application/- | Data visualization and basic analysis | Windows, Unix/Linux, Mac OS | 2014 |
| MARIO | MArkov Random fields to Integrate Omics variables (MARIO) is a hierarchical Bayesian modeling approach for the parallel, integrative analysis of data from several genomic types | Multi-omic and beyond | BUGS software/- | Data integration/analytics | Unix/Linux | 2017 |
| mixOmics | mixOmics offers exploration and integration of biological data, with multivariate statistical approaches to identify similarities between two heterogeneous datasets | Multi-omic and beyond | Bioconductor package/R | Data integration/analytics | Windows, Unix/Linux, Mac OS | 2018 |
| ModulOmics | ModulOmics identifies cancer driver pathways, or modules, by integrating DNA and RNA data from cancer patients with protein–protein interaction (PPI) networks and known regulatory connections | Multi-omic and beyond | Package/R or Python | Data integration/analytics | Unix/Linux, Mac OS | 2018 |
| Omics Integrator | Omics Integrator integrates proteomic, gene expression and/or epigenetic data using a protein–protein interaction network. It comprises two modules, Garnet and Forest | Multi-omic and beyond | Package/Python | Data integration/analytics | Unix/Linux | 2018 |
| XENA UCSC browser | It offers interactive visualization and exploration of TCGA genomic, phenotypic and clinical data, as produced by the Cancer Genome Atlas Research Network | Multi-omic | Web client/- | Data visualization | Windows, Unix/Linux, Mac OS | 2017 |
Integrated databases of oncological, neurological/neurodegenerative, cardiovascular and multiple diseases
| Name | Description | Data type | Data access | Data download | Latest update |
|---|---|---|---|---|---|
| Oncological disease | | | | | |
| CPTAC | The CPTAC Data Portal is the NCI’s largest public repository of comprehensive proteogenomic sequence datasets | MS proteomic and phosphoproteomic data and gene expression | Open (user account, open-use studies) or controlled (data use application, controlled-use studies) | Web-based, web client and programmatic | 2018 |
| GDC | The Genomic Data Commons hosts The Cancer Genome Atlas and other large cancer genomics data collections, covering 43 projects with normal controls. Patient outcomes, treatment details, pathology and expert analyses are also provided when available. Many subjects have corresponding imaging data in The Cancer Imaging Archive (TCIA) | Gene expression, DNA methylation, germline and somatic mutations, clinical data | Open (user account, open-use studies) or controlled (data use application, controlled-use studies) | Web-based, web client and programmatic | 2018 |
| ICGC | The International Cancer Genome Consortium archives a large number of datasets with molecular data from more than 20,000 donors, including the Pan-Cancer Analysis of Whole Genomes (PCAWG) study | Germline and somatic mutations, gene expression, DNA methylation | Open (user account, open-use studies) or controlled (data use application, controlled-use studies) | Web-based, web client and programmatic | 2018 |
| TCIA | The Cancer Imaging Archive collects cancer medical images available for public download. Data include 78 collections and different imaging modalities. Many subjects have corresponding genomic data in the GDC, which now hosts the TCGA data | Medical images in DICOM format, clinical data | Open (user account, open-use studies) or controlled (data use application, controlled-use studies) | Web-based, web client and programmatic | 2018 |
| Neurological and neurodegenerative disorders | | | | | |
| 1000 Functional Connectomes Project/INDI International Neuroimaging Data-sharing Initiative | It provides the broader imaging community with complete access to large-scale functional imaging datasets, both prospective and retrospective | Imaging and clinical data | NITRC account for some public datasets and some controlled datasets | Amazon Web Services S3, Cyberduck web client and command line | 2018 |
| LONI Database (The Laboratory of Neuroimaging at University of Southern California) | Repository for sharing and long-term preservation of neuroimaging and biomedical research data, especially on neurological, neurodegenerative and psychiatric diseases. Ongoing studies include ADNI, ENIGMA, GAAIN and PPMI | Clinical, imaging (MRI, PET, MRA, DTI and other imaging modalities), genetic and behavioral data from multisite longitudinal studies | Account required for open-use data; controlled access by Image and Data Archive (IDA) request or data use application | Web-based Image and Data Archive (IDA)ᵃ | 2018 |
| LRRK2 Cohort Consortium (The Michael J. Fox Foundation (MJFF) for Parkinson’s Research) | The LRRK2 Cohort Consortium (LCC) comprises three closed studies: the LRRK2 Cross-sectional Study, the LRRK2 Longitudinal Study and the 23andMe Blood Collection Study | Clinical data and biospecimens (blood, urine and cerebrospinal fluid) from PD and control volunteers | Account-controlled access | LONI (IDA) repositoryᵃ | 2018 |
| National Institute of Neurological Disorders and Stroke/The Michael J. Fox Foundation (MJFF) for Parkinson’s Research BioFIND | BioFIND is a cross-sectional clinical study designed to discover new Parkinson’s disease biomarkers | Clinical data and biospecimens (blood, urine and cerebrospinal fluid) from PD and control volunteers | Account-controlled access | LONI (IDA) repositoryᵃ | 2018 |
| The National Institute of Mental Health (NIMH)/NIMH Repository and Genomic Resources (RGR) | The NIMH Repository is an infrastructure for sharing data collected by hundreds of research projects concerning the clinical and genetic analysis of mental health disorders (e.g. schizophrenia, bipolar disorder, depression, Alzheimer’s disease, autism, obsessive–compulsive disorder). For instance, the National Database for Autism Research (NDAR) website is the primary entry point for autism research | Imaging, genetic and clinical data | NIMH account approval | Web-based and web client; Open Database License (ODbL) | 2018 |
| The National Institute of Neurological Disorders and Stroke (NINDS) | NINDS research is divided into basic, clinical and translational projects that advance the study of neurological disorders and are open to both academic and industry investigators. One dataset is the Parkinson’s Disease Biomarkers Program Data Management Resource (PDBP DMR) | Gene expression, clinical data | NINDS account approval | Web-based and web client; Open Database License (ODbL) | 2018 |
| The National Institute on Aging (NIA)/AMP-AD Knowledge Portal (Accelerating Medicines Partnership-Alzheimer’s Disease) | The AMP-AD Knowledge Portal is the NIA-designated repository for distributing data from multiple NIA-supported programs on Alzheimer’s disease | Various types of molecular data from human, cell-based and animal model biosamples | Account-controlled access | Synapse web browser and web client | 2018 |
| The National Institute on Aging Genetics of Alzheimer’s Disease Data Storage Site (NIAGADS) | The NIAGADS provides access to publicly available summary statistics datasets for Alzheimer’s disease and related neuropathologies | Multi-omic: GWAS, whole genome (WGS) and whole exome (WES) sequencing, expression, RNA-seq and ChIP-seq analyses | Open to investigators, who return secondary analysis data to the database | Web-based (NIAGADS genome browser) and web client; Open Database License (ODbL) | 2018 |
| Cardiovascular disease | | | | | |
| Cardiac Atlas Project | Multi-center cardiac MRI datasets with robust manual contours defined by the consensus of 7 independent expert readers from 7 world-class core labs. The datasets relate to 6 different studies | Imaging (MRI data) and clinical data | Controlled; CAP data access request | Web client | 2018 |
| National Heart, Lung, and Blood Institute (BioLINCC) | NHLBI is the NIH institute devoted to research, training and education on heart, lung, blood and sleep disorders. It provides teaching datasets and public use datasets | Clinical data and sometimes corresponding biospecimens | Open and controlled data on request | Web-based user interface (BioLINCC) | 2018 |
| The Cardiovascular Research Grid (CVRG) | The CardioVascular Research Grid (CVRG) project is supported by the National Heart, Lung, and Blood Institute to create an infrastructure for sharing cardiovascular data and data analysis tools | Imaging (ex vivo DWI and in vivo heart CT) and clinical data | Open/controlled | Web-based | 2018 |
| The Qatar Cardiovascular Biorepository (QCBio) | Cases include patients needing percutaneous intervention for symptomatic coronary heart disease (CHD) or admitted with an acute coronary syndrome (myocardial infarction or unstable angina). Controls are individuals identified from the Hamad Medical Corp. blood bank who have no history of CHD. The goal of QCBio is to archive plasma and DNA of 1000 Qatari patients with coronary heart disease and 1000 controls matched on age, sex and ethnicity | Biospecimens (plasma and DNA) and clinical data | Open to Qatari investigators; controlled access for others | Web-based and web client | 2018 |
| Vascular Diseases Biorepository | Biorepository for common vascular diseases, including peripheral artery disease (PAD), aortic aneurysm, carotid artery stenosis (CAD) and fibromuscular dysplasia. These samples are linked with demographic information, conventional cardiovascular risk factors and comorbidities ascertained from Mayo Clinic’s electronic health record (EHR) using EHR-based electronic phenotyping algorithms | Biospecimens (DNA, serum and plasma) and clinical data | Open/controlled | Web-based and web client | 2018 |
| Multiple diseases | | | | | |
| DAA | The Digital Ageing Atlas is a portal of age-related changes covering different biological levels. It integrates these data into an interactive portal that serves as the first centralised collection of human ageing changes and pathologies | Gene expression and proteomic, psychological and pathological age-related data | Publicly available by DAA account approval | DAA account approval for open | 2017 |
| dbGaP | The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the data and results from studies that have investigated the interaction of genotype and phenotype in humans. Over 150 NCI studies are registered in dbGaP | Genome-wide studies and clinical data | Open/controlled; NCBI account approval | Web client and programmatic | 2018 |
| EGA | The European Genome-phenome Archive collects human biomedical data across Europe. It allows authorised users to search sequenced material, patient samples stored in biobanks, and patients’ illnesses, treatments and outcomes | Imaging, gene expression, genome-wide studies and clinical data | Controlled: data use application request, then EGA account approval | Web client and programmatic | 2018 |
| GEO | Gene Expression Omnibus provides multi-level datasets (4348 in total) related to cancer and other diseases | Gene expression, genome-wide studies and clinical data | Most data are publicly available; some on request | Web client and programmatic | 2018 |
| HAGR | The Human Ageing Genomic Resources (HAGR) is a collection of databases and tools designed to help researchers study the genetics of human ageing using modern approaches such as functional genomics, network analyses, systems biology and evolutionary analyses | Gene expression and clinical data | Publicly available raw data; processed data on request | Web-based download (zip, csv files) | 2018 |
| JGA | The Japanese Genotype-phenotype Archive is a service for archiving and sharing all types of individual-level genetic and de-identified phenotypic data | Imaging, gene expression, genome-wide studies and clinical data | NBDC Human Database approval | Web client and programmatic | 2018 |
ᵃ LONI (IDA) repository of multiple projects
Fig. 2 A barcode example. An example of a The Cancer Genome Atlas (TCGA) barcode, with a focus on the Sample Type Codes table. Some identifiers, such as Vial, Portion, Analyte and Plate, are specific to biological experiments and are therefore not applicable to radiomic experiments
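Because only the patient-level part of a TCGA barcode is meaningful for imaging data, a radiogenomic pipeline has to split barcodes into their fields and map each genomic sample back to its patient. A minimal Python sketch, assuming barcodes follow the standard `Project-TSS-Participant-SampleVial-PortionAnalyte-Plate-Center` layout; the helper `parse_barcode` is hypothetical, and only a subset of the Sample Type Codes table is included.

```python
import re
from typing import Dict, Optional

# Subset of the TCGA Sample Type Codes table
SAMPLE_TYPES = {
    "01": "Primary Solid Tumor",
    "02": "Recurrent Solid Tumor",
    "03": "Primary Blood Derived Cancer - Peripheral Blood",
    "06": "Metastatic",
    "10": "Blood Derived Normal",
    "11": "Solid Tissue Normal",
}

# Project-TSS-Participant are mandatory; the sample/vial, portion/analyte,
# plate and center fields are optional because shortened barcodes are common.
BARCODE_RE = re.compile(
    r"^(?P<project>TCGA)-(?P<tss>[A-Z0-9]{2})-(?P<participant>[A-Z0-9]{4})"
    r"(?:-(?P<sample>\d{2})(?P<vial>[A-Z])?"
    r"(?:-(?P<portion>\d{2})(?P<analyte>[A-Z])"
    r"(?:-(?P<plate>[A-Z0-9]{4})(?:-(?P<center>[A-Z0-9]{2}))?)?)?)?$"
)


def parse_barcode(barcode: str) -> Dict[str, Optional[str]]:
    """Split a TCGA barcode into its fields; absent fields are None."""
    m = BARCODE_RE.match(barcode.upper())
    if m is None:
        raise ValueError(f"not a TCGA barcode: {barcode}")
    fields = m.groupdict()
    # The patient-level identifier is the only part shared with imaging data
    fields["patient_id"] = f"{fields['project']}-{fields['tss']}-{fields['participant']}"
    fields["sample_type"] = SAMPLE_TYPES.get(fields["sample"]) if fields["sample"] else None
    return fields


genomic = parse_barcode("TCGA-A1-A0SB-01A-11R-A144-07")
imaging = parse_barcode("TCGA-A1-A0SB")
```

Assuming imaging records are keyed by the three-field patient identifier, `genomic["patient_id"] == imaging["patient_id"]` is the join condition; the barcode above is illustrative only.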
Fig. 3 SummarizedExperiment object schema. In yellow: a classic use of a SummarizedExperiment object to store biological omic experiment data. Each assay contains one result of the experiment (in this case, segment mean, number of probes and Log X from a copy number alteration experiment). The rows of the SummarizedExperiment represent genes and the columns represent samples. Data describing the samples are stored in the colData object. In red: a SummarizedExperiment with magnetic resonance time points as different assays. Each assay contains the data of a single time point, and the rows represent radiomic features
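The layout in Fig. 3 can be mimicked outside R to make its key constraint explicit: every assay is a features × samples matrix sharing the same row and column names, so MR time points fit naturally as separate assays. The Python class below is a hypothetical stand-in written for illustration; it is not the Bioconductor `SummarizedExperiment` API, and the feature names and patient identifiers are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Matrix = List[List[float]]


@dataclass
class ToySummarizedExperiment:
    """Hypothetical stand-in for the SummarizedExperiment layout: all assays
    are len(row_names) x len(col_names) matrices with shared dimnames."""
    row_names: List[str]   # e.g. radiomic feature names
    col_names: List[str]   # e.g. patient identifiers
    assays: Dict[str, Matrix] = field(default_factory=dict)
    col_data: Dict[str, Dict[str, str]] = field(default_factory=dict)  # sample -> annotations

    def __post_init__(self) -> None:
        for name, mat in self.assays.items():
            self._check(name, mat)

    def _check(self, name: str, mat: Matrix) -> None:
        # Reject any assay whose shape does not match the shared dimensions
        if len(mat) != len(self.row_names) or any(len(r) != len(self.col_names) for r in mat):
            raise ValueError(f"assay {name!r} must be {len(self.row_names)} x {len(self.col_names)}")

    def add_assay(self, name: str, mat: Matrix) -> None:
        self._check(name, mat)
        self.assays[name] = mat

    def value(self, assay: str, feature: str, sample: str) -> float:
        return self.assays[assay][self.row_names.index(feature)][self.col_names.index(sample)]


# Usage: two MR time points stored as two assays of the same object
se = ToySummarizedExperiment(
    row_names=["shape_volume", "firstorder_mean"],
    col_names=["TCGA-XX-0001", "TCGA-XX-0002"],
    assays={"timepoint_1": [[10.0, 12.0], [0.5, 0.7]]},
)
se.add_assay("timepoint_2", [[9.0, 11.5], [0.4, 0.6]])
```

The shared-dimension check is exactly what makes this option restrictive: every time point must carry the same features for the same patients.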
Fig. 4 MultiAssayExperiment object schema with magnetic resonance time points as different experiments. This is the second option described for storing the temporal dimension of a radiomic experiment. Each element of the experiments list of the MultiAssayExperiment (in this case, a SummarizedExperiment) contains the data of a single time point. The radiomic features are again stored in the rows of each SummarizedExperiment
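In MultiAssayExperiment, the sampleMap links each column of each experiment to a patient in the shared colData (its columns are `assay`, `primary` and `colname`), which is what lets time points stored as separate experiments be matched to genomic assays. A hypothetical Python sketch of that idea, assuming column names are TCGA-style barcodes whose first three fields identify the patient; `build_sample_map`, `patients_in_all` and `tcga_patient` are illustrative names, not MAE functions.

```python
from typing import Callable, Dict, List, Set, Tuple

# One row per experiment column: (assay, primary, colname)
SampleMapRow = Tuple[str, str, str]


def build_sample_map(experiments: Dict[str, List[str]],
                     to_primary: Callable[[str], str]) -> List[SampleMapRow]:
    """experiments maps an experiment name to its column names;
    to_primary maps a column name to the patient (primary) identifier."""
    return [(assay, to_primary(col), col)
            for assay, cols in experiments.items() for col in cols]


def patients_in_all(sample_map: List[SampleMapRow]) -> Set[str]:
    """Patients with at least one column in every experiment."""
    per_assay: Dict[str, Set[str]] = {}
    for assay, primary, _ in sample_map:
        per_assay.setdefault(assay, set()).add(primary)
    return set.intersection(*per_assay.values()) if per_assay else set()


def tcga_patient(col: str) -> str:
    # Keep Project-TSS-Participant, the only barcode fields that apply to imaging
    return "-".join(col.split("-")[:3])


experiments = {
    "MRI_timepoint_1": ["TCGA-XX-0001", "TCGA-XX-0002"],
    "MRI_timepoint_2": ["TCGA-XX-0001"],
    "RNAseq": ["TCGA-XX-0001-01A", "TCGA-XX-0002-01A"],
}
smap = build_sample_map(experiments, tcga_patient)
complete = patients_in_all(smap)
```

Storing each time point as its own experiment means a patient missing the second scan is simply absent from that experiment, rather than forcing a placeholder column as in the single-object layout.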
Fig. 5 A generalized Venn diagram of sample membership in multiple assays. Set intersections were visualized with the UpSet matrix design using the UpSetR package
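The bars of an UpSet-style plot summarize how many samples fall into each combination of assays. The sketch below computes exclusive intersections (each sample counted once, under the exact set of assays it appears in) in plain Python; it illustrates the quantity being plotted and is not the UpSetR implementation. The assay names and sample identifiers are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def exclusive_intersections(membership: Dict[str, Set[str]]) -> Counter:
    """Map each exact combination of assays (as a frozenset) to the number
    of samples found in exactly those assays and in no other."""
    assays_per_sample: Dict[str, Set[str]] = {}
    for assay, samples in membership.items():
        for sample in samples:
            assays_per_sample.setdefault(sample, set()).add(assay)
    return Counter(frozenset(a) for a in assays_per_sample.values())


membership = {
    "RNAseq": {"P1", "P2", "P3", "P5"},
    "CNV": {"P1", "P2", "P5"},
    "MRI": {"P1", "P4"},
}
counts = exclusive_intersections(membership)
for combo, n in counts.most_common():
    print(sorted(combo), n)
```

Because every sample lands in exactly one combination, the counts always sum to the number of distinct samples, which makes the diagram a quick check of how many patients have complete radiogenomic data.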
Fig. 6 Architecture of the modular integration platform. The proposed architecture comprises three separate modules. The first module handles data loading, either by uploading a MultiAssayExperiment or by constructing one from multiple SummarizedExperiment objects or matrix-like data. The second module performs data selection (by clinical data, such as pathological stage or histological type of cancer; by experiment/assay; and by features). The selected data are then passed as input to separate and/or integrated data analysis modules. This modular architecture simplifies the expansion and redesign of any single implementation and makes it easy to add a custom module for data preparation and/or analysis for specific tasks. Moreover, all modules may provide data visualization to support the different operations (see an example of data visualization in Fig. 7)
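The selection module can be thought of as three successive filters: patients by clinical attributes, then experiments, then features. A minimal Python sketch over plain dictionaries, assuming each experiment is stored as feature → {patient: value}; the function `select`, the clinical keys (`stage`, `histology`) and all values are hypothetical and only stand in for the MAE subsetting operations rather than reproducing them.

```python
from typing import Dict, List, Optional

Clinical = Dict[str, Dict[str, str]]       # patient -> clinical annotations
Experiment = Dict[str, Dict[str, float]]   # feature -> {patient: value}


def select(clinical: Clinical,
           experiments: Dict[str, Experiment],
           where: Dict[str, str],
           keep_experiments: List[str],
           keep_features: Optional[List[str]] = None) -> Dict[str, Experiment]:
    # 1) Patients whose clinical annotations match every criterion in `where`
    patients = {p for p, ann in clinical.items()
                if all(ann.get(k) == v for k, v in where.items())}
    selected: Dict[str, Experiment] = {}
    # 2) Keep only the requested experiments
    for name in keep_experiments:
        exp = experiments[name]
        features = keep_features if keep_features is not None else list(exp)
        # 3) Keep the requested features that exist, restricted to selected patients
        selected[name] = {f: {p: x for p, x in exp[f].items() if p in patients}
                          for f in features if f in exp}
    return selected


clinical = {
    "P1": {"stage": "II", "histology": "IDC"},
    "P2": {"stage": "III", "histology": "IDC"},
    "P3": {"stage": "II", "histology": "ILC"},
}
experiments = {
    "MRI_timepoint_1": {"volume": {"P1": 10.0, "P2": 12.0, "P3": 8.0},
                        "mean": {"P1": 0.5, "P2": 0.7, "P3": 0.4}},
    "RNAseq": {"ESR1": {"P1": 5.0, "P2": 1.0}},
}
subset = select(clinical, experiments, where={"stage": "II"},
                keep_experiments=["MRI_timepoint_1"], keep_features=["volume"])
```

Keeping the filters as separate steps mirrors the modular design: an analysis module only ever receives the already-subset structure.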
Fig. 7 A screenshot of the summary tab of the graphical interface prototype. The summary tab shows the MAE data of the described case study. The top table lists the names of all MAE experiments and, for each, reports the assays (timepoint_1 and timepoint_2 in the case of BRCA_T1_weighted_DCE_MRI) and the sample types. For each sample type, the number of patients is specified. The number of features and patients for each experiment is also shown as a histogram (for a simpler graphical representation, the number of features was limited to 36 for all experiments)