Mohammad Reza Karimi, Amir Hossein Karimi, Shamsozoha Abolmaali, Mehdi Sadeghi, Ulf Schmitz.
Abstract
It is becoming evident that holistic perspectives toward cancer are crucial in deciphering the overwhelming complexity of tumors. Single-layer analysis of genome-wide data has greatly contributed to our understanding of cellular systems and their perturbations. However, fundamental gaps in our knowledge persist and hamper the design of effective interventions. It is becoming more apparent than ever that cancer should be viewed not only as a disease of the genome but also as a disease of the cellular system. Integrative multilayer approaches are emerging as powerful assets in our endeavors to achieve systemic views on cancer biology. Herein, we provide a comprehensive review of the approaches, methods and technologies that can serve to achieve systemic perspectives of cancer. We start with genome-wide single-layer approaches of omics analyses of cellular systems and move on to multilayer integrative approaches, for which in-depth descriptions of proteogenomics and network-based data analysis are provided. Proteogenomics is a remarkable example of how the integration of multiple levels of information can reduce our blind spots and increase the accuracy and reliability of our interpretations. Network-based data analysis is a major approach for data interpretation and a robust scaffold for data integration and modeling. Overall, this review aims to increase cross-field awareness of the approaches and challenges regarding the omics-based study of cancer and to facilitate the necessary shift toward holistic approaches.
Keywords: biological networks; metabolomics; proteogenomics; proteomics; systems biology; transcriptomics
Year: 2022 PMID: 34471925 PMCID: PMC8769701 DOI: 10.1093/bib/bbab343
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Figure 1
A timeline of some of the major contributions to the field of systems biology.
Figure 2
General workflows for different omics studies. The wet lab and computational procedures are distinguished by different background colors.
A list of tools dedicated to single-cell RNA-seq data manipulation and analysis
| Name | Implementation | Description | Reference |
|---|---|---|---|
| | Web-based | A comprehensive and user-friendly tool that supports quality control, normalization, batch-effect correction, cell type identification, DGE analysis and visualization | [ |
| | R | An algorithm that performs gene expression quantification and differential analysis | [ |
| | MATLAB | A tool that performs the imputation of the dropout events in the expression matrix | [ |
| | R | A tool that identifies and removes doublet events using gene expression data | [ |
| | R | An algorithm that sequentially imputes the dropout events | [ |
| | R | An algorithm that accounts for batch effect noise through detection of mutual nearest neighbors | [ |
| | R | A tool for quantification of gene expression in single-cell RNA-seq studies that incorporated unique molecular identifiers | [ |
| | R | A comprehensive and highly powerful toolkit designed for single-cell data manipulation and integration | [ |
| | R | A comprehensive R package capable of performing gene expression quantification, quality control, normalization and visualization | [ |
| | R | A Bayesian approach for DGE analysis | [ |
| | R | An algorithm for the identification and analysis of cellular regulatory networks | [ |
| | MATLAB | A user-friendly and comprehensive toolkit that supports batch effect correction, normalization, imputation, feature selection, clustering, trajectory analysis and network construction and can readily be incorporated in customized workflows | [ |
Figure 3
Integrative study of biological phenomena. The first fundamental decision for modern large-scale studies is the choice between a hypothesis-driven and a data-driven study design. While both types of study design are applicable, complementary approaches are recommended, since hypothesis-driven studies are vulnerable to bias while data-driven studies are highly prone to false positives [365]. The extracted omics data can be subjected to integration through multiple approaches. The resulting functional data will improve our knowledge base and can serve as a starting point for future studies. Emerging pipelines already demonstrate the clinical utility of the integrative approaches [366]. The integration approaches provided in this figure are based on the categorization in [240]. Sequential analysis: the integration of datasets after each has been analyzed independently. Latent variable analysis: partitioning of samples into functional groups through unsupervised clustering, for example, by implementing an expectation–maximization algorithm. Penalized likelihood analysis: outcome prediction through penalized regression. Pairwise correlation analysis: association estimation for related molecule pairs across datasets. Gene set analysis: homogenization of multiple datasets by replacing every molecule with its respective gene and subsequent enrichment of the resulting datasets. Network analysis: using prior knowledge of molecular interactions to provide an environment for integration. Bayesian analysis: utilization of the information in one omics layer as the prior information for the analysis of another through Bayesian approaches.
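Of the integration approaches listed in the caption, pairwise correlation analysis is the simplest to illustrate. The sketch below estimates, for each gene measured in two omics layers, the Pearson correlation of its abundance across the same samples. The layer names, gene identifiers and values are hypothetical, and the choice of Pearson correlation is an assumption made for the example rather than a method prescribed by the review.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")  # correlation is undefined for constant profiles
    return cov / (sd_x * sd_y)


def pairwise_layer_correlation(layer_a, layer_b):
    """Correlate each molecule present in both layers across matched samples.

    Both arguments map a gene identifier to a list of per-sample abundances,
    with samples in the same order in both layers.
    """
    shared = sorted(set(layer_a) & set(layer_b))
    return {gene: pearson(layer_a[gene], layer_b[gene]) for gene in shared}


# Hypothetical toy data: four matched samples, transcript and protein abundances.
mrna = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 1.0, 3.0],
    "GENE_C": [5.0, 5.5, 6.0, 7.0],  # no matching protein measurement
}
protein = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0],
    "GENE_B": [3.0, 1.0, 4.0, 2.0],
}

corr = pairwise_layer_correlation(mrna, protein)
for gene, r in corr.items():
    print(gene, round(r, 3))
```

Genes measured in only one layer are skipped, which reflects a practical constraint of this approach: it can only relate molecule pairs that are observed in both datasets.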
Figure 4
General workflow for the integration of genomics and tandem mass spectrometry data in proteogenomics. The MS/MS spectra of the sample are searched against the theoretical spectra inferred from the NGS data (most commonly RNA-seq) obtained from the same sample. The identified novel peptides should be validated (using PepQuery). The resulting data can be utilized for the study of posttranslational modifications, identification of neoantigens and biomarkers and mutation prioritization in the downstream interpretation. Network-based analysis of these data can provide a critical vantage point for functional study of system perturbations.
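The customized-database step of this workflow can be sketched in a few lines: translate the sample-specific coding sequence, digest it in silico, and keep the peptides that are absent from the reference digest as novel candidates. This is a simplified illustration under stated assumptions. The sequences are invented, the digestion rule (cleavage after K or R unless followed by P, with no missed cleavages) is a common simplification, and the spectrum matching and validation steps of a real pipeline are not modeled.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence in frame 0 up to the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def tryptic_peptides(protein, min_len=5):
    """In-silico tryptic digest: cleave after K or R unless followed by P."""
    peptides, start = set(), 0
    for i, aa in enumerate(protein):
        at_end = i == len(protein) - 1
        if at_end or (aa in "KR" and protein[i + 1] != "P"):
            peptide = protein[start:i + 1]
            if len(peptide) >= min_len:
                peptides.add(peptide)
            start = i + 1
    return peptides


# Hypothetical reference transcript and a sample-specific variant (GAT -> GTT, D -> V).
reference_cds = "ATG GCT GGT TCT CTG AAA GAA GAT GTT ACT GGT CGT TAA".replace(" ", "")
sample_cds = "ATG GCT GGT TCT CTG AAA GAA GTT GTT ACT GGT CGT TAA".replace(" ", "")

reference_peptides = tryptic_peptides(translate(reference_cds))
sample_peptides = tryptic_peptides(translate(sample_cds))

# Peptides only explained by the sample-specific sequence are novel candidates.
novel_candidates = sample_peptides - reference_peptides
print(novel_candidates)
```

Only the peptide spanning the variant position is flagged, while the peptide shared with the reference is not. In a real study such candidates would still need to be matched against the MS/MS spectra and validated, as the caption notes.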
A list of resources for proteogenomics computational analysis
| Tool | Implementation | Description | Reference |
|---|---|---|---|
| | R | Customized database construction from RNA-seq data. | [ |
| | Python & Perl | Identification and annotation of chimeric transcripts. | [ |
| | Perl & R | Customized database construction, database search, filtering and visualization. | [ |
| | Web-based | Validation of novel variants independent of customized database. Also available as a stand-alone tool. | [ |
| | R | Customized database construction and novel peptide identification. | [ |
| | Perl & Python | Customized database construction, FDR estimation, protein identification and annotation, visualization. | [ |
| | Python | Neoantigen identification, classification and prioritization. | [ |
| | Python & Perl | Proteoform identification through proteogenomic analysis of ribosome profiling and MS/MS data. | [ |
| | Web-based | Customized database construction. | [ |
| | Web-based & Python | User-friendly single amino acid variant prioritization. | [ |
| | Windows | User-friendly customized database construction. Importantly, it accepts raw RNA-seq data as input and automatically performs preprocessing through utilization of 23 tools. | [ |
Figure 5
General workflow for the network-based analysis of omics data. The subnetworks constructed by integrating the omics-driven data with prior knowledge of molecular interactions can be subjected to module identification or enrichment analysis. The identified modules can also be enriched to yield functional information. Note that it is possible to enrich the omics data independently of the subnetwork construction process. An example of downstream interpretation is to represent multiomics data as multilayered networks for computational and/or visual pattern detection. Going from either raw omics data or interactome databases to subnetwork modules and enriched data, the complexity decreases and the data are progressively narrowed down to yield functional information. ORA, overrepresentation analysis; FCS, functional class scoring.
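The enrichment step named in the caption as ORA is commonly carried out with a one-sided hypergeometric test, which asks whether a module contains more members of a gene set than expected by chance. The sketch below assumes that test; the function name, gene identifiers and set sizes are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def over_representation(universe, gene_set, hits):
    """One-sided hypergeometric test for overrepresentation.

    Returns the overlap size and P(X >= overlap), where X is the number of
    gene-set members found when len(hits) genes are drawn from the universe
    without replacement.
    """
    universe = set(universe)
    gene_set = set(gene_set) & universe
    hits = set(hits) & universe
    overlap = len(gene_set & hits)
    n_universe, n_set, n_hits = len(universe), len(gene_set), len(hits)
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_hits - k)
        for k in range(overlap, min(n_set, n_hits) + 1)
    )
    return overlap, tail / comb(n_universe, n_hits)


# Hypothetical toy data: 20 measured genes, a 5-gene pathway, a 4-gene module.
universe = [f"G{i}" for i in range(1, 21)]
pathway = ["G1", "G2", "G3", "G4", "G5"]
module = ["G1", "G2", "G3", "G10"]

overlap, p_value = over_representation(universe, pathway, module)
print(overlap, round(p_value, 4))
```

The choice of universe matters: restricting it to the genes that were actually measured, as done here, avoids inflating significance. In practice the test is repeated over many gene sets, so the resulting p-values would need correction for multiple testing.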