Jieun Lee1,2, Youngju Kim1,2, Seonghee Jin1, Heeseung Yoo1, Sumin Jeong1, Euna Jeong3, Sukjoon Yoon1,3.
Abstract
The rapid increase in collateral omics and phenotypic data has enabled data-driven studies for the fast discovery of cancer targets and biomarkers. Thus, it is necessary to develop convenient tools for general oncologists and cancer scientists to carry out customized data mining without computational expertise. For this purpose, we developed innovative software that enables user-driven analyses assisted by knowledge-based smart systems. Publicly available data on mutations, gene expression, patient survival, immune score, drug screening and RNAi screening were integrated from the TCGA, GDSC, CCLE, NCI, and DepMap databases. The optimal selection of samples and other filtering options were guided by the smart function of the software for data mining and visualization on Kaplan-Meier plots, box plots and scatter plots of publication quality. We implemented unique algorithms for both data mining and visualization, thus simplifying and accelerating user-driven discovery activities on large multiomics datasets. The present Q-omics software program (v0.95) is available at http://qomics.sookmyung.ac.kr.
Keywords: Kaplan-Meier plot; biomarker; cancer bioinformatics; immune infiltrate; omics data mining; smart software
Year: 2021 PMID: 34819397 PMCID: PMC8627836 DOI: 10.14348/molcells.2021.0169
Source DB: PubMed Journal: Mol Cells ISSN: 1016-8478 Impact factor: 5.034
Numbers of data points integrated into Q-omics software
| Dataset | No. of lineages | No. of cell lines/No. of samples | No. of genes/No. of drugs | Data type |
|---|---|---|---|---|
| Cell line data | | | | |
| Gene expression | 20 | 1,061 | 19,137 | RNA sequencing |
| sgRNA | 20 | 741 | 18,110 | CRISPR |
| shRNA | 20 | 587 | 16,800 | RNAi shRNA |
| Drug response | 20 | 1,001 | 397 | Drug response |
| Mutation | 20 | 1,281 | 18,731 | Exome sequencing |
| Drug-induced gene expression | 13 | 60 | 12,305/15 | DNA microarray |
| Tissue data | | | | |
| Tumor gene expression | 33 | 9,951 | 38,311 | RNA sequencing |
| Paired normal vs. cancer: gene expression | 18 | 679 | 38,311 | RNA sequencing |
| Mutation | 33 | 9,100 | 20,850 | Exome sequencing |
| Immune | 33 | 8,954 | 64 (cell types) | Cell type enrichment score |
Fig. 1 Overview of data integration in Q-omics software.
Public datasets from the TCGA, GDSC, CCLE, NCI and DepMap were integrated for cross-association analysis (blue arrow) between any two datasets.
Fig. 2 Software workflow and user interface.
(A) The workflow of functional modules and databases between the local software and the server-side knowledge base in Q-omics. (B) Main interface of Q-omics software. Search options are separated into “Browse smart data” and “Query-oriented analysis”. “Quick start examples” are comprehensive options for first-time users. Knowledge-based smart search is enabled for all of the search options.
Fig. 3 Graphical interface of patient survival analysis and related smart search results.
(A) The panel of survival analyses included Kaplan–Meier (KM) plots, sample group information and advanced options for plotting. (B) The panel of gene lists retrieved by the smart algorithm from the server-side knowledge base. In this example, the list shows genes that are significantly (P < 0.01) associated with the user’s query in the KM plot.
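For readers unfamiliar with how a KM curve is built, the following is a minimal sketch of the standard Kaplan-Meier product-limit estimator in plain Python. It is not the Q-omics implementation; the function name `kaplan_meier` and the toy follow-up data are assumptions made for illustration only.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each patient
    events:    1 if the event (e.g. death) was observed, 0 if censored
    (Hypothetical helper for illustration; not part of Q-omics.)
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # every patient leaves the risk set at their time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Observed events and censored patients both leave the risk set after t.
        at_risk -= leaving[t]
    return curve


# Toy data (made up): five patients, two of them censored.
toy_curve = kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0])
for time, prob in toy_curve:
    print(f"t={time}: S(t)={prob:.2f}")
```

Censored patients lower the number at risk without producing a step in the curve, which is why the curve only drops at observed event times.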
Fig. 4 Graphical interface of cross-association analysis between datasets using cell lines.
(A) The panel of cross-associations displaying the predictivity and descriptivity scores of all data points. The list on the right side shows hits with significant P values. (B and C) Box plot and scatter plot of a selected hit from the cross-association panel. Box plots and scatter plots are also available for patient sample analyses.
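As a rough illustration of what sits behind a scatter plot relating two datasets, the sketch below matches the cell lines shared by two measurements and computes a Pearson correlation. This is a generic stand-in: the predictivity and descriptivity scores used by Q-omics are not defined here and are not reproduced. The function name `pearson_on_shared`, the cell line labels and the values are hypothetical.

```python
import math


def pearson_on_shared(a, b):
    """Pearson r over samples present in both mappings (sample id -> value).

    Returns (shared_ids, r). Hypothetical helper; not a Q-omics function.
    """
    shared = sorted(set(a) & set(b))
    if len(shared) < 3:
        raise ValueError("need at least 3 shared samples")
    x = [a[s] for s in shared]
    y = [b[s] for s in shared]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant values")
    return shared, sxy / math.sqrt(sxx * syy)


# Made-up values: expression of one gene and response to one drug.
expression = {"CL1": 1.0, "CL2": 2.0, "CL3": 3.0, "CL4": 4.0}
response = {"CL2": 4.0, "CL3": 6.0, "CL4": 8.0, "CL5": 1.0}
shared_ids, r = pearson_on_shared(expression, response)
print(shared_ids, round(r, 3))
```

Restricting the calculation to the intersection of sample identifiers matters because, as the table above shows, each dataset covers a different number of cell lines.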