Rasmus Krempel, Pranav Kulkarni, Annie Yim, Ulrich Lang, Bianca Habermann, Peter Frommolt.
Abstract
BACKGROUND: Recent cancer genome studies on many human cancer types have relied on multiple molecular high-throughput technologies. Given the vast amount of data that has been generated, there are surprisingly few databases which facilitate access to these data and make them available for flexible analysis queries in the broad research community. If used in their entirety and provided at a high structural level, these data can be directed into continuously growing databases which bear an enormous potential to serve as a basis for machine learning technologies, with the goal of supporting research and healthcare with predictions of clinically relevant traits.
Year: 2018 PMID: 29699486 PMCID: PMC5921751 DOI: 10.1186/s12859-018-2157-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Comparison of various features of the CancerSysDB with those of other cancer genomics data integration tools
| Feature | CancerSysDB | TCGAbiolinks | RTCGA | cBio portal |
|---|---|---|---|---|
| GUI | Web framework based on Groovy/Grails | Based on Shiny | None | Web framework based on Spring Java |
| Query schema | Hibernate | R scripting | R scripting | SQL |
| Data upload | Parametrized CSV file upload | Direct access to GDC through API | Data packages available on Bioconductor | CSV files plus meta file |
| Query definition | JSON-based | Combination of R commands | Combination of R commands | REST-based API |
| Portability | Native Docker implementation | Hosted on Bioconductor | Hosted on Bioconductor | Hosted on GitHub |
| Programming skills required | No | Yes | Yes | No |
Fig. 1 Analysis results for workflows splitting multiple TCGA cohorts into TP53-mutant and non-mutant patients: a Overall survival is significantly different between TP53-mutant (red curve) and non-mutant patients (black curve), with a more favorable outcome for non-mutant patients (gain in median survival: 2066 days, p < 0.0001, n = 9444). b The distribution of the mutation types in lung adenocarcinoma is strongly shifted towards an increase of G > T transversions in TP53-mutant compared to non-mutant patients (p = 0.0006, n = 584). c Genomic stability is quantified in terms of the overall size of somatic copy number alterations (sCNA) compared between tumor and normal. sCNA are counted as genomic amplifications above a level of 3 and as genomic deletions below a level of 1 for the signal ratio between the tumor and the paired normal sample. The difference between TP53-mutant and non-mutant patients is highly significant in glioblastoma multiforme (p = 0.0132, n = 379)
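The sCNA thresholds described in panel c can be illustrated with a minimal Python sketch. It is not the CancerSysDB workflow itself: the `Segment` type, the function name `scna_burden_mb`, and the example segments are hypothetical, and the sketch assumes that each segment carries a tumor/normal signal ratio on the same scale as the thresholds of 3 and 1 quoted in the caption.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One copy number segment (hypothetical structure for illustration)."""
    start: int    # segment start in base pairs
    end: int      # segment end in base pairs
    ratio: float  # tumor vs. paired normal signal ratio


# Thresholds as stated in the caption of Fig. 1c.
AMP_THRESHOLD = 3.0  # above this level: genomic amplification
DEL_THRESHOLD = 1.0  # below this level: genomic deletion


def scna_burden_mb(segments):
    """Total size in Mb of all segments called as amplified or deleted."""
    altered_bp = sum(
        seg.end - seg.start
        for seg in segments
        if seg.ratio > AMP_THRESHOLD or seg.ratio < DEL_THRESHOLD
    )
    return altered_bp / 1e6


# Made-up example segments, not TCGA data.
segments = [
    Segment(0, 10_000_000, 2.0),           # within thresholds, not counted
    Segment(10_000_000, 25_000_000, 3.4),  # amplification, 15 Mb
    Segment(25_000_000, 30_000_000, 0.6),  # deletion, 5 Mb
]
print(scna_burden_mb(segments))  # 20.0
```

Per-patient values computed this way could then be compared between TP53-mutant and non-mutant groups; the statistical test used for that comparison is not stated in the caption.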
Results of TP53-dependent analysis of genomic and clinical characteristics
(a)

| | Patients | Events | 5-year survival rate [%] | Median survival [days] | 95% CI |
|---|---|---|---|---|---|
| TP53 mutant | 3772 | 1237 | 47.4 | 1670 | [1526; 1818] |
| TP53 non-mutant | 5672 | 1128 | 66.9 | 3736 | [3262; 4267] |

(b)

| | Patients | CNAs [Mb] |
|---|---|---|
| TP53 mutant | 133 | 74.5 |
| TP53 non-mutant | 246 | 50.5 |

(c)

| VarType | All | TP53 mutant [%] | TP53 non-mutant [%] | p | ATM mutant [%] | ATM non-mutant [%] | p |
|---|---|---|---|---|---|---|---|
| A > C or T > G | 3.5 | 3.3 | 3.6 | < 0.0001 | 3.9 | 3.4 | 0.2160 |
| A > G or T > C | 9.9 | 9.2 | 10.7 | < 0.0001 | 9.6 | 9.9 | 0.7695 |
| A > T or T > A | 8.1 | 8.4 | 7.8 | 0.0005 | 8.6 | 8.1 | 0.4584 |
| C > G or G > C | 13.6 | 13.9 | 13.1 | < 0.0001 | 13.2 | 13.6 | 0.3790 |
| C > T or G > A | 32.7 | 30.0 | 36.0 | < 0.0001 | 28.6 | 33.0 | 0.5121 |
| G > T or C > A | 32.3 | 35.2 | 28.8 | 0.0001 | 36.0 | 32.0 | 0.4940 |
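The six variant types in the table pair each substitution with its reverse complement (for example, G > T on one strand is C > A on the other). The following Python sketch shows one way such a grouping and the resulting percentages could be computed. The function names and the example variants are hypothetical, and the source does not say how CancerSysDB implements this step.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# One representative substitution per class, in the order used in the table.
REPRESENTATIVES = [("A", "C"), ("A", "G"), ("A", "T"),
                   ("C", "G"), ("C", "T"), ("G", "T")]


def build_class_lookup():
    """Map all 12 single-base substitutions to their strand-symmetric class."""
    lookup = {}
    for ref, alt in REPRESENTATIVES:
        cref, calt = COMPLEMENT[ref], COMPLEMENT[alt]
        label = f"{ref} > {alt} or {cref} > {calt}"
        lookup[(ref, alt)] = label
        lookup[(cref, calt)] = label
    return lookup


SUBSTITUTION_CLASS = build_class_lookup()


def spectrum_percent(snvs):
    """Percentage of variants per class for a list of (ref, alt) pairs."""
    counts = Counter(SUBSTITUTION_CLASS[(ref, alt)] for ref, alt in snvs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}


# Made-up variants for illustration only.
example = [("G", "T"), ("C", "A"), ("C", "T"), ("A", "G")]
print(spectrum_percent(example))
```

In this example, G > T and C > A fall into the same class, so that class accounts for half of the four variants.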
Classes of carcinomas used for random forest prediction of cancer types
| Class name | TCGA cohorts | Sample size: total | Sample size: training set | Sample size: test set |
|---|---|---|---|---|
| Adrenal gland | Adrenocortical carcinoma (ACC) | 271 | 179 | 92 |
| | Pheochromocytoma and paraganglioma (PCPG) | | | |
| Bladder | Urothelial carcinoma (BLCA) | 411 | 272 | 139 |
| Brain | Lower grade glioma (LGG) | 515 | 340 | 175 |
| Breast | Breast invasive carcinoma (BRCA) | 1077 | 711 | 366 |
| Gastrointestinal | Esophageal carcinoma (ESCA) | 1237 | 817 | 420 |
| | Stomach adenocarcinoma (STAD) | | | |
| | Colon adenocarcinoma (COAD) | | | |
| | Rectum adenocarcinoma (READ) | | | |
| | Cholangiocarcinoma (CHOL) | | | |
| Head & Neck | Head and neck squamous cell carcinoma (HNSC) | 590 | 390 | 200 |
| | Uveal melanoma (UVM) | | | |
| Hematologic | Acute myeloid leukemia (LAML) | 321 | 212 | 109 |
| | Diffuse large B-cell lymphoma (DLBC) | | | |
| | Thymoma (THYM) | | | |
| Kidney | Kidney Chromophobe (KICH) | 738 | 488 | 250 |
| | Renal clear cell carcinoma (KIRC) | | | |
| | Renal papillary cell carcinoma (KIRP) | | | |
| Liver | Hepatocellular carcinoma (LIHC) | 321 | 212 | 109 |
| Ovary | Ovarian serous cystadenocarcinoma (OV) | 437 | 289 | 148 |
| Pancreas | Pancreatic adenocarcinoma (PAAD) | 184 | 122 | 62 |
| Prostate | Prostate adenocarcinoma (PRAD) | 498 | 329 | 169 |
| Skin | Cutaneous melanoma (SKCM) | 104 | 69 | 35 |
| Testis | Testicular germ cell tumors (TGCT) | 150 | 99 | 51 |
| Thoracic | Lung adenocarcinoma (LUAD) | 1143 | 755 | 388 |
| | Lung squamous cell carcinoma (LUSC) | | | |
| | Mesothelioma (MESO) | | | |
| Thyroid | Thyroid carcinoma (THCA) | 496 | 327 | 169 |
| Uterus | Uterine carcinosarcoma (UCS) | 598 | 395 | 203 |
| | Uterine corpus endometrial carcinoma (UCEC) | | | |
Fig. 2 Results of a cross-validation of the random forest prediction of cancer types in the CancerSysDB. The predictions are based on a random forest learned on the training set comprising 6006 patients from 30 TCGA studies (Table 2). Displayed are the predictions of the classes for the 3085 patients in the test set. The accuracy varies strongly across the individual classes, but sums up to a total of 1521 correctly classified patients (49.3%)
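The figures quoted in this caption can be cross-checked against the per-class sample sizes in the table of carcinoma classes. The short Python sketch below only redoes that arithmetic. The dictionary name `SPLITS` is hypothetical, the values are copied from the table, and the observation that roughly two thirds of each class went into training is read off the numbers rather than stated in the source.

```python
# (total, training set, test set) per class, copied from the table of classes.
SPLITS = {
    "Adrenal gland": (271, 179, 92),
    "Bladder": (411, 272, 139),
    "Brain": (515, 340, 175),
    "Breast": (1077, 711, 366),
    "Gastrointestinal": (1237, 817, 420),
    "Head & Neck": (590, 390, 200),
    "Hematologic": (321, 212, 109),
    "Kidney": (738, 488, 250),
    "Liver": (321, 212, 109),
    "Ovary": (437, 289, 148),
    "Pancreas": (184, 122, 62),
    "Prostate": (498, 329, 169),
    "Skin": (104, 69, 35),
    "Testis": (150, 99, 51),
    "Thoracic": (1143, 755, 388),
    "Thyroid": (496, 327, 169),
    "Uterus": (598, 395, 203),
}

# Every row should split cleanly into training and test samples.
for name, (total, train, test) in SPLITS.items():
    assert total == train + test, name

n_train = sum(train for _, train, _ in SPLITS.values())
n_test = sum(test for _, _, test in SPLITS.values())
correct = 1521  # correctly classified patients reported in the caption

print(n_train, n_test)                         # 6006 3085
print(round(100 * correct / n_test, 1))        # 49.3
print(round(n_train / (n_train + n_test), 2))  # 0.66
```

The column sums reproduce the 6006 training patients and 3085 held-out patients, and 1521 correct predictions out of 3085 gives the reported 49.3%.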
Fig. 3 In-depth analysis of the dynamics of the TCA pathway in KIRP cancer patients. An interactive bee-swarm scatter plot of the tricarboxylic acid cycle (TCA) pathway in KIRP cancer patients is shown. The log2-fold changes are averaged for patients according to tumor grade (Stage I-IV). The dashboard gives the number of patients per grade and allows for further filtering according to gender or vital status (see also Additional file 2: Figure S1). a The SUCLG1 gene is selected (pink bubble in the bee-swarm scatter plot). b The SUCLG2 gene is selected. Both genes show a strong averaged down-regulation in Stage IV KIRP cancer patients (see Table 4 for averaged log2-fold changes)
Averaged log2-fold changes of SUCLG1 and SUCLG2 mRNAs in different tumor stages of KIRP cancer patients
| Stage | # Patients | Female/Male | Alive/Dead | SUCLG1 log2 FC | | SUCLG2 log2 FC | |
|---|---|---|---|---|---|---|---|
| I | 15 | 5 / 10 | 13 / 2 | −0.473 | 0.132 | −0.338 | 0.307 |
| II | 1 | 0 / 1 | 1 / 0 | −1.163 | 0.082 | 0.137 | 0.431 |
| III | 11 | 5 / 6 | 8 / 3 | −0.835 | 0.018 | −0.760 | 0.028 |
| IV | 4 | 0 / 4 | 3 / 1 | −1.975 | 0.066 | −1.664 | 0.054 |
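The per-stage averaging of log2-fold changes behind this table can be sketched as follows. This is an illustration under stated assumptions, not the method used in CancerSysDB: the source does not say how the fold changes are derived, so the sketch assumes a simple tumor-to-normal expression ratio per patient, and the function names and example values are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def log2_fold_change(tumor, normal):
    """log2 ratio of tumor to normal expression (assumed definition)."""
    return math.log2(tumor / normal)


def mean_log2fc_by_stage(records):
    """Average the per-patient log2 fold change within each tumor stage.

    `records` is an iterable of (stage, tumor_expression, normal_expression).
    """
    by_stage = defaultdict(list)
    for stage, tumor, normal in records:
        by_stage[stage].append(log2_fold_change(tumor, normal))
    return {stage: mean(values) for stage, values in by_stage.items()}


# Made-up expression values for illustration, not TCGA data.
records = [
    ("I", 8.0, 8.0),   # unchanged: log2 FC = 0
    ("I", 4.0, 8.0),   # halved: log2 FC = -1
    ("IV", 2.0, 8.0),  # quartered: log2 FC = -2
]
print(mean_log2fc_by_stage(records))  # {'I': -0.5, 'IV': -2.0}
```

Averages over very small groups, such as the single Stage II patient in the table, rest on few observations and should be read with that in mind.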