Remy Nicolle, Jerome Raffenne, Valerie Paradis, Anne Couvelard, Aurelien de Reynies, Yuna Blum, Jerome Cros.
Abstract
Data from the Cancer Genome Atlas (TCGA) are now easily accessible through web-based platforms with tools to assess the prognostic value of molecular alterations. Pancreatic tumors have heterogeneous biology and aggressiveness, ranging from the deadly adenocarcinoma (PDAC) to neuroendocrine tumors, which have a better prognosis. We assessed the availability of the pancreatic cancer TCGA data (TCGA_PAAD) from several repositories and investigated the nature of each sample and how non-PDAC samples impact prognostic biomarker studies. While the clinical and genomic data (n = 185) were fairly consistent across all repositories, the number of RNAseq profiles varied from 176 to 185. We found that 35 RNAseq profiles (18.9%) corresponded to normal pancreas, inflamed pancreas, or non-PDAC neoplasms. This information was difficult to obtain. When gene expression was treated as a continuous variable in the uncurated cohort, the expression of 5312 and 4221 genes was significantly associated with progression-free and overall survival, respectively. In the PDAC-only cohort, only 4 and 14 genes, respectively, had prognostic value. Similarly, mutations in key genes and well-described miRNAs lost their prognostic significance in the PDAC-only cohort. We therefore propose a web-based application to assess biomarkers in the curated TCGA_PAAD dataset. In conclusion, TCGA_PAAD curation is critical to avoid important biological and clinical biases from non-PDAC samples.
Keywords: TCGA; curation; pancreatic cancer
Year: 2019 PMID: 30669703 PMCID: PMC6357157 DOI: 10.3390/cancers11010126
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.639
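The curation step described in the abstract can be illustrated with a minimal sketch. The record layout, the `histology` field, and the function names are hypothetical and not taken from the paper or from any TCGA repository. The sketch only shows the idea of restricting a cohort to PDAC samples and reporting the excluded fraction.

```python
# Minimal sketch of cohort curation (hypothetical record layout).

def curate(samples, keep_label="PDAC"):
    """Split sample records into kept (PDAC) and excluded (everything else).

    Each record is assumed to be a dict with a 'histology' key; that key name
    is an illustration, not a real TCGA field.
    """
    kept = [s for s in samples if s["histology"] == keep_label]
    excluded = [s for s in samples if s["histology"] != keep_label]
    return kept, excluded


def excluded_percentage(n_excluded, n_total):
    """Percentage of profiles removed by curation, rounded to one decimal."""
    return round(100 * n_excluded / n_total, 1)


# Toy cohort with invented sample identifiers.
toy = [
    {"id": "S1", "histology": "PDAC"},
    {"id": "S2", "histology": "neuroendocrine"},
    {"id": "S3", "histology": "PDAC"},
    {"id": "S4", "histology": "normal pancreas"},
]
pdac_only, removed = curate(toy)

# Counts reported in the abstract: 35 non-PDAC profiles out of 185.
share = excluded_percentage(35, 185)
```

With the counts from the abstract, `excluded_percentage(35, 185)` reproduces the reported 18.9%.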
Figure 1. Flow chart depicting the curation of the pancreatic cancer dataset (TCGA_PAAD).
Clinical comparison of the TCGA_PAAD and the pancreatic adenocarcinoma multicenter cohort.
| Clinical/Pathological Features | TCGA_PAAD | PDAC Multicenter Cohort | p-value |
|---|---|---|---|
| | 64.89 (35, 88) | 63.31 (34, 88) | 0.12 |
| | 54% (69, 81) | 54% (215, 256) | 1 |
| | 37.97 (18, 120) | 32.42 (7, 150) | 1.86 × 10⁻⁴ |
| | | | < 1 × 10⁻¹⁰ |
| G1 | 5 | 201 | < 1 × 10⁻¹⁰ |
| G2 | 75 | 189 | 0.079 |
| G3 | 69 | 67 | < 1 × 10⁻¹⁰ |
| G4 | 1 | 0 | 0.5579 |
| | | | 0.0047 |
| T1 | 1 | 17 | 0.114 |
| T2 | 20 | 68 | 0.861 |
| T3 | 125 | 386 | 0.676 |
| T4 | 3 | 0 | 0.016 |
| N (N1 proportion (N0, N1)) | 73.8% (39, 110) | 74.5% (120, 351) | 0.950 |
| M (M0 proportion (M0, M1)) | 94.4% (68, 0) | 100% (471, 0) | 1.11 × 10⁻⁵ |
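The record does not state which statistical test produced the values in the last column. As a hedged illustration, the sketch below compares the nodal-status counts of the two cohorts with a chi-square test on a 2×2 table using Yates' continuity correction. The choice of test and the function name `chi2_yates_2x2` are assumptions; the counts are read from the N row above (N0 = 39, N1 = 110 versus N0 = 120, N1 = 351).

```python
import math


def chi2_yates_2x2(a, b, c, d):
    """Chi-square test with Yates' correction for the 2x2 table [[a, b], [c, d]].

    Returns (chi2, p). With 1 degree of freedom, the chi-square survival
    function is erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Continuity correction, clamped at zero so it cannot overshoot.
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * corrected ** 2 / (row1 * row2 * col1 * col2)
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# N row: first cohort (N1 = 110, N0 = 39), second cohort (N1 = 351, N0 = 120).
chi2_n, p_n = chi2_yates_2x2(110, 39, 351, 120)

# Proportions of N1 cases as reported in the table.
prop_tcga = round(100 * 110 / (110 + 39), 1)
prop_multi = round(100 * 351 / (351 + 120), 1)
```

Under these assumptions the N1 proportions come out as 73.8% and 74.5%, and the p-value is roughly 0.95, in line with the tabulated value. This agreement suggests, but does not prove, that a continuity-corrected test was used.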
Figure 2. Progression-free and overall survival of the curated TCGA_PAAD and a PDAC multicenter cohort. Kaplan-Meier curves depicting the progression-free (a) and overall survival (b) of the curated TCGA_PAAD cohort (n = 150) and a multicenter PDAC cohort (n = 471).
Figure 3. Progression-free and overall survival of the PDAC and non-PDAC cases. (a) Kaplan-Meier curves depicting the progression-free (left panel) and overall survival (right panel) of the PDAC cases (n = 150) and the non-PDAC cases (n = 27). (b) Number of genes significantly associated with progression-free (left panel) and overall survival (right panel) in the PDAC-only cases and the uncurated cohort.
Figure 4. Bias in prognostic analysis when using the uncurated cohort. (a) TWIST1 mRNA expression in PDAC and non-PDAC cases and prognostic impact (OS) in the uncurated and the PDAC-only cohorts (left panels). Kaplan-Meier curves depicting the overall (middle panels) and progression-free survival (right panels) according to TWIST1 expression in the uncurated cohort or the PDAC-only cohort. (b) and (c) Distribution of the KRAS and TP53 mutations in the PDAC and non-PDAC cases (left panels) and Kaplan-Meier curves depicting the overall survival according to the mutational status in the uncurated cohort (middle panels) or the PDAC-only cohort (right panels). (d) miR-203 expression in PDAC and non-PDAC cases and prognostic impact (OS) in the uncurated and the PDAC-only cohorts (left panels). Kaplan-Meier curves depicting the overall (middle panels) and progression-free survival (right panels) according to miR-203 expression in the uncurated cohort or the PDAC-only cohort.
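The Kaplan-Meier curves referred to in Figures 2 to 4 are built from the product-limit estimator. The sketch below is a minimal pure-Python version for illustration only. The function name and the toy follow-up data are invented, and it is not the authors' analysis code.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time for each patient
    events : 1 if the event (death or progression) was observed, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # events plus censorings at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Everyone with follow-up ending at t leaves the risk set.
        at_risk -= leaving[t]
    return curve


# Toy data (months): five patients, two of them censored.
curve = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
```

In the toy data, the estimate drops at months 5, 8, and 12. The censored patients at months 8 and 20 reduce the number at risk without producing a step. Running the estimator separately on a mixed cohort and on its PDAC-only subset is one way to see how the composition of the cohort changes the curve.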