Kathleen Schoofs, Annouck Philippron, Piet Pattyn, Katleen De Preter, Francisco Avila Cobos, Jan Koster, Steve Lefever, Jasper Anckaert, Danny De Looze, Jo Vandesompele.
Abstract
In the past decades, the incidence of esophageal adenocarcinoma has increased dramatically in Western populations. A better understanding of disease etiology and the identification of novel prognostic and predictive biomarkers are urgently needed to improve the dismal survival probabilities. Here, we performed comprehensive RNA (coding and non-coding) profiling in various samples from 17 patients diagnosed with esophageal adenocarcinoma, high-grade dysplastic or non-dysplastic Barrett's esophagus. Per patient, a blood plasma sample, a healthy esophageal tissue sample and a disease esophageal tissue sample were included. In total, this comprehensive dataset consists of 102 sequenced libraries from 51 samples. Based on these data, 119 expression profiles are available for three biotypes: miRNA (51), mRNA (51) and circRNA (17). This unique resource allows for discovery of novel biomarkers and disease mechanisms, comparison of tissue and liquid biopsy profiles, and integration of coding and non-coding RNA patterns, and it can serve as a validation dataset in other RNA landscaping studies. Moreover, structural RNA differences can be identified in this dataset, including protein coding mutations, fusion genes, and circular RNAs.
Year: 2022 PMID: 35288573 PMCID: PMC8921197 DOI: 10.1038/s41597-022-01176-x
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 8.501
Metadata of 17 patients included in this dataset.
| clinical diagnosis | sample ID | age | gender | TNMᵃ | Barrett’s segmentᵇ | location | follow-up time (months) |
|---|---|---|---|---|---|---|---|
| EAC | ID20 | 74 | M | pT2N1M0 | C0M2 | distal esophagus | 44 |
| EAC | ID29 | 77 | M | pT1bN1M0 | yes, CM not reported | GEJ | 34 |
| EAC | ID30 | 73 | M | ypT1bN0M0 | — | GEJ | 36 |
| EAC | ID43 | 63 | M | pT1aN0M0 | C4M5 | NA (no resection) | 10D |
| HGD | ID2 | 45 | M | — | C10M12 | — | 29 (EAC) |
| HGD | ID5 | 78 | M | — | C5M7 | — | 49 |
| HGD | ID25 | 73 | M | — | C10M10 | — | 23 (EAC) |
| HGD | ID26 | 54 | M | — | C5M7 | — | 36 |
| HGD | ID39 | 83 | F | — | C0M3 | — | 37 |
| NDB | ID1 | 59 | M | — | C0M7 | — | 40D |
| NDB | ID18 | 59 | F | — | C10M12 | — | 39 (LGD) |
| NDB | ID19 | 71 | M | — | C11M12 | — | 43 (C11M11) |
| NDB | ID22 | 73 | M | — | C6M6 | — | 20 |
| NDB | ID33 | 51 | M | — | C10M12 | — | 37 (C11M12) |
| NDB | ID35 | 78 | F | — | C9M9 | — | 16 (C7M8) |
| NDB | ID37 | 45 | M | — | C5M5 | — | 23 (C3M6) |
| NDB | ID40 | 76 | M | — | C8M8 | — | 6 |
ᵃ Classification that describes the size of the primary tumor and invasion in surrounding tissue (T), lymph node involvement (N) and metastasis (M). The prefix p indicates histopathological staging of the resected tumor and y indicates that the patient received neoadjuvant therapy.
ᵇ The Prague C and M classification is used for reporting the Barrett’s segment: C = circumferential Barrett’s segment; M = maximal length of the Barrett’s tongue-like extent[62].
EAC = esophageal adenocarcinoma, HGD = high-grade dysplasia, NDB = non-dysplastic Barrett’s esophagus, M = male, F = female, LGD = low-grade dysplasia, GEJ = gastro-esophageal junction. Follow-up time indicates time in months with the last known disease progression in brackets. D indicates the patient has died.
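The follow-up and Barrett's segment columns pack several values into one cell (for example `10D`, `29 (EAC)` or `C10M12`). A minimal sketch of how such cells could be split into separate fields is given below. The names `FollowUp`, `parse_follow_up` and `parse_prague` are hypothetical helpers for illustration and are not part of the published dataset or its tooling.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FollowUp:
    months: int                      # follow-up time in months
    deceased: bool                   # True when the cell carries the "D" suffix
    last_progression: Optional[str]  # text in brackets, e.g. "EAC" or "C11M12"


# Digits, optional "D", optional bracketed note (layout assumed from the table).
_FOLLOW_UP = re.compile(r"^(\d+)(D?)(?:\s*\(([^)]+)\))?$")
# Prague notation as it appears in the table, e.g. "C10M12".
_PRAGUE = re.compile(r"^C(\d+)M(\d+)$")


def parse_follow_up(cell: str) -> FollowUp:
    """Split a follow-up cell into months, death flag and last progression."""
    match = _FOLLOW_UP.match(cell.strip())
    if match is None:
        raise ValueError(f"unrecognised follow-up value: {cell!r}")
    months, died, progression = match.groups()
    return FollowUp(int(months), died == "D", progression)


def parse_prague(cell: str) -> Optional[Tuple[int, int]]:
    """Return (C, M) as integers, or None when the segment is not reported."""
    match = _PRAGUE.match(cell.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

For instance, `parse_follow_up("29 (EAC)")` yields 29 months, not deceased, with `"EAC"` as the last known progression, while `parse_prague("yes, CM not reported")` returns `None`.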
Fig. 1 Experimental set-up and overview of the data. This comprehensive dataset includes 17 patients with EAC, HGD or NDB. From each patient, disease tissue, healthy esophageal tissue and blood plasma were collected. RNA was isolated from all 51 samples and used for mRNA (polyA+ and capture-based) and small RNA sequencing. Data reported in this study include mRNA and miRNA expression, variant analysis, fusion gene detection and circRNAs (the latter only in plasma samples).
Range and mean (± standard deviation) of the number of reads per sample during the different pre-processing steps for all mRNA (tissue and plasma) and miRNA (tissue and plasma) samples.
| pre-processing step | sample type | mRNA (incl. circRNA for plasma): range | mRNA (incl. circRNA for plasma): mean ± s.d. | miRNA: range | miRNA: mean ± s.d. |
|---|---|---|---|---|---|
| raw reads (million) | tissue healthy | 25.7–30.5 | 27.7 ± 1.5 | 14.7–28.8 | 21.7 ± 3.7 |
| raw reads (million) | tissue disease | 24.2–31.2 | 27.1 ± 1.8 | 19.1–26.2 | 22.5 ± 2.0 |
| reads after trimming (million) | tissue healthy | 20.8–25.6 | 23.1 ± 1.5 | — | — |
| reads after trimming (million) | tissue disease | 16.7–25.7 | 21.9 ± 2.1 | — | — |
| mapped reads (million) | tissue healthy | 20.5–25.4 | 22.9 ± 1.5 | 2.0–11.7 | 6.0 ± 2.7 |
| mapped reads (million) | tissue disease | 14.5–25.4 | 21.5 ± 2.4 | 3.5–10.5 | 7.0 ± 1.9 |
| raw reads (million) | plasma | 22.9–34.1 | 29.1 ± 3.2 | 15.2–20.6 | 18.0 ± 1.3 |
| reads after trimming (million) | plasma | 13.3–29.7 | 23.5 ± 4.5 | — | — |
| reads after deduplication (million) | plasma | 1.0–6.0 | 3.3 ± 1.4 | — | — |
| mapped reads (million) | plasma | 0.9–5.8 | 3.2 ± 1.4 | 0.4–1.5 | 0.8 ± 0.3 |
Overview of available data and sources.
| data | data type | samples | source | accession number or name |
|---|---|---|---|---|
| pre-processed data (count tables) | mRNA | tissue (healthy and disease, 34 samples) | ArrayExpress | |
| pre-processed data (count tables) | mRNA | plasma (17 samples) | ArrayExpress | |
| pre-processed data (count tables) | small RNA | tissue (healthy and disease, 34 samples) | ArrayExpress | |
| pre-processed data (count tables) | small RNA | plasma (17 samples) | ArrayExpress | |
| pre-processed data (count tables) | circRNA | plasma (17 samples) | ArrayExpress | |
| pre-processed data (count tables) | mRNA | tissue (healthy and disease, 34 samples) | R2 | |
| pre-processed data (count tables) | mRNA | plasma (17 samples) | R2 | |
| pre-processed data (count tables) | small RNA | tissue (healthy and disease, 34 samples) | R2 | |
| pre-processed data (count tables) | small RNA | plasma (17 samples) | R2 | |
| pre-processed data (count tables) | circRNA | plasma (17 samples) | R2 | |
| results variant analysis | based on mRNA data | plasma | Supplementary Table | — |
| results fusion gene analysis | based on mRNA data | tissue | Supplementary Table | — |
| results fusion gene analysis | based on mRNA data | plasma | Supplementary Table | — |
Fig. 2 Technical validation of the data. (a) Quality plots of the raw RNA sequencing reads: per-base mean quality of mRNA tissue and plasma data (top row), and miRNA tissue and plasma data (bottom row); (b) hierarchical clustering of the mRNA plasma samples based on Pearson’s correlation coefficient, generated in R2 (Euclidean distances, average linkage), where the R-value ranges from −1 to 1 and represents a negative (−1), positive (1) or no (0) relationship. It shows a clustering of EAC samples versus HGD and NDB samples; (c) heatmap showing the relative expression of 35 overlapping differentially expressed genes (up and down) for tissue (left) and plasma (right) samples (Benjamini-Hochberg adjusted p-value < 0.05); (d) heatmap showing the relative expression of the ten most abundant circRNAs in plasma (EAC vs NDB; p-value < 2.36 × 10⁻³); (e) boxplots of the relative expression of four of the most frequently reported up- and downregulated miRNAs (reported more than four times in literature) in EAC, HGD and/or NDB tissue samples compared to matched healthy esophageal tissue. Samples included in the boxplots are healthy and disease tissues from 3 patients with EAC, 5 with HGD and 7 with NDB.
Range and mean (±standard deviation) of unique protein coding genes (mRNAs), miRNAs and circRNAs found in tissue or plasma samples.
| RNA type | disease | sample type | range | mean ± s.d. |
|---|---|---|---|---|
| mRNA | EAC | healthy tissue | 17,297–18,844 | 18,122 ± 552 |
| mRNA | EAC | disease tissue | 15,374–19,291 | 17,990 ± 1,534 |
| mRNA | EAC | plasma | 8,195–10,237 | 8,968 ± 763 |
| mRNA | HGD | healthy tissue | 17,578–18,119 | 17,834 ± 220 |
| mRNA | HGD | disease tissue | 18,055–19,817 | 18,893 ± 688 |
| mRNA | HGD | plasma | 8,974–11,468 | 10,707 ± 886 |
| mRNA | NDB | healthy tissue | 16,848–17,937 | 17,503 ± 338 |
| mRNA | NDB | disease tissue | 16,294–19,685 | 18,282 ± 909 |
| mRNA | NDB | plasma | 9,514–11,443 | 10,455 ± 633 |
| miRNA | EAC | healthy tissue | 483–639 | 529 ± 64 |
| miRNA | EAC | disease tissue | 629–682 | 657 ± 20 |
| miRNA | EAC | plasma | 375–438 | 417 ± 25 |
| miRNA | HGD | healthy tissue | 494–726 | 598 ± 81 |
| miRNA | HGD | disease tissue | 577–704 | 659 ± 44 |
| miRNA | HGD | plasma | 347–427 | 386 ± 28 |
| miRNA | NDB | healthy tissue | 531–682 | 626 ± 54 |
| miRNA | NDB | disease tissue | 621–714 | 663 ± 32 |
| miRNA | NDB | plasma | 332–432 | 391 ± 30 |
| circRNA | EAC | plasma | 353–1,165 | 745 ± 301 |
| circRNA | HGD | plasma | 858–3,624 | 2,286 ± 895 |
| circRNA | NDB | plasma | 1,237–3,683 | 2,000 ± 824 |
Counts were filtered by only keeping RNAs with more than four counts.
Number of overlapping upregulated genes in EAC tissue compared to healthy tissue.
| | Maag | Lv | Wang | tissue data from this study (including all 34 samples) |
|---|---|---|---|---|
| Maag | 19 | | | |
| Lv | 0 (1) | 63 | | |
| Wang | 0 (1) | 10 (9.54 × 10⁻¹²) | 119 | |
| tissue data from this study (including all 34 samples) | 19 (1.32 × 10⁻¹⁵) | 12 (2.48 × 10⁻⁰⁸) | 20 (9.29 × 10⁻¹²) | 446 |
The diagonal shows the number of reported genes in each gene set. Off-diagonal cells show the number of overlapping genes between a given pair of datasets, with the Benjamini-Hochberg adjusted p-value from Fisher’s exact test in brackets.
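The overlap statistics described in this footnote can be illustrated with a short sketch: a Fisher's exact test on the overlap of two gene sets, followed by Benjamini-Hochberg adjustment. Several points are assumptions and not taken from the source: the test is written as one-sided (enrichment only, computed as the hypergeometric upper tail), the background `universe` size is a free parameter the caller must choose, and the function names are hypothetical. It is not expected to reproduce the p-values in the table.

```python
from math import comb
from typing import List


def overlap_p_value(overlap: int, size_a: int, size_b: int, universe: int) -> float:
    """One-sided Fisher's exact test for gene-set overlap.

    Probability of observing at least `overlap` shared genes when a set of
    `size_b` genes is drawn at random from `universe` genes, of which
    `size_a` belong to the other set (hypergeometric upper tail).
    """
    total = comb(universe, size_b)
    upper = min(size_a, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Under the one-sided form used here, an overlap of zero always gives a p-value of 1, since every possible draw shares at least zero genes.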
Number of overlapping downregulated genes in EAC tissue compared to healthy tissue.
| | Lv | Wang | tissue data from this manuscript (including all 34 samples) |
|---|---|---|---|
| Lv | 57 | | |
| Wang | 5 (3.27 × 10⁻⁰⁵) | 100 | |
| tissue data from this manuscript (including all 34 samples) | 2 (0.01) | 3 (4.70 × 10⁻⁰³) | 57 |
The diagonal shows the number of reported genes in each gene set. Off-diagonal cells show the number of overlapping genes between a given pair of datasets, with the Benjamini-Hochberg adjusted p-value from Fisher’s exact test in brackets.
Results of expression and abundance analyses of tissue samples (19,734 genes and 676 miRNAs included) and plasma samples (11,255 genes, 457 miRNAs and 2,275 circRNAs included).
| contrast | comparison | mRNA | miRNA | circRNA |
|---|---|---|---|---|
| 1. disease vs healthy tissue | EAC | 99/5 | 42/42 | — |
| 1. disease vs healthy tissue | HGD | 4,440/4,218 | 203/154 | — |
| 1. disease vs healthy tissue | NDB | 4,799/4,324 | 219/186 | — |
| 2. disease tissue vs disease tissue | EAC vs NDB | 3,653/2,615 | 56/38 | — |
| 2. disease tissue vs disease tissue | EAC vs HGD | 2,798/1,956 | 15/5 | — |
| 2. disease tissue vs disease tissue | HGD vs NDB | 2/8 | 0/0 | — |
| 3. disease-healthy vs disease-healthy | EAC vs NDB | 1,979/1,172 | 46/62 | — |
| 3. disease-healthy vs disease-healthy | EAC vs HGD | 1,665/734 | 27/21 | — |
| 3. disease-healthy vs disease-healthy | HGD vs NDB | 0/0 | 0/0 | — |
| 4. plasma | EAC vs NDB | 54/167 | 0/0 | 0/0 |
| 4. plasma | EAC vs HGD | 0/0 | 0/0 | 0/0 |
| 4. plasma | HGD vs NDB | 0/0 | 0/0 | 0/0 |
Prior to the analyses, count tables were filtered to include RNAs with more than four counts in at least half of the samples per group. Results shown in the table are filtered based on adjusted p-value < 0.05 (Benjamini-Hochberg) and LFC > log2(1.5). Different contrasts were analyzed: comparing disease with healthy tissue (contrast 1), comparing disease tissue between groups (contrast 2), comparing disease versus healthy tissue samples of one group with the disease versus healthy tissue samples of another group (contrast 3), and comparing the three groups for the plasma samples (contrast 4).
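The filtering and significance thresholds described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the source does not say whether the "at least half of the samples per group" rule must hold in every group or in at least one, so the sketch exposes that choice through a `combine` argument (defaulting to every group). It also applies the LFC threshold to the absolute value so that changes in both directions can pass. All function and parameter names are hypothetical.

```python
from math import log2
from typing import Callable, Dict, Iterable, List

# Thresholds quoted in the table footnote.
MIN_COUNT = 4                 # an RNA needs MORE than four counts in a sample
ADJ_P_CUTOFF = 0.05           # Benjamini-Hochberg adjusted p-value
LFC_THRESHOLD = log2(1.5)     # about 0.585


def passes_count_filter(
    counts_by_group: Dict[str, List[int]],
    min_count: int = MIN_COUNT,
    combine: Callable[[Iterable[bool]], bool] = all,
) -> bool:
    """Check one RNA against the count filter.

    A group passes when at least half of its samples have more than
    `min_count` counts. `combine` decides how groups are merged:
    `all` (assumed default) or `any`.
    """
    per_group = []
    for counts in counts_by_group.values():
        expressed = sum(1 for c in counts if c > min_count)
        per_group.append(expressed * 2 >= len(counts))
    return combine(per_group)


def is_significant(adjusted_p: float, lfc: float) -> bool:
    """Apply the adjusted p-value and log2 fold change cut-offs.

    Assumption: the LFC cut-off is taken on the absolute value.
    """
    return adjusted_p < ADJ_P_CUTOFF and abs(lfc) > LFC_THRESHOLD
```

Passing `combine=any` switches the filter to the more permissive reading, in which an RNA is kept as soon as one group meets the half-of-samples rule.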
Fig. 3 Usage notes. Boxplot per sample group of the hsa-miR-194, SHH and SUFU expression levels in the tissue samples (generated in R2). Samples included in the boxplots are healthy and disease tissues from 3 patients with EAC, 5 with HGD and 7 with NDB.
| Measurement(s) | mRNA Sequencing • MicroRNA Sequencing |
| Technology Type(s) | sequencer |
| Sample Characteristic - Organism | Homo sapiens |
| Sample Characteristic - Location | Belgium |