Noa Bossel Ben-Moshe, Shlomit Gilad, Gili Perry, Sima Benjamin, Nora Balint-Lahat, Anya Pavlovsky, Sharon Halperin, Barak Markus, Ady Yosepovich, Iris Barshack, Einav Nili Gal-Yam, Eytan Domany, Bella Kaufman, Maya Dadiani.
Abstract
BACKGROUND: The main bottleneck for genomic studies of tumors is the limited availability of fresh frozen (FF) samples collected from patients, coupled with comprehensive long-term clinical follow-up. This shortage could be alleviated by using existing large archives of routinely obtained and stored Formalin-Fixed Paraffin-Embedded (FFPE) tissues. However, since these samples are partially degraded, their RNA sequencing is technically challenging.
Keywords: Breast cancer; FFPE; Poly-A capturing; RNA sequencing; Ribosomal depletion
Year: 2018 PMID: 29848287 PMCID: PMC5977534 DOI: 10.1186/s12864-018-4761-3
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Mapping percentages for the different RNA-seq methods and samples
| | mRNA-seq FF | mRNA-seq FFPE | mRNA-seq FFPE old | RiboZero FF | RiboZero FFPE | NuGEN FF | NuGEN FFPE |
|---|---|---|---|---|---|---|---|
| Exons | 58 (51–63) | 29.2 (16.6–38.8) | 20.1 (13.8–32.2) | 21.4 (14.9–30) | 8.4 (0.6–12.7) | 20.3 (16–22.7) | 3.9 (0.8–7) |
| Intronic/intergenic | 25 (20–32) | 28.2 (22.3–32.6) | 23.3 (18.9–27.4) | 44.4 (35.1–55.5) | 70.2 (61.7–75.2) | 65.1 (63.1–67.9) | 32.5 (24.5–43.1) |
| rRNA | 1.3 (0.9–1.6) | 12.7 (4.17–17.8) | 6.4 (2.7–12.9) | 24.7 (1.15–45.3) | 5.3 (0.9–13.4) | 0.07 (0.05–0.08) | 1.2 (0.3–2.2) |
| Multiple alignment | 11.9 (9.5–15) | 5.9 (3.8–7.2) | 4.2 (2.63–6) | 6.3 (2–10) | 6.5 (6.2–6.6) | 9.6 (8.7–10.3) | 3.6 (2.7–4.7) |
| Unmapped | 3.8 (3.7–4) | 24 (10.9–44.9) | 45.7 (31.6–54) | 3 (2.7–3.3) | 9.6 (5.3–17.8) | 4.8 (4.2–6.1) | 58.7 (43–67.8) |
Values presented as mean percentage (range)
n = number of measurements (number of tumor samples × number of initial RNA amounts)
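As a quick consistency check on the table, the sketch below copies the mean percentages (ranges omitted), verifies that each protocol/sample column sums to roughly 100%, and derives the exonic share of exonic plus intronic/intergenic reads. The column labels, the function names, and the use of "exons + intronic/intergenic" as a stand-in for the uniquely mapped reads are assumptions made for illustration; note also that a ratio of mean values is not the same as the mean of per-sample ratios.

```python
# Mean mapping percentages copied from the table (ranges omitted).
# Order of values: exons, intronic/intergenic, rRNA, multiple alignment, unmapped.
MEANS = {
    "mRNA-seq FF":       [58.0, 25.0, 1.3, 11.9, 3.8],
    "mRNA-seq FFPE":     [29.2, 28.2, 12.7, 5.9, 24.0],
    "mRNA-seq FFPE old": [20.1, 23.3, 6.4, 4.2, 45.7],
    "RiboZero FF":       [21.4, 44.4, 24.7, 6.3, 3.0],
    "RiboZero FFPE":     [8.4, 70.2, 5.3, 6.5, 9.6],
    "NuGEN FF":          [20.3, 65.1, 0.07, 9.6, 4.8],
    "NuGEN FFPE":        [3.9, 32.5, 1.2, 3.6, 58.7],
}


def column_total(values):
    """Sum of the five read categories; should be close to 100%."""
    return sum(values)


def exon_share(values):
    """Exonic reads as a percentage of exonic + intronic/intergenic reads.

    Assumption: these two categories together approximate the uniquely
    mapped reads; the table itself does not state this.
    """
    exons, non_exonic = values[0], values[1]
    return 100.0 * exons / (exons + non_exonic)


for name, values in MEANS.items():
    print(f"{name:18s} total={column_total(values):6.2f}  "
          f"exon share={exon_share(values):5.1f}")
```

Small deviations from 100 in the totals are expected, because the published means are rounded.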
Fig. 1. Mapping and coverage information. (a) The percentage of reads mapped to exons and to introns/intergenic regions, out of the total number of uniquely mapped reads per sample. Color code: purple for the mRNA-seq protocol, orange for the RiboZero protocol and blue for the NuGEN Ovation protocol. Black: FF samples (T1-T3); gray: ~4-year-old FFPE samples (T1-T3); light gray: ~10-year-old FFPE samples (T4-T6, processed only with the mRNA-seq protocol). The amount of starting material, in ng, is indicated below the bars corresponding to each sample. (b) Box plot of the percentage of reads uniquely mapped to exons out of the total number of reads, for each sample type: FF mRNA-seq (T1-T3; n = 6); ~4-year-old FFPE mRNA-seq (T1-T3; n = 5); ~10-year-old FFPE mRNA-seq (T4-T6; n = 3); FF RiboZero (T1-T3; n = 3); FFPE RiboZero (T1-T3; n = 3); FFPE NuGEN Ovation (T1-T3; n = 6). n = number of measurements (number of tumor samples × number of initial RNA amounts). (c) The estimated number of detected genes as a function of the total number of reads for each sample type (FF mRNA-seq samples: purple solid line; FFPE mRNA-seq samples: purple dashed line, shown separately for ~4- and ~10-year-old samples; FFPE RiboZero samples: orange dashed line). The horizontal dashed line marks ~11,000 detected genes; the numbers at the bottom (15 M, etc.) give the estimated number of reads each sample type requires to reach that level (see Methods for details). (d) The average coverage along the relative genomic region from the 5′ end (transcription start site) to the 3′ end (transcription end site) for each sample. Left: mRNA-seq protocol (purple; solid line for FF samples (T1-T3), dashed line for FFPE samples (T1-T6)). Right: RiboZero protocol (orange; solid line for FF samples (T1-T3), dashed line for FFPE samples (T1-T3)).
Fig. 2. Comparisons of matched FF and FFPE samples prepared with different RNA-seq protocols. (a) Scatter plots of gene expression levels measured in FFPE sample T1 by the mRNA-seq protocol (purple; left) and the RiboZero protocol (orange; right), compared with the gene expression data obtained from the FF sample of the same tumor. Gene expression data are presented on a log2 scale; the r-square and correlation coefficient are shown for each plot. The total number of reads for each library is indicated in the x- and y-axis labels (M = million). (b) Same as (a) for the T2 samples. (c) Same as (a) for the T3 samples. (d) Pearson correlation coefficient matrix for T1 gene expression data measured by the different protocols (mRNA-seq in purple, RiboZero in orange and NuGEN Ovation in blue) on the FF (black) and FFPE (gray) samples (color bar at the bottom), for the indicated RNA quantities (in ng). The color bar to the right shows correlation coefficient values from 0 to 1. (e) Same as (d) for T2. (f) Same as (d) for T3.
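The correlation statistics reported in these panels can be illustrated with a minimal sketch: expression values are log2-transformed and the Pearson correlation coefficient is computed between a matched FF and FFPE sample. The pseudocount of 1, the function names and the toy counts are assumptions for illustration only; they are not the study's data or its exact normalization.

```python
import math


def log2_expression(counts, pseudocount=1.0):
    """Log2-transform expression values; the pseudocount (an assumption) avoids log2(0)."""
    return [math.log2(c + pseudocount) for c in counts]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy expression values for six genes; not data from the study.
ff_counts = [0, 12, 150, 980, 4300, 75]
ffpe_counts = [1, 8, 110, 1200, 3900, 40]

r = pearson_r(log2_expression(ff_counts), log2_expression(ffpe_counts))
print(f"correlation coefficient = {r:.3f}, r-square = {r * r:.3f}")
```

A full correlation matrix such as the one in panels (d)-(f) would apply the same function to every pair of libraries from one tumor.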
Fig. 3. Comparison of fold changes measured for FFPE samples versus matched FF samples. (a) Scatter plot of the expression fold changes (log2 scale) of ~23,000 genes measured in T1 vs. T2, obtained from FF samples (x-axis) compared with matched FFPE samples (y-axis) by the mRNA-seq protocol (purple). R² and the correlation coefficient are shown on the plot. (b) Same as (a) for the RiboZero protocol (orange). (c) For each comparison in (a)-(b), the percentages of differentiating genes in the FFPE samples (2-fold threshold) whose fold change (FC) agrees or disagrees with the FC of the matched FF samples (see Methods for more information). In addition, the same percentages are presented for the FC comparing T1 vs. T3 and T2 vs. T3 for the mRNA-seq protocol. Purple bars represent the mRNA-seq protocol and the orange bar represents the RiboZero protocol. (d) Same as (c), for fold-change thresholds (imposed on the FFPE samples) varying from 1 to 10 (x-axis). The y-axis shows the percentage of differentiating genes in FFPE that changed in the same direction as in the matched FF samples. Purple: mRNA-seq protocol; orange: RiboZero protocol.
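One way to read the agreement percentages in panels (c) and (d) is sketched below: genes are selected when their FFPE fold change passes a threshold, and the fraction whose FF fold change has the same sign is reported. This definition, the function name and the toy values are assumptions for illustration; the exact procedure is described in the paper's Methods and may differ.

```python
import math


def agreement_percentage(fc_ff, fc_ffpe, threshold=2.0):
    """Percentage of FFPE-differentiating genes that change in the same direction in FF.

    fc_ff, fc_ffpe: log2 fold changes for the same genes (e.g. T1 vs. T2).
    threshold: linear fold-change cutoff imposed on the FFPE values only.
    Returns None when no gene passes the cutoff.
    """
    cutoff = math.log2(threshold)
    selected = [(ff, ffpe) for ff, ffpe in zip(fc_ff, fc_ffpe)
                if ffpe != 0 and abs(ffpe) >= cutoff]
    if not selected:
        return None
    same_direction = sum(1 for ff, ffpe in selected if ff * ffpe > 0)
    return 100.0 * same_direction / len(selected)


# Hypothetical log2 fold changes for six genes; not data from the study.
fc_ff = [1.5, -2.0, 0.3, 3.2, -0.4, -1.1]
fc_ffpe = [1.2, -2.5, -1.3, 2.9, 0.2, -1.6]

for threshold in (1.0, 2.0, 4.0):
    pct = agreement_percentage(fc_ff, fc_ffpe, threshold)
    print(f"threshold {threshold:>4}: {pct:.1f}% agreement")
```

Raising the threshold keeps only genes with larger FFPE fold changes, which is the quantity swept along the x-axis in panel (d).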
Fig. 4. Expression of clinically relevant genes in breast cancer, and comparison to IHC. (a) Scatter plots of the expression levels of a set of 701 genes differentially expressed between normal and tumor samples in the METABRIC dataset, as measured in the T1 FFPE sample by RiboZero (orange; left) or mRNA-seq (purple; middle) vs. their mRNA-seq-derived expression in the T1 FF sample. The right panel compares the expression of these genes in the T1 FFPE sample alone, as measured by mRNA-seq (x-axis) and RiboZero (y-axis). (b) Same as (a) for the PAM50 genes (“intrinsic subtype”). ESR1 (Estrogen Receptor) is indicated in the scatter plots. (c) Comparison of the expression levels of ESR1 as measured by the different RNA-seq protocols on T1-T6 (mRNA-seq in purple, RiboZero in orange; diamonds for FF samples, circles for ~4-year-old FFPE samples, and stars for ~10-year-old FFPE samples), relative to IHC levels. (d) Chromosomal view of ESR1 with the reads mapped to this location (from IGV). Data are shown for the T1 FF sample processed with the mRNA-seq protocol, the T1 FFPE sample processed with the mRNA-seq protocol, and the T1 FFPE sample processed with the RiboZero protocol. Each panel shows a histogram of the reads mapped to this genomic location. The gene model and the chromosomal location are shown at the bottom.