Maxim Sorokin, Kirill Ignatev, Elena Poddubskaya, Uliana Vladimirova, Nurshat Gaifullin, Dmitriy Lantsov, Andrew Garazha, Daria Allina, Maria Suntsova, Victoria Barbara, Anton Buzdin.
Abstract
RNA sequencing is considered the gold standard for high-throughput profiling of gene expression at the transcriptional level. Its growing importance in cancer research and molecular diagnostics is reflected in the increasing number of mentions in the scientific literature and in clinical trial reports. However, the use of different reagents and protocols for RNA sequencing often produces incompatible results. We recently published the Oncobox Atlas of RNA sequencing profiles for normal human tissues obtained from healthy donors killed in road accidents. This database of molecular profiles, obtained with a uniform protocol and uniform reagent settings, can be broadly used in biomedicine for data normalization in pathology, including cancer. Here, we publish 39 new breast cancer (BC) and 19 new lung cancer (LC) RNA sequencing profiles obtained from formalin-fixed paraffin-embedded (FFPE) tissue samples, fully compatible with the Oncobox Atlas. We performed the first correlation study of RNA sequencing and immunohistochemistry (IHC)-measured expression profiles for clinically actionable biomarker genes in FFPE cancer tissue samples. We demonstrated high (Spearman's rho 0.65–0.798) and statistically significant (p < 0.00004) correlations between RNA sequencing (Oncobox protocol) and IHC measurements for the HER2/ERBB2, ER/ESR1 and PGR genes in BC, and for the PDL1 gene in LC, with AUCs of 0.963 for HER2, 0.921 for ESR1, 0.912 for PGR, and 0.922 for PDL1. To our knowledge, this is the first validation that total RNA sequencing of archived FFPE material provides a reliable estimate of marker protein levels. These results show that RNA sequencing could, in the future, complement IHC for reliable measurement of expression biomarkers in FFPE cancer samples.
Keywords: NCT03521245; RNA sequencing; bioinformatics; biomarkers detection; breast cancer; clinical oncology; immunohistochemistry; lung cancer; molecular diagnostics; personalized medicine; targeted therapies; transcriptomics; trastuzumab
Year: 2020 PMID: 32397474 PMCID: PMC7277916 DOI: 10.3390/biomedicines8050114
Source DB: PubMed Journal: Biomedicines ISSN: 2227-9059
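The abstract summarizes RNA/IHC agreement with Spearman's rho, a rank correlation that suits ordinal IHC scores such as HER2 0–3. A minimal sketch of how rho can be computed is shown below, assuming tied values receive average ranks (IHC scores tie heavily). The function names and the six-sample toy data are hypothetical, this is not the authors' pipeline, and it does not compute p-values.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (undefined if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: IHC scores (0-3) and RNA-seq expression for six samples.
ihc_scores = [0, 0, 1, 2, 3, 3]
rna_expression = [1.2, 0.8, 2.5, 4.0, 9.1, 7.7]
print(round(spearman_rho(ihc_scores, rna_expression), 3))  # 0.971
```

In practice, `scipy.stats.spearmanr` returns the same coefficient together with a p-value; the hand-written version only makes the ranking step explicit.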
Clinical and molecular annotation of the breast cancer biosamples.
| Sample ID | Primary Tumor or Metastasis | Age | Stage | HER2 Score | ER Score | PR Score | Coverage (mln Mapped Reads) | RIN |
|---|---|---|---|---|---|---|---|---|
| BC-1 | primary | 39 | T2N3aM0, IIIC | 3 | 0 | 0 | 9.42 | 2.1 |
| BC-10 | primary | 48 | T2N0M0, II | 3 | 0 | 0 | 6.70 | 1 |
| BC-12 | primary | 60 | T2N0M0, IIA | 3 | 0 | 0 | 5.12 | 1 |
| BC-13 | primary | 69 | T2N3M0, IIIC | 3 | 8 | 4 | 9.03 | 1 |
| BC-14 | primary | 49 | T2N2M0, IIIA | 3 | 0 | 0 | 6.11 | 2.4 |
| BC-17 | primary | 59 | T4N2M0 | 3 | 7 | 2 | 3.96 | 2.5 |
| BC-18 | lymph node metastasis | 47 | T3N1M0, IIIA | 3 | 0 | 0 | 6.62 | 2.3 |
| BC-19 | primary | 48 | T1N0M0, I | 3 | 5 | 5 | 9.07 | 1.1 |
| BC-20 | lymph node metastasis | 51 | T2N0M0, II | 3 | 0 | 0 | 10.22 | 2.3 |
| BC-21 | primary | 49 | T1N3M0, IIIC | 3 | 0 | 0 | 9.34 | 2.3 |
| BC-22 | primary | 47 | T2N0M0, II | 3 | 6 | 5 | 10.52 | 2 |
| BC-23 | primary | 46 | T2N2M0, IIIA | 3 | 7 | 6 | 8.39 | 2.1 |
| BC-24 | primary | 57 | T2N0M0, IIA | 3 | 6 | 4 | 11.21 | 1 |
| BC-27 | primary | 44 | T2N0M0 | 3 | 0 | 0 | 13.82 | 2.2 |
| BC-28 | ovary metastasis | 53 | T2N0M0, IIA | 0 | 7 | 4 | 4.65 | 3.7 |
| BC-29 | primary | 65 | T4N3M1, IV | 3 | 0 | 0 | 12.56 | 2.2 |
| BC-3 | primary | 55 | T2N1M0, IIIA | 3 | 0 | 0 | 6.84 | 1 |
| BC-4 | primary | 58 | T2N1M0, IIB | 3 | 0 | 0 | 7.17 | 1 |
| BC-46 | liver metastasis | 27 | T2N2M0 | 0 | 8 | 8 | 15.07 | 3.3 |
| BC-48 | relapse in the scar | 36 | T3N1M0 | 1 | 0 | 0 | 20.54 | NA |
| BC-49 | primary | 54 | T1N2M0 | 0 | 2 | 8 | 10.54 | 2 |
| BC-50 | primary | 51 | T2N0M0 | 0 | 0 | 0 | 8.49 | 2.6 |
| BC-51 | primary | 38 | T2N1M0 | 0 | 0 | 0 | 8.68 | 3 |
| BC-52 | primary | 78 | T1N2M0 | 1 | 4 | 8 | 11.92 | 1.7 |
| BC-53 | primary | 50 | T2N0M0 | 1 | 0 | 8 | 8.06 | 1.9 |
| BC-54 | primary | 50 | T2N0M0 | 0 | 0 | 0 | 7.30 | 1.8 |
| BC-55 | primary | 71 | T2N3M0 | 1 | 8 | 8 | 9.32 | 3.3 |
| BC-56 | primary | 60 | T1N1M1 | 0 | 0 | 8 | 12.66 | 2.4 |
| BC-57 | primary | 55 | T3N2M0 | 1 | 0 | 0 | 13.77 | 2.8 |
| BC-58 | lymph node metastasis | 55 | T1N0M0 | 0 | 7 | 7 | 14.24 | 2.1 |
| BC-59 | scar metastasis | 61 | T1N1M0 | 0 | 3 | 1 | 16.88 | 1.2 |
| BC-60 | primary | 33 | T2N1M0 | 2 | 0 | 0 | 10.03 | 1.8 |
| BC-61 | liver metastasis | 38 | T2N2M0 | 0 | 8 | 8 | 5.42 | 3 |
| BC-62 | brain metastasis | 44 | T2N0M0 | 0 | 0 | 0 | 10.99 | 3 |
| BC-63 | primary | 66 | T4N2M0 | 0 | 0 | 0 | 10.11 | 3.7 |
| BC-64 | primary | 60 | T3N3M0 | 1 | 0 | 0 | 12.71 | 3.8 |
| BC-65 | primary | 42 | T2N0M0 | 0 | 0 | 0 | 9.92 | 2.6 |
| BC-66 | primary | 55 | T3N1M0 | 3 | 3 | 3 | 8.96 | 3.1 |
| BC-9 | primary | 57 | T1N1M0, IIB | 3 | 8 | 5 | 6.88 | 1 |
RIN—RNA integrity number, mln—million, NA—not assessed.
Clinical and molecular annotation of the lung cancer biosamples.
| ID | Histology | Age | Stage | Sex | Percent of PDL1 Positive Cells | Coverage (mln Mapped Reads) | RIN |
|---|---|---|---|---|---|---|---|
| LuC_16 | squamous cell carcinoma | 75 | T3N2M1, IV | male | 1%–50% | 11.54 | 2.4 |
| LuC_18 | squamous cell carcinoma | 63 | T2N1M0 | male | 0 | 15.45 | 3 |
| LuC_19 | squamous cell carcinoma | 65 | T2N0M0 | male | >50% | 12.57 | 3 |
| LuC_30 | unidentified | 79 | T2NXM0 | male | >50% | 11.01 | 4.9 |
| LuC_31 | adenocarcinoma | 66 | T3N2M0 | male | 1%–50% | 10.27 | 4.5 |
| LuC_32 | adeno-squamous cell carcinoma | 70 | T2N1M0 | male | >50% | 12.14 | 2.7 |
| LuC_33 | squamous cell carcinoma | 57 | T3N0M0 | male | 0 | 14.12 | 3.8 |
| LuC_42 | adenocarcinoma | 67 | T1N1M0 | male | 1%–50% | 11.9 | 1.4 |
| LuC_23 | adenocarcinoma | 60 | T2N0M0 | male | 0 | 12.06 | 3.2 |
| LuC_24 | adenocarcinoma | 67 | T2N0M0 | male | 0 | 10.77 | 3.8 |
| LuC_26 | small cell carcinoma | 65 | T3N2M0, IIIA | male | 1%–50% | 5.71 | 1.1 |
| LuC_28 | adenocarcinoma | 76 | T2N0M0 | male | 0 | 12.37 | 1.8 |
| LuC_29 | squamous cell carcinoma | 65 | T2N0M0 | male | 0 | 16.58 | 2.4 |
| LuC_34 | adenocarcinoma | 62 | pT1bN0M0 | female | 0 | 11.82 | 2.3 |
| LuC_35 | squamous cell carcinoma | 75 | T3N0M0 | male | >50% | 12.28 | 3.2 |
| LuC_36 | adenocarcinoma | 57 | pT2N0M0 | male | 1%–50% | 11.3 | 2.6 |
| LuC_37 | squamous cell carcinoma | 68 | T3N1M0 | male | 0 | 11.93 | 2.3 |
| LuC_38 | adenocarcinoma | 68 | pT2aN2M0 | male | 1%–50% | 15.38 | 3.5 |
| LuC_39 | adenocarcinoma | 68 | pT2pNXpM1 | female | 0 | 8.58 |  |
RIN—RNA integrity number, mln—million, NA—not assessed.
Figure 1. Effect of the time interval between paraffinization and analysis (in days) on sample quality. (A) RIN vs. time between paraffinization and analysis (days): Spearman's rho = −0.496 (p-value = 0.00012). (B) RIN vs. number of uniquely mapped reads per sample: Spearman's rho = 0.304 (p-value = 0.022). (C) Time between paraffinization and analysis (days) vs. number of uniquely mapped reads per sample: Spearman's rho = −0.496 (p-value = 0.0001). The grey zone indicates the 95% confidence interval for the trendlines. RIN: RNA integrity number; mln: million; cor: correlation coefficient.
Figure 2. Hierarchical clustering dendrogram of the experimental RNA sequencing profiles of breast and lung cancer and of the corresponding normal tissues from the ANTE database. Gene expression data were used to calculate Euclidean distances between samples. Color markers indicate tissue types. The lower scale indicates the number of uniquely mapped reads; 'CT' denotes the coverage threshold of 2.5 million uniquely mapped reads.
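The caption states that Euclidean distances between expression profiles were used, but not which linkage rule built the dendrogram. The sketch below assumes complete linkage and treats 'CT' as a filter that drops samples below 2.5 million uniquely mapped reads. Both choices, the function name `complete_linkage`, and the sample data are hypothetical illustrations, not the authors' procedure.

```python
from itertools import combinations
from math import dist  # Python 3.8+

# 'CT' in the caption: 2.5 million uniquely mapped reads.
COVERAGE_THRESHOLD = 2_500_000


def complete_linkage(profiles):
    """Agglomerative clustering of named expression vectors.

    profiles: dict mapping sample name -> list of expression values
              (same gene order for every sample).
    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a frozenset of sample names.
    """
    clusters = [frozenset([name]) for name in profiles]
    # Euclidean distance between every pair of samples, computed once.
    pair_dist = {
        frozenset((a, b)): dist(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }

    def linkage(c1, c2):
        # Complete linkage: the largest distance between members of c1 and c2.
        return max(pair_dist[frozenset((a, b))] for a in c1 for b in c2)

    merges = []
    while len(clusters) > 1:
        # Merge the two closest clusters under complete linkage.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
        merges.append((c1, c2, linkage(c1, c2)))
    return merges


# Hypothetical samples: (mapped reads, expression of three genes).
samples = {
    "BC-a": (9_000_000, [5.0, 1.0, 0.5]),
    "BC-b": (6_000_000, [5.2, 1.1, 0.4]),
    "LC-a": (11_000_000, [1.0, 4.0, 3.0]),
    "LC-b": (12_000_000, [1.1, 4.2, 2.9]),
    "LC-low": (1_200_000, [3.0, 3.0, 3.0]),  # below CT, excluded
}
kept = {name: expr for name, (reads, expr) in samples.items() if reads >= COVERAGE_THRESHOLD}
for c1, c2, d in complete_linkage(kept):
    print(sorted(c1), "+", sorted(c2), f"at {d:.3f}")
```

For real profiles spanning thousands of genes, `scipy.cluster.hierarchy.linkage(..., method="complete", metric="euclidean")` implements the same rule far more efficiently; the naive loop here recomputes every linkage at each step and only suits a handful of samples.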
Figure 3. Correlation plot for four technical replicates (different slices from the same FFPE block) of a lung cancer tissue specimen. The samples were sequenced and processed separately. Panels above the diagonal show correlation coefficients (Spearman's rho); panels below the diagonal show pairwise plots of gene expression values.
Figure 4. IHC results vs. mRNA levels measured by NGS RNA sequencing: (A) HER2: Spearman's rho = 0.798 (p-value = 6.9 × 10⁻¹⁰); (B) ESR1: Spearman's rho = 0.777 (p-value = 3.8 × 10⁻⁹); (C) PGR: Spearman's rho = 0.653 (p-value = 4.9 × 10⁻⁶); (D) PD-L1: Spearman's rho = 0.797 (p-value = 4.4 × 10⁻⁵). The grey zone indicates the 95% confidence interval for the trendlines. (E) PD-L1 IHC staining examples; (F) HER2 IHC staining examples; (G) ER (ESR1) IHC staining examples; (H) PR (PGR) IHC staining examples. Cor: correlation coefficient.
Figure 5. Computational simulation of gene-mapped read coverage using random read permutations. Left panels: p-value for Spearman's rho vs. coverage. Right panels: Spearman's rho vs. coverage. "CT" indicates the coverage threshold of 2,500,000 reads.
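One plausible reading of "random read permutations" is repeated subsampling of mapped reads to a lower depth, after which rho and its p-value are recomputed against IHC at each depth. The helper below sketches only the subsampling step under that assumption: it draws reads without replacement from gene-level counts. The function name `downsample_counts`, the seed, and the counts are hypothetical.

```python
import random
from collections import Counter


def downsample_counts(counts, target_reads, rng):
    """Draw target_reads reads without replacement from gene-level counts.

    counts: dict gene -> number of mapped reads (non-negative ints).
    Returns a dict with the same genes and their subsampled read counts.
    """
    total = sum(counts.values())
    if not 0 <= target_reads <= total:
        raise ValueError("target_reads must be between 0 and the total read count")
    genes = list(counts)
    # Sample read indices 0..total-1; the reads of each gene occupy one contiguous block.
    chosen = sorted(rng.sample(range(total), target_reads))
    picked = Counter()
    gene_idx, block_end = 0, (counts[genes[0]] if genes else 0)
    for read in chosen:
        # Advance to the gene whose block contains this read index.
        while read >= block_end:
            gene_idx += 1
            block_end += counts[genes[gene_idx]]
        picked[genes[gene_idx]] += 1
    return {gene: picked[gene] for gene in genes}


# Hypothetical gene counts for one sample; simulate several sequencing depths.
sample = {"ERBB2": 5_000, "ESR1": 3_000, "PGR": 0, "GAPDH": 2_000}
rng = random.Random(42)  # fixed seed for reproducibility
for depth in (100, 1_000, 5_000):
    print(depth, downsample_counts(sample, depth, rng))
```

Repeating the draw many times per depth across a cohort, and correlating the subsampled marker counts with IHC scores, would produce curves of the kind shown in Figure 5.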
Figure 6. Percentile of gene counts for marker genes versus IHC (immunohistochemistry) score in breast and lung cancer samples.
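Figure 6 does not define the percentile. Assuming it is the percentile rank of a marker gene's read count among all genes of the same sample, it can be sketched as follows. This uses the "weak" convention (share of genes with a count less than or equal to the marker's), which matches `scipy.stats.percentileofscore(..., kind="weak")`. The helper name, the non-marker gene names, and all counts are hypothetical.

```python
from bisect import bisect_right


def marker_percentiles(counts, markers):
    """Percentile of each marker gene's count within one sample's gene counts.

    A value of 90 means 90% of genes in the sample have a count <= the marker's.
    """
    values = sorted(counts.values())  # sort once, reuse for every marker
    n = len(values)
    return {gene: 100.0 * bisect_right(values, counts[gene]) / n for gene in markers}


# Hypothetical counts for one breast cancer sample.
counts = {"ACTB": 900, "ERBB2": 650, "ESR1": 40, "PGR": 5, "GENE_X": 0}
print(marker_percentiles(counts, ["ERBB2", "ESR1", "PGR"]))
# {'ERBB2': 80.0, 'ESR1': 60.0, 'PGR': 40.0}
```

Percentiles are insensitive to total library size, which is one reason such a scale may be convenient when samples differ in coverage.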
Figure 7. IHC results vs. mRNA levels measured by NGS RNA sequencing in The Cancer Genome Atlas (TCGA) data: (A) HER2: area under the receiver operating characteristic curve (AUC) = 0.82; (B) ESR1: AUC = 0.96; (C) PGR: AUC = 0.92. Asterisk (*): p-value < 2.2 × 10⁻¹⁶ (Wilcoxon rank-sum test).
Area under the receiver operating characteristic curve (AUC) for predicting IHC status from RNA sequencing data.
| Protein | Experimental Dataset | The Cancer Genome Atlas |
|---|---|---|
| HER2 | 0.963 | 0.818 |
| ESR1 | 0.921 | 0.959 |
| PGR | 0.912 | 0.923 |
| PDL1 | 0.922 | Not available |
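The AUC values in the table measure how well RNA sequencing expression separates IHC-positive from IHC-negative samples. The sketch below computes AUC through its Mann–Whitney interpretation: the fraction of positive/negative pairs in which the positive sample has higher expression, with ties counted as one half. The IHC cut-offs used to define the positive and negative groups are not given here, so the split in the example, like the function name, is hypothetical.

```python
def auc_mann_whitney(positive_scores, negative_scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative sample")
    u = 0.0  # Mann-Whitney U statistic for the positive group
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                u += 1.0
            elif p == n:
                u += 0.5
    return u / (len(positive_scores) * len(negative_scores))


# Hypothetical RNA-seq expression values split by a binary IHC call.
ihc_positive = [8.1, 6.5, 9.0]
ihc_negative = [2.0, 6.5, 1.1, 3.3]
print(round(auc_mann_whitney(ihc_positive, ihc_negative), 3))  # 0.958
```

The pair count `u` is the Mann–Whitney U statistic, the quantity underlying the Wilcoxon rank-sum test cited in Figure 7; `sklearn.metrics.roc_auc_score` returns the same AUC for binary labels.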
Figure 8. Proteomic results vs. mRNA levels measured by NGS RNA sequencing in CPTAC data: (A) HER2: Spearman's rho = 0.62 (p-value < 2.2 × 10⁻¹⁶); (B) ESR1: Spearman's rho = 0.81 (p-value < 2.2 × 10⁻¹⁶); (C) PGR: Spearman's rho = 0.74 (p-value < 2.2 × 10⁻¹⁶). The grey zone indicates the 95% confidence interval for the trendlines. Cor: correlation coefficient.