| Literature DB >> 30881377 |
George S Krasnov, Anna V Kudryavtseva, Anastasiya V Snezhkina, Valentina A Lakunina, Artemy D Beniaminov, Nataliya V Melnikova, Alexey A Dmitriev.
Abstract
Quantitative PCR (qPCR) remains the most widely used technique for gene expression evaluation. Obtaining reliable data with this method requires reference genes (RGs) whose mRNA levels are stable under the experimental conditions. This issue is especially crucial in cancer studies because each tumor has a unique molecular portrait. The Cancer Genome Atlas (TCGA) project provides RNA-Seq data for thousands of samples from dozens of cancer types and offers a basis for assessing the suitability of genes as references for qPCR data normalization. Using TCGA RNA-Seq data and the previously developed CrossHub tool, we evaluated the mRNA levels of 32 traditionally used RGs in 12 cancer types, including lung, breast, prostate, kidney, and colon cancers. We developed an 11-component scoring system for assessing gene expression stability. Among the 32 genes, PUM1 was one of the most stably expressed in the majority of the examined cancers, whereas GAPDH, which is widely used as an RG, showed significant mRNA level alterations in more than half of the cases. For each of the 12 cancer types, we suggested a pair of genes that are the most suitable for use as references. These genes are characterized by high expression stability and the absence of correlation between their mRNA levels. Next, the scoring system was expanded with several gene features: mutation rate, number of transcript isoforms and pseudogenes, participation in cancer-related processes according to Gene Ontology, and mentions in PubMed-indexed articles. All genes covered by TCGA RNA-Seq data were analyzed with the expanded scoring system, which allowed us to reveal novel promising RGs for each examined cancer type and to identify several "universal" pan-cancer RG candidates, including SF3A1, CIAO1, and SFRS4. The choice of RGs is the basis for precise gene expression evaluation by qPCR.
Here, we suggested optimal pairs of traditionally used RGs for 12 cancer types and identified novel promising RGs that demonstrate high expression stability and other features of reliable and convenient RGs (high expression level, low mutation rate, non-involvement in cancer-related processes, single transcript isoform, and absence of pseudogenes).
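The list of features of a convenient RG in the abstract can be read as a checklist. The Python sketch below turns it into a hard pass/fail filter. The `GeneFeatures` record, the `is_convenient_rg` function, and both numeric thresholds are hypothetical; the study itself grades these features with its expanded scoring system rather than with fixed cutoffs, and the PubMed-mentions feature is left out here.

```python
from dataclasses import dataclass


@dataclass
class GeneFeatures:
    """Hypothetical per-gene record of the features listed in the abstract."""
    symbol: str
    mean_cpm: float             # average expression level (counts per million)
    mutation_percentile: float  # percentile of mutation rate, 0-100 (lower is better)
    n_isoforms: int             # number of annotated transcript isoforms
    n_pseudogenes: int          # number of known pseudogenes
    cancer_related_go: bool     # True if annotated to cancer-related GO processes


def is_convenient_rg(g: GeneFeatures,
                     min_cpm: float = 10.0,
                     max_mutation_percentile: float = 75.0) -> bool:
    """Hard-filter version of the 'convenient RG' checklist.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    return (
        g.mean_cpm >= min_cpm
        and g.mutation_percentile <= max_mutation_percentile
        and g.n_isoforms == 1
        and g.n_pseudogenes == 0
        and not g.cancer_related_go
    )


candidates = [
    GeneFeatures("GENE_X", 50.0, 40.0, 1, 0, False),  # passes every check
    GeneFeatures("GENE_Y", 50.0, 40.0, 3, 0, False),  # fails: several isoforms
]
print([g.symbol for g in candidates if is_convenient_rg(g)])  # ['GENE_X']
```

A hard filter is easy to read but discards genes that narrowly miss one criterion; a graded score, as used in the study, lets a strong gene compensate for a weak feature.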
Keywords: CrossHub; RNA-Seq; TCGA; cancer; data normalization; gene expression; quantitative PCR; reference genes
Year: 2019 PMID: 30881377 PMCID: PMC6406071 DOI: 10.3389/fgene.2019.00097
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Components of the scoring function.
| Component | Description | Formula | IV | IP | CS | Sq | CA | W | Number of values (samples used) |
|---|---|---|---|---|---|---|---|---|---|
| SDP | T-N expression level difference (pooled samples) | Abs(log2 FC_P)_(10–90) | 0.05 | 0.25 | 2.5 | 1 | 0 | 4 | 1 (all samples) |
| SDL | T-N expression level difference (paired samples) | Abs(Average(log2 FC_L)_(10–90)) | 0.05 | 0.25 | 2.5 | 1 | 0 | 4 | 1 (paired samples) |
| SDoO | T-N expression level difference: outliers, overexpression | Abs(Average(log2 FC_L)_(90–100)) | 0.1 | 0.7 | 2.5 | 1 | 10 | 1 | 1 (paired samples) |
| SDoU | T-N expression level difference: outliers, underexpression | Abs(Average(log2 FC_L)_(0–10)) | 0.1 | 0.7 | 2.5 | 1 | 10 | 1 | 1 (paired samples) |
| SDLc | Cumulative T-N expression difference among paired samples | Average(Abs(log2 FC_L)_(10–90)) | 0.1 | 0.5 | 2.5 | 1 | 5 | 2 | 1 (paired samples) |
| SEStD | Expression level stability: standard deviation | StDev(CPM)_(10–90) / Average(CPM)_(10–90) | 0.1 | 0.3 | 2 | 1 | 5 | 1.5 | 2 (all samples: normal and tumor) |
| SEoH | Expression level stability: outliers (high expression) | log2(Average(CPM)_(90–100) / Average(CPM)_(10–90)) | 0.1 | 0.7 | 2.5 | 1 | 5 | 0.75 | 2 (all samples: normal and tumor) |
| SEoL | Expression level stability: outliers (low expression) | log2(Average(CPM)_(10–90) / Average(CPM)_(0–10)) | 0.1 | 0.7 | 2.5 | 1 | 5 | 0.75 | 2 (all samples: normal and tumor) |
| SEA | Average expression level | 1/log2(CPM)_(10–90) | 0.07 | 0.15 | 3 | 1 | 0 | 6 | 1 (all tumor samples) |
| SCp | Correlations of expression with clinical parameters (…) | -log2(…) | 2 | 4 | 3 | 0.3 | 5 | 0.3 | 18 (3 × 6; 3: CPM_(10–90) in all tumor samples, CPM_(10–90) in all normal samples, (log2 FC_L)_(10–90); 6: pathologic TNM classification, pathologic stage, follow-up: person neoplasm cancer status, follow-up: treatment success) |
| SCr | Correlations of expression with clinical parameters (…) | Abs(…) | 0.1 | 0.25 | 2.5 | 0.3 | 5 | 0.2 | 18 (the same as above) |
| SMut | Percentile of mutation rate | | 75 | 95 | 4 | | | 1 | |
| SIsoforms | Number of transcript isoforms | | 1 | 3 | 2 | | | 0.4 | |
| SPseudogenes | Number of pseudogenes | | 0 | 2 | 2 | | | 0.4 | |
Percentiles, which were taken into calculation, are indicated as a subscript.
IV, ideal value; IP, inflection point; CS, curve slope; Sq, “squeeze”; CA, constant add; W, weight; Abs(…), absolute value; Average(…), mean value; CPM, counts per million (gene expression level); FC, fold change.
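The raw metrics behind the expression-stability components (SEStD, SEoH, SEoL, SEA) in the table can be sketched with NumPy. This is a minimal illustration, not the CrossHub implementation, and it only computes the raw values; it does not apply the IV/IP/CS/Sq/CA/W scoring curve. It assumes that a percentile subscript such as _(10–90) means keeping only the samples whose values lie between the 10th and 90th percentiles (inclusive), that StDev is the sample standard deviation, and that SEA is 1/log2 of the trimmed mean CPM. The helper names `trimmed` and `stability_metrics` are hypothetical.

```python
import numpy as np


def trimmed(values, lo, hi):
    """Keep values between the lo-th and hi-th percentiles (inclusive).

    Assumption: this is how the percentile subscripts in the table are applied.
    """
    v = np.asarray(values, dtype=float)
    p_lo, p_hi = np.percentile(v, [lo, hi])
    return v[(v >= p_lo) & (v <= p_hi)]


def stability_metrics(cpm):
    """Raw expression-stability metrics for one gene across all samples (CPM)."""
    mid = trimmed(cpm, 10, 90)     # central 10-90th percentile samples
    high = trimmed(cpm, 90, 100)   # high-expression tail
    low = trimmed(cpm, 0, 10)      # low-expression tail
    mid_mean = mid.mean()
    return {
        # coefficient of variation of the central samples (sample SD assumed)
        "SEStD": mid.std(ddof=1) / mid_mean,
        # how far the high tail sits above the central mean, in log2 units
        "SEoH": np.log2(high.mean() / mid_mean),
        # how far the central mean sits above the low tail, in log2 units
        "SEoL": np.log2(mid_mean / low.mean()),
        # inverse log2 expression: smaller values mean higher expression
        # (assumes mid_mean > 1 CPM so that the logarithm is positive)
        "SEA": 1.0 / np.log2(mid_mean),
    }


# One strongly overexpressed sample among otherwise identical values
example = stability_metrics([100.0] * 19 + [1000.0])
print({name: round(float(value), 3) for name, value in example.items()})
```

In the usage example the single outlier leaves SEStD at zero (it falls outside the 10–90 window) but pushes SEoH above zero, which shows why the table tracks the central spread and the tails separately.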
Figure 1. Scoring functions used to evaluate gene suitability for qPCR data normalization. Percentiles that were taken into calculation are indicated as subscripts. Abs(…), absolute value; Avg(…), mean value; CPM, counts per million (gene expression level); FC_P, ratio of the average CPM in a pool of tumor samples to the average CPM in a pool of normal samples; FC_L, ratio of CPM values between tumor and matched normal tissue (for each paired sample); StDev(…), standard deviation; r_s, Spearman's correlation coefficient.
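Using the FC_P and FC_L definitions from the Figure 1 legend, the raw tumor-versus-normal metrics (SDP, SDL, SDoO, SDoU, SDLc) can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes that percentile subscripts mean "keep values between those percentiles, inclusive", and that each pool is trimmed to its 10–90th percentile before FC_P is formed. The function names are hypothetical, and the scoring curve that turns these raw values into points is not reproduced.

```python
import numpy as np


def trimmed(values, lo, hi):
    """Keep values between the lo-th and hi-th percentiles (inclusive);
    assumed interpretation of the percentile subscripts."""
    v = np.asarray(values, dtype=float)
    p_lo, p_hi = np.percentile(v, [lo, hi])
    return v[(v >= p_lo) & (v <= p_hi)]


def difference_metrics(tumor_pool, normal_pool, tumor_paired, normal_paired):
    """Raw tumor-vs-normal metrics for one gene.

    tumor_pool / normal_pool: CPM in all tumor / all normal samples.
    tumor_paired / normal_paired: CPM in matched pairs, in the same order.
    """
    # FC_P: ratio of pooled averages (trimming each pool first is an assumption)
    fc_p = trimmed(tumor_pool, 10, 90).mean() / trimmed(normal_pool, 10, 90).mean()
    # FC_L: one tumor/normal ratio per patient, analysed on the log2 scale
    log2_fc_l = np.log2(np.asarray(tumor_paired, dtype=float)
                        / np.asarray(normal_paired, dtype=float))
    central = trimmed(log2_fc_l, 10, 90)
    return {
        "SDP": abs(np.log2(fc_p)),
        "SDL": abs(central.mean()),                        # net shift: ups and downs cancel
        "SDoO": abs(trimmed(log2_fc_l, 90, 100).mean()),   # most up-regulated pairs
        "SDoU": abs(trimmed(log2_fc_l, 0, 10).mean()),     # most down-regulated pairs
        "SDLc": np.abs(central).mean(),                    # cumulative shift: no cancelling
    }


# Half of the pairs are 2-fold up in the tumor, half are 2-fold down
m = difference_metrics([400.0] * 10, [100.0] * 10, [200.0, 50.0] * 5, [100.0] * 10)
print(float(m["SDL"]), float(m["SDLc"]))  # 0.0 1.0
```

The usage example illustrates how SDL and SDLc differ: a gene that goes up in some patients and down in others can look unchanged on average (SDL = 0) while still varying strongly between tumor and matched normal tissue (SDLc = 1).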
Top 5 traditionally used reference genes with the highest expression scores in 12 cancer types.
| Cancer type | Rank 1 | Rank 2 | Rank 3 | Rank 4 | Rank 5 |
|---|---|---|---|---|---|
| BRCA | 82.1 | 75.7 | 71.8 | 69.8 | 66.2 |
| LUAD | 79.8 | 76.4 | 69.6 | 67.9 | 65.5 |
| LUSC | 81.4 | 72.9 | 71.4 | 70.7 | 66.3 |
| KIRC | 82.6 | 73.2 | 69.7 | 68.7 | 64.7 |
| KIRP | 70.3 | 66.0 | 63.2 | 61.7 | 61.1 |
| PRAD | 80.8 | 78.4 | 76.2 | 76.1 | 75.8 |
| COAD | 76.9 | 73.4 | 72.8 | 72.0 | 71.6 |
| HNSC | 73.4 | 72.7 | 68.1 | 64.2 | 63.1 |
| LIHC | 82.3 | 80.9 | 78.4 | 65.7 | 56.4 |
| STAD | 71.7 | 71.0 | 69.7 | 68.7 | 68.1 |
| THCA | 84.4 | 84.3 | 80.0 | 79.2 | 76.0 |
| BLCA | 66.3 | 65.9 | 63.3 | 62.2 | 61.2 |
| Cross-tissue | 70.1 | 60.8 | 59.8 | 54.7 | 54.3 |
Optimal pairs of reference genes for each cancer type are shown in bold.
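According to the abstract, each optimal pair combines high expression stability with an absence of correlation between the two genes' mRNA levels. A minimal sketch of such a selection rule follows. The combined-score heuristic, the |r_s| ≤ 0.3 cutoff, `top_k`, and the gene names are hypothetical choices for illustration, not the procedure that produced the pairs in the table. Spearman's r_s is computed as the Pearson correlation of average ranks so that only NumPy is needed; `scipy.stats.spearmanr` would serve the same purpose.

```python
from itertools import combinations

import numpy as np


def average_ranks(x):
    """1-based ranks; tied values share the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(a, b):
    """Spearman's r_s computed as the Pearson correlation of the ranks."""
    return np.corrcoef(average_ranks(a), average_ranks(b))[0, 1]


def best_pair(scores, expression, max_abs_rho=0.3, top_k=10):
    """Pick the highest-scoring pair of weakly correlated candidate RGs.

    scores:     {gene: stability score}, higher = more stable
    expression: {gene: expression values across the same samples}
    max_abs_rho, top_k: hypothetical tuning parameters.
    Returns (combined score, gene 1, gene 2, r_s), or None if no pair qualifies.
    """
    top = sorted(scores, key=scores.get, reverse=True)[:top_k]
    best = None
    for g1, g2 in combinations(top, 2):
        rho = spearman(expression[g1], expression[g2])
        if abs(rho) > max_abs_rho:
            continue  # correlated genes tend to shift together: skip the pair
        total = scores[g1] + scores[g2]
        if best is None or total > best[0]:
            best = (total, g1, g2, rho)
    return best


scores = {"GENE_A": 90.0, "GENE_B": 85.0, "GENE_C": 80.0}
expression = {
    "GENE_A": [1, 2, 3, 4, 5, 6],
    "GENE_B": [1, 2, 3, 4, 5, 6],  # perfectly correlated with GENE_A
    "GENE_C": [2, 5, 1, 6, 3, 4],
}
print(best_pair(scores, expression)[1:3])  # ('GENE_A', 'GENE_C')
```

The two best-scoring genes are rejected as a pair because their expression moves in lockstep, so the rule falls back to the best weakly correlated combination.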
Figure 2. The pipeline for identification of promising reference genes for qPCR data normalization in cancer studies. SIso, SIsoforms; SPse, SPseudogenes.