| Literature DB >> 35228606 |
Dorota Hoja-Łukowicz1, Dawid Maciążek2, Piotr Kościelniak3, Marcelina E Janik4.
Abstract
The algorithms commonly used to select the best stable reference gene in RT-qPCR data analysis have their limitations. We showed that simple selection of the reference gene or pair of genes with the lowest stability value from the pool of potential reference genes-a commonly used approach-is not sufficient to accurately and reliably normalize the target gene transcript and can lead to biologically incorrect conclusions. For reliable assessment of changes in a target gene expression level, we propose our innovative GenExpA software, which works in a manner independent of the experimental model and the normalizer used. GenExpA software selects the best reference by combining the NormFinder algorithm with progressive removal of the least stable gene from the candidate genes in a given experimental model and in the set of daughter models assigned to it. The reliability of references is validated based on the consistency of the statistical analyses of normalized target gene expression levels through all models, described by the coherence score (CS). The use of the CS value imparts a new quality to qPCR analysis because it clarifies how low the stability value of reference must be in order for biologically correct conclusions to be drawn. We tested our method on qPCR data for the B4GALT genes family in melanoma, which is characterized by a high mutation rate, and in melanocytes. GenExpA is available at https://github.com/DorotaHojaLukowicz/GenExpA or https://www.sciencemarket.pl/baza-programow-open-source#oferty .Entities:
Mesh:
Year: 2022 PMID: 35228606 PMCID: PMC8885735 DOI: 10.1038/s41598-022-07257-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1Overview of the GenExpA tool workflow. (a) (A) The user uploads a table with Ct values for ‘n’ candidate reference genes and ‘y’ target genes measured in ‘x’ samples. The user uploads a table with calculated quantified values for ‘n’ candidate reference genes. (B) In the setting mode (not shown in this paper; see the GenExpA manual), the user separates ‘n’ potential reference genes from a list of all genes. After clicking on ‘Generate combinations’, the GenExpA tool automatically creates a set of possible ‘z’ models, i.e., ‘z’ models composed of samples of interest, which are combinations without repetitions (order does not matter) of two or more samples from the pool of ‘x’ samples. The ‘z’ models consist of the experimental model and its ‘z − 1’ daughter models (auxiliary models); for example, if the experimental model consists of 5 samples, the number of daughter models is 25 (10 models of 2 samples, 10 models of 3 samples, and 5 models of 4 samples). The user unchecks the statistical model, wherein the pairwise t-test with Holm adjustment is recommended as a statistical model for multiple pairs of observations in the case of a normal distribution (to determine the significance of a difference in gene expression between two or more samples). Alternatively, the non-parametric Mann–Whitney test or the ANOVA Kruskal–Wallis test with subsequent post hoc Dunn’s test are recommended for models composed of two unmatched samples or three or more unmatched samples, respectively, in the case of a non-normal distribution[23]. In the setting mode, the user determines the P value indicating the level of statistical significance (default P < 0.05; not shown in this paper; see the GenExpA manual). The user starts the automated data analysis by clicking on ‘Run calculations’; it causes the implemented NormFinder algorithm to select, in each of the analyzed models, the best references between ‘n’ potential reference genes (purple line) according to the stability values of their expression in the experimental model and its auxiliary models. The best stability value is referred to the minimal combined inter- and intrasample expression variation of the gene[9]. NormFinder works on Ct values converted to quantified values, which were calculated based on the calibration curve previously prepared for all candidate reference genes. Then the GenExpA tool normalizes the expression of target genes (RQ value) in each model in relation to selected references, conducts a statistical analysis and calculates the coherence score (CS) separately for each target gene [details of how GenExpA determines CS are shown in (b)]. If the CS value is equal to 1, the target gene expression analysis is completed. In the case of inconsistent results (CS < 1), the user can change the setting parameters by choosing 1 in the ‘Remove repetitions’ box and by marking the option ‘Select best remove for models’ (not shown in this paper; see the GenExpA manual). These new settings cause the least stable HKG in each model to be removed from the pool of ‘n’ candidate reference genes. The user starts the improved analysis by clicking on ‘Run calculations’. The GenExpA tool chooses new references based on a reduced pool of HKGs to ‘n − 1’ in each model and conducts a reanalysis (statistic and CS calculation) of the normalized target gene(s) (red line). This approach can be repeated serially (number in ‘Remove repetitions’ box should be increased by 1 each time) when the CS value is below 1 and the pool of reference genes is not less than three (from the green line to the navy blue line). It is important to note that after marking the option ‛Select best remove for models’, GenExpA selects the reference for a given model from the level at which the stability value was the lowest. (C) If the gene removal process does not yield consistency of analysis, the user can upload a new input extended with data for an additional HKG or HKGs and perform an improved analysis based on the strategy of removing the least stable HKG until consistency of results is reached. (D) The user can export the results and graphs of all analyses in publication-ready form. The results show coherence scores, the best reference gene or gene pair results, RQ data and statistics. (b) Flow chart of determination of the coherence score (CS) for a given normalized target gene. First, the GenExpA tool calculates the comparison values for a given target gene in each pair of samples in each model built from these samples. To do this, the software downloads the p-value attributed to a given two samples, and if P ≥ 0.05 it assigns the value 0 to the comparison. If P < 0.05 and the median of RQ for sample 1 is greater or lower than the median of RQ for sample 2, then comparison = 1 or − 1, respectively. The order of samples is fixed. In the aforementioned experimental model, consisting of five samples, where the number of all generated models is 26 (1 experimental model and 25 daughter models), a given pair of samples is present in 8 out of 26 models, so eight comparison values are generated. Next, the software analyzes the obtained comparison values to determine the partial CS value for a given target gene in a given two samples. If the eight comparison values are composed of 1 and − 1, then the partial CS = 0; in other cases (1 and 1; − 1 and − 1; 0 and 0; 1 and 0; − 1 and 0) the partial CS = 1. Finally, the software calculates the CS value for a given target gene as the arithmetic mean of the partial CSs across all pairs of samples (in the aforementioned 26 models there are 10 different pairs of two samples). The resulting CS takes a value between 0 to 1.
Reference genes and their stability values in a panel of the experimental model and daughter models selected by NormFinder software on the base of 4 (part A) and 5 (part B) potential reference genes before (0) and after removing one (1) or two (2) the least stable genes from a given set of 4 and 5 reference genes.
| No | Model | 4 potential reference genes—part A | 5 potential reference genes—part B | |||
|---|---|---|---|---|---|---|
| Remove repetition level | Remove repetition level | |||||
| 0 | 1 | 0 | 1 | 2 | ||
| 1 | HEMa-LP Mel202 | 0.212 | 0.074 | 0.076 | ||
| 2 | HEMa-LP WM35 | 0.248 | 0.208 | 0.225 | ||
| 3 | HEMa-LP WM793 | 0.310 | 0.093 | 0.101 | ||
| 4 | HEMa-LP WM266-4 | 0.374 | 0.145 | 0.177 | ||
| 5 | Mel202 WM35 | 0.121 | 0.105 | 0.121 | ||
| 6 | Mel202 WM793 | 0.092 | 0.077 | 0.056 | ||
| 7 | Mel202 WM266-4 | 0.171 | 0.091 | 0.171 | ||
| 8 | WM35 WM793 | 0.192 | 0.192 | 0.099 | ||
| 9 | WM35 WM266-4 | 0.112 | 0.112 | 0.078 | ||
| 10 | WM793 WM266-4 | 0.145 | 0.159 | 0.088 | ||
| 11 | HEMa-LP Mel202 WM35 | 0.251 | 0.272 | 0.333 | ||
| 12 | HEMa-LP Mel202 WM793 | 0.341 | 0.257 | 0.084 | ||
| 13 | HEMa-LP Mel202 WM266-4 | 0.356 | 0.203 | 0.172 | ||
| 14 | HEMa-LP WM35 WM793 | 0.402 | 0.264 | 0.235 | ||
| 15 | HEMa-LP WM35 WM266-4 | 0.376 | 0.241 | 0.256 | ||
| 16 | HEMa-LP WM793 WM266-4 | 0.504 | 0.331 | 0.266 | ||
| 17 | Mel202 WM35 WM793 | 0.181 | 0.181 | 0.106 | ||
| 18 | Mel202 WM35 WM266-4 | 0.185 | 0.207 | 0.185 | ||
| 19 | Mel202 WM793 WM266-4 | 0.222 | 0.200 | 0.249 | ||
| 20 | WM35 WM793 WM266-4 | 0.187 | 0.219 | 0.272 | ||
| 21 | HEMa-LP Mel202 WM35 WM793 | 0.370 | 0.236 | 0.229 | ||
| 22 | HEMa-LP Mel202 WM35 WM266-4 | HPRT1 SNRPA 0.316 | 0.264 | 0.323 | ||
| 23 | HEMa-LP Mel202 WM793 WM266-4 | 0.403 | 0.362 | 0.284 | ||
| 24 | HEMa-LP WM35 WM793 WM266-4 | 0.436 | 0.394 | 0.299 | ||
| 25 | Mel202 WM35 WM793 WM266-4 | 0.219 | 0.233 | 0.219 | ||
| 26 | HEMa-LP Mel202 WM35 WM793 WM266-4 | 0.364 | 0.330 | 0.267 | ||
The differences in the choice of the normalizer or obtained stability values are written in bold.
Figure 2Medians of relative quantity (RQ) for target genes B4GALT1–B4GALT7 in the experimental model normalized to a reference gene or pair of reference genes with the best stability values resulting from GenExpA analysis based on a set of five candidate reference genes and remove repetition level 2 (a) or remove repetition level 1 (b) or based on a set 3 (c) (see Table 1, part B). The statistical analyses used the pairwise t-test with Holm adjustment. Red line represents statistical significance, P < 0.05.
Summary of candidate reference and target genes.
| Symbol | Gene name (assay ID, TaqMan probes; amplicon size bp) | Location (GeneCards) | Description |
|---|---|---|---|
| Glucuronidase Beta (Hs00939627_m1; 96 bp) | 7q11.21 | Glycosaminoglycan metabolism | |
| Hypoxanthine Phosphoribosyltransferase 1 (Hs99999909_m1; 100 bp) | Xq26.2-q26.3 | Generation of purine nucleotides | |
| Phosphoglycerate Kinase 1 (Hs00943178_g1; 73 bp) | Xq21.1 | Metabolic pathway of glycolysis | |
| Ribosomal Protein S23 (Hs01922548_s1; 90 bp) | 5q14.2 | poly(A) RNA binding and structural constituent of ribosome | |
| Small Nuclear Ribonucleoprotein Polypeptide A (Hs00190231_m1; 123 bp) | 19q13.2 | Splicing of cellular pre-mRNAs | |
| Beta-1,4-Galactosyltransferase 1 (Hs00155245_m1; 70 bp) | 9p21.1 | Biosynthesis of different glycoconjugates and saccharide structures | |
| Beta-1,4-Galactosyltransferase 2 (Hs00243566_m1; 57 bp) | 1p34.1 | ||
| Beta-1,4-Galactosyltransferase 3 (Hs00937515_g1; 68 bp) | 1q23.3 | ||
Beta-1,4-Galactosyltransferase 4 (Hs00186850_m1; 70 bp) | 3q13.32 | ||
| Beta-1,4-Galactosyltransferase 5 (Hs00941041_m1; 66 bp) | 20q13.13 | ||
| Beta-1,4-Galactosyltransferase 6 (Hs00999574_m1; 69 bp) | 18q11.1 | ||
| Beta-1,4-Galactosyltransferase 7 (Hs01011260_g1; 74 bp) | 5q35.3 | ||