Marina Wright Muelas, Farah Mughal, Steve O'Hagan, Philip J Day, Douglas B Kell.
Abstract
We recently introduced the Gini coefficient (GC) for assessing the expression variation of a particular gene in a dataset, as a means of selecting improved reference genes over the cohort ('housekeeping genes') typically used for normalisation in expression profiling studies. Those genes (transcripts) that we determined to be usable as reference genes differed greatly from previous suggestions based on hypothesis-driven approaches. A limitation of this initial study is that a single (albeit large) dataset was employed for both tissues and cell lines. We here extend this analysis to encompass seven other large datasets. Although their absolute values differ a little, the Gini values and median expression levels of the various genes are well correlated with each other between the various cell line datasets, implying that our original choice of the more ubiquitously expressed low-Gini-coefficient genes was indeed sound. In tissues, the Gini values and median expression levels of genes showed greater variation, with the GC of genes changing with the number and types of tissues in the data sets. In all data sets, regardless of whether they were derived from tissues or cell lines, we also show that the GC is a robust measure of gene expression stability. Using the GC as a measure of expression stability, we illustrate its utility in finding tissue- and cell line-optimised housekeeping genes without any prior bias; these again include only a small number of previously reported housekeeping genes. We also independently confirmed this experimentally using RT-qPCR with 40 candidate GC genes in a panel of 10 cell lines. These were termed the Gini Genes. In many cases, the variation in the expression levels of classical reference genes is very large (e.g. 44-fold for GAPDH in one data set), suggesting that the cure (of using them as normalising genes) may in some cases be worse than the disease (of not doing so).
We recommend the present data-driven approach for the selection of reference genes by using the easy-to-calculate and robust GC.
Year: 2019 PMID: 31784565 PMCID: PMC6884504 DOI: 10.1038/s41598-019-54288-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Graphical indication of the means by which we calculate the Gini coefficient.
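As a minimal sketch of the calculation that Figure 1 depicts, the GC of one gene can be computed from its expression values across samples using the standard discrete form of the Lorenz-curve definition, $G = \sum_i (2i - n - 1)\,x_{(i)} / (n \sum_i x_i)$, where $x_{(i)}$ are the values sorted in ascending order. The function name `gini` and the example values are illustrative assumptions rather than the authors' implementation, and no small-sample correction is applied.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly even)."""
    xs = sorted(values)              # ascending order, as for a Lorenz curve
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0                   # convention chosen for this sketch
    # Rank-weighted sum: low ranks get negative weights, high ranks positive.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Hypothetical expression values for one gene across four samples.
print(gini([5, 5, 5, 5]))    # evenly expressed
print(gini([0, 0, 0, 10]))   # expressed in a single sample
```

Under this form, a gene expressed at the same level in every sample scores 0, and a gene expressed in only one of n samples scores (n - 1)/n.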
Studies used for assessing proposed stable reference genes.
| Study short name | Comments | Reference |
|---|---|---|
| GiniGene | Study presenting novel potential housekeeping genes in cells and tissues from the HPA project cell and tissue RNA-seq data. | [ |
| geNorm or Vandesompele | Classic set of reference genes in tissues and a means of analysing them | [ |
| Eisenberg | Very detailed analysis of housekeeping/reference genes in tissues using the Illumina Body Map study of RNA-seq of 16 Human Tissues. E-MTAB-513. | [ |
| Lee | Two novel reference genes from a detailed analysis of 281 normal tissue samples from 17 different organs, then compared between disease states and cell lines. | [ |
| Caracausi | 646 expression profile data sets from 54 different human tissues. | [ |
Studies used for expression profiling data.
| Dataset short name | Comments | Reference |
|---|---|---|
| HPA | RNA-seq-based dataset from the Human Protein Atlas group. Two data sets available: one of 19,628 protein coding genes in 56 cell lines (HPA_C) and another of 19,613 protein coding genes in 59 tissues (HPA_T). | [ |
| CCLE | RNA-seq-based dataset (Cancer Cell Line Encyclopedia) of 58,035 genes in 934 human cancer cell lines (downloaded from EBI Expression Atlas E-MTAB-2770). | [ |
| Klijn / Genentech | RNA-seq-based analysis of 57,711 genes in 622 human cancer cell lines (downloaded from EBI Expression Atlas E-MTAB-2706). | [ |
| GTEx | RNA-Seq data of 46,711 genes in 53 human tissue samples from the Genotype-Tissue Expression (GTEx) project (downloaded from EBI Expression Atlas E-MTAB-5214). | [ |
| PCAWG | RNA-Seq of 46,816 genes in 76 tissues, cancer and normal, from The International Cancer Genome Project: Pan Cancer Analysis of Whole Genomes (downloaded from EBI Expression Atlas E-MTAB-5200). | |
| HBM | Illumina Body Map: RNA-seq of 16 human tissues (downloaded from EBI Expression Atlas E-MTAB-513). Used by Eisenberg and colleagues in their analysis of housekeeping/reference genes in tissues. | [ |
Figure 2. Gini coefficient and median expression levels of proposed reference genes in the HPA cell-line dataset. (A) GC versus median expression level in the HPA dataset. (B) Median expression levels in the CCLE vs HPA datasets. The line of best linear fit (in log space) shown is $y = 0.991 + 0.827x$ ($r^2 = 0.606$). (C) Median expression levels in the CCLE vs Klijn datasets. The line of best linear fit (in log space) shown is $y = 0.998 + 0.804x$ ($r^2 = 0.593$). Colour coding: red, GeneGini reference genes; blue, Eisenberg & Levanon; yellow, Vandesompele; green, Lee; lilac, both GeneGini and Eisenberg & Levanon.
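The fits quoted in these captions are straight lines in log space. A minimal sketch of such a fit is given here, assuming base-10 logarithms and ordinary least squares (the caption states neither); `loglog_fit` is a hypothetical helper name and the example data are synthetic.

```python
import math


def loglog_fit(x, y):
    """Least-squares fit of log10(y) = a + b * log10(x).

    Returns (a, b, r2). All inputs are assumed to be strictly positive.
    """
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    syy = sum((v - my) ** 2 for v in ly)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                      # slope in log space
    a = my - b * mx                    # intercept in log space
    r2 = (sxy * sxy) / (sxx * syy)     # squared Pearson correlation
    return a, b, r2


# Synthetic example: y = 2 * sqrt(x), so the slope is 0.5 in log space.
xs = [1.0, 10.0, 100.0, 1000.0]
ys = [2.0 * v ** 0.5 for v in xs]
print(loglog_fit(xs, ys))
```

Genes with zero expression in either data set would have to be excluded or offset before taking logarithms; how the authors handled this is not stated here.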
Figure 3. Gini coefficient of candidate reference genes in the CCLE and Klijn/Genentech cell-line datasets. The left panel shows all proposed housekeeping genes considered in this study; the right panel shows labels of those genes with a GC < 0.25. The line of best fit is $y = -0.171 + 0.829x$ ($r^2 = 0.909$). Colour code as in Fig. 2.
Figure 4. Robustness of the Gini coefficient. (A) IQR of different genes in the Klijn/Genentech vs HPA cell-line datasets. The left panel shows all genes considered in this study; the right panel shows genes with IQR < 2 in both datasets. The line of best linear fit (in log space) shown is $y = 0.01 + 1.11x$ ($r^2 = 0.937$). (B) IQR of different genes in the CCLE vs HPA cell-line datasets. The left panel shows all genes considered in this study; the right panel shows genes with IQR < 2 in both datasets. The line of best linear fit (in log space) shown is $y = 0.04 + 0.99x$ ($r^2 = 0.930$). (C) Min vs Max: Median expression levels in HPA data set. Colour code as in Fig. 2.
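The GC is compared in this section with other measures of spread: the interquartile range (IQR) in the figures and the percentage relative standard deviation (% RSD) in the tables. A minimal sketch of both for a single gene follows, using Python's standard library. The helper name `spread_stats`, the choice of the inclusive quantile method and the use of the sample standard deviation are assumptions; the source does not specify them.

```python
import statistics


def spread_stats(values):
    """Return the IQR and % RSD of one gene's expression values."""
    # Quartiles by linear interpolation over the observed range (assumed method).
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mean = statistics.mean(values)
    return {
        "iqr": q3 - q1,
        # Sample standard deviation relative to the mean, as a percentage.
        "rsd_percent": 100.0 * statistics.stdev(values) / mean,
    }


# Hypothetical expression values for one gene across five samples.
print(spread_stats([1, 2, 3, 4, 5]))
```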
Figure 5. Shared and unique genes in HPA, CCLE and Klijn/Genentech cell-line data sets. (A) Genes with a GC < 0.2. (B) Housekeeping genes in Table 2 with GC < 0.2.
Figure 6. GC vs median for 115 genes in the (A) HPA, (B) CCLE and (C) Klijn/Genentech cell-line data sets. Colour coding: Blue, Caracausi; Green, GeneGini reference genes; Grey, neither. Shape coding: Circle, other; Triangle, SLC coding gene.
Descriptive statistics of 13 genes common across cell-line data sets with GC < 0.2.
| Gene | Gini (HPA Cells) | Gini (CCLE) | Gini (Klijn) | Median (HPA Cells) | Median (CCLE) | Median (Klijn) | RSD (HPA Cells) | RSD (CCLE) | RSD (Klijn) | GeneGini (GG)/GeNorm (V), Eisenberg (EL), Lee (L), Caracausi (C)/N (Cell Line Data sets) | Reference | S/A/O | Protein name | UniProt ID | Role |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ARF1 | 0.18 | 0.19 | 0.18 | 316.70 | 423.00 | 517.00 | 32.54 | 35.87 | 35.17 | N | N | O | ADP-ribosylation factor 1 | P84077 | Essential and ubiquitous GTP-binding protein regulators of vesicular trafficking and actin remodeling. |
| CNBP | 0.15 | 0.20 | 0.16 | 324.24 | 602.00 | 637.50 | 28.47 | 37.37 | 29.49 | N | N | O | Cellular nucleic acid-binding protein | P62633 | Zinc finger protein, function unclear (Pellizzoni et al. 1997), regulates protein translation and transcription (Wei 2018) |
| DYNLL1 | 0.17 | 0.19 | 0.16 | 485.97 | 215.50 | 224.00 | 30.73 | 34.50 | 28.50 | N | N | O | Dynein light chain 1, cytoplasmic | P63167 | Component of dynein involved in intracellular transport and motility |
| EDF1 | 0.16 | 0.19 | 0.18 | 449.42 | 379.00 | 502.50 | 29.69 | 33.83 | 34.30 | N | N | O | Endothelial differentiation-related factor 1 | O60869 | Modulates transcription of genes involved in endothelial differentiation, also acts as a transcriptional coactivator (Cazzaniga 2018) |
| EIF4H | 0.15 | 0.18 | 0.17 | 294.21 | 553.50 | 673.00 | 27.91 | 33.27 | 30.64 | N | N | O | Eukaryotic translation initiation factor 4H | Q15056 | Translation initiation factor |
| HNRNPC | 0.18 | 0.19 | 0.17 | 800.62 | 314.50 | 409.50 | 32.96 | 34.41 | 29.97 | N | N | O | Heterogeneous nuclear ribonucleoproteins C1/C2 | P07910 | RNA binding protein involved in regulation of RNA splicing, export, expression, stability, and translation. |
| HNRNPK | 0.14 | 0.17 | 0.12 | 603.32 | 548.00 | 625.50 | 25.19 | 29.60 | 21.35 | GG | | O | Heterogeneous nuclear ribonucleoprotein K | P61978 | Regulation of RNA transcription and translation, splicing, nuclear export, and decay |
| PCBP1 | 0.14 | 0.20 | 0.16 | 291.40 | 336.00 | 452.00 | 24.52 | 36.23 | 29.01 | GG | | O | Poly(rC)-binding protein 1 | Q15365 | Regulation of mRNA transcription, translation and stability |
| PFDN5 | 0.17 | 0.20 | 0.19 | 451.20 | 158.00 | 152.50 | 31.60 | 41.30 | 35.69 | N | N | O | Prefoldin subunit 5 | Q99471 | Molecular protein folding cytosolic chaperone. Prevents misfolding of newly synthesised nascent polypeptides |
| SF3B1 | 0.16 | 0.19 | 0.15 | 179.26 | 143.00 | 164.00 | 29.26 | 33.89 | 27.01 | N | N | O | Splicing factor 3B subunit 1 | O75533 | Essential RNA-protein complex involved in pre-mRNA splicing |
| SLC25A3 | 0.17 | 0.18 | 0.16 | 471.21 | 154.00 | 193.00 | 30.19 | 32.86 | 28.39 | C | | S | Phosphate carrier protein, mitochondrial | Q00325 | Phosphate transport from cytoplasm to mitochondria, with protons. |
| SRP14 | 0.17 | 0.19 | 0.17 | 224.37 | 296.00 | 347.50 | 30.36 | 34.77 | 30.62 | N | N | O | Signal recognition particle 14 kDa protein | P37108 | Signal-recognition-particle assembly has a crucial role in targeting secretory proteins to the rough endoplasmic reticulum membrane. Required for elongation arrest by binding with SRP9 to the Alu domain. |
| SRSF3 | 0.19 | 0.19 | 0.15 | 260.33 | 164.00 | 207.00 | 35.17 | 33.97 | 28.60 | N | N | O | Serine/arginine-rich splicing factor 3 | P84103 | Splicing factor that promotes exon inclusion during alternative splicing. Regulatory roles in RNA metabolism and functions such as mRNA splicing and 3’-end processing. Essential for embryo development. |
In addition, the protein name, as well as UniProt ID and function are shown. S/A/O refers to SLC, ABC or Other respectively.
Figure 7. Robustness of GC for finding stably expressed genes using shared genes between HPA, CCLE and Klijn/Genentech cell-line data sets with GC < 0.2. Shown are the results for the Klijn/Genentech dataset. (A) IQR vs GC. (B) Max:Mean vs Min. Colour coding: Blue, Caracausi; Green, GeneGini reference genes; Grey, neither. Shape coding: Circle, other; Triangle, SLC coding gene.
Figure 8. Gini coefficient and median expression levels of proposed reference genes in the HPA tissue dataset. Colour coding: blue, Caracausi; purple, Eisenberg and Levanon; green, GeneGini reference genes; yellow, both GeneGini and Eisenberg and Levanon; orange, Lee; black, Vandesompele.
Figure 9. Robustness of the Gini coefficient in the HPA tissue data set. (A) RSD versus Gini coefficient of candidate reference genes. The line of best linear fit (in log space) shown is $y = 2.45 + 1.24x$ ($r^2 = 0.938$). (B) IQR versus Gini coefficient of candidate reference genes. The line of best linear fit (in log space) shown is $y = 0.87 + 0.96x$ ($r^2 = 0.566$). Colour code as in Fig. 8.
Figure 10. UpSetR [139] plot showing genes with a GC < 0.2 that are variously shared and unique across the PCAWG, HBM, GTEx and HPA tissue data sets. The data underpinning this plot can be found in Supplementary Table S4.
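The grouping that underlies such a plot can be sketched without the plotting package: filter each data set to genes below the GC threshold, then bucket every gene by the exact combination of data sets in which it passes. The following Python sketch is illustrative only; the helper names, gene labels and GC values are hypothetical, and it does not reproduce UpSetR itself (an R package).

```python
def low_gini_genes(gc_by_gene, threshold=0.2):
    """Genes whose Gini coefficient is below the threshold in one data set."""
    return {gene for gene, gc in gc_by_gene.items() if gc < threshold}


def exclusive_intersections(gene_sets):
    """Group genes by the exact combination of data sets that contain them."""
    membership = {}
    for name, genes in gene_sets.items():
        for gene in genes:
            membership.setdefault(gene, set()).add(name)
    groups = {}
    for gene, names in membership.items():
        groups.setdefault(frozenset(names), set()).add(gene)
    return groups


# Hypothetical GC values per data set (not taken from the paper).
gc_tables = {
    "HPA_T": {"geneA": 0.15, "geneB": 0.18, "geneC": 0.35},
    "GTEx": {"geneA": 0.12, "geneB": 0.25, "geneC": 0.19},
}
sets = {name: low_gini_genes(table) for name, table in gc_tables.items()}
for combo, genes in exclusive_intersections(sets).items():
    print(sorted(combo), sorted(genes))
```

Each key of the returned dictionary corresponds to one column of an UpSet plot, and the size of its gene set to the bar height.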
Descriptive statistics of 15 common genes across tissue data sets with a GC < 0.2.
| Gene | Gini (HPA Tissue) | Gini (GTEx) | Gini (PCAWG) | Gini (HBM) | Median (HPA Tissue) | Median (GTEx) | Median (PCAWG) | Median (HBM) | % RSD (HPA Tissue) | % RSD (GTEx) | % RSD (PCAWG) | % RSD (HBM) | Protein name | UniProt ID | Function (UniProt) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CHCHD4 | 0.19 | 0.14 | 0.19 | 0.19 | 13.08 | 17 | 20 | 25.69 | 36.11 | 27.26 | 40.35 | 35.44 | Mitochondrial intermembrane space import and assembly protein 40 | Q8N4Q1 | Functions as a chaperone and catalyses formation of disulfide bonds in substrate proteins such as COX17, COX19 and MICU1. Required for import of small cysteine-containing proteins in the mitochondrial intermembrane space. |
| COPS5 | 0.17 | 0.17 | 0.2 | 0.17 | 45.27 | 19 | 20 | 20 | 32.46 | 30.76 | 43.24 | 33.27 | COP9 signalosome complex subunit 5 (SGN5) | Q92905 | Probable protease subunit of the COP9 signalosome complex (CSN), a complex involved in various cellular and developmental processes. The CSN complex is an essential regulator of the ubiquitin (Ubl) conjugation pathway by mediating the deneddylation of the cullin subunits of the SCF-type E3 ligase complexes, leading to decrease the Ubl ligase activity of SCF-type complexes such as SCF, CSA or DDB2. The complex is also involved in phosphorylation of p53/TP53, c-jun/JUN, IkappaBalpha/NFKBIA, ITPK1 and IRF8, possibly via its association with CK2 and PKD kinases. CSN-dependent phosphorylation of TP53 and JUN promotes and protects degradation by the Ubl system, respectively. In the complex, it probably acts as the catalytic center that mediates the cleavage of Nedd8 from cullins. It however has no metalloprotease activity by itself and requires the other subunits of the CSN complex. Interacts directly with a large number of proteins that are regulated by the CSN complex, confirming a key role in the complex. Promotes the proteasomal degradation of BRSK2. |
| COX4I1 | 0.17 | 0.12 | 0.18 | 0.16 | 447.69 | 123 | 144 | 94.13 | 33.09 | 22.96 | 37.11 | 28.92 | Cytochrome c oxidase subunit 4 isoform 1, mitochondrial | P13073 | This protein is one of the nuclear-coded polypeptide chains of cytochrome c oxidase, the terminal oxidase in mitochondrial electron transport. |
| IDH3G | 0.16 | 0.18 | 0.17 | 0.18 | 44.67 | 56 | 60 | 34.75 | 28.6 | 31.58 | 33.02 | 32.45 | Isocitrate dehydrogenase [NAD] subunit gamma, mitochondrial | P51553 | Regulatory subunit which plays a role in the allosteric regulation of the enzyme catalyzing the decarboxylation of isocitrate (ICT) into alpha-ketoglutarate. The heterodimer composed of the alpha (IDH3A) and beta (IDH3B) subunits and the heterodimer composed of the alpha (IDH3A) and gamma (IDH3G) subunits, have considerable basal activity but the full activity of the heterotetramer (containing two subunits of IDH3A, one of IDH3B and one of IDH3G) requires the assembly and cooperative function of both heterodimers. |
| MAP2K2 | 0.2 | 0.17 | 0.18 | 0.17 | 60.91 | 55 | 58.5 | 30.75 | 36.87 | 31.05 | 34.07 | 31.65 | Dual specificity mitogen-activated protein kinase kinase 2 (MAP kinase kinase 2) (MAPKK 2) (EC 2.7.12.2) | P36507 | Catalyzes the concomitant phosphorylation of a threonine and a tyrosine residue in a Thr-Glu-Tyr sequence located in MAP kinases. Activates the ERK1 and ERK2 MAP kinases (By similarity). |
| MTIF3 | 0.18 | 0.17 | 0.19 | 0.19 | 45.15 | 51 | 55 | 72.63 | 33.81 | 30.61 | 37.88 | 37.82 | Translation initiation factor IF-3, mitochondrial (IF-3(Mt)) | Q9H2K0 | IF-3 binds to the 28S ribosomal subunit and shifts the equilibrium between 55S ribosomes and their 39S and 28S subunits in favor of the free subunits, thus enhancing the availability of 28S subunits on which protein synthesis initiation begins. |
| MTRF1L | 0.17 | 0.19 | 0.19 | 0.14 | 7.86 | 11 | 17 | 17 | 31.84 | 33.29 | 34.42 | 28.57 | Peptide chain release factor 1-like, mitochondrial | Q9UGC7 | Mitochondrial peptide chain release factor that directs the termination of translation in response to the peptide chain termination codons UAA and UAG. |
| NDUFB8 | 0.16 | 0.16 | 0.18 | 0.19 | 143.5 | 39 | 37.5 | 56.63 | 30.35 | 28.76 | 33.91 | 35.07 | NADH dehydrogenase [ubiquinone] 1 beta subcomplex subunit 8, mitochondrial | O95169 | Accessory subunit of the mitochondrial membrane respiratory chain NADH dehydrogenase (Complex I), that is believed not to be involved in catalysis. Complex I functions in the transfer of electrons from NADH to the respiratory chain. The immediate electron acceptor for the enzyme is believed to be ubiquinone. |
| NMT1 | 0.2 | 0.2 | 0.18 | 0.16 | 29.71 | 46 | 51.5 | 39.94 | 36.83 | 35.09 | 34.69 | 29.15 | Glycylpeptide N-tetradecanoyl-transferase 1 (EC 2.3.1.97) | P30419 | Enzyme catalysing transfer of myristate from CoA to proteins. Required for full expression of the biological activities of several N-myristoylated proteins, including the alpha subunit of the signal-transducing guanine nucleotide-binding protein (G protein) GO (GNAO1; MIM 139311). |
| PPID | 0.16 | 0.17 | 0.17 | 0.19 | 31.11 | 29 | 32.5 | 44.75 | 29.02 | 32.72 | 34.11 | 33.73 | Peptidyl-prolyl cis-trans isomerase D (PPIase D) (EC 5.2.1.8) | Q08752 | Catalyze the cis-trans isomerization of proline imidic peptide bonds in oligopeptides and accelerate the folding of proteins. This protein has been shown to possess PPIase activity and, similar to other family members, can bind to the immunosuppressant cyclosporin A. |
| RTCA | 0.17 | 0.18 | 0.2 | 0.18 | 26.5 | 24 | 27 | 33.69 | 30.67 | 35.82 | 42.89 | 33.68 | RNA 3’-terminal phosphate cyclase (RNA cyclase) | O00442 | Catalyzes the conversion of 3’-phosphate to a 2’,3’-cyclic phosphodiester at the end of RNA. The mechanism of action of the enzyme occurs in 3 steps: (A) adenylation of the enzyme by ATP; (B) transfer of adenylate to an RNA-N3’P to produce RNA-N3’PP5’A; (C) and attack of the adjacent 2’-hydroxyl on the 3’-phosphorus in the diester linkage to produce the cyclic end product. The biological role of this enzyme is unknown but it is likely to function in some aspects of cellular RNA processing. |
| SELENOK | 0.19 | 0.16 | 0.18 | 0.18 | 31.07 | 49 | 49 | 80.94 | 36.89 | 30.39 | 38.19 | 33.31 | Selenoprotein K (SelK) | Q9Y6D0 | Required for Ca2+ flux in immune cells and plays a role in T-cell proliferation and in T-cell and neutrophil migration (By similarity). Involved in endoplasmic reticulum-associated degradation (ERAD) of soluble glycosylated proteins (PubMed:22016385). Required for palmitoylation and cell surface expression of CD36 and involved in macrophage uptake of low-density lipoprotein and in foam cell formation (By similarity). Together with ZDHHC6, required for palmitoylation of ITPR1 in immune cells, leading to regulate ITPR1 stability and function (PubMed:25368151). Plays a role in protection of cells from ER stress-induced apoptosis (PubMed:20692228). Protects cells from oxidative stress when overexpressed in cardiomyocytes (PubMed:16962588). |
| SMG5 | 0.19 | 0.16 | 0.19 | 0.18 | 34.89 | 63 | 64 | 34.13 | 35.95 | 27.52 | 44.99 | 34.09 | Protein SMG5 (EST1-like protein B) | Q9UPR3 | Plays a role in nonsense-mediated mRNA decay. Does not have RNase activity by itself. Promotes dephosphorylation of UPF1. Together with SMG7 is thought to provide a link to the mRNA degradation machinery involving exonucleolytic pathways, and to serve as an adapter for UPF1 to protein phosphatase 2A (PP2A), thereby triggering UPF1 dephosphorylation. Necessary for TERT activity. |
| SNX3 | 0.17 | 0.18 | 0.19 | 0.18 | 169.22 | 190 | 208.5 | 327.06 | 30.77 | 31.22 | 39.21 | 33.13 | Sorting nexin-3 (Protein SDP3) | O60493 | Phosphoinositide-binding protein required for multivesicular body formation. Specifically binds phosphatidylinositol 3-phosphate (PtdIns(P3)). Also can bind phosphatidylinositol 4-phosphate (PtdIns(P4)), phosphatidylinositol 5-phosphate (PtdIns(P5)) and phosphatidylinositol 3,5-biphosphate (PtdIns(3,5)P2) (By similarity). Plays a role in protein transport between cellular compartments. Together with RAB7A facilitates endosome membrane association of the retromer cargo-selective subcomplex (CSC/VPS). May in part act as component of the SNX3-retromer complex which mediates the retrograde endosome-to-TGN transport of WLS distinct from the SNX-BAR retromer pathway (PubMed:21725319, PubMed:24344282). Promotes stability and cell surface expression of epithelial sodium channel (ENAC) subunits SCNN1A and SCNN1G (By similarity). Not involved in EGFR degradation. Involved in the regulation of phagocytosis in dendritic cells possibly by regulating EEA1 recruitment to the nascent phagosomes (PubMed:23237080). Involved in iron homeostasis through regulation of endocytic recycling of the transferrin receptor TFRC presumably by delivering the transferrin:transferrin receptor complex to recycling endosomes; the function may involve the CSC retromer subcomplex (By similarity). In the case of Salmonella enterica infection plays a role in maturation of the Salmonella-containing vacuole (SCV) and promotes recruitment of LAMP1 to SCVs (PubMed:20482551). |
| SURF1 | 0.18 | 0.15 | 0.2 | 0.17 | 18.3 | 47 | 57.5 | 45.69 | 34.94 | 26.2 | 38.25 | 32.15 | Surfeit locus protein 1 | Q15526 | Component of the MITRAC (mitochondrial translation regulation assembly intermediate of cytochrome c oxidase complex) complex, that regulates cytochrome c oxidase assembly. |
In addition, the protein name, as well as UniProt ID and function are shown.
Details of human cell lines used for the assessment of expression of candidate reference genes by RT-qPCR.
| Cell line | Tissue | Disease | Morphology | Growth mode | Media |
|---|---|---|---|---|---|
| K562 | Blood | Chronic Myeloid Leukemia | Lymphoblast | Suspension | RPMI-1640 |
| HEK293 | Kidney | Immortalized cell line obtained by transfecting sheared adenovirus 5 DNA | Epithelial | Adherent | DMEM |
| Panc1 | Pancreas | Pancreatic carcinoma of ductal origin | Epithelial | Adherent | DMEM |
| SH-SY5Y | Neuroblastoma | Metastasis | Neuroblast | Adherent | DMEM |
| T24 | Bladder | Bladder carcinoma | Epithelial | Adherent | McCoy’s 5A |
| J82 | Bladder | Transitional cell carcinoma | Epithelial | Adherent | EMEM |
| RT-112 | Bladder | Carcinoma | Epithelial | Adherent | RPMI-1640 |
| 5637 | Bladder | Grade II carcinoma | Epithelial | Adherent | RPMI-1640 |
| PC3 | Prostate | Grade IV adenocarcinoma | Epithelial | Adherent | Ham’s F12 |
| PNT2 | Prostate | Immortalized with SV40 | Epithelial | Adherent | RPMI-1640 |
Candidate reference genes used to assess expression stability experimentally by RT-qPCR.
| Gene Name | Uniprot | Gini (HPA Cell Lines) | GeneGini (GG)/GeNorm (V), Eisenberg & Levanon (EL), Lee (L), Caracausi (C), Zhang & Kriegova (ZK) | S/A/O | Reference |
|---|---|---|---|---|---|
| ACTB | | 0.26 | V | O | [ |
| B2M | | 0.44 | V | O | [ |
| BTF3 | | 0.15 | GG | O | [ |
| C2orf49 | | 0.14 | GG | O | [ |
| CHTOP | | 0.14 | GG | O | [ |
| CLINT1 | | 0.14 | GG | O | [ |
| CNOT2 | | 0.14 | GG | O | [ |
| CNOT4 | | 0.13 | GG | O | [ |
| GAPDH | | 0.27 | V | O | [ |
| GGNBP2 | | 0.15 | GG | O | [ |
| GORASP2 | | 0.15 | GG | O | [ |
| HMBS | | 0.26 | V | O | [ |
| HNRNPK | | 0.14 | GG | O | [ |
| HPRT1 | | 0.31 | V | O | [ |
| IK | | 0.13 | GG | O | [ |
| INTS14 | | 0.14 | GG | O | [ |
| KAT5 | | 0.13 | GG | O | [ |
| MDH1 | | 0.15 | GG | O | [ |
| NACA | | 0.15 | GG | O | [ |
| NXF1 | | 0.12 | GG | O | [ |
| PARK7 | | 0.14 | GG | O | [ |
| PCBP1 | | 0.14 | GG | O | [ |
| PCBP2 | | 0.14 | GG | O | [ |
| RBM45 | | 0.12 | GG | O | [ |
| RNF123 | | 0.15 | GG | O | [ |
| RPL13A | | 0.21 | V | O | [ |
| RPL32 | | 0.22 | ZK | O | [ |
| RPL41 | | 0.15 | GG, C | O | [ |
| RPRD2 | | 0.14 | GG | O | [ |
| SDHA | | 0.29 | V | O | [ |
| SF3B2 | | 0.11 | GG | O | [ |
| SNW1 | | 0.13 | GG | O | [ |
| SRP19 | | 0.14 | GG | O | [ |
| SUPT7L | | 0.13 | GG | O | [ |
| TRNT1 | | 0.15 | GG | O | [ |
| TXNL1 | | 0.14 | GG | O | [ |
| UBE2Q1 | | 0.14 | GG | O | [ |
| UBR2 | | 0.14 | GG | O | [ |
| UXT | | 0.13 | GG | O | [ |
| VPS29 | | 0.15 | GG, EL | O | [ |
Included are the gene name, UniProt ID and Gini coefficient as calculated using the HPA cell-line data set. S/A/O refers to SLC, ABC or Other respectively.
Figure 11. The KNIME workflow described here for calculating descriptive statistics and the Gini coefficient from RT-qPCR data. This workflow can be adapted for use with large RNA-Seq data sets.
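The workflow itself is a KNIME graph and is not reproduced here. As a rough Python analogue, the sketch below converts quantification cycle (Cq) values to relative quantities and ranks genes by GC. The conversion is an assumption: it is the common delta-Cq form with an amplification efficiency of 2, and the source does not state how Cq values were transformed. The function names and Cq values are hypothetical.

```python
def gini(values):
    """Gini coefficient of non-negative values (standard discrete form)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)


def cq_to_relative(cq_values, efficiency=2.0):
    """Assumed delta-Cq conversion: quantity relative to the lowest-Cq sample."""
    best = min(cq_values)
    return [efficiency ** (best - cq) for cq in cq_values]


def rank_by_gini(cq_by_gene):
    """Return (gene, GC) pairs sorted from most to least stably expressed."""
    scores = {gene: gini(cq_to_relative(cq)) for gene, cq in cq_by_gene.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical Cq values for two genes across four cell lines.
example = {
    "geneStable": [20.0, 20.0, 20.0, 20.0],
    "geneVariable": [20.0, 21.0, 22.0, 23.0],
}
for gene, gc in rank_by_gini(example):
    print(gene, round(gc, 3))
```

Because the GC is unchanged when all values are multiplied by a constant, the choice of the reference sample in the conversion does not affect the ranking.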
Figure 12. Gini coefficient and median expression levels of candidate reference genes assessed by RT-qPCR. The left panel shows all genes considered in this study; the right panel shows genes with GC < 0.2. Colour coding: green, GeneGini reference genes; red, both GeneGini and Caracausi reference genes; yellow, GeneGini and Eisenberg and Levanon; orange, Lee; black, Vandesompele; purple, Zhang and Kriegova.
Figure 13. Robustness of the Gini coefficient as assessed experimentally by RT-qPCR using a small subset of proposed reference genes. The left panel shows Gini coefficient vs % RSD for all genes considered in this study; the right panel shows the same for genes with a GC < 0.2 and % RSD < 10. The line of best linear fit shown is $y = 0.002 + 0.004x$ ($r^2 = 0.988$). Shape coding as in Fig. 12.