Jingxin Tao1, Youjin Hao1, Xudong Li1, Huachun Yin1, Xiner Nie1, Jie Zhang1, Boying Xu1, Qiao Chen2, Bo Li1.
Abstract
Accurate quantification of gene expression requires normalizing expression data against reliable reference genes. The expression levels of commonly used reference genes are known to vary considerably across experimental conditions, which limits their use for data normalization. In this study, an unbiased identification of reference genes in Caenorhabditis elegans was performed based on 145 microarray datasets (2296 gene array samples) covering different developmental stages, different tissues, drug treatments, lifestyle, and various stresses. Thirteen housekeeping genes (rps-23, rps-26, rps-27, rps-16, rps-2, rps-4, rps-17, rpl-24.1, rpl-27, rpl-33, rpl-36, rpl-35, and rpl-15) with enhanced stability were identified using six popular normalization algorithms and the RankAggreg method. Functional enrichment analysis revealed that these genes were significantly overrepresented in GO terms or KEGG pathways related to ribosomes. Validation analysis using recently published datasets revealed that the expression of the newly identified candidate reference genes was more stable than that of the commonly used reference genes. Based on these results, we recommend rpl-33 and rps-26 as the optimal reference genes for microarray data, and rps-2 and rps-4 for RNA-sequencing data validation. More importantly, rps-23, the most stable gene, is a promising reference gene for both data types. This study is the first to present a large-scale, microarray-data-driven, genome-wide identification of stable reference genes for normalizing gene expression data, and it provides a potential guideline for selecting universal internal reference genes in C. elegans for quantitative gene expression analysis.
Keywords: Caenorhabditis elegans; RankAggreg; housekeeping gene; microarray; reference gene
Year: 2020 PMID: 32213971 PMCID: PMC7140892 DOI: 10.3390/cells9030786
Source DB: PubMed Journal: Cells ISSN: 2073-4409 Impact factor: 6.600
Figure 1. The workflow for the identification of housekeeping gene (HKG) candidates. Briefly, a gene expression matrix (GEM) for each microarray dataset was obtained by normalizing the raw data with each of six algorithms (RMA, MAS5, Li-Wong, GCRMA, PLIER, and VSN), and the corresponding first-round ranked gene lists (GL, from GL1 to GLn) were generated based on gene expression stability. The multiple GLs derived from each algorithm-specific normalization were then merged into a second-round ranked gene list (SGL) using the RankAggreg algorithm, producing six SGLs (SGLR, SGLM, SGLd, SGLG, SGLP, and SGLV). Finally, these six SGLs were intersected to obtain the final reliable HKG candidates.
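The three stages of this workflow (rank genes by stability within each dataset, merge the per-dataset rankings, intersect the top of the merged lists) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the coefficient of variation is an assumed stability metric, the mean-position merge is a simple Borda-style stand-in and not the RankAggreg algorithm itself, and the gene names `g1` to `g3`, the function names, and the toy values are all hypothetical.

```python
from statistics import mean, pstdev


def rank_by_stability(expr):
    """Rank genes from most to least stable within one dataset.

    expr maps gene -> normalized expression values across samples.
    The coefficient of variation is an ASSUMED stability metric.
    """
    cv = {gene: pstdev(vals) / mean(vals) for gene, vals in expr.items()}
    return sorted(cv, key=lambda gene: (cv[gene], gene))


def aggregate_ranks(ranked_lists):
    """Merge ranked lists by mean position (Borda-style stand-in, NOT RankAggreg)."""
    genes = set(ranked_lists[0])  # assumes every list ranks the same genes
    mean_pos = {g: mean([lst.index(g) for lst in ranked_lists]) for g in genes}
    return sorted(genes, key=lambda g: (mean_pos[g], g))


def intersect_top(ranked_lists, k):
    """Return the genes found in the top k of every ranked list."""
    common = set(ranked_lists[0][:k])
    for lst in ranked_lists[1:]:
        common &= set(lst[:k])
    return common


# Toy data (hypothetical genes g1..g3): two datasets normalized by one algorithm.
datasets = [
    {"g1": [10.0, 10.1, 9.9], "g2": [5.0, 8.0, 2.0], "g3": [7.0, 7.5, 6.5]},
    {"g1": [10.0, 10.2, 9.8], "g2": [4.0, 9.0, 3.0], "g3": [7.0, 7.1, 6.9]},
]
first_round = [rank_by_stability(d) for d in datasets]  # GL1..GLn
second_round = aggregate_ranks(first_round)             # one SGL
# A second, made-up SGL stands in for another algorithm's result.
candidates = intersect_top([second_round, ["g1", "g2", "g3"]], k=2)
print(second_round, candidates)
```

In the study itself the intersection is taken over the top 50 genes of six SGLs; the toy example uses `k=2` and two lists only to keep the data small.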
Reliable housekeeping gene candidates of C. elegans identified in this study.
| Gene Symbol | Entrez | Description | Chromosome Location | Size (bp) |
|---|---|---|---|---|
| rps-23 | 178188 | 40S ribosomal protein S23 | Chr. IV: 12390264–12391396 | 550 |
| rps-27 | 178538 | 40S ribosomal protein S27 | Chr. V: 103394–104064 | 356 |
| rps-16 | 179998 | 40S ribosomal protein S16 | Chr. V: 15000011–15000594 | 533 |
| rps-26 | 173342 | 40S ribosomal protein S26 | Chr. I: 14759918–14760654 | 440 |
| rps-4 | 177481 | 40S ribosomal protein S4 | Chr. IV: 7083694–7084682 | 849 |
| rps-2 | 177583 | 40S ribosomal protein S2 | Chr. IV: 7925298–7926391 | 998 |
| rps-17 | 172313 | 40S ribosomal protein S17 | Chr. I: 6220090–6220766 | 465 |
| rpl-24.1 | 172062 | 60S ribosomal protein L24 | Chr. I: 4585115–4586177 | 552 |
| rpl-15 | 176891 | 60S ribosomal protein L15 | Chr. IV: 653436–654576 | 732 |
| rpl-35 | 176097 | 60S ribosomal protein L35 | Chr. III: 7855118–7855680 | 460 |
| rpl-36 | 176007 | 60S ribosomal protein L36 | Chr. III: 7180249–7180677 | 355 |
| rpl-27 | 171750 | 60S ribosomal protein L27 | Chr. I: 1834881–1835439 | 466 |
| rpl-33 | 174166 | 60S ribosomal protein L35 | Chr. II: 7105556–7106462 | 440 |
Figure 2. Intersection analysis of the top 50 genes from each SGL.
Figure 3. Functional enrichment of all top 50 genes in the six SGLs.
Validation and comparison for housekeeping gene candidates in C. elegans.
| Accession Number | Journal, Year | Technique | Sample Size | Data Source | Normalization Method | Top 10 Genes Sorted by SD | Top 10 Genes Sorted by GC |
|---|---|---|---|---|---|---|---|
| GSE118294 | | DNA Microarray | 12 | GPL200 | RMA | | |
| GSE108968 | | DNA Microarray | 32 | GPL10094 Agilent-020186 | Quantile, log | | |
| GSE76380 | | DNA Microarray | 48 | GPL10094 Agilent-020186 | Quantile, log | | |
| GSE63528 | | RNA-sequencing | 36 | GPL13657 | log2(CPM+1) | | |
| GSE60755 | | RNA-sequencing | 139 | GPL13657 | log2(CPM+1) | | |
| GSE98919 | | RNA-sequencing | 42 | GPL18730 | log2(CPM+1) | | |
SD and GC are abbreviations of standard deviation and Gini coefficient, respectively. The top 10 genes were selected based on the ranking of the combined gene list of the 13 newly identified reference genes (NRGs) and the 13 commonly used reference genes (CRGs) in this study.
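The two ranking criteria named in this note, together with the log2(CPM+1) transform listed for the RNA-sequencing datasets, can be illustrated with a minimal Python sketch. It assumes the standard definitions of counts per million and of the Gini coefficient; the source does not give the authors' exact implementation, and the function names, gene names, and values below are hypothetical.

```python
import math
from statistics import pstdev


def log2_cpm(counts_by_gene):
    """Convert one sample's raw read counts to log2(CPM + 1)."""
    total = sum(counts_by_gene.values())
    return {g: math.log2(c / total * 1e6 + 1) for g, c in counts_by_gene.items()}


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard closed form over ascending-sorted values, ranks starting at 1.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


def rank_by(metric, expr):
    """Rank genes from lowest to highest metric value (lower = more stable)."""
    scores = {g: metric(vals) for g, vals in expr.items()}
    return sorted(scores, key=lambda g: (scores[g], g))


# Hypothetical expression values for two made-up genes across three samples.
expr = {"geneA": [10.0, 10.0, 10.0], "geneB": [2.0, 6.0, 10.0]}
by_sd = rank_by(pstdev, expr)
by_gc = rank_by(gini, expr)
print(by_sd, by_gc)
```

Both metrics reward genes whose expression barely changes across samples, so a gene that ranks near the top under both SD and GC is a stronger reference candidate than one that ranks well under only one of them.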
Figure 4. Violin plots showing the reliability of expression levels and boxplots representing standard variations in six independent datasets, for 13 NRGs (dark green) and 13 CRGs (orange). (A–C) and (D–F) represent validated results by using three microarray datasets and three RNA-seq datasets, respectively.