Literature DB >> 36176302

Investigation of chicken housekeeping genes using next-generation sequencing data.

Karim Hasanpur1, Sevda Hosseinzadeh1, Atiye Mirzaaghayi1, Sadegh Alijani1.   

Abstract

Accurate normalization of the gene expression assays, using housekeeping genes (HKGs), is critically necessary. To do so, selection of a proper set of HKGs for a specific experiment is of great importance. Despite many studies, there is no consensus about the suitable set of HKGs for implementing in the quantitative real-time PCR analyses of chicken tissues. A limited number of HKGs have been widely used. However, wide utilization of a little number of HKGs for all tissues is challenging. The emergence of high-throughput gene expression RNA-seq data has enabled the simultaneous comparison of the stability of multiple HKGs. Therefore, employing the average coefficient of variations of at least three datasets per tissue, we sorted all reliably expressed genes (REGs; with FPKM ≥ 1 in at least one sample) and introduced the top 10 most suitable and stable reference genes for each of the 16 chicken tissues. We evaluated the consistency of the results of five tissues using the same methodology on other datasets. Furthermore, we assessed 96 previously widely used HKGs (WU-HKGs) in order to challenge the accuracy of the previous studies. The New Tuxedo software suite was used for the main analyses. The results revealed novel, different sets of reference genes for each of the tissues with 17 common genes among the top 10 genes lists of 16 tissues. The results did disprove the suitability of WU-HKGs such as Actb, Ldha, Scd, B2m, and Hprt1 for any of the tissues examined. On the contrary, a total of 6, 13, 14, 23, and 32 validated housekeeping genes (V-HKGs) were discovered as the most stable and suitable reference genes for muscle, spleen, liver, heart, and kidney tissues, respectively. Although we identified a few new HKGs usable for multiple tissues, the selection of suitable HKGs is required to be tissue specific. The newly introduced reference genes from the present study, despite lacking experimental validation, will be able to contribute to the more accurate normalization for future expression analysis of chicken genes.
Copyright © 2022 Hasanpur, Hosseinzadeh, Mirzaaghayi and Alijani.

Entities:  

Keywords:  New Tuxedo; RNA-sequencing; chicken; coefficient of variation; housekeeping genes

Year:  2022        PMID: 36176302      PMCID: PMC9514876          DOI: 10.3389/fgene.2022.827538

Source DB:  PubMed          Journal:  Front Genet        ISSN: 1664-8021            Impact factor:   4.772


Introduction

Housekeeping genes (HKGs), by definition, are genes required for the maintenance of basal cellular function, irrespective of their specific roles in the tissue or organism. HKGs are expected to express stably in all tissues of an organism under different conditions, regardless of developmental stage, sex, or external stressors (e.g., heat stress, disease, and immunological challenge, among others). Full characterization of a minimal set of genes that are required to sustain the life of a tissue is of particular interest (Eisenberg and Levanon, 2013). The current trend of analyses of global gene expression data using microarray or RNA-seq technologies has enabled the simultaneous analysis of tens of thousands of genes. However, quantitative real-time PCR (qPCR) has remained the only valid, more preferred independent tool for validating the results of genome-wide gene expression analyses (VanGuilder et al., 2008). The reliability of the final quantification result of qPCR depends heavily on the utilization of one or multiple internal reference genes for the normalization of the expression of the genes of interest. Normalization to a set of HKGs is nowadays a current and crucial procedure and is preferred to the normalization to a single reference gene. Therefore, the identification of at least two proper, stable HKGs for a specified tissue is crucial (Bagés et al., 2015). The simultaneous analysis of a large number of genes was not possible until the emergence of high-throughput next-generation sequencing data. Although the evaluation of expression stability of potential reference genes has been carried out earlier for several tissues of chicken (Yang et al., 2013; Olias et al., 2014; Oliveira et al., 2017; Hassanpour et al., 2018; Zhang et al., 2018), the methods of choice of almost all of them were merely based on the qPCR and utilization of the BestKeeper (Pfaffl et al., 2004), geNorm (Vandesompele et al., 2002), and NormFinder (Andersen et al., 2004) statistical algorithms. To our knowledge, the work on identifying suitable HKGs using high-throughput microarray or RNA-seq data is scarce and limited to only some plant species including olive and Arabidopsis (Zhuo et al., 2016; Carmona et al., 2017), grapevine (González-agüero et al., 2013), white campion (Zemp et al., 2014), and some insects and animals including sweet potato whitefly (Su et al., 2013), Arctic charr (Pashay Ahi et al., 2013), human (Carmona et al., 2017), and human and mouse (de Jonge et al., 2007). However, there is no comprehensive study to address the most suited HKGs in chicken using RNA-seq data. In the present work, we tested most of the reliably expressed genes (REGs) for stability in the 16 important chicken tissues using at least three RNA-seq datasets per tissue and reported 10 most stably expressed genes for each of them to be used as proper sets of HKGs in the future gene expression assays. In addition, the consistency of the results was evaluated for five tissues, namely, heart, kidney, liver, muscle, and spleen. The evaluation step was not performed for the remaining 11 tissues as a sufficient, required number of high-depth datasets were not available.

Materials and methods

Selection of the desired tissues for study

Based on the importance of the tissues in research and availability of sufficient gene expression data, we selected 16 chicken tissues, namely, adipose, blood, brain, bursa of Fabricius, duodenum, heart, ileum, jejunum, kidney, liver, lung, muscle, ovary, skin, spleen, and trachea, for the current work. There were no sufficient datasets (at least three) for other tissues in the databases.

RNA-seq data collection from databases

The required RNA-seq datasets were downloaded from NCBI (https://www.ncbi.nlm.nih.gov). At least three datasets were downloaded per tissue. Each dataset should have at least three replicated samples for case group and at least three replicated samples for control or another treatment group. The treatment of the case samples was not important as the only criterion for the comparison was to compare the expression of genes between case (treated) and control (untreated) or another treated group. For simplicity, the two mentioned groups will be called case and control, hereafter. The data were downloaded in SRA format using the SRA Toolkit and converted into fastq format using the fastq-dump tool. In Table 1, the accession numbers are shown, and in Supplementary Table S1, the meta information of the used datasets are presented.
TABLE 1

Accession numbers of the used datasets for discovering the most stably expressed genes for 16 chicken tissues.

Training datasetsAdiposeSRP143406SRP343295SRP042257SRP212250
BloodSRP200185SRP200118SRP200118SRP310357SRP310357
BrainSRP102082SRP081121SRP233052SRP233052
BursERP122030ERP122030SRP163233SRP098825
DuodenumSRP348148SRP299602SRP173587SRP055561
HeartSRP265642SRP097223SRP153755
IleumSRP149780SRP200118SRP300399SRP126304
JejunumSRP280208SRP140601ERP121879ERP121879
KidneySRP097223SRP092600SRP338989
LiverSRP143406SRP097223SRP321387SRP294224SRP161836SRP133195
LungSRP097223SRP265640SRP233531SRP238721SRP081121
MuscleSRP217060SRP217060SRP217060SRP217060SRP159467SRP321387
OvarySRR12315154SRP143406SRP256253SRP256253
SkinSRP343295SRP142597SRP126033SRP112878
SpleenSRP097223SRP225741SRP174144SRP280208SRP158365SRP174144
TracheaSRP338989SRP247563SRP226600SRP126851
Evaluation datasetsHeartSRP152925SRP266037SRP159467
KidneySRP338989SRP338989SRP338989
LiverSRP111815SRP104528SRP233052SRP233052SRP081121SRP100368
MuscleSRP255211SRP104528SRP327337SRP327185SRP313854SRP226900
SpleenSRP254842SRP254842SRP223412SRP173965SRP174144SRP174144
Accession numbers of the used datasets for discovering the most stably expressed genes for 16 chicken tissues.

Required conditions for the datasets to be chosen for the analyses

The main question of the current work was “Which HKGs are the most suitable?” It is obvious that genes with the lowest expression differences between the case and control groups within a specified dataset as well as with expression sustainability among all datasets of a specified tissue could be considered as the most suitable HKGs. Since at least three datasets were analyzed per tissue, genes with the highest consistency of expression across all the experiments were finally introduced as the most proper sets of HKGs. We screened the NCBI database exhaustively to download only the datasets that address the question of research. Therefore, we downloaded only the datasets with the following conditions: 1) Illumina paired-end RNA-seq data (no single-end data used); 2) each of the tissues should have at least three datasets; 3) each dataset could be subset into only two case and control groups; and 4) each case group and control group should have at least three replicate samples.

Analyses of individual datasets

The individual datasets were analyzed separately using the New Tuxedo software suite employing the Ensembl Gallus gallus Build 6.0 reference genome (https://asia.ensembl.org/Gallus_gallus/info/index). At first, Fastqc (Andrews, 2010) and Trimmomatic (Bolger et al., 2014) software tools were employed for quality control and trimming, respectively. Datasets with insufficient quality metrics were excluded. The data were trimmed using ILLUMINACLIP, SLIDING WINDOW (window size 3–5 and Phred quality mean of 20–28), CROP (to trim 3–10 left-end nucleotides), AVGQUAL (minimum Phred quality of 20–25), and MINLEN (read length ≥ 40–45) options. Depending on the dataset, the values were varied. The Hisat2 software (Kim et al., 2019) (available at https://daehwankimlab.github.io/hisat2) was used for both indexing of the genome and mapping of the clean reads onto the indexed reference genome. The Stringtie software (Kovaka et al., 2019) (available at http://ccb.jhu.edu/software/stringtie) was used for assembly of transcripts of each sample using the -G option that forces the assembly to be limited to only the known genes. The transcripts of all samples of all experiments were assembled using the merge option of Stringtie. The Cuffdiff software (Trapnell et al., 2010) (available in http://cole-trapnell-lab.github.io/cufflinks) was used for differential expression analysis between the case and control groups with the multiread and bias correction options enabled. Genes with considerable expression differences between the two mentioned groups resulted in statistically significant differences, while genes with constant expressions between the two groups have p-values approximate to 1. Genes with unreliable expression (FPKM < 1) in all samples of a dataset were excluded. The expression (FPKM) values of all samples within a specified experiment were gathered, and mean, variation, and coefficient of variation (CV) statistics were calculated for each gene. Only REGs (i.e., genes passing the aforementioned filter in all datasets of a tissue) were considered for the discovery of the stably expressed genes. Then, average CV was calculated for each gene across all datasets within a specified tissue. Genes were sorted in ascending order and ranked based on the average CV, and those with the lowest average CV values were reported as the most stable and suitable housekeeping genes. The flowchart of the analyses is shown in Figure 1. This process was repeated for all 16 tissues, and the top 10 most stably expressed genes were reported for each tissue separately.
FIGURE 1

Flowchart of detection of the most suitable housekeeping genes for a specific tissue. This workflow was repeated for all tissues separately. At least three datasets were analyzed per tissue. The comparisons within each experiment were performed between the treated and control groups, each with at least three replicates. All the used datasets are publicly available data generated using Illumina paired-end RNA-Sequencing method. Software programs within the New Tuxedo suite were hisat2 (for mapping of reads onto reference genome), Stringtie (for assembly and read counting), and Cuffdiff (for differentiall expression analysis). Expression stability was monitored per gene based on the coefficient of variation (CV), and ranking of most stably expressed genes were performed based on the average CV criterion.

Flowchart of detection of the most suitable housekeeping genes for a specific tissue. This workflow was repeated for all tissues separately. At least three datasets were analyzed per tissue. The comparisons within each experiment were performed between the treated and control groups, each with at least three replicates. All the used datasets are publicly available data generated using Illumina paired-end RNA-Sequencing method. Software programs within the New Tuxedo suite were hisat2 (for mapping of reads onto reference genome), Stringtie (for assembly and read counting), and Cuffdiff (for differentiall expression analysis). Expression stability was monitored per gene based on the coefficient of variation (CV), and ranking of most stably expressed genes were performed based on the average CV criterion.

Gene ontology and pathway enrichment analyses of the most stably expressed genes

All the top 10 genes of the 16 tissues were gathered, and the duplicated genes were deduplicated. In the end, a total of 139 unique genes were submitted to gene ontology and pathway enrichment analyses using the DAVID web-based software (Jiao et al., 2012) (available at https://david.ncifcrf.gov) in order to understand the functions and to gain insight into the pathways that the less variable genes are involved in.

Evaluation of the consistency of the results

For five of the tissues, there were six or more datasets. For each of heart and kidney tissues, there were six datasets, and for each of liver, muscle, and spleen tissues, there were 12 datasets. We analyzed half of the datasets of the mentioned tissues as training and the second half for the evaluation of the consistency of the results of the training datasets. We named the second set as evaluation datasets. As for the training datasets, the REGs of the evaluation datasets were also ranked based on the average CV. The top 100 genes of the training datasets were compared with those of the evaluation datasets and those genes that were in common in both of the top 100 genes lists were reported as validated housekeeping genes (V-HKGs). The more the counts of matched genes between the two top 100 genes lists, the greater the accuracy and repeatability of discovering the stable genes.

Assessment of the suitability of widely used housekeeping genes

A total of 96 WU-HKGs were selected from the literature. The list of WU-HKGs and their corresponding citations are shown in Supplementary Table S2. Gene expression analysis using a real-time PCR assay was the main subject of the reviewed papers. The main objective of the current section was to challenge the accuracy of the previously conducted gene expression studies that had used nonproper HKGs.

Results and discussion

We used 94 datasets (70 for training and 24 for evaluation) sourced from 16 tissues, namely, adipose, blood, brain, bursa of Fabricius, duodenum, heart, ileum, jejunum, kidney, liver, lung, muscle, ovary, skin, spleen, and trachea. A total of 23,403 genes were analyzed in each dataset, and almost 3,000–11,000 REGs (with FPKM value ≥ 1 in at least one sample of a datasets) were assessed for the stability of expression. In contrast, nearly 50–85% of genes were filtered out because of the inconsistency of expression or because of low coverage of sequencing in some of the datasets. It is worth mentioning that only REGs of all datasets within a tissue were allowed for the final analysis. For tissues with more available datasets (such as spleen, liver, and muscle), we only chose the datasets with sufficiently deep sequencing, because the low coverage datasets would not guarantee the possibility of the evaluation of all REGs. In contrast, for less studied tissues that had no abundant datasets available, we decided to utilize all available datasets regardless of their sequencing depths. It is obvious that the insufficient coverage of datasets will cause the number of analyzed genes to be reduced. The total number of datasets and REGs for each of the tissues are reported in Supplementary Table S3. To identify the most stable reference genes, we first checked the relation of mean of expression and variation. We observed no relation between the mentioned coefficients, and thereby, the suggested reference genes can be used for the normalization of the interested genes irrespective of their expression levels. In Figure 2, a scatterplot demonstrating the relation of mean and CV of one experiment of adipose tissue is shown. Scatterplots of the remaining experiments were similar and therefore were not shown here.
FIGURE 2

Scatterplot of the reliably expressed genes of one dataset of adipose tissue. The relationship of mean (in log10 scale) and variation of expression of genes were assessed in order to understand whether the variability of expression is increased with the mean of expression. The expression values are given as fragments per kilobase of transcript per million reads in log10 scale.

Scatterplot of the reliably expressed genes of one dataset of adipose tissue. The relationship of mean (in log10 scale) and variation of expression of genes were assessed in order to understand whether the variability of expression is increased with the mean of expression. The expression values are given as fragments per kilobase of transcript per million reads in log10 scale. The methodology used in the present work to find the most stable reference genes revealed interesting results. Almost all of the introduced reference genes (i.e., top 10 most stably expressed genes) were new, indicating that the previously used HKGs were not as stable as required for a gene to be considered as a proper HKG. Only eight (8.3%) of the WU-HKGs were present in the top 10 lists of five tissues including Rpl4 for adipose; Oaz1, Rpl27a, and Gapdh for blood; Rpl6 for jejunum; Gusb and Polr2b for kidney; and Ap2m1 for liver. The top 10 most stably expressed genes introduced here for the remaining tissues were completely novel. In Table 2, the top 10 most stable reference genes are reported for the studied tissues.
TABLE 2

Top 10 most suitable reference genes for chicken tissues based on average coefficient of variations (CVs) across at least three experiments.

AdiposeBloodBrainBursaDuodenumHeartIleumJejunumKidneyLiverLungMuscleOvarySkinSpleenTrachea
Abcb6Psma1Serbp1Rap2cAmotMrpl33Dhx30 Atp5b Itgb1bp3Fam120aAtl1Adam17 Tasor2 Hnrnpab Wdr81 Tasor2
Prrc2c Gapdh Nr1h3Tmem259 Xpo5 Ilf2 Dhx38 Rpl6 Uck1Xpo6Tfip11Slc39a3Pcif1Ddb1Hdac1 Atl1
Ubr7 a Oaz1 Bet1lCnot9 Exosc10 Rufy3Arnt Gnb2l1 Polr2b Ubr7 Mvb12a Cops7a Tomm22Lonp1Scyl3 Ikbkb
Eif3aRpl39lArhgef9Wasf2Zyx Cep68 Ckap2l Ilf2 Col4a1Mrps25Slc35a1Xpo7Erlin1Tcf25Zc3h11b Cep68
Pabpc1Tmed10 Atp5b Grb2 Usp5Nfyc Hnrnpab Lasp1 Gusb Ap2m1 PisdFem1bPoll Grb2 Dnajc5Cfap92
Tmem57 Rpl27a NonoAscc2Cnot1Cuedc2Nup188Mif4gd Atl1 Pcbd1Phc1Ipo9 Ctnna1 Mtmr3Nek9Parp9
Thrap3Cox7bSumo3Rpl7l1Psmd13 Tasor2 Stx10 Ikbkb Hbp1Rnf130Spout1 Gsr Kbtbd4Baz1bGzf1Hspd1
Rpl4 Rpl7aRps3aMan2c1Fhl3Slc7a3 Eif2b5 Atl1 Herc2Adat1 Tor1b Copg2Brd8 Exosc10 Cdk12Cd80
Ctnna1 Rpsap58Uso1Tpm3Med1 Gnb2l1 Map3k14Mkrn2Ndufb9Myh9Cog5Arcn1Gsr Cops7a Xpo5 Mettl21c
Kcnh4Eif4ebp1Cd99l2Chmp1a Eif2b5 Ak2 Nol9Aamp Tor1b Arpc2 Cep68 Eya3ClpxYipf3Mpp1Mrpl40

Highlighted (bold) genes are in common for at least two tissues. Underlined genes are among the widely used housekeeping genes.

Top 10 most suitable reference genes for chicken tissues based on average coefficient of variations (CVs) across at least three experiments. Highlighted (bold) genes are in common for at least two tissues. Underlined genes are among the widely used housekeeping genes. As can be seen in Table 2, some of the introduced reference genes (17 genes) were in common in the top 10 lists of at least two tissues. For example, Atl1 was identified as suitable for four tissues. Cep68 and Tasor2 genes were in common in the lists of three tissues, and Ubr7, Ctnna1, Atp5b, Grb2, Xpo5, Exosc10, Eif2b5, Ilf2, Gnb2l1, Hnrnpab, Ikbkb, Tor1b, cops7a, and Gsr genes were in common in the top 10 lists of two tissues. It is obvious that genes with more frequent occurrence in multiple top 10 genes lists are more likely suggested as suitable HKGs than are those with only one occurrence. Therefore, these 17 genes are strongly suggested to be used as HKGs for the mentioned tissues.

Gene ontology and Kyoto Encyclopedia of Genes and Genomes pathway analyses of the most stably expressed genes

Understanding the functionality of the most stable reference genes is of great importance. In the current work, the top 10 most stable genes of all 16 studied tissues (totally 134 unique genes) were subjected to gene ontology and KEGG pathway enrichment analyses using the DAVID web tool. The results revealed that the most stably expressed genes that enriched significantly (p < 0.05) in molecular function terms, biological processes terms, and KEGG pathways were generally related with the transcription and translation of proteins (e.g., RNA binding, protein binding, structural constituent of ribosome, translational initiation, positive regulation of transcription, cytoplasmic translation, nucleocytoplasmic transport, RNA degradation, and ribosome, among others). In other words, as compared to genes of other processes or pathways, it seems that genes associated with transcription and translation are less variable under different conditions and are more suited as HKGs. These results are in accordance with the findings of Kouadjo et al. (2007) and de Jonge et al. (2007), who reported the protein biosynthesis–related genes as the most stably expressed genes. Gene ontology terms as well as KEGG pathways enriched by the top 10 most stably expressed genes are reported in Supplementary Table S4.

Consistency of the results of the training datasets

As mentioned above, other sets of data of heart (three datasets), kidney (three datasets), liver (six datasets), muscle (six datasets), and spleen (six datasets) tissues were analyzed in the same way as the training datasets in order to evaluate the consistency of the results of the training datasets. The average CVs of the training datasets strongly correlated with that of the evaluation datasets with Pearson correlation coefficient ranging from 0.64 to 0.88 and Spearman correlation coefficients ranging from 0.67 to 0.82, indicating the relatively high accuracy of the discovering stable genes. The top 100 genes of the training datasets were compared with that of the evaluation datasets. We found that the top 100 genes of the training datasets also took place in higher ranks in the evaluation datasets. Except for muscle, we found relatively consistent results for the remaining four tissues. There were 6, 13, 14, 23, and 32 genes in common in the top 100 genes of the training and evaluation datasets of muscle, spleen, liver, heart, and kidney tissues, respectively. In total, 80% of the top 100 genes for kidney tissue and almost 50% of the top 100 genes for heart, liver, and spleen tissues were present in the top 500 genes of the evaluation datasets. It should be noted that in the case of random distribution of genes in the various ranks of the evaluation datasets, less than 10% of the top 100 genes of training datasets would be present in the top 500 genes of the evaluation datasets [(500/number of analyzed genes) × 100 = ∼4.6–8%]. These findings indicate that the identification of the most stably expressed genes with the used method is, to some extent, accurate, and the repeatability of the results is considerable. In Table 3, the number of common genes in the top 100 genes of the training datasets and top 500 genes of the evaluation datasets are presented. Although the employed approach identified reliable, stable reference genes for all tissues, we recommend the utilization of the reported HKGs cautiously. In addition, we invite related researchers to further validate the reported HKGs using real-time PCR as this was not possible in the current work.
TABLE 3

Number of genes that are in common in the top 100 genes list of the training datasets and the top 100 to top 500 genes lists of the evaluation datasets.

HeartKidneyLiverMuscleSpleen
Top-100 genes of evaluation datasets233214613
Top-200 genes of evaluation datasets3851261121
Top-300 genes of evaluation datasets4868361631
Top-400 genes of evaluation datasets5976461935
Top-500 genes of evaluation datasets6380492343
Number of genes that are in common in the top 100 genes list of the training datasets and the top 100 to top 500 genes lists of the evaluation datasets. Here, we introduced only those genes that were in common between the top 100 genes of the training and evaluation datasets as V-HKGs. Six of the V-HKGs, namely, Ankrd16, Strada, Phc1, Atl1, Mkrn2, and Uck1, were observed commonly for at least two tissues. Ankrd16 gene was identified as suitable for heart, liver, and spleen tissues. The remaining five genes were identified as the best for both heart and kidney tissues. In Table 4, the official names of the V-HKGs are reported. In addition, in Figure 3, the boxplots of the V-HKGs are illustrated. As can be seen in Figure 3, the expression variations of the V-HKGs were negligible.
TABLE 4

Most stably expressed genes of five chicken tissues that are in common between the top 100 genes of the training dataset and the top 100 genes of the evaluation dataset.

HeartKidneyLiverMuscleSpleen
Mob1aIkbkbRp11-529k1.3SrpraNfyc
Strada Wnk1Dpagt1Cops7aHdac1
Ticam1Aplp2Rbm7HnrnpdCnp
Cep68Ilf2NpeppsDhx38Adam17
Phc1 Strada Eif2b5Gtpbp1Spata5
Atl1 Ugp2 Ap2m1 Puf60Grk2
FancmBpnt1AmfrRpn1
Abi2 Phc1 Psmd7Hnrnpab
AampEtfdhFam120aMtmr3
Tmem41aRufy3Ctnna1Nup188
Ak3Oraov1Xpo6Tor1b
Hbp1SlirpLig3Znrf2
Casc4 Atl1 Ankrd16 Ankrd16
Rbl2Hspd1Prpf6
PepdTanc1
Mkrn2 Rpl5
Tfip11Cp
PisdGpr18
Spout1Cog5
Uck1 Vwa9
Mif4gd Mkrn2
Ankrd16 Hvcn1
Tasor2 Uck1
Gsn
Fam104a
Mrps7
Gusb
Rps6kb1
Myo19
Tubb2a
Stx17
Mrps16

Highlighted (bold) genes are in common for at least two tissues. Underlined genes are among the widely used housekeeping genes.

FIGURE 3

Box plots of the expression of most stable, validated housekeeping genes in comparison with that of five randomly selected widely used housekeeping genes of five chicken tissues. For each tissue, the expression variation of newly introduced housekeeping genes (left side of the vertical line) was compared with that of five old and widely used housekeeping genes (right side of the vertical line). The expression values are given as fragments per kilobase of transcript per million reads in log10 scale (whiskers: min to max).

Most stably expressed genes of five chicken tissues that are in common between the top 100 genes of the training dataset and the top 100 genes of the evaluation dataset. Highlighted (bold) genes are in common for at least two tissues. Underlined genes are among the widely used housekeeping genes. Box plots of the expression of most stable, validated housekeeping genes in comparison with that of five randomly selected widely used housekeeping genes of five chicken tissues. For each tissue, the expression variation of newly introduced housekeeping genes (left side of the vertical line) was compared with that of five old and widely used housekeeping genes (right side of the vertical line). The expression values are given as fragments per kilobase of transcript per million reads in log10 scale (whiskers: min to max).

Assessing the suitability of 96 widely used housekeeping genes

In Supplementary Tables S5 and S6, the average CVs and rankings of the WU-HKGs among the REGs are reported, respectively. The expression profile of most of the 96 WU-HKGs showed inconsistency in some of the chicken tissues. Eight genes (i.e., Mb, Dimt1, Rps29, Stx5, Gys1, Il6, Rbx1, and Rnasek) showed no expression in more than five tissues. It is obvious that the mentioned genes do not have merit to be suggested as proper HKGs, although some of them have been traditionally used widely for the normalization of real-time PCR assays. In contrast, only 13 genes showed consistent expression in all 16 tissues, including Ap2m1, Gusb, Polr2b, Rpl6, Eif4a3, Rpl4, Eef1a1, Gapdh, Rpl19, Rpl27a, Rpl31, and Rps6. Six out of the 13 mentioned genes (i.e., Eef1a1, Gapdh, Rpl19, Rpl27a, Rpl31, and Rps6) showed relatively consistent, stable expression and ranked among the best 100 most stably expressed genes of two tissues. Although not detected in three datasets, Ap2m1 performed well in seven tissues, which was followed by Gusb, Polr2b, Rpl6, Rpl4, and Eif4a3. These WU-HKGs were high-ranked in three or more tissues and seem to be suitable for further utilization. For blood, 17 WU-HKGs were relatively suitable, which was followed by adipose (seven genes), jejunum (six genes), kidney (six genes), and liver (five genes), indicating that the WU-HKGs are more suitable for blood than for other tissues. For the remaining tissues, almost none of the WU-HKGs is suitable, and further use of them is not suggested. The variation of expression of the 96 WU-HKGs were considerably more than that of the stable reference genes that were introduced in the present work. In Figure 3, the expression variations of five randomly selected WU-HKGs (i.e., Gapdh, Actb, Vim, Rps6, and B2m) are illustrated. As can be seen in Figure 3, some of the mentioned genes were not stably expressed across all experiments of 16 tissues. Two of them (i.e., Gapdh and Rps6) were among the top 100 stable genes, while the remaining three (i.e., Vim, B2m, and Actb) were not. Previous studies have reported relatively high variability for the expression of WU-HKGs in a varied range of tissues and organs in livestock animals (Gromboni et al., 2020; Lozano-Villegas et al., 2021) as well as in humans (Mahoney et al., 2004; Caracausi et al., 2017; Cadenas et al., 2022), mice (Fu et al., 2020; Muñoz et al., 2021), and insects (Shi et al., 2016), among others. For adipose tissue, only Rpl4 gene took place within the top 10 stable genes among 10,343 REGs (Table 2). Other WU-HKGs were not even within the best 100 genes. We found it interesting that having fold changes of ∼9.7 and 12.0, respectively, Acta1 and Scd genes showed significant difference between the case and control groups in one experiment of adipose (q-value < 0.05). Therefore, these genes are no longer suggested for further utilization. Unlike Rpl4 gene, which is also proposed here, Tbp has previously been reported elsewhere as a suitable reference gene for the normalization of gene expression data of adipose tissue (Wang et al., 2020). Tbp was the only WU-HKG that was reliably expressed in all 16 tissues with nonsignificant differential expression in all of the 94 studied datasets. However, Tbp ranked 1,092 among the 10,343 REGs of adipose tissue. Other studies have reported Rpl32 and B2m genes as two suitable HKGs for abdominal fat, compared with the other three genes, namely, Sdha, Tbp, and Ywhaz (Bagés et al., 2015). Neither of them, however, were identified as suitable HKGs for adipose in the current work. Na et al. (2021) compared 14 chicken reference genes and reported both Tbp and Hmbs genes as the most stably expressed genes during the growth and development of abdominal adipose tissue of broilers. They also reported Tbp and Rpl13 genes as the most stable during the differentiation of primary preadipocytes and Tbp and Hmbs genes in preadipocytes and mature adipocytes. In the present work, as compared to those of other tissues, greater number of WU-HKGs were relatively constant across all datasets of blood. Studying human peripheral blood, Martínez-Sánchez et al. (2019) found Hprt and Tbp as the most reliable genes. They suggested the utilization of Gapdh, B2m, and Rpl13a genes to be avoided. Likewise, Dheda et al. (2004) emphasized the avoidance of employment of Gapdh, B2m, and Actb genes (each with ∼10- to 30-fold variability across conditions) for normalizing mRNA levels in human pulmonary tuberculosis. For blood, we found the following four WU-HKGs as relatively stable: Gapdh (rank = 2), Rpl13 (rank = 80), Oaz1 (rank = 3), and Rpl27a (rank = 6). For heart tissue, except Gusb (rank = 50) and Tbp (rank = 57) genes, further utilization of other WU-HKGs genes is not suggested. TBP has been also identified as a suitable reference gene for lung and heart (Hassanpour et al., 2018) and abdominal fat (Wang et al., 2020) tissues. Because of the low coverage of the used datasets of heart tissue in the current work, only 3,334 (6,235) genes in the training (evaluation) datasets were reliably expressed. Therefore, as expected, many of the WU-HKGs with little expression were not detected in heart datasets. On that account, only 33 WU-HKGs were analyzed. Gromboni et al. (2020) assessed the suitability of eight HKGs for heart. Their studied WU-HKGs were present in neither the top 10 genes nor the list of 23 V-HKGs of heart tissue in the current study. Hassanpour et al. (2018) investigated a panel of nine HKGs and introduced Ywhaz and Rpl13 genes as suitable for chicken heart (Hassanpour et al., 2018). All of their studied genes were filtered out in the current study as lacking the criteria we employed to categorize the stable reference genes. For kidney tissue, only Polr2b and Gusb genes (rankings of 3 and 5, respectively) and Eef1a1, Nelfcd, Rpl5, and Rpl6 (rankings of less than 100) outperformed other WU-HKGs within 7,527 REGs. On the contrary, Il6, Scd, and Dimt1 were among the worst genes and identified as the most inappropriate HKGs for kidney. Other WU-HKGs were also not stable and therefore not recommended for further use. For liver tissue, only Ap2m1 gene appeared suitable with a ranking of 5 among 8,428 REGs. Out of 94 analyzed datasets, Ap2m1 was reliably expressed in 80, while it was significant (q-value < 0.05) in only two datasets belonging to blood and brain tissues. Therefore, although not suggested for brain, blood, heart, and lung tissues, Ap2m1 seems to be suggested for more than one tissue. In yellow feathered broilers, Zhang et al. (2018) reported Rpl13 gene as the most proper HKG for liver, compared with only six other candidate genes. In another research, Ywhaz and Tbp were found more stable than B2m, Rpl32, and Sdha genes (Bagés et al., 2015). In the present work, being significant in one, two, two, and three experiments among 12 experiments of liver, Actb, B2m, Gapdh, and Rpl13 genes, respectively, were not proved to be suitable HKGs. For lung tissue, only 34 WU-HKGs showed reliable expression. Almost 95% of these 34 genes were not stable. Only Gusb (rank 30 among 3,562 REGs) was identified as relatively suitable. In an attempt, Kriegova et al. (2008) investigated 10 candidate HKGs and introduced Rpl32 as the most suitable HKG for lung tissue. Being excluded from the analyses in the filtration steps, Rpl32 was not identified as a suitable HKG in the current work for neither lung tissue nor other tissues. The results of Fu et al. (2020) indicated that none of the 15 WU-HKGs that they studied were sufficiently good as reference genes. However, they suggested the combination of Grcc10 and Ppia genes as a proper choice for the lung tissue of mouse infected with IAV. The datasets of muscle in the current work appeared very variable. Although we only used datasets belonging to pectoral major muscle tissue, as compared to other tissues, there were less genes in common between the training and evaluation datasets. It seems that sampling of the tissues of the different studies had been done differently and not from the same section of the pectoral major muscle. The six identified V-HKGs (i.e., Srpra, Cops7a, Hnrnpd, Dhx38, Gtpbp1, and Puf60), however, showed sufficiently less variability within the used 12 different datasets and seem to be a suitable set of reference genes for muscle. The findings of Nascimento et al. (2015) showed Hmbs and Hprt1 genes as the most stable while Tfrc and B2m as the least stable reference genes for the pectoralis major muscle of chicken. Their results, also, revealed that Hmbs and Hprt1 gene expression did not change owing to dietary variations and thus were recommended for accurate normalization of RT-qPCR data of chicken pectoralis major muscle. In our results, the best WU-HKG was MIF (rank = 89). Hprt1, although expressed nonsignificantly in all six training analyses of pectoral muscle and its fold change ranged 1.0–1.3 between the case and control groups, appeared relatively variable within the case or control groups of the evaluation datasets and ranked 1,172 among 10,745 REGs. In accordance with the findings of the current work, Barber et al. (2005) analyzed Gapdh expression in a panel of 72 human tissues and observed a 15-fold difference in Gapdh mRNA copy numbers between the skeletal muscle and the breast. Their results confirmed previous reports of the marked variability of Gapdh expression between tissue types. On the contrary, Mahoney et al. (2004) concluded that B2M and ACTB were the most stably expressed HKGs in human skeletal muscle following resistance exercise, while B2m and Gapdh were the most stable following endurance exercise. For ovary, we discovered three genes (i.e., Tasor2, Ctnna1, and Gsr) within its top 10 genes that were also identified as the most stable for other tissues. These stable genes along with all the top 10 genes of brain, bursa of Fabricius, duodenum, ileum, jejunum, lung, skin, and trachea were completely new and, to our knowledge, are first reported in the present study. Hassanpour et al. (2019) reported Ywhaz, Hprt1, and Hmbs genes as most stable. They suggested the combination of Ywhaz, Hprt1, and Hmbs as the best set of reference genes for ovarian and uterine tissues of laying hens under control and heat stress conditions. Cadenas et al. (2022) found that the stability of all reference genes differs among ovarian cell types in humans. They identified Actb as the best reference gene for oocytes and cumulus cells and B2m for medulla tissue and isolated follicles. They concluded that using a single validated reference gene may be sufficient when the available testing material is limited. For the ovarian cortex, depending on culture conditions, Gapdh or Actb were found to be the most stably expressed genes. Their reported stable genes were not confirmed in the current work. Similar to the tissues discussed above, we evaluated the 96 HKGs for spleen transcriptome data as well. Having a relatively low CV and high rank (58 among 9,904 genes), only Rpl6 proved to be relatively suitable for spleen. In a previous study, 10 HKGs were assessed and 2 genes, i.e., Tbp and Ywhaz, were identified as the most suitable HKGs for spleen tissue (Khan et al., 2017). None of them, however, was in the list of best, high-ranked HKGs in the current work. For spleen, liver, and cecum of different-aged specific-pathogen-free layer chickens and commercial turkeys, Mitra et al. (2016) suggested Rpl13 and Tbp as the most stable reference genes. They also observed a stable expression of Rpl13 and Tfrc genes in the mentioned tissue samples of turkey. In the current work, both Rpl13 and Tfrc genes were expressed differently between the case and control groups of two experiments of evaluation datasets (q-value < 0.05). Therefore, our results did not prove the consistency of expression of Rpl13 and Tfrc genes. Likewise, we could not approve the suitability of spleen HKGs introduced by Khan et al. (2017) and Borowska et al. (2016). To our knowledge, the present work is the first comprehensive study that investigated all REGs for 16 most important chicken tissues. Most of the previous studies have compared only a handful of HKGs in which the used genes were in common (Zinzow-Kramer et al., 2014; Bagés et al., 2015; Mitra et al., 2016; Khan et al., 2017; Hassanpour et al., 2018; Zhang et al., 2018; Gromboni et al., 2020). The employed methodologies of the mentioned studies were NormFinder, GeNorm, BestKeeper, RefFinder, and delta CT (Mitra et al., 2016). Each of them has its own strengths and weaknesses. In RefFinder, PCR efficiencies are not taken into account. The NormFinder software is influenced by sample size (Spiegelaere et al., 2015). GeNorm ranking for genes is based on the highest degree of similarity in their expression profile and does not take the amount of variation into account (Andersen et al., 2004). BestKeeper utilizes Pearson correlation analysis and is just valid for normally distributed data with a homogeneous variance. In general, the ranking of HKGs is different based on the output of RefFinder, NormFinder, GeNorm, and BestKeeper, and there is little overlap (Kou et al., 2017). Owing to the dynamic and high-throughput nature of the next-generation sequencing data, the methodology that was introduced in the current work seems to overcome the weakness of the previously used methods. Moreover, utilization of different datasets that belong to different studies, instead of real-time PCR data, is the superiority of the current work over the previous research. In addition, the integration of the results of at least three datasets per tissue seems to increase the reliability of the results.

Conclusion

In the present work, we, for the first time, conducted a comprehensive genome-wide gene expression evaluation of 3,000–11,000 genes, analyzing 94 experiments in order to assess the suitability of previously known HKGs as well as to discover the most stable, new housekeeping genes for each of 16 chicken tissues. The results clearly revealed novel reference genes with more stable expressions in different experimental conditions. On the basis of the definition of ubiquitous and stable expression, our results suggest that no single gene qualifies as a real HKG. In addition, although we identified some genes that were suited for more than one tissue, most of the introduced new and validated HKGs were tissue specific. Thus, instead of one suitable HKG, we reported 10 high-ranked, stable genes for each tissue to provide future studies with more options to choose from. The identified new HKGs were predominantly involved in transcription, translation, and protein biosynthesis. There were 17 common HKGs that were suitable for more than one tissue. We strongly suggest them as well as the V-HKGs for normalization in all future qRT-PCR experiments. We believe that the results of the present work will contribute to more accurate normalization of chicken gene expression data, especially for the data of heart, liver, kidney, spleen, and muscle tissues, and that their results will be validated by analyzing additional sets of datasets.
  44 in total

1.  Selection of endogenous reference genes for gene expression analysis in the Mediterranean species of the Bemisia tabaci (Hemiptera: Aleyrodidae) complex.

Authors:  Yun Su; Wen-Bo He; Jia Wang; Jun-Min Li; Shu-Sheng Liu; Xiao-Wei Wang
Journal:  J Econ Entomol       Date:  2013-06       Impact factor: 2.381

Review 2.  Twenty-five years of quantitative PCR for gene expression analysis.

Authors:  Heather D VanGuilder; Kent E Vrana; Willard M Freeman
Journal:  Biotechniques       Date:  2008-04       Impact factor: 1.993

3.  Identification of reliable reference genes for quantitative real-time PCR in ovary and uterus of laying hens under heat stress.

Authors:  Hossein Hassanpour; Zahra Aghajani; Shahab Bahadoran; Navid Farhadi; Hasan Nazari; Waranyoo Kaewduangta
Journal:  Stress       Date:  2019-02-26       Impact factor: 3.493

4.  Reference gene validation for RT-qPCR, a note on different available software packages.

Authors:  Ward De Spiegelaere; Jutta Dern-Wieloch; Roswitha Weigel; Valérie Schumacher; Hubert Schorle; Daniel Nettersheim; Martin Bergmann; Ralph Brehm; Sabine Kliesch; Linos Vandekerckhove; Cornelia Fink
Journal:  PLoS One       Date:  2015-03-31       Impact factor: 3.240

5.  Influence of heat stress, sex and genetic groups on reference genes stability in muscle tissue of chicken.

Authors:  Haniel Cedraz de Oliveira; Antonio Amandio Pinto Garcia; Juliana Gracielle Gonzaga Gromboni; Ronaldo Vasconcelos Farias Filho; Carlos Souza do Nascimento; Amauri Arias Wenceslau
Journal:  PLoS One       Date:  2017-05-01       Impact factor: 3.240

6.  Analysis of the p53 pathway in peripheral blood of retinoblastoma patients; potential biomarkers.

Authors:  Mayra Martínez-Sánchez; Mariana Moctezuma-Dávila; Jesús Hernandez-Monge; Martha Rangel-Charqueño; Vanesa Olivares-Illana
Journal:  PLoS One       Date:  2020-06-05       Impact factor: 3.240

7.  Ppia is the most stable housekeeping gene for qRT-PCR normalization in kidneys of three Pkd1-deficient mouse models.

Authors:  Juan J Muñoz; Ana C Anauate; Luiz F Onuchic; Ita P Heilberg; Andressa G Amaral; Frederico M Ferreira; Elieser H Watanabe; Renata Meca; Milene S Ormanji; Mirian A Boim
Journal:  Sci Rep       Date:  2021-10-05       Impact factor: 4.379

8.  Validation of reference genes for expression studies during craniofacial development in arctic charr.

Authors:  Ehsan Pashay Ahi; Jóhannes Guðbrandsson; Kalina H Kapralova; Sigríður R Franzdóttir; Sigurður S Snorrason; Valerie H Maier; Zophonías O Jónsson
Journal:  PLoS One       Date:  2013-06-13       Impact factor: 3.240

9.  Reference gene selection for gene expression study in shell gland and spleen of laying hens challenged with infectious bronchitis virus.

Authors:  Samiullah Khan; Juliet Roberts; Shu-Biao Wu
Journal:  Sci Rep       Date:  2017-10-27       Impact factor: 4.379

10.  Validating Reference Gene Expression Stability in Human Ovarian Follicles, Oocytes, Cumulus Cells, Ovarian Medulla, and Ovarian Cortex Tissue.

Authors:  Jesús Cadenas; Susanne Elisabeth Pors; Dmitry Nikiforov; Mengxue Zheng; Cristina Subiran; Jane Alrø Bøtkjær; Linn Salto Mamsen; Stine Gry Kristensen; Claus Yding Andersen
Journal:  Int J Mol Sci       Date:  2022-01-14       Impact factor: 5.923

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.