Literature DB >> 31208411

Defining housekeeping genes suitable for RNA-seq analysis of the human allograft kidney biopsy tissue.

Zijie Wang1, Zili Lyu2, Ling Pan3, Gang Zeng4, Parmjeet Randhawa5.   

Abstract

BACKGROUND: RNA-seq is poised to play a major role in the management of kidney transplant patients. Rigorous definition of housekeeping genes (HKG) is essential for further progress in this field. Using single genes or a limited set HKG is inherently problematic since their expression might be altered by specific diseases in the patients being studied.
METHODS: To generate a HKG set specific for kidney transplantation, we performed RNA-sequencing from renal allograft biopsies collected in a variety of clinical settings. Various normalization methods were applied to identify transcripts that had a coefficient of variation of expression that was below the 2nd percentile across all samples, and the corresponding genes were designated as housekeeping genes. Comparison with transcriptomic data from the Gene Expression Omnibus (GEO) database, pathway analysis and molecular biological functions were utilized to validate the housekeeping genes set.
RESULTS: We have developed a bioinformatics solution to this problem by using nine different normalization methods to derive large HKG gene sets from a RNA-seq data set of 47,611 transcripts derived from 30 biopsies. These biopsies were collected in a variety of clinical settings, including normal function, acute rejection, interstitial nephritis, interstitial fibrosis/tubular atrophy and polyomavirus nephropathy. Transcripts with coefficient of variation below the 2nd percentile were designated as HKG, and validated by showing their virtual absence in diseased allograft derived transcriptomic data sets available in the GEO. Pathway analysis indicated a role for these genes in maintenance of cell morphology, pyrimidine metabolism, and intracellular protein signaling.
CONCLUSIONS: Utilization of these objectively defined HKG data sets will guard against errors resulting from focusing on individual genes like 18S RNA, actin & tubulin, which do not maintain constant expression across the known spectrum of renal allograft pathology.

Entities:  

Keywords:  Genes with housekeeping functions; Kidney transplantation; RNA-sequencing

Mesh:

Substances:

Year:  2019        PMID: 31208411      PMCID: PMC6580566          DOI: 10.1186/s12920-019-0538-z

Source DB:  PubMed          Journal:  BMC Med Genomics        ISSN: 1755-8794            Impact factor:   3.063


Background

During the last decade, remarkable advances have been achieved in clinical medicine by the application of DNA microarray technology. Molecular signatures relevant to the diagnosis, prognosis and therapy have been discovered for numerous diseases [1-3]. In recent years, RNA-sequencing (RNA-seq) has been recognized as an attractive alternate technology for the same purpose. Compared to microarrays, RNA-seq provides a more comprehensive profiling of the transcriptome, with better quantitation, over a wider dynamic range, while allowing single base resolution, and detection of isoforms, RNA editing events, microRNAs and long noncoding RNAs [4-6]. The technology has been refined sufficiently to allow mRNA profiling of single cells. Challenges among the application of RNA-seq in clinical medicine include the need for an experimental design that includes sufficient numbers of biologic and technical replicates, and implementation of a mathematically valid bioinformatics pipeline to mine the large volume of data generated at a reasonable cost [7, 8]. The application of RNA-seq to the allograft kidney is at a very rudimentary stage. Rigorous definition of housekeeping genes (HKG) is essential for further progress in this field. HKG can be defined as genes ubiquitously expressed in all tissue compartments and cell-types regardless of their developmental stage, physiological condition and exposure to external stimuli [9]. HKG used in traditional clinical studies and classical biology experiments include 18S RNA, 28S RNA, tubulins, beta-actins, and glyceraldehyde-3-phosphate dehydrogenase (GAPDH). However, it is known that the expression of these genes is not constant through the cell cycle, and is further altered in response to injurious stimuli. Indeed 18 s RNA Is one of the biomarkers associated with acute rejection [10]. Actin is upregulated in chronic allograft dysfunction [11]. Tubulin is targeted by Colchicine, a drug used in patients with gout: it inhibits microtubule polymerization by binding to tubulin and block mitosis by acting as a ‘spindle poison’ [12]. These examples illustrate how use of single genes or a limited set HKG can be inherently problematic. One potential solution to the problem is to use bioinformatics techniques and derive large HKG data sets for evaluation of high throughput gene expression data. This will ensure that alteration of a small number of genes due to experimental conditions does not unfavorably affect the overall data analysis. Accordingly, this study has developed HKG gene sets appropriate for assessment of differential gene expression using nine different nine normalization methods that include library size, total counts (TC), upper quartile (UQ), Median, Quantile, trimmed mean of M -values (TMM), reads per kilobase million (RPKM), transcripts per kilobase million (TPM) and DESeq. HKG lists are offered that are specific to particular normalization paradigms. In addition, there is a universal set of 42 housekeeping transcripts that are common to all nine individual analyses.

Methods

Clinical material

This study was approved by the University of Pittsburgh IRB (protocol # 10110393). Formalin fixed paraffin embedded renal allograft biopsies (n = 25) were derived from recipients diagnosed with acute tubular injury (ATI; n = 5), T cell-mediated rejection (TCMR; n = 5), interstitial fibrosis and tubular atrophy (IFTA; n = 5), and BK virus-associated nephropathy (BKVN; n = 5), as well as recipients with stable allograft function (STA; n = 5). Five native kidney biopsies with interstitial nephritis (ISN; n = 5) were also studied. The clinical and pathology parameters pertinent to these specimens have been published previously [13].

RNA sequencing

RNA was extracted from 1 cubic mm pieces of formalin fixed paraffin embedded biopsy tissue using the Invitrogen PureLink™ FFPE RNA Isolation Kit (Catalog number: K156002), which includes a melting buffer to remove paraffin and a Proteinase K digestion step. cDNA libraries were constructed from 100 ng total RNA obtained using the Ion Ampliseq Transcriptome Human Gene Expression Kit from Life Technologies (Cat# A26325) and the manufacturers recommended protocol. Ampliseq Transcriptome analysis was performed by PrimBio Research Institute LLC, Exton, PA, USA, using an Ion Proton sequencer Ion Proton P1 chips, IonXpress barcodes, and Torrrent_Suite 5.0.4 software according to the manufacturer’s instructions. Briefly, Library Amp Primers were employed to amplify the purified cDNA library by PCR, and the yield and size of distribution of each library was run on Agilent 2100 Bioanalyzer. Approximately 100 pM of pooled barcoded libraries were used for templating using the Life Technologies Ion Chef Kit. Raw sequence files (fastq) were aligned to the human transcriptome (hg19) reference sequences in StrandNGS software. Gene and transcript annotations were retrieved from the Ensembl database to generate aligned SAM files, which were filtered on read quality (> 15), alignment score (≥90), match count (≤1) and mapping quality (≥25). RNA-seq quality control data on these biopsies has been published has been published [14]. RNA purity assessed by the A260/A280 ratio ranged from 1.87 to 2.0. RNA fragments of greater than 200 nucleotides in length comprised greater than 30% of the total RNA concentration. The mean sequence length in this RNA-Seq data set ranged from 66 to 117 nucleotides. Greater than 98.5% of the reads aligned to the human transcriptome with accuracy rates of greater than 97%. Our data has been submitted to the GEO database (GSE120495).

Definition of HKG/normalization methods

The term HKG has been conceived to refer to genes responsible for maintenance of fundamental cellular function. These genes are ubiquitously expressed at approximately the same level in all cell-types regardless of developmental stage, physiological condition and presence of external stimuli [9, 15]. In this study, genes with expression coefficients of variance (CV) corresponding to the 2nd percentile across all 30 samples were assigned to the HKG category as has been suggested in the literature [16]. In a dataset of 47,613 genes this corresponded to 952 genes with the lowest CV. CV was calculated as the ratio of the standard deviation (SD) σ to the arithmetical mean μ of each gene. Comparison of RNA-seq expression values across multiple samples requires normalization of data. Several normalization algorithms have been described in the literature, and we explored nine different methods, namely, library size, TC, UQ, Median, Quantile, RPKM, TPM, TMM and DESeq. Briefly, library size refers to the number of reads that aligned to the human genome. TC refers to transcript counts that remained after removing genes with an expression value of zero in all samples. The UQ scaling factor was calculated as a ratio of the 75th percentile of counts for each sample divided by the mean 75th percentile in all 30 samples [17]. The median scaling factor was obtained in the same manner using the 50th percentile [18]. Quantile normalization was implemented in R software using the “normalizeQuantiles()” function in the EBSeq package (Bioconductor version 3.6). This method sorts the test and reference distributions and proceeds to assign the highest value in latter to the highest value in the former [19]. The RPKM method attempts to normalize first for sequencing depth (per million reads) and then gene length (expressed in kilobases) [20]. TPM normalization proceeds in the reverse order: first, the raw read counts are divided by the length of the gene in kilobases, and then divided by the “per million” scaling factor [21]. TMM normalization was performed using the “calcNormFactors()” function in the edgeR package. The TMM method calculates a scaling factor based on a weighted trimmed mean of log gene expression ratios based on the assumption that most genes are not differentially expressed. Weights are assigned to account for the fact that genes with larger RNA-seq counts have lower variance, and data from both the upper and lower ends are trimmed prior to deriving a scaling factor for the sample library size [22]. Finally, DESeq normalization was implemented in DESeq package by calling the “estimateSizeFactors()” and “sizeFactors()” functions, which are also based on the hypothesis that most genes in the RNA-seq are not differentially expressed [23]. The performance of different normalization methods on our dataset was compared by calculating the bias and variance of genes in each HKG set [24]. The following formulae were used for the calculation of bias and variance, respectively: In these formulae, the Kij represents the normalized read counts for ith gene from the jth sample, where the is the mean value of normalized read counts of each gene across 30 samples.

Validation of HKG using published datasets

It was reasoned that genes classified HKG in this study would have minimal representation in lists of genes known to be differentially expressed in disease states that affect the kidney. Accordingly, we sought overlaps between the HKG dataset, and published gene sets derived from biopsy with T-cell mediated rejection, antibody mediated rejection, polyomavirus nephropathy, and chronic allograft damage [25-28]. Probe sets used to define disease associated genes in these studies were extracted from the NCBI GEO (Gene Expression Omnibus) database, and the corresponding gene and transcript annotations were obtained from the Ensembl database. Overlaps between gene lists of interest were defined by the “Compare” tool available in IPA® (Ingenuity Pathway Analysis) software (QIAGEN Biotechnology, Venlo, Netherlands). IPA core analysis was used to define the top-ranked canonical pathways and molecular functions associated with HKGs. A flow diagram of the steps used to identify and validate HKG in this study is presented as Fig. 1.
Fig. 1

Flow diagram of the steps used to identify and validate HKG genes in this study

Flow diagram of the steps used to identify and validate HKG genes in this study

Results

Identification of housekeeping genes

The mean number of reads with a quality score > Q30 obtained from the 30 biopsies ranged from 19 to 28 million, and yielded a total of 57,738 distinct reads that aligned to the hg19 human reference genome. After removing genes with an extracted expression value of zero in all biopsies, 47,613 transcripts remained for further consideration. Nine different HKG sets were created, one for each normalization method. Individual HKG expression accounted for only a small percentage of the total transcription activity in the samples. This is suggested by our calculation of expression ratios that represent mean normalized transcript counts of individual genes expressed as a proportion of the maximal transcript read count in the entire sample set. The numerical value of these expression ratios was less than < 0.05% for > 70% of the HKGs. (Table 1). The median coefficient of variation associated with most normalization methods was comparable (~ 0.3) except for the RPKM and TC methods where it was substantially higher (0.66 & 0.43 respectively) (Fig. 2a). The bias and variance of gene expression measurements was also the highest for these same two normalization methods (Table 1) indicating that the other methods tested by us provide much better data normalization. Similar results were obtained if CVs were calculated for the 42 HKG common to all normalization methods (Fig. 2b).
Table 1

Summary of HKG Datasets Defined in This Study Using 9 Different Normalization Methods

Normalization methodsExpression ratio*Bias**Variance**
0–0.01 (%)0.01–0.05 (%)0.05–0.20 (%)0.20–0.40 (%)0.40–0.60 (%)0.60–0.80 (%)0.80–1.0 (%)
TC396 (41.60)473 (49.68)78 (8.19)1 (0.11)1 (0.11)0 (0)3 (0.32)0.740.55
UQ216 (22.69)612 (64.29)115 (12.08)3 (0.32)4 (0.42)1 (0.11)1 (0.11)0.450.21
Median157 (16.49)643 (67.54)142 (14.92)7 (0.74)2 (0.21)0 (0)1 (0.11)0.450.22
Quantile125 (13.13)655 (68.80)161 (16.91)6 (0.63)3 (0.32)1 (0.11)1 (0.11)0.420.18
TMM236 (24.79)599 (62.92)108 (11.34)4 (0.42)4 (0.42)0 (0)1 (0.11)0.470.23
DESeq231 (24.26)610 (64.08)104 (10.89)4 (0.42)2 (0.21)0 (0)1 (0.11)0.430.19
TPM157 (16.49)643 (67.54)142 (14.92)7 (0.74)2 (0.21)0 (0)1 (0.11)0.450.22
RPKM603 (63.34)319 (33.51)26 (2.73)2 (0.21)0 (0)0 (0)2 (0.21)1.041.03
Lib_size202 (21.22)617 (64.81)123 (12.92)7 (0.74)2 (0.21)0 (0)1 (0.11)0.430.20

Abbreviations: TC total counts, UQ upper quantile, TMM trimmed mean of M-values, DESeq a differential expression package implemented in R, TPM transcripts per kilobase million, RPKM reads per kilobase per million mapped reads, Lib_size library size

*The expression ratio of each housekeeping gene was calculated by its mean normalized read divided by the maximum reads in its corresponding HKG set

**The bias and variance of each normalization method was calculated by the formulae

Fig. 2

Box plots showing the median, first quartile, third quartile, and range of CV (coefficient of variance) for all 952 HKG defined by nine different normalization algorithms (a) and for the subset of 42 HKG common to all nine normalization methods (b) . a The median values (range) of CV in 952 HKGs defined by RPKM and TC are 0.67 (0.65–0.69) and 0.44 (0.41–0.45), respectively; whereas the mean values of CV defined by UQ, Median, Quantile, TMM, DESeq, TPM and Library size are 0.31 (0.28–0.33), 0.29 (0.27–0.31), 0.29 (0.26–0.31), 0.31 (0.29–0.33), 0.30 (0.27–0.32), 0.29 (0.27–0.31), 0.29 (0.26–0.31), respectively. b The median values (range) of CV in 42 common HKGs defined by RPKM and TC are 0.67 (0.65–0.69) and 0.43 (0.42–0.45), respectively; whereas the mean values of CV defined by UQ, Median, Quantile, TMM, DESeq, TPM and Library size are 0.28 (0.26–0.31), 0.25 (0.23–0.28), 0.25 (0.22–0.29), 0.26 (0.24–0.30), 0.25 (0.22–0.29), 0.25 (0.23–0.28), 0.25 (0.22–0.28), respectively. TC: total counts; UQ: upper quartile; TMM: trimmed mean of M-values; TPM: transcripts per kilobase million; RPKM: reads per kilobase per million mapped reads; e (see Materials and Methods section for details)

Summary of HKG Datasets Defined in This Study Using 9 Different Normalization Methods Abbreviations: TC total counts, UQ upper quantile, TMM trimmed mean of M-values, DESeq a differential expression package implemented in R, TPM transcripts per kilobase million, RPKM reads per kilobase per million mapped reads, Lib_size library size *The expression ratio of each housekeeping gene was calculated by its mean normalized read divided by the maximum reads in its corresponding HKG set **The bias and variance of each normalization method was calculated by the formulae Box plots showing the median, first quartile, third quartile, and range of CV (coefficient of variance) for all 952 HKG defined by nine different normalization algorithms (a) and for the subset of 42 HKG common to all nine normalization methods (b) . a The median values (range) of CV in 952 HKGs defined by RPKM and TC are 0.67 (0.65–0.69) and 0.44 (0.41–0.45), respectively; whereas the mean values of CV defined by UQ, Median, Quantile, TMM, DESeq, TPM and Library size are 0.31 (0.28–0.33), 0.29 (0.27–0.31), 0.29 (0.26–0.31), 0.31 (0.29–0.33), 0.30 (0.27–0.32), 0.29 (0.27–0.31), 0.29 (0.26–0.31), respectively. b The median values (range) of CV in 42 common HKGs defined by RPKM and TC are 0.67 (0.65–0.69) and 0.43 (0.42–0.45), respectively; whereas the mean values of CV defined by UQ, Median, Quantile, TMM, DESeq, TPM and Library size are 0.28 (0.26–0.31), 0.25 (0.23–0.28), 0.25 (0.22–0.29), 0.26 (0.24–0.30), 0.25 (0.22–0.29), 0.25 (0.23–0.28), 0.25 (0.22–0.28), respectively. TC: total counts; UQ: upper quartile; TMM: trimmed mean of M-values; TPM: transcripts per kilobase million; RPKM: reads per kilobase per million mapped reads; e (see Materials and Methods section for details)

Validation of housekeeping genes

By definition, HKG maintain basic cellular functions and their expression does should not change in different disease states. This prediction was verified using public datasets for 4 common pathologic conditions in the kidney, namely, TCMR, antibody mediated rejection (ABMR), BKVN and chronic allograft injury manifesting as IFTA (Table 2). None of the 952 genes identified as HKG in this study were differentially expressed in human allograft biopsies with TCMR. For the gene lists associated with the remaining biopsy-diagnoses, an overlap of no more than 3 genes was seen with our HKG lists. This is remarkable since Gene Expression Omnibus data used in these comparisons was derived from more than 1000 biopsies.
Table 2

Overlapsa Between Gene Expression Datasets Derived from Diseased Allograft Kidney & HKG Defined in This Study

Reference#Biopsies#of DE transcriptsBiopsy DiagnosisNormalization method Used to Define HKG
TC (%)UQ (%)Median (%)Quantile (%)TMM (%)DESeq (%)TPM (%)RPKM (%)Lib_siz (%)
[25]703453ABMR2(0.40)2(0.44)2(0.44)2(0.44)1(0.22)2(0.44)2(0.44)8(1.77)1(0.22)
[26]70882TCMR0(0)0(0)0(0)0(0)0(0)0(0)0(0)0(0)0(0)
[27]168206BKVN3(1.46)5(2.43)3(1.46)3(1.46)3(1.46)3(1.46)3(1.46)3(1.46)3(1.46)
[28]20482Chronic allograft damage1(1.22)1(1.22)2(2.44)2(2.44)1(1.22)1(1.22)2(2.44)1(1.22)2(2.44)

Abbreviations: ABMR antibody mediated rejection, DE differentially expressed, TCMR T-cell mediated rejection, BKVN polyomavirus nephropathy, For other abbreviations, see legend to Table 1

aThe total number of overlapping genes with the specified datasets is enumerated

Overlapsa Between Gene Expression Datasets Derived from Diseased Allograft Kidney & HKG Defined in This Study Abbreviations: ABMR antibody mediated rejection, DE differentially expressed, TCMR T-cell mediated rejection, BKVN polyomavirus nephropathy, For other abbreviations, see legend to Table 1 aThe total number of overlapping genes with the specified datasets is enumerated As an alternate approach to validating the HKG datasets obtained in this study, we compared the constituent genes with HKG lists defined by other investigators using varied technical approaches including expressed sequence tags, DNA microarray, RNA-seq, and massively parallel signature sequencing (Table 3) [29-35]. HKG derived from sequencing based technologies gave the largest number of genes (279 to 656) in common with our own RNA-seq derived gene list. There were fewer (80 to 117) genes shared with microarray technology-based lists. It is apparent that HKG gene identification can be affected by both the normalization method used as well as the technology applied to measure gene expression. The type of tissue analyzed is also an important variable. Whereas all our samples represent the allograft kidney, the aforementioned prior studies included multiple organs in their analysis. Thus, different HKG gene sets can be equally valid depending on the clinical setting and sample set being investigated.
Table 3

Comparison of Published housekeeping genes with HKG Datasets Defined in This Study

Study#samples#HKG#Tissues/cells studiedTechniqueNormalization methodHousekeeping Gene Set Stratified by Normalization Methoda
TC (%)UQ (%)Median (%)Quantile (%)TMM (%)DESeq (%)TPM (%)RPKM (%)Lib_siz (%)
(4)142178979MicroarrayNAb94 (5.25)91 (5.09)115 (6.43)111 (6.20)92 (5.14)89 (4.97)115 (6.43)76 (4.25)96 (5.37)
(5)18240318MicroarrayNA110 (4.58)145 (6.03)158 (6.58)161 (6.70)132 (5.49)124 (5.16)158 (6.58)103 (4.29)133 (5.53)
(6)42152242MicroarrayNA89 (5.84)112 (7.36)117 (7.69)115 (7.56)87 (5.72)88 (5.78)117 (7.69)80 (5.26)92 (6.04)
(6)NA15,05032sequencing_MPSSTPM516 (3.43)578 (3.84)566 (3.76)581 (3.86)550 (3.65)559 (3.71)566 (3.76)489 (3.25)559 (3.71)
(7)2502690918Sequencing_ESTNA398 (5.76)542 (7.84)533 (7.71)551 (7.98)463 (6.70)458 (6.63)533 (7.71)369 (5.34)471 (6.82)
(8)NA12,71419sequencing_ESTNA583 (4.59)627 (4.93)642 (5.05)656 (5.16)610 (4.80)620 (4.88)642 (5.05)546 (4.29)628 (4.94)
(9)NA789632RNA-SeqRPKM514 (6.51)628 (7.95)654 (8.28)656 (8.31)601 (7.61)594 (7.52)654 (8.28)441 (5.59)615 (7.79)
(10)16380416RNA-SeqRPKM279 (7.33)361 (9.49)372 (9.78)379 (9.96)317 (8.33)315 (8.28)372 (9.78)212 (5.57)329 (8.64)

Abbreviations: EST expressed sequence tags, HKG housekeeping genes, MPSS Massively parallel signature sequencing, ABMR antibody mediated rejection, DE differentially expressed, TCMR T-cell mediated rejection, PVAN polyomavirus nephropathy, NA not available; for other abbreviations, see legend to Table 1

aThe total number (%) of overlapping genes with the specified datasets is enumerated. Percentage calculations are based on the total number of HKG in column 3

bThe normalization methods in these references were not mentioned, but the most common method used for microarray data is Quantile normalization

Comparison of Published housekeeping genes with HKG Datasets Defined in This Study Abbreviations: EST expressed sequence tags, HKG housekeeping genes, MPSS Massively parallel signature sequencing, ABMR antibody mediated rejection, DE differentially expressed, TCMR T-cell mediated rejection, PVAN polyomavirus nephropathy, NA not available; for other abbreviations, see legend to Table 1 aThe total number (%) of overlapping genes with the specified datasets is enumerated. Percentage calculations are based on the total number of HKG in column 3 bThe normalization methods in these references were not mentioned, but the most common method used for microarray data is Quantile normalization

Pathway analyses

Ingenuity pathway analysis was performed on 42 genes common to 9 HKG sets derived from different normalization methods. The Entrez gene names and molecular functions of these genes are listed in Table 4. The majority are involved in chromatin, core promoter, DNA, mRNA, protein, or ATP binding, while others represent ubiquitous enzymes belonging to the protein kinase, phosphatase, protease, ligase, ATPase or GTPase family. Physiologic functions mediated by these housekeeping genes included regulation of the cell cycle, cell to cell signaling, post-translational modifications, cell morphology, cell movement, molecular transport, and lipid or nucleic acid metabolism (Figs. 3 and 4, Table 5). The top 4 canonical pathways identified all involved de novo or salvage pathways of pyrimidine biosynthesis, including pyrimidine ribonucleotides interconversion, pyrimidine ribonucleotides de novo biosynthesis, and pyrimidine deoxyribonucleotides de novo biosynthesis. Notably, less than 5% of the genes in these pathways met the criterion for being classified as a housekeeping gene. The majority of the remaining canonical pathways were related to protein signaling mediated by the protein kinase A, p38 MAPK, RhoA, CREB, ERK/MAPK, Eif4, p70S6K, IL-12, glucocorticoid receptor, estrogen receptor, or progesterone receptor pathways.
Table 4

Housekeeping Genes (n = 42) Common to All Normalization Methods

Entrez Gene IDTranscriptsEntrez Gene NameMolecular function
51,433ANAPC5anaphase promoting complex subunit 5protein phosphatase binding
25,906ANAPC15anaphase promoting complex subunit 15anaphase-promoting complex
10,620ARID3BAT-rich interaction domain 3Btranscription regulator
285,598ARL10ADP ribosylation factor like GTPase 10small GTPase mediated signal transduction
6311ATXN2ataxin 2epidermal growth factor receptor binding
57,020C16orf62chromosome 16 open reading frame 62protein binding
132,200C3orf49chromosome 3 open reading frame 49unknown
55,749CCAR1cell division cycle and apoptosis regulator 1core promoter binding
202,243CCDC125coiled-coil domain containing 125regulation of cell motility
60,492CCDC90Bcoiled-coil domain containing 90Bprotein binding
55,743CHFRcheckpoint with forkhead and ring finger domainsE3 ubiquitin-protein ligase
207,063DHRSXdehydrogenase/reductase X-linkedoxidoreductase activity
83,786FRMD8FERM domain containing 8protein binding
26,088GGA1golgi associated, gamma adaptin ear containing, ARF binding protein 1cellular protein metabolic process
26,091HERC4HECT and RLD domain containing E3 ubiquitin protein ligase 4transferase activity; ubiquitin-protein ligase activity
8569MKNK1MAP kinase interacting serine/threonine kinase 1ATP binding; calcium-dependent protein serine/threonine kinase activity
4678NASPnuclear autoantigenic sperm proteinhistone binding; Hsp90 protein binding;
4833NME4NME/NM23 nucleoside diphosphate kinase 4ubiquitous enzymes
55,611OTUB1OTU deubiquitinase, ubiquitin aldehyde binding 1NEDD8-specific protease activity
11,243|100,527,963PMF1/PMF1-BGLAPpolyamine modulated factor 1leucine zipper domain binding
5431POLR2BRNA polymerase II subunit Bchromatin binding
11,128POLR3ARNA polymerase III subunit Achromatin binding
84,197POMKprotein-O-mannose kinaseATP binding; carbohydrate kinase activity
379,025PSMA3-AS1PSMA3 antisense RNA 1unknown
5784PTPN14protein tyrosine phosphatase, non-receptor type 14hydrolase activity; phosphatase activity
51,735RAPGEF6Rap guanine nucleotide exchange factor 6GTP-dependent protein binding
5966RELREL proto-oncogene, NF-kB subunitchromatin binding; DNA binding
8568RRP1ribosomal RNA processing 1RNA binding
146,923RUNDC1RUN domain containing 1GTPase activator activity; Rab GTPase binding
55,095SAMD4Bsterile alpha motif domain containing 4BmRNA binding;
22,950SLC4A1APsolute carrier family 4 member 1 adaptor proteinmRNA binding; protein binding
7871SLMAPsarcolemma associated proteinprotein binding
50,485SMARCAL1SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily a like 1ATP binding; DNA-dependent ATPase activity
9342SNAP29synaptosome associated protein 29protein binding; SNAP receptor activity
23,020SNRNP200small nuclear ribonucleoprotein U5 subunit 200ATP binding; ATP-dependent helicase activity
6827SUPT4H1SPT4 homolog, DSIF elongation factor subunitmetal ion binding; protein binding
25,771TBC1D22ATBC1 domain family member 22A14–3-3 protein binding; GTPase activator activity
440,944THUMPD3-AS1THUMPD3 antisense RNA 1unknown
100,506,779TSPOAP1-AS1TSPOAP1 antisense RNA 1unknown
10,844TUBGCP2tubulin gamma complex associated protein 2gamma-tubulin binding
23,038WDTC1WD and tetratricopeptide repeats 1enzyme inhibitor activity
27,300ZNF544zinc finger protein 544DNA binding; metal ion binding
Fig. 3

Canonical pathways identified by IPA core analysis as over-represented amongst 42 HKG common to 9 different data normalization methods. Pathways meeting statistical confidence thresholds preset in IPA are identified on the Y-axis (−log10 p = 1.3, right-tailed Fisher’s exact test). The lower X-axis and the line diagram display the proportion of total genes in the specified pathway that meet the cutoff criteria for identification

Fig. 4

Top 20 physiologic functions associated with 42 HKG common to all biopsies and normalization methods. Physiological functions meeting statistical confidence thresholds (−log10 p = 1.3, right-tailed Fisher’s exact test)

Table 5

Canonical Pathways identified by IPA software for 42 Housekeeping Genes Common to All Normalization Methods

Ingenuity Canonical Pathways-log(p-value)RatioMolecules
Pyrimidine Ribonucleotides Interconversion2.580.0444NME4,SMARCAL1
Pyrimidine Ribonucleotides De Novo Biosynthesis2.550.0426NME4,SMARCAL1
Salvage Pathways of Pyrimidine Ribonucleotides1.950.0211NME4,POMK
Pyrimidine Deoxyribonucleotides De Novo Biosynthesis I1.420.0435NME4
Nucleotide Excision Repair Pathway1.240.0286POLR2B
Assembly of RNA Polymerase II Complex1.090.02POLR2B
Pyridoxal 5′-phosphate Salvage Pathway0.9830.0154POMK
Mitotic Roles of Polo-Like Kinase0.9770.0152ANAPC5
Protein Kinase A Signaling0.8360.005PTPN14,ANAPC5
Androgen Signaling0.7670.00901POLR2B
p38 MAPK Signaling0.7360.00833MKNK1
RhoA Signaling0.7230.00806RAPGEF6
Estrogen Receptor Signaling0.7110.00781POLR2B
Hereditary Breast Cancer Signaling0.6650.00694POLR2B
IL-12 Signaling and Production in Macrophages0.660.00685REL
Regulation of eIF4 and p70S6K Signaling0.6320.00637MKNK1
CREB Signaling in Neurons0.5680.00538POLR2B
RAR Activation0.560.00526REL
ERK/MAPK Signaling0.5410.005MKNK1
Systemic Lupus Erythematosus Signaling0.4990.00444SNRNP200
Huntington’s Disease Signaling0.4610.004POLR2B
Protein Ubiquitination Pathway0.4410.00377ANAPC5
Glucocorticoid Receptor Signaling0.3580.00295POLR2B
Axonal Guidance Signaling0.270.00221MKNK1
Housekeeping Genes (n = 42) Common to All Normalization Methods Canonical pathways identified by IPA core analysis as over-represented amongst 42 HKG common to 9 different data normalization methods. Pathways meeting statistical confidence thresholds preset in IPA are identified on the Y-axis (−log10 p = 1.3, right-tailed Fisher’s exact test). The lower X-axis and the line diagram display the proportion of total genes in the specified pathway that meet the cutoff criteria for identification Top 20 physiologic functions associated with 42 HKG common to all biopsies and normalization methods. Physiological functions meeting statistical confidence thresholds (−log10 p = 1.3, right-tailed Fisher’s exact test) Canonical Pathways identified by IPA software for 42 Housekeeping Genes Common to All Normalization Methods

Discussion

The primary purpose of this study was to identify HKG appropriate for analyzing RNA-seq data derived from human renal allograft biopsies. It is expected that RNA-seq technology will be increasingly applied to discover molecular signatures relevant to the diagnosis, prognosis and therapy of diseases that commonly afflict kidney transplant recipients. The work performed has identified 9 HKG sets using different normalization methods and the question arises which gene set is most applicable to the analysis of gene expression data derived from renal allograft biopsies. Zyprych-Walczak et al. [36] analyzed transcripts from mammary epithelial cell lines, B-cells, and blood or bone marrow samples from patients with acute myeloid leukemia. They compared six normalization algorithms with respect to sensitivity, specificity, classification errors, and generation of diagnostic plots, and found that bias and variance were appropriate indices to compare the performance of different normalization methods. Application of this principle to our data indicates normalization using only the TC or RPKM methods is not advisable. The other normalization methods give essentially comparable results, although the quantiles method is nominally better than all the others that were tested. The basic idea behind RPKM is to normalize the reads first by total reads and then by gene length. Previous studies have confirmed the suboptimal performance of this method [17, 37, 38]. Interestingly, better performance was seen with TPM which differs from RPKM only in that normalization for gene length precedes correction for total reads. This reversal in the order of operations led to relatively uniform transcript counts in all 30 biopsies. However, in one prior study both TC and RPKM led to unsatisfactory results [39]. Two prior investigations noted that the quantile method is associated with lower variance in observed gene expression data, but there is a tradeoff that results in the introduction of some bias [19, 40]. Another study reported that DESeq method is the best for the normalization [39]. Our assessment of the published literature is that no single normalization method can be universally recommended for all data sets. HKG lists vary depending on study design, tissues analyzed, sequencing technology, normalization methods, as well as criteria and tools for housekeeping gene selection [41, 42]. Data distribution and analytic plans can influence the choice of normalization method: e.g. if most genes have low expression, upper quantile rather than median normalization should be preferred. On the other hand, if differential expression is to be performed by the DESeq program, the normalization algorithm incorporated in the software can work directly with unnormalized RNA-seq counts. Finally, we suggest that when working with renal allograft biopsies, the problem of choosing the right HKG set can be circumvented by using the list of 42 genes (Table 4) that is common to gene sets derived by 9 different algorithms. The HKG proposed in this study have been validated with reference to publicly available external gene expression datasets obtained on an independent platform, namely, the Affymetrix DNA microarray analysis system. These latter datasets were derived from kidney transplant biopsies with TCMR, ABMR, BKVN or i-IFTA. A second observation that validates our HKG gene lists is that these share up to 656 genes with other RNA-seq derived gene lists in the literature. Finally, our IPA analysis is consistent with the proposed housekeeping function of these genes, and is concordant with putative cellular and biologic functions of other HKG reported in the literature. These reported functions include RNA processing, RNA splicing, DNA repair and mRNA metabolic processes [43], cell morphology and signaling, defense/apoptosis, ribosomal protein signaling/communication, structure/motility [44, 45], and biogenesis of nucleotides/amino acids and protein localization [35]. It is to be noted that some genes such as GAPHD and beta actin (ACTB), which are widely used in biological experiments as housekeeping controls, do not appear in our HKG set [46-48]. Likewise, our gene list does not include 8 genes that have been suggested to be suitable as a reference set for studies of the non-transplanted kidney [49].

Conclusion

In summary, we have developed several different HKG gene lists applicable to RNA-seq data derived from for human allograft kidney biopsies and processed by a variety of normalization methods. We have also assembled a universal set of 42 HKG that can be used without regard to the actual normalization procedure used. The study is limited by the small number of biopsies studied and use of formalin fixed paraffin embedded tissue, which may not be optimal to detect genes expressed at low abundance. However, low abundance genes have high variance and are not good candidates for the HKG designation. Importantly, the general bioinformatics approach that we have outlined is applicable to define HKG for RNA-seq datasets of any size and RNA quality for transplantation of all organs. Appropriate normalization of samples with a comprehensive set of HKG provides a mechanism to correct for batch effects, which can be a significant obstacle in the implementation of RNA-seq as a monitoring tool in the transplant clinic.
  49 in total

1.  Analysis of boutique arrays: a universal method for the selection of the optimal data normalization procedure.

Authors:  Barbara Uszczyńska; Joanna Zyprych-Walczak; Luiza Handschuh; Alicja Szabelska; Maciej Kaźmierczak; Wiesława Woronowicz; Piotr Kozłowski; Michał M Sikorski; Mieczysław Komarnicki; Idzi Siatkowski; Marek Figlerowicz
Journal:  Int J Mol Med       Date:  2013-07-15       Impact factor: 4.101

2.  Diagnosis of T-cell-mediated kidney rejection in formalin-fixed, paraffin-embedded tissues using RNA-Seq-based machine learning algorithms.

Authors:  Peng Liu; George Tseng; Zijie Wang; Yuchen Huang; Parmjeet Randhawa
Journal:  Hum Pathol       Date:  2018-10-06       Impact factor: 3.466

3.  Role of tumor necrosis factor-α in epithelial-to-mesenchymal transition in transplanted kidney cells in recipients with chronic allograft dysfunction.

Authors:  Chunchun Zhao; Zhen Xu; Zijie Wang; Chuanjian Suo; Jun Tao; Zhijian Han; Min Gu; Ruoyun Tan
Journal:  Gene       Date:  2017-11-23       Impact factor: 3.688

Review 4.  RNA Sequencing of the Tumor Microenvironment in Precision Cancer Immunotherapy.

Authors:  Denise Lau; Alexandria M Bobe; Aly A Khan
Journal:  Trends Cancer       Date:  2019-03-08

5.  Intragraft Antiviral-Specific Gene Expression as a Distinctive Transcriptional Signature for Studies in Polyomavirus-Associated Nephropathy.

Authors:  Tara K Sigdel; Oriol Bestard; Nathan Salomonis; Szu-Chuan Hsieh; Joan Torras; Maarten Naesens; Tim Q Tran; Silke Roedder; Minnie M Sarwal
Journal:  Transplantation       Date:  2016-10       Impact factor: 4.939

6.  Differential expression analysis for sequence count data.

Authors:  Simon Anders; Wolfgang Huber
Journal:  Genome Biol       Date:  2010-10-27       Impact factor: 13.583

7.  Comparing the normalization methods for the differential analysis of Illumina high-throughput RNA-Seq data.

Authors:  Peipei Li; Yongjun Piao; Ho Sun Shon; Keun Ho Ryu
Journal:  BMC Bioinformatics       Date:  2015-10-28       Impact factor: 3.169

8.  Systematic identification of human housekeeping genes possibly useful as references in gene expression studies.

Authors:  Maria Caracausi; Allison Piovesan; Francesca Antonaros; Pierluigi Strippoli; Lorenza Vitale; Maria Chiara Pelleri
Journal:  Mol Med Rep       Date:  2017-07-06       Impact factor: 2.952

9.  How many human genes can be defined as housekeeping with current expression data?

Authors:  Jiang Zhu; Fuhong He; Shuhui Song; Jing Wang; Jun Yu
Journal:  BMC Genomics       Date:  2008-04-16       Impact factor: 3.969

10.  A robust (re-)annotation approach to generate unbiased mapping references for RNA-seq-based analyses of differential expression across closely related species.

Authors:  Montserrat Torres-Oliva; Isabel Almudi; Alistair P McGregor; Nico Posnien
Journal:  BMC Genomics       Date:  2016-05-24       Impact factor: 3.969

View more
  11 in total

1.  SAREV: A review on statistical analytics of single-cell RNA sequencing data.

Authors:  Dorothy Ellis; Dongyuan Wu; Susmita Datta
Journal:  Wiley Interdiscip Rev Comput Stat       Date:  2021-05-20

2.  Identification of Important Modules and Hub Gene in Chronic Kidney Disease Based on WGCNA.

Authors:  Jia Wang; Yuan Yin; Qun Lu; Yan-Rong Zhao; Yu-Jie Hu; Yun-Zhao Hu; Zheng-Yin Wang
Journal:  J Immunol Res       Date:  2022-05-04       Impact factor: 4.493

3.  In-silico performance, validation, and modeling of the Nanostring Banff Human Organ transplant gene panel using archival data from human kidney transplants.

Authors:  R N Smith
Journal:  BMC Med Genomics       Date:  2021-03-19       Impact factor: 3.063

4.  Development of an RNA sequencing panel to detect gene fusions in thyroid cancer.

Authors:  Dongmoung Kim; Seung-Hyun Jung; Yeun-Jun Chung
Journal:  Genomics Inform       Date:  2021-12-31

5.  Variable expression of eighteen common housekeeping genes in human non-cancerous kidney biopsies.

Authors:  Philipp Strauss; Håvard Mikkelsen; Jessica Furriol
Journal:  PLoS One       Date:  2021-12-09       Impact factor: 3.240

6.  Hypoxanthine Phosphoribosyl Transferase 1 Is Upregulated, Predicts Clinical Outcome and Controls Gene Expression in Breast Cancer.

Authors:  Melina J Sedano; Enrique I Ramos; Ramesh Choudhari; Alana L Harrison; Ramadevi Subramani; Rajkumar Lakshmanaswamy; Mina Zilaie; Shrikanth S Gadad
Journal:  Cancers (Basel)       Date:  2020-06-10       Impact factor: 6.639

7.  Catalogue of stage-specific transcripts in Ixodes ricinus and their potential functions during the tick life-cycle.

Authors:  Pavlina Vechtova; Zoltan Fussy; Radim Cegan; Jan Sterba; Jan Erhart; Vladimir Benes; Libor Grubhoffer
Journal:  Parasit Vectors       Date:  2020-06-16       Impact factor: 3.876

Review 8.  Non-coding RNAs in Brain Tumors, the Contribution of lncRNAs, circRNAs, and snoRNAs to Cancer Development-Their Diagnostic and Therapeutic Potential.

Authors:  Julia Latowska; Adriana Grabowska; Żaneta Zarębska; Konrad Kuczyński; Bogna Kuczyńska; Katarzyna Rolle
Journal:  Int J Mol Sci       Date:  2020-09-23       Impact factor: 5.923

9.  Technical considerations when designing a gene expression panel for renal transplant diagnosis.

Authors:  F Toulza; K Dominy; T Cook; J Galliford; J Beadle; A McLean; C Roufosse
Journal:  Sci Rep       Date:  2020-10-21       Impact factor: 4.379

10.  RNA sequencing of whole blood in dogs with primary immune-mediated hemolytic anemia (IMHA) reveals novel insights into disease pathogenesis.

Authors:  Corie Borchert; Adam Herman; Megan Roth; Aimee C Brooks; Steven G Friedenberg
Journal:  PLoS One       Date:  2020-10-22       Impact factor: 3.240

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.