
Differentially-Expressed Pseudogenes in HIV-1 Infection.

Aditi Gupta, C. Titus Brown, Yong-Hui Zheng, Christoph Adami.

Abstract

Not all pseudogenes are transcriptionally silent as previously thought. Pseudogene transcripts, although not translated, contribute to the non-coding RNA pool of the cell that regulates the expression of other genes. Pseudogene transcripts can also directly compete with the parent gene transcripts for mRNA stability and other cell factors, modulating their expression levels. Tissue-specific and cancer-specific differential expression of these "functional" pseudogenes has been reported. To ascertain potential pseudogene:gene interactions in HIV-1 infection, we analyzed transcriptomes from infected and uninfected T-cells and found that 21 pseudogenes are differentially expressed in HIV-1 infection. This is interesting because parent genes of one-third of these differentially-expressed pseudogenes are implicated in HIV-1 life cycle, and parent genes of half of these pseudogenes are involved in different viral infections. Our bioinformatics analysis identifies candidate pseudogene:gene interactions that may be of significance in HIV-1 infection. Experimental validation of these interactions would establish that retroviruses exploit this newly-discovered layer of host gene expression regulation for their own benefit.


Keywords:  HIV-1; RNASeq; differential gene expression; pseudogenes; transcriptome


Year:  2015        PMID: 26426037      PMCID: PMC4632377          DOI: 10.3390/v7102869

Source DB:  PubMed          Journal:  Viruses        ISSN: 1999-4915            Impact factor:   5.048


1. Introduction

Pseudogenes are considered relics of once-functional genes that have lost the ability to code for proteins. They arise due to mutations, frame-shifts and gene duplication events, where the duplicated gene loses transcriptional elements (like promoters and enhancers) necessary for gene expression [1]. Irrespective of how they are created, pseudogenes are traditionally assumed to be transcriptionally silent. However, microarray studies show that 2%–20% of pseudogenes are quantifiably expressed [1,2,3]. Tissue-specific expression of pseudogenes hints at their possible functional importance in those tissues [4,5]. Pseudogenes are also differentially expressed in diseases, notably cancer [6,7,8,9,10,11,12]. We are only beginning to understand the implications of pseudogene expression for cellular well-being. Pseudogenes can regulate the expression of parent genes by the following mechanisms [1]: (i) antisense pseudogene transcripts can pair with a sense parent transcript, altering the parent gene mRNA levels [13,14]; (ii) pseudogene transcripts are sometimes processed to produce small interfering RNAs that silence the parent gene [15,16]; and (iii) pseudogene transcripts can form competitive endogenous RNA (ceRNA) that competes with the parent gene transcripts for the shared pool of mRNA stability factors and miRNA [17,18,19,20]. Tumor suppressor gene PTEN, for example, is post-transcriptionally regulated by microRNAs that target transcripts of both PTEN and its pseudogene PTENP1. Yet, overexpression of the PTENP1 3′ UTR recovers PTEN mRNA levels, highlighting the functional role of pseudogene PTENP1 transcripts in regulating parent-gene mRNA abundance levels [20]. Moreover, transcriptional networks are reported to be cross-regulated with ceRNA networks, suggesting that pseudogene-derived transcripts play an integral role in regulating cellular gene expression [21].
Pseudogene-derived endogenous siRNA can also regulate the expression of genes other than its parent gene [12]. In cancer, pseudogenes have been shown to affect the expression of non-parent genes as well, such as tumor suppressor gene MGA, when integrated into their promoter regions [9]. By contributing substantially to the small-RNA pool of a cell, pseudogenes may regulate the expression of their parent gene, as well as other genes. The majority of pseudogenes in humans are “processed”, i.e., they are created when a reverse-transcribed mRNA is re-integrated into the genome, often on chromosomes different from the parent gene [1,22,23]. Thus, these “processed” pseudogenes lack introns and contain poly-A tails. They also have direct repeats at the 3′ and 5′ ends that likely participate in the genome re-integration step [1,22,24]. However, how are these processed mRNA transcripts re-integrated back into the genome? Some long interspersed elements (LINEs) code for reverse transcriptase that makes the cDNA copy of mRNAs, possibly contributing to the generation of the processed pseudogenes [9,25,26]. Thus, retroviruses, such as HIV-1, by carrying viral reverse transcriptase and integrase to the infected cells, may create new pseudogenes and alter the expression of existing pseudogenes [27,28]. Carlton et al. find evidence for the generation of a processed pseudogene during retroviral infection and speculate that viral reverse transcriptase is involved in the process [28]. It is thus likely that pseudogene:gene interactions and pseudogene-derived endogenous non-coding RNA constitute another dimension of virus-host interactions. The impact of non-coding RNAs on HIV-1 replication and pathogenesis has already been discussed elsewhere [29], such as the role of host-induced antiviral microRNA in limiting the progression of viral infection [30].
In this study, we investigate differential expression of pseudogenes in HIV-1 infection by comparing the transcriptomes of infected and uninfected T-cells seven days post-infection. By identifying parent and other genes that show sequence similarity to these pseudogenes, we predict candidate pseudogene:gene interactions that may be relevant to HIV-1 infection. We further analyze publicly-available transcriptomic data collected at 12 h and 24 h post-HIV-1 infection to identify time-course changes in the expression of pseudogenes and their parent genes. Our results demonstrate that pseudogenes are active participants in host-pathogen interactions, and their role in modulating host gene expression should be investigated further.

2. Materials and Methods

2.1. Cells

The human T-cell line H9 was obtained from the NIH AIDS Research and Reference Reagent Program (Catalog Number 87) [31] and propagated in RPMI 1640 medium (Gibco) supplemented with 10% fetal bovine serum (HyClone) and 1% 100X Antibiotic-Antimycotic (Gibco). The NL4-3 strain of HIV-1 was obtained from the NIH AIDS Research and Reference Reagent Program (Catalog Number 9489), and the virus was prepared by transfection of 293T cells. 293T cells were purchased from ATCC and propagated in Dulbecco’s Modified Eagle’s Medium (DMEM) with 10% bovine calf serum (HyClone).

2.2. HIV-1 Infection

To generate transcriptomic data, we infected H9 T-cells with 300 ng of the NL4-3 strain of HIV-1 using a spinfection protocol (multiplicity of infection > 1): infected cells were incubated at 37 °C for 15 min and then centrifuged at 1200 rpm for 2 h at 33 °C, followed by a further half-hour incubation at 37 °C and three rounds of washing with RPMI. The infected cells were then replenished with fresh medium and incubated at 37 °C for seven days. Uninfected H9 T-cells were used as the control and incubated for seven days in conditions identical to the infected T-cells (37 °C, 5% CO2). Cellular RNA from infected and uninfected cells was extracted for sequencing seven days post-infection using Qiagen’s RNeasy Mini Kit.

2.3. RNA Sequencing

Paired-end (150 bp × 2) RNASeq data were generated using the Illumina HiSeq sequencing platform. The uninfected sample generated 82 million paired-end reads (24 Gbp) with a mean fragment size of 377 bp, and the infected sample generated 76 million reads (23 Gbp) with a mean fragment size of 361 bp. Illumina adaptors were removed using Trimmomatic Version 0.3, and low-quality reads were dropped using fastq_quality_filter [32]. Ribosomal RNA was depleted, and poly-A RNA was selected prior to sequencing. The sequence data generated were of high quality. The average quality score for the forward reads was >34, and that for the reverse reads was >33 (a Q-score of 30 is considered optimal). The transcriptomics data are available at the NCBI’s Gene Expression Omnibus (GEO) database, via Accession Number GSE70785.
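As a short illustration of what the quality scores above mean, the standard Phred relation converts a Q-score into a per-base error probability, so that Q30 corresponds to one expected error in 1000 base calls. The function name below is illustrative and is not part of any tool used in this study.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q into the base-call error probability.

    Uses the standard Phred definition: P = 10 ** (-Q / 10).
    """
    return 10 ** (-q / 10)


# Q30 is the reference point mentioned in the text; 33 and 34 are the
# lower bounds reported for the reverse and forward reads.
for q in (30, 33, 34):
    print(f"Q{q}: error probability = {phred_to_error_probability(q):.5f}")
```

Under this relation, average scores above 33 imply fewer than one expected error per roughly 2000 base calls.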

2.4. Pipeline to Identify Differentially-Expressed Pseudogenes

The filtered reads were mapped to the human reference transcriptome using Tophat2 Version 2.0.11 [33]. The human reference genome (fasta file: Homo_sapiens.GRCh37.57.dna.toplevel.fa.gz, indexed using Bowtie2 Version 2.2.1 [34]) and the most recent human gene model (Gene Transfer Format (GTF) file: Homo_sapiens.GRCh37.75.gtf) were downloaded from Ensembl. More than 90% of reads from the uninfected and the infected sample mapped to the human reference transcriptome. The mapped reads were then supplied to the Cuffdiff program of Cufflinks (Version 2.2.0) to quantify gene expression, as well as to calculate the fold change in gene expression due to HIV-1 infection [35]. Cuffdiff reports gene expression in FPKM (fragments per kilobase of transcript per million mapped reads). To avoid false positives due to noise in measuring very low expression levels, we limited ourselves to genes with expression >0.1 FPKM in both infected and uninfected cells, a threshold widely used for reliable detection of transcripts in RNASeq data [36,37,38]. The 16,243 mRNA transcripts fulfilling this criterion were further pruned to only those that showed at least a 4-fold increase or decrease in gene expression due to infection (i.e., a log2 fold change in expression >2 or <−2). We focused on fold change in gene expression, because this quantity is comparable across genes with different basal expression levels. This resulted in 493 transcripts with at least a 4-fold change in expression, of which 46 are annotated as pseudogenes in Ensembl based on GENCODE (ENCyclopedia Of DNA Elements: genes and gene variants) annotations [26].
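The two filters described above (an expression floor of 0.1 FPKM in both samples and at least a 4-fold change) can be sketched as follows. This is a minimal illustration, not the pipeline itself: it assumes the fold change is computed directly as the ratio of FPKM values, whereas Cuffdiff reports its own fold-change estimate, and the function names and the third example entry are hypothetical.

```python
import math

MIN_FPKM = 0.1         # expression floor applied to both samples
MIN_ABS_LOG2_FC = 2.0  # |log2 fold change| >= 2 corresponds to a 4-fold change


def log2_fold_change(uninfected_fpkm: float, infected_fpkm: float) -> float:
    """log2 of the infected/uninfected expression ratio (simplifying assumption)."""
    return math.log2(infected_fpkm / uninfected_fpkm)


def passes_filters(uninfected_fpkm: float, infected_fpkm: float) -> bool:
    """Keep a transcript only if it is expressed in both samples and changes >= 4-fold."""
    if uninfected_fpkm <= MIN_FPKM or infected_fpkm <= MIN_FPKM:
        return False
    return abs(log2_fold_change(uninfected_fpkm, infected_fpkm)) >= MIN_ABS_LOG2_FC


# (uninfected FPKM, infected FPKM). DUTP1 and UBE2F use values reported in
# Tables 1 and 2; the last entry is a made-up low-expression example.
examples = {
    "DUTP1": (0.159, 2.321),
    "UBE2F": (17.448, 20.262),
    "hypothetical_low": (0.05, 1.0),
}

for name, (mock, hiv) in examples.items():
    print(name, round(log2_fold_change(mock, hiv), 3), passes_filters(mock, hiv))
```

In this sketch DUTP1 is retained, UBE2F is dropped because its change is far below 4-fold, and the hypothetical low-expression transcript is dropped by the FPKM floor regardless of its fold change.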
We then computed the p-values and false discovery rate (FDR) for these 46 differentially-expressed pseudogenes as follows: the 16,243 genes with expression level >0.1 FPKM in both infected and uninfected cells had an average log2(fold change in expression) of −0.037 (μ) with a standard deviation (σ) of 0.819 (the log fold change distribution is normally distributed; Supplementary Figure S1). Using this μ and σ, we calculated z-scores for the pseudogene change in gene expression as $z = (x - \mu)/\sigma$, where x is the log2(fold change). p-values for the two-sided test were determined from the cdf of the normal distribution, and the FDR (Q-value) was calculated using the qvalue function of R [39]. The computational pipeline to identify differentially-expressed pseudogenes is outlined in Figure 1.
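The z-score and two-sided p-value calculation can be illustrated with the reported μ and σ and the log2(fold change) of DUTP1 from Table 1. This is a minimal sketch using only the Python standard library; the function names are illustrative, and the Q-value step (the qvalue function of R) is not reproduced here.

```python
import math

MU = -0.037    # mean log2(fold change) over the 16,243 expressed genes
SIGMA = 0.819  # standard deviation of log2(fold change)


def z_score(log2_fc: float, mu: float = MU, sigma: float = SIGMA) -> float:
    """Standardize a log2(fold change) against the genome-wide distribution."""
    return (log2_fc - mu) / sigma


def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2))


dutp1_log2_fc = 3.869  # DUTP1, Table 1
z = z_score(dutp1_log2_fc)
p = two_sided_p_value(z)
print(f"z = {z:.3f}, p = {p:.3g}")
```

For DUTP1 this gives a z-score of about 4.77 and a p-value close to the 1.85e-06 listed in Table 1; small differences are expected because the tabulated log2(fold change) is rounded.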
Figure 1

Computational pipeline to identify differentially-expressed pseudogenes from transcriptomic data. The transcriptomic data collected from HIV-1 infected and uninfected H9 T-cells 7 days post-infection were mapped to the human reference transcriptome after quality control analysis. Differentially-expressed pseudogenes were identified following several stringent filtering steps that are the accepted standard in the field: at least a 4-fold change in expression, p-value < 0.01 and FDR < 0.25.


2.5. Bioinformatics Analysis of Candidate Pseudogenes

Human proteins matching the translated pseudogene sequence were identified by BLASTX against the GENCODE/Ensembl database using a cutoff E-value of . BLASTX hits that were uncharacterized novel protein coding genes are not reported. cDNAs (transcripts/splice variants) matching pseudogene sequences were identified using BLASTN against the Ensembl cDNA database using a cutoff E-value of and sequence identity of >90%. cDNA hits for uncharacterized genes and other pseudogenes were ignored.

2.6. Analyses of Publicly-Available Transcriptomic Data

We obtained transcriptomes from HIV-1-infected cells 12 h and 24 h post-infection from the Gene Expression Omnibus database (GEO ID: GSE53993) [40]. The study authors infected human T-cell line SUP-T1 (obtained from the American Type Culture Collection) with the HIV-1 strain LAI (obtained from the NIH AIDS Research and Reference Reagent Program). The RNASeq data they submitted to the GEO database comprise HIV-1-infected cells 12 h post-infection (3 replicates), mock-infected cells 12 h post-infection (3 replicates; mock infection with SUP-T1 cell-conditioned medium), HIV-1-infected cells 24 h post-infection (3 replicates) and mock-infected cells 24 h post-infection (2 replicates) [40]. We analyzed these RNASeq datasets as described above: quality control, followed by mapping the reads to the human reference transcriptome using Tophat2 and quantifying gene expression using Cuffdiff to determine the fold change in gene expression. This generated 9 sets of gene expression comparisons at the 12-h time point (3 mock replicates × 3 HIV-1 infection replicates) and 6 sets of gene expression comparisons at the 24-h time point (2 mock replicates × 3 HIV-1 infection replicates). The 15 transcriptomic datasets had 14–23 million paired-end reads each, with a 73%–88% overall read mapping rate to the reference human transcriptome. Supplementary Table S2 lists the fold change in gene expression at 12 h and 24 h post-infection for the pseudogenes and their parent genes.
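The comparison counts follow from pairing every mock replicate with every infected replicate at the same time point. A minimal sketch of that enumeration is given below; the replicate labels are placeholders and do not correspond to GEO sample identifiers.

```python
from itertools import product

# Number of replicates per condition at each time point, as stated in the text.
REPLICATES = {
    "12h": {"mock": 3, "hiv": 3},
    "24h": {"mock": 2, "hiv": 3},
}


def pairwise_comparisons(n_mock: int, n_hiv: int) -> list:
    """Pair every mock replicate with every HIV-1 replicate (placeholder labels)."""
    mock = [f"mock_{i}" for i in range(1, n_mock + 1)]
    hiv = [f"hiv_{i}" for i in range(1, n_hiv + 1)]
    return list(product(mock, hiv))


comparisons = {
    time_point: pairwise_comparisons(counts["mock"], counts["hiv"])
    for time_point, counts in REPLICATES.items()
}

for time_point, pairs in comparisons.items():
    print(time_point, len(pairs))
```

This yields 9 comparisons at 12 h and 6 at 24 h, matching the numbers given above.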

3. Results

Using a p-value < 0.01 as a cutoff, we identified 13 pseudogenes as significantly over-expressed and 17 as significantly under-expressed in HIV-1-infected T-cells (Table 1). Using an FDR cutoff of 0.25, as previously used for exploratory gene expression analyses [41,42], we further narrowed the number of differentially-expressed pseudogenes to 21 (eight over-expressed and 13 under-expressed; shaded rows in Table 1). Most of the over-expressed pseudogenes are processed (Table 1). We also find several uncharacterized pseudogenes as differentially expressed (pseudogene names with prefix RP followed by a number).
Table 1

Differentially-expressed pseudogenes in HIV-1 infection 7 days post-infection. Pseudogene expression is measured from RNASeq data as fragments per kilobase of transcript per million mapped reads (FPKM). Pseudogenes with a p-value < 0.01 and FDR < 0.25 are shaded and are deemed differentially expressed.

| Pseudogene | Type | Expression in Uninfected T-Cells (FPKM) | Expression in HIV-1 Infected T-Cells (FPKM) | log2 (Fold Change) | p-value | Q-value |
|---|---|---|---|---|---|---|
| A. Over-expressed | | | | | | |
| DUTP1 | Processed | 0.159 | 2.321 | 3.869 | 1.85e-06 | 0.0012 |
| RP1-89D4.1 | Processed | 0.200 | 1.377 | 2.782 | 0.0006 | 0.0696 |
| RP11-720N19.1 | Processed | 0.102 | 0.676 | 2.722 | 0.0007 | 0.0824 |
| UBE2FP1 | Processed | 1.277 | 7.716 | 2.595 | 0.0013 | 0.1183 |
| RP11-170L3.6 | Processed | 0.312 | 1.887 | 2.595 | 0.0013 | 0.1183 |
| MTATP8P2 | Processed | 2.744 | 16.526 | 2.59 | 0.0013 | 0.1193 |
| RP11-265N6.3 | Processed | 0.106 | 0.617 | 2.536 | 0.0017 | 0.1348 |
| MTND4P15 | Unprocessed | 0.362 | 2.097 | 2.535 | 0.0017 | 0.1348 |
| IFNL3P1 | Unprocessed | 0.128 | 0.632 | 2.305 | 0.0042 | 0.2537 |
| RP11-44M6.3 | Processed | 0.255 | 1.235 | 2.275 | 0.0048 | 0.2693 |
| CTD-2611O12.6 | Processed | 0.183 | 0.882 | 2.266 | 0.0049 | 0.2736 |
| HMGB3P24 | Processed | 0.11 | 0.504 | 2.191 | 0.0065 | 0.3153 |
| AOC4P | Unprocessed | 0.126 | 0.546 | 2.121 | 0.0084 | 0.3688 |
| B. Under-expressed | | | | | | |
| RP11-380G5.3 | Processed | 1.131 | 0.103 | −3.463 | 2.868e-05 | 0.0103 |
| ZNF137P | Unprocessed | 1.203 | 0.141 | −3.092 | 0.0002 | 0.0353 |
| RP11-114F3.5 | Processed | 0.924 | 0.113 | −3.034 | 0.0003 | 0.0415 |
| SCML2P2 | Processed | 0.955 | 0.119 | −3.003 | 0.0003 | 0.0439 |
| ANTXRLP1 | Unprocessed | 1.507 | 0.207 | −2.865 | 0.0006 | 0.0686 |
| HNRNPA3P6 | Processed | 0.966 | 0.138 | −2.812 | 0.0007 | 0.078 |
| RP1-224A6.8 | Processed | 0.903 | 0.129 | −2.808 | 0.0007 | 0.0787 |
| AC010733.5 | Processed | 3.683 | 0.555 | −2.731 | 0.001 | 0.1026 |
| RP11-411B10.4 | Unprocessed | 0.943 | 0.152 | −2.632 | 0.0015 | 0.1284 |
| RP11-490K7.4 | Processed | 2.187 | 0.358 | −2.613 | 0.0017 | 0.1345 |
| KLHL2P1 | Unprocessed | 0.692 | 0.113 | −2.611 | 0.0017 | 0.1348 |
| RP11-471L13.3 | Processed | 1.6 | 0.263 | −2.608 | 0.0017 | 0.1348 |
| CTD-2008A1.2 | Unprocessed | 1.652 | 0.276 | −2.58 | 0.0019 | 0.1457 |
| GUSBP2 | Unprocessed | 1.124 | 0.217 | −2.375 | 0.0043 | 0.2551 |
| RP11-1166P10.1 | Unprocessed | 0.766 | 0.154 | −2.311 | 0.0055 | 0.2849 |
| AKR7A2P1 | Processed | 0.46 | 0.1 | −2.199 | 0.0083 | 0.3665 |
| ADAMTS7P4 | Unprocessed | 1.06 | 0.234 | −2.182 | 0.0088 | 0.3744 |
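The counts quoted in the Results can be checked against the Q-value column of Table 1. The sketch below simply transcribes those Q-values and applies the FDR cutoff of 0.25; the variable and function names are illustrative.

```python
FDR_CUTOFF = 0.25

# Q-values transcribed from Table 1, in table order.
OVER_EXPRESSED_Q = [
    0.0012, 0.0696, 0.0824, 0.1183, 0.1183, 0.1193, 0.1348,
    0.1348, 0.2537, 0.2693, 0.2736, 0.3153, 0.3688,
]
UNDER_EXPRESSED_Q = [
    0.0103, 0.0353, 0.0415, 0.0439, 0.0686, 0.078, 0.0787, 0.1026, 0.1284,
    0.1345, 0.1348, 0.1348, 0.1457, 0.2551, 0.2849, 0.3665, 0.3744,
]


def count_below(q_values: list, cutoff: float = FDR_CUTOFF) -> int:
    """Number of pseudogenes whose Q-value is below the FDR cutoff."""
    return sum(q < cutoff for q in q_values)


n_over = count_below(OVER_EXPRESSED_Q)
n_under = count_below(UNDER_EXPRESSED_Q)
print(len(OVER_EXPRESSED_Q), len(UNDER_EXPRESSED_Q))  # pseudogenes with p < 0.01
print(n_over, n_under, n_over + n_under)              # pseudogenes with FDR < 0.25
```

Of the 13 over-expressed and 17 under-expressed pseudogenes with p < 0.01, eight and 13 respectively fall below the FDR cutoff, giving the 21 differentially-expressed pseudogenes discussed in the text.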

3.1. Pseudogenes Derived from Over-Expressed Genes

Over-expressed genes are more likely to form processed pseudogenes [43]. In our analysis, we find that several differentially-expressed pseudogenes are derived from highly-expressed genes: MTATP8P2, MTND4P15, RP1-89D4.1, RP11-380G5.3, AC010733.5. It is likely that these pseudogenes are observed as differentially expressed largely due to fluctuations in the expression of their parent genes. To investigate the putative function of differentially-expressed pseudogenes that are not derived from over-expressed protein-coding genes, we identified human proteins (Table 2; BLASTX hits against GENCODE using a cutoff E-value of ) and transcripts/splice variants (Table 3; BLASTN hits against the Ensembl cDNA database using a cutoff E-value of and sequence identity of > 90%) that are similar to these pseudogene transcripts or their translated products.
Table 2

BLASTX hits of differentially-expressed pseudogenes. Only hits to known protein-coding genes with E-value < are shown. “fc” denotes fold change in gene expression; “Mock” represents gene expression in uninfected T-cells; and “HIV” denotes gene expression in HIV-1-infected T-cells 7 days post-infection.

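| Pseudogene | Chromosome | log2(fc) | BLASTX Hits | Chromosome | Mock | HIV | log2(fc) |
|---|---|---|---|---|---|---|---|
| A. Over-expressed | | | | | | | |
| DUTP1 | 3 | 3.869 | DUT | 15 | 181.046 | 128.614 | −0.812 |
| UBE2FP1 | 3 | 2.595 | UBE2F | 2 | 17.448 | 20.262 | 0.216 |
| MTATP8P2 | 2 | 2.59 | MT-ATP8 | MT | 1.3e+05 | 1.5e+05 | 0.144 |
| MTND4P15 | 9 | 2.535 | MT-ND4 | MT | 3.8e+04 | 3.6e+04 | −0.07 |
| RP1-89D4.1 | 11 | 2.782 | RPS24 | 10 | 6416.6 | 5334.27 | −0.267 |
| RP11-720N19.1 | 17 | 2.722 | MSANTD3 | 9 | 9.694 | 9.455 | −0.036 |
| RP11-170L3.6 | 16 | 2.595 | IGHV4-31 | 14 | 0.806 | 0 | -inf |
| | | | IGHV4-39 | 14 | - | - | - |
| RP11-265N6.3 | 15 | 2.536 | MYL12B | 18 | 100.483 | 126.451 | 0.332 |
| | | | MYL12A | 18 | 255.43 | 271.675 | 0.089 |
| B. Under-expressed | | | | | | | |
| ZNF137P | 19 | −3.092 | ZNF816 | 19 | 4.178 | 3.376 | −0.307 |
| | | | ZNF813 | 19 | 1.846 | 0.919 | −1.007 |
| | | | ZNF845 | 19 | 4.772 | 4.083 | −0.225 |
| | | | ZNF83 | 19 | 12.51 | 8.327 | −0.587 |
| | | | FKSG61 | 14 | 6.17 | 6.243 | 0.017 |
| SCML2P2 | 16 | −3.003 | SCMH1 | 1 | 6.914 | 5.049 | −0.453 |
| ANTXRLP1 | 10 | −2.865 | ANTXRL | 10 | 0.019 | 0 | -inf |
| HNRNPA3P6 | 3 | −2.812 | HNRNPA3 | 2 | 330.731 | 185.908 | −0.831 |
| | | | HNRNPA1 | 12 | 1293.29 | 905.014 | −0.515 |
| KLHL2P1 | 4 | −2.611 | TMEM135 | 11 | 17.787 | 15.472 | −0.201 |
| | | | KLHL2 | 4 | 9.699 | 11.424 | 0.236 |
| | | | MYB | 6 | 87.15 | 78.048 | −0.159 |
| | | | PKN1 | 19 | 25.755 | 27.804 | 0.11 |
| CTD-2008A1.2 | 15 | −2.58 | SORD | 15 | 25.16 | 12.187 | −1.046 |
| RP11-380G5.3 | 10 | −3.463 | RPL11 | 1 | 2653.49 | 2219.2 | −0.258 |
| RP11-114F3.5 | 12 | −3.034 | HKR1 | 19 | 17.206 | 12.676 | −0.441 |
| | | | CRLF2 | X | 0.149 | 1.557 | 3.387 |
| | | | TSEN2 | 3 | 10.77 | 6.858 | −0.651 |
| | | | SEPSECS | 4 | 4.842 | 2.855 | −0.762 |
| | | | MRPS25 | 3 | 31.181 | 28.143 | −0.148 |
| | | | CLTA | 9 | 24.205 | 34.036 | 0.492 |
| RP1-224A6.8 | 1 | −2.808 | MPHOSPH6 | 16 | 22.657 | 22.301 | −0.023 |
| AC010733.5 | 2 | −2.731 | RPS12 | 6 | 7413.97 | 8991.73 | 0.278 |
| RP11-411B10.4 | 18 | −2.632 | VN1R4 | 19 | - | - | - |
| | | | VN1R2 | 19 | - | - | - |
| | | | VN1R1 | 19 | 4.273 | 1.546 | −1.466 |
| | | | LEPRE1 | 1 | 6.65 | 8.324 | 0.324 |
| RP11-490K7.4 | 1 | −2.613 | GTF2A2 | 15 | 108.131 | 143.993 | 0.413 |
| | | | STAP2 | 19 | 19.65 | 20.589 | 0.067 |
| | | | STARD10 | 11 | 49.338 | 47.267 | −0.062 |
| | | | ADAM10 | 15 | 37.318 | 21.645 | −0.786 |
| RP11-471L13.3 | 17 | −2.608 | DYNLT1 | 6 | 204.754 | 140.976 | −0.539 |
| | | | DYNLT3 | X | 16.799 | 20.654 | 0.298 |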
Table 3

BLASTN hits (transcripts/splice variants) for differentially-expressed pseudogenes. Only hits to known protein-coding genes with E-value < and sequence similarity >90% are shown. “fc” denotes fold change in gene expression; “Mock” represents gene expression in uninfected T-cells; and “HIV” denotes gene expression in HIV-1-infected T-cells 7 days post-infection.

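| Pseudogene | Chromosome | log2(fc) | BLASTN Hits | Chromosome | Mock | HIV | log2(fc) |
|---|---|---|---|---|---|---|---|
| A. Over-expressed | | | | | | | |
| DUTP1 | 3 | 3.869 | - | - | - | - | - |
| UBE2FP1 | 3 | 2.595 | - | - | - | - | - |
| MTATP8P2 | 2 | 2.59 | MT-ATP8 | MT | 1.3e+05 | 1.5e+05 | 0.144 |
| | | | MT-ATP6 | MT | 133,475 | 147,477 | 0.144 |
| MTND4P15 | 9 | 2.535 | - | - | - | - | - |
| RP1-89D4.1 | 11 | 2.782 | RPS24 | 10 | 6416.6 | 5334.27 | −0.267 |
| RP11-720N19.1 | 17 | 2.722 | MSANTD3 | 9 | 9.694 | 9.455 | −0.036 |
| RP11-170L3.6 | 16 | 2.595 | IGHVII-15-1 | 14 | 0 | 0 | 0 |
| | | | IGHV4-34 | 14 | 0 | 0 | 0 |
| RP11-265N6.3 | 15 | 2.536 | MYL12B | 18 | 100.483 | 126.451 | 0.332 |
| B. Under-expressed | | | | | | | |
| ZNF137P | 19 | −3.092 | PIGL | 17 | 39.2801 | 37.686 | −0.06 |
| | | | TBC1D9B | 5 | 24.227 | 48.892 | 1.013 |
| | | | C2CD3 | 11 | 5.705 | 6.285 | 0.14 |
| | | | HLA-DQA1 | 6 | 47.241 | 35.167 | −0.426 |
| | | | ORC6 | 16 | 14.991 | 19.302 | 0.365 |
| SCML2P2 | 16 | −3.003 | - | - | - | - | - |
| ANTXRLP1 | 10 | −2.865 | ANTXRL | 10 | 0.019 | 0 | -inf |
| | | | SOD2 | 6 | 60.695 | 112.151 | 0.886 |
| HNRNPA3P6 | 3 | −2.812 | HNRNPA3 | 2 | 330.731 | 185.908 | −0.831 |
| KLHL2P1 | 4 | −2.611 | KLHL2 | 4 | 9.699 | 11.424 | 0.236 |
| | | | FOXK1 | 7 | 4.528 | 6.016 | 0.41 |
| | | | DNAJC21 | 5 | 25.755 | 17.957 | −0.52 |
| | | | BIRC5 | 17 | 35.427 | 21.565 | −0.716 |
| | | | TRIM59 | 3 | 70.348 | 43.319 | −0.699 |
| CTD-2008A1.2 | 15 | −2.58 | SORD | 15 | 25.16 | 12.187 | −1.046 |
| RP11-380G5.3 | 10 | −3.463 | DOK1 | 2 | 5.483 | 11.781 | 1.104 |
| RP11-114F3.5 | 12 | −3.034 | IDS | X | 2.891 | 3.413 | 0.239 |
| | | | AS3MT | 10 | 20.487 | 16.317 | −0.328 |
| | | | THAP6 | 4 | 7.924 | 4.931 | −0.684 |
| | | | ZYG11B | 1 | 4.275 | 2.514 | −0.766 |
| | | | ABCC10 | 6 | 8.517 | 6.681 | −0.35 |
| RP1-224A6.8 | 1 | −2.808 | MPHOSPH6 | 16 | 22.657 | 22.301 | −0.023 |
| AC010733.5 | 2 | −2.731 | RPS12 | 6 | 7413.97 | 8991.73 | 0.278 |
| RP11-411B10.4 | 18 | −2.632 | ADC | 1 | 0.768 | 1.148 | 0.58 |
| | | | SLC2A5 | 1 | 0.498 | 0.361 | −0.464 |
| | | | SETD2 | 3 | 12.064 | 9.976 | −0.274 |
| RP11-490K7.4 | 1 | −2.613 | GTF2A2 | 15 | 108.131 | 143.993 | 0.413 |
| RP11-471L13.3 | 17 | −2.608 | DYNLT1 | 6 | 204.754 | 140.976 | −0.539 |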
Several pseudogenes have their parent genes as the best (and sometimes the only) BLASTX or BLASTN hit (Table 2 and Table 3). A pseudogene may have a regulatory function if its expression correlates (positively or negatively) with that of its parent or other similar genes.

3.2. Pseudogenes with Antagonistic Expression to the Parent Genes

If pseudogene transcripts are acting as molecular sponges for mRNA stability factors and miRNA, then their expression will negatively correlate with that of the parent gene. We find several pseudogene:gene pairs where pseudogene expression is antagonistic to that of their BLASTX/BLASTN hits (Table 3 and Table 4). One such pair is ZNF137P:TBC1D9B: the pseudogene is under-expressed seven days post-infection, whereas TBC1D9B is over-expressed (Table 3). The GO terms for the TBC1D9B protein product suggest that it functions as a regulator of Rab-GTPase activity. Another GTPase (Rho-GTPase) is required for HIV-1 replication [44,45], but the role of Rab-GTPases in the HIV-1 life cycle is not clear. However, it has been shown that Rab GTPase-activating proteins bind Hepatitis C virus protein NS5A to mediate viral replication [46]. The average gene expression change from the nine comparisons at 12 h and the six comparisons at 24 h post-infection also shows upregulation of TBC1D9B, and the gene shows a >2-fold increase in expression seven days post-infection (Table 4). Thus, the interaction between ZNF137P and TBC1D9B transcripts may be implicated in HIV-1 replication.
Table 4

Differentially-expressed pseudogenes in early and late HIV-1 infection. The log2(fold change) is shown only for those pseudogene:gene pairs where the parent gene shows a >1.5-fold change in expression due to HIV-1 infection, i.e., log2(fold change) either >0.585 (upregulated) or <−0.585 (downregulated). The time points (12 h, 24 h or 7 days post-infection) where the genes show a >1.5-fold change in expression are highlighted in bold. The genes that are upregulated or downregulated in all of the replicate observations (nine observations for 12 h and six observations for 24 h time-points) are identified by *. “on” genes are those that show detectable expression in HIV-1-infected cells only, and “off” genes show detectable expression in uninfected cells only. The log2(fold change) at the 12-h and 24-h time points is the average log2(fold change) over replicate observations. Supplementary Table S2 lists the log2(fold change) in each replicate. ANTXRL did not have detectable gene expression in any of the nine observations at 12 h post-infection.

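| Pseudogene | 12 h | 24 h | 7 d | Parent Gene | 12 h | 24 h | 7 d |
|---|---|---|---|---|---|---|---|
| DUTP1 | −0.908 | on | 3.869 | DUT | 0.245 | −0.527 * | −0.812 |
| RP11-265N6.3 | 1.082 | on | 2.536 | MYL12A | 0.619 * | −0.624 * | 0.089 |
| ZNF137P | −0.031 | −1.007 | −3.092 | TBC1D9B | 0.229 | 0.18 * | 1.013 |
| | | | | C2CD3 | −0.6 | −0.21 * | 0.14 |
| | | | | ZNF813 | −0.212 | −0.298 * | −1.007 |
| | | | | ZNF83 | 0.062 | 0.62 * | −0.587 |
| ANTXRLP1 | off | −3.75 | −2.865 | ANTXRL | - | −1.086 | off |
| HNRNPA3P6 | 0.406 | −3.517 | −2.812 | HNRNPA3 | −0.231 | −0.962 * | −0.831 |
| | | | | HNRNPA1 | −0.14 | −0.903 * | −0.515 |
| KLHL2P1 | on | 0.017 | −2.611 | KLHL2 | −0.043 | 0.79 | 0.236 |
| | | | | FOXK1 | −0.68 * | −1.075 * | 0.41 |
| | | | | BIRC5 | −0.238 | −1.163 * | −0.716 |
| | | | | TRIM59 | −0.1 | −0.22 | −0.699 |
| | | | | MYB | −0.694 * | −1.915 * | −0.159 |
| CTD-2008A1.2 | 0.131 | −0.319 | −2.58 | SORD | 0.055 | −1.928 * | −1.046 |
| RP11-380G5.3 | −0.083 | 0.285 | −3.463 | DOK1 | 0.494 | −0.252 * | 1.104 |
| RP11-114F3.5 | −0.438 | −0.0628 | −3.034 | CRLF2 | −1.08 | on * | 3.387 |
| | | | | TSEN2 | −0.423 * | −1.647 * | −0.651 |
| | | | | SEPSECS | −0.003 | −0.213 * | −0.762 |
| | | | | CLTA | −0.05 | −1.255 * | 0.492 |
| | | | | THAP6 | 0.474 * | 0.806 * | −0.684 |
| | | | | ZYG11B | −0.099 | −0.093 | −0.766 |
| | | | | ABCC10 | −0.594 * | 0.419 * | −0.35 |
| RP1-224A6.8 | 0.096 | on | −2.808 | MPHOSPH6 | 0.123 | −0.618 * | −0.023 |
| RP11-411B10.4 | 0.02 | −0.53 | −2.632 | ADC | 0.24 | 3.235 * | 0.58 |
| | | | | SLC2A5 | −1.698 | 1.908 * | −0.464 |
| | | | | VN1R1 | 0.296 * | −0.425 * | −1.466 |
| RP11-490K7.4 | 0.346 | 0.069 | −2.613 | GTF2A2 | 0.031 | −0.83 * | 0.413 |
| | | | | ADAM10 | −0.115 | −0.12 | −0.786 |
| | | | | STAP2 | −0.358 | 0.704 | 0.067 |
| RP11-471L13.3 | on | 0.839 | −2.608 | DYNLT1 | 0.323 * | 0.626 * | −0.539 |
| | | | | DYNLT3 | −0.099 | −0.67 * | 0.298 |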
It is known that HIV’s Tat protein downregulates SOD2 transcripts in mitochondria, leading to mitochondria-mediated apoptosis in infected cells [47], making the pair ANTXRLP1:SOD2 of interest (Table 3). RP11-411B10.4’s parent gene ADC is upregulated in HIV-1 infection (more than an eight-fold increase in gene expression at the 24-h time point; Table 4). ADC encodes arginine decarboxylase, an enzyme that metabolizes L-arginine to agmatine. This is interesting, because increased metabolism of L-arginine impairs lymphocyte response to antigens [48]. Thus, even though ADC expression has not previously been linked to HIV-1 infection, upregulation of arginine decarboxylase may be a viral strategy to evade the host immune response.
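Because Table 4 reports log2(fold change) while the text quotes linear fold changes, a small conversion helps when reading the two together. The sketch below shows that the 1.5-fold threshold corresponds to a log2 value of about 0.585 and that the ADC value at 24 h (3.235) is indeed more than an eight-fold increase; the helper names are illustrative.

```python
import math


def log2_to_fold(log2_fc: float) -> float:
    """Magnitude of the linear fold change for a given log2(fold change)."""
    return 2 ** abs(log2_fc)


def direction(log2_fc: float) -> str:
    """Sign of the change: 'increase' for positive values, 'decrease' for negative."""
    if log2_fc > 0:
        return "increase"
    if log2_fc < 0:
        return "decrease"
    return "no change"


threshold = math.log2(1.5)  # log2 value of the 1.5-fold cutoff used in Table 4
adc_24h = 3.235             # ADC at 24 h post-infection, Table 4

print(f"log2(1.5) = {threshold:.3f}")
print(f"ADC at 24 h: {log2_to_fold(adc_24h):.2f}-fold {direction(adc_24h)}")
```

The same conversion applies to any entry in Table 4, for example a log2(fold change) of −1 is a two-fold decrease.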

3.3. Pseudogenes with Synergistic Expression to the Parent Genes

Sometimes, pseudogene transcripts compete with the parent gene transcript for silencing non-coding RNAs, as has been shown for tumor suppressor gene PTEN and its pseudogene PTENP1 [20]. Pseudogene and parent gene (or other genes with sequence similarity to the pseudogene) expression are positively correlated under this model of pseudogene:gene interaction. In HIV-1 infection, we find the following pseudogene:gene pairs with a positively correlated decrease in expression: ZNF137P:ZNF813, ANTXRLP1:ANTXRL, HNRNPA3P6:HNRNPA3/HNRNPA1, CTD-2008A1.2:SORD, KLHL2P1:BIRC5, KLHL2P1:TRIM59, KLHL2P1:MYB, RP11-114F3.5:TSEN2, RP11-114F3.5:SEPSECS, RP11-114F3.5:ZYG11B and RP11-490K7.4:ADAM10. Of these, only a few parent genes code for proteins that are known to have a role in viral infection, such as HNRNPA3/HNRNPA1. The heterogeneous nuclear ribonucleoproteins (hn-RNPs) bind to pre-mRNA and are involved in presenting the mRNA to the splicing machinery and transporting mRNA to the cytoplasm. Downregulating hn-RNP A proteins promotes viral production due to efficient alternative splicing of viral mRNA transcripts and their trafficking [49]. Further, upregulation of hn-RNP A1 results in a 100-fold decline in virus production [50]. We find that pseudogene HNRNPA3P6, its parent gene HNRNPA3 and a close paralog of the parent gene HNRNPA1 are under-expressed in HIV-1 infection (Table 2 and Table 3). It is thus possible that the virus exploits the regulatory influence of pseudogene HNRNPA3P6 on HNRNPA3 and HNRNPA1 to downregulate hn-RNP proteins and promote virus production. Several functional genes similar to pseudogene KLHL2P1 are downregulated in HIV-1 infection (Table 4). BIRC5 inhibits apoptosis, and its downregulation in both early and late HIV-1 infection could contribute to HIV-associated depletion of T-cells [51]. While the TRIM family of proteins is important in innate immune responses [52], we find TRIM59 to be downregulated in HIV-1 infection (Table 4). 
MYB encodes a transcriptional activator protein that binds to the HIV-1 long terminal repeat (LTR) and activates viral transcription [53]. Consistent downregulation of MYB expression could thus be the host response to mitigate HIV-1 transcription. Pseudogene RP11-490K7.4’s functional relative ADAM10 encodes a cell surface protein that has both adhesion and protease domains and has been shown to facilitate nuclear trafficking of HIV-1 [54]. Another protein coding gene similar to this pseudogene, GTF2A2, is found downregulated in uninfected T-cells in response to exposure to the HIV-1 gp120/V3 epitope [55], similar to our observation at 24 h post-infection.
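The distinction between antagonistic and synergistic pairs can be made concrete by comparing the direction of change of a pseudogene with that of a matching gene. The sketch below is a simplified illustration, not the analysis performed here: it uses only the seven-day log2(fold change) values from Table 4, treats changes smaller than 1.5-fold (|log2| < 0.585) as unchanged, and uses illustrative function names. A single time point cannot establish correlation, so the labels are only suggestive.

```python
THRESHOLD = 0.585  # log2 of a 1.5-fold change, the cutoff used in Table 4


def direction(log2_fc: float, threshold: float = THRESHOLD) -> str:
    """Classify a log2(fold change) as 'up', 'down' or 'unchanged'."""
    if log2_fc > threshold:
        return "up"
    if log2_fc < -threshold:
        return "down"
    return "unchanged"


def classify_pair(pseudogene_log2_fc: float, gene_log2_fc: float) -> str:
    """Label a pseudogene:gene pair by whether both change in the same direction."""
    d_pseudo = direction(pseudogene_log2_fc)
    d_gene = direction(gene_log2_fc)
    if "unchanged" in (d_pseudo, d_gene):
        return "no clear relationship"
    return "synergistic" if d_pseudo == d_gene else "antagonistic"


# (pseudogene, gene): (pseudogene log2 fc, gene log2 fc) at 7 d, from Table 4.
pairs = {
    ("ZNF137P", "TBC1D9B"): (-3.092, 1.013),
    ("HNRNPA3P6", "HNRNPA3"): (-2.812, -0.831),
    ("RP1-224A6.8", "MPHOSPH6"): (-2.808, -0.023),
}

labels = {pair: classify_pair(*values) for pair, values in pairs.items()}
for (pseudogene, gene), label in labels.items():
    print(f"{pseudogene}:{gene} -> {label}")
```

With these values, ZNF137P:TBC1D9B is labeled antagonistic and HNRNPA3P6:HNRNPA3 synergistic, in line with the pairs discussed in Sections 3.2 and 3.3, while RP1-224A6.8:MPHOSPH6 shows no clear relationship at seven days.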

3.4. Pseudogenes and Their Parent Genes with Modulating Gene Expression in Early and Late HIV-1 Infection

As the expression of pseudogene DUTP1 goes from a >1.5-fold decrease at 12 h post-infection to a >8-fold increase at seven days post-infection, its parent gene expression is modulated as well, from over-expressed to under-expressed (Table 4). DUT codes for a protein that hydrolyzes dUTP to dUMP, and its downregulation will increase cellular dUTP levels. Heavy uracilation of HIV reverse transcripts is common and promotes chromosomal integration of the viral genome [56]. Thus, over-expression of DUTP1 to downregulate DUT expression may be the cellular response to inhibit the integration of uracilated viral transcripts into the host genome. Pseudogene RP11-265N6.3 is consistently upregulated in HIV-1 infection, at 12 h and 24 h as well as seven days post-infection, when it shows a >4-fold increase in expression. However, its parent gene MYL12A shows modulating gene expression at 12 h and 24 h post-infection. MYL12A is involved in the regulation of the actin cytoskeleton and is found to be upregulated in HIV/HCV co-infection [57]. HIV-1 proteins are known to reorganize the actin cytoskeleton to optimize virion production [58]. While pseudogene ZNF137P is increasingly downregulated as the infection progresses, its parent gene ZNF83, an anti-inflammatory gene [59], is over-expressed by a factor of 1.5 in all six datasets at 24 h post-infection, but is under-expressed seven days post-infection (Table 4). In contrast, FOXK1 (parent gene of KLHL2P1) is downregulated only in early viral infection (a >1.5-fold decrease in expression and downregulation in all nine datasets at 12 h and six datasets at the 24-h time point; Table 4). FOXK1 inhibited interferon production during infection with Sendai virus, a single-stranded RNA virus [60], and its downregulation in early HIV-1 infection likely contributes to triggering the immune response.
In the RP11-114F3.5:CRLF2 pair, the CRLF2 gene shows a >2-fold downregulation 12 h post-infection, followed by an increase in gene expression at 24 h (as reflected by the “on” signal in all six transcriptomic comparisons at 24 h post-infection), leading to a >8-fold increase in expression in infected cells seven days post-infection (Table 4). Thus, it appears that the CRLF2 gene expression went from downregulated in early infection to significantly upregulated in the late HIV-1 infection. CRLF2 encodes a receptor for cytokine TSLP, whose over-production in HIV-1 infected mucosal epithelial cells amplifies HIV infection in activated CD4+ T-cells [61]. It is thus important to understand how CRLF2 expression is regulated in T-cells and if pseudogene RP11-114F3.5 has any influence on its post-transcriptional regulation. ABCC10, a functional protein similar to pseudogene RP11-114F3.5, codes for the multidrug resistance-associated protein 7 (MRP7) that transports molecules, such as xenobiotics, across intra- and extra-cellular membranes [62]. HIV-Tat is known to upregulate the expression of another transporter from the same family (MRP1), a cell-surface protein of the cells of the blood brain barrier that remove anti-viral drugs from the central nervous system [63]. It is thus likely that modulating gene expression of ABCC10 (downregulated in all nine datasets at 12 h and upregulated in all six datasets at 24 h post-infection) might be interfering with the transport of molecules across HIV-1-infected T-cells. Interestingly, another transporter encoding gene (SLC2A5) shows modulating gene expression, as well (a >3-fold decrease in expression at 12 h and a >3-fold increase in expression at the 24-h time point; Table 4). Pseudogene RP11-471L13.3 is over-expressed by a factor of >1.5 at 24 h and shows a >4-fold decrease in expression seven days post-infection. 
Its putative parent genes (DYNLT1 and DYNLT3) encode dynein motor proteins that carry cargo, such as proteins, along the cellular microtubules; dynein proteins are thought to associate with viral proteins during infections [64]. Other pseudogene:gene pairs with modulating gene expression in early and late HIV-1 infection, but with unknown roles in viral infection, are RP11-380G5.3:DOK1 and RP11-114F3.5:THAP6.

4. Discussion

Although Vanin characterizes pseudogenes as “related but defective” [22], this sentiment has seen a shift in recent years. With the accumulating evidence of functional consequences of pseudogene transcription, the term “pseudogene” should be revisited [1]. A pseudogene capable of exerting regulatory control on other genes via its RNA intermediates is functionally active even if it does not code for a protein. Although most pseudogenes are thought to evolve neutrally [3,65], some transcribed pseudogenes exhibit patterns of non-neutral evolution [66]. The contribution of pseudogenes to the non-coding small-RNA pool that regulates the expression of a wide range of genes provides a possible explanation for their evolutionary maintenance. About a fifth of human pseudogenes are transcribed [3], and thus might be candidates for purifying selection if they participate in gene expression regulation. For a retrovirus with a pre-packaged reverse transcriptase and integrase, this presents an opportunity to exploit a layer of gene expression regulation for its own benefit. Here, we present evidence that suggests the involvement of pseudogenes in the regulation of cellular gene expression in the early stages of a retroviral infection. Our computational pipeline for detecting differentially-expressed pseudogenes relies on software that is widely used and accepted by the scientific community. Gene expression is normalized by gene length to account for the fact that longer genes end up producing more sequencing reads than shorter genes. Although it is desirable to have replicates in the sequencing experiment design, replicate data are usually generated by pooling the replicate samples together in a single sequencing run, reducing the data generated per sample. 
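The length normalization mentioned above can be illustrated with a minimal sketch. It assumes a reads-per-kilobase-per-million style measure (RPKM/FPKM-like), which is one common form of length normalization; the exact measure used by the pipeline is not restated in this section, and the function name and read counts below are hypothetical.

```python
def length_normalized_expression(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene per million mapped reads (RPKM-style).

    Assumption: this is an illustrative normalization, not necessarily
    the exact measure computed by the software used in the study.
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and library size must be positive")
    kilobases = gene_length_bp / 1_000
    millions_of_reads = total_mapped_reads / 1_000_000
    return read_count / (kilobases * millions_of_reads)


# Hypothetical example: a gene four times longer yields four times the reads
# at the same underlying expression level, in a 20-million-read library.
short_gene = length_normalized_expression(500, 1_000, 20_000_000)
long_gene = length_normalized_expression(2_000, 4_000, 20_000_000)
```

After normalization the two hypothetical genes receive the same value, which is the point of dividing by gene length: raw read counts alone would make the longer gene look four times more highly expressed.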
While we analyzed multiple replicate transcriptomes at 12 h and 24 h post-infection (which likely increased specificity), the single transcriptome at the seven-day post-infection time point had four times the sequencing data (≈80 million reads compared to the ≈20 million reads at the 12-h and 24-h time points). This high sequencing depth combined with high quality reads and a mean fragment size of >350 bp (150 × 2 paired-end sequencing) for both uninfected and infected samples improves sensitivity in detecting differentially-expressed genes and adds confidence to our predictions. Both increasing sequencing depth and biological replication have been shown to increase the power of detecting differentially-expressed genes [67]. Strongly upregulated protein-coding genes in our data (the seven-day time point) are implicated in HIV-1 infection, lending further confidence to our analysis (Supplementary Table S1). While our methodology cannot find novel pseudogenes that might be created in an infected cell due to viral reverse transcriptase and integrase activity, we find a small number of known pseudogenes that are differentially expressed in HIV-1-infected cells. While the data from the 12-h and 24-h time points come from a T lymphocyte cell-line that is different from the T-cell-line at the seven-day time point, our candidate pseudogene:gene interactions show consistent differential expression across the three time points, suggesting that these host responses are not cell-line specific. We further identify protein-coding genes (parent genes or genes that are BLASTX/BLASTN hits to the pseudogenes) whose expression is correlated with that of the differentially-expressed pseudogenes. One-third of these parent genes are implicated in HIV-1 infection, suggesting pseudogene-mediated post-transcriptional regulation of gene expression in the infected cells. 
It should be noted that the study that experimentally confirmed the effect of pseudogene PTENP1 transcription on the mRNA levels of its parent gene, the tumor suppressor PTEN, showed that the change in PTEN mRNA expression was less than two-fold, yet the pseudogene PTENP1 is “selectively lost in human cancer” [20]. Thus, we restricted our focus to those parent genes that show a >1.5-fold change in gene expression at 12 h, or 24 h, or seven days post-infection (Table 4). It is possible that these parent genes are differentially expressed in HIV-1 infection due to other factors in the complex gene-regulatory machinery of the cell, but the significant change in expression in their pseudogenes suggests pseudogene-mediated gene expression regulation. Experimental validation of regulatory interactions between the candidate pseudogene:gene pairs is needed to confirm whether viruses exploit this newly-discovered layer of post-transcriptional regulation of gene expression.
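The >1.5-fold criterion described above can be sketched as a simple filter. This is a minimal illustration, assuming expression is compared as a ratio of infected to uninfected values and that the threshold is applied symmetrically on a log2 scale; the function names and expression values are hypothetical and are not taken from Table 4.

```python
import math

FOLD_CHANGE_THRESHOLD = 1.5  # cutoff stated in the text


def classify(infected, uninfected, threshold=FOLD_CHANGE_THRESHOLD):
    """Label a gene 'up', 'down' or 'unchanged' at one time point."""
    if infected <= 0 or uninfected <= 0:
        raise ValueError("expression values must be positive")
    log2_fold_change = math.log2(infected / uninfected)
    cutoff = math.log2(threshold)
    if log2_fold_change > cutoff:
        return "up"
    if log2_fold_change < -cutoff:
        return "down"
    return "unchanged"


def passes_any_time_point(expression_by_time):
    """True if the gene exceeds the threshold at 12 h, 24 h, or seven days."""
    return any(
        classify(infected, uninfected) != "unchanged"
        for infected, uninfected in expression_by_time.values()
    )


# Hypothetical (infected, uninfected) expression values per time point.
gene_a = {"12h": (6.0, 10.0), "24h": (10.0, 10.0), "7d": (85.0, 10.0)}
gene_b = {"12h": (11.0, 10.0), "24h": (9.0, 10.0), "7d": (12.0, 10.0)}

kept = [name for name, gene in {"gene_a": gene_a, "gene_b": gene_b}.items()
        if passes_any_time_point(gene)]
```

Working on the log2 scale treats a 1.5-fold decrease and a 1.5-fold increase symmetrically, so a gene that swings from under-expressed early to over-expressed late (as `gene_a` does here) is retained, while a gene that stays within the cutoff at every time point is dropped.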

5. Conclusions

In this study, we find that instead of being silent spectators, pseudogenes are differentially expressed in HIV-1 infection. While non-coding RNAs are now considered integral to the host response to viral infections, our findings indicate for the first time that RNAs derived from pseudogene transcripts are a part of the cellular non-coding RNA pool, and may regulate expression of genes implicated in viral infections. Functional relevance of pseudogene transcription has been experimentally verified in cancer but remains to be validated in viral infections. We anticipate that pseudogene transcription will be of significance in other cellular processes and in host responses to the environment as well.
  66 in total

1.  A new role for expressed pseudogenes as ncRNA: regulation of mRNA stability of its homologous coding gene.

Authors:  Yoshihisa Yano; Rintaro Saito; Noriyuki Yoshida; Atsushi Yoshiki; Anthony Wynshaw-Boris; Masaru Tomita; Shinji Hirotsune
Journal:  J Mol Med (Berl)       Date:  2004-05-18       Impact factor: 4.599

2.  HIV-TAT protein upregulates expression of multidrug resistance protein 1 in the blood-brain barrier.

Authors:  Kentaro Hayashi; Hong Pu; Ibolya E Andras; Sung Yong Eum; Atsushi Yamauchi; Bernhard Hennig; Michal Toborek
Journal:  J Cereb Blood Flow Metab       Date:  2005-12-14       Impact factor: 6.200

3.  Human LINE retrotransposons generate processed pseudogenes.

Authors:  C Esnault; J Maestre; T Heidmann
Journal:  Nat Genet       Date:  2000-04       Impact factor: 38.330

4.  Pseudogenes as a paradigm of neutral evolution.

Authors:  W H Li; T Gojobori; M Nei
Journal:  Nature       Date:  1981-07-16       Impact factor: 49.962

5.  Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.

Authors:  Aravind Subramanian; Pablo Tamayo; Vamsi K Mootha; Sayan Mukherjee; Benjamin L Ebert; Michael A Gillette; Amanda Paulovich; Scott L Pomeroy; Todd R Golub; Eric S Lander; Jill P Mesirov
Journal:  Proc Natl Acad Sci U S A       Date:  2005-09-30       Impact factor: 11.205

6.  Mapping a dynamic innate immunity protein interaction network regulating type I interferon production.

Authors:  Shitao Li; Lingyan Wang; Michael Berman; Young-Yun Kong; Martin E Dorf
Journal:  Immunity       Date:  2011-09-08       Impact factor: 31.745

Review 7.  Subversion of the actin cytoskeleton during viral infection.

Authors:  Matthew P Taylor; Orkide O Koyuncu; Lynn W Enquist
Journal:  Nat Rev Microbiol       Date:  2011-04-27       Impact factor: 60.633

Review 8.  The emerging role of long non-coding RNAs in HIV infection.

Authors:  Daniel C Lazar; Kevin V Morris; Sheena M Saayman
Journal:  Virus Res       Date:  2015-07-26       Impact factor: 3.303

Review 9.  The multilayered complexity of ceRNA crosstalk and competition.

Authors:  Yvonne Tay; John Rinn; Pier Paolo Pandolfi
Journal:  Nature       Date:  2014-01-16       Impact factor: 49.962

10.  Generation of a pseudogene during retroviral infection.

Authors:  M B Carlton; W H Colledge; M J Evans
Journal:  Mamm Genome       Date:  1995-02       Impact factor: 2.957

