Literature DB >> 34575842

Computational Methods to Study Human Transcript Variants in COVID-19 Infected Lung Cancer Cells.

Jiao Sun1,2, Naima Ahmed Fahmi1,2, Heba Nassereddeen2,3, Sze Cheng4, Irene Martinez5, Deliang Fan6, Jeongsik Yong4, Wei Zhang1,2.   

Abstract

Microbes and viruses are known to alter host transcriptomes by means of infection. In light of recent challenges posed by the COVID-19 pandemic, a deeper understanding of the disease at the transcriptome level is needed. However, research about transcriptome reprogramming by post-transcriptional regulation is very limited. In this study, computational methods developed by our lab were applied to RNA-seq data to detect transcript variants (i.e., alternative splicing (AS) and alternative polyadenylation (APA) events). The RNA-seq data were obtained from a publicly available source, and they consist of mock-treated and SARS-CoV-2 infected (COVID-19) lung alveolar (A549) cells. Data analysis results show that more AS events are found in SARS-CoV-2 infected cells than in mock-treated cells, whereas fewer APA events are detected in SARS-CoV-2 infected cells. A combination of conventional differential gene expression analysis and transcript variants analysis revealed that most of the genes with transcript variants are not differentially expressed. This indicates that no strong correlation exists between differential gene expression and the AS/APA events in the mock-treated or SARS-CoV-2 infected samples. These genes with transcript variants can be applied as another layer of molecular signatures for COVID-19 studies. In addition, the transcript variants are enriched in important biological pathways that were not detected in the studies that only focused on differential gene expression analysis. Therefore, the pathways may lead to new molecular mechanisms of SARS-CoV-2 pathogenesis.

Entities:  

Keywords:  3′-UTR; COVID-19; RNA-seq; alternative polyadenylation; alternative splicing; transcript variants

Mesh:

Year:  2021        PMID: 34575842      PMCID: PMC8464664          DOI: 10.3390/ijms22189684

Source DB:  PubMed          Journal:  Int J Mol Sci        ISSN: 1422-0067            Impact factor:   5.923


1. Introduction

Recent research on the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has focused primarily on gene expression profiling in various human cell lines and patient samples in an effort to understand the molecular basis of SARS-CoV-2 pathogenesis [1,2,3]. Unsurprisingly, many of the differentially expressed host genes identified are involved in the antiviral response [1,4]. As the SARS-CoV-2 virus can trigger different levels of changes in humans, a thorough understanding of molecular signatures beyond gene expression profiling is needed to comprehensively dissect the pathogenesis of SARS-CoV-2 coronavirus. Immune activation to viral infection is regulated by changes in gene expression and post-transcriptional regulation, such as AS and APA [5,6]. A classic example of AS in immune cells is the generation of neoantigens, novel peptides of immune proteins for antibody diversification in normal conditions [7]. However, during viral infection by the influenza A virus, the virus can induce widespread AS in the host genes [5,7,8]. Mechanically, viral proteins can interact with splicing and polyadenylation factors and cause misprocessing of the host mRNAs [8]. Some of these isoforms may encode unique neoantigens or disrupt critical functional domains, thereby promoting viral replication and production [7,8,9]. These studies stress the importance of the RNA-processing regulation upon viral infection. A recent study suggests that the NSP16 viral protein of SARS-CoV-2 can interact with the human U1 and U2 snRNA, thereby changing the AS landscape in humans [10]. This warranted our investigation of the host splicing alteration during SARS-CoV-2 infection to further reveal the molecular basis of the COVID-19 disease. Here, RNA-seq data of SARS-CoV-2 infected lung cancer (A549) cells are used to investigate the SARS-CoV-2-inducted remodeling of the host transcriptome [1]. By utilizing computational methods and pipelines, including custom-developed tools, we have identified global AS and APA (Figure 1) changes upon SARS-CoV-2 infection in human lung epithelial cells. These widespread splicing signatures rarely overlap with findings from differential gene expression studies and are enriched in many previously undetected pathways critical for SARS-CoV-2 pathobiology.
Figure 1

Alternative splicing (AS) and alternative polyadenylation (APA). (A) Schematic representation of the five major types of alternative splicing in eukaryotes: Skipped Exon (SE), Retained Intron (RI), Alternative 3 Splice Site (A3SS), Alternative 5 Splice Site (A5SS), and Mutually Exclusive Exon (MXE). Light blue boxes represent the constitutive exons, whereas light yellow and pink exons represent the spliced ones. (B) Two types of alternative polyadenylation, i.e., coding region alternative polyadenylation (CR-APA) and 3-UTR alternative polyadenylation (UTR-APA), and their impact on the functional proteome. PAS: polyadenylation signal; TSS: transcription start site; RBP: RNA-binding protein.

2. Results

In this study, multiple bioinformatics pipelines and statistical methods were used to identify the molecular signatures of COVID-19 at the transcriptional and post-transcriptional levels. The objective is to achieve a better understanding of the impact of transcriptomic changes on the proteome in COVID-19. In this section, we first describe previously published RNA-seq data of COVID-19 [1] that are used in this study (Section 2.1). Then in the following subsections, we focus on reporting the results of our analyses on transcriptome changes using these datasets: we will extensively discuss the alterations of splicing and polyadenylation events as well as the differential expression of genes/transcripts upon SARS-CoV-2 infection in A549 human lung cancer cells.

2.1. Data

The RNA-seq data used in this study were downloaded from the NCBI Gene Expression Omnibus (GEO) database, under the accession number GSE147507 [1]. The dataset contains independent biological triplicates that were mock-treated or infected with SARS-CoV-2. Series 2, 5, and 7 in the dataset were analyzed in this study. Series 2 and 5 consist of transformed lung alveolar (A549) samples that are mock-treated or infected with SARS-CoV-2. Series 7 includes samples from transformed lung-derived Calu-3 cells that are mock-treated or infected with SARS-CoV-2. We used TopHat2 [11] to conduct sequence alignment. These three pairs of cell line data (Series 2, 5, and 7) with a high overall alignment rate were selected for downstream analyses (Supplementary Table S1). Series 5 results are discussed in this manuscript, whereas results for the other two series can be found in the Supplementary Materials (Supplementary Tables S2–S8). The RNA-seq data were quantified with Kallisto [12] for differential transcript and gene expression analyses. To generate the read coverage profile, SAMtools [13] was used with the aligned bam files as an input.

2.2. Alternative Splicing

Five major AS events between groups of mock-treated and SARS-CoV-2 infected A549 cells are reported in Table 1. The table displays information regarding the number of AS events in each category. Overall, more AS events were detected in SARS-CoV-2 infected cells than in mock-treated ones. Among the five types of AS events, the Skipped Exon (SE) category was found to be the most common type; 51 AS events were identified in the mock-treated cells, and 126 AS events were identified in the SARS-CoV-2 infected cells. Notably, compared to other types of AS events, the AS event for alternative 5’ splice site (A5SS) was more frequent in the SARS-CoV-2 infected cells. We further investigated the top 10 significant events for the AS type SE among the two groups (Table 2). The AS events are sorted in ascending order by the p-value, as defined in the Chi-squared hypothesis test. The column ‘Ratio Difference’ denotes the differences in the ratio of average read coverage between the SARS-CoV-2 infected and mock-treated cells. A positive value for the ratio difference indicates that the mock-treated cells have more SE splicing events compared to the SARS-CoV-2 infected cells, and a negative value means the other way around.
Table 1

Number of detected significant splicing events in each category between SARS-CoV-2 infected and mock-treated cells.

SampleSERIMXEA3SSA5SS
Mock51167136
SARS-CoV-2126301959100
Table 2

Top 10 significant splicing events between A549 mock-treated and SARS-CoV-2 infected cells.

Gene NameChrStartEndp-ValueFDRRatio Difference
TPT1chr1345,914,84645,914,9205.81 × 10187.5 × 1014−0.074
C6ORF48chr63,118,9553,119,0499.21 × 10155.94 × 1011−0.113
FKBP1Achr201,373,4771,373,5252.25 × 10149.67 × 1011−0.168
PPIAchr744,838,34644,838,4131.63 × 10125.25 × 1090.149
HNRNPA1chr1254,676,86254,677,0181.26 × 10113.26 × 108−0.248
RPS24chr1079,799,96179,799,9835.25 × 10111.13 × 1070.122
RPS9chr1954,710,42054,710,5924.59 × 10108.46 × 1070.117
SRSF2chr1774,731,85374,731,9571.24 × 1071.78 × 1040.172
CA12chr1563,638,72863,638,9081.66 × 1071.99 × 104−0.103
RPLP1chr1569,745,98569,746,0601.69 × 1071.99 × 1040.040
From the differential AS analysis performed in Table 1, we conducted a literature survey of the genes with significant AS events. Three of them are implicated in SARS-CoV-2 pathogenesis and prognosis as shown in Table 3. In addition, the genes with the splicing events were analyzed with the DAVID functional annotation tool [14]. The enriched KEGG pathways in mock-treated and SARS-CoV-2 infected cells are shown in Figure 2A. Interestingly, genes with AS events are enriched in ribosome and spliceosome, suggesting that SARS-CoV-2 infected cells undergo changes in their gene expression pathways.
Table 3

Literature review of identified genes with transcript variants and their implication on SARS-CoV-2 pathogenesis and prognosis.

CategoryGeneRef.Description
Alternative SplicingBTF3[15]Interacts with the NSP10 CoV protein, which is involved in the pathological function of SARS-CoV in cells.
FKBP1A[16]FKBP1A causes immunosuppression and is required by CoV for viral growth.
G3BP1[17]SARS-CoV-2 N protein undergoes liquid–liquid phase separation, which serves as a scaffold for virus replication and assembly, through its N-terminal intrinsically disordered region (IDR) with G3BP1.
UTR-APAANXA2[18]The upregulation of expression of annexin A2 (ANXA2) by SARS-associated cytokines and the cross-reactivity of anti-SARS-CoV S2 antibodies to annexin A2 may have implications in SARS disease pathogenesis.
CAV1[19]Coronaviruses enter cells via the CAV1 dependent pathway.
TMEM97[20]TMEM97 forms a complex with ACE2 and modulates its ability to internalize the SARS-CoV-2.
CR-APACTSC[21]CTSC activates the elastase-related neutrophil proteases mediated tissue degradation in which it diffuses the alveolar inflammation in acute respiratory distress syndrome.
RHOA[22]Activation of RhoA GTPase and its downstream effector, Rho kinase (ROCK), contributes to a burst in inflammatory features, immune cell migration, apoptosis, coagulation, contraction, and cell adhesion in pulmonary endothelial cells, leading to endothelium barrier dysfunction and edema as hallmarks of lung injury.
CANX[23]Calnexin (CANX) strictly monitors the maturation of the CoV S protein by its direct binding.
Differential ExpressionBCL2A1[24] Pro-survival gene mostly present in adult genes, if downregulated, promotes apoptosis in lung tissue.
SKP2[25]SKP2 attenuates autophagy through Beclin1-ubiquitination and allowing for the replication of coronaviruses.
Figure 2

(A) KEGG pathways enriched by alternative splicing events. The blue and red bar charts show the pathways enriched by the splicing events in mock-treated cells and SARS-CoV-2 infected cells, respectively. (B) A plot for the SDCCAG3 AS event. The spliced exon is highlighted in orange. The first two subplots show the read coverage of the gene in both groups, whereas the bottom subplot denotes the gene annotation with exon information. The x-axis and y-axis of the plot represent the position of the specific gene and read coverage of that sample, respectively. To show the differential splicing event, the altered exon is highlighted in orange to provide a clear insight into the phenomenon.

Figure 2B shows a high resolution read coverage plot generated by our AS-Quant pipeline [26] for the gene SDCCAG3 with a significant AS event. The figure illustrates the notable alternation of expression levels in the spliced exons between the mock-treated and SARS-Cov-2 infected groups.

2.3. Alternative Polyadenylation

To further assess the predictions of alternative polyadenylation events between mock-treated and the SARS-CoV-2 infected cells, both CR-APA and UTR-APA events (Figure 1B) were identified by the computational pipelines introduced in Section 4.2.

2.3.1. UTR-APA

Our pipeline, APA-Scan [27], identified 144 unannotated UTR-APA events, which were reported using the Chi-squared hypothesis test on A549 cells. A total of 137 UTR-APA events were identified in mock-treated cells, while only 7 UTR-APA events were found in SARS-CoV-2 infected cells, indicating that UTR-APA events were suppressed in SARS-CoV-2 infected cells. Figure 3 shows a UTR-APA event for the gene HNRNPH3 generated by APA-Scan. In Table 4, the top 10 UTR-APA events are reported by the p-value in the ascending order. The p-values indicate the significance of the UTR-APA event between mock versus SARS-CoV-2 infected cells. The ratio difference in the last column is calculated by taking the average read coverage ratios of the two groups. A positive ratio difference indicates an increase in the UTR-APA event in mock-treated cells. Based on the results reported in Table 4, all the top 10 events were identified from the mock-treated cells, thereby reinforcing that SARS-CoV-2 infections suppress the usage of the proximal polyadenylation signal in the 3-end processing of mRNAs. The literature survey shows that three genes (ANXA2, CAV1, and TMEM97) with the UTR-APA events related to SARS-CoV-2 pathogenesis and prognosis (Table 3). For example, TMEM97 has been recently shown to interact with SARS-CoV-2 viral proteins [20].
Figure 3

One example of a gene (HNRNPH3) shows an alternative polyadenylation event in the 3-UTR region. The black vertical line indicates the potential cleavage site. The x-axis and y-axis represent the position of the gene in its chromosome and the read coverage, respectively. The top two subplots show the changes in read coverage surrounding the event, and the cleavage site is indicated by a black vertical line. The bottom part of the plot shows both the full length and truncated 3-UTRs.

Table 4

Top 10 significant alternative polyadenylation events in the 3-UTR region (UTR-APA) between A549 mock-treated and SARS-CoV-2 infected cells. The ‘Position’ column refers to the potential cleavage site for the UTR-APA.

Gene NameChrPositionp-ValueFDRRatio Difference
ACTN4chr1938,730,1846.07 × 10145.71 × 10100.144
ALDH1A1chr972,900,9861.34 × 10126.30 × 1090.054
S100A6chr1153,534,6902.73 × 10118.56 × 1080.045
HNRNPA2B1chr726,191,8611.26 × 1092.96 × 1060.088
PMEPA1chr2057,651,5792.10 × 1093.95 × 1060.153
ACAT2chr6159,779,0333.11 × 1094.88 × 1060.273
ARL4Cchr2234,495,1516.05 × 1088.14 × 1050.231
H3F3Achr1226,071,7758.70 × 1081.02 × 1040.189
RPL15chr323,919,5371.32 × 1071.38 × 1040.041
MYH9chr2236,281,6603.29 × 1072.91 × 1040.177

2.3.2. CR-APA

In the CR-APA analysis, both the hg38 RefSeq annotation [28] and the UCSC annotation [29] were applied to report the list of CR-APA events between mock-treated versus SARS-CoV-2 infected cells. We identified significant CR-APA events by measuring the CR-truncation ratio (Equation (4)). We found an increase of 1206 CR-APA events in mock-treated cells and an enrichment of 656 CR-APA events in SARS-CoV-2 infected cells, indicating that CR-APA events are more frequent in mock-treated cells. The bipartite expression patterns of CR-APA in Figure 4 revealed an enrichment of specific truncated mRNAs in both mock and SARS-CoV-2 infected cells, indicating that reprogramming of the transcriptome and consequent changes in the proteome diversity would occur upon SARS-CoV-2 infection.
Figure 4

RNA-seq data of mock-treated and SARS-CoV-2 infected A549 cells were analyzed for the CR-APA. The x-axis and y-axis represent the CR-truncation ratios ((short mRNA)/(total mRNA)) of a gene. Each dot represents a gene, where the blue dots and red dots show the significant ones in mock and SARS-CoV-2 infected cells, respectively. Upon SARS-CoV-2 infection, 656 genes showed up-regulated CR-APA while 1206 genes showed down-regulated CR-APA. A total of 8832 genes remained unchanged in their CR-APA.

Among the significant CR-APA events, the top 10 significant events are listed in Table 5. The ratio difference is calculated by taking the difference between the average CR-truncation ratio of the SARS-CoV-2 infected cells and mock-treated cells. A CR-APA event occurs in SARS-CoV-2 infected samples if the ratio difference is larger than zero. It is considered significant in the mock-treated group if the ratio difference is smaller than zero. A literature review shows that several genes with CR-APA events are connected to SARS-CoV-2 pathogenesis and prognosis, as described in Table 3. The KEGG pathways in mock-treated and the SARS-Cov-2 infected cells are shown in Figure 5A. Particularly, genes involved in oxidative phosphorylation increased CR-APA, which may affect the expression of full-length proteins that are critical for cellular energy production and mitochondrial dysfunction in respiratory illnesses [30]. It was shown that one of the SARS-CoV-2 proteins, NSP16, binds to the mRNA recognition domains of the spliceosome and suppresses global mRNA splicing in SARS-CoV-2-infected human cells [10]. The KEGG pathways enriched in Figure 2A and Figure 5A show AS and CR-APA events in genes associated with the spliceosome pathway, suggesting that alterations in the regulatory splicing pathway could happen. This, in turn, would affect splicing and splicing-coupled CR-APA events upon SARS-CoV-2 infection, leading to further deleterious effects on gene function [31].
Table 5

Top 10 significant alternative polyadenylation events in the coding region (CR-APA) between A549 mock-treated and SARS-CoV-2 infected cells.

GeneChrTruncated Positionp-ValueFDRRatio Difference
GLRX5chr1495,544,5572.76 × 1066.12 × 1030.361
UBXN6chr194,445,0095.98 × 1061.14 × 1020.020
SLC3A2chr1162,881,2901.31 × 1052.16 × 1020.256
ARIH2chr348,928,6741.37 × 1051.18 × 1020.071
DVL3chr3184,164,5171.50 × 1052.16 × 102−0.269
DEF8chr1689,959,3661.62 × 1051.18 × 102−0.232
SIAH2chr3150,742,4871.62 × 1052.16 × 102−0.362
SNHG7chr9136,724,6591.84 × 1052.16 × 1020.020
LTBP2chr1474,506,7051.94 × 1052.16 × 102−0.248
S100A16chr1153,606,9742.29 × 1052.35 × 1020.200
Figure 5

(A) KEGG pathways enriched by CR-APA events. The blue and red bar charts show the pathways enriched by the APA events in mock-treated cells and SARS-CoV-2 infected cells, respectively. (B) One example of a gene (LARP6) shows an alternative polyadenylation event in the coding region. The truncated coding exon is highlighted in orange.

Figure 5B shows a CR-APA event for the gene LARP6. The top two subplots show the read coverage of the gene in two conditions. The bottom panel represents the isoform annotations. NM_197958 is an annotated CR-truncated isoform in the gene, and the last highlighted exon indicates the end of the CR-APA transcript. Compared to SARS-CoV-2 infected cells, the plot shows a significant increase of RNA-seq alignments in the highlighted exon, indicating a higher CR-truncation ratio of LARP6 in mock-treated cells compared to SARS-CoV-2 infected cells.

2.4. Differential Gene/Transcript Expression

In differential expression analysis, 991 genes and 1443 transcripts show higher expression in mock-treated cells, whereas 717 genes and 933 transcripts show higher expression in SARS-CoV-2 infected cells. These differentially expressed genes and transcripts can be served as molecular signatures for COVID-19 outcome prediction. The top 100 differentially expressed transcripts and genes were selected for bi-clustering analysis. The z-score transformation was applied to the expression value for each entry in the heatmap. From the results, we can see that the six samples are precisely clustered into two groups (Figure 6 and Figure S1 in the Supplementary Materials).
Figure 6

COVID-19 A549 cell lines are clustered by the top 100 transcript markers detected by differential transcript expression analysis.

Table 3 shows two differentially expressed genes that are related to SARS-CoV-2 pathogenesis and prognosis. Figure 7 and Figure S2 in the Supplementary Materials show the enriched KEGG pathways by the differentially expressed transcripts and genes, respectively. In each figure, the top bar plot represents the pathways enriched by the transcripts or genes that are highly expressed in mock-treated cells, and the bottom bar plot represents the transcripts or genes that are highly expressed in SARS-CoV-2 infected cells. The enriched pathways by the gene expression and transcript expression analyses are similar. The KEGG pathway enrichment of the differentially expressed transcripts in the SARS-CoV-2 in Figure 7 shows an up-regulation in the pro-inflammatory pathways TNF- and NF-kappa, confirming the finding that patients with SARS-CoV-2 were reported to have elevated levels of cytokines in blood and cytokine gene expression in peripheral blood mononuclear cells (PBMC) [32,33,34].
Figure 7

Differentially expressed transcripts enriched KEGG pathways. The blue and red bar charts show the pathways enriched by the up-regulated and down-regulated transcripts in mock-treated samples over SARS-CoV-2 infected samples, respectively.

2.5. Comparison between Alternative Processing of Pre-mRNA and Differential Gene Expression

In addition to mRNA expression, post-transcriptional regulation of gene expression (i.e., AS and APA) plays a critical role in the control of cellular processes and affects phenotypic changes in biological features [35]. They may serve as a new layer of biomarker for drug development for COVID-19 therapeutics. Therefore, it is important to compare the genes that are differentially expressed and the genes that undergo alternative processing events. The relationship between differential gene expression and the UTR-APA event was tested based on the APA events identified in Section 2.3.1 and the differentially expressed genes identified in Section 2.4. We combined the analyses of UTR-APA and differential gene expression, and the results are shown in Figure 8A. Each gene was plotted by fold changes in the differential gene expression (y-axis) and the significance of UTR-APA events (x-axis). The left three sections and the right three sections in Figure 8A show the 3-UTR-truncated genes in mock-treated cells and SARS-CoV-2 infected cells, respectively. The top three sections and the bottom three sections represent the up-regulated and down-regulated genes in SARS-CoV-2 infected cells over mock treated cells, respectively. From the plot, we can see that a significant portion (>88%) of the genes showing UTR-APA events are not differentially expressed, indicating no strong correlation between the differential gene expression and the UTR-APA events in the SARS-CoV-2 infected cells. Similarly, more than 90% of the genes showing CR-APA events are not differentially expressed (Figure 8B). Therefore, the profiling of APA-based molecular signatures could provide additional predictive power and potentially lead to the findings of new target pathways for SARS-CoV-2 related translational research.
Figure 8

Scatter plot of APA and differential gene expression. Red dots represent the individual gene in the analysis. Horizontal blue-dashed lines represent the cutoff values for two-fold changes in differential gene expression. Vertical green-dashed lines represent the cutoff values for the (p-value) of UTR-APA (A) determined by the Chi-squared test, and the (p-value) of CR-APA (B) determined by the Student’s t-test.

To further investigate the relationship among all transcript variants identified in this study, a four-set Venn diagram is generated to illustrate the intersections of differentially expressed genes (DEG) and the genes with post-transcriptional regulation events (Figure 9). Both RefSeq and UCSC annotations were applied in the data analysis to make the experimental results comparable. The total number of events generated from alternative splicing, UTR-APA, CR-APA, and differentially expressed genes are 320, 144, 1862, and 2256, respectively. None of the genes were shown in all four categories. Though 2256 genes were detected as differentially expressed genes, only 25 of them have alternative splicing events and 17 of them have UTR-APA events. In the case of an alternative splicing event in a gene showing CR-APA, it has a higher chance of being considered as a CR-APA gene. Based on these observations, we found that 27.5% of genes with AS events show CR-APA. Overall, the results further confirm that post-transcriptional regulation profiles and differential gene expressions profiles provide distinctive molecular signatures for COVID-19 studies. In addition, the three different types of post-transcriptional regulation events barely overlap.
Figure 9

Four-set Venn diagram shows the overlapped genes in three different types of post-transcriptional regulations and differentially expressed genes (DEG).

3. Discussion and Conclusions

At present, the whole world is suffering from the COVID-19 pandemic, but the molecular mechanisms of COVID-19 pathogenesis are not clear. To better understand the molecular basis of the disease, here we compare the effects of transcriptional and post-transcriptional events on SARS-CoV-2 infection in lung cancer cell lines. From these analyses, we were able to identify distinct molecular signatures that were not otherwise found by traditional differential gene expression analysis. These novel molecular signatures may contribute to COVID-19 pathobiology. The results from the data analyses demonstrate that several transcriptional and post-transcriptional signatures identified in this study are highly correlated with SARS-CoV-2 pathogenesis and prognosis. These signatures are enriched in diverse biological pathways that were not detected in the original study [1], which only focused on differential gene expression analysis. These pathways may lead to the discovery of new molecular mechanisms of SARS-CoV-2 pathogenesis. In addition, post-transcriptional gene regulations provide additional molecular signatures for COVID-19 therapeutic targets compared to the transcriptional signatures of COVID-19 patients. The latest COVID-19 study [10] has shown that the SARS-CoV-2 NSP16 protein suppresses mRNA splicing, leading to intron retention in host genes. In our data analyses, we only identified 16 intron retention events in the SARS-CoV-2 infected samples. This might be due to the fact that our custom-developed AS-Quant pipeline detects intron retention based on the annotations [26]. Thus, more cases for intron retention could be discovered by future pipelines, which will be important to identify non-canonical peptides produced from the translation of intron retained transcripts. While our findings are confined at the transcriptome level, further investigations on the changes of proteome and post-translational modifications in the proteome are warranted to comprehensively dissect the mechanistic aspect of COVID-19 pathobiology. In our pathway enrichment analysis, genes in the ribosome pathway are up-regulated in the SARS-CoV-2 infected cells, suggesting that there may be an up-regulation of protein synthesis. Recent quantitative proteomics studies have suggested that time-dependent proteome changes in the post-infection of SARS-Cov-2 [36,37]. Studies have reported that in an early stage of infection (less than 24 h), a proteome change occurs bi-directionally (some up-regulated and some down-regulated). However, at a later stage of infection (greater than 24 h), there is an overall down-regulation of the global protein abundance. However, proteins that displayed differential phosphorylation and ubiquitination did not change the overall protein abundance. Nonetheless, 17.0%, 20.1%, and 16.4% of the genes that contain AS, UTR-APA, and CR-APA events are reported to be differentially expressed in their protein levels [37]. In addition, both our post-transcriptional study and the proteomics studies have consistently identified the activation of the MAPK signaling pathway in SARS-CoV-2 infected cells. These findings suggest the importance of using a multi-omics approach in resolving detailed molecular and cellular signatures of COVID-19.

4. Materials and Methods

In this section, we first introduce the pipelines and strategies to detect the alternative splicing and alternative polyadenylation events between two biological conditions in Section 4.1 and Section 4.2, respectively. Then, we explain the method to determine the differentially expressed transcripts and genes in Section 4.3. Finally, we provide an overview of the gene set enrichment analysis on the significant events in Section 4.4.

4.1. Detection of Alternative Splicing Events

To analyze the alternative splicing events between mock-treated and infected with SARS-CoV-2 samples with RNA-seq data, we used the computational pipeline AS-Quant [26]. Based on the read coverage files generated by SAMtools, all the potential splicing events were categorized into five major types of alternative splicing, as shown in Figure 1A. For each spliced exon, the average read coverage (n) is calculated by the read count on that exon () divided by the read length () in AS-Quant: The average read coverage for all other exons of that gene (N) is calculated in the same way, by dividing the total reads () by the gene length (). Next, the ratio difference between two biological conditions is calculated based on the following equation: where 1 and 2 represent the different conditions. The most significant events are reported according to the p-value generated by a canonical 2 × 2 Chi-squared test. The topmost events are then illustrated by read coverage plots to give a visual representation (e.g., Figure 2B).

4.2. Detection of APA Events

4.2.1. UTR-APA

To identify the 3’-UTR-APA events between mock-treated or SARS-CoV-2 infected samples with RNA-seq data, we applied the computational tool APA-Scan [27]. APA-Scan utilizes either predicted or experimentally validated polyadenylation signals as a reference for 3’-UTR polyadenylation sites. It also calculates the amount of full-length and truncated 3’-UTR transcripts with the RNA-seq data. The canonical polyadenylation signals (PAS) AATAAA and ATTAAA in the distal 3’-UTR region were considered as the tentative locations of cleavage sites. Then, the APA-events were evaluated by comparing the RNA-seq reads coverage upstream and downstream of the candidate cleavage site between the samples in two different biological conditions. Specifically, the average read coverage for upstream ( and for two biological conditions) and downstream ( and ) of the candidate cleavage site were calculated. Next, the ratio difference between two conditions was determined by the following equation: The 2 × 2 Chi-square test was then applied to assess the significance of the candidate events. The read coverage plots of the significant events were reported according to the user-defined p-value cutoff (e.g., Figure 3).

4.2.2. CR-APA

To identify and quantify dynamic CR-APA events, the protein-coding transcripts of a gene were categorized as either full-length or truncated transcripts based on the NCBI human hg38 RefSeq [28] or UCSC annotations [38]. A transcript is categorized as a truncated transcript if it satisfies the following conditions: (1) its coding end is not the maximum coding end in the gene, and (2) its transcript end is not the maximum transcript end in the gene; otherwise, the transcript is a full-length transcript in the gene. The CR-truncation ratio that quantifies the CR-APA event is defined as, where is the summation of expressions of all truncated transcripts in the gene, denotes gene expression, and . The Student’s t-test is then applied to the CR-truncation ratios to determine the significant CR-APA events between the two groups of samples.

4.3. Differential Gene/Transcript Expressions

Differential gene expression analysis has been widely used to identify transcriptomic signatures that define phenotypic changes in biology. In this work, Kallisto [12] was applied to perform the quantification of RNA-seq data for COVID-19 samples. After filtering out lowly expressed genes (TPM < 1), the fold change of transcript or gene expression was calculated between two groups of samples. The transcripts or genes with a fold change larger than 1 or smaller than , and with a Student’s t-test p-value < 0.05 were considered as significantly differentially expressed.

4.4. Enriched Pathway Analysis

After identifying the AS events, APA events, and differentially expressed genes/transcripts in each biological condition, the events and genes which are over-represented in GO (Gene Ontology) terms and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways were determined by DAVID Bioinformatics Resources [39]. These enriched functional profiles and molecular interactions could help us better understand the underlying biological processes in COVID-19. The KEGG pathway is a collection of pathway-maps representing molecular level information, including metabolism, genetic information processing, environmental information processing, cellular processes, organismal systems, human diseases, and drug development. The Gene Ontology (GO) terms define the biological domain with respect to three aspects: molecular function, cellular component, and biological process.
  37 in total

1.  The Global Phosphorylation Landscape of SARS-CoV-2 Infection.

Authors:  Mehdi Bouhaddou; Danish Memon; Bjoern Meyer; Kris M White; Veronica V Rezelj; Miguel Correa Marrero; Benjamin J Polacco; James E Melnyk; Svenja Ulferts; Robyn M Kaake; Jyoti Batra; Alicia L Richards; Erica Stevenson; David E Gordon; Ajda Rojc; Kirsten Obernier; Jacqueline M Fabius; Margaret Soucheray; Lisa Miorin; Elena Moreno; Cassandra Koh; Quang Dinh Tran; Alexandra Hardy; Rémy Robinot; Thomas Vallet; Benjamin E Nilsson-Payant; Claudia Hernandez-Armenta; Alistair Dunham; Sebastian Weigang; Julian Knerr; Maya Modak; Diego Quintero; Yuan Zhou; Aurelien Dugourd; Alberto Valdeolivas; Trupti Patil; Qiongyu Li; Ruth Hüttenhain; Merve Cakir; Monita Muralidharan; Minkyu Kim; Gwendolyn Jang; Beril Tutuncuoglu; Joseph Hiatt; Jeffrey Z Guo; Jiewei Xu; Sophia Bouhaddou; Christopher J P Mathy; Anna Gaulton; Emma J Manners; Eloy Félix; Ying Shi; Marisa Goff; Jean K Lim; Timothy McBride; Michael C O'Neal; Yiming Cai; Jason C J Chang; David J Broadhurst; Saker Klippsten; Emmie De Wit; Andrew R Leach; Tanja Kortemme; Brian Shoichet; Melanie Ott; Julio Saez-Rodriguez; Benjamin R tenOever; R Dyche Mullins; Elizabeth R Fischer; Georg Kochs; Robert Grosse; Adolfo García-Sastre; Marco Vignuzzi; Jeffery R Johnson; Kevan M Shokat; Danielle L Swaney; Pedro Beltrao; Nevan J Krogan
Journal:  Cell       Date:  2020-06-28       Impact factor: 41.582

2.  Gene expression profiles in peripheral blood mononuclear cells of SARS patients.

Authors:  Shi-Yan Yu; Yun-Wen Hu; Xiao-Ying Liu; Wei Xiong; Zhi-Tong Zhou; Zheng-Hong Yuan
Journal:  World J Gastroenterol       Date:  2005-08-28       Impact factor: 5.742

3.  Aberrant splicing and defective mRNA production induced by somatic spliceosome mutations in myelodysplasia.

Authors:  Yusuke Shiozawa; Luca Malcovati; Anna Gallì; Aiko Sato-Otsubo; Keisuke Kataoka; Yusuke Sato; Yosaku Watatani; Hiromichi Suzuki; Tetsuichi Yoshizato; Kenichi Yoshida; Masashi Sanada; Hideki Makishima; Yuichi Shiraishi; Kenichi Chiba; Eva Hellström-Lindberg; Satoru Miyano; Seishi Ogawa; Mario Cazzola
Journal:  Nat Commun       Date:  2018-09-07       Impact factor: 14.919

4.  Alternate splicing of transcripts upon Mycobacterium tuberculosis infection impacts the expression of functional protein domains.

Authors:  Haroon Kalam; Kartikeya Singh; Komal Chauhan; Mary F Fontana; Dhiraj Kumar
Journal:  IUBMB Life       Date:  2018-08-18       Impact factor: 3.885

5.  SKP2 attenuates autophagy through Beclin1-ubiquitination and its inhibition reduces MERS-Coronavirus infection.

Authors:  Nils C Gassen; Daniela Niemeyer; Doreen Muth; Victor M Corman; Silvia Martinelli; Alwine Gassen; Kathrin Hafner; Jan Papies; Kirstin Mösbauer; Andreas Zellner; Anthony S Zannas; Alexander Herrmann; Florian Holsboer; Ruth Brack-Werner; Michael Boshart; Bertram Müller-Myhsok; Christian Drosten; Marcel A Müller; Theo Rein
Journal:  Nat Commun       Date:  2019-12-18       Impact factor: 14.919

6.  Viral-induced alternative splicing of host genes promotes influenza replication.

Authors:  Matthew G Thompson; Mark Dittmar; Michael J Mallory; Prasanna Bhat; Max B Ferretti; Beatriz Ma Fontoura; Sara Cherry; Kristen W Lynch
Journal:  Elife       Date:  2020-12-03       Impact factor: 8.140

Review 7.  Alternative Splicing of Pre-mRNA in the Control of Immune Activity.

Authors:  Zhongjing Su; Dongyang Huang
Journal:  Genes (Basel)       Date:  2021-04-15       Impact factor: 4.096

8.  Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists.

Authors:  Da Wei Huang; Brad T Sherman; Richard A Lempicki
Journal:  Nucleic Acids Res       Date:  2008-11-25       Impact factor: 16.971

9.  Plausibility of therapeutic effects of Rho kinase inhibitors against Severe Acute Respiratory Syndrome Coronavirus 2 (COVID-19).

Authors:  Farshad Abedi; Ramin Rezaee; Gholamreza Karimi
Journal:  Pharmacol Res       Date:  2020-04-10       Impact factor: 7.658

Review 10.  Lung Protection by Cathepsin C Inhibition: A New Hope for COVID-19 and ARDS?

Authors:  Brice Korkmaz; Adam Lesner; Sylvain Marchand-Adam; Celia Moss; Dieter E Jenne
Journal:  J Med Chem       Date:  2020-08-07       Impact factor: 7.446

View more
  2 in total

1.  An L-theanine derivative targets against SARS-CoV-2 and its Delta and Omicron variants.

Authors:  Jing Lu; Ying Zhang; Dan Qi; Chunyan Yan; Benhao Wu; Jason H Huang; Jianwen Yao; Erxi Wu; Guoying Zhang
Journal:  Heliyon       Date:  2022-06-09

2.  Differential Co-Expression Network Analysis Reveals Key Hub-High Traffic Genes as Potential Therapeutic Targets for COVID-19 Pandemic.

Authors:  Aliakbar Hasankhani; Abolfazl Bahrami; Negin Sheybani; Behzad Aria; Behzad Hemati; Farhang Fatehi; Hamid Ghaem Maghami Farahani; Ghazaleh Javanmard; Mahsa Rezaee; John P Kastelic; Herman W Barkema
Journal:  Front Immunol       Date:  2021-12-15       Impact factor: 7.561

  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.