Laurence Josset, Nicolas Tchitchek, Lisa E Gralinski, Martin T Ferris, Amie J Eisfeld, Richard R Green, Matthew J Thomas, Jennifer Tisoncik-Go, Gary P Schroth, Yoshihiro Kawaoka, Fernando Pardo Manuel de Villena, Ralph S Baric, Mark T Heise, Xinxia Peng, Michael G Katze.
Abstract
The outcome of respiratory virus infection is determined by a complex interplay of viral and host factors. Some potentially important host factors for the antiviral response, whose functions remain largely unexplored, are long non-coding RNAs (lncRNAs). Here we systematically inferred the regulatory functions of host lncRNAs in response to influenza A virus and severe acute respiratory syndrome coronavirus (SARS-CoV) based on their similarity in expression with genes of known function. We performed total RNA-Seq on virus-infected lungs from eight mouse strains, yielding a large data set of transcriptional responses. Overall, 5,329 lncRNAs were differentially expressed after infection. Most of the lncRNAs were co-expressed with coding genes in modules enriched in genes associated with lung homeostasis pathways or immune response processes. Each lncRNA was further individually annotated using a rank-based method, enabling us to associate 5,295 lncRNAs with at least one gene set and to predict their potential cis effects. We validated the lncRNAs predicted to be interferon-stimulated by profiling mouse responses after interferon-α treatment. Altogether, these results provide a broad categorization of potential lncRNA functions and identify subsets of lncRNAs with likely key roles in respiratory virus pathogenesis. These data are fully accessible through the MOuse NOn-Code Lung interactive database (MONOCLdb).
Keywords: Collaborative Cross; influenza virus; interferon; long non-coding RNA; RNA-Seq; SARS-CoV
Year: 2014 PMID: 24922324 PMCID: PMC4179962 DOI: 10.4161/rna.29442
Source DB: PubMed Journal: RNA Biol ISSN: 1547-6286 Impact factor: 4.652

Figure 1. Characterization of lncRNA pulmonary expression in mice after infection with either IAV PR8 or SARS-CoV MA15. (A-B) Similarities in lncRNA (A) or coding gene (B) expression profiles are depicted using non-parametric multidimensional scaling (MDS). Each RNA sample is represented as a single point colored by viral treatment (green for mock-, salmon for MA15-, and blue for PR8-infected samples) and shaped according to mouse strain. Convex hulls link samples belonging to the same condition, with line width indicating the number of days post-infection (DPI). Euclidean distance was calculated using the normalized count data for lncRNAs passing QC (A) or coding genes passing QC (B), such that proximity indicates similarity and distance indicates dissimilarity of gene-expression profiles. Kruskal's stress (KS) quantifies the quality of the representations as the fraction of information lost during dimensionality reduction. (C) Dynamic range of expression after infection for lncRNAs compared with coding genes. Boxplots represent the 5% and 95% quantiles (lower and upper whiskers), the 25% and 75% quantiles (lower and upper hinges), and the median of gene expression changes after infection in log2FC, considering data for two and four DPI and all eight mouse strains together. (D) Number of differentially expressed (DE) lncRNAs and coding genes after infection at each DPI and for each mouse strain (FDR = 1%). Dark colors represent lncRNAs and light colors represent coding genes.
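To make the role of Kruskal's stress in panels A and B more concrete, the sketch below computes a stress-1 value for a toy embedding. It is only an illustration: the function names and the toy points are hypothetical, and the original Euclidean distances are used directly as disparities (a metric simplification), whereas non-parametric MDS would first fit disparities by monotone regression.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pairwise(points):
    """Distances for all unordered pairs of points, in a fixed order."""
    return [euclidean(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)]

def stress1(original, embedded):
    """Stress-1 with the original distances standing in for disparities.

    Assumption: this is the metric shortcut, not the monotone-regression
    disparities of non-parametric MDS.
    """
    disparities = pairwise(original)
    d_embedded = pairwise(embedded)
    num = sum((d - dh) ** 2 for d, dh in zip(d_embedded, disparities))
    den = sum(d ** 2 for d in d_embedded)
    return math.sqrt(num / den)

# Toy "expression profiles" in 3 dimensions, embedded by dropping one axis.
profiles = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
embedding = [(x, y) for x, y, _ in profiles]
print(round(stress1(profiles, embedding), 3))
```

A value of 0 means the low-dimensional distances reproduce the disparities exactly; larger values mean more distortion.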

Figure 2. Modular annotation of lncRNA. (A) Heatmap depicting expression values for DE coding and non-coding genes. Samples were clustered by hierarchical clustering and are represented by the same symbols as in Figure 1A and B. Genes were grouped into modules (co-expressed sets of transcripts), which were arbitrarily labeled and are depicted by different colors. (B) Number of coding and non-coding genes comprising each module. (C) Odds ratio of being a key point in the network for coding genes compared with non-coding genes. Key points are defined as bottlenecks, the top 5% of genes with the highest betweenness centrality (bc), and hubs, the top 5% of genes with the highest degree in the whole network (kTotal). (D) Example of lncRNA hubs within the turquoise module: n266006, n265692, and n280959. The turquoise module is enriched in ISGs (Table 1). For clarity, only the top 15 most correlated genes for each hub lncRNA are shown. lncRNAs are colored based on their MONOCLdb module membership and represented by square symbols, while coding genes are depicted as open circles; note that all genes in panel D belong to the turquoise module. This representation was generated using MONOCLdb.
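The hub definition and odds ratio in panel C can be sketched as follows. This is a minimal illustration under stated assumptions: the gene names and degree values are invented, the toy example uses a 20% cutoff instead of 5% so that a 20-gene network yields more than one hub, and the helper names are hypothetical rather than taken from the study's pipeline.

```python
import math

def top_fraction(scores, fraction=0.05):
    """Return the gene IDs in the top `fraction` of a score such as degree."""
    n_top = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_top])

def odds_ratio(key_genes, coding_genes, all_genes):
    """Odds of being a key point for coding versus non-coding genes."""
    noncoding = set(all_genes) - set(coding_genes)
    a = len(key_genes & set(coding_genes))   # coding, key
    b = len(set(coding_genes) - key_genes)   # coding, not key
    c = len(key_genes & noncoding)           # non-coding, key
    d = len(noncoding - key_genes)           # non-coding, not key
    if b * c == 0:
        return float("inf")
    return (a * d) / (b * c)

# Hypothetical degrees for 10 coding (c*) and 10 non-coding (n*) genes.
degree = {f"c{i}": v for i, v in enumerate([10, 9, 8, 1, 1, 1, 1, 1, 1, 1])}
degree.update({f"n{i}": v for i, v in enumerate([7, 2, 2, 2, 2, 2, 2, 2, 2, 2])})
coding = {g for g in degree if g.startswith("c")}

hubs = top_fraction(degree, fraction=0.20)
print(sorted(hubs), odds_ratio(hubs, coding, degree))
```

The same two functions apply to bottlenecks by passing betweenness centrality values instead of degrees.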
Table 1. Functional enrichment for each module of co-expressed coding and non-coding genes
| MONOCLdb module | GO | Reactome | GeneAtlas | Immgen | Motif | ISG | QTL | Correlation with WL and viral replication (bicor) |
|---|---|---|---|---|---|---|---|---|
| | GO:0007275_multicellular organismal development (ES = 5.91) | Metabolism (ES = 13.85) | T-cells CD4+ (ES = 1.79) | #44: “Downregulated with differentiation, except some myeloids. High in stromal” (ES = 2.84) | SP1(MA0079.2) (ES = 12.82) | | QTL_SARS_eosinophilia (ES = 1.62) | SARS_MA15_vRNA (-0.63); PR8_vRNA (-0.37) |
| | GO:0006412_translation (ES = 14.48) | Gene Expression (ES = 40.25) | B-cells follicular (ES = 1.84) | #4: “Ribosomal proteins” (ES = 4.03) | Klf4(MA0039.2) (ES = 32.6) | | QTL_FLU_HrI4 (ES = 7.24) | SARS_WL (-0.42); SARS_MA15_vRNA (0.3); PR8_WL (-0.41); PR8_vRNA (0.35) |
| | GO:0007018_microtubule-based movement (ES = 9.32) | Potassium Channels (ES = 3.63) | | #37: “High in stromal and blood endothelial cell” (ES = 2.9) | Rfx4_primary(UP00056_1) (ES = 15.88) | | | PR8_WL (0.47); PR8_vRNA (-0.72) |
| | GO:0042113_B cell activation (ES = 2.56) | GPCR downstream signaling (ES = 2.78) | Bcells common (ES = 7.81) | #33: “Early B module” (ES = 4.61) | Otx1_2325.1(UP00229_1) (ES = 3.91) | | | SARS_WL (0.37) |
| | GO:0042254_ribosome biogenesis (ES = 8.45) | Gene Expression (ES = 24.61) | Mast cells (ES = 1.75) | #5: “Downregulated with differentiation” (ES = 10.38) | GABPA(MA0062.2) (ES = 5.12) | | | SARS_WL (-0.71); SARS_MA15_vRNA (0.84); PR8_WL (0.49); PR8_vRNA (0.71) |
| | GO:0008152_metabolic process (ES = 10.29) | Metabolism (ES = 21.02) | | #35: “Endothelial genes, extracellular matrix” (ES = 7.16) | SP1(MA0079.2) (ES = 11.13) | | | SARS_WL (0.59); SARS_MA15_vRNA (-0.92); PR8_WL (0.45); PR8_vRNA (-0.52) |
| | GO:0009404_toxin metabolic process (ES = 1.43) | Neurotransmitter Receptor Binding And Downstream Transmission In The Postsynaptic Cell (ES = 1.72) | DC lymphoid (ES = 1.77) | #36: “Fibroblasts genes, extracellular matrix” (ES = 7.38) | Hoxa10_2318.1(UP00217_1) (ES = 15.63) | | QTL_FLU_HrI3 (ES = 2.89) | SARS_WL (0.67); SARS_MA15_vRNA (-0.85); PR8_WL (0.52); PR8_vRNA (-0.7) |
| | GO:0007165_signal transduction (ES = 4.75) | Immune System (ES = 25.77) | T-cells foxP3+ (ES = 2.47) | #25: “Low in T cells, intermediate in B cells, high in myeloids” (ES = 3.46) | Klf4(MA0039.2) (ES = 11.09) | ISG (ES = 2.77) | QTL_FLU_HrI4 (ES = 1.53) | SARS_WL (-0.67); SARS_MA15_vRNA (0.8); PR8_WL (-0.52); PR8_vRNA (0.68) |
| | GO:0031123_RNA 3′-end processing (ES = 1.34) | | B-cells marginal (ES = 2.2) | #1: “Downregulated in myeloids and stromal” (ES = 2.33) | Foxl1_secondary(UP00061_2) (ES = 6.95) | | | SARS_MA15_vRNA (0.68); PR8_vRNA (0.39) |
| | GO:0051301_cell division (ES = 48.25) | Cell Cycle (ES = 71.45) | Macrophage BM_0hr (ES = 15.16) | #11: “cell cycle genes” (ES = 75.87) | E2F2_secondary(UP00001_2) (ES = 6.36) | | | SARS_WL (-0.55); SARS_MA15_vRNA (0.49); PR8_WL (-0.52); PR8_vRNA (0.39) |
| | GO:0006955_immune response (ES = 23.55) | Immune System (ES = 39.88) | Macrophage common (ES = 10.6) | #52: “Interferon response” (ES = 12.59) | Isgf3g_primary(UP00074_1) (ES = 6.55) | ISG (ES = 87.9) | QTL_SARS_eosinophilia (ES = 1.92) | SARS_WL (-0.51); SARS_MA15_vRNA (0.95); PR8_WL (-0.39); PR8_vRNA (0.76) |
ES, enrichment score, defined as –log10 of the p-value calculated by Fisher’s exact test.
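The enrichment score in Table 1 can be illustrated with a small standard-library sketch. The counts below are invented, and the one-sided (over-representation) form of Fisher's exact test is an assumption, since the table footnote does not state which alternative was used.

```python
import math

def fisher_exact_greater(k, K, n, N):
    """One-sided Fisher's exact p-value, P(overlap >= k).

    N: genes in the universe, K: genes in the gene set,
    n: genes in the module, k: genes in both.
    """
    total = math.comb(N, n)
    upper = min(K, n)
    tail = sum(math.comb(K, i) * math.comb(N - K, n - i)
               for i in range(k, upper + 1))
    return tail / total

def enrichment_score(p_value):
    """ES = -log10(p-value)."""
    return -math.log10(p_value)

# Hypothetical module: 5 genes, all of which fall in a 5-gene set,
# drawn from a universe of 20 genes.
p = fisher_exact_greater(k=5, K=5, n=5, N=20)
print(p, round(enrichment_score(p), 2))
```

On this scale, a p-value of 0.05 corresponds to an ES of about 1.3, so larger ES values indicate stronger enrichment.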

Figure 3. Individual lncRNA annotation based on ranked correlation. (A) Example of ranked-correlation annotation for n284201. DE genes are ranked based on their bicor coefficient with n284201 and colored black for ISGs and grey for non-ISGs. Functional enrichment was performed with the Wilcoxon Rank-Sum (WRS) test, which assesses whether genes from a given gene set are significantly concentrated at the top of the list. The enrichment score (ES), defined as -log10 (Bonferroni-adjusted p-value), was highly significant for n284101 (ES = 110), and therefore n284101 was annotated as an ISG. (B) Distribution of the ranked annotation in each functional category. “GeneAtlas” gene sets were defined as genes highly expressed in immune cell populations compared with lung profiles in GeneAtlas; “GOBP” gene sets are the Gene Ontology Biological Processes; “Immgen” gene sets are modules of co-expressed genes across various immune cell types as defined in the Immgen project; “ISG” is a list of IFN response genes; “Motif” gene sets are lists of genes whose promoters have TF motif binding sites; “QTL” gene sets are QTL regions identified for susceptibility to SARS or IAV in the CC mice; and “Reactome” gene sets are Reactome pathways. Finally, “Total_annot” is the sum of GeneAtlas, GOBP, Immgen, and Reactome annotations.
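The rank-based annotation described in panel A can be sketched roughly as below. This is not the study's code: it assumes a one-sided rank-sum test with a normal approximation and no tie correction, the correlation values are invented, and `n_tests` is a hypothetical stand-in for the number of gene sets used in the Bonferroni adjustment.

```python
import math
import sys

def rank_sum_enrichment(correlations, gene_set, n_tests=1):
    """ES = -log10(Bonferroni-adjusted p) that gene_set ranks near the top.

    correlations: dict mapping gene -> correlation with the lncRNA.
    Assumptions: normal approximation to the rank-sum statistic, no ties.
    """
    genes = sorted(correlations, key=correlations.get)  # ascending order
    ranks = {g: r for r, g in enumerate(genes, start=1)}
    members = [g for g in genes if g in gene_set]
    n1, n2 = len(members), len(genes) - len(members)
    rank_sum = sum(ranks[g] for g in members)
    u = rank_sum - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = 0.5 * math.erfc(z / math.sqrt(2))      # upper-tail p-value
    p_adj = min(1.0, p * n_tests)              # Bonferroni adjustment
    p_adj = max(p_adj, sys.float_info.min)     # avoid log10(0)
    return -math.log10(p_adj) if p_adj < 1.0 else 0.0

# Hypothetical data: 30 DE genes, the 10 most correlated ones are ISGs.
corr = {f"g{i}": i / 30 for i in range(30)}
isg = {f"g{i}" for i in range(20, 30)}
print(round(rank_sum_enrichment(corr, isg, n_tests=10), 2))
```

With the toy data, the ISG set occupies the top ten ranks, so the score comfortably exceeds the 1.3 level that corresponds to an adjusted p-value of 0.05.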

Figure 4. Prediction of potentially cis-acting lncRNAs. (A) Number of lncRNAs positively (enhancer-like function) or negatively (inhibitors) correlated with neighbor coding genes (within 200 kb), considering all genes regardless of their strand (both), only genes on the same strand as the lncRNA (sense), or only genes on the opposite strand (antisense). (B) Specificity and strength of cis lncRNA correlation with neighbor genes, regardless of their strand. ES PAGE was defined as –log10 of the p-value calculated by the PAGE test, which assesses whether neighbor genes were among the most positively correlated (for enhancer-like lncRNAs) or most negatively correlated (for inhibitor lncRNAs) genes. ES PAGE was calculated only for lncRNAs with more than 3 coding neighbors; otherwise this score was arbitrarily set to 0. The x-axis represents the arithmetic mean of the bicor coefficients between a given lncRNA and all its coding neighbor genes. lncRNAs with the highest specificity for correlation with coding neighbor genes, or the most correlated with their neighbor genes (mean bicor), are labeled with their names. Similar plots for lncRNA specificity for antisense or sense neighbors are depicted in Fig. S11. (C) Expression levels (in log2FC) of n265841, n287111, and their neighbor genes across the different CC founder mice and viral conditions.
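The neighbor definition in panel A and the mean-correlation axis in panel B can be sketched as follows. Everything here is illustrative: the gene records, coordinates, and correlation values are invented, the 200 kb window is measured between gene boundaries (the caption does not say how distance was measured), and labeling by the sign of the mean correlation is a simplification of the classification used in the figure.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"

def gap(a, b):
    """Distance in bp between two genes, 0 if overlapping, None if on different chromosomes."""
    if a.chrom != b.chrom:
        return None
    if a.end < b.start:
        return b.start - a.end
    if b.end < a.start:
        return a.start - b.end
    return 0

def neighbors(lnc, coding_genes, window=200_000, orientation="both"):
    """Coding genes within `window` bp, optionally restricted by strand."""
    hits = []
    for gene in coding_genes:
        d = gap(lnc, gene)
        if d is None or d > window:
            continue
        same_strand = gene.strand == lnc.strand
        if orientation == "sense" and not same_strand:
            continue
        if orientation == "antisense" and same_strand:
            continue
        hits.append(gene)
    return hits

def cis_label(lnc, coding_genes, corr):
    """Mean correlation with neighbors and a sign-based label (assumption)."""
    near = neighbors(lnc, coding_genes)
    if not near:
        return None, "no neighbors"
    m = mean(corr[g.name] for g in near)
    return m, "enhancer-like" if m > 0 else "inhibitor-like"

lnc = Gene("lncX", "chr1", 1_000_000, 1_005_000, "+")
coding = [
    Gene("A", "chr1", 1_100_000, 1_120_000, "+"),  # 95 kb away, sense
    Gene("B", "chr1", 800_000, 850_000, "-"),      # 150 kb away, antisense
    Gene("C", "chr1", 1_300_000, 1_310_000, "-"),  # 295 kb away, too far
    Gene("D", "chr2", 1_000_000, 1_010_000, "+"),  # different chromosome
]
corr = {"A": 0.8, "B": 0.6, "C": -0.9, "D": 0.1}  # hypothetical bicor values
print(cis_label(lnc, coding, corr))
```

Restricting `orientation` to "sense" or "antisense" gives the strand-specific counts that panel A reports alongside the strand-agnostic ones.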

Figure 5. Validation of ISG annotation. (A) Enrichment score (ES) of each lncRNA for ISG annotation. The dashed line indicates the rank above which lncRNAs had a significant ES (> 1.3). (B) Module membership for each lncRNA ranked as in panel A. Each line represents a lncRNA colored based on its MONOCLdb module membership. (C) lncRNAs that were found DE in an additional RNA-Seq data set of mice treated with IFN-α are displayed with black lines. (D) Expression level for each ISG in C57BL/6J mice treated with IFN relative to untreated mice is depicted in a blue to red gradient. In B, C, and D, lncRNAs are ranked as in panel A, based on their ES for ISG annotation. Top-ranked lncRNAs were highly and significantly upregulated in mice treated with IFN.

Figure 6. MONOCLdb. (A) Presentation of the MONOCLdb pipeline. Users can select lncRNAs by: NONCODE ID (e.g., “n424068”), GO term found significantly enriched with the rank-based annotation (e.g., “GO:0007010”), Immgen Coarse module number found significantly enriched with the rank-based annotation (e.g., “Immgen_Coarse.module_28”), Ensembl gene ID of the most correlated coding genes (e.g., “ENSMUSG00000029088”), or Ensembl gene ID of chromosomal neighbor (within 200 kb) coding genes (e.g., “ENSMUSG00000030921”). (B-G) Examples of figures generated by MONOCLdb after a query with: n424068 (Neat1), n424069 (Neat1), n177784 (Malat1), n424043 (Adapt33), and n424044 (Adapt33). For simplicity, we have replaced the MONOCLdb lncRNA gene names with their symbols. (B) Expression heatmap. Expression values of lncRNAs in log2FC in PR8- and MA15-infected mice are displayed as a green to red gradient (saturation levels: log2FC from -2 to 2; mean of biological replicates). (C) Module-based enrichment. Module membership is depicted by a set of colored squares with a functional description of each module at the top. The second set on the right displays the percentile rank (PR) of intramodular degree and betweenness centrality with a yellow to blue gradient. High PRs in dark blue indicate intramodular hubs and bottlenecks. (D) Pathogenicity association. Bubble plot showing the correlation between lncRNA expression and phenotypic data. The size of each bubble is proportional to the absolute bicor coefficient, with green indicating anti-correlation and red indicating positive correlation. (E) Genomic co-expression network. Genomic network showing the top 15 most correlated genes for each queried lncRNA (|bicor| > 0.7). The position of each lncRNA on the chromosomal circle corresponds to its coordinate (middle of the gene). lncRNAs classified as potential cis lncRNAs are represented in blue, while trans lncRNAs are in purple. (F) Rank-based enrichment. Radial plot showing results of rank-based enrichment for Neat1 in Reactome pathways. Distance from the center to each edge is proportional to the enrichment score (ES), defined as –log10 of the Bonferroni-corrected p-value of the WRS test. (G) Co-expression network. Relationships between each queried lncRNA and its top 15 most correlated genes (|bicor| > 0.7) are represented as a network, with yellow edges indicating negative correlation and blue edges indicating positive correlation. Coding genes are depicted as circles and non-coding genes as squares. lncRNAs are colored based on their module membership.
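The network panels select, for each queried lncRNA, the top 15 genes with |bicor| > 0.7. The sketch below shows one way such a selection could be computed, using the standard definition of the biweight midcorrelation (bicor). The expression vectors are invented and the function names are hypothetical. Unlike some implementations, which fall back to Pearson correlation when the median absolute deviation is zero, this sketch simply raises an error in that case.

```python
import math
from statistics import median

def _biweight_terms(values):
    """Median-centered values weighted by Tukey's biweight."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; bicor is undefined in this sketch")
    terms = []
    for v in values:
        u = (v - med) / (9 * mad)
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        terms.append((v - med) * w)
    return terms

def bicor(x, y):
    """Biweight midcorrelation between two equal-length sequences."""
    a, b = _biweight_terms(x), _biweight_terms(y)
    norm_a = math.sqrt(sum(t * t for t in a))
    norm_b = math.sqrt(sum(t * t for t in b))
    return sum(s * t for s, t in zip(a, b)) / (norm_a * norm_b)

def top_correlated(target, profiles, n=15, threshold=0.7):
    """Up to n genes with |bicor| above threshold, strongest first."""
    scored = [(name, bicor(target, values)) for name, values in profiles.items()]
    kept = [(name, r) for name, r in scored if abs(r) > threshold]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)[:n]

# Hypothetical log2FC profiles across seven samples.
lnc_profile = [1, 2, 3, 4, 5, 6, 7]
gene_profiles = {
    "up": [3, 5, 7, 9, 11, 13, 15],          # rises with the lncRNA
    "down": [-1, -2, -3, -4, -5, -6, -7],    # mirrors the lncRNA
    "noise": [3, 1, 4, 1, 5, 9, 2],          # weakly related
}
for name, r in top_correlated(lnc_profile, gene_profiles):
    print(name, round(r, 2))
```

The sign of each retained coefficient would then determine the edge color in a network such as panel G.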