Literature DB >> 24647608

Transcriptome analysis reveals differential splicing events in IPF lung tissue.

Tracy Nance1, Kevin S Smith1, Vanessa Anaya1, Rhea Richardson1, Lawrence Ho2, Mauro Pala1, Sara Mostafavi3, Alexis Battle3, Carol Feghali-Bostwick4, Glenn Rosen5, Stephen B Montgomery1.   

Abstract

Idiopathic pulmonary fibrosis (IPF) is a complex disease in which a multitude of proteins and networks are disrupted. Interrogation of the transcriptome through RNA sequencing (RNA-Seq) enables the determination of genes whose differential expression is most significant in IPF, as well as the detection of alternative splicing events which are not easily observed with traditional microarray experiments. We sequenced messenger RNA from 8 IPF lung samples and 7 healthy controls on an Illumina HiSeq 2000, and found evidence for substantial differential gene expression and differential splicing. 873 genes were differentially expressed in IPF (FDR<5%), and 440 unique genes had significant differential splicing events in at least one exonic region (FDR<5%). We used qPCR to validate the differential exon usage in the second and third most significant exonic regions, in the genes COL6A3 (RNA-Seq adjusted pval = 7.18e-10) and POSTN (RNA-Seq adjusted pval = 2.06e-09), which encode the extracellular matrix proteins collagen alpha-3(VI) and periostin. The increased gene-level expression of periostin has been associated with IPF and its clinical progression, but its differential splicing has not been studied in the context of this disease. Our results suggest that alternative splicing of these and other genes may be involved in the pathogenesis of IPF. We have developed an interactive web application which allows users to explore the results of our RNA-Seq experiment, as well as those of two previously published microarray experiments, and we hope that this will serve as a resource for future investigations of gene regulation in IPF.

Entities:  

Mesh:

Substances:

Year:  2014        PMID: 24647608      PMCID: PMC3960165          DOI: 10.1371/journal.pone.0092111

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Idiopathic pulmonary fibrosis (IPF) is a progressive disease of unknown aetiology, characterized by fibrotic scarring in the lungs which leads to shortness of breath and eventual respiratory failure. The disease typically presents in patients 50–70 years old, with prevalence increasing with age, and has been shown to have both genetic and environmental predisposing factors [1], [2]. Median survival time after diagnosis is only 4–5 years [3], and there is currently no effective treatment for IPF except lung transplantation [4]. Current theory of pathogenesis in IPF holds that chronic injury to alveolar epithelial cells induces aberrant activation of wound-healing pathways, leading to an increase in inflammatory signals and subsequent differentiation of fibroblasts, epilthelial-mesenchymal transition in alveolar cells, and accumulation of myofibroblasts. This results in the formation of fibroblastic foci and deposition of collagen, fibronectin, and other extracellular matrix (EM) components. In contrast with normal wound-healing and for unknown reasons, apoptosis is not properly initiated in myofibroblasts, and secretion of EM proteins does not terminate. This results in contraction and ultimately destruction of the lung parenchyma [3], [4]. The primary cause of alveolar injury and dysregulated repair is still poorly understood, but recent genome-wide association studies have implicated abnormalities in mucosal defense, cell-cell adhesion and DNA repair in the development of IPF [5]. Previous studies have indicated that many other pathways are perturbed in IPF as well, including TGF-β and WNT signaling and others related to coagulation, angiogenesis, oxidative stress, and development [4]. Genes associated with these pathways have been found to have differential expression in IPF cases as compared to healthy controls; however, no effective treatment has yet been developed which targets any individual gene. The ability to interrogate mRNA transcripts through RNA sequencing allows us to find genes whose differential expression reaches genome-wide significance, and to investigate differential splicing events on a broad scale. Furthermore this transcriptome-wide information can be used to inform the study of pathways and networks which may be dysregulated in IPF. We performed RNA sequencing on whole lung tissue samples obtained from 8 patients with IPF and 7 healthy controls in order to investigate these phenomena and their potential role in the pathogenesis of this complex disease. In addition, a large number of microarray-based gene expression studies of IPF have been published and the data are publicly accessible, but there is currently no easy way to visualize differential expression results across various studies. We further wanted to make splicing visualizations from our study easily available and searchable. With this in mind, we have developed an internet application which allows users to view the results of our RNA-Seq study interactively, and to compare them with results of previously published microarray studies: The IPF Gene Explorer, available from the project website link at http://montgomerylab.stanford.edu/resources.html. We hope that this application will make gene expression results more accessible to researchers and will be a valuable tool in future investigations.

Results

Differential Gene Expression

In comparing the healthy (n = 7) and diseased (n = 8) lung samples, the Bioconductor package DESeq [6] produced a list of 873 genes showing significant differential expression (DE) at an FDR of 5%, where sex and demographic group were included as covariates. These genes had fold changes in the rangewith one gene (LGALS7) having no counts in the healthy samples. Overall, we observed more up-regulated genes in the IPF samples than down-regulated genes (Figure 1). The top ten genes with smallest p-value are listed in Table 1.
Figure 1

Differential expression analysis reveals more upregulation than down regulation.

This plot depicts fold change vs. mean expression. Points depict genes, with red indicating those genes that show significant differential expression (FDR<5%).

Table 1

Top ten differentially expressed genes (by p-value).

genepvalpadjlog2FCdescription
COMP4.17e-201.28e-153.46cartilage oligomeric matrix protein
DIO21.35e-182.08e-142.75deiodinase, iodothyronine, type II
CXCL141.19e-151.22e-114.11chemokine (C-X-C motif) ligand 14
IGLC36.04e-154.65e-114.14immunoglobulin lambda constant 3 (Kern-Oz+ marker)
PDGFD1.18e-147.29e-112.51platelet derived growth factor D
MMP134.11e-142.11e-103.52matrix metallopeptidase 13 (collagenase 3)
CDH36.83e-143.01e-102.00cadherin 3, type 1, P-cadherin (placental)
RP11-731F5.23.17e-131.22e-094.36lincRNA
IGHG24.96e-131.70e-094.21immunoglobulin heavy constant gamma 2 (G2m marker)
IGHGP1.26e-123.87e-094.41immunoglobulin heavy constant gamma P (non-functional)

Differential expression analysis reveals more upregulation than down regulation.

This plot depicts fold change vs. mean expression. Points depict genes, with red indicating those genes that show significant differential expression (FDR<5%). There were 82 unique named genes which showed highly significant DE (FDR<1%) in both our RNA-Seq data and in two recent microarray studies [7], [8]. At an FDR of <1%, our study identified 475 differentially expressed (DE) genes (8 IPF samples and 7 controls), whereas microarray experiment GSE24206 (17 IPF, 6 controls) identified 3083 DE genes and microarray experiment GSE32537 (119 Idiopathic Interstitial Pneumonias, 50 controls) identified 6291 DE genes. The direction of change found in the RNA-Seq data agrees with that from both microarray studies for all 82 genes in this overlap. A list of these genes, together with their p-values and fold changes, may be viewed on the “Gene Set” page of our web application, by selecting the gene set “Microarray overlap” and clicking on the “Table” tab.

GWAS Genes are Enriched for Differential Expression

198 SNPs were retrieved from the discovery set of a recent GWAS study [5] of idiopathic interstitial pneumonia (IIP), a class of diseases which includes IPF and similar fibrotic diseases of the lung. The discovery SNPs were those having GWAS pvalue <0.0001. We identified 111 genes which were associated with these SNPs via biomaRt [9] and for which we had sufficient read depth to test for DE in our RNA-Seq data; of these genes, 8 showed significant differential expression at an FDR of 5% and are listed in Table 2. We calculated a hypergeometric p-value of 0.0132 for the likelihood of seeing 8 significant hits in a gene set of this size (see Methods), indicating that it is unlikely to see this many genes showing differential expression by chance alone. Because questions have been raised about the validity of the hypergeometric (or Fisher’s exact) test in this context [10], we also calculated an empirical p-value of 0.01 by a sample permutation method.
Table 2

Genes associated with GWAS [5] validation SNPs which are differentially expressed in RNA-Seq data at 5% FDR.

genepvalpadjlog2FCdescriptionGWAS significant rsids
RNF59.46e-072.04e-04−2.16ring finger protein 5, E3 ubiquitin protein ligasers3134943
MUC5B 6.50e-06 9.05e-04 4.63 mucin 5B, oligomeric mucus/gel-forming rs12417955, rs2735727, rs2857476, rs868903
DSP 8.83e-05 6.40e-03 1.16 desmoplakin rs10484325, rs10484326, rs2076295, rs2076302, rs3778337
AGER1.03e-047.27e-03−2.28advanced glycosylation end product-specific receptorrs3134943
SRGAP32.45e-041.41e-021.31SLIT-ROBO Rho GTPase activating protein 3rs12638703
MAPK107.46e-043.23e-020.76mitogen-activated protein kinase 10rs4488910
FAT19.27e-043.71e-020.93FAT atypical cadherin 1rs2130910
HBEGF1.33e-034.76e-02−1.05heparin-binding EGF-like growth factorrs13385

The two genes in bold have corresponding SNPs which were validated by the GWAS meta-analysis.

The two genes in bold have corresponding SNPs which were validated by the GWAS meta-analysis. The definition of significant differential expression above relies on an arbitrary FDR cutoff, but we similarly found enrichment for differential expression using FDR cutoffs of 1% (p-value = 0.093) and 10% (p-value = 0.040). An alternative view is given by the QQ-plot in Figure 2, which depicts the quantiles of −log10 p-values for the 111 GWAS-identified genes versus the means of quantiles of −log10 p-values across 10,000 random gene sets, with error bars depicting one standard deviation from the mean. This plot suggests that the GWAS genes are enriched for more significant DE p-values, even among those that do not pass our 5% FDR cutoff for significance.
Figure 2

GWAS genes identified from discovery SNPs appear to be enriched for differential expression.

X-axis values of dots indicate the mean −log10(pval) from 10,000 randomly permuted sets, and error bars indicate one standard deviation from the mean in these permutations.

GWAS genes identified from discovery SNPs appear to be enriched for differential expression.

X-axis values of dots indicate the mean −log10(pval) from 10,000 randomly permuted sets, and error bars indicate one standard deviation from the mean in these permutations. For the 45 SNPs identified as being significantly associated with IIP in the meta-analysis of [5], 25 corresponded to genes we tested in DESeq with 2 being significant, giving a hypergeometric p-value of 0.158. Although this p-value is not as low as the p-value for the discovery set of GWAS SNPs, the QQ-plots in Figure 3 and Figure 4 show that the distribution of p-values in validated GWAS SNPs has a larger deviation from the diagonal, indicating that these higher confidence GWAS SNPs may still have a greater enrichment for DE genes than the discovery set.
Figure 3

GWAS genes identified from validated SNPs appear to be enriched for differential expression.

X-axis values of dots indicate the mean −log10(pval) from 10,000 randomly permuted sets, and error bars indicate one standard deviation from the mean in these permutations.

Figure 4

GWAS identified genes appear to be enriched for differential expression.

We see smaller differential expression p-values for GWAS genes than for randomly selected genes.

GWAS genes identified from validated SNPs appear to be enriched for differential expression.

X-axis values of dots indicate the mean −log10(pval) from 10,000 randomly permuted sets, and error bars indicate one standard deviation from the mean in these permutations.

GWAS identified genes appear to be enriched for differential expression.

We see smaller differential expression p-values for GWAS genes than for randomly selected genes. The most striking observation about the genes in Table 2 is that all have been implicated in cellular adhesion, migration, or invasion, underscoring the observation made in [5] that cell-cell adhesion should be high on the list of processes considered for further research and potential therapeutic interventions. Our results confirm those of [5] showing that MUC5B and DSP are significantly over-expressed in IPF lungs as compared to controls, and these genes are discussed in detail in this previous study; in particular DSP is involved in cell adhesion via the connection of cytoskeleton to cell membrane. RNF5 is an E3 ubiquitin ligase which has been associated with focal adhesion as well as endoplasmic reticulum stress response [11], and is proposed to be a regulator of breast cancer progression through its effects on actin cytoskeleta [12]. RNF5 acts by ubiquitination of the protein paxillin, decreasing its localization in focal adhesions and impairing cell motility [13]. AGER encodes a receptor for advanced glycosylation end-product which is normally highly expressed in adult lung, and which has been shown to be significantly down-regulated in IPF lungs as compared to controls [14], [15]. A decrease in this gene’s protein product (RAGE) has been shown to impede cellular adhesion to collagen, leading to increased migration of epithelial cells and fibroblasts [15]. RAGE is also associated with the differentiation of alveolar epithelial type II cells into alveolar type I cells [16]. SRGAP3 (MEGAP) is a small GTPase involved in the Slit-Robo pathway which plays a role in neuronal development and has been implicated in X-specific mental retardation [17]. In this context it has been shown that SRGAP3 inhibits the formation of focal complexes and alters the actin and microtubule cytoskeleton, thereby impeding cell migration [18]. MAPK10 is the gene encoding the enzyme mitogen-activated protein kinase 10, also known as JNK3. The related kinase JNK1 contributes to TGFβ1-induced epithelial to mesenchymal transition and collagen deposition, and has been shown to be necessary for the development of pulmonary fibrosis in a mouse model [19]; however contrary to our findings, this study reports that JNK3 expression is limited to heart, testis, and brain. JNK has been shown to be activated in IPF tissues [20], where it appears to play a role in persistence of myofibroblasts [21], and inhibition of JNK has been shown to enhance cell-cell adhesion [22]. FAT1 encodes a protein that belongs to the cadherin superfamily of membrane proteins. Previous research has shown that FAT1 is involved with actin dynamics and cellular polarization in mammalian (rat/mouse) cell lines; in particular it was shown to be necessary for regulation of cellular motility in wound closing, in that a knockdown of this protein led to a decrease in cell migration [23]. FAT1 has also been associated with liver fibrosis [24]. Heparin-binding EGF-like growth factor (HBEGF) is an epidermal growth factor which affects multiple cell types, including fibroblasts, keratinocytes, and vascular smooth muscle cells, and has been associated in various ways with fibrosis occuring in the heart, liver, and pancreas of mice or rats [25], [26], [27]. It promotes the survival and proliferation of mesenchymal cells like fibroblasts and myofibroblasts, which is a key element in the progression of a normal response to lung injury to a pathologic state of fibrosis [28]. HBEGF has been shown to enhance cellular adhesion to the extracellular matrix, as well as invasion, angiogenesis and EMT in the context of ovarian cancer [29]. It has also been shown to increase migration in keratinocytes [30] and human peritoneal membrane cells [31].

Network Analysis

The results of our network analysis recapitulate those described in other studies, underscoring that IPF is a disease in which many pathways are disrupted. Tables 3, 4, and 5 list those pathways identified by SPIA [32] as being significantly perturbed at an FDR of <10%, where we provided SPIA with a list of genes found to be significantly differentially expressed in our RNA-Seq data at FDRs of 1%, 5% and 10%, respectively.
Table 3

SPIA results when provided with DE genes significant at 1% FDR.

NameFDRStatus
Salivary secretion0.0329Inhibited
Pancreatic secretion0.0441Inhibited
ECM-receptor interaction0.0441Activated
Focal adhesion0.0785Activated
Table 4

SPIA results when provided with DE genes significant at 5% FDR.

NameFDRStatus
Salivary secretion0.0007Inhibited
TGF-beta signaling pathway0.0022Activated
Amoebiasis0.0483Inhibited
ECM-receptor interaction0.0483Activated
Basal cell carcinoma0.0643Inhibited
Wnt signaling pathway0.0866Inhibited
Table 5

SPIA results when provided with DE genes significant at 10% FDR.

NameFDRStatus
Amoebiasis0.0008Inhibited
Salivary secretion0.0008Inhibited
TGF-beta signaling pathway0.0035Activated
Complement and coagulation cascades0.0254Inhibited
ECM-receptor interaction0.0523Activated
Arrhythmogenic right ventricular cardiomyopathy (ARVC)0.0523Activated
Pathways in cancer0.0523Inhibited
Dilated cardiomyopathy0.0676Activated
Wnt signaling pathway0.0729Inhibited
Transcriptional misregulation in cancer0.0729Inhibited
Basal cell carcinoma0.0729Inhibited
We also performed a gene-set enrichment analysis (see Methods) on the results of the differential expression analyses for our RNA-Seq data, first for gene-level expression and then for differential exon usage. Five pathways showed up as being significant at FDR 10% in both analyses; these are listed in Table 6.
Table 6

Pathways enriched for differentially expressed genes and for differentially spliced genes in the RNA-Seq data.

pathwayDE padjDEX padj
KEGG: ECM-receptor interaction<0.002750.010267
Reactome: Steroid metabolism<0.002750.076444
Reactome: Integrin cell-surface interactions0.0184190.083849
Reactome: Signaling by VEGF0.0698410.023158
Reactome: Hemostasis0.0954370.017600

Alternative Splicing

The Bioconductor package DEXSeq [33] discovered 675 differentially expressed exonic regions (FDR<5%), which lie in 436 unique named genes. The top ten significant regions are listed in Table 7. We were unable to perform PCR validation for the first exonic hit, RPS24, because the GC-content of the region prevented us from designing a high-quality primer within the differentially expressed exon. However we did validate differential usage in the second and third most significant regions, within the genes POSTN and COL6A3, by using a second exon in each respective gene as a control and proxy for gene-level expression. PCR confirmed that usage of these exons was depressed in IPF as compared to controls, with both exons giving a Wilcoxon rank-sum p-value of 3.108e-4, the smallest p-value possible in this nonparametric rank-based test.
Table 7

Top ten differentially expressed exons (by p-value).

genepvalpadjexondescriptionin IPF
RPS241.67e-153.42e-10chr10∶79800373–79800428ribosomal protein S24down-regulated
COL6A36.99e-157.18e-10chr2∶238296225–238296720collagen, type VI, alpha 3down-regulated
POSTN3.01e-142.06e-09chr13∶38143437–38143520periostin, osteoblast specific factordown-regulated
DLC15.82e-122.99e-07chr8∶13356583–13357330deleted in liver cancer 1up-regulated
COL3A14.03e-111.66e-06chr2∶189867756–189867788collagen, type III, alpha 1down-regulated
ZFP36L14.97e-111.70e-06chr14∶69261464–69262759ZFP36 ring finger protein-like 1down-regulated
COL3A15.84e-101.66e-05chr2∶189868460–189868513collagen, type III, alpha 1down-regulated
GM2A6.46e-101.66e-05chr5∶150646857–150650001GM2 ganglioside activatorup-regulated
COL3A19.89e-102.26e-05chr2∶189868149–189868190collagen, type III, alpha 1down-regulated
TRA2B1.75e-093.30e-05chr3∶185652294–185654025transformer 2 beta homolog (Drosophila)down-regulated

POSTN

The third most significant result lies in the gene encoding periostin (POSTN). Figure 5 shows that exon 21, labelled as bin E013 in the figure, is more likely to be spliced out in IPF samples than in controls; shown are the fitted expression and fitted splicing coefficients, for which E013 is relatively lower in IPF than other exonic regions in the gene (see [33] for more on fitted coefficients). To perform real-time PCR validation, we designed primers to amplify exon 21 as well as exon 20, an adjacent exon that is present in all annotated transcripts of POSTN. For this reason, the expression of exon 20 was used as an internal baseline from which to estimate the proportion of POSTN transcripts lacking exon 21.
Figure 5

Periostin (POSTN) shows evidence of differential splicing at exon 21.

In this splicing plot produced by DEXSeq, exon 21 (labelled E013) is highlighted in magenta.

Periostin (POSTN) shows evidence of differential splicing at exon 21.

In this splicing plot produced by DEXSeq, exon 21 (labelled E013) is highlighted in magenta. The ratios calculated for the expression of exon 21 versus exon 20 by PCR are in good linear agreement with the ratios calculated from normalized counts in the RNA-Seq data: a linear regression with intercept 0 has adjusted R 2 = 0.946, F = 261.9 with p-value 1.86e-10 (Figure 6). In addition the PCR data validates the finding that these ratios are smaller in IPF samples than in healthy controls (Wilcoxon rank-sum p-value = 3.108e-4, Figure 7).
Figure 6

Ratios of periostin exon usage calculated from qPCR vs. RNA-Seq.

The grey area indicates 95% confidence interval for the linear regression.

Figure 7

qPCR confirms down-regulation of POSTN exon 21 in IPF lung tissue.

Shown are PCR ratios of POSTN exon usage for IPF vs controls (Wilcoxon p-value = 3.108e-4).

Ratios of periostin exon usage calculated from qPCR vs. RNA-Seq.

The grey area indicates 95% confidence interval for the linear regression.

qPCR confirms down-regulation of POSTN exon 21 in IPF lung tissue.

Shown are PCR ratios of POSTN exon usage for IPF vs controls (Wilcoxon p-value = 3.108e-4). POSTN encodes the secreted extracellular matrix protein periostin, whose increased expression has previously been associated with IPF [34], and whose expression levels have been reported to be predictive of clinical progression in IPF [35]. Other studies have implicated periostin in murine models of lung fibrosis, where it has been shown to induce chemokines to recruit neutrophils and macrophages [36]. Like the proteins associated with GWAS SNPs in the previous section, periostin is involved in cell adhesion and migration. It has been implicated in tumor invasion via a contribution to epithelieal-mesenchymal transition [37], [38], and it enhances TGF-β-induced myofibroblast differentiation in neonatal lungs [39]. Although our RNA-Seq data does indicate that gene-level expression of POSTN is increased in IPF samples by a factor of 2, this increase did not reach genome-wide significance in our study (p-value = 9.5e-3), adjusted p-value = 0.1685), whereas relative down-regulation of exon 21 was highly significant (p-value = 3.01e-14, adjusted p-value = 2.06e-9). Full length periostin consists of 23 exons and there are nine reported isoforms [40]; some of these isoforms were detected only in fetal lung or renal tissue and not in adult lungs. The N-terminal region of the protein is conserved. Isoforms vary in the cassette exons 17–21 of the C-terminal region, and it has been proposed that this region binds ECM proteins like collagen and fibronectin, so that differences in the constitutive exons of this region should affect interactions with the ECM [41]. Indeed it has been shown that periostin isoforms have differential effects on cell invasiveness in several models [42].

COL6A3

The second most significant result lies in exon 4 of the gene COL6A3, and is more likely to be spliced out in IPF samples than in controls. A probe within exon 9 of COL6A3, which did not show differential splicing in the RNA-Seq data or evidence of SNPs within our samples, was chosen as an internal reference. Ratios calculated for the expression of exon 4 to exon 9 by PCR agree linearly with those calculated by RNA-Seq, giving adjusted R 2 = 0.803 and F = 62.24 with p-value 1.61e-6, calculated as for POSTN (Figure 8). Furthermore the PCR data validates that these ratios are smaller in IPF samples than in healthy controls (Wilcoxon rank-sum p-value = 3.108e-4), as shown in Figure 9).
Figure 8

Ratios of COL6A3 exon usage calculated from qPCR vs. RNA-Seq.

The grey area indicates 95% confidence interval for the linear regression.

Figure 9

qPCR confirms down-regulation of COL6A3 exon 4 in IPF lung tissue.

Shown are PCR ratios of COL6A3 exon usage for IPF vs controls (Wilcoxon p-value = 3.108e-4).

Ratios of COL6A3 exon usage calculated from qPCR vs. RNA-Seq.

The grey area indicates 95% confidence interval for the linear regression.

qPCR confirms down-regulation of COL6A3 exon 4 in IPF lung tissue.

Shown are PCR ratios of COL6A3 exon usage for IPF vs controls (Wilcoxon p-value = 3.108e-4). COL6A3 does not show differential gene-level expression in our RNA-Seq data, but it is significantly upregulated in IPF in the two previously published microarray experiments described above (GSE24206 padj = 3.16e-02, GSE32537 padj = 8.26e-17). Like periostin, collagen VI is an extracellular matrix protein which is involved in cellular adhesion [43]; it is also associated with integrin signaling. Differential splicing of COL6A3, and in particular of exon 4, has previously been associated with pancreatic cancer [44] and colon cancer [45], though in these studies it was inclusion as opposed to exclusion of exon 4 that associated with the disease phenotype. COL6A3 has been shown to be a target of the TGF-β/Smad signaling pathway [46], which is believed to play a role in the pathogenesis of idiopathic pulmonary fibrosis [4].

IPF Gene Explorer

We created a web application built from the shiny package for R [47] to visualize the results of our gene expression study as well as previously published microarray studies. The website consists of two pages: one dedicated to viewing information about a single gene (Figure 10), and another for visualizing sets of genes (Figure 11), such as genes involved in a biological pathway or genes which are significantly differentially expressed.
Figure 10

The IPF Gene Explorer displays expression data for a single user-selected gene.

Figure 11

The IPF Gene Set Explorer displays expression data for a set of user-selected genes.

The single gene page takes as input a gene by its HGNC gene symbol, and displays the p-values, adjusted p-values, and fold changes found in our study and in two recent microarray experiments [7], [8]. Boxplots of expression data are shown; for RNA-Seq data the user may choose to see counts, log-normalized counts, or variance stabilized expression data [6]. For the microarray experiments, expression data has been processed as described in the respective papers: briefly, log transformed data is normalized by Robust Multi-array Average (RMA), and in the case of GSE32537, expression values for probes in the same gene are averaged and a variance filter is employed [7], [8]. The gene set page provides a number of ways for visualizing the relative expression of genes across our 15 samples, including an interactive barchart, an expression plot, and a heatmap. The user may also view and download a table showing p-values, adjusted p-values, and log fold changes for the genes in the set.

Discussion

Interrogation of transcription through RNA sequencing enabled us to discover genes whose differential expression reaches genome-wide significance in IPF, and to detect alternative splicing events which are not easily observed with traditional microarray experiments. We identified many differentially expressed genes and exonic regions, and validated two alternative splicing events by quantitative PCR. We believe these results indicate a potential role for alternative splicing of periostin and collagen VI alpha-3 in IPF, but more investigation is needed to determine the cell types in which alternative splicing is operative, and to identify causal variants and mechanisms of this effect and their relationship to IPF. We were not able to reproduce POSTN splicing results in a smaller sample of IPF lung fibroblasts (4 IPF samples and 4 healthy controls); this could indicate that the effect manifests primarily in a different cell type, or that we had insufficient power to detect the effect with this small sample size. We looked at SNPs from the 1000 Genomes CEU population (Utah residents with Northern European ancestry) lying within 50bp of periostin and COL6A3 splice sites, and found that none of these variants were significantly associated with IPF in the recent GWAS study [5]. Our results indicate that RNA-Seq has the potential to identify novel gene targets for further research, and larger studies and follow-up experiments will shed further light on the mechanisms underlying IPF. We believe that public access and easy visualization of results will enhance research efforts across disciplines in understanding this disease. To our knowledge there has been only one published study examining gene expression in IPF lung tissues via RNA-Seq [48], which used samples from three IPF lungs and three COPD controls. Although this previous study did find differential splicing of periostin at an FDR of <10% (adjusted p-value = 0.0694), overall the set of genes which they find to have significant differential splicing at FDR <5% has small overlap with the set of genes having at least one differentially expressed exonic region (FDR<5%) in our data: only 7 genes are claimed to be significantly differentially spliced in both experiments, which is not more than expected by chance alone (1628 genes tested by both experiments, of which 147 are called significant here and 132 are called significant in [48]). There could be many reasons for this lack of replication. First, control samples are drawn from different clinical groups; second, [48] uses transcript quantification where we use differential exon usage, which introduces an extra layer of statistical modelling and estimation; third, both study sizes are relatively small, though our larger sample size should give us greater power to detect effects.

Materials and Methods

Unprocessed fastq files and processed gene and exon counts are available at NCBI’s Sequence Read Archive (SRA) and NCBI’s Gene Expression Omnibus [49] through the GEO Series accession number GSE52463 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE52463). Scripts for downstream quantitative analyses are available at https://github.com/datapixie/ipf.

IRB Protocols

Sample collection and research activities approved by University of Pittsburgh IRB protocol: IRB970946 and by Stanford IRB protocol: 18891 (Pathogenesis and progression of interstitial lung disease). Participants provided written consent to participate in study. The ethics committees approved the consent procedure. All data was analyzed anonymously.

Human Lung Samples

We obtained 8 healthy lung samples and 8 IPF lung samples from resources within Stanford University and the University of Pittsburgh. IPF tissue was obtained from patients undergoing surgical biopsy or transplantation for the diagnosis of interstitial lung disease or lung transplant for IPF. Control lung tissue was taken from patients undergoing lung cancer resection, remote from the cancer, or from donor explant lung which was not deemed suitable for transplantation. These samples were taken from heart-beating patients without prior flushing of the lungs. One healthy sample was determined to have extreme-outlier patterns of gene expression relative to the other 15 lung samples, and was excluded from downstream analyses (Figure 12). In particular, it showed extremely low expression of lung surfactant proteins SFTPA, SFTPC, and SFTPB (Table 8).
Figure 12

Norm 7 is an outlier in gene expression.

This heatmap shows the top 50 most highly expressed genes (average across all samples) and corresponding hierarchical clustering of samples.

Table 8

Norm 7 shows outlier patterns of gene expression.

geneNorm1Norm2Norm3Norm4Norm5Norm6Norm7Norm8IPF1IPF2IPF3IPF4IPF5IPF6IPF7IPF8
SFTPB7030272895423191637574228375872142129418673385453898640438170762119739943110583
SFTPC10049777145796402701377635335485185724851202457065780150553843892481015162671
SFTPA18300187500643069750567908533751923898361924749411733277627337962155551131945
SFTPA215677712693110219117623610068559697462966449723610791426552364146201567970143198983
PCSK11613324910561958734539517955411052347
DIRAS31323112782925199403478503020314856
PCK1022214772628126102134102414764
SCG2128441992932937334571383621365332
NEFL03217143972613799316763
CHGA221791218638971852142814121017

Normalized counts were variance-stabilized using the DESeq package to give “VST” expression values. Listed are the raw counts for genes which had VST <6 for one sample and average VST >14 for all other samples, or VST >14 in one sample and average VST <6 for all other samples. Healthy 7 was the only sample displaying such patterns. The four genes which are very highly expressed in all samples except Norm7 encode lung-specific surfactant proteins which reduce alveolar surface tension. Lack of these proteins in the lung leads to respiratory failure.

Norm 7 is an outlier in gene expression.

This heatmap shows the top 50 most highly expressed genes (average across all samples) and corresponding hierarchical clustering of samples. Normalized counts were variance-stabilized using the DESeq package to give “VST” expression values. Listed are the raw counts for genes which had VST <6 for one sample and average VST >14 for all other samples, or VST >14 in one sample and average VST <6 for all other samples. Healthy 7 was the only sample displaying such patterns. The four genes which are very highly expressed in all samples except Norm7 encode lung-specific surfactant proteins which reduce alveolar surface tension. Lack of these proteins in the lung leads to respiratory failure.

Genotyping and Determination of Covariates

All samples were genotyped on Illumina Human Exome BeadChips, which give allelic information at >200,000 SNPs. Principal components (PC) analysis [50] was performed on these genotypes, with the first two PCs revealing three distinct clusters (Figure 13). Comparison to data from the Phase-One Thousand Genomes project [51] showed that this clustering corresponded to demographic structure among samples (Figure 14). We identified two IPF samples as coming from individuals of Asian descent and three samples from admixed or American descent, while the rest cluster with samples of European descent. The inclusion of each sample into one of these three demographic groups was used as a covariate in downstream differential expression analyses. We used Plink’s sex check tool (based on heterozygosity rates on the X chromosome [52]) to identify the sex of each sample, which was also used as a covariate.
Figure 13

Subject genotypes cluster into three demographic groups.

The first two principal components of sample genotype scores separate samples into three clusters.

Figure 14

The three genotype clusters correspond to sample population-of-origin.

The first two principal components for sample genotypes combined with corresponding Thousand Genomes genotypes are shown.

Subject genotypes cluster into three demographic groups.

The first two principal components of sample genotype scores separate samples into three clusters.

The three genotype clusters correspond to sample population-of-origin.

The first two principal components for sample genotypes combined with corresponding Thousand Genomes genotypes are shown.

Library Preparation and Sequencing

RNA was isolated from all lung samples with Trizol and rRNA was depleted using the Ribo-Zero Magnetic Kit (Epicentre, Madison, WI). Briefly, 1 μg total RNA was incubated with rRNA removal solution containing rRNA specific probes according to instructions at 68°C for 10 minutes. rRNA bound to probes was removed by magnetic bead pull-down. The final ribosomal depleted RNA was recovered following sodium acetate/glycogen addition and ethanol precipitation overnight. Samples were centrifuged at 10,000×g for 30 min, washed once per instruction, and resuspended with RNAse-free water. The remaining ribosomal depleted RNA was used to generate cDNA libraries using the Illumina TruSeq RNA preparation kit. Strand specificity was performed using dUTP during second strand synthesis. All samples were indexed with Illumina adapters and sequenced using an Illumina HiSeq 2000.

Alignment and Counting of RNA-Seq Reads

Reads were aligned to the UCSC hg19 human reference genome using default parameters with STAR v2.0.2c [53], which was provided with splice junction information from Gencode v.14 annotation. The median number of reads for all 16 samples was about 26 million, and the median number of uniquely mapped reads was about 22.5 million (Table 9). Nonuniquely mapped reads and reads mapping to mitochondrial DNA were discarded. An in-house script was used to count reads over each gene in the gencode annotation, and DEXSeq-bundled scripts were used to count reads lying in non-overlapping exonic parts, as described in [54]. Read mappings were required to be properly stranded when performing these counts.
Table 9

Numbers of reads sequenced for each sample.

samplenumber of readsuniquely mapped reads
Norm12062346617354393
Norm21971164316939631
Norm32385038421326596
Norm42756815625210789
Norm52132856718264192
Norm62605468622831687
Norm72960694026957981
Norm82718291323959608
IPF12182200518861962
IPF22669630923524193
IPF32810472324908240
IPF42619731823168922
IPF52340123920251950
IPF62678034222191034
IPF72341378820835580
IPF82964046725248566

Differential Expression Analysis

The bioconductor package DESeq tests for differential gene expression by modeling read counts with a negative binomial distribution, where means are predicted via a generalized linear model (GLM) with logarithmic link and dependencies of variance on mean are estimated from the data. DESeq tests for differential expression by comparing the fit of the GLM with disease-state coefficient to one without, via a χ 2 likelihood-ratio test [6]. After filtering out the 40% lowest-expressed genes (Figure 15), we used DESeq v.1.10.1 to identify genes that were differentially expressed between IPF samples and healthy controls. The “pooled-CR” option was used to estimate dispersions and resulting p-values were adjusted by the method of Benjamini-Hochberg (BH) to account for multiple testing [55]. The related package DEXSeq v.1.0.2 [33] was used to find differential exon usage between the two cohorts. Overlapping genes were excluded from the differential exon analysis. Demographic group and sex were added as covariates in the generalized linear models used by both DESeq and DEXSeq.
Figure 15

Genes with low read count are filtered prior to testing for differential expression.

Each point represents a gene. We are able to increase our detection power by eliminating tests which have read counts below the 40th percentile, and therefore would have small probability of attaining significance [6].

Genes with low read count are filtered prior to testing for differential expression.

Each point represents a gene. We are able to increase our detection power by eliminating tests which have read counts below the 40th percentile, and therefore would have small probability of attaining significance [6].

Microarray Comparison

We compared differentially expressed genes found in the RNA-Seq data with two recently published studies which used microarrays to estimate relative transcript expression [7], [8]. The data for these studies are publicly available on the Gene Expression Omnibus (GEO) website (accession IDs GSE24206 and GSE32537) [49]. Differential expression analysis was performed using the limma package [56], [57], via R scripts produced by GEO’s geo2R application.

Gene Network Analysis

We determined pathway enrichment for differentially expressed or spliced genes by a method using p-value enrichment permutation analysis [58]. Gene sets were obtained from the following online databases: BioCarta (www.biocarta.com), KEGG (www.genome.jp/kegg), Pathway Interaction Database (pid.nci.nih.gov), Reactome (www.reactome.org), SigmaAldrich (www.sigmaaldrich.com/life-science.html), Signaling Gateway (www.signaling-gateway.org), Signal Transduction KE (stke.sciencemag.org), SuperArray (www.superarray.com). A score for each gene (resp., exon) was computed as the negative log10 of the p-value output from DESeq, and the median of scores in each gene set was used as the gene set’s score. Gene (exon) scores were permuted so as to preserve the number of genes (exons) in each pathway and the number of pathways corresponding to each gene; gene set scores were then recomputed, and this was repeated 10,000 times. The distribution of scores for each gene set was used to establish an empirical p-value, and p-values for gene sets were adjusted by BH. We compared these results with those from SPIA [32], a Bioconductor network analysis package which incorporates knowledge of network topology and gene expression fold change to determine if a pathway is activated or inactivated in IPF relative to healthy samples. We provided SPIA with the fold changes of genes which were found to have significant differential expression at 1%, 5%, and 10% FDR. Because SPIA utilizes information at the gene-level, this analysis could not be used to identify pathways enriched for alternative splicing events.

GWAS Enrichment

In order to determine whether the overlap between the set of differentially expressed genes and the set of GWAS-identified genes (both considered as subsets of those genes tested for differential expression) is larger than expected by chance alone, we calculated a p-value of 0.013 from the hypergeometric distribution in R as:where q is the size of the overlap minus one, m is the number of genes found to be significantly differentially expressed at FDR 5%, n is the number of ensembl IDs tested for differential expression which were not called significant, and k is the number of GWAS-identified genes which were tested for differential expression. This p-value is equal to the probability of drawing 8 or more green balls from an urn containing 873 green balls and 29889 red balls, when you have drawn a total of 110 balls randomly from the urn. Since the p-value indicates that it is unlikely to see an intersection of 8 genes by chance alone, we conclude that the GWAS-identified genes are enriched for differential expression. Other hypergeometric p-values were calculated similarly. For the sample-based permutation approach, we permuted the disease status (IPF or healthy) of our 15 subjects. For each permutation, DESeq was rerun and the size of overlap between significant genes and GWAS genes was computed. We ran 200 permutations, and of these only two had overlaps of size 8 or larger, giving an empirical p-value of 0.01.

PCR Validation

POSTN primers for exons 20 and 21 were designed to produce amplicons of approximately the same length that were solely contained within each respective exon at final concentrations of 500nM (POSTN 20FWD-2: ACTAGGATTTCTACTGGAGGTGGA; POSTN 20REV-2: ACAATTTCTTCAGAGTTTCTTCTGT; POSTN 21 FWD-2: AGGTCACCAAGGTCACCAAA; POSTN 21REV-3 TCAAATAAATGACCATCACCACCT). Phusion High-Fidelity Polymerase (0.02U/ul) and 1x buffer (NEB, Ipswich, MA) were used with a final concentration of 0.6x SYBR Green (Molecular Probes, Eugene OR) and 0.2mM dNTPs for 40 cycles on a StepOnePlus Real-Time Q-PCR machine (Life Technologies, Carlsbad, CA). cDNA libraries used for RNA-Seq representing healthy and IPF lung samples were diluted over 5 two-fold dilutions for Q-PCR and amplified to produce standard curves. Real time PCR was performed for each sample and each of the two primers at five different dilutions. For each sample, the machine-calculated optimal threshold was found for each exon primer and concentration separately, and C values were calculated by using the mean of these thresholds for both exons. We then calculated standard curves for each (sample, exon) pair as a quality control to confirm linearity between C and log concentration. The expression ratio for each sample was calculated aswhere C(j) is the C value found for exon j at the most concentrated dilution (0.1). PCR validation for COL6A3 was performed similarly (primers COL6A3 E057-FWD: CCGATATTGGCCAACTCCCC; COL6A3 E057-REV: GACACCTACTCCACCAAGGC; COL6A3 E050-FWD: ATGAGGGTGCGAACGTACTG; COL6A3 E050-REV: GCAAGAGGGACGTGGTCTTT).
  54 in total

Review 1.  Analysing biological pathways in genome-wide association studies.

Authors:  Kai Wang; Mingyao Li; Hakon Hakonarson
Journal:  Nat Rev Genet       Date:  2010-12       Impact factor: 53.242

2.  Principal components analysis corrects for stratification in genome-wide association studies.

Authors:  Alkes L Price; Nick J Patterson; Robert M Plenge; Michael E Weinblatt; Nancy A Shadick; David Reich
Journal:  Nat Genet       Date:  2006-07-23       Impact factor: 38.330

3.  Constitutive ALK5-independent c-Jun N-terminal kinase activation contributes to endothelin-1 overexpression in pulmonary fibrosis: evidence of an autocrine endothelin loop operating through the endothelin A and B receptors.

Authors:  Xu Shi-Wen; Fernando Rodríguez-Pascual; Santiago Lamas; Alan Holmes; Sarah Howat; Jeremy D Pearson; Michael R Dashwood; Roland M du Bois; Christopher P Denton; Carol M Black; David J Abraham; Andrew Leask
Journal:  Mol Cell Biol       Date:  2006-07       Impact factor: 4.272

4.  Overexpression of heparin-binding EGF-like growth factor in mouse pancreas results in fibrosis and epithelial metaplasia.

Authors:  Anna L Means; Kevin C Ray; Amar B Singh; M Kay Washington; Robert H Whitehead; Raymond C Harris; Christopher V E Wright; Robert J Coffey; Steven D Leach
Journal:  Gastroenterology       Date:  2003-04       Impact factor: 22.682

5.  Bayesian probit regression model for the diagnosis of pulmonary fibrosis: proof-of-principle.

Authors:  Eric B Meltzer; William T Barry; Thomas A D'Amico; Robert D Davis; Shu S Lin; Mark W Onaitis; Lake D Morrison; Thomas A Sporn; Mark P Steele; Paul W Noble
Journal:  BMC Med Genomics       Date:  2011-10-05       Impact factor: 3.063

6.  Mesenchymal cell survival in airway and interstitial pulmonary fibrosis.

Authors:  James C Bonner
Journal:  Fibrogenesis Tissue Repair       Date:  2010-08-25

7.  Neonatal periostin knockout mice are protected from hyperoxia-induced alveolar simplication.

Authors:  Paul D Bozyk; J Kelley Bentley; Antonia P Popova; Anuli C Anyanwu; Marisa D Linn; Adam M Goldsmith; Gloria S Pryhuber; Bethany B Moore; Marc B Hershenson
Journal:  PLoS One       Date:  2012-02-17       Impact factor: 3.240

8.  Alternative splicing and differential gene expression in colon cancer detected by a whole genome exon array.

Authors:  Paul J Gardina; Tyson A Clark; Brian Shimada; Michelle K Staples; Qing Yang; James Veitch; Anthony Schweitzer; Tarif Awad; Charles Sugnet; Suzanne Dee; Christopher Davies; Alan Williams; Yaron Turpaz
Journal:  BMC Genomics       Date:  2006-12-27       Impact factor: 3.969

9.  An integrated map of genetic variation from 1,092 human genomes.

Authors:  Goncalo R Abecasis; Adam Auton; Lisa D Brooks; Mark A DePristo; Richard M Durbin; Robert E Handsaker; Hyun Min Kang; Gabor T Marth; Gil A McVean
Journal:  Nature       Date:  2012-11-01       Impact factor: 49.962

10.  Genome-wide association study identifies multiple susceptibility loci for pulmonary fibrosis.

Authors:  Tasha E Fingerlin; Elissa Murphy; Weiming Zhang; Anna L Peljto; Kevin K Brown; Mark P Steele; James E Loyd; Gregory P Cosgrove; David Lynch; Steve Groshong; Harold R Collard; Paul J Wolters; Williamson Z Bradford; Karl Kossen; Scott D Seiwert; Roland M du Bois; Christine Kim Garcia; Megan S Devine; Gunnar Gudmundsson; Helgi J Isaksson; Naftali Kaminski; Yingze Zhang; Kevin F Gibson; Lisa H Lancaster; Joy D Cogan; Wendi R Mason; Toby M Maher; Philip L Molyneaux; Athol U Wells; Miriam F Moffatt; Moises Selman; Annie Pardo; Dong Soon Kim; James D Crapo; Barry J Make; Elizabeth A Regan; Dinesha S Walek; Jerry J Daniel; Yoichiro Kamatani; Diana Zelenika; Keith Smith; David McKean; Brent S Pedersen; Janet Talbert; Raven N Kidd; Cheryl R Markin; Kenneth B Beckman; Mark Lathrop; Marvin I Schwarz; David A Schwartz
Journal:  Nat Genet       Date:  2013-04-14       Impact factor: 38.330

View more
  30 in total

1.  Attenuated pulmonary fibrosis in sialidase-3 knockout (Neu3-/-) mice.

Authors:  Tejas R Karhadkar; Wensheng Chen; Richard H Gomer
Journal:  Am J Physiol Lung Cell Mol Physiol       Date:  2019-10-16       Impact factor: 5.464

2.  Periostin Deficiency Causes Severe and Lethal Lung Injury in Mice With Bleomycin Administration.

Authors:  Hirofumi Kondoh; Takashi Nishiyama; Yoshinao Kikuchi; Masashi Fukayama; Mitsuru Saito; Isao Kii; Akira Kudo
Journal:  J Histochem Cytochem       Date:  2016-06-07       Impact factor: 2.479

3.  Long Noncoding RNA FENDRR Exhibits Antifibrotic Activity in Pulmonary Fibrosis.

Authors:  Chaoqun Huang; Yurong Liang; Xiangming Zeng; Xiaoyun Yang; Dao Xu; Xuxu Gou; Roshini Sathiaseelan; Lakmini Kumari Senavirathna; Pengcheng Wang; Lin Liu
Journal:  Am J Respir Cell Mol Biol       Date:  2020-04       Impact factor: 6.914

Review 4.  The role of periostin in lung fibrosis and airway remodeling.

Authors:  David N O'Dwyer; Bethany B Moore
Journal:  Cell Mol Life Sci       Date:  2017-09-16       Impact factor: 9.261

Review 5.  Introductory review: periostin-gene and protein structure.

Authors:  Akira Kudo
Journal:  Cell Mol Life Sci       Date:  2017-09-07       Impact factor: 9.261

6.  Transcriptomic evidence of immune activation in macroscopically normal-appearing and scarred lung tissues in idiopathic pulmonary fibrosis.

Authors:  Irina G Luzina; Mariah V Salcedo; Mónica L Rojas-Peña; Anne E Wyman; Jeffrey R Galvin; Ashutosh Sachdeva; Andrew Clerman; June Kim; Teri J Franks; Edward J Britt; Jeffrey D Hasday; Si M Pham; Allen P Burke; Nevins W Todd; Sergei P Atamas
Journal:  Cell Immunol       Date:  2018-01-03       Impact factor: 4.868

7.  Modeling of Fibrotic Lung Disease Using 3D Organoids Derived from Human Pluripotent Stem Cells.

Authors:  Alexandros Strikoudis; Anna Cieślak; Lucas Loffredo; Ya-Wen Chen; Nina Patel; Anjali Saqi; David J Lederer; Hans-Willem Snoeck
Journal:  Cell Rep       Date:  2019-06-18       Impact factor: 9.423

Review 8.  Periostin in inflammation and allergy.

Authors:  Kenji Izuhara; Satoshi Nunomura; Yasuhiro Nanri; Masahiro Ogawa; Junya Ono; Yasutaka Mitamura; Tomohito Yoshihara
Journal:  Cell Mol Life Sci       Date:  2017-09-08       Impact factor: 9.261

9.  LEM domain-containing protein 3 antagonizes TGFβ-SMAD2/3 signaling in a stiffness-dependent manner in both the nucleus and cytosol.

Authors:  Dwight M Chambers; Leandro Moretti; Jennifer J Zhang; Spencer W Cooper; Davis M Chambers; Philip J Santangelo; Thomas H Barker
Journal:  J Biol Chem       Date:  2018-08-14       Impact factor: 5.157

10.  Differential Expression of VEGF-Axxx Isoforms Is Critical for Development of Pulmonary Fibrosis.

Authors:  Shaney L Barratt; Thomas Blythe; Caroline Jarrett; Khadija Ourradi; Golda Shelley-Fraser; Michael J Day; Yan Qiu; Steve Harper; Toby M Maher; Sebastian Oltean; Thomas J Hames; Chris J Scotton; Gavin I Welsh; David O Bates; Ann B Millar
Journal:  Am J Respir Crit Care Med       Date:  2017-08-15       Impact factor: 21.405

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.