Neil R Hackett, Marcus W Butler, Renat Shaykhiev, Jacqueline Salit, Larsson Omberg, Juan L Rodriguez-Flores, Jason G Mezey, Yael Strulovici-Barel, Guoqing Wang, Lukas Didon, Ronald G Crystal.
Abstract
BACKGROUND: The small airway epithelium (SAE), the cell population that covers the human airway surface from the 6th generation of airway branching to the alveoli, is the major site of lung disease caused by smoking. The focus of this study is to provide quantitative assessment of the SAE transcriptome in the resting state and in response to chronic cigarette smoking using massively parallel mRNA sequencing (RNA-Seq).
Year: 2012 PMID: 22375630 PMCID: PMC3337229 DOI: 10.1186/1471-2164-13-82
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Establishment of detection limit for gene expression for RNA-Seq assessment of gene expression of the small airway epithelium of healthy nonsmokers. A. Distribution of RPKM for exons (blue), introns (red), and intergenic regions (green). RPKM depends on the size and read numbers mapped in the region considered. The RPKM for introns and intergenic regions was calculated by selecting intronic and intergenic regions throughout the genome that match the size of the exons analyzed, i.e., the size is comparable for the introns, intergenic regions, and exons. B. Estimate of minimum detectable level of expression (RPKM = 0.125) determined from an estimate of false discovery rate (red) and false negative rate (purple) [24]. C. Average RPKM expression levels (log10). Dashed line represents the 0.125 threshold.
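The dependence of RPKM on region size and mapped read count can be made concrete with a minimal sketch. It uses the standard definition of RPKM (reads per kilobase of region per million mapped reads). The function names and the choice to treat a value equal to the 0.125 threshold as detected are assumptions made for illustration, not details taken from the study.

```python
DETECTION_LIMIT = 0.125  # RPKM threshold quoted in the Figure 1 caption


def rpkm(read_count: int, region_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of region per million mapped reads (standard definition)."""
    if region_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("region length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (region_length_bp * total_mapped_reads)


def is_detected(value: float, limit: float = DETECTION_LIMIT) -> bool:
    """Assumption: a region counts as detected when its RPKM is at or above the limit."""
    return value >= limit
```

For example, 500 reads on a 2,000 bp region in a library of 20 million mapped reads gives an RPKM of 12.5. Because the value is normalized by region length, intronic and intergenic regions of matched size can be compared with exons on the same scale.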
Figure 2. Composition of the healthy nonsmoker small airway epithelium (SAE) transcriptome. A. Comparison of the SAE transcriptome to that of other tissues. Abscissa - the number of genes, with the genes in descending order of mRNA level. Ordinate - fraction of all mRNAs derived from these genes. The genes expressed by the small airway epithelium (blue) are compared to genes expressed by other organs as indicated [24]. Note that the SAE is similar to liver in that a few genes are expressed at very high levels. B. RNA-Seq sequence alignments for SCGB1A1 (uteroglobin; CC10), the most highly expressed gene in the SAE. The region of the genome corresponding to SCGB1A1 is shown with the read coverage depth for 5 healthy nonsmokers plotted using Partek Genomics Suite version 6.5. RPKM for whole mRNA for each subject is shown on the left. C-E. Frequency distribution of expression level for ubiquitous vs SAE-enriched genes in the small airway epithelium of healthy nonsmokers. "Ubiquitous" genes are those expressed by most tissues; "SAE-enriched" genes are those more abundant in SAE compared to other tissues (see text). For all panels, the number of genes in 1/2 log10 bins was determined starting at the detection limit (RPKM = 0.125). For each panel, the expressed genes are grouped (in 1/2 log10 bins) as low (RPKM 0.125 to 1, i.e., starting at log10 = -0.9), median (> 1 to 10) and high (> 10), with the number of genes and % in each category listed and median RPKM for n = 5 healthy nonsmokers. C. All genes. D. Ubiquitous genes, representing 48% of all expressed genes. E. SAE-enriched genes, representing 52% of all expressed genes. Note that the SAE-enriched genes have a much larger proportion of low level expressed genes compared to the ubiquitous genes. F, G. Comparison of coverage of RNA-Seq and microarray assessment of SAE gene expression of healthy nonsmokers. Genes assessed by RNA-Seq were divided into low (0.125-1), median (> 1-10) and high (> 10) RPKM on the basis of median expression level in n = 5 nonsmokers. Affymetrix U133 data for small airway epithelium for n = 27 African-American healthy nonsmokers [[129]; Additional file 1, Table S1] were assessed based on the Affymetrix P calls in low expression (Affymetrix "present (P)" in < 50%; red) or high expression in microarray ("P" in > 50%; blue). Genes with no unique probe on the microarray are identified in green. F. Ubiquitous genes. G. SAE-enriched genes. For medium and high expressing genes the microarray and RNA-Seq are very similar in detecting expressed genes, but for the SAE-enriched, low level expressed genes detected by RNA-Seq, the microarrays miss a large proportion of the genes.
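The low/median/high grouping and the 1/2 log10 binning described in this caption can be sketched as follows. The class boundaries (0.125, 1 and 10 RPKM) come from the caption; the function names, the "not detected" label, and the convention of numbering bins from zero at the detection limit are assumptions made for illustration.

```python
import math

DETECTION_LIMIT = 0.125  # RPKM, log10 is about -0.9


def expression_class(rpkm: float) -> str:
    """Assign a gene to the low / median / high groups used in the caption."""
    if rpkm < DETECTION_LIMIT:
        return "not detected"  # hypothetical label for genes below the threshold
    if rpkm <= 1:
        return "low"
    if rpkm <= 10:
        return "median"
    return "high"


def half_log_bin(rpkm: float) -> int:
    """Index of the 1/2 log10 bin, counting from 0 at the detection limit."""
    if rpkm < DETECTION_LIMIT:
        raise ValueError("value is below the detection limit")
    offset = math.log10(rpkm) - math.log10(DETECTION_LIMIT)
    return math.floor(offset / 0.5)
```

Counting how many genes fall in each `half_log_bin` index would give a frequency distribution of the kind plotted in panels C-E.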
Overall Most Highly Expressed Genes in the SAE of Healthy Nonsmokers1
| Gene symbol | Gene title | Median expression level (RPKM) |
|---|---|---|
| SCGB1A1 | secretoglobin, family 1A, member 1 (uteroglobin) | 38675 |
| SCGB3A1 | secretoglobin, family 3A, member 1 | 7838 |
| SLPI | secretory leukocyte peptidase inhibitor | 1602 |
| C20orf114 | chromosome 20 open reading frame 114 | 1484 |
| TPPP3 | tubulin polymerization-promoting protein family member 3 | 1302 |
| CD74 | CD74 molecule, major histocompatibility complex, class II invariant chain | 947 |
| TMEM190 | transmembrane protein 190 | 945 |
| GSTP1 | glutathione S-transferase pi 1 | 859 |
| WFDC2 | WAP four-disulfide core domain 2 | 840 |
| C20orf85 | chromosome 20 open reading frame 85 | 738 |
| TSPAN1 | tetraspanin 1 | 664 |
| C9orf24 | chromosome 9 open reading frame 24 | 629 |
| NEAT1 | nuclear paraspeckle assembly transcript 1 (non-protein coding) | 565 |
| S100A11 | S100 calcium binding protein A11 | 540 |
| KRT19 | keratin 19 | 493 |
| MALAT1 | metastasis associated lung adenocarcinoma transcript 1 (non-protein coding) | 461 |
| ODF3B | outer dense fiber of sperm tails 3B | 392 |
| CYP4B1 | cytochrome P450, family 4, subfamily B, polypeptide 1 | 374 |
| FOXJ1 | forkhead box J1 | 363 |
| LCN2 | lipocalin 2 | 359 |
| PIGR | polymeric immunoglobulin receptor | 351 |
| MS4A8B | membrane-spanning 4-domains, subfamily A, member 8B | 348 |
| ALDH3B1 | aldehyde dehydrogenase 3 family, member B1 | 342 |
| MSMB | microseminoprotein, beta- | 333 |
| RSPH1 | radial spoke head 1 homolog (Chlamydomonas) | 318 |
| CLDN4 | claudin 4 | 308 |
| AQP3 | aquaporin 3 (Gill blood group) | 308 |
| C9orf117 | chromosome 9 open reading frame 117 | 302 |
| IGFBP2 | insulin-like growth factor binding protein 2, 36 kDa | 297 |
| ANXA2P2 | annexin A2 pseudogene 2 | 292 |
1 Listed are the top 30 most highly expressed SAE enriched genes.
2 Median for n = 5 healthy nonsmokers. PARTEK implementation of Bowtie algorithm with parameters as described in Material and Methods.
Figure 3. Relative distribution of the expression of ubiquitous and SAE-enriched genes of healthy nonsmokers in different functional categories. Gene ontology assignments were used to identify genes of 7 functional categories and the frequency distribution of expression level was determined in 1/2 log10 bins, starting with the threshold (RPKM 0.125 = -0.9 log10). The data is plotted separately for the ubiquitous genes (open symbols) and SAE-enriched genes (closed symbols) with the number and percentage of genes in each low, medium and high group. For each panel, for each group, listed is the number of genes, % of the total in that category, and median RPKM for n = 5 healthy nonsmokers (number with downward arrow). A. Transcription; B. Translation; C. Immunity; D. Signal transduction; E. Adhesion; F. Membrane receptors; and G. Ion transporters.
Figure 4. Distribution of expression level of cell type-specific genes in the small airway epithelium of healthy nonsmokers. Lists of genes specific to neuroendocrine cells, basal cells, secretory cells (including all mucins) and ciliated cells were used to assess the cumulative frequency distribution of expression levels for each category. The lists of cell type-specific genes are from the literature, including neuroendocrine genes [31], basal cells [30], secretory cells and ciliated cells [52]. Ordinate - cumulative frequency; abscissa - expression level (RPKM in 1/2 log10 bins).
Most Highly Expressed Genes Enriched in Differentiated Cell Types of the SAE of Healthy Nonsmokers1
| Differentiated cell type | Gene symbol | Gene title | Median expression level (RPKM) |
|---|---|---|---|
| Ciliated cells | TUBB2C | tubulin, beta 2C | 1161 |
| | ACTG1 | actin, gamma 1 | 513 |
| | TUBA1A | tubulin, alpha 1a | 342 |
| | HSPA1A | heat shock 70 kDa protein 1A | 266 |
| | HSPA1B | heat shock 70 kDa protein 1A///heat shock 70 kDa protein 1B | 260 |
| | TEKT2 | tektin 2 (testicular) | 146 |
| | DYNLT1 | dynein, light chain, Tctex-type 1 | 131 |
| | DNAI1 | dynein, axonemal, intermediate chain 1 | 108 |
| | DNALI1 | dynein, axonemal, light intermediate chain 1 | 108 |
| | DNAI2 | dynein, axonemal, intermediate chain 2 | 105 |
| | SPAG6 | sperm associated antigen 6 | 101 |
| | DYNLRB2 | dynein, light chain, roadblock-type 2 | 100 |
| | CROCC | ciliary rootlet coiled-coil, rootletin | 84 |
| | PPP2R1A | protein phosphatase 2, regulatory subunit A, alpha | 73 |
| | DNAH9 | dynein, axonemal, heavy chain 9 | 63 |
| | CCDC146 | coiled-coil domain containing 146 | 60 |
| | RSPH4A | radial spoke head 4 homolog A (Chlamydomonas) | 59 |
| | CALM3 | calmodulin 3 (phosphorylase kinase, delta) | 56 |
| | TCTEX1D2 | Tctex1 domain containing 2 | 50 |
| | IFT140 | intraflagellar transport 140 homolog (Chlamydomonas) | 49 |
| Secretory cells | AGR2 | anterior gradient homolog 2 | 166 |
| | TFF3 | trefoil factor 3 (intestinal) | 149 |
| | MUC1 | mucin 1, cell surface associated | 123 |
| | MUC5B | mucin 5B, oligomeric mucus/gel-forming | 118 |
| | MUC4 | mucin 4, cell surface associated | 93 |
| | MUC15 | mucin 15, cell surface associated | 28 |
| | MUC20 | mucin 20, cell surface associated | 27 |
| | MUC16 | mucin 16, cell surface associated | 20 |
| | MUC13 | mucin 13, cell surface associated | 15 |
| | MUCL1 | mucin-like 1 | 3.90 |
| | TFF1 | trefoil factor 1 | 2.09 |
| | PARM1 | prostate androgen-regulated mucin-like protein 1 | 1.85 |
| | EMR2 | egf-like module containing, mucin-like, hormone receptor-like 2 | 1.38 |
| | GCNT3 | glucosaminyl (N-acetyl) transferase 3, mucin type | 0.81 |
| | MUC2 | mucin 2, oligomeric mucus/gel-forming | 0.60 |
| | MUC6 | mucin 6, oligomeric mucus/gel-forming | 0.58 |
| | MUC12 | mucin 12, cell surface associated | 0.46 |
| Basal cells | MALAT1 | metastasis associated lung adenocarcinoma transcript 1 (non-protein coding) | 461 |
| | CST3 | cystatin C | 295 |
| | PFN1 | profilin 1 | 224 |
| | ALDOA | aldolase A, fructose-bisphosphate | 183 |
| | SQSTM1 | sequestosome 1 | 106 |
| | MT2A | metallothionein 2A | 91 |
| | ENO1 | enolase 1, (alpha) | 89 |
| | KRT7 | keratin 7 | 83 |
| | MYL12A | myosin, light chain 12A, regulatory, non-sarcomeric///myosin, light chain 12B, regulatory | 70 |
| | FLNB | filamin B, beta | 69 |
| | BRI3 | brain protein I3///hypothetical protein LOC644975 | 62 |
| | PLEC1 | plectin | 60 |
| | EIF5A | eukaryotic translation initiation factor 5A | 60 |
| | GNB1 | guanine nucleotide binding protein (G protein), beta polypeptide 1 | 57 |
| | KRT5 | keratin 5 | 54 |
| | PSMA7 | proteasome subunit alpha type 7 | 53 |
| | CTTN | cortactin | 52 |
| | JUP | junction plakoglobin | 51 |
| | MGST1 | microsomal glutathione S-transferase | 51 |
| | LMNA | lamin A/C | 49 |
| Neuroendocrine cells | ENO2 | enolase 2 (gamma, neuronal) | 23 |
| | GRP | gastrin-releasing peptide | 0.82 |
| | UCHL1 | ubiquitin carboxyl-terminal esterase L1 (ubiquitin thiolesterase) | 0.65 |
| | SCG2 | secretogranin II | 0.48 |
| | ASCL1 | achaete-scute complex homolog 1 (Drosophila) | 0.34 |
| | CHGA | chromogranin A (parathyroid secretory protein 1) | 0.32 |
1 List includes genes known to be enriched in expression in ciliated cells (Dvorak et al. 2010) [52], secretory cells (consisting of all mucins and mucin components), basal cells (Hackett et al. 2011) [30], and neuroendocrine cells (Carolan et al. 2008) [31]. The small airway epithelium expression level was determined and the top 20 highly expressed (or all genes expressed from mucin list and neuroendocrine list, i.e. above threshold of 0.125) were tabulated in descending order of expression level.
2 Median for n = 5 nonsmokers.
Highly Expressed SAE-enriched Transcription Factors1
| Category | Gene symbol | Gene title | Median expression level (RPKM) |
|---|---|---|---|
| Basic helix-loop | ATF6B | activating transcription factor 6 beta | 26.4 |
| | BHLHE40 | basic helix-loop-helix family, member e40 | 16.2 |
| | RFX3 | regulatory factor X, 3 (influences HLA class II expression) | 15.6 |
| | HES6 | hairy and enhancer of split 6 (Drosophila) | 13.6 |
| | FOXC1 | forkhead box C1 | 9.3 |
| | CEBPA | CCAAT/enhancer binding protein (C/EBP), alpha | 9.1 |
| | HEY1 | hairy/enhancer-of-split related with YRPW motif 1 | 9.1 |
| Zinc finger | TRIM29 | tripartite motif-containing 29 | 77.3 |
| | KLF5 | kruppel-like factor 5 (intestinal) | 42.0 |
| | RREB1 | ras responsive element binding protein 1 | 7.2 |
| | KLF4 | kruppel-like factor 4 (gut) | 7.1 |
| Helix-turn-helix | FOXJ1 | forkhead box J1 | 363.1 |
| | ELF3 | E74-like factor 3 (ets domain transcription factor, epithelial-specific) | 170.1 |
| | FOXA1 | forkhead box A1 | 62.6 |
| | EHF | ets homologous factor | 39.1 |
| | TBX1 | T-box 1 | 19.7 |
| | SATB1 | SATB homeobox 1 | 19.3 |
| | MYB | v-myb myeloblastosis viral oncogene homolog (avian) | 15.9 |
| | SIX2 | SIX homeobox 2 | 13.9 |
| | NKX2-1 | NK2 homeobox 1 | 11.6 |
| | PHTF1 | putative homeodomain transcription factor 1 | 10.9 |
| | TEAD3 | TEA domain family member 3 | 10.9 |
| | ETV6 | ets variant 6 | 10.6 |
| | FOXA2 | forkhead box A2 | 6.9 |
| β-scaffold | SOX2 | SRY (sex determining region Y)-box 2 | 71.3 |
| | RUNX1 | runt-related transcription factor 1 | 16.7 |
| | SOX4 | SRY (sex determining region Y)-box 4 | 11.4 |
| | TFCP2 | transcription factor CP2 | 10.7 |
| | SOX21 | SRY (sex determining region Y)-box 21 | 9.0 |
| | NFATC1 | nuclear factor of activated T-cells, cytoplasmic, calcineurin-dependent 1 | 7.8 |
| | SOX9 | SRY (sex determining region Y)-box 9 | 7.7 |
1 The top 30 DNA-binding transcription factors were identified in the highly expressed SAE-enriched list for n = 5 healthy nonsmokers. They were categorized by transcription factor family and sorted in descending order of expression within each category.
Highly Expressed SAE-enriched Transmembrane Receptors1
| Category | Gene symbol | Gene title | Median expression level (RPKM) |
|---|---|---|---|
| G protein coupled/7 transmembrane | CELSR1 | cadherin, EGF LAG seven-pass G-type receptor 1 (flamingo homolog, Drosophila) | 63.0 |
| | C5AR1 | complement component 5a receptor 1 | 27.6 |
| | GPR110 | G protein-coupled receptor 110 | 21.7 |
| | GPRC5C | G protein-coupled receptor, family C, group 5, member C | 20.3 |
| | OXTR | oxytocin receptor | 19.3 |
| | LPAR3 | lysophosphatidic acid receptor 3 | 17.6 |
| | FZD6 | frizzled homolog 6 (Drosophila) | 12.5 |
| | PTGER4 | prostaglandin E receptor 4 (subtype EP4) | 9.2 |
| | VIPR1 | vasoactive intestinal peptide receptor 1 | 8.8 |
| | ADRB1 | adrenergic, beta-1-, receptor | 7.8 |
| | GPR116 | G protein-coupled receptor 116 | 7.5 |
| | FZD8 | frizzled homolog 8 (Drosophila) | 7.1 |
| | ADRA2A | adrenergic, alpha-2A-, receptor | 6.8 |
| | PTGFR | prostaglandin F receptor (FP) | 6.6 |
| Cyclase related | NRP2 | neuropilin 2 | 12.0 |
| | NPR2 | natriuretic peptide receptor B/guanylate cyclase B (atrionatriuretic peptide receptor B) | 10.7 |
| | CRCP | CGRP receptor component | 10.4 |
| IgG like | SCARA3 | scavenger receptor class A, member 3 | 21.8 |
| | PTPRT | protein tyrosine phosphatase, receptor type, T | 13.0 |
| | IL1R1 | interleukin 1 receptor, type I | 6.5 |
| Ion channel | GABRP | gamma-aminobutyric acid (GABA) A receptor, pi | 15.7 |
| Serine kinase | TGFBR2 | transforming growth factor, beta receptor II (70/80 kDa) | 8.8 |
| Tyrosine kinase | DDR1 | discoidin domain receptor tyrosine kinase 1 | 140.3 |
| | FGFR3 | fibroblast growth factor receptor 3 | 19.9 |
| | IGF1R | insulin-like growth factor 1 receptor | 15.2 |
| | PTK7 | PTK7 protein tyrosine kinase 7 | 14.5 |
| | FGFR2 | fibroblast growth factor receptor 2 | 11.6 |
| | MET | met proto-oncogene (hepatocyte growth factor receptor) | 8.5 |
| | EGFR | epidermal growth factor receptor (erythroblastic leukemia viral (v-erb-b) oncogene homolog, avian) | 7.2 |
| Other | SORL1 | sortilin-related receptor, L(DLR class) A repeats-containing | 7.7 |
1 The top 30 transmembrane receptors were identified in the highly expressed SAE-enriched list for n = 5 healthy nonsmokers. They were categorized by structural family and sorted in descending order of expression within each category.
Highly Expressed SAE-enriched Signaling Ligands and Growth Factors1
| Gene symbol | Gene title | Median expression level (RPKM) |
|---|---|---|
| MDK | midkine (neurite growth-promoting factor 2) | 59.8 |
| CXCL1 | chemokine (C-X-C motif) ligand 1 (melanoma growth stimulating activity, alpha) | 49.3 |
| CX3CL1 | chemokine (C-X3-C motif) ligand 1 | 32.9 |
| TNFSF10 | tumor necrosis factor (ligand) superfamily, member 10 | 31.8 |
| FSTL1 | follistatin-like 1 | 27.9 |
| CXCL6 | chemokine (C-X-C motif) ligand 6 (granulocyte chemotactic protein 2) | 18.6 |
| FGF14 | fibroblast growth factor 14 | 16.5 |
| DLL1 | delta-like 1 (Drosophila) | 13.6 |
| JAG2 | jagged 2 | 11.9 |
| IL8 | interleukin 8 | 8.6 |
| CCL15 | C-C motif chemokine 15 | 7.8 |
| PDGFA | platelet-derived growth factor alpha polypeptide | 6.1 |
| TNFSF12 | tumor necrosis factor (ligand) superfamily, member 12 | 5.9 |
| CCL23 | chemokine (C-C motif) ligand 23 | 5.3 |
| NMB | neuromedin B | 4.9 |
| CCL5 | chemokine (C-C motif) ligand 5 | 4.9 |
| NPFF | neuropeptide FF-amide peptide precursor | 4.6 |
| FAS | fas (TNF receptor superfamily, member 6) | 3.6 |
| WIF1 | WNT inhibitory factor 1 | 3.6 |
| CCL18 | chemokine (C-C motif) ligand 18 (pulmonary and activation-regulated) | 3.5 |
| LTB | lymphotoxin beta (TNF superfamily, member 3) | 3.0 |
| LIF | leukemia inhibitory factor (cholinergic differentiation factor) | 2.9 |
| ERBB4 | v-erb-a erythroblastic leukemia viral oncogene homolog 4 (avian) | 2.7 |
| PTCH1 | patched homolog 1 (Drosophila) | 2.6 |
| WNT5A | wingless-type MMTV integration site family, member 5A | 2.1 |
| CCL17 | chemokine (C-C motif) ligand 17 | 1.7 |
| NRTN | neurturin | 1.7 |
| AREG | amphiregulin | 1.4 |
| CCL4 | chemokine (C-C motif) ligand 4 | 1.3 |
| NTS | neurotensin | 1.2 |
1 The top 30 signaling ligands and growth factors were identified in the highly expressed SAE-enriched list for n = 5 healthy nonsmokers.
Figure 5. Examples of RNA-Seq quantification of small airway epithelium (SAE) expression levels of genes within gene families of ≥ 90% homology. To identify gene families expressed by the SAE, the % identity between gene pairs expressed by the healthy nonsmoker SAE was determined using BLAST, where each gene was blasted against a database of all human RefSeq mRNA [26]. Gene families were defined as genes for which the alignments yielded ≥ 90% identity and the alignment length was at least 50% of both sequences. A. CYP2A6, CYP2A7 and CYP2A13; B. GSTA1, GSTA2, GSTA3 and GSTA5; and C. MT1E, MT1L and MT1M.
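The gene-family criterion in this caption (≥ 90% identity, with the alignment covering at least 50% of both sequences) can be expressed as a simple filter over pairwise alignment records. The sketch below is illustrative only. The `Hit` record, its field names, and the single-linkage grouping of qualifying pairs into families are assumptions, since the caption does not say how pairs were merged into families.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical pairwise alignment record (field names are assumptions)."""
    query: str
    subject: str
    percent_identity: float
    alignment_length: int
    query_length: int
    subject_length: int


def same_family(hit: Hit, min_identity: float = 90.0, min_coverage: float = 0.5) -> bool:
    """Apply the caption's criterion to one alignment between two different genes."""
    if hit.query == hit.subject:
        return False  # ignore self-hits
    covers_query = hit.alignment_length >= min_coverage * hit.query_length
    covers_subject = hit.alignment_length >= min_coverage * hit.subject_length
    return hit.percent_identity >= min_identity and covers_query and covers_subject


def group_families(hits: list[Hit]) -> list[list[str]]:
    """Merge qualifying pairs into families (single linkage, via union-find)."""
    parent: dict[str, str] = {}

    def find(gene: str) -> str:
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    for hit in hits:
        if same_family(hit):
            parent[find(hit.query)] = find(hit.subject)

    families: dict[str, set[str]] = {}
    for gene in list(parent):
        families.setdefault(find(gene), set()).add(gene)
    return [sorted(members) for members in families.values()]
```

Requiring coverage of both sequences, rather than only the query, keeps a short transcript that matches a small stretch of a much longer one from being called a family member.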
Different Expression Levels Among Members of Common Gene Families Expressed in the SAE1
| Category | Gene name | Gene symbol | Median expression level (RPKM) |
|---|---|---|---|
| Cilia | tubulin, alpha | TUBA1A | 341.7 |
| | | TUBA1B | 162.6 |
| | | TUBA1C | 120.5 |
| | tubulin, beta | TUBB2A | 38.7 |
| | | TUBB2B | 23.3 |
| | | TUBB4 | 14.8 |
| Annexin (signal | annexin A2 | ANXA2 | 362.1 |
| | | ANXA2P2 | 292.3 |
| | | ANXA2P1 | 42.6 |
| | | ANXA2P3 | 26.1 |
| Glutathione | glutathione S-transferase alpha | GSTA1 | 248.9 |
| | | GSTA2 | 143.9 |
| | | GSTA3 | 15.9 |
| | | GSTA5 | 9.3 |
| Glutathione | glutathione S-transferase mu | GSTM2 | 35.9 |
| | | GSTM1 | 20.8 |
| | | GSTM4 | 8.8 |
| Sulfotransferase - | sulfotransferase family, cytosolic, 1A, phenol-preferring | SULT1A1 | 25.7 |
| | | SULT1A4 | 14.6 |
| | | SULT1A3 | 10.3 |
| | | SULT1A2 | 5.4 |
| | | SLX1A-SULT1A3 | |
| | | SLX1B-SULT1A4 | |
| Amylase | amylase, alpha | AMY1A | 43.4 |
| | | AMY1B | 18.6 |
| | | AMY1C | 15.1 |
| | | AMY2B | 13.8 |
| | | AMY2A | 7.3 |
| Polarity/left right | NODAL modulator | NOMO2 | 22.4 |
| | | NOMO1 | 16.0 |
| | | NOMO3 | 14.3 |
| Metallothionein | metallothionein | MT1E | 33.2 |
| | | MT1L | 2.4 |
| | | MT1M | 1.1 |
| | | MT1JP | |
| P450 | cytochrome P450, family 2, subfamily A, polypeptides | CYP2A13 | 17.2 |
| | | CYP2A6 | 4.2 |
| | | CYP2A7 | 1.4 |
| Aldo-keto reductase | aldo-keto reductase family 7 | AKR7A2 | 34.1 |
| | | AKR7A3 | 3.0 |
| | | AKR7L | 1.5 |
| | | AKR7A2P1 | |
| Aldehyde | alcohol dehydrogenase | ADH1C | 52.2 |
| | | ADH1B | 4.3 |
| | | ADH1A | 2.6 |
| Short chain | dehydrogenase/reductase | DHRS9 | 68.4 |
| | | MUC20 | 27.5 |
| | | VPS53 | 4.0 |
| | | SMU1 | 3.9 |
| | | FAM153B | 0.4 |
| | | LEP | 0.0 |
1 List of gene families identified using BLAST with ≥ 90% identity and alignment lengths of at least 50% in both sequences. They were categorized by gene family and sorted in descending order within each category by median expression level for n = 5 nonsmokers.
Figure 6. Overall impact of smoking on small airway epithelium gene expression. Shown are comparisons of RNA-Seq assessment of genes expressed in the small airway epithelium (SAE) in nonsmokers (n = 5) vs smokers (n = 6). A. Cumulative frequency of expression levels as a function of increasing RPKM. The data is shown as cumulative frequency in 1/2 log10 bins starting at the lower limit (RPKM 0.125, log10 = -0.9) for healthy nonsmokers (blue) and healthy smokers (red). On an overall basis assessing all genes, there is no difference between nonsmokers and smokers. B. Comparison of expression of the subset of smoking-responsive vs non-responsive genes for the ubiquitous and SAE-enriched genes. Each category is divided into low, medium and high expression groups using the same criteria as in Figures 2 and 3, with smoking-responsive genes defined as p < 0.05. Ordinate - number of genes; abscissa - smoking responsive (red) and smoking non-responsive (blue) for ubiquitous and small airway epithelium (SAE)-enriched genes. Note that for both ubiquitous and SAE-enriched genes, only a small fraction, and approximately the same proportion (8-14%; low, medium, high), are smoking-responsive. C, D. Modified volcano plot showing absolute change in expression level (RPKM smoker - RPKM nonsmoker) vs -log p value for ubiquitous and SAE-enriched genes. C. Ubiquitous genes. D. SAE-enriched genes. Note that for both ubiquitous and SAE-enriched genes, more genes are down-regulated by smoking than up-regulated.
Figure 7. Comparison of smoking-dependent genes observed by microarray and RNA-Seq. A. Microarray-determined smoking-responsive genes. The data includes all significant genes (Benjamini-Hochberg corrected p value < 0.05; Additional file 1, Table S6) with > 1.5-fold difference in mean expression level between n = 12 smokers and n = 12 nonsmokers, as determined by microarray. For each probeset the corresponding gene was assessed by RNA-Seq for n = 5 nonsmokers and n = 6 smokers, and fold-change by microarray is plotted against the fold-change by RNA-Seq. B. RNA-Seq-determined smoking-responsive genes. The data includes the fold-change of all genes significantly impacted by smoking (uncorrected p < 0.005, 1.5-fold-change cut off), as assessed by RNA-Seq for n = 5 nonsmokers and n = 6 smokers. n = 12 nonsmokers and n = 12 healthy smokers were assessed by microarray and the fold-change for RNA-Seq is plotted against the fold-change for the probeset with largest change.
Figure 8. Functional categorization of small airway epithelium (SAE) ubiquitous and SAE-enriched smoking-responsive genes. The smoking-responsive genes (p < 0.05) of the ubiquitous and SAE-enriched groups were assigned function based on Gene Ontology classification and searches of NCBI databases. The 9 functional categories chosen accounted for the largest fraction of the genes that could be assigned functional categories. For each category, the genes were divided into smoking-induced and smoking-repressed and the log10 of absolute change (RPKM smoker - RPKM nonsmoker) was plotted. Genes are divided by category and ordered within each category by decreasing change in expression level. Red - down-regulated; blue - up-regulated. Note that for both ubiquitous and SAE-enriched genes, a higher fraction of genes in most categories are down-regulated, and that most up-regulated genes are in the SAE-enriched subgroup.
Small Airway Epithelium Expressed Genes Most Affected by Smoking1
| Category | Gene symbol | Gene title | RNA-Seq nonsmoker median | RNA-Seq smoker median | RNA-Seq absolute difference | RNA-Seq fold change | Microarray fold change | Microarray p value |
|---|---|---|---|---|---|---|---|---|
| Ubiquitous | FTL | ferritin, light polypeptide | 371.1 | 843.4 | 472.2 | 2.3 | 1.6 | 0.1784 |
| | PRDX1 | peroxiredoxin 1 | 187 | 398.6 | 211.6 | 2.1 | 1.6 | 0.1083 |
| | FTH1 | ferritin, heavy polypeptide 1 | 349.2 | 551.5 | 202.3 | 1.6 | 1.8 | 0.0602 |
| | TUBB2C | tubulin, beta 2C | 1161.3 | 1331.9 | 170.5 | 1.1 | 1.1 | 0.8995 |
| | CLU | clusterin | 498 | 665.2 | 167.3 | 1.3 | 1.5 | 0.2157 |
| | NQO1 | NAD(P)H dehydrogenase, quinone 1 | 38.3 | 198.7 | 160.4 | 5.2 | 4.8 | < 0.0001 |
| | UBB | ubiquitin B | 615.1 | 711.1 | 96 | 1.2 | -1.0 | 0.9760 |
| | GSN | gelsolin | 93.1 | 174.9 | 81.8 | 1.9 | 1.4 | 0.4772 |
| | TUBA1A | tubulin, alpha 1a | 341.7 | 423.2 | 81.5 | 1.2 | 1.1 | 0.7340 |
| | ALDOA | aldolase A, fructose-bisphosphate | 182.9 | 263.4 | 80.5 | 1.4 | 1.5 | 0.1670 |
| SAE-enriched | MSMB | microseminoprotein, beta- | 333.1 | 3112.7 | 2779.6 | 9.3 | 2.0 | 0.0820 |
| | C20orf114 | chromosome 20 open reading frame 114 | 1484.3 | 4102.7 | 2618.5 | 2.8 | 1.1 | 0.6685 |
| | ALDH3A1 | aldehyde dehydrogenase 3 family, member A1 | 226.9 | 2077.9 | 1851 | 9.2 | 9.8 | < 0.0001 |
| | TFF3 | trefoil factor 3 (intestinal) | 149.4 | 697.9 | 548.5 | 4.7 | 2.8 | 0.1771 |
| | WFDC2 | WAP four-disulfide core domain 2 | 840.3 | 1327 | 486.7 | 1.6 | 1.3 | 0.5735 |
| | TPPP3 | tubulin polymerization-promoting protein family member 3 | 1301.7 | 1604.9 | 303.2 | 1.2 | -1.0 | 0.9493 |
| | TSPAN1 | tetraspanin 1 | 663.9 | 960.2 | 296.3 | 1.4 | 1.1 | 0.8884 |
| | S100P | S100 calcium binding protein P | 79.6 | 291.2 | 211.5 | 3.7 | 1.8 | 0.4832 |
| | GSTA2 | glutathione S-transferase alpha 2 | 143.9 | 337.5 | 193.6 | 2.3 | Not in U133 | |
| | PLUNC | palate, lung and nasal epithelium associated | 5.4 | 186 | 180.6 | 34.5 | 1.6 | 0.8375 |
| Ubiquitous | CRIP1 | cysteine-rich protein 1 (intestinal) | 1014.1 | 587.7 | -426.3 | -1.7 | -1.4 | 0.4842 |
| | RPLP1 | ribosomal protein, large, P1 | 916.1 | 516 | -400.1 | -1.8 | -1.2 | 0.3845 |
| | CAPS | calcyphosine | 2197.4 | 1994.7 | -202.8 | -1.1 | -1.5 | 0.1733 |
| | PRDX5 | peroxiredoxin 5 | 1022.4 | 823.3 | -199.1 | -1.2 | -1.2 | 0.3653 |
| | RPS11 | ribosomal protein S11 | 537.2 | 357.5 | -179.7 | -1.5 | -1.3 | 0.3209 |
| | RPLP2 | ribosomal protein, large, P2 | 347.5 | 181.2 | -166.4 | -1.9 | -1.2 | 0.6557 |
| | RPL8 | ribosomal protein L8 | 631.5 | 468.9 | -162.6 | -1.3 | -1.2 | 0.5028 |
| | TPT1 | tumor protein, translationally-controlled 1 | 655.2 | 500.7 | -154.5 | -1.3 | -1.1 | 0.6701 |
| | S100A6 | S100 calcium binding protein A6 | 758.3 | 617.2 | -141 | -1.2 | -1.1 | 0.9250 |
| | CD81 | CD81 molecule | 248.3 | 120.5 | -127.8 | -2.1 | -1.3 | 0.3499 |
| SAE-enriched | SCGB1A1 | secretoglobin, family 1A, member 1 (uteroglobin) | 38675.4 | 17244 | -21431.5 | -2.2 | -1.1 | 0.4670 |
| | SCGB3A1 | secretoglobin, family 3A, member 1 | 7838.2 | 2947.3 | -4890.8 | -2.7 | -1.3 | 0.1509 |
| | CD74 | CD74 molecule, major histocompatibility complex, class II invariant chain | 947.2 | 723.7 | -223.5 | -1.3 | -2.1 | 0.2881 |
| | C9orf24 | chromosome 9 open reading frame 24 | 628.7 | 488 | -140.7 | -1.3 | -1.3 | 0.2256 |
| | CYP4B1 | cytochrome P450, family 4, subfamily B, polypeptide 1 | 373.5 | 259.3 | -114.2 | -1.4 | -1.6 | 0.1826 |
| | C20orf85 | chromosome 20 open reading frame 85 | 738.1 | 625.3 | -112.8 | -1.2 | -1.2 | 0.4756 |
| | KRT19 | keratin 19 | 493.4 | 396.8 | -96.6 | -1.2 | -1.1 | 0.7722 |
| | RPS18 | ribosomal protein S18 | 285 | 207.8 | -77.3 | -1.4 | -1.2 | 0.3731 |
| | ALDH3B1 | aldehyde dehydrogenase 3 family, member B1 | 341.9 | 265.1 | -76.7 | -1.3 | -1.5 | 0.3595 |
| | TMEM190 | transmembrane protein 190 | 945.4 | 868.8 | -76.6 | -1.1 | -1.2 | 0.5710 |
| Ubiquitous | AHRR | aryl-hydrocarbon receptor repressor | 0.1 | 1.3 | 1.2 | 20.8 | 3.7 | 0.0054 |
| SAE-enriched | AKR1B10 | aldo-keto reductase family 1, member B10 (aldose reductase) | 0.3 | 28.5 | 28.2 | 94.8 | 56.6 | < 0.0001 |
| | CABYR | calcium binding tyrosine-(Y)-phosphorylation regulated | 1 | 12.5 | 11.6 | 12.7 | 9.4 | < 0.0001 |
| | SPP1 | secreted phosphoprotein 1 | 0.8 | 10.6 | 9.8 | 12.9 | 8.5 | 0.0021 |
| | CYP1B1 | cytochrome P450, family 1, subfamily B, polypeptide 1 | 0.2 | 9.2 | 9 | 43.3 | 55.0 | < 0.0001 |
| | AKR1B15 | aldo-keto reductase family 1, member B15 | 0.1 | 6.5 | 6.3 | 50 | Not in U133 | |
| | B3GNT6 | UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 6 (core 3 synthase) | 0.3 | 4.4 | 4.1 | 16 | 4.5 | 0.0525 |
| | NOS3 | nitric oxide synthase 3 (endothelial cell) | 0.6 | 4 | 3.4 | 6.5 | "P" < 25% | |
| | TPRXL | tetra-peptide repeat homeobox-like | 0.3 | 3.2 | 2.9 | 10.2 | 3.8 | 0.0761 |
| | SFRP2 | secreted frizzled-related protein 2 | 0.3 | 2.7 | 2.4 | 9.8 | 9.7 | 0.0071 |
| | FAM177B | family with sequence similarity 177, member B | 0.2 | 2.5 | 2.3 | 11 | Not in U133 | |
| Ubiquitous | PANK1 | pantothenate kinase 1 | 1.4 | 0.7 | -0.7 | -2.1 | -1.9 | 0.0391 |
| SAE-enriched | LYPD2 | LY6/PLAUR domain containing 2 | 23.9 | 1.7 | -22.3 | -14.5 | Not in U133 | |
| | LYNX1 | Ly6/neurotoxin 1 | 8.6 | 2 | -6.6 | -4.3 | -1.8 | 0.3058 |
| | AZU1 | azurocidin 1 | 6.1 | 1.9 | -4.3 | -3.3 | -2.1 | 0.2405 |
| | ITM2A | integral membrane protein 2A | 5.2 | 1.3 | -4 | -4.2 | -2.6 | 0.0071 |
| | SAA4 | serum amyloid A4, constitutive | 4.8 | 1.4 | -3.4 | -3.4 | -3.4 | 0.0147 |
| | GAL3ST2 | galactose-3-O-sulfotransferase 2 | 4.7 | 1.6 | -3.1 | -2.9 | "P" < 25% | |
| | NEU4 | sialidase 4 | 3.7 | 0.7 | -3 | -5.4 | -1.2 | 0.6483 |
| | PAX1 | paired box 1 | 3.6 | 1 | -2.6 | -3.5 | -1.9 | 0.3312 |
| | ERP27 | endoplasmic reticulum protein 27 | 2.4 | 0.5 | -2 | -5.2 | -4.4 | 0.0016 |
1 The top 20 genes were identified in four categories of genes most affected by smoking: largest absolute increase, largest absolute decrease, novel genes up-regulated by smoking, and low level expressed genes suppressed by smoking. They were sorted in descending order of absolute difference in expression within each category and identified as being from the ubiquitous or small airway epithelium-enriched groups.
2 Absolute difference = smoker median - nonsmoker median
3 Fold change = mean in smokers/mean in nonsmokers
4 Benjamini-Hochberg corrected p value. Where the gene is not represented on the microarray or the Affymetrix Present call "P" is less than 25% of subjects, it is so indicated.
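The quantities defined in these footnotes can be sketched in a few lines. The absolute difference and the fold-change ratio follow the footnote definitions. Reporting down-regulated genes as a negative reciprocal (for example -1.7 rather than 0.58) is an assumption inferred from the negative fold-change values in the table. The Benjamini-Hochberg function is the standard step-up adjustment, not the authors' own implementation, and all function names are hypothetical.

```python
def absolute_difference(smoker: float, nonsmoker: float) -> float:
    """Smoker value minus nonsmoker value (footnote 2)."""
    return smoker - nonsmoker


def signed_fold_change(smoker: float, nonsmoker: float) -> float:
    """Smoker/nonsmoker ratio; ratios below 1 are reported as -1/ratio (assumed convention)."""
    if smoker <= 0 or nonsmoker <= 0:
        raise ValueError("expression values must be positive")
    ratio = smoker / nonsmoker
    return ratio if ratio >= 1 else -1.0 / ratio


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Standard Benjamini-Hochberg adjusted p values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Applied to the rounded NQO1 values in the table (198.7 in smokers, 38.3 in nonsmokers), `signed_fold_change` gives about 5.2. Small discrepancies with other tabulated values are expected, because the published figures were presumably computed from unrounded data.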
Figure 9. RNA-Seq quantification of examples of smoking-dependent changes in expression of ion channel-related genes expressed in the small airway epithelium. Smoking responsiveness of selected ion channels. A. CFTR - unchanged; B-D. Up-regulated. B. ABCC3; C. CACNG4; and D. CNGB1. E-F. Down-regulated. E. SLC13A2; and F. KCNC4. In all panels, each data point represents one individual.
Figure 10. Quantile-quantile plot of significance of difference in splice junction usage between smokers and nonsmokers. Normalized reads supporting splicing in the smoker and nonsmoker samples were compared. The data shows that smoking caused no significant difference in the splicing for either the ubiquitously expressed genes (blue) or SAE-enriched genes (green).