Ana B Villaseñor-Altamirano, Marco Moretto, Mariel Maldonado, Alejandra Zayas-Del Moral, Adrián Munguía-Reyes, Yair Romero, Jair S García-Sotelo, Luis A Aguilar, Oscar Aldana-Assad, Kristof Engelen, Moisés Selman, Julio Collado-Vides, Yalbi I Balderas-Martínez, Alejandra Medina-Rivera.
Abstract
Chronic Obstructive Pulmonary Disease (COPD) and Idiopathic Pulmonary Fibrosis (IPF) have contrasting clinical and pathological characteristics and interesting whole-genome transcriptomic profiles. However, data from public repositories are difficult to reprocess and reanalyze. Here, we present PulmonDB, a web-based database (http://pulmondb.liigh.unam.mx/) and R library that facilitates exploration of gene expression profiles for these diseases by integrating transcriptomic data and curated annotation from different sources. We demonstrated the value of this resource by presenting the expression of already well-known genes of COPD and IPF across multiple experiments and the results of two differential expression analyses in which we successfully identified differences and similarities. With this first version of PulmonDB, we create a new hypothesis and compare the two diseases from a transcriptomics perspective.
Year: 2020 PMID: 31949184 PMCID: PMC6965635 DOI: 10.1038/s41598-019-56339-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Flow chart of PulmonDB. PulmonDB was created using COMMAND by downloading, parsing and storing COPD and IPF public transcriptomic data into a MySQL database. Then, we remapped microarray probes to establish a uniform gene annotation, and we also created a controlled vocabulary for clinical and biological annotations for each sample. We created contrasts based on the original hypothesis, selecting a sample as the reference. Finally, the data were homogenized and subjected to a quality check.
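The contrast step in this flow chart can be illustrated with a minimal sketch. It assumes a contrast is the per-gene log2 ratio of a test sample to the chosen reference sample; the caption does not state the exact formula. The function name `make_contrasts`, the sample identifiers, and the intensity values are hypothetical and are not part of PulmonDB or COMMAND.

```python
import math


def make_contrasts(expression, reference_id):
    """Return per-gene log2(test / reference) for every non-reference sample.

    expression: dict mapping sample id -> dict mapping gene -> positive intensity.
    reference_id: id of the sample chosen as the reference (illustrative name).
    """
    reference = expression[reference_id]
    contrasts = {}
    for sample_id, profile in expression.items():
        if sample_id == reference_id:
            continue
        # Only genes measured in both samples can be contrasted.
        shared_genes = profile.keys() & reference.keys()
        contrasts[f"{sample_id}_vs_{reference_id}"] = {
            gene: math.log2(profile[gene] / reference[gene])
            for gene in sorted(shared_genes)
        }
    return contrasts


# Toy intensities, invented for illustration only.
expression = {
    "control_1": {"MMP7": 100.0, "SPP1": 50.0},
    "ipf_1": {"MMP7": 400.0, "SPP1": 200.0},
    "ipf_2": {"MMP7": 200.0, "SPP1": 25.0},
}
contrasts = make_contrasts(expression, "control_1")
```

Under this assumption, a positive value means the gene is higher in the test sample than in its reference, and a negative value means it is lower.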
Figure 2. Summary of PulmonDB. (A) The number of contrast samples in PulmonDB per biological sample type. (B) The number of sample states found in PulmonDB. The color key below the bar chart shows the sectors for COPD patients, healthy/controls, IPF patients, match_tissue_controls (non-cancerous sample from a cancer patient), and other diseases (such as asthma). (C) The number of contrast samples measured using each platform (grouped into Affymetrix, Agilent, Illumina, and other platforms with fewer samples).
Figure 3. IPF and COPD well-known disease-associated genes. In both heatmaps, rows are genes, and columns are sample contrasts. Both were hierarchically clustered. The first annotation row represents the GSE IDs. The second annotation row is the sample type (LUNG_BIOPSY samples, in light brown). The third and fourth annotation rows are sample states: the third represents the test state, and the fourth represents the reference state. (A) IPF genes reported as relevant in the literature (CCL18[15], CXCL12[16], CXCL13[17], COL1A1, COL1A2, COL3A1, COL5A2, COL14A1[18], DSP[19], FAS[20], IL-8[21], MMP1[22], MMP2[23], MMP7[22], MUC5B[19], SPP1[24], PTGS2[25], TGFB1[26] and THY1[27]). The IPF experiments selected were GSE32537 (pink), GSE21369 (purple), GSE24206 (blue), GSE94060 (grass-green), GSE72073 (lemon yellow), GSE35145 (green), and GSE31934 (yellow). Sample states are colored as follows: light blue, MATCH_TISSUE_CONTROL; dark blue, HEALTHY/CONTROL; turquoise, IPF samples; and grey, NON_IPF_ILD. (B) COPD genes reported as relevant in the literature (HHIP[28,29], CFTR[30,31], PPARG[32], SERPINA1[33,34], JUN[35], FAM13A[36], MYH10[35], CHRNA5[37], JUND[35], JUNB[35], TNF[34], MMP9[34], MMP12[34], CHRNA3[37], TGFBR3[32], and GATA2[32]). The COPD experiments selected were GSE27597, GSE37768, GSE57148, GSE8581, and GSE1122. Sample states are colored as follows: light blue, MATCH_TISSUE_CONTROL; dark blue, HEALTHY/CONTROL; red, COPD samples.
Figure 4. IPF and COPD differentially expressed and similarly expressed genes. (A) Flow chart of steps used for COPD and IPF differential expression analysis to evaluate transcriptomic differences and similarities. (B) Experiments selected for the analysis, following the criteria of being lung biopsy samples and contrasted with HEALTHY/CONTROL references. The colors represent the sample state: COPD, red; HEALTHY/CONTROL, blue; IPF, turquoise. At the top, the bar graph is the total sum of contrasts, rows are the GSE experiments, and each dot is the number of contrasts per experiment from COPD, HEALTHY/CONTROL, or IPF subjects. On the right, violin plots show the distributions for all sample contrasts per experiment. (C) Differentially expressed genes between COPD and IPF. (D) Similar genes between COPD and IPF. In both (C) and (D), columns are sample contrasts, rows are genes, the first covariate is colored by each corresponding experiment, the second covariate is the sample type (in this case, lung tissue is shown in light brown), the third row is the test status, and the fourth is the reference status. Columns are ordered by test status and genes by hierarchical clustering. The right heatmap is the correlation among sample contrasts, and the covariates are the same.
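The correlation heatmap among sample contrasts in (C) and (D) can be sketched as a pairwise correlation matrix. This example assumes Pearson correlation, which the caption does not specify. The helper names, contrast labels, and values are hypothetical and do not come from PulmonDB.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


def correlation_matrix(contrasts):
    """Correlate every pair of contrasts; values are listed in the same gene order."""
    names = list(contrasts)
    return {a: {b: pearson(contrasts[a], contrasts[b]) for b in names} for a in names}


# Toy log-ratio vectors over the same three genes, invented for illustration.
toy_contrasts = {
    "ipf_a": [2.0, 1.0, -1.0],
    "ipf_b": [4.0, 2.0, -2.0],
    "copd_a": [-2.0, -1.0, 1.0],
}
matrix = correlation_matrix(toy_contrasts)
```

In this toy input, the two IPF contrasts change in the same direction across genes and correlate positively, while the COPD contrast changes in the opposite direction and correlates negatively with both.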