
Blood co-expression modules identify potential modifier genes of diabetes and lung function in cystic fibrosis.

Fanny Pineau¹, Davide Caimmi², Milena Magalhães¹, Enora Fremy¹, Abdillah Mohamed¹, Laurent Mely³, Sylvie Leroy⁴, Marlène Murris⁵, Mireille Claustres¹,⁶, Raphael Chiron², Albertina De Sario¹.

Abstract

Cystic fibrosis (CF) is a rare genetic disease that affects the respiratory and digestive systems. Lung disease is variable among CF patients and associated with the development of comorbidities and chronic infections. The rate of lung function deterioration depends not only on the type of mutations in CFTR, the disease-causing gene, but also on modifier genes. In the present study, we aimed to identify genes and pathways that (i) contribute to the pathogenesis of cystic fibrosis and (ii) modulate the associated comorbidities. We profiled blood samples from CF patients and healthy controls and analyzed the RNA-seq data with Weighted Gene Correlation Network Analysis (WGCNA). Interestingly, lung function, body mass index, the presence of diabetes, and chronic P. aeruginosa infection correlated with four modules of co-expressed genes. Detailed inspection of networks and hub genes pointed to cell adhesion, leukocyte trafficking and production of reactive oxygen species as central mechanisms in lung function decline and cystic fibrosis-related diabetes. Of note, we showed that blood is an informative surrogate tissue to study the contribution of inflammation to lung disease and diabetes in CF patients. Finally, we provided evidence that WGCNA is useful for analyzing -omic datasets in rare genetic diseases, in which patient cohorts are inevitably small.


Year: 2020; PMID: 32302349; PMCID: PMC7164665; DOI: 10.1371/journal.pone.0231285

Source DB: PubMed; Journal: PLoS One; ISSN: 1932-6203; Impact factor: 3.240

Introduction

Cystic fibrosis (CF; OMIM 219700) is an autosomal recessive inherited disease that affects approximately 1 in 3000 newborns [1]. It results from impairment of the Cystic Fibrosis Transmembrane Conductance Regulator (CFTR) protein, a chloride channel expressed at the apical membrane of various epithelial cells. The defective protein results in thick, sticky and obstructive mucus in multiple organs of the respiratory, digestive and reproductive systems [1,2]. The mutant CFTR protein is also responsible for altered innate and adaptive immune function. Lung disease is the main cause of morbidity and mortality in cystic fibrosis [2]. CF patients present chronic infections and abnormal inflammation of the lungs that lead to progressive airway destruction. The rate of lung function deterioration varies among patients and is associated with the development of comorbidities and chronic infections [1,2]. CF-related diabetes (CFRD) is a common comorbidity of CF [3]. It affects about 20% of adolescents and 40% to 50% of adults, and is associated with more frequent pulmonary exacerbations, accelerated pulmonary function decline and higher mortality. CFRD is characterized by a reduced and delayed insulin response. Beta-cell dysfunction is evident before the onset of diabetes and is already associated with pulmonary function decline [3]. The exact causes of CFRD have not been fully elucidated, nor has the association between diabetes and accelerated lung function loss been explained. Newborn cftr-/- mutant zebrafish have fewer beta cells than wild-type fish, which suggests that the CFTR protein is important for pancreas development [4]. In addition, CFTR seems to be critical for insulin exocytosis, which implies that CF patients have an intrinsic pancreatic islet dysfunction [5]. Finally, the continuous infiltration of immune cells into the pancreas may contribute to the progressive destruction of the islets [6].
The defective CFTR protein and the resulting insufficient mucociliary clearance predispose CF patients to acute and, ultimately, chronic lung infections with opportunistic pathogens [7]. Chronic P. aeruginosa infection is found in approximately 40% of adult CF patients and is also associated with a marked decline in lung function [7]. Previous genetic studies showed that the clinical variability of CF patients depends not only on the type of mutations in the CFTR gene, but also on modifier genes, that is, other genes that modulate the patient phenotype [1]. Much current research focuses on finding CF modifier genes to develop new therapies [1]. In the present study, we analyzed the transcriptome to identify genes and pathways that (i) contribute to the pathogenesis of cystic fibrosis and (ii) modulate the associated comorbidities. Knockout mouse models have limitations because they neither develop spontaneous diabetes nor present lung disease [8]; on the other hand, human pancreas and airway tissues are not easily accessible for studies. Herein, we used whole blood samples from CF patients as a surrogate tissue. Of interest, we found that lung function, body mass index (BMI), the presence of diabetes, and chronic P. aeruginosa infection correlated with modules of co-expressed genes.

Materials and methods

Study population

The study population included subjects aged ≥ 18 years from the MethylCF cohort: 33 cystic fibrosis patients and 16 healthy controls [9,10]. A replication set of subjects from the same cohort (20 CF patients and 8 healthy controls) was used for real-time PCR validation. The two sets were similar with respect to age and male-to-female ratio. Demographic and clinical features are reported in Table 1. CF patients were homozygous for the p.Phe508del mutation. The presence of diabetes was determined on the basis of an abnormal oral glucose-tolerance test. CF patients were classified as chronically infected by P. aeruginosa, methicillin-resistant S. aureus and/or A. fumigatus whenever they had three consecutive positive sputum cultures after antibiotic treatment. The study was approved by the “Comité de Protection des Personnes Sud Méditerranée III” Institutional Review Board (2013.02.01bis) and is registered at ClinicalTrials.gov under reference #NCT02884. Informed written consent was obtained from all participants.

BMI, body mass index; CF, cystic fibrosis; FEV1, forced expiratory volume in 1 second; FVC, forced vital capacity; HbA1c, glycated hemoglobin fraction; PI, pancreatic insufficiency; MRSA, methicillin-resistant Staphylococcus aureus.
† Median values (interquartile range).
* The presence of diabetes was determined on the basis of an abnormal oral glucose-tolerance test.
§ CF patients were classified as chronically infected by P. aeruginosa, MRSA and/or A. fumigatus whenever they had three consecutive positive sputum cultures after antibiotic treatment.
na, not applicable because only five measurements were available.

RNA sequencing and differential expression analysis

RNA was extracted from whole blood samples using the PAXgene Blood RNA kit (#762124, PreAnalytiX), according to the manufacturer’s recommendations [9]. Total RNA sequencing libraries were prepared with the TruSeq Stranded Total RNA kit (Illumina®), and ribosomal RNA was depleted using the Ribo-Zero Gold rRNA removal kit following the manufacturer's instructions. Libraries were sequenced in paired-end 75-nucleotide mode on an Illumina HiSeq 4000. The quality of raw sequenced reads was assessed using the FastQC quality control tool, and reads were mapped to the human reference genome build hg19/GRCh37 with TopHat 2 [11]. We used HTSeq to count the reads assigned to each gene annotated in the Gencode v26lift37 database (restricted to protein-coding genes, antisense transcripts and lincRNAs) [12]. Differential expression of the annotated genes was calculated using DESeq [13]. Transcripts with at least a 2-fold change and a Benjamini-Hochberg adjusted p-value (FDR) < 0.05 were considered differentially expressed. Normalized data and raw data generated during the current study are available in Gene Expression Omnibus (GEO) under accession number GSE136371 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE136371).
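The selection rule for differentially expressed transcripts (at least 2-fold change, i.e. |log2 fold change| ≥ 1, and Benjamini-Hochberg FDR < 0.05) can be illustrated with a minimal Python sketch. It assumes that raw p-values and log2 fold changes have already been produced by the DE tool (DESeq in this study, which reports adjusted p-values itself); the helper names `benjamini_hochberg` and `is_differentially_expressed`, and the gene names and values, are hypothetical and only show the step-up adjustment and the filter.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    # Indices sorted by ascending raw p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # caps q-values at 1 and enforces monotonicity
    # Walk from the largest p-value (rank m) down to the smallest (rank 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_differentially_expressed(log2_fc, fdr, min_abs_log2_fc=1.0, max_fdr=0.05):
    """2-fold change (|log2FC| >= 1) and FDR < 0.05, as stated in the Methods."""
    return abs(log2_fc) >= min_abs_log2_fc and fdr < max_fdr


# Hypothetical genes: name -> (log2 fold change CF/control, raw p-value).
genes = {
    "GENE_A": (2.3, 0.0001),
    "GENE_B": (-1.4, 0.004),
    "GENE_C": (0.6, 0.0002),   # significant but below the fold-change cut-off
    "GENE_D": (1.8, 0.2),      # large fold change but not significant
}
names = list(genes)
qvals = benjamini_hochberg([genes[g][1] for g in names])
de = [g for g, q in zip(names, qvals) if is_differentially_expressed(genes[g][0], q)]
print(de)  # ['GENE_A', 'GENE_B']
```

Requiring both criteria discards genes that are statistically significant but change little (GENE_C) as well as genes with large but unreliable fold changes (GENE_D).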

Real-time PCR validation

Total RNA (500 ng) from whole blood samples was reverse transcribed as previously described [9]. Primers were designed using the Primer3Plus and Beacon Designer Free Edition online tools, with a Gibbs energy threshold of ΔG ≥ -2.0 for hetero- and homodimers and a GC content > 40%. All primers showed an amplification efficiency of at least 93% (±SD 6%). Amplicons overlapped exon junctions except for CITF22-49E9.3. Real-time PCR reactions were performed in duplicate on two independent reverse transcriptions, using SYBR Green I Master mix (Roche Diagnostics) and a LightCycler 480 Instrument. The real-time PCR program consisted of a 10 min pre-incubation at 95°C followed by a three-step amplification (95°C for 10 s, 60°C for 20 s, 72°C for 10 s). Standard curves were generated by serial dilutions of a control cDNA. Expression levels were calculated as ratios relative to that of the reference gene (YWHAZ), using the Pfaffl method (with efficiency correction) [14]. Differences between groups were analyzed with the Wilcoxon test and were considered significant when p-value < 0.05.
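To show how the standard curves and the efficiency correction fit together, the following Python sketch derives an amplification factor from a standard-curve slope and computes an efficiency-corrected (Pfaffl) ratio against a reference gene. The function names and all Ct values are hypothetical; the sketch assumes the usual convention ΔCt = Ct(control) − Ct(sample) for both the target and the reference gene.

```python
def efficiency_from_slope(slope):
    """Amplification factor E from a standard-curve slope (Ct vs. log10 input).

    E = 10 ** (-1 / slope); E = 2.0 means 100% efficiency (perfect doubling).
    """
    return 10 ** (-1.0 / slope)


def percent_efficiency(e):
    """Convert an amplification factor E into a percent efficiency."""
    return (e - 1.0) * 100.0


def pfaffl_ratio(e_target, ct_target_control, ct_target_sample,
                 e_ref, ct_ref_control, ct_ref_sample):
    """Efficiency-corrected relative expression ratio (Pfaffl method).

    ratio = E_target ** dCt_target / E_ref ** dCt_ref,
    with dCt = Ct(control) - Ct(sample) computed separately for each gene.
    """
    d_ct_target = ct_target_control - ct_target_sample
    d_ct_ref = ct_ref_control - ct_ref_sample
    return (e_target ** d_ct_target) / (e_ref ** d_ct_ref)


# Hypothetical values for a target gene and the YWHAZ reference gene.
e_target = efficiency_from_slope(-3.32)  # close to 2.0, i.e. about 100% efficiency
e_ref = 1.93                             # 93% efficiency, the lower bound in the text
# Target amplifies 2 cycles earlier in the sample; reference Ct is unchanged.
ratio = pfaffl_ratio(e_target, 26.0, 24.0, e_ref, 20.0, 20.0)
```

With an unchanged reference Ct the ratio reduces to E_target raised to the target ΔCt (about 4 here); using the measured efficiency rather than assuming E = 2 is what distinguishes the Pfaffl approach from the plain 2^-ΔΔCt method.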

Weighted Gene Correlation Network Analysis

To identify modules of co-expressed genes, we performed Weighted Gene Correlation Network Analysis using the WGCNA R package [15]. We used the WGCNA functions to (i) construct a network of co-expressed and highly connected genes, (ii) identify modules of co-expressed genes, and (iii) correlate the gene modules with biological features (continuous or binary phenotypic traits) of the MethylCF cohort. Unless otherwise specified below, we used the default parameters as described in [15]. Briefly, we preselected 15077 genes with FPKM > 0.1 and transformed their expression values as log2(FPKM + 1). Then, we generated a signed adjacency matrix from pairwise biweight midcorrelations, raised to the power β = 18 (soft thresholding) to suppress weak correlations and reduce noise. The resulting network exhibited approximate scale-free topology (R² = 0.92). Scale-free topology is obtained when few genes are highly connected to other genes (hub genes), whereas the remaining genes are weakly connected. The adjacency matrix was transformed into a topological overlap matrix, which considers both direct co-expression and topological similarity, that is, the extent to which two genes share neighbors. Modules were identified using the dynamic tree cut, and modules whose eigengenes were highly correlated (correlation > 0.75) were merged. Correlations between the module eigengenes and clinical and demographic features of the MethylCF cohort were then calculated. Top genes in the modules were visualized with Cytoscape [16].
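To make the network-construction steps concrete, the following pure-Python sketch computes the biweight midcorrelation, the signed soft-threshold adjacency a_ij = ((1 + r_ij)/2)^β and the standard topological overlap for a toy expression matrix. It is an illustration under stated assumptions, not the WGCNA implementation: it omits the R package's handling of outliers and zero-variance genes, the scale-free fit used to choose β, and the dynamic tree cut. The function names and toy data are hypothetical.

```python
from statistics import median


def bicor_normalized(x):
    """Median-centred, biweight-weighted, unit-norm version of one profile.

    The biweight midcorrelation of x and y is the dot product of their
    normalized profiles.
    """
    med = median(x)
    mad = median(abs(v - med) for v in x)
    if mad == 0:
        # The R implementation handles this case differently; this sketch refuses.
        raise ValueError("median absolute deviation is zero")
    weighted = []
    for v in x:
        u = (v - med) / (9 * mad)
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0  # outliers get weight 0
        weighted.append((v - med) * w)
    norm = sum(s * s for s in weighted) ** 0.5
    return [s / norm for s in weighted]


def signed_adjacency(expr, beta=18):
    """Signed soft-threshold adjacency: a_ij = ((1 + bicor_ij) / 2) ** beta."""
    normed = [bicor_normalized(g) for g in expr]
    n = len(normed)
    adj = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = sum(a * b for a, b in zip(normed[i], normed[j]))
            r = max(-1.0, min(1.0, r))  # guard against rounding outside [-1, 1]
            adj[i][j] = adj[j][i] = ((1 + r) / 2) ** beta
    return adj


def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), with TOM_ii = 1.

    l_ij = sum over u != i, j of a_iu * a_uj (shared neighbours);
    k_i  = sum over u != i of a_iu (connectivity).
    """
    n = len(adj)
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom[i][j] = tom[j][i] = (shared + adj[i][j]) / (
                min(k[i], k[j]) + 1 - adj[i][j])
    return tom


# Three hypothetical genes in six samples (log2(FPKM + 1) values).
genes = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],  # same shape as gene 1
    [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],    # mirror image of gene 1
]
adj = signed_adjacency(genes, beta=18)
tom = topological_overlap(adj)
```

In a signed network, a perfectly anti-correlated pair (genes 1 and 3) receives an adjacency of 0 rather than 1, so genes with opposite expression patterns do not end up in the same module; the large exponent β pushes moderate correlations towards 0 as well.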

Gene ontology analysis

Gene ontology (GO) and KEGG pathway enrichment were analyzed with WebGestalt (WEB-based GEne SeT AnaLysis Toolkit; URL: http://www.webgestalt.org) using the Benjamini–Hochberg correction for multiple testing [17]. For GO, we retained categories at a false discovery rate of 5%, excluding categories with fewer than four genes. For KEGG pathway analyses, we used a false discovery rate of either 1% or 5%.
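WebGestalt's over-representation analysis is commonly described as a hypergeometric test. Assuming that, the per-category p-value and the enrichment ratio (observed overlap divided by the overlap expected by chance, reported as "R" for KEGG pathways in the Results) can be sketched as follows; the function names and counts are hypothetical, and the resulting p-values would then be adjusted with the Benjamini–Hochberg procedure.

```python
from math import comb


def hypergeom_upper_tail(overlap, n_query, n_category, n_universe):
    """P(X >= overlap) for X ~ Hypergeometric(universe, category size, query size).

    X counts how many of the n_query genes (e.g. a module) fall in a category
    of n_category genes when n_query genes are drawn at random from n_universe.
    """
    total = comb(n_universe, n_query)
    hits = sum(
        comb(n_category, i) * comb(n_universe - n_category, n_query - i)
        for i in range(overlap, min(n_query, n_category) + 1)
    )
    return hits / total


def enrichment_ratio(overlap, n_query, n_category, n_universe):
    """Observed overlap divided by the overlap expected by chance."""
    expected = n_query * n_category / n_universe
    return overlap / expected


# Hypothetical example: all 4 genes of a list fall in a 5-gene category,
# drawn from a 20-gene universe.
p = hypergeom_upper_tail(4, 4, 5, 20)  # 5 / 4845
er = enrichment_ratio(4, 4, 5, 20)     # 4.0
```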

Results

RNA sequencing and biotype distribution

Whole blood samples had been collected from CF patients with no ongoing pulmonary exacerbation [9]. Total blood cell counts and the percentages of the different types of leukocytes were within the normal range. The median percentages calculated on 19 CF patients were 60% (iqr 14.7%) neutrophils, 26% (iqr 12.0%) lymphocytes, 9% (iqr 3.0%) monocytes, 3% (iqr 2.6%) eosinophils and 1% (iqr 0.5%) basophils. We collected the blood samples in PAXgene tubes, which stabilize RNA, preserve all types of circulating cells (leukocytes and platelets) and do not alter the gene expression profile of frozen samples [18]. Using RNA-seq, we generated the transcriptome of 49 blood samples from the MethylCF cohort [9,10]. Eight samples were excluded because ribosomal depletion failed. Hence, the bioinformatic analyses were carried out on 41 samples (27 CF patients and 14 controls). The median total number of reads per sample was 112 million (iqr 17 million). We found that 9324 protein-coding genes, 501 antisense transcripts and 493 lincRNAs were expressed in blood samples (FPKM > 1). The biotype distribution and expression levels of the corresponding transcripts are shown in a supporting figure.
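Because the expression thresholds in this study are expressed in FPKM, a minimal sketch of the standard FPKM formula and of the two cut-offs mentioned in the text (FPKM > 1 for counting expressed transcripts, FPKM > 0.1 before WGCNA, followed by the log2(FPKM + 1) transformation) may help. The function names and numbers are hypothetical, and the sketch works on a single value: how the authors' pipeline aggregated FPKM across samples before thresholding is not stated.

```python
import math


def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    FPKM = fragments * 1e9 / (length_bp * total_mapped_fragments)
    (1e9 = 1e3 bp per kilobase * 1e6 fragments per million).
    """
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def log2_fpkm_plus_one(value):
    """Transformation applied before WGCNA: log2(FPKM + 1)."""
    return math.log2(value + 1.0)


# Hypothetical gene: 500 fragments on a 2,000 bp transcript,
# 100 million mapped fragments in the sample.
value = fpkm(500, 2000, 100_000_000)  # 2.5
expressed = value > 1.0                # threshold used for the biotype counts
kept_for_wgcna = value > 0.1           # pre-filter used before WGCNA
transformed = log2_fpkm_plus_one(value)
```

Adding 1 before taking the logarithm keeps genes with FPKM close to 0 finite after transformation.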

Differentially expressed genes between CF patients and controls

When we compared gene expression between CF patients and controls, we found 75 differentially expressed (DE) genes: 48 genes were over-expressed and 27 genes were under-expressed in CF patients (log2 fold change ≥ 1 or ≤ -1; FDR < 0.05). Thirteen non-coding RNAs were over-expressed and two non-coding RNAs were under-expressed in CF patients compared to controls.

Differentially expressed genes between CF patients and controls.

The volcano plot represents the log2-transformed fold changes (x-axis) and the log10-transformed q-values (y-axis). Forty-eight genes were over-expressed (red) and 27 genes were under-expressed (green) in CF blood samples (FDR < 0.05).

Gene ontology (GO) analysis showed that DE genes between CF patients and controls were overrepresented among genes important for the response to bacterial infection (FDR p-value = 1.2E-05, 19 genes including TLR5, S100A8, S100A12, IL23R) and leukocyte activation (FDR p-value = 3.4E-04, 11 genes including IL23R, IL4R, CDC80, TBX21). KEGG analysis highlighted the Th17 lymphocyte activation pathway (FDR p-value = 4.6E-03, 7 genes: MAPK14, IL23R, TBX21, HLA-DOA, IL2RB, IL4R, RORC).

Validation of RNA-seq data with real-time PCR

To validate the RNA-seq data, we assessed the expression levels of three DE genes (TLR5, CLEC4D, and ALPL) and one DE lincRNA (CITF22-49E9.3) using real-time PCR in the same set of blood samples (n = 49) as a technical validation, and in a replication set (n = 28) as a biological validation. We selected protein-coding and non-coding transcripts among the top DE genes with a relevant biological function and a range of expression levels (from 5 to 56 FPKM). TLR5, CLEC4D, ALPL and CITF22-49E9.3 were differentially expressed between CF patients and controls of the discovery set, and were thus technically validated (p-value < 0.01). TLR5, CLEC4D, and ALPL were biologically validated, since their expression levels differed between CF patients and controls of the replication set (p-value < 0.05). CITF22-49E9.3 was not biologically validated (p-value = 0.18), but the direction of change was the same as in RNA-seq (over-expression in CF). For all loci, the direction of differential expression was identical and fold changes were similar between RNA-seq and real-time PCR data, showing full concordance between the two techniques.

FPKM: Fragments Per Kilobase of transcript per Million mapped reads; Med: median; CF: Cystic Fibrosis; C: Control; FC: fold-change (CF/C). † CF vs C, Wilcoxon test.

Next, to find additional genes important for CF pathogenesis and the associated comorbidities, we performed Weighted Gene Correlation Network Analysis (WGCNA) [15]. Using RNA-seq datasets from 27 CF patients of the MethylCF cohort, we found 28 modules of co-expressed genes. The number of genes per module ranged from 35 to 2839. A majority of modules were enriched with genes belonging to biological pathways (Notch signaling, MAPK signaling, platelet activation, B cell receptor signaling, etc.), which suggests that the gene modules are biologically meaningful.

* Number of genes in the module. § Top KEGG pathway. N, number of genes in the pathway; R, ratio of enrichment.
FDR, false discovery rate; n.s., not significant, p-value > 0.05 after Benjamini-Hochberg correction.

Next, we calculated the correlation between the clinical traits of the patients and the module eigengenes. The eigengene of a module is the first principal component of the expression profiles of its genes and summarizes the gene expression profile of the module [15]. Modules of co-expressed genes that correlated with lung function, the presence of diabetes, and chronic P. aeruginosa infection were analyzed in more detail.
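The eigengene definition, the module membership (kME, the correlation between a gene and its module eigengene, listed in the supporting gene-module table) and the module-trait correlation can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the WGCNA code: genes are standardized, the first principal component is found by power iteration, and its sign is aligned with the average standardized expression of the module. Function names and data are hypothetical; constant genes (zero variance) are not handled.

```python
from statistics import mean, pstdev


def standardize(row):
    """Centre a gene's profile and scale it to unit (population) variance."""
    m, s = mean(row), pstdev(row)
    return [(v - m) / s for v in row]


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def module_eigengene(module_expr, iterations=500):
    """First principal component (sample scores) of a genes x samples matrix."""
    z = [standardize(g) for g in module_expr]
    n = len(z[0])
    # Sample-by-sample matrix Z^T Z; its leading eigenvector is the eigengene.
    c = [[sum(g[i] * g[j] for g in z) for j in range(n)] for i in range(n)]
    average = [mean(g[i] for g in z) for i in range(n)]
    # Start from the average profile (the all-ones vector lies in the null space).
    v = list(average) if any(abs(a) > 1e-12 for a in average) else list(z[0])
    for _ in range(iterations):  # power iteration
        w = [sum(c[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    if sum(a * b for a, b in zip(v, average)) < 0:  # align with average expression
        v = [-x for x in v]
    return v


# Hypothetical module of three genes in five patients, and a hypothetical trait.
module = [
    [5.1, 5.9, 7.2, 8.0, 9.1],
    [2.0, 2.4, 3.1, 3.3, 4.2],
    [7.0, 7.8, 8.1, 9.5, 9.9],
]
trait = [40.0, 55.0, 61.0, 70.0, 88.0]  # e.g. FEV1 (% predicted)
me = module_eigengene(module)
kme = [pearson(g, me) for g in module]  # module membership of each gene
r_trait = pearson(me, trait)            # module-trait correlation
```

Correlating one eigengene per module with each trait, instead of thousands of individual genes, is what reduces the number of tests discussed later in the article.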

WGCNA on the blood RNA-seq dataset.

The heatmap represents the correlation (coefficient and p-value) between module eigengenes and clinical traits. Module sizes are shown in the colored boxes.

Lung function metrics included forced expiratory volume in 1 second (FEV1) and forced vital capacity (FVC), expressed in liters and as percent predicted based on age, sex and height [19]. These clinical measures are used for the follow-up of CF patients and as endpoints to assess whether patients respond to treatments [20]. In the MethylCF cohort, lung function and BMI correlated best with the dark turquoise module. For simplicity, only FEV1 (%) is represented; however, consistent results were obtained with FVC (%) (r = 0.54, p-value = 4.0E-03). The GO analysis of the dark turquoise module highlighted terms related to cell-cell adhesion and cell adherens junctions. FLNA is a hub gene of this module. It encodes Filamin A, an actin-binding and scaffolding protein that interacts with integrins to regulate leukocyte trafficking [21].

Network representation of dark turquoise and light yellow modules.

Top genes and their connections were visualized with Cytoscape [16]. Genes of interest are emphasized. For each module, the correlated clinical traits and main gene ontology (GO) terms are shown. * Top 10 enriched terms. n.s., not significant after Benjamini-Hochberg correction.

Diabetes and glycated hemoglobin (HbA1c) levels correlated with the light yellow module. HbA1c reflects mean glucose levels over a three-month period. GO analysis of this module revealed terms related to vesicle transport and platelet activation. Among the hub genes of this network were integrin genes (ITGB3, ITGA2B, ITGB5). The presence of diabetes (but not HbA1c levels) also correlated with the cyan module. GO analysis of the genes belonging to this module revealed enrichment for terms related to the proteasome. Through hierarchical clustering, we identified three groups of CF patients with distinct gene expression signatures and prevalence of diabetes. In the group enriched with diabetic patients (6 out of 9 patients), seven proteasome genes (PSME4, PSMA7, PSMB4, PSMB6, PSMC4, PSMD7 and PSMD8) and three genes previously associated with common diabetes (SNX17, PARK7/DJ-1 and ATP5B) were highly expressed. By contrast, two genes encoding histone methyltransferases (SMYD3 and KMT2A) and one gene encoding a chromatin-remodeling ATPase (EP400) were under-expressed in this group of patients.
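The article does not state which distance or linkage was used to cluster patients on the cyan genes. Purely as an illustrative assumption, the sketch below performs naive agglomerative clustering with average linkage on a 1 - Pearson correlation distance between patient profiles and stops when a fixed number of groups remains (three in the study, two in this toy example). The patient names, values and the function `average_linkage` are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_linkage(profiles, n_clusters):
    """Naive agglomerative clustering with average linkage.

    profiles maps a patient to its expression vector over the module genes;
    the distance between two patients is 1 - Pearson correlation.
    """
    names = list(profiles)
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average distance over all cross-cluster patient pairs.
                d = mean(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]                          # j > i, so index i is unaffected
    return clusters


# Hypothetical patients described by four module genes.
profiles = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [2.0, 3.0, 4.0, 5.0],
    "P3": [4.0, 3.0, 2.0, 1.0],
    "P4": [5.0, 4.0, 3.0, 1.5],
}
groups = average_linkage(profiles, n_clusters=2)
```

A correlation distance groups patients by the shape of their profile across genes rather than by overall expression level; whether this matches the authors' choice is not known.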

Network representation of the cyan module.

Top genes and their connections were visualized with Cytoscape [16] (left side). Genes of interest are emphasized. The correlated clinical trait and main gene ontology (GO) term are shown. Hierarchical clustering and heatmap of the cyan genes: purple square, CF patients with diabetes; white square, CF patients without diabetes. C1, C2, C3: clusters 1, 2 and 3, respectively (right side). Top 10 enriched terms. n.s., not significant after Benjamini-Hochberg correction.

Finally, the presence of chronic P. aeruginosa infection correlated negatively with the eigengene of the magenta module. GO analysis revealed terms related to heme metabolic process and hemoglobin complex. The magenta network comprised three hub genes encoding the following proteins: SLC25A39 (Solute Carrier Family 25 member 39), a mitochondrial solute carrier protein; GATA1 (GATA Binding Protein 1), an erythroid transcription factor; and CDC34 (Cell Division Cycle 34), a ubiquitin-conjugating enzyme.

Discussion

CF patients present a clinical heterogeneity that is not fully explained by the type of mutations in the CFTR gene. Previous studies emphasized that other genes modulate the clinical phenotype and account for the development of comorbidities [1]. CF modifier genes were first sought extensively in genetic studies and, more recently, in transcriptomic studies [1]. Transcriptomic analyses in airway cell lines and nasal epithelial cell samples showed expression changes in genes involved in cell proliferation, inflammation and immune responses, protein metabolism, and calcium and membrane pathways [22-24]. More recently, blood samples from CF patients presenting either severe or mild lung disease were analyzed: genes of the type I interferon response and ribosomal stalk proteins were differentially expressed [25]. However, that study did not compare CF patients with healthy subjects. Herein, we used RNA-seq to profile blood samples from CF patients and healthy controls. An added value of this cohort is that the clinical data were recorded on the same day the biological samples were collected. Importantly, in addition to the differential gene expression analysis, we implemented WGCNA. DE genes between CF patients and controls were overrepresented among genes important for leukocyte activation and the response to bacterial infection, which is consistent with the permanent inflammation and chronic infections in CF patients. KEGG analysis highlighted the Th17 lymphocyte activation pathway. A limitation of the DE gene analysis is that relevant genes may not reach significance after correction for multiple testing unless large cohorts are assembled, a condition that is difficult to fulfill in rare disease studies. To overcome this drawback, we applied WGCNA [15]. WGCNA detects networks of co-regulated and highly connected genes that belong to biological pathways and reduces the number of variables to be tested (module eigengenes instead of individual genes), thus lowering the multiple-testing burden [15].
Using WGCNA, we found that clinical traits of interest in cystic fibrosis correlated with modules of co-expressed genes in blood samples. Because lung disease is the main cause of mortality and morbidity in CF, we first inspected the dark turquoise module, which correlated with lung function (FEV1 and FVC) and BMI. Lung function and BMI are positively correlated in CF, and their deterioration is predictive of patient decline and, ultimately, death [26]. The GO analysis of the dark turquoise module showed terms related to cell-cell adhesion and cell adherens junctions. Interestingly, the same association between lung function and cell adhesion was highlighted by the DNA methylation analysis of CF nasal epithelial cell samples [10]. In the present transcriptomic study, FLNA is a hub gene of the dark turquoise module. Filamin A is an actin-binding and scaffolding protein that binds to integrins and also interacts with CFTR [27]. Of interest, mutations in the FLNA gene result in interstitial lung disease, a severe respiratory illness [28]. Filamin A is required for optimal T cell homing into lymph nodes and inflamed tissues [29]. To explain the association between lung function and cell adhesion in cystic fibrosis, we propose that loosened cell junctions facilitate leukocyte trafficking from the bloodstream to the airways, and that the resulting inflammation reduces lung function. The decline of lung function is steeper in CF patients with diabetes, and the accelerated decline starts 1–3 years before the appearance of diabetes [3]. Peaks of hyperglycemia also occur before the appearance of diabetes [3]. To explain the association between diabetes and a more rapid lung function decline, the modules that correlated with lung function (dark turquoise) and with diabetes and HbA1c levels (light yellow) should be investigated together.
Of interest, some of their respective hub genes encode proteins (FLNA and integrins) that bind to each other to regulate T lymphocyte and neutrophil trafficking [26,30]. Also, high glucose modifies the levels of the Flna protein in rat endothelial cells [31]. Altogether, these findings suggest that glucose fluctuations may be the initial event that alters the expression of genes responsible for leukocyte trafficking, increases airway inflammation and, thereby, reduces lung function in cystic fibrosis. CF patients with diabetes may present microvascular complications, namely retinopathy and nephropathy [3]. Platelets are activated by glucose, and their abnormalities are the initial event responsible for microvascular complications in diabetes [32]. Genes encoding platelet aggregation proteins were overrepresented in the light yellow module, which should therefore be analyzed in detail with respect to these comorbidities. The presence of diabetes, but not HbA1c levels, correlated with the cyan module. This module comprised genes that encode proteins of the 20S and 19S proteasome subunits. A pivotal function of the proteasome is to degrade the oxidized and misfolded proteins generated by oxidative stress [33]. Through visualization of the most connected genes of this module, we identified SNX17, previously associated with glucose homeostasis in muscle and adipose tissues of type 2 diabetic patients [34]. The SNX17 protein activates T cells by regulating T cell receptor and integrin recycling in humans [35]. SNX17 is a hub connected to 12 co-expressed genes, including PARK7/DJ-1, encoding a protein deglycase, and ATP5B, encoding the mitochondrial ATP synthase B subunit. In diabetic mice, ATP5B protein expression is high and activated by reactive oxygen species (ROS) [36]. Overall, genes of the cyan module encode proteins that are activated by, and protect from, high levels of ROS. Adult CF patients are susceptible to chronic opportunistic airway infections.
In the blood transcriptome dataset, chronic P. aeruginosa infection correlated negatively with the magenta module. This correlation should be interpreted with caution because only two patients were not chronically infected by P. aeruginosa. Genes of the magenta module encode proteins important for erythrocyte differentiation and homeostasis, and for the hemoglobin complex. SLC25A39, a hub in this module, codes for a mitochondrial solute carrier protein. Silencing of the mouse Slc25a39 ortholog affected iron incorporation, and iron is essential for bacterial growth [37,38]. Thus, the magenta module points to a role of iron fixation in P. aeruginosa infections. The present study has some limitations. We analyzed gene transcription in blood samples from 33 CF patients and 16 healthy controls (27 and 14, respectively, after exclusion of samples with failed ribosomal depletion); confirmatory studies should be carried out in independent cohorts. We showed that blood is an informative surrogate tissue to address the contribution of inflammation to CFRD. However, evidence exists that this comorbidity also depends on an intrinsic pancreatic islet dysfunction whose study requires access to pancreas samples. Finally, future studies should follow patients longitudinally to correlate the gene signatures with disease progression.

Conclusions

In summary, using blood samples from CF patients, we identified modules of co-expressed genes that belong to relevant biological pathways. Detailed inspection of three modules that correlated with the presence of diabetes and lung function pointed to cell adhesion, leukocyte trafficking and production of ROS as central mechanisms in CFRD and pulmonary function decline. A fourth module, which correlated with P. aeruginosa infection, comprised genes important for iron fixation. Of note, we showed that blood is an informative surrogate tissue to address the contribution of inflammation to lung disease and diabetes in CF patients. Finally, we provided evidence that WGCNA is valuable for analyzing -omic datasets in rare genetic diseases, in which patient cohorts are inevitably small.

Biotype distribution and expression level of the corresponding transcripts.

(TIFF)

Network representation of the magenta module.

Top genes and their connections were visualized with Cytoscape [16]. Genes of interest were emphasized. The correlated clinical trait and main gene ontology (GO) term are shown. (TIFF)

Primer sequences and conditions used for qPCR validation.

(DOCX)

Differentially expressed genes between CF patients and controls.

(DOCX)

GO terms for differentially expressed genes between CF patients and controls.

(DOCX)

Genes and modules.

Rows correspond to genes. A total of 15077 genes with FPKM > 0.1 in blood samples are listed. Columns list the gene name followed by module membership (kMEi) and corresponding p-values for each module of co-expressed genes. Lists of genes with high module membership can be sorted by selecting decreasing kMEi or increasing p-values. The kMEi is the correlation between the expression of a gene and the module eigengene; as a correlation coefficient, it can range from −1 to 1. A gene is highly connected to the other genes of a module when its kMEi approaches 1. (XLSX)

7 Feb 2020

PONE-D-20-00190

Blood co-expression modules identify potential modifier genes of diabetes and lung function in cystic fibrosis

PLOS ONE

Dear Dr De Sario,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please follow the suggestions of both reviewers to improve your manuscript.

We would appreciate receiving your revised manuscript by Mar 22 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future.
For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Barbara Bardoni
Academic Editor
PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements:

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.plosone.org/attachments/PLOSOne_formatting_sample_main_body.pdf and http://www.plosone.org/attachments/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We note that you have included the phrase “data not shown” in your manuscript. Unfortunately, this does not meet our data sharing requirements. PLOS does not permit references to inaccessible data. We require that authors provide all relevant data within the paper, Supporting Information files, or in an acceptable, public repository.
Please add a citation to support this phrase or upload the data that corresponds with these findings to a stable repository (such as Figshare or Dryad) and provide any URLs, DOIs, or accession numbers that may be used to access these data. Or, if the data are not a core part of the research being presented in your study, we ask that you remove the phrase that refers to these data.

3. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository.
For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: “Blood co-expression modules identify potential modifier genes of diabetes and lung function in cystic fibrosis”, by F. Pineau et al.

This study used transcriptome analysis of blood samples from individuals with CF in an attempt to find novel gene modifiers of CF disease. Using a Weighted Gene Correlation Network Analysis (WGCNA) approach, lung function, BMI, diabetes presence and chronic P. aeruginosa infection were found to be correlated with modules of co-expressed genes. The study is clear and well presented. I have only a few minor comments which, if addressed, would render the article suitable for publication.

Comments:

1. Methods p.5-6 (Table 1): It would be informative to have the standard error (+/- SEM) for values in this table, rather than just the mean or median values, in order to better assess the similarity between the two patient sets.

2. Methods p.7 line 131: primer efficiency was determined to be 93% - presumably this means at least 93%?

3. Methods p. 7-8 (WGCNA section): this section is not very informative for readers who have not used this method of data analysis, and would benefit from a slight expansion to clarify some of the terms.

4. Results p 9-10: When presenting results for GO term enrichment a few of the most enriched groups are mentioned here along with a few of the representative genes. It would be more informative if the p values and number of genes in each group were shown in this section to reduce the necessity for looking at the supplementary table.

5. Results p 10 (Validation by real time PCR): the validation is well described and the results seem to be sound but the number of DE genes/lincRNAs validated is rather low, given the relative ease and rapidity of this technique. Particularly, given the non-validation of the lincRNA in the replication, I wonder why the authors did not try validating one of the other transcripts.

6. Results (WGCNA section including tables 3-4 and Figs 2-3): this section necessarily contracts the data in order to focus on a few modules of co-expressed genes that correlate with some CF parameters. However, it would be useful to provide a further clarification about the identity of other undiscussed modules which can be seen in Fig. 2. Table S4 is uninformative in this respect, as it is not easy to find the relevant genes in each module, even using the table legend on page 26. Either explain better how the data can be treated to find these genes, or provide another sheet in table S4 with the columns filtered to identify the important genes of each module.

7. Discussion p 16-19. The discussion of the suitability of the blood transcriptome for studying the contribution of DE gene expression in CFRD should be extended.
In particular it would be interesting to know what proportions of which cell types were present in the whole blood from which the RNA samples were extracted. Some cytological data from similar blood samples to those used in the study would help in this respect, and would allow a fuller appreciation of the functional significance of the data.

Reviewer #2: In this manuscript, the authors performed RNA-seq analysis of blood samples obtained from patients affected with cystic fibrosis and healthy controls. They analyzed the data obtained by Weighted Gene Correlation Network Analysis and they found a correlation between lung function, body mass index, the presence of diabetes, and chronic P. aeruginosa infections with four modules of co-expressed genes. This study is original and overall well written; I found the discussion a little bit lengthy. Figure 3 is too dense. Figure legends are not included in the PDF file.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: Yes: Enzo Lalli

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements.
To use PACE, you must first register as a user. Registration is free. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

16 Mar 2020

Response to Reviewer #1:

This study used transcriptome analysis of blood samples from individuals with CF in an attempt to find novel gene modifiers of CF disease. Using a Weighted Gene Correlation Network Analysis (WGCNA) approach, lung function, BMI, diabetes presence and chronic P. aeruginosa infection were found to be correlated with modules of co-expressed genes. The study is clear and well presented. I have only a few minor comments which, if addressed, would render the article suitable for publication.

Comments:

1. Methods p.5-6 (Table 1): It would be informative to have the standard error (+/- SEM) for values in this table, rather than just the mean or median values, in order to better assess the similarity between the two patient sets.

In Table 1 of the revised manuscript, we added the interquartile ranges (IQR).

2. Methods p.7 line 131: primer efficiency was determined to be 93% - presumably this means at least 93%?

Yes, it does mean at least 93%. At line 123, we added “at least”.

3. Methods p. 7-8 (WGCNA section): this section is not very informative for readers who have not used this method of data analysis, and would benefit from a slight expansion to clarify some of the terms.

In lines 135-152, we have summarized the main steps of WGCNA and slightly expanded the paragraph.

4. Results p 9-10: When presenting results for GO term enrichment a few of the most enriched groups are mentioned here along with a few of the representative genes. It would be more informative if the p values and number of genes in each group were shown in this section to reduce the necessity for looking at the supplementary table.
In lines 193-197, we added the p-values and the number of genes for GO terms.

5. Results p 10 (Validation by real time PCR): the validation is well described and the results seem to be sound but the number of DE genes/lincRNAs validated is rather low, given the relative ease and rapidity of this technique. Particularly, given the non-validation of the lincRNA in the replication, I wonder why the authors did not try validating one of the other transcripts.

For real-time PCR validation, we selected 4 genes of interest having high, medium and low levels of expression. The lncRNA had rather low levels of expression in blood and was technically validated by real-time PCR in the discovery set of patients, the same set used for RNA-seq. In the confirmatory set of patients, although it did not reach significance, the lncRNA had the same direction of differential expression (overexpression in CF patients) and a fold-change similar to that observed by RNA-seq.

6. Results (WGCNA section including tables 3-4 and Figs 2-3): this section necessarily contracts the data in order to focus on a few modules of co-expressed genes that correlate with some CF parameters. However, it would be useful to provide a further clarification about the identity of other undiscussed modules which can be seen in Fig. 2. Table S4 is uninformative in this respect, as it is not easy to find the relevant genes in each module, even using the table legend on page 26. Either explain better how the data can be treated to find these genes, or provide another sheet in table S4 with the columns filtered to identify the important genes of each module.

In the revised manuscript, we have added the KEGG analysis of the 28 modules and provided evidence that gene modules are biologically meaningful. See lines 222-225 and Table 3 showing the KEGG analysis of 28 modules. In the revised legend of Table S4, we have specified how to identify the most connected genes of a module and we have better explained the kMEi (lines 554-560).
We wish to keep Table S4 as a whole, including modules that were not analyzed in detail and genes that are below our thresholds. This information can be useful for other studies or for people interested in other genes.

7. Discussion p 16-19. The discussion of the suitability of the blood transcriptome for studying the contribution of DE gene expression in CFRD should be extended. In particular it would be interesting to know what proportions of which cell types were present in the whole blood from which the RNA samples were extracted. Some cytological data from similar blood samples to those used in the study would help in this respect, and would allow a fuller appreciation of the functional significance of the data.

In the Results section, we have provided blood cell composition for a subset of CF patients. Blood cell composition was within normal ranges, as expected for CF patients without exacerbation. In addition, we have specified that whole blood samples had been collected in PAXgene tubes, which not only stabilize RNA but also preserve all types of circulating leukocytes. This is not the case for other blood storage methods, which alter the cell proportions in blood samples and also the transcriptomic profile. A bibliographic reference has been provided. See lines 162-169 and reference 18. We preferred to address this issue in the Results section to avoid lengthening the Discussion, in line with the comments of Reviewer #2.

Reviewer #2: In this manuscript, the authors performed RNA-seq analysis of blood samples obtained from patients affected with cystic fibrosis and healthy controls. They analyzed the data obtained by Weighted Gene Correlation Network Analysis and they found a correlation between lung function, body mass index, the presence of diabetes, and chronic P. aeruginosa infections with four modules of co-expressed genes. This study is original and overall well written; I found the discussion a little bit lengthy. Figure 3 is too dense.
We thank the reviewer for his comments. The discussion was slightly shortened and a few paragraphs were rephrased. Figure 3 was split into figure 3 and figure 4.

Figure legends are not included in the PDF file.

Figure legends are embedded in the manuscript according to PLOS ONE guidelines.

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/.

Figures 1 to 4 have been uploaded to the PACE tool.

Submitted filename: Response to Reviewers.docx

20 Mar 2020

Blood co-expression modules identify potential modifier genes of diabetes and lung function in cystic fibrosis

PONE-D-20-00190R1

Dear Dr. De Sario,

We are pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it complies with all outstanding technical requirements.

Within one week, you will receive an e-mail containing information on the amendments required prior to publication. When all required modifications have been addressed, you will receive a formal acceptance letter and your manuscript will proceed to our production department and be scheduled for publication.

Shortly after the formal acceptance letter is sent, an invoice for payment will follow. To ensure an efficient production and billing process, please log into Editorial Manager at https://www.editorialmanager.com/pone/, click the "Update My Information" link at the top of the page, and update your user information. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, you must inform our press team as soon as possible and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

With kind regards,
Barbara Bardoni
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

7 Apr 2020

PONE-D-20-00190R1

Blood co-expression modules identify potential modifier genes of diabetes and lung function in cystic fibrosis

Dear Dr. De Sario:

I am pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please notify them about your upcoming paper at this point, to enable them to help maximize its impact. If they will be preparing press materials for this manuscript, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

For any other questions or concerns, please email plosone@plos.org.

Thank you for submitting your work to PLOS ONE.

With kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Barbara Bardoni
Academic Editor
PLOS ONE
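The Table S4 legend above defines module membership (kMEi) as the correlation between a gene's expression and the module eigengene. As a rough illustration only (this is not the authors' code, and WGCNA itself is an R package), the Python sketch below assumes the usual WGCNA definition of the eigengene as the first principal component of the standardized expression of a module's genes across samples. It computes that component by power iteration and then correlates each gene with it. The function names and the four-gene toy module are hypothetical.

```python
import math


def standardize(values):
    """Z-score one gene across samples (population standard deviation).

    Assumes the gene is not constant across samples.
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def module_eigengene(expr, iterations=500):
    """First principal component (one score per sample) of a module's genes.

    expr: list of genes, each a list of expression values over the same samples.
    """
    z = [standardize(gene) for gene in expr]
    n = len(z[0])
    # Sample-by-sample cross-product matrix X X^T, where X is samples x genes.
    m = [[sum(g[i] * g[j] for g in z) for j in range(n)] for i in range(n)]
    # Power iteration. A constant start vector would be mapped to zero because
    # every standardized gene has mean 0, so start from a non-constant vector.
    u = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(m[i][j] * u[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        u = [x / norm for x in w]
    # The sign of a principal component is arbitrary: orient the eigengene so
    # that it agrees with the average standardized profile of the module.
    average = [sum(g[i] for g in z) / len(z) for i in range(n)]
    if pearson(u, average) < 0:
        u = [-x for x in u]
    return u


def kme(expr, eigengene):
    """Module membership of each gene: its correlation with the eigengene."""
    return [pearson(gene, eigengene) for gene in expr]


# Toy module: genes A-C share one expression pattern, gene D does not.
samples = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
module = [
    samples,                                               # gene A
    [2.0 * s + 1.0 for s in samples],                      # gene B: same pattern, rescaled
    [s + 0.1 * (-1) ** i for i, s in enumerate(samples)],  # gene C: pattern plus small noise
    [(-1.0) ** i for i in range(6)],                       # gene D: unrelated alternating pattern
]
eigengene = module_eigengene(module)
print([round(k, 2) for k in kme(module, eigengene)])
```

In this toy module, genes A to C track the eigengene closely and get values near 1, while gene D gets a much weaker value. This mirrors the legend's point that a kMEi approaching 1 marks a gene that is highly connected to the rest of its module. The sketch returns signed Pearson correlations.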

Table 1

Demographic and relevant clinical features of the MethylCF cohort.

	Discovery set		Replication set
	Controls	CF patients	Controls	CF patients
	(n = 16)	(n = 33)	(n = 8)	(n = 20)
Age, years†	28	28 (10)	31	25 (8)
Sex, M:F	10:6	23:10	3:5	10:10
BMI, kg/m²†		21 (4)		21 (4)
Weight, kg†		60 (13)		60 (12)
Height, cm†		171 (9)		168 (11)
FEV₁, %†		48 (24)		48 (24)
FEV₁, L†		1.91 (1.1)		1.70 (0.8)
FVC, %†		76 (22)		68 (28)
FVC, L†		3.32 (1.1)		2.99 (0.6)
PI, %		100		100
Diabetes, %*		36		30
HbA1c, %†		6.1 (1.1)		5.4 (na)
Atopy, %		18		45
P. aeruginosa, %§		94		95
MRSA, %§		36		15
A. fumigatus, %		21		25
Azithromycin, %		91		100
Aztreonam, %		12		25
Colistin, %		39		50
Tobramycin, %		55		50
Corticosteroid, %		33		45

BMI, body mass index; CF, cystic fibrosis; FEV₁, forced expiratory volume in 1 second; FVC, forced vital capacity; HbA1c, glycated hemoglobin fraction; PI, pancreatic insufficiency; MRSA, methicillin-resistant Staphylococcus aureus.

† Median values (interquartile range).

* The presence of diabetes was determined on the basis of an abnormal oral glucose-tolerance test.

§ CF patients were classified as chronically infected by P. aeruginosa, MRSA and/or A. fumigatus whenever they had three consecutive positive sputum cultures after antibiotic treatment.

na, not applicable because only five measurements were available.
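Table 1 reports continuous variables as median (interquartile range). The short Python sketch below shows one way to compute these two summaries. It assumes that the single number in parentheses is the width of the interquartile range (Q3 - Q1) rather than the Q1 to Q3 interval. Quartiles are taken by linear interpolation, treating the smallest and largest values as the 0th and 100th percentiles. Other quartile conventions can shift Q1 and Q3 slightly for groups as small as these. The function names and the ages in the example are hypothetical.

```python
import statistics


def median_iqr(values):
    """Median and interquartile range width (Q3 - Q1) of a sample.

    Quartiles use linear interpolation between order statistics
    (statistics.quantiles with method="inclusive").
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q2, q3 - q1


def format_median_iqr(values, digits=0):
    """Format as 'median (IQR)', the style used in Table 1."""
    med, iqr = median_iqr(values)
    return f"{med:.{digits}f} ({iqr:.{digits}f})"


# Hypothetical ages (years) for a small group of patients.
ages = [19, 22, 24, 26, 28, 29, 31, 35, 40]
print(format_median_iqr(ages))
```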

Table 2

Validation of RNA-seq data by real-time PCR.

	RNA-seq, discovery set				Real-time PCR, discovery set				Real-time PCR, replication set
	FPKM				Normalized ratios				Normalized ratios
	Med CF	Med C	FC	p-value†	Med CF	Med C	FC	p-value†	Med CF	Med C	FC	p-value†
TLR5	13.87	4.69	2.96	3.9E-05	2.59	1.15	2.26	6.6E-05	2.72	1.39	1.96	3.9E-03
CLEC4D	14.70	8.73	1.68	3.5E-05	5.72	2.60	2.20	2.6E-06	4.68	2.61	1.79	1.6E-02
ALPL	49.21	19.32	2.55	7.9E-04	2.34	0.84	2.80	7.3E-03	2.88	0.91	3.18	2.1E-03
CITF22-49E9.3	9.35	5.38	1.74	6.1E-05	3.37	2.09	1.61	1.5E-06	3.15	2.20	1.43	1.8E-01

FPKM: Fragments Per Kilobase of transcript per Million mapped reads, Med: median, CF: Cystic Fibrosis, C: Control, FC: fold-change (CF/C).

† CF vs C, Wilcoxon test.
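Table 2 gives FC as CF/C next to the group medians. The Python sketch below assumes that FC is the ratio of the median CF value to the median control value. This is an inference from the column layout; the table does not state it. Under that assumption, the sketch recomputes FC from the printed medians. Every recomputed value agrees with the reported FC to within 0.02 (for example, ALPL in the discovery set by real-time PCR: 2.34 / 0.84 ≈ 2.79 against 2.80), which is consistent with rounding of the medians. The Wilcoxon p-values cannot be recovered this way, because they need the per-sample values. The helper names are hypothetical.

```python
import statistics

# (median CF, median control, reported FC) per gene and assay, copied from Table 2.
TABLE_2 = {
    "TLR5": {
        "RNA-seq, discovery": (13.87, 4.69, 2.96),
        "real-time PCR, discovery": (2.59, 1.15, 2.26),
        "real-time PCR, replication": (2.72, 1.39, 1.96),
    },
    "CLEC4D": {
        "RNA-seq, discovery": (14.70, 8.73, 1.68),
        "real-time PCR, discovery": (5.72, 2.60, 2.20),
        "real-time PCR, replication": (4.68, 2.61, 1.79),
    },
    "ALPL": {
        "RNA-seq, discovery": (49.21, 19.32, 2.55),
        "real-time PCR, discovery": (2.34, 0.84, 2.80),
        "real-time PCR, replication": (2.88, 0.91, 3.18),
    },
    "CITF22-49E9.3": {
        "RNA-seq, discovery": (9.35, 5.38, 1.74),
        "real-time PCR, discovery": (3.37, 2.09, 1.61),
        "real-time PCR, replication": (3.15, 2.20, 1.43),
    },
}


def fold_change_from_samples(cf_values, control_values):
    """FC (CF/C) as the ratio of group medians of per-sample values."""
    return statistics.median(cf_values) / statistics.median(control_values)


def largest_rounding_gap(table):
    """Largest |median CF / median C - reported FC| over all genes and assays."""
    return max(
        abs(med_cf / med_c - reported)
        for assays in table.values()
        for med_cf, med_c, reported in assays.values()
    )


print(f"largest gap: {largest_rounding_gap(TABLE_2):.3f}")
```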

Table 3

Modules of co-expressed genes: KEGG analysis.

Module	Size*	KEGG ID§	Pathway	N	R	FDR
Darkgreen	90					n.s.
Red	802	hsa04330	Notch signaling pathway	48	6.9	4.3E-04
Turquoise	2838	hsa04666	Fc gamma R-mediated phagocytosis	91	6.7	8.0E-10
Black	679	hsa00310	Lysine degradation	59	6.5	4.2E-03
Tan	243	hsa05210	Colorectal cancer	86	7.8	1.8E-02
White	35	hsa04120	Ubiquitin mediated proteolysis	137	10.2	6.8E-03
Orange	66	hsa04217	Necroptosis	162	12.1	1.3E-06
Royalblue	140	hsa04621	NOD-like receptor signaling	168	8.6	1.8E-06
Darkred	107					n.s.
Greenyellow	320	hsa04932	Non-alcoholic fatty liver disease	149	7.1	2.5E-06
Purple	573	hsa04144	Endocytosis	244	3.3	3.2E-04
LightYellow	143	hsa04611	Platelet activation	122	8.2	9.3E-05
Magenta	595					n.s.
Salmon	235					n.s.
Cyan	173	hsa03050	Proteasome	44	20.4	1.0E-05
Darkgrey	68	hsa05203	Viral carcinogenesis	201	14.2	0.0E+00
Lightgreen	145					n.s.
Pink	673	hsa04010	MAPK signaling	255	2.8	1.0E-03
Darkturquoise	79					n.s.
Grey60	156	hsa01230	Biosynthesis of amino acids	75	8.2	6.7E-03
Darkorange	59					n.s.
Midnightblue	171	hsa04650	Natural killer cell mediated cytotoxicity	131	6.3	0.0E+00
Brown	1484	hsa03010	Ribosome	154	12.2	0.0E+00
Green	1097	hsa03010	Ribosome	153	6.4	1.6E-12
Lightcyan	169	hsa04662	B cell receptor signaling	71	16.2	7.8E-06
Blue	1859	hsa04660	T cell receptor signaling	101	4.9	2.6E-02
Yellow	1320					n.s.
Grey	757					n.s.

*Number of genes in the module

§ Top KEGG pathway

N, number of genes in the pathway. R, ratio of enrichment. FDR, false discovery rate

n.s., not significant, p-value > 0.05 after Benjamini-Hochberg correction
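Tables 3 to 5 report false discovery rates after Benjamini-Hochberg correction, and entries with an adjusted value above 0.05 are marked n.s. The standard step-up adjustment can be sketched in a few lines of Python. This is a generic illustration of the procedure, not the enrichment software the authors used. The function names and the raw p-values in the example are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value: the adjusted value at rank r is the
    # minimum of p * m / rank over all ranks >= r, capped at 1. This keeps the
    # adjusted values in the same order as the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant(pvalues, alpha=0.05):
    """Flag tests whose BH-adjusted p-value is at most alpha."""
    return [q <= alpha for q in benjamini_hochberg(pvalues)]


# Hypothetical raw p-values for four pathways tested in one module.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
print(significant(raw))
```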

Table 4

GO terms for dark turquoise module correlated with lung function and BMI, and light yellow module correlated with diabetes.

Geneset	Description	Number of genes	Ratio of enrichment	FDR
DARKTURQUOISE MODULE
BIOLOGICAL PROCESS				n.s.
CELLULAR COMPONENT*
GO:0030529	intracellular ribonucleoprotein complex	12	4.4	7.7E-03
GO:1990904	ribonucleoprotein complex	12	4.4	7.7E-03
GO:0005912	adherens junction	11	4.3	1.0E-02
GO:0061695	transferase complex, transferring phosphorus-containing groups	7	7.5	1.0E-02
GO:0070161	anchoring junction	11	4.2	1.0E-02
GO:0005730	nucleolus	12	3.8	1.1E-02
GO:1990234	transferase complex	11	4.0	1.2E-02
GO:0044798	nuclear transcription factor complex	5	10.1	1.7E-02
GO:0005913	cell-cell adherens junction	7	5.9	2.0E-02
GO:1902494	catalytic complex	14	3.0	2.0E-02
MOLECULAR FUNCTION
GO:0003723	RNA binding	17	2.8	3.8E-02
GO:0044877	macromolecular complex binding	15	3.0	3.8E-02
GO:0098641	cadherin binding involved in cell-cell adhesion	7	6.5	3.8E-02
GO:0098632	protein binding involved in cell-cell adhesion	7	6.2	3.8E-02
GO:0044822	poly(A) RNA binding	14	3.1	3.8E-02
GO:0098631	protein binding involved in cell adhesion	7	6.1	3.8E-02
GO:0045296	cadherin binding	7	6.1	3.8E-02
KEGG PATHWAY				n.s.
LIGHT YELLOW MODULE
BIOLOGICAL PROCESS*
GO:0007596	blood coagulation	34	4.2	6.8E-09
GO:0042060	wound healing	43	3.4	6.8E-09
GO:0050817	coagulation	34	4.1	6.8E-09
GO:0007599	hemostasis	34	4.1	6.8E-09
GO:0009611	response to wounding	44	2.9	3.4E-07
GO:0030168	platelet activation	20	5.1	3.5E-06
GO:0050878	regulation of body fluid levels	35	2.9	1.4E-05
GO:0002576	platelet degranulation	15	6.0	3.2E-05
GO:0070527	platelet aggregation	10	7.2	9.3E-04
GO:0006887	exocytosis	27	2.8	1.4E-03
CELLULAR COMPONENT*
GO:0031091	platelet alpha granule	16	11.4	4.9E-10
GO:0031410	cytoplasmic vesicle	73	2.3	6.1E-09
GO:0097708	intracellular vesicle	73	2.3	6.1E-09
GO:0044433	cytoplasmic vesicle part	49	2.8	8.4E-09
GO:0031093	platelet alpha granule lumen	12	11.7	6.4E-08
GO:0099503	secretory vesicle	28	3.3	8.0E-06
GO:0034774	secretory granule lumen	12	7.6	8.4E-06
GO:0030141	secretory granule	23	3.5	3.0E-05
GO:0005925	focal adhesion	24	3.3	3.9E-05
GO:0005924	cell-substrate adherens junction	24	3.3	3.9E-05
MOLECULAR FUNCTION				n.s.
KEGG PATHWAY
hsa04611	Platelet activation—Homo sapiens (human)	10	8.2	9.3E-05

* Top 10 enriched terms

n.s., not significant after Benjamini-Hochberg correction
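Over-representation analyses commonly define the "Ratio of enrichment" in Tables 4 and 5 as the number of module genes annotated to a term divided by the number expected by chance. Significance typically comes from a one-sided hypergeometric test. The Python sketch below assumes exactly that definition. This section does not name the enrichment tool, reference gene set or test behind these tables, so reproducing the printed ratios would require those details. The function name and all numbers in the example are hypothetical.

```python
from math import comb


def enrichment(observed, module_size, term_size, reference_size):
    """Over-representation of a term among a module's genes.

    observed: module genes annotated to the term
    module_size: genes in the module
    term_size: reference genes annotated to the term
    reference_size: genes in the reference (background) set
    Returns (ratio of enrichment, one-sided hypergeometric p-value).
    """
    expected = module_size * term_size / reference_size
    ratio = observed / expected
    # P(X >= observed) for X ~ Hypergeometric(reference_size, term_size, module_size).
    upper = min(module_size, term_size)
    tail = sum(
        comb(term_size, k) * comb(reference_size - term_size, module_size - k)
        for k in range(observed, upper + 1)
    )
    return ratio, tail / comb(reference_size, module_size)


# Hypothetical module of 150 genes, 12 of which fall in a 200-gene pathway,
# against a background of 15000 genes.
ratio, p = enrichment(observed=12, module_size=150, term_size=200, reference_size=15000)
print(f"ratio = {ratio:.1f}, p = {p:.2e}")
```

The ratio depends on the background. The same observed count can therefore give noticeably different ratios against a whole-genome reference than against a reference restricted to annotated or expressed genes.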

Table 5

GO terms for cyan module correlated with diabetes and magenta module correlated with P. aeruginosa infection.

Geneset	Description	Number of genes	Ratio of enrichment	FDR
CYAN MODULE
BIOLOGICAL PROCESS
GO:0042180	cellular ketone metabolic process	9	8.6	7.0E-04
GO:0009308	amine metabolic process	7	12.0	7.0E-04
GO:0006520	cellular amino acid metabolic process	10	5.8	1.9E-03
GO:0038061	NIK/NF-kappaB signaling	6	11.5	2.6E-03
GO:0007164	establishment of tissue polarity	6	10.6	3.2E-03
GO:0001738	morphogenesis of a polarized epithelium	6	9.6	4.7E-03
GO:0010608	posttranscriptional regulation of gene expression	10	4.7	4.9E-03
CELLULAR COMPONENT*
GO:0000502	proteasome complex	6	25.5	6.3E-05
GO:1905369	endopeptidase complex	6	25.5	6.3E-05
GO:1902494	catalytic complex	18	4.0	8.0E-05
GO:1905368	peptidase complex	6	18.9	1.9E-04
GO:0005844	polysome	4	25.5	3.7E-03
GO:0030529	intracellular ribonucleoprotein complex	11	4.2	7.0E-03
GO:1990904	ribonucleoprotein complex	11	4.2	7.0E-03
GO:0005839	proteasome core complex	3	40.1	7.0E-03
GO:0005838	proteasome regulatory particle	3	38.3	7.2E-03
GO:0022624	proteasome accessory complex	3	33.7	9.6E-03
MOLECULAR FUNCTION
GO:0003723	RNA binding	22	3.7	2.8E-05
GO:0044822	poly(A) RNA binding	19	4.3	2.8E-05
GO:0004298	threonine-type endopeptidase activity	3	37.6	3.0E-02
GO:0070003	threonine-type peptidase activity	3	37.6	3.0E-02
GO:0035257	nuclear hormone receptor binding	5	10.2	4.7E-02
KEGG PATHWAY
hsa03050	Proteasome—Homo sapiens (human)	6	29.6	1.1E-05
MAGENTA MODULE
BIOLOGICAL PROCESS*
GO:0006778	porphyrin-containing compound metabolic process	9	12.9	1.7E-04
GO:0016567	protein ubiquitination	38	2.6	3.2E-04
GO:0030163	protein catabolic process	38	2.5	5.2E-04
GO:0006779	porphyrin-containing compound biosynthetic process	7	15.1	5.2E-04
GO:0030218	erythrocyte differentiation	12	6.5	5.5E-04
GO:0033014	tetrapyrrole biosynthetic process	7	13.4	7.3E-04
GO:0046501	protoporphyrinogen IX metabolic process	5	25.9	7.3E-04
GO:0015669	gas transport	6	17.3	7.3E-04
GO:0034101	erythrocyte homeostasis	12	6.0	7.3E-04
GO:0032446	protein modification by small protein conjugation	39	2.3	7.9E-04
CELLULAR COMPONENT*
GO:0005833	hemoglobin complex	6	38.3	3.5E-06
GO:0030863	cortical cytoskeleton	11	9.2	1.0E-05
GO:0014731	spectrin-associated cytoskeleton	5	43.9	1.0E-05
GO:0005768	endosome	31	2.7	9.1E-05
GO:0005773	vacuole	26	2.9	2.4E-04
GO:0044448	cell cortex part	11	6.3	2.4E-04
GO:0005856	cytoskeleton	54	1.9	3.4E-04
GO:0036019	endolysosome	5	20.6	3.9E-04
GO:0031410	cytoplasmic vesicle	48	1.9	5.1E-04
GO:0097708	intracellular vesicle	48	1.9	5.1E-04
MOLECULAR FUNCTION
GO:0046983	protein dimerization activity	41	2.10	9.4E-03
KEGG PATHWAY				n.s.

* Top 10 enriched terms

n.s., not significant after Benjamini-Hochberg correction
