
Identification of biological correlates associated with respiratory failure in COVID-19.

Jung Hun Oh, Allen Tannenbaum, Joseph O Deasy.

Abstract

BACKGROUND: Coronavirus disease 2019 (COVID-19) is a global public health concern. Recently, a genome-wide association study (GWAS) was performed with participants recruited from Italy and Spain by an international consortium group.
METHODS: Summary GWAS statistics for 1610 patients with COVID-19 respiratory failure and 2205 controls were downloaded. In the current study, we analyzed the summary statistics with the information of loci and p-values for 8,582,968 single-nucleotide polymorphisms (SNPs), using gene ontology analysis to determine the top biological processes implicated in respiratory failure in COVID-19 patients.
RESULTS: We considered the top 708 SNPs, using a p-value cutoff of 5 × 10⁻⁵, which were mapped to the nearest genes, leading to 144 unique genes. The list of genes was input into a curated database to conduct gene ontology and protein-protein interaction (PPI) analyses. The top ranked biological processes were wound healing, epithelial structure maintenance, muscle system processes, and cardiac-relevant biological processes with a false discovery rate < 0.05. In the PPI analysis, the largest connected network consisted of 8 genes. Through a literature search, 7 out of the 8 gene products were found to be implicated in both pulmonary and cardiac diseases.
CONCLUSION: Gene ontology and PPI analyses identified cardio-pulmonary processes that may partially explain the risk of respiratory failure in COVID-19 patients.

Keywords:  Bioinformatics; COVID-19; Genome-wide association study; Respiratory failure; SARS-CoV-2; Single-nucleotide polymorphisms

Year:  2020        PMID: 33308225      PMCID: PMC7729705          DOI: 10.1186/s12920-020-00839-1

Source DB:  PubMed          Journal:  BMC Med Genomics        ISSN: 1755-8794            Impact factor:   3.063


Background

Coronavirus disease 2019 (COVID-19) caused by a novel coronavirus (severe acute respiratory syndrome coronavirus 2, SARS-CoV-2) has resulted in a global pandemic with a rapidly developing global health and economic crisis [1]. Most people with COVID-19 are asymptomatic or experience only mild symptoms [2]. However, about 5% of patients infected with the coronavirus develop acute lung injury and acute respiratory distress syndrome, possibly leading to lethal lung damage and even death [3]. The most common reported comorbidities associated with poor outcomes in COVID-19 include hypertension, diabetes, cardiovascular disease, and chronic respiratory infections [4, 5]. However, the underlying molecular mechanisms in severe COVID-19 and their interplay with such comorbidities or clinical factors are poorly understood [6]. To identify putative biomarkers that can help better understand the molecular basis of COVID-19, Blanco-Melo et al. investigated the host transcriptional response to SARS-CoV-2 and other respiratory infections through in vitro, ex vivo, and in vivo experiments [1]. Bioinformatical approaches including gene ontology and protein-protein interaction (PPI) analyses were performed to identify key biological correlates. To investigate key genetic variants associated with respiratory failure in COVID-19 patients, a genome-wide association study (GWAS) was carried out on participants recruited from Italy and Spain [7]. In the current study, we performed an in-depth biological characterization including gene ontology and PPI analyses on summary statistics that resulted from the GWAS analysis in order to identify key biological correlates relevant to respiratory failure in COVID-19 patients.

Methods

The GWAS conducted by an international consortium group involved 1980 patients with severe acute respiratory failure induced by COVID-19 at seven hospitals in Italy and Spain [7]. After quality control, the final case-control cohort included 835 patients and 1255 control participants from Italy and 775 patients and 950 control participants from Spain. After genotyping and imputation on genome build GRCh38, univariate analysis was performed for 8,582,968 single-nucleotide polymorphisms (SNPs). The resulting summary statistics, including individual SNP positions and p-values, were submitted to the European Bioinformatics Institute (www.ebi.ac.uk/gwas; accession numbers GCST90000255 and GCST90000256) and are available from www.c19-genetics.eu. GCST90000255 was the main analysis, in which all association statistics were corrected for the top 10 principal components (PCs), whereas in the additional analysis, GCST90000256, association statistics were corrected for the top 10 PCs, age, and sex. In [7], the main results were found in the analysis of GCST90000255, and GCST90000256 was used for ancillary analysis. In the current study, we therefore focused on the summary statistics of GCST90000255 for further biological analysis, because the analysis of GCST90000255 resulted in more plausible biological correlates likely associated with respiratory failure than those in GCST90000256. To further enrich gene ontology terms with more plausible SNPs likely relevant to acute respiratory failure in COVID-19, we employed a relaxed p-value of 5 × 10⁻⁵ as a filtering threshold on the summary statistics. The SNPs with p-values < 5 × 10⁻⁵ were mapped to the nearest genes using a 50 kb window on both the upstream and downstream sides of each gene. The resulting list of genes was fed as a whole into MetaCore software (Thomson Reuters, New York, NY) for gene ontology analysis.
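The filtering and mapping step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record types, the function name `map_snps_to_genes`, and all SNP identifiers, gene labels, and coordinates are hypothetical toy values. Only the p-value cutoff (5 × 10⁻⁵) and the 50 kb window come from the text.

```python
from dataclasses import dataclass

P_THRESHOLD = 5e-5   # relaxed cutoff stated in the text
WINDOW_BP = 50_000   # 50 kb flank on each side of a gene


@dataclass(frozen=True)
class Snp:
    rsid: str
    chrom: str
    pos: int
    p_value: float


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int
    end: int


def distance_to_gene(snp, gene):
    """Return 0 if the SNP lies inside the gene, else the gap to its nearest boundary."""
    if snp.pos < gene.start:
        return gene.start - snp.pos
    if snp.pos > gene.end:
        return snp.pos - gene.end
    return 0


def map_snps_to_genes(snps, genes, p_threshold=P_THRESHOLD, window=WINDOW_BP):
    """Map each SNP with p < p_threshold to its nearest gene within the window."""
    mapping = {}
    for snp in snps:
        if snp.p_value >= p_threshold:
            continue  # SNP does not pass the filter
        candidates = [
            (distance_to_gene(snp, g), g.symbol)
            for g in genes
            if g.chrom == snp.chrom
        ]
        candidates = [c for c in candidates if c[0] <= window]
        if candidates:
            # Nearest gene wins; ties are broken alphabetically by symbol.
            mapping[snp.rsid] = min(candidates)[1]
    return mapping


# Hypothetical toy data, not taken from the GWAS.
TOY_GENES = [
    Gene("GENE_A", "chr1", 100_000, 110_000),
    Gene("GENE_B", "chr1", 200_000, 250_000),
    Gene("GENE_C", "chr2", 1_000, 9_000),
]
TOY_SNPS = [
    Snp("rs_toy1", "chr1", 120_000, 1e-6),  # 10 kb downstream of GENE_A
    Snp("rs_toy2", "chr1", 400_000, 1e-6),  # no gene within 50 kb
    Snp("rs_toy3", "chr1", 105_000, 1e-3),  # fails the p-value filter
    Snp("rs_toy4", "chr2", 5_000, 2e-5),    # inside GENE_C
]

mapping = map_snps_to_genes(TOY_SNPS, TOY_GENES)
unique_genes = sorted(set(mapping.values()))
print(mapping, unique_genes)
```

In this sketch a SNP with no gene inside the window is simply dropped; whether the original analysis handled such SNPs the same way is not stated in the text.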
Further PPI analysis was performed to explore the largest connected network among the resulting list of genes with an option of ‘Direct interactions’ as a network building algorithm in MetaCore software, assuming that interacting proteins in a biological network may have the same or similar molecular functions [8-10]. To complement the biological interpretation using genes that were identified based on the proximity of candidate SNPs, the biological analysis described above was repeated using genes identified as expression quantitative trait loci (eQTL) targets from the Genotype-Tissue Expression (GTEx) database for tissues that appear to be relevant to respiratory failure, including the aorta, coronary artery, skeletal muscle, lung, and atrial appendage and left ventricle in the heart [11].
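The idea of extracting the largest connected network from a set of direct interactions can be sketched with a breadth-first search. MetaCore's 'Direct interactions' algorithm is proprietary and is not reproduced here; the function name, node labels, and edges below are hypothetical and serve only to illustrate the graph operation.

```python
from collections import defaultdict, deque


def largest_connected_component(nodes, edges):
    """Return the largest set of nodes linked by undirected edges."""
    node_set = set(nodes)
    adjacency = defaultdict(set)
    for a, b in edges:
        # Keep only interactions whose endpoints are both in the gene list.
        if a in node_set and b in node_set:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen = set()
    best = set()
    for start in nodes:
        if start in seen:
            continue
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbor in adjacency[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        if len(component) > len(best):
            best = component
    return best


# Hypothetical toy network: P1-P2-P3 form a chain, P4-P5 a pair, P6 is isolated.
toy_nodes = ["P1", "P2", "P3", "P4", "P5", "P6"]
toy_edges = [("P1", "P2"), ("P2", "P3"), ("P4", "P5"), ("P6", "P_outside")]

largest = largest_connected_component(toy_nodes, toy_edges)
print(sorted(largest))
```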

Results

Gene ontology analysis

In our analysis, with a p-value threshold of 5 × 10⁻⁵ applied to the summary statistics, 708 SNPs remained and a corresponding set of 144 unique genes in autosomes was found (Additional file 1). The list of genes was fed into the MetaCore database. Table 1 shows the top 10 biological processes and corresponding genes that appear to be relevant to respiratory failure in COVID-19 patients, all with false discovery rate (FDR) values < 0.05. Wound healing, epithelial structure maintenance, muscle system process, and cardiac-relevant biological processes were top-ranked.
Table 1

The top 10 significant biological processes likely associated with respiratory failure in COVID-19 patients, using genes that were identified based on the proximity of 708 SNPs. The genes for each biological process belong to the list of 144 genes. FDR: false discovery rate

Ranking | Gene Ontology | FDR | Genes
1 | Wound healing | 1.962E-02 | ADAMTS13, CCR9, CXCR6, DMBT1, EPPK1, GATA4, ITGB3, MYH1, MYH2, MYH4, PLEC, PRTN3, SMAD3, TFF1, TFF3, UBASH3A
2 | Epithelial structure maintenance | 1.962E-02 | LDB2, TFF1, TFF2, TFF3
3 | Cardiac ventricle development | 1.962E-02 | CCR9, CXCR6, GATA4, ID2, MYH1, MYH4, PTBP1, SLIT3, SUFU
4 | Ventricular septum development | 1.962E-02 | CCR9, CXCR6, GATA4, ID2, SLIT3, SUFU
5 | Cardiac septum development | 1.962E-02 | CCR9, CXCR6, GATA4, ID2, MYH1, MYH4, SLIT3, SUFU
6 | Transdifferentiation | 1.962E-02 | GATA4, SMAD3
7 | Muscle system process | 2.106E-02 | CCR9, CHRNA1, CXCR6, GATA4, MYH1, MYH2, MYH4, TMOD2, TMOD3
8 | Cellular component maintenance | 2.518E-02 | CCR9, CXCR6, DLGAP1, ERC2, PRTN3, TANC1
9 | Embryonic foregut morphogenesis | 2.624E-02 | GATA4, SMAD3
10 | Response to virus | 2.783E-02 | AZU1, CCR9, CXCR6, DDX1, DMBT1, GATA4, IL12A, TRIM5, TRIM6, TRIM22, TRIM34
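The enrichment step was run in MetaCore, and its internal procedure is not described in the text. As a rough illustration of how an FDR value per gene ontology term can arise, the sketch below substitutes a one-sided hypergeometric over-representation test followed by Benjamini-Hochberg adjustment. That choice is an assumption, not the documented MetaCore method, and the term names, gene labels, and background size are hypothetical.

```python
from math import comb


def hypergeom_tail(k, population, annotated, drawn):
    """P(X >= k) when `drawn` genes are sampled from `population` genes,
    of which `annotated` belong to the term."""
    total = comb(population, drawn)
    upper = min(annotated, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def enrich(gene_list, term_to_genes, background_size):
    """Return {term: FDR} for over-representation of each term in gene_list."""
    genes = set(gene_list)
    terms = sorted(term_to_genes)
    p_values = [
        hypergeom_tail(
            len(genes & term_to_genes[t]),
            background_size,
            len(term_to_genes[t]),
            len(genes),
        )
        for t in terms
    ]
    return dict(zip(terms, benjamini_hochberg(p_values)))


# Hypothetical toy annotation: 10 background genes, 4 genes of interest.
toy_terms = {
    "term_a": {"g1", "g2", "x1"},
    "term_b": {"y1", "y2", "y3"},
}
toy_fdr = enrich(["g1", "g2", "g3", "g4"], toy_terms, background_size=10)
print(toy_fdr)
```

Terms with an adjusted value below 0.05 would then be reported, mirroring the FDR < 0.05 criterion used for Table 1.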

PPI network analysis

The largest connected PPI network in the list of 144 genes is shown in Fig. 1. The PPI network consisted of 8 gene products for the following genes: GATA4, ID2, MAFA, NOX4, PTBP1, SMAD3, TUBB1, and WWOX. We conducted a literature search in PubMed to investigate the potential associations between those 8 genes/proteins and pulmonary or cardiac diseases. Additional file 2 contains a table that gives an overview of reported studies on these associations. Interestingly, except for MAFA, which is involved in insulin secretion, the other 7 gene products were found to be implicated in both pulmonary and cardiac diseases.
Fig. 1

The largest connected network in the list of 144 genes. The line colors indicate the activation (green), inhibition (red), and unspecified (gray) effects


eQTL analysis

Tissue-specific genes that had significant associations with the 708 SNPs were identified from GTEx V8 [11]. Six tissues were examined: the aorta, coronary artery, skeletal muscle, lung, and the atrial appendage and left ventricle of the heart, resulting in 17, 4, 21, 17, 8, and 10 genes, respectively (Additional file 3). The gene ontology and PPI analyses described above were repeated using the resultant 34 unique genes. No interactions were found among the 34 gene products. Table 2 shows the top 10 biological processes and corresponding genes, all with FDR values < 0.05. All of these biological processes involved the same 3 gene products: CCR3, CCR5, and CXCR6. Chemokine-related biological processes were top-ranked.
Table 2

The top 10 significant biological processes likely associated with respiratory failure in COVID-19 patients, using genes identified as expression quantitative trait loci targets from 6 tissues for 708 SNPs. The genes for each biological process belong to the list of 34 genes. FDR: false discovery rate

Ranking | Gene Ontology | FDR | Genes
1 | Angioblast cell migration | 3.293E-03 | CCR3, CCR5, CXCR6
2 | Chemokine-mediated signaling pathway | 3.946E-03 | CCR3, CCR5, CXCR6
3 | Cellular response to chemokine | 3.946E-03 | CCR3, CCR5, CXCR6
4 | Response to chemokine | 3.946E-03 | CCR3, CCR5, CXCR6
5 | Positive regulation of apoptotic process by virus | 3.946E-03 | CCR3, CCR5, CXCR6
6 | Positive regulation by symbiont of host apoptotic process | 4.926E-03 | CCR3, CCR5, CXCR6
7 | Positive regulation by symbiont of host programmed cell death | 4.926E-03 | CCR3, CCR5, CXCR6
8 | Killing by symbiont of host cells | 4.926E-03 | CCR3, CCR5, CXCR6
9 | Modulation by virus of host apoptotic process | 4.926E-03 | CCR3, CCR5, CXCR6
10 | Positive regulation by organism of programmed cell death in other organism involved in symbiotic interaction | 4.926E-03 | CCR3, CCR5, CXCR6
Among the 708 SNPs, 41, 48, 5, 59, and 180 SNPs had eQTL associations in 6, 5, 4, 3, and 2 tissues, respectively (Additional file 4). In addition, rs8093548 (chr18, pos: 79876451 in GRCh38), rs4799099 (chr18, pos: 79879585), and rs4799100 (chr18, pos: 79880207) had eQTL associations with the most genes; all three SNPs had eQTL associations with the same five genes (HSBP1L1, PQLC1, RBFA, RBFADN, and TXNL4A) in five tissues: the aorta, skeletal muscle, lung, and the atrial appendage and left ventricle of the heart.
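The bookkeeping behind the eQTL summary (genes per tissue, the pooled set of unique genes, and the number of SNPs with associations in a given number of tissues) can be sketched as below. The record layout `(snp_id, tissue, gene)`, the function name, and the toy records are hypothetical; they do not reproduce the GTEx V8 file format or the study's data.

```python
from collections import Counter, defaultdict


def summarize_eqtl(records):
    """Summarize significant eQTL associations given as (snp_id, tissue, gene)."""
    genes_by_tissue = defaultdict(set)
    tissues_by_snp = defaultdict(set)
    for snp_id, tissue, gene in records:
        genes_by_tissue[tissue].add(gene)
        tissues_by_snp[snp_id].add(tissue)

    # Pool target genes over all tissues, counting each gene once.
    unique_genes = set().union(*genes_by_tissue.values())
    # Map "number of tissues" to "number of SNPs with associations in that many tissues".
    snps_by_tissue_count = Counter(len(t) for t in tissues_by_snp.values())
    return dict(genes_by_tissue), unique_genes, snps_by_tissue_count


# Hypothetical toy records.
toy_records = [
    ("snp1", "lung", "G1"),
    ("snp1", "aorta", "G1"),
    ("snp1", "aorta", "G2"),
    ("snp2", "lung", "G3"),
    ("snp3", "lung", "G1"),
    ("snp3", "skeletal muscle", "G1"),
]

per_tissue, pooled, tissue_counts = summarize_eqtl(toy_records)
print({t: len(g) for t, g in per_tissue.items()}, len(pooled), dict(tissue_counts))
```

Because a gene can be an eQTL target in several tissues, the pooled count is generally smaller than the sum of the per-tissue counts, which is consistent with 34 unique genes arising from per-tissue counts of 17, 4, 21, 17, 8, and 10.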

Discussion

Summary statistics from a GWAS dataset for respiratory failure in COVID-19 patients were analyzed using bioinformatics techniques. To enrich the biological discovery, a relaxed p-value of 5 × 10⁻⁵ was adopted, which likely enabled the inclusion of more potentially relevant genomic information in the analysis and the identification of plausible biological correlates associated with pulmonary or cardiac symptoms. A list of SNPs filtered by the relaxed p-value threshold was mapped to nearby genes. The resulting 144 genes were fed into the MetaCore database for gene ontology and PPI analyses. Gene ontology analysis identified wound healing, cardiac-related biological processes, and muscle system process as key correlates. For PPI analysis, we attempted to find the largest connected network in the list of 144 genes, assuming that interacting proteins in a biological network tend to have the same or similar molecular functions. As a result, the largest connected network consisted of 8 gene products from the following genes: GATA4, ID2, MAFA, NOX4, PTBP1, SMAD3, TUBB1, and WWOX. A literature search was conducted through PubMed to investigate whether there are previously reported biological associations between these genes/proteins and respiratory or cardiac symptoms. Interestingly, we found that most of these gene products are relevant to both respiratory and cardiac diseases. In what follows, we describe the role of these biomarkers in biology.
A study reported that GATA4 plays a critical role as a transcription factor in normal pulmonary development [12]. GATA4 has also been found to be a human candidate gene relevant to congenital heart disease [13, 14]. Several studies showed that GATA4 is a key protein responsible for the development of the lung, heart, and diaphragm in mice [15-17]. Arwood et al. described a mechanism of pulmonary hypertension in heart failure with preserved ejection fraction (HFpEF), using transcriptome-wide RNA sequencing [18]. When comparing the transcriptomic difference between patients without pulmonary hypertension and those with combined post- and pre-capillary pulmonary hypertension, six differentially expressed genes were identified. In a further replication test on an independent cohort, only ID2 was validated, and in an additional animal study, ID2 expression was significantly upregulated in mice with HFpEF and pulmonary hypertension compared to control mice. Another study showed a functional role of ID2 as one of the culprit genes in both the arterial and the venous poles of the heart [19].
An increased expression of NOX4 and TGF-β was found to be correlated with the increased volume in both airway smooth muscle mass and epithelial cells of small airways in patients with chronic obstructive pulmonary disease (COPD) [20]. Another study reported that the upregulation of NOX4 in the heart induced cardiac remodeling, suggesting its potential role to reduce the severity of established heart failure [21]. Gauldie et al. demonstrated a cascade of biological interactions among inflammation, TGF-β activation, SMAD3 signaling, pulmonary fibrosis, and emphysema [22]. Huang et al. found that SMAD3 is a key mediator in chronic cardiovascular disease and plays a critical role in hypertensive cardiac remodeling [23].
At 4 and 24 h after respiratory syncytial virus infection, gene expression profiles in human bronchial epithelial cells were analyzed [24]. Among the six genes that were associated with respiratory disease and were significantly altered at both 4 and 24 h post-infection, TUBB1 was the only gene observed to be downregulated at both time points. Freson et al. showed that the TUBB1 Q43P functional variant may be a protective genetic factor against cardiovascular disease [25]. Caruso et al. observed the downregulation of miR-124 in patients with pulmonary arterial hypertension and its central role in contributing to abnormal cell proliferation via PTBP1 and PKM2 [26]. Recently, Fochi et al. showed the emerging role of RBM20 and PTBP1 as key splicing factors in heart development and cardiovascular disease [27]. A study reported that the loss of WWOX promoted cell proliferation in pulmonary artery smooth muscle cells and contributed to pulmonary vascular remodeling in pulmonary arterial hypertension [28]. Another study reported the vital implications of WWOX in atherosclerosis and cardiovascular diseases [29].
MAFA was not found to be directly related to pulmonary or cardiac symptoms in the literature review. However, MAFA has been shown to be a key regulator that controls genes implicated in insulin secretion [30, 31]. A recent study indicated that a number of patients with COVID-19, who were comorbid with diabetes or diabetes-related traits, had increased ACE2 expression [32]. This suggests that ACE2 may be a key molecular link between insulin resistance and COVID-19 severity [33]. The combined evidence indicates that lung disease is likely to be associated with cardiovascular risk. Further research is warranted to identify the common biological processes between lung and heart diseases and the interplay between them.
We further assessed various filtering thresholds. With a stricter p-value of 1 × 10⁻⁵, 390 SNPs and corresponding 27 unique genes in autosomes remained. Gene ontology analysis with this relatively small number of genes yielded immunity-related biological processes as the top-ranked processes. The top two biological processes were chemokine-mediated signaling pathway (FDR = 2.906E-3) and CD8-positive, gamma-delta intraepithelial T cell differentiation (FDR = 2.906E-3). With a more relaxed p-value of 1 × 10⁻⁴, 1112 SNPs and corresponding 243 unique genes in autosomes remained. Gene ontology analysis with those genes resulted in biological processes that are irrelevant to respiratory failure, which is likely due to false positives added in the analysis. This implies that the selection of an optimal threshold is critical to identify real biological correlates. Information from machine learning-based predictive modeling of GWAS data, which we employed in other studies [8, 9], can help resolve the issue.
Biological analyses using genes that were identified based on the proximity of candidate SNPs identified cardio-pulmonary processes as associated with respiratory failure. In particular, 7 out of the 8 gene products in the largest connected network were found to be implicated in both pulmonary and cardiac diseases. In contrast, the selection of genes identified as eQTL targets uncovered chemokine-related biological processes, indicating an association with the immune system. This suggests that an integrated analysis of the two methods of identifying relevant genes can help better understand the underlying biological mechanisms of respiratory failure in COVID-19 patients.
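The threshold comparison discussed above amounts to counting how many SNPs pass each cutoff before the downstream analyses are repeated. A minimal sketch, assuming the summary statistics are available as a plain list of p-values, is shown below; the function name and the toy p-values are hypothetical, and only the three cutoffs (1 × 10⁻⁵, 5 × 10⁻⁵, 1 × 10⁻⁴) come from the text.

```python
def count_below(p_values, thresholds):
    """Return {threshold: number of SNPs with p strictly below it}."""
    return {t: sum(p < t for p in p_values) for t in thresholds}


# Hypothetical toy p-values, not taken from the GWAS summary statistics.
toy_p_values = [3e-6, 8e-6, 2e-5, 4e-5, 7e-5, 9e-5, 2e-4]
toy_counts = count_below(toy_p_values, thresholds=(1e-5, 5e-5, 1e-4))
print(toy_counts)
```

Each resulting SNP set would then go through the same gene mapping and enrichment steps, so that the biological processes obtained at different cutoffs can be compared.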

Conclusions

We analyzed summary statistics from a GWAS dataset where individual SNPs were tested for associations with respiratory failure in COVID-19 patients. Bioinformatics approaches with SNPs filtered using a relaxed p-value enabled the identification of plausible biological correlates that are likely to be relevant to pulmonary or cardiac symptoms. When genotyping data become available, a more in-depth analysis using machine learning and bioinformatics techniques will provide greater insights into the underlying mechanisms of respiratory failure in COVID-19 patients.

Additional file 1. 144 genes identified based on the proximity of 708 candidate SNPs.
Additional file 2. An overview of reported studies in terms of the associations between genes/proteins and pulmonary or cardiac symptoms.
Additional file 3. For 708 candidate SNPs, eQTL information obtained from GTEx V8 for 6 tissues, including the aorta, coronary artery, skeletal muscle, lung, and atrial appendage and left ventricle in the heart.
Additional file 4. SNPs with eQTL associations in multiple tissues.
  32 in total

Review 1.  Gene regulatory networks in the evolution and development of the heart.

Authors:  Eric N Olson
Journal:  Science       Date:  2006-09-29       Impact factor: 47.728

2.  NADPH oxidase 4 induces cardiac fibrosis and hypertrophy through activating Akt/mTOR and NFκB signaling pathways.

Authors:  Qingwei David Zhao; Suryavathi Viswanadhapalli; Paul Williams; Qian Shi; Chunyan Tan; Xiaolan Yi; Basant Bhandari; Hanna E Abboud
Journal:  Circulation       Date:  2015-01-14       Impact factor: 29.690

3.  Impaired mesenchymal cell function in Gata4 mutant mice leads to diaphragmatic hernias and primary lung defects.

Authors:  Patrick Y Jay; Malgorzata Bielinska; Jonathan M Erlich; Susanna Mannisto; William T Pu; Markku Heikinheimo; David B Wilson
Journal:  Dev Biol       Date:  2006-10-05       Impact factor: 3.582

4.  MAFA controls genes implicated in insulin biosynthesis and secretion.

Authors:  H Wang; T Brun; K Kataoka; A J Sharma; C B Wollheim
Journal:  Diabetologia       Date:  2006-12-06       Impact factor: 10.122

5.  Gata4 is necessary for normal pulmonary lobar development.

Authors:  Kate G Ackerman; Jianlong Wang; Liqing Luo; Yuko Fujiwara; Stuart H Orkin; David R Beier
Journal:  Am J Respir Cell Mol Biol       Date:  2006-12-01       Impact factor: 6.914

6.  The TUBB1 Q43P functional polymorphism reduces the risk of cardiovascular disease in men by modulating platelet function and structure.

Authors:  Kathleen Freson; Rita De Vos; Christine Wittevrongel; Chantal Thys; Johan Defoor; Luc Vanhees; Jos Vermylen; Kathelijne Peerlinck; Chris Van Geet
Journal:  Blood       Date:  2005-06-14       Impact factor: 22.113

7.  Smad3 mediates cardiac inflammation and fibrosis in angiotensin II-induced hypertensive cardiac remodeling.

Authors:  Xiao R Huang; Arthur C K Chung; Fuye Yang; Wensheng Yue; Chuxia Deng; Chu Pak Lau; Hung Fat Tse; Hui Y Lan
Journal:  Hypertension       Date:  2010-03-15       Impact factor: 10.190

8.  Identification of gene biomarkers for respiratory syncytial virus infection in a bronchial epithelial cell line.

Authors:  Yuh-Chin T Huang; Zhuowei Li; Xhevahire Hyseni; Michael Schmitt; Robert B Devlin; Edward D Karoly; Joleen M Soukup
Journal:  Genomic Med       Date:  2009-05-15

9.  Transcriptome-wide analysis associates ID2 expression with combined pre- and post-capillary pulmonary hypertension.

Authors:  Meghan J Arwood; Nasim Vahabi; Christelle Lteif; Ravindra K Sharma; Roberto F Machado; Julio D Duarte
Journal:  Sci Rep       Date:  2019-12-20       Impact factor: 4.379

10.  Identification of MicroRNA-124 as a Major Regulator of Enhanced Endothelial Cell Glycolysis in Pulmonary Arterial Hypertension via PTBP1 (Polypyrimidine Tract Binding Protein) and Pyruvate Kinase M2.

Authors:  Paola Caruso; Benjamin J Dunmore; Kenny Schlosser; Sandra Schoors; Claudia Dos Santos; Carol Perez-Iratxeta; Jessie R Lavoie; Hui Zhang; Lu Long; Amanda R Flockton; Maria G Frid; Paul D Upton; Angelo D'Alessandro; Charaka Hadinnapola; Fedir N Kiskin; Mohamad Taha; Liam A Hurst; Mark L Ormiston; Akiko Hata; Kurt R Stenmark; Peter Carmeliet; Duncan J Stewart; Nicholas W Morrell
Journal:  Circulation       Date:  2017-09-26       Impact factor: 29.690

