Suvojit Hazra, Alok Ghosh Chaudhuri, Basant K Tiwary, Nilkanta Chakrabarti.
Abstract
A 'tripartite network' (TN) and a 'combined gene network' (CGN) were constructed, and their hub-bottleneck and driver nodes (44 genes) were evaluated as 'target genes' (TG) to identify 21 'candidate genes' (CG) and their relationship with neurological manifestations of COVID-19. The TN was developed using neurological symptoms of COVID-19 found in the literature. For the query genes (TG of the TN), co-expressed genes were identified using pair-wise mutual information with the genes available in RNA-Seq autopsy data of the frontal cortex of COVID-19 victims. The CGN was constructed with genes selected from the TN and genes co-expressed in COVID-19. The TG and their connecting genes in the respective networks underwent functional analyses based on their enrichment terms and pair-wise 'semantic similarity scores' (SSS). A new integrated 'weighted harmonic mean score' was formulated by combining the SSS values and the STRING-based 'combined score' of the selected TG-pairs, which provided CG-pairs whose CGs are both co-expressed and 'indispensable nodes' in the CGN. Finally, six pairs sharing seven 'prevalent CGs' (ADAM10, ADAM17, AKT1, CTNNB1, ESR1, PIK3CA, FGFR1) showed linkages with the phenotypes (a) directly under neurodegeneration, neurodevelopmental diseases, tumour/cancer and cellular signalling, and (b) indirectly through other CGs under behavioural/cognitive and motor dysfunctions. The pathophysiology of the 'prevalent CGs' is discussed to interpret the neurological phenotypes of COVID-19.
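The pair-wise mutual information step can be illustrated with a minimal sketch. It assumes that each gene's expression profile is discretised into equal-width bins before mutual information is estimated; the binning scheme, the number of bins and the toy profiles below are illustrative assumptions and are not taken from the study.

```python
import math
from collections import Counter

def discretize(values, bins=3):
    """Equal-width binning of expression values into integer bin labels."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant profile carries no information; put everything in one bin.
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(x, y, bins=3):
    """Mutual information (in bits) between two equally long profiles."""
    bx, by = discretize(x, bins), discretize(y, bins)
    n = len(bx)
    px, py, pxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    mi = 0.0
    for (a, b), count in pxy.items():
        # p(a, b) * log2( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[a] * py[b]))
    return mi

# Hypothetical toy profiles (not data from the study).
query_gene = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
partner_gene = [2.1, 3.9, 6.2, 8.0, 9.9, 12.1]
flat_gene = [5.0, 5.0, 5.0, 5.0, 5.0, 5.0]
```

In this toy case the partner profile rises with the query profile and yields high mutual information, whereas the constant profile yields zero; gene pairs would then be ranked or thresholded on this value.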
Year: 2022 PMID: 36229517 PMCID: PMC9558001 DOI: 10.1038/s41598-022-21109-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Flowchart of the methodologies applied in sequence and the results obtained in a systems-level analysis to identify ‘candidate genes’ and associated functional modules (symptoms/diseases) related to neurological manifestations of COVID-19.
Figure 2. Study design for the construction of the tripartite network (TN) for COVID-19: stepwise presentation of the construction of the TN, which has symptoms, diseases and genes as nodes and their interactions as edges, including inter-edges (viz., symptom-disease, symptom-gene, disease-gene) in Point-1 to Point-6 and intra-edges (viz., symptom-symptom, disease-disease, gene–gene) in Point-7 to Point-9 of the description. Point-1: Extraction of symptoms as terms associated with neurological disorders in COVID-19 from the PubMed bibliographic literature database and assignment of a metric, viz. bibliographic occurrence frequency (f), to each symptom term. Point-2 and Point-3: Extraction of the symptom-to-disease (Point-2) and symptom-to-gene (Point-3) interactions from the HPO database using the symptom terms as queries, and calculation of their respective connectivity probability scores, followed by selection of the best-fitted connections at FDR-adjusted p < 0.001. Point-4: Extraction of the ‘Elite’ disease-gene connections with their respective Solr-based relevance scores (S_malacards) from the Malacards database, using the symptom-associated disease terms (found in Point-2) as queries with respect to the symptom-associated gene terms (found in Point-3). Point-5 and Point-6: Incorporation of symptoms into the disease-gene connections by computing the ‘weightage of average contribution’ of symptoms in diseases (W_d(D_i)) and of diseases in genes (W_g(g_i)) to find the ‘Elite’ disease-gene connections, requiring that each disease is connected to at least one symptom term and one gene term in the network. An FDR-adjusted p < 0.01 was used to filter the weight-based disease-gene connections. Point-7 to Point-9: Identification of intra-edges of nodes using cosine semantic similarity scores ≥ 0.7 for retaining symptom-symptom (Point-7) and disease-disease (Point-8) pairs, and a STRING confidence score ≥ 0.7 for retaining gene–gene pairs (Point-9).
These symptoms, diseases and genes are selected mathematically and statistically as described in Point-6. Point-10: Integration of all inter- and intra-edges of symptoms and their allied diseases and genes to construct the TN (symptom-disease-gene) for COVID-19. The representative Venn diagrams indicate the stepwise changes in the number of nodes during the selection of elite symptoms, diseases and genes for the construction of the TN.
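The two filtering rules in this design (an FDR-adjusted p-value cut-off for inter-edges and a score cut-off of 0.7 for intra-edges) can be sketched as follows. The caption does not name the FDR procedure, so the Benjamini-Hochberg adjustment used here is an assumption, and the helper names, edge labels, p-values and scores are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order.

    Assumption: the source only states 'FDR-adjusted p'; BH is one common choice.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_inter_edges(edges, p_values, alpha=0.001):
    """Keep inter-edges whose adjusted p-value is below alpha."""
    return [e for e, q in zip(edges, bh_adjust(p_values)) if q < alpha]

def retain_intra_edges(scored_pairs, cutoff=0.7):
    """Keep (node_a, node_b) pairs whose similarity/confidence score is >= cutoff."""
    return [(a, b) for a, b, score in scored_pairs if score >= cutoff]

# Hypothetical toy inputs (not data from the study).
edges = ["symptom1-disease1", "symptom1-disease2", "symptom2-disease1"]
p_values = [1e-6, 0.2, 2e-4]
pairs = [("gene1", "gene2", 0.9), ("gene1", "gene3", 0.4)]
```

With these toy inputs, `select_inter_edges(edges, p_values)` keeps the first and third edges, and `retain_intra_edges(pairs)` keeps only the pair scored 0.9.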
Summary of the facts reported in 103 articles curated from the PubMed database to identify the neurological symptoms of COVID-19 selected for the construction of the TN.
| Study type | Patients | Timeline | PubMed ID of screened articles | Remarks on COVID-19 patients |
|---|---|---|---|---|
| Case study (18) | 4 (75%) | Jan–Apr, 2020 | 32314810, 32474220, 32457227, 32449057 | During ICU hospitalisation: Died after 10–12 days |
| | 1 (100%) | 2020 | 32737799 | During hospitalisation with COVID-19: Recovered during release |
| | 1 (100%) | 2020 | 32730234 | During hospitalisation with COVID-19: Recovered during release |
| | 2 (100%) | 2020 | 32600350, 32409316 | During hospitalisation with COVID-19: Recovered during release |
| | 2 (50%) | 2020 | 32615528, 32367205 | During hospitalisation with COVID-19: Recovered during release |
| | 1 (0%) | 2020 | 32545925 | During hospitalisation with COVID-19: Recovered during release |
| | 1 (0%) | 2020 | 32518103 | During hospitalisation with COVID-19: Recovered during release |
| | 1 (0%) | 2020 | 32689590 | During hospitalisation with COVID-19: Recovered during release |
| | 3 (66.7%) | 2020 | 32464585, 32430637, 32418288 | During hospitalisation with COVID-19: Recovered during release |
| | 2 (0%) | 2020 | 32489724, 32586897 | During hospitalisation with COVID-19: Recovered during release |
| Case series (6) | 4 (50%) | 2020 | 32679347 | During hospitalisation with COVID-19: 3 patients with CNS problem during release |
| | 2 (100%) | Mid-March, 2020 | 32462412 | During hospitalisation with COVID-19: 1 died, 1 with long-term monitoring |
| | 2 (50%) | 2020 | 32464157 | During hospitalisation with COVID-19: Recovered during release |
| | 6 (83.3%) | Mar 16–Apr 5, 2020 | 32436105 | During hospitalisation with COVID-19: 5 died, 1 with severe neurological deficits |
| | 4 (25%) | 2020 | 32360439 | During hospitalisation with COVID-19: 3 died, 1 release |
| | 2 (0%) | 2020 | 32307298 | During hospitalisation with COVID-19: 1 died, 1 release |
| Clinical cohort (16) | 140 (71.4%) | May 3–May 5, 2020 | 32771053 | During ICU hospitalisation with COVID-19: Not specified |
| | 89 (61.8%) | Mar 23–May 23, 2020 | 32756734 | During ICU hospitalisation with COVID-19: Not specified |
| | 64 (67.2%) | Mar 6–Apr 9, 2020 | 32680942 | During ICU hospitalisation with COVID-19: Not specified |
| | 73 (65.8%) | Mar 23–May 7, 2020 | 32677875 | During hospitalisation with COVID-19: Not specified |
| | 86 (62.8%) | Feb 5–Apr 2, 2020 | 32754114 | During ICU hospitalisation with COVID-19: Not specified |
| | 9 (77.8%) | 2020 | 32639679 | During hospitalisation with COVID-19: Not specified |
| | 43 (55.8%) | Apr 9–May 15, 2020 | 32637987 | During hospitalisation with COVID-19: Not specified |
| | 4 (50.0%) | Mar 1–May 8, 2020 | 32609336 | During hospitalisation with COVID-19: Not specified |
| | 50 (58.0%) | Mar 1–Apr 30, 2020 | 32570113 | During hospitalisation with COVID-19: Not specified |
| | 10 (80.0%) | Mar 1–Apr 15, 2020 | 32466736 | During ICU hospitalisation with COVID-19: Not specified |
| | 163 | Feb–Mar, 2020 | 32467244 | During hospitalisation with COVID-19: Not specified |
| | 242 (62.0%) | Mar 1–Mar 31, 2020 | 32467191 | During hospitalisation with early COVID-19: Not specified |
| | 27 (74.1%) | Mar 1–Apr 14, 2020 | 32439651 | During hospitalisation with COVID-19: Not specified |
| | 454 (60.8%) | Mar 1–Apr 13, 2020 | 32447193 | During hospitalisation with COVID-19: Not specified |
| | 103 (57.3%) | Mar 30–Apr 24, 2020 | 32416289 | During hospitalisation with COVID-19: Not specified |
| | 58 | Mar 3–Apr 3, 2020 | 32294339 | During hospitalisation with COVID-19: Not specified |
| Systematic review (11) | 205,938 | 2020 | 32730915 | Not specified: Not specified |
| | 765 | 2020 | 32725449 | Not specified: Not specified |
| | 116 | 2020 | 32603770 | Long-term reported: Not specified |
| | 36 (80.5%) | Mar 25–May 20, 2020 | 32653111 | During COVID-19: Not specified |
| | 9086 (45.2%) | Jan–June, 2020 | 32561222 | Long-term reported: Not specified |
| | 1454 | Dec 2019–May 1, 2020 | 32574246 | Not specified: Not specified |
| | 1048 (50.4%) | Jan 1–Apr 10, 2020 | 32437679 | Following COVID-19: Not specified |
| | 235 | 2020 | 32422545 | Not specified: Not specified |
| | – | Up to May 10, 2020 | 32490966 | Not specified: Not specified |
| | 4014 | Jan 1–Apr 15, 2020 | 32345728 | During COVID-19: Not specified |
| | 765 | Dec 1, 2019–Mar 26, 2020 | 32299017 | During COVID-19: Not specified |
| Meta-analysis (1) | ~4700 | Feb 7–May 17, 2020 | 32529575 | Not specified: Not specified |
| Narrative review, brief report, perspective, research article (51) | – | Feb 28–Aug 11, 2020 | 32776905, 32767055, 32758257, 32751841, 32729463, 32725545, 32720223, 32627524, 32683890, 32440692, 32672843, 32668062, 32628969, 32610334, 32655490, 32655489, 32491829, 32715280, 32587958, 32527073, 32581854, 32498691, 32492193, 32485101, 32486196, 32469504, 32474399, 32458193, 32574248, 32574247, 32442082, 32427468, 32424503, 32418055, 32427134, 32405259, 32405150, 32378030, 32417235, 32417124, 32366614, 32643664, 32515379, 32352081, 32320066, 32343122, 32320211, 32266761, 32385132, 32104915, 32538857 | During COVID-19 and long-term COVID, as indicative: Not specified |
The table lists the study types in the literature, their publication timelines, the sample sizes included in the studies, the PubMed IDs of the articles and the status of the COVID-19 patients reported in the studies. The PubMed IDs with their full citations are presented in Table S1 of Supplementary File 1. The manually curated ‘neurological symptoms/manifestations’ from the 103 selected articles are presented in Table S2 of Supplementary File 1.
Figure 3. The model of the ‘tripartite network’ (TN) and its ‘target nodes’ with their pairwise semantic similarity scores (SSS) for COVID-19: (a) Inset image: Representation of the TN developed in Cytoscape software as described in Fig. 2, with 147 ‘target nodes’ (‘HB + D’, ‘pure-driver’, ‘pure-HB’) including 27 genes (deep grey circles) and categories (different colour codes) of 73 symptoms and 47 diseases; nodes other than ‘target nodes’ are shown in light grey and all edges in grey. Pictorial image: The connections of only the ‘target nodes’, including categories (big circles) of symptoms and diseases in the TN, showing details of nodes and edges: triangular (open) inter-links of three gene nodes (ACTB, ADAR, CTNNB1) and the respective nodes of 12 symptoms and 11 diseases with the same colour codes (for both nodes and edges); other nodes (white with a grey border) have no triangular inter-links, and their connections are shown as grey edges. Both inset and pictorial images: Nodes are represented by different shapes (symptoms: diamond, diseases: rectangle, genes: circle) and sizes (connectivity scores adjusted by the ‘continuous mapping of node size’ in the range between 25 and 60 pts.) with grey borders (2 pts.). The widths of the edges display the respective metrics, which are adjusted to 0.5–2 pts. by the ‘continuous mapping of edge width’ in the edge network style of Cytoscape. (b–e) Heat maps: Representation of the pairwise SSS values (0–1 with colour codes) of the ‘target nodes’ of the TN, including 73 symptoms (b), 47 diseases (c) and 27 genes (d, e). The pairwise SSS measurements (vide ‘Methodology’ section) are calculated as SSS-I for genes (d) and SSS-II for symptoms (b), diseases (c) and genes (e). The columns of the heat maps (b and c) include the codes (given on the right side of each term in a row) of symptoms (b) and diseases (c).
The vertical bars (left side of each heat map) indicate nodes having the three topological properties (different colour codes) of centrality measurements and the categories (same colour codes as in the inset in a) of symptoms and diseases of the network. The summary of classifier ROC-AUC statistics (threshold scores, accuracy scores in %, AUC scores) of the SSSs (b–e) is presented adjacent to the colour bar.
Figure 4. The model of the ‘combined gene network’ (CGN) and its ‘target nodes’ with their pairwise semantic similarity scores (SSS) for COVID-19: (a) Study design for the construction of the CGN, represented stepwise (points 1–6). (b) The PPI interactome model of the CGN is a continuous network consisting of a total of 281 nodes of gene products/proteins and 793 edges corresponding to the functional connectivities between nodes. The CGN is constructed using SPPICS > 0.60, with SPPICS shown as the widths of the edges, which are adjusted to 0.5–5 pts. by the ‘continuous mapping of edge width’ in the edge network style of Cytoscape. The coloured nodes (inset) represent 22 ‘target genes’ having both ‘HB and driver’ properties, including five ‘query genes’ derived from the TN (green nodes) and 17 ‘co-expressed genes’ (dodger blue) derived from RNA-Seq data of COVID-19 patients. The other nodes of the CGN are kept white with grey borders (2 pts.). The sizes of the nodes indicate their connectivities (the higher the value, the larger the node), adjusted by the ‘continuous mapping of node size’ in the range between 25 and 60 pts. for the lowest and highest node sizes, respectively. (c) The Venn diagrams indicate the changes in the number of ‘target genes’ (both ‘HB and driver’), with their names, derived from CGNs developed using multiple SPPICS thresholds (> 0.60, > 0.70, > 0.80 and > 0.90) to include all possible ‘target genes’ for better interpretation. (d and e) Heat maps represent the pairwise SSS values (0–1 with colour codes) of the 22 ‘target genes’ of the CGN. The values of SSS-I (d) and SSS-II (e) are direct and indirect associations (vide ‘Methodology’ section), respectively. All ‘target genes’ in the CGN have the ‘HB and driver’ topological properties of centrality measurements. The colours of the vertical bars (left side of each heat map) indicate the types (vide inset) of ‘target genes’.
The summary of classifier ROC-AUC statistics (threshold scores, accuracy scores in %, AUC scores) of the SSSs is presented adjacent to the colour bar.
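The three classifier statistics reported next to the colour bars (threshold score, accuracy and AUC) can be computed from a set of scores and binary labels as in the sketch below. The caption does not state how the optimal threshold was chosen or how positives were labelled, so the use of Youden's J statistic and the toy labels are assumptions made for illustration.

```python
def roc_summary(scores, labels):
    """Return (optimal_threshold, accuracy_at_threshold, auc).

    labels: 1 = positive, 0 = negative. A score >= threshold is called positive.
    Assumption: the optimal threshold maximises Youden's J (TPR + TNR - 1).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # AUC as the probability that a positive outscores a negative (ties count half).
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    best = None
    for t in sorted(set(scores)):
        tp = sum(s >= t for s in pos)
        tn = sum(s < t for s in neg)
        j = tp / len(pos) + tn / len(neg) - 1
        acc = (tp + tn) / len(scores)
        # Strict comparison keeps the lowest threshold among ties.
        if best is None or j > best[0]:
            best = (j, t, acc)
    return best[1], best[2], auc

# Hypothetical toy scores and labels (not data from the study).
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
```

For the toy input, `roc_summary(toy_scores, toy_labels)` gives a threshold of 0.35 with 75% accuracy and an AUC of 0.75.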
Summary of pairwise ‘candidate genes’ classified as ‘prevalent’ or ‘non-prevalent’, with the status of their functional links and their interaction scores (SSS-I, SSS-II, SPPICS and WHMS).
| S. no | Gene-pair | SSS-I | SSS-II | SPPICS | WHMS |
|---|---|---|---|---|---|
| 1 | | (0.54) Is_a | (0.81) Is_a | (0.97) Strong | 0.62 |
| 2 | | (0.56) Is_a | (0.77) Is_a | (0.86) Strong | 0.61 |
| 3 | | (0.49) Is_a | (0.79) Is_a | (0.99) Strong | 0.60 |
| 4 | | (0.48) Is_a | (0.79) Is_a | (0.99) Strong | 0.59 |
| 5 | | (0.47) Is_a | (0.80) Is_a | (0.99) Strong | 0.59 |
| 6 | | (0.48) Is_a | (0.83) Is_a | (0.84) Strong | 0.58 |
| 1 | | (0.54) Is_a | (0.71) Is_a | (0.80) Strong | 0.57 |
| 2 | | (0.47) Is_a | (0.68) Part_of | (0.99) Strong | 0.56 |
| 3 | | (0.51) Is_a | (0.79) Is_a | (0.70) Weak | 0.56 |
| 4 | | (0.50) Is_a | (0.78) Is_a | (0.74) Weak | 0.55 |
| 5 | | (0.46) Is_a | (0.77) Is_a | (0.83) Strong | 0.55 |
| 6 | | (0.48) Is_a | (0.73) Is_a | (0.81) Strong | 0.55 |
| 7 | | (0.43) Is_a | (0.89) Is_a | (0.82) Strong | 0.55 |
| 8 | | (0.45) Is_a | (0.73) Is_a | (0.80) Strong | 0.53 |
| 9 | | (0.44) Is_a | (0.70) Part_of | (0.83) Strong | 0.53 |
| 10 | | (0.45) Is_a | (0.64) Part_of | (0.89) Strong | 0.53 |
| 11 | | (0.42) Is_a | (0.80) Is_a | (0.78) Strong | 0.52 |
| 12 | | (0.44) Is_a | (0.79) Is_a | (0.71) Weak | 0.52 |
| 13 | | (0.40) Is_a | (0.86) Is_a | (0.75) Weak | 0.52 |
| 14 | | (0.45) Is_a | (0.85) Is_a | (0.61) Weak | 0.51 |
| 15 | | (0.48) Is_a | (0.67) Part_of | (0.65) Weak | 0.50 |
The prevalent pairwise ‘candidate genes’ are the gene-pairs with WHMS values above the cut-off value of 0.57 (vide Fig. 5b and f). The terms ‘Is_a’ (subtype) and ‘Part_of’ (component) represent the functional associations of GO-based terms organised hierarchically (ancestor-descendant relationships) in a directed acyclic graph (DAG). ‘Is_a’ and ‘Part_of’ are assigned to gene-pairs with scores greater than and less than, respectively, the cut-off values of SSS-I (viz. 0.40) and SSS-II (viz. 0.71). The terms ‘Strong’ and ‘Weak’ represent the link strength of gene-pairs with SPPICS values greater than and less than, respectively, the cut-off value (viz. 0.77). The cut-off values (optimal threshold scores of the ROC) are given in Fig. 5b.
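The labelling rules in this footnote can be written as a short sketch. The cut-off values are the ones quoted above; the function names are hypothetical, and because the tabulated scores are rounded to two decimals, values that sit exactly on a cut-off are ambiguous and are treated here as not exceeding it. The general weighted harmonic mean is included only to show the form of such a score: the weights actually used for WHMS are not given in this section, so the `weights` argument is a placeholder.

```python
CUTOFFS = {"SSS-I": 0.40, "SSS-II": 0.71, "SPPICS": 0.77, "WHMS": 0.57}

def weighted_harmonic_mean(values, weights):
    """General weighted harmonic mean: sum(w) / sum(w / v), for positive values.

    Assumption: the weights are placeholders, not the study's weights.
    """
    if any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs strictly positive values")
    return sum(weights) / sum(w / v for v, w in zip(values, weights))

def annotate_pair(sss1, sss2, sppics, whms):
    """Apply the footnote's cut-offs to one gene-pair's scores."""
    return {
        "SSS-I": "Is_a" if sss1 > CUTOFFS["SSS-I"] else "Part_of",
        "SSS-II": "Is_a" if sss2 > CUTOFFS["SSS-II"] else "Part_of",
        "SPPICS": "Strong" if sppics > CUTOFFS["SPPICS"] else "Weak",
        "prevalent": whms > CUTOFFS["WHMS"],
    }

# Scores of the first and the last rows of the table above.
first_row = annotate_pair(0.54, 0.81, 0.97, 0.62)
last_row = annotate_pair(0.48, 0.67, 0.65, 0.50)
```

Applied to the first table row, the sketch returns ‘Is_a’, ‘Is_a’, ‘Strong’ and a prevalent pair; applied to the last row, it returns ‘Is_a’, ‘Part_of’, ‘Weak’ and a non-prevalent pair, matching the tabulated labels.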
Figure 5. The ‘candidate genes’ and their functional modules for neurological manifestations of COVID-19: (a) Bubble plot: Results of enrichment analysis against four sets of genes (‘target genes’ and their connected genes), showing nervous system-specific functional annotations/terms (X-axis) across six resources, characterised by ‘gene count’ (size of bubbles adjusted to 0.5–4 pts.), ‘combined score’ (log(p-value) × z-value > 147 adjusted in VIBGYOR colour gradient of bubbles) and corresponding ‘gene ratio’ (Y-axis). (b) Box plot: Data (X-axis) of interaction score types (Y-axis), i.e., SSS-I, SSS-II, SPPICS and WHMS, of the ‘candidate genes’. (c) Checkerboard: ‘Candidate genes’ (Y-axis) with the status of their topological properties in the networks (vertical bars), associated with 54 enriched functional annotation terms (X-axis) evaluated under six different ontology categories (horizontal bar) and manually curated from the results of the bubble plot (a). (d) Heat map: Pairwise GO-BP SSS-II values (0.23–1.0 with colour codes) of 40 enriched terms with their categories (colours in horizontal bar), overrepresented with at least one ‘candidate gene’. (e) Box plot: Data of % changes (Y-axis) of topological properties (X-axis) in the CGN using the leave-one-out method on the ‘candidate genes’ to cross-validate disease-causing ‘indispensable’ driver nodes in the network. (f) Pictorial presentation (developed in Cytoscape software): the interactome model of the 21 ‘candidate genes’ and their seven associated functional categories of Enrichr terms related to neurological manifestations of COVID-19. Lines represent gene-pair interactions (solid lines: width as WHMS adjusted to 3–10 pts.), with the ‘prevalent’ links (dark solid lines) of ‘candidate genes’ having WHMS above the cut-off value (given in b), and functional links (dotted lines: width adjusted to 2 pts.) of genes with the categories (given in d) of Enrichr terms.
[Box plot (b and e) summary: Data with quartile values (edges of box), inter-quartile range, median value (black vertical line inside box), mean value (grey filled circle in box), maximum and minimum values (two vertical lines), and their outliers; Classifier statistics (ROC-AUC) summary: optimal threshold score (accuracy score) AUC score (vide: right side of box plots in b; adjacent to the colour bar in d); Codes of Enrichr annotation terms (c and d): ‘D15,D22,D23(a)’ and ‘D15,D22,D23(b)’ as ‘Lateral sclerosis’ and ‘Amyotrophic lateral sclerosis’ respectively.]
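The leave-one-out analysis in panel (e) can be illustrated with a minimal sketch: each node is removed in turn and the percentage change of a network-level property is recorded. The caption does not list the topological properties that were tracked, so the two metrics below (mean degree and size of the largest connected component), the helper names and the toy network are assumptions.

```python
def mean_degree(adj):
    """Average number of neighbours per node."""
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def largest_component(adj):
    """Size of the largest connected component (iterative depth-first search)."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best

def remove_node(adj, node):
    """Copy of the network without `node` and its incident edges."""
    return {u: nbrs - {node} for u, nbrs in adj.items() if u != node}

def leave_one_out(adj, metric):
    """Percentage change of `metric` after removing each node in turn."""
    base = metric(adj)
    return {n: 100.0 * (metric(remove_node(adj, n)) - base) / base for n in adj}

# Hypothetical toy network (undirected, as a dict of neighbour sets).
toy = {"h": {"a", "b", "c"}, "a": {"h", "b"}, "b": {"h", "a"}, "c": {"h"}}
changes = leave_one_out(toy, largest_component)
```

In the toy network, removing the hub `h` halves the largest component, whereas removing the leaf `c` shrinks it by a quarter; nodes whose removal causes the largest changes would be the candidates for ‘indispensable’ nodes under this reading.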