Marie Oestreich (1,2), Lisa Holsten (2,3), Shobhit Agrawal (2,4), Kilian Dahm (2,3), Philipp Koch (2,3), Han Jin (5), Matthias Becker (1,2), Thomas Ulas (2,3,4)
Abstract
MOTIVATION: Transcriptome-based gene co-expression analysis has become a standard procedure for the structured and contextualized understanding and comparison of different conditions and phenotypes. Since large study designs covering a broad variety of conditions are costly and laborious, extensive comparisons are hindered when only a single dataset is used. Thus, there is an increased need for tools that allow the integration of multiple transcriptomic datasets with subsequent joint analysis, which can provide a more systematic understanding of gene co-expression and co-functionality within and across conditions. To make such an integrative analysis accessible to a wide spectrum of users with differing levels of programming expertise, it is essential to provide user-friendliness and customizability as well as thorough documentation.
Year: 2022 PMID: 36018233 PMCID: PMC9563699 DOI: 10.1093/bioinformatics/btac589
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.931
Fig. 1. hCoCena overview. The main steps of the analysis backbone are shown in the centre. These functions are provided in the main markdown, together with descriptions and references to the satellite functions. The ‘orbit’ around the central steps illustrates the available satellite functions. These are not part of the main script and can be added to or left out of the analysis as desired. The user can also add custom functions to the pool of satellites. In general, the satellite functions form two groups: data exploration functions give a first impression of the data at hand, while the remaining functions are part of the module analysis and can only be applied once the co-expression modules have been detected in the main analysis (Botía )
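Fig. 1 describes a fixed analysis backbone plus optional satellite functions, where module-analysis satellites only make sense after module detection. The Python sketch below models that layout with a small registry. All names (`Analysis`, `satellite`, `run_satellite` and the toy steps) are hypothetical and do not reflect hCoCena's actual functions; the sketch only illustrates the dependency rule stated in the caption.

```python
from typing import Callable, Dict, List, Set


class Analysis:
    """Holds the shared state that backbone steps and satellites read and write."""

    def __init__(self) -> None:
        self.state: Dict[str, object] = {}
        self.log: List[str] = []

    def run_backbone(self, steps: List[Callable[["Analysis"], None]]) -> None:
        # Backbone steps always run, in the given order.
        for step in steps:
            step(self)
            self.log.append(step.__name__)


# Registry of optional satellite functions (the "orbit" in Fig. 1).
SATELLITES: Dict[str, Callable[[Analysis], None]] = {}
NEEDS_MODULES: Set[str] = set()


def satellite(name: str, needs_modules: bool = False):
    """Register a satellite; module-analysis satellites are flagged as needing modules."""
    def register(fn: Callable[[Analysis], None]) -> Callable[[Analysis], None]:
        SATELLITES[name] = fn
        if needs_modules:
            NEEDS_MODULES.add(name)
        return fn
    return register


def run_satellite(analysis: Analysis, name: str) -> None:
    """Run an optional satellite, refusing module-analysis ones before module detection."""
    if name in NEEDS_MODULES and "modules" not in analysis.state:
        raise RuntimeError(f"satellite '{name}' requires detected modules")
    SATELLITES[name](analysis)
    analysis.log.append(name)


# --- toy backbone steps and satellites (hypothetical) ---
def load_data(a: Analysis) -> None:
    a.state["expr"] = {"g1": [1.0, 2.0, 3.0], "g2": [2.0, 4.0, 6.0]}


def detect_modules(a: Analysis) -> None:
    a.state["modules"] = {"m1": ["g1", "g2"]}


@satellite("gene_count")
def gene_count(a: Analysis) -> None:  # data exploration satellite
    a.state["n_genes"] = len(a.state["expr"])


@satellite("module_sizes", needs_modules=True)
def module_sizes(a: Analysis) -> None:  # module analysis satellite
    a.state["sizes"] = {m: len(g) for m, g in a.state["modules"].items()}
```

In this sketch, `gene_count` can run right after `load_data`, whereas `run_satellite(a, "module_sizes")` raises an error until `detect_modules` has been run as part of the backbone.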
Fig. 2. hCoCena results for the two integrated macrophage activation assays. (A) Initial data exploration, exemplified here on the microarray data. Shown are the gene expression values per sample as boxplots (left) and a PCA of the samples coloured by treatment (right). (B) The cut-off selection interface provided to the user. It visualizes four network criteria (top to bottom: R², no. of edges, no. of nodes, no. of networks) for a series of correlation cut-off values. (C) The module heat-map (left) shows the co-expression modules detected in the integrated network (right) as rows and the defined sample groups as columns. The cell colouring indicates the GFC value of the respective sample group across the genes of the corresponding module. Numbers and bar plots on the right side indicate the module sizes. (D) Exemplary Hallmark Molecular Signatures and Gene Ontology enrichments of selected modules. Shown are the top 5 most enriched terms at an adjusted P-value cut-off of 0.1. Module names are highlighted according to stimulus specificity: IFN-γ in purple and interleukin-4 in green. (E) Scaled mean expression of the hub genes detected for the lightblue module confirms IFN-γ-specific expression patterns. (F) Top 5 transcription factors (TFs) with the most enriched targets in the lightblue module. TFs are highlighted in turquoise font; their targets are in black. Cell colours indicate the module each gene belongs to. Arcs indicate the TFs' known targets; saturated arc colours indicate that the corresponding edge is present in the integrated co-expression network.
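The cut-off selection in panel (B) can be illustrated with a small dependency-free Python sketch that tabulates the four criteria for several cut-offs. It assumes Pearson correlation, keeps only edges whose correlation exceeds the cut-off (positive correlations only), requires that no gene is constant, and takes R² as the goodness of a linear fit of log10 P(k) against log10 k, the scale-free criterion known from WGCNA. These choices, and the names `pearson`, `scale_free_r2`, `count_networks` and `cutoff_table`, are assumptions for illustration, not hCoCena's actual implementation.

```python
import math
from itertools import combinations
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equally long, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def scale_free_r2(degrees: List[int]) -> float:
    """R² of a linear fit of log10 P(k) on log10 k; NaN if fewer than two degree values."""
    counts: Dict[int, int] = {}
    for k in degrees:
        counts[k] = counts.get(k, 0) + 1
    if len(counts) < 2:
        return float("nan")
    total = len(degrees)
    xs = [math.log10(k) for k in counts]
    ys = [math.log10(c / total) for c in counts.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)  # squared correlation = R² of simple regression


def count_networks(nodes: List[str], adj: Dict[str, set]) -> int:
    """Number of connected components (separate networks) via depth-first search."""
    seen, count = set(), 0
    for start in nodes:
        if start in seen:
            continue
        count += 1
        stack = [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return count


def cutoff_table(expr: Dict[str, List[float]], cutoffs: List[float]) -> List[dict]:
    """For each cut-off: R², no. of edges, no. of nodes, no. of networks."""
    genes = sorted(expr)
    corr = {(g, h): pearson(expr[g], expr[h]) for g, h in combinations(genes, 2)}
    rows = []
    for c in cutoffs:
        adj: Dict[str, set] = {g: set() for g in genes}
        n_edges = 0
        for (g, h), r in corr.items():
            if r > c:
                adj[g].add(h)
                adj[h].add(g)
                n_edges += 1
        nodes = [g for g in genes if adj[g]]  # genes without edges are dropped
        rows.append({
            "cutoff": c,
            "R2": scale_free_r2([len(adj[g]) for g in nodes]),
            "edges": n_edges,
            "nodes": len(nodes),
            "networks": count_networks(nodes, adj),
        })
    return rows
```

In this sketch, raising the cut-off prunes weak edges, which shrinks the node and edge counts and can split the graph into more networks; the table makes that trade-off against R² visible for each candidate value.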
Fig. 3. (A) Comparison of different network packages with respect to general characteristics and included algorithms. All PubMed and Google Scholar searches were performed on October 15, 2021, with the exception of the GWENA search (December 17, 2021). ‘Multiple Correlation Algorithms’ refers to the options provided for analysing the correlation between genes. ‘Multiple Community Algorithms’ refers to the algorithms provided to detect community structures (‘clusters’/‘modules’) in the network. ‘Multiple Clustering Algorithms’ refers to the options provided to cluster data points, e.g. samples or sample groups.
a) Google Scholar search ‘cocena2’;
b) PubMed search ‘WGCNA’;
c) PubMed search ‘networkx’ (eight results, two of which do not refer to NetworkX);
d) PubMed search ‘igraph’;
e) Google Scholar search ‘CDlib (Community discovery library)’ since 2020; not all results refer to the CDlib tool;
f) PubMed search ‘BioNetStat’;
g) Google Scholar search ‘NetSimile’;
h) Google Scholar search ‘INfORM: Inference of NetwOrk Response Modules’ (both entries refer to the same paper);
i) PubMed search ‘CoNekT’;
j) Google Scholar search ‘VOLTA (advanced molecular network analysis)’;
k) Hamming distance can be calculated taking specific conditions into account;
l) in general, module detection is performed via unsupervised clustering; by default, a hierarchical clustering dendrogram with branch cutting is used;
m) the WGCNA pipeline can be supplemented by other algorithms such as k-means clustering (Botía );
n) examples of the implemented algorithms: Louvain and tree partitioning;
o) they can make use of the base R function cor();
p) clustering algorithms such as cluster_louvain, cluster_fast_greedy and cluster_edge_betweenness are included and used for community detection;
q) different clustering approaches (such as NodeClustering and FuzzyNodeClustering) are included for the standardized representation of community structures;
r) Google Scholar search ‘GWENA Lemoine’ (11 results in total, 3 of which could be associated with the discussed R package).
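Several of the compared tools detect modules with community algorithms such as Louvain (e.g. igraph's cluster_louvain, footnote p). Louvain is a greedy heuristic that maximizes Newman's modularity Q. The Python sketch below only scores a given partition with Q, to show what such algorithms optimize; it is not the Louvain search itself, and `modularity` is a hypothetical helper that assumes an undirected, unweighted graph without duplicate edges or self-loops.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def modularity(edges: Iterable[Tuple[Hashable, Hashable]],
               membership: Dict[Hashable, Hashable]) -> float:
    """Newman modularity Q = sum_c [L_c / m - (d_c / (2m))**2].

    L_c: number of edges inside community c,
    d_c: summed degree of the nodes in c,
    m:   total number of edges.
    """
    m = 0
    internal = defaultdict(int)    # L_c
    degree_sum = defaultdict(int)  # d_c
    for u, v in edges:
        m += 1
        cu, cv = membership[u], membership[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    if m == 0:
        raise ValueError("graph has no edges")
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single edge (3-4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
split = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
merged = {n: "A" for n in range(1, 7)}
print(modularity(edges, split))   # 5/14, about 0.357
print(modularity(edges, merged))  # 0.0
```

Putting every node into one community always gives Q = 0, so a positive Q for the two-triangle split indicates more within-community edges than expected at random, which is the signal a modularity-maximizing method like Louvain searches for.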
(B) Comparison of different network packages with respect to features, user-friendliness and connected databases.
a) A correlation between modules and provided numerical traits can be computed; further enrichment analysis can be performed via recommended online tools such as DAVID;
b) degree centrality of the graph components can be measured;
c) the connection strength of graph components can be calculated;
d) centrality analysis can be used to highlight key genes;
e) node importance can be calculated and used for evaluation;
f) the FuzzyNodeClustering function also generates a ‘node-community allocation probability matrix to keep track of the probabilistic component of the final non-overlapping partition’ (Rossetti );
g) the provided tutorials guide through complete analysis pipelines; the code chunks have to be adjusted manually;
h) Gene Ontology enrichment analysis can be performed directly within R using GO.db;
i) the required data are provided via the pathview R package;
j) INfORM can make use of the GSEABase package;
k) the GO enrichment API can be used ‘to perform enrichment against Reactome Pathways as well as GO or the Panther Protein class’ (https://github.com/fhaive/VOLTA/blob/master/jupyternotebooks/Example_of_Enrichment.ipynb);
l) PCA is not performed as in hCoCena, but the module eigengene, which is the first principal component of the expression matrix, can be calculated;
m) Spectral Coarse Graining can be performed; ‘(PCA) can be viewed as a particular SCG, called exact SCG, where the matrix to be coarse-grained is the covariance matrix of some dataset’ (igraph R manual pages);
n) PCA as in hCoCena cannot be performed, but eigenvectors can be calculated;
o) the graph/feature matrix can be projected into the principal component space via singular value decomposition.
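Footnotes l and o refer to the module eigengene (the first principal component of a module's expression matrix) and to SVD-based projection. The dependency-free Python sketch below computes such an eigengene by power iteration on XᵀX, which converges to the same top right singular vector an SVD would return, up to sign. Standardizing each gene to mean 0 and SD 1, aligning the sign with the mean standardized profile, and the names `standardize` and `module_eigengene` are assumptions for illustration; the sketch also assumes no gene in the module is constant.

```python
import math
import random
from typing import List


def standardize(row: List[float]) -> List[float]:
    """Scale one gene to mean 0 and sample SD 1 across samples (assumes non-constant)."""
    n = len(row)
    mu = sum(row) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in row) / (n - 1))
    return [(x - mu) / sd for x in row]


def module_eigengene(expr: List[List[float]], iters: int = 500, seed: int = 0) -> List[float]:
    """First principal component across samples of a genes x samples module matrix.

    Power iteration on X^T X yields the top right singular vector of the
    standardized matrix X (unit length, one value per sample).
    """
    X = [standardize(row) for row in expr]
    n_genes, n_samples = len(X), len(X[0])
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]  # random start vector
    for _ in range(iters):
        xv = [sum(x * s for x, s in zip(row, v)) for row in X]                          # X v
        w = [sum(X[g][s] * xv[g] for g in range(n_genes)) for s in range(n_samples)]  # X^T (X v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Sign convention (assumption): make the eigengene agree with the mean standardized profile.
    mean_profile = [sum(X[g][s] for g in range(n_genes)) / n_genes for s in range(n_samples)]
    if sum(a * b for a, b in zip(v, mean_profile)) < 0:
        v = [-x for x in v]
    return v


module = [
    [1, 2, 3, 4, 5, 6],
    [3, 5, 7, 9, 11, 13],
    [2, 2.1, 3.9, 4, 6.2, 5.9],
]
print([round(x, 3) for x in module_eigengene(module)])
```

Because an eigenvector is only defined up to sign, some fixed convention is needed before eigengenes can be compared between modules or correlated with traits; aligning with the average profile is one such choice.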