Naijia Xiao, Aifen Zhou, Megan L Kempher, Benjamin Y Zhou, Zhou Jason Shi, Mengting Yuan, Xue Guo, Linwei Wu, Daliang Ning, Joy Van Nostrand, Mary K Firestone, Jizhong Zhou.
Abstract
Networks are vital tools for understanding and modeling interactions in complex systems in science and engineering, and direct and indirect interactions are pervasive in all types of networks. However, quantitatively disentangling direct and indirect relationships in networks remains a formidable task. Here, we present a framework, called iDIRECT (Inference of Direct and Indirect Relationships with Effective Copula-based Transitivity), for quantitatively inferring direct dependencies in association networks. Using copula-based transitivity, iDIRECT eliminates or ameliorates several challenging mathematical problems, including ill-conditioning, self-looping, and interaction strength overflow. On simulated benchmark data, iDIRECT showed high prediction accuracy. Applying iDIRECT to reconstruct gene regulatory networks in Escherichia coli also revealed considerably higher predictive power than the best-performing approaches in the DREAM5 (Dialogue on Reverse Engineering Assessment and Methods project, #5) Network Inference Challenge. In addition, applying iDIRECT to highly diverse grassland soil microbial communities responding to climate warming showed that the iDIRECT-processed networks differed significantly from the original networks, with considerably fewer nodes and links and lower connectivity, but higher relative modularity. Further analysis revealed that the iDIRECT-processed network was more complex under warming than under the control and more robust to both random and targeted species removal (P < 0.001). As a general approach, iDIRECT has great advantages for network inference and should be widely applicable for inferring direct relationships in association networks across diverse disciplines in science and engineering.
Keywords: climate change; direct relationship; indirect relationship; network analysis; systems biology
Year: 2022; PMID: 34992138; PMCID: PMC8764688; DOI: 10.1073/pnas.2109995119
Source DB: PubMed; Journal: Proc Natl Acad Sci U S A; ISSN: 0027-8424; Impact factor: 11.205
Fig. 1. Overview of iDIRECT. (A) An association network contains both direct (blue) and indirect (red) associations. Indirect associations include spurious links (solid lines) and overestimated direct links (dotted lines). (B) iDIRECT uses a copula-based addition ⊕ to combine associations between two nodes through different paths, ensuring that interaction strengths stay within the range [0, 1]. (C) iDIRECT introduces a transitivity matrix T (association between k and j excluding paths passing through i) and uses S to calculate the indirect association strength between i and j, eliminating spurious self-looping paths such as i–k–i–j. (D) iDIRECT uses nonlinear solvers to obtain the direct association strength of each link, without inverting the ill-conditioned association matrix. (E) Overall workflow for iDIRECT.
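The bounded combination described in panel B can be illustrated with a minimal sketch. The record does not give the formula for ⊕, so the code below substitutes the probabilistic sum a + b − ab (the dual of the product copula) purely as an assumed stand-in; it is not presented as iDIRECT's actual operator. The names `bounded_add` and `combine_paths` are hypothetical.

```python
from functools import reduce


def bounded_add(a: float, b: float) -> float:
    """Combine two association strengths so the result stays in [0, 1].

    ASSUMPTION: uses the probabilistic sum a + b - a*b as a stand-in;
    the source does not state the formula behind its copula-based addition.
    """
    if not (0.0 <= a <= 1.0 and 0.0 <= b <= 1.0):
        raise ValueError("association strengths must lie in [0, 1]")
    return a + b - a * b


def combine_paths(strengths):
    """Fold the strengths of several paths between two nodes into one value."""
    return reduce(bounded_add, strengths, 0.0)


if __name__ == "__main__":
    paths = [0.9, 0.8, 0.7]
    print(sum(paths))            # plain addition overflows past 1
    print(combine_paths(paths))  # approximately 0.994, still within [0, 1]
```

Plain addition of the three path strengths exceeds one, which is the kind of interaction strength overflow the abstract mentions; any operator that is closed on [0, 1] avoids it.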
Fig. 2. Performance of iDIRECT on simulated networks in comparison with other methods. (A–C) Synthetic networks with three distinct topologies. (A) Band-like: all nodes are connected to form a long, band-like structure. The dotted red line indicates the band. (B) Clustered: all nodes are grouped into several disjoint clusters. (C) Scale-free: the degree distribution of the nodes follows a power law. (D–F) Comparison of precision-recall (PR) curves across network topologies: band-like (D), clustered (E), and scale-free (F). Pearson’s correlation coefficients were used to calculate the association matrix. Red, iDIRECT; blue, ND; green, GS; and purple, PC. The numbers indicate the area under the PR curve (AUPR), which ranges from zero to one and represents the average precision as recall varies from zero to one.
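The AUPR described in this caption, read as average precision, can be sketched as follows. This is a generic step-wise average-precision calculation under the assumption that each candidate link has a predicted score and a true/false label; it is not claimed to be the exact evaluation code used for the figure. The function name `average_precision` is hypothetical.

```python
def average_precision(scores, labels):
    """Mean of the precision values measured at the rank of each true link.

    scores: predicted association strength per candidate link.
    labels: 1 (or True) if the link is a true direct link, else 0.
    Ties in score keep their input order because sorted() is stable.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_true = sum(1 for _, is_true in ranked if is_true)
    if total_true == 0:
        raise ValueError("need at least one true link")

    hits = 0
    precision_sum = 0.0
    for rank, (_, is_true) in enumerate(ranked, start=1):
        if is_true:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / total_true


if __name__ == "__main__":
    # True links ranked 1st and 3rd: precisions 1/1 and 2/3, mean about 0.833.
    print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

A ranking that places every true link ahead of every false one yields a value of one, the upper end of the range given in the caption.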
Fig. 3. Regulatory networks from the DREAM5 Network Inference Challenge. (A) In silico network score. iDIRECT (red) was compared with the original submissions (purple). ***P < 0.001. Note that the values for Spearman (2.26 × 10⁻⁵ for original and 2.90 × 10⁻³ for iDIRECT) are too small to show. (B) PR curve for the E. coli network. (C) Top 500 links in the E. coli network obtained by iDIRECT. Four modules with one principal hub are highlighted. Orange nodes represent transcription factors, and gray nodes represent regulated genes. Edge colors indicate the type of supporting evidence: cyan, evidence found in the literature; blue, a binding motif found in the promoter; orange, genes in the same operon or an anti-sigma factor; and gray, no information.
Fig. 4. Soil microbial networks in response to experimental warming. (A and B) Visualization of the microbial molecular ecological networks (MENs) under warming or control. n, number of nodes; m, number of edges; k, average connectivity; rm, relative modularity. Network nodes are colored by phylum; edges are colored by module membership. OTUs identified as module hubs or network hubs are labeled with numbers. (C) Robustness of iDIRECT-processed networks to species removal when 50% of the taxa were randomly removed. (D) Robustness of iDIRECT-processed networks to targeted taxa removal when four module hubs were removed. Error bars represent the SD of 100 repetitions of each simulation. Significant differences are indicated: ***P < 0.001. Detailed robustness simulation results for both the original and iDIRECT-processed networks are shown in .
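The random-removal experiment in panel C can be approximated with a short simulation. The sketch below is a simplified illustration, not the authors' procedure: it assumes robustness is the fraction of the original taxa that survive removal and still keep at least one link to another survivor, a definition the record itself does not state. The function name `robustness_random_removal` and its parameters are hypothetical.

```python
import random
import statistics


def robustness_random_removal(edges, fraction=0.5, repetitions=100, seed=0):
    """Return (mean, SD) of the share of taxa left connected after removal.

    ASSUMPTION: a taxon counts as remaining only if it was not removed and
    keeps at least one link to another non-removed taxon.
    """
    rng = random.Random(seed)
    nodes = sorted({node for edge in edges for node in edge})
    n_remove = int(len(nodes) * fraction)

    results = []
    for _ in range(repetitions):
        removed = set(rng.sample(nodes, n_remove))
        still_connected = set()
        for u, v in edges:
            if u not in removed and v not in removed:
                still_connected.update((u, v))
        results.append(len(still_connected) / len(nodes))
    return statistics.mean(results), statistics.stdev(results)


if __name__ == "__main__":
    # Toy ring network of 10 taxa; 50% removed at random, 100 repetitions.
    ring = [(i, (i + 1) % 10) for i in range(10)]
    mean, sd = robustness_random_removal(ring, fraction=0.5, repetitions=100)
    print(f"robustness = {mean:.3f} (SD {sd:.3f})")
```

Targeted removal, as in panel D, would replace the random sample with a fixed set of hub taxa; the toy ring here has no hubs, so only the random case is shown.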