Wujuan Zhong¹, Li Dong¹, Taylor B Poston², Toni Darville², Cassandra N Spracklen³, Di Wu¹,⁴, Karen L Mohlke³, Yun Li¹,³, Quefeng Li¹, Xiaojing Zheng¹,².
Abstract
Constructing regulatory networks from cross-sectional gene expression profiles is desirable but challenging. The directed acyclic graph (DAG) provides a general framework for inferring causal effects from observational data. However, most existing DAG methods assume that all nodes follow the same type of distribution, which prohibits joint modeling of continuous gene expression and categorical variables. We present a new mixed DAG (mDAG) algorithm to infer regulatory pathways from mixed observational data containing both continuous variables (e.g., gene expression) and categorical variables (e.g., categorical phenotypes or single nucleotide polymorphisms). Our method can identify upstream causal factors and downstream effectors closely linked to a variable and can generate hypotheses about the causal direction of regulatory pathways. We propose a new permutation method to test the conditional independence of variables of mixed types, which is the key component of mDAG. We also use L1 regularization in mDAG so that it can recover a large, sparse DAG from a limited sample size. Extensive simulations show that mDAG outperforms two well-known methods in recovering the true underlying DAG. We apply mDAG to a cross-sectional immunological study of Chlamydia trachomatis infection and successfully infer the regulatory network of cytokines. We also apply mDAG to a large cohort study, generating sensible hypotheses about the mechanisms underlying plasma adiponectin levels. The R package mDAG is publicly available from CRAN at https://CRAN.R-project.org/package=mDAG.
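The abstract names a permutation test for conditional independence between mixed-type variables as the key ingredient of mDAG but does not spell it out. The sketch below shows one standard way to build such a test when X is continuous, Y is categorical, and the conditioning variables Z are numeric. It is not necessarily the procedure implemented in the mDAG package. The assumptions are as follows. The statistic is the drop in residual sum of squares when dummy indicators for Y are added to a linear regression of X on Z. The null distribution comes from the Freedman-Lane scheme, which permutes the residuals of the reduced model. The name `perm_ci_test` is hypothetical, and any categorical conditioning variables would need to be dummy-coded before being passed as `z`.

```python
import numpy as np


def _ols(design, target):
    """Fit target ~ design by least squares; return (RSS, fitted values, residuals)."""
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    fitted = design @ beta
    resid = target - fitted
    return float(resid @ resid), fitted, resid


def perm_ci_test(x, y_cat, z=None, n_perm=999, seed=0):
    """Permutation p-value for H0: continuous x is independent of categorical y_cat given z.

    Hypothetical helper (not the mDAG package API). Statistic: drop in residual sum
    of squares when dummies for y_cat are added to the linear model x ~ z.
    Null distribution: Freedman-Lane, i.e. permute residuals of the reduced model.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y_cat = np.asarray(y_cat)
    n = x.shape[0]

    # Reduced design: intercept plus numeric conditioning variables, if any.
    cols = [np.ones(n)]
    if z is not None:
        cols.extend(np.asarray(z, dtype=float).reshape(n, -1).T)
    reduced = np.column_stack(cols)

    # Full design: add one-hot dummies for y_cat, dropping the first level.
    levels = np.unique(y_cat)
    if levels.size < 2:
        raise ValueError("y_cat needs at least two categories")
    dummies = [(y_cat == lv).astype(float) for lv in levels[1:]]
    full = np.column_stack([reduced] + dummies)

    def stat(target):
        # RSS(reduced) - RSS(full) >= 0 because the models are nested.
        return _ols(reduced, target)[0] - _ols(full, target)[0]

    observed = stat(x)
    _, fitted, resid = _ols(reduced, x)
    exceed = sum(
        stat(fitted + rng.permutation(resid)) >= observed for _ in range(n_perm)
    )
    # Add-one correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    z = rng.normal(size=200)
    y = rng.integers(0, 2, size=200)
    x = z + 3.0 * y + rng.normal(size=200)  # x depends on y even given z
    print(perm_ci_test(x, y, z, n_perm=199))  # a small p-value is expected
```

Permuting reduced-model residuals rather than the raw labels of Y preserves the X-Z relationship under the null. This is why Freedman-Lane is a common choice when the conditioning set is not empty. In a PC-style search, a test like this would be called once per candidate edge and conditioning set. Categorical-categorical pairs would need a different statistic, which is not shown here.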
Keywords: causal regulatory pathways; continuous and categorical variables; directed acyclic graphs; mixed observational data; regulatory network
Year: 2020 PMID: 32127796 PMCID: PMC7038820 DOI: 10.3389/fgene.2020.00008
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Sensitivity, specificity, and false discovery rate (FDR) of mDAG and two alternative methods, MMHC and CPC-stable, in simulation scenarios 1–8. (A) Scenario 1; (B) Scenario 2; (C) Scenario 3; (D) Scenario 4; (E) Scenario 5; (F) Scenario 6; (G) Scenario 7; (H) Scenario 8. The x-axis shows the performance measure (sensitivity, specificity, or FDR); the y-axis shows its value. “*” indicates that the measure for mDAG differs significantly from that of one of the two alternative methods (CPC-stable or MMHC). “**” indicates that it differs significantly from both CPC-stable and MMHC. Comparisons use the two-sample Wilcoxon test.
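The caption reports sensitivity, specificity, and FDR of the recovered DAG. The sketch below shows how these three numbers can be computed from a true and an estimated edge set. The paper's exact counting convention is not stated here. This sketch assumes directed edges over ordered pairs of distinct nodes, so a reversed edge counts as one false positive plus one false negative. It also treats FDR as 0 when no edges are estimated. The helper name `edge_metrics` is hypothetical. Per-replicate values for two methods could then be compared with a two-sample Wilcoxon rank-sum test, for example `scipy.stats.mannwhitneyu`.

```python
from itertools import permutations


def edge_metrics(true_edges, est_edges, n_nodes):
    """Sensitivity, specificity and FDR of an estimated DAG against the true DAG.

    Hypothetical helper. Edges are ordered (parent, child) pairs over nodes
    0..n_nodes-1; a reversed edge is one false positive plus one false negative.
    """
    true_set, est_set = set(true_edges), set(est_edges)
    all_pairs = set(permutations(range(n_nodes), 2))  # every possible directed edge
    tp = len(true_set & est_set)
    fn = len(true_set - est_set)
    fp = len(est_set - true_set)
    tn = len(all_pairs - true_set - est_set)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    fdr = fp / (tp + fp) if tp + fp else 0.0  # convention: no estimated edges -> FDR 0
    return sensitivity, specificity, fdr


if __name__ == "__main__":
    # True DAG 0 -> 1 -> 2; the estimate gets 0 -> 1 right but reverses 1 -> 2.
    print(edge_metrics([(0, 1), (1, 2)], [(0, 1), (2, 1)], n_nodes=3))
    # (0.5, 0.75, 0.5)
```

Scoring only the skeleton, which ignores orientation, would instead use unordered pairs (for example `frozenset` edges). Under that convention the reversed edge above would count as correct.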
Figure 2. Causal networks inferred by mDAG and two alternative methods from the human Chlamydia infection dataset. This mixed-type dataset consists of continuous variables (expression of 14 cervical cytokines and one covariate, cervical chlamydial load) and categorical variables (the binary disease outcome, endometrial infection: Endo+ vs. Endo-, and a binary covariate, gonorrhea coinfection). The arrows indicate the direction of causality. (A) mDAG; (B) MMHC; (C) CPC-stable. The dashed line in (A) separates cytokines connected to ascension (left) from cytokines disconnected from ascension (right).
Figure 3. Causal networks inferred from the Metabolic Syndrome in Men dataset. This mixed-type dataset consists of one categorical variable (genotypes of the index SNP at the ADIPOQ GWAS locus) and several continuous variables (expression levels of 21 genes and plasma adiponectin level, the disease trait). The arrows indicate the direction of causality. (A) mDAG; (B) MMHC; (C) CPC-stable.
Figure 4. Causal networks inferred from the Metabolic Syndrome in Men dataset. This mixed-type dataset consists of one categorical variable (genotypes of the index SNP at the ARL15 GWAS locus) and continuous variables (expression levels of 8 genes and adiponectin level, the disease trait). The arrows indicate the direction of causality. (A) mDAG; (B) MMHC; (C) CPC-stable.
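Figures 2 to 4 read causal direction off the arrows, and the abstract describes mDAG as identifying upstream causal factors and downstream effectors of a variable. The sketch below assumes an estimated graph exported as a list of (parent, child) pairs. This format is an assumption, since the record does not describe how the mDAG package returns its result. The sketch checks that the graph is acyclic with Kahn's algorithm and collects all nodes upstream (ancestors) or downstream (descendants) of a target. The helper names and the toy graph are hypothetical and are not the edges reported in the figures. Restricting the traversal to a single step would give only direct parents or children.

```python
from collections import defaultdict, deque


def topological_order(edges):
    """Kahn's algorithm on (parent, child) pairs; raises ValueError on a cycle."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))  # source nodes
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for c in children[node]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle; not a DAG")
    return order


def reachable(edges, start, upstream=False):
    """Nodes reachable from start along arrows (downstream) or against them (upstream)."""
    adj = defaultdict(list)
    for parent, child in edges:
        if upstream:
            adj[child].append(parent)
        else:
            adj[parent].append(child)
    seen, stack = set(), [start]
    while stack:  # iterative depth-first search
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


if __name__ == "__main__":
    # Hypothetical toy graph, not results from the paper.
    toy = [("SNP", "G1"), ("G1", "Trait"), ("G2", "Trait"), ("Trait", "G3")]
    print(topological_order(toy))
    print(sorted(reachable(toy, "Trait", upstream=True)))  # ['G1', 'G2', 'SNP']
    print(sorted(reachable(toy, "Trait")))  # ['G3']
```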