Stephen Bates, Matteo Sesia, Chiara Sabatti, Emmanuel Candès.
Abstract
We introduce a method to draw causal inferences (inferences immune to all possible confounding) from genetic data that include parents and offspring. Causal conclusions are possible with these data because the natural randomness in meiosis can be viewed as a high-dimensional randomized experiment. We make this observation actionable by developing a conditional independence test that identifies regions of the genome containing distinct causal variants. The proposed digital twin test compares an observed offspring to carefully constructed synthetic offspring from the same parents to determine statistical significance, and it can leverage any black-box multivariate model and additional nontrio genetic data to increase power. Crucially, our inferences are based only on a well-established mathematical model of recombination and make no assumptions about the relationship between the genotypes and phenotypes. We compare our method to the widely used transmission disequilibrium test and demonstrate enhanced power and localization.
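The core idea, comparing an observed offspring with synthetic offspring drawn from the same parents, can be illustrated with a small conditional randomization test. The sketch below is not the authors' implementation. It assumes unlinked SNPs (each parent transmits each allele independently, so recombination structure is ignored), uses a simple covariance statistic in place of a black-box model, and the names `simulate_offspring` and `digital_twin_pvalue` are hypothetical.

```python
import numpy as np


def simulate_offspring(mother, father, rng):
    """Draw one synthetic offspring per trio by Mendelian transmission.

    mother, father: integer arrays of shape (n_trios, n_snps, 2) holding the two
    parental haplotypes (0/1 alleles). Simplifying assumption: SNPs are
    unlinked, so each parent transmits one of its two alleles independently
    at every SNP. Returns genotypes coded 0/1/2, shape (n_trios, n_snps).
    """
    n, p, _ = mother.shape
    pick_m = rng.integers(0, 2, size=(n, p, 1))
    pick_f = rng.integers(0, 2, size=(n, p, 1))
    from_m = np.take_along_axis(mother, pick_m, axis=2)[..., 0]
    from_f = np.take_along_axis(father, pick_f, axis=2)[..., 0]
    return from_m + from_f


def digital_twin_pvalue(offspring, mother, father, y, group, n_twins=200, seed=0):
    """Monte Carlo p-value for H0: the trait is independent of the group's
    genotypes given the parents and the offspring's other SNPs."""
    rng = np.random.default_rng(seed)

    def stat(genos):
        # |covariance| between the group's total allele count and the trait.
        g = genos[:, group].sum(axis=1).astype(float)
        return abs(np.dot(g - g.mean(), y - y.mean())) / len(y)

    t_obs = stat(offspring)
    n_extreme = 0
    for _ in range(n_twins):
        twin = offspring.copy()
        # Redraw only the group; the twin matches the observed offspring elsewhere.
        twin[:, group] = simulate_offspring(mother, father, rng)[:, group]
        n_extreme += stat(twin) >= t_obs
    # Counting the observed statistic itself keeps the p-value valid.
    return (1 + n_extreme) / (1 + n_twins)


# Example: the trait depends on SNP 0 only.
rng = np.random.default_rng(1)
mother = rng.integers(0, 2, size=(500, 10, 2))
father = rng.integers(0, 2, size=(500, 10, 2))
offspring = simulate_offspring(mother, father, rng)
y = offspring[:, 0] + rng.normal(scale=0.5, size=500)
p_causal = digital_twin_pvalue(offspring, mother, father, y, group=[0])
p_null = digital_twin_pvalue(offspring, mother, father, y, group=[5])
print(p_causal, p_null)
```

Because the synthetic offspring differ from the observed one only through the randomness of transmission, any confounder that acts through the parents affects the observed and synthetic statistics alike.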
Keywords: causal discovery; conditional independence testing; false discovery rate (FDR); family-based association test (FBAT); transmission disequilibrium test (TDT)
Year: 2020 PMID: 32948695 PMCID: PMC7533659 DOI: 10.1073/pnas.2007743117
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. A visualization of the process of recombination on a single chromosome.
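Recombination on a single chromosome is commonly modeled by treating the inheritance path (which parental haplotype is copied at each SNP) as a two-state Markov chain. The sketch below assumes Haldane's model, in which crossovers occur as a Poisson process along the genetic map, so the switch probability between adjacent SNPs separated by d Morgans is r = (1 - e^{-2d})/2. The function names and the use of centimorgan positions as input are illustrative choices.

```python
import numpy as np


def haldane_switch_probs(cm_positions):
    """Switch probability between adjacent SNPs under Haldane's map function."""
    d = np.diff(np.asarray(cm_positions, dtype=float)) / 100.0  # cM -> Morgans
    return 0.5 * (1.0 - np.exp(-2.0 * d))


def meiosis(hap_a, hap_b, cm_positions, rng):
    """Simulate one meiosis and return (transmitted haplotype, inheritance path).

    path[j] = 0 means SNP j is copied from hap_a, 1 means from hap_b.
    The first SNP copies either haplotype with probability 1/2; the path then
    switches between SNPs j and j + 1 with probability r[j].
    """
    r = haldane_switch_probs(cm_positions)
    start = rng.integers(0, 2)
    switches = rng.random(len(r)) < r
    # Each switch flips the parity of the path.
    path = (start + np.concatenate(([0], np.cumsum(switches)))) % 2
    child = np.where(path == 0, hap_a, hap_b)
    return child, path


rng = np.random.default_rng(0)
cm = [0.0, 1.0, 5.0, 50.0, 100.0]
hap_a = np.zeros(5, dtype=int)
hap_b = np.ones(5, dtype=int)
child, path = meiosis(hap_a, hap_b, cm, rng)
print(path, child)
```

Choosing all-0 and all-1 parental haplotypes makes the transmitted haplotype equal to the inheritance path, which is a convenient way to see the crossover points.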
Fig. 2. A visualization of a digital twin. The gray shaded region represents the group being tested; the digital twin always matches the true offspring outside this region.
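Because a digital twin agrees with the observed offspring outside the group, one way to construct it is to resample the inheritance path inside the group given the path outside it. Under a Markov model of recombination, that conditional law depends only on the two flanking states (a Markov bridge), so it can be sampled with backward messages followed by forward sampling. This sketch assumes the inheritance path is observed, which in real data it is not (the paper refers to an HMM for this), and the name `twin_path` is hypothetical.

```python
import numpy as np


def transition(r):
    """2x2 transition matrix of the inheritance path for switch probability r."""
    return np.array([[1.0 - r, r], [r, 1.0 - r]])


def twin_path(path, switch_probs, start, stop, rng):
    """Resample path[start:stop] conditional on the path outside [start, stop).

    switch_probs[j] is the switch probability between SNPs j and j + 1, so it
    has length len(path) - 1. Outside the group the output equals the input.
    """
    n = len(path)
    new = np.array(path, copy=True)
    # beta[j, s] = P(observed right flank path[stop] | state s at SNP j).
    beta = np.ones((n, 2))
    if stop < n:
        beta[stop] = np.eye(2)[path[stop]]
        for j in range(stop - 1, start - 1, -1):
            beta[j] = transition(switch_probs[j]) @ beta[j + 1]
    # Forward pass: sample each group SNP given its left neighbour and the right flank.
    for j in range(start, stop):
        if j == 0:
            prior = np.array([0.5, 0.5])  # no left flank at the chromosome start
        else:
            prior = transition(switch_probs[j - 1])[new[j - 1]]
        weights = prior * beta[j]
        new[j] = rng.choice(2, p=weights / weights.sum())
    return new


rng = np.random.default_rng(0)
hap_a, hap_b = np.zeros(20, dtype=int), np.ones(20, dtype=int)
path = np.array([0] * 12 + [1] * 8)
r = np.full(19, 0.05)
new_path = twin_path(path, r, start=5, stop=15, rng=rng)
twin = np.where(new_path == 0, hap_a, hap_b)  # the twin's transmitted haplotype
print(path, new_path)
```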
Fig. 3. Results of the TDT in two populations. (Left and Center) Manhattan plots on chromosome 22, which contains the one true causal SNP, indicated with a dashed vertical line. The genome-wide significance threshold is shown with a gray horizontal line. The left panel shows an admixed population, whereas the center panel shows a British population. (Right) A plot of the absolute correlations between the causal SNP and the other SNPs, conditional on the parental haplotypes. The red solid and blue dot-dashed curves indicate a smoothed 90% quantile of the absolute correlation with the causal SNP across the chromosome, for the admixed and British populations, respectively.
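For reference, the textbook TDT for a biallelic SNP counts, among heterozygous parents, how often each allele is transmitted to affected offspring and compares the two counts with a McNemar-type chi-squared statistic. A minimal sketch, assuming the transmission counts have already been tallied from the trios; `tdt` is an illustrative name, and the 5 × 10^-8 threshold in the example is the conventional genome-wide level, assumed here rather than read from the figure.

```python
import math


def tdt(b, c):
    """Transmission disequilibrium test for one biallelic SNP.

    b: transmissions of allele A from heterozygous parents to affected offspring.
    c: transmissions of the other allele from heterozygous parents.
    Returns (chi-squared statistic with 1 df, p-value).
    """
    if b + c == 0:
        return 0.0, 1.0  # no informative (heterozygous) parents
    stat = (b - c) ** 2 / (b + c)
    # Chi-squared(1) survival function: P(Z^2 > x) = erfc(sqrt(x / 2)).
    pval = math.erfc(math.sqrt(stat / 2.0))
    return stat, pval


GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide level (assumption)
stat, pval = tdt(30, 10)
print(f"TDT = {stat:.1f}, p = {pval:.2e}, significant: {pval < GENOME_WIDE_ALPHA}")
```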
Fig. 4. A graphical depiction of the causal argument. (A) An external confounder can create an association between the offspring's genotype and the trait, even if there is no causal effect. (B) Conditional on the parental haplotypes, the external confounder is independent of the offspring's genotype. As a result, the confounder cannot be responsible for the remaining association between the genotype and the trait. Note that in our hypothesis test we also condition on an additional variable, which is omitted from the figure for simplicity.
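The argument in Fig. 4 can be checked numerically with a toy simulation. In this sketch, a binary confounder (for example, a population label) shifts both the allele frequency and the trait, while the genotype itself has no effect. The marginal genotype-trait correlation is clearly nonzero, but the part of the offspring genotype left after subtracting its Mendelian expectation given the parents is uncorrelated with the trait. All parameter values (allele frequencies 0.2 and 0.8, effect size 2, a single unlinked SNP) and the name `confounding_demo` are arbitrary assumptions.

```python
import numpy as np


def confounding_demo(n=20000, seed=0):
    """Return corr(x, y) and corr(x - E[x | parents], y) for simulated trios."""
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)               # confounder, e.g. population label
    freq = np.where(z == 1, 0.8, 0.2)            # allele frequency depends on z
    mother = rng.random((n, 2)) < freq[:, None]  # two alleles per parent
    father = rng.random((n, 2)) < freq[:, None]
    rows = np.arange(n)
    # Mendelian transmission: each parent passes one of its two alleles at random.
    x = (mother[rows, rng.integers(0, 2, size=n)].astype(int)
         + father[rows, rng.integers(0, 2, size=n)].astype(int))
    y = 2.0 * z + rng.normal(size=n)             # trait depends on z only
    expected_x = mother.mean(axis=1) + father.mean(axis=1)  # E[x | parents]
    residual = x - expected_x
    marginal = np.corrcoef(x, y)[0, 1]
    conditional = np.corrcoef(residual, y)[0, 1]
    return marginal, conditional


marginal, conditional = confounding_demo()
print(f"corr(x, y) = {marginal:.3f}; corr(x - E[x | parents], y) = {conditional:.3f}")
```

The residual carries only the meiotic randomness, which is independent of the confounder once the parents are fixed; this is the randomized experiment described in the abstract.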
Fig. 5. Power of the digital twin test compared to TDT benchmarks for testing the full-chromosome causal null.
Fig. 6. Performance of the digital twin test and TDT in the binary-response full-genome simulations. Here, error bars give one SD, and the dashed horizontal line indicates the nominal FDR level.
Fig. 7. Performance of the digital twin test and TDT in an admixed population. The dashed horizontal line (Top row) indicates the nominal FDR level for the digital twin test. Because the TDT uses the genome-wide significance level, its nominal FDR level is less than 0.05.
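Figs. 6 and 7 compare methods against a nominal FDR level. A standard way to target such a level from a list of p-values is the Benjamini-Hochberg step-up procedure, assumed here for illustration. The sketch also computes the false discovery proportion of a single run, whose average over simulation replicates estimates the FDR. Function names are illustrative.

```python
import numpy as np


def benjamini_hochberg(pvals, q=0.1):
    """Boolean mask of rejections from the Benjamini-Hochberg step-up rule."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest i with p_(i) <= q * i / m
        reject[order[: k + 1]] = True
    return reject


def false_discovery_proportion(reject, is_null):
    """Fraction of rejections that are true nulls (0 if nothing is rejected)."""
    n_rej = int(np.sum(reject))
    return np.sum(reject & is_null) / max(n_rej, 1)


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
is_null = np.array([False] * 2 + [True] * 8)
reject = benjamini_hochberg(pvals, q=0.05)
print(reject, false_discovery_proportion(reject, is_null))
```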
Analysis of ASD with the digital twin test at different resolutions
| Resolution | 1 Mb | 2 Mb | 3 Mb | 4 Mb | Full chromosome |
|---|---|---|---|---|---|
|  | 0.237 | 0.146 | 0.100 | 0.0168 | 0.011 |
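The resolutions in the table can be read as the widths of the genomic windows that define the groups being tested, with coarser resolutions giving larger groups. Under that reading (an assumption, as is the name `group_by_resolution`), SNPs can be assigned to groups by base-pair position as sketched below.

```python
import numpy as np


def group_by_resolution(positions_bp, width_bp):
    """Assign SNPs to contiguous windows of the given width (e.g., 1_000_000 for 1 Mb).

    Returns one integer group label per SNP; labels are renumbered 0, 1, 2, ...
    so that empty windows do not leave gaps.
    """
    pos = np.asarray(positions_bp)
    window = (pos - pos.min()) // width_bp
    _, labels = np.unique(window, return_inverse=True)
    return labels


positions = [100, 500_000, 1_200_000, 1_300_000, 3_500_000]
for width in (1_000_000, 2_000_000, 10**9):  # 10**9 bp acts as "full chromosome"
    print(width, group_by_resolution(positions, width).tolist())
```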
Digital twin test algorithm (outline): sample digital twins from the conditional distribution given by the recombination HMM, compute the digital twin statistics, and compare the observed statistic with them to obtain a p-value, randomly breaking any ties.
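Ranking the observed statistic among the digital twin statistics, with ties broken at random, can be sketched as follows. It assumes larger statistics indicate stronger evidence against the null, and `twin_rank_pvalue` is a hypothetical name. When the observed statistic and the K twin statistics are exchangeable under the null, random tie-breaking makes the p-value exactly uniform on {1/(K+1), ..., 1}, which matters when the statistic takes only a few distinct values.

```python
import numpy as np


def twin_rank_pvalue(t_obs, t_twins, rng):
    """Randomized p-value from the rank of t_obs among the twin statistics."""
    t_twins = np.asarray(t_twins, dtype=float)
    n_greater = int(np.sum(t_twins > t_obs))
    n_ties = int(np.sum(t_twins == t_obs))
    # Place the observed statistic uniformly at random within its block of ties.
    offset = int(rng.integers(0, n_ties + 1))
    rank = 1 + n_greater + offset  # 1 = most extreme
    return rank / (len(t_twins) + 1)


rng = np.random.default_rng(0)
print(twin_rank_pvalue(5.0, [1.0, 2.0, 3.0], rng))
print(twin_rank_pvalue(2.0, [2.0, 2.0, 2.0], rng))
```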