Sungyoung Lee1, Sungkyoung Choi1, Young Jin Kim2, Bong-Jo Kim2, Heungsun Hwang3, Taesung Park4. 1. Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 151-747, Korea. 2. Center for Genome Science, National Institute of Health, Osong Health Technology Administration Complex, Chungcheongbuk-Do 363-951, Korea. 3. Department of Psychology, McGill University, Montreal, QC H3A 1B1, Canada. 4. Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 151-747, Korea; Department of Statistics, Seoul National University, Seoul 151-747, Korea.
Abstract
MOTIVATION: To address the 'missing heritability' issue, many statistical methods for pathway-based analysis of rare variants have been proposed, but they analyze pathways individually. Neglecting correlations between multiple pathways can yield misleading results, and pathway-based analyses of large-scale genetic datasets impose a massive computational burden. We propose a Pathway-based approach using HierArchical components of collapsed RAre variants Of High-throughput sequencing data (PHARAOH) for the analysis of rare variants. PHARAOH constructs a single hierarchical model that consists of collapsed gene-level summaries and pathways, and it analyzes all pathways simultaneously by imposing ridge-type penalties on both the gene and the pathway coefficient estimates. Hence, our method accounts for the correlation between pathways without being constrained by the multiple testing problem.
RESULTS: In simulation studies, the proposed method showed higher statistical power than existing pathway-based methods. In addition, we applied our method to large-scale whole-exome sequencing data with levels of a liver enzyme, using two well-known pathway databases, Biocarta and KEGG. This application demonstrated that our method not only identified associated pathways but also detected biologically plausible pathways for the phenotype of interest. These findings were replicated in an independent large-scale exome chip study.
AVAILABILITY AND IMPLEMENTATION: An implementation of PHARAOH is available at http://statgen.snu.ac.kr/software/pharaoh/
CONTACT: tspark@stats.snu.ac.kr
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
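The two-level structure described in the abstract (rare variants collapsed into gene-level summaries, then gene and pathway coefficients estimated jointly under ridge-type penalties) can be illustrated with a small sketch. This is not the PHARAOH implementation: the burden-style collapsing rule, the alternating closed-form ridge updates, the 0/1/2 genotype coding, the continuous phenotype, and every function and parameter name below (`collapse_rare_variants`, `fit_hierarchical_ridge`, `maf_threshold`, `lam_gene`, `lam_pathway`) are assumptions made for illustration only.

```python
import numpy as np


def collapse_rare_variants(genotypes, gene_of_variant, n_genes, maf_threshold=0.01):
    """Sum minor-allele counts of rare variants within each gene.

    Assumes `genotypes` is an (individuals x variants) array coded 0/1/2
    and `gene_of_variant[j]` is the gene index of variant j (hypothetical layout).
    """
    maf = genotypes.mean(axis=0) / 2.0
    summaries = np.zeros((genotypes.shape[0], n_genes))
    for j in np.flatnonzero(maf < maf_threshold):
        summaries[:, gene_of_variant[j]] += genotypes[:, j]
    return summaries


def ridge_solve(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam * I)^-1 X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


def fit_hierarchical_ridge(gene_summaries, pathways, y,
                           lam_gene=1.0, lam_pathway=1.0, n_iter=50):
    """Alternate ridge updates of gene weights and pathway coefficients.

    `pathways` is a list of lists of gene indices. Each pathway component is
    a weighted sum of its genes; the phenotype is regressed on all pathway
    components at once, so pathways are fitted jointly rather than one by one.
    """
    X = gene_summaries - gene_summaries.mean(axis=0)
    yc = y - y.mean()
    sizes = [len(genes) for genes in pathways]
    beta = np.ones(len(pathways))  # start from equal pathway effects
    for _ in range(n_iter):
        # Gene-level step: with beta fixed, y is linear in the stacked gene weights.
        Z = np.column_stack([X[:, genes] * b for genes, b in zip(pathways, beta)])
        weights = np.split(ridge_solve(Z, yc, lam_gene), np.cumsum(sizes)[:-1])
        # Pathway-level step: with weights fixed, y is linear in beta.
        F = np.column_stack([X[:, genes] @ w for genes, w in zip(pathways, weights)])
        beta = ridge_solve(F, yc, lam_pathway)
    return weights, beta
```

Because every pathway component enters one regression, correlated pathways share the fit instead of each being tested in isolation, which is the property the abstract emphasizes. In practice the two penalty parameters would need to be tuned, for example by cross-validation; the defaults above are placeholders.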