
Penalized logistic regression for high-dimensional DNA methylation data with case-control studies.

Hokeun Sun, Shuang Wang.

Abstract

MOTIVATION: DNA methylation is a molecular modification of DNA that plays crucial roles in the regulation of gene expression. In particular, CpG-rich regions are frequently hypermethylated in cancer tissues but not methylated in normal tissues. However, compared with microarray gene expression, there is little methodological literature on case-control association studies for high-dimensional DNA methylation data. One key feature of DNA methylation data is a grouped structure among CpG sites from a gene, which are possibly highly correlated. In this article, we proposed a penalized logistic regression model for correlated DNA methylation CpG sites within genes from high-dimensional array data. Our regularization procedure is based on a combination of the l1 penalty and a squared l2 penalty on degree-scaled differences of coefficients of CpG sites within one gene, so it induces both sparsity and smoothness with respect to the correlated regression coefficients. We combined the penalized procedure with a stability selection procedure so that a selection probability is provided for each regression coefficient, which helps us make a stable and confident selection of methylation CpG sites that are possibly truly associated with the outcome.
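As a rough illustration of the penalty described above, the following sketch evaluates an l1 term plus a squared l2 term on degree-scaled coefficient differences. The function name `network_penalty`, the tuning parameters `lam` and `alpha`, the way the two terms are weighted, and the choice of which CpG sites within a gene are linked (`edges`) are all assumptions made for illustration; the abstract does not specify them, and the constants used in the article may differ.

```python
from math import sqrt


def network_penalty(beta, edges, lam=1.0, alpha=0.5):
    """Hypothetical penalty: L1 term plus squared-L2 term on
    degree-scaled differences of linked coefficients.

    beta  : list of regression coefficients, one per CpG site
    edges : list of (u, v) index pairs linking sites within the same gene
            (which pairs are linked is an assumption of this sketch)
    lam   : overall penalty strength (assumed name)
    alpha : mixing weight between sparsity and smoothness (assumed name)
    """
    # Degree of a site = number of edges that touch it.
    degree = [0] * len(beta)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Sparsity term: sum of absolute coefficient values.
    l1 = sum(abs(b) for b in beta)

    # Smoothness term: squared differences between linked coefficients,
    # each coefficient divided by the square root of its degree.
    smooth = sum(
        (beta[u] / sqrt(degree[u]) - beta[v] / sqrt(degree[v])) ** 2
        for u, v in edges
    )
    return lam * (alpha * l1 + (1.0 - alpha) * smooth)


# Three CpG sites in one gene, assumed fully connected.
gene_edges = [(0, 1), (0, 2), (1, 2)]
equal = network_penalty([1.0, 1.0, 1.0], gene_edges)   # smoothness term is zero
uneven = network_penalty([1.0, 0.0, 0.0], gene_edges)  # smoothness term is positive
```

With equal coefficients inside a gene, only the l1 term contributes, which is the sense in which a penalty of this form favors smooth coefficients among linked sites while still shrinking them toward zero.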
RESULTS: Using simulation studies, we demonstrated that the proposed procedure outperforms existing mainstream regularization methods such as lasso and elastic-net when the data are correlated within a group. We also applied our method to identify important CpG sites and corresponding genes for ovarian cancer from over 20 000 CpGs generated from the Illumina Infinium HumanMethylation27K BeadChip. Some of the genes identified are potentially associated with cancers.
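The stability selection idea mentioned in the abstract, a selection probability for each coefficient, can be sketched generically as repeated refitting on random subsamples. This is a minimal sketch, not the article's implementation: the function `selection_probabilities`, its parameters, and the half-sample default are assumptions, and `select` is a placeholder for any penalized fit (lasso, elastic-net, or the proposed penalty) that returns the indices of nonzero coefficients.

```python
import random


def selection_probabilities(X, y, select, n_subsamples=100, frac=0.5, seed=0):
    """Estimate how often each feature is selected across random subsamples.

    X      : list of rows, each a list of feature values (e.g. CpG sites)
    y      : list of 0/1 case-control labels
    select : callable (X_sub, y_sub) -> iterable of selected column indices;
             a stand-in for a penalized logistic regression fit (assumption)
    frac   : fraction of samples drawn per subsample (assumed default 0.5)
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    m = max(1, int(frac * n))
    counts = [0] * p
    for _ in range(n_subsamples):
        # Draw a subsample of rows without replacement.
        idx = rng.sample(range(n), m)
        chosen = set(select([X[i] for i in idx], [y[i] for i in idx]))
        for j in chosen:
            counts[j] += 1
    # Selection probability = fraction of subsamples in which a feature was kept.
    return [c / n_subsamples for c in counts]


# Toy usage with a hypothetical selector that always keeps feature 0.
X_toy = [[0.0, 1.0], [1.0, 0.0], [0.0, 0.0], [1.0, 1.0]]
y_toy = [0, 1, 0, 1]
probs = selection_probabilities(X_toy, y_toy, lambda Xs, ys: [0], n_subsamples=10)
```

Features whose estimated probability stays high across subsamples would be the ones reported as stable selections; any threshold applied to these probabilities is a separate choice that this sketch leaves open.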

Year:  2012        PMID: 22467913      PMCID: PMC3348559          DOI: 10.1093/bioinformatics/bts145

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


