
Network-based regularization for matched case-control analysis of high-dimensional DNA methylation data.

Hokeun Sun, Shuang Wang.

Abstract

Matched case-control designs are commonly used to control for potential confounding factors in genetic epidemiology studies, especially epigenetic studies of DNA methylation. Compared with unmatched case-control studies with high-dimensional genomic or epigenetic data, few variable selection methods exist for matched sets. In an earlier paper, we proposed a penalized logistic regression model with a network-based penalty for the analysis of unmatched DNA methylation data. However, for the widely used matched designs in epigenetic studies, which compare DNA methylation between tumor and adjacent non-tumor tissues or between pre-treatment and post-treatment conditions, applying ordinary logistic regression while ignoring the matching is known to introduce serious bias in estimation. In this paper, we developed a penalized conditional logistic model with a network-based penalty that encourages a grouping effect of (1) linked Cytosine-phosphate-Guanine (CpG) sites within a gene or (2) linked genes within a genetic pathway for the analysis of matched DNA methylation data. In our simulation studies, we demonstrated the superiority of the conditional logistic model over the unconditional logistic model in high-dimensional variable selection problems for matched case-control data. We further investigated the benefits of utilizing biological group or graph information for matched case-control data. We applied the proposed method to a genome-wide DNA methylation study of hepatocellular carcinoma (HCC), in which the DNA methylation levels of tumor and adjacent non-tumor tissues from HCC patients were measured with the Illumina Infinium HumanMethylation27 BeadChip. The method identified several new CpG sites and genes that are known to be related to HCC but were missed by the standard method in the original paper.
Copyright © 2012 John Wiley & Sons, Ltd.
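
The kind of model the abstract describes can be illustrated with a minimal sketch. The abstract gives neither the exact penalty nor the fitting algorithm, so everything here is an assumption: 1:1 matched pairs (for example, tumor versus adjacent non-tumor tissue), a penalty of the form lam1 * ||beta||_1 + lam2 * beta' L beta with L a normalized graph Laplacian over linked CpG sites or genes, and a plain proximal gradient solver. The function names and the parameters `lam1` and `lam2` are hypothetical and are not the authors' implementation. For 1:1 pairs, the conditional likelihood depends only on the within-pair difference `diff[i] = x_case[i] - x_control[i]`, which removes the pair-specific intercepts.

```python
import math


def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix
    (list of lists, zero diagonal). Isolated nodes get an all-zero row."""
    p = len(adj)
    deg = [sum(row) for row in adj]
    lap = [[0.0] * p for _ in range(p)]
    for u in range(p):
        for v in range(p):
            if u == v:
                lap[u][v] = 1.0 if deg[u] > 0 else 0.0
            elif adj[u][v]:
                lap[u][v] = -adj[u][v] / math.sqrt(deg[u] * deg[v])
    return lap


def log1pexp(t):
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def objective(beta, diff, lap, lam1, lam2):
    """Mean negative conditional log-likelihood for 1:1 pairs plus the
    assumed penalty lam1 * ||beta||_1 + lam2 * beta' L beta."""
    n = len(diff)
    nll = sum(log1pexp(-dot(d, beta)) for d in diff) / n
    quad = dot(beta, [dot(row, beta) for row in lap])
    return nll + lam1 * sum(abs(b) for b in beta) + lam2 * quad


def fit_network_clogit(diff, lap, lam1, lam2, n_iter=500):
    """Proximal gradient descent (an illustrative solver choice)."""
    n, p = len(diff), len(diff[0])
    # Upper bound on the curvature of the smooth part: the logistic term is
    # bounded via the squared Frobenius norm of diff, and the eigenvalues of
    # a normalized Laplacian are at most 2.
    curvature = sum(x * x for d in diff for x in d) / (4.0 * n) + 4.0 * lam2
    step = 1.0 / max(curvature, 1e-12)
    beta = [0.0] * p
    for _ in range(n_iter):
        # w[i] = sigmoid(-diff[i] . beta), computed stably
        w = [math.exp(-log1pexp(dot(d, beta))) for d in diff]
        lap_beta = [dot(row, beta) for row in lap]
        new_beta = []
        for j in range(p):
            grad_j = (-sum(w[i] * diff[i][j] for i in range(n)) / n
                      + 2.0 * lam2 * lap_beta[j])
            z = beta[j] - step * grad_j
            # Soft-thresholding is the proximal step for the L1 term.
            new_beta.append(math.copysign(max(abs(z) - step * lam1, 0.0), z))
        beta = new_beta
    return beta
```

In this sketch the L1 term drives variable selection, while the Laplacian term shrinks the coefficients of linked features toward each other, which is one way to obtain the grouping effect mentioned in the abstract. Matched sets with more than one control per case would require the general conditional likelihood, which this sketch does not cover.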

Year: 2012; PMID: 23212810; PMCID: PMC4038397; DOI: 10.1002/sim.5694

Source DB: PubMed; Journal: Stat Med; ISSN: 0277-6715; Impact factor: 2.373


