Penalized logistic regression based on L1/2 penalty for high-dimensional DNA methylation data.

Abstract

BACKGROUND: DNA methylation is a molecular modification of DNA that plays a vital role in regulating gene expression. In cancer tissues, 5'-C-phosphate-G-3' (CpG) rich regions are abnormally hypermethylated or hypomethylated, so it is useful to identify disease-related CpG sites with dedicated methods. CpG sites are highly correlated with each other within the same gene or the same CpG island.
OBJECTIVE: Based on this group effect, we propose an efficient and accurate method for selecting pathogenic CpG sites.
METHODS: Our method combines an L1/2 regularized solver with a central node fully connected network to penalize a group-constrained logistic regression model. Consequently, both sparsity and a group effect are imposed on the correlated regression coefficients.
RESULTS: Extensive simulation studies were used to compare the proposed approach with existing mainstream regularization methods in terms of classification accuracy and stability. The simulation results show greater predictive accuracy than previous methods. Furthermore, our method was applied to over 20000 CpG sites and verified using ovarian cancer data generated from the Illumina Infinium HumanMethylation 27K BeadChip. On the real dataset, the indicators of predictive accuracy were higher than those of previous methods, and more of the selected CpG sites lie in genes confirmed to be pathogenic. Additionally, the total number of CpG sites selected was smaller than with other methods.
CONCLUSION: The proposed method offers an advanced tool to researchers in DNA methylation and can be a powerful tool for recognizing pathogenic CpG sites.

Keywords: CpG island; DNA methylation; L1/2 regularization method; gene regulatory network; variable selection

Year: 2020 PMID: 32364148 PMCID: PMC7369078 DOI: 10.3233/THC-209016

Source DB: PubMed Journal: Technol Health Care ISSN: 0928-7329 Impact factor: 1.285

Introduction

DNA methylation occurs at cytosine residues and modifies the DNA molecule. Through this process, gene expression can be regulated without changing the DNA sequence. In particular, gene silencing associated with DNA methylation is a well-accepted epigenetic mechanism that often occurs at tumor suppressor gene loci in human cancers [1, 2, 3, 4, 5]. Recently, high-throughput DNA methylation platforms, mostly based on genotyping bisulfite-converted DNA, have generated large amounts of DNA methylation data. In this paper, one of the popular platforms, the Illumina Infinium HumanMethylation 27K array, was used. The β-values indicate the methylation status of the CpG sites on the array, and each site’s value is calculated as the average of approximately 30 replicates [6]. Every individual β-value is a continuous variable between 0 and 1, where zero means unmethylated and one means methylated. To date, researchers have selected methylated sites with statistical classification approaches [7, 8, 9]. Even though most CpG sites display some degree of methylation, only a few gene expressions change. Such statistical approaches therefore struggle to find the relevant CpG sites in high-dimensional data, which makes them poorly suited to methylation data. In order to select CpG sites, researchers have used different parametric models to represent the diverse status of the samples [10]. Methylation data have different features from gene expression data. Firstly, DNA methylation data show a group effect among CpG sites, based on gene groups and CpG island groups. Secondly, DNA methylation values range between 0 and 1. Based on these features, Sun and Wang [11] proposed a procedure that combines the L1 penalty and a squared L2 penalty to select methylated CpG sites. On the Illumina HumanMethylation 27K array, each gene has about 1–25 correlated CpG sites and each CpG island has about 2–11 CpG sites.
Based on these aspects of DNA methylation data, a penalized logistic regression model has been introduced to select potentially pathogenic CpG sites within one gene. The L1/2 regularization can be taken as a representative of Lq (0 < q < 1) regularization and has exhibited properties such as unbiasedness, sparsity and the oracle property [12]. Additionally, L1/2 regularization yields sparser solutions than L1 regularization [12, 13, 14]. Based on the penalized logistic regression model, we used the proposed network structure (the central node fully connected network) to describe two patterns of correlated CpG sites: one is based on the gene group, whilst the other is based on the CpG island group. The proposed method is designed to select disease-associated CpG sites through the group effect. It aims at a finer specificity than present methods; for instance, it has the potential to select more relevant genes.

Methods

Network-regularization

In this research, $n$ samples $D = \{(X_1, y_1), (X_2, y_2), \ldots, (X_n, y_n)\}$ were used, where $X_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^{T}$ holds the methylation β-values of the $i$-th sample, $p$ is the total number of CpG sites, and the dependent variable $y_i$ is binary, where 0 denotes controls and 1 denotes cases. The logistic regression is:

$$P(y_i = 1 \mid X_i) = f(X_i^{T}\beta) = \frac{\exp(X_i^{T}\beta)}{1 + \exp(X_i^{T}\beta)},$$

where $\beta = (\beta_1, \ldots, \beta_p)^{T}$ is the vector of regression coefficients. The logistic log-likelihood (with a negative sign, so that it is minimized) is defined as:

$$l(\beta) = -\sum_{i=1}^{n} \left\{ y_i \log f(X_i^{T}\beta) + (1 - y_i) \log\left[1 - f(X_i^{T}\beta)\right] \right\}. \tag{1}$$

$\beta$ is obtained by minimizing $l(\beta)$. In high-dimensional applications, solving the logistic model directly is not appropriate and may result in overfitting. Hence, regularization approaches are employed to address the overfitting problem. When a regularization term is added, the sparse logistic regression can be written as Eq. (2):

$$\hat{\beta} = \arg\min_{\beta} \left\{ l(\beta) + P(\beta) \right\}, \tag{2}$$

where $P(\beta)$ is the penalty function. The Lasso ($L_1$) and ridge ($L_2$) penalties are well-recognized regularization approaches used in previous methods. The $L_2$ penalty does not produce sparsity, and $L_1$ is less sparse than $L_q$ ($0 < q < 1$). Nonetheless, as $q$ moves closer to zero, the solution becomes sparser and convergence becomes more challenging. Therefore, some researchers [12] investigated the properties of $L_q$ ($0 < q < 1$) regularization and demonstrated that $L_{1/2}$ regularization is particularly important. The performance of the $L_q$ penalty with $0 < q < 1/2$ does not differ significantly from that of $L_{1/2}$, whereas $L_{1/2}$ regularization is much easier to solve. Accordingly, $L_{1/2}$ regularization can be taken as a representative of $L_q$ ($0 < q < 1$) regularization, which exhibits unbiasedness as well as oracle properties [12, 13, 14]. In high-dimensional DNA methylation data, disease-related CpG sites are very limited; in practice, the $L_{1/2}$ penalty is therefore more suitable than the other penalties discussed above. Consequently, the $L_{1/2}$ penalty was favored in our logistic regression model. Some methods have been proposed to handle highly correlated variables. The elastic net penalty ($L_1 + L_2$) and HLR ($L_{1/2} + L_2$) emphasize a grouping effect and tend to smooth the coefficient profiles. However, pathway information is neglected in these methods.
To bring CpG site grouping information into the analysis of high-dimensional methylation data, we extended a network-based regularization technique designed for the $L_{1/2}$ penalty. Methylation data display a strong group effect, and previous research therefore used a fully connected network (Fc.net) to describe the group patterns of correlated CpG sites within a gene. In methylation data, however, the group effect of CpG sites is present not only within one gene but also within one CpG island. The groups overlap, and the overlapping parts correlate with both groups. Unlike the previous network, we set the overlapping part as the central node and connect it with the other correlated parts (Fig. 1). The network thus integrates genome information and CpG island information rather than carrying only one of them, so it can better reflect the relevance of CpG sites.
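As a rough illustration of a penalized logistic objective of this kind, the following Python sketch evaluates the negative logistic log-likelihood plus an L1/2 penalty term. The function names, the tuning parameter `lam` and the toy data are assumptions made for illustration only; this is not the implementation used in the study, and the network term of the penalty is left out.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def neg_log_likelihood(X, y, beta):
    """Negative logistic log-likelihood for rows X, binary labels y."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(x * b for x, b in zip(xi, beta)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total -= yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total


def l_half_penalty(beta, lam):
    """L1/2 penalty: lam * sum_j |beta_j|^(1/2)."""
    return lam * sum(math.sqrt(abs(b)) for b in beta)


def penalized_objective(X, y, beta, lam):
    """Loss plus penalty; the quantity a sparse solver would minimize."""
    return neg_log_likelihood(X, y, beta) + l_half_penalty(beta, lam)


# Toy data (hypothetical): 4 samples, 2 CpG sites with values in [0, 1].
X = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.7], [0.1, 0.9]]
y = [1, 1, 0, 0]
print(penalized_objective(X, y, [0.0, 0.0], lam=0.5))   # equals 4 * log(2)
print(penalized_objective(X, y, [2.0, -2.0], lam=0.5))
```

With all coefficients at zero every predicted probability is 0.5, so the objective reduces to n·log 2, which is a convenient sanity check when experimenting with other penalties.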

Figure 1.

a. Previous fully connected network. b. Central node fully connected network.

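The network information is represented as a graph with a $p$-dimensional Laplacian matrix $L = \{L_{ab}\}$. It is defined as:

$$L_{ab} = \begin{cases} 1 & \text{if } a = b \text{ and } d_a \neq 0, \\ -\dfrac{1}{\sqrt{d_a d_b}} & \text{if } a \text{ and } b \text{ are linked}, \\ 0 & \text{otherwise}, \end{cases}$$

where $d_a$ is the total number of connections at vertex $a$ in the graph. The penalty function is given by Eq. (3):

$$P(\beta) = \lambda_1 \|\beta\|_{1/2}^{1/2} + \lambda_2 \sum_{a \sim b} \left( \frac{\beta_a}{\sqrt{d_a}} - \frac{\beta_b}{\sqrt{d_b}} \right)^{2}, \tag{3}$$

where $\|\beta\|_{1/2}^{1/2} = \sum_{a} |\beta_a|^{1/2}$ is the $L_{1/2}$ norm and $a \sim b$ denotes the variables $b$ that are linked to the $a$-th predictor. The sparsity and smoothness are controlled by the parameters $\lambda_1$ and $\lambda_2$. The penalty function becomes much less effective when two linked predictors are negatively correlated; the signs of the coefficients are therefore estimated and added to the Laplacian matrix to overcome this problem:

$$L^{*}_{ab} = \begin{cases} 1 & \text{if } a = b \text{ and } d_a \neq 0, \\ -\dfrac{s_a s_b}{\sqrt{d_a d_b}} & \text{if } a \text{ and } b \text{ are linked}, \\ 0 & \text{otherwise}, \end{cases}$$

where $s_a$ is the estimated sign of $\beta_a$. The adaptive net function can be written as:

$$\beta^{T} L^{*} \beta = \sum_{a \sim b} \left( \frac{s_a \beta_a}{\sqrt{d_a}} - \frac{s_b \beta_b}{\sqrt{d_b}} \right)^{2}.$$

Using the estimated signs $s_a$, the adaptive penalty function can be written as:

$$P(\beta) = \lambda_1 \|\beta\|_{1/2}^{1/2} + \lambda_2 \, \beta^{T} L^{*} \beta.$$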
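To make the role of the Laplacian matrix concrete, the sketch below builds a degree-normalized graph Laplacian from an edge list and checks numerically that the quadratic form β'Lβ equals the sum of squared, degree-scaled coefficient differences over the edges. The toy graph is a hypothetical reading of the central-node idea (a gene group and a CpG island group, each fully connected, sharing one overlapping CpG site); the group memberships, coefficient values and function names are assumptions, not data or code from the study.

```python
import math
from itertools import combinations


def normalized_laplacian(p, edges):
    """Degree-normalized Laplacian of an undirected, unweighted graph."""
    deg = [0] * p
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    L = [[0.0] * p for _ in range(p)]
    for a in range(p):
        if deg[a] > 0:
            L[a][a] = 1.0
    for a, b in edges:
        w = -1.0 / math.sqrt(deg[a] * deg[b])
        L[a][b] = w
        L[b][a] = w
    return L, deg


def quadratic_form(L, beta):
    """beta' L beta computed entry by entry."""
    p = len(beta)
    return sum(beta[a] * L[a][b] * beta[b] for a in range(p) for b in range(p))


def edge_sum(edges, deg, beta):
    """Sum over edges of (beta_a/sqrt(d_a) - beta_b/sqrt(d_b))^2."""
    return sum(
        (beta[a] / math.sqrt(deg[a]) - beta[b] / math.sqrt(deg[b])) ** 2
        for a, b in edges
    )


# Hypothetical toy network: site 2 belongs to both groups (the overlap).
gene_group = [0, 1, 2]
island_group = [2, 3, 4]
edges = sorted(set(combinations(gene_group, 2)) | set(combinations(island_group, 2)))
L, deg = normalized_laplacian(5, edges)

beta = [0.5, 0.4, 0.6, -0.1, 0.0]
print(deg)
print(quadratic_form(L, beta), edge_sum(edges, deg, beta))
```

In this toy graph the overlapping site has the highest degree, so its coefficient is scaled down the most inside the smoothness term; the two printed numbers agree up to floating-point error.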

The coordinate descent algorithm

To solve the L1/2 regularization models, the coordinate descent algorithm was adopted as a competent tool. Following previous research [15, 16, 11], Eq. (2) can be linearized by a Taylor series expansion at the current estimates, which turns the log-likelihood term into a weighted least-squares problem. Each coefficient is then updated in turn, with the other coefficients held fixed, by applying an enhanced thresholding operator for the coordinate descent algorithm [12, 13, 14].
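For readers unfamiliar with thresholding operators for the L1/2 penalty, here is a minimal Python sketch of the closed-form half-thresholding function for the scalar problem min over b of (b - x)^2 + lam·|b|^(1/2). It uses the threshold (54^(1/3)/4)·lam^(2/3) from the thresholding representation theory for L1/2 regularization; it is an assumption that this matches the enhanced operator used here, which may use a different threshold constant and also has to account for the network term. The name `half_threshold` is hypothetical.

```python
import math


def half_threshold(x, lam):
    """Closed-form minimizer of (b - x)**2 + lam * sqrt(|b|) over b."""
    threshold = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    if abs(x) <= threshold:
        return 0.0  # small inputs are set exactly to zero (sparsity)
    phi = math.acos((lam / 8.0) * (abs(x) / 3.0) ** (-1.5))
    return (2.0 / 3.0) * x * (1.0 + math.cos(2.0 * math.pi / 3.0 - 2.0 * phi / 3.0))


print(half_threshold(0.5, 1.0))  # 0.0 (below the threshold)
print(half_threshold(2.0, 1.0))  # shrunk towards zero, but nonzero
```

Unlike soft thresholding for the Lasso, the shrinkage applied to large inputs vanishes as |x| grows, which is one way to see why the L1/2 penalty biases large coefficients less.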

Results and discussion

Analyses of simulated data

The performance of the proposed method was analyzed and evaluated in a simulation study that follows the simulations of Teschendorff et al. [17] and Sun and Wang [11]. There were 600 groups, which were divided into 100 groups, 150 groups and 7 sets of 50 groups according to their number of CpG sites. Each group comprised between 1 and 9 CpG sites. In total, there were 2500 CpG sites. First, we simulated variables with the group effect, taking values between 0 and 1: we applied an inverse logit transformation to a multivariate normal variable to obtain the β-value of the i-th CpG site in the g-th group, where the dimension of the normal variable equals the size of the group. In this simulation model, the values range from 0 (unmethylated) to 1 (completely methylated). The relationship of CpG sites within a group is described by the covariance matrix $\Sigma$, for which two structures were used:

$$\Sigma_{ab} = \rho^{|a-b|}, \quad \rho = 0.2, 0.5, 0.7;$$

$$\Sigma_{ab} = \rho \ \text{for } a \neq b \ \text{and} \ \Sigma_{ab} = 1 \ \text{for } a = b, \quad \rho = 0.2, 0.5, 0.7.$$

The first condition is the autoregressive (AR) model, and the second condition is the compound symmetric (CS) correlation model. We set three different correlation coefficients, 0.2, 0.5 and 0.7, for all conditions [18, 19]. Second, the regression coefficients of the CpG sites within the g-th group were set based on previous research. One group of each of the 9 different group sizes was selected to receive regression coefficients; at this step, 45 CpG sites were assigned regression coefficient values. The assigned coefficients depend on δ, the strength of the true signals. All other regression coefficients were set to 0, so in the simulation models there were 45 pathogenic CpG sites in a total of 2500 CpG sites. Lastly, the response is drawn from a Bernoulli distribution. For each simulation set, there were 200 cases and 200 controls. There were nine simulation conditions based on different parameters, for instance the strength of the true signals.
We repeated the simulations 100 times for each condition. We then used 10-fold cross-validation (CV) on the training set to tune the regularization parameters of the Lasso, Elastic-Net (Enet), L1 + Fc.net, L1/2, HLR (L1/2+L2) and L1/2 + Fc.net. Note that the Enet, L1 + Fc.net, HLR (L1/2+L2) and L1/2 + Fc.net methods have two-dimensional parameter surfaces in the 10-fold CV approach. Afterwards, logistic regressions with the estimated tuning parameters were used to build the classifiers. Lastly, the resulting classifiers were applied to the test set for classification and prediction. Figure 2 shows the receiver operating characteristic (ROC) curve for every model. The green solid line (L1/2 + Fc.net) is closer to the upper left corner than the other lines, so L1/2 + Fc.net outperforms the other algorithms in each model. Table 1 shows the total area under the averaged ROC curves (AUC) and the MSE of every model. From Table 1 it can be seen that L1/2 + Fc.net also performs very well in all models. In general, our proposed enhanced model achieved higher accuracy than the other methods (the Lasso, Enet, L1 + Fc.net, L1/2 and HLR (L1/2+L2)) in all models.
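The data-generating step described above (an inverse logit transformation of correlated normal variables) can be sketched in Python as follows. The AR(1) and compound symmetric structures are generated by standard constructions that give unit variances and the stated correlations. The mean `mu`, the scale `sigma`, the seed and all function names are hypothetical placeholders, since the values used in the study are not recoverable from the text here.

```python
import math
import random


def correlated_normals(k, rho, structure, rng):
    """Draw k standard normal variables with AR(1) or CS correlation."""
    if structure == "AR1":
        # z_j = rho * z_{j-1} + sqrt(1 - rho^2) * noise gives corr rho^|a-b|.
        z = [rng.gauss(0.0, 1.0)]
        for _ in range(k - 1):
            z.append(rho * z[-1] + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0))
        return z
    if structure == "CS":
        # A shared component gives correlation rho between every pair.
        shared = rng.gauss(0.0, 1.0)
        return [
            math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(k)
        ]
    raise ValueError("structure must be 'AR1' or 'CS'")


def inverse_logit(t):
    """Map a real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def simulate_group(k, rho, structure, rng, mu=0.0, sigma=1.0):
    """Simulated methylation values for one group of k CpG sites."""
    return [inverse_logit(mu + sigma * z) for z in correlated_normals(k, rho, structure, rng)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
print(simulate_group(5, 0.5, "AR1", rng))
print(simulate_group(5, 0.5, "CS", rng))
```

Generating the correlated variables sequentially avoids a matrix factorization and keeps the sketch dependency-free; with a numerical library one would more commonly draw from the multivariate normal distribution directly.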
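The two evaluation measures reported for the classifiers can be computed as in the following sketch. The AUC is obtained from pairwise comparisons of case and control scores, which is equivalent to the area under the ROC curve. The MSE is taken here as the mean squared difference between the 0/1 labels and the predicted probabilities, which is an assumption about how it was defined in the study. Function names and the toy numbers are illustrative only.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise case/control comparisons."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def mse(labels, scores):
    """Mean squared difference between labels and predicted probabilities."""
    return sum((l - s) ** 2 for l, s in zip(labels, scores)) / len(labels)


# Toy test set (hypothetical): two cases and two controls.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(auc(labels, scores))  # 0.75: three of the four case/control pairs are ordered correctly
print(mse(labels, scores))
```

The pairwise form is quadratic in the number of samples, which is fine for test sets of a few hundred samples; a rank-based formula would be preferable for much larger sets.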

Figure 2.

The ROC curve of every model.

Table 1

The total area under the averaged ROC curves (AUC) and MSE for all models

δ	σ	ρ	Lasso		Enet		L1+ Fc.net		L1/2		HLR (L1/2+L2)		L1/2+ Fc.net
			AUC	MSE	AUC	MSE	AUC	MSE	AUC	MSE	AUC	MSE	AUC	MSE
1	AR(1)	0.2	0.806	0.273	0.847	0.238	0.850	0.239	0.809	0.260	0.853	0.254	0.860	0.197
1	AR(1)	0.5	0.871	0.213	0.885	0.206	0.888	0.197	0.856	0.225	0.933	0.161	0.954	0.116
1	AR(1)	0.7	0.898	0.187	0.906	0.185	0.909	0.175	0.917	0.169	0.942	0.149	0.962	0.101
2	AR(1)	0.2	0.806	0.273	0.860	0.237	0.866	0.220	0.809	0.260	0.869	0.213	0.921	0.155
2	AR(1)	0.5	0.871	0.273	0.889	0.237	0.903	0.220	0.918	0.167	0.953	0.134	0.975	0.085
2	AR(1)	0.7	0.898	0.187	0.904	0.177	0.912	0.168	0.917	0.169	0.953	0.133	0.970	0.089
2	CS	0.2	0.852	0.226	0.879	0.207	0.889	0.193	0.820	0.257	0.893	0.197	0.936	0.139
2	CS	0.5	0.899	0.181	0.913	0.163	0.919	0.157	0.895	0.182	0.961	0.128	0.969	0.097
2	CS	0.7	0.924	0.162	0.927	0.157	0.934	0.147	0.915	0.171	0.957	0.125	0.979	0.080

Analyses of real data

To further evaluate the effectiveness of our proposed method, in this section we examined DNA methylation (ovarian cancer) data generated from the Illumina Infinium HumanMethylation 27K BeadChip [20], which contains 22727 CpG sites. The data are accessible from NCBI (http://www.ncbi.nlm.nih.gov/). We first removed samples with low BS conversion efficiency or low CpG coverage; samples with errors were also removed. In the data, a total of 207 genes contained more than 3 CpG sites and 295 CpG islands contained more than 3 CpG sites. After filtering, there were 156 control samples (healthy samples), 120 pre-treatment case samples and 122 post-treatment case samples. For these three sample sets, we calculated the maximum correlation of CpG sites in each group (gene and CpG island). Figure 3a–c shows the histograms of maximum sample correlation between CpG sites within genes in the control, pre-treatment and post-treatment cases, whereas Fig. 3d–f shows the histograms of maximum sample correlation between CpG sites within CpG islands in the control, pre-treatment and post-treatment cases. Figure 4 shows the boxplot of maximum sample correlation between CpG sites in genes or CpG islands. Based on Figs 3 and 4, most CpG sites within the same group are highly correlated in the pre-treatment and post-treatment case samples, whereas the control case samples only show a significant correlation.

Figure 3.

The histogram of correlation between CpG sites.

Figure 4.

The boxplot of correlation between CpG sites.

Table 2 shows the AUC for each method from the real data analysis. In real data, the enhanced model also achieved higher accuracy rates. Tables 3 and 4 show the top 20 selected CpG sites for all methods. We further validated the chosen genes against the GeneCards Database (http://www.genecards.org). In Table 3, when comparing pre-treatment cases with controls, the L1/2 + central node Fc.net algorithm (gene and CpG island) found several genes (TIAM1 [21, 22], CST7 [25], TCEAL7 [23, 24] and RNF11 [26]) that were not found by L1 + Fc.net (gene) and Enet. Likewise, HLR (L1/2+L2) was unable to find three of these genes (CST7, TCEAL7 and RNF11). All four genes (TIAM1, CST7, TCEAL7 and RNF11) were found to be correlated with cancer in previous research. On one hand, all methods were able to find the genes MPO [27, 28] and PTPRCAP [29]; on the other hand, the network penalty methods were able to find the genes CD248 [31] and HOXB5 [32]. In Table 4, the L1/2 + central node Fc.net algorithm (gene and CpG island) also found genes (CD200 [30], SRD5A2L [34], ENO1 [33]) that were not found by L1 + Fc.net (gene) and Enet. Additionally, CD200 and SRD5A2L have also been shown to be related to cancer. The L1 + Fc.net and L1/2 + central node Fc.net (gene and CpG island) algorithms, the latter of which has gene and island network information, also found the gene TNFAIP8 [35], which was not found by Enet or HLR.

Table 2

The AUC of real data for each method

	Lasso	Enet	L1+ Fc.net(gene)	L1/2	HLR (L1/2+L2)	L1/2+ central node Fc.net
Pre	0.798	0.886	0.921	0.803	0.908	0.946
Post	0.762	0.898	0.934	0.771	0.923	0.948

Table 3

The top 20 CpG sites and the corresponding genes selected from the comparison between pre-treatment and normal control cases

Enet		L1 + Fc.net (gene)		HLR (L1/2+L2)		L1/2 + central node Fc.net
cg1100973	cg0237448	cg2079283	cg15616083	cg02505409	cg21493583	cg11804789	cg06409153
(MARCO)	(PRF1)	(PTPRCAP)	(PAGE2)	(ANGPTL4)	(CRIPT)	(CST7)	(ABCA5)
cg0498897	cg2007009	cg04988978	cg27303882	cg06521852	cg00201234	cg24505527	cg05923103
(MPO)	(S100A8)	(MPO)	(MYL4)	(HRIHFB2122)	(FBLN2)	(NKIRAS2)	(RNF11)
cg2079283	cg2706761	cg0996492	cg05294455	cg08694544	cg09638834	cg15853125	cg09497789
(PTPRCAP)	(CYP4F3)	(KCNE1)	(ADORA1)	(RTBDN)	(RAET1L)	(TIAM1)	(SPAG17)
cg0996492	cg0435376	cg11009736	cg13626881	cg15853125	cg14861570	cg08694544	cg13626881
(KCNE1)	(MS4A6A)	(MARCO)	(ADORA1)	(TIAM1)	(MMD)	(RTBDN)	(ADORA1)
cg0652185	cg0224062	cg14360917	cg11412582	cg21608192	cg09964921	cg07607462	cg11412582
(HRIHFB2122)	(PLCB2)	(SP2)	(HOXB5)	(XYLT1)	(KCNE1)	(UBR1)	(HOXB5)
cg0013453	cg0619637	cg03801286	cg01405107	cg09497789	cg04988978	cg20792833	cg15736165
(UBASH3A)	(TREM1)	(KCNE1)	(IGLL1)	(SPAG17)	(MPO)	(PTPRCAP)	(BNC1)
cg1436091	cg21126943	cg06521852	cg10494770	cg20792833	cg14319409	cg14027234	cg05105069
(SP2)	(CEACAM6)	(HRIHFB2122)	(SNRPN)	(PTPRCAP)	(GLRA1)	(CD248)	(TCEAL7)
cg2193281	cg0020123	cg21517055	cg24993443	cg06409153	cg26838900	cg00201234	cg07376232
(CSTA)	(FBLN2)	(MGC11271)	(BRDG1)	(ABCA5)	(LRRC15)	(FBLN2)	(AMICA1)
cg0097486	cg2746119	cg00201234	cg04398282	cg13626881	cg23490074	cg04988978	cg21493583
(FCGR3B)	(FXYD1)	(FBLN2)	(ABCA5)	(ADORA1)	(C19orf2)	(MPO)	(CRIPT)
cg2151705	cg0529445	cg00134539	cg14027234	cg2193281	cg17231524	cg06183267	cg03856723
(MGC11271)	(MYL4)	(KCNQ2)	(CD248)	(CSTA)	(MGC39606)	(AFF3)	(PRKACA)

Table 4

The top 20 CpG sites and the corresponding genes selected from the comparison between post-treatment and normal control cases

Enet		L1 + Fc.net (gene)		HLR (L1/2+L2)		L1/2 + central node Fc.net
cg23580000	cg09626634	cg06653796	cg12243271	cg17682828	cg25554036	cg11093356	cg04836428
(ADCY7)	(EBI2)	(LIME1)	(CFI)	(FXYD7)	(WFS1)	(DDX19A)	(DTNA)
cg06653796	cg22988566	cg10986043	cg10467098	cg02713563	cg23125689	cg12711814	cg10777851
(LIME1)	(WFDC10B)	(TCAP)	(Bles03)	(TRAPPC6A)	(CD81)	(ENO1)	(CD200)
cg10986043	cg24335895	cg23580000	cg19573166	cg06653796	cg04232649	cg12906740	cg00636639
(TCAP)	(COX7A1)	(ADCY7)	(SLC22A17)	(LIME1)	(CCNG1)	(NUDT15)	(MRRF)
cg13379236	cg19573166	cg13379236	cg15096140	cg15489301	cg25410053	cg14838256	cg17133388
(EGF)	(SLC22A17)	(EGF)	(MYO1B)	(AKR1B10)	(ZIC3)	(SRD5A2L)	(C3orf28)
cg03547797	cg15096140	cg03547797	cg05767404	cg11093356	cg24643262	cg23002907	cg14275779
(GAS2)	(MYO1B)	(GAS2)	(C1orf150)	(DDX19A)	(BMX)	(RBMS2)	(PLEKHH3)
cg05135288	cg13745870	cg05135288	cg05004940	cg03547797	cg26200585	cg02964389	cg07389922
(RHOT2)	(SPATA12)	(RHOT2)	(C20orf195)	(GAS2)	(PRX)	(PSMD9)	(C17orf81)
cg20357806	cg00134539	cg12006284	cg23506842	cg20630655	cg14132995	cg23917399	cg19514928
(PPBP)	(UBASH3A)	(WT1)	(PTPN7)	(RNUT1)	(SLC35A2)	(TNFAIP8)	(TMEM56)
cg12006284	cg16853982	cg20357806	cg23917399	cg10986043	cg13056210	cg09119665	cg05798972
(WT1)	(ACTN2)	(PPBP)	(SPATA12)	(TCAP)	(MXRA5)	(PNMA1)	(PPARBP)
cg21640749	cg10467098	cg24335895	cg09626634	cg02497758	cg04499381	cg17682828	cg00096922
(CD300LF)	(Bles03)	(COX7A1)	(EBI2)	(MAFB)	(CXorf9)	(FXYD7)	(DLX5)
cg12243271	cg13247990	cg21640749	cg23917399	cg25919221	cg13435792	cg09816912	cg04232649
(CFI)	(MLCK)	(CD300LF)	(TNFAIP8)	(CA6)	(C12orf46)	(MARCKS)	(CCNG1)

Conclusion

In molecular biology research, the analysis of DNA methylation may become a new practice for cancer research. In this paper, we used the enhanced penalized logistic regression model to extract differentially methylated CpG sites between healthy controls and ovarian cancer cases. We constructed the central node fully connected network, which combines genome information and CpG island information, and developed the corresponding coordinate descent algorithm suited to real DNA methylation data. This method not only uses the L1/2 penalty, which is sparser than L1, but also carries more information on the relationships among CpG sites. In real data, we used ovarian cancer samples with over 20,000 CpG sites. Even though fewer CpG sites were selected than with previous methods, more of the selected CpG sites lie in genes potentially associated with cancers. Compared with traditional methods, our method also achieved a higher predictive accuracy. Therefore, the proposed method offers an advanced tool to researchers in DNA methylation and can be a powerful tool for recognizing pathogenic CpG sites.

29 in total

1. Method to detect differentially methylated loci with case-control designs using Illumina arrays.

Authors: Shuang Wang
Journal: Genet Epidemiol Date: 2011-08-04 Impact factor: 2.135

2. L1/2 regularization: a thresholding representation theory and a fast solver.

Authors: Zongben Xu; Xiangyu Chang; Fengmin Xu; Hai Zhang
Journal: IEEE Trans Neural Netw Learn Syst Date: 2012-07 Impact factor: 10.451

3. High-throughput DNA methylation profiling using universal bead arrays.

Authors: Marina Bibikova; Zhenwu Lin; Lixin Zhou; Eugene Chudin; Eliza Wickham Garcia; Bonnie Wu; Dennis Doucet; Neal J Thomas; Yunhua Wang; Ekkehard Vollmer; Torsten Goldmann; Carola Seifart; Wei Jiang; David L Barker; Mark S Chee; Joanna Floros; Jian-Bing Fan
Journal: Genome Res Date: 2006-01-31 Impact factor: 9.043

4. Novel 5 alpha-steroid reductase (SRD5A3, type-3) is overexpressed in hormone-refractory prostate cancer.

Authors: Motohide Uemura; Kenji Tamura; Suyoun Chung; Seijiro Honma; Akihiko Okuyama; Yusuke Nakamura; Hidewaki Nakagawa
Journal: Cancer Sci Date: 2007-11-06 Impact factor: 6.716

5. A regulatory polymorphism at position -309 in PTPRCAP is associated with susceptibility to diffuse-type gastric cancer and gene expression.

Authors: Hyoungseok Ju; Byungho Lim; Minjin Kim; Yong Sung Kim; Woo Ho Kim; Chunhwa Ihm; Seung-Moo Noh; Dong Soo Han; Hang-Jong Yu; Bo Youl Choi; Changwon Kang
Journal: Neoplasia Date: 2009-12 Impact factor: 5.715

6. CD200: a putative therapeutic target in cancer.

Authors: Jérôme Moreaux; Jean Luc Veyrune; Thierry Reme; John De Vos; Bernard Klein
Journal: Biochem Biophys Res Commun Date: 2007-12-04 Impact factor: 3.575

7. Polymorphisms in TCEAL7 and risk of epithelial ovarian cancer.

Authors: Abraham Peedicayil; Robert A Vierkant; Vijayalakshmi Shridhar; Joellen M Schildkraut; Sebastian Armasu; Lynn C Hartmann; Brooke L Fridley; Julie M Cunningham; Catherine M Phelan; Thomas A Sellers; Ellen L Goode
Journal: Gynecol Oncol Date: 2009-05-05 Impact factor: 5.482

8. A comparison of cluster analysis methods using DNA methylation data.

Authors: Kimberly D Siegmund; Peter W Laird; Ite A Laird-Offringa
Journal: Bioinformatics Date: 2004-03-25 Impact factor: 6.937

9. TNFAIP8 promotes the proliferation and cisplatin chemoresistance of non-small cell lung cancer through MDM2/p53 pathway.

Authors: Ying Xing; Yuechao Liu; Tianbo Liu; Qingwei Meng; Hailing Lu; Wei Liu; Jing Hu; Chunhong Li; Mengru Cao; Shi Yan; Jian Huang; Ting Wang; Li Cai
Journal: Cell Commun Signal Date: 2018-07-31 Impact factor: 5.712

10. Critical evaluation of the Illumina MethylationEPIC BeadChip microarray for whole-genome DNA methylation profiling.

Authors: Ruth Pidsley; Elena Zotenko; Timothy J Peters; Mitchell G Lawrence; Gail P Risbridger; Peter Molloy; Susan Van Djik; Beverly Muhlhausler; Clare Stirzaker; Susan J Clark
Journal: Genome Biol Date: 2016-10-07 Impact factor: 13.583