Zhenxing Wang1, XiaoLiang Wu1, Yadong Wang2. 1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China. 2. School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China. ydwang@hit.edu.cn.
Abstract
BACKGROUND: DNA methylation has been identified to be widely associated to complex diseases. Among biological platforms to profile DNA methylation in human, the Illumina Infinium HumanMethylation450 BeadChip (450K) has been accepted as one of the most efficient technologies. However, challenges exist in analysis of DNA methylation data generated by this technology due to widespread biases. RESULTS: Here we proposed a generalized framework for evaluating data analysis methods for Illumina 450K array. This framework considers the following steps towards a successful analysis: importing data, quality control, within-array normalization, correcting type bias, detecting differentially methylated probes or regions and biological interpretation. CONCLUSIONS: We evaluated five methods using three real datasets, and proposed outperform methods for the Illumina 450K array data analysis. Minfi and methylumi are optimal choice when analyzing small dataset. BMIQ and RCP are proper to correcting type bias and the normalized result of them can be used to discover DMPs. R package missMethyl is suitable for GO term enrichment analysis and biological interpretation.
BACKGROUND: DNA methylation has been identified to be widely associated to complex diseases. Among biological platforms to profile DNA methylation in human, the Illumina Infinium HumanMethylation450 BeadChip (450K) has been accepted as one of the most efficient technologies. However, challenges exist in analysis of DNA methylation data generated by this technology due to widespread biases. RESULTS: Here we proposed a generalized framework for evaluating data analysis methods for Illumina 450K array. This framework considers the following steps towards a successful analysis: importing data, quality control, within-array normalization, correcting type bias, detecting differentially methylated probes or regions and biological interpretation. CONCLUSIONS: We evaluated five methods using three real datasets, and proposed outperform methods for the Illumina 450K array data analysis. Minfi and methylumi are optimal choice when analyzing small dataset. BMIQ and RCP are proper to correcting type bias and the normalized result of them can be used to discover DMPs. R package missMethyl is suitable for GO term enrichment analysis and biological interpretation.
Entities:
Keywords:
DNA methylation; Illumina 450K; Normalization; Ontology interpretation
DNA methylation is an important epigenetic modification which has shown numerous associations with biological processes and complex diseases such as diabetes, schizophrenia and cancer [1-4]. However, the methylomic landscape in disease pathogenesis has not yet been well characterized, especially in cancer where DNA methylation can be altered dramatically. Interests of exploring the associations between DNA methylation and complex diseases increase in disease studies.Illumina Infinium HumanMethylation450 (450K) BeadChip array, which covers over 480K CpG sites and targets 96% of CpG islands in human genome [5], has been widely utilized in many large studies, such as The Cancer Genome Atlas (TCGA) and The International Cancer Genome Consortium (ICGC) Project [6]. With the availability of public data resources, a number of methods for analyzing the Illumina 450K array data became rapidly available in the past few years.Unlike the previous platform Illumina Infinium HumanMethylation27 (27K) BeadChip, in which only one probe type is utilized, the Illumina 450K BeadChip includes two distinct probe types, Infinium I (n=135501) and Infinium II (n=350076) [5]. Each CpG site of Infinium I is targeted by two 50bp probes: one for detecting “methylated” (M) intensity and one for detecting “unmethylated” (U) intensity, whereas each CpG site of Infinium II uses just one probe to distinguish “M” and “U” intensity through different dye colors (green and red), then the β-value, indicating the methylation level of one CpG site, can be computed as β=M/(M+U+α) where α is 100 generally. M-value, M=log2(β/(1−β)), the logit-transformed β-value, is another quantity used in following up analysis.Owing to the platform design, more loci could be tested simultaneously on a fixed array size. Bibikova et al. [5] report a difference between Infinium I and Infinium II that Infinium II assays demonstrate an average of β-value upward shift for U and downward shift for M (shown in Fig. 1). Dedeurwaerder et al. [7] further evaluated the Illumina 450K BeadChip and reported that the β-values obtained from Infinium II probes had a narrower dynamic range and were less reproducible than those obtained from Infinium I leading to a type design bias. Hence, data preprocessing and normalization is critical for analyzing the Illumina 450K array data. Although many methods and R packages for correcting probe design bias have been proposed [8-14], more attention should be paid on the entire framework for analyzing Illumina 450K array data and selecting different modules on various datasets. Here we present a generalized framework including importing data, quality control, within-array normalization, correction of the type bias, identification of differential methylated probes or regions and biological interpretation, whilst the most popular modules for each step will be introduced, facilitating users to select appropriate module according their needs.
Fig. 1
Density curves of the β-values of Infinium I and Infinium II of a HPV-HNC sample (GSM937820 in GSE38268). The distribution models of β-values of the two types’ probes are different, where the Infinium II probe’s curve shows a narrower dynamic range
Density curves of the β-values of Infinium I and Infinium II of a HPV-HNC sample (GSM937820 in GSE38268). The distribution models of β-values of the two types’ probes are different, where the Infinium II probe’s curve shows a narrower dynamic range
Methods
Figure 2 shows the diagram of the framework and the detail of each step will be introduced below.
Fig. 2
A generalized framework of Illumina 450K array data analysis. There are six steps in the generalized framework leading to a successful methylation data analysis
A generalized framework of Illumina 450K array data analysis. There are six steps in the generalized framework leading to a successful methylation data analysis
Importing data
There are mainly two forms of the Illumina 450K array data: i) raw (*.idat) data which is the direct output from Illumina iScan system and stores intensities for each probe, ii) *.txt data, which is usually got after simple preprocessing and is easier to access. Both file formats can be handled by R packages: *.idat files can be read by illuminaio package [15] and *.txt files can be dealt with minfi [12], wateRmelon [11] et al. We use three datasets for expounding our framework.Dataset 1: Illumina 450K dataset from Dedeurwaerder et al. (GEO accession number: GSE29290) [7]. Three samples on HCT116WT cell-lines are considered, to evaluate the capability of methods to reduce the replicate variation.Dataset 2: Illumina 450K dataset of fresh frozen head and neck cancer (HNC) samples form GSE38268 [16], where three of them are HPV+ and other three are HPV- samples.Dataset 3: Illumina 450K dataset of level 1 methylation data of TCGA KIRC samples, which are *.idat files containing 160 normal samples and 325 tumor samples, to evaluate the efficiency of different modules.
Quality control
After the data imported into R, we would evaluate the quality of data. First, probes displaying a high detected p-value should be filtered out (e.g. > 0.01), while such probes have a β-value of “NA” in *.txt files. It is worth mentioned that different strategies of assigning missing β-value are applied in different modules. In minfi, the β-values are assigned with “NAs” when both M and U intensities are zero, but an additional criteria is that “NAs” will be assigned if either M or U intensities don’t fluoresce above background. Second, there are 850 inbuilt control probes on the 450K array, such as bisulfite conversion I, bisulfite conversion II, extension, hybridization and negative (n=613), which can be used to evaluate the other probes’ intensities. Samples that can’t pass this quality control are excluded in further analysis. Third, probes on Chromosome X or Y should be filtered out to eliminate the impact of sex on differential methylated analysis in many studies. Fourth, Price ME et al. [17] reported that there were 4.3% of Illumina 450K probes containing a known SNP at the targeted site. Such probes should cause problems in inter-sample analysis. Last but not the least, cross-reactive probes on the Illumina 450K array are identified depending on [18], which is particularly problematic because the β-value of such probes is more likely to represent a combination of multiple sites and not the level of initially targeted CpG sites.
Within-array normalization
This step includes background correction and color bias adjustment. Background correction methods have been developed by Triche et al. [19] such as Noob and Normexp, which are based on convolution models and use out-of-band (OOB) probes intensities to measure the background. The lumi R package provides two different methods for eliminating the background. The first one is based on the negative-control probes inbuilt the BeadChip and the second one estimates the background from the density modes of probes intensities. The second one would show bad performance when there are more than two density modes for some sample. The popular methylumi R package [20] proposed a color bias adjustment based on smooth quantile or shift-and-scaling normalization. Globally, these methods seem to improve the data quality in some cases [7].
Correction of the type bias
The type bias is the one that is most crucial to correct as it is the main source decreasing the data quality. There have been several efforts to develop methodologies to correct the probe type bias because of the differences between Infinium I and Infinium II. Because Infinium I probes are more stable and reproducible across different samples, most methods reduce the bias of Infinium II rather than Infinium I probes.The first method is called peak-based correction (PBC) [7], which rescale the methylation values of Infinium II to the same modes for distribution of methylation values of Infinium I. But this method is sensitive to the shape of β-value density curves and is therefore less robust when the methylation density distribution does not exhibit well-defined peaks.Touleimat and Tost [8] developed a method called Subset Quantile Normalization (SQN) based on an assumption that the β-values of CpGs form the same biological category should have the same density distribution. They found that the normalization result of using the “relation to CpG” annotation perfectly corrected the bias.Subset-quantile Within Array Normalization (SWAN) [10] was developed based on the assumption that the β-values distribution should be the same when the probes have the same number of CpGs. But SWAN also alters Infinium I probe data, which increases Infinium I technical variation, and does not seem to improve the data quality when applied to some datasets [21].Beta Mixture Quantile normalization (BMIQ) [9] method decomposes the density profile of Infinium I and Infinium II probes by fitting a beta-mixture model of three states: unmethylated (default β-value <0.25), hemimethylated (default 0.25 ≤β-value <0.75) and fully methylated (default β-value ≥0.75). Then it uses a quantile normalization to fit β-values distribution of Infinium II to the corresponding β-values distribution of Infinium I. This method does not depend on unceremonious choices of biological characteristics to be used to normalize data. Thus it seems more suitable than other methods. However, some points appear worse after BMIQ correction.Another method, called Regression on Correlated Probes (RCP) [14], uses a quantile linear regression model of correlation between pairs of nearby Infinium I and II probes that share the same genomic context to adjust the methylation levels of Infinium II probes. The weakness of RCP is that it may not fit some experimental data leading to a result worse than raw data.While background is important for measuring absolute methylation levels for single sample/condition experiments, we ignore the background here in this analysis since it can be cancelled out when comparing two conditions. As shown by other studies [22], widely used normalization process, which is based on the assumption that the majority of signals should not change across compared conditions, usually makes mistakes when it was applied to experiments where large portions of signals are differentially expressed. Thus, with a good quality control on the analyzed datasets, we didn’t choose to apply common normalization strategies.It is also important to remove non-biological variation called batch effects existing between batches and samples. Such batch effects can influence on measurement of global level that could be partially removed through between-sample normalization using principal component analysis.
Identification of DMPs/DMRs
As we mentioned above, the main focus of many methylation studies has been on detecting differentially methylated probes (DMPs) or regions (DMRs) associated with a phenotype. The β-value is the default value for methylation measurement, allowing easy biological interpretation. Another type of value, M-value, is used to express the degree of methylation obtained with Infinium. Due to the heteroscedasticity of β-value, the variance of M-value across the methylation range is approximately constant, so the M-value has better statistical properties. The two types of value are used in different methods.SQN simply considers a probe as DMP if the absolute value of the difference between β-value medians of paired samples is higher than 0.2:
where and corresponding β-values of paired normal and tumor samples. The 0.2 threshold represents approximately a difference in methylation level of 20% which can be detected by the Infinium technology with 99% confidence [5]. Then the corresponding differentially methylated gene identities can be obtained from the list of DMPs.Minfi offers a comprehensive package to analyze Illumina 450K array data, where candidate regions are determined for DMR analysis and locally weighted scatterplot smoothing (LOESS) is adopted to smooth the methylation differences between groups within each determined region. It also can find long-range alterations such as identified hypomethylated blocks [23] based on “open sea” probes. A empirical Bayes moderated t-test is used in limma [24] when sample sizes are less than 10, in which case M-values should be used as they rely much more on Gaussianity assumption [25].Generally, DMRs are detected by applying various statistical techniques such as Fisher’s exact test [26, 27], t-test [27], Wilcoxon rank sum test [28] or different regression models [29-31].
Biological interpretation
There is a long list of significant CpGs to be interpreted after differential methylation analysis. Using the Infinium annotation file, Illumina 450K probes are classified according to their relations to CGIs and to the closest annotated gene. Regarding their relation to CGIs, the probes are classified into four categories: sites located inside a CGI, sites located in the CGI shores (0-2k bp), sites located in the CGI shelves (2-4k bp) and sites located in the “open sea”. As regards their relation to annotated genes, the sites are categorized as inside the promoter, inside the 5’-UTR region, inside the gene body and inside the 3’-UTR region. Then the significant DMPs can be marked with their related genes.Performing gene set analysis is a popular way to understand the affected potential gene pathways. Although gene set analysis is well established in gene expression experimental, the research in methylation data is ongoing in different groups. In Illumina 450K array, the numbers of CpGs associated with each gene ranges largely from 1 to 1299 [32]. Genes with larger numbers of probes are more likely to have significant differentially methylated CpGs [33]. With the ontology and knowledgebase developing [34-40], researchers can easily annotate the genes containing DMPs or DMRs to ontology entries, which brings convenience for understanding the function of genes in the pathogenesis of diseases. Obviously, a phenotype is associated with several -omics data, such as mRNA expression and protein expression, which suggests researchers should utilize integrated analysis with multi-dimension data like TCGA project does [41, 42].
Results
Reduce the technical variation and type bias
We evaluated the different modules of correcting methods including SQN, SWAN, Dasen, BMIQ and RCP methods on Dataset 1, described in Table 1. The methylated and unmethylated intensities are imported into R environment and some results are displayed in following figures. For each method, we first plotted the density curves of β-value for three samples in Dataset 1. Then, we computed the standard deviation across the three replicates.
Table 1
Normalization methods for Illumina 450K array data
Method
Object
R Package
Ref.
SQN
MethyLumiSet
wateRmelon
[8, 11]
SWAN
RGChannelSet
minfi,wateRmelon
[11, 12]
Dasen
MethylSet
wateRmelon
[11]
BMIQ
β-value, MethylSet
ENmix
[9, 13]
RCP
MethylSet
ENmix
[13]
Normalization methods for Illumina 450K array dataAs seen, SQN, Dasen and RCP can significantly reduce the technical variation (Fig. 3) and the same result can be seen base on standard deviation of replicates (Fig. 4). Because BMIQ and RCP do not change the β-values of Infinium I probes, the standard deviation of Infinium I probes of the two methods stays the same as the raw data. SWAN shows the least ability of reducing the variation among replicates.
Fig. 3
Density curves of β-value of replicates. Density curves of all probes after normalization using different methods are shown in this figure. The separate color represents replicates of the dataset. Raw data shows that there are differences among the replicates, while SQN, Dasen and RCP significantly reduce the technical variation
Fig. 4
Box plot of standard deviation across replicates. The standard deviation between replicates after normalization shows that SQN, Dasen and RCP make the normalized β-values of replicates more similar than other two methods
Density curves of β-value of replicates. Density curves of all probes after normalization using different methods are shown in this figure. The separate color represents replicates of the dataset. Raw data shows that there are differences among the replicates, while SQN, Dasen and RCP significantly reduce the technical variationBox plot of standard deviation across replicates. The standard deviation between replicates after normalization shows that SQN, Dasen and RCP make the normalized β-values of replicates more similar than other two methodsWe also plotted the density curves of β-value of Infinium I/II probes for different methods. BMIQ, RCP and SQN show similar performance on the sample GSM815136 such that the Infinium I and Infinium II probes have more similar distribution modes with same local maximum values, while Dasen and SWAN underperformed others regarding removing the type bias (Fig. 5).
Fig. 5
Density curves of β-value of two types in sample GSM815136 for different methods. Density curves of β-values of Infinium I/II probes are plotted with different colors. BMIQ, RCP and SQN can correct the type bias efficiently on this sample
Density curves of β-value of two types in sample GSM815136 for different methods. Density curves of β-values of Infinium I/II probes are plotted with different colors. BMIQ, RCP and SQN can correct the type bias efficiently on this sample
Identify DMPs/DMRs by preprocessed results of different methods
We used the dmpFinder function in minfi R package to evaluate the result of identifying DMPs obtained by the five methods. We selected the probes with p-value of the results obtained after dmpFinder less than 0.05 as DMPs. The number of DMPs of five methods is listed in Table 2. Then the Venn diagram of the detected probes was plotted, shown in Fig. 6. As shown in Fig. 6, BMIQ and RCP methods show larger overlap with other methods, where the ratio of common probes is 43.41% (7370/16977 in BMIQ) and 39.42% (7370/18696 in RCP). Moreover, the most probes identified after BMIQ have intersections with others and only 311 probes (311/16977=1.83%) are identified uniquely. But it is also shown that there are differences among results of different methods, which maybe caused by the models they applying to normalization. Then the GO term enrichment analysis were analyzed with gometh function in missMethyl package. The GO terms of FDR ≤0.05 of five methods were got (see Table 2) and the Venn diagram of GO terms obtained after normalization is shown in Fig. 7. RCP got the highest rate of common (42/76=55.26%) GO terms among the five methods and SQN got the most number (n=199) of GO terms of which 33.67% (67/199) identified by only this method.
Table 2
The number of obtained DMPs and GO terms by five methods
SQN
SWAN
Dasen
BMIQ
RCP
#DMPs
19356
19667
17903
16977
18696
#GO term
199
125
145
85
76
Fig. 6
Venn diagram of DMPs obtained by five methods. After filtering probes with p-value <0.01, we use dmpfinder in minfi to identify DMPs with p-value <0.05. BMIQ and RCP can lead to better results than other three methods
Fig. 7
Venn diagram of GO terms obtained by five methods. There are 42 common GO terms identified by five methods. RCP has the highest rate of common terms (42/76=55.26%). SQN discovers most terms but 33.67% of them are identified by only this method
Venn diagram of DMPs obtained by five methods. After filtering probes with p-value <0.01, we use dmpfinder in minfi to identify DMPs with p-value <0.05. BMIQ and RCP can lead to better results than other three methodsVenn diagram of GO terms obtained by five methods. There are 42 common GO terms identified by five methods. RCP has the highest rate of common terms (42/76=55.26%). SQN discovers most terms but 33.67% of them are identified by only this methodThe number of obtained DMPs and GO terms by five methods
Efficiency of large dataset analysis
We use Dataset3 to evaluate the performance of minfi and meffil [43] in term of importing large data. It took 48 min and less than 3 Gb memory to import data of 485 samples (970 *.idat files) using meffil package on a computer with 8 Gb of RAM and 4 processors while minfi could not run on the same computer. Minfi took two hours and ∼ 23 Gb memory on a server to import the same dataset.
Discussion
In this study we propose a generalized framework for Illumina 450K data analysis. We evaluate five methods for correcting type bias. Analysis of reducing technical replicates showed that different methods optimized different assessment criterion. The summary table of evaluation is shown in Table 3, where the score is set to 1 if there is significant change using the method, otherwise the score is set to 0.5. RCP gets the highest total score based on mentioned criteria.
Table 3
Summary of evaluation of five methods
Replicate
Standard
Type bias
DMPs
GO terms
Total
variation
deviation
SQN
1
1
1
0.5
0.5
4
SWAN
0.5
0.5
0.5
0.5
0.5
2.5
Dasen
1
1
0.5
0.5
0.5
3.5
BMIQ
0.5
0.5
1
1
1
4
RCP
1
1
1
1
1
5
If there is significant change, score of the method is set to 1, otherwise the score is set to 0.5
Summary of evaluation of five methodsIf there is significant change, score of the method is set to 1, otherwise the score is set to 0.5The SQN, Dasen and RCP methods could significantly improve the replicated data quality, while SWAN and BMIQ didn’t show improvement of the replicates. SWAN and Dasen didn’t remove type bias as other methods, which might be due to that models they applying cannot fit the distribution of Infinium I well as other methods.When evaluation focused on detecting DMPs, BMIQ and RCP got more overlapped DMPs and credible GO terms than other methods. It is should pointed out that result may vary largely when using different datasets, which will be validated in further work.Illumina MethylationEPIC BeadChip [44] microarrays have been used in some project, which contain more probes on a single array. More efficient tools are in urgent need of merging 450K and EPIC array data and the efficiency of analysis should be considered. Minfi has been utilized widely but it cannot handle large dataset on personal computer in our view while the newly package meffil displayed surprising performance.During the evaluating processing, there were conflicts between R packages, for example, the MethylSet object in wateRmelon and minfi are different because the one in minfi has been updated in the newest version while the wateRmelon still use the previous object constructor. It is should be noticed in case of using different version of them.
Conclusions
It is suggested that the Illumina 450K users should choose proper strategy about importing data, background eliminating, correcting dye bias, correcting the type bias and detecting DMPs or DMRs. When analyzing small dataset, minfi and methylumi are optimal choice to import data and SQN, BMIQ and RCP may be proper to correcting the Infinium I/II bias. R package missMethyl is suitable for GO term enrichment analysis and biological interpretation. In our view, minfi is a proper R package to import data, eliminate background and ENmix package can be used to correct the type bias, then the normalized data should be used in the remaining steps of the framework.
Authors: Benjamin P Berman; Daniel J Weisenberger; Joseph F Aman; Toshinori Hinoue; Zachary Ramjan; Yaping Liu; Houtan Noushmehr; Christopher P E Lange; Cornelis M van Dijk; Rob A E M Tollenaar; David Van Den Berg; Peter W Laird Journal: Nat Genet Date: 2011-11-27 Impact factor: 38.330
Authors: Thomas J Hudson; Warwick Anderson; Axel Artez; Anna D Barker; Cindy Bell; Rosa R Bernabé; M K Bhan; Fabien Calvo; Iiro Eerola; Daniela S Gerhard; Alan Guttmacher; Mark Guyer; Fiona M Hemsley; Jennifer L Jennings; David Kerr; Peter Klatt; Patrik Kolar; Jun Kusada; David P Lane; Frank Laplace; Lu Youyong; Gerd Nettekoven; Brad Ozenberger; Jane Peterson; T S Rao; Jacques Remacle; Alan J Schafer; Tatsuhiro Shibata; Michael R Stratton; Joseph G Vockley; Koichi Watanabe; Huanming Yang; Matthew M F Yuen; Bartha M Knoppers; Martin Bobrow; Anne Cambon-Thomsen; Lynn G Dressler; Stephanie O M Dyke; Yann Joly; Kazuto Kato; Karen L Kennedy; Pilar Nicolás; Michael J Parker; Emmanuelle Rial-Sebbag; Carlos M Romeo-Casabona; Kenna M Shaw; Susan Wallace; Georgia L Wiesner; Nikolajs Zeps; Peter Lichter; Andrew V Biankin; Christian Chabannon; Lynda Chin; Bruno Clément; Enrique de Alava; Françoise Degos; Martin L Ferguson; Peter Geary; D Neil Hayes; Thomas J Hudson; Amber L Johns; Arek Kasprzyk; Hidewaki Nakagawa; Robert Penny; Miguel A Piris; Rajiv Sarin; Aldo Scarpa; Tatsuhiro Shibata; Marc van de Vijver; P Andrew Futreal; Hiroyuki Aburatani; Mónica Bayés; David D L Botwell; Peter J Campbell; Xavier Estivill; Daniela S Gerhard; Sean M Grimmond; Ivo Gut; Martin Hirst; Carlos López-Otín; Partha Majumder; Marco Marra; John D McPherson; Hidewaki Nakagawa; Zemin Ning; Xose S Puente; Yijun Ruan; Tatsuhiro Shibata; Michael R Stratton; Hendrik G Stunnenberg; Harold Swerdlow; Victor E Velculescu; Richard K Wilson; Hong H Xue; Liu Yang; Paul T Spellman; Gary D Bader; Paul C Boutros; Peter J Campbell; Paul Flicek; Gad Getz; Roderic Guigó; Guangwu Guo; David Haussler; Simon Heath; Tim J Hubbard; Tao Jiang; Steven M Jones; Qibin Li; Nuria López-Bigas; Ruibang Luo; Lakshmi Muthuswamy; B F Francis Ouellette; John V Pearson; Xose S Puente; Victor Quesada; Benjamin J Raphael; Chris Sander; Tatsuhiro Shibata; Terence P Speed; Lincoln D Stein; Joshua M Stuart; Jon W Teague; Yasushi Totoki; Tatsuhiko Tsunoda; Alfonso Valencia; David A Wheeler; Honglong Wu; Shancen Zhao; Guangyu Zhou; Lincoln D Stein; Roderic Guigó; Tim J Hubbard; Yann Joly; Steven M Jones; Arek Kasprzyk; Mark Lathrop; Nuria López-Bigas; B F Francis Ouellette; Paul T Spellman; Jon W Teague; Gilles Thomas; Alfonso Valencia; Teruhiko Yoshida; Karen L Kennedy; Myles Axton; Stephanie O M Dyke; P Andrew Futreal; Daniela S Gerhard; Chris Gunter; Mark Guyer; Thomas J Hudson; John D McPherson; Linda J Miller; Brad Ozenberger; Kenna M Shaw; Arek Kasprzyk; Lincoln D Stein; Junjun Zhang; Syed A Haider; Jianxin Wang; Christina K Yung; Anthony Cros; Anthony Cross; Yong Liang; Saravanamuttu Gnaneshan; Jonathan Guberman; Jack Hsu; Martin Bobrow; Don R C Chalmers; Karl W Hasel; Yann Joly; Terry S H Kaan; Karen L Kennedy; Bartha M Knoppers; William W Lowrance; Tohru Masui; Pilar Nicolás; Emmanuelle Rial-Sebbag; Laura Lyman Rodriguez; Catherine Vergely; Teruhiko Yoshida; Sean M Grimmond; Andrew V Biankin; David D L Bowtell; Nicole Cloonan; Anna deFazio; James R Eshleman; Dariush Etemadmoghadam; Brooke B Gardiner; Brooke A Gardiner; James G Kench; Aldo Scarpa; Robert L Sutherland; Margaret A Tempero; Nicola J Waddell; Peter J Wilson; John D McPherson; Steve Gallinger; Ming-Sound Tsao; Patricia A Shaw; Gloria M Petersen; Debabrata Mukhopadhyay; Lynda Chin; Ronald A DePinho; Sarah Thayer; Lakshmi Muthuswamy; Kamran Shazand; Timothy Beck; Michelle Sam; Lee Timms; Vanessa Ballin; Youyong Lu; Jiafu Ji; Xiuqing Zhang; Feng Chen; Xueda Hu; Guangyu Zhou; Qi Yang; Geng Tian; Lianhai Zhang; Xiaofang Xing; Xianghong Li; Zhenggang Zhu; Yingyan Yu; Jun Yu; Huanming Yang; Mark Lathrop; Jörg Tost; Paul Brennan; Ivana Holcatova; David Zaridze; Alvis Brazma; Lars Egevard; Egor Prokhortchouk; Rosamonde Elizabeth Banks; Mathias Uhlén; Anne Cambon-Thomsen; Juris Viksna; Fredrik Ponten; Konstantin Skryabin; Michael R Stratton; P Andrew Futreal; Ewan Birney; Ake Borg; Anne-Lise Børresen-Dale; Carlos Caldas; John A Foekens; Sancha Martin; Jorge S Reis-Filho; Andrea L Richardson; Christos Sotiriou; Hendrik G Stunnenberg; Giles Thoms; Marc van de Vijver; Laura van't Veer; Fabien Calvo; Daniel Birnbaum; Hélène Blanche; Pascal Boucher; Sandrine Boyault; Christian Chabannon; Ivo Gut; Jocelyne D Masson-Jacquemier; Mark Lathrop; Iris Pauporté; Xavier Pivot; Anne Vincent-Salomon; Eric Tabone; Charles Theillet; Gilles Thomas; Jörg Tost; Isabelle Treilleux; Fabien Calvo; Paulette Bioulac-Sage; Bruno Clément; Thomas Decaens; Françoise Degos; Dominique Franco; Ivo Gut; Marta Gut; Simon Heath; Mark Lathrop; Didier Samuel; Gilles Thomas; Jessica Zucman-Rossi; Peter Lichter; Roland Eils; Benedikt Brors; Jan O Korbel; Andrey Korshunov; Pablo Landgraf; Hans Lehrach; Stefan Pfister; Bernhard Radlwimmer; Guido Reifenberger; Michael D Taylor; Christof von Kalle; Partha P Majumder; Rajiv Sarin; T S Rao; M K Bhan; Aldo Scarpa; Paolo Pederzoli; Rita A Lawlor; Massimo Delledonne; Alberto Bardelli; Andrew V Biankin; Sean M Grimmond; Thomas Gress; David Klimstra; Giuseppe Zamboni; Tatsuhiro Shibata; Yusuke Nakamura; Hidewaki Nakagawa; Jun Kusada; Tatsuhiko Tsunoda; Satoru Miyano; Hiroyuki Aburatani; Kazuto Kato; Akihiro Fujimoto; Teruhiko Yoshida; Elias Campo; Carlos López-Otín; Xavier Estivill; Roderic Guigó; Silvia de Sanjosé; Miguel A Piris; Emili Montserrat; Marcos González-Díaz; Xose S Puente; Pedro Jares; Alfonso Valencia; Heinz Himmelbauer; Heinz Himmelbaue; Victor Quesada; Silvia Bea; Michael R Stratton; P Andrew Futreal; Peter J Campbell; Anne Vincent-Salomon; Andrea L Richardson; Jorge S Reis-Filho; Marc van de Vijver; Gilles Thomas; Jocelyne D Masson-Jacquemier; Samuel Aparicio; Ake Borg; Anne-Lise Børresen-Dale; Carlos Caldas; John A Foekens; Hendrik G Stunnenberg; Laura van't Veer; Douglas F Easton; Paul T Spellman; Sancha Martin; Anna D Barker; Lynda Chin; Francis S Collins; Carolyn C Compton; Martin L Ferguson; Daniela S Gerhard; Gad Getz; Chris Gunter; Alan Guttmacher; Mark Guyer; D Neil Hayes; Eric S Lander; Brad Ozenberger; Robert Penny; Jane Peterson; Chris Sander; Kenna M Shaw; Terence P Speed; Paul T Spellman; Joseph G Vockley; David A Wheeler; Richard K Wilson; Thomas J Hudson; Lynda Chin; Bartha M Knoppers; Eric S Lander; Peter Lichter; Lincoln D Stein; Michael R Stratton; Warwick Anderson; Anna D Barker; Cindy Bell; Martin Bobrow; Wylie Burke; Francis S Collins; Carolyn C Compton; Ronald A DePinho; Douglas F Easton; P Andrew Futreal; Daniela S Gerhard; Anthony R Green; Mark Guyer; Stanley R Hamilton; Tim J Hubbard; Olli P Kallioniemi; Karen L Kennedy; Timothy J Ley; Edison T Liu; Youyong Lu; Partha Majumder; Marco Marra; Brad Ozenberger; Jane Peterson; Alan J Schafer; Paul T Spellman; Hendrik G Stunnenberg; Brandon J Wainwright; Richard K Wilson; Huanming Yang Journal: Nature Date: 2010-04-15 Impact factor: 49.962
Authors: Mingxiang Teng; Curt Balch; Yunlong Liu; Meng Li; Tim H M Huang; Yadong Wang; Kenneth P Nephew; Lang Li Journal: PLoS One Date: 2012-03-06 Impact factor: 3.240
Authors: Matthias Lechner; Tim Fenton; James West; Gareth Wilson; Andrew Feber; Stephen Henderson; Christina Thirlwell; Harpreet K Dibra; Amrita Jay; Lee Butcher; Ankur R Chakravarthy; Fiona Gratrix; Nirali Patel; Francis Vaz; Paul O'Flynn; Nicholas Kalavrezos; Andrew E Teschendorff; Chris Boshoff; Stephan Beck Journal: Genome Med Date: 2013-02-05 Impact factor: 11.117
Authors: Diana L Juvinao-Quintero; Anne P Starling; Andres Cardenas; Camille E Powe; Patrice Perron; Luigi Bouchard; Dana Dabelea; Marie-France Hivert Journal: Epigenomics Date: 2021-01-07 Impact factor: 4.778
Authors: Iacopo Cristoferi; Tommaso Antonio Giacon; Karin Boer; Myrthe van Baardwijk; Flavia Neri; Manuela Campisi; Hendrikus J A N Kimenai; Marian C Clahsen-van Groningen; Sofia Pavanello; Lucrezia Furian; Robert C Minnee Journal: Clin Epigenetics Date: 2022-02-07 Impact factor: 6.551
Authors: Hyeon-Kyoung Koo; Jarrett Morrow; Priyadarshini Kachroo; Kelan Tantisira; Scott T Weiss; Craig P Hersh; Edwin K Silverman; Dawn L DeMeo Journal: Epigenetics Date: 2020-09-22 Impact factor: 4.528