Identification of polymorphic and off-target probe binding sites on the Illumina Infinium MethylationEPIC BeadChip.

Daniel L McCartney, Rosie M Walker, Stewart W Morris, Andrew M McIntosh, David J Porteous, Kathryn L Evans.

Abstract

Genome-wide analysis of DNA methylation has now become a relatively inexpensive technique thanks to array-based methylation profiling technologies. The recently developed Illumina Infinium MethylationEPIC BeadChip interrogates methylation at over 850,000 sites across the human genome, covering 99% of RefSeq genes. This array supersedes the widely used Infinium HumanMethylation450 BeadChip, which has permitted insights into the relationship between DNA methylation and a wide range of conditions and traits. Previous research has identified issues with certain probes on both the HumanMethylation450 BeadChip and its predecessor, the Infinium HumanMethylation27 BeadChip, which were predicted to affect array performance. These issues concerned probe-binding specificity and the presence of polymorphisms at target sites. Using in silico methods, we have identified probes on the Infinium MethylationEPIC BeadChip that are predicted to (i) measure methylation at polymorphic sites and (ii) hybridise to multiple genomic regions. We intend these resources to be used for quality control procedures when analysing data derived from this platform.

Keywords:  Cross-hybridising probes; DNA methylation; Infinium MethylationEPIC BeadChip; Polymorphic CpG; Quality control

Year:  2016        PMID: 27330998      PMCID: PMC4909830          DOI: 10.1016/j.gdata.2016.05.012

Source DB:  PubMed          Journal:  Genom Data        ISSN: 2213-5960


Direct link to deposited data

http://dx.doi.org/10.1016/j.gdata.2016.05.012

Introduction

DNA methylation is an epigenetic mark typically occurring at cytosine-guanine dinucleotides (CpGs). Changes in DNA methylation are observed in normal development, in response to environmental stimuli, and in certain disease states [1]. DNA methylation is linked to transcriptional activity, rendering it a key regulatory motif [2]. Recent years have seen the development of high-throughput DNA methylation profiling techniques including whole-genome bisulphite sequencing (WGBS), methylated DNA immunoprecipitation (meDIP) and microarray-based technologies [3]. The Infinium HumanMethylation450 BeadChip, developed by Illumina, has offered an attractive array-based option to researchers, as it interrogates methylation at over 485,000 sites across the genome at single-base resolution and at relatively low cost (Bibikova et al., 2011 [4]). However, issues with probe-binding specificity and polymorphic targets have been identified, which may compromise data integrity if not adequately addressed (Chen et al., 2013 [5]).

The Infinium HumanMethylation450 BeadChip has recently been superseded by the Infinium MethylationEPIC BeadChip. This array interrogates DNA methylation at over 850,000 sites, including > 90% of the HumanMethylation450 array's targets. This substantial increase in coverage, coupled with continuing interest in the role of DNA methylation, is likely to result in widespread use of this array. As such, it is essential that its potential shortcomings are thoroughly understood. To generate a resource for researchers using the MethylationEPIC BeadChip, we have identified probes that may perform sub-optimally. This work, therefore, represents an update of Chen et al.'s [5] previous characterisation of the Infinium HumanMethylation450 BeadChip.

Like its predecessor, the MethylationEPIC BeadChip uses two types of probe chemistry (Type I and Type II) to interrogate methylation. The differences between the two chemistries and the situations in which they are used have been described fully in previous publications [6]. Briefly, Type I assays use separate probes for unmethylated and methylated target sites, while Type II assays use a single probe. Both assay types differentiate methylation state via single-base extension of a fluorescently labelled nucleotide. Taking the differences between Type I and Type II assays into consideration, we have performed in silico analyses to identify probes on the Infinium MethylationEPIC BeadChip that are predicted to hybridise to multiple genomic regions, as well as probes whose signal may be affected by polymorphisms at the target site, which could alter probe binding. Both of these factors should be taken into account when performing quality control of data produced using this technology.

Methods

Identification of probes with a polymorphic target

Probes potentially affected by polymorphisms at the target site were identified following methods described previously [5]. The signal-generating process of single-base extension requires end-nucleotide matching for both Type I and Type II probes. Therefore, we limited our query to target CpGs and sites of single-base extension, as polymorphisms at these sites are most likely to generate spurious signals. Using information from the Infinium MethylationEPIC BeadChip manifest file (MethylationEPIC_v-1-0_B1.csv; date of download: 8 February 2016), we generated a list of genomic coordinates (hg19, GRCh37) of the target cytosine base (C) and guanine base (G) for all probes on the array. For Infinium Type I probes we also included the base before the target CpG, as this is the site of single base extension for these probes. We cross-referenced these coordinates to those of variants listed by the 1000 Genomes Project (phase 3) [7] to generate a list of probes affected by polymorphisms at the target CpG and/or site of single-base extension.
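The cross-referencing step described above can be illustrated with a minimal Python sketch. It assumes each probe is reduced to a chromosome, the 1-based hg19 position of the target C, and its assay type, and that variants are available as a set of (chromosome, position) pairs. The function names, the probe IDs, and the simplification that the "base before the target CpG" is always the position immediately preceding the C (strand is ignored) are assumptions made for illustration, not details taken from the manifest or from the authors' pipeline.

```python
from typing import Dict, Iterable, List, Set, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based hg19 position)
Probe = Tuple[str, str, int, str]  # (probe ID, chromosome, position of target C, "I" or "II")


def signal_affecting_sites(chrom: str, c_pos: int, probe_type: str) -> List[Site]:
    """List the positions where a variant could alter the signal of one probe."""
    sites = [(chrom, c_pos), (chrom, c_pos + 1)]  # target C and its paired G
    if probe_type == "I":
        # Assumption: the single-base extension site is the base before the C.
        sites.append((chrom, c_pos - 1))
    return sites


def probes_with_variants(probes: Iterable[Probe], variants: Set[Site]) -> Dict[str, List[Site]]:
    """Map each affected probe ID to the variant positions it overlaps."""
    hits: Dict[str, List[Site]] = {}
    for probe_id, chrom, c_pos, probe_type in probes:
        overlapping = [
            site
            for site in signal_affecting_sites(chrom, c_pos, probe_type)
            if site in variants
        ]
        if overlapping:
            hits[probe_id] = overlapping
    return hits


# Hypothetical probes and variant positions, for illustration only.
example_probes = [
    ("cg_example_1", "chr1", 100, "I"),
    ("cg_example_2", "chr1", 200, "II"),
    ("cg_example_3", "chr2", 300, "II"),
]
example_variants = {("chr1", 99), ("chr1", 199), ("chr2", 301)}
affected = probes_with_variants(example_probes, example_variants)
```

In this toy input the variant at chr1:199 does not flag the Type II probe at chr1:200, because the preceding base is only queried for Type I probes.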

Identification of probes with non-specific hybridisation potential

Probes with the potential to cross-hybridise were identified following methods described previously [5].

Generation of probe sequences for in silico analyses

Many Infinium Type II probe sequences contain an “R” nucleotide representing either an adenine (A) or guanine (G) base, depending on whether the underlying target cytosine is methylated or unmethylated. All possible combinations of Type II probe sequences were generated, and combined with a list of the Type I probe sequences.
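One way to enumerate every concrete sequence for a probe containing "R" is sketched below in Python. The function name and the short example sequence are hypothetical; real probes are 50 bases long, and a probe with k "R" positions yields 2^k sequences.

```python
from itertools import product
from typing import List


def expand_r(sequence: str) -> List[str]:
    """Return every sequence obtained by replacing each 'R' with 'A' or 'G'."""
    options = [("A", "G") if base == "R" else (base,) for base in sequence]
    return ["".join(combination) for combination in product(*options)]


# Hypothetical short probe fragment with two ambiguous positions.
expanded = expand_r("ACRTR")
```

A sequence without any "R" is returned unchanged as a single-element list, so Type I probe sequences could be passed through the same function.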

Generation of genomic comparison sequences for in silico analyses

The GRCh37 release of the human genome sequence was downloaded from the University of California, Santa Cruz (UCSC) Genome Browser website (https://genome.ucsc.edu/) as a reference, excluding alternative assemblies (e.g. chr17_ctg5_hap1) to avoid duplicated results (date of download: 11 January 2016). From this, we generated four modified reference genome sequences. A bisulphite-converted methylated forward genome sequence was generated in silico by converting all non-CpG cytosine bases to thymine (T) bases in the reference sequence. The same process was performed for the reverse complement of the reference sequence to generate a bisulphite-converted methylated reverse sequence of the human genome. Bisulphite-converted unmethylated forward and reverse sequences were generated by converting all C bases to T in the forward reference sequence and its reverse complement. Using the BLAST-like alignment tool (BLAT) [8], we aligned the probe sequences described above to the four modified reference genome sequences, as well as their reverse complements. The BLAT parameters used were: stepSize = 5, minScore = 0, minIdentity = 0 and repMatch = 1,000,000,000. Probes were defined as being at high risk of non-specific binding if there was a gap-free match of 47 or more nucleotides, which had to include the end base of the query sequence, at an off-target locus.
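The in silico bisulphite conversion described above can be sketched in Python as follows. This is a minimal illustration that assumes an upper-case sequence held in memory; the function names are hypothetical, and a genome-scale implementation would need to stream chromosomes and handle soft-masked (lower-case) bases, which this sketch simply upper-cases.

```python
import re
from typing import Dict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")  # other symbols such as N are left as-is


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]


def convert_unmethylated(sequence: str) -> str:
    """Fully unmethylated template: every C is read as T after conversion."""
    return sequence.upper().replace("C", "T")


def convert_methylated(sequence: str) -> str:
    """Fully methylated template: only C bases outside a CpG become T."""
    return re.sub(r"C(?!G)", "T", sequence.upper())


def four_converted_genomes(reference: str) -> Dict[str, str]:
    """Build the four modified reference sequences from one forward sequence."""
    reverse = reverse_complement(reference)
    return {
        "methylated_forward": convert_methylated(reference),
        "methylated_reverse": convert_methylated(reverse),
        "unmethylated_forward": convert_unmethylated(reference),
        "unmethylated_reverse": convert_unmethylated(reverse),
    }


# Hypothetical eight-base reference, for illustration only.
converted = four_converted_genomes("ACGTCCGA")
```

If a sequence is processed in chunks, a C at the end of one chunk followed by a G at the start of the next would be converted incorrectly by `convert_methylated`, so chunk boundaries would need an overlap of at least one base.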

Results

Infinium MethylationEPIC BeadChip probes with polymorphic targets

Coordinates for 866,836 probes were obtained from the Infinium MethylationEPIC BeadChip manifest downloaded on 8 February 2016. Excluding control probes, the manifest file contained 142,262 Type I probes (426,786 potential signal-affecting positions) and 724,574 Type II probes (1,449,148 potential signal-affecting positions), giving a total of 1,875,934 sites which were interrogated for genetic variation. We identified 340,327 sites with single nucleotide polymorphisms (SNPs), insertions or deletions (indels), or structural variation. These sites were targeted by 297,744 unique probes: 34% of the total probe content of the Infinium MethylationEPIC BeadChip. Of these, 23,399 probes (2.7% of total probe content) targeted polymorphic sites with a minor allele frequency (MAF) of ≥ 5% in at least one population studied. A table of probes affected by polymorphisms, with minor allele frequencies corresponding to African, admixed American, European, South Asian, and East Asian populations (AFR, AMR, EUR, SAS, EAS; respectively) is available in the supplementary information of this paper (Supplementary Table 1).
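The site counts follow from the Methods: three queried positions per Type I probe (target C, target G, and the single-base extension site) and two per Type II probe. A short Python check of the arithmetic, using only figures reported in this section (the variable names are illustrative):

```python
# Probe counts reported in the Results.
type_i_probes = 142_262
type_ii_probes = 724_574

# Queried positions per probe, as described in the Methods.
type_i_sites = type_i_probes * 3   # target C, target G, single-base extension site
type_ii_sites = type_ii_probes * 2  # target C and target G

total_probes = type_i_probes + type_ii_probes
total_sites = type_i_sites + type_ii_sites

# Percentages of total probe content.
pct_any_variant = 100 * 297_744 / total_probes
pct_common_variant = 100 * 23_399 / total_probes

print(total_probes, total_sites)
print(f"{pct_any_variant:.1f}% {pct_common_variant:.1f}%")
```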

Infinium MethylationEPIC BeadChip probes with cross-hybridisation potential

A total of 1,752,932 potential probe sequences, each 50 bases in length, were aligned to in silico bisulphite-converted forward and reverse methylated and unmethylated reference genomes, and their corresponding complementary strands in BLAT (i.e. eight single-stranded genomes in total). We identified 44,210 probes (11,772 Type I probes and 32,438 Type II probes) with ≥ 47 nucleotide off-target matches including the end base, which were defined as potentially cross-hybridising. A list of these probes is available in the supplementary information of this paper (Supplementary Tables 2–3). Consistent with findings on the Infinium HumanMethylation450 BeadChip [5], a larger proportion of non-CpG-targeting probes (Probe ID prefix = “ch”) were identified as potentially cross-hybridising compared to CpG-targeting probes (Probe ID prefix = “cg”). Of 863,904 CpG-targeting probes present on the array, 42,558 (4.9% of total CpG-targeting probes) were identified as potentially cross-hybridising (Supplementary Table 2). In contrast, of 2,932 non-CpG-targeting probes, we found only 1,280 to bind specifically to their targets, while the remaining 1,652 were potentially cross-hybridising (56% of total non-CpG-targeting probes; Supplementary Table 3), based on the information provided in the Illumina manifest.
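The proportions quoted for the two probe classes can be reproduced from the reported counts. The following Python snippet is only a consistency check on the figures in this section; the variable names are illustrative.

```python
# Counts reported in the Results.
cg_total, cg_cross = 863_904, 42_558
ch_total, ch_cross, ch_specific = 2_932, 1_652, 1_280
type_i_cross, type_ii_cross = 11_772, 32_438

all_cross = cg_cross + ch_cross
all_probes = cg_total + ch_total

pct_cg = 100 * cg_cross / cg_total     # CpG-targeting probes flagged
pct_ch = 100 * ch_cross / ch_total     # non-CpG-targeting probes flagged
pct_all = 100 * all_cross / all_probes  # share of total probe content

print(f"cg: {pct_cg:.1f}%  ch: {pct_ch:.0f}%  all: {pct_all:.1f}%")
```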

Discussion

In order to identify probes that might compromise the performance of the Illumina Infinium MethylationEPIC BeadChip, we have generated lists of probes that may be affected by non-specific binding and/or polymorphisms at the target site. Our in silico analyses identified 44,210 probes (5.1% of total probe content) with potential off-target binding sites and 23,399 probes (2.7% of total probe content) whose target site contains a polymorphism with a MAF ≥ 0.05 in at least one population studied, which may lead to artefactual signal due to impaired probe binding. We recommend that users take these probes into consideration when analysing data on this platform, applying the appropriate filtering criteria in a population-specific manner, where possible. We recognise that there may be some situations where retaining probes mapping to polymorphic target sites will be desirable. For example, a difference in methylation due to a SNP that creates or destroys a CpG at a target site may be informative if it confers a change in disease risk.

Chen et al. (2013) [5] demonstrated that autosomal probes defined as potentially cross-hybridising according to their criterion of an off-target match of 47/50 bases, including the end nucleotide, showed an enrichment for off-target binding sites on the sex chromosomes. Failure to exclude these probes could, therefore, result in the spurious conclusion that these loci are differentially methylated between males and females. Following their methods, we have identified probes on the Infinium MethylationEPIC BeadChip with the potential to hybridise to multiple genomic regions, thus generating off-target signal. We suggest the exclusion of these probes prior to data analysis. Although the exclusion of potentially cross-hybridising probes defined using this method is likely to improve the validity of the results obtained from the array, the actual extent of off-target binding is likely to vary by locus. Factors such as local sequence composition, including the presence of polymorphisms underlying the probe sequence, are likely to play a role in determining the likelihood of cross-hybridisation. It is, therefore, recommended that any results of interest that may have been generated due to cross-hybridisation are checked using an alternative technique, such as pyrosequencing of bisulphite-converted DNA.

In summary, we have produced lists of probes on the new Illumina Infinium MethylationEPIC BeadChip that measure methylation at sites affected by polymorphisms and/or have the potential to cross-hybridise. Based on the widespread use of the HumanMethylation450 BeadChip, we predict that the Illumina Infinium MethylationEPIC BeadChip will play a central role in epigenome-wide association studies (EWAS) over the next few years. As such, it is essential that factors affecting the performance of the array, such as probe specificity and sequence polymorphisms, which we have demonstrated to potentially affect a substantial proportion of probes, are taken into consideration. We recommend that the resources supplied with this paper be used in conjunction with additional standard quality control measures, such as excluding probes with low signal-to-background ratios, omission of samples with a high proportion of such probes, and appropriate data normalisation strategies (for review see Wilhelm-Benartzi et al., 2013 [9]), in order to maximise the likelihood of producing meaningful results.

The following are the supplementary data related to this article.

Supplementary Table 1

Table of Infinium MethylationEPIC probes with potentially polymorphic targets

Supplementary Table 2

List of potentially cross-hybridising CpG-targeting probes

Supplementary Table 3

List of potentially cross-hybridising non-CpG-targeting probes
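The population-specific filtering recommended in the Discussion might be applied to these tables along the following lines. This Python sketch assumes the supplementary tables have already been loaded into simple structures: a set of cross-hybridising probe IDs (Supplementary Tables 2 and 3 combined) and a mapping from probe ID to per-population MAF (Supplementary Table 1). The function names, probe IDs, and in-memory layout are hypothetical and do not reflect the actual file format of the tables.

```python
from typing import Dict, List, Set


def probes_to_exclude(
    cross_hybridising: Set[str],
    polymorphic_maf: Dict[str, Dict[str, float]],
    population: str,
    maf_threshold: float = 0.05,
) -> Set[str]:
    """Combine cross-hybridising probes with probes whose target is polymorphic
    at or above the MAF threshold in the chosen population (e.g. "EUR")."""
    flagged = set(cross_hybridising)
    for probe_id, maf_by_population in polymorphic_maf.items():
        if maf_by_population.get(population, 0.0) >= maf_threshold:
            flagged.add(probe_id)
    return flagged


def filter_betas(betas: Dict[str, List[float]], exclude: Set[str]) -> Dict[str, List[float]]:
    """Drop flagged probes from a probe-by-sample table of methylation values."""
    return {probe_id: values for probe_id, values in betas.items() if probe_id not in exclude}


# Hypothetical inputs, for illustration only.
cross = {"cg_example_1"}
maf = {
    "cg_example_2": {"EUR": 0.10, "AFR": 0.01},
    "cg_example_3": {"EUR": 0.01, "AFR": 0.20},
}
betas = {
    "cg_example_1": [0.1, 0.2],
    "cg_example_2": [0.5, 0.9],
    "cg_example_3": [0.4, 0.4],
    "cg_example_4": [0.8, 0.7],
}
kept_eur = filter_betas(betas, probes_to_exclude(cross, maf, "EUR"))
```

With these inputs a European-ancestry analysis drops `cg_example_1` and `cg_example_2`, whereas passing `"AFR"` would drop `cg_example_3` instead of `cg_example_2`.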

Conflict of interest

The authors declare no conflicts of interest.

Specifications

Organism/cell line/tissue: Homo sapiens genome sequence (hg19)
Sex: n/a
Sequencer or array type: Illumina Infinium MethylationEPIC Array
Data format: Analysed: table of polymorphic targets and lists of cross-hybridising probes
Experimental factors: Infinium MethylationEPIC probe data, 1000 Genomes Project phase 3 data, UCSC Genome Browser human reference genome sequence
Experimental features: In silico alignment of Infinium MethylationEPIC probe sequences to bisulphite-converted genome sequences (hg19) and cross-referencing of probe target coordinates to 1000 Genomes Project phase 3 data
Consent: Raw data available from Illumina, the UCSC Genome Browser and the 1000 Genomes Project
Sample source location: n/a

References

1.  BLAT--the BLAST-like alignment tool.

Authors:  W James Kent
Journal:  Genome Res       Date:  2002-04       Impact factor: 9.043

2.  Evaluation of the Infinium Methylation 450K technology.

Authors:  Sarah Dedeurwaerder; Matthieu Defrance; Emilie Calonne; Hélène Denis; Christos Sotiriou; François Fuks
Journal:  Epigenomics       Date:  2011-12       Impact factor: 4.778

Review 3.  Methods in DNA methylation profiling.

Authors:  Tao Zuo; Benjamin Tycko; Ta-Ming Liu; Juey-Jen L Lin; Tim H-M Huang
Journal:  Epigenomics       Date:  2009-12       Impact factor: 4.778

4.  High density DNA methylation array with single CpG site resolution.

Authors:  Marina Bibikova; Bret Barnes; Chan Tsan; Vincent Ho; Brandy Klotzle; Jennie M Le; David Delano; Lu Zhang; Gary P Schroth; Kevin L Gunderson; Jian-Bing Fan; Richard Shen
Journal:  Genomics       Date:  2011-08-02       Impact factor: 5.736

Review 5.  DNA methylation: roles in mammalian development.

Authors:  Zachary D Smith; Alexander Meissner
Journal:  Nat Rev Genet       Date:  2013-02-12       Impact factor: 53.242

Review 6.  DNA methylation in development and human disease.

Authors:  Suhasni Gopalakrishnan; Beth O Van Emburgh; Keith D Robertson
Journal:  Mutat Res       Date:  2008-08-20       Impact factor: 2.433

7.  A global reference for human genetic variation.

Authors:  Adam Auton; Lisa D Brooks; Richard M Durbin; Erik P Garrison; Hyun Min Kang; Jan O Korbel; Jonathan L Marchini; Shane McCarthy; Gil A McVean; Gonçalo R Abecasis
Journal:  Nature       Date:  2015-10-01       Impact factor: 49.962

8.  Discovery of cross-reactive probes and polymorphic CpGs in the Illumina Infinium HumanMethylation450 microarray.

Authors:  Yi-an Chen; Mathieu Lemire; Sanaa Choufani; Darci T Butcher; Daria Grafodatskaya; Brent W Zanke; Steven Gallinger; Thomas J Hudson; Rosanna Weksberg
Journal:  Epigenetics       Date:  2013-01-11       Impact factor: 4.528

Review 9.  Review of processing and analysis methods for DNA methylation array data.

Authors:  C S Wilhelm-Benartzi; D C Koestler; M R Karagas; J M Flanagan; B C Christensen; K T Kelsey; C J Marsit; E A Houseman; R Brown
Journal:  Br J Cancer       Date:  2013-08-27       Impact factor: 7.640
