Hong Pan, Li Chen, Shaillay Dogra, Ai Ling Teh, Jun Hao Tan, Yubin I Lim, Yen Ching Lim, Shengnan Jin, Yew Kok Lee, Poh Yong Ng, Mei Lyn Ong, Shelia Barton, Yap-Seng Chong, Michael J Meaney, Peter D Gluckman, Walter Stunkel, Chunming Ding, Joanna D Holbrook.
Abstract
The Infinium Human Methylation450 BeadChip Array (TM) (Infinium 450K) is an important tool for studying epigenetic patterns associated with disease. This array offers a high-throughput, low-cost alternative to more comprehensive sequencing-based methodologies. Here we compare data generated by interrogation of the same seven clinical samples by Infinium 450K and reduced representation bisulfite sequencing (RRBS). This is the largest data set comparing the Infinium 450K array to the comprehensive RRBS methodology reported so far. We show good agreement between the two methodologies. A read depth of four or more reads in the RRBS data was sufficient to achieve good agreement with Infinium 450K. However, we observe that intermediate methylation values (20-80%) are more variable between technologies than values at the extremes of the bimodal methylation distribution. We describe careful processing of Infinium 450K data to correct for known limitations and batch effects. Using methodologies proposed by others and newly implemented and combined in this report, agreement of Infinium 450K data with independent techniques can be vastly improved.
Year: 2012 PMID: 22964528 PMCID: PMC3469459 DOI: 10.4161/epi.22102
Source DB: PubMed Journal: Epigenetics ISSN: 1559-2294 Impact factor: 4.528
Table 1. Number of CpGs covered by RRBS and shared with Infinium 450K across the seven samples
| Sample | RRBS CpGs (Nreads ≥ 1) | Shared with Infinium 450K (Nreads ≥ 1) | RRBS CpGs (Nreads ≥ 4) | Shared with Infinium 450K (Nreads ≥ 4) | Pearson R | Spearman R |
|---|---|---|---|---|---|---|
| 1 | 6,715,332 | 113,571 | 2,846,600 | 65,702 | 0.96 | 0.82 |
| 2 | 6,658,349 | 114,336 | 3,027,004 | 69,041 | 0.96 | 0.83 |
| 3 | 6,250,352 | 109,113 | 2,990,478 | 68,749 | 0.96 | 0.84 |
| 4 | 5,838,715 | 101,765 | 2,628,619 | 60,641 | 0.95 | 0.83 |
| 5 | 5,462,059 | 93,562 | 2,607,467 | 61,366 | 0.96 | 0.82 |
| 6 | 5,817,127 | 93,693 | 2,703,504 | 60,815 | 0.96 | 0.81 |
| 7 | 5,624,859 | 104,305 | 2,855,611 | 68,346 | 0.94 | 0.82 |
| Average | 6,052,399 | 104,335 | 2,808,469 | 64,951 | | |
| Stdev | 496,344 | 8,602 | 167,495 | 3,911 | | |
| Total | 42,366,793 | 730,345 | 19,659,283 | 454,660 | 0.96 | 0.83 |

Figure 1. Optimal selection of the read-count quality cutoff in RRBS data. (A) Frequency distribution of the number of reads in RRBS data. The x-axis is truncated at 200 reads; the maximum in the data is 45,972. (B) Correlation between Infinium 450K and RRBS values at different read cutoffs for RRBS data, estimated using Pearson R (black squares) and Spearman R (black circles) values. The number of CpGs shared between the two technologies and remaining after each read cutoff is indicated by red stars.
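
The read-cutoff comparison in Figure 1B can be sketched roughly as follows. This is a minimal illustration rather than the authors' pipeline: the input layout (one dictionary of methylated and total read counts per CpG for RRBS, one dictionary of % methylation values for the array) and all function names are assumptions made for the example. The sketch keeps CpGs present in both data sets with at least `min_reads` reads, then computes Pearson and Spearman correlations.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


def correlate_at_cutoff(rrbs, array_pct, min_reads):
    """rrbs: CpG id -> (methylated_reads, total_reads) (hypothetical layout).
    array_pct: CpG id -> % methylation from the array (hypothetical layout).
    Returns (number of shared CpGs kept, Pearson R, Spearman R)."""
    shared = [c for c in rrbs if c in array_pct and rrbs[c][1] >= min_reads]
    x = [100.0 * rrbs[c][0] / rrbs[c][1] for c in shared]
    y = [array_pct[c] for c in shared]
    return len(shared), pearson_r(x, y), spearman_r(x, y)
```

Calling `correlate_at_cutoff` over a range of `min_reads` values would give the three series plotted in panel B: the correlation coefficients and the number of CpGs remaining.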

Figure 2. Pie charts of CpG coverage in relation to CpG island location. (A) RRBS data average for all seven samples at Nreads ≥ 1. (B) RRBS data average for all seven samples at Nreads ≥ 4. (C) RRBS data average for all seven samples at Nreads ≥ 10. (D) Infinium 450K assays passing QC. (E) Shared data between RRBS (Nreads ≥ 4) and Infinium 450K (n = 454,660). (F) Relationship between the proportion of CpG island coverage and the Nreads cutoff.

Figure 3. Histograms of % methylation value frequency in (A) RRBS data and (B) Infinium 450K data for the 454,660 CpGs covered by both technologies. Data were plotted in 201 bins in steps of 0.5% between 0% and 100%. For RRBS data, the y-axis was truncated for clarity: the peak at 0–0.25% extends to 180,000 and the peak at 99.75–100% extends to 28,700. For Infinium 450K data, type I assay data are shown in red and type II in blue.
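
One way to obtain 201 bins with a 0.5% step is to centre the bins on 0, 0.5, ..., 100, so that the first bin covers 0–0.25% and the last covers 99.75–100%, which matches the peak ranges quoted in the caption. The sketch below assumes that layout; the function names are hypothetical and the binning used for the published figure may differ in detail.

```python
def bin_index(pct, step=0.5):
    """Index of the bin centred on the multiple of `step` nearest to `pct`.

    Assumed layout: bin k is centred on k * step, so bin 0 covers
    0-0.25% and bin 200 covers 99.75-100%.
    """
    if not 0.0 <= pct <= 100.0:
        raise ValueError("percent methylation must lie in [0, 100]")
    return int(pct / step + 0.5)


def methylation_histogram(values, step=0.5):
    """Count values per bin; returns a list of 201 counts for step=0.5."""
    n_bins = int(100 / step) + 1
    counts = [0] * n_bins
    for v in values:
        counts[bin_index(v, step)] += 1
    return counts
```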

Figure 4. Concordance of raw Infinium 450K data with RRBS data. (A) Scatter plot of % methylation values from RRBS (x-axis) and % methylation values from Infinium 450K (y-axis) over a density cloud. The density cloud is a smoothed two-dimensional histogram using 50 equally spaced bins in both directions. A random selection of 2,000 data points from type I probes is plotted as red dots and another random selection of 2,000 data points from type II probes as blue dots. (B) Bland-Altman plot for raw Infinium data compared with RRBS (Nreads ≥ 4). The average % methylation at each CpG from both methods is on the x-axis. The difference in % methylation at each CpG between the two methods is on the y-axis. Data from type I assays are shown in red and data from type II assays in blue.
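
The quantities plotted in a Bland-Altman comparison such as Figure 4B can be computed as in the sketch below. Two points are assumptions rather than details taken from the caption: the sign convention (array minus RRBS) and the 1.96 standard deviation limits of agreement, which are the conventional Bland-Altman summary. The function name is hypothetical.

```python
from statistics import mean, stdev


def bland_altman(rrbs_pct, array_pct):
    """Return per-CpG averages and differences plus summary statistics.

    Differences are array minus RRBS (an assumed sign convention).
    Limits of agreement are bias +/- 1.96 * SD of the differences.
    """
    if len(rrbs_pct) != len(array_pct):
        raise ValueError("inputs must have the same length")
    averages = [(r + a) / 2 for r, a in zip(rrbs_pct, array_pct)]
    differences = [a - r for r, a in zip(rrbs_pct, array_pct)]
    bias = mean(differences)
    sd = stdev(differences)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return averages, differences, bias, limits
```

Plotting `averages` on the x-axis against `differences` on the y-axis gives the layout described for panel B.
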
Table 2. Number of CpG % methylation values showing agreement within 5%, 10% and 20% ranges, between Infinium 450K and RRBS (Nreads ≥ 4) data, at different levels of Infinium processing
| Difference range (n = 454,660) | Within 20%, Type I | Within 20%, Type II | Within 10%, Type I | Within 10%, Type II | Within 5%, Type I | Within 5%, Type II | Spearman's rank R | Pearson's R² | Slope | MIC |
|---|---|---|---|---|---|---|---|---|---|---|
| | 196,937 | 217,721 | 172,003 | 161,520 | 132,267 | 75,144 | 0.83 | 0.92 | 0.83 | 0.81 |
| | 93% | 90% | 81% | 67% | 62% | 31% | p < 0.001 | p < 0.001 | | |
| | 197,089 | 221,887 | 173,872 | 165,158 | 139,888 | 73,510 | 0.83 | 0.92 | 0.87 | 0.81 |
| | 93% | 92% | 82% | 68% | 66% | 30% | p < 0.001 | p < 0.001 | | |
| | 197,089 | 225,828 | 173,872 | 192,004 | 139,888 | 144,070 | 0.83 | 0.93 | 0.92 | 0.81 |
| | 93% | 93% | 82% | 79% | 68% | 60% | p < 0.001 | p < 0.001 | | |
| | | | | | | | 0.83 | 0.93 | 0.93 | 0.81 |
| | 93% | 94% | 82% | 81% | 67% | 62% | p < 0.001 | p < 0.001 | | |
The greatest number of CpGs in agreement at each difference range, across processing levels, is shown in bold. Overall correlation statistics are also shown (n = 454,660).
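
Counts of the kind reported in Table 2 can be derived from paired values as sketched below. The sketch treats "within 5%" as an absolute difference of at most 5 percentage points, with the boundary included; whether the published counts include the boundary is not stated, so that choice is an assumption, as is the function name.

```python
def agreement_counts(rrbs_pct, array_pct, thresholds=(5, 10, 20)):
    """Map each threshold to (count, percent) of CpGs whose absolute
    difference in % methylation is at most that threshold (inclusive,
    which is an assumption)."""
    if len(rrbs_pct) != len(array_pct):
        raise ValueError("inputs must have the same length")
    abs_diffs = [abs(a - r) for r, a in zip(rrbs_pct, array_pct)]
    n = len(abs_diffs)
    result = {}
    for t in thresholds:
        count = sum(1 for d in abs_diffs if d <= t)
        result[t] = (count, 100.0 * count / n)
    return result
```

Running this separately on type I and type II probes would give the paired columns of the table.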

Figure 5. Processed Infinium 450K data vs. RRBS data. (A) Scatter plot of % methylation values from RRBS (x-axis) and % methylation values from Infinium 450K (y-axis) over a density cloud. The density cloud was generated as in Figure 4A. (B) Bland-Altman plot for processed Infinium 450K data compared with RRBS (Nreads ≥ 4), generated as in Figure 4B.
Table 3. Number of CpG % methylation values showing agreement within 5%, 10% and 20% ranges, between Infinium 450K and RRBS (Nreads ≥ 10) data, at different levels of Infinium processing
| Difference range (n = 387,789) | Within 20%, Type I | Within 20%, Type II | Within 10%, Type I | Within 10%, Type II | Within 5%, Type I | Within 5%, Type II | Spearman's rank R | Pearson's R² | Slope | MIC |
|---|---|---|---|---|---|---|---|---|---|---|
| | 166,565 | 192,635 | 147,047 | 145,040 | 114,131 | 69,159 | 0.83 | 0.93 | 0.84 | 0.83 |
| | 94% | 92% | 83% | 69% | 64% | 33% | p < 0.001 | p < 0.001 | | |
| | 166,723 | 195,934 | 148,589 | 148,186 | 119,916 | 67,879 | 0.83 | 0.93 | 0.88 | 0.83 |
| | 94% | 93% | 84% | 71% | 67% | 32% | p < 0.001 | p < 0.001 | | |
| | 166,723 | 199,332 | 148,589 | 172,073 | 119,916 | 130,431 | 0.83 | 0.94 | 0.93 | 0.83 |
| | 94% | 95% | 84% | 82% | 67% | 62% | p < 0.001 | p < 0.001 | | |
| | | | | | | | 0.83 | 0.94 | 0.95 | 0.83 |
| | 94% | 95% | 84% | 83% | 69% | 64% | p < 0.001 | p < 0.001 | | |
The greatest number of CpGs in agreement at each difference range, across processing levels, is shown in bold. Overall correlation statistics are also shown (n = 387,789).
Table 4. Number of CpG % methylation values showing agreement or not (within 10%), between processed Infinium 450K and RRBS data (Nreads ≥ 4), at different % methylation value ranges
| Infinium % methylation value range | Difference within 10% | Difference beyond 10% | Total | Median absolute difference | Standard deviation of absolute difference |
|---|---|---|---|---|---|
| | 257,897 | 14,671 | 272,568 | 2.00 | 3.96 |
| | 95% | 5% | | | |
| | 29,573 | 46,720 | 76,293 | 13.24 | 12.44 |
| | 39% | 61% | | | |
| | 82,855 | 22,944 | 105,799 | 4.94 | 9.31 |
| | 78% | 22% | | | |
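
A stratified summary like Table 4 can be sketched as follows. The range boundaries used here (below 20%, 20–80%, above 80%) are taken from the abstract's description of intermediate methylation values and are an assumption about how the table rows were defined. Stratifying on the Infinium value follows the table's first column header. The use of the sample standard deviation and the function name are also assumptions.

```python
from statistics import median, stdev


def stratified_agreement(rrbs_pct, array_pct, low=20.0, high=80.0, tol=10.0):
    """Summarise absolute differences per methylation range.

    CpGs are assigned to a range by their array (Infinium) value.
    Range boundaries and the inclusive tolerance are assumptions.
    """
    groups = {"low": [], "intermediate": [], "high": []}
    for r, a in zip(rrbs_pct, array_pct):
        if a < low:
            key = "low"
        elif a > high:
            key = "high"
        else:
            key = "intermediate"
        groups[key].append(abs(a - r))

    summary = {}
    for key, diffs in groups.items():
        if not diffs:
            continue  # skip empty ranges
        within = sum(1 for d in diffs if d <= tol)
        summary[key] = {
            "within": within,
            "beyond": len(diffs) - within,
            "total": len(diffs),
            "median_abs_diff": median(diffs),
            "sd_abs_diff": stdev(diffs) if len(diffs) > 1 else 0.0,
        }
    return summary
```
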
Table 5. Number of CpG % methylation values showing agreement (within 10%), between processed Infinium 450K and RRBS data (Nreads ≥ 4), at different CpG location categories
| CpG location category | Within 10% | Total | % Within 10% | Average methylation value overall | Median methylation value | Standard deviation |
|---|---|---|---|---|---|---|
| Islands | 238,359 | 271,566 | 88% | 16 | 3 | 29 |
| N Shore | 32,959 | 43,640 | 76% | 41 | 28 | 38 |
| S Shore | 30,802 | 40,249 | 77% | 42 | 29 | 38 |
| N Shelf | 8,724 | 12,815 | 68% | 73 | 84 | 28 |
| S Shelf | 7,200 | 10,478 | 69% | 73 | 85 | 28 |
| Open Sea | 53,288 | 75,912 | 70% | 66 | 82 | 33 |
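
As a quick arithmetic check on Table 5, the percentage column can be recomputed from the two count columns, and the category totals can be summed. The sketch below copies the counts from the table; the variable and function names are illustrative only.

```python
# (CpGs within 10%, total CpGs) per location category, copied from Table 5
TABLE5 = {
    "Islands": (238359, 271566),
    "N Shore": (32959, 43640),
    "S Shore": (30802, 40249),
    "N Shelf": (8724, 12815),
    "S Shelf": (7200, 10478),
    "Open Sea": (53288, 75912),
}


def percent_within(table):
    """Percentage of CpGs within 10% per category, rounded to an integer."""
    return {name: round(100 * within / total)
            for name, (within, total) in table.items()}


def grand_total(table):
    """Sum of the per-category totals."""
    return sum(total for _, total in table.values())
```

The rounded percentages reproduce the "% Within 10%" column, and the category totals sum to 454,660, the number of CpGs shared between the two technologies at Nreads ≥ 4.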

Figure 6. Percentage of assays with methylation values significantly associated with the array on which the sample was run, gestational age (GA) or gender, before (blue bars) and after (red bars) data processing, in the set of 72 samples.