CopyNumber450kCancer: baseline correction for accurate copy number calling from the 450k methylation array.

Nour-Al-Dain Marzouka, Jessica Nordlund, Christofer L Bäcklin, Gudmar Lönnerholm, Ann-Christine Syvänen, Jonas Carlsson Almlöf.

Abstract

The Illumina Infinium HumanMethylation450 BeadChip (450k) is widely used for the evaluation of DNA methylation levels in large-scale datasets, particularly in cancer. The 450k design allows copy number variant (CNV) calling using existing bioinformatics tools. However, in cancer samples, numerous large-scale aberrations shift the probe intensities and may thereby result in erroneous CNV calling. A baseline correction process is therefore needed. We propose using the maximum peak of the probe segment density to correct the shift in the intensities in cancer samples.
AVAILABILITY AND IMPLEMENTATION: CopyNumber450kCancer is implemented as an R package. The package with examples can be downloaded at http://cran.r-project.org
CONTACT: nour.marzouka@medsci.uu.se
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2015. Published by Oxford University Press.

Year:  2015        PMID: 26553913      PMCID: PMC4896365          DOI: 10.1093/bioinformatics/btv652

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

DNA methylation is the most studied epigenetic modification in cancer and many other diseases. Many technologies and platforms have been developed to facilitate DNA methylation analysis, such as bisulphite conversion followed by analysis using DNA methylation arrays. The Illumina Infinium HumanMethylation450 BeadChip (450k) detects methylation levels at approximately 485000 CpG loci across the human genome (Sandoval ). The 450k platform is based on a biochemical reaction principle and technology similar to those of the Infinium SNP arrays, making it possible to use the 450k platform to detect copy number variants (CNVs) as a zero-cost byproduct of methylation studies (Feber ). Recently, bioinformatics tools have been developed for copy number (CN) calling from methylation data, e.g. ChAMP (Morris ), CopyNumber450k and conumee (www.bioconductor.org). CN calling consists of the following principal steps: normalization of probe intensities, calculation of the Log R Ratios (LRRs), segmentation, and determination of the copy number status of each segment based on a chosen cutoff level or a P-value threshold. Different algorithms, methods and parameters can be applied in each step.

However, in cancer samples the large and numerous chromosomal duplications and deletions shift the LRRs away from the hypothetical baseline level (i.e. the 2n/diploid level), resulting in erroneous CNV calling. Neither the conventional normalization methods (e.g. Quantile and Functional normalization (Fortin )) nor median/average sample centering can overcome this problem. A similar centering problem was observed in copy number calling from array-based Comparative Genomic Hybridization (aCGH) data for cancer samples, where the density of the probes was suggested as a way to determine the center of the samples and avoid erroneous CNV calls (Lipson ). However, to date there is no tool available to resolve this problem in CN data derived from the 450k array.
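
The centering problem can be illustrated with a toy example. The sketch below is not taken from any of the tools named above; the probe counts, LRR values and the cutoff of 0.1 are invented for illustration. It shows how subtracting the sample median moves the diploid probes of a gain-rich sample below the deletion cutoff.

```r
# Toy probe-level LRRs for one hypothetical tumour sample (illustrative values).
true_lrr <- c(rep(0.00, 400),   # diploid probes, i.e. the true baseline
              rep(0.30, 350),   # probes in single-copy gains
              rep(0.55, 250))   # probes in higher-level gains

# Conventional median centering subtracts the sample median from every probe.
median_centered <- true_lrr - median(true_lrr)

# Assumed symmetric calling cutoff around zero.
cutoff <- 0.1
call_status <- function(x, cutoff) {
  ifelse(x > cutoff, "gain", ifelse(x < -cutoff, "loss", "neutral"))
}

# Calls on the true scale versus calls after median centering.
print(table(call_status(true_lrr, cutoff)))
print(table(call_status(median_centered, cutoff)))
```

Because more than half of the probes lie in gained regions, the median sits at 0.30, so after centering the diploid probes land at -0.30 and are called as losses, while the single-copy gains are called as neutral.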

2 Description

To resolve the cancer-specific problem of erroneous CN calling in data derived from the 450k array, we provide a freely available R package called CopyNumber450kCancer. It runs on all operating systems with R installed (version > 3.0) and provides novel functionality to correct the center in segmentation data obtained from CN calling tools such as CopyNumber450k and ChAMP.

3 Baseline estimation

For the baseline estimation we assumed that the majority of the probes are located close to the correct baseline. Therefore, a density function based on segments (weighted by the number of probes) should show its maximum peak at the correct baseline. For the correction, the sample log values are shifted by an amount equal to the difference between the sample baseline and the maximum peak level (Fig. 1). We call this method Maximum Density Peak Estimation (MDPE). MDPE avoids the challenges and limitations facing CN calling from 450k data, such as the absence of the B-allele frequency that is available from SNP genotyping arrays. CopyNumber450kCancer has a function for manual revision where samples with inaccurate automatic baseline determination can be interactively corrected.
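
The idea behind MDPE can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the package's implementation: the segment values, the Gaussian kernel, the bandwidth of 0.05 and the function name `estimate_baseline` are all hypothetical choices made for the example.

```r
# Hypothetical segments for one sample: mean log value and probe count per segment.
segments <- data.frame(
  mean_lrr = c(0.31, 0.29, 0.30, 0.62, 0.02, 0.33),
  n_probes = c(12000, 9000, 15000, 4000, 3000, 7000)
)

# Kernel density of the segment means, weighted by the number of probes,
# and the log value at which that density is highest.
# The bandwidth (bw) is an assumed value, not one taken from the package.
estimate_baseline <- function(mean_lrr, n_probes, bw = 0.05) {
  w <- n_probes / sum(n_probes)            # weights must sum to 1
  d <- density(mean_lrr, weights = w, bw = bw, n = 2048)
  d$x[which.max(d$y)]
}

baseline <- estimate_baseline(segments$mean_lrr, segments$n_probes)

# Shift all segment values so that the density peak sits at zero.
segments$corrected <- segments$mean_lrr - baseline
print(baseline)
print(segments)
```

In this toy sample most probes sit in segments near 0.30, so the estimated peak falls close to that value and the shift moves those segments to roughly zero.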

Fig. 1. The upper plot shows the CN calling from a cancer sample by CopyNumber450k (first 3 chromosomes shown). The segments above the cutoff represent amplifications, and the segments under the cutoff represent deletions. The horizontal lines represent the cutoffs and the zero level (i.e. baseline). The lower panel shows the segments after the baseline correction by CopyNumber450kCancer.

To assess the accuracy of the correction method, we performed the auto-correction for 764 public acute lymphoblastic leukemia samples (Nordlund ). All the generated plots (Supplementary 1 and 2) were manually reviewed in comparison to the karyotyping data. The auto-correction changed the baseline for 760 samples; the remaining 4 samples (0.5%) already had a correct baseline and were left unchanged. For 740 samples (96.9%) the auto-correction was correct. Manual modification was needed for 20 samples (2.6%). The difference between the corrected and uncorrected CN data was significant in all the cytogenetic subtypes (Supp. 3).
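
As a quick check of how the reported counts relate to each other, the short R snippet below recomputes the percentages from the sample counts given in the text. Only the variable names are introduced here.

```r
# Sample counts as reported in the text.
total     <- 764
unchanged <- 4     # baseline already correct, no shift applied
correct   <- 740   # auto-correction judged correct
manual    <- 20    # manual modification needed

# The three groups account for every sample.
stopifnot(unchanged + correct + manual == total)

# Percentages rounded to one decimal place.
pct <- round(100 * c(unchanged = unchanged, correct = correct, manual = manual) / total, 1)
print(pct)
```

The 760 samples whose baseline was changed are the 740 correctly adjusted samples plus the 20 that needed manual modification.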

4 Input and output

CopyNumber450kCancer uses a simple input data structure that can be generated by a wide range of segmentation tools. The package requires two input files. The first file contains the genomic regions for all samples, with the log value and the number of probes in each segment. The second file contains the sample names. To facilitate the analysis, the tool can directly read the segmentation output file from the CopyNumber450k package. CopyNumber450kCancer generates a corrected segmentation file, corrected plots, a QC file and a baseline shifting file.
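
The two input files could be prepared along the following lines. This is only a sketch: the column names used here are assumptions chosen to match the description above (sample, genomic region, number of probes, log value) and should be checked against the example files shipped with the package before use.

```r
# Hypothetical segmentation table: one row per segment.
# Column names are illustrative assumptions, not a documented schema.
regions <- data.frame(
  Sample         = c("S1", "S1", "S2"),
  Chromosome     = c("chr1", "chr1", "chr1"),
  bp.Start       = c(1, 50000001, 1),
  bp.End         = c(50000000, 120000000, 120000000),
  Num.of.Markers = c(5200, 7100, 12300),
  Mean           = c(0.31, 0.02, -0.28)
)

# Hypothetical sample list: one row per sample name.
sample_list <- data.frame(Sample = c("S1", "S2"))

# Write both tables to temporary CSV files.
regions_file <- tempfile(fileext = ".csv")
samples_file <- tempfile(fileext = ".csv")
write.csv(regions, regions_file, row.names = FALSE)
write.csv(sample_list, samples_file, row.names = FALSE)

# Sanity check: every segment belongs to a listed sample.
print(all(regions$Sample %in% sample_list$Sample))
```

Keeping the sample names in a separate file makes it easy to restrict a run to a subset of the samples present in the segmentation table.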

5 Quality control

There is no well-defined quality control (QC) standard for 450k data segmentation. Therefore, we selected QC standards developed for SNP array data. A QC file is generated with the following SNP QC standards for each sample: InterQuartile Range (IQR), Median Absolute Pairwise Difference (MAPD), number of segments and standard deviation (SD). An additional QC measurement called Maximum Density Peak Sharpness (MDPS) is also reported. The QC values are calculated from the log values of the segments rather than from the individual probes. CopyNumber450kCancer does not provide any fixed QC thresholds, as they may differ from one experiment to another and also depend on the type of analysis. The user can use the QC file to exclude samples with poor QC values. However, we strongly recommend the visual reviewing option in order to recognize low-quality samples.
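
A rough sketch of how such segment-level QC values could be computed in base R is given below. The segment values are invented, and MAPD is taken here as the median absolute difference between adjacent segment values, which is the usual SNP-array definition; the package's exact formulas may differ. MDPS is omitted because its definition is not given in the text.

```r
# Hypothetical segment log values for one sample, in genomic order.
seg_lrr <- c(0.02, -0.01, 0.35, 0.00, -0.40, 0.03, 0.01)

# Segment-level QC summary (assumed definitions, see the note above).
qc <- data.frame(
  IQR        = IQR(seg_lrr),                  # interquartile range
  MAPD       = median(abs(diff(seg_lrr))),    # median absolute adjacent difference
  n_segments = length(seg_lrr),               # number of segments
  SD         = sd(seg_lrr)                    # standard deviation
)
print(qc)
```

Since no fixed thresholds are provided, values like these are most useful for ranking samples within one experiment and flagging outliers for visual review.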

6 Interactive revision

CopyNumber450kCancer provides graphical interactive plots as an option to supervise/review the baseline estimation. The user can select a log ratio interval wherein the baseline should be located. For the reviewing step, we strongly recommend using any external sample information (e.g. karyotyping) that can help the user decide on the correct baseline. The package comes with example files that can be run directly for auto-correction and interactive revision:

    library(CopyNumber450kCancer)
    # Locate the example input files shipped with the package
    regions <- system.file("extdata", "regions.csv", package = "CopyNumber450kCancer")
    sample_list <- system.file("extdata", "sample_list.csv", package = "CopyNumber450kCancer")
    # Read the segments, auto-correct the baseline, then review interactively
    object <- ReadData(regions, sample_list)
    object <- AutoCorrectPeak(object)
    object <- ReviewPlot(object)
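
The effect of restricting the baseline to a user-selected log ratio interval can be sketched without the interactive plot. The code below is an illustration only and does not reproduce the internals of `ReviewPlot`; the helper name `peak_in_interval`, the bandwidth and the segment values are hypothetical.

```r
# Density peak of the probe-weighted segment means, searched only inside
# the interval [lower, upper]. The bandwidth (bw) is an assumed value.
peak_in_interval <- function(mean_lrr, n_probes, lower, upper, bw = 0.05) {
  d <- density(mean_lrr, weights = n_probes / sum(n_probes), bw = bw, n = 2048)
  inside <- d$x >= lower & d$x <= upper
  d$x[inside][which.max(d$y[inside])]
}

# Hypothetical sample in which most probes lie in gained segments.
mean_lrr <- c(0.30, 0.31, 0.00, 0.01, -0.35)
n_probes <- c(12000, 10000, 9000, 8000, 3000)

# Unrestricted search: the peak follows the gained segments.
auto_peak <- peak_in_interval(mean_lrr, n_probes, lower = -Inf, upper = Inf)

# Restricted search: external information (e.g. karyotyping) is assumed to
# indicate that the true baseline lies between -0.1 and 0.1.
reviewed_peak <- peak_in_interval(mean_lrr, n_probes, lower = -0.1, upper = 0.1)

print(c(auto = auto_peak, reviewed = reviewed_peak))
```

In this toy case the unrestricted peak lies near 0.30, whereas the restricted search returns a baseline close to zero, which is the kind of adjustment a reviewer would make when the karyotype contradicts the automatic estimate.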

7 Conclusion

CN calling from 450k data is possible, but faces difficulties in some cancer samples due to incorrect sample centering and baseline shifting. Without solving this issue, the CN calling will be inaccurate. We successfully tested the MDPE method on 450k cancer segmentation data. The CopyNumber450kCancer package implements the MDPE method together with interactive reviewing to efficiently correct the baseline in cancer samples. The main advantages of CopyNumber450kCancer are: fast auto-correction (a few seconds per sample), a high accuracy rate, in-sample correction, no need for input parameters, low computing resource requirements and adaptability to technologies similar to the 450k array.

References

1.  ChAMP: 450k Chip Analysis Methylation Pipeline.

Authors:  Tiffany J Morris; Lee M Butcher; Andrew Feber; Andrew E Teschendorff; Ankur R Chakravarthy; Tomasz K Wojdacz; Stephan Beck
Journal:  Bioinformatics       Date:  2013-12-12       Impact factor: 6.937

2.  Validation of a DNA methylation microarray for 450,000 CpG sites in the human genome.

Authors:  Juan Sandoval; Holger Heyn; Sebastian Moran; Jordi Serra-Musach; Miguel A Pujana; Marina Bibikova; Manel Esteller
Journal:  Epigenetics       Date:  2011-06-01       Impact factor: 4.528

3.  Functional normalization of 450k methylation array data improves replication in large cancer studies.

Authors:  Jean-Philippe Fortin; Aurélie Labbe; Mathieu Lemire; Brent W Zanke; Thomas J Hudson; Elana J Fertig; Celia Mt Greenwood; Kasper D Hansen
Journal:  Genome Biol       Date:  2014-12-03       Impact factor: 13.583

4.  Using high-density DNA methylation arrays to profile copy number alterations.

Authors:  Andrew Feber; Paul Guilhamon; Matthias Lechner; Tim Fenton; Gareth A Wilson; Christina Thirlwell; Tiffany J Morris; Adrienne M Flanagan; Andrew E Teschendorff; John D Kelly; Stephan Beck
Journal:  Genome Biol       Date:  2014-02-03       Impact factor: 13.583

5.  Genome-wide signatures of differential DNA methylation in pediatric acute lymphoblastic leukemia.

Authors:  Jessica Nordlund; Christofer L Bäcklin; Per Wahlberg; Stephan Busche; Eva C Berglund; Maija-Leena Eloranta; Trond Flaegstad; Erik Forestier; Britt-Marie Frost; Arja Harila-Saari; Mats Heyman; Olafur G Jónsson; Rolf Larsson; Josefine Palle; Lars Rönnblom; Kjeld Schmiegelow; Daniel Sinnett; Stefan Söderhäll; Tomi Pastinen; Mats G Gustafsson; Gudmar Lönnerholm; Ann-Christine Syvänen
Journal:  Genome Biol       Date:  2013-09-24       Impact factor: 13.583
