Michael P Mariani, Jennifer A Chen, Ze Zhang, Steven C Pike, Lucas A Salas.
Abstract
DNA methylation-based copy number variation (CNV) calling software offers the advantage of providing both genetic (copy-number) and epigenetic (methylation) state information from a single genomic library. This approach is particularly useful for detecting large-scale chromosomal rearrangements such as the loss of the short arm of chromosome 3 (3p) in renal cell carcinoma and the codeletion of the short arm of chromosome 1 and the long arm of chromosome 19 (1p/19q) commonly seen in histologically defined oligodendrogliomas. Herein, we present MethylMasteR: a software framework that facilitates the standardization and customization of methylation-based CNV calling algorithms in a single R package deployed using the Docker software framework. This framework allows easy comparison of the performance and the large-scale CNV event identification capability of four common methylation-based CNV callers. Additionally, we incorporated our own custom routine, which was among the best-performing routines. We used the Affymetrix 6.0 SNP Chip results as the gold standard against which to measure large-scale event recall. Because the calling algorithms themselves differ, no single software package is likely to perform best for all samples and all combinations of parameters. Packaging the framework as a Docker image and deploying it as a Docker container gives researchers a standardized environment in which to compare algorithms efficiently, and it lends itself to the development of modified workflows such as the custom workflow we have developed. Researchers can now use the MethylMasteR software for their methylation-based CNV calling needs and follow our software deployment framework. We will continue to refine our methodology, with a specific focus on identifying large-scale chromosomal rearrangements in cancer methylation data.
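To make "large-scale event" concrete, the sketch below flags an arm-level loss when loss segments cover at least a given fraction of a chromosome arm, and calls a codeletion such as 1p/19q when every listed arm is lost. This is an illustration under stated assumptions, not MethylMasteR's logic: the `Segment` layout, the -0.3 log2-ratio loss cutoff, the 0.5 coverage threshold and the toy arm coordinates are all hypothetical, and segments are assumed not to overlap within a sample.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    chrom: str
    start: int         # 1-based, inclusive
    end: int           # inclusive
    log2_ratio: float


def arm_fraction_lost(segments, chrom, arm_start, arm_end, loss_cutoff=-0.3):
    """Fraction of bases in [arm_start, arm_end] covered by segments whose
    log2 ratio is at or below loss_cutoff (hypothetical threshold).
    Assumes segments do not overlap one another."""
    arm_len = arm_end - arm_start + 1
    covered = 0
    for seg in segments:
        if seg.chrom != chrom or seg.log2_ratio > loss_cutoff:
            continue
        lo = max(seg.start, arm_start)
        hi = min(seg.end, arm_end)
        if hi >= lo:
            covered += hi - lo + 1
    return covered / arm_len


def has_codeletion(segments, arms, min_fraction=0.5):
    """True if every arm in `arms` (name -> (chrom, start, end)) is lost
    over at least min_fraction of its length."""
    return all(
        arm_fraction_lost(segments, chrom, s, e) >= min_fraction
        for chrom, s, e in arms.values()
    )


# Toy arm boundaries for illustration only (not real genome coordinates).
arms = {"1p": ("chr1", 1, 100), "19q": ("chr19", 201, 300)}
segs = [
    Segment("chr1", 1, 80, -0.6),
    Segment("chr1", 81, 200, 0.0),
    Segment("chr19", 1, 200, 0.05),
    Segment("chr19", 201, 300, -0.5),
]
print(has_codeletion(segs, arms))  # True
```

A coverage fraction rather than a single segment test is used because segmenters often split one arm-level loss into several adjacent segments.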
Keywords: DNA methylation; clear cell renal cell carcinoma; copy number variation; epigenetics; genomics; kidney cancer; methylmaster; multiomics
Year: 2022 PMID: 35573871 PMCID: PMC9098103 DOI: 10.3389/fbinf.2022.859828
Source DB: PubMed Journal: Front Bioinform ISSN: 2673-7647
FIGURE 1 |MethylMasteR workflow overview. The MethylMasteR software package combines common methylation CNV callers that take a common Sample Sheet and raw IDAT files as input, and it is implemented as a Docker image that can be run as a Docker container on any operating system with Docker installed. The individual algorithms are then run, and their CNV segments are converted into a common data.frame format, which facilitates visualization and comparison across algorithms. In addition to generating CNV heatmaps and tables, the framework records runtime and memory usage for comparison across algorithms.
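The two bookkeeping steps in the caption, normalizing each caller's segments into one shared layout and recording time and memory per run, can be sketched in Python as follows. MethylMasteR itself is an R package, so this is only an analogue: the column set in `COMMON_COLUMNS`, the `rename` mapping and the `fake_caller` stub are hypothetical, and `tracemalloc` measures Python-level allocations only, not total process RAM, so it is not necessarily how MethylMasteR measures peak memory.

```python
import time
import tracemalloc
from typing import Callable, Dict, List

# Hypothetical common column set shared by segments from every caller.
COMMON_COLUMNS = ("sample", "chrom", "start", "end", "value", "caller")


def to_common(rows: List[dict], caller: str, rename: Dict[str, str]) -> List[dict]:
    """Rename caller-specific keys to the common schema and tag each row
    with the caller that produced it."""
    out = []
    for row in rows:
        rec = {rename.get(k, k): v for k, v in row.items()}
        rec["caller"] = caller
        missing = set(COMMON_COLUMNS) - rec.keys()
        if missing:
            raise ValueError(f"{caller}: missing columns {sorted(missing)}")
        out.append({c: rec[c] for c in COMMON_COLUMNS})
    return out


def run_benchmarked(fn: Callable[[], List[dict]]):
    """Run a caller and return (result, wall-clock seconds, peak traced bytes).
    tracemalloc only sees Python allocations, not the whole process."""
    tracemalloc.start()
    t0 = time.perf_counter()
    try:
        result = fn()
    finally:
        elapsed = time.perf_counter() - t0
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak


def fake_caller():
    # Stand-in for a real CNV caller that uses its own column names.
    return [{"ID": "S1", "chrom": "chr3", "loc.start": 1, "loc.end": 90, "seg.mean": -0.4}]


rename = {"ID": "sample", "loc.start": "start", "loc.end": "end", "seg.mean": "value"}
raw, secs, peak = run_benchmarked(fake_caller)
segs = to_common(raw, "toy_caller", rename)
print(segs[0]["value"], segs[0]["caller"])  # -0.4 toy_caller
```

Failing loudly on missing columns keeps a misconfigured caller from silently dropping out of the cross-algorithm comparison.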
FIGURE 2 |Autocorrection as part of the custom workflow. The MethylMasteR software framework enables easy deployment of custom workflows, such as the one shown here, which combines results from the SeSAMe and CnAnalysis450kCancer workflows. (A) An uncorrected CopyNumber450kCancer-style plot. (B) The SeSAMe CNV segmentation values after correction with the AutoCorrectPeak() function from the CopyNumber450kCancer R package. Abbreviations: L-value, the LRR value or log2 ratio of tumor methyla…
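The general idea behind this kind of baseline correction can be illustrated by assuming that the most common segment value across the genome is copy-neutral and shifting all values so that this main peak sits at zero. The sketch below is a simplified stand-in written under that assumption; it is not the AutoCorrectPeak() algorithm from CopyNumber450kCancer, and the `recenter_on_main_peak` name and the 0.05 bin width are hypothetical choices.

```python
from collections import defaultdict
from typing import List, Tuple


def recenter_on_main_peak(segments: List[Tuple[int, int, float]],
                          bin_width: float = 0.05) -> List[Tuple[int, int, float]]:
    """Shift segment values so the length-weighted modal bin lies at 0.

    segments: (start, end, value) tuples with inclusive ends.
    Assumes the most common value in the genome is copy-neutral.
    """
    weight = defaultdict(int)
    for start, end, value in segments:
        # Each segment votes for its value bin in proportion to its length.
        weight[round(value / bin_width)] += end - start + 1
    peak_bin = max(weight, key=weight.get)
    offset = peak_bin * bin_width
    return [(s, e, v - offset) for s, e, v in segments]


segs = [(1, 1000, 0.2), (1001, 1500, -0.3), (1501, 3000, 0.21)]
out = recenter_on_main_peak(segs)
print([round(v, 3) for _, _, v in out])  # [0.0, -0.5, 0.01]
```

Weighting by segment length keeps many short, noisy segments from outvoting the long copy-neutral stretches that define the baseline.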
FIGURE 3 |Comparisons of CNV calling routines. Results are shown for the four common software routines incorporated into the MethylMasteR framework and for the MethylMasteR custom analysis routine. (A-B) ChAMP required an extended runtime and substantial peak memory to complete, whereas HM450 was fast but memory intensive. SeSAMe and the custom routine were optimal for speed and memory usage, with Epicopy a close second. (C-D) With the Affymetrix SNP6 legacy data as the gold standard, Epicopy was the least accurate overall, with ChAMP having slightly better recall for losses in kidney cancer but better recall for gains in oligodendroglioma. HM450 and our custom routine were the most accurate overall; HM450 required less runtime but used the most peak RAM of all the routines (with the exception of the runtime results for the copy-number-neutral astrocyte-like oligodendroglioma samples; see Supplementary Table S1.1). SeSAMe identified losses less accurately than our custom routine and had similar runtime performance. This routine was also on par with ChAMP and Epicopy in terms of recall for the oligodendroglioma samples. In general, the recall results for the oligodendroglioma samples (Figure 3D) resemble those for the renal cell carcinoma samples (Figure 3C).
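Panels C-D report recall of large-scale events against the SNP6 gold standard. Assuming each event is represented as a (sample, chromosome arm, direction) tuple, recall is the fraction of gold-standard events of a given direction that a caller also reports, i.e. TP / (TP + FN). The tuple layout, caller names and toy events below are hypothetical, and the exact overlap criterion used in the paper is not given in this excerpt.

```python
from typing import Dict, Iterable, Set, Tuple

Event = Tuple[str, str, str]  # (sample, chromosome arm, "gain" or "loss")


def recall(called: Iterable[Event], gold: Iterable[Event], direction: str) -> float:
    """Recall = TP / (TP + FN), restricted to gold events of one direction."""
    gold_dir = {e for e in gold if e[2] == direction}
    if not gold_dir:
        return float("nan")  # recall is undefined with no gold events
    hits = gold_dir & set(called)
    return len(hits) / len(gold_dir)


def recall_table(calls_by_caller: Dict[str, Set[Event]], gold: Set[Event]):
    """Per-caller recall, computed separately for losses and gains."""
    return {
        name: {d: recall(calls, gold, d) for d in ("loss", "gain")}
        for name, calls in calls_by_caller.items()
    }


# Toy events for illustration only.
gold = {("S1", "3p", "loss"), ("S2", "3p", "loss"), ("S2", "5q", "gain")}
calls = {
    "caller_A": {("S1", "3p", "loss"), ("S2", "5q", "gain")},
    "caller_B": {("S1", "3p", "loss"), ("S2", "3p", "loss")},
}
table = recall_table(calls, gold)
print(table["caller_A"])  # {'loss': 0.5, 'gain': 1.0}
```

Recall alone does not penalize spurious calls, so a caller that reports every arm as altered would score perfectly; pairing it with precision guards against that.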