Madhu Sharma, Rohit Kumar Verma, Sunil Kumar, Vibhor Kumar.
Abstract
Cell-free DNA (cfDNA) methylation profiling is considered promising and potentially reliable for liquid biopsy, both to study disease progression and to develop consistent diagnostic and prognostic biomarkers. Several different mechanisms are responsible for the release of cfDNA into blood plasma, and hence it can provide information about dynamic changes in the human body. The fragmented nature and low concentration of cfDNA, together with high background noise, pose several challenges for its routine use in cancer diagnosis. These challenges in analysing the cfDNA methylation profile are further aggravated by heterogeneity, biomarker sensitivity, platform biases, and batch effects. This review delineates the origin of cfDNA methylation, its profiling, and the associated computational problems in analysis for diagnosis. We also consider the multi-marker approach for handling cancer heterogeneity and explore the utility of markers for 5hmC-based cfDNA methylation patterns. Further, we provide a critical overview of deconvolution and machine learning methods for cfDNA methylation analysis. Our review of current methods reveals the potential for further improvement in analysis strategies for detecting early cancer using cfDNA methylation.
Keywords: Cancer heterogeneity; Cell free DNA; Computation; DMP, Differentially methylated base position; DMR, Differentially methylated regions; Diagnosis; HELP-seq, HpaII-tiny fragment Enrichment by Ligation-mediated PCR sequencing; MBD-seq, Methyl-CpG Binding Domain Protein Capture Sequencing; MCTA-seq, Methylated CpG tandems amplification and sequencing; MSCC, Methylation Sensitive Cut Counting; MSRE, methylation sensitive restriction enzymes; MeDIP-seq, Methylated DNA Immunoprecipitation Sequencing; RRBS, Reduced-Representation Bisulfite Sequencing; WGBS, Whole Genome Bisulfite Sequencing; cfDNA, cell free DNA; ctDNA, circulating tumor DNA; dPCR, digital polymerase chain reaction; ddMCP, droplet digital methylation-specific PCR; ddPCR, droplet digital polymerase chain reaction; scCGI, methylated CGIs at single cell level
Year: 2021 PMID: 34976309 PMCID: PMC8669313 DOI: 10.1016/j.csbj.2021.12.001
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1 An overview of techniques for profiling DNA methylation that are also useful for detecting cfDNA. The triangular and circular symbols give further details of the different methods. Abbreviations: HELP, HpaII-tiny fragment enrichment by ligation-mediated PCR; CHARM, comprehensive high-throughput arrays for relative methylation; cfNOMe, cell-free DNA-based Nucleosome Occupancy and Methylation profiling; MSCC, methyl-sensitive cut counting; qPCR, quantitative polymerase chain reaction; TAPS, TET-assisted pyridine borane sequencing; MRE-Seq, methylation restriction enzyme sequencing; RSMA, methylation-sensitive restriction enzyme-based assay; DMH, differential methylation hybridization; ddPCR, droplet digital PCR; EM-Seq, Enzymatic Methyl-seq; MeDip, methylation DNA immunoprecipitation sequencing; MIRA, methylated CpG island recovery assay; mDIP, methylated DNA immunoprecipitation; oxBS-seq, oxidative bisulphite sequencing; WGBS, whole-genome bisulphite sequencing; RRBS, reduced representation bisulphite sequencing; BC-Seq, bisulphite conversion followed by capture and sequencing; BiMP, bisulphite methylation profiling; BSPP, bisulphite padlock probe; TAB-seq, TET-assisted bisulphite sequencing.
Read alignment and data visualization tools.
| No. | Tool | Advantages | Limitations |
| --- | --- | --- | --- |
| 1 | BatMeth2 | Indel-sensitive mapping | Removes some parts of reads (soft-clipping) |
| 2 | BSMAP | Good performance and flexibility due to seeding and hashing | Can only detect indels shorter than 3 nucleotides |
| 3 | Bismark | Flexible, easy to use and interpret | Increased run time |
| 4 | BS-Seeker2 | Supports both local and gapped alignments | Local alignment leads to longer CPU times |
| 5 | BWA-meth | Directly usable output, lower storage requirements | Does not facilitate data visualization, only supports 3-letter alignment mode |
| 6 | BSmooth | Ability to handle low-coverage experimental data | Assumes methylation profiles to be smooth, not able to detect single CpG sites |
| 7 | MethylCoder | Allows fast and sensitive mapping in both color and nucleotide space | Uses only short-read aligners |
| 8 | Segemehl | Efficiently handles 3’ and 5’ contaminants along with mismatches and indels | Large memory requirements |
| 9 | GSNAP | SNP-tolerant alignment, splicing and multiple mismatches can be detected | Might be slow for long positions |
| 10 | BRAT-BW | Runs faster on longer reads | Allows at most one mismatch in user-defined reads |
| 11 | ERNE-BS5 | Analysis of methylation patterns at repeats, skillfully handles multiple-mapping reads | Chances of false positives are higher |
| 12 | GEM3 | Exhaustive search model, fast, scalable, and gapped matches can also be found | Some pruning methods are sensitive to mismatches |
| 13 | Last | High sensitivity and speed | Requires removal of poor-quality bases |
| 14 | Msuite | Supports bisulfite-free techniques and 4-letter alignment mode, computationally less expensive | Analysis of irregular CpG sites needs additional validation |
| 15 | TAMeBS | Filters ambiguous read alignments and reduces bias in the context of methylated cytosines | Memory requirements and running time are high |
DNA methylation calling software.
| Profiling approach | Tool | Advantages | Limitations | Statistical model or test |
| --- | --- | --- | --- | --- |
| MeDIP-seq | Batman | High-resolution and cost-effective whole-genome methylome can be obtained | Time-consuming to run even with multiple processors | Bayesian model |
| MeDIP-seq | MEDME | Provides both relative and absolute methylation levels, can also be used for microarray designs of different platforms | Poor resolution in comparison to bisulfite-based methods | Logistic model |
| MeDIP-seq | MEDIPS | More user-friendly, cost- and time-effective | Difficult to detect methylation based on single-end short reads | T-test, Wilcoxon test |
| MeDIP-seq | MeDUSA | Complete analysis of MeDIP-seq data from quality control to DMR calling | Approach employed is less efficient in terms of time and computation | Fisher’s exact test |
| MBD-seq | MethylAction | Applicable to larger study designs (four-group comparisons), detects DMRs through bootstrapping | Chances of type I error | Negative binomial and ANODEV (Analysis of Deviance) |
| Bisulfite-based | RnBeads | High computational efficiency and cross-platform analysis | Limited genome annotation packages | Bayes framework and Bartlett test |
| Bisulfite-based | DMRcate | Easy integration with other Bioconductor tools, de novo based method | Makes use of the 450k array only | F statistics |
| Bisulfite-based | DMRcaller | Detects DMRs in both CpG and non-CpG contexts | Sensitivity and specificity depend on window sizes, based on assumptions | Fisher’s exact test, Z test, Beta regression |
| Bisulfite-based | methylKit | Includes clustering functions along with DMR visualisation | Limited by the memory of the computer | Logistic regression and Fisher’s exact test |
| Bisulfite-based | MethylSig | Incorporates local information for estimating biological variation | Difficulty in handling heterogeneous data | Beta-binomial model |
| Bisulfite-based | DSS | Capacity to handle multifactorial experiments and data without biological replicates | Not suitable for paired designs and longitudinal data | Beta-binomial distribution |
| MRE-seq | msgbsR | Removes incorrectly mapped reads, explores differential methylation | Requires pre-processed raw data | Negative binomial model |
| 5-hydroxymethylation | BiQ HiMod | User-friendly GUI, locus-based methylation analysis and comprehensive analysis pipeline | Pre-processed FASTA files are needed | Multiple statistical models |
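Several tools in the table rely on Fisher's exact test to compare methylated and unmethylated read counts between two groups. The following is a minimal standard-library sketch of a two-sided test on a single 2x2 count table. It is an illustration only, not the implementation used by any listed tool; the function name and the example counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. case and control); columns are methylated and
    unmethylated read counts at one CpG site or region.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def pmf(x):
        # Hypergeometric probability of x methylated reads in group 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = pmf(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one
    # (the small relative tolerance guards against floating-point ties)
    p = sum(pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

# Hypothetical counts: 8 methylated / 2 unmethylated reads in cases,
# 1 methylated / 5 unmethylated reads in controls
p_value = fisher_exact_two_sided(8, 2, 1, 5)
```

In practice such per-site p-values would need multiple-testing correction across all tested CpG sites, which this sketch leaves out.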
Fig. 2 Performance of cfDNA methylation-based markers on TCGA data. The Illumina 450k methylation data set for bulk breast cancer tissue was retrieved from The Cancer Genome Atlas (TCGA) and processed for manually curated, literature-based markers. (a) Box plot of false positive rate (FPR) versus sensitivity, showing the performance of single markers for sample class prediction (breast cancer vs. normal). Sensitivity and FPR values were obtained by fitting linear discriminant analysis (LDA) to the TCGA samples using one marker. The plot shows that a single-marker approach to disease detection yields low sensitivity. (b) Heatmap showing heterogeneity among biomarkers for the same cancer type. Marker-based normalised beta scores for all TCGA samples were visualised as a heatmap to compare cancer and non-cancer samples. This level of heterogeneity among biomarkers can be one of the factors influencing disease diagnostics and therapeutics.
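The single-marker evaluation in panel (a) can be illustrated with a small sketch. With one feature and two classes, LDA reduces to a linear threshold on the marker value under a shared-variance assumption. The code below is a standard-library illustration of that idea; the function names and the beta values are hypothetical and are not TCGA data, and evaluating on the training samples is an assumption made here for brevity.

```python
from math import log
from statistics import mean

def fit_lda_1d(cancer, normal):
    """Fit two-class LDA on one marker; return (weight, bias).

    A sample x is called 'cancer' when weight * x + bias > 0.
    Assumes the pooled variance is non-zero.
    """
    m1, m0 = mean(cancer), mean(normal)
    n1, n0 = len(cancer), len(normal)
    # Pooled within-class variance (the shared-covariance LDA assumption)
    ss = sum((x - m1) ** 2 for x in cancer) + sum((x - m0) ** 2 for x in normal)
    var = ss / (n1 + n0 - 2)
    weight = (m1 - m0) / var
    # Class priors are estimated from the sample sizes
    bias = -0.5 * (m1 ** 2 - m0 ** 2) / var + log(n1 / n0)
    return weight, bias

def sensitivity_fpr(cancer, normal, weight, bias):
    """Return (sensitivity, false positive rate) for the fitted rule."""
    tp = sum(weight * x + bias > 0 for x in cancer)
    fp = sum(weight * x + bias > 0 for x in normal)
    return tp / len(cancer), fp / len(normal)

# Hypothetical beta values for one marker (illustrative only)
cancer_beta = [0.82, 0.75, 0.30, 0.68, 0.25, 0.90]
normal_beta = [0.20, 0.15, 0.28, 0.22, 0.35, 0.18]
w, b = fit_lda_1d(cancer_beta, normal_beta)
sens, fpr = sensitivity_fpr(cancer_beta, normal_beta, w, b)
```

Repeating this for each marker gives one (FPR, sensitivity) pair per marker, which is the kind of distribution a box plot summarises. In this toy data, two cancer samples with low beta values fall below the threshold, mimicking the heterogeneity that limits single-marker sensitivity.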
Fig. 3 Low-dimensional embedding of the 5hmC profiles of cell-free DNA from patients with different types of cancer. The 2D embedding (using tSNE) of the 5hmC profiles is shown for samples using read counts on either genes or CpG islands. Embeddings are shown for read counts on all genes or on only 50 selected genes, and likewise for all CpG islands or only 50 selected CpG islands. The 5hmC profiles used here were published by Song et al. 2017. Using a large number of genomic loci (all genes or all CpG islands) can provide good separability among samples according to cancer type. With the 5hmC profiles, the top 50 marker CpG islands provide slightly better separability among pathological conditions than the top 50 genes. The purity of clustering after embedding with the top 50 markers (genes or CpG islands) is also shown in terms of normalized mutual information (NMI).
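As an illustration of the clustering-purity measure mentioned in the caption, the sketch below computes NMI between known sample labels and cluster assignments using only the standard library. The embedding and clustering steps are omitted, and the arithmetic-mean normalisation is an assumption, since the caption does not state which NMI variant was used. The function names and labels are hypothetical.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def normalized_mutual_info(true_labels, cluster_labels):
    """NMI = I(U;V) / ((H(U) + H(V)) / 2), with arithmetic-mean normalisation."""
    n = len(true_labels)
    joint = Counter(zip(true_labels, cluster_labels))
    count_u = Counter(true_labels)
    count_v = Counter(cluster_labels)
    # Mutual information from the contingency counts
    mi = sum(
        (c / n) * log(c * n / (count_u[u] * count_v[v]))
        for (u, v), c in joint.items()
    )
    denom = (entropy(true_labels) + entropy(cluster_labels)) / 2
    # Convention assumed here: two trivial single-group labelings agree perfectly
    return mi / denom if denom > 0 else 1.0

# Hypothetical cancer-type labels and cluster assignments
cancer_type = ["lung", "lung", "liver", "liver", "colon", "colon"]
clusters = [0, 0, 1, 1, 2, 1]
nmi = normalized_mutual_info(cancer_type, clusters)
```

An NMI of 1 means the clusters reproduce the cancer types exactly, and 0 means the cluster assignment carries no information about cancer type.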
Fig. 4 Application of different deconvolution techniques to the DNA methylation profiles of cancer and normal samples. Reference-free deconvolution methods such as RefFreeEWAS, ReFACTor and SVA were applied to DNA methylation profiles, and the results were projected as tSNE coordinates to analyze sample separability. (a) DNA methylation profiles of solid prostate cancer tissue available in the TCGA portal were used. (b) Deconvolution methods were applied to DNA methylation profiles of cell-free DNA (cfDNA) extracted from the plasma of individuals with normal and prostate cancer pathotypes from CFEA. The comparative analysis is based on 100 randomly selected CpG sites of the samples.