Workflow to Mine Frequent DNA Co-methylation Clusters in DNA Methylome Data.

Jie Zhang, Kun Huang.

Abstract

Advances in high-throughput nucleotide sequencing technology have revolutionized biomedical research. Vast amounts of genomic data accumulate rapidly on a daily basis, which in turn calls for the development of powerful bioinformatics tools and efficient workflows to analyze them. One approach to this "big data" problem is to mine highly correlated clusters/networks of biological molecules, which may provide rich yet hidden information about the underlying functional, regulatory, or structural relationships among genes, proteins, genomic loci, or other types of biological molecules or events. A network mining algorithm, lmQCM, has recently been developed to mine tightly connected correlation clusters (networks) in large biological datasets with large sample sizes, and it guarantees a lower bound on the cluster density. This algorithm has been used on a variety of cancer transcriptomic data to mine gene co-expression networks (GCNs), but it can be applied to any correlation matrix. lmQCM is available through the R package lmQCM as well as the online tool package TSUNAMI ( https://biolearns.medicine.iu.edu ). The purpose of this study is to establish an analytical workflow that applies lmQCM to frequent (consensus) cluster mining in multiple DNA methylation datasets from different cancers and extracts the underlying common co-methylation networks for genes. Specifically, the workflow is applied to analyze DNA methylome data across different cancer types using lmQCM. It mines frequent DNA methylation clusters based on the individual cluster mining results, identifying common as well as distinctive DNA methylation patterns among different cancer types. This workflow has successfully identified frequent GCNs in 33 types of cancers and has thus proven to be a powerful tool for analyzing large biological data. It helps to identify common features as well as distinctions among different diseases, disease subtypes, or different biological processes.
The resulting frequent clusters may provide new insights into the pathway/function networks. In disease studies, the results can point to new directions for biomarker and drug target discovery. The advantages of this workflow include highly efficient processing of large biological data generated from high-throughput experiments, quick identification of highly correlated interaction networks, substantial reduction of the data dimensionality to a manageable number of variables for downstream comparative analysis, and consequently increased statistical power for detecting differences between conditions.
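
To make the idea of frequent (consensus) cluster mining concrete, the following is a minimal Python sketch of one possible way to combine per-dataset clustering results. It is an illustration under stated assumptions, not the workflow described in the chapter: the abstract does not specify the frequency criterion, so the pair-voting rule, the function name `frequent_clusters`, the `min_support` and `min_size` parameters, and the toy gene labels are all hypothetical. The per-dataset clusters are assumed to have been produced beforehand (for example by lmQCM), and that step is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def frequent_clusters(clusterings, min_support, min_size=2):
    """Group genes that co-cluster in at least `min_support` datasets.

    clusterings: list (one entry per dataset) of lists of gene sets.
    This pair-voting rule is an assumption made for illustration only.
    """
    # Count in how many datasets each gene pair lands in the same cluster.
    pair_support = Counter()
    for clusters in clusterings:
        pairs_here = set()
        for cluster in clusters:
            pairs_here.update(combinations(sorted(cluster), 2))
        pair_support.update(pairs_here)

    # Keep only pairs that recur often enough, as an undirected graph.
    neighbors = {}
    for (a, b), support in pair_support.items():
        if support >= min_support:
            neighbors.setdefault(a, set()).add(b)
            neighbors.setdefault(b, set()).add(a)

    # Connected components of that graph serve as the consensus clusters.
    seen, result = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(neighbors[gene] - component)
        seen |= component
        if len(component) >= min_size:
            result.append(sorted(component))
    return result


# Toy per-cancer clustering results (hypothetical gene labels, not real data).
toy = [
    [{"G1", "G2", "G3"}, {"G4", "G5"}],
    [{"G1", "G2"}, {"G3", "G4", "G5"}],
    [{"G1", "G2", "G3"}, {"G5", "G6"}],
]
consensus = frequent_clusters(toy, min_support=2)
print(consensus)
```

Raising `min_support` keeps only patterns shared by more datasets, while clusters that fall below the threshold remain candidates for cancer-type-specific patterns. Connected components are a deliberately loose grouping rule; a stricter criterion could be substituted without changing the overall shape of the sketch.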
© 2022. Springer Science+Business Media, LLC, part of Springer Nature.

Keywords:  Cluster mining; DNA co-methylation; Epigenetics; Frequent network mining; Pan-cancer methylation; lmQCM

Year:  2022        PMID: 35505214     DOI: 10.1007/978-1-0716-1994-0_12

Source DB:  PubMed          Journal:  Methods Mol Biol        ISSN: 1064-3745


