Joshua J Levy, Alexander J Titus, Curtis L Petersen, Youdinghuan Chen, Lucas A Salas, Brock C Christensen.
Abstract
BACKGROUND: DNA methylation (DNAm) is an epigenetic regulator of gene expression programs that can be altered by environmental exposures, aging, and pathogenesis. Traditional analyses that associate DNAm alterations with phenotypes suffer from multiple hypothesis testing and multi-collinearity due to the high-dimensional, continuous, interacting, and non-linear nature of the data. Deep learning analyses have shown much promise for studying disease heterogeneity. However, DNAm deep learning approaches have not yet been formalized into user-friendly frameworks for executing, training, and interpreting models. Here, we describe MethylNet, a DNAm deep learning method that can construct embeddings, make predictions, generate new data, and uncover unknown heterogeneity with minimal user supervision.
Keywords: DNA methylation; Deep learning; Embedding; High performance computing; Supervised; Transfer learning; Unsupervised; Workflow automation
Year: 2020 PMID: 32183722 PMCID: PMC7076991 DOI: 10.1186/s12859-020-3443-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Step-by-step description of the modular framework: (a) Train feature extraction network using variational auto-encoders; (b) Fine-tune encoder for prediction tasks; (c) Perform hyperparameter scans for (a) and (b); (d) Identify contributing CpGs; (e) Interpret the CpGs
Fig. 2 Age results on test set (n = 144): (a) Age predictions derived using the Horvath, Hannum, and MethylNet estimators compared to the true age of each individual; predicted ages are plotted on the x-axis and actual ages on the y-axis, and a line was fit to the data for each estimator; (b) Comparison of MethylNet age estimates on the test set (n = 144) to the Horvath and Hannum age estimators. 95% confidence intervals for each score were calculated using a 1000-sample non-parametric bootstrap; (c) Bar chart depicting the overlap of CpGs important to the MethylNet and Hannum age estimators: for each 10-year age group, the number of Hannum CpGs among the one thousand CpGs with the highest SHAP scores is divided by the total number of Hannum CpGs that passed QC; (d) Hierarchical clustering using the correlation distance between SHAP CpG scores for age groups across all CpGs. Similar age groups are linked together
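The overlap measure in panel c can be illustrated with a minimal sketch. It assumes SHAP scores are available as a mapping from CpG identifier to score for one age group, and that CpGs are ranked by raw score; the function name, the toy identifiers, and the toy scores are hypothetical and do not come from MethylNet.

```python
def top_k_overlap(shap_scores, reference_cpgs, k=1000):
    """Fraction of reference CpGs found among the k highest-scoring CpGs."""
    # Rank CpG identifiers by SHAP score, highest first, and keep the top k.
    ranked = sorted(shap_scores, key=shap_scores.get, reverse=True)
    top_k = set(ranked[:k])
    reference = set(reference_cpgs)
    if not reference:
        raise ValueError("reference_cpgs must not be empty")
    # Divide the overlap by the size of the reference set (e.g. the
    # Hannum CpGs that passed QC), as described in the caption.
    return len(top_k & reference) / len(reference)


# Toy data: hypothetical CpG identifiers and scores, for illustration only.
toy_scores = {"cg01": 0.9, "cg02": 0.1, "cg03": 0.7, "cg04": 0.3, "cg05": 0.05}
toy_reference = ["cg01", "cg04", "cg05", "cg06"]
overlap = top_k_overlap(toy_scores, toy_reference, k=2)
```

In practice the function would be called once per 10-year age group with k = 1000, giving one bar per group.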
Comparison of MethylNet cell-type deconvolution results to IDOL library EpiDISH methods. 95% confidence intervals were calculated using a 1000-sample non-parametric bootstrap
n = 144, RPC: Robust Partial Correlations
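The 95% confidence intervals mentioned in these captions come from a non-parametric bootstrap. The sketch below shows one common variant, the percentile bootstrap, under stated assumptions: the source does not say which bootstrap variant or which performance metric was used, so the percentile method, the mean absolute error metric, the function names, and the toy values are all illustrative choices.

```python
import random


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for metric(y_true, y_pred)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample sample indices with replacement so that each true value
        # stays paired with its prediction.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    # Read the interval bounds off the sorted bootstrap statistics.
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values, for illustration only.
true_vals = [30.0, 45.0, 52.0, 61.0, 70.0, 38.0, 55.0, 66.0]
pred_vals = [32.0, 44.0, 50.0, 65.0, 68.0, 41.0, 55.0, 60.0]
point = mean_absolute_error(true_vals, pred_vals)
low, high = bootstrap_ci(true_vals, pred_vals, mean_absolute_error)
```

Fixing the seed makes the interval reproducible; any other metric with the same two-argument signature can be passed in place of `mean_absolute_error`.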
Fig. 3 Results on test set (n = 144) for cell-type deconvolution: (a) For each cell type, the cellular proportion predicted using MethylNet (x-axis) was plotted against the cellular proportion predicted using estimateCellCounts2, which has been found to be a highly accurate measure of cellular proportions and thus serves as the ground truth for comparison; a regression line was fit to the data for each cell type: B-cell, CD4T, CD8T, Monocytes (Mono), NK cells, and Neutrophils (Neu); (b) Grouped box plot demonstrating the concordance between the distributions of the MethylNet-estimated proportions of each cell type and the distributions derived using estimateCellCounts2; (c) Hierarchical clustering using the correlation distance between two cell types’ SHAP CpG scores across all CpGs. Cell types of similar lineage are linked together
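The correlation distance used for the clustering in panel c is one minus the Pearson correlation between two SHAP score vectors. The sketch below computes it and finds the closest pair, which is the first merge an agglomerative clustering would make. The helper names and the toy SHAP profiles are hypothetical; a full dendrogram would normally be built with a library routine such as `scipy.cluster.hierarchy.linkage`, and the linkage method used by the authors is not stated here.

```python
import math


def correlation_distance(u, v):
    """1 - Pearson correlation between two equal-length score vectors."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return 1.0 - cov / (norm_u * norm_v)


def closest_pair(profiles):
    """Return the two labels with the smallest correlation distance."""
    labels = sorted(profiles)
    pairs = [
        (correlation_distance(profiles[a], profiles[b]), a, b)
        for i, a in enumerate(labels)
        for b in labels[i + 1:]
    ]
    _, a, b = min(pairs)
    return a, b


# Toy SHAP profiles over five CpGs; values are invented for illustration.
toy_profiles = {
    "CD4T": [0.9, 0.1, 0.4, 0.0, 0.2],
    "CD8T": [0.8, 0.2, 0.5, 0.1, 0.2],
    "Mono": [0.1, 0.9, 0.0, 0.6, 0.3],
}
first_merge = closest_pair(toy_profiles)
```

With these toy profiles the two T-cell types pair first, mirroring the lineage-based linkage the caption describes.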
Fig. 4 Results on test set for pan-cancer subtype predictions: (a) Comparison of MethylNet-derived pan-cancer classification of the test set (n = 1676) to the UMAP+SVM method. 95% confidence intervals for each score were calculated using a 1000-sample non-parametric bootstrap; (b) Hierarchical clustering of average embedding cosine distance between all pairs of cancer subtypes. Cancer subtypes on both axes are colored by cancer superclasses, derived using the hierarchical clustering method. The clustering of similar MethylNet embeddings is concordant with the known biology of tissue and cancer type differences. Cluster 1 contains skin and connective tissue cancers, and bile and liver cancers. Cluster 2 contains all kidney cancers. Cluster 3 contains bladder, uterine, and cervix cancers. Cluster 4 pairs colon and rectal cancers, and contains both adrenal cancers. Cluster 5 links lung adenocarcinoma and mesothelioma, both of which may develop in similar locations. Cluster 6 pairs stomach and esophagus cancers, and pancreas and prostate cancers. Cluster 7 contains brain cancers. Cluster 8 contains thymoma and diffuse large B-cell lymphomas. Although the two lung cancers were not paired together, their embeddings were highly similar. The connectivity between lung squamous cell cancer and its neighboring types prevented the two lung cancers from being grouped together
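The distance matrix behind panel b can be sketched as follows. The sketch assumes one reading of "average embedding cosine distance": embeddings are averaged per subtype and the cosine distance is taken between those averages. The alternative reading (averaging pairwise distances between individual samples) would need a different computation. The function names and the three-dimensional toy embeddings are hypothetical.

```python
import math


def mean_embedding(vectors):
    """Element-wise mean of a list of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(u, v):
    """1 - cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def subtype_distances(embeddings_by_subtype):
    """Cosine distance between the mean embeddings of every subtype pair."""
    centroids = {
        name: mean_embedding(vectors)
        for name, vectors in embeddings_by_subtype.items()
    }
    labels = sorted(centroids)
    return {
        (a, b): cosine_distance(centroids[a], centroids[b])
        for i, a in enumerate(labels)
        for b in labels[i + 1:]
    }


# Toy 3-dimensional embeddings; values are invented for illustration.
toy_embeddings = {
    "COAD": [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0]],
    "READ": [[0.9, 0.0, 0.2], [1.0, 0.1, 0.1]],
    "LGG": [[0.0, 1.0, 0.9], [0.1, 0.9, 1.0]],
}
distances = subtype_distances(toy_embeddings)
```

The resulting pairwise distances are what a hierarchical clustering routine would take as input to derive superclasses such as the colon and rectal pairing.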