Jacob R Leistico, Priyanka Saini, Christopher R Futtner, Miroslav Hejna, Yasuhiro Omura, Pritin N Soni, Poorva Sandlesh, Magdy Milad, Jian-Jun Wei, Serdar Bulun, J Brandon Parker, Grant D Barish, Jun S Song, Debabrata Chakravarti.
Abstract
Understanding the epigenomic evolution and specificity of disease subtypes from complex patient data remains a major biomedical problem. Here we present DeCET (decomposition and classification of epigenomic tensors), an integrative computational approach for simultaneously analyzing hierarchical heterogeneous data, to identify robust epigenomic differences among tissue types, differentiation states, and disease subtypes. Applying DeCET to our own data from 21 patients with benign uterine tumors (leiomyomas) identifies distinct epigenomic features discriminating normal myometrium and leiomyoma subtypes. Leiomyomas possess preponderant alterations in distal enhancers and long-range histone modifications confined to chromatin contact domains that constrain the evolution of pathological epigenomes. Moreover, we demonstrate the power and advantage of DeCET on multiple publicly available epigenomic datasets representing different cancers and cellular states. Epigenomic features extracted by DeCET can thus help improve our understanding of disease states, cellular development, and differentiation, thereby facilitating future therapeutic, diagnostic, and prognostic strategies.
Keywords: HOXA13; cancer; epigenomics; leiomyoma; support tensor machine; tensor decomposition
Year: 2021 PMID: 33789109 PMCID: PMC8111960 DOI: 10.1016/j.celrep.2021.108927
Source DB: PubMed Journal: Cell Rep Impact factor: 9.995
Figure 1. DeCET uncovers epigenetic patterns specific to myometrium and uterine leiomyoma subtypes
(A) Schematic illustration of how the data tensor is decomposed into characteristic modes in each index space (condition (c), patient (pt), assay (a), and genomic location (loc)). The bottom portion shows how the decomposition represents each ChIP-seq profile as a projection onto independent spatial patterns of histone modifications.
(B) The projections of ChIP-seq datasets onto the first 10 HOSVD location vectors.
(C) Unsupervised hierarchical clustering of the 21 patient-matched samples using 8 of the first 10 HOSVD projections for each assay (Figure 1B; STAR Methods). The columns correspond to an assay and location vector index pair. Leiomyoma tissues are labeled by observed mutations (Table S1): MED12 exon 2 mutations (MED12-mut), HMGA2 overexpression (HMGA2 high), biallelic loss of FH (FH low), or unknown if none of the above three were observed.
See also Figures S1–S4 and Tables S1, S2, S3, S4, and S5.
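The decomposition described in panel (A) and the projections in panel (B) can be illustrated with a minimal NumPy sketch of a higher-order SVD (HOSVD). This is not the DeCET implementation: the tensor sizes, the random data, and the helper names `unfold` and `hosvd_factors` are assumptions made only for illustration.

```python
import numpy as np

def unfold(tensor, mode):
    """Matricize: move `mode` to the front and flatten the remaining axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd_factors(tensor):
    """One orthonormal factor matrix per mode (left singular vectors of each unfolding)."""
    return [np.linalg.svd(unfold(tensor, m), full_matrices=False)[0]
            for m in range(tensor.ndim)]

rng = np.random.default_rng(0)
# Hypothetical sizes: 2 conditions, 5 patients, 3 assays, 200 genomic bins.
T = rng.normal(size=(2, 5, 3, 200))

factors = hosvd_factors(T)
loc_vectors = factors[3]  # columns are location vectors, ordered by singular value

# Project every (condition, patient, assay) profile onto the first 10 location vectors.
projections = np.tensordot(T, loc_vectors[:, :10], axes=([3], [0]))
```

Each entry of `projections` is the coefficient of one ChIP-seq-like profile along one location vector, which is the quantity that panels (B) and (C) plot and cluster.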
Figure 2. DeCET yields a robust epigenomic classifier of uterine leiomyoma disease status
(A) UCSC genome browser tracks (http://genome.ucsc.edu) of the H3K27ac ChIP-seq data and the identified differential regions at the HOXA cluster. Significantly differentially expressed genes (STAR Methods) are shown below the ChIP-seq tracks (FC = fold change).
(B) Illustration of projecting the data for an additional patient not used in the HOSVD onto the location vectors (STAR Methods). The projections were ℓ2-normalized across the first 10 location vectors; this normalization was used for all classifiers.
(C) STM classification of the 21 patient-matched samples and 7 additional leiomyoma samples. The x axis shows the signed distance from the decision boundary hyperplane (black line). The y axis shows the first principal component of the data projected onto the decision boundary.
See also Figure S5.
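The ℓ2-normalization in panel (B) and the signed distance on the x axis of panel (C) can be sketched as follows. The sketch treats the classifier as a generic linear decision function; the weights, offset, and sample values are hypothetical and are not taken from the trained STM.

```python
import math

def l2_normalize(projection):
    """Scale a projection vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in projection))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in projection]

def signed_distance(x, w, b):
    """Signed distance of point x from the hyperplane w.x + b = 0."""
    w_norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / w_norm

# Hypothetical projection of one new sample onto three location vectors.
sample = l2_normalize([3.0, 0.0, 4.0])

# Hypothetical decision boundary (made-up weights and offset).
w, b = [1.0, 0.0, 0.0], -0.5
distance = signed_distance(sample, w, b)
```

The sign of `distance` gives the predicted class, and its magnitude indicates how far the sample lies from the boundary.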
Figure 3. Genomic annotation and distribution of epigenetic alterations in leiomyomas
(A) Distribution of the patient mean projection onto the second basis vector of the assay space for the differential genomic bins higher (top) or lower (bottom) in leiomyoma (STAR Methods). The sign of the x axis was flipped in the bottom plot to make the interpretation of the direction consistent (STAR Methods).
(B) Heatmap showing the mean difference in the normalized ChIP-seq signal between leiomyoma and myometrium at the differential regions identified from the fourth HOSVD vector. Within each functional class, the bins (rows) are sorted by the absolute value of the fourth location vector component (the most differential bins being at the top). Each column is scaled by the 90th percentile of its absolute entry values.
(C and D) The spatial distributions of the center of the 2-kb genomic bins identified in (B) as being differential promoters and enhancers, relative to the nearest gene TSS.
See also Figure S6.
Figure 4. Contact domains confine epigenetic alterations in uterine leiomyomas
(A) Ratio of the discrete wavelet transform coefficients of the binarized fourth location vector (top) and patient mean normalized H3K27ac ChIP-seq data extracted at the differential regions identified from this vector (bottom).
(B) UCSC genome browser track of HeLa contact domains (Rao et al., 2014) and the fourth location vector signal in the region chromosome 6 (chr6): 125,088,000–130,138,000.
(C) Pairwise correlation of the fourth location vector signal at binned regions within and flanking contact domains (STAR Methods). The 15 bins are of the same size and ordered by their genomic position.
(D) Same as (C), but for random domain locations obtained by moving each contact domain to a random location along the same chromosome.
(E) Same as (C), but after shuffling the vector components, while fixing the contact domains at the true locations.
(F) Contact domains sorted by the summed fourth location vector signal at significantly altered regions within each domain (STAR Methods). Some top domains are labeled by the gene showing the greatest differential expression within the corresponding domain.
See also Figure S6.
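The two controls in panels (D) and (E) can be sketched in a few lines. This is a simplified illustration under stated assumptions: the coordinates, chromosome length, and function names are hypothetical, and relocated domains are allowed to overlap, which the original analysis may or may not permit.

```python
import random

def relocate_domains(domains, chrom_length, rng):
    """Move each (start, end) domain to a random position on the same
    chromosome while keeping its length (control as in panel D)."""
    relocated = []
    for start, end in domains:
        length = end - start
        new_start = rng.randint(0, chrom_length - length)
        relocated.append((new_start, new_start + length))
    return relocated

def shuffle_signal(signal, rng):
    """Return a shuffled copy of the per-bin signal, leaving the
    domain coordinates untouched (control as in panel E)."""
    shuffled = list(signal)
    rng.shuffle(shuffled)
    return shuffled

rng = random.Random(0)
domains = [(100, 400), (900, 1500)]  # hypothetical domain coordinates
random_domains = relocate_domains(domains, chrom_length=5000, rng=rng)
shuffled = shuffle_signal([0.1, -0.3, 0.7, 0.2], rng)
```

The first control breaks the link between domain boundaries and signal while preserving domain sizes; the second preserves the boundaries but destroys the spatial structure of the signal.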
Figure 5. HOXA13 is elevated in and regulates leiomyoma pathogenesis
(A) (Left) Representative images showing HOXA13 IHC staining in the normal myometrium (Myo) and leiomyoma (Leio) (scale = 50 μm). (Right) Scatter dot plot of H-score measured for nuclear HOXA13 in normal myometrium (Myo; n = 70) and leiomyoma (Leio; n = 57) from HOXA13-stained tissue microarrays. Vertical dashed line shows mean H-score for each condition. p value was from the two-tailed t test.
(B) Relative mRNA levels of MEDAG, PDK4, LIMK1, and SDC1 measured by qRT-PCR in primary leiomyoma cells treated with shControl or shHOXA13. Mean fold-change in shHOXA13 relative to shControl is shown, with error bars representing standard deviation of biological replicates (n = 3). Significance was from the two-tailed t test (***p < 0.001, **p < 0.01).
(C) Hierarchically clustered heatmap of differentially expressed genes (adjusted p value < 0.05) in HOXA13-overexpressing primary myometrial cells from three patients (samples 1, 2, and 3). Gene expression relative to the mean expression in control and HOXA13 construct-containing cells is shown as row Z scores.
(D) Bar plot of GO enrichment scores (Zhou et al., 2019) for differentially expressed genes in HOXA13-overexpressing cells.
See also Figure S7 and Tables S2, S6, and S7.
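The row Z scores used for the heatmap in panel (C) can be computed as in the sketch below. The expression values are invented, and the use of the sample standard deviation is an assumption, since the legend does not state which estimator was used.

```python
import statistics

def row_z_scores(row):
    """Center a gene's expression values on their mean and scale by the
    sample standard deviation (assumed estimator)."""
    mean = statistics.fmean(row)
    sd = statistics.stdev(row)
    if sd == 0:
        return [0.0 for _ in row]  # constant row: no variation to display
    return [(x - mean) / sd for x in row]

# Hypothetical expression of one gene across control and HOXA13 samples.
z = row_z_scores([1.0, 2.0, 3.0])
```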
Figure 6. DeCET reveals epigenetic organization of tissue types and differentiation states in REMC data
(A) Hierarchical clustering of 34 adult human tissues using the projections onto the first 13 HOSVD location vectors. The REMC sample identifiers and associated laboratory are shown on the right.
(B) Hierarchical clustering of 10 adult human muscle tissues using the projections onto the first 10 location vectors.
(C) Hierarchical clustering of 8 T cell samples representing three differentiation states using the projections onto the first 9 location vectors.
See also Figure S8 and Table S8.
Figure 7. DeCET provides epigenetic stratification of breast and prostate cancer subtypes
(A) Hierarchical clustering of 13 breast cancer cell lines based on three histone modifications (STAR Methods). The 6 location vectors with greatest variance in the projections across samples were used for clustering (STAR Methods).
(B) The expression pattern of known marker genes for breast cancer subtypes (Dai et al., 2017). The cell lines are ordered according to the clustering in (A). Each row is scaled by its largest value for visualization.
(C) Hierarchical clustering of prostate cancer samples using the 15 location vectors with greatest variation in the projections (STAR Methods). Samples are colored according to the status of a biochemical recurrence (case) or no recurrence (control). The cluster labels, named based on expression signatures, represent the DeCET clustering.
See also Figure S8 and Table S8.
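Two preprocessing steps named in this legend, choosing the location vectors whose projections vary most across samples (panels A and C) and scaling each row by its largest value (panel B), can be sketched as follows. The data and function names are hypothetical, and the use of population variance is an assumption.

```python
import statistics

def top_variance_columns(projections, k):
    """Indices of the k columns (location vectors) whose projections
    have the largest variance across samples (rows)."""
    n_cols = len(projections[0])
    variances = [statistics.pvariance([row[j] for row in projections])
                 for j in range(n_cols)]
    return sorted(range(n_cols), key=lambda j: variances[j], reverse=True)[:k]

def scale_row_by_max(row):
    """Divide a row of expression values by its largest value."""
    largest = max(row)
    if largest == 0:
        return list(row)  # avoid division by zero for an all-zero row
    return [x / largest for x in row]

# Hypothetical projections: 3 samples (rows) by 3 location vectors (columns).
projections = [[1.0, 10.0, 0.0],
               [1.1, -10.0, 2.0],
               [0.9, 0.0, -2.0]]
selected = top_variance_columns(projections, k=2)
scaled = scale_row_by_max([2.0, 4.0, 1.0])
```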
KEY RESOURCES TABLE
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Antibodies | ||
| Rabbit polyclonal anti-GAPDH | Sigma-Aldrich | G9545; RRID:AB_796208 |
| Rabbit polyclonal anti-HOXA13 | Abcam | ab106503; RRID:AB_11128701 |
| Rabbit polyclonal anti-H3K27ac | Active Motif | 39133; RRID:AB_2561016 |
| Rabbit polyclonal anti-H3K4me3 | Diagenode | C15410003; RRID:AB_2616052 |
| Rabbit polyclonal anti-H3K4me1 | Diagenode | C15410194; RRID:AB_2637078 |
| Rabbit polyclonal anti-Histone3 | Abcam | ab1791; RRID:AB_302613 |
| Mouse monoclonal anti-V5 | Thermo Fisher Scientific | R960-25; RRID:AB_2556564 |
| Mouse monoclonal anti-HMGA2 | Genetex | GTX629478 |
| Biological samples | ||
| Fresh human uterine leiomyoma and matched myometrium tissues | Northwestern University Prentice Women's Hospital | N/A |
| Chemicals, peptides, and recombinant proteins | ||
| HOXA13 inserted pLEX_306 plasmid | This paper | N/A |
| Critical commercial assays | ||
| SimpleChIP kit | Cell Signaling Technology | 9003 |
| Kapa hyper prep kit | Kapa Biosystems | KK8502 |
| Kapa quantification kit | Kapa Biosystems | KK4835 |
| RNeasy Fibrous tissue kit | QIAGEN | 74704 |
| TruSeq stranded mRNA kit | Illumina | 20020594 |
| DNeasy blood and tissue kit | QIAGEN | 69504 |
| CellTiter-Glo® 2.0 Cell Viability Assay | Promega | G9241 |
| Deposited Data | ||
| ChIP-seq, RNA-seq, ATAC-seq | This paper, Gene Expression Omnibus | GEO: GSE142332 |
| Experimental models: Cell lines | ||
| Primary uterine leiomyoma and myometrial cells | Fresh tissues | N/A |
| Oligonucleotides | ||
| Primers for | This paper | N/A |
| Primers for | This paper | N/A |
| Primers for | This paper | N/A |
| Primers for | This paper | N/A |
| Primers for | This paper | N/A |
| Primers for | This paper | N/A |
| Primers for | This paper | N/A |
| Primers for | This paper | N/A |
| Primers for | This paper | N/A |
| shRNA (sh | Sigma | TRCN0000015406 |
| Software and algorithms | ||
| Indigo | GitHub | |
| FastQC v0.11.5 | Babraham Bioinformatics | |
| Bowtie2 v2.3.2 | ||
| trim galore v0.4.4 | Babraham Bioinformatics | |
| Picard v2.10.1 | Broad Institute | |
| MACS2 v2.1.1 | ||
| STAR v2.5.3a | ||
| DESeq2 | ||
| ATACseqQC | ||
| SAMtools v1.7 | ||
| Bedtools v2.26.0 | ||
| DAVID | ||
| Python v3.6.1 and v3.7.3 | Python | |
| NumPy v1.18.5 and v1.16.4 | ||
| PyTorch v0.4.0 | A. Paszke et al., 2017, NIPS Autodiff Workshop, conference | |
| Tensorly | ||
| SciPy | ||
| Scikit-learn v0.21.2 | ||
| pandas v0.24.2 | W. McKinney, 2010, Proc. Python Sci. Conf., conference | |
| Seaborn v0.9.0 | ||
| PyWavelets v1.0.3 | ||
| Ranking of Super Enhancer (ROSE) | ||
| The Human Genome Browser at UCSC | ||
| Integrative Genomics Viewer | ||
| BioEdit | ||
| GREAT v4.0.4 | ||
| DeCET | This paper | |
| Metascape | ||
| R v3.6.3 | R | |
| edgeR v3.28.1 | ||
| pheatmap v1.0.12 | ||