Teresa Szczepińska, Ayatullah Faruk Mollah, Dariusz Plewczynski.
Abstract
The nature of genome organization into two basic structural compartments is not yet understood, although it has been indicated to be a mechanism of gene expression regulation. Using a classification approach, we ranked genomic marks that hint at compartmentalization. We considered a broad range of marks, including GC content, histone modifications, DNA binding proteins, open chromatin, transcription and genome regulatory segmentation in GM12878 cells. Genomic marks were defined over CTCF or RNAPII loops, which are basic elements of genome 3D structure, and over 100 kb genomic windows. Experiments were carried out to empirically assess the whole set of features, as well as individual features, in the classification of loops/windows into compartment A or B. Using Monte Carlo Feature Selection and Analysis of Variance, we constructed a ranking of feature importance for classification. The best simple indicator of compartmentalization is the DNase-seq open chromatin measurement for CTCF loops, H3K4me1 for RNAPII loops and H3K79me2 for genomic windows. Among DNA binding proteins, the best indicator is the RUNX3 transcription factor for loops and RNAPII for genomic windows. Chromatin state prediction methods that indicate active elements like promoters, enhancers or heterochromatin enhance the prediction of loop segregation into compartments. However, the H3K9me3, H4K20me1 and H3K27me3 histone modifications and GC content poorly indicate compartments.
Keywords: 3D genome structure; GC content; H3K27me3; H3K4me1; H3K79me2; H3K9me3; H4K20me1; chromatin compartments; epigenetic modifications; open chromatin
Year: 2021; PMID: 34769020; PMCID: PMC8584073; DOI: 10.3390/ijms222111591
Source DB: PubMed; Journal: Int J Mol Sci; ISSN: 1422-0067; Impact factor: 5.923
Figure 1. Feature evaluation diagram.
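The abstract names Analysis of Variance as one of the two methods used to rank features. As a minimal sketch of that idea, and not the authors' pipeline, the following Python computes a one-way ANOVA F statistic per feature across compartment labels A and B and sorts the features by it. The helper names `anova_f` and `rank_features`, the feature names, and every numeric value in `toy` are invented for illustration.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic for a single feature across class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = mean(values)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of samples around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(table, labels):
    """Return feature names sorted from most to least discriminative."""
    scores = {name: anova_f(column, labels) for name, column in table.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Invented toy data: six loops, three per compartment (assumption, not paper data).
labels = ["A", "A", "A", "B", "B", "B"]
toy = {
    "open_chromatin": [9.0, 8.5, 9.5, 2.0, 1.5, 2.5],
    "gc_content": [0.41, 0.45, 0.39, 0.42, 0.40, 0.44],
}
ranking = rank_features(toy, labels)
print(ranking)
```

A feature whose values separate the two compartments well gets a large F statistic and therefore a high place in the ranking; a feature with overlapping distributions scores close to zero.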
Compartment prediction performance using various genomic marks of CTCF convergent loops for various classifiers (all 428 features are considered for classification).
| Classifier | Recall (R) | Precision (P) | F-Score | Accuracy (ACC) | AUC | SD | RMSE |
|---|---|---|---|---|---|---|---|
| GNB | 0.8139 | 0.8510 | 0.8179 | 0.8139 | 0.9138 | 0.0301 | 0.4296 |
| KNN | 0.8876 | 0.8912 | 0.8881 | 0.8876 | 0.9357 | 0.0234 | 0.2924 |
| LDA | 0.9175 | 0.9185 | 0.9166 | 0.9175 | 0.9677 | 0.0172 | 0.2509 |
| ADB | 0.9142 | 0.9158 | 0.9134 | 0.9142 | 0.9703 | 0.0123 | 0.4697 |
| SVM | 0.9264 | 0.9276 | 0.9253 | 0.9264 | 0.9689 | 0.0218 | 0.2353 |
| MLP | 0.9253 | 0.9262 | 0.9249 | 0.9253 | 0.9728 | 0.0167 | 0.2515 |
| RFO | 0.9209 | 0.9231 | 0.9195 | 0.9209 | 0.9732 | 0.0147 | 0.2419 |
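In every row of the table above, the R and ACC columns coincide. That is what one would expect if R is a support-weighted recall, because weighting per-class recall by class frequency reduces algebraically to overall accuracy. The sketch below illustrates this under that assumption; the function name `weighted_metrics` and the toy labels are invented, and the paper's own evaluation code is not shown in this record.

```python
def weighted_metrics(y_true, y_pred):
    """Support-weighted recall, precision and F-score, plus plain accuracy."""
    n = len(y_true)
    recall = precision = f_score = 0.0
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        support = sum(t == cls for t in y_true)      # true members of cls
        predicted = sum(p == cls for p in y_pred)    # items predicted as cls
        r = tp / support
        p = tp / predicted if predicted else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / n
        recall += weight * r
        precision += weight * p
        f_score += weight * f
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    return {"recall": recall, "precision": precision,
            "f_score": f_score, "accuracy": accuracy}


# Invented toy labels: 4 loops in compartment A, 6 in compartment B.
y_true = ["A"] * 4 + ["B"] * 6
y_pred = ["A", "A", "A", "B", "B", "B", "B", "B", "B", "A"]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

Weighted precision and F-score do not reduce to accuracy in the same way, which is why those columns can differ from ACC.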
Figure 2. Top decile of the most important features for classification of CTCF convergent loops into compartments.
The highest position in the ranking of different genomic marks for loops and genomic windows. All DNA binding proteins are represented together here.
| Assay Type | CTCF Convergent | CTCF Tandem | RNAPII | 100 kb Windows |
|---|---|---|---|---|
| Genome segmentation | 1 | 1 | 1 | 1 |
| Open chromatin | 7 | 7 | 19 | 13 |
| H3K4me1 | 10 | 9 | 7 | 8 |
| H3K4me2 | 11 | 10 | 10 | 9 |
| H3K9ac | 13 | 13 | 17 | 11 |
| DNA binding protein | 14 | 19 | 35 | 42 |
| H3K27ac | 15 | 17 | 14 | 25 |
| RNA-seq | 18 | 16 | 9 | 36 |
| H3K4me3 | 19 | 14 | 16 | 45 |
| H3K36me3 | 23 | 24 | 12 | 12 |
| H3K79me2 | 25 | 20 | 13 | 6 |
| Nascent RNA | 26 | 28 | 14 | 32 |
| Replication time | 30 | 34 | 120 | 43 |
| H2A.Z | 54 | 55 | 33 | 37 |
| DNA methylation at CpG sites | 60 | 53 | 70 | 151 |
| H3K27me3 | 104 | 56 | 30 | 345 |
| H4K20me1 | 119 | 102 | 45 | 194 |
| H3K9me3 | 336 | 327 | 203 | 270 |
| GC% | 409 | 384 | 51 | 84 |
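One way to read the table above: an assay type may contribute several features to the full set of 428, and the table reports the best place any of them reached in the ranking. A minimal sketch of that reduction follows, assuming a ranked feature list and a feature-to-assay mapping. The function name `highest_place`, the feature names, and their order are invented and do not come from the paper.

```python
def highest_place(ranked_features, assay_of):
    """Map each assay type to the best (lowest) 1-based place of its features."""
    best = {}
    for place, feature in enumerate(ranked_features, start=1):
        # setdefault keeps the first, and therefore best, place seen per assay.
        best.setdefault(assay_of[feature], place)
    return best


# Invented ranking, most important feature first (assumption, not paper data).
ranked = [
    "segmentation_enhancer",
    "segmentation_promoter",
    "DNase_rep1",
    "H3K4me1_rep1",
    "DNase_rep2",
]
assay_of = {
    "segmentation_enhancer": "Genome segmentation",
    "segmentation_promoter": "Genome segmentation",
    "DNase_rep1": "Open chromatin",
    "DNase_rep2": "Open chromatin",
    "H3K4me1_rep1": "H3K4me1",
}
best_places = highest_place(ranked, assay_of)
print(best_places)
```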
Compartment prediction performance using various genomic marks of CTCF tandem loops for various classifiers (all 428 features are considered for classification).
| Classifier | Recall (R) | Precision (P) | F-Score | Accuracy (ACC) | AUC | SD | RMSE |
|---|---|---|---|---|---|---|---|
| GNB | 0.7122 | 0.8259 | 0.7318 | 0.7122 | 0.8665 | 0.0331 | 0.5354 |
| KNN | 0.8613 | 0.8648 | 0.8622 | 0.8613 | 0.8942 | 0.0292 | 0.3226 |
| LDA | 0.8912 | 0.8897 | 0.8887 | 0.8912 | 0.9365 | 0.0246 | 0.2846 |
| ADB | 0.8946 | 0.8942 | 0.8927 | 0.8946 | 0.9409 | 0.0201 | 0.4778 |
| SVM | 0.8976 | 0.8962 | 0.8938 | 0.8976 | 0.9287 | 0.0360 | 0.2736 |
| MLP | 0.8930 | 0.8928 | 0.8919 | 0.8930 | 0.9379 | 0.0216 | 0.3038 |
| RFO | 0.9004 | 0.8997 | 0.8962 | 0.9004 | 0.9411 | 0.0235 | 0.2736 |
Figure 3. Top decile of the most important features for classification of CTCF tandem loops into compartments.
Compartment prediction performance using various genomic marks of RNAPII loops for various classifiers (all 428 features are considered for classification).
| Classifier | Recall (R) | Precision (P) | F-Score | Accuracy (ACC) | AUC | SD | RMSE |
|---|---|---|---|---|---|---|---|
| GNB | 0.4544 | 0.7952 | 0.4963 | 0.4544 | 0.7736 | 0.0692 | 0.7372 |
| KNN | 0.8744 | 0.8639 | 0.8629 | 0.8744 | 0.7944 | 0.0648 | 0.3253 |
| LDA | 0.8801 | 0.8735 | 0.8713 | 0.8801 | 0.8657 | 0.0567 | 0.3108 |
| ADB | 0.8824 | 0.8754 | 0.8719 | 0.8824 | 0.8665 | 0.0563 | 0.4890 |
| SVM | 0.8785 | 0.7861 | 0.8243 | 0.8785 | 0.8737 | 0.0514 | 0.2730 |
| MLP | 0.8642 | 0.8606 | 0.8596 | 0.8642 | 0.8206 | 0.0607 | 0.3370 |
| RFO | 0.8910 | 0.8855 | 0.8771 | 0.8910 | 0.8676 | 0.0587 | 0.2930 |
Figure 4. Top decile of the most important features for classification of RNAPII loops into compartments.
Compartment prediction performance using various genomic marks of 100 kb genomic windows for various classifiers (all 428 genomic marks are considered for classification).
| Classifier | Recall (R) | Precision (P) | F-Score | Accuracy (ACC) | AUC | SD | RMSE |
|---|---|---|---|---|---|---|---|
| GNB | 0.8450 | 0.8476 | 0.8424 | 0.8450 | 0.9189 | 0.0180 | 0.3930 |
| KNN | 0.8693 | 0.8720 | 0.8672 | 0.8693 | 0.9255 | 0.0094 | 0.3141 |
| LDA | 0.8983 | 0.9011 | 0.8980 | 0.8983 | 0.9655 | 0.0116 | 0.2796 |
| ADB | 0.8970 | 0.9009 | 0.8973 | 0.8970 | 0.9654 | 0.0107 | 0.4779 |
| SVM | 0.8991 | 0.9036 | 0.8992 | 0.8991 | 0.9633 | 0.0167 | 0.2703 |
| MLP | 0.8907 | 0.8943 | 0.8908 | 0.8907 | 0.9566 | 0.0114 | 0.3116 |
| RFO | 0.8960 | 0.8999 | 0.8961 | 0.8960 | 0.9645 | 0.0108 | 0.2738 |
Figure 5. Top decile of the most important features for classification of 100 kb windows into compartments.