A machine learning framework for the prediction of chromatin folding in Drosophila using epigenetic features.

Michal B Rozenwald, Aleksandra A Galitsyna, Grigory V Sapunov, Ekaterina E Khrameeva, Mikhail S Gelfand.

Abstract

Technological advances have led to the creation of large epigenetic datasets, including information about DNA-binding proteins and DNA spatial structure. Hi-C experiments have revealed that chromosomes are subdivided into sets of self-interacting domains called Topologically Associating Domains (TADs). TADs are involved in the regulation of gene expression, but the mechanisms of their formation are not yet fully understood. Here, we focus on machine learning methods to characterize DNA folding patterns in Drosophila based on chromatin marks across three cell lines. We present linear regression models with four types of regularization, gradient boosting, and recurrent neural networks (RNN) as tools to study chromatin folding characteristics associated with TADs given epigenetic chromatin immunoprecipitation data. The bidirectional long short-term memory RNN architecture produced the best prediction scores and identified biologically relevant features. The distributions of the protein Chriz (Chromator) and the histone modification H3K4me3 were selected as the most informative features for the prediction of TAD characteristics. This approach may be adapted to any similar biological dataset of chromatin features across various cell lines and species. The code for the implemented pipeline, Hi-ChIP-ML, is publicly available: https://github.com/MichalRozenwald/Hi-ChIP-ML. ©2020 Rozenwald et al.
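
As a rough illustration of the simplest model family named in the abstract (regularized linear regression used to rank chromatin features), the sketch below fits an L2-penalized linear model by gradient descent. It is not taken from the Hi-ChIP-ML repository: the function `fit_ridge`, the hyperparameters, the feature name `noise_mark`, and the synthetic data are all assumptions made for illustration, and the data are constructed so that the first two features are informative, so the output does not reproduce any result of the paper.

```python
import random


def fit_ridge(X, y, alpha=0.1, lr=0.05, epochs=2000):
    """Fit y ~ X.w + b by gradient descent with an L2 penalty alpha * ||w||^2.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            # Residual of the current linear prediction for this sample.
            err = sum(wi * xi for wi, xi in zip(w, row)) + b - target
            grad_b += 2 * err / n
            for j in range(d):
                grad_w[j] += 2 * err * row[j] / n
        # The penalty adds 2 * alpha * w to the weight gradient (bias is not penalized).
        w = [wi - lr * (g + 2 * alpha * wi) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Synthetic stand-in for per-bin ChIP signal; values and coefficients are invented.
FEATURES = ["Chriz", "H3K4me3", "noise_mark"]
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in FEATURES] for _ in range(200)]
# By construction the target depends only on the first two features.
y = [1.5 * row[0] + 1.0 * row[1] + rng.gauss(0, 0.1) for row in X]

weights, bias = fit_ridge(X, y)

# Rank features by absolute weight, a common (if crude) importance measure
# when the inputs share the same scale.
ranking = sorted(zip(FEATURES, weights), key=lambda fw: abs(fw[1]), reverse=True)
for name, weight in ranking:
    print(f"{name}: {weight:+.3f}")
```

Ranking by absolute weight is only meaningful here because all synthetic features are drawn on the same scale; with real ChIP tracks the inputs would need to be normalized first.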

Keywords:  Chromatin; DNA folding patterns; Gradient Boosting; Hi-C experiments; Linear Regression; Machine Learning; Recurrent Neural Networks (RNN); Topologically Associating Domains (TADs)

Year:  2020        PMID: 33816958      PMCID: PMC7924456          DOI: 10.7717/peerj-cs.307

Source DB:  PubMed          Journal:  PeerJ Comput Sci        ISSN: 2376-5992


