Carlos Soto¹, Darshan Bryner², Nicola Neretti³, Anuj Srivastava¹.
Abstract
The study of the three-dimensional (3D) structure of chromosomes, the largest macromolecules in biology, is one of the most challenging problems in structural biology to date. Here, we develop a novel representation of 3D chromosome structures as sequences of shape letters from a finite shape alphabet. This representation provides a compact and efficient way to analyze ensembles of chromosome shape data, akin to analyzing texts in a language by using letters. We construct a Chromosome Shape Alphabet from an ensemble of chromosome 3D structures inferred from Hi-C data (via SIMBA3D or other methods) by segmenting curves at topologically associating domain (TAD) boundaries and by clustering the 3D structures of all TADs into groups of similar shapes. The median shapes of these groups, after some pruning and processing, form the Chromosome Shape Letters (CSLs) of the alphabet. We provide a proof of concept for these CSLs by reconstructing independent test curves using only CSLs (and the corresponding transformations) and comparing these reconstructions with the original curves. Finally, we demonstrate how CSLs can be used to summarize shapes in an ensemble of chromosome 3D structures by using generalized sequence logos.
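The pipeline described in the abstract (segment, cluster, pick representative shapes, encode and reconstruct, summarize with logos) can be illustrated with a minimal Python/NumPy sketch. It is not the authors' implementation and rests on several assumptions: TAD segments are already cut out as k x 3 point arrays; shapes are compared with a simple rigid Procrustes (Kabsch) distance after removing translation and scale, which stands in for whatever shape metric the paper uses; the cluster medoid stands in for the "median shape"; clustering is plain k-medoids with farthest-point initialization; and the logo column uses the classic information-content formula without small-sample correction. All function names (`resample`, `normalize`, `align`, `build_alphabet`, `encode`, `decode`, `logo_column`) are hypothetical.

```python
import numpy as np


def resample(curve, n=20):
    """Resample a 3D polyline (k x 3) to n points equally spaced in arc length."""
    curve = np.asarray(curve, dtype=float)
    seg_len = np.linalg.norm(np.diff(curve, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg_len)])
    t = np.linspace(0.0, s[-1], n)
    return np.column_stack([np.interp(t, s, curve[:, d]) for d in range(3)])


def normalize(curve):
    """Remove translation and scale; return (shape, centroid, scale)."""
    centroid = curve.mean(axis=0)
    centered = curve - centroid
    scale = np.linalg.norm(centered)
    return centered / scale, centroid, scale


def align(a, b):
    """Kabsch: proper rotation r minimizing ||a @ r - b||, plus that residual."""
    u, _, vt = np.linalg.svd(a.T @ b)
    d = np.sign(np.linalg.det(u @ vt))  # -1 would mean a reflection; undo it
    r = u @ np.diag([1.0, 1.0, d]) @ vt
    return r, np.linalg.norm(a @ r - b)


def build_alphabet(segments, k, n_iter=50):
    """Cluster normalized TAD shapes with k-medoids (assumed stand-in for the
    paper's clustering); the medoids play the role of shape letters."""
    shapes = [normalize(resample(s))[0] for s in segments]
    dist = np.array([[align(a, b)[1] for b in shapes] for a in shapes])
    medoids = [0]  # farthest-point initialization (deterministic)
    while len(medoids) < k:
        medoids.append(int(np.argmax(dist[:, medoids].min(axis=1))))
    for _ in range(n_iter):
        labels = np.argmin(dist[:, medoids], axis=1)
        new = []
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                new.append(medoids[j])
                continue
            within = dist[np.ix_(members, members)].sum(axis=1)
            new.append(int(members[np.argmin(within)]))
        if new == medoids:
            break
        medoids = new
    return [shapes[i] for i in medoids]


def encode(segment, alphabet):
    """Describe one segment as (letter index, rotation, centroid, scale)."""
    shape, centroid, scale = normalize(resample(segment))
    idx = min(range(len(alphabet)), key=lambda i: align(alphabet[i], shape)[1])
    rot, _ = align(alphabet[idx], shape)
    return idx, rot, centroid, scale


def decode(code, alphabet):
    """Rebuild a resampled segment from its letter and transformation."""
    idx, rot, centroid, scale = code
    return alphabet[idx] @ rot * scale + centroid


def logo_column(letters, k):
    """Letter frequencies and information content (bits) at one TAD position."""
    p = np.bincount(letters, minlength=k) / len(letters)
    nz = p[p > 0]
    return p, np.log2(k) + np.sum(nz * np.log2(nz))
```

In use, one would call `build_alphabet` on all TAD segments from the training ensemble, `encode` each segment of a held-out curve, and compare `decode` output with the resampled original to measure reconstruction error. Stacking the letter indices of the same TAD across structures and passing them to `logo_column` gives one column of a (non-generalized) sequence logo. Removing translation, scale, and rotation before clustering is what lets a single letter describe many TADs that differ only in placement and size.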
Keywords: TAD segmentation; chromosome structures; shape analysis; shape letters; structural representations; structural variability
Year: 2021 PMID: 33720766 PMCID: PMC8219198 DOI: 10.1089/cmb.2020.0383
Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.549