Musinu Zakari (1, 2), Kobe Yuen (1), Jennifer L Gerton (1, 3)
1. Stowers Institute for Medical Research, Kansas City, MO, USA.
2. Université Pierre et Marie Curie, Paris, France.
3. Department of Biochemistry and Molecular Biology, University of Kansas School of Medicine, Kansas City, KS, USA.
Abstract
Cohesin is a chromosome-associated protein complex that plays many important roles in chromosome function. Genetic screens in yeast originally identified cohesin as a key regulator of chromosome segregation. Subsequently, work by various groups has identified cohesin as critical for additional processes such as DNA damage repair, insulator function, gene regulation, and chromosome condensation. Mutations in the genes encoding cohesin and its accessory factors result in a group of developmental and intellectual impairment diseases termed 'cohesinopathies.' How mutations in cohesin genes cause disease is not well understood, because precocious chromosome segregation is not a common feature of cells derived from patients with these syndromes. In this review, recent findings concerning cohesin's function in the organization of chromosome structure and gene regulation are discussed. We propose that the cohesinopathies are caused by changes in gene expression that can negatively impact translation. The similarities and differences between cohesinopathies and ribosomopathies, diseases caused by defects in ribosome biogenesis, are discussed. The contribution of cohesin and its accessory proteins to gene expression programs that support translation suggests that cohesin provides a means of coupling chromosome structure with the translational output of cells.
Cohesin is an evolutionarily conserved multisubunit protein complex consisting of four core subunits: SMC1A, SMC3, the α‐kleisin protein RAD21, and the HEAT repeat‐containing protein SA1 or SA2 (Figure 1).1, 2, 3 The complex forms a ring‐like structure with an estimated diameter of 35 nm that entraps DNA (Figure 1). SMC1 and SMC3 belong to the structural maintenance of chromosomes (SMC) family. Each SMC protein folds back on itself to form a long antiparallel coiled coil, with a hinge domain at one end that mediates SMC dimer formation and an ATPase head domain at the other. This family also includes the condensin subunits SMC2 and SMC4 and the SMC5/6 complex. These complexes are conserved from yeast to human and contribute to the organization of chromosomes.
Figure 1
Schematic diagram of the cohesin complex. Cohesin is composed of four subunits. Smc1 and Smc3 (blue) are long polypeptides that form a hinge domain at one end and an ATPase domain at the other end by folding back on themselves to form antiparallel coiled‐coil interactions. The SMC heads are connected by the α‐kleisin RAD21 (green), which interacts with the fourth subunit, SA1 or SA2 (pink).
Cohesin is loaded onto chromosomes in G1 in budding yeast, and during telophase of the preceding cell division in vertebrates, by a complex composed of NIPBL (known as Scc2 in Saccharomyces cerevisiae and Nipped‐B in Drosophila melanogaster) and MAU2 (Scc4 in S. cerevisiae) (Figure 2). Cohesin may encircle both sister chromatids or may associate with chromosomes in other ways. The ‘handcuff’ model proposes that a cohesin ring encircling one sister chromatid interacts, via its Rad21 and Scc3 subunits, with a second ring encircling the other sister. A third model proposes that cohesin encircles one sister chromatid while interacting with proteins bound to the other.4, 5, 6 In each case, cohesin functions to hold two pieces of DNA in close proximity. It has been proposed that DNA enters the ring through the hinge domain in a reaction promoted by the Scc2–Scc4 loading complex, and that it can exit the ring at the Smc3 head–Rad21 junction in a process regulated by acetylation and other factors.7
Figure 2
Schematic diagram showing cohesin regulation throughout the cell cycle. Cohesin (blue) is loaded onto chromosomes in telophase/G1 by the NIPBL–MAU2 heterodimer; loading requires opening of the hinge formed by SMC1A and SMC3 for DNA entry. Establishment of cohesion in S phase is facilitated by ESCO1/2‐dependent acetylation of the SMC3 head domain, which makes cohesin refractory to removal from chromatin by WAPL. Cohesion is then regulated by accessory proteins such as WAPL, PDS5, and Sororin. In prophase, cohesin is removed from chromosome arms in a separase‐independent manner that depends on PLK1 and WAPL/PDS5. Pericentromeric cohesion is protected by shugoshin and PP2A. At the onset of anaphase, pericentromeric cohesion is destroyed by separase‐mediated proteolytic cleavage of RAD21, and cohesin is recycled for the next cell cycle. Recycling of the SMC3 subunit requires deacetylation by HDAC8.
The deposition and removal of cohesin are regulated over the cell cycle, and many aspects of this cycle are evolutionarily conserved from yeast to human cells. Establishment of cohesion between sister chromatids during DNA replication depends on Eco1 (yeast ECO1 is a homolog of human ESCO1 and ESCO2) (Figure 2). Eco1 is an acetyltransferase that acetylates two lysine residues within the head domain of Smc3. This acetylation appears to convert the cohesin ring to a ‘cohesive’ mode in which it can very stably juxtapose two DNA molecules. Whereas budding yeast has a single gene encoding this acetyltransferase, humans have two, ESCO1 and ESCO2, both of which are important for sister chromatid cohesion and for cell survival in response to radiation‐induced DNA damage.8, 9, 10 Once established, cohesion between sister chromatids is regulated by accessory proteins including PDS5, WAPL, shugoshin, and sororin (Figure 2).11, 12, 13, 14 At the onset of mitosis, cohesin is removed from chromatin as a prerequisite for sister chromatid segregation.
In metazoans, cohesin removal occurs in two waves.15 Most cohesin along chromosome arms is removed in prophase through the action of PDS5, WAPL, PP2A, and Aurora B, together with phosphorylation of SA1 or SA2 by PLK1.12, 16, 17, 18 Full separation of sister chromatids at the centromere region occurs at the metaphase‐to‐anaphase transition through site‐specific cleavage of the α‐kleisin RAD21 by separase (Figure 2).19 The acetyl groups on the SMC3 head are removed by HDAC8 (Hos1 in S. cerevisiae), which helps recycle SMC3 for the next round of cohesin deposition.20, 21, 22, 23
COHESIN CAN REGULATE GENE EXPRESSION
Although cohesin has roles in many nuclear processes, such as chromosome segregation, double‐strand break repair, gene regulation, and chromosome condensation (Figure 3), we focus this review on gene expression, because this role may be critical for understanding the cohesinopathies, developmental syndromes caused by mutations in the genes encoding cohesin and its regulators. Genetic studies in Drosophila first identified a role for the cohesin loading factor in gene regulation: reduction of Nipped‐B altered the expression of the homeobox genes cut and Ultrabithorax by disrupting enhancer–promoter communication.24 Subsequently, studies in yeast and flies showed that a moderate reduction in cohesin affected chromosome condensation, gene expression, and development without affecting chromosome segregation.25, 26 Cohesin can therefore regulate gene expression independently of chromosome segregation, as also shown in nondividing Drosophila neurons and mouse thymocytes, where changes in cohesin activity altered gene expression.27, 28, 29
Figure 3
Schematic diagram depicting functions of cohesin at noncentromeric sites. (a) Double‐strand break (DSB) repair is facilitated by cohesin binding to the break site. Cohesion is also reinforced genome‐wide in response to a DSB. (b) Cohesin regulates gene expression by gene looping to promote long‐range (1) promoter–enhancer communication, (2) promoter–terminator interaction (middle), or (3) insulation. These types of events may contribute to chromatin organization within topological domains. (c) Cohesin promotes chromosome condensation.
Cohesin association with chromatin appears to be relatively sequence‐independent. In budding yeast, cohesin binds between convergently transcribed genes.30 In metazoans, cohesin binds to highly transcribed genes; in D. melanogaster cells, for example, cohesin is enriched together with RNA polymerase II at highly transcribed genes.31 Recent studies suggest that cohesin loading occurs at sites of active transcription in all organisms, and that the differences between organisms may reflect redistribution of cohesin after loading onto chromosomes. Various models for how cohesin association regulates gene expression have been proposed and are discussed below.
Although one theme in cohesin–DNA association is a tendency to occur at highly transcribed regions,32 in most cases it is not clear whether cohesin supports the expression of these regions or simply prefers to bind them because they are highly transcribed.
In budding yeast, the cohesin loading complex and condensin associate with the promoters of RNA polymerase II‐transcribed genes encoding small nuclear RNAs, small nucleolar RNAs, and ribosomal proteins, as well as RNA polymerase III‐transcribed genes encoding transfer RNAs (tRNAs) and spliceosomal RNAs.33 The clustering of tRNA genes adjacent to the nucleolus depends on SMC complexes, suggesting that the binding of these proteins contributes to chromosome organization even in budding yeast.34 In Drosophila, cohesin colocalizes completely with its loading factor at subsets of highly transcribed genes and DNA replication origins, but is excluded from silenced genes.31, 35 In eukaryotes, cohesin also binds to the rDNA repeats.36 The production of ribosomal RNAs in the nucleolus is essential for ribosome assembly, so high levels of transcription at this locus are critical for cell survival, cell proliferation, and protein translation. If cohesin does support the expression of these translation‐related genes, then cohesin might promote a gene expression program that supports translation.
Cohesin May Facilitate DNA Looping
Transcription is controlled by regulatory regions, which are bound by DNA‐binding proteins that control both the assembly and the activity of the basal transcriptional machinery.37 Defects in the association of cohesin and NIPBL with highly transcribed regions could therefore affect the transcription of genes critical for development. In unicellular eukaryotes such as yeast, transcription is controlled mainly by regulatory elements close to transcription start sites. Metazoans, however, have evolved long‐distance regulatory elements that are critical for development and gene expression.37 Interactions between distant DNA sequences can be mediated by cohesin, in some cases in conjunction with additional factors. Such interactions can activate transcription, for instance through a looping event that brings a promoter close to an enhancer or a terminator; in other cases they can be repressive, for instance by insulating a promoter from an enhancer.
One factor that binds to regulatory elements is the sequence‐specific DNA‐binding protein CTCF.
The ability of CTCF to connect with other regulatory regions is facilitated by cohesin, Polycomb proteins, and the nuclear lamina.38, 39, 40 Depending on cell type, 50–80% of CTCF‐binding sites genome‐wide are also bound by cohesin, and disruption of cohesin results in defects in CTCF‐mediated intrachromosomal interactions.41 CTCF functions as a transcriptional activator by binding to the mammalian c‐MYC promoters,42 and as an insulator when placed between the enhancers and promoters of genes such as those at the imprinted IGF2/H19 locus.43 In fact, cohesin may promote the transcription of c‐MYC, an important factor for the transcription of ribosomal protein genes.44 Cohesin binding is thought to be critical for long‐range chromatin interactions, both between CTCF insulator elements and for gene activation.42, 45 However, the association between cohesin and CTCF appears to be vertebrate‐specific: CTCF is not conserved in budding yeast and does not colocalize with cohesin in Drosophila. These types of long‐range interactions could help shape the genome by organizing chromatin within larger topological domains.46
Another fraction of cohesin physically interacts and colocalizes with the Mediator complex, which binds the enhancers and promoters of active genes. About half of the cohesin loading complex associates with Mediator sites, further implicating cohesin in transcription. The Mediator complex facilitates interactions between the basal transcriptional machinery and enhancer‐bound transcriptional activators.47, 48, 49 In mouse embryonic stem cells, cohesin colocalizes with Mediator at extragenic enhancers and drives the expression of key pluripotency genes.49 In addition, NIPBL and the Mediator complex cooperate to regulate the expression of genes critical for limb development, such as the Hox clusters.50
Another type of DNA loop that may be mediated by cohesin is the promoter–terminator loop (Figure 3(b), middle).
Chromatin immunoprecipitation (ChIP) experiments in human cells identified cohesin enrichment at both transcription start sites and terminators in metagene analysis.51 Such promoter–terminator loops could allow RNA polymerase to be recycled from the terminator back to the promoter, resulting in highly efficient transcription reinitiation. These types of loops have been proposed to form at the ribosomal DNA genes,52 the most highly transcribed locus in most cells, where cohesin‐binding sites are positioned such that they could mediate looping. In budding yeast bearing a mutation in the cohesin acetyltransferase Eco1, this proposed looped structure formed less efficiently and rRNA production was decreased, suggesting that efficient formation of these loops depends on acetylation of cohesin.53 Mutations in SMC complexes also result in aberrant nucleolar morphology and defective ribosome biogenesis,54, 55 implying a critical role for cohesin in the formation and function of the nucleolus. Because the nucleolus serves as an anchor point for chromosome organization, these findings have implications for both the architecture and the expression of the genome.
Cohesin May Regulate Transcription Initiation, Elongation, and Termination
Eukaryotic transcription is divided into three main steps, initiation, elongation, and termination, each of which is regulated at multiple levels. Numerous studies have found cohesin to be important for transcription initiation. Studies in Drosophila cells have shown that cohesin and Nipped‐B colocalize at the transcription start sites of active genes,31 although less colocalization between these two proteins is observed in mouse cells.49 NIPBL preferentially binds 100–200 nucleotides upstream of RNA polymerase II‐bound sites, and NIPBL knockdown reduces transcription of these genes, suggesting a role for NIPBL in transcription initiation.56 Cohesin may also promote transcription termination in eukaryotes: in Schizosaccharomyces pombe, cohesin promotes mRNA 3′‐end formation and transcriptional termination at convergently transcribed genes.57
Recent studies suggest that cohesin also regulates transcription elongation. In D. melanogaster cells, cohesin was shown to bind to genes with low levels of histone H3 lysine 36 trimethylation (H3K36me3), a chromatin mark associated with transcription elongation.58 Many of these genes had RNA polymerase II paused just downstream of the transcription start site and of GAGA factor (GAF)‐binding sites. Cohesin, along with other pausing factors such as NELF, was proposed to regulate the transition of RNA polymerase II from the paused to the elongating form. Taken together, these studies suggest that cohesin may play a role in all three stages of transcription.
NIPBL May Act as a Chromatin Adaptor
In metazoans, the cohesin loader NIPBL tends to bind to nucleosome‐free regions in the promoters of active genes. Nucleosome‐free regions are important for transcription initiation because they serve as a ‘landing pad’ for key transcription‐promoting factors. Their maintenance requires chromatin remodeling factors, such as members of the SWI2/SNF2 superfamily,59 ISWI, and the CHD family, and enzymes that posttranslationally modify histones, such as histone deacetylases (HDACs), methyltransferases, and acetyltransferases.60, 61 Studies in organisms from yeast to human have implicated the cohesin loading complex in helping to maintain nucleosome‐free regions.61, 62, 63 Because SMC complexes first evolved in bacteria, which do not have nucleosomes, we speculate that the loading complex may have evolved in eukaryotes to help SMC complexes load onto DNA in the context of chromatin.
The RSC (remodels the structure of chromatin) complex, a member of the SWI2/SNF2 superfamily, may act upstream of genes to facilitate recruitment of the cohesin loading complex Scc2–Scc4.61, 62 Mutations in subunits of the related human SWI/SNF (BAF) remodeling complexes cause Coffin–Siris syndrome (CSS), a genetic disorder characterized by developmental delay.64 CSS is caused by heterozygous mutations in ARID1A, ARID1B, SMARCA4, SMARCB1, or SMARCE1. Characteristic features of patients with CSS include aplasia or hypoplasia of the distal phalanx of the fifth digit, moderate to severe developmental and cognitive delay, facial abnormalities, short stature, ophthalmologic abnormalities, microcephaly, brain malformations, and hearing loss.
In humans, the ISWI (SNF2h)‐containing chromatin remodeling complex interacts with the cohesin subunit RAD21.63 The cohesin loader NIPBL has also been reported to mediate the recruitment of HDAC1 and HDAC3 to chromatin.65 The Scc2–Scc4 complex itself, or additional factors that it recruits, may help maintain nucleosome‐free regions that promote both cohesin loading and transcription. In this way, the cohesin loading complex might serve as a ‘chromatin adaptor’ for cohesin and for transcription‐promoting complexes.
GENOME‐WIDE ANALYSIS OF COHESIN AND NIPBL BINDING
Genome‐wide binding profiles of NIPBL and cohesin (SMC1) in mouse embryonic stem cells have been published.49 To further explore the association of these proteins with the genome and their potential role in gene expression, we selected the top NIPBL‐ and cohesin‐binding sites and examined colocalization with other factors in a window of ±1 kb. For NIPBL, we defined 4413 sites using a false discovery rate (FDR) cutoff of 1e−9; for cohesin, we defined 18,519 sites using the same cutoff.
Among the largest categories of binding sites for both NIPBL and cohesin are the promoter regions and transcription start sites of active genes (Figures 4(a) and 5(a)), although SMC1 enrichment is also observed at less active genes (Figure 5(b)). At some active genes, NIPBL and SMC1 enrichment is quite broad, extending a few kilobases or more to either side of the transcription start site. Binding of NIPBL and cohesin at the transcription start sites of active genes was assigned by colocalization with trimethylation of histone H3 at lysine 4 (H3K4me3, a chromatin mark associated with active genes), RNA polymerase II, transcription initiation factor TFIID subunit 3 (TAF3), the histone variant H2A.Z, and negative elongation factor (NELF). A larger fraction of NIPBL peaks than of SMC1 peaks correlates with H3K4me3. NIPBL and cohesin both colocalize with CDK9, an elongation factor for RNA polymerase II‐directed transcription, consistent with the hypothesis that cohesin plays an important role in transcription elongation (Figures 4(a) and 5(a)).
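The peak‐selection step described above can be sketched in Python. This is a minimal illustration, not the pipeline used for Figures 4 and 5: the `Peak` record and its per‐peak `fdr` field, the toy coordinates, and the rule that a peak counts as "at a promoter" when an annotated transcription start site (TSS) falls inside its ±1 kb window are all assumptions made for the example. Only the 1e−9 cutoff and the ±1 kb window come from the text.

```python
from dataclasses import dataclass

FDR_CUTOFF = 1e-9   # FDR cutoff stated in the text
FLANK_BP = 1_000    # ±1 kb colocalization window stated in the text

@dataclass(frozen=True)
class Peak:
    """Hypothetical peak record; coordinates are 0-based and half-open."""
    chrom: str
    start: int
    end: int
    fdr: float

def filter_peaks(peaks, cutoff=FDR_CUTOFF):
    """Keep peaks whose false discovery rate is at or below the cutoff."""
    return [p for p in peaks if p.fdr <= cutoff]

def window(peak, flank=FLANK_BP):
    """(chrom, start, end) of the peak extended by `flank` bp on each side, clipped at 0."""
    return (peak.chrom, max(0, peak.start - flank), peak.end + flank)

def at_promoter(peak, tss_by_chrom, flank=FLANK_BP):
    """Assumed rule: a peak is 'at a promoter' if any TSS lies inside its ±flank window."""
    chrom, lo, hi = window(peak, flank)
    return any(lo <= tss < hi for tss in tss_by_chrom.get(chrom, ()))

# Toy example; coordinates, FDRs, and TSS positions are made up.
peaks = [
    Peak("chr1", 5_000, 5_400, 1e-12),
    Peak("chr1", 90_000, 90_300, 1e-10),
    Peak("chr2", 100, 500, 1e-3),   # fails the cutoff
]
tss = {"chr1": [6_000, 200_000]}
kept = filter_peaks(peaks)
print(len(kept), [at_promoter(p, tss) for p in kept])
```

In a real analysis the TSS positions would come from a gene annotation and the FDR values from the peak caller; both are placeholders here.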
NIPBL does not significantly colocalize with CTCF, whereas cohesin does, suggesting that cohesin is more likely than NIPBL to be involved in CTCF‐mediated processes such as looping and insulation.66, 67, 68, 69, 70 Cohesin and CTCF interact through the carboxy‐terminal region of CTCF and the SA2 subunit of cohesin; this interaction helps stabilize most CTCF‐mediated chromosomal contacts,76 a process important for cohesin‐dependent insulation activity.
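Colocalization between two factors, such as the NIPBL versus cohesin comparison with CTCF above, can be summarized as the percentage of one factor's peaks that overlap at least one peak of the other. The sketch below is an illustration under stated assumptions: peaks are hypothetical half‐open `(chrom, start, end)` tuples, and overlap is tested with a sorted‐start index plus a running maximum of end coordinates. A real analysis would more likely use an established interval tool such as bedtools; the helper names here are made up.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate

def build_index(intervals):
    """Group (chrom, start, end) intervals by chromosome, sorted by start, with running max end."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        max_end = list(accumulate((e for _, e in ivs), max))
        index[chrom] = (starts, max_end)
    return index

def overlaps_any(interval, index):
    """True if the half-open interval overlaps any indexed interval on its chromosome."""
    chrom, start, end = interval
    if chrom not in index:
        return False
    starts, max_end = index[chrom]
    i = bisect_left(starts, end)          # intervals [0, i) start before `end`
    return i > 0 and max_end[i - 1] > start

def percent_overlapping(query, reference):
    """Percentage of query intervals overlapping at least one reference interval."""
    if not query:
        return 0.0
    index = build_index(reference)
    hits = sum(overlaps_any(iv, index) for iv in query)
    return 100.0 * hits / len(query)

# Toy peaks (made up): two of the three "NIPBL" peaks overlap an "SMC1" peak.
nipbl = [("chr1", 100, 200), ("chr1", 1_000, 1_100), ("chr2", 50, 80)]
smc1 = [("chr1", 150, 400), ("chr2", 0, 60), ("chr3", 0, 10)]
print(f"{percent_overlapping(nipbl, smc1):.1f}% of toy NIPBL peaks overlap toy SMC1 peaks")
```

Passing promoter intervals as `reference` gives, with the same function, the kind of "fraction of peaks at promoters" figure quoted for NIPBL and SMC1 in this section.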
Figure 4
Analysis of the cohesin loader factor (NIPBL) binding throughout the genome of mouse embryonic stem cells. (a) NIPBL peaks from chromatin immunoprecipitation sequencing (ChIP‐seq) data were defined together with windows of ±1 kb. These windows were then used to select the ChIP signal for each additional factor shown.49, 66, 67, 68, 69, 70, 71, 72, 73, 74 ChIP‐seq data were subjected to unbiased clustering using the seqMiner v1.3.3 package.75 Kmeans rank was used as the clustering method, with the following parameters: left and right extensions = 1 kb; internal bins = 160; flanking region bins = 20; number of clusters = 7. NIPBL binds to active genes, enhancer regions, and intergenic and intronic regions of genes. NIPBL does not colocalize with CTCF.74 (b) The majority of NIPBL‐binding sites were at the promoter regions of genes. Gene ontology (GO) analysis of NIPBL‐bound active genes showed enrichment for genes important for RNA processing, mRNA processing, and RNA splicing.
Figure 5
Genome‐wide analysis of cohesin (SMC1) binding in the mouse genome. (a) SMC1 peaks were first defined, and the signal for each factor is then shown in a window extending ±1 kb from each peak.49, 66, 67, 68, 69, 70, 71, 72, 73, 74 Clustering was performed as described for Figure 4. Cohesin peaks correlate well with CTCF binding.74 (b) Cohesin (SMC1) binds more often at intergenic and intronic regions than at promoters. Gene ontology (GO) analysis of the active genes associated with cohesin in mESCs showed enrichment for genes involved in cell morphogenesis, cellular signal transduction, cellular developmental process, and cell adhesion.
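The clustering parameters in the Figure 4 legend (1 kb flanks split into 20 bins each, peak bodies scaled to 160 bins, 7 clusters) can be approximated as follows. This sketch is not seqMiner: it uses plain Lloyd's k‐means on raw binned values rather than seqMiner's "Kmeans rank" option, and the per‐base‐pair signal array, the helper names, and the toy data are hypothetical. With several factors, each peak's row would be the concatenation of the per‐factor profiles.

```python
import numpy as np

FLANK_BP = 1_000   # ±1 kb flanks (from the Figure 4 legend)
FLANK_BINS = 20    # flanking region bins (from the legend)
BODY_BINS = 160    # internal bins (from the legend)
N_CLUSTERS = 7     # number of clusters (from the legend)

def resample(segment, n):
    """Average a 1-D segment into n bins; interpolate when it has fewer than n values."""
    segment = np.asarray(segment, dtype=float)
    if segment.size == 0:
        return np.zeros(n)
    if segment.size >= n:
        return np.array([chunk.mean() for chunk in np.array_split(segment, n)])
    x = np.linspace(0, segment.size - 1, n)
    return np.interp(x, np.arange(segment.size), segment)

def bin_profile(signal, start, end):
    """Fixed-length profile for one peak: left flank + scaled peak body + right flank."""
    lo, hi = max(0, start - FLANK_BP), min(len(signal), end + FLANK_BP)
    return np.concatenate([
        resample(signal[lo:start], FLANK_BINS),
        resample(signal[start:end], BODY_BINS),
        resample(signal[end:hi], FLANK_BINS),
    ])

def kmeans(X, k=N_CLUSTERS, n_iter=100, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every profile to every centroid.
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster emptied).
        updated = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                            for j in range(k)])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids

# Toy data: a random per-base signal and 38 hypothetical 400 bp peaks.
rng = np.random.default_rng(1)
signal = rng.poisson(2, size=200_000).astype(float)
peaks = [(s, s + 400) for s in range(5_000, 195_000, 5_000)]
X = np.vstack([bin_profile(signal, s, e) for s, e in peaks])
labels, centroids = kmeans(X)
print(X.shape, np.bincount(labels, minlength=N_CLUSTERS))
```

Scaling every peak body to the same number of bins is what lets peaks of different widths be compared row by row; k‐means results also depend on initialization, so cluster numbering is arbitrary.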
All NIPBL peaks (4413) overlapped with cohesin (SMC1) peaks. While over 50% of NIPBL peaks lie at promoter regions, only 7% of cohesin peaks fall at these same promoters. On the surface, this suggests that SMC1 and NIPBL may both participate in the regulation of a common group of genes, while SMC1 may also be involved in the regulation of additional genes.
However, because the two proteins may function differently at these regions, their loss may not have similar effects on gene expression, even at the common gene group. For instance, NIPBL's primary role may be to maintain a nucleosome‐free region, while cohesin may mediate looping events. Gene ontology (GO) analysis of the active genes with NIPBL enrichment revealed genes important for RNA processing, translation, and splicing (Figure 4(b)). These results are consistent with observations that loss of nipbla/b function negatively impacts translation in zebrafish.55 At these genes, mutations in NIPBL could compromise the maintenance of a nucleosome‐free region, which could impair transcription initiation. In contrast, the active genes bound by SMC1 show GO term enrichment for cell morphogenesis, cellular signal transduction, cellular developmental process, and cell adhesion (Figure 5(b)). At these genes, loss of cohesin function might impair looping and gene regulation, consistent with observations that loss of cohesin function is detrimental to Hox gene expression and development.
A large portion of NIPBL (36%) and cohesin (51%) peaks are found at intergenic regions. Enhancer regions were defined as sites at which NIPBL or SMC1 binding correlates strongly with CBP, P300, Mediator, and CDK9 (Figures 4(a) and 5(a)).49, 67, 70 CBP and P300 are important transcriptional activators that acetylate histone H3 at lysine 27 (H3K27ac) at active enhancers77 and serve as convenient markers of enhancers in both D. melanogaster and mammalian studies. Although CBP and P300 are found with NIPBL and SMC1 at intergenic and intragenic regions (Figures 4(a) and 5(a)), they also occupy the promoters of active genes and acetylate many proteins, including p53.
CBP and P300 also perform many other cellular roles, such as acetylation of histones at promoters and acetylation of histone H3 during deposition.78, 79 Mutations that impair the activity of CBP and P300 result in Rubinstein–Taybi syndrome, a congenital disorder characterized by intellectual disability, growth retardation, a wide range of dysmorphic features, and an increased susceptibility to tumor formation,80 suggesting that the levels of these proteins are also important for development. SMC1 also binds at additional intergenic sites that lack a distinct colocalization signature, making them difficult to assign to a category.
Taken together, our examination of the binding sites of cohesin versus NIPBL shows that these proteins not only share some binding sites but also have unique colocalization signatures with other proteins, suggesting that they may influence gene expression and chromosome architecture in both common and unique ways. It will be important to continue to evaluate the genome‐wide association patterns of these proteins, and the genes they regulate, to understand how they influence gene expression.
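The enhancer assignment described in this section, in which NIPBL or SMC1 sites co‐bound by CBP, P300, Mediator, and CDK9 were treated as enhancers, can be expressed as a simple rule. The sketch below is only an illustration of that logic, not the authors' criterion (which was read off clustered ChIP‐seq profiles): it assumes one summary signal value per peak for each co‐factor, excludes peaks at promoters, standardizes each co‐factor as a z‐score across peaks, and applies a hypothetical threshold `min_z` to the mean z‐score.

```python
import numpy as np

# Co-factors named in the text as enhancer markers.
ENHANCER_MARKS = ("CBP", "P300", "Mediator", "CDK9")

def zscore(x):
    """Standardize a 1-D array; a constant array maps to all zeros."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)

def call_enhancers(signal, at_promoter, min_z=1.0):
    """Flag peaks as putative enhancers (hypothetical rule).

    signal: dict mapping each factor in ENHANCER_MARKS to a 1-D array holding that
            factor's ChIP signal in every peak's ±1 kb window (one value per peak).
    at_promoter: boolean array, True where a peak lies at a promoter.
    min_z: hypothetical cutoff on the mean z-score across the enhancer marks.
    """
    score = np.mean([zscore(signal[m]) for m in ENHANCER_MARKS], axis=0)
    return ~np.asarray(at_promoter, dtype=bool) & (score >= min_z)

# Toy example with four peaks: only the first is strongly co-bound.
signal = {m: np.array([10.0, 0.0, 0.0, 0.0]) for m in ENHANCER_MARKS}
print(call_enhancers(signal, [False, False, True, False]))
```

Averaging z‐scores is one simple way to require that several marks agree; the sites the text describes as lacking a distinct colocalization signature would simply fall below any such threshold.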
COHESIN AND HUMAN DISEASES
Mutations in the genes encoding cohesin and proteins with related functions result in a broad spectrum of diseases termed ‘cohesinopathies.’ Patients with Roberts syndrome (RBS) and Cornelia de Lange syndrome (CdLS) have a constellation of phenotypes including craniofacial, heart, and gastrointestinal defects, poor growth, developmental delay, and intellectual disability.81 Both RBS and CdLS arise from mutations in cohesin‐associated genes. Altered gene expression is clearly an important part of the etiology of these diseases and contributes to the developmental defects observed. Emerging evidence suggests that cohesin and NIPBL may promote gene expression programs that support translation, making it worth considering translational defects as part of the etiology of these syndromes.82
Mutations in ESCO2 Cause RBS
RBS, an autosomal recessive disorder, is caused by mutations in both copies of ESCO2, a gene encoding a member of the acetyltransferase family.9 The mutations in ESCO2 cause either complete loss of the protein or loss of its acetyltransferase activity in patients.83 Mutations in ESCO1 have not yet been reported in humans, probably because they would be lethal. Affected individuals display a wide variety of malformations, including craniofacial deformities, hypoplastic nasal alae, cleft lip and palate, and reduction in digit number, bone length, or bone formation in both arms and legs.84 RBS newborns have a high mortality rate.85 Loss of Esco2 in mice is lethal, causing death of embryos at preimplantation and postimplantation stages.86, 87 Cytological abnormalities in RBS cells include aneuploidy, micronuclei, and heterochromatic repulsion. In early mitosis, loss of Esco2 also changes the chromosomal localization of cohesin and its protector SGO1.87 The heterochromatic repulsion observed in human RBS cells occurs at the pericentric domains and at the nucleolar organizer regions (NORs, which contain the rDNA),88 suggesting a defect in cohesion at these regions. Studies in yeast, zebrafish, and human cells show that mutation of ESCO2 or its homologs affects nucleolar organization, rRNA production, ribosome biogenesis, and protein translation.53, 54, 89 RBS cell lines show increased apoptosis, elevated p53, and reduced cell proliferation.54 Some of the phenotypes associated with RBS could therefore result from poor translation and reduced cell proliferation contributing to abnormal development during embryogenesis.
Mutations in NIPBL and Cohesin Cause CdLS
CdLS, also known as Brachmann–de Lange syndrome, is the most common of the ‘cohesinopathies,’ occurring in approximately 1 in 10,000 live births. CdLS is clinically heterogeneous and affects multiple aspects of development, with phenotypes ranging from mild intellectual disability to severe developmental and intellectual impairment.90 Affected individuals have craniofacial deformities, upper limb defects, hirsutism, gastroesophageal dysfunction, immune dysfunction, and growth and neurodevelopmental delay51, 81 (http://www.cdlsusa.org/what‐is‐cdls/).
More than half (∼65%) of CdLS cases arise from mutations in the NIPBL gene and are dominantly inherited.91, 92 Genotype‐phenotype correlation studies show that the most severe clinical features arise from deletions or truncations in NIPBL,90 indicating that CdLS is caused by haploinsufficiency. The fact that NIPBL mutations account for only a little more than half of CdLS cases prompted investigators to look for mutations in other genes with related functions. Subsequent genetic screens in large cohorts of CdLS patients without NIPBL mutations identified missense mutations or small in‐frame deletions in SMC1A as causative in about 5% of cases.93, 94, 95 Mutations in SMC3 were also identified.94 Since then, mutations in other cohesin‐associated genes (RAD21 and HDAC8) have been shown to give rise to CdLS or related disorders.20, 96
Even though mutations in cohesin and cohesin‐associated genes cause both syndromes, the underlying etiology of RBS and CdLS could differ.81 Indeed, mutations in different cohesin genes cause clinically distinct subtypes of CdLS. In contrast to RBS cells, most CdLS patient cells do not exhibit gross defects in chromosome segregation but primarily exhibit gene dysregulation.
Most of the gene expression changes observed in cells derived from CdLS patients or mouse models are modest (less than twofold), suggesting either that these small changes collectively produce the developmental features or that additional factors contribute to the etiology. Proteomic analysis of SMC1A‐ and SMC3‐mutant CdLS cell lines revealed dysregulation of proteins important for metabolism, cytoskeleton organization, and RNA processing, among others.97 Defects in rRNA production and protein synthesis in RBS and CdLS model organisms suggest that the changes in gene expression could lead to translational defects that contribute to the ‘cohesinopathies.’
COHESINOPATHIES AND TRANSLATION
Ribosomes Can Have Regulatory Capacity
Decades of research have shown that the development of single cells into complex organisms is regulated at the transcriptional, posttranscriptional, translational, and posttranslational levels. Translation in eukaryotes is an intricate and essential process requiring various factors, chief among them ribosomes. Ribosomes are large ribonucleoprotein (RNP) particles that translate mRNAs into proteins in the cytoplasm. Eukaryotic ribosomes consist of four ribosomal RNAs (25S [28S in mammals], 18S, 5.8S, and 5S rRNA) bound by about 80 ribosomal proteins, assembled into large (60S) and small (40S) ribosomal subunits. The 25S, 18S, and 5.8S rRNAs are transcribed by RNA polymerase I from the ribosomal DNA (rDNA) located in a specialized compartment called the nucleolus, whereas 5S rRNA is transcribed by RNA polymerase III. Complete synthesis and assembly of mature 40S and 60S subunits requires ribosomal proteins as well as additional modification and processing factors that are important for maturation of the rRNAs, transport of immature ribosomal subunits, stabilization of ribosome structure, and regulation of mRNA translation. Because ribosome biogenesis requires so many factors, it is coupled to cellular metabolism and can be regulated at many steps. Furthermore, because ribosomes are composed of dozens of proteins, they have the potential to regulate translation. Ribosomes can have both quantitative and qualitative effects on mRNA translation, and defects in either can result in a wide variety of pathological conditions. NIPBL and cohesin bind to the rDNA repeats and contribute to their organization into a nucleolar structure that produces sufficient rRNAs and ribosomes.36, 54 We speculate that the cohesinopathies may be caused in part by defects in translation.
Mutations that impair ribosome biogenesis cause developmental defects in various model organisms. Haploinsufficiency for ribosomal protein genes gives rise to ‘Minute’ flies,98 which are small and have short, thin bristles. The small size likely results from an overall defect in translation, whereas the bristle phenotype probably arises because bristle production places exceptional demands on translation. Developmental delay and bristle phenotypes are similarly observed in Drosophila bobbed mutants, which have fewer rDNA repeats. Together, these fly mutants demonstrate that producing large amounts of both ribosomal proteins and rRNAs is essential for normal development. In a remarkable demonstration of the regulatory potential of ribosomal proteins, loss of function of Rpl38 in the mouse is associated with a specific developmental defect, namely abnormal axial skeletal patterning, which results from a failure to translate a specific subset of homeobox (Hox) mRNAs.99 Thus, the ribosome itself can play a regulatory role in translation.
Ribosomopathies Are Human Diseases Caused by Defects in Ribosome Biogenesis
A growing number of disorders associated with defects in ribosome biogenesis and function, termed ribosomopathies, have also been identified. These diseases can be caused either by mutations in ribosomal protein genes or by mutations in genes involved in the processing or modification of ribosomal RNAs.
Several ribosomopathies are caused by haploinsufficiency for ribosomal protein genes. Diamond Blackfan anemia (DBA), for example, is caused by a mutation in one copy of a gene encoding a ribosomal protein, such as RPS19, RPL5, RPL35A, or RPS7, among others.100 Characteristic features of DBA include enhanced sensitivity of hematopoietic progenitors to apoptosis and craniofacial, cardiac, and limb abnormalities (http://dbafoundation.org). 5q− syndrome, another ribosomopathy, is a subtype of myelodysplastic syndrome with phenotypes similar to DBA, attributable to haploinsufficiency for ribosomal protein S14 (RPS14).101 Patients with 5q− syndrome have erythroid hypoplasia of the bone marrow, normal or reduced neutrophil counts, and high susceptibility to acute myeloid leukemia.102 In both DBA and 5q− syndrome patient cells, rRNA processing is compromised, affecting ribosome production. Mutations in SBDS, a gene needed for maturation of the 60S ribosomal subunit, cause Shwachman–Bodian–Diamond syndrome,103, 104 which is characterized by pancreatic exocrine insufficiency, hematologic dysfunction, and skeletal abnormalities. A novel ribosomopathy caused by X‐linked loss of function in RPL10 has recently been reported.105 Characteristic features of patients with this syndrome include microcephaly, growth retardation, hypotonia, and neurodevelopmental problems. Together, these diseases support the idea that the dosage of ribosomal proteins is important for proper ribosome function. Furthermore, loss of function in different ribosomal protein genes yields distinct syndromes, suggesting that each protein makes a unique contribution to translation.
The classification of X‐linked dyskeratosis congenita (DKC) as a ribosomopathy is controversial. DKC is characterized by nail dystrophy, reticular skin pigmentation, oral leukoplakia, bone marrow failure, increased susceptibility to cancer, and other skin abnormalities.106, 107 Some cases of DKC are caused by mutations in DKC1, which encodes dyskerin, the catalytic protein component of an RNP complex that, guided by H/ACA snoRNAs, converts uridines to pseudouridines in mRNAs and noncoding RNAs, including rRNAs and telomerase RNA. DKC can also be caused by mutations in other genes important for telomere biology, such as TERC, TERT, TINF2, NHP2, and NOP10. Hypomorphic Dkc1 mutant mice have reduced pseudouridylation of their rRNAs, and their cells show impaired translational fidelity as well as poor translation of mRNAs containing an internal ribosome entry site (IRES).108, 109 Whether these translational changes contribute to the etiology of DKC is still an open question.
Some cases of the ribosomopathy Treacher Collins syndrome (TCS) are caused by mutations in TCOF1, which encodes Treacle, a protein of the dense fibrillar component of nucleoli.110, 111 Additional cases are caused by mutations in genes that encode subunits shared by RNA polymerases I and III.112 Characteristic features of TCS include skeletal dysmorphism of the orbits, midface and zygomatic hypoplasia, microtia, and mandibular microretrognathia.113 Patients with cartilage‐hair hypoplasia, another ribosomopathy, have metaphyseal chondrodysplasia, short stature, immunodeficiency, anemia, gastrointestinal disorders, and hair abnormalities.114 This disease is caused by mutations in RMRP, the gene encoding the RNA component of RNase MRP, an RNP complex that processes pre‐rRNA. Thus, defects in rRNA production, whether in transcription, processing, or modification, have all been linked with defective ribosome biogenesis and human disease (reviewed in Ref 115).
Comparing and Contrasting Ribosomopathies and Cohesinopathies
Cohesinopathies and ribosomopathies share defects in ribosome biogenesis and protein synthesis.54, 89 Studies in yeast, zebrafish, and human cells show that mutations in the cohesin acetyltransferase ESCO2 (Eco1 in yeast) are associated with defects in rRNA production and protein synthesis, strongly suggesting that defective translation contributes to RBS.54, 89 Zebrafish models of CdLS also show reduced phosphorylation of translation biomarkers, as well as reduced protein synthesis and rRNA production, suggesting that defective translation could be a common consequence of mutations in cohesin‐associated genes.55 Perhaps not surprisingly, these two groups of syndromes also share some phenotypic characteristics. Cohesinopathies and ribosomopathies are both characterized by prenatal and postnatal growth retardation, microcephaly, seizures, craniofacial abnormalities, and limb deformities. However, there are also clear differences between them. For instance, cohesinopathies are not associated with anemia. Furthermore, animal models of DBA show elevated activity of the TOR pathway, a major control node for ribosome biogenesis, whereas the cohesinopathies show low TOR pathway activity. The reduced TOR activity in the cohesinopathies makes them good candidates for treatment with the TOR stimulator L‐leucine.55, 89
Despite their overlapping clinical presentations, these congenital diseases are distinct. Whereas most ribosomopathy probands are predisposed to cancer, no cancer susceptibility has been associated with the cohesinopathies.
Protein levels of the tumor suppressors p27 and p21 and of the antiapoptotic protein XIAP are reduced in DKC, whereas p53, a proapoptotic protein, is elevated in RBS and DBA patient cells.89, 116 The primary effects of the various mutations may trigger different pathways with different outcomes, which could account for the distinct clinical characteristics observed in patients. It is worth noting, however, that mutations in cohesin subunits have been implicated in acute myeloid leukemia and a wide range of tumor types.117, 118 How these mutations contribute to cancer remains to be determined.
Comparing and Contrasting the Cohesinopathies
Although the genes mutated in RBS and CdLS all contribute to cohesin function, the clinical manifestations of the two diseases are distinct. These differences could arise in part from the exact molecular defect caused by each mutation and from the cell types that the mutations affect. Cell types vary in gene expression, translational control, and requirements for protein synthesis. Cohesin has been shown to be important for maintaining a gene expression program that enables pluripotency in embryonic stem cells.49, 71 In addition, translation has been proposed to be critical for germline stem cell (GSC) maintenance in Drosophila. Translational repressors such as Pumilio (Pum) and Nanos (Nos) have been proposed to be important for GSC maintenance and to control germ cell development from Caenorhabditis elegans to humans.119, 120 Studies have shown that GSCs express higher levels of proteins involved in ribosome biogenesis and translation than differentiating cells, consistent with a high translational output. If mutations in cohesin can affect translational efficiency and fidelity, cells that are highly proliferative or otherwise have a high requirement for protein synthesis, such as GSCs, may be more acutely affected.
Another mechanism whereby defects in translation could contribute to cell‐type‐specific defects in the cohesinopathies is apoptosis. When the production of ribosomal proteins and rRNAs is unbalanced, p53 can be induced, followed by apoptosis. Cancer cells that overproduce ribosomal proteins are especially prone to apoptosis via this mechanism when treated with an RNA polymerase I inhibitor.121 Cell types with high levels of ribosomal proteins could therefore be primed for apoptosis when a cohesin mutation reduces rRNA levels.
Increased apoptosis in developing neural tissues has been observed in nipbl loss‐of‐function zebrafish.122 p53‐triggered apoptosis in neurons could contribute to the developmental abnormalities of the central nervous system observed in CdLS patients.
PERSPECTIVE
Although tremendous work has been done over the last decade to decipher the pathogenesis of the ‘cohesinopathies,’ more is still required to fully understand how mutations in cohesin contribute to human disease. Cohesin and its regulators likely mediate gene expression through multiple mechanisms, including the maintenance of nucleosome‐free regions and DNA looping. Several key regulators, including c‐MYC, p53, mTOR, and Hox genes, have been shown to be affected by mutations in cohesin. The role of cohesin in regulating translation was uncovered only recently and requires additional study. The body of work accumulating on ribosomopathies provides a helpful counterpoint for understanding how the regulation of translation by cohesin could contribute to human disease. By supporting gene expression that promotes translation, cohesin could provide a means of coupling chromosome structure with the translational output of cells. Understanding how cohesin cooperates with different proteins and pathways to regulate genome architecture, gene expression, and translation should enable us to more fully understand the molecular etiology of the ‘cohesinopathies.’