Andrew Flaus, David M A Martin, Geoffrey J Barton, Tom Owen-Hughes.
Abstract
The Snf2 family of helicase-related proteins includes the catalytic subunits of ATP-dependent chromatin remodelling complexes found in all eukaryotes. These act to regulate the structure and dynamic properties of chromatin and so influence a broad range of nuclear processes. We have exploited progress in genome sequencing to assemble a comprehensive catalogue of over 1300 Snf2 family members. Multiple sequence alignment of the helicase-related regions enables 24 distinct subfamilies to be identified, a considerable expansion over earlier surveys. Where information is known, there is a good correlation between biological or biochemical function and these assignments, suggesting Snf2 family motor domains are tuned for specific tasks. Scanning of complete genomes reveals all eukaryotes contain members of multiple subfamilies, whereas they are less common and not ubiquitous in eubacteria or archaea. The large sample of Snf2 proteins enables additional distinguishing conserved sequence blocks within the helicase-like motor to be identified. The establishment of a phylogeny for Snf2 proteins provides an opportunity to make informed assignments of function, and the identification of conserved motifs provides a framework for understanding the mechanisms by which these proteins function.
Year: 2006 PMID: 16738128 PMCID: PMC1474054 DOI: 10.1093/nar/gkl295
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Tree view of Snf2 family. (A) Schematic diagram illustrating hierarchical classification of superfamily, family and subfamily levels. (B) Unrooted radial neighbour-joining tree from a multiple alignment of helicase-like region sequences excluding insertions at the minor and major insertion regions from motifs I to Ia and conserved blocks C–K for 1306 Snf2 proteins identified in the Uniref database. The clear division into subfamilies is illustrated by wedge backgrounds, coloured by grouping of subfamilies. Subfamilies DRD1 and JBP2 were not clearly separated, as discussed in text. (C) In order to illustrate the relationship between subfamilies, a rooted tree was calculated using HMM profiles for full-length alignments of the helicase regions. Groupings of subfamilies are indicated by colouring as in (B).
Summary of subfamilies
| Subfamily | Archetype gene | Assigned from Uniref | Other names associated with members |
|---|---|---|---|
| Snf2 | | 117 | Snf2p, Sth1p, snf21, SMARCA4, BRG1, BAF190, hSNF2beta, SNF2L4, SMARCA2, hBRM, hSNF2a, SNF2L2, SNF2LA, SYD, splayed, psa-4, brahma |
| Iswi | | 83 | Isw1p, Isw2p, SMARCA1, SNF2L, SNF2L1, SNF2LB, SMARCA5, hSNF2H |
| Lsh | | 35 | YFR038W, SMARCA6, HELLS, LSH, PASG, DDM1, cha101 |
| ALC1 | | 19 | SNF2P |
| Chd1 | | 96 | CHD2, CHD-Z, hrp1, hrp3 |
| Mi-2 | | 88 | CHD3, Mi-2a, Mi2alpha, ZFH, PKL, pickle, CHD4, Mi-2b, Mi2beta, let-418, CHD5 |
| CHD7 | | 53 | CHD6, RIGB, KISH2, Kis-L, kismet, CHD8, HELSNF1, DUPLIN |
| Swr1 | | 44 | SRCAP, Snf2-related CBP activator protein, dom, domino, PIE1 |
| EP400 | | 27 | E1A binding protein p400, TNRC12, hDomino |
| Ino80 | | 34 | |
| Etl1 | | 44 | SMARCAD1, hHEL1, Fun30p, snf2SR |
| Rad54 | | 76 | Rad54l, hRAD54, RAD54A, Rdh54p, RAD54B, Tid1, okr, okra, mus-25 |
| ATRX | | 52 | XH2, XNP, Hp1bp2 |
| Arip4 | | 23 | ARIP4 |
| DRD1 | | 12 | |
| JBP2 | | 4 | |
| Rad5/16 | | 61 | rhp16, rad8, SMARCA3, SNF2L3, HIP116, HLTF, ZBU1, RNF80, RUSH-1alpha, P113, MUG13.1 |
| Ris1 | | 35 | |
| Lodestar | | 40 | LDS, TTF2, hLodestar, HuF2, factor 2 |
| SHPRH | | 44 | YLR247C |
| Mot1 | | 45 | TAFII170, TAF172, BTAF1, Hel89B |
| ERCC6 | | 71 | rad26, rhp26, CSB, csb-1, RAD26L |
| SSO1653 | | 149 | SsoRad54like |
| SMARCAL1 | | 54 | HARP, DAAD, ZRANB3, Marcal1 |
Listing subfamily name from prevailing protein name for first characterized member, archetype organism and official gene name, number of subfamily members identified in Uniref as in Materials and Methods, and a non-exhaustive list of alternative names for archetype and other subfamily members.
Subfamily occurrences in selected complete eukaryotic genomes
A: Counts by genome of predicted proteins assigned to each subfamily, for assignments based on highest positive bitscore for protein against each subfamily HMM profile in turn, where maximum bitscore >100. For a complete list of 54 genomes see Supplementary Table S1A. B: Counts of unique genes encoding the predicted proteins listed in part A. Single gene encoding each protein assumed for fungal genomes. Gene names for seven model organisms are tabulated in Supplementary Table S2, including official protein names where assigned.
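The assignment rule in part A can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the dictionary layout and the example scores are hypothetical, and only the "highest bitscore, maximum >100" rule comes from the table note.

```python
from typing import Optional

# Cutoff quoted in the table note (maximum bitscore >100).
BITSCORE_THRESHOLD = 100.0


def assign_subfamily(bitscores: dict[str, float],
                     threshold: float = BITSCORE_THRESHOLD) -> Optional[str]:
    """Return the subfamily whose HMM profile gave the highest bitscore,
    or None when the best score does not exceed the threshold."""
    if not bitscores:
        return None
    best = max(bitscores, key=bitscores.get)
    return best if bitscores[best] > threshold else None


def count_by_subfamily(proteins: dict[str, dict[str, float]]) -> dict[str, int]:
    """Tally assigned proteins per subfamily for one genome.

    `proteins` maps a protein identifier to its bitscores against each
    subfamily profile (hypothetical layout chosen for this sketch).
    """
    counts: dict[str, int] = {}
    for scores in proteins.values():
        subfamily = assign_subfamily(scores)
        if subfamily is not None:
            counts[subfamily] = counts.get(subfamily, 0) + 1
    return counts


# Hypothetical scores for three predicted proteins from one genome.
example_genome = {
    "protA": {"Snf2": 412.5, "Iswi": 230.1},
    "protB": {"Snf2": 198.0, "Iswi": 350.7},
    "protC": {"Snf2": 80.0, "Iswi": 64.2},  # below cutoff, left unassigned
}
genome_counts = count_by_subfamily(example_genome)
```

Part B would additionally need a protein-to-gene mapping so that several predicted proteins from one gene are counted once; that mapping is not described here and is left out of the sketch.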
Functional and sequence characteristics of subfamilies
| Grouping | Subfamily | Functional characteristics |
|---|---|---|
| Snf2-like | Snf2 | The archetype of the Snf2 subfamily, and the entire Snf2 family, is the |
| | | Close sequence homologues have also been identified in many model organisms, including the paralogue RSC ( |
| | | Homologues such as BRG1 and hBRM are components of megadalton-sized complexes containing other proteins that are also related to components of the yeast SWI/SNF complex ( |
| | ISWI | Iswi (Imitation |
| | | Iswi subfamily members participate in a variety of complexes and functional interactions. For example, human SNF2H has been found as part of RSF, hACF/WCRF, hWICH, hCHRAC, NoRC and also associated with cohesin, while SNF2L is the catalytic subunit of human NURF [summarized in ( |
| | Lsh | Despite its name, the archetypal mouse Lsh (lymphoid-specific helicase) protein ( |
| | | Lsh subfamily members are detected over a very broad range of eukaryotes including not only fungi, plants and animals, but also protists where their function is likely to be independent of DNA methylation. Furthermore, our genome scans also did not identify Lsh subfamily members in a number of lower animals, or in |
| | ALC1 | The ALC1 subfamily derives its name from the observation that the human gene is ‘Amplified in Liver Cancer’ ( |
| | Chd1 | The archetypal ‘Chd’ protein is mouse Chd1, named after the presence of ‘Chromodomain, Helicase and DNA binding’ motifs ( |
| | | Mouse Chd1 protein is the archetype of the first chromodomain-containing subfamily. Chd1 proteins have been purified as single subunits ( |
| | Mi-2 | The second of the chromodomain-containing subfamilies is Mi-2, whose name derives from the Mi-2α and Mi-2β proteins which are the commonly used names for the human CHD3 and CHD4 gene products, respectively. Mi-2 was isolated as an autoantigen in the human disease dermatomyositis ( |
| | | An additional human member of the Mi-2 subfamily, CHD5, may have a role in neural development and neuroblastomas ( |
| | CHD7 | The third chromodomain-containing CHD7 subfamily includes four human genes, CHD6–CHD9. CHD7 has recently been linked to CHARGE syndrome which is a common cause of congenital abnormalities ( |
| Swr1-like | Swr1 | The archetype of the Swr1 subfamily is Swr1p (SWI/SNF-related protein) from |
| | EP400 | The EP400 subfamily archetype, E1A-binding protein p400, appears to have a role in regulation of E1A-activated genes ( |
| | | The complex patterns of similarities and distinctions between EP400 and Swr1 subfamilies suggest a close functional relationship (Supplementary Figure S3A). Our HMM profiles clearly distinguish EP400 from Swr1 members, and show that EP400 members are restricted to vertebrates whereas Swr1 subfamily members are found in almost all eukaryotes ( |
| | Ino80 | The archetype of the Ino80 subfamily is the Ino80 protein from |
| | Etl1 | Mouse Etl1 (Enhancer Trap Locus 1) derives its name from identification in an expression screen for loci having interesting properties in early development ( |
| Rad54-like | Rad54 | The archetype of the Rad54 subfamily is the Rad54 protein from |
| | | Rad54 proteins have been extensively studied |
| | ATRX | The ATRX subfamily derives its name from the Alpha Thalassemia/Mental Retardation syndrome, X-linked genetic disorder caused by defects in the activity of the human member, ATRX ( |
| | Arip4 | Mouse androgen receptor interacting protein 4 ( |
| | DRD1 | |
| | JBP2 | The JBP2 subfamily takes its name from the |
| Rad54-like (cont.) | | Both DRD1 and JBP2 are involved in processes which target modifications at the C5 position of the pyrimidine ring which will be exposed in the major groove. JBP2 and DRD1 members show sequence similarity, but have been conservatively assigned to separate subfamilies due to their distinct evolutionary ranges and the limited numbers of members available for building HMM profiles. The identification of a DRD1 subfamily member in |
| Rad5/16-like | Rad5/16 | |
| | | Rad5p acts with the Ubc13p–Mms2p E2 ligase complex via its RING finger in one fate of the Rad6 pathway of replication linked DNA damage bypass to poly-ubiquitylate PCNA in ( |
| | | Rad16p acts in complex with Rad7p and Elc1p as the NEF4 nucleotide excision repair factor ( |
| | | Paradoxically, no DNA repair link has been reported for the single member of the Rad5/16 subfamily present in each mammalian genome such as human SMARCA3 (see also Lodestar and ERCC6 sections of this table). Instead, under the name RUSH1alpha, some have been reported as steroid regulated transcriptional regulators ( |
| | Ris1 | The Ris1 protein from |
| | Lodestar | This subfamily is the only one within the Rad5/16 grouping which does not contain RING fingers in the major insertion site (Figure S3B). |
| | SHPRH | SHPRH proteins derive their name from the ordered sequence of domains |
| | | Fungal SHPRH subfamily members typically do not contain the linker histone-related motif, although they do contain the PHD and RING finger domains. Animal SHPRH members contain an additional ∼50 kDa polypeptide sequence immediately upstream of the RING finger domain within the major insertion region (Supplementary Figure S3B). Although lacking an identifiable motif, this region has a number of cysteines suggestive of a zinc finger type coordination and has some 30% charged residues. Fungal members also contain a significant region of charged residues ahead of the RING finger |
| SSO1653-like | Mot1 | |
| | ERCC6 | Human ERCC6 protein ( |
| | | Most higher animal genomes contain three separate genes assigned to the ERCC6 subfamily, along with single Lodestar and Rad5/16 subfamily members. Conversely, fungal genomes typically encode a single ERCC6 member, no Lodestar subfamily member, but at least two Rad5/16 members. This may reflect divergent strategies for accomplishing transcription-coupled repair |
| | | A number of mutations in the helicase region which result in Cockayne syndrome have been identified and these map to interesting locations in the Snf2 family crystal structures ( |
| | SSO1653 | SSO1653, the sole Snf2 family member in archaeal |
| SSO1653-like (cont.) | | The SSO1653 subfamily helicase-like region also shows close linkage with a zinc finger SWIM motif that may bind to nucleic acids ( |
| Distant | SMARCAL1 | The human SMARCAL1 (SMARCA-Like 1) protein and homologues, also known as HARP ( |
| | | The second subtype is found in animal, plant and some protist genomes and contains SMARCAL1 subfamily members related in overall domain organization to the human ZRANB3 protein (Zinc finger, RAN-Binding domain containing 3). The helicase region is located at the N-terminus of the polypeptide, followed by an unusual zinc finger structure related to those found in Ran protein binding proteins ( |
| | rapA group | The rapA group includes some 220 eubacterial and archaeal members with significantly more sequence variation than other subfamilies. Subsets of sequences are qualitatively visible within multiple alignments of the rapA group, but initial attempts to distinguish them have been unreliable due to the variability of microbial sequences and non-homogeneous sampling in sequenced organisms (e.g. half of all complete bacterial genomes are for a limited range of firmicute and gamma proteobacterial genera) |
| | | Although the rapA group contains the conserved sequence patterns of the Snf2 family for the classical helicase-like motifs, other conserved blocks cannot be easily identified (Supplementary Figure S4). The characteristic extended span of at least 160 residues between helicase motifs III and IV ( |
| | | The rapA group also includes a number of polypeptides for which the helicase-like region comprises effectively the entire polypeptide, in contrast with other Snf2 family members which almost universally contain sequences outside the helicase-like region that are likely to form accessory domains or interaction surfaces |
| | | The only member of this group for which biological function has been investigated is |
Summaries of the known biochemical, biological and distinctive sequence characteristics of each subfamily. Background colouring of subfamilies for groupings as shown in Figure 1.
Figure 2. Conserved residues within Snf2 helicase-like region. Sequence logo of global multiple alignment of 1306 Snf2 helicase-like regions for alignment positions with residues in >90% of proteins. Helicase motifs are indicated in solid black boxes with roman numerals I–IV, additional conserved blocks are indicated in dashed black boxes with uppercase letters A–N, and conserved hydrophobic residues packing in the core of the structure by grey solid boxes. Motif and box labels as in Thoma et al. (47) with extensions. A comparison to other nomenclatures is in Supplementary Table S5. See Table 4 for actual distances between conserved blocks.
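The column selection described in the Figure 2 caption (alignment positions with residues in >90% of proteins) can be sketched as an occupancy filter. This is an illustrative sketch only: the function name, the gap characters and the toy alignment are assumptions, and the tool actually used to build the logo is not stated here.

```python
# Characters treated as gaps; this set is an assumption of the sketch.
GAP_CHARS = "-."


def occupied_columns(alignment: list[str], min_fraction: float = 0.9) -> list[int]:
    """Return 0-based indices of alignment columns in which more than
    `min_fraction` of the sequences carry a residue rather than a gap."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_sequences = len(alignment)
    keep = []
    for col in range(length):
        residues = sum(1 for row in alignment if row[col] not in GAP_CHARS)
        if residues / n_sequences > min_fraction:
            keep.append(col)
    return keep


# Toy alignment of four hypothetical sequences.
toy_alignment = ["MK-L", "MKAL", "M--L", "MK-L"]
kept = occupied_columns(toy_alignment)  # only fully occupied columns pass at 0.9
```

With only four sequences the 0.9 cutoff keeps just the fully occupied columns; on an alignment of 1306 sequences it would tolerate gaps in up to roughly 130 of them per column.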
Figure 3. Conserved blocks contribute to distinctive structural features of Snf2 family proteins. Structural components of Snf2 family proteins relevant to the conservation are illustrated on the zebrafish Rad54A structure [pdb 1Z3I (153)]. (A) core recA-like domains 1 and 2 including colouring of helicase motifs (I in green, Ia in blue, II in bright red, III in yellow, IV in cyan, V in teal and VI in dark red). (B) Q motif (pink). (C) antiparallel alpha helical protrusions 1 and 2 (red) projecting from recA-like domains 1 and 2, respectively. (D) Linker spanning from protrusion 1 to protrusion 2 (middle blue). (E) Major insertion region behind protrusion 2 (light green). (F) triangular brace (magenta). (G) Schematic diagram showing location of structural elements and helicase motifs coloured as in A–F, with conserved blocks from Figure 2 shown as white boxes. Spans identified by Pfam profiles SNF2_N and Helicase_C are shown flanking the major insertion site.
Spacings between helicase motifs and major conserved blocks by subfamily
Means and standard deviations (in parentheses) by subfamily for the number of amino acids between helicase motifs and the conserved blocks B, C and K, calculated for 1306 Snf2 family members from Uniref assigned to subfamilies (Table 1). Spacings are calculated from the edges of the core conserved residues of motifs as marked in Supplementary Figure S4.
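The per-subfamily summary described above can be sketched as follows. The coordinate convention (inclusive, 1-based block edges), the record layout and the use of the sample standard deviation are all assumptions made for this illustration; the note does not say which estimator was used.

```python
from collections import defaultdict
from statistics import mean, stdev


def spacing(end_of_first: int, start_of_second: int) -> int:
    """Number of residues strictly between two blocks, given the last
    residue of the first block and the first residue of the second."""
    return start_of_second - end_of_first - 1


def summarise(records):
    """Group spacings by subfamily and return {subfamily: (mean, sd)}.

    `records` is an iterable of (subfamily, end_of_first, start_of_second)
    tuples, a hypothetical layout chosen for this sketch.
    """
    grouped = defaultdict(list)
    for subfamily, end_first, start_second in records:
        grouped[subfamily].append(spacing(end_first, start_second))
    summary = {}
    for subfamily, values in grouped.items():
        # Sample standard deviation (assumption); undefined for one value,
        # so report 0.0 in that case.
        sd = stdev(values) if len(values) > 1 else 0.0
        summary[subfamily] = (mean(values), sd)
    return summary


# Hypothetical block edges for three proteins in two made-up subfamilies.
example_records = [("A", 10, 21), ("A", 10, 23), ("B", 5, 10)]
example_summary = summarise(example_records)
```

Swapping `stdev` for `statistics.pstdev` would give the population standard deviation instead, which matters mainly for small subfamilies such as JBP2.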