Identification of multiple distinct Snf2 subfamilies with conserved structural motifs.

Andrew Flaus, David M A Martin, Geoffrey J Barton, Tom Owen-Hughes.

Abstract

The Snf2 family of helicase-related proteins includes the catalytic subunits of ATP-dependent chromatin remodelling complexes found in all eukaryotes. These act to regulate the structure and dynamic properties of chromatin and so influence a broad range of nuclear processes. We have exploited progress in genome sequencing to assemble a comprehensive catalogue of over 1300 Snf2 family members. Multiple sequence alignment of the helicase-related regions enables 24 distinct subfamilies to be identified, a considerable expansion over earlier surveys. Where information is known, there is a good correlation between biological or biochemical function and these assignments, suggesting Snf2 family motor domains are tuned for specific tasks. Scanning of complete genomes reveals all eukaryotes contain members of multiple subfamilies, whereas they are less common and not ubiquitous in eubacteria or archaea. The large sample of Snf2 proteins enables additional distinguishing conserved sequence blocks within the helicase-like motor to be identified. The establishment of a phylogeny for Snf2 proteins provides an opportunity to make informed assignments of function, and the identification of conserved motifs provides a framework for understanding the mechanisms by which these proteins function.

Year: 2006 PMID: 16738128 PMCID: PMC1474054 DOI: 10.1093/nar/gkl295

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Some 15 years ago Gorbalenya and Koonin (1,2) identified a large group of proteins sharing a series of short ordered motifs. The majority of members with known function were nucleic acid strand separating helicases so the sequences became known as helicase motifs and were labelled sequentially I, Ia, II, III, IV, V and VI. A number of additional conserved blocks with broad distributions within these helicase-like proteins have subsequently been identified, such as the TxGx (3) and Q motifs (4). Proteins containing the helicase motifs are subdivided into several superfamilies on the basis of similarity. Structural characterizations have revealed that helicase-like superfamilies 1 and 2 (SF1 and SF2) are related with a common core of two recA-like domains (5). The helicase-like enzymes link ATP hydrolysis to a directed change in the relative orientation of these domains (6). Structural and mutagenesis studies have shown that each of the conserved motifs in the active site cleft between the recA-like domains plays a role in the transformation of chemical energy from ATP hydrolysis to mechanical motion. This enzymatic process has been suggested to represent one application of a more general mechanism used in many proteins containing a recA-like domain (7). Proteins with a helicase-like region of similar primary sequence to Saccharomyces cerevisiae Snf2p comprise the Snf2 family within SF2 (Figure 1A). Indeed, Snf2p was specifically aligned within SF2 by Gorbalenya and Koonin (1). Many of the first identified Snf2 family members were ATPases within chromatin remodelling complexes and it was recognized that the presence of a core polypeptide related to Snf2p is a defining property of ATP-dependent chromatin remodelling (8). It is now apparent that the Snf2 family comprises a large group of ATP-hydrolysing proteins that are ubiquitous in eukaryotes, but also present in eubacteria and archaea.

Figure 1

Tree view of Snf2 family. (A) Schematic diagram illustrating hierarchical classification of superfamily, family and subfamily levels. (B) Unrooted radial neighbour-joining tree from a multiple alignment of helicase-like region sequences excluding insertions at the minor and major insertion regions from motifs I to Ia and conserved blocks C–K for 1306 Snf2 proteins identified in the Uniref database. The clear division into subfamilies is illustrated by wedge backgrounds, coloured by grouping of subfamilies. Subfamilies DRD1 and JBP2 were not clearly separated, as discussed in text. (C) In order to illustrate the relationship between subfamilies, a rooted tree was calculated using HMM profiles for full-length alignments of the helicase regions. Groupings of subfamilies are indicated by colouring as in (B).

At least a subset of Snf2 family proteins act as ATP-dependent DNA translocases (9–12). Some of these proteins have also been found to be capable of generating unconstrained superhelical torsion in DNA (11,13–17), proposed to occur as a result of the translocation of DNA into constrained loops. This is substantiated by recent analysis of the action of RSC on single DNA molecules (18). In addition to distorting DNA, the ATP-dependent action of these proteins can disrupt chromatin as measured using a range of different assays (8), although other DNA–protein interactions can also be affected. For example, Rad54 promotes Rad51-dependent strand pairing (13), and Mot1 displaces the TATA-binding protein (TBP) from DNA (19). Thus although many Snf2 family proteins are likely to act to alter chromatin structure, this is not the case for all members of the family. Early biochemical studies and sequence alignments suggested that members of the Snf2 family could be further subdivided into a number of subfamilies (20). These subfamilies have traditionally taken the name of the archetypal member, such as S.cerevisiae Snf2p (Snf2 subfamily), Drosophila melanogaster Iswi (Iswi subfamily), Mus musculus Chd1 (Chd subfamily) and S.cerevisiae Rad54p (Rad54 subfamily). Snf2p, therefore, lends its name to both the collective Snf2 family and a specific Snf2 subfamily (Figure 1A). The only comprehensive analysis of the Snf2 family sequences to date was performed by Eisen et al. in 1995 (20). Although it has subsequently been revisited within the context of various biochemical studies (21–24), no broad survey has been conducted for over a decade. 

To gain new insights into the Snf2 family, we have catalogued Snf2 family members by scanning for proteins containing spans with similarity in sequence over the helicase-like region, classifying them into subfamilies, analysing the distribution of these subfamilies in complete genomes, and mapping the common sequence characteristics onto the newly available three-dimensional structures. We have identified 24 distinct subfamilies, 11 with near-ubiquitous representation in eukaryotic genomes. Many of these subfamilies correlate with known biological function, but there remain a significant number for which little information is currently available. The abundance of Snf2 family members in eukaryotes in comparison to archaea and eubacteria points to their diversification early in eukaryote radiation. This diversity and the currently known functional linkages suggest the Snf2 family helicase-like region is specifically adapted to perform distinct functions within different subfamilies. Underlying this, analysis of the conserved blocks of residues reveals a common core of structural features likely to be fundamental to the mechanism of the Snf2 family motors.

MATERIALS AND METHODS

Data and software sources

Swissprot/Uniprot (25) release 42 and Uniref100 (26) release 5 were downloaded from the European Bioinformatics Institute. The sources and version details for the predicted protein complements of the 54 eukaryotic, 24 archaeal and 269 bacterial organisms surveyed are available at the project web server. Analyses were performed on a cluster of dual Pentium III microcomputers running a customized Debian Linux operating system. Sequence data were manipulated with the EMBOSS suite version 2.8 (27). Multiple sequence alignments were created with Muscle version 3.0 (28) and MAFFT version 5.667 with parameters retree = 2 and maxiterate = 1000 (29) and visualized with Jalview version 2 (30). Phylogenetic and pairwise trees were constructed with neighbor, protdist and drawtree from the PHYLIP suite version 3.572 (31) and additionally visualized with ATV version 2.03 (32) and Hypertree version 1.0.0 (33). Hidden Markov model (HMM) construction, calibration and searching were performed with the HMMer suite versions 2.1.1 and 2.2g (34), and pairwise comparison of HMMs was carried out with PRC version 1.5.3 in global-global mode (35). Sequence LOGOs were generated with WebLogo version 2.8.2 (36) and profile logos were generated with logomat-p using the draw_logo method, version 0.71 (37). Protein structures were visualized with PyMol version 0.99 (38). Data were managed using MySQL and PostgreSQL relational databases. Calculations were carried out using default parameters except where indicated. All other analyses used custom Perl or Python scripts written by the authors. All supplementary data for this report and an interactive database of the results are publicly available at the project web server.
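
The MAFFT settings quoted above (retree = 2, maxiterate = 1000) can be written as a command line. The following is a minimal sketch in Python that only assembles the argument list. It assumes the `--retree` and `--maxiterate` flags of current MAFFT releases, which may not match the exact invocation used with version 5.667; the function name and the input file name are hypothetical.

```python
def mafft_command(input_fasta, retree=2, maxiterate=1000):
    """Build an argument list for a MAFFT run with the parameters named in the text.

    Assumption: flag spellings follow current MAFFT releases. MAFFT writes the
    alignment to standard output, so the caller is expected to redirect it.
    """
    return [
        "mafft",
        "--retree", str(retree),
        "--maxiterate", str(maxiterate),
        input_fasta,
    ]


# Hypothetical input file name, for illustration only.
print(" ".join(mafft_command("snf2_helicase_regions.fasta")))
```

The list form can be passed directly to `subprocess.run` on a machine where MAFFT is installed, with standard output captured or redirected to a file.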

Experimental methods

A full technical description of the library construction and validation and details of the web server will be available elsewhere (D.M.A. Martin and A. Flaus, manuscript in preparation). The procedure used is summarized in outline below.

Global Snf2 family HMM construction

Twenty-eight biochemically characterized S.cerevisiae Snf2p-like chromatin remodelling proteins or close homologues were selected as seed sequences. The core helicase-like region spanning from 50 amino acids N-terminal to helicase motif I to 50 amino acids C-terminal of helicase motif VI was excised from each protein sequence and multiple alignments were created with Muscle using default parameters. An initial seed HMM was constructed following manual assessment of the alignment. Swissprot 42 was searched using this profile, expanding the set of Snf2 family sequences to 620 candidates with matches up to E-values of 2, although some matches near this cut-off were helicase-like sequences which were not members of the Snf2 family. Further iterations of HMM construction and searching of Swissprot 42 and model organism databases, followed by curation of sequences to remove fragmented sequences and other artefacts not belonging to the Snf2 family, yielded a set of 948 manually curated sequences which were then aligned by MAFFT. The resultant profile was employed in searching Uniref100 and identified 5046 sequences with a match of E-value 10 or better, of which 3932 had E-values below 1 and 1879 had positive bit-scores. This cut-off may appear generous but was intentional to enable maximum possible inclusion of Snf2 family members. It is necessitated by the considerable variation in sequence between helicase motifs III and IV, which gives rise to poor alignments to the general model in this region and consequently lower bit scores. The cut-offs selected were determined by manual inspection of the hit lists and alignments to include established Snf2 family-like proteins but exclude more distant relatives. The highest E-value for a sequence classified into a subfamily (see below) was 2.2 × 10⁻¹⁶.
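
Two steps of this procedure lend themselves to a short illustration: excising the helicase-like region with 50-residue flanks, and tallying hits against the E-value and bit-score cut-offs. The sketch below is not the authors' code. It assumes that the motif I start and motif VI end coordinates are already known (0-based, end-exclusive) and that hits are available as (E-value, bit score) pairs; all function names and the toy values are hypothetical.

```python
def excise_helicase_region(sequence, motif_i_start, motif_vi_end, flank=50):
    """Return the span from `flank` residues before motif I to `flank` after motif VI.

    Coordinates are 0-based and end-exclusive (an assumption of this sketch).
    The flanks are clipped at the sequence termini.
    """
    if not 0 <= motif_i_start < motif_vi_end <= len(sequence):
        raise ValueError("motif coordinates must lie within the sequence, in order")
    start = max(0, motif_i_start - flank)
    end = min(len(sequence), motif_vi_end + flank)
    return sequence[start:end]


def summarise_hits(hits):
    """Count hits at the three thresholds quoted in the text.

    `hits` is an iterable of (evalue, bitscore) pairs.
    """
    hits = list(hits)
    return {
        "evalue_le_10": sum(1 for e, _ in hits if e <= 10.0),
        "evalue_lt_1": sum(1 for e, _ in hits if e < 1.0),
        "positive_bitscore": sum(1 for _, b in hits if b > 0.0),
    }


# Toy protein: 60 N-terminal residues, a 100-residue core, 70 C-terminal residues.
toy = "N" * 60 + "H" * 100 + "C" * 70
region = excise_helicase_region(toy, motif_i_start=60, motif_vi_end=160)
print(len(region))

# Toy hit list; the values are invented for illustration.
print(summarise_hits([(2.2e-16, 350.0), (0.5, 12.0), (3.0, -4.0), (25.0, -30.0)]))
```

Clipping the flanks at the termini is a design choice of the sketch; the text does not say how proteins with fewer than 50 flanking residues were handled.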

Phylogenetic analysis of helicase-like sequences

A multiple sequence alignment with Muscle, followed by distance matrix calculation and neighbour-joining tree reconstruction allowed the curation of 2305 sequences into subgroupings in the Snf2 family based on the sequence of the helicase-like region. 1306 sequences were classified in 24 individual subfamilies within the Snf2 family (Table 1) after the exclusion of 436 which were fragmentary (did not span completely from helicase motifs I to VI) or contained unique large inserts or deletions. A further 220 were assigned to the prokaryotic rapA group and the remainder to more distantly related clusters or as highly truncated outliers which could not be reasonably aligned. An overview of a neighbour-joining tree constructed from a multiple alignment excluding the variable minor and major insertion regions (see text) and visualized using Hypertree demonstrates the clearly distinguishable division of subfamilies (Figure 1B). Each subfamily was individually examined for redundancy and further internal structure (data not shown).

Table 1

Summary of subfamilies

Subfamily	Archetype gene	Assigned from Uniref	Other names associated with members
Snf2	S.cerevisiae SNF2	117	Snf2p, Sth1p, snf21, SMARCA4, BRG1, BAF190, hSNF2beta, SNF2L4, SMARCA2, hBRM, hSNF2a, SNF2L2, SNF2LA, SYD, splayed, psa-4, brahma
Iswi	D.melanogaster Iswi	83	Isw1p, Isw2p, SMARCA1, SNF2L, SNF2L1, SNF2LB, SMARCA5, hSNF2H
Lsh	M.musculus Hells	35	YFR038W, SMARCA6, HELLS, LSH, PASG, DDM1, cha101
ALC1	Homo sapiens CHD1L	19	SNF2P
Chd1	M.musculus Chd1	96	CHD2, CHD-Z, hrp1, hrp3
Mi-2	H.sapiens CHD3	88	CHD3, Mi-2a, Mi2alpha, ZFH, PKL, pickle, CHD4, Mi-2b, Mi2beta, let-418, CHD5
CHD7	H.sapiens CHD7	53	CHD6, RIGB, KISH2, Kis-L, kismet, CHD8, HELSNF1, DUPLIN
Swr1	S.cerevisiae SWR1	44	SRCAP, Snf2-related CBP activator protein, dom, domino, PIE1
EP400	H.sapiens EP400	27	E1A binding protein p400, TNRC12, hDomino
Ino80	S.cerevisiae INO80	34
Etl1	M.musculus Smarcad1	44	SMARCAD1, hHEL1, Fun30p, snf2SR
Rad54	S.cerevisiae RAD54	76	Rad54l, hRAD54, RAD54A, Rdh54p, RAD54B, Tid1, okr, okra, mus-25
ATRX	H.sapiens ATRX	52	XH2, XNP, Hp1bp2
Arip4	M.musculus Srisnf2l	23	ARIP4
DRD1	Arabidopsis thaliana DRD1	12
JBP2	T.brucei JBP2	4
Rad5/16	S.cerevisiae RAD5, RAD16	61	rhp16, rad8, SMARCA3, SNF2L3, HIP116, HLTF, ZBU1, RNF80, RUSH-1alpha, P113, MUG13.1
Ris1	S.cerevisiae RIS1	35
Lodestar	D.melanogaster Lodestar	40	LDS, TTF2, hLodestar, HuF2, factor 2
SHPRH	H.sapiens SHPRH	44	YLR247C
Mot1	S.cerevisiae MOT1	45	TAFII170, TAF172, BTAF1, Hel89B
ERCC6	H.sapiens ERCC6	71	rad26, rhp26, CSB, csb-1, RAD26L
SSO1653	S.solfataricus SSO1653	149	SsoRad54like
SMARCAL1	H.sapiens SMARCAL1	54	HARP, DAAD, ZRANB3, Marcal1

Listing subfamily name from prevailing protein name for first characterized member, archetype organism and official gene name, number of subfamily members identified in Uniref as in Materials and Methods, and a non-exhaustive list of alternative names for archetype and other subfamily members.
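
As a consistency check, the per-subfamily counts in Table 1 can be summed and compared with the 1306 classified sequences reported in the text. The sketch below copies the counts from the table; the variable names are illustrative choices.

```python
# Counts from the "Assigned from Uniref" column of Table 1.
TABLE1_COUNTS = {
    "Snf2": 117, "Iswi": 83, "Lsh": 35, "ALC1": 19, "Chd1": 96, "Mi-2": 88,
    "CHD7": 53, "Swr1": 44, "EP400": 27, "Ino80": 34, "Etl1": 44, "Rad54": 76,
    "ATRX": 52, "Arip4": 23, "DRD1": 12, "JBP2": 4, "Rad5/16": 61, "Ris1": 35,
    "Lodestar": 40, "SHPRH": 44, "Mot1": 45, "ERCC6": 71, "SSO1653": 149,
    "SMARCAL1": 54,
}

n_subfamilies = len(TABLE1_COUNTS)
total_assigned = sum(TABLE1_COUNTS.values())

print(n_subfamilies)   # 24 subfamilies
print(total_assigned)  # 1306 sequences, matching the text
```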

Construction and use of subfamily profiles

Each subfamily sequence set was realigned by MAFFT, manually curated with Jalview and an HMM profile constructed. These profiles were aligned with PRC in an all-against-all comparison. Although these profile comparisons do not give a true phylogenetic tree, the scores obtained from the pairwise profile alignments can be used to construct a representational tree (Figure 1C), indicating the relationship between the HMM profiles to be consistent with the sequence-based tree (Figure 1B). It was also observed that the subfamilies could be aggregated into some broad groupings that correlate with functional properties, where known (Figure 1). All 24 subfamily profiles were combined in one HMM library and the hmmpfam application employed in searching individual genomic datasets to provide phylogenomic information about the taxonomic distribution of Snf2-like proteins. The subfamily hit with maximal bitscore >100 was used to assign membership in a semi-automatic procedure. With very few exceptions, classification was extremely clear with strong discrimination between the top hit and the second best hit (data not shown).
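
The assignment rule described here (take the subfamily profile with the highest bitscore, and accept it only if that score exceeds 100) can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: it assumes the per-profile bitscores for one protein have already been parsed from the hmmpfam output into a dictionary, and the function name and example scores are hypothetical. The margin over the second-best hit is returned as well, since the text uses that discrimination as a measure of how clear the classification is.

```python
def assign_subfamily(bitscores, min_bitscore=100.0):
    """Assign a protein to the subfamily whose profile gives the top bitscore.

    `bitscores` maps subfamily name -> bitscore for one protein.
    Returns (subfamily or None, margin), where margin is the gap between the
    best and second-best scores. The assignment is None unless the top score
    is strictly greater than `min_bitscore`.
    """
    if not bitscores:
        return None, 0.0
    ranked = sorted(bitscores.items(), key=lambda item: item[1], reverse=True)
    best_name, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    margin = best_score - runner_up
    if best_score > min_bitscore:
        return best_name, margin
    return None, margin


# Invented scores for illustration only.
print(assign_subfamily({"Snf2": 812.4, "Iswi": 455.0, "Chd1": 430.2}))
print(assign_subfamily({"Rad54": 80.0, "ATRX": 60.0}))
```

A small margin would flag a protein for manual inspection, in keeping with the semi-automatic nature of the procedure.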

RESULTS AND DISCUSSION

Starting from a seed set of helicase-like region sequences from 28 demonstrated Snf2p-related proteins or close homologues, we have carried out a broad survey of Snf2 family proteins. This was achieved by iterative cycles of manual curation of multiple alignments and neighbour-joining trees to identify Snf2 proteins by similarity, construction of an HMM profile from the multiple alignments of identified proteins, and scanning of global and model organism protein databases using the HMM profile to uncover further sequences for curation. Our current global Snf2 family profile scan revealed 3932 sequences with E-value under 1 (1879 with bitscore > 0) in Uniref100 release 5 [2.4 million entries (26)]. Of these, 1306 sequences were identified as belonging to the Snf2 family and to span the full helicase region from motifs I to VI without introducing large unique insertions or deletions. A further 220 sequences fall within the rapA group, while other hits appear to belong to more distantly related groups (see below) or were too highly truncated to be aligned. Neighbour-joining trees from multiple alignments of the set of 1306 sequences revealed a well-defined branching structure (Figure 1B) and enabled their assignment to 24 distinct subfamilies (Table 1). Subfamily-specific HMM profiles were constructed from these assignments and used to characterize the Snf2 family complement for 54 complete eukaryote genomes. The counts of predicted proteins and unique encoding genes for 21 selected genomes are listed in Table 2, part A and B, respectively (see Supplementary Table S1A for full analysis of eukaryotic genomes, and Supplementary Table S2 for gene IDs by subfamily for seven common model organisms). In addition, 24 complete archaeal and 269 bacterial genomes were scanned (Supplementary Tables S1B and S1C).

Table 2

Subfamily occurrences in selected complete eukaryotic genomes

A: Counts by genome of predicted proteins assigned to each subfamily, for assignments based on highest positive bitscore for protein against each subfamily HMM profile in turn, where maximum bitscore >100. For a complete list of 54 genomes see Supplementary Table S1A. B: Counts of unique genes encoding the predicted proteins listed in part A. Single gene encoding each protein assumed for fungal genomes. Gene names for seven model organisms are tabulated in Supplementary Table S2, including official protein names where assigned.

Subfamilies within the Snf2 family

The clear distinction and significant number of subfamilies based on the helicase-like region (Figure 1B and Table 1) reflects both a remarkable breadth and specificity in the Snf2 family. An additional level of similarity distinguishes apparent groupings of subfamilies (Figure 1), which echo current understanding of their functional diversity (Table 3). Most of the best studied Snf2 family proteins fall into a grouping of ‘Snf2-like’ subfamilies including proteins such as S.cerevisiae Snf2p, D.melanogaster Iswi, mouse Chd1 and human Mi-2, which are core subunits of the well-known ATP-dependent chromatin remodelling complexes. A separate ‘Swr1-like’ grouping encompasses the Swr1, Ino80, EP400 and Etl1 subfamilies. The ‘Rad54-like’ grouping contains the Rad54 subfamily, relatives such as ATRX and Arip4, and also includes the recently recognized DRD1 and JBP2 proteins. A further, unexpected, ‘Rad5/16-like’ grouping links several poorly studied subfamilies, three of which contain RING finger insertions within the helicase-like region (see below). The ‘SSO1653-like’ grouping of Mot1, ERCC6 and SSO1653 is notable because all three subfamilies are thought to have non-chromatin substrates. Finally, we have labelled SMARCAL1 proteins as ‘distant’ because they lack several otherwise conserved sequence hallmarks of the Snf2 family (see below). Although some groupings are clear, further investigations will be required to verify those where the boundaries are less distinct.

Table 3

Functional and sequence characteristics of subfamilies

Grouping	Subfamily	Functional characteristics
Snf2-like	Snf2	The archetype of the Snf2 subfamily, and the entire Snf2 family, is the S.cerevisiae Snf2 protein, originally identified genetically (mutations that were Sucrose Non Fermenting or defective in mating type SWItching). However, these genes were later found to play roles in regulating transcription of a broader spectrum of genes and to catalyse alterations to chromatin structure. Subsequently, the proteins were purified as a non-essential 11 subunit multi-protein complex capable of ATP-dependent chromatin disruption termed the SWI/SNF complex (8)
		Close sequence homologues have also been identified in many model organisms, including the paralogue RSC (60,61) and the orthologues D.melanogaster Brahma (62), and human hBRM (63) and BRG1 (64). Many of these have been shown to alter the structure of chromatin at the nucleosomal level and to be involved in transcription regulation, although other nucleosome-related roles have also been identified (8). Recent hypotheses have centred on Snf2 subfamily members performing a generally disruptive function on nucleosomes, leading either to sliding of the nucleosome (65,66), to alterations of histone–DNA contacts (67) or to partial or complete removal of the histone octamer components (68,69)
		Homologues such as BRG1 and hBRM are components of megadalton-sized complexes containing other proteins that are also related to components of the yeast SWI/SNF complex (70). However, Snf2 subfamily members have also been reported to interact with additional proteins including histone deacetylases (71), methyl DNA-binding proteins (72), histone methyl transferases (73), the retinoblastoma tumor suppressor protein (74,75), histone chaperones (76), Pol II (77,78) and cohesin (79). These complexes may be recruited to specific regions of the genome through interactions with sequence-specific DNA-binding proteins [reviewed by (80)] or specific patterns of histone modifications (81,82)
	Iswi	Iswi (Imitation SWI2) protein was identified in D.melanogaster by similarity to Snf2p (83) and is at the catalytic core of both the NURF and the ACF/CHRAC chromatin remodelling complexes (84–86). Biochemical studies favour the ability of Iswi proteins to reposition rather than disrupt nucleosomes. Significantly, all Iswi subfamily proteins require a particular region of the histone H4 tail near the DNA surface as an allosteric effector (87–89)
		Iswi subfamily members participate in a variety of complexes and functional interactions. For example, human SNF2H has been found as part of RSF, hACF/WCRF, hWICH, hCHRAC, NoRC and also associated with cohesin, while SNF2L is the catalytic subunit of human NURF [summarized in (90)]. Such complexes are involved in a variety of functions including activation/repression of the initiation and elongation of transcription, replication and chromatin assembly [reviewed in (90–93)]. Similar to the Snf2 subfamily, Iswi subfamily members appear to be adaptable subunits for complexes related to the alteration of nucleosome positioning (90)
	Lsh	Despite its name, the archetypal mouse Lsh (lymphoid-specific helicase) protein (94) is widely expressed and without detectable helicase activity. Lsh and its human homologue are alternatively known as PASG, SMARCA6 or by the official gene name HELLS (Helicase Lymphoid Specific). Mutants lead to premature aging with cells exhibiting replicative senescence (95). Importantly, global loss of CpG methylation is observed in both mammalian mutant cell lines and the Arabidopsis thaliana homologue, DDM1 (96,97). Consistent with a direct role in DNA methylation, Lsh is localized to heterochromatic regions (98). Evidence has been presented that A.thaliana DDM1 can slide nucleosomes in vitro (99). The S.cerevisiae subfamily member at locus YFR038W has no assigned name and deletion strains are viable
		Lsh subfamily members are detected over a very broad range of eukaryotes including not only fungi, plants and animals, but also protists where their function is likely to be independent of DNA methylation. Furthermore, our genome scans also did not identify Lsh subfamily members in a number of lower animals, or in S.pombe. This may represent functional redundancy or difficulties in assigning distant homologues relative to other subfamilies in the grouping
	ALC1	The ALC1 subfamily derives its name from the observation that the human gene is ‘Amplified in Liver Cancer’ (100). Two alternative but potentially confusing names, CHD1L [CHD1-like (101)] and SNF2P [SNF2-like in plants (102)], have also been used to refer to subfamily members. ALC1 subfamily members contain a helicase-like region which is relatively similar to the nucleosome-active Snf2, Iswi and Chd1 subfamilies, but which is coupled ahead to a macro domain implicated in ADP–ribose interactions (103). ALC1 subfamily members can be identified in both higher animals and plants, but not in lower animals
	Chd1	The archetypal ‘Chd’ protein is mouse Chd1, named after the presence of ‘Chromodomain, Helicase and DNA binding’ motifs (104). The characteristic chromodomain motifs can in principle bind diverse targets including proteins, DNA and RNA (105). Although Snf2 family proteins containing chromodomains are often referred to as a single ‘Chd’ subfamily, it has previously been recognized they fall into the same three distinct subfamilies (106,107) which we have distinguished in this analysis
		Mouse Chd1 protein is the archetype of the first chromodomain-containing subfamily. Chd1 proteins have been purified as single subunits (108,109) although associations between Chd1 and other proteins have been identified subsequently (110,111). Yeast Chd1 has been implicated in transcription elongation and termination (58,112), and the human CHD1 and CHD2 proteins and D.melanogaster dChd1 have been linked to transcriptional events (113). S.pombe Chd1 subfamily member hrp1 (helicase related protein 1) has been linked with both transcription termination (58) and chromosome condensation (114), whereas the paralogous hrp3 has been linked with locus-specific silencing (115)
	Mi-2	The second of the chromodomain-containing subfamilies is Mi-2, whose name derives from the Mi-2α and Mi-2β proteins which are the commonly used names for the human CHD3 and CHD4 gene products, respectively. Mi-2 was isolated as an autoantigen in the human disease dermatomyositis (116). Subsequently, the two proteins and their homologues in D.melanogaster and Xenopus have been recognized as core subunits of NuRD complexes which link DNA methylation to chromatin remodelling and deacetylation (117). The chromodomains in D.melanogaster Mi-2 are required for activity on nucleosome substrates (118). Human Mi-2α differs from Mi-2β principally by its additional C-terminal domain which directs complexes containing it for a specific transcriptional repression role (119). Since Mi-2 proteins are widely expressed but have specific roles, it has been suggested they may be directed by incorporation of different targeting subunits (117,120)
		An additional human member of the Mi-2 subfamily, CHD5, may have a role in neural development and neuroblastomas (121,122) although its biochemical associations are unknown. The A.thaliana subfamily member, PKL (swollen roots of mutants resemble a pickle), has been shown to play a role in repressing embryonic genes during plant development (123)
	CHD7	The third chromodomain-containing CHD7 subfamily includes four human genes, CHD6–CHD9. CHD7 has recently been linked to CHARGE syndrome which is a common cause of congenital abnormalities (124) with most linked mutations resulting in major nonsense, frameshift or splicing changes (125). There is little functional information available about CHD6 [originally known as CHD5 (126)], CHD8 or CHD9 [also known as CReMM (127)]. The most studied member of the CHD7 subfamily is the product of the D.melanogaster gene kismet. The enormous 574 kDa KIS-L (but not the ‘smaller’ 225 kDa KIS-S form) contains a Snf2 family helicase-like region (128). Although identified as a trithorax family gene acting during development, a recent report suggests that KIS-L may play a global role at an early stage in RNA pol II elongation (129)
Swr1-like	Swr1	The archetype of the Swr1 subfamily is Swr1p (SWI/SNF-related protein) from S.cerevisiae which is part of the large SWR1 complex that exchanges histone H2A.Z-containing for wild-type H2A-containing dimers (41–43,130). Three other characterized proteins belong to the subfamily: PIE1 is involved in the A.thaliana vernalization regulation [Photoperiod-Independent Early flowering (131)] through a pathway intimately linked with histone lysine methylation events (132). D.melanogaster Domino is an essential, development-linked protein (133) and alleles can suppress position effect variegation, implying they may be linked to heterochromatin functions. Domino participates in a complex that combines components of the homologous yeast SWR1 and NuA4 complexes (134), and has been shown to function in acetylation dependent histone variant exchange within the TRRAP/TIP60 complex (135). The human member SRCAP [Snf2p-related CBP activating protein (136)] acts as a transcriptional co-activator of steroid hormone dependent genes and has recently been shown to be a component of a human TRRAP/TIP60 complex (137) and other complexes (see EP400 below). It also interacts with several coactivators including CBP (136,138). SRCAP can rescue D.melanogaster domino mutants (139), implying functional homology
	EP400	The EP400 subfamily archetype, E1A-binding protein p400, appears to have a role in regulation of E1A-activated genes (140–142). EP400 has been shown to interact strongly with ruvB-like helicases TAP54α/β in the TRRAP/TIP60 histone acetyl transferase complex (142)
		The complex patterns of similarities and distinctions between EP400 and Swr1 subfamilies suggest a close functional relationship (Supplementary Figure S3A). Our HMM profiles clearly distinguish EP400 from Swr1 members, and show that EP400 members are restricted to vertebrates whereas Swr1 subfamily members are found in almost all eukaryotes (Table 2 and Supplementary Table S1A). Although most vertebrate genomes contain a gene each for an Swr1 and EP400 member, some have only one of the pair. The consensus for the helicase-like regions of the subfamilies shows 50% identity and animal members of both contain large proline and serine/threonine rich insertions at the major insertion site (Supplementary Figure S3A; see below). Members of the two subfamilies also contain overlapping combinations of accessory domains outside the helicase-like region: D.melanogaster DominoA (Swr1 subfamily) and human EP400 (EP400 subfamily) both contain SANT domains, whereas human SRCAP (Swr1 subfamily) instead contains an AT hook (139). This has led to some confusion, with human EP400 being referred to as hDomino although D.melanogaster Domino has higher similarity to human SRCAP (142) and SRCAP can complement Domino mutants (139). In addition to the complexity in primary sequence relationships, complexes of potentially overlapping composition exist involving human EP400 and SRCAP including the NCoR-1 histone deacetylase (143), TRRAP/TIP60 histone acetylase (137,142,144) and DMAP1 complex (145). The confusing overlap of the mammalian Swr1 and EP400 subfamily members may stem from multiple roles for Swr1 subfamily members in lower animals and fungi. For example, it has been suggested that D.melanogaster alternative splice isoforms DominoA and DominoB are functional homologues of EP400 and SRCAP, respectively (139), and that S.cerevisiae Eaf1p is a functional homologue of human EP400 (134) although it lacks both Snf2-related helicase-like and extended proline-rich regions
	Ino80	The archetype of the Ino80 subfamily is the Ino80 protein from S.cerevisiae. Further members have been identified by sequence similarity in fungi, plants and animals (24). Ino80p was first isolated through its role in transcriptional regulation of inositol biosynthesis (146,147) and forms part of the large Ino80.com complex (148). This complex is notable not only because it can reposition nucleosomes, but also because it is the only known Snf2 family-related complex able to separate DNA strands in a traditional helicase assay (148). However, the Ino80 complex contains two RuvB-like helicase subunits which may assist in this. The human INO80 complex has recently been shown to contain many proteins homologous to Ino80.com subunits, including the RuvB-like helicases, and to be capable of mobilizing mononucleosomes (149). S.cerevisiae INO80-deleted strains are sensitive to DNA damaging agents, and recent studies have implicated Ino80p directly in the events of double-stranded break repair (150,151), perhaps for the eviction of nucleosomes in the vicinity of the break (152) although other remodelling complexes such as Swr1, RSC and SWI/SNF may also participate in this repair pathway [reviewed in (153)]
	Etl1	Mouse Etl1 (Enhancer Trap Locus 1) derives its name from identification in an expression screen for loci having interesting properties in early development (154). Although members are present in all but the lowest eukaryotes, including the human homologue SMARCAD1 (SMARCA containing DEAD box 1) (155) and S.cerevisiae FUN30 (Function Unknown 30) (156), very little attention has been focussed on these proteins. Etl1 is very widely expressed but non-essential, although deletion causes a variety of significant developmental phenotypes (157). FUN30 deletions are viable although temperature sensitive (158), and mutants show decreased sensitivity to UV radiation (159)
Rad54-like	Rad54	The archetype of the Rad54 subfamily is the Rad54 protein from S.cerevisiae which was isolated because its inactivation leads to increased sensitivity to ionizing radiation. Rad54p and its homologues in other organisms play an important but as yet incompletely understood role in homologous recombination by stimulating Rad51-mediated single strand invasion into the target duplex, and subsequent steps in the process (160,161). Many organisms also contain a second subfamily member, such as S.cerevisiae Rdh54p or S.pombe tid1p. These are frequently implicated in mitotic repair and meiotic crossover (162), although the role of the human homologue RAD54B is unclear (163)
		Rad54 proteins have been extensively studied in vitro. They have been shown to be able to generate local changes in DNA topology in supercoiled plasmids (13,164,165), to translocate along DNA by biochemical (11) and other methods, and to alter the accessibility of nucleosomal DNA (11,166,167). However, this latter activity appears inefficient compared to purified complexes from the Snf2 and Iswi subfamilies. The crystal structure of the zebrafish Rad54 helicase-like region has been determined recently (47) and is discussed in more detail in the text
	ATRX	The ATRX subfamily derives its name from the Alpha Thalassemia/Mental Retardation syndrome, X-linked genetic disorder caused by defects in the activity of the human member, ATRX (168). This protein is localized to centromeric heterochromatin (169), and purified complexes have been shown to increase the accessibility of nucleosomal DNA although with only moderate efficiency (12). ATRX has been implicated both in the regulation of transcription and heterochromatin structure (170), although the mechanism by which it acts is unclear
	Arip4	Mouse androgen receptor interacting protein 4 (171) can bind to DNA and generate ATP-dependent local torsion (16). Although it can also bind nucleosomes, Arip4 does not appear to be able to alter their nuclease sensitivity, leading to the conclusion that nucleosome mobilization may not be its primary role (172). Interestingly, mutation of the six lysine sumoylation sites in the protein destroyed DNA binding and ATPase activity (172)
	DRD1	A.thaliana DRD1 is named from its phenotype ‘Defective in RNA-directed DNA methylation’ (173). DRD1 functions together with an atypical RNA polymerase IV to establish and also remove non-CpG DNA methylation as part of an RNA interference mediated pathway (174,175)
	JBP2	The JBP2 subfamily takes its name from the T.brucei J Binding Protein 2 which regulates insertion of an unusual glycosylated thymine-derived base, J, which marks silenced telomeric DNA (176)
Rad54-like (cont.)		Both DRD1 and JBP2 are involved in processes which target modifications at the C5 position of the pyrimidine ring which will be exposed in the major groove. JBP2 and DRD1 members show sequence similarity, but have been conservatively assigned to separate subfamilies due to their distinct evolutionary ranges and the limited numbers of members available for building HMM profiles. The identification of a DRD1 subfamily member in Dictyostelium suggests that the subfamilies may be more widespread than indicated by the current small sample set. The L.major, T.brucei and T.cruzi genomes with JBP2 subfamily members also contain proteins assigned to the Arip4 subfamily whose members are otherwise found in higher fungi and animals, but not in DRD1-containing plants. It is possible that a relationship encompasses not only DRD1 and JBP2, but also Arip4
Rad5/16-like	Rad5/16	S.cerevisiae Rad5 and Rad16 proteins are distinct but dual archetypes for this subfamily and both are intimately involved in DNA repair pathways
		Rad5p acts with the Ubc13p–Mms2p E2 ligase complex via its RING finger in one fate of the Rad6 pathway of replication linked DNA damage bypass to poly-ubiquitylate PCNA (177). It has also been suggested that Rad5p participates in double-stranded break repair in a role dependent on its helicase-like region but not its RING finger (178). A clear function for the helicase-like motor in either role has not been suggested
		Rad16p acts in complex with Rad7p and Elc1p as the NEF4 nucleotide excision repair factor (179,180), possibly scanning along chromatin for lesions as part of non-transcribed strand repair (179) or by distorting DNA to expose the lesion for processing (17). Although the basis is not known, the RING finger of Rad16p influences the stability of the Rad4 protein responsible for recognizing the lesion (180)
		Paradoxically, no DNA repair link has been reported for the single member of the Rad5/16 subfamily present in each mammalian genome such as human SMARCA3 (see also Lodestar and ERCC6 sections of this table). Instead, under the name RUSH1alpha, some have been reported as steroid regulated transcriptional regulators (181) and, under the name HLTF, to be silenced in cancers (182)
	Ris1	The Ris1 protein from S.cerevisiae interacts with Sir4p and has a role in mating type silencing (183). Members are found in all fungal and plant genomes, but not in animals or lower eukaryotes
	Lodestar	This subfamily is the only one within the Rad5/16 grouping which does not contain RING fingers in the major insertion site (Figure S3B). D.melanogaster Lodestar protein was first identified as an essential cell-cycle regulated protein localizing to chromosomes during mitosis (184). Subsequently, the human homologue TTF2 was shown to terminate elongating RNA pol I and pol II complexes independently of transcript length, possibly by directly clearing Pol II from the template at the entry to mitosis (185). TTF2 may also have a role in interphase termination, and in repair (185). This suggestion is interesting because no clear functional homologue of S.cerevisiae Rad5p has been identified in the higher eukaryote genomes which Lodestar is restricted to (see also ERCC6 section of this table). TTF2 has been observed to rescue RNA polymerases stalled at lesions (186)
	SHPRH	SHPRH proteins derive their name from the ordered sequence of domains Snf2_N, Linker_Histone (i.e. H1), PHD finger, Zf_C3HC4 (i.e. RING finger), Helicase_C in the human member (187) (Supplementary Figure S3B). The Linker_Histone and PHD finger motifs are located adjacent to each other at the minor insertion site between motifs I and Ia, whereas the RING finger domain is located at the major insertion site. The linker histone-related domain in human SHPRH corresponds to the globular winged helix structure of histone H1 (188) and transcription factor HNF3 (189,190). The PHD finger motifs are specialized zinc finger structures which occur in a range of proteins involved in chromatin-mediated transcriptional regulation but whose exact function is unclear (191,192)
		Fungal SHPRH subfamily members typically do not contain the linker histone-related motif, although they do contain the PHD and RING finger domains. Animal SHPRH members contain an additional ∼50 kDa polypeptide sequence immediately upstream of the RING finger domain within the major insertion region (Supplementary Figure S3B). Although lacking an identifiable motif, this region has a number of cysteines suggestive of a zinc finger type coordination and has some 30% charged residues. Fungal members also contain a significant region of charged residues ahead of the RING finger
SSO1653-like	Mot1	S.cerevisiae Mot1 protein (Modifier of Transcription) (193) and homologues with highly conserved helicase-like regions are present across fungi and all higher eukaryotes, where they are known as BTAF1 or TAF172 (194). In vitro and in vivo studies suggest that Mot1p interacts intimately with TBP (195), probably acting to recycle it from DNA-bound states (196). Mot1p is therefore thought to be a Snf2 family enzyme whose role is not to manipulate nucleosome structure, although a possible direct involvement with chromatin has also been proposed (197)
	ERCC6	Human ERCC6 protein (198), also known as Cockayne Syndrome B (CSB), and S.cerevisiae homologue Rad26p (199) were initially regarded as repair proteins due to effects on transcription coupled nucleotide excision repair. However, it has more recently been suggested that the function of these proteins may be to assist transcribing RNA polymerases to either pass or dissociate from blocking DNA lesions (200). Such a role would not directly involve chromatin. The consequent barriers to transcription elongation and sensitivity to DNA damage in non-functional mutants would explain features of Cockayne syndrome, and the proposed role is analogous to that of the non-Snf2 family Mfd DNA translocase from Escherichia coli (201)
		Most higher animal genomes contain three separate genes assigned to the ERCC6 subfamily, along with single Lodestar and Rad5/16 subfamily members. Conversely, fungal genomes typically encode a single ERCC6 member, no Lodestar subfamily member, but at least two Rad5/16 members. This may reflect divergent strategies for accomplishing transcription-coupled repair
		A number of mutations in the helicase region which result in Cockayne syndrome have been identified and these map to interesting locations in the Snf2 family crystal structures (46,198). In vitro, purified ERCC6 protein can alter nuclease sensitivity and spacing of nucleosomes in an ATP-dependent manner (202). ERCC6 can also bind and negatively supercoil DNA in the presence of non-hydrolysable ATP analogues (203)
	SSO1653	SSO1653, the sole Snf2 family member in archaeal S.solfataricus, is the archetype for the uniquely archaeal and eubacterial subfamily most similar to the eukaryotic Snf2 proteins (see text). It is encoded in the P2 strain genome by the juxtaposed SSO1653 and SSO1655 genes, which are punctuated by transposase SSO1654 inserted into the second recA domain immediately upstream of motif V. Although it is highly unlikely that a Snf2 family enzyme would be functional with a 40 kDa transposase insertion in this conserved part of the protein, the enzyme with transposase removed can generate DNA torsion in an ATP-dependent manner analogous to eukaryotic Snf2 family proteins and was used successfully for structure determination (46). Since a full-length gene can be cloned with appropriate screening (M. F. White, personal communication), the SSO1654 transposase must be active and we refer to the re-fusion of SSO1653 and SSO1655 for simplicity as SSO1653. No information is available for the biological role of any member of the subfamily, although a role in archaeal chromatin remodelling can be excluded because S.solfataricus lacks archaeal histone-like proteins
SSO1653-like (cont.)		The SSO1653 subfamily helicase-like region also shows close linkage with a zinc finger SWIM motif that may bind to nucleic acids (204,205). For example, coordinately regulated SSO1656 immediately downstream of SSO1653–1655 encodes a 26 kDa basic protein containing the SWIM motif. An SSO1653 subfamily member is present in all Bacillus and Streptococcus genomes (21), and many of these polypeptides also carry a SWIM motif in the same polypeptide. Polypeptides with a Snf2 family helicase-like region but lacking a SWIM motif appear to be in the same operon as a second smaller protein which carries the SWIM motif instead (204). Although the SWIM motif also occurs in eukaryotes, it has not been linked to any of the eukaryotic Snf2 family proteins (204)
Distant	SMARCAL1	The human SMARCAL1 (SMARCA-Like 1) protein and homologues, also known as HARP (22), are unusual within the analysis because they include two subtypes with highly similar helicase-like regions that are flanked by completely different auxiliary domains. The first consists of proteins in higher eukaryotes related to human SMARCAL1 itself with centrally located helicase-like regions and one or more Harp motifs immediately N-terminal to this. Mutants in the helicase-like region of human SMARCAL1 have been linked to a genetic disorder Schimke immuno-osseous dysplasia (206) although the molecular function of the SMARCAL1 protein is unknown. The bovine homologue ADAAD is stimulated by DNA single-double strand boundaries (207)
		The second subtype is found in animal, plant and some protist genomes and contains SMARCAL1 subfamily members related in overall domain organization to the human ZRANB3 protein (Zinc finger, RAN-Binding domain containing 3). The helicase region is located at the N-terminus of the polypeptide, followed by an unusual zinc finger structure related to those found in Ran protein binding proteins (208), and a putative HNH type endonuclease domain at the C-terminus (209). No functional information about any proteins in this subtype is available
	rapA group	The rapA group includes some 220 eubacterial and archaeal members with significantly more sequence variation than other subfamilies. Subsets of sequences are qualitatively visible within multiple alignments of the rapA group, but initial attempts to distinguish them have been unreliable due to the variability of microbial sequences and non-homogeneous sampling in sequenced organisms (e.g. half of all complete bacterial genomes are for a limited range of firmicute and gamma proteobacterial genera)
		Although the rapA group contains the conserved sequence patterns of the Snf2 family for the classical helicase-like motifs, other conserved blocks cannot be easily identified (Supplementary Figure S4). The characteristic extended span of at least 160 residues between helicase motifs III and IV (44) is maintained in the rapA group, but the central part of this region diverges markedly from the other subfamilies and lacks the highly conserved features characteristic of the Snf2 family. The specific difficulty of aligning this region has been remarked previously (20)
		The rapA group also includes a number of polypeptides for which the helicase-like region comprises effectively the entire polypeptide, in contrast with other Snf2 family members which almost universally contain sequences outside the helicase-like region that are likely to form accessory domains or interaction surfaces
		The only member of this group for which biological function has been investigated is E.coli rapA, also known as HepA (210), which influences polymerase recycling under high salt conditions, possibly by aiding the release of stalled polymerases (211)

Summaries of known biochemical, biological and distinctive sequence features of each subfamily. Background colouring of subfamilies for groupings as shown in Figure 1.

Since the subfamily assignments are based only on the common helicase-like region, this suggests that the ‘motor’ at the core of even large multiprotein remodeller complexes is tuned to the mechanistic requirements of its function. Such properties are not unprecedented for motor protein subfamilies. The ubiquitous kinesin and myosin proteins are divided into at least 14 and 17 subfamilies, respectively (39,40), and those subfamilies are recognized to reflect tuning of the motors for enzymatic properties linked to particular functional roles. As this also appears to be true for Snf2 family proteins we can anticipate that mechanistic features of the motors will be shared within subfamilies and groupings. This may be useful in helping to predict function of poorly characterized proteins. For example, owing to the recent observation that Swr1 functions in histone exchange (41–43), it is tempting to speculate that the Snf2 motors within other subfamilies in the Swr1-like grouping may be adapted for related purposes. Owing to the remarkable diversity revealed by this classification and the occurrence of many subfamilies which have not been intensively investigated, we briefly summarize current functional and biochemical understanding and characteristic features of each subfamily in Table 3.

Defining the Snf2 family

The survey of Snf2 family proteins enables detailed analysis of sequence conservation in the helicase-like region (Figure 2). This reveals a number of unique features distinguishing them from other helicase superfamily SF2 members. First, the conserved helicase motifs show a highly conserved character across the Snf2 family, and some motifs are extended by juxtaposed residues such as conserved blocks E and G (Figure 2 and Supplementary Figure S4). Second, the helicase-like region in the Snf2 family is significantly longer than for many other helicases, primarily due to an increased spacing between motifs III and IV of >160 residues compared to 38 and 78 for typical SF2 helicases NS3 and RecG, respectively (44). Third, a number of unique conserved blocks are found in Snf2 family proteins (Figure 2 and Supplementary Table S5). Several of these blocks have been noted previously (20,45–48), with conserved block B having been confused in a number of early manuscripts with motif IV. Conserved blocks B, C and K are of particular interest because they are located within the characteristic extended inter-motif III–IV region (Figure 3G).

Figure 2

Conserved residues within Snf2 helicase-like region. Sequence logo of global multiple alignment of 1306 Snf2 helicase-like regions for alignment positions with residues in >90% of proteins. Helicase motifs are indicated in solid black boxes with roman numerals I–VI, additional conserved blocks are indicated in dashed black boxes with uppercase letters A–N, and conserved hydrophobic residues packing in the core of the structure by grey solid boxes. Motif and box labels as in Thoma et al. (47) with extensions. A comparison to other nomenclatures is in Supplementary Table S5. See Table 4 for actual distances between conserved blocks.

Figure 3

Conserved blocks contribute to distinctive structural features of Snf2 family proteins. Structural components of Snf2 family proteins relevant to the conservation are illustrated on the zebrafish Rad54A structure [pdb 1Z3I (47)]. (A) Core recA-like domains 1 and 2 including colouring of helicase motifs (I in green, Ia in blue, II in bright red, III in yellow, IV in cyan, V in teal and VI in dark red). (B) Q motif (pink). (C) Antiparallel alpha helical protrusions 1 and 2 (red) projecting from recA-like domains 1 and 2, respectively. (D) Linker spanning from protrusion 1 to protrusion 2 (middle blue). (E) Major insertion region behind protrusion 2 (light green). (F) Triangular brace (magenta). (G) Schematic diagram showing location of structural elements and helicase motifs coloured as in A–F, with conserved blocks from Figure 2 shown as white boxes. Spans identified by Pfam profiles SNF2_N and Helicase_C are shown flanking the major insertion site.

The SMARCAL1 subfamily contains classical helicase motifs which are highly similar to the other subfamilies. It also has an extended motif III–IV spacing, but it nevertheless lacks conserved blocks within the motif III–IV region (Supplementary Figure S4). The rapA group has similar properties but is more diverse in overall sequence and retains less similarity in the classical motifs. It is unclear whether the SMARCAL1 subfamily and particularly the rapA group will maintain the structural features of the Snf2 family and they are therefore at the limit of the definition of the Snf2 family. We have also noticed further protein groupings with extended spacing between motifs III and IV and detectable similarity to the classical helicase-like motifs of the Snf2 family sequences (Supplementary Figure S4). These include poxvirus NPH-I related proteins involved in transcription termination (49) and the FANCM/MPH1/Hef group of helicases encompassing yeast Mph1p, archaeal Hef and human FANCM proteins involved in DNA repair (50–52). However, those proteins show low similarity to the Snf2 family between motifs III and IV and appear to lack the characteristic conserved blocks C, J and K of the Snf2 family. Interestingly, comparison of the recently determined Pyrococcus furiosus archaeal Hef helicase structure reveals that the MPH1/Hef group has a related structural organization to zebrafish Rad54, but contains only a single compact alpha-helical domain encoded between motifs III and IV (Supplementary Figure S6). It has been noted that this extra alpha-helical domain has some similarities with the thumb domain of Taq DNA polymerase which grips the DNA minor groove (53). It is therefore likely that the SMARCAL1 subfamily, rapA group, NPH-I and MPH1/Hef proteins reflect a continuum of diversity while sharing core features with the other Snf2 subfamilies.

Evolution of Snf2 family diversity

None of the 293 scanned archaeal or bacterial genomes contains a protein classified in any of the eukaryotic subfamilies (Supplementary Tables S1B and S1C). All identified archaeal and bacterial proteins belong to the SSO1653 subfamily and rapA group. Conversely, the SSO1653 subfamily and rapA group are likely to be specific to microbial organisms because the only two members of these families identified in eukaryotes (Supplementary Table S1A) appear to be false positives (data not shown). Over two-thirds of complete microbial genomes contain members of the SSO1653 subfamily and/or rapA group. This broad yet incomplete distribution suggests they perform non-essential functions that are sufficiently advantageous to maintain their prevalence. Although rapA group proteins are distinguished by the lack of several features characteristic of eukaryotic Snf2 family members (see above), the SSO1653 subfamily carries all the Snf2 family sequence and structural hallmarks (Supplementary Figure S4). SSO1653 subfamily members are present in both bacteria and archaea, but they are not ubiquitous in archaeal genomes despite the presence of transcription, replication and repair mechanisms with significant similarity to those of eukaryotes (54,55). There is also no obvious linkage between the presence of histone-like proteins and SSO1653 subfamily members in archaeal genomes (Supplementary Table S1B). Furthermore, the SSO1653 subfamily falls in a grouping (Figure 1C) with the eukaryotic ERCC6 and Mot1 subfamilies whose biochemical role appears not to involve chromatin directly. In contrast to the limited archaeal and bacterial distribution of Snf2 family proteins, all eukaryote genomes contain multiple Snf2 family proteins. 
The early branching Giardia lamblia and the minimal Encephalitozoon cuniculi genomes both encode six different Snf2 family genes falling into subfamilies represented across eukarya (Supplementary Table S1A), several of which have clear linkage to chromatin transactions. It is therefore possible that the microbial SSO1653 subfamily represents an ancestral Snf2-like form from which the eukaryotic subfamilies radiated. Such expansion of the Snf2 family early in eukaryote evolution (20) could have been coincident with the development of high-density nucleosomal packaging (56).

Distribution of Snf2 family members in complete genomes

The linkage between the primary sequence-based definitions of the subfamilies and distinct biological function is strongly supported by the presence of one or more subfamily members in each eukaryotic genome across large evolutionary ranges (Table 2 and Supplementary Table S1A). For example, a common set of subfamilies is found in almost all fungal, plant and animal genomes, comprising Snf2, Iswi, Chd1, Swr1, Etl1, Mot1, Ino80, Rad5/16, ERCC6, Rad54 and SHPRH. Increased genomic complexity is also paralleled by increasing numbers of subfamilies and members: E.cuniculi with a genome encoding some 2000 gene products has 6 Snf2 family members from 6 subfamilies, whereas the S.cerevisiae genome encoding some 6000 genes has 17 Snf2 family members from 13 subfamilies, and the human genome encoding some 25 000 genes has 32 Snf2 family genes from 20 subfamilies (Table 2, part B). The functional linkage across large evolutionary ranges suggests that each subfamily may have distinctive properties of its ATPase motor tuned to its function. This is supported by recent biochemical results demonstrating that helicase-like regions can be swapped within but not between subfamilies (57). However, a counterpoint is that functional redundancy can occur between subfamilies. For example, synthetic deletion of all three of the S.cerevisiae ISW1, ISW2 and CHD1 genes together is required to generate a strong phenotype (58,59). Redundancy also provides an explanation why some genomes lack certain members: the small genome of Schizosaccharomyces pombe lacks an Iswi subfamily member but maintains two Chd1 subfamily members. In addition to the 11 subfamilies represented broadly across eukaryotes, a number of others are restricted to specific taxonomic ranges. For example, CHD7 members are found almost exclusively in animals, and ATRX members are found only in animals and plants.

Subfamily-specific properties

A number of specific features contribute to the distinction between subfamilies. First, the spacing between motifs III and IV is extended significantly beyond the minimal ∼160 residues for a number of subfamilies (Table 4). For the Rad5/16, Ris1 and SHPRH subfamilies, the additional sequences all include RING fingers, whereas for the Swr1 and EP400 subfamilies they comprise highly proline and serine/threonine-rich spans. Ino80 and ATRX subfamilies also contain large, novel and distinct spans. Remarkably, all these large extra insertions occur at the same location in the primary sequence, between conserved blocks C and K, which we term the ‘major insertion site’ (Figure 3G and Supplementary Figure S7A). Even for the subfamilies without large insertions there is variation in the length of sequence in the major insertion site (Table 4). For example, the zebrafish Rad54 structure contains some 25 more residues forming two additional small alpha helices compared to the Sulfolobus solfataricus SSO1653 structure. When Snf2 family members from different subfamilies are aligned, the variability of the major insertion region strongly disturbs the alignment such that a contiguous pattern becomes difficult to define. This has led to some of the Snf2 family proteins being described as having ‘split’ helicase-like ATPase regions. The discontinuity is also the reason that protein motif databases such as SMART and Pfam define Snf2 family members as matching a bipartite combination of SNF2_N and Helicase_C profiles (Figure 3G). The C-terminal end of the SNF2_N profile corresponds to conserved block C.

Table 4

Spacings between helicase motifs and major conserved blocks by subfamily

Means and standard deviations (in parentheses) by subfamily for the number of amino acids between helicase motifs and the conserved blocks B, C and K, calculated for 1306 Snf2 family members from Uniref assigned to subfamilies (Table 1). Spacings are calculated from the edges of the core conserved residues of motifs as marked in Supplementary Figure S4.
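Per-subfamily spacing statistics of this kind could be computed along the following lines. This is a minimal sketch only: it assumes the motif edges have already been located in each sequence, and the `records` list, its field layout and its values are hypothetical illustrations, not data from Table 4.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical input: (subfamily, last residue of upstream motif,
# first residue of downstream motif). Values are invented for illustration.
records = [
    ("Rad54", 300, 470),
    ("Rad54", 310, 485),
    ("Swr1", 320, 900),
    ("Swr1", 330, 950),
]

def spacing_stats(records):
    """Return {subfamily: (mean, sample standard deviation)} of motif spacings."""
    by_subfamily = defaultdict(list)
    for subfamily, upstream_end, downstream_start in records:
        # Count the residues lying strictly between the two motif edges.
        by_subfamily[subfamily].append(downstream_start - upstream_end - 1)
    stats = {}
    for subfamily, spacings in by_subfamily.items():
        # A standard deviation is undefined for a single member; report 0.0.
        sd = stdev(spacings) if len(spacings) > 1 else 0.0
        stats[subfamily] = (mean(spacings), sd)
    return stats

stats = spacing_stats(records)
```

Whether the published values use the sample or the population standard deviation is not stated; the sample form is assumed here.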

Second, subfamilies have characteristic small insertions at other sites (Table 4). Two such sites, also in the motif III–IV region, are located between conserved blocks H and B and between J and C (Figure 2). These are likely to influence the length of the long alpha helical protrusions 1 and 2, respectively (see below, Figure 3C), and there is a difference of some 40 residues between the shortest and longest subfamily lengths for each (Table 4). A ‘minor insertion site’ located between motifs I and Ia on the back of recA-like domain 1 is also occupied by recognizable domains in a few subfamilies from the Rad5/16-like grouping such as SHPRH (Supplementary Figure S3B). A number of other small insertions map to loops between various secondary structural elements (data not shown). Third, although adhering to a general Snf2 family-specific pattern, individual subfamilies show characteristic patterns in the helicase motifs and in other conserved blocks (Supplementary Figure S4). For example, the well-known helicase motif II with typical DEAH pattern favours DEGH in the Snf2, Mot1 and Rad54 subfamilies, DEAQ in the Swr1, EP400, Ino80 and SSO1653 subfamilies or DESH in the SMARCAL1 subfamily. Likewise, for the typical conserved block E—motif I combined pattern ILADEMGLGKT all ATRX subfamily members have histidine instead of aspartate (i.e. ILAHEMGLGKT) and most Mot1 subfamily members have cysteine replacing alanine (i.e. ILCDEMGLGKT). It is also possible to identify other residues correlating with groups of subfamilies. For example, members of the Snf2, Iswi, Chd1, Mi-2, CHD7, ALC1, Rad54, ATRX and Arip4 subfamilies have an arginine immediately following the motif II DEAH. In the zebrafish Rad54 structure this residue R294 interacts with the sulphate which is suggested to mimic the ATP gamma phosphate.
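The motif II variants listed above can be expressed as a simple lookup. The following is an illustrative sketch, not the method used in this study: the regular expression, function names and toy sequence are assumptions, and a four-residue pattern will also match by chance elsewhere in a full-length protein, so in practice the search would be restricted to the aligned motif II position.

```python
import re

# Motif II variants and the subfamilies reported in the text to favour them.
MOTIF_II_VARIANTS = {
    "DEGH": ["Snf2", "Mot1", "Rad54"],
    "DEAQ": ["Swr1", "EP400", "Ino80", "SSO1653"],
    "DESH": ["SMARCAL1"],
}

# Assumed search window covering the typical DEAH core and the variants above.
MOTIF_II_PATTERN = re.compile(r"DE[AGS][HQ]")

def motif_ii_candidates(sequence):
    """Return (position, variant, favoured subfamilies) for each pattern hit."""
    hits = []
    for match in MOTIF_II_PATTERN.finditer(sequence):
        variant = match.group()
        # Variants without a listed preference (e.g. the typical DEAH) map to [].
        hits.append((match.start(), variant, MOTIF_II_VARIANTS.get(variant, [])))
    return hits

def followed_by_arginine(sequence, position):
    """True if the residue directly after the four-residue core is arginine."""
    index = position + 4
    return index < len(sequence) and sequence[index] == "R"

toy_sequence = "MKLLDEGHRIKN"  # invented fragment, not a real protein
hits = motif_ii_candidates(toy_sequence)
```

For the toy fragment, the single hit is the DEGH variant at index 4, and `followed_by_arginine(toy_sequence, 4)` is true.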

Conserved blocks encode the unique structural features of the Snf2 family

Two structural determinations of the helicase-like regions of Snf2 family members have been presented recently: zebrafish Rad54 (PDB code 1Z3I) (47) and S.solfataricus SSO1653 (PDB codes 1Z6A, 1Z63, 1Z5Z) (46). As expected for members of the Snf2 family, the fold of each core recA-like domain in the Rad54 and SSO1653 structures is substantially similar and related to those of other known SF1 and SF2 helicases. In the zebrafish Rad54 structure the two recA-like domains are oriented equivalently to those of other known helicase structures (Figure 3A and B), whereas in the S.solfataricus SSO1653 structures recA-like domain 2 is flipped by 180° into an arrangement never previously observed for a helicase (Supplementary Figure S7B). This unusual orientation in SSO1653 is observed for both the DNA-free and DNA-bound forms (46).

The most striking feature of the Snf2 family structures is the presence of several additional structural elements grafted onto the core helicase structure. These comprise antiparallel alpha-helical protrusions from both recA-like domains 1 and 2 (Figure 3C), a structured linker between the recA-like domains (Figure 3D), the major insertion region at the back side of the domain 2 alpha-helical protrusion (Figure 3E) and a triangular brace packed against the domain 2 alpha-helical protrusion (Figure 3F). The two alpha-helical protrusions and the linker are all encoded within the enlarged span between motifs III and IV. The triangular brace is encoded immediately downstream of motif VI.

Remarkably, the primary sequence features of the Snf2 family correspond directly to the additional structural elements (Figure 3G). First, the bases of the protrusions from recA-like domains 1 and 2 are both fixed by conserved blocks. For protrusion 1, this involves conserved block H, composed of a repeating pattern of aromatic residues, with additional involvement of aromatics from conserved block A. For protrusion 2, this involves the arrangement of conserved blocks C, J and K. Second, the protrusions themselves are relatively conserved in sequence and length within subfamilies but not across the whole Snf2 family. Although there is no obvious correlation between the lengths of protrusions 1 and 2, the distribution of protrusion lengths adheres to multiples of the alpha-helical repeat (Supplementary Figure S8), suggesting that protrusions retain structure while varying in extension. Third, the Q motif found in many SF2 proteins is arranged differently in the Snf2 family than in DEAD box helicases such as eIF4A. In DEAD box helicases an aromatic residue orients the adenine base ring for contacts with a downstream glutamine (4) (Figure 3B), whereas in the Snf2 family the aromatic residue is contributed by conserved block F downstream of the glutamine. The Q motif affects ATP hydrolysis in DEAD box helicases, and mutation of the core glutamine in yeast Snf2 subfamily member Sth1p causes slow growth (4). Fourth, the linker connecting protrusions 1 and 2 contains highly conserved dual arginines in conserved block B. Their central location between the ATP-associating and DNA-associating structural elements suggests that they may play an important role in the mechanism of Snf2 family enzymes. Consistent with this, mutation of the second arginine of the pair in Snf2p leads to effectively complete loss of function of the protein in vivo (48). Finally, the brace is composed of a principal alpha helix anchored by conserved block M into the junction at the base of protrusion 2 formed by conserved blocks C, J and K.

The major insertion region is immediately behind protrusion 2, almost diametrically opposite the ATP-binding site in the zebrafish Rad54 structure (Figure 3E). The nearest residues of the major insertion region in Rad54 are some 15 Å from the phosphates of docked DNA (Supplementary Figure S7A). However, an appropriately oriented alpha helix of some 20 residues would be sufficient to reach into the major groove, so large insertions at the major insertion site could potentially interact with DNA or with other DNA-binding proteins bound in the groove. In the flipped conformation of domain 2 observed in the SSO1653 structure, the major insertion region lies immediately adjacent to the DNA, such that two non-conserved arginines from the major insertion region make direct DNA phosphate contacts.

As the distinctive structural features are defined by unique and highly conserved blocks, they are likely to confer properties on the ATPase motor that adapt the action of the core recA-like domains for a unique mechanism. We anticipate that while some features of the Snf2 family mechanism will be common to SF2 translocases, other aspects will be distinctive. Knowledge of the conserved residues and their structural location provides important information for understanding these distinctions.
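The estimate above that an alpha helix of some 20 residues could span the roughly 15 Å separating the Rad54 major insertion region from the DNA phosphates can be checked with simple arithmetic. The following is a minimal sketch only: the rise of 1.5 Å per residue and 3.6 residues per turn are textbook values for an ideal alpha helix rather than figures taken from this work, the function names are illustrative, and the calculation ignores helix orientation and the additional depth needed to enter the major groove.

```python
# Textbook geometry of an ideal alpha helix (assumed values, not from the paper).
RISE_PER_RESIDUE_A = 1.5   # axial rise per residue, in angstroms
RESIDUES_PER_TURN = 3.6    # residues per helical turn


def helix_length_angstrom(n_residues: int) -> float:
    """Approximate axial length of an ideal alpha helix, in angstroms."""
    return n_residues * RISE_PER_RESIDUE_A


def helix_turns(n_residues: int) -> float:
    """Approximate number of helical turns for a given residue count."""
    return n_residues / RESIDUES_PER_TURN


# Distance quoted in the text between the Rad54 major insertion region
# and the phosphates of docked DNA (approximate).
gap_to_dna = 15.0

# Helix of "some 20 residues", as suggested in the text.
length = helix_length_angstrom(20)
print(f"helix length: {length:.0f} A, gap: {gap_to_dna:.0f} A, "
      f"turns: {helix_turns(20):.1f}")
```

Under these assumptions a 20-residue helix extends to about twice the quoted gap, which is consistent with the suggestion that an insertion of this size, if suitably oriented, could reach the DNA.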

Other levels of Snf2 family identity

We have demonstrated that the common helicase-like region is sufficient to enable classification of Snf2 family members. However, almost all Snf2 family polypeptides contain significant additional sequences likely to harbour accessory domains. For some subfamilies there is a good correlation with the presence of particular accessory domain combinations (Supplementary Table S9). For example, almost all Snf2 subfamily members contain a bromodomain, ISWI members contain a SANT domain, and Chd1, Mi-2 and CHD7 members contain a chromodomain. However, many domain profiles in the resources used for domain analysis have unidentified function or are unreliable in the context of Snf2 proteins. For example, Pfam lacks a SANT-specific profile and detects <10% of SANT domains with the more generic ‘Myb_DNA-binding’ profile. We are currently undertaking further analysis to improve the relevant profiles and to analyse the linkage of Snf2 family accessory domains in detail. Finally, many Snf2 family proteins are part of larger multi-protein complexes. Accessory motifs within these complexes are also likely to adapt the function of Snf2 motors for different purposes.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.
