Chih Yuan Wu, Yao Chi Chen, Carmay Lim.
Abstract
Proteins with insignificant sequence and overall structure similarity may still share locally conserved contiguous structural segments, i.e. structural/3D motifs. Most methods for finding 3D motifs require a known motif to search for other similar structures or functionally/structurally crucial residues. Here, a fully automated method is presented that, without requiring a query motif or essential residues, discovers 3D motifs of various sizes across protein families with different folds based on a 16-letter structural alphabet. It was applied to structurally non-redundant proteins bound to DNA, RNA, obligate/non-obligate proteins as well as free DNA-binding proteins (DBPs) and proteins with known structures but unknown function. Its usefulness was illustrated by analyzing the 3D motifs found in DBPs. A non-specific motif was found with a 'corner' architecture that confers a stable scaffold and enables diverse interactions, making it suitable for binding not only DNA but also RNA and proteins. Furthermore, DNA-specific motifs present 'only' in DBPs were discovered. The motifs found can provide useful guidelines for detecting binding sites and for computational protein redesign.
Year: 2010 PMID: 20525797 PMCID: PMC2919736 DOI: 10.1093/nar/gkq478
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Recurring structural patterns in structurally non-redundant DBPs. The 3D motif cdehja (in yellow) and corresponding amino acid sequence and CATH code in (A) the chromosomal protein Sso7d (1c8c-A); (B) transcription initiation factor IIA (1nh2-C) and (C) the universal transcriptional effector of Notch signaling, CSL (1ttu-A).
Figure 2. Defining 3D motifs. (A) The 6-mer structural pattern ddehjl is found in 1ecr-A:57−66 (green), 1xgn-A:763−772 (yellow) and 2bcq-A:410−419 (red); since their backbone structures do not superimpose, ddehjl is not considered to be a 3D motif. (B) The 6-mer structural pattern cdehja is found in 1c8c-A:21−30 (green), 1nh2-C:249−258 (yellow) and 1ttu-A:589−598 (red); as their backbone structures superimpose, cdehja is considered to be a 3D motif.
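The first half of the definition in Figure 2, a structural-letter pattern recurring in several chains, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `recurring_kmers`, the `min_chains` threshold and the structural-letter strings are all hypothetical, and the second half of the definition (checking that the backbone segments superimpose) is deliberately left out because the record does not state the superposition criterion.

```python
from collections import defaultdict

def recurring_kmers(chains, k=6, min_chains=3):
    """Return k-letter structural patterns that occur in at least `min_chains` chains.

    `chains` maps a chain identifier to its structural-letter string.
    The result maps each pattern to {chain_id: [0-based letter offsets]}.
    """
    hits = defaultdict(dict)
    for chain_id, letters in chains.items():
        for i in range(len(letters) - k + 1):
            hits[letters[i:i + k]].setdefault(chain_id, []).append(i)
    return {p: locs for p, locs in hits.items() if len(locs) >= min_chains}

# Hypothetical structural-letter strings (letters a-p), invented for illustration only.
chains = {
    "chainA": "mmmcdehjamm",
    "chainB": "abcdehjabb",
    "chainC": "cdehjalmm",
    "chainD": "mmmmmmmm",
}

# Patterns returned here are only candidates: per Figure 2, a pattern counts as a
# 3D motif only if the corresponding backbone structures also superimpose.
candidates = recurring_kmers(chains)
print(candidates)
```

With these made-up strings, only cdehja recurs in three chains; a pattern such as ddehjl in Figure 2A would pass this step as well and be rejected only at the superposition step.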
The afklmm segments found in 40 structurally non-redundant dsDNA-binding proteins
a. Segments in italics might have ≥1 atom close to a DNA atom if the protein had been complexed with a longer DNA, whereas underlined segments do not contain any atoms close to DNA, but have at least one atom within 5 Å of an atom in another protein chain.
b. CATH code of the domain containing the afklmm segment; a dash means no CATH code has been assigned for that domain.
c. Residues whose atoms are in vdW contact or are H bonded directly or indirectly via water molecules with DNA atoms are in bold, whereas the other residues have atoms within 5 Å of a DNA atom.
d. Residues whose atoms are within 5 Å of an atom in another protein chain.
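The 5 Å proximity criterion used in these footnotes amounts to a pairwise distance test between two sets of atoms. The sketch below shows one way to express it; the function name `within_cutoff` and the coordinates are hypothetical, and treating "within 5 Å" as "less than or equal to 5 Å" is an assumption.

```python
import math

CUTOFF = 5.0  # Å, the distance threshold quoted in the footnotes

def within_cutoff(atoms_a, atoms_b, cutoff=CUTOFF):
    """True if any atom in `atoms_a` lies within `cutoff` Å of any atom in `atoms_b`.

    Each atom is an (x, y, z) tuple in Å. The boundary is treated as
    inclusive (<=), which is an assumption, not something the source states.
    """
    return any(math.dist(p, q) <= cutoff for p in atoms_a for q in atoms_b)

# Hypothetical coordinates, invented for illustration only.
segment_atoms = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
dna_atoms = [(6.0, 0.0, 0.0)]        # 4.5 Å from the second segment atom
partner_atoms = [(20.0, 0.0, 0.0)]   # far from both segment atoms

near_dna = within_cutoff(segment_atoms, dna_atoms)
near_partner = within_cutoff(segment_atoms, partner_atoms)
print(near_dna, near_partner)
```

The all-pairs loop is quadratic in the number of atoms, which is adequate for a 10-residue segment; a spatial index would be the usual choice for whole-structure scans.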
Figure 3. The ‘corner’ motif. (A) Representative 3D structures of the afklmm ‘corner’ motif (left) in the Staphylococcus aureus multidrug-binding protein QacR (1jt0-A) consisting of residues 32SESSKGNLYY41 and an alternative dfklmm ‘corner’ architecture (right) in POU domain, class 2, transcription factor 1 (1e3o-C) consisting of residues 40NDFSQTTISR49. The dotted lines denote H bonds. (B) Structural definition of the ‘corner’ motif (see text). (C) Sequence logo of the ‘corner’ motif.
The m(4)nopafklm(4) motif (in bold italics) in structurally non-redundant HTH proteins
| PDB | Protein name | HTH motif amino acid and structural letter sequence | CATH code | |
|---|---|---|---|---|
| 1d3u−B | Transcription initiation factor IIB | 1268−1292 | 1116−1134 | 1.10.472.10 |
| 1212−1231 | ||||
| 1268−1286 | ||||
| 1ddn−A | Diphtheria toxin repressor | 27−50 | 27−45 | 1.10.10.10 |
| 83−101 | 1.10.60.10 | |||
| 1fok−A | Type-2 restriction enzyme FokI | 325−353 | 321−330* | 1.10.10.10 |
| mmmmmmmnopacb | ||||
| 1gdt−A | Transposon γ-δ resolvase | 161−181 | 150−168 | 1.10.10.60 |
| 161−179 | ||||
| 1lmb−3 | Repressor protein CI | 33−51 | 22−40 | 1.10.260.40 |
| 33−51 | ||||
| 62−80 | ||||
| 1qpz−A | HTH-type transcriptional repressor PurR | 4−23 | 4−22 | 1.10.260.40 |
| 1run−A | Catabolite gene activator | 169−189 | 169−187 | 1.10.10.10 |
| 1trr−A | Trp operon repressor | 68−91 | 68−86 | 1.10.1270.10 |
| 2hdd−A | Segmentation polarity homeobox protein engrailed | 28−57 | 31−49 | 1.10.10.60 |
| mmm | ||||
| 3hts−B | Heat shock factor protein | 228−254 | − | 1.10.10.10 |
| mmmmmmmmmpfbdc | ||||
| 6cro−A | Regulatory protein cro | 16−36 | 16−34 | 3.30.240.10 |
a. An asterisk means absence of mmmmnopafklmmmm but presence of afklmmm; a ‘dash’ means no mmmmnopafklmmmm or afklmmm structural letter sequence in the protein structure.
b. CATH code for the domain containing the afklmm segment.
c. From Littlefield et al. (24).
d. From White et al. (25).
e. From Wah et al. (26).
f. From Jones et al. (6).
g. From Parkinson et al. (27).
h. From Lawson and Carey (28).
i. From Littlefield et al. (29).
j. From Albright and Matthews (30).
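The shorthand m(4)nopafklm(4) in the table title and the expanded string mmmmnopafklmmmm in the first footnote describe the same pattern. The sketch below expands the shorthand and reproduces the asterisk/dash convention of that footnote. The function names `expand` and `table_marker` and the test strings are hypothetical, and matching by plain substring search is an assumption about how the motif is located.

```python
import re

def expand(pattern):
    """Expand run-length shorthand such as 'm(4)' into 'mmmm'."""
    return re.sub(r"([a-p])\((\d+)\)",
                  lambda m: m.group(1) * int(m.group(2)),
                  pattern)

FULL_MOTIF = expand("m(4)nopafklm(4)")  # 'mmmmnopafklmmmm'
SHORT_MOTIF = "afklmmm"

def table_marker(letters):
    """Marker following the footnote convention for a structural-letter string.

    ''  : the full motif is present
    '*' : the full motif is absent but afklmmm is present
    '-' : neither is present
    """
    if FULL_MOTIF in letters:
        return ""
    if SHORT_MOTIF in letters:
        return "*"
    return "-"

# Hypothetical structural-letter strings, invented for illustration only.
print(repr(table_marker("ccmmmmnopafklmmmmdd")))
print(repr(table_marker("ccafklmmmdd")))
print(repr(table_marker("ccddeehh")))
```

Because the search is by substring, helices longer than four m letters on either side still match the full motif.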
Figure 4. Proteins with afklmm segments that might contact DNA or are involved in oligomerization. (A) The afklmm segment of the transposase for transposon Tn5 (1mus-A; 220−229) might contact DNA if the protein had been complexed with a longer DNA. (B) Residues 14−23 comprising the afklmm segment of chain A (green) contact residues in chain B (light blue) in the HU dimer (1p71-A), but they do not contact DNA. In (A) and (B), the afklmm segment is in magenta, while the DNA is in orange.
Figure 5. DNA-specific motifs. Representative 3D structure of (A) the cfbfklmmmmghimmmmm motif (yellow) in the MutY adenine DNA glycosylase (1rrq-A:108−136), and (B) the cfklmmmmnopmmm motif (yellow) in the S. aureus multidrug-binding protein QacR (1jt0-A:21−45). The characteristic structural features of each motif are shown on the right. Note that P14 in (A) and P12 in (B) correspond to P1 of the motif (in magenta).
Figure 6. The predicted DNA-binding sites in the N-terminal fragment of topoisomerase I (1mw8-X). (A) The DNA-binding sites predicted by the two ‘corner’ motifs (magenta): 297−306 and 381−390, and (B) the DNA-binding sites, S1 (red) and S2 (yellow), predicted by the method described in our previous work (36). The catalytic residue, Y319, is shown as ball and stick.
Proteins with unknown function containing the cfklm(4)nopafklm(6) or HTH motif
| Proteins | Motif hits | Motif features conserved | HTHquery hits | HTHquery prediction |
|---|---|---|---|---|
| 3cym-A | 401−425 | No | 403−424 | Unlikely (−9) |
| 2nx4-A | 28−52 | Yes | 30−51 | Possible (2) |
| 2ibd-A | 33−57 | No | 35−56 | Possible (−2) |
| 2ia0-A | 21−45 | Yes | 23−44 | Likely (9) |
| 2g7u-A | 27−51 | Yes | 29−51 | Likely (9) |
| 2fi0-A | 48−72 | No | 49−79 | Unlikely (−9) |
a. Amino acid sequence corresponding to the cfklm(4)nopafklm(6) motif.
b. The cfklm(4)nopafklm(6) motif features are conserved if the motif structure has an 8-residue helix from P3 and a second helix from P14 containing ≥7 residues, as well as P3↔P17 and P7↔P14 vdW contacts between the two helices (see Figure 5B).
c. The amino acid sequence of the HTH motif according to HTHquery.
d. The number in parentheses is an integer-based score from the linear predictor of HTHquery. A score >3 is a likely hit (the protein is likely to have a DNA-binding HTH motif), a score between −3 and 3 is a possible hit, and a score less than −3 is an unlikely hit.
e. Absence of P3↔P17 and P7↔P14 vdW contacts.
f. The first helix contains only seven instead of eight residues.
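The score thresholds in the footnote on HTHquery predictions can be written as a small classifier and checked against the scores listed in the table. This is a sketch of the stated thresholds only, not of HTHquery itself; the function name `classify_hthquery` is hypothetical, and assigning scores of exactly 3 or −3 to the "Possible" class is an assumption about the boundaries.

```python
def classify_hthquery(score):
    """Map an integer HTHquery linear-predictor score to the label used in the table.

    > 3  -> 'Likely'
    < -3 -> 'Unlikely'
    otherwise 'Possible' (boundaries 3 and -3 are assumed to be inclusive here)
    """
    if score > 3:
        return "Likely"
    if score < -3:
        return "Unlikely"
    return "Possible"

# Scores as listed in the table above.
table_scores = {
    "3cym-A": -9,
    "2nx4-A": 2,
    "2ibd-A": -2,
    "2ia0-A": 9,
    "2g7u-A": 9,
    "2fi0-A": -9,
}

labels = {protein: classify_hthquery(s) for protein, s in table_scores.items()}
for protein, label in labels.items():
    print(protein, label, f"({table_scores[protein]})")
```

The labels produced this way agree with the prediction column of the table for all six proteins.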