Abstract
3D-footprint is a living database, updated and curated on a weekly basis, which provides estimates of binding specificity for all protein-DNA complexes available at the Protein Data Bank. The web interface allows the user to: (i) browse DNA-binding proteins by keyword; (ii) find proteins that recognize a similar DNA motif and (iii) BLAST similar DNA-binding proteins, highlighting interface residues in the resulting alignments. Each complex in the database is dissected to draw interface graphs and footprint logos, and two complementary algorithms are employed to characterize binding specificity. Moreover, oligonucleotide sequences extracted from literature abstracts are reported in order to show the range of variant sites bound by each protein and other related proteins. Benchmark experiments, including comparisons with expert-curated databases RegulonDB and TRANSFAC, support the quality of structure-based estimates of specificity. The relevant content of the database is available for download as flat files and it is also possible to use the 3D-footprint pipeline to analyze protein coordinates input by the user. 3D-footprint is available at http://floresta.eead.csic.es/3dfootprint with demo buttons and a comprehensive tutorial that illustrates the main uses of this resource.
Year: 2009 PMID: 19767616 PMCID: PMC2808867 DOI: 10.1093/nar/gkp781
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. (A–E) Typical content of 3D-footprint entries, illustrated with dimeric complex 1llm_CD, a Zif23-GCN4 chimera (32), and with non-redundant monomeric complex 1a0a_B, positive regulatory protein PHO4 (33). (A) An interface graph dissecting atomic contacts and nucleotides at the interface responsible for specific DNA discrimination, where solid bases indicate indirect readout mechanisms. (B) Footprint logo diagram of 1llm_CD containing four central base-pairs subject to indirect readout. (C) Sequence logo and structure-based PWM obtained by averaging contact and readout PWMs for complex 1llm_CD. The calculated information content places this complex in the dark gray region of the boxplot in panel G (see below). Note that the link underneath exports this PWM to an RSAT form where the user can scan genomes or DNA fragments for occurrences of this motif. (D) Examples of literature-extracted DNA sequences associated with the term ‘GCN4’ and their E-values, corresponding to non-redundant entry 1llm_D. (E) Dendrogram of similar interfaces for entry 1a0a_B, where the distance tree is based on the estimated structural similarity between binding domains. Interface residues (those with 4.5 Å heavy atom contacts with nitrogen bases) are aligned, with their nucleotide partners shown in color. (F) Querying 3D-footprint with the protein sequence of Zea mays transcription factor PTZm00668.1 (34). Note that all six interface residues are covered in the alignment, but only three are conserved. (G) Scale of specificity observed for SCOP superfamilies in the database, computed over the number of non-redundant complexes given in parentheses, after excluding superfamilies with fewer than seven complexes. An up-to-date scale is available at http://floresta.eead.csic.es/3dfootprint/stats.html.
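
Panel C describes a PWM obtained by averaging the contact and readout PWMs, and an information content computed from it. The following is a minimal sketch of that idea, not the 3D-footprint implementation. It assumes equal weights for the two matrices, columns stored as base-to-probability dictionaries, and the standard Shannon information content against a uniform background (at most 2 bits per column); none of these details are stated in the record. The function names `average_pwms` and `information_content` and the toy matrices are hypothetical.

```python
import math

BASES = "ACGT"


def average_pwms(contact, readout):
    """Average two PWMs column by column (equal weights are an assumption)."""
    if len(contact) != len(readout):
        raise ValueError("PWMs must have the same number of columns")
    return [
        {b: (c[b] + r[b]) / 2.0 for b in BASES}
        for c, r in zip(contact, readout)
    ]


def information_content(pwm):
    """Total information content in bits, assuming a uniform background."""
    total = 0.0
    for column in pwm:
        # Shannon entropy of the column; zero-probability bases contribute 0.
        entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
        total += 2.0 - entropy
    return total


# Toy two-column matrices, invented purely for illustration.
contact = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]
readout = [
    {"A": 0.5, "C": 0.5, "G": 0.0, "T": 0.0},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
]

combined = average_pwms(contact, readout)
print(combined[0])
print(round(information_content(combined), 3))
```

Under these assumptions, a fully specific column contributes 2 bits and a uniform column contributes none, so a complex with a higher total would sit further toward the specific end of a scale like the one in panel G.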