Filipa L Sousa, Daniel J Parente, Jacob A Hessman, Allen Chazelle, Sarah A Teichmann, Liskin Swint-Kruse.
Abstract
The AlloRep database (www.AlloRep.org) (Sousa et al., 2016) [1] compiles extensive sequence, mutagenesis, and structural information for the LacI/GalR family of transcription regulators. Sequence alignments are presented for >3000 proteins in 45 paralog subfamilies and as a subsampled alignment of the whole family. Phenotypic and biochemical data on almost 6000 mutants have been compiled from an exhaustive search of the literature; citations for these data are included herein. These data include information about oligomerization state, stability, DNA binding, and allosteric regulation. Protein structural data for 65 proteins are presented as easily accessible residue-contact networks. Finally, this article includes example queries to enable the use of the AlloRep database. See the related article, "AlloRep: a repository of sequence, structural and mutagenesis data for the LacI/GalR transcription regulators" (Sousa et al., 2016) [1].
Year: 2016 PMID: 27508249 PMCID: PMC4961497 DOI: 10.1016/j.dib.2016.07.006
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Fig. 1 AlloRep database scheme. The five sections of the AlloRep database are contained within the dashed boxes. Each section contains one or more tables (smaller boxes with blue headings). Lines between tables indicate connections that may be accessed with SQL queries.
Fig. 2 Comparisons of inter- and intra-molecular contacts among 65 structures of LacI/GalR homologs. All available structures were collected for each equivalent state of a given protein (from the same species and bound to the same ligands), including multiple structures present in a single unit cell. Inter- and intra-monomeric contacts were determined as defined in the text, and the frequency of each contact was calculated for the set of structures. If only one structure was available, the frequency was set to 100% by default. As an example, panel (a) shows an excerpt from a matrix containing the frequency of various contacts for all structures of E. coli apo-PurR. Each contact matrix was then linearized in numerical order (b) to make one column of panel (c). As a second example, the dashed box contains the composite information for all structures of LacI bound to DNA and the small molecule NPF. In panel (c), the contacts were ordered on the Y axis so that those involving the N-terminal DNA-binding domain are at the top, those of the linker come next (positions 45–62 in E. coli LacI), followed by contacts in the regulatory domain. Each column along the X axis corresponds to the named group of equivalent structures. Bound ligands are in parentheses, and ligand abbreviations can be found in the table “struct2_ligand_description”. Different colors indicate the frequency with which a particular contact occurs. Inter-monomeric contacts are collected on the left of panel (c). Some structures contained monomers that could not be dimerized by symmetry operations; thus their inter-monomeric contacts could not be determined. Intra-monomeric contacts are shown on the right. Once contact frequency was calculated, agnostic, hierarchical clustering was used to order the inter- and intra-monomeric contacts in panel (c). These plots show that the inter-monomeric contacts (left panel) cluster according to their ligand-binding state.
For example, the DNA-bound structures for different homologs are more similar to each other than to their respective inducer-bound structures. In contrast, the intra-monomeric contacts (right panel) cluster so that the structures for each LacI/GalR subfamily are most closely related, regardless of their binding state.
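The contact-frequency and linearization steps described in the Fig. 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `contact_frequency` and `linearize`, the representation of a contact as a pair of residue positions, and the toy input are all hypothetical. The hierarchical clustering step is not shown.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical representation: a contact is a pair of residue positions (i, j).
Contact = Tuple[int, int]


def contact_frequency(structures: List[Set[Contact]]) -> Dict[Contact, float]:
    """Return, for each contact, the percentage of structures that contain it."""
    if not structures:
        raise ValueError("need at least one structure")
    counts = Counter(c for contacts in structures for c in contacts)
    n = len(structures)
    return {contact: 100.0 * k / n for contact, k in counts.items()}


def linearize(freq: Dict[Contact, float], all_contacts: Iterable[Contact]) -> List[float]:
    """Flatten frequencies into one column, ordered numerically by contact.

    Contacts never observed in this group of structures get a frequency of 0.
    """
    return [freq.get(contact, 0.0) for contact in sorted(all_contacts)]


# Toy input (made up for illustration): three structures of one equivalent state.
structures = [
    {(1, 5), (2, 8)},
    {(1, 5)},
    {(1, 5), (2, 8), (3, 9)},
]
freq = contact_frequency(structures)
column = linearize(freq, {(1, 5), (2, 8), (3, 9)})
```

With a single structure in the list, every observed contact receives a frequency of 100%, which is consistent with the default described in the caption.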
Fig. 3 Screen shot of the AlloRep database. This screen shot was taken after entering the AlloRep database from the home webpage (www.AlloRep.org) and selecting the table “mut1_single” under the “allorep” tree that appears on the left-hand side of the window. When viewing this table under the “Browse” tab (the default option after choosing a table), individual entries can be browsed and specific features can be sorted by clicking on the column headings. For more advanced searches and filtering, the tabs near the top of the window can be used to reach the built-in search fields (“Search”) and the command-line tools (“SQL”). Example command-line queries are given in the supplement to this manuscript.
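To give a sense of the kind of filtering the “SQL” tab allows, the sketch below runs a query against a local, in-memory stand-in for the “mut1_single” table. Only the table name comes from the source; the column names (`protein`, `position`, `substitution`, `phenotype`), the placeholder rows, and the helper `mutants_in_range` are assumptions made for illustration, and SQLite is swapped in for whatever engine hosts the real database, so the actual schema and SQL dialect may differ.

```python
import sqlite3

# In-memory stand-in; the real schema is not reproduced here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mut1_single ("
    "protein TEXT, position INTEGER, substitution TEXT, phenotype TEXT)"
)
# Placeholder rows only; these are not real AlloRep entries.
conn.executemany(
    "INSERT INTO mut1_single VALUES (?, ?, ?, ?)",
    [
        ("protein_A", 10, "X", "placeholder_1"),
        ("protein_A", 52, "Y", "placeholder_2"),
        ("protein_B", 52, "Z", "placeholder_3"),
    ],
)


def mutants_in_range(conn, protein, start, end):
    """Return (position, substitution, phenotype) rows for one protein
    whose position lies within [start, end], ordered by position."""
    query = (
        "SELECT position, substitution, phenotype FROM mut1_single "
        "WHERE protein = ? AND position BETWEEN ? AND ? "
        "ORDER BY position"
    )
    return conn.execute(query, (protein, start, end)).fetchall()


# Example: restrict to positions 45 through 62 of one protein.
rows = mutants_in_range(conn, "protein_A", 45, 62)
```

The equivalent `SELECT ... WHERE ... BETWEEN ...` statement, with the real column names substituted, is the sort of query that could be pasted into the “SQL” tab.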
| Subject area | |
|---|---|
| More specific subject area | |
| Type of data | |
| How data was acquired | |
| Data format | |
| Experimental factors | |
| Experimental features | |
| Data source location | |
| Data accessibility | |