Laura Soito, Chris Williamson, Stacy T Knutson, Jacquelyn S Fetrow, Leslie B Poole, Kimberly J Nelson.
Abstract
PREX (http://www.csb.wfu.edu/prex/) is a database that currently holds 3516 peroxiredoxin (Prx or PRDX) protein sequences, each unambiguously classified into one of six distinct subfamilies. Peroxiredoxins are a diverse and ubiquitous family of highly expressed, cysteine-dependent peroxidases that are important for antioxidant defense and for the regulation of cell signaling pathways in eukaryotes. Subfamily members were identified using the Deacon Active Site Profiler (DASP) bioinformatics tool, which focuses on functionally relevant sequence fragments surrounding key residues required for protein activity. Searches of this database can be conducted by protein annotation, accession number, PDB ID, organism name or protein sequence. Output includes the subfamily to which each classified Prx belongs, accession and GI numbers, genus and species, and the functional site signature used for classification. The query sequence is also presented aligned with a select group of Prxs for manual evaluation and interpretation by the user. A synopsis of the characteristics of members of each subfamily is also provided, along with pertinent references.
Year: 2010 PMID: 21036863 PMCID: PMC3013668 DOI: 10.1093/nar/gkq1060
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Summary of the Prx subfamilies present in PREX
| Subfamily | Number of database members | Canonical subfamily members | Phylogenetic distribution | Typical location of CR |
|---|---|---|---|---|
| AhpC/Prx1 | 1059 | | Archaea, Bacteria, Plants, Unicellular and Multicellular Eukaryotes | C-terminus (>96%) |
| BCP/PrxQ | 1115 | | Bacteria, Plants | Helix α2 (∼50%) or α3 (∼7%) |
| Prx5 | 517 | | Bacteria, Eukaryotes | Helix α5 (∼17%) |
| Prx6 | 493 | | Archaea, Bacteria, Plants, Unicellular and Multicellular Eukaryotes | No |
| Tpx | 307 | | Bacteria | Helix α3 (>96%) |
| AhpE | 25 | | Bacteria | Unknown |
(a) Structural designations as in (10). If no CR is present, resolving thiol must come from another protein or small molecule.
(b) The AhpC/Prx1 subfamily is also known as the ‘typical 2-Cys’ Prxs and includes tryparedoxin peroxidases, Arabidopsis thaliana 2-Cys Prx, barley Bas1 and Saccharomyces cerevisiae TSA1 and TSA2.
(c) The CR is near the C-terminus of the partner subunit within the homodimer; upon oxidation, intersubunit disulfide forms between the CP and the CR of the two chains.
(d) Intrasubunit disulfide formed in oxidized protein (so-called ‘atypical’ 2-Cys Prxs).
(e) The Prx5 subfamily includes Populus trichocarpa PrxD, the plant type II Prxs, mammalian Prx5 and a group of bacterial glutaredoxin-Prx5 fusion proteins.
(f) The Prx6 subfamily (frequently referred to as the ‘1-Cys’ group) also includes the bacterial Prx6 proteins, A. thaliana 1-Cys Prx and S. cerevisiae mitochondrial Prx1.
(g) The Tpx subfamily includes bacterial proteins (e.g. from Streptococcus pneumoniae and Helicobacter pylori) named thiol peroxidase, p20 and scavengase.
(h) Canonical member contains no CR, but >50% of sequences include a potential CR in α2, similar to E. coli BCP.
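The member counts in the table can be cross-checked against the total of 3516 sequences quoted in the abstract. The short Python sketch below does only that arithmetic; the dictionary is transcribed by hand from the table, and the variable names are illustrative rather than part of PREX.

```python
# Member counts transcribed from the subfamily summary table.
subfamily_counts = {
    "AhpC/Prx1": 1059,
    "BCP/PrxQ": 1115,
    "Prx5": 517,
    "Prx6": 493,
    "Tpx": 307,
    "AhpE": 25,
}

# Sum over all six subfamilies and find the largest one.
total = sum(subfamily_counts.values())
largest = max(subfamily_counts, key=subfamily_counts.get)

# Print each subfamily with its share of the database.
for name, count in subfamily_counts.items():
    print(f"{name:10s} {count:5d} {100 * count / total:5.1f}%")
print("total:", total)
```

The six counts add up to 3516, matching the abstract, with BCP/PrxQ as the largest subfamily.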
Figure 1. Identification of Prx sequences using the DASP tool. (i) The active site of human Prx6 (PDB identifier 1prx) is shown with the four key residues highlighted in red. (ii) Structural segments located within 10 Å of the center of geometry of the key catalytic residues are identified (each segment shown in a different color) and extracted from the global structure. (iii) The sequence fragments are then combined to form a functional site signature [residue colors correspond to the color of structure segments in (ii); key residues are highlighted in red]. (iv) Functional site signatures for structurally characterized members of the Prx6 subfamily are aligned using ClustalW (22,24) to create a functional site profile. (v) Motifs are identified within any fragments that contain at least three residues, and a position-specific scoring matrix (PSSM) (25) is created for each motif. (vi) For each sequence in a user-selected sequence database, the PSSM for each motif is used to find and score the segment of that sequence that best matches the motif. (vii) Each time a motif is matched to a position in the protein sequence, a P-value is calculated that represents the probability of finding a match as good as the observed match within a random sequence. The P-values for all motifs in a single sequence are then combined using QFAST to obtain the final statistical significance score (final P-value) (26). (viii) The protein information (including accession numbers, annotations and species), final P-value and sequence fragments matched to each queried motif are exported for all sequences whose final P-value is more significant than a user-selected threshold. See (13–15) for a more detailed description of DASP utilities and architecture.
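Steps (vi) and (vii) can be illustrated with a minimal Python sketch. It is not the DASP implementation: the function names, the toy three-position PSSM, its scores and the default score for unlisted residues are all assumptions made for illustration. The combination function uses the standard closed form for the probability that a product of n independent uniform P-values is at most the observed product, which is the quantity QFAST is designed to compute; the step that converts a match score into a per-motif P-value is omitted.

```python
import math


def best_motif_match(sequence, pssm, default_score=-2.0):
    """Slide a PSSM along a sequence and return (offset, score) of the best window.

    pssm is a list with one dict per motif position, mapping residue -> score.
    Residues missing from a dict receive default_score (a hypothetical value).
    """
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    best_offset, best_score = 0, -math.inf
    for offset in range(len(sequence) - width + 1):
        window = sequence[offset:offset + width]
        score = sum(col.get(res, default_score) for col, res in zip(pssm, window))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


def combine_p_values(p_values):
    """Combine independent P-values through the distribution of their product.

    For n independent uniform P-values with product p, the probability of a
    product at least as small is p * sum_{i=0}^{n-1} (-ln p)^i / i!.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one P-value")
    product = math.prod(p_values)
    if product == 0.0:
        return 0.0
    log_term = -math.log(product)
    return product * sum(log_term ** i / math.factorial(i) for i in range(n))


# Toy three-position motif with invented scores (not taken from DASP or PREX).
toy_pssm = [{"P": 3.0}, {"T": 2.0, "S": 2.0}, {"C": 4.0}]
print(best_motif_match("AAPTCAA", toy_pssm))   # (2, 9.0)
print(combine_p_values([0.01, 0.01]))          # combined P-value for two motifs
```

Note that a single P-value is returned unchanged, while combining two P-values of 0.01 gives a result larger than their raw product, because the product of several P-values is small by chance more often than any single one.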
Figure 2. Examples of queries and results from PREX. The screenshot shown in Box 1a represents a taxonomy search for database members from Treponema pallidum. Text searches of the PREX database can also be conducted by GI, accession numbers, PDB ID or protein annotation. If a matching protein is identified, the user is taken directly to the results window; the screenshot shown in Box 2 represents the single Prx found in T. pallidum. Protein sequence searches of PREX (Box 1b) utilize BLAST to identify PREX database members with high sequence similarity to the query sequence. Shown in Box 1c is the BLAST output obtained after searching the full sequence of T. pallidum AhpC. Selecting the GI number of one of the identified proteins will direct the user to the results window for that PREX database member (Box 2). Selecting the functional site signature generates a multiple sequence alignment (Box 3) containing the functional site signatures for the selected PREX database protein (labeled as PREX_query), 4–5 selected members of the same subfamily and one representative from each of the other subfamilies. If accessed through a BLAST search, the multiple sequence alignment also includes the full sequence of the original query (labeled as BLAST_query). Colors in Box 3 identify the subfamily assignment for each signature. The PXXX(T/S)XXC sequence motif, which is invariant at the active site of Prx proteins (16), is shown in bold.
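The PXXX(T/S)XXC motif translates directly into a regular expression, which gives a quick way to locate candidate active sites in a sequence. The Python sketch below is only an illustration: the function name and the example fragment are hypothetical, the fragment is not a PREX database entry, and a bare motif match is far less specific than the DASP profile search used to build the database.

```python
import re

# P, any three residues, T or S, any two residues, C.
# The lookahead lets overlapping occurrences be reported as well.
PRX_MOTIF = re.compile(r"(?=(P.{3}[TS].{2}C))")


def find_prx_motif(sequence):
    """Return (start, matched_text) for every PXXX(T/S)XXC occurrence."""
    return [(m.start(), m.group(1)) for m in PRX_MOTIF.finditer(sequence.upper())]


# Made-up fragment used purely for illustration.
fragment = "MAGDFYPLDFTFVCPTEIIA"
print(find_prx_motif(fragment))  # [(6, 'PLDFTFVC')]
```

Positions are 0-based offsets into the sequence; a sequence without the motif yields an empty list.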