
HSPIR: a manually annotated heat shock protein information resource.

Ratheesh Kumar R, Nagarajan N S, Arunraj S P, Devanjan Sinha, Vinoth Babu Veedin Rajan, Vinoth Kumar Esthaki, Patrick D'Silva.

Abstract

SUMMARY: Heat shock protein information resource (HSPIR) is a concerted database of six major heat shock proteins (HSPs), namely, Hsp70, Hsp40, Hsp60, Hsp90, Hsp100 and small HSP. The HSPs are essential for the survival of all living organisms, as they protect the conformations of proteins on exposure to various stress conditions. They are a highly conserved group of proteins involved in diverse physiological functions, including de novo folding, disaggregation and protein trafficking. Moreover, their critical role in the control of disease progression has made them a prime target of research. Presently, limited information is available on the identification and structural classification of HSPs across genera. To that end, HSPIR provides manually curated information on sequence, structure, classification, ontology, domain organization, localization and possible biological functions extracted from UniProt, GenBank, Protein Data Bank and the literature. The database offers interactive search with incorporated tools, which enhance the analysis. HSPIR is a reliable resource for researchers exploring the structure, function and evolution of HSPs. AVAILABILITY: http://pdslab.biochem.iisc.ernet.in/hspir/

Year: 2012 PMID: 22923302 PMCID: PMC3476333 DOI: 10.1093/bioinformatics/bts520

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 INTRODUCTION

Heat shock proteins (HSPs) are a specialized group of proteins robustly synthesized in all living organisms in response to various stress conditions, including elevated temperatures. HSPs are critical for cell survival, both constitutively and in times of stress, because they ensure proper folding of non-native states of proteins (Bukau et al., 2006; Parsell and Lindquist, 1993). Based on their functions and molecular mass, HSPs are broadly classified into six major families, namely, Hsp70, Hsp40 (J-proteins), Hsp60 (chaperonins), Hsp90, Hsp100 (Clp proteins) and small HSPs (Lund, 2001; Kampinga and Craig, 2010). They function cooperatively by forming an intricate molecular network, thereby maintaining overall cellular protein homeostasis (Lund, 2001). Their diversified nature and vast repertoire of functions have generated significant interest in deducing the cellular chaperone network and the functional crosstalk among the major HSP families. Presently, the ‘cpnDB’ database (Hill et al., 2004) exists but contains data only for the Hsp60 family. Heat shock protein information resource (HSPIR) provides a comprehensive collection of information on six major HSP families across various genomes, with detailed subclassification based on their domain, structural organization and localization. HSPIR also includes sequences that are not yet annotated in UniProt. Additionally, HSPIR offers tools such as BLAST (Altschul et al., 1997) for homology search, CLUSTALW (Larkin ) for multiple sequence alignment, Archaeopteryx (Han and Zmasek, 2009) for phylogenetic tree visualization and manipulation and the Jmol (http://www.jmol.org/) structural viewer. The database currently holds ∼10 000 hand-curated entries from six kingdoms, covering all the major model organisms, and 295 3D structures.

2 DATA RETRIEVAL AND CURATION

We conducted an extensive literature survey using the PubMed query system to retrieve names, nomenclature, functions and structural information of HSPs. With this knowledge, we created a comprehensive list of standard and alternative names for each HSP family. Structures of HSPs and their corresponding sequences were retrieved from the Protein Data Bank (PDB). These data were used for keyword and sequence searches against SwissProt (Boeckmann et al., 2003). The resulting data sets were then filtered to include only sequences with protein existence level 1 or 2 (evidence at the protein level or at the transcript level, respectively). Sequences with domains that are partial in length or missing any functional motifs were discarded. Using these initial data sets as seed sequences (see Supplementary Table S1), a position-specific scoring matrix (PSSM) was created for each HSP family. Organism-specific PSI-BLAST was performed using the PSSM with an e-value cut-off of 0.0001 against the NCBI non-redundant protein sequence database (Altschul et al., 1997; Benson ) to populate HSPIR. Extreme care was taken to remove duplicated and highly truncated sequences from the data sets. These collated data sets were then manually curated, one sequence at a time, using different database search methods to annotate structural and functional information (Supplementary Figure S1). We used a wide collection of protein family databases, such as NCBI CDD (Marchler-Bauer ), Pfam (Finn ), InterPro (Hunter ) and SMART (Letunic ), to identify the domain architecture and associated functional motifs of HSPs. Secondary structure assignments were made using PSIPRED Version 3.2 (McGuffin ), and subcellular localization and signal peptide regions were predicted using TargetP (Emanuelsson ; Nielsen et al., 1997), WoLF PSORT (Horton ) and PSORT (Nakai and Horton, 1999). Taxon information for each organism was obtained from the NCBI Taxonomy database (Benson ). Gene ontologies were inferred from UniProtKB (The UniProt Consortium, 2012). Experimentally determined 3D structures were retrieved from the PDB (Berman et al., 2000). Available literature references were generated from PubMed; the cross-references and identifiers of external databases were also imported.
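
The automated part of this filtering (e-value cut-off, removal of duplicates and of highly truncated sequences) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Hit` record, the `min_coverage` threshold of 0.8 and the example accessions and sequences are assumptions made for illustration, and only the e-value cut-off of 0.0001 is taken from the text.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 1e-4  # cut-off stated in the text


@dataclass(frozen=True)
class Hit:
    accession: str  # hypothetical identifier
    sequence: str
    evalue: float


def filter_hits(hits, seed_length, min_coverage=0.8):
    """Keep significant, non-duplicated, non-truncated hits.

    min_coverage is an assumed threshold: a hit shorter than
    min_coverage * seed_length is treated as highly truncated.
    """
    seen = set()
    kept = []
    # Visit the most significant hits first so they win over duplicates.
    for hit in sorted(hits, key=lambda h: h.evalue):
        if hit.evalue > E_VALUE_CUTOFF:
            continue  # not significant
        if len(hit.sequence) < min_coverage * seed_length:
            continue  # highly truncated
        if hit.sequence in seen:
            continue  # exact duplicate of a sequence already kept
        seen.add(hit.sequence)
        kept.append(hit)
    return kept


# Toy data, invented for this example.
hits = [
    Hit("A1", "MAKAAIGIDLGTTYSC", 1e-30),
    Hit("A2", "MAKAAIGIDLGTTYSC", 1e-28),  # duplicate sequence of A1
    Hit("A3", "MAKAA", 1e-10),             # truncated
    Hit("A4", "MSKGPAVGIDLGTTYS", 0.5),    # above the e-value cut-off
    Hit("A5", "MGKIIGIDLGTTNSCV", 1e-20),
]
kept = filter_hits(hits, seed_length=16)
print([h.accession for h in kept])  # ['A1', 'A5']
```

A filter of this kind would only pre-screen candidates; the manual curation step described above still decides what enters the database.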

3 DATABASE IMPLEMENTATION

HSPIR is built on the open-source MySQL database and interfaced with server-side PHP scripts. The Web interface uses dynamically generated HTML pages supported by JavaScript and CSS to provide an interactive environment for public access. Perl-CGI scripts have been used to compile the BLAST, CLUSTALW and hidden Markov model search features (Eddy, 1998).

4 DATABASE INTERFACE AND VISUALIZATION

HSPIR is a user-friendly Web resource whose homepage presents a model of the functional networks of the six major HSP families (Supplementary Figure S2). Each family is linked to a dedicated Web page explaining its structure, domain organization, classification and physiological significance with diagrammatic illustrations.

4.1 Search features

HSPIR incorporates four different search features. The basic keyword search finds HSPs based on their names (including gene names, standardized names and synonyms), families, identifiers (HSPIR and external) and classifications. The advanced search is the key feature; it narrows the search criteria to return more specific results. Users can refine their search using the combined search method, with logical and relational operators dynamically organized on the page. Data retrieval can be further streamlined using other specialized query tools, such as the genome-wide search and the domain-based search. These tools can query the database independently to retrieve records based on a specific genome or a specified combination of domains.
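
As an illustration of how search criteria with logical and relational operators can be combined into a single query, the following Python sketch builds a parameterized SQL statement. It is an assumption-laden stand-in and not HSPIR's code: the `protein` table, its columns, the accessions and the sequence lengths are hypothetical, and SQLite (via the standard `sqlite3` module) is used in place of the MySQL/PHP stack so that the example is self-contained.

```python
import sqlite3

# Whitelists keep user input out of the SQL text itself.
ALLOWED_FIELDS = {"family", "organism", "length"}  # hypothetical columns
ALLOWED_OPS = {"=", "!=", "<", "<=", ">", ">=", "LIKE"}
ALLOWED_JOINS = {"AND", "OR"}


def build_query(criteria, joiner="AND"):
    """Turn (field, operator, value) triples into parameterized SQL."""
    if joiner not in ALLOWED_JOINS:
        raise ValueError(f"unsupported logical operator: {joiner}")
    clauses, params = [], []
    for field, op, value in criteria:
        if field not in ALLOWED_FIELDS or op not in ALLOWED_OPS:
            raise ValueError(f"unsupported criterion: {field} {op}")
        clauses.append(f"{field} {op} ?")
        params.append(value)  # values are bound, never interpolated
    where = f" {joiner} ".join(clauses) or "1"  # no criteria: match all
    sql = f"SELECT accession FROM protein WHERE {where} ORDER BY accession"
    return sql, params


# Toy in-memory table; all rows are invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE protein ("
    "accession TEXT PRIMARY KEY, family TEXT, organism TEXT, length INTEGER)"
)
conn.executemany(
    "INSERT INTO protein VALUES (?, ?, ?, ?)",
    [
        ("HSPIR0001", "Hsp70", "Homo sapiens", 641),
        ("HSPIR0002", "Hsp70", "Saccharomyces cerevisiae", 642),
        ("HSPIR0003", "Hsp40", "Homo sapiens", 340),
        ("HSPIR0004", "Hsp90", "Homo sapiens", 732),
    ],
)

sql, params = build_query(
    [("organism", "=", "Homo sapiens"), ("length", ">", 600)]
)
rows = [r[0] for r in conn.execute(sql, params)]
print(rows)  # ['HSPIR0001', 'HSPIR0004']
```

Restricting fields and operators to fixed whitelists and binding the values as parameters is one common way to let a Web form expose flexible criteria without allowing SQL injection.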

4.2 Results

Results of all the search tools are presented in the form of a paginated table. Protein records can be viewed by clicking the accession ID, or added to the HSPIR cart for downloading and further analysis. Individual protein records comprise names and lineage, classification, sequence information, domains and motifs, structures, ontologies, references, cross-references, external links to different databases and, finally, the protein record information.

4.3 Sequence comparison

The stand-alone BLAST package implemented in the database allows users to search a query protein against the HSPIR database and identify similar HSP sequences. Multiple sequences can be compared using CLUSTALW, and the resulting tree is visualized using Archaeopteryx.

4.4 HSP identification

The HSP identification tool allows the user to identify and classify unknown sequences into a particular HSP family. The user-provided sequence is scanned against predefined HSP libraries of profiles created from a set of validated seed sequences (Supplementary Table S1).
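
The idea of scanning a sequence against per-family profiles built from seed sequences can be sketched in Python with a toy, ungapped log-odds profile. This is a simplified stand-in and not the tool's implementation: the family names, seed motifs, uniform background frequencies, pseudocount and score threshold are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # uniform background (assumption)


def build_profile(aligned_seeds, pseudocount=1.0):
    """Per-column log-odds scores from equal-length, ungapped seeds."""
    length = len(aligned_seeds[0])
    if any(len(s) != length for s in aligned_seeds):
        raise ValueError("seeds must have equal length")
    total = len(aligned_seeds) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned_seeds)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / BACKGROUND)
            for aa in AMINO_ACIDS
        })
    return profile


def best_window_score(profile, query):
    """Highest ungapped score of the profile over all query windows."""
    width = len(profile)
    if len(query) < width:
        return float("-inf")
    return max(
        # Residues outside the 20-letter alphabet score 0 (neutral).
        sum(profile[i].get(query[start + i], 0.0) for i in range(width))
        for start in range(len(query) - width + 1)
    )


def classify(query, profiles, threshold=0.0):
    """Return (best family or None if not above threshold, all scores)."""
    scores = {fam: best_window_score(p, query) for fam, p in profiles.items()}
    family = max(scores, key=scores.get)
    return (family if scores[family] > threshold else None), scores


# Invented seed motifs; real profiles would come from curated seed sets.
profiles = {
    "toy_family_A": build_profile(["IDLGTT", "VDLGTT", "IDLGTS"]),
    "toy_family_B": build_profile(["HPDKNP", "HPDKNQ", "HPDRNP"]),
}
family, scores = classify("MAKIDLGTTYS", profiles)
print(family)  # toy_family_A
```

A sequence that matches no profile well, such as `"WWWWWWWW"` in this toy setting, scores below the threshold for every family and is left unclassified.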

5 SUMMARY AND FUTURE PERSPECTIVES

The scope of HSPIR is to provide a dedicated resource for HSPs with functional annotations. The interactive search features, together with the collated information in the database, will allow researchers to perform comparative analyses and explore additional physiological functions of HSPs in different species, which were not well appreciated previously. Moreover, the data in HSPIR will be checked for updates weekly, using PHP scripts and parsers scheduled by crontab. The updated records will be reviewed and uploaded by the curation team. In future, we plan to incorporate HSP information for additional genomes, with special emphasis on pathogenic species. We will also include other specialized chaperones, like disulfide isomerases, and accessory proteins such as nucleotide exchange factors, prefoldins and HSP90 co-chaperones.

REFERENCES (21 in total; 10 listed)

1. H M Berman; J Westbrook; Z Feng; G Gilliland; T N Bhat; H Weissig; I N Shindyalov; P E Bourne. The Protein Data Bank. Nucleic Acids Res (2000-01-01).

2. K Nakai; P Horton. PSORT: a program for detecting sorting signals in proteins and predicting their subcellular localization. Trends Biochem Sci (1999-01).

3. Brigitte Boeckmann; Amos Bairoch; Rolf Apweiler; Marie-Claude Blatter; Anne Estreicher; Elisabeth Gasteiger; Maria J Martin; Karine Michoud; Claire O'Donovan; Isabelle Phan; Sandrine Pilbout; Michel Schneider. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res (2003-01-01).

4. Bernd Bukau; Jonathan Weissman; Arthur Horwich. Molecular chaperones and protein quality control. Cell (2006-05-05).

5. S R Eddy. Profile hidden Markov models. Bioinformatics (1998).

6. H Nielsen; J Engelbrecht; S Brunak; G von Heijne. Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng (1997-01).

7. S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res (1997-09-01).

8. D A Parsell; S Lindquist. The function of heat-shock proteins in stress tolerance: degradation and reactivation of damaged proteins. Annu Rev Genet (1993).

9. Janet E Hill; Susanne L Penny; Kenneth G Crowell; Swee Han Goh; Sean M Hemmingsen. cpnDB: a chaperonin sequence database. Genome Res (2004-08).

10. The UniProt Consortium. Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res (2011-11-18).
