PASS2 version 4: an update to the database of structure-based sequence alignments of structural domain superfamilies.

A Gandhimathi, Anu G Nair, R Sowdhamini.

Abstract

Accurate structure-based sequence alignments of distantly related proteins are crucial for gaining insight into protein domains that belong to a superfamily. The PASS2 database provides alignments of proteins that are related at the superfamily level and are characterized by low sequence identity. We report an automated, updated version of this superfamily alignment database, PASS2.4, consisting of 1961 superfamilies and 10,569 protein domains, in direct correspondence with the SCOP (1.75) database. Database organization, improved methods for efficient structure-based sequence alignment and the analysis of extremely distantly related proteins within superfamilies form the focus of this update. Such alignments make it possible to align family-specific functional residues, as shown using one superfamily as an example. The database of alignments and other related features can be accessed at http://caps.ncbs.res.in/pass2/.

Year: 2011 PMID: 22123743 PMCID: PMC3245109 DOI: 10.1093/nar/gkr1096

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

The motivation for improved protein structure comparison, alignment and characterization is currently defined simply by quantity: the rate of increase in the number of experimentally determined new folds and in the number of structures adopting each fold. Accurate sequence alignments of homologous proteins are essential for constructing accurate motifs and profiles and for building homology models (1). The correct sequence alignment of distantly related proteins, where sequence similarity is very low, is often hard to obtain from sequence similarity alone (2,3). In such cases, structure-based sequence alignment methods can help to reveal features that are essential for both structure and function. The observation of structural homology led to the development of structural alignment tools, which are becoming increasingly useful with the acceleration of protein structure determination and the Structural Genomics project (4). Protein domains that are grouped together at the superfamily level are defined as having structural, functional and sequence similarities and evidence for a common evolutionary ancestor. They are also characterized by a conserved structural core and poor sequence identity. The SCOP (5) database provides a detailed and comprehensive description of protein structures organized at different hierarchical levels of structural and functional similarity. ASTRAL (6) provides an explicit mapping between the PDB ATOM and SEQRES records within PDB files, which is used to derive databases of sequences corresponding to the SCOP domains. A database somewhat similar to ours is S4 (7), which provides multiple structure-based alignments of SCOP (version 1.63) protein superfamilies and was made publicly available in 2005. Well-known databases are also available for the alignment of homologous proteins. The HOMSTRAD (8) database contains aligned three-dimensional structures of homologous proteins. PALI (9) is another database, providing Phylogeny and ALIgnment of homologous protein structures, and contains structure-based sequence alignments. The PASS2 database provides structure-based sequence alignments of the SCOP superfamilies and has been updated in line with SCOP releases since 1998. Here, we report an updated version, PASS2 version 4, in direct correspondence with SCOP 1.75. Besides a simple update with accumulated entries (as described in ‘Overview of PASS2 versions’ below), we have modified the codes to handle large superfamilies. The codes have now been organized on the Linux platform for convenient future updates, and our alignment protocol employs improved methods of alignment. We explain the mapping of family-specific functional residues using the riboflavin synthase superfamily as an example. We have also analysed the extreme-deviant members, the outliers, of some superfamilies.

OVERVIEW OF PASS2 VERSIONS

The idea of structure-based sequence alignment and analysis of protein domain superfamilies originally started with CAMPASS (10). The automated version of CAMPASS, called PASS2 (11), which we now refer to as PASS2.1, contained 613 superfamilies in direct correspondence with SCOP 1.53. The subsequent versions of PASS2 [PASS2.2 and PASS2.3 (12,13)] were updated in direct correspondence with SCOP 1.63 and SCOP 1.73, respectively. In most PASS2 versions, we have classified the superfamilies into single-member (SMS), two-member (TMS) and multi-member (MMS) superfamilies; the category reflects the number of domains in the superfamily that share <40% sequence identity with one another. From PASS2 version 3 onwards, TMS and MMS are aligned using category-specific alignment methods. The statistics for all four versions are reported in Figure 1. The current version of PASS2, PASS2.4, holds 10,569 protein domains (at a 40% sequence identity cut-off) belonging to 1961 superfamilies and is in direct correspondence with SCOP 1.75.
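The SMS/TMS/MMS grouping described above can be illustrated with a minimal Python sketch. It assumes that each superfamily has already been reduced to its representative domains at the 40% identity cut-off and that "multi-member" means three or more such domains; the function name and the toy superfamily identifiers are hypothetical and are not part of the PASS2 code base.

```python
from collections import Counter


def classify_superfamily(n_domains: int) -> str:
    """Return 'SMS', 'TMS' or 'MMS' for a superfamily.

    n_domains is the number of representative domains left after
    filtering at the 40% sequence identity cut-off (assumed to be
    computed elsewhere).
    """
    if n_domains < 1:
        raise ValueError("a superfamily needs at least one domain")
    if n_domains == 1:
        return "SMS"  # single-member superfamily
    if n_domains == 2:
        return "TMS"  # two-member superfamily
    return "MMS"      # multi-member superfamily (assumed: three or more)


# Toy data: hypothetical superfamily identifiers and domain counts.
toy_counts = {"sf_a": 1, "sf_b": 2, "sf_c": 7, "sf_d": 2}
summary = Counter(classify_superfamily(n) for n in toy_counts.values())
print(dict(summary))
```

Because the alignment method differs between TMS and MMS, a label of this kind is a natural switch for choosing the downstream protocol.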

Figure 1.

Overview of PASS2 over the past few versions. The number of superfamilies has increased over the years from PASS2.1 (8) and PASS2.2 (9) through PASS2.3 (10). The total number of superfamilies is shown in the SMS, TMS and MMS categories.

IMPROVEMENTS IN THE CURRENT VERSION

PASS2 version 4 is updated in correspondence with SCOP 1.75. The alignment protocol has been revised as described under ‘Alignment protocol’. This version of the database also offers an improved user interface, including a JMOL view, a JMOL command input area and introductory pop-ups for search results. Continuing from our introduction of outliers in PASS2.3, in the current version we have re-examined the nature and category of outliers in superfamilies (Supplementary Data). In earlier versions, there were difficulties in aligning large superfamilies. These issues have been addressed, so that it is now possible to automate the whole protocol and move the codes to the Linux platform. The protocol is being automated for further updates to minimize manual intervention.

ALIGNMENT PROTOCOL

Initially, the domains are pre-processed, for example by removing hetero atoms and retaining one coordinate set for NMR structures, using in-house programs. For TMS, MINRMS (14) is used for the initial alignment, and the resulting initial equivalences are used by COMPARER (15) for the refined alignment. After a careful assessment of different protocols for the alignment of MMS (detailed in Supplementary Data, Supplementary Tables ST1, ST2 and Supplementary Figure SF1), MATT (16) was chosen for the initial alignment. From the initial alignment, equivalent regions were identified by JOY (17), and structure-guided tree information was obtained from MATT; both serve as inputs to COMPARER. These initial equivalences serve as seeds for rigid-body superposition using MNFC, a modified form of MNYFIT (18) (Supplementary Figure SF2). The final accepted alignments were annotated with structural information, such as secondary structural regions, solvent accessibility of residues and hydrogen-bonding patterns, using the JOY program. The alignment is assessed using the mean RMSD and the percentage of conserved secondary structural equivalence (POCSSE) (Supplementary Data). These two parameters were viewed as important quality checks of the multiple alignment.
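As an illustration of the first quality check, the following Python sketch computes a mean pairwise RMSD over a set of structures. It assumes that the structures have already been superposed and reduced to the same list of structurally equivalent positions (for example, one C-alpha coordinate per aligned column); the exact definition used in PASS2 is given in the Supplementary Data and may differ. POCSSE is not reproduced here. The function names and the coordinates are hypothetical.

```python
import math
from itertools import combinations


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def mean_pairwise_rmsd(structures):
    """Average the RMSD over all unordered pairs of superposed structures."""
    pairs = list(combinations(structures, 2))
    if not pairs:
        raise ValueError("at least two structures are required")
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)


# Toy coordinates (hypothetical): three "domains" with two equivalent positions each.
dom_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
dom_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
dom_c = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(round(mean_pairwise_rmsd([dom_a, dom_b, dom_c]), 3))
```

In this toy case the pairwise values are 1, 2 and 1 Å, so a single deviant member visibly raises the mean, which is the kind of effect a quality check on a multiple alignment is meant to expose.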

ORGANIZATION OF THE DATABASE

As in the previous versions (Supplementary Table ST3), the major focus of the database is at the superfamily level, but searches can be made using keywords at various levels, such as SCOP classes, folds and domains. The current version, PASS2.4, provides information about features such as HMM (19,20), Structural Motif (21), structural phylogeny, PCA analysis and CUSP (22,23), as discussed for the previous versions PASS2.2 and PASS2.3. In addition, all the feature files, alignments and structural superpositions are downloadable via the webpage. At the protein domain level, accessory files used for JOY (17), such as PSA, SST and HBD files, are also downloadable. Other utilities, such as PSI-BLAST (24), PHI-BLAST (24), HMM profiles constructed from PASS2 alignments and 3D structural annotation of a query alignment/sequence, were modified and updated to correspond to the latest PASS2 database. Some general utilities, such as Alistat (19), multiple formats of the alignment and a README file that gives the user more details about each superfamily, are also provided, as in the previous version.

MAPPING FAMILY-SPECIFIC FUNCTIONAL RESIDUE MOTIFS: EXAMPLE OF RIBOFLAVIN SYNTHASE SUPERFAMILY

Protein function prediction is one of the central problems in computational biology. The confidence of function inference from structure depends on the number of family-specific motifs found in the query structure compared with their distribution in a large non-redundant database of proteins (25). The PASS2 protocol is able to map family-specific as well as functionally important residues. We carried out a case study on the riboflavin synthase superfamily, which consists of three families. After a careful structure-based alignment of superfamily members, as recorded in PASS2.4, the motif patterns LTV and VNV are specific to the riboflavin synthase family (26,27), and the patterns GD and GQ are specific to the NADPH-cytochrome p450 reductase FAD-binding domain-like family and the reductase FAD-binding domain-like family, respectively. The results show that our structure-based sequence alignment protocol retains family-specific as well as functionally important residues in equivalent positions in the alignment. This is one of the important applications of the PASS2 alignments: they enable critical analysis of superfamilies and of functionally important as well as family-specific residues (riboflavin synthase superfamily in Supplementary Figure SF3).
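The notion of a family-specific motif occupying equivalent alignment positions can be sketched in Python as follows. This is only an illustration under simple assumptions: a motif counts as family-specific at a column if every member of the chosen family has it there, ungapped, and no member of any other family does. The function name, the family labels and the toy sequences are hypothetical; the sequences are not taken from the riboflavin synthase superfamily alignment.

```python
def family_specific_motif_columns(alignment, family_of, motif, family):
    """Return 0-based start columns where `motif` is present in every member
    of `family` and absent from all members of other families.

    alignment: dict of sequence name -> aligned sequence ('-' marks a gap)
    family_of: dict of sequence name -> family label
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    width = lengths.pop()
    members = [name for name in alignment if family_of[name] == family]
    others = [name for name in alignment if family_of[name] != family]
    if not members:
        raise ValueError(f"no sequences belong to family {family!r}")
    hits = []
    for start in range(width - len(motif) + 1):
        window = slice(start, start + len(motif))
        in_all_members = all(alignment[n][window] == motif for n in members)
        in_any_other = any(alignment[n][window] == motif for n in others)
        if in_all_members and not in_any_other:
            hits.append(start)
    return hits


# Toy alignment (hypothetical sequences and family labels).
toy_alignment = {
    "dom1": "AKLTVG-",
    "dom2": "GRLTVGS",
    "dom3": "AKGDAG-",
    "dom4": "-RGDSGS",
}
toy_family = {"dom1": "fam_A", "dom2": "fam_A", "dom3": "fam_B", "dom4": "fam_B"}
print(family_specific_motif_columns(toy_alignment, toy_family, "LTV", "fam_A"))
print(family_specific_motif_columns(toy_alignment, toy_family, "GD", "fam_B"))
```

A check of this kind only works when the alignment places the motif residues in the same columns across family members, which is the property the case study above attributes to the structure-based protocol.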

CONCLUSIONS

The PASS2 database organizes structure-based sequence alignments of protein domain superfamilies in correspondence with SCOP definitions. In this update of the PASS2 database, PASS2.4, we have introduced a maximal level of automation. In addition, PASS2.4 alignments were useful for aligning functionally important residues as well as family-specific residues (Supplementary Figures SF4–SF6). We also suggest that structurally deviant superfamily members could be removed as outliers, so that such extremely distant relationships do not influence the alignment. Analysis of structural and sequence differences amongst known superfamily members will hopefully provide useful guidelines for modelling distantly related proteins.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online: Supplementary Tables 1–3, Supplementary Figures 1–6 and Supplementary References [28,29].

FUNDING

Funding for open access charge: Department of Biotechnology, India. Conflict of interest statement. None declared.

28 in total

1. Large-scale comparison of protein sequence alignment algorithms with structure alignments.

Authors: J M Sauder; J W Arthur; R L Dunbrack
Journal: Proteins Date: 2000-07-01

2. Protein structure similarities.

Authors: P Koehl
Journal: Curr Opin Struct Biol Date: 2001-06 Impact factor: 6.809

3. Comparison of sequence and structure alignments for protein domains.

Authors: Aron Marchler-Bauer; Anna R Panchenko; Naomi Ariel; Stephen H Bryant
Journal: Proteins Date: 2002-08-15

4. MINRMS: an efficient algorithm for determining protein structure similarity using root-mean-squared-distance.

Authors: Andrew I Jewett; Conrad C Huang; Thomas E Ferrin
Journal: Bioinformatics Date: 2003-03-22 Impact factor: 6.937

5. HOMSTRAD: recent developments of the Homologous Protein Structure Alignment Database.

Authors: Lucy A Stebbings; Kenji Mizuguchi
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

6. The ASTRAL Compendium in 2004.

Authors: John-Marc Chandonia; Gary Hon; Nigel S Walker; Loredana Lo Conte; Patrice Koehl; Michael Levitt; Steven E Brenner
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

7. The structure of the N-terminal domain of riboflavin synthase in complex with riboflavin at 2.6A resolution.

Authors: Winfried Meining; Sabine Eberhardt; Adelbert Bacher; Rudolf Ladenstein
Journal: J Mol Biol Date: 2003-08-29 Impact factor: 5.469

8. PASS2: a semi-automated database of protein alignments organised as structural superfamilies.

Authors: V Mallika; Anirban Bhaduri; R Sowdhamini
Journal: Nucleic Acids Res Date: 2002-01-01 Impact factor: 16.971

9. PASS2: an automated database of protein alignments organised as structural superfamilies.

Authors: Anirban Bhaduri; Ganesan Pugalenthi; Ramanathan Sowdhamini
Journal: BMC Bioinformatics Date: 2004-04-02 Impact factor: 3.169

10. CUSP: an algorithm to distinguish structurally conserved and unconserved regions in protein domain alignments and its application in the study of large length variations.

Authors: Sankaran Sandhya; Barah Pankaj; Madabosse Kande Govind; Bernard Offmann; Narayanaswamy Srinivasan; Ramanathan Sowdhamini
Journal: BMC Struct Biol Date: 2008-05-31
