PHYSICO2: an UNIX based standalone procedure for computation of physicochemical, window-dependent and substitution based evolutionary properties of protein sequences along with automated block preparation tool, version 2.

Shyamashree Banerjee, Parth Sarthi Sen Gupta, Arnab Nayek, Sunit Das, Vishma Pratap Sur, Pratyay Seth, Rifat Nawaz Ul Islam, Amal K Bandyopadhyay.

Abstract

UNLABELLED: Automated genome sequencing is enriching sequence databases very rapidly. Keeping analysis in step with the entry of sequences into databases requires efficient software. To this end, PHYSICO2, compared with the earlier PHYSICO and other public-domain tools, is the most efficient in that it i] extracts physicochemical, window-dependent and homologous-position-based substitution (PWS) properties, including positional and BLOCK-specific diversity and conservation, ii] lets users optionally set relevant input parameters, iii] helps users prepare BLOCK-FASTA files with the program's Automated Block Preparation Tool (ABPT), iv] performs fast, accurate and user-friendly analyses and v] writes itemized outputs in Excel format along with a detailed methodology. The program package contains documentation describing how the methods are applied. Overall, the program acts as an efficient PWS analyzer and finds application in sequence bioinformatics. AVAILABILITY: PHYSICO2 is freely available at http://sourceforge.net/projects/physico2/, with its documentation at https://sourceforge.net/projects/physico2/files/Documentation.pdf/download, for all users.

Keywords:  ABPT; BLOCK; CYGWIN; FASTA; Program; Protein sequence

Year:  2015        PMID: 26339154      PMCID: PMC4546997          DOI: 10.6026/97320630011366

Source DB:  PubMed          Journal:  Bioinformation        ISSN: 0973-2063


Background

Candidate sequences of almost every protein in the database belong to a named family (e.g. cytochrome c) spanning many taxonomic groups (e.g. metazoa, cyanobacteria). Comprehensive statistical analyses of a) physicochemical [1, 2, 3], b) window-dependent [2] and c) homologous position-based substitution (PWS) properties [4], and their comparison among taxonomic groups, have been a recent trend in sequence bioinformatics [1, 2]. With sequence databases now covering about 6000 genomes and still growing, rapid yet accurate PWS analyses are needed to keep the study of sequences in step with their entry through genome projects. Several web tools perform either physicochemical [5, 6] or window-dependent [6] analysis on a per-sequence basis, for one [5, 6] or a few properties [6]. Other web tools use amino acid index values [7] to predict the interaction profile of one sequence [8, 9] or of several sequences [9] per run. However, web tools that allow mass-scale, user-friendly analysis of all three PWS properties (3-in-1) in a single run, from any form of FASTA file, are rare. Insight into PWS differences among taxonomic groups is of great significance in sequence bioinformatics [1, 2, 4], but obtaining it by analyzing one sequence at a time and then computing the average is computationally costly. Moreover, managing the resulting analytical and graphical web data by this procedure is cumbersome and error-prone. Further, sharing the same web software among users worldwide can slow processing. PHYSICO [10], in contrast, performs batch analyses of PWS properties, but the range of properties it covers is still limited.
To serve users better, an upgraded version was needed so that users i] have flexibility in setting relevant input parameters, ii] obtain additional outputs, namely window-dependent profiles for RAW-FASTA input, pI profiles and, in the same output format as PHYSICO, new reports on substitution-based positional and BLOCK-specific diversity and conservation, and iii] can access detailed documentation on the principles and methodologies used in the program. Although capable of analysis, PHYSICO cannot perform the painstaking preparation of BLOCK-FASTA files, which is necessary not only to extract additional information but also to compare novel evolutionary properties among taxonomic groups of a given family. PHYSICO2 incorporates all of the above attributes, together with a comprehensive upgrade of PWS properties relative to the earlier version, and is thus a unique tool in sequence bioinformatics.

Methodology

The program accepts an input FASTA file of either form (Figure 1: F1 and F2). On execution, it optionally lets users replace the default input parameters (DPAR), such as residue classes, pI method and Shannon threshold, with their own (UPAR). The program then enters the first phase (P1) of analysis. In contrast to BLOCK-FASTA (F2) input, RAW-FASTA (F1) input produces only one output (Figure 1: R1): because homologous positions in a RAW-FASTA file are not aligned and cannot be compared, the column-specific analyses, which produce three additional outputs for BLOCK-FASTA input (B2, B3 and B4), are skipped. A RAW-FASTA file (F1) holding sequences from one taxonomic group, or from several, is readily converted into one BLOCK-FASTA file or several BLOCK-FASTA files of identical width, respectively, using the program's Automated Block Preparation Tool (ABPT; F3). In the second phase of computation (P2), the program performs window-dependent property analysis. If the input is BLOCK-FASTA (where homologous positions are aligned), all sequence-specific profiles are written to one Excel table (R5), which makes it easy to compute the mean and standard deviation for taxonomically related sequences (see Documentation); otherwise, each sequence-specific profile is saved separately (R21, R22, etc.) in a named directory (see Documentation).
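The window-dependent analysis of the second phase (P2), and the per-window mean and standard deviation over a BLOCK, can be illustrated with a minimal Python sketch. It assumes the Kyte-Doolittle hydropathy scale as the example property, an arithmetic mean over a fixed window, gap characters ignored inside a window, and the population standard deviation. None of these choices is taken from PHYSICO2 itself (which is written in AWK), and the names `window_profile` and `block_mean_sd` are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional, Tuple

# Kyte-Doolittle hydropathy values, used here only as an example property scale.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_profile(seq: str, window: int = 3,
                   scale: Dict[str, float] = KYTE_DOOLITTLE) -> List[Optional[float]]:
    """Average the scale over each full window of `window` consecutive positions.

    Characters absent from the scale (e.g. '-' gaps) are skipped; a window
    containing no scored residue yields None.
    """
    profile: List[Optional[float]] = []
    for start in range(len(seq) - window + 1):
        values = [scale[aa] for aa in seq[start:start + window] if aa in scale]
        profile.append(mean(values) if values else None)
    return profile


def block_mean_sd(block: List[str], window: int = 3) -> List[Tuple[Optional[float], Optional[float]]]:
    """Per-window mean and population SD across equal-width (BLOCK) sequences."""
    profiles = [window_profile(seq, window) for seq in block]
    summary: List[Tuple[Optional[float], Optional[float]]] = []
    for values_at_window in zip(*profiles):  # same window position in every sequence
        scored = [v for v in values_at_window if v is not None]
        summary.append((mean(scored), pstdev(scored)) if scored else (None, None))
    return summary


if __name__ == "__main__":
    for i, (m, sd) in enumerate(block_mean_sd(["AAAV", "AAAI"]), start=1):
        print(f"window {i}: mean={m:.2f} sd={sd:.2f}")
```

Because every sequence in a BLOCK has the same width, each sequence yields the same number of windows, so `zip(*profiles)` lines up homologous windows across sequences; for RAW-FASTA input that alignment does not exist, which is why per-sequence profiles are kept separate.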
Figure 1: Flow chart of how PHYSICO2 works.

Program input

PHYSICO2 has been extensively tested in the CYGWIN (32-bit) environment. It takes either RAW-FASTA (Figure 1: F1) or BLOCK-FASTA (F2) as input. While the former can be used directly as downloaded from the database, the latter must be prepared (manually or programmatically) before use. Unlike PHYSICO, where the BLOCK-FASTA file has to be prepared manually, PHYSICO2 includes ABPT for its preparation (see Documentation). Users are also prompted for input parameters such as residue classes, pI method and Shannon threshold.
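To illustrate the two input forms, the following minimal sketch parses FASTA text and treats it as BLOCK-FASTA only when it holds two or more sequences of identical width. The identical-width test is an assumption based on the Methodology's description of BLOCK-FASTA files of identical width; PHYSICO2's own input checks may differ, and `parse_fasta` and `is_block_fasta` are hypothetical names.

```python
from typing import List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs, joining wrapped sequence lines."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())  # residues before any header are ignored
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def is_block_fasta(records: List[Tuple[str, str]]) -> bool:
    """Assumed test: two or more sequences, all of identical width (gaps counted)."""
    widths = {len(seq) for _, seq in records}
    return len(records) > 1 and len(widths) == 1


if __name__ == "__main__":
    raw = ">seq1\nMKTAYIAK\n>seq2\nMKTA\n"
    block = ">seq1\nMKT-YIAK\n>seq2\nMKTAY-AK\n"
    print(is_block_fasta(parse_fasta(raw)))    # False: widths 8 and 4
    print(is_block_fasta(parse_fasta(block)))  # True: both width 8
```

Equal width alone does not prove that homologous positions are aligned; it is only the minimum a BLOCK-FASTA file must satisfy for column-wise analyses to be defined.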

Method of computation, performance of the program and experimental validation

A detailed description of the method precedes the analytical results of each item in each output file. We ran PHYSICO2 analyses on representative candidate sequences from two taxonomic groups (metazoa: 31 sequences; cyanobacteria: 32 sequences) of the “cytochrome c family” on an Intel(R) Core™ i3 CPU M330 @ 2.13 GHz PC in a CYGWIN (32-bit) environment. We also performed the same analysis with “PROTPARAM” for physicochemical properties (8 properties [6] and their averages) and with “PROTSCALE” (individual and average profiles) for window-dependent properties [6], over an “Alliance Broadband: PRIME (54Mbps) package” internet connection. Even with efficient use, these web tools took ≥ 10 hours to produce these results in Excel. By contrast, PHYSICO2 needed only 6 minutes to produce the same results plus additional properties in PWS format. To compare itemized results, the same set of sequences was analyzed with the available web tools [5, 6] and with PHYSICO2. Not every item could be compared, because no public-domain program computes some of them (e.g. substitution hetero-pair diversity), but the items that could be compared gave identical results (data not shown).
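One of the physicochemical properties compared here, the isoelectric point (pI), can be sketched as a bisection search for the pH at which the Henderson-Hasselbalch net charge of the sequence is zero. The pKa values below are illustrative assumptions, not the table behind any of PHYSICO2's selectable pI methods or behind ProtParam, and `net_charge` and `isoelectric_point` are hypothetical names.

```python
from typing import Dict

# Illustrative pKa values (an assumption, not PHYSICO2's or ProtParam's table).
PKA: Dict[str, float] = {
    "Nterm": 8.6, "Cterm": 3.6,
    "K": 10.8, "R": 12.5, "H": 6.5,           # basic groups: positive when protonated
    "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1,  # acidic groups: negative when deprotonated
}


def net_charge(seq: str, ph: float, pka: Dict[str, float] = PKA) -> float:
    """Net charge of a linear peptide at `ph` from the Henderson-Hasselbalch equation."""
    positive = 1.0 / (1.0 + 10 ** (ph - pka["Nterm"]))
    negative = 1.0 / (1.0 + 10 ** (pka["Cterm"] - ph))
    for aa in seq.upper():
        if aa in ("K", "R", "H"):
            positive += 1.0 / (1.0 + 10 ** (ph - pka[aa]))
        elif aa in ("D", "E", "C", "Y"):
            negative += 1.0 / (1.0 + 10 ** (pka[aa] - ph))
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect pH in [0, 14]; net charge falls monotonically as pH rises."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid   # still positive: pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2


if __name__ == "__main__":
    for peptide in ("DDDD", "GGGG", "KKKK"):
        print(peptide, round(isoelectric_point(peptide), 2))
```

Different pI methods differ mainly in their pKa tables, so results from this sketch should not be expected to match any particular web tool exactly.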

Program output

PHYSICO2 writes one itemized output on physicochemical properties for each BLOCK-FASTA file (Figure 1: B1) and each RAW-FASTA file (R1). Unlike a RAW-FASTA file, a BLOCK-FASTA file yields three additional outputs based on homologous positions (B2, B3 and B4). For a BLOCK-FASTA file, the window-dependent properties of all sequences are written to one compact output (Figure 1: B5); for a RAW-FASTA file, they are written to separate outputs (Figure 1: R11, R22, etc.). Each item in each output is preceded by a detailed description of its method.
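PHYSICO2 reports positional and BLOCK-specific diversity and conservation and prompts users for a Shannon threshold. A minimal sketch of that idea follows, assuming Shannon entropy in bits over the non-gap residues of each BLOCK column and a hypothetical rule that a column counts as conserved when its entropy falls below the threshold. PHYSICO2's exact formula, logarithm base and threshold semantics may differ, and `column_entropy` and `classify_positions` are hypothetical names.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple


def column_entropy(column: str, gap: str = "-") -> Optional[float]:
    """Shannon entropy in bits of non-gap residues at one position; None if all gaps."""
    residues = [aa for aa in column if aa != gap]
    if not residues:
        return None
    n = len(residues)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(residues).values())


def classify_positions(block: List[str], threshold: float = 1.0) -> List[Tuple[int, Optional[float], str]]:
    """Label each BLOCK column 'conserved' (entropy < threshold), 'diverse', or 'gap'.

    The threshold rule is a hypothetical stand-in for PHYSICO2's Shannon threshold.
    """
    results: List[Tuple[int, Optional[float], str]] = []
    for position, column in enumerate(zip(*block), start=1):  # zip(*block) yields columns
        h = column_entropy("".join(column))
        label = "gap" if h is None else ("conserved" if h < threshold else "diverse")
        results.append((position, h, label))
    return results


if __name__ == "__main__":
    block = ["MKTA", "MKSA", "MRTA"]
    for position, h, label in classify_positions(block, threshold=0.5):
        print(position, round(h, 3), label)
```

A column with a single residue type has entropy 0, and a column with 20 equally frequent residues has the maximum of log2(20), about 4.32 bits, so a user threshold somewhere in that range separates conserved from diverse positions.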

Caveats and future development

PHYSICO2 is written in AWK and runs from the C shell or Bourne shell of the CYGWIN (32-bit) environment. We are developing a GUI-based application for the program.

Conclusion

Unlike other public-domain programs, PHYSICO2 acts as a 3-in-1 PWS analyzer. Compared with the earlier PHYSICO, it not only doubles the number of PWS properties in its outputs but also produces new outputs and new items within outputs, with detailed methodology, from the same input file as before. It gives users flexibility in setting input parameters and automates preparation of BLOCK-FASTA files.
References

1. Protein identification and analysis tools in the ExPASy server.

Authors: M R Wilkins; E Gasteiger; A Bairoch; J C Sanchez; K L Williams; R D Appel; D F Hochstrasser
Journal: Methods Mol Biol   Date: 1999

2. Methods and algorithms for statistical analysis of protein sequences.

Authors: V Brendel; P Bucher; I R Nourbakhsh; B E Blaisdell; S Karlin
Journal: Proc Natl Acad Sci U S A   Date: 1992-03-15

3. AAindex: Amino Acid Index Database.

Authors: S Kawashima; H Ogata; M Kanehisa
Journal: Nucleic Acids Res   Date: 1999-01-01

4. SOPMA: significant improvements in protein secondary structure prediction by consensus prediction from multiple alignments.

Authors: C Geourjon; G Deléage
Journal: Comput Appl Biosci   Date: 1995-12

5. Amino acid substitutions preserve protein folding by conserving steric and hydrophobicity properties.

Authors: I Ladunga; R F Smith
Journal: Protein Eng   Date: 1997-03

6. Update of PROFEAT: a web server for computing structural and physicochemical features of proteins and peptides from amino acid sequence.

Authors: H B Rao; F Zhu; G B Yang; Z R Li; Y Z Chen
Journal: Nucleic Acids Res   Date: 2011-05-23

7. Computational and statistical analyses of amino acid usage and physico-chemical properties of the twelve late embryogenesis abundant protein classes.

Authors: Emmanuel Jaspard; David Macherel; Gilles Hunault
Journal: PLoS One   Date: 2012-05-16

8. PHYSICO: An UNIX based Standalone Procedure for Computation of Individual and Group Properties of Protein Sequences.

Authors: Parth Sarthi Sen Gupta; Shyamashree Banerjee; Rifat Nawaz Ul Islam; Sudipta Mondal; Buddhadev Mondal; Amal K Bandyopadhyay
Journal: Bioinformation   Date: 2014-02-19

9. Analogue encoding of physicochemical properties of proteins in their cognate messenger RNAs.

Authors: Anton A Polyansky; Mario Hlevnjak; Bojan Zagrovic
Journal: Nat Commun   Date: 2013
