Literature DB >> 22829720

CARd: Carbon distribution analysis program for protein sequences.

Ekambaram Rajasekaran1.   

Abstract

Carbon distribution is responsible for stability and structure of proteins. Arrangement of carbon along the protein sequence is depends on how the amino acids are organized and is guided by mRNAs. An atomic level revision is important for understanding these codes. This will ultimately help in identification of disorders and suggest mutations. For this purpose a carbon distribution analysis program has been developed. This program captures the hydrophobic / hydrophilic / disordered regions in a protein. The program gives accurate results. The calculations are precise and sensitive to single amino acid resolution. This program is to help in mutational studies leading to protein stabilisation.

Entities:  

Keywords:  carbon distribution; hydropathy; hydrophobic hydrophilic; mutational study; program; protein disorder

Year:  2012        PMID: 22829720      PMCID: PMC3398768          DOI: 10.6026/97320630008508

Source DB:  PubMed          Journal:  Bioinformation        ISSN: 0973-2063


Background

Proteins evolve in response to the nature of interaction and stability. Hydrophobic force considered to be the major force involved in protein folding and action. Carbon is the element contributes towards this dominant force. Arrangement of this carbon along the sequence depends on how the amino acids are organized. Recent studies on this matter find that proteins prefer to have 31.45% of carbon [1, 2] for stability in its structure and sequence. Given this standard, a carbon analysis program [2] has been developed to study the carbon distribution profile of protein sequences. This is not sensitive to single amino acid level. However it program can be used to see carbon along the sequence [3]. This program helps in identification of ligand binding sites [4, 5] and to understand the problem of proteinprotein and protein-DNA interactions [6]. However protein disorders in a short stretch of sequence and possible mutations are not possible to predict. It is reported that allotment of carbon along the sequence is responsible for disorders in proteins [7]. With this idea and carbon scale, a new program CARd has been developed for sensitive measure of hydrophobic quantity at amino acid level.

Methodology

Dataset:

The sample sequence of super oxide dismutase (SOD) was taken from Swissprot database. The UniprotKB ID number of the protein is P00441.

Method:

The flow diagram (Figure 1) outlines how the distribution of inner lengths based on carbon fraction is counted in a outer length. The pink coloured sequence is protein sequence which is converted into atomic sequence (shown in multiple colours (red+blue+pink). The red portion is an inner length. The blue portions are outer length which includes the red portion as well. The entire atomic sequences are given in pink colour which includes both blue and red. Here the outer and inner lengths are taken as 100 and 35 atoms. There are 65 (100-35) inner lengths for statistics. These 65 inner lengths are grouped based on carbon fraction. The C11 has (11/35) 0.31 carbon fraction. The flow chart (Figure 2) on the other side shows how the algorithm works in calculation. A program has been written to do all this calculations. The program reads the protein sequence, converts into atomic sequence, takes a length (anything from 77 to 700 atoms) of sequence, split into small lengths (from 32 to -350 atoms) of equal sizes, finds fraction of carbon atoms in all this small lengths and counts number of small lengths that contain a defined fraction of carbon. There are small lengths with 0.25 / 0.45 carbon but maximum at around 0.31. A distribution of range of small lengths based on carbon fraction appears like a normal distribution curve. This distribution curve is obtained for all possible outer length. The outer lengths can be any length between 77 and 700 corresponds to 5 and 45 amino acids. Any outer length chosen between 100 and 150 atoms is sufficient for most of the observations. Here it is chosen as 150 in all calculations. The inner length can be between 32 and 350 which are not exceeding half of the outer length. Inner length of 35 atoms is chosen here in all calculations as it is the smallest unit with 11 carbons which can produce fraction of 0.31. The outer length is moved with step value of selected atoms. Normally it is half of the outer length. Here it is 75 atoms as the outer length is 150. A carbon distribution profile is obtained for all possible outer length.
Figure 1

Flow Diagram showing how carbon distribution obtained with Outer-Inner length method

Figure 2

Flow chart showing how the algorithm works in the CARd program.

Generally normal profiles will have a Gaussian distribution curve with maximum frequency at 0.3145. Any shift from this maximum is considered as hydrophilic (negative side) or hydrophobic (positive side). Difference in normal distribution is considered as disordered outer length which contains improper amino acid distribution. When the outer length contains a proper arrangement of amino acids then there is normal distribution. That is improper arrangements can be identified from this calculations means that a stretch of sequence which are hydrophobic or hydrophilic or unstable can be predicted. The statistical mean, median, mode and standard deviation of the distribution profile curve can be obtained to check whether the distribution is normal or not. While the mean, median and mode are equal for a given stretch, it is considered as normal distribution. Then the stretch of sequence is in proper mode of arrangement. CARd program computes for every (outer length) distribution profile and prints at the end of the profile results. CARd, the flexible program has option to output mean, median, mode and standard deviation at every amino acid site. It also gives simple average carbon fraction at an amino acid site and for given output length. This replaces our earlier CARBANA program [2] for carbon analysis. A plot of mean value versus amino acid number can be plotted. This plot is to see the overall hydrophobic or hydrophilic regions in the sequence. It is similar to hydropathy plot.

Discussion

Carbon distribution profile:

CARd analysis is carried out for super oxide dismutase sequence. An outer length of 150 atoms and inner length of 35 atoms are used. The carbon distribution plot (Figure 3) is obtained for every outer length with a gap of 75 atoms. The plots show the frequency versus carbon fraction. Each plot is labeled with the range of amino acids involved in the particular outer length. The first plot shows distribution profile for first 10 amino acids and the next one is for 11 amino acids from 6 to 16. Third outer length is from 10 to 21 amino acids and 11 amino acids long. This way the distribution plot till last possible outer length is obtained. Each plot shows a different distribution profile. Outer lengths 72-82, 77-88, 82-93, 88-99, 114-123, 119-129 and 141-152 are having normal distribution and considered as stable regions. Outer lengths 88-99, 114-123 and 129-141 are in perfect distribution. Amino acids in these lengths are arranged with perfect carbon distribution. Outer lengths 93-104 and 99- 110 are in order based on carbon distribution but hydrophobic in nature. Outer lengths 50-63, 110-119, 123-135, 129-141 and 135-146 are having normal carbon distribution but hydrophilic. Similarly the outer lengths 56-68, 63-72, 68-77 and 104-114 are hydrophilic plus disorder. Infact 56-68, 63-72 and 68-77 are metal binding sites. There are metal binding sites such as 46-56, 77-88, 82-93 and 114-123 are having normal carbon distribution. The outer length 50-63 and 56-68 has a disulphide bond that stabilise the structure but carbon distribution is not in order. The disordered outer lengths need to be taken for mutational study. Similar plot on abnormal proteins reveal that most of the stretches are in disrorder regions. So this carbon distribution analysis program can find disordered proteins, small stretch that are disorder and amino acid responsible for disorder.This mathod is sensitive to single amino acid level. This can be better exploited for mutational study leading to stabilisation of proteome.
Figure 3

CARd analysis on SOD protein with 150 atoms (10 amino acid) of outer length and 35 atoms of inner length. Gap between each outer length is chosen as75 atoms. The individual plots are frequency versus carbon fraction for the short stretch labeled. For example the first plot is for first 10 amino acids, the second one is for residues 6-16, third one is for residues 10-21 and so on. The distribution plot is shown till last possible outer length. Each plot shows a different distribution curve. If the distribution plot is normal and maximum frequency at 0.3145, then the particular stretch is normal and stable (e.g. 77-88 and 88-99). When the curve is normal and shifted to left or right then the stretch is a hydrophilic (e.g. 125-146) or hydrophobic (e.g. 10-21) region. If the distribution in not normal then it is disordered region (e.g. 104-114).

Mean median, mode and standard deviation of the distribution profile:

The statistical mean, median, mode and standard deviation of the distribution curve obtained for SOD. The results are not shown. It gives output of average carbon fraction, mean, median with mode and standard deviation of the distribution profile at every amino acid position as shown in Table 1 (see supplementary material). This is achieved by selecting step size of suitable atoms. A plot of amino acid number versus mean values can be obtained to identify the hydrophobic or hydrophilic regions along the sequence. This is similar to CARBANA output on average carbon fraction along the sequence.

Conclusion

CARbon distribution (CARd) analysis program has been developed to capture the hydrophobic/hydrophilic regions in proteins. The calculations are precise and sensitive to single amino acid change. This program is hoped to help in mutational studies leading to protein stabilisation. It can capture the essence of hydrophobicity in small stretch of sequence.
  1 in total

1.  Role of large hydrophobic residues in proteins.

Authors:  Veerasamy Jayaraj; Ramamoorthi Suhanya; Marimuthu Vijayasarathy; Perumal Anandagopu; Ekambaram Rajasekaran
Journal:  Bioinformation       Date:  2009-06-13
  1 in total
  2 in total

1.  Carbon distribution to toxic effect in toxin proteins.

Authors:  Kannaiyan Akila; Rajendran Kaliaperumal; Ekambaram Rajasekaran
Journal:  Bioinformation       Date:  2012-08-03

2.  CARd-3D: Carbon Distribution in 3D Structure Program for Globular Proteins.

Authors:  Rajasekaran Ekambaram; Akila Kannaiyan; Vijayasarathy Marimuthu; Vinobha Chinnaiah Swaminathan; Senthil Renganathan; Ananda Gopu Perumal
Journal:  Bioinformation       Date:  2014-03-19
  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.