Literature DB >> 35984536

Plant catalase in silico characterization and phylogenetic analysis with structural modeling.

Takio Nene1, Meera Yadav2, Hardeo Singh Yadav3.   

Abstract

BACKGROUND: Catalase (EC 1.11.1.6) is a heme-containing tetrameric enzyme that plays a critical role in signaling and hydrogen peroxide metabolism. It was the first enzyme to be crystallized and isolated. Catalase is a well-known industrial enzyme used in diagnostic and analytical methods in the form of biomarkers and biosensors, as well as in the textile, paper, food, and pharmaceutical industries. In silico analysis of CAT genes and proteins has gained increased interest, emphasizing the development of biomarkers and drug designs. The present work aims to understand the catalase evolutionary relationship of plant species and analyze its physicochemical characteristics, homology, phylogenetic tree construction, secondary structure prediction, and 3D modeling of protein sequences and its validation using a variety of conventional computational methods to assist researchers in better understanding the structure of proteins.
RESULTS: Around 65 plant catalase sequences were computationally evaluated and subjected to bioinformatics assessment for physicochemical characterization, multiple sequence alignment, phylogenetic construction, motif and domain identification, and secondary and tertiary structure prediction. The phylogenetic tree revealed six unique clusters where diversity of plant catalases was found to be the largest for Oryza sativa. The thermostability and hydrophilic nature of these proteins were primarily observed, as evidenced by a relatively high aliphatic index and negative GRAVY value. The distribution of 5 sequence motifs was uniformly distributed with a width length of 50 with the best possible amino residue sequences that resemble the plant catalase PLN02609 superfamily. Using SOPMA, the predicted secondary structure of the protein sequences revealed the predominance of the random coil. The predicted 3D CAT model from Arabidopsis thaliana was a homotetramer, thermostable protein with 59-KDa weight, and its structural validation was confirmed by PROCHECK, ERRAT, Verify3D, and Ramachandran plot. The functional relationships of our query sequence revealed the glutathione reductase as the closest interacting protein of query protein.
CONCLUSIONS: This theoretical plant catalases in silico analysis provide insight into its physiochemical characteristics and functional and structural understanding and its evolutionary behavior and exploring protein structure-function relationships when crystal structures are unavailable.
© 2022. The Author(s).

Entities:  

Keywords:  Catalase; Homology modeling; In silico; Phylogenetic; Thermostable

Year:  2022        PMID: 35984536      PMCID: PMC9391562          DOI: 10.1186/s43141-022-00404-6

Source DB:  PubMed          Journal:  J Genet Eng Biotechnol        ISSN: 1687-157X


Background

Catalases (EC 1.11.1.6) are iron porphyrin oxidoreductase enzymes that scavenge hydrogen peroxide into water and oxygen [1, 2]. They are heme-containing tetrameric enzymes found in subcellular organelles (peroxisomes), the primary source of H2O2 production during oxidative stress conditions via photorespiratory oxidation, beta oxidation of fatty acids, and purine catabolism [3]. CAT plays a crucial role due to pathological events connected to their dysfunction, such as increased vulnerability to apoptosis, tumor stimulation, regulated aging, and inflammation. It also aids in defensive mechanisms and protects the cell from oxidative damage. Another significant property of catalase is its strong catalytic activity, using H2O2 as a substrate to oxidize phenols, insecticides, herbicides, polyaromatic hydrocarbons, and synthetic textile dyes [4]. Catalase was the first enzyme to crystallize and isolate. They are found in various plant species such as tobacco, Arabidopsis thaliana, pepper, mustard, saffron, maize, castor bean, sunflower, cotton, wheat, and spinach [5-11]. The role of catalase in aging, senescence, and plant defense has been of significant importance. In light of the different applications of catalase mentioned above, the current work is being conducted for in silico analysis from plant sources. Computational investigation of the plant catalase amino sequence revealed the conserved secondary structure in sequences that play a crucial role in evolution. Primary research on catalases was conducted to examine their characteristics and key biological functions. Analyses of the phylogeny of the catalase gene has indicated the existence of three primary clades that separated themselves early in the evolution of this gene family by at least two gene duplication events [12]. A phylogenetic approach could help us account for the intrinsic divergence in enzyme dynamics induced by the natural evolution of sequence variation across time [13]. As genomics advances, computational tools are becoming increasingly crucial in helping to find and describe possible gene families for various industrial uses. This helps untangle the sequence-structure-functional relationship between enzyme protein sequences [14]. The analysis of genes and proteins in silico has gained increased interest, emphasizing the development of biomarkers, drug design, and the development of a very effective microbiological agent suitable for a wide range of industries. The present work aims to understand the catalase evolutionary relationship of plant species and analyze its physicochemical characteristics, homology, phylogenetic tree construction, secondary structure prediction, and 3D modeling of protein sequences and its validation using a variety of conventional computational methods to assist researchers in better understanding the structure of proteins.

Methods

Protein sequence recovery

In FASTA format for various computational analyses, sixty-five full-length catalase protein sequences from various plant sources were retrieved from the NCBI (National Center for Biotechnology Information) database. The number of protein sequences with accession numbers and source organisms is given in Table 1.
Table 1

Selected protein sequences of catalases from different plant sources

Sl. no.Source organismsAccession number of protein sequence retrievedNumber of sequences
1Vigna radiataNP 001304079, BAA02755, ADZ45556, ADZ455554
2Populus deltoidesCAI439481
3Ziziphus jujubaAET975641
4Prunus persicaCAD42908, CAB56850, CAD429093
5Phyllanthus emblicaATO983111
6Nicotiana plumbaginifoliaCAA85426, CAA854242
7Bruguiera gymnorhizaADC956291
8Arabidopsis thalianaCAB80226, CAA17773, CAA455643
9Raphanus sativusAAF717421
10Brassica junceaAAD17934, AAD17936, AAD17935, AAD179334
11Arabis alpinaKFK301471
12Musa acuminataSIW589631
13Solanum tuberosumAAR14052, AAA80650, CAA854703
14Vitis viniferaNP 001268098, AAL837202
15SaccharumAIU99487, AIU99488, AIM43584, AIU994824
16Saccharum spontaneumAIU99481, AIU99480, AIU99485, AIU994864
17Saccharum arundinaceumAIU994841
18Oryza sativaAKO90140, BAA34204, BAA05494, BAA34205, BAA34714, BAA06232, CAA43814, BAA81677, BAA81672, BAA81671, BAA8167011
19Triticum aestivumADF83496, BAA130682
20Festuca arundinaceaCAG239201
21Capsicum annuumNP 001311603, BAF91369, AAF347183
22Solanum melongenaCAA506441
23Solanum lycopersicumAAA341451
24Oryza meridionalisBAA81679, BAA816782
25Oryza rufipogonBAA81676, BAA81675, BAA81674, BAA816734
26Oryza glaberrimaBAA81682, BAA816812
27Oryza barthiiBAA816801
Selected protein sequences of catalases from different plant sources

ProtParam tool for primary sequence analysis

The ExPasy ProtParam tool was used to compute the physiochemical parameters of the selected catalases. ProtParam calculates a variety of physicochemical properties that can be derived from the sequence of a protein. The molecular weight, theoretical pI, amino acid composition, atomic composition, extinction coefficient, estimated half-life, instability index, aliphatic index, and grand average of hydropathicity (GRAVY) are all parameters computed by ProtParam [15] (http://web.expasy.org/protparam/).

Multiple Sequence Alignment (MSA)

The multiple sequence alignment of protein profiles was developed using MEGA 6.1 software to verify the accuracy of the alignment. The ClustalW program was used to perform multiple alignments of sequences.

Amino acid composition

MEGA 11 examined the catalase-encoding amino acid composition where all species’ individual amino acid frequencies were retrieved (https://www.megasoftware.net/).

Phylogenetic tree construction

To better understand the evolutionary relationships between plant species, catalase phylogenetic trees were constructed with MEGA6 software, and the visualization of phylogenetic tree patterns was performed using the neighbor-joining (NJ) method or UPGMA [16].

Motifs search and domain discovery

The analysis of motifs was done using the MEME tool (http://meme.sdsc.edu/meme/meme.html), which was also used to search their protein family using the NCBI conserved domain database (CDD) (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi). The biological activities of conserved protein motif data collected by MEME were analyzed using BLAST, and domains were assessed using InterProScan by offering the most significant possible match of sequences based on their highest similarity score [17].

Prediction of secondary structure

Secondary structures have a direct impact on how proteins fold and deform. This is how various amino acid sequences of plant catalase form helixes, sheets, and turns in the molecule. SOPMA (self-optimized prediction method with alignment) was used to predict the secondary structure of different plant catalases [18]. It is a self-optimized homologous tool based on Levin and his colleagues [19].

Comparative 3D modeling

A query protein sequence from each cluster group generated from a phylogenetic tree of plant catalase was analyzed, and comparative homology modeling was performed using the SWISS-MODEL (http://swissmodel.expasy.org) [20], based on automated comparative 3D modeling of protein structures.

Model evaluation

The most crucial step in homology modeling is model evaluation, which demonstrates that the modeled protein is of acceptable quality. Here, the predicted CAT model was evaluated and verified by the ERRAT value [21], Verify3D score [22], and PROCHECK [23] programs available from the SAVES server (http://nihserver.mbi.ucla.edu/SAVES). The quality of the predicted model was evaluated by Ramachandran plot assessment.

Protein-protein interaction

STRING v10.0 (http://string-db.org/) server was used to determine the catalase interaction of Arabidopsis thaliana with other closely related proteins. The query sequence was Arabidopsis thaliana with accession number CAA45564.1, and a functional protein association network was created [24].

Results

Retrieval of sequences

The protein sequences of many enzymes like peroxidases [25-27], pectinases, proteases [28], lipases [29], phytases, polyphenol oxidases [15], and cellulases [29] have been assessed and analyzed using bioinformatics tools. The current study used various bioinformatic tools to analyze the protein sequences of industrially important enzyme catalases from various plant sources. Around 150 catalase protein sequences from various plant sources were initially retrieved from NCBI using the BLAST method. From there, sequences with more than 70% similarity were selected where only 65 sequences were computationally evaluated based on full-length protein sequences (see Table 1). The diversity of plant sources for catalases was observed and found the largest for Oryza sativa, with 11 accession numbers forming the main group. Oryza sativa consists of four catalase genes OsCATA, OsCATB, OsCATC, and OsCATD [30], with functional variations under various abiotic stress conditions. Multiple accessions of the same catalase source help us gain insight into the structural and functional diversity of enzymatic proteins.

Physicochemical characterization

ProtParam was used to elucidate several physiochemical properties of the sequences. The amino acid residue variability in the 65 catalase protein sequences studied ranged from 90 to 533. The molecular weights varied between 10,322.46 and 61,366.87 daltons, while the pI values varied between 4.53 and 7.95. Most catalases had pI ranging from 5 to 7, while AAF34718 of Capsicum annuum has the pI value of 7.11, and the Oryza family placed in group F of the phylogenetic tree showed pI ranging from 4 to 5. Other physicochemical characteristics such as instability index, aliphatic index, and hydropathicity (GRAVY) were also variable for these CAT proteins. The aliphatic index measures the relative volume filled by the aliphatic side chain of amino acids such as alanine, valine, leucine, and isoleucine and provides information on the thermostability of globular proteins. It may be seen positively in increasing the thermostability of globular proteins. The following formula is used to determine the aliphatic index [31]. The coefficients a and b are the relative volume of valine side chain (a = 2.9) and of Leu/Ile side chains (b = 3.9) to the side chain of alanine. Plant catalases are assumed to be thermostable based on the data shown in Table 2. The instability index represents the in vivo half-life of a protein, and a number greater than 40 suggests a half-life of less than 5 h, while a value less than 40 indicates a half-life of more than 16 h. It also estimates the stability of the protein molecule [32, 33]. Most plant catalases have an instability index of less than 40, except a few that belong to the Oryza, Capsicum annuum, and Brassica juncea families. The hydrophobicity value of a peptide is represented by the grand average hydropathicity index (GRAVY), which is calculated as the sum of the hydropathy values of all amino acids divided by the sequence length, revealing that the negative value of the obtained plant proteins is hydrophilic.
Table 2

Physiochemical characterization of protein sequences of plant catalases as revealed by ProtParam

S. no.Accession numberSource organismsNo. of amino acidsMolecular weightTheoretical pITotal number of negatively charged residues (Asp + Glu)Total number of positively charged residues (Arg + Lys)Instability indexAliphatic indexGrand average of hydropathicity (GRAVY)
1NP001304079Vigna radiata51458955.506.69635938.7673.81−0.488
2BAA02755Vigna radiata var. radiata52560026.276.82646140.1274.70−0.462
3ADZ45556Vigna radiata51459000.146.58645938.3373.30−0.493
4ADZ45555Vigna radiata51558990.106.58645938.7073.30−0.492
5CAI43948Populus deltoides51959759.306.30655738.8771.97−0.479
6AET97564Ziziphus jujuba49257012.286.78615836.6771.34−0.586
7CAD42908Prunus persica51659502.856.67656141.9370.66−0.563
8ATO98311Phyllanthus emblica20623309.007.95192037.1172.43−0.176
9CAA85426Nicotiana plumbaginifolia52760803.506.68625838.7572.73−0.436
10ADC95629Bruguiera gymnorhiza52260289.996.99626039.9672.30−0.521
11CAB80226Arabidopsis thaliana52760803.506.68625838.7572.73−0.436
12CAA17773Arabidopsis thaliana52259978.316.56645939.6572.15−0.508
13CAA45564Arabidopsis thaliana52259932.316.67635939.6574.02−0.484
14AAF71742Raphanus sativus51859445.816.67635940.8070.04−0.521
15AAD17934Brassica juncea49256828.176.63625841.1170.14−0.571
16AAD17936Brassica juncea49256946.316.63625839.9370.14−0.569
17KFK30147Arabis alpina51559241.567.15636242.8771.59−0.529
18AAD17935Brassica juncea49256915.306.90615941.5869.53−0.581
19AAD17933Brassica juncea49657411.826.75625941.8168.97−0.574
20SIW58963Musa acuminata28932993.155.60352533.4375.57−0.248
21AAR14052Solanum tuberosum50958871.716.76625936.2572.81−0.496
22CAA85424Nicotiana plumbaginifolia52760359.756.73615842.6072.54−0.431
23CAB56850Prunus persica51960050.616.44666038.9472.33−0.525
24CAD42909Prunus persica51659586.906.83666343.9870.68−0.582
25NP 001268098Vitis vinifera51559395.136.60646036.3771.53−0.487
26AAL83720Vitis vinifera51659439.226.61646036.5272.54−0.462
27AIU99487Saccharum hybrid cultivar ROC2252961053.517.23616035.7870.96−0.459
28AIU99488Saccharum hybrid cultivar ROC2252961049.507.23616034.8071.70−0.454
29AIU99481Saccharum spontaneum52260096.686.65625734.0670.06−0.475
30AIU99480Saccharum spontaneum52260137.746.76625833.1371.36−0.489
31AIU99484Saccharum arundinaceum52460310.916.65635833.5871.83−0.462
32AIU99485Saccharum spontaneum52260074.726.76615733.2669.67−0.476
33AIM43584Saccharum hybrid cultivar Yacheng05-17953361366.876.79635933.0871.16−0.464
34AIU99486Saccharum spontaneum52260105.776.65625733.8270.79−0.461
35AIU99482Saccharum hybrid cultivar ROC2252961091.496.89636033.3272.25−0.461
36AKO90140Oryza sativa51459001.536.69635928.9472.67−0.476
37BAA34204Oryza sativa Japonica Group49256575.006.49625631.7170.73−0.521
38BAA05494Oryza sativa Japonica Group49256518.896.47625629.8070.35−0.519
39ADF83496Triticum aestivum51959739.156.35665936.5470.08−0.520
40BAA13068Triticum aestivum51959662.626.44655937.7070.27−0.522
41CAG23920Festuca arundinacea52159642.716.13685835.7168.71−0.551
42BAA34205Oryza sativa Japonica Group49256806.006.93605834.5670.73−0.583
43BAA34714Oryza sativa51458477.817.44595938.4065.66−0.570
44NP001311603Capsicum annuum51659027.526.89615941.9171.24−0.456
45BAF91369Capsicum annuum51759149.216.89615942.0571.30−0.443
46AAF34718Capsicum annuum51759102.647.11616042.0271.12−0.439
47CAA50644Solanum melongena51959612.466.49656038.1272.52−0.462
48AAA80650Solanum tuberosum51959490.246.46656040.9167.65−0.536
49AAA34145Solanum lycopersicum52259953.616.46656039.8970.25−0.489
50CAA85470Solanum tuberosum52560195.176.58656138.9070.23−0.501
51BAA06232Oryza sativa51358948.736.78656240.9768.42−0.521
52CAA43814Oryza sativa Indica Group52460255.596.77666339.8868.66−0.496
53BAA81679Oryza meridionalis11913672.554.55211139.1875.55−0.396
54BAA81678Oryza meridionalis11913672.554.55211139.1875.55−0.396
55BAA81677Oryza sativa f. spontanea12313890.664.53201040.1168.37−0.396
56BAA81676Oryza rufipogon11613488.204.70191138.8566.55−0.474
57BAA81675Oryza rufipogon11613488.204.70191138.8566.55−0.474
58BAA81674Oryza rufipogon12113958.734.72201238.5567.02−0.469
59BAA81673Oryza rufipogon12113958.734.72201238.5567.02−0.469
60BAA81672Oryza sativa12013504.304.84201339.1271.67−0.436
61BAA81671Oryza sativa Indica Group13114831.734.86211437.9072.37−0.401
62BAA81670Oryza sativa Japonica Group9010322.464.57191039.9563.89−0.700
63BAA81682Oryza glaberrima11913476.864.86201343.6069.83−0.492
64BAA81681Oryza glaberrima11713291.634.74201244.9071.03−0.463
65BAA81680Oryza barthii11513113.424.95191242.5371.39−0.492
Physiochemical characterization of protein sequences of plant catalases as revealed by ProtParam

Assessment of phylogenetic tree and MSA

The phylogenetic tree revealed six unique clusters labeled A, B, C, D, E, and F, each of which had 4, 22, 12, 5, 7, and 15 protein sequences are shown in Fig. 1. Multiple accessions belonging to the same genus were grouped, suggesting similarity at the sequence level, except for the Oryza sativa protein sequence was distributed in both groups D and F. The phylogenetic analysis provides a depth understanding of how species evolve due to genetic alterations. Scientists can use phylogenetics to examine the path that connects a modern plant CAT organism to its ancestral origin and anticipate future genetic divergence. It can also be helpful in comparative genomics, which analyzes the relationship between genomes of different species by gene prediction or discovery, locating specific genetic regions along a genome [34-36]. Before building the phylogenetic tree, the alignment of multiple sequences is shown in Fig. 2, revealing the degree of homology between the sequences from different plant sources. This information could be used to synthesize a specific catalase probe or primer that would serve as a marker to remove putative genes from sequenced plant strains. The advancement in the comparative genomic study of proteins provides a detailed understanding of functional genes within and between plant species, providing clear evidence for evolution research and gene function hypotheses of plant catalase [37].
Fig. 1

Construction of phylogenetic tree of protein sequences of plant catalases using NJ method. The unique clusters A, B, C, D, E, and F are highlighted, consisting of 4, 22, 12, 5, 7, and 15 members, respectively

Fig. 2

Multiple sequence alignment of distinct clusters A, B, C, D, E, and F of plant catalases

Construction of phylogenetic tree of protein sequences of plant catalases using NJ method. The unique clusters A, B, C, D, E, and F are highlighted, consisting of 4, 22, 12, 5, 7, and 15 members, respectively Multiple sequence alignment of distinct clusters A, B, C, D, E, and F of plant catalases

Motifs and domain identification

The structure and functional complexity of enzymes can be predicted and assessed using attributes such as sequence and function order features, domains, and motifs. Sequence motifs identified by protein sequence analysis can be used as signature sequences for targeted enzymes to determine their putative functions [38-40]. The distribution of 5 sequence motifs among 65 plant catalases was analyzed, uniformly distributed with a width length of 50 with the best possible amino residue sequences, as shown in Table 3. When these motifs were subjected to BLAST, they resembled the plant catalase superfamily PLN02609.
Table 3

The five motifs with best match possible amino acid sequences with their respective domain

MotifsWidthBest possible amino acidsConserved domain
150KFHWKPTCGVKCLMEDEAITVGGTNHSHATQDLYDSIAAGNYPEWKLFIQPlant catalase PLN02609 superfamily
250APGVQTPVIVRFSTVIHERGSPETLRDPRGFAVKFYTREGNFDLVGNNMPPlant catalase PLN02609 superfamily
350DFDPLDVTKTWPEDILPLQPVGRMVLNKNIDNFFAENEQLAFCPAIIVPGPlant catalase PLN02609 superfamily
450KPNPKSHIQENWRILDFFSHHPESLHMFTFLFDDVGIPQDYRHMEGSGVNPlant catalase PLN02609 superfamily
550IYYSDDKMLQTRIFSYADTQRHRLGPNYLQLPVNAPKCAHHNNHHEGFMNPlant catalase PLN02609 superfamily
The five motifs with best match possible amino acid sequences with their respective domain MEGA 11 was used to compute the composition of the amino acid sequences individually. The average amino acid composition was highest for proline at 7.38%, followed by aspartate (7.12%) given in Table 4, suggesting significant conformational rigidity of the secondary structure of the protein due to the distinctive cyclic structure of the proline side chain [41].
Table 4

Amino acid composition (%) of CAT protein from different plant sources

Accession numberAlaCysAspGluPheGlyHisIleLysLeuMetAsnProGlnArgSerThrValTrpTyrTotal
CAB56850.15.911.187.095.676.864.735.444.965.446.861.424.967.333.316.865.914.026.381.893.78423
NP001311603.16.11.836.715.496.15.284.075.085.086.711.835.287.322.646.915.895.286.911.424.07492
AIU99487.15.491.836.915.086.715.695.494.884.886.912.445.896.913.056.54.885.286.51.832.85492
AIU99484.15.491.836.915.286.55.495.495.084.887.112.245.897.113.056.54.475.286.711.832.85492
ADF83496.15.490.817.325.496.15.494.474.674.677.112.245.497.522.247.116.55.286.12.033.86492
CAB80226.15.691.226.715.896.715.284.476.14.886.32.036.17.523.056.915.694.475.691.633.66492
BAA34205.15.280.817.724.476.35.494.885.084.477.321.835.497.722.447.326.914.885.892.243.46492
CAI43948.16.11.636.55.696.55.694.675.084.886.911.635.287.522.856.716.14.276.32.033.66492
CAA45564.15.891.226.715.696.715.284.476.34.886.711.836.17.323.056.915.694.475.491.633.66492
NP001304079.16.50.616.715.697.325.284.885.284.886.711.636.37.112.646.915.694.076.712.033.05492
NP001268098.15.691.026.715.897.325.284.074.475.286.31.835.697.322.856.715.494.887.321.424.47492
SIW58963.13.490.787.755.438.916.594.655.434.267.362.335.817.362.715.045.045.436.981.553.1258
AAR14052.25.471.266.955.686.535.474.425.894.636.741.685.476.533.377.166.114.846.321.683.79475
AKO90140.15.691.837.525.086.15.694.884.884.887.522.445.697.112.646.715.285.086.31.832.85492
AIU99488.15.491.836.715.286.715.695.495.084.886.912.245.896.913.056.54.885.286.51.832.85492
AIU99486.15.281.836.915.286.715.695.495.084.886.912.245.896.912.856.54.675.286.911.832.85492
AIU99485.15.491.836.715.286.715.695.495.084.886.712.245.897.113.056.54.675.286.711.832.85492
AIU99482.15.491.837.115.286.55.495.495.084.886.912.245.696.713.256.54.885.286.711.832.85492
AIU99481.15.691.836.915.286.715.495.495.084.886.912.245.896.713.056.55.085.086.51.832.85492
AIU99480.15.491.636.915.286.55.495.495.084.887.322.035.896.913.056.714.885.286.51.832.85492
AIM43584.15.491.836.915.286.715.495.494.884.886.912.245.896.913.056.54.675.496.711.832.85492
KFK30147.15.891.226.56.16.715.284.475.694.676.32.036.17.722.647.325.894.076.11.633.66492
AET97564.15.891.226.715.696.55.284.885.694.886.711.425.897.523.056.915.894.075.892.033.86492
ADZ45556.16.50.816.715.697.115.284.885.284.886.711.636.17.322.646.915.694.276.52.033.05492
ADZ45555.16.50.816.715.697.115.284.885.284.886.711.636.17.112.646.915.894.276.52.033.05492
AAA80650.15.892.037.115.695.895.893.864.885.086.31.635.497.322.646.915.895.286.51.424.27492
AAA34145.15.892.036.915.895.895.493.865.085.086.51.635.497.522.646.915.695.286.51.424.27492
AAF71742.15.491.226.56.16.715.494.475.494.886.52.036.17.722.646.916.14.276.11.633.66492
AAF34718.16.51.836.55.696.15.284.075.085.086.51.835.287.322.647.115.894.886.911.424.07492
ADC95629.15.891.636.55.496.55.494.885.494.476.911.226.17.522.857.115.494.276.52.033.66492
AAL83720.15.691.026.715.897.325.284.074.475.286.31.835.697.322.856.715.494.887.321.424.47492
AAD17936.15.491.226.56.16.915.494.475.894.676.32.036.37.722.647.115.694.275.891.633.66492
AAD17935.15.491.226.16.36.715.284.475.084.886.52.036.37.722.647.115.894.476.51.633.66492
AAD17934.15.691.226.56.16.715.494.475.694.886.32.036.37.722.646.915.694.276.11.633.66492
AAD17933.15.441.216.256.257.065.244.445.044.846.452.026.257.662.627.066.054.446.451.613.63496
BAA34204.15.691.837.325.286.55.695.084.884.677.112.445.697.112.856.715.285.086.31.832.64492
BAA05494.15.891.837.525.086.55.894.884.884.677.112.445.696.912.856.715.285.086.11.832.85492
BAA02755.16.50.616.715.697.325.284.885.284.886.711.636.37.112.646.915.694.076.712.033.05492
BAF91369.16.11.836.715.496.15.284.075.085.086.711.835.287.322.646.915.895.286.911.424.07492
CAA85470.16.111.836.925.915.915.53.875.095.096.521.435.37.332.656.925.915.56.521.434.28491
BAA06232.16.311.637.545.56.925.54.483.874.286.111.634.898.352.448.154.075.57.741.633.46491
CAA17773.15.691.226.715.896.715.284.476.14.886.32.036.17.523.056.915.694.475.691.633.66492
CAG23920.15.690.817.935.086.15.894.474.884.277.112.035.497.522.447.325.895.495.892.033.66492
CAA50644.16.11.836.715.896.15.284.075.085.086.711.835.287.322.647.115.494.887.111.424.07492
CAA85426.15.691.836.715.696.715.284.884.885.086.712.035.897.522.856.715.284.276.52.243.25492
CAA85424.15.151.447.015.366.85.364.124.744.746.61.865.577.223.097.016.85.366.61.443.71485
CAA43814.16.521.637.545.56.925.54.483.874.285.911.634.897.942.658.154.075.57.941.633.46491
CAD42909.16.11.027.115.696.34.885.084.675.087.111.224.887.722.857.326.14.276.712.033.86492
CAD42908.15.281.226.915.696.55.284.674.885.087.111.425.497.723.056.916.34.076.712.033.66492
BAA13068.15.490.817.325.496.15.494.474.674.677.112.245.497.522.247.116.55.286.12.033.86492
BAA34714.15.691.427.324.475.896.14.474.274.276.51.835.497.722.247.527.725.895.892.442.85492
ATO98311.15.291.186.473.53107.656.473.535.295.292.354.125.882.355.885.885.889.411.182.35170
BAA81682.15.812.3311.639.35.815.813.493.496.985.811.162.338.142.333.493.495.816.983.492.3386
BAA81681.15.952.3811.99.525.954.763.573.575.955.951.192.388.332.383.573.575.957.143.572.3884
BAA81680.15.882.3511.769.415.884.713.533.537.065.881.182.358.242.353.533.535.887.063.532.3585
BAA81679.15.952.3811.99.525.954.763.572.385.957.141.192.388.332.383.573.575.957.143.572.3884
BAA81678.15.952.3811.99.525.954.763.572.385.957.141.192.388.332.383.573.575.957.143.572.3884
BAA81677.15.952.3811.99.525.954.763.572.385.957.141.192.388.332.383.573.575.957.143.572.3884
BAA81676.15.952.3811.99.525.954.763.572.385.957.141.192.388.332.383.573.575.957.143.572.3884
BAA81675.15.952.3811.99.525.954.763.572.385.957.141.192.388.332.383.573.575.957.143.572.3884
BAA81674.16.742.2512.368.995.625.623.372.256.746.741.122.257.872.253.373.375.627.873.372.2589
BAA81673.16.742.2512.368.995.625.623.372.256.746.741.122.257.872.253.373.375.627.873.372.2589
BAA81672.16.592.212.098.795.495.493.32.26.596.591.12.27.692.25.493.35.497.693.32.291
BAA81671.16.592.212.098.795.495.493.32.26.596.591.12.27.692.25.493.35.497.693.32.291
BAA81670.16.672.2212.228.895.565.563.332.226.676.671.112.227.782.224.443.335.567.783.332.2290
Avg. %5.781.457.125.716.595.464.654.974.916.721.885.567.382.766.795.534.96.551.853.43400.9
Amino acid composition (%) of CAT protein from different plant sources Predicting the secondary structure of proteins is critical to understanding protein folding in three dimensions. The secondary structure is predicted using the primary protein sequence [42]. Using SOPMA, the predicted secondary structure of protein sequences revealed the predominance of random coils with more than 40% except for a few sequences such as Capsicum annuum, Solanum melongena, Solanum lycopersicum, Oryza meridionalis, Oryza rufipogon, Oryza glaberrima, and Oryza barthii, which had extended arms in the majority. The alpha helix and beta turn found the highest repeats in Populus deltoides and Oryza sativa, as given in Table 5.
Table 5

Secondary structure prediction of plant catalases using SOPMA

OrganismAccession numberAlpha helixBeta turnRandom coilExtended strand
Vitis viniferaAAL8372027.03% (133)7.7% (38)48.78% (240)16.46% (81)
Vigna radiataADZ45555127.44% (135)7.93% (39)49.39% (243)15.24% (75)
Populus deltoidesCAI43948129.88% (147)7.32% (36)48.37% (238)14.43% (71)
Ziziphus jujubaAET97564128.46% (140)7.52% (37)49.19% (242)14.84% (73)
Prunus persicaCAD42909127.85% (137)7.32% (36)48.98% (241)15.85% (78)
Phyllanthus emblicaATO98311117.96% (29)12.35% (21)40.49% (69)30% (51)
Nicotiana plumbaginifoliaCAA85426127.03% (133)6.91% (34)50.81% (250)15.24% (75)
Bruguiera gymnorhizaADC95629128.86% (142)7.93% (39)47.15% (232)16.06% (79)
Arabidopsis thalianaCAA17773127.64% (136)7.93% (39)48.78% (240)15.65% (77)
Raphanus sativusAAF71742126.42% (130)7.93% (39)50.81% (250)14.84% (73)
Brassica junceaAAD17934128.25% (139)7.52% (37)48.58% (239)15.65% (77)
Arabis alpinaKFK30147128.66% (141)7.93% (39)48.37% (238)15.04% (74)
Musa acuminataSIW58963120.93% (54)10.47% (27)47.29% (122)21.32% (55)
Solanum tuberosumAAR14052227.16% (129)8% (38)50.11% (238)14.74% (70)
SaccharumAIU99482125.20% (124)7.72% (38)49.8% (245)17.28% (85)
Saccharum spontaneumAIU99486125.81% (127)7.72% (38)49.59% (244)16.87% (83)
Saccharum arundinaceumAIU99484128.05% (138)7.72% (38)48.17% (237)16.06% (79)
Oryza sativaBAA34204127.44% (135)8.13% (40)47.76% (235)16.67% (82)
Triticum aestivumBAA13068128.25% (139)7.52% (37)47.36% (233)16.87% (83)
Festuca arundinaceaCAG23920126.63% (131)7.93% (39)49.39% (243)16.06% (79)
Capsicum annuumAAF34718129.07% (143)6.91% (34)14.02% (69)50% (246)
Solanum melongenaCAA50644126.83% (132)6.71% (33)16.06% (79)50.41% (248)
Solanum lycopersicumAAA34145128.86% (142)8.54% (42)15.65% (77)46.95% (231)
Oryza meridionalisBAA81679125% (144)7.64% (44)16.49% (95)50.87% (293)
Oryza rufipogonBAA81674134.83% (31)8.99% (8)14.61% (13)41.57% (37)
Oryza glaberrimaBAA81681126.19% (22)8.33% (7)16.67% (14)48.81% (41)
Oryza barthiiBAA81680127.06% (23)7.06% (6)16.47% (14)49.41% (42)
Secondary structure prediction of plant catalases using SOPMA

Comparative homology modeling and its functional analysis

To predict the 3D structure, a well-known template sequence is required, similar to the query sequence. A single organism from each cluster was selected, as shown in Table 6, and homology modeling of the 3D protein structure was carried out, where Arabidopsis thaliana was found as the query sequence to have the highest sequence identity and the GMQE score. The 3D structure was built by SWISS-MODEL using template 4qol.1.A Bacillus pumilus catalase by extrapolating experimental data from an evolutionarily related protein structure that serves as a template in Fig. 3, and the quality estimation of the predicted model is shown in Fig. 4a. The template’s sequence identity was 53.8% compared to the query sequence, the QMEAN score was −1.44, the GMQE value at 0.81 values, and the predicted model’s oligo state was homotetramer with 1.65 A resolution [43]. As part of the evaluation and validation process, the predicted protein model of the query sequence (in. PDB format) was uploaded to many servers. The Ramachandran plot analysis showed that 89.8% resided in the most favored (red) regions, while 10.1% fell into the additional allowed (brown) regions and 0.4% in the generously allowed regions, validating the quality of the modeled structure given in Fig. 5.
Table 6

Characterization of selected organism modeling from each cluster evaluated by SWISS-MODEL

OrganismTemplateResiduesGMQESequence identity (%)
Vitis vinifera4qol.1.A14-4880.8150.84
Arabidopsis thaliana4qol.1.A17-4880.8153.83
Saccharum spontaneum4qol.1.A14-4900.8151.99
Triticum aestivum4qol.1.A18-4870.8053.32
Solanum tuberosum4qol.1.A17-4880.8050.32
Oryza sativa japonica4qol.1.A14-4890.8146.86
Fig. 3

Predicted protein model of catalase enzyme of Arabidopsis thaliana showing distinct four homo-tetrameric chains

Fig. 4

Predicted protein model quality estimation by SWISS-MODEL

Fig. 5

Ramachandran plot of predicted CAT model from Arabidopsis thaliana generated from PROCHECK. Residues in most favored regions (A, B, L)—89.8%. Residues in additional allowed regions (a, b, l, p)—10.1%. Residues in generously allowed regions (~a, ~b, ~l, ~p)—0.4%. Residues in disallowed regions—0.4%

Characterization of selected organism modeling from each cluster evaluated by SWISS-MODEL Predicted protein model of catalase enzyme of Arabidopsis thaliana showing distinct four homo-tetrameric chains Predicted protein model quality estimation by SWISS-MODEL Ramachandran plot of predicted CAT model from Arabidopsis thaliana generated from PROCHECK. Residues in most favored regions (A, B, L)—89.8%. Residues in additional allowed regions (a, b, l, p)—10.1%. Residues in generously allowed regions (~a, ~b, ~l, ~p)—0.4%. Residues in disallowed regions—0.4% The overall G factor of dihedral angles and covalent forces was −0.16, higher than the allowable threshold of −0.5. A high G factor indicates that a stereochemical characteristic correlates with a high probability of conformation [44, 45]. The predicted model was submitted to the SAVES server. ERRAT plots were used to examine the protein model’s atom distribution with one another and to make decisions regarding the model’s reliability when evaluating the amino acid environment. The overall quality factor of ERRAT was 92.5, indicating a slightly negligible value of the individual residues (Fig. 6). The Verify3D suggested that the CAT model has at least 80% of amino acids with a score > = 0.2 in the 3D/1D profile, while the average residue was around 70.2%, suggesting the compatibility of the predicted model with its amino acid residues [46]. The QMEAN Z-score in Fig. 4b and c was −1.4, which was in the expected range of 0.0 to −2.0, representing a well-defined structure [47]. The cellular machinery is built on a foundation of proteins and their functional relationships. It is necessary to consider a network of webs between organisms to understand biological phenomena. The STRING analysis revealed ten predicted interacting partners of query CAT protein from the organism Arabidopsis thaliana (accession number CAA45564.1), which encodes peroxisomal catalase and revealed glutathione reductase as the closest interacting protein with the shortest distance. On the contrary, ACX5 (putative peroxisomal acyl-coenzyme A oxidase) remained distant from the query protein (Figs. 7 and 8) [48].
Fig. 6

ERRAT plot of Arabidopsis thaliana catalase model with overall quality factor 92.47

Fig. 7

Map of the protein-protein interaction of Arabidopsis thaliana catalase protein

Fig. 8

Predicted interacting protein partners of the query sequence from STRING server

ERRAT plot of Arabidopsis thaliana catalase model with overall quality factor 92.47 Map of the protein-protein interaction of Arabidopsis thaliana catalase protein Predicted interacting protein partners of the query sequence from STRING server

Discussion

Computational approaches have established themselves as a valuable complement to our understanding of the protein universe and its properties. In silico analysis is one of the most helpful tools that contributes significantly to computational biology for exploring the structural and functional properties of the protein. Hence, the study was conducted to explore the structural and functional properties of catalase enzymes from plants using different bioinformatics tools such as ProtParam, MEGA-X, SOPMA, SWISS-MODEL, and SAVES server. The Expasy tool revealed several physiochemical characteristics of the retrieved catalase sequences, each representing its unique behavior. The pH at which a protein does not have a net electrical charge and is considered neutral is known as its isoelectric or isoionic point [49]. In the development of buffer systems for purification and isoelectric focus, the prediction of pI is critical. The study suggested that the theoretical pI value of most plant catalases is acidic ranging from 5 to 7, but Capsicum annuum has an alkaline pI value of 7.11. The instability index of protein catalases ranged from 28.94 to 44.90, except for a few species of catalases having an index of more than 40 with accession number CAD42908, CAD42909 (Prunus persica), AAD17934, AAD17935, AAD17938 (Brassica juncea), KFK30147 (Arabis alpina), CAA85424 (Nicotiana plumbaginifolia), BAF91369, AAF34718 (Capsicum annuum), BAA81682, BAA81681 (Oryza glaberrima), and BAA81680 (Oryza barthii). The aliphatic index refers to the percentage of a protein’s total volume occupied by its hydrophobic aliphatic side chains. The heat stability of a protein depends on its aliphatic index. A higher aliphatic index means that proteins are better able to withstand high temperatures [50]. Catalases with an aliphatic index ranging from 65.66 to 75.55 have substantial amounts of hydrophobic amino acids and are very thermally stable. The hydrophilic nature of the plant catalases was observed with the GRAVY score. The GRAVY negative score indicates that the protein could be globular (hydrophilic) rather than membranous (hydrophobic). This information could aid in the identification of these proteins [51]. The phylogenetic tree analysis was constructed using the maximum likelihood method to show evolutionary relationships among plant catalases. The distribution of Oryza sativa in different clusters C, D, and F revealed its genetic diversity and similarity with Festuca arundinacea and Saccharum spontaneum. Using a Pfam database search and NCBI/CDD-BLAST, the proteins were categorized into specific families based on the presence of a specific domain of their sequences. The NCBI BLAST designated the PLN02609 superfamily for catalase proteins with conserved domains. Overlapping annotations on the same protein sequences are generated by a superfamily, which is a collection of conserved models that have evolutionary domains. Protein secondary structure prediction from sequences is regarded as a link between the prediction of primary and tertiary structures [52]. Based on catalase secondary structure prediction, it was revealed the predominance of random coils followed by alpha helix in most of the catalases [3], which is highly similar to the results of CAT1 genes of PgCAT1, Soldanella alpina, and Gossypium hirsutum [7]. Random coils are irregular secondary arrangements found in the N and C terminal arms and loops of the protein structure occur because of electrostatic repulsion and steric hindrance of bulky adjacent residues such as isoleucine or charged residues such as glutamic acid or aspartic acid. In a random coil state, the average conformation of each amino acid residue is independent of the conformations of all residues other than those immediately proximal in the primary structure [53]. The amino acid composition of plant catalases revealed the highest proline content, which could explain the predominant coiled structural content. Proline has the unique ability to cause coiling by disrupting secondary conformations by causing kinks in polypeptide chains [54]. In silico prediction of a 3D model of a protein is a difficult element of correlating data received from NMR or crystallography-based approaches [48]. The query sequence (CAA45564) was blasted against PDB to find the best template. The highest sequence identity of 53.8% with negative QMEAN value and GMQE score suggested the template selection 4qol.1.A of Bacillus pumilus catalase. The validation of the predicted structure was performed by computational tools where 89.8% favored region of Ramachandran plot implied good quality of the model. The SAVES server tools ERRAT, Verify3D, and QMEAN Z-scores suggested a well-defined protein structure. The functional relationships of our query sequence revealed the glutathione reductase as the closest interacting protein with the shortest distance, which may be associated with the overlapping of its functional roles in the metabolic pathway [55].

Conclusion

In silico analysis of plant catalase protein provides insight into the numerous catalytic sites, allowing for possible manipulation of desirable qualities relevant to various sectors. Phylogenetic analysis revealed the similarity of various plant catalases, elucidating how species evolve genetically. Scientists can use phylogenetics to determine the genetic link between a modern organism and its ancestral origin and anticipate future genetic divergence. Numerous conserved amino acid residues among distinct clusters may allow for developing particular probes or markers that reflect source species from a specific taxon. Secondary structure analysis confirmed the predominance of a random coil followed by an alpha helix, an extended strand, and a beta turn. Plant catalases had the highest proline content in their amino acid composition, which could explain their coiled structural content. Proline has the unique ability to cause coiling in polypeptide chains by disrupting secondary conformations. The predicted 3D CAT model from Arabidopsis thaliana was a homotetramer, thermostable protein with 59-KDa weight, and its structural validation was confirmed by PROCHECK, ERRAT, Verify3D, and Ramachandran plot. In silico protein structure analysis is an extremely valuable technique for exploring protein structure-function relationships when crystal structures are unavailable. It can also help predict ligand-receptor interactions, enzyme-substrate interactions, mutagenesis experiments, SAR data, and loop structure prediction. While these studies build a robust foundation for wet-lab experimentation, they also provide a strong framework for looking at novel sources utilizing metagenomics approaches and directed evolution to incorporate desired functional qualities.
  44 in total

1.  Differential expression of three catalase genes in hot pepper (Capsicum annuum L.).

Authors:  Sang Ho Lee; Chung Sun An
Journal:  Mol Cells       Date:  2005-10-31       Impact factor: 5.034

2.  Stereochemical quality of protein structure coordinates.

Authors:  A L Morris; M W MacArthur; E G Hutchinson; J M Thornton
Journal:  Proteins       Date:  1992-04

3.  In silico comparative analysis and expression profile of antioxidant proteins in plants.

Authors:  S Sheoran; B Pandey; P Sharma; S Narwal; R Singh; I Sharma; R Chatrath
Journal:  Genet Mol Res       Date:  2013-02-27

4.  Homology modeling and phylogenetic relationships of catalases of an opportunistic pathogen Rhizopus oryzae.

Authors:  Beáta Linka; Gerda Szakonyi; Tamás Petkovits; László G Nagy; Tamás Papp; Csaba Vágvölgyi; Sándor Benyhe; Ferenc Ötvös
Journal:  Life Sci       Date:  2012-06-28       Impact factor: 5.037

5.  Effect of proline and glycine residues on dynamics and barriers of loop formation in polypeptide chains.

Authors:  Florian Krieger; Andreas Möglich; Thomas Kiefhaber
Journal:  J Am Chem Soc       Date:  2005-03-16       Impact factor: 15.419

6.  Enhancement of Cd tolerance in transgenic tobacco plants overexpressing a Cd-induced catalase cDNA.

Authors:  Ziqiu Guan; Tuanyao Chai; Yuxiu Zhang; Jin Xu; Wei Wei
Journal:  Chemosphere       Date:  2009-05-26       Impact factor: 7.086

7.  Homology modeling and virtual screening approaches to identify potent inhibitors of VEB-1 β-lactamase.

Authors:  Abdelmonaem Messaoudi; Hatem Belguith; Jeannette Ben Hamida
Journal:  Theor Biol Med Model       Date:  2013-04-02       Impact factor: 2.432

8.  Isolation and in-silico characterization of Peroxidase isoenzymes from Wheat (Triticum aestivum) against Karnal Bunt (Tilletia indica).

Authors:  Shalini Purwar; Ankit Gupta; Geetika Vajpayee; Shanthy Sundaram
Journal:  Bioinformation       Date:  2014-02-19

9.  MEME SUITE: tools for motif discovery and searching.

Authors:  Timothy L Bailey; Mikael Boden; Fabian A Buske; Martin Frith; Charles E Grant; Luca Clementi; Jingyuan Ren; Wilfred W Li; William S Noble
Journal:  Nucleic Acids Res       Date:  2009-05-20       Impact factor: 16.971

10.  DiffLogo: a comparative visualization of sequence motifs.

Authors:  Martin Nettling; Hendrik Treutler; Jan Grau; Jens Keilwagen; Stefan Posch; Ivo Grosse
Journal:  BMC Bioinformatics       Date:  2015-11-17       Impact factor: 3.169

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.