Arafat Rahman Oany¹, Tahmina Pervin Jyoti², Shah Adil Ishtiyaq Ahmad¹
¹ Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Santosh, Tangail, Bangladesh.
² Biotechnology and Genetic Engineering Discipline, Khulna University, Khulna, Bangladesh.
Abstract
Pyrococcus furiosus is a hyperthermophilic archaeon. A hypothetical protein of this archaeon, PF0847, was selected for computational analysis. The basic local alignment search tool (BLAST) and a multiple sequence alignment (MSA) tool were used to search for related proteins. Secondary and tertiary structure predictions were obtained for further analysis. The three-dimensional model was assessed with the PROCHECK and QMEAN6 programs. To gain insight into the physical and functional associations of the protein, STRING network analysis was performed. Docking of the SAM (S-adenosyl-l-methionine) ligand, fetched from an antibiotic-related methyltransferase (PDB code: 3P2K: D), to our protein showed high docking energy and suggested that the protein functions as a methyltransferase. Finally, we looked for a specific function of the proposed methyltransferase: binding of geneticin bound to the eubacterial 16S rRNA A-site (PDB code: 1MWL) in the active site of PF0847 led us to predict that the protein is responsible for aminoglycoside antibiotic resistance.
Pyrococcus furiosus is a hyperthermophilic archaeon. It is considered a model organism for studying hyperthermophilic extremophiles, mostly because it grows best at 100 °C.1 Such archaea are of special interest because of their evolutionary history and unique physiology, and also for the biotechnological applications of their thermostable enzymes.2,3 Recent progress has shown that P. furiosus is highly recombinogenic and able to take up DNA via natural competence.4–8 With advances in sequencing technologies, it is now considerably easier to obtain the whole genome sequence of such single-celled organisms. Still, public databases contain many protein sequences whose functions are yet to be determined experimentally.9 Many open reading frames in the deposited genome sequences have no experimental characterization. In silico analysis of these hypothetical proteins is crucial for predicting their physical properties and biological functions. Here we present the computational function prediction of the hypothetical protein PF0847 of P. furiosus using various bioinformatics tools.
Methyltransferases are a large group of proteins, with different subclasses having defined functions. P. furiosus has been reported to contain 43 methyltransferase proteins with various functional specificities. In addition to these 43 characterized proteins, the P. furiosus genome contains many other hypothetical proteins that contain methyltransferase domains. We presumed that, with such a large collection of a single class of proteins, we might find significant roles for the hypothetical proteins that show sequence similarity with the methyltransferase proteins.
Sgm, a methyltransferase from the actinomycete Micromonospora zionensis, was of particular influence on this study.10 This protein has been shown to confer antibiotic resistance on an organism through its ability to interact with the ribosomal A-site and methylate specific residues, thereby rendering the ribosome insensitive to particular antibiotics.
Materials and Methods
Sequence retrieval
Initially, we searched the NCBI ( http://www.ncbi.nlm.nih.gov/ ) protein database for proteins containing methyltransferase-like sequences. The hypothetical protein PF0847 (gi|18977219|) of P. furiosus (DSM 3638), consisting of 248 amino acid residues, was selected for the study. Then the sequence was stored as a FASTA format sequence for further analysis.
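The FASTA storage step can be illustrated with a minimal parser. This is only a sketch using the Python standard library: the function name `parse_fasta` is hypothetical, and the residues in the example record are invented placeholders, not the actual PF0847 sequence.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Placeholder record: the header reuses the identifier quoted in the text,
# but the residues are invented and are NOT the PF0847 sequence.
EXAMPLE = """>gi|18977219| hypothetical protein PF0847 (placeholder residues)
MKLVEIGSGTG
ALALAKRGA
"""

records = parse_fasta(EXAMPLE)
header, sequence = records[0]
print(header.split()[0], len(sequence))
```

With the real record, the sequence length would be expected to match the 248 residues reported for PF0847.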
Physicochemical properties analysis
The ProtParam ( http://web.expasy.org/protparam/ )11 tool of ExPASy was used to analyze the physicochemical properties of the targeted protein sequence. The properties analyzed with this tool included the aliphatic index (AI), GRAVY (grand average of hydropathy), extinction coefficient, isoelectric point (pI), and molecular weight.
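Three of these quantities follow simple published formulas and can be sketched directly. The code below is an illustration, not ProtParam's own implementation: it uses the standard aliphatic index formula (mole percent of Ala + 2.9 × Val + 3.9 × (Ile + Leu)), the Kyte-Doolittle hydropathy scale for GRAVY, and the 280 nm extinction coefficient with all cysteines assumed reduced (5500 per Trp, 1490 per Tyr). The function names and the test sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mole_percent(seq, residue):
    """Percentage of positions in seq occupied by the given residue."""
    return 100.0 * seq.count(residue) / len(seq)


def aliphatic_index(seq):
    """AI = X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), X in mole percent."""
    return (mole_percent(seq, "A")
            + 2.9 * mole_percent(seq, "V")
            + 3.9 * (mole_percent(seq, "I") + mole_percent(seq, "L")))


def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def extinction_coefficient_reduced(seq):
    """Extinction coefficient at 280 nm (M^-1 cm^-1), assuming no cystines."""
    return 5500 * seq.count("W") + 1490 * seq.count("Y")


toy = "AVIL"  # invented four-residue example, not PF0847
print(aliphatic_index(toy), gravy(toy), extinction_coefficient_reduced("WYY"))
```

The instability index is omitted here because it depends on a 400-entry dipeptide weight table that is not reproduced in this article.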
Homology identification and domain analysis
The PSI-BLAST program of the NCBI database ( http://blast.ncbi.nlm.nih.gov/Blast.cgi ) was used to search the non-redundant database for homologs of PF0847. For the domain analysis, we used the Pfam ( http://pfam.sanger.ac.uk/ ) program of the Sanger Institute.12
Multiple sequence alignment (MSA) and phylogenetic tree construction
To identify sequence conservation among different species and strains, MSA was performed with the BioEdit biological sequence alignment editor,13 and the phylogenetic tree was constructed with the Jalview 2 tool.14
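Conservation between two rows of an alignment is often summarized as percent identity. The sketch below shows one common way to compute it; the function name `percent_identity` and the aligned fragments are hypothetical, and the choice of denominator (all columns except those where both rows have a gap) is an assumption. BLAST and alignment editors may use slightly different denominators, so values need not match their reports exactly.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows taken from the same alignment.

    Columns where both rows have a gap are ignored; a gap opposite a
    residue counts as a compared, non-matching column.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Invented aligned fragments, for illustration only.
row_1 = "MKLVE-GSG"
row_2 = "MKIVEAGSG"
print(round(percent_identity(row_1, row_2), 2))
```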
Structure prediction
The secondary structure of the protein was predicted with the PSIPRED server of the UCL Department of Computer Science ( http://bioinf.cs.ucl.ac.uk/psipred/ ),15 and the tertiary structure was predicted with MODELLER16 through the HHpred17,18 tools of the Max Planck Institute for Developmental Biology.
Model quality assessment
Finally, the quality of the predicted structure was determined by PROCHECK19 and QMEAN620 programs of ExPASy server of SWISS-MODEL Workspace.21
Protein–protein interaction analysis
Proteins interact with each other to carry out their functions. Here we used STRING ( http://string-db.org/ ), a database of known and predicted protein interactions that covers physical and functional associations derived from genomic context, high-throughput experiments, coexpression and previous knowledge. The database quantitatively integrates interaction data from these sources. Currently, it covers 5,214,234 proteins from 1133 organisms.22
Active site detection
The active site of the protein was determined with the computed atlas of surface topography of proteins (CASTp) ( http://sts.bioengr.uic.edu/castp/ ),23 an online resource for locating, delineating, and measuring concave surface regions on three-dimensional structures of proteins. These regions include pockets located on protein surfaces and voids buried in the interior of proteins. This provides an important means of predicting the sites on a protein that interact with ligand molecules.
Docking analysis
Docking analysis was performed with Molegro Virtual Docker (MVD) of CLC bio. MVD provides an integrated environment for studying and predicting how ligands interact with macromolecules, and its docking depends on a novel optimization technique.24 The combined binding of the target protein PF0847 with SAM (S-adenosyl-l-methionine), geneticin and the 16S rRNA A-site was obtained using PyMOL (The PyMOL Molecular Graphics System, Version 1.5.0.4, Schrödinger, LLC).
Results
Analysis of physicochemical properties and homology searching
Different physicochemical properties of the hypothetical protein PF0847 were analyzed using the ProtParam analysis tool (Table 1). The protein, which contains 248 amino acids, was estimated to have a molecular mass of 27,905.5 Da and an isoelectric point (pI) of 9.36.
Table 1
ProtParam tool analysis result.
NO. OF AMINO ACIDS | MW | pI | (ASP + GLU) | (ARG + LYS) | EXT. COEFFICIENT | ALIPHATIC INDEX (AI) | INSTABILITY INDEX (II) | GRAND AVERAGE OF HYDROPATHICITY (GRAVY)
248 | 27905.5 | 9.36 | 29 | 37 | 25900 | 98.63 | 25.90 | −0.104
The non-redundant database was searched for protein sequences homologous to PF0847, and some of the homologs found are listed in Table 2. The Pfam server identified a conserved domain in our targeted protein. MSA was performed among the homologs from Table 2, and the output is shown in Figure 1. Using the same data, a phylogenetic tree was constructed as shown in Figure 2.
Table 2
Similar proteins obtained from non-redundant database.
ENTRY NAME | ORGANISM | PROTEIN NAME | IDENTITY | SCORE | E-VALUE
gi|212375063| | Pyrococcus furiosus | Methyltransferase | 98% | 436 | 1e-152
gi|57641719| | Thermococcus kodakarensis KOD1 | SAM-dependent methyltransferase | 72% | 376 | 9e-129
gi|389852483| | Pyrococcus sp. ST04 | SAM-dependent methyltransferase | 73% | 373 | 2e-127
gi|350525995| | Thermococcus sp. AM4 | Putative RNA methyltransferase | 73% | 370 | 2e-126
gi|545713642| | Galdieria sulphuraria | Ribosomal RNA large subunit methyltransferase F | 35% | 70.5 | 3e-11
gi|340783590| | Acidithiobacillus caldus SM-1 | Adenine-specific methylase YfcB | 32% | 68.9 | 1e-10
Figure 1
MSA among different methyltransferase proteins with the target protein at the top row (Sources for the sequences: Row 2 – P. furiosus, Row 3 – T. kodakarensis KOD1, Row 4 – Pyrococcus sp. ST04, Row 5 – Thermococcus sp. AM4, Row 6 – P. yayanosii CH1, Row 7– Pyrococcus sp. NA2 and Row 8 – T. barophilus MP).
Figure 2
Phylogenetic tree showing average distance among different methyltransferase proteins and the target protein.
Structure analysis and model quality assessment
The PSIPRED server was used to predict the secondary structure of the protein (Fig. 3). The tertiary structure of the protein was modeled by MODELLER (Fig. 4). Quality assessment of the predicted tertiary structure was obtained from the PROCHECK Ramachandran plot, in which 93.4% of the amino acid residues fell within the most favored regions (Table 3 and Fig. 5A). The quality of our model was further checked with the QMEAN6 server, where the model was placed inside the dark zone and considered good (Fig. 5B). The active site of our targeted protein was analyzed by CASTp (Fig. 6). The amino acid residues of the active site were also determined.
Figure 3
Predicted secondary structure of the protein PF0847.
Figure 4
Predicted three-dimensional structure of the protein PF0847.
Table 3
Ramachandran plot statistics of the protein PF0847.
RAMACHANDRAN PLOT STATISTICS | NUMBER OF AA RESIDUES | PERCENTAGE (%)
Residues in the most favored regions [A, B, L] | 198 | 93.4%
Residues in the additional allowed regions [a, b, l, p] | 13 | 6.1%
Residues in the generously allowed regions [~a, ~b, ~l, ~p] | 1 | 0.5%
Residues in the disallowed regions | 0 | 0.0%
Number of non-glycine and non-proline residues | 212 | 100.0%
Number of end-residues (excl. Gly and Pro) | 2 |
Number of glycine residues (shown in triangles) | 24 |
Number of proline residues | 10 |
Total number of residues | 248 |
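The percentages in Table 3 can be reproduced from the residue counts, which is a quick consistency check on the reported statistics. The short sketch below uses only the counts given in the table; the variable names are illustrative.

```python
# Residue counts per Ramachandran region, copied from Table 3.
region_counts = {
    "most favored": 198,
    "additional allowed": 13,
    "generously allowed": 1,
    "disallowed": 0,
}

# Percentages are taken over non-glycine, non-proline residues only.
non_gly_pro = sum(region_counts.values())
percentages = {
    region: round(100.0 * count / non_gly_pro, 1)
    for region, count in region_counts.items()
}

# End residues, glycines and prolines are counted separately in the table.
end_residues, glycines, prolines = 2, 24, 10
total_residues = non_gly_pro + end_residues + glycines + prolines

print(non_gly_pro, percentages, total_residues)
```

The counts sum to 212 non-glycine, non-proline residues, and adding the end residues, glycines and prolines recovers the full length of 248.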
Figure 5
Model quality assessment. (A) Ramachandran plot of modeled structure validated by PROCHECK program. (B) Graphical presentation of estimation of absolute quality of model.
Figure 6
Active site determination of the protein PF0847. (A) The green sphere region indicates the most potent active site. (B) The amino acid residues in the active site.
Biological function analysis
Building on the analysis so far, we used molecular docking to identify the probable ligand. Molegro Virtual Docker docked the selected ligand SAM with both the hypothetical protein and the reference protein (3P2K: D), with grid coordinates X: 21.84; Y: 10.96; Z: −7.07 and X: 21.87; Y: 10.94; Z: −7.12, respectively. The docking results are shown in Figure 7 and Table 4.
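The two sets of grid coordinates are nearly identical, which can be quantified as a Euclidean distance. This is a small worked calculation under two assumptions: that the reported X, Y, Z values are points in the same coordinate frame, and that the distance is in whatever length unit the docking software used (the text does not state it).

```python
import math

# Grid coordinates reported for the two docking runs.
grid_pf0847 = (21.84, 10.96, -7.07)     # hypothetical protein PF0847
grid_reference = (21.87, 10.94, -7.12)  # reference protein 3P2K: D

# Straight-line distance between the two points (math.dist needs Python 3.8+).
separation = math.dist(grid_pf0847, grid_reference)
print(round(separation, 3))
```

The separation is about 0.06 units, so the two docking runs were set up at essentially the same location.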
Figure 7
SAM ligand (red stick) docked in the active site of proteins. (A) SAM-bound antibiotic-related methyltransferase protein (PDB code 3P2K: D). (B) SAM-bound hypothetical protein (PF0847). (C) Interacting amino acid residues of the protein (PDB code 3P2K: D) with SAM. (D) Interacting amino acid residues of the protein PF0847 with SAM. Blue dots indicate hydrogen bonds.
To visualize the protein–protein interaction network of the protein PF0847, STRING was employed, and the resulting network is shown in Figure 8. Following up on the STRING results, we found that geneticin bound to the eubacterial 16S rRNA A-site (PDB code: 1MWL) binds to the active site of the target protein PF0847 (Fig. 9). As we could not find a 3D structure of an archaeal rRNA A-site in the database, we examined the similarity between eubacterial and archaeal rRNA with MSA (Fig. 10).
Figure 8
STRING network representing the predicted functional partners of the protein PF0847.
Figure 9
A-site of 16S rRNA (blue) bound to the protein PF0847 (cyan). The zoom view shows that geneticin (yellow) binds with the protein very close to the SAM (red) binding site.
Figure 10
MSA of the 16S rRNA A-site from different organisms (gi|444303952 was taken from P. furiosus DSM 3638, gi|155076 was taken from the eubacterium T. thermophilus, and 1MWL: A and B were from the PDB crystal structure of geneticin bound to the eubacterial 16S rRNA A-site).
Discussion
Physicochemical properties of the protein, including AI, instability index (II), pI, extinction coefficient and average hydropathicity, were calculated with the ProtParam server. The AI is the relative volume occupied by the aliphatic side chains of amino acids (alanine, leucine, valine and isoleucine). An increase in AI denotes increased thermostability of globular proteins.25 The calculated II of our protein was 25.90; ProtParam classifies proteins with an II below 40 as stable, so the protein is predicted to be stable in test tube conditions.26 The extinction coefficient indicates the light absorption capacity.27 The pI is the pH at which the protein carries no net charge. Most of the calculations in this server relate to protein stability, because stability is related to proper function.28 PSI-BLAST against the non-redundant database revealed 98% identity with a methyltransferase protein (Table 2). It also found similarity with a putative RNA methyltransferase and an adenine-specific methylase. The Pfam server identified a largely conserved methyltransferase domain spanning amino acid residues 79 to 205. MSA among the related proteins showed high conservation within the methyltransferase domain and across the whole protein sequences as well. The phylogenetic tree also showed the evolutionary relationship among different methyltransferase-related proteins of both archaeal and eubacterial origins. It also indicated that the target protein PF0847 had some evolutionary relation with eubacterial methyltransferases, even though they were very distant.
The secondary structure predicted by PSIPRED has good prediction confidence. The tertiary structure was modeled by MODELLER with multiple templates to cover the whole sequence. The quality of the model was assessed by PROCHECK and is represented by a Ramachandran plot (Fig. 5A). According to the plot statistics, 93.4% of residues are in the most favored regions [A, B, L], 6.1% are in the additional allowed regions [a, b, l, p], and 0.0% are in the disallowed regions, statistics that indicate a good model. The QMEAN6 server assessment (Fig. 5B) showed that the Z score of the predicted model was 0.18, which indicates a high-quality model. The active site of the protein predicted by the CASTp server (Fig. 6) gives insight into the active site cleft and the amino acid residues that interact with different ligands.
The STRING interaction network revealed that our targeted protein (PF0847) interacted with four different proteins for its functioning. The protein flpA (PF0059) from P. furiosus, which is involved in pre-rRNA and tRNA processing, showed an experimentally supported interaction with our protein.22 The protein flpA uses the methyl donor SAM to catalyze the site-specific 2′-hydroxyl methylation of ribose moieties in rRNA and tRNA. Site specificity is provided by a guide RNA that base pairs with the substrate. Methylation occurs at a characteristic distance from the sequence involved in base pairing with the guide RNA.29,30 The target protein also showed interactions with hypothetical proteins PF0213, PF0848 and radA, which are involved in DNA repair and recombination pathways.
Molegro Virtual Docker (MVD) performed docking between the SAM ligand and our targeted protein within an integrated environment. The SAM ligand was fetched from an antibiotic-related methyltransferase protein (PDB code 3P2K: D). SAM docked into the active sites of both the reference and targeted proteins (Fig. 7), and the docking results revealed that the binding energy was similar for both (Table 4). The RMSD value indicates how reliable a computed docking pose is, and smaller values indicate better docking. The RMSD values for the docking of SAM to PF0847 and to the reference protein were very close, which suggests a significant binding of SAM with PF0847.
From the insights of the STRING interaction network, we found that our targeted protein also binds geneticin bound to the eubacterial 16S rRNA A-site (Fig. 9). It has been reported that P. furiosus confers antibiotic resistance through the modification of its 23S rRNA,31 that it is a highly recombinogenic organism,4–8 and that there is significant similarity between the bacterial and archaeal ribosomes.32 A crystal structure of an archaeal 16S rRNA A-site would be more suitable for comparing the binding to PF0847, but no such archaeal structure was available in the database, so we had to use the eubacterial structure. To further support our use of a eubacterial structure in an archaeal study, we showed the MSA of the 16S rRNA A-site from archaea (P. furiosus), eubacteria (Thermus thermophilus), and our bound crystal structure, which showed a strong alignment (Fig. 10). Resistance of many bacteria to aminoglycoside antibiotics through 16S rRNA methyltransferases has been reported in many articles.33–37
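The RMSD used above to compare docking poses is the root of the mean squared distance between matched atoms. The sketch below computes it for two poses given in the same coordinate frame, without any superposition step; whether MVD applies additional fitting or symmetry handling is not stated in this article, so this should be read as the generic definition rather than the program's exact procedure. The function name and the coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists.

    Each argument is a sequence of (x, y, z) tuples; atom i in coords_a is
    compared with atom i in coords_b. No superposition is performed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Invented three-atom poses, for illustration only.
pose_reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
pose_docked = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (1.5, 1.5, 1.0)]
print(rmsd(pose_reference, pose_docked))
```

In this toy case every atom is displaced by exactly one unit, so the RMSD is one unit; identical poses give zero.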
Conclusion
The study was designed to predict the three-dimensional structure and biological function of the hypothetical protein PF0847 of P. furiosus DSM 3638. All the above findings suggest that the target protein is a SAM-dependent methyltransferase. The interaction of the protein with the bacterial ribosomal A-site further suggests that it functions as an aminoglycoside antibiotic resistance-conferring protein. The proposed function of PF0847 needs further verification through laboratory experiments. Given the correlation shown in our in silico study, this protein should be of interest, especially for those working with aminoglycoside resistance-conferring proteins. We also encourage similar studies on other hypothetical proteins. We feel that extensive studies along this path might produce some breakthrough leads for future biomedical research.