Lincon Mazumder1, Mehedi Hasan2, Ahmed Abu Rus'd1, Mohammad Ariful Islam1.
Abstract
Campylobacter jejuni is one of the most prevalent organisms associated with foodborne illness worldwide, causing campylobacteriosis and gastritis. Many proteins of C. jejuni remain unidentified. The purpose of this study was to determine the structure and function of a non-annotated hypothetical protein (HP) from C. jejuni. Several properties of the HP (accession No. CAG2129885.1), including physicochemical characteristics, 3D structure, and functional annotation, were predicted using various bioinformatics tools, followed by validation and quality assessment. In addition, the protein-protein interactions and the active site were obtained from the STRING and CASTp servers, respectively. The hypothetical protein is predicted to be acidic, thermally stable, water soluble, and located in the cytoplasm. Alpha-helices and random coils are the most prominent secondary structure components of the protein. The 3D model met the expected quality criteria and was found to be novel. This study identified a potential role of the HP in the 2-methylcitric acid cycle and propionate catabolism. Furthermore, the protein-protein interaction analysis revealed several significant functional partners. The in silico characterization of this protein will help to clarify its molecular mechanism of action. The methodology of this study could also serve as a basis for additional research into proteomic and genomic data for identifying functional potential.
Keywords: Campylobacter jejuni; functional annotation; homology modeling; hypothetical protein; in-silico characterization; propionate catabolism
Year: 2021 PMID: 35012287 PMCID: PMC8752978 DOI: 10.5808/gi.21043
Source DB: PubMed Journal: Genomics Inform ISSN: 1598-866X
List of bioinformatics tools and databases used for sequence-based functional annotation
| Sl | Software | Function | References |
|---|---|---|---|
| A | Sequence similarity search | | |
| 1 | BlastP | Used to find similar sequences in protein databases | |
| 2 | MUSCLE | Used to conduct multiple sequence alignment | |
| 3 | MEGA X | Used for inferring phylogenetic trees | |
| B | Physicochemical characterization | | |
| 4 | ExPASy-ProtParam tool | Used to compute various physical and chemical parameters of a protein | |
| C | Sub-cellular localization | | |
| 5 | CELLO | Used to assign localization to both prokaryotic and eukaryotic proteins | |
| 6 | PSLpred | Used to predict subcellular localization of proteins from Gram-negative bacteria | |
| 7 | PSORTb | Used to predict subcellular localization of bacterial proteins | |
| D | Secondary structure prediction | | |
| 8 | SOPMA | Used to predict the secondary structure of a protein | |
| 9 | PSIPRED | Used for PSI-BLAST-based secondary structure prediction | |
| E | 3D structure prediction and quality assessment | | |
| 10 | HHpred | Used to detect protein homology by HMM-HMM comparison | |
| 11 | YASARA | Used to increase the stability of the 3D model structure | |
| 12 | PyMOL | Used for structural analysis and model figure generation | |
| 13 | PROCHECK’s Ramachandran plot analysis | Used to analyze the quality and accuracy of the predicted 3D model structure | |
| 14 | Verify3D | Used to assess the protein model with 3D profiles | |
| 15 | ERRAT | Used to analyze the statistics of non-bonded interactions between different atoms and verify protein structures | |
| F | Functional annotation | | |
| 16 | CD Search | Used to search for conserved structural and functional domains in a sequence | |
| 17 | InterProScan | Used to search InterPro for motif discovery | |
| G | Protein-protein interaction | | |
| 18 | STRING | Used for predicting protein-protein interactions | |
| H | Active site identification | | |
| 19 | CASTp | Used to find, outline, and estimate inward surface regions on a protein 3D structure | |
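As a rough illustration of the kind of physicochemical parameters a ProtParam-style tool reports, the sketch below computes two of them: the grand average of hydropathicity (GRAVY, using the Kyte-Doolittle scale), which relates to water solubility, and the aliphatic index, which is commonly read as an indicator of thermal stability. The function names and the toy peptide are hypothetical; the sequence of the studied protein is not reproduced in this record, and this is not the server's own implementation.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Mean Kyte-Doolittle hydropathy; negative values suggest a hydrophilic protein."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(sequence: str) -> float:
    """Aliphatic index: X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), X in mole percent."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


if __name__ == "__main__":
    toy_peptide = "MKLAVIDELG"  # hypothetical sequence, for illustration only
    print(f"GRAVY: {gravy(toy_peptide):.3f}")
    print(f"Aliphatic index: {aliphatic_index(toy_peptide):.2f}")
```

Both functions assume a sequence made up only of the 20 standard one-letter residue codes; an ambiguous code such as `X` would raise a `KeyError` in `gravy`.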
Similar protein obtained from non-redundant protein sequences (nr) database
| Protein name | Source organism | Accession ID | Identity (%) | Score | e-value |
|---|---|---|---|---|---|
| MULTISPECIES: MmgE/PrpD family protein | | WP_002866694.1 | 100 | 910 | 0 |
| MmgE/PrpD family protein | | EHD2634150.1 | 99.78 | 909 | 0 |
| MmgE/PrpD family protein | | WP_057100379.1 | 99.78 | 909 | 0 |
| MmgE/PrpD family protein | | WP_193228049.1 | 99.55 | 908 | 0 |
| MULTISPECIES: MmgE/PrpD family protein | | WP_002877370.1 | 99.78 | 908 | 0 |
Similar protein obtained from UniProtKB/Swiss-Prot (swissprot) database
| Protein name | Source organism | Accession ID | Identity (%) | Score | e-value |
|---|---|---|---|---|---|
| Cis-aconitate decarboxylase | | P54987.2 | 27.06 | 133 | 5e-33 |
| Cis-aconitate decarboxylase | | A6NK06.1 | 26.91 | 130 | 6e-32 |
| Uncharacterized protein YxeQ | | P54956.2 | 23.81 | 128 | 2e-31 |
| Cis-aconitate decarboxylase | | B3IUN8.1 | 25.49 | 114 | 2e-26 |
| Cis-aconitate decarboxylase | | Q0C8L3.1 | 25.98 | 113 | 7e-26 |
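The Identity (%) values reported for similarity hits can be understood as the share of alignment columns in which query and subject carry the same residue. A minimal sketch of that calculation is given below, assuming a pairwise alignment is already available as two equal-length gapped strings; the function name and the toy alignment are hypothetical and do not come from the study.

```python
def percent_identity(aligned_query: str, aligned_subject: str, gap: str = "-") -> float:
    """Percentage of alignment columns with identical residues.

    The denominator is the full alignment length, gap columns included.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("alignment must not be empty")
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != gap
    )
    return 100.0 * matches / len(aligned_query)


if __name__ == "__main__":
    # Hypothetical toy alignment: 4 identical columns out of 6.
    print(f"{percent_identity('ACDE-G', 'ACDQFG'):.2f}")
```

Identity alone does not indicate significance; the score and e-value columns account for alignment length and database size, which is why hits with roughly 25% identity can still have very small e-values.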
Fig. 1. Phylogenetic relatedness of the study protein (indicated with a black diamond) and similar proteins obtained from the non-redundant protein sequences (nr) database. The scale bar represents substitutions per nucleotide site. Evolutionary analyses were conducted in MEGA X using the Jones-Taylor-Thornton model with 1,000 bootstraps.
Fig. 2. Secondary structure model predicted by the SOPMA server.
Fig. 3. Secondary structure model predicted by the PSIPRED server.
Fig. 4. Predicted 3D structure of the hypothetical protein rendered by PyMOL.
Fig. 5. 3D model of the studied hypothetical protein of Campylobacter jejuni validated by the Ramachandran plot of the PROCHECK program (A), ERRAT (B) (overall quality factor: 96.991, from the SAVES server), and Verify3D (C).
ROC results of various tools and databases used in the present study
| Tools name | Accuracy of prediction (%) | Sensitivity (%) | Specificity (%) | ROC area |
|---|---|---|---|---|
| BLAST | 97.5 | 97.4 | 100 | 0.99 |
| CD Search | 95 | 94.9 | 100 | 0.99 |
| InterProScan | 97.5 | 97.4 | 100 | 0.99 |
| Average | 96.7 | 96.6 | 100 | 0.99 |
ROC, receiver operating characteristic.
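The accuracy, sensitivity, and specificity columns follow from a standard confusion matrix. The sketch below shows the arithmetic; the per-tool counts are hypothetical values chosen only because they reproduce the rounded percentages in the table, since the underlying test set is not described in this record. The function name is likewise an assumption.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return accuracy, sensitivity, and specificity as percentages."""
    total = tp + fn + tn + fp
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),  # true positive rate
        "specificity": 100.0 * tn / (tn + fp),  # true negative rate
    }


# Hypothetical counts (tp, fn, tn, fp), NOT taken from the study.
ASSUMED_COUNTS = {
    "BLAST": (38, 1, 1, 0),
    "CD Search": (37, 2, 1, 0),
    "InterProScan": (38, 1, 1, 0),
}

if __name__ == "__main__":
    results = {
        tool: classification_metrics(*counts)
        for tool, counts in ASSUMED_COUNTS.items()
    }
    for tool, m in results.items():
        print(tool, {name: round(value, 1) for name, value in m.items()})

    # Unweighted mean across tools, as in the "Average" row.
    for name in ("accuracy", "sensitivity", "specificity"):
        mean = sum(m[name] for m in results.values()) / len(results)
        print("Average", name, round(mean, 1))
```

The ROC area itself cannot be recovered from a single confusion matrix, because it requires scores ranked across varying decision thresholds, so it is left out of the sketch.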
Fig. 6. Protein-protein interaction network of the hypothetical protein from the STRING server. The colored nodes indicate the query proteins and the first shell of interactors, the white nodes indicate the second shell of interactors, the empty nodes represent proteins with an unknown three-dimensional structure, and the filled nodes represent proteins with a known or predicted three-dimensional structure.
Fig. 7. Active site (shown in red) of the studied hypothetical protein.
Fig. 8. The amino acid residues in the active site of the studied protein (shown in blue).