Literature DB >> 31720472

In silico characterization of hypothetical proteins from Orientia tsutsugamushi str. Karp uncovers virulence genes.

Nikhat Imam1,2, Aftab Alam2, Rafat Ali2, Mohd Faizan Siddiqui3, Sher Ali2, Md Zubbair Malik4, Romana Ishrat2.   

Abstract

Scrub typhus also known as bush typhus is a disease with symptoms similar to Chikungunya infection. It is caused by a gram-negative bacterium Orientia tsutsugamushi which resides in its vertebrate host, Mites. The genome of Orientia tsutsugamushi str. Karp encodes for 1,563 proteins, of which 344 are characterized as hypothetical ones. In the present study, we tried to identify the probable functions of these 344 hypothetical proteins (HPs). All the characterized hypothetical proteins (HPs) belong to the various protein classes like enzymes, transporters, binding proteins, metabolic process and catalytic activity and kinase activity. These hypothetical proteins (HPs) were further analyzed for virulence factors with 62 proteins identified as the most virulent proteins among these hypothetical proteins (HPs). In addition, we studied the protein sequence similarity network for visualizing functional trends across protein superfamilies from the context of sequence similarity and it shows great potential for generating testable hypotheses about protein structure-function relationships. Furthermore, we calculated toplogical properties of the network and found them to obey network power law distributions showing a fractal nature. We also identifed two highly interconnected modules in the main network which contained five hub proteins (KJV55465, KJV56211, KJV57212, KJV57203 and KJV57216) having 1.0 clustering coefficient. The structural modeling (2D and 3D structure) of these five hub proteins was carried out and the catalytic site essential for its functioning was analyzed. The outcome of the present study may facilitate a better understanding of the mechanism of virulence, pathogenesis, adaptability to host and up-to-date annotations will make unknown genes easy to identify and target for experimentation. The information on the functional attributes and virulence characteristic of these hypothetical proteins (HPs) are envisaged to facilitate effective development of novel antibacterial drug targets of Orientia tsutsugamushi.
© 2019 The Author(s).

Entities:  

Keywords:  Biochemistry; Bioinformatics; Functional annotation; Genetics; Hypothetical proteins (HPs); Microbiology; ROC analysis; Scrub typhus; Virulence

Year:  2019        PMID: 31720472      PMCID: PMC6838952          DOI: 10.1016/j.heliyon.2019.e02734

Source DB:  PubMed          Journal:  Heliyon        ISSN: 2405-8440


Introduction

Orientia tsutsugamushi, an obligate intracellular bacterium, is a causative agent of scrub typhus or tsutsugamushi disease. The clinical manifestation of Scrub typhus is diverse, ranging from a nonspecific febrile illness to severe multiorgan dysfunction [1, 2]. There is an estimated one million new scrub typhus cases each year, and over one billion individuals around the world are at risk. Without appropriate treatment, the case fatality rate of scrub typhus can reach up to 30% or even higher [3]. Scrub typhus is an endemic disease to a sect of the world known as ‘the tsutsugamushi triangle’, which extends from Northern Japan and far East Russia in the North, to Northern Australia in the South and to Pakistan in the West [4]. India is an integral component of this triangle and it has been shown that during 2007–2017, in India, while dengue and Chikungunya are targeted as a focused disease meanwhile, scrub typhus affected a large number of the population of northern India (Himachal Pradesh) and other states including Rajasthan, Jammu and Kashmir, Uttrakhand, Puducherry, Sikkim, Bihar, West Bengal, Meghalaya, Rajasthan, Maharashtra, Karnataka, Andhra Pradesh and Tamil Nadu [5]. It is an occupational disease normally found amongst farmers working in the fields [4]. The possibilities of tsutsugamushi disease are higher in fruit farmers and chestnut gatherers. Similarly, the prevalence of infection is more likely to occur in tropical areas, especially agricultural lands, people living close to bushes and wood piles, farmers, rodent observers and those rearing domestic animals [6]. It mostly occures in the rainy season, However, epidemic periods have been reported during the winter season in southern India [7]. Certain areas such as forest clearings, river banks, bushy areas and grassy lands provide optimal conditions for the infected mites to survive. Thus, the best measure would be to avoid going to such places. The symptoms of tsutsugamushi disease are very similar to that of other arboviral infections like Dengue, Chikungunya and West Nile causing high grade fever (>104 °F) of 7–14 days duration, having symptoms like severe headache, anorexia, myalgia, maculopapular rash, profuse sweating and swelling of major lymph nodes (neck region and groin). The genome sequence of O. tsutsugamushi str. Karp is available in the NCBI database (Taxonomy ID: 1359185, GenBank Accession #: NZ_LYMA00000000.2) containing 1563 genes encoding proteins. The proteins with unknown functions are referred to as hypothetical proteins (HPs). The HPs are predicted to be expressed from an open reading frame (ORF), but have no experimental evidence of translation. These proteins constitute a substantial fraction in both prokaryotes and eukaryotes, including humans [8]. Annotation of HPs assist in finding newer structures and functions enabling their classification into other pathways and cascades. They also aid as markers and pharmacological targets for drug design, discovery, and in screening [9]. Amongst prokaryotes, the proteins from a number of bacteria remain uncharacterized despite the fact that their genome sequences are known. This provides an opportunity to annotate these putative proteins with respect to their functions. Many “putative proteins” are shared by a number of bacterial species, which suggest their much broader biological roles within and across species. Proteins that occur in various species are represented as orthologous groups that are useful for functional analyses and annotations of the newly sequenced genomes [10]. The ability to predict the function of a gene based on its sequence is an important area of biological research. The main objective of our study is to annotate the putative functions, determine its classification and identify the most virulent proteins of all 344 HPs. So we have analysed the sequences of all 344 HPs from O. tsutsugamushi str. Karp. The systematic analysis began with the prediction of physiochemical properties, sub-cellular localizations, domain/motif predictions, and function annotation using established bioinformatics databases and tools. The ROC (Receiver operating characteristic) analysis was used to assess the performance of approaches used (integrated bioinformatics tools) in the predictions on the basis of confidence and precision levels. If the confidence level is high for more than three tools, this indicates the same functions. So, we have successfully annotated functions of all 344 HPs of O. tsutsugamushi str. Karp. All 344 HPs were classified according to their functional properties like binding proteins, transporter proteins, kinase, hydrolae, transferase, etc. We analysed most virulent proteins for their involvement in defining different cellular fates. We believe that such analysis expands our understanding regarding, HPs and their functional role in the cell and provides an opportunity to discover novel potential drug targets.

Methods

Sequence retrieval

We analysed the genome of O. tsutsugamushi str. Karp and found 1,563 protein coding genes (http://www.ncbi.nlm.nih.gov/genome) and a total of 344 proteins as hypothetical proteins (HPs). The sequences of all 344 HPs proteins were retrieved from Uniprot (http://www.uniprot.org) in the FASTA format. The sequence similarity search was performed via pBLAST against the non-redundant database. The similarity search against the Protein Data Bank (PDB) yield no potential structural templates. Generally, HPs contain low identity as compared to other known or annotated proteins.

Physicochemical properties of HPs

The physicochemical parameters of all 344 HPs were studied using Expasy's ProtParam server [11] (www.web.expasy.org/protparam), which was then used for theoretical measurements such as molecular weight, isoelectric point, extinction coefficient [12], instability index [13], aliphatic index and grand average of hydropathicity (GRAVY) [14]. The extinction coefficient is the measure of the amount of light that proteins absorb at a certain wavelength. An estimation of the stability of a protein in a test tube is provided by the instability index. The aliphatic index of a protein is defined as the relative volume occupied by aliphatic side chain amino acids. The GRAVY score for a peptide or protein is calculated as the sum of the hydropathy values of all of the amino acids, divided by the number of residues in the query sequence. The predicted properties of HPs are listed in S1_ Table.

Sub-cellular localization & protein classification

The functions of proteins are usually related to its sub-cellular localization. Thus, the ability to predict sub-cellular localization directly from protein sequences will be useful for inferring its functions at the cellular level. It is a well known fact that a protein present in the cytoplasm can act as a possible drug target, while membrane proteins found on the surface are considered as vaccine targets [15]. The sub-cellular localization of HPs were predicted using PSLpred (it is a hybrid approach-based method that integrates PSI-BLAST and three SVM (Support vector machine) modules based on compositions of residues, dipeptides and physico-chemical properties [16] and CELLO (multi-class SVM classification system) (S2_ Table) [17].

Functional domain/motif prediction

The aim of the functional domain prediction is to find out the conserved part of a protein because domains often form functional units. The functional domains of HPs were predicted by using various publicly available databases such as ScanProsite [18], SMART [19], Motif Scan (including peroxiBase profiles, HAMAP, PROSITE patterns, Pfam HMMs for local & global models) [20] and PFP-FunDSeqE [21]. ScanProsite provides a web interface to identify protein matches against signatures from the PROSITE database. The Simple Modular Architecture Research Tool (SMART) [19] was used for similarity search based on domain architecture and profiles rather than by direct sequence similarity. PFP-FunDSeqE server [22] covers the protein fold types and compared with the existing predictors tested by the same stringent benchmark dataset, the new predictor can, for the first time, achieve high success rate. The detailed results are given in (S3_ Table).

Virulence protein prediction

The identification of virulent proteins in bacterial protein sequences is useful in estimating its pathogenic ability and understanding the complex virulence mechanism of pathogenesis [23]. Here, we used VirulentPred tool (Bi-layer cascade Support Vector Machine) (http://bioinfo.icgeb.res.in/virulent/) for the identification of virulence factors among HPs. It is a Support Vector Machine (SVM) based method to predict virulence proteins with accuracy. We considered five modules based on protein features such as Amino Acid Compositions, Dipeptide Composition, PSI-BLAST created PSSM Profiles, Higher Order Dipeptide Composition Based and Cascade of SVMs and PSI-BLAST. These modules gave SVM predicted scores and similarity-search based information for each of the 344 sequences (S4_ Table). We selected the average of highly significant values from each module i. e; average values >1.0.

Function prediction

The accurate annotation of protein function is a key to understand the processes of life at the molecular level. In our study, we have predicted the gene ontology (Moleculer function and Biological process) of most virulent proteins. We used PFP (Protein function prediction) and Argot2 (Annotation Retrieval of Genel Ontology Terms), which quickly process thousands of sequences for functional inference.

Performance assessment

The predicted functions of 344 HPs from the O. tsutsugamushi str. Karp were validated using the Receiver Operating Characteristic (ROC) analysis. We predicted the function of 60 proteins (functions already known) to check the accuracy of a tools which were used to annotate our 344 HPs. The details are given in (S5_Table). The diagnostic efficacy was evaluated at twelve levels. The two binary numerals ‘‘0’’ or ‘‘1’’ were used to classify the prediction as true positive (‘‘1’’) or true negative (‘‘0’’).

Sequence similarity networks (SSN)

The dramatic increase in heterogeneous types of biological data—in particular, the abundance of new protein sequences—requires fast and user-friendly methods for organizing this information in a way that enables functional inference. The most widely used strategy to link sequence or structure to function, homology-based function prediction, relies on the fundamental assumption that sequence or structural similarity implies functional similarity. We also calculated topological properties of the parent network, the topological analysis helps to understand the structure of a network which facilitates in understanding the hidden mechanisms. The networks properties (Degree distribution, Neighborhood connectivity, Clustering co-efficient, Betweenness centrality and Closeness centrality) were analysed to seek the important behaviours of the network. In our study we have taken all the top 62 virulent proteins to construct the sequence similarity networks (SSN) using STRING database (V-10.0) [24] and find clusters (highly interconnected regions) in the network using MCODE(1.5.1) in Cytoscape [25].

Secondary and tertiary structure prediction of proteins

The sequences of five hub proteins, namely KJV55465 (OTBS_1583), KJV56211 (OTBS_0920), KJV57212 (OTBS_0674), KJV57203 (OTBS_0675) and KJV57216 (OTBS_0676) were considered for predicting secondary and tertiary structures. To obtain the probable 2D structure for hub proteins, we used PSIPRED server [26], which is a simple and accurate 2D structure prediction server, incorporating two feed-forward neural networks which perform an analysis on output obtained from PSI-BLAST [27]. The 3D prediction was made with through homology modelling using the MODELLER (version 9.21) [28] and best model structure was choosen based on DOPE score and GA341 assessment score, This methodology is very accurate for modelling 3D protein structures according to the sequence identity between the target sequence and the template proteins. Thus, the final predicted model was validated using the PROCHECK programs [29]. It analyzes the stereochemical quality of the protein model by analyzing residue-by-residue geometry and overall structure geometry.

Results

Our study analysed the sequence from O. tsutsugamushi str. Karp genome using advanced bioinformatics tools. After retrieval of all the 344 HPs sequences, we calculated the physiochemical properties and found that most of the HPs are localized in the cytoplasmic region of the cell. Generally, the cytoplasmic and inner membrane proteins are considered as potential drug targets but extracellular and outer membrane proteins are effective for vaccine [30]. Finally, We have classified all the 344 HPs into different protein classes like Signal_transducer (12 HPs), Receptor (07 HPs), Hormone (07 HPs), Structural_protein (59 HPs), Transporter (15 HPs), Voltage-gated_ion_channel (01 HP), Transcription (03 HPs), Transcription_regulation (10 HPs), Stress_response (16 HPs), Immune_response (51 HPs), Growth_factor (94 HPs). There was no evidence to classify the remaining 69 HPs into protein classes. Furthermore, we found that 130 out of 344 HPs are showed similar characteristics with the functions of known proteins. All the characterized hypothetical proteins (HPs) belong to various protein classes like enzymes, transporters, binding proteins, metabolic process and catalytic activity, kinase activity etc. The details of HPs are given in Fig. 1.
Fig. 1

(1) The gene ontology of all 344 hypothetical proteins. (2) Only 130 out 0f 344 HPs showed characteristic similarities with the functions of known proteins including binding proteins, transporter proteins, enzymes, metabolic pathways and other miscellaneous classes.

(1) The gene ontology of all 344 hypothetical proteins. (2) Only 130 out 0f 344 HPs showed characteristic similarities with the functions of known proteins including binding proteins, transporter proteins, enzymes, metabolic pathways and other miscellaneous classes.

Protein classification

Binding proteins

We characterized a total of proteins that showed properties like DNA binding (47 HPs), nucleotide binding (02 HPs), metal binding (02 HPs), protein binding (08 HPs), ATP- binding (04 HPs), Zinc ion binding (02 HPs), Cation binding (01 HP), calcium ion binding (02 HPs) and RanGTPase binding (01 HP). The DNA binding protein class was found to be a major class which contained 47 hypothetical proteins (HPs). It is well known fact that bacterial DNA binding protein plays a crucial role in DNA replication; the protein is involved in stabilizing the lagging strand as well as interacting with DNA polymerase III [31]. Currently, many more functions of bacterial DNA binding proteins have been identified, including the regulation of gene expression by histone-like nucleoid-structuring protein [32].

Transporter

Two HPs (KJV56037 and KJV55034) are annotated to be involved in protein and hydrogen transporation respectively. Bacterial transport proteins mediate passive and active transport of small solutes across membranes. Protein involved in the transport of hydrogen ions across a membrane, used to power processes such as ATP synthesis in the bacteria. The transporter systems of O. tsutsugamushi comprise of secondary transporters, in which transport activity and ABC-type transporters are driven by an ion gradient across the membrane and ATP hydrolysis respectively [33].

Kinase activity

We identified fproteins that have kinase activity including KJV54167, KJV54168, KJV56574 and KJV51784. It has been shown that bacteria have a versatile repertoir of protein kinases like histidine and aspartic acid kinases, serine/threonine kinases, and more recently tyrosine and arginine kinases. Currently, Tyrosine phosphorylation is known to be a key regulatory device of bacterial physiology, linked to exopolysaccharide production, virulence, stress response and DNA metabolism [34].

Hydrolase activity

Seven hypothetical proteins showed hydrolase activity that catalyze the hydrolysis of a chemical bond. Hydrolytic enzymes play key roles in the invasion of the host tissue and evading the host defense mechanism [35]. The genomes of gram-negative and gram-positive bacterial species all encode an inclusive variability of hydrolase enzymes that accounts for the specific cleavage of different peptidoglycan (PG) bonds; hydrolases are involved in several critical functions, including peptidoglycan (PG) maturation, turnover, recycling, autolysis, and cleavage of the septum during cell division [36].

Transferase activity

A total of hypothetical proteins (HPs) were found to exhibit transferase activity. A transferase is a class of enzyme that performs the transfer of specific functional groups (e.g. a methyl or glycosyl group) from one molecule to another. Bacterial interactions with the host mostly depend on the bacterial glycome. Particularly the bacterial glycome is largely determined by glycosyltransferases (GTs) [37].

Metabolic process and catalytic activity

A total HPs were found which were involved in both biosynthetic and catabolic metabolic processes. O. tsutsugamushi has a large genome size, and its metabolic pathways have been well organised due to the existence of repeated sequences. Overall, O. tsutsugamushi and Rickettsia have similar metabolic pathways but they are slightly different in many pathways like in the TCA-cycle, carbohydrate metabolism, bacterial cell wall synthesis and in transportations. The O. tsutsugamushi has five copies of ATP/ADP translocase. Recently, it has been shown that the translocases have differential transportation properties for nucleotides. There are several components of salvage pathways of purine and pyrimidine biosynthesis present in O. tsutsugamushi. The absence of enzymes for the interconversion of adenine and guanine suggests that these bacteria depend on the host for both purines and may import them via different subtypes of ATP/ADP translocases [33]. In contrast to O. tsutsugamushi, the members of anaplasmataceae is well equipped with the enzymes for the de novo nucleotide synthetic pathways and the pentose phosphate pathway. The majority of genes for fatty acid biosynthesis were present in O. tsutsugamushi but the β-oxidation system of fatty acids for energy generation was absent in O. tsutsugamushi.

Miscellaneous functions

We have found proteins exhibiting miscellaneous functions, such as peptide activity, oxidoreductase activity, NADH dehydrogenase (quinone) activity, Thiosulfate sulphur transference activity, RNA directed DNA polymerase, Structural molecule activity, RNA polymerase II transcription cofactor activity etc. Identification of bacterial virulent protein sequences has implications for characterization of novel virulence-associated factors, finding novel drug/vaccine targets against proteins indispensable to pathogenicity, and understanding the complex virulence mechanism in pathogens. So, the VirulentPred tool was used to predict the virulence factors amongst 344 HPs and it was found that 62 proteins were most virulent (virulent score >1.0). The results are shown in Table 1. We have successfully assigned a proposed function to most virulent HPs from O. tsutsugamushi str. Karp. We used PFP and ARGOT2 server, which are sequence similarity-based protein function prediction tools designed to predict GO annotations from protein sequences [38]. PFP server gives the results on the basis of PFP scores (Very high confidence: >20K, High confidence: >10K, and Low confidence: ≥ 100), while the ARGOT2 gives the results on the basis of confidence scores. After extensive analysis and compilation of annotated functions, we selected top scoring GO-annotation terms (Moleculer functions and Biological processs) from both tools, The details are given in Table 2.
Table 1

A list showing the properties of the 62 most virulent proteins among the 344 HPs which had virulence scores >1.0.

S.NoAccession No.Amino acid Composition based
Dipeptide Composition Based
PSI-BLAST created PSSM Profiles
Higher order Dipeptide Composition Based
Cascade of SVMs and PSI-BLAST
Average Scores
ResultsScoresResultsScoresResultsScoresResultsScoresResultsScores
1KJV57131Virulent0.9178Virulent2.2957Virulent1.2206Virulent2.6358Virulent0.91841.59766
2KJV57379Virulent1.5868Virulent1.7882Virulent1.4959Virulent1.2774Virulent1.53731.53712
3KJV57416Virulent1.511Virulent1.3622Virulent1.5987Virulent2.0421Virulent0.96921.49664
4KJV50735Virulent1.0381Virulent1.8509Virulent1.0601Virulent2.2208Virulent0.86161.4063
5KJV55958Virulent1.0906Virulent1.6763Virulent1.4196Virulent1.8863Virulent0.94771.4041
6KJV50994Virulent0.9864Virulent1.8571Virulent1.0425Virulent2.2675Virulent0.84481.39966
7KJV53007Virulent0.9266Virulent1.8414Virulent1.1017Virulent2.2594Virulent0.83941.3937
8KJV52681Virulent1.4108Virulent1.873Virulent1.1693Virulent1.4896Virulent1.01281.3911
9KJV56935Virulent0.9485Virulent1.8072Virulent1.1589Virulent2.1562Virulent0.86931.38802
10KJV54168Virulent1.2096Virulent1.6667Virulent1.2093Virulent1.7423Virulent1.0211.36978
11KJV53284Virulent1.3103Virulent1.0041Virulent1.2612Virulent2.2527Virulent0.97691.36104
12KJV57348Virulent0.9076Virulent1.772Virulent1.4456Virulent1.6504Virulent0.93411.34194
13KJV54735Virulent1.4234Virulent1.5279Virulent1.4345Virulent1.1387Virulent1.0851.3219
14KJV53188Virulent1.08Virulent1.5717Virulent1.1006Virulent1.8276Virulent1.01631.31924
15KJV55734Virulent0.9327Virulent1.1912Virulent1.2217Virulent2.2464Virulent0.9631.311
16KJV53939Virulent1.3339Virulent1.511Virulent1.0233Virulent1.551Virulent1.07621.29908
17KJV55659Virulent1.2404Virulent1.8791Virulent0.8758Virulent1.4898Virulent0.98891.2948
18KJV54587Virulent1.6516Virulent1.4787Virulent1.2528Virulent1.019Virulent1.06791.294
19KJV54139Virulent1.4621Virulent1.2146Virulent1.1112Virulent1.51Virulent1.11511.2826
20KJV56583Virulent1.3002Virulent0.9569Virulent1.5489Virulent1.4896Virulent1.09441.278
21KJV54508Virulent1.3796Virulent1.0061Virulent1.4037Virulent1.439Virulent1.12151.26998
22KJV55874Virulent1.3446Virulent0.9225Virulent1.5398Virulent1.3943Virulent1.10281.2608
23KJV55533Virulent1.239Virulent1.8973Virulent0.8824Virulent1.2263Virulent1.00191.24938
24KJV56683Virulent1.3132Virulent1.2083Virulent0.9426Virulent1.6811Virulent1.0751.24404
25KJV57311Virulent0.9809Virulent1.7789Virulent1.1815Virulent1.3329Virulent0.88751.23234
26KJV56036Virulent1.3556Virulent1.8793Virulent0.8037Virulent1.1186Virulent0.95951.22334
27KJV53935Virulent1.3824Virulent0.9454Virulent1.142Virulent1.4598Virulent1.12361.21064
28KJV55746Virulent1.0308Virulent1.5521Virulent0.9978Virulent1.373Virulent1.09211.20916
29KJV56143Virulent1.1156Virulent1.0511Virulent1.1646Virulent1.5631Virulent1.10721.20032
30KJV57393Virulent0.9855Virulent1.159Virulent1.2236Virulent1.5162Virulent1.09111.19508
31KJV52864Virulent1.2226Virulent1.4901Virulent1.0702Virulent1.0677Virulent1.1021.19052
32KJV57120Virulent1.4178Virulent0.9323Virulent1.4383Virulent0.967Virulent1.03751.15858
33KJV54670Virulent1.1082Virulent1.3983Virulent0.9956Virulent1.1537Virulent1.11421.154
34KJV57203Virulent1.023Virulent1.0318Virulent1.0671Virulent1.5494Virulent1.09711.15368
35KJV56211Virulent1.086Virulent1.2792Virulent1.08Virulent1.1946Virulent1.1241.15276
36KJV51002Virulent1.1881Virulent1.1188Virulent1.0868Virulent1.2358Virulent1.13161.15222
37KJV56684Virulent1.0339Virulent0.9725Virulent1.195Virulent1.4313Virulent1.10251.14704
38KJV50818Virulent1.2394Virulent1.081Virulent0.9281Virulent1.2893Virulent1.11641.13084
39KJV53916Virulent1.1855Virulent0.9014Virulent1.0891Virulent1.3351Virulent1.11721.12566
40KJV52751Virulent1.2581Virulent0.6884Virulent1.3906Virulent1.1715Virulent1.09251.12022
41KJV57626Virulent1.1033Virulent0.9177Virulent1.2272Virulent1.104Virulent1.09941.09032
42KJV50787Virulent0.821Virulent1.134Virulent0.8971Virulent1.4924Virulent1.08291.08548
43KJV54906Virulent1.1008Virulent1.138Virulent0.9251Virulent1.1233Virulent1.11131.0797
44KJV57212Virulent0.736Virulent1.3844Virulent0.7176Virulent1.4391Virulent1.07671.07076
45KJV57216Virulent1.199Virulent1.4498Virulent1.3006Virulent0.4069Virulent0.99151.06956
46KJV52478Virulent1.1052Virulent0.6497Virulent1.2606Virulent1.2378Virulent1.0821.06706
47KJV56675Virulent0.9893Virulent1.0009Virulent0.7595Virulent1.507Virulent1.06111.06356
48KJV50671Virulent1.1341Virulent0.8286Virulent1.0583Virulent1.1183Virulent1.09391.04664
49KJV54785Virulent0.9488Virulent0.9501Virulent0.9662Virulent1.2497Virulent1.09151.04126
50KJV55465Virulent1.2074Virulent0.6517Virulent0.8766Virulent1.4166Virulent1.05251.04096
51KJV52046Virulent1.125Virulent0.4258Virulent1.0812Virulent1.5058Virulent1.04331.03622
52KJV53129Virulent1.3451Virulent0.9869Virulent1.2255Virulent0.5632Virulent1.0551.03514
53KJV54170Virulent1.0829Virulent0.9067Virulent1.0249Virulent1.0456Virulent1.08941.0299
54KJV57117Virulent1.2103Virulent1.004Virulent1.1659Virulent0.6582Virulent1.06331.02034
55KJV57230Virulent0.7618Virulent1.4071Virulent0.9733Virulent0.8748Virulent1.06681.01676
56KJV57144Virulent0.7842Virulent0.9406Virulent0.8525Virulent1.4167Virulent1.06481.01176
57KJV56401Virulent1.3022Virulent1.3618Virulent0.821Virulent0.5723Virulent0.98691.00884
58KJV54671Virulent1.4225Virulent0.7659Virulent0.9325Virulent0.8532Virulent1.06721.00826
59KJV51409Virulent0.9735Virulent0.8943Virulent1.05Virulent1.0433Virulent1.07381.00698
60KJV57217Virulent1.0185Virulent0.8513Virulent0.9405Virulent1.1414Virulent1.07861.00606
61KJV53065Virulent0.9979Virulent0.7991Virulent0.9951Virulent1.1638Virulent1.07331.00584
62KJV54489Virulent0.9321Virulent1.2993Virulent0.7014Virulent1.0264Virulent1.0691.00564
Table 2

Molecular functions (MF) and Biological Processes (BP) of the 62 most virulent proteins.

S.NoACC.NOMolecular Functions (MF)Biological Processes (BP)
1KJV57131Protein binding, transferase activity, protein serine/threonine kinase activityCell adhesion, protein phosphorylation & ubiquitination
2KJV57379Protein binding, protein tyrosine phosphatase activityResponse to stress, peptidyl-tyrosine dephosphorylation
3KJV57416Cation bindingPorphyrin-containing compound biosynthetic process
4KJV50735Manganese ion bindingProteolysis, fatty acid metabolic process
5KJV55958Nucleotide bindingProtein phosphorylation
6KJV50994Protein binding, zinc ion bindingProteolysis, protein phosphorylation
7KJV53007Protein binding, ATP bindingTranscription, DNA-dependent, protein phosphorylation,
8KJV52681Nucleotide binding, 3′-5′ exonuclease activity, catalytic activityRibonucleoprotein complex biogenesis, nucleic acid phosphodiester bond hydrolysis
9KJV56935Nucleic acid bindingTranscription, DNA-dependent, protein phosphorylation
10KJV54168Protein kinase activity, transcription factor activity, sequence-specific DNA bindingProtein phosphorylation, regulation of transcription, DNA-templated
11KJV53284Protein bindingTranscription, DNA-dependent
12KJV57348Nucleotide binding, pullulanase activity, hydrolase activityTranscription, DNA-dependent, carbohydrate metabolic process
13KJV54735Catalytic activityL-methionine salvage, nucleoside metabolic process
14KJV53188Nucleotide bindingMetabolic process, translational termination
15KJV55734Protein binding, kinase activity, transferase activityTranscription, DNA-dependent
16KJV53939Nucleotide binding, NAD + ADP-ribosyl transferase activityMacromolecule metabolic process, protein phosphorylation
17KJV55659Protein binding, transferase activity, transferring acyl groupsMetabolic process, cell adhesion
18KJV54587Structural constituent of ribosome, DNA bindingTranslation, cell redox homeostasis
19KJV54139Nucleotide binding, cell surface receptor signaling pathwaySignaling, signal transducer activity
20KJV56583Nucleic acid binding,Transcription, DNA-dependent, cellular biogenic amine metabolic process
21KJV54508Nucleotide binding, cysteine-type peptidase activityAmino acid activation, proteolysis
22KJV55874Ligase activity, calcium ion bindingMetabolic process, axial cellular bud site selection
23KJV55533Nucleotide binding, transferase activity, transferring acyl groupsAmino acid activation, RNA phosphodiester bond hydrolysis, endonucleolytic
24KJV56683Nucleotide binding, peptidoglycan bindingTranscription, DNA-dependent, cell division
25KJV57311Protein binding, hydrolase activityMetabolic process
26KJV56036Structural molecule activity, DNA binding, hydrolase activityViral penetration into host nucleus, protein polyubiquitination
27KJV53935Catalytic activity, nucleic acid bindingRNA processing, phosphatidylinositol phosphorylation proteolysis
28KJV55746Nucleic acid binding, sequence-specific DNA bindingTranscription, DNA-dependent
29KJV56143Uroporphyrinogen-III synthase activityTetrapyrrole biosynthetic process
30KJV57393Nucleotide bindingChorismate metabolic process
31KJV52864Protein binding, translation elongation factor activityTranscription, translational elongation, peptide biosynthetic process
32KJV57120ATP binding, peptidase activityPurine nucleotide biosynthetic process, proteolysis
33KJV54670ATP bindingMetabolic process
34KJV57203Nucleotide binding, DNA bindingChorismate metabolic process, transposition, DNA-mediated
35KJV56211DNA binding, DNA-directed RNA polymerase activityTranscription, DNA-dependent
36KJV51002ATP binding, phosphorylationTranscription, DNA-dependent, kinase activity
37KJV56684Galactosyltransferase activity, aminoacyl-trna ligase activity, glycine-trna ligase activityMetabolic process
38KJV50818Protein binding, lysozyme activity, hydrolase activityNucleic acid metabolic process, peptidoglycan catabolic process
39KJV53916Receptor bindingDefense response
40KJV52751ATP binding, nucleotide bindingTranscription, DNA-dependent
41KJV57626Phosphotransferase activity, nitrogenous group as acceptor, kinase activitySignal transduction, phosphorylation
42KJV50787Nucleotide binding, metallopeptidase activity, metalloendopeptidase activity, hydrolase activity, metallopeptidase activityPurine nucleotide biosynthetic process, oxidation-reduction process, methanogenesis
43KJV54906Catalytic activity, DNA-directed DNA polymerase activity, nucleotidyl transferase activityTetrapyrrole metabolic process, oxidation-reduction process, DNA biosynthetic process, DNA replication
44KJV57212Transporter activity, aspartic-type endopeptidase activity, hydrolase activity, peptidase activityTransport, protein processing, proteolysis
45KJV57216Ion binding, hydrolase activityProteolysis, transcription, DNA-templated
46KJV52478Ion binding, transferase activityMetabolic process, drug transmembrane transport, regulation of transcription
47KJV56675Nucleotide binding, mannose binding, carbohydrate bindingMetabolic process, regulation of defense response to virus by virus,
48KJV50671Transferase activity, transferring phosphorus-containing groups, signal transducer activityPhosphatidylinositol-mediated signaling, cell surface receptor signaling pathway
49KJV54785Protein binding, actin binding, tropomyosin binding, sequence-specific DNA bindingTranscription, DNA-dependent, cellular component organization
50KJV55465Electron carrier activity, NADH dehydrogenase (ubiquinone) activityElectron transport chain, oxidation-reduction process
51KJV52046Zinc ion bindingProteolysis
52KJV53129Protein binding, metalloendopeptidase activity, hydrolase activity, metallopeptidase activityCellular response to stimulus, transcription, DNA-templated
53KJV54170ATP binding, protein kinase activityTrehalose metabolic process, protein phosphorylation
54KJV57117DNA binding, DNA-directed RNA polymerase activity, proteolysisTranscription, DNA-dependent, peptidase activity
55KJV57230Nucleotide binding, hydrogen ion transmembrane transporter activityAmino acid activation, proton transport
56KJV57144Nucleotide binding, zinc ion bindingMetabolic process, protein phosphorylation
57KJV56401Nucleotide binding, transferase activityAmino acid activation, peptidyl-aspartic acid modification, protein phosphorylation
58KJV54671ATP bindingMetabolic process
59KJV51409Nucleotide binding, single-stranded DNA bindingTranscription, DNA-dependent, SOS response, DNA replication, DNA repair
60KJV57217Nucleotide binding, hydrolase activity, nuclease activity, exonuclease activityMetabolic process, cellular response to DNA damage stimulus, vesicle fusion with Golgi apparatus, intracellular protein transport
61KJV53065Transferase activityPyrimidine-containing compound biosynthetic process
62KJV54489ATP binding,Transcription, DNA-dependent, protein deubiquitination
A list showing the properties of the 62 most virulent proteins among the 344 HPs which had virulence scores >1.0. Molecular functions (MF) and Biological Processes (BP) of the 62 most virulent proteins.

ROC (Receiver Operating Characteristic) performance measurement

As the number of genome sequences available increases, more protein products that can be computationaly processed for further study become available. If autonomic computing predictions are solely depended on, it is important that the accuracy in the methods is high. There are various available conventional methods for comparing the tool's accuracy, but the ROC analysis is a widely used method. As suggested from the plots (Fig. 2a–d) the effect of sample size (N) can not be seen on both sensitivity (γ) and specificity (η) for both the tools as far as Molecular function (MF) is concerned. The decrease in γ is followed by an increase in η for biological process (BP). As seen in the plot 2A (for ARGOT2) the , while ranges between ; in case of specificity (plot 2B) shows a triphasic pattern that increases to a value of 1.0 then decreases in the range of while showing a biphasic pattern that does not showing an effect up to ; while above this (i.e. ) it becomes highly specific. In addition to this plot 2C (for PFP) the while ranges between ; in case of specificity (plot 2D) & show constancy in their behaviour. Thus, it can be said that for the prediction of MF, the sample size (N) behaves as an independent variable. On the other hand, for BP the trend shown by the plots satisfies N to be an intrinsic property. Thus, from these plots and the ranges they fall in we can infer that large sample size and a combination of these tools can offer the promise of predicting accuracy for both MF and BP. The average sensitivity are 1.00 (Molecular function) and 0.955 (Biological processes) and the average specificity are 0.5 (moleculaer function) and 0.64 (biological processes) from ARGOT2 tool. Similarly, for PFP server, the the average sensitivity are 1.00 (Molecular function) and 0.922 (Biological processes) and the average specificity are 0.916 (moleculaer function) and 0.916 (biological processes), shown in Table 3. So, the overall accuracy of both the tools, PFP and ARGOT2 server was found to be satisfactory.
Fig. 2

ROC plots presenting the change of trend of specificity and sensitivity at different sample size (N) respectively for both Argot2 and PFP tools. Where A,B refers to Argot2 and C,D to PFP. Black lines refer to Biological process (BP) and red lines refer to Molecular function (MF).

Table 3

Sensitivity and specificity at various cut-off points for the prediction of functionally, annotated HPs and Biological processes.

S.No.SAMPLE SIZETRUE POSITIVE(a)TRUE NEGATIVE(d)FALSE POSITIVE(c)FALSE NEGATIVE(b)SensitivitySpecificity
ARGOT2
10MFBPMFBPMFBPMFBPMFBPMFBP
255500000011.00.00.0
310101000000011.00.00.0
415151401000011.00.01.0
520201901000011.00.01.0
625252202010011.00.00.67
730302403020110.960.00.67
835342714020210.931.00.66
940373035020310.911.00.71
1045423535020310.921.00.71
1150463747020410.911.00.77
1255504257020410.911.00.77
13605447672410.921.00.77
PFP Function
10MFBPMFBPMFBPMFBPSensitivitySpecificity
255500000011.00.00.0
3109812000011.01.01.0
415131124000011.01.01.0
520181624000011.01.01.0
625221836000110.951.01.0
730272236000210.921.01.0
835312448000310.891.01.0
9403626410000410.871.01.0
10454131410000410.891.01.0
11504534510000610.851.01.0
12555039510000610.871.01.0
13605541511000610.881.01.0

MF:Molecular Function, BP:Biological Process

ROC plots presenting the change of trend of specificity and sensitivity at different sample size (N) respectively for both Argot2 and PFP tools. Where A,B refers to Argot2 and C,D to PFP. Black lines refer to Biological process (BP) and red lines refer to Molecular function (MF). Sensitivity and specificity at various cut-off points for the prediction of functionally, annotated HPs and Biological processes. MF:Molecular Function, BP:Biological Process Sequence similarity networks (SSN) have emerging importance because they allow comparative analysis of mammoth datasets without the need for multiple sequence alignments (MSA). Currently, it has become more dependable as datasets constantly increase in size. The greater number of pairwise relationships determined within the networks leads to a more accurate placement of sequences among putative homologs [39]. In this study, we constructed a network with 62 most virulent proteins on which only 21 proteins interacted with others and the rest were determined as outlier proteins in the network, so we eliminated them from the network. These 21 highly identical proteins are from the same O. tsutsugamushi (strain Boryong) family. The topological parameters of the network obey power law distributions. The probability of clustering co-efficient C(k), degree distributions P(k), and neighborhood connectivity C(k) exhibit a fractal nature. The power law fits on the data points of the network's topological parameters were done and confirmed by following the standard statistical fitting method given by Clauset et al [40] where the p values for all data sets were calculated (against 2500 random samplings) and found to be greater than 0.1 and data fitting goodness was less. P(k) and C(k) have negative values (-0.556 and -0.245 respectively) which implies the network follows a hierarchical pattern and positive-value of (C(k) = 0.756) that means the network follows the assortativity that identifies a huge cluster of degree-nodes (rich club) which regulates the network. The centrality parameters: betweenness (CB = 1.170) and closeness (CC = 0.152) of the network also showed fractal behaviour and good connectivity of nodes in a network. The characterizing modular structure of a biological network is an important way to identify novel genes for targeted therapeutics. So, we have identified 2 modules (highly interconnected regions) in the parent network using MCODE(v1.5.1) in Cytoscape. These modules contain five proteins (OTBS_1583, OTBS_0920, OTBS_0674, OTBS_0675, OTBS_0676) that were considered the hub proteins in the network, shown in Fig. 3. Since the popularity of leading hubs gets changed according to the protein activities and its regulation, it cannot be determined whether these hub proteins are key regulators but some may play a significant role in pathogen survival. In the first module, the hub proteins OTBS_1583 and OTBS_0920 interacted with nine other proteins including OTBS_1584, OTBS_1581, mhA, pnP, dcD, ndK, rpoC, argS, dnaQ. While in the second moldule, the hub proteins are OTBS_0674, OTBS_0675 and OTBS_0676 interatced with OTBS_0677, OTBS_0678, OTBS_0679, OTBS_0680, and OTBS_0681.
Fig. 3

Protein-Protein interaction: Sequence similarity network (cluster coff. = 0.6) was constructed by STRING database using 62 most virulent proteins, but out of which only 21 proteins (Magenta colour nodes) interact with other proteins (yellow nodes). The toplogical properteies show the hierarchical pattern of the network, The behaviours of degree distributions (P(k)), clustering co-efficient (C(k)), neighborhood connectivity (CN(k)), betweenness (CB(k)) and closeness (CC(k)) measurements as a function of degree k. The lines are fitted lines with power laws in the data sets. The parent network was broken into 2 highly interconnected modules (cluster coff. = 1) which contains five hub proteins namely KJV55465(OTBS_1583),KJV56211(OTBS_0920), KJV57212(OTBS_0674), KJV57203 (OTBS_0675) and KJV57216 (OTBS_0676).

Protein-Protein interaction: Sequence similarity network (cluster coff. = 0.6) was constructed by STRING database using 62 most virulent proteins, but out of which only 21 proteins (Magenta colour nodes) interact with other proteins (yellow nodes). The toplogical properteies show the hierarchical pattern of the network, The behaviours of degree distributions (P(k)), clustering co-efficient (C(k)), neighborhood connectivity (CN(k)), betweenness (CB(k)) and closeness (CC(k)) measurements as a function of degree k. The lines are fitted lines with power laws in the data sets. The parent network was broken into 2 highly interconnected modules (cluster coff. = 1) which contains five hub proteins namely KJV55465(OTBS_1583),KJV56211(OTBS_0920), KJV57212(OTBS_0674), KJV57203 (OTBS_0675) and KJV57216 (OTBS_0676).

Protein structures prediction

Secondary structures were predicted with the Psipred server. The protein KJV55465 was observed to be organized in long coil regions interrupted with short beta sheets and alpha helices, while proteins KJV57203 and KJV57216 were organised in large strands and coils. Two proteins KJV56211 and KJV57212 were well organised in long alfa helices interrupted with short coil regions. For prediction of proteins strutcurs, the most identical templates (PDB_Blast) against each of the hub proteins was considered then tertiary structures were built using the Modeller. On the basis of the lowest value of the DOPE assessment score, or the highest GA341 assessment score, the “best” model was selected. The details are given in Table 4. Since it is known that Ligand binding is required for many proteins to function properly, the most probable binding sites amongst the hub proteins identified in this study using the ASP server (Active Site Prediction) [41]. The binding site of the five hub proteins, indicate that the residues in the active site are as follows: (i) KJV55465 (L-104, L-105, R-106; I-41, I-42, F-44,N-45). (ii) KJV56211 (H-77, L-80, Y-111; H-72, Q-73, Q-75; K-71, T-74, S-76). (iii) KJV57212 (N-171, D-172, I-174, V-175; K-77, Q-78, G-79; N-14, S-15, N-16, N-17, T-18). (iv) KJV57203 (M-101, L-102, Q-103,L-104; D-81, Y-82, A-83; E-140, A-141, D-142). (V) KJV57216 (D-81, N-176, D-177; M-101, L-102, Q-103; F-139, D-142), the details are given in Fig. 4. The functional domains of the hub proteins were predicted using Pfam databse [42]. In the protein “KJV55465”, the NDUFA12 domain was found between residues 13-118, which is an accessory subunit of the mitochondrial membrane respiratory chain NADH dehydrogenase (Complex I) [43]. Complex-I is found in bacteria and cyanobacteria (as a NADH-plastoquinone oxidoreductase) that is a main source of reactive oxygen species (ROS) predominantly formed by electron transfer from FMNH. Similarly, the “KJV57203” protein has two domains: (i) DUF2163 (between 01-155 residue) (ii) Phage_BR0599 (between 184-263 residue). Phage conserved hypothetical protein “BR0599” is found almost exclusively in phage or in prophage regions of bacterial genomes, including the phage-like Rhodobacter capsulatus gene transfer agent, which packages DNA. The protein “KJV57216” has only one domain namely DUF2460. The domains “DUF2163” and “DUF2460” are uncharacterized conserved proteins but their functions are still unknown. However, no positive or negative evidence was found regarding the functional domain of these proteins “KJV56211” and “KJV57212”.
Table 4

Summary of the best model produced with the lowest DOPE score and highest GA341 assessment score.

S.No.Modelled ProteinsDOPE scoreGA341 score
1KJV55465 (OTBS_1583)-6911.3750.00210
2KJV56211 (OTBS_0920)-8583.0880.04392
3KJV57212 (OTBS_0674)-12852.5960.04440
4KJV57203 (OTBS_0675)-20830.9000.16169
5KJV57216 (OTBS_0676)-13358.3630.03311
Fig. 4

Secondary structure of hub proteins predicted with the Psipred server. (1) The Psipred model showed; alpha helices, strand and coil regions and result accuracy. The confidence prediction scores are shown in the blue, grey or black bars. The red dashed line shows the conserved functional domain in the proteins. (2) The 3D structure of the five hub proteins are modelled by MODELLER and the predicted probable binding sites are circled (oily yellow) in the structures.

Summary of the best model produced with the lowest DOPE score and highest GA341 assessment score. Secondary structure of hub proteins predicted with the Psipred server. (1) The Psipred model showed; alpha helices, strand and coil regions and result accuracy. The confidence prediction scores are shown in the blue, grey or black bars. The red dashed line shows the conserved functional domain in the proteins. (2) The 3D structure of the five hub proteins are modelled by MODELLER and the predicted probable binding sites are circled (oily yellow) in the structures.

Discussion

Orientia tsutsugamushi is an obligate intracellular bacterium transmitted to humans via the bite of an infected trombiculid mite [44]. Here, the main goal of our study was to determe the protein functions from the hypothetical protein sequences. Functional annotation of 344 hypothetical proteins are of major importance in providing insight into their molecular functions and will help us in the identification of new drugs against the disease. The subcellular localization of these proteins is important to understand their interactions with drug molecules and other proteins. In our study, we found that most of the HPs exist in the cytoplasmic region (75%), inner membrane regions (3.45%), outer membrane regions (8.43%), extra cellular regions (8.13%) and in the periplasmic region (4.36%). Further we classified these 344 HPs into various protein classes like DNA binding proteins, transporter proteins, enzymes, outer membrane proteins, and many other proteins. It has been studied that the DNA-binding proteins play an important role in bacterial stress tolerance and survival in the host and may be responsible for virulence [45, 46]. It is known that enzymes play important roles in pathogen virulence (like bacterial cell-wall & ubiquinone biosynthesis, antibiotic resistance (β-lactam), invasion, and intracellular replication) [47, 48, 49]. Besides enzymes, transporter proteins are also crucial for the intracellular survival of bacteria as they efflux drug molecules out of the bacterial cell. Therefore, all the classified proteins are important for the survival and pathogenicity of the pathogen. The present investigation mainly focused only on identifying virulence factors among these hypothetical proteins. From 344 hypothetical proteins only 62 proteins were found to be most virulent and these virulence properties may help in the pathogen's survival. Therefore these proteins may be therapeutic drug-targets to combat pathogens. It has already been reported that virulent proteins were targeted for drug discovery and development [50]. Further, these 62 virulent proteins were considered for sequence similarity network analysis. In network analysis, two modules and 05 hub genes were identified which closely interacted with important proteins:- KJV55465 (OTBS_1583): Is a protein 120 residues long known as “Putative NADH ubiquinone oxidoreductase”. It is mainly involed in electron transporter activity and NADH dehydrogenase (ubiquinone) activity. In network module, it was found to be interacting with rnhA, OTBS_1581 and OTBS_1584. It well known that rnhA (Ribonuclease H) is involved in repairing processes in cases where RNA/DNA duplexes are generated during DNA replication [51]. The protein OTBS_1584 is a putative ABC transporter substrate binding protein which play roles in nutrient uptake and drug resistance. However, there is increasing evidence that these transport systems play either direct or indirect roles in the virulence of bacteria [52]. KJV56211 (OTBS_0920): It is 120 residues long protein which intercted with ndK, rpoC, dcD and pnP. Nucleoside diphosphate kinase (Ndk) is an important enzyme which plays an important role in bacterial growth, signal transduction and pathogenicity [53]. The rpoC protein, which encodes the RNA polymerase β′ subunit that help to bacterial pathogen growth. The dcD is know as dCTP deaminase which catalyzes the deamination of dCTP to dUTP. The primary source of dUMP, the precursor for dTTP in the gram-negative bacteria is getting throgh a pathway where dCTP is deaminated by dcD to produce ammonia and dUTP that later hydrolyzed by dUTPase to generate dUMP and pyrophosphatea Similarly, the pnP is known as polyribonucleotide nucleotidyltransferase which is widely conserved and plays a major role in RNA decay in both gram-negative and gram-positive bacteria [54]. KJV57212 (OTBS_0674): It is 142 long residues protein, which interacted with two hub proteins OTBS_0675 (270 residue), OTBS_0676 (199 residue) and these three proteins are synergistically interacted with other proteins like OTBS_0677 (uncharacterized protein), OTBS_0677 (uncharacterized protein), OTBS_0679 (uncharacterized protein), OTBS_0680 and OTBS_0681. In which OTBS_0680 and OTBS_0681 are known as phage major capsid protein (HK97) and phage prohead HK97 respectively.

Conclusion

The strategies used in our study to annotate functions of hypothetical proteins can be useful for designing experimental approaches geared towards the evolution of the exact function of the corresponding gene. Here, we have characterized and functionally annotated the hypothetical proteins from O. tsutsugamushi str. Karp and categorized them into different protein classes. Among these 344 HPs, 62 proteins were found to be most virulent. Virulence refers to the severity of infection, and different toxins are produced by pathogenic bacteria to withstand the host immune system. In addition, we constructed a sequence similarity network to understand the interaction of these virulent proteins and identifed five hub proteins (KJV55465, KJV56211, KJV57212, KJV57203 and KJV57216) among them which play key regulatory and co-regulatory roles in the network. The conserverd domains of these hub proteins necessary for functional information, experimental design and genome-level annotation were analyzed. Conclusively secondary structure prediction and 3D modelling further provided insight into the spatial arrangement of the amino acids in the proteins to find the most probale binding sites for drugs. These HPs may serve as potential therapeutic targets and may be considered as a milestone in the emerging field of drug discovery. We hope that the information of HPs from O. tsutsugamushi will be innovative for further in-vitro analysis of this disease.

Declarations

Author contribution statement

Nikhat Imam: Conceived and designed the experiments; Performed the experiments; Wrote the paper. Aftab Alam: Analyzed and interpreted the data; Wrote the paper. Rafat Ali: Analyzed and interpreted the data. Mohd Faizan Siddiqui: Performed the experiments; Analyzed and interpreted the data. Sher Ali, Romana Ishrat: Conceived and designed the experiments. Md. Zubbair Malik: Contributed reagents, materials, analysis tools or data.

Funding statement

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Competing interest statement

The authors declare no conflict of interest.

Additional information

No additional information is available for this paper.
  49 in total

1.  Prediction of protein subcellular localization.

Authors:  Chin-Sheng Yu; Yu-Ching Chen; Chih-Hao Lu; Jenn-Kang Hwang
Journal:  Proteins       Date:  2006-08-15

2.  Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence.

Authors:  K Guruprasad; B V Reddy; M W Pandit
Journal:  Protein Eng       Date:  1990-12

3.  Exploitation of the endocytic pathway by Orientia tsutsugamushi in nonprofessional phagocytes.

Authors:  Hyuk Chu; Jung-Hee Lee; Seung-Hoon Han; Se-Yoon Kim; Nam-Hyuk Cho; Ik-Sang Kim; Myung-Sik Choi
Journal:  Infect Immun       Date:  2006-07       Impact factor: 3.441

Review 4.  Targeting bacterial secretion systems: benefits of disarmament in the microcosm.

Authors:  Christian Baron; Brian Coombes
Journal:  Infect Disord Drug Targets       Date:  2007-03

5.  AADS--an automated active site identification, docking, and scoring protocol for protein targets based on physicochemical descriptors.

Authors:  Tanya Singh; D Biswas; B Jayaram
Journal:  J Chem Inf Model       Date:  2011-09-15       Impact factor: 4.956

6.  The Histone-Like Nucleoid Structuring Protein (H-NS) Is a Negative Regulator of the Lateral Flagellar System in the Deep-Sea Bacterium Shewanella piezotolerans WP3.

Authors:  Huahua Jian; Guanpeng Xu; Yingbao Gai; Jun Xu; Xiang Xiao
Journal:  Appl Environ Microbiol       Date:  2016-04-04       Impact factor: 4.792

7.  In Silico Structural and Functional Annotation of Hypothetical Proteins of Vibrio cholerae O139.

Authors:  Md Saiful Islam; Shah Md Shahik; Md Sohel; Noman I A Patwary; Md Anayet Hasan
Journal:  Genomics Inform       Date:  2015-06-30

8.  Beta-lactamase induction and cell wall metabolism in Gram-negative bacteria.

Authors:  Ximin Zeng; Jun Lin
Journal:  Front Microbiol       Date:  2013-05-22       Impact factor: 5.640

Review 9.  A Review of Scrub Typhus (Orientia tsutsugamushi and Related Organisms): Then, Now, and Tomorrow.

Authors:  Alison Luce-Fedrow; Marcie L Lehman; Daryl J Kelly; Kristin Mullins; Alice N Maina; Richard L Stewart; Hong Ge; Heidi St John; Ju Jiang; Allen L Richards
Journal:  Trop Med Infect Dis       Date:  2018-01-17

10.  The Pfam protein families database in 2019.

Authors:  Sara El-Gebali; Jaina Mistry; Alex Bateman; Sean R Eddy; Aurélien Luciani; Simon C Potter; Matloob Qureshi; Lorna J Richardson; Gustavo A Salazar; Alfredo Smart; Erik L L Sonnhammer; Layla Hirsh; Lisanna Paladin; Damiano Piovesan; Silvio C E Tosatto; Robert D Finn
Journal:  Nucleic Acids Res       Date:  2019-01-08       Impact factor: 16.971

View more
  2 in total

1.  An Educational Bioinformatics Project to Improve Genome Annotation.

Authors:  Zoie Amatore; Susan Gunn; Laura K Harris
Journal:  Front Microbiol       Date:  2020-12-07       Impact factor: 5.640

2.  Exploration of Streptococcus core genome to reveal druggable targets and novel therapeutics against S. pneumoniae.

Authors:  Zeshan Mahmud Chowdhury; Arittra Bhattacharjee; Ishtiaque Ahammad; Mohammad Uzzal Hossain; Abdullah All Jaber; Anisur Rahman; Preonath Chondrow Dev; Md Salimullah; Chaman Ara Keya
Journal:  PLoS One       Date:  2022-08-18       Impact factor: 3.752

  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.