
Efficacy of signal peptide predictors in identifying signal peptides in the experimental secretome of Picrophilous torridus, a thermoacidophilic archaeon.

Neelja Singhal¹, Anjali Garg¹, Nirpendra Singh², Pallavi Gulati³, Manish Kumar¹, Manisha Goel¹.

Abstract

Secretory proteins are important for microbial adaptation and survival in a particular environment. To date, experimental secretomes have been reported for only a few archaea. In this study, we identified the experimental secretome of Picrophilus torridus and evaluated the efficacy of various signal peptide predictors (SPPs) in identifying signal peptides (SPs) in it. Liquid chromatography mass spectrometric (LC MS) analysis was performed for three independent P. torridus secretome samples, and only those proteins which were common to the three experiments were selected for further analysis. Thus, 30 proteins were finally included in this study. Of these, 10 proteins were identified as hypothetical/uncharacterized proteins. Gene Ontology, KEGG and STRING analyses revealed that the majority of the secreted proteins and/or their interacting partners were involved in different metabolic pathways. Also, a few proteins like malate dehydrogenase (Q6L0C3) were multi-functional, being involved in different metabolic pathways like carbon metabolism, microbial metabolism in diverse environments, biosynthesis of antibiotics, etc. Multi-functionality of the secreted proteins reflects an important aspect of thermoacidophilic adaptation of P. torridus, which has the smallest genome (1.5 Mbp) among nonparasitic aerobic microbes. SPPs like PRED-SIGNAL, SignalP 5.0, PRED-TAT and LipoP 1.0 identified SPs in only a few secreted proteins. This suggests that either these SPPs were insufficient, or N-terminal SPs were absent in the majority of the secreted proteins, or there might be alternative mechanisms of protein translocation in P. torridus.

Year: 2021 PMID: 34358261 PMCID: PMC8345856 DOI: 10.1371/journal.pone.0255826

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

Picrophilus torridus is an extremely acidophilic and moderately thermophilic (optimal growth temperature ~55–60°C) euryarchaeon, which was first isolated from dry solfataric fields of northern Japan [1]. Whole genome sequence analysis of P. torridus revealed that it had the highest coding density among thermoacidophiles and the smallest genome (1.55 Mbp) among nonparasitic aerobic microbes culturable on organic substrates [2]. Although the intracellular pH of thermoacidophiles is reportedly close to neutral, an unusual intracellular pH of around 4.6 has been reported for Picrophilus spp. [3]. The microbial secretome (the set of secreted proteins) plays an important role in adaptation and survival in a particular niche, including thermoacidophilic environments. The secretome performs a variety of functions like degradation of complex polymeric substances (carbohydrates and proteins), passage of nutrients into the cell, protection against toxic compounds, signal transduction, etc. [4,5]. In prokaryotes, eukaryotes and archaea, a variety of transport systems can be utilized for protein secretion. The ABC transporters are used for excretion of peptides and toxins [6]. The universally conserved general secretory pathway (Sec-pathway) is used for translocation of unfolded secretory proteins across the cytoplasmic membrane [7]. Proteins intended for secretion harbour a signal peptide (SP) at their N-terminus. The SP is made up of three regions: the N-terminal (n-) region containing positively charged amino acid residues, the hydrophobic (h-) region containing hydrophobic amino acid residues, and the c-region containing small, uncharged amino acid residues and a characteristic cleavage site [8]. The SPs are cleaved from the proteins during or after their translocation across the cell membrane by specialized enzymes called signal peptidases. The signal peptidases are of two types: signal peptidase I (SPase I) and signal peptidase II (SPase II).
SPase I substrates are usually released as soluble proteins, whereas SPase II substrates (lipoproteins) are attached to the cell membrane with the help of a lipid anchor. Although the genomes of many archaea encode proteins whose N-terminus contains a lipobox, SPase II homologs are rarely detected in archaea [9]. Apart from the Sec-pathway, the twin-arginine translocation (TAT) pathway is another protein translocation pathway, which allows secretion of folded proteins [10]. TAT substrates were reportedly present in haloarchaea like Haloferax volcanii and Natrinema sp. J7-2 [11,12]. Very few studies have investigated the composition of archaeal secretomes. To the best of our knowledge, experimental secretomes have so far been identified for the Antarctic archaeon Methanococcoides burtonii [13], the hyperthermoacidophilic archaeon Sulfolobus spp. [14], the hyperthermophilic archaeon Pyrococcus furiosus [5] and haloarchaea like Haloferax volcanii and Natrinema sp. J7-2 [11,12]. However, the secreted proteins of the thermoacidophilic archaeon P. torridus have not yet been identified experimentally. An earlier study reported the composition of whole cell proteins of P. torridus using a bottom down proteomics approach, where proteins separated by two-dimensional (2D) gel electrophoresis were identified by mass spectrometry [15]. In the present study, we discerned the experimental secretome of P. torridus using liquid chromatography mass spectrometry (LC MS) and evaluated the efficacy of four signal peptide predictors (SPPs), PRED-SIGNAL [16], SignalP 5.0 [17], PRED-TAT [18] and LipoP 1.0 [19], in identifying SPs in the experimental secretome. Although many other SPPs are available, such as SignalP 4.0 [20], Phobius [21] and DeepSig [22], only these four were used in this study. PRED-SIGNAL was used because it was specifically designed for prediction of archaeal SPs and was trained on archaeal proteins having experimentally verified SPs [16].
SignalP 5.0 was used because, besides being one of the most cited and widely used SPPs, it can also predict SPs and their cleavage sites in archaeal proteins [17]. Since earlier studies have reported that TAT substrates and lipoproteins were abundant in the secretome of archaea [5,13], the SPPs PRED-TAT [18] and LipoP [19], respectively, were used to discern their presence in the P. torridus secretome.
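
The three-region organization of an SP described above (a positively charged n-region followed by a hydrophobic h-region) can be illustrated with a minimal Python sketch. This is not how any of the four SPPs work; it only counts Lys/Arg residues near the N-terminus and locates the most hydrophobic stretch using the Kyte-Doolittle hydropathy scale. The n-region length of 6 residues and the window width of 10 are arbitrary assumptions chosen for illustration, and the example peptide is the SP reported for Q6L1G3 in Table 4.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def positive_charges(seq, n_len=6):
    """Count Lys/Arg residues in the first n_len residues (assumed n-region)."""
    return sum(1 for aa in seq[:n_len] if aa in "KR")


def most_hydrophobic_window(seq, width=10):
    """Return (0-based start, mean hydropathy) of the most hydrophobic window."""
    best_start, best_sum = 0, float("-inf")
    for start in range(len(seq) - width + 1):
        total = sum(KD[aa] for aa in seq[start:start + width])
        if total > best_sum:
            best_start, best_sum = start, total
    return best_start, best_sum / width


# SP reported for Q6L1G3 in Table 4 (identical for all four predictors).
sp = "MNKTRRGIIVAVTLLMVLSTFAFVSQA"
n_charge = positive_charges(sp)
h_start, h_mean = most_hydrophobic_window(sp)
print(n_charge, h_start, round(h_mean, 2))
```

For this peptide the charged residues cluster in the first six positions and the most hydrophobic window begins immediately after them, which matches the n-region/h-region layout described in the text.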

Materials and methods

Bacterial culture and growth conditions

P. torridus (DSM 9790) was purchased from Leibniz Institute, DSMZ-German Collection of Microorganisms and Cell Cultures GmbH, Germany. The archaeal cells were grown at 55°C in 1 L of the culture medium in a shaking incubator set at 100 rpm. The components of the culture medium were: 0.05% magnesium sulfate, 0.025% calcium chloride, 0.02% ammonium sulfate, 0.3% potassium dihydrogen phosphate, 0.2% yeast extract and 1% glucose; pH 1.0 [23].

Preparation of culture filtrate proteins

The P. torridus cultures were sampled at late exponential growth phase (~1–2 × 10⁸ cells/ml). The cultures were transferred to 500 ml centrifuge bottles and centrifuged at 8000 rpm at 4°C for 30 min. The supernatant was sterile filtered using filters of 0.2 μm pore size. The proteins in the cell free supernatant were concentrated using a Vivacell 250 ultra-filtration unit (Sartorius AG, Germany; filter cut off 3 kDa) following the manufacturer’s instructions, to a final volume of 0.5 ml.

Identification of culture filtrate proteins by LC MS

The concentrated culture filtrate proteins were precipitated with 10% trichloroacetic acid (TCA) at 4°C overnight. The resulting pellet was processed for protein identification by LC MS using the methods described earlier [24]. Briefly, the protein pellet was washed with sodium acetate solution (2% in ethanol), air-dried and finally resuspended in 200 μl of 8 M urea buffer (UB). Then 100 μl of 0.05 M iodoacetamide (IAA) was added, and the sample was incubated for 20 min and centrifuged, followed by two washes with 100 μl of 0.05 M ammonium bicarbonate (ABC) and centrifugation. This was followed by addition of 40 μl of ABC with trypsin (Promega V511A) (enzyme:protein ratio 1:100) and incubation at 37°C in a water bath for 16–18 h. The digested peptides were eluted by centrifugation at 14,000 × g for 10 min, acidified with 0.1% formic acid and finally concentrated to 10 μl using a speed vac. LC MS analysis of the secretome was performed using an AB SCIEX Triple TOF 5600. The peptides were identified by the ProteinPilot software version 4.0 (AB SCIEX) using the Paragon algorithm as the search engine. Proteins identified at a 1% false-discovery rate cut-off with a minimum of two peptides per protein were selected for further study. The LC MS analysis was performed for three independent P. torridus secretome samples and only those proteins which were common to the three experiments were selected for further analysis (S1–S3 Files).
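
The selection logic described above (a 1% false-discovery rate cut-off, at least two peptides per protein, and presence in all three replicates) amounts to a filter followed by a set intersection. The following Python sketch illustrates this under stated assumptions: the record layout (`accession`, `fdr`, `peptides`) and the toy accessions are hypothetical and do not reflect the ProteinPilot export format or the actual data in S1–S3 Files.

```python
def confident_proteins(records, max_fdr=0.01, min_peptides=2):
    """Accessions passing the FDR cut-off and the minimum peptide count."""
    return {
        r["accession"]
        for r in records
        if r["fdr"] <= max_fdr and r["peptides"] >= min_peptides
    }


def common_to_all(replicates, **kwargs):
    """Accessions that pass the filter in every replicate."""
    sets = [confident_proteins(rep, **kwargs) for rep in replicates]
    return set.intersection(*sets) if sets else set()


# Hypothetical toy replicates (not real P. torridus data).
rep1 = [
    {"accession": "A", "fdr": 0.001, "peptides": 5},
    {"accession": "B", "fdr": 0.005, "peptides": 1},  # too few peptides
    {"accession": "C", "fdr": 0.002, "peptides": 3},
]
rep2 = [
    {"accession": "A", "fdr": 0.004, "peptides": 2},
    {"accession": "C", "fdr": 0.020, "peptides": 4},  # fails FDR cut-off
    {"accession": "D", "fdr": 0.001, "peptides": 6},
]
rep3 = [
    {"accession": "A", "fdr": 0.003, "peptides": 4},
    {"accession": "C", "fdr": 0.001, "peptides": 2},
]

common = common_to_all([rep1, rep2, rep3])
print(sorted(common))
```

Requiring presence in every replicate is a conservative choice: it discards proteins detected in only one or two runs, which is why the final set (30 proteins) is much smaller than any single replicate.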

Gene ontology and protein-protein interaction (PPI) studies

The functional annotation of the secretome was performed using the slim version of Gene ontology (GO) terms retrieved from the Gene Ontology Consortium [25]. The information about the interactome of secretory proteins of P. torridus was retrieved from STRING (database version 10.5)—a public repository of protein-protein interaction networks [26]. The analysis parameters included data from all the interaction sources like text mining, experiments, databases, co‑expression, neighbourhood at default values. Interacting partners of the proteins were discerned using an in-house perl script and a confidence value ≥0.4. An interaction network of the secreted proteins was constructed using Cytoscape version 3.6.1 [27]. Simultaneously, the secretory proteins were also mapped on their corresponding metabolic networks in the Kyoto Encyclopaedia of Genes and Genomes (KEGG). KEGG is an extensively used reference knowledge base that cross-integrates genomic, chemical, and systemic functional information of an organism [28].
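
The in-house Perl script is not provided in the text, so the following Python sketch only illustrates the kind of filtering described: keeping interactions with a confidence value ≥0.4 and collecting the partners of each secreted protein. The edge list, the protein names and the 0–1 score scale are assumptions made for illustration; score files downloaded from STRING may use a different scale (for example integers up to 1000) and would then need rescaling first.

```python
def interaction_partners(edges, secreted, min_confidence=0.4):
    """Map each secreted protein to its partners with score >= min_confidence."""
    partners = {protein: set() for protein in secreted}
    for a, b, score in edges:
        if score < min_confidence:
            continue  # discard low-confidence interactions
        if a in partners:
            partners[a].add(b)
        if b in partners:
            partners[b].add(a)
    return partners


# Hypothetical edges: (protein 1, protein 2, combined confidence on a 0-1 scale).
edges = [
    ("secA", "p1", 0.92),
    ("secA", "p2", 0.35),  # below the 0.4 threshold, dropped
    ("p3", "secB", 0.40),  # exactly at the threshold, kept
    ("p1", "p2", 0.99),    # neither protein is secreted, ignored
]
result = interaction_partners(edges, {"secA", "secB", "secC"})
for protein in sorted(result):
    print(protein, sorted(result[protein]))
```

A secreted protein with no partners above the threshold (here the hypothetical `secC`) keeps an empty set, which mirrors the case of a protein with no known or predicted interactors.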

Computational analysis of hypothetical/uncharacterized proteins

Computational analysis of the probable function of hypothetical/uncharacterized proteins was done using BLASTp (http://blast.ncbi.nlm.nih). BLAST search was performed at NCBI using the default E-value threshold of 10, including the threshold value of 0.005. The domains present in the hypothetical proteins were discerned using Conserved Domain Database (CDD) (https://www.ncbi.nlm.nih.gov/cdd), Pfam 32 (https://pfam.xfam.org/) and InterPro 74 (https://www.ebi.ac.uk/interpro/). The top five BLAST hits were selected for functional annotation and probing the conserved domains of each hypothetical/uncharacterized secretory protein.

Identification of N-terminal signal sequences

The N-terminal signal sequences in the culture filtrate proteins were identified using the SPPs PRED-SIGNAL, SignalP 5.0, PRED-TAT and LipoP 1.0. PRED-SIGNAL is an SPP that was trained on archaeal secretory proteins and designed specifically for identification of SPs in archaeal proteins [16]. SignalP 5.0 can predict SPs and their cleavage sites in proteins of Gram-positive and Gram-negative bacteria, archaea and eukaryotes [17]. PRED-TAT can predict twin-arginine and secretory SPs in proteins of both Gram-positive and Gram-negative bacteria [18]. LipoP 1.0 is an SP prediction program which can discriminate between lipoprotein SPs, other SPs and N-terminal membrane helices in Gram-negative bacteria [19].

Results

LC MS based protein identification and discerning the domains in hypothetical proteins for functional annotation

The number of proteins identified by LC MS in the three independent experiments was 68, 75 and 97 (S1–S3 Files). Only the 30 proteins which were present in all three independent secretome samples of P. torridus were selected for further analysis. The details of the 30 proteins selected for further analysis are shown in and their prominent domains are depicted using a Circos plot (S1 Fig). Of the 30 proteins, the 3D structure of only malate dehydrogenase (Q6L0C3) was present in the Protein Data Bank (PDB). PDB BLAST of the other proteins revealed that only 18 proteins showed identity with the known 3D protein structures available in the PDB (S1 Table). Ten proteins were hypothetical/uncharacterized, viz. Q6L2C5, Q6L2C8, Q6KZG4, Q6KZB9, Q6L1G3, Q6L2S5, Q6L2L9, Q6L268, Q6L1Y4 and Q6KZK5. The protein domains discerned in the hypothetical proteins using CDD, Pfam and InterPro are summarized briefly in . The top five BLAST hits of Q6L2C5 revealed that it was a conserved protein in Picrophilus spp. and other archaea like Thermoplasmatales archaeon I-plasma and Aciduliprofundum sp. MAR08-339. Pfam did not find any domain, but InterPro predicted a domain of unknown function, DUF929 (IPR009272), while CDD search predicted a domain of the Reo_sigmaC super family. The top four hits of Q6L2C8 indicated that it was a von Willebrand factor type A (VWA) domain-containing protein also reported in archaea like Sulfurisphaera tokodaii and Acidianus spp. One BLAST hit indicated that it was a hypothetical protein of the archaeon Candidatus Aramenus sulfurataquae. InterPro and Pfam revealed the presence of a VWA domain and archaellum regulatory network B, C-terminal. CDD search listed two domain hits, one of the vWFA super family (cl00057) and the other of YfbK (COG2304). The top five BLAST hits of Q6KZG4 and Q6KZB9 indicated that these proteins contained a DUF929 domain and were prevalent in Picrophilus spp. and Ferroplasma spp.
InterPro, Pfam and CDD indicated the presence of domains of undetermined function. The top five BLAST hits of Q6L1G3 indicated that it was a hypothetical, exported protein in Picrophilus spp., Thermoplasma spp. and Thermoplasmatales archaeon A-plasma. InterPro, Pfam and CDD revealed the presence of a cell adhesion related domain found in bacteria (CARDB). The top five hits of Q6L2S5 indicated it to be a hypothetical protein present in Picrophilus spp. and Thermoplasma spp. InterPro and CDD indicated the presence of a domain of the DsrEFH/DsrE superfamily, while Pfam failed to identify any domain. The top five hits of Q6L2L9 indicated it to be a transcriptional regulator of the ArsR family present in archaeal organisms like Acidiplasma spp. and Ferroplasma spp. InterPro identified a DNA-binding domain and CDD revealed the presence of a domain of the COG4738 super family (accession: cl01956) which might function as a transcriptional regulator. Pfam did not suggest any domain. The top five hits of Q6L268 and Q6KZK5 indicated that these were hypothetical proteins of Picrophilus spp. and Ferroplasma spp. InterPro, Pfam and CDD did not reveal any conserved domains in these proteins. The top five hits of Q6L1Y4 indicated that it was a transcriptional regulator found in Picrophilus and Thermoplasma spp. Pfam did not identify any domain in Q6L1Y4, but InterPro revealed a DNA-binding domain and CDD revealed the presence of a domain of phenylalanyl-tRNA synthetase subunit alpha.

Functional analysis of the secreted proteins

The secreted proteins were assigned functional categories according to the annotation derived from the P. torridus genome sequence (NCBI Reference sequence: NC_005877.1). It was observed that majority of the secreted proteins were membrane proteins, followed by proteins involved in other activities, followed by proteins involved in oxidoreductase activities, proteins involved in ion binding activity, peptidase activity, structural constituents of ribosome, GTPase activity, rRNA binding, peptidase, lyase and transferase activities (Fig 1). Analysis of functional enrichment by Gene Ontology (GO) revealed that the secreted proteins were involved in a variety of biological processes (BP) of which the major function was metabolic processes (Fig 2A). In the category molecular function (MF), the secreted proteins were involved in oxidoreductase activity (6 proteins), ion binding (3 proteins) and peptidase activity (2 proteins). One protein each was found to be involved in RNA binding, rRNA binding, structural constituent of ribosome, ligase activity, translation factor activity, DNA binding, GTPase activity, unfolded protein binding (Fig 2B). Cell component (CC) enrichment analysis revealed that most of the secretory proteins were cytoplasmic proteins (6 proteins), followed by intracellular, ribosomal, cell and macromolecular complex proteins (1 protein each) (Fig 2C).

Fig 1

Distribution of P. torridus secretory proteins according to their functional categories.

Fig 2

Functional categories of P. torridus secretory proteins on the basis of Gene Ontology (GO): (a) biological function (b) molecular function and (c) cellular component function.

Protein-protein interactions (PPIs) and KEGG pathway analysis

STRING analysis revealed that, except for one protein (Q6KZG4), all the proteins had known or predicted interacting partners. According to the STRING database, the 29 secretory proteins of P. torridus interacted with 488 other proteins of P. torridus. KEGG pathway map of the 30 proteins revealed their involvement in 29 different pathways (). The interaction network of the secretory proteins was created using Cytoscape (Fig 3). The secretory proteins are marked inside squares, their interacting proteins are marked in circles and their respective pathways are depicted by a particular colour. Since Cytoscape can show only a single pathway at a time, only a single pathway has been depicted for some multi-functional protein(s).

Fig 3

The protein-protein interaction network and KEGG pathway map of the secretory proteins of P. torridus.

Prediction efficacy of SPPs

The average prediction efficacies of PRED-SIGNAL and PRED-TAT in identifying SPs in the secretory proteins of P. torridus identified in the three independent experiments (S1–S3 Files) were almost similar (~16%), followed by SignalP (15.07%) and LipoP (13.55%) (S2 Table). Evaluation of the prediction efficacy of SPPs in identifying SPs in the 30 proteins that were common to the three independent secretome samples revealed that each SPP identified N-terminal signal sequences in eight proteins of P. torridus, although not the same eight (Table 4). Thus, the prediction efficacy of each SPP was 26.66%. All four SPPs identified N-terminal SPs in five proteins of P. torridus, namely Q6L2C5, Q6L182, Q6L0C3, Q6L1G3 and Q6L081. PRED-SIGNAL, PRED-TAT and LipoP identified SPs in the protein Q6KZB9, while SignalP, PRED-TAT and LipoP identified SPs in protein Q6KZG4. Both PRED-SIGNAL and SignalP identified SPs in P. torridus proteins Q6L2N0 and Q6KZE9. PRED-SIGNAL and SignalP made identical predictions except for two proteins: in Q6KZG4, SignalP predicted an SP while PRED-SIGNAL identified trans membrane segments, and in Q6KZB9, PRED-SIGNAL identified an SP but SignalP did not. Although predictions by PRED-TAT and SignalP were similar, PRED-TAT additionally identified SPs in the protein Q6L2S5. Like SignalP and LipoP, PRED-TAT also identified SPs in the protein Q6KZG4. PRED-TAT identified transmembrane segments in Q6L2N0 and Q6KZE9, while PRED-SIGNAL and SignalP predicted SPs in these proteins. LipoP and PRED-TAT identified SPs in the proteins Q6KZT9 and Q6L2S5, respectively. Although most of the LipoP predictions were similar to those of the other predictors, unlike PRED-SIGNAL it identified an SP in Q6KZG4, and it was the only predictor to identify an SP in Q6KZT9.
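
The efficacy and overlap figures quoted above can be reproduced from the SP calls listed in Table 4. The Python sketch below is illustrative only; the dictionary was transcribed by hand from Table 4 (entries reported as TM segments or "-" are treated as "no SP predicted"), and the variable names are not from the original study. Note that 8/30 rounds to 26.67%, which the text reports truncated as 26.66%.

```python
# Proteins in which each predictor reported a signal peptide (from Table 4).
predicted_sp = {
    "PRED-SIGNAL": {"Q6L2C5", "Q6L182", "Q6L0C3", "Q6L2N0",
                    "Q6KZB9", "Q6L1G3", "Q6KZE9", "Q6L081"},
    "SignalP 5.0": {"Q6L2C5", "Q6L182", "Q6L0C3", "Q6KZG4",
                    "Q6L2N0", "Q6L1G3", "Q6KZE9", "Q6L081"},
    "PRED-TAT": {"Q6L2C5", "Q6L182", "Q6L0C3", "Q6KZG4",
                 "Q6KZB9", "Q6L1G3", "Q6L2S5", "Q6L081"},
    "LipoP 1.0": {"Q6L2C5", "Q6L182", "Q6L0C3", "Q6KZG4",
                  "Q6KZB9", "Q6L1G3", "Q6KZT9", "Q6L081"},
}
TOTAL_PROTEINS = 30  # proteins common to the three secretome samples

# Per-tool efficacy as a percentage of the 30 proteins.
efficacy = {
    tool: 100 * len(hits) / TOTAL_PROTEINS
    for tool, hits in predicted_sp.items()
}
# Proteins called by all four tools, and by at least one tool.
consensus = set.intersection(*predicted_sp.values())
any_tool = set.union(*predicted_sp.values())

for tool, value in efficacy.items():
    print(f"{tool}: {value:.2f}%")
print("all four tools:", sorted(consensus))
print("at least one tool:", len(any_tool))
```

Although every tool reaches the same percentage, the sets differ: only five proteins are called by all four predictors, while eleven are called by at least one.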

Table 4

Details of the signal peptides, their cleavage sites and trans membrane segments predicted by various signal peptide predictors in the secretory proteins of P. torridus.

S. No.	Protein accession number	PRED-SIGNAL	SignalP 5.0	PRED-TAT	LipoP 1.0
1	Q6L2C5	MDNKKIISIAMVAIMVLSAFAVLGSMPVQQAATHNKA signal cleavage site (37–38)	MDNKKIISIAMVAIMVLSAFAVLGSMPVQQA signal cleavage site (31–32)	MDNKKIISIAMVAIMVLSAFAVLGSMPVQQA signal cleavage site (31–32)	MDNKKIISIAMVAIMVLSAFAVLGSMPVQQA signal cleavage site (31–32)
2	Q6L182	MSESDYRKKFKKYMLIAAVLIVSLIFVAEGFGAAIPGQTSAPAVA signal cleavage site (45–46)	MSESDYRKKFKKYMLIAAVLIVSLIFVAEGFGAA signal cleavage site (34–35)	MSESDYRKKFKKYMLIAAVLIVSLIFVAEGFGAAIPGQTSAPAVA signal cleavage site (45–46)	MSESDYRKKFKKYMLIAAVLIVSLIFVAEGFGA signal cleavage site (33–34)
3	Q6L2N6	TM (8–28)	-	TM (8–28)
4	Q6L2M0	TM (7–26)	-	TM (10–30)	-
5	Q6L0Y1	-	-	TM (53–73)	-
6	Q6L0C3	MARSKISVIGAGAVGATVAQTLA signal cleavage site (23–24)	MARSKISVIGAGAVGATVAQTLA signal cleavage site (23–24)	MARSKISVIGAGAVGATVAQTLAIR signal cleavage site (25–26)	MARSKISVIGAGAVGATVA signal cleavage site (19–20)
7	Q6KZG4	TM (6–25)	MANINYKLLVLFIAVFVVIAFFAVDYDLYHA signal cleavage site (31–32)	MANINYKLLVLFIAVFVVIAFFA signal cleavage site (23–24)	MANINYKLLVLFIAVFVVIAFFA signal cleavage site (23–24)
8	Q6L2N0	MRGIKIIAIIIICMFIITS signal cleavage site (19–20)	MRGIKIIAIIIICMFIITSMDVVIP signal cleavage site (25–26)	TM (4–24)	-
9	Q6KZB9	MAKNNKRSTNKNQKNKNSASKNQNKKNNINLKNKNVIGSAIAAVIIVVLVVVVLTHPLYR signal cleavage site (64–65)	-	MAKNNKRSTNKNQKNKNSASKNQNKKNNINLKNKNVIGSAIAAVIIVVLVVVVLTHPLYR signal cleavage site (60–61)	MAKNNKRSTNKNQKNKNSASKNQNKKNNINLKNKNVIGSAIAAVIIVVLVVVVLT signal cleavage site (55–56)
10	Q6L1G3	MNKTRRGIIVAVTLLMVLSTFAFVSQA signal cleavage site (27–28)	MNKTRRGIIVAVTLLMVLSTFAFVSQA signal cleavage site (27–28)	MNKTRRGIIVAVTLLMVLSTFAFVSQA signal cleavage site (27–28)	MNKTRRGIIVAVTLLMVLSTFAFVSQA signal cleavage site (27–28)
11	Q6KZE9	MNKKVIASLIIVVIIIISGISYVYIHSNTATSGKITVKA signal cleavage site (39–40)	MNKKVIASLIIVVIIIISGISYVYIHS signal cleavage site (27–28)	TM (5–25)	-
12	Q6KZT9	TM (7–29)	-	TM (10–30)	MVMNSKARIIIAVVVVIIIIAAGFSFA signal cleavage site (27–28)
13	Q6L2S5	-	-	MKNVAIIISTSNKEKAVA signal cleavage site (18–19)	-
14	Q6L268	TM (35–63)	-	TM (36–66)	-
15	Q6L081	MAKNKIIAIVAIVIVIIVIGSVIA signal cleavage site (24–25)	MAKNKIIAIVAIVIVIIVIGSVIA signal cleavage site (24–25)	MAKNKIIAIVAIVIVIIVIGSVIA signal cleavage site (24–25)	MAKNKIIAIVAIVIVIIVIGSVIA signal cleavage site (24–25)

TM: Trans Membrane segment.

Number in parenthesis indicates amino acid position.
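
Because the cleavage site is given as a pair of residue positions, a listed SP of length n should carry the site (n to n+1). A small consistency check of this kind can be sketched in Python as follows. The helper name and the selection of rows are illustrative; the sequences and positions were transcribed from Table 4 as printed. Among the rows checked here, only the PRED-SIGNAL entry for Q6KZB9 is flagged: the listed peptide has 60 residues but the site is given as 64–65. Whether the sequence or the position is mistranscribed cannot be determined from the table alone.

```python
import re


def cleavage_consistent(peptide, site):
    """True if the peptide length equals the first position of an 'n-(n+1)' site."""
    first, second = (int(x) for x in re.split(r"[–-]", site))
    return second == first + 1 and len(peptide) == first


# (accession, predictor, signal peptide, cleavage site) as printed in Table 4.
Q6KZB9_LONG = "MAKNNKRSTNKNQKNKNSASKNQNKKNNINLKNKNVIGSAIAAVIIVVLVVVVLTHPLYR"
entries = [
    ("Q6L1G3", "SignalP 5.0", "MNKTRRGIIVAVTLLMVLSTFAFVSQA", "27–28"),
    ("Q6L0C3", "LipoP 1.0", "MARSKISVIGAGAVGATVA", "19–20"),
    ("Q6KZB9", "PRED-SIGNAL", Q6KZB9_LONG, "64–65"),
    ("Q6KZB9", "PRED-TAT", Q6KZB9_LONG, "60–61"),
    ("Q6KZB9", "LipoP 1.0", Q6KZB9_LONG[:55], "55–56"),
]

# Collect the entries whose peptide length disagrees with the stated site.
mismatches = [
    (accession, tool)
    for accession, tool, peptide, site in entries
    if not cleavage_consistent(peptide, site)
]
print(mismatches)
```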


Discussion

The aim of the current study was to evaluate the efficacy of various SPPs in identifying SPs in the experimental secretome of P. torridus. Culture filtrate proteins of P. torridus were concentrated, processed and analyzed by LC MS. Using this approach, 68 proteins were identified by LC MS in the first experiment (S1 File), 75 proteins in the second experiment (S2 File) and 97 proteins in the third experiment (S3 File). To avoid any ambiguity and remove any technical artefacts, only the 30 proteins which were present in all three experiments were included in this study. In-depth analysis of the experimental secretome of P. torridus revealed that the majority of the secreted proteins were involved in various metabolic processes and one-third of the secreted proteins were hypothetical/uncharacterized. In this regard, our results are similar to an earlier study which reported that most of the annotated secreted proteins of P. torridus were components of the respiratory chain or hypothetical proteins, transporters, proteases and exported binding proteins [2]. Although many of these proteins have intracellular functions and would not be expected in the culture filtrate, intracellular proteins have been regularly reported from culture filtrates of archaea [14,29] and bacteria [30]. Whether this is due to artefacts of cell lysis or to active secretion of intracellular proteins into the surrounding culture medium [31] is still unclear. In archaea, protein export via membrane vesicles has been proposed as another possible reason underlying the presence of these proteins in culture filtrate [32,33]. An earlier study reported that various proteins involved in translation, and in energy and metabolism, were exported via secreted membrane vesicles in archaeal Sulfolobus species [14].
Interestingly, in our study too, some secreted proteins like malate dehydrogenase (Q6L0C3) were observed to be involved in many different metabolic pathways like carbon metabolism, microbial metabolism in diverse environments, biosynthesis of antibiotics, biosynthesis of secondary metabolites, pyruvate metabolism, citrate cycle, etc. Multifunctional secreted proteins might be an important attribute for thermoacidophilic adaptation of P. torridus, which has the smallest genome (1.5 Mbp) among nonparasitic aerobic microbes. Of the 30 proteins discerned in the experimental secretome, ten proteins were identified as hypothetical/uncharacterized proteins. Because they contained either no conserved domain(s) or only domain(s) of undetermined function, the putative functions of four secreted hypothetical proteins (Q6KZG4, Q6L268, Q6KZK5 and Q6KZB9) could not be predicted in silico. Of the six hypothetical secreted proteins whose putative functions could be predicted, two proteins, Q6L2L9 and Q6L1Y4, were probably transcriptional regulators. Proteins containing domains of the Reo_sigmaC super family have reportedly been involved in host-virus interactions; hence, Q6L2C5 might also be involved in Picrophilus-virus interactions [34]. The secreted protein Q6L2C8 contained a von Willebrand factor type A (vWA) domain, which was originally found in the blood coagulation protein von Willebrand factor (vWF), where it helps in the formation of protein aggregates [35]. The vWA domain containing proteins are involved in a variety of important cellular functions like basal membrane formation, signalling, cell migration, cell differentiation, adhesion, haemostasis, chromosomal stability and immune defences. Thus, Q6L2C8 might also be involved in vital cellular functions of P. torridus. Interestingly, proteins containing a vWA domain have been reported to be present in secreted membrane vesicles of archaeal Sulfolobus species [14].
The protein Q6L1G3 contained a domain related to cell adhesion in bacteria (CARDB). Proteins containing the CARDB domain were reported to be homologs of calpain, which is an essential, cytoplasmic, calcium-dependent cysteine endopeptidase of eukaryotes [36]. Calpains are implicated in a variety of calcium-regulated cellular processes in eukaryotes such as signal transduction, cell proliferation, cell cycle progression, differentiation, apoptosis, etc. [37,38]. Thus, Q6L1G3 might also be involved in various calcium-regulated cellular processes of P. torridus. The protein Q6L2S5 contained a DsrF-like family domain. DsrE/DsrF are small soluble proteins which are involved in intracellular reduction of sulphur [39]. Hence, the protein Q6L2S5 might help in survival of P. torridus in solfataric environments. The prediction efficacy of the four SPPs on the 30 proteins which were common to the three independent secretome samples was identical (26.66%) because each program identified SPs in eight proteins of P. torridus. The supplementary information of the SP prediction program PRED-SIGNAL showed that 86 proteins of P. torridus have SPs, while in silico predictions by SignalP revealed that 121 proteins of P. torridus were secretory proteins [2]. Until recently, PRED-SIGNAL was the only program available for prediction of archaeal SPs. Since it was trained on archaeal secretory proteins, its prediction accuracy was expected to be better than that of other prediction programs. However, our results revealed that it could identify SPs in only eight proteins, and trans membrane segments in five proteins. Although the earlier versions of SignalP could predict SPs in secretory proteins of Gram-positive and Gram-negative bacteria, the latest version, SignalP 5.0, can also predict SPs in archaeal proteins [40,41]. However, our results revealed that SignalP 5.0 could also identify SPs in only eight proteins of P. torridus. This suggests that the experimental secretome of P. torridus might be smaller than the theoretical secretome predicted by various SPPs.
However, there might be several reasons underlying the differences observed between the experimental and theoretical secretomes. First, SignalP was trained on SPs of Gram-positive and Gram-negative bacteria, which might have led to an overestimated number of SPs in P. torridus, which is an archaeon. Second, the secretome profile of microorganisms varies greatly with their growth conditions and stage of growth (log phase versus exponential phase). Thus, the P. torridus secretome reported in the present study might be specific to the growth conditions which were used in this study. Third, some proteins expressed at low levels might have escaped proteomic identification in this study due to technical constraints like the detection limit of mass spectrometry. Since TAT substrates have reportedly been present in the secretome of archaea [13], the SPP PRED-TAT was used to identify TAT substrates in the secretome of P. torridus. PRED-TAT predicts twin-arginine and secretory signal peptides using Hidden Markov Models [18]. Of the 30 proteins, PRED-TAT identified SPs in eight proteins and trans membrane segments in seven proteins. PRED-TAT did not identify any TAT substrates in the secretome of P. torridus. Lipoproteins were also reportedly abundant in the secretome of archaea [5]; hence, their presence in the experimental secretome of P. torridus was investigated using the SPP LipoP. Although lipoproteins are usually attached to the cell membrane, they might also be present in the culture filtrate due to natural shedding [42,43]. LipoP predicted that eight proteins harbored SPs and were transported via the standard Sec/SPI pathway, and that none of them was a lipoprotein. The predictions by PRED-TAT and LipoP suggest that TAT substrates and lipoproteins might be absent in the secretome of P. torridus.
Also, the fact that N-terminal SPs were identified in only a small fraction of the experimental secretome of P. torridus suggests two plausible underlying reasons. Either the SPPs used in this study were less efficient in identifying archaeal SPs, or protein translocation in P. torridus does not take place only via the general SP-dependent Sec-pathway. Additionally, there might be alternative mechanisms of protein transport in P. torridus, like secreted membrane vesicles, as reported earlier in archaeal Sulfolobus species [14].

Conclusion

The information about secreted proteins of archaea is still fragmentary. The present study adds to the slowly growing knowledge base of archaeal secretomes and is the first study of the secretome of P. torridus. Under the specific growth conditions used in this study, 30 proteins of P. torridus were identified as secreted proteins by LC MS. TAT substrates, frequently reported from the secretome of haloarchaea [13], and lipoproteins, reportedly abundant in the secretome of P. furiosus [5], were found to be completely absent in the secretome of P. torridus. The majority of the secreted proteins were predicted to be involved in metabolic pathways. Since vWA domain containing proteins were reportedly exported via secreted membrane vesicles in archaeal Sulfolobus species [14], it can be speculated that the hypothetical protein of P. torridus with such a domain might also be exported by membrane vesicles. The four SPPs used in this study, PRED-SIGNAL, SignalP, PRED-TAT and LipoP, identified N-terminal SPs in a small fraction of the secreted proteins. This indicates that either these four SPPs were insufficient in identifying the N-terminal signal sequences or N-terminal signal sequences might not exist in the majority of the secreted proteins of P. torridus. This suggests that there might be alternative mechanisms of protein translocation in P. torridus, like secretory membrane vesicles, as reported for Sulfolobus spp. [14]. However, further experiments are required to corroborate our findings. Nevertheless, this preliminary study is expected to provide a useful basis for further studies on protein translocation in this thermoacidophilic archaeon.

Circos plot showing the domains present in the 30 secretory proteins of P. torridus.

(PNG)

P. torridus secretory proteins, PDB BLAST hits with percentage identity.

(DOCX)

Prediction efficacy of various SPPs in identifying SPs in P. torridus secretome identified in three independent experiments.

(DOCX)

P. torridus proteins identified by LC MS in the first experiment.

(XLSX)

P. torridus proteins identified by LC MS in the second experiment.

(XLSX)

P. torridus proteins identified by LC MS in the third experiment.

(XLSX)

20 Apr 2021

PONE-D-21-06360

Efficacy of signal peptide predictors in identifying signal peptides in the experimental secretome of Picrophilous torridus, a thermoacidophilic archaeon

PLOS ONE

Dear Dr. Goel,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jun 04 2021 11:59PM.

We look forward to receiving your revised manuscript.

Kind regards,

Dinesh Gupta

Academic Editor

PLOS ONE
Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Partly

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1:

[Major comment]

1. It’s not clear why author predicted the N-terminal sequences and what is the take-home message from that analysis.

2. Results are poorly represented as the Figures. I recommend authors to show proteins domains in the 30 secretome proteins (e.g., stacked over each other) for better visualization, similar to how HMMER/Pfam/Interpro represent. Authors may also use CIRCOS plots to highlight functional domains of 30 proteins as a condensed representation.

3. Similarly, predicted N-terminal regions of interest may also be visualized.

4. Please provide phylogenetic comparison of secretome of P. torridus with known secretome of related archaea.

5. Is there any secreted protein (out of 30) for which 3D structure is available or can be predicted using current methods? A result Table with PDB-blast may help here.

6. Are there any effector proteins predicted?

[Minor comments]

1. Line 98: The text describing the tools is not required in “introduction” section. Please move the text to the “method” section.

2. Line 142 and 174: Please explain how UniProtKB ID was assigned to filtrate proteins. Please cite UniProtKB.

3. Line 147: For STRING database, what parameters and sources were used for shortlisting interacting proteins.

4. Line 165: The section “Analysis of N-terminal signal sequences”, needs more details. It’s not clear what authors mean by “analysis” here.

5. Please use first letter in capital, whenever abbreviations are used, e.g., in Table 1 please use Trans Membrane.

6. Table 2. It should be - Protein accession number. Please use proper sentence case in other part of the manuscript too.

7. Line 232: Please use small paragraph heading. I think, name of the tools in heading is not required.

8. Line 237: Please do not write version number of interpro or pfam everywhere in the entire manuscript. It is sufficient to cite version number in method section only.

9. Line 353: Please remove the result hyperlink for PRED-SIGNAL.

10. Line 345: The section “Prediction efficacy of various SP predictors for the experimental secretome of P. torridus” should be in the result section (within the section “Identification of N-terminal signal sequences”), instead of discussion. Also results of this section must be presented as a separate Table or figure, as it is too confusing to read and summarize the different variables.

11. A review from native English speaker may be required to fix minor grammatical errors.

Reviewer #2:

Authors generated the LC MS data of P. torridus and predicted the efficacy of signal peptide prediction software. Though the work is of interest, following points should be addressed:

# Authors should highlight 1-2 metabolic pathways in the abstract.

# Authors conclude 3 possibilities in conclusion: “currently available SP prediction programs” SPP are insufficient / inefficient? N-ter SP absent in P. t. Alternative mech. of protein translocation in P. t. So “currently available SP prediction (SPP) programs” -- > “these 4 SP prediction programs” should be used to tone down the conclusion. Authors should include the latest methods e.g. How many software are available. Why did not use latest ones e.g. DeepSig. What is rationale of using these 4 software.

# Please make suppl. table or excel showing how many SPPs are based on archaea datasets to identify archaea SPs. Provide the very briefly basic details of the methods along with the data sample size especially archaea data used for machine learning model development.

# Please provide some quantitative manner to show the “Efficacy of signal peptide predictors in identifying signal peptides”, e.g. %age of proteins found to have SP using 4 software. Additionally, authors should provide for all full 3 sets in supplementary file the prediction score of all 4 software.

# Authors should order these SPPs based on some rationale. E.g. year of publish or performance? and be consistent in describing these methods all over in that order only.

# on line 353, this thing is counter intuitive. "The supplementary information enlisted in PRED-SIGNAL indicated that 86 proteins of P. torridus were secretory proteins (http://bioinformatics.biol.uoa.gr/PRED-SIGNAL-results/). However, when the 30 experimentally derived secreted proteins were submitted to PRED-SIGNAL it identified signal peptides in only eight proteins, while trans membrane segments were identified in five proteins." Were your 30 sequences among 1535 seq mentioned on that page of signal-pep? If yes, then how only 8 are shown to have SPs? What input was given when you say "when the 30 experimentally derived secreted proteins were submitted to PRED-SIGNAL"? Was the sequence of your 30 proteins and their 30 out of 1535 proteins different or identical? Authors need to explain this section.

# line 80, soluble in what? Water or lipid?

# For Ref 17, what data was used to make prediction model. It is based on archaeal protein.?

# Line 116, is there any reference for such protocol used in this section, authors may cite that.

# protein pilot software is not mentioned in the text while suppl. table 1 mentions it.

# Why CPU time and rates are missing for other rows in excel sheet Speed and Distribution Analysis in suppl file 1, sheet 1

# Why Global FDR is recommended at 5 and 10% unlike Global FDR fit at 1%? In protein FDR Summary sheet in supp. Tables

# Authors should mention version of all software and databases used in the study. Which version of backend database was used for blast.

# What is meaning of % Coverage (95) and Peptides (95%) in table 1 should be mentioned.

# Authors should sort table 1 based on some criteria, so that it is easy for readers to comprehend.

# Line 178: Pfam 32.0 failed to predict any domain in the hypothetical protein Q6L265? Did not find any protein with this name Q6L265. Also, what is the reason for not finding any domain?

# Table 1 is least described in the text. Expand it like authors explain table 2. Table 1 is explained rather in discussion.

# Table 1 contains 30 proteins while fig 1 contains 34?

# Order fig 1 based on counts.

# Fig 2, reorder based on counts. Also, the legend title is ‘Function’ for all the panels, it should be corrected.

# line 226 "Here, due to the limitations of the Cytoscape software tool, we have shown only one pathway of multi-functional protein(s)". It is not clear.

# Heading in lines 232/233 seem not in continuation. Please double check.

# Fig 3, Text mentions protein id while figure contains gene ids, need to be consistent for proper interpretation of the figure. Mention about the size of nodes, length/thickness of edges if they mean anything or not. What grey color represents in fig 3.

# Line 237. Be consistent in mentioning version name e.g. InterPro 74 and Pfam 32

# line 238 : "InterPro 74 and Pfam 32 could not find any domain, while CDD search predicted presence of a domain of Reo_sigmaC super family". But table 1 shows interpro finds a domain?

# Section of text in 235-265 should also be available as an additional table or suppl table

# Suppl files 1,2,3 need to be explained properly as they contain multiple sub sheets. Need to explain at least one suppl file with the related content and interpretation.

# Line 297/298: “…proteins might have been missed in this study from proteomic identification, due to technical constraints like, detection limit of mass spectrometry.” What is that limit in this study’s experiment.?

# English needs to be improved including semantic and grammatical errors e.g. in line 85 sp. Vs spp. artifacts vs technical artefacts, have read and approve -- > have read and approved

**********

6.
PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

7 Jul 2021

AUTHORS’ REPLY TO THE REVIEWERS’ COMMENTS

Reviewer #1:

[Major comment]

Comment 1: It’s not clear why author predicted the N-terminal sequences and what is the take-home message from that analysis.

Authors’ Reply: The aim of the present study was to identify the experimental secretome of P. torridus and evaluate the efficacy of four signal peptide predictors (SPPs), PRED-SIGNAL, SignalP 5.0, PRED-TAT and LipoP 1.0, in identifying SPs in the experimental secretome. It is important to study the microbial secretome because secretory proteins perform a variety of functions, such as degradation of complex polymeric substances (carbohydrates and proteins), passage of nutrients into the cell, protection against toxic compounds and signal transduction, and also help in adaptation and survival in a particular niche, including a thermoacidophilic environment.

Comment 2: Results are poorly represented as the Figures. I recommend authors to show proteins domains in the 30 secretome proteins (e.g., stacked over each other) for better visualization, similar to how HMMER/Pfam/Interpro represent. Authors may also use CIRCOS plots to highlight functional domains of 30 proteins as a condensed representation.

Authors’ Reply: This point of the Reviewer is well taken. In the revised manuscript, a CIRCOS plot highlighting the functional domains of the 30 proteins as a condensed representation has been included. (please see page 7, line 187 of the revised manuscript and Supplementary Fig. 1)

Comment 3: Similarly, predicted N-terminal regions of interest may also be visualized.

Authors’ Reply: This point of the Reviewer is well taken. The predicted N-terminal regions of the 30 proteins have been shown separately in Table 4.

Comment 4: Please provide phylogenetic comparison of secretome of P. torridus with known secretome of related archaea.

Authors’ Reply: To the best of our knowledge, secretomes of only five archaea have been discerned. These are the Antarctic archaeon Methanococcoides burtonii, the hyperthermoacidophilic archaeon Sulfolobus spp., the hyperthermophilic archaeon Pyrococcus furiosus, and haloarchaea like Haloferax volcanii and Natrinema sp. J7-2. It was not possible to perform a phylogenetic comparison of the secretory proteins of P. torridus with the secretory proteins of these archaea because only a few proteins were common between them.

Comment 5: Is there any secreted protein (out of 30) for which 3D structure is available or can be predicted using current methods? A result Table with PDB-blast may help here.

Authors’ Reply: This point of the Reviewer is well taken. A table showing the PDB-BLAST results of the secreted proteins has been included in the revised manuscript. (please see page 7, lines 187-190 of the revised manuscript and Supplementary Table 1)

Comment 6: Are there any effector proteins predicted?

Authors’ Reply: None of the secretory proteins of P. torridus was predicted as an effector protein.

[Minor comments]

Comment 1: Line 98: The text describing the tools is not required in “introduction” section. Please move the text to the “method” section.

Authors’ Reply: This point of the Reviewer is well taken. The text describing the tools has been moved to the method section. (please see page 6, lines 168-177 of the revised manuscript)

Comment 2: Line 142 and 174: Please explain how UniProtKB ID was assigned to filtrate proteins. Please cite UniProtKB.

Authors’ Reply: The authors have deleted these lines from the revised manuscript because the UniProtKB IDs of the secretory proteins were not assigned by us. These were given by the software during peptide identification in the LC MS experiments.

Comment 3: Line 147: For STRING database, what parameters and sources were used for shortlisting interacting proteins.

Authors’ Reply: This point of the Reviewer is well taken. The parameters and sources which were used for shortlisting interacting proteins with the STRING database have been included in the revised manuscript. (please see page 6, lines 149-152 of the revised manuscript)

Comment 4: Line 165: The section “Analysis of N-terminal signal sequences”, needs more details. It’s not clear what authors mean by “analysis” here.

Authors’ Reply: The authors are thankful to the Reviewer for pointing this out. In the revised manuscript the word “analysis” has been replaced with “identification”.

Comment 5: Please use first letter in capital, whenever abbreviations are used, e.g., in Table 1 please use Trans Membrane.

Authors’ Reply: This point of the Reviewer is well taken. The authors have checked and verified that the first letter in abbreviations is in capitals throughout the manuscript.

Comment 6: Table 2. It should be - Protein accession number. Please use proper sentence case in other part of the manuscript too.

Authors’ Reply: This point of the Reviewer is well taken. The authors have replaced Uniprot accession number with Protein accession number and verified that proper sentence case has been used in the other parts of the manuscript.

Comment 7: Line 232: Please use small paragraph heading. I think, name of the tools in heading is not required.

Authors’ Reply: This point of the Reviewer is well taken. In the revised manuscript, small paragraph headings have been used and the names of the tools have been removed from the headings.

Comment 8: Line 237: Please do not write version number of interpro or pfam everywhere in the entire manuscript. It is sufficient to cite version number in method section only.

Authors’ Reply: This point of the Reviewer is well taken. In the revised manuscript, the version number of interpro or pfam has been included only in the methods section.

Comment 9: Line 353: Please remove the result hyperlink for PRED-SIGNAL.

Authors’ Reply: This point of the Reviewer is well taken. The result hyperlink for PRED-SIGNAL has been removed in the revised manuscript.

Comment 10: Line 345: The section “Prediction efficacy of various SP predictors for the experimental secretome of P. torridus” should be in the result section (within the section “Identification of N-terminal signal sequences”), instead of discussion. Also results of this section must be presented as a separate Table or figure, as it is too confusing to read and summarize the different variables.

Authors’ Reply: This point of the Reviewer is well taken. The section “Prediction efficacy of various SP predictors for the experimental secretome of P. torridus” has been moved to the results section and some of the lines have been re-written to enable easy understanding of the different variables. Also, the results of this section have been presented in a separate Table. (please see pages 8-9, lines 261-281 of the revised manuscript and Table 4)

Comment 11: A review from native English speaker may be required to fix minor grammatical errors.

Authors’ Reply: This point of the Reviewer is well taken. The authors have taken the help of a senior colleague to fix the minor grammatical errors.

Reviewer #2:

Authors generated the LC MS data of P. torridus and predicted the efficacy of signal peptide prediction software. Though the work is of interest, following points should be addressed:

Comment 1: # Authors should highlight 1-2 metabolic pathways in the abstract.

Authors’ Reply: This point of the Reviewer is well taken. A metabolic pathway has been highlighted in the abstract section. (please see page 1, lines 47-51 of the revised manuscript)

Comment 2: # Authors conclude 3 possibilities in conclusion: “currently available SP prediction programs” SPP are insufficient / inefficient? N-ter SP absent in P. t. Alternative mech. of protein translocation in P. t. So “currently available SP prediction (SPP) programs” -- > “these 4 SP prediction programs” should be used to tone down the conclusion.

Authors’ Reply: This point of the Reviewer is well taken. The above mentioned lines have been modified as suggested by the Reviewer. (please see page 1, lines 53-55 of the revised manuscript)

Comment 3: Authors should include the latest methods e.g. How many software are available. Why did not use latest ones e.g. DeepSig. What is rationale of using these 4 software.

Authors’ Reply: This point of the Reviewer is well taken. In the revised manuscript the information regarding the available SPP programs and the rationale of using these four SPP programs has been included. (please see page 4, lines 101-109 of the revised manuscript)

Comment 4: # Please make suppl. table or excel showing how many SPPs are based on archaea datasets to identify archaea SPs. Provide the very briefly basic details of the methods along with the data sample size especially archaea data used for machine learning model development.

Authors’ Reply: The authors would like to state that, to the best of our knowledge, only two SPPs, PRED-SIGNAL and SignalP5, are based on archaeal datasets. The basic details of these SPPs are included in the revised manuscript. (please see page 6, lines 170-177 of the revised manuscript)

Comment 5: # Please provide some quantitative manner to show the “Efficacy of signal peptide predictors in identifying signal peptides”, e.g. %age of proteins found to have SP using 4 software. Additionally, authors should provide for all full 3 sets in supplementary file the prediction score of all 4 software.

Authors’ Reply: This point of the Reviewer is well taken. The efficacy of SPPs has been presented in a quantitative manner. Also, the prediction scores of each SPP for the full 3 sets have been presented as a supplementary file. (please see page 9, lines 261-264 and supplementary Table 2 of the revised manuscript)

Comment 6: # Authors should order these SPPs based on some rationale. E.g. year of publish or performance? and be consistent in describing these methods all over in that order only.

Authors’ Reply: This point of the Reviewer is well taken. The SPPs have been described in a consistent order throughout the manuscript.

Comment 7: # on line 353, this thing is counter intuitive. "The supplementary information enlisted in PRED-SIGNAL indicated that 86 proteins of P. torridus were secretory proteins (http://bioinformatics.biol.uoa.gr/PRED-SIGNAL-results/). However, when the 30 experimentally derived secreted proteins were submitted to PRED-SIGNAL it identified signal peptides in only eight proteins, while trans membrane segments were identified in five proteins." Were your 30 sequences among 1535 seq mentioned on that page of signal-pep? If yes, then how only 8 are shown to have SPs? What input was given when you say "when the 30 experimentally derived secreted proteins were submitted to PRED-SIGNAL"? Was the sequence of your 30 proteins and their 30 out of 1535 proteins different or identical? Authors need to explain this section.

Authors’ Reply: This point of the Reviewer is well taken. This section has been re-written in the revised manuscript. We hope it is now clear and easy to understand. (please see page 12, lines 337-345 of the revised manuscript)

Comment 8: # line 80, soluble in what? Water or lipid?

Authors’ Reply: Soluble in water.

Comment 9: # For Ref 17, what data was used to make prediction model. It is based on archaeal protein.?

Authors’ Reply: PRED-SIGNAL was based on archaeal secretory proteins.

Comment 10: # Line 116, is there any reference for such protocol used in this section, authors may cite that.

Authors’ Reply: No reference is available to cite for the protocol used for concentration of culture filtrate proteins.

Comment 11: # protein pilot software is not mentioned in the text while suppl. table 1 mentions it.

Authors’ Reply: The authors are thankful to the Reviewer for pointing out this mistake. The authors have re-written these lines in the revised manuscript. (please see page 5, lines 139-140 of the revised manuscript)

Comment 12: # Why CPU time and rates are missing for other rows in excel sheet Speed and Distribution Analysis in suppl file 1, sheet 1

Authors’ Reply: The authors would like to mention that the supplementary files containing the LC MS raw data were huge and it was difficult to upload them. During reduction of their size, the sheets containing the Speed and Distribution Analysis might have been missed. We have again uploaded the suppl files containing all the missing data.

Comment 13: # Why Global FDR is recommended at 5 and 10% unlike Global FDR fit at 1%? In protein FDR Summary sheet in supp. Tables

Authors’ Reply: The authors are thankful to the Reviewer for pointing out this mistake. The global FDR fit that was used was 1%, as shown in the suppl Tables. The error has been rectified in the revised manuscript. (please see page 5, line 141 of the revised manuscript)

Comment 14: # Authors should mention version of all software and databases used in the study. Which version of backend database was used for blast.

Authors’ Reply: This point of the Reviewer is well taken. In the revised manuscript, the versions of all the software and databases have been mentioned.

Comment 15: # What is meaning of % Coverage (95) and Peptides (95%) in table 1 should be mentioned.

Authors’ Reply: Since the % Coverage (95) and Peptides (95%) are already mentioned in the Supplementary file, they have been removed from Table 1.

Comment 16: # Authors should sort table 1 based on some criteria, so that it is easy for readers to comprehend.

Authors’ Reply: This point of the Reviewer is well taken. Table 1 contains only the details of the 30 proteins identified by LC MS. The information regarding the SPPs has been removed from this table and presented in a separate table, Table 4.

Comment 17: # Line 178: Pfam 32.0 failed to predict any domain in the hypothetical protein Q6L265? Did not find any protein with this name Q6L265. Also, what is the reason for not finding any domain?

Authors’ Reply: The authors would like to bring to your kind notice that the protein accession no. is Q6L2C5 and not Q6L265. The authors feel that the database of Pfam 32.0 probably did not contain information about the domains in this protein, which may be why no domains were predicted in this protein.

Comment 18: # Table 1 is least described in the text. Expand it like authors explain table 2. Table 1 is explained rather in discussion.

Authors’ Reply: This point of the Reviewer is well taken. As suggested by the Reviewer, the authors have described Table 1 in detail. (please see page 7, lines 183-190 of the revised manuscript)

Comment 19: Table 1 contains 30 proteins while fig 1 contains 34?

Authors’ Reply: The authors would like to mention that some secretory proteins were multifunctional (like malate dehydrogenase), thus they have been represented more than once.

Comment 20: # Order fig 1 based on counts.

Authors’ Reply: This point of the Reviewer is well taken. Fig. 1 has been reordered based on the counts.

Comment 21: # Fig 2, reorder based on counts. Also, the legend title is ‘Function’ for all the panels, it should be corrected.

Authors’ Reply: This point of the Reviewer is well taken. Fig. 2 has been reordered based on the counts. The legend title has also been corrected.

Comment 22: # line 226 "Here, due to the limitations of the Cytoscape software tool, we have shown only one pathway of multi-functional protein(s)". It is not clear.

Authors’ Reply: These lines have been re-written in the revised manuscript. (please see page 9, lines 251-255 of the revised manuscript)

Comment 23: # Heading in lines 232/233 seem not in continuation. Please double check.

Authors’ Reply: The authors have corrected the headings in the revised manuscript.

Comment 24: # Fig 3, Text mentions protein id while figure contains gene ids, need to be consistent for proper interpretation of the figure. Mention about the size of nodes, length/thickness of edges if they mean anything or not.

Authors’ Reply: This point of the Reviewer is well taken. For proper interpretation of the figure, the gene ids and the protein ids of all the proteins have been included in Table 1. Also, the meaning of the various symbols has been described in detail. (please see page 9, lines 251-255 of the revised manuscript)

Comment 25: # Line 237. Be consistent in mentioning version name e.g. InterPro 74 and Pfam 32

Authors’ Reply: This point of the Reviewer is well taken. In the revised manuscript, the authors have ensured consistency in the version names of InterPro 74 and Pfam 32.

Comment 26: # line 238 : "InterPro 74 and Pfam 32 could not find any domain, while CDD search predicted presence of a domain of Reo_sigmaC super family". But table 1 shows interpro finds a domain?

Authors’ Reply: The authors are thankful to the Reviewer for pointing out this mistake. This mistake has been corrected in the revised manuscript. (please see page 7, lines 195-197 of the revised manuscript)

Comment 27: # Section of text in 235-265 should also be available as an additional table or suppl table

Authors’ Reply: The authors would like to state that detailed information regarding the text presented in these lines is separately presented as Tables 1 and 2.

Comment 28: # Suppl files 1,2,3 need to be explained properly as they contain multiple sub sheets. Need to explain at least one suppl file with the related content and interpretation.

Authors’ Reply: The authors would like to state that the information contained in Suppl files 1, 2 and 3 follows the common jargon used in LC MS experiments.

Comment 29: # Line 297/298: “…proteins might have been missed in this study from proteomic identification, due to technical constraints like, detection limit of mass spectrometry.” What is that limit in this study’s experiment.?

Authors’ Reply: The limit in this study’s experiment is the detection limit of LC MS, which is in picograms.

Comment 30: # English needs to be improved including semantic and grammatical errors e.g. in line 85 sp. Vs spp. artifacts vs technical artefacts, have read and approve -- > have read and approved

Authors’ Reply: This point of the Reviewer is well taken. In the revised manuscript being submitted, the authors have tried their best to remove all the semantic and grammatical errors.

Submitted filename: Response to Reviewers.docx

26 Jul 2021

Efficacy of signal peptide predictors in identifying signal peptides in the experimental secretome of Picrophilous torridus, a thermoacidophilic archaeon

PONE-D-21-06360R1

Dear Dr. Goel,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.
Kind regards,
Dinesh Gupta
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: N/A

4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes

5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

6. Review Comments to the Author. Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

29 Jul 2021

PONE-D-21-06360R1

Efficacy of signal peptide predictors in identifying signal peptides in the experimental secretome of Picrophilous torridus, a thermoacidophilic archaeon

Dear Dr. Goel:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Dinesh Gupta
Academic Editor
PLOS ONE

Table 1

Details of the 30 secretory proteins identified in the secretome of P. torridus by LC MS and the signal peptides predicted by various signal peptide predictors.

S. No.	Protein accession number	Protein name/function	Gene name
1	Q6L2C5	Hypothetical membrane associated protein	PTO0292
2	Q6KZS2	Thermosome subunit/protein folding	PTO1195
3	Q6L182	Oligopeptide ABC transporter Opp1/transmembrane protein	PTO0685
4	Q6KZF2	Glutamate dehydrogenase/amino acid metabolism	PTO1315
5	Q6L2N6	Extracellular solute-binding protein/membrane protein	PTO0181
6	Q6L2M0	Quinoprotein dehydrogenase/membrane protein	PTO0197
7	Q6L202	Elongation factor 1-alpha (EF-1-alpha) (Elongation factor Tu) (EF-Tu)/Protein biosynthesis	PTO0415
8	Q6L0B7	2-oxoglutarate synthase, alpha chain (EC 1.2.7.3)	PTO1000
9	Q6L0Y1	Oligosaccharyl transferase STT3 subunit	PTO0786
10	Q6KZA7	Pyruvate ferredoxin oxidoreductase, alpha chain/pyruvate synthesis	PTO1360
11	Q6L2C8	Uncharacterized protein	PTO0289
12	Q6L248	Glutaredoxin related protein/electron transfer	PTO0369
13	Q6L0C3	Malate dehydrogenase/carbohydrate metabolism	PTO0994
14	Q6KZG4	Hypothetical exported protein	PTO1303
15	Q6L140	Peroxiredoxin 2/peroxidase activity	PTO0727
16	Q6L2N0	Membrane associated serine protease	PTO0187
17	Q6KZB9	Hypothetical membrane associated protein	PTO1348
18	Q6L1G3	Hypothetical exported protein	PTO0604
19	Q6KZE9	Iron(III) dicitrate ABC transporter extracellular binding protein/integral component of membrane	PTO1318
20	Q6L1T2	D-gluconate/D-galactonate dehydratase/D-gluconate catabolic process	PTO0485
21	Q6KZT9	ABC transporter extracellular solute-binding protein/membrane component	PTO1178
22	Q6L2S5	Uncharacterized protein	PTO0142
23	Q6L2L9	Uncharacterized protein	PTO0198
24	Q6L268	Hypothetical membrane protein	PTO0349
25	Q6L1Y4	Uncharacterized protein	PTO0433
26	Q6L0M9	CBS domain containing protein	PTO0888
27	Q6L081	Sugar ABC transporter 1/extracellular binding protein	PTO1036
28	Q6KZK5	Uncharacterized protein	PTO1262
29	Q6L0W3	Proteasome subunit alpha (Proteasome core protein)/protein degradation	PTO0804
30	Q6L1B1	50S ribosomal protein L6/translation	PTO0656
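
The counts quoted for Table 1 (30 proteins, 10 of them hypothetical/uncharacterized) can be checked with a short tally. The following is a minimal Python sketch, not part of the authors' analysis: `TABLE1` transcribes the accession and name columns above with names abbreviated, and the helper `is_hypothetical` with its keyword rule (names beginning with "Hypothetical" or "Uncharacterized") is an assumption introduced for illustration.

```python
# Accession and (abbreviated) protein name transcribed from Table 1.
TABLE1 = [
    ("Q6L2C5", "Hypothetical membrane associated protein"),
    ("Q6KZS2", "Thermosome subunit"),
    ("Q6L182", "Oligopeptide ABC transporter Opp1"),
    ("Q6KZF2", "Glutamate dehydrogenase"),
    ("Q6L2N6", "Extracellular solute-binding protein"),
    ("Q6L2M0", "Quinoprotein dehydrogenase"),
    ("Q6L202", "Elongation factor 1-alpha"),
    ("Q6L0B7", "2-oxoglutarate synthase, alpha chain"),
    ("Q6L0Y1", "Oligosaccharyl transferase STT3 subunit"),
    ("Q6KZA7", "Pyruvate ferredoxin oxidoreductase, alpha chain"),
    ("Q6L2C8", "Uncharacterized protein"),
    ("Q6L248", "Glutaredoxin related protein"),
    ("Q6L0C3", "Malate dehydrogenase"),
    ("Q6KZG4", "Hypothetical exported protein"),
    ("Q6L140", "Peroxiredoxin 2"),
    ("Q6L2N0", "Membrane associated serine protease"),
    ("Q6KZB9", "Hypothetical membrane associated protein"),
    ("Q6L1G3", "Hypothetical exported protein"),
    ("Q6KZE9", "Iron(III) dicitrate ABC transporter binding protein"),
    ("Q6L1T2", "D-gluconate/D-galactonate dehydratase"),
    ("Q6KZT9", "ABC transporter extracellular solute-binding protein"),
    ("Q6L2S5", "Uncharacterized protein"),
    ("Q6L2L9", "Uncharacterized protein"),
    ("Q6L268", "Hypothetical membrane protein"),
    ("Q6L1Y4", "Uncharacterized protein"),
    ("Q6L0M9", "CBS domain containing protein"),
    ("Q6L081", "Sugar ABC transporter 1"),
    ("Q6KZK5", "Uncharacterized protein"),
    ("Q6L0W3", "Proteasome subunit alpha"),
    ("Q6L1B1", "50S ribosomal protein L6"),
]


def is_hypothetical(name: str) -> bool:
    """Illustrative rule (an assumption): flag names starting with these keywords."""
    return name.lower().startswith(("hypothetical", "uncharacterized"))


# Accessions of proteins annotated only as hypothetical/uncharacterized.
hypothetical = [acc for acc, name in TABLE1 if is_hypothetical(name)]

print(f"{len(TABLE1)} proteins, {len(hypothetical)} hypothetical/uncharacterized")
print(hypothetical)
```

With the rows as transcribed, the flagged accessions are the same ten proteins that are examined for domains in Table 2.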

Table 2

Information about domains in the hypothetical/uncharacterized proteins in P. torridus secretome discerned using InterPro 74, Conserved Domain Database (CDD) and Pfam 32.

S.No.	Protein accession number	Gene name	InterPro 74	Conserved Domain Database	Pfam 32
1	Q6L2C5	PTO0292	Protein of unknown function DUF929 (IPR009272)	Reo_sigmaC superfamily	No result
2	Q6L2C8	PTO0289	von Willebrand factor, type A (IPR002035), archaellum regulatory network B, C-terminal (IPR040929)	vWFA superfamily (cl00057), YfbK (COG2304)	von Willebrand factor type A domain (PF00092), archaellum regulatory network B, C-terminal (PF18677)
3	Q6KZG4	PTO1303	Protein of unknown function DUF929 (IPR009272)	DUF929 (pfam06053)	Domain of unknown function (PF06053)
4	Q6KZB9	PTO1348	No result	DUF929 (pfam06053)	Domain of unknown function (PF06053)
5	Q6L1G3	PTO0604	CARDB domain (IPR011635), Ig-like_fold (IPR013783)	CARDB superfamily (cl22904)	CARDB (PF07705)
6	Q6L2S5	PTO0142	DsrEFH-like (IPR027396)	DsrE superfamily (cl00672)	No result
7	Q6L2L9	PTO0198	Uncharacterized conserved protein UCP037373, transcriptional regulator, AF0674 (IPR017185), Winged helix-like DNA-binding domain superfamily (IPR036388)	COG4738 superfamily (cl01956)	No result
8	Q6L268	PTO0349	No result	No result	No result
9	Q6L1Y4	PTO0433	Winged helix-like DNA-binding domain superfamily (IPR036388)	pheS superfamily (cl30524)	No result
10	Q6KZK5	PTO1262	No result	No result	No result
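
To compare how the three resources performed on these ten proteins, Table 2 can be reduced to a hit/no-hit matrix. This is a minimal Python sketch under stated assumptions: `TABLE2_HITS` is a hand transcription in which `True` means the cell above lists at least one domain and `False` means "No result"; the variable names are hypothetical and the tally is not taken from the paper.

```python
# Column order used in every tuple below.
TOOLS = ("InterPro 74", "CDD", "Pfam 32")

# True = at least one domain reported in Table 2, False = "No result".
TABLE2_HITS = {
    "Q6L2C5": (True, True, False),
    "Q6L2C8": (True, True, True),
    "Q6KZG4": (True, True, True),
    "Q6KZB9": (False, True, True),
    "Q6L1G3": (True, True, True),
    "Q6L2S5": (True, True, False),
    "Q6L2L9": (True, True, False),
    "Q6L268": (False, False, False),
    "Q6L1Y4": (True, True, False),
    "Q6KZK5": (False, False, False),
}

# Number of proteins for which each tool reported a domain.
hits_per_tool = {
    tool: sum(row[i] for row in TABLE2_HITS.values())
    for i, tool in enumerate(TOOLS)
}

# Proteins for which none of the three tools reported a domain.
unannotated = [acc for acc, row in TABLE2_HITS.items() if not any(row)]

for tool, n in hits_per_tool.items():
    print(f"{tool}: {n} of {len(TABLE2_HITS)} proteins")
print("No domain from any tool:", unannotated)
```

On this transcription, CDD reports a domain for 8 proteins, InterPro 74 for 7 and Pfam 32 for 4, while Q6L268 and Q6KZK5 remain without a domain in all three.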

Table 3

Details of P. torridus secretory proteins involved in the various KEGG pathways.

S.No.	KEGG pathway	Protein accession number
1.	Carbon metabolism	Q6L1T2, Q6L0C3, Q6L0B7, Q6KZF2, Q6KZA7
2.	Microbial metabolism in diverse environments	Q6L1T2, Q6L0C3, Q6L0B7, Q6KZF2, Q6KZA7
3.	Biosynthesis of antibiotics	Q6L0C3, Q6L0B7, Q6KZA7
4.	Biosynthesis of secondary metabolites	Q6L0C3, Q6L0B7, Q6KZA7
5.	Pyruvate metabolism	Q6L0C3, Q6L0B7, Q6KZA7
6.	Citrate cycle	Q6L0C3, Q6L0B7, Q6KZA7
7.	Carbon fixation pathways in prokaryotes	Q6L0C3, Q6L0B7, Q6KZA7
8.	Butanoate metabolism	Q6L0B7, Q6KZA7
9.	Glycolysis/Gluconeogenesis	Q6L0B7, Q6KZA7
10.	Galactose metabolism	Q6L1T2
11.	Cysteine and methionine metabolism	Q6L0C3
12.	Pentose phosphate pathway	Q6L1T2
13.	D-Glutamine and D-glutamate metabolism	Q6KZF2
14.	ABC transporters	Q6L081
15.	Glyoxylate and dicarboxylate metabolism	Q6L0C3
16.	Proteasome	Q6L0W3
17.	Arginine biosynthesis	Q6KZF2
18.	Alanine, aspartate and glutamate metabolism	Q6KZF2
19.	Ribosome	Q6L1B1
20.	Nitrogen metabolism	Q6KZF2
21.	Methane metabolism	Q6L0C3
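
The multi-functionality described for proteins such as malate dehydrogenase (Q6L0C3) can be made concrete by inverting Table 3, that is, counting the pathways in which each protein appears. The sketch below is a minimal Python illustration, not the authors' procedure: `PATHWAYS` is a hand transcription of the table, and the variable names are hypothetical. It also shows why a per-category figure can hold more entries than there are proteins, since a protein assigned to several categories is counted once per category.

```python
from collections import Counter

# KEGG pathway -> protein accessions, transcribed from Table 3.
PATHWAYS = {
    "Carbon metabolism": ["Q6L1T2", "Q6L0C3", "Q6L0B7", "Q6KZF2", "Q6KZA7"],
    "Microbial metabolism in diverse environments": ["Q6L1T2", "Q6L0C3", "Q6L0B7", "Q6KZF2", "Q6KZA7"],
    "Biosynthesis of antibiotics": ["Q6L0C3", "Q6L0B7", "Q6KZA7"],
    "Biosynthesis of secondary metabolites": ["Q6L0C3", "Q6L0B7", "Q6KZA7"],
    "Pyruvate metabolism": ["Q6L0C3", "Q6L0B7", "Q6KZA7"],
    "Citrate cycle": ["Q6L0C3", "Q6L0B7", "Q6KZA7"],
    "Carbon fixation pathways in prokaryotes": ["Q6L0C3", "Q6L0B7", "Q6KZA7"],
    "Butanoate metabolism": ["Q6L0B7", "Q6KZA7"],
    "Glycolysis/Gluconeogenesis": ["Q6L0B7", "Q6KZA7"],
    "Galactose metabolism": ["Q6L1T2"],
    "Cysteine and methionine metabolism": ["Q6L0C3"],
    "Pentose phosphate pathway": ["Q6L1T2"],
    "D-Glutamine and D-glutamate metabolism": ["Q6KZF2"],
    "ABC transporters": ["Q6L081"],
    "Glyoxylate and dicarboxylate metabolism": ["Q6L0C3"],
    "Proteasome": ["Q6L0W3"],
    "Arginine biosynthesis": ["Q6KZF2"],
    "Alanine, aspartate and glutamate metabolism": ["Q6KZF2"],
    "Ribosome": ["Q6L1B1"],
    "Nitrogen metabolism": ["Q6KZF2"],
    "Methane metabolism": ["Q6L0C3"],
}

# Count, for every protein, the number of pathways that list it.
pathways_per_protein = Counter(
    acc for members in PATHWAYS.values() for acc in members
)

distinct_proteins = len(pathways_per_protein)
total_assignments = sum(pathways_per_protein.values())

for acc, n in pathways_per_protein.most_common():
    print(f"{acc}: {n} pathway(s)")
print(f"{distinct_proteins} distinct proteins, {total_assignments} assignments")
```

With the rows as transcribed, 8 distinct proteins account for 41 protein-pathway assignments across the 21 pathways, and Q6L0C3 appears in 10 of them.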
