Eamonn P Culligan, Roy D Sleator, Julian R Marchesi, Colin Hill.
Abstract
Functional environmental screening of metagenomic libraries is a powerful means to identify and assign function to novel genes and their encoded proteins without any prior sequence knowledge. In the current study we describe the identification and subsequent analysis of a salt-tolerant clone from a human gut metagenomic library. Following transposon mutagenesis we identified an unknown gene (stlA, for "salt tolerance locus A") with no currently known homologues in the databases. Subsequent cloning and expression in Escherichia coli MKH13 revealed that stlA confers a salt tolerance phenotype on its surrogate host. A detailed in silico analysis was also conducted to gain additional information on the properties of the encoded StlA protein. The stlA gene is rare when searched against human metagenome datasets such as MetaHit and the Human Microbiome Project and represents a novel and unique salt tolerance determinant which appears to be found exclusively in the human gut environment.
Year: 2013 PMID: 24349412 PMCID: PMC3861447 DOI: 10.1371/journal.pone.0082985
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
List of putative proteins encoded on SMG 25 fosmid insert.
| Gene | Length (aa) | Closest homologue | Organism | E-value | Query coverage | %ID (overlap, aa) | Conserved domains |
|---|---|---|---|---|---|---|---|
| 1 | 555 | Serine/threonine protein kinase | | 7.00E-53 | 50% | 44% (285) | None |
| 2 | 108 | Hypothetical protein (Amuc_1368) | | 1.00E-04 | 86% | 34% (98) | None |
| 8 | 201 | Hypothetical protein O71_18246 | | 2.00E-06 | 36% | 39% (77) | DUF4339 |
| 9 | 153 | Hypothetical protein HALAR_0188 | Halophilic archaeon DL31 | 1.00E-04 | 53% | 36% (83) | TM2 |
| 10 | 320 | Ankyrin repeat protein | | 2.00E-47 | 88% | 45% (291) | Ankyrin repeat |
| 11 | 73 | Uncharacterized protein BN502_01474 | | 3.00E-04 | 54% | 50% (40) | None |
| 13 | 283 | Uncharacterized protein BN502_01467 | | 0.00E+00 | 100% | 94% (283) | DUF932 |
| 14 | 99 | Uncharacterized protein BN502_01466 | | 1.00E-59 | 100% | 94% (99) | None |
| 15 | 52 | Uncharacterized protein BN502_01465 | | 2.00E-22 | 100% | 98% (52) | None |
| 16 | 160 | Uncharacterized protein BN502_01464 | | 2.00E-99 | 100% | 91% (160) | None |
| 17 | 43 | Uncharacterized protein BN502_01463 | | 4.00E-04 | 95% | 90% (41) | None |
| 18 | 317 | Hypothetical protein (Amuc_1352) | | 9.00E-08 | 31% | 40% (101) | None |
| 20 | 159 | Phage-associated protein | | 3.00E-36 | 100% | 52% (159) | DUF4065, GepA |
| 21 | 264 | Hypothetical protein EC2865200_1013 | | 5.00E-26 | 67% | 45% (181) | None |
| 22 | 154 | Uncharacterized protein BN502_01474 | | 2.00E-31 | 72% | 56% (114) | None |
| 26 | 52 | Hypothetical protein HALA3H3_770002 | | 2.00E-04 | 73% | 63% (38) | None |
| 27 | 657 | H(+)-transporting two-sector ATPase | | 0.00E+00 | 88% | 92% (584) | TrkH superfamily |
| 28 | 445 | MATE efflux family protein (Amuc_1131) | | 0.00E+00 | 95% | 87% (445) | MATE, NorM |
| 31 | 278 | Putative uncharacterized protein | | 8.00E-56 | 100% | 70% (279) | None |
| 32 | 87 | Putative uncharacterized protein | | 3.00E-31 | 100% | 83% (87) | None |
| 33 | 186 | Hypothetical protein (Amuc_1127) | | 2.00E-61 | 81% | 83% (153) | None |
| 34 | 450 | Dimethyladenosine transferase | | 0.00E+00 | 99% | 91% (448) | ksgA, NUDIX hydrolase |
| 35 | 393 | UDP-galactopyranose mutase | | 7.00E-109 | 96% | 52% (381) | GLF, NAD binding |
| 36 | 329 | UDP-glucose 4-epimerase | | 0.00E+00 | 100% | 96% (329) | UDP_G4E_1_SDR_e |
| 37 | 55 | Hypothetical protein (Amuc_1123) | | 6.40E-03 | 85% | 43% (39) | None |
| 38 | 511 | Hypothetical protein (Amuc_1124) | | 0.00E+00 | 98% | 86% (504) | Isoprenoid_C2_like |
| 39 | 144 | Sulphate transporter/anti-sigma factor antagonist | | 4.00E-89 | 100% | 90% (144) | STAS superfamily |
| 40 | 453 | Putative uncharacterized protein | | 3.00E-180 | 99% | 69% (454) | DUF2851 |
| 41 | 466 | Glutamate decarboxylase | | 0.00E+00 | 100% | 91% (466) | AAT_I superfamily |
| 42 | 1217 | Outer membrane auto-transporter protein | | 3.00E-96 | 100% | 83% (1217) | Auto-transporter superfamily |
| 43 | 142 | Hypothetical protein ( | | 4.00E-55 | 99% | 69% (141) | NAT_SF domain |
| 44 | 132 | GCN5-related N-acetyltransferase | | 8.00E-56 | 97% | 81% (129) | NAT_SF domain |
| 45 | 947 | DNA polymerase III, alpha subunit | | 0.00E+00 | 100% | 95% (938) | DNA_polymerase_III |
Abbreviations and symbols: aa (amino acids); n/a (not applicable); %ID (% identity at amino acid level); DUF (Domain of Unknown Function); OM (outer membrane); Asterisk (*) indicates stlA gene product. Text in bold indicates that no homologues for these gene products were found following BLAST searches of NCBI database.
Bioinformatic analysis of StlA protein sequence.
| Tool | Description | Result | Reference |
|---|---|---|---|
| | Computes various physical and chemical parameters for a given protein stored in Swiss-Prot or TrEMBL, or for a user-entered sequence | Molecular weight = 28.62 kDa; theoretical pI = 6.39 | [ |
| | Protein annotation resource that consists of a collection of well-annotated multiple sequence alignment models for ancient domains and full-length proteins | No conserved domains were detected | [ |
| | Consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them | No motifs were detected | [ |
| | Predicts the presence and location of signal peptide cleavage sites in amino acid sequences from different organisms | Predicted signal peptide at position 1-35 | [ |
| | Searches sequence databases for homologues of protein sequences and makes protein sequence alignments | Predicted signal peptide at position 1-35; four predicted TM regions and one disorder region | [ |
| | Prediction of transmembrane (TM) helices in proteins | Predicted four TM regions | [ |
| | Homology detection and structure prediction by HMM-HMM (Hidden Markov Model) comparison | Detected outer membrane insertion C-terminal signal, OmpP85 | [ |
| | Prediction of bacterial promoters | -10 box predicted 56 base pairs upstream of ATG start codon; | |
| | Secondary structure prediction | Alpha-helix: 166/257 residues = 64.6%; extended strand: 26/257 residues = 10.1%; beta-turn: 18/257 residues = 7.0%; random coil: 47/257 residues = 18.3% | [ |
| | Automated protein structure homology-modelling server | No similar or suitable template structures found | [ |
| | Protein structure and function predictions. 3D models are built based on multiple threading alignments | All 5 predicted 3D models had a C-score of -3.41 or less, below the -1.50 threshold for a high-confidence structure prediction | [ |
| | Algorithm for | Of the 10 predicted 3D models, the top template modelling (TM) score was 0.342 ± 0.083, which is below the TM-score threshold of >0.50 for a correctly predicted fold | [ |
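The secondary-structure percentages reported in the table above follow directly from the residue counts. The short Python sketch below recomputes them as a consistency check; the dictionary keys and the `percent` helper are illustrative names, not part of the original analysis.

```python
# Residue counts per predicted state, copied from the table above.
counts = {
    "alpha_helix": 166,
    "extended_strand": 26,
    "beta_turn": 18,
    "random_coil": 47,
}

# The four states should account for every residue of the 257-aa StlA protein.
total = sum(counts.values())


def percent(n, total):
    """Return n as a percentage of total, rounded to one decimal place."""
    return round(100.0 * n / total, 1)


percentages = {state: percent(n, total) for state, n in counts.items()}

for state, pct in percentages.items():
    print(f"{state}: {counts[state]}/{total} residues = {pct}%")
```

The counts sum to 257 residues, matching the protein length implied by the table.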
Figure 1. Growth experiments in NaCl and KCl.
(A) Growth of E. coli EPI300::pCC1FOS and clone SMG 25 in LB broth supplemented with 6.5% sodium chloride (NaCl) (P < 0.0001). (B) Growth in LB broth and LB broth supplemented with 3% NaCl (P = 0.0470), (C) 4% NaCl (P < 0.0001) and (D) 4% KCl (P < 0.0001). P-values were determined using an unpaired Student's t-test. Results are presented as the average of triplicate experiments, with error bars representing the standard error of the mean (SEM).
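As a rough illustration of the statistics named in this caption, the sketch below computes the SEM and the unpaired Student's t statistic (pooled variance) for two triplicate groups. The optical-density values are hypothetical placeholders, not data from the study, and the function names are illustrative. Converting the t statistic to a P-value requires the t distribution, for which a library routine such as SciPy's `scipy.stats.ttest_ind` could be used.

```python
import math
import statistics


def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def student_t_unpaired(a, b):
    """Unpaired Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes equal variances in both groups.
    """
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled * (1 / n1 + 1 / n2)
    )
    return t, n1 + n2 - 2


# Hypothetical triplicate optical densities (NOT values from the paper).
control = [0.10, 0.12, 0.11]  # e.g. empty-vector host
clone = [0.40, 0.44, 0.42]    # e.g. salt-tolerant clone

t_stat, dof = student_t_unpaired(clone, control)
print(f"control: mean={statistics.mean(control):.3f}, SEM={sem(control):.4f}")
print(f"clone:   mean={statistics.mean(clone):.3f}, SEM={sem(clone):.4f}")
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```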
Figure 2. Bioinformatic analysis of SMG 25 fosmid insert.
(A) FFAS03 analysis of the StlA protein and the encoded proteins of flanking genes was performed to identify putative distant structural homologues. A score of -9.50 or lower is considered significant. (B) Representation of the G+C skew of the entire fosmid insert of SMG 25. (C) Representation of the gene arrangement on SMG 25. Gene lengths are approximately to scale and colour coding represents the G+C content of each individual gene, which can be determined from the G+C content gradient bar. The presence of a phage-associated gene and the clear separation in G+C content over the length of the fosmid insert indicate that much of this region may have been acquired via lateral gene transfer (LGT). The phage-associated gene is marked “P”, while the stlA gene is indicated with an asterisk (*) symbol. Genes are numbered as indicated in Table 1 and as mentioned in the text. Numbering of some shorter genes has been excluded for clarity. Selected nucleotide positions (in base pairs) are displayed in bold italic font above genes. (D) A detailed view of the nucleotide sequence of the stlA gene and the amino acid sequence of the StlA protein. The putative start codon is in green, while a 250 base-pair region upstream of this is shown to include putative -35 and -10 promoter regions (underlined) and a predicted rpoD transcription factor binding site (in bold). Amino acids surrounded by a grey box indicate the predicted signal sequence of StlA and those highlighted in blue represent four transmembrane regions. The location of the EZTn5 transposon insertion is indicated with a red triangle.
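Panels (B) and (C) rest on per-window or per-gene base composition. The sketch below shows one common way such profiles are computed: G+C content as (G+C)/length and GC skew as (G−C)/(G+C) over fixed windows. The definitions, window size and toy sequence are assumptions made for illustration; the caption does not state which tool or parameters were used for the figure.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq):
    """GC skew, (G - C) / (G + C); 0.0 when the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding_windows(seq, window, step):
    """Yield (start, subsequence) for each full window along seq."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


# Toy sequence (hypothetical, not from the SMG 25 insert).
toy = "ATATATATGGGCGCGGCC"

profile = [
    (start, gc_content(win), gc_skew(win))
    for start, win in sliding_windows(toy, window=6, step=6)
]

for start, content, skew in profile:
    print(f"window at {start}: G+C = {content:.2f}, skew = {skew:+.2f}")
```

An abrupt change in G+C content between adjacent windows is the kind of signal the caption interprets as possible lateral gene transfer.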
Gene, scaffold and subject information from which stlA homologues were found in Human Microbiome Project (HMP) dataset.
| Subject ID | Gene ID | Strand | Start (bp) | End (bp) | Length (bp) | Length (aa) | %ID (overlap, aa) | Annotation | Scaffold length (bp) | GC content |
|---|---|---|---|---|---|---|---|---|---|---|
| *159753524 (2) | SRS053214_LANL_scaffold_17021__gene_42707 | - | 25802 | 26569 | 768 | 255 | 59% (237) | hypothetical protein | 33560 | 0.5 |
| *159753524 (3) | SRS077730_LANL_scaffold_24345__gene_72567 | + | 2793 | 3560 | 768 | 255 | 59% (237) | membrane protein | 13529 | 0.49 |
| | SRS015217_WUGC_scaffold_30292__gene_65222 | - | 463 | 1236 | 774 | 257 | 82% (237) | membrane protein | 5672 | 0.5 |
| | SRS051882_Baylor_scaffold_22757__gene_50812 | - | 1791 | 2564 | 774 | 257 | 82% (237) | membrane protein | 7074 | 0.49 |
| 160643649 (1) | C2121591__gene_151559 | + | 1333 | 2157 | 825 | 274 | 89% (234) | membrane protein | 5507 | 0.48 |
| 158944319 (1) | C3406971__gene_199744 | - | 248 | 1072 | 825 | 274 | 80% (234) | membrane protein | 3122 | 0.52 |
| 159591683 (2) | SRS024549_LANL_scaffold_1815__gene_4559 | - | 4475 | 5248 | 774 | 257 | 82% (237) | membrane protein | 10434 | 0.5 |
| 158337416 (2) | C2998990__gene_162710 | + | 340 | 972 | 633 | 211 | 81% (211) | hypothetical protein | 974 | 0.45 |
| 765013792 (1) | SRS018656_WUGC_scaffold_544__gene_591 | - | 13762 | 14535 | 774 | 257 | 83% (237) | membrane protein | 26364 | 0.51 |
| 159510762 (2) | SRS024075_LANL_scaffold_21370__gene_63545 | - | 21610 | 22320 | 711 | 236 | 82% (216) | hypothetical protein | 35617 | 0.5 |
Information for stlA homologues found in the HMP dataset, including scaffold and subject of origin. Symbols (* and ǂ) indicate that an stlA homologue was detected more than once in the same subject. The StlA amino acid sequence was used to search against all 748 metagenome datasets from different body sites from the Human Microbiome Project (HMP) (maximum e-value cut-off of 1e-50). Ten StlA homologues were identified in 8 different subjects. No StlA homologues were found in any other body site metagenome.
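The coordinates and lengths listed for the ten homologues can be cross-checked with simple arithmetic. The sketch below assumes that the gene length in base pairs equals end − start + 1 and that a complete open reading frame includes a stop codon, so that its protein length is bp/3 − 1. Both are assumptions about how the table was generated, and the names `HITS` and `check_hit` are illustrative.

```python
# (gene ID, start, end, length in bp, length in aa), copied from the table above.
HITS = [
    ("SRS053214_LANL_scaffold_17021__gene_42707", 25802, 26569, 768, 255),
    ("SRS077730_LANL_scaffold_24345__gene_72567", 2793, 3560, 768, 255),
    ("SRS015217_WUGC_scaffold_30292__gene_65222", 463, 1236, 774, 257),
    ("SRS051882_Baylor_scaffold_22757__gene_50812", 1791, 2564, 774, 257),
    ("C2121591__gene_151559", 1333, 2157, 825, 274),
    ("C3406971__gene_199744", 248, 1072, 825, 274),
    ("SRS024549_LANL_scaffold_1815__gene_4559", 4475, 5248, 774, 257),
    ("C2998990__gene_162710", 340, 972, 633, 211),
    ("SRS018656_WUGC_scaffold_544__gene_591", 13762, 14535, 774, 257),
    ("SRS024075_LANL_scaffold_21370__gene_63545", 21610, 22320, 711, 236),
]


def check_hit(start, end, length_bp, length_aa):
    """Return (span_ok, orf_kind) for one table row."""
    # Inclusive coordinates: the span should equal the reported gene length.
    span_ok = (end - start + 1) == length_bp
    codons = length_bp // 3
    if length_bp % 3 != 0:
        kind = "not a multiple of 3"
    elif length_aa == codons - 1:
        kind = "stop codon counted"
    elif length_aa == codons:
        kind = "no stop codon counted"
    else:
        kind = "inconsistent"
    return span_ok, kind


results = {gene: check_hit(s, e, bp, aa) for gene, s, e, bp, aa in HITS}

for gene, (span_ok, kind) in results.items():
    print(f"{gene}: span_ok={span_ok}, {kind}")
```

Under these assumptions, the single row whose protein length equals bp/3 ends at position 972 of a 974 bp scaffold, which would be consistent with a gene truncated at the scaffold edge, although the table itself does not say so.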